How do I extract text from a scanned PDF for free?

Open RightPDFKit, click OCR PDF, select your scanned document and click Run OCR. RightPDFKit detects and embeds a text layer so you can search, select and copy the text.

What is OCR and how does it work?

OCR (Optical Character Recognition) analyses page images pixel by pixel, identifies letter shapes and reconstructs the underlying text. RightPDFKit uses Tesseract.js to run OCR entirely in your browser.

Why is my OCR text full of errors?

OCR accuracy depends on scan quality. Low resolution (below 200 DPI), skewed pages, poor contrast or unusual fonts will reduce accuracy. Rescan at 300 DPI with good lighting for best results.

OCR ⏱ 5 min read 📅 2026-05-12


How to Extract Text from a Scanned PDF Using OCR

Scanned PDFs are essentially images — you can't select text, search within them or copy content. OCR (Optical Character Recognition) converts them into real, searchable text. Here's how to do it entirely in your browser, for free.


What is OCR?

Optical Character Recognition (OCR) is technology that analyses an image and identifies individual characters, words and layout to produce machine-readable text. Modern OCR tools can handle printed text in dozens of languages, various fonts and sizes, and even partially skewed or degraded documents.

RightPDFKit uses Tesseract.js, a JavaScript and WebAssembly port of the open-source Tesseract OCR engine (originally developed at Hewlett-Packard, with later development sponsored by Google). It runs entirely in your browser.

How to OCR a PDF — step by step

Open the OCR tool on RightPDFKit.
Select your scanned or image-based PDF from your device.
Choose which pages to process — all pages or specific ones.
Click Run OCR.
The extracted text appears in the panel. Copy it or download as a .txt file.
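
For readers curious how a "specific pages" choice might be handled in code, here is a minimal sketch. The function name `parsePageSelection` and the "1-3, 7" input syntax are assumptions made for illustration; they are not RightPDFKit's documented format.

```typescript
// Hypothetical helper: turn a page selection such as "1-3, 7" into a sorted
// list of 1-based page numbers. The input syntax is an assumption for
// illustration only.
function parsePageSelection(selection: string, pageCount: number): number[] {
  const trimmed = selection.trim().toLowerCase();
  if (trimmed === "" || trimmed === "all") {
    // No explicit selection means every page.
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Accept either a single page ("7") or an inclusive range ("1-3").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page selection: "${part.trim()}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end > pageCount || start > end) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  // A Set removes duplicates from overlapping ranges; sort for stable order.
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(parsePageSelection("1-3, 7", 10)); // pages 1, 2, 3 and 7
```

Processing only the pages you need keeps OCR time down, since each page is recognised separately.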

Tips for better OCR results

Scan quality — 300 DPI or higher gives the best accuracy. Phone photos work but may be less accurate than a flatbed scanner.
Straight pages — straighten skewed scans first using the Rotate tool before running OCR.
High contrast — black text on white background works best. Faded, yellowed or low-contrast documents may produce errors.
Printed text — OCR works best on printed text. Handwriting recognition is much less reliable.
Language — Tesseract.js defaults to English. For other languages, let us know and we can look at adding language options.

Common OCR use cases

Extracting text from scanned contracts or legal documents
Making old scanned books and reports searchable
Pulling data from scanned receipts or invoices for accounting
Converting paper forms to digital text for editing
Extracting content from image-based PDFs issued by government agencies or banks

Is my scanned document sent to a server?

No. Tesseract.js runs entirely inside your browser. Your scanned document is processed on your device — nothing is uploaded anywhere. This is critical for sensitive documents like medical records, legal papers or financial statements.

OCR vs copy-paste — when each works

If you can select and copy text from a PDF already, you don't need OCR — the document already has a text layer. OCR is only needed when the PDF is a flat image, which happens when:

The document was physically scanned with a scanner or photocopier
A PDF was created by photographing pages with a phone camera
The PDF was deliberately flattened to remove selectable text (common for signed documents)
The PDF was created from a fax transmission

A quick test: try selecting text on the page. If your cursor turns into a text cursor and highlights words, the document already has a text layer. If it only lets you draw a rectangle (image selection), you need OCR.
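
The same check can be approximated in code. The sketch below assumes you already have the text that some PDF library extracted from each page's existing text layer; the function name `pagesNeedingOcr` and the 20-character threshold are illustrative assumptions, not part of any particular tool.

```typescript
// Assumption: pageTexts holds whatever text a PDF library extracted from each
// page's existing text layer (an empty string for image-only pages).
function pagesNeedingOcr(pageTexts: string[], minChars = 20): number[] {
  const result: number[] = [];
  pageTexts.forEach((text, index) => {
    // Count only visible characters; stray whitespace is not a text layer.
    const visible = text.replace(/\s+/g, "").length;
    if (visible < minChars) {
      result.push(index + 1); // report 1-based page numbers
    }
  });
  return result;
}

const sampleTexts = [
  "This page already has a selectable text layer.",
  "",
  "  \n ",
];
console.log(pagesNeedingOcr(sampleTexts)); // pages 2 and 3
```

Pages that come back from this check are the ones worth sending through OCR; the rest can be copied directly.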

Getting better OCR results — scan quality checklist

OCR accuracy is almost entirely determined by the quality of the original scan. Here's what matters most:

Resolution — scan at 300 DPI minimum. 150 DPI scans produce significantly more errors. 600 DPI is ideal for small text.
Contrast — the sharper the difference between text and background, the more accurately Tesseract identifies characters. Avoid pale or faded documents.
Alignment — pages should be straight. A 5-degree skew noticeably reduces accuracy. Use a flatbed scanner rather than a handheld phone photo where possible.
Clean originals — coffee stains, pen marks and creases are interpreted as characters and produce errors.
Font type — standard serif and sans-serif fonts (Times, Arial, Calibri) OCR very accurately. Decorative, script or hand-lettered fonts do not.
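
If you are unsure what resolution a scan was made at, you can estimate it from the image's pixel width and the paper size. This is a minimal sketch; the function names are illustrative, and the thresholds simply mirror the 300 and 600 DPI figures in the checklist.

```typescript
// DPI = pixels along one edge / physical length of that edge in inches.
// The result is rounded because scanner pixel counts are themselves rounded.
function estimateDpi(pixelWidth: number, pageWidthInches: number): number {
  if (pixelWidth <= 0 || pageWidthInches <= 0) {
    throw new Error("Dimensions must be positive");
  }
  return Math.round(pixelWidth / pageWidthInches);
}

// Thresholds follow the checklist: 300 DPI minimum, 600 DPI for small text.
function describeResolution(dpi: number): string {
  if (dpi >= 600) return "ideal for small text";
  if (dpi >= 300) return "meets the 300 DPI minimum";
  return "below 300 DPI; expect more errors";
}

const LETTER_WIDTH_INCHES = 8.5; // US Letter paper is 8.5 inches wide
const scanDpi = estimateDpi(2550, LETTER_WIDTH_INCHES);
console.log(scanDpi, describeResolution(scanDpi)); // 300 meets the 300 DPI minimum
```

A Letter-size page that is only 1275 pixels wide works out to 150 DPI, which is the range where the checklist warns of significantly more errors.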

What to do after OCR

OCR output always needs a proofread. Common error patterns to look for:

1 vs l vs I — these characters look identical at low resolution and OCR regularly confuses them
0 vs O — zero and capital O are frequently swapped, especially in numbers
rn vs m — the pair "rn" at small sizes looks identical to "m"
Punctuation — commas, periods and apostrophes are often missed or doubled
Line breaks — hyphenated words at line ends may appear as two separate words

For legal, financial or medical documents where accuracy is critical, always verify OCR output against the original before relying on it.
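
Some of these patterns can be caught automatically before the manual proofread. The sketch below is a rough aid under simple assumptions, not a replacement for checking against the original: it rejoins words split by a line-end hyphen and flags tokens where a digit sits next to a look-alike letter. The function names are illustrative, and it does not attempt the "rn" vs "m" or punctuation cases, which need a dictionary or a human eye.

```typescript
// Join words split by a hyphen at a line end: "docu-\nment" -> "document".
// Caveat: a genuine compound broken at its hyphen ("well-\nknown") is joined
// too, so the result still needs a human check.
function rejoinHyphenatedLines(text: string): string {
  return text.replace(/([A-Za-z])-\r?\n([a-z])/g, "$1$2");
}

// Flag tokens where a digit is directly next to a look-alike letter
// (capital O, lowercase l, capital I), such as "1O0" or "2l".
function flagLookalikeTokens(text: string): string[] {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  return tokens.filter((t) => /\d[OlI]|[OlI]\d/.test(t));
}

const rawOcr = "Total due: 1O0.50 for the docu-\nment dated 2l March";
const cleanedOcr = rejoinHyphenatedLines(rawOcr);
console.log(cleanedOcr);
console.log(flagLookalikeTokens(cleanedOcr)); // flags "1O0.50" and "2l"
```

Flagged tokens are only candidates for review; a product code such as "A1I" may be perfectly correct.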

OCR for different document types

Document type	Expected accuracy	Notes
Typed letter (300 DPI)	98%+	Minimal errors expected
Invoice or receipt	90–97%	Check numbers carefully
Newspaper or book page	85–95%	Columns and small fonts reduce accuracy
Phone photo of document	70–90%	Lighting and angle critical
Handwritten notes	20–60%	Unreliable — manual transcription better
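
The table does not say how accuracy is measured. One common convention, assumed here, is character-level accuracy: one minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. A minimal sketch for measuring it on your own documents (function names are illustrative):

```typescript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
function editDistance(a: string, b: string): number {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(
        Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + cost // substitution or match
        )
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Character accuracy relative to a hand-checked reference transcription.
function characterAccuracy(groundTruth: string, ocrOutput: string): number {
  if (groundTruth.length === 0) {
    return ocrOutput.length === 0 ? 1 : 0;
  }
  const errors = editDistance(groundTruth, ocrOutput);
  return Math.max(0, 1 - errors / groundTruth.length);
}

const truthSample = "Invoice total: 100.50";
const ocrSample = "Invoice tota1: 1O0.50"; // two substituted characters
console.log((characterAccuracy(truthSample, ocrSample) * 100).toFixed(1) + "%"); // 90.5%
```

Two wrong characters out of 21 already bring a short line down to about 90%, which is why the table advises checking numbers on invoices and receipts carefully.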