OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them
to be searched.

Main features:
- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy / paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation
  without disrupting any other content
- Optimizes PDF images, often producing files smaller than the input
  file
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Distributes work across all available CPU cores
- Uses the Tesseract OCR engine to recognize more than 100 languages
- Keeps your private data private
- Scales properly to handle files with thousands of pages
- Battle-tested on millions of PDFs
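
As a rough illustration of the features above, a typical run might look
like the sketch below. The file names scan.pdf and scan-ocr.pdf are
placeholders, not part of this package, and the chosen options are only
one plausible combination. The script prints the command and runs it only
when ocrmypdf is installed and the input file exists.

```shell
# Placeholder file names (assumptions); replace with your own.
input="scan.pdf"
output="scan-ocr.pdf"

# -l        Tesseract language pack to use for recognition
# --deskew  straighten crooked pages before OCR
# --jobs    limit the number of CPU cores used
cmd="ocrmypdf -l eng --deskew --jobs 2 $input $output"

# Show the command, then run it only if it can actually work here.
echo "$cmd"
if command -v ocrmypdf >/dev/null 2>&1 && [ -f "$input" ]; then
    $cmd
fi
```

Without --jobs, OCRmyPDF uses all available cores, as noted in the list
above.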

OCRmyPDF uses Tesseract for OCR, and relies on its language packs.
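
Which languages are usable therefore depends on the Tesseract language
packs present on the system. A minimal sketch for checking them and for
building a multi-language command follows; the eng+deu pair and the file
names input.pdf and output.pdf are only examples.

```shell
# List the language packs Tesseract can find, if Tesseract is installed.
if command -v tesseract >/dev/null 2>&1; then
    tesseract --list-langs
fi

# Several language codes are joined with "+" for mixed-language documents.
langs="eng+deu"

# Print the resulting command (file names are placeholders).
echo "ocrmypdf -l $langs input.pdf output.pdf"
```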

Once OCRmyPDF is installed, the built-in help which explains the
command syntax and options can be accessed via:

ocrmypdf --help

Please support the software author and the build author if you find
the software useful.