Pdftools OCR Service

Pdftools OCR Service uses optical character recognition (OCR) to extract text from images and scanned documents, turning them into searchable and editable PDF documents.

You can use Pdftools OCR Service in two ways:

With Conversion Service: Configure OCR as a processing step in Conversion Service on Windows Server or in Docker. Refer to Getting started.
With Pdftools SDK: Use built-in OCR support in Pdftools SDK to process PDFs programmatically in .NET, Java, Python, or C. Refer to OCR with Pdftools SDK.

Key features

Embeds the recognized text in Unicode format into PDF or PDF/A files.
Supports over 180 natural and technical languages.
Provides an OCR service mode for shared use across multiple platforms.
Offers predefined and custom OCR profiles that optimize for accuracy or performance.
Handles skew correction, rotation, and image resolution automatically.
Detects tables, barcodes, engineering drawings, and other complex layout elements.

System architecture

Pdftools OCR Service comprises two .NET Core applications, both running on Kestrel servers:

Manager node: Handles HTTP requests and dispatches jobs to available workers.
Worker node: Performs the OCR processing and is the most resource-intensive component.

This separation allows for flexible deployment and scaling, where a single manager can coordinate multiple workers. For more information, review Scale Pdftools OCR Service.
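To illustrate the manager-worker split, the following hypothetical sketch models a manager that assigns each incoming job to the worker with the most free OCR slots. The `Worker` and `Manager` classes and the least-loaded strategy are assumptions for illustration only: this page doesn’t describe how Pdftools OCR Service actually selects a worker, and real nodes communicate over HTTP rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Worker:
    """Hypothetical in-memory stand-in for a worker node."""
    name: str
    slots: int      # OCR jobs the worker can run in parallel
    busy: int = 0   # jobs currently assigned to it

    @property
    def free(self) -> int:
        return self.slots - self.busy


@dataclass
class Manager:
    """Hypothetical manager: sends each job to the least-loaded worker."""
    workers: List[Worker]
    assignments: Dict[str, Worker] = field(default_factory=dict)

    def dispatch(self, job_id: str) -> Optional[str]:
        candidates = [w for w in self.workers if w.free > 0]
        if not candidates:
            return None  # no free capacity; a real system would queue the job
        # max() returns the first worker with the most free slots
        target = max(candidates, key=lambda w: w.free)
        target.busy += 1
        self.assignments[job_id] = target
        return target.name

    def complete(self, job_id: str) -> None:
        # Release the slot held by a finished job
        worker = self.assignments.pop(job_id)
        worker.busy -= 1
```

With two workers of 4 and 8 slots, this sketch sends jobs to the 8-slot worker until both have the same number of free slots, then alternates between them.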

System requirements

Pdftools OCR Service runs as a manager node and one or more worker nodes, which have different hardware needs.

Worker nodes

These nodes run the OCR engine and require more processing power and memory. The recommended hardware for a worker node is:

Windows Server or Linux (x64)
8 GB RAM or more
Quad-core CPU or better
SSD with at least 4 GB free space

Each worker runs one OCR subprocess per CPU core, up to 8 in parallel, so additional cores increase throughput up to that limit. Larger documents, such as many-page A4 documents at 600 DPI, benefit from more memory.
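The parallelism rule reduces to simple arithmetic. The sketch below assumes only the stated rule (one OCR subprocess per CPU core, capped at 8 per worker); the function names are hypothetical, whether logical processors count as cores isn’t specified here, and memory limits and document size also affect real throughput.

```python
from typing import Iterable

# Per-worker cap stated on this page: one OCR subprocess per core, up to 8.
MAX_SUBPROCESSES_PER_WORKER = 8


def ocr_subprocesses(cpu_cores: int) -> int:
    """Parallel OCR subprocesses one worker runs for a given core count."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return min(cpu_cores, MAX_SUBPROCESSES_PER_WORKER)


def total_parallel_jobs(worker_cores: Iterable[int]) -> int:
    """Sum of OCR subprocesses across all worker nodes."""
    return sum(ocr_subprocesses(cores) for cores in worker_cores)
```

For example, a quad-core worker runs 4 subprocesses, a 16-core worker still runs 8, and three workers with 4, 8, and 16 cores together run 20.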

Manager nodes

These nodes coordinate OCR jobs and handle requests, so their system requirements are significantly lower. The recommended hardware for a manager node is:

Windows Server or Linux (x64)
4 GB RAM
Quad-core CPU
SSD storage

For standalone installations

You can also install the manager and the worker on the same machine. In that case, the hardware listed under Worker nodes suffices.

Supported operating systems

Pdftools OCR Service runs on Windows Server, in Docker, and on native Linux. The worker uses ABBYY FineReader Engine 12 for text recognition.

Windows

Pdftools OCR Service runs on the following Windows Server versions:

Operating system	Architecture
Windows Server 2019	x64
Windows Server 2022	x64
Windows Server 2025	x64

Docker

Running Pdftools OCR Service in Docker requires the following versions:

Component	Version
Docker	20.10 or later
Docker Compose	1.27 or later
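The version requirements above can be checked against the output of `docker --version` and `docker-compose --version` (or `docker compose version` for the Compose plugin). This is a minimal sketch, not a Pdftools tool; it assumes the common output formats such as `Docker version 20.10.17, build 100c701` and `Docker Compose version v2.20.2`, and it reads "1.27 or later" numerically, so Compose v2 passes.

```python
import re
import subprocess
from typing import List, Tuple

MIN_DOCKER = (20, 10)   # Docker 20.10 or later
MIN_COMPOSE = (1, 27)   # Docker Compose 1.27 or later

# Matches "version 20.10.17" and "version v2.20.2" (Compose v2 prefixes a "v")
_VERSION_RE = re.compile(r"version v?(\d+)\.(\d+)", re.IGNORECASE)


def parse_version(output: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version command's output."""
    match = _VERSION_RE.search(output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(output: str, minimum: Tuple[int, int]) -> bool:
    # Tuple comparison orders versions numerically: (24, 0) >= (20, 10)
    return parse_version(output) >= minimum


def version_output(command: List[str]) -> str:
    """Run e.g. ["docker", "--version"] and return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

On a host with Docker installed, `meets_minimum(version_output(["docker", "--version"]), MIN_DOCKER)` performs the check.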

Linux

On a native Linux installation, support depends on the distribution and its major version. For example, Enterprise Linux 10 (EL10), the Red Hat Enterprise Linux 10 family, is supported, but Enterprise Linux 9 (EL9) isn’t. For installation steps, refer to Pdftools OCR Service on Linux.

Supported

The following distributions are validated end-to-end:

Distribution	Package format	Notes
Rocky Linux 10	RPM	Validated end-to-end. Serves as the reference for the EL10 family (Rocky, AlmaLinux, RHEL, and Oracle Linux 10); the other EL10 distributions are listed under Likely supported.
Ubuntu 24.04 LTS	DEB	Validated end-to-end.

Likely supported

These distributions aren’t yet in the validation matrix but are expected to work. Installing on them requires bypassing the OS compatibility check, as described in Install on a likely supported distribution. Contact support to report any issues you observe; these reports help get the distributions added to the validated set.

Distribution	Why it’s expected to work
AlmaLinux 10, RHEL 10, Oracle Linux 10	RHEL 10 rebuilds with the same EL10 toolchain as Rocky Linux 10.
Ubuntu 22.04 LTS	Ships `libxml2.so.2` natively and `libheif1` in the main repository, and its glibc 2.35 is compatible with the bundled `libstdc++`.
Debian 12 (Bookworm)	Shares the toolchain base of the supported Docker image.
Debian 13 (Trixie)	Same compatibility profile as Debian 12 because `libxml2.so.2` was retained for Trixie.

Not supported

The installer refuses to install on the following distributions, and the override doesn’t help: setting it lets the install proceed, but the first OCR job then crashes with a cryptic runtime error instead of failing cleanly at install time.

Distribution	Reason
EL9 family (Rocky, AlmaLinux, RHEL, and Oracle Linux 9)	Pdftools OCR Service requires `GLIBCXX_3.4.30` (GCC 12 or newer), which isn’t available on EL9.
Ubuntu 25.10 and newer	Ubuntu 25.10 ships libxml2 2.14 or newer, which bumped the SONAME from `libxml2.so.2` to `libxml2.so.16` without a backward-compatibility shim. The OCR plugin and worker binary both link against `libxml2.so.2`.
Amazon Linux 2023	Fedora-derived but pinned to GCC 11.3, with the same `GLIBCXX_3.4.29` ceiling as EL9.
SLES, openSUSE Leap 15.x	The system `libstdc++` comes from GCC 7 and tops out below `GLIBCXX_3.4.30`.
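The two library requirements behind this table can be probed on a host before installing. The following sketch is a hypothetical pre-check, not the installer’s own OS check: it scans the `libstdc++` file for the `GLIBCXX_3.4.30` version string (similar to `strings libstdc++.so.6 | grep GLIBCXX`) and asks the dynamic loader whether `libxml2.so.2` can be opened. The library paths are common defaults and an assumption, and a passing result doesn’t make a distribution supported.

```python
import ctypes
from pathlib import Path
from typing import Iterable, Optional

REQUIRED_GLIBCXX = b"GLIBCXX_3.4.30"

# Common libstdc++ locations (assumption, not exhaustive)
LIBSTDCXX_CANDIDATES = (
    "/usr/lib/x86_64-linux-gnu/libstdc++.so.6",  # Debian, Ubuntu
    "/usr/lib64/libstdc++.so.6",                 # EL, Fedora, SUSE, Amazon Linux
)


def find_first_existing(paths: Iterable[str]) -> Optional[Path]:
    for candidate in paths:
        path = Path(candidate)
        if path.is_file():  # follows the usual .so.6 symlink
            return path
    return None


def provides_version(library: Path, version: bytes = REQUIRED_GLIBCXX) -> bool:
    # Symbol-version names are NUL-terminated strings inside the library,
    # so including the NUL avoids matching a longer name by prefix.
    return version + b"\x00" in library.read_bytes()


def can_load(soname: str) -> bool:
    """True if the dynamic loader can open the library, e.g. libxml2.so.2."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def host_looks_compatible() -> bool:
    libstdcxx = find_first_existing(LIBSTDCXX_CANDIDATES)
    return (
        libstdcxx is not None
        and provides_version(libstdcxx)
        and can_load("libxml2.so.2")
    )
```

On an EL9 host, for example, `provides_version` would be expected to return `False` because the system `libstdc++` stops at `GLIBCXX_3.4.29`.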

Caution

Don’t use the `PDFTOOLS_SKIP_OS_CHECK=1` override on a distribution listed under Not supported. The install proceeds, but the first OCR job crashes with a cryptic native error instead of failing cleanly at install time.

For an unsupported distribution, deploy Pdftools OCR Service in Docker instead. The Docker image bundles its own runtime and isn’t affected by the host’s libstdc++ or libxml2 version.
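The three distribution tables can be condensed into a lookup over the `ID` and `VERSION_ID` fields of `/etc/os-release`. This is a hypothetical planning helper, not the installer’s OS check. The `ID` values (`rocky`, `almalinux`, `rhel`, `ol`, `ubuntu`, `debian`, `amzn`, `sles`, `opensuse-leap`) are the ones these distributions commonly report and are an assumption here; verify them on your host.

```python
from typing import Dict

# Assumed os-release IDs for the EL family ("ol" is Oracle Linux)
EL_IDS = {"rocky", "almalinux", "rhel", "ol"}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, dropping optional surrounding quotes."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def classify(os_release: Dict[str, str]) -> str:
    """Map a distribution to the support tiers listed on this page."""
    distro = os_release.get("ID", "").lower()
    major_str, _, minor_str = os_release.get("VERSION_ID", "").partition(".")
    try:
        major = int(major_str)
        minor = int(minor_str) if minor_str else 0
    except ValueError:
        return "unlisted"
    if distro in EL_IDS:
        if major == 10:
            return "supported" if distro == "rocky" else "likely_supported"
        if major == 9:
            return "not_supported"
    elif distro == "ubuntu":
        if (major, minor) == (24, 4):
            return "supported"
        if (major, minor) == (22, 4):
            return "likely_supported"
        if (major, minor) >= (25, 10):
            return "not_supported"
    elif distro == "debian":
        if major in (12, 13):
            return "likely_supported"
    elif distro == "amzn":
        if major == 2023:
            return "not_supported"
    elif distro in ("sles", "opensuse-leap"):
        if major == 15:
            return "not_supported"
    return "unlisted"  # not covered by the tables on this page
```

A call such as `classify(parse_os_release(Path("/etc/os-release").read_text()))` (with `pathlib.Path`) returns one tier; a `not_supported` result points to the Docker deployment, and `unlisted` means this page doesn’t cover the distribution.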

Licensing

Pdftools OCR Service requires a license key to operate. You can use one of two license key types:

Trial license key
Full license key

For information about configuring license keys, review Get started with OCR in Conversion Service. For offline licensing options, review Pdftools OCR Service licensing.

Release management

For information about Pdftools versioning, supported releases, and version availability, review Release management in the Pdftools support documentation.
