Extract
The Toolbox add-on lets you extract information such as text, images, and signatures from a PDF document. You can also extract document attributes like the conformance level, whether the document is encrypted or protected, and metadata like author, title, and creation date.
This functionality is part of the Toolbox add-on, a separate SDK that you can use with the same license key as the Pdftools SDK. To use and integrate this add-on, review Getting started with the Toolbox add-on and Toolbox add-on code samples.
Extract text
Learn how to extract text content from a PDF document using the Extract all text from PDF sample project. This project also illustrates the use of heuristics to assemble text content into words and sentences based on their position on the page.
Download the full sample now in C# and Java.
Interested in C or other language samples? Let us know on the contact page, and we'll add them to our sample backlog.
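The position-based heuristics mentioned above can be sketched as follows. This is a minimal, self-contained illustration and not the sample project's code or the Toolbox add-on API: the `Fragment` class, the `assembleLines` method, and both threshold parameters are hypothetical names introduced only for this example. It assumes each fragment carries a left edge, a baseline, and a width in points, with y growing upward as in PDF user space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: none of these types belong to the Toolbox add-on API.
final class TextAssemblySketch {

    /** A positioned piece of text: left edge x, baseline y, and width, all in points. */
    static final class Fragment {
        final String text;
        final double x;
        final double y;
        final double width;

        Fragment(String text, double x, double y, double width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Groups fragments into lines by baseline, then joins each line left to right,
     * inserting a space whenever the horizontal gap between neighbors exceeds wordGap.
     */
    static List<String> assembleLines(List<Fragment> fragments, double lineTolerance, double wordGap) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // y grows upward, so a larger y comes earlier in reading order.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y).reversed());

        // Start a new line when the baseline differs too much from the line's first fragment.
        List<List<Fragment>> lines = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current == null || Math.abs(current.get(0).y - f.y) > lineTolerance) {
                current = new ArrayList<>();
                lines.add(current);
            }
            current.add(f);
        }

        List<String> result = new ArrayList<>();
        for (List<Fragment> line : lines) {
            // Within a line, reading order is left to right.
            line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            StringBuilder sb = new StringBuilder();
            Fragment previous = null;
            for (Fragment f : line) {
                // A gap wider than wordGap is treated as a word boundary.
                if (previous != null && f.x - (previous.x + previous.width) > wordGap) {
                    sb.append(' ');
                }
                sb.append(f.text);
                previous = f;
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = Arrays.asList(
                new Fragment("Hel", 72, 700.0, 15),
                new Fragment("lo", 87, 700.2, 9),
                new Fragment("world", 100, 700.0, 26),
                new Fragment("Next", 72, 686.0, 20),
                new Fragment("line", 95, 686.0, 18));
        for (String line : assembleLines(fragments, 2.0, 1.5)) {
            System.out.println(line);
        }
    }
}
```

The two thresholds are the part that usually needs tuning. A fixed gap in points works for a single font size, while scaling the gap by font size is a common refinement for mixed layouts. Rotated or multi-column text needs additional handling that this sketch leaves out.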
Extract images
Learn how to extract images from a PDF document using the Extract all images and image masks from a PDF sample project. The image extraction functionality takes an image embedded as a content element in a PDF file and writes it out as an image file.
Supported image formats:
- BMP
- JPEG
- JPEG2000
- JBIG2
- PNG
- GIF
- TIFF
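After extraction, it can be useful to confirm which of the formats above a given output file actually is. The sketch below does this by checking the leading signature bytes of the file. It is independent of the Toolbox add-on: the class and method names are hypothetical, and it assumes the extracted files carry their standard file headers, which this page does not state.

```java
// Hypothetical helper, not part of the Toolbox add-on API.
final class ImageFormatSniffer {

    /** Returns the format name matching the leading signature bytes, or "unknown". */
    static String detect(byte[] data) {
        if (startsWith(data, 0x42, 0x4D)) {
            return "BMP"; // "BM"
        }
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return "JPEG";
        }
        if (startsWith(data, 0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A)) {
            return "JPEG2000"; // JP2 file signature box
        }
        if (startsWith(data, 0xFF, 0x4F, 0xFF, 0x51)) {
            return "JPEG2000"; // raw codestream without the JP2 wrapper
        }
        if (startsWith(data, 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "JBIG2";
        }
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "PNG";
        }
        if (startsWith(data, 'G', 'I', 'F', '8')) {
            return "GIF"; // covers both GIF87a and GIF89a
        }
        if (startsWith(data, 0x49, 0x49, 0x2A, 0x00) || startsWith(data, 0x4D, 0x4D, 0x00, 0x2A)) {
            return "TIFF"; // little-endian "II" or big-endian "MM"
        }
        return "unknown";
    }

    /** Compares the first bytes of data, read as unsigned values, against prefix. */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws java.io.IOException {
        // Usage: java ImageFormatSniffer.java <extracted-image-file>
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println(detect(data));
    }
}
```

Checking signature bytes rather than the file extension catches cases where a file was renamed or written with a generic extension. JBIG2 data in particular is often stored without a file header, so an "unknown" result for such a file does not necessarily mean the extraction failed.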
Extract signatures
Learn how to extract signature content from a PDF document using the List Signatures in PDF sample project. You can automatically extract signature details such as name, date, and contact information.
For a guide to comprehensive validation of digital signatures, review the Validate signatures in a signed PDF document page.
Extract document attributes and metadata
Learn how to extract document attributes and metadata from a PDF document using the List document information of PDF sample project.