Optimization profiles
An optimization profile contains predetermined configuration settings that help you optimize a PDF document for a specific use. The Pdftools SDK offers five optimization profiles:
- Web: Compress the file without affecting viewing quality on digital devices
  - Removes redundant and unnecessary data for electronic document exchange
  - Downsamples, clips, and intelligently compresses images
  - Merges and subsets fonts
  - Converts colors to RGB
- Archive: Prepare a document for archiving in PDF/A format
  - Removes redundant and unnecessary data for archiving
  - Intelligently compresses images
  - Merges and subsets fonts
- Print: Compress the file without affecting print quality
  - Removes redundant and unnecessary data for printing
  - Downsamples, clips, and intelligently compresses images
  - Merges and subsets fonts
  - Converts colors to CMYK
- Minimal file size: Remove redundant data and reduce image resolution to achieve a minimal viable file size
  - Downsamples images
  - Converts Type1 fonts to Type1C
  - Removes metadata and output intents
- Mixed Raster Content (MRC): Apply MRC optimization, intended for compressing scanned PDF files that contain large amounts of text. MRC optimization can include:
  - Splitting documents into foreground, background, and mask layers
  - Heavily downsampling and compressing foreground and background layers
  - Lossless compression of mask layers
  - Merging and subsetting fonts
Web optimization profile
Documents published on the web should be kept small in file size. As a consequence, they take less storage on the web server and can be transferred more quickly, resulting in shorter download times.
To reduce the file size as much as possible, any information that is not required to display the document without visible loss can be removed.
This may include:
- Downsampling images
- Clipping images to their visible parts
- Applying compression algorithms with high compression ratios
- Collapsing redundant objects
- Removing unused resources
- Removing irrelevant information such as article threads, metadata, alternate images, document structure information, etc.
- Merging and subsetting embedded font programs
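Two items in the list above, collapsing redundant objects and removing unused resources, can be illustrated with a toy model. The sketch below is not the SDK's implementation and does not use the PDF object format: the object table, `collapse_redundant`, and `remove_unused` are hypothetical names chosen for illustration. It treats objects with identical content as redundant and treats anything unreachable from the root as unused.

```python
# Toy object table: object id -> (content, ids of referenced objects).
# This is an illustrative model, not the real PDF object structure.
objects = {
    1: ("catalog", [2]),
    2: ("page", [3, 4]),
    3: ("image-bytes-A", []),
    4: ("image-bytes-A", []),  # identical copy of object 3
    5: ("unused-font", []),    # never referenced from the root
}


def collapse_redundant(objs):
    """Redirect every reference to the first object with identical content."""
    first_seen = {}  # (content, references) -> canonical object id
    canonical = {}   # object id -> canonical object id
    for obj_id in sorted(objs):
        content, refs = objs[obj_id]
        canonical[obj_id] = first_seen.setdefault((content, tuple(refs)), obj_id)
    return {
        obj_id: (content, [canonical[ref] for ref in refs])
        for obj_id, (content, refs) in objs.items()
    }


def remove_unused(objs, root):
    """Keep only the objects that can be reached from the root object."""
    reachable, stack = set(), [root]
    while stack:
        obj_id = stack.pop()
        if obj_id not in reachable:
            reachable.add(obj_id)
            stack.extend(objs[obj_id][1])
    return {obj_id: obj for obj_id, obj in objs.items() if obj_id in reachable}


optimized = remove_unused(collapse_redundant(objects), root=1)
print(sorted(optimized))
```

In this model, the duplicate image becomes unreachable once the page points at the first copy, so the second pass drops it together with the unused font.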
With the Web optimization profile, images above 210 DPI are downsampled to 150 DPI and recompressed, which leads to smaller output files.
When an image is recompressed, the Balanced strategy is used; however, this can be overridden through ImageRecompressionOptions.
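The downsampling rule for the Web profile can be written as a small function. This is a sketch of the rule as stated in the text, not SDK code: `downsampled_dpi` and `pixel_fraction` are hypothetical helpers, and treating an image at exactly 210 DPI as unchanged is an assumption based on the wording "above 210 DPI".

```python
def downsampled_dpi(image_dpi, threshold_dpi=210.0, target_dpi=150.0):
    """Return the resolution after a threshold-based downsampling rule.

    Defaults are the Web profile values quoted in the text. Images at or
    below the threshold are left unchanged (boundary handling is assumed).
    """
    return target_dpi if image_dpi > threshold_dpi else image_dpi


def pixel_fraction(image_dpi, new_dpi):
    """Fraction of the original pixel count that remains after resampling."""
    return (new_dpi / image_dpi) ** 2


for dpi in (96, 200, 300, 600):
    new_dpi = downsampled_dpi(dpi)
    print(dpi, new_dpi, round(pixel_fraction(dpi, new_dpi), 4))
```

Because the pixel count scales with the square of the resolution, halving the resolution from 300 DPI to 150 DPI leaves a quarter of the pixels before any recompression is applied. Under this reading, an image between 150 and 210 DPI keeps its resolution.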
Depending on the PDF documents to be optimized, font programs of embedded standard fonts can even be removed.
Additionally, PDF documents can be linearized. Linearization prepares a PDF file so that a PDF viewer web browser plugin can access pages randomly, i.e. selected pages can be displayed before the whole file is downloaded. For this to work, the plugin has to interpret linearized PDF correctly.
Documents that are intended to be viewed on a screen should be saved in the RGB (red, green, blue) color space. RGB is the native form for any light-emitting device, such as a computer monitor or television. An RGB image uses three channels and therefore takes up less space than a CMYK (cyan, magenta, yellow, black) image, which uses four channels. With the Web optimization profile, all colors are converted to RGB.
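The size difference between three and four channels is simple arithmetic. The sketch below computes the uncompressed size of the same raster in both color spaces; `raw_image_bytes` is a hypothetical helper, and the page dimensions and 8 bits per channel are assumed example values. Compressed sizes in a real PDF depend on the image content and the codec, so this only shows the upper bound of the effect.

```python
def raw_image_bytes(width_px, height_px, channels, bits_per_channel=8):
    """Uncompressed size of a raster image in bytes."""
    return width_px * height_px * channels * bits_per_channel // 8


# Roughly an A4 page scanned at 150 DPI (assumed example dimensions).
width, height = 1240, 1754

rgb_bytes = raw_image_bytes(width, height, channels=3)
cmyk_bytes = raw_image_bytes(width, height, channels=4)

print(rgb_bytes, cmyk_bytes, rgb_bytes / cmyk_bytes)
```

Before compression, the RGB version needs three quarters of the bytes of the CMYK version, whatever the image dimensions are.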
With the Web optimization profile, the output PDF version is updated to PDF 1.7 or higher and PDF/A conformance is removed.
Print optimization profile
For printing applications, the file size is not the highest priority. It is more important to have a document that prints predictably. This means that correct fonts should be used, colors should look as expected, and images should be high in resolution.
For that reason, data from the original document that is needed for a well-defined reproduction should not be removed or altered. With some exceptions, embedded fonts should not be removed and images should not be downsampled.
For many printing applications, it may be beneficial to convert images to the CMYK color space, because this color space is primarily used in systems that reflect light (such as printed paper). With the Print optimization profile, all colors are converted to CMYK for optimal output on printing devices.
In certain documents, the same font is embedded multiple times. For example, if a PDF-producing software embeds the same font for each created page, then large multi-page documents may contain many copies of a font program. Also, a document can contain a complete font program of which only very few glyphs are used for display. In such situations, merging and sub-setting font programs can lead to faster printing. Embedded Type1 (PostScript) fonts are converted to Type1C (Compact Font Format), which further reduces the file size.
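The effect of merging and subsetting can be estimated with a toy model. In the sketch below, every page embeds its own full copy of one font but uses only a few glyphs. The font name, the glyph count of the full font, and the `merge_and_subset` helper are all hypothetical, and each character is treated as one glyph, which is a simplification of how real fonts map text to glyphs.

```python
# Hypothetical input: each page carries a full copy of the same font.
FULL_FONT_GLYPHS = 600  # glyphs in the complete font program (assumed)
pages = [
    {"font": "ExampleSans", "text": "Invoice 2024"},
    {"font": "ExampleSans", "text": "Total: 120"},
    {"font": "ExampleSans", "text": "Invoice 2025"},
]


def merge_and_subset(pages):
    """Return one glyph set per font that covers all characters on all pages."""
    used = {}
    for page in pages:
        # Merging: all pages that name the same font share one glyph set.
        # Subsetting: only characters that actually occur are kept.
        used.setdefault(page["font"], set()).update(page["text"])
    return used


subsets = merge_and_subset(pages)
glyphs_before = len(pages) * FULL_FONT_GLYPHS
glyphs_after = sum(len(glyphs) for glyphs in subsets.values())
print(glyphs_before, glyphs_after)
```

The three full copies collapse into a single font program that holds only the glyphs the document displays, which is why the saving grows with the page count.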
There are still further ways to decrease the file size:
- Clipping images to their visible parts
- Compressing uncompressed images, e.g. with a lossless compression type
- Collapsing redundant objects
- Removing unused resources
- Removing irrelevant information for printing such as thumbnails, article threads, document structure information, etc.
With this profile, spider (web capture) information is removed.
The resolution of images is not modified.
When an image is recompressed, the PreserveQuality strategy is used; this can be overridden through the property ImageRecompressionOptions.
With the Print profile, the output PDF version is updated to PDF 1.7 or higher and PDF/A conformance is removed.
Archive optimization profile
For archiving, the priority is to preserve PDF/A conformance, maintaining document fidelity and reproducibility over time. Only minimal document modification is performed.
This may include:
- Removing alternative images
- Removing thumbnails
- Collapsing redundant objects
- Removing unused resources
The resolution and color space of images must remain untouched.
When an image is recompressed, the PreserveQuality strategy is used to ensure the image keeps as high a quality as possible compared to the original; however, it can be overridden through the property ImageRecompressionOptions.
All content objects such as annotations, form fields, and links are copied with the Archive profile. Article threads, metadata, piece-info dictionaries, and the structure tree are not removed. Signature appearances are flattened.
For PDF/A conforming input files, the PDF/A conformance is preserved if possible. For other files, the PDF version is updated to PDF 1.7 or higher.
Minimal file size optimization profile
In most cases, the focus of PDF optimization is to decrease the file size.
This can be achieved by:
- Compressing images with an appropriate compression algorithm for lower image quality
- Collapsing redundant objects
- Removing unused resources
- Removing irrelevant information for printing such as thumbnails, article threads, document structure information, piece-info dictionaries, etc.
- Removing output intents
With the MinimalFileSize optimization profile, the output file size is further reduced by converting embedded Type1 (PostScript) fonts to Type1C (Compact Font Format).
Metadata and spider (web capture) information are removed.
Output intents provide a means for matching the color characteristics of PDF page content with those of a target output device or production environment in which the document is printed. Output intents are removed with the minimal file size optimization profile.
Images above 182 DPI are downsampled and recompressed to 130 DPI.
This leads to smaller output files.
When an image is recompressed, the Balanced strategy is used; this can be overridden through the property ImageRecompressionOptions.
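The Web, Print, Archive, and Minimal file size profiles differ mainly in image resolution handling, color conversion, and default recompression strategy. The sketch below collects the values stated in the sections above into one lookup table for comparison. `ProfileSummary` and `PROFILES` are hypothetical names for illustration and are not SDK types; a `None` entry means the text describes no such change for that profile.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileSummary:
    downsample_above_dpi: Optional[int]  # None: resolution is not modified
    downsample_to_dpi: Optional[int]
    color_conversion: Optional[str]      # None: no conversion described
    recompression_strategy: str          # default; overridable per the text


# Values transcribed from the profile descriptions in this document.
PROFILES = {
    "Web": ProfileSummary(210, 150, "RGB", "Balanced"),
    "Print": ProfileSummary(None, None, "CMYK", "PreserveQuality"),
    "Archive": ProfileSummary(None, None, None, "PreserveQuality"),
    "MinimalFileSize": ProfileSummary(182, 130, None, "Balanced"),
}

# Profiles that keep the original image resolution.
keeps_resolution = [
    name for name, p in PROFILES.items() if p.downsample_above_dpi is None
]

for name, summary in PROFILES.items():
    print(f"{name:16} {summary}")
print(keeps_resolution)
```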
Mixed raster content optimization profile
PDF files with scanned content can have a very large file size due to the high resolution of the images stored in scanned PDF files. Minimizing the image size while maintaining text readability is important for workflows involving scanned text documents.
The Mixed Raster Content (MRC) algorithm minimizes the image size while maintaining text readability. MRC divides scanned documents into foreground, background, and mask layers, storing the textual information in the mask layer using a lossless compression type. It heavily downsamples and compresses the foreground and background layers. The file size is further reduced by removing redundant objects, optimizing resources, and merging and subsetting embedded fonts.
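The layer split can be illustrated on a tiny grayscale raster. The sketch below is a deliberately crude model and not the SDK's algorithm: the fixed ink threshold, the `split_layers` and `reconstruct` helpers, and the reduction of each color layer to a single average value (an extreme form of downsampling) are all assumptions made for illustration. The segmentation method and codecs used by the SDK are not described here.

```python
# A tiny grayscale "scan": low values are ink, high values are paper.
scan = [
    [250, 250, 10, 12, 250, 250],
    [250, 15, 250, 250, 11, 250],
    [250, 14, 13, 12, 10, 250],
    [250, 16, 250, 250, 12, 250],
]
INK_THRESHOLD = 128  # assumed cut-off between ink and paper


def split_layers(pixels, threshold=INK_THRESHOLD):
    """Split a raster into a 1-bit mask plus foreground and background samples."""
    mask = [[1 if p < threshold else 0 for p in row] for row in pixels]
    foreground = [p for row in pixels for p in row if p < threshold]
    background = [p for row in pixels for p in row if p >= threshold]
    return mask, foreground, background


def average(values):
    """Reduce a color layer to one value (stands in for heavy downsampling)."""
    return sum(values) // len(values)


def reconstruct(mask, fg_color, bg_color):
    """Rebuild the page: the mask selects foreground or background per pixel."""
    return [[fg_color if bit else bg_color for bit in row] for row in mask]


mask, foreground, background = split_layers(scan)
restored = reconstruct(mask, average(foreground), average(background))

for row in mask:
    print("".join("#" if bit else "." for bit in row))
```

The mask is kept at full resolution and without loss, so the letter shape survives exactly, while the color layers lose their fine variation. This is the trade-off that keeps text readable at a small file size.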
With the Minimal file size profile, the output PDF version is updated to PDF 1.7 or higher and PDF/A conformance is removed.