Glossary of all things PDF
AES - Advanced Encryption Standard
Symmetric encryption method published as standard by NIST.
Annotation
Associates an object (for example, a memo, a piece of music or a film) with a position on the page, or represents an opportunity to interact with the user with the help of mouse and keyboard.
Many PDF documents are designed in a way that does not allow the user to change them, but to interact nonetheless through the use of form fields and checkboxes.
Anti-aliasing
Distortion, or aliasing, may occur at the edges of an object depending on the image's resolution.
Anti-aliasing methods can be used to minimize this effect. The edges are smoothed out with adjusted color values via retroactive filtering.
Array object
A one-dimensional collection of sequential objects with implicit numbering starting at 0.
ASCII
The American Standard Code for Information Interchange, a widely used convention for the binary encoding of a specific set of 128 characters. Besides control characters, the ASCII character set contains the space character (or blank) and the following printable characters:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
ASN.1 - Abstract Syntax Notation #1
Description language for the syntax of digital messages. Suitable standards for the binary encoding of the messages are BER and DER, both defined in X.690.
BER - Basic Encoding Rules
Easy-to-handle rules for the binary encoding of digital messages.
Binary data
An ordered sequence of bytes. Images and fonts are examples of objects stored as binary data.
Boolean object
Either the keyword true or the keyword false.
Byte
A group of 8 binary digits (8 bits) that collectively can represent one of 256 different values. Groups of 8 binary digits are used in a multitude of today's electronic devices.
CA - Certification Authority
Accredited issuer of certificates.
CAdES - CMS Advanced Electronic Signatures
An ETSI standard for CMS-based digital signatures.
Catalog
The primary dictionary object that contains the direct or indirect references to all other objects in the document with the exception of the trailer, which the catalog does not reference.
Certificate
A certificate is an electronic certification of the identity of a natural or legal person. The certificate also contains a public key for which only the person possesses a corresponding private key. With this private key, the person can generate digital signatures. Any person can verify this signature with the help of the certificate.
Character
A byte whose value is usually interpreted as a symbol within a symbol set with 256 or fewer members. Character examples: 1, 2, a, b, A, &, etc.
Character set
A defined set of symbols, whereby a unique byte value is assigned to each character. Character set examples:
ASCII
Unicode
CMS - Cryptographic Message Syntax
Message format for digital signatures based on PKCS#7 using the ASN.1 syntax.
Conforming product
Software application that is both a conforming reader and a conforming writer.
Conforming reader
Software application that can read and process a PDF file that conforms to a specification (e.g. [ISO 32000] or [ISO 19005-1]), and that is compliant with the requirements of a conforming reader.
Conforming writer
Software application that can write PDF files that conform to a specification such as [ISO 32000] or [ISO 19005-1].
Content stream
A datastream object whose data consists of a sequence of instructions that describe the graphic elements of a page.
Corrupt PDF file
A PDF document that is not correct and may therefore be unreadable. Possible causes include:
The document was not generated correctly
The document was damaged after its creation (e.g. incomplete copying process)
CRL - Certificate Revocation List
List of revoked certificates published by the issuer.
Cross-reference table
Data structure containing the starting byte offset of each of the file's indirect objects.
DER - Distinguished Encoding Rules
Rules for the binary and unique encoding of digital messages based on BER.
Dictionary object
An associative table of object pairs; the first object of each pair is a name object that functions as the key, the second object is the value and can be any type of object, including another dictionary.
Direct object
Any object that has not been made an indirect object.
DSA - Digital Signature Algorithm
Algorithm for creating and verifying digital signatures, published as a standard by NIST.
DSS (Cryptography) - Digital Signature Standard
Standard for digital signatures published by NIST.
DSS (PDF) - Document Security Store
Structure in a PDF document to embed signature validation information such as CRLs, OCSP responses, and certificates.
eIDAS - Electronic Identification, Authentication and trust Services
An EU regulation that sets standards for electronic identification and trust services for electronic transactions.
Electronic document
An electronic representation of a page-oriented compilation of text, images, and graphic data, as well as metadata that helps to identify, understand, and display the data. Electronic documents can be reproduced on paper or displayed on screen without any significant loss of information.
Encryption
Data are encrypted so that outsiders cannot deduce their meaning. For the communication between sender and recipient, the recipient generates a key pair consisting of a private and a public key. If the sender now encrypts the data with the public key, only the recipient can decrypt the data because the recipient remains the sole owner of the private key. For the encryption, algorithms like RSA with key lengths of currently 2048 bits are used. The usual procedures for digital signatures are based on this technology.
End-of-line marker (EOL marker)
A sequence of one or two characters marking the end of a line and consisting of:
a CARRIAGE RETURN character (U+000D)
or a LINE FEED character (U+000A)
or a CARRIAGE RETURN followed directly by a LINE FEED.
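The three marker forms can be recognized with a single pattern. The following is a minimal Python sketch; the names EOL and split_lines are illustrative and not taken from any PDF library:

```python
import re

# An EOL marker is CR LF, a lone CR, or a lone LF. The two-character
# form is listed first so that CR LF counts as a single marker.
EOL = re.compile(rb"\r\n|\r|\n")


def split_lines(data: bytes) -> list[bytes]:
    """Split raw bytes at every end-of-line marker."""
    return EOL.split(data)


print(split_lines(b"%PDF-1.4\r\n1 0 obj\rendobj"))
```

Working on bytes rather than text avoids any implicit newline translation by the runtime.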
ETSI - European Telecommunications Standards Institute
European organization for the standardization of digital signatures.
Filter
An optional component of a datastream specification that defines how datastream data should be decoded before it is used. Filter examples: Flate, DCT.
Font
Identifies collections of graphics that can be glyphs or other graphic elements [ISO 15930-4].
A font file defines how glyphs are displayed. If a font file is contained in a PDF file, then the associated font is embedded in the file.
If the font does not contain a complete character set but, for example, only the glyphs of the characters used in the document, the term used is subsetted font.
Function
A special type of object representing a parameterized class, including mathematical formulae and sampled representations of arbitrary resolution.
Gaussian Filter
A filter that can minimize image noise by smoothing or applying a soft-focus effect during the image editing process.
Glyph
Recognizable abstract graphic symbol, independent of any specific design [ISO/IEC 9541-1]. For example, the character “A” can be represented by different glyphs, such as an upright, a bold, or an italic A.
Graphic state
The uppermost element of a stack whose entries hold the parameters that control graphic representation. The graphic state contains information such as color, font, font size, current transformation matrix, etc.
Hash
A hash value (hash for short) is a number that is calculated from any quantity of data such as documents, certificates, messages, etc. This number is often much shorter than the original data (a few bytes). The hash value has the characteristic that it is the same for the same data and almost certainly different for different data. In addition, the original data cannot be determined from the hash value. For the calculation, hash algorithms such as SHA-1 or SHA-2 are used.
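These properties can be observed with Python's standard hashlib module. The sketch below uses SHA-256, a member of the SHA-2 family; the helper name sha256_hex is illustrative:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


first = sha256_hex(b"PDF document")
same = sha256_hex(b"PDF document")
changed = sha256_hex(b"PDF document.")

print(len(first) // 2)   # hash length in bytes, independent of input size
print(first == same)     # identical data gives the identical hash
print(first == changed)  # one added character gives a different hash
```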
Hinting
Hinting is a method that improves the display quality of fonts by optimizing the outlines when displaying the characters.
HSM - Hardware Security Module
Device for securely storing private keys and for the efficient and secure execution of encryption, decryption, and digital signature creation.
ICC Profile
Color profile compliant with the ICC specification [ISO 15076-1:2005].
Indirect object
An object designated by a positive integer object number followed by a non-negative integer generation number, followed by the keyword obj and ending with the keyword endobj.
Integer object
Mathematical integer within an implementation-specific interval centered at 0. The number is written as one or more decimal digits, optionally preceded by a sign.
Interpolation
A method that controls the combination of pixel density and color depth in raster images during editing. Bilinear interpolation is an extension of linear interpolation for scaling and displaying textures in rendered images.
ISO - International Organization for Standardization
International organization responsible for the standardization of PDF and PDF/A, among others. Switzerland is represented in the ISO by the Swiss Association for Standardization (SNV).
ISO 19005
See PDF/A.
ISO 32000
See PDF.
ISO/IEC 18014
ISO Standard for time stamping services.
ITU-T - ITU Telecommunication Standardization Sector
Coordinates standards for telecommunications and is one of three sectors of the ITU (International Telecommunication Union).
LTV - Long-Term Validation
Enhancement of digital signatures with additional data so that long-term verifiability is possible without online services. The additional data consist of the trust chain of the certificates from the owner certificate up to the root certificate of the issuer and also information that certifies the validity of the certificates at the time of signature.
MDP - Modification Detection and Prevention Signature
Enables detection of changes that the author has disallowed. A document can contain only one MDP signature, which must be the first in the document. Other types of signatures may be present.
Multiple Master Fonts
Variant of the PostScript Type 1 format, which allows for all conceivable display variations of a specific font. Other elements such as line thickness and proportions can be adjusted alongside the common specifications.
Name object
An atomic symbol uniquely defined by a sequence of characters beginning with a forward slash (/, U+002F), whereby the forward slash is not part of the name.
Name tree
Similar to a dictionary that associates keys and values, whereby the keys in a name tree are strings and are ordered.
NIST - National Institute of Standards and Technology
United States federal agency responsible for standardization processes.
Null object
A singular object of type null, designated by the keyword null, whose type and value differ from those of every other object.
Number tree
Similar to a dictionary that associates keys and values, whereby the keys in a number tree are integers and are ordered.
Numeric object
Either an integer object or a real object.
OASIS/DSS - Organization for the Advancement of Structured Information Standards /Digital Signing Services
A standard of the OASIS organization for signing services based on the XML syntax.
Object
A basic data structure used to construct PDF files. An object can be of the following types: array, Boolean, dictionary, integer, name, null, real, datastream or string.
Object reference
An object value that allows one object to reference another. It has the form “<n> <m> R”, where <n> is the object number of an indirect object, <m> is its generation number and R is the uppercase letter R.
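As an illustration, a reference token can be split into its two numbers with a regular expression. This is a minimal Python sketch; ObjectRef and parse_ref are hypothetical names, and a real parser would work on a token stream rather than an isolated byte string:

```python
import re
from typing import NamedTuple


class ObjectRef(NamedTuple):
    """The two numbers carried by an object reference."""
    number: int
    generation: int


# Two unsigned integers and the letter R, separated by white space.
REF = re.compile(rb"(\d+)\s+(\d+)\s+R")


def parse_ref(token: bytes) -> ObjectRef:
    """Parse a token such as b"12 0 R" into an ObjectRef."""
    match = REF.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not an object reference: {token!r}")
    return ObjectRef(int(match.group(1)), int(match.group(2)))


print(parse_ref(b"12 0 R"))
```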
Object stream
A datastream containing a sequence of PDF objects.
OCR
Optical character recognition (optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into text, whether from a scanned document or a photo of a document.
OCSP - Online Certificate Status Protocol
Protocol for the online query of the validity status of a specific certificate based on the ASN.1 syntax.
PAdES - PDF Advanced Electronic Signature Profiles
An ETSI Standard for the structure of CMS signatures and their embedding in PDF documents.
PDF - Portable Document Format
A file format standardized by ISO (ISO 32000) for document exchange. For frequent PDF applications, there are special sub-standards such as PDF/A (ISO 19005) for archiving digital documents.
PDF/A
Portable Document Format file format for archiving, defined in [ISO 19005]. Describes the requirements that PDF documents must fulfill to comply with the standards PDF/A-1a and PDF/A-1b. The basic requirements of PDF/A-1b are:
Conformity with PDF Version 1.4
Embedding of all fonts used for visible text
Embedding of color profiles if specified by the color space used
No encryption
No transparency
The following applies additionally to PDF/A-1a:
Encoding text as Unicode
Structural information must exist (tagging)
PIN - Personal Identification Number
Secret code needed for the access to a token.
PKCS - Public Key Cryptography Standards
A series of proprietary standards of RSA Security Incorporated. The most common standards are: RSA encryption and signatures (PKCS#1), message format for signatures (PKCS#7), interface to token (PKCS#11), and file format for keys and certificates (PKCS#12).
PKI - Public Key Infrastructure
System that creates, stores, and verifies a pair of a private and a public key.
Real object
Approximates mathematical real numbers, but with limited range and precision. It is written as one or more digits with an optional sign and an optional decimal point.
Rectangle
A specific array object that defines the position and bounding boxes on a page for various objects. It is represented as an array of 4 numbers designating the coordinate pairs of two diagonally opposed corners, usually in the form [bottom left X, Y, top right X, Y].
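Because the two corners are only "usually" given as bottom left and top right, code that reads a rectangle typically normalizes it first. A minimal Python sketch with the hypothetical helpers normalize_rect and rect_size:

```python
def normalize_rect(rect):
    """Return (left, bottom, right, top) whichever diagonal corners were given."""
    x1, y1, x2, y2 = rect
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def rect_size(rect):
    """Return (width, height) of the rectangle."""
    left, bottom, right, top = normalize_rect(rect)
    return (right - left, top - bottom)


# The same box, written once from the bottom left and once from the top right.
print(rect_size([0, 0, 595, 842]))
print(rect_size([595, 842, 0, 0]))
```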
Resource dictionary
Associates the resource names used in content datastreams with the resource objects themselves and organizes them into various categories (e.g. font, color space, pattern).
Signature, signing
Data with which the integrity and, optionally, the authenticity of a document can be ensured. The signature is essentially made as follows: the hash value is formed from the data to be signed and encrypted with the private key. The signature is packed into a CMS message together with certificates and other information.
Space character, white-space character
Text character used to represent an orthographic white space. Includes the following characters:
HORIZONTAL TABULATION (U+0009)
LINE FEED (U+000A)
VERTICAL TABULATION (U+000B)
FORM FEED (U+000C)
CARRIAGE RETURN (U+000D)
SPACE (U+0020)
NO-BREAK SPACE (U+00A0)
EN SPACE (U+2002)
EM SPACE (U+2003)
FIGURE SPACE (U+2007)
PUNCTUATION SPACE (U+2008)
THIN SPACE (U+2009)
HAIR SPACE (U+200A)
ZERO WIDTH SPACE (U+200B)
IDEOGRAPHIC SPACE (U+3000)
Stream object
Consists of a dictionary followed by zero or more bytes enclosed by the keywords stream and endstream.
String object
Consists of a series of bytes (unsigned integer values ranging from 0 to 255). The bytes are not integer objects but are stored in a more compact form.
TLS - Transport Layer Security
Further development of Secure Sockets Layer (SSL), a hybrid encryption protocol for secure data transmission on the internet.
Token
A “container” (part of the HSM, USB stick, smartcard, etc.) that contains private keys and protects against unauthorized access. For practical reasons, the token often also contains corresponding certificates and public keys, which do not need to be protected.
Transparency
In a PDF, graphic objects are applied onto a page in sequence, where each object is composited with the already present background. Initially, this background is only the empty page and in later steps it consists of all the composited objects added so far. In addition to the objects, a page defines a mode of compositing for each object. Depending on this mode, the underlying background either blends transparently with the new object, or it is covered opaquely. In general, the presence or absence of transparency on a PDF page cannot easily be detected by hand. However, transparency is not allowed in PDF/A-1, so converting a PDF with transparency to PDF/A-1 can cause visual differences. The standards PDF/A-2, A-3 and A-4, on the other hand, allow for transparency.
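The difference between blending and opaque covering can be sketched for a single color component. The Python function below is a deliberate simplification, not the full PDF transparency model: it assumes an opaque background, one constant alpha value per object, and plain "normal" blending. The name composite is illustrative:

```python
def composite(background: float, source: float, alpha: float) -> float:
    """Blend one color component of a new object over the background.

    alpha = 1.0 covers the background opaquely,
    alpha = 0.0 leaves the background unchanged.
    """
    return alpha * source + (1.0 - alpha) * background


page = 0.0                          # start with an empty background value
page = composite(page, 1.0, 1.0)    # opaque object replaces the background
page = composite(page, 0.0, 0.5)    # half-transparent object blends with it
print(page)
```

Each call uses the result of the previous one as its background, mirroring the sequential application of objects described above.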
TSA - Time Stamp Authority
Accredited provider of time stamp services.
TSP - Time Stamp Protocol
Protocol for the online retrieval of cryptographic time stamps based on the ASN.1 syntax.
Verification, verifying
Validity check of a digital signature. A signature is verified as follows: the signature is decrypted with the public key. The hash value contained in the signature message is compared with the hash value calculated from the signed data. If the hashes match then the signature is valid.
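The sign-then-verify round trip can be illustrated with textbook RSA in Python. This is a toy sketch only: the primes are tiny, no padding scheme is applied, the hash is reduced to fit the small modulus, and no CMS packaging or certificates are involved, so it is insecure by construction. All names (P, Q, N, E, D, digest, sign, verify) are illustrative:

```python
import hashlib

# Toy RSA key built from tiny primes: INSECURE, for illustration only.
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(data: bytes) -> int:
    """Hash the data and reduce the hash so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Transform the hash value with the private key."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recover the hash with the public key and compare it with a fresh hash."""
    return pow(signature, E, N) == digest(data)


signature = sign(b"contract text")
print(verify(b"contract text", signature))
```

With a real key size and an unreduced hash, verifying the same signature against modified data would fail, which is what makes changes to a signed document detectable.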
Version
Designates the PDF reference used to generate the document. The processing PDF software must support this version to guarantee correct processing. PDF versions range from 1.0 to 1.7, followed by 2.0 (ISO 32000-2). PDF 1.4 corresponds to Acrobat 5, PDF 1.7 corresponds to Acrobat 8; Acrobat 9 uses PDF 1.7 with Adobe Extension Level 3.
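A PDF file announces its version in a header comment of the form %PDF-1.4 at the start of the file. A minimal Python sketch for reading it; the function name pdf_version is hypothetical, and the sketch only looks at the header, not at any other place where a version may be declared:

```python
import re

# The header comment at the very start of the file, e.g. b"%PDF-1.4".
HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_version(data: bytes):
    """Return (major, minor) from the file header, or None if it is missing."""
    match = HEADER.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(pdf_version(b"%PDF-1.4\n1 0 obj\n"))
```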
Web capture
Designates the process of generating PDF content by importing and possibly converting files from the Internet or local files. The files can be imported in any format such as HTML, GIF, JPEG, text, and PDF.
WebAssembly
WebAssembly (often abbreviated to "Wasm") is a portable data format for binary code that can be executed in a suitable runtime environment, for example in a web browser. Unlike JavaScript, the code is in highly optimized binary form that is close to the hardware, which provides a significant performance advantage. The W3C (World Wide Web Consortium) launched the standard in 2017 with the goal of abstracting, optimizing, and more broadly supporting its predecessor technology, asm.js. Since WebAssembly is a compilation target, different programming languages can be used.
X.509
ITU-T Standard for a public key infrastructure to create digital certificates based on the ASN.1 syntax.
X.690
ITU-T Standard for encoding digital messages based on the ASN.1 syntax: Basic Encoding Rules (BER), Canonical Encoding Rules (CER), and Distinguished Encoding Rules (DER).
XAdES - XML Advanced Electronic Signatures
An ETSI Standard for the creation of signatures and their embedding in XML data.
XML - Extensible Markup Language
Format for the exchange of hierarchically structured data in text form between machines.
XMP packet
Structured wrapper for serialized XML metadata that can be embedded in various file formats.