All about PDF/A for long term archiving

PDF/A is the ISO standard for archiving electronic documents. The PDF format is used worldwide for a variety of purposes in the industrial, public and private sectors. The PDF/A standard is designed to ensure that documents can be preserved and reproduced over extended periods.

What is PDF/A?

PDF/A is a format that meets the requirements for long-term archiving. It combines the benefits of the PDF format with other specific requirements for long-term archiving.

ISO Standard 19005 defines a file format based on PDF called PDF/A. The format offers the principle of a self-contained document. This means that the visual appearance of a document remains preserved for an extended period, independent of tools and systems for producing, saving and reproducing it.

This standard specifies neither the methods nor the intention or purpose of preservation. Its aim is to guarantee that electronic documents can still be viewed in their original appearance in the future.

For this reason, the document may not refer, either indirectly or directly, to an external source, for example an external image or a font that is not embedded in the document itself.

The PDF/A standard is a set of rules that defines which criteria a document must meet to be PDF/A-compliant. It is more limited in scope than PDF, because PDF itself is already the underlying standard.

What is the difference between PDF and PDF/A?

The normal PDF format does not guarantee long-term reproducibility or complete independence from the software and the output device. In order to guarantee both principles, it was necessary to both limit and expand the existing PDF specification. It was clear from the outset that PDF/A had to be based on an existing version of PDF in order to achieve the acceptance of a wide audience.

Adobe’s PDF reference 1.4 (Acrobat 5) forms the basis for the PDF/A standard (ISO 19005). It states that PDF/A “must meet all requirements of the PDF reference which additionally include this part of the ISO 19005 standard”. In other words, the standard describes only the differences from the reference.

Certain functions that are supported by PDF 1.4, such as transparency or audio/video reproduction, were excluded from PDF/A. At the same time, PDF 1.4 describes optional elements that are mandatory in PDF/A. For example, all visibly used fonts must be embedded in PDF/A.

In short, PDF/A primarily defines the specific properties set out in PDF reference 1.4 that are mandatory, recommended, restricted or forbidden.

Where does PDF/A format come from?

How has PDF/A evolved? Why was a PDF/A initiative founded? How were documents archived in the past?

What are the versions and conformance levels of PDF/A?

PDF/A is designed as a multi-part standard series: PDF/A-1, PDF/A-2, PDF/A-3 and PDF/A-4. A later part does not replace or supersede an earlier one in any way. For example, existing PDF/A-1-conformant documents remain valid for long-term archiving. They can remain unchanged, so an “upgrade” to PDF/A-2 is not necessary.

PDF/A versions 1 to 3 are additionally divided into two or three conformance levels, which indicate whether a document, in addition to unambiguous visual reproducibility (Basic = b), also allows the use of Unicode text (Unicode = u) or barrier-free use (Accessibility = a). PDF/A-4 defines only two levels, which depend on the content or intended use.
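The combinations of version and conformance level for PDF/A-1 to PDF/A-3 can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `parse_pdfa_label` and the label spelling (for example `PDF/A-2b`) are assumptions made for this example, and PDF/A-4 is left out because its levels follow a different scheme.

```python
import re

# Conformance levels available in each part:
# b = Basic (visual reproducibility), u = Unicode text, a = Accessibility.
LEVELS_BY_PART = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
    3: {"a", "b", "u"},
}

LEVEL_NAMES = {"a": "Accessibility", "b": "Basic", "u": "Unicode"}


def parse_pdfa_label(label):
    """Split a label such as 'PDF/A-2b' into (part, level name).

    Hypothetical helper: raises ValueError for anything that is not a
    valid PDF/A-1, PDF/A-2 or PDF/A-3 label.
    """
    match = re.fullmatch(r"PDF/A-(\d)([abu])", label.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PDF/A-1..3 label: {label!r}")
    part = int(match.group(1))
    level = match.group(2).lower()
    if level not in LEVELS_BY_PART.get(part, set()):
        raise ValueError(f"PDF/A-{part} has no level {level!r}")
    return part, LEVEL_NAMES[level]
```

For instance, `parse_pdfa_label("PDF/A-2b")` yields part 2 with level "Basic", while `PDF/A-1u` is rejected because the Unicode level does not exist in PDF/A-1.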

Recommendation

Unless the nature of documents suggests a different standard, PDF Tools recommends archiving in the PDF/A-2 standard. Conversion to PDF/A-2 is easy and of high quality with the right software tools. In addition, this standard avoids risks associated with unknown file attachments (see PDF/A-3).

Using PDF/A-1 for today's conversion processes is not advisable, since this standard's lack of functionality, such as transparency, can lead to visual changes in the document or prevent the conversion from being implemented. PDF/A-4 also appears to make little sense, since the vast majority of existing documents are based on PDF 1.X, and conversion to PDF/A-4 would therefore represent an unnecessarily complex conversion compared to PDF/A-2.

How are Microsoft Office documents, emails and web pages archived?

When compared with the preservation of data in its original format, there are many advantages to archiving documents from digital sources as PDF/A. Source applications evolve rapidly. As a result, after only a few years the readability and authentic display of data in its original format can no longer be guaranteed. Furthermore, a company would have to maintain all of the applications used and all of the platforms on which they run.

How are PDF/A-compliant documents created and processed?

Detailed knowledge of the PDF/A standards is necessary in order to create and accurately display PDF/A documents. Nevertheless, this knowledge alone is not sufficient to configure PDF/A-related processes optimally.

What happens during conversion to PDF/A?

The conversion of a document into PDF/A is a hybrid conversion. This means that the conversion parameters are influenced not only by the PDF/A specification but also by the PDF standard itself. Typical examples are that fonts must be embedded and the colors used must be calibrated. Less well known is that the PDF/A standard contains additional, stricter rules.

PDF/A was designed with document creation in mind, not conversion. Nevertheless, a PDF to PDF/A converter must generate a new PDF file that follows the rules of the standard. Here are some examples:

  • Uncalibrated color spaces can easily be replaced with calibrated ones by choosing an ICC color profile for each of the device-dependent color spaces DeviceGray, DeviceRGB and DeviceCMYK.

  • It is not necessary to introduce an output intent if it is not present in the input file. However, if the input file already has an output intent profile, e.g. a CMYK profile, then it is advised to keep it and the device dependent colors that refer to it.

  • Embedding missing font programs is only easy if the original font is available, which is often not the case. If the font is not available, it must be replaced with a substitute font that should be similar to the original one.

  • If transparency is prohibited, such as with PDF/A-1, then the converter must perform some sort of transparency flattening or refuse the file if it cannot.

  • With prohibited features such as JavaScript, multimedia content or certain kinds of actions, the converter has the option of removing the features or refusing the file.
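The decisions in the list above can be sketched as a small planning function. This is a simplified illustration, not how any particular converter works: the function `plan_conversion`, its parameters and the feature tags (`"device_color"`, `"missing_font"` and so on) are hypothetical names, and a real converter would inspect the PDF structure itself.

```python
# Prohibited features used in this sketch (two of the examples from the list).
PROHIBITED = ("javascript", "multimedia")


def plan_conversion(features, target_part, font_available=False,
                    can_flatten=True, remove_prohibited=True):
    """Return the conversion steps for a set of feature tags.

    Raises ValueError when the converter has to refuse the file.
    All tags and step descriptions are hypothetical labels.
    """
    steps = []

    # Keep an existing output intent; otherwise calibrate device colors.
    if "output_intent" in features:
        steps.append("keep output intent and the device colors that refer to it")
    elif "device_color" in features:
        steps.append("replace device color spaces with ICC-based ones")

    # Missing fonts: embed the original if available, else a substitute.
    if "missing_font" in features:
        steps.append("embed original font" if font_available
                     else "embed similar substitute font")

    # Transparency is only a problem when the target is PDF/A-1.
    if "transparency" in features and target_part == 1:
        if not can_flatten:
            raise ValueError("cannot flatten transparency for PDF/A-1")
        steps.append("flatten transparency")

    # Prohibited features are either removed or cause a refusal.
    for tag in PROHIBITED:
        if tag in features:
            if not remove_prohibited:
                raise ValueError(f"prohibited feature: {tag}")
            steps.append(f"remove {tag}")

    return steps
```

Calling `plan_conversion({"transparency"}, target_part=2)` returns an empty list in this sketch, because transparency needs no special handling when the target is PDF/A-2.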

Why does a PDF/A document need to be validated?

For businesses, it is essential that they know that the PDF and PDF/A documents passing through the business-relevant processes actually meet the respective standard. Not everything labeled as PDF/A is actually PDF/A – PDF/A is a quality criterion that supports the standard-compliant archiving in a long-term digital archive. Yet how can it be ensured that PDF/A documents generated externally as well as internally meet all of the standard’s criteria?

A PDF validator checks the conformance of a PDF document with a certain specification. The tool includes several sets of rules – mostly in the form of profiles – that are used to analyze the documents accordingly.
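The idea of rule sets grouped into profiles can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not a real validator: the `Rule` class, the `PROFILES` table and the dictionary keys describing a document are all hypothetical, and each profile contains only a few of the rules mentioned in this article.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    description: str
    check: Callable[[dict], bool]  # returns True when the document passes


# Rules shared by both hypothetical profiles.
COMMON_RULES = [
    Rule("all fonts are embedded", lambda doc: doc["fonts_embedded"]),
    Rule("no external references", lambda doc: not doc["external_refs"]),
]

# Each profile is simply a list of rules; PDF/A-1 also forbids transparency.
PROFILES = {
    "PDF/A-1b": COMMON_RULES + [
        Rule("no transparency", lambda doc: not doc["uses_transparency"]),
    ],
    "PDF/A-2b": list(COMMON_RULES),
}


def validate(doc, profile):
    """Return the descriptions of all rules in `profile` that `doc` violates."""
    return [rule.description for rule in PROFILES[profile] if not rule.check(doc)]


# Usage: a document description that uses transparency.
doc = {"fonts_embedded": True, "external_refs": [], "uses_transparency": True}
violations_1b = validate(doc, "PDF/A-1b")
violations_2b = validate(doc, "PDF/A-2b")
```

An empty result means the document passed every rule in the chosen profile, so the same document can fail one profile and pass another.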

For what purpose are PDF/A documents signed?

In today’s world, digital documents are closely intertwined with business processes. Electronic signatures play a key role in this respect. Knowledge in this area is sorely lacking, however. Electronic signatures have four main functions:

  • Replacing handwritten signatures: Electronic signatures can satisfy the same requirement as a hand-written signature, provided that they meet the applicable legal requirements.

  • Protecting integrity: Electronic signatures are a “seal” for digital documents, because they make subsequent changes to the document obvious.

  • Guaranteeing authenticity: The electronic signature can be used to ensure that the signature can be clearly assigned to a natural or legal person.

  • Ensuring authorization: Rights and authorities can be defined and managed in the certificate and therefore traced back to a certain person. An electronic signature can ensure that any change is always identifiable and traceable.
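The integrity function in the list above rests on a cryptographic digest of the document. The following Python sketch shows only that part, using the standard library; the function names `seal` and `is_unchanged` are hypothetical. A real electronic signature additionally binds the digest to the signer's certificate with a private key, which is deliberately omitted here.

```python
import hashlib
import hmac


def seal(document_bytes):
    """Compute a SHA-256 digest of the document at signing time."""
    return hashlib.sha256(document_bytes).hexdigest()


def is_unchanged(document_bytes, recorded_digest):
    """Check whether the document still matches the recorded digest."""
    return hmac.compare_digest(seal(document_bytes), recorded_digest)


# Usage: any later modification, however small, changes the digest.
original = b"%PDF-1.4 example content"
digest = seal(original)
still_valid = is_unchanged(original, digest)
tampered_valid = is_unchanged(original + b" ", digest)
```

Because even a single added byte produces a different digest, a subsequent change to the document becomes obvious when the check is repeated.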

How electronic signatures are actually used in business processes depends on the particular situation. In the case of signed documents, the format PDF/A is recommended along with digital signing software that meets all requirements for valid signatures and long-term archiving.
