PDF/D Overview

Better by Design

PDF is a mature format, and one of its strengths is backward compatibility: any feature in a 15-year-old PDF file can still be understood by today’s PDF tools. The downside is that PDF has become an unreasonably complex format. Writing PDF files was never the problem: you just write a small, well-understood subset of PDF. Reading PDF files, however, is almost impossible for independent software developers to do reliably. Just because Acrobat Reader can open a PDF file does not mean that the file is compliant or that it can be easily read.

PDF/D is a deliberate departure from this obsession with backward compatibility. We are designing a better PDF and not allowing ourselves to be hampered by legacy.

Of course any PDF/D file is still 100% PDF compliant. However, software created to read PDF/D files can safely assume a much smaller set of PDF features and a much stricter format.

Concise

PDF Reference 1.7 is over 1000 pages. ISO 32000-1 whittled it down to roughly 750 pages. We’d like to see a complete PDF/D specification that includes all the features we need in fewer than 150 pages. A software developer new to PDF should be able to pick up PDF/D and become familiar with all of its features in a weekend.

Choose One

In PDF there are often multiple methods of achieving the same goal. Instead of supporting all these methods, we simply pick the one that best suits our needs for PDF/D and prohibit the rest. For example, limit stream filters to FlateDecode, CCITTFaxDecode and DCTDecode (the JPEG filter), possibly also JBIG2Decode and JPXDecode. Discard LZWDecode, ASCIIHexDecode, ASCII85Decode and RunLengthDecode.
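
As a rough illustration of the "choose one" rule, a PDF/D checker could compare the filter names of each stream against a fixed whitelist. The sketch below is hypothetical: the function name and the two sets are invented for this example, the JPEG filter is written with its PDF name (DCTDecode), and the filter names are assumed to have been extracted already by some PDF parser.

```python
# Filters kept in PDF/D according to the text above (assumed final list).
ALLOWED_FILTERS = {"FlateDecode", "CCITTFaxDecode", "DCTDecode"}
# Filters the text marks as "possibly" allowed.
OPTIONAL_FILTERS = {"JBIG2Decode", "JPXDecode"}


def disallowed_filters(filters, allow_optional=False):
    """Return the filter names in `filters` that PDF/D would reject.

    `filters` is a list of filter names taken from a stream dictionary,
    without the leading slash. `allow_optional` (hypothetical switch)
    decides whether JBIG2Decode and JPXDecode are accepted.
    """
    if allow_optional:
        allowed = ALLOWED_FILTERS | OPTIONAL_FILTERS
    else:
        allowed = ALLOWED_FILTERS
    # Preserve input order so the caller can report the first offender.
    return [name for name in filters if name not in allowed]
```

A stream using ASCII85Decode followed by FlateDecode would be reported as `["ASCII85Decode"]`, and a converter could re-encode it with FlateDecode alone.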

Another example is the support of two different ways of doing XRef tables. Compressed XRef streams are obviously superior, so we’d love to discard the pre-1.5 legacy plain-text XRef tables. However, PDF/A-1 is based on PDF 1.4, which predates XRef streams, so in order to support PDF/A-1 we'll still keep a minimal subset of this legacy feature.

Best Practices

Some aspects of the PDF/D design are not just about the file format but also about how it is used. For example, object numbers are meaningless beyond being a mechanism to store unique references between objects. Renumbering objects when saving to PDF/D, so that the most commonly referenced objects receive the shortest numbers, reduces the number of bytes spent on object numbers and makes sense as a "best practice".
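
The renumbering practice described above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the reference counts are assumed to have been collected beforehand by walking the document's objects.

```python
def renumber_by_frequency(reference_counts):
    """Map old object numbers to new ones, most referenced first.

    `reference_counts` maps an old object number to the number of
    indirect references pointing at it (assumed precomputed).
    """
    # Highest count first; ties broken by old number so the result is stable.
    ordered = sorted(reference_counts, key=lambda n: (-reference_counts[n], n))
    # New numbers start at 1 because object number 0 is reserved in PDF.
    return {old: new for new, old in enumerate(ordered, start=1)}


# Example: object 250 is referenced most often, so it gets number 1.
mapping = renumber_by_frequency({10: 1, 250: 40, 3071: 12})
# mapping == {250: 1, 3071: 2, 10: 3}
```

With this mapping, every reference such as "250 0 R" shrinks to "1 0 R", which saves two bytes for each of the 40 references in the example.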

Obsolete

Some features are mentioned in ISO 32000-1 as obsolete. These should obviously be discarded.

Others, such as incremental updates and multiple generations of the same object in a PDF file, appear to us to be needlessly complex mechanisms that were designed in an era of much slower computers with far less memory. Discarding these kinds of "features" makes sense too.
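
A tool that wants to flag files likely to contain incremental updates could start with a crude heuristic: each incremental update appends a new section that ends with its own %%EOF marker. The sketch below is an assumption-laden illustration, not a reliable detector. The function names are hypothetical, a linearized file can also contain more than one marker, and the byte sequence could in principle occur inside uncompressed stream data.

```python
def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers in the raw bytes of a PDF file."""
    return data.count(b"%%EOF")


def may_have_incremental_updates(data: bytes) -> bool:
    """Heuristic hint only: more than one %%EOF suggests appended sections."""
    return count_eof_markers(data) > 1
```

A file flagged this way would need a real parser to confirm the finding before a PDF/D converter flattens it into a single revision.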

Legacy

Obviously, a handy tool for software developers using PDF/D would be a "lossless" PDF to PDF/D conversion library. This would allow developers to limit their attention to PDF/D while still being able to read legacy PDF files from the wild. One of our projects is such a tool, which will be available to consortium members in source code form.