Overview
Inspired by the Isartor test set for validating PDF/A compliance, we are working on a similar set of negative tests for basic XMP compliance (PDF/A XMP TechNotes). This work clearly needs to be done, yet nobody has committed the required resources to it since PDF/A (ISO 19005-1) was released in 2005. We're helping to fill the gap.
Approach
The PDF/A extension schemas were one of the few new features introduced with PDF/A. These schemas are a great way to clearly specify the schemas used by XMP in PDF/A files. However, introducing new features in a specification without at least one existing reference implementation has its pitfalls. Three years later we're catching up, and in trying to validate XMP we're discovering holes and errors in the PDF/A extension schemas. For example, TechNote 0009 is not clear about 'required' vs 'optional' properties. For pdfaSchema, it would be good if only one of pdfaSchema:property or pdfaSchema:valueType were required to contain members; this would make it possible to implement 'Auxiliary Schemas' such as the 'Dimensions' value type. This immaturity of the PDF/A extension schemas is what led us to implement all the pre-defined schemas of PDF/A (and their required auxiliary schemas) using the very same PDF/A extension schemas. By eating our own dog food in this way, we got much closer to the nuts-and-bolts issues surrounding the PDF/A extension schemas.
Deliverables
While each vendor will obviously implement its own XMP validator for PDF/A validation and conversion, there are some areas where we can easily collaborate. We believe it is in all our interests to openly share an RDF- and PDF/A-compliant XMP implementation of the pre-defined schemas required to validate PDF/A files. This implementation is available to members and non-members of the PDF/D Consortium under the standard LGPL license (in a nutshell: all we require is attribution/credit). Feedback, corrections and questions are welcome.
RDF Schemas for PDF/A: pdfa.rdf.zip (version 1.1, last updated in September 2009)
In addition to the schema implementation, the consortium is working on a rich set of validation tests for XMP, using the same testing methodology as the Isartor compliance tests. These tests are only available to PDF/D Consortium members. This screenshot shows a sampling of the tests, illustrating how the PDF/A or TechNote clause and the test name are used in naming each test case file:

pdfaValidate Schema
The XMP Specification makes provision for extending existing XMP Properties with Qualifier Properties that are ignored by applications that are not aware of them. We used this feature and the pdfaValidate schema to extend both pdfaProperty and pdfaField to add validation information. When defining the schemas we wish to validate, we can now add the following attributes:
status
Description: Used by validators to flag errors of omission or inclusion, or to raise warnings.
Type: Closed Choice of Text
Values: required | prohibited | deprecated | restricted | recommended | ignored
Note: 'deprecated' is similar to 'prohibited', except that validators flag it as a warning rather than an error.
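To make the status values concrete, here is a minimal Python sketch of how a validator might turn a status value and the presence or absence of a property into an error or a warning. Only the 'deprecated' vs 'prohibited' distinction comes from the text above. The handling of 'recommended', 'restricted' and 'ignored', and the names `check_status` and `Severity`, are our own assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    ERROR = "error"
    WARNING = "warning"


# All values of the Closed Choice pdfaValidate:status.
STATUS_VALUES = {"required", "prohibited", "deprecated",
                 "restricted", "recommended", "ignored"}


def check_status(status: str, present: bool) -> Optional[Severity]:
    """Return the severity of a status violation, or None if there is none.

    Assumed semantics (only 'deprecated' vs 'prohibited' is stated in the
    text; the rest is this sketch's interpretation):
      required    -> missing property is an error
      prohibited  -> present property is an error
      deprecated  -> present property is a warning
      recommended -> missing property is a warning
      restricted, ignored -> no presence check in this sketch
    """
    if status not in STATUS_VALUES:
        raise ValueError(f"invalid pdfaValidate:status: {status!r}")
    if status == "required" and not present:
        return Severity.ERROR
    if status == "prohibited" and present:
        return Severity.ERROR
    if status == "deprecated" and present:
        return Severity.WARNING
    if status == "recommended" and not present:
        return Severity.WARNING
    return None
```

For example, `check_status("deprecated", present=True)` yields `Severity.WARNING`, whereas the same property marked 'prohibited' yields `Severity.ERROR`.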
constraint
Description: Regular expression used to constrain "Closed Choice of" values. (We still need a way to flag Open vs Closed.) Regular expressions must always match the entire input, i.e. start with '^' and end with '$'. Other valid constraint values include:
'base64': used, for example, to validate the Thumbnail xapGImg:image property.
Numeric ranges, written as '[0,255]', '(0,)', '[-128,127]', etc.
Comparison to other properties: '>=@OtherProperty', '==@OtherProperty'
Type: Text
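The constraint forms listed above could be dispatched roughly as follows. This is a hypothetical Python sketch, not our validator. The function name `check_constraint`, the reading of '(' and ')' as exclusive bounds with an empty bound meaning "unbounded", the comparison operators beyond '>=' and '==', and the fallback to plain string comparison for non-numeric values (which only orders ISO 8601 dates correctly when they share the same format and time zone) are all assumptions.

```python
import base64
import math
import operator
import re
from typing import Mapping, Optional

# Interval notation, e.g. '[0,255]', '(0,)', '[-128,127]'.
# Assumption: '(' / ')' exclude the bound, '[' / ']' include it,
# and an empty bound means "unbounded".
_RANGE = re.compile(
    r"([\[(])\s*(-?\d+(?:\.\d+)?)?\s*,\s*(-?\d+(?:\.\d+)?)?\s*([\])])")
# Comparison with another property, e.g. '>=@OtherProperty'.
_COMPARE = re.compile(r"(>=|<=|==|!=|>|<)@(\S+)")
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def _num(text: str) -> Optional[float]:
    """Parse a finite number, or return None."""
    try:
        x = float(text)
    except ValueError:
        return None
    return x if math.isfinite(x) else None


def check_constraint(constraint: str, value: str,
                     others: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if `value` satisfies a pdfaValidate:constraint string."""
    others = others or {}
    if constraint == "base64":
        try:
            # Whitespace (e.g. line breaks in thumbnails) is stripped first.
            base64.b64decode("".join(value.split()), validate=True)
            return True
        except ValueError:  # binascii.Error is a subclass of ValueError
            return False
    m = _RANGE.fullmatch(constraint)
    if m:
        lo_br, lo, hi, hi_br = m.groups()
        x = _num(value)
        if x is None:
            return False
        if lo is not None and (x < float(lo) or (lo_br == "(" and x == float(lo))):
            return False
        if hi is not None and (x > float(hi) or (hi_br == ")" and x == float(hi))):
            return False
        return True
    m = _COMPARE.fullmatch(constraint)
    if m:
        op, other = m.groups()
        if other not in others:
            return True  # assumed policy: a missing property is status's job
        a, b = _num(value), _num(others[other])
        if a is None or b is None:  # fall back to plain string comparison
            return _OPS[op](value, others[other])
        return _OPS[op](a, b)
    if constraint.startswith("^") and constraint.endswith("$"):
        return re.fullmatch(constraint, value) is not None
    raise ValueError(f"unrecognised constraint: {constraint!r}")
```

A rule such as "xmp:ModifyDate must not precede xmp:CreateDate" might then be written as the (hypothetical) constraint '>=@xmp:CreateDate' on xmp:ModifyDate.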
predefined
Description: Marks an entire schema, a specific property or a specific type as predefined or not predefined. Default is True (predefined).
Type: Boolean
Values: False | True
default
Description: A default value for a required property. It shall be of the same type as the property value.
Type: Text
Values: string containing a value of the property's value type
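A converter, as opposed to a pure validator, might use default to repair files that omit a required property. The sketch below assumes a simple dictionary layout for properties and their pdfaValidate qualifiers; `apply_defaults` and the `ex:` property names are hypothetical.

```python
from typing import Dict, Mapping


def apply_defaults(props: Mapping[str, str],
                   schema: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Fill in pdfaValidate:default for required properties that are missing.

    `schema` maps a qualified property name to its pdfaValidate qualifiers,
    e.g. {"ex:Version": {"status": "required", "default": "1.0"}}
    (a hypothetical layout, not the format of the schema download).
    Existing values are never overwritten.
    """
    result = dict(props)
    for name, quals in schema.items():
        if (quals.get("status") == "required"
                and name not in result and "default" in quals):
            result[name] = quals["default"]
    return result
```

Only properties whose status is 'required' are filled in, which matches the description of default above.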
subst
Description: Name of a predefined schema property to be used as a substitute for this one; used with pdfaValidate:status = "prohibited" or "deprecated".
Type: Text
Values: string containing a qualified property name, e.g. "xmp:Identifier"
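Similarly, a converter could use subst to move the value of a prohibited or deprecated property to its replacement. This sketch assumes the same kind of hypothetical dictionary layout, treats all values as plain strings (ignoring array value types), and keeps any value already present in the substitute property. `apply_substitutions` and the `ex:OldId` name are illustrative only.

```python
from typing import Dict, Mapping

SUBSTITUTABLE = ("prohibited", "deprecated")


def apply_substitutions(props: Mapping[str, str],
                        schema: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Move values of prohibited/deprecated properties to their subst target.

    `schema` maps a qualified property name to its pdfaValidate qualifiers
    (hypothetical layout). If the target property is present in `props`,
    its own value wins regardless of ordering.
    """
    result: Dict[str, str] = {}
    for name, value in props.items():
        quals = schema.get(name, {})
        target = quals.get("subst")
        if target and quals.get("status") in SUBSTITUTABLE:
            result.setdefault(target, value)  # never clobber an existing target
        else:
            result[name] = value  # a genuine target value overrides a substituted one
    return result
```

For example, with `ex:OldId` marked deprecated and a subst of "xmp:Identifier", the property set `{"ex:OldId": "abc"}` becomes `{"xmp:Identifier": "abc"}`.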
count
Description: Maximum number of items for array value type properties (e.g. dc:creator).
Type: Integer
Values: maximum number of array items for the array value type
standard
Description: This value determines which specification is violated when constraints are not met.
Type: Closed Choice of Text
Values: pdf | pdfa | pdfd | xmp
clause
Description: The clause in the specification that is violated when constraints are not met.
Type: Text
Values: string, typically dot-delimited integers

This schema is defined as a regular PDF/A extension schema and is included in the pre-defined schema download. The fields 'clause' and 'standard' will already be familiar to those of you who have been following our work on the Open Compliance Reporting format.
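As an illustration of how standard and clause might travel with a validation finding, here is a small hypothetical record type. It is not the Open Compliance Reporting format; the class name `Finding`, its field names, the message layout and the clause number in the example are assumptions.

```python
from dataclasses import dataclass

# Allowed values of the Closed Choice pdfaValidate:standard.
STANDARDS = ("pdf", "pdfa", "pdfd", "xmp")


@dataclass(frozen=True)
class Finding:
    """One validation finding, tagged with pdfaValidate:standard and clause."""
    property_name: str
    severity: str   # e.g. "error" or "warning"
    standard: str   # pdfaValidate:standard
    clause: str     # pdfaValidate:clause, typically dot-delimited integers

    def __post_init__(self) -> None:
        if self.standard not in STANDARDS:
            raise ValueError(f"invalid pdfaValidate:standard: {self.standard!r}")

    def message(self) -> str:
        return (f"{self.severity}: {self.property_name} "
                f"violates {self.standard} clause {self.clause}")
```

For instance, `Finding("ex:Version", "error", "pdfa", "1.2.3").message()` returns `"error: ex:Version violates pdfa clause 1.2.3"`. The clause is kept as text rather than parsed, since the description only says it is "typically" dot-delimited integers.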