Description
When I use PdfMerger on a single PDF/A-compliant document, I would expect the output file to be almost identical to the input file. Instead, it differs considerably, and PDF/A compliance is broken.
Code + PDF
I used this example document: https://www.pdfa.org/wp-content/uploads/2011/08/PDFA-in-a-Nutshell_1b.pdf
I used https://demo.verapdf.org/ to check whether a document is compliant.
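from PyPDF2 import PdfReader, PdfMerger

# Read the PDF/A-1b input and keep its document information dictionary
reader = PdfReader("PDFA-in-a-Nutshell_1b.pdf")
metadata = reader.metadata

# "Merge" only this one document and copy its metadata to the output
merger = PdfMerger()
merger.append(reader)
merger.add_metadata(metadata)

with open("merged.pdf", "wb") as fp:
    merger.write(fp)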
Issues
- PDFA-in-a-Nutshell_1b.pdf: 6.4 MB, PDF/A compliant
- merged.pdf: 5.0 MB, NOT PDF/A compliant
For merged.pdf, veraPDF reports 100 failed checks, grouped under the following 4 rules:
- The document catalog dictionary of a conforming file shall contain the Metadata key. (1 occurrence)
- DeviceCMYK may be used only if the file has a PDF/A-1 OutputIntent that uses a CMYK colour space (97 occurrences)
- The file trailer dictionary shall contain the ID keyword. The file trailer referred to is either the last trailer dictionary in a PDF file, as described in PDF Reference 3.4.4 and 3.4.5, or the first page trailer in a linearized PDF file, as described in PDF Reference F.2 (1 occurrence)
- If a document information dictionary does appear at a document, then all of its entries that have analogous properties in predefined XMP schemas, shall also be embedded in the file in XMP form with equivalent values. (1 occurrence)