
PdfMerger breaks PDF/A compliance #1012

Open
@MartinThoma

When I use PdfMerger with a single PDF/A-compliant document, I would expect the output file to be almost identical to the input file. Instead, it differs substantially, and PDF/A compliance is broken.

Code + PDF

Example document: https://www.pdfa.org/wp-content/uploads/2011/08/PDFA-in-a-Nutshell_1b.pdf

To check whether a file is PDF/A-compliant, I used https://demo.verapdf.org/.

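from PyPDF2 import PdfReader, PdfMerger

# Read the PDF/A-1b sample and its document information dictionary
reader = PdfReader("PDFA-in-a-Nutshell_1b.pdf")
metadata = reader.metadata

# "Merge" just this one document and carry over its Info dictionary
merger = PdfMerger()
merger.append(reader)
merger.add_metadata(metadata)

# Write the result, which is then checked with veraPDF
with open("merged.pdf", "wb") as fp:
    merger.write(fp)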

Issues

  • PDFA-in-a-Nutshell_1b.pdf is 6.4 MB and PDF/A-compliant
  • merged.pdf is 5.0 MB and NOT PDF/A-compliant
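To narrow down which PDF/A-relevant structures get lost, one could compare the raw bytes of the input and the output for a few keys PDF/A relies on: the XMP `/Metadata` stream, `/OutputIntents`, and the trailer `/ID`. The following is a minimal and deliberately crude sketch, not a PDF parser; the function names `pdfa_markers` and `compare` are hypothetical. It assumes the catalog and the last trailer are stored uncompressed, which should hold for PDF/A-1 input (PDF/A-1 is based on PDF 1.4 and does not permit object streams) and for output written with a classic xref table and trailer rather than a cross-reference stream.

```python
import re
from pathlib import Path

# Crude byte-level heuristic, NOT a PDF parser. It assumes the catalog and the
# last trailer are stored uncompressed (no object streams, no xref streams).
TRAILER_ID = re.compile(rb"/ID\b")  # \b keeps keys like /IDTree from matching


def pdfa_markers(data: bytes) -> dict[str, bool]:
    """Report whether a few PDF/A-related keys appear in raw PDF bytes."""
    trailer_pos = data.rfind(b"trailer")  # start of the last classic trailer
    return {
        # XMP metadata stream (normally referenced from the catalog); this may
        # also match /Metadata on pages or images, so True is only a hint
        "/Metadata": re.search(rb"/Metadata\b", data) is not None,
        # Output intent that PDF/A-1 requires before DeviceCMYK may be used
        "/OutputIntents": re.search(rb"/OutputIntents\b", data) is not None,
        # File identifier, searched for only after the last "trailer" keyword
        "trailer /ID": trailer_pos != -1
        and TRAILER_ID.search(data, trailer_pos) is not None,
    }


def compare(before: Path, after: Path) -> dict[str, tuple[bool, bool]]:
    """Map each marker to (present in input, present in output)."""
    old = pdfa_markers(before.read_bytes())
    new = pdfa_markers(after.read_bytes())
    return {key: (old[key], new[key]) for key in old}
```

Calling `compare(Path("PDFA-in-a-Nutshell_1b.pdf"), Path("merged.pdf"))` would show, per key, whether it was present before and after the merge; a `(True, False)` pair points at a structure the merge dropped. Because it only looks for key names, a clean result here does not imply compliance; veraPDF remains the authority.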

verapdf.org reports that 100 issues were detected. It lists the following 4 failed rules, whose occurrence counts add up to 100:

  1. The document catalog dictionary of a conforming file shall contain the Metadata key. (1 occurrence)
  2. DeviceCMYK may be used only if the file has a PDF/A-1 OutputIntent that uses a CMYK colour space. (97 occurrences)
  3. The file trailer dictionary shall contain the ID keyword. The file trailer referred to is either the last trailer dictionary in a PDF file, as described in PDF Reference 3.4.4 and 3.4.5, or the first page trailer in a linearized PDF file, as described in PDF Reference F.2. (1 occurrence)
  4. If a document information dictionary does appear at a document, then all of its entries that have analogous properties in predefined XMP schemas, shall also be embedded in the file in XMP form with equivalent values. (1 occurrence)
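Rule 4 concerns consistency between the document information dictionary (which the script above fills in via `merger.add_metadata`) and the XMP packet. A minimal sketch of a presence check follows, assuming the PDF/A-1 correspondence Title→dc:title, Author→dc:creator, Subject→dc:description, Keywords→pdf:Keywords, Creator→xmp:CreatorTool, Producer→pdf:Producer, CreationDate→xmp:CreateDate, ModDate→xmp:ModifyDate. The names `INFO_TO_XMP`, `xmp_properties` and `missing_in_xmp` are hypothetical. It only checks that each XMP property exists; it does not compare values, which the rule also requires (dates, for instance, are written in different formats in the two places).

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Assumed mapping: Info key -> (namespace prefix, XMP property name)
INFO_TO_XMP = {
    "/Title": ("dc", "title"),
    "/Author": ("dc", "creator"),
    "/Subject": ("dc", "description"),
    "/Keywords": ("pdf", "Keywords"),
    "/Creator": ("xmp", "CreatorTool"),
    "/Producer": ("pdf", "Producer"),
    "/CreationDate": ("xmp", "CreateDate"),
    "/ModDate": ("xmp", "ModifyDate"),
}


def xmp_properties(xmp: Optional[str]) -> set:
    """Collect '{namespace}name' for every element and attribute in the packet."""
    if not xmp:
        return set()
    found = set()
    for elem in ET.fromstring(xmp).iter():
        found.add(elem.tag)  # property written as an element, e.g. <dc:title>
        found.update(elem.attrib)  # property written as an attribute
    return found


def missing_in_xmp(info: dict, xmp: Optional[str]) -> list:
    """Info keys whose XMP counterpart is absent (presence only, no value check)."""
    props = xmp_properties(xmp)
    return [
        key
        for key, (prefix, name) in INFO_TO_XMP.items()
        if key in info and f"{{{NS[prefix]}}}{name}" not in props
    ]
```

With PyPDF2, the inputs might be obtained as `dict(reader.metadata)` and, if the catalog has a `/Metadata` entry, `reader.trailer["/Root"]["/Metadata"].get_data().decode("utf-8")`; these calls are an assumption based on the PyPDF2 2.x API, not part of the original report. For a file whose catalog has no `/Metadata` entry, passing `None` reports every Info key that has an XMP counterpart.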

Labels

  • PdfMerger: The PdfMerger component is affected
  • is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
  • is-pdf/a-compliance: Anything related to PDF/A compliance