JustusRijke/DiffPDF

DiffPDF

Python 3.10+ | License: MIT

CLI tool for detecting structural, textual, and visual differences between PDF files, intended for use in automated regression tests.

How It Works

DiffPDF uses a fail-fast sequential pipeline to compare PDFs:

  1. Hash Check - SHA-256 comparison. If identical, exit immediately with pass.
  2. Page Count - Verify both PDFs have the same number of pages.
  3. Text Content - Extract and compare text from all pages (ignoring whitespace).
  4. Visual Check - Render pages to images and compare using pixelmatch-fast.

Each stage only runs if all previous stages pass.
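The fail-fast flow described above can be sketched roughly as follows. This is an illustrative outline, not DiffPDF's actual code: `run_pipeline`, `sha256_of`, `same_text_ignoring_whitespace`, and the `Stage` type are hypothetical names, the later stages are passed in as plain callables, and dropping all whitespace is only one possible reading of "ignoring whitespace".

```python
import hashlib
from pathlib import Path
from typing import Callable

# Hypothetical stage type: takes (reference, actual), returns True when the check passes.
Stage = Callable[[Path, Path], bool]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_text_ignoring_whitespace(a: str, b: str) -> bool:
    """Compare two strings after dropping all whitespace (an assumed interpretation)."""
    return "".join(a.split()) == "".join(b.split())


def run_pipeline(
    reference: Path, actual: Path, stages: list[tuple[str, Stage]]
) -> tuple[bool, str]:
    """Run the checks in order and stop at the first one that settles the result."""
    # Stage 1: byte-identical files pass immediately, skipping all later stages.
    if sha256_of(reference) == sha256_of(actual):
        return True, "hash"
    # Later stages (page count, text, visual) run only while every earlier one passed.
    for name, stage in stages:
        if not stage(reference, actual):
            return False, name
    return True, "all stages"
```

Ordering the stages from cheapest to most expensive means that rendering and pixel comparison only happen for files that already agree on page count and text.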

Installation

Install Python (v3.10 or higher) and install the package:

pip install diffpdf

CLI Usage

Usage: diffpdf [OPTIONS] REFERENCE ACTUAL

  Compare two PDF files for structural, textual, and visual differences.

Options:
  --threshold FLOAT       Pixelmatch threshold (0.0-1.0)
  --dpi INTEGER           Render resolution
  --output-dir DIRECTORY  Diff image output directory (optional, if not specified no diff images are saved)
  -v, --verbose           Increase verbosity
  --version               Show the version and exit.
  --help                  Show this message and exit.
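To give a feel for what a threshold in the 0.0-1.0 range controls, here is a deliberately simplified pixel comparison. It is not the pixelmatch algorithm, which uses a perceptual color metric; this stand-in uses plain RGB distance, and the function name and the default of 0.1 are assumptions made for illustration. As in pixelmatch, a smaller threshold makes the comparison more sensitive.

```python
RGB = tuple[int, int, int]


def count_differing_pixels(
    img_a: list[RGB], img_b: list[RGB], threshold: float = 0.1
) -> int:
    """Count pixels whose normalized RGB distance exceeds the threshold.

    Simplified stand-in for a perceptual diff: images are flat lists of
    (r, g, b) tuples with channel values in 0..255.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    # Largest possible distance: black versus white.
    max_distance = (3 * 255**2) ** 0.5
    differing = 0
    for (r1, g1, b1), (r2, g2, b2) in zip(img_a, img_b):
        distance = ((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2) ** 0.5
        if distance / max_distance > threshold:
            differing += 1
    return differing
```

With this sketch, a faint anti-aliasing difference is ignored at a threshold of 0.1 but counted at 0.01, which is the kind of trade-off the --threshold option exposes.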

Exit Codes

  • 0 — Pass (PDFs are equivalent)
  • 1 — Fail (differences detected)
  • 2 — Error (invalid input or processing error)
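In a test harness, these exit codes can be mapped to outcomes when invoking the CLI from Python. The sketch below assumes the `diffpdf` executable is on PATH; the helper names `interpret_exit_code` and `run_diffpdf` are hypothetical and not part of the package.

```python
import subprocess

# Meanings taken from the exit code list above.
EXIT_MEANINGS = {0: "pass", 1: "fail", 2: "error"}


def interpret_exit_code(code: int) -> str:
    """Translate a diffpdf exit code into a short label."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_diffpdf(reference: str, actual: str, *extra_args: str) -> str:
    """Run the diffpdf CLI and return 'pass', 'fail', 'error', or 'unknown'."""
    # check=False: a non-zero exit code is an expected outcome, not an exception.
    result = subprocess.run(
        ["diffpdf", *extra_args, reference, actual], check=False
    )
    return interpret_exit_code(result.returncode)
```

For example, run_diffpdf("reference.pdf", "actual.pdf", "--dpi", "300", "--output-dir", "./output") would return "fail" when differences are detected. Treating 2 separately from 1 lets a CI job distinguish a genuine regression from a broken input file.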

Library Usage

from diffpdf import diffpdf

# Basic usage (no diff images saved)
diffpdf("reference.pdf", "actual.pdf")

# With options (save diff images to ./output and render pages at 300 DPI)
diffpdf("reference.pdf", "actual.pdf", output_dir="./output", dpi=300)
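For regression tests, the library call can be wrapped in a small assertion helper. The snippets above do not show what `diffpdf()` returns, so this sketch assumes a truthy result means the PDFs are equivalent; check the package before relying on that. The helper name `assert_pdfs_equivalent` is hypothetical, and the comparison function is passed in as a parameter so the helper does not depend on that assumption beyond the truthiness check.

```python
from typing import Any, Callable


def assert_pdfs_equivalent(
    compare: Callable[..., Any], reference: str, actual: str, **options: Any
) -> None:
    """Raise AssertionError when compare() reports the PDFs as different.

    Assumption: compare(reference, actual, **options) returns a truthy
    value when the two files are considered equivalent.
    """
    if not compare(reference, actual, **options):
        raise AssertionError(f"{actual} differs from {reference}")
```

Inside a pytest test this could be called as assert_pdfs_equivalent(diffpdf, "reference.pdf", "actual.pdf", dpi=300), with diffpdf imported as shown above.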

Development

Install uv, then install the dependencies and activate the automatically generated virtual environment:

uv sync --locked
source .venv/bin/activate

Omit --locked to use the newest dependencies (this may modify uv.lock).

Run tests:

pytest

Check code quality:

ruff check
ruff format --check
ty check

Better yet, install the pre-commit hook, which runs code quality checks before every commit:

cp hooks/pre-commit .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

Acknowledgements

Built with PyMuPDF for PDF parsing and pixelmatch-fast (Python port of pixelmatch) for visual comparison.
