A CLI tool for detecting structural, textual, and visual differences between PDF files, for use in automated regression tests.
DiffPDF uses a fail-fast sequential pipeline to compare PDFs:
- Hash Check - SHA-256 comparison. If identical, exit immediately with pass.
- Page Count - Verify both PDFs have the same number of pages.
- Text Content - Extract and compare text from all pages (ignoring whitespace).
- Visual Check - Render pages to images and compare using pixelmatch-fast.
Each stage only runs if all previous stages pass.
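The fail-fast control flow above can be sketched in a few lines of Python. This is a minimal illustration, not DiffPDF's implementation. `run_pipeline`, `texts_match`, and the `Stage` signature are hypothetical names, and stages here receive the raw bytes of both files. Text extraction (done with PyMuPDF in the real tool) is left out, so `texts_match` works on page texts that were already extracted. Removing every whitespace character before comparing is one plausible reading of "ignoring whitespace".

```python
import hashlib
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stage signature: receives the raw bytes of both PDFs and
# returns True when the stage passes.
Stage = Callable[[bytes, bytes], bool]


def run_pipeline(
    reference: bytes, actual: bytes, stages: Sequence[Tuple[str, Stage]]
) -> Tuple[bool, Optional[str]]:
    """Return (passed, name of the first failing stage or None)."""
    # Hash check: byte-identical files cannot differ, so pass immediately.
    if hashlib.sha256(reference).digest() == hashlib.sha256(actual).digest():
        return True, None
    # Remaining stages run in order; the first failure ends the comparison,
    # so later (more expensive) stages never run.
    for name, stage in stages:
        if not stage(reference, actual):
            return False, name
    return True, None


def texts_match(reference_pages: List[str], actual_pages: List[str]) -> bool:
    """Page-by-page text comparison that ignores all whitespace.

    Assumes the page-count stage has already confirmed equal lengths.
    """

    def strip_ws(text: str) -> str:
        # str.split() with no arguments splits on any run of whitespace.
        return "".join(text.split())

    return all(
        strip_ws(ref) == strip_ws(act)
        for ref, act in zip(reference_pages, actual_pages)
    )
```

Ordering the stages from cheapest (hashing) to most expensive (rendering) is what makes the fail-fast design pay off: most mismatches are caught before any page is rasterized.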
Install Python (v3.10 or higher), then install the package:
pip install diffpdf
Usage: diffpdf [OPTIONS] REFERENCE ACTUAL
Compare two PDF files for structural, textual, and visual differences.
Options:
--threshold FLOAT Pixelmatch threshold (0.0-1.0)
--dpi INTEGER Render resolution
--output-dir DIRECTORY Diff image output directory (optional, if not specified no diff images are saved)
-v, --verbose Increase verbosity
--version Show the version and exit.
--help Show this message and exit.
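To make --threshold more concrete: the original JavaScript pixelmatch library flags a pixel as different when a YIQ-weighted color distance exceeds 35215 * threshold**2, where 35215 is the largest value that metric can take. The sketch below reproduces that rule for a single pair of opaque RGB pixels, assuming pixelmatch-fast keeps the same metric. It deliberately omits pixelmatch's anti-aliasing detection and alpha blending, and the function names are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _yiq(r: float, g: float, b: float) -> Tuple[float, float, float]:
    # RGB -> YIQ conversion coefficients used by pixelmatch.
    y = r * 0.29889531 + g * 0.58662247 + b * 0.11448223
    i = r * 0.59597799 - g * 0.27417610 - b * 0.32180189
    q = r * 0.21147017 - g * 0.52261711 + b * 0.31114694
    return y, i, q


def color_delta(p1: RGB, p2: RGB) -> float:
    """Weighted squared YIQ distance between two opaque pixels."""
    y1, i1, q1 = _yiq(*p1)
    y2, i2, q2 = _yiq(*p2)
    dy, di, dq = y1 - y2, i1 - i2, q1 - q2
    return 0.5053 * dy * dy + 0.299 * di * di + 0.1957 * dq * dq


def pixels_differ(p1: RGB, p2: RGB, threshold: float) -> bool:
    # The threshold is squared and scaled by the metric's maximum value,
    # so 0.0 flags any change and 1.0 flags (almost) nothing.
    max_delta = 35215 * threshold * threshold
    return color_delta(p1, p2) > max_delta
```

Under this metric, black versus white scores roughly 32857, so it is flagged at a threshold of 0.1 but not at 1.0, while a one-step change in a single channel stays far below the limit at 0.1.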
Exit Codes
- 0: Pass (PDFs are equivalent)
- 1: Fail (differences detected)
- 2: Error (invalid input or processing error)
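Because results are reported through exit codes, a regression harness can drive the CLI with subprocess. This is a hedged sketch: `compare_pdfs`, `EXIT_CODES`, and the `command` parameter are hypothetical names, and only the argument order (options, then REFERENCE, then ACTUAL) and the three exit codes come from this README.

```python
import subprocess
from typing import Sequence

# Documented diffpdf exit codes mapped to readable results.
EXIT_CODES = {0: "pass", 1: "fail", 2: "error"}


def compare_pdfs(
    reference: str,
    actual: str,
    extra_args: Sequence[str] = (),
    command: Sequence[str] = ("diffpdf",),
) -> str:
    """Run the CLI and translate its exit code into "pass", "fail" or "error".

    `command` is a hypothetical hook for swapping the executable (for example,
    a stub in tests); by default it runs `diffpdf` from PATH.
    """
    completed = subprocess.run(
        [*command, *extra_args, reference, actual],
        check=False,  # non-zero exit codes are expected results, not crashes
    )
    status = EXIT_CODES.get(completed.returncode)
    if status is None:
        raise RuntimeError(f"unexpected exit code {completed.returncode}")
    return status
```

In a pytest suite, `assert compare_pdfs("reference.pdf", "actual.pdf", ["--dpi", "300"]) == "pass"` would fail the test on any detected difference; distinguishing "fail" from "error" helps separate real regressions from bad inputs.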
DiffPDF can also be used as a Python library:
from diffpdf import diffpdf
# Basic usage (no diff images saved)
diffpdf("reference.pdf", "actual.pdf")
# With options (save diff images to ./output, render pages at 300 DPI)
diffpdf("reference.pdf", "actual.pdf", output_dir="./output", dpi=300)
For development, install uv. Then install the dependencies and activate the automatically generated virtual environment:
uv sync --locked
source .venv/bin/activate
Skip --locked to use the newest dependencies (this might modify uv.lock).
Run tests:
pytest
Check code quality:
ruff check
ruff format --check
ty check
Better yet, install the pre-commit hook, which runs the code quality checks before every commit:
cp hooks/pre-commit .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
Built with PyMuPDF for PDF parsing and pixelmatch-fast (a Python port of pixelmatch) for visual comparison.