This repository contains a minimal setup for extracting invoice fields with the dots.ocr model using a CPU-only stack.
Build and start the service with Docker Compose:
cd invoice-extract
docker compose up -d --build
The API will be available at http://localhost:8000.
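Because `up -d` returns before the service is necessarily ready, it can help to wait until the port answers before sending requests. The following is a minimal sketch, not part of this repository: `wait_for_api` is a hypothetical helper, and it assumes only that the service returns some HTTP response at the base URL (no dedicated health endpoint is assumed).

```shell
# wait_for_api URL [TRIES] [DELAY]
# Hypothetical helper: polls URL until curl receives any HTTP response.
wait_for_api() {
  local url=$1
  local tries=${2:-30}
  local delay=${3:-2}
  local i=1
  while [ "$i" -le "$tries" ]; do
    # Without --fail, curl exits 0 on any HTTP response (even 404) and
    # non-zero only when the connection itself fails or times out.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A possible use is `wait_for_api http://localhost:8000 || echo "service did not come up" >&2`, run right after the `docker compose` command.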
Send a PDF file to the /extract endpoint using curl:
curl -F "file=@/path/to/invoice.pdf" http://localhost:8000/extract
The response is a JSON object containing the parsed invoice fields.
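The plain curl call prints whatever the server returns, including error bodies. As a hedged sketch, the request can be wrapped so that the body is saved to a file and the HTTP status is checked. `extract_invoice` and the `API_URL` variable are hypothetical names introduced here, and the sketch assumes the service signals success with a 2xx status.

```shell
# extract_invoice SRC DEST
# Hypothetical wrapper: saves the response body to DEST and returns
# non-zero unless the HTTP status is 2xx (assumed success convention).
extract_invoice() {
  local src=$1
  local dest=$2
  local url=${API_URL:-http://localhost:8000/extract}
  local status
  # -o writes the body to DEST; -w prints only the status code to stdout.
  status=$(curl -sS -o "$dest" -w '%{http_code}' -F "file=@$src" "$url") || return 1
  case $status in
    2??) return 0 ;;
    *)
      # The error body stays in DEST so it can be inspected.
      echo "HTTP $status for $src" >&2
      return 1
      ;;
  esac
}
```

For example, `extract_invoice /path/to/invoice.pdf invoice.json && cat invoice.json` prints the JSON only when the request succeeded.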
Process all PDF or image files in a folder and save the outputs:
mkdir -p out
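# Brace expansion is a bash feature; run this with bash rather than plain POSIX sh.
for f in /invoices/*.{pdf,jpg,png,jpeg}; do
  [ -e "$f" ] || continue  # skip patterns that matched no files
  b=$(basename "$f")
  curl -s -F "file=@$f" http://localhost:8000/extract > "out/${b%.*}.json"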
done
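The loop above strips the extension from each output name, so `a.pdf` and `a.png` would both be written to `out/a.json`, and a failed request still leaves a file behind. The variant below is one possible sketch that avoids both issues. `batch_extract` and `API_URL` are hypothetical names, and keeping the source extension in the output name (`a.pdf.json`) is a design choice made here, not something the service requires.

```shell
# batch_extract SRC_DIR OUT_DIR
# Hypothetical helper: posts every PDF/JPG/JPEG/PNG in SRC_DIR and
# returns non-zero if any request failed.
batch_extract() {
  local src_dir=$1
  local out_dir=$2
  local url=${API_URL:-http://localhost:8000/extract}
  local f dest
  local failed=0
  mkdir -p "$out_dir"
  # Explicit patterns instead of brace expansion, so this also runs in sh.
  for f in "$src_dir"/*.pdf "$src_dir"/*.jpg "$src_dir"/*.jpeg "$src_dir"/*.png; do
    [ -e "$f" ] || continue                  # pattern matched nothing
    dest="$out_dir/$(basename "$f").json"    # a.pdf -> a.pdf.json
    # --fail makes curl exit non-zero on HTTP errors (status >= 400).
    if ! curl -sS --fail -F "file=@$f" -o "$dest" "$url"; then
      echo "failed: $f" >&2
      rm -f "$dest"                          # do not keep partial output
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Calling `batch_extract /invoices out` mirrors the loop above, with failures reported on stderr and reflected in the exit status.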