
PyFacture is a Python project designed to automate expense management from receipts. The application utilizes image processing techniques and Optical Character Recognition (OCR) using Tesseract and Llama3.2-vision to extract relevant information from a photo of a receipt.


Y1D1R/PyFacture


PyFacture

PyFacture is a Python project designed to automate expense management from receipts. The application utilizes image processing techniques and Optical Character Recognition (OCR) to extract relevant information from a photo of a receipt, such as purchased products, their prices, and the date of purchase.

Features

  • Image Processing: Enhances receipt images for better OCR accuracy.
  • Optical Character Recognition (OCR): Extracts text from receipt images using Tesseract or Llama 3.2-Vision (run locally through Ollama).
  • Data Extraction: Analyzes OCR text to identify products, prices, and dates.
  • Excel File Management: Creates and updates Excel files to store extracted data.
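The Data Extraction step turns raw OCR text into structured rows. The repository's actual parsing rules are not described here, so the following is only a minimal sketch under assumed formats: product lines end with a price such as 1,20 or 0.99 (optionally followed by €), the date is written DD/MM/YYYY, and lines labelled TOTAL, TVA or VAT are skipped. The function name extract_receipt_data is hypothetical.

```python
import re
from datetime import date

# Assumed format: a product name followed by a price like "1,20" or "0.99",
# optionally followed by a euro sign at the end of the line.
PRICE_LINE = re.compile(r"^(?P<name>.+?)\s+(?P<price>\d+[.,]\d{2})\s*(?:€|EUR)?\s*$")
# Assumed date format: DD/MM/YYYY (also accepts "." or "-" as separators).
DATE = re.compile(r"\b(\d{2})[/.-](\d{2})[/.-](\d{4})\b")
# Assumed keywords marking totals/taxes rather than purchased products.
SKIP = re.compile(r"\b(total|subtotal|tva|vat)\b", re.IGNORECASE)


def extract_receipt_data(ocr_text: str) -> dict:
    """Hypothetical helper: return the purchase date and (product, price) pairs."""
    items = []
    purchase_date = None
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if purchase_date is None:
            match = DATE.search(line)
            if match:
                day, month, year = map(int, match.groups())
                try:
                    purchase_date = date(year, month, day)
                except ValueError:
                    pass  # digits that only look like a date (OCR noise)
                continue  # a date line is not a product line
        match = PRICE_LINE.match(line)
        if match and not SKIP.search(match.group("name")):
            # Normalize the decimal comma used on many European receipts.
            price = float(match.group("price").replace(",", "."))
            items.append((match.group("name").strip(), price))
    return {"date": purchase_date, "items": items}
```

Regex rules like these are brittle across store layouts, which is one reason a vision model can be a useful alternative to a hand-written parser.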

Installation

1. Clone the Repository

git clone https://github.com/Y1D1R/PyFacture.git
cd PyFacture

2. Install Dependencies

Install the required Python packages using pip:

pip install -r requirements.txt

3. Install Tesseract OCR and Ollama

PyFacture relies on Tesseract OCR for text extraction and on Ollama to run the Llama 3.2-Vision model locally.
Install both by following the official instructions for your operating system.
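After installing both tools, a quick way to confirm they are reachable is to look them up on PATH. This small check is not part of PyFacture; it only assumes the executables are named tesseract and ollama, as they are in the standard installations. The function name check_ocr_tools is hypothetical.

```python
import shutil

# Executables expected on PATH: Tesseract for classic OCR and Ollama for
# serving the Llama 3.2-Vision model (names assumed from standard installs).
REQUIRED_TOOLS = ("tesseract", "ollama")


def check_ocr_tools(tools=REQUIRED_TOOLS) -> dict:
    """Map each tool name to the full path of its executable, or None if missing."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_ocr_tools().items():
        print(f"{tool:<10} {'found at ' + path if path else 'NOT FOUND'}")
```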

Once you have Ollama installed, install the Llama 3.2-Vision model (6 GB):

ollama run llama3.2-vision

More information: https://sebastian-petrus.medium.com/build-a-local-ollama-ocr-application-using-llama-3-2-vision-bfc3014e3ad6
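PyFacture's own Llama integration is not shown in this README. As a sketch, assuming a local Ollama server on its default port 11434, a receipt image can be sent to llama3.2-vision through Ollama's /api/generate endpoint, which accepts base64-encoded images for multimodal models. The function names build_ocr_request and llama_ocr and the prompt text are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Default endpoint of a local Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical prompt; PyFacture's actual prompt may differ.
DEFAULT_PROMPT = (
    "Read this receipt and list every purchased product with its price, "
    "then give the date of purchase."
)


def build_ocr_request(image_bytes: bytes, prompt: str = DEFAULT_PROMPT) -> dict:
    """Build the JSON body for /api/generate with one base64-encoded image."""
    return {
        "model": "llama3.2-vision",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def llama_ocr(image_path: str, url: str = OLLAMA_URL) -> str:
    """Send a receipt image to the local model and return its text answer."""
    body = build_ocr_request(Path(image_path).read_bytes())
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Vision inference on CPU can be slow, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]
```

For example, llama_ocr("data/input/receipt.jpg") would return the model's answer as plain text (the file name is only illustrative). Disabling streaming keeps the sketch simple at the cost of waiting for the complete answer.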

4. Usage

4.1. Prepare Your Data

Place your receipt images in the "data/input/" directory.
Ensure that the images are clear, well-lit, and free from distortions for optimal OCR results.
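Before processing, the application has to collect the images from that folder. The sketch below shows one way to do it with pathlib; the accepted extensions and the function names has_image_extension and find_receipts are assumptions, not taken from the repository.

```python
from pathlib import Path

# Assumed set of accepted image formats (compared as lower-case suffixes).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def has_image_extension(path: Path) -> bool:
    """True if the file name ends with one of the accepted image suffixes."""
    return path.suffix.lower() in IMAGE_EXTENSIONS


def find_receipts(input_dir: str = "data/input") -> list[Path]:
    """Return the receipt images found directly in input_dir, sorted by name."""
    folder = Path(input_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Input directory not found: {folder}")
    return sorted(p for p in folder.iterdir() if p.is_file() and has_image_extension(p))
```

Sorting gives a stable processing order, so results can be matched back to file names predictably.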

4.2. Run the Application

Run the main script, then choose an OCR method from the menu to process the receipts and extract their data:

python pyfacture/main.py

(Image: application menu)

4.3. View the Results

4.3.1 Tesseract OCR

(Image: original receipt)

(Image: thresholded receipt)

The extracted data will be saved as Excel files in the "data/output/" directory.

(Image: OCR result)
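The thresholded receipt is a black-and-white version of the photo, which usually makes the text easier for Tesseract to read. The README does not say which thresholding method PyFacture uses; one common choice is Otsu's method, which picks the gray level that maximizes the variance between the dark (ink) and light (paper) classes. Below is a pure-Python sketch on a flat list of gray values from 0 to 255; in practice OpenCV performs essentially the same computation on a whole image with cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU). The function names are hypothetical.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level (0-255) that best separates ink from paper."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of their gray levels
    for t in range(256):
        weight_bg += histogram[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, scaled by the constant factor total**2.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if value > threshold else 0 for value in pixels]
```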

4.3.2 Llama OCR

(Image: Llama OCR result)
