A simple and intuitive PDF OCR application built with PySide6 (Qt6) and Tesseract OCR.
Download and run - no installation required!
The pre-built executables are 100% standalone and include:
- Python interpreter
- All Python packages
- Poppler (PDF processing)
- Tesseract OCR (text recognition)
No additional software installation needed! Just download and run.
See Installation below for download links.
- Drag & Drop Interface - Simply drag PDF files into the window
- File Browser - Or use the file picker to select PDFs
- OCR Processing - Extract text from scanned PDFs using Tesseract
- Progress Feedback - Real-time status updates during processing
- Copy to Clipboard - One-click copy functionality (macOS/Linux/Windows)
- Error Recovery - Retry or start over options on failure
- Modern UI - Clean, user-friendly interface with visual feedback
- Fully Standalone - Zero dependencies, zero installation required
Nothing required! The executable includes everything you need - Python, Poppler, and Tesseract OCR are all bundled.
Just download and run!
If running from source, install Tesseract and Poppler first.
macOS:
brew install tesseract poppler
Linux (Ubuntu/Debian):
sudo apt-get install tesseract-ocr poppler-utils
Windows:
- Install Tesseract OCR:
  - Recommended: Using winget:
    winget install --id UB-Mannheim.TesseractOCR
  - Or download from Tesseract OCR
- Install Poppler
- Optional: For WSL users, you can also install via:
  wsl sudo apt-get install tesseract-ocr poppler-utils
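Before running from source, it can help to confirm that both tools are reachable on your PATH. The snippet below is a minimal sketch using only the standard library; the helper name `find_ocr_tools` is hypothetical and not part of QuickPdfOcr. It looks for `tesseract` and for `pdftoppm`, one of the Poppler command-line tools that pdf2image relies on.

```python
import shutil


def find_ocr_tools(tools=("tesseract", "pdftoppm")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    # Print one line per tool so a missing dependency is easy to spot.
    for name, path in find_ocr_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

If either tool is reported as not found, revisit the install commands above for your platform.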
100% Standalone - No installation required!
- Download the latest release for your platform from Releases
  - Windows: QuickPdfOcr.exe
  - macOS: QuickPdfOcr.app (ARM64 or Intel)
  - Linux: QuickPdfOcr
- Run the application! That's it!
What's Included:
- Python interpreter (no Python installation needed)
- All Python packages (PySide6, pytesseract, pdf2image, Pillow, PyPDF2)
- Poppler binaries (for PDF processing)
- Tesseract OCR with English language data (for text recognition)
Note: The bundled Tesseract includes English language data by default. For other languages, you can still install Tesseract system-wide and the app will use it instead.
- Clone the repository:
  git clone https://github.com/KSEGIT/QuickPdfOcr.git
  cd QuickPdfOcr
- Install Python dependencies:
  pip install -r requirements.txt
- Install system dependencies (see above)
- Clone and install dependencies (see the source install steps above)
- Build executable:
  python build.py
- Find your executable in the dist/ folder
Run the graphical interface:
python main.py
Workflow:
- Drag and drop a PDF file or click "Open PDF File"
- Click "Start OCR" to begin text extraction
- Wait for processing (progress updates shown)
- Copy extracted text or start over with a new file
You can also use the OCR processor directly from command line:
python components/pdf_ocr.py document.pdf output.txt
Options:
- --dpi <value> - Set DPI for conversion (default: auto-detect)
- --lang <code> - Set language for OCR (default: eng)
Examples:
# Auto-detect DPI
python components/pdf_ocr.py document.pdf
# Manual DPI and output file
python components/pdf_ocr.py document.pdf output.txt --dpi 400
# French language
python components/pdf_ocr.py document.pdf --lang fra
Common language codes:
- eng - English
- fra - French
- deu - German
- spa - Spanish
- chi_sim - Chinese Simplified
- jpn - Japanese
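The command-line interface described above could be parsed along these lines. This is a sketch with `argparse`, not the actual code in `components/pdf_ocr.py`; the function name `build_parser`, the argument names `pdf` and `output`, and the use of `None` to mean "auto-detect DPI" are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI: input, optional output, --dpi, --lang."""
    parser = argparse.ArgumentParser(description="Extract text from a scanned PDF")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("output", nargs="?", default=None,
                        help="optional output text file")
    parser.add_argument("--dpi", type=int, default=None,
                        help="rendering DPI (assumed: None means auto-detect)")
    parser.add_argument("--lang", default="eng",
                        help="Tesseract language code, e.g. eng, fra, deu")
    return parser


# Parse the same arguments as the "Manual DPI and output file" example.
args = build_parser().parse_args(["document.pdf", "output.txt", "--dpi", "400"])
print(args.pdf, args.output, args.dpi, args.lang)
```

Making the output path an optional positional argument keeps both documented forms valid: with and without `output.txt`.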
QuickPdfOcr/
├── main.py              # Application entry point
├── components/
│   ├── __init__.py
│   ├── pdf_ocr.py       # OCR processor component
│   └── ocr_worker.py    # Background worker for GUI
├── ui/
│   ├── __init__.py
│   └── main_window.py   # Main application window
└── requirements.txt     # Python dependencies
- PySide6 - Qt6 framework for Python (GUI)
- Tesseract OCR - Open-source OCR engine
- pdf2image - PDF to image conversion
- PyPDF2 - PDF manipulation and analysis
- Pillow - Image processing
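The libraries listed above fit together roughly as follows: pdf2image renders each page to a Pillow image, and pytesseract passes that image to Tesseract. This is a minimal sketch, not the code in `components/pdf_ocr.py`; the function names `ocr_pdf` and `join_pages` and the default of 300 DPI are assumptions.

```python
def join_pages(page_texts, separator="\n\n"):
    """Combine per-page OCR results into one string, trimming surrounding whitespace."""
    return separator.join(text.strip() for text in page_texts)


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Render each PDF page to an image with pdf2image, then OCR it with pytesseract."""
    # Imported here so this file can be loaded even when the OCR stack is missing.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    return join_pages(pytesseract.image_to_string(page, lang=lang) for page in pages)
```

Calling `ocr_pdf("document.pdf", lang="fra")` would require Poppler, Tesseract, and the French language data to be installed.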
- Tesseract OCR (bundled with pre-built binaries, or install separately if running from source)
- Poppler (bundled with pre-built binaries, or install separately if running from source)
See requirements.txt for Python package versions:
- pytesseract>=0.3.10
- pdf2image>=1.16.0
- Pillow>=10.0.0
- PyPDF2>=3.0.0
- PySide6>=6.6.0
- pyinstaller>=6.0.0 (for building binaries)
This project is open source and available under the MIT License.
See the LICENSE file for details.
For third-party component licenses (including Poppler), see THIRD_PARTY_LICENSES.md.
Build for your current platform:
pip install -r requirements.txt
python build.py
The executable will be in the dist/ folder.
Note: To bundle Poppler with your local build, install Poppler on your system (see Prerequisites above). The build script will automatically detect and bundle it.
Alternatively, you can manually create a poppler_binaries directory in the project root and place the Poppler binaries there before building.
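The two options above might be combined in a lookup like the one below. It does not reproduce the logic in `build.py`; the function name `locate_poppler`, the choice of `pdftoppm` as the marker executable, and the order (local directory first, then the system PATH) are assumptions for illustration.

```python
import shutil
from pathlib import Path


def locate_poppler(project_root="."):
    """Return a directory containing Poppler binaries, or None if none is found.

    Assumed order: a non-empty local poppler_binaries/ directory wins,
    then whatever Poppler installation is on the system PATH.
    """
    local_dir = Path(project_root) / "poppler_binaries"
    if local_dir.is_dir() and any(local_dir.iterdir()):
        return local_dir
    pdftoppm = shutil.which("pdftoppm")  # Poppler tool used by pdf2image
    return Path(pdftoppm).parent if pdftoppm else None
```

Returning `None` rather than raising lets a build continue without bundled Poppler, which matches the idea that bundling is optional for local builds.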
The project includes a GitHub Actions workflow that automatically builds executables for all platforms when you:
- Push a tag starting with v (e.g., v1.0.0)
- Manually trigger the workflow
The workflow automatically downloads and bundles Poppler for each platform.
To create a release:
git tag v1.0.0
git push origin v1.0.0
Artifacts will be available in the GitHub release.
Contributions are welcome! Please feel free to submit a Pull Request.
Issue: "Tesseract not found"
- Make sure Tesseract is installed and in your system PATH
  - macOS: brew install tesseract
  - Linux: sudo apt-get install tesseract-ocr
  - Windows: winget install --id UB-Mannheim.TesseractOCR, or download the installer (see Prerequisites above)
Issue: "Failed to convert PDF to images"
- If using pre-built binary: This should not occur as Poppler is bundled
- If running from source: Ensure Poppler is installed
  - macOS: brew install poppler
  - Linux: sudo apt-get install poppler-utils
  - Windows: install Poppler manually (see Prerequisites above)
Issue: Poor OCR quality
- Try increasing DPI (e.g., --dpi 400)
- Ensure the PDF has good scan quality
- The system auto-detects optimal DPI based on page size
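How the DPI is chosen from the page size is not described here. One plausible heuristic, shown purely as an illustration, is to pick the DPI that renders the page's longer side to a target pixel count and then clamp it to a sensible range. The function name `suggest_dpi`, the 3300-pixel target, and the 150 to 600 clamp are all assumptions, not values taken from QuickPdfOcr. PDF page sizes are measured in points, with 72 points per inch.

```python
def suggest_dpi(width_pt, height_pt, target_px=3300, min_dpi=150, max_dpi=600):
    """Suggest a rendering DPI from a page size given in PDF points (1/72 inch)."""
    longest_in = max(width_pt, height_pt) / 72.0  # longer side in inches
    if longest_in <= 0:
        raise ValueError("page dimensions must be positive")
    dpi = round(target_px / longest_in)
    # Clamp so tiny pages do not explode memory and huge pages stay legible.
    return max(min_dpi, min(max_dpi, dpi))


print(suggest_dpi(612, 792))    # US Letter: 300
print(suggest_dpi(1224, 1584))  # twice as large in each dimension: 150
```

Under this kind of rule, small pages get a higher DPI and large pages a lower one, so passing `--dpi 400` manually is most likely to help on standard-size scans with fine print.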
Created by KSEGIT