Skip to content

πŸ§ πŸŽ™οΈ OCR Forge CLI β€” A smart command-line assistant that captures screen text, processes it with OCR, and responds with LLM-powered answers! βš‘πŸ”

Notifications You must be signed in to change notification settings

DewashishCodes/OCR-Forge-CLI

Repository files navigation

OCR Forge CLI Documentation πŸš€

Overview πŸ”

OCR Forge CLI is a powerful screen capture and text extraction tool that combines Optical Character Recognition (OCR) with Large Language Model (LLM) processing. Created by Dewashish Lambore, this Python-based utility allows users to capture screen regions, extract text using Tesseract OCR, and then enhance the results using Groq's LLM capabilities to provide cleaner, more usable text output.

Features ✨

  • Region-based Screen Capture πŸ“±: Select specific areas of your screen for targeted text extraction
  • Full Screen Capture πŸ–₯️: Capture and process the entire screen
  • LLM-Enhanced Text Cleaning 🧠: Automatically improve OCR results using Groq's LLM
  • Interactive Follow-up Questions πŸ’¬: Ask questions about the captured text
  • History Logging πŸ“š: Save and review past captures
  • Code Highlighting πŸ”†: Properly format and display code blocks in the output
  • Hotkey Integration ⌨️: Quick activation with keyboard shortcuts
  • Voice Talkback πŸ”Š: Text-to-speech playback of OCR output and follow-up responses.
  • Added CLI commandsπŸ–₯️ : Added helpful CLI commands which can be accesed via using argument --help

Requirements πŸ“‹

  • Python 3.6+
  • Tesseract OCR installed (D:\Tesseract\tesseract.exe by default path)
  • Groq API key
  • Required Python packages (see Installation)

Installation πŸ’»

  1. Clone or download the OCR Forge CLI repository
  2. Install required dependencies:
    pip install mss pillow pytesseract keyboard plyer requests groq rich
    
  3. Set up your Groq API key as an environment variable:
    export GROQ_API_KEY="your_groq_api_key_here"  # Linux/macOS
    set GROQ_API_KEY=your_groq_api_key_here       # Windows CMD
    $env:GROQ_API_KEY="your_groq_api_key_here"    # Windows PowerShell
    
  4. Ensure Tesseract OCR is installed. If needed, update the path in the script to match your installation:
    pytesseract.pytesseract.tesseract_cmd = r"path\to\tesseract.exe"

Usage πŸ› οΈ

Starting the Application

Run the script from your terminal or command prompt:

python ocrforge.py

Keyboard Shortcuts ⌨️

  • Ctrl + Print Screen πŸ“Έ: Activate the region selection tool for targeted screen capture
  • Esc ❌: Exit the application

Region Selection πŸ–±οΈ

When you press Ctrl + Print Screen:

  1. Your screen will dim and become semi-transparent
  2. Click and drag to select the region you want to capture
  3. Release the mouse button to confirm your selection
  4. Press Esc to cancel the selection

Follow-up Questions πŸ’¬

After text is captured and processed:

  1. The cleaned text will be displayed in your terminal
  2. You'll be prompted to ask follow-up questions about the captured text
  3. Type your question at the prompt
  4. Type exit to stop asking follow-up questions

Viewing History πŸ“œ

To view your capture history:

python ocrforge.py --history

This will display the last 5 captures with their timestamps and cleaned text.

CLI Commands πŸ–₯️

Added helpful commands to aid user experience. The command list can be accesed by puttinng in:

python ocrforge.py --help

Voice Talkback πŸ”Š

Togglable option to turn on voice talkback of OCR and follow up outputs To turn on:

python ocrforge.py --v
# --nv to turn off

How It Works βš™οΈ

  1. Screen Capture πŸ“Έ:

    • Either full-screen or region-based using the MSS library
    • Image saved temporarily as PNG
  2. Text Extraction πŸ”:

    • Tesseract OCR extracts raw text from the image
    • Raw text is sent to Groq's LLM for processing
  3. LLM Processing 🧠:

    • The Llama 3.3 70B Versatile model cleans and summarizes the text
    • Code blocks are automatically detected and highlighted
  4. User Interaction πŸ’¬:

    • Follow-up questions are processed by the LLM with the context of the captured text
    • Responses are streamed in real-time
  5. History Management πŸ“š:

    • Each capture is logged with timestamp, raw text, and cleaned text
    • Log is stored in JSON format in history_log.json

Configuration Options βš™οΈ

Change Tesseract Path

Update the following line to match your Tesseract installation:

pytesseract.pytesseract.tesseract_cmd = r"D:\Tesseract\tesseract.exe"

Change LLM Model

You can modify the Groq model used for processing by updating:

model="llama-3.3-70b-versatile"

Available options include:

  • llama-3.3-70b-versatile
  • llama3-8b-8192

Adjust the temperature, token count, and other parameters as needed.

Create your own history_log.json

To protect user privacy, no history_log.json file included in git repository, kindly create one before running the tool in the sam folder as the main.py py. I f you wish to store history somewhere else, specify it in config.py

Troubleshooting πŸ”§

No Text Extracted ❌

If OCR fails to extract text:

  • Try selecting a region with clearer text
  • Ensure the text is visible and not obscured
  • Check that Tesseract is properly installed and configured

Groq API Errors ⚠️

If you encounter API errors:

  • Verify your API key is correct and properly set
  • Check your internet connection
  • Ensure you have sufficient API credits

Region Selection Issues πŸ–±οΈ

If region selection doesn't work:

  • Try restarting the application
  • Ensure tkinter is properly installed
  • Try using full-screen capture instead

Files πŸ“

  • ocrforge.py: Main application script
  • favicon.ico: Icon file for notifications
  • history_log.json: Log of captured text

Contributing πŸ‘₯

Contributions are welcome! Please feel free to submit a Pull Requestor create an issue.

License πŸ“„

MIT License

Copyright (c) 2025 Dewashish Lambore

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Acknowledgments πŸ™

  • Created by Dewashish Lambore LinkedinGitHub
  • Uses Tesseract OCR for text extraction
  • Uses Groq's LLM API for text processing

About

πŸ§ πŸŽ™οΈ OCR Forge CLI β€” A smart command-line assistant that captures screen text, processes it with OCR, and responds with LLM-powered answers! βš‘πŸ”

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages