Logseq LaTeX Formula OCR Plugin

Convert images of LaTeX formulas from the clipboard into LaTeX code in Logseq, using OCR providers such as Hugging Face Transformers, Google Gemini, or a local Pix2Text server.

Features

  • Formula OCR: Convert images of LaTeX formulas into editable LaTeX code.
  • Table OCR: Convert images of tables into Markdown tables.
  • Multiple OCR Providers: Choose from several backends:
    • Google Gemini: High-quality formula and table recognition.
    • OpenAI Compatible: Connect to any OpenAI-compatible API (e.g., Local LLMs, Groq, OpenRouter).
    • Pix2Text (Local): A private, offline-first OCR server.
    • Hugging Face API: Cloud-based processing using the Nougat model.
    • Docker (Self-hosted): Run the Nougat OCR model in a local Docker container.

Commands

  • /display-formula-ocr: Insert LaTeX code on a new line.
  • /inline-formula-ocr: Insert LaTeX code within a paragraph.
  • /table-ocr: Insert a Markdown table from an image. Currently works best with the Gemini provider.

Notes:

  • The image in the clipboard must be an image of a LaTeX formula (or of a table, for /table-ocr)
  • Initial use may be slow due to model loading
  • With the free Hugging Face plan you can make about 30k calls per month
  • The Google Gemini API has a free tier with usage limits. Check the official pricing page for details.

Installation Options

  1. Manual + Gemini (Recommended)

    • Requirements: Google Gemini API Key
    • Download the zip file from releases and unzip it.
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the unzipped folder.
    • Go to plugin settings, select "Gemini" as the OCR Provider.
    • Paste your Google Gemini API Key in the API Key setting field.
  2. Manual + OpenAI Compatible

    • Requirements: An OpenAI-compatible API (e.g., OpenAI, Groq, Local LLM)
    • Download the zip file from releases and unzip it.
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the unzipped folder.
    • Go to plugin settings, select "OpenAI Compatible" as the OCR Provider.
    • Enter your API Key in the API Key field.
    • Enter your API Endpoint in the API Endpoint field (e.g., https://api.openai.com/v1 or http://localhost:11434/v1).
    • (Optional) Set the Model Name (default: gpt-4o).
  3. Manual + Pix2Text (Offline)

    • Install the Pix2Text Python package
    • Start the server, e.g., p2t serve -l en -H 0.0.0.0 -p 8503
    • Download the zip file from releases and unzip it.
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the unzipped folder.
    • In the plugin settings, select "Local" as the OCR Provider and set the API Endpoint to the IP address and port of the Pix2Text server (default is http://0.0.0.0:8503).
  4. Manual + Hugging Face

    • Requirements: Node.js, Yarn, Parcel, Hugging Face User Access Token
    • Clone repo: git clone https://github.com/olmobaldoni/logseq-formula-ocr-plugin.git
    • Install dependencies: cd logseq-formula-ocr-plugin && yarn && yarn build
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the cloned repo
  5. Marketplace + Hugging Face

  6. Marketplace + Docker

    • Requirements: Docker
    • Search for LaTeX Formula OCR in the Logseq marketplace and install directly
    • Pull image: docker pull olmobaldoni/nougat-ocr-api:latest
    • Run container: docker run -d -p 80:80 olmobaldoni/nougat-ocr-api:latest

Note: For more information on how to use the other local API, see: https://github.com/olmobaldoni/LaTex-Formula-OCR-API
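For the OpenAI Compatible option above, the API Key, API Endpoint, and Model Name settings map naturally onto a chat-completions request. The sketch below shows one way such a request could be assembled. It is an assumption about how a client might do this, not the plugin's implementation: the function name `buildFormulaOcrRequest`, the `OcrRequest` shape, and the prompt text are all hypothetical. The `image_url` content part with a base64 data URL follows the OpenAI chat completions format, which compatible servers may or may not fully support.

```typescript
// Sketch only: names, prompt text, and return shape are assumptions,
// not the plugin's actual code.
interface OcrRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildFormulaOcrRequest(
  endpoint: string, // e.g. "https://api.openai.com/v1" or "http://localhost:11434/v1"
  apiKey: string,
  imageBase64: string, // clipboard image as base64-encoded PNG
  model: string = "gpt-4o", // default model name from the settings
): OcrRequest {
  const payload = {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Transcribe the formula in this image as LaTeX. Return only the LaTeX." },
          { type: "image_url", image_url: { url: "data:image/png;base64," + imageBase64 } },
        ],
      },
    ],
  };
  return {
    // Drop trailing slashes so the path is joined exactly once.
    url: endpoint.replace(/\/+$/, "") + "/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: "Bearer " + apiKey },
      body: JSON.stringify(payload),
    },
  };
}
```

The returned `url` and `init` can be passed directly to `fetch`. With an OpenAI-style response, the recognized LaTeX would be expected in `choices[0].message.content`.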

Settings

Demo

  • Demo 1

  • Demo 2

Known Issues

The Hugging Face API may truncate responses (see Issue #2 and Issue #487).

Note: The Docker or Local (Pix2Text) method is recommended for full functionality.
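A truncated response often shows up as LaTeX with unclosed braces or environments. The following is a rough heuristic sketch for spotting that, assuming nothing about the plugin's internals: the function `looksTruncated` is hypothetical and only checks structural balance, so it can miss truncation that happens to fall between complete groups.

```typescript
// Heuristic sketch (an assumption, not part of the plugin): reports true when
// braces or \begin/\end environments in the LaTeX string are unbalanced.
function looksTruncated(latex: string): boolean {
  let depth = 0;
  for (let i = 0; i < latex.length; i++) {
    const ch = latex[i];
    if (ch === "\\") {
      i++; // skip the character after a backslash, e.g. \{ or \\
      continue;
    }
    if (ch === "{") depth++;
    else if (ch === "}") depth--;
    if (depth < 0) return true; // stray closing brace: output is malformed
  }
  // Count environment openers and closers; they should match in complete output.
  const begins = (latex.match(/\\begin\{/g) || []).length;
  const ends = (latex.match(/\\end\{/g) || []).length;
  return depth !== 0 || begins !== ends;
}
```

A caller could use such a check to warn the user or to retry with the Docker or Pix2Text provider when the result looks incomplete.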

Credits

This plugin is based on nougat-latex-base by NormXU, a fine-tune of facebook/nougat-base on im2latex-100k.

Pix2Text: Used for the local OCR server.

Google Gemini: Used as one of the OCR providers.

In addition, this plugin was inspired by xxchan's logseq-ocr plugin.

License

MIT
