OAR (ONNXRuntime And Rust) OCR

A comprehensive OCR and document understanding library built in Rust with ONNX Runtime.

Quick Start

Installation

cargo add oar-ocr

With GPU support:

cargo add oar-ocr --features cuda

Basic Usage

use oar_ocr::prelude::*;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the OCR pipeline
    let ocr = OAROCRBuilder::new(
        "pp-ocrv5_mobile_det.onnx",
        "pp-ocrv5_mobile_rec.onnx",
        "ppocrv5_dict.txt",
    )
    .build()?;

    // Load an image
    let image = load_image(Path::new("document.jpg"))?;
    
    // Run prediction
    let results = ocr.predict(vec![image])?;

    // Process results
    for text_region in &results[0].text_regions {
        if let Some((text, confidence)) = text_region.text_with_confidence() {
            println!("Text: {} ({:.2})", text, confidence);
        }
    }

    Ok(())
}
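In practice, low-confidence regions are often dropped before the text is used further. The sketch below shows one way to do that. `Region` and `confident_text` are hypothetical stand-ins written for this illustration and are not part of oar-ocr; the only thing borrowed from the example above is the shape of `text_with_confidence()`, which yields a text and a confidence together or nothing at all. The exact types returned by the crate are an assumption here.

```rust
/// Hypothetical stand-in for a recognized text region (not the crate's real type).
#[derive(Debug, Clone)]
pub struct Region {
    pub text: Option<String>,
    pub confidence: Option<f32>,
}

impl Region {
    /// Mirrors the shape used in the example above: both values, or nothing.
    pub fn text_with_confidence(&self) -> Option<(&str, f32)> {
        match (&self.text, self.confidence) {
            (Some(text), Some(confidence)) => Some((text.as_str(), confidence)),
            _ => None,
        }
    }
}

/// Keep only the text of regions whose confidence is at least `min_confidence`.
pub fn confident_text(regions: &[Region], min_confidence: f32) -> Vec<&str> {
    regions
        .iter()
        // Skip regions that have no recognized text or no score.
        .filter_map(Region::text_with_confidence)
        // Drop anything below the threshold.
        .filter(|&(_, confidence)| confidence >= min_confidence)
        .map(|(text, _)| text)
        .collect()
}
```

With the real pipeline, the same filter would presumably be applied to `results[0].text_regions`. A suitable threshold depends on the models and the input quality, so it is left as a parameter.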

Document Structure Analysis

use oar_ocr::prelude::*;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize structure analysis pipeline
    let structure = OARStructureBuilder::new("pp-doclayout_plus-l.onnx")
        .with_table_classification("pp-lcnet_x1_0_table_cls.onnx")
        .with_table_structure_recognition("slanet_plus.onnx", "wireless")
        .table_structure_dict_path("table_structure_dict_ch.txt")
        .with_ocr(
            "pp-ocrv5_mobile_det.onnx",
            "pp-ocrv5_mobile_rec.onnx",
            "ppocrv5_dict.txt",
        )
        .build()?;
        
    // Analyze document
    let result = structure.predict("document.jpg")?;
    
    // Output Markdown
    println!("{}", result.to_markdown());
    
    Ok(())
}
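The example above prints the Markdown to standard output. To keep it on disk instead, a small helper along the following lines may be enough. This is a sketch using only the Rust standard library; `markdown_output_path` and `save_markdown` are hypothetical names, and it assumes the value returned by `to_markdown()` can be passed as a string slice, which the example above does not confirm (it only shows the value being printed).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Derive the Markdown output path by swapping the input's extension for `.md`.
pub fn markdown_output_path(input: &Path) -> PathBuf {
    input.with_extension("md")
}

/// Write `markdown` next to `input` and return the path that was written.
pub fn save_markdown(input: &Path, markdown: &str) -> io::Result<PathBuf> {
    let output = markdown_output_path(input);
    fs::write(&output, markdown)?;
    Ok(output)
}
```

Under that assumption, a call such as `save_markdown(Path::new("document.jpg"), &result.to_markdown())?` would write `document.md` beside the input image.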

Vision-Language Models (VLM)

For document understanding with Vision-Language Models such as PaddleOCR-VL and UniRec, see the oar-ocr-vl crate.

Documentation

Usage Guide - Detailed API usage, builder patterns, GPU configuration
Pre-trained Models - Model download links and recommended configurations

Examples

The examples/ directory contains complete examples for various tasks:

# General OCR
cargo run --example ocr -- --help

# Document Structure Analysis
cargo run --example structure -- --help

# Layout Detection
cargo run --example layout_detection -- --help

# Table Structure Recognition
cargo run --example table_structure_recognition -- --help

Acknowledgments

This project builds upon the excellent work of several open-source projects:

ort: Rust bindings for ONNX Runtime by pykeio. This crate provides the interface to ONNX Runtime that powers inference in this library.
PaddleOCR: Baidu's multilingual OCR toolkit built on PaddlePaddle. This project uses PaddleOCR's pre-trained models for text detection and recognition across multiple languages.
OpenOCR: An open-source toolkit for general OCR research and applications by the FVL Laboratory at Fudan University. We use the UniRec model for unified text, formula, and table recognition.
Candle: A minimalist ML framework for Rust by Hugging Face. We use Candle to implement Vision-Language model inference.

GreatV/oar-ocr

Folders and files

Latest commit

History

Repository files navigation

OAR (ONNXRuntime And Rust) OCR

Quick Start

Installation

Basic Usage

Document Structure Analysis

Vision-Language Models (VLM)

Documentation

Examples

Acknowledgments

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases 11

Uh oh!

Contributors 7

Languages