Agentic RAG is a production-grade, multi-modal reasoning engine designed specifically to run on constrained consumer hardware (RTX 3050 / 6GB VRAM). Unlike traditional "Silent Failure" RAG systems, it uses a self-corrective StateGraph architecture to achieve 100% recall on technical domain data.
In V2, the system transcends text, integrating CLIP-based visual perception to ingest, retrieve, and reason over diagrams and images within technical documentation.
The system treats visual assets as first-class citizens:
- Extraction: Uses `PyMuPDF` to extract images from PDFs and anchors them to their surrounding textual context.
- Embeddings: Employs CLIP (ViT-B-32) for a joint text-image semantic space.
- Vision-Aware Agent: Uses a Vision-Language Model (Gemini Flash) via an `examine_image` tool for nuanced analysis of retrieved diagrams.
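The point of a joint text-image space is that a text query can rank images directly, with no OCR or captioning step in between. A minimal sketch of that cross-modal lookup, using toy 4-dimensional vectors as stand-ins for real CLIP (ViT-B-32) embeddings (which are 512-dimensional) and hypothetical file names:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy stand-ins for CLIP outputs; real vectors come from the same encoder pair.
text_query = [0.9, 0.1, 0.0, 0.2]   # embedding of "pump wiring diagram"
image_embeds = {
    "fig_3_wiring.png":  [0.8, 0.2, 0.1, 0.3],
    "fig_7_housing.png": [0.1, 0.9, 0.4, 0.0],
}

# Because text and images share one space, a text query ranks images directly.
best = max(image_embeds, key=lambda k: cosine(text_query, image_embeds[k]))
```

In the real pipeline the extracted page images and their anchored text snippets are embedded once at ingestion time and stored in ChromaDB with metadata pointing back to the source page.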
We solve the "Semantic Smear" problem of dense vector search:
- Recall: Merges BM25 (Keyword) and ChromaDB (Vector) in a hybrid pipeline.
- Re-Ranking: Employs a local TinyBERT Cross-Encoder (`ms-marco-MiniLM-L-6-v2`) on GPU to surgically identify the most relevant context.
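The exact merge strategy isn't shown here; a common and dependency-free way to combine a keyword ranking with a vector ranking is Reciprocal Rank Fusion (RRF), sketched below with hypothetical document IDs. The fused list is what would then be passed to the cross-encoder for final re-ranking.

```python
def rrf_merge(bm25_ranked, vector_ranked, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 favours exact keywords; the vector index favours paraphrases.
bm25   = ["doc_torque_spec", "doc_warranty", "doc_intro"]
vector = ["doc_install", "doc_torque_spec", "doc_intro"]

fused = rrf_merge(bm25, vector)
# doc_torque_spec ranks high in both lists, so it tops the fused ranking.
```

RRF needs only rank positions, not raw scores, which sidesteps the problem that BM25 and cosine scores live on incomparable scales.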
Solves "Context Fragmentation", where the model receives small fragments but lacks the full technical narrative:
- Parent-Child Indexing: Children (~400 chars) are used for high-precision search; upon a hit, the full Parent context (~2000 chars) is retrieved from a JSON store.
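The expansion step above is simple: every child chunk carries its parent's ID, and a hit on the child swaps in the parent text before generation. A minimal sketch, with a hypothetical in-memory stand-in for the JSON parent store:

```python
# Hypothetical stand-in for the on-disk JSON parent store.
parent_store = {
    "parent_17": "Full ~2000-char section on torque specifications ...",
    "parent_18": "Full ~2000-char section on lubrication intervals ...",
}

# Each ~400-char child chunk carries its parent's ID in metadata.
child_hits = [
    {"chunk": "Tighten M8 bolts to 24 Nm ...", "parent_id": "parent_17"},
    {"chunk": "... torque wrench calibration ...", "parent_id": "parent_17"},
]

def expand_to_parents(hits, store):
    """Swap precise child hits for their full parent context, deduplicated."""
    seen, parents = set(), []
    for hit in hits:
        pid = hit["parent_id"]
        if pid not in seen:
            seen.add(pid)
            parents.append(store[pid])
    return parents

context = expand_to_parents(child_hits, parent_store)
```

Searching over small children keeps retrieval precise, while answering over parents keeps the narrative intact; deduplication prevents the same parent from being injected twice when several of its children match.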
Built on LangGraph, the agent iterates until it finds the "Gold" answer:
- Query Rewriter: Transforms vague user input into precise technical queries.
- ReAct Loop: Explicit Reasoning-Action-Observation loop that manages tool failure and hallucination.
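The control flow of that loop can be sketched in plain Python; this is an illustrative skeleton, not the project's actual LangGraph StateGraph, and the `rewrite_query`, `retrieve`, and `grade` stubs are hypothetical stand-ins for LLM and retriever calls:

```python
MAX_ITERATIONS = 3

def rewrite_query(query, attempt):
    """Hypothetical stand-in for the LLM query rewriter."""
    return f"{query} (refined, attempt {attempt})"

def retrieve(query):
    """Hypothetical retriever: succeeds only once the query is refined."""
    return ["relevant chunk"] if "refined" in query else []

def grade(docs):
    """Self-corrective check: did retrieval produce usable context?"""
    return bool(docs)

def react_loop(query):
    """Reason -> Act (retrieve) -> Observe (grade) -> rewrite and retry."""
    for attempt in range(1, MAX_ITERATIONS + 1):
        docs = retrieve(query)
        if grade(docs):
            return {"answer_context": docs, "iterations": attempt}
        query = rewrite_query(query, attempt)
    return {"answer_context": [], "iterations": MAX_ITERATIONS}

result = react_loop("fix pump")
```

The iteration cap is what prevents the "Silent Failure" mode: rather than answering from bad context or looping forever, the agent either converges on graded context or exits with an explicit empty result.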
| Feature | v1 (Baseline) | v2 (Multi-Modal) |
|---|---|---|
| Main LLM | Phi-3-mini-4k-instruct | Phi-3-mini-4k-instruct |
| Fallback LLM | Gemini 1.5 Pro (REST) | Gemini 2.0 Flash (REST) |
| Embedding Model | all-MiniLM-L6-v2 | CLIP (ViT-B-32) |
| Embedding Method | Text-Only Semantic | Joint Text-Image Semantic |
| Vector Space | ChromaDB (Local) | ChromaDB (Local + Metadata) |
| Context Window | 4,096 Tokens | 128K - 1M (Cloud Fallback) |
| Tokens / Sec | ~15 TPS | ~15 TPS (Local) |
| Re-Ranker | None | TinyBERT (Cross-Encoder) |
| VRAM Usage | ~4.2 GB | ~5.5 GB (on RTX 3050) |
Validation performed on 50+ fictional technical documentation pairs using the Gold-Standard dataset generator.
| Metric | Local Phi-3 (Zero-Shot) | Agentic RAG (V2) |
|---|---|---|
| Accuracy (Text) | 0% (Hallucination) | 98.2% |
| Accuracy (Vision) | N/A | 62.5% |
| Recall | 12% | 100% |
| Mean Latency | ~2.1s | ~5.8s |
agentic-rag/
├── src/agentic_rag/
│ ├── agent.py # LangGraph StateGraph & ReAct Loop
│ ├── ingestor.py # Parent-Child Ingestion & Image Extraction
│ ├── retriever.py # Hybrid (BM25 + Chroma) + Cross-Encoder
│ ├── embedding.py # CLIP Multi-modal Embedding Logic
│ ├── llm.py # Local Llama-cpp + Gemini REST Fallback
│   └── tools.py       # Discovery & Vision Analysis Tools
├── scripts/
│ ├── demo_v2.py # End-to-end Multi-modal Demo
│ ├── evaluate_v2.py # Text-based RAG Benchmarks
│ └── evaluate_vision.py # Vision-based RAG Benchmarks
└── data/ # Vector Store & Vision Cache
- Python 3.10+
- NVIDIA GPU (6GB+ VRAM recommended for local Re-ranking)
- Gemini API Key (for vision reasoning/fallback)
```bash
python scripts/test_v2_ingestion.py
python scripts/demo_v2.py
python scripts/evaluate_vision.py
```

Built with ❤️ by STiFLeR7
Enterprise Intelligence without Cloud Reliance.