A production-grade Retrieval-Augmented Generation (RAG) pipeline designed for scalability and modularity.
The system supports parallel ingestion, distributed document processing, and query-time retrieval with context-aware LLM generation, using Supabase, Redis, and Qdrant Cloud.
The system follows a distributed worker-based architecture:
- FastAPI Server — handles user requests for ingestion and querying.
- Redis Queue — decouples ingestion requests from heavy processing.
- Ingestion Workers — fetch content, chunk, embed, and store it in Qdrant.
- Supabase (PostgreSQL) — manages document metadata and ingestion job tracking.
- Qdrant Cloud — vector store for similarity search.
- Google Gemini API — provides LLM responses using retrieved context.
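The first three components meet at one boundary: the API validates a request, assigns a job ID, and pushes the job onto Redis for a worker to pick up. A minimal sketch of that handoff (the queue name, job shape, and module layout here are illustrative, not the repo's actual code):

```python
# Sketch: the API-to-queue handoff. "ingestion_jobs" and the job fields are assumptions.
import json
import uuid

import redis
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
queue = redis.Redis.from_url("redis://localhost:6379")

class IngestRequest(BaseModel):
    url: str
    source: str

@app.post("/ingest-url")
def ingest_url(req: IngestRequest) -> dict:
    job_id = str(uuid.uuid4())
    # LPUSH returns immediately; the heavy fetching/embedding happens in a worker.
    queue.lpush("ingestion_jobs", json.dumps({"job_id": job_id, "url": req.url, "source": req.source}))
    return {"job_id": job_id, "status": "queued"}
```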
| Component | Technology | Reasoning |
|---|---|---|
| Backend API | FastAPI | Lightweight, async, and production-friendly. |
| Task Queue | Redis | Simple, fast message broker for decoupled ingestion. |
| Database | Supabase (PostgreSQL) | Manages metadata and ingestion states; integrates easily. |
| Vector Store | Qdrant Cloud | Cloud-hosted, scalable, and efficient for similarity search. |
| Embeddings | all-MiniLM-L6-v2 (384-dim) | Compact yet high-quality embeddings for semantic retrieval. |
| LLM | Google Gemini API | High-quality contextual reasoning and summarization. |
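For reference, producing a 384-dimensional vector with this embedding model is a short sentence-transformers call:

```python
# Sketch: embedding a chunk with all-MiniLM-L6-v2 (384 dimensions).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("LangChain is a framework for building LLM applications.")
print(vector.shape)  # (384,)
```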
Supabase tracks each document with the following schema:

| Column | Type | Description |
|---|---|---|
| doc_id | uuid | Primary key |
| url | text | Source document URL |
| source | text | Origin identifier (e.g., "website", "pdf") |
| status | text | Processing status |
| created_at | timestamp | Record creation time |
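The table name is not shown in this README; assuming it is `documents`, registering a new document with the supabase-py client looks roughly like this:

```python
# Sketch: inserting a metadata row via supabase-py. The table name is an assumption.
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

supabase.table("documents").insert({
    "url": "https://python.langchain.com/docs/get_started/introduction/",
    "source": "website",
    "status": "queued",
}).execute()
```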
Each chunk stored in Qdrant carries the following payload:

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique chunk ID |
| vector | float[384] | Embedding vector |
| text_snippet | TEXT | Extracted document chunk |
| url | TEXT | Source document URL |
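Writing a chunk with this payload is a single upsert in qdrant-client; a sketch, where the collection name `rag_chunks` is an assumption:

```python
# Sketch: upserting one embedded chunk. "rag_chunks" is an assumed collection name.
import os
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])

embedding = [0.0] * 384  # placeholder; the pipeline gets this from all-MiniLM-L6-v2
client.upsert(
    collection_name="rag_chunks",
    points=[PointStruct(
        id=str(uuid.uuid4()),
        vector=embedding,
        payload={"text_snippet": "Extracted document chunk...", "url": "https://example.com/doc"},
    )],
)
```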
Endpoint:

```http
POST /ingest-url
```

Body:

```json
{
  "url": "https://python.langchain.com/docs/get_started/introduction/",
  "source": "LangChain Docs"
}
```

Response:

```json
{
  "job_id": "c89a7a7e-9129-4a3a-a2f9-b5af0b25a643",
  "status": "queued"
}
```

Curl Example:

```bash
curl -X POST http://localhost:8000/ingest-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://python.langchain.com/docs/get_started/introduction/", "source": "LangChain Docs"}'
```

Endpoint:

```http
POST /query
```

Body:

```json
{
  "query": "What is LangChain used for?"
}
```

Response:

```json
{
  "response": "LangChain is used to build applications powered by large language models.",
  "sources": ["https://python.langchain.com/docs/get_started/introduction/"]
}
```

Curl Example:

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LangChain used for?"}'
```
To run the system locally, clone the repository, create a virtual environment, and install dependencies:

```bash
git clone https://github.com/<your-username>/rag-engine.git
cd rag-engine
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows
pip install -r requirements.txt
```

Create a `.env` file based on `.env.example`:
```env
SUPABASE_URL=<your_supabase_url>
SUPABASE_KEY=<your_supabase_key>
QDRANT_URL=<your_qdrant_cloud_url>
QDRANT_API_KEY=<your_qdrant_api_key>
REDIS_URL=redis://localhost:6379
GOOGLE_API_KEY=<your_gemini_api_key>
```
Run the API server:

```bash
uvicorn app.main:app --reload
```

Then start an ingestion worker:

```bash
python -m app.workers.ingestion_worker
```

(Optional) Run multiple workers to test scalability.
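The worker's core loop is not shown in this README; assuming jobs arrive as JSON on a Redis list, it can be as small as the following sketch (the `process` helper is a placeholder for the fetch → chunk → embed → upsert pipeline):

```python
# Sketch: a minimal worker loop. Queue name, job shape, and process() are assumptions.
import json

import redis

queue = redis.Redis.from_url("redis://localhost:6379")

def process(job: dict) -> None:
    # Placeholder: fetch job["url"], chunk the text, embed each chunk,
    # upsert into Qdrant, then update the document's status in Supabase.
    print(f"processing {job['job_id']}: {job['url']}")

while True:
    # BRPOP blocks until a job is available and hands each job to exactly
    # one worker, which is what lets workers scale horizontally.
    _, raw = queue.brpop("ingestion_jobs")
    process(json.loads(raw))
```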
To simulate concurrent ingestion:
```bash
python scripts/stress_test_ingest.py
```

This will trigger 10 simultaneous ingestion jobs against `/ingest-url`.
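The script itself is not reproduced here; a thread-pool version that fires 10 concurrent requests could look like this (target URLs are placeholders):

```python
# Sketch: 10 concurrent POSTs to /ingest-url. URLs and worker count are illustrative.
from concurrent.futures import ThreadPoolExecutor

import requests

URLS = [f"https://example.com/doc/{i}" for i in range(10)]  # placeholder targets

def ingest(url: str) -> dict:
    resp = requests.post(
        "http://localhost:8000/ingest-url",
        json={"url": url, "source": "stress-test"},
        timeout=30,
    )
    return resp.json()

with ThreadPoolExecutor(max_workers=10) as pool:
    for result in pool.map(ingest, URLS):
        print(result)
```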
- Redis ensures distributed, idempotent job processing (one claiming pattern is sketched after this list).
- Workers can scale horizontally (multiple processes or containers).
- Qdrant Cloud handles concurrent read/write loads.
- FastAPI is fully async, capable of handling concurrent queries.
- Supabase maintains ingestion/job consistency.
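On the idempotency point: one common pattern (not necessarily what this repo does) is to have each worker atomically claim a job ID in Redis before processing, so a re-delivered or duplicated job is skipped:

```python
# Sketch: idempotent job claiming with SET NX. The key scheme and TTL are assumptions.
import redis

r = redis.Redis.from_url("redis://localhost:6379")

def claim_job(job_id: str) -> bool:
    # SET ... NX succeeds for exactly one caller per key, so only one worker
    # processes a given job_id; the TTL lets abandoned claims expire.
    return bool(r.set(f"job:{job_id}:claimed", 1, nx=True, ex=3600))
```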
See the pipeline in action!
https://drive.google.com/file/d/1ydQwsPfh6i1cXSDJ_OeS684h2bDJd8v3/view?usp=sharing

