An intelligent document Q&A system powered by RAG (Retrieval-Augmented Generation)
Ever wished you could chat with your PDFs? This is a production-ready RAG system that lets you:
- 📄 Upload any PDF or text document
- 💬 Ask questions in natural language
- 🎯 Get accurate answers grounded in your documents
- 📚 Source Citations - know exactly where each answer comes from
- 🔍 Filter by file - search specific documents by name
- 💾 Persistent storage - your documents and chat history survive restarts
Perfect for: Research papers, legal documents, manuals, reports, study materials, or any text-heavy content you need to understand quickly.
- Ask questions in natural language
- Context-aware responses from Llama 3.3 70B via Groq
- Automatic source citation with chunk references
- Ask "What is in report.pdf?" to search only that file
- Or search across all documents simultaneously
- Automatic filename detection in queries
- 100% Free - Groq free tier (30 requests/min)
- No GPU needed - Embeddings run on CPU
- Fast setup - 5 minutes from clone to running
- Clean architecture - FastAPI backend + Streamlit frontend
- Persistent vector storage (FAISS)
- Chat history with timestamps
- Document management (upload/delete/list)
- Error handling and rate limiting
- Health checks and monitoring endpoints
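The chat-history persistence mentioned above can be sketched like this. The helper names are illustrative, not the project's actual API; only the file path (`data/chat_history.json`) comes from the project layout:

```python
# Illustrative sketch of persisted chat history with timestamps.
# (Hypothetical helpers; the project's real implementation may differ.)
import json
from datetime import datetime, timezone
from pathlib import Path

HISTORY_FILE = Path("data/chat_history.json")  # path from the project layout

def load_history() -> list[dict]:
    """Return saved turns, or an empty list on first run."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return []

def append_turn(question: str, answer: str) -> None:
    """Append one Q&A turn with a UTC timestamp and persist it."""
    history = load_history()
    history.append({
        "question": question,
        "answer": answer,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    HISTORY_FILE.parent.mkdir(parents=True, exist_ok=True)
    HISTORY_FILE.write_text(json.dumps(history, indent=2))
```

Because history lives in a plain JSON file rather than memory, conversations survive backend restarts.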
```mermaid
graph TB
    subgraph "Frontend - Streamlit"
        A[User Interface]
        B[File Uploader]
        C[Chat Interface]
    end
    subgraph "Backend - FastAPI"
        D[API Endpoints]
        E[Document Manager]
        F[RAG Engine]
    end
    subgraph "Processing Pipeline"
        G[PDF/TXT Parser]
        H[Text Splitter<br/>1500 chars]
        I[Local Embeddings<br/>HuggingFace]
        J[FAISS Vector Store]
    end
    subgraph "LLM Layer"
        K[Query Embeddings]
        L[Similarity Search<br/>Top 9 chunks]
        M[Groq Cloud API<br/>Llama 3.3 70B]
    end

    A --> B
    A --> C
    B --> D
    C --> D
    D --> E
    D --> F
    E --> G
    G --> H
    H --> I
    I --> J
    F --> K
    K --> L
    L --> J
    L --> M
    M --> C

    style M fill:#ff9800
    style J fill:#4caf50
    style I fill:#2196f3
```
- Document Upload → PDF/TXT parsed → Split into 1500-char chunks
- Embedding → Each chunk embedded using HuggingFace (local, free)
- Storage → Embeddings stored in FAISS vector database
- Query → User question embedded → Find top 9 similar chunks
- Generation → Groq (Llama 3.3 70B) generates answer from chunks
- Response → Answer + source citations returned to user
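The six steps above can be sketched in miniature. This toy version swaps the real components (MiniLM embeddings, FAISS, Groq) for a bag-of-words embedding and brute-force cosine search, purely to make the data flow runnable without dependencies; only the chunking/retrieval constants come from `config.py`:

```python
# Toy RAG data flow: chunk -> embed -> store -> retrieve.
# Real embeddings, FAISS, and Groq are replaced by stand-ins.
import math
from collections import Counter

CHUNK_SIZE, CHUNK_OVERLAP, TOP_K = 1500, 300, 9  # values from config.py

def split_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Fixed-size character chunks with overlap (step 1)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector (step 2)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = TOP_K) -> list[str]:
    """Top-k chunks by similarity (step 4); the real system uses FAISS."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = split_text("FAISS stores vectors. Groq serves Llama. Streamlit renders the UI.",
                    size=30, overlap=5)
top = retrieve("Which database stores vectors?", chunks, k=2)
# Step 5 would pass `top` to the LLM as context for answer generation.
```

In the real pipeline, the stand-in `embed` is `sentence-transformers/all-MiniLM-L6-v2` and `retrieve` is a FAISS index lookup; the shape of the flow is the same.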
| Component | Technology | Why? |
|---|---|---|
| Backend | FastAPI | Fast, modern, async Python framework |
| Frontend | Streamlit | Rapid prototyping, beautiful UI out-of-the-box |
| LLM | Groq (Llama 3.3 70B) | Fastest inference, free tier, excellent quality |
| Embeddings | HuggingFace MiniLM | Local, free, no API costs |
| Vector DB | FAISS | Fast similarity search, persistent storage |
| Orchestration | LangChain | RAG pipeline management |
| Document Parsing | PyPDF | Reliable PDF text extraction |
Before you begin, ensure you have:
- ✅ Python 3.11+ installed
- ✅ Groq API Key (free) - get one at console.groq.com
- ✅ ~2GB disk space for dependencies
- ✅ Basic terminal/command line knowledge
```bash
git clone https://github.com/swati048/rag-document-qa.git
cd rag-document-qa
```

- Visit console.groq.com
- Sign up (free) with Google/GitHub
- Navigate to "API Keys" → "Create API Key"
- Copy your key (starts with `gsk_...`)
```bash
# Create virtual environment
python -m venv venv

# Activate it
# On Mac/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```bash
# .env
GROQ_API_KEY=gsk_your_actual_api_key_here
```

⚠️ Never commit `.env`; it is already listed in `.gitignore`.
Terminal 1 - Backend:

```bash
cd backend
python main.py
```

Backend runs at http://localhost:8000

Terminal 2 - Frontend:

```bash
cd frontend
streamlit run app.py
```

Frontend opens at http://localhost:8501
- Upload a PDF or TXT file in the sidebar
- Wait for indexing to complete (~10-30 seconds)
- Ask questions in the chat interface
- Enjoy AI-powered answers with source citations! 🎉
❓ "What is the main topic of this document?"
❓ "Summarize the key findings"
❓ "What are the conclusions?"
❓ "Who are the authors mentioned?"
❓ "What is in research_paper.pdf?"
❓ "Summarize report.txt"
❓ "What does contract.pdf say about payment terms?"
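Queries like these rely on the automatic filename detection mentioned earlier. A hypothetical sketch of how such routing could work (the project's actual detection logic may differ):

```python
# Hypothetical filename detector for routing a query to one document.
import re

# Matches names ending in .pdf or .txt, e.g. "report.pdf"
FILENAME_PATTERN = re.compile(r"\b[\w.-]+\.(?:pdf|txt)\b", re.IGNORECASE)

def detect_filenames(question: str) -> list[str]:
    """Return any .pdf/.txt filenames mentioned in the question."""
    return FILENAME_PATTERN.findall(question)

names = detect_filenames("What does contract.pdf say about payment terms?")
# If names is non-empty, the search can be filtered to those files;
# otherwise all documents are searched.
```

When no filename is found, the system falls back to searching across every indexed document.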
❓ "Compare the methodologies in section 2 and section 4"
❓ "What recommendations are mentioned in the conclusion?"
❓ "List all the statistics about climate change"
```
rag-document-qa/
├── backend/
│   ├── __init__.py           # Empty init file
│   ├── config.py             # Configuration & API keys
│   ├── document_manager.py   # Upload, delete, list docs
│   ├── rag_engine.py         # RAG logic with Groq & FAISS
│   └── main.py               # FastAPI app & endpoints
│
├── frontend/
│   └── app.py                # Streamlit UI
│
├── data/                     # Created automatically
│   ├── uploads/              # Uploaded documents
│   ├── vectorstore/          # FAISS index
│   └── chat_history.json     # Persisted conversations
│
├── .env                      # API keys (YOU CREATE THIS)
├── .gitignore                # Git ignore rules
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```
Edit backend/config.py to customize:
```python
# LLM Model Selection
GROQ_MODEL = "llama-3.3-70b-versatile"   # Best quality (default)
# GROQ_MODEL = "llama-3.1-8b-instant"    # Faster responses

# Document Chunking
CHUNK_SIZE = 1500      # Characters per chunk
CHUNK_OVERLAP = 300    # Overlap between chunks

# Retrieval Settings
TOP_K_RESULTS = 9      # Number of chunks to retrieve

# Embeddings (local - no cost)
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
```

Groq free tier:

- ✅ 30 requests per minute
- ✅ 14,400 tokens per minute
- ✅ No credit card required
- ✅ Access to Llama 3.3 70B
Models Available:
- `llama-3.3-70b-versatile` ⭐ (default - best quality)
- `llama-3.1-8b-instant` (faster)
- `mixtral-8x7b-32768` (good balance)
| Operation | API Calls | Cost |
|---|---|---|
| Upload document | 0 | FREE (local embeddings) |
| Each question | 1 | FREE (within limits) |
| Daily usage | ~100-200 | 100% FREE ✅ |
Typical Usage: 100-200 questions per day = completely free!
Frontend: Streamlit Community Cloud (Free)
- ✅ Free hosting for Streamlit apps
- ✅ Auto-deploys from GitHub
- ✅ Built-in secrets management
Backend: Render (Free tier available)
- ✅ Persistent storage for vector DB
- ✅ Environment variables
- ✅ Auto-scaling
⚠️ Free tier spins down after inactivity (cold start ~30s)
Alternative: Railway for unified deployment
- Push code to GitHub
- Create new Web Service on Render
- Connect repository
- Configure:
  - Build Command: `pip install -r backend/requirements.txt`
  - Start Command: `cd backend && python main.py`
  - Environment Variables: `GROQ_API_KEY=your_api_key`
- Note the deployed URL (e.g., `https://your-app.onrender.com`)
- Update `frontend/app.py`:

```python
API_URL = "https://your-app.onrender.com"  # Your Render URL
```
- Push changes to GitHub
- Go to share.streamlit.io
- Click "New app" → Connect repository
- Set Main file path:
frontend/app.py - Deploy!
- Frontend will call backend API automatically
- First query may be slow (Render cold start)
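The backend endpoints can also be exercised from Python. This hypothetical client builds the same request the curl examples use, without sending it, so it runs even before the backend is up; `API_URL` and the `/query` route follow the setup described above:

```python
# Hypothetical Python equivalent of the curl-based query call.
# The request is built but not sent, so no server is required.
import json
import urllib.request

API_URL = "http://localhost:8000"  # local backend from the setup steps

def build_query_request(question: str) -> urllib.request.Request:
    """Prepare the POST /query request used by the chat frontend."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What is this document about?")
# With the backend running: urllib.request.urlopen(req).read()
```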
Health check:

```bash
curl http://localhost:8000/health
```

Upload a document:

```bash
curl -X POST http://localhost:8000/upload \
  -F "file=@document.pdf"
```

Ask a question:

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this document about?"}'
```

```bash
# Check .env file exists
ls -la .env

# Verify it contains:
GROQ_API_KEY=gsk_...
```

```bash
# Test backend
curl http://localhost:8000/health
# Should return JSON
```

- Wait 60 seconds
- You've hit the 30 requests/minute limit
- Consider switching to `llama-3.1-8b-instant` in config
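Rate limits can also be absorbed client-side with retry and backoff. A sketch under stated assumptions: the retry parameters and the `"rate_limited"` error marker are illustrative, not part of the project or the Groq SDK:

```python
# Illustrative retry-with-exponential-backoff wrapper for rate-limited calls.
import time

def with_backoff(call, retries: int = 3, base_delay: float = 2.0):
    """Retry `call` on a rate-limit error, doubling the delay each attempt."""
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError as exc:
            # "rate_limited" is a placeholder marker for a 429-style error.
            if "rate_limited" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping the LLM call this way turns a burst over the 30 requests/minute limit into a short wait instead of a failed query.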
- First query loads embedding model (~5-10s)
- Subsequent queries are faster (1-3s)
- This is normal!
- Check file size (PDFs >50MB may timeout)
- Check backend logs for errors
- Ensure GROQ_API_KEY is valid
- ✅ Never commit `.env` - already in `.gitignore`
- ✅ Rotate API keys if exposed
- ✅ Use environment variables in production
- ✅ Enable HTTPS for production deployments
- ✅ Add authentication if handling sensitive documents
- Support DOCX, XLSX, CSV files
- Multi-language support
- User authentication
- Document comparison mode
- Export chat history
- OCR for scanned PDFs
- Advanced filters (date, author, tags)
- Batch processing
- Vector store backup/restore
Want to contribute? Open an issue or PR!
| Metric | Value |
|---|---|
| First query (cold start) | 5-10s |
| Subsequent queries | 1-3s |
| Document upload (10 pages) | 15-30s |
| Embedding speed | ~1000 chars/sec |
| Vector search | <100ms |
| Max file size tested | 100MB PDF |
This is a portfolio project, but suggestions are welcome!
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit changes (`git commit -m 'Add AmazingFeature'`)
- Push to branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see LICENSE file for details.
- LangChain - RAG orchestration framework
- Groq - Lightning-fast LLM inference
- FAISS - Efficient vector search
- Streamlit - Beautiful UI framework
- HuggingFace - Free local embeddings
- FastAPI - Modern Python web framework
Swati Thakur
- GitHub: @swati048
- LinkedIn: Swati Thakur
- Email: thakurswati048@gmail.com
⭐ Star this repository if you found it helpful! ⭐
Made with ❤️ and ☕
