An intelligent document Q&A system powered by RAG (Retrieval-Augmented Generation)
Ever wished you could chat with your PDFs? This is a production-ready RAG system that lets you:
- 📄 Upload any PDF or text document
- 💬 Ask questions in natural language
- 🎯 Get accurate answers grounded in your documents
- 📚 Source Citations - know exactly where each answer comes from
- 🔍 Filter by file - search specific documents by name
- 💾 Persistent storage - your documents and chat history survive restarts
Perfect for: Research papers, legal documents, manuals, reports, study materials, or any text-heavy content you need to understand quickly.
- Ask questions in natural language
- Context-aware responses from Llama 3.3 70B via Groq
- Automatic source citation with chunk references
- Ask "What is in report.pdf?" to search only that file
- Or search across all documents simultaneously
- Automatic filename detection in queries
- 100% Free - Groq free tier (30 requests/min)
- No GPU needed - Embeddings run on CPU
- Fast setup - 5 minutes from clone to running
- Clean architecture - FastAPI backend + Streamlit frontend
- Persistent vector storage (FAISS)
- Chat history with timestamps
- Document management (upload/delete/list)
- Error handling and rate limiting
- Health checks and monitoring endpoints
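The chat-history persistence mentioned above can be sketched like this. The helper names are illustrative, not the project's actual API; only the file path (`data/chat_history.json`) comes from the project layout:

```python
# Illustrative sketch of persisted chat history with timestamps.
# (Hypothetical helpers; the project's real implementation may differ.)
import json
from datetime import datetime, timezone
from pathlib import Path

HISTORY_FILE = Path("data/chat_history.json")  # path from the project layout

def load_history() -> list[dict]:
    """Return saved turns, or an empty list on first run."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return []

def append_turn(question: str, answer: str) -> None:
    """Append one Q&A turn with a UTC timestamp and persist it."""
    history = load_history()
    history.append({
        "question": question,
        "answer": answer,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    HISTORY_FILE.parent.mkdir(parents=True, exist_ok=True)
    HISTORY_FILE.write_text(json.dumps(history, indent=2))
```

Because history lives in a plain JSON file rather than memory, conversations survive backend restarts.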
```mermaid
graph TB
    subgraph "Frontend - Streamlit"
        A[User Interface]
        B[File Uploader]
        C[Chat Interface]
    end
    subgraph "Backend - FastAPI"
        D[API Endpoints]
        E[Document Manager]
        F[RAG Engine]
    end
    subgraph "Processing Pipeline"
        G[PDF/TXT Parser]
        H[Text Splitter<br/>1500 chars]
        I[Local Embeddings<br/>HuggingFace]
        J[FAISS Vector Store]
    end
    subgraph "LLM Layer"
        K[Query Embeddings]
        L[Similarity Search<br/>Top 9 chunks]
        M[Groq Cloud API<br/>Llama 3.3 70B]
    end

    A --> B
    A --> C
    B --> D
    C --> D
    D --> E
    D --> F
    E --> G
    G --> H
    H --> I
    I --> J
    F --> K
    K --> L
    L --> J
    L --> M
    M --> C

    style M fill:#ff9800
    style J fill:#4caf50
    style I fill:#2196f3
```
- Document Upload → PDF/TXT parsed → Split into 1500-char chunks
- Embedding → Each chunk embedded using HuggingFace (local, free)
- Storage → Embeddings stored in FAISS vector database
- Query → User question embedded → Find top 9 similar chunks
- Generation → Groq (Llama 3.3 70B) generates answer from chunks
- Response → Answer + source citations returned to user
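The six steps above can be sketched in miniature. This toy version swaps the real components (MiniLM embeddings, FAISS, Groq) for a bag-of-words embedding and brute-force cosine search, purely to make the data flow runnable without dependencies; only the chunking/retrieval constants come from `config.py`:

```python
# Toy RAG data flow: chunk -> embed -> store -> retrieve.
# Real embeddings, FAISS, and Groq are replaced by stand-ins.
import math
from collections import Counter

CHUNK_SIZE, CHUNK_OVERLAP, TOP_K = 1500, 300, 9  # values from config.py

def split_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Fixed-size character chunks with overlap (step 1)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector (step 2)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = TOP_K) -> list[str]:
    """Top-k chunks by similarity (step 4); the real system uses FAISS."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = split_text("FAISS stores vectors. Groq serves Llama. Streamlit renders the UI.",
                    size=30, overlap=5)
top = retrieve("Which database stores vectors?", chunks, k=2)
# Step 5 would pass `top` to the LLM as context for answer generation.
```

In the real pipeline, the stand-in `embed` is `sentence-transformers/all-MiniLM-L6-v2` and `retrieve` is a FAISS index lookup; the shape of the flow is the same.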
| Component | Technology | Why? |
|---|---|---|
| Backend | FastAPI | Fast, modern, async Python framework |
| Frontend | Streamlit | Rapid prototyping, beautiful UI out-of-the-box |
| LLM | Groq (Llama 3.3 70B) | Fastest inference, free tier, excellent quality |
| Embeddings | HuggingFace MiniLM | Local, free, no API costs |
| Vector DB | FAISS | Fast similarity search, persistent storage |
| Orchestration | LangChain | RAG pipeline management |
| Document Parsing | PyPDF | Reliable PDF text extraction |
Before you begin, ensure you have:
- ✅ Python 3.11+ installed
- ✅ Groq API Key (free) - get one at console.groq.com
- ✅ ~2GB disk space for dependencies
- ✅ Basic terminal/command line knowledge
```bash
git clone https://github.com/swati048/rag-document-qa.git
cd rag-document-qa
```

- Visit console.groq.com
- Sign up (free) with Google/GitHub
- Navigate to "API Keys" → "Create API Key"
- Copy your key (starts with `gsk_...`)
```bash
# Create virtual environment
python -m venv venv

# Activate it
# On Mac/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```bash
# .env
GROQ_API_KEY=gsk_your_actual_api_key_here
```

⚠️ Never commit `.env`; it is already listed in `.gitignore`.
Terminal 1 - Backend:

```bash
cd backend
python main.py
```

Backend runs at http://localhost:8000

Terminal 2 - Frontend:

```bash
cd frontend
streamlit run app.py
```

Frontend opens at http://localhost:8501
- Upload a PDF or TXT file in the sidebar
- Wait for indexing to complete (~10-30 seconds)
- Ask questions in the chat interface
- Enjoy AI-powered answers with source citations! 🎉
❓ "What is the main topic of this document?"
❓ "Summarize the key findings"
❓ "What are the conclusions?"
❓ "Who are the authors mentioned?"
❓ "What is in research_paper.pdf?"
❓ "Summarize report.txt"
❓ "What does contract.pdf say about payment terms?"
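Queries like these rely on the automatic filename detection mentioned earlier. A hypothetical sketch of how such routing could work (the project's actual detection logic may differ):

```python
# Hypothetical filename detector for routing a query to one document.
import re

# Matches names ending in .pdf or .txt, e.g. "report.pdf"
FILENAME_PATTERN = re.compile(r"\b[\w.-]+\.(?:pdf|txt)\b", re.IGNORECASE)

def detect_filenames(question: str) -> list[str]:
    """Return any .pdf/.txt filenames mentioned in the question."""
    return FILENAME_PATTERN.findall(question)

names = detect_filenames("What does contract.pdf say about payment terms?")
# If names is non-empty, the search can be filtered to those files;
# otherwise all documents are searched.
```

When no filename is found, the system falls back to searching across every indexed document.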
❓ "Compare the methodologies in section 2 and section 4"
❓ "What recommendations are mentioned in the conclusion?"
❓ "List all the statistics about climate change"
```
rag-document-qa/
├── backend/
│   ├── __init__.py           # Empty init file
│   ├── config.py             # Configuration & API keys
│   ├── document_manager.py   # Upload, delete, list docs
│   ├── rag_engine.py         # RAG logic with Groq & FAISS
│   └── main.py               # FastAPI app & endpoints
│
├── frontend/
│   └── app.py                # Streamlit UI
│
├── data/                     # Created automatically
│   ├── uploads/              # Uploaded documents
│   ├── vectorstore/          # FAISS index
│   └── chat_history.json     # Persisted conversations
│
├── .env                      # API keys (YOU CREATE THIS)
├── .gitignore                # Git ignore rules
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```
Edit backend/config.py to customize:
```python
# LLM Model Selection
GROQ_MODEL = "llama-3.3-70b-versatile"   # Best quality (default)
# GROQ_MODEL = "llama-3.1-8b-instant"    # Faster responses

# Document Chunking
CHUNK_SIZE = 1500      # Characters per chunk
CHUNK_OVERLAP = 300    # Overlap between chunks

# Retrieval Settings
TOP_K_RESULTS = 9      # Number of chunks to retrieve

# Embeddings (local - no cost)
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
```

Groq free tier:

- ✅ 30 requests per minute
- ✅ 14,400 tokens per minute
- ✅ No credit card required
- ✅ Access to Llama 3.3 70B
Models Available:
- `llama-3.3-70b-versatile` ⭐ (default - best quality)
- `llama-3.1-8b-instant` (faster)
- `mixtral-8x7b-32768` (good balance)
| Operation | API Calls | Cost |
|---|---|---|
| Upload document | 0 | FREE (local embeddings) |
| Each question | 1 | FREE (within limits) |
| Daily usage | ~100-200 | 100% FREE ✅ |
Typical Usage: 100-200 questions per day = completely free!
Frontend: Streamlit Community Cloud (Free)
- ✅ Free hosting for Streamlit apps
- ✅ Auto-deploys from GitHub
- ✅ Built-in secrets management
Backend: Render (Free tier available)
- ✅ Persistent storage for vector DB
- ✅ Environment variables
- ✅ Auto-scaling
⚠️ Free tier spins down after inactivity (cold start ~30s)
Alternative: Railway for unified deployment
- Push code to GitHub
- Create new Web Service on Render
- Connect repository
- Configure:
  - Build Command: `pip install -r backend/requirements.txt`
  - Start Command: `cd backend && python main.py`
  - Environment Variables: `GROQ_API_KEY=your_api_key`
- Note the deployed URL (e.g., `https://your-app.onrender.com`)
- Update `frontend/app.py`:

```python
API_URL = "https://your-app.onrender.com"  # Your Render URL
```
- Push changes to GitHub
- Go to share.streamlit.io
- Click "New app" → Connect repository
- Set Main file path:
frontend/app.py - Deploy!
- Frontend will call backend API automatically
- First query may be slow (Render cold start)
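The backend endpoints can also be exercised from Python. This hypothetical client builds the same request the curl examples use, without sending it, so it runs even before the backend is up; `API_URL` and the `/query` route follow the setup described above:

```python
# Hypothetical Python equivalent of the curl-based query call.
# The request is built but not sent, so no server is required.
import json
import urllib.request

API_URL = "http://localhost:8000"  # local backend from the setup steps

def build_query_request(question: str) -> urllib.request.Request:
    """Prepare the POST /query request used by the chat frontend."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What is this document about?")
# With the backend running: urllib.request.urlopen(req).read()
```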
Health check:

```bash
curl http://localhost:8000/health
```

Upload a document:

```bash
curl -X POST http://localhost:8000/upload \
  -F "file=@document.pdf"
```

Ask a question:

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this document about?"}'
```

```bash
# Check .env file exists
ls -la .env

# Verify it contains:
GROQ_API_KEY=gsk_...
```

```bash
# Test backend
curl http://localhost:8000/health
# Should return JSON
```

- Wait 60 seconds
- You've hit the 30 requests/minute limit
- Consider switching to `llama-3.1-8b-instant` in config
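Rate limits can also be absorbed client-side with retry and backoff. A sketch under stated assumptions: the retry parameters and the `"rate_limited"` error marker are illustrative, not part of the project or the Groq SDK:

```python
# Illustrative retry-with-exponential-backoff wrapper for rate-limited calls.
import time

def with_backoff(call, retries: int = 3, base_delay: float = 2.0):
    """Retry `call` on a rate-limit error, doubling the delay each attempt."""
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError as exc:
            # "rate_limited" is a placeholder marker for a 429-style error.
            if "rate_limited" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping the LLM call this way turns a burst over the 30 requests/minute limit into a short wait instead of a failed query.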
- First query loads embedding model (~5-10s)
- Subsequent queries are faster (1-3s)
- This is normal!
- Check file size (PDFs >50MB may timeout)
- Check backend logs for errors
- Ensure GROQ_API_KEY is valid
- ✅ Never commit `.env` - already in `.gitignore`
- ✅ Rotate API keys if exposed
- ✅ Use environment variables in production
- ✅ Enable HTTPS for production deployments
- ✅ Add authentication if handling sensitive documents
- Support DOCX, XLSX, CSV files
- Multi-language support
- User authentication
- Document comparison mode
- Export chat history
- OCR for scanned PDFs
- Advanced filters (date, author, tags)
- Batch processing
- Vector store backup/restore
Want to contribute? Open an issue or PR!
| Metric | Value |
|---|---|
| First query (cold start) | 5-10s |
| Subsequent queries | 1-3s |
| Document upload (10 pages) | 15-30s |
| Embedding speed | ~1000 chars/sec |
| Vector search | <100ms |
| Max file size tested | 100MB PDF |
This is a portfolio project, but suggestions are welcome!
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit changes (`git commit -m 'Add AmazingFeature'`)
- Push to branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see LICENSE file for details.
- LangChain - RAG orchestration framework
- Groq - Lightning-fast LLM inference
- FAISS - Efficient vector search
- Streamlit - Beautiful UI framework
- HuggingFace - Free local embeddings
- FastAPI - Modern Python web framework
Swati Thakur
- GitHub: @swati048
- LinkedIn: Swati Thakur
- Email: thakurswati048@gmail.com
⭐ Star this repository if you found it helpful! ⭐
Made with ❤️ and ☕
