Athena is a lightweight, full-stack AI assistant for querying academic PDFs using LLMs + Retrieval-Augmented Generation (RAG).
Optimized for local, cloud, and GPU-accelerated environments.
**PDF Parsing & Retrieval**
- Extracts text using PyMuPDF
- Chunks + embeds text using MiniLM / BGE
- Stores embeddings in FAISS for fast top-k retrieval (see the ingestion sketch below)
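A minimal sketch of that ingestion path, assuming naive fixed-size chunking and the `all-MiniLM-L6-v2` encoder; chunk size, model name, and function names here are illustrative, not Athena's exact code:

```python
# Sketch: parse a PDF with PyMuPDF, chunk the text, embed the chunks,
# and index them in FAISS for top-k similarity search.
import fitz                     # PyMuPDF
import faiss
from sentence_transformers import SentenceTransformer

def ingest(pdf_path: str, chunk_size: int = 500):
    # 1. Extract raw text page by page
    doc = fitz.open(pdf_path)
    text = " ".join(page.get_text() for page in doc)

    # 2. Naive fixed-size chunking (Athena may use a smarter splitter)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # 3. Embed chunks with a MiniLM sentence encoder
    encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    vectors = encoder.encode(chunks, convert_to_numpy=True)

    # 4. Store vectors in a flat FAISS index
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index, chunks
```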
**Modular RAG Pipeline**
- Switchable models:
  - Mistral-7B (4-bit) → LlamaCpp
  - Phi-mini → Hugging Face
  - Mistral FP16 → Hugging Face
- Controlled via environment variables, as sketched below
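A sketch of how the `MODEL_BACKEND` switch could be wired up; the model file name and the `microsoft/Phi-3-mini-4k-instruct` model ID are illustrative assumptions, not Athena's actual configuration:

```python
# Illustrative backend switch driven by the MODEL_BACKEND env variable.
import os

def load_llm():
    backend = os.getenv("MODEL_BACKEND", "llamacpp")
    if backend == "llamacpp":
        from llama_cpp import Llama
        llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf")  # assumed path
        return lambda prompt: llm(prompt, max_tokens=512)["choices"][0]["text"]
    if backend == "huggingface":
        from transformers import pipeline
        pipe = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")  # assumed model ID
        return lambda prompt: pipe(prompt, max_new_tokens=512)[0]["generated_text"]
    raise ValueError(f"Unknown MODEL_BACKEND: {backend}")
```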
**Flexible Deployment**
- macOS (MPS acceleration) — device-probe sketch below
- AWS EC2 (GPU/CPU)
- Azure App Services
- Dockerized portability
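One way the deployment targets can share the same code is to probe for an accelerator at startup; a minimal sketch, assuming the Hugging Face path runs on PyTorch:

```python
import torch

def pick_device() -> str:
    """Return the best available accelerator on this host."""
    if torch.backends.mps.is_available():   # Apple Silicon (macOS)
        return "mps"
    if torch.cuda.is_available():            # NVIDIA GPU (e.g. EC2 GPU instances)
        return "cuda"
    return "cpu"                              # Azure App Services / CPU-only EC2
```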
**Performance Gains**
- Response time reduced from 5–10 min ➝ ~35s using quantized LlamaCpp (loading sketch below)
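The speedup comes from running 4-bit GGUF weights through llama-cpp-python with layers offloaded to Metal/CUDA; a hedged sketch (file name and parameter values are assumptions):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # 4-bit quantized weights (assumed filename)
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to Metal/CUDA when available
)
out = llm("Answer using the retrieved context: ...", max_tokens=256)
print(out["choices"][0]["text"])
```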
**User Experience Enhancements**
- Chat memory across sessions (see the session-memory sketch below)
- Summarization toggle
- Chunk-source referencing
- Interactive HTML/CSS/JS frontend
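A rough sketch of how per-session memory and chunk-source referencing could fit together; the in-memory dict, field names, and the `generate` callable are assumptions for illustration:

```python
from collections import defaultdict
from typing import Callable

chat_memory: dict[str, list[dict]] = defaultdict(list)   # session_id -> prior turns

def answer(session_id: str, question: str,
           chunks_with_sources: list[tuple[str, str]],    # (chunk_text, source_ref)
           generate: Callable[[str], str]) -> dict:
    history = chat_memory[session_id]
    context = "\n\n".join(text for text, _ in chunks_with_sources)
    prompt = f"History: {history}\nContext: {context}\nQuestion: {question}"
    reply = generate(prompt)
    chat_memory[session_id].append({"question": question, "answer": reply})
    return {"answer": reply, "sources": [src for _, src in chunks_with_sources]}
```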
```mermaid
flowchart TD
    U[User] -->|Query| FE["Frontend (HTML/CSS/JS)"]
    FE -->|REST API| BE["FastAPI Backend"]
    BE -->|Extract Text| PDF["PyMuPDF Parser"]
    PDF -->|Chunks| EMB["Embeddings (MiniLM/BGE)"]
    EMB --> FAISS[(FAISS Index)]
    BE -->|Retrieve & Rerank| FAISS
    BE -->|Model Switch| LLM["LLMs (LlamaCpp / HuggingFace)"]
    LLM --> BE
    BE -->|Response with Sources| FE
    subgraph Cloud
        S3[(AWS S3 Storage)]
    end
    PDF --> S3
```
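A sketch of the request path in the diagram above; the endpoint name, request fields, and the `retrieve`/`generate` helpers are placeholders standing in for the pieces sketched earlier, not Athena's actual API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

def retrieve(question: str, k: int = 5) -> list[dict]:
    """Placeholder for FAISS top-k chunk retrieval."""
    ...

def generate(question: str, chunks: list[dict], summarize: bool = False) -> str:
    """Placeholder for the active LLM backend."""
    ...

class Query(BaseModel):
    question: str
    summarize: bool = False

@app.post("/query")
def query(q: Query):
    chunks = retrieve(q.question, k=5)                           # top-k chunks from FAISS
    answer = generate(q.question, chunks, summarize=q.summarize)
    return {"answer": answer, "sources": [c.get("source") for c in chunks]}
```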
| Layer | Tools & Frameworks |
|---|---|
| Core AI | LlamaCpp, Hugging Face Transformers, FAISS |
| Text Processing | PyMuPDF, MiniLM, BGE |
| Backend | FastAPI |
| Frontend | HTML, CSS, JavaScript |
| Cloud | AWS S3, AWS EC2, Azure App Services |
| Deployment | Docker, macOS MPS |
```bash
git clone https://github.com/joze-Lee/Athena.git
cd Athena
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
pip install -r requirements.txt
```

Create a `.env` file:

```
MODEL_BACKEND=llamacpp    # options: llamacpp | huggingface
EMBEDDING_MODEL=MiniLM    # options: MiniLM | BGE
AWS_ACCESS_KEY=your-key
AWS_SECRET_KEY=your-secret
S3_BUCKET=athena-pdfs
```

Run the backend:

```bash
uvicorn app.main:app --reload
```

Open in browser → http://localhost:8000

Build and run with Docker:

```bash
docker build -t athena .
docker run -p 8000:8000 athena
```
- Upload a PDF → Stored in S3 (or locally)
- Query → Athena retrieves relevant chunks via FAISS (example request below)
- LLM → Generates answers with sources
- User → Can enable summarization, switch models, view references
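For example, a query against a locally running instance might look like this (the endpoint path and JSON fields are the same assumptions as in the backend sketch above):

```python
import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "What do the authors conclude?", "summarize": True},
)
payload = resp.json()
print(payload["answer"])
print(payload["sources"])   # chunk-source references
```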
| Model | Backend | Precision | Avg Response Time |
|---|---|---|---|
| Mistral-7B (4-bit) | LlamaCpp | High | ~35s |
| Mistral FP16 | Hugging Face | Higher | ~1.5–2 min |
| Phi-mini | Hugging Face | Moderate | ~50s |
- LangChain orchestration
- Multi-modal support (PDF + images)
- Domain-specific embedding fine-tuning
- WebSocket streaming responses
MIT License – see LICENSE for details.
1. Fork the repo
2. Create a feature branch
3. Submit a PR 🚀