Athena is a lightweight, fast, and private question-answering system powered by a quantized LLM (Mistral-7B-Instruct-v0.2.Q4_K_M) running locally with full GPU (MPS) acceleration on macOS. Built with LangChain, LlamaCpp, and FAISS, Athena enables efficient retrieval-augmented generation (RAG) for context-aware Q&A over your own data — no external API calls required.

⚡ Athena – Local/Cloud AI Research Assistant

Python FastAPI Docker MIT License

Athena is a lightweight, full-stack AI assistant for querying academic PDFs using LLMs + Retrieval-Augmented Generation (RAG).
Optimized for local, cloud, and GPU-accelerated environments.


✨ Features

- **PDF Parsing & Retrieval**
  - Extracts text using PyMuPDF
  - Chunks + embeds text using MiniLM / BGE
  - Stores embeddings in FAISS for fast top-k retrieval
- **Modular RAG Pipeline**
  - Switchable models:
    - Mistral-7B (4-bit) → LlamaCpp
    - Phi-mini → Hugging Face
    - Mistral FP16 → Hugging Face
  - Controlled via environment variables
- **Flexible Deployment**
  - macOS (MPS acceleration)
  - AWS EC2 (GPU/CPU)
  - Azure App Services
  - Dockerized portability
- **Performance Gains**
  - Response time reduced from 5–10 min ➝ ~35 s using quantized LlamaCpp
- **User Experience Enhancements**
  - Chat memory across sessions
  - Summarization toggle
  - Chunk-source referencing
  - Interactive HTML/CSS/JS frontend
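The chunking step described above can be sketched as a simple sliding window. This is a minimal, dependency-free illustration — the chunk size, overlap, and splitting strategy Athena actually uses may differ:

```python
# Minimal sliding-window chunker, illustrating the "chunks + embeds" step.
# The chunk_size/overlap values here are illustrative defaults, not Athena's
# actual settings. Each chunk would then be embedded (e.g. with MiniLM/BGE)
# and stored in a FAISS index.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.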

🏗 Architecture

```mermaid
flowchart TD
    U[User] -->|Query| FE["Frontend (HTML/CSS/JS)"]
    FE -->|REST API| BE["FastAPI Backend"]
    BE -->|Extract Text| PDF["PyMuPDF Parser"]
    PDF -->|Chunks| EMB["Embeddings (MiniLM/BGE)"]
    EMB --> FAISS[(FAISS Index)]
    BE -->|Retrieve & Rerank| FAISS
    BE -->|Model Switch| LLM["LLMs (LlamaCpp / HuggingFace)"]
    LLM --> BE
    BE -->|Response with Sources| FE
    subgraph Cloud
      S3[(AWS S3 Storage)]
    end
    PDF --> S3
```

🛠 Tech Stack

| Layer | Tools & Frameworks |
| --- | --- |
| Core AI | LlamaCpp, Hugging Face Transformers, FAISS |
| Text Processing | PyMuPDF, MiniLM, BGE |
| Backend | FastAPI |
| Frontend | HTML, CSS, JavaScript |
| Cloud | AWS S3, AWS EC2, Azure App Services |
| Deployment | Docker, macOS MPS |

⚙ Setup & Installation

1️⃣ Clone Repository

```bash
git clone https://github.com/joze-Lee/Athena.git
cd Athena
```

2️⃣ Create Virtual Environment

```bash
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```

3️⃣ Install Dependencies

```bash
pip install -r requirements.txt
```

4️⃣ Configure Environment

Create a .env file:

```env
MODEL_BACKEND=llamacpp    # options: llamacpp | huggingface
EMBEDDING_MODEL=MiniLM    # options: MiniLM | BGE
AWS_ACCESS_KEY=your-key
AWS_SECRET_KEY=your-secret
S3_BUCKET=athena-pdfs
```
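As a rough sketch, the backend might read these settings at startup like so. The `load_settings` function and its validation are illustrative assumptions, not Athena's actual code; a loader such as python-dotenv would typically populate `os.environ` from the `.env` file first:

```python
import os

# Illustrative settings reader for the .env variables above (hypothetical
# helper, not Athena's actual code). Validates the backend/embedding choices
# named in the README and falls back to documented defaults.
def load_settings() -> dict:
    backend = os.getenv("MODEL_BACKEND", "llamacpp")
    embedding = os.getenv("EMBEDDING_MODEL", "MiniLM")
    if backend not in {"llamacpp", "huggingface"}:
        raise ValueError(f"Unsupported MODEL_BACKEND: {backend}")
    if embedding not in {"MiniLM", "BGE"}:
        raise ValueError(f"Unsupported EMBEDDING_MODEL: {embedding}")
    return {
        "model_backend": backend,
        "embedding_model": embedding,
        "s3_bucket": os.getenv("S3_BUCKET", "athena-pdfs"),
    }
```

Failing fast on an unknown backend keeps a typo in `.env` from silently loading the wrong model.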

▶️ Running Athena

```bash
uvicorn app.main:app --reload
```

Then open http://localhost:8000 in your browser.

Docker

```bash
docker build -t athena .
docker run -p 8000:8000 athena
```

- **AWS EC2** → GPU/CPU instances + S3 integration
- **Azure App Service** → deploy the FastAPI container
- **macOS MPS** → enable with `LLAMA_CPP_MPS=1`

💡 Usage

- Upload a PDF → Stored in S3 (or locally)
- Query → Athena retrieves relevant chunks via FAISS
- LLM → Generates answers with sources
- User → Can enable summarization, switch models, view references
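For intuition, the retrieval step in the flow above amounts to a nearest-neighbor search over chunk embeddings. Here is a dependency-free sketch of top-k cosine-similarity retrieval — FAISS performs the same search, but over an optimized index; `cosine` and `top_k` are illustrative, not Athena's code:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return indices of the k chunk embeddings most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The returned indices map back to the original chunks, which is what enables the chunk-source referencing feature.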

📊 Benchmarks

| Model | Backend | Precision | Avg Response Time |
| --- | --- | --- | --- |
| Mistral-7B (4-bit) | LlamaCpp | High | ~35 s |
| Mistral FP16 | Hugging Face | Higher | ~1.5–2 min |
| Phi-mini | Hugging Face | Moderate | ~50 s |

🧩 Roadmap

- [ ] LangChain orchestration
- [ ] Multi-modal support (PDF + images)
- [ ] Domain-specific embedding fine-tuning
- [ ] WebSocket streaming responses

📜 License

MIT License – see [LICENSE](LICENSE) for details.

🤝 Contributing

1. Fork the repo
2. Create a feature branch
3. Submit a PR 🚀
