This project implements a RAG (Retrieval-Augmented Generation) AI assistant that searches your organisation's trusted content and answers questions from it. It delivers more accurate, relevant responses grounded in your data than a general-purpose LLM app like ChatGPT.
RAG has become the most common application of AI in enterprise environments. This project focuses on two core features: data privacy and observability.
- Option to run fully on-prem (locally), with no calls leaving the intranet. For decent performance, this requires a dedicated server specced for large open-source models.
- Frontier cloud models can also be used. Privacy is then preserved by anonymizing the data in every request sent to the cloud and re-inserting the redacted values into the responses, as sketched below.
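A minimal sketch of that round trip, using a single regex as a stand-in for real PII detection (the helper names are illustrative, not this project's API):

```python
import re

# Illustrative redact -> cloud query -> restore round trip.
# A real deployment would use a proper PII detection library, not one regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.\w+")

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each email with a placeholder and remember the mapping."""
    mapping: dict[str, str] = {}
    def _sub(match: re.Match) -> str:
        placeholder = f"<PII_{len(mapping)}>"
        mapping[placeholder] = match.group(0)
        return placeholder
    return EMAIL_RE.sub(_sub, text), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Re-insert the original values into the model's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

prompt, mapping = redact("Summarise the last email from alice@example.com.")
# Only the redacted prompt ever leaves the intranet.
response = f"The last email from {list(mapping)[0]} asks about invoices."  # stand-in for the cloud call
print(restore(response, mapping))  # placeholders swapped back to the real values
```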
Observability covers both answer-quality metrics (accuracy, completeness, groundedness / hallucination rate, relevance) and operational metrics such as cost, latency, and speed. The project provides dashboards that help an admin determine the best combination of LLM models and settings for their data and organisational constraints. This is an ever-improving area of RAG, including in this project.
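As a hypothetical illustration (field names are not this project's schema), the per-query record behind such a dashboard could look like:

```python
from dataclasses import dataclass

@dataclass
class QueryEval:
    """One evaluated query; hypothetical fields, for illustration only."""
    model: str            # LLM that produced the answer
    groundedness: float   # share of answer claims supported by retrieved chunks
    relevance: float      # how well retrieved chunks match the question
    latency_ms: float     # end-to-end response time
    cost_usd: float       # per-query spend (zero for local models)

runs = [
    QueryEval("gemma3:4b", groundedness=0.84, relevance=0.79, latency_ms=4200, cost_usd=0.0),
    QueryEval("cloud-frontier-model", groundedness=0.93, relevance=0.88, latency_ms=1600, cost_usd=0.004),
]
# A dashboard aggregates many such records to compare model/setting combinations.
best = max(runs, key=lambda r: r.groundedness)
print(best.model)
```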
- Backend: Python, FastAPI, Redis, Celery (async processing)
- RAG Pipeline: Docling, LlamaIndex
- Vector DB: ChromaDB
- Search: Hybrid (BM25 + Vector + RRF; see the sketch after this list)
- LLM: Ollama (local) or cloud providers (OpenAI, Anthropic, etc.)
- Infrastructure: Docker compose
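As referenced in the Search item above, here is a minimal sketch of Reciprocal Rank Fusion (RRF), which merges the BM25 and vector rankings; the function and document ids are illustrative, not this project's code:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the constant suggested in the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]       # keyword (BM25) ranking
vector_hits = ["doc1", "doc9", "doc3"]     # embedding-similarity ranking
print(rrf_fuse([bm25_hits, vector_hits]))  # ['doc1', 'doc3', 'doc9', 'doc7']
```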
- Docker - Docker Desktop, OrbStack, or Podman
- Ollama - For local AI models (optional if using cloud only)
- 4GB RAM - Minimum for local development (expect slow inference)
- 2GB disk - For models and development data
This is a development/research project, not production-ready software. It lacks authentication, enterprise security, monitoring, and high-availability features, to name some main ones.
This project is developed using Claude Code (Anthropic) as the primary coding assistant. OpenAI GPT and Google Gemini models are also used to explore alternative implementations.
All code is reviewed, tested (TDD), and validated for correctness and security.
macOS:

```bash
# Install Ollama
brew install ollama

# Download AI models
ollama pull gemma3:4b
ollama pull nomic-embed-text

# Install Docker
brew install orbstack  # or Docker Desktop if you prefer
```

Linux:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download AI models
ollama pull gemma3:4b
ollama pull nomic-embed-text
```

```bash
# Clone the repository
git clone https://github.com/gittycat/ragbench.git
cd ragbench
```

Review and edit config.yml for your preferences.
Also edit secrets/.env with your API keys if using cloud models.

```bash
cp secrets/.env.example secrets/.env
```

```bash
# Start Ollama (if not already running)
ollama serve &

# Start RAG Lab
docker compose up -d
```

Open http://localhost:8000 in your browser.
To stop the services:

```bash
docker compose down
```

- Go to Documents page
- Click Upload and select files
- Wait for processing (progress bar shows status)
Supported formats: PDF, DOCX, PPTX, XLSX, TXT, Markdown, HTML, AsciiDoc
- Go to Chat page
- Type your question
- Get AI-powered answers with source citations
- View all uploaded documents in Documents page
- Delete documents you no longer need
- Start new chat sessions anytime
To reset all data (removes the Docker volumes):

```bash
docker compose down -v
docker compose up -d
```

For development setup, testing, and technical documentation, see DEVELOPMENT.md.
Built on the shoulders of a multitude of great open source projects. MIT License.