MeChatBot — Gradio + LangGraph RAG (Vector → Optional GraphRAG)

A lightweight AI assistant service with a Gradio UI, LangGraph workflow orchestration, and RAG retrieval (Chroma via LlamaIndex).
Local development runs with Ollama, while production can use the OpenAI API.

  • Current: Vector RAG (LlamaIndex Retriever → Chroma Vector DB)
  • 🔜 Optional: KG-augmented RAG / GraphRAG (Graph Retriever → Neo4j Knowledge Graph)

Architecture

User (Browser)
      │
      ▼
┌───────────────────────────────┐
│            Gradio UI           │
│  - Question input              │
│  - Answer + Citations output   │
└───────────────┬───────────────┘
                │
                ▼
┌───────────────────────────────┐
│          LangGraph             │
│  Workflow Orchestration        │
│  - State management            │
│  - (optional) retry/branching  │
└───────────────┬───────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────┐
│                 Retrieval Layer                      │
│                                                     │
│  (Current) Vector RAG                               │
│   LlamaIndex Retriever ─────▶ Chroma (Vector DB)     │
│                                                     │
│  (Future, Optional) KG-augmented RAG                 │
│   Graph Retriever ─────────▶ Neo4j (Knowledge Graph) │
└───────────────┬─────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────┐
│          LLM Layer             │
│  Prompting + Invocation        │
│                               │
│  - Local Dev: Ollama           │
│  - Production: OpenAI API      │
└───────────────┬───────────────┘
                │
                ▼
        Answer + Citations

What “RAG” means here (no fine-tuning)

This project does not fine-tune the model.
Your documents are indexed into a vector database and later retrieved and injected into the prompt at question time.

  • ✅ “Remembers” via retrieval (Chroma)
  • ❌ Does not change model weights (no training / fine-tuning)

Project layout

Typical files in this repo:

.
├─ app.py              # Gradio UI entrypoint
├─ graph.py            # LangGraph workflow (decide → retrieve → generate)
├─ rag_vector.py       # Vector RAG retrieval (Chroma via LlamaIndex)
├─ rag_graph.py        # (Optional) GraphRAG retrieval (Neo4j)
├─ llm_factory.py      # LLM provider factory (Ollama vs OpenAI)
├─ ingest.py           # One-time (or on-change) docs → Chroma indexing
└─ data/docs/          # Your knowledge base source documents (md, txt, etc.)

Script responsibilities

ingest.py

Indexes your knowledge base into Chroma.

  • Reads files under data/docs/
  • Chunks documents and computes embeddings
  • Upserts embeddings into Chroma (persistent storage)

Run this before app.py (and again whenever docs change).
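
As a rough sketch (the real script may use different chunking, embedding, or collection settings), ingestion with LlamaIndex's Chroma integration can look like this:

# Sketch only — settings in the actual ingest.py may differ.
import os
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent Chroma collection (CHROMA_DIR / COLLECTION_NAME from .env)
client = chromadb.PersistentClient(path=os.getenv("CHROMA_DIR", "./storage/chroma"))
collection = client.get_or_create_collection(os.getenv("COLLECTION_NAME", "mechatbot"))

# Read and chunk everything under data/docs/
documents = SimpleDirectoryReader("data/docs").load_data()

# Embed locally via Ollama and upsert the chunks into Chroma
embed_model = OllamaEmbedding(model_name=os.getenv("OLLAMA_EMBED_MODEL", "nomic-embed-text"))
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)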

rag_vector.py

Query-time retrieval against Chroma:

  • Connects to Chroma (CHROMA_DIR, COLLECTION_NAME)
  • Retrieves top-k relevant chunks using LlamaIndex retriever
  • Returns:
    • context (joined passages)
    • sources (source path + score per passage)
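
A minimal retrieval sketch (the function name and defaults below are illustrative, not necessarily what rag_vector.py uses):

import os
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

def retrieve(question: str, top_k: int = 4):
    # Reconnect to the collection written by ingest.py
    client = chromadb.PersistentClient(path=os.getenv("CHROMA_DIR", "./storage/chroma"))
    collection = client.get_or_create_collection(os.getenv("COLLECTION_NAME", "mechatbot"))
    vector_store = ChromaVectorStore(chroma_collection=collection)
    embed_model = OllamaEmbedding(model_name=os.getenv("OLLAMA_EMBED_MODEL", "nomic-embed-text"))
    index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

    # Top-k similarity search via the LlamaIndex retriever
    nodes = index.as_retriever(similarity_top_k=top_k).retrieve(question)
    context = "\n\n".join(n.node.get_content() for n in nodes)
    sources = [(n.node.metadata.get("file_path", "unknown"), n.score) for n in nodes]
    return context, sources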

rag_graph.py (optional)

GraphRAG retrieval for entity/relationship-centric queries:

  • Uses Neo4j as a Knowledge Graph store
  • Produces additional context + sources to merge with vector results
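
Since this path is optional and not wired in yet, the following is only a hypothetical shape for it: a Cypher query against Neo4j whose rows are formatted into the same (context, sources) pair the vector retriever returns. The NEO4J_* variables, the full-text index name, and the node properties are placeholders.

import os
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI", "bolt://localhost:7687"),
    auth=(os.getenv("NEO4J_USER", "neo4j"), os.getenv("NEO4J_PASSWORD", "")),
)

def graph_retrieve(question: str, limit: int = 5):
    # Placeholder query: full-text match on entity nodes, then expand one hop
    cypher = (
        "CALL db.index.fulltext.queryNodes('entityIndex', $q) YIELD node, score "
        "MATCH (node)-[r]-(neighbor) "
        "RETURN node.name AS entity, type(r) AS rel, neighbor.name AS related "
        "LIMIT $limit"
    )
    with driver.session() as session:
        rows = session.run(cypher, q=question, limit=limit).data()
    context = "\n".join(f"{r['entity']} -[{r['rel']}]-> {r['related']}" for r in rows)
    sources = [f"neo4j:{r['entity']}" for r in rows]
    return context, sources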

graph.py

LangGraph orchestration:

  • node_decide: switches GraphRAG on/off via USE_GRAPH_RAG
  • node_vector_retrieve: runs Vector RAG
  • node_graph_retrieve: runs GraphRAG (optional)
  • node_generate: prompts the LLM using retrieved context
    and forces exactly one Citations: block in the final output.
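
A trimmed sketch of the workflow (node_decide and the GraphRAG branch are omitted; the helper and state-field names are assumptions):

from typing import List, TypedDict
from langgraph.graph import END, StateGraph

from llm_factory import get_llm   # assumed helper name
from rag_vector import retrieve   # assumed helper name

class RAGState(TypedDict):
    question: str
    context: str
    sources: List[str]
    answer: str

def node_vector_retrieve(state: RAGState) -> dict:
    context, sources = retrieve(state["question"])
    return {"context": context, "sources": sources}

def node_generate(state: RAGState) -> dict:
    # Inject the retrieved context into the prompt; the real node also
    # enforces exactly one Citations: block in the final output.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{state['context']}\n\nQuestion: {state['question']}"
    )
    answer = get_llm().invoke(prompt).content
    return {"answer": answer}

builder = StateGraph(RAGState)
builder.add_node("vector_retrieve", node_vector_retrieve)
builder.add_node("generate", node_generate)
builder.set_entry_point("vector_retrieve")
builder.add_edge("vector_retrieve", "generate")
builder.add_edge("generate", END)
workflow = builder.compile()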

llm_factory.py

Provider switching:

  • Local dev: ChatOllama(...) (requires Ollama running)
  • Production: OpenAI chat model (requires OPENAI_API_KEY)
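
The switch can be as simple as this sketch (the OpenAI model name is just an example):

import os

def get_llm():
    # LLM_PROVIDER from .env: "ollama" for local dev, "openai" for production
    if os.getenv("LLM_PROVIDER", "ollama") == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY"))
    from langchain_ollama import ChatOllama
    return ChatOllama(
        model=os.getenv("OLLAMA_MODEL", "llama3.1:8b"),
        base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
    )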

app.py

Gradio UI:

  • Chat input
  • Chat output with citations
  • CSS tuned to avoid “double scroll” (one scroll area inside the chat only)
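
At its core, the UI just hands each user turn to the compiled LangGraph workflow (a sketch; the real app adds the custom CSS, and the imported name is assumed):

import gradio as gr
from graph import workflow  # compiled LangGraph app (name assumed)

def answer(message, history):
    # Run decide → retrieve → generate for each user turn
    result = workflow.invoke({"question": message})
    return result["answer"]

demo = gr.ChatInterface(fn=answer, title="MeChatBot")

if __name__ == "__main__":
    demo.launch()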

Quickstart

1) Create a virtualenv and install deps

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2) Start Ollama (local dev)

ollama serve
# Pull your chat model (example)
ollama pull llama3.1:8b
# Pull embedding model used by ingest (example)
ollama pull nomic-embed-text

3) Index documents into Chroma

python ingest.py

4) Run the app

python app.py

Environment variables (.env)

Minimal local dev:

LLM_PROVIDER=ollama
USE_GRAPH_RAG=false

CHROMA_DIR=./storage/chroma
COLLECTION_NAME=mechatbot

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
OLLAMA_EMBED_MODEL=nomic-embed-text

MAX_CITATIONS=2

Production (OpenAI):

LLM_PROVIDER=openai
OPENAI_API_KEY=YOUR_KEY

⚠️ Never commit .env to Git.


Troubleshooting

“Connection refused” to localhost:11434

Ollama is not running:

ollama serve

404 “model not found”

Pull the model name you configured:

ollama pull llama3.1:8b

Citations look unrelated / too many files

  • Lower top_k in graph.py / rag_vector.py
  • Use MAX_CITATIONS=2 (or 1)
  • Add a similarity threshold in rag_vector.py (optional; see the sketch below)
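
One way to add that threshold is LlamaIndex's SimilarityPostprocessor; the 0.5 cutoff below is only an example value to tune:

from llama_index.core.postprocessor import SimilarityPostprocessor

nodes = index.as_retriever(similarity_top_k=top_k).retrieve(question)
# Drop weakly related chunks before building the context string
nodes = SimilarityPostprocessor(similarity_cutoff=0.5).postprocess_nodes(nodes)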

Notes on privacy

  • RAG indexing stores your docs locally (Chroma).
  • If you use OpenAI in production, retrieved text may be sent to the API at inference time.
  • With Ollama (local), inference stays on your machine.

License

Add your preferred license here.

Reset the Chroma index (rebuild the knowledge base)

Delete the persisted Chroma store, then re-run ingestion:

rm -rf ./storage/chroma
python ingest.py
