
Fulcrum

A retrieval‑augmented, streaming chat application built with FastAPI (Python) and React + TypeScript. It demonstrates:

  • Ephemeral → persisted chat session lifecycle (persist only after first user message)
  • WebSocket streaming of assistant responses (deterministic or LLM backed)
  • Retrieval over a local policy corpus with inline citations [id]
  • Clickable citations revealing document metadata
  • Full assistant message persistence after streaming completes
  • Optional OpenAI LLM integration for answer assembly (fallback to deterministic builder)
  • Independent backend + frontend deploy (Render blueprint included)

Table of Contents

  1. Features
  2. Tech Stack
  3. Repository Structure
  4. Quick Start (Local)
  5. Backend Details
  6. Frontend Details
  7. Retrieval & Grounding Logic
  8. Streaming Protocol
  9. Session Persistence Model
  10. Environment Variables
  11. Running With LLM Enabled
  12. Testing & Validation
  13. Deployment (Render + Alternatives)
  14. Production Hardening Tips
  15. Troubleshooting
  16. Extension Ideas

1. Features

  • FastAPI backend exposing REST + WebSocket endpoints under /api.
  • React/TypeScript UI with simple dummy login gate.
  • New chat always creates a fresh (ephemeral) session id; only saved after first user message.
  • Retrieval (BM25‑lite) across policies.json documents with inline [id] citations.
  • Click citations to view doc title metadata.
  • Streaming: assistant tokens -> final -> complete.
  • Optional LLM (OpenAI) streaming: automatic fallback to deterministic answer if disabled or error.
  • Policies index browsing page with search filter.

2. Tech Stack

Backend:

  • Python 3.13, FastAPI, SQLModel (SQLite), WebSockets, AnyIO, OpenAI (optional)

Frontend:

  • React (CRA), TypeScript, react-router-dom v6

Retrieval:

  • Custom BM25‑style scorer (store/scorer.py) over JSON docs

Persistence:

  • SQLite via SQLModel; chat history stored as JSON blob

3. Repository Structure

backend/
  app/
    main.py            # FastAPI app factory & router includes
    config.py          # Settings (env based)
    db.py              # Session + engine setup
    models.py          # SQLModel ChatSession model
    routers/
      chat.py          # REST + WebSocket chat, policies endpoints
      health.py        # /api/health
      home.py, echo.py # Simple endpoints
  store/
    policies.json      # Policy corpus
    retrieval.py       # Index loading, retrieve(), build_grounded_answer()
    scorer.py          # BM25-lite scoring implementation
    llm.py             # Optional OpenAI streaming integration
  requirements.txt
client/
  src/
    api.ts             # REST helper (REACT_APP_API_BASE aware)
    types/chat.ts      # Chat & WS message types
    pages/             # Chats, NewMessage, Policies, etc.
    components/NavBar.tsx
render.yaml            # Render blueprint (backend + static site)
DEPLOYMENT.md          # Deployment guide
README.md              # (this file)

4. Quick Start (Local)

Prerequisites

  • Node.js 18+ (for CRA build)
  • Python 3.11+ (3.13 used here)

4.1 Backend Setup

cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000   # from the repo root: uvicorn backend.app.main:app

Backend available at: http://localhost:8000 (API prefix /api). Health check: http://localhost:8000/api/health
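
To confirm the backend is up (assuming curl is available):

curl http://localhost:8000/api/health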

4.2 Frontend Setup

cd client
npm install
npm start

Frontend dev server: http://localhost:3000

4.3 Login & Use

  • Visit http://localhost:3000
  • Enter any username/password (stored only in localStorage for gating)
  • Start a new chat: type question; tokens stream in.
  • Click [id] citations inside assistant replies to view doc title.
  • View all policies via “All Policies” nav link.

5. Backend Details

Key endpoints (all under /api; example requests follow the list):

  • POST /chat/session - create or resume a session (ephemeral until first user message)
  • GET /chat/sessions?user_id=... - list persisted sessions
  • GET /chat/session/{session_id}?user_id=... - fetch a session
  • DELETE /chat/session/{session_id}?user_id=... - delete a persisted session
  • GET /chat/policies_meta - minimal id/title list (for citation mapping)
  • GET /chat/policies - full policy documents (transparency UI)
  • WS /chat/ws/{session_id}?user_id=... - chat stream (send {type:"user_message", content:"..."})
  • GET /health - health probe via the health.py router (exposed as /api/health)
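
A few illustrative requests (the POST body shape is an assumption here; check routers/chat.py for the exact schema):

# Create (or resume) a session
curl -X POST http://localhost:8000/api/chat/session \
  -H "Content-Type: application/json" \
  -d '{"user_id": "demo"}'

# List persisted sessions for a user
curl "http://localhost:8000/api/chat/sessions?user_id=demo"

# Minimal id/title list used for citation mapping
curl http://localhost:8000/api/chat/policies_meta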

Data Model

ChatSession:

  • session_id (UUID string provided by client creation endpoint)
  • user_id (string)
  • chat_history (list[ {role, content, ts} ]) stored as JSON
  • created_at, updated_at
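
A minimal SQLModel sketch of this shape (field names follow the list above; the column options and JSON handling in the actual models.py may differ):

from datetime import datetime, timezone
from sqlmodel import SQLModel, Field, Column, JSON

class ChatSession(SQLModel, table=True):
    # UUID string issued by the session-creation endpoint
    session_id: str = Field(primary_key=True)
    user_id: str = Field(index=True)
    # list of {role, content, ts} dicts stored as a JSON blob
    chat_history: list = Field(default_factory=list, sa_column=Column(JSON))
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))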

6. Frontend Details

Routing (react-router v6):

  • /welcome unauth gate
  • /home intro
  • /chats persisted sessions list
  • /new_chat always issues new ephemeral session
  • /chat/:sessionId resume existing session
  • /policies browse policy corpus

WebSocket handling in NewMessage.tsx: builds the ws:// or wss:// URL, streams events, and updates local state and localStorage for offline persistence (client-side caching of history).

7. Retrieval & Grounding Logic

retrieve(query, k) ranks documents with the BM25-lite scorer. build_grounded_answer() concatenates the top k document texts with appended [id] citation markers. If no documents match, a predefined clarifying message is returned instead.
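
A sketch of that deterministic flow (function names follow the prose above; the exact signatures in store/retrieval.py may differ):

# Illustrative only; the real signatures live in store/retrieval.py
from store.retrieval import retrieve, build_grounded_answer

def answer(query: str, k: int = 3) -> str:
    docs = retrieve(query, k)  # BM25-lite ranked documents
    if not docs:
        # No matching policies: return a predefined clarifying message
        return "I couldn't find a relevant policy. Could you rephrase your question?"
    # Concatenate top-k doc texts with [id] citation markers appended
    return build_grounded_answer(query, docs)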

The LLM path (if enabled) uses the same retrieval for context, constructs a prompt with system guardrails, and streams model deltas. Invalid citations are removed in a post-processing pass.

8. Streaming Protocol

Events (JSON) server → client:

  • history - {type, history[]} initial backlog of prior messages
  • ack - acknowledgement that the user message was received/persisted
  • token - incremental chunk (a word from the deterministic path or a model delta substring)
  • final - full authoritative assistant message (overwrites the partial text)
  • complete - marks the end of the assistant turn
  • error - error message (client may show or ignore)

Client sends:

  • { "type": "user_message", "content": "..." }
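
A minimal Python client sketch of this exchange (uses the third-party websockets package, which is not part of this repo; the event payload field names are assumed):

import asyncio, json
import websockets  # third-party client library, not part of this repo

async def chat(session_id: str, user_id: str, content: str):
    url = f"ws://localhost:8000/api/chat/ws/{session_id}?user_id={user_id}"
    async with websockets.connect(url) as ws:
        # Send the single message type the server expects
        await ws.send(json.dumps({"type": "user_message", "content": content}))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "token":
                print(event.get("content", ""), end="", flush=True)
            elif event["type"] == "final":
                print("\n--- final ---\n" + event.get("content", ""))
            elif event["type"] == "complete":
                break

asyncio.run(chat("some-session-id", "demo", "What is the refund policy?"))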

9. Session Persistence Model

  1. The client asks for a session; the backend returns an ephemeral id without creating a DB entry.
  2. The first user message over the WebSocket triggers creation and persistence.
  3. Empty or abandoned ephemeral session ids never clutter the DB.
  4. During assistant streaming, an assistant message placeholder is appended early to support incremental updates.
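
The lazy-persist step (2) looks roughly like this (a sketch, not the actual routers/chat.py code; the helper name is made up):

# Illustrative persist-on-first-message pattern (hypothetical helper)
def ensure_persisted(db, session_id: str, user_id: str) -> ChatSession:
    session = db.get(ChatSession, session_id)
    if session is None:
        # First user message for an ephemeral id: create the row now
        session = ChatSession(session_id=session_id, user_id=user_id, chat_history=[])
        db.add(session)
        db.commit()
        db.refresh(session)
    return session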

10. Environment Variables

Backend (see config.py + llm.py):

  • APP_NAME (optional)
  • API_PREFIX (default /api)
  • FRONTEND_ORIGIN (CORS allowlist; default http://localhost:3000)
  • USE_LLM set to 1 to enable LLM streaming
  • OPENAI_API_KEY required if USE_LLM=1
  • LLM_MODEL override model name

Frontend build-time:

  • REACT_APP_API_BASE e.g. http://localhost:8000/api or production URL. (Optional future: REACT_APP_API_WS_BASE for explicit websocket host)
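
For example (illustrative values only):

# Backend (shell or Render environment)
export API_PREFIX=/api
export FRONTEND_ORIGIN=http://localhost:3000
export USE_LLM=0

# client/.env (CRA reads REACT_APP_* variables at build time)
REACT_APP_API_BASE=http://localhost:8000/api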

11. Running With LLM Enabled

# Backend
export USE_LLM=1
export OPENAI_API_KEY=sk-...yourkey...
export LLM_MODEL=gpt-4o-mini   # optional
uvicorn backend.app.main:app --reload --port 8000

If the OpenAI call fails, the system automatically falls back to streaming the deterministic build_grounded_answer() output.
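
Conceptually, the fallback is a try/except around the LLM stream (a sketch; settings.use_llm, stream_llm_answer, and send_token are stand-in names, not the actual code in store/llm.py or routers/chat.py):

# Illustrative fallback pattern, not the actual implementation
async def stream_answer(query: str, docs, send_token):
    if settings.use_llm:
        try:
            async for delta in stream_llm_answer(query, docs):  # OpenAI streaming path
                await send_token(delta)
            return
        except Exception:
            pass  # fall through to the deterministic builder
    for word in build_grounded_answer(query, docs).split():
        await send_token(word + " ")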

12. Testing & Validation

The repo includes a sample test (tests/test_health.py). To run it (from the repo root):

cd backend
pytest  # if pytest is installed; otherwise just curl the /api/health endpoint
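
A minimal health test looks roughly like this (a sketch of tests/test_health.py using FastAPI's TestClient; the import path assumes pytest runs from backend/ and may need adjusting):

from fastapi.testclient import TestClient
from app.main import app  # adjust if running from the repo root (backend.app.main)

client = TestClient(app)

def test_health():
    response = client.get("/api/health")
    assert response.status_code == 200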

Manual validation:

  1. Start backend, start frontend.
  2. Open browser dev tools Network tab.
  3. Send a chat message; observe WebSocket frames (token → final → complete).
  4. Refresh page; session history loads.
  5. Click citations; correct doc titles appear.
  6. Toggle USE_LLM to confirm the change in streaming style (smaller deltas, no artificial delays).

13. Deployment (Render + Alternatives)

See DEPLOYMENT.md for detailed instructions (Render blueprint, Railway, Fly.io, Heroku, Vercel migration notes). Summary of the essential Render steps:

1. Push repo with render.yaml
2. Render -> New Blueprint -> select repo
3. Deploy backend + frontend
4. Set REACT_APP_API_BASE to https://<backend-host>/api and redeploy frontend
5. (Optionally) set USE_LLM=1 + OPENAI_API_KEY then redeploy backend

14. Production Hardening Tips

  • Add auth (JWT/session) instead of dummy gate.
  • Rate limit per user_id.
  • Add request logging & structured logs.
  • Store embeddings + use hybrid retrieval (BM25 + vector) for larger corpora.
  • Add tests for websocket flow & retrieval ranking quality.
  • Introduce caching layer for retrieval results (question hash).
  • Implement message truncation / token budget enforcement for LLM prompts.
  • Observability: add /api/version and metrics endpoint.

15. Troubleshooting

Symptom | Likely Cause | Fix
404 on API calls from frontend | Incorrect REACT_APP_API_BASE | Set the correct URL and rebuild the frontend
WebSocket closes immediately | Wrong host or CORS | Ensure the backend allows the frontend origin; adjust the ws host logic
Citations not clickable | Metadata fetch failed | Check the /api/chat/policies_meta network request
LLM never streams | USE_LLM unset or key invalid | Export USE_LLM=1 and a valid OPENAI_API_KEY
Random spaces in deterministic streaming | Word token join | Switch to the LLM path or refine the splitting logic
Empty sessions appearing | Client crash before first message | By design these are skipped; no action needed

16. Extension Ideas

  • Add REACT_APP_API_WS_BASE override + env injection in build.
  • Add conversation title generation (first user message truncation already in list view—could store dedicated title).
  • Implement multi-user real auth & per-user rate limiting.
  • Add semantic search with embeddings (e.g., sentence-transformers) + hybrid ranking.
  • Provide downloadable conversation transcript (JSON export).
  • Add dark mode / UI polish (styling system, component library).
  • Dockerize full stack for Fly.io single deployment.

Feel free to request a Dockerfile, Next.js migration plan, or test harness expansion next.
