A retrieval‑augmented, streaming chat application built with FastAPI (Python) and React + TypeScript. It demonstrates:
- Ephemeral → persisted chat session lifecycle (persist only after first user message)
- WebSocket streaming of assistant responses (deterministic or LLM backed)
- Retrieval over a local policy corpus with inline citations
- Clickable `[id]` citations revealing document metadata
- Full assistant message persistence after streaming completes
- Optional OpenAI LLM integration for answer assembly (fallback to deterministic builder)
- Independent backend + frontend deploy (Render blueprint included)
- Features
- Tech Stack
- Repository Structure
- Quick Start (Local)
- Backend Details
- Frontend Details
- Retrieval & Grounding Logic
- Streaming Protocol
- Session Persistence Model
- Environment Variables
- Running With LLM Enabled
- Testing & Validation
- Deployment (Render + Alternatives)
- Production Hardening Tips
- Troubleshooting
- Extension Ideas
- FastAPI backend exposing REST + WebSocket endpoints under `/api`.
- React/TypeScript UI with simple dummy login gate.
- New chat always creates a fresh (ephemeral) session id; only saved after first user message.
- Retrieval (BM25-lite) across `policies.json` documents with inline `[id]` citations.
- Click citations to view doc title metadata.
- Streaming: assistant tokens -> final -> complete.
- Optional LLM (OpenAI) streaming: automatic fallback to deterministic answer if disabled or error.
- Policies index browsing page with search filter.
Backend:
- Python 3.13, FastAPI, SQLModel (SQLite), WebSockets, AnyIO, OpenAI (optional)

Frontend:
- React (CRA), TypeScript, react-router-dom v6

Retrieval:
- Custom BM25-style scorer (`store/scorer.py`) over JSON docs

Persistence:
- SQLite via SQLModel; chat history stored as JSON blob
```
backend/
  app/
    main.py            # FastAPI app factory & router includes
    config.py          # Settings (env based)
    db.py              # Session + engine setup
    models.py          # SQLModel ChatSession model
    routers/
      chat.py          # REST + WebSocket chat, policies endpoints
      health.py        # /api/health
      home.py, echo.py # Simple endpoints
    store/
      policies.json    # Policy corpus
      retrieval.py     # Index loading, retrieve(), build_grounded_answer()
      scorer.py        # BM25-lite scoring implementation
    llm.py             # Optional OpenAI streaming integration
  requirements.txt
client/
  src/
    api.ts             # REST helper (REACT_APP_API_BASE aware)
    types/chat.ts      # Chat & WS message types
    pages/             # Chats, NewMessage, Policies, etc.
    components/NavBar.tsx
render.yaml            # Render blueprint (backend + static site)
DEPLOYMENT.md          # Deployment guide
README.md              # (this file)
```
- Node.js 18+ (for CRA build)
- Python 3.11+ (3.13 used here)
```
cd backend
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn backend.app.main:app --reload --port 8000
```

Backend available at: http://localhost:8000 (API prefix `/api`).
Health check: http://localhost:8000/api/health
```
cd client
npm install
npm start
```

Frontend dev server: http://localhost:3000
- Visit http://localhost:3000
- Enter any username/password (stored only in `localStorage` for gating)
- Start a new chat: type a question; tokens stream in.
- Click `[id]` citations inside assistant replies to view the doc title.
- View all policies via the “All Policies” nav link.
Key endpoints (all under /api):
- `POST /chat/session`: create or resume a session (ephemeral until first user message)
- `GET /chat/sessions?user_id=...`: list persisted sessions
- `GET /chat/session/{session_id}?user_id=...`: fetch a session
- `DELETE /chat/session/{session_id}?user_id=...`: delete a persisted session
- `GET /chat/policies_meta`: minimal id/title list (for citation mapping)
- `GET /chat/policies`: full policy documents (transparency UI)
- `WS /chat/ws/{session_id}?user_id=...`: chat stream (send `{type:"user_message", content:"..."}`)
- `GET /health`: health probe via router `health.py` (exposed as `/api/health`)
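A quick way to exercise the REST side from a script (a sketch using `requests`; the JSON payload shape for session creation is an assumption, not confirmed by the code):

```python
import requests

API = "http://localhost:8000/api"

# Create (or resume) a session; the exact request body is assumed here.
resp = requests.post(f"{API}/chat/session", json={"user_id": "demo-user"})
resp.raise_for_status()
session = resp.json()
print("session:", session)  # expected to include a session_id

# List persisted sessions for the same user (empty until a first message is sent).
sessions = requests.get(f"{API}/chat/sessions", params={"user_id": "demo-user"}).json()
print("persisted sessions:", sessions)
```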
ChatSession:
- `session_id` (UUID string provided by the client creation endpoint)
- `user_id` (string)
- `chat_history` (list of `{role, content, ts}`) stored as JSON
- `created_at`, `updated_at`
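The model roughly corresponds to something like the following (a sketch, not the actual `models.py`; the JSON-column details are assumptions):

```python
from datetime import datetime

from sqlalchemy import JSON, Column
from sqlmodel import Field, SQLModel


class ChatSession(SQLModel, table=True):
    # The primary key is the client-generated UUID string, not an auto-increment integer.
    session_id: str = Field(primary_key=True)
    user_id: str = Field(index=True)
    # Whole history stored as one JSON blob: [{"role": ..., "content": ..., "ts": ...}, ...]
    chat_history: list = Field(default_factory=list, sa_column=Column(JSON))
    created_at: datetime = Field(default_factory=datetime.utcnow)
    updated_at: datetime = Field(default_factory=datetime.utcnow)
```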
Routing (react-router v6):
- `/welcome`: unauth gate
- `/home`: intro
- `/chats`: persisted sessions list
- `/new_chat`: always issues a new ephemeral session
- `/chat/:sessionId`: resume an existing session
- `/policies`: browse the policy corpus
WebSocket handling lives in NewMessage.tsx: it builds the ws:// / wss:// URL, streams events, and updates local state plus localStorage (client-side caching of history for offline persistence).
retrieve(query, k) ranks documents using BM25-lite. build_grounded_answer() concatenates the top-k doc texts with appended citation markers; when no documents match, it falls back to a predefined clarifying message.
The LLM path (if enabled) uses the same retrieval for context, constructs a prompt with system guardrails, and streams model deltas. Invalid citations are removed in a post-pass.
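In outline, the retrieval path looks roughly like this (a simplified sketch of retrieve()/build_grounded_answer(); the real scorer.py, document fields, and fallback wording may differ):

```python
import json
import math
import re
from collections import Counter

POLICIES_PATH = "policies.json"  # illustrative; the repo keeps this under backend/app/store/

with open(POLICIES_PATH) as f:
    DOCS = json.load(f)  # assumed shape: [{"id": ..., "title": ..., "text": ...}, ...]


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, k: int = 3) -> list[dict]:
    """Rank documents with a BM25-lite score: idf times a saturated term frequency."""
    q_terms = _tokens(query)
    n = len(DOCS)
    df = Counter(t for doc in DOCS for t in set(_tokens(doc["text"])))
    scored = []
    for doc in DOCS:
        tf = Counter(_tokens(doc["text"]))
        score = sum(
            math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * tf[t] / (tf[t] + 1.5)
            for t in q_terms
        )
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_grounded_answer(query: str, k: int = 3) -> str:
    docs = retrieve(query, k)
    if not docs:
        return "I couldn't find a relevant policy. Could you rephrase or narrow the question?"
    # Concatenate the top-k excerpts, each tagged with its citation marker.
    return "\n\n".join(f"{doc['text']} [{doc['id']}]" for doc in docs)
```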
Events (JSON) server → client:
- `history`: `{type, history[]}` initial backlog
- `ack`: acknowledgement that the user message was received/persisted
- `token`: incremental chunk (either a word from the deterministic path or a model delta substring)
- `final`: full authoritative assistant message (overwrites the partial)
- `complete`: marks the end of the assistant turn
- `error`: error message (client may show or ignore)
Client sends:
{ "type": "user_message", "content": "..." }
- Client asks for a session -> backend returns an ephemeral id without a DB entry.
- First user message over the WebSocket triggers creation + persistence (see the sketch below).
- Empty or abandoned ephemeral session ids never clutter the DB.
- During assistant streaming, an assistant message placeholder is appended early to support incremental updates.
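The lazy-persistence step amounts to a small piece of logic inside the WebSocket handler. The following is illustrative only (it reuses the ChatSession sketch above; function and field names are not the actual chat.py):

```python
from datetime import datetime

from sqlmodel import Session


def record_user_message(db: Session, session_id: str, user_id: str, content: str) -> "ChatSession":
    """Persist the session lazily: the row is created only when the first message arrives."""
    chat = db.get(ChatSession, session_id)
    if chat is None:
        # The ephemeral id becomes a real row only now.
        chat = ChatSession(session_id=session_id, user_id=user_id, chat_history=[])
    chat.chat_history = chat.chat_history + [
        {"role": "user", "content": content, "ts": datetime.utcnow().isoformat()}
    ]
    chat.updated_at = datetime.utcnow()
    db.add(chat)
    db.commit()
    db.refresh(chat)
    return chat
```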
Backend (see config.py + llm.py):
- `APP_NAME` (optional)
- `API_PREFIX` (default `/api`)
- `FRONTEND_ORIGIN` (CORS allowlist; default `http://localhost:3000`)
- `USE_LLM` set to `1` to enable LLM streaming
- `OPENAI_API_KEY` required if `USE_LLM=1`
- `LLM_MODEL` override model name
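config.py presumably wraps these in a settings object along these lines (a sketch assuming pydantic-settings; defaults beyond those listed above are guesses):

```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    app_name: str = "chat-rag-demo"                  # APP_NAME (illustrative default)
    api_prefix: str = "/api"                         # API_PREFIX
    frontend_origin: str = "http://localhost:3000"   # FRONTEND_ORIGIN (CORS allowlist)
    use_llm: bool = False                            # USE_LLM=1 enables LLM streaming
    openai_api_key: str | None = None                # OPENAI_API_KEY
    llm_model: str = "gpt-4o-mini"                   # LLM_MODEL (illustrative default)


settings = Settings()  # reads values from the environment (names are case-insensitive)
```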
Frontend build-time:
- `REACT_APP_API_BASE` e.g. `http://localhost:8000/api` or the production URL.
- (Optional future: `REACT_APP_API_WS_BASE` for an explicit websocket host)
```
# Backend
export USE_LLM=1
export OPENAI_API_KEY=sk-...yourkey...
export LLM_MODEL=gpt-4o-mini   # optional
uvicorn backend.app.main:app --reload --port 8000
```

If the OpenAI call fails, the system automatically falls back to streaming the deterministic build_grounded_answer() output.
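The fallback can be summarized roughly as follows (a sketch using the official `openai` client and the build_grounded_answer() sketch from earlier; the actual llm.py prompt construction and error handling will differ):

```python
from collections.abc import Iterator

from openai import OpenAI  # pip install openai


def stream_answer(question: str, context: str, model: str = "gpt-4o-mini") -> Iterator[str]:
    """Yield answer chunks: model deltas when the LLM works, deterministic words otherwise."""
    try:
        client = OpenAI()  # picks up OPENAI_API_KEY from the environment
        stream = client.chat.completions.create(
            model=model,
            stream=True,
            messages=[
                {"role": "system", "content": "Answer only from the provided policies; cite with [id]."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta
    except Exception:
        # Deterministic fallback: stream the grounded answer word by word.
        for word in build_grounded_answer(question).split():
            yield word + " "
```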
Current repo includes a sample test (tests/test_health.py). Run (from repo root):
```
cd backend
pytest   # if pytest is added; otherwise you can just curl the /api/health endpoint
```

Manual validation:
- Start backend, start frontend.
- Open browser dev tools Network tab.
- Send a chat message; observe WebSocket frames (token → final → complete).
- Refresh page; session history loads.
- Click citations; correct doc titles appear.
- Toggle USE_LLM to confirm new streaming style (smaller deltas, no artificial delays).
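For reference, a minimal health test in the spirit of tests/test_health.py might look like this (a sketch assuming FastAPI's TestClient; the import path is a guess and should match main.py):

```python
from fastapi.testclient import TestClient

from app.main import app  # import path is an assumption; adjust to your layout


def test_health() -> None:
    client = TestClient(app)
    resp = client.get("/api/health")
    assert resp.status_code == 200
```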
See DEPLOYMENT.md for detailed instructions (Render blueprint, Railway, Fly.io, Heroku, Vercel migration notes).
Essential Render steps summary:
1. Push repo with render.yaml
2. Render -> New Blueprint -> select repo
3. Deploy backend + frontend
4. Set REACT_APP_API_BASE to https://<backend-host>/api and redeploy frontend
5. (Optionally) set USE_LLM=1 + OPENAI_API_KEY then redeploy backend
- Add auth (JWT/session) instead of dummy gate.
- Rate limit per user_id.
- Add request logging & structured logs.
- Store embeddings + use hybrid retrieval (BM25 + vector) for larger corpora.
- Add tests for websocket flow & retrieval ranking quality.
- Introduce caching layer for retrieval results (question hash).
- Implement message truncation / token budget enforcement for LLM prompts.
- Observability: add `/api/version` and a metrics endpoint (see the sketch below).
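A version endpoint could be as small as the following (hypothetical; nothing like this exists in the repo yet):

```python
from fastapi import APIRouter

router = APIRouter()


@router.get("/version")
def version() -> dict:
    # Surface the deployed build; wire the value from an env var or package metadata.
    return {"version": "0.1.0"}
```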
| Symptom | Likely Cause | Fix |
|---|---|---|
| 404 on API calls from frontend | Incorrect REACT_APP_API_BASE | Set correct URL and rebuild frontend |
| WebSocket closed immediately | Wrong host or CORS | Ensure backend allows frontend origin; adjust ws host logic |
| Citations not clickable | Metadata fetch failed | Check /api/chat/policies_meta network request |
| LLM never streams | USE_LLM unset or key invalid | Export USE_LLM=1 & valid OPENAI_API_KEY |
| Random spaces in deterministic streaming | Word token join | Switch to LLM path or refine splitting logic |
| Empty sessions appearing | Client crash pre first message | By design they’re skipped; ok |
- Add `REACT_APP_API_WS_BASE` override + env injection in the build.
- Add conversation title generation (the list view already truncates the first user message; a dedicated title could be stored).
- Implement multi-user real auth & per-user rate limiting.
- Add semantic search with embeddings (e.g., sentence-transformers) + hybrid ranking.
- Provide downloadable conversation transcript (JSON export).
- Add dark mode / UI polish (styling system, component library).
- Dockerize full stack for Fly.io single deployment.
Feel free to request a Dockerfile, Next.js migration plan, or test harness expansion next.