A retrieval‑augmented, streaming chat application built with FastAPI (Python) and React + TypeScript. It demonstrates:
- Ephemeral → persisted chat session lifecycle (persist only after first user message)
- WebSocket streaming of assistant responses (deterministic or LLM backed)
- Retrieval over a local policy corpus with inline citations
- Clickable `[id]` citations revealing document metadata
- Full assistant message persistence after streaming completes
- Optional OpenAI LLM integration for answer assembly (fallback to deterministic builder)
- Independent backend + frontend deploy (Render blueprint included)
- Features
- Tech Stack
- Repository Structure
- Quick Start (Local)
- Backend Details
- Frontend Details
- Retrieval & Grounding Logic
- Streaming Protocol
- Session Persistence Model
- Environment Variables
- Running With LLM Enabled
- Testing & Validation
- Deployment (Render + Alternatives)
- Production Hardening Tips
- Troubleshooting
- Extension Ideas
- FastAPI backend exposing REST + WebSocket endpoints under `/api`.
- React/TypeScript UI with simple dummy login gate.
- New chat always creates a fresh (ephemeral) session id; only saved after first user message.
- Retrieval (BM25-lite) across `policies.json` documents with inline `[id]` citations.
- Click citations to view doc title metadata.
- Streaming: assistant tokens -> final -> complete.
- Optional LLM (OpenAI) streaming: automatic fallback to deterministic answer if disabled or error.
- Policies index browsing page with search filter.
Backend:
- Python 3.13, FastAPI, SQLModel (SQLite), WebSockets, AnyIO, OpenAI (optional)

Frontend:
- React (CRA), TypeScript, react-router-dom v6

Retrieval:
- Custom BM25-style scorer (`store/scorer.py`) over JSON docs

Persistence:
- SQLite via SQLModel; chat history stored as JSON blob
```
backend/
  app/
    main.py            # FastAPI app factory & router includes
    config.py          # Settings (env based)
    db.py              # Session + engine setup
    models.py          # SQLModel ChatSession model
    routers/
      chat.py          # REST + WebSocket chat, policies endpoints
      health.py        # /api/health
      home.py, echo.py # Simple endpoints
    store/
      policies.json    # Policy corpus
      retrieval.py     # Index loading, retrieve(), build_grounded_answer()
      scorer.py        # BM25-lite scoring implementation
    llm.py             # Optional OpenAI streaming integration
  requirements.txt
client/
  src/
    api.ts             # REST helper (REACT_APP_API_BASE aware)
    types/chat.ts      # Chat & WS message types
    pages/             # Chats, NewMessage, Policies, etc.
    components/NavBar.tsx
render.yaml            # Render blueprint (backend + static site)
DEPLOYMENT.md          # Deployment guide
README.md              # (this file)
```
- Node.js 18+ (for CRA build)
- Python 3.11+ (3.13 used here)
```
cd backend
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn backend.app.main:app --reload --port 8000
```

Backend available at: http://localhost:8000 (API prefix `/api`).
Health check: http://localhost:8000/api/health
```
cd client
npm install
npm start
```

Frontend dev server: http://localhost:3000
- Visit http://localhost:3000
- Enter any username/password (stored only in `localStorage` for gating)
- Start a new chat: type a question; tokens stream in.
- Click `[id]` citations inside assistant replies to view the doc title.
- View all policies via the “All Policies” nav link.
Key endpoints (all under /api):
- `POST /chat/session`: create or resume a session (ephemeral until first user message)
- `GET /chat/sessions?user_id=...`: list persisted sessions
- `GET /chat/session/{session_id}?user_id=...`: fetch a session
- `DELETE /chat/session/{session_id}?user_id=...`: delete a persisted session
- `GET /chat/policies_meta`: minimal id/title list (for citation mapping)
- `GET /chat/policies`: full policy documents (transparency UI)
- `WS /chat/ws/{session_id}?user_id=...`: chat stream (send `{type:"user_message", content:"..."}`)
- `GET /health`: health probe via router `health.py` (exposed as `/api/health`)
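A quick way to exercise the REST side from a script (a sketch using `requests`; the JSON payload shape for session creation is an assumption, not confirmed by the code):

```python
import requests

API = "http://localhost:8000/api"

# Create (or resume) a session; the exact request body is assumed here.
resp = requests.post(f"{API}/chat/session", json={"user_id": "demo-user"})
resp.raise_for_status()
session = resp.json()
print("session:", session)  # expected to include a session_id

# List persisted sessions for the same user (empty until a first message is sent).
sessions = requests.get(f"{API}/chat/sessions", params={"user_id": "demo-user"}).json()
print("persisted sessions:", sessions)
```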
ChatSession:
- `session_id` (UUID string provided by the client creation endpoint)
- `user_id` (string)
- `chat_history` (list of `{role, content, ts}`) stored as JSON
- `created_at`, `updated_at`
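The model roughly corresponds to something like the following (a sketch, not the actual `models.py`; the JSON-column details are assumptions):

```python
from datetime import datetime

from sqlalchemy import JSON, Column
from sqlmodel import Field, SQLModel


class ChatSession(SQLModel, table=True):
    # The primary key is the client-generated UUID string, not an auto-increment integer.
    session_id: str = Field(primary_key=True)
    user_id: str = Field(index=True)
    # Whole history stored as one JSON blob: [{"role": ..., "content": ..., "ts": ...}, ...]
    chat_history: list = Field(default_factory=list, sa_column=Column(JSON))
    created_at: datetime = Field(default_factory=datetime.utcnow)
    updated_at: datetime = Field(default_factory=datetime.utcnow)
```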
Routing (react-router v6):
- `/welcome`: unauth gate
- `/home`: intro
- `/chats`: persisted sessions list
- `/new_chat`: always issues a new ephemeral session
- `/chat/:sessionId`: resume an existing session
- `/policies`: browse the policy corpus
WebSocket handling lives in NewMessage.tsx: it builds the ws:// / wss:// URL, streams events, and updates local state plus localStorage (client-side caching of history for offline persistence).
retrieve(query, k) ranks documents using BM25-lite. build_grounded_answer() concatenates the top-k doc texts with appended citation markers; when no documents match, it falls back to a predefined clarifying message.
The LLM path (if enabled) uses the same retrieval for context, constructs a prompt with system guardrails, and streams model deltas. Invalid citations are removed in a post-pass.
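In outline, the retrieval path looks roughly like this (a simplified sketch of retrieve()/build_grounded_answer(); the real scorer.py, document fields, and fallback wording may differ):

```python
import json
import math
import re
from collections import Counter

POLICIES_PATH = "policies.json"  # illustrative; the repo keeps this under backend/app/store/

with open(POLICIES_PATH) as f:
    DOCS = json.load(f)  # assumed shape: [{"id": ..., "title": ..., "text": ...}, ...]


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, k: int = 3) -> list[dict]:
    """Rank documents with a BM25-lite score: idf times a saturated term frequency."""
    q_terms = _tokens(query)
    n = len(DOCS)
    df = Counter(t for doc in DOCS for t in set(_tokens(doc["text"])))
    scored = []
    for doc in DOCS:
        tf = Counter(_tokens(doc["text"]))
        score = sum(
            math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * tf[t] / (tf[t] + 1.5)
            for t in q_terms
        )
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_grounded_answer(query: str, k: int = 3) -> str:
    docs = retrieve(query, k)
    if not docs:
        return "I couldn't find a relevant policy. Could you rephrase or narrow the question?"
    # Concatenate the top-k excerpts, each tagged with its citation marker.
    return "\n\n".join(f"{doc['text']} [{doc['id']}]" for doc in docs)
```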
Events (JSON) server → client:
- `history`: `{type, history[]}` initial backlog
- `ack`: acknowledgement that the user message was received/persisted
- `token`: incremental chunk (either a word from the deterministic path or a model delta substring)
- `final`: full authoritative assistant message (overwrites the partial)
- `complete`: marks the end of the assistant turn
- `error`: error message (client may show or ignore)
Client sends:
{ "type": "user_message", "content": "..." }
- Client asks for a session -> backend returns an ephemeral id without a DB entry.
- First user message over the WebSocket triggers creation + persistence (see the sketch below).
- Empty or abandoned ephemeral session ids never clutter the DB.
- During assistant streaming, an assistant message placeholder is appended early to support incremental updates.
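The lazy-persistence step amounts to a small piece of logic inside the WebSocket handler. The following is illustrative only (it reuses the ChatSession sketch above; function and field names are not the actual chat.py):

```python
from datetime import datetime

from sqlmodel import Session


def record_user_message(db: Session, session_id: str, user_id: str, content: str) -> "ChatSession":
    """Persist the session lazily: the row is created only when the first message arrives."""
    chat = db.get(ChatSession, session_id)
    if chat is None:
        # The ephemeral id becomes a real row only now.
        chat = ChatSession(session_id=session_id, user_id=user_id, chat_history=[])
    chat.chat_history = chat.chat_history + [
        {"role": "user", "content": content, "ts": datetime.utcnow().isoformat()}
    ]
    chat.updated_at = datetime.utcnow()
    db.add(chat)
    db.commit()
    db.refresh(chat)
    return chat
```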
Backend (see config.py + llm.py):
- `APP_NAME` (optional)
- `API_PREFIX` (default `/api`)
- `FRONTEND_ORIGIN` (CORS allowlist; default `http://localhost:3000`)
- `USE_LLM` set to `1` to enable LLM streaming
- `OPENAI_API_KEY` required if `USE_LLM=1`
- `LLM_MODEL` override model name
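config.py presumably wraps these in a settings object along these lines (a sketch assuming pydantic-settings; defaults beyond those listed above are guesses):

```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    app_name: str = "chat-rag-demo"                  # APP_NAME (illustrative default)
    api_prefix: str = "/api"                         # API_PREFIX
    frontend_origin: str = "http://localhost:3000"   # FRONTEND_ORIGIN (CORS allowlist)
    use_llm: bool = False                            # USE_LLM=1 enables LLM streaming
    openai_api_key: str | None = None                # OPENAI_API_KEY
    llm_model: str = "gpt-4o-mini"                   # LLM_MODEL (illustrative default)


settings = Settings()  # reads values from the environment (names are case-insensitive)
```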
Frontend build-time:
- `REACT_APP_API_BASE` e.g. `http://localhost:8000/api` or the production URL.
- (Optional future: `REACT_APP_API_WS_BASE` for an explicit websocket host)
```
# Backend
export USE_LLM=1
export OPENAI_API_KEY=sk-...yourkey...
export LLM_MODEL=gpt-4o-mini   # optional
uvicorn backend.app.main:app --reload --port 8000
```

If the OpenAI call fails, the system automatically falls back to streaming the deterministic build_grounded_answer() output.
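The fallback can be summarized roughly as follows (a sketch using the official `openai` client and the build_grounded_answer() sketch from earlier; the actual llm.py prompt construction and error handling will differ):

```python
from collections.abc import Iterator

from openai import OpenAI  # pip install openai


def stream_answer(question: str, context: str, model: str = "gpt-4o-mini") -> Iterator[str]:
    """Yield answer chunks: model deltas when the LLM works, deterministic words otherwise."""
    try:
        client = OpenAI()  # picks up OPENAI_API_KEY from the environment
        stream = client.chat.completions.create(
            model=model,
            stream=True,
            messages=[
                {"role": "system", "content": "Answer only from the provided policies; cite with [id]."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta
    except Exception:
        # Deterministic fallback: stream the grounded answer word by word.
        for word in build_grounded_answer(question).split():
            yield word + " "
```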
Current repo includes a sample test (tests/test_health.py). Run (from repo root):
```
cd backend
pytest   # if pytest is added; otherwise you can just curl the /api/health endpoint
```

Manual validation:
- Start backend, start frontend.
- Open browser dev tools Network tab.
- Send a chat message; observe WebSocket frames (token → final → complete).
- Refresh page; session history loads.
- Click citations; correct doc titles appear.
- Toggle USE_LLM to confirm new streaming style (smaller deltas, no artificial delays).
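For reference, a minimal health test in the spirit of tests/test_health.py might look like this (a sketch assuming FastAPI's TestClient; the import path is a guess and should match main.py):

```python
from fastapi.testclient import TestClient

from app.main import app  # import path is an assumption; adjust to your layout


def test_health() -> None:
    client = TestClient(app)
    resp = client.get("/api/health")
    assert resp.status_code == 200
```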
See DEPLOYMENT.md for detailed instructions (Render blueprint, Railway, Fly.io, Heroku, Vercel migration notes).
Essential Render steps summary:
1. Push repo with render.yaml
2. Render -> New Blueprint -> select repo
3. Deploy backend + frontend
4. Set REACT_APP_API_BASE to https://<backend-host>/api and redeploy frontend
5. (Optionally) set USE_LLM=1 + OPENAI_API_KEY then redeploy backend
- Add auth (JWT/session) instead of dummy gate.
- Rate limit per user_id.
- Add request logging & structured logs.
- Store embeddings + use hybrid retrieval (BM25 + vector) for larger corpora.
- Add tests for websocket flow & retrieval ranking quality.
- Introduce caching layer for retrieval results (question hash).
- Implement message truncation / token budget enforcement for LLM prompts.
- Observability: add `/api/version` and a metrics endpoint (see the sketch below).
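A version endpoint could be as small as the following (hypothetical; nothing like this exists in the repo yet):

```python
from fastapi import APIRouter

router = APIRouter()


@router.get("/version")
def version() -> dict:
    # Surface the deployed build; wire the value from an env var or package metadata.
    return {"version": "0.1.0"}
```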
| Symptom | Likely Cause | Fix |
|---|---|---|
| 404 on API calls from frontend | Incorrect REACT_APP_API_BASE | Set correct URL and rebuild frontend |
| WebSocket closed immediately | Wrong host or CORS | Ensure backend allows frontend origin; adjust ws host logic |
| Citations not clickable | Metadata fetch failed | Check /api/chat/policies_meta network request |
| LLM never streams | USE_LLM unset or key invalid | Export USE_LLM=1 & valid OPENAI_API_KEY |
| Random spaces in deterministic streaming | Word token join | Switch to LLM path or refine splitting logic |
| Empty sessions appearing | Client crash pre first message | By design they’re skipped; ok |
- Add `REACT_APP_API_WS_BASE` override + env injection in the build.
- Add conversation title generation (the list view already truncates the first user message; a dedicated title could be stored).
- Implement multi-user real auth & per-user rate limiting.
- Add semantic search with embeddings (e.g., sentence-transformers) + hybrid ranking.
- Provide downloadable conversation transcript (JSON export).
- Add dark mode / UI polish (styling system, component library).
- Dockerize full stack for Fly.io single deployment.
Feel free to request a Dockerfile, Next.js migration plan, or test harness expansion next.