231 changes: 231 additions & 0 deletions docs/LLM-Powered-Manuals-Assistant-Plan.md
@@ -0,0 +1,231 @@
# LLM-Powered Manuals Assistant

**Overview:** Add a conversational troubleshooting assistant to the manuals section, leveraging the existing FTS5 RAG infrastructure. Optimized for quality over cost — on-demand expert use at low volume, not high-throughput cheap queries. The assistant retrieves relevant manual excerpts, cites sources, and streams responses via SSE.

**Model:** Claude Sonnet 4.5 (~$7/mo at 10 queries/day). Best technical reasoning, excellent instruction-following for RAG grounding, familiar SDK.

**Usage profile:** On-demand, ~5-10 queries/day when actively troubleshooting. Cost is negligible at this volume — optimize for best possible answers.

---

## Todos

- [ ] **Phase 0:** Spike — validate Claude Sonnet 4.5 + FTS5 RAG pipeline with real manual queries
- [ ] **Phase 1.1:** Create `llm_service.py` (Anthropic SDK wrapper, sync, Sonnet 4.5)
- [ ] **Phase 1.2:** Create system prompt in `src/prompts/manuals_assistant.py`
- [ ] **Phase 1.3:** Add `get_context_for_llm()` to `manuals_service.py`
- [ ] **Phase 1.4:** Create `chat_service.py` (context assembly, conversation history, token management)
- [ ] **Phase 1.5:** Create chat routes + mobile-first chat UI with SSE streaming
- [ ] **Phase 1.6:** Error handling, tests

---

## Architecture Overview

```mermaid
flowchart TB
    subgraph ui [Chat Interface]
        ChatUI[Chat Component]
        SearchUI[Existing Search]
    end

    subgraph services [Service Layer]
        ChatService[Chat Service]
        LLMService[LLM Service]
        ManualsService[Existing Manuals Service]
    end

    subgraph storage [Data Layer]
        ChatHistory[(chat_sessions)]
        FTS5[(engine_search.db)]
    end

    AnthropicAPI[Anthropic API]

    ChatUI --> ChatService
    ChatService --> LLMService
    ChatService --> ManualsService
    LLMService -->|Claude Sonnet 4.5| AnthropicAPI
    ManualsService --> FTS5
    ChatService --> ChatHistory
```

---

## Phase 0: Spike (1 session)

**Goal:** Validate Claude Sonnet 4.5 with real manual content before building infrastructure.

Build a standalone script that:
1. Takes a query (e.g., "3516 fuel rack actuator troubleshooting")
2. Calls `search_manuals()` to get top 5 FTS5 results
3. Formats results as structured context with citations
4. Sends to Claude Sonnet 4.5 via the Anthropic SDK with the marine engineering system prompt
5. Streams the response
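
A minimal sketch of the spike script, assuming `search_manuals()` returns dicts with `content`, `document`, and `page` keys (adjust to whatever the real rows look like):

```python
"""Standalone spike: FTS5 retrieval -> Claude Sonnet 4.5, streamed to stdout."""
import sys

import anthropic

from services.manuals_service import search_manuals  # existing FTS5 search

SYSTEM_PROMPT = "You are a marine engineering assistant..."  # real prompt comes in 1.2


def main() -> None:
    query = sys.argv[1] if len(sys.argv) > 1 else "3516 fuel rack actuator troubleshooting"
    results = search_manuals(query, limit=5)  # assumed signature
    context = "\n\n".join(
        f"[{r['document']} p.{r['page']}]\n{r['content']}" for r in results
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    with client.messages.stream(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)  # time-to-first-token is visible here
    print()


if __name__ == "__main__":
    main()
```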

**Validate:**
- Response quality on technical content (torque specs, clearances, diagnostic codes)
- Citation accuracy (does it reference the right manual sections?)
- Handling of multi-step procedures (step-by-step clarity)
- Behavior when RAG context doesn't contain the answer (does it say so or hallucinate?)
- Streaming latency (time to first token)
- Anthropic SDK reliability (streaming, error messages, token counting)

**Kill decision:** If response quality is poor with real manual content, reassess model choice before building infrastructure.

---

## Phase 1: Core Chat + RAG

### 1.1 LLM Service

Create `src/services/llm_service.py`:

- Wrapper around the Anthropic Python SDK (`anthropic`)
- **Synchronous** — Flask is sync, keep it simple
- Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`)
- Retry with exponential backoff (3 attempts)
- Configurable timeout (30s default)

```python
from typing import Iterator

class LLMService:
    def complete(self, messages: list[dict], context: str) -> str: ...
    def stream(self, messages: list[dict], context: str) -> Iterator[str]: ...
    def count_tokens(self, text: str) -> int: ...
```

One clean wrapper around the Anthropic SDK. If you ever need to swap providers, refactor one file.
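
A sketch of the streaming path with retry, assuming the RAG context is folded into the system prompt (`SYSTEM_PROMPT` comes from 1.2; the exact assembly may differ):

```python
import time
from typing import Iterator

import anthropic

from prompts.manuals_assistant import SYSTEM_PROMPT  # see 1.2


class LLMService:
    def __init__(self, api_key: str, model: str, timeout: float = 30.0) -> None:
        self.client = anthropic.Anthropic(api_key=api_key, timeout=timeout)
        self.model = model

    def stream(self, messages: list[dict], context: str) -> Iterator[str]:
        system = f"{SYSTEM_PROMPT}\n\n{context}"  # fold RAG context into the system prompt
        for attempt in range(3):
            emitted = False
            try:
                with self.client.messages.stream(
                    model=self.model,
                    max_tokens=2048,
                    system=system,
                    messages=messages,
                ) as stream:
                    for text in stream.text_stream:
                        emitted = True
                        yield text
                return
            except anthropic.APIError:
                # Never retry mid-stream: the client has already seen tokens.
                if emitted or attempt == 2:
                    raise
                time.sleep(2 ** attempt)  # 1s, then 2s before retrying
```

Note the Anthropic SDK already retries some transient failures internally, so treat the wrapper's three attempts as an upper bound.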

### 1.2 System Prompt

Create `src/prompts/manuals_assistant.py`:

This is where the feature quality lives. The prompt must handle:

- **Identity:** Marine engineering assistant for CAT engines (3516, C18, C32, C4.4)
- **Grounding:** Use retrieved manual excerpts as authoritative source. Always cite document name and page number.
- **Honesty:** If the retrieved context doesn't contain the answer, say so explicitly. Never hallucinate specs, clearances, or procedures.
- **Safety:** For safety-critical values (torque specs, valve clearances, pressure limits), quote the manual verbatim and recommend verifying against the physical manual.
- **Clarification:** Ask about specific equipment model, symptoms, and operating conditions before diagnosing.
- **Scope:** Decline questions outside the indexed manual content. Redirect to search.

```python
SYSTEM_PROMPT = """..."""


def format_context(results: list[dict]) -> str:
    """Format RAG results into structured context for the LLM."""
    ...


def build_messages(system: str, context: str, history: list, query: str) -> list:
    """Assemble full message list within token budget."""
    ...
```
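
One plausible shape for `format_context`, assuming each result dict carries the keys from 1.3 (`content`, `source`, `page`):

```python
def format_context(results: list[dict]) -> str:
    """Format RAG results into structured context for the LLM."""
    blocks = [
        f"[Source {i}: {r['source']}, p.{r['page']}]\n{r['content']}"
        for i, r in enumerate(results, start=1)
    ]
    return (
        "Retrieved manual excerpts (cite answers by source and page):\n\n"
        + "\n\n".join(blocks)
    )
```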

### 1.3 RAG Integration

Enhance `src/services/manuals_service.py`:

- Add `get_context_for_llm(query: str, limit: int = 5) -> list[dict]`
- Return structured results with: content, source document, page number, authority level
- Leverage existing `search_manuals()` and `search_cards()`
- Format with clear citation markers for the LLM
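
A sketch of the new method; the key names on `search_manuals()` rows are assumptions to be corrected against the real schema:

```python
def get_context_for_llm(query: str, limit: int = 5) -> list[dict]:
    """Return top FTS5 hits in a citation-friendly shape for the LLM."""
    hits = search_manuals(query, limit=limit)  # same module, existing function
    return [
        {
            "content": h["snippet"],      # assumed column names
            "source": h["document"],
            "page": h.get("page"),
            "authority": h.get("authority", "manual"),
        }
        for h in hits
    ]
```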

### 1.4 Chat Service

Create `src/services/chat_service.py`:

- Context assembly from RAG search results
- Conversation history (in-memory, max 10 turns per session)
- Token budget management (system prompt + RAG context + history fits within model limits)
- Formats the full prompt: system + context + history + user query
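
A minimal sketch of the history-trimming side of token budgeting (drop the oldest turns first; never drop the system prompt or the fresh RAG context):

```python
from typing import Callable


def fit_history(
    history: list[dict], budget: int, count_tokens: Callable[[str], int]
) -> list[dict]:
    """Trim oldest turns until the remaining history fits the token budget."""
    trimmed = list(history[-10:])  # hard cap: 10 turns per session
    while trimmed and sum(count_tokens(m["content"]) for m in trimmed) > budget:
        trimmed.pop(0)  # oldest turn goes first
    return trimmed
```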

### 1.5 Chat Routes + UI

Create `src/routes/chat.py` and `templates/manuals/chat.html`:

**Routes:**
- `GET /manuals/chat` — Chat interface
- `POST /api/chat/message` — Send message, get streamed response (SSE)

**UI:**
- Mobile-first chat interface matching existing design system
- Streaming response display (tokens appear as they arrive)
- Source citations as tappable links to manual sections
- Clear conversation / new chat button

**Streaming:** Use Flask's `Response(stream_with_context(generator))` with `text/event-stream` content type. Sync generator from `LLMService.stream()`.
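
A sketch of the SSE endpoint; `chat_service.stream_reply()` is a hypothetical name for the 1.4 entry point, and the GET view is omitted:

```python
import json

from flask import Blueprint, Response, request, stream_with_context

from services.chat_service import chat_service  # assumed module-level instance

chat_bp = Blueprint("chat", __name__)


@chat_bp.route("/api/chat/message", methods=["POST"])
def chat_message():
    query = request.get_json()["message"]

    def generate():
        for token in chat_service.stream_reply(query):
            # JSON-encode tokens so embedded newlines can't break SSE framing
            yield f"data: {json.dumps({'token': token})}\n\n"
        yield "data: [DONE]\n\n"

    return Response(stream_with_context(generate()), mimetype="text/event-stream")
```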

### 1.6 Error Handling + Tests

**Degradation path:**

| Condition | Behavior |
|-----------|----------|
| API works | Stream LLM response with citations |
| API slow (>30s) | Show "Thinking..." with cancel button, timeout |
| API fails (500) | Show error + FTS5 search results as fallback |
| API rate-limited (429) | Retry once after backoff, then show error |

**Tests:**
- Unit tests for `get_context_for_llm()` — context assembly and formatting
- Unit tests for system prompt building and token budget management
- Integration tests with mocked LLM responses — full pipeline without API calls (see the sketch after this list)
- One end-to-end test hitting the real API (marked `@pytest.mark.slow`)
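
A sketch of one mocked-pipeline test; the monkeypatch target and the `client` fixture are assumptions about the eventual layout:

```python
def test_chat_streams_mocked_reply(monkeypatch, client):
    """Full route-to-response pipeline with the LLM stubbed out."""
    monkeypatch.setattr(
        "services.llm_service.LLMService.stream",
        lambda self, messages, context: iter(["Check the ", "rack actuator."]),
    )
    resp = client.post("/api/chat/message", json={"message": "3516 fuel rack"})
    assert resp.status_code == 200
    assert b"rack actuator" in resp.data  # test client drains the SSE stream
```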

---

## Key Files to Create/Modify

**New Files:**

- `src/services/llm_service.py` — LLM wrapper
- `src/services/chat_service.py` — Chat logic + context assembly
- `src/prompts/manuals_assistant.py` — System prompt + formatting
- `src/routes/chat.py` — Chat endpoints
- `templates/manuals/chat.html` — Chat UI

**Modified Files:**

- `src/services/manuals_service.py` — Add `get_context_for_llm()`
- `src/models.py` — Add `ChatSession` model
- `src/app.py` — Register chat blueprint
- `src/config.py` — Add LLM API key + chat settings
- `requirements.txt` — Add `anthropic`

---

## Security Considerations

- Never send PII to LLM APIs
- Sanitize manual content before sending (strip any crew names from context; see the sketch after this list)
- Rate limiting on chat endpoints (existing infrastructure)
- API key stored in environment variable, never in code
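
A sketch of the sanitization step, assuming a known crew-name list is available (the real source of names is TBD):

```python
import re


def scrub_names(text: str, crew_names: list[str]) -> str:
    """Redact known crew names from RAG context before it leaves the app."""
    for name in crew_names:
        text = re.sub(re.escape(name), "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```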

---

## Database Migration

New table:

- `chat_sessions` — Conversation history (user_id, messages JSON, created_at, updated_at)

---

## Session Plan

| Session | Deliverable | Output |
|---------|-------------|--------|
| **1** | Spike | Validate Sonnet 4.5 + FTS5 RAG with real queries. Go/no-go. |
| **2** | Services | `llm_service.py`, `chat_service.py`, `get_context_for_llm()`, system prompt |
| **3** | UI + Routes | Chat route, SSE streaming, mobile chat UI |
| **4** | Hardening | Error handling, fallback behavior, tests |

---

## Future Considerations (Post-Ship)

Evaluate after using the assistant on real troubleshooting scenarios:

- Conversation persistence across sessions
- Response caching for repeated questions
- User pattern tracking (search history, preferred docs)
- Guided troubleshooting workflows with step-by-step diagnosis
- Upgrade to Opus if deeper reasoning is needed on complex diagnostics
36 changes: 36 additions & 0 deletions migrations/versions/d4db138494c9_add_chat_sessions_table.py
@@ -0,0 +1,36 @@
"""add chat_sessions table

Revision ID: d4db138494c9
Revises: 2e194345a0a0
Create Date: 2026-02-05 21:39:40.892152

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = 'd4db138494c9'
down_revision = '2e194345a0a0'
branch_labels = None
depends_on = None


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('chat_sessions',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('user_id', sa.Integer(), nullable=False),
        sa.Column('messages', sa.Text(), nullable=False),
        sa.Column('created_at', sa.DateTime(), nullable=False),
        sa.Column('updated_at', sa.DateTime(), nullable=False),
        sa.ForeignKeyConstraint(['user_id'], ['users.id']),
        sa.PrimaryKeyConstraint('id')
    )
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('chat_sessions')
    # ### end Alembic commands ###
3 changes: 3 additions & 0 deletions requirements.txt
@@ -25,6 +25,9 @@ reportlab==4.2.5
# OCR / Google Cloud Vision
google-cloud-vision==3.8.1

# LLM
anthropic>=0.39.0

# Production Server
gunicorn==23.0.0

8 changes: 7 additions & 1 deletion src/app.py
@@ -169,16 +169,22 @@ def health_check():
from routes.auth import auth_bp
from routes.secure_api import secure_api_bp, init_secure_api
from routes.manuals import manuals_bp
from routes.chat import chat_bp

# Register all APIs
app.register_blueprint(api_bp, url_prefix="/api")
app.register_blueprint(auth_bp, url_prefix="/auth")
app.register_blueprint(secure_api_bp, url_prefix="/api/v1")
app.register_blueprint(manuals_bp) # url_prefix already set in blueprint

app.register_blueprint(chat_bp) # url_prefix set in blueprint (/manuals/chat)

# Initialize secure API rate limiter with app
init_secure_api(app)

# Initialize LLM service (graceful if no API key)
from services.llm_service import create_llm_service
create_llm_service(app)

# Main routes
@app.route("/")
@login_required
7 changes: 7 additions & 0 deletions src/config.py
@@ -46,6 +46,13 @@ class Config:
    SESSION_COOKIE_HTTPONLY = True
    SESSION_COOKIE_SAMESITE = "Lax"

    # LLM / Chat Assistant
    ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")
    ANTHROPIC_MODEL = os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-5-20250929")
    CHAT_MAX_TURNS = int(os.environ.get("CHAT_MAX_TURNS", "10"))
    CHAT_TIMEOUT = int(os.environ.get("CHAT_TIMEOUT", "30"))
    CHAT_MAX_CONTEXT_TOKENS = int(os.environ.get("CHAT_MAX_CONTEXT_TOKENS", "4000"))

    # Logging configuration
    LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
    LOG_DIR = BASE_DIR / "logs"
37 changes: 37 additions & 0 deletions src/models.py
@@ -1,5 +1,6 @@
"""Database models for Oil Record Book Tool."""

import json
from datetime import datetime, timezone
from flask_sqlalchemy import SQLAlchemy
from flask_login import UserMixin
@@ -507,3 +508,39 @@ def to_dict(self) -> dict:
"end_date": self.end_date.isoformat() if self.end_date else None,
"created_at": self.created_at.isoformat(),
}


class ChatSession(db.Model):
    """LLM chat conversation session."""

    __tablename__ = "chat_sessions"

    id: int = db.Column(db.Integer, primary_key=True)
    user_id: int = db.Column(db.Integer, db.ForeignKey("users.id"), nullable=False)
    messages: str = db.Column(db.Text, nullable=False, default="[]")
    created_at: datetime = db.Column(
        db.DateTime, nullable=False, default=lambda: datetime.now(timezone.utc)
    )
    updated_at: datetime = db.Column(
        db.DateTime, nullable=False, default=lambda: datetime.now(timezone.utc),
        onupdate=lambda: datetime.now(timezone.utc)
    )

    user = db.relationship("User")

    def get_messages(self) -> list[dict]:
        """Deserialize messages JSON."""
        return json.loads(self.messages) if self.messages else []

    def set_messages(self, msgs: list[dict]) -> None:
        """Serialize messages to JSON."""
        self.messages = json.dumps(msgs)

    def to_dict(self) -> dict:
        return {
            "id": self.id,
            "user_id": self.user_id,
            "messages": self.get_messages(),
            "created_at": self.created_at.isoformat(),
            "updated_at": self.updated_at.isoformat(),
        }
Empty file added src/prompts/__init__.py