231 changes: 231 additions & 0 deletions docs/LLM-Powered-Manuals-Assistant-Plan.md
@@ -0,0 +1,231 @@
# LLM-Powered Manuals Assistant

**Overview:** Add a conversational troubleshooting assistant to the manuals section, leveraging the existing FTS5 RAG infrastructure. Optimized for quality over cost — on-demand expert use at low volume, not high-throughput cheap queries. The assistant retrieves relevant manual excerpts, cites sources, and streams responses via SSE.

**Model:** Claude Sonnet 4.5 (~$7/mo at 10 queries/day). Best technical reasoning, excellent instruction-following for RAG grounding, familiar SDK.

**Usage profile:** On-demand, ~5-10 queries/day when actively troubleshooting. Cost is negligible at this volume — optimize for best possible answers.

---

## Todos

- [ ] **Phase 0:** Spike — validate Claude Sonnet 4.5 + FTS5 RAG pipeline with real manual queries
- [ ] **Phase 1.1:** Create `llm_service.py` (Anthropic SDK wrapper, sync, Sonnet 4.5)
- [ ] **Phase 1.2:** Create system prompt in `src/prompts/manuals_assistant.py`
- [ ] **Phase 1.3:** Add `get_context_for_llm()` to `manuals_service.py`
- [ ] **Phase 1.4:** Create `chat_service.py` (context assembly, conversation history, token management)
- [ ] **Phase 1.5:** Create chat routes + mobile-first chat UI with SSE streaming
- [ ] **Phase 1.6:** Error handling, tests

---

## Architecture Overview

```mermaid
flowchart TB
    subgraph ui [Chat Interface]
        ChatUI[Chat Component]
        SearchUI[Existing Search]
    end

    subgraph services [Service Layer]
        ChatService[Chat Service]
        LLMService[LLM Service]
        ManualsService[Existing Manuals Service]
    end

    subgraph storage [Data Layer]
        ChatHistory[(chat_sessions)]
        FTS5[(engine_search.db)]
    end

    AnthropicAPI[Anthropic API]

    ChatUI --> ChatService
    ChatService --> LLMService
    ChatService --> ManualsService
    LLMService -->|Claude Sonnet 4.5| AnthropicAPI
    ManualsService --> FTS5
    ChatService --> ChatHistory
```

---

## Phase 0: Spike (1 session)

**Goal:** Validate Claude Sonnet 4.5 with real manual content before building infrastructure.

Build a standalone script that:
1. Takes a query (e.g., "3516 fuel rack actuator troubleshooting")
2. Calls `search_manuals()` to get top 5 FTS5 results
3. Formats results as structured context with citations
4. Sends to Claude Sonnet 4.5 via the Anthropic SDK with the marine engineering system prompt
5. Streams the response
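
A minimal sketch of the spike script, assuming `search_manuals()` returns dicts with `content`, `document`, and `page` keys (adjust to whatever the real rows look like):

```python
"""Standalone spike: FTS5 retrieval -> Claude Sonnet 4.5, streamed to stdout."""
import sys

import anthropic

from services.manuals_service import search_manuals  # existing FTS5 search

SYSTEM_PROMPT = "You are a marine engineering assistant..."  # real prompt comes in 1.2


def main() -> None:
    query = sys.argv[1] if len(sys.argv) > 1 else "3516 fuel rack actuator troubleshooting"
    results = search_manuals(query, limit=5)  # assumed signature
    context = "\n\n".join(
        f"[{r['document']} p.{r['page']}]\n{r['content']}" for r in results
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    with client.messages.stream(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)  # time-to-first-token is visible here
    print()


if __name__ == "__main__":
    main()
```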

**Validate:**
- Response quality on technical content (torque specs, clearances, diagnostic codes)
- Citation accuracy (does it reference the right manual sections?)
- Handling of multi-step procedures (step-by-step clarity)
- Behavior when RAG context doesn't contain the answer (does it say so or hallucinate?)
- Streaming latency (time to first token)
- Anthropic SDK reliability (streaming, error messages, token counting)

**Kill decision:** If response quality is poor with real manual content, reassess model choice before building infrastructure.

---

## Phase 1: Core Chat + RAG

### 1.1 LLM Service

Create `src/services/llm_service.py`:

- Wrapper around the Anthropic Python SDK (`anthropic`)
- **Synchronous** — Flask is sync, keep it simple
- Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`)
- Retry with exponential backoff (3 attempts)
- Configurable timeout (30s default)

```python
from typing import Iterator

class LLMService:
    def complete(self, messages: list[dict], context: str) -> str: ...
    def stream(self, messages: list[dict], context: str) -> Iterator[str]: ...
    def count_tokens(self, text: str) -> int: ...
```

One clean wrapper around the Anthropic SDK. If you ever need to swap providers, refactor one file.
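
A sketch of the streaming path with retry, assuming the RAG context is folded into the system prompt (`SYSTEM_PROMPT` comes from 1.2; the exact assembly may differ):

```python
import time
from typing import Iterator

import anthropic

from prompts.manuals_assistant import SYSTEM_PROMPT  # see 1.2


class LLMService:
    def __init__(self, api_key: str, model: str, timeout: float = 30.0) -> None:
        self.client = anthropic.Anthropic(api_key=api_key, timeout=timeout)
        self.model = model

    def stream(self, messages: list[dict], context: str) -> Iterator[str]:
        system = f"{SYSTEM_PROMPT}\n\n{context}"  # fold RAG context into the system prompt
        for attempt in range(3):
            emitted = False
            try:
                with self.client.messages.stream(
                    model=self.model,
                    max_tokens=2048,
                    system=system,
                    messages=messages,
                ) as stream:
                    for text in stream.text_stream:
                        emitted = True
                        yield text
                return
            except anthropic.APIError:
                # Never retry mid-stream: the client has already seen tokens.
                if emitted or attempt == 2:
                    raise
                time.sleep(2 ** attempt)  # 1s, then 2s before retrying
```

Note the Anthropic SDK already retries some transient failures internally, so treat the wrapper's three attempts as an upper bound.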

### 1.2 System Prompt

Create `src/prompts/manuals_assistant.py`:

This is where the feature quality lives. The prompt must handle:

- **Identity:** Marine engineering assistant for CAT engines (3516, C18, C32, C4.4)
- **Grounding:** Use retrieved manual excerpts as authoritative source. Always cite document name and page number.
- **Honesty:** If the retrieved context doesn't contain the answer, say so explicitly. Never hallucinate specs, clearances, or procedures.
- **Safety:** For safety-critical values (torque specs, valve clearances, pressure limits), quote the manual verbatim and recommend verifying against the physical manual.
- **Clarification:** Ask about specific equipment model, symptoms, and operating conditions before diagnosing.
- **Scope:** Decline questions outside the indexed manual content. Redirect to search.

```python
SYSTEM_PROMPT = """..."""


def format_context(results: list[dict]) -> str:
    """Format RAG results into structured context for the LLM."""
    ...


def build_messages(system: str, context: str, history: list, query: str) -> list:
    """Assemble full message list within token budget."""
    ...
```
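
One plausible shape for `format_context`, assuming each result dict carries the keys from 1.3 (`content`, `source`, `page`):

```python
def format_context(results: list[dict]) -> str:
    """Format RAG results into structured context for the LLM."""
    blocks = [
        f"[Source {i}: {r['source']}, p.{r['page']}]\n{r['content']}"
        for i, r in enumerate(results, start=1)
    ]
    return (
        "Retrieved manual excerpts (cite answers by source and page):\n\n"
        + "\n\n".join(blocks)
    )
```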

### 1.3 RAG Integration

Enhance `src/services/manuals_service.py`:

- Add `get_context_for_llm(query: str, limit: int = 5) -> list[dict]`
- Return structured results with: content, source document, page number, authority level
- Leverage existing `search_manuals()` and `search_cards()`
- Format with clear citation markers for the LLM
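
A sketch of the new method; the key names on `search_manuals()` rows are assumptions to be corrected against the real schema:

```python
def get_context_for_llm(query: str, limit: int = 5) -> list[dict]:
    """Return top FTS5 hits in a citation-friendly shape for the LLM."""
    hits = search_manuals(query, limit=limit)  # same module, existing function
    return [
        {
            "content": h["snippet"],      # assumed column names
            "source": h["document"],
            "page": h.get("page"),
            "authority": h.get("authority", "manual"),
        }
        for h in hits
    ]
```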

### 1.4 Chat Service

Create `src/services/chat_service.py`:

- Context assembly from RAG search results
- Conversation history (in-memory, max 10 turns per session)
- Token budget management (system prompt + RAG context + history fits within model limits)
- Formats the full prompt: system + context + history + user query
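
A minimal sketch of the history-trimming side of token budgeting (drop the oldest turns first; never drop the system prompt or the fresh RAG context):

```python
from typing import Callable


def fit_history(
    history: list[dict], budget: int, count_tokens: Callable[[str], int]
) -> list[dict]:
    """Trim oldest turns until the remaining history fits the token budget."""
    trimmed = list(history[-10:])  # hard cap: 10 turns per session
    while trimmed and sum(count_tokens(m["content"]) for m in trimmed) > budget:
        trimmed.pop(0)  # oldest turn goes first
    return trimmed
```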

### 1.5 Chat Routes + UI

Create `src/routes/chat.py` and `templates/manuals/chat.html`:

**Routes:**
- `GET /manuals/chat` — Chat interface
- `POST /api/chat/message` — Send message, get streamed response (SSE)

**UI:**
- Mobile-first chat interface matching existing design system
- Streaming response display (tokens appear as they arrive)
- Source citations as tappable links to manual sections
- Clear conversation / new chat button

**Streaming:** Use Flask's `Response(stream_with_context(generator))` with `text/event-stream` content type. Sync generator from `LLMService.stream()`.
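
A sketch of the SSE endpoint; `chat_service.stream_reply()` is a hypothetical name for the 1.4 entry point, and the GET view is omitted:

```python
import json

from flask import Blueprint, Response, request, stream_with_context

from services.chat_service import chat_service  # assumed module-level instance

chat_bp = Blueprint("chat", __name__)


@chat_bp.route("/api/chat/message", methods=["POST"])
def chat_message():
    query = request.get_json()["message"]

    def generate():
        for token in chat_service.stream_reply(query):
            # JSON-encode tokens so embedded newlines can't break SSE framing
            yield f"data: {json.dumps({'token': token})}\n\n"
        yield "data: [DONE]\n\n"

    return Response(stream_with_context(generate()), mimetype="text/event-stream")
```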

### 1.6 Error Handling + Tests

**Degradation path:**

| Condition | Behavior |
|-----------|----------|
| API works | Stream LLM response with citations |
| API slow (>30s) | Show "Thinking..." with cancel button, timeout |
| API fails (500) | Show error + FTS5 search results as fallback |
| API rate-limited (429) | Retry once after backoff, then show error |

**Tests:**
- Unit tests for `get_context_for_llm()` — context assembly and formatting
- Unit tests for system prompt building and token budget management
- Integration tests with mocked LLM responses — full pipeline without API calls (see the sketch after this list)
- One end-to-end test hitting the real API (marked `@pytest.mark.slow`)
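
A sketch of one mocked-pipeline test; the monkeypatch target and the `client` fixture are assumptions about the eventual layout:

```python
def test_chat_streams_mocked_reply(monkeypatch, client):
    """Full route-to-response pipeline with the LLM stubbed out."""
    monkeypatch.setattr(
        "services.llm_service.LLMService.stream",
        lambda self, messages, context: iter(["Check the ", "rack actuator."]),
    )
    resp = client.post("/api/chat/message", json={"message": "3516 fuel rack"})
    assert resp.status_code == 200
    assert b"rack actuator" in resp.data  # test client drains the SSE stream
```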

---

## Key Files to Create/Modify

**New Files:**

- `src/services/llm_service.py` — LLM wrapper
- `src/services/chat_service.py` — Chat logic + context assembly
- `src/prompts/manuals_assistant.py` — System prompt + formatting
- `src/routes/chat.py` — Chat endpoints
- `templates/manuals/chat.html` — Chat UI

**Modified Files:**

- `src/services/manuals_service.py` — Add `get_context_for_llm()`
- `src/models.py` — Add `ChatSession` model
- `src/app.py` — Register chat blueprint
- `src/config.py` — Add LLM API key + chat settings
- `requirements.txt` — Add `anthropic`

---

## Security Considerations

- Never send PII to LLM APIs
- Sanitize manual content before sending (strip any crew names from context; see the sketch after this list)
- Rate limiting on chat endpoints (existing infrastructure)
- API key stored in environment variable, never in code
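
A sketch of the sanitization step, assuming a known crew-name list is available (the real source of names is TBD):

```python
import re


def scrub_names(text: str, crew_names: list[str]) -> str:
    """Redact known crew names from RAG context before it leaves the app."""
    for name in crew_names:
        text = re.sub(re.escape(name), "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```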

---

## Database Migration

New table:

- `chat_sessions` — Conversation history (user_id, messages JSON, created_at, updated_at)

---

## Session Plan

| Session | Deliverable | Output |
|---------|-------------|--------|
| **1** | Spike | Validate Sonnet 4.5 + FTS5 RAG with real queries. Go/no-go. |
| **2** | Services | `llm_service.py`, `chat_service.py`, `get_context_for_llm()`, system prompt |
| **3** | UI + Routes | Chat route, SSE streaming, mobile chat UI |
| **4** | Hardening | Error handling, fallback behavior, tests |

---

## Future Considerations (Post-Ship)

Evaluate after using the assistant on real troubleshooting scenarios:

- Conversation persistence across sessions
- Response caching for repeated questions
- User pattern tracking (search history, preferred docs)
- Guided troubleshooting workflows with step-by-step diagnosis
- Upgrade to Opus if deeper reasoning is needed on complex diagnostics
36 changes: 36 additions & 0 deletions migrations/versions/d4db138494c9_add_chat_sessions_table.py
@@ -0,0 +1,36 @@
"""add chat_sessions table

Revision ID: d4db138494c9
Revises: 2e194345a0a0
Create Date: 2026-02-05 21:39:40.892152

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = 'd4db138494c9'
down_revision = '2e194345a0a0'
branch_labels = None
depends_on = None


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('chat_sessions',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('user_id', sa.Integer(), nullable=False),
        sa.Column('messages', sa.Text(), nullable=False),
        sa.Column('created_at', sa.DateTime(), nullable=False),
        sa.Column('updated_at', sa.DateTime(), nullable=False),
        sa.ForeignKeyConstraint(['user_id'], ['users.id']),
        sa.PrimaryKeyConstraint('id')
    )
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('chat_sessions')
    # ### end Alembic commands ###
3 changes: 3 additions & 0 deletions requirements.txt
@@ -25,6 +25,9 @@ reportlab==4.2.5
# OCR / Google Cloud Vision
google-cloud-vision==3.8.1

# LLM
anthropic>=0.39.0

# Production Server
gunicorn==23.0.0

8 changes: 7 additions & 1 deletion src/app.py
@@ -169,16 +169,22 @@ def health_check():
from routes.auth import auth_bp
from routes.secure_api import secure_api_bp, init_secure_api
from routes.manuals import manuals_bp
from routes.chat import chat_bp

# Register all APIs
app.register_blueprint(api_bp, url_prefix="/api")
app.register_blueprint(auth_bp, url_prefix="/auth")
app.register_blueprint(secure_api_bp, url_prefix="/api/v1")
app.register_blueprint(manuals_bp) # url_prefix already set in blueprint

app.register_blueprint(chat_bp) # url_prefix set in blueprint (/manuals/chat)

# Initialize secure API rate limiter with app
init_secure_api(app)

# Initialize LLM service (graceful if no API key)
from services.llm_service import create_llm_service
create_llm_service(app)

# Main routes
@app.route("/")
@login_required
7 changes: 7 additions & 0 deletions src/config.py
@@ -46,6 +46,13 @@ class Config:
    SESSION_COOKIE_HTTPONLY = True
    SESSION_COOKIE_SAMESITE = "Lax"

    # LLM / Chat Assistant
    ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")
    ANTHROPIC_MODEL = os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-5-20250929")
    CHAT_MAX_TURNS = int(os.environ.get("CHAT_MAX_TURNS", "10"))
    CHAT_TIMEOUT = int(os.environ.get("CHAT_TIMEOUT", "30"))
    CHAT_MAX_CONTEXT_TOKENS = int(os.environ.get("CHAT_MAX_CONTEXT_TOKENS", "4000"))

    # Logging configuration
    LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
    LOG_DIR = BASE_DIR / "logs"
37 changes: 37 additions & 0 deletions src/models.py
@@ -1,5 +1,6 @@
"""Database models for Oil Record Book Tool."""

import json
from datetime import datetime, timezone
from flask_sqlalchemy import SQLAlchemy
from flask_login import UserMixin
@@ -507,3 +508,39 @@ def to_dict(self) -> dict:
"end_date": self.end_date.isoformat() if self.end_date else None,
"created_at": self.created_at.isoformat(),
}


class ChatSession(db.Model):
    """LLM chat conversation session."""

    __tablename__ = "chat_sessions"

    id: int = db.Column(db.Integer, primary_key=True)
    user_id: int = db.Column(db.Integer, db.ForeignKey("users.id"), nullable=False)
    messages: str = db.Column(db.Text, nullable=False, default="[]")
    created_at: datetime = db.Column(
        db.DateTime, nullable=False, default=lambda: datetime.now(timezone.utc)
    )
    updated_at: datetime = db.Column(
        db.DateTime, nullable=False, default=lambda: datetime.now(timezone.utc),
        onupdate=lambda: datetime.now(timezone.utc)
    )

    user = db.relationship("User")

    def get_messages(self) -> list[dict]:
        """Deserialize messages JSON."""
        return json.loads(self.messages) if self.messages else []

    def set_messages(self, msgs: list[dict]) -> None:
        """Serialize messages to JSON."""
        self.messages = json.dumps(msgs)

    def to_dict(self) -> dict:
        return {
            "id": self.id,
            "user_id": self.user_id,
            "messages": self.get_messages(),
            "created_at": self.created_at.isoformat(),
            "updated_at": self.updated_at.isoformat(),
        }
Empty file added src/prompts/__init__.py