Structured Memory Pipeline — Full Roadmap #12

@ashu17706

What

Transform Smriti from flat text ingestion to a structured, queryable memory pipeline — where every tool call, file edit, git operation, error, and thinking block is parsed, typed, stored in sidecar tables, and available for analytics, search, and team sharing.

Why

Currently, Smriti drops more than 80% of the structured data in AI coding sessions. A Claude Code transcript contains tool calls with typed inputs, file diffs, command outputs, git operations, token costs, and thinking blocks, but the flat text parser reduces all of this to a single string. This means:

  • No file tracking: Can't answer "what files did I edit this week?"
  • No error analysis: Can't find sessions where builds failed or tests broke
  • No cost visibility: No token/cost tracking across sessions or projects
  • No git correlation: Can't link sessions to commits, branches, or PRs
  • No cross-agent view: Different agents (Claude, Cline, Aider) can't share a unified memory
  • No security layer: Secrets in sessions get shared without redaction

This roadmap addresses all of these gaps across 5 phases.

Sub-Issues

Phase Overview

| Phase | Deliverable | Status |
|---|---|---|
| Phase 1 | Enriched Claude Code Parser (#5) | Done (13 block types, 6 sidecar tables, 142 tests) |
| Phase 2 | Cline + Aider Parsers (#6) | Planned |
| Phase 3 | Watch Daemon (#7) + Search & Analytics (#8) | Planned |
| Phase 4 | Secret Redaction & Policy (#9) | Planned |
| Phase 5 | Telemetry (#10) + Testing & Perf (#11) | Planned |

Storage Inventory

Complete map of every data type, where it lives, and whether it's indexed:

| Data | Source | Table | Key Columns | Indexed? |
|---|---|---|---|---|
| Session text (FTS) | All agents | memory_fts (QMD) | content | FTS5 full-text |
| Session metadata | Ingestion | smriti_session_meta | session_id, agent_id, project_id | Yes (agent, project) |
| Project registry | Path derivation | smriti_projects | id, path, description | PK |
| Agent registry | Seed data | smriti_agents | id, parser, log_pattern | PK |
| Tool usage | Block extraction | smriti_tool_usage | message_id, tool_name, success, duration_ms | Yes (session, tool_name) |
| File operations | Block extraction | smriti_file_operations | message_id, operation, file_path, project_id | Yes (session, path) |
| Commands | Block extraction | smriti_commands | message_id, command, exit_code, is_git | Yes (session, is_git) |
| Git operations | Block extraction | smriti_git_operations | message_id, operation, branch, pr_url | Yes (session, operation) |
| Errors | Block extraction | smriti_errors | message_id, error_type, message | Yes (session, type) |
| Token costs | Metadata accumulation | smriti_session_costs | session_id, model, input/output/cache tokens, cost | PK |
| Category tags (session) | Categorization | smriti_session_tags | session_id, category_id, confidence, source | Yes (category) |
| Category tags (message) | Categorization | smriti_message_tags | message_id, category_id, confidence, source | Yes (category) |
| Category taxonomy | Seed data | smriti_categories | id, name, parent_id | PK |
| Share tracking | Team sharing | smriti_shares | session_id, content_hash, author | Yes (hash) |
| Vector embeddings | smriti embed | content_vectors + vectors_vec (QMD) | content_hash, embedding | Virtual table |
| Telemetry events | Opt-in collection | ~/.smriti/telemetry.json | timestamp, event, data | N/A (JSONL file) |
| Structured blocks | Block extraction | memory_messages.metadata.blocks (JSON) | MessageBlock[] | No (JSON blob) |
| Message metadata | Parsing | memory_messages.metadata (JSON) | cwd, gitBranch, model, tokenUsage | No (JSON blob) |
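To make the inventory concrete, here is a minimal sketch of how two sidecar tables could be modeled as typed rows and queried in memory. The column names come from the "Key Columns" listed above; the `session_id` column is an assumption inferred from the "Yes (session, ...)" index notes, and the interface and function names (`FileOperationRow`, `CommandRow`, `countOperationsByFile`, `sessionsWithFailedCommands`) are hypothetical, not taken from src/db.ts.

```typescript
// Row shapes for two sidecar tables. Column types are assumptions.
interface FileOperationRow {
  session_id: string; // assumed column, inferred from the index notes
  message_id: string;
  operation: string;
  file_path: string;
  project_id: string;
}

interface CommandRow {
  session_id: string; // assumed column, inferred from the index notes
  message_id: string;
  command: string;
  exit_code: number | null; // null when no exit code was captured
  is_git: boolean;
}

// In-memory equivalent of:
//   SELECT file_path, COUNT(*) FROM smriti_file_operations GROUP BY file_path
function countOperationsByFile(rows: FileOperationRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.file_path, (counts.get(row.file_path) ?? 0) + 1);
  }
  return counts;
}

// In-memory equivalent of:
//   SELECT DISTINCT session_id FROM smriti_commands WHERE exit_code <> 0
function sessionsWithFailedCommands(rows: CommandRow[]): string[] {
  const sessions = new Set<string>();
  for (const row of rows) {
    if (row.exit_code !== null && row.exit_code !== 0) {
      sessions.add(row.session_id);
    }
  }
  return Array.from(sessions);
}
```

In a real deployment these lookups would presumably run as SQL against the indexed sidecar tables; the in-memory versions only illustrate the kind of question the schema is meant to answer.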

Block Type Reference

The 13 MessageBlock types extracted during ingestion:

| Block Type | Fields | Stored In |
|---|---|---|
| text | text | FTS (via plainText) |
| thinking | thinking, budgetTokens | JSON blob only |
| tool_call | toolId, toolName, input | smriti_tool_usage |
| tool_result | toolId, success, output, error, durationMs | Updates tool_usage success |
| file_op | operation, path, diff, pattern | smriti_file_operations |
| command | command, cwd, exitCode, stdout, stderr, isGit | smriti_commands |
| search | searchType, pattern, path, url, resultCount | JSON blob only |
| git | operation, branch, message, files, prUrl, prNumber | smriti_git_operations |
| error | errorType, message, retryable | smriti_errors |
| image | mediaType, path, dataHash | JSON blob only |
| code | language, code, filePath, lineStart | JSON blob only |
| system_event | eventType, data | Cost accumulation |
| control | controlType, command | JSON blob only |
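The block types above map naturally onto a TypeScript discriminated union. The following is a sketch only: the field names are taken from the table, but the field types, the optionality markers, and the `storageTargetFor` helper are assumptions made for illustration, not the definitions in src/ingest/types.ts. The mapping of `system_event` to smriti_session_costs is likewise inferred from the "Cost accumulation" note.

```typescript
// Field names follow the Block Type Reference; types and optionality are assumed.
type MessageBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; budgetTokens?: number }
  | { type: "tool_call"; toolId: string; toolName: string; input: unknown }
  | { type: "tool_result"; toolId: string; success: boolean; output?: string;
      error?: string; durationMs?: number }
  | { type: "file_op"; operation: string; path: string; diff?: string; pattern?: string }
  | { type: "command"; command: string; cwd?: string; exitCode?: number;
      stdout?: string; stderr?: string; isGit: boolean }
  | { type: "search"; searchType: string; pattern?: string; path?: string;
      url?: string; resultCount?: number }
  | { type: "git"; operation: string; branch?: string; message?: string;
      files?: string[]; prUrl?: string; prNumber?: number }
  | { type: "error"; errorType: string; message: string; retryable?: boolean }
  | { type: "image"; mediaType: string; path?: string; dataHash?: string }
  | { type: "code"; language: string; code: string; filePath?: string; lineStart?: number }
  | { type: "system_event"; eventType: string; data: unknown }
  | { type: "control"; controlType: string; command?: string };

// Hypothetical helper: returns the storage target named in the "Stored In" column.
function storageTargetFor(block: MessageBlock): string {
  switch (block.type) {
    case "text":
      return "memory_fts";
    case "tool_call":
    case "tool_result": // a result updates the matching tool_usage row
      return "smriti_tool_usage";
    case "file_op":
      return "smriti_file_operations";
    case "command":
      return "smriti_commands";
    case "git":
      return "smriti_git_operations";
    case "error":
      return "smriti_errors";
    case "system_event":
      return "smriti_session_costs"; // assumption based on "Cost accumulation"
    case "thinking":
    case "search":
    case "image":
    case "code":
    case "control":
      return "memory_messages.metadata.blocks"; // JSON blob only
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a block type is missed.
      const unreachable: never = block;
      return unreachable;
    }
  }
}
```

The `never` check in the default branch means that adding a fourteenth block type without deciding where it is stored becomes a compile error rather than a silent gap.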

Real User Testing Plan

| Scenario | What to Measure | Risk if Untested |
|---|---|---|
| Fresh install + first ingest | Time-to-first-search, error quality | Bad first impression, confusing errors |
| 500+ sessions accumulated | Search latency, DB file size, smriti status accuracy | Performance cliff after months of use |
| Multi-project workspace | Project ID derivation accuracy, cross-project search | Wrong project attribution for sessions |
| Team sharing (2+ devs) | Sync conflicts, dedup accuracy, content hash stability | Duplicate or lost knowledge articles |
| Long-running session (4+ hrs) | Memory during ingest, block count accuracy, cost tracking | OOM or missed data at end of session |
| Rapid session creation | Watch daemon debouncing, no duplicate ingestion | Double-counting sessions |
| Agent switch mid-task | Cross-agent file tracking, unified timeline | Gaps in activity log |
| Secret in session | Detection rate, redaction completeness, share blocking | Leaked credentials in .smriti/ |
| Large JSONL file (50MB+) | Parse time, memory usage, incremental ingest | Crash or multi-minute ingest |
| Corrupt/truncated files | Error messages, graceful skip, no data loss | Silent data corruption |
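For the corrupt/truncated file scenario, "graceful skip" could look something like the sketch below: parse a JSONL transcript line by line, keep every valid record, and report each bad line with its line number instead of aborting. The function name `parseJsonlTolerant` and the result shape are hypothetical; this is not the behavior of the existing parser in src/ingest/claude.ts.

```typescript
interface ParseIssue {
  line: number;   // 1-based line number in the input
  reason: string; // message from the JSON parser
}

interface ParseResult {
  records: unknown[];
  skipped: ParseIssue[];
}

// Parses JSONL text, skipping blank lines and collecting (not throwing on) bad lines.
function parseJsonlTolerant(text: string): ParseResult {
  const records: unknown[] = [];
  const skipped: ParseIssue[] = [];
  const lines = text.split("\n");

  lines.forEach((raw, index) => {
    const line = raw.trim(); // also drops a trailing \r from CRLF files
    if (line === "") return;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      skipped.push({
        line: index + 1,
        reason: err instanceof Error ? err.message : String(err),
      });
    }
  });

  return { records, skipped };
}
```

A truncated final line (for example, a session that was killed mid-write) ends up in `skipped` rather than discarding the records before it, which is the "no data loss" property the scenario asks to verify. For the 50MB+ case, a streaming reader would be a better fit than splitting the whole file in memory.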

Configuration Reference

| Env Var | Default | Phase | Description |
|---|---|---|---|
| QMD_DB_PATH | ~/.cache/qmd/index.sqlite | | Database path |
| CLAUDE_LOGS_DIR | ~/.claude/projects | 1 | Claude Code logs |
| CODEX_LOGS_DIR | ~/.codex | | Codex CLI logs |
| SMRITI_PROJECTS_ROOT | ~/zero8.dev | 1 | Projects root for ID derivation |
| OLLAMA_HOST | http://127.0.0.1:11434 | | Ollama endpoint |
| QMD_MEMORY_MODEL | qwen3:8b-tuned | | Ollama model for synthesis |
| SMRITI_CLASSIFY_THRESHOLD | 0.5 | | LLM classification trigger |
| SMRITI_AUTHOR | $USER | | Git author for team sharing |
| SMRITI_WATCH_DEBOUNCE_MS | 2000 | 3 | Watch daemon debounce interval |
| SMRITI_TELEMETRY | 0 | 5 | Enable telemetry collection |
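As an illustration of how these variables and defaults might be resolved, here is a minimal loader sketch. The `SmritiConfig` shape, the `loadConfig` and `parseNumber` names, and the rule that telemetry is enabled only when the value is exactly "1" are assumptions; the variable names and default values are the ones in the table. Tilde paths are returned as written and would still need expanding by the caller.

```typescript
interface SmritiConfig {
  dbPath: string;
  claudeLogsDir: string;
  codexLogsDir: string;
  projectsRoot: string;
  ollamaHost: string;
  memoryModel: string;
  classifyThreshold: number;
  author: string | undefined;
  watchDebounceMs: number;
  telemetryEnabled: boolean;
}

type Env = Record<string, string | undefined>;

// Falls back to the default when the variable is unset, blank, or not a finite number.
function parseNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) ? value : fallback;
}

// Takes the environment as a parameter so it can be tested without touching process.env.
function loadConfig(env: Env): SmritiConfig {
  return {
    dbPath: env["QMD_DB_PATH"] ?? "~/.cache/qmd/index.sqlite",
    claudeLogsDir: env["CLAUDE_LOGS_DIR"] ?? "~/.claude/projects",
    codexLogsDir: env["CODEX_LOGS_DIR"] ?? "~/.codex",
    projectsRoot: env["SMRITI_PROJECTS_ROOT"] ?? "~/zero8.dev",
    ollamaHost: env["OLLAMA_HOST"] ?? "http://127.0.0.1:11434",
    memoryModel: env["QMD_MEMORY_MODEL"] ?? "qwen3:8b-tuned",
    classifyThreshold: parseNumber(env["SMRITI_CLASSIFY_THRESHOLD"], 0.5),
    author: env["SMRITI_AUTHOR"] ?? env["USER"], // default is $USER
    watchDebounceMs: parseNumber(env["SMRITI_WATCH_DEBOUNCE_MS"], 2000),
    telemetryEnabled: env["SMRITI_TELEMETRY"] === "1", // assumed: only "1" enables
  };
}
```

In a Node or Bun process this would typically be called as `loadConfig(process.env)`.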

Current State

Phase 1 is complete:

  • 13 structured block types defined in src/ingest/types.ts
  • Block extraction engine in src/ingest/blocks.ts
  • Enriched Claude parser in src/ingest/claude.ts
  • 6 sidecar tables in src/db.ts with indexes and insert helpers
  • 142 tests passing, 415 expect() calls across 9 test files
