
Real User Testing & Performance Validation #11

@ashu17706

Description

What

A comprehensive testing and benchmarking plan that validates Smriti against real-world usage scenarios: large databases, concurrent access, cross-agent queries, and performance under load.

Why

Unit tests verify correctness in isolation, but real usage involves hundreds of sessions, thousands of messages, multiple agents writing simultaneously, and databases that grow over months. We need to validate that performance doesn't degrade and that structured data stays consistent at scale.

Tasks

Correctness Testing

  • Round-trip fidelity: ingest → search → recall → share produces accurate, complete results
  • Cross-agent dedup: same session referenced by multiple agents doesn't create duplicates
  • Sidecar consistency: every tool_call block has a matching `smriti_tool_usage` row
  • Category integrity: hierarchical categories maintain parent-child relationships after bulk operations
  • Share/sync round-trip: `smriti share` → `smriti sync` on another machine restores all metadata
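One way to make the cross-agent dedup check concrete is content-hash identity: if two agents reference the same transcript, hashing only the message content (not agent- or path-specific metadata) should collapse them into one row. This is a minimal sketch, not Smriti's actual schema or API — `SessionRef`, `contentHash`, and `dedupe` are illustrative names:

```typescript
import { createHash } from "node:crypto";

// Illustrative shape for a session referenced by some agent.
interface SessionRef {
  agent: string;
  sessionPath: string;
  messages: string[];
}

function contentHash(session: SessionRef): string {
  // Hash only the message content so that two agents pointing at the
  // same transcript collide on purpose.
  return createHash("sha256")
    .update(session.messages.join("\n"))
    .digest("hex");
}

function dedupe(refs: SessionRef[]): Map<string, SessionRef> {
  const seen = new Map<string, SessionRef>();
  for (const ref of refs) {
    const key = contentHash(ref);
    if (!seen.has(key)) seen.set(key, ref); // first writer wins
  }
  return seen;
}

const shared = ["user: fix the bug", "assistant: done"];
const refs: SessionRef[] = [
  { agent: "claude", sessionPath: "/a/session.jsonl", messages: shared },
  { agent: "cursor", sessionPath: "/b/session.jsonl", messages: shared },
];
console.log(dedupe(refs).size); // 2 refs, 1 unique session
```

A correctness test would assert the map size stays at 1 no matter how many agents re-reference the same content.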

Performance Benchmarks

  • Ingestion throughput: time to ingest 100/500/1000 sessions
  • Search latency: FTS query time at 1k/10k/50k messages (target: < 50ms at 10k)
  • Vector search latency: embedding search at 1k/10k vectors (target: < 200ms at 10k)
  • Sidecar query speed: analytics queries on sidecar tables at scale
  • Database size: measure SQLite file size at 1k/10k/50k messages
  • Memory usage: peak RSS during ingestion of large sessions (target: < 256MB)
  • Watch daemon overhead: CPU/memory when idle vs during active session
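The latency targets above imply a harness that reports percentiles, not single runs, since FTS timings are noisy. A minimal sketch, with a synthetic linear scan standing in for a real FTS query (the workload and function names are placeholders, not Smriti code):

```typescript
import { performance } from "node:perf_hooks";

// Run a query function N times and report p50/p95 latency in ms.
function bench(fn: () => void, runs = 100): { p50: number; p95: number } {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  const pick = (q: number) =>
    samples[Math.min(samples.length - 1, Math.floor(q * samples.length))];
  return { p50: pick(0.5), p95: pick(0.95) };
}

// Synthetic stand-in workload: scan 10k messages for a term.
const corpus = Array.from({ length: 10_000 }, (_, i) => `message ${i} about testing`);
const { p50, p95 } = bench(() => corpus.filter((m) => m.includes("5000")));
console.log(p95 >= p50); // p95 is never below the median
```

A benchmark test would swap the stand-in for the real search call and assert `p95 < 50` at the 10k-message fixture.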

Stress Testing

  • Large session files: JSONL files > 50MB (long coding sessions)
  • Many small sessions: 1000+ sessions with < 10 messages each
  • Concurrent ingestion: two agents writing to DB simultaneously
  • Corrupt data handling: malformed JSONL, truncated files, missing fields
  • Disk space: behavior when SQLite DB approaches filesystem limits
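The corrupt-data requirement means ingestion must be line-tolerant: one malformed JSONL line should be reported, not crash the whole ingest. A sketch of that behavior, assuming a hypothetical `ingestJsonl` helper (not Smriti's real ingest path):

```typescript
interface IngestResult {
  records: unknown[];
  errors: { line: number; message: string }[];
}

// Parse JSONL line by line; collect malformed lines as errors instead
// of throwing, so truncated files still yield their valid records.
function ingestJsonl(text: string): IngestResult {
  const records: unknown[] = [];
  const errors: { line: number; message: string }[] = [];
  text.split("\n").forEach((raw, i) => {
    if (raw.trim() === "") return; // skip blank lines and trailing newline
    try {
      records.push(JSON.parse(raw));
    } catch (e) {
      errors.push({ line: i + 1, message: `malformed JSON: ${(e as Error).message}` });
    }
  });
  return { records, errors };
}

// A truncated middle line should cost exactly one record, not the file.
const result = ingestJsonl('{"role":"user"}\n{truncated\n{"role":"assistant"}');
console.log(result.records.length, result.errors.length);
```

The stress tests would feed truncated and field-missing fixtures through this path and assert error messages name the offending line.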

Security Testing

  • Secret detection coverage: test against curated list of real secret patterns
  • Redaction completeness: no secrets survive ingestion → search → share pipeline
  • Path traversal: crafted file paths in tool calls don't escape expected directories
  • SQL injection: category names, project IDs with special characters
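For the path-traversal case, the standard check is to resolve the candidate path against an allowed root and verify it stays inside after normalization. A minimal sketch using Node's `path` module — the root and paths are illustrative, and the real guard would live wherever Smriti records tool-call file paths:

```typescript
import { resolve, sep } from "node:path";

// Reject file paths that escape the allowed root once ".." segments
// and symlink-free normalization are applied.
function isWithinRoot(root: string, candidate: string): boolean {
  const absRoot = resolve(root);
  const absPath = resolve(absRoot, candidate);
  return absPath === absRoot || absPath.startsWith(absRoot + sep);
}

console.log(isWithinRoot("/workspace", "src/index.ts"));     // true
console.log(isWithinRoot("/workspace", "../../etc/passwd")); // false
```

For the SQL-injection bullet, the analogous assertion is that category names and project IDs only ever reach SQLite through bound parameters, never string concatenation.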

Files

  • `test/benchmark.test.ts` — new: performance benchmarks
  • `test/stress.test.ts` — new: stress and edge-case tests
  • `test/security.test.ts` — new: security validation tests
  • `test/e2e.test.ts` — new: end-to-end round-trip tests
  • `test/fixtures/large/` — new: large synthetic test data
  • `scripts/generate-fixtures.ts` — new: test data generator
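The fixture generator might look something like the sketch below: emit N synthetic JSONL sessions of M messages each, sized to hit the 1k/10k/50k tiers. The message shape here is illustrative, not Smriti's real transcript schema:

```typescript
// Generate one synthetic session as a JSONL string with alternating
// user/assistant messages and monotonically increasing timestamps.
function generateSession(sessionId: string, messageCount: number): string {
  const lines: string[] = [];
  for (let i = 0; i < messageCount; i++) {
    lines.push(
      JSON.stringify({
        sessionId,
        role: i % 2 === 0 ? "user" : "assistant",
        content: `synthetic message ${i} for load testing`,
        ts: 1_700_000_000_000 + i * 1000,
      }),
    );
  }
  return lines.join("\n") + "\n";
}

const jsonl = generateSession("fixture-001", 10);
console.log(jsonl.trim().split("\n").length); // one line per message
```

Writing these to `test/fixtures/large/` keeps the benchmarks reproducible without committing real session data.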

Acceptance Criteria

  • All correctness tests pass on a clean install
  • Ingestion throughput: ≥ 50 sessions/second
  • FTS search: < 50ms at 10k messages
  • Vector search: < 200ms at 10k vectors
  • No memory leaks during 1-hour watch daemon run
  • Zero secrets survive the full pipeline in security tests
  • Corrupt/malformed input produces clear error messages, never crashes

Real User Testing Plan

| Scenario | What to Measure | Risk if Untested |
| --- | --- | --- |
| Fresh install + first ingest | Time-to-first-search, error messages | Bad first impression |
| 500+ sessions accumulated | Search latency, DB size, `smriti status` accuracy | Performance cliff |
| Multi-project workspace | Project ID derivation accuracy, cross-project search | Wrong project attribution |
| Team sharing (2+ developers) | Sync conflicts, dedup accuracy, content hash stability | Duplicate/lost knowledge |
| Long-running session (4+ hours) | Memory during ingest, block count accuracy, cost tracking | OOM or missed data |
| Rapid session creation | Watch daemon debouncing, no duplicate ingestion | Double-counting |
| Agent switch mid-task | Cross-agent file operation tracking, timeline accuracy | Gaps in activity log |
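The "Rapid session creation" scenario hinges on event coalescing: a burst of file-save events on the same session must produce one ingest, not several. A deterministic sketch of that behavior (class and method names are hypothetical, not the watch daemon's real API):

```typescript
// Coalesce filesystem events per path; the watch daemon would call
// flush() after its quiet period elapses, ingesting each path once.
class IngestDebouncer {
  private pending = new Set<string>();

  // Record an event; repeats within the same window collapse.
  event(path: string): void {
    this.pending.add(path);
  }

  // Return each pending path exactly once and reset the window.
  flush(): string[] {
    const paths = [...this.pending];
    this.pending.clear();
    return paths;
  }
}

const d = new IngestDebouncer();
d.event("/sessions/s1.jsonl");
d.event("/sessions/s1.jsonl"); // rapid re-save of the same file
d.event("/sessions/s2.jsonl");
console.log(d.flush().length); // two distinct files, each ingested once
```

A test for the double-counting risk would fire a burst of events and assert the ingest count equals the number of distinct session files.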

Testing

```
bun test test/benchmark.test.ts       # Performance benchmarks
bun test test/stress.test.ts          # Stress tests
bun test test/security.test.ts        # Security validation
bun test test/e2e.test.ts             # End-to-end round-trips
bun run scripts/generate-fixtures.ts  # Generate large test data
```

Metadata

Labels

`enhancement` (New feature or request), `phase-5` (Phase 5: Telemetry & validation)
