Open
Labels
enhancement (New feature or request), phase-5 (Phase 5: Telemetry & validation)
Description
What
A comprehensive testing and benchmarking plan that validates Smriti against real-world usage scenarios: large databases, concurrent access, cross-agent queries, and performance under load.
Why
Unit tests verify correctness in isolation, but real usage involves hundreds of sessions, thousands of messages, multiple agents writing simultaneously, and databases that grow over months. We need to validate that performance does not degrade and that structured data stays consistent at scale.
Tasks
Correctness Testing
- Round-trip fidelity: ingest → search → recall → share produces accurate, complete results
- Cross-agent dedup: same session referenced by multiple agents doesn't create duplicates
- Sidecar consistency: every tool_call block has a matching `smriti_tool_usage` row
- Category integrity: hierarchical categories maintain parent-child relationships after bulk operations
- Share/sync round-trip: `smriti share` → `smriti sync` on another machine restores all metadata
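The sidecar consistency check above could be sketched as a pure function over rows already loaded from the database. The `MessageBlock` and `ToolUsageRow` shapes and the `blockId` field are assumptions for illustration; only the `tool_call` block type and the `smriti_tool_usage` table name come from this issue.

```typescript
// Hypothetical row shapes: the real Smriti schema may differ.
interface MessageBlock {
  id: string;
  type: string; // e.g. "text", "tool_call"
}

interface ToolUsageRow {
  blockId: string; // assumed foreign key from smriti_tool_usage to a block
}

interface ConsistencyReport {
  missingRows: string[]; // tool_call block ids with no sidecar row
  orphanRows: string[]; // sidecar rows that point at no tool_call block
}

function checkSidecarConsistency(
  blocks: MessageBlock[],
  rows: ToolUsageRow[],
): ConsistencyReport {
  const toolCallIds = new Set(
    blocks.filter((b) => b.type === "tool_call").map((b) => b.id),
  );
  const rowIds = new Set(rows.map((r) => r.blockId));
  return {
    missingRows: Array.from(toolCallIds).filter((id) => !rowIds.has(id)),
    orphanRows: Array.from(rowIds).filter((id) => !toolCallIds.has(id)),
  };
}
```

A correctness test would then assert that both arrays are empty after ingesting a fixture.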
Performance Benchmarks
- Ingestion throughput: time to ingest 100/500/1000 sessions
- Search latency: FTS query time at 1k/10k/50k messages (target: < 50ms at 10k)
- Vector search latency: embedding search at 1k/10k vectors (target: < 200ms at 10k)
- Sidecar query speed: analytics queries on sidecar tables at scale
- Database size: measure SQLite file size at 1k/10k/50k messages
- Memory usage: peak RSS during ingestion of large sessions (target: < 256MB)
- Watch daemon overhead: CPU/memory when idle vs during active session
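For the latency targets above, a single timing is noisy, so one option is to report percentiles over repeated runs. This is a minimal sketch, not Smriti's benchmark harness: `percentile` uses the nearest-rank method, and `measureLatency` and its defaults are hypothetical names chosen for illustration.

```typescript
// Nearest-rank percentile over a list of samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

interface LatencyReport {
  p50: number;
  p95: number;
  max: number;
}

// Runs `run` a few times untimed to warm caches, then times each iteration.
async function measureLatency(
  run: () => unknown | Promise<unknown>,
  iterations = 50,
  warmup = 5,
): Promise<LatencyReport> {
  for (let i = 0; i < warmup; i++) await run();
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await run();
    samples.push(performance.now() - start);
  }
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: Math.max(...samples),
  };
}
```

In `test/benchmark.test.ts` the report could be compared against the targets, for example asserting that `p95` stays below 50 for an FTS query against the 10k-message fixture. Whether the target applies to the median or a tail percentile is not stated in this issue and would need to be decided.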
Stress Testing
- Large session files: JSONL files > 50MB (long coding sessions)
- Many small sessions: 1000+ sessions with < 10 messages each
- Concurrent ingestion: two agents writing to DB simultaneously
- Corrupt data handling: malformed JSONL, truncated files, missing fields
- Disk space: behavior when SQLite DB approaches filesystem limits
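The corrupt-data case above can be exercised against a parser that collects per-line errors instead of throwing. The sketch below is an assumption about how such a parser might look, not the existing ingestion code; `parseJsonl` and the `requiredFields` parameter are hypothetical.

```typescript
interface LineError {
  line: number; // 1-based line number in the input
  reason: string;
}

interface ParseResult {
  records: Record<string, unknown>[];
  errors: LineError[];
}

// Parses JSONL text, skipping blank lines and recording a reason for every
// line that is not a JSON object or lacks a required field.
function parseJsonl(text: string, requiredFields: string[] = []): ParseResult {
  const records: Record<string, unknown>[] = [];
  const errors: LineError[] = [];
  text.split("\n").forEach((raw, i) => {
    const line = raw.trim();
    if (line === "") return;
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      errors.push({ line: i + 1, reason: "invalid JSON" });
      return;
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      errors.push({ line: i + 1, reason: "not a JSON object" });
      return;
    }
    const obj = value as Record<string, unknown>;
    const missing = requiredFields.filter((f) => !(f in obj));
    if (missing.length > 0) {
      errors.push({ line: i + 1, reason: `missing fields: ${missing.join(", ")}` });
      return;
    }
    records.push(obj);
  });
  return { records, errors };
}
```

This version reads the whole input into memory, so it suits the malformed and truncated fixtures; the > 50MB case would need a streaming reader that applies the same per-line handling.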
Security Testing
- Secret detection coverage: test against curated list of real secret patterns
- Redaction completeness: no secrets survive ingestion → search → share pipeline
- Path traversal: crafted file paths in tool calls don't escape expected directories
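- SQL injection: category names and project IDs containing special characters (quotes, semicolons, comment markers) are treated as literal data and never alter the executed SQL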
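Redaction completeness could be checked by scanning the output of each pipeline stage for known secret shapes. The three patterns below are illustrative stand-ins for the curated list mentioned above, and `findLeaks` and the stage names are hypothetical.

```typescript
// Illustrative patterns only; the curated list would be much larger.
const SECRET_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "aws-access-key-id", regex: /\bAKIA[0-9A-Z]{16}\b/g },
  { name: "github-pat", regex: /\bghp_[A-Za-z0-9]{36}\b/g },
  { name: "private-key-header", regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----/g },
];

interface Leak {
  stage: string; // which pipeline stage produced the text
  pattern: string; // name of the pattern that matched
  match: string;
}

// Scans the text produced by each stage and returns every surviving secret.
function findLeaks(stageOutputs: Record<string, string>): Leak[] {
  const leaks: Leak[] = [];
  for (const [stage, text] of Object.entries(stageOutputs)) {
    for (const { name, regex } of SECRET_PATTERNS) {
      for (const match of text.match(regex) ?? []) {
        leaks.push({ stage, pattern: name, match });
      }
    }
  }
  return leaks;
}
```

A security test would seed a fixture with secrets, run it through ingestion, search, and share, and assert that `findLeaks` returns an empty array for all three outputs.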
Files
- `test/benchmark.test.ts` (new): Performance benchmarks
- `test/stress.test.ts` (new): Stress and edge case tests
- `test/security.test.ts` (new): Security validation tests
- `test/e2e.test.ts` (new): End-to-end round-trip tests
- `test/fixtures/large/` (new): Large synthetic test data
- `scripts/generate-fixtures.ts` (new): Test data generator
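For benchmarks to be comparable between runs, the generator probably needs to be deterministic. The sketch below seeds a small PRNG (mulberry32) so the same seed always yields the same JSONL. The message shape, `VOCAB`, and `generateFixture` are assumptions for illustration and do not reflect the real session format.

```typescript
// mulberry32: a small seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical message shape; the real agent session format will differ.
interface SyntheticMessage {
  sessionId: string;
  index: number;
  role: "user" | "assistant";
  content: string;
}

const VOCAB = ["refactor", "index", "query", "cache", "schema", "migrate", "token", "session"];

// Returns JSONL text with sessions * messagesPerSession lines.
function generateFixture(seed: number, sessions: number, messagesPerSession: number): string {
  const rand = mulberry32(seed);
  const lines: string[] = [];
  for (let s = 0; s < sessions; s++) {
    const sessionId = `synthetic-${seed}-${s}`;
    for (let m = 0; m < messagesPerSession; m++) {
      const wordCount = 3 + Math.floor(rand() * 8);
      const words = Array.from({ length: wordCount }, () => VOCAB[Math.floor(rand() * VOCAB.length)]);
      const message: SyntheticMessage = {
        sessionId,
        index: m,
        role: m % 2 === 0 ? "user" : "assistant",
        content: words.join(" "),
      };
      lines.push(JSON.stringify(message));
    }
  }
  return lines.join("\n");
}
```

Generating the 1k/10k/50k message fixtures on demand from a fixed seed would also avoid committing large files under `test/fixtures/large/`.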
Acceptance Criteria
- All correctness tests pass on a clean install
- Ingestion throughput: ≥ 50 sessions/second
- FTS search: < 50ms at 10k messages
- Vector search: < 200ms at 10k vectors
- No memory leaks during 1-hour watch daemon run
- Zero secrets survive the full pipeline in security tests
- Corrupt/malformed input produces clear error messages, never crashes
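The "no memory leaks" criterion needs a measurable definition. One possible approach, offered as an assumption rather than the agreed method, is to sample RSS periodically during the 1-hour run and fit a least-squares line: a slope near zero suggests stable memory, while sustained positive growth suggests a leak. `growthBytesPerSec` and `MemorySample` are hypothetical names.

```typescript
interface MemorySample {
  tSec: number; // seconds since the run started
  rssBytes: number; // resident set size at that moment
}

// Least-squares slope of RSS over time, in bytes per second.
function growthBytesPerSec(samples: MemorySample[]): number {
  if (samples.length < 2) throw new Error("need at least two samples");
  const n = samples.length;
  const meanT = samples.reduce((sum, p) => sum + p.tSec, 0) / n;
  const meanR = samples.reduce((sum, p) => sum + p.rssBytes, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    const dt = p.tSec - meanT;
    num += dt * (p.rssBytes - meanR);
    den += dt * dt;
  }
  if (den === 0) throw new Error("samples must span more than one timestamp");
  return num / den;
}
```

Samples could come from `process.memoryUsage().rss` taken at a fixed interval. The acceptable slope is not specified in this issue and would have to be chosen after observing a few baseline runs.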
Real User Testing Plan
| Scenario | What to Measure | Risk if Untested |
|---|---|---|
| Fresh install + first ingest | Time-to-first-search, error messages | Bad first impression |
| 500+ sessions accumulated | Search latency, DB size, `smriti status` accuracy | Performance cliff |
| Multi-project workspace | Project ID derivation accuracy, cross-project search | Wrong project attribution |
| Team sharing (2+ developers) | Sync conflicts, dedup accuracy, content hash stability | Duplicate/lost knowledge |
| Long-running session (4+ hours) | Memory during ingest, block count accuracy, cost tracking | OOM or missed data |
| Rapid session creation | Watch daemon debouncing, no duplicate ingestion | Double-counting |
| Agent switch mid-task | Cross-agent file operation tracking, timeline accuracy | Gaps in activity log |
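Content hash stability in the team sharing scenario depends on serializing the same logical content to the same bytes regardless of key order. A minimal sketch, assuming the dedup key is derived from a canonical JSON form: `canonicalJson` and `dedupeByContent` are hypothetical, and a real implementation would likely hash the canonical string (for example with SHA-256) rather than use it directly as the key.

```typescript
// Serializes a JSON-compatible value with object keys sorted, so that
// logically equal values produce identical strings.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // mirror JSON.stringify, which drops undefined
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Keeps the first occurrence of each distinct content value.
function dedupeByContent<T>(items: T[]): T[] {
  const seen = new Map<string, T>();
  for (const item of items) {
    const key = canonicalJson(item);
    if (!seen.has(key)) seen.set(key, item);
  }
  return Array.from(seen.values());
}
```

A sync test could serialize the same session on two machines with different key orders and assert that only one copy remains after `dedupeByContent`.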
Testing
bun test test/benchmark.test.ts # Performance benchmarks
bun test test/stress.test.ts # Stress tests
bun test test/security.test.ts # Security validation
bun test test/e2e.test.ts # End-to-end round-trips
bun run scripts/generate-fixtures.ts # Generate large test data