3-Stage Knowledge Segmentation Pipeline for smriti share #14

@ashu17706

Overview

This branch implements a 3-stage prompt architecture for the smriti share command. It uses an LLM to split each session into distinct knowledge units, generates category-specific documentation for each unit, and exports the resulting team knowledge to the .smriti/ directory.

Architecture Stages

Stage 1: Segment

  • Purpose: Analyze sessions and extract distinct knowledge units
  • Process: An LLM reads the session content, identifies its topics, assigns each topic a category, and scores its relevance
  • Metadata Injection: Tool usage, modified files, git operations, and errors are extracted from the session and injected into the prompt to give the LLM more context
  • Output: KnowledgeUnit[], where each unit carries a category, a relevance score (1-10), and entity tags
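A minimal sketch of Stage 1's inputs and outputs, assuming the LLM replies with a JSON array. The field names of KnowledgeUnit and SessionMetadata, the {{metadata}} and {{session}} placeholders, and the helpers buildSegmentPrompt and parseUnits are all hypothetical; the actual definitions in src/team/types.ts and src/team/segment.ts may differ.

```typescript
// Hypothetical shapes; the real definitions in src/team/types.ts may differ.
interface KnowledgeUnit {
  topic: string;      // short title chosen by the LLM
  category: string;   // e.g. "bug/regression"; validated later against smriti_categories
  relevance: number;  // 1-10, compared against --min-relevance
  entities: string[]; // entity tags such as library or file names
  content: string;    // the part of the session this unit covers
}

// Metadata extracted from the session before the LLM call (field names assumed).
interface SessionMetadata {
  toolsUsed: string[];
  filesModified: string[];
  gitOperations: string[];
  errors: string[];
}

// Render the metadata as one labeled line per kind.
function formatMetadata(meta: SessionMetadata): string {
  const line = (label: string, items: string[]): string =>
    `${label}: ${items.length > 0 ? items.join(", ") : "none"}`;
  return [
    line("Tools used", meta.toolsUsed),
    line("Files modified", meta.filesModified),
    line("Git operations", meta.gitOperations),
    line("Errors", meta.errors),
  ].join("\n");
}

// Fill the first {{metadata}} and {{session}} placeholders (assumed names).
// Function replacers keep "$&"-style sequences in the session text literal.
function buildSegmentPrompt(template: string, session: string, meta: SessionMetadata): string {
  return template
    .replace("{{metadata}}", () => formatMetadata(meta))
    .replace("{{session}}", () => session);
}

// Parse the model's JSON reply (assumed to be an array of unit objects),
// clamping relevance into 1-10 and dropping entries without content.
function parseUnits(raw: string): KnowledgeUnit[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array of units");
  return parsed
    .filter((u): u is Record<string, unknown> => typeof u === "object" && u !== null)
    .filter((u) => typeof u.content === "string" && u.content !== "")
    .map((u) => ({
      topic: String(u.topic ?? "Untitled"),
      category: String(u.category ?? ""),
      relevance: Math.min(10, Math.max(1, Math.round(Number(u.relevance) || 1))),
      entities: Array.isArray(u.entities) ? u.entities.map(String) : [],
      content: String(u.content),
    }));
}
```

Clamping out-of-range scores instead of rejecting the whole reply is a design choice in this sketch: one sloppy score then costs one unit's accuracy rather than the entire segmentation.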

Stage 2: Document

  • Purpose: Generate polished markdown documentation for each unit
  • Process: Select a category-specific template for each unit and fill it with the unit's content
  • Categories Supported:
    • bug/* - Symptoms → Root Cause → Investigation → Fix → Prevention
    • architecture/* and decision/* - Context → Options → Decision → Consequences
    • code/* - Implementation → Key Decisions → Gotchas
    • feature/* - Requirements → Design → Implementation Notes
    • topic/* - Concept → Relevance → Examples → Resources
    • project/* - What Changed → Why → Steps → Verification
  • Output: Markdown files organized in .smriti/knowledge/<category>/
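Stage 2's template lookup and output layout could look roughly like the sketch below. The SECTIONS table restates the outlines listed above; the stage2-<prefix>.md file naming, the slug rule, and the fallback to the topic outline for unknown prefixes are assumptions. The custom-template check follows the Template Flexibility pattern described under Key Design Patterns.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Section outlines restated from the category list, keyed by top-level prefix.
const SECTIONS: Record<string, string[]> = {
  bug: ["Symptoms", "Root Cause", "Investigation", "Fix", "Prevention"],
  architecture: ["Context", "Options", "Decision", "Consequences"],
  decision: ["Context", "Options", "Decision", "Consequences"],
  code: ["Implementation", "Key Decisions", "Gotchas"],
  feature: ["Requirements", "Design", "Implementation Notes"],
  topic: ["Concept", "Relevance", "Examples", "Resources"],
  project: ["What Changed", "Why", "Steps", "Verification"],
};

// "bug/regression" -> "bug"
function categoryPrefix(category: string): string {
  return category.split("/")[0];
}

// A custom template in <project>/.smriti/prompts/ wins; otherwise use the built-in copy.
// The stage2-<prefix>.md naming is an assumption based on the stage2-*.md file list.
function resolveTemplatePath(category: string, projectRoot: string, builtinDir: string): string {
  const file = `stage2-${categoryPrefix(category)}.md`;
  const custom = join(projectRoot, ".smriti", "prompts", file);
  return existsSync(custom) ? custom : join(builtinDir, file);
}

// Markdown outline for a unit; unknown prefixes fall back to the topic outline (assumption).
function outline(category: string, title: string): string {
  const sections = SECTIONS[categoryPrefix(category)] ?? SECTIONS.topic;
  return [`# ${title}`, ...sections.map((s) => `## ${s}`)].join("\n\n");
}

// Output location: .smriti/knowledge/<category>/<slug>.md (slug rule is hypothetical).
function outputPath(projectRoot: string, category: string, title: string): string {
  const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return join(projectRoot, ".smriti", "knowledge", category, `${slug || "untitled"}.md`);
}
```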

Stage 3: Defer

  • Purpose: Metadata enrichment, deferred to Phase 2
  • Planned: Entity extraction, freshness detection, version tracking

Key Design Patterns

  1. Graceful Degradation: If Stage 1 fails, the pipeline falls back to treating the session as a single unit, so Stage 2 still generates documentation
  2. Category Validation: Categories suggested by the LLM are validated against the smriti_categories table
  3. Unit-Level Deduplication: A hash of each unit's content, category, and entities prevents the same unit from being shared twice
  4. Sequential Processing: Units are processed one at a time rather than in parallel, for safety
  5. Template Flexibility: Custom templates in .smriti/prompts/ take precedence over the built-in templates
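Patterns 1 through 4 can be combined into one loop, sketched here under several assumptions: the Segmenter callback stands in for the real Stage 1 call, invalid categories are replaced by a hypothetical fallbackCategory rather than rejected, the fallback unit gets relevance 10 so the threshold never drops it, and the dedup key is a SHA-256 over the three fields. None of these names come from shareSegmentedKnowledge() itself.

```typescript
import { createHash } from "node:crypto";

interface KnowledgeUnit {
  topic: string;
  category: string;
  relevance: number; // 1-10
  entities: string[];
  content: string;
}

// Hypothetical stand-in for the Stage 1 LLM call in src/team/segment.ts.
type Segmenter = (session: string) => Promise<KnowledgeUnit[]>;

interface PipelineOptions {
  knownCategories: Set<string>; // rows of smriti_categories, assumed preloaded
  fallbackCategory: string;     // hypothetical default category
  minRelevance: number;         // --min-relevance (default 6)
  alreadyShared: Set<string>;   // unit hashes already recorded in smriti_shares
  write: (unit: KnowledgeUnit, hash: string) => Promise<void>; // Stage 2 + persistence
}

// Pattern 3: hash content, category, and sorted entities. JSON encoding keeps
// field boundaries unambiguous; sorting makes entity order irrelevant.
function unitHash(unit: KnowledgeUnit): string {
  const key = JSON.stringify([unit.content, unit.category, [...unit.entities].sort()]);
  return createHash("sha256").update(key).digest("hex");
}

async function sharePipeline(session: string, segment: Segmenter, opts: PipelineOptions): Promise<number> {
  let units: KnowledgeUnit[];
  try {
    units = await segment(session);
  } catch {
    units = [];
  }
  if (units.length === 0) {
    // Pattern 1: treat the whole session as one unit. Relevance 10 is an
    // assumption so that the fallback unit is never filtered out.
    units = [{ topic: "Session", category: opts.fallbackCategory, relevance: 10, entities: [], content: session }];
  }

  let written = 0;
  // Pattern 4: for...of with await handles one unit at a time, never concurrently.
  for (const raw of units) {
    // Pattern 2: categories missing from smriti_categories are replaced, not trusted.
    const category = opts.knownCategories.has(raw.category) ? raw.category : opts.fallbackCategory;
    const unit: KnowledgeUnit = { ...raw, category };
    if (unit.relevance < opts.minRelevance) continue;
    const hash = unitHash(unit);
    if (opts.alreadyShared.has(hash)) continue; // Pattern 3: already shared
    await opts.write(unit, hash);
    opts.alreadyShared.add(hash);
    written++;
  }
  return written;
}
```

Hashing after category validation means two units that differ only in an invalid LLM-suggested category collapse to the same key, which is usually what deduplication should do.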

Implementation Details

Files Created

  • src/team/types.ts - Type definitions
  • src/team/segment.ts - Stage 1 segmentation logic
  • src/team/document.ts - Stage 2 documentation generation
  • src/team/prompts/stage1-segment.md - Segmentation prompt
  • src/team/prompts/stage2-*.md (7 templates) - Category-specific templates
  • test/team-segmented.test.ts - Test suite (14 tests)

Files Modified

  • src/db.ts - Added unit_id, relevance_score, and entities columns to the smriti_shares table
  • src/team/share.ts - Added shareSegmentedKnowledge() and routing that calls it when --segmented is set
  • src/index.ts - Added CLI flags: --segmented, --min-relevance
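If smriti_shares lives in SQLite (an assumption suggested by the in-memory test database), the schema extension in src/db.ts has to cope with existing databases, because SQLite's ALTER TABLE ... ADD COLUMN has no IF NOT EXISTS clause. The sketch below computes only the missing statements; the column types, JSON storage for entities, and using the dedup hash as unit_id are hypothetical.

```typescript
// Assumed SQLite column types. Added columns default to NULL, so legacy rows stay valid.
const SEGMENTED_COLUMNS: ReadonlyArray<readonly [name: string, sqlType: string]> = [
  ["unit_id", "TEXT"],
  ["relevance_score", "INTEGER"],
  ["entities", "TEXT"], // JSON-encoded string[] (assumption)
];

// Compare against the existing column names (e.g. read via PRAGMA table_info(smriti_shares))
// and emit only the ALTER statements that are still needed.
function pendingMigrations(existingColumns: string[]): string[] {
  const have = new Set(existingColumns);
  return SEGMENTED_COLUMNS.filter(([name]) => !have.has(name)).map(
    ([name, sqlType]) => `ALTER TABLE smriti_shares ADD COLUMN ${name} ${sqlType}`,
  );
}

// Values for the new columns of one shared unit (using the dedup hash as unit_id is hypothetical).
function segmentedShareValues(unitHash: string, relevance: number, entities: string[]) {
  return { unit_id: unitHash, relevance_score: relevance, entities: JSON.stringify(entities) };
}
```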

Usage

# Legacy (unchanged)
smriti share --project myapp

# New 3-stage pipeline
smriti share --project myapp --segmented

# With custom relevance threshold (default: 6/10)
smriti share --project myapp --segmented --min-relevance 7
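The flags above could be parsed and routed as in this sketch. The option names and the default of 6 come from this description; using Node's util.parseArgs, the 1-10 range check, and the sharePath helper are assumptions, not necessarily what src/index.ts does.

```typescript
import { parseArgs } from "node:util";

interface ShareOptions {
  project?: string;
  segmented: boolean;
  minRelevance: number;
}

// Parse the flags from the usage examples. Using node:util parseArgs is an assumption.
function parseShareArgs(argv: string[]): ShareOptions {
  const { values } = parseArgs({
    args: argv,
    allowPositionals: true, // accepts the "share" subcommand
    options: {
      project: { type: "string" },
      segmented: { type: "boolean" },
      "min-relevance": { type: "string" },
    },
  });
  const raw = values["min-relevance"] ?? "6"; // default threshold: 6/10
  const minRelevance = Number(raw);
  if (!Number.isFinite(minRelevance) || minRelevance < 1 || minRelevance > 10) {
    throw new Error(`--min-relevance must be between 1 and 10, got "${raw}"`);
  }
  return { project: values.project, segmented: values.segmented ?? false, minRelevance };
}

// Without --segmented the legacy path runs exactly as before (hypothetical dispatch).
function sharePath(opts: ShareOptions): "legacy" | "segmented" {
  return opts.segmented ? "segmented" : "legacy";
}
```

Routing on a single opt-in flag is what keeps the legacy command untouched: omitting --segmented never reaches the new code.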

Testing

  • 14 unit tests covering:
    • Graceful fallback logic
    • Unit validation and filtering
    • Relevance thresholding
    • Edge cases
  • All tests pass
  • Tests use an in-memory database, so they need no external dependencies

Backward Compatibility

✅ No breaking changes: the legacy smriti share behavior is unchanged, and the new flags are optional.

Future Phases

  • Phase 2: Entity extraction, freshness detection, tech version tracking
  • Phase 3: Relationship graphs, contradiction detection, and a smriti conflicts command

Related Issues

Related to the discussion of knowledge organization and team sharing workflows.
