jerry609 · jerry609 · Jan 5, 2026 · Jan 3, 2026 · Jan 3, 2026 · Jan 4, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -48,3 +48,10 @@ jobs:
           python evals/runners/run_track_pipeline_smoke.py
           python evals/runners/run_eventlog_replay_smoke.py
 
+      - name: Run memory range and acceptance testing
+        env:
+          PYTHONPATH: src
+        run: |
+          python evals/memory/test_deletion_compliance.py
+          python evals/memory/test_retrieval_hit_rate.py
+
diff --git a/docs/P0_memory_scope_acceptance_design.md b/docs/P0_memory_scope_acceptance_design.md
diff --git a/docs/memory_isolation.md b/docs/memory_isolation.md
@@ -0,0 +1,144 @@
+# Memory Namespace and Isolation Strategy
+
+This document describes the isolation boundaries and query resolution rules implemented in the memory system.
+
+> **Reference Implementation:** `src/paperbot/infrastructure/stores/memory_store.py`
+
+## Isolation Hierarchy
+
+Memory items are isolated using a three-level namespace hierarchy, as defined in `memory_todo.md`:
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                      Level 1: user_id (required)                │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │              Level 2: workspace_id (optional)             │  │
+│  │  ┌─────────────────────────────────────────────────────┐  │  │
+│  │  │        Level 3: scope_type:scope_id (optional)      │  │  │
+│  │  └─────────────────────────────────────────────────────┘  │  │
+│  └───────────────────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Level 1: user_id (Required)
+
+The primary isolation boundary. Cross-user access is forbidden.
+
+**Implementation:** Every query in `memory_store.py` starts with:
+```python
+stmt = select(MemoryItemModel).where(MemoryItemModel.user_id == user_id)
+```
+
+## Level 2: workspace_id (Optional)
+
+Team or project-level isolation within a user's memories.
+
+**Implementation:** `memory_store.py` line 293-294:
+```python
+if workspace_id is not None:
+    stmt = stmt.where(MemoryItemModel.workspace_id == workspace_id)
+```
+
+**Behavior:**
+- If `workspace_id` is set on a memory, it is only visible when querying with the same `workspace_id`
+- Memories with `workspace_id=NULL` are user-global
+
+## Level 3: scope_type:scope_id (Optional)
+
+Fine-grained isolation for research tracks, projects, or papers.
+
+**Implementation:** `memory_store.py` lines 296-301:
+```python
+if scope_type is not None:
+    if scope_type == "global":
+        stmt = stmt.where(or_(MemoryItemModel.scope_type == scope_type, MemoryItemModel.scope_type.is_(None)))
+    else:
+        stmt = stmt.where(MemoryItemModel.scope_type == scope_type)
+if scope_id is not None:
+    stmt = stmt.where(MemoryItemModel.scope_id == scope_id)
+```
+
+**Scope Types:**
+
+| scope_type | scope_id | Usage |
+|------------|----------|-------|
+| `global` | NULL | User-wide memories, visible across all tracks |
+| `track` | track_id | Research direction specific |
+| `project` | project_id | Project-specific context |
+| `paper` | paper_id | Paper-specific notes |
+
+## Content Hash Deduplication
+
+To prevent duplicates while allowing the same content in different scopes, the content hash includes scope information.
+
+**Implementation:** `memory_store.py` lines 168-171:
+```python
+# Dedup within a user's scope boundary (prevents cross-track pollution/dedup collisions).
+content_hash = _sha256_text(
+    f"{effective_scope_type}:{effective_scope_id or ''}:{m.kind}:{content}"
+)
+```
+
+## Status Filtering
+
+Queries only return approved, non-deleted, non-expired items.
+
+**Implementation:** `memory_store.py` lines 303-305:
+```python
+stmt = stmt.where(MemoryItemModel.deleted_at.is_(None))
+stmt = stmt.where(MemoryItemModel.status == "approved")
+stmt = stmt.where(or_(MemoryItemModel.expires_at.is_(None), MemoryItemModel.expires_at > now))
+```
+
+## Auto-Status Assignment
+
+Status is automatically set based on confidence and PII risk.
+
+**Implementation:** `memory_store.py` lines 174-178:
+```python
+effective_status = "approved" if float(m.confidence) >= 0.60 else "pending"
+pii_risk = _estimate_pii_risk(content)
+if pii_risk >= 2 and effective_status == "approved":
+    effective_status = "pending"
+```
+
+**Rules:**
+- `confidence >= 0.60` → `approved`
+- `confidence < 0.60` → `pending`
+- `pii_risk >= 2` → override to `pending` (requires human review)
+
+## Provider as Metadata
+
+The source platform (ChatGPT/Gemini/Claude) is stored in `memory_sources.platform`, not in `memory_items`.
+
+**Rationale:** From `memory_survey.md` section 6.1:
+> 建议拆成三张逻辑表：
+> - `memory_sources`：导入批次（平台、文件名、sha256、统计信息）
+> - `memory_items`：稳定条目
+
+Provider is for provenance tracking, not query filtering.
+
+## Audit Trail
+
+All mutations are logged in `memory_audit_log`.
+
+**Implementation:** `memory_store.py` lines 205-220:
+```python
+session.add(
+    MemoryAuditLogModel(
+        ts=now,
+        actor_id=actor_id,
+        user_id=user_id,
+        workspace_id=workspace_id,
+        action="create",
+        item_id=row.id,
+        ...
+    )
+)
+```
+
+## References
+
+- Requirements: `docs/memory_todo.md` P0 section
+- Survey: `docs/memory_survey.md` section 6
+- Implementation: `src/paperbot/infrastructure/stores/memory_store.py`
diff --git a/docs/memory_types.md b/docs/memory_types.md
@@ -0,0 +1,141 @@
+# Memory Type Taxonomy
+
+This document defines the formal taxonomy for memory types in PaperBot's cross-platform memory middleware.
+
+## Overview
+
+Memory items are categorized into three main categories based on their purpose and lifecycle:
+
+1. **User Memory** - Stable, cross-session information about the user
+2. **Episodic Memory** - Session-derived facts with potential decay
+3. **Workspace/Project Memory** - Scope-specific context
+
+## Memory Type Definitions
+
+### User Memory
+
+Long-term, stable information about the user that persists across sessions.
+
+| Kind | Definition | Persistence | Confidence Threshold | Examples |
+|------|------------|-------------|---------------------|----------|
+| `profile` | Identity facts: name, role, affiliation, background | Permanent until edited | 0.85 | "My name is Jerry", "I'm a PhD student at MIT" |
+| `preference` | Style, format, or interaction preferences | Semi-permanent | 0.72 | "I prefer concise answers", "Use Chinese for responses" |
+| `goal` | Long-term objectives or research directions | Session-spanning | 0.68 | "I'm researching LLM memory systems", "Working on a paper about RAG" |
+| `constraint` | Hard rules or requirements that must be followed | Permanent | 0.62 | "Never include code examples", "Always cite sources" |
+| `project` | Project-level context and background | Project-scoped | 0.70 | "PaperBot is a research workflow tool", "The codebase uses FastAPI" |
+
+### Episodic Memory
+
+Information derived from specific sessions or interactions, subject to decay or update.
+
+| Kind | Definition | Persistence | Confidence Threshold | Examples |
+|------|------------|-------------|---------------------|----------|
+| `fact` | Specific facts mentioned during interaction | Decaying (use_count tracked) | 0.65 | "User's deadline is Friday", "Paper was submitted to NeurIPS" |
+| `decision` | Key decisions made during sessions | Project-scoped | 0.70 | "We chose SQLite over Postgres", "Using Docker for sandboxing" |
+| `hypothesis` | Tentative assumptions or inferences | Low-priority, may expire | 0.50 | "User may be a graduate student", "Likely interested in NLP" |
+| `todo` | Action items or pending tasks | Time-bound (expires_at) | 0.58 | "Need to implement FTS5", "Review the memory module" |
+
+### Workspace/Project Memory
+
+Scope-specific information tied to a particular workspace or research track.
+
+| Kind | Definition | Persistence | Confidence Threshold | Examples |
+|------|------------|-------------|---------------------|----------|
+| `keyword_set` | Domain-specific keywords or focus areas | Project-scoped | 0.75 | "Focus areas: RAG, memory, multi-agent" |
+| `note` | Free-form annotations or observations | Project-scoped | 0.60 | "Important: check GDPR compliance", "Consider using vector DB" |
+
+## What is NOT Memory
+
+The following are explicitly **excluded** from the memory system:
+
+| Item | Reason | Where It Belongs |
+|------|--------|------------------|
+| **Context Caching** | Performance optimization, not long-term storage | LLM provider layer (prompt caching) |
+| **Code Index** | Repository structure, symbols, dependencies | `context_engine/` or `code_index` subsystem |
+| **Search Results** | Ephemeral retrieval results | Not persisted; used immediately |
+| **Embeddings** | Vector representations of content | Separate `memory_embeddings` table (future) |
+| **Raw Conversation History** | Unprocessed message logs | `memory_sources` table (input, not output) |
+| **Temporary Variables** | Session-only working memory | In-memory only, not persisted |
+
+## Status Workflow
+
+Memory items follow a status workflow based on confidence and risk:
+
+```
+Extracted (MemoryCandidate)
+         │
+         ├── confidence >= 0.60 AND pii_risk < 2
+         │         │
+         │         ▼
+         │   ┌─────────────┐
+         │   │  approved   │ ──── Can be retrieved and injected
+         │   └─────────────┘
+         │
+         └── confidence < 0.60 OR pii_risk >= 2
+                   │
+                   ▼
+             ┌─────────────┐
+             │   pending   │ ──── Requires human review
+             └─────────────┘
+                   │
+         ┌─────────┴─────────┐
+         │                   │
+         ▼                   ▼
+   ┌─────────────┐    ┌─────────────┐
+   │  approved   │    │  rejected   │
+   └─────────────┘    └─────────────┘
+```
+
+## Confidence Score Guidelines
+
+When implementing extractors, use these confidence score ranges:
+
+| Score Range | Meaning | Action |
+|-------------|---------|--------|
+| 0.85 - 1.00 | High confidence, explicit statement | Auto-approve |
+| 0.70 - 0.84 | Good confidence, clear inference | Auto-approve |
+| 0.60 - 0.69 | Moderate confidence | Auto-approve (borderline) |
+| 0.50 - 0.59 | Low confidence | Pending (needs review) |
+| 0.00 - 0.49 | Very low confidence | Pending or reject |
+
+## PII Risk Levels
+
+| Level | Description | Behavior |
+|-------|-------------|----------|
+| 0 | No PII detected | Normal processing |
+| 1 | Possible PII (redacted patterns found) | Normal, but flagged |
+| 2 | High PII risk (raw PII detected) | Auto-pending, requires review |
+
+## Usage in Code
+
+```python
+from paperbot.memory.schema import MemoryKind, MemoryCandidate
+
+# Valid memory kinds
+valid_kinds = [
+    # User Memory
+    "profile", "preference", "goal", "constraint", "project",
+    # Episodic Memory
+    "fact", "decision", "hypothesis", "todo",
+    # Workspace Memory
+    "keyword_set", "note"
+]
+
+# Creating a memory candidate
+candidate = MemoryCandidate(
+    kind="preference",
+    content="User prefers concise answers",
+    confidence=0.75,
+    tags=["style", "output"],
+    evidence="From conversation: 'Please keep responses brief'",
+    scope_type="global",
+    scope_id=None,
+    status="approved"  # Will be set by store based on confidence
+)
+```
+
+## References
+
+- Design Document: `docs/P0_memory_scope_acceptance_design.md`
+- Schema Definition: `src/paperbot/memory/schema.py`
+- Store Implementation: `src/paperbot/infrastructure/stores/memory_store.py`
diff --git a/evals/memory/__init__.py b/evals/memory/__init__.py
@@ -0,0 +1,7 @@
+"""
+Memory evaluation tests for Scope and Acceptance criteria.
+
+Test files:
+- test_deletion_compliance.py: Verify deleted items never retrieved
+- test_retrieval_hit_rate.py: Verify relevant memories are found
+"""
diff --git a/evals/memory/fixtures/retrieval_queries.json b/evals/memory/fixtures/retrieval_queries.json
@@ -0,0 +1,35 @@
+{
+  "description": "Synthetic test queries for retrieval hit rate evaluation",
+  "test_cases": [
+    {
+      "id": "pref_output_style",
+      "query": "output format preference",
+      "expected_kinds": ["preference"],
+      "expected_keywords": ["concise", "brief", "format"]
+    },
+    {
+      "id": "project_context",
+      "query": "current project",
+      "expected_kinds": ["project"],
+      "expected_keywords": ["project", "working on", "building"]
+    },
+    {
+      "id": "user_identity",
+      "query": "user name role",
+      "expected_kinds": ["profile"],
+      "expected_keywords": ["name", "role", "position"]
+    },
+    {
+      "id": "constraints",
+      "query": "rules restrictions",
+      "expected_kinds": ["constraint"],
+      "expected_keywords": ["never", "always", "must", "don't"]
+    },
+    {
+      "id": "goals",
+      "query": "goals objectives",
+      "expected_kinds": ["goal"],
+      "expected_keywords": ["goal", "want", "trying to", "working towards"]
+    }
+  ]
+}
diff --git a/evals/memory/fixtures/sample_memories.json b/evals/memory/fixtures/sample_memories.json
@@ -0,0 +1,42 @@
+{
+  "description": "Sample memories for testing extraction and retrieval",
+  "user_id": "test_user",
+  "memories": [
+    {
+      "kind": "profile",
+      "content": "User's name is Jerry, a PhD student at MIT",
+      "confidence": 0.85,
+      "tags": ["identity", "name", "role"]
+    },
+    {
+      "kind": "preference",
+      "content": "User prefers concise, brief responses without excessive explanation",
+      "confidence": 0.75,
+      "tags": ["style", "output", "format"]
+    },
+    {
+      "kind": "project",
+      "content": "Currently working on PaperBot, a research workflow tool",
+      "confidence": 0.80,
+      "tags": ["project", "paperbot"]
+    },
+    {
+      "kind": "constraint",
+      "content": "Never include code examples unless explicitly requested",
+      "confidence": 0.70,
+      "tags": ["rules", "code"]
+    },
+    {
+      "kind": "goal",
+      "content": "Working towards building a memory middleware for LLM applications",
+      "confidence": 0.72,
+      "tags": ["goal", "memory", "llm"]
+    },
+    {
+      "kind": "fact",
+      "content": "Project deadline is end of Q1 2025",
+      "confidence": 0.65,
+      "tags": ["deadline", "timeline"]
+    }
+  ]
+}