Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -48,3 +48,10 @@ jobs:
python evals/runners/run_track_pipeline_smoke.py
python evals/runners/run_eventlog_replay_smoke.py

- name: Run memory range and acceptance testing
env:
PYTHONPATH: src
run: |
python evals/memory/test_deletion_compliance.py
python evals/memory/test_retrieval_hit_rate.py

739 changes: 739 additions & 0 deletions docs/P0_memory_scope_acceptance_design.md

Large diffs are not rendered by default.

144 changes: 144 additions & 0 deletions docs/memory_isolation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
# Memory Namespace and Isolation Strategy

This document describes the isolation boundaries and query resolution rules implemented in the memory system.

> **Reference Implementation:** `src/paperbot/infrastructure/stores/memory_store.py`

## Isolation Hierarchy

Memory items are isolated using a three-level namespace hierarchy, as defined in `memory_todo.md`:

```
┌─────────────────────────────────────────────────────────────────┐
│ Level 1: user_id (required) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Level 2: workspace_id (optional) │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ Level 3: scope_type:scope_id (optional) │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

## Level 1: user_id (Required)

The primary isolation boundary. Cross-user access is forbidden.

**Implementation:** Every query in `memory_store.py` starts with:
```python
stmt = select(MemoryItemModel).where(MemoryItemModel.user_id == user_id)
```

## Level 2: workspace_id (Optional)

Team or project-level isolation within a user's memories.

**Implementation:** `memory_store.py` line 293-294:
```python
if workspace_id is not None:
stmt = stmt.where(MemoryItemModel.workspace_id == workspace_id)
```

**Behavior:**
- If `workspace_id` is set on a memory, it is only visible when querying with the same `workspace_id`
- Memories with `workspace_id=NULL` are user-global

## Level 3: scope_type:scope_id (Optional)

Fine-grained isolation for research tracks, projects, or papers.

**Implementation:** `memory_store.py` lines 296-301:
```python
if scope_type is not None:
if scope_type == "global":
stmt = stmt.where(or_(MemoryItemModel.scope_type == scope_type, MemoryItemModel.scope_type.is_(None)))
else:
stmt = stmt.where(MemoryItemModel.scope_type == scope_type)
if scope_id is not None:
stmt = stmt.where(MemoryItemModel.scope_id == scope_id)
```
Comment on lines +50 to +59

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a discrepancy between the query logic described in the main design document (P0_memory_scope_acceptance_design.md) and the implementation snippet shown here. The design doc states that queries for a specific scope (e.g., track) should also include global items. However, the snippet here (and the actual implementation) only filters for the specific scope_type, excluding global items. This could lead to unexpected behavior where global memories are not retrieved when working within a specific track. The documentation and implementation should be aligned with the intended design.


**Scope Types:**

| scope_type | scope_id | Usage |
|------------|----------|-------|
| `global` | NULL | User-wide memories, visible across all tracks |
| `track` | track_id | Research direction specific |
| `project` | project_id | Project-specific context |
| `paper` | paper_id | Paper-specific notes |

## Content Hash Deduplication

To prevent duplicates while allowing the same content in different scopes, the content hash includes scope information.

**Implementation:** `memory_store.py` lines 168-171:
```python
# Dedup within a user's scope boundary (prevents cross-track pollution/dedup collisions).
content_hash = _sha256_text(
f"{effective_scope_type}:{effective_scope_id or ''}:{m.kind}:{content}"
)
```

## Status Filtering

Queries only return approved, non-deleted, non-expired items.

**Implementation:** `memory_store.py` lines 303-305:
```python
stmt = stmt.where(MemoryItemModel.deleted_at.is_(None))
stmt = stmt.where(MemoryItemModel.status == "approved")
stmt = stmt.where(or_(MemoryItemModel.expires_at.is_(None), MemoryItemModel.expires_at > now))
```

## Auto-Status Assignment

Status is automatically set based on confidence and PII risk.

**Implementation:** `memory_store.py` lines 174-178:
```python
effective_status = "approved" if float(m.confidence) >= 0.60 else "pending"
pii_risk = _estimate_pii_risk(content)
if pii_risk >= 2 and effective_status == "approved":
effective_status = "pending"
```

**Rules:**
- `confidence >= 0.60` → `approved`
- `confidence < 0.60` → `pending`
- `pii_risk >= 2` → override to `pending` (requires human review)

## Provider as Metadata

The source platform (ChatGPT/Gemini/Claude) is stored in `memory_sources.platform`, not in `memory_items`.

**Rationale:** From `memory_survey.md` section 6.1:
> 建议拆成三张逻辑表:
> - `memory_sources`:导入批次(平台、文件名、sha256、统计信息)
> - `memory_items`:稳定条目

Provider is for provenance tracking, not query filtering.

## Audit Trail

All mutations are logged in `memory_audit_log`.

**Implementation:** `memory_store.py` lines 205-220:
```python
session.add(
MemoryAuditLogModel(
ts=now,
actor_id=actor_id,
user_id=user_id,
workspace_id=workspace_id,
action="create",
item_id=row.id,
...
)
)
```

## References

- Requirements: `docs/memory_todo.md` P0 section
- Survey: `docs/memory_survey.md` section 6
- Implementation: `src/paperbot/infrastructure/stores/memory_store.py`
141 changes: 141 additions & 0 deletions docs/memory_types.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# Memory Type Taxonomy

This document defines the formal taxonomy for memory types in PaperBot's cross-platform memory middleware.

## Overview

Memory items are categorized into three main categories based on their purpose and lifecycle:

1. **User Memory** - Stable, cross-session information about the user
2. **Episodic Memory** - Session-derived facts with potential decay
3. **Workspace/Project Memory** - Scope-specific context

## Memory Type Definitions

### User Memory

Long-term, stable information about the user that persists across sessions.

| Kind | Definition | Persistence | Confidence Threshold | Examples |
|------|------------|-------------|---------------------|----------|
| `profile` | Identity facts: name, role, affiliation, background | Permanent until edited | 0.85 | "My name is Jerry", "I'm a PhD student at MIT" |
| `preference` | Style, format, or interaction preferences | Semi-permanent | 0.72 | "I prefer concise answers", "Use Chinese for responses" |
| `goal` | Long-term objectives or research directions | Session-spanning | 0.68 | "I'm researching LLM memory systems", "Working on a paper about RAG" |
| `constraint` | Hard rules or requirements that must be followed | Permanent | 0.62 | "Never include code examples", "Always cite sources" |
| `project` | Project-level context and background | Project-scoped | 0.70 | "PaperBot is a research workflow tool", "The codebase uses FastAPI" |

### Episodic Memory

Information derived from specific sessions or interactions, subject to decay or update.

| Kind | Definition | Persistence | Confidence Threshold | Examples |
|------|------------|-------------|---------------------|----------|
| `fact` | Specific facts mentioned during interaction | Decaying (use_count tracked) | 0.65 | "User's deadline is Friday", "Paper was submitted to NeurIPS" |
| `decision` | Key decisions made during sessions | Project-scoped | 0.70 | "We chose SQLite over Postgres", "Using Docker for sandboxing" |
| `hypothesis` | Tentative assumptions or inferences | Low-priority, may expire | 0.50 | "User may be a graduate student", "Likely interested in NLP" |
| `todo` | Action items or pending tasks | Time-bound (expires_at) | 0.58 | "Need to implement FTS5", "Review the memory module" |

### Workspace/Project Memory

Scope-specific information tied to a particular workspace or research track.

| Kind | Definition | Persistence | Confidence Threshold | Examples |
|------|------------|-------------|---------------------|----------|
| `keyword_set` | Domain-specific keywords or focus areas | Project-scoped | 0.75 | "Focus areas: RAG, memory, multi-agent" |
| `note` | Free-form annotations or observations | Project-scoped | 0.60 | "Important: check GDPR compliance", "Consider using vector DB" |

## What is NOT Memory

The following are explicitly **excluded** from the memory system:

| Item | Reason | Where It Belongs |
|------|--------|------------------|
| **Context Caching** | Performance optimization, not long-term storage | LLM provider layer (prompt caching) |
| **Code Index** | Repository structure, symbols, dependencies | `context_engine/` or `code_index` subsystem |
| **Search Results** | Ephemeral retrieval results | Not persisted; used immediately |
| **Embeddings** | Vector representations of content | Separate `memory_embeddings` table (future) |
| **Raw Conversation History** | Unprocessed message logs | `memory_sources` table (input, not output) |
| **Temporary Variables** | Session-only working memory | In-memory only, not persisted |

## Status Workflow

Memory items follow a status workflow based on confidence and risk:

```
Extracted (MemoryCandidate)
├── confidence >= 0.60 AND pii_risk < 2
│ │
│ ▼
│ ┌─────────────┐
│ │ approved │ ──── Can be retrieved and injected
│ └─────────────┘
└── confidence < 0.60 OR pii_risk >= 2
┌─────────────┐
│ pending │ ──── Requires human review
└─────────────┘
┌─────────┴─────────┐
│ │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ approved │ │ rejected │
└─────────────┘ └─────────────┘
```

## Confidence Score Guidelines

When implementing extractors, use these confidence score ranges:

| Score Range | Meaning | Action |
|-------------|---------|--------|
| 0.85 - 1.00 | High confidence, explicit statement | Auto-approve |
| 0.70 - 0.84 | Good confidence, clear inference | Auto-approve |
| 0.60 - 0.69 | Moderate confidence | Auto-approve (borderline) |
| 0.50 - 0.59 | Low confidence | Pending (needs review) |
| 0.00 - 0.49 | Very low confidence | Pending or reject |

## PII Risk Levels

| Level | Description | Behavior |
|-------|-------------|----------|
| 0 | No PII detected | Normal processing |
| 1 | Possible PII (redacted patterns found) | Normal, but flagged |
| 2 | High PII risk (raw PII detected) | Auto-pending, requires review |

## Usage in Code

```python
from paperbot.memory.schema import MemoryKind, MemoryCandidate

# Valid memory kinds
valid_kinds = [
# User Memory
"profile", "preference", "goal", "constraint", "project",
# Episodic Memory
"fact", "decision", "hypothesis", "todo",
# Workspace Memory
"keyword_set", "note"
]

# Creating a memory candidate
candidate = MemoryCandidate(
kind="preference",
content="User prefers concise answers",
confidence=0.75,
tags=["style", "output"],
evidence="From conversation: 'Please keep responses brief'",
scope_type="global",
scope_id=None,
status="approved" # Will be set by store based on confidence
)
```

## References

- Design Document: `docs/P0_memory_scope_acceptance_design.md`
- Schema Definition: `src/paperbot/memory/schema.py`
- Store Implementation: `src/paperbot/infrastructure/stores/memory_store.py`
7 changes: 7 additions & 0 deletions evals/memory/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
"""
Memory evaluation tests for Scope and Acceptance criteria.

Test files:
- test_deletion_compliance.py: Verify deleted items never retrieved
- test_retrieval_hit_rate.py: Verify relevant memories are found
"""
35 changes: 35 additions & 0 deletions evals/memory/fixtures/retrieval_queries.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
{
"description": "Synthetic test queries for retrieval hit rate evaluation",
"test_cases": [
{
"id": "pref_output_style",
"query": "output format preference",
"expected_kinds": ["preference"],
"expected_keywords": ["concise", "brief", "format"]
},
{
"id": "project_context",
"query": "current project",
"expected_kinds": ["project"],
"expected_keywords": ["project", "working on", "building"]
},
{
"id": "user_identity",
"query": "user name role",
"expected_kinds": ["profile"],
"expected_keywords": ["name", "role", "position"]
},
{
"id": "constraints",
"query": "rules restrictions",
"expected_kinds": ["constraint"],
"expected_keywords": ["never", "always", "must", "don't"]
},
{
"id": "goals",
"query": "goals objectives",
"expected_kinds": ["goal"],
"expected_keywords": ["goal", "want", "trying to", "working towards"]
}
]
}
42 changes: 42 additions & 0 deletions evals/memory/fixtures/sample_memories.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
{
"description": "Sample memories for testing extraction and retrieval",
"user_id": "test_user",
"memories": [
{
"kind": "profile",
"content": "User's name is Jerry, a PhD student at MIT",
"confidence": 0.85,
"tags": ["identity", "name", "role"]
},
{
"kind": "preference",
"content": "User prefers concise, brief responses without excessive explanation",
"confidence": 0.75,
"tags": ["style", "output", "format"]
},
{
"kind": "project",
"content": "Currently working on PaperBot, a research workflow tool",
"confidence": 0.80,
"tags": ["project", "paperbot"]
},
{
"kind": "constraint",
"content": "Never include code examples unless explicitly requested",
"confidence": 0.70,
"tags": ["rules", "code"]
},
{
"kind": "goal",
"content": "Working towards building a memory middleware for LLM applications",
"confidence": 0.72,
"tags": ["goal", "memory", "llm"]
},
{
"kind": "fact",
"content": "Project deadline is end of Q1 2025",
"confidence": 0.65,
"tags": ["deadline", "timeline"]
}
]
}
Loading
Loading