95 changes: 95 additions & 0 deletions llm_docs/PLAN.md
@@ -0,0 +1,95 @@
# Scribe-Fork Integration Test Gaps - Plan & Status

## Summary

We're using TDD to fix production failures in scribe's compaction scenarios: the MCP server loses session state when Claude Code compacts its context.

## What's Been Done

### 1. Added `SessionInfo` Pydantic Model
**File**: `scribe/notebook/notebook_mcp_server.py`

Changed `_active_sessions` from `set[str]` (just session IDs) to `dict[str, SessionInfo]` where:
```python
from pydantic import BaseModel


class SessionInfo(BaseModel):
    session_id: str
    notebook_path: str
```

This fixes the design flaw where notebook paths weren't persisted.

### 2. Updated State Persistence
- `save_state()` now serializes `SessionInfo` objects with `model_dump()`
- `load_state()` / `ensure_server_running()` restores sessions with notebook paths
- State file version bumped to 2
- Backward compatible with old format (list of session IDs)
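
A minimal sketch of the version-2 round trip described above, not the actual implementation. It uses a stdlib dataclass as a stand-in for the Pydantic model (with `asdict` in place of `model_dump()`) so that it runs without dependencies. The JSON key names (`version`, `sessions`), the function signatures, and the empty-string placeholder for legacy entries are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

STATE_VERSION = 2  # matches the "version bumped to 2" note; key names below are assumed


@dataclass
class SessionInfo:
    """Stdlib stand-in for the Pydantic SessionInfo model."""

    session_id: str
    notebook_path: str


def save_state(path: Path, sessions: dict[str, SessionInfo]) -> None:
    """Write every session, including its notebook path, to a JSON state file."""
    payload = {
        "version": STATE_VERSION,
        # asdict() plays the role of model_dump() here.
        "sessions": [asdict(info) for info in sessions.values()],
    }
    path.write_text(json.dumps(payload, indent=2))


def load_state(path: Path) -> dict[str, SessionInfo]:
    """Read the state file, accepting both the new and the legacy layout."""
    if not path.exists():
        return {}
    raw = json.loads(path.read_text())
    # Legacy files may be a bare list, or a dict whose "sessions" holds plain IDs.
    entries = raw if isinstance(raw, list) else raw.get("sessions", [])
    sessions: dict[str, SessionInfo] = {}
    for entry in entries:
        if isinstance(entry, str):
            # Old format: only the ID was stored, so the path is unknown.
            sessions[entry] = SessionInfo(session_id=entry, notebook_path="")
        else:
            info = SessionInfo(**entry)
            sessions[info.session_id] = info
    return sessions
```

Keeping the legacy branch inside `load_state()` means callers always receive `dict[str, SessionInfo]`, whichever file version is on disk.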

### 3. Added New Integration Tests
**File**: `tests/test_state_persistence.py`

New test class `TestCompactionScenarios` with 5 tests (results as first written, before the fix described under Root Cause):
- `test_kernel_state_persists_across_compaction` - PASSES
- `test_list_sessions_then_execute_after_compaction` - FAILS
- `test_execute_code_with_stale_session_returns_clear_error` - PASSES
- `test_state_file_includes_notebook_paths` - PASSES
- `test_multiple_sessions_across_compaction` - FAILS

Also added `TestCompactionScenariosDirect` with a direct (non-agent) test.

### 4. Added `TEST_MODEL` Constant
All tests now use `claude-haiku-4-5-20251001` for speed/cost.

### 5. Added pydantic dependency
Added to `pyproject.toml`.

## Current Status: 29/29 Tests Passing ✅

All tests pass including:
- Basic state persistence
- State file includes notebook paths
- Stale session error handling
- list_sessions -> execute_code workflow after compaction
- Multiple sessions across compaction

## Root Cause (Fixed)

`list_sessions()` was not calling `ensure_server_running()` before returning sessions. When the second MCP server process (MCP 2) started fresh after compaction:
1. `_active_sessions` was empty (fresh process)
2. `list_sessions()` called `get_server_status()`, which doesn't load state
3. An empty session list was returned, so the agent couldn't find its session IDs

**Fix**: Added `ensure_server_running()` call at the start of `list_sessions()` to load state from disk.
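
The fix can be illustrated with a toy registry. This is a sketch under stated assumptions, not the server's code: `SessionRegistry`, its state-file layout, and `list_sessions_buggy()` are hypothetical, and the real `ensure_server_running()` presumably does more than load state.

```python
import json
import tempfile
from pathlib import Path


class SessionRegistry:
    """Hypothetical stand-in for the MCP server's module-level session state."""

    def __init__(self, state_file: Path) -> None:
        self.state_file = state_file
        self._active_sessions: dict[str, str] = {}  # session_id -> notebook_path
        self._loaded = False

    def ensure_server_running(self) -> None:
        """Load persisted sessions once (assumed layout: {session_id: path})."""
        if self._loaded:
            return
        if self.state_file.exists():
            self._active_sessions.update(json.loads(self.state_file.read_text()))
        self._loaded = True

    def list_sessions_buggy(self) -> list[str]:
        # Old behaviour: reads in-memory state that a fresh process never filled.
        return sorted(self._active_sessions)

    def list_sessions(self) -> list[str]:
        # Fixed behaviour: load state from disk before answering.
        self.ensure_server_running()
        return sorted(self._active_sessions)


# Simulate a state file left behind by the first MCP process.
state = Path(tempfile.mkdtemp()) / "state.json"
state.write_text(json.dumps({"s1": "/nb/one.ipynb"}))

fresh = SessionRegistry(state)  # the process started after compaction
before_fix = fresh.list_sessions_buggy()
after_fix = fresh.list_sessions()
```

In this sketch `before_fix` is empty while `after_fix` contains `s1`, which mirrors the failure the agent saw after compaction and the behaviour once the load step is added.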

## Remaining Tasks

1. **Switch to structlog** - Replace `print(file=sys.stderr)` with proper structured logging
2. **Add file-based logging** - Persist logs to `~/.scribe/logs/`
3. **Run full project tests** - Verify no regressions in other test files

## Pyright Configuration Issue

Pyright can't resolve imports from `scribe.notebook.notebook_mcp_server` in test files. Runtime imports work fine.

**Root cause**: Package needs to be installed in editable mode in the venv that pyright uses.

**Fix**: Run `uv pip install -e . --python .venv/bin/python` or configure pyright properly.

Currently using `# pyright: ignore[reportAttributeAccessIssue]` as a workaround; this should be replaced with a proper fix.

## Files Modified

- `scribe/notebook/notebook_mcp_server.py` - SessionInfo model, state persistence, __all__ exports
- `tests/test_state_persistence.py` - New test classes, TEST_MODEL constant
- `pyproject.toml` - Added pydantic, pyright config

## How to Run Tests

```bash
cd /Users/bronson/apex/llm_sessions/scribe-fork
uv run pytest tests/test_state_persistence.py -v --tb=short
```

For just the compaction tests:
```bash
uv run pytest tests/test_state_persistence.py::TestCompactionScenarios -v --tb=short
```
20 changes: 20 additions & 0 deletions pyproject.toml
@@ -13,12 +13,32 @@ dependencies = [
"nbformat>=5.10.4",
"pillow>=11.3.0",
"psutil>=7.0.0",
"pydantic>=2.0.0",
"python-dotenv>=1.1.1",
"requests>=2.32.4",
"structlog>=24.0.0",
]

[project.scripts]
scribe = "scribe.cli.cli:main"

[project.optional-dependencies]
test = [
"pytest>=8.0.0",
"pytest-asyncio>=0.23.0",
"claude-agent-sdk>=0.1.0",
]

[tool.setuptools.packages.find]
include = ["scribe*"]

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]

[tool.pyright]
include = ["scribe", "tests"]
pythonVersion = "3.11"
typeCheckingMode = "standard"
venvPath = "."
venv = ".venv"
7 changes: 7 additions & 0 deletions pyrightconfig.json
@@ -0,0 +1,7 @@
{
  "include": ["scribe", "tests"],
  "venvPath": ".",
  "venv": ".venv",
  "pythonVersion": "3.11",
  "typeCheckingMode": "standard"
}
28 changes: 16 additions & 12 deletions scribe/notebook/_notebook_server_utils.py
@@ -3,27 +3,31 @@
 import subprocess
 import sys
 import time
-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any

 import requests
+import structlog
 from fastmcp.utilities.types import Image

+logger = structlog.get_logger(__name__)

 from ._image_processing_utils import resize_image_if_needed


-def find_safe_port(start_port=20000, max_port=30000):
+def find_safe_port(start_port=35000, max_port=45000):
     """Find a port that's not in use by anyone.

     Uses random selection to minimize conflicts between users.

     Args:
-        start_port: Minimum port number (default: 20000)
-        max_port: Maximum port number (default: 30000)
+        start_port: Minimum port number (default: 35000)
+        max_port: Maximum port number (default: 45000)

     Returns:
         int: Available port number, or None if none found
     """
-    import socket
     import random
+    import socket

     # Try random ports first (more efficient and less likely to conflict)
     ports_to_try = list(range(start_port, max_port + 1))
@@ -54,7 +58,7 @@ def clean_notebook_for_save(nb):
     return nb


-def check_server_health(port: int) -> Optional[Dict[str, Any]]:
+def check_server_health(port: int) -> dict[str, Any] | None:
     """Check if scribe server is running on given port."""
     try:
         url = f"http://127.0.0.1:{port}/api/scribe/health"
@@ -67,7 +71,7 @@ def check_server_health(port: int) -> Optional[Dict[str, Any]]:


 def start_scribe_server(
-    port: int, token: str, notebook_output_dir: Optional[str] = None
+    port: int, token: str, notebook_output_dir: str | None = None
 ) -> subprocess.Popen:
     """Start a Scribe Jupyter server subprocess.

@@ -136,7 +140,7 @@ def cleanup_scribe_server(process: subprocess.Popen) -> None:
         process: The server process to clean up
     """
     if process:
-        print("Shutting down managed Jupyter server...", file=sys.stderr)
+        logger.info("shutting_down_managed_jupyter_server")
         process.terminate()
         try:
             process.wait(timeout=5)
@@ -146,11 +150,11 @@ def cleanup_scribe_server(process: subprocess.Popen) -> None:


 def process_jupyter_outputs(
-    outputs: List[Dict[str, Any]],
-    session_id: Optional[str] = None,
+    outputs: list[dict[str, Any]],
+    session_id: str | None = None,
     save_images_locally: bool = False,
-    provider: str = None,
-) -> Tuple[List[Dict[str, Any]], List[Image]]:
+    provider: str | None = None,
+) -> tuple[list[dict[str, Any]], list[Image]]:
     """Process Jupyter notebook outputs into MCP format.

     Args: