ApolloResearch · bronson-apollo · Jan 20, 2026 · Jan 22, 2026 · Jan 23, 2026 · Jan 23, 2026
diff --git a/llm_docs/PLAN.md b/llm_docs/PLAN.md
@@ -0,0 +1,95 @@
+# Scribe-Fork Integration Test Gaps - Plan & Status
+
+## Summary
+
+We're using TDD to fix production failures in scribe's compaction scenarios: when Claude Code compacts context, the MCP server process that starts afterwards has no in-memory session state, so previously created sessions are lost.
+
+## What's Been Done
+
+### 1. Added `SessionInfo` Pydantic Model
+**File**: `scribe/notebook/notebook_mcp_server.py`
+
+Changed `_active_sessions` from `set[str]` (just session IDs) to `dict[str, SessionInfo]` where:
+```python
+class SessionInfo(BaseModel):
+    session_id: str
+    notebook_path: str
+```
+
+This fixes the design flaw where notebook paths weren't persisted.
+
+### 2. Updated State Persistence
+- `save_state()` now serializes `SessionInfo` objects with `model_dump()`
+- `load_state()` / `ensure_server_running()` restore sessions together with their notebook paths
+- State file version bumped to 2
+- Backward compatible with the old format (a plain list of session IDs)
+
+### 3. Added New Integration Tests
+**File**: `tests/test_state_persistence.py`
+
+New test class `TestCompactionScenarios` with 5 tests (status when first written, before the root-cause fix):
+- `test_kernel_state_persists_across_compaction` - PASSES
+- `test_list_sessions_then_execute_after_compaction` - FAILS
+- `test_execute_code_with_stale_session_returns_clear_error` - PASSES
+- `test_state_file_includes_notebook_paths` - PASSES
+- `test_multiple_sessions_across_compaction` - FAILS
+
+Also added `TestCompactionScenariosDirect` with a direct (non-agent) test.
+
+### 4. Added `TEST_MODEL` Constant
+All tests now use `claude-haiku-4-5-20251001` for speed/cost.
+
+### 5. Added pydantic dependency
+Added `pydantic>=2.0.0` to the `pyproject.toml` dependencies.
+
+## Current Status: 29/29 Tests Passing ✅
+
+All tests pass including:
+- Basic state persistence
+- State file includes notebook paths
+- Stale session error handling
+- `list_sessions` -> `execute_code` workflow after compaction
+- Multiple sessions across compaction
+
+## Root Cause (Fixed)
+
+`list_sessions()` was not calling `ensure_server_running()` before returning sessions. When the second MCP server process (MCP 2) started fresh after compaction:
+1. `_active_sessions` was empty (fresh process)
+2. `list_sessions()` called `get_server_status()`, which doesn't load state from disk
+3. An empty session list was returned, so the agent couldn't find its session IDs
+
+**Fix**: Added an `ensure_server_running()` call at the start of `list_sessions()` so it loads state from disk before returning.
+
+## Remaining Tasks
+
+1. **Switch to structlog** - Replace `print(file=sys.stderr)` with proper structured logging
+2. **Add file-based logging** - Persist logs to `~/.scribe/logs/`
+3. **Run full project tests** - Verify no regressions in other test files
+
+## Pyright Configuration Issue
+
+Pyright can't resolve imports from `scribe.notebook.notebook_mcp_server` in test files. Runtime imports work fine.
+
+**Root cause**: The package needs to be installed in editable mode in the venv that pyright uses.
+
+**Fix**: Run `uv pip install -e . --python .venv/bin/python`, or point pyright at the project venv via the `venvPath`/`venv` settings (see `pyrightconfig.json`).
+
+Currently using `# pyright: ignore[reportAttributeAccessIssue]` as a workaround; this should be replaced with a proper fix.
+
+## Files Modified
+
+- `scribe/notebook/notebook_mcp_server.py` - `SessionInfo` model, state persistence, `__all__` exports
+- `tests/test_state_persistence.py` - New test classes, TEST_MODEL constant
+- `pyproject.toml` - Added pydantic, structlog, `test` extras, pytest and pyright config
+
+## How to Run Tests
+
+```bash
+cd /Users/bronson/apex/llm_sessions/scribe-fork
+uv run pytest tests/test_state_persistence.py -v --tb=short
+```
+
+For just the compaction tests:
+```bash
+uv run pytest tests/test_state_persistence.py::TestCompactionScenarios -v --tb=short
+```
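The state-persistence change and the `list_sessions()` fix described in PLAN.md above can be sketched, very roughly, as follows. This is not the scribe implementation: `SessionRegistry`, `ensure_state_loaded`, `start_session`, the JSON layout `{"version": 2, "sessions": [...]}`, and the empty-string notebook path used for old-format entries are all hypothetical. A stdlib `dataclass` stands in for the Pydantic `SessionInfo` model so the sketch runs without dependencies (with Pydantic, `asdict(info)` would become `info.model_dump()`).

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path

STATE_VERSION = 2  # assumed layout: {"version": 2, "sessions": [{...}, ...]}


@dataclass
class SessionInfo:
    """Stdlib stand-in for the Pydantic SessionInfo model."""

    session_id: str
    notebook_path: str


def save_state(path: Path, sessions: dict[str, SessionInfo]) -> None:
    """Persist every session together with its notebook path."""
    payload = {
        "version": STATE_VERSION,
        # With Pydantic this would be info.model_dump().
        "sessions": [asdict(info) for info in sessions.values()],
    }
    path.write_text(json.dumps(payload))


def load_state(path: Path) -> dict[str, SessionInfo]:
    """Load sessions, accepting both new records and old bare session IDs."""
    if not path.exists():
        return {}
    data = json.loads(path.read_text())
    # Old files may be a bare list or a dict whose "sessions" holds plain IDs.
    entries = data if isinstance(data, list) else data.get("sessions", [])
    sessions: dict[str, SessionInfo] = {}
    for entry in entries:
        if isinstance(entry, str):
            # Old format never stored the path; "" is a hypothetical placeholder.
            info = SessionInfo(session_id=entry, notebook_path="")
        else:
            info = SessionInfo(**entry)
        sessions[info.session_id] = info
    return sessions


class SessionRegistry:
    """Hypothetical stand-in for the MCP server's in-memory session table."""

    def __init__(self, state_path: Path) -> None:
        self.state_path = state_path
        self.active_sessions: dict[str, SessionInfo] = {}
        self._state_loaded = False

    def ensure_state_loaded(self) -> None:
        # Mirrors only the state-loading part of ensure_server_running(): read disk state once.
        if not self._state_loaded:
            self.active_sessions.update(load_state(self.state_path))
            self._state_loaded = True

    def start_session(self, session_id: str, notebook_path: str) -> None:
        self.ensure_state_loaded()
        self.active_sessions[session_id] = SessionInfo(session_id, notebook_path)
        save_state(self.state_path, self.active_sessions)

    def list_sessions(self) -> list[SessionInfo]:
        # Without this call, a fresh process would return an empty list.
        self.ensure_state_loaded()
        return list(self.active_sessions.values())


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    mcp_1 = SessionRegistry(state_file)
    mcp_1.start_session("abc123", "notebooks/analysis.ipynb")
    mcp_2 = SessionRegistry(state_file)  # fresh process after compaction
    print(mcp_2.list_sessions())
    # [SessionInfo(session_id='abc123', notebook_path='notebooks/analysis.ipynb')]
```

The second registry plays the role of MCP 2: its in-memory table starts empty, and it is the load inside `list_sessions()` that lets it report the session created by the first process.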
diff --git a/pyproject.toml b/pyproject.toml
@@ -13,12 +13,32 @@ dependencies = [
     "nbformat>=5.10.4",
     "pillow>=11.3.0",
     "psutil>=7.0.0",
+    "pydantic>=2.0.0",
     "python-dotenv>=1.1.1",
     "requests>=2.32.4",
+    "structlog>=24.0.0",
 ]
 
 [project.scripts]
 scribe = "scribe.cli.cli:main"
 
+[project.optional-dependencies]
+test = [
+    "pytest>=8.0.0",
+    "pytest-asyncio>=0.23.0",
+    "claude-agent-sdk>=0.1.0",
+]
+
 [tool.setuptools.packages.find]
 include = ["scribe*"]
+
+[tool.pytest.ini_options]
+asyncio_mode = "auto"
+testpaths = ["tests"]
+
+[tool.pyright]
+include = ["scribe", "tests"]
+pythonVersion = "3.11"
+typeCheckingMode = "standard"
+venvPath = "."
+venv = ".venv"
diff --git a/pyrightconfig.json b/pyrightconfig.json
@@ -0,0 +1,7 @@
+{
+  "include": ["scribe", "tests"],
+  "venvPath": ".",
+  "venv": ".venv",
+  "pythonVersion": "3.11",
+  "typeCheckingMode": "standard"
+}
diff --git a/scribe/notebook/_notebook_server_utils.py b/scribe/notebook/_notebook_server_utils.py
@@ -3,27 +3,31 @@
 import subprocess
 import sys
 import time
-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any
 
 import requests
+import structlog
 from fastmcp.utilities.types import Image
+
 from ._image_processing_utils import resize_image_if_needed
+
+logger = structlog.get_logger(__name__)
 
 
-def find_safe_port(start_port=20000, max_port=30000):
+def find_safe_port(start_port=35000, max_port=45000):
     """Find a port that's not in use by anyone.
 
     Uses random selection to minimize conflicts between users.
 
     Args:
-        start_port: Minimum port number (default: 20000)
-        max_port: Maximum port number (default: 30000)
+        start_port: Minimum port number (default: 35000)
+        max_port: Maximum port number (default: 45000)
 
     Returns:
         int: Available port number, or None if none found
     """
-    import socket
     import random
+    import socket
 
     # Try random ports first (more efficient and less likely to conflict)
     ports_to_try = list(range(start_port, max_port + 1))
@@ -54,7 +58,7 @@ def clean_notebook_for_save(nb):
     return nb
 
 
-def check_server_health(port: int) -> Optional[Dict[str, Any]]:
+def check_server_health(port: int) -> dict[str, Any] | None:
     """Check if scribe server is running on given port."""
     try:
         url = f"http://127.0.0.1:{port}/api/scribe/health"
@@ -67,7 +71,7 @@ def check_server_health(port: int) -> Optional[Dict[str, Any]]:
 
 
 def start_scribe_server(
-    port: int, token: str, notebook_output_dir: Optional[str] = None
+    port: int, token: str, notebook_output_dir: str | None = None
 ) -> subprocess.Popen:
     """Start a Scribe Jupyter server subprocess.
 
@@ -136,7 +140,7 @@ def cleanup_scribe_server(process: subprocess.Popen) -> None:
         process: The server process to clean up
     """
     if process:
-        print("Shutting down managed Jupyter server...", file=sys.stderr)
+        logger.info("shutting_down_managed_jupyter_server")
         process.terminate()
         try:
             process.wait(timeout=5)
@@ -146,11 +150,11 @@ def cleanup_scribe_server(process: subprocess.Popen) -> None:
 
 
 def process_jupyter_outputs(
-    outputs: List[Dict[str, Any]],
-    session_id: Optional[str] = None,
+    outputs: list[dict[str, Any]],
+    session_id: str | None = None,
     save_images_locally: bool = False,
-    provider: str = None,
-) -> Tuple[List[Dict[str, Any]], List[Image]]:
+    provider: str | None = None,
+) -> tuple[list[dict[str, Any]], list[Image]]:
     """Process Jupyter notebook outputs into MCP format.
 
     Args: