Add state persistence with session isolation #2

bronson-apollo · 2026-01-22T19:24:37Z

Summary

Persist Jupyter server state to survive Claude Code context compaction
Session isolation via SCRIBE_SESSION_ID for concurrent scribe sessions
Atomic state file writes with 0o600 permissions (token security)
Differentiate auth failures (401/403) from connection failures
Support SCRIBE_TOKEN for external server authentication

The MCP server now requires SCRIBE_SESSION_ID to be set, enforcing that it must be invoked via the scribe CLI wrapper.

Test plan

13 tests covering state file creation, permissions, reconnection, stale state handling, session isolation, server status checks, and external server support

🤖 Generated with Claude Code

Example:

Features: - Persist Jupyter server state to survive Claude Code context compaction - Session isolation via SCRIBE_SESSION_ID for concurrent scribe sessions - Atomic state file writes with 0o600 permissions (token security) - Differentiate auth failures (401/403) from connection failures - Support SCRIBE_TOKEN for external server authentication The MCP server now requires SCRIBE_SESSION_ID to be set, enforcing that it must be invoked via the scribe CLI wrapper. Includes 13 tests covering: - State file creation and permissions - Reconnection after MCP restart - Stale state handling - Session isolation - Server status checks - External server support

## Motivation When using scribe-fork MCP, confusing errors occurred when sessions were invalid: - "500 Server Error" instead of "Session X not found" - No way to discover active sessions after context compaction - Claude couldn't recover from lost session_ids This caused frustrating debugging where the actual error (missing session) was hidden behind generic HTTP error messages. ## Changes Made ### 1. Better error handling (notebook_mcp_server.py) Added `_check_response()` helper that extracts server error messages from JSON response bodies before raising exceptions. Applied to all HTTP calls: - _start_session_internal - execute_code - add_markdown - edit_cell - shutdown_session Now errors show "Failed to execute code in session X: Session X not found" instead of just "500 Server Error: Internal Server Error". ### 2. Add list_sessions MCP tool New @mcp.tool that returns: - List of active session IDs (UUIDs) - Server status (URL, port, health) This allows Claude to recover after context compaction by calling list_sessions to find the active session_id and continue execution without resuming the notebook. ### 3. Updated docstrings Enhanced execute_code and list_sessions docstrings to explain that: - Sessions persist across context compaction - Use list_sessions to recover lost session_ids - No need to resume notebook, just continue executing ### 4. Integration tests (test_state_persistence.py) Added TestErrorHandlingAndSessionDiscovery class with 2 tests: - test_invalid_session_id_returns_clear_error: Verifies server errors are propagated - test_list_sessions_mcp_integration: Verifies list_sessions works end-to-end ## Testing All 15 tests in test_state_persistence.py pass, including 2 new integration tests. ## Files Modified - scribe/notebook/notebook_mcp_server.py: Error handling + list_sessions tool - tests/test_state_persistence.py: Integration tests Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

## Context After Codex reviewed commit 51d032b, it identified several edge cases and security issues in the error handling implementation that needed to be addressed. ## Changes Made ### 1. Enhanced _check_response() error handling **Multiple error field support**: - Now checks "error", "detail", and "message" fields in JSON responses - Prevents loss of informative error messages from different server APIs - Line 50-58: Added fallback chain for error extraction **Empty/non-JSON response handling**: - Returns empty dict for 204 No Content and empty responses - Catches JSON parsing errors on success responses and returns empty dict - Prevents JSONDecodeError crashes on valid but non-JSON responses - Line 62-72: Added defensive handling for edge cases **Improved error messages**: - Includes HTTP status code in error messages - Better fallback when no error details available - Line 60: Format is now "Failed to X (HTTP 500): error message" ### 2. Security: Redacted tokens in list_sessions **Token exposure prevention**: - Redacts auth token from vscode_url before returning - Prevents token leakage in logs, transcripts, and tool outputs - Line 811-813: Splits URL and replaces token with "<redacted>" **Documentation**: - Added note about stale sessions being best-effort - Updated docstring to mention URL redaction ### 3. Comprehensive unit tests for _check_response() Added 8 unit tests (TestCheckResponse class) covering: - Success with JSON: Normal happy path - Success with empty content: 204 No Content edge case - Success with non-JSON: Plain text response edge case - Error with "error" field: Primary error field - Error with "detail" field: FastAPI/Django style errors - Error with "message" field: Alternative error field - Error with non-JSON: Plain text error responses - Error with no content: Empty error responses All tests use mocks to avoid server dependencies and verify exact behavior. ## Testing All 23 tests pass: - 15 existing integration tests (unchanged) - 8 new unit tests for _check_response() Tests verify: ✓ Multiple error fields are checked in correct order ✓ Empty and non-JSON responses don't crash ✓ Error messages include HTTP status codes ✓ Token redaction works correctly ## Codex Review Findings Addressed [HIGH] JSON handling bug → Fixed with empty content check and try/except [MEDIUM] Limited error fields → Now checks error/detail/message [MEDIUM] Token exposure → Redacted in list_sessions output Added comprehensive unit test coverage ## Files Modified - scribe/notebook/notebook_mcp_server.py: Enhanced error handling + token redaction - tests/test_state_persistence.py: Added 8 unit tests Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Added empty __init__.py file to tests directory to ensure proper Python package structure and test discovery. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

andrei-apollo · 2026-01-23T20:24:11Z

scribe/notebook/notebook_mcp_server.py

+
+    # Redact auth token from vscode_url to prevent token leakage in logs/transcripts
+    if status.get("vscode_url") and "?token=" in status["vscode_url"]:
+        base_url = status["vscode_url"].split("?token=")[0]


This would discard all query params after token
FWIW I think any URL manipulations would be easier to do with urllib.parse

andrei-apollo · 2026-01-23T20:26:19Z

scribe/notebook/notebook_mcp_server.py

+    """
+    global _server_port, _server_token, _server_url, _server_process, _active_sessions
+
+    state = {


Convert to BaseModel maybe?

Root cause: list_sessions() wasn't calling ensure_server_running() before returning sessions. After MCP restart (compaction), _active_sessions was empty because state was never loaded from disk. Fix: Add ensure_server_running() call at start of list_sessions() to restore state from persisted state file. All 29 tests now pass including: - test_list_sessions_then_execute_after_compaction - test_multiple_sessions_across_compaction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

New test classes: - TestBackwardCompatibility: Tests v1 state format migration and future version handling - TestServerFailureScenarios: Tests server death between operations, dead kernel, unreachable external server All tests pass, confirming existing implementation handles these edge cases correctly. Test count: 34 total (was 29) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…right support Split the monolithic test file (1740 lines) into maintainable components: - tests/conftest.py: Shared fixtures (reset_mcp_module, make_claude_options, etc.) - tests/test_state_persistence_unit.py: 17 fast unit tests (run in 0.03s) - tests/test_state_persistence_integration.py: 17 integration tests (require real servers) Key fixes: - Added pyrightconfig.json to override parent apex directory's config (pyright was loading /Users/bronson/apex/pyrightconfig.json instead of local settings) - Added py.typed markers for proper package typing - All 34 tests preserved with identical behavior - All pyright errors resolved without using ignore comments Root cause of pyright issues: Pyright searches upward for config files and was finding the parent apex project's pyrightconfig.json, which had different search paths that didn't include our scribe source directory. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Added structlog>=24.0.0 dependency - Converted all print(file=sys.stderr) calls to structured logging: - notebook_mcp_server.py: 4 prints → logger.info/warning/debug - notebook_sever_handlers.py: 8 prints → logger.exception (with traceback) - _notebook_server_utils.py: 1 print → logger.info - Fixed pre-existing type error: provider: str = None → str | None = None Structured logging provides: - Consistent log format with key-value pairs - Automatic exception traceback capture with logger.exception() - Better integration with log aggregation tools Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

New tests (10 added, 27 total unit tests): - TestStateFileMigration: 3 tests for load_state() edge cases - TestPortFinding: 2 tests for find_safe_port() - TestExternalServerFailures: 2 tests for SCRIBE_PORT/SCRIBE_TOKEN failures - TestCleanupHandlers: 3 tests for cleanup_scribe_server() Port range change: - Changed default from 20000-30000 to 35000-45000 - Prevents conflicts with upstream scribe installations All tests pass - no bugs found in these edge cases (implementation is robust). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

## Bugs Found and Fixed ### 1. Kernel Not Ready Before Execution (CRITICAL) The `start_session` method started a kernel but didn't wait for it to be ready before allowing code execution. This caused execute requests to hang indefinitely waiting for a response from an uninitialized kernel. Fix: Added `client.wait_for_ready(timeout=60)` after kernel startup in `notebook_server.py:start_session()`. ### 2. clear_state() Crash During Cleanup The `clear_state()` function called `_get_state_file()` which requires SCRIBE_SESSION_ID to be set. During atexit cleanup, the environment variable might not be set (especially after test fixtures clean up), causing a RuntimeError crash. Fix: Wrapped `_get_state_file()` call in try/except to handle RuntimeError gracefully in `notebook_mcp_server.py:clear_state()`. ### 3. Test Isolation (pkill was dangerous) The `cleanup_jupyter_processes` fixture used `pkill -f` which would kill ALL scribe servers including ones the user is running for actual work. Fix: Changed `reset_mcp_module` fixture to track server processes and clean up only the specific processes it started. ## New Test Files - `test_state_file_corruption.py` (17 tests): Truncated JSON, empty files, wrong schema, null values, extra fields, mixed session formats, read errors - `test_execution_edge_cases.py` (12 tests): Long-running code, large stdout, binary/unicode output, memory allocation, CPU-intensive code, special chars ## Test Coverage - 56 unit tests passing (was 27) - Integration tests improved with proper server tracking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add 5 new test files covering edit cell, authentication, kernel lifecycle, notebook file edge cases, and race conditions: - test_edit_cell_edge_cases.py (8 tests): negative indices, bounds checking, empty/long content, special characters, cell type handling - test_authentication.py (11 tests): missing auth, wrong schemes, invalid tokens, valid auth flows, auth persistence - test_kernel_lifecycle.py (8 tests): immediate execution after start, rapid requests, error recovery, import handling - test_notebook_file_edge_cases.py (7 tests): notebook deletion during session, path edge cases (spaces, special chars, long names) - test_race_conditions.py (6 tests): concurrent sessions, concurrent executions, lifecycle races, concurrent edits Key findings: - Jupyter Server accepts both 'token' and 'Bearer' auth schemes - Concurrent kernel operations queue sequentially (timeouts expected) - Race condition tests focus on kernel survival, not concurrent success Also fixes test_invalid_session_id_returns_clear_error to handle varying error formats when reconnecting to existing servers. Total test count: 121 tests (120 pass, 1 flaky due to port exhaustion) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

bronson-apollo requested a review from andrei-apollo January 22, 2026 19:24

bronson-apollo and others added 3 commits January 22, 2026 22:27

Add tests/__init__.py for test discovery

9eefc4e

Added empty __init__.py file to tests directory to ensure proper Python package structure and test discovery. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

andrei-apollo approved these changes Jan 23, 2026

View reviewed changes

bronson-apollo and others added 7 commits January 28, 2026 21:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add state persistence with session isolation #2

Add state persistence with session isolation #2

Uh oh!

bronson-apollo commented Jan 22, 2026 •

edited

Loading

Uh oh!

andrei-apollo Jan 23, 2026

Uh oh!

andrei-apollo Jan 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add state persistence with session isolation #2

Are you sure you want to change the base?

Add state persistence with session isolation #2

Uh oh!

Conversation

bronson-apollo commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Example:

Uh oh!

andrei-apollo Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

andrei-apollo Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bronson-apollo commented Jan 22, 2026 •

edited

Loading