forked from goodfire-ai/scribe
-
Notifications
You must be signed in to change notification settings - Fork 0
Add state persistence with session isolation #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
bronson-apollo
wants to merge
11
commits into
main
Choose a base branch
from
feature/state-persistence
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Features: - Persist Jupyter server state to survive Claude Code context compaction - Session isolation via SCRIBE_SESSION_ID for concurrent scribe sessions - Atomic state file writes with 0o600 permissions (token security) - Differentiate auth failures (401/403) from connection failures - Support SCRIBE_TOKEN for external server authentication The MCP server now requires SCRIBE_SESSION_ID to be set, enforcing that it must be invoked via the scribe CLI wrapper. Includes 13 tests covering: - State file creation and permissions - Reconnection after MCP restart - Stale state handling - Session isolation - Server status checks - External server support
## Motivation When using scribe-fork MCP, confusing errors occurred when sessions were invalid: - "500 Server Error" instead of "Session X not found" - No way to discover active sessions after context compaction - Claude couldn't recover from lost session_ids This caused frustrating debugging where the actual error (missing session) was hidden behind generic HTTP error messages. ## Changes Made ### 1. Better error handling (notebook_mcp_server.py) Added `_check_response()` helper that extracts server error messages from JSON response bodies before raising exceptions. Applied to all HTTP calls: - _start_session_internal - execute_code - add_markdown - edit_cell - shutdown_session Now errors show "Failed to execute code in session X: Session X not found" instead of just "500 Server Error: Internal Server Error". ### 2. Add list_sessions MCP tool New @mcp.tool that returns: - List of active session IDs (UUIDs) - Server status (URL, port, health) This allows Claude to recover after context compaction by calling list_sessions to find the active session_id and continue execution without resuming the notebook. ### 3. Updated docstrings Enhanced execute_code and list_sessions docstrings to explain that: - Sessions persist across context compaction - Use list_sessions to recover lost session_ids - No need to resume notebook, just continue executing ### 4. Integration tests (test_state_persistence.py) Added TestErrorHandlingAndSessionDiscovery class with 2 tests: - test_invalid_session_id_returns_clear_error: Verifies server errors are propagated - test_list_sessions_mcp_integration: Verifies list_sessions works end-to-end ## Testing All 15 tests in test_state_persistence.py pass, including 2 new integration tests. ## Files Modified - scribe/notebook/notebook_mcp_server.py: Error handling + list_sessions tool - tests/test_state_persistence.py: Integration tests Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Context After Codex reviewed commit 51d032b, it identified several edge cases and security issues in the error handling implementation that needed to be addressed. ## Changes Made ### 1. Enhanced _check_response() error handling **Multiple error field support**: - Now checks "error", "detail", and "message" fields in JSON responses - Prevents loss of informative error messages from different server APIs - Line 50-58: Added fallback chain for error extraction **Empty/non-JSON response handling**: - Returns empty dict for 204 No Content and empty responses - Catches JSON parsing errors on success responses and returns empty dict - Prevents JSONDecodeError crashes on valid but non-JSON responses - Line 62-72: Added defensive handling for edge cases **Improved error messages**: - Includes HTTP status code in error messages - Better fallback when no error details available - Line 60: Format is now "Failed to X (HTTP 500): error message" ### 2. Security: Redacted tokens in list_sessions **Token exposure prevention**: - Redacts auth token from vscode_url before returning - Prevents token leakage in logs, transcripts, and tool outputs - Line 811-813: Splits URL and replaces token with "<redacted>" **Documentation**: - Added note about stale sessions being best-effort - Updated docstring to mention URL redaction ### 3. Comprehensive unit tests for _check_response() Added 8 unit tests (TestCheckResponse class) covering: - Success with JSON: Normal happy path - Success with empty content: 204 No Content edge case - Success with non-JSON: Plain text response edge case - Error with "error" field: Primary error field - Error with "detail" field: FastAPI/Django style errors - Error with "message" field: Alternative error field - Error with non-JSON: Plain text error responses - Error with no content: Empty error responses All tests use mocks to avoid server dependencies and verify exact behavior. ## Testing All 23 tests pass: - 15 existing integration tests (unchanged) - 8 new unit tests for _check_response() Tests verify: ✓ Multiple error fields are checked in correct order ✓ Empty and non-JSON responses don't crash ✓ Error messages include HTTP status codes ✓ Token redaction works correctly ## Codex Review Findings Addressed [HIGH] JSON handling bug → Fixed with empty content check and try/except [MEDIUM] Limited error fields → Now checks error/detail/message [MEDIUM] Token exposure → Redacted in list_sessions output Added comprehensive unit test coverage ## Files Modified - scribe/notebook/notebook_mcp_server.py: Enhanced error handling + token redaction - tests/test_state_persistence.py: Added 8 unit tests Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added empty __init__.py file to tests directory to ensure proper Python package structure and test discovery. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
andrei-apollo
approved these changes
Jan 23, 2026
|
|
||
| # Redact auth token from vscode_url to prevent token leakage in logs/transcripts | ||
| if status.get("vscode_url") and "?token=" in status["vscode_url"]: | ||
| base_url = status["vscode_url"].split("?token=")[0] |
Collaborator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would discard all query params after token
FWIW I think any URL manipulations would be easier to do with urllib.parse
| """ | ||
| global _server_port, _server_token, _server_url, _server_process, _active_sessions | ||
|
|
||
| state = { |
Collaborator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert to BaseModel maybe?
Root cause: list_sessions() wasn't calling ensure_server_running() before returning sessions. After MCP restart (compaction), _active_sessions was empty because state was never loaded from disk. Fix: Add ensure_server_running() call at start of list_sessions() to restore state from persisted state file. All 29 tests now pass including: - test_list_sessions_then_execute_after_compaction - test_multiple_sessions_across_compaction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New test classes: - TestBackwardCompatibility: Tests v1 state format migration and future version handling - TestServerFailureScenarios: Tests server death between operations, dead kernel, unreachable external server All tests pass, confirming existing implementation handles these edge cases correctly. Test count: 34 total (was 29) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…right support Split the monolithic test file (1740 lines) into maintainable components: - tests/conftest.py: Shared fixtures (reset_mcp_module, make_claude_options, etc.) - tests/test_state_persistence_unit.py: 17 fast unit tests (run in 0.03s) - tests/test_state_persistence_integration.py: 17 integration tests (require real servers) Key fixes: - Added pyrightconfig.json to override parent apex directory's config (pyright was loading /Users/bronson/apex/pyrightconfig.json instead of local settings) - Added py.typed markers for proper package typing - All 34 tests preserved with identical behavior - All pyright errors resolved without using ignore comments Root cause of pyright issues: Pyright searches upward for config files and was finding the parent apex project's pyrightconfig.json, which had different search paths that didn't include our scribe source directory. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added structlog>=24.0.0 dependency - Converted all print(file=sys.stderr) calls to structured logging: - notebook_mcp_server.py: 4 prints → logger.info/warning/debug - notebook_sever_handlers.py: 8 prints → logger.exception (with traceback) - _notebook_server_utils.py: 1 print → logger.info - Fixed pre-existing type error: provider: str = None → str | None = None Structured logging provides: - Consistent log format with key-value pairs - Automatic exception traceback capture with logger.exception() - Better integration with log aggregation tools Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New tests (10 added, 27 total unit tests): - TestStateFileMigration: 3 tests for load_state() edge cases - TestPortFinding: 2 tests for find_safe_port() - TestExternalServerFailures: 2 tests for SCRIBE_PORT/SCRIBE_TOKEN failures - TestCleanupHandlers: 3 tests for cleanup_scribe_server() Port range change: - Changed default from 20000-30000 to 35000-45000 - Prevents conflicts with upstream scribe installations All tests pass - no bugs found in these edge cases (implementation is robust). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Bugs Found and Fixed ### 1. Kernel Not Ready Before Execution (CRITICAL) The `start_session` method started a kernel but didn't wait for it to be ready before allowing code execution. This caused execute requests to hang indefinitely waiting for a response from an uninitialized kernel. Fix: Added `client.wait_for_ready(timeout=60)` after kernel startup in `notebook_server.py:start_session()`. ### 2. clear_state() Crash During Cleanup The `clear_state()` function called `_get_state_file()` which requires SCRIBE_SESSION_ID to be set. During atexit cleanup, the environment variable might not be set (especially after test fixtures clean up), causing a RuntimeError crash. Fix: Wrapped `_get_state_file()` call in try/except to handle RuntimeError gracefully in `notebook_mcp_server.py:clear_state()`. ### 3. Test Isolation (pkill was dangerous) The `cleanup_jupyter_processes` fixture used `pkill -f` which would kill ALL scribe servers including ones the user is running for actual work. Fix: Changed `reset_mcp_module` fixture to track server processes and clean up only the specific processes it started. ## New Test Files - `test_state_file_corruption.py` (17 tests): Truncated JSON, empty files, wrong schema, null values, extra fields, mixed session formats, read errors - `test_execution_edge_cases.py` (12 tests): Long-running code, large stdout, binary/unicode output, memory allocation, CPU-intensive code, special chars ## Test Coverage - 56 unit tests passing (was 27) - Integration tests improved with proper server tracking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 5 new test files covering edit cell, authentication, kernel lifecycle, notebook file edge cases, and race conditions: - test_edit_cell_edge_cases.py (8 tests): negative indices, bounds checking, empty/long content, special characters, cell type handling - test_authentication.py (11 tests): missing auth, wrong schemes, invalid tokens, valid auth flows, auth persistence - test_kernel_lifecycle.py (8 tests): immediate execution after start, rapid requests, error recovery, import handling - test_notebook_file_edge_cases.py (7 tests): notebook deletion during session, path edge cases (spaces, special chars, long names) - test_race_conditions.py (6 tests): concurrent sessions, concurrent executions, lifecycle races, concurrent edits Key findings: - Jupyter Server accepts both 'token' and 'Bearer' auth schemes - Concurrent kernel operations queue sequentially (timeouts expected) - Race condition tests focus on kernel survival, not concurrent success Also fixes test_invalid_session_id_returns_clear_error to handle varying error formats when reconnecting to existing servers. Total test count: 121 tests (120 pass, 1 flaky due to port exhaustion) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
SCRIBE_SESSION_IDfor concurrent scribe sessionsSCRIBE_TOKENfor external server authenticationThe MCP server now requires
SCRIBE_SESSION_IDto be set, enforcing that it must be invoked via the scribe CLI wrapper.Test plan
🤖 Generated with Claude Code
Example: