Skip to content

Conversation

@bronson-apollo
Copy link
Collaborator

@bronson-apollo bronson-apollo commented Jan 22, 2026

Summary

  • Persist Jupyter server state to survive Claude Code context compaction
  • Session isolation via SCRIBE_SESSION_ID for concurrent scribe sessions
  • Atomic state file writes with 0o600 permissions (token security)
  • Differentiate auth failures (401/403) from connection failures
  • Support SCRIBE_TOKEN for external server authentication

The MCP server now requires SCRIBE_SESSION_ID to be set, enforcing that it must be invoked via the scribe CLI wrapper.

Test plan

  • 13 tests covering state file creation, permissions, reconnection, stale state handling, session isolation, server status checks, and external server support

🤖 Generated with Claude Code

Example:

Screenshot 2026-01-23 at 2 32 27 AM

Features:
- Persist Jupyter server state to survive Claude Code context compaction
- Session isolation via SCRIBE_SESSION_ID for concurrent scribe sessions
- Atomic state file writes with 0o600 permissions (token security)
- Differentiate auth failures (401/403) from connection failures
- Support SCRIBE_TOKEN for external server authentication

The MCP server now requires SCRIBE_SESSION_ID to be set, enforcing
that it must be invoked via the scribe CLI wrapper.

Includes 13 tests covering:
- State file creation and permissions
- Reconnection after MCP restart
- Stale state handling
- Session isolation
- Server status checks
- External server support
bronson-apollo and others added 3 commits January 22, 2026 22:27
## Motivation

When using scribe-fork MCP, confusing errors occurred when sessions were invalid:
- "500 Server Error" instead of "Session X not found"
- No way to discover active sessions after context compaction
- Claude couldn't recover from lost session_ids

This caused frustrating debugging where the actual error (missing session) was hidden
behind generic HTTP error messages.

## Changes Made

### 1. Better error handling (notebook_mcp_server.py)

Added `_check_response()` helper that extracts server error messages from JSON response
bodies before raising exceptions. Applied to all HTTP calls:
- _start_session_internal
- execute_code
- add_markdown
- edit_cell
- shutdown_session

Now errors show "Failed to execute code in session X: Session X not found" instead of
just "500 Server Error: Internal Server Error".

### 2. Add list_sessions MCP tool

New @mcp.tool that returns:
- List of active session IDs (UUIDs)
- Server status (URL, port, health)

This allows Claude to recover after context compaction by calling list_sessions to
find the active session_id and continue execution without resuming the notebook.

### 3. Updated docstrings

Enhanced execute_code and list_sessions docstrings to explain that:
- Sessions persist across context compaction
- Use list_sessions to recover lost session_ids
- No need to resume notebook, just continue executing

### 4. Integration tests (test_state_persistence.py)

Added TestErrorHandlingAndSessionDiscovery class with 2 tests:
- test_invalid_session_id_returns_clear_error: Verifies server errors are propagated
- test_list_sessions_mcp_integration: Verifies list_sessions works end-to-end

## Testing

All 15 tests in test_state_persistence.py pass, including 2 new integration tests.

## Files Modified

- scribe/notebook/notebook_mcp_server.py: Error handling + list_sessions tool
- tests/test_state_persistence.py: Integration tests

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## Context

After Codex reviewed commit 51d032b, it identified several edge cases and security
issues in the error handling implementation that needed to be addressed.

## Changes Made

### 1. Enhanced _check_response() error handling

**Multiple error field support**:
- Now checks "error", "detail", and "message" fields in JSON responses
- Prevents loss of informative error messages from different server APIs
- Line 50-58: Added fallback chain for error extraction

**Empty/non-JSON response handling**:
- Returns empty dict for 204 No Content and empty responses
- Catches JSON parsing errors on success responses and returns empty dict
- Prevents JSONDecodeError crashes on valid but non-JSON responses
- Line 62-72: Added defensive handling for edge cases

**Improved error messages**:
- Includes HTTP status code in error messages
- Better fallback when no error details available
- Line 60: Format is now "Failed to X (HTTP 500): error message"

### 2. Security: Redacted tokens in list_sessions

**Token exposure prevention**:
- Redacts auth token from vscode_url before returning
- Prevents token leakage in logs, transcripts, and tool outputs
- Line 811-813: Splits URL and replaces token with "<redacted>"

**Documentation**:
- Added note about stale sessions being best-effort
- Updated docstring to mention URL redaction

### 3. Comprehensive unit tests for _check_response()

Added 8 unit tests (TestCheckResponse class) covering:
- Success with JSON: Normal happy path
- Success with empty content: 204 No Content edge case
- Success with non-JSON: Plain text response edge case
- Error with "error" field: Primary error field
- Error with "detail" field: FastAPI/Django style errors
- Error with "message" field: Alternative error field
- Error with non-JSON: Plain text error responses
- Error with no content: Empty error responses

All tests use mocks to avoid server dependencies and verify exact behavior.

## Testing

All 23 tests pass:
- 15 existing integration tests (unchanged)
- 8 new unit tests for _check_response()

Tests verify:
✓ Multiple error fields are checked in correct order
✓ Empty and non-JSON responses don't crash
✓ Error messages include HTTP status codes
✓ Token redaction works correctly

## Codex Review Findings Addressed

[HIGH] JSON handling bug → Fixed with empty content check and try/except
[MEDIUM] Limited error fields → Now checks error/detail/message
[MEDIUM] Token exposure → Redacted in list_sessions output
Added comprehensive unit test coverage

## Files Modified

- scribe/notebook/notebook_mcp_server.py: Enhanced error handling + token redaction
- tests/test_state_persistence.py: Added 8 unit tests

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added empty __init__.py file to tests directory to ensure proper Python package
structure and test discovery.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Redact auth token from vscode_url to prevent token leakage in logs/transcripts
if status.get("vscode_url") and "?token=" in status["vscode_url"]:
base_url = status["vscode_url"].split("?token=")[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would discard all query params after token
FWIW I think any URL manipulations would be easier to do with urllib.parse

"""
global _server_port, _server_token, _server_url, _server_process, _active_sessions

state = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert to BaseModel maybe?

bronson-apollo and others added 7 commits January 28, 2026 21:06
Root cause: list_sessions() wasn't calling ensure_server_running() before
returning sessions. After MCP restart (compaction), _active_sessions was
empty because state was never loaded from disk.

Fix: Add ensure_server_running() call at start of list_sessions() to
restore state from persisted state file.

All 29 tests now pass including:
- test_list_sessions_then_execute_after_compaction
- test_multiple_sessions_across_compaction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New test classes:
- TestBackwardCompatibility: Tests v1 state format migration and future version handling
- TestServerFailureScenarios: Tests server death between operations, dead kernel, unreachable external server

All tests pass, confirming existing implementation handles these edge cases correctly.

Test count: 34 total (was 29)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…right support

Split the monolithic test file (1740 lines) into maintainable components:

- tests/conftest.py: Shared fixtures (reset_mcp_module, make_claude_options, etc.)
- tests/test_state_persistence_unit.py: 17 fast unit tests (run in 0.03s)
- tests/test_state_persistence_integration.py: 17 integration tests (require real servers)

Key fixes:
- Added pyrightconfig.json to override parent apex directory's config
  (pyright was loading /Users/bronson/apex/pyrightconfig.json instead of local settings)
- Added py.typed markers for proper package typing
- All 34 tests preserved with identical behavior
- All pyright errors resolved without using ignore comments

Root cause of pyright issues: Pyright searches upward for config files and was
finding the parent apex project's pyrightconfig.json, which had different search
paths that didn't include our scribe source directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added structlog>=24.0.0 dependency
- Converted all print(file=sys.stderr) calls to structured logging:
  - notebook_mcp_server.py: 4 prints → logger.info/warning/debug
  - notebook_sever_handlers.py: 8 prints → logger.exception (with traceback)
  - _notebook_server_utils.py: 1 print → logger.info
- Fixed pre-existing type error: provider: str = None → str | None = None

Structured logging provides:
- Consistent log format with key-value pairs
- Automatic exception traceback capture with logger.exception()
- Better integration with log aggregation tools

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New tests (10 added, 27 total unit tests):
- TestStateFileMigration: 3 tests for load_state() edge cases
- TestPortFinding: 2 tests for find_safe_port()
- TestExternalServerFailures: 2 tests for SCRIBE_PORT/SCRIBE_TOKEN failures
- TestCleanupHandlers: 3 tests for cleanup_scribe_server()

Port range change:
- Changed default from 20000-30000 to 35000-45000
- Prevents conflicts with upstream scribe installations

All tests pass - no bugs found in these edge cases (implementation is robust).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Bugs Found and Fixed

### 1. Kernel Not Ready Before Execution (CRITICAL)
The `start_session` method started a kernel but didn't wait for it to be
ready before allowing code execution. This caused execute requests to hang
indefinitely waiting for a response from an uninitialized kernel.

Fix: Added `client.wait_for_ready(timeout=60)` after kernel startup in
`notebook_server.py:start_session()`.

### 2. clear_state() Crash During Cleanup
The `clear_state()` function called `_get_state_file()` which requires
SCRIBE_SESSION_ID to be set. During atexit cleanup, the environment
variable might not be set (especially after test fixtures clean up),
causing a RuntimeError crash.

Fix: Wrapped `_get_state_file()` call in try/except to handle
RuntimeError gracefully in `notebook_mcp_server.py:clear_state()`.

### 3. Test Isolation (pkill was dangerous)
The `cleanup_jupyter_processes` fixture used `pkill -f` which would kill
ALL scribe servers including ones the user is running for actual work.

Fix: Changed `reset_mcp_module` fixture to track server processes and
clean up only the specific processes it started.

## New Test Files

- `test_state_file_corruption.py` (17 tests): Truncated JSON, empty files,
  wrong schema, null values, extra fields, mixed session formats, read errors

- `test_execution_edge_cases.py` (12 tests): Long-running code, large stdout,
  binary/unicode output, memory allocation, CPU-intensive code, special chars

## Test Coverage

- 56 unit tests passing (was 27)
- Integration tests improved with proper server tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 5 new test files covering edit cell, authentication, kernel lifecycle,
notebook file edge cases, and race conditions:

- test_edit_cell_edge_cases.py (8 tests): negative indices, bounds checking,
  empty/long content, special characters, cell type handling
- test_authentication.py (11 tests): missing auth, wrong schemes, invalid
  tokens, valid auth flows, auth persistence
- test_kernel_lifecycle.py (8 tests): immediate execution after start,
  rapid requests, error recovery, import handling
- test_notebook_file_edge_cases.py (7 tests): notebook deletion during
  session, path edge cases (spaces, special chars, long names)
- test_race_conditions.py (6 tests): concurrent sessions, concurrent
  executions, lifecycle races, concurrent edits

Key findings:
- Jupyter Server accepts both 'token' and 'Bearer' auth schemes
- Concurrent kernel operations queue sequentially (timeouts expected)
- Race condition tests focus on kernel survival, not concurrent success

Also fixes test_invalid_session_id_returns_clear_error to handle
varying error formats when reconnecting to existing servers.

Total test count: 121 tests (120 pass, 1 flaky due to port exhaustion)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants