Add OAuth user context caching + Refresh external tokens #110

bgaidioz · 2025-10-06T13:16:03Z

Description

Each tool invocation fetches the user context to pass as a parameter. This currently triggers a call to the AS on every request. In this changeset, a five-minute cache is added to minimize remote AS calls.

The changeset also adds the logic to trigger an external token refresh when the external access token expires. This implies a schema change to the OAuth DB. This doesn't implement a schema migration.

We finally set the MCP access token duration to 365 days while the possibility of refreshing isn't provided by the mcp lib.

Type of Change

🐛 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📝 Documentation update
🔧 Refactoring (no functional changes, no api changes)
⚡ Performance improvement
🧪 Test improvement
🔒 Security fix

Testing

Tests pass locally with uv run pytest
Linting passes with uv run ruff check .
Code formatting passes with uv run black --check .
Type checking passes with uv run mypy .
Added tests for new functionality (if applicable)
Updated documentation (if applicable)

Security Considerations

This change does not introduce security vulnerabilities
Sensitive data handling reviewed (if applicable)
Policy enforcement implications considered (if applicable)

Breaking Changes

No.

Additional Notes

When calling within cache duration, the cache is used:

INFO:mcp.server.lowlevel.server:Processing request of type CallToolRequest
...
...
...
DEBUG:mxcp.sdk.auth.middleware:External token mapping found
DEBUG:mxcp.sdk.auth.middleware:🎯 Cache HIT - using cached user context for token mcp_336188be92afbb05...
DEBUG:mxcp.sdk.auth.middleware:Successfully retrieved user context for ben (provider: keycloak)
...
...
...
INFO:mxcp.server.interfaces.server.mcp:Authenticated user: ben (provider: keycloak)

When the cache is expired, the value is ignored and a call is issued to the external AS:

INFO:mcp.server.lowlevel.server:Processing request of type CallToolRequest
...
...
...
DEBUG:mxcp.sdk.auth.middleware:⏰ Cache EXPIRED - removed entry for token mcp_f03722757d9e2f2e...
DEBUG:mxcp.sdk.auth.middleware:🔄 Cache MISS - calling KeycloakOAuthHandler.get_user_context() - Provider API call #6610
...
...
...
INFO:httpx:HTTP Request: GET http://localhost:8080/realms/demo/protocol/openid-connect/userinfo "HTTP/1.1 200 OK"
...
...
...
DEBUG:mxcp.sdk.auth.middleware:💾 Cache STORE - cached user context for token mcp_f03722757d9e2f2e... (TTL: 300s)
DEBUG:mxcp.sdk.auth.middleware:Successfully retrieved user context for ben (provider: keycloak)
DEBUG:mxcp.sdk.auth.middleware:Executing _body for authenticated user (provider: keycloak)
INFO:mxcp.server.interfaces.server.mcp:Calling tool get_user_info with: {}
INFO:mxcp.server.interfaces.server.mcp:Authenticated user: ben (provider: keycloak)

Here's the log when calling a tool after more than five minutes (cache expiration) and the keycloak access token has expired in the meantime.

The cached entry is discarded because it has expired.
The call to keycloak to read again the user context fails because the external (keycloak) token has expired.
The keycloak access token is refreshed using the AS refresh token.
The call to read the user context succeeds.
The MCP tool is called with a recent user context.

INFO:mcp.server.lowlevel.server:Processing request of type CallToolRequest
...
...
...
DEBUG:mxcp.sdk.auth.middleware:⏰ Cache EXPIRED - removed entry for token mcp_336188be92afbb05...
DEBUG:mxcp.sdk.auth.middleware:🔄 Cache MISS - calling KeycloakOAuthHandler.get_user_context() - Provider API call #8377
...
...
...
INFO:httpx:HTTP Request: GET http://localhost:8080/realms/demo/protocol/openid-connect/userinfo "HTTP/1.1 401 Unauthorized"
...
...
...
ERROR:mxcp.sdk.auth.providers.keycloak:Failed to get user info: 401
...
...
...
INFO:mxcp.sdk.auth.middleware:🔄 Access token expired, attempting refresh...
...
...
...
INFO:httpx:HTTP Request: POST http://localhost:8080/realms/demo/protocol/openid-connect/token "HTTP/1.1 200 OK"
...
...
...
INFO:mxcp.sdk.auth.base:Successfully refreshed external token for MCP token: mcp_336188... (new expires_at: 1759761783.1377778)
INFO:mxcp.sdk.auth.middleware:🔄 Successfully refreshed external token: eyJhbGciOiJSUzI1NiIs...
INFO:mxcp.sdk.auth.middleware:✅ Token refresh successful, retrying user context
...
...
...
INFO:httpx:HTTP Request: GET http://localhost:8080/realms/demo/protocol/openid-connect/userinfo "HTTP/1.1 200 OK"
...
...
...
DEBUG:mxcp.sdk.auth.middleware:💾 Cache STORE - cached user context for token mcp_336188be92afbb05... (TTL: 300s)
DEBUG:mxcp.sdk.auth.middleware:Successfully retrieved user context for ben (provider: keycloak)
DEBUG:mxcp.sdk.auth.middleware:Executing _body for authenticated user (provider: keycloak)
INFO:mxcp.server.interfaces.server.mcp:Calling tool get_user_info with: {}
INFO:mxcp.server.interfaces.server.mcp:Authenticated user: ben (provider: keycloak)

Note

Add 5‑min user context caching and implement external OAuth token refresh across providers, with config/schema updates and longer MCP token TTL.

Auth SDK
- Add 5‑minute user context caching in mxcp.sdk.auth.middleware with cache hit/miss/expiry logic.
- Implement external access token refresh flow in mxcp.sdk.auth.base and mxcp.sdk.auth.persistence with new expires_at/refresh handling in _types.
Providers
- Update OAuth handlers for keycloak, google, github, atlassian, and salesforce to support token refresh and resilient userinfo retrieval.
Server
- Integrate refreshed user context handling in mxcp.server.interfaces.server.mcp and auth helpers.
- Increase MCP access token duration; wire through in core auth helpers.
Config & Schema
- Extend config types in server/core/config/_types.py and update schemas/mxcp-config-schema-1.json for new OAuth fields.
- Add example configs under examples/*/config.yml.
Docs
- Update docs/guides/authentication.md to document caching and token refresh behavior.

^{Written by Cursor Bugbot for commit 0365b8d. This will update automatically on new commits. Configure here.}

src/mxcp/sdk/auth/middleware.py

src/mxcp/sdk/auth/base.py

src/mxcp/sdk/auth/middleware.py

examples/keycloak/config.yml

src/mxcp/sdk/auth/providers/atlassian.py

- Add thread-safe caching to prevent repeated provider API calls - Cache user context with configurable TTL (default 5 minutes) - Implement automatic cleanup of expired cache entries - Add comprehensive logging for cache hits/misses/cleanup - Maintain backward compatibility with existing middleware interface This addresses the performance issue where every MCP tool execution was making HTTP requests to OAuth providers (Keycloak, Google, etc.) to fetch the same user information repeatedly. Performance improvement: ~95% reduction in provider API calls for typical usage patterns within the cache TTL window.

wip

Problem: - Global lock in GeneralOAuthAuthorizationServer caused severe bottleneck - Token refresh operations held lock during network I/O (2-30+ seconds) - All auth operations (load, revoke, refresh) blocked during any single operation - Poor concurrency and performance degradation under load Solution: - Added per-token lock infrastructure (_token_locks dict + helper methods) - Refactored 4 critical methods to use per-token locks: * refresh_external_token: Eliminates network I/O bottleneck (CRITICAL FIX) * load_access_token: Allows concurrent lookups for different tokens * revoke_token: Isolates revocation to specific token * exchange_authorization_code: Split into global + per-token phases Key Improvements: 1. Parallelism: Multiple tokens can be refreshed/loaded simultaneously 2. Race condition prevention: Double-check pattern in refresh prevents duplicate work 3. Memory safety: Comprehensive lock cleanup prevents unbounded growth 4. Monitoring: Detailed logging tracks lock dictionary size and operations Lock Cleanup Strategy: - Locks removed when tokens expire (cleanup_expired_mappings) - Locks removed on explicit revocation (revoke_token) - Locks removed when tokens not found (load_access_token) - Stats logged periodically for monitoring Implementation Details: - Follows proven pattern from AuthenticationMiddleware - Global lock still used for shared structures (clients, auth codes) - Per-token locks for operations on specific tokens only - Lock warnings at 100-entry intervals detect potential leaks Testing Notes: - Test concurrent refresh of different tokens (should parallelize) - Test concurrent refresh of same token (should dedupe) - Monitor lock count vs token count over time - Verify cleanup under normal and error conditions

Problem: - Race condition in check_authentication() allowed duplicate API calls - Multiple concurrent requests for same token all saw cache miss - Each request made separate get_user_context() API call to OAuth provider - Cache lock only protected dictionary access, not check-then-act pattern - Resulted in wasted API calls, increased latency, potential rate limiting Race Condition Timeline: T0: Request A checks cache → miss T1: Request B checks cache → miss T2: Request C checks cache → miss T3: All three make API calls to OAuth provider (wasteful!) Solution: - Implemented double-checked locking pattern using existing per-token locks - Reuses _refresh_locks infrastructure (same locks used for token refresh) - Quick cache check without lock (optimistic path) - On miss, acquire per-token lock and double-check cache - Only first request makes API call, others wait and use cached result Implementation: 1. Initial cache check (line 319): Fast path without lock 2. Lock acquisition (line 325): Get per-token refresh lock on miss 3. Double-check (lines 328-332): Check cache again after lock acquired 4. API call (lines 334-384): Only if still a cache miss 5. Cache result (lines 348-350): Store for other waiting requests 6. Log stampede prevention (lines 385-389): Track deduplication Benefits: - Eliminates duplicate API calls under concurrent load - Reduces OAuth provider load and avoids rate limiting - Improves response time for waiting requests (cached result) - No new locks needed - reuses existing _refresh_locks - Consistent with token refresh pattern in same class Additional Fix: - Added caching after successful token refresh retry (lines 373-376) - Ensures refreshed tokens also benefit from caching Testing Notes: - Concurrent requests for same token should only trigger one API call - Second/third requests should log 'Cache HIT (after lock wait)' - Monitor API call frequency under load

Fix mypy type error on line 993 where current_external_token (str | None) was being sliced without null check. Changed condition from: if current_external_token != original_external_token: To: if current_external_token and current_external_token != original_external_token: This ensures current_external_token is not None before slicing it in the log message, satisfying the type checker while maintaining the same logic.

CRITICAL BUG FIX - This was causing complete hangs in production. Problem: - When get_user_context() returned 401, we attempted token refresh - But we were STILL HOLDING the refresh_lock when calling _attempt_token_refresh() - _attempt_token_refresh() tries to acquire the SAME lock - asyncio.Lock is NOT reentrant → DEADLOCK forever Deadlock Flow: 1. Line 334: Get refresh_lock for token 2. Line 336: ACQUIRE lock (async with refresh_lock) 3. Line 346: get_user_context() → throws 401 HTTPException 4. Line 368: Call _attempt_token_refresh() WHILE HOLDING LOCK 5. Inside _attempt_token_refresh(): - Tries to acquire SAME lock again → DEADLOCK Solution: - Moved HTTPException handling OUTSIDE the async with block - Lock is automatically released when exiting the context (on exception) - _attempt_token_refresh() can now acquire the lock successfully - Exception handlers at line 359+ run with lock released Key Changes: - Restructured try/except to wrap the async with block - 401 handling (line 359-392) executes AFTER lock release - Maintains stampede prevention for successful path - Preserves all caching and retry logic Testing: - Token expiration (401 error) should now trigger refresh successfully - No more hangs when external tokens expire - Concurrent requests still prevented by lock in success path

Problem: After successful token refresh, we were making TWO API calls to get_user_context(): 1. Inside _attempt_token_refresh() - fetches and caches user context 2. In the exception handler - fetches again and caches again (duplicate!) This wasted time, increased OAuth provider load, and was unnecessary. Flow Before: 401 Error → Refresh Token → Fetch User Context (cache it) → Return to handler → Fetch User Context AGAIN (overwrite cache) Flow After: 401 Error → Refresh Token → Fetch User Context (cache it) → Return to handler → Check cache → Use cached result ✓ Solution: - Check cache first after token refresh (line 378-380) - Only fetch if cache is empty (fallback, lines 382-395) - Log when using cached vs fetching (observability) Benefits: - Eliminates duplicate OAuth provider API call - Faster response time (one less network round-trip) - Reduces load on OAuth provider - Better user experience The fallback path ensures robustness - if caching fails in _attempt_token_refresh for any reason, we still fetch it.

… token refresh

src/mxcp/sdk/auth/base.py

Remove per-token locking infrastructure from GeneralOAuthAuthorizationServer since the middleware already serializes refresh operations per token. Key changes: - Removed _token_locks, _token_locks_lock, and _locks_to_remove - Removed _get_token_lock(), _mark_token_lock_for_removal(), _cleanup_marked_locks() - Simplified token operations to use only the global lock for shared state - All network/DB I/O now happens outside locks for better performance Benefits: - Eliminates double-locking overhead (middleware + backend) - Reduces code complexity by 115 lines - Prevents potential deadlock scenarios from complex lock hierarchies - Clearer separation of concerns: middleware handles serialization, backend handles logic - Improved performance by minimizing lock hold times The global lock is sufficient for protecting shared data structures while the middleware's per-token refresh locks prevent concurrent refresh attempts. Token refresh operations are now explicitly documented as requiring external serialization by the caller.

cursor · 2025-11-13T13:04:13Z

src/mxcp/sdk/auth/providers/atlassian.py

+                self.token_url,
+                data=refresh_data,
+                headers={"Content-Type": "application/x-www-form-urlencoded"},
+            )


Bug: Inconsistent Request Format Blocks Token Refresh

The refresh_access_token method uses data= with "Content-Type": "application/x-www-form-urlencoded" for the Atlassian token endpoint, but the existing exchange_code method (line 139) uses json= with "Content-Type": "application/json" for the same token_url. Atlassian's token endpoint likely expects JSON format for both operations, so the refresh request will fail with this inconsistent format.

This comment was marked as outdated.

Sign in to view

bgaidioz force-pushed the feature/oauth-user-context-caching branch from f474c13 to 31d822c Compare October 6, 2025 13:42

bgaidioz force-pushed the feature/oauth-refresh-tokens branch from 7ff502b to a282c2b Compare October 6, 2025 14:23

bgaidioz changed the base branch from feature/oauth-user-context-caching to main October 6, 2025 14:23

bgaidioz changed the title ~~Refresh external tokens~~ Add OAuth user context caching + Refresh external tokens Oct 6, 2025

This comment was marked as outdated.

Sign in to view

bgaidioz force-pushed the feature/oauth-refresh-tokens branch from 9bf103d to 4e9b05a Compare October 13, 2025 09:28

This comment was marked as outdated.

Sign in to view

bgaidioz force-pushed the feature/oauth-refresh-tokens branch from 4e9b05a to 644def6 Compare October 13, 2025 09:51

miguelbranco80 approved these changes Oct 17, 2025

View reviewed changes

bgaidioz force-pushed the feature/oauth-refresh-tokens branch from 644def6 to bed1395 Compare October 17, 2025 13:53

This comment was marked as outdated.

Sign in to view

bgaidioz commented Oct 17, 2025

View reviewed changes

src/mxcp/sdk/auth/middleware.py Show resolved Hide resolved

cursor bot reviewed Oct 27, 2025

View reviewed changes

src/mxcp/sdk/auth/base.py Outdated Show resolved Hide resolved

src/mxcp/sdk/auth/middleware.py Show resolved Hide resolved

bgaidioz force-pushed the feature/oauth-refresh-tokens branch 2 times, most recently from 713085d to 27266cb Compare November 12, 2025 14:43

cursor bot reviewed Nov 12, 2025

View reviewed changes

examples/keycloak/config.yml Show resolved Hide resolved

src/mxcp/sdk/auth/providers/atlassian.py Show resolved Hide resolved

bgaidioz added 9 commits November 12, 2025 17:45

fixup! feat: implement user context caching in OAuth middleware

8e81117

wip

Fixed bug + addressed comment

4a636ef

Addressed comments

8c44929

Addressed comments

cf0021a

Addressed comments

db61762

Format...

59ea835

Addressed comments

523d8d7

Fixed ALTER TABLE into something more regular

a89b45e

bgaidioz added 15 commits November 12, 2025 17:45

Addressed comment

20bc3bb

Addressed comment about refresh_locks deletion

407f7c5

reformat

a391216

fmt

8fa9443

fmt

4a91d3d

Added Pydantic model for refresh response

5bbba2e

Preserve HTTP status codes in OAuth provider error handling to enable…

86d7faf

… token refresh

Removed test

cd296c9

Addressed comment

50cf392

Fix jira-oauth/config.yml

6d5ec3f

bgaidioz force-pushed the feature/oauth-refresh-tokens branch from c849a20 to 6d5ec3f Compare November 12, 2025 16:46

cursor bot reviewed Nov 12, 2025

View reviewed changes

src/mxcp/sdk/auth/base.py Show resolved Hide resolved

bgaidioz added 2 commits November 13, 2025 10:21

Clears the race condition in lock cleanup

4bfd286

cursor bot reviewed Nov 13, 2025

View reviewed changes

Add OAuth user context caching + Refresh external tokens #110

Are you sure you want to change the base?

Add OAuth user context caching + Refresh external tokens #110

Uh oh!

Conversation

bgaidioz commented Oct 6, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Security Considerations

Breaking Changes

Additional Notes

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot Nov 13, 2025

Choose a reason for hiding this comment

Bug: Inconsistent Request Format Blocks Token Refresh

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bgaidioz commented Oct 6, 2025 •

edited by cursor bot

Loading