-
Notifications
You must be signed in to change notification settings - Fork 7
Add OAuth user context caching + Refresh external tokens #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f474c13 to
31d822c
Compare
7ff502b to
a282c2b
Compare
9bf103d to
4e9b05a
Compare
4e9b05a to
644def6
Compare
644def6 to
bed1395
Compare
713085d to
27266cb
Compare
- Add thread-safe caching to prevent repeated provider API calls - Cache user context with configurable TTL (default 5 minutes) - Implement automatic cleanup of expired cache entries - Add comprehensive logging for cache hits/misses/cleanup - Maintain backward compatibility with existing middleware interface This addresses the performance issue where every MCP tool execution was making HTTP requests to OAuth providers (Keycloak, Google, etc.) to fetch the same user information repeatedly. Performance improvement: ~95% reduction in provider API calls for typical usage patterns within the cache TTL window.
Problem: - Global lock in GeneralOAuthAuthorizationServer caused severe bottleneck - Token refresh operations held lock during network I/O (2-30+ seconds) - All auth operations (load, revoke, refresh) blocked during any single operation - Poor concurrency and performance degradation under load Solution: - Added per-token lock infrastructure (_token_locks dict + helper methods) - Refactored 4 critical methods to use per-token locks: * refresh_external_token: Eliminates network I/O bottleneck (CRITICAL FIX) * load_access_token: Allows concurrent lookups for different tokens * revoke_token: Isolates revocation to specific token * exchange_authorization_code: Split into global + per-token phases Key Improvements: 1. Parallelism: Multiple tokens can be refreshed/loaded simultaneously 2. Race condition prevention: Double-check pattern in refresh prevents duplicate work 3. Memory safety: Comprehensive lock cleanup prevents unbounded growth 4. Monitoring: Detailed logging tracks lock dictionary size and operations Lock Cleanup Strategy: - Locks removed when tokens expire (cleanup_expired_mappings) - Locks removed on explicit revocation (revoke_token) - Locks removed when tokens not found (load_access_token) - Stats logged periodically for monitoring Implementation Details: - Follows proven pattern from AuthenticationMiddleware - Global lock still used for shared structures (clients, auth codes) - Per-token locks for operations on specific tokens only - Lock warnings at 100-entry intervals detect potential leaks Testing Notes: - Test concurrent refresh of different tokens (should parallelize) - Test concurrent refresh of same token (should dedupe) - Monitor lock count vs token count over time - Verify cleanup under normal and error conditions
Problem: - Race condition in check_authentication() allowed duplicate API calls - Multiple concurrent requests for same token all saw cache miss - Each request made separate get_user_context() API call to OAuth provider - Cache lock only protected dictionary access, not check-then-act pattern - Resulted in wasted API calls, increased latency, potential rate limiting Race Condition Timeline: T0: Request A checks cache → miss T1: Request B checks cache → miss T2: Request C checks cache → miss T3: All three make API calls to OAuth provider (wasteful!) Solution: - Implemented double-checked locking pattern using existing per-token locks - Reuses _refresh_locks infrastructure (same locks used for token refresh) - Quick cache check without lock (optimistic path) - On miss, acquire per-token lock and double-check cache - Only first request makes API call, others wait and use cached result Implementation: 1. Initial cache check (line 319): Fast path without lock 2. Lock acquisition (line 325): Get per-token refresh lock on miss 3. Double-check (lines 328-332): Check cache again after lock acquired 4. API call (lines 334-384): Only if still a cache miss 5. Cache result (lines 348-350): Store for other waiting requests 6. Log stampede prevention (lines 385-389): Track deduplication Benefits: - Eliminates duplicate API calls under concurrent load - Reduces OAuth provider load and avoids rate limiting - Improves response time for waiting requests (cached result) - No new locks needed - reuses existing _refresh_locks - Consistent with token refresh pattern in same class Additional Fix: - Added caching after successful token refresh retry (lines 373-376) - Ensures refreshed tokens also benefit from caching Testing Notes: - Concurrent requests for same token should only trigger one API call - Second/third requests should log 'Cache HIT (after lock wait)' - Monitor API call frequency under load
Fix mypy type error on line 993 where current_external_token (str | None) was being sliced without null check. Changed condition from: if current_external_token != original_external_token: To: if current_external_token and current_external_token != original_external_token: This ensures current_external_token is not None before slicing it in the log message, satisfying the type checker while maintaining the same logic.
CRITICAL BUG FIX - This was causing complete hangs in production.
Problem:
- When get_user_context() returned 401, we attempted token refresh
- But we were STILL HOLDING the refresh_lock when calling _attempt_token_refresh()
- _attempt_token_refresh() tries to acquire the SAME lock
- asyncio.Lock is NOT reentrant → DEADLOCK forever
Deadlock Flow:
1. Line 334: Get refresh_lock for token
2. Line 336: ACQUIRE lock (async with refresh_lock)
3. Line 346: get_user_context() → throws 401 HTTPException
4. Line 368: Call _attempt_token_refresh() WHILE HOLDING LOCK
5. Inside _attempt_token_refresh():
- Tries to acquire SAME lock again → DEADLOCK
Solution:
- Moved HTTPException handling OUTSIDE the async with block
- Lock is automatically released when exiting the context (on exception)
- _attempt_token_refresh() can now acquire the lock successfully
- Exception handlers at line 359+ run with lock released
Key Changes:
- Restructured try/except to wrap the async with block
- 401 handling (line 359-392) executes AFTER lock release
- Maintains stampede prevention for successful path
- Preserves all caching and retry logic
Testing:
- Token expiration (401 error) should now trigger refresh successfully
- No more hangs when external tokens expire
- Concurrent requests still prevented by lock in success path
Problem: After successful token refresh, we were making TWO API calls to get_user_context(): 1. Inside _attempt_token_refresh() - fetches and caches user context 2. In the exception handler - fetches again and caches again (duplicate!) This wasted time, increased OAuth provider load, and was unnecessary. Flow Before: 401 Error → Refresh Token → Fetch User Context (cache it) → Return to handler → Fetch User Context AGAIN (overwrite cache) Flow After: 401 Error → Refresh Token → Fetch User Context (cache it) → Return to handler → Check cache → Use cached result ✓ Solution: - Check cache first after token refresh (line 378-380) - Only fetch if cache is empty (fallback, lines 382-395) - Log when using cached vs fetching (observability) Benefits: - Eliminates duplicate OAuth provider API call - Faster response time (one less network round-trip) - Reduces load on OAuth provider - Better user experience The fallback path ensures robustness - if caching fails in _attempt_token_refresh for any reason, we still fetch it.
c849a20 to
6d5ec3f
Compare
Remove per-token locking infrastructure from GeneralOAuthAuthorizationServer since the middleware already serializes refresh operations per token. Key changes: - Removed _token_locks, _token_locks_lock, and _locks_to_remove - Removed _get_token_lock(), _mark_token_lock_for_removal(), _cleanup_marked_locks() - Simplified token operations to use only the global lock for shared state - All network/DB I/O now happens outside locks for better performance Benefits: - Eliminates double-locking overhead (middleware + backend) - Reduces code complexity by 115 lines - Prevents potential deadlock scenarios from complex lock hierarchies - Clearer separation of concerns: middleware handles serialization, backend handles logic - Improved performance by minimizing lock hold times The global lock is sufficient for protecting shared data structures while the middleware's per-token refresh locks prevent concurrent refresh attempts. Token refresh operations are now explicitly documented as requiring external serialization by the caller.
| self.token_url, | ||
| data=refresh_data, | ||
| headers={"Content-Type": "application/x-www-form-urlencoded"}, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Inconsistent Request Format Blocks Token Refresh
The refresh_access_token method uses data= with "Content-Type": "application/x-www-form-urlencoded" for the Atlassian token endpoint, but the existing exchange_code method (line 139) uses json= with "Content-Type": "application/json" for the same token_url. Atlassian's token endpoint likely expects JSON format for both operations, so the refresh request will fail with this inconsistent format.
Description
Each tool invocation fetches the user context to pass as a parameter. This currently triggers a call to the AS on every request. In this changeset, a five-minute cache is added to minimize remote AS calls.
The changeset also adds the logic to trigger an external token refresh when the external access token expires. This implies a schema change to the OAuth DB. This doesn't implement a schema migration.
We finally set the MCP access token duration to 365 days while the possibility of refreshing isn't provided by the
mcplib.Type of Change
Testing
uv run pytestuv run ruff check .uv run black --check .uv run mypy .Security Considerations
Breaking Changes
No.
Additional Notes
When calling within cache duration, the cache is used:
When the cache is expired, the value is ignored and a call is issued to the external AS:
Here's the log when calling a tool after more than five minutes (cache expiration) and the keycloak access token has expired in the meantime.
Note
Add 5‑min user context caching and implement external OAuth token refresh across providers, with config/schema updates and longer MCP token TTL.
mxcp.sdk.auth.middlewarewith cache hit/miss/expiry logic.mxcp.sdk.auth.baseandmxcp.sdk.auth.persistencewith newexpires_at/refresh handling in_types.keycloak,google,github,atlassian, andsalesforceto support token refresh and resilient userinfo retrieval.mxcp.server.interfaces.server.mcpand auth helpers.server/core/config/_types.pyand updateschemas/mxcp-config-schema-1.jsonfor new OAuth fields.examples/*/config.yml.docs/guides/authentication.mdto document caching and token refresh behavior.Written by Cursor Bugbot for commit 0365b8d. This will update automatically on new commits. Configure here.