@nilskroe commented Jan 31, 2026

Summary

This PR addresses critical performance issues for long chat sessions:

Performance Optimizations

  • O(1) sync during streaming - During streaming, only the last message changes, so the sync now checks just the last message instead of looping through all of them. A full sync runs only when message IDs change or on the first sync.
  • LRU cache eviction - Tracks recently used sub-chats (max 10 cached) and automatically evicts the oldest sub-chat caches when the threshold is exceeded, preventing unbounded memory growth.
  • Better cache cleanup - Clears textPartCache and messageStructureCache entries when messages are removed.
  • Deep clone nested objects - Fixed shallow cloning that omitted the output, result, and error fields in message sync, ensuring Jotai detects all changes.
  • Eliminate duplicate token calculation - Replaced an O(n) useMemo in active-chat.tsx with a centralized messageTokenDataAtom that uses caching for O(1) lookups during streaming.

Stream Lifecycle Fixes

  • Abort streams on cache eviction - When a sub-chat is evicted from LRU cache, its active stream is now aborted first, preventing background streams from consuming resources.
  • Proper cleanup on chat deletion - agentChatStore.delete() now calls abort() first to ensure streams are stopped before removing references, preventing orphaned tRPC subscriptions.

Performance Impact

| Metric | Before | After |
| --- | --- | --- |
| Streaming sync | O(n) | O(1) |
| Token calculation | O(n) per render | O(1) cached |
| Memory (sub-chats) | Unbounded | Max 10 cached |
| Long sessions | Memory leak | Bounded |
| Background streams | Continue after eviction | Properly aborted |

Test Plan

  • Open a long chat session (100+ messages)
  • Verify streaming is responsive (no lag during AI responses)
  • Switch between multiple chats and verify memory doesn't grow unbounded
  • Create new chats within a worktree and verify no lag
  • Verify token counts display correctly after streaming completes
  • Open 10+ sub-chats, verify oldest ones are evicted and streams stopped

Implementation Details

1. O(1) sync during streaming (was O(n)):
   - During streaming, only the last message changes
   - The sync now checks only the last message instead of all messages
   - Full sync happens only when IDs change or on the first sync
     (see the first sketch after this list)

2. O(1) LRU cache eviction for sub-chats:
   - Uses the Map's insertion order for O(1) LRU operations
   - Tracks the most recent sub-chat to skip redundant updates during streaming
   - Automatically evicts the oldest sub-chat caches when the threshold is exceeded (10 max)
   - Prevents unbounded memory growth in long-running sessions
     (see the second sketch after this list)

3. O(1) textPartCache cleanup:
   - Tracks cache keys per message for instant cleanup
   - No more O(n) iteration through all keys when removing a message

4. Better cache cleanup in clearSubChatCaches():
   - Clears textPartCache entries using the tracked keys
   - Clears messageStructureCache entries
     (items 3 and 4 are covered by the third sketch after this list)
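
A minimal sketch of item 1's streaming fast path, assuming a hypothetical `Message` shape and `cloneMessage` helper; the real sync logic lives in the message store and will differ in detail:

```typescript
interface Message {
  id: string
  parts: unknown[]
}

// Illustrative shallow clone; the real sync also clones nested fields
// (see the deep-clone commit further down).
const cloneMessage = (m: Message): Message => ({ ...m, parts: [...m.parts] })

function syncMessages(prev: Message[], next: Message[]): Message[] {
  // O(1) check: during streaming the list keeps the same length and the same
  // tail ID -- only the tail's content mutates.
  const sameShape =
    prev.length > 0 &&
    prev.length === next.length &&
    prev[prev.length - 1].id === next[next.length - 1].id

  if (!sameShape) {
    // First sync, or a message was added/removed: full O(n) sync.
    return next.map(cloneMessage)
  }

  // Streaming fast path: re-clone only the last message instead of all of them.
  const synced = prev.slice()
  synced[synced.length - 1] = cloneMessage(next[next.length - 1])
  return synced
}
```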
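Item 2's LRU can be built directly on a `Map`, whose keys iterate in insertion order. This is a sketch with assumed names (`SubChatCache`, `abortSubChatStream`), not the actual store code:

```typescript
type SubChatCache = Map<string, unknown> // placeholder for the cached data

const MAX_CACHED_SUBCHATS = 10
const subChatCaches = new Map<string, SubChatCache>()

// Stub; in the real code this would be wired to agentChatStore.abort()
// (see the stream-lifecycle commit below).
const abortSubChatStream = (subChatId: string): void => {}

function touchSubChat(subChatId: string, cache: SubChatCache): void {
  // Delete + set moves the key to the end of the Map's insertion order, so
  // the first key is always the least recently used. Both operations are O(1).
  subChatCaches.delete(subChatId)
  subChatCaches.set(subChatId, cache)

  if (subChatCaches.size > MAX_CACHED_SUBCHATS) {
    const oldest = subChatCaches.keys().next().value as string
    abortSubChatStream(oldest) // stop any active stream before dropping it
    subChatCaches.delete(oldest)
  }
}
```

Using the Map itself as the recency order avoids a separate linked list or timestamp sort, which is what makes both the touch and the eviction O(1).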
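And a sketch of the per-message key tracking behind items 3 and 4; the cache names mirror the description above, but the shapes and helper names are assumptions:

```typescript
const textPartCache = new Map<string, string>()
const messageStructureCache = new Map<string, unknown>()

// messageId -> the textPartCache keys written for that message.
const textPartKeysByMessage = new Map<string, Set<string>>()

function cacheTextPart(messageId: string, key: string, value: string): void {
  textPartCache.set(key, value)
  let keys = textPartKeysByMessage.get(messageId)
  if (!keys) textPartKeysByMessage.set(messageId, (keys = new Set()))
  keys.add(key)
}

function clearMessageCaches(messageId: string): void {
  // O(keys for this message) instead of scanning every key in the cache.
  for (const key of textPartKeysByMessage.get(messageId) ?? []) {
    textPartCache.delete(key)
  }
  textPartKeysByMessage.delete(messageId)
  messageStructureCache.delete(messageId)
}
```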

Performance impact:
- Streaming sync: O(n) -> O(1) where n = message count
- LRU operations: O(n) -> O(1)
- Cache cleanup: O(total keys) -> O(keys for message)
- Memory: Bounded by MAX_CACHED_SUBCHATS (10) instead of unbounded
@nilskroe force-pushed the fix/chat-performance-memory-leaks branch from 9dff50f to ec4ca66 on January 31, 2026 at 01:27

1. Deep clone nested objects in message sync
   - Previously only the `input` field was shallow-cloned
   - Now `output`, `result`, and `error` are cloned as well, so Jotai
     detects all changes and components re-render properly
     (see the first sketch after this list)

2. Eliminate duplicate token calculation
   - active-chat.tsx had an O(n) useMemo that iterated over all messages
   - It now uses the centralized messageTokenDataAtom from the message store
   - The atom uses caching to achieve O(1) lookups during streaming
   - Added totalCostUsd to the centralized atom for full parity
     (see the second sketch after this list)
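
A sketch of the cloning fix, assuming a hypothetical tool-part shape with `input`, `output`, `result`, and `error` fields; each nested object gets a fresh reference so Jotai's reference-equality check sees the update:

```typescript
interface ToolPart {
  input?: Record<string, unknown>
  output?: Record<string, unknown>
  result?: Record<string, unknown>
  error?: Record<string, unknown>
}

function clonePart(part: ToolPart): ToolPart {
  // Spreading each nested object produces a new reference, so Jotai's
  // default Object.is comparison detects the change and re-renders.
  return {
    ...part,
    input: part.input && { ...part.input },
    output: part.output && { ...part.output },
    result: part.result && { ...part.result },
    error: part.error && { ...part.error },
  }
}
```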
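And a sketch of a cached derived atom in the spirit of `messageTokenDataAtom`; `messagesAtom`, the message fields, and `computeTokenData` are all stand-ins for the real message-store internals:

```typescript
import { atom } from "jotai"

interface Msg {
  id: string
  text: string
  streaming?: boolean
}

// Stand-in for the comparatively expensive per-message computation.
const computeTokenData = (m: Msg) => ({
  tokens: Math.ceil(m.text.length / 4),
  costUsd: 0, // the real atom also derives cost, hence totalCostUsd
})

const messagesAtom = atom<Msg[]>([])
const tokenCache = new Map<string, { tokens: number; costUsd: number }>()

export const messageTokenDataAtom = atom((get) => {
  let totalTokens = 0
  let totalCostUsd = 0
  for (const m of get(messagesAtom)) {
    // Completed messages hit the cache (an O(1) lookup); only the streaming
    // tail message is recomputed on each update.
    let data = m.streaming ? undefined : tokenCache.get(m.id)
    if (!data) {
      data = computeTokenData(m)
      if (!m.streaming) tokenCache.set(m.id, data)
    }
    totalTokens += data.tokens
    totalCostUsd += data.costUsd
  }
  return { totalTokens, totalCostUsd }
})
```

Components that previously ran their own useMemo over all messages can instead read this single atom, so the work is done once per update rather than once per consumer.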
1. Add an abort() method to agentChatStore
   - Calls chat.stop() to abort any active stream
   - Sets the manuallyAborted flag to prevent completion sounds
   - Handles errors gracefully (the chat may already be stopped)

2. Update delete() to call abort() first
   - Ensures the stream is stopped before references are removed
   - Prevents orphaned tRPC subscriptions

3. LRU cache eviction now aborts streams
   - When the oldest sub-chat is evicted, its stream is aborted first
   - Prevents background streams from continuing after the cache is cleared
   - Saves CPU, memory, and network bandwidth
     (all three are sketched below)
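
A minimal sketch of that lifecycle wiring, assuming a hypothetical store shape; the real agentChatStore and its chat handles will differ:

```typescript
interface ChatHandle {
  chat: { stop: () => void } // the underlying chat/stream controller
  manuallyAborted: boolean
}

const agentChatStore = {
  chats: new Map<string, ChatHandle>(),

  abort(id: string): void {
    const handle = this.chats.get(id)
    if (!handle) return
    handle.manuallyAborted = true // suppress the completion sound
    try {
      handle.chat.stop() // aborts the active stream / tRPC subscription
    } catch {
      // the chat may already be stopped; ignore
    }
  },

  delete(id: string): void {
    this.abort(id) // stop the stream before dropping the last references
    this.chats.delete(id)
  },
}
```

Evicting without aborting would leave the subscription streaming into a cache nobody reads; aborting first is what turns the LRU cap into a real resource bound.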