Feat/chat rolling cache #526
Open
Chat Sampler: Implement Rolling Cache for Infinite Conversations
Summary
Implements a rolling cache strategy in `ChatSampler` to support infinite multi-turn conversations. When the context window (cache) capacity is exceeded, the sampler efficiently truncates the oldest history while preserving the most recent turns and the new user prompt, allowing the conversation to continue indefinitely.

Problem Statement
Previously, `ChatSampler` utilized a fixed-size KV cache (defined by `cache_length`, default 4096). Once the cache filled up, a new turn would either fail validation (the `cache.is_full` check) or generation would halt immediately; the code carried a `TODO(epot): Support and test rolling cache.` marker for this limitation.

Solution
Implemented a robust "prompt-based" rolling cache mechanism within `ChatSampler.chat`.

Algorithm
1. Overflow detection: before each turn, compare `last_state.used_cache_length + new_prompt_length` against `cache_length`.
2. On overflow, re-render the full history in `self.turns` into a single string.
3. Truncate that string to the most recent `cache_length - 64` tokens (a safety buffer).
4. Set `last_state = None`. This instructs the underlying `Sampler` to discard the full KV cache and perform a fresh prefill on the truncated prompt.
5. If `print_stream` is enabled, a message is printed to notify the user that rolling occurred.
6. `self.turns` remains intact, preserving the logical history of the conversation (for record-keeping), even though the model's effective context window is truncated.

Impact
Verification
Verified that `Sampler.last_state` is reset correctly upon overflow, ensuring the `Sampler` prefills the new truncated context.