Fix: respect --max_seq_len for sliding window models with custom kv cache + sdpa#219

Open
rhn19 wants to merge 1 commit into huggingface:main from rhn19:fix/max-seq-len-sliding-window-ring-cache

Conversation


@rhn19 rhn19 commented Feb 20, 2026

Fixes #218

Changes

  • _prepare_export_inputs: use max_dim = max_seq_len - 1 instead of
    min(max_seq_len, sliding_window) - 1 when the ring cache is active
  • export: bypass TorchExportableModuleWithHybridCache.__init__ to prevent
    StaticSlidingWindowLayer from baking a <= sliding_window guard into the exported graph

Tested on

  • google/gemma-3-1b-it with --max_seq_len 1024 --use_custom_kv_cache --use_custom_sdpa,
    using a prompt longer than 512 tokens (the model's sliding window)


Development

Successfully merging this pull request may close these issues.

[Bug] --max_seq_len ignored for sliding window models with --use_custom_kv_cache --use_custom_sdpa (cache capped at sliding_window size)