Fix: respect --max_seq_len for sliding window models with custom kv cache + sdpa #219
Open
rhn19 wants to merge 1 commit into huggingface:main
Fixes #218
Changes
- `_prepare_export_inputs`: use `max_dim = max_seq_len - 1` instead of `min(max_seq_len, sliding_window) - 1` when the ring cache is active (see the sketch below)
- `export`: bypass `TorchExportableModuleWithHybridCache.__init__` to prevent `StaticSlidingWindowLayer` from baking a `<= sliding_window` guard into the exported graph

Tested on

`google/gemma-3-1b-it` with `--max_seq_len 1024 --use_custom_kv_cache --use_custom_sdpa`, prompt > 512 tokens
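
For illustration, a minimal sketch of the bound change, using a hypothetical stand-in for `_prepare_export_inputs` (the real function's signature and surrounding export wiring in the repo differ; only the `max_dim` computation mirrors this PR):

```python
import torch


def _prepare_export_inputs_sketch(
    max_seq_len: int,
    sliding_window: int | None,
    ring_cache_active: bool,
):
    """Hypothetical stand-in for _prepare_export_inputs."""
    if ring_cache_active:
        # After this fix: a ring (rolling) KV cache can serve prompts longer
        # than the window, so the export bound follows --max_seq_len.
        max_dim = max_seq_len - 1
    else:
        # Previous behavior, still correct for a static sliding-window cache,
        # which genuinely cannot hold more than `sliding_window` tokens.
        bound = min(max_seq_len, sliding_window) if sliding_window else max_seq_len
        max_dim = bound - 1

    # Dynamic sequence dimension for torch.export, capped at max_dim.
    seq_len_dim = torch.export.Dim("seq_length_dim", max=max_dim)
    example_input_ids = torch.zeros((1, 3), dtype=torch.long)
    dynamic_shapes = {"input_ids": {1: seq_len_dim}}
    return example_input_ids, dynamic_shapes
```

The second change is analogous in spirit: by not running `TorchExportableModuleWithHybridCache.__init__` (and wiring the custom ring KV cache onto the module manually instead), `StaticSlidingWindowLayer` never gets a chance to trace its `<= sliding_window` check into the exported graph; the exact wiring is specific to the repo's export path.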