Add Attention sink in export by kirklandsign · Pull Request #215 · huggingface/optimum-executorch

kirklandsign · 2026-02-20T02:23:41Z

No description provided.

Introduce CustomRingKVCacheWithSink and ETCustomAttentionSinkCache, which preserve the first sink_size tokens while using a ring buffer for the remaining window. Add get_custom_sdpa_for_attention_sink to build per-layer attention masks that preserve the sink tokens. Wire the attention_sink parameter through replace_with_et_custom_kv_cache.

Co-authored-by: Claude <noreply@anthropic.com>
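
The cache layout described above can be illustrated with a minimal sketch. It assumes sink slots come first and the ring buffer follows them, and it uses plain Python lists instead of tensors. The helper names (cache_slot, build_slot_positions, attention_mask_row) are hypothetical and are not the PR's CachePositionsManagerWithSink or _create_causal_mask_for_attention_sink API.

```python
from typing import List


def cache_slot(pos: int, sink_size: int, window_size: int) -> int:
    """Map an absolute token position to a physical cache slot.

    Slots [0, sink_size) hold the sink tokens permanently; the remaining
    window_size slots form a ring buffer for the most recent tokens.
    """
    if pos < sink_size:
        return pos
    return sink_size + (pos - sink_size) % window_size


def build_slot_positions(seq_len: int, sink_size: int, window_size: int) -> List[int]:
    """Record which absolute position currently occupies each slot (-1 = empty)."""
    positions = [-1] * (sink_size + window_size)
    for pos in range(seq_len):
        positions[cache_slot(pos, sink_size, window_size)] = pos
    return positions


def attention_mask_row(query_pos: int, slot_positions: List[int],
                       sink_size: int, window_size: int) -> List[bool]:
    """True where the query may attend: sinks, plus causal tokens inside the window."""
    row = []
    for key_pos in slot_positions:
        if key_pos < 0 or key_pos > query_pos:
            row.append(False)  # empty slot or a future token
        elif key_pos < sink_size:
            row.append(True)  # sink tokens are never evicted and always visible
        else:
            row.append(query_pos - key_pos < window_size)  # recent-window check
    return row


slots = build_slot_positions(seq_len=7, sink_size=2, window_size=3)
print(slots)
print(attention_mask_row(5, slots, sink_size=2, window_size=3))
```

With sink_size=2 and window_size=3, position 5 wraps around to slot 2 and overwrites position 2, while positions 0 and 1 stay in slots 0 and 1 for the lifetime of the cache. The mask uses True to mean "attend", matching the boolean attn_mask convention of PyTorch's scaled_dot_product_attention.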

Register a dedicated custom_sdpa_attention_sink attention implementation when the attention_sink option is provided; it takes priority over the existing ring KV cache SDPA path. Pass attention_sink through to the cache setup at export time.
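
The priority rule above amounts to a simple dispatch. The sketch below is hypothetical: only the name "custom_sdpa_attention_sink" comes from the PR, while AttentionSinkConfig, select_attention_impl, "custom_sdpa_ring_kv_cache" and "custom_sdpa" are placeholder names assumed for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionSinkConfig:
    # Hypothetical container; the PR's actual attention_sink format is not shown here.
    sink_size: int
    window_size: int


def select_attention_impl(attention_sink: Optional[AttentionSinkConfig],
                          use_ring_kv_cache: bool) -> str:
    """Return an attention implementation name, checking attention_sink first."""
    if attention_sink is not None:
        return "custom_sdpa_attention_sink"  # name taken from the PR description
    if use_ring_kv_cache:
        return "custom_sdpa_ring_kv_cache"  # hypothetical name for the existing ring path
    return "custom_sdpa"  # hypothetical default


print(select_attention_impl(AttentionSinkConfig(sink_size=4, window_size=124), True))
```

Checking attention_sink before the ring-cache flag is what gives it priority: a model that would otherwise use the plain ring KV cache path is routed to the sink-aware implementation whenever the option is set.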

kirklandsign · 2026-02-23T21:04:06Z

optimum/executorch/attentions/custom_kv_cache.py

+try:
+    from executorch.examples.models.llama.source_transformation.attention_sink import (
+        CachePositionsManagerWithSink,
+        _create_causal_mask_for_attention_sink,


Will need to wait for the ExecuTorch-side change to merge first.

kirklandsign and others added 2 commits February 19, 2026 17:42

kirklandsign wants to merge 2 commits into huggingface:main from kirklandsign:attention-sink
