4 changes: 4 additions & 0 deletions README.md
@@ -101,6 +101,10 @@ Some presses rely on a different logic:
- `DuoAttentionPress` ([source](kvpress/presses/duo_attention_press.py), [paper](https://arxiv.org/abs/2410.10819)): split heads into retrieval heads (no compression) and streaming heads (StreamingLLM approach)
- `FinchPress` ([source](kvpress/presses/finch_press.py), [paper](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00716/125280)): similar to SnapKV with a dynamic window size and key value re-rotation
- `KVzipPress` ([source](kvpress/presses/kvzip_press.py), [paper](https://arxiv.org/abs/2505.23416)): identifies redundant KV pairs through context reconstruction. Achieves near-lossless compression at the cost of multiple forward passes.
- `KVComposePress` ([source](kvpress/presses/kvcompose_press.py), [paper](https://arxiv.org/abs/2509.05165)): attention-guided eviction, aligning per-head selections into composite tokens to preserve cache structure.

> [!NOTE]
> `KVComposePress` performs an extra pass over the full context, which temporarily creates a KV cache of ~2x the context length and adds memory overhead during prefill.

Finally we provide wrapper presses that can be combined with other presses:
- `AdaKVPress` ([source](kvpress/presses/adakv_press.py), [paper](https://arxiv.org/abs/2407.11550)): prune bottom scores of any `ScorerPress` but across all heads, achieving head-wise compressions
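For a sense of scale, the README note on `KVComposePress` (a temporary KV cache of ~2x the context length during prefill) can be turned into a back-of-the-envelope estimate. The sketch below uses the standard accounting for a KV cache: one key tensor and one value tensor per layer. It is not taken from the kvpress code. The helper names and the model dimensions (32 layers, 8 KV heads, head dimension 128, 2-byte elements) are assumptions chosen for illustration, and "~2x" is treated as exactly 2x.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch_size=1):
    """Approximate KV cache size in bytes: keys plus values for every layer."""
    # The leading factor 2 accounts for storing both a key and a value tensor.
    return (2 * batch_size * num_layers * num_kv_heads
            * head_dim * seq_len * bytes_per_elem)


def estimated_prefill_peak_bytes(context_len, **model_dims):
    """Hypothetical helper: cache size if the temporary cache holds
    2x the context length, reading the note's "~2x" as exactly 2x."""
    return kv_cache_bytes(2 * context_len, **model_dims)


# Illustrative model dimensions (assumed, not read from any checkpoint).
dims = dict(num_layers=32, num_kv_heads=8, head_dim=128)
context_len = 32_000

base = kv_cache_bytes(context_len, **dims)
peak = estimated_prefill_peak_bytes(context_len, **dims)

print(f"uncompressed cache:   {base / 2**30:.2f} GiB")
print(f"estimated peak cache: {peak / 2**30:.2f} GiB")
```

The estimate covers only the cache tensors. Activations, attention scores, and allocator overhead come on top, so treat the result as a lower bound on the memory needed during prefill.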
3 changes: 3 additions & 0 deletions evaluation/evaluate_registry.py
@@ -22,6 +22,7 @@
FinchPress,
KeyDiffPress,
KnormPress,
KVComposePress,
KVzipPress,
ObservedAttentionPress,
PyramidKVPress,
@@ -72,6 +73,8 @@
"expected_attention": ExpectedAttentionPress(),
"finch": FinchPress(),
"keydiff": KeyDiffPress(),
"kvcompose": KVComposePress(),
"kvcompose_unstructured": KVComposePress(structured=False),
"kvzip": KVzipPress(),
"knorm": KnormPress(),
"observed_attention": ObservedAttentionPress(),
2 changes: 2 additions & 0 deletions kvpress/__init__.py
@@ -18,6 +18,7 @@
from kvpress.presses.key_rerotation_press import KeyRerotationPress
from kvpress.presses.keydiff_press import KeyDiffPress
from kvpress.presses.knorm_press import KnormPress
from kvpress.presses.kvcompose_press import KVComposePress
from kvpress.presses.kvzip_press import KVzipPress
from kvpress.presses.lagkv_press import LagKVPress
from kvpress.presses.observed_attention_press import ObservedAttentionPress
@@ -65,4 +66,5 @@
"KeyDiffPress",
"KVzipPress",
"ExpectedAttentionStatsPress",
"KVComposePress",
]