
cuda : fix "V is K view" check for non-unified KV cache #19145

Merged
ggerganov merged 1 commit into master from gg/cuda-fix-v-is-k-view-check on Jan 28, 2026


Conversation

@ggerganov (Member)

#19057

We weren't handling the case where both V and K are views of the same data with the same non-zero offset. This happens with a split KV cache (e.g. --parallel 4 --no-kv-unified) and causes flash attention to fall back to the CPU in such cases.

@ggerganov force-pushed the gg/cuda-fix-v-is-k-view-check branch from 89697cf to c9f3020 on January 27, 2026 at 19:25
@github-actions bot added labels: Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on Jan 27, 2026
@ggerganov merged commit 631cbfc into master on Jan 28, 2026
75 of 78 checks passed
@ggerganov deleted the gg/cuda-fix-v-is-k-view-check branch on January 28, 2026 at 07:15
