
cuda : fix "V is K view" check for non-unified KV cache #19145

Merged
ggerganov merged 1 commit into master from gg/cuda-fix-v-is-k-view-check on Jan 28, 2026


Conversation

@ggerganov (Member)

#19057

We weren't handling the case where both V and K are views of the same data with the same non-zero offset. This happens with a split KV cache (e.g. --parallel 4 --no-kv-unified) and causes flash attention to fall back to the CPU in such cases.

@ggerganov force-pushed the gg/cuda-fix-v-is-k-view-check branch from 89697cf to c9f3020 on January 27, 2026 at 19:25
@github-actions bot added labels: Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on Jan 27, 2026
@ggerganov merged commit 631cbfc into master on Jan 28, 2026
75 of 78 checks passed
@ggerganov deleted the gg/cuda-fix-v-is-k-view-check branch on January 28, 2026 at 07:15
