
Conversation

@ggerganov
Member

ref #18986 (comment)
ref #19128 (comment)
ref #19292

This type of rope is needed during rope shift:

// non-contiguous view of the K cache covering only the rotated part of each head,
// offset past the non-rotated (nope) dimensions
ggml_tensor * k =
    ggml_view_3d(ctx, layer.k,
            n_rot, n_head_kv, get_size()*n_stream,
            ggml_row_size(layer.k->type, n_embd_head_k),
            ggml_row_size(layer.k->type, n_embd_k_gqa),
            ggml_row_size(layer.k->type, n_embd_nope));

// apply the shift in place, writing the result back into the cache
ggml_tensor * cur = build_rope_shift(cparams, ctx, k, inp->k_shift, rope_factors, freq_base_l, freq_scale_l);

We are updating the KV cache data, so the op is inplace. Since the changes in #18986, the tensor is now also non-contiguous.

Based on the discussion in #18986 (comment), it appears that the Vulkan backend currently does not support this case. cc @jeffbolznv @0cc4m
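
For reference, here is a minimal sketch (plain ggml, not the actual code or tests from this PR) of the kind of op involved: an in-place rope applied to a non-contiguous view of a larger K tensor. The sizes, the nope/rope split, and the NEOX rope mode are illustrative assumptions; a full example would also need a backend and graph allocation to actually run.

#include "ggml.h"

// sketch only: builds the op, does not allocate data or compute anything
static ggml_tensor * build_noncontig_rope_inplace(ggml_context * ctx) {
    const int64_t n_embd_head = 128; // full head size = nope part + rope part (assumed)
    const int64_t n_rot       = 64;  // rotated dimensions (assumed)
    const int64_t n_head_kv   = 8;
    const int64_t n_tokens    = 32;

    // stand-in for a K cache slice: [n_embd_head, n_head_kv, n_tokens]
    ggml_tensor * k = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, n_embd_head, n_head_kv, n_tokens);

    // non-contiguous view covering only the rotated part of each head,
    // offset past the leading (n_embd_head - n_rot) nope dimensions
    ggml_tensor * k_rot = ggml_view_3d(ctx, k,
            n_rot, n_head_kv, n_tokens,
            ggml_row_size(k->type, n_embd_head),
            ggml_row_size(k->type, n_embd_head*n_head_kv),
            ggml_row_size(k->type, n_embd_head - n_rot));

    // shift positions, one per token
    ggml_tensor * pos = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, n_tokens);

    // in-place rope on the non-contiguous view: the result is written back into k's data
    return ggml_rope_inplace(ctx, k_rot, pos, n_rot, GGML_ROPE_TYPE_NEOX);
}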

@jeffbolznv
Collaborator

Yeah, the new tests are failing in the Vulkan backend. I'll look into it.

@LostRuins
Collaborator

Thank you. If it works, I will close #19292.

@github-actions github-actions bot added the testing label on Feb 3, 2026
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
@ggerganov
Member Author

Huh, adding dim 3 somehow hid the CUDA failures from earlier:

https://github.com/ggml-org/llama.cpp/actions/runs/21636838841/job/62364670686#step:3:48919

I guess I have to add tests for both ne[3] == 1 and ne[3] > 1.
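
For example, something along these lines would cover both cases (purely a hypothetical sketch; add_rope_test is an invented helper, not the real test-backend-ops API):

// hypothetical coverage matrix, so the batched cases do not mask the ne[3] == 1 ones
for (bool inplace : { false, true }) {
    for (int64_t ne3 : { 1, 3 }) {          // both ne[3] == 1 and ne[3] > 1
        for (bool view : { false, true }) { // contiguous vs non-contiguous input
            add_rope_test(/*ne=*/{ 128, 32, 2, ne3 }, /*n_dims=*/64, inplace, view); // hypothetical helper
        }
    }
}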

@LostRuins
Collaborator

Hi all, I can confirm that Jeff's patch fixes #19292 for me on Vulkan!

It works fine and fixes the incoherence-after-shifting regression on GLM-4-32B-0414 and GLM-4.7-Flash.

Much appreciated to all.

However, is anyone able to check whether this test case is good for CUDA too? I have heard that a similar issue affected a CUDA RTX 5xxx user on GLM-4.7-Flash; it would be nice if someone could confirm the test passes on CUDA.

@ggerganov
Member Author

It's very likely broken with CUDA in a similar way - see the test failure from earlier: https://github.com/ggml-org/llama.cpp/actions/runs/21636838841/job/62364670686#step:3:48919

Once we stabilize the tests, we'll notify CUDA maintainers to take a look.

@LostRuins
Collaborator

Thanks! Once there is a fix for CUDA, I'll let the CUDA user try it out again.

@ggerganov ggerganov merged commit eaba92c into master Feb 4, 2026
66 of 75 checks passed
@ggerganov ggerganov deleted the gg/tests-rope-inplace branch February 4, 2026 10:45
@ggerganov
Member Author

@JohannesGaessler Note that some of the newly added rope tests are currently failing with CUDA: https://github.com/ggml-org/llama.cpp/actions/runs/21664340312/job/62455946368
