
Conversation


@ykhrustalev commented on Nov 20, 2025

A control PR to show the divergence from the original llama.cpp repo

@ykhrustalev changed the title from "[DO NOT MERGE] A control PR to show the divergence from the original llama.cpp repo" to "[DO NOT MERGE] Diff with the master branch" on Nov 20, 2025
ykhrustalev and others added 12 commits November 20, 2025 08:22
Implement conditional prefill computation skipping in llama-bench:
disable computation for `--depth` prefill while keeping it enabled for
`-p` prompt prefill benchmarks.

- Default behavior (no flag): Depth prefill skips computation
- With `--enable-depth-computation`: Depth prefill performs full
computation
- `-p` benchmarks: Always perform computation (not affected by this
flag); see the usage sketch below
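
A minimal usage sketch of the new flag (invocations are illustrative; `-d`/`-p` are llama-bench's existing depth and prompt options, and the model path is a placeholder):

```sh
# Default: the -d/--depth prefill skips computation, the -p prefill still runs fully
./llama-bench -m model.gguf -p 512 -d 4096

# With the new flag: the depth prefill also performs the full forward pass
./llama-bench -m model.gguf -p 512 -d 4096 --enable-depth-computation
```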
Added a new windows-cuda job that:
- Uses Windows 2022 runner with CUDA 12.4
- Installs CUDA toolkit and Ninja build system
- Builds llama-bench with CUDA support enabled (see the build sketch below)
- Packages and uploads the benchmark tool artifacts
- Follows the same pattern as the release.yml windows-cuda job

Updated the release job to depend on the new windows-cuda job.
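
For reference, a rough local equivalent of the build the job performs (a sketch assuming llama.cpp's standard CMake options; the actual steps are defined in the workflow YAML):

```sh
# Configure with Ninja and CUDA enabled, then build only the benchmark tool
cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build --target llama-bench
```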


Co-authored-by: Claude <noreply@anthropic.com>
ykhrustalev pushed a commit that referenced this pull request Dec 8, 2025
* Faster tensors (#8)

Add fast matrix and matrix/vector multiplication.

* Use map for shader replacements instead of pair of strings

* Wasm (#9)

* webgpu : fix build on emscripten

* more debugging stuff

* test-backend-ops: force single thread on wasm

* fix single-thread case for init_tensor_uniform

* use jspi

* add pthread

* test: remember to set n_thread for cpu backend

* Add buffer label and enable dawn-specific toggles to turn off some checks

* Intermediate state

* Fast working f16/f32 vec4

* Working float fast mul mat

* Clean up naming of mul_mat to match logical model, start work on q mul_mat

* Setup for subgroup matrix mat mul

* Basic working subgroup matrix

* Working subgroup matrix tiling

* Handle weirder sg matrix sizes (but still a multiple of the sg matrix size)

* Working start to gemv

* working f16 accumulation with shared memory staging

* Print out available subgroup matrix configurations

* Vectorize dst stores for sg matrix shader

* Gemv working scalar

* Minor set_rows optimization (#4)

* updated optimization, fixed errors

* non vectorized version now dispatches one thread per element

* Simplify

* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Comment on dawn toggles

* Working subgroup matrix code for (semi)generic sizes

* Remove some comments

* Cleanup code

* Update dawn version and move to portable subgroup size

* Try to fix new dawn release

* Update subgroup size comment

* Only check for subgroup matrix configs if they are supported

* Add toggles for subgroup matrix/f16 support on nvidia+vulkan

* Make row/col naming consistent

* Refactor shared memory loading

* Move sg matrix stores to correct file

* Working q4_0

* Formatting

* Work with emscripten builds

* Fix test-backend-ops emscripten for f16/quantized types

* Use emscripten memory64 to support get_memory

* Add build flags and try ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* Remove extra whitespace

* Move wasm single-thread logic out of test-backend-ops for cpu backend

* Disable multiple threads for emscripten single-thread builds in ggml_graph_plan

* Fix .gitignore

* Add memory64 option and remove unneeded macros for setting threads to 1

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
