Changes from all commits
575 commits
2e677ae
vulkan: faster q6_k matmul (llama/17813)
netrunnereve Dec 14, 2025
ceb68e5
vulkan: improve mul_mat_vec_iq1_s speed (llama/17874)
lovedheart Dec 14, 2025
05a170a
vulkan: Fix data race/hang in scalar/cm1 flash attention (llama/17887)
jeffbolznv Dec 14, 2025
3e9f2ba
sync : llama.cpp
ggerganov Dec 14, 2025
106ba28
vulkan: fix mul_mat_vec_iq1_s formatting (llama/18026)
0cc4m Dec 14, 2025
691a7ca
Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (llama/1…
NeoZhangJianyu Dec 15, 2025
0a2b6b2
llama: automatically set parameters not set by the user in such a way…
JohannesGaessler Dec 15, 2025
0b5b649
metal: use shared buffers on eGPU (llama/17866)
jdemeule Dec 15, 2025
b821be8
ggml-hexagon: mm for mtmd (llama/17894)
joeldushouyu Dec 15, 2025
23c6c8a
ggml : use WARP_SIZE/2 for argmax reduction offset (llama/18092)
Aadeshveer Dec 17, 2025
f1a4061
llama.android : Rewrite Android binding (w/o cpu_features dep) (llama…
naco-siren Dec 17, 2025
ac0c8be
sync : llama.cpp
ggerganov Dec 17, 2025
0b2b21f
HIP: Refactor mma for RDNA and CDNA (llama/17990)
zhang-hui-yulo Dec 17, 2025
02331aa
ggml-cpu: ARM64: repack version of q8_0 (dotprod and i8mm) (llama/18096)
Alcpz Dec 17, 2025
1c73b80
ggml-hexagon: gelu operation (llama/17921)
joeldushouyu Dec 17, 2025
1ca4837
ggml-hexagon: swiglu_oai operation (llama/18114)
joeldushouyu Dec 17, 2025
0c4d8f6
remove i_major_dual (llama/18157)
zhang-hui-yulo Dec 18, 2025
117a677
ggml-cpu: extend support for RVV floating-point kernels (llama/17318)
taimur-10x Dec 18, 2025
cdb1e3f
model : add ASR support for LFM2-Audio-1.5B (conformer) (llama/18106)
ngxson Dec 18, 2025
8faf01f
vulkan: Add perf logger mode with concurrency (llama/17944)
jeffbolznv Dec 19, 2025
1c915c8
ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for mor…
ngdxzy Dec 19, 2025
ef780f7
Added comments explaining thread block size selection logic based on …
Aadeshveer Dec 20, 2025
68b14a5
test-backend-ops: improve msvc build time (llama/18209)
jeffbolznv Dec 20, 2025
9287ae2
tests: Avoid floating point precision false positives in SUM (llama/1…
jeffbolznv Dec 20, 2025
75d7a02
Vulkan: some improvement on mul_mat_iq2_xs (llama/18031)
lovedheart Dec 21, 2025
b76e95c
vulkan: in graph_optimize, try to group ADD operations (llama/18060)
jeffbolznv Dec 21, 2025
2d15b06
vulkan: support GGML_UNARY_OP_XIELU (llama/18062)
jeffbolznv Dec 21, 2025
0fb4292
vulkan/cuda: fix topk_moe with exp_probs_b (llama/18071)
jeffbolznv Dec 21, 2025
e68c6fb
vulkan: fix im2col overflowing maxworkgroupcount (llama/18180)
jeffbolznv Dec 21, 2025
68eab95
llama: fix RPC for -fit on (llama/18233)
JohannesGaessler Dec 21, 2025
2ac2227
vulkan: Implement set_tensor_async and the event interfaces (llama/18…
jeffbolznv Dec 21, 2025
582cf26
vulkan: Extend rope fusions to allow mrope (llama/18264)
jeffbolznv Dec 22, 2025
fcf84d0
opencl: unpack q4_0 for adreno in get_tensor (llama/18278)
lhez Dec 22, 2025
1ee0280
llamafile: add rvv support for sgemm kernels (llama/18199)
taimur-10x Dec 22, 2025
85750fc
ggml-hexagon: gelu optimization (llama/18151)
joeldushouyu Dec 22, 2025
049f610
ggml-hexagon: create generalized functions for cpu side op (llama/17500)
chraac Dec 23, 2025
7221514
rpc : add check for rpc buffer type (llama/18242)
struct Dec 23, 2025
76f20c2
CANN: Uses yarn_ramp cache in ROPE (llama/17725)
TianHao324 Dec 24, 2025
532c8d0
vulkan: use fewer FA rows for small cache runs (llama/18280)
0cc4m Dec 24, 2025
5566085
CANN : refactor ACL graph cache (llama/17752)
wangweixuan Dec 24, 2025
9e98a08
vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (…
jeffbolznv Dec 24, 2025
f1f9708
CUDA: experimental native mxfp4 support for blackwell (llama/17906)
am17an Dec 24, 2025
4da8e8a
ggml : optimize cuda cumsum fallback kernel (llama/18343)
Aadeshveer Dec 25, 2025
4106989
CANN: Add support for CONV_TRANSPOSE_1D when kernel size > 255 (llama…
Intellouis Dec 25, 2025
a5506cd
ggml-cuda: fix blackwell native builds (llama/18361)
am17an Dec 25, 2025
0f75a76
cuda: optimize cumsum cub path (llama/18362)
am17an Dec 25, 2025
c5aac04
ggml-cuda: fix regex for arch list (llama/18371)
am17an Dec 25, 2025
cc18d0f
CANN: implement the SSM_CONV operator (llama/17737)
0Marble Dec 26, 2025
d063a19
vulkan: handle rope with large number of rows (llama/18306)
jeffbolznv Dec 26, 2025
256ba84
vulkan: Support UPSCALE w/antialias (llama/18327)
jeffbolznv Dec 26, 2025
709ca57
vulkan: small dequantization improvements (llama/18380)
netrunnereve Dec 26, 2025
9d351d6
vulkan: Use BK=32 for coopmat2 mul_mat_id (llama/18332)
jeffbolznv Dec 26, 2025
792ec16
vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (llama/18349)
jeffbolznv Dec 26, 2025
82db5d8
vulkan: preprocess mul_mat_id experts and discard workgroups more qui…
jeffbolznv Dec 26, 2025
cf317e9
ggml-cuda: Use same regex for GGML_NATIVE=OFF (llama/18407)
am17an Dec 27, 2025
c84e02e
opencl: allow resizing transpose buffers (llama/18384)
lhez Dec 27, 2025
fc65c4c
ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (l…
QDelta Dec 28, 2025
62b6d8c
cmake: Added more x86_64 CPU backends when building with `GGML_CPU_AL…
bberberov Dec 28, 2025
b0a7369
rpc: fix segfault on invalid endpoint format (llama/18387)
o7si Dec 28, 2025
6024715
Revert "ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATI…
am17an Dec 28, 2025
73175ce
HIP: Use mmq on MFMA devices for MUL_MAT_ID in cases where a lot of s…
IMbackK Dec 28, 2025
391d744
cuda: fix race condition in cumsum (llama/18448)
am17an Dec 29, 2025
2aacc78
CUDA: Blackwell features for non-native builds (llama/18436)
JohannesGaessler Dec 29, 2025
5d51abc
CUDA: fix replacement of bad archs in CMake (llama/18457)
JohannesGaessler Dec 29, 2025
8feb4ac
CUDA: add log line when mxfp4 acceleration is used (llama/18483)
am17an Dec 30, 2025
2b4c14b
kleidiai: add and integrate SVE 256-bit vector-length kernel (llama/1…
chaxu01 Dec 30, 2025
4218541
Work around broken IntelSYCLConfig.cmake in Intel oneAPI 2025.x (llam…
rrsathe Dec 31, 2025
1d55ae2
sycl: add newline at the end of CMakeLists.txt (llama/18503)
am17an Dec 31, 2025
81034b3
sync : llama.cpp
ggerganov Dec 31, 2025
5b08c44
metal : remove BF16 x F16 kernels (llama/18456)
ggerganov Dec 31, 2025
cf9e3e9
CUDA: fix KQ max calculation (llama/18487)
JohannesGaessler Dec 31, 2025
dfac759
metal : add count_equal op (llama/18314)
gatbontonpc Dec 31, 2025
924883e
sync : llama.cpp
ggerganov Dec 31, 2025
e926786
sync : whisper.cpp
ggerganov Dec 31, 2025
ebc3a0f
ggml : bump version to 0.9.5 (#1410)
ggerganov Dec 31, 2025
1fae222
vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron …
jeffbolznv Jan 1, 2026
d20816e
ggml-cuda: remove unnecessary prints on ggml_cuda_init (llama/18502)
am17an Jan 1, 2026
5ec8d3c
cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (…
Meet91721 Jan 1, 2026
22206af
rpc : use unordered_map::reserve and emplace (llama/18513)
struct Jan 2, 2026
22d3e1b
metal : adjust extra size for FA buffer to avoid reallocations (llama…
ggerganov Jan 2, 2026
89904ac
vulkan: Implement mmvq for iq1_s/iq1_m (llama/18450)
jeffbolznv Jan 2, 2026
cca55fe
vulkan: Optimize GGML_OP_CUMSUM (llama/18417)
jeffbolznv Jan 2, 2026
f0858b1
ggml-hexagon: optimize activation function (llama/18393)
joeldushouyu Jan 3, 2026
a90aefc
(Bugfix, ggml-cuda) Pool alloc count fix + small size computation typ…
pl752 Jan 3, 2026
5433aea
CUDA: only allocate FA tmp buffer if needed (llama/18564)
JohannesGaessler Jan 3, 2026
748aad9
ggml-cuda: fixes for concurrent streams (llama/18496)
am17an Jan 3, 2026
4aa1bd8
ggml-cuda: remove unused params in ggml_cuda_graph (llama/18579)
am17an Jan 4, 2026
609fc52
CUDA: disable cuda graph when using n-cpu-moe (llama/18593)
am17an Jan 4, 2026
175f645
sampling : add support for backend sampling (llama/17004)
danbev Jan 4, 2026
34e1e6c
CANN: add operator fusion support for ADD + RMS_NORM (llama/17512)
noemotiovon Jan 5, 2026
0624ce2
vulkan: handle quantize_q8_1 overflowing the max workgroup count (lla…
jeffbolznv Jan 5, 2026
4a25293
vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (llama/18582)
jeffbolznv Jan 5, 2026
e48d756
ggml-cuda: check for srcs outside the cgraph (llama/18583)
am17an Jan 5, 2026
bb7e7d2
CUDA: fix FA FP16 accumulator overflow for Granite (llama/18614)
JohannesGaessler Jan 5, 2026
2addf83
ggml webgpu: add CEIL operation support (llama/18605)
tnguyen21 Jan 5, 2026
16f62c5
CANN: Make `valid_values` variable `static const` (llama/18627)
rauletorresc Jan 6, 2026
f825682
ggml : fix avx512bf16 build (llama/18623)
angt Jan 6, 2026
803885a
mmq.cu: tune mmq/rocblas switching for RDNA (llama/18537)
Beinsezii Jan 6, 2026
88e4325
ggml-cuda: refactor cuda graph usage (llama/18637)
am17an Jan 6, 2026
39b43fd
vulkan: support buffer_from_host_ptr (llama/18467)
jeffbolznv Jan 6, 2026
4c540d0
ggml : optimize cuda ssm_scan using warp-level reduction (llama/18505)
Aadeshveer Jan 6, 2026
e1fa998
Hexagon add support for f16/f32 flash attention, scale, set-rows and …
max-krasnyansky Jan 7, 2026
273506d
CANN: Rename `get_env` to `get_env_as_lowercase` (llama/18624)
rauletorresc Jan 7, 2026
98b2f1c
CANN: Fix rename for get_env (llama/18652)
hipudding Jan 7, 2026
03229fc
vulkan: more mul mat optimizations (llama/18533)
netrunnereve Jan 7, 2026
5bd4811
vulkan: Warptile tuning for Intel Xe2/Xe3 (llama/18178)
virajwad Jan 7, 2026
89c639b
vulkan: reject ops when a tensor is too large to allocate (llama/18646)
jeffbolznv Jan 7, 2026
5c3d483
cuda : fix build on cuda 12.8 (llama/18672)
olliewalsh Jan 7, 2026
3354cf5
opencl: add FILL op support (llama/18682)
shaofeiqi Jan 8, 2026
bce8a97
ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (llama/18535)
DocShotgun Jan 8, 2026
b53073c
metal : add MoE kernel specialization for ne20=5 (llama/18667)
dororodoroddo Jan 8, 2026
a199873
vulkan: optimize ssm_scan (llama/18630)
jeffbolznv Jan 8, 2026
c1c1841
vulkan: fix push constant size for quantize_q8_1 (llama/18687)
jeffbolznv Jan 8, 2026
a253f62
ggml webgpu: initial flashattention implementation (llama/18610)
reeselevine Jan 8, 2026
b2f1a57
ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (llama/18628)
yomaytk Jan 8, 2026
880736e
llama: use host memory if device reports 0 memory (llama/18587)
taronaeo Jan 8, 2026
2ea5afa
sync : llama.cpp
ggerganov Jan 9, 2026
4d86fe0
Updates to webgpu get_memory (llama/18707)
reeselevine Jan 9, 2026
4138b9b
opencl: add EXPM1 op (llama/18704)
shaofeiqi Jan 9, 2026
4a4cd37
Corrected: changed s13 = src1->nb[3] instead of nb[2] (llama/18724)
michaelw9999 Jan 10, 2026
fcb34fb
cmake : update blas logic (llama/18205)
DaAwesomeP Jan 10, 2026
9ca3cfe
HIP: adjust RDNA3.5 MMQ kernel selection logic (llama/18666)
JohannesGaessler Jan 10, 2026
d8faba2
test-backend-ops: fix mxfp4 tests on blackwell (llama/18736)
am17an Jan 10, 2026
9d7137f
opencl: add SOFTPLUS op support (llama/18726)
shaofeiqi Jan 11, 2026
8891ab6
sync : llama.cpp
ggerganov Jan 11, 2026
3febb7e
Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support …
0cc4m Jan 11, 2026
0411513
vulkan: Disable large coopmat matmul configuration on proprietary AMD…
0cc4m Jan 12, 2026
803ae58
vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id)…
jeffbolznv Jan 12, 2026
3086e27
vulkan: change memory_logger to be controlled by an env var (llama/18…
jeffbolznv Jan 12, 2026
c9c733c
CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (llama/18800)
ggerganov Jan 13, 2026
b6d1f0f
sync : llama.cpp
ggerganov Jan 13, 2026
7e52a14
HIP: add fattn-mma-f16 for RDNA4 (llama/18481)
zhang-hui-yulo Jan 13, 2026
db023c9
ggml-metal: do not copy headers for embedded, use current binary dir …
DaAwesomeP Jan 14, 2026
7955c25
vulkan: work around Intel fp16 bug in mmq (llama/18814)
0cc4m Jan 14, 2026
b7b0315
CUDA : fix typo in clang pragma comment [no ci] (llama/18830)
danbev Jan 14, 2026
2502bce
vulkan: Check maxStorageBufferRange in supports_op (llama/18709)
jeffbolznv Jan 14, 2026
f3a50eb
CUDA: Factor out and re-use `block_reduce` function (llama/18785)
ORippler Jan 15, 2026
15c35cd
sync : llama.cpp
ggerganov Jan 30, 2026
c2f1fb2
hexagon: support for OP_CPY, host buffers now optional (llama/18822)
max-krasnyansky Jan 30, 2026
93601e0
sync : llama.cpp
ggerganov Jan 30, 2026
2a4066d
ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (llama/18837)
shalinib-ibm Jan 15, 2026
c062a07
CUDA: fix alignment on register spill for FA (llama/18815)
JohannesGaessler Jan 15, 2026
9054f92
cuda : print less debug logs when disabling cuda graphs (llama/18868)
ggerganov Jan 15, 2026
1999fc0
OpenCL: add SOLVE_TRI op support (llama/18846)
shaofeiqi Jan 15, 2026
aab6aac
CANN: support gated linear attn (llama/18653)
hipudding Jan 16, 2026
007da0e
CANN: fix an issue where get_env was not fully renamed (llama/18796)
noemotiovon Jan 16, 2026
51a9964
CANN: Remove unused `ggml_cann_get_device` function (llama/18625)
rauletorresc Jan 16, 2026
5fbb524
ggml-blas: hide warnings from included BLAS headers (llama/18818)
DaAwesomeP Jan 16, 2026
32201fa
ggml : extend ggml_pool_1d + metal (llama/16429)
ThoreKoritzius Jan 16, 2026
947adfd
sync : llama.cpp
ggerganov Jan 30, 2026
94be18d
ggml webgpu: support for backend sampling (llama/18880)
reeselevine Jan 30, 2026
acea53f
sync : llama.cpp
ggerganov Jan 30, 2026
df5dd5e
opencl: fix q6_K mv for m=1 (llama/18893)
lhez Jan 17, 2026
daa2d46
ggml : add ggml_build_forward_select (llama/18550)
ggerganov Jan 19, 2026
7ab61a2
metal : enable FA for MLA heads (llama/18950)
ggerganov Jan 20, 2026
3f0de98
ggml : cleanup path_str() (llama/18928)
angt Jan 20, 2026
831fdb7
CUDA: Replace init_offsets kernel with iterators in cub-based argsort…
ORippler Jan 20, 2026
0a25526
CUDA: Fix builds for older CCCL versions by ifdefing strided_iterator…
ORippler Jan 21, 2026
960ee54
vulkan: Use mul_mat_vec_id for small values of n (llama/18918)
jeffbolznv Jan 21, 2026
6c0e0d9
Revert "vulkan: force full subgroups for flash attention to fix intel…
rillomas Jan 21, 2026
541c051
vulkan: support flash attention GQA/split_k with small batches (llama…
jeffbolznv Jan 21, 2026
84ec0f9
vulkan: Remove transfer_ctx, do everything in compute_ctx. (llama/18945)
jeffbolznv Jan 21, 2026
b262a57
ggml-zdnn : mark zDNN buffers as non-host (llama/18967)
AlekseiNikiforovIBM Jan 22, 2026
7751c92
opencl: add TRI op support (llama/18979)
shaofeiqi Jan 22, 2026
e5c8629
CUDA: add gqa_ratio 4 for GLM 4.7 flash (llama/18953)
am17an Jan 22, 2026
8fd2b3e
opencl: enable the general fp mm for non-cont input and as a fallback…
lhez Jan 22, 2026
544f15d
CUDA: fix alignment check for FA (llama/19023)
JohannesGaessler Jan 22, 2026
91d417c
mla : make the V tensor a view of K (llama/18986)
ggerganov Jan 22, 2026
6a7ff59
ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementat…
Alcpz Jan 23, 2026
a2bd034
use malloc to support both iGPU and dGPU in same time (llama/18992)
arthw Jan 23, 2026
88b0e18
ggml-hexagon: flash-attn opt (llama/19025)
chraac Jan 24, 2026
a1f1906
ggml-cuda: enable cuda-graphs for `n-cpu-moe` (llama/18934)
am17an Jan 24, 2026
0c38116
CUDA: re-use MLA K data for V in MMA FA (llama/19057)
JohannesGaessler Jan 24, 2026
fce2ea4
kv-cache : support V-less cache (llama/19067)
ggerganov Jan 25, 2026
10e7b31
ggml-cpu: Use tiled FA for prompt-processing (llama/19012)
am17an Jan 25, 2026
1a7ad53
metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/m…
ccbinn Jan 25, 2026
a82d61d
CUDA: faster FA for GQA > 1 but not power of 2 (llama/19092)
JohannesGaessler Jan 25, 2026
43e6c66
CUDA: fix padding of GQA to power of 2 in FA (llama/19115)
JohannesGaessler Jan 26, 2026
3af1b04
sync : llama.cpp
ggerganov Jan 30, 2026
deb30f3
opencl: add flattened q6_K mv (llama/19054)
lhez Jan 30, 2026
bc83b4b
ggml-cpu: Enable FP16 MMA kernels on PPC (llama/19060)
shalinib-ibm Jan 27, 2026
2a860ed
Reduce CPU-side stalls due to the CUDA command buffer being full (lla…
gaugarg-nv Jan 27, 2026
8dd386b
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementat…
Alcpz Jan 27, 2026
0dff0d5
CUDA: tune GLM 4.7 Flash FA kernel selection logic (llama/19097)
JohannesGaessler Jan 27, 2026
bc6fdcb
ggml-zendnn : update ZenDNN git tag to main branch (llama/19133)
z-vishal Jan 27, 2026
3bff69a
ggml webgpu: Split shared state (webgpu_context) into global state an…
nikhilJain17 Jan 28, 2026
9be59da
CUDA: tune GLM 4.7 Flash FA kernel selection logic (DGX Spark) (llama…
ggerganov Jan 28, 2026
1403364
cuda : fix "V is K view" check for non-unified KV cache (llama/19145)
ggerganov Jan 28, 2026
a18015f
ggml-cpu: arm64: Q4_K scale unroll and vectorization (llama/19108)
Alcpz Jan 28, 2026
2e2ec9c
ggml: new backend for Virglrenderer API Remoting acceleration (v2) (l…
kpouget Jan 28, 2026
1d2f624
vulkan: handle device dedup on MacOS + Vega II Duo cards (llama/19058)
okuvshynov Jan 28, 2026
083bb3c
ggml-sycl: remove unused syclcompat header (llama/19140)
PatKamin Jan 28, 2026
d59df96
Vulkan Flash Attention Coopmat1 Refactor (llama/19075)
0cc4m Jan 28, 2026
bc26937
sycl: fix norm kernels: l2_norm, group_norm, rms_norm by remove asser…
arthw Jan 29, 2026
7e02746
CUDA: refactor topk-moe to enable more models (GLM 4.7, Nemotron etc.…
am17an Jan 29, 2026
167552e
ggml-zendnn : resolve ZenDNN backend cross-module symbol dependency (…
z-vishal Jan 29, 2026
54f7659
HIP: add mmf for CDNA (llama/18896)
zhang-hui-yulo Jan 29, 2026
a1bbd3e
cuda : fix nkvo, offload and cuda graph node properties matching (lla…
ggerganov Jan 29, 2026
b6f5a28
hexagon: enable offloading to Hexagon on Windows on Snapdragon (llama…
tboinovski1 Jan 29, 2026
b79ba08
ggml-webgpu: improve flashAttention performance by software pipelinin…
ArberSephirotheca Jan 29, 2026
af11f83
sycl: implement GGML_OP_TRI (llama/19089)
RachelMantel Jan 30, 2026
ccee88e
sycl: implement GGML_UNARY_OP_SOFTPLUS (llama/19114)
s8322 Jan 30, 2026
00d7d43
add tensor type checking as part of cuda graph properties (llama/19186)
bssrdf Jan 30, 2026
f7cb4b7
sync : llama.cpp
ggerganov Jan 30, 2026
aa96b94
cuda : fix compile warnings (whisper/0)
ggerganov Jan 30, 2026
ad04d94
sync : whisper.cpp
ggerganov Jan 30, 2026
a8db410
cmake : remove unused file (#1419)
ggerganov Jan 30, 2026
95e935d
ggml : bump version to 0.9.6 (#1423)
ggerganov Feb 7, 2026
e0af28c
ci : remove "Release" word from the title of the release
ggerganov Feb 7, 2026
6203de9
tests : add GQA=20 FA test (llama/19095)
ggerganov Jan 30, 2026
1e7ce7e
Correctly fetch q8_1 quantize pipeline in test as needed by 8a3519b (…
sredman Jan 30, 2026
c5b01b8
opencl: add optimized q8_0 mm kernel for adreno (llama/18871)
shaofeiqi Jan 30, 2026
7b5288a
ggml-hexagon: flash-attention and reduce-sum optimizations (llama/19141)
chraac Jan 31, 2026
7b7bb1b
Bump cmake max version (needed for Windows on Snapdragon builds) (lla…
max-krasnyansky Feb 1, 2026
2948be1
Remove pipeline cache mutexes (llama/19195)
nikhilJain17 Feb 2, 2026
bc0cac4
docs : Minor cleanups (llama/19252)
ckastner Feb 2, 2026
c1c1294
ggml-backend: fix async set/get fallback sync (llama/19179)
JohannesGaessler Feb 2, 2026
e1fda9d
metal : support virtual devices (llama/18919)
ggerganov Feb 2, 2026
b0cff7a
sycl: implement GGML_OP_TOP_K (llama/19242)
tdevelope Feb 2, 2026
4c3c93f
Remove support for Nvidia & AMD GPU, because the oneAPI plugin for Nv…
arthw Feb 2, 2026
218a005
ggml-cpu: FA split across kv for faster TG (llama/19209)
am17an Feb 2, 2026
86a110c
opencl: refactor some ops, concat, repeat, tanh and scale (llama/19226)
lhez Feb 2, 2026
8c2086e
cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until investigated (l…
gaugarg-nv Feb 3, 2026
cb92394
ggml: added cleanups in ggml_quantize_free (llama/19278)
noctrex Feb 3, 2026
93ce07d
CUDA: Fix loop unrolling for BW in mul_mat_q_stream_k_fixup (llama/19…
ORippler Feb 3, 2026
f6f23e6
metal : minor cleanup (llama/19251)
ggerganov Feb 3, 2026
0622f36
CUDA: use mmvq for mul-mat-id for small batch sizes (llama/18958)
am17an Feb 3, 2026
fda0197
vulkan: disable coopmat1 fa on Nvidia Turing (llama/19290)
0cc4m Feb 3, 2026
abd50df
metal : add solve_tri (llama/19302)
ggerganov Feb 3, 2026
545b7a9
ggml-cpu: use LUT for converting e8->f32 scales on x86 (llama/19288)
am17an Feb 4, 2026
b0e0d35
ggml-virtgpu: make the code thread safe (llama/19204)
kpouget Feb 4, 2026
8ecbda9
tests : add non-cont, inplace rope tests (llama/19296)
ggerganov Feb 4, 2026
21bb131
metal : add missing includes (llama/19348)
will-lms Feb 5, 2026
3268190
vulkan: fix non-contig rope (llama/19299)
jeffbolznv Feb 5, 2026
04a7b89
vulkan: Set k_load_shmem to false when K is too large (llama/19301)
jeffbolznv Feb 5, 2026
8f52249
vulkan: fix GPU deduplication logic. (llama/19222)
okuvshynov Feb 5, 2026
51f911d
metal : add diag (llama/19330)
ggerganov Feb 5, 2026
8ab9ddb
vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (llama…
jeffbolznv Feb 5, 2026
e9e0035
metal : adaptive CPU/GPU interleave based on number of nodes (llama/1…
ggerganov Feb 5, 2026
35cea47
cuda : cuda graphs now compare all node params (llama/19383)
ggerganov Feb 6, 2026
d2c3f43
metal : skip loading all-zero mask (llama/19337)
ggerganov Feb 6, 2026
aba07b5
vulkan: make FA mask/softcap enables spec constants (llama/19309)
jeffbolznv Feb 6, 2026
7f996d6
vulkan: For coopmat2 FA, use fp16 accumulators for the final result (…
jeffbolznv Feb 6, 2026
80fab56
tests: reduce number of FA test permutations (llama/19381)
jeffbolznv Feb 6, 2026
7a43521
sycl: add F16 support for GGML_OP_CEIL (llama/19306)
NechamaKrashinski Feb 6, 2026
4343512
sync : llama.cpp
ggerganov Feb 7, 2026
d5e11af
ggml-webgpu: JIT compile binary operators and handle binding overlaps…
abhijitramesh Feb 6, 2026
90bda2f
metal : fix event synchronization in cpy_tensor_async (llama/19402)
ggerganov Feb 7, 2026
17bda96
metal : consolidate bin kernels (llama/19390)
ggerganov Feb 7, 2026
5cecdad
sync : llama.cpp
ggerganov Feb 7, 2026
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -14,7 +14,7 @@ jobs:
   build:
     strategy:
       matrix:
-        os: [ubuntu-latest, macos-latest, macos-13, windows-latest]
+        os: [ubuntu-latest, macos-latest, windows-latest]
         libraries: [shared, static]
 
     runs-on: ${{ matrix.os }}
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -22,6 +22,6 @@ jobs:
       GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
     with:
       tag_name: ${{ github.ref_name }}
-      release_name: Release ${{ github.ref }}
+      release_name: ${{ github.ref }}
       draft: false
       prerelease: false
139 changes: 84 additions & 55 deletions CMakeLists.txt
@@ -1,10 +1,10 @@
cmake_minimum_required(VERSION 3.14) # for add_link_options and implicit target directories.
cmake_minimum_required(VERSION 3.14...3.28) # for add_link_options and implicit target directories.
project("ggml" C CXX ASM)

### GGML Version
set(GGML_VERSION_MAJOR 0)
set(GGML_VERSION_MINOR 9)
set(GGML_VERSION_PATCH 4)
set(GGML_VERSION_PATCH 6)
set(GGML_VERSION_BASE "${GGML_VERSION_MAJOR}.${GGML_VERSION_MINOR}.${GGML_VERSION_PATCH}")

find_program(GIT_EXE NAMES git git.exe NO_CMAKE_FIND_ROOT_PATH)
@@ -25,16 +25,17 @@ if(GIT_EXE)
)
endif()

# Build the version string with optional dirty flag
set(GGML_VERSION "${GGML_VERSION_BASE}")
if(GGML_GIT_DIRTY AND NOT GGML_GIT_DIRTY EQUAL 0)
set(GGML_VERSION "${GGML_VERSION}-dirty")
endif()

if(NOT GGML_BUILD_COMMIT)
set(GGML_BUILD_COMMIT "unknown")
endif()

# Build the commit string with optional dirty flag
if(DEFINED GGML_GIT_DIRTY AND GGML_GIT_DIRTY EQUAL 1)
set(GGML_BUILD_COMMIT "${GGML_BUILD_COMMIT}-dirty")
endif()

include(CheckIncludeFileCXX)

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
@@ -53,6 +54,10 @@ if (CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
# TODO
else()
set(GGML_STANDALONE OFF)

if (NOT CMAKE_RUNTIME_OUTPUT_DIRECTORY)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
endif()
endif()

if (EMSCRIPTEN)
@@ -167,21 +172,18 @@ option(GGML_RVV "ggml: enable rvv" ON)
option(GGML_RV_ZFH "ggml: enable riscv zfh" ON)
option(GGML_RV_ZVFH "ggml: enable riscv zvfh" ON)
option(GGML_RV_ZICBOP "ggml: enable riscv zicbop" ON)
option(GGML_RV_ZIHINTPAUSE "ggml: enable riscv zihintpause " ON)
option(GGML_XTHEADVECTOR "ggml: enable xtheadvector" OFF)
option(GGML_VXE "ggml: enable vxe" ON)
option(GGML_VXE "ggml: enable vxe" ${GGML_NATIVE})

option(GGML_CPU_ALL_VARIANTS "ggml: build all variants of the CPU backend (requires GGML_BACKEND_DL)" OFF)
set(GGML_CPU_ARM_ARCH "" CACHE STRING "ggml: CPU architecture for ARM")
set(GGML_CPU_POWERPC_CPUTYPE "" CACHE STRING "ggml: CPU type for PowerPC")


if (MINGW)
set(GGML_WIN_VER "0xA00" CACHE STRING "ggml: Windows version")
endif()

# ggml core
set(GGML_SCHED_MAX_COPIES "4" CACHE STRING "ggml: max input copies for pipeline parallelism")
option(GGML_CPU "ggml: enable CPU backend" ON)
option(GGML_SCHED_NO_REALLOC "ggml: disallow reallocations in ggml-alloc (for debugging)" OFF)

# 3rd party libs / backends
option(GGML_ACCELERATE "ggml: enable Accelerate framework" ON)
@@ -224,8 +226,10 @@ option(GGML_WEBGPU "ggml: use WebGPU"
option(GGML_WEBGPU_DEBUG "ggml: enable WebGPU debug output" OFF)
option(GGML_WEBGPU_CPU_PROFILE "ggml: enable WebGPU profiling (CPU)" OFF)
option(GGML_WEBGPU_GPU_PROFILE "ggml: enable WebGPU profiling (GPU)" OFF)

option(GGML_WEBGPU_JSPI "ggml: use JSPI for WebGPU" ON)
option(GGML_ZDNN "ggml: use zDNN" OFF)
option(GGML_VIRTGPU "ggml: use the VirtGPU/Virglrenderer API Remoting frontend" OFF)
option(GGML_VIRTGPU_BACKEND "ggml: build the VirtGPU/Virglrenderer API Remoting backend" OFF)
option(GGML_METAL "ggml: use Metal" ${GGML_METAL_DEFAULT})
option(GGML_METAL_NDEBUG "ggml: disable Metal debugging" OFF)
option(GGML_METAL_SHADER_DEBUG "ggml: compile Metal with -fno-fast-math" OFF)
@@ -251,9 +255,15 @@ option(GGML_OPENCL_USE_ADRENO_KERNELS "ggml: use optimized kernels for Adr
set (GGML_OPENCL_TARGET_VERSION "300" CACHE STRING
"gmml: OpenCL API version to target")

option(GGML_HEXAGON "ggml: enable Hexagon backend" OFF)
set(GGML_HEXAGON_FP32_QUANTIZE_GROUP_SIZE 128 CACHE STRING "ggml: quantize group size (32, 64, or 128)")

# toolchain for vulkan-shaders-gen
set (GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "" CACHE FILEPATH "ggml: toolchain file for vulkan-shaders-gen")

option(GGML_ZENDNN "ggml: use ZenDNN" OFF)
option(ZENDNN_ROOT "ggml: path to ZenDNN installation" "")

# extra artifacts
option(GGML_BUILD_TESTS "ggml: build tests" ${GGML_STANDALONE})
option(GGML_BUILD_EXAMPLES "ggml: build examples" ${GGML_STANDALONE})
@@ -312,9 +322,11 @@ set(GGML_PUBLIC_HEADERS
include/ggml-opt.h
include/ggml-metal.h
include/ggml-rpc.h
include/ggml-virtgpu.h
include/ggml-sycl.h
include/ggml-vulkan.h
include/ggml-webgpu.h
include/ggml-zendnn.h
include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
@@ -404,62 +416,79 @@ if (MSVC)
/wd4996 # Disable POSIX deprecation warnings
/wd4702 # Unreachable code warnings
)
function(disable_msvc_warnings target_name)
set(MSVC_COMPILE_OPTIONS
"$<$<COMPILE_LANGUAGE:C>:/utf-8>"
"$<$<COMPILE_LANGUAGE:CXX>:/utf-8>"
)
function(configure_msvc_target target_name)
if(TARGET ${target_name})
target_compile_options(${target_name} PRIVATE ${MSVC_WARNING_FLAGS})
target_compile_options(${target_name} PRIVATE ${MSVC_COMPILE_OPTIONS})
endif()
endfunction()

disable_msvc_warnings(ggml-base)
disable_msvc_warnings(ggml)
disable_msvc_warnings(ggml-cpu)
disable_msvc_warnings(ggml-cpu-x64)
disable_msvc_warnings(ggml-cpu-sse42)
disable_msvc_warnings(ggml-cpu-sandybridge)
disable_msvc_warnings(ggml-cpu-haswell)
disable_msvc_warnings(ggml-cpu-skylakex)
disable_msvc_warnings(ggml-cpu-icelake)
disable_msvc_warnings(ggml-cpu-alderlake)
configure_msvc_target(ggml-base)
configure_msvc_target(ggml)
configure_msvc_target(ggml-cpu)
configure_msvc_target(ggml-cpu-x64)
configure_msvc_target(ggml-cpu-sse42)
configure_msvc_target(ggml-cpu-sandybridge)
# __FMA__ and __F16C__ are not defined in MSVC, however they are implied with AVX2/AVX512
# skipping ggml-cpu-ivybridge
# skipping ggml-cpu-piledriver
configure_msvc_target(ggml-cpu-haswell)
configure_msvc_target(ggml-cpu-skylakex)
configure_msvc_target(ggml-cpu-cannonlake)
configure_msvc_target(ggml-cpu-cascadelake)
configure_msvc_target(ggml-cpu-icelake)
# MSVC 2022 doesn't support BF16 intrinsics without `/arch:AVX10.1` ?!
# https://learn.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list?view=msvc-170
# https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
# skipping ggml-cpu-cooperlake
# skipping ggml-cpu-zen4
configure_msvc_target(ggml-cpu-alderlake)
# MSVC doesn't support AMX
# skipping ggml-cpu-sapphirerapids

if (GGML_BUILD_EXAMPLES)
disable_msvc_warnings(common-ggml)
disable_msvc_warnings(common)
configure_msvc_target(common-ggml)
configure_msvc_target(common)

disable_msvc_warnings(mnist-common)
disable_msvc_warnings(mnist-eval)
disable_msvc_warnings(mnist-train)
configure_msvc_target(mnist-common)
configure_msvc_target(mnist-eval)
configure_msvc_target(mnist-train)

disable_msvc_warnings(gpt-2-ctx)
disable_msvc_warnings(gpt-2-alloc)
disable_msvc_warnings(gpt-2-backend)
disable_msvc_warnings(gpt-2-sched)
disable_msvc_warnings(gpt-2-quantize)
disable_msvc_warnings(gpt-2-batched)
configure_msvc_target(gpt-2-ctx)
configure_msvc_target(gpt-2-alloc)
configure_msvc_target(gpt-2-backend)
configure_msvc_target(gpt-2-sched)
configure_msvc_target(gpt-2-quantize)
configure_msvc_target(gpt-2-batched)

disable_msvc_warnings(gpt-j)
disable_msvc_warnings(gpt-j-quantize)
configure_msvc_target(gpt-j)
configure_msvc_target(gpt-j-quantize)

disable_msvc_warnings(magika)
disable_msvc_warnings(yolov3-tiny)
disable_msvc_warnings(sam)
configure_msvc_target(magika)
configure_msvc_target(yolov3-tiny)
configure_msvc_target(sam)

disable_msvc_warnings(simple-ctx)
disable_msvc_warnings(simple-backend)
configure_msvc_target(simple-ctx)
configure_msvc_target(simple-backend)
endif()

if (GGML_BUILD_TESTS)
disable_msvc_warnings(test-mul-mat)
disable_msvc_warnings(test-arange)
disable_msvc_warnings(test-backend-ops)
disable_msvc_warnings(test-cont)
disable_msvc_warnings(test-conv-transpose)
disable_msvc_warnings(test-conv-transpose-1d)
disable_msvc_warnings(test-conv1d)
disable_msvc_warnings(test-conv2d)
disable_msvc_warnings(test-conv2d-dw)
disable_msvc_warnings(test-customop)
disable_msvc_warnings(test-dup)
disable_msvc_warnings(test-opt)
disable_msvc_warnings(test-pool)
configure_msvc_target(test-mul-mat)
configure_msvc_target(test-arange)
configure_msvc_target(test-backend-ops)
configure_msvc_target(test-cont)
configure_msvc_target(test-conv-transpose)
configure_msvc_target(test-conv-transpose-1d)
configure_msvc_target(test-conv1d)
configure_msvc_target(test-conv2d)
configure_msvc_target(test-conv2d-dw)
configure_msvc_target(test-customop)
configure_msvc_target(test-dup)
configure_msvc_target(test-opt)
configure_msvc_target(test-pool)
endif ()
endif()
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2023-2024 The ggml authors
+Copyright (c) 2023-2026 The ggml authors
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
10 changes: 6 additions & 4 deletions ci/run.sh
@@ -294,14 +294,16 @@ function gg_run_sam {
     python3 ../examples/sam/convert-pth-to-ggml.py ${path_models}/sam_vit_b_01ec64.pth ${path_models}/ 1
 
     # Test default parameters
-    (time ./bin/sam -m ${model_f16} -i ${img_0} ) 2>&1 | tee -a $OUT/${ci}-main.log
+    (time ./bin/sam -m ${model_f16} -i ${img_0} -st 0.925 ) 2>&1 | tee -a $OUT/${ci}-main.log
     grep -q "point prompt" $OUT/${ci}-main.log
-    grep -q "bbox (371, 436), (144, 168)" $OUT/${ci}-main.log
+    grep -q "bbox (371, 436), (144, 168)" $OUT/${ci}-main.log ||
+    grep -q "bbox (370, 439), (144, 168)" $OUT/${ci}-main.log
 
     # Test box prompt and single mask output
-    (time ./bin/sam -m ${model_f16} -i ${img_0} -b 368,144,441,173 -sm) 2>&1 | tee -a $OUT/${ci}-main.log
+    (time ./bin/sam -m ${model_f16} -i ${img_0} -st 0.925 -b 368,144,441,173 -sm) 2>&1 | tee -a $OUT/${ci}-main.log
     grep -q "box prompt" $OUT/${ci}-main.log
-    grep -q "bbox (370, 439), (144, 169)" $OUT/${ci}-main.log
+    grep -q "bbox (370, 439), (144, 169)" $OUT/${ci}-main.log ||
+    grep -q "bbox (370, 439), (144, 168)" $OUT/${ci}-main.log
 
     set +e
 }
54 changes: 0 additions & 54 deletions cmake/BuildTypes.cmake

This file was deleted.
