Merged

Changes from all commits (549 commits)
56920f5
server : bring back timings_per_token (#15879)
ngxson Sep 8, 2025
8802156
chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style…
createthis Sep 8, 2025
0a16bf5
CUDA: generate_cu_files.py - add missing mxfp4 (#15880)
am17an Sep 8, 2025
e68aa10
vulkan: sort graph to allow more parallel execution (#15850)
jeffbolznv Sep 8, 2025
fe1c92c
media : add llama1 icon (#15878)
06kellyjac Sep 8, 2025
7057faf
json : support `enum` values within `allOf` (#15830)
aldehir Sep 8, 2025
acc1b00
model-conversion : add extra debugging support for model conversion (…
pwilkin Sep 9, 2025
70cd37d
requirements : update transformers/torch for Embedding Gemma (#15828)
danbev Sep 9, 2025
c252ce6
contrib : add notes about merging PRs (#15881)
ggerganov Sep 9, 2025
550cf72
CUDA: fix GET_ROWS for large tensors (#15882)
JohannesGaessler Sep 9, 2025
a972fae
CUDA: Add mul_mat_id support for the mmf kernel (#15767)
am17an Sep 9, 2025
ed54e32
Workaround for subgroup arithmetic failing on MoltenVK with AMD GPUs …
lksj92hs Sep 9, 2025
17bc5a8
HIP: use v_dot2_f32_f16 instruction for FA (#15884)
JohannesGaessler Sep 9, 2025
4f63cd7
vulkan: Fix OOB accesses in soft_max_back (#15861)
jeffbolznv Sep 9, 2025
ae355f6
vulkan: throw the oom error instead of no memory type found (#15905)
0cc4m Sep 9, 2025
ff02caf
ci : cache ROCm installation in windows-latest-cmake-hip (#15887)
danbev Sep 10, 2025
86587da
llama : check returned fn ptrs from ggml_backend_reg_get_proc_address…
danbev Sep 10, 2025
28b5f19
CANN: implement LRU cache for ACL graphs (#15814)
noemotiovon Sep 10, 2025
10d8b2b
CANN: Add ROPE sin/cos cache for reuse (#15912)
noemotiovon Sep 10, 2025
09e72a0
gitignore : Ignore vim swap files in tests (#15901)
createthis Sep 10, 2025
2cfef4d
media : add transparent icon svg and png [no ci] (#15891)
06kellyjac Sep 10, 2025
e7b6d83
tests : filter out no-ops from coverage report (#15900)
danbev Sep 10, 2025
33daece
ci : add caching for ROCm installation in release workflow (#15924)
danbev Sep 10, 2025
0f0a3c2
metal : make the backend async (#15906)
ggerganov Sep 10, 2025
9de447d
ggml-cpu : fix padding in ggml_timestep_embedding (#15917)
danbev Sep 10, 2025
6ab397e
graph : support non-contiguous Q in build_attn_mha (#15908)
CISC Sep 10, 2025
4f65885
llama : support T5 models with unequal number of encoder-decoder laye…
DamonFool Sep 10, 2025
00681df
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#…
ORippler Sep 10, 2025
c0389db
CANN: Disable acl_graph for prefill stage (#15933)
hipudding Sep 11, 2025
2b3efea
kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed (#15614)
chaxu01 Sep 11, 2025
24a6734
ggml-cpu : add check for ARM MATMUL_INT8/i8mm support (#15922)
danbev Sep 11, 2025
df082f5
nitpick : correct MB to MiB (#15934)
ddh0 Sep 11, 2025
0e6ff00
CUDA: larger SRAM reads for tile FA, AMD FP16 dot (#15927)
JohannesGaessler Sep 11, 2025
360d653
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
slaren Sep 11, 2025
704d90c
Revert "sycl: add usage of enqueue_functions extension (#14244)" (#1…
NeoZhangJianyu Sep 12, 2025
6c88ad8
vulkan: Make device memory check more portable (#15939)
mbaudier Sep 12, 2025
304ac56
Vulkan iGPU device selection overhaul and PCI ID API support (#15947)
0cc4m Sep 12, 2025
f088b6a
server : adjust prompt similarity thold + add logs (#15913)
ggerganov Sep 12, 2025
f4e664f
context : remove redundant explicit casting to the same type (#15948)
haiyuewa Sep 12, 2025
4bf5549
Add docker protocol support for llama-server model loading (#15790)
ericcurtin Sep 12, 2025
40be511
ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorr…
taronaeo Sep 12, 2025
84d7b2f
metal : fix memory leaks (#15962)
ggerganov Sep 13, 2025
f161463
metal : allow ops to run concurrently (#15929)
ggerganov Sep 13, 2025
55758b0
metal : refactor kernel loading (#15964)
ggerganov Sep 13, 2025
50f4281
llama : allow using iGPUs with --device (#15951)
slaren Sep 13, 2025
b9c9c9f
vulkan: initialize vulkan-hpp to allow using extension function point…
jeffbolznv Sep 13, 2025
aa0c461
vulkan: fix failing dequant shaders (#15862)
jeffbolznv Sep 13, 2025
6380d6a
ggml-zdnn: rm user mapped buffers (#15965)
taronaeo Sep 14, 2025
d1c6f11
doc : update documentation for --tensor-split (#15980)
rgerganov Sep 14, 2025
9ecb884
releases : update ROCM, add gfx1200, gfx1201, gfx1151 (#15972)
slaren Sep 14, 2025
918b26f
rpc : fix regression when --device is used (#15981)
rgerganov Sep 14, 2025
a14bd35
metal : fix kernel requirements (#15983)
ggerganov Sep 14, 2025
a0e13dc
build: fix the build failures of Windows HIP release job (#15984)
lcy0321 Sep 14, 2025
261e6a2
Vulkan: Clean up mul_mm shader (#15987)
0cc4m Sep 14, 2025
0fa154e
rocm.Dockerfile: added gfx1200,gfx1201 architectures to support AMD …
channeladam Sep 14, 2025
9dcd200
metal : remove memory pools (#15966)
ggerganov Sep 14, 2025
6c019cb
server : only attempt to enable thinking if using jinja (#15967)
CISC Sep 14, 2025
b8e09f0
model : add grok-2 support (#15539)
CISC Sep 14, 2025
a68f31e
fix KLD percentile output (#15999)
ddh0 Sep 15, 2025
1062205
CUDA: some micro-optimizations in mmf.cuh for mul_mat_id (#15926)
am17an Sep 15, 2025
28c39da
llama-run: Fix model download on Windows (#15988)
npopov-vst Sep 15, 2025
b907255
SYCL: Add COUNT_EQUAL operator support (#15991)
yael-works Sep 15, 2025
10d1974
releases : switch to rocWMMA develop branch, add gfx1151 (#15992)
slaren Sep 15, 2025
dc381aa
docker : enable rocWMMA in ROCm images, add gfx1151 (#15997)
slaren Sep 15, 2025
3d4053f
CUDA: fix im2col_3d to respect non-contiguous inputs (views) (#15956)
jakekarnes42 Sep 15, 2025
6d75883
Add LLaDA-7b-MoE diffusion model (#16003)
am17an Sep 16, 2025
07808eb
cmake : Do not install tools on iOS targets (#15903)
ykhrustalev Sep 16, 2025
51abc96
ci : update macos-latest* jobs to use macos-latest (#15938)
danbev Sep 16, 2025
f1fbffb
fix: apply clang-format to CUDA macros (#16017)
bugparty Sep 16, 2025
76888d2
ci : upload xcframework artifact from ios-xcode-build job (#16010)
danbev Sep 16, 2025
3913f87
ggml : fix padding in timestep embedding kernels (#15932)
danbev Sep 16, 2025
7747553
ci : use macos-latest for arm64 webgpu build (#16029)
danbev Sep 16, 2025
8ff2060
llama-bench: add --n-cpu-moe support (#15952)
jacekpoplawski Sep 16, 2025
d5fabe3
CANN: Optimize ggml_cann_set_device (#15935)
noemotiovon Sep 17, 2025
85286f3
model : add OLMo3 support (#16015)
2015aroras Sep 17, 2025
1cbd80f
examples : support encoder-decoder models in the simple example (#16002)
DamonFool Sep 17, 2025
745cbcf
llama-quant : fix the verification of attention layers for encoder-de…
DamonFool Sep 17, 2025
a91d035
ci : revert back to macos-13 for macOS-latest-cmake-x64 (#16040)
danbev Sep 17, 2025
cb5bb6c
vulkan: automatically remove unsupported devices (#15976)
netrunnereve Sep 17, 2025
cd08fc3
common : Fix corrupted memory error on json grammar initialization (#…
dralves Sep 17, 2025
c959b67
CUDA: fix FA occupancy, optimize tile kernel (#15982)
JohannesGaessler Sep 17, 2025
8f8f227
convert : add Llama4ForCausalLM (#16042)
ngxson Sep 17, 2025
a7a98e0
SvelteKit-based WebUI (#14839)
allozaur Sep 17, 2025
0320ac5
metal : refactor + optimize v2 (#15995)
ggerganov Sep 17, 2025
d304f45
GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018)
reeselevine Sep 17, 2025
62c3b64
CANN: Remove print (#16044)
noemotiovon Sep 18, 2025
f2f2838
metal : handle nil cv during pipeline creation (#16065)
ggerganov Sep 18, 2025
e00f3fd
metal : avoid call free for non-owned buffer (#16067)
jhen0409 Sep 18, 2025
b213fce
metal : improve F32, F16 and BF16 mat-vec multiplication (#16057)
ggerganov Sep 18, 2025
e58174c
llama : bump max seq limit from 64 to 256 (#15916)
ggerganov Sep 18, 2025
2b6b55a
server : include usage statistics only when user request them (#16052)
rgerganov Sep 18, 2025
ad6bd90
cuda : add missing F32<->I32 entries in ggml_cuda_cpy_fn (#16060)
CISC Sep 18, 2025
703f9e3
metal : use function constants for mul_mv_ext kernels (#16074)
ggerganov Sep 18, 2025
4ca088b
Add resumable downloads for llama-server model loading (#15963)
ericcurtin Sep 18, 2025
368560a
CUDA: fix compilation on CC 6.0 (#16091)
JohannesGaessler Sep 18, 2025
38dbdf4
CUDA: Optimize PAD_REFLECT_1D (#15957)
bugparty Sep 18, 2025
c0b4509
rename optimize_graph to graph_optimize (#16082)
jeffbolznv Sep 18, 2025
3edd87c
opencl: optimize mxfp4 kernels (#16037)
shawngu-quic Sep 18, 2025
246c0d9
cmake : fix static linking for OpenMP on Unix-like systems (#16031)
angt Sep 18, 2025
69ffd89
ggml-amx : fix ggml_amx_init() on generic Linux (#16049)
angt Sep 18, 2025
0dd58b6
ggml : refactor forward_dup for cpu backend (#16062)
ngxson Sep 19, 2025
4b8560a
chat : fix build on arm64 (#16101)
ngxson Sep 19, 2025
4067f07
feat: Improve mobile UI for Settings Dialog (#16084)
allozaur Sep 19, 2025
f432d8d
chat: Fix streaming parser for granite models (#15682)
shun095 Sep 19, 2025
be79d9f
llama-bench: add --devices and --list-devices support (#16039)
ssweens Sep 19, 2025
459c0c2
server: fix SSE and OpenAI compatibility for error messages when stre…
BenjaminBruenau Sep 20, 2025
803dac2
vulkan: use vec dot for matrix matrix multiplications (#16056)
0cc4m Sep 20, 2025
fa6383c
CUDA : conditionally add cuda architectures (ggml/1341)
gjasny Sep 10, 2025
405921d
ggml : introduce semantic versioning (ggml/1336)
danbev Sep 16, 2025
7f76692
sync : ggml
ggerganov Sep 20, 2025
5bb4a3e
vulkan: fix validation error about VK_PIPELINE_CREATE_CAPTURE_STATIST…
jeffbolznv Sep 21, 2025
1eeb523
vulkan: optimize UMA buffer operations and fix driver hangs (#16059)
giuseppe Sep 21, 2025
28baac9
ci : migrate ggml ci to self-hosted runners (#16116)
ggerganov Sep 21, 2025
da30ab5
ci : add label for the RISC-V runner (#16150)
ggerganov Sep 21, 2025
c4510dc
opencl: initial `q8_0` mv support (#15732)
lhez Sep 21, 2025
51f5a45
opencl: fix concat crash on win arm64 with Adreno (#15944)
lhez Sep 21, 2025
9073a73
vulkan: vec dot matrix multiplication fix (#16151)
0cc4m Sep 22, 2025
4d0a7cb
ci : adjust params for less runtime (#16167)
ggerganov Sep 22, 2025
a20d810
vulkan: add RTE variants of exp shader (#16165)
jeffbolznv Sep 22, 2025
1d660d2
ci : use smaller model (#16168)
ggerganov Sep 22, 2025
ec65fb5
ci : remove vulkaninfo calls (#16169)
ggerganov Sep 22, 2025
5c6106a
contrib : update roles (#16113)
ggerganov Sep 22, 2025
b2d980f
codeowners : claim responsibility for ci, models, gguf-py and convert…
CISC Sep 22, 2025
96fdca0
Vulkan: add conv_transpose_2d operation (#16022)
relent95 Sep 22, 2025
05a2458
codeowners : update ownership for @ngxson and @allozuar (#16128)
ngxson Sep 22, 2025
a71ae3b
ggml : add ggml_op_is_empty (#16122)
ggerganov Sep 22, 2025
4f324a5
ggml : extend ggml_can_fuse to work with non-sequential nodes (#16123)
ggerganov Sep 22, 2025
d05affb
common : remove unused local variables (#16140)
haiyuewa Sep 22, 2025
c6db9a1
embedding : fix typos in README (#16171)
GideonSerf Sep 22, 2025
138c87c
webui : fix handling incomplete chunks (#16107)
Bramas Sep 22, 2025
37a23c1
common : enable `--offline` mode without curl support (#16137)
angt Sep 22, 2025
432cf43
codeowners : update + cleanup (#16174)
ggerganov Sep 22, 2025
3ecb2f6
ggml : implement set_rows with i32 index (#16159)
CISC Sep 22, 2025
351f3da
clang-tidy : disable warning about performance enum size (#16127)
haiyuewa Sep 22, 2025
1d0125b
feat: Add conversion support in GraniteHybrid for non-hybrid (all att…
gabe-l-hart Sep 22, 2025
85e7227
ggml-cpu : fix typo in gemm comments [no ci] (#16189)
danbev Sep 23, 2025
4b9f4cb
devops: add s390x containers (#15915)
taronaeo Sep 23, 2025
0bc7cc7
codeowners : add @danbev to model-conversion example [no ci] (#16190)
danbev Sep 23, 2025
264f1b5
zdnn: refactor codebase + add docs (#16178)
taronaeo Sep 23, 2025
f6b4af3
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15…
CISC Sep 23, 2025
4e29084
ggml-cpu: Respect cpumask settings (#16164)
wishstudio Sep 23, 2025
0889589
ci : enable Vulkan workflow on Mac (#16194)
ggerganov Sep 23, 2025
f505bd8
ci : disable AMD workflows + update NVIDIA workflows (#16200)
ggerganov Sep 23, 2025
8ba548d
model-conversion : fix the make targets in the README.md (#16209)
DamonFool Sep 24, 2025
4d9ea03
codeowners : use slash prefix for root files [no ci] (#16210)
danbev Sep 24, 2025
7735706
model-conversion : run-org-model.py fails to run on mac m1 (#16213)
DamonFool Sep 24, 2025
c0c59c1
codeowners : match all requirements files (#16214)
CISC Sep 24, 2025
152729f
common : add missing chrono header for common.cpp (#16211)
uilianries Sep 24, 2025
63b54c8
model-conversion : make causal-verify-logits fails with model names c…
DamonFool Sep 24, 2025
3a59971
model : add label for LiquidAI LFM2-2.6B model (#16204)
tdakhran Sep 24, 2025
f2a789e
ggml : split graph allocations according to backend max buffer size (…
Acly Sep 24, 2025
e789095
llama: print memory breakdown on exit (#15860)
JohannesGaessler Sep 24, 2025
4ae88d0
codeowners: add ownership of zdnn backend [no ci] (#16229)
taronaeo Sep 24, 2025
5fb5576
devops: fix s390x docker release failure (#16231)
taronaeo Sep 25, 2025
bee378e
ci: run the x64 and arm ci on the github machines instead (#16183)
netrunnereve Sep 25, 2025
e7a5130
codeowners: add ownership of zdnn backend [no ci] (#16232)
taronaeo Sep 25, 2025
c498fc8
rpc : use ggml logging facilities
rgerganov Sep 25, 2025
02a6a82
metal : restore im2col perf (#16219)
ggerganov Sep 25, 2025
4ea0079
metal : relax reorder conditions (#16216)
ggerganov Sep 25, 2025
dfcd53f
metal : fuse NORM + MUL + ADD, support non-multiples of 4 (#16220)
ggerganov Sep 25, 2025
b5bd037
llama : add support for qwen3 reranker (#15824)
iamlemec Sep 25, 2025
4cdd0bb
docs: fix typo [no ci] (#16244)
JohannesGaessler Sep 25, 2025
aa719c2
ggml : fix loongarch lsx compilation error (#15864)
junchao-loongson Sep 25, 2025
d0991da
server : add support for external server for tests (#16243)
danbev Sep 25, 2025
aa3ee0e
model-conversion : add embedding prompt file support (#15871)
danbev Sep 25, 2025
077c94d
CUDA: add a fused top-K MoE kernel (#16130)
am17an Sep 25, 2025
2705297
readme : update bindings (#16144)
romantal Sep 25, 2025
b05a9d6
vendors: update miniaudio version (#16212)
taronaeo Sep 25, 2025
835b2b9
model : add GroveMoE support (#15510)
CISC Sep 25, 2025
0f7c696
musa: fix build warnings (#15611)
yeahdongcn Sep 26, 2025
a86a580
musa: upgrade musa sdk to 4.3.0 (#16240)
yeahdongcn Sep 26, 2025
3b337b0
codeowners : add danbev as owner of build-xcframework.sh [no ci] (#16…
danbev Sep 26, 2025
00217cd
ci : create git tags for released docker images (#16008)
rgerganov Sep 26, 2025
9b26511
ggml-cpu: implement MXFP4 SIMD for s390x (#16193)
taronaeo Sep 26, 2025
4710dd3
build : fix build-ios-device (#16257)
angt Sep 26, 2025
b995a10
common : use cpp-httplib as a cURL alternative for downloads (#16185)
angt Sep 26, 2025
54dbc37
metal : report OOM errors (#16274)
ggerganov Sep 26, 2025
cc1cfa2
mtmd : fix uninitialized variable in bicubic_resize (#16275)
AlekseiNikiforovIBM Sep 26, 2025
d12a983
codeowners : add rgerganov as owner of RPC [no ci] (#16279)
rgerganov Sep 26, 2025
5d0a40f
Always show message actions for mobile UI + improvements for user mes…
allozaur Sep 26, 2025
e0539eb
webui: switch to hash-based routing (alternative of #16079) (#16157)
isaac-mcfadyen Sep 26, 2025
1a18927
Allow viewing conversations even when llama server is down (#16255)
allozaur Sep 26, 2025
807e8c6
Enhance text file detection logic for file attachments (#16199)
allozaur Sep 26, 2025
624207e
devops: add s390x & ppc64le CI (#15925)
taronaeo Sep 26, 2025
72b24d9
model : make minicpm embedding_scale, residual_scale and logit_scale …
vinkal-chudgar Sep 26, 2025
ace6a54
build : add LLAMA_OPENSSL option (#16287)
angt Sep 27, 2025
3f81b4e
vulkan: support GET_ROWS for k-quants (#16235)
jeffbolznv Sep 27, 2025
234e2ff
server : remove old LLAMA_SERVER_SSL (#16290)
angt Sep 27, 2025
0499b29
vulkan: throw system error instead of SIGABRT during init on older de…
DmyMi Sep 27, 2025
75a3a6c
CUDA: refactor and deduplicate vector FA kernels (#16208)
JohannesGaessler Sep 27, 2025
c0bfc57
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#…
am17an Sep 27, 2025
4807e8f
Show message actions by default (#16289)
allozaur Sep 27, 2025
8656f5d
vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16…
Acly Sep 27, 2025
e6d65fb
vulkan: support arbitrary KV dimension in flash attention (#16160)
jeffbolznv Sep 27, 2025
1384abf
vulkan: handle mat_mul with A matrix > 4GB (#16176)
jeffbolznv Sep 28, 2025
3b53634
metal : fuse non-sequential nodes (#16102)
ggerganov Sep 28, 2025
6a2c614
metal : extend mat-mat multiplication support (#16225)
ggerganov Sep 28, 2025
d8359f5
vulkan: 64-bit im2col (#16135)
jeffbolznv Sep 28, 2025
2811c65
Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] …
ImadSaddik Sep 28, 2025
0124ac9
devops: switch to using ubuntu-22.04-s390x image (#16302)
taronaeo Sep 28, 2025
d9e0e7c
ci : fix musa docker build (#16306)
yeahdongcn Sep 28, 2025
bd0af02
common : fix reasoning before forced tool call via tool_choice = requ…
crat0z Sep 28, 2025
b887d2f
ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307)
CISC Sep 28, 2025
92cd103
vulkan: Fix validation failure in quantized flash attention (#16292)
jeffbolznv Sep 29, 2025
a4a0aa5
ggml : fix dependencies for ggml_set_rows (#16318)
ggerganov Sep 29, 2025
3ffd0fa
perplexity : show more kl-divergence data (#16321)
ddh0 Sep 29, 2025
2f61c0f
llama-cli: prevent spurious assistant token (#16202)
vinkal-chudgar Sep 29, 2025
66bb798
fix: preserved zero values in chat settings inputs and textareas by s…
ServeurpersoCom Sep 29, 2025
3a2bdcd
Improve Mobile UI for dialogs and action dropdowns (#16222)
allozaur Sep 29, 2025
adc7634
ggml : check cuda and metal argsort limits and add test (#16323)
CISC Sep 29, 2025
02463ab
ggml-backend : add root cause in error message if loading backend lib…
rlewczuk Sep 29, 2025
2db78c7
ggml : bump version to 0.9.1
ggerganov Sep 20, 2025
b6dff20
ggml : prepare for development of 0.9.2-dev
ggerganov Sep 20, 2025
b6ae75a
ggml : bump version to 0.9.3 (ggml/1353)
danbev Sep 25, 2025
c9b1c06
ggml : remove -dev suffix from release version (ggml/1355)
danbev Sep 26, 2025
4d3d455
sync : whisper.cpp (ggml/1359)
ggerganov Sep 29, 2025
2ddd3f2
sync : ggml
ggerganov Sep 29, 2025
b77e6c1
ggml: riscv: add riscv spacemit backend (#15288)
alex-spacemit Sep 29, 2025
d72f5f7
ci : add AMD runners and workflows (#16249)
ggerganov Sep 29, 2025
5f7e166
Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` b…
ServeurpersoCom Sep 29, 2025
a74a0d6
tests: override test_set_rows::max_nmse_err to allow for occasional r…
jeffbolznv Sep 30, 2025
de41f2b
codeowners: add codeowners for opencl backend (#16344)
lhez Sep 30, 2025
f1eb1cb
kleidiai : fix work size and threads sync for fp16 (#16246)
chaxu01 Sep 30, 2025
3c62aed
common : simplify etag tracking by removing json (#16342)
angt Sep 30, 2025
35fb824
metal : dynamic simdgroups for MV kernels (#16340)
ggerganov Sep 30, 2025
a014310
cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328)
anavp-nvidia Sep 30, 2025
075c015
ggml : bump version to 0.9.4 (ggml/1363)
ggerganov Sep 30, 2025
2df5bcf
ci : disable ccache for android (#16348)
CISC Sep 30, 2025
364a7a6
common : remove common_has_curl() (#16351)
angt Sep 30, 2025
d1c84a6
opencl: support ne3 in get_rows (#15866)
lhez Sep 30, 2025
8d78cd2
ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)
reeselevine Sep 30, 2025
16b0ca0
Chatapi ignore empty sampling (#16330)
ServeurpersoCom Sep 30, 2025
7c156df
opencl: support pad_ext (#15888)
lhez Sep 30, 2025
bf6f3b3
common : disable progress bar without a tty (#16352)
angt Sep 30, 2025
b2ba81d
ci : fix ccache key for ubuntu-cpu-cmake (#16355)
CISC Sep 30, 2025
e74c92e
model : support GLM 4.6 (make a few NextN/MTP tensors not required) (…
bartowski1182 Sep 30, 2025
aa9538a
webui: Remove running `llama-server` within WebUI `dev.sh` script (#1…
allozaur Oct 1, 2025
132d673
vulkan: make ggml_vk_default_dispatcher support older vulkan headers …
netrunnereve Oct 1, 2025
4f15759
Add optional setting for showing "Model used:" information (#16337)
allozaur Oct 1, 2025
1104ca1
ci : use registry cache for docker builds (#16366)
CISC Oct 1, 2025
2a9b633
Improve code block color theming (#16325)
allozaur Oct 1, 2025
7647992
Conversation action dialogs as singletons from Chat Sidebar + apply c…
allozaur Oct 1, 2025
4201dea
common: introduce http.h for httplib-based client (#16373)
angt Oct 1, 2025
1fe4e38
ci: Properly install rocwmma for hip builds (#16305)
IMbackK Oct 1, 2025
ded67b9
llama : parameter conversion and loading fixes for PLaMo2 variants (#…
mitmul Oct 1, 2025
e95fec6
HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.…
IMbackK Oct 1, 2025
c8dedc9
CI: reenable cdna in rocm docker builds (#16376)
IMbackK Oct 1, 2025
045257c
feat:.gitignore
lochjin Oct 2, 2025
bf9dc6a
feat:merge from master in #16376
lochjin Oct 2, 2025
5c5a0d2
feat:.gitignore
lochjin Oct 2, 2025
9 changes: 8 additions & 1 deletion .clang-format
@@ -22,7 +22,14 @@ AllowShortIfStatementsOnASingleLine: Never
 AllowShortLambdasOnASingleLine: Inline
 AllowShortLoopsOnASingleLine: false
 AlwaysBreakBeforeMultilineStrings: true
-BinPackArguments: false
+# Treat CUDA keywords/attributes as "attribute macros" and avoid breaking lines inside them
+AttributeMacros:
+  - __host__
+  - __device__
+  - __global__
+  - __forceinline__
+  - __launch_bounds__
+BinPackArguments: true
 BinPackParameters: false # OnePerLine
 BitFieldColonSpacing: Both
 BreakBeforeBraces: Custom # Attach
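
Note: a quick way to check the effect of the new AttributeMacros entries is to format a small CUDA file (the path and kernel below are illustrative, not part of this PR; the file: style syntax needs clang-format 14+):

# With the AttributeMacros entries above, clang-format keeps CUDA attributes
# such as __launch_bounds__(256, 2) intact instead of breaking a line inside them.
cat > /tmp/attr-check.cu <<'EOF'
__global__ void __launch_bounds__(256, 2) scale(float * x, float v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= v;
}
EOF
clang-format --style=file:.clang-format /tmp/attr-check.cu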
1 change: 1 addition & 0 deletions .clang-tidy
@@ -17,6 +17,7 @@ Checks: >
     clang-analyzer-*,
     -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
     performance-*,
+    -performance-enum-size,
     portability-*,
     -portability-simd-intrinsics,
     misc-*,
22 changes: 0 additions & 22 deletions .devops/cloud-v-pipeline

This file was deleted.

6 changes: 1 addition & 5 deletions .devops/cpu.Dockerfile
@@ -4,19 +4,15 @@ FROM ubuntu:$UBUNTU_VERSION AS build
 
 ARG TARGETARCH
 
-ARG GGML_CPU_ARM_ARCH=armv8-a
-
 RUN apt-get update && \
     apt-get install -y build-essential git cmake libcurl4-openssl-dev
 
 WORKDIR /app
 
 COPY . .
 
-RUN if [ "$TARGETARCH" = "amd64" ]; then \
+RUN if [ "$TARGETARCH" = "amd64" ] || [ "$TARGETARCH" = "arm64" ]; then \
         cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_TESTS=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
-    elif [ "$TARGETARCH" = "arm64" ]; then \
-        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_TESTS=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
     else \
         echo "Unsupported architecture"; \
         exit 1; \
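
Note: a sketch of exercising the unified branch (the image tag is illustrative; BuildKit sets TARGETARCH automatically from --platform):

# arm64 now takes the same GGML_CPU_ALL_VARIANTS path as amd64.
docker buildx build --platform linux/arm64 -f .devops/cpu.Dockerfile -t llama-cpp-cpu:local .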
2 changes: 1 addition & 1 deletion .devops/cuda.Dockerfile
@@ -61,7 +61,7 @@ RUN apt-get update \
         python3 \
         python3-pip \
     && pip install --upgrade pip setuptools wheel \
-    && pip install -r requirements.txt \
+    && pip install --break-system-packages -r requirements.txt \
     && apt autoremove -y \
     && apt clean -y \
     && rm -rf /tmp/* /var/tmp/* \
2 changes: 1 addition & 1 deletion .devops/musa.Dockerfile
@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION=22.04
 # This needs to generally match the container host's environment.
-ARG MUSA_VERSION=rc4.2.0
+ARG MUSA_VERSION=rc4.3.0
 # Target the MUSA build image
 ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}-amd64
 
27 changes: 14 additions & 13 deletions .devops/rocm.Dockerfile
@@ -1,10 +1,10 @@
 ARG UBUNTU_VERSION=24.04
 
 # This needs to generally match the container host's environment.
-ARG ROCM_VERSION=6.4
-ARG AMDGPU_VERSION=6.4
+ARG ROCM_VERSION=7.0
+ARG AMDGPU_VERSION=7.0
 
-# Target the CUDA build image
+# Target the ROCm build image
 ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete
 
 ### Build image
@@ -13,18 +13,14 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
 # Unless otherwise specified, we make a fat build.
 # List from https://github.com/ggml-org/llama.cpp/pull/1087#issuecomment-1682807878
 # This is mostly tied to rocBLAS supported archs.
-# gfx803, gfx900, gfx1032, gfx1101, gfx1102,not officialy supported
-# gfx906 is deprecated
-#check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.4/reference/system-requirements.html
+# gfx803, gfx900, gfx906, gfx1032, gfx1101, gfx1102,not officialy supported
+# check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html
 
-ARG ROCM_DOCKER_ARCH='gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102'
-#ARG ROCM_DOCKER_ARCH=gfx1100
+ARG ROCM_DOCKER_ARCH='gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1032;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201;gfx1151'
+#ARG ROCM_DOCKER_ARCH='gfx1151'
 
-# Set nvcc architectured
+# Set ROCm architectures
 ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
-# Enable ROCm
-# ENV CC=/opt/rocm/llvm/bin/clang
-# ENV CXX=/opt/rocm/llvm/bin/clang++
 
 RUN apt-get update \
     && apt-get install -y \
@@ -40,7 +36,12 @@ WORKDIR /app
 COPY . .
 
 RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
-    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$ROCM_DOCKER_ARCH -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \
+    cmake -S . -B build \
+        -DGGML_HIP=ON \
+        -DGGML_HIP_ROCWMMA_FATTN=ON \
+        -DAMDGPU_TARGETS="$ROCM_DOCKER_ARCH" \
+        -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON \
+        -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \
     && cmake --build build --config Release -j$(nproc)
 
 RUN mkdir -p /app/lib \
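
Note: ROCM_DOCKER_ARCH is now a ';'-separated list, so a single-arch build that overrides the fat default looks like this (the image tag is illustrative):

docker build -f .devops/rocm.Dockerfile --build-arg ROCM_DOCKER_ARCH='gfx1100' -t llama-cpp-rocm:gfx1100 .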
123 changes: 123 additions & 0 deletions .devops/s390x.Dockerfile
@@ -0,0 +1,123 @@
ARG GCC_VERSION=15.2.0
ARG UBUNTU_VERSION=24.04

### Build Llama.cpp stage
FROM gcc:${GCC_VERSION} AS build

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt update -y && \
apt upgrade -y && \
apt install -y --no-install-recommends \
git cmake ccache ninja-build \
# WARNING: Do not use libopenblas-openmp-dev. libopenblas-dev is faster.
libopenblas-dev libcurl4-openssl-dev && \
rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .

RUN --mount=type=cache,target=/root/.ccache \
--mount=type=cache,target=/app/build \
cmake -S . -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER_LAUNCHER=ccache \
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
-DLLAMA_BUILD_TESTS=OFF \
-DGGML_BACKEND_DL=OFF \
-DGGML_NATIVE=OFF \
-DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS && \
cmake --build build --config Release -j $(nproc) && \
cmake --install build --prefix /opt/llama.cpp

COPY *.py /opt/llama.cpp/bin
COPY .devops/tools.sh /opt/llama.cpp/bin

COPY gguf-py /opt/llama.cpp/gguf-py
COPY requirements.txt /opt/llama.cpp/gguf-py
COPY requirements /opt/llama.cpp/gguf-py/requirements


### Collect all llama.cpp binaries, libraries and distro libraries
FROM scratch AS collector

# Copy llama.cpp binaries and libraries
COPY --from=build /opt/llama.cpp/bin /llama.cpp/bin
COPY --from=build /opt/llama.cpp/lib /llama.cpp/lib
COPY --from=build /opt/llama.cpp/gguf-py /llama.cpp/gguf-py


### Base image
FROM ubuntu:${UBUNTU_VERSION} AS base

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt update -y && \
apt install -y --no-install-recommends \
# WARNING: Do not use libopenblas-openmp-dev. libopenblas-dev is faster.
# See: https://github.com/ggml-org/llama.cpp/pull/15915#issuecomment-3317166506
curl libgomp1 libopenblas-dev && \
apt autoremove -y && \
apt clean -y && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

# Copy llama.cpp libraries
COPY --from=collector /llama.cpp/lib /usr/lib/s390x-linux-gnu


### Full
FROM base AS full

ENV PATH="/root/.cargo/bin:${PATH}"
WORKDIR /app

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt update -y && \
apt install -y \
git cmake libjpeg-dev \
python3 python3-pip python3-dev && \
apt autoremove -y && \
apt clean -y && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

RUN curl https://sh.rustup.rs -sSf | bash -s -- -y

COPY --from=collector /llama.cpp/bin /app
COPY --from=collector /llama.cpp/gguf-py /app/gguf-py

RUN pip install --no-cache-dir --break-system-packages \
-r /app/gguf-py/requirements.txt

ENTRYPOINT [ "/app/tools.sh" ]


### CLI Only
FROM base AS light

WORKDIR /llama.cpp/bin

# Copy llama.cpp binaries and libraries
COPY --from=collector /llama.cpp/bin/llama-cli /llama.cpp/bin

ENTRYPOINT [ "/llama.cpp/bin/llama-cli" ]


### Server
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

WORKDIR /llama.cpp/bin

# Copy llama.cpp binaries and libraries
COPY --from=collector /llama.cpp/bin/llama-server /llama.cpp/bin

EXPOSE 8080

ENTRYPOINT [ "/llama.cpp/bin/llama-server" ]
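
Note: a sketch of building the three final stages defined above (tags are illustrative; build on, or cross-build for, linux/s390x):

docker build -f .devops/s390x.Dockerfile --target full   -t llama-cpp-s390x:full   .
docker build -f .devops/s390x.Dockerfile --target light  -t llama-cpp-s390x:light  .
docker build -f .devops/s390x.Dockerfile --target server -t llama-cpp-s390x:server .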
30 changes: 23 additions & 7 deletions .devops/vulkan.Dockerfile
@@ -2,14 +2,30 @@ ARG UBUNTU_VERSION=24.04
 
 FROM ubuntu:$UBUNTU_VERSION AS build
 
-# Install build tools
-RUN apt update && apt install -y git build-essential cmake wget
+# Ref: https://vulkan.lunarg.com/doc/sdk/latest/linux/getting_started.html
+
+# Install build tools
+RUN apt update && apt install -y git build-essential cmake wget xz-utils
 
-# Install Vulkan SDK and cURL
-RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
-    wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list && \
-    apt update -y && \
-    apt-get install -y vulkan-sdk libcurl4-openssl-dev curl
+# Install Vulkan SDK
+ARG VULKAN_VERSION=1.4.321.1
+RUN ARCH=$(uname -m) && \
+    wget -qO /tmp/vulkan-sdk.tar.xz https://sdk.lunarg.com/sdk/download/${VULKAN_VERSION}/linux/vulkan-sdk-linux-${ARCH}-${VULKAN_VERSION}.tar.xz && \
+    mkdir -p /opt/vulkan && \
+    tar -xf /tmp/vulkan-sdk.tar.xz -C /tmp --strip-components=1 && \
+    mv /tmp/${ARCH}/* /opt/vulkan/ && \
+    rm -rf /tmp/*
+
+# Install cURL and Vulkan SDK dependencies
+RUN apt install -y libcurl4-openssl-dev curl \
+    libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev
+
+# Set environment variables
+ENV VULKAN_SDK=/opt/vulkan
+ENV PATH=$VULKAN_SDK/bin:$PATH
+ENV LD_LIBRARY_PATH=$VULKAN_SDK/lib:$LD_LIBRARY_PATH
+ENV CMAKE_PREFIX_PATH=$VULKAN_SDK:$CMAKE_PREFIX_PATH
+ENV PKG_CONFIG_PATH=$VULKAN_SDK/lib/pkgconfig:$PKG_CONFIG_PATH
 
 # Build it
 WORKDIR /app
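
Note: since the SDK is now fetched as a pinned tarball, the version can be overridden at build time; a sketch (tag illustrative; the value must match a tarball published on sdk.lunarg.com):

docker build -f .devops/vulkan.Dockerfile --build-arg VULKAN_VERSION=1.4.321.1 -t llama-cpp-vulkan:local .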
8 changes: 8 additions & 0 deletions .editorconfig
@@ -52,3 +52,11 @@ insert_final_newline = unset
 [vendor/miniaudio/miniaudio.h]
 trim_trailing_whitespace = unset
 insert_final_newline = unset
+
+[tools/server/webui/**]
+indent_style = unset
+indent_size = unset
+end_of_line = unset
+charset = unset
+trim_trailing_whitespace = unset
+insert_final_newline = unset
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/010-bug-compilation.yml
@@ -40,7 +40,7 @@ body:
     attributes:
       label: GGML backends
       description: Which GGML backends do you know to be affected?
-      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
+      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL, zDNN]
       multiple: true
     validations:
       required: true
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/011-bug-results.yml
@@ -42,7 +42,7 @@ body:
     attributes:
       label: GGML backends
       description: Which GGML backends do you know to be affected?
-      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
+      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL, zDNN]
       multiple: true
     validations:
       required: true