Merged
77 commits
95ce098
HIP: add IMbackK to codeowner (#16375)
IMbackK Oct 2, 2025
2be72c2
SYCL: Update to oneAPI 2025.2 (#16371)
NeoZhangJianyu Oct 2, 2025
bbd32bc
ci : fix clean-up of old logs (#16381)
ggerganov Oct 2, 2025
f09aefa
ci: update vulkan ci (#16294)
netrunnereve Oct 2, 2025
72ee736
ci : fix ubuntu-latest-cmake-rpc (disable ccache) (#16388)
CISC Oct 2, 2025
91a2a56
musa: update compile flags (#16265)
yeahdongcn Oct 2, 2025
34fcc5a
model : Apertus model implementation (#15852)
pwilkin Oct 2, 2025
ef07a40
ggml webgpu: add support for soft_max, optimize rms_norm (#16357)
reeselevine Oct 2, 2025
d64c810
test-barrier : do not use more threads than physically available (#16…
CISC Oct 2, 2025
5113efd
fix: track viewportHeight via window.innerHeight to avoid unwanted sc…
ServeurpersoCom Oct 3, 2025
136bda7
webui : Fix messages payload sent to chat completions (#16402)
allozaur Oct 3, 2025
e308efd
vulkan: in flash attention, bounds check against nem1 (don't rely on …
jeffbolznv Oct 3, 2025
7723327
Capture model name only after first token (streaming) or completed re…
allozaur Oct 3, 2025
ad12647
ci : change macos-13 to macos-15-intel (#16401)
danbev Oct 3, 2025
0e1f838
vulkan: Fix FA coopmat1 invalid array indexing (#16365)
jeffbolznv Oct 3, 2025
2aaf0a2
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#1…
jeffbolznv Oct 3, 2025
84c8e30
Fix missing messages on sibling navigation (#16408)
allozaur Oct 3, 2025
638d330
ggml : fix graph reallocation with multiple chunks (#16396)
Acly Oct 3, 2025
946f71e
llama : fix shapes for bert/mpt q/k norm (#16409)
CISC Oct 3, 2025
606a73f
metal : fix loop bound in ggml_mem_ranges (#16412)
ggerganov Oct 3, 2025
f6dcda3
server : context checkpointing for hybrid and recurrent models (#16382)
ddh0 Oct 3, 2025
128d522
chat : support Magistral thinking (#16413)
ServeurpersoCom Oct 3, 2025
e29acf7
vulkan : incremental shader builds (#16341)
Acly Oct 4, 2025
898acba
rpc : add support for multiple devices (#16276)
rgerganov Oct 4, 2025
f392839
rpc : check src buffer when copying tensor (#16421)
rgerganov Oct 4, 2025
86df2c9
vulkan: use a more appropriate amount of threads when generating shad…
netrunnereve Oct 4, 2025
3526657
ggml webgpu: actually add softmax, fix rms_norm offset (#16400)
reeselevine Oct 5, 2025
ca71fb9
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
gabe-l-hart Oct 5, 2025
c5fef0f
server: update readme to mention n_past_max metric (#16436)
okuvshynov Oct 6, 2025
1d49ca3
nix : removed metal for nix (#16118)
yuannan Oct 6, 2025
a80ff18
ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (#16443)
danbev Oct 6, 2025
04e632a
ci : remove missing reranker model files (#16444)
danbev Oct 6, 2025
a23b9bd
ggml : fix unaligned access in AMX code (#16315)
ggerganov Oct 6, 2025
3a002af
ci : refactor sdk caching to minimize storage (#16414)
CISC Oct 6, 2025
c08002a
chat : Granite Docling stopping (#16438)
gabe-l-hart Oct 6, 2025
3df2244
llama : add --no-host to disable host buffers (#16310)
Gadflyii Oct 6, 2025
8ae32dc
metal : various optimizations + refactoring (#16446)
ggerganov Oct 7, 2025
1d6092f
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
ggerganov Oct 7, 2025
0a319bb
metal : add support for non-padded FA KV (#16148)
ggerganov Oct 7, 2025
0123ff3
memory : use sequential equal splits for recurrent modules (#16442)
ggerganov Oct 7, 2025
c61ae20
rpc : update documentation (#16441)
rgerganov Oct 7, 2025
ef4c5b8
presets : fix pooling param for embedding models (#16455)
ggerganov Oct 7, 2025
4e0388a
webui : added download action (#13552) (#16282)
srogmann Oct 7, 2025
df1b612
server : add `/v1/health` endpoint (#16461)
ggerganov Oct 7, 2025
aeaf8a3
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
tdakhran Oct 7, 2025
74b8fc1
ggml webgpu: profiling, CI updates, reworking of command submission (…
reeselevine Oct 7, 2025
7fdd16b
server : improve context checkpoint logic (#16440)
ggerganov Oct 8, 2025
b2c08c9
metal : mark FA blocks (#16372)
ggerganov Oct 8, 2025
d2ee056
server : fix cancel pending task (#16467)
issixx Oct 8, 2025
9d08828
Disable CUDA host buffers on integrated GPUs (#16308)
ai-fonsi Oct 8, 2025
12bbc3f
refactor: centralize CoT parsing in backend for streaming mode (#16394)
ServeurpersoCom Oct 8, 2025
e08db42
model: EmbeddingGemma Adding Support for SentenceTransformers Dense M…
sfallah Oct 9, 2025
b260213
[SYCL] refactor soft_max, add soft_max_back (#16472)
NeoZhangJianyu Oct 9, 2025
d80d6d2
kleidiai: kernel interface refactoring (#16460)
chaxu01 Oct 9, 2025
aa4711d
CANN: Improve ACL graph matching (#16166)
noemotiovon Oct 9, 2025
2c0d875
ci: add ARM64 Kleidiai build and test support (#16462)
sudhiarm Oct 9, 2025
56b4795
model-conversion : add support for SentenceTransformers (#16387)
danbev Oct 9, 2025
8328fd4
No markdown in cot (#16483)
ServeurpersoCom Oct 9, 2025
d00cbea
server : host-memory prompt caching (#16391)
ggerganov Oct 9, 2025
1deee0f
cpu : optimize the ggml NORM operation (#15953)
duduta Oct 9, 2025
1faa13a
webui: updated the chat service to only include max_tokens in the req…
ServeurpersoCom Oct 9, 2025
6d69ab3
cmake : Dont define XOPENSOURCE on AIX (#16481)
mehendarkarprajwal Oct 10, 2025
cdb6da4
server : log requests to /v1/completions (#16495)
rgerganov Oct 10, 2025
68ee98a
server : return HTTP 400 if prompt exceeds context length (#16486)
rgerganov Oct 10, 2025
81086cd
vocab : mark EOT token for Granite models (#16499)
ggerganov Oct 10, 2025
e60f01d
server : fix division by zero when reporting stats (#16501)
ggerganov Oct 10, 2025
477a66b
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
amirai21 Oct 11, 2025
97870e6
cuda : avoid initializing unused devices (#16510)
slaren Oct 11, 2025
31d0ff1
server / ranking : add sorting and management of top_n (#16403)
YannFollet Oct 11, 2025
4a8fbe0
feat: render user content as markdown option (#16358)
ServeurpersoCom Oct 11, 2025
a3cb047
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
ggerganov Oct 11, 2025
11f0af5
CUDA: faster tile FA, add oob checks, more HSs (#16492)
JohannesGaessler Oct 11, 2025
20cc625
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
sirus20x6 Oct 12, 2025
a2fba89
hparams : add check for layer index in is_recurrent (#16511)
danbev Oct 12, 2025
41aac5c
ggml : Fix FP16 ELU positive branch (#16519)
sirus20x6 Oct 12, 2025
4b2dae3
common : update presets (#16504)
ggerganov Oct 12, 2025
1616bf4
Merge branch 'master'
lochjin Oct 12, 2025
6 changes: 3 additions & 3 deletions .devops/intel.Dockerfile
@@ -1,8 +1,8 @@
-ARG ONEAPI_VERSION=2025.1.1-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.2.2-0-devel-ubuntu24.04

## Build Image

-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS build

ARG GGML_SYCL_F16=OFF
RUN apt-get update && \
@@ -31,7 +31,7 @@ RUN mkdir -p /app/full \
&& cp requirements.txt /app/full \
&& cp .devops/tools.sh /app/full/tools.sh

-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS base
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS base

RUN apt-get update \
&& apt-get install -y libgomp1 curl\
4 changes: 0 additions & 4 deletions .devops/nix/package.nix
@@ -128,10 +128,6 @@ effectiveStdenv.mkDerivation (finalAttrs: {
};

postPatch = ''
-substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
-  --replace '[bundle pathForResource:@"ggml-metal" ofType:@"metal"];' "@\"$out/bin/ggml-metal.metal\";"
-substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
-  --replace '[bundle pathForResource:@"default" ofType:@"metallib"];' "@\"$out/bin/default.metallib\";"
'';

# With PR#6015 https://github.com/ggml-org/llama.cpp/pull/6015,
36 changes: 36 additions & 0 deletions .github/actions/install-exe/action.yml
@@ -0,0 +1,36 @@
name: "Install exe"
description: "Download and install exe"
inputs:
  url:
    description: "URL of the exe installer"
    required: true
  args:
    description: "Installer arguments"
    required: true
  timeout:
    description: "Timeout (in ms)"
    required: false
    default: "600000"

runs:
  using: "composite"
  steps:
    - name: Install EXE
      shell: pwsh
      run: |
        $ErrorActionPreference = "Stop"
        write-host "Downloading Installer EXE"
        Invoke-WebRequest -Uri "${{ inputs.url }}" -OutFile "${env:RUNNER_TEMP}\temp-install.exe"
        write-host "Installing"
        $proc = Start-Process "${env:RUNNER_TEMP}\temp-install.exe" -ArgumentList '${{ inputs.args }}' -NoNewWindow -PassThru
        $completed = $proc.WaitForExit(${{ inputs.timeout }})
        if (-not $completed) {
            Write-Error "Installer timed out. Killing the process"
            $proc.Kill()
            exit 1
        }
        if ($proc.ExitCode -ne 0) {
            Write-Error "Installer failed with exit code $($proc.ExitCode)"
            exit 1
        }
        write-host "Completed installation"
20 changes: 20 additions & 0 deletions .github/actions/linux-setup-spacemit/action.yml
@@ -0,0 +1,20 @@
name: "Linux - Setup SpacemiT Toolchain"
description: "Setup SpacemiT Toolchain for Linux"
inputs:
  path:
    description: "Installation path"
    required: true
  version:
    description: "SpacemiT toolchain version"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup SpacemiT Toolchain
      id: setup
      uses: ./.github/actions/unarchive-tar
      with:
        url: https://archive.spacemit.com/toolchain/spacemit-toolchain-linux-glibc-x86_64-v${{ inputs.version }}.tar.xz
        path: ${{ inputs.path }}
        strip: 1
20 changes: 20 additions & 0 deletions .github/actions/linux-setup-vulkan/action.yml
@@ -0,0 +1,20 @@
name: "Linux - Setup Vulkan SDK"
description: "Setup Vulkan SDK for Linux"
inputs:
  path:
    description: "Installation path"
    required: true
  version:
    description: "Vulkan SDK version"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup Vulkan SDK
      id: setup
      uses: ./.github/actions/unarchive-tar
      with:
        url: https://sdk.lunarg.com/sdk/download/${{ inputs.version }}/linux/vulkan_sdk.tar.xz
        path: ${{ inputs.path }}
        strip: 1
27 changes: 27 additions & 0 deletions .github/actions/unarchive-tar/action.yml
@@ -0,0 +1,27 @@
name: "Unarchive tar"
description: "Download and unarchive tar into directory"
inputs:
  url:
    description: "URL of the tar archive"
    required: true
  path:
    description: "Directory to unarchive into"
    required: true
  type:
    description: "Compression type (tar option)"
    required: false
    default: "J"
  strip:
    description: "Strip components"
    required: false
    default: "0"

runs:
  using: "composite"
  steps:
    - name: Unarchive into directory
      shell: bash
      run: |
        mkdir -p ${{ inputs.path }}
        cd ${{ inputs.path }}
        curl --no-progress-meter ${{ inputs.url }} | tar -${{ inputs.type }}x --strip-components=${{ inputs.strip }}
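
The `unarchive-tar` step above streams the download straight into `tar`, so no archive file is left on disk. A minimal local sketch of that pipeline follows. It assumes a gzip archive built on the spot (type `z` rather than the action's default `J`), uses `cat` in place of `curl`, and passes `-f -` to make stdin explicit, whereas the action relies on tar's default archive. All directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch area; the real action downloads from inputs.url.
work="$(mktemp -d)"
mkdir -p "$work/src/sdk-1.0/bin"
echo "hello" > "$work/src/sdk-1.0/bin/tool"

# Build a sample archive whose single top-level directory is sdk-1.0/.
tar -czf "$work/sdk.tar.gz" -C "$work/src" sdk-1.0

# Mirror the action: create the target, cd into it, and stream-extract.
# type and strip play the role of inputs.type and inputs.strip.
type="z"
strip="1"
dest="$work/out"
mkdir -p "$dest"
(
  cd "$dest"
  # cat stands in for curl; -f - reads the archive from stdin.
  cat "$work/sdk.tar.gz" | tar -"${type}"x -f - --strip-components="${strip}"
)

# With strip=1 the leading sdk-1.0/ component is dropped on extraction.
extracted="$(cat "$dest/bin/tool")"
echo "$extracted"
```

With `strip` set to 1, as the SpacemiT and Vulkan wrappers do, the archive's top-level directory is removed and its contents land directly in the target path.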
15 changes: 15 additions & 0 deletions .github/actions/windows-setup-rocm/action.yml
@@ -0,0 +1,15 @@
name: "Windows - Setup ROCm"
description: "Setup ROCm for Windows"
inputs:
  version:
    description: "ROCm version"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup ROCm
      uses: ./.github/actions/install-exe
      with:
        url: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-${{ inputs.version }}-WinSvr2022-For-HIP.exe
        args: -install
89 changes: 89 additions & 0 deletions .github/workflows/build-cache.yml
@@ -0,0 +1,89 @@
name: Build Actions Cache

on:
  workflow_dispatch: # allows manual triggering
  schedule:
    - cron: '0 * * * *'

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true

jobs:
  ubuntu-24-vulkan-cache:
    runs-on: ubuntu-24.04

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Get latest Vulkan SDK version
        id: vulkan_sdk_version
        run: |
          echo "VULKAN_SDK_VERSION=$(curl https://vulkan.lunarg.com/sdk/latest/linux.txt)" >> "$GITHUB_ENV"

      - name: Setup Cache
        uses: actions/cache@v4
        id: cache-sdk
        with:
          path: ./vulkan_sdk
          key: vulkan-sdk-${{ env.VULKAN_SDK_VERSION }}-${{ runner.os }}

      - name: Setup Vulkan SDK
        if: steps.cache-sdk.outputs.cache-hit != 'true'
        uses: ./.github/actions/linux-setup-vulkan
        with:
          path: ./vulkan_sdk
          version: ${{ env.VULKAN_SDK_VERSION }}

  ubuntu-24-spacemit-cache:
    runs-on: ubuntu-24.04

    env:
      # Make sure this is in sync with build-linux-cross.yml
      SPACEMIT_IME_TOOLCHAIN_VERSION: "1.1.2"

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Setup Cache
        uses: actions/cache@v4
        id: cache-toolchain
        with:
          path: ./spacemit_toolchain
          key: spacemit-ime-toolchain-v${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}-${{ runner.os }}

      - name: Setup SpacemiT Toolchain
        if: steps.cache-toolchain.outputs.cache-hit != 'true'
        uses: ./.github/actions/linux-setup-spacemit
        with:
          path: ./spacemit_toolchain
          version: ${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}

  windows-2022-rocm-cache:
    runs-on: windows-2022

    env:
      # Make sure this is in sync with build.yml
      HIPSDK_INSTALLER_VERSION: "25.Q3"

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Setup Cache
        uses: actions/cache@v4
        id: cache-rocm
        with:
          path: C:\Program Files\AMD\ROCm
          key: rocm-${{ env.HIPSDK_INSTALLER_VERSION }}-${{ runner.os }}

      - name: Setup ROCm
        if: steps.cache-rocm.outputs.cache-hit != 'true'
        uses: ./.github/actions/windows-setup-rocm
        with:
          version: ${{ env.HIPSDK_INSTALLER_VERSION }}
26 changes: 12 additions & 14 deletions .github/workflows/build-linux-cross.yml
@@ -258,31 +258,29 @@ jobs:
     runs-on: ubuntu-24.04

     env:
+      # Make sure this is in sync with build-cache.yml
       SPACEMIT_IME_TOOLCHAIN_VERSION: "1.1.2"
-      SPACEMIT_IME_TOOLCHAIN_PATH: "spacemit-toolchain-linux-glibc-x86_64"

     steps:
       - uses: actions/checkout@v4

-      - name: Cache Toolchain
+      - name: Use SpacemiT Toolchain Cache
         uses: actions/cache@v4
-        id: cache-spacemit-ime-cross-toolchain
+        id: cache-toolchain
         with:
-          path: ./${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }}
-          key: ${{ runner.os }}-spacemit-ime-toolchain-v${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}
+          path: ./spacemit_toolchain
+          key: spacemit-ime-toolchain-v${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}-${{ runner.os }}

-      - name: Setup Toolchain
-        if: steps.cache-spacemit-ime-cross-toolchain.outputs.cache-hit != 'true'
-        run: |
-          wget --quiet --no-check-certificate https://archive.spacemit.com/toolchain/spacemit-toolchain-linux-glibc-x86_64-v${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}.tar.xz -O ${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }}.tar.xz
-          rm -rf ${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }}
-          mkdir -p ${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }}
-          tar xf ${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }}.tar.xz -C ${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }} --strip-components=1
-          rm -rf ${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }}.tar.xz
+      - name: Setup SpacemiT Toolchain
+        if: steps.cache-toolchain.outputs.cache-hit != 'true'
+        uses: ./.github/actions/linux-setup-spacemit
+        with:
+          path: ./spacemit_toolchain
+          version: ${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}

       - name: Build
         run: |
-          export RISCV_ROOT_PATH=${PWD}/${{ env.SPACEMIT_IME_TOOLCHAIN_PATH }}
+          export RISCV_ROOT_PATH=${PWD}/spacemit_toolchain
           cmake -B build -DLLAMA_CURL=OFF \
                          -DCMAKE_BUILD_TYPE=Release \
                          -DGGML_OPENMP=OFF \
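
For the cross build to restore the toolchain that `build-cache.yml` saved, both workflows have to compute the same cache key, which is presumably why the key layout and path in this hunk were changed to match. A small sketch of that comparison follows. `cache_key` and `old_cache_key` are hypothetical helpers written for illustration, and `Linux` is the assumed value of `runner.os` on `ubuntu-24.04`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper for the new key layout: <name>-v<version>-<os>.
cache_key() {
  local version="$1" os="$2"
  printf 'spacemit-ime-toolchain-v%s-%s' "$version" "$os"
}

# Hypothetical helper for the layout this hunk removes: <os>-<name>-v<version>.
old_cache_key() {
  local version="$1" os="$2"
  printf '%s-spacemit-ime-toolchain-v%s' "$os" "$version"
}

version="1.1.2"   # SPACEMIT_IME_TOOLCHAIN_VERSION in both workflows
os="Linux"        # assumed value of runner.os

producer_key="$(cache_key "$version" "$os")"     # build-cache.yml
consumer_key="$(cache_key "$version" "$os")"     # build-linux-cross.yml (new)
legacy_key="$(old_cache_key "$version" "$os")"   # build-linux-cross.yml (old)

if [ "$producer_key" = "$consumer_key" ]; then
  echo "match: $consumer_key"
fi
if [ "$producer_key" != "$legacy_key" ]; then
  echo "legacy key would miss: $legacy_key"
fi
```

The version string is duplicated across the two workflow files, so the "Make sure this is in sync" comments are the only guard against the keys drifting apart.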