
Conversation

@kpouget
Contributor

@kpouget kpouget commented Jan 9, 2026

This is a follow-up of #17072.

The API Remoting frontend/backend pair allows GGML calls to escape the VM isolation, with the help of virt-gpu paravirtualization (and the virglrenderer library on the host side).

  • ggml-remotingfrontend is a GGML API implementation that intercepts the GGML API calls and forwards them to the virt-gpu virtual device
  • ggml-remotingbackend is a library loaded by virglrenderer (a PR will be opened soon for discussion), which opens a GGML backend library and forwards the calls received from virglrenderer (see the illustrative sketch below)
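
To give a rough idea of the forwarding scheme, here is a minimal, self-contained sketch. All names (apir_example_*, the command values, the wire layout) are hypothetical illustrations, not the actual RPC protocol or identifiers of this PR:

/* Hypothetical sketch of the frontend/backend forwarding idea; the real code
 * uses its own generated RPC layer and the virt-gpu transport. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Made-up command identifiers, in the spirit of an APIR command type. */
typedef enum {
    APIR_EXAMPLE_CMD_GET_DEVICE_COUNT = 1,
    APIR_EXAMPLE_CMD_GRAPH_COMPUTE    = 2,
} apir_example_cmd;

/* Frontend (guest) side: pack a command into a buffer that would travel over
 * the virt-gpu channel to virglrenderer on the host. */
static size_t apir_example_encode(uint8_t *buf, apir_example_cmd cmd,
                                  const void *payload, uint32_t payload_size) {
    uint32_t cmd32 = (uint32_t) cmd;
    memcpy(buf, &cmd32, sizeof(cmd32));
    memcpy(buf + sizeof(cmd32), &payload_size, sizeof(payload_size));
    if (payload_size) {
        memcpy(buf + 2 * sizeof(uint32_t), payload, payload_size);
    }
    return 2 * sizeof(uint32_t) + payload_size;
}

/* Backend (host) side: decode the command and hand it to the GGML backend
 * library that virglrenderer loaded. */
static void apir_example_dispatch(const uint8_t *buf) {
    uint32_t cmd32 = 0, payload_size = 0;
    memcpy(&cmd32, buf, sizeof(cmd32));
    memcpy(&payload_size, buf + sizeof(cmd32), sizeof(payload_size));
    printf("dispatching command %u with %u payload bytes\n", cmd32, payload_size);
}

int main(void) {
    uint8_t buf[64];
    uint32_t dummy = 0;
    apir_example_encode(buf, APIR_EXAMPLE_CMD_GET_DEVICE_COUNT, &dummy, sizeof(dummy));
    apir_example_dispatch(buf);
    return 0;
}

In the actual PR, the serialization code for this RPC is generated (the *.gen.h and *.gen.c files mentioned below), and the transport is the virt-gpu device exposed by the VMM.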

Here is the context behind this PR:

[image]

See the Virglrenderer PR which enables the API Remoting trampoline required in Virglrenderer:
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590

  • this work focused on macOS, where in-VM/container inference performance is tied to the remoting stack

  • the code works on Linux, but I didn't thoroughly evaluate the performance there

  • Add support for the APIR capset containers/libkrun#508 --> the libkrun VMM patch that allows routing of the APIR capset to Virglrenderer

Disclaimer: I got help from Claude Code to finalize this PR, mostly through pre-submit reviews (no automated C code generation involved). Claude Code did generate the Python code generator (see the *.gen.h and *.gen.c files) used for the backend/frontend RPC (it was generated based on the C/H files I had manually written).

@kpouget kpouget requested a review from ggerganov as a code owner January 9, 2026 13:29
@kpouget kpouget changed the title from "ggml: new backend for Virglrenderer API Remoting" to "ggml: new backend for Virglrenderer API Remoting (v2)" Jan 9, 2026
@kpouget kpouget changed the title from "ggml: new backend for Virglrenderer API Remoting (v2)" to "ggml: new backend for Virglrenderer API Remoting acceleration (v2)" Jan 9, 2026
@github-actions github-actions bot added the build (Compilation issues), python (python script changes), and ggml (changes relating to the ggml tensor library for machine learning) labels Jan 9, 2026
@taronaeo taronaeo self-assigned this Jan 10, 2026
@taronaeo
Collaborator

I'll review this in a while. If we were to merge this, we will need a named maintainer for the backend for maintainability reasons. Will it be you? :)

Collaborator

@taronaeo taronaeo left a comment


  1. Spacing across the PR is very inconsistent. Please use 4-space indentation and keep it consistent.
  2. The vendor files within ggml-remotingfrontend/include - can they be discovered/downloaded separately from the codebase? See:
    - Avoid adding third-party dependencies, extra files, extra headers, etc.
  3. Inconsistent styling:
__attribute__((unused))
static inline const char *apir_command_name(ApirCommandType type)
{

vs.

static ggml_status ggml_backend_remoting_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {

Please follow CONTRIBUTING.md: https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md

@kpouget
Contributor Author

kpouget commented Jan 12, 2026

thanks for the review @taronaeo, I think I followed and fixed all the suggestions

If we were to merge this, we will need a named maintainer for the backend for maintainability reasons. Will it be you? :)

yes, would be me indeed :)

Collaborator

@taronaeo taronaeo left a comment


Looks a lot better now, thank you for cleaning up the code.

  1. I'm still wondering, are the 3rd party vendor files required to be part of GGML/Llama.cpp? (Can they be downloaded separately during development time via a script?)
  2. I'm not sure if I missed it, but I don't see the required GGML_BACKEND_DL_IMPL macro call in this PR. Did GGML register your backend correctly?
  3. #18718 (comment)

I'm also interested in testing this PR out on my MacBook. Do you have any guides/steps for me to follow to test it?

@kpouget
Contributor Author

kpouget commented Jan 13, 2026

I'm also interested in testing this PR out on my MacBook. Do you have any guides/steps for me to follow to test it?

sure :)

the blog post has the steps to reproduce it with pre-compiled binaries:
https://developers.redhat.com/articles/2025/09/18/reach-native-speed-macos-llamacpp-container-inference#try_api_remoting_with_ramalama

actually, you should be able to follow the INSTALL steps from my release page:
https://github.com/crc-org/llama.cpp/releases/tag/b7356-remoting-0.3.0

(I'll try to regenerate the binaries before the end of the week)

and this document has the steps to rebuild the different sources; you can request access

happy to discuss it on IBM-RH slack if you need help

@kpouget
Contributor Author

kpouget commented Jan 14, 2026

For information, I'll be at FOSDEM at the end of the month to present the work behind this PR:
https://fosdem.org/2026/schedule/event/C9NF8K-api_remoting_for_llama_cpp_near-native_gpu_speed_in_macos_containers/

@kpouget
Contributor Author

kpouget commented Jan 14, 2026

I'm not sure if I missed it, but I don't see the required GGML_BACKEND_DL_IMPL macro call in this PR. Did GGML register your backend correctly?

indeed, I'm not using it at the moment (and everything works fine), I'll review tomorrow how it should be used

@taronaeo
Collaborator

For information, I'll be at FOSDEM at the end of the month to present the work behind this PR: https://fosdem.org/2026/schedule/event/C9NF8K-api_remoting_for_llama_cpp_near-native_gpu_speed_in_macos_containers/

That's great and congratulations! I apologise for my slowness in reviewing this.

I've tested this, and it looks great. Performance is pretty good.

There are CI failures. For the LLAMA_CURL ones, could you rebase with master to fix them?

I'm not sure if I missed it, but I don't see the required GGML_BACKEND_DL_IMPL macro call in this PR. Did GGML register your backend correctly?

indeed, I'm not using it at the moment (and everything works fine), I'll review tomorrow how it should be used

Odd and interesting, but I can see it registered during the benchmark, so all is good :)

@taronaeo
Collaborator

Also, could you do/consider the following?

  1. Add yourself to the CODEOWNERS file so that GitHub/we can identify the maintainer to ping when issues arise.
  2. If possible (this can be done in a follow-up PR), add backend documentation, e.g., https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/zDNN.md
  3. nitpick: Could the backend name be a little more descriptive? I was thinking something like ggml-virgl.

@kpouget
Contributor Author

kpouget commented Jan 16, 2026

That's great and congratulations! I apologise for my slowness in reviewing this.

no rush, thanks for having a look, that's much appreciated 🙏🏻

I'm not sure if I missed it, but I don't see the required GGML_BACKEND_DL_IMPL macro call in this PR. Did GGML register your backend correctly?

indeed, I'm not using it at the moment (and everything works fine), I'll review tomorrow how it should be used

Odd and interesting, but I can see it registered during the benchmark, so all is good :)

I'm actually confused about the intended behavior of this. Or rather about the actual behavior, as I have a good idea of the intent.

if I add this:

GGML_BACKEND_DL_IMPL(ggml_backend_remoting_frontend_reg)

I still see that:

load_backend: failed to find ggml_backend_init in /home/kpouget/remoting-linux/llama_cpp/build.remoting-frontend/bin/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in /home/kpouget/remoting-linux/llama_cpp/build.remoting-frontend/bin/libggml-remotingfrontend.so
load_backend: failed to find ggml_backend_init in /home/kpouget/remoting-linux/llama_cpp/build.remoting-frontend/bin/libggml-cpu.so

and I'm confused about the way it's actually implemented (I didn't review it in depth), as I feel that this part already does the job, but in a non-generic way:

    ggml_backend_registry() {
        ...
#ifdef GGML_USE_REMOTINGFRONTEND
        register_backend(ggml_backend_remoting_frontend_reg());
#endif
        ...
    }

Also, could you do/consider the following?

yes sure.
I'll push the rebase soon, I need to CI-validate it first.

@taronaeo
Collaborator

I'm actually confused about the intended behavior of this. Or rather about the actual behavior, as I have a good idea of the intent.

IIRC,

    ggml_backend_registry() {
        ...
#ifdef GGML_USE_REMOTINGFRONTEND
        register_backend(ggml_backend_remoting_frontend_reg());
#endif
        ...
    }

Only registers the backend for static builds i.e., with -DGGML_NATIVE=ON -DGGML_BACKEND_DL=OFF. But when we try to build for dynamic loading i.e., with -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON, it would not be able to register the backend.

Please have a go with switching those 2 macros and see if your backend is registered for both cases. It's likely that because GGML_BACKEND_DL_IMPL is not part of your code, building for dynamic loading will fail to load your backend.
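
For reference, here is the gist as a simplified sketch (not the verbatim upstream macro definition, and the export attributes may differ): for dynamic loading, the backend's shared library has to export a ggml_backend_init() entry point, which is exactly the symbol the "failed to find ggml_backend_init" messages in your log are probing for.

// Simplified sketch of what GGML_BACKEND_DL_IMPL provides for dynamic loading;
// the exact upstream expansion may differ.
#include "ggml-backend.h"

// Registration entry point implemented by the remoting frontend.
extern "C" ggml_backend_reg_t ggml_backend_remoting_frontend_reg(void);

// load_backend() dlopen()s each backend .so and resolves this exact symbol
// name, hence the "failed to find ggml_backend_init" messages when a library
// does not export it.
extern "C" ggml_backend_reg_t ggml_backend_init(void) {
    return ggml_backend_remoting_frontend_reg();
}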

@kpouget
Contributor Author

kpouget commented Jan 20, 2026

Please have a go with switching those 2 macros and see if your backend is registered for both cases.

thanks for the suggestion, I went back to this part of the code (loading the ggml library) and managed to rework and simplify the way the remoting backend loads the GGML library implementation. From 3 config options (lib path, reg fct, init fct), I'm down to one (lib path)!

I'm also reworking the way API Remoting gets configured (the path to the libraries to load), so that the hypervisor is in charge of it via an API instead of environment variables. That will make things much cleaner. I'll push the update (and rebase) soon.

nitpick: Could the backend name be a little more descriptive? I was thinking something like ggml-virgl.

yes, I'm thinking about ggml-virtgpu-apir, and I'll try to see if I can have the backend (currently ggml-remotingbackend) stored in a subdirectory of the frontend.

EDIT: ggml-virtgpu-apir cannot work in the build system (because of the second -), so going for ggml-virtgpu

@kpouget
Contributor Author

kpouget commented Jan 26, 2026

@taronaeo, I was able to complete the rebase and finalize multiple aspects of the polishing, including this part:

nitpick: Could the backend name be a little more descriptive? I was thinking something like ggml-virgl.

I used ggml-virtgpu (ggml-virtgpu-apir isn't allowed by the build system, unfortunately, because of the second -) and I moved the backend to ggml-virtgpu/backend. Cleaner this way :)

The only pending thing I see is the documentation, I'll give that a kick during the week.

After quite some struggle to get Virglrenderer, the CI test harness, and llama.cpp to work together on Linux and macOS, I managed to get things properly aligned :)

This version of the PR (named b7755-remoting-0.4.4 in my repo), plus the virglrenderer releases (v1.2.0-remoting-0.3.5-macos / v1.2.0-remoting-0.3.5-linux), should be feature complete; I don't expect any significant change coming from my side anymore.

I was able to get the build and some tests running on Linux, but I'm afraid the Vulkan backend on my Intel CPU has some race conditions that prevent the testing from running end to end :/ I'll review that further to be sure it doesn't come from the API Remoting layer, but the Vulkan testing doesn't succeed any better.

@taronaeo
Collaborator

The only pending thing I see is the documentation, I'll give that a kick during the week.

Feel free to push the documentation in a separate PR :)

I was able to get the build and some tests running on Linux, but I'm afraid the Vulkan backend on my Intel CPU has some race conditions that prevent the testing from running end to end :/ I'll review that further to be sure it doesn't come from the API Remoting layer, but the Vulkan testing doesn't succeed any better.

I think it's fine if the feature is limited to macOS for now. You'll just need to specify in your documentation that there is currently this limitation and that it is being worked on.

As an aside, there are CI errors again haha. Can you fix them? I'll review the PR again in a while.

Copy link
Collaborator

@taronaeo taronaeo left a comment


The GGML-to-device implementation generally looks okay. Just one question about the backend initialization.

@taronaeo
Collaborator

CIs are still failing :(

Once those are fixed, let me know when this PR is ready for merge.

@kpouget
Contributor Author

kpouget commented Jan 26, 2026

Once those are fixed, let me know when this PR is ready for merge.

@taronaeo, the PR is ready from my POV:
  • your comments should all have been addressed
  • the documentation will come later this week with another PR
  • my CI tests passed

only thing is that I couldn't test against the latest master (b7837) because llama-cli wasn't answering correctly :/

./llama_cpp/build.remoting-backend/bin/llama-cli -ngl 99 -m /Users/kevinpouget/models/llama3.2 
> say nothing

{"name": "say", "parameters": {"x": "nothing"}}
> What's the GGML API?

{"name": "get_api_documentation", "parameters": {"x": "GGML API"}}

this ^^^ is the MacOS native run, so I guess something's broken elsewhere ...

seems to be this commit that broke it actually:

as with b7755 I get the expected answer (😛)

> What's the GGML API?

GGML (Geometry Game Markup Language) is a markup language used to describe 3D geometry in games. It's primarily used in the context of game development, particularly with the Unity game engine...

@taronaeo
Collaborator

only thing is that I couldn't test against the latest master (b7837) because llama-cli wasn't answering correctly :/

./llama_cpp/build.remoting-backend/bin/llama-cli -ngl 99 -m /Users/kevinpouget/models/llama3.2 
> say nothing

{"name": "say", "parameters": {"x": "nothing"}}
> What's the GGML API?

{"name": "get_api_documentation", "parameters": {"x": "GGML API"}}

this ^^^ is the MacOS native run, so I guess something's broken elsewhere ...

Interesting. llama-cli is a thin-client with llama-server running in the background. Were you able to get it working with llama-server and a simple HTTP request?

IMO we should try to aim for a working backend before merging with upstream.

@kpouget
Contributor Author

kpouget commented Jan 28, 2026

I've opened #19155; the issue is unrelated to my PR 😌

Below is the latest manual testing, rebased on top of b7849, and there is the automated build and perf test.

$ ramalama   run --image quay.io/crcont/remoting:v0.16.0-apir.0.1.4-rc4  ibm/granite:2b
🦭 > hello
Hello! It's a pleasure to meet you. How can I assist you today?
$ ramalama   run --image quay.io/crcont/remoting:v0.16.0-apir.0.1.4-rc4   smollm:135m
🦭 > hello
Hello! How can I help you?
$ ramalama   run --image quay.io/crcont/remoting:v0.16.0-apir.0.1.4-rc4  ollama://llama3.2
🦭 > hello
{"name": "print", "parameters": {"s": "hello"}}
$ ramalama   run --image quay.io/crcont/remoting:v0.16.0-apir.0.1.4-rc4  mistral:7b
🦭 > hello
 Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help!

interestingly, the perf testing is unaffected by the bug, although the output text is clearly wrong (and that's the vanilla ggml-metal output)

      "output_text": "{\"name\": \"decide\", \"parameters\": {\"value\": \"see and buy the bike\"}}",
      "output_tokens": 22,
[image]

@taronaeo
Collaborator

Great! Yeah usually it's good to test other models to see if the issue is related to a specific model. I guess this PR is good to merge, merging.

@taronaeo taronaeo merged commit b7feacf into ggml-org:master Jan 28, 2026
147 of 151 checks passed
@kpouget kpouget deleted the upstream branch January 28, 2026 09:52
@kpouget
Contributor Author

kpouget commented Jan 28, 2026

great, thanks again for your help with the review, it was really appreciated!
and great timing to have this merged before FOSDEM, that will make a good conclusion to the talk 😃
