Conversation
I'll review this in a while. If we were to merge this, we will need a named maintainer for the backend for maintainability reasons. Will it be you? :)
- Spacing across the PR is very inconsistent. Please follow 4 spaces and make it consistent.
- The vendor files within `ggml-remotingfrontend/include`: can they be discovered/downloaded separately from the codebase? (See line 65 in 9ac2693.) Avoid adding third-party dependencies, extra files, extra headers, etc.
- Inconsistent styling:
  ```c
  __attribute__((unused))
  static inline const char *apir_command_name(ApirCommandType type)
  {
  ```
  vs.
  ```c
  static ggml_status ggml_backend_remoting_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {
  ```
  Please follow CONTRIBUTING.md: https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md
thanks for the review @taronaeo, I think I followed and fixed all the suggestions.
yes, it would be me indeed :)
Looks a lot better now, thank you for cleaning the code.
- I'm still wondering: are the 3rd-party vendor files required to be part of GGML/llama.cpp? (Can they be downloaded separately during development time via a script?)
- I'm not sure if I missed it, but I don't see the required `GGML_BACKEND_DL_IMPL` macro call in this PR. Did GGML register your backend correctly? - #18718 (comment)
I'm also interested in testing this PR out on my MacBook. Do you have any guides/steps for me to follow to test it?
sure :) the blog post has the steps to reproduce it with pre-compiled binaries, so you should be able to follow those (I'll try to regenerate the binaries before the end of the week), and this document has the steps to rebuild the different sources; you can request access. Happy to discuss it on the IBM-RH Slack if you need help.
For information, I'll be at FOSDEM at the end of the month to present the work behind this PR.
indeed, I'm not using it at the moment (and everything works fine), I'll review tomorrow how it should be used
That's great, and congratulations! I apologise for my slowness in reviewing this. I've tested this, and it looks great. Performance is pretty good. There are CI failures, though.
Odd and interesting, but I can see it registered during the benchmark, so all is good :)
Also, could you do/consider the following?
no rush, thanks for having a look, that's much appreciated 🙏🏻
I'm actually confused about the intended behavior of this. Or rather the actual behavior, as I have a good clue about the intent. If I add this: I still see that: and I'm confused about the way it's actually implemented (I didn't review it in depth), as I feel that this part already does the job, but in a non-generic way:
yes, sure.
IIRC, that only registers the backend for static builds. Please have a go at switching those two macros and see if your backend is registered for both cases.
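For context, here is a hedged sketch of the dynamic-registration side, assuming the standard ggml convention: `GGML_BACKEND_DL_IMPL` (from `ggml-backend-impl.h`) emits the exported `ggml_backend_init()` entry point that the loader resolves when backends are built as loadable modules. The `ggml_backend_remoting_reg` name is an assumption based on this PR, and the vtable is elided:

```c
// Hedged sketch, not this PR's actual code.
#include "ggml-backend-impl.h"

ggml_backend_reg_t ggml_backend_remoting_reg(void) {
    static struct ggml_backend_reg reg = {
        /* .api_version = */ GGML_BACKEND_API_VERSION,
        /* .iface       = */ { 0 },  // backend-specific vtable, elided here
        /* .context     = */ NULL,
    };
    return &reg;
}

// For dynamic (GGML_BACKEND_DL) builds: exports ggml_backend_init(),
// the well-known symbol that the backend registry looks up after dlopen().
GGML_BACKEND_DL_IMPL(ggml_backend_remoting_reg)
```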
thanks for the suggestion, I went back to this part of the code (loading the GGML library) and I could rework and simplify the way the remoting backend loads the GGML library implementation. From 3 config options (lib path, reg fct, init fct), I'm down to one! (lib path) I'm also reworking the way the API Remoting gets configured (the path to the libraries to load), so that the hypervisor is in charge of it, via an API, instead of environment variables. That will make things much cleaner. I'll push the update (and rebase) soon.
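To illustrate why a single lib-path option can suffice, here is a minimal sketch (the `apir_load_ggml_backend` name is hypothetical) that relies on the library exporting the standard `ggml_backend_init()` symbol emitted by `GGML_BACKEND_DL_IMPL`:

```c
// Hedged sketch, not this PR's code: dlopen() the GGML backend library and
// resolve the well-known ggml_backend_init() entry point, so no separate
// "reg fct"/"init fct" configuration is needed.
#include <dlfcn.h>
#include <stdio.h>

typedef void * ggml_backend_reg_t;  // opaque stand-in for the real type
typedef ggml_backend_reg_t (*ggml_backend_init_fn)(void);

static ggml_backend_reg_t apir_load_ggml_backend(const char * lib_path) {
    void * handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib_path, dlerror());
        return NULL;
    }
    ggml_backend_init_fn init = (ggml_backend_init_fn) dlsym(handle, "ggml_backend_init");
    if (init == NULL) {
        fprintf(stderr, "dlsym(ggml_backend_init) failed: %s\n", dlerror());
        dlclose(handle);
        return NULL;
    }
    return init();  // returns the backend's registration object
}
```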
yes, I'm thinking about it.
@taronaeo, I could complete the rebase and finalize multiple aspects of the polishing, including this part:
The only pending thing I see is the documentation, I'll give that a kick during the week. After quite some struggle to align Virglrenderer, the CI test harness and llama.cpp to work on Linux and macOS, I managed to get things properly aligned :) For this version of the PR, I could get the build and some tests running on Linux, but I'm afraid the Vulkan backend on my Intel CPU has some race conditions that prevent the testing from running end to end :/ I'll review that further to be sure it doesn't come from the API Remoting layer, but the Vulkan testing doesn't succeed any better.
Feel free to push the documentation in a separate PR :)
I think it's fine if the feature is limited to macOS for now. You'll just need to specify in your documentation that there is currently this limitation and that it is being worked on. As an aside, there are CI errors again haha. Can you fix them? I'll review the PR again in a while.
The GGML-to-device implementation generally looks okay. Just one question about the backend initialization.
CIs are still failing :( Once those are fixed, let me know when this PR is ready for merge.
@taronaeo, the PR is ready from my POV, your comments should all have been addressed. The only thing is that I couldn't test against the latest master. This ^^^ is the macOS native run, so I guess something's broken elsewhere... it seems to be this commit that broke it, actually:
Interesting. IMO we should try to aim for a working backend before merging with upstream.
I've opened #19155, the issue is unrelated to my PR 😌 Below is the latest manual testing, rebased. Interestingly, the perf testing is unaffected by the bug, although the output text is clearly wrong (and that's the vanilla one).
Great! Yeah, usually it's good to test other models to see if the issue is related to a specific model. I guess this PR is good to merge, merging.
great, thanks again for your help in the review, it was really appreciated!

This is a follow-up of #17072.

The API Remoting backend/frontend allow escaping the VM isolation, with the help of the `virt-gpu` paravirtualization (and the `virglrenderer` library on the host side).
- `ggml-remotingfrontend` is a GGML API implementation which intercepts the GGML API calls and forwards them to the `virt-gpu` virtual device.
- `ggml-remotingbackend` is a library loaded by `virglrenderer` (PR will be opened soon for discussion), which opens a GGML library and forwards the calls received from `virglrenderer`.

Here is the context behind this PR:
- See the Virglrenderer PR which enables the API Remoting trampoline required in Virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590
- This work focused on macOS, where the in-VM/container inference performance is tied to the remoting stack.
- The code works on Linux; I didn't evaluate the performance thoroughly.
- Add support for the APIR capset containers/libkrun#508 --> libkrun VMM patch that allows the routing of the APIR capset to Virglrenderer.
Disclaimer: I got help from Claude Code to finalize this PR, mostly through pre-submit reviews (no automated C code generation involved). Claude Code did generate the Python code generator (see the `*.gen.h` and `*.gen.c` files) used for the backend/frontend RPC; it was generated based on the C/H files I had manually written.
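To make the frontend/backend split above concrete, here is a minimal, self-contained sketch of the command marshal/dispatch pattern such an RPC layer follows. The command names and wire encoding are invented for illustration; the real protocol lives in the generated `*.gen.h`/`*.gen.c` files:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical command IDs standing in for the generated ApirCommandType
 * values; the real list comes from the Python code generator. */
typedef enum {
    APIR_CMD_DEVICE_GET_COUNT = 0,
    APIR_CMD_BUFFER_ALLOC     = 1,
    APIR_CMD_GRAPH_COMPUTE    = 2,
} apir_cmd_t;

/* Frontend side: flatten a command id plus payload into the buffer that
 * would cross the guest/host boundary through the virt-gpu device. */
static size_t encode_cmd(uint8_t * buf, apir_cmd_t cmd, const void * payload, size_t len) {
    uint32_t cmd32 = (uint32_t) cmd;
    memcpy(buf, &cmd32, sizeof(cmd32));
    memcpy(buf + sizeof(cmd32), payload, len);
    return sizeof(cmd32) + len;
}

/* Backend side: decode the command id and dispatch to the host GGML
 * library (stubbed with printf here). */
static void dispatch_cmd(const uint8_t * buf, size_t len) {
    uint32_t cmd32;
    memcpy(&cmd32, buf, sizeof(cmd32));
    switch ((apir_cmd_t) cmd32) {
        case APIR_CMD_GRAPH_COMPUTE:
            /* the real backend would call into the loaded GGML library here */
            printf("dispatch: graph_compute, %zu payload bytes\n", len - sizeof(cmd32));
            break;
        default:
            printf("dispatch: unhandled command %u\n", cmd32);
            break;
    }
}

int main(void) {
    uint8_t wire[64];
    uint64_t graph_handle = 42;  /* stand-in for serialized cgraph state */
    size_t n = encode_cmd(wire, APIR_CMD_GRAPH_COMPUTE, &graph_handle, sizeof(graph_handle));
    dispatch_cmd(wire, n);       /* in reality this crosses the VM boundary */
    return 0;
}
```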