Examples/kubernetes dev with model downloading functionality #7
shenron0101 wants to merge 2 commits into phymbert:example/kubernetes
Conversation
phymbert left a comment:
Thanks for the effort, this is a good start. We need to bring it to the original repo. Let's merge it, then we can discuss there.
    livenessProbe:
      httpGet:
        path: /
You mean you want to remove this?
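(If the probe is kept: the llama.cpp server exposes a GET /health endpoint, which would be a more meaningful target than /. A minimal sketch; the port and delay below are assumptions, not values from this chart.)

```yaml
livenessProbe:
  httpGet:
    path: /health   # served by llama.cpp server; returns 503 until the model is loaded
    port: 8080      # assumed server port
  initialDelaySeconds: 30  # give the model time to load before probing
```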
    name: modelRunner
    description: A Helm chart for Kubernetes

    # A chart can be either an 'application' or a 'library' chart.
    ---

    {{- end}} (no newline at end of file)
Mind that each file must end with an empty line
    - -c
    - |
      set -e
      if curl -L {{ $modelConfig.url }} --output /models/{{ $modelName }}/{{ $modelName }}.gguf; then
It will not support sharded model files. Better to let the llama.cpp server handle the initial download.
OK, but then we won't be able to have a job running it, which will prevent us from updating it using kubectl apply. Also, I don't believe the llama.cpp server supports auto-download? I know Ollama does. When the llama.cpp server container tries to start, it needs a model file to point to or else it errors out.
No, I developed that feature some time ago; see the doc.
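(For context: the server can download a model at startup via its --model-url flag, which would make the separate download job unnecessary. A minimal sketch of how the container spec could use it; the image tag, model URL, and paths below are illustrative assumptions, not part of this PR.)

```yaml
containers:
  - name: llama-cpp-server
    image: ghcr.io/ggerganov/llama.cpp:server  # assumed image; use whichever the chart targets
    args:
      - --model-url
      - https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf  # hypothetical URL
      - --model
      - /models/phi-2.gguf  # where the server stores/loads the downloaded file
      - --host
      - "0.0.0.0"
      - --port
      - "8080"
```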
Maybe it would be easier if I push the base branch to the original repo?
Yes, ideally we merge here first, and once finalized we can push.
@phymbert @OmegAshEnr01n Awesome work you've done here. Small question: when this chart is deployed, are the models' APIs compatible with the OpenAI API, like the way Together AI works, where I just change OPENAI_API_KEY and OPENAI_BASE_URL (https://api.together.xyz/v1)?
Hi @ceddybi, please check the server API docs from llama.cpp.
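(Short answer for context: the llama.cpp server exposes OpenAI-compatible /v1 endpoints such as /v1/chat/completions, so a client can in principle be pointed at the in-cluster Service. A sketch of the client-side environment; the Service name and port are assumptions, not values from this chart.)

```yaml
env:
  - name: OPENAI_BASE_URL
    value: http://modelrunner.default.svc.cluster.local:8080/v1  # assumed Service name/port
  - name: OPENAI_API_KEY
    value: sk-no-key-required  # the server only checks a key when started with --api-key
```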
Is it necessary to limit this to MIG here? llama.cpp supports pre-Ampere GPUs, so it would be nice to use more standard multi-GPU container techniques.
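(The standard technique referred to here is presumably requesting whole GPUs through the NVIDIA device plugin, which works on pre-Ampere cards that lack MIG. A minimal sketch; the GPU count is illustrative.)

```yaml
resources:
  limits:
    nvidia.com/gpu: 2  # whole-GPU request via the NVIDIA device plugin; no MIG required
```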
Hi,
I have built the Helm chart according to the template you provided earlier. I think this can still be improved in some ways; any comments are welcome.
Feature set for the Helm chart
Pending testing
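(For readers of the chart: the template iterates per-model settings as {{ $modelName }} / {{ $modelConfig.url }}, so the corresponding values.yaml would look roughly like the sketch below; the model name and URL are illustrative assumptions.)

```yaml
models:
  phi-2:  # each key becomes {{ $modelName }}
    url: https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf  # read as {{ $modelConfig.url }}
```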