Description
I have an odd problem. I'm running the Docker Orpheus FastAPI. The host system has a 16GB RTX 4060 Ti. It is detected as a high-performance GPU, but it doesn't seem to be used. I can see system memory usage go up as the voice model is loaded, and when it receives text for conversion I see CPU load increase. Once (and only once) I saw GPU RAM usage increase as the model loaded, but there was still no GPU usage; the CPU load still increased when receiving text, and the GPU sat near idle. The fastest I can get it to go is 0.84x real time, and that's by forcing the CPU to run at 4.6 GHz constantly.
I've tried switching the torch CUDA version between 12.4, 12.6, and 12.8; all give the same results. The host operating system is Windows 10.
Does anyone have any idea why this could be happening?
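In case it helps anyone debugging the same thing: a quick sanity check is whether the container can see the GPU at all, and whether the installed torch wheel was actually built with CUDA. This is a sketch assuming Docker Desktop with the WSL2 backend and NVIDIA GPU passthrough; `<container-name>` is a placeholder for the running Orpheus FastAPI container.

```shell
# 1) Verify GPU passthrough into containers works at all
#    (requires --gpus all / the NVIDIA container runtime):
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# 2) Inside the Orpheus container, check the torch build.
#    A "+cpu" suffix on the version, CUDA "None", or "False" here means
#    torch is a CPU-only wheel or the container can't reach the device:
docker exec -it <container-name> python -c \
  "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```

If step 1 fails, the problem is at the Docker/WSL2 GPU layer rather than in Orpheus itself; if step 1 works but step 2 prints `False`, the container's torch install is the likely culprit regardless of which CUDA version you pick.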
Update:
So, trying this on a fresh day without touching the system at all, I've changed the model in the env file to use the Q8 model. It now loads onto the GPU, using roughly 5.4GB of VRAM. GPU usage while converting text jumps to 13% utilization. I'm getting a whole 0.54x real time now, apparently running from the GPU.