Description
I have an odd problem. I'm running the Docker Orpheus FastAPI. The host system has a 16GB RTX 4060 Ti. It is detected as a high-performance GPU, but it doesn't seem to be used. I can see system memory usage go up as the voice model is loaded, and when it receives text for conversion I see CPU load increase. Once (and only once) I saw GPU RAM usage increase as the model loaded, but there was still no GPU usage; the CPU load still increased when receiving text, and the GPU sat near idle. The fastest I can get it to go is 0.84x real time, and that's by forcing the CPU to run at 4.6 GHz constantly.
I've tried switching the torch CUDA version between 12.4, 12.6, and 12.8; all give the same results. The host operating system is Windows 10.
Does anyone have any idea why this could be happening?
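In case it helps anyone debugging the same thing: a quick sanity check is whether the container can see the GPU at all, and whether the installed torch wheel was actually built with CUDA. This is a sketch assuming Docker Desktop with the WSL2 backend and NVIDIA GPU passthrough; `<container-name>` is a placeholder for the running Orpheus FastAPI container.

```shell
# 1) Verify GPU passthrough into containers works at all
#    (requires --gpus all / the NVIDIA container runtime):
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

# 2) Inside the Orpheus container, check the torch build.
#    A "+cpu" suffix on the version, CUDA "None", or "False" here means
#    torch is a CPU-only wheel or the container can't reach the device:
docker exec -it <container-name> python -c \
  "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```

If step 1 fails, the problem is at the Docker/WSL2 GPU layer rather than in Orpheus itself; if step 1 works but step 2 prints `False`, the container's torch install is the likely culprit regardless of which CUDA version you pick.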
Update:
So, trying this on a fresh day without touching the system at all, I've changed the model in the env file to use the Q8 model. It now loads onto the GPU, using roughly 5.4GB of VRAM. GPU usage while converting text jumps to 13% utilization. I'm getting a whole 0.54x real time now, apparently running from the GPU.