Open
Labels
enhancement: New feature or request
Description
#APN currently sends the correct VRAM and clock frequency to PrimeNet, but it still reports the CPU's core and thread counts and cache sizes. This doesn't really affect anything, but it would perhaps be more consistent if a given GPU model always reported the same values (i.e. nothing derived from the host CPU).
- The "core" count can be the streaming multiprocessor/CU/whatever count of the GPU. On OpenCL this is CL_DEVICE_MAX_COMPUTE_UNITS. My RTX 2060 has 30.
- There are several threads in a GPU multiprocessor (for example, my RTX 2060/Turing has 4 dispatchers per SM), but that isn't important to most people and isn't easy to get from any API (you might instead need a database of models). Probably just leave it at 1.
- For cache, you can get the global memory cache size from OpenCL and an analogous value for L2 from nvidia-smi (both are more like a CPU's L3). What happens inside each CU is harder to know, again assuming you don't want to maintain a big database of models; by comparison, the amount of local memory per CU matters more. Some small placeholder number might be good enough there.
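
The mapping suggested in the list above could be sketched roughly as follows. This is only an illustration, not how APN actually builds its report: the function name, the dictionary keys, and the 16 KiB placeholder are all hypothetical. It assumes a device object that exposes `max_compute_units` and `global_mem_cache_size` (the lowercase forms of `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_GLOBAL_MEM_CACHE_SIZE`, as PyOpenCL device objects do), and uses a stand-in object so it runs without a GPU.

```python
from types import SimpleNamespace


def gpu_report_fields(device, per_cu_cache_kib=16):
    """Map GPU properties onto CPU-style report fields.

    The keys below are hypothetical names, not actual PrimeNet parameters.
    `per_cu_cache_kib` is an arbitrary small placeholder, since per-CU
    cache sizes are hard to query without a database of models.
    """
    return {
        # Compute units (SMs/CUs) stand in for the CPU core count.
        "cores": device.max_compute_units,
        # Dispatchers per SM are not exposed by the API, so report 1.
        "threads_per_core": 1,
        # Placeholder for the per-CU cache level.
        "per_cu_cache_kib": per_cu_cache_kib,
        # Global memory cache, reported in bytes by OpenCL, converted to KiB.
        "global_cache_kib": device.global_mem_cache_size // 1024,
    }


# Stand-in device: 30 compute units as on the RTX 2060 mentioned above.
# The cache size here is a made-up value for illustration only.
example_device = SimpleNamespace(
    max_compute_units=30,
    global_mem_cache_size=512 * 1024,
)

fields = gpu_report_fields(example_device)
```

With a real OpenCL device in place of `example_device`, the same function would pick up the values from the driver, so every card of the same model would report identical numbers regardless of the host CPU.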