Bug Description
The metadata collector calls nvmlDeviceGetPlatformInfo, which is only available from R560 onwards, so clusters running R550 or older drivers cannot run the metadata collector.
nvmlDeviceGetPlatformInfo is only needed to get the chassis number, which applies to Blackwell and newer. Can we skip this API call based on the driver version or some other property?
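To make the "skip based on driver version" idea concrete, here is a minimal sketch in Python. It is not taken from the NVSentinel code base: the function names and the constant are hypothetical, and the 560 threshold is simply the branch named in this report. The version string is assumed to look like the one returned by nvmlSystemGetDriverVersion (for example "550.54.15").

```python
# Hypothetical sketch: gate the chassis lookup on the driver branch.
# The threshold comes from this report (R560) and is an assumption.
MIN_PLATFORM_INFO_BRANCH = 560


def driver_branch(version: str) -> int:
    """Return the major branch of a driver version string like '550.54.15'."""
    return int(version.strip().split(".")[0])


def should_query_platform_info(driver_version: str) -> bool:
    """Decide whether calling nvmlDeviceGetPlatformInfo is worth attempting.

    Unparseable version strings are treated as "too old" so the collector
    skips the chassis lookup instead of failing.
    """
    try:
        return driver_branch(driver_version) >= MIN_PLATFORM_INFO_BRANCH
    except ValueError:
        return False
```

With a gate like this, the collector could leave the chassis number empty on older drivers and still report the rest of the metadata. A version check alone may not be enough if the symbol is resolved when the binary loads, so the call itself would also need to be resolved lazily.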
Component
Core Service
Steps to Reproduce
- Install any version of NVSentinel on a cluster running the R550 driver
- Check the metadata collector logs
Environment
- NVSentinel version: any
- Kubernetes version: any
- Deployment method: helm
Logs/Output
symbol lookup error: undefined symbol: nvmlDeviceGetPlatformInfo
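The error above suggests that the symbol is missing from the installed NVML library. One possible alternative to a version check is to probe for the symbol at runtime and skip the call when it is absent. The sketch below illustrates that idea in Python with ctypes; it is an illustration only, the helper names are hypothetical, and the real collector would need the equivalent lookup in its own language and NVML bindings.

```python
import ctypes


def load_nvml():
    """Try to load the NVML shared library; return None if it is not installed."""
    try:
        return ctypes.CDLL("libnvidia-ml.so.1")
    except OSError:
        return None


def has_symbol(lib, name: str) -> bool:
    """Return True if `lib` exposes `name`.

    For a ctypes.CDLL, attribute access performs the symbol lookup and
    raises AttributeError when the symbol is undefined.
    """
    try:
        getattr(lib, name)
    except AttributeError:
        return False
    return True


def platform_info_supported() -> bool:
    """Hypothetical helper: True only if NVML is present and exports the call."""
    lib = load_nvml()
    return lib is not None and has_symbol(lib, "nvmlDeviceGetPlatformInfo")
```

A caller would check platform_info_supported() once at startup and omit the chassis number when it returns False, rather than crashing on the missing symbol.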