Custom libvirt hook scripts for specific system management
libvirt hooks overview
https://libvirt.org/hooks.html
The qemu hook is executed when a QEMU guest is started, stopped, or migrated.
A custom libvirt hook script for automatic configuration of NVIDIA Fabric Manager Shared NVSwitch GPU Partitions during startup and shutdown of a QEMU guest virtual machine.
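For orientation, libvirt invokes the qemu hook as /etc/libvirt/hooks/qemu <guest_name> <operation> <sub-operation> <extra> and passes the domain XML on standard input. The shell sketch below only illustrates that calling convention. It is not the NVIDIA hook script (which is written in Python), and the "activate-partition" and "deactivate-partition" messages are hypothetical placeholders for where partition handling could happen.

```shell
#!/bin/sh
# Minimal sketch of the libvirt qemu hook calling convention.
# libvirt runs: /etc/libvirt/hooks/qemu <guest_name> <operation> <sub-operation> <extra>
# and writes the domain XML to stdin (not read in this sketch).
# This is NOT the NVIDIA hook script; the echoed actions are hypothetical.
qemu_hook() {
    guest="$1"
    operation="$2"
    sub_op="$3"

    case "$operation" in
        prepare)
            # "prepare begin": called before libvirt performs any resource labeling.
            echo "activate-partition $guest ($sub_op)"
            ;;
        release)
            # "release end": called after libvirt has released all resources.
            echo "deactivate-partition $guest ($sub_op)"
            ;;
        *)
            # Other operations (start, started, stopped, migrate, ...) are ignored here.
            ;;
    esac
    return 0
}

qemu_hook "$@"
```

A non-zero exit from the prepare, start, or started calls prevents the guest from starting, so a real hook should only fail when it intends to block the launch.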
The QEMU libvirt hook script for the automatic configuration of NVIDIA Fabric Manager Shared NVSwitch GPU Partitions is supported only on NVIDIA HGX H100 and later systems with the Autonomous Link Initialization (ALI) hardware feature. On NVIDIA HGX-2 and NVIDIA HGX A100 systems, which lack ALI, GPU reset should be skipped during guest VM start once the Shared NVSwitch GPU partition is activated: if the GPUs receive a PCIe reset as part of the guest VM launch, the GPU NVLinks will be in an Inactive state in the guest VM. Starting the guest VM without a GPU reset might require modifying the hypervisor's VM launch sequence. For details, refer to the NVIDIA Fabric Manager User Guide.
The custom libvirt hook script is a Python script.
Install Python on the system.
The custom libvirt hook script depends on Fabric Manager Partition Manager.
Install its dependencies:
Install the NVIDIA Fabric Manager Development package.
On Ubuntu, this package is named "nvidia-fabricmanager-dev-<version>".
On RHEL, this package is named "nvidia-fabricmanager-devel-<version>".
Install the JsonCpp development package.
On Ubuntu, this package is named "libjsoncpp-dev".
On RHEL, this package is named "jsoncpp-devel".
The EPEL repository must be set up on your system to access this package.
Obtain the Fabric Manager Partition Manager (fmpm) source and build it.
Deploy the resulting fmpm binary to /usr/bin/.
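The prerequisite steps above can be scripted. The sketch below maps a distribution ID (the ID field of /etc/os-release, assumed to be "ubuntu" or "rhel") to the development package names listed above, and reports which required commands are missing from PATH. The helper names dev_packages and missing_cmds are hypothetical, the <version> suffix is supplied by you rather than detected, and it assumes the Python interpreter is installed as python3.

```shell
# Sketch: helpers for the prerequisite steps (hypothetical helper names).
# dev_packages prints the development package names listed above for a
# distribution ID as found in /etc/os-release ("ubuntu" or "rhel").
# The version argument is the <version> suffix you choose; it is not detected here.
dev_packages() {
    distro="$1"
    version="$2"
    case "$distro" in
        ubuntu) echo "nvidia-fabricmanager-dev-$version libjsoncpp-dev" ;;
        rhel)   echo "nvidia-fabricmanager-devel-$version jsoncpp-devel" ;;
        *)      echo "unsupported distribution: $distro" >&2; return 1 ;;
    esac
}

# missing_cmds prints every named command that cannot be found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example (run as root; <version> is your chosen suffix):
#   . /etc/os-release
#   apt-get install $(dev_packages "$ID" <version>)   # Ubuntu
#   dnf install $(dev_packages "$ID" <version>)       # RHEL, with EPEL enabled
#   missing_cmds python3 fmpm
```

An empty result from missing_cmds python3 fmpm means both the interpreter and the deployed fmpm binary are reachable.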
Install the NVIDIA Fabric Manager.
The package is named "nvidia-fabricmanager-<version>"
The version must match the version of the NVIDIA GPU driver installed on the system.
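One way to check the version requirement is to compare the driver version reported by nvidia-smi with the installed package version. The sketch below assumes the package version is either exactly the driver version or the driver version followed by "-<release>" (for example 550.54.15-1); versions with a distribution epoch or a different scheme are not handled. The function name fm_matches_driver is hypothetical, and the package name in the example comments keeps the <version> placeholder from the step above.

```shell
# Sketch: succeed if a Fabric Manager package version matches the GPU driver version.
# ASSUMPTION: the package version is either exactly the driver version or the
# driver version followed by "-<release>" (e.g. 550.54.15-1).
fm_matches_driver() {
    driver_ver="$1"
    fm_pkg_ver="$2"
    case "$fm_pkg_ver" in
        "$driver_ver" | "$driver_ver"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example inputs (adjust the package name to your installation):
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   fm=$(dpkg-query -W -f='${Version}' nvidia-fabricmanager-<version>)        # Ubuntu
#   fm=$(rpm -q --queryformat '%{VERSION}' nvidia-fabricmanager-<version>)    # RHEL
#   fm_matches_driver "$driver" "$fm" || echo "version mismatch: $driver vs $fm"
```

The match requires a "-" immediately after the driver version, so 550.54.150-1 is not mistaken for 550.54.15.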
Configure Fabric Manager for Shared NVSwitch mode.
Set FABRIC_MODE=1 in /usr/share/nvidia/nvswitch/fabricmanager.cfg.
Restart the Fabric Manager service.
sudo sed -i 's/^FABRIC_MODE=.*/FABRIC_MODE=1/' /usr/share/nvidia/nvswitch/fabricmanager.cfg
sudo systemctl restart nvidia-fabricmanager.service
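Because a silent sed mismatch leaves Fabric Manager in its previous mode, it can help to verify the edit before restarting the service. The sketch below (hypothetical function name set_shared_nvswitch_mode) anchors the match at the start of the line, replaces the entire value, and then confirms that the exact line FABRIC_MODE=1 is present. It can be tried on a scratch copy first.

```shell
# Sketch: set FABRIC_MODE=1 in a fabricmanager.cfg-style file and verify it.
# Pass a scratch copy to try it out, or the real file (as root):
#   set_shared_nvswitch_mode /usr/share/nvidia/nvswitch/fabricmanager.cfg
set_shared_nvswitch_mode() {
    cfg="$1"
    # Anchor at line start and replace the whole value, not just one character.
    sed -i 's/^FABRIC_MODE=.*/FABRIC_MODE=1/' "$cfg" || return 1
    # Succeed only if the file now contains the exact line FABRIC_MODE=1.
    grep -qx 'FABRIC_MODE=1' "$cfg"
}
```

If the function fails because no FABRIC_MODE line exists, inspect the file manually rather than appending a line blindly. Restart nvidia-fabricmanager.service only after it succeeds.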
Install the NVIDIA Fabric Manager Development package for the Fabric Manager SDK.
On RHEL, this package is named "nvidia-fabricmanager-devel-<version>"
On Ubuntu, this package is named "nvidia-fabricmanager-dev-<version>"
Deploy the libvirt hook script for QEMU at /etc/libvirt/hooks/qemu, then restart libvirtd so that it picks up the new hook:
sudo mkdir -p /etc/libvirt/hooks
sudo wget 'https://raw.githubusercontent.com/NVIDIA/libvirt-hooks/refs/heads/main/qemu' -O /etc/libvirt/hooks/qemu
sudo chmod +x /etc/libvirt/hooks/qemu
sudo systemctl restart libvirtd
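A quick sanity check before restarting libvirtd can catch a failed download or a missing executable bit. The sketch below (hypothetical function name check_hook) only verifies that the file exists, is executable, and starts with a "#!" interpreter line. It does not validate the script's contents.

```shell
# Sketch: basic sanity checks on a deployed hook file before restarting libvirtd.
# Checks existence, the executable bit, and a "#!" interpreter line only.
check_hook() {
    hook="$1"
    if [ ! -f "$hook" ]; then echo "missing: $hook" >&2; return 1; fi
    if [ ! -x "$hook" ]; then echo "not executable: $hook" >&2; return 1; fi
    if ! head -n 1 "$hook" | grep -q '^#!'; then
        echo "no interpreter line: $hook" >&2
        return 1
    fi
}

# Example: check_hook /etc/libvirt/hooks/qemu && sudo systemctl restart libvirtd
```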
If the host manages the NVSwitches, Fabric Manager runs on the host and is configured
to listen on the default interface, 127.0.0.1. No change to the libvirt hook script
is needed.
If a dedicated Service VM manages the NVSwitches, with all NVSwitches passed through
to the Service VM, Fabric Manager runs on the Service VM and is configured
to listen on the Service VM's network interface xxx.yyy.zzz.www.
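To find the address and port that Fabric Manager is actually configured with, the values can be read from fabricmanager.cfg inside the Service VM. The sketch below reads a KEY=VALUE setting, ignoring commented lines. The key names FM_CMD_BIND_INTERFACE and FM_CMD_PORT_NUMBER are an assumption based on the stock configuration file; confirm them against the fabricmanager.cfg installed on your Service VM. The function name cfg_value is hypothetical.

```shell
# Sketch: read a KEY=VALUE setting from a fabricmanager.cfg-style file.
# Prints the value of the last line that starts with "KEY=" (commented lines,
# which start with "#", are skipped); prints nothing if the key is absent.
cfg_value() {
    cfg="$1"
    key="$2"
    sed -n "s/^$key=//p" "$cfg" | tail -n 1
}

# Example on the Service VM. ASSUMPTION: the listen address and port are set by
# FM_CMD_BIND_INTERFACE and FM_CMD_PORT_NUMBER; confirm in your fabricmanager.cfg.
#   cfg=/usr/share/nvidia/nvswitch/fabricmanager.cfg
#   cfg_value "$cfg" FM_CMD_BIND_INTERFACE
#   cfg_value "$cfg" FM_CMD_PORT_NUMBER
```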
Update the libvirt hook script with the IP address and port number that Fabric Manager is configured to listen on:
SERVICE_VM_IP="xxx.yyy.zzz.www"
PORT_NUM="xxx"
sudo sed -i "s/\(FM_IP = \)\"[^\"]*\"/\1\"$SERVICE_VM_IP:$PORT_NUM\"/" /etc/libvirt/hooks/qemu
If Fabric Manager in the Service VM is configured to listen on the default port number,
omit the port and set only the IP address in the libvirt hook script:
sudo sed -i "s/\(FM_IP = \)\"[^\"]*\"/\1\"$SERVICE_VM_IP\"/" /etc/libvirt/hooks/qemu
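The two sed variants above differ only in whether a port is appended, so they can be folded into one step that also confirms the substitution took effect. The sketch below (hypothetical function name set_fm_ip) assumes, as the sed commands above do, that the hook contains a line of the form FM_IP = "...". It can be tried on a copy of the hook before editing /etc/libvirt/hooks/qemu as root.

```shell
# Sketch: point the hook's FM_IP setting at the Service VM, with an optional port.
# Uses the same substitution as the sed commands above.
set_fm_ip() {
    hook="$1"
    ip="$2"
    port="$3"   # leave empty when Fabric Manager listens on its default port
    if [ -n "$port" ]; then
        value="$ip:$port"
    else
        value="$ip"
    fi
    sed -i "s/\(FM_IP = \)\"[^\"]*\"/\1\"$value\"/" "$hook" || return 1
    # Confirm the new value is present (fixed-string match, so dots are literal).
    grep -qF "FM_IP = \"$value\"" "$hook"
}

# Example (as root): set_fm_ip /etc/libvirt/hooks/qemu "$SERVICE_VM_IP" "$PORT_NUM"
```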
By downloading or using this software, I agree to the terms of the LICENSE