I am using Ubuntu 22.04 and testing the performance of an NVIDIA A100 GPU with the nvbandwidth tool. I observed that as the buffer size increases, the reported throughput decreases:
Test case: host_to_device_memcpy_sm (buffer size passed with -b, in MiB)
- 512 MiB (-b 512): 25.13 GB/s
- 1 GiB (-b 1024): 25.13 GB/s
- 10 GiB (-b 10240): 18.05 GB/s
- 20 GiB (-b 20480): 16.50 GB/s
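Using only the numbers above, the two larger buffers compare with the 512 MiB / 1 GiB result as follows:

$$
\frac{18.05\ \text{GB/s}}{25.13\ \text{GB/s}} \approx 0.718, \qquad
\frac{16.50\ \text{GB/s}}{25.13\ \text{GB/s}} \approx 0.657
$$

That is roughly 28% lower throughput at 10 GiB and roughly 34% lower at 20 GiB.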
Below is the output:
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 512
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 25.13
SUM host_to_device_memcpy_sm 25.13
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 1024
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 25.13
SUM host_to_device_memcpy_sm 25.13
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 10240
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 18.05
SUM host_to_device_memcpy_sm 18.05
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
$ ./nvbandwidth -t host_to_device_memcpy_sm -b 20480
nvbandwidth Version: v0.8
Built from Git version: v0.8
CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 560.35.05
Device 0: NVIDIA A100-SXM4-40GB (00000000:99:00)
Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 16.50
SUM host_to_device_memcpy_sm 16.50
NOTE: The reported results may not reflect the full capabilities of the platform.
Performance can vary with software drivers, hardware clocks, and system topology.
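To rerun the same four measurements in one go, something like the following bash sketch could be used. It assumes the `./nvbandwidth` binary sits in the current directory (as in the runs above) and uses only the `-t` and `-b` flags already shown; `extract_sum` and `sweep` are hypothetical helper names, not part of nvbandwidth.

```bash
#!/usr/bin/env bash
# Sketch: rerun host_to_device_memcpy_sm at each buffer size used above
# and print only the value from the "SUM" line of nvbandwidth's output.
# Assumption: ./nvbandwidth is in the current directory; -b is in MiB.
set -euo pipefail

# Hypothetical helper: read nvbandwidth output on stdin and print the
# bandwidth (GB/s) from the "SUM <testcase> <value>" line.
extract_sum() {
  awk '$1 == "SUM" { print $3 }'
}

# Hypothetical helper: one run per buffer size, same flags as the runs above.
sweep() {
  local size bw
  for size in 512 1024 10240 20480; do
    bw=$(./nvbandwidth -t host_to_device_memcpy_sm -b "$size" | extract_sum)
    printf '%6s MiB: %s GB/s\n' "$size" "$bw"
  done
}

# Only start the sweep if the binary is actually present.
if [[ -x ./nvbandwidth ]]; then
  sweep
else
  echo "./nvbandwidth not found in the current directory" >&2
fi
```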