- Ubuntu 22.04 or 24.04
- NVIDIA Driver
- Docker
- NVIDIA Container Toolkit
Follow this post for the installation instructions.
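Before building anything, it can help to verify that the NVIDIA driver, Docker, and the NVIDIA Container Toolkit work together by running nvidia-smi inside a throwaway CUDA container. This is only a sanity-check sketch; the CUDA base image tag below is an example, any recent tag works:

```sh
# Driver check on the host
nvidia-smi
# GPU passthrough check via the NVIDIA Container Toolkit
# (the base image tag is only an example)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```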
```sh
git clone https://github.com/j3soon/hpc-samples.git
cd hpc-samples
```

We use the nvidia/nvhpc NGC image as the base image. See the documentation for more details.
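To see exactly which nvidia/nvhpc NGC tag each image builds on, you can inspect the FROM lines of the Dockerfiles (paths follow the build commands below):

```sh
grep '^FROM' src/Dockerfile_*
```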
Build the docker images:

```sh
cd src
docker build -f Dockerfile_cuda13.0 -t j3soon/hpc-samples:nvhpc-25.9-devel-cuda13.0-ubuntu24.04 .
docker build -f Dockerfile_cuda12.9 -t j3soon/hpc-samples:nvhpc-25.7-devel-cuda12.9-ubuntu24.04 .
docker build -f Dockerfile_cuda12.4 -t j3soon/hpc-samples:nvhpc-24.5-devel-cuda12.4-ubuntu22.04 .
```

Then start a container with the image matching your CUDA version:

```sh
docker run --rm -it --gpus all -v $PWD:/app j3soon/hpc-samples:nvhpc-25.9-devel-cuda13.0-ubuntu24.04
docker run --rm -it --gpus all -v $PWD:/app j3soon/hpc-samples:nvhpc-25.7-devel-cuda12.9-ubuntu24.04
docker run --rm -it --gpus all -v $PWD:/app j3soon/hpc-samples:nvhpc-24.5-devel-cuda12.4-ubuntu22.04
```

To compile, run, and clean the built-in examples at /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples (replace 25.9 with the NVHPC version of your image, e.g., 25.7 or 24.5), you can use the following commands:

```sh
# C++ Standard Parallelism
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/stdpar/stdblas
make all
# OpenACC Examples
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/OpenACC/samples
make all
# OpenMP Examples
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/OpenMP
make all
# CUDA-Libraries Examples
# - cuBLAS
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/CUDA-Libraries/cuBLAS
make all
# - cuFFT
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/CUDA-Libraries/cuFFT
make all
# - cuRAND
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/CUDA-Libraries/cuRAND
make all
# - cuSPARSE
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/CUDA-Libraries/cuSPARSE
make all
# - thrust
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/CUDA-Libraries/thrust
make all
# MPI Examples
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/MPI
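# Open MPI refuses to run as root by default; the container runs as root,
# so explicitly allow it before launching the MPI examples.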
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
make all
# CUDA-Fortran Examples
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/CUDA-Fortran/CUDA-Fortran-Book
make all
# AutoPar Examples
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/AutoPar
make all
# F2003 Examples
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/F2003
make all
# NVLAmath Examples
cd /opt/nvidia/hpc_sdk/Linux_x86_64/25.9/examples/NVLAmath
make all
```

NVIDIA/cuda-samples has been pre-built and included in the docker image at /workspace/cuda-samples. For example, to run the deviceQuery example, you can run the following command:

```sh
/workspace/cuda-samples/build/Samples/1_Utilities/deviceQuery/deviceQuery
```

or the p2pBandwidthLatencyTest example to test GPU-to-GPU communication:

```sh
/workspace/cuda-samples/build/Samples/5_Domain_Specific/p2pBandwidthLatencyTest/p2pBandwidthLatencyTest
```

See the full list of examples here.
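Since the binaries are already built, a quick way to see everything that is available (assuming the build layout shown above) is:

```sh
find /workspace/cuda-samples/build/Samples -maxdepth 3 -type f -executable | sort
```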
If you are using a custom docker image, follow the official instructions:
```sh
git clone https://github.com/NVIDIA/cuda-samples
cd cuda-samples
git checkout v13.0  # Replace with the CUDA version matching your image
mkdir build && cd build
cmake ..
make -j$(nproc)
```

You might also need to set `CUDA_PATH` and `LIBRARY_PATH` according to your environment if the build fails.
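For example, assuming a standard toolkit layout under /usr/local/cuda (an assumption; adjust to your environment), the variables can be set before re-running the build:

```sh
# Illustrative paths only; point them at your actual CUDA installation.
export CUDA_PATH=/usr/local/cuda
export LIBRARY_PATH=$CUDA_PATH/lib64:$LIBRARY_PATH
cmake .. && make -j$(nproc)
```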
NVIDIA/nccl-tests has been pre-built and included in the docker image at /workspace/nccl-tests. For example, to run the all_reduce_perf test, you can run the following command:
```sh
cd /workspace/nccl-tests
# single node 8 GPUs
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8
# two nodes, 16 GPUs (8 ranks per node, one GPU per rank)
mpirun -np 16 -N 8 ./build/all_reduce_perf_mpi -b 8 -e 8G -f 2 -g 1
```

or with Slurm:

```sh
# Enroot+Pyxis
srun -N 2 --ntasks-per-node=8 --mpi=pmix \
--container-image=j3soon/hpc-samples:nvhpc-24.5-devel-cuda12.4-ubuntu22.04 \
/usr/local/bin/hpcx-entrypoint.sh \
/workspace/nccl-tests/build/all_reduce_perf_mpi -b 8 -e 8G -f 2 -g 1
# Apptainer/Singularity (To be confirmed)
singularity pull docker://j3soon/hpc-samples:nvhpc-24.5-devel-cuda12.4-ubuntu22.04
singularity build --sandbox hpc-samples-cuda12/ hpc-samples_nvhpc-24.5-devel-cuda12.4-ubuntu22.04.sif
srun -N 2 --ntasks-per-node 8 --mpi=pmix --gres=gpu:8 \
singularity exec --nv hpc-samples-cuda12/ \
/usr/local/bin/hpcx-entrypoint.sh \
/workspace/nccl-tests/build/all_reduce_perf_mpi -b 8 -e 8G -f 2 -g 1
```

or with debug flags:

```sh
cd /workspace/nccl-tests
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=ALL ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8
```

If you are using a custom docker image, follow the official instructions:
```sh
git clone https://github.com/NVIDIA/nccl-tests
cd nccl-tests
git checkout v2.17.6  # Replace with the NCCL version matching your image
make -j$(nproc)
make -j$(nproc) MPI=1 NAME_SUFFIX=_mpi
```

You might also need to set `CUDA_HOME`, `NCCL_HOME`, and `MPI_HOME` according to your environment if the build fails.
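For example, assuming typical install locations (illustrative only; adjust to where CUDA, NCCL, and MPI live in your image), these can be exported before re-running the build:

```sh
# Illustrative paths only; adjust to your environment.
export CUDA_HOME=/usr/local/cuda
export NCCL_HOME=/usr
export MPI_HOME=/usr/local/mpi
make -j$(nproc) MPI=1 NAME_SUFFIX=_mpi
```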
NVIDIA/nvbandwidth has been pre-built and included in the docker image at /workspace/nvbandwidth. For example, to run the nvbandwidth tool, you can run the following command:
```sh
cd /workspace/nvbandwidth
./nvbandwidth
```

or in verbose mode:

```sh
./nvbandwidth -v
```

or a single test case:

```sh
./nvbandwidth -t device_to_device_memcpy_read_ce
```

or the multi-node version:

```sh
cd /workspace/nvbandwidth_mpi
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
mpirun -n 4 ./nvbandwidth -p multinode
```

If you are using a custom docker image, follow the official instructions:
```sh
git clone https://github.com/NVIDIA/nvbandwidth
cd nvbandwidth
git checkout v0.8  # Replace with the NVBandwidth version matching your image
cp -r . ../nvbandwidth_mpi
apt-get update && apt-get install -y libboost-program-options-dev
cmake .
make -j$(nproc)
cd ../nvbandwidth_mpi
cmake -DMULTINODE=1 .
make -j$(nproc)
```
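Whichever build you use, you can list the available test cases before running them (the --list flag comes from the upstream nvbandwidth documentation; verify it against your version):

```sh
cd /workspace/nvbandwidth
./nvbandwidth --list
```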
NVIDIA/CUDALibrarySamples is not yet included.
NVIDIA/compute-sanitizer-samples is not yet included.
NVIDIA/multi-gpu-programming-models is not yet included.
Use the nvidia-smi tool to query GPU status.
Check the peer-to-peer (P2P) status between the local GPUs:
```sh
nvidia-smi topo -p2p n
```

Show the topology connections and affinities matrix between the GPUs and NICs in the system:
```sh
nvidia-smi topo -m
```

Use compute-sanitizer to detect CUDA errors:
```sh
compute-sanitizer ./a.out
```

See nsight-guided-profiling.md for more details.
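For a concrete example of what compute-sanitizer reports, you can compile a deliberately buggy kernel and run it under the sanitizer. This is a minimal sketch; the file name and code are illustrative only:

```sh
cat > oob.cu <<'EOF'
#include <cstdio>

// Kernel with a missing bounds check: threads with i >= n write out of bounds.
__global__ void oob(int *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  data[i] = i;
}

int main() {
  const int n = 100;
  int *d_data;
  cudaMalloc(&d_data, n * sizeof(int));
  oob<<<1, 128>>>(d_data, n);  // 128 threads, but only 100 elements allocated
  cudaDeviceSynchronize();
  cudaFree(d_data);
  std::printf("done\n");
  return 0;
}
EOF
nvcc -o oob oob.cu
compute-sanitizer ./oob  # memcheck (the default tool) reports the invalid writes
```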