This project essentially aims to replicate the functionality of the TenPy library, but uses CUDA to accelerate the simulation of Matrix Product States (MPS).
A quantum state of
The main computational task involves decomposing a tensor of shape
Since exact SVD routines do not parallelize well on GPUs (ref), I opted for the existing one-sided Jacobi-based implementation cusolverDn<t>gesvdj from cuSOLVER.
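To illustrate why a one-sided Jacobi method suits parallel hardware, here is a minimal CPU-only sketch of the textbook Hestenes iteration. It is not the cuSOLVER implementation and not code from this project; `Matrix` and `jacobi_singular_values` are hypothetical names introduced for the example. Each rotation touches only two columns, so rotations on disjoint column pairs are independent of each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal column-major matrix (hypothetical helper type for this sketch).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[j * rows + i]; }
    double operator()(std::size_t i, std::size_t j) const { return data[j * rows + i]; }
};

// One-sided (Hestenes) Jacobi: rotate pairs of columns until all columns are
// mutually orthogonal. The column norms are then the singular values.
// Returns the singular values sorted in descending order.
std::vector<double> jacobi_singular_values(Matrix a, double tol = 1e-12,
                                           int max_sweeps = 50) {
    assert(a.rows >= a.cols);  // sketch only handles tall or square matrices
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        bool rotated = false;
        for (std::size_t p = 0; p + 1 < a.cols; ++p) {
            for (std::size_t q = p + 1; q < a.cols; ++q) {
                // Squared norms and inner product of columns p and q.
                double alpha = 0.0, beta = 0.0, gamma = 0.0;
                for (std::size_t i = 0; i < a.rows; ++i) {
                    alpha += a(i, p) * a(i, p);
                    beta += a(i, q) * a(i, q);
                    gamma += a(i, p) * a(i, q);
                }
                // Columns already orthogonal to within tolerance: skip.
                if (std::abs(gamma) <= tol * std::sqrt(alpha * beta)) continue;
                rotated = true;
                // Rotation angle that zeroes the inner product of the pair.
                const double zeta = (beta - alpha) / (2.0 * gamma);
                const double t = std::copysign(1.0, zeta) /
                                 (std::abs(zeta) + std::sqrt(1.0 + zeta * zeta));
                const double c = 1.0 / std::sqrt(1.0 + t * t);
                const double s = c * t;
                for (std::size_t i = 0; i < a.rows; ++i) {
                    const double x = a(i, p);
                    const double y = a(i, q);
                    a(i, p) = c * x - s * y;
                    a(i, q) = s * x + c * y;
                }
            }
        }
        if (!rotated) break;  // a full sweep without rotations means convergence
    }
    std::vector<double> sigma(a.cols);
    for (std::size_t j = 0; j < a.cols; ++j) {
        double norm_sq = 0.0;
        for (std::size_t i = 0; i < a.rows; ++i) norm_sq += a(i, j) * a(i, j);
        sigma[j] = std::sqrt(norm_sq);
    }
    std::sort(sigma.begin(), sigma.end(), std::greater<double>());
    return sigma;
}
```

The `tol` and `max_sweeps` arguments play the same role as the tolerance and sweep limit that cuSOLVER exposes through `cusolverDnXgesvdjSetTolerance` and `cusolverDnXgesvdjSetMaxSweeps`; the default values used here are placeholders.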
The implementation focused mainly on avoiding copies to the host wherever possible and on preallocating all required memory and workspaces up front, so that the data can be prepared for the decomposition quickly. This involved using scratch tensors and custom out-of-place kernels that write directly into the destination tensor's memory. The truncation and normalization procedure was fused into a single kernel that uses cooperative groups to synchronize across blocks.
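For orientation, the truncation and normalization step can be written as the following CPU reference. This is a sketch under assumptions, not the fused kernel itself: `TruncationParams`, `chi_max`, `cutoff`, and `truncate_and_normalize` are hypothetical names, and the rule shown (keep at most `chi_max` values above `cutoff`, then rescale so the squared values sum to one) is the usual MPS convention rather than something confirmed by the project source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle for this sketch.
struct TruncationParams {
    std::size_t chi_max;  // maximum number of singular values to keep
    double cutoff;        // singular values at or below this are discarded
};

// Expects `s` sorted in descending order. Shrinks `s` to the kept values,
// rescales them so that the sum of squares is 1, and returns the new size.
std::size_t truncate_and_normalize(std::vector<double>& s,
                                   const TruncationParams& p) {
    assert(!s.empty() && p.chi_max >= 1);
    std::size_t keep = 0;
    double norm_sq = 0.0;
    // Pass 1: decide how many values survive and accumulate their norm.
    // The largest value is always kept so the state never becomes empty.
    while (keep < s.size() && keep < p.chi_max &&
           (keep == 0 || s[keep] > p.cutoff)) {
        norm_sq += s[keep] * s[keep];
        ++keep;
    }
    assert(norm_sq > 0.0);
    // Pass 2: rescale the survivors. This pass needs the complete norm from
    // pass 1 before it can start.
    s.resize(keep);
    const double inv_norm = 1.0 / std::sqrt(norm_sq);
    for (double& v : s) v *= inv_norm;
    return keep;
}
```

The second pass depends on a reduction over all kept values, which is why a single-kernel GPU version needs a grid-wide synchronization point between the two phases.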
A C++ compiler with C++20 support is required to build the project. The project uses the CMake build system. Additionally, the project expects the CUDA Toolkit (>=12.4) to be installed on the system. The other dependencies are fetched automatically by CPM when configuring the project.
To build the project, run CMake:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target mps
The simulator can be compiled to run in double or single precision. See precision.h, which also gives you control over the parameters for bond dimension truncation and the accuracy of the Jacobi-based SVD solver.
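As a rough idea of what a compile-time configuration header of this kind can look like, consider the sketch below. Every name and value in it is an assumption made for illustration (`MPS_USE_SINGLE_PRECISION`, `real_t`, `kMaxBondDim`, `kTruncationCutoff`, `kSvdTolerance`, `kSvdMaxSweeps`, `clamp_bond_dim`); the actual contents of precision.h may differ, so consult the file itself before changing anything.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical precision switch: define the macro to build in single precision.
#ifdef MPS_USE_SINGLE_PRECISION
using real_t = float;
constexpr double kSvdTolerance = 1e-6;   // placeholder Jacobi tolerance
#else
using real_t = double;
constexpr double kSvdTolerance = 1e-12;  // placeholder Jacobi tolerance
#endif

// Hypothetical truncation and solver settings (placeholder values).
constexpr std::size_t kMaxBondDim = 64;        // upper bound on bond dimension
constexpr real_t kTruncationCutoff = static_cast<real_t>(1e-10);
constexpr int kSvdMaxSweeps = 100;             // Jacobi sweep limit

static_assert(kMaxBondDim > 0, "bond dimension must be positive");

// Hypothetical helper: limit a requested bond dimension to the configured cap.
inline std::size_t clamp_bond_dim(std::size_t requested) {
    assert(requested > 0);
    return std::min(requested, kMaxBondDim);
}
```

Keeping such settings as `constexpr` values in one header means a precision change requires a rebuild, but lets the compiler specialize kernels and workspace sizes for the chosen type.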
To run a demonstration simulation involving repeated layers of controlled
./build/mps
As part of evaluating correctness, I wrote a primitive, condensed version of TenPy's MPS simulation. It helped me figure out the correct way to reshape, transpose, and contract tensors. For reference, it can be found here