LeonardEyer/cuMPS

cuMPS: CUDA Matrix Product State Simulator

Introduction

This project aims to replicate the core functionality of the TenPy library, using CUDA to accelerate the simulation of Matrix Product States (MPS).


A quantum state of $L$ qubits can be represented in a compressed form, namely an MPS representation, which is a special case of a tensor network.

The main computational task involves decomposing a tensor of shape $(D_1, d, d, D_2)$ into two separate tensors with shapes $(D_1, d, \chi)$ and $(\chi, d, D_2)$, where $\chi$ is a controllable bond dimension. Typically, an SVD is employed for this.
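
To make the shapes concrete, here is a sketch of the standard two-site split. It assumes the four-index tensor (called $\Theta$ here, a name introduced only for this illustration) is first reshaped into a $(D_1 d) \times (d D_2)$ matrix by grouping the indices $(\alpha, i)$ and $(j, \beta)$. Which factor absorbs the singular values depends on the canonical-form convention, which is not specified here.

$$
\Theta_{(\alpha i),(j \beta)} = \sum_{k=1}^{r} U_{(\alpha i),k}\, S_k\, V^\dagger_{k,(j \beta)} \approx \sum_{k=1}^{\chi} U_{(\alpha i),k}\, S_k\, V^\dagger_{k,(j \beta)}, \qquad r = \min(D_1 d,\; d D_2)
$$

Reshaping $U$ gives the $(D_1, d, \chi)$ tensor and reshaping $V^\dagger$ gives the $(\chi, d, D_2)$ tensor. Keeping only the $\chi$ largest singular values discards the weight $\epsilon = \sum_{k > \chi} S_k^2$, and the kept values are usually rescaled as $S_k \to S_k / \sqrt{\sum_{k' \le \chi} S_{k'}^2}$ so that the state stays normalized.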

Implementation

Since exact SVDs do not parallelize well on GPUs (ref), I opted for the existing one-sided Jacobi-based implementation cusolverDn<t>gesvdj from cuSolver.
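
For intuition about why a Jacobi method suits GPUs, the following is a minimal host-side sketch of a one-sided (Hestenes) Jacobi SVD for real matrices. It is not cuSolver's implementation and is not taken from this project: the `Matrix` and `JacobiSvd` types, the function name `jacobi_svd`, and the `tol` / `max_sweeps` parameters are all assumptions made for illustration. The point is that each step only rotates one pair of columns, so rotations on disjoint pairs are independent of each other.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Dense real matrix, row-major. Hypothetical helper type for this sketch.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

struct JacobiSvd {
    Matrix U;               // rows x cols; columns with S[j] > 0 are orthonormal
    std::vector<double> S;  // singular values, NOT sorted
    Matrix V;               // cols x cols, orthogonal, so that A = U * diag(S) * V^T
};

// Rotate pairs of columns of A until all columns are mutually orthogonal.
JacobiSvd jacobi_svd(Matrix A, double tol = 1e-12, int max_sweeps = 50) {
    const std::size_t m = A.rows, n = A.cols;
    Matrix V(n, n);
    for (std::size_t i = 0; i < n; ++i) V(i, i) = 1.0;

    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        bool rotated = false;
        for (std::size_t p = 0; p + 1 < n; ++p) {
            for (std::size_t q = p + 1; q < n; ++q) {
                double alpha = 0.0, beta = 0.0, gamma = 0.0;
                for (std::size_t i = 0; i < m; ++i) {
                    alpha += A(i, p) * A(i, p);
                    beta  += A(i, q) * A(i, q);
                    gamma += A(i, p) * A(i, q);
                }
                // Columns p and q already orthogonal to working precision.
                if (std::abs(gamma) <= tol * std::sqrt(alpha * beta)) continue;
                rotated = true;
                // Rotation angle that zeroes the inner product of the two columns.
                const double zeta = (beta - alpha) / (2.0 * gamma);
                const double t = std::copysign(1.0, zeta) /
                                 (std::abs(zeta) + std::sqrt(1.0 + zeta * zeta));
                const double c = 1.0 / std::sqrt(1.0 + t * t);
                const double s = c * t;
                for (std::size_t i = 0; i < m; ++i) {
                    const double ap = A(i, p), aq = A(i, q);
                    A(i, p) = c * ap - s * aq;
                    A(i, q) = s * ap + c * aq;
                }
                for (std::size_t i = 0; i < n; ++i) {
                    const double vp = V(i, p), vq = V(i, q);
                    V(i, p) = c * vp - s * vq;
                    V(i, q) = s * vp + c * vq;
                }
            }
        }
        if (!rotated) break;  // a full sweep without rotations means convergence
    }

    // Column norms are the singular values; normalized columns form U.
    Matrix U(m, n);
    std::vector<double> S(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double norm2 = 0.0;
        for (std::size_t i = 0; i < m; ++i) norm2 += A(i, j) * A(i, j);
        S[j] = std::sqrt(norm2);
        if (S[j] > 0.0)
            for (std::size_t i = 0; i < m; ++i) U(i, j) = A(i, j) / S[j];
    }
    return JacobiSvd{std::move(U), std::move(S), std::move(V)};
}
```

Calling `jacobi_svd(A)` on a small matrix and multiplying `U * diag(S) * V^T` back together should reproduce `A` up to rounding. The sketch visits column pairs sequentially and handles only real entries. Quantum states need complex arithmetic, and a GPU version would process disjoint pairs concurrently.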

The implementation focuses mainly on avoiding copies to the host wherever possible and on preallocating all required memory and workspaces up front, so that the data is ready for the decomposition without further allocation. This involved using scratch tensors and custom out-of-place kernels that write directly to the destination tensor memory. The truncation and normalization procedure is fused into a single kernel that uses cooperative groups to synchronize across blocks.
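
As a reference for what the truncation and normalization step computes, here is a serial host-side sketch. It is not the fused CUDA kernel from this project. The names `truncate_and_normalize`, `TruncationResult`, `chi_max` and `cutoff` are hypothetical, and the rule "keep values above a cutoff, at most `chi_max`, at least one" is an assumed truncation policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct TruncationResult {
    std::size_t chi;          // number of singular values kept
    double discarded_weight;  // sum of squares of the dropped singular values
};

// `s` must be non-empty, non-negative and sorted in descending order.
// On return, `s` holds only the kept values, rescaled to unit 2-norm.
TruncationResult truncate_and_normalize(std::vector<double>& s,
                                        std::size_t chi_max,
                                        double cutoff) {
    assert(!s.empty() && chi_max >= 1);
    assert(std::is_sorted(s.begin(), s.end(), std::greater<double>()));

    // Keep values above the cutoff, at most chi_max, and always at least one.
    std::size_t chi = 0;
    const std::size_t limit = std::min(chi_max, s.size());
    while (chi < limit && s[chi] > cutoff) ++chi;
    chi = std::max<std::size_t>(chi, 1);

    // Accumulate the kept and discarded weight in one pass.
    double kept = 0.0, discarded = 0.0;
    for (std::size_t k = 0; k < s.size(); ++k)
        (k < chi ? kept : discarded) += s[k] * s[k];

    // Drop the tail and rescale so the kept values have unit norm.
    s.resize(chi);
    const double norm = std::sqrt(kept);
    if (norm > 0.0)
        for (double& v : s) v /= norm;
    return {chi, discarded};
}
```

For example, the values `{4, 3, 1e-14}` with `chi_max = 8` and `cutoff = 1e-10` keep two entries, which become `{0.8, 0.6}`. In a parallel version the norm is a reduction over all kept values, and every thread needs the final norm before it can rescale its entries. That is a plausible reason for requiring a grid-wide synchronization inside a single fused kernel.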

Requirements

A C++ compiler with C++20 support is required to build the project. The project uses the CMake build system. Additionally, the project expects the CUDA Toolkit (>=12.4) to be installed on the system. The other dependencies are fetched automatically by CPM when configuring the project.

Build

To build the project run CMake:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target mps

The simulator can be compiled to run in double or single precision. See precision.h, which also gives you control over the parameters for bond dimension truncation and the accuracy of the Jacobi-based SVD solver.

Usage

To run a demonstration simulation involving repeated layers of controlled $R_x$ gates to build entanglement:

./build/mps
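
For readers unfamiliar with the gate, here is a small sketch of the controlled $R_x(\theta)$ matrix in the usual textbook convention. It assumes the first qubit is the control, the basis order is $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, and $R_x(\theta) = \exp(-i \theta X / 2)$. The demo's own qubit ordering and angle convention may differ, and the names `Gate4` and `controlled_rx` are made up for this example.

```cpp
#include <array>
#include <cmath>
#include <complex>

using Complex = std::complex<double>;
using Gate4 = std::array<std::array<Complex, 4>, 4>;

// Controlled R_x(theta): identity on the control = 0 block,
// R_x(theta) on the control = 1 block.
Gate4 controlled_rx(double theta) {
    const Complex c(std::cos(theta / 2.0), 0.0);   // diagonal entry of R_x
    const Complex s(0.0, -std::sin(theta / 2.0));  // off-diagonal entry: -i sin(theta/2)
    Gate4 g{};                                     // all entries start at zero
    g[0][0] = 1.0;
    g[1][1] = 1.0;
    g[2][2] = c;  g[2][3] = s;
    g[3][2] = s;  g[3][3] = c;
    return g;
}
```

At $\theta = 0$ this is the identity, and at $\theta = \pi$ the lower-right block is $-iX$. For angles in between, the gate entangles the two qubits when the control is in a superposition.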

Python implementation

To check correctness, I wrote a primitive, condensed version of TenPy's MPS simulation. It helped me figure out the correct way to reshape, transpose and contract the tensors. For reference, it can be found here.

Future work

  • QR based decompositions as highlighted here
  • Batched decompositions
  • Non-unitary evolution (requires recanonicalization)
  • Hybrid decomposition methods (MAGMA)