STRUMPACK currently supports GPU acceleration only in the sparse direct solver. None of the preconditioners or rank-structured solvers currently support GPU acceleration.
The sparse direct solver performs most of its computations using BLAS and LAPACK, and in the distributed memory setting also ScaLAPACK. For the BLAS and LAPACK operations, we use CUDA, cuBLAS and cuSOLVER for acceleration on NVIDIA GPUs. As a ScaLAPACK alternative with GPU off-loading capabilities, we use SLATE:
https://github.com/icl-utk-edu/slate
See the Installation and Requirements page for instructions on how to build STRUMPACK with support for CUDA, and optionally SLATE.
CUDA support is enabled by default if CUDA is detected. The GPU-accelerated code has the same interface as the CPU code, meaning the input and output data are always expected to reside in host memory. If STRUMPACK is compiled with CUDA support, GPU acceleration is enabled by default in the sparse direct solver. GPU acceleration can still be enabled or disabled using the command line arguments:
or in the code:
The number of GPU streams per MPI rank can be set with:
or in the code:
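The following is a minimal, self-contained sketch of how the options above interact: GPU use is on by default and can be switched off or on again, and the stream count per MPI rank can be overridden. It is not STRUMPACK code. `GpuOptions` and `parse_gpu_args` are hypothetical stand-ins, and the flag names (`--sp_enable_gpu`, `--sp_disable_gpu`, `--sp_gpu_streams`), the method names (`enable_gpu()`, `disable_gpu()`, `set_gpu_streams(int)`) and the default of 4 streams are assumptions modeled on the STRUMPACK options interface; check them against the version you have installed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the GPU-related solver options.
// This is NOT STRUMPACK's options class; names and defaults are assumptions.
struct GpuOptions {
  bool use_gpu = true;  // assumed default for a build with CUDA support
  int streams = 4;      // assumed default number of GPU streams per MPI rank

  void enable_gpu() { use_gpu = true; }
  void disable_gpu() { use_gpu = false; }
  void set_gpu_streams(int s) {
    assert(s > 0);  // a stream count must be positive
    streams = s;
  }
};

// Apply GPU-related flags in order; later flags override earlier ones.
// Arguments that are not recognized are ignored.
GpuOptions parse_gpu_args(const std::vector<std::string>& args) {
  GpuOptions opts;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (args[i] == "--sp_enable_gpu") {
      opts.enable_gpu();
    } else if (args[i] == "--sp_disable_gpu") {
      opts.disable_gpu();
    } else if (args[i] == "--sp_gpu_streams" && i + 1 < args.size()) {
      // The value is assumed to follow as a separate argument.
      opts.set_gpu_streams(std::stoi(args[++i]));
    }
  }
  return opts;
}
```

For example, `parse_gpu_args({"--sp_disable_gpu"})` yields options with the GPU switched off and the default stream count, while `parse_gpu_args({"--sp_gpu_streams", "8"})` keeps the GPU enabled and sets eight streams. In either case the input and output data stay in host memory, as described above.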
There is also support for AMD GPUs through HIP and the ROCm libraries, rocSOLVER and hipBLAS. Support for HIP can be enabled through the CMake build using:
where the user can specify the target GPU architecture. If the CMake build system detects CUDA support, HIP will be disabled.