Example Usage

StrumpackSparseSolver Example

The following shows the typical way to use a (sequential or multithreaded) STRUMPACK sparse solver:

Figure 1: Illustration of a small 5 × 5 sparse matrix with 11 nonzeros and its Compressed Sparse Row (CSR) or Yale format representation. We always use 0-based indexing! Let \(N = 5\) denote the number of rows. The row_ptr array has \(N+1\) elements, with element \(i\) denoting the start of row \(i\) in the col_ind and values arrays. Element row_ptr[N] = nnz, i.e., the total number of nonzero elements in the matrix. The values array holds the actual matrix values, ordered by row. The corresponding elements in col_ind give the column indices for each nonzero. There can be explicit zero elements in the matrix. The nonzero values and corresponding column indices need not be sorted by column (within a row).

#include "StrumpackSparseSolver.hpp"
using namespace strumpack; // all strumpack code is in the strumpack namespace,
int main(int argc, char* argv[]) {
int N = ...; // construct an NxN CSR matrix with nnz nonzeros
int* row_ptr = ...; // N+1 integers
int* col_ind = ...; // nnz integers
double* val = ...; // nnz scalars
double* x = new double[N]; // will hold the solution vector
double* b = ...; // set a right-hand side b
StrumpackSparseSolver<double> sp; // create solver object
sp.options().set_rel_tol(1e-10); // set options
sp.options().set_gmres_restart(10); // ...
sp.options().set_compression(CompressionType::HSS); // enable HSS compression, see HSS Preconditioning
sp.options().set_from_command_line(argc, argv); // parse command line options
sp.set_csr_matrix(N, row_ptr, col_ind, val); // set the matrix (copy)
sp.reorder(); // reorder matrix
sp.factor(); // numerical factorization
sp.solve(b, x); // solve Ax=b
... // check residual/error and cleanup
}

The main steps are: create the solver object, set options (and parse options from the command line), set the matrix, reorder, factor and finally solve. The matrix should be in the Compressed Sparse Row (CSR) format, also called Yale format, with 0-based indices. Figure 1 illustrates the CSR format. In the basic scenario it is not necessary to explicitly call reorder and factor: solving with a StrumpackSparseSolver object that has not been factored yet will internally call the factor routine, which in turn calls reorder if necessary.
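As an illustration of the CSR layout, the arrays below describe a small, hypothetical 4 × 4 matrix (this is not the matrix of Figure 1; the values are made up purely to show how row_ptr, col_ind and val relate to each other):

// Hypothetical 4x4 tridiagonal matrix, 0-based CSR layout:
//
//   [ 4 -1  0  0 ]
//   [-1  4 -1  0 ]
//   [ 0 -1  4 -1 ]
//   [ 0  0 -1  4 ]
//
int N = 4;                          // number of rows
int row_ptr[] = {0, 2, 5, 8, 10};   // N+1 entries, row_ptr[N] == nnz == 10
int col_ind[] = {0, 1,  0, 1, 2,  1, 2, 3,  2, 3};
double val[]  = {4., -1.,  -1., 4., -1.,  -1., 4., -1.,  -1., 4.};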

The above code should be linked with -lstrumpack and with the Metis, ParMetis, Scotch, PT-Scotch, BLAS, LAPACK, and ScaLAPACK libraries.
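To make the example above concrete, the following sketch assembles a simple tridiagonal (1D Poisson) matrix and solves it. The matrix, right-hand side and tolerance are illustrative choices, not prescribed by STRUMPACK, and error/residual checking is omitted for brevity:

#include <vector>
#include "StrumpackSparseSolver.hpp"
using namespace strumpack;

int main(int argc, char* argv[]) {
  int N = 1000;                          // 1D Poisson matrix, size is an illustrative choice
  std::vector<int> row_ptr(N+1), col_ind;
  std::vector<double> val;
  row_ptr[0] = 0;
  for (int i=0; i<N; i++) {              // assemble row i of the [-1 2 -1] stencil
    if (i > 0)   { col_ind.push_back(i-1); val.push_back(-1.); }
    col_ind.push_back(i); val.push_back(2.);
    if (i < N-1) { col_ind.push_back(i+1); val.push_back(-1.); }
    row_ptr[i+1] = (int)col_ind.size();
  }
  std::vector<double> b(N, 1.), x(N);    // right-hand side of all ones

  StrumpackSparseSolver<double> sp;
  sp.options().set_rel_tol(1e-10);
  sp.options().set_from_command_line(argc, argv);
  sp.set_csr_matrix(N, row_ptr.data(), col_ind.data(), val.data(),
                    true);               // the sparsity pattern is symmetric
  sp.solve(b.data(), x.data());          // reorder and factor are called internally
  return 0;
}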

StrumpackSparseSolverMPIDist Example

Finally, we illustrate the usage of the StrumpackSparseSolverMPIDist<scalar,integer=int> solver. This interface takes a block-row distributed compressed sparse row matrix as input. This matrix format is illustrated in Figure 2 (below).

#include "StrumpackSparseSolverMPI.hpp"
using namespace strumpack;
int main(int argc, char* argv[ ]) {
int thread_level, rank, P;
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &thread_level);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &P);
{
// define a block-row distributed CSR matrix
int* dist = new int[P];
// set dist such that processor p owns rows [dist[p], dist[p+1]) of the sparse matrix
for (int p=0; p<P; p++) dist[p] = .. ;
// local_n is the number of rows of the input matrix assigned to me
int local_n = dist[rank+1] - dist[rank];
int* row_ptr = new int[local_n+1];
.. // set the sparse matrix row pointers in row_ptr
int local_nnz = row_ptr[local_n+1] - row_ptr[0];
int* col_ind = new int[local_nnz];
.. // set the sparse matrix column indices in col_ind
double* val = new double[local_nnz];
.. // set the matrix nonzero value in val
double* x = new double[local_n]; // local part of solution
double* b = new double[local_n]; // local part of rhs
for (int i=0; i<local_n; i++) b[i] = ..; // set the rhs
sp.options().set_reordering_method(ReorderingStrategy::PARMETIS);
sp.options().set_from_command_line(argc, argv);
sp.set_distributed_csr_matrix(local_n, row_ptr, col_ind, val, dist);
sp.solve(b, x);
... // check residual/error, cleanup
}
Cblacs_exit(1);
MPI_Finalize();
}

Figure 2: Illustration of a small 5 × 5 sparse matrix with 11 nonzeros and its block-row distributed compressed sparse row representation. We always use 0-based indexing! Process P0 owns row 0, process P1 owns rows 1 and 2, and process P2 owns rows 3 and 4. This distribution of rows over the processes is represented by the dist array: process p owns rows [dist[p], dist[p+1]). If N = 5 is the number of rows in the entire matrix and P is the total number of processes, then dist[P] = N. The (same) dist array is stored on every process. Each process holds a CSR representation of only its local rows of the matrix, see Figure 1.
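As a sketch of one common way to build the dist array, the snippet below uses an even block-row split (the partition itself is the user's choice and is not dictated by STRUMPACK); it reuses the N, P and rank variables from the example above and assumes <vector> is included:

// Even block-row partition of an N x N matrix over P processes.
// Process p owns rows [dist[p], dist[p+1]); note dist[P] == N.
std::vector<int> dist(P+1);
for (int p=0; p<=P; p++)
  dist[p] = int(p * (long long)N / P);    // cast avoids overflow for large N*P
int local_n = dist[rank+1] - dist[rank];  // rows owned by this process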

Initializing the Solver Object

Let

typedef strumpack::StrumpackSparseSolver<scalar,integer> Sp;
typedef strumpack::StrumpackSparseSolverMPI<scalar,integer> SpMPI;
typedef strumpack::StrumpackSparseSolverMPIDist<scalar,integer> SpMPIDist;

Each of the solver classes has two constructors:

strumpack::StrumpackSparseSolver(bool verbose=true, bool root=true);
strumpack::StrumpackSparseSolver(int argc, char* argv[], bool verbose=true, bool root=true);
strumpack::StrumpackSparseSolverMPIDist(MPI_Comm comm, bool verbose=true);
strumpack::StrumpackSparseSolverMPIDist(MPI_Comm comm, int argc, char* argv[], bool verbose=true);

where argc and argv contain the command line options and the verbose option can be set to false to suppress output of the solver. Note that since SpMPIDist is a subclass of SpMPI, which is a subclass of Sp, all public members of Sp are also members of SpMPI and SpMPIDist. The public interface to the SpMPI class is exactly the same as that for the Sp class.
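For instance (a minimal sketch; the scalar type double and the communicator MPI_COMM_WORLD are illustrative choices):

// sequential/multithreaded solver, parsing the command line options
StrumpackSparseSolver<double> sp_seq(argc, argv);
// fully distributed solver on MPI_COMM_WORLD, with solver output suppressed
StrumpackSparseSolverMPIDist<double> sp_dist(MPI_COMM_WORLD, argc, argv, false);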

Sparse Matrix Format

The sparse matrix should be specified in compressed sparse row format [8]:

void strumpack::StrumpackSparseSolver::set_csr_matrix(integer N, const integer* row_ptr, const integer* col_ind, const scalar* values, bool symmetric_pattern=false);

Internally, the matrix is copied, so it will not be modified. Previous versions of STRUMPACK also supported the CSC format, but this is now deprecated. If the sparsity pattern of the matrix is symmetric (the values do not have to be symmetric), then you can set symmetric_pattern=true. This saves some work in the setup phase of the solver.
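For example (a sketch reusing the arrays from the first example; whether the pattern really is symmetric is of course up to the user to guarantee):

// the sparsity pattern (not necessarily the values) is symmetric,
// so the solver can skip some work during setup
sp.set_csr_matrix(N, row_ptr, col_ind, val, true);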

For the SpMPIDist solver the input is a block-row distributed compressed sparse row matrix (as illustrated in the example above):

void strumpack::StrumpackSparseSolverMPIDist::set_distributed_csr_matrix(integer local_rows, const integer* row_ptr, const integer* col_ind, const scalar* values, const integer* dist, bool symmetric_pattern=false);

Alternatively, you can also specify a sequential CSR matrix to the SpMPIDist solver:

void strumpack::StrumpackSparseSolverMPIDist::set_csr_matrix(integer N, const integer* row_ptr, const integer* col_ind, const scalar* values, bool symmetric_pattern=false);

For this routine, the matrix only needs to be specified completely on the root process. Other processes can pass NULL for the arrays.
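A sketch of this root-only variant, assuming rank 0 is the root and that N is known on every rank while the matrix arrays exist only on the root:

// every process calls set_csr_matrix, but only the root (rank 0 here)
// passes the actual arrays; the other ranks pass NULL as allowed
if (!rank) sp.set_csr_matrix(N, row_ptr, col_ind, val);
else       sp.set_csr_matrix(N, NULL, NULL, NULL);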

Factorization

Compute the factorization by calling

ReturnCode strumpack::StrumpackSparseSolver::factor();

where the possible return values are the same as for Sp::reorder(). If Sp::reorder() was not called already, it is called automatically. When compression is not enabled, this computes an exact LU factorization of the (permuted) sparse input matrix. If HSS/HODLR/BLR compression is enabled (for instance with SPOptions::set_compression() or --sp_compression BLR, see Preconditioning), the factorization is only approximate.
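A minimal check of the return value (ReturnCode::SUCCESS is assumed to be the success value of the ReturnCode enumeration defined in StrumpackParameters.hpp):

ReturnCode err = sp.factor();
if (err != ReturnCode::SUCCESS) {
  // reordering or factorization failed, see the ReturnCode enumeration
  return 1;
}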

Solve

Solve the linear system \(Ax = b\) by calling

ReturnCode strumpack::StrumpackSparseSolver::solve(scalar* b, scalar* x, bool use_initial_guess=false);

By default (use_initial_guess=false) the input in x is ignored. If use_initial_guess=true, x is used as the initial guess for the iterative solver (if an iterative solver is used, for instance iterative refinement or GMRES). If Sp::factor() was not called, it is called automatically. The return values are the same as for Sp::reorder().

The iterative solver can be chosen through:

void strumpack::SPOptions::set_Krylov_solver(KrylovSolver s);

where KrylovSolver is an enumeration (see StrumpackOptions.hpp) whose values correspond to the solvers listed for the --sp_Krylov_solver command line option below, with KrylovSolver::AUTO being the default. The KrylovSolver::AUTO setting uses iterative refinement when HSS compression is not enabled, and preconditioned GMRES when HSS compression is enabled, see HSS Preconditioning. To use the solver as a preconditioner, or for a single (approximate) solve, set the solver to KrylovSolver::DIRECT. When calling SpMPIDist::solve, the right-hand side and solution vectors should only point to the local parts!
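A sketch of applying the (approximate) factorization once, and of restarting an iterative solve from an existing guess (only the option values discussed above are used):

// use the (approximate) factorization as a preconditioner, i.e.,
// perform a single approximate solve without outer Krylov iterations
sp.options().set_Krylov_solver(KrylovSolver::DIRECT);
sp.solve(b, x);

// switch back to the default behavior (iterative refinement or
// preconditioned GMRES, depending on whether compression is enabled)
// and refine further, reusing x as the initial guess
sp.options().set_Krylov_solver(KrylovSolver::AUTO);
sp.solve(b, x, true);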

Reordering

The STRUMPACK sparse solver applies three different matrix orderings:

  • For numerical stability
  • To reduce fill-in
  • To reduce the numerical rank of certain blocks when preconditioning

These reorderings are all performed when calling

ReturnCode strumpack::StrumpackSparseSolver::reorder();

Reordering for Numerical Stability

The reordering for numerical stability is performed using MC64 or Combinatorial BLAS. For many matrices this reordering is not necessary and can safely be disabled! MC64 supports 5 different modes, and there is one additional option to select the Combinatorial BLAS code; see the list of integer values below. The matching strategy can be selected via

void strumpack::SPOptions::set_matching(MatchingJob job);
MatchingJob strumpack::SPOptions::matching() const;

where matching() queries the currently selected strategy (the default is MatchingJob::MAX_DIAGONAL_PRODUCT_SCALING, i.e., maximum product of diagonal values plus row and column scaling). The command line option

--sp_matching [0-6]

can also be used, where the integers are defined as:

  • 0: no reordering for stability, this disables MC64/matching
  • 1: MC64(1): currently not supported
  • 2: MC64(2): maximize the smallest diagonal value
  • 3: MC64(3): maximize the smallest diagonal value, different strategy
  • 4: MC64(4): maximize sum of diagonal values
  • 5: MC64(5): maximize product of diagonal values and apply row and column scaling
  • 6: Combinatorial BLAS: approximate weight perfect matching

The MC64 code is sequential, so when using this option in parallel, the graph is first gathered to the root process. The Combinatorial BLAS code can currently only be used in parallel, and only with a square number of processes.
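For example, to disable the matching step entirely (MatchingJob::NONE is assumed to be the enumeration value corresponding to --sp_matching 0):

// many matrices, e.g., diagonally dominant ones, do not need the
// stability reordering; disabling it also avoids gathering the graph
// on the root process for the sequential MC64 code
sp.options().set_matching(MatchingJob::NONE);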

Nested Dissection Reordering

The STRUMPACK sparse solver supports both (Par)Metis and (PT-)Scotch for the matrix reordering. The following functions can set the preferred method or check the currently selected method:

void strumpack::SPOptions::set_reordering_method(ReorderingStrategy m);
ReorderingStrategy strumpack::SPOptions::reordering_method() const;

The possible values for ReorderingStrategy are NATURAL, METIS, PARMETIS, SCOTCH, PTSCOTCH, RCM and GEOMETRIC; see also the --sp_reordering_method command line option below.

When the solver is an object of Sp, PARMETIS or PTSCOTCH are not supported. When the solver is parallel, either an SpMPI or SpMPIDist object, and METIS, SCOTCH or RCM are chosen, then the graph of the complete matrix will be gathered onto the root process and the root process will call the (sequential) Metis, Scotch or RCM reordering routine. For large graphs this might fail due to insufficient memory.

The GEOMETRIC option is only allowed for regular grids. In this case, the dimensions of the grid should be specified in the function

ReturnCode strumpack::StrumpackSparseSolver::reorder(int nx=1, int ny=1, int nz=1);

For instance, for a regular 2D 2000 × 4000 grid, you can call this as sp.reorder(2000, 4000). In the general algebraic case, the grid dimensions do not have to be provided. The reordering method can also be specified via the command line option

--sp_reordering_method [metis|parmetis|scotch|ptscotch|geometric|rcm]
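A sketch of selecting the geometric nested dissection ordering for a regular 2D grid (the grid dimensions are the illustrative ones from the text above):

// geometric nested dissection for a regular 2000 x 4000 grid;
// only valid when the matrix comes from such a regular mesh
sp.options().set_reordering_method(ReorderingStrategy::GEOMETRIC);
sp.reorder(2000, 4000);   // pass the grid dimensions, nz defaults to 1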

Setting and Parsing Options

The sparse solver options are stored in an object of class SPOptions, which can be accessed through the solver's options() member function, as in the examples above. See StrumpackOptions.hpp for the SPOptions class and several related enumerations.

To get a list of all available options, make sure to pass "int argc, char* argv[]" when initializing the StrumpackSparseSolver or when calling SPOptions::set_from_command_line, and run the application with --help or -h. Some default values listed here are for double precision and might be different when running in single precision.

STRUMPACK sparse solver options:

# STRUMPACK options:
# --sp_maxit int (default 5000)
# maximum Krylov iterations
# --sp_rel_tol real_t (default 1e-06)
# Krylov relative (preconditioned) residual stopping tolerance
# --sp_abs_tol real_t (default 1e-10)
# Krylov absolute (preconditioned) residual stopping tolerance
# --sp_Krylov_solver [auto|direct|refinement|pgmres|gmres|pbicgstab|bicgstab]
# default: auto (refinement when no HSS, pgmres (preconditioned) with HSS compression)
# --sp_gmres_restart int (default 30)
# gmres restart length
# --sp_GramSchmidt_type [modified|classical]
# Gram-Schmidt type for GMRES
# --sp_reordering_method [natural|metis|scotch|parmetis|ptscotch|rcm|geometric]
# Code for nested dissection.
# Geometric only works on regular meshes and you need to provide the sizes.
# --sp_nd_param int (default 8)
# --sp_nx int (default 1)
# --sp_ny int (default 1)
# --sp_nz int (default 1)
# --sp_components int (default 1)
# --sp_separator_width int (default 1)
# --sp_enable_METIS_NodeNDP (default false)
# use undocumented Metis routine NodeNDP instead of NodeND
# --sp_disable_METIS_NodeNDP (default true)
# use Metis routine NodeND instead of the undocumented NodeNDP
# --sp_enable_METIS_NodeND (default true)
# use Metis routine NodeND instead of the undocumented NodeNDP
# --sp_disable_METIS_NodeND (default false)
# use undocumented Metis routine NodeNDP instead of NodeND
# --sp_enable_MUMPS_SYMQAMD (default false)
# --sp_disable_MUMPS_SYMQAMD (default true)
# --sp_enable_agg_amalg (default false)
# --sp_disable_agg_amalg (default true)
# --sp_matching int [0-6] (default 0)
# 0 none
# 1 maximum cardinality ! Doesn't work
# 2 maximum smallest diagonal value, version 1
# 3 maximum smallest diagonal value, version 2
# 4 maximum sum of diagonal values
# 5 maximum matching with row and column scaling
# 6 approximate weighted perfect matching, from CombBLAS
# --sp_compression [none|hss|blr|hodlr]
# type of rank-structured compression to use
# --sp_compression_min_sep_size (default 2147483647)
# minimum separator size for compression
# --sp_compression_leaf_size (default 2147483647)
# leaf size for rank-structured representation
# --sp_separator_ordering_level (default 1)
# --sp_enable_indirect_sampling
# --sp_disable_indirect_sampling
# --sp_enable_replace_tiny_pivots
# --sp_disable_replace_tiny_pivots
# --sp_write_root_front
# --sp_print_root_front_stats
# --sp_enable_gpu
# --sp_disable_gpu
# --sp_cuda_cutoff (default 500)
# CUDA kernel/CUBLAS cutoff size
# --sp_cuda_streams (default 10)
# number of CUDA streams
# --sp_verbose or -v (default true)
# --sp_quiet or -q (default false)
# --help or -h
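The options above can be set programmatically or parsed from the command line; a minimal sketch combining both (the specific option values are illustrative, and CompressionType::BLR is assumed to be the enumeration value matching --sp_compression blr):

StrumpackSparseSolver<double> sp(argc, argv);    // parses the command line options
sp.options().set_rel_tol(1e-8);                  // programmatic settings ...
sp.options().set_compression(CompressionType::BLR);
sp.options().set_from_command_line(argc, argv);  // ... re-parse so the command line takes precedence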