HSS Approximation of Dense Matrices

We recommend using the strumpack::structured::StructuredMatrix interface instead of directly using the HSS classes. See Dense Solvers.

Figure 3: Illustration of a Hierarchically Semi-Separable (HSS) matrix. Gray blocks are dense matrices. Off-diagonal blocks, on different levels of the HSS hierarchy, are low-rank. The low-rank factors of off-diagonal blocks of different levels are related.

The HSS include files are installed in the include/HSS/ subdirectory, or in src/HSS/. All HSS code is in the namespace strumpack::HSS. The class for a sequential/multithreaded HSS matrix is strumpack::HSS::HSSMatrix, while the distributed memory HSS class is strumpack::HSS::HSSMatrixMPI. For examples of the usage of these classes, see the test code in test/test_HSS_seq.cpp and test/test_HSS_mpi.cpp, respectively. There is also a sequential example in examples/dense/KernelRegression.cpp, which uses HSS compression for kernel matrices as they arise in certain machine learning applications; see below, and see for instance [2].

We use the simple class strumpack::DenseMatrix as a wrapper around a column-major matrix. See the documentation for that class for more info.
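For illustration, here is a minimal sketch of creating and filling a strumpack::DenseMatrix. The entry formula is an arbitrary example, and the dense/DenseMatrix.hpp include path is our assumption; adjust it to your installation:

#include <cmath>
#include "dense/DenseMatrix.hpp"

std::size_t N = 1000;
strumpack::DenseMatrix<double> A(N, N);  // N x N, stored column-major
for (std::size_t j=0; j<A.cols(); j++)
  for (std::size_t i=0; i<A.rows(); i++)
    A(i, j) = 1. / (1. + std::abs(double(i) - double(j)));  // arbitrary example entries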

HSS Matrix Construction

There are currently two ways to construct an HSS matrix:

  • By giving an explicit dense matrix as input. This requires that the user builds the entire matrix. Currently, our HSS compression code uses randomized sampling, which repeatedly multiplies the given dense input matrix with a randomly generated matrix. This leads to an overall O(N^2 r) complexity, where r is the maximum HSS rank. This complexity, combined with the O(N^2) memory requirements, means that this way of constructing an HSS matrix can be prohibitively expensive for large matrices.
  • By specifying two routines:
    • A matrix times (multiple)vector product routine: specifying a fast matrix times (multiple)vector multiplication routine will greatly speed up the HSS construction phase (compared to performing the random sampling with an explicit dense matrix).
    • An element extraction routine: The randomized HSS construction algorithm still needs to have access to certain selected elements from the original matrix. The user needs to provide a routine to evaluate the submatrix A(I,J) for a row index set I and a column index set J.

The interfaces to construct HSS matrices in these two different ways are detailed below.

Unfortunately, many applications fit neither of the two cases listed above: you cannot build the dense matrix first (too expensive), and you do not have a fast matrix-vector product or fast access to individual elements. For those cases we have experimental code to construct an HODLR matrix representation, see HODLR.

The HSS Cluster Tree

The clustering of the HSS matrix is defined by a strumpack::structured::ClusterTree. The strumpack::structured::ClusterTree uses a recursive representation: each child in the tree is itself a strumpack::structured::ClusterTree. A node in this tree should always have 0 or 2 children, and the size of a node should always be the sum of the sizes of its children. To build a whole tree, use the simple constructor to specify the size n of the corresponding HSS matrix, then refine it.

To refine uniformly until a given leaf size is reached, use:

const ClusterTree & refine(int leaf_size)

or refine it manually by adding nodes to the child vector strumpack::structured::ClusterTree::c:

// construct a basic, empty, 1 level tree for a matrix of size N
strumpack::structured::ClusterTree t(N);
// uniformly refine the tree to a given leaf size
t.refine(leaf_size);
// OR manually define the tree
// each node should have either 0 or 2 children
t.c.emplace_back(N/2);
t.c.emplace_back(N-N/2);
// the result is a tree with only 2 levels, which can be refined further

Alternatively, if you have access to the underlying geometry that is used to construct the matrix, you can use one of the clustering algorithms, see binary_tree_clustering(), as sketched below.
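As a sketch, building a cluster tree from point coordinates with a kd-tree partitioning could look as follows. The binary_tree_clustering() signature and the ClusteringAlgorithm::KD_TREE enum value used here are our reading of clustering/Clustering.hpp; check that header for the exact interface:

#include "clustering/Clustering.hpp"

int d = 2;              // dimension of the points
std::size_t N = 10000;  // number of points, i.e., matrix size
strumpack::DenseMatrix<double> coords(d, N);  // one point per column
// ... fill coords with the point coordinates
std::vector<std::size_t> perm;  // receives the corresponding row/column permutation
auto tree = strumpack::binary_tree_clustering
  (strumpack::ClusteringAlgorithm::KD_TREE, coords, perm, 512);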

Sequential/Threaded HSS Matrix Construction

See the class strumpack::HSS::HSSMatrix, which is a subclass of the abstract class strumpack::HSS::HSSMatrixBase.

HSS matrix construction from an explicit dense matrix can be done as follows:

int N = 1000;
strumpack::DenseMatrix<double> A(N, N);
// ... fill the dense matrix A
// Create an HSS options object and set some options.
strumpack::HSS::HSSOptions<double> opts;
opts.set_leaf_size(512);
opts.set_rel_tol(1e-2);
// allow the command line arguments to overwrite any options
opts.set_from_command_line(argc, argv);
// Construct the HSS matrix from the dense matrix, using the options specified in opts.
// This will use a uniform partitioning of the matrix, using a leaf size from opts.leaf_size(),
// and it will immediately start the HSS compression.
strumpack::HSS::HSSMatrix<double> H(A, opts);
// OR construct an HSS matrix with a given cluster/partition tree,
strumpack::structured::ClusterTree t(N);
t.refine(opts.leaf_size());
strumpack::HSS::HSSMatrix<double> H2(t, opts);
// and compress it
H2.compress(A, opts);

Alternatively, to construct an HSS matrix without first building the whole dense matrix, you need to define a matrix-vector multiplication routine and an element extraction routine. The matrix-vector product should have the following signature:

std::function<void(
    strumpack::DenseMatrix<scalar_t>& Rr,  // input, set by compression code
    strumpack::DenseMatrix<scalar_t>& Rc,  // input, set by compression code
    strumpack::DenseMatrix<scalar_t>& Sr,  // output, compute as A * Rr
    strumpack::DenseMatrix<scalar_t>& Sc   // output, compute as A^c * Rc or A^T * Rc
)>;

where Rr and Rc are random matrices (set by the HSS compression code), and the routine should fill in Sr and Sc. This can be a functor, or a lambda function for instance. The random sample matrix Sr should be computed by the matrix-(multiple)vector multiplication routine as A*Rr. The Sr matrix will already be allocated by the compression routine and should satisfy Sr.rows() == A.rows() and Sr.cols() == Rr.cols(). The random sample matrix Sc should be computed by the matrix-(multiple)vector multiplication routine as A^T*Rc, or A^C*Rc. This will already be allocated by the compression routine and should satisfy Sc.rows() == A.cols() and Sc.cols() == Rc.cols().

And the element extraction routine should have the signature:

std::function<void(
    const std::vector<std::size_t>& I,  // row index set
    const std::vector<std::size_t>& J,  // column index set
    strumpack::DenseMatrix<scalar_t>& B // output B == A(I,J)
)>;

where the user is responsible for computing the elements A(I,J) and putting them into the matrix B. B will already be allocated and is of size I.size() x J.size().

The HSS construction would look as follows:

using DenseM_t = strumpack::DenseMatrix<double>;
auto Amult = [&](DenseM_t& Rr, DenseM_t& Rc, DenseM_t& Sr, DenseM_t& Sc) {
  // TODO compute Sr = A * Rr
  // TODO compute Sc = A^c * Rc
};
auto Aelem = [&](const std::vector<std::size_t>& I,
                 const std::vector<std::size_t>& J,
                 DenseM_t& B) {
  for (std::size_t j=0; j<J.size(); j++)
    for (std::size_t i=0; i<I.size(); i++)
      B(i, j) = A(I[i], J[j]); // get/compute elements of A
};
strumpack::HSS::HSSOptions<double> opts;
opts.set_from_command_line(argc, argv);
strumpack::structured::ClusterTree t(N);
t.refine(opts.leaf_size());
strumpack::HSS::HSSMatrix<double> H(t, opts);
H.compress(Amult, Aelem, opts);
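For a concrete, self-contained illustration, the following sketch compresses a diagonal-plus-rank-one matrix A = D + u v^T, for which a fast O(N)-per-vector multiplication and O(1) element access are both available, and whose off-diagonal blocks have exact rank 1. The test matrix, the tolerance, and the HSSMatrix(tree, opts) constructor are illustrative choices under the interfaces described above, not code from this manual:

#include <random>
#include <vector>
#include "HSS/HSSMatrix.hpp"

int main(int argc, char* argv[]) {
  using DenseM_t = strumpack::DenseMatrix<double>;
  std::size_t N = 2000;
  // illustrative matrix A = D + u v^T: diagonal plus rank-one
  std::vector<double> d(N), u(N), v(N);
  std::mt19937 gen(0);
  std::uniform_real_distribution<double> unif(0., 1.);
  for (std::size_t i=0; i<N; i++) {
    d[i] = N + unif(gen);  // strong diagonal, well conditioned
    u[i] = unif(gen);
    v[i] = unif(gen);
  }
  // fast multiplication: Sr = A*Rr and Sc = A^T*Rc, O(N) work per vector
  // (Rr and Rc have the same number of columns, set by the compression code)
  auto Amult = [&](DenseM_t& Rr, DenseM_t& Rc, DenseM_t& Sr, DenseM_t& Sc) {
    for (std::size_t k=0; k<Rr.cols(); k++) {
      double vR = 0., uR = 0.;
      for (std::size_t i=0; i<N; i++) vR += v[i] * Rr(i, k);
      for (std::size_t i=0; i<N; i++) uR += u[i] * Rc(i, k);
      for (std::size_t i=0; i<N; i++) Sr(i, k) = d[i]*Rr(i, k) + u[i]*vR;
      for (std::size_t i=0; i<N; i++) Sc(i, k) = d[i]*Rc(i, k) + v[i]*uR;
    }
  };
  // O(1) evaluation of individual elements A(i,j)
  auto Aelem = [&](const std::vector<std::size_t>& I,
                   const std::vector<std::size_t>& J, DenseM_t& B) {
    for (std::size_t j=0; j<J.size(); j++)
      for (std::size_t i=0; i<I.size(); i++)
        B(i, j) = (I[i] == J[j] ? d[I[i]] : 0.) + u[I[i]] * v[J[j]];
  };
  strumpack::HSS::HSSOptions<double> opts;
  opts.set_rel_tol(1e-8);
  opts.set_from_command_line(argc, argv);
  strumpack::structured::ClusterTree t(N);
  t.refine(opts.leaf_size());
  strumpack::HSS::HSSMatrix<double> H(t, opts);
  H.compress(Amult, Aelem, opts);
  // H can now be factored, used in a solve, or multiplied with vectors
  return 0;
}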

Kernel Matrix Approximation

We have an optimized HSS construction algorithm for so-called kernel matrices, which arise in several applications, such as kernel ridge regression in machine learning. One can use the strumpack::HSS::HSSMatrix constructor:

HSSMatrix(kernel::Kernel<real_t>& K,  // representation of a kernel matrix
          std::vector<int>& perm,     // output, the clustering permutation
          const opts_t& opts);

See strumpack::kernel::Kernel in Kernel.hpp for the representation of a kernel matrix, and HSSMatrix.hpp for the exact constructor signature.

However, for kernel ridge regression, the strumpack::kernel::Kernel class provides some easy-to-use driver routines, see strumpack::kernel::Kernel::fit_HSS() and strumpack::kernel::Kernel::predict():

DenseM_t fit_HSS(std::vector<scalar_t>& labels, const HSS::HSSOptions<scalar_t>& opts);
std::vector<scalar_t> predict(const DenseM_t& test, const DenseM_t& weights) const;
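A usage sketch for these drivers follows. The GaussianKernel class name and its (data, h, lambda) constructor are our assumptions based on Kernel.hpp; fit_HSS() and predict() are as above:

using DenseM_t = strumpack::DenseMatrix<double>;
int d = 8;          // dimension of the data points
int n = 10000;      // number of training points
DenseM_t training(d, n);        // one d-dimensional point per column
std::vector<double> labels(n);  // one label per training point
// ... fill training and labels
double h = 1.0;        // assumed kernel bandwidth parameter
double lambda = 1e-2;  // assumed regularization parameter
strumpack::kernel::GaussianKernel<double> K(training, h, lambda);
strumpack::HSS::HSSOptions<double> opts;
DenseM_t weights = K.fit_HSS(labels, opts);  // HSS compression, factorization, solve
DenseM_t test(d, 100);                       // points to evaluate the model at
std::vector<double> pred = K.predict(test, weights);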

There is also a Python interface to these kernel regression routines, compatible with scikit-learn, see install/python/STRUMPACKKernel.py and examples/dense/KernelRegression.py.

Limitations

Currently, the class strumpack::HSS::HSSMatrixMPI cannot be used on a communicator with a single MPI process. In that case, one should use the sequential strumpack::HSS::HSSMatrix class.
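A minimal sketch of dispatching on the communicator size to work around this, using only standard MPI calls:

#include <mpi.h>

int P;
MPI_Comm_size(MPI_COMM_WORLD, &P);
if (P == 1) {
  // fall back to the sequential/threaded strumpack::HSS::HSSMatrix
} else {
  // use the distributed strumpack::HSS::HSSMatrixMPI
}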