ButterflyPACK provides a collection of fast algebraic tools for compressing and manipulating dense matrices and tensors arising from integral equations for PDEs, kernel matrices in machine learning, and various integral transforms. ButterflyPACK represents the operators as low-rank or butterfly-compressed hierarchical matrices (both square and non-square ) including H, H-BF, HODLR, HODBF, HSSBF and SHNBF formats, assuming only a user-supplied function to either evaluate any entry of the matrix, or to apply the matrix and its adjoint to any input matrix (matvec). Depending on the compression tolerance and application need, the hierarchical matrices can be used as fast matvec in iterative solvers, fast preconditioners, or fast direct solvers. ButterflyPACK also provides a TFQMR iterative solver. The supported formats are briefly described:
For square matrix operators, the supported APIs are BPACK_construction_Init (initialization), BPACK_construction_Element (element evaluation-based construction), BPACK_construction_Matvec (sketching-based construction), BPACK_Factorization (matrix inversion), BPACK_Mult (apply the matrix) and BPACK_Solution (solve with inverted or preconditioned matrices).
The H matrix assumes that the matrix is first split into four smaller blocks. When any block representing interactions that are either too close or non-compressible, the block is further split into four small blocks. The recursive splitting leads to a partioned matrix of log(N) levels. Each compressible block is represented as a low-rank product. ButterflyPACK currently supports parallel construction and block LU factorization of H matrices, assuming any block is handled by one MPI. The relevant options are:
The H-BF matrix follows the same partitioning as the H matrix, the only difference is that each compressible block is represented as a butterfly instead of a low-rank product. Compared with the H matrix, H-BF is more suitable for highly-oscillatory applications. Again, ButterflyPACK assumes that each block is handled by one MPI. The options are:
The B-LR (block low-rank) matrix is a single-level format where the matrix is partitioned into MxM blocks where each block representing well-seperated interactions are low-rank compressed. ButterflyPACK supports parallel construction and block LU factorization of B-LR matrices, assuming any block is handled by one MPI. The relevant options are:
The B-BF (block butterfly) matrix is a single-level format where the matrix is partitioned into MxM blocks where each block representing well-seperated interactions are butterfly compressed. ButterflyPACK supports parallel construction and block LU factorization of B-BF matrices, assuming any block is handled by one MPI. The relevant options are:
The HODLR (hierarchically off-diagonal low-rank) matrix assumes that the matrix is first split into four smaller blocks. Any block representing self interactions (i.e., the diagonal block) is further split into four small blocks. The recursive splitting leads to a partitioned matrix of log(N) levels. Each offdiagonal block is represented as a low-rank product. Compared with the strong-admissible H matrix, the weak-admissible HODLR is more aggressive and parallelizable, leading to much smaller prefactors. But HODLR typically exhibits much larger ranks and worse scalings for high-dimensional problems. ButterflyPACK currently supports parallel construction and inversion of HODLR matrices. Unlike the H matrix, here each block is handled by multiple MPIs. The relevant options are:
The HODBF (hierarchically off-diagonal butterfly) matrix follows the same matrix partitioning as HODLR. But each offdiagonal block is represented as a butterfly instead of a low-rank product. HODBF gives much better performance than HODLR for highly-oscillatory and/or high-dimensional problems, and has much smaller prefactors and higher parallel efficiency compared to H or H-BF. As such, HODBF is the most well-developed algorithm in ButterflyPACK. ButterflyPACK currently supports parallel construction and inversion of HODBF matrices. Unlike the H matrix, here each block (butterfly) is handled by multiple MPIs. The relevant options are:
The HSSBF (hierarchically semi-separable butterfly) format is an extension of HODBF. Unlike HODBF that leads to an O(logN) level partitioning, HSSBF uses an O(loglogN) level partitioning. For low-dimensional problems, this leads to much smaller prefactors compared to HODBF. The relevant options are:
The SHNBF (strongly admissible and hierarchically nested butterfly) format is a strong-admissible extension of HSSBF. SHNBF uses an O(loglogN) level partitioning. For high-dimensional problems, this leads to much smaller prefactors and low asymptotic complexities. The relevant options are:
For non-square matrix operators, the supported APIs are BF_Construct_Init/BP_Construct_Init (initialization), BF_Construct_Element_Compute/BP_Construct_Element_Compute (element evaluation-based construction), BF_Construct_Element_Matvec (sketching-based construction), and BF_Mult/BP_Mult (apply the matrix).
The H matrix assumes that the matrix is first split into four smaller blocks. When any block representing interactions that are either too close or non-compressible, the block is further split into four small blocks. The recursive splitting leads to a partioned matrix of log(N) levels. Each compressible block is represented as a low-rank product or butterfly. ButterflyPACK currently supports parallel construction of non-square H matrices, assuming any block is handled by one MPI. The relevant options are:
The single LR or butterfly doesn't split the matrix due to admissibility conditions, but rather performs a single compression. The relevant options are:
The SHNBF (hierarchically semi-separable butterfly) format for non-square matrix is similar to that of square matrix. The relevant options are:
ButterlfyPACK also supports compressing a tensor into a tensor-butterfly or tucker-like decomposition. The supported APIs are BF_MD_Construct_Init (initialization), BF_MD_Construct_Element_Compute (element evaluation-based construction), and BF_MD_block_mvp (apply the tensor).
The relevant options are: