Solves a system of linear equations A*X=B using 3D process grid. More...

#include "superlu_sdefs.h"
#include "superlu_upacked.h"
#include <stdbool.h>

Include dependency graph for psgssvx3d.c:

Macros
#define	EPSILON 1e-3

Functions
int	swriteLUtoDisk (int nsupers, int_t xsup, sLUstruct_t LUstruct)

static int	scheckArr (float A, float B, int n)

int	scheckLUFromDisk (int nsupers, int_t xsup, sLUstruct_t LUstruct)

void	sDumpLblocks3D (int_t nsupers, gridinfo3d_t grid3d, Glu_persist_t Glu_persist, sLocalLU_t *Llu)
	Dump the factored matrix L using matlab triple-let format. More...

void	psgssvx3d (superlu_dist_options_t options, SuperMatrix A, sScalePermstruct_t ScalePermstruct, float B[], int ldb, int nrhs, gridinfo3d_t grid3d, sLUstruct_t LUstruct, sSOLVEstruct_t SOLVEstruct, float berr, SuperLUStat_t stat, int *info)

Detailed Description

Solves a system of linear equations A*X=B using 3D process grid.

The source code is distributed under BSD license, see the file License.txt at the top-level directory.

   -- Distributed SuperLU routine (version 9.0) --
   Lawrence Berkeley National Lab, Georgia Institute of Technology,
   Oak Ridge National Lab
   May 12, 2021
   October 5, 2021
   Last update: November 8, 2021  v7.2.0
 */


//#include "TRF3dV100/superlu_summit.h"

// #include "psdistribute3d.h"

// #include "sssvx3dAux.c"

// int_t sgatherAllFactoredLU3d( strf3Dpartition_t*  trf3Dpartition,
//             sLUstruct_t* LUstruct, gridinfo3d_t* grid3d, SCT_t* SCT );


/*!


      Purpose
   =======

   PSGSSVX3D solves a system of linear equations A*X=B,
   by using Gaussian elimination with "static pivoting" to
   compute the LU factorization of A.

   Static pivoting is a technique that combines the numerical stability
   of partial pivoting with the scalability of Cholesky (no pivoting),
   to run accurately and efficiently on large numbers of processors.
   See our paper at http://www.nersc.gov/~xiaoye/SuperLU/ for a detailed
   description of the parallel algorithms.

   The input matrices A and B are distributed by block rows.
   Here is a graphical illustration (0-based indexing):

                          A                B
                 0 ---------------       ------
                     |           |        |  |
                     |           |   P0   |  |
                     |           |        |  |
                   ---------------       ------
fst_row->|           |        |  |
          |          |           |        |  |
         m_loc       |           |   P1   |  |
          |          |           |        |  |
         |           |        |  |
                   ---------------       ------
                     |    .      |        |. |
                     |    .      |        |. |
                     |    .      |        |. |
                   ---------------       ------

   where, fst_row is the row number of the first row,
          m_loc is the number of rows local to this processor
   These are defined in the 'SuperMatrix' structure, see supermatrix.h.


   Here are the options for using this code:

     1. Independent of all the other options specified below, the
        user must supply

        -  B, the matrix of right-hand sides, distributed by block rows,
              and its dimensions ldb (local) and nrhs (global)

 grid, a structure describing the 2D processor mesh
options->IterRefine, which determines whether or not to
              improve the accuracy of the computed solution using
              iterative refinement

        On output, B is overwritten with the solution X.

     2. Depending on options->Fact, the user has four options
        for solving A*X=B. The standard option is for factoring
        A "from scratch". (The other options, described below,
        are used when A is sufficiently similar to a previously
        solved problem to save time by reusing part or all of
        the previous factorization.)

        -  options->Fact = DOFACT: A is factored "from scratch"

        In this case the user must also supply

          o  A, the input matrix

          as well as the following options to determine what matrix to
          factorize.

          o  options->Equil,   to specify how to scale the rows and columns
                               of A to "equilibrate" it (to try to reduce its
                               condition number and so improve the
                               accuracy of the computed solution)

          o  options->RowPerm, to specify how to permute the rows of A
                               (typically to control numerical stability)

          o  options->ColPerm, to specify how to permute the columns of A
                               (typically to control fill-in and enhance
                               parallelism during factorization)

          o  options->ReplaceTinyPivot, to specify how to deal with tiny
                               pivots encountered during factorization
                               (to control numerical stability)

        The outputs returned include

          o  ScalePermstruct,  modified to describe how the input matrix A
                               was equilibrated and permuted:
            .  ScalePermstruct->DiagScale, indicates whether the rows and/or
                                           columns of A were scaled
            .  ScalePermstruct->R, array of row scale factors
            .  ScalePermstruct->C, array of column scale factors
            .  ScalePermstruct->perm_r, row permutation vector
            .  ScalePermstruct->perm_c, column permutation vector

            (part of ScalePermstruct may also need to be supplied on input,
             depending on options->RowPerm and options->ColPerm as described
             later).

          o  A, the input matrix A overwritten by the scaled and permuted
                matrix diag(R)*A*diag(C)*Pc^T, where
                Pc is the row permutation matrix determined by
                    ScalePermstruct->perm_c
                diag(R) and diag(C) are diagonal scaling matrices determined
                    by ScalePermstruct->DiagScale, ScalePermstruct->R and
                    ScalePermstruct->C

          o  LUstruct, which contains the L and U factorization of A1 where

                  A1 = Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U

                 (Note that A1 = Pc*Pr*Aout, where Aout is the matrix stored
                  in A on output.)

     3. The second value of options->Fact assumes that a matrix with the same
        sparsity pattern as A has already been factored:

        -  options->Fact = SamePattern: A is factored, assuming that it has
              the same nonzero pattern as a previously factored matrix. In
              this case the algorithm saves time by reusing the previously
              computed column permutation vector stored in
              ScalePermstruct->perm_c and the "elimination tree" of A
              stored in LUstruct->etree

        In this case the user must still specify the following options
        as before:

          o  options->Equil
          o  options->RowPerm
          o  options->ReplaceTinyPivot

        but not options->ColPerm, whose value is ignored. This is because the
        previous column permutation from ScalePermstruct->perm_c is used as
        input. The user must also supply

          o  A, the input matrix
          o  ScalePermstruct->perm_c, the column permutation
          o  LUstruct->etree, the elimination tree

        The outputs returned include

          o  A, the input matrix A overwritten by the scaled and permuted
                matrix as described above
          o  ScalePermstruct, modified to describe how the input matrix A was
                              equilibrated and row permuted
          o  LUstruct, modified to contain the new L and U factors

     4. The third value of options->Fact assumes that a matrix B with the same
        sparsity pattern as A has already been factored, and where the
        row permutation of B can be reused for A. This is useful when A and B
        have similar numerical values, so that the same row permutation
        will make both factorizations numerically stable. This lets us reuse
        all of the previously computed structure of L and U.

        -  options->Fact = SamePattern_SameRowPerm: A is factored,
              assuming not only the same nonzero pattern as the previously
              factored matrix B, but reusing B's row permutation.

        In this case the user must still specify the following options
        as before:

          o  options->Equil
          o  options->ReplaceTinyPivot

        but not options->RowPerm or options->ColPerm, whose values are
        ignored. This is because the permutations from ScalePermstruct->perm_r
        and ScalePermstruct->perm_c are used as input.

        The user must also supply

          o  A, the input matrix
          o  ScalePermstruct->DiagScale, how the previous matrix was row
                                         and/or column scaled
          o  ScalePermstruct->R, the row scalings of the previous matrix,
                                 if any
          o  ScalePermstruct->C, the columns scalings of the previous matrix,
                                 if any
          o  ScalePermstruct->perm_r, the row permutation of the previous
                                      matrix
          o  ScalePermstruct->perm_c, the column permutation of the previous
                                      matrix
          o  all of LUstruct, the previously computed information about
                              L and U (the actual numerical values of L and U
                              stored in LUstruct->Llu are ignored)

        The outputs returned include

          o  A, the input matrix A overwritten by the scaled and permuted
                matrix as described above
          o  ScalePermstruct,  modified to describe how the input matrix A was
                               equilibrated (thus ScalePermstruct->DiagScale,
                               R and C may be modified)
          o  LUstruct, modified to contain the new L and U factors

     5. The fourth and last value of options->Fact assumes that A is
        identical to a matrix that has already been factored on a previous
        call, and reuses its entire LU factorization

        -  options->Fact = Factored: A is identical to a previously
              factorized matrix, so the entire previous factorization
              can be reused.

        In this case all the other options mentioned above are ignored
        (options->Equil, options->RowPerm, options->ColPerm,
         options->ReplaceTinyPivot)

        The user must also supply

          o  A, the unfactored matrix, only in the case that iterative
                refinment is to be done (specifically A must be the output
                A from the previous call, so that it has been scaled and permuted)
          o  all of ScalePermstruct
          o  all of LUstruct, including the actual numerical values of
             L and U

        all of which are unmodified on output.

   Arguments
   =========

   options (input) superlu_dist_options_t* (global)
           The structure defines the input parameters to control
           how the LU decomposition will be performed.
           The following fields should be defined for this structure:

           o Fact (fact_t)
             Specifies whether or not the factored form of the matrix
             A is supplied on entry, and if not, how the matrix A should
             be factorized based on the previous history.

             = DOFACT: The matrix A will be factorized from scratch.
                   Inputs:  A
                            options->Equil, RowPerm, ColPerm, ReplaceTinyPivot
                   Outputs: modified A
                               (possibly row and/or column scaled and/or
                                permuted)
                            all of ScalePermstruct
                            all of LUstruct

             = SamePattern: the matrix A will be factorized assuming
               that a factorization of a matrix with the same sparsity
               pattern was performed prior to this one. Therefore, this
               factorization will reuse column permutation vector
               ScalePermstruct->perm_c and the elimination tree
               LUstruct->etree
                   Inputs:  A
                            options->Equil, RowPerm, ReplaceTinyPivot
                            ScalePermstruct->perm_c
                            LUstruct->etree
                   Outputs: modified A
                               (possibly row and/or column scaled and/or
                                permuted)
                            rest of ScalePermstruct (DiagScale, R, C, perm_r)
                            rest of LUstruct (GLU_persist, Llu)

             = SamePattern_SameRowPerm: the matrix A will be factorized
               assuming that a factorization of a matrix with the same
               sparsity pattern and similar numerical values was performed
               prior to this one. Therefore, this factorization will reuse
               both row and column scaling factors R and C, and the
               both row and column permutation vectors perm_r and perm_c,
               distributed data structure set up from the previous symbolic
               factorization.
                   Inputs:  A
                            options->Equil, ReplaceTinyPivot
                            all of ScalePermstruct
                            all of LUstruct
                   Outputs: modified A
                               (possibly row and/or column scaled and/or
                                permuted)
                            modified LUstruct->Llu
             = FACTORED: the matrix A is already factored.
                   Inputs:  all of ScalePermstruct
                            all of LUstruct

           o Equil (yes_no_t)
             Specifies whether to equilibrate the system.
             = NO:  no equilibration.
             = YES: scaling factors are computed to equilibrate the system:
                        diag(R)*A*diag(C)*inv(diag(C))*X = diag(R)*B.
                    Whether or not the system will be equilibrated depends
                    on the scaling of the matrix A, but if equilibration is
                    used, A is overwritten by diag(R)*A*diag(C) and B by
                    diag(R)*B.

           o RowPerm (rowperm_t)
             Specifies how to permute rows of the matrix A.
             = NATURAL:   use the natural ordering.
             = LargeDiag_MC64: use the Duff/Koster algorithm to permute rows of
                          the original matrix to make the diagonal large
                          relative to the off-diagonal.
             = LargeDiag_HPWM: use the parallel approximate-weight perfect
                          matching to permute rows of the original matrix
                          to make the diagonal large relative to the
                          off-diagonal.
             = MY_PERMR:  use the ordering given in ScalePermstruct->perm_r
                          input by the user.

           o ColPerm (colperm_t)
             Specifies what type of column permutation to use to reduce fill.
             = NATURAL:       natural ordering.
             = MMD_AT_PLUS_A: minimum degree ordering on structure of A'+A.
             = MMD_ATA:       minimum degree ordering on structure of A'*A.
             = MY_PERMC:      the ordering given in ScalePermstruct->perm_c.

           o ReplaceTinyPivot (yes_no_t)
             = NO:  do not modify pivots
             = YES: replace tiny pivots by sqrt(epsilon)*norm(A) during
                    LU factorization.

           o IterRefine (IterRefine_t)
             Specifies how to perform iterative refinement.
             = NO:     no iterative refinement.
             = SLU_DOUBLE: accumulate residual in double precision.
             = SLU_EXTRA:  accumulate residual in extra precision.

           NOTE: all options must be indentical on all processes when
                 calling this routine.

   A (input) SuperMatrix* (local); A resides on all 3D processes.
           On entry, matrix A in A*X=B, of dimension (A->nrow, A->ncol).
             The number of linear equations is A->nrow. The type of A must be:
             Stype = SLU_NR_loc; Dtype = SLU_S; Mtype = SLU_GE.
             That is, A is stored in distributed compressed row format.
             See supermatrix.h for the definition of 'SuperMatrix'.
             This routine only handles square A, however, the LU factorization
             routine PSGSTRF can factorize rectangular matrices.

       Internally, A is gathered on 2D processs grid-0, call it A2d.
           On exit, A2d may be overwtirren by diag(R)*A*diag(C)*Pc^T,
             depending on ScalePermstruct->DiagScale and options->ColPerm:
               if ScalePermstruct->DiagScale != NOEQUIL, A2d is overwritten by
                  diag(R)*A*diag(C).
               if options->ColPerm != NATURAL, A2d is further overwritten by
                  diag(R)*A*diag(C)*Pc^T.
             If all the above condition are true, the LU decomposition is
             performed on the matrix Pc*Pr*diag(R)*A*diag(C)*Pc^T.

   ScalePermstruct (input/output) sScalePermstruct_t* (global)
           The data structure to store the scaling and permutation vectors
           describing the transformations performed to the matrix A.
           It contains the following fields:

           o DiagScale (DiagScale_t)
             Specifies the form of equilibration that was done.
             = NOEQUIL: no equilibration.
             = ROW:     row equilibration, i.e., A was premultiplied by
                        diag(R).
             = COL:     Column equilibration, i.e., A was postmultiplied
                        by diag(C).
             = BOTH:    both row and column equilibration, i.e., A was
                        replaced by diag(R)*A*diag(C).
             If options->Fact = FACTORED or SamePattern_SameRowPerm,
             DiagScale is an input argument; otherwise it is an output
             argument.

           o perm_r (int*)
             Row permutation vector, which defines the permutation matrix Pr;
             perm_r[i] = j means row i of A is in position j in Pr*A.
             If options->RowPerm = MY_PERMR, or
             options->Fact = SamePattern_SameRowPerm, perm_r is an
             input argument; otherwise it is an output argument.

           o perm_c (int*)
             Column permutation vector, which defines the
             permutation matrix Pc; perm_c[i] = j means column i of A is
             in position j in A*Pc.
             If options->ColPerm = MY_PERMC or options->Fact = SamePattern
             or options->Fact = SamePattern_SameRowPerm, perm_c is an
             input argument; otherwise, it is an output argument.
             On exit, perm_c may be overwritten by the product of the input
             perm_c and a permutation that postorders the elimination tree
             of Pc*A'*A*Pc'; perm_c is not changed if the elimination tree
             is already in postorder.

           o R (float *) dimension (A->nrow)
             The row scale factors for A.
             If DiagScale = ROW or BOTH, A is multiplied on the left by
                            diag(R).
             If DiagScale = NOEQUIL or COL, R is not defined.
             If options->Fact = FACTORED or SamePattern_SameRowPerm, R is
             an input argument; otherwise, R is an output argument.

           o C (float *) dimension (A->ncol)
             The column scale factors for A.
             If DiagScale = COL or BOTH, A is multiplied on the right by
                            diag(C).
             If DiagScale = NOEQUIL or ROW, C is not defined.
             If options->Fact = FACTORED or SamePattern_SameRowPerm, C is
             an input argument; otherwise, C is an output argument.

   B       (input/output) float* (local)
           On entry, the right-hand side matrix of dimension (m_loc, nrhs),
             where, m_loc is the number of rows stored locally on my
             process and is defined in the data structure of matrix A.
           On exit, the solution matrix if info = 0;

   ldb     (input) int (local)
           The leading dimension of matrix B.

   nrhs    (input) int (global)
           The number of right-hand sides.
           If nrhs = 0, only LU decomposition is performed, the forward
           and back substitutions are skipped.

   grid    (input) gridinfo_t* (global)
           The 2D process mesh. It contains the MPI communicator, the number
           of process rows (NPROW), the number of process columns (NPCOL),
           and my process rank. It is an input argument to all the
           parallel routines.
           Grid can be initialized by subroutine SUPERLU_GRIDINIT.
           See superlu_sdefs.h for the definition of 'gridinfo_t'.

   LUstruct (input/output) sLUstruct_t*
           The data structures to store the distributed L and U factors.
           It contains the following fields:

           o etree (int*) dimension (A->ncol) (global)
             Elimination tree of Pc*(A'+A)*Pc' or Pc*A'*A*Pc'.
             It is computed in sp_colorder() during the first factorization,
             and is reused in the subsequent factorizations of the matrices
             with the same nonzero pattern.
             On exit of sp_colorder(), the columns of A are permuted so that
             the etree is in a certain postorder. This postorder is reflected
             in ScalePermstruct->perm_c.
             NOTE:
             Etree is a vector of parent pointers for a forest whose vertices
             are the integers 0 to A->ncol-1; etree[root]==A->ncol.

           o Glu_persist (Glu_persist_t*) (global)
             Global data structure (xsup, supno) replicated on all processes,
             describing the supernode partition in the factored matrices
             L and U:
           xsup[s] is the leading column of the s-th supernode,
               supno[i] is the supernode number to which column i belongs.

           o Llu (sLocalLU_t*) (local)
             The distributed data structures to store L and U factors.
             See superlu_ddefs.h for the definition of 'dLocalLU_t'.

   SOLVEstruct (input/output) sSOLVEstruct_t*
           The data structure to hold the communication pattern used
           in the phases of triangular solution and iterative refinement.
           This pattern should be intialized only once for repeated solutions.
           If options->SolveInitialized = YES, it is an input argument.
           If options->SolveInitialized = NO and nrhs != 0, it is an output
           argument. See superlu_sdefs.h for the definition of 'sSOLVEstruct_t'.

   berr    (output) float*, dimension (nrhs) (global)
           The componentwise relative backward error of each solution
           vector X(j) (i.e., the smallest relative change in
           any element of A or B that makes X(j) an exact solution).

   stat   (output) SuperLUStat_t*
          Record the statistics on runtime and floating-point operation count.
          See util_dist.h for the definition of 'SuperLUStat_t'.

   info    (output) int*
           = 0: successful exit
           < 0: if info = -i, the i-th argument had an illegal value
           > 0: if info = i, and i is
               <= A->ncol: U(i,i) is exactly zero. The factorization has
                  been completed, but the factor U is exactly singular,
                  so the solution could not be computed.
               > A->ncol: number of bytes allocated when memory allocation
                  failure occurred, plus A->ncol.

   See superlu_sdefs.h for the definitions of varioous data types.

Macro Definition Documentation

◆ EPSILON

#define EPSILON 1e-3

Function Documentation

◆ psgssvx3d()

void psgssvx3d	(	superlu_dist_options_t *	options,
		SuperMatrix *	A,
		sScalePermstruct_t *	ScalePermstruct,
		float	B[],
		int	ldb,
		int	nrhs,
		gridinfo3d_t *	grid3d,
		sLUstruct_t *	LUstruct,
		sSOLVEstruct_t *	SOLVEstruct,
		float *	berr,
		SuperLUStat_t *	stat,
		int *	info
	)

Here is the caller graph for this function:

◆ scheckArr()

static int scheckArr	(	float *	A,
		float *	B,
		int	n
	)

static

Here is the caller graph for this function:

◆ scheckLUFromDisk()

int scheckLUFromDisk	(	int	nsupers,
		int_t *	xsup,
		sLUstruct_t *	LUstruct
	)

Here is the call graph for this function:

Here is the caller graph for this function:

◆ sDumpLblocks3D()

void sDumpLblocks3D	(	int_t	nsupers,
		gridinfo3d_t *	grid3d,
		Glu_persist_t *	Glu_persist,
		sLocalLU_t *	Llu
	)

Dump the factored matrix L using matlab triple-let format.

◆ swriteLUtoDisk()

int swriteLUtoDisk	(	int	nsupers,
		int_t *	xsup,
		sLUstruct_t *	LUstruct
	)

Here is the caller graph for this function:

Macros

Functions

Detailed Description

Macro Definition Documentation

◆ EPSILON

Function Documentation

◆ psgssvx3d()

◆ scheckArr()

◆ scheckLUFromDisk()

◆ sDumpLblocks3D()

◆ swriteLUtoDisk()