|
SuperLU Distributed 9.0.0
gpu3d
|
Gets matrix permutation. More...

Functions | |
| static float | a_plus_at_CompRow_loc (int, int_t *, int, int_t *, int_t, int_t *, int_t *, int, int_t *, int_t *, int_t **, int_t **, gridinfo_t *) |
| float | get_perm_c_parmetis (SuperMatrix *A, int_t *perm_r, int_t *perm_c, int nprocs_i, int noDomains, int_t **sizes, int_t **fstVtxSep, gridinfo_t *grid, MPI_Comm *metis_comm) |
Gets matrix permutation.
Copyright (c) 2003, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from U.S. Dept. of Energy)
All rights reserved.
The source code is distributed under BSD license, see the file License.txt at the top-level directory.
-- Distributed symbolic factorization auxialiary routine (version 2.1) -- Lawrence Berkeley National Lab, Univ. of California Berkeley - July 2003 INRIA France - January 2004 Laura Grigori November 1, 2007
|
static |
Purpose ======= Form the structure of Pr*A +A'Pr'. A is an n-by-n matrix in NRformat_loc format, represented by (rowptr, colind). The output B=Pr*A +A'Pr' is in NRformat_loc format (symmetrically, also row oriented), represented by (b_rowptr, b_colind). The input matrix A is distributed in block row format on nprocs_i processors. The output matrix B is distributed in block row format on nprocs_o processors, where nprocs_o <= nprocs_i. On output, the matrix B has its rows permuted according to perm_r. Sketch of the algorithm ======================= Let iam by my process number. Let fst_row, lst_row = m_loc + fst_row be the first/last row stored on iam. Compute Pr' - the inverse row permutation, stored in iperm_r. Compute the transpose of the block row of Pr*A that iam owns: T[:,Pr(fst_row:lst_row)] = Pr' * A[:,fst_row:lst_row] * Pr' All to all communication such that every processor iam receives all the blocks of the transpose matrix that it needs, that is T[fst_row:lst_row, :] Compute B = A[fst_row:lst_row, :] + T[fst_row:lst_row, :] If Pr != I or nprocs_i != nprocs_o then permute the rows of B (that is compute Pr*B) and redistribute from nprocs_i to nprocs_o according to the block row distribution in vtxdist_i, vtxdist_o.


| float get_perm_c_parmetis | ( | SuperMatrix * | A, |
| int_t * | perm_r, | ||
| int_t * | perm_c, | ||
| int | nprocs_i, | ||
| int | noDomains, | ||
| int_t ** | sizes, | ||
| int_t ** | fstVtxSep, | ||
| gridinfo_t * | grid, | ||
| MPI_Comm * | metis_comm | ||
| ) |
Purpose
=======
GET_PERM_C_PARMETIS obtains a permutation matrix Pc, by applying a
graph partitioning algorithm to the symmetrized graph A+A'. The
multilevel graph partitioning algorithm used is the
ParMETIS_V3_NodeND routine available in the parallel graph
partitioning package parMETIS.
The number of independent sub-domains noDomains computed by this
algorithm has to be a power of 2. Hence noDomains is the larger
number power of 2 that is smaller than nprocs_i, where nprocs_i = nprow
* npcol is the number of processors used in SuperLU_DIST.
Arguments
=========
A (input) SuperMatrix*
Matrix A in A*X=B, of dimension (A->nrow, A->ncol). The number
of the linear equations is A->nrow. Matrix A is distributed
in NRformat_loc format.
perm_r (input) int_t*
Row permutation vector of size A->nrow, which defines the
permutation matrix Pr; perm_r[i] = j means row i of A is in
position j in Pr*A.
perm_c (output) int_t*
Column permutation vector of size A->ncol, which defines the
permutation matrix Pc; perm_c[i] = j means column i of A is
in position j in A*Pc.
nprocs_i (input) int*
Number of processors the input matrix is distributed on in a block
row format. It corresponds to number of processors used in
SuperLU_DIST.
noDomains (input) int*, must be power of 2
Number of independent domains to be computed by the graph
partitioning algorithm. ( noDomains <= nprocs_i )
sizes (output) int_t**, of size 2 * noDomains
Returns pointer to an array containing the number of nodes
for each sub-domain and each separator. Separators are stored
from left to right.
Memory for the array is allocated in this routine.
fstVtxSep (output) int_t**, of size 2 * noDomains
Returns pointer to an array containing first node for each
sub-domain and each separator.
Memory for the array is allocated in this routine.
Return value
============
< 0, number of bytes allocated on return from the symbolic factorization.
> 0, number of bytes allocated when out of memory.

