SuperLU Distributed 8.2.1
Distributed memory sparse direct solver
get_perm_c_parmetis.c File Reference

Gets matrix permutation. More...

#include <limits.h>
#include <math.h>
#include "superlu_dist_config.h"
#include "parmetis.h"
#include "superlu_ddefs.h"
Include dependency graph for get_perm_c_parmetis.c:

Functions

static float a_plus_at_CompRow_loc (int, int_t *, int, int_t *, int_t, int_t *, int_t *, int, int_t *, int_t *, int_t **, int_t **, gridinfo_t *)
 
float get_perm_c_parmetis (SuperMatrix *A, int_t *perm_r, int_t *perm_c, int nprocs_i, int noDomains, int_t **sizes, int_t **fstVtxSep, gridinfo_t *grid, MPI_Comm *metis_comm)
 

Detailed Description

Gets matrix permutation.

Copyright (c) 2003, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from U.S. Dept. of Energy)

All rights reserved.

The source code is distributed under BSD license, see the file License.txt at the top-level directory.

-- Distributed symbolic factorization auxialiary routine  (version 2.1) --
Lawrence Berkeley National Lab, Univ. of California Berkeley - July 2003
INRIA France - January 2004
Laura Grigori

November 1, 2007

Function Documentation

◆ a_plus_at_CompRow_loc()

static float a_plus_at_CompRow_loc ( int  iam,
int_t perm_r,
int  nprocs_i,
int_t vtxdist_i,
int_t  n,
int_t rowptr,
int_t colind,
int  nprocs_o,
int_t vtxdist_o,
int_t p_bnz,
int_t **  p_b_rowptr,
int_t **  p_b_colind,
gridinfo_t grid 
)
static
Purpose
=======

Form the structure of Pr*A +A'Pr'. A is an n-by-n matrix in
NRformat_loc format, represented by (rowptr, colind). The output
B=Pr*A +A'Pr' is in NRformat_loc format (symmetrically, also row
oriented), represented by (b_rowptr, b_colind).

The input matrix A is distributed in block row format on nprocs_i
processors.  The output matrix B is distributed in block row format
on nprocs_o processors, where nprocs_o <= nprocs_i.  On output, the
matrix B has its rows permuted according to perm_r.

Sketch of the algorithm
=======================

Let iam by my process number.  Let fst_row, lst_row = m_loc +
fst_row be the first/last row stored on iam.

Compute Pr' - the inverse row permutation, stored in iperm_r.

Compute the transpose  of the block row of Pr*A that iam owns:
   T[:,Pr(fst_row:lst_row)] = Pr' * A[:,fst_row:lst_row] * Pr'


All to all communication such that every processor iam receives all
the blocks of the transpose matrix that it needs, that is
          T[fst_row:lst_row, :]

Compute B = A[fst_row:lst_row, :] + T[fst_row:lst_row, :]

If Pr != I or nprocs_i != nprocs_o then permute the rows of B (that
is compute Pr*B) and redistribute from nprocs_i to nprocs_o
according to the block row distribution in vtxdist_i, vtxdist_o.
Here is the call graph for this function:
Here is the caller graph for this function:

◆ get_perm_c_parmetis()

float get_perm_c_parmetis ( SuperMatrix A,
int_t perm_r,
int_t perm_c,
int  nprocs_i,
int  noDomains,
int_t **  sizes,
int_t **  fstVtxSep,
gridinfo_t grid,
MPI_Comm *  metis_comm 
)
Purpose
=======

GET_PERM_C_PARMETIS obtains a permutation matrix Pc, by applying a
graph partitioning algorithm to the symmetrized graph A+A'.  The
multilevel graph partitioning algorithm used is the
ParMETIS_V3_NodeND routine available in the parallel graph
partitioning package parMETIS.  

The number of independent sub-domains noDomains computed by this
algorithm has to be a power of 2.  Hence noDomains is the larger
number power of 2 that is smaller than nprocs_i, where nprocs_i = nprow
* npcol is the number of processors used in SuperLU_DIST.

Arguments
=========

A       (input) SuperMatrix*
        Matrix A in A*X=B, of dimension (A->nrow, A->ncol). The number
        of the linear equations is A->nrow.  Matrix A is distributed
        in NRformat_loc format.

perm_r  (input) int_t*
        Row permutation vector of size A->nrow, which defines the 
        permutation matrix Pr; perm_r[i] = j means row i of A is in 
        position j in Pr*A.

perm_c  (output) int_t*
    Column permutation vector of size A->ncol, which defines the 
        permutation matrix Pc; perm_c[i] = j means column i of A is 
        in position j in A*Pc.

nprocs_i (input) int*
        Number of processors the input matrix is distributed on in a block
        row format.  It corresponds to number of processors used in
        SuperLU_DIST.

noDomains (input) int*, must be power of 2
        Number of independent domains to be computed by the graph
        partitioning algorithm.  ( noDomains <= nprocs_i )

sizes   (output) int_t**, of size 2 * noDomains
        Returns pointer to an array containing the number of nodes
        for each sub-domain and each separator.  Separators are stored 
        from left to right.
        Memory for the array is allocated in this routine.

fstVtxSep (output) int_t**, of size 2 * noDomains
        Returns pointer to an array containing first node for each
        sub-domain and each separator.
        Memory for the array is allocated in this routine.

Return value
============
  < 0, number of bytes allocated on return from the symbolic factorization.
  > 0, number of bytes allocated when out of memory.
Here is the call graph for this function: