HPL_pdrpancrN - Man Page

Crout recursive panel factorization.

Synopsis

#include "hpl.h"
void HPL_pdrpancrN( HPL_T_panel * PANEL, const int M, const int N, const int ICOFF, double * WORK );

Description

HPL_pdrpancrN recursively factorizes a panel of columns using the recursive Crout variant of the usual one-dimensional algorithm. The lower triangular N0-by-N0 upper block of the panel is stored in no-transpose form (i.e. just like the input matrix itself).
Bi-directional exchange is used to perform the swap::broadcast operations at once for one column in the panel. This results in a lower number of slightly larger messages than usual. On P processes and assuming bi-directional links, the running time of this function, when N is equal to N0, can be approximated by:

  N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
  N0^2 * ( M - N0/3 ) * gam2-3
where M is the local number of rows of the panel, lat and bdwth are the latency and bandwidth of the network for double precision real words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS rate of execution. The recursive algorithm makes it possible to come close to Level 3 BLAS performance in the panel factorization. On a large number of modern machines, however, this operation is latency bound, meaning that its cost can be estimated by the latency portion alone, N0 * log_2(P) * lat. Mono-directional links will double this communication cost.
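
The estimate above can be evaluated with a few lines of C. The following is only a sketch and is not part of HPL: the function names are hypothetical, lat is assumed to be in seconds, bdwth in double precision words per second, and gam2-3 is treated as a time per floating point operation because the formula multiplies it by an operation count. To avoid depending on the math library, log_2(P) is rounded up to a whole number of exchange rounds, which matches the formula exactly only when P is a power of two.

```c
/* Sketch of the HPL_pdrpancrN running-time estimate (hypothetical helpers,
 * not part of the HPL API). */

/* Smallest k such that 2^k >= nprocs. Equal to log_2(P) when P is a power
 * of two; rounding up for other values of P is an assumption of this sketch. */
static int exchange_rounds(int nprocs)
{
    int rounds = 0;
    long span = 1;

    while (span < nprocs) {
        span *= 2;
        rounds++;
    }
    return rounds;
}

/* Communication term: N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth).
 * lat is in seconds, bdwth in double precision words per second (assumed).
 * Mono-directional links double the cost, as stated in the description. */
double panel_comm_time(int n0, int nprocs, double lat, double bdwth,
                       int bidirectional)
{
    double t = (double)n0 * exchange_rounds(nprocs)
             * (lat + (2.0 * n0 + 4.0) / bdwth);

    return bidirectional ? t : 2.0 * t;
}

/* Computation term: N0^2 * (M - N0/3) * gam2-3.
 * gam23 is taken as seconds per floating point operation (assumed). */
double panel_comp_time(int m, int n0, double gam23)
{
    return (double)n0 * n0 * ((double)m - n0 / 3.0) * gam23;
}

/* Latency-bound approximation: N0 * log_2(P) * lat. */
double panel_latency_bound(int n0, int nprocs, double lat)
{
    return (double)n0 * exchange_rounds(nprocs) * lat;
}
```

The total estimate is the sum of panel_comm_time() and panel_comp_time(). Comparing panel_latency_bound() with panel_comm_time() for a given machine shows how much of the communication cost comes from latency alone.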

Arguments

PANEL   (local input/output)    HPL_T_panel *

On entry,  PANEL  points to the data structure containing the panel information.

M       (local input)           const int

On entry,  M specifies the local number of rows of sub(A).

N       (local input)           const int

On entry,  N specifies the local number of columns of sub(A).

ICOFF   (global input)          const int

On entry, ICOFF specifies the row and column offset of sub(A) in A.

WORK    (local workspace)       double *

On entry, WORK is a work array of size at least 2*(4+2*N0).
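
As an illustration of the size requirement above, the sketch below computes the minimum length of WORK and allocates it. The helper names are hypothetical and not part of HPL, and the size is assumed to be counted in double precision elements, since WORK is declared as double *.

```c
#include <stdlib.h>

/* Minimum number of elements required in WORK: 2*(4+2*N0).
 * Hypothetical helper; assumes n0 >= 0 and that the size is counted
 * in doubles. */
size_t panel_work_len(int n0)
{
    return 2u * (4u + 2u * (size_t)n0);
}

/* Allocate a workspace of the minimum size. Returns NULL on failure;
 * the caller releases the buffer with free(). */
double *panel_work_alloc(int n0)
{
    return malloc(panel_work_len(n0) * sizeof(double));
}
```

For N0 = 32, for instance, panel_work_len() returns 136.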

See Also

HPL_dlocmax(3), HPL_dlocswpN(3), HPL_dlocswpT(3), HPL_pdmxswp(3), HPL_pdpancrN(3), HPL_pdpancrT(3), HPL_pdpanllN(3), HPL_pdpanllT(3), HPL_pdpanrlN(3), HPL_pdpanrlT(3), HPL_pdrpancrT(3), HPL_pdrpanllN(3), HPL_pdrpanllT(3), HPL_pdrpanrlN(3), HPL_pdrpanrlT(3), HPL_pdfact(3).

Referenced By

HPL_dlocmax(3), HPL_dlocswpN(3), HPL_dlocswpT(3), HPL_pdfact(3), HPL_pdmxswp(3), HPL_pdrpancrT(3), HPL_pdrpanllN(3), HPL_pdrpanllT(3), HPL_pdrpanrlN(3), HPL_pdrpanrlT(3).

February 24, 2016 HPL 2.2 HPL Library Functions