void HPL_pdfact( HPL_T_panel * PANEL );
HPL_pdfact recursively factorizes a 1-dimensional panel of columns. The RPFACT function pointer specifies the recursive algorithm to be used, either Crout, Left- or Right looking. NBMIN allows to vary the recursive stopping criterium in terms of the number of columns in the panel, and NDIV allow to specify the number of subpanels each panel should be divided into. Usuallly a value of 2 will be chosen. Finally PFACT is a function pointer specifying the non-recursive algorithm to to be used on at most NBMIN columns. One can also choose here between Crout, Left- or Right looking. Empirical tests seem to indicate that values of 4 or 8 for NBMIN give the best results.
Bi-directional exchange is used to perform the swap::broadcast operations at once for one column in the panel. This results in a lower number of slightly larger messages than usual. On P processes and assuming bi-directional links, the running time of this function can be approximated by (when N is equal to N0):
N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
N0^2 * ( M - N0/3 ) * gam2-3
where M is the local number of rows of the panel, lat and bdwth are the latency and bandwidth of the network for double precision real words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS rate of execution. The recursive algorithm allows indeed to almost achieve Level 3 BLAS performance in the panel factorization. On a large number of modern machines, this operation is however latency bound, meaning that its cost can be estimated by only the latency portion N0 * log_2(P) * lat. Mono-directional links will double this communication cost.
- PANEL (local input/output) HPL_T_panel *
On entry, PANEL points to the data structure containing the panel information.
HPL_dlocmax (3), HPL_dlocswpN (3), HPL_dlocswpT (3), HPL_pdmxswp (3), HPL_pdpancrN (3), HPL_pdpancrT (3), HPL_pdpanllN (3), HPL_pdpanllT (3), HPL_pdpanrlN (3), HPL_pdpanrlT (3), HPL_pdrpancrN (3), HPL_pdrpancrT (3), HPL_pdrpanllN (3), HPL_pdrpanllT (3), HPL_pdrpanrlN (3), HPL_pdrpanrlT (3).
HPL_dlocmax(3), HPL_dlocswpN(3), HPL_dlocswpT(3), HPL_pdgesv0(3), HPL_pdgesvK1(3), HPL_pdgesvK2(3), HPL_pdmxswp(3), HPL_pdrpancrN(3), HPL_pdrpancrT(3), HPL_pdrpanllN(3), HPL_pdrpanllT(3), HPL_pdrpanrlN(3), HPL_pdrpanrlT(3).