void HPL_pdlaswp01T( HPL_T_panel * PBCST, int * IFLAG, HPL_T_panel * PANEL, const int NN );
HPL_pdlaswp01T applies the NB row interchanges to NN columns of the trailing submatrix and broadcasts a column panel.
A "spread then roll" algorithm performs the swap and the broadcast of the row panel U in a single step, which minimizes the communication volume and makes very good use of the network connectivity where it is available. With P process rows and assuming bi-directional links, the running time of this function can be approximated by:
(log_2(P)+(P-1)) * lat + K * NB * LocQ(N) / bdwth
where NB is the number of rows of the row panel U, N is the global number of columns being updated, LocQ(N) is the number of those columns held locally, and lat and bdwth are the latency and bandwidth of the network for double precision real words. K is a constant in (2,3] that depends on the bandwidth achieved during a simultaneous message exchange between two processes. An empirical, optimistic value of K is 2.4.
- PBCST (local input/output) HPL_T_panel *
On entry, PBCST points to the data structure containing the panel (to be broadcast) information.
- IFLAG (local input/output) int *
On entry, IFLAG indicates whether or not the broadcast has already been completed. If not, probing will occur, and the outcome will be contained in IFLAG on exit.
- PANEL (local input/output) HPL_T_panel *
On entry, PANEL points to the data structure containing the panel information.
- NN (local input) const int
On entry, NN specifies the local number of columns of the trailing submatrix to be swapped and broadcast starting at the current position. NN must be at least zero.
HPL_pdgesv(3), HPL_pdgesvK2(3), HPL_pdupdateNT(3), HPL_pdupdateTT(3), HPL_pipid(3), HPL_plindx1(3), HPL_plindx10(3), HPL_spreadT(3), HPL_equil(3), HPL_rollT(3), HPL_dlaswp10N(3), HPL_dlaswp01T(3), HPL_dlaswp06T(3).
HPL_equil(3), HPL_logsort(3), HPL_pdupdateNT(3), HPL_pdupdateTT(3), HPL_perm(3), HPL_pipid(3), HPL_plindx0(3), HPL_plindx1(3), HPL_plindx10(3), HPL_rollT(3), HPL_spreadT(3).