Package hpl-doc

HPL documentation

http://www.netlib.org/benchmark/hpl/

Library Functions (Section 3)
HPL_abort
HPL_abort displays an error message on stderr and halts execution.
HPL_all_reduce
HPL_all_reduce performs a global reduce operation across all processes of a group leaving the results on all processes.
HPL_barrier
HPL_barrier blocks the caller until all process members have called it. The call returns at any process only after all group members have entered the call.
HPL_bcast
HPL_bcast broadcasts the current panel. Successful completion is indicated by IFLAG set to HPL_SUCCESS on return. IFLAG will be set to HPL_FAILURE on failure...
HPL_binit
HPL_binit initializes a row broadcast. Successful completion is indicated by the returned error code HPL_SUCCESS.
HPL_broadcast
HPL_broadcast broadcasts a message from the process with rank ROOT to all processes in the group.
HPL_bwait
HPL_bwait waits for the row broadcast of the current panel to terminate. Successful completion is indicated by the returned error code HPL_SUCCESS.
HPL_copyL
HPL_copyL copies the panel of columns, the L1 replicated submatrix, the pivot array and the info scalar into a contiguous workspace for later broadcast. The...
HPL_daxpy
HPL_daxpy scales the vector x by alpha and adds it to y.
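As a rough illustration of the operation described for HPL_daxpy (y := alpha * x + y), here is a minimal reference loop in C. The name ref_daxpy and its unit-stride signature are assumptions made for this sketch; they are not HPL's actual interface.

```c
#include <stddef.h>

/* Hypothetical reference routine, not part of HPL:
 * computes y[i] += alpha * x[i] for i in [0, n). */
void ref_daxpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

With x = {1, 2, 3}, y = {10, 20, 30} and alpha = 2, the call leaves y = {12, 24, 36}.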
HPL_dcopy
HPL_dcopy copies the vector x into the vector y.
HPL_dgemm
HPL_dgemm performs one of the matrix-matrix operations C := alpha * op( A ) * op( B ) + beta * C where op( X ) is one of op( X ) = X or op( X ) = X^T. Alpha and...
HPL_dgemv
HPL_dgemv performs one of the matrix-vector operations y := alpha * op( A ) * x + beta * y, where op( X ) is one of op( X ) = X or op( X ) = X^T. Alpha...
HPL_dger
HPL_dger performs the rank 1 operation A := alpha * x * y^T + A, where alpha is a scalar, x is an m-element vector, y is an n-element vector and A is an m by n...
HPL_dlacpy
HPL_dlacpy copies an array A into an array B.
HPL_dlamch
HPL_dlamch determines machine-specific arithmetic constants such as the relative machine precision (eps), the safe minimum (sfmin) such that 1 / sfmin does not...
HPL_dlange
HPL_dlange returns the value of the one norm, or the infinity norm, or the element of largest absolute value of a matrix A: max(abs(A(i,j))) when NORM =...
HPL_dlaprnt
HPL_dlaprnt prints to standard error an M-by-N matrix A.
HPL_dlaswp00N
HPL_dlaswp00N performs a series of local row interchanges on a matrix A. One row interchange is initiated for rows 0 through M-1 of A.
HPL_dlaswp01N
HPL_dlaswp01N copies scattered rows of A into itself and into an array U. The row offsets in A of the source rows are specified by LINDXA. The destination of...
HPL_dlaswp01T
HPL_dlaswp01T copies scattered rows of A into itself and into an array U. The row offsets in A of the source rows are specified by LINDXA. The destination of...
HPL_dlaswp02N
HPL_dlaswp02N packs scattered rows of an array A into workspace W. The row offsets in A are specified by LINDXA.
HPL_dlaswp03N
HPL_dlaswp03N copies columns of W into rows of an array U. The destination in U of these columns contained in W is stored within W0.
HPL_dlaswp03T
HPL_dlaswp03T copies columns of W into an array U. The destination in U of these columns contained in W is stored within W0.
HPL_dlaswp04N
HPL_dlaswp04N copies M0 rows of U into A and replaces those rows of U with columns of W. In addition M1 - M0 columns of W are copied into rows of U.
HPL_dlaswp04T
HPL_dlaswp04T copies M0 columns of U into rows of A and replaces those columns of U with columns of W. In addition M1 - M0 columns of W are copied into U.
HPL_dlaswp05N
HPL_dlaswp05N copies rows of U of global offset LINDXAU into rows of A at positions indicated by LINDXA.
HPL_dlaswp05T
HPL_dlaswp05T copies columns of U of global offset LINDXAU into rows of A at positions indicated by LINDXA.
HPL_dlaswp06N
HPL_dlaswp06N swaps rows of U with rows of A at positions indicated by LINDXA.
HPL_dlaswp06T
HPL_dlaswp06T swaps columns of U with rows of A at positions indicated by LINDXA.
HPL_dlaswp10N
HPL_dlaswp10N performs a sequence of local column interchanges on a matrix A. One column interchange is initiated for columns 0 through N-1 of A.
HPL_dlatcpy
HPL_dlatcpy copies the transpose of an array A into an array B.
HPL_dlocmax
HPL_dlocmax finds the maximum entry in the current column and packs the useful information in WORK[0:3]. On exit, WORK[0] contains the local maximum absolute...
HPL_dlocswpN
HPL_dlocswpN performs the local swapping operations within a panel. The lower triangular N0-by-N0 upper block of the panel is stored in no-transpose form (i.e...
HPL_dlocswpT
HPL_dlocswpT performs the local swapping operations within a panel. The lower triangular N0-by-N0 upper block of the panel is stored in transpose form.
HPL_dmatgen
HPL_dmatgen generates (or regenerates) a random matrix A. The pseudo-random generator uses the linear congruential algorithm: X(n+1) = (a * X(n) + c) mod m as...
HPL_dscal
HPL_dscal scales the vector x by alpha.
HPL_dswap
HPL_dswap swaps the vectors x and y.
HPL_dtrsm
HPL_dtrsm solves one of the matrix equations op( A ) * X = alpha * B, or X * op( A ) = alpha * B, where alpha is a scalar, X and B are m by n matrices, A is a...
HPL_dtrsv
HPL_dtrsv solves one of the systems of equations A * x = b, or A^T * x = b, where b and x are n-element vectors and A is an n by n non-unit, or unit, upper or...
HPL_equil
HPL_equil equilibrates the local pieces of U, so that on exit from this function, pieces of U contained in every process row are of the same size. This phase...
HPL_fprintf
HPL_fprintf is a wrapper around fprintf flushing the output stream.
HPL_grid_exit
HPL_grid_exit marks the process grid object for deallocation. The returned error code MPI_SUCCESS indicates successful completion. Other error codes are (MPI)...
HPL_grid_info
HPL_grid_info returns the grid shape and the coordinates in the grid of the calling process. Successful completion is indicated by the returned error code...
HPL_grid_init
HPL_grid_init creates a NPROW x NPCOL process grid using column- or row-major ordering from an initial collection of processes identified by an MPI...
HPL_idamax
HPL_idamax returns the index in an n-vector x of the first element having maximum absolute value.
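The "first element having maximum absolute value" rule matters when several entries tie. A small sketch of that rule follows; ref_idamax is a hypothetical helper written for this note (0-based result, -1 for an empty vector) and does not reproduce HPL's signature or return convention.

```c
#include <stddef.h>

/* Hypothetical helper, not HPL's interface: returns the 0-based index of the
 * first element of x[0..n-1] with the largest absolute value, or -1 if n == 0. */
long ref_idamax(size_t n, const double *x)
{
    if (n == 0) {
        return -1;
    }
    size_t best = 0;
    double best_abs = x[0] < 0.0 ? -x[0] : x[0];
    for (size_t i = 1; i < n; i++) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        /* Strict comparison keeps the earliest index on ties. */
        if (a > best_abs) {
            best_abs = a;
            best = i;
        }
    }
    return (long)best;
}
```

For x = {1, -5, 5, 2} the sketch returns 1, because -5 and 5 tie in magnitude and the earlier one wins.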
HPL_indxg2l
HPL_indxg2l computes the local index of a matrix entry pointed to by the global index IG. This local returned index is the same in all processes.
HPL_indxg2lp
HPL_indxg2lp computes the local index of a matrix entry pointed to by the global index IG as well as the process coordinate which possesses this entry. The local...
HPL_indxg2p
HPL_indxg2p computes the process coordinate which possesses the entry of a matrix specified by a global index IG.
HPL_indxl2g
HPL_indxl2g computes the global index of a matrix entry pointed to by the local index IL of the process indicated by PROC.
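The global/local index conversions above can be sketched as follows, assuming a one-dimensional block-cyclic distribution with block size nb over nprocs processes and the first block owned by process 0. The function names and parameter lists are hypothetical and simpler than HPL's own.

```c
/* Hypothetical helpers for a 1-D block-cyclic layout (first block on process 0).
 * All indices are 0-based. */

/* Process coordinate owning global index ig. */
int g2p_owner(int ig, int nb, int nprocs)
{
    return (ig / nb) % nprocs;
}

/* Local index of global index ig; the value does not depend on the caller. */
int g2l_index(int ig, int nb, int nprocs)
{
    return nb * (ig / (nb * nprocs)) + ig % nb;
}

/* Global index of local index il held by process proc. */
int l2g_index(int il, int proc, int nb, int nprocs)
{
    return ((il / nb) * nprocs + proc) * nb + il % nb;
}
```

With nb = 2 and nprocs = 3, global index 5 lives on process 2 at local index 1, and mapping (1, 2) back yields 5 again.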
HPL_infog2l
HPL_infog2l computes the starting local index II, JJ corresponding to the submatrix starting globally at the entry pointed to by I, J. This routine returns the...
HPL_jumpit
HPL_jumpit jumps in the random sequence from the number X(n) encoded in IRANN to the number X(m) encoded in IRANM using the constants A and C encoded in MULT...
HPL_ladd
HPL_ladd adds without carry two long positive integers K and J and puts the result into I. The long integers I, J, K are encoded on 64 bits using an array of 2...
HPL_lmul
HPL_lmul multiplies without carry two long positive integers K and J and puts the result into I. The long integers I, J, K are encoded on 64 bits using an array...
HPL_logsort
HPL_logsort computes an array IPMAP and its inverse IPMAPM1 that contain the logarithmically sorted process ids with respect to the local number of rows of U that...
HPL_max
HPL_max combines (max) two buffers.
HPL_min
HPL_min combines (min) two buffers.
HPL_numroc
HPL_numroc returns the local number of matrix rows/columns process PROC will get if we give out N rows/columns starting from global index 0.
HPL_numrocI
HPL_numrocI returns the local number of matrix rows/columns process PROC will get if we give out N rows/columns starting from global index I.
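A minimal sketch of the counting behind the two numroc entries, assuming a one-dimensional block-cyclic distribution with block size nb over nprocs processes and the first block on process 0. The names count_from_zero and count_from are hypothetical; the second simply subtracts two prefix counts to handle a starting index other than 0.

```c
/* Hypothetical helpers, not HPL's interface. */

/* Number of the first n global rows/columns (indices 0..n-1) owned by proc. */
int count_from_zero(int n, int nb, int proc, int nprocs)
{
    int nblocks = n / nb;                  /* number of full blocks */
    int count = (nblocks / nprocs) * nb;   /* full rounds over all processes */
    int extra = nblocks % nprocs;          /* full blocks left after the rounds */
    if (proc < extra) {
        count += nb;                       /* one more full block */
    } else if (proc == extra) {
        count += n % nb;                   /* the trailing partial block, if any */
    }
    return count;
}

/* Number of the n global rows/columns starting at index i owned by proc. */
int count_from(int i, int n, int nb, int proc, int nprocs)
{
    return count_from_zero(i + n, nb, proc, nprocs)
         - count_from_zero(i, nb, proc, nprocs);
}
```

For example, with 7 rows, nb = 2 and 3 processes, process 0 holds rows 0, 1 and 6, so count_from_zero(7, 2, 0, 3) is 3.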
HPL_pabort
HPL_pabort displays an error message on stderr and halts execution.
HPL_packL
HPL_packL forms the MPI data type for the panel to be broadcast. Successful completion is indicated by the returned error code MPI_SUCCESS.
HPL_pddriver
main is the main driver program for testing the HPL routines. This program is driven by a short data file named "HPL.dat".
HPL_pdfact
HPL_pdfact recursively factorizes a 1-dimensional panel of columns. The RPFACT function pointer specifies the recursive algorithm to be used, either Crout...
HPL_pdgesv
HPL_pdgesv factors an N-by-N+1 matrix using LU factorization with row partial pivoting. The main algorithm is the "right looking" variant with or without...
HPL_pdgesv0
HPL_pdgesv0 factors an N-by-N+1 matrix using LU factorization with row partial pivoting. The main algorithm is the "right looking" variant without look-ahead...
HPL_pdgesvK1
HPL_pdgesvK1 factors an N-by-N+1 matrix using LU factorization with row partial pivoting. The main algorithm is the "right looking" variant with look-ahead. The...
HPL_pdgesvK2
HPL_pdgesvK2 factors an N-by-N+1 matrix using LU factorization with row partial pivoting. The main algorithm is the "right looking" variant with look-ahead. The...
HPL_pdinfo
HPL_pdinfo reads the startup information for the various tests and transmits it to all processes.
HPL_pdlamch
HPL_pdlamch determines machine-specific arithmetic constants such as the relative machine precision (eps), the safe minimum (sfmin) such that 1 / sfmin does not...
HPL_pdlange
HPL_pdlange returns the value of the one norm, or the infinity norm, or the element of largest absolute value of a distributed matrix A: max(abs(A(i,j))) when...
HPL_pdlaprnt
HPL_pdlaprnt prints to standard error a distributed matrix A. The local pieces of A are sent to the process of coordinates (0,0) in the grid and then printed.
HPL_pdlaswp00N
HPL_pdlaswp00N applies the NB row interchanges to NN columns of the trailing submatrix and broadcasts a column panel. Bi-directional exchange is used to perform...
HPL_pdlaswp00T
HPL_pdlaswp00T applies the NB row interchanges to NN columns of the trailing submatrix and broadcasts a column panel. Bi-directional exchange is used to perform...
HPL_pdlaswp01N
HPL_pdlaswp01N applies the NB row interchanges to NN columns of the trailing submatrix and broadcasts a column panel. A "Spread then roll" algorithm performs the...
HPL_pdlaswp01T
HPL_pdlaswp01T applies the NB row interchanges to NN columns of the trailing submatrix and broadcasts a column panel. A "Spread then roll" algorithm performs the...
HPL_pdmatgen
HPL_pdmatgen generates (or regenerates) a parallel random matrix A. The pseudo-random generator uses the linear congruential algorithm: X(n+1) = (a * X(n) + c)...
HPL_pdmxswp
HPL_pdmxswp swaps and broadcasts the absolute value max row using bi-directional exchange. The buffer is partially set by HPL_dlocmax. Bi-directional exchange...
HPL_pdpancrN
HPL_pdpancrN factorizes a panel of columns that is a sub-array of a larger one-dimensional panel A using the Crout variant of the usual one-dimensional...
HPL_pdpancrT
HPL_pdpancrT factorizes a panel of columns that is a sub-array of a larger one-dimensional panel A using the Crout variant of the usual one-dimensional...
HPL_pdpanel_disp
HPL_pdpanel_disp deallocates the panel structure and resources and stores the error code returned by the panel factorization.
HPL_pdpanel_free
HPL_pdpanel_free deallocates the panel resources and stores the error code returned by the panel factorization.
HPL_pdpanel_init
HPL_pdpanel_init initializes a panel data structure.
HPL_pdpanel_new
HPL_pdpanel_new creates and initializes a panel data structure.
HPL_pdpanllN
HPL_pdpanllN factorizes a panel of columns that is a sub-array of a larger one-dimensional panel A using the Left-looking variant of the usual one-dimensional...
HPL_pdpanllT
HPL_pdpanllT factorizes a panel of columns that is a sub-array of a larger one-dimensional panel A using the Left-looking variant of the usual one-dimensional...
HPL_pdpanrlN
HPL_pdpanrlN factorizes a panel of columns that is a sub-array of a larger one-dimensional panel A using the Right-looking variant of the usual one-dimensional...
HPL_pdpanrlT
HPL_pdpanrlT factorizes a panel of columns that is a sub-array of a larger one-dimensional panel A using the Right-looking variant of the usual one-dimensional...
HPL_pdrpancrN
HPL_pdrpancrN recursively factorizes a panel of columns using the recursive Crout variant of the usual one-dimensional algorithm. The lower triangular N0-by-N0...
HPL_pdrpancrT
HPL_pdrpancrT recursively factorizes a panel of columns using the recursive Crout variant of the usual one-dimensional algorithm. The lower triangular N0-by-N0...
HPL_pdrpanllN
HPL_pdrpanllN recursively factorizes a panel of columns using the recursive Left-looking variant of the one-dimensional algorithm. The lower triangular N0-by-N0...
HPL_pdrpanllT
HPL_pdrpanllT recursively factorizes a panel of columns using the recursive Left-looking variant of the one-dimensional algorithm. The lower triangular N0-by-N0...
HPL_pdrpanrlN
HPL_pdrpanrlN recursively factorizes a panel of columns using the recursive Right-looking variant of the one-dimensional algorithm. The lower triangular...
HPL_pdrpanrlT
HPL_pdrpanrlT recursively factorizes a panel of columns using the recursive Right-looking variant of the one-dimensional algorithm. The lower triangular...
HPL_pdtest
HPL_pdtest performs one test given a set of parameters such as the process grid, the problem size, the distribution blocking factor ... This function generates...
HPL_pdtrsv
HPL_pdtrsv solves an upper triangular system of linear equations. The rhs is the last column of the N by N+1 matrix A. The solve starts in the process column...
HPL_pdupdateNN
HPL_pdupdateNN broadcasts (forwards) the panel PBCST and simultaneously applies the row interchanges and updates part of the trailing (using the panel PANEL)...
HPL_pdupdateNT
HPL_pdupdateNT broadcasts (forwards) the panel PBCST and simultaneously applies the row interchanges and updates part of the trailing (using the panel PANEL)...
HPL_pdupdateTN
HPL_pdupdateTN broadcasts (forwards) the panel PBCST and simultaneously applies the row interchanges and updates part of the trailing (using the panel PANEL)...
HPL_pdupdateTT
HPL_pdupdateTT broadcasts (forwards) the panel PBCST and simultaneously applies the row interchanges and updates part of the trailing (using the panel PANEL)...
HPL_perm
HPL_perm combines two index arrays and generates the corresponding permutation. First, this function computes the inverse of LINDXA, and then combines it with...
HPL_pipid
HPL_pipid computes an array IPID that contains the source and final destination of matrix rows resulting from the application of N interchanges as computed by...
HPL_plindx0
HPL_plindx0 computes two local arrays LINDXA and LINDXAU containing the local source and final destination position resulting from the application of row...
HPL_plindx1
HPL_plindx1 computes two local arrays LINDXA and LINDXAU containing the local source and final destination position resulting from the application of row...
HPL_plindx10
HPL_plindx10 computes three arrays IPLEN, IPMAP and IPMAPM1 that contain the logarithmic mapping information for the spreading phase.
HPL_pnum
HPL_pnum determines the rank of a process as a function of its coordinates in the grid.
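A sketch of how a rank could follow from grid coordinates under the two orderings mentioned for HPL_grid_init. The helper names and argument lists are hypothetical, and the formulas are the conventional row-major and column-major numberings rather than a transcription of HPL's code.

```c
/* Hypothetical helpers, not HPL's interface. Coordinates are 0-based. */

/* Row-major numbering: ranks increase along a process row first. */
int rank_row_major(int myrow, int mycol, int npcol)
{
    return myrow * npcol + mycol;
}

/* Column-major numbering: ranks increase down a process column first. */
int rank_col_major(int myrow, int mycol, int nprow)
{
    return mycol * nprow + myrow;
}
```

In a 2 x 3 grid, the process at coordinates (1, 0) gets rank 3 under row-major numbering and rank 1 under column-major numbering.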
HPL_ptimer
HPL_ptimer provides a "stopwatch" functionality cpu/wall timer in seconds. Up to 64 separate timers can be functioning at once. The first call starts the timer...
HPL_ptimer_cputime
HPL_ptimer_cputime returns the cpu time. If HPL_USE_CLOCK is defined, the clock() function is used to return an approximation of processor time used by the...
HPL_ptimer_walltime
HPL_ptimer_walltime returns the elapsed (wall-clock) time.
HPL_pwarn
HPL_pwarn displays an error message.
HPL_rand
HPL_rand generates the next number in the random sequence. This function ensures that this number lies in the interval (-0.5, 0.5]. The static array irand...
HPL_recv
HPL_recv is a simple wrapper around MPI_Recv. Its main purpose is to allow for some experimentation / tuning of this simple routine. Successful completion is...
HPL_reduce
HPL_reduce performs a global reduce operation across all processes of a group. Note that the input buffer is used as workarray and in all processes but the...
HPL_rollN
HPL_rollN rolls the local arrays containing the local pieces of U, so that on exit from this function U is replicated in every process row. In addition, this...
HPL_rollT
HPL_rollT rolls the local arrays containing the local pieces of U, so that on exit from this function U is replicated in every process row. In addition, this...
HPL_sdrv
HPL_sdrv is a simple wrapper around MPI_Sendrecv. Its main purpose is to allow for some experimentation and tuning of this simple function. Messages of length...
HPL_send
HPL_send is a simple wrapper around MPI_Send. Its main purpose is to allow for some experimentation / tuning of this simple routine. Successful completion is...
HPL_setran
HPL_setran initializes the random generator with the encoding of the first number X(0) in the sequence, and the constants a and c used to compute the next...
HPL_spreadN
HPL_spreadN spreads the local array containing local pieces of U, so that on exit from this function, a piece of U is contained in every process row. The array...
HPL_spreadT
HPL_spreadT spreads the local array containing local pieces of U, so that on exit from this function, a piece of U is contained in every process row. The array...
HPL_sum
HPL_sum combines (sum) two buffers.
HPL_timer
HPL_timer provides a "stopwatch" functionality cpu/wall timer in seconds. Up to 64 separate timers can be functioning at once. The first call starts the timer...
HPL_timer_cputime
HPL_timer_cputime returns the cpu time. If HPL_USE_CLOCK is defined, the clock() function is used to return an approximation of processor time used by the...
HPL_timer_walltime
HPL_timer_walltime returns the elapsed (wall-clock) time.
HPL_warn
HPL_warn displays an error message.
HPL_xjumpm
HPL_xjumpm computes the constants A and C to jump JUMPM numbers in the random sequence: X(n+JUMPM) = A*X(n)+C. The constants encoded in MULT and IADD specify...
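The jump relation X(n+JUMPM) = A*X(n)+C can be illustrated with a short sketch that derives A and C by repeated squaring of the one-step map. It assumes a modulus of 2^64, so that unsigned 64-bit overflow performs the reduction, and the function names are hypothetical; HPL's own routines use a different integer encoding (see HPL_ladd and HPL_lmul) and their constants are not reproduced here.

```c
#include <stdint.h>

/* One step of the recurrence X(n+1) = a * X(n) + c (mod 2^64). */
uint64_t lcg_next(uint64_t x, uint64_t a, uint64_t c)
{
    return a * x + c;
}

/* Reference: advance k steps one at a time. */
uint64_t lcg_advance(uint64_t x, uint64_t a, uint64_t c, uint64_t k)
{
    while (k-- > 0) {
        x = lcg_next(x, a, c);
    }
    return x;
}

/* Compute A and C such that X(n+k) = A * X(n) + C (mod 2^64),
 * using O(log k) compositions of affine maps. */
void lcg_jump_constants(uint64_t a, uint64_t c, uint64_t k,
                        uint64_t *A, uint64_t *C)
{
    uint64_t acc_a = 1, acc_c = 0;  /* identity map: zero steps */
    uint64_t cur_a = a, cur_c = c;  /* map for 2^j steps, starting at j = 0 */
    while (k != 0) {
        if (k & 1) {
            /* Compose cur after acc. */
            acc_c = cur_a * acc_c + cur_c;
            acc_a = cur_a * acc_a;
        }
        /* Square cur: doubling the number of steps it represents. */
        cur_c = cur_a * cur_c + cur_c;
        cur_a = cur_a * cur_a;
        k >>= 1;
    }
    *A = acc_a;
    *C = acc_c;
}
```

For any seed x, A * x + C computed from lcg_jump_constants(a, c, k, &A, &C) should equal lcg_advance(x, a, c, k), which is a convenient self-check when experimenting with other constants.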