pbs_gpureset man page

pbs_gpureset - reset GPU error counts

Synopsis

#include <pbs_error.h>
#include <pbs_ifl.h>

int pbs_gpureset(int connect, char *mom_node, int gpu_id, int ecc_perm, int ecc_vol)

Description

Issue a batch request for the pbs_mom to reset the ECC error counts on one of its NVIDIA GPUs. The counts are reset by sending a GPU Control batch request to the batch server over the connection connect, as returned by pbs_connect().

The argument, mom_node, specifies the host on which the GPU is located. It must be the name of a host that is a member of the cluster managed by the server.

The argument, gpu_id, specifies the ID of the GPU on the MOM node.

The argument, ecc_perm, specifies whether to reset the GPU's permanent ECC error count. A value of 1 resets the count; a value of 0 leaves it unchanged.

The argument, ecc_vol, specifies whether to reset the GPU's volatile ECC error count. A value of 1 resets the count; a value of 0 leaves it unchanged.

This call requires PBS Operator or Manager privilege. It also requires that Torque be configured with --enable-nvidia-gpu.
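As a minimal sketch of how the ecc_perm and ecc_vol flags described above combine, the two helpers below clear either the volatile count alone or both counts. The helper names (reset_volatile_only, reset_all_counts) are hypothetical, and the pbs_gpureset() defined here is only a stand-in that copies the Synopsis signature and records its flags so the fragment compiles without Torque. It is not the library implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the library call, using the signature from the Synopsis.
 * It only records the flags it receives. With Torque installed, remove
 * this stub and include <pbs_error.h> and <pbs_ifl.h> instead. */
int last_ecc_perm = -1;
int last_ecc_vol = -1;

int pbs_gpureset(int connect, char *mom_node, int gpu_id,
                 int ecc_perm, int ecc_vol)
{
    (void)connect;
    (void)mom_node;
    (void)gpu_id;
    last_ecc_perm = ecc_perm;
    last_ecc_vol = ecc_vol;
    return 0;
}

/* Hypothetical helper: reset only the volatile ECC error count. */
int reset_volatile_only(int connect, char *mom_node, int gpu_id)
{
    assert(mom_node != NULL);
    return pbs_gpureset(connect, mom_node, gpu_id, 0, 1);
}

/* Hypothetical helper: reset both the permanent and volatile counts. */
int reset_all_counts(int connect, char *mom_node, int gpu_id)
{
    assert(mom_node != NULL);
    return pbs_gpureset(connect, mom_node, gpu_id, 1, 1);
}
```

In a real program the connect argument would come from pbs_connect(), and the caller would need Operator or Manager privilege for the request to succeed.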

See Also

qgpureset(1B)

Diagnostics

When the batch request generated by the pbs_gpureset() function has been completed successfully by the batch server, the routine returns 0 (zero). Otherwise, a nonzero error number is returned. The error number is also set in pbs_errno.
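The return convention above can be checked with a small wrapper, sketched here under stated assumptions. The wrapper name reset_gpu_checked is hypothetical. The pbs_gpureset() and pbs_errno defined in the block are stand-ins so the fragment compiles without Torque: the stub fails for a negative connection or GPU ID and uses the placeholder value 1, which is not a real PBS error code.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the library symbols. With Torque installed, remove them
 * and include <pbs_error.h> and <pbs_ifl.h> instead. */
int pbs_errno = 0;

int pbs_gpureset(int connect, char *mom_node, int gpu_id,
                 int ecc_perm, int ecc_vol)
{
    (void)mom_node;
    (void)ecc_perm;
    (void)ecc_vol;
    if (connect < 0 || gpu_id < 0) {
        pbs_errno = 1;          /* placeholder, not a real PBS error code */
        return pbs_errno;
    }
    pbs_errno = 0;
    return 0;
}

/* Hypothetical wrapper: resets both counts and reports any failure.
 * Returns 0 on success, otherwise the nonzero error from the call. */
int reset_gpu_checked(int connect, char *mom_node, int gpu_id)
{
    int rc;

    assert(mom_node != NULL);
    rc = pbs_gpureset(connect, mom_node, gpu_id, 1, 1);
    if (rc != 0) {
        fprintf(stderr, "pbs_gpureset failed on %s gpu %d: rc=%d pbs_errno=%d\n",
                mom_node, gpu_id, rc, pbs_errno);
    }
    return rc;
}
```

Testing the return value first and reading pbs_errno only on failure follows the order in which this section describes the two values.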

Info

3B Local