qgpureset man page

qgpureset — reset GPU error counts

Synopsis

qgpureset -H host -g gpuid -p -v

Description

The qgpureset command requests a MOM to reset the ECC error counts on one of its Nvidia GPUs. To do this, qgpureset sends a GPU Control batch request to the batch server.

Resetting GPU ECC error counts requires PBS Operator or Manager privilege. It also requires that Torque be configured with --enable-nvidia-gpu.

Options

-H host
Specifies the host within the cluster on which the GPU is located. The argument is the name of a host that is a member of the cluster of hosts managed by the server.
-g gpuid
Specifies the ID of the GPU.
-p
Specifies to reset the GPU's permanent ECC error count.
-v
Specifies to reset the GPU's volatile ECC error count.
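
The options above map directly onto an argument vector. The Python sketch below builds one from the documented flags only (-H, -g, -p, -v). The function name build_qgpureset_argv is hypothetical. Requiring at least one of -p or -v is an assumption, not documented behavior. The GPU ID is kept as a string because this page does not specify its format.

```python
from typing import List


def build_qgpureset_argv(host: str, gpu_id: str,
                         permanent: bool = False,
                         volatile: bool = False) -> List[str]:
    """Assemble a qgpureset command line using only -H, -g, -p and -v."""
    if not host:
        raise ValueError("host (-H) must be a non-empty host name")
    if not gpu_id:
        raise ValueError("gpu_id (-g) must be a non-empty GPU ID")
    # Assumption: asking to reset neither count is treated as a caller error.
    if not (permanent or volatile):
        raise ValueError("select permanent (-p), volatile (-v), or both")

    argv = ["qgpureset", "-H", host, "-g", gpu_id]
    if permanent:
        argv.append("-p")  # reset the permanent ECC error count
    if volatile:
        argv.append("-v")  # reset the volatile ECC error count
    return argv


if __name__ == "__main__":
    # "node01" and "0" are hypothetical values for illustration only.
    print(build_qgpureset_argv("node01", "0", volatile=True))
    # ['qgpureset', '-H', 'node01', '-g', '0', '-v']
```

Returning a list rather than a single string lets the caller pass it straight to subprocess.run without going through a shell, which avoids quoting problems with host names.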



Standard Error

The qgpureset command writes a diagnostic message to standard error for each error occurrence.

Exit Status

Upon successful processing of all the operands presented to the qgpureset command, the exit status will be a value of zero.

If the qgpureset command fails to process any operand, the command exits with a value greater than zero.
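
The exit-status and standard-error rules can be checked from a script. The following is a minimal Python sketch that assumes the command is run through subprocess. The names run_qgpureset and ResetResult are hypothetical. Treating each non-empty stderr line as one diagnostic is an assumption about the message layout.

```python
import subprocess
from typing import List, NamedTuple


class ResetResult(NamedTuple):
    ok: bool                 # True only when the exit status is zero
    exit_status: int         # raw exit status returned by the command
    diagnostics: List[str]   # non-empty lines the command wrote to stderr


def run_qgpureset(argv: List[str]) -> ResetResult:
    """Run a qgpureset command line and interpret its exit status.

    Zero means every operand was processed; any value greater than zero
    means at least one operand failed, with diagnostics on standard error.
    """
    completed = subprocess.run(argv, capture_output=True, text=True)
    diagnostics = [line for line in completed.stderr.splitlines() if line.strip()]
    return ResetResult(completed.returncode == 0, completed.returncode, diagnostics)
```

If qgpureset is not on the PATH, subprocess.run raises FileNotFoundError instead of returning an exit status, so a caller may want to handle that case separately.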

See Also

pbs_mom(8B) and pbs_server(8B)