pmdarocestat - Man Page
Performance Metrics Domain Agent (PMDA) for RoCE devices
Synopsis
$PCP_PMDAS_DIR/rocestat/pmdarocestat
Description
The Rocestat PMDA (Performance Metrics Domain Agent) is a Performance Co-Pilot (PCP) module that collects and exports performance statistics for RDMA over Converged Ethernet (RoCE) devices. It provides insights into network performance, error conditions, and congestion events, aiding in the diagnosis and monitoring of RoCE-based communication.
This PMDA reports software-aggregated InfiniBand port statistics, including received/transmitted bytes and packets, link errors, and congestion-related drops, helping to identify potential bottlenecks and failures. Additionally, it includes hardware-level counters, which track low-level transmission metrics, duplicate requests, NAKs, and physical/constraint errors, offering a deeper view into the underlying transport reliability and efficiency.
Furthermore, the Rocestat PMDA collects priority-based lane metrics from ethtool -S <interface>, keeping only the statistics that relate to priority lanes in RoCE traffic. These metrics show how traffic is distributed across lanes, which helps in diagnosing congestion hotspots and balancing workloads between lanes.
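The kind of filtering described above can be approximated by hand. The following is a minimal sketch, not the PMDA's actual implementation: the function name filter_prio_counters is hypothetical, and the pattern assumes that per-priority counters contain "prio<N>_" in their names (for example rx_prio0_bytes), which depends on the NIC driver.

```shell
#!/bin/sh
# Keep only per-priority counters from `ethtool -S`-style output on stdin.
# Assumption: priority counter names contain "prio<N>_" (driver dependent).
filter_prio_counters() {
    # Select matching lines, then strip the leading indentation.
    grep -E 'prio[0-9]+_' | sed -e 's/^[[:space:]]*//'
}

# Example (needs ethtool and a real interface name in place of eth0):
#   ethtool -S eth0 | filter_prio_counters
```

Comparing this output with the values the PMDA exports can help confirm which driver counters are available on a given interface.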
By integrating Rocestat PMDA into a PCP monitoring environment, users can efficiently analyze RoCE network behavior, detect performance anomalies, and optimize high-speed RDMA workloads in data center and HPC environments.
Installation
To install the Rocestat PMDA, run the following as root:
# cd $PCP_PMDAS_DIR/rocestat
# ./Install
To verify that the PMDA is running:
$ pminfo -t rocestat
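As an additional check, pmcd reports a per-agent status through the pmcd.agent.status metric, where 0 generally indicates an active agent. The helper below is a sketch that extracts the value for one agent from pminfo -f output. The function name agent_status is hypothetical, and it assumes the agent is registered under the instance name "rocestat".

```shell
#!/bin/sh
# Print the status value of one agent, reading the output of
# `pminfo -f pmcd.agent.status` on stdin.
# Expected line shape:   inst [<id> or "<name>"] value <n>
agent_status() {
    # $1: agent (instance) name, assumed here to be "rocestat"
    awk -v name="\"$1\"]" '$1 == "inst" && $4 == name { print $6 }'
}

# Example (needs a running pmcd):
#   pminfo -f pmcd.agent.status | agent_status rocestat
```

An empty result means no instance with that name was found, which suggests the PMDA is not registered with pmcd.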
Usage
To list the available Rocestat metrics:
$ pminfo rocestat
To retrieve specific metric values:
$ pmval rocestat.hw.rcv.port_rcv_packets
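By default pmval keeps sampling until interrupted. For a bounded run, the -s (sample count) and -t (interval) options of pmval(1) can be used. The wrapper below is only a convenience sketch: the name sample_metric and its defaults of 5 samples at 2-second intervals are arbitrary choices, not part of the PMDA.

```shell
#!/bin/sh
# Sample one metric a fixed number of times at a fixed interval.
sample_metric() {
    # $1: metric name
    # $2: number of samples (default 5)
    # $3: interval in seconds (default 2)
    pmval -s "${2:-5}" -t "${3:-2}" "$1"
}

# Example (needs a running pmcd with the Rocestat PMDA installed):
#   sample_metric rocestat.hw.rcv.port_rcv_packets 10 1
```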
Files
- $PCP_PMDAS_DIR/rocestat/Install
Installation script for Rocestat PMDA.
- $PCP_PMDAS_DIR/rocestat/Remove
Uninstallation script.
- $PCP_LOG_DIR/pmcd/rocestat.log
Log file for Rocestat PMDA events and errors.
PCP Environment
Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).
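To find where a path such as $PCP_PMDAS_DIR or $PCP_LOG_DIR points on a given host, the value can be read from the configuration file. This is a minimal sketch that assumes the file holds plain NAME=value lines; the function name pcp_conf_get is hypothetical.

```shell
#!/bin/sh
# Print the value of one PCP_ variable from a pcp.conf-style file.
pcp_conf_get() {
    # $1: variable name, e.g. PCP_PMDAS_DIR
    # $2: config file (default: $PCP_CONF if set, else /etc/pcp.conf)
    sed -n "s/^$1=//p" "${2:-${PCP_CONF:-/etc/pcp.conf}}"
}

# Example:
#   pcp_conf_get PCP_PMDAS_DIR
#   pcp_conf_get PCP_LOG_DIR
```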
See Also
PCPIntro(1), pmcd(1), pminfo(1) and PMDA(3).