pmdarocestat - Man Page

Performance Metrics Domain Agent (PMDA) for RoCE devices

Synopsis

$PCP_PMDAS_DIR/rocestat/pmdarocestat

Description

The Rocestat PMDA (Performance Metrics Domain Agent) is a Performance Co-Pilot (PCP) module that collects and exports performance statistics for RDMA over Converged Ethernet (RoCE) devices. It provides insights into network performance, error conditions, and congestion events, aiding in the diagnosis and monitoring of RoCE-based communication.

This PMDA reports software-aggregated InfiniBand port statistics, including received/transmitted bytes and packets, link errors, and congestion-related drops, helping to identify potential bottlenecks and failures. Additionally, it includes hardware-level counters, which track low-level transmission metrics, duplicate requests, NAKs, and physical/constraint errors, offering a deeper view into the underlying transport reliability and efficiency.
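This page does not say where the PMDA reads these counters from. As a rough cross-check on Linux, the kernel RDMA subsystem exposes per-port counters as one file per counter under /sys/class/infiniband/<device>/ports/<port>/counters and .../hw_counters. The sketch below prints each counter name and value from such a directory. The helper name dump_counters and the device name mlx5_0 are illustrative assumptions, not part of the PMDA.

```shell
# dump_counters DIR
# Print "name value" for every regular file in DIR (one counter per file).
# Hypothetical helper for manual cross-checking; not part of pmdarocestat.
dump_counters() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (or an unmatched glob).
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Example invocation (device and port are assumptions; adjust for your host):
#   dump_counters /sys/class/infiniband/mlx5_0/ports/1/counters
#   dump_counters /sys/class/infiniband/mlx5_0/ports/1/hw_counters
```

Comparing this output with the values reported by pminfo -f for the corresponding rocestat metrics can help confirm that the PMDA and the kernel agree.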

Furthermore, the Rocestat PMDA collects per-priority lane metrics from the output of ethtool -S <interface>, keeping only the statistics that relate to priority lanes carrying RoCE traffic. These metrics show how traffic is distributed across lanes, which helps diagnose congestion hotspots and rebalance workloads.
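The filtering described above can be approximated by hand. The sketch below is a minimal filter over ethtool -S output, assuming the driver names its per-priority counters with a prio<N> component (for example rx_prio0_bytes). Counter naming is driver-specific, and the function name filter_prio_stats and the interface name eth0 are illustrative assumptions; the exact pattern the PMDA uses is not stated here.

```shell
# filter_prio_stats
# Read `ethtool -S` output on stdin and keep only counters whose name
# contains "prio<N>". The naming pattern is an assumption and varies by driver.
filter_prio_stats() {
    grep -E '^[[:space:]]*[A-Za-z0-9_]*prio[0-9]+[A-Za-z0-9_]*:'
}

# Example invocation (eth0 is a placeholder interface name):
#   ethtool -S eth0 | filter_prio_stats
```

If the filter prints nothing, the driver probably uses a different naming scheme, and the pattern needs adjusting.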

By integrating Rocestat PMDA into a PCP monitoring environment, users can efficiently analyze RoCE network behavior, detect performance anomalies, and optimize high-speed RDMA workloads in data center and HPC environments.

Installation

To install the Rocestat PMDA, run its Install script as root:

# cd $PCP_PMDAS_DIR/rocestat
# ./Install

To verify that the PMDA is running, list its metrics together with their one-line help text:

$ pminfo -t rocestat

Usage

To list the names of all Rocestat metrics:

$ pminfo rocestat

To sample the values of a specific metric:

$ pmval rocestat.hw.rcv.port_rcv_packets
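For scripting, the current values can also be fetched once with pminfo -f and reduced to "instance value" pairs. The sketch below assumes the usual pminfo -f layout for metrics with instances, where each value line reads inst [N or "name"] value V. The helper name pminfo_values is hypothetical, and whether a given rocestat metric has instances is an assumption to check on your system.

```shell
# pminfo_values
# Read `pminfo -f` output on stdin and print "instance-name value" for each
# instance line of the form:   inst [0 or "name"] value 123
# Hypothetical helper; lines in any other format are ignored.
pminfo_values() {
    sed -n 's/^[[:space:]]*inst \[[0-9-]* or "\(.*\)"] value \(.*\)$/\1 \2/p'
}

# Example invocation:
#   pminfo -f rocestat.hw.rcv.port_rcv_packets | pminfo_values
```

For periodic sampling rather than a one-off fetch, pmval accepts -s for the number of samples and -t for the interval, for example pmval -s 5 -t 2sec rocestat.hw.rcv.port_rcv_packets.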

Files

$PCP_PMDAS_DIR/rocestat/Install

Installation script for Rocestat PMDA.

$PCP_PMDAS_DIR/rocestat/Remove

Uninstallation script.

$PCP_LOG_DIR/pmcd/rocestat.log

Log file for Rocestat PMDA events and errors.

PCP Environment

Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

See Also

PCPIntro(1), pmcd(1), pminfo(1) and PMDA(3).

Info

PCP Performance Co-Pilot General Commands Manual