fi_mon_sampler - Man Page
Simple sampler for ofi_hook_monitor provider.
Synopsis
fi_mon_sampler [OPTIONS] <target> sample from file(s) at <target>
Description
Extract data from the ofi_hook_monitor provider via communication files. <target> can either be one communication file or a folder of files. Data is exported based on -f <format> and either printed to stdout (only for single files), or stored per communication file at -o <outpath>. The sampler can watch the communication files for changes via the option -w <msec> for repeated sampling.
The name format of the output files is based on the ofi_hook_monitor provider and is as follows: <ppid>_<pid>_<sequential id>_<job id>_<provider name>. ppid and pid are taken from the perspective of the monitored application. In a batch environment running SLURM, job id is set to the SLURM job ID; otherwise it is set to 0.
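The naming scheme above can be taken apart programmatically. The following is a minimal sketch, not part of the sampler itself: the helper name `parse_output_name` and the example file names are hypothetical, and it assumes the four leading fields are decimal integers and that any file extension has already been stripped.

```python
from typing import NamedTuple


class MonitorFileName(NamedTuple):
    ppid: int
    pid: int
    seq_id: int
    job_id: int
    provider: str


def parse_output_name(name: str) -> MonitorFileName:
    """Split <ppid>_<pid>_<sequential id>_<job id>_<provider name>."""
    # Split on the first four underscores only, so a provider name that
    # itself contains underscores is kept in one piece.
    parts = name.split("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected file name: {name!r}")
    ppid, pid, seq_id, job_id, provider = parts
    return MonitorFileName(int(ppid), int(pid), int(seq_id), int(job_id), provider)


if __name__ == "__main__":
    # Hypothetical file name; job id 0 means "not running under SLURM".
    print(parse_output_name("4100_4101_0_0_tcp"))
```

Limiting the split to four separators is the main design choice here: the provider name is the last field, so everything after the fourth underscore is treated as belonging to it.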
How to Run
Launch a libfabric application with FI_HOOK=monitor to enable the ofi_hook_monitor provider. Adjust the monitor provider settings according to fi_hook(7).
Then launch the sampler via fi_mon_sampler -o <outpath> <target>. By default, the ofi_hook_monitor provider stores data at /dev/shm/ofi/<uid>/<hostname>.
The sampler will generate output files in the directory specified by <outpath>, one for each monitored provider.
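To check which communication files exist before starting the sampler, the default location can be assembled from the current user ID and host name. This is a rough sketch under assumptions: the helper names are hypothetical, and it assumes that the <hostname> component matches what `socket.gethostname()` returns and that the provider settings have not moved the directory.

```python
import os
import socket
from pathlib import Path
from typing import List, Optional


def default_monitor_dir(uid: Optional[int] = None,
                        hostname: Optional[str] = None) -> Path:
    """Build /dev/shm/ofi/<uid>/<hostname> for the given or current user/host."""
    if uid is None:
        uid = os.getuid()  # POSIX only
    if hostname is None:
        hostname = socket.gethostname()
    return Path("/dev/shm/ofi") / str(uid) / hostname


def list_comm_files(directory: Path) -> List[Path]:
    """Return the regular files in the directory, or [] if it does not exist."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    target = default_monitor_dir()
    print(target)
    for path in list_comm_files(target):
        print("  ", path.name)
```

An empty listing suggests that no application is currently running with FI_HOOK=monitor, or that the provider writes to a different location.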
Options
- -w <msec>
Watch files for changes, check every <msec> milliseconds.
- -f <format>
Output format. Currently only supports CSV.
- -o <outpath>
Output file path. Uses stdout if unset.
Usage Examples
Launch a libfabric application and enable the ofi_hook_monitor provider:
FI_HOOK=monitor fi_pingpong [OPTIONS]
Launch another fi_pingpong with the respective settings.
Finally, launch the sampler:
fi_mon_sampler -o $HOME -w 1000 -f csv /dev/shm/ofi/$UID/$HOSTNAME
Output
Output files will be generated in the folder specified by -o <outpath>.
In -f csv mode, this folder will contain a CSV file with data for all monitored libfabric functions. For each function, both the count and sum counters are exported, indicated by the column name suffixes _c and _s, respectively. In addition, each function is monitored separately for each data size bucket. Refer to fi_hook(7) for more details.
Example CSV output, first four columns, first three rows:
mon_recv_0_64_c,mon_recv_0_64_s,mon_recv_64_512_c,mon_recv_64_512_s
0,0,0,0
22529,0,0,0
113664,0,0,0
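The column naming convention makes the CSV straightforward to post-process. The sketch below is one possible approach rather than an official tool: the helper names are hypothetical, and it assumes every column follows the <function>_<bucket low>_<bucket high>_<c|s> pattern seen in the example, with integer values in each cell.

```python
import csv
import io
from typing import Dict, List, Tuple

# The example output shown above, reproduced as an in-memory CSV.
SAMPLE = """mon_recv_0_64_c,mon_recv_0_64_s,mon_recv_64_512_c,mon_recv_64_512_s
0,0,0,0
22529,0,0,0
113664,0,0,0
"""

Key = Tuple[str, str, str]  # (function, size bucket, counter kind)


def split_column(name: str) -> Key:
    """Split e.g. 'mon_recv_0_64_c' into ('mon_recv', '0_64', 'count')."""
    # Split from the right: the last three fields are bucket low, bucket
    # high, and the counter suffix; the rest is the function name.
    func, low, high, suffix = name.rsplit("_", 3)
    kind = {"c": "count", "s": "sum"}[suffix]
    return func, f"{low}_{high}", kind


def load_samples(text: str) -> List[Dict[Key, int]]:
    """Parse CSV text into one dictionary per row, keyed by split column name."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {split_column(col): int(val) for col, val in record.items()}
        for record in reader
    ]


if __name__ == "__main__":
    for row in load_samples(SAMPLE):
        print(row[("mon_recv", "0_64", "count")])
```

Splitting from the right keeps function names that contain underscores intact, since only the last three fields are treated as bucket bounds and suffix.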
See Also
fi_hook(7)
Authors
OpenFabrics.