pcp2arrow - Man Page

pcp-to-arrow metrics exporter

Synopsis

pcp2arrow [-jLnrRVz?] [-8|-9 limit] [-a archive] [-A align] [--archive-folio folio] [-c config] [--container container] [-h host] [-i instances] [-J rank] [-K spec] [-o outfile] [-O origin] [-s samples] [-S starttime] [-t interval] [-T endtime] [-Z timezone] [metricspec...]

Description

pcp2arrow is a customizable performance metrics exporter tool from PCP to Apache Arrow. It is particularly useful as a mechanism for producing the Parquet columnar data format, for use with Pandas or similar data analysis modules. Each PCP metric, and each instance of each metric, will form a unique column named according to the PCP metric specification - that is, metric name followed by square bracket enclosed instance name (for metrics with an instance domain).
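The column naming described above can be sketched in Python as follows. The helper names column_name and split_column are hypothetical and are not part of pcp2arrow, and the instance name sda is only an illustrative value; only the name[instance] form comes from the description.

```python
def column_name(metric, instance=None):
    """Return 'metric[instance]', or just 'metric' when there is no instance."""
    return metric if instance is None else f"{metric}[{instance}]"


def split_column(column):
    """Split a column name back into (metric, instance-or-None)."""
    if column.endswith("]") and "[" in column:
        # Metric names contain no '[', so the first '[' starts the instance.
        metric, _, rest = column.partition("[")
        return metric, rest[:-1]
    return column, None


print(column_name("disk.dev.read", "sda"))
print(column_name("kernel.all.sysfork"))
print(split_column("disk.dev.read[sda]"))
```

A helper like split_column may be handy when grouping the columns of an exported table by metric in a data analysis session.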

Any available performance metric, live or archived, system and/or application, can be selected for exporting using either command line arguments or a configuration file.

With no metricspec options, all available metrics are considered for exporting.

pcp2arrow is a close relative of pmrep(1). Refer to pmrep(1) for the metricspec description accepted on the pcp2arrow command line. See pmrep.conf(5) for a description of the pcp2arrow.conf configuration file syntax. This page describes pcp2arrow-specific options and configuration file differences from pmrep.conf(5). pmrep(1) also lists some usage examples, most of which are applicable to pcp2arrow as well.

Only the command line options listed on this page are supported; other options available for pmrep(1) are not supported.

Options via environment values (see pmGetOptions(3)) override the corresponding built-in default values (if any). Configuration file options override the corresponding environment variables (if any). Command line options override the corresponding configuration file options (if any).
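The precedence order above can be modelled with a layered lookup. This is only a sketch of the ordering, not of how pcp2arrow stores its options; the option values shown are made up for illustration.

```python
from collections import ChainMap

# Illustrative values only; interval and samples are options named on this page.
builtin_defaults = {"interval": "1", "samples": "0"}
environment = {"interval": "5"}
config_file = {"interval": "10", "samples": "60"}
command_line = {"samples": "3"}

# ChainMap searches left to right, so the highest-priority layer goes first.
effective = ChainMap(command_line, config_file, environment, builtin_defaults)

print(dict(effective))
```

Here interval comes from the configuration file, because it overrides the environment, and samples comes from the command line, because it overrides the configuration file.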

Configuration File

pcp2arrow uses a configuration file with syntax described in pmrep.conf(5). The following options are common with pmrep.conf: version, source, speclocal, derived, header, globals, samples, interval, type, type_prefer, ignore_incompat, names_change, instances, live_filter, rank, limit_filter, limit_filter_force, invert_filter, predicate, omit_flat, include_labels, precision, precision_force, count_scale, count_scale_force, space_scale, space_scale_force, time_scale, time_scale_force. The rest of the pmrep.conf options are recognized but ignored for compatibility.

Options

The available command line options are:

-8 limit, --limit-filter=limit

Limit results to instances with values above/below limit. A positive integer will include instances with values at or above the limit in reporting. A negative integer will include instances with values at or below the limit in reporting. A value of zero performs no limit filtering. This option will not override possible per-metric specifications. See also -J and -N.
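A minimal Python sketch of the filtering rule described for -8 follows. The function limit_filter is hypothetical, and it assumes that the threshold for a negative limit is its magnitude (so -100 keeps values at or below 100); the text above does not state this explicitly.

```python
def limit_filter(values, limit):
    """Keep instances per the -8 rules; values maps instance name -> value."""
    if limit > 0:
        # Positive limit: keep values at or above the limit.
        return {k: v for k, v in values.items() if v >= limit}
    if limit < 0:
        # Assumption: the threshold for a negative limit is its magnitude.
        return {k: v for k, v in values.items() if v <= abs(limit)}
    # Zero: no limit filtering.
    return dict(values)


sample = {"sda": 120, "sdb": 40, "sdc": 100}  # made-up instance values
print(limit_filter(sample, 100))
print(limit_filter(sample, -100))
```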

-9 limit, --limit-filter-force=limit

Like -8 but this option will override per-metric specifications.

-a archive, --archive=archive

Performance metric values are retrieved from the set of Performance Co-Pilot (PCP) archive files identified by the archive argument, which is a comma-separated list of names, each of which may be the base name of an archive or the name of a directory containing one or more archives.

-A align, --align=align

Force the initial sample to be aligned on the boundary of a natural time unit align. Refer to PCPIntro(1) for a complete description of the syntax for align.

--archive-folio=folio

Read metric source archives from the PCP archive folio created by tools like pmchart(1) or, less often, manually with mkaf(1).

-c config, --config=config

Specify the config file or directory to use. If config is a directory, all files in it ending in .conf will be included. The default is the first found of: ./pcp2arrow.conf, $HOME/.pcp2arrow.conf, $HOME/pcp/pcp2arrow.conf, and $PCP_SYSCONF_DIR/pcp2arrow.conf. For details, see the Configuration File section above and pmrep.conf(5).
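The lookup described for -c can be sketched in Python as below. The functions candidate_configs and resolve_config are hypothetical helpers, not part of PCP. Two details are assumptions: the /etc/pcp fallback used when PCP_SYSCONF_DIR is unset, and the sorted order in which files from a directory are returned.

```python
import os
from pathlib import Path


def candidate_configs(cwd, home, sysconf_dir):
    """Return the default config locations in the documented search order."""
    return [
        Path(cwd) / "pcp2arrow.conf",
        Path(home) / ".pcp2arrow.conf",
        Path(home) / "pcp" / "pcp2arrow.conf",
        Path(sysconf_dir) / "pcp2arrow.conf",
    ]


def resolve_config(config=None, cwd=".", home=None, sysconf_dir=None):
    """Return the list of config files that would be read."""
    if config is not None:
        path = Path(config)
        if path.is_dir():
            # A directory contributes every file in it ending in .conf.
            return sorted(p for p in path.iterdir()
                          if p.is_file() and p.name.endswith(".conf"))
        return [path]
    home = home or os.path.expanduser("~")
    # Assumption: fall back to /etc/pcp when PCP_SYSCONF_DIR is not set.
    sysconf_dir = sysconf_dir or os.environ.get("PCP_SYSCONF_DIR", "/etc/pcp")
    for path in candidate_configs(cwd, home, sysconf_dir):
        if path.is_file():
            return [path]  # first match wins
    return []
```

Calling resolve_config() with no arguments returns the first default file that exists, or an empty list if none is found.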

--container=container

Fetch performance metrics from the specified container, either local or remote (see -h).

-C,  --check

Exit before reporting any values, but after parsing the configuration and metrics and printing possible headers.

-h host, --host=host

Fetch performance metrics from pmcd(1) on host, rather than from the default localhost.

-H,  --no-header

Do not print any headers.

-i instances, --instances=instances

Retrieve and report only the specified metric instances. By default all instances, present and future, are reported.

Refer to pmrep(1) for complete description of this option.

-j,  --live-filter

Perform instance live filtering. This allows capturing all named instances even if processes are restarted at some point (unlike without live filtering). Performing live filtering over a huge number of instances will add some internal overhead so a bit of user caution is advised. See also -n.

-J rank, --rank=rank

Limit results to highest/lowest ranked instances of set-valued metrics. A positive integer will include highest valued instances in reporting. A negative integer will include lowest valued instances in reporting. A value of zero performs no ranking. Ranking does not imply sorting; see -6. See also -8.
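The ranking rule can be illustrated with a short Python sketch. The function rank_filter is hypothetical, and it assumes that the magnitude of rank is the number of instances to keep. Because ranking does not imply sorting, the sketch keeps the selected instances in their original order.

```python
import heapq


def rank_filter(values, rank):
    """Keep the top (rank > 0) or bottom (rank < 0) instances by value."""
    if rank == 0:
        return dict(values)  # zero performs no ranking
    pick = heapq.nlargest if rank > 0 else heapq.nsmallest
    # Assumption: abs(rank) is the number of instances to keep.
    keep = {name for name, _ in pick(abs(rank), values.items(), key=lambda kv: kv[1])}
    # Preserve the original instance order: ranking selects, it does not sort.
    return {name: value for name, value in values.items() if name in keep}


sample = {"sda": 5, "sdb": 50, "sdc": 20}  # made-up instance values
print(rank_filter(sample, 2))
print(rank_filter(sample, -1))
```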

-K spec, --spec-local=spec

When fetching metrics from a local context (see -L), the -K option may be used to control the DSO PMDAs that should be made accessible. The spec argument conforms to the syntax described in pmSpecLocalPMDA(3). More than one -K option may be used.

-L,  --local-PMDA

Use a local context to collect metrics from DSO PMDAs on the local host without PMCD. See also -K.

-n,  --invert-filter

Perform ranking before live filtering. By default instance live filtering (when requested, see -j) happens before instance ranking (when requested, see -J). With this option the logic is inverted and ranking happens before live filtering.

-o outfile, --output-file=outfile

Specify the output file outfile.

-O origin, --origin=origin

When reporting archived metrics, start reporting at origin within the time window (see -S and -T). Refer to PCPIntro(1) for a complete description of the syntax for origin.

-r,  --raw

Output raw metric values; do not convert cumulative counters to rates. This option will override possible per-metric specifications.
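To make the difference between raw and rate-converted values concrete, the following Python sketch shows the usual arithmetic for turning two samples of a cumulative counter into a rate. The function counter_to_rate is a hypothetical helper with made-up numbers; it ignores counter wraparound and unit scaling and is not taken from the pcp2arrow implementation.

```python
def counter_to_rate(prev_value, prev_time, curr_value, curr_time):
    """Return the per-second rate between two samples of a cumulative counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / elapsed


# Two samples of a counter taken 2 seconds apart (made-up numbers).
print(counter_to_rate(1000, 10.0, 1500, 12.0))  # prints 250.0
```

With -r the raw values (1000 and 1500 in this example) would be exported rather than a rate derived from them.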

-R,  --raw-prefer

Like -r but this option will not override per-metric specifications.

-s samples, --samples=samples

The samples argument defines the number of samples to be retrieved and reported. If samples is 0 or -s is not specified, pcp2arrow will sample and report continuously (in real time mode) or until the end of the set of PCP archives (in archive mode). See also -T.

-S starttime, --start=starttime

When reporting archived metrics, the report will be restricted to those records logged at or after starttime. Refer to PCPIntro(1) for a complete description of the syntax for starttime.

-t interval, --interval=interval

Set the reporting interval to something other than the default 1 second. The interval argument follows the syntax described in PCPIntro(1), and in the simplest form may be an unsigned integer (the implied units in this case are seconds). See also the -T option.

-T endtime, --finish=endtime

When reporting archived metrics, the report will be restricted to those records logged before or at endtime. Refer to PCPIntro(1) for a complete description of the syntax for endtime.

When -T is used to define the runtime before pcp2arrow exits and samples is not given (see -s), the number of reported samples depends on interval (see -t). If samples is given, interval is adjusted so that samples can be reported within the runtime. If all of -T, -s, and -t are given, endtime determines how long pcp2arrow actually runs.
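The relationship between runtime, samples, and interval can be worked through with a small Python sketch. The function plan_sampling is hypothetical. It assumes the first sample is taken at the start of the run and the last at the end, so that N samples span N - 1 intervals, and that at least two samples are used; the actual rounding and edge-case behaviour of pcp2arrow may differ.

```python
def plan_sampling(runtime, samples=None, interval=1.0):
    """Return (samples, interval) for a run lasting `runtime` seconds.

    Assumption: the first sample is taken at time 0 and the last at `runtime`,
    so N samples span N - 1 intervals.
    """
    if samples is not None:
        # Assumption: at least two samples are needed to span the runtime.
        samples = max(samples, 2)
        # samples given: the interval is adjusted to fit the runtime.
        return samples, runtime / (samples - 1)
    # samples not given: the count follows from the interval.
    return int(runtime / interval) + 1, interval


print(plan_sampling(60, samples=7))    # 7 samples, 10 seconds apart
print(plan_sampling(60, interval=15))  # 5 samples, 15 seconds apart
```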

-v,  --omit-flat

Report only set-valued metrics with instances (e.g. disk.dev.read) and omit single-valued “flat” metrics without instances (e.g. kernel.all.sysfork). See -i and -I.

-V,  --version

Display version number and exit.

-z,  --hostzone

Use the local timezone of the host that is the source of the performance metrics, as identified by either the -h or the -a options. The default is to use the timezone of the local host.

-Z timezone, --timezone=timezone

Use timezone for the date and time. Timezone is in the format of the environment variable TZ as described in environ(7). Note that when including a timezone string in output, ISO 8601-style UTC offsets are used (so something like -Z EST+5 will become UTC-5).
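The sign flip in the EST+5 example follows from the two conventions involved: POSIX TZ offsets are positive west of Greenwich, while ISO 8601 offsets are positive east of it. A minimal Python sketch of the conversion is shown below. The function posix_tz_to_utc_offset is hypothetical and handles only simple NAME[+-]hh[:mm] values; the output format beyond the UTC-5 example is an assumption.

```python
import re


def posix_tz_to_utc_offset(tz):
    """Convert a simple POSIX TZ value such as 'EST+5' to a 'UTC-5' style offset."""
    match = re.fullmatch(r"[A-Za-z]{3,}([+-]?)(\d{1,2})(?::(\d{2}))?", tz)
    if match is None:
        raise ValueError(f"unsupported TZ value: {tz!r}")
    sign = match.group(1)
    hours = int(match.group(2))
    minutes = int(match.group(3) or 0)
    if hours == 0 and minutes == 0:
        return "UTC+0"
    # POSIX offsets are positive west of Greenwich; ISO 8601 offsets are positive east.
    iso_sign = "+" if sign == "-" else "-"
    suffix = f":{minutes:02d}" if minutes else ""
    return f"UTC{iso_sign}{hours}{suffix}"


print(posix_tz_to_utc_offset("EST+5"))  # prints UTC-5
```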

-?,  --help

Display usage message and exit.

Files

pcp2arrow.conf

pcp2arrow configuration file (see -c)

$PCP_SYSCONF_DIR/pmrep/*.conf

system provided default pmrep configuration files

PCP Environment

Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

For environment variables affecting PCP tools, see pmGetOptions(3).

See Also

PCPIntro(1), mkaf(1), pcp(1), pmcd(1), pminfo(1), pmrep(1), pmGetOptions(3), pmSpecLocalPMDA(3), LOGARCHIVE(5), pcp.conf(5), pmrep.conf(5), PMNS(5) and environ(7).

Info

PCP Performance Co-Pilot