fapolicyd.timing - Man Page

fapolicyd manual decision timing report

Description

The /run/fapolicyd/fapolicyd.timing file contains one manual decision timing run from fapolicyd(8). The daemon writes this file when a privileged manual timing stop request ends an armed timing run. A stop request while timing is not armed writes a short status report instead of timing data. The daemon may also stop collection and write the report when a timing counter would overflow.

Timing collection is disabled by default. Manual timing requests are honored only when the active daemon configuration has timing_collection=manual in fapolicyd.conf(5). The command fapolicyd-cli --timing-start arms a run, and fapolicyd-cli --timing-stop stops the run and prints this report. --timer-start and --timer-stop are accepted aliases.

The daemon stores aggregate counters only. It does not store one timing record per decision. Each decision worker owns local aggregate blocks containing a call count, total time, maximum time, and fixed latency buckets for each stage. The report snapshots and aggregates those blocks.

Run Summary

The first lines summarize the run:

Mode

Active timing collection mode.

Timing run

Wall-clock start and stop time.

Duration

Elapsed wall-clock run time in H:MM:SS format.

Workers

Number of decision worker timing blocks included in the snapshot.

Max queue depth

Highest internal event queue depth observed during the timing run. The daemon temporarily resets only the max queue depth counter when timing starts and restores the previous high-water mark after timing stops if the previous value was larger. The Queueing section repeats this value next to queue wait timing.

Decisions

Number of completed fanotify decisions timed during the run.

Throughput

Timed decisions per second over the wall-clock run duration. This includes time when fapolicyd was armed but no events were waiting to be decided.

Active decision rate

Timed decisions per second using the sum of decision:total latency instead of wall-clock duration. With more than one decision worker, this is worker-time, not elapsed wall-clock time.

Stop reason

Printed only when the run stopped automatically because a timing counter would overflow.

TL;DR

The TL;DR section appears immediately after the run summary and before Overall decision latency . It surfaces two or three dominant findings drawn from the same data used by the later Derived observations section, such as helper dominance, response formatting cost, or queueing health. The detailed observations remain in their original location later in the report.

Overall Decision Latency

The Overall decision latency section describes end-to-end decision latency from the point a decision worker starts processing a dequeued fanotify event until the decision path returns after response handling.

The section prints average and maximum latency using compact latency units, approximate percentile buckets, and cumulative percentages for useful thresholds such as <=50us , <=100us , <=500us , <=1ms , and >10ms . Percentiles are bucket approximations, not exact per-decision percentiles. The 95th percentile is a useful tail-latency marker: roughly 95% of observed calls were at or below that bucket, so a high p95 points to a recurring slow path rather than a single worst-case outlier.

When high-end tail buckets are observed, the section also prints a compact tail line for >10ms , >25ms , >50ms , >100ms , and >250ms . The line includes both count and percentage so long outliers do not disappear inside a single >10ms bucket.

Queueing

The Queueing section describes work waiting before a decision worker starts processing an event. It is based on time_in_queue:total when enqueue timestamps were observed during the timing run.

avg wait

Average observed time in the internal userspace event queue.

max wait

Longest observed wait in the internal userspace event queue.

p95 bucket

Approximate bucket containing the 95th percentile queue wait.

total queued time

Sum of observed queue wait time. This is useful for spotting producer/consumer imbalance, but it is not decision CPU time.

max queue depth

Highest internal event queue depth observed during the timing run.

If no enqueue timestamps were observed, the section prints not observed and still reports max queue depth.

Decision Phase Timing

The Decision phase timing section shows the main decision-thread phases when they were observed:

event_build:total
evaluation:total
response:total
Phase

Displayed phase name.

Calls

Number of times the phase was measured.

Calls/Dec

Average phase calls per timed decision.

Total

Total measured phase time.

Avg

Average phase call time.

Max

Slowest observed phase call.

p95 bucket

Approximate bucket containing the 95th percentile phase call.

Notes

Short deterministic notes. For example, a response phase dominated by syslog or debug formatting is marked syslog/debug-heavy .

These phase rows are the best first view for explaining where decision-thread time is being spent. Helper rows are separate because some lazy helper operations can be requested from more than one phase.

Lazy Helper Attribution

The report introduces helper attribution with:

Lazy helper attribution:
  Helper timings are attributed to the active logical driver:
  evaluation or response.
  Combined totals are evaluation + response.

The driver table and combined helper table follow this note.

Lazy Helper Attribution by Driver

The Lazy helper attribution by driver section shows helper time split by the logical context that caused it: evaluation or response.

Helper

Short helper path.

Eval total

Total helper time caused while the rule evaluation driver was active.

Response total

Total helper time caused while response, audit metadata, or syslog/debug formatting was active.

Combined

Total helper time from all driver contexts.

Response %

Percentage of combined helper time caused by response-side work. This is useful for deciding whether manual/debug reporting is driving MIME or trust lookup cost.

Combined Lazy Helper Attribution

The Combined lazy helper attribution section groups helper-related stages into stable combined paths. For helpers that are split by driver, the combined row is calculated by adding the phase-specific evaluation and response rows together. These rows answer how often a helper was needed and how expensive that helper was when amortized over all decisions.

Helper path

Short combined helper path.

Calls

Number of helper calls measured.

Calls/Dec

Average helper calls per timed decision.

Total

Total measured helper time.

Avg/call

Average measured helper call time.

Amort/Dec

Total helper time divided by all timed decisions. This shows how much a lazy or rare helper contributes to a typical decision in the measured workload.

Max

Slowest observed helper call.

p95 bucket

Approximate bucket containing the 95th percentile helper call.

The current helper groups are:

mime_detection:total
mime_detection:fast_classification
mime_detection:gather_elf
mime_detection:libmagic_fallback
trust_db_lookup:total
trust_db_lookup:read
trust_db_lookup:lock_wait
hash_ima:total
hash_sha:total
proc_detail_lookup

Derived Observations

The Derived observations section prints short deterministic observations when the required data was present in the run. Examples include:

Stage Tail Summary

When hot detailed stage rows have observations above 10ms, the report prints a Stage tail summary after the detailed table. Rows are sorted by >10ms count, limited to the top five rows, and extended with any additional row that has a nonzero >50ms count. Near-identical parent/child tail rows may be suppressed to keep the section focused.

stage:name: >10ms count/percent, >25ms count/percent, >50ms count/percent

The detailed table remains the canonical total-time ranking; the tail summary only adds high-end resolution for rows that crossed one of the tail thresholds. Thresholds with zero observations are omitted.

Detailed Stage Table

The Detailed stage timing table is sorted by total measured time by default. It answers which measured operations consumed the most total time during the run. This is the detailed view and is intentionally kept below the summary and helper attribution sections.

The decision:total row is included when observed and participates in the same total-time sort as the other rows.

Stage

Measured operation name. The current stage set is listed in Decision Flow.

Calls

Number of times the operation was measured.

Calls/Dec

Average measured calls per timed decision.

Total

Total measured time for the stage.

Avg

Average measured call time for the stage.

Max

Slowest observed call for the stage.

p95 bucket

Approximate bucket containing the 95th percentile call for the stage.

Stages with zero calls are hidden from the table and listed under Not observed . If a stage is not observed, no measured call to that operation occurred during the timing window. For example, an absent evaluation:trust_db_lookup:total and response:trust_db_lookup:total line means no measured trust database lookup occurred from those drivers in that run.

Decision Flow

Stage names use a colon-separated hierarchy:

phase:operation:child

Rows ending in :total are parent scopes. Child operation timings are useful contributors, but they can be nested, lazy, or shared by more than one code path, so child rows are not expected to add up to the parent total.

The main decision-thread view is:

decision:total
time_in_queue:total
event_build:total
event_build:cache_flush
event_build:proc_fingerprint
event_build:fd_stat

evaluation:lock_wait
evaluation:total
evaluation:mime_detection:total
evaluation:mime_detection:fast_classification
evaluation:mime_detection:gather_elf
evaluation:mime_detection:libmagic_fallback
evaluation:fd_path_resolution
evaluation:proc_detail_lookup
evaluation:hash_ima:total
evaluation:hash_sha:total
evaluation:trust_db_lookup:total
evaluation:trust_db_lookup:lock_wait
evaluation:trust_db_lookup:read

response:total
response:syslog_debug_format:total
response:mime_detection:total
response:mime_detection:fast_classification
response:mime_detection:gather_elf
response:mime_detection:libmagic_fallback
response:trust_db_lookup:total
response:trust_db_lookup:lock_wait
response:trust_db_lookup:read
response:audit_metadata:total
response:fanotify_write

Lazy helper operations are attributed to the active logical driver when the helper is called. Rule evaluation uses evaluation: , response logging and audit response work use response: . Helper timing outside those logical drivers is treated as an instrumentation bug and is not reported as a separate phase.

Stages

decision:total

Decision worker latency after a fanotify event has been dequeued, including event build, policy evaluation, optional logging or audit preparation, and response handling. Queue wait time is reported separately.

time_in_queue:total

Time a fanotify event spent in the internal userspace queue between enqueue by the fanotify reader thread and dequeue by a decision worker. Events already in the queue when timing is armed do not have an enqueue timestamp and are not included in this stage.

event_build:total

Construction of the internal event from fanotify metadata. This includes subject and object cache lookups, initial subject and object fingerprints, and some pattern-state work.

event_build:cache_flush

Object cache flush performed on the decision path when the global flush flag is set.

event_build:proc_fingerprint

Initial /proc/<pid> stat used to identify the subject process and detect stale subject cache entries.

event_build:fd_stat

Stat of the fanotify file descriptor used to fingerprint the object.

evaluation:lock_wait

Time spent waiting for the rule lock before policy evaluation.

evaluation:total

Rule list evaluation loop. Lazy subject, object, trust, hash, or type lookups caused by rule fields can appear as child operation timings and also contribute to this total.

evaluation:proc_detail_lookup

On-demand subject procfs detail lookups, including /proc/<pid>/status , /proc/<pid>/exe , subject executable type, session id, and auid lookups. These can be driven lazily by rules, syslog/debug formatting, or audit metadata.

evaluation:fd_path_resolution

Path resolution for the object file descriptor.

For MIME and trust DB helper rows below, * means the driver prefix evaluation , response .

*:mime_detection:total

Object type and ELF/script detection.

*:mime_detection:fast_classification

In-house classification before libmagic fallback. This includes empty-file, ELF/script, shebang, common magic number, common text format, and device checks.

*:mime_detection:gather_elf

ELF/script header scan performed by gather_elf().

*:mime_detection:libmagic_fallback

Full libmagic fallback used when the faster type checks cannot classify the object.

evaluation:hash_ima:total

IMA digest collection from the security.ima xattr through get_ima_hash().

evaluation:hash_sha:total

SHA-style file digest collection through get_hash_from_fd2().

*:trust_db_lookup:total

Total trust database lookup through check_trust_database(). This includes update-thread lock wait, long-term LMDB read setup, the primary LMDB lookup, optional hash work requested by trust lookup, and the top-level /usr symlink retry path.

*:trust_db_lookup:lock_wait

Time spent waiting for the trust database update-thread lock before LMDB read setup. This is a future multi-decision-thread contention signal.

*:trust_db_lookup:read

LMDB read setup and trust database read work after the update-thread lock is held. This currently includes the primary lookup, optional hash work requested by trust lookup, the top-level /usr symlink retry path, and read cleanup.

response:total

Work after the rule loop has completed. This includes syslog/debug formatting, rule/source bookkeeping, decision metric updates, response selection, audit metadata preparation, fanotify response writes, and descriptor close work.

response:syslog_debug_format:total

Formatting of syslog or debug output after a decision. Lazy field lookups needed only for logging may appear as nested stage timings.

response:audit_metadata:total

Preparation of fanotify audit response metadata before the response write. When audit metadata needs object trust, trust lookup time can be nested here.

response:fanotify_write

The actual write(2) of the fanotify response to the kernel.

Notes

The notes footer reports the largest queued-time contributor when queue wait was observed, the largest helper contributor, the largest decision phase contributor, and the slowest observed row by maximum time.

For cache hit, miss, collision, eviction, subject deferral, and early subject eviction counters, see fapolicyd.metrics(5).

If adding a timing count, total, or bucket counter would overflow, fapolicyd disarms timing, records an overflow stop reason, and writes the timing report instead of allowing counters to wrap and skew the data.

Files

/run/fapolicyd/fapolicyd.timing

Manual decision timing report.

/run/fapolicyd/fapolicyd.state

Runtime state report containing timing control state.

/run/fapolicyd/fapolicyd.metrics

Runtime metrics report containing cache counters.

See Also

fapolicyd(8), fapolicyd-cli(8), fapolicyd.conf(5), fapolicyd.state(5), fapolicyd.metrics(5), and fapolicyd.trust(5).

Referenced By

fapolicyd(8), fapolicyd-cli(8), fapolicyd.conf(5), fapolicyd.metrics(5), fapolicyd.state(5).

May 2026 Red Hat File Formats