fapolicyd.timing - Man Page

fapolicyd manual decision timing report

Description

The /run/fapolicyd/fapolicyd.timing file contains one manual decision timing run from fapolicyd(8). The daemon writes this file when a privileged manual timing stop request ends an armed timing run. A stop request while timing is not armed writes a short status report instead of timing data. The daemon may also stop collection and write the report when a timing counter would overflow.

Timing collection is disabled by default. Manual timing requests are honored only when the active daemon configuration has timing_collection=manual in fapolicyd.conf(5). The command fapolicyd-cli --timing-start arms a run, and fapolicyd-cli --timing-stop stops the run and prints this report. --timer-start and --timer-stop are accepted aliases.

The daemon stores aggregate counters only. It does not store one timing record per decision. Each decision worker owns local aggregate blocks containing a call count, total time, maximum time, and fixed latency buckets for each stage. The report snapshots and aggregates those blocks.

Run Summary

The first lines summarize the run:

Mode: Active timing collection mode.
Timing run: Wall-clock start and stop time.
Duration: Elapsed wall-clock run time in H:MM:SS format.
Workers: Number of decision worker timing blocks included in the snapshot.
Max queue depth: Highest internal event queue depth observed during the timing run. The daemon temporarily resets only the max queue depth counter when timing starts and restores the previous high-water mark after timing stops if the previous value was larger. The Queueing section repeats this value next to queue wait timing.
Decisions: Number of completed fanotify decisions timed during the run.
Throughput: Timed decisions per second over the wall-clock run duration. This includes time when fapolicyd was armed but no events were waiting to be decided.
Active decision rate: Timed decisions per second using the sum of decision:total latency instead of wall-clock duration. With more than one decision worker, this is worker-time, not elapsed wall-clock time.
Stop reason: Printed only when the run stopped automatically because a timing counter would overflow.

TL;DR

The TL;DR section appears immediately after the run summary and before Overall decision latency . It surfaces two or three dominant findings drawn from the same data used by the later Derived observations section, such as helper dominance, response formatting cost, or queueing health. The detailed observations remain in their original location later in the report.

Overall Decision Latency

The Overall decision latency section describes end-to-end decision latency from the point a decision worker starts processing a dequeued fanotify event until the decision path returns after response handling.

The section prints average and maximum latency using compact latency units, approximate percentile buckets, and cumulative percentages for useful thresholds such as <=50us , <=100us , <=500us , <=1ms , and >10ms . Percentiles are bucket approximations, not exact per-decision percentiles. The 95th percentile is a useful tail-latency marker: roughly 95% of observed calls were at or below that bucket, so a high p95 points to a recurring slow path rather than a single worst-case outlier.

When high-end tail buckets are observed, the section also prints a compact tail line for >10ms , >25ms , >50ms , >100ms , and >250ms . The line includes both count and percentage so long outliers do not disappear inside a single >10ms bucket.

Queueing

The Queueing section describes work waiting before a decision worker starts processing an event. It is based on time_in_queue:total when enqueue timestamps were observed during the timing run.

avg wait: Average observed time in the internal userspace event queue.
max wait: Longest observed wait in the internal userspace event queue.
p95 bucket: Approximate bucket containing the 95th percentile queue wait.
total queued time: Sum of observed queue wait time. This is useful for spotting producer/consumer imbalance, but it is not decision CPU time.
max queue depth: Highest internal event queue depth observed during the timing run.

If no enqueue timestamps were observed, the section prints not observed and still reports max queue depth.

Decision Phase Timing

The Decision phase timing section shows the main decision-thread phases when they were observed:

event_build:total
evaluation:total
response:total

Phase: Displayed phase name.
Calls: Number of times the phase was measured.
Calls/Dec: Average phase calls per timed decision.
Total: Total measured phase time.
Avg: Average phase call time.
Max: Slowest observed phase call.
p95 bucket: Approximate bucket containing the 95th percentile phase call.
Notes: Short deterministic notes. For example, a response phase dominated by syslog or debug formatting is marked syslog/debug-heavy .

These phase rows are the best first view for explaining where decision-thread time is being spent. Helper rows are separate because some lazy helper operations can be requested from more than one phase.

Lazy Helper Attribution

The report introduces helper attribution with:

Lazy helper attribution:
  Helper timings are attributed to the active logical driver:
  evaluation or response.
  Combined totals are evaluation + response.

The driver table and combined helper table follow this note.

Lazy Helper Attribution by Driver

The Lazy helper attribution by driver section shows helper time split by the logical context that caused it: evaluation or response.

Helper: Short helper path.
Eval total: Total helper time caused while the rule evaluation driver was active.
Response total: Total helper time caused while response, audit metadata, or syslog/debug formatting was active.
Combined: Total helper time from all driver contexts.
Response %: Percentage of combined helper time caused by response-side work. This is useful for deciding whether manual/debug reporting is driving MIME or trust lookup cost.

Combined Lazy Helper Attribution

The Combined lazy helper attribution section groups helper-related stages into stable combined paths. For helpers that are split by driver, the combined row is calculated by adding the phase-specific evaluation and response rows together. These rows answer how often a helper was needed and how expensive that helper was when amortized over all decisions.

Helper path: Short combined helper path.
Calls: Number of helper calls measured.
Calls/Dec: Average helper calls per timed decision.
Total: Total measured helper time.
Avg/call: Average measured helper call time.
Amort/Dec: Total helper time divided by all timed decisions. This shows how much a lazy or rare helper contributes to a typical decision in the measured workload.
Max: Slowest observed helper call.
p95 bucket: Approximate bucket containing the 95th percentile helper call.

The current helper groups are:

mime_detection:total
mime_detection:fast_classification
mime_detection:gather_elf
mime_detection:libmagic_fallback
trust_db_lookup:total
trust_db_lookup:read
trust_db_lookup:lock_wait
hash_ima:total
hash_sha:total
proc_detail_lookup

Derived Observations

The Derived observations section prints short deterministic observations when the required data was present in the run. Examples include:

Wall-clock throughput compared with active decision rate, which can show that the workload was mostly idle while timing was armed.
Queue wait and maximum queue depth compared with q_size when queue timing and configuration data are available.
Response formatting share when syslog/debug formatting dominates response:total .
MIME helper cost and libmagic fallback share.
Rare but expensive integrity work from hash_ima or hash_sha , reported as percent of decisions, average time when called, and amortized decision cost.
Trust database lock wait compared with trust database read time.

Stage Tail Summary

When hot detailed stage rows have observations above 10ms, the report prints a Stage tail summary after the detailed table. Rows are sorted by >10ms count, limited to the top five rows, and extended with any additional row that has a nonzero >50ms count. Near-identical parent/child tail rows may be suppressed to keep the section focused.

stage:name: >10ms count/percent, >25ms count/percent, >50ms count/percent

The detailed table remains the canonical total-time ranking; the tail summary only adds high-end resolution for rows that crossed one of the tail thresholds. Thresholds with zero observations are omitted.

Detailed Stage Table

The Detailed stage timing table is sorted by total measured time by default. It answers which measured operations consumed the most total time during the run. This is the detailed view and is intentionally kept below the summary and helper attribution sections.

The decision:total row is included when observed and participates in the same total-time sort as the other rows.

Stage: Measured operation name. The current stage set is listed in Decision Flow.
Calls: Number of times the operation was measured.
Calls/Dec: Average measured calls per timed decision.
Total: Total measured time for the stage.
Avg: Average measured call time for the stage.
Max: Slowest observed call for the stage.
p95 bucket: Approximate bucket containing the 95th percentile call for the stage.

Stages with zero calls are hidden from the table and listed under Not observed . If a stage is not observed, no measured call to that operation occurred during the timing window. For example, an absent evaluation:trust_db_lookup:total and response:trust_db_lookup:total line means no measured trust database lookup occurred from those drivers in that run.

Decision Flow

Stage names use a colon-separated hierarchy:

phase:operation:child

Rows ending in :total are parent scopes. Child operation timings are useful contributors, but they can be nested, lazy, or shared by more than one code path, so child rows are not expected to add up to the parent total.

The main decision-thread view is:

decision:total
time_in_queue:total
event_build:total
event_build:cache_flush
event_build:proc_fingerprint
event_build:fd_stat

evaluation:lock_wait
evaluation:total
evaluation:mime_detection:total
evaluation:mime_detection:fast_classification
evaluation:mime_detection:gather_elf
evaluation:mime_detection:libmagic_fallback
evaluation:fd_path_resolution
evaluation:proc_detail_lookup
evaluation:hash_ima:total
evaluation:hash_sha:total
evaluation:trust_db_lookup:total
evaluation:trust_db_lookup:lock_wait
evaluation:trust_db_lookup:read

response:total
response:syslog_debug_format:total
response:mime_detection:total
response:mime_detection:fast_classification
response:mime_detection:gather_elf
response:mime_detection:libmagic_fallback
response:trust_db_lookup:total
response:trust_db_lookup:lock_wait
response:trust_db_lookup:read
response:audit_metadata:total
response:fanotify_write

Lazy helper operations are attributed to the active logical driver when the helper is called. Rule evaluation uses evaluation: , response logging and audit response work use response: . Helper timing outside those logical drivers is treated as an instrumentation bug and is not reported as a separate phase.

Stages

decision:total: Decision worker latency after a fanotify event has been dequeued, including event build, policy evaluation, optional logging or audit preparation, and response handling. Queue wait time is reported separately.
time_in_queue:total: Time a fanotify event spent in the internal userspace queue between enqueue by the fanotify reader thread and dequeue by a decision worker. Events already in the queue when timing is armed do not have an enqueue timestamp and are not included in this stage.
event_build:total: Construction of the internal event from fanotify metadata. This includes subject and object cache lookups, initial subject and object fingerprints, and some pattern-state work.
event_build:cache_flush: Object cache flush performed on the decision path when the global flush flag is set.
event_build:proc_fingerprint: Initial /proc/<pid> stat used to identify the subject process and detect stale subject cache entries.
event_build:fd_stat: Stat of the fanotify file descriptor used to fingerprint the object.
evaluation:lock_wait: Time spent waiting for the rule lock before policy evaluation.
evaluation:total: Rule list evaluation loop. Lazy subject, object, trust, hash, or type lookups caused by rule fields can appear as child operation timings and also contribute to this total.
evaluation:proc_detail_lookup: On-demand subject procfs detail lookups, including /proc/<pid>/status , /proc/<pid>/exe , subject executable type, session id, and auid lookups. These can be driven lazily by rules, syslog/debug formatting, or audit metadata.
evaluation:fd_path_resolution: Path resolution for the object file descriptor.

For MIME and trust DB helper rows below, * means the driver prefix evaluation , response .

*:mime_detection:total: Object type and ELF/script detection.
*:mime_detection:fast_classification: In-house classification before libmagic fallback. This includes empty-file, ELF/script, shebang, common magic number, common text format, and device checks.
*:mime_detection:gather_elf: ELF/script header scan performed by gather_elf().
*:mime_detection:libmagic_fallback: Full libmagic fallback used when the faster type checks cannot classify the object.
evaluation:hash_ima:total: IMA digest collection from the security.ima xattr through get_ima_hash().
evaluation:hash_sha:total: SHA-style file digest collection through get_hash_from_fd2().
*:trust_db_lookup:total: Total trust database lookup through check_trust_database(). This includes update-thread lock wait, long-term LMDB read setup, the primary LMDB lookup, optional hash work requested by trust lookup, and the top-level /usr symlink retry path.
*:trust_db_lookup:lock_wait: Time spent waiting for the trust database update-thread lock before LMDB read setup. This is a future multi-decision-thread contention signal.
*:trust_db_lookup:read: LMDB read setup and trust database read work after the update-thread lock is held. This currently includes the primary lookup, optional hash work requested by trust lookup, the top-level /usr symlink retry path, and read cleanup.
response:total: Work after the rule loop has completed. This includes syslog/debug formatting, rule/source bookkeeping, decision metric updates, response selection, audit metadata preparation, fanotify response writes, and descriptor close work.
response:syslog_debug_format:total: Formatting of syslog or debug output after a decision. Lazy field lookups needed only for logging may appear as nested stage timings.
response:audit_metadata:total: Preparation of fanotify audit response metadata before the response write. When audit metadata needs object trust, trust lookup time can be nested here.
response:fanotify_write: The actual write(2) of the fanotify response to the kernel.

Notes

The notes footer reports the largest queued-time contributor when queue wait was observed, the largest helper contributor, the largest decision phase contributor, and the slowest observed row by maximum time.

For cache hit, miss, collision, eviction, subject deferral, and early subject eviction counters, see fapolicyd.metrics(5).

If adding a timing count, total, or bucket counter would overflow, fapolicyd disarms timing, records an overflow stop reason, and writes the timing report instead of allowing counters to wrap and skew the data.

Files

/run/fapolicyd/fapolicyd.timing: Manual decision timing report.
/run/fapolicyd/fapolicyd.state: Runtime state report containing timing control state.
/run/fapolicyd/fapolicyd.metrics: Runtime metrics report containing cache counters.

Referenced By

fapolicyd(8), fapolicyd-cli(8), fapolicyd.conf(5), fapolicyd.metrics(5), fapolicyd.state(5).

May 2026 Red Hat File Formats