debuginfod man page

debuginfod — debuginfo-related http file-server daemon

Synopsis

debuginfod [OPTION]... [PATH]...

Description

debuginfod serves debuginfo-related artifacts over HTTP.  It periodically scans a set of directories for ELF/DWARF files and their associated source code, as well as RPM files containing the above, to build an index by their buildid.  This index is used when remote clients use the HTTP webapi, to fetch these files by the same buildid.

If a debuginfod cannot service a given buildid artifact request itself, and it is configured with information about upstream debuginfod servers, it queries them for the same information, just as debuginfod-find would.  If successful, it locally caches then relays the file content to the original requester.

If the -F option is given, each listed PATH creates a thread to scan for matching ELF/DWARF/source files under the given physical directory.  Source files are matched with DWARF files based on the AT_comp_dir (compilation directory) attributes inside it.  Duplicate directories are ignored.  You may use a file name for a PATH, but source code indexing may be incomplete; prefer using a directory that contains the binaries.  Caution: source files listed in the DWARF may be a path anywhere in the file system, and debuginfod will readily serve their content on demand.  (Imagine a doctored DWARF file that lists /etc/passwd as a source file.)  If this is a concern, audit your binaries with tools such as:

% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^File.name.table/p'
or
% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^Line.number/p'
or even use debuginfod itself:
% debuginfod -vvv -d :memory: -F BINARY 2>&1 | grep 'recorded.*source'
^C

If the -R option is given, each listed PATH creates a thread to scan for ELF/DWARF/source files contained in matching RPMs under the given physical directory.  Duplicate directories are ignored.  You may use a file name for a PATH, but source code indexing may be incomplete; prefer using a directory that contains normal RPMs alongside debuginfo/debugsource RPMs.  Because of complications such as DWZ-compressed debuginfo, it may take two scan passes to identify all source code.  Source files for RPMs are only served from other RPMs, so the caution for -F does not apply.

If no PATH is listed, or neither -F nor -R option is given, then debuginfod will simply serve content that it scanned into its index in previous runs: the data is cumulative.

File names must match extended regular expressions given by the -I option and not the -X option (if any) in order to be considered.
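The include/exclude rule can be tried out before starting a scan. The following is a minimal sketch that emulates the decision with grep -E; the function name would_scan is hypothetical, and unlike debuginfod it takes the path as given rather than canonicalizing it with realpath(3).

```shell
# would_scan INCLUDE_RE EXCLUDE_RE PATH
# Emulates the -I/-X decision for one path name (a sketch, not debuginfod code).
# Returns 0 if PATH would be considered for scanning, 1 otherwise.
# An empty EXCLUDE_RE means "exclude nothing".
would_scan() {
    include_re=$1
    exclude_re=$2
    path=$3
    # The path must match the include pattern (unanchored extended RE).
    printf '%s\n' "$path" | grep -Eq -- "$include_re" || return 1
    # A path that also matches the exclude pattern is dropped.
    if [ -n "$exclude_re" ] && printf '%s\n' "$path" | grep -Eq -- "$exclude_re"; then
        return 1
    fi
    return 0
}
```

For instance, would_scan '\.(debug|so)$' '/tmp/' /usr/lib/debug/foo.debug succeeds, while the same patterns reject /tmp/foo.debug because the exclude pattern wins.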

Options

-F

Activate ELF/DWARF file scanning threads.  The default is off.

-R

Activate RPM file scanning threads.  The default is off.

-d FILE --database=FILE

Set the path of the sqlite database used to store the index.  This file is disposable in the sense that a later rescan will repopulate data.  It will contain absolute file path names, so it may not be portable across machines.  It may be frequently read/written, so it should be on a fast filesystem.  It should not be shared across machines or users, to maximize sqlite locking performance.  The default database file is $HOME/.debuginfod.sqlite.

-D SQL --ddl=SQL

Execute given sqlite statement after the database is opened and initialized as extra DDL (SQL data definition language).  This may be useful to tune performance-related pragmas or indexes.  May be repeated.  The default is nothing extra.

-p NUM --port=NUM

Set the TCP port number on which debuginfod should listen, to service HTTP requests.  Both IPv4 and IPv6 sockets are opened, if possible.  The webapi is documented below.  The default port number is 8002.

-I REGEX --include=REGEX -X REGEX --exclude=REGEX

Govern the inclusion and exclusion of file names under the search paths.  The regular expressions are interpreted as unanchored POSIX extended REs, thus may include alternation.  They are evaluated against the full path of each file, based on its realpath(3) canonicalization.  By default, all files are included and none are excluded.  A file that matches both include and exclude REGEX is excluded.  (The contents of RPM files are not subject to inclusion or exclusion filtering: they are all processed.)

-t SECONDS --rescan-time=SECONDS

Set the rescan time for the file and RPM directories.  This is the amount of time the scanning threads will wait after finishing a scan, before doing it again.  A rescan for unchanged files is fast (because the index also stores the file mtimes).  A time of zero is acceptable, and means that only one initial scan should be performed.  The default rescan time is 300 seconds.  Receiving a SIGUSR1 signal triggers a new scan, independent of the rescan time (including if it was zero).

-g SECONDS --groom-time=SECONDS

Set the groom time for the index database.  This is the amount of time the grooming thread will wait after finishing a grooming pass before doing it again.  A groom operation quickly rescans all previously scanned files, only to see if they are still present and current, so it can deindex obsolete files.  See also the Data Management section.  The default groom time is 86400 seconds (1 day).  A time of zero is acceptable, and means that only one initial groom should be performed.  Receiving a SIGUSR2 signal triggers a new grooming pass, independent of the groom time (including if it was zero).

-G

Run an extraordinary maximal-grooming pass at debuginfod startup. This pass can take considerable time, because it tries to remove any debuginfo-unrelated content from the RPM-related parts of the index. It should not be run if any recent RPM-related indexing operations were aborted early.  It can take considerable space, because it finishes up with an sqlite "vacuum" operation, which repacks the database file by triplicating it temporarily.  The default is not to do maximal-grooming.  See also the Data Management section.

-c NUM --concurrency=NUM

Set the concurrency limit for all the scanning threads.  While many threads may be spawned to cover all the given PATHs, only NUM may concurrently do CPU-intensive operations like parsing an ELF file or an RPM.  The default is the number of processors on the system; the minimum is 1.

-L

Traverse symbolic links encountered during traversal of the PATHs, including across devices - as in find -L.  The default is to traverse the physical directory structure only, stay on the same device, and ignore symlinks - as in find -P -xdev.  Caution: a loop in the symbolic directory tree might lead to infinite traversal.

-v

Increase verbosity of logging to the standard error file descriptor. May be repeated to increase details.  The default verbosity is 0.
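The -t and -g descriptions mention that SIGUSR1 and SIGUSR2 trigger an immediate scan or grooming pass. A minimal sketch of sending those signals from a shell follows; the function names are hypothetical, and the caller is assumed to know the process ID of the running debuginfod.

```shell
# request_rescan PID: ask a running debuginfod to start a new scan now.
request_rescan() {
    kill -USR1 "$1"
}

# request_groom PID: ask a running debuginfod to start a grooming pass now.
request_groom() {
    kill -USR2 "$1"
}
```

If exactly one debuginfod is running, the process ID might be obtained with pgrep:

% request_rescan "$(pgrep -x debuginfod)"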

Webapi

debuginfod's webapi resembles ordinary file service, where a GET request with a path containing a known buildid results in a file. Unknown buildid / request combinations result in HTTP error codes. This file service resemblance is intentional, so that an installation can take advantage of standard HTTP management infrastructure.

There are three buildid-based requests, plus a /metrics endpoint described at the end of this section.  In each buildid-based request, the buildid is encoded as a lowercase hexadecimal string.  For example, for a program /bin/ls, look at the ELF note GNU_BUILD_ID:

% readelf -n /bin/ls | grep -A4 build.id
Note section [ 4] '.note.gnu.buildid' of 36 bytes at offset 0x340:
Owner          Data size  Type
GNU                   20  GNU_BUILD_ID
Build ID: 8713b9c3fb8a720137a4a08b325905c7aaf8429d

Then the hexadecimal BUILDID is simply:

8713b9c3fb8a720137a4a08b325905c7aaf8429d
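The lookup above can be scripted. The following sketch pulls the hexadecimal buildid out of readelf -n output and forms a request URL from it; the function names and the server address http://localhost:8002 (the default port on the local machine) are assumptions made for illustration.

```shell
# extract_buildid: read "readelf -n" output on stdin and print the
# hexadecimal buildid from the "Build ID:" line.
extract_buildid() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-f]\{1,\}\).*/\1/p'
}

# buildid_url SERVER BUILDID KIND
# Print the webapi URL for KIND, which is "debuginfo" or "executable".
buildid_url() {
    printf '%s/buildid/%s/%s\n' "$1" "$2" "$3"
}
```

A download with curl could then look like this:

% id=$(readelf -n /bin/ls | extract_buildid)
% curl -f -o ls.debug "$(buildid_url http://localhost:8002 "$id" debuginfo)"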

/buildid/BUILDID/debuginfo

If the given buildid is known to the server, this request will result in a binary object that contains the customary .*debug_* sections.  This may be a split debuginfo file as created by strip, or it may be an original unstripped executable.

/buildid/BUILDID/executable

If the given buildid is known to the server, this request will result in a binary object that contains the normal executable segments.  This may be an executable stripped by strip, or it may be an original unstripped executable.  ET_DYN shared libraries are considered to be a type of executable.

/buildid/BUILDID/source/SOURCE/FILE

If the given buildid is known to the server, this request will result in a binary object that contains the source file mentioned.  The path should be absolute.  Relative path names commonly appear in the DWARF file's source directory, but these paths are relative to individual compilation unit AT_comp_dir paths, and yet an executable is made up of multiple CUs.  Therefore, to disambiguate, debuginfod expects source queries to prefix relative path names with the CU compilation-directory, followed by a mandatory "/".

Note: contrary to RFC 3986, the client should not elide ../ or /./ or extraneous /// sorts of path components in the directory names, because if this is how those names appear in the DWARF files, that is what debuginfod needs to see too.

For example:

#include <stdio.h>    /buildid/BUILDID/source/usr/include/stdio.h
/path/to/foo.c        /buildid/BUILDID/source/path/to/foo.c
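The path rules for source requests can be sketched as a small helper. It prefixes a relative file name with the CU compilation directory and a "/", and deliberately leaves ../ and /./ components untouched. The function name source_url and the server address are assumptions for illustration.

```shell
# source_url SERVER BUILDID COMP_DIR FILE
# Print the webapi URL for a source file.  A relative FILE is prefixed with
# the CU compilation directory and "/"; no path component is normalized.
source_url() {
    server=$1
    buildid=$2
    comp_dir=$3
    file=$4
    case $file in
        /*) path=$file ;;             # already absolute: use as-is
        *)  path=$comp_dir/$file ;;   # relative: prepend COMP_DIR and "/"
    esac
    printf '%s/buildid/%s/source%s\n' "$server" "$buildid" "$path"
}
```

When fetching such a URL with curl, the --path-as-is option keeps curl from squashing /../ and /./ sequences before sending the request:

% curl --path-as-is -f -O "$(source_url http://localhost:8002 BUILDID /usr/src/build ../lib/foo.c)"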

/metrics

This endpoint returns a Prometheus formatted text/plain dump of a variety of statistics about the operation of the debuginfod server. The exact set of metrics and their meanings may change in future versions.  Caution: configuration information (path names, versions) may be disclosed.

Data Management

debuginfod stores its index in an sqlite database in a densely packed set of interlinked tables.  While the representation is as efficient as we have been able to make it, it still takes a considerable amount of data to record all debuginfo-related data of potentially a great many files.  This section offers some advice about the implications.

As a general explanation for size, consider that when debuginfod indexes ELF/DWARF files, it stores their names, their referenced source file names, and their buildids.  When indexing RPMs, it stores every file name of or in an RPM, every buildid, plus every source file name referenced from a DWARF file.  (Indexing RPMs takes more space because the source files often reside in separate subpackages that may not be indexed at the same pass, so extra metadata has to be kept.)

Getting down to numbers, in the case of Fedora RPMs (essentially, gzip-compressed cpio files), the sqlite index database tends to be from 0.5% to 3% of their size.  It's larger for binaries that are assembled out of a great many source files, or packages that carry much debuginfo-unrelated content.  It may be even larger during the indexing phase due to temporary sqlite write-ahead-logging files; these are checkpointed (cleaned out and removed) at shutdown.  It may be helpful to apply tight -I or -X regular-expression constraints to exclude files from scanning that you know have no debuginfo-relevant content.
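As a rough planning aid, the 0.5% to 3% ratio can be turned into a quick estimate. This is only a sketch with a hypothetical function name; it uses integer arithmetic and ignores the temporary write-ahead-logging files mentioned above.

```shell
# index_size_range RPM_DATASET_MB
# Print "LOW HIGH": the expected index size in MB at 0.5% and 3% of the
# RPM dataset size (integer arithmetic, rounded down).
index_size_range() {
    total_mb=$1
    low=$(( total_mb * 5 / 1000 ))    # 0.5%
    high=$(( total_mb * 3 / 100 ))    # 3%
    printf '%s %s\n' "$low" "$high"
}
```

For a 200000 MB RPM collection, index_size_range 200000 prints "1000 6000", that is, roughly 1 GB to 6 GB of index.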

As debuginfod runs, it periodically rescans its target directories, and any new content found is added to the database.  Old content, such as data for files that have disappeared or that have been replaced with newer versions, is removed at a periodic grooming pass.  This means that the sqlite files grow fast during initial indexing, slowly during index rescans, and periodically shrink during grooming.  An optional one-shot maximal grooming pass is also available.  It removes debuginfo-unrelated data from the RPM content index, such as file names found in RPMs ("rpm sdef" records) that are not referred to as source files from any binaries found in RPMs ("rpm sref" records).  This can save considerable disk space.  However, it is slow and temporarily requires up to twice the database size as free space.  Worse: it may result in missing source-code info if the RPM traversals were interrupted, so that not all source file references were known.  Use it rarely to polish a complete index.

You should ensure that ample disk space remains available.  (The flood of error messages on -ENOSPC is ugly and nagging.  But, like for most other errors, debuginfod will resume when resources permit.)  If necessary, debuginfod can be stopped, the database file moved or removed, and debuginfod restarted.

sqlite offers several performance-related options in the form of pragmas.  Some may be useful to fine-tune the defaults plus the debuginfod extras.  The -D option may be useful to tell debuginfod to execute the given bits of SQL after the basic schema creation commands.  For example, the "synchronous", "cache_size", "auto_vacuum", "threads", "journal_mode" pragmas may be fun to tweak via -D, if you're searching for peak performance.  The "optimize", "wal_checkpoint" pragmas may be useful to run periodically, outside debuginfod.  The default settings are performance- rather than reliability-oriented, so a hardware crash might corrupt the database. In these cases, it may be necessary to manually delete the sqlite database and start over.
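The periodic maintenance mentioned above could be run from cron or by hand with the sqlite3 command-line shell. A minimal sketch, assuming sqlite3(1) is installed; the function name and the SQLITE3 override variable are hypothetical conveniences, not part of debuginfod.

```shell
# Program used to talk to the database; override SQLITE3 for a dry run.
SQLITE3=${SQLITE3:-sqlite3}

# periodic_maintenance [DBFILE]
# Run the "wal_checkpoint" and "optimize" pragmas against the index file.
# DBFILE defaults to the default debuginfod database path.
periodic_maintenance() {
    db=${1:-${HOME:-}/.debuginfod.sqlite}
    "$SQLITE3" "$db" 'PRAGMA wal_checkpoint(TRUNCATE); PRAGMA optimize;'
}
```

Tuning pragmas go through -D at startup instead.  The values below are illustrative only, not recommendations:

% debuginfod -D 'PRAGMA cache_size = -20000;' -D 'PRAGMA synchronous = OFF;' -R /path/to/rpms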

As debuginfod changes in the future, we may have no choice but to change the database schema in an incompatible manner.  If this happens, new versions of debuginfod will issue SQL statements to drop all prior schema & data, and start over.  So, disk space will not be wasted for retaining a no-longer-useable dataset.

In summary, if your system can bear a 0.5%-3% index-to-RPM-dataset size ratio, and slow growth afterwards, you should not need to worry about disk space.  If a system crash corrupts the database, or you want to force debuginfod to reset and start over, simply erase the sqlite file before restarting debuginfod.

Security

debuginfod does not include any particular security features.  While it is robust with respect to inputs, some abuse is possible.  It forks a new thread for each incoming HTTP request, which could lead to a denial-of-service in terms of RAM, CPU, disk I/O, or network I/O.  If this is a problem, users are advised to install debuginfod with an HTTPS reverse-proxy front-end that enforces site policies for firewalling, authentication, integrity, authorization, and load control.  The /metrics webapi endpoint is probably not appropriate for disclosure to the public.

When relaying queries to upstream debuginfods, debuginfod does not include any particular security features.  It trusts that the binaries returned by the debuginfods are accurate.  Therefore, the list of servers should include only trustworthy ones.  If accessed across HTTP rather than HTTPS, the network should be trustworthy.  Authentication information through the internal libcurl library is not currently enabled.

Environment Variables

DEBUGINFOD_URLS

This environment variable contains a list of URL prefixes for trusted debuginfod instances.  Alternate URL prefixes are separated by space. Avoid referential loops that cause a server to contact itself, directly or indirectly - the results would be hilarious.

DEBUGINFOD_TIMEOUT

This environment variable governs the timeout for each debuginfod HTTP connection.  A server that fails to provide at least 100K of data within this many seconds is skipped. The default is 90 seconds.  (Zero or negative means "no timeout".)

DEBUGINFOD_CACHE_PATH

This environment variable governs the location of the cache where downloaded files are kept.  It is cleaned periodically as this program is reexecuted.  The default is $HOME/.debuginfod_client_cache.
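To check what a debuginfod started from the current shell would inherit, the three variables can be printed with their documented defaults filled in. This is a sketch; the function name effective_client_env is hypothetical.

```shell
# effective_client_env: print the upstream-related settings that would be
# inherited from the current environment, using the documented defaults
# (90 second timeout, $HOME/.debuginfod_client_cache) when a variable is unset.
effective_client_env() {
    printf 'timeout=%s\n' "${DEBUGINFOD_TIMEOUT:-90}"
    printf 'cache=%s\n' "${DEBUGINFOD_CACHE_PATH:-${HOME:-}/.debuginfod_client_cache}"
    # Unquoted expansion splits the space-separated URL list into words;
    # set -f keeps the shell from glob-expanding them.
    set -f
    for url in ${DEBUGINFOD_URLS:-}; do
        printf 'upstream=%s\n' "$url"
    done
    set +f
}
```

For example, with two upstream servers (the host names here are placeholders):

% export DEBUGINFOD_URLS="http://a.example:8002 http://b.example:8002"
% effective_client_env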

Files

$HOME/.debuginfod.sqlite

Default database file.

$HOME/.debuginfod_client_cache

Default cache directory for content from upstream debuginfods.

See Also

debuginfod-find(1) sqlite3(1) https://prometheus.io/docs/instrumenting/exporters/
