ioctl_xfs_health_monitor - Man Page
read filesystem health events from the kernel
Synopsis
#include <xfs/xfs_fs.h>
int ioctl(int dest_fd, XFS_IOC_HEALTH_MONITOR, struct xfs_health_monitor *arg);
Description
This XFS ioctl asks the kernel driver to create a pseudo-file from which information about adverse filesystem health events can be read. This new file will be installed into the file descriptor table of the calling process as a read-only file, and will have the close-on-exec flag set.
The specific behaviors of this health monitor file are requested via a structure of the following form:
struct xfs_health_monitor {
    __u64 flags;
    __u8 format;
    __u8 pad[23];
};
The field pad must be zero.
The field format controls the format of the event data that can be read:
- XFS_HEALTH_MONITOR_FMT_V0
Event data will be presented in discrete objects of type struct xfs_health_monitor_event. See below for more information.
The field flags controls the behavior of the monitor:
- XFS_HEALTH_MONITOR_VERBOSE
Return all health events, including affirmations of healthy metadata.
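The request described above could be issued roughly as follows. This is a minimal sketch, not the reference usage: the structure is redeclared locally with <stdint.h> types so that the fragment compiles without the XFS headers, and the helper name open_health_monitor and its request parameter are hypothetical. Real code would include <xfs/xfs_fs.h>, use struct xfs_health_monitor directly, and pass XFS_IOC_HEALTH_MONITOR and XFS_HEALTH_MONITOR_FMT_V0.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of struct xfs_health_monitor as documented above.
 * Assumption: redeclared only so this sketch builds without <xfs/xfs_fs.h>. */
struct health_monitor_req {
	uint64_t flags;
	uint8_t  format;
	uint8_t  pad[23];
};

/* Hypothetical helper: 'request' would be XFS_IOC_HEALTH_MONITOR and
 * 'format' would be XFS_HEALTH_MONITOR_FMT_V0 when the real header is used.
 * Returns the new file descriptor, or -1 with errno set. */
int open_health_monitor(int dest_fd, unsigned long request,
			uint8_t format, uint64_t flags)
{
	struct health_monitor_req arg;

	/* Zeroing the whole structure satisfies the "pad must be zero" rule. */
	memset(&arg, 0, sizeof(arg));
	arg.flags = flags;
	arg.format = format;

	return ioctl(dest_fd, request, &arg);
}
```

Passing 0 for flags requests the default behavior; XFS_HEALTH_MONITOR_VERBOSE would be passed to also receive affirmations of healthy metadata.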
Return Value
On error, -1 is returned, and errno is set to indicate the error. Otherwise, the return value is a new file descriptor.
Errors
Error codes can be one of, but are not limited to, the following:
- EEXIST
Health monitoring is already active for this filesystem.
- EPERM
The caller does not have permission to open a health monitor. The calling program must have administrative capability and run in the initial user namespace, and the file descriptor passed to ioctl must refer to the root directory of an XFS filesystem.
- EINVAL
One or more of the arguments specified is invalid.
- EFAULT
The argument could not be copied into the kernel.
- ENOMEM
There was not sufficient memory to construct the health monitor.
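A caller might translate these error codes into operator-facing hints along the following lines. This is only an illustrative sketch: the function name health_monitor_open_hint is hypothetical, the messages paraphrase the list above, and other errno values are possible.

```c
#include <errno.h>

/* Hypothetical helper: map errno from a failed XFS_IOC_HEALTH_MONITOR call
 * to a short hint, paraphrasing the documented error list. */
const char *health_monitor_open_hint(int err)
{
	switch (err) {
	case EEXIST:
		return "health monitoring is already active for this filesystem";
	case EPERM:
		return "permission denied: check capability, user namespace, "
		       "and that the fd is the root directory of an XFS filesystem";
	case EINVAL:
		return "invalid argument: check flags, format, and zeroed padding";
	case EFAULT:
		return "the argument could not be copied into the kernel";
	case ENOMEM:
		return "not enough memory to construct the health monitor";
	default:
		/* The documented list is not exhaustive. */
		return "unexpected error; see strerror(3)";
	}
}
```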
Event Format
Calling programs retrieve XFS health events by calling read(2) on the returned file descriptor. The read buffer must be large enough to hold at least one event object. Partial objects will not be returned; instead, a short read will occur.
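Because only whole objects are returned, one read(2) can be split into fixed-size records. The sketch below illustrates that pattern under stated assumptions: read_event_batch, event_cb, and count_event are hypothetical names, and event_size stands in for sizeof(struct xfs_health_monitor_event). It performs a single read and does not assume anything about blocking or end-of-file behavior of the monitor file.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: receives one whole event object. */
typedef void (*event_cb)(const void *event, void *ctx);

/* Perform one read(2) and hand each whole event to 'cb'.
 * Returns the number of events delivered, or -1 with errno set. */
ssize_t read_event_batch(int fd, void *buf, size_t bufsize,
			 size_t event_size, event_cb cb, void *ctx)
{
	ssize_t n;
	size_t off, count = 0;

	/* The buffer must hold at least one event object. */
	if (event_size == 0 || bufsize < event_size) {
		errno = EINVAL;
		return -1;
	}

	do {
		n = read(fd, buf, bufsize);
	} while (n < 0 && errno == EINTR);
	if (n < 0)
		return -1;

	/* Walk the whole objects contained in the bytes actually read. */
	for (off = 0; off + event_size <= (size_t)n; off += event_size) {
		cb((const char *)buf + off, ctx);
		count++;
	}
	return (ssize_t)count;
}

/* Example callback: counts events in an int supplied through 'ctx'. */
void count_event(const void *event, void *ctx)
{
	(void)event;
	++*(int *)ctx;
}
```

In real code the buffer would more naturally be an array of struct xfs_health_monitor_event, which also guarantees suitable alignment for accessing the fields.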
Events will be returned in the following format:
struct xfs_health_monitor_event {
    __u32 domain;
    __u32 type;
    __u64 time_ns;
    union {
        struct xfs_health_monitor_lost lost;
        struct xfs_health_monitor_fs fs;
        struct xfs_health_monitor_group group;
        struct xfs_health_monitor_inode inode;
        struct xfs_health_monitor_shutdown shutdown;
        struct xfs_health_monitor_media media;
        struct xfs_health_monitor_filerange filerange;
    } e;
    __u64 pad[2];
};
The field time_ns records the timestamp at which the health event was generated, in units of nanoseconds since the Unix epoch.
The field pad will be zero.
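Since time_ns counts nanoseconds since the Unix epoch, it can be split into a struct timespec for use with the usual time functions. A minimal sketch, where event_time_to_timespec is a hypothetical helper name:

```c
#include <stdint.h>
#include <time.h>

/* Split a nanosecond timestamp (as found in time_ns) into seconds and
 * nanoseconds. */
struct timespec event_time_to_timespec(uint64_t time_ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(time_ns / 1000000000ULL);
	ts.tv_nsec = (long)(time_ns % 1000000000ULL);
	return ts;
}
```

The tv_sec member can then be passed to localtime(3) or gmtime(3) for display.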
The field domain indicates the scope of the filesystem affected by the event:
- XFS_HEALTH_MONITOR_DOMAIN_MOUNT
The entire filesystem is affected.
- XFS_HEALTH_MONITOR_DOMAIN_FS
Metadata concerning the entire filesystem is affected. Details are available through the fs field.
- XFS_HEALTH_MONITOR_DOMAIN_AG
Metadata concerning a specific allocation group is affected. Details are available through the group field.
- XFS_HEALTH_MONITOR_DOMAIN_RTGROUP
Metadata concerning a specific realtime allocation group is affected. Details are available through the group field.
- XFS_HEALTH_MONITOR_DOMAIN_INODE
File metadata is affected. Details are available through the inode field.
- XFS_HEALTH_MONITOR_DOMAIN_DATADEV
The main data volume is affected. Details are available through the media field.
- XFS_HEALTH_MONITOR_DOMAIN_RTDEV
The realtime volume is affected. Details are available through the media field.
- XFS_HEALTH_MONITOR_DOMAIN_LOGDEV
The external log is affected. Details are available through the media field.
- XFS_HEALTH_MONITOR_DOMAIN_FILERANGE
File data is affected. Details are available through the filerange field.
The field type indicates what was affected by a health event:
The following types apply to events from the MOUNT domain.
- XFS_HEALTH_MONITOR_TYPE_RUNNING
This filesystem health monitor is now running.
- XFS_HEALTH_MONITOR_TYPE_LOST
Health events were lost. Details are available through the lost field.
- XFS_HEALTH_MONITOR_TYPE_UNMOUNT
The filesystem is being unmounted.
- XFS_HEALTH_MONITOR_TYPE_SHUTDOWN
The filesystem has shut down due to problems. Details are available through the shutdown field.
The following three types apply to events from the FS, AG, RTGROUP, and INODE domains.
- XFS_HEALTH_MONITOR_TYPE_SICK
Filesystem metadata has been scanned by online fsck and found to be corrupt.
- XFS_HEALTH_MONITOR_TYPE_CORRUPT
A metadata corruption problem was encountered during a filesystem operation outside of fsck.
- XFS_HEALTH_MONITOR_TYPE_HEALTHY
Filesystem metadata has either been scanned by online fsck and found to be in good condition, or it has been repaired to good condition.
The following type applies to events from the DATADEV, RTDEV, and LOGDEV domains.
- XFS_HEALTH_MONITOR_TYPE_MEDIA_ERROR
A media error has been observed on one of the storage devices that can be attached to an XFS filesystem.
The following types apply to events from the FILERANGE domain.
- XFS_HEALTH_MONITOR_TYPE_BUFREAD
An attempt to read (or readahead) from a file failed with an I/O error.
- XFS_HEALTH_MONITOR_TYPE_BUFWRITE
An attempt to write dirty data to storage failed with an I/O error.
- XFS_HEALTH_MONITOR_TYPE_DIOREAD
A direct read of file data from storage failed with an I/O error.
- XFS_HEALTH_MONITOR_TYPE_DIOWRITE
A direct write of file data to storage failed with an I/O error.
- XFS_HEALTH_MONITOR_TYPE_DATALOST
A latent media error was discovered on the storage backing part of this file.
The union e contains further details about the health event:
The kernel will use no more than 32KiB of memory per monitoring file to queue health events. If this limit is exceeded, an event will be generated to describe how many events were lost:
struct xfs_health_monitor_lost {
    __u64 count;
};
The count field records the number of events lost.
If whole-filesystem metadata experiences a health event, the exact type of that metadata is recorded as follows:
struct xfs_health_monitor_fs {
    __u32 mask;
};
The mask field will contain XFS_FSOP_GEOM_SICK_* flags that are documented in the ioctl_xfs_fsgeometry(2) manual page.
If an allocation group (realtime or data) experiences a health event, the exact type and location of the metadata are recorded as follows:
struct xfs_health_monitor_group {
    __u32 mask;
    __u32 gno;
};
The mask field will contain XFS_AG_SICK_* flags that are documented in the ioctl_xfs_ag_geometry(2) manual page, or XFS_RTGROUP_SICK_* flags that are documented in the ioctl_xfs_rtgroup_geometry(2) manual page.
The gno field will contain the group number.
If a file experiences a health event, the exact type and a handle to the file are recorded as follows:
struct xfs_health_monitor_inode {
    __u32 mask;
    __u32 gen;
    __u64 ino;
};
The mask field will contain XFS_BS_SICK_* flags that are documented in the ioctl_xfs_bulkstat(2) manual page.
The ino and gen fields describe a handle to the affected file.
If the filesystem shuts down abnormally, the exact reasons are recorded as follows:
struct xfs_health_monitor_shutdown {
    __u32 reasons;
};
The reasons field is a combination of the following values:
- XFS_HEALTH_SHUTDOWN_META_IO_ERROR
Metadata I/O errors were encountered.
- XFS_HEALTH_SHUTDOWN_LOG_IO_ERROR
Log I/O errors were encountered.
- XFS_HEALTH_SHUTDOWN_FORCE_UMOUNT
The filesystem was forcibly shut down by an administrator.
- XFS_HEALTH_SHUTDOWN_CORRUPT_INCORE
In-memory metadata are corrupt.
- XFS_HEALTH_SHUTDOWN_CORRUPT_ONDISK
On-disk metadata are corrupt.
- XFS_HEALTH_SHUTDOWN_DEVICE_REMOVED
Storage devices were removed.
If a media error is discovered on the storage device, the exact location is recorded as follows:
struct xfs_health_monitor_media {
    __u64 daddr;
    __u64 bbcount;
};
The daddr and bbcount fields describe the range of storage that was lost. Both are provided in units of 512-byte blocks.
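Because daddr and bbcount are expressed in 512-byte blocks, converting them to a byte offset and length is a matter of multiplying by 512. A small sketch, where struct byte_range and media_range_to_bytes are hypothetical names introduced for illustration:

```c
#include <stdint.h>

/* 512-byte units, as stated for daddr and bbcount. */
#define MEDIA_BLOCK_SHIFT 9

/* Hypothetical result type: a device range expressed in bytes. */
struct byte_range {
	uint64_t offset;
	uint64_t length;
};

/* Convert a (daddr, bbcount) pair into a byte offset and byte length. */
struct byte_range media_range_to_bytes(uint64_t daddr, uint64_t bbcount)
{
	struct byte_range r;

	r.offset = daddr << MEDIA_BLOCK_SHIFT;
	r.length = bbcount << MEDIA_BLOCK_SHIFT;
	return r;
}
```

For example, daddr 8 with bbcount 16 corresponds to 8192 bytes starting at byte offset 4096 of the affected device.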
If a problem is discovered with regular file data, the handle of the file and the exact range of the file are recorded as follows:
struct xfs_health_monitor_filerange {
    __u64 pos;
    __u64 len;
    __u64 ino;
    __u32 gen;
    __u32 error;
};
The ino and gen fields describe a handle to the affected file. The pos and len fields describe the range of file data that is affected. Both are provided in units of bytes.
The error field describes the error that occurred. See the errno(3) manual page for more information.
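A monitoring program might turn a file range event into a log line as sketched below. The structure is a local mirror of struct xfs_health_monitor_filerange using <stdint.h> types so that the fragment builds without the XFS headers, and describe_filerange is a hypothetical helper. It assumes that error holds a positive errno value; the text above does not state the sign convention.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of struct xfs_health_monitor_filerange as documented above.
 * Assumption: redeclared only so this sketch builds without <xfs/xfs_fs.h>. */
struct filerange_info {
	uint64_t pos;
	uint64_t len;
	uint64_t ino;
	uint32_t gen;
	uint32_t error;
};

/* Format a one-line description of the affected byte range.
 * Assumption: 'error' is a positive errno value.
 * Returns the snprintf(3) result. */
int describe_filerange(const struct filerange_info *fr, char *out, size_t outlen)
{
	return snprintf(out, outlen,
			"inode %" PRIu64 " gen %" PRIu32
			": bytes [%" PRIu64 ", %" PRIu64 "): %s",
			fr->ino, fr->gen,
			fr->pos, fr->pos + fr->len,
			strerror((int)fr->error));
}
```

The half-open interval [pos, pos + len) is used so that adjacent ranges can be compared without off-by-one adjustments.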
Conforming to
This API is specific to the XFS filesystem on the Linux kernel.
See Also
Referenced By
ioctl_xfs_health_fd_on_monitored_fs(2), ioctl_xfs_verify_media(2).