Systemd unit to reconfigure specific DRBD aspects




If a disk fails or a replication link goes down, and DRBD consequently loses "quorum" or access to good data, DRBD can be configured to either suspend I/O (suspend-io) or return I/O errors (io-error).
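As a rough sketch, the two policies can be switched at runtime with drbdsetup resource-options. The option names --on-no-quorum and --on-no-data-accessible are the DRBD 9 spellings. That these two are the relevant options is a reading of the paragraph above, not something this page states. The resource name r0 and the helper policy_cmd are hypothetical. The helper only prints the command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Assumption: "r0" is a hypothetical resource name standing in for RESNAME.
# policy_cmd is an illustrative helper, not part of drbd-utils.

# Print (do not run) the drbdsetup call that sets both policies.
policy_cmd() {
    res=$1
    policy=$2
    case "$policy" in
        suspend-io|io-error) ;;   # the two policies discussed in this page
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    echo "drbdsetup resource-options $res --on-no-quorum=$policy --on-no-data-accessible=$policy"
}

policy_cmd r0 suspend-io   # block I/O until good data is reachable again
policy_cmd r0 io-error     # fail I/O so services can be stopped
```

The printed line can be pasted into a root shell once it has been checked against the local DRBD version.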

Returning I/O errors (io-error) allows file systems and applications to notice the failure and stop themselves or be unmounted. The assumption is that some other node or nodes still have good data, and that a cluster manager or other monitoring entity will notice the situation and try to bring up services there.

suspend-io hides the incident from file systems and applications; they simply block until access to good data is restored. This can be useful for riding out incidents that are expected to be short, such as a "network hiccup", without causing service restarts.

But while I/O is "suspended" (blocked, frozen, also known as "uninterruptible sleep" or "D state"), applications cannot be killed and file systems cannot be unmounted.
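To see which processes are currently stuck this way, one option is to filter ps output for a state that starts with D. This is a generic Linux sketch, not something specific to DRBD. The helper name d_state is hypothetical, and a process in D state is not necessarily blocked on a DRBD device.

```shell
#!/bin/sh
# d_state is an illustrative helper: it reads "PID STAT COMMAND" lines
# (as printed by ps) and keeps those in uninterruptible sleep.
d_state() {
    # Skip the header line, then match a state column beginning with "D".
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# List PID and command name of every process currently in D state.
ps -eo pid,stat,comm | d_state
```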

If some other node was told to take over services in the meantime, DRBD must be demoted on the "frozen" node before that node can be re-integrated.

In these scenarios it may be useful to configure DRBD for suspend-io during normal operation, so that it masks intermittent problems. If services have been taken over by some other partition of nodes in the storage cluster, reconfigure for io-error, so that services can be brought down and file systems unmounted before trying to re-integrate this node.

This service reconfigures RESNAME for suspend-io when started and for io-error when stopped.
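A minimal sketch of driving this with systemctl follows. It assumes the template unit is installed as drbd-reconfigure-suspend-or-error@.service (this page does not spell out the unit name, so check with systemctl list-unit-files) and uses a hypothetical resource name r0 in place of RESNAME. The helpers unit_for and reconfigure are illustrative. By default the commands are only printed; set RUN=1 to execute them.

```shell
#!/bin/sh
# Assumption: unit template name as believed to be shipped by drbd-utils;
# verify it on the target system before relying on it.
UNIT_TEMPLATE="drbd-reconfigure-suspend-or-error"

# Build the unit instance name for a given resource.
unit_for() {
    printf '%s@%s.service\n' "$UNIT_TEMPLATE" "$1"
}

# start -> suspend-io, stop -> io-error (per the description above).
# Prints the command unless RUN=1 is set in the environment.
reconfigure() {
    action=$1
    res=$2
    if [ "${RUN:-0}" = "1" ]; then
        systemctl "$action" "$(unit_for "$res")"
    else
        echo "systemctl $action $(unit_for "$res")"
    fi
}

reconfigure start r0   # normal operation: mask intermittent problems
reconfigure stop r0    # after a takeover elsewhere: allow I/O errors
```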

Test this thoroughly, and consider talking to LINBIT support, before using it.

See also the DRBD User’s Guide[1].


LINBIT HA Solutions GmbH https://linbit.com



1. DRBD User’s Guide


2022-01-16 drbd-utils DRBD Manual