corosync-qdevice - Man Page

QDevice daemon

Synopsis

corosync-qdevice [-dfh] [-S option=value[,option2=value2,...]]

Description

corosync-qdevice is a daemon running on each node of a cluster. It provides a configured number of votes to the quorum subsystem based on a third-party arbitrator's decision. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules allow. It is recommended for clusters with an even number of nodes and highly recommended for two-node clusters.

Options

-d

Forcefully turn on debug information without the need to change corosync.conf. To also bump the priority of syslog messages to info, use this option twice.

-f

Do not daemonize, run in the foreground.

-h

Show short help text.

-S

Set advanced settings, which are described in their own section below. This option should generally not be used, because most of the settings are not safe to change.

Configuration

corosync-qdevice reads its configuration from the corosync.conf file.

The main configuration is within the quorum.device sub-key. Each model also has its own configuration within a similarly named sub-key.

model

Specifies the model to be used. This parameter is required. corosync-qdevice is modular and is able to support multiple different models. The model basically defines what type of arbitrator is used. Currently only net is supported.

timeout

Specifies how often corosync-qdevice should call the votequorum_qdevice_poll function. It is also used by the net model to adjust its heartbeat timeout. It is recommended that you don't change this value. Default is 10000.

sync_timeout

Specifies how often corosync-qdevice should call the votequorum_qdevice_poll function during a sync phase. It is recommended that you don't change this value. Default is 30000.

votes

The number of votes provided to the cluster by qdevice. Default is (number_of_nodes - 1) or generally sum(votes_per_node) - 1.

The quorum.device.heuristics subkey holds the configuration of the heuristics. Heuristics are a set of commands executed locally on startup, on cluster membership change, on successful connection to corosync-qnetd and, optionally, at regular intervals. Commands are executed in parallel. When all commands finish successfully (their exit code is zero) on time, the heuristics have passed; otherwise they have failed. The heuristics result is sent to corosync-qnetd, where it is used in calculations to determine which partition should be quorate.

timeout

Specifies the maximum time, in milliseconds, that corosync-qdevice waits for the heuristics commands to finish. If a command doesn't finish before the timeout, it is killed and the heuristics fail. This timeout is used for heuristics executed at regular times. Default value is half of quorum.device.timeout, so 5000.

sync_timeout

Similar to quorum.device.heuristics.timeout but used during membership changes. Default value is half of quorum.device.sync_timeout, so 15000.

interval

Specifies the interval between two regular heuristics executions. Default value is 3 * quorum.device.timeout, so 30000.

mode

Can be one of on, sync or off and specifies the mode of operation of heuristics. Default is off, which means heuristics are disabled. When sync is set, heuristics are executed only during startup, on membership change and when the connection to corosync-qnetd is established. If heuristics should also run on a regular basis, set this option to on.

exec_NAME

Defines executables. NAME can be an arbitrary valid cmap key name string and has no special meaning. The value of this variable must contain a command to execute. The value is parsed (split) into arguments much as the Bourne shell would do. Quoting is possible by using backslash and double quotes.
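As an illustration of the heuristics keys described above, the following fragment is a minimal sketch that enables regular execution and writes the documented default timings out explicitly. The host name, the exec_ key names and the commands themselves are hypothetical placeholders. Absolute paths are used because execv is the default (see heuristics_use_execvp in the advanced settings).

```
quorum {
  provider: corosync_votequorum
  device {
    model: net
    net {
      host: qnetd.example.org
    }
    heuristics {
      # Run on startup, membership change, connect and at regular times
      mode: on
      # The three values below equal the documented defaults
      timeout: 5000
      sync_timeout: 15000
      interval: 30000
      # Hypothetical checks; every command must exit with 0 to pass
      exec_gateway: /bin/ping -q -c 1 "192.0.2.254"
      exec_flag: /usr/bin/test -f /tmp/test.txt
    }
  }
}
```

With these settings, each command has up to 5000 ms to finish during a regular run and up to 15000 ms during a membership change.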

The quorum.device.net subkey holds the configuration for model net.

tls

Can be one of on, off or required and specifies whether TLS should be used. on means a connection with TLS is attempted first, but if the server doesn't advertise TLS support then non-TLS will be used. off means TLS is not required and is not even tried. This mode is the only one which doesn't need a properly initialized NSS database. required means TLS is required and if the server doesn't support TLS, qdevice will exit with an error message. Default is on.

host

Specifies the IP address or host name of the qnetd server to be used. This parameter is required.

port

Specifies the TCP port of the qnetd server. Default is 5403.

algorithm

Decision algorithm. Can be one of ffsplit or lms. (There are also test and 2nodelms, both of which are mainly for developers and shouldn't be used for production clusters.) For a description of what each algorithm means and how the algorithms differ, see their individual sections. Default value is ffsplit.

tie_breaker

Can be one of lowest, highest or valid_node_id (number) values. It's used as a fallback if qdevice has to decide between two or more equal partitions. lowest means the partition with the lowest node id is chosen. highest means the partition with the highest node id is chosen. valid_node_id means that the partition containing the node with the given node id is chosen. Default is lowest.

connect_timeout

Timeout when corosync-qdevice is trying to connect to corosync-qnetd host. Default is 0.8 * quorum.device.timeout.

force_ip_version

Can be one of 0, 4 or 6 and forces the software to use the given IP version. 0 (default value) means IPv6 is preferred and IPv4 should be used as a fallback.

keep_active_partition_tie_breaker

Can be one of on or off and specifies whether the keep active partition tie breaker should be used. When this option is enabled and a tie happens, QNetd will prefer the partition with members of the previously active (quorate) partition. This is hard-coded behavior of the LMS algorithm, so this setting affects only the FFSplit algorithm. Default is on.
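The net model keys above can be combined as in the following sketch. The host name is a placeholder, and most values simply restate the documented defaults. The exceptions, chosen only for illustration, are tls: required and force_ip_version: 4.

```
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      # Refuse to run without TLS instead of falling back to non-TLS
      tls: required
      host: qnetd.example.org
      port: 5403
      algorithm: ffsplit
      tie_breaker: lowest
      # Use IPv4 only; the default 0 prefers IPv6 with IPv4 fallback
      force_ip_version: 4
      # 0.8 * quorum.device.timeout with the default timeout of 10000
      connect_timeout: 8000
      keep_active_partition_tie_breaker: on
    }
  }
}
```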

Logging configuration is within the logging directive. corosync-qdevice parses and supports only the debug option. The logger_subsys sub-directive can also be used if subsys is set to QDEVICE.
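A minimal sketch of the logging setup described above, turning on debug output for corosync-qdevice only via a logger_subsys sub-directive:

```
logging {
  logger_subsys {
    # Only the debug option is honoured by corosync-qdevice
    subsys: QDEVICE
    debug: on
  }
}
```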

For corosync-qdevice to work correctly, the nodelist directive has to be used and properly configured. The net model also requires that the totem.cluster_name option is set.
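The two requirements above might look as follows for a two-node cluster. This is only a sketch: the cluster name and the addresses are placeholders, and the remaining totem options are left out. See corosync.conf(5) for the full set of nodelist and totem keys.

```
totem {
  version: 2
  # Required by the net model
  cluster_name: example-cluster
}

nodelist {
  node {
    ring0_addr: 192.0.2.11
    nodeid: 1
  }
  node {
    ring0_addr: 192.0.2.12
    nodeid: 2
  }
}
```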

Model Net TLS Configuration

For model net to work using TLS, it's necessary to create the NSS database, import Qnetd CA certificate, and get/distribute a valid client certificate.

If pcs is used (recommended), the following steps are not needed because pcs does them automatically.

corosync-qdevice-net-certutil is the tool to perform the required actions semi-automatically. Please consult its help output and its man page. For a first time configuration it may make sense to start with the -Q option.

If TLS is not required, just edit the corosync.conf file and set quorum.device.net.tls to off.
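For the non-TLS case just mentioned, the relevant part of corosync.conf could look like this sketch (the host name is a placeholder). As noted in the description of the tls option, this is the only mode that does not need an initialized NSS database.

```
quorum {
  provider: corosync_votequorum
  device {
    model: net
    net {
      # TLS is neither required nor attempted
      tls: off
      host: qnetd.example.org
    }
  }
}
```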

Depending on the configuration of NSS (stored in the nss.config file, usually in the /etc/crypto-policies/back-ends/ directory), disabled ciphers or keys that are too short may be rejected. The proper solution is to regenerate the NSS databases for both the corosync-qnetd and corosync-qdevice daemons. As a quick workaround it's also possible to set the environment variable NSS_IGNORE_SYSTEM_POLICY=1 before running the corosync-qdevice daemon.

When NSS is updated, it may also be necessary to upgrade the database to the new format. There is no consensus on the recommended way, but the following command seems to work just fine (if the qdevice sysconfdir is set to /etc):

# certutil -N -d /etc/corosync/qdevice/net/nssdb -f /etc/corosync/qdevice/net/nssdb/pwdfile.txt

Model Net Algorithms

Algorithms change how corosync-qnetd provides votes to a given node/partition. Currently there are two algorithms supported.

ffsplit

This one makes sense only for clusters with an even number of nodes. It provides exactly one vote to the partition with the highest number of active nodes. If there are two partitions of exactly the same size, it provides its vote to the partition with the higher score. The score is computed as (number_of_connected_nodes + number_of_connected_nodes_with_passed_heuristics - number_of_connected_nodes_with_failed_heuristics). If the scores are equal, the vote is provided to the partition with the most clients connected to the qnetd server. If this number is also equal, then the tie_breaker is used. It is able to transition its vote if the currently active partition becomes partitioned and a non-active partition still has at least 50% of the active nodes. Because of this, a vote is not provided if the qnetd connection is not active.

To use this algorithm, the number of votes per node must be set to 1 (default) and the qdevice number of votes must also be 1. This is achieved by setting the quorum.device.votes key in the corosync.conf file to 1.

lms

Last-man-standing. If the node is the only one left in the cluster that can see the qnetd server, then it is given a vote.

If more than one node can see the qnetd server but some nodes can't see each other, the cluster is divided into 'partitions' based on their ring_id, and this algorithm returns a vote to the partition with the highest heuristics score (computed the same way as for the ffsplit algorithm). If more than one partition has the same score, the vote goes to the largest active partition, and if more than one partition is equally large, to the partition that contains the tie_breaker node (lowest, highest, etc). For LMS to work, the number of qdevice votes has to be left at its default (so just delete the quorum.device.votes key from corosync.conf).
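A minimal sketch of a device section for the lms algorithm, following the requirement above that quorum.device.votes is not set. The host name is a placeholder.

```
quorum {
  provider: corosync_votequorum
  device {
    # No votes key here: lms needs the default number of votes
    model: net
    net {
      host: qnetd.example.org
      algorithm: lms
      tie_breaker: lowest
    }
  }
}
```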

Advanced Settings

Set by using the -S option. (The default value is shown in parentheses.) Options beginning with the net_ prefix are specific to model net.

lock_file

Lock file location. (/var/run/corosync-qdevice/corosync-qdevice.pid)

local_socket_file

Internal IPC socket file location. (/var/run/corosync-qdevice/corosync-qdevice.sock)

local_socket_backlog

Parameter passed to listen syscall. (10)

max_cs_try_again

How many times to retry the call to a corosync function which has returned CS_ERR_TRY_AGAIN. (10)

votequorum_device_name

Name used for qdevice registration. (Qdevice)

ipc_max_clients

Maximum allowed simultaneous IPC clients. (10)

ipc_max_receive_size

Maximum size of a message received by IPC client. (4096)

ipc_max_send_size

Maximum size of a message allowed to be sent to an IPC client. (65536)

master_wins

Force enable/disable master wins. (default is model)

heuristics_ipc_max_send_buffers

Maximum number of heuristics worker send buffers. (128)

heuristics_ipc_max_send_receive_size

Maximum size of a message allowed to be sent to, or received from, a heuristics worker. (4096)

heuristics_min_timeout

Minimum heuristics timeout accepted by client in ms. (1000)

heuristics_max_timeout

Maximum heuristics timeout accepted by client in ms. (120000)

heuristics_min_interval

Minimum heuristics interval accepted by client in ms. (1000)

heuristics_max_interval

Maximum heuristics interval accepted by client in ms. (3600000)

heuristics_max_execs

Maximum number of exec_ commands. (32)

heuristics_use_execvp

Use execvp instead of execv for executing commands. (off)

heuristics_max_processes

Maximum number of processes running at one time. (160)

heuristics_kill_list_interval

Interval, in ms, at which process status is gathered and a signal is eventually sent to processes which didn't finish on time. (5000)

net_nss_db_dir

NSS database directory. (/etc/corosync/qdevice/net/nssdb)

net_initial_msg_receive_size

Initial (used during connection parameters negotiation) maximum size of the receive buffer for message (maximum allowed message size received from qnetd). (32768)

net_initial_msg_send_size

Initial (used during connection parameter negotiation) maximum size of one send buffer (message) to be sent to server. (32768)

net_min_msg_send_size

Minimum required size of one send buffer (message) to be sent to server. (32768)

net_max_msg_receive_size

Maximum allowed size of receive buffer for a message sent by server. (16777216)

net_max_send_buffers

Maximum number of send buffers. (10)

net_nss_qnetd_cn

Common name (CN) of the qnetd server certificate. (Qnetd Server)

net_nss_client_cert_nickname

NSS nickname of qdevice client certificate. (Cluster Cert)

net_heartbeat_interval_min

Minimum heartbeat timeout accepted by client in ms. (1000)

net_heartbeat_interval_max

Maximum heartbeat timeout accepted by client in ms. (120000)

net_min_connect_timeout

Minimum connection timeout accepted by client in ms. (1000)

net_max_connect_timeout

Maximum connection timeout accepted by client in ms. (120000)

net_test_algorithm_enabled

Enable test algorithm. (if built with --enable-debug on, otherwise off)

Example

Define qdevice with net model connecting to qnetd running on qnetd.example.org host, using ffsplit algorithm. Heuristics is set to sync mode and executes two commands.

quorum {
  provider: corosync_votequorum
  device {
    votes: 1
    model: net
    net {
      tls: on
      host: qnetd.example.org
      algorithm: ffsplit
    }
    heuristics {
      mode: sync
      exec_ping: /bin/ping -q -c 1 "www.example.org"
      exec_test_txt_exists: /usr/bin/test -f /tmp/test.txt
    }
  }
}

See Also

corosync-qdevice-tool(8) corosync-qdevice-net-certutil(8) corosync-qnetd(8) corosync.conf(5) votequorum_qdevice_poll(3)

Author

Jan Friesse

Referenced By

corosync.conf(5), corosync-qdevice-net-certutil(8), corosync-qdevice-tool(8), corosync-qnetd(8), corosync-qnetd-certutil(8), corosync-qnetd-tool(8), pcs(8), votequorum(5).

2020-10-27