A high availability cluster (HA cluster) is a group of servers that minimizes downtime of virtual machines (VMs). High availability clusters are used, for example, to support database servers, store important information, and run business applications. If one of the cluster servers (nodes) loses connectivity with the other nodes or with the connected storage, VMmanager starts the VM relocation process:

  • move the VMs from the failed cluster node to the working nodes;
  • shut down the VMs on the failed cluster node;
  • isolate the failed cluster node.

Relocation is performed automatically without any administrator intervention.

High availability cluster requirements


In the current VMmanager version, you can create a high availability cluster under the following conditions:

  • license type — VMmanager-infrastructure;
  • virtualization type — KVM;
  • storage type — Ceph or SAN; 

    HA cluster testing with Ceph storage was not performed.

  • AlmaLinux 8 or CentOS 7 is installed on the cluster nodes;
  • the cluster includes at least three and no more than 24 nodes;
  • exactly one storage device is connected to the cluster;
  • the system time on all nodes is synchronized (see the check example after this list).
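
A quick way to confirm that the time is synchronized on a node, assuming chrony is used as the NTP client (the default on AlmaLinux 8 and CentOS 7):

  # Check whether the system clock is reported as synchronized
  timedatectl | grep -i synchron

  # If chrony is used, show the current time source and offset
  chronyc tracking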

Note

The functionality of an HA cluster with the IP-fabric network configuration type has a number of limitations. For example, the server with the platform cannot be migrated to a VM in such a cluster.

Operating logic


Services used

VMmanager uses the following to manage a high availability cluster:

  • Corosync software with Kronosnet network technology;
  • its own ha-agent service;
  • its own hawatch microservice.

The platform runs the ha-agent service on each cluster node. The ha-agent services communicate with each other using Corosync. Corosync algorithms assign one of the ha-agent services as the master. After that, the platform interacts only with this master service, via the hawatch microservice.
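
To see how Corosync views the cluster from any node, you can use the standard Corosync utilities (a diagnostic sketch; it assumes the Corosync packages installed on the node provide these tools):

  # Show the quorum state and the nodes Corosync currently sees
  corosync-quorumtool -s

  # Show the status of this node's Kronosnet links
  corosync-cfgtool -s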

Master selection procedure

Selection of the master takes place in the following situations:

  • when the high availability system is enabled in the cluster;
  • when the current master goes out of service;
  • when the HA cluster configuration is changed;
  • when the HA cluster version is updated.

For the selection to be successful, it must involve (N/2 + 1) nodes, where N is the total number of nodes in the cluster. The value (N/2 + 1) is rounded down to an integer. For example, in a cluster with two nodes, both nodes must participate; in a cluster with 17 nodes, 9 nodes are required. If there are fewer serviceable nodes in the cluster than necessary, the selection procedure will not start. If there are more serviceable nodes than necessary, only the nodes that were ready for the procedure first participate in the selection. Corosync algorithms ensure that the information on node readiness time is the same for all cluster members.
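
The required number of participants follows from simple integer arithmetic; for example, in a shell (a worked example of the formula above, not a platform tool):

  # Participants required for master selection: (N/2 + 1) rounded down
  N=17
  echo $(( N / 2 + 1 ))   # integer division truncates, so this prints 9

  N=24
  echo $(( N / 2 + 1 ))   # prints 13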

When selecting a master, each of the participating nodes uses a special algorithm to calculate its priority and informs the rest of the cluster members about it. The node with the highest priority is assigned as the master. After the master is assigned, the cluster starts operating in high availability mode.

Nodes that were not ready at the start of master selection join the cluster after the selection procedure is complete. When new nodes are added to the high availability cluster, the master is not reselected.

Usually the selection procedure takes about 15 seconds.

Cluster node statuses

In a high availability cluster, nodes can assume the following statuses:

  • serviceable:
    • master — the node is serviceable and has been selected as the master;
    • participant — the node is serviceable;
  • non-serviceable:
    • network isolation — the node is not accessible via the network, but the node has access to the storage;
    • storage isolation — the node has no access to the storage, but remains accessible via the network;
    • full failure — the node is unavailable via the network and has no access to the storage;
  • special:
    • excluded from HA cluster — the node is unavailable via the network, but the check IP address is reachable from the node;
    • network unstable — the node regularly loses network connectivity for periods shorter than 15 seconds.

 

HA cluster operating scheme

Legend:

  • Master — master node;
  • Slave — participant nodes;
  • Network storage — the cluster's network storage;
  • Management network — the network used to manage cluster nodes;
  • Case 1 — example of the "network isolation" status;
  • Case 2 — example of the "storage isolation" status;
  • Case 3 — example of the "full failure" status.

Determining node status

The ha-agent service considers a node failed if it has lost connectivity with the other nodes in the cluster and/or with the connected storage. Connectivity is verified using Corosync algorithms. Additionally, cluster nodes write information about their status to a file on the storage server. The status is updated once every three seconds. If the status information stops being updated, the master identifies the node as failed.

The average time to determine the non-serviceable status is from 15 to 60 seconds.

A check IP address can be specified in the high availability settings. If a node loses connection to the cluster, it checks the availability of that IP address using the ping utility:

  • if the IP address is unavailable, the node will be isolated and the VM relocation process will start;
  • if the IP address is available, the node will be excluded from the high availability cluster. The VMs on this node will continue to work.
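
The node-side check is conceptually similar to the following sketch (illustrative only; 192.0.2.10 is a placeholder for the check IP address, and the real logic is implemented inside ha-agent):

  CHECK_IP=192.0.2.10   # placeholder; taken from the high availability settings

  # Send a few ICMP requests with a short timeout
  if ping -c 3 -W 2 "$CHECK_IP" > /dev/null 2>&1; then
      echo "check IP reachable: exclude the node from the HA cluster, keep its VMs running"
  else
      echo "check IP unreachable: isolate the node and start VM relocation"
  fi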

If a node regularly loses network connectivity for less than 15 seconds, it gets the status "network unstable". The relocation procedure is not performed in this case.

Disaster recovery procedure

When a cluster node is determined to be failed, the ha-agent service on the node:

  1. Shuts down all VMs. If a VM cannot be shut down, the node is rebooted.
  2. Isolates the node.
  3. Transmits information about the status of the node to the master.

When the master receives information about a node failure or independently identifies a node as failed, the VM relocation procedure starts. The order in which VMs are relocated depends on their startup priorities: the higher the priority value, the sooner the VM is migrated. The relocation procedure is started only for the VMs selected in the high availability settings.

After the node is restarted, its VMs will start only if the node has one of the serviceable statuses: "master" or "participant". Only VMs that belong to this node according to the cluster metadata will be started. This approach avoids "split brain" situations, in which two VMs are connected to the same disk at the same time.

Creating a high availability cluster


To create a high availability cluster:

  1. Configure the network storage.

    For Ceph storage:

    1. Configure the Ceph storage nodes. Read more in Pre-configuring Ceph.
    2. Set up a CephFS file system in the storage. An example of configuration (the result can be checked with the commands shown after this procedure):
      1. Create a metadata server (MDS) on the Ceph node:

        ceph-deploy --overwrite-conf mds create <node>

        <node> — Ceph node name

      2. Create CephFS pools for data and metadata:

        ceph osd pool create cephfs_data 64 64
        ceph osd pool create cephfs_metadata 64 64

      3. Create the CephFS file system:

        ceph fs new <cephfs_name> cephfs_metadata cephfs_data

        <cephfs_name> — file system name

    For SAN storage: configure iSCSI. Read more in Pre-configuring SAN.

  2. Connect the storage to the cluster. Read more in Managing cluster storages.
  3. Configure the high availability settings. Read more in Configuring high availability.
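
If CephFS is used, you can verify on a Ceph node that the file system and the metadata server are available before connecting the storage (standard Ceph CLI commands; the exact output depends on the Ceph version):

  # List CephFS file systems and their pools
  ceph fs ls

  # Check the status of the metadata servers
  ceph mds stat

  # Overall cluster health
  ceph -s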

Diagnostics


Corosync configuration files:

  • /etc/corosync/corosync.conf — general settings;
  • /etc/corosync/storage.conf — storage settings.

ha-agent service log file — /var/log/ha-agent.log.

hawatch microservice log file — /var/log/hawatch.log.
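
A few standard commands that may help during diagnostics (assuming the services run under systemd and the log paths listed above are in place):

  # Follow the ha-agent log on a cluster node
  tail -f /var/log/ha-agent.log

  # Follow the hawatch log
  tail -f /var/log/hawatch.log

  # Check the Corosync service state and its recent messages on a node
  systemctl status corosync
  journalctl -u corosync --since "1 hour ago"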