The balancer is a service that automatically balances the load across cluster nodes by redistributing virtual machines (VMs) among them. VMs are redistributed automatically at the interval specified in the balancer settings.

Restrictions


The balancer function is only available:

  • in the VMmanager Infrastructure version;
  • in clusters with KVM virtualization type;
  • in clusters with Switching and IP-fabric network configuration type.

Logic of operation


Balancer settings are applied at the cluster level. Different balancer settings can be specified for different clusters.

When you enable the balancer, you set the interval at which it checks nodes for overload. By default, a node is considered overloaded if its CPU or RAM use is above 70%. If necessary, you can change the CPU and RAM use thresholds via an API request.
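The overload rule above can be illustrated with a minimal Python sketch. The `NodeLoad` class and the `is_overloaded` function are hypothetical names introduced only for this example and are not part of the platform. The sketch assumes that "above 70%" is a strict comparison.

```python
from dataclasses import dataclass

# Default thresholds from the text above: 70% for both CPU and RAM.
DEFAULT_CPU_THRESHOLD = 70.0
DEFAULT_RAM_THRESHOLD = 70.0


@dataclass
class NodeLoad:
    """Average resource use of a node over the check interval, in percent."""
    name: str
    cpu_percent: float
    ram_percent: float


def is_overloaded(node, cpu_threshold=DEFAULT_CPU_THRESHOLD,
                  ram_threshold=DEFAULT_RAM_THRESHOLD):
    """Return True if CPU or RAM use is above its threshold (strict comparison, an assumption)."""
    return node.cpu_percent > cpu_threshold or node.ram_percent > ram_threshold
```

Under these assumptions, a node at 75% CPU and 40% RAM counts as overloaded with the default thresholds, but not if the CPU threshold is raised to 80%.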

By default, the balancer covers all VMs in the cluster. You can exclude individual VMs from the balancer actions.

The balancer migrates VMs between nodes using a live migration mechanism. The conditions under which live migration cannot be performed are specified in the article VM migration. In addition, the following restrictions apply:

  • VMs cannot be migrated between clusters;
  • VMs cannot be migrated to nodes in maintenance mode;
  • the option to reassign network interfaces when they do not match is unavailable.

The operation of the balancer consists of iterations. In each iteration, the balancer:

  1. Requests information from the statistics service on the average CPU and RAM use for the set balance check period. For example, if the administrator has set the check interval to 10 minutes, the balancer will request data for the last 10 minutes. 

    If the number of nodes in the cluster is large, it is possible that the statistics service will not have time to collect data from all nodes during the balance check period. In this case, the balancer will make decisions based on partial data. The list of nodes whose statistics were not taken into account is recorded in the service log.

  2. Compiles a list of overloaded nodes based on statistics. The higher the average CPU and RAM use on a node, the more overloaded it is considered to be. If there are no overloaded nodes, no VM migration is performed in the current iteration.
  3. Sorts the list in order from the most loaded node to the least loaded node.
  4. Compiles a list of VMs that can be migrated from the first node in the list. Excluded from the candidates for migration are:
    • VMs that are turned off;
    • VMs to which balancer actions are not applied;
    • VMs with a mounted ISO image;
    • VMs that have snapshots;
    • VMs that were created on or moved to the node during the last five iterations.
  5. Sorts the list in order from the least loaded VM to the most loaded VM. If no VM eligible for migration is found on the node, the search is repeated on the next node in the list. If no suitable VM is found on any node, no migration is performed.
  6. Selects the first VM in the list to migrate.
  7. Determines the node to which the VM can be migrated. The selection of a node takes into account:
    • the possibility of live migration of VMs;
    • whether the node will become overloaded after the VM migration.
  8. Migrates the VM to the selected node. If no suitable node is found, the balancer selects the next VM in the list and repeats the node selection procedure for it. If no suitable node is found for any VM, no migration is performed.
  9. Waits for the VM migration to complete and records its result in the history table.
  10. Schedules the next iteration to run after a set balance check period.

In the current implementation, the balancer can migrate at most one VM per iteration.
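The selection logic in steps 2 to 8 can be sketched in Python as follows. This is an illustration under stated assumptions, not the balancer's actual code. The `VM` and `Node` classes and all function names are hypothetical. The source does not say how load is ranked, how post-migration load is estimated, or which of several suitable nodes is preferred. The sketch assumes ranking by the larger of CPU and RAM use, VM load expressed as a share of node capacity, and preference for the least loaded suitable node. Live-migration checks are reduced to the maintenance-mode restriction.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    cpu: float                          # assumed: share of node CPU capacity, %
    ram: float                          # assumed: share of node RAM capacity, %
    powered_on: bool = True
    balancer_enabled: bool = True
    iso_mounted: bool = False
    has_snapshots: bool = False
    iterations_since_placed: int = 100  # iterations since created on or moved to the node


@dataclass
class Node:
    name: str
    cpu: float                          # average CPU use over the check interval, %
    ram: float                          # average RAM use over the check interval, %
    vms: list = field(default_factory=list)
    maintenance: bool = False


def load_score(item):
    # Assumption: rank by the larger of CPU and RAM use; the source gives no metric.
    return max(item.cpu, item.ram)


def can_migrate(vm):
    """Exclusion rules from step 4 (the five-iteration boundary is an assumption)."""
    return (vm.powered_on and vm.balancer_enabled and not vm.iso_mounted
            and not vm.has_snapshots and vm.iterations_since_placed > 5)


def find_target(vm, source, nodes, cpu_thr, ram_thr):
    """Step 7: a node that stays within both thresholds after receiving the VM."""
    suitable = [n for n in nodes
                if n is not source and not n.maintenance
                and n.cpu + vm.cpu <= cpu_thr and n.ram + vm.ram <= ram_thr]
    # Assumption: prefer the least loaded suitable node.
    return min(suitable, key=load_score, default=None)


def pick_migration(nodes, cpu_thr=70.0, ram_thr=70.0):
    """Return (vm, source, target) for one iteration, or None if nothing is migrated."""
    overloaded = sorted((n for n in nodes if n.cpu > cpu_thr or n.ram > ram_thr),
                        key=load_score, reverse=True)          # steps 2 and 3
    for source in overloaded:
        candidates = sorted((vm for vm in source.vms if can_migrate(vm)),
                            key=load_score)                    # steps 4 and 5
        if not candidates:
            continue                                           # try the next node
        for vm in candidates:                                  # steps 6 to 8
            target = find_target(vm, source, nodes, cpu_thr, ram_thr)
            if target is not None:
                return vm, source, target
        return None   # one reading of step 8: no target for any VM on this node
    return None
```

The function returns at most one migration, which matches the one-VM-per-iteration limit. Whether the balancer moves on to the next overloaded node when no target exists for any VM on the first one is not stated in the source. The sketch stops in that case.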

Managing balancer


For the balancer to operate correctly, enable CPU and RAM data collection from nodes and VMs in the statistics collection settings. Read more in Managing statistics.

To manage the balancer in a cluster, go to Clusters → select a cluster → click Parameters → Balancer.

Enabling and disabling balancer

To enable the balancer:

  1. Click Enable the balancer.
  2. Specify the Balance check interval in minutes.

A cluster with the balancer enabled is marked with an icon in the cluster table.

To disable the balancer in the cluster, click Disable the balancer.

To disable the balancer for a specific VM, go to Virtual machines → select the VM → click Parameters → Balancer → turn off the Apply Balancer to this VM toggle switch. VMs with the balancer disabled are marked with an icon in the VM table.

Section interface

Editing settings

To edit the CPU and RAM thresholds above which a node is considered overloaded:

  1. Connect to the server with the platform via SSH.
  2. Get the authorization token:

    curl -k -X POST -H "accept: application/json" -H "Content-Type: application/json" 'https://domain.com/auth/v4/public/token' -d '{"email": "admin_email", "password": "admin_pass"}'

    domain.com — domain name or IP address of the server with the platform

    admin_email — platform administrator's email

    admin_pass — platform administrator's password

    The response has the following form:

    Example of response in JSON

    {
      "confirmed": true,
      "expires_at": null,
      "id": "6",
      "token": "4-e9726dd9-61d9-2940-add3-914851d2cb8a"
    }

    Save the received token value.

  3. Execute the API request:

    curl -k -H "x-xsrf-token: <token>" -X POST "https://domain.com/vm/v3/cluster/<cluster_id>/internal_edit" -d '{"balancer_config": {"high_threshold_cpu": <cpu_value>, "high_threshold_mem": <ram_value>}}'

    <token> — authorization token

    domain.com — domain name or IP address of the server with the platform

    <cluster_id> — cluster id

    <cpu_value> — CPU use threshold, %

    <ram_value> — RAM use threshold, %
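The two requests above can also be prepared from a script. The following Python sketch only builds the requests with the standard library and does not send them. The endpoint paths, header names and JSON fields are taken from the curl commands above. The function names and the range check on threshold values are assumptions made for this example.

```python
import json
import urllib.request


def build_token_request(base_url, email, password):
    """Build the POST request that returns an authorization token."""
    body = json.dumps({"email": email, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/auth/v4/public/token",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def build_threshold_request(base_url, token, cluster_id, cpu_value, ram_value):
    """Build the POST request that sets the overload thresholds, in percent."""
    # Assumption: thresholds are percentages in (0, 100]; the source states no limits.
    for value in (cpu_value, ram_value):
        if not 0 < value <= 100:
            raise ValueError("threshold must be in the range (0, 100]")
    body = json.dumps({"balancer_config": {
        "high_threshold_cpu": cpu_value,
        "high_threshold_mem": ram_value,
    }}).encode()
    return urllib.request.Request(
        f"{base_url}/vm/v3/cluster/{cluster_id}/internal_edit",
        data=body,
        headers={"x-xsrf-token": token},
        method="POST",
    )
```

A request built this way can be passed to `urllib.request.urlopen`. Unlike `curl -k`, `urlopen` verifies the server certificate by default, so a platform with a self-signed certificate would need a suitably configured SSL context.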

Monitoring

The Monitoring tab displays CPU and RAM use scales on the cluster nodes and a balance indicator. The closer the indicator is to the center of the scale, the better balanced the cluster is by CPU or RAM use.

If the indicator is:

  • in the left part of the scale, the cluster is underloaded for this resource;
  • in the right part of the scale, the cluster is overloaded for this resource.

How far the indicator deviates from the center depends on the number of overloaded nodes in the cluster. If the load on every node in the cluster is within the consumption threshold, the deviation is minimal.

Example of cluster overload

Viewing history

The History tab displays the balancer's actions to migrate VMs between cluster nodes.

Log files


The balancer actions are performed by the balancer service in the balancer container. The service log file is /var/log/balancer.log.

Logs of related services can be useful for identifying problems with the balancer:

Service      Location of logs                                        Purpose
vm_reader    /var/log/vm_1_reader.log file in the vm_box container   obtaining configuration data on clusters, nodes and VMs
statistic    statistic container                                     getting statistical data
notifier     notifier container                                      receiving notifications of cluster configuration updates; sending notifications to other services
carbonapi    carbonapi container                                     intermediate service for obtaining node and VM statistics from the clickhouse service
clickhouse   clickhouse container                                    storage of collected statistics