This article lists commands for identifying why the platform is not operating correctly, as well as commands for restarting containers and some services to restore them to working order.

Some commands may require superuser privileges.

General diagnostics


This section lists commands that you can run as a first diagnostic step. They help rule out basic problems and reduce troubleshooting time.

Operating system (OS) version

If the OS on the platform master server or on a server intended for a node is not supported by the platform, the installation or the node connection will end with an error. To determine the OS version, run the command:

cat /etc/*release

See the VMmanager documentation for the list of supported operating systems.

Server date and time

During periodic synchronization with the license server, the date and time are checked. If an incorrect date or time is set on the server with the platform, the platform will be blocked or its operation will be incorrect. To determine the current date and time on the server, run the command:

date -R

Disk space and RAM usage

For the platform to work correctly, free disk space and RAM must meet the requirements listed in the Server requirements article of the VMmanager documentation. In addition, if there is not enough free disk space or RAM, virtual machines and backups cannot be created. To check disk space usage and file system types, run the command:

df -hT

To check the information about RAM, run the command:

free -h

Inodes

An inode is a data structure that stores file metadata. The platform will not work correctly if the server runs out of inodes, even if there is free disk space. Typical symptoms of an inode shortage are reduced performance, inability to create files, and incorrect information in the platform interface. To check the number and percentage of inodes used in each file system, run the command:


df -i
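To spot file systems that are close to running out of space or inodes, the df output can be filtered by a usage threshold. The sketch below is only an illustration: the function name over_threshold and the 90% threshold are assumptions, and it expects the POSIX output format (df -P or df -iP), where the usage percentage is the fifth column and the mount point is the sixth.

```shell
# Hypothetical helper: print mount points whose usage is at or above a threshold.
# Reads `df -P` (disk space) or `df -iP` (inodes) output on stdin.
# Assumes mount points contain no spaces.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)          # "95%" -> "95"; a "-" value counts as 0
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage on the server:
#   df -P  | over_threshold 90    # disk space
#   df -iP | over_threshold 90    # inodes
```

An empty result means that no file system has reached the threshold.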

Troubleshooting Linux systems


Examine the system logs to troubleshoot your Linux system. Below are tools for troubleshooting and searching for errors in the server logs. For more information about the product logs, see the Platform Logs article in the VMmanager documentation.

Circular kernel buffer

One way to find out whether the system is working incorrectly is to look at the kernel log with the dmesg utility. The kernel records all events in a circular buffer while the system is booting and running. dmesg lets you examine the kernel messages and identify hardware-related problems. To search for problems, run the command:

dmesg | grep -i -E 'error|failed|critical|bug|panic'

The journalctl utility

You can use the journalctl utility to analyze the logs and detect system problems. The utility displays the logs of the Linux system services. To detect abnormal behavior of a Linux system, run the command:

journalctl | grep -i -E 'error|failed|critical|bug|panic'
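When the log is long, a per-keyword count can show which kind of message dominates before you read individual lines. The following is a minimal sketch that reuses the search pattern from the commands above; the function name count_problems is an assumption made for illustration.

```shell
# Hypothetical helper: count how often each keyword occurs in log text on stdin.
count_problems() {
  grep -i -o -E 'error|failed|critical|bug|panic' |
    tr '[:upper:]' '[:lower:]' |      # treat "ERROR" and "error" as the same keyword
    sort | uniq -c | sort -rn         # most frequent keyword first
}

# Usage:
#   dmesg | count_problems
#   journalctl --no-pager | count_problems
```

Note that the pattern also matches inside longer words, for example "bug" in "debug", so the counts are only a rough guide.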

CPU


CPU architecture

Reduced performance of a platform node may be due to the technical characteristics of its CPU. In addition, information about the CPU architecture can be useful when diagnosing problems with fine-tuning virtual machines. To display the information, run the command:

lscpu

CPU count on nodes

VMmanager 6 licensing only takes into account physical cores. To specify the exact CPU value when ordering a license, count the number of cores on the nodes with the command:

dmidecode --type processor | grep -i "core count" | grep -Eo "[0-9]+"

The CPU count is also needed when platform control is blocked with the error "CPU cores number on node exceeds license limit". The error occurs if the number of physical cores in a connected node exceeds the license limit. To check whether the limit is exceeded, compare the output of the dmidecode command with the VMmanager license settings.
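dmidecode reports a core count for each processor socket, so on a multi-socket node the values have to be added up. A minimal sketch of that sum, assuming the usual "Core Count: N" lines in the dmidecode output (the function name sum_cores is hypothetical):

```shell
# Hypothetical helper: add up the "Core Count" values from
# `dmidecode --type processor` output read on stdin.
sum_cores() {
  grep -i "core count" |
    grep -Eo "[0-9]+" |
    awk '{ total += $1 } END { print total + 0 }'   # prints 0 if nothing matched
}

# Usage on the node (requires superuser privileges):
#   dmidecode --type processor | sum_cores
```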

System load

If the load on the system increases, the performance of the nodes decreases. To assess the load, compare the number of physical cores, obtained with the CPU count command, with the Load Average value. To display the Load Average, run the command:

uptime

The Load Average parameter value must be less than the number of cores obtained by the command for CPU count.
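The comparison can also be scripted. The sketch below assumes a Linux node, where the first field of /proc/loadavg is the 1-minute load average; the function name load_is_high and the sample core count of 16 are illustrative only.

```shell
# Hypothetical helper: succeed (exit code 0) when the load average is
# at or above the number of physical cores.
#   $1 = load average, $2 = number of physical cores
load_is_high() {
  awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load >= cores) }'
}

# Usage on a node with 16 physical cores:
#   load_is_high "$(cut -d ' ' -f 1 /proc/loadavg)" 16 \
#     && echo "Load average has reached the number of cores"
```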

Virtual machines (VMs)


VM status

You can use the virsh utility to display the status of all virtual machines for troubleshooting.

To execute virsh commands, first connect to the node:

docker exec --tty --interactive vm_box ssh -i /opt/ispsystem/vm/etc/.ssh/vmmgr.1 <IP_address> -p 22

<IP_address> — node IP address

-p 22 — SSH node port

To display the status of all VMs, run the command:

virsh list --all

To display the status of a particular VM, run the command:

virsh list --all | grep <VM_name>
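To see only the VMs that are not running, the table printed by virsh can be filtered by its State column. This is a rough sketch that assumes the default three-column layout (Id, Name, State) with two header lines; the function name not_running is hypothetical.

```shell
# Hypothetical helper: print the names of VMs whose state is not "running".
# Reads `virsh list --all` output on stdin.
not_running() {
  # Skip the header and separator lines, ignore blank lines,
  # and compare the first word of the State column.
  awk 'NR > 2 && NF >= 3 && $3 != "running" { print $2 }'
}

# Usage on the node:
#   virsh list --all | not_running
```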

The libvirt virtualization daemon


Libvirt is a toolkit for virtualization management. Without Libvirt (libvirtd) running, the platform will not work correctly. Check the status of the service with the command:

systemctl status libvirtd

If the service is stopped or inactive, start it:

systemctl start libvirtd

If libvirt is not installed, the output of the systemctl status libvirtd command will contain the message:

Unit libvirtd.service could not be found

In this case:

  1. Install libvirt manually with the command for your OS:

    For RHEL-based operating systems (CentOS, AlmaLinux)

    yum install libvirt

    For Deb-based operating systems (Ubuntu, Astra Linux)

    apt install libvirt-daemon-system
  2. Start the service:

    systemctl start libvirtd
  3. Add libvirtd to the autostart:

    systemctl enable libvirtd
  4. Re-check the status of the service to make sure that it is running.
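The start and autostart steps can be combined into one check. The following sketch is an illustration under the assumption that libvirt is already installed; the function name ensure_running is hypothetical, and it relies only on the standard systemctl subcommands is-active, start and enable.

```shell
# Hypothetical helper: start a systemd unit and add it to autostart
# only if it is not active yet.
#   $1 = unit name, for example libvirtd
ensure_running() {
  unit="$1"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is already active"
  else
    systemctl start "$unit" && systemctl enable "$unit"
  fi
}

# Usage (requires superuser privileges):
#   ensure_running libvirtd
```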

Containers


The docker service

The Docker daemon is a service that manages containers as well as other docker entities: networks, storage and images. If this service is not running, the platform will not work. To check the status of docker, run the command:

systemctl status docker

If the service is stopped, start it with the command:

systemctl start docker

To check the version of docker, run the command:

docker version

Restarting the docker service

If the docker service is not working properly, restarting the service helps fix it. To do this, run the command:

systemctl restart docker.service

Restarting the service helps fix a number of errors that may occur when the platform is started, restarted, or shut down:

  • error while removing network: network <network_name> has active endpoints

    Error example

    error while removing network: network vm_vm_box_net id 88888ggggg has active endpoints
    exit status 1 
  • ERROR: for <service_name> Cannot start service <service_name>: endpoint with name <container_name> already exists in network <network_name> 

    Error example

    ERROR: for auth_back Cannot start service auth_back: endpoint with name vm_auth_back_1 already exists in network vm_vm_box_net

    In the above example, the vm_auth_back_1 container failed to start.

  • ERROR: for input Cannot start service input: driver failed programming external connectivity on endpoint vm_input_1

To correct the above errors, restart the docker service with the command above.

If the problem could not be resolved, contact technical support through your client area under Support > Support tickets > Add.

Status of containers

To diagnose possible problems, display a list of containers and their statuses. To display a list of all running containers, run the command:

docker ps

To get a list of all containers, including stopped ones, run the command:

docker ps -a

If you want to check the status of a specific container, run the command:

docker ps | grep <container_name>
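To list only the containers that are not up, the output of docker ps can be reduced to name and status with the --format option and then filtered. The sketch below is illustrative: the function name not_up is an assumption, and it relies on the Status field starting with "Up" for running containers.

```shell
# Hypothetical helper: print the names of containers whose status
# does not start with "Up". Reads "<name> <status>" lines on stdin.
not_up() {
  awk 'NF && $2 != "Up" { print $1 }'
}

# Usage:
#   docker ps -a --format '{{.Names}} {{.Status}}' | not_up
```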

Restarting the container

If the container does not work correctly, restarting it may help. To do this, run the command:

docker restart <container_name>

Restarting taskmgr

If the Task Manager does not work correctly, for example, if there are frozen tasks, restarting the taskmgr service in the vm_box container may help. To do this, run the command:

docker exec -it vm_box supervisorctl restart taskmgr

Restarting monitor

You may need to restart the monitoring service if no statistics are displayed on the nodes. To do this, run the command:

docker exec -it vm_box supervisorctl restart monitor

Logging

To analyze a container's events, examine its log. To display the last 100 lines of the container log, run the command:

docker logs --tail 100 <container_name>
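The log output can be narrowed down to lines that look like problems. A minimal sketch, assuming the same kind of keyword search as for the system logs; the function name recent_errors is hypothetical. docker logs passes the container's stderr stream through to stderr, so it is merged into stdout before filtering.

```shell
# Hypothetical helper: show problem-looking lines from the end of a container log.
#   $1 = container name, $2 = number of lines to inspect (default: 100)
recent_errors() {
  docker logs --tail "${2:-100}" "$1" 2>&1 |
    grep -i -E 'error|failed|critical|panic'
}

# Usage:
#   recent_errors vm_box
#   recent_errors vm_box 500
```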

Firewall


If the firewall has no rules for the docker service, there may be problems with the platform and the network. The necessary rules are created automatically when the docker service starts. We do not recommend modifying or deleting them manually.

Service status and configuration

To check the status of the firewall service, run the command depending on the OS:

For Ubuntu, Astra Linux

systemctl status nftables

For CentOS, AlmaLinux

systemctl status firewalld

To display the service configuration, run the command depending on the operating system:

For Ubuntu, Astra Linux

nft list ruleset

For CentOS, AlmaLinux

firewall-cmd --list-ports

Restarting the service

Restarting the service is necessary if it does not work correctly, as well as to restore the default rules. To restart the service, run the command:

For CentOS, AlmaLinux

systemctl restart firewalld.service


For Ubuntu, Astra Linux

systemctl restart nftables.service

To restore the default rules:

  1. Restart the firewall service with one of the commands presented above.
  2. Restart docker with the command:

    systemctl restart docker.service
  3. Restart the platform with the command from the section Restarting the platform of this article.

Searching for information in the database


Changing the database directly is risky. We do not recommend editing the database manually, as this can disrupt the operation of the platform.

Any actions with the database should be performed only after backing up the platform. 

Using queries to the database, you can see information about the state of VMs, nodes and other platform objects. Below is a list of queries for retrieving data from the database.

To run queries, connect to the MySQL container:

docker exec -it mysql bash -c "mysql isp -p\$MYSQL_ROOT_PASSWORD"

VM info

The VM information includes all its status parameters, the internal name, and the node data.

To get information about the virtual machine, run the query:

select * from vm_host where id=<id_vm>\G

<id_vm> —  virtual machine ID

To display information about the node and the internal VM name, run the query:

select id,internal_name,node from vm_host where id=<id_vm>\G

<id_vm> — virtual machine ID
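The same query can be run without opening an interactive MySQL session by passing it to the mysql client with the -e option. The sketch below is only an illustration built from the connection command above; the function name vm_info is hypothetical, and the ID is checked to be numeric before it is placed into the query.

```shell
# Hypothetical helper: print the ID, internal name and node of a VM.
#   $1 = virtual machine ID
vm_info() {
  case "$1" in
    '' | *[!0-9]*)
      echo "VM ID must be a number" >&2
      return 1
      ;;
  esac
  # \$MYSQL_ROOT_PASSWORD is expanded inside the container, not on the host.
  docker exec mysql bash -c "mysql isp -p\$MYSQL_ROOT_PASSWORD -e 'select id,internal_name,node from vm_host where id=$1\G'"
}

# Usage:
#   vm_info 42
```

Rejecting non-numeric input keeps anything other than a VM ID from reaching the query.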

Information about the node

The node information includes the selected node parameters.

To check the information about the node, run the query:

select id,name,ip_addr,ssh_port from vm_node where id=<id_node>;

<id_node> — node ID

To check the network on the node, run the query:

select * from vm_node_interfaces where node=<id_node>\G

<id_node> — node ID

VM virtual disks

Information about the virtual disk helps diagnose problems associated with it. One example is the virt-resize: error, which can occur if the value of expand_part (the partition to expand) or the size of the virtual disk in the database is incorrect.

To view full information about the disk, run this query:

select * from vm_disk where name = 'example_name'\G

example_name — the name of the virtual disk

To check the actual disk size, run the command on the node:

virsh domblkinfo --human 1111_example_name vda

1111_example_name — virtual machine ID_disk name

Backups

By querying the database, you can check the backup schedule to identify possible problems. To get the information, run the query:

select * from vm_schedule;