Server diagnostics mode checks the status of the server hardware, records the results in DCImanager, and prepares the server for a new user.

Note

For servers with two or more network cards, the MAC addresses can be specified only at the server diagnostics stage. This is a requirement of the DHCP server configuration file.

Preparing for server diagnostics


To diagnose the server:

  1. Enter the server IP and MAC addresses in DCImanager.
  2. Make sure the DCImanager server is available for diagnostics.
  3. Set up network boot.
  4. Connect the server to a PDU or IPMI.
  5. Select the diagnostics template in Settings → OS templates:
    1. Diag-x86_64: for network boot with iPXE;
    2. Diag-x86_64-noipxe: for network boot without iPXE.
  6. Select the interfaces the DHCP server runs on in Settings → Global settings → Check before releasing → the Interfaces field.

Note

Hard drives connected to a RAID controller are detected during diagnostics only if they are grouped into a RAID.

Server diagnostics


Manual start

Go to Main menu → Servers → Operations.


  • Operation type: select "Run diagnostics";
  • Run diagnostics: select a diagnostics template;
  • Clear discs: select the checkbox to clear the hard drives during diagnostics. This zeroes the first 512 bytes of each hard drive. The option works only if the diagnostics template supports it;
  • Full hard drive erase: select the checkbox to erase the hard drives completely. The whole hard drive is zeroed, which may take a few hours depending on the drive size and speed. This option is available only when "Clear discs" is selected;
  • Inform upon completion: select the checkbox if you want to be informed when the operation is completed or the server becomes accessible via SSH.
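
To illustrate what "Clear discs" is described as doing, the sketch below zeroes only the first 512 bytes of a disk image and leaves the rest untouched. It is a minimal illustration under stated assumptions, not DCImanager's implementation: the function name `zero_first_sector` is hypothetical, and an ordinary temporary file stands in for a block device so that the example is safe to run.

```python
import os
import tempfile

SECTOR_SIZE = 512  # number of bytes "Clear discs" is described as zeroing


def zero_first_sector(path: str, size: int = SECTOR_SIZE) -> None:
    """Overwrite the first `size` bytes of `path` with zeros; the rest is left intact."""
    with open(path, "r+b") as disk:
        disk.seek(0)
        disk.write(b"\x00" * size)


# Demo on a temporary file standing in for a disk (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff" * 2048)
    image_path = tmp.name

zero_first_sector(image_path)

with open(image_path, "rb") as disk:
    image = disk.read()

os.remove(image_path)
```

Everything after the first 512 bytes keeps its previous contents, which is why "Full hard drive erase" exists as a separate and much slower option.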

Auto start

Diagnostics run automatically:

  • during the server search. Learn more in the article Server search;
  • when a server is released, if the Check before releasing option is selected in Settings → Global settings. In Global settings you can also set the automatic diagnostics options: Clear discs, Full hard drive erase, Diagnostics templates. Learn more in the article Global settings.

How it works


Server diagnostics algorithm:

  1. The system creates a block in the DHCP server configuration file that allows it to work with the server's MAC address.
  2. The server passes authorization through DHCP.
  3. The server loads the diagnostics template.
  4. The server check script starts.
  5. The "Server has hardware issues" status is set for the server.
  6. The system determines:
    1. The processor model.
    2. Amount of RAM.
    3. The presence of a hardware RAID-controller.
    4. The presence of HDDs (detection may not work properly if the server has a hardware RAID controller).
    5. Hard drive slots.
  7. The system checks:
    1. Local connection speed.
    2. The read rate and SMART information of the HDDs.
  8. If IPMI is detected, the system configures:
    1. Network settings (IP address, mask, gateway).
    2. A new user and a new password.
    3. If the Add IPMI automatically option is enabled, IPMI is added to the server.
  9. All the information is sent to DCImanager.
  10. The server is powered off if the Power off servers upon checking option is enabled in Settings → Global settings. Otherwise, the server is rebooted as usual.
  11. DCImanager processes diagnostics results:
    1. DCImanager checks that the platform corresponds to the detected server hardware:
      1. The number of processors should be more than 0, but not exceed the value specified for the platform type.
      2. The amount of RAM should be more than 0, but not exceed the value specified for the platform type.
      3. The number of HDDs should be more than 0, but not exceed the value specified for the platform type.
        If the results differ from the values specified for the platform type, DCImanager creates a new platform and assigns it to the server.
    2. HDDs are disconnected from the server. If a hardware RAID is found, only the HDDs that were added during the previous diagnostics are disconnected; HDDs that were specified manually remain. Generally, if a hardware RAID is found on the server, DCImanager cannot obtain correct HDD information.
    3. The read rate and SMART information of the HDDs are checked. The check parameters are specified in Types of equipment → HDD → HDD types.
    4. Local connection speed will be checked.
    5. If sockets and scalability are not set for the CPU in Types of equipment → Processors, the administrator is asked to specify the missing data.
    6. The system checks whether the "Server has hardware issues" status should be removed. The status is removed if the following requirements are met:
      1. The local connection speed is between <LocalSpeedThreshold*Port_Speed/100> and <Port_Speed>. LocalSpeedTreshold is a parameter, in percent, in the DCImanager configuration file (/usr/local/mgr5/etc/dcimgr.conf by default). The default value is 80%. For example, for a 100 MB/sec port the default threshold is 80 MB/sec, so the local connection speed should be between 80 and 100 MB/sec.
      2. Hardware RAID is not present. 
      3. HDD parameters (read speed and SMART-criteria) are within the limits.
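
The platform check and the status-removal rules above can be sketched as follows. This is a minimal sketch under stated assumptions, not DCImanager's code: all function and parameter names (`fits_platform`, `speed_within_bounds`, `can_clear_issue_status`, `threshold_percent`) are hypothetical, both speed bounds are assumed to be inclusive, and speeds are expressed in the same unit as the port speed.

```python
def fits_platform(detected: int, platform_max: int) -> bool:
    """A detected count (CPUs, RAM, HDDs) must be above 0 and not exceed the platform value."""
    return 0 < detected <= platform_max


def speed_within_bounds(measured: float, port_speed: float,
                        threshold_percent: float = 80.0) -> bool:
    """Measured speed must lie between threshold_percent of the port speed and the port speed.

    Both bounds are treated as inclusive (an assumption).
    """
    lower = threshold_percent * port_speed / 100
    return lower <= measured <= port_speed


def can_clear_issue_status(measured_speed: float, port_speed: float,
                           has_hardware_raid: bool, hdd_within_limits: bool,
                           threshold_percent: float = 80.0) -> bool:
    """The status is removed only if all three listed requirements are met."""
    return (speed_within_bounds(measured_speed, port_speed, threshold_percent)
            and not has_hardware_raid
            and hdd_within_limits)
```

With the default threshold of 80%, a measured speed of 80 on a port rated at 100 passes the speed check, while 79 does not. A server with a hardware RAID keeps the status regardless of the other two results.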

To check the latest diagnostics results, go to Main menu → Servers → Edit → Diagnostic results block.

Note

If the diagnostic process is interrupted on the server, the server will have the "Server has hardware issues" status.

To remove the status after diagnostics, go to Main menu → Servers → Edit and fill in the required fields that are empty. For example, if the server platform type was not defined, the server edit form shows "No platform" in the "Platform type" field and the warning "A platform type is not selected for this server".