Building HA Primary/Secondary Cluster with HAVIP + Keepalived

Last updated: 2021-08-27 14:50:01

    This document describes how to use Keepalived with HAVIP to build a high availability primary/secondary cluster in the Tencent Cloud VPC.

    Note:

    HAVIP is currently in beta testing period. Switching between primary/secondary servers may take 10 seconds. To try it out, please apply to be a beta user.

    Basic Principle

    Typically, a high availability primary/secondary cluster consists of two servers: an active primary server and a standby secondary server. The two servers share the same VIP (virtual IP) which is only valid for the primary server. When the primary server fails, the secondary server will take over the VIP to continue providing services. This mode is widely used in MySQL source/replica switch and Ngnix web access.

    Keepalived is a VRRP-based high availability software that can be used to build a high availability primary/secondary cluster among VPC-based CVMs. To use Keepalived, first complete its configuration in the keepalived.conf file.

    • In traditional physical networks, the primary/secondary status can be negotiated with Keepalived’s VRRP protocol. The primary device periodically sends free-of-charge ARP messages to purge the MAC table or terminal ARP table of the uplink exchange to trigger the VIP migration to the primary device.
    • In a Tencent Cloud VPC, a high availability primary/secondary cluster can also be implemented by deploying Keepalived on CVMs, with the following differences:
      • The VIP must be a HAVIP applied for from Tencent Cloud.
      • HAVIP is subnet-sensitive and can only be bound to a server under the same subnet through announcement.

    Notes

    • We recommend VRRP communications in unicast mode.

    • We recommend that you use Keepalived 1.2.24 or later versions.

    • Ensure that the garp parameters have been configured. Because Keepalived relies on ARP messages to update the IP address, these configurations ensure that the primary device always sends ARP messages for the communication.

      garp_master_delay 1
      garp_master_refresh 5
      
    • Configure a unique VRRP router ID for each primary/secondary cluster in the VPC.

    • Do not use the strict mode. Ensure the “vrrp_strict” configurations have been deleted.

    • Control the number of HAVIPs bound to a single ENI to be no more than 5. If you need to use multiple VIPs, add or modify vrrp_garp_master_repeat 1 in the “global_defs” section of the Keepalived configuration file.

    • Specify the adver_int parameter properly to balance anti-network jitter and disaster recovery speed. If the advert_int parameter is set too small, frequent switchover and temporary active-active (split brain) may occur in case of network jitter. If the advert_int parameter is set too large, it takes a long time for primary-secondary switching to take place after the primary server fails, which cause long service interruption. Please fully assess the impact of the active-active (split brain) status on your businesses.

    • Set the interval parameter in the specific execution item of track_script script (such as checkhaproxy) to a larger value, avoiding the FAULT status caused by script execution timeout.

    • Optional: be aware of increased disk usage due to log printing. This can be solved using logrotate or other tools.

    Directions

    Note:

    This document uses the following environments as an example. Please replace with your actual configurations.

    • Primary CVM: HAVIP-01, 172.16.16.5
    • Secondary CVM: HAVIP-02, 172.16.16.6
    • HAVIP: 172.16.16.12
    • EIP: 81.71.14.118
    • Image: CentOS 7.6 64-bit

    Step 1: apply for a VIP

    1. Log in to the VPC console.
    2. Select IP and ENI > HAVIP in the left sidebar to enter the HAVIP management page.
    3. Select the relevant region on the HAVIP management page and click Apply.
    4. In the pop-up dialog box, enter the name, select a VPC and a subnet for the HAVIP, and click OK.
      Note:

      The IP address of the HAVIP can be automatically assigned or manually specified. If you choose to enter an IP address, make sure that the entered private IP address is within the subnet IP range and is not a reserved IP address of the system. For example, if the subnet IP range is 10.0.0.0/24, the entered private IP address should be within 10.0.0.2 - 10.0.0.254.


      Then you can see the HAVIP you applied for.

    Step 2: install Keepalived (version 1.2.24 or later) on primary and secondary CVMs

    This document uses CentOS 7.6 as an example to install Keepalived.

    1. Run the following command to verify whether the Keepalived version meets the requirements.

      yum list keepalived
      
    2. Install the software package using the yum command.

      yum install -y keepalived
      
    3. Install the software package using the source code.

      tar zxvf keepalived-1.2.24.tar.gz
      cd keepalived-1.2.24
      ./configure --prefix=/
      make; make install
      chmod +x /etc/init.d/keepalived   // Prevent occurrence of env: /etc/init.d/keepalived: Permission denied
      

    Step 3: configure Keepalived, and bind HAVIP to the primary and secondary CVMs.

    1. Log in to the primary CVM HAVIP-01 and run vim /etc/keepalived/keepalived.conf to modify its configurations.

      Note

      In this example, HAVIP-01 and HAVIP-02 are configured with the same weight. Both are in the BACKUP status, with a priority of 100. This will reduce the number of switchovers caused by network jitter.

       ! Configuration File for keepalived
      global_defs {
       notification_email {
         acassen@firewall.loc
         failover@firewall.loc
         sysadmin@firewall.loc
       }
       notification_email_from Alexandre.Cassen@firewall.loc
       smtp_server 192.168.200.1   
       smtp_connect_timeout 30
       router_id LVS_DEVEL
       vrrp_skip_check_adv_addr
       vrrp_garp_interval 0
       vrrp_gna_interval 0
      }
      vrrp_script checkhaproxy
      {
         script "/etc/keepalived/do_sth.sh"   # Check whether the service process runs normally. Replace “do_sth.sh” with your actual script name. Run it as needed.
         interval 5
      }
      vrrp_instance VI_1 {
      # Select proper parameters for the primary and secondary CVMs.
      state BACKUP              # Set the initial status to `Backup`
        interface eth0          # The ENI such as `eth0` used to bind a VIP  
        virtual_router_id 51        # The`virtual_router_id` value for the cluster
        nopreempt                   # Non-preempt mode
        # preempt_delay 10      # Effective only when `state` is `MASTER`    
        priority 100            # Configure the same weight for the two devices
        advert_int 5        
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        unicast_src_ip 172.16.16.5 # Private IP address of the local device
        unicast_peer {
            172.16.16.6                # IP address of the peer device
        }
        virtual_ipaddress {
            172.16.16.12              # HAVIP 
        }
        notify_master "/etc/keepalived/notify_action.sh MASTER"
        notify_backup "/etc/keepalived/notify_action.sh BACKUP"
        notify_fault "/etc/keepalived/notify_action.sh FAULT"
        notify_stop "/etc/keepalived/notify_action.sh STOP"
        garp_master_delay 1    # How long it will take before the ARP cache can be updated after the CVM switches to the primary status
        garp_master_refresh 5   # Time interval between which the primary node sends ARP messages
         track_interface {
                    eth0               # ENI that bound with VIP, such as `eth0`
            }
        track_script {
           checkhaproxy 
        }
      }
      
    2. Press Esc to exit the edit mode and enter :wq! to save and close the file.

    3. Log in to the secondary CVM HAVIP-02 and run vim /etc/keepalived/keepalived.conf to modify its configurations.

      ! Configuration File for keepalived
      global_defs {
      notification_email {
       acassen@firewall.loc
       failover@firewall.loc
       sysadmin@firewall.loc
      }
      notification_email_from Alexandre.Cassen@firewall.loc
      smtp_server 192.168.200.1
      smtp_connect_timeout 30
      router_id LVS_DEVEL
      vrrp_skip_check_adv_addr
      vrrp_garp_interval 0
      vrrp_gna_interval 0
      }
      vrrp_script checkhaproxy
      {
      script "/etc/keepalived/do_sth.sh"
       interval 5
      }
      vrrp_instance VI_1 {
      # Select proper parameters for the primary and secondary CVMs.
      state BACKUP            #Set the initial status to `Backup`
      interface eth0          #The ENI such as `eth0` used to bind a VIP  
      virtual_router_id 51        #The`virtual_router_id` value for the cluster
      nopreempt                   #Non-preempt mode
      # preempt_delay 10      #Effective only when “state MASTER”    
      priority 100            # Configure the same weight for the two devices
      advert_int 5       
      authentication {
          auth_type PASS
          auth_pass 1111
      }
      unicast_src_ip 172.16.16.6   #Private IP of the local device
      unicast_peer {
          172.16.16.5                #IP address of the peer device
      }
      virtual_ipaddress {
          172.16.16.12              #HAVIP 
      }
      notify_master "/etc/keepalived/notify_action.sh MASTER"
      notify_backup "/etc/keepalived/notify_action.sh BACKUP"
      notify_fault "/etc/keepalived/notify_action.sh FAULT"
      notify_stop "/etc/keepalived/notify_action.sh STOP"
      garp_master_delay 1    # How long it will take before the ARP cache can be updated after the CVM switches to the primary status
      garp_master_refresh 5   #Time interval between which the primary node sends ARP messages
      track_interface {
                  eth0               # ENI such as `eth0` that binds a VIP
          }
      track_script {
         checkhaproxy 
      }
      }
      
    4. Press Esc to exit the edit mode and enter :wq! to save and close the file.

    5. Restart Keepalived for the configuration to take effect.

      systemctl start keepalived
      
    6. Check the primary/secondary status of the two CVMs, and confirm that both have HAVIP correctly bound.

      Note:

      In this example, HAVIP-01 starts the Keepalived first and will normally serve as the primary node.

      Log in to the HAVIP console. You will see that HAVIP is bound to the primary CVM HAVIP-01, as shown below.

    Step 4: bind an EIP to HAVIP (optional)

    1. Log in to the HAVIP console, locate the HAVIP you have applied for in Step 1, and click Bind.
    2. In the pop-up dialog box, select the EIP to be bound and click OK. If no EIP is available, first go to the EIP console to apply.

    Step 5: use notify_action.sh for simple logging (optional)

    The Keepalived’s main logs are still recorded in “/var/log/message”, and you can add the “notify” script for simple logging.

    1. Log in to the CVM and run the vim /etc/keepalived/notify_action.sh command to add the following “notify_action.sh” script.

      #!/bin/bash
      #/etc/keepalived/notify_action.sh
      log_file=/var/log/keepalived.log
      log_write()
      {
      echo "[`date '+%Y-%m-%d %T'`] $1" >> $log_file
      }
      [ ! -d /var/keepalived/ ] && mkdir -p /var/keepalived/
      case "$1" in
      "MASTER" )
          echo -n "$1" > /var/keepalived/state
          log_write " notify_master"
          echo -n "0" /var/keepalived/vip_check_failed_count
          ;;
      "BACKUP" )
          echo -n "$1" > /var/keepalived/state
          log_write " notify_backup"
          ;;
      "FAULT" )
          echo -n "$1" > /var/keepalived/state
          log_write " notify_fault"
          ;;
      "STOP" )
          echo -n "$1" > /var/keepalived/state
          log_write " notify_stop"
          ;;
      *)
          log_write "notify_action.sh: STATE ERROR!!!"
          ;;
      esac
      
    2. Run the chmod a+x /etc/keepalived/notify_action.sh command to modify the script permission.

    Step 6: verify whether VIP and public IP are switched normally during primary/secondary switch

    Simulate the CVM failure by restarting the Keepalived process or restarting the CVM to check whether the VIP can be migrated.

    • If the primary/secondary switch succeeds, the secondary CVM will become the server bound with the HAVIP in the console.
    • You can also ping a VIP from within the VPC to check the time lapse from network interruption to recovery. Each switch may cause an interruption for about 4 seconds. If you ping the EIP bound to HAVIP over a public network, the result will be the same.
    • Run the ip addr show command to check whether the HAVIP is bound to the primary ENI.