Setting the Health Check for a Workload

Last updated: 2020-02-25 10:12:39

Health Check Types

Health checks are divided into the following types:

  • Liveness check: Checks whether the container is alive, similar to using the ps command to check whether a process exists. If the liveness check fails, the cluster restarts the container. If the liveness check succeeds, no action is taken.
  • Container readiness check: Checks whether the container is ready to start processing user requests. For example, a program may take a long time to start because it must load data from disk or wait for an external module to start before it can provide services. In this case, you can use the container readiness check to determine, by checking the program process, whether the program has finished starting. If the readiness check fails, the cluster blocks requests from reaching the container. If the readiness check succeeds, the cluster allows requests to reach the container.

Health Check Methods

TCP Port Probe

TCP port probe works as follows:
For a container that provides TCP communication services, the cluster periodically attempts to establish a TCP connection with the container. If the connection is established, the probe succeeds; otherwise, it fails. To use the TCP port probe method, you must specify the port that the container listens on.
For example, for a Redis container with a service port of 6379, if you configure a TCP port probe for the container and set the probe port to 6379, the cluster periodically initiates a TCP connection to port 6379 of the container.
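
The TCP port probe described above can be illustrated with a minimal Python sketch. This is not the cluster's actual implementation; the function name tcp_probe and the default timeout are assumptions made for illustration only.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; not part of any cluster API.
    """
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the Redis example, a single probe would correspond to calling tcp_probe(containerIP, 6379), where containerIP stands for the container's address.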

HTTP Request Probe

The HTTP request probe is well suited for a container that provides HTTP/HTTPS services. The cluster periodically sends an HTTP/HTTPS GET request to the container. If the status code in the HTTP/HTTPS response is in the range of 200-399, the probe succeeds; otherwise, it fails.
For example, for a container that provides HTTP services with a service port of 80 and an HTTP check path of /health-check, the cluster periodically sends a GET http://containerIP:80/health-check request to the container.
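
As a rough sketch of the rule above, the following Python code sends one GET request and applies the 200-399 success range. The names is_success_status and http_probe are hypothetical, and the sketch covers plain HTTP only; it is not the cluster's implementation.

```python
import http.client


def is_success_status(status: int) -> bool:
    """A probe succeeds when the status code is in the range 200-399."""
    return 200 <= status <= 399


def http_probe(host: str, port: int, path: str,
               timeout_seconds: float = 2.0) -> bool:
    """Send one HTTP GET and report whether the probe succeeded.

    Hypothetical helper for illustration. http.client does not follow
    redirects, so a 3xx response is evaluated as-is.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection errors, timeouts, and malformed responses all fail.
        return False
    finally:
        conn.close()
    return is_success_status(status)
```

For the example above, one probe would correspond to http_probe(containerIP, 80, "/health-check"), where containerIP stands for the container's address.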

Run Command Check

The run command check is a powerful check method. After you specify an executable command in the container, the cluster periodically executes the command in the container. If the command's exit code is 0, the check succeeds; otherwise, the check fails.
You can replace both the TCP port probe and the HTTP request probe with a run command check:

  • To replace the TCP port probe, write a program that connects to the container's port. If the connection succeeds, the program returns 0; otherwise, it returns -1.
  • To replace the HTTP request probe, write a script that runs wget against the container and checks the return code of the response, for example, wget http://127.0.0.1:80/health-check. If the return code is in the range of 200-399, the script returns 0; otherwise, it returns -1.

Note the following when using a run command check:

  • The program to be run must be included in the container's image; otherwise, the check fails because the program cannot be found.
  • If the command to be run is a shell script, you cannot specify the script by itself as the run command; you must also specify the script's interpreter. For example, if the script is /data/scripts/health_check.sh, specify sh /data/scripts/health_check.sh as the command for the run command check.
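
The exit-code rule for the run command check can be sketched in Python as follows. The function run_command_check is a hypothetical name used only for illustration, and the sketch runs the command on the local machine rather than inside a container.

```python
import subprocess
import sys


def run_command_check(argv: list[str], timeout_seconds: float = 2.0) -> bool:
    """Run a command and treat exit code 0 as success.

    Hypothetical helper for illustration; not the cluster's implementation.
    """
    try:
        completed = subprocess.run(
            argv,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError: the program could not be found or started.
        # TimeoutExpired: the command ran longer than the timeout.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # A command that exits with 0 passes; any other exit code fails.
    print(run_command_check([sys.executable, "-c", "raise SystemExit(0)"]))
    print(run_command_check([sys.executable, "-c", "raise SystemExit(1)"]))
```

In this sketch, the shell script example corresponds to the argument list ["sh", "/data/scripts/health_check.sh"]: the interpreter comes first, followed by the script path. A missing program raises an error before any exit code exists, which is why it is treated as a failed check.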

Other Common Parameters

  • Start delay: In seconds. Specifies how long the cluster waits after the container starts before it begins probing. For example, if the start delay is set to 5, the health check starts 5 seconds after the container is started.
  • Interval: In seconds. Specifies how often the health check is performed. For example, if the interval is set to 10, the cluster checks the container once every 10 seconds.
  • Response timeout: In seconds. Specifies the timeout period for a health probe. It is the TCP connection timeout for the TCP port probe, the HTTP response timeout for the HTTP request probe, and the command execution timeout for the run command check.
  • Healthy threshold: Number of times. Specifies the number of consecutive successful health checks required before the container is considered healthy. For example, if the healthy threshold is set to 3, the container is considered healthy only after 3 consecutive probes succeed.
    If the health check type is a liveness check, the healthy threshold can only be 1; any other value is considered invalid.
  • Unhealthy threshold: Number of times. Specifies the number of consecutive failed health checks required before the container is considered unhealthy. For example, if the unhealthy threshold is set to 3, the container is considered unhealthy only after 3 consecutive probes fail.
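
The interaction between the healthy and unhealthy thresholds can be illustrated by replaying a sequence of probe results. This is a simplified sketch under stated assumptions: the function name evaluate_health is hypothetical, the container is assumed to start in the unhealthy state, and a result of the opposite kind is assumed to reset the consecutive counter.

```python
def evaluate_health(results: list[bool],
                    healthy_threshold: int,
                    unhealthy_threshold: int,
                    initially_healthy: bool = False) -> list[bool]:
    """Replay probe results and return the health state after each probe.

    Hypothetical helper for illustration; not the cluster's implementation.
    """
    healthy = initially_healthy
    successes = 0  # consecutive successful probes
    failures = 0   # consecutive failed probes
    states = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0  # assumption: a success resets the failure streak
            if successes >= healthy_threshold:
                healthy = True
        else:
            failures += 1
            successes = 0  # assumption: a failure resets the success streak
            if failures >= unhealthy_threshold:
                healthy = False
        states.append(healthy)
    return states


if __name__ == "__main__":
    # With both thresholds set to 3, two isolated failures do not change
    # the state; only the third consecutive failure marks it unhealthy.
    probes = [True, True, True, False, False, True, False, False, False]
    print(evaluate_health(probes, healthy_threshold=3, unhealthy_threshold=3))
```

With both thresholds set to 3, the container in this replay becomes healthy on the third consecutive success and stays healthy until the final run of three consecutive failures.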