High availability (HA) refers to the ability of an application system to maintain uninterrupted operation, which is usually achieved by improving the fault tolerance of the system. In general, an application's fault tolerance can be improved by configuring replicas to create multiple replicas of the application, but this alone does not guarantee high availability.
This document describes four best practices for deploying applications with high availability. You can choose from them based on your situation.
Kubernetes assumes that nodes are unreliable: the more nodes there are, the higher the probability that some node becomes unavailable due to a software or hardware failure. Therefore, we usually deploy multiple replicas of an application and adjust the replicas value based on the actual situation. If the value is 1, the application is exposed to single-point failures; even if the value is greater than 1, the single-point failure risk remains if all replicas are scheduled to the same node.
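For example, a minimal Deployment running three replicas might look like the sketch below (the name and image are placeholders for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx            # hypothetical Deployment name
spec:
  replicas: 3            # multiple replicas to reduce the single-point failure risk
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx     # placeholder image
```

Note that, as explained below, multiple replicas alone are not enough if they all land on the same node.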
To prevent single-point failures, we need to have an appropriate number of replicas, and we also need to make sure different replicas are scheduled to different nodes. We can do so with anti-affinity. See the example below:
```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - kube-dns
        topologyKey: kubernetes.io/hostname
```
The relevant configurations in this example are shown below:
- preferredDuringSchedulingIgnoredDuringExecution: instructs the scheduler to always try to meet the anti-affinity condition; if no node meets the condition, pods can still be scheduled to some node.
- kubernetes.io/hostname: used as the topologyKey to prevent replicas from being scheduled to the same node.
- failure-domain.beta.kubernetes.io/zone: generally, all nodes in a cluster are in the same region, and cross-region nodes would suffer considerable latency even over direct connect. If you want replicas to be spread across availability zones within the region instead of just across nodes, you can use failure-domain.beta.kubernetes.io/zone as the topologyKey, as sketched below.
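For instance, keeping the rest of the earlier anti-affinity example unchanged, you could swap the topologyKey as in this sketch (note that newer Kubernetes versions supersede this label with topology.kubernetes.io/zone):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - kube-dns
        # Spread replicas across availability zones rather than just across nodes.
        topologyKey: failure-domain.beta.kubernetes.io/zone
```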
Besides scheduling-level spreading, you can also place nodes in a placement group so that they are spread across different underlying hardware, reducing the impact of hardware failures. The placement group and the TKE self-deployed cluster need to be in the same region.
The placement group policy takes effect only for nodes added in the same batch. Therefore, you need to add a label to each batch of nodes, with a different value marking each batch. For example, you could run kubectl label nodes <node-name> placement-set-uniq=rack1 on the first batch, which is what the manifest below assumes.
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "placement-set-uniq"
          operator: In
          values:
          - "rack1"
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
```
Node draining has negative impacts on running services. The following describes what happens when a node is drained: the node is first cordoned (marked unschedulable so that no new pods land on it), and the pods on it are then evicted and recreated on other nodes.

Such a process first deletes the pods and then creates new ones, rather than performing a rolling update. Therefore, if all replicas of a service are on the drained node, the service may become unavailable during the process. Normally, the service becomes unavailable for one of two reasons:

- The service has a single point of failure: it has only one replica, or all of its replicas run on the drained node. You can avoid this with the anti-affinity configuration described above.
- The replicas are spread across nodes, but several of those nodes are drained at the same time, so the replicas on them are evicted simultaneously. You can avoid this with a PodDisruptionBudget (PDB), which limits how many replicas may be disrupted at once. The example below keeps at least two replicas of the zookeeper application available during eviction:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
```
For more details, please read the Kubernetes documentation: Specifying a Disruption Budget for your Application.
Alternatively, you can use maxUnavailable to specify the maximum number of replicas that may be unavailable at the same time:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
```
With the default configuration, if a service is not tuned for updates, some traffic errors may occur while the service is being updated. Please refer to the following steps when deploying.

A service update can be triggered in several scenarios, the most common being a rolling update:
During a rolling update, the pods of the service being updated are created and terminated, and the endpoints of the service add and remove the Pod IP:Port entries corresponding to those pods. kube-proxy then updates the forwarding rules according to the updated Pod IP:Port list, but these rules are not updated immediately.
The forwarding rules are not updated immediately because Kubernetes components are decoupled from each other. Each component uses the controller pattern to ListAndWatch the resources it is interested in and reacts accordingly. Therefore, all the steps in the process, including pod creation or termination, endpoint updates, and forwarding rule updates, happen asynchronously.
When forwarding rules are not immediately updated, some connection errors could occur during the service update. The following describes two possible scenarios to analyze the reasons behind the connection errors:
Scenario 1: A pod has been created but has not fully started yet. The endpoint controller adds the pod's IP:Port to the service's endpoints, and kube-proxy watches the update and updates the service forwarding rules (iptables/ipvs). If a request arrives at this point, it could be forwarded to the pod even though it has not fully started, and a connection error may occur because the pod is not yet able to properly process the request.
Scenario 2: A pod has been terminated, but since all the steps in the process are asynchronous, the forwarding rules may not have been updated by the time the pod is fully terminated. In such a case, new requests can still be forwarded to the terminated pod, leading to connection errors.
To address scenario 1, you can add a readinessProbe to the containers in the pod. After a container starts, kubelet periodically probes it, for example by sending HTTP GET requests to a port the container listens on. Only if the container responds normally is it marked Ready, and only when all the containers in a pod are ready will the endpoint controller add the pod's IP:Port to the corresponding endpoints of the service, after which kube-proxy updates the forwarding rules. In this way, even if a request is immediately forwarded to the new pod, the pod is able to process it normally, thereby avoiding connection errors.
To address scenario 2, you can add a preStop hook to the containers in the pod so that, before the pod is fully terminated, it sleeps for some time, during which the endpoint controller and kube-proxy can update the endpoints and forwarding rules. During that time the pod is in the Terminating state, and even if a request is forwarded to it before the forwarding rules are fully updated, the pod can still process the request normally because it has not been killed yet.
Below is a YAML sample:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      component: nginx
  template:
    metadata:
      labels:
        component: nginx
    spec:
      containers:
      - name: nginx
        image: "nginx"
        ports:
        - name: http
          hostPort: 80
          containerPort: 80
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /healthz
            port: 80
            httpHeaders:
            - name: X-Custom-Header
              value: Awesome
          initialDelaySeconds: 15
          timeoutSeconds: 1
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash", "-c", "sleep 30"]
```