Using VPA to Realize Pod Scaling up and Scaling down in TKE

Last updated: 2021-06-09 17:46:36

Overview

Kubernetes Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests of Pods, which improves cluster resource utilization and frees CPU and memory for other Pods. This document describes how to use the community edition of VPA in TKE to scale Pods up and down.

Use Cases

The auto-scaling feature of VPA makes TKE workloads flexible and adaptive. When the business load rises sharply, VPA can quickly increase the Request of a container within the range that the user has set. When the business load falls, VPA can reduce the Request to match actual needs and save computing resources. The entire process is automated and requires no manual intervention. VPA suits scenarios that require rapid scale-up and the scaling of stateful applications. VPA can also be used only to recommend a more reasonable Request, which improves container resource utilization while ensuring that the container still has sufficient resources.

VPA Strengths

Compared with Horizontal Pod Autoscaler (HPA), VPA has the following advantages:

VPA scales a workload without changing the number of Pod replicas, so scaling up is faster.
VPA can scale stateful applications, whereas HPA is not well suited to scaling out stateful applications.
If the Request is set too high, cluster resource utilization remains low even after HPA has scaled the workload in to a single Pod. In this case, you can use VPA to scale the Pod down and improve cluster resource utilization.

VPA Limits

Note:

VPA community edition is still in testing. Use this feature with caution. We recommend setting "updateMode" to "Off" so that VPA does not automatically change the Request values. You can still view the recommended Request values for the workload in the VPA object.

VPA can update the resource configurations of running Pods, but this feature is still in testing. A configuration update restarts and recreates the Pod, and the Pod may be scheduled to another node.
VPA does not evict Pods that are not managed by a controller. For these Pods, the Auto mode is equivalent to the Initial mode.
You cannot run VPA together with an HPA that uses CPU or memory as its metric. If the HPA uses only metrics other than CPU and memory, you can run VPA and HPA at the same time. For details, see Using Custom Metrics for Auto Scaling in TKE.
The VPA uses an Admission Webhook as its admission controller. If there are other Admission Webhooks in the cluster, you need to ensure that they do not conflict with the Admission Webhooks of the VPA. The execution sequence of admission controllers is defined in the configuration parameters of the API Server.
The VPA can react to most Out of Memory (OOM) events.
The VPA performance has not been tested in large-scale clusters.
The recommended value of Pod resource Request set by the VPA may exceed the upper limit of the available resources (such as node resources, idle resources, and resource quotas). In this case, the Pod may go to Pending and cannot be scheduled. This can be partly addressed by using the VPA together with the Cluster Autoscaler.
Multiple VPA resources matching the same pod have undefined behavior.

For more limitations on VPA, see VPA Known limitations.

Prerequisites

You have created a TKE cluster.
You have connected to the cluster with the command line tool kubectl. For how to connect to a cluster, see Connecting to a Cluster.

Directions

Deploying VPA

Log in to the CVM in the cluster.
You can connect to a TKE cluster from a local client using the command line tool kubectl.
Run the following command to clone the kubernetes/autoscaler repository from GitHub.
```sh
git clone https://github.com/kubernetes/autoscaler.git
```
Run the following command to switch to the vertical-pod-autoscaler directory.
```
cd autoscaler/vertical-pod-autoscaler/
```
(Optional) If you have already deployed another version of VPA, run the following command to remove it. Otherwise an exception may occur.
```
./hack/vpa-down.sh
```
Run the following command to deploy VPA related components to your cluster.
```
./hack/vpa-up.sh
```
Run the following command to verify whether the VPA component is successfully created.
```
kubectl get deploy -n kube-system | grep vpa
```
After the VPA components are created, the command lists three Deployments in the kube-system namespace: vpa-admission-controller, vpa-recommender, and vpa-updater.
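To complement the grep check, the following minimal sketch waits until each of the three Deployments has finished rolling out. The function name wait_for_vpa and the 120-second timeout are assumptions made for this example and are not part of the VPA scripts.

```shell
# Names of the three VPA Deployments created by vpa-up.sh.
VPA_DEPLOYMENTS="vpa-admission-controller vpa-recommender vpa-updater"

# wait_for_vpa: wait for every VPA Deployment to finish rolling out.
# Returns non-zero as soon as one rollout fails or times out.
# The 120s timeout is an arbitrary value chosen for this sketch.
wait_for_vpa() {
  for name in $VPA_DEPLOYMENTS; do
    kubectl rollout status "deployment/$name" -n kube-system --timeout=120s || return 1
  done
}
```

For example, run `wait_for_vpa && echo "VPA is ready"` after `./hack/vpa-up.sh` completes.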

Sample 1: using VPA to obtain the recommended value of Request

Note:

We do not recommend using VPA to automatically update Requests in a production environment.

You can use VPA to view the recommended Request values and manually trigger the update as needed.

In this sample, you will create a VPA object with updateMode set to Off and a Deployment with two Pods, each of which has one container. After the Pods are created, VPA analyzes the CPU and memory requirements of the containers and records the recommended Request values in its status field. VPA does not automatically update the resource requests of the running containers.

Run the following kubectl command to create a VPA object named tke-vpa that points to a Deployment named tke-deployment:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: tke-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: tke-deployment
  updatePolicy:
    updateMode: "Off"
EOF
```

Run the following command to generate a Deployment object named tke-deployment:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tke-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tke-deployment
  template:
    metadata:
      labels:
        app: tke-deployment
    spec:
      containers:
      - name: tke-container
        image: nginx
EOF
```

The generated Deployment object is shown as follows:

Note:

The tke-deployment created above does not set a CPU or memory Request, so the QoS class of its Pods is BestEffort. Such Pods are more likely to be evicted. We recommend that you set the Request and Limit when you create the Deployment of an application. If you create a workload via the TKE console, a default Request and Limit are set automatically for each container.

Run the following command to view the recommended Requests of CPU and memory by VPA:

```shell
kubectl get vpa tke-vpa -o yaml
```

The execution results are as follows:

```yaml
...
  recommendation:
    containerRecommendations:
    - containerName: tke-container
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:  # Recommended value
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 1771m
        memory: 1851500k
```
The CPU and memory values under target are the recommended Requests. To apply them, you can delete the previous Deployment and create a new Deployment that sets the recommended Requests.
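If you prefer not to recreate the Deployment, one possible alternative is to read the target values with a JSONPath query and update the existing Deployment with kubectl set resources, which triggers a rolling update of its Pods. The sketch below assumes that the first entry in containerRecommendations belongs to the container you want to update. The function names vpa_target and apply_vpa_target are hypothetical.

```shell
# vpa_target VPA_NAME RESOURCE
# Prints the target recommendation (cpu or memory) of the first container.
vpa_target() {
  kubectl get vpa "$1" -o jsonpath="{.status.recommendation.containerRecommendations[0].target.$2}"
}

# apply_vpa_target VPA_NAME DEPLOYMENT CONTAINER
# Copies the recommended cpu and memory targets into the Deployment's Requests.
apply_vpa_target() {
  vpa=$1 deploy=$2 container=$3
  cpu=$(vpa_target "$vpa" cpu)
  mem=$(vpa_target "$vpa" memory)
  kubectl set resources deployment "$deploy" -c "$container" --requests="cpu=$cpu,memory=$mem"
}
```

For this sample, the call would be `apply_vpa_target tke-vpa tke-deployment tke-container`.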

| Field | Description |
| --- | --- |
| lowerBound | The minimum value recommended. The use of a Request smaller than this value may have a major impact on performance or availability. |
| target | Recommended value. The VPA calculates the most appropriate Request. |
| uncappedTarget | The latest recommended value. It is only based on the actual resource usage and does not consider the recommended value range of the container set in `.spec.resourcePolicy.containerPolicies`. The uncappedTarget may differ from the recommended `lowerBound` and `upperBound`. This field is only used to indicate the status and will not affect the actual resource allocation. |
| upperBound | The maximum value recommended. The use of a Request larger than this value may cause a resource waste. |
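The value range that uncappedTarget ignores is set with the minAllowed and maxAllowed fields under `.spec.resourcePolicy.containerPolicies`. The following sketch writes a variant of the tke-vpa manifest with such a range to a file. The file name tke-vpa-bounded.yaml and all four bound values are placeholders chosen for illustration, not tuned recommendations.

```shell
# Write a VPA manifest that bounds the recommendation for tke-container.
# The minAllowed/maxAllowed numbers below are placeholder values.
cat > tke-vpa-bounded.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: tke-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: tke-deployment
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: tke-container
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "1"
        memory: 1Gi
EOF
# Apply the manifest with: kubectl apply -f tke-vpa-bounded.yaml
```

With a range in place, target stays inside it, while uncappedTarget continues to report the value based on usage alone.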

Sample 2: Disabling a specific container

If a Pod has multiple containers, for example an application container and a secondary container, you can disable Request recommendations for the secondary container to save cluster resources.

In this sample, you will create a VPA with a specific container disabled and a Deployment with one Pod that contains two containers. After the Pod is created, VPA calculates a recommended value for only one container and does not recommend a Request for the other.

Run the following kubectl command to create a VPA object named tke-opt-vpa that points to a Deployment named tke-opt-deployment:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: tke-opt-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: tke-opt-deployment
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: tke-opt-sidecar
      mode: "Off"
EOF
```

Note:

In the .spec.resourcePolicy.containerPolicies of the VPA, the mode of tke-opt-sidecar is set to "Off", so VPA does not calculate or recommend a new Request for tke-opt-sidecar.

Run the following command to generate a Deployment object named tke-opt-deployment:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tke-opt-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tke-opt-deployment
  template:
    metadata:
      labels:
        app: tke-opt-deployment
    spec:
      containers:
      - name: tke-opt-container
        image: nginx
      - name: tke-opt-sidecar
        image: busybox
        command: ["sh","-c","while true; do echo TKE VPA; sleep 60; done"]
EOF
```

The generated Deployment object is shown as follows:

Run the following command to view the recommended Requests of CPU and memory by VPA:

```shell
kubectl get vpa tke-opt-vpa -o yaml
```

The execution results are as follows:

```yaml
...
  recommendation:
    containerRecommendations:
    - containerName: tke-opt-container
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 1595m
        memory: 1667500k
```

In the execution result, there is only the recommended value of tke-opt-container, and no recommended value of tke-opt-sidecar.
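To confirm this without reading the whole YAML output, you could list only the container names that carry a recommendation. This is a small sketch using a JSONPath query; the function name recommended_containers is hypothetical.

```shell
# recommended_containers VPA_NAME
# Prints the names of the containers for which the VPA holds a recommendation,
# separated by spaces.
recommended_containers() {
  kubectl get vpa "$1" -o jsonpath='{.status.recommendation.containerRecommendations[*].containerName}'
}
```

For this sample, `recommended_containers tke-opt-vpa` is expected to list tke-opt-container but not tke-opt-sidecar.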

Sample 3: updating the Request automatically

Note:

Automatically updating the resources of running Pods is an experimental feature of VPA. We recommend that you do not use this feature in a production environment.

In this sample, you will create a VPA that can automatically adjust the CPU and memory Requests, and a Deployment with two Pods. The container in each Pod sets a Request and a Limit for CPU and memory.

Run the following kubectl command to create a VPA object named tke-auto-vpa that points to a Deployment named tke-auto-deployment:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: tke-auto-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: tke-auto-deployment
  updatePolicy:
    updateMode: "Auto"
EOF
```

Note:

The updateMode field of this VPA is set to Auto, which means that VPA can update the CPU and memory Requests during the life cycle of a Pod. To do so, VPA evicts the Pod, adjusts the CPU and memory Requests, and then a new Pod is created with the adjusted values.

Run the following command to generate a Deployment object named tke-auto-deployment:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tke-auto-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tke-auto-deployment
  template:
    metadata:
      labels:
        app: tke-auto-deployment
    spec:
      containers:
      - name: tke-container
        image: nginx
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 200Mi
EOF
```

Note:

The Deployment created above sets both the Request and the Limit of each resource. In this case, VPA recommends not only the Request but also the Limit, which it derives from the initial ratio of Request to Limit. For example, the initial CPU Request and Limit in the YAML are 100m and 200m, a ratio of 1:2, so the Limit recommended by VPA is twice the Request recommended in the VPA object.
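The ratio rule in this note can be worked through with shell integer arithmetic. The sketch below only illustrates the arithmetic and is not VPA's actual implementation; the function name scaled_limit is hypothetical.

```shell
# scaled_limit NEW_REQUEST OLD_REQUEST OLD_LIMIT
# Prints the Limit that keeps the original Request:Limit ratio.
# OLD_REQUEST and OLD_LIMIT must share a unit; the result is in the unit of NEW_REQUEST.
scaled_limit() {
  echo $(( $1 * $3 / $2 ))
}

# CPU: initial Request 100m, initial Limit 200m, recommended Request 25m.
scaled_limit 25 100 200        # prints 50, that is 50m

# Memory: initial Request 100Mi, initial Limit 200Mi, recommended Request 262144k.
scaled_limit 262144 100 200    # prints 524288, that is 524288k (equal to 500Mi)
```

These results agree with the Limits of 50m and 500Mi in the Pod output later in this sample.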

The generated Deployment object is shown as follows:

Run the following command to obtain the detailed information of a running Pod, replacing pod-name with the name of the Pod:

```shell
kubectl get pod pod-name -o yaml
```

The execution result is shown below. VPA changed the original Request and Limits to its recommended values and kept the initial ratio between Request and Limits. It also added an annotation that records the update:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    ...
    vpaObservedContainers: tke-container
    vpaUpdates: Pod resources updated by tke-auto-vpa: container 0: memory request, cpu request
...
spec:
  containers:
  ...
    resources:
      limits:  # The new Request and Limits will maintain the initial ratio
        cpu: 50m
        memory: 500Mi
      requests:
        cpu: 25m
        memory: 262144k
    ...
```

Run the following command to obtain the detailed information of the relevant VPA:

```shell
kubectl get vpa tke-auto-vpa -o yaml
```

The execution results are as follows:

```yaml
...
  recommendation:
    containerRecommendations:
    - containerName: tke-container
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 101m
        memory: 262144k
```

target indicates that the container runs best when its CPU and memory Requests are 25m and 262144k respectively.

VPA uses the recommended values of lowerBound and upperBound to decide whether to evict a Pod and replace it with a new Pod. If the Pod's Request is smaller than the lower bound or larger than the upper bound, VPA evicts the Pod and replaces it with a Pod that uses the recommended value.
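The bound check described above can be sketched as a simple comparison. This is a simplified illustration of the rule as stated here, and the actual VPA updater may apply further conditions. The function name vpa_action is hypothetical, and all three arguments must be plain integers in the same unit.

```shell
# vpa_action REQUEST LOWER_BOUND UPPER_BOUND
# Prints "replace" if the Request lies outside [LOWER_BOUND, UPPER_BOUND],
# otherwise prints "keep".
vpa_action() {
  if [ "$1" -lt "$2" ] || [ "$1" -gt "$3" ]; then
    echo replace
  else
    echo keep
  fi
}

# Memory in bytes: the initial Request of 100Mi (104857600) is below
# the lower bound of 262144k (262144000).
vpa_action 104857600 262144000 262144000   # prints replace

# CPU in millicores: a Request of 25m lies within the bounds 25m to 101m.
vpa_action 25 25 101                       # prints keep
```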

Troubleshooting

1. An error occurs when running the `vpa-up.sh` script.

Errors

```shell
ERROR: Failed to create CA certificate for self-signing. If the error is "unknown option -addext", update your openssl version or deploy VPA from the vpa-release-0.8 branch.
```

Solutions

If you did not run the command on a CVM in the cluster, we recommend that you download the Autoscaler project to a CVM in the cluster and deploy VPA from there. If you need to connect to the cluster from your CVM, see Connecting to a Cluster.
If the error persists, check the following:
- Whether the openssl version on the cluster CVM is v1.1.1 or later.
- Whether the vpa-release-0.8 branch of the Autoscaler project is used.
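The openssl version check can be scripted. The sketch below compares a version string against 1.1.1 and assumes a sort command that supports version ordering with -V (as GNU sort does). The function name openssl_ok is hypothetical.

```shell
# openssl_ok VERSION
# Prints "ok" if VERSION is 1.1.1 or later, otherwise "too old".
# Relies on sort -V for version ordering (an assumption about your sort).
openssl_ok() {
  lowest=$(printf '%s\n%s\n' "$1" "1.1.1" | sort -V | head -n 1)
  if [ "$lowest" = "1.1.1" ]; then
    echo ok
  else
    echo "too old"
  fi
}

# On the CVM, pass the second field of the "openssl version" output:
#   openssl_ok "$(openssl version | awk '{print $2}')"
```

If upgrading openssl is not an option, the error message suggests deploying from the vpa-release-0.8 branch instead: run `git checkout vpa-release-0.8` inside the cloned autoscaler directory before running `./hack/vpa-up.sh` again.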

2. The VPA-related load could not be started up.

Errors

The VPA-related workloads fail to start, and messages similar to the following are reported:

Message 1: indicates that the Pods in the workload failed to run.
Message 2: indicates the address of the image.

Solutions

The VPA-related workloads cannot start because the images hosted in GCR cannot be downloaded. You can try the following steps to solve the problem:

1. Download the images.
   Visit the "k8s.gcr.io/" image repository and download the images of vpa-admission-controller, vpa-recommender, and vpa-updater.
2. Replace the image tags and push the images.
   Retag the images of vpa-admission-controller, vpa-recommender, and vpa-updater and push them to your image repository. For how to push and upload images, see TCR Personal Edition.
3. Change the image addresses in the YAML.
   In the YAML file, update the image addresses of vpa-admission-controller, vpa-recommender, and vpa-updater to the new addresses you set.
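The download, retag, and push steps can be sketched with the docker CLI on a machine that can reach GCR. The function names mirror_ref and mirror_image are hypothetical, and the registry address and image tag in the usage example are placeholders: take the actual image addresses from the image fields of your VPA YAML.

```shell
# mirror_ref SOURCE_IMAGE TARGET_REPO
# Prints the new image address: TARGET_REPO plus the image name and tag
# taken from the last path segment of SOURCE_IMAGE.
mirror_ref() {
  echo "$2/${1##*/}"
}

# mirror_image SOURCE_IMAGE TARGET_REPO
# Pulls the source image, retags it for the target repository, and pushes it.
mirror_image() {
  target=$(mirror_ref "$1" "$2")
  docker pull "$1" && docker tag "$1" "$target" && docker push "$target"
}
```

For example, `mirror_image k8s.gcr.io/vpa-recommender:0.8.0 registry.example.com/my-namespace` would push registry.example.com/my-namespace/vpa-recommender:0.8.0, where both the 0.8.0 tag and the registry address are illustrative values. Repeat the call for vpa-admission-controller and vpa-updater, then use the new addresses in the YAML.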
