QoS Agent

Last updated: 2024-02-05 16:28:54
    QoS Agent is a Tencent Cloud add-on that provides a set of quality of service (QoS) capabilities. It raises the resource utilization of a cluster while keeping workloads stable.
    Note:
    QoS capabilities are supported only on native nodes. If your nodes are not native, or your workload does not run on native nodes, these capabilities do not take effect.

    Kubernetes objects deployed in a cluster

    | Kubernetes Object Name | Type | Default Resource Occupation | Associated Namespaces |
    |---|---|---|---|
    | avoidanceactions.ensurance.crane.io | CustomResourceDefinition | - | - |
    | nodeqoss.ensurance.crane.io | CustomResourceDefinition | - | - |
    | podqoss.ensurance.crane.io | CustomResourceDefinition | - | - |
    | timeseriespredictions.prediction.crane.io | CustomResourceDefinition | - | - |
    | kube-system | Namespace | - | - |
    | all-be-pods | PodQOS | - | kube-system |
    | qos-agent | ClusterRole | - | - |
    | qos-agent | ClusterRoleBinding | - | - |
    | crane-agent | Service | - | kube-system |
    | qos-agent | ServiceAccount | - | kube-system |
    | qos-agent | DaemonSet | - | kube-system |

    Feature Overview

    | Feature | Description |
    |---|---|
    | Priority of CPU Usage | Guarantees CPU resources for high-priority tasks during resource contention by suppressing low-priority tasks. |
    | CPU Burst | Temporarily lets latency-sensitive applications use CPU beyond their limit, which keeps them stable. |
    | CPU Hyperthreading Isolation | Prevents the L2 cache of high-priority container threads from being affected by low-priority threads running on the same physical CPU core. |
    | Memory QoS Enhancement | Enhances memory performance and allows flexible limits on the memory usage of a container. |
    | Network QoS Enhancement | Enhances network performance and allows flexible limits on the network usage of a container. |
    | Disk IO QoS Enhancement | Enhances disk performance and allows flexible limits on the disk usage of a container. |

    QoS Agent Permission

    Note:
    The Permission Scenarios section lists only the permissions related to the core features of the component. For the complete permission list, see Permission Definition.

    Permission Description

    The permissions granted to this component are the minimum required for its current features to operate.

    Permission Scenarios

    | Feature | Involved Object | Involved Operation Permission |
    |---|---|---|
    | Reading podqos, nodeqos, time series, and other configurations | podqoss / nodeqoss / avoidanceactions | get/list/watch/update |
    | Viewing the pod information of the current node | pod | get/list/watch |
    | Enabling isolation capability based on PodQOS / Modifying node resources to increase offline resources | pod status | update/patch |
    | Adding a taint to the node | node | get/list/watch/update |
    | Sending events based on the status of isolation and resource interference | event | All Permissions |

    Permission Definition

    rules:
    - apiGroups:
      - ""
      resources:
      - pods
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - pods/status
      verbs:
      - update
      - patch
    - apiGroups:
      - ""
      resources:
      - nodes
      verbs:
      - get
      - list
      - watch
      - update
    - apiGroups:
      - ""
      resources:
      - nodes/status
      - nodes/finalizers
      verbs:
      - update
      - patch
    - apiGroups:
      - ""
      resources:
      - pods/eviction
      verbs:
      - create
    - apiGroups:
      - ""
      resources:
      - configmaps
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - events
      verbs:
      - "*"
    - apiGroups:
      - "ensurance.crane.io"
      resources:
      - podqoss
      - nodeqoss
      - avoidanceactions
      verbs:
      - get
      - list
      - watch
      - update
    - apiGroups:
      - "prediction.crane.io"
      resources:
      - timeseriespredictions
      - timeseriespredictions/finalizers
      verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
    - apiGroups:
      - "topology.crane.io"
      resources:
      - "noderesourcetopologies"
      verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
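
To cross-check a scenario in the Permission Scenarios table against the rule list, the rules can be evaluated by hand. The sketch below is a simplified Python model of how RBAC rules combine: rules are additive, and a request is allowed if any single rule covers its API group, resource, and verb, with "*" acting as a wildcard. It deliberately ignores resourceNames and other RBAC details, `RULES` copies only a subset of the definition, and the helper names `covers` and `is_allowed` are hypothetical. It is not part of QoS Agent.

```python
from typing import Dict, List

Rule = Dict[str, List[str]]

# A subset of the rules from the Permission Definition, written as Python data.
RULES: List[Rule] = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["pods/status"], "verbs": ["update", "patch"]},
    {"apiGroups": [""], "resources": ["events"], "verbs": ["*"]},
    {
        "apiGroups": ["ensurance.crane.io"],
        "resources": ["podqoss", "nodeqoss", "avoidanceactions"],
        "verbs": ["get", "list", "watch", "update"],
    },
]


def covers(allowed: List[str], wanted: str) -> bool:
    """True if the list names the wanted item or contains the "*" wildcard."""
    return "*" in allowed or wanted in allowed


def is_allowed(rules: List[Rule], api_group: str, resource: str, verb: str) -> bool:
    """Rules are additive: one matching rule is enough to allow the request."""
    return any(
        covers(rule["apiGroups"], api_group)
        and covers(rule["resources"], resource)
        and covers(rule["verbs"], verb)
        for rule in rules
    )
```

Under this model, `is_allowed(RULES, "", "pods/status", "patch")` holds while `is_allowed(RULES, "", "pods", "update")` does not, because a subresource such as pods/status is listed separately from pods.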

    Deployment Methods

    1. Log in to the Tencent Kubernetes Engine Console, and choose Cluster from the left navigation bar.
    2. In the Cluster list, click the desired Cluster ID to open its details page.
    3. Select Add-on management from the left-side menu, and click Create on the Component Management page.
    4. On the Create Add-on management page, select the checkbox for QoS Agent.
    5. Click Complete to install the add-on.
    Note:
    Because the cgroup driver can differ between clusters, you must manually select the matching driver after the deployment completes. Proceed as follows:
    1. In the add-on list of your cluster, locate the successfully deployed QoS Agent, and click Update configuration on the right.
    2. On the add-on configuration page of QoS Agent, open the dropdown box to the right of the cgroupDrive option, and choose the value that matches your cluster.
    3. Click Complete.

    FAQs

    How to confirm the cgroup driver (cgroupDrive) of a cluster?

    The cgroup driver of a cluster is either cgroupfs or systemd. To confirm which one is in use:
    First, open the "basic information" page of the cluster and check the container runtime entry (shown as "operating add-on") to determine whether the cluster runs docker or containerd.
    If the runtime is docker, run docker info on any node in the cluster and check the Cgroup Driver field.
    If the runtime is containerd, check the file /etc/containerd/config.toml on any node in the cluster. If it contains the field SystemdCgroup = true, the driver is systemd; otherwise, it is cgroupfs.
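
The two checks above can be sketched in Python. This is a minimal illustration that works on text you have already collected from a node (the output of docker info, or the contents of /etc/containerd/config.toml); the function names are hypothetical, and the containerd check is a plain line match rather than a full TOML parse.

```python
import re
from typing import Optional


def driver_from_docker_info(docker_info_output: str) -> Optional[str]:
    """Return the value of the "Cgroup Driver" field, or None if it is absent."""
    match = re.search(r"^\s*Cgroup Driver:\s*(\S+)", docker_info_output, re.MULTILINE)
    return match.group(1) if match else None


def driver_from_containerd_config(config_toml: str) -> str:
    """Apply the rule above: an uncommented SystemdCgroup = true means systemd,
    anything else is treated as cgroupfs."""
    pattern = r"^\s*SystemdCgroup\s*=\s*true\b"
    return "systemd" if re.search(pattern, config_toml, re.MULTILINE) else "cgroupfs"
```

For example, passing the text of config.toml to `driver_from_containerd_config` yields the value to choose for the cgroupDrive option on a containerd cluster.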

    How to select the target workload or node?

    You can select specific resource objects by label or by scope.
    Note:
    When both of the following selectors are set, they are combined with a logical AND, i.e. all conditions must be met.

    labelSelector

    The labelSelector filters resources by matching the labels of the object. Typically, the business team attaches a specific label to the designated workloads and passes that label to the operations team. When creating a PodQOS, the operations team references the label in the labelSelector field, which grants different QoS capabilities to different businesses.

    scopeSelector

    The scopeSelector is composed of multiple MatchExpressions, which are combined with a logical AND. Each MatchExpression has three fields: ScopeName, Operator, and the Values that correspond to the ScopeName.
    ScopeName has three types: QOSClass, Priority, and Namespace.
    QOSClass selects workloads that belong to a specific QOSClass. The Values can be one or more of Guaranteed, Burstable, and BestEffort.
    Priority selects workloads that have a specific Priority. The Values can be single priority values or ranges, such as ["1000", "2000-3000"].
    Namespace selects workloads in a specific Namespace. The Values can be one or more namespaces.
    Operator has two types: In and NotIn. If left blank, it defaults to In.
    The following example selects BestEffort pods that carry the label app-type=offline and assigns them a CPU priority of 7:
    apiVersion: ensurance.crane.io/v1alpha1
    kind: PodQOS
    metadata:
      name: offline-task
    spec:
      allowedActions:
      - eviction
      resourceQOS:
        cpuQOS:
          cpuPriority: 7
      scopeSelector:
        matchExpressions:
        - operator: In
          scopeName: QOSClass
          values:
          - BestEffort
      labelSelector:
        matchLabels:
          app-type: offline
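
The selector rules described in this section can be modeled in a few lines of Python to make the AND semantics concrete. This is an illustrative sketch, not the QoS Agent implementation: the `Pod` class and the function names are hypothetical, priority ranges are assumed to be inclusive and non-negative, and only matchLabels is modeled for labelSelector.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    """Hypothetical stand-in for the pod attributes that the selectors inspect."""
    qos_class: str
    priority: int
    namespace: str
    labels: Dict[str, str] = field(default_factory=dict)


def priority_matches(values: List[str], priority: int) -> bool:
    """True if priority equals a listed value or lies in a "low-high" range.

    Assumes non-negative priorities and inclusive range bounds.
    """
    for value in values:
        if "-" in value:
            low, high = value.split("-", 1)
            if int(low) <= priority <= int(high):
                return True
        elif int(value) == priority:
            return True
    return False


def expression_matches(expr: dict, pod: Pod) -> bool:
    """Evaluate one matchExpressions entry against a pod."""
    scope, values = expr["scopeName"], expr["values"]
    if scope == "QOSClass":
        hit = pod.qos_class in values
    elif scope == "Priority":
        hit = priority_matches(values, pod.priority)
    elif scope == "Namespace":
        hit = pod.namespace in values
    else:
        raise ValueError(f"unknown scopeName: {scope!r}")
    operator = expr.get("operator") or "In"  # a blank operator defaults to In
    if operator not in ("In", "NotIn"):
        raise ValueError(f"unknown operator: {operator!r}")
    return hit if operator == "In" else not hit


def pod_selected(spec: dict, pod: Pod) -> bool:
    """Every scope expression and every matchLabels entry must hold."""
    expressions = spec.get("scopeSelector", {}).get("matchExpressions", [])
    wanted = spec.get("labelSelector", {}).get("matchLabels", {})
    scope_ok = all(expression_matches(e, pod) for e in expressions)
    labels_ok = all(pod.labels.get(k) == v for k, v in wanted.items())
    return scope_ok and labels_ok


# The selectors of the offline-task example, as plain Python data.
OFFLINE_TASK = {
    "scopeSelector": {
        "matchExpressions": [
            {"operator": "In", "scopeName": "QOSClass", "values": ["BestEffort"]}
        ]
    },
    "labelSelector": {"matchLabels": {"app-type": "offline"}},
}
```

With this model, `pod_selected(OFFLINE_TASK, Pod("BestEffort", 0, "default", {"app-type": "offline"}))` holds, while a Burstable pod with the same label, or a BestEffort pod without the label, is not selected.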
    
    