Tencent Kubernetes Engine
Tencent Kubernetes Engine (TKE) is an enterprise-grade, container-centric management service that provides highly scalable solutions based on native Kubernetes. As the industry's first service to adopt a single-cluster hybrid node resource management mode, it offers comprehensive scenario-based solutions for Agentic AI application deployment and maximum resource efficiency, unlocking computing power for users in the AI era.
50% improvement in business performance
300% increase in resource utilization
Second-level Pod startup speed
Why Choose Tencent Kubernetes Engine
Scenario Application
Full support for agent deployment, model inference, reinforcement learning, large-scale data processing, and microservice scenarios.
Full Link Accelerator
For model deployment, inference, and large-scale data processing scenarios, TKE has built an optimized full-link acceleration (FLA) system.
Ultimate Resource Performance
Deeply integrated with FinOps principles and equipped with the self-developed Crane scheduler, helping users increase resource utilization by over 300%.
Flexible Deployment
Provides diverse node deployment options and supports managing Serverless and IDC resources.
Secure and Reliable
Drawing on Tencent's comprehensive experience migrating its own self-developed businesses to the cloud, TKE performs all-round parameter tuning and adaptation for the operating system, runtime environment, and Kubernetes.
Ultra-large-scale cluster
A single cluster's control plane supports stable operation of more than 50,000 nodes, with a 10x improvement in throughput.
Discover the key features
Cluster Management

TKE supports dynamic scaling of clusters and vertical scaling of nodes.

Nodes in a cluster can be managed and deployed across availability zones, and service containers can be scheduled across availability zones (a generic sketch follows).

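As an illustration of cross-AZ scheduling, the following is a minimal, generic Kubernetes sketch, not a TKE-specific API; the deployment name and image are placeholders. It spreads replicas evenly across availability zones using the standard topology.kubernetes.io/zone label:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # illustrative name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Spread replicas evenly across availability zones so the
      # service survives a single-zone failure.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25  # placeholder image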

Service Management

Improved service deployment efficiency

Service versions are controlled through templates, and images guarantee environment consistency, enabling faster service migration.

Service discovery is supported: services can be accessed through a load balancer domain name, or through the service name plus port number, avoiding the impact of IP changes when the service backend changes.

Microservices are supported, reducing the complexity of building and maintaining applications.
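
To make the service-discovery point concrete, here is a minimal, generic Kubernetes Service sketch; the names and ports are illustrative, not TKE-specific. Clients reach the backend as web:80 regardless of how the Pod IPs behind it change:

apiVersion: v1
kind: Service
metadata:
  name: web                # clients resolve this name via cluster DNS
spec:
  type: ClusterIP          # use type: LoadBalancer to expose via a load balancer address instead
  selector:
    app: web               # routes to Pods carrying this label
  ports:
    - port: 80             # port clients call: http://web:80
      targetPort: 8080     # port the container actually listens on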

Configuration Management

Manage business configurations across different environments

The same application can be deployed to different environments, making it easy to update and roll back.

Multiple configuration versions are supported, but only one version can be active at a time.
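
As a minimal sketch of per-environment configuration (the names and keys are illustrative), each environment gets its own ConfigMap and the workload references the one for its environment:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-staging   # one ConfigMap per environment
data:
  LOG_LEVEL: debug
  DB_HOST: staging-db.internal
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-prod
data:
  LOG_LEVEL: warn
  DB_HOST: prod-db.internal

A Pod then loads the environment's values with envFrom and a configMapRef, so updating or rolling back configuration is a matter of changing the referenced ConfigMap.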

Image Management

Official Docker Hub image management

Accelerated pulls of official Docker Hub images are available.

Private image management

A secure and reliable private image repository is provided.

Fast image upload and download is supported.
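
For pulling from a private repository, the standard Kubernetes pattern looks like the following sketch; the registry address, secret name, and image are placeholders:

# Created beforehand with:
#   kubectl create secret docker-registry my-registry-cred \
#     --docker-server=registry.example.com --docker-username=... --docker-password=...
apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  imagePullSecrets:
    - name: my-registry-cred   # credentials for the private registry
  containers:
    - name: app
      image: registry.example.com/team/app:1.0   # placeholder private image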

How it works in various business scenarios
Agent Sandbox

Agentic AI (intelligent agent) applications typically have a high degree of autonomy: they make decisions autonomously in complex environments, call external tools, and may perform operations such as code execution. This makes an agent a potential security risk, for example through malicious code execution, sensitive data leakage, or abuse of system resources. Agent workflows are often multi-round and long-running, placing very high demands on state management and task isolation.
  • Security isolation: each sandbox runs in an isolated, controlled environment (a generic isolation sketch follows this list).
  • Ultimate startup speed: instances launch in milliseconds, so agent calls work out of the box.
  • Multiple sandbox types: built-in browser and code sandboxes, plus support for scalable custom sandboxes.
  • Multiple access methods: sandbox interfaces and protocols compatible with mainstream open-source communities.
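
The sketch below shows one generic way to express sandbox isolation in Kubernetes, using a RuntimeClass backed by a sandboxed runtime such as Kata Containers. This is an illustrative assumption, not TKE's built-in sandbox implementation:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: sandboxed               # illustrative name
handler: kata                   # assumes a Kata-style sandboxed runtime is installed on the nodes
---
apiVersion: v1
kind: Pod
metadata:
  name: agent-code-sandbox      # illustrative name
spec:
  runtimeClassName: sandboxed   # run the agent's tool execution in the isolated runtime
  containers:
    - name: executor
      image: python:3.12-slim   # placeholder tool-execution image
      command: ["python", "-c", "print('hello from the sandbox')"]
      resources:
        limits:                 # cap resources to prevent abuse
          cpu: "1"
          memory: 512Mi
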
Model Inference

Model inference is the final stage of delivering an AI service, typically taking the form of large-scale, high-concurrency online services with sudden request surges. The core challenges are low latency (to protect user experience) and high throughput (to support business concurrency), along with very high resource utilization to reduce costs. Inference services often involve multi-model deployment, grayscale releases of model versions, and dynamic scaling.
  • Inference acceleration: Tencent's self-developed TACO inference acceleration framework delivers all-round improvements in inference performance.
  • GPU sharing: deeply integrated qGPU sharing technology makes full use of heterogeneous computing power.
  • Ultimate elasticity: super-node pre-created sandbox technology, combined with HPA/VPA, achieves millisecond-level response for inference resources (a generic autoscaling sketch follows this list).
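
As a generic autoscaling sketch using the standard Kubernetes HorizontalPodAutoscaler (the Deployment name and thresholds are illustrative), replicas grow and shrink with load:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server     # assumed inference Deployment
  minReplicas: 2           # keep a warm baseline for latency
  maxReplicas: 20          # cap growth during request surges
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU exceeds 70%
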
Reinforcement Learning

Reinforcement learning and deep learning model training require long-running, highly concurrent resources with efficient communication. Large-scale distributed training in particular has stringent requirements for GPU memory, high-speed networking (such as RDMA), and fault tolerance. Training jobs must run stably for hours or even weeks, and any interruption is extremely costly.
  • High-speed interconnection: supports high-performance networks such as RDMA to ensure low-latency data transmission.
  • Professional scheduling and fault tolerance: optimized resource allocation and topology awareness for distributed jobs.
  • Self-healing: when a node fails, the training job can automatically checkpoint and resume based on fault notifications, protecting training time and progress (a generic sketch of this pattern follows the list).
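
The checkpoint-and-resume pattern can be sketched with a plain Kubernetes Job; the image, resume flag, and PVC name are hypothetical, and TKE's fault-notification integration is not shown:

apiVersion: batch/v1
kind: Job
metadata:
  name: rl-training
spec:
  backoffLimit: 10             # re-run the Pod after node or process failures
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: trainer
          image: registry.example.com/rl-trainer:latest   # placeholder training image
          args: ["--resume-from", "/ckpt/latest"]         # hypothetical flag: resume from the last checkpoint
          volumeMounts:
            - name: ckpt
              mountPath: /ckpt
      volumes:
        - name: ckpt
          persistentVolumeClaim:
            claimName: ckpt-pvc  # durable storage so checkpoints outlive any single Pod
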
Data Processing

The success of AI models depends critically on high-quality data. Data processing tasks (such as ETL, feature engineering, and annotation) are usually driven by batch computing or workflows. These tasks have short lifecycles, bursty resource requirements, complex dependencies, and demanding storage and data-access performance requirements.
  • Workflow orchestration and scheduling: supports mainstream cloud-native workflow engines such as Argo Workflows, making it easy to orchestrate complex data preprocessing pipelines as containerized tasks (a minimal example follows this list).
  • Efficient storage mounting: seamless integration with high-performance cloud storage (such as CFS Turbo and GooseFS) via CSI plugins enables fast, large-scale data access from containers.
  • Cost optimization: supports online-offline hybrid deployment to significantly reduce data processing costs.
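
A minimal Argo Workflows example of a two-step pipeline (step names and commands are placeholders) looks like this:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: etl-           # Argo appends a random suffix per run
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      steps:                   # each "- -" group runs after the previous one completes
        - - name: extract
            template: run
            arguments:
              parameters:
                - name: cmd
                  value: "echo extract"
        - - name: transform
            template: run
            arguments:
              parameters:
                - name: cmd
                  value: "echo transform"
    - name: run                # reusable container template
      inputs:
        parameters:
          - name: cmd
      container:
        image: alpine:3.20     # placeholder task image
        command: [sh, -c]
        args: ["{{inputs.parameters.cmd}}"]
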
Microservice

A microservices architecture is suitable for building complex applications: a monolithic application is broken down into multiple microservices along different dimensions, with each microservice packaged and managed as a Docker image. Without changing functionality, the application is split into multiple manageable services, each of which is easy to understand, develop, and maintain. Different microservices can be developed by different teams, each free to choose its own technologies and programming languages, and each service can be deployed and scaled independently.
  • Agile development and delivery: standardized delivery artifacts, combined with GitOps and CI/CD pipelines, let teams quickly build, test, and deploy applications and accelerate iteration cycles (a minimal per-service sketch follows this list).
  • End-to-end observability: a unified logging, monitoring, and alerting platform covering every level from application and container to cluster.
  • Stable and reliable: highly available deployment with backup and recovery support.
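
As a sketch of the "one image per service, deployed and scaled independently" idea (the service name and image are placeholders), each microservice is simply its own Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders               # one Deployment per microservice
spec:
  replicas: 3                # scaled independently of every other service
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
        - name: orders
          image: registry.example.com/orders:2.4.1   # this service's own image
          ports:
            - containerPort: 8080
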
Resources and documentation
Cluster Overview
TKE provides a container-centric solution based on native Kubernetes, resolving environment issues across development, testing, and operations to reduce costs and improve efficiency.
Purchase Guide
Tencent Kubernetes Engine (TKE) charges cluster management fees for managed clusters of different specifications. Other cloud resources created during use (such as CVM, CBS, and CLB) are billed according to the billing modes of the respective cloud services.
Native Node Overview
Native nodes are a new node type launched by TKE. Built on Tencent Cloud's experience operating containers at a scale of tens of millions of cores, they provide native, highly stable, and fast-responding Kubernetes node management.
Super Node Overview
Super nodes provide AZ-level node capabilities with custom specifications, similar to using a single very large CVM, making resource management and scaling easier.
Frequently Asked Questions

Which high-risk operations in TKE require attention?

See the TKE high-risk operations documentation for details.

How to create a container cluster?

To get started with Tencent Kubernetes Engine (TKE), see the quick start document for deploying TKE.

How should operation permissions be configured for the cloud resources involved in a cluster?

When using TKE, you may encounter multiple scenarios that require service authorization to access related Tencent Cloud resources. Each scenario usually corresponds to preset policies contained in different roles, mainly the TKE_QCSRole and IPAMDofTKE_QCSRole roles. See the role permission descriptions for service authorization for more information.

Are you ready to get started?
For more information on use cases and technical architecture, please contact our sales and technical support teams.