Issues in a cluster, such as abnormal node status and pod restarts, emerge one after another and are hard to predict. If users cannot detect these issues promptly, they miss the best window to handle them; by the time a problem has worsened enough to affect the business, it is often too late.
Event logs record comprehensive information about cluster status changes, helping users discover and troubleshoot problems promptly.
An event log is a type of resource object in Kubernetes and is usually used to record status changes within a cluster, ranging from node exceptions to pod startup and successful scheduling. You can run the 'kubectl describe' command to view the event log information of a resource.
An event log mainly contains the following fields:
Level (type): currently, only the Normal and Warning levels are supported. If necessary, you can customize a level.
Resource object (involvedObject): object involved in the event, such as a Pod, Deployment, or Node.
Event source (source): component that reports the event, such as the Scheduler and Kubelet.
Reason (reason): brief description of the current event. Generally, an enumerated value is used. This field is used within the program.
Message (message): detailed description of the current event.
Count (count): number of times the event has occurred.
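To illustrate how these fields fit together, the following is a minimal Python sketch that filters Warning-level events and tallies them by reason. The field names mirror the Kubernetes Event schema described above; the sample events themselves are hypothetical stand-ins, not output from a real cluster.

```python
from collections import Counter

# Hypothetical event records using the fields described above.
events = [
    {"type": "Warning", "involvedObject": "node/172.16.18.13",
     "source": "kubelet", "reason": "FreeDiskSpaceFailed",
     "message": "failed to garbage collect required amount of images", "count": 4},
    {"type": "Normal", "involvedObject": "pod/nginx-5dbf784b68-tq8rd",
     "source": "default-scheduler", "reason": "Scheduled",
     "message": "Successfully assigned pod to node", "count": 1},
    {"type": "Warning", "involvedObject": "node/172.16.18.13",
     "source": "kubelet", "reason": "EvictionThresholdMet",
     "message": "Attempting to reclaim ephemeral-storage", "count": 2},
]

# Keep only Warning-level events and tally them by reason,
# weighting each event by its occurrence count.
warning_counts = Counter()
for e in events:
    if e["type"] == "Warning":
        warning_counts[e["reason"]] += e["count"]

print(warning_counts.most_common())
# [('FreeDiskSpaceFailed', 4), ('EvictionThresholdMet', 2)]
```

This kind of type/reason aggregation is essentially what the visual dashboard computes for you at scale.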
CLS provides a one-stop service for Kubernetes event logs, including collection, storage, search, and analysis capabilities. You only need to enable the cluster event log feature with a few clicks to obtain a visual event log analysis dashboard out of the box. With visual charts, you can easily solve most common OPS problems via the console.
You have purchased the Tencent Kubernetes Engine (TKE) service and enabled the cluster event log feature. For more information, see the Operation Guide.
The query result is displayed. As shown in the following figure, an event is found in the Insufficient Node Disk Space area.
Check the exception event trend and top exception events.
As shown in the above figure, starting from 2020-11-25, the node 172.16.18.13 became abnormal due to insufficient disk space, after which Kubelet began to evict pods on the node to reclaim disk space.
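The "exception event trend" and "top exception events" charts mentioned above boil down to two simple aggregations over Warning events: counts per time interval, and counts per reason. A rough Python sketch of both (the timestamps and reasons are made up for illustration):

```python
from collections import Counter

# Hypothetical Warning events as (timestamp, reason) pairs,
# loosely resembling the disk-pressure scenario above.
warning_events = [
    ("2020-11-24 09:12:00", "ImageGCFailed"),
    ("2020-11-25 10:03:00", "FreeDiskSpaceFailed"),
    ("2020-11-25 10:05:00", "EvictionThresholdMet"),
    ("2020-11-25 10:06:00", "Evicted"),
]

# Exception event trend: number of Warning events per day.
trend = Counter(ts.split()[0] for ts, _ in warning_events)

# Top exception events: most frequent event reasons.
top = Counter(reason for _, reason in warning_events).most_common(3)

print(trend)
print(top)
```

A spike in the per-day trend (here, three Warning events on 2020-11-25 versus one the day before) is exactly the signal that would prompt the drill-down search described above.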
For clusters with node pool auto scaling enabled, the Cluster Autoscaler (CA) component will automatically increase or decrease the number of nodes in the cluster based on the actual load. If the nodes in the cluster are automatically scaled out, users can trace the entire scaling process through event search.
event.source.component : "cluster-autoscaler"
Select the field event.involvedObject.name to display, and sort the query results in reverse order by Log Time. The result is as shown in the following figure.
According to the event flow in the above figure, the node scale-out occurred at around 2020-11-25 20:35:45 and was triggered by three NGINX pods (nginx-5dbf784b68-tq8rd, nginx-5dbf784b68-fpvbx, and nginx-5dbf784b68-v9jv5). After three nodes were added, no further scale-out was triggered because the number of nodes in the node pool had reached the upper limit.
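The console view used above, filter by the cluster-autoscaler source and sort by log time in reverse order, can be sketched in a few lines of Python. The event records here are hypothetical stand-ins shaped like the fields used in the search:

```python
# Hypothetical event records; "source" and "reason" follow the naming
# used by the cluster-autoscaler component.
events = [
    {"time": "2020-11-25 20:35:45", "source": "cluster-autoscaler",
     "involvedObject": "pod/nginx-5dbf784b68-tq8rd", "reason": "TriggeredScaleUp"},
    {"time": "2020-11-25 20:37:10", "source": "kubelet",
     "involvedObject": "pod/nginx-5dbf784b68-tq8rd", "reason": "Started"},
    {"time": "2020-11-25 20:38:02", "source": "cluster-autoscaler",
     "involvedObject": "pod/nginx-5dbf784b68-v9jv5", "reason": "NotTriggerScaleUp"},
]

# Keep only autoscaler events and sort newest first
# (reverse order by log time); the fixed-width timestamp
# format makes string comparison equivalent to time order.
ca_events = sorted(
    (e for e in events if e["source"] == "cluster-autoscaler"),
    key=lambda e: e["time"],
    reverse=True,
)

for e in ca_events:
    print(e["time"], e["involvedObject"], e["reason"])
```

Reading the filtered list from bottom to top reconstructs the scaling story: a TriggeredScaleUp event marks the start of the scale-out, and a later NotTriggerScaleUp event indicates that no further nodes could be added.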