OOM-Guard is an add-on provided by TKE for handling container cgroup out-of-memory (OOM) events in user mode. Because the Linux kernel's cgroup OOM handling has many bugs, frequent cgroup OOM events often cause node failures (crashes, restarts, and abnormal processes that cannot be killed). When a cgroup OOM occurs, OOM-Guard kills the container that exceeded its limit in user space before the kernel kills the container process. This reduces the chance of kernel failures triggered by the code paths taken when the kernel fails to reclaim cgroup memory.
Before the OOM threshold is reached, OOM-Guard writes to memory.force_empty to trigger memory reclaim for the relevant cgroup. If memory.stat still shows sufficient cache after the reclaim, no further handling policy is triggered. After a container is killed due to cgroup OOM, the add-on reports the OomGuardKillContainer event to Kubernetes. You can query the event by running the kubectl get event command.
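The reclaim-then-check step described above can be illustrated with a minimal Python sketch. It is not OOM-Guard's actual implementation: the function names and the `min_cache_bytes` threshold are hypothetical, and the sketch assumes the cgroup v1 memory controller, where writing `0` to memory.force_empty requests a reclaim and memory.stat lists one `<key> <value>` pair per line.

```python
from pathlib import Path


def parse_memory_stat(text: str) -> dict:
    """Parse cgroup v1 memory.stat content ("<key> <value>" per line)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not a plain "<key> <non-negative integer>" pair.
        if len(fields) == 2 and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def reclaim_then_check_cache(cgroup_dir: Path, min_cache_bytes: int) -> bool:
    """Request a reclaim, then report whether memory.stat still shows enough cache.

    cgroup_dir is the container's directory under the memory cgroup mount.
    min_cache_bytes is a hypothetical threshold chosen for illustration only;
    the criterion OOM-Guard really applies is not documented here.
    """
    # Writing "0" to memory.force_empty asks the kernel to reclaim pages.
    (cgroup_dir / "memory.force_empty").write_text("0\n")
    stats = parse_memory_stat((cgroup_dir / "memory.stat").read_text())
    # True means "sufficient cache remains, so no further policy is needed".
    return stats.get("cache", 0) >= min_cache_bytes
```

A caller would pass the container's cgroup directory, for example a path below /sys/fs/cgroup/memory, and would move on to a later policy such as killing the container only when the function returns `False`.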
Kubernetes Object Name | Type | Default Resource Consumption | Namespaces |
---|---|---|---|
oomguard | ServiceAccount | - | kube-system |
system:oomguard | ClusterRoleBinding | - | - |
oom-guard | DaemonSet | 0.02-core CPU and 120-MB memory | kube-system |
This add-on is suitable for Kubernetes clusters where node memory pressure is high and OOM in business containers often causes node failures.
/run/docker/containerd/docker-containerd.sock
/run/containerd/containerd.sock
The /sys/fs/cgroup/memory mount target is retained.