OOM-Guard is an add-on provided by TKE that handles container cgroup out-of-memory (OOM) events in user space. The Linux kernel's cgroup OOM handling has many bugs, so frequent cgroup OOMs often cause node failures (crashes, restarts, and abnormal processes that cannot be killed). When a cgroup OOM occurs, OOM-Guard kills the limit-exceeding container in user space before the kernel kills the container process. This reduces the chance of reaching the kernel code branches that handle failed cgroup memory reclaim, which can trigger various kernel failures.
Before the OOM threshold is triggered, OOM-Guard writes to memory.force_empty to trigger memory reclaim for the related cgroup. After the reclaim, if memory.stat shows that sufficient cache remains, no subsequent processing policies are triggered. After a container is killed due to cgroup OOM, the add-on reports the OomGuardKillContainer event to Kubernetes. You can query the event by running the kubectl get event command.
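The reclaim-then-check step described above can be sketched roughly as follows. This is only an illustration against the cgroup v1 memory files (memory.usage_in_bytes, memory.limit_in_bytes, memory.force_empty, memory.stat), not the add-on's actual implementation. The two threshold constants and all function names are hypothetical, since the add-on's real values are not stated here.

```python
from pathlib import Path

# Hypothetical tuning values (assumptions for illustration only).
USAGE_RATIO_THRESHOLD = 0.95          # "close to the limit" ratio
MIN_CACHE_BYTES = 64 * 1024 * 1024    # cache considered "sufficient"


def read_int(path: Path) -> int:
    """Read a single integer from a cgroup control file."""
    return int(path.read_text().strip())


def parse_memory_stat(text: str) -> dict:
    """Parse 'key value' lines from memory.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def near_limit(cgroup_dir: Path, ratio: float = USAGE_RATIO_THRESHOLD) -> bool:
    """Return True if the cgroup's usage is at or above ratio * limit."""
    usage = read_int(cgroup_dir / "memory.usage_in_bytes")
    limit = read_int(cgroup_dir / "memory.limit_in_bytes")
    return limit > 0 and usage / limit >= ratio


def reclaim_and_check(cgroup_dir: Path, min_cache: int = MIN_CACHE_BYTES) -> bool:
    """Ask the kernel to reclaim, then report whether enough cache remains.

    Returns True if memory.stat still shows at least min_cache bytes of
    cache, in which case no further handling would be needed.
    """
    # In cgroup v1, writing to memory.force_empty triggers reclaim.
    (cgroup_dir / "memory.force_empty").write_text("0\n")
    stats = parse_memory_stat((cgroup_dir / "memory.stat").read_text())
    return stats.get("cache", 0) >= min_cache
```

A caller would typically run near_limit first and call reclaim_and_check only for cgroups that are close to their limit. What happens when the check returns False (for example, killing the container) is left out of this sketch.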
|Kubernetes Object Name|Type|Default Resource Consumption|Namespace|
|---|---|---|---|
|oom-guard|DaemonSet|0.02 CPU cores and 120 MB memory|kube-system|
This add-on is suitable for Kubernetes clusters where node memory pressure is high and OOMs in business containers often cause node failures.
The /sys/fs/cgroup/memory mount target is retained.
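One way to check this requirement on a node is to look for a cgroup v1 memory controller mounted at that path in /proc/mounts. The sketch below assumes the usual /proc/mounts field layout (device, mount point, filesystem type, options). The function name is hypothetical and is not part of the add-on.

```python
def cgroup_memory_mounted(mounts_text: str,
                          target: str = "/sys/fs/cgroup/memory") -> bool:
    """Return True if a cgroup v1 memory controller is mounted at target."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, ...
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if (mount_point == target and fs_type == "cgroup"
                and "memory" in options.split(",")):
            return True
    return False
```

On a node, the text of /proc/mounts can be read and passed to this function. A False result suggests that the memory cgroup is not mounted at the expected path.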