Pod Remains in ContainerCreating or Waiting

Last updated: 2020-05-25 09:34:49

    This article describes the causes that will lead a Pod to become stuck in the ContainerCreating or Waiting status and how to troubleshoot this issue. Refer to the following instructions to troubleshoot and solve these issues.

    Possible Causes

    • Incorrect Pod configurations
    • Volume failed to mount
    • Insufficient disk space
    • Node memory fragmentation
    • The value of Limit is too small or uses the wrong unit
    • Failure to pull the image
    • CNI network error
    • controller-manager exception
    • New Docker installed without completely uninstalling the old version
    • Duplicate container names

    Troubleshooting

    Checking Pod configuration

    1. Make sure the image is properly packaged.
    2. Make sure the container parameters are configured correctly.

    Checking volume mounting

    In the following two scenarios, volume mounting issues may cause exceptions:

    1. A volume fails to unmount due to Pod float

    Analysis
    The default volume in a managed Kubernetes cluster is usually a storage class cloud disk. If a node malfunctions and causes kubelet to fail or not be able to communicate with apiserver and the time threshold is reached, the Pods on the node are drained and backup Pods on another node are automatically started. This is called Pod floating. Drained Pods cannot function properly nor are they aware of their states. Therefore, the volume mounted to the node is not properly unmounted.
    cloud-controller-manager requires the volume to unmount properly in order to invoke vendor APIs to unmount disks from the node. Pod floating causes cloud-controller-manager to force unmount a volume after the time threshold is reached and mount it to the node where the Pod is scheduled.

    Impact
    The Pod may spend an extended period of time in ContainerCreating but will launch successfully.

    2. Hitting a subpath bug when mounting configmap/secret

    If you modify the content of configmap or secret that is already mounted and the container restarts in place, such as restarting after being killed for failing a liveness check, this bug is triggered.

    In this case, the container continuously fails to launch. The error messages are as follows:

    $ kubectl -n prod get pod -o yaml manage-5bd487cf9d-bqmvm
    ...
    lastState: terminated
    containerID: containerd://e6746201faa1dfe7f3251b8c30d59ebf613d99715f3b800740e587e681d2a903
    exitCode: 128
    finishedAt: 2019-09-15T00:47:22Z
    message: 'failed to create containerd task: OCI runtime create failed: container_linux.go:345:
    starting container process caused "process_linux.go:424: container init
    caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/211d53f4-d08c-11e9-b0a7-b6655eaf02a6/volume-subpaths/manage-config-volume/manage/0\\\"
    to rootfs \\\"/run/containerd/io.containerd.runtime.v1.linux/k8s.io/e6746201faa1dfe7f3251b8c30d59ebf613d99715f3b800740e587e681d2a903/rootfs\\\"
    at \\\"/run/containerd/io.containerd.runtime.v1.linux/k8s.io/e6746201faa1dfe7f3251b8c30d59ebf613d99715f3b800740e587e681d2a903/rootfs/app/resources/application.properties\\\"
    caused \\\"no such file or directory\\\"\"": unknown'

    For more information on how to resolve this issue, see pr82784.

    Checking for insufficient disk space

    A Pod uses the CRI APIs to create containers when it launches. This usually involves creating directories and files for the new containers under the data directory. If there is not enough disk space, container creation will fail with the following error messages:

    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedCreatePodSandBox 2m (x4307 over 16h) kubelet, 10.179.80.31 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "apigateway-6dc48bf8b6-l8xrw": Error response from daemon: mkdir /var/lib/docker/aufs/mnt/1f09d6c1c9f24e8daaea5bf33a4230de7dbc758e3b22785e8ee21e3e3d921214-init: no space left on device

    For more information and further instructions, see Disk Full.

    Checking for node memory fragmentation

    If node memory is severely fragmented or lacks large page memory, requests for more memory will fail even though there is plenty of memory left. For instructions on troubleshooting and solutions, refer to Memory Fragmentation.

    Checking for limit configuration

    Error description

    • Run kubectl describe pod and get the following message:
      Pod sandbox changed, it will be killed and re-created.
    • kubelet outputs the following error message:
      to start sandbox container for pod ... Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"signal: killed\"": unknown

    Solution

    If the value of limit is too small, Sandbox will fail to run. This will cause the Pod to remain in the ContainerCreating or Waiting status. This is usually a memory limit unit issue.

    For example, if you used m as the memory limit unit, then Kubernetes reads it as byte. The correct unit to use is Mi or M. If you set a memory limit to 1024m, that translates to 1.024 bytes, which causes a container to be killed by cgroup-oom every it attempts to launch. This results in the Pod remaining in the ContainerCreating state.

    Checking for image pull failures

    The failure to pull an image produces the same issue. There are many reasons why image pull may fail. The common ones are as follows:

    • The wrong image is used
    • kubelet cannot access the image registry. For example, images hosted on gcr.io are more difficult to access from Mainland China.
    • imagePullSecret is not present or is incorrect when pulling private images.
    • Image pull times out due to the size of the image. Adjust the value of --runtime-request-timeout and --image-pull-progress-deadline.

    Pull the image again after checking the above items and check the state of the Pod.

    Checking for CNI errors

    Make sure that CNI is configured and running properly. If not, you get the following messages:

    • Cannot configure Pod network
    • Cannot assign Pod IP address

    Checking for controller-manager issues

    Make sure the Master kube-controller-manager is running properly. Restart it if it is not.

    Checking for existing Docker versions

    If the node already has Docker installed or installed Docker without completely uninstalling the old Docker, a Pod may encounter the same issue.
    For example, if you have installed Docker multiple times using the following command in CentOS:

    yum install -y docker

    Due to the incompatibility issue among components of different versions, dockerd continuously fails to create containers. This results in the Pod remaining in the ContainerCreating status. Use kubectl describe pod and get the following error messages:

    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedCreatePodSandBox 18m (x3583 over 83m) kubelet, 192.168.4.5 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "nginx-7db9fccd9b-2j6dh": Error response from daemon: ttrpc: client shutting down: read unix @->@/containerd-shim/moby/de2bfeefc999af42783115acca62745e6798981dff75f4148fae8c086668f667/shim.sock: read: connection reset by peer: unknown
    Normal SandboxChanged 3m12s (x4420 over 83m) kubelet, 192.168.4.5 Pod sandbox changed, it will be killed and re-created.

    Choose a Docker version to keep and completely uninstall the other versions.

    Checking for duplicate container names

    Duplicate container names on the same node cause sandbox creation failures, which leads to Pods remaining in the ContainerCreating and Waiting statuses.
    Run kubectl describe pod and get the following error messages:

    Warning FailedCreatePodSandBox 2m kubelet, 10.205.8.91 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "lomp-ext-d8c8b8c46-4v8tl": operation timeout: context deadline exceeded
    Warning FailedCreatePodSandBox 3s (x12 over 2m) kubelet, 10.205.8.91 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "lomp-ext-d8c8b8c46-4v8tl": Error response from daemon: Conflict. The container name "/k8s_POD_lomp-ext-d8c8b8c46-4v8tl_default_65046a06-f795-11e9-9bb6-b67fb7a70bad_0" is already in use by container "30aa3f5847e0ce89e9d411e76783ba14accba7eb7743e605a10a9a862a72c1e2". You have to remove (or rename) that container to be able to reuse that name.

    Change container names and make sure there are no duplicate container names on the same node.

    Was this page helpful?

    Was this page helpful?

    • Not at all
    • Not very helpful
    • Somewhat helpful
    • Very helpful
    • Extremely helpful
    Send Feedback
    Help