Kubernetes pod stuck in ContainerCreating: volume, network, and image issues

A pod stuck in ContainerCreating produces no container logs and never reaches its readiness checks. The kubelet has accepted the spec but is blocked during initialization, after scheduling and before the container runtime starts the user process. The dominant failure domains are volume mount deadlocks, CNI sandbox creation failures, and image pull problems; all three surface as the same status but need different fixes. This guide shows how to identify the stuck subsystem and resolve it.

What this means

ContainerCreating is the phase where the kubelet pulls the image, creates the pod sandbox via CRI, attaches and mounts volumes, injects ConfigMaps and Secrets, and starts the container. These steps run largely synchronously in the pod worker. If a mount hangs, a CNI plugin errors, or the registry rejects the pull, the worker blocks and the pod stays in ContainerCreating. The node condition can remain Ready because the kubelet sync loop and PLEG are still healthy, so cluster monitoring may miss the problem. Identifying whether the block is in storage, network, or images is the first step.
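
A quick way to confirm which containers are still waiting and why is to read the waiting reason straight from the pod status. The jsonpath below is one way to do it; for a pod in this phase the reason is typically ContainerCreating, PodInitializing, or an image pull error.

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'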

Common causes

Volume mount deadlock
  What it looks like: Pod stuck for more than 10 minutes; kubectl describe shows MountVolume.SetUp or AttachVolume.Attach events with old timestamps; other pods with volumes on the same node may also stall.
  First thing to check: VolumeAttachment objects and the CSI driver pods on the node.

CNI plugin failure
  What it looks like: FailedCreatePodSandBox events referencing CNI or network plugin errors; existing pods on the node run fine but new pods cannot start.
  First thing to check: CNI DaemonSet pod health and /etc/cni/net.d/ contents on the node.

Image pull failure
  What it looks like: Failed to pull image or registry auth errors; the pod may move to ImagePullBackOff after the initial ContainerCreating.
  First thing to check: Node disk space, registry credentials, and image tag validity.

Stale VolumeAttachment
  What it looks like: Multi-attach error for a ReadWriteOnce volume still bound to a previous node after a crash or force delete.
  First thing to check: VolumeAttachment .spec.nodeName versus the current scheduled node.

CSI driver unavailability
  What it looks like: The CSI driver pod on the node is not running; AttachVolume.Attach failed events appear.
  First thing to check: CSI driver pods in the driver namespace.

Node disk pressure
  What it looks like: The DiskPressure condition prevents image pulls and may defer container creation; image garbage collection cannot keep up.
  First thing to check: Node filesystem usage on nodefs and imagefs.

Quick checks

Check pod events to locate the failure domain. kubectl describe pod surfaces whether the block is in volumes, network, or image pull.

kubectl describe pod <pod-name> -n <namespace>

List events chronologically. This surfaces transient errors that may have scrolled out of kubectl describe pod.

kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.lastTimestamp'

Check for stuck volume attachments. A ReadWriteOnce volume still attached to a previous node blocks rescheduling indefinitely.

kubectl get volumeattachment -o wide

Verify CNI pod health on the specific node. A crashed or evicted CNI DaemonSet pod prevents sandbox creation.

kubectl get pods -n kube-system -l app=<cni-label> --field-selector spec.nodeName=<node-name>

Inspect CNI configuration and binaries on the node. Missing or corrupted files in these directories prevent sandbox creation.

ls -la /etc/cni/net.d/
ls -la /opt/cni/bin/

Test container runtime responsiveness. If the runtime socket is slow or hung, all pod operations stall.

crictl --runtime-endpoint unix:///run/containerd/containerd.sock info

Check disk space for image pulls and volume storage. Image extraction fails silently when imagefs or nodefs is full.

df -h /var/lib/kubelet /var/lib/containerd
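
Inode exhaustion on the same filesystems can block image extraction even when space looks free, so checking inodes is a useful follow-up.

df -i /var/lib/kubelet /var/lib/containerd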

Look for hung mounts involving kubelet paths. A stuck NFS or CSI mount blocks the volume manager goroutine.

mount | grep -E "/var/lib/kubelet/pods"
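
If a mount listed there looks suspect, a time-bounded stat distinguishes a hung NFS or CSI mount from a merely slow one; the pod UID in the path is a placeholder to fill in from the pod metadata. If the command times out, the mount is hung.

timeout 5 stat -f /var/lib/kubelet/pods/<pod-uid>/volumes/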

Check kubelet logs for volume, network, or image errors. Filter recent logs to correlate errors with the pod creation time.

journalctl -u kubelet --since "10 minutes ago" | grep -iE "sandbox|volume|mount|image|pull"

Check node conditions for resource pressure. Pressure conditions cause the kubelet to defer container creation without surfacing an explicit pod event.

kubectl describe node <node-name> | grep -E "MemoryPressure|DiskPressure|PIDPressure"

How to diagnose it

  1. Inspect pod events to identify the blocking phase. kubectl describe pod Events show FailedMount or AttachVolume for storage, FailedCreatePodSandBox for CNI, and Failed to pull image for registry or disk issues. No recent events means the operation is hung rather than failed, or an init container is blocking startup.

  2. If the events point to volumes, check for hung or stale attachments. Run kubectl get volumeattachment and compare .spec.nodeName to your pod’s scheduled node (a command sketch follows this list). A ReadWriteOnce volume still attached to a previous node after a crash blocks the new pod, and a force-deleted node can leave the attachment behind indefinitely. Check mount | grep kubelet on the node; a mount pending for minutes means the volume manager goroutine may be blocked. Verify that the CSI driver pod on the node is running and not restarting.

  3. If the events point to network, verify CNI configuration and pod health. A missing or corrupted file in /etc/cni/net.d/ or a missing binary in /opt/cni/bin/ prevents sandbox creation. Check that the CNI DaemonSet pod on the node is running and not OOMKilled. Run crictl pods to see if sandboxes are being created at all. If sandbox creation errors are increasing, the CNI plugin is the failure domain, not the kubelet. Existing pods continue running because CNI only runs during setup.

  4. If the events point to images, test pulls and verify disk space. Run crictl pull <image> directly on the node to confirm registry reachability and authentication. Check df -h on imagefs and nodefs. If disk usage is above the image garbage collection high threshold, pulls may fail because there is no space to extract layers. Verify the pod’s imagePullSecrets and that the tag exists in the registry. Image pulls are serialized by default, so one slow pull blocks all subsequent pulls on the node.

  5. Check for node resource pressure that defers creation. A node with DiskPressure, MemoryPressure, or PIDPressure may silently delay new container creation. kubectl describe node shows active conditions. If pressure exists, the kubelet is protecting the node by refusing to start new workloads until resources are freed. This can look like a ContainerCreating hang with no error events.

  6. Distinguish ContainerCreating from init container delays. If kubectl get pod shows Init:0/1, the main containers are waiting for an init container to finish. This is not a volume, network, or image failure in the main container. Check init container statuses separately before investigating the main startup path.
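
As a sketch of the volume check in step 2, the commands below read the node the pod was scheduled on and list attachments with their bound node, so a stale ReadWriteOnce attachment stands out as a mismatch. The custom column names are illustrative.

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeName}'
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached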

Metrics and signals to monitor

storage_operation_duration_seconds
  Why it matters: Volume attach and mount latency directly block pod startup.
  Warning sign: Operations exceeding 2 minutes or an increasing p99.

kubelet_runtime_operations_errors_total{operation_type="run_podsandbox"}
  Why it matters: CNI failures surface as sandbox creation errors.
  Warning sign: Error rate above zero sustained for more than one minute.

kubelet_image_pull_duration_seconds
  Why it matters: Slow or failing pulls delay container creation.
  Warning sign: p99 pull time trending above baseline or timeouts appearing.

kubelet_pod_start_duration_seconds
  Why it matters: End-to-end time from spec to running.
  Warning sign: p99 startup latency above 30 seconds for cached images.

Node DiskPressure condition
  Why it matters: Prevents image extraction and can defer all container creation.
  Warning sign: Condition True or nodefs usage above 80 percent.

kubelet_sync_loop_duration_seconds
  Why it matters: A stressed kubelet falls behind on reconciliation.
  Warning sign: Duration trending above 10 seconds on typical nodes.

kubelet_pleg_relist_duration_seconds
  Why it matters: Distinguishes node-level runtime slowness from individual pod issues.
  Warning sign: p99 above 5 seconds indicates runtime stress.

Running vs desired pod count
  Why it matters: A gap means the kubelet cannot start some pods.
  Warning sign: Sustained gap on a node with Ready=True.
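
Most of these signals come from the kubelet metrics endpoint. One way to sample them ad hoc, assuming you have permission to proxy to the node through the API server:

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep -E "storage_operation_duration_seconds|kubelet_runtime_operations_errors_total|kubelet_pod_start_duration_seconds"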

Fixes

If the cause is a volume mount deadlock

Identify the stuck volume via kubectl describe pod. If the volume is still attached to a previous node, scale the owning StatefulSet or Deployment to zero replicas and wait for the controller to detach it. If the VolumeAttachment object persists after the workload is scaled down, deleting it may be necessary. Warning: only delete a VolumeAttachment when you are certain the volume is not mounted on the original node; deleting it while the volume is still in use risks data corruption from a multi-node mount. For CSI drivers, restart the driver pod on the affected node if it is crash-looping. Cordon the node if multiple pods are affected.
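
A typical remediation sequence, assuming a Deployment owns the stuck pod and the attachment has been confirmed stale, might look like the following; the names are placeholders.

kubectl scale deployment <deployment-name> -n <namespace> --replicas=0
kubectl get volumeattachment | grep <pv-name>
# delete only after confirming the volume is not mounted on the original node
kubectl delete volumeattachment <attachment-name>
kubectl scale deployment <deployment-name> -n <namespace> --replicas=1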

If the cause is a CNI failure

Check the CNI DaemonSet pod on the node. If it is missing or failed, deleting the pod forces recreation. Verify that /etc/cni/net.d/ contains a valid .conflist file and that /opt/cni/bin/ contains the required binaries. If the configuration is corrupted, restore it from a known-good source or redeploy the CNI DaemonSet. For IP exhaustion, check the IPAM allocation and release unused IPs if the plugin supports it.
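
For example, to recreate the node-local CNI pod or roll the whole DaemonSet; the label and DaemonSet name depend on which CNI plugin you run.

kubectl delete pod -n kube-system -l app=<cni-label> --field-selector spec.nodeName=<node-name>
kubectl rollout restart daemonset <cni-daemonset> -n kube-system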

If the cause is an image pull failure

Free disk space on nodefs and imagefs if DiskPressure is active. Manually removing unused images with crictl rmi frees space but forces subsequent pods to re-pull, so prefer letting kubelet image garbage collection run first. Verify imagePullSecrets are attached to the pod or its service account. Re-create the pod to reset the exponential backoff if the registry issue was transient. If the image tag is invalid, fix the deployment spec. Test the pull directly on the node with crictl pull to confirm resolution.
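
A sketch of confirming and clearing an image-side failure on the node; prefer letting kubelet garbage collection reclaim space before pruning manually, and note that deleting the pod resets the pull backoff once its controller recreates it.

crictl --runtime-endpoint unix:///run/containerd/containerd.sock pull <image>
crictl rmi --prune
kubectl delete pod <pod-name> -n <namespace>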

If the cause is node resource pressure

Evict non-critical pods to relieve memory or PID pressure. For disk pressure, identify large log directories under /var/log/pods/ or orphaned data under /var/lib/kubelet/pods/, and clean them safely. Expand the node filesystem or add a dedicated imagefs partition if the root cause is capacity. Do not restart kubelet as a first response. Resolving the resource constraint allows the kubelet to resume normal operation.
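
To see what is consuming space before cleaning, for example:

du -sh /var/log/pods/* 2>/dev/null | sort -h | tail -n 10
du -sh /var/lib/kubelet/pods/* 2>/dev/null | sort -h | tail -n 10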

Prevention

  • Alert on storage_operation_duration_seconds p99 above 60 seconds to catch volume hangs before they deadlock the volume manager.
  • Monitor CNI DaemonSet health per node, not just cluster-wide, because CNI is a node-local dependency.
  • Track imagefs and nodefs usage trends separately. Inode exhaustion is as damaging as space exhaustion.
  • Set containerLogMaxSize and containerLogMaxFiles in the kubelet configuration to prevent log-driven disk pressure (see the snippet after this list).
  • Use resource requests and limits to avoid node pressure that silently defers pod creation.
  • Validate VolumeAttachment cleanup in your node termination runbook. Force-deleted nodes leave orphaned attachments that block rescheduling.
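
A minimal KubeletConfiguration fragment for the log rotation settings mentioned above; the values are illustrative and should be sized to your nodes.

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "50Mi"
containerLogMaxFiles: 5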

How Netdata helps

  • Correlate node disk pressure and imagefs usage with image pull failures in the same time window.
  • Surface kubelet sync loop duration and PLEG relist latency to distinguish a saturated kubelet from a stuck volume or network operation.
  • Track container runtime operation errors per type to spot CNI sandbox failures without log diving.
  • Monitor node filesystem usage and inodes to predict disk pressure before it blocks pod creation.