Kubernetes init container fails: blocking main container start

An init container failure blocks the entire pod. Until every init container exits zero, no main container starts. In automated clusters, one failing init container can stall a deployment while pods sit in PodInitializing. This guide covers init container retry mechanics, how to read pod status to find the failing step, and how to distinguish application bugs, resource limits, and kubelet issues that leave pods permanently stuck.

What this means

Classic init containers run sequentially before app containers start. The pod’s restartPolicy governs kubelet retry behavior:

  • Always: kubelet restarts a failed init container with increasing backoff (shown as CrashLoopBackOff) until it succeeds.
  • Never: kubelet does not retry. The pod transitions to Failed.
  • OnFailure: kubelet restarts the init container after a non-zero exit; exit code zero counts as success.

A pod running init containers shows Init:N/M, where N is the number completed and M is the total. The container at index N (0-indexed) is the one currently running or being retried; if it fails, all subsequent init and main containers are blocked. Classic init containers do not support probes (livenessProbe, readinessProbe, startupProbe); adding them causes a validation error.
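
As a minimal sketch of the Init:N/M reading described above, the following Python helper parses the STATUS string printed by kubectl get pod. The name parse_init_progress is hypothetical, and the parser assumes the exact Init:N/M form; any other value, such as Init:CrashLoopBackOff, returns None.

```python
import re

# STATUS values such as "Init:1/3", as printed by `kubectl get pod`.
INIT_PROGRESS = re.compile(r"^Init:(\d+)/(\d+)$")


def parse_init_progress(status):
    """Parse 'Init:N/M'; return None for any other status string.

    Hypothetical helper for illustration, not part of kubectl.
    """
    match = INIT_PROGRESS.match(status.strip())
    if match is None:
        return None
    completed, total = int(match.group(1)), int(match.group(2))
    return {
        "completed": completed,
        "total": total,
        # N containers are done, so the 0-based index N is the one to inspect.
        "current_index": completed,
    }


progress = parse_init_progress("Init:1/3")
print(progress["current_index"])  # prints 1
```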

Sidecar containers (stable since Kubernetes v1.33) are init containers with restartPolicy: Always. They continue running for the pod’s lifetime and support probes.
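
The distinction between classic init containers and sidecars can be checked mechanically before a manifest reaches the API server. The sketch below assumes the pod spec has already been loaded into a Python dict (for example from kubectl get pod -o json); the function names and the sample container names are hypothetical.

```python
PROBE_FIELDS = ("livenessProbe", "readinessProbe", "startupProbe")


def is_sidecar(init_container):
    """A sidecar is an init container with container-level restartPolicy: Always."""
    return init_container.get("restartPolicy") == "Always"


def classic_init_containers_with_probes(pod_spec):
    """List (name, probe fields) for classic init containers that declare probes."""
    problems = []
    for container in pod_spec.get("initContainers", []):
        if is_sidecar(container):
            continue  # sidecars may declare probes
        declared = [field for field in PROBE_FIELDS if field in container]
        if declared:
            problems.append((container.get("name"), declared))
    return problems


# Hypothetical spec: one sidecar with a probe, one classic init container with a probe.
sample_spec = {
    "initContainers": [
        {"name": "log-shipper", "restartPolicy": "Always", "startupProbe": {}},
        {"name": "wait-for-db", "readinessProbe": {}},
    ]
}
print(classic_init_containers_with_probes(sample_spec))
```

Only wait-for-db is reported here, because it is the classic init container carrying a probe field.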

Known kubelet bugs can leave a pod in PodInitializing after a terminal failure:

  • Kubernetes 1.23+: an init container OOMKilled with pod restartPolicy: Never leaves the pod stuck (GitHub issue #116676).
  • Kubernetes 1.30.0 with SidecarContainers enabled: a SyncPod RunContainerError during init startup can leave the container in CONTAINER_CREATED with no retry (GitHub issue #126440).
  • Kubernetes 1.35: crashed or OOMKilled sidecars with restartPolicy: Always may fail to restart (GitHub issue #136910).

Per-container restartPolicy (ContainerRestartRules) is an alpha feature in Kubernetes v1.34, requiring the ContainerRestartRules feature gate. It is not production-ready.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Image pull failure | Init:ImagePullBackOff or Init:ErrImagePull | kubectl describe pod events for image errors |
| OOMKilled or resource pressure | Last state Terminated, reason OOMKilled, exit code 137 | Node memory pressure and init container memory limits |
| Command or script error | Waiting with reason CrashLoopBackOff, non-zero exit | kubectl logs -c <init-container-name> |
| Network or DNS timeout | Long Init duration, non-zero exit, timeouts in logs | DNS resolution and network policies from the node |
| Sidecar probe race (bug #132826) | Sidecar startupProbe fails first attempt, never ready | Kubelet version and sidecar probe config |
| OOMKilled + restartPolicy: Never stuck state (bug #116676) | Pod stays PodInitializing, exit state 0, reason OOMKilled | Init container memory limits and kubelet version |
| SyncPod RunContainerError with SidecarContainers (bug #126440) | Init container stuck in CONTAINER_CREATED, no Started event | Kubelet logs for RunContainerError |

Quick checks

# Pod phase and init status
kubectl get pod <pod-name> -n <namespace>

# Per-init-container state, exit code, and restart count
kubectl describe pod <pod-name> -n <namespace>

# Logs from the failing init container
kubectl logs <pod-name> -c <init-container-name> -n <namespace>

# Previous run logs after a restart
kubectl logs <pod-name> -c <init-container-name> --previous -n <namespace>

# Programmatic init container statuses
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.initContainerStatuses}'

# Recent events for image pull or scheduling issues
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>

# Node memory pressure
kubectl describe node <node-name> | grep -A 5 "Conditions:"

Expected progression: Init:0/3 -> Init:1/3 -> Init:2/3 -> PodInitializing -> Running. Stuck states: Init:1/3 for minutes, or Init:CrashLoopBackOff.
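
The same information can be read programmatically. As a rough sketch, the helper below walks status.initContainerStatuses from the output of kubectl get pod <pod-name> -n <namespace> -o json and returns the first init container that has not exited zero. The function name and the sample data are hypothetical, and the sketch assumes classic init containers only (a running sidecar would be reported as incomplete).

```python
def first_incomplete_init(pod):
    """Return details of the first init container that has not exited 0, or None."""
    statuses = pod.get("status", {}).get("initContainerStatuses", [])
    for index, status in enumerate(statuses):
        state = status.get("state", {})
        terminated = state.get("terminated")
        if terminated is not None and terminated.get("exitCode") == 0:
            continue  # this init container completed successfully
        # A container state holds one of: waiting, running, terminated.
        phase = next(iter(state), "unknown")
        detail = state.get(phase, {})
        last = status.get("lastState", {}).get("terminated", {})
        return {
            "index": index,
            "name": status.get("name"),
            "state": phase,
            "reason": detail.get("reason"),
            # Fall back to the previous run when the container is waiting to restart.
            "exit_code": detail.get("exitCode", last.get("exitCode")),
            "restarts": status.get("restartCount", 0),
        }
    return None


# Hypothetical pod status for illustration.
sample_pod = {
    "status": {
        "initContainerStatuses": [
            {"name": "fetch-config", "restartCount": 0,
             "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
            {"name": "run-migrations", "restartCount": 4,
             "state": {"waiting": {"reason": "CrashLoopBackOff"}},
             "lastState": {"terminated": {"exitCode": 1, "reason": "Error"}}},
        ]
    }
}
print(first_incomplete_init(sample_pod))
```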

How to diagnose it

flowchart TD
    A["Pod stuck in PodInitializing"] --> B{"kubectl get pod<br/>Init:N/M?"}
    B -->|Yes| C["kubectl describe pod<br/>check Reason/Exit Code"]
    C --> D{"Exit Code 137?"}
    D -->|Yes| E["Check node memory pressure<br/>and container limits"]
    D -->|No| F{"ImagePullBackOff?"}
    F -->|Yes| G["Verify image, tag, registry<br/>and pull secrets"]
    F -->|No| H["kubectl logs -c init-container<br/>check application error"]
    E --> I{"restartPolicy: Never<br/>and OOMKilled?"}
    I -->|Yes| J["Delete pod; increase limits<br/>bug #116676 workaround"]
    I -->|No| K["Increase limit or move pod"]
    H --> L{"Sidecar with startupProbe?"}
    L -->|Yes| M["Check kubelet version<br/>bug #132826"]
    L -->|No| N["Fix script/image<br/>and recreate pod"]

  1. Find the failing init container. kubectl get pod shows Init:N/M. The container at index N (0-indexed) has not completed.
  2. Check termination reason and exit code. kubectl describe pod shows Waiting/CrashLoopBackOff or Terminated with Error/OOMKilled. Record the exit code.
  3. Read logs. kubectl logs <pod> -c <init-container>. Add --previous if the container restarted.
  4. Check for image pull failures. ImagePullBackOff blocks the pod indefinitely. Verify the tag, registry credentials, and node connectivity.
  5. Check for OOM or resource pressure. Exit code 137 (SIGKILL) usually means OOM. Compare the init container memory limit to its observed peak usage. If the node reports MemoryPressure, kubelet may be evicting pods.
  6. Evaluate sidecar bugs. If the pod uses a sidecar with a startupProbe and the kubelet matches bug #132826, check if the failure is limited to the first attempt. If an init container is stuck in CONTAINER_CREATED on v1.30.0 with SidecarContainers enabled, search kubelet logs for RunContainerError.
  7. Determine retry behavior from restartPolicy. With restartPolicy: Never and a non-zero exit, the pod should be Failed. If it stays PodInitializing, you may have hit bug #116676 or #126440. Recovery usually requires deleting the pod.
  8. Check activeDeadlineSeconds. If set, the pod fails after that duration regardless of init state.
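
Steps 2 through 5 can be approximated in code. The sketch below takes one entry of status.initContainerStatuses (as returned by kubectl get pod -o json) and maps it to a likely cause. The function name and the category strings are hypothetical, and treating exit code 137 as OOM is a heuristic, not a certainty; a network or DNS timeout shows up as an ordinary non-zero exit and still needs the logs to confirm.

```python
IMAGE_PULL_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def classify_init_failure(status):
    """Map one initContainerStatuses entry to a likely cause (heuristic)."""
    state = status.get("state", {})
    waiting = state.get("waiting", {})
    if waiting.get("reason") in IMAGE_PULL_REASONS:
        return "image pull failure"

    # Use the current termination if present, else the previous run.
    terminated = (
        state.get("terminated")
        or status.get("lastState", {}).get("terminated")
        or {}
    )
    exit_code = terminated.get("exitCode")
    if terminated.get("reason") == "OOMKilled" or exit_code == 137:
        # Assumption: 137 (SIGKILL) usually, but not always, means OOM.
        return "OOMKilled or resource pressure"
    if exit_code not in (None, 0):
        return "command or script error"
    return "undetermined"


# Hypothetical status entry: restarting after an OOM kill.
example = {
    "name": "warm-cache",
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
}
print(classify_init_failure(example))  # prints OOMKilled or resource pressure
```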

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Pod phase Pending, kubectl status Init:N/M | Incomplete init sequence | Duration in init state exceeds expected startup |
| Init container restart count | Retry loops under Always or OnFailure | Restart count increasing steadily |
| Init container exit code and reason | Distinguishes OOM, error, or success | Exit code 137 or non-zero application exit |
| Node memory pressure / OOM kills | Init containers share node cgroups; OOM kills block startup | Node MemoryPressure=True or container OOMKilled |
| Image pull latency and errors | ImagePullBackOff on an init container blocks the pod | ImagePullBackOff or ErrImagePull events |
| DNS resolution latency from node | Init containers that depend on external services stall on DNS | DNS lookup timeouts in init container logs |
| Pod active deadline exceeded | activeDeadlineSeconds forces terminal failure after hangs | Pod phase Failed with DeadlineExceeded |
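
The first signal in the table, time spent in the init state, can be derived from pod JSON. The sketch below assumes the pod's Initialized condition and status.startTime are present as in kubectl get pod -o json output; the function name init_overdue and the threshold passed to it are hypothetical and should be tuned to the workload's expected startup time.

```python
from datetime import datetime, timezone


def init_overdue(pod, now, max_seconds):
    """True if the pod is still uninitialized more than max_seconds after startTime."""
    status = pod.get("status", {})
    for condition in status.get("conditions", []):
        if condition.get("type") == "Initialized" and condition.get("status") == "True":
            return False  # all init containers have completed
    start_time = status.get("startTime")
    if start_time is None:
        return False  # no start time recorded yet, nothing to measure
    # Kubernetes timestamps are RFC 3339 with a trailing "Z".
    started = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    return (now - started).total_seconds() > max_seconds


# Hypothetical pod that has been initializing for ten minutes.
sample_pod = {
    "status": {
        "startTime": "2024-01-01T00:00:00Z",
        "conditions": [{"type": "Initialized", "status": "False"}],
    }
}
now = datetime(2024, 1, 1, 0, 10, 0, tzinfo=timezone.utc)
print(init_overdue(sample_pod, now, max_seconds=300))  # prints True
```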

Fixes

If the cause is image pull failure

Verify the image tag exists and is accessible from the node. Check imagePullSecrets on the service account or pod spec. If the registry is down or rate-limiting, the pod remains in PodInitializing until the image is available or the pod is deleted. There is no automatic timeout.

If the cause is resource pressure or OOMKilled

Increase the init container memory limit. Init containers compete for node resources like app containers. If the node is under MemoryPressure, cordon the node, delete the pod, and let the scheduler place it elsewhere. For bug #116676 (Kubernetes 1.23+, OOMKilled with restartPolicy: Never), delete the stuck pod and raise the limit; avoiding the OOM kill is the only workaround.
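
To decide how far to raise a limit, it helps to compare the configured limit with observed peak usage. The sketch below is an illustration only: to_bytes handles just the common Ki/Mi/Gi/Ti and k/M/G/T suffixes rather than the full Kubernetes quantity grammar, and the 25% headroom factor is an assumed starting point, not a recommendation from Kubernetes. Both function names are hypothetical.

```python
import math

_UNITS = {
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,
    "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,
}


def to_bytes(quantity):
    """Convert a memory quantity such as '128Mi' to bytes (common suffixes only)."""
    # Check two-character binary suffixes before one-character decimal ones.
    for suffix in ("Ki", "Mi", "Gi", "Ti", "k", "M", "G", "T"):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # plain byte count


def suggest_limit(current_limit, peak_bytes, headroom=1.25):
    """Return a larger limit in bytes if peak usage plus headroom exceeds the limit."""
    needed = math.ceil(peak_bytes * headroom)
    if needed > to_bytes(current_limit):
        return needed
    return None  # current limit already leaves the assumed headroom


# Hypothetical numbers: 128Mi limit, 120 MiB observed peak.
print(suggest_limit("128Mi", 120 * 1024 ** 2))
```

A return value of None means the existing limit already covers the peak plus the assumed headroom, which points the investigation toward node-level pressure instead.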

If the cause is an application or script error

Fix the command or script in the image. You cannot patch a running pod’s init container image. Roll out the parent workload (Deployment, Job, etc.) or delete the pod to force recreation.

If the cause is network or DNS timeout

Verify the init container can reach its dependency from the node. Check that NetworkPolicy allows egress during initialization. If the dependency is a cluster service, confirm CoreDNS health and endpoint readiness. Increase init container timeouts instead of relying solely on activeDeadlineSeconds.

If the cause is a sidecar probe race or kubelet bug

If a sidecar has a startupProbe and the kubelet is affected by bug #132826, remove the probe or upgrade kubelet. If an init container is stuck in CONTAINER_CREATED due to bug #126440, delete the pod. For sidecars that fail to restart after being killed (bug #136910), evaluate kubelet patch levels.

If the cause is restartPolicy: Never with a terminal init failure

With restartPolicy: Never, a failed init container should make the pod Failed. If the pod hangs in PodInitializing, delete it. Do not attempt to restart an individual init container; pod-level policy does not support it.
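
A pod in this state can be spotted from its JSON. As a sketch under the assumptions stated in this section (restartPolicy: Never, a terminally failed or OOMKilled init container, and a pod that is still Pending instead of Failed), the hypothetical helper below flags candidates for deletion. It does not confirm which kubelet bug is involved.

```python
def stuck_after_terminal_init_failure(pod):
    """True if a restartPolicy: Never pod is still Pending after an init failure."""
    if pod.get("spec", {}).get("restartPolicy") != "Never":
        return False
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False  # Failed is the expected outcome; anything else is not this case
    for container in status.get("initContainerStatuses", []):
        terminated = container.get("state", {}).get("terminated")
        if terminated is None:
            continue
        # Check the reason as well as the exit code, since the reported
        # exit code may not reflect the OOM kill.
        if terminated.get("exitCode") != 0 or terminated.get("reason") == "OOMKilled":
            return True
    return False


# Hypothetical pod: init container was OOMKilled, pod never moved to Failed.
sample_pod = {
    "spec": {"restartPolicy": "Never"},
    "status": {
        "phase": "Pending",
        "initContainerStatuses": [
            {"name": "load-data",
             "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}},
        ],
    },
}
print(stuck_after_terminal_init_failure(sample_pod))  # prints True
```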

Prevention

  • Set resource limits on init containers. Limits are often omitted because init containers are short-lived, but an OOMKilled init container blocks the pod indefinitely.
  • Use activeDeadlineSeconds on the pod spec to force-fail a hung init sequence.
  • Design init containers to fail fast. Avoid long timeouts in scripts; exit quickly so the backoff or failure policy applies.
  • Avoid restartPolicy: Never for init-dependent workloads unless you have explicit handling for Failed pods. Always or OnFailure allows recovery from transient errors.
  • Validate sidecar probe configurations. Misconfigured startupProbe settings can trigger kubelet race conditions.
  • Monitor node memory and disk pressure. Init containers are subject to the same eviction signals as app containers.
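
Several of the items above can be checked before deployment. The following sketch lints a pod spec dict (for example the spec field of a manifest loaded as JSON) against three of them. The function name, warning texts, and sample spec are hypothetical, and whether each warning applies depends on the workload.

```python
def lint_pod_spec(spec):
    """Return warnings for prevention items that a pod spec does not follow."""
    warnings = []
    for container in spec.get("initContainers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "memory" not in limits:
            warnings.append(
                f"init container {container.get('name')!r} has no memory limit"
            )
    if "activeDeadlineSeconds" not in spec:
        warnings.append("activeDeadlineSeconds is not set")
    if spec.get("restartPolicy") == "Never":
        warnings.append("restartPolicy is Never; failed init containers are not retried")
    return warnings


# Hypothetical spec that trips all three checks.
sample_spec = {
    "restartPolicy": "Never",
    "initContainers": [{"name": "wait-for-db", "image": "busybox"}],
}
for warning in lint_pod_spec(sample_spec):
    print(warning)
```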

How Netdata helps

  • Correlate pod-level init duration with node memory pressure and OOM kill events to confirm resource-driven failures.
  • Track container restart counts per pod to detect init container retry loops before they block deployments.
  • Monitor image pull latency and registry error rates alongside pod status to isolate registry failures from application bugs.
  • Alert on node MemoryPressure and disk saturation as leading indicators for init container terminations.