Kubernetes init container fails: blocking main container start

An init container failure blocks the entire pod. Until every init container exits zero, no main container starts. In automated clusters, one failing init container can stall a deployment while pods sit in PodInitializing. This guide covers init container retry mechanics, how to read pod status to find the failing step, and how to distinguish application bugs, resource limits, and kubelet issues that leave pods permanently stuck.

What this means

Classic init containers run sequentially before app containers start. The pod’s restartPolicy governs kubelet retry behavior:

Always: kubelet restarts a failed init container in a CrashLoopBackOff loop until it succeeds.
Never: kubelet does not retry. The pod transitions to Failed.
OnFailure: kubelet retries on non-zero exit; zero is success.

A pod running init containers shows Init:N/M, where N is completed and M is total. The container at index N (0-indexed) is currently running; if it fails, all subsequent init and main containers are blocked. Classic init containers do not support probes (livenessProbe, readinessProbe, startupProbe); adding them causes a validation error.
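The indexing rule above can be sketched as a small POSIX shell helper. The function name `pending_init_index` is hypothetical, and the sketch assumes a plain `Init:N/M` status string; error statuses such as `Init:CrashLoopBackOff` carry no counter and are rejected.

```shell
# Print the 0-based index of the init container that has not completed,
# given the STATUS column from `kubectl get pod` (for example "Init:1/3").
# Prints nothing and returns 1 when the status has no N/M counter
# (for example "Running" or "Init:CrashLoopBackOff").
pending_init_index() {
  case "$1" in
    Init:[0-9]*/[0-9]*)
      counter=${1#Init:}                 # "1/3"
      printf '%s\n' "${counter%%/*}"     # "1"
      ;;
    *)
      return 1
      ;;
  esac
}

# Usage (requires cluster access; placeholders are not real names):
#   idx=$(pending_init_index "Init:1/3")
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath="{.spec.initContainers[$idx].name}"
```

The jsonpath lookup in the usage comment turns the index into the container name to pass to `kubectl logs -c`.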

Sidecar containers (stable since Kubernetes v1.33) are init containers with restartPolicy: Always. They continue running for the pod’s lifetime and support probes.

Known kubelet bugs can leave a pod in PodInitializing after a terminal failure:

Kubernetes 1.23+: an init container OOMKilled with pod restartPolicy: Never leaves the pod stuck (GitHub issue #116676).
Kubernetes 1.30.0 with SidecarContainers enabled: a SyncPod RunContainerError during init startup can leave the container in CONTAINER_CREATED with no retry (GitHub issue #126440).
Kubernetes v1.35: crashed or OOMKilled sidecars with restartPolicy: Always may fail to restart (GitHub issue #136910).

Per-container restartPolicy (ContainerRestartRules) is an alpha feature in Kubernetes v1.34, requiring the ContainerRestartRules feature gate. It is not production-ready.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Image pull failure | `Init:ImagePullBackOff` or `Init:ErrImagePull` | `kubectl describe pod` events for image errors |
| OOMKilled or resource pressure | Last state `Terminated`, reason `OOMKilled`, exit code 137 | Node memory pressure and init container memory limits |
| Command or script error | `Waiting` with reason `CrashLoopBackOff`, non-zero exit | `kubectl logs -c <init-container-name>` |
| Network or DNS timeout | Long `Init` duration, non-zero exit, timeouts in logs | DNS resolution and network policies from the node |
| Sidecar probe race (bug #132826) | Sidecar `startupProbe` fails first attempt, never ready | Kubelet version and sidecar probe config |
| `OOMKilled` + `restartPolicy: Never` stuck state (bug #116676) | Pod stays `PodInitializing`, exit state 0, reason `OOMKilled` | Init container memory limits and kubelet version |
| `SyncPod RunContainerError` with `SidecarContainers` (bug #126440) | Init container stuck in `CONTAINER_CREATED`, no `Started` event | Kubelet logs for `RunContainerError` |

Quick checks

# Pod phase and init status
kubectl get pod <pod-name> -n <namespace>

# Per-init-container state, exit code, and restart count
kubectl describe pod <pod-name> -n <namespace>

# Logs from the failing init container
kubectl logs <pod-name> -c <init-container-name> -n <namespace>

# Previous run logs after a restart
kubectl logs <pod-name> -c <init-container-name> --previous -n <namespace>

# Programmatic init container statuses
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.initContainerStatuses}'

# Recent events for image pull or scheduling issues
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>

# Node memory pressure
kubectl describe node <node-name> | grep -A 5 "Conditions:"

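Expected progression for a pod with three init containers: Init:0/3 -> Init:1/3 -> Init:2/3 -> PodInitializing -> Running. kubectl never shows Init:3/3; once the last init container completes, the status switches to PodInitializing. Stuck states: the same Init:N/M (for example Init:1/3) for minutes, Init:Error, or Init:CrashLoopBackOff.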
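The programmatic check can be narrowed to one line per init container. The wrapper below is a sketch; the function name `init_states` is hypothetical, and the jsonpath expression only reads fields that exist under `.status.initContainerStatuses`.

```shell
# Print "<name> <restartCount> <state>" for each init container of a pod.
# Arguments: <pod-name> <namespace>
init_states() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.state}{"\n"}{end}'
}

# Usage (requires cluster access; placeholders are not real names):
#   init_states <pod-name> <namespace>
```

A restart count that keeps climbing between runs points at a retry loop. A `terminated` state with a non-zero exit code and no further restarts points at a terminal failure.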

How to diagnose it

flowchart TD
    A["Pod stuck in PodInitializing"] --> B{"kubectl get pod<br/>Init:N/M?"}
    B -->|Yes| C["kubectl describe pod<br/>check Reason/Exit Code"]
    C --> D{"Exit Code 137?"}
    D -->|Yes| E["Check node memory pressure<br/>and container limits"]
    D -->|No| F{"ImagePullBackOff?"}
    F -->|Yes| G["Verify image, tag, registry<br/>and pull secrets"]
    F -->|No| H["kubectl logs -c init-container<br/>check application error"]
    E --> I{"restartPolicy: Never<br/>and OOMKilled?"}
    I -->|Yes| J["Delete pod; increase limits<br/>bug #116676 workaround"]
    I -->|No| K["Increase limit or move pod"]
    H --> L{"Sidecar with startupProbe?"}
    L -->|Yes| M["Check kubelet version<br/>bug #132826"]
    L -->|No| N["Fix script/image<br/>and recreate pod"]

1. Find the failing init container. kubectl get pod shows Init:N/M. The container at index N (0-indexed) has not completed.
2. Check termination reason and exit code. kubectl describe pod shows Waiting/CrashLoopBackOff or Terminated with Error/OOMKilled. Record the exit code.
3. Read logs. Run kubectl logs <pod> -c <init-container>. Add --previous if the container restarted.
4. Check for image pull failures. ImagePullBackOff blocks the pod indefinitely. Verify the tag, registry credentials, and node connectivity.
5. Check for OOM or resource pressure. Exit code 137 (SIGKILL) usually means OOM. Compare the init container memory limit to its observed peak usage. If the node reports MemoryPressure, kubelet may be evicting pods.
6. Evaluate sidecar bugs. If the pod uses a sidecar with a startupProbe and the kubelet matches bug #132826, check whether the failure is limited to the first attempt. If an init container is stuck in CONTAINER_CREATED on v1.30.0 with SidecarContainers enabled, search kubelet logs for RunContainerError.
7. Determine retry behavior from restartPolicy. With restartPolicy: Never and a non-zero exit, the pod should be Failed. If it stays PodInitializing, you may have hit bug #116676 or #126440. Recovery usually requires deleting the pod.
8. Check activeDeadlineSeconds. If set, the pod fails after that duration regardless of init state.
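The branching in the steps above can be condensed into a rough triage helper. This is a sketch, not a complete decision procedure: the function name `triage_init_failure` and the wording of its suggestions are made up for illustration, and it only covers the reason, exit code, and pod restartPolicy.

```shell
# Map an init container's termination details to a suggested next step.
# Arguments: <reason> <exit-code> <pod-restartPolicy>
triage_init_failure() {
  reason=$1 exit_code=$2 policy=$3
  case "$reason" in
    ImagePullBackOff|ErrImagePull)
      echo "verify image tag, registry access, and pull secrets"
      return ;;
  esac
  # OOMKilled is checked by reason as well as exit code, because the
  # stuck state described for bug #116676 can report exit code 0.
  if [ "$reason" = "OOMKilled" ] || [ "$exit_code" = "137" ]; then
    if [ "$policy" = "Never" ]; then
      echo "delete the pod and raise the memory limit (possible bug #116676)"
    else
      echo "check node memory pressure and raise the memory limit"
    fi
    return
  fi
  if [ "$exit_code" != "0" ]; then
    echo "read the init container logs for an application or script error"
    return
  fi
  echo "no failure recorded; check sidecar probes and kubelet logs"
}

# Usage, with values copied from `kubectl describe pod`:
#   triage_init_failure Error 1 OnFailure
```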

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Pod phase `Pending`, kubectl status `Init:N/M` | Incomplete init sequence | Duration in init state exceeds expected startup |
| Init container restart count | Retry loops under `Always` or `OnFailure` | Restart count increasing steadily |
| Init container exit code and reason | Distinguishes OOM, error, or success | Exit code 137 or non-zero application exit |
| Node memory pressure / OOM kills | Init containers share node cgroups; OOM kills block startup | Node `MemoryPressure=True` or container `OOMKilled` |
| Image pull latency and errors | `ImagePullBackOff` on an init container blocks the pod | `ImagePullBackOff` or `ErrImagePull` events |
| DNS resolution latency from node | Init containers that depend on external services stall on DNS | DNS lookup timeouts in init container logs |
| Pod active deadline exceeded | `activeDeadlineSeconds` forces terminal failure after hangs | Pod phase `Failed` with `DeadlineExceeded` |

Fixes

If the cause is image pull failure

Verify the image tag exists and is accessible from the node. Check imagePullSecrets on the service account or pod spec. If the registry is down or rate-limiting, kubelet keeps retrying with backoff and the pod stays in Init:ImagePullBackOff until the image is available or the pod is deleted. There is no automatic timeout.

If the cause is resource pressure or OOMKilled

Increase the init container memory limit. Init containers compete for node resources like app containers. If the node is under MemoryPressure, cordon the node, delete the pod, and let the scheduler place it elsewhere. For bug #116676 (Kubernetes 1.23+, OOMKilled with restartPolicy: Never), delete the stuck pod; raising the limit is the only way to keep the replacement from hitting the same state.
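One way to raise the limit without hand-editing the manifest is a strategic merge patch, which matches init containers by name. The sketch below only builds the patch body. The function name `init_memory_patch` is hypothetical, and it assumes the parent workload is a Deployment.

```shell
# Build a strategic merge patch that sets the memory limit of one init container.
# Arguments: <init-container-name> <memory-limit>, for example: wait-for-db 256Mi
init_memory_patch() {
  printf '{"spec":{"template":{"spec":{"initContainers":[{"name":"%s","resources":{"limits":{"memory":"%s"}}}]}}}}\n' "$1" "$2"
}

# Usage (requires cluster access; placeholders are not real names):
#   kubectl patch deployment <deployment-name> -n <namespace> \
#     -p "$(init_memory_patch <init-container-name> 256Mi)"
```

Patching the pod template triggers a rollout, so the replacement pods start with the new limit.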

If the cause is an application or script error

Fix the command or script in the image. You cannot patch a running pod’s init container image. Roll out the parent workload (Deployment, Job, etc.) or delete the pod to force recreation.

If the cause is network or DNS timeout

Verify the init container can reach its dependency from the node. Check that NetworkPolicy allows egress during initialization. If the dependency is a cluster service, confirm CoreDNS health and endpoint readiness. Increase init container timeouts instead of relying solely on activeDeadlineSeconds.
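Inside the init container itself, a bounded retry keeps the wait explicit instead of hanging on a single long timeout. The helper below is a minimal sketch; the name `retry` and the attempt and delay values in the usage comment are illustrative.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
# Arguments: <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage inside an init container script (the service name is a placeholder):
#   retry 5 2 nslookup my-service.my-namespace.svc.cluster.local || exit 1
```

Exiting non-zero after the last attempt hands control back to kubelet, so the pod's restartPolicy decides what happens next.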

If the cause is a sidecar probe race or kubelet bug

If a sidecar has a startupProbe and the kubelet is affected by bug #132826, remove the probe or upgrade kubelet. If an init container is stuck in CONTAINER_CREATED due to bug #126440, delete the pod. For sidecars that fail to restart after being killed (bug #136910), evaluate kubelet patch levels.

If the cause is `restartPolicy: Never` with a terminal init failure

With restartPolicy: Never, a failed init container should make the pod Failed. If the pod hangs in PodInitializing, delete it. Do not attempt to restart an individual init container; pod-level policy does not support it.

Prevention

Set resource limits on init containers. They are often omitted because they are short-lived, but an OOMKilled init container blocks the pod indefinitely.
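Use activeDeadlineSeconds on the pod spec to force-fail a hung init sequence. The deadline covers the pod's whole lifetime, app containers included, so it suits Jobs and other run-to-completion pods rather than long-running services.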
Design init containers to fail fast. Avoid long timeouts in scripts; exit quickly so the backoff or failure policy applies.
Avoid restartPolicy: Never for init-dependent workloads unless you have explicit handling for Failed pods. Always or OnFailure allows recovery from transient errors.
Validate sidecar probe configurations. Misconfigured startupProbe settings can trigger kubelet race conditions.
Monitor node memory and disk pressure. Init containers are subject to the same eviction signals as app containers.
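Several of these guardrails fit in one manifest. The sketch below writes an illustrative run-to-completion Pod with init container resource limits, a pod-level deadline, and a retrying restartPolicy. All names, images, and numbers are placeholders. A bare Pod is used because activeDeadlineSeconds counts from pod start and would also terminate a long-running app container.

```shell
# Write a hypothetical run-to-completion Pod; names, images, and sizes are placeholders.
manifest="${TMPDIR:-/tmp}/init-guardrails.yaml"
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: init-guardrails
spec:
  restartPolicy: OnFailure        # retry transient init failures instead of failing the pod
  activeDeadlineSeconds: 300      # whole-pod deadline; includes time spent in init containers
  initContainers:
    - name: prepare
      image: busybox:1.36
      command: ["sh", "-c", "echo preparing; exit 0"]   # exits quickly either way
      resources:
        requests:
          cpu: "100m"
          memory: "64Mi"
        limits:
          memory: "128Mi"         # explicit limit, sized above expected peak usage
  containers:
    - name: task
      image: busybox:1.36
      command: ["sh", "-c", "echo done"]
EOF
echo "wrote $manifest"

# To validate against a cluster without creating anything:
#   kubectl apply --dry-run=server -f "$manifest"
```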

How Netdata helps

Correlate pod-level init duration with node memory pressure and OOM kill events to confirm resource-driven failures.
Track container restart counts per pod to detect init container retry loops before they block deployments.
Monitor image pull latency and registry error rates alongside pod status to isolate registry failures from application bugs.
Alert on node MemoryPressure and disk saturation as leading indicators for init container terminations.

Related guides

For rollout issues when init container failures block a deployment, see Kubernetes Deployment rollout stuck: stalled rollouts and ready replicas.
If your init container depends on cluster DNS, see Kubernetes DNS resolution failures inside pods.
For resource pressure that kills init containers before they complete, see Kubernetes API server memory pressure: OOM cycle and tuning.
For scheduling constraints that leave pods pending before init containers even start, see Kubernetes DaemonSet pods Pending: scheduling and tolerations.

The Netdata solution

Kubernetes monitoring with Netdata

Netdata monitors Kubernetes with per-second metrics across the control plane, nodes, and every pod, with ML anomaly detection and zero per-pod configuration. Correlate API-server and etcd latency, kubelet PLEG stalls, scheduling pressure, and OOMKills in one place.

