Kubernetes init container fails: blocking main container start
An init container failure blocks the entire pod. Until every init container exits zero, no main container starts. In automated clusters, one failing init container can stall a deployment while pods sit in PodInitializing. This guide covers init container retry mechanics, how to read pod status to find the failing step, and how to distinguish application bugs, resource limits, and kubelet issues that leave pods permanently stuck.
What this means
Classic init containers run sequentially before app containers start. The pod’s restartPolicy governs kubelet retry behavior:
- Always: kubelet restarts a failed init container in a CrashLoopBackOff loop until it succeeds.
- Never: kubelet does not retry. The pod transitions to Failed.
- OnFailure: kubelet retries on non-zero exit; zero is success.
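The policy rules above can be written down as a small lookup. This is a minimal sketch, not kubelet logic: the function name expected_init_outcome is hypothetical, it does not query a cluster, and it encodes only the intended behavior described here, not the known bugs listed later.

```shell
# Hypothetical helper: maps a pod restartPolicy ($1) and an init container
# exit code ($2) to the outcome described in this section.
expected_init_outcome() {
  if [ "$2" -eq 0 ]; then
    echo "success: next container starts"
    return 0
  fi
  case "$1" in
    Always|OnFailure) echo "retry: kubelet restarts the init container with backoff" ;;
    Never)            echo "terminal: pod transitions to Failed" ;;
    *)                echo "unknown restartPolicy: $1" >&2; return 2 ;;
  esac
}
```

For example, `expected_init_outcome Never 1` prints the terminal outcome, which is the state to compare against when a pod appears stuck instead of Failed.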
A pod running init containers shows Init:N/M, where N is completed and M is total. The container at index N (0-indexed) is currently running; if it fails, all subsequent init and main containers are blocked. Classic init containers do not support probes (livenessProbe, readinessProbe, startupProbe); adding them causes a validation error.
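As an illustration of reading that status, the sketch below splits a STATUS value into its parts. The helper name parse_init_status and its output format are assumptions made for this example; the input is whatever string kubectl get pod prints in the STATUS column.

```shell
# Hypothetical helper: interpret a STATUS value such as "Init:1/3".
# The body runs in a subshell so its variables do not leak into the caller.
parse_init_status() (
  case "$1" in
    Init:[0-9]*/[0-9]*)
      counts="${1#Init:}"        # "1/3"
      completed="${counts%%/*}"  # N: init containers that have completed
      total="${counts##*/}"      # M: total init containers
      # The container at 0-based index N is the one currently blocking the pod.
      echo "completed=$completed total=$total blocking_index=$completed"
      ;;
    Init:*)
      # Forms such as Init:CrashLoopBackOff or Init:ImagePullBackOff
      echo "init_error=${1#Init:}"
      ;;
    *)
      echo "other=$1"
      ;;
  esac
)
```

With `Init:1/3`, the blocking index is 1, so the second entry under spec.initContainers is the one to inspect.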
Sidecar containers (stable since Kubernetes v1.33) are init containers with restartPolicy: Always. They continue running for the pod’s lifetime and support probes.
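To make the difference concrete, the following sketch prints a manifest with one classic init container and one sidecar. Everything in it is a placeholder chosen for illustration: the pod and container names, the busybox:1.36 image, the commands, and the probe timings.

```shell
# Prints an illustrative manifest; all names, images, and timings are placeholders.
print_sidecar_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-demo
spec:
  initContainers:
    - name: prepare             # classic init container: runs to completion, no probes
      image: busybox:1.36
      command: ["sh", "-c", "echo preparing; sleep 2"]
    - name: log-shipper         # sidecar: init container with restartPolicy: Always
      image: busybox:1.36
      restartPolicy: Always
      command: ["sh", "-c", "touch /tmp/ready; tail -f /dev/null"]
      startupProbe:             # probes are allowed on sidecars only
        exec:
          command: ["cat", "/tmp/ready"]
        periodSeconds: 2
        failureThreshold: 15
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
EOF
}
```

One way to check the shape without creating anything is `print_sidecar_pod | kubectl apply --dry-run=client -f -`.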
Known kubelet bugs can leave a pod in PodInitializing after a terminal failure:
- Kubernetes 1.23+: an init container OOMKilled with pod restartPolicy: Never leaves the pod stuck (GitHub issue #116676).
- Kubernetes 1.30.0 with SidecarContainers enabled: a SyncPod RunContainerError during init startup can leave the container in CONTAINER_CREATED with no retry (GitHub issue #126440).
- Kubernetes v1.35: crashed or OOMKilled sidecars with restartPolicy: Always may fail to restart (GitHub issue #136910).
Per-container restartPolicy (ContainerRestartRules) is an alpha feature in Kubernetes v1.34, requiring the ContainerRestartRules feature gate. It is not production-ready.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Image pull failure | Init:ImagePullBackOff or Init:ErrImagePull | kubectl describe pod events for image errors |
| OOMKilled or resource pressure | Last state Terminated, reason OOMKilled, exit code 137 | Node memory pressure and init container memory limits |
| Command or script error | Waiting with reason CrashLoopBackOff, non-zero exit | kubectl logs -c <init-container-name> |
| Network or DNS timeout | Long Init duration, non-zero exit, timeouts in logs | DNS resolution and network policies from the node |
| Sidecar probe race (bug #132826) | Sidecar startupProbe fails first attempt, never ready | Kubelet version and sidecar probe config |
| OOMKilled + restartPolicy: Never stuck state (bug #116676) | Pod stays PodInitializing, exit state 0, reason OOMKilled | Init container memory limits and kubelet version |
| SyncPod RunContainerError with SidecarContainers (bug #126440) | Init container stuck in CONTAINER_CREATED, no Started event | Kubelet logs for RunContainerError |
Quick checks
# Pod phase and init status
kubectl get pod <pod-name> -n <namespace>
# Per-init-container state, exit code, and restart count
kubectl describe pod <pod-name> -n <namespace>
# Logs from the failing init container
kubectl logs <pod-name> -c <init-container-name> -n <namespace>
# Previous run logs after a restart
kubectl logs <pod-name> -c <init-container-name> --previous -n <namespace>
# Programmatic init container statuses
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.initContainerStatuses}'
# Recent events for image pull or scheduling issues
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>
# Node memory pressure
kubectl describe node <node-name> | grep -A 5 "Conditions:"
Expected progression: Init:2/3 -> PodInitializing -> Running.
Stuck states: Init:1/3 for minutes, or Init:CrashLoopBackOff.
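The individual checks above can be condensed into one line per init container. This is a sketch under the assumption that a tab-separated summary is useful; the function name init_summary is hypothetical, and fields that are absent from the status print as empty columns.

```shell
# Hypothetical helper: one line per init container with
# name, restart count, current waiting/terminated reason, and last exit code.
# $1 = pod name, $2 = namespace
init_summary() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.state.waiting.reason}{.state.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Usage: `init_summary <pod-name> <namespace>`. A row with a rising restart count and a last exit code of 137 points toward the OOM path in the diagnosis steps.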
How to diagnose it
flowchart TD
A[Pod stuck in PodInitializing] --> B{kubectl get pod<br/>Init:N/M?}
B -->|Yes| C[kubectl describe pod<br/>check Reason/Exit Code]
C --> D{Exit Code 137?}
D -->|Yes| E[Check node memory pressure<br/>and container limits]
D -->|No| F{ImagePullBackOff?}
F -->|Yes| G[Verify image, tag, registry<br/>and pull secrets]
F -->|No| H[kubectl logs -c init-container<br/>check application error]
E --> I{restartPolicy: Never<br/>and OOMKilled?}
I -->|Yes| J[Delete pod; increase limits<br/>bug #116676 workaround]
I -->|No| K[Increase limit or move pod]
H --> L{Sidecar with startupProbe?}
L -->|Yes| M[Check kubelet version<br/>bug #132826]
L -->|No| N[Fix script/image<br/>and recreate pod]
- Find the failing init container. kubectl get pod shows Init:N/M. The container at index N (0-indexed) has not completed.
- Check termination reason and exit code. kubectl describe pod shows Waiting/CrashLoopBackOff or Terminated with Error/OOMKilled. Record the exit code.
- Read logs. kubectl logs <pod> -c <init-container>. Add --previous if the container restarted.
- Check for image pull failures. ImagePullBackOff blocks the pod indefinitely. Verify the tag, registry credentials, and node connectivity.
- Check for OOM or resource pressure. Exit code 137 (SIGKILL) usually means OOM. Compare the init container memory limit to its observed peak usage. If the node reports MemoryPressure, kubelet may be evicting.
- Evaluate sidecar bugs. If the pod uses a sidecar with a startupProbe and the kubelet matches bug #132826, check if the failure is limited to the first attempt. If an init container is stuck in CONTAINER_CREATED on v1.30.0 with SidecarContainers enabled, search kubelet logs for RunContainerError.
- Determine retry behavior from restartPolicy. With restartPolicy: Never and a non-zero exit, the pod should be Failed. If it stays PodInitializing, you may have hit bug #116676 or #126440. Recovery usually requires deleting the pod.
- Check activeDeadlineSeconds. If set, the pod fails after that duration regardless of init state.
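The exit code recorded in the second step can be given a first-pass reading with a helper like the one below. It is a rough sketch: the name classify_exit_code is hypothetical, the 126 and 127 meanings are shell conventions that apply only when the init command runs through a shell, and 137 indicates SIGKILL, which is often but not always an OOM kill.

```shell
# Hypothetical helper: first-pass interpretation of an init container exit code ($1).
classify_exit_code() {
  if [ "$1" -eq 0 ]; then
    echo "success"
  elif [ "$1" -eq 137 ]; then
    echo "SIGKILL (128+9): check for OOMKilled"
  elif [ "$1" -eq 126 ]; then
    echo "command found but not executable"
  elif [ "$1" -eq 127 ]; then
    echo "command not found"
  elif [ "$1" -gt 128 ]; then
    echo "terminated by signal $(( $1 - 128 ))"
  else
    echo "application error"
  fi
}
```

Treat the result as a hint for which branch of the steps above to follow; the Reason field from kubectl describe pod remains the authoritative signal for OOMKilled.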
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Pod phase Pending, kubectl status Init:N/M | Incomplete init sequence | Duration in init state exceeds expected startup |
| Init container restart count | Retry loops under Always or OnFailure | Restart count increasing steadily |
| Init container exit code and reason | Distinguishes OOM, error, or success | Exit code 137 or non-zero application exit |
| Node memory pressure / OOM kills | Init containers share node cgroups; OOM kills block startup | Node MemoryPressure=True or container OOMKilled |
| Image pull latency and errors | ImagePullBackOff on an init container blocks the pod | ImagePullBackOff or ErrImagePull events |
| DNS resolution latency from node | Init containers that depend on external services stall on DNS | DNS lookup timeouts in init container logs |
| Pod active deadline exceeded | activeDeadlineSeconds forces terminal failure after hangs | Pod phase Failed with DeadlineExceeded |
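For an ad hoc look at the first signal in the table, the sketch below lists Pending pods whose STATUS starts with Init: along with their age. The function name list_init_pods is hypothetical, and the column positions assume the default kubectl get pods --all-namespaces layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print "namespace/pod status age" for pods still in init.
list_init_pods() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector=status.phase=Pending |
    # $NF is used for AGE because RESTARTS can contain spaces, e.g. "3 (2m ago)".
    awk '$4 ~ /^Init:/ { print $1 "/" $2, $4, $NF }'
}
```

Comparing the age column against the startup time you expect gives a quick view of which pods have been in init for too long.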
Fixes
If the cause is image pull failure
Verify the image tag exists and is accessible from the node. Check imagePullSecrets on the service account or pod spec. If the registry is down or rate-limiting, the pod remains in PodInitializing until the image is available or the pod is deleted. There is no automatic timeout.
If the cause is resource pressure or OOMKilled
Increase the init container memory limit. Init containers compete for node resources like app containers. If the node is under MemoryPressure, cordon the node, delete the pod, and let the scheduler place it elsewhere. For bug #116676 (Kubernetes 1.23+, OOMKilled with restartPolicy: Never), raising the limit is the only workaround.
If the cause is an application or script error
Fix the command or script in the image. You cannot patch a running pod’s init container image. Roll out the parent workload (Deployment, Job, etc.) or delete the pod to force recreation.
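For a Deployment, forcing recreation might look like the sketch below. It assumes the pod template already references the fixed image; if the fix ships under a new tag, update the template first. The function name recreate_workload_pods and the 120s default timeout are choices made for this example.

```shell
# Hypothetical helper: recreate a Deployment's pods and wait for the rollout.
# $1 = deployment name, $2 = namespace, $3 = optional timeout (default 120s)
recreate_workload_pods() {
  kubectl rollout restart "deployment/$1" -n "$2" &&
    kubectl rollout status "deployment/$1" -n "$2" --timeout="${3:-120s}"
}
```

If the status command times out, the new pods are probably failing in init again, and the diagnosis steps apply to them.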
If the cause is network or DNS timeout
Verify the init container can reach its dependency from the node. Check that NetworkPolicy allows egress during initialization. If the dependency is a cluster service, confirm CoreDNS health and endpoint readiness. Increase init container timeouts instead of relying solely on activeDeadlineSeconds.
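One way to bound waiting inside the init container itself is a retry helper with a fixed attempt budget, sketched below. The name retry_bounded is hypothetical, and the attempt count and delay are values you would size to the dependency's normal startup time.

```shell
# Hypothetical helper: run a check up to $1 times, sleeping $2 seconds between
# attempts, and fail once the budget is spent.
# The body runs in a subshell so its variables do not leak into the caller.
retry_bounded() (
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      exit 0
    fi
    echo "attempt $attempt/$max_attempts failed: $*" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  exit 1
)
```

In an init container script this could be called as `retry_bounded 15 2 nslookup my-db.default.svc.cluster.local || exit 1`, where the hostname is a placeholder. A non-zero exit hands control back to the pod's restart policy instead of hanging indefinitely.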
If the cause is a sidecar probe race or kubelet bug
If a sidecar has a startupProbe and the kubelet is affected by bug #132826, remove the probe or upgrade kubelet. If an init container is stuck in CONTAINER_CREATED due to bug #126440, delete the pod. For sidecars that fail to restart after being killed (bug #136910), evaluate kubelet patch levels.
If the cause is restartPolicy: Never with a terminal init failure
With restartPolicy: Never, a failed init container should make the pod Failed. If the pod hangs in PodInitializing, delete it. Do not attempt to restart an individual init container; pod-level policy does not support it.
Prevention
- Set resource limits on init containers. They are often omitted because they are short-lived, but an OOMKilled init container blocks the pod indefinitely.
- Use activeDeadlineSeconds on the pod spec to force-fail a hung init sequence.
- Design init containers to fail fast. Avoid long timeouts in scripts; exit quickly so the backoff or failure policy applies.
- Avoid restartPolicy: Never for init-dependent workloads unless you have explicit handling for Failed pods. Always or OnFailure allows recovery from transient errors.
- Validate sidecar probe configurations. Misconfigured startupProbe settings can trigger kubelet race conditions.
- Monitor node memory and disk pressure. Init containers are subject to the same eviction signals as app containers.
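Several of these points can be combined in one pod spec, sketched below for a batch-style pod. All names, the busybox:1.36 image, and the numeric values are placeholders. Note that activeDeadlineSeconds covers the whole pod lifetime, not only the init phase, so it fits pods that are expected to finish; the example uses a bare Pod, and you should check whether your workload controller accepts the field.

```shell
# Prints an illustrative manifest; all names and values are placeholders.
print_guarded_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: guarded-init-demo
spec:
  restartPolicy: OnFailure       # lets kubelet retry transient init failures
  activeDeadlineSeconds: 600     # applies to the whole pod, including the app container
  initContainers:
    - name: migrate
      image: busybox:1.36
      command: ["sh", "-c", "echo migrating; exit 0"]
      resources:
        requests:
          memory: "64Mi"
          cpu: "50m"
        limits:
          memory: "128Mi"        # size from observed peak usage, not this placeholder
  containers:
    - name: batch-task
      image: busybox:1.36
      command: ["sh", "-c", "echo working; sleep 30"]
EOF
}
```

As with any manifest sketch, `print_guarded_pod | kubectl apply --dry-run=client -f -` is one way to validate the structure before adapting it.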
How Netdata helps
- Correlate pod-level init duration with node memory pressure and OOM kill events to confirm resource-driven failures.
- Track container restart counts per pod to detect init container retry loops before they block deployments.
- Monitor image pull latency and registry error rates alongside pod status to isolate registry failures from application bugs.
- Alert on node MemoryPressure and disk saturation as leading indicators for init container terminations.
Related guides
- For rollout issues when init container failures block a deployment, see Kubernetes Deployment rollout stuck: stalled rollouts and ready replicas.
- If your init container depends on cluster DNS, see Kubernetes DNS resolution failures inside pods.
- For resource pressure that kills init containers before they complete, see Kubernetes API server memory pressure: OOM cycle and tuning.
- For scheduling constraints that leave pods pending before init containers even start, see Kubernetes DaemonSet pods Pending: scheduling and tolerations.






