Kubernetes pod exits immediately: how to diagnose it
When a pod shows Completed or Error with zero restarts, the container exited on its first run. The diagnostic evidence lives in termination metadata, not in a growing restart count. This is distinct from CrashLoopBackOff, where the kubelet has already applied exponential backoff after multiple restarts.
This guide covers how to distinguish a clean exit, an OOM kill, an application crash, and a configuration error using only the kubelet’s reported state and the previous container logs, plus which node-level and control-plane signals to check when the container produced no logs.
What this means
When a container terminates and the kubelet will not restart it, the pod phase becomes Succeeded if every container exited with code 0, or Failed if any exited non-zero. Whether a restart happens depends on restartPolicy. Deployments only accept restartPolicy: Always, so even a clean exit triggers an immediate restart and the pod phase stays Running. Under Never, or under OnFailure after a clean exit, the pod stays terminal.
Immediately after the first termination, the RESTARTS counter is still 0, and the State: Terminated block in kubectl describe pod records the exit code and reason from that run. When the kubelet restarts the container, that record moves to Last State: Terminated (status.containerStatuses[].lastState) while State becomes Running or Waiting. Only the most recent termination is kept, so capture the first exit event before a second exit overwrites it.
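The restart rules above fit in a small lookup. The following is a minimal bash sketch that assumes a single-container pod with no init containers. expected_outcome is a hypothetical helper, not a kubectl feature.

```bash
# Hypothetical helper: predict what happens after a container exits,
# given the pod's restartPolicy and the container's exit code.
# Assumes a single-container pod; multi-container phase rules are not modeled.
expected_outcome() {
  local policy=$1 code=$2
  case "$policy" in
    Always)
      # Always restarts, even after a clean exit (code 0).
      echo "restart" ;;
    OnFailure)
      # Restart only on a non-zero exit; a clean exit is terminal.
      if [ "$code" -eq 0 ]; then echo "Succeeded"; else echo "restart"; fi ;;
    Never)
      # Never restarts; the phase reflects the exit code.
      if [ "$code" -eq 0 ]; then echo "Succeeded"; else echo "Failed"; fi ;;
    *)
      echo "unknown restartPolicy: $policy" >&2
      return 2 ;;
  esac
}

expected_outcome Always 0      # restart
expected_outcome OnFailure 0   # Succeeded
expected_outcome Never 137     # Failed
```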
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| One-shot command with restartPolicy: Always | Pod exits cleanly (code 0) but immediately restarts | kubectl describe pod for Last State: Terminated, Reason: Completed, Exit Code: 0 |
| OOMKilled | Exit code 137, often with no application logs | kubectl describe pod for Reason: OOMKilled; node MemoryPressure condition |
| Application startup crash | Exit code 1, stack trace or config error in logs | kubectl logs <pod> --previous |
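| Missing secret, configmap, or env | Exit code 1, FileNotFoundError or similar in logs; if the referenced object does not exist at all, the container never starts (CreateContainerConfigError or a FailedMount event) | Pod events and --previous logs |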
| Init container failure | Main containers never start; init exits with error | kubectl describe pod for init container state |
| Sub-second exit before log flush | Empty --previous logs, exit code present | Structured container status via kubectl get pod -o jsonpath |
| Node resource pressure eviction | Pod terminated by kubelet, status Evicted | Node conditions and kubelet_evictions_total |
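The table can be read as a decision order: check eviction first, then the OOMKilled reason, then the exit code, then whether any logs exist. The following is a minimal bash sketch of that order. likely_cause is hypothetical and assumes the exit code and log-line count are numeric. It leaves out init container failures, which appear under initContainerStatuses instead.

```bash
# Hypothetical triage helper mirroring the Common causes table.
# Arguments: pod status.reason (may be empty), terminated reason,
# exit code, and the line count of `kubectl logs --previous`.
# Example inputs against a real cluster (placeholders in comments only):
#   kubectl get pod <pod-name> -o jsonpath='{.status.reason}'
#   kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
#   kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
#   kubectl logs <pod-name> --previous | wc -l
likely_cause() {
  local pod_reason=$1 term_reason=$2 code=$3 log_lines=$4
  if [ "$pod_reason" = "Evicted" ]; then
    echo "node resource pressure eviction"
  elif [ "$term_reason" = "OOMKilled" ]; then
    echo "OOMKilled"
  elif [ "$code" -eq 0 ]; then
    echo "clean exit: check restartPolicy for one-shot commands"
  elif [ "$log_lines" -eq 0 ]; then
    echo "exit before log flush: read structured container status"
  else
    echo "application startup crash or missing config: read logs"
  fi
}
```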
Quick checks
Run these checks in order. They are read-only and safe.
```bash
# Check pod phase and restart count
kubectl get pod <pod-name> -o jsonpath='{.status.phase}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}'

# Check termination reason and exit code (State: before a restart, Last State: after)
kubectl describe pod <pod-name> | grep -E -A 5 "^ *(Last )?State:"

# Retrieve logs from the terminated container instance (requires a prior restart)
kubectl logs <pod-name> --previous

# Extract structured termination details for the current and previous instance
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].state.terminated}' | jq
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}' | jq

# Check node-level pressure conditions
kubectl describe node <node-name> | grep -E "MemoryPressure|DiskPressure|PIDPressure"

# Check for kernel OOM events (run on the node itself, usually as root)
dmesg | grep -i "out of memory"

# Inspect restartPolicy and the container command and args
kubectl get pod <pod-name> -o jsonpath='{.spec.restartPolicy}{"\n"}{.spec.containers[*].command}{"\n"}{.spec.containers[*].args}{"\n"}'

# Check if the pod was evicted
kubectl get pod <pod-name> -o jsonpath='{.status.reason}'
```
For a healthy long-running pod, expect phase Running, restartCount 0, and no Terminated state. Problem output depends on the cause. Exit Code 0 with Reason Completed suggests a one-shot command running under restartPolicy: Always. Exit Code 137 with Reason OOMKilled means the container hit its memory limit. A non-zero exit code with empty logs suggests the process exited before it wrote or flushed any output.
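The evidence shifts once the kubelet restarts the container, so it can help to save these read-only outputs to disk in one pass. The following is a minimal bash sketch that assumes kubectl already targets the right cluster. capture_exit_evidence and the exit-evidence-* directory layout are hypothetical.

```bash
# Hypothetical helper: snapshot read-only diagnostics into a timestamped
# directory so the first exit's evidence survives a later restart.
# Assumes kubectl is configured for the target cluster.
capture_exit_evidence() {
  local pod=$1 ns=${2:-default} dir
  dir="exit-evidence-${pod}-$(date +%Y%m%dT%H%M%S)"
  mkdir -p "$dir" || return 1

  # Full pod object: status, lastState, and the effective spec.
  kubectl get pod "$pod" -n "$ns" -o yaml > "$dir/pod.yaml" 2>&1
  # Human-readable summary including State/Last State and recent events.
  kubectl describe pod "$pod" -n "$ns" > "$dir/describe.txt" 2>&1
  # Logs from the current and previous instances; either may be empty or an error.
  kubectl logs "$pod" -n "$ns" --all-containers > "$dir/logs.txt" 2>&1
  kubectl logs "$pod" -n "$ns" --all-containers --previous > "$dir/logs-previous.txt" 2>&1
  # Events expire, so save the ones that reference this pod now.
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" > "$dir/events.txt" 2>&1

  # Print the directory so callers can archive it.
  echo "$dir"
}
```

Call it as capture_exit_evidence <pod-name> <namespace>. It prints the directory it wrote, and errors from individual commands are saved alongside their output rather than aborting the snapshot.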
How to diagnose it
1. Confirm a first-run exit. Run kubectl get pod <name>. If RESTARTS is 0 and STATUS shows Completed or Error, the container exited on its first run. The phase field reads Succeeded or Failed only when no restart will follow. If the count is 1 or higher, the kubelet has already restarted the container; treat that as a CrashLoopBackOff pattern instead.
2. Capture termination metadata immediately. Run kubectl describe pod <name> and look under State: Terminated (no restart yet) or Last State: Terminated (after a restart). Record the Reason (Completed, Error, OOMKilled), the Exit Code, and the Message. To query both fields directly, run kubectl get pod <name> -o jsonpath='{.status.containerStatuses[*].state.terminated}{"\n"}{.status.containerStatuses[*].lastState.terminated}'.
3. Interpret the exit code.
   - 0: The process exited cleanly. If the pod keeps restarting, check whether restartPolicy is Always when the workload should run under Never or OnFailure.
   - 1: Generic application error. Look for stack traces or configuration failures in kubectl logs --previous.
   - 137 (128 + 9): The process received SIGKILL. With Reason: OOMKilled, the container hit its memory limit. With Reason: Error, another SIGKILL source is more likely, such as the kubelet force-killing a container that outlived its termination grace period. Cross-check with container limits and node memory pressure.
   - 143 (128 + 15): The process received SIGTERM. This is normal during graceful shutdown but unexpected on startup.
4. Retrieve logs from the terminated instance. Run kubectl logs <pod-name> --previous. This works only after a restart. If the container terminated and was not restarted, plain kubectl logs <pod-name> returns its output. Empty output means the container exited before writing to stdout or stderr, or the application never flushed its output buffers. In that case, rely on termination metadata and node-level signals instead.
5. Check for node-level pressure. Run kubectl describe node <node-name> and look at the conditions. MemoryPressure=True means available memory has crossed an eviction threshold, so the kubelet may start evicting pods. DiskPressure=True can prevent image pulls or log writes. Check kubelet_evictions_total metrics for the specific eviction signal.
6. Inspect init container state. If the pod is stuck in Init:Error or Init:CrashLoopBackOff, an init container exited with an error. Run kubectl logs <pod-name> -c <init-container-name> to see why. Main containers will not start until all init containers complete successfully.
7. Correlate with cluster events. Run kubectl get events --field-selector involvedObject.name=<pod-name>. Look for FailedScheduling, FailedMount, FailedCreatePodSandBox, or Killing events that preceded the exit. A Killing event means the kubelet stopped the container, for example after an eviction, a failed liveness probe, or a deletion; it does not indicate an application crash. Events are retained only briefly (one hour by default), so collect them early.
8. Compare the API server state to the original spec. Run kubectl get pod <pod-name> -o yaml and compare it against the manifest that created it. Mutating admission webhooks, defaulted fields, or injected sidecars can change the effective container command or environment.
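For scripted node checks, you can print node conditions one per line with a kubectl jsonpath range and then filter them in the shell. The following is a minimal sketch. pressure_conditions is a hypothetical helper that exits non-zero when no pressure condition is True.

```bash
# Hypothetical filter: read "Type=Status" lines for node conditions on stdin
# and print only pressure conditions that are currently True.
# Returns 1 when none match, so callers can branch on the result.
# Example feed against a real cluster (placeholder in comment only):
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | pressure_conditions
pressure_conditions() {
  local line found=1
  while IFS= read -r line; do
    case "$line" in
      *Pressure=True) echo "$line"; found=0 ;;
    esac
  done
  return "$found"
}
```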
```mermaid
flowchart TD
    A["Pod exits immediately<br/>RESTARTS: 0"] --> B{Exit code?}
    B -->|0| C["One-shot job with<br/>restartPolicy: Always?"]
    B -->|1| D["Application error<br/>Check logs --previous"]
    B -->|137| E["OOMKilled or SIGKILL<br/>Check memory limits<br/>and node pressure"]
    B -->|143| F["SIGTERM on startup<br/>Check preStop hooks<br/>and grace period"]
    C -->|Yes| G["Run as a Job with<br/>restartPolicy Never or OnFailure"]
    C -->|No| H["Check for expected<br/>clean completion"]
    D --> I["Fix code, config,<br/>or missing secrets"]
    E --> J["Raise limits or<br/>reduce memory usage"]
    F --> K["Adjust shutdown<br/>behavior"]
```
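The flowchart branches on the numeric exit code. Codes above 128 follow the shell convention of 128 plus the signal number, and a small helper can decode them with the kill -l builtin. This is only a sketch, and decode_exit_code is hypothetical. Kubernetes reports OOMKilled separately in the Reason field, so read that field alongside the decoded signal.

```bash
# Hypothetical helper: explain a container exit code using the shell
# convention that codes above 128 mean "terminated by signal (code - 128)".
decode_exit_code() {
  local code=$1 sig name
  if [ "$code" -eq 0 ]; then
    echo "clean exit"
  elif [ "$code" -le 128 ]; then
    echo "application exit status $code"
  else
    sig=$((code - 128))
    # kill -l maps a signal number to its name (for example, 9 -> KILL).
    if name=$(kill -l "$sig" 2>/dev/null); then
      echo "terminated by signal $sig (SIG$name)"
    else
      echo "exit status $code (no matching signal)"
    fi
  fi
}

decode_exit_code 137   # terminated by signal 9 (SIGKILL)
decode_exit_code 143   # terminated by signal 15 (SIGTERM)
```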
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Pod phase distribution | Reveals pods terminating outside normal churn | Sustained increase in Failed or Succeeded pods that should be Running |
| Container restart count | Lagging indicator of prior exits | Restart count increasing for a stable workload |
| lastState.terminated.reason | Distinguishes OOM, error, and clean completion | OOMKilled or Error in terminal state |
| Node MemoryPressure condition | Signals low node memory; precedes kubelet eviction and possibly kernel OOM kills | MemoryPressure=True on production nodes |
| Container memory working set vs limit | OOM occurs when usage exceeds the cgroup limit | Working set within 10% of the memory limit |
| kubelet_evictions_total | Kubelet evicts pods to reclaim resources | Any eviction event for non-best-effort workloads |
| kubelet_pleg_relist_duration_seconds | Slow PLEG delays state reporting to the API server | p99 relist duration above 5 seconds |
| API server mutating request latency | Slow admission or etcd delays status updates | p99 mutating latency above 1 second sustained |
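The 10% warning threshold in the table reduces to one integer comparison. The following is a minimal sketch that assumes both values are in bytes: for example, a working set from kubectl top pod --containers converted to bytes, and the container's limits.memory. memory_headroom_status is hypothetical, and it treats a limit of 0 as "no limit".

```bash
# Hypothetical check for the "working set within 10% of the limit" warning.
# Both arguments are in bytes; bash integer arithmetic only.
memory_headroom_status() {
  local working_set=$1 limit=$2
  if [ "$limit" -le 0 ]; then
    echo "no limit"            # no memory limit set on the container
  elif [ $((working_set * 100)) -ge $((limit * 90)) ]; then
    echo "warning"             # at or above 90% of the limit
  else
    echo "ok"
  fi
}
```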
Fixes
If the cause is a one-shot job with restartPolicy: Always
Deployments only accept restartPolicy: Always, so a container that exits cleanly with code 0 will keep being restarted. Move one-shot work into a Kubernetes Job, which requires restartPolicy: Never or OnFailure, or set restartPolicy: Never on a standalone pod.
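For the Job route, the pod template needs restartPolicy: Never or OnFailure. The following bash sketch prints a minimal Job manifest that can be piped to kubectl apply -f -. render_oneshot_job is a hypothetical helper, and the image in the usage comment is a placeholder. Setting backoffLimit: 0 disables retries (the Job default is 6); that is a choice for one-shot work, not a requirement.

```bash
# Hypothetical helper: print a minimal batch/v1 Job manifest for a one-shot
# command, with restartPolicy: Never and no retries (backoffLimit: 0).
# Assumes arguments contain no double quotes or newlines (no YAML escaping).
render_oneshot_job() {
  local name=$1 image=$2 cmd="" arg
  shift 2
  # Build a YAML flow sequence such as ["sh", "-c", "echo done"].
  for arg in "$@"; do
    cmd+="${cmd:+, }\"$arg\""
  done
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: $name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: $name
          image: $image
          command: [$cmd]
EOF
}

# Usage against a real cluster (placeholder image):
#   render_oneshot_job db-migrate registry.example.com/app:1.0 ./migrate --once | kubectl apply -f -
```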
If the cause is OOMKilled
Increase the container memory limit, or reduce the application’s memory footprint. For Java applications, ensure the max heap size leaves headroom for native memory and the container overhead. If the node itself is under MemoryPressure, scale the node pool or evict heavy best-effort pods.
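One way to size the heap is to subtract a fixed headroom share from the container limit. This is only a sketch. heap_flag_for_limit is hypothetical, and the 25% default headroom is an assumption you should tune per application. On JDK 10 and later, -XX:MaxRAMPercentage expresses the same idea without a wrapper.

```bash
# Hypothetical helper: derive an -Xmx value from a container memory limit in MiB,
# reserving a share for native memory, thread stacks, and metaspace.
# The 25% default headroom is an assumption, not a JVM or Kubernetes rule.
heap_flag_for_limit() {
  local limit_mib=$1 headroom_pct=${2:-25}
  if [ "$headroom_pct" -lt 0 ] || [ "$headroom_pct" -ge 100 ]; then
    echo "headroom must be between 0 and 99" >&2
    return 1
  fi
  # printf avoids echo treating a leading dash as an option.
  printf '%s\n' "-Xmx$((limit_mib * (100 - headroom_pct) / 100))m"
}

heap_flag_for_limit 1024      # -Xmx768m
```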
If the cause is an application startup error
Read kubectl logs --previous to find the stack trace, missing file, or configuration failure. Verify that ConfigMaps, Secrets, and environment variables referenced in the pod spec exist and are mounted correctly. Fix the application code or container image.
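You can script a check that the referenced ConfigMaps and Secrets exist using kubectl jsonpath. The following is a minimal sketch. check_pod_refs is hypothetical and covers only regular containers' volumes, envFrom, and env valueFrom references; it skips init containers and projected volumes. It also reports references marked optional as missing.

```bash
# Hypothetical helper: list ConfigMaps and Secrets a pod references (volumes,
# envFrom, env valueFrom) and report any that do not exist in its namespace.
# Skips init containers and projected volumes; flags optional refs too.
check_pod_refs() {
  local pod=$1 ns=${2:-default} name missing=0

  # ConfigMaps referenced by the pod spec (deduplicated).
  for name in $(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.volumes[*].configMap.name} {.spec.containers[*].envFrom[*].configMapRef.name} {.spec.containers[*].env[*].valueFrom.configMapKeyRef.name}' | tr ' ' '\n' | sort -u); do
    if ! kubectl get configmap "$name" -n "$ns" >/dev/null 2>&1; then
      echo "missing configmap: $name"
      missing=1
    fi
  done

  # Secrets referenced by the pod spec (deduplicated).
  for name in $(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.volumes[*].secret.secretName} {.spec.containers[*].envFrom[*].secretRef.name} {.spec.containers[*].env[*].valueFrom.secretKeyRef.name}' | tr ' ' '\n' | sort -u); do
    if ! kubectl get secret "$name" -n "$ns" >/dev/null 2>&1; then
      echo "missing secret: $name"
      missing=1
    fi
  done

  # Non-zero when anything is missing.
  return "$missing"
}
```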
If the cause is an init container failure
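Run kubectl logs <pod> -c <init-container> to capture the init container's output. Fix the initialization script, dependency, or command. Under restartPolicy Always or OnFailure, the kubelet retries a failing init container with backoff and tracks those retries in the init container's own restart count, so the main containers can stay blocked indefinitely. Under Never, the pod fails instead.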
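You can build a compact view of all init containers, with logs for the ones that failed, from initContainerStatuses. This sketch assumes kubectl targets the right cluster. init_container_report is hypothetical. It uses | as the field separator because bash's read collapses consecutive tabs, which would drop empty exit-code fields.

```bash
# Hypothetical helper: print each init container's exit code and restart count,
# then show logs for any that exited non-zero (current or previous instance).
init_container_report() {
  local pod=$1 ns=${2:-default} name current last restarts code
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"|"}{.state.terminated.exitCode}{"|"}{.lastState.terminated.exitCode}{"|"}{.restartCount}{"\n"}{end}' |
    while IFS='|' read -r name current last restarts; do
      # Prefer the current termination; fall back to the previous instance.
      code=${current:-$last}
      echo "$name exit=${code:-none} restarts=$restarts"
      if [ -n "$current" ] && [ "$current" -ne 0 ]; then
        kubectl logs "$pod" -n "$ns" -c "$name"
      elif [ -n "$last" ] && [ "$last" -ne 0 ]; then
        kubectl logs "$pod" -n "$ns" -c "$name" --previous
      fi
    done
}
```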
If the cause is node resource pressure
Warning: Disruptive. Cordoning blocks new pods from scheduling to the node until you run kubectl uncordon; pods already running there are not moved.
Cordon the node, then free disk space, remove unused images, or add nodes to the pool. Set resource requests and limits on all workloads so the scheduler and kubelet can make informed eviction decisions.
If the cause is missing log output
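If the container exits before flushing logs, make the application flush stdout and stderr before it exits, or disable output buffering while debugging (for example, PYTHONUNBUFFERED=1 for Python). Alternatively, write a termination message to the termination message path (/dev/termination-log by default) so kubectl describe pod surfaces it without relying on log buffers. With terminationMessagePolicy: FallbackToLogsOnError, the kubelet uses the tail of the container log as the message when that file is empty and the container exits with an error.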
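For a shell entrypoint, the termination message approach could look like the sketch below. /dev/termination-log is the Kubernetes default terminationMessagePath. fail, check_config, TERMINATION_LOG, CONFIG_FILE, and /etc/app/config.yaml are hypothetical names used only for illustration.

```bash
# Hypothetical entrypoint helpers: on a fatal startup error, write a short
# reason to the termination message file before exiting, so it appears in
# `kubectl describe pod` even if stdout was never flushed.
# TERMINATION_LOG is a hypothetical override, handy for local testing.
termination_log=${TERMINATION_LOG:-/dev/termination-log}

fail() {
  # Write the message directly to the file, bypassing log buffering.
  printf '%s\n' "$1" > "$termination_log"
  exit "${2:-1}"
}

# Example startup check: refuse to start without a readable config file.
# CONFIG_FILE and /etc/app/config.yaml are hypothetical.
check_config() {
  [ -r "${CONFIG_FILE:-/etc/app/config.yaml}" ] ||
    fail "config file not readable: ${CONFIG_FILE:-/etc/app/config.yaml}"
}
```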
Prevention
- Match restartPolicy to workload type. Use Always for long-running services, OnFailure for batch jobs that should retry, and Never for one-shot tasks.
- Set memory requests and limits. Limits confine an OOM kill to the container that exceeded its budget, and requests give the scheduler the data it needs to place pods.
- Use startup probes for slow-starting containers. Do not use liveness probes to catch startup failures; a liveness probe that fails while the container is still initializing causes unnecessary restarts.
- Monitor pod phase distribution and container restart counts. Baseline these metrics per workload so you can detect a sudden shift to Failed or Succeeded.
- Write termination messages. Configure terminationMessagePath and terminationMessagePolicy so fatal application errors surface in kubectl describe pod even when logs are empty.
- Include kubectl logs --previous in runbooks. Operators should run it immediately after detecting an unexpected exit, before another restart overwrites the evidence.
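To audit the memory-limit recommendation across a namespace, you can use a nested jsonpath range to print each pod's container limits, then flag gaps with a short filter. This is only a sketch. pods_missing_memory_limits is hypothetical and reads the query output shown in the comment, where each container contributes its limit followed by a comma.

```bash
# Hypothetical filter: read "pod<TAB>limit,limit,..." lines on stdin and print
# pods where at least one container has no memory limit.
# Example feed against a real cluster (placeholder in comment only):
#   kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.resources.limits.memory}{","}{end}{"\n"}{end}' \
#     | pods_missing_memory_limits
pods_missing_memory_limits() {
  local pod limits
  while IFS=$'\t' read -r pod limits; do
    # Each container contributes "<limit>," so prefixing a comma turns any
    # container without a limit into an empty field, which shows up as ",,".
    case ",$limits" in
      *,,*) echo "$pod" ;;
    esac
  done
}
```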
How Netdata helps
- Netdata collects kubelet metrics such as kubelet_running_pods, kubelet_container_start_duration_seconds, and kubelet_evictions_total to correlate pod exits with node-level events.
- Per-container cgroup memory charts show working set growth approaching the limit before the OOM killer triggers.
- Node condition alerts for MemoryPressure and DiskPressure trigger before the kubelet begins evicting pods.
- API server latency monitoring detects slow admission webhooks or etcd disk latency that delays pod status updates and masks the true timing of a container exit.
Related guides
- See Kubernetes eviction cascade: when one node failure takes down the cluster for node-pressure cascades.
- See Kubernetes kubelet memory leak: detection and OOM cycle for kubelet-level OOM patterns.
- See Kubernetes kubelet not responding: PLEG, runtime, and certificate issues when node-level health is the root cause.
- See Kubernetes DNS resolution failures inside pods if the exit is caused by a failing dependency.
- See Kubernetes API server slow or unresponsive: causes and fixes when control plane latency delays pod lifecycle reporting.