Kubernetes pod OOMKilled: cgroup limits, evictions, and fixes

A pod status of OOMKilled means the kernel terminated a process in the container with SIGKILL because it could not satisfy a memory allocation, and the kubelet then restarted the container. There is no graceful shutdown.

Distinguish whether the kill happened at the container cgroup level (a limit you set) or at the node level (a system-wide shortage). Then separate kernel OOM kills from kubelet evictions, identify the correct fix, and prevent recurrence without guessing at memory limits.

What this means

OOMKilled means the Linux OOM killer selected a process in your container and terminated it with SIGKILL (signal 9). Exit code 137 (128 + 9) often maps to OOMKilled, but the code alone is ambiguous: it can also result from a manual docker kill or an escalated graceful termination. Verify the pod status Reason field.
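
If you prefer a targeted query over scanning describe output, a jsonpath read of the container status works; <pod-name> is a placeholder.

# Print the last termination reason and exit code for each container in the pod
kubectl get pod <pod-name> -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{" ("}{.lastState.terminated.exitCode}{")"}{"\n"}{end}'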

Three distinct mechanisms produce this symptom.

Container-level cgroup OOM. The process inside the container exceeded its cgroup memory limit, which Kubernetes sets via resources.limits.memory. The kernel kills the process within that cgroup. Kubelet restarts the container according to the pod’s restartPolicy. The pod status shows Last State: Terminated, Reason: OOMKilled, Exit Code: 137. The limit did exactly what it was designed to do.
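
As a rough illustration of that mapping, the declared limit and the enforced cgroup value can be compared directly; this sketch assumes a cgroup v2 node running containerd, with crictl and jq available, and uses placeholder identifiers.

# Declared limit, from the API server
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources.limits.memory}'; echo
# Enforced limit in bytes, read on the node via the container's main PID
PID=$(crictl inspect <container-id> | jq -r '.info.pid')
cat "/sys/fs/cgroup$(awk -F'::' '{print $2}' /proc/$PID/cgroup)/memory.max"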

Node-level kernel OOM. The node as a whole ran out of memory. The global OOM killer selected a victim process anywhere on the node: a pod container, a system daemon, a container runtime shim, or even the kubelet itself. The kernel log reads Out of memory: Killed process... without a specific cgroup identifier. The pod status may not show OOMKilled if the killed process was not the container’s PID 1, or if only a subprocess died while the container survived. Node-level OOM is an infrastructure emergency. It can destabilize the entire node.

Kubelet eviction is a separate mechanism. When the node crosses a configured eviction threshold (default memory.available < 100Mi), the kubelet eviction manager kills pods in priority order: BestEffort first, then Burstable, then Guaranteed. Eviction respects PriorityClass and uses configurable grace periods. The kernel OOM killer does neither. Under a rapid memory spike, the kernel OOM killer can terminate a high-priority pod before kubelet eviction has time to act.
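
To see the thresholds a specific kubelet is enforcing, the configz proxy endpoint is one option (it may be blocked in hardened clusters); evictions also leave events behind, which kernel OOM kills do not.

# Eviction settings in effect on a node (<node-name> is a placeholder; jq assumed)
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" | jq '.kubeletconfig | {evictionHard, evictionSoft, evictionPressureTransitionPeriod}'
# Managed evictions show up as events
kubectl get events --all-namespaces --field-selector reason=Evicted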

On cgroup v2 nodes, which most managed clusters use by default for Kubernetes 1.25+, OOM semantics are more predictable than on cgroup v1.
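
A quick way to confirm which cgroup version a node runs:

# "cgroup2fs" means cgroup v2; "tmpfs" means the legacy v1 hierarchy
stat -fc %T /sys/fs/cgroup/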

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Container memory limit too low | Container restarts with OOMKilled at predictable load spikes; memory usage flatlines at the limit | kubectl describe pod for Last State: OOMKilled |
| Application memory leak | OOMKilled restarts become more frequent; memory usage grows steadily until it hits the limit | Container memory usage trend against its limit |
| JVM or native runtime misaccounting | Java or other managed runtime OOMKilled despite a low reported heap; off-heap memory exceeds headroom | Runtime memory flags and native memory usage |
| Node-level memory exhaustion | Multiple unrelated pods killed; dmesg shows global Out of memory: Killed process; MemoryPressure=True | Node MemAvailable and kernel logs |
| Burstable QoS with low memory request | High-priority pod killed by node OOM before lower-priority pods because its oom_score_adj is high | Pod QoS class and node-level OOM log |
| Cgroup v2 unified OOM behavior | Entire container cgroup dies together when any process inside it is OOM killed; no silent subprocess death | Cgroup version on the node |

Quick checks

# Confirm the pod was terminated for OOMKilled
kubectl describe pod <pod-name> | grep -A5 "Last State"

# Check if the node is under memory pressure
kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'
echo

# Look for node-level kernel OOM events (root is usually required)
dmesg -T | grep -i "out of memory\|killed process"

# Check actual available memory on the node (not MemFree)
cat /proc/meminfo | grep MemAvailable

# List recent evictions and terminations
kubectl get events --all-namespaces | grep -iE "evicted|oom"

# Check cgroup v2 per-container OOM events (find cgroup path first)
crictl inspect <container-id> | grep cgroupsPath
# Then inspect the specific path:
cat /sys/fs/cgroup/<path-from-above>/memory.events | grep oom_kill

# Check cgroup v1 OOM control status (path varies by pod QoS and cgroup driver)
cat /sys/fs/cgroup/memory/kubepods/besteffort/*/memory.oom_control

# Identify top memory consumers on the node
ps aux --sort=-%mem | head -20

How to diagnose it

  1. Confirm the symptom is a cgroup OOM and not an external SIGKILL. Run kubectl describe pod <pod>. If Last State: Terminated shows Reason: OOMKilled and Exit Code: 137, the container exceeded its cgroup memory limit. If the exit code is 137 but the reason is Error or Completed, suspect a node-level OOM or an external kill -9.

  2. Determine whether the node is under memory pressure. Check kubectl get node <node> for MemoryPressure=True. If the condition is true, kubelet eviction is active or imminent. Check kubectl get events for pods with Reason: Evicted. Eviction is managed; kernel OOM is not. If you see both evictions and OOM kills, the kernel likely won the race against kubelet.

  3. Check kernel logs for a global OOM event. Run dmesg -T | grep -i "out of memory" on the node. Node-level OOM log lines read Out of memory: Killed process, name the victim by PID, and do not reference a specific cgroup; container-level cgroup kills usually log Memory cgroup out of memory lines instead, though the detail varies by kernel. If dmesg shows a kill and the pod status does not show OOMKilled, the node itself was the source of the pressure.

  4. Correlate cgroup version with observed behavior. On cgroup v2 hosts, the OOM killer may treat all processes in the container cgroup as a single unit. If a subprocess is killed, the entire container dies, making the failure visible in pod status. On cgroup v1, a single subprocess could be killed silently while the container remained running, which Kubernetes would see as healthy. If your pod shows unexplained restarts or missing worker processes without OOMKilled, verify whether the node uses cgroup v1 or v2.

  5. Inspect application memory behavior. If the container is OOMKilled repeatedly at the same memory value, the limit is simply too low. If memory grows steadily until death, the application has a leak. For JVM workloads, check whether the runtime is configured to read cgroup memory limits and whether heap plus off-heap overhead (metaspace, direct buffers, thread stacks) exceeds the container limit.

  6. Evaluate QoS class and OOM score. Check kubectl get pod <pod> -o jsonpath='{.status.qosClass}'. Guaranteed pods use oom_score_adj=-997 and are nearly immune to the node-level OOM killer. BestEffort pods use 1000 and are the first to die. Burstable pods use a calculated score based on memory request versus node capacity, so a Burstable pod with a small memory request receives an oom_score_adj close to BestEffort's 1000 and can die under node pressure long before you would expect for a high-priority workload. PriorityClass does not influence oom_score_adj for kernel OOM. The strongest protection is Guaranteed QoS, which requires resources.requests.memory equal to resources.limits.memory (and CPU requests equal to limits). A way to read the score assigned to a running container is sketched after this list.

  7. Check for allocatable drift caused by reservations. Run kubectl describe node <node> and compare Allocatable.memory to Capacity.memory. Large gaps from kube-reserved, system-reserved, and eviction thresholds reduce the memory available to pods. A node can run out of memory even when the sum of pod requests fits within allocatable, because requests do not cap actual usage; only limits do.
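
For step 6, a minimal sketch of reading the score the kubelet actually assigned, run on the node; it assumes containerd with crictl and jq available, and <container-id> is a placeholder.

# Find the container's main PID, then read its OOM score adjustment
PID=$(crictl inspect <container-id> | jq -r '.info.pid')
cat /proc/$PID/oom_score_adj   # -997 Guaranteed, 1000 BestEffort, 2-999 Burstable (scaled by request)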

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Container lastState.terminated.reason | Distinguishes cgroup OOM from external kills | OOMKilled with an increasing restart count |
| Node MemoryPressure condition | Indicates kubelet eviction is active or imminent | MemoryPressure=True |
| MemAvailable from /proc/meminfo | Actual memory the kernel can reclaim before OOM | Sustained decline toward the eviction threshold |
| Kernel OOM kill logs in dmesg | Reveals node-level OOM killer activity outside cgroup control | Any Out of memory: Killed process line |
| Container memory usage vs limit | Shows whether the limit is too low or a leak exists | Usage repeatedly flatlining at the limit |
| kubelet_evictions_total | Tracks managed evictions that precede or accompany OOM | Increasing counter for memory.available |
| Pod QoS class | Determines oom_score_adj and kernel OOM killer priority | Burstable pods with small requests on busy nodes |
| Runtime memory metrics | Off-heap memory is invisible to application-level heap metrics | Heap well below limit but container OOMKilled |

Fixes

If the cause is a low container limit

Raise resources.limits.memory and set resources.requests.memory to the same value. Equal requests and limits promote the pod to Guaranteed QoS, which protects it from the node-level OOM killer. Do not raise the limit blindly. Add only enough headroom to cover observed peak usage plus runtime overhead (typically 10-20% above the application’s peak resident set).
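
One way to apply matching values without editing the manifest by hand; the deployment name and sizes are illustrative and should come from observed peak usage plus headroom.

# Matching requests and limits promote the pod to Guaranteed QoS
kubectl set resources deployment/<name> --requests=cpu=500m,memory=512Mi --limits=cpu=500m,memory=512Mi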

If the cause is an application memory leak

Profile the application to find the leak. For JVM workloads, capture heap dumps and enable native memory tracking. Raising the limit will only delay the next crash. Fix the root cause.
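
For JVM workloads, one hedged example of wiring this up through the JAVA_TOOL_OPTIONS environment variable; the deployment name and dump path are placeholders.

# Capture a heap dump on OutOfMemoryError and enable native memory tracking
kubectl set env deployment/<name> JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:NativeMemoryTracking=summary"
# Inside the running container (assumes the JVM is PID 1)
jcmd 1 VM.native_memory summary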

If the cause is a JVM or managed runtime

Configure the runtime to read cgroup memory limits. Cap the heap at roughly 70-80% of the container limit to leave headroom for off-heap memory such as metaspace, direct buffers, and thread stacks. If the runtime predates cgroup awareness, it may default to 1/4 of the host memory, which guarantees an OOM kill on a container with a smaller limit.
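
A sketch of cgroup-aware heap sizing on modern JDKs; the 75% figure is an assumption to tune against your off-heap footprint, not a rule.

# Cap the heap relative to the container limit instead of host RAM (JDK 10+, backported to later JDK 8 updates)
kubectl set env deployment/<name> JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0"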

If the cause is node-level memory exhaustion

Identify the top consumer on the node with ps aux --sort=-%mem or per-cgroup memory metrics. If a single pod is responsible, evict it manually with kubectl delete pod. To evacuate the entire node, use kubectl drain <node>: this is disruptive and will reschedule all pods. If the node is chronically oversubscribed, add capacity or reduce workload density. Cordon the node (kubectl cordon <node>) to stop new scheduling while you remediate.
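
The cordon-then-drain sequence, with the flags most workloads need; expect disruption while pods reschedule.

# Stop new pods from landing on the node, then evacuate it
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data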

If the cause is Burstable QoS causing premature node OOM death

Set resources.requests.memory equal to resources.limits.memory (and CPU requests equal to limits) for critical pods. This moves the pod to Guaranteed QoS and sets oom_score_adj=-997, which is the strongest protection available against the node-level OOM killer.

If kubelet eviction is too slow to prevent kernel OOM

Raise the hard eviction threshold (for example, memory.available < 5% or a larger absolute Mi value) so kubelet begins evicting before the kernel OOM killer fires. Increase the eviction pressure transition period to dampen oscillation between pressure and normal states. This gives kubelet more runway to act.
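
As a sketch, the relevant KubeletConfiguration fields look like the fragment below; the values are illustrative, and how the file is delivered to nodes (kubeadm, managed node groups, config management) varies by distribution.

# Example fragment to merge into the kubelet config (often /var/lib/kubelet/config.yaml)
cat <<'EOF' > /tmp/kubelet-eviction-fragment.yaml
evictionHard:
  memory.available: "500Mi"
evictionPressureTransitionPeriod: "5m"
EOF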

If cgroup v2 kills the entire container when only one subprocess should die

On cgroup v2 nodes, the kernel may kill all processes in the container cgroup together. If your workload relies on a single subprocess dying and the container surviving, verify whether your Kubernetes version supports opting out of group OOM kill behavior for that pod.
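
To see whether group kill is in effect for a given container, check the cgroup v2 knob on the node; the path placeholder comes from the crictl lookup shown in the quick checks.

# 1 means the kernel kills every process in the cgroup together
cat /sys/fs/cgroup/<path-from-above>/memory.oom.group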

Prevention

  • Set memory requests equal to limits for critical workloads to achieve Guaranteed QoS and protect against node-level OOM.
  • Monitor MemAvailable on nodes, not MemFree, and alert when it trends below 15-20% of total memory (a quick check is sketched after this list).
  • Alert when container memory usage is consistently above 80% of its limit.
  • Properly configure kube-reserved and system-reserved so node allocatable reflects real capacity.
  • Tune JVM and runtime memory settings to stay well within cgroup limits, including off-heap overhead.
  • Review Burstable pods on dense nodes. Small memory requests create high oom_score_adj values that surprise operators during node-level memory pressure.
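
For the MemAvailable bullet above, a minimal node-local check; the ~15% threshold is an assumption to adjust for your workloads.

# Percentage of total memory the kernel can still reclaim; alert when it dips below ~15%
awk '/MemTotal/ {t=$2} /MemAvailable/ {a=$2} END {printf "MemAvailable: %.1f%% of total\n", a/t*100}' /proc/meminfo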

How Netdata helps

Netdata correlates per-container cgroup memory usage against memory.max limits, node MemAvailable, kubelet_evictions_total, and kernel OOM events on a single timeline. Alerts for memory saturation and cgroup pressure fire before the OOM killer or eviction manager act, reducing the need to cross-reference kubectl, dmesg, and node SSH sessions manually.