Kubernetes node MemoryPressure: detection, eviction order, and prevention

Before adding RAM, determine whether kubelet is evicting because workloads are genuinely starving or because memory requests are misaligned with reality.

What this means

Kubelet evaluates the eviction signal memory.available against an eviction threshold. On Linux the default hard threshold is memory.available < 100Mi. Kubelet derives the signal from cgroup statistics, not from free -m. It takes the root cgroup's memory usage and subtracts inactive file-backed pages to get the working set (RSS, active page cache, and other kernel memory charged to the cgroup). It then subtracts that working set from node memory capacity. When the signal crosses the threshold, kubelet sets the node condition MemoryPressure=True, and the node lifecycle controller adds the taint node.kubernetes.io/memory-pressure:NoSchedule. Pods without a matching toleration cannot be scheduled onto the node until the condition clears.
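The arithmetic behind memory.available can be sketched in a few shell functions. This is a minimal sketch that assumes the formula in the Kubernetes node-pressure eviction documentation (working set = usage minus inactive file pages). The function names are hypothetical, and the inputs are passed in as byte counts rather than read from the node.

```bash
#!/usr/bin/env bash
# Sketch of the memory.available arithmetic from the Kubernetes node-pressure
# eviction docs. Function names are hypothetical; all values are in bytes.

# working_set USAGE INACTIVE_FILE
# Working set = cgroup memory usage minus inactive file-backed pages,
# floored at zero.
working_set() {
  local usage=$1 inactive_file=$2
  if (( usage < inactive_file )); then
    echo 0
  else
    echo $(( usage - inactive_file ))
  fi
}

# memory_available CAPACITY USAGE INACTIVE_FILE
# Node capacity minus working set: the value compared against the threshold.
memory_available() {
  local capacity=$1 ws
  ws=$(working_set "$2" "$3")
  echo $(( capacity - ws ))
}

# under_pressure AVAILABLE [THRESHOLD]
# Succeeds (exit 0) when AVAILABLE is below THRESHOLD, which defaults to
# 100Mi, the Linux default hard threshold for memory.available.
under_pressure() {
  local available=$1
  local threshold=${2:-104857600}   # 100 * 1024 * 1024 bytes
  (( available < threshold ))
}
```

On a cgroup v1 node, the Kubernetes documentation reads USAGE from /sys/fs/cgroup/memory/memory.usage_in_bytes and INACTIVE_FILE from the total_inactive_file line of /sys/fs/cgroup/memory/memory.stat. cgroup v2 nodes expose different files, so treat those paths as v1-only.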

Once the threshold is crossed, kubelet's eviction manager evicts pods to reclaim memory. The order is not random. Pods whose memory usage exceeds their request are evicted first. This group always includes BestEffort pods, whose request is zero, along with Burstable pods running above their requests. Guaranteed pods and Burstable pods whose usage is within requests are evicted last. Within each group, kubelet ranks pods by PriorityClass (lower priority first), then by how far usage exceeds the request (largest first). A low-priority Burstable pod can therefore be evicted before a higher-priority BestEffort pod. Guaranteed pods are the last to be touched, but they are not immune.
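The ranking rule can be expressed as a sort. This sketch encodes the ordering given in the Kubernetes node-pressure eviction documentation: usage above request first, then priority, then size of the excess. The rank_for_eviction name and its whitespace-separated input format are hypothetical. Kubelet's real implementation works on live stats, not text.

```bash
#!/usr/bin/env bash
# rank_for_eviction (hypothetical helper): reads "NAME PRIORITY USAGE_MI REQUEST_MI"
# lines on stdin and prints pod names in memory-pressure eviction order:
#   1. pods whose usage exceeds their request come before pods within request
#   2. within that split, lower priority comes first
#   3. remaining ties go to the largest usage above request
# BestEffort pods have a request of 0, so any usage puts them in group 1.
rank_for_eviction() {
  awk '{
    exceeds = ($3 > $4) ? 0 : 1      # 0 sorts ahead of 1
    print exceeds, $2, $3 - $4, $1   # group, priority, excess, name
  }' | sort -k1,1n -k2,2n -k3,3nr | awk '{ print $4 }'
}
```

Consider an input of be-cache (priority 0, 300Mi used, no request) and burst-api (priority 0, 900Mi used, 512Mi requested). The Burstable burst-api ranks ahead of the BestEffort be-cache because both have priority 0 and its 388Mi excess is larger.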

This mechanism is separate from the kernel OOM killer. When a container exceeds its cgroup memory limit, the kernel kills it immediately with no grace period, and the container is reported as OOMKilled. Kubelet eviction is a node-level, signal-driven process that ranks pods by usage relative to requests and by priority before acting.
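Telling the two mechanisms apart comes down to where the reason is recorded. Evicted appears in the pod's status.reason, while OOMKilled appears in a container's lastState.terminated.reason. Below is a minimal sketch with a hypothetical classify_termination helper that takes those two strings as arguments.

```bash
#!/usr/bin/env bash
# classify_termination POD_REASON CONTAINER_REASON (hypothetical helper)
# POD_REASON is the pod's .status.reason; CONTAINER_REASON is a container's
# .lastState.terminated.reason (pass "" when empty). Prints which mechanism
# ended the workload. Sketch only: real pods can carry other reasons.
classify_termination() {
  local pod_reason=$1 container_reason=$2
  if [[ $pod_reason == "Evicted" ]]; then
    echo "kubelet-eviction"   # node-level pressure, ranked by usage and priority
  elif [[ $container_reason == "OOMKilled" ]]; then
    echo "kernel-oom"         # container exceeded its own cgroup memory limit
  else
    echo "other"
  fi
}
```

The inputs can come from kubectl get pod <pod> -o jsonpath='{.status.reason}' and kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'.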

```mermaid
flowchart TD
    A["Memory usage grows"] --> B{"Kernel OOM or kubelet eviction?"}
    B -->|"Container exceeds its cgroup limit"| C["Kernel OOM kill: container OOMKilled immediately"]
    B -->|"memory.available < eviction-hard threshold"| D["Kubelet sets MemoryPressure=True"]
    D --> E["Taint node.kubernetes.io/memory-pressure:NoSchedule added"]
    E --> F{"Rank pods for eviction"}
    F --> G["First: BestEffort and Burstable with usage > request, by priority then excess"]
    F --> H["Last: Guaranteed and Burstable with usage <= request, by priority"]
    G --> J["Reclaim memory"]
    H --> J
    J --> K{"Pressure clears?"}
    K -->|"No"| F
    K -->|"Yes"| L["Remove condition after transition period"]
```

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Actual workload memory growth | Working set climbs steadily | kubectl top nodes and kubectl top pods |
| Missing or low memory requests | Scheduler overcommits the node | Allocated resources in kubectl describe node |
| Memory leak in application | Pod restart count climbing with OOMKilled | Pod status and lastState.terminated.reason |
| Active page cache accumulation | Kubelet reports pressure while MemAvailable still shows headroom | Active(file) and Inactive(file) in /proc/meminfo on the node |
| Sudden memory spike or thundering herd | MemoryPressure appears abruptly, then clears after the spike | Node memory utilization graph over time |
| Kubelet eviction threshold set too aggressively | Node enters pressure under normal load | evictionHard in the kubelet config (--eviction-hard) |

Quick checks

```bash
# Check node conditions for MemoryPressure
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'

# Check taint
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[?(@.key=="node.kubernetes.io/memory-pressure")].effect}{"\n"}{end}'

# Check kubelet eviction configuration on the node (path varies by distribution;
# no output usually means the defaults or a command-line flag apply)
grep -A5 'evictionHard' /var/lib/kubelet/config.yaml

# Check requested memory vs allocatable (requests, not actual usage)
kubectl describe node <node-name> | grep -A8 "Allocated resources"

# Check for evicted pods
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>,status.phase=Failed | grep Evicted

# Check system memory composition, including active and inactive page cache
grep -E '^(MemTotal|MemFree|MemAvailable|Active\(file\)|Inactive\(file\)):' /proc/meminfo

# Check kubelet eviction metrics (requires RBAC access to nodes/proxy)
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics | grep kubelet_evictions

# Check OOMKilled status for every container, not just the first
kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```

How to diagnose it

  1. Verify the node condition and taint. Confirm MemoryPressure=True and the node.kubernetes.io/memory-pressure:NoSchedule taint are present. This blocks new scheduling.
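  2. Distinguish eviction from OOMKill. Check each container's lastState.terminated.reason and the pod's status.reason. OOMKilled on a container means it hit its cgroup memory limit. Evicted on the pod means kubelet removed it because of node pressure. OOMKilled points to a limit or a leak; Evicted points to node capacity or request alignment.
  3. Inspect node memory composition. Compare MemFree, MemAvailable, Active(file), and Inactive(file) in /proc/meminfo. Low MemFree with high MemAvailable means memory is held in reclaimable page cache. Kubelet uses neither field. It subtracts only inactive file pages from usage, so a large Active(file) counts against memory.available and can trigger eviction while MemAvailable still looks healthy.
  4. Compare requested versus actual usage. kubectl describe node shows allocated requests; kubectl top pods shows real utilization. If actual usage far exceeds requests, the node is overcommitted: the scheduler placed pods using requests that understate real usage.
  5. Identify evicted pods. Query events with reason=Evicted (kubectl get events -A --field-selector reason=Evicted) and correlate them with pod QoS classes. If Guaranteed pods are being evicted, no pods above their requests were left to reclaim from. That usually means system daemons or other non-pod processes are using more memory than kube-reserved and system-reserved allow.
  6. Check kubelet logs. Look for lines prefixed "eviction manager:" that name the signal memory.available and the pod selected for eviction. They confirm that the threshold was crossed and show which pod was chosen.
  7. Verify eviction threshold configuration. Inspect --eviction-hard (evictionHard in the kubelet config file). Setting any value there replaces the entire default set, and any default signal you do not list is set to zero. If you raise a threshold past the node's current memory.available, kubelet may evict aggressively.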
  8. Look for memory leaks. A climbing restart count with OOMKilled means a container is hitting its cgroup limit and contributing to pressure. Fix the leak or right-size the limit.
  9. Check scheduler versus kubelet divergence. Kubelet reports allocatable = capacity - kube-reserved - system-reserved - eviction-hard threshold, and the scheduler places pods against allocatable using requests only. If workloads consume more than their requests but less than their limits, the scheduler sees room while kubelet sees pressure. Accurate requests close this gap.
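Step 9 can be worked through numerically with shell integer arithmetic in MiB. The helper names and the example figures are illustrative assumptions, not recommended values. The example is a 16 GiB node with 1 GiB kube-reserved, 512 MiB system-reserved, and the 100Mi default hard threshold.

```bash
#!/usr/bin/env bash
# Sketch of the Node Allocatable arithmetic. Hypothetical helpers; all
# values are integers in MiB.

# allocatable CAPACITY KUBE_RESERVED SYSTEM_RESERVED EVICTION_HARD
allocatable() {
  echo $(( $1 - $2 - $3 - $4 ))
}

# divergence ALLOCATABLE SUM_REQUESTS SUM_WORKING_SET
# scheduler_headroom: room left by requests (what the scheduler sees).
# usage_headroom: room left by actual pod usage. A negative value means pods
# are eating into the reserved and eviction-buffer memory.
divergence() {
  local alloc=$1 req=$2 ws=$3
  echo "scheduler_headroom=$(( alloc - req )) usage_headroom=$(( alloc - ws ))"
}
```

allocatable 16384 1024 512 100 gives 14748. Suppose pods request 12000 MiB and actually use 15000 MiB. Then divergence 14748 12000 15000 prints scheduler_headroom=2748 usage_headroom=-252: the scheduler still sees room, but pod usage has already passed allocatable.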

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| kube_node_status_condition{condition="MemoryPressure",status="true"} | Binary indicator of node pressure | Value == 1 |
| node_memory_MemAvailable_bytes vs kubelet-reported available | Actual node memory state vs kubelet's view | MemAvailable trending toward the eviction threshold |
| kubelet_evictions | Count of kubelet-driven evictions, labeled by eviction signal | Counter incrementing |
| kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} | Cgroup-level OOM kills | Value 1 on containers whose restart count is rising |
| container_memory_working_set_bytes per pod | Actual pod memory consumption | Sustained usage above request |
| Node requested vs allocatable memory ratio | Scheduling headroom | Requests approaching 80% of allocatable |
| MemoryPressure condition age | Persistence of pressure | Condition True well beyond the transition period (5 minutes by default) |
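The 80% request ratio in the table can be computed from the Allocated resources figures. Below is a minimal sketch with hypothetical helper names. It assumes integer inputs in the same unit (for example MiB) and a non-zero allocatable value.

```bash
#!/usr/bin/env bash
# request_ratio_pct REQUESTED ALLOCATABLE (hypothetical helper)
# Integer percentage of allocatable memory already claimed by requests.
request_ratio_pct() {
  echo $(( $1 * 100 / $2 ))
}

# check_headroom REQUESTED ALLOCATABLE [WARN_PCT]
# Prints WARN once requests reach WARN_PCT (default 80) of allocatable,
# otherwise OK, followed by the percentage.
check_headroom() {
  local pct
  pct=$(request_ratio_pct "$1" "$2")
  if (( pct >= ${3:-80} )); then
    echo "WARN ${pct}%"
  else
    echo "OK ${pct}%"
  fi
}
```

For a node with 14748 MiB allocatable, check_headroom 12000 14748 prints WARN 81%.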

Fixes

If the cause is resource pressure

Cordon the node so it stays unschedulable after the MemoryPressure condition clears. Identify the highest-memory consumers with kubectl top pods (for example, kubectl top pods -A --sort-by=memory). For Burstable pods exceeding their requests, scale horizontally or raise requests and limits. If Guaranteed pods are being evicted, the node needs more physical memory or fewer pods. Do not tune eviction thresholds to mask genuine capacity exhaustion.

If the cause is configuration drift

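Adjust kubelet --eviction-hard (evictionHard in the kubelet config file) if thresholds are unrealistically high for the node size. Set --eviction-minimum-reclaim so each eviction frees a margin beyond the threshold instead of stopping just above it. Use --eviction-pressure-transition-period to set how long pressure must stay absent before the condition is removed, which damps flapping. Warning: setting --eviction-hard by flag or kubelet config replaces the entire default set, so re-list every signal you still need.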
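A partial --eviction-hard override drops the defaults you leave out, so it is worth checking an override before rolling it out. This sketch assumes the flag's comma-separated signal<quantity form with no spaces. It checks only the four long-standing Linux defaults (memory.available, nodefs.available, nodefs.inodesFree, imagefs.available); newer kubelet versions may define more. The missing_default_signals name is hypothetical.

```bash
#!/usr/bin/env bash
# missing_default_signals FLAG_VALUE (hypothetical helper)
# FLAG_VALUE is an --eviction-hard string such as
# "memory.available<500Mi,nodefs.available<10%". Prints each assumed Linux
# default signal the override leaves out, one per line.
missing_default_signals() {
  local value=$1 sig
  # Assumption: the classic Linux defaults; newer kubelets may add more.
  local -a defaults=(memory.available nodefs.available nodefs.inodesFree imagefs.available)
  for sig in "${defaults[@]}"; do
    # Match the signal name at the start of a comma-separated entry.
    if [[ ",$value" != *",$sig<"* ]]; then
      echo "$sig"
    fi
  done
}
```

An override of memory.available<500Mi,nodefs.available<10% reports nodefs.inodesFree and imagefs.available. Those two signals would lose their thresholds entirely.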

If the cause is application behavior

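Set memory requests to steady-state usage. Set limits with headroom for spikes. Use LimitRange defaults so pods without explicit values still get requests and limits and never run as BestEffort. For leaks, fix the application. VerticalPodAutoscaler can right-size requests for legitimate growth, but it will not stop a leak.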
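QoS class follows mechanically from requests and limits, which is why a LimitRange default is enough to keep a pod out of BestEffort. The sketch below is a simplified single-container version with a hypothetical qos_class helper. Real classification considers every container and parses quantities rather than comparing strings.

```bash
#!/usr/bin/env bash
# qos_class CPU_REQ CPU_LIM MEM_REQ MEM_LIM (hypothetical helper)
# Simplified QoS classification for a single-container pod. Pass "-" for an
# unset value. Quantities are compared as literal strings, so "1Gi" and
# "1024Mi" count as different values.
qos_class() {
  local cpu_req=$1 cpu_lim=$2 mem_req=$3 mem_lim=$4
  # Nothing set at all: BestEffort, always in the first eviction group.
  if [[ $cpu_req == - && $cpu_lim == - && $mem_req == - && $mem_lim == - ]]; then
    echo BestEffort
    return
  fi
  # An unset request defaults to the limit when a limit is given.
  if [[ $cpu_req == - ]]; then cpu_req=$cpu_lim; fi
  if [[ $mem_req == - ]]; then mem_req=$mem_lim; fi
  # Guaranteed needs both limits set and requests equal to limits.
  if [[ $cpu_lim != - && $mem_lim != - && $cpu_req == "$cpu_lim" && $mem_req == "$mem_lim" ]]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}
```

Once a LimitRange injects any default request or limit, the first branch can no longer match, so the pod is at least Burstable.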

If the cause is working set confusion

On nodes with heavy file I/O, active page cache inflates working set. This is expected; do not disable eviction. Provision more memory or reduce workload density.
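To see how much node memory sits in active versus inactive page cache, summarize /proc/meminfo. This is a minimal sketch with a hypothetical meminfo_summary helper. It reads node-wide figures, which only approximate the root-cgroup statistics kubelet actually uses.

```bash
#!/usr/bin/env bash
# meminfo_summary (hypothetical helper): reads /proc/meminfo-format text on
# stdin and prints MemFree, MemAvailable, Active(file) and Inactive(file)
# converted from kB to MiB.
meminfo_summary() {
  awk '
    /^MemFree:/          { free  = $2 }
    /^MemAvailable:/     { avail = $2 }
    /^Active\(file\):/   { act   = $2 }
    /^Inactive\(file\):/ { inact = $2 }
    END {
      printf "free=%d available=%d active_file=%d inactive_file=%d\n",
        int(free / 1024), int(avail / 1024), int(act / 1024), int(inact / 1024)
    }'
}
```

Run it on the node as meminfo_summary < /proc/meminfo. Watch for a large active_file figure alongside a healthy available figure. That combination fits this cause: page cache that MemAvailable treats as reclaimable but kubelet counts as working set.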

Prevention

  • Set realistic memory requests and limits for every pod. Use LimitRange policies to enforce defaults.
  • Monitor the gap between container_memory_working_set_bytes and pod memory requests. A growing gap signals overcommit before eviction triggers.
  • Configure kubelet reserved resources with --kube-reserved and --system-reserved so the scheduler accounts for node overhead.
  • Tune eviction thresholds based on node size. Use --eviction-minimum-reclaim and --eviction-pressure-transition-period to avoid repeated evictions and rapid condition flapping.
  • Enable node-problem-detector or equivalent to catch memory pressure trends before they reach the eviction threshold.
  • Use Guaranteed QoS with requests equal to limits, and assign PriorityClasses so critical pods are evicted last. Guaranteed does not prevent OOMKill if the limit itself is wrong.
  • Do not rely on free -m for capacity planning. Use MemAvailable and kubelet working-set metrics.
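Monitoring the gap between working set and requests can start as a simple filter. This sketch assumes you have already summed container_memory_working_set_bytes per pod and converted it and the pod's memory request to MiB. The over_request name and its input format are hypothetical.

```bash
#!/usr/bin/env bash
# over_request (hypothetical helper): reads "POD WORKING_SET_MI REQUEST_MI"
# lines on stdin and prints pods whose working set exceeds their request,
# largest gap first, as POD=+GAPMi.
over_request() {
  awk '$2 > $3 { print $2 - $3, $1 }' | sort -k1,1nr | awk '{ print $2 "=+" $1 "Mi" }'
}
```

Under memory pressure, kubelet places the pods this filter lists in its first eviction group. A growing list is an early warning.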

How Netdata helps

  • Correlates node_memory_MemAvailable_bytes with the kubelet_evictions counter to show the exact moment pressure turns into pod evictions.
  • Tracks kube_node_status_condition transitions and taint application latency to catch scheduling blocks immediately.
  • Visualizes per-pod container_memory_working_set_bytes against requests and limits to surface overcommit before eviction triggers.
  • Alerts on sustained MemoryPressure conditions alongside system-level MemAvailable trends, distinguishing kubelet eviction from kernel OOM.
  • Aggregates node-level memory composition (cache, buffers, RSS) so you can see whether pressure is from active workloads or reclaimable page cache.