Kubernetes node MemoryPressure: detection, eviction order, and prevention

Before adding RAM, determine whether kubelet is evicting because workloads are genuinely starving or because memory requests are misaligned with reality.

What this means

Kubelet evaluates memory.available against an eviction threshold. On Linux the default hard threshold is memory.available < 100Mi. Kubelet derives this value from cgroup stats, not from free -m: it measures working-set memory (RSS plus active file-backed pages) and subtracts that from total capacity. When the threshold is crossed, kubelet sets the node condition MemoryPressure=True, and the control plane's node controller responds by adding the taint node.kubernetes.io/memory-pressure:NoSchedule. Pods that do not tolerate the taint are kept off the node until the condition clears. Kubernetes adds that toleration automatically to Burstable and Guaranteed pods, so in practice it is new BestEffort pods that are blocked.
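As a rough illustration of the arithmetic above, the sketch below derives memory.available from three byte counts. It assumes the formula shown in the upstream Kubernetes documentation (working set = cgroup usage minus inactive file pages). The function names and sample numbers are hypothetical; on a real node the inputs would come from the root memory cgroup and /proc/meminfo.

```shell
# Sketch only: function names are illustrative, not kubelet APIs.

# memory.available = capacity - working set, where
# working set = cgroup usage - inactive file pages (floored at zero).
memory_available_bytes() (
  capacity=$1
  usage=$2
  inactive_file=$3
  working_set=$((usage - inactive_file))
  if [ "$working_set" -lt 0 ]; then
    working_set=0
  fi
  echo $((capacity - working_set))
)

# Succeeds (exit 0) when available memory is below the hard threshold.
# The threshold defaults to 100Mi expressed in bytes.
below_hard_threshold() (
  available=$1
  threshold=${2:-104857600}
  [ "$available" -lt "$threshold" ]
)

# Made-up numbers: 8 GiB node, 6 GiB cgroup usage, 1 GiB inactive file pages.
available=$(memory_available_bytes 8589934592 6442450944 1073741824)
echo "memory.available: $available bytes"
if below_hard_threshold "$available"; then
  echo "hard eviction threshold crossed"
else
  echo "above hard eviction threshold"
fi
```

With these inputs the working set is 5 GiB, so 3 GiB remains available and the node is well above the 100Mi threshold.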

Once the threshold is crossed, kubelet’s eviction manager kills pods to reclaim memory. The eviction order is not random. Kubelet first targets pods whose memory usage exceeds their requests: BestEffort pods (which request nothing) and Burstable pods running above their requests. Guaranteed pods and Burstable pods whose usage is within requests come last. Within each group, kubelet ranks pods by priority ascending (lower priority first), then by how far usage exceeds the request. Guaranteed pods are the last to be touched, but they are not immune.
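The ranking can be approximated with a sort over three keys. The sketch below follows the ordering given in the upstream Kubernetes documentation (over-request pods first, then lower priority, then larger usage above request). The function name, pod names, and numbers are hypothetical, and the real eviction manager works on live stats rather than a text table.

```shell
# Sketch only: ranks eviction candidates from whitespace-separated input.
# Input columns: name  usage_mib  request_mib  priority
# Sort keys: over-request pods first, then lower priority first,
# then larger usage above request first.
rank_eviction_candidates() {
  awk '{
    over = $2 - $3
    exceeds = (over > 0) ? 1 : 0
    print exceeds, $4, over, $1
  }' |
    sort -k1,1nr -k2,2n -k3,3nr |
    awk '{ print $4 }'
}

# Hypothetical pods. BestEffort pods have a request of 0.
printf '%s\n' \
  'cache 300 0 0' \
  'api 600 512 1000' \
  'batch 900 256 0' \
  'db 1000 1024 1000' |
  rank_eviction_candidates
```

For this input the candidates come out as batch, cache, api, db: the two priority-0 pods that exceed their requests go first (largest overage leading), and db, which stays within its request, goes last.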

This mechanism is separate from the kernel OOM killer. An OOM kill fires immediately, with no grace period, when a container exceeds its cgroup memory limit. Kubelet eviction is a node-level, signal-driven process that respects pod priority and QoS.
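To tell the two mechanisms apart across many pods, a small filter can label each pod from its reported reasons. This is a sketch under assumptions: the function name and output labels are hypothetical, and it expects tab-separated input of pod name, pod-level status.reason, and container lastState.terminated.reason.

```shell
# Sketch only: classifies pods from tab-separated lines of
#   pod_name <TAB> pod status.reason <TAB> container lastState.terminated.reason
# Empty fields are expected when a reason is not set.
classify_termination() {
  awk -F'\t' '{
    if ($2 == "Evicted")       verdict = "kubelet-eviction"
    else if ($3 ~ /OOMKilled/) verdict = "kernel-oom-kill"
    else                       verdict = "other"
    print $1 "\t" verdict
  }'
}

# Sample input with made-up pod names.
printf 'web-1\tEvicted\t\nworker-2\t\tOOMKilled\nidle-3\t\t\n' | classify_termination

# Possible live feed (placeholder namespace, as elsewhere in this guide):
# kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.reason}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | classify_termination
```

Keeping the classifier separate from the kubectl call makes it easy to check against captured output before pointing it at a cluster.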

flowchart TD
    A["memory.available < eviction-hard threshold"] --> B{"Kernel OOM or kubelet eviction?"}
    B -->|"Container exceeds cgroup limit"| C["Kernel OOMKilled immediate"]
    B -->|"Node-level pressure"| D["Kubelet sets MemoryPressure=True"]
    D --> E["Taint node.kubernetes.io/memory-pressure:NoSchedule added"]
    E --> F{"Eviction order by QoS"}
    F --> G["BestEffort pods first"]
    F --> H["Burstable where usage > request"]
    F --> I["Guaranteed and Burstable where usage <= request last"]
    G --> J["Reclaim memory"]
    H --> J
    I --> J
    J --> K{"Pressure clears?"}
    K -->|"No"| F
    K -->|"Yes"| L["Remove condition after transition period"]

Common causes

Cause	What it looks like	First thing to check
Actual workload memory growth	Working set climbs steadily	`kubectl top nodes` and `kubectl top pods`
Missing or low memory requests	Scheduler overcommits the node	`kubectl describe node` allocated resources
Memory leak in application	Pod restart count climbing with OOMKilled	Pod status and `lastState.terminated.reason`
Kernel page cache accumulation	Kubelet reports low `memory.available` while `free -m` and `MemAvailable` still look healthy	`/proc/meminfo` and cgroup `memory.stat` on the node
Sudden memory spike or thundering herd	MemoryPressure flips rapidly then clears	Node memory utilization graph over time
Kubelet eviction threshold too tight	Node enters pressure under normal load	Kubelet config `--eviction-hard`

Quick checks

# Check node conditions for MemoryPressure
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'

# Check taint
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[?(@.key=="node.kubernetes.io/memory-pressure")].effect}{"\n"}{end}'

# Check kubelet eviction configuration (path varies by distribution)
grep -A5 'evictionHard' /var/lib/kubelet/config.yaml

# Check actual memory vs allocatable
kubectl describe node <node-name> | grep -A5 "Allocated resources"

# Check for evicted pods
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>,status.phase=Failed | grep Evicted

# Check system memory composition
grep -E "MemTotal|MemAvailable|MemFree" /proc/meminfo

# Check kubelet eviction metrics (requires node metrics access)
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics | grep kubelet_evictions

# Check pod OOMKilled status (all containers in each pod, not only the first)
kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'

How to diagnose it

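Verify the node condition and taint. Confirm MemoryPressure=True and the node.kubernetes.io/memory-pressure:NoSchedule taint are present. The taint keeps new pods that do not tolerate it, BestEffort pods in particular, off the node.
Distinguish eviction from OOMKill. Inspect the container's lastState.terminated.reason and the pod's status.reason. OOMKilled in lastState.terminated.reason means the container hit its cgroup limit. Evicted in status.reason (with the pod in phase Failed) means kubelet removed the pod due to node pressure. OOMKilled points to a limit or leak; Evicted points to node capacity or request alignment.
Inspect node memory composition. Compare MemFree and MemAvailable in /proc/meminfo with the memory.available value kubelet reports. The kernel counts most page cache as reclaimable, so MemAvailable can look healthy while kubelet sees pressure, because active file-backed pages are part of the working set kubelet measures. Kubelet evaluates working set, not MemFree.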
Compare requested versus actual usage. kubectl describe node shows allocated resources; kubectl top pods shows real utilization. If actual usage far exceeds requests, the scheduler overcommitted the node.
Identify evicted pods. Query events with reason=Evicted and correlate with pod QoS classes. Guaranteed pods being evicted means severe oversubscription.
Check kubelet logs. Look for eviction manager: log lines naming the signal memory.available and the selected pod. This confirms the threshold crossed and the victim chosen.
Verify eviction threshold configuration. Inspect --eviction-hard. Specifying this flag or config field replaces the entire default set. If you tighten a threshold while the node is already violating it, kubelet may evict aggressively.
Look for memory leaks. A climbing restart count with OOMKilled means a container is hitting its cgroup limit and contributing to pressure. Fix the leak or right-size the limit.
Check scheduler versus kubelet divergence. The scheduler places pods against node allocatable, which kubelet reports as capacity - kube-reserved - system-reserved - eviction-hard. The scheduler counts requests, not usage, so if workloads consume more than their requests but less than their limits, the scheduler sees room while kubelet sees pressure. Accurate requests close this gap.
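The divergence in the last step is easiest to see with numbers. The sketch below applies the allocatable formula from the text to a hypothetical 16 GiB node; the function name and every value are made up for illustration.

```shell
# Sketch only: all values in MiB, all numbers hypothetical.
# allocatable = capacity - kube-reserved - system-reserved - eviction-hard
node_allocatable_mib() (
  capacity=$1
  kube_reserved=$2
  system_reserved=$3
  eviction_hard=$4
  echo $((capacity - kube_reserved - system_reserved - eviction_hard))
)

# 16 GiB node, 512 MiB kube-reserved, 512 MiB system-reserved, 100 MiB hard threshold.
allocatable=$(node_allocatable_mib 16384 512 512 100)
requested=12000   # sum of pod memory requests on the node
used=15300        # sum of pod working sets on the node

echo "allocatable:        ${allocatable}Mi"
echo "scheduler headroom: $((allocatable - requested))Mi"
echo "real headroom:      $((allocatable - used))Mi"
```

Here allocatable is 15260Mi. The scheduler still sees 3260Mi of room because it only counts requests, while actual usage is already 40Mi past allocatable.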

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`kube_node_status_condition{condition="MemoryPressure"}`	Binary indicator of node pressure	Value == 1
`node_memory_MemAvailable_bytes` vs kubelet-reported available	Actual node memory state vs kubelet’s view	MemAvailable trending toward eviction threshold
`kubelet_evictions`	Count of kubelet-driven evictions, labeled by eviction signal	Counter incrementing
`kube_pod_container_status_terminated_reason{reason="OOMKilled"}`	Cgroup-level OOM events	Rate increasing
`container_memory_working_set_bytes` per pod	Actual pod memory consumption	Sustained usage above request
Node allocatable vs requested memory ratio	Scheduling headroom	Requests approaching 80% of allocatable
MemoryPressure condition age	Persistence of pressure	Condition True for > 5 minutes
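The requests-to-allocatable warning sign in the table can be checked with integer arithmetic. This is a minimal sketch: the function names are hypothetical, both inputs must use the same unit, and the values would normally be read from the "Allocated resources" section of kubectl describe node.

```shell
# Sketch only: integer percentage of allocatable memory already requested.
requested_pct() (
  requested=$1
  allocatable=$2
  echo $((requested * 100 / allocatable))
)

# Prints a warning once requests reach the given percentage (default 80).
check_request_headroom() (
  pct=$(requested_pct "$1" "$2")
  limit=${3:-80}
  if [ "$pct" -ge "$limit" ]; then
    echo "WARN: ${pct}% of allocatable memory requested"
  else
    echo "OK: ${pct}% of allocatable memory requested"
  fi
)

# Hypothetical node: 12800Mi requested out of 15260Mi allocatable.
check_request_headroom 12800 15260
```

The 80% default mirrors the warning sign in the table; pass a third argument to use a different cutoff.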

Fixes

If the cause is resource pressure

Cordon the node to stop new scheduling. Identify highest-memory consumers with kubectl top pods. For Burstable pods exceeding requests, scale horizontally or increase requests and limits. If Guaranteed pods are evicted, the node needs more physical memory or fewer pods. Do not tune eviction thresholds to mask genuine capacity exhaustion.

If the cause is configuration drift

Adjust kubelet --eviction-hard if thresholds are unrealistically tight. Set --eviction-minimum-reclaim to create hysteresis and avoid flapping. Warning: specifying --eviction-hard via flag or kubelet config replaces the entire default set. If you tighten a threshold while the node is already violating it, kubelet may evict aggressively.
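Because overriding one signal replaces the whole default set, a config change should restate every threshold. The sketch below prints a KubeletConfiguration fragment that does so. The memory.available values are placeholders rather than recommendations, the function name is hypothetical, and the remaining entries repeat the Linux defaults as documented upstream, which should be verified against your kubelet version.

```shell
# Sketch only: prints a KubeletConfiguration fragment to stdout.
# The memory.available values are placeholders, not recommendations.
# The other evictionHard entries restate documented Linux defaults
# so that overriding one signal does not drop the rest.
print_eviction_config() {
  cat <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
evictionMinimumReclaim:
  memory.available: "100Mi"
EOF
}

print_eviction_config
```

Merge the output into the kubelet config file for your distribution and restart kubelet; the evictionMinimumReclaim entry asks kubelet to reclaim that much beyond the threshold before it stops evicting.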

If the cause is application behavior

Set memory requests to steady-state usage. Set limits with headroom for spikes. Use LimitRange defaults to prevent BestEffort pods. For leaks, fix the application. VerticalPodAutoscaler can right-size requests as legitimate usage grows, but it will not contain a leak.
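A LimitRange with default memory values means containers that omit requests no longer land as BestEffort. The sketch below emits such a manifest; the function name, object name, namespace, and sizes are all hypothetical and should be replaced with values measured from steady-state usage.

```shell
# Sketch only: emits a LimitRange manifest with hypothetical values.
# Usage: print_memory_limitrange <namespace> <default-request> <default-limit>
print_memory_limitrange() {
  cat <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-defaults
  namespace: $1
spec:
  limits:
  - type: Container
    defaultRequest:
      memory: $2
    default:
      memory: $3
EOF
}

print_memory_limitrange team-a 256Mi 512Mi
```

The output can be reviewed and then piped to kubectl apply -f - once the values look right. Defaults apply only to containers created after the LimitRange exists.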

If the cause is working set confusion

On nodes with heavy file I/O, active page cache inflates working set. This is expected; do not disable eviction. Provision more memory or reduce workload density.
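To see how much of a cgroup's memory is file cache, and how much of that is active and therefore counted in the working set, the memory.stat file can be summarized. This sketch assumes the "key value" line format with active_file and inactive_file keys; the function name is hypothetical, and the location of memory.stat depends on cgroup version and container runtime.

```shell
# Sketch only: summarizes file-cache counters from memory.stat-style input.
# stdin: lines of "key value" with values in bytes.
file_cache_breakdown() {
  awk '
    $1 == "active_file"   { active = $2 }
    $1 == "inactive_file" { inactive = $2 }
    END {
      printf "active_file_mib=%d inactive_file_mib=%d\n", active / 1048576, inactive / 1048576
    }'
}

# Sample input with made-up values: 2 GiB active, 1 GiB inactive.
printf 'anon 4294967296\nactive_file 2147483648\ninactive_file 1073741824\n' |
  file_cache_breakdown
```

A large active_file figure on an I/O-heavy node is consistent with the behavior described above: that memory counts toward the working set even though the kernel could reclaim much of it.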

Prevention

Set realistic memory requests and limits for every pod. Use LimitRange policies to enforce defaults.
Monitor the gap between container_memory_working_set_bytes and pod memory requests. A growing gap signals overcommit before eviction triggers.
Configure kubelet reserved resources with --kube-reserved and --system-reserved so the scheduler accounts for node overhead.
Tune eviction thresholds based on node size. Set --eviction-minimum-reclaim to avoid rapid condition flapping.
Enable node-problem-detector or equivalent to catch memory pressure trends before they reach the eviction threshold.
Use Guaranteed QoS with requests equal to limits, and assign PriorityClasses so critical pods are evicted last. Guaranteed does not prevent OOMKill if the limit itself is wrong.
Do not rely on free -m for capacity planning. Use MemAvailable and kubelet working-set metrics.

How Netdata helps

Correlates node_memory_MemAvailable_bytes with kubelet_evictions to show the exact moment pressure turns into pod death.
Tracks kube_node_status_condition transitions and taint application latency to catch scheduling blocks immediately.
Visualizes per-pod container_memory_working_set_bytes against requests and limits to surface overcommit before eviction triggers.
Alerts on sustained MemoryPressure conditions alongside system-level MemAvailable trends, distinguishing kubelet eviction from kernel OOM.
Aggregates node-level memory composition (cache, buffers, RSS) so you can see whether pressure is from active workloads or reclaimable page cache.
