Kubernetes node PIDPressure: detection and remediation

PID exhaustion is a cliff-edge failure: once the kernel cannot fork, containers fail to start, health checks fail, and ssh to the node may hang. Kubernetes surfaces this through the PIDPressure node condition, but many clusters ship without PID-based eviction thresholds. Without them, the first symptom is usually EAGAIN or ENOMEM from fork failures, not a kubelet eviction.

This guide shows how to detect PIDPressure before it triggers an outage, distinguish between application leaks, runtime shim accumulation, and kernel limits, and remediate the root cause. You will correlate node-level PID utilization with specific pods, validate kubelet cgroup enforcement, and configure thresholds that provide lead time.

What this means

The kubelet monitors node-level PID capacity through the pid.available eviction signal, computed from node.stats.rlimit.maxpid - node.stats.rlimit.curproc. When available PIDs drop below the configured hard eviction threshold, kubelet sets PIDPressure=True and taints the node with node.kubernetes.io/pid-pressure:NoSchedule. The scheduler stops placing new pods, and the kubelet eviction manager may remove pods.
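
The subtraction behind pid.available can be reproduced by hand on a node. The sketch below is a hypothetical helper, not kubelet code: it assumes the task total in the fourth field of /proc/loadavg is an acceptable stand-in for curproc and that kernel.pid_max stands in for maxpid. The kubelet's exact data sources may differ by version.

```shell
# pid_available [PID_MAX_FILE] [LOADAVG_FILE]
# Hypothetical helper: prints maxpid - curproc using procfs values.
# Both arguments default to the real procfs paths and exist only so the
# arithmetic can be exercised against sample files.
pid_available() {
    pid_max_file="${1:-/proc/sys/kernel/pid_max}"
    loadavg_file="${2:-/proc/loadavg}"
    maxpid=$(cat "$pid_max_file") || return 1
    # Field 4 of /proc/loadavg is "runnable/total" scheduling entities.
    # The total includes threads as well as processes.
    curproc=$(cut -d' ' -f4 "$loadavg_file" | cut -d/ -f2) || return 1
    echo $((maxpid - curproc))
}
```

Running `pid_available` with no arguments on a node prints the remaining headroom, which can be compared with whatever threshold the kubelet is configured to enforce.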

Two caveats. First, eviction is reactive and periodic. A rapid PID spike can exceed the limit before the next housekeeping cycle triggers eviction. Second, many cluster distributions ship without a default evictionHard value for pid.available. If you have not configured one, PIDPressure may never fire, and the first symptom will be fork failures at the kernel level.

PIDs are node-global. Every container process, thread, runtime shim, and zombie consumes a PID from the same kernel pool. When the pool is empty, fork() returns EAGAIN or ENOMEM. Inside Kubernetes, this manifests as failed container starts, probe failures, and cascading pod restarts that worsen the pressure.

Common causes

Cause	What it looks like	First thing to check
Zombie process accumulation inside pods	PID count climbs steadily; `ps` shows `<defunct>` entries	`ps aux` filtered for state `Z` (see Quick checks)
containerd-shim leak	PIDs consumed by runtime after container exit	`pgrep -c containerd-shim` compared to running pod count (container count on the legacy v1 shim)
Per-pod thread or subprocess explosion	A single pod consumes thousands of PIDs via unbounded thread pools or shell loops	`pstree -p <pid>` or `/proc/<pid>/status` inside the pod
Low `kernel.pid_max`	Node hits 32768 limit despite idle CPU/memory	`cat /proc/sys/kernel/pid_max`
Missing or ineffective PodPidsLimit	Kubelet config sets a limit, but the container cgroup does not reflect it	`cat /sys/fs/cgroup/pids/pids.max` inside a running container
Workload fork bomb or CI runner	Sudden PID spike correlated with a specific deployment or job	Process tree on the node sorted by thread count

Quick checks

# Check PIDPressure status across the cluster
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="PIDPressure")].status}{"\n"}{end}'

# Describe a specific node for condition details
kubectl describe node <node-name> | grep -A 5 PIDPressure

# Compare tasks in use against the kernel limit (threads consume PIDs too, so read the task total from /proc/loadavg)
echo "limit: $(cat /proc/sys/kernel/pid_max); used: $(cut -d' ' -f4 /proc/loadavg | cut -d/ -f2)"

# Check kubelet eviction flags for pid thresholds (only shows flags, not config file values)
ps aux | grep kubelet | grep -o 'eviction-hard=[^ ]*'

# Inspect the container cgroup PID limit from inside a pod (cgroup v1)
cat /sys/fs/cgroup/pids/pids.max

# Inspect current cgroup PID usage (cgroup v1)
cat /sys/fs/cgroup/pids/pids.current

# Find zombie processes on the node
ps aux | awk '$8 ~ /^Z/ {print $0}'

# Count container runtime shims
pgrep -c containerd-shim

# Check kubelet eviction metrics for pid signals (requires accessible metrics endpoint)
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics | grep -E 'eviction.*pid'

On cgroup v2 hosts, the PID limit path is /sys/fs/cgroup/pids.max and usage is /sys/fs/cgroup/pids.current.
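
Because the path differs between cgroup versions, a small wrapper can pick the right directory before reading the files. This is a minimal sketch meant to run inside a pod; `cgroup_pids_dir` is a hypothetical name, and it assumes the presence of `cgroup.controllers` at the mount point is a sufficient test for cgroup v2.

```shell
# cgroup_pids_dir [MOUNT_POINT]
# Hypothetical helper: prints the directory holding pids.max and pids.current.
# MOUNT_POINT defaults to /sys/fs/cgroup and is overridable for testing.
cgroup_pids_dir() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        # Unified hierarchy (cgroup v2): pids.* files sit directly in the cgroup.
        echo "$root"
    elif [ -d "$root/pids" ]; then
        # Legacy hierarchy (cgroup v1): the pids controller has its own directory.
        echo "$root/pids"
    else
        echo "no pids controller found under $root" >&2
        return 1
    fi
}
```

Typical use is `dir=$(cgroup_pids_dir) && cat "$dir/pids.max" "$dir/pids.current"`.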

How to diagnose it

flowchart TD
    A[PIDPressure=True or fork failures] --> B{Node PIDs near pid_max?}
    B -->|Yes| C[Identify top PID consumers]
    B -->|No| D[Check kubelet eviction config]
    C --> E{Zombies, shims, or thread leak?}
    E -->|Zombies| F[Fix app reaping or restart pod]
    E -->|Shims| G[Restart runtime or drain node]
    E -->|Thread leak| H[Reduce pod parallelism]
    D --> I[Configure pid.available evictionHard]

Confirm the node condition and taint. Use kubectl get nodes to check PIDPressure. If the status is True, verify the taint node.kubernetes.io/pid-pressure:NoSchedule is present. This confirms kubelet has detected the problem, but it does not reveal how many PIDs remain or which workload is responsible.
Determine absolute headroom. On the node, compare the task total from the fourth field of /proc/loadavg (cut -d' ' -f4 /proc/loadavg | cut -d/ -f2) against /proc/sys/kernel/pid_max. Counting the numeric directories in /proc undercounts usage because it omits threads, which also consume PIDs. If utilization is above 80% of the limit, the node is in immediate danger regardless of what kubelet reports. If the limit is 32768, expect exhaustion on any node running more than a few dozen pods with multi-threaded runtimes.
Identify top consumers by parent process. List processes grouped by container runtime shim or pod cgroup. On the host, inspect pids.current files under /sys/fs/cgroup/pids/kubepods/ (exact path varies by cgroup version, cgroup driver, and QoS class). If your runtime is containerd, ctr -n k8s.io tasks ls lists each container task with its init PID, and ctr -n k8s.io tasks ps <container-id> lists the processes inside one container.
Look for zombies. Zombie processes hold PIDs in the kernel task table until their parent calls waitpid(). Run ps aux | awk '$8 ~ /^Z/' on the node. If zombies cluster under a specific container, the application inside that pod is failing to reap children. Restarting the pod provides immediate relief but does not fix the code.
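Check for orphaned runtime shims. containerd runs workloads under containerd-shim processes: the runc v2 shim serves every container in a pod, while the legacy v1 shim ran one process per container. If a shim is orphaned, it continues to consume a PID. Compare pgrep -c containerd-shim against the number of running pod sandboxes from crictl pods --state ready -q | wc -l (or, on the v1 shim, running containers from crictl ps -q | wc -l). A large gap indicates shim leakage.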
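Verify kubelet and cgroup enforcement. Check whether PodPidsLimit is configured in the kubelet config (podPidsLimit in /var/lib/kubelet/config.yaml) or startup flags (--pod-max-pids). Then enter a pod and run cat /sys/fs/cgroup/pids/pids.max (cgroup v1) or cat /sys/fs/cgroup/pids.max (cgroup v2). If the value is max or far larger than PodPidsLimit, also check pids.max on the pod-level cgroup on the host, because the kubelet typically applies the limit to the pod cgroup and the container cgroup beneath it can still read max. If neither shows the limit, enforcement is not reaching the container. This is common with dockershim or cri-dockerd; verify that your runtime and cgroup driver support PID limits.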
Inspect eviction configuration. If PIDPressure is not firing despite high usage, check kubelet’s --eviction-hard and --eviction-soft settings. If pid.available is absent, kubelet will not evict for PID pressure. Add an explicit threshold such as pid.available<100 or a percentage, depending on node density.
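Correlate with events and logs. Evictions are recorded as events on the evicted pods, so list them with kubectl get events -A --field-selector reason=Evicted and read each message for the signal that triggered it. Node-level condition changes appear under kubectl get events -A --field-selector involvedObject.kind=Node,reason=NodeHasInsufficientPID. Read kubelet logs with journalctl -u kubelet | grep -i pid to find the exact eviction signal value at the time of the event.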
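
Two of the steps above, ranking cgroups by PID usage and locating the parents of zombies, can be sketched as shell helpers. Both function names are hypothetical. The first assumes it is pointed at a directory tree containing pids.current files, such as the kubepods path mentioned above. The second reads "PPID STAT" pairs on standard input so it can be fed from ps.

```shell
# top_pid_cgroups ROOT [N]
# Hypothetical helper: prints "count path" for the N cgroups under ROOT with
# the highest pids.current. Counts are hierarchical, so a parent cgroup
# includes the tasks of its children.
top_pid_cgroups() {
    root="$1"
    limit="${2:-10}"
    find "$root" -name pids.current 2>/dev/null |
    while IFS= read -r f; do
        printf '%s %s\n' "$(cat "$f" 2>/dev/null || echo 0)" "${f%/pids.current}"
    done | sort -rn | head -n "$limit"
}

# zombies_by_parent
# Hypothetical helper: reads "PPID STAT" lines on stdin and prints
# "zombie_count parent_pid", highest count first.
zombies_by_parent() {
    awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print n[p], p }' | sort -rn
}
```

On a cgroup v1 node this might be invoked as `top_pid_cgroups /sys/fs/cgroup/pids/kubepods 10` and `ps -eo ppid= -o stat= | zombies_by_parent`. The parent PID reported by the second command can then be mapped to a container through its cgroup in /proc/<pid>/cgroup.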

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Node condition `PIDPressure`	Binary indicator from kubelet	`status=True` sustained for >1 minute
Node PIDs used vs `pid_max`	Measures true system headroom before fork fails	Utilization >80% of `pid_max`
Kubelet eviction metrics with pid signal	Confirms kubelet is actively shedding load	Any non-zero eviction rate for pid
Per-pod PID usage	Identifies noisy neighbors before they exhaust the node	Any pod approaching its `PodPidsLimit`
Zombie process count	Zombies consume PIDs but no other resources	Sustained >0 on production nodes
containerd-shim count	Orphaned shims leak node-level PIDs	Count growing while pod count is flat
Container cgroup `pids.current` vs `pids.max`	Validates that enforcement is working	`pids.current` within 10% of `pids.max`, or `pids.max` unlimited
`kernel.pid_max`	System-wide ceiling that is easy to misconfigure	Value < 65536 on dense nodes
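
The 80% utilization warning in the table can be expressed as a small check suitable for a cron job or a node probe. This is an illustrative sketch: `pid_utilization_status` is a hypothetical name, and the 80% default simply mirrors the threshold suggested above.

```shell
# pid_utilization_status USED MAX [WARN_PCT]
# Hypothetical helper: prints "OK n%" or "WARN n%" using integer arithmetic.
# Exit status: 0 for OK, 1 for WARN, 2 for an invalid MAX.
pid_utilization_status() {
    used="$1"
    max="$2"
    warn="${3:-80}"
    [ "$max" -gt 0 ] || return 2
    pct=$((used * 100 / max))
    if [ "$pct" -ge "$warn" ]; then
        echo "WARN ${pct}%"
        return 1
    fi
    echo "OK ${pct}%"
}
```

For example, `pid_utilization_status 30000 32768` reports a warning, while `pid_utilization_status 1500 32768` reports OK.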

Fixes

If the cause is zombie or leaking processes inside pods

Fix the application to reap child processes properly. If the code cannot be changed immediately, setting a stricter PodPidsLimit contains the blast radius to that pod rather than the node. To relieve pressure immediately, delete the offending pod. This is disruptive to the workload but safe for the node.

If the cause is container runtime shim accumulation

Orphaned shims often require a container runtime restart to clear. Cordon the node and drain its workloads before restarting containerd (systemctl restart containerd). Verify shim counts drop before uncordoning. If the leak is due to a runtime bug, upgrade to a patched version.
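
The order of operations matters here, so it can help to print the sequence for review before running anything. The sketch below only echoes commands and executes none of them. `runtime_restart_plan` is a hypothetical helper, and the drain flags shown are an assumption about a typical node rather than something the procedure above prescribes.

```shell
# runtime_restart_plan NODE
# Hypothetical helper: prints the maintenance sequence for NODE without
# executing any of it.
runtime_restart_plan() {
    node="$1"
    cat <<EOF
kubectl cordon $node
kubectl drain $node --ignore-daemonsets --delete-emptydir-data
systemctl restart containerd   # run on $node itself
pgrep -c containerd-shim       # run on $node; expect a lower count than before
kubectl uncordon $node
EOF
}
```

Calling `runtime_restart_plan worker-1` prints five lines that can be pasted one at a time, checking the shim count before the final uncordon.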

If the cause is low kernel.pid_max

Raise the limit immediately:

# Apply now
sysctl -w kernel.pid_max=131072

# Persist
echo "kernel.pid_max = 131072" > /etc/sysctl.d/99-pid.conf
sysctl --system

Nodes running more than 50 pods or hosting Java, Node.js, or heavily threaded workloads should use at least 131072. The change takes effect immediately and does not require a reboot. Existing processes keep their current PIDs, and new allocations can use the expanded range.
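
Raising kernel.pid_max only helps if it is the binding limit. The kernel also caps the total number of tasks through kernel.threads-max, so the lower of the two values is the practical ceiling. The sketch below compares them; `effective_task_limit` is a hypothetical helper, and the file arguments exist only so it can be tested against sample values.

```shell
# effective_task_limit [PID_MAX_FILE] [THREADS_MAX_FILE]
# Hypothetical helper: prints the smaller of kernel.pid_max and
# kernel.threads-max, read from procfs by default.
effective_task_limit() {
    pid_max=$(cat "${1:-/proc/sys/kernel/pid_max}") || return 1
    threads_max=$(cat "${2:-/proc/sys/kernel/threads-max}") || return 1
    if [ "$threads_max" -lt "$pid_max" ]; then
        echo "$threads_max"
    else
        echo "$pid_max"
    fi
}
```

If the printed value is lower than the pid_max you just set, kernel.threads-max is the constraint to look at next.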

If the cause is kubelet/cgroup enforcement gap

If PodPidsLimit is set in kubelet configuration but pids.max reads max both inside the container and on the pod-level cgroup on the host, the limit is not being propagated. This is common with dockershim or cri-dockerd. Migrate to containerd or CRI-O, then verify enforcement by checking pids.max for a new pod (/sys/fs/cgroup/pids/pids.max on cgroup v1, /sys/fs/cgroup/pids.max on cgroup v2).

If the cause is a fork bomb or runaway subprocess

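Identify the parent process using ps -eo pid,ppid,nlwp,user,comm --sort=-nlwp | head -20 on the node, which lists the processes with the most threads and shows the thread count in the NLWP column. Delete the pod immediately. To prevent recurrence, lower PodPidsLimit on the nodes or node pool that run the workload (it is a node-wide kubelet setting, not a per-namespace or per-priority-class one), and review application logic for unbounded fork, exec, or thread creation.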
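
Where the installed ps lacks sort options (for example on minimal node images), the same ranking can be read from procfs. This is a sketch under the assumption that each /proc/<pid>/status file exposes Name, Pid, and Threads fields, as it does on Linux. `threads_per_process` is a hypothetical helper name.

```shell
# threads_per_process [PROC_ROOT] [N]
# Hypothetical helper: prints "threads pid name" for the N processes with the
# most threads. PROC_ROOT defaults to /proc and is overridable for testing.
threads_per_process() {
    proc_root="${1:-/proc}"
    limit="${2:-20}"
    for status in "$proc_root"/[0-9]*/status; do
        # Skip entries that vanished or are unreadable.
        [ -r "$status" ] || continue
        awk '/^Name:/ {name=$2} /^Pid:/ {pid=$2} /^Threads:/ {t=$2}
             END { if (t != "") print t, pid, name }' "$status" 2>/dev/null
    done | sort -rn | head -n "$limit"
}
```

It starts one awk per process, which is acceptable for an ad hoc check but too slow for frequent polling on a node that is already short of PIDs.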

Prevention

Explicitly configure evictionHard (and optionally evictionSoft) for pid.available in kubelet configuration. A starting point is pid.available<100 on smaller nodes, or a percentage on larger ones.
Set PodPidsLimit in KubeletConfiguration to a value that matches your density. Verify it is reflected in container cgroups.
Tune kernel.pid_max to at least 131072 during node provisioning if the node will host more than a few dozen pods.
Monitor per-pod PID usage in staging and CI. Applications that leak PIDs should be caught before production deployment.
Ensure container images and init systems reap zombie processes correctly. Avoid running an application as PID 1 unless it reaps its own children; otherwise start it under a minimal init such as tini.
Use containerd or CRI-O rather than dockershim to ensure PodPidsLimit is enforced at the cgroup level.
Include PID headroom in capacity planning alongside CPU and memory.
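
The first item in this list is easy to audit during provisioning. The sketch below checks whether a kubelet configuration file mentions the pid.available signal at all. `has_pid_eviction_threshold` is a hypothetical helper. It assumes the config lives at /var/lib/kubelet/config.yaml unless a path is given, and it does not distinguish evictionHard from evictionSoft or see thresholds passed as command-line flags.

```shell
# has_pid_eviction_threshold [CONFIG]
# Hypothetical helper: exits 0 if CONFIG contains an uncommented reference to
# pid.available, non-zero otherwise (including when the file is missing).
has_pid_eviction_threshold() {
    config="${1:-/var/lib/kubelet/config.yaml}"
    # Drop commented-out lines, then look for the signal name.
    grep -v '^[[:space:]]*#' "$config" 2>/dev/null | grep -q 'pid\.available'
}
```

A provisioning script could call `has_pid_eviction_threshold || echo "no pid.available threshold configured"` and fail the build when the message appears.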

How Netdata helps

Netdata tracks per-cgroup pids.current and pids.max, surfacing which pods are approaching their PID limit.
The process state chart highlights zombie counts per node, making it obvious when an application is failing to reap children.
Node-level process count metrics, correlated with the Kubernetes PIDPressure condition, let you distinguish a true kernel shortage from a kubelet configuration gap.
Alerts on per-container PID saturation and node-wide pid_max utilization fire before kubelet eviction, providing lead time to cordon or drain the node.

Related guides

See Kubernetes eviction cascade: when one node failure takes down the cluster for how PID pressure can trigger broader cluster instability.
See Kubernetes kubelet not responding: PLEG, runtime, and certificate issues when PID exhaustion causes kubelet health checks to fail.
See Kubernetes kubelet memory leak: detection and OOM cycle for another kubelet-related resource exhaustion pattern.
See Kubernetes conntrack exhaustion: dropped connections under load to differentiate PID pressure from connection tracking limits.

The Netdata solution

Kubernetes monitoring with Netdata

Netdata monitors Kubernetes with per-second metrics across the control plane, nodes, and every pod, with ML anomaly detection and zero per-pod configuration. Correlate API-server and etcd latency, kubelet PLEG stalls, scheduling pressure, and OOMKills in one place.
