Kubernetes pod stuck Pending: scheduling failures explained

A Deployment scales up and the new replicas stay Pending. No containers start, and kubectl logs returns nothing because no node has been assigned. When a pod is stuck in Pending, the scheduler has either not yet evaluated it or has rejected every candidate node. Containers cannot start until the pod is bound to a node, so this blocks rollouts, autoscaling, and recovery.

This guide covers indefinite Pending caused by scheduling failures: the scheduler Filter phase returns zero viable nodes. Read the FailedScheduling event, distinguish capacity shortages from impossible constraints, and fix the root cause without guessing.

What this means

In Kubernetes, Pending is the phase before a pod is bound to a node. The scheduler watches for pods with an empty spec.nodeName and evaluates them against nodes through the Filter phase (hard constraints) and Score phase (soft preferences). If every node fails filtering, the pod moves to the unschedulable queue and retries with exponential backoff.

A pod stuck Pending from scheduling failure has PodScheduled condition False with reason Unschedulable. The Events section of kubectl describe pod contains the rejection reason, such as Insufficient cpu or node(s) had untolerated taint. If there is no FailedScheduling event, the scheduler may not be running, or the pod may target a custom scheduler that does not exist.
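
To confirm the rejection quickly, the PodScheduled condition and the pod's FailedScheduling events can be read directly; the pod and namespace names below are placeholders to substitute.

# Reason on the PodScheduled condition (Unschedulable for a scheduling failure)
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].reason}{"\n"}'

# Only the scheduling rejections for this pod
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name>,reason=FailedScheduling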

The scheduler uses resource requests, not actual usage, when evaluating CPU and memory. A node reporting low utilization in kubectl top nodes can still reject a pod if existing pods have claimed the remaining allocatable capacity.
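
One way to see that gap on a specific node is to put requests and live usage side by side; the node name is a placeholder, and kubectl top requires metrics-server.

# Requests the scheduler counts against allocatable
kubectl describe node <node-name> | grep -A 8 "Allocated resources"

# Live usage, which the scheduler does not consider (needs metrics-server)
kubectl top node <node-name>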

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Insufficient CPU or memory requests | 0/N nodes are available: Insufficient cpu or Insufficient memory | kubectl describe node, Allocated resources section |
| Untolerated taints | node(s) had untolerated taint | Node taints vs pod tolerations |
| Node affinity or nodeSelector mismatch | didn't match Pod's node affinity/selector | Node labels vs pod affinity rules |
| Unbound Immediate PVC | pod has unbound immediate PersistentVolumeClaims | PVC phase and StorageClass provisioner |
| Pod anti-affinity unsatisfiable | node(s) didn't match pod anti-affinity rules | Existing pod labels and topology distribution |
| Host port collision | node(s) didn't have free ports for the requested pod ports | Pod spec hostPort and node port usage |
| Missing custom scheduler | No FailedScheduling event; pod never scheduled | spec.schedulerName and scheduler health |
| Namespace ResourceQuota exhausted | FailedCreate events with forbidden: exceeded quota | kubectl describe resourcequota |

Quick checks

# Pending pods and scheduler health
kubectl get pods -A --field-selector=status.phase=Pending -o wide
kubectl get pods -n kube-system -l component=kube-scheduler

# Exact scheduler rejection reason
kubectl describe pod <pod-name> -n <namespace>

# Node scheduling headroom (requests vs allocatable)
kubectl describe nodes | grep -A 5 "Allocated resources"

# PVC blocking scheduling
kubectl get pvc -n <namespace>

# Pod constraints that narrow the node pool
kubectl get pod <pod-name> -n <namespace> -o jsonpath='NodeSelector: {.spec.nodeSelector}{"\n"}Tolerations: {.spec.tolerations}{"\n"}Affinity: {.spec.affinity}'

# Namespace quota consumption
kubectl describe resourcequota -n <namespace>

How to diagnose it

  1. Check scheduler health and leadership. Run kubectl get pods -n kube-system -l component=kube-scheduler and verify one instance is Ready. In HA clusters, check the scheduler Lease object to confirm a leader is elected (see the lease check after this list). A scheduler with no leader never processes the queue. If the scheduler is healthy but pods are pending, the problem is constraints or capacity, not the scheduler process.

  2. Read the pod Events for the FailedScheduling message. Run kubectl describe pod and look under Events. The message after 0/N nodes are available: tells you which filter failed. Start here.

  3. Determine if the rejection is global or selective. Insufficient cpu on all nodes means the cluster is capacity-constrained. A taint or node affinity error affects only a subset of nodes. Scale the cluster for global rejections; change constraints for selective rejections.

  4. Compare pod requests to node allocatable. The scheduler subtracts existing pod requests from status.allocatable to compute headroom. Run kubectl describe node and check the Allocated resources section. If CPU or memory requests are at or near allocatable, no new pods requesting those resources can land. kubectl top nodes shows usage, which can be much lower than requests.

  5. Validate tolerations against node taints. Control-plane nodes and dedicated nodes carry taints that repel pods without matching tolerations. If the pod lacks a matching toleration, those nodes are excluded. Check node taints with kubectl get nodes -o json | jq '.items[].spec.taints'.

  6. Inspect PVC binding state. A PVC with volumeBindingMode: Immediate must be bound before the pod is scheduled. If the PVC is Pending, the pod stays Pending. Verify the StorageClass provisioner is healthy and that zone restrictions match node zones in multi-zone clusters. Switch to volumeBindingMode: WaitForFirstConsumer to let provisioning follow scheduling.

  7. Evaluate anti-affinity and topology rules. Hard requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity requires at least one node per topology value that does not already run a matching pod. If you have three replicas and only two zones, one pod will be unschedulable. Topology spread constraints with maxSkew: 1 and whenUnsatisfiable: DoNotSchedule can block placement in the same way when the replicas cannot fit within the allowed skew across the available topology values.

  8. Verify scheduler name and ResourceQuota. If spec.schedulerName is set to a name that does not match any running scheduler, the pod will sit silently with no FailedScheduling event. If namespace ResourceQuota is at 100%, new pod creation is blocked at admission time and the Deployment will stall.
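
The lease check referenced in step 1 might look like this on a cluster using the default leader-election configuration, where the scheduler lease is named kube-scheduler in kube-system:

# Identity of the current scheduler leader; empty or stale means no leader
kubectl get lease kube-scheduler -n kube-system \
  -o jsonpath='{.spec.holderIdentity}{"\n"}'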

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Scheduler pending pods by queue (activeQ, backoffQ, unschedulableQ) | Indicates whether the scheduler is falling behind or whether pods are impossible to place | unschedulableQ growing for > 5 minutes |
| Node allocatable vs requested resources | The scheduler uses this ratio to decide fit; it is the definitive capacity signal | Any node > 85% requested CPU or memory |
| Scheduler schedule attempts total (result=unschedulable) | Direct count of scheduler rejections | Sustained increase in unschedulable attempts |
| FailedScheduling event rate | Narrative log of why pods are rejected | Any FailedScheduling event on critical workloads |
| API server request latency | Slow API server delays binding writes and can stall scheduling during storms | Mutating request p99 > 1 second sustained |
| Controller workqueue depth (deployment/replicaset) | Distinguishes scheduling failures from controller lag that prevents ReplicaSet creation | Depth > 100 sustained for > 5 minutes |
| etcd disk WAL fsync latency | Slow etcd causes the API server to slow down, which delays scheduler cache updates and binding | p99 > 10 ms sustained |

Fixes

If the cause is resource exhaustion

Scale the cluster by adding nodes, or reduce pod resource requests. Lowering requests is safe only if actual usage is well below the new request value; otherwise you risk CPU throttling or OOM kills. Evicting BestEffort pods does not free requested capacity, because they carry no requests; to reclaim headroom, scale down or reschedule lower-priority workloads that do have requests, or rely on pod priority and preemption. If cluster-autoscaler is configured, verify the node group is not capped at its maximum size (maxSize).
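
As a sketch, requests can be lowered in place once usage data supports it, and the autoscaler's limits checked; the deployment name, the resource values, and the cluster-autoscaler deployment name are assumptions that vary by cluster.

# Illustrative only: lower requests after confirming real usage sits well below them
kubectl set resources deployment/<deployment-name> -n <namespace> \
  --requests=cpu=250m,memory=256Mi

# If cluster-autoscaler runs as a Deployment, look for node groups at their maximum
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=100 | grep -i "max size"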

If the cause is taints or node affinity

Add the required toleration to the pod spec if the node taint is expected. If the taint was applied accidentally, remove it with kubectl taint node <node> <key>:<effect>-. Removing a taint can be disruptive, because workloads may immediately schedule onto a node that was deliberately being kept clear. For node affinity or nodeSelector mismatches, align the pod’s required labels with the node’s actual labels, or relax the rule to preferredDuringSchedulingIgnoredDuringExecution.
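
If the taint is intentional, a toleration can be added with a patch like this sketch; the dedicated=batch:NoSchedule taint and the deployment name are assumptions to replace with real values.

# Tolerate a hypothetical dedicated=batch:NoSchedule taint
# (a JSON merge patch replaces the existing tolerations list wholesale)
kubectl patch deployment <deployment-name> -n <namespace> --type merge -p '
spec:
  template:
    spec:
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "batch"
        effect: "NoSchedule"'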

If the cause is volume binding

For PVCs configured to bind immediately, ensure the StorageClass has a working provisioner and that the requested storage size or access mode is satisfiable. For multi-zone clusters, ensure StorageClass topology restrictions match node zones, or switch to deferred binding so the volume is provisioned after the pod is placed. If a volume is stuck attaching from a previous node, force-detach may be required after the old node is fully terminated. Force-detach risks data corruption if the old node is still writing to the volume.
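
Binding mode is visible per StorageClass, and because the field is immutable, switching to deferred binding means creating a new class; the class name and provisioner below are placeholders.

# Which classes bind immediately vs at scheduling time
kubectl get storageclass -o custom-columns='NAME:.metadata.name,BINDING:.volumeBindingMode,PROVISIONER:.provisioner'

# Sketch of a class with deferred binding (provisioner is a placeholder)
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-wffc
provisioner: example.com/csi-driver
volumeBindingMode: WaitForFirstConsumer
EOF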

If the cause is pod affinity, anti-affinity, or topology

Convert hard rules to soft rules where the distribution guarantee is not strictly required. Increase the number of topology values by adding nodes in new zones, or reduce the replica count to match the available topology values. For hostPort conflicts, remove hostPort from the spec if it is not required by the workload.
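
For instance, a hard anti-affinity rule can be relaxed to a weighted preference with a merge patch; the app label, topology key, and deployment name are assumptions, and setting the required field to null removes it under a JSON merge patch.

# Sketch: swap required anti-affinity for a preferred (soft) rule
kubectl patch deployment <deployment-name> -n <namespace> --type merge -p '
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution: null
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: topology.kubernetes.io/zone
              labelSelector:
                matchLabels:
                  app: <app-label>'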

If the cause is a missing scheduler or exhausted quota

Correct spec.schedulerName to match an active scheduler, or remove the field to use the default. For ResourceQuota, increase the quota or clean up existing objects in the namespace to free capacity. Quota is enforced at admission time; running pods are not evicted when quota is reached.
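
Both fixes are one-liners in practice; default-scheduler is the built-in scheduler name, and the deployment and namespace below are placeholders.

# Point the workload back at the default scheduler
kubectl patch deployment <deployment-name> -n <namespace> --type merge \
  -p '{"spec":{"template":{"spec":{"schedulerName":"default-scheduler"}}}}'

# See how much of each quota is consumed before raising it
kubectl get resourcequota -n <namespace> \
  -o jsonpath='{range .items[*]}{.metadata.name}: {.status.used}{"\n"}{end}'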

Prevention

  • Set resource requests based on measured usage, and use LimitRange defaults in namespaces to prevent pods from entering the cluster with zero requests.
  • Avoid hostPort unless the protocol requires it.
  • Validate node affinity and anti-affinity rules against node labels and topology before deploying to production.
  • Design topology spread constraints with ScheduleAnyway where strict distribution is not required.
  • Monitor node allocatable headroom and alert when the cluster-wide request ratio exceeds 80%.
  • Stream Kubernetes events to persistent storage; events expire after one hour by default and are the primary source of scheduling failure history.
  • Verify scheduler names and StorageClass binding behavior in the deployment pipeline.
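
A LimitRange along these lines gives default requests to containers that omit them; the values are illustrative and should come from measured usage.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests
  namespace: <namespace>
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 512Mi
EOF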

How Netdata helps

  • Correlate pending pod count with per-node resource utilization and actual vs requested CPU/memory to distinguish true capacity exhaustion from request bloat.
  • Surface etcd disk latency and API server latency to determine whether pending pods are caused by scheduler constraints or control plane saturation delaying binding.
  • Track container CPU throttling and memory usage to validate whether lowering resource requests is safe before making the change.
  • Visualize node pressure conditions alongside pending pod counts to confirm that evictions and scheduling failures share a common resource root cause.