Kubernetes DaemonSet pods Pending: scheduling and tolerations
A DaemonSet pod in Pending on a node where it should run is a scheduling failure, not a workload crash. Since Kubernetes 1.12, when the ScheduleDaemonSetPods feature became enabled by default (it reached GA in 1.17), DaemonSet pods pass through the default scheduler like any other pod. The DaemonSet controller creates the pod and pins it to a target node via nodeAffinity, but the scheduler still evaluates predicates: taints, tolerations, resource requests, and node state. If the scheduler rejects the pod, it stays Pending. The controller will not create a replacement; it waits for the scheduler to succeed.
This matters because DaemonSets often run cluster-critical software: CNI plugins, node exporters, log collectors, and CSI drivers. A Pending DaemonSet pod can leave a node without networking, monitoring, or storage. It can also create circular dependencies where a node cannot become Ready because the DaemonSet that provides its readiness is the pod that cannot schedule.
What this means
When a DaemonSet pod is Pending, the scheduler has not yet set spec.nodeName. The DaemonSet controller has already decided which node the pod belongs to and has written requiredDuringSchedulingIgnoredDuringExecution node affinity to match that node by name. The scheduler must still admit the pod. If the node has taints the pod does not tolerate, if the pod’s resource requests exceed available allocatable capacity, or if scheduler backoff or a race condition delays processing, the pod remains Pending.
The DaemonSet controller injects a set of NoSchedule tolerations into every pod it creates: node.kubernetes.io/memory-pressure, node.kubernetes.io/disk-pressure, node.kubernetes.io/pid-pressure, node.kubernetes.io/unschedulable, and, for pods with hostNetwork: true, node.kubernetes.io/network-unavailable. It also injects NoExecute tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with no tolerationSeconds, which prevents eviction due to transient node health issues. If your cluster uses custom taints, or the DaemonSet must run on control plane nodes, the DaemonSet template must explicitly tolerate those taints; the automatic tolerations do not cover them.
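To make the matching rule concrete, here is a minimal sketch of how a toleration list can be checked against a single taint. The function name tolerates and the tab-separated input format are assumptions made for this illustration; they are not part of kubectl. The logic follows the documented rules: an empty effect matches every effect, an empty key with operator Exists matches every taint, and the operator defaults to Equal.

```shell
#!/bin/sh
# tolerates KEY VALUE EFFECT   (hypothetical helper, not a kubectl feature)
# Reads tolerations from stdin, one per line, as tab-separated fields:
#   key <TAB> operator <TAB> value <TAB> effect
# Exits 0 if at least one toleration matches the given taint, 1 otherwise.
tolerates() {
  awk -F '\t' -v k="$1" -v v="$2" -v e="$3" '
    {
      op = ($2 == "") ? "Equal" : $2                    # operator defaults to Equal
      if ($4 != "" && $4 != e) next                     # an empty effect matches every effect
      if ($1 == "" && op == "Exists") { ok = 1; exit }  # empty key + Exists matches every taint
      if ($1 != k) next                                 # otherwise the keys must be identical
      if (op == "Exists" || $3 == v) { ok = 1; exit }   # Exists ignores the value; Equal compares it
    }
    END { exit (ok ? 0 : 1) }
  '
}

# One possible way to produce the input (pod and namespace are placeholders):
#   kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .spec.tolerations[*]}{.key}{"\t"}{.operator}{"\t"}{.value}{"\t"}{.effect}{"\n"}{end}' \
#     | tolerates dedicated gpu NoSchedule && echo tolerated || echo blocked
```

The effect check matters in practice: a NoExecute toleration for node.kubernetes.io/not-ready does not match a NoSchedule taint with the same key.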
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing toleration for custom or control-plane taint | Pod Pending only on tainted nodes (control plane, GPU, dedicated workload taints) | Pod events for FailedScheduling with taint predicate failures; compare node spec.taints to pod spec.tolerations |
| Resource request exceeds node allocatable | Pod Pending on a specific node with no taint issues; other pods run | kubectl describe node for Allocated resources; pod events for insufficient cpu or insufficient memory |
| Cordoned node race condition | Node is cordoned and the DaemonSet pod is Pending despite the unschedulable toleration | Whether the scheduler evaluated predicates before the toleration modifier applied; pod may need recreation |
| Circular CNI dependency deadlock | Node is NotReady, network routes are missing, and the CNI DaemonSet pod is Pending | Node conditions and the CNI DaemonSet tolerations; the CNI pod must tolerate node.kubernetes.io/not-ready:NoExecute |
| Node selector mismatch | Pending only on nodes that lack a specific label | DaemonSet spec.template.spec.nodeSelector versus node labels |
| Cluster autoscaler misinterpretation | Pending DaemonSet pods trigger unnecessary node scale-up events in managed clusters | Autoscaler logs showing scale-up for DaemonSet pods; pending pods without workload demand |
Quick checks
Run these in order. They are read-only and safe.
# List Pending DaemonSet pods (the namespace and the app=my-daemonset label are placeholders; substitute your own)
kubectl get pods -n kube-system -l app=my-daemonset --field-selector=status.phase=Pending
# Check scheduling events for a Pending pod
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 Events
# Check node taints
kubectl get node <node-name> -o jsonpath='{.spec.taints}'
# Check pod tolerations (including automatically injected ones)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.tolerations}'
# Check node resource allocation
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
# Check DaemonSet nodeSelector
kubectl get daemonset <ds-name> -n <namespace> -o jsonpath='{.spec.template.spec.nodeSelector}'
# Verify if a node is cordoned
kubectl get node <node-name> -o jsonpath='{.spec.unschedulable}'
# Check scheduler events for FailedScheduling
kubectl get events --all-namespaces --field-selector reason=FailedScheduling --sort-by='.lastTimestamp'
# Check DaemonSet status (does not block)
kubectl get daemonset <ds-name> -n <namespace> -o jsonpath='{.status.numberReady}/{.status.desiredNumberScheduled}'
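When the DaemonSet's label is not known up front, the first check can be widened to every namespace by filtering on the pod owner instead. The following is a sketch under that assumption; the function name pending_daemonset_pods is hypothetical, and it relies only on kubectl get with custom-columns plus awk. It is read-only, like the checks above.

```shell
#!/bin/sh
# pending_daemonset_pods   (hypothetical helper)
# Prints namespace/name for every Pending pod whose first owner is a DaemonSet.
pending_daemonset_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' \
    --no-headers |
    awk '$3 == "DaemonSet" { print $1 "/" $2 }'
}

# Usage:
#   pending_daemonset_pods
```

Pods without an owner show up with an empty owner column and are skipped by the awk filter.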
How to diagnose it
Follow this flow to isolate the blocker.
1. Confirm the pod is scheduler-bound. Check spec.nodeName. If it is empty, the scheduler has not yet bound the pod. If it is set but the pod is still Pending, look at kubelet or runtime issues instead of scheduling.
2. Read the pod events. kubectl describe pod is the single best source. Look for messages from default-scheduler mentioning taints, resources, or volume conflicts. The event names the predicate that failed.
3. Compare node taints to pod tolerations. If the event mentions a taint, retrieve the node taints with kubectl get node -o jsonpath='{.spec.taints}'. Then check the pod tolerations. Remember that the DaemonSet controller injects standard tolerations automatically, but it will not inject tolerations for custom taints or the control-plane taint unless the DaemonSet template defines them.
4. Check resource requests against node allocatable. A DaemonSet pod that requests more CPU or memory than the node has available after existing pod requests will remain Pending with insufficient cpu or insufficient memory in events. This affects all nodes simultaneously if every node is similarly constrained.
5. Evaluate cordoned nodes. If the node is cordoned, it carries node.kubernetes.io/unschedulable:NoSchedule. The DaemonSet controller automatically tolerates this taint, but a race condition exists where the scheduler evaluates predicates before the toleration modifier is applied. If a pod is stuck Pending on a cordoned node, deleting the pod forces recreation and re-evaluation. Deleting the pod is disruptive and may interrupt node-level services; only do this if you accept the interruption.
6. Inspect control plane node taints. Control plane nodes carry node-role.kubernetes.io/control-plane:NoSchedule (formerly master). Most CNI and infrastructure DaemonSets ship with explicit tolerations for these taints. If a custom DaemonSet lacks them, its pods will remain Pending on control plane nodes indefinitely.
7. Check for circular dependency. If the node is NotReady because its CNI plugin is not running, and the CNI plugin is deployed as a DaemonSet, verify that the CNI DaemonSet tolerates node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute. Without these, the CNI pod may be evicted or fail to schedule, preventing the node from ever becoming Ready.
8. Validate nodeSelector constraints. If the DaemonSet defines a nodeSelector, ensure the target node has the required labels. A Pending pod on a node that lacks the selector is expected behavior, not a failure.
9. Review scheduler and controller-manager state. If multiple DaemonSet pods are Pending across many nodes and events are sparse, check that the scheduler is healthy and that the DaemonSet controller is not lagging. Look at controller workqueue depth (workqueue_depth for the daemonset controller) and API server request latency (apiserver_request_duration_seconds).
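The event-driven part of this flow can be sketched as a small classifier over the FailedScheduling message. This is an illustration only: the function name classify_failed_scheduling is hypothetical, and the matched phrases are assumptions about scheduler wording, which differs between Kubernetes versions, so treat the patterns as a starting point to adjust.

```shell
#!/bin/sh
# classify_failed_scheduling MESSAGE   (hypothetical helper)
# Maps a FailedScheduling event message to a coarse cause.
# The order is deliberate: a DaemonSet pod is pinned to one node by node
# affinity, so every other node is reported as an affinity/selector mismatch
# even when the real blocker on the target node is a taint or a resource gap.
classify_failed_scheduling() {
  case "$1" in
    *"untolerated taint"*|*"didn't tolerate"*)     echo taint ;;
    *"Insufficient "*)                             echo resources ;;
    *"node affinity/selector"*|*"node selector"*)  echo selector ;;
    *)                                             echo unknown ;;
  esac
}

# Usage (pod and namespace are placeholders):
#   msg=$(kubectl get events -n <namespace> \
#           --field-selector involvedObject.name=<pod-name>,reason=FailedScheduling \
#           -o jsonpath='{.items[-1:].message}')
#   classify_failed_scheduling "$msg"
```

A result of selector therefore means that no taint or resource message was present, which points at the nodeSelector check; unknown points at the scheduler and controller-manager health check.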
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| DaemonSet unavailable pod count | Direct measure of pods that are not Running or Ready | status.numberUnavailable > 0 for > 5 minutes |
| Node taint additions | New taints can block existing DaemonSets that lack tolerations | Unexpected taint on production nodes |
| Node resource pressure conditions | Pressure taints can block pods if custom tolerations are missing; resource requests still fail under pressure | Any pressure condition True |
| Node allocatable vs requested resources | Scheduling headroom for DaemonSet pods | CPU or memory requests > 90% of allocatable |
| Scheduler pending pod queue depth | Backlog in the scheduling pipeline | scheduler_pending_pods{queue="unschedulable"} growing |
| Pod scheduling duration | Latency from pod creation to node binding | p99 scheduling latency > 30 seconds |
| API server request latency | Slow API server delays scheduler and DaemonSet controller | Mutating request p99 > 1 second sustained |
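As a worked example of the allocatable warning sign, the following sketch computes requested resources as a percentage of allocatable and compares it with a threshold. The function names request_pct and over_threshold are hypothetical, the 90 default mirrors the table above, and both inputs are assumed to be in the same unit (for example millicores or bytes).

```shell
#!/bin/sh
# request_pct REQUESTED ALLOCATABLE   (hypothetical helper)
# Prints requested/allocatable as a whole percentage, truncated toward zero.
request_pct() {
  awk -v r="$1" -v a="$2" 'BEGIN {
    if (a <= 0) exit 2            # refuse to divide by zero
    printf "%d\n", (r * 100) / a
  }'
}

# over_threshold REQUESTED ALLOCATABLE [THRESHOLD_PCT]
# Succeeds when the percentage is at or above the threshold (default 90).
over_threshold() {
  [ "$(request_pct "$1" "$2")" -ge "${3:-90}" ]
}

# Usage: a node with 2000m allocatable CPU and 1850m already requested
#   request_pct 1850 2000
#   over_threshold 1850 2000 && echo "little headroom left for DaemonSet pods"
```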
Fixes
If the cause is a missing toleration
Edit the DaemonSet to add the toleration for the taint blocking scheduling. For control plane nodes, add:
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
For custom taints, match the key, value, and effect exactly. Apply the change and verify the DaemonSet rolls out. Existing Pending pods will be replaced by new pods with the updated tolerations.
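If editing the manifest by hand is inconvenient, the same change can be applied as a JSON patch. The sketch below only builds the patch document; the function name toleration_patch is hypothetical. It assumes the DaemonSet template already has a tolerations list, because the JSON Patch add operation with a trailing /- appends to an existing array and fails if the array is absent.

```shell
#!/bin/sh
# toleration_patch KEY EFFECT   (hypothetical helper)
# Prints a JSON Patch that appends an Exists toleration to the pod template.
toleration_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/tolerations/-","value":{"key":"%s","operator":"Exists","effect":"%s"}}]\n' "$1" "$2"
}

# Usage (DaemonSet name and namespace are placeholders):
#   kubectl patch daemonset <ds-name> -n <namespace> --type=json \
#     -p "$(toleration_patch node-role.kubernetes.io/control-plane NoSchedule)"
#   kubectl rollout status daemonset/<ds-name> -n <namespace>
```

Changing the pod template triggers a rollout, so running pods of the DaemonSet are restarted as well as the Pending ones being replaced.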
If the cause is resource exhaustion
Lower the DaemonSet pod’s resource requests if the workload can tolerate it, or increase node capacity. DaemonSets that request large amounts of CPU or memory on small nodes will never schedule. Use kubectl describe node to confirm the gap. If the node is genuinely full, scale the node pool or evict non-critical workloads. Eviction is destructive and can impact workload availability; use kubectl drain or kubectl delete pod with caution.
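To confirm the gap numerically, the CPU figures from kubectl describe node can be normalized and compared. This is a minimal sketch with hypothetical function names (cpu_to_millicores, cpu_fits); it handles only the common CPU quantity forms, such as 250m, 2, or 0.5, and does not cover memory suffixes.

```shell
#!/bin/sh
# cpu_to_millicores QUANTITY   (hypothetical helper)
# Converts a CPU quantity such as "250m", "2", or "0.5" to whole millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;                                          # already in millicores
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;    # cores -> millicores
  esac
}

# cpu_fits POD_REQUEST NODE_ALLOCATABLE ALREADY_REQUESTED
# Succeeds when the pod request fits into what is left on the node.
cpu_fits() {
  need=$(cpu_to_millicores "$1")
  free=$(( $(cpu_to_millicores "$2") - $(cpu_to_millicores "$3") ))
  [ "$need" -le "$free" ]
}

# Usage: a 250m request on a node with 2 allocatable cores and 1800m requested
#   cpu_fits 250m 2 1800m || echo "request exceeds remaining CPU"
```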
If the cause is a cordoned node race
If the node was cordoned unintentionally, uncordon it:
kubectl uncordon <node>
If the node is cordoned intentionally for maintenance and the pod is stuck Pending despite the automatic unschedulable toleration, delete the Pending pod. The DaemonSet controller will recreate it, and the scheduler will re-evaluate it with the toleration present. Deleting the pod is disruptive and may interrupt node-level services; only do this if you accept the interruption.
If the cause is a circular CNI dependency
Ensure the CNI DaemonSet tolerates the node-condition taints that prevent scheduling. The DaemonSet must tolerate node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute. The automatically injected NoExecute tolerations cover these, but verify they are present in the pod spec. A toleration with effect NoExecute does not match a taint with effect NoSchedule, so if the node also carries node.kubernetes.io/not-ready:NoSchedule, the DaemonSet template needs a toleration for that effect as well (or a toleration with operator: Exists and no effect). If the node is stuck in a NotReady loop because the CNI pod is Pending, temporarily removing the taint from the node may be required to bootstrap the network. Removing a taint modifies node state and can allow unwanted pods to schedule; revert the change immediately after the CNI pod starts.
If the cause is autoscaler misinterpretation
On managed clusters with node autoscaling, Pending DaemonSet pods may trigger unnecessary node scale-ups because the autoscaler interprets the Pending state as unschedulable workload demand. Ensure the DaemonSet does not target nodes that the autoscaler manages for workload scaling, or configure the autoscaler to ignore DaemonSet pods when calculating scale-up need.
Prevention
- Audit tolerations against your taint strategy. Compare your node taint strategy against every DaemonSet’s tolerations. Infrastructure DaemonSets (CNI, CSI, monitoring) must tolerate control-plane and custom taints.
- Right-size requests. DaemonSet pods run on every node. A request that is harmless on a 64-core node can block scheduling on a 2-core node. Keep requests minimal and use limits to cap actual usage.
- Use nodeSelectors intentionally. Restrict DaemonSets to the node pools that actually need them. This reduces scheduling surface area and prevents capacity conflicts.
- Alert on unavailable DaemonSet pods. Configure alerts on status.numberUnavailable for critical DaemonSets. A Pending pod that persists for more than a few minutes is an incident.
- Validate DaemonSet states after cordoning. Include a step in your cordoning runbook to verify DaemonSet pod states. Expect transient Pending states, but confirm they resolve.
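For runbooks that do not yet have alerting in place, a spot check of status.numberUnavailable across all DaemonSets can be sketched as follows. The function name unavailable_daemonsets is hypothetical; the command is read-only and uses only kubectl get with custom-columns and awk.

```shell
#!/bin/sh
# unavailable_daemonsets   (hypothetical helper)
# Prints every DaemonSet that reports one or more unavailable pods.
# kubectl prints <none> when the field is absent; awk treats that as 0.
unavailable_daemonsets() {
  kubectl get daemonsets --all-namespaces \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,UNAVAILABLE:.status.numberUnavailable' \
    --no-headers |
    awk '$3 + 0 > 0 { print $1 "/" $2 ": " $3 " unavailable" }'
}

# Usage, for example as a step after cordoning a node:
#   unavailable_daemonsets
```

An empty result means no DaemonSet currently reports unavailable pods; it does not replace an alert with a duration condition.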
How Netdata helps
- Correlate pending pod spikes with node-level resource pressure charts (CPU, memory, disk) to distinguish capacity issues from taint issues.
- Track node condition state transitions (Ready, MemoryPressure, DiskPressure) alongside DaemonSet pod phases to identify node-level blockers.
- Monitor scheduler queue depth and API server mutating latency to detect control-plane delays that slow DaemonSet scheduling.
- Alert on node allocatable saturation, giving early warning before DaemonSet pods are blocked by resource requests.
Related guides
For related control plane and node-level failure patterns, see:
- Kubernetes API server certificate rotation: detection and grace handling
- Kubernetes API server etcd latency: detection and cascading failures
- Kubernetes API server memory pressure: OOM cycle and tuning
- Kubernetes API server rate limiting: APF priority levels and starvation
- Kubernetes API server slow or unresponsive: causes and fixes
- Kubernetes API server watch storm: re-list cascades and connection floods
- Kubernetes conntrack exhaustion: dropped connections under load
- Kubernetes controller-manager leader election failures
- Kubernetes CSI driver failures: detection, recovery, and version skew
- Kubernetes DNS resolution failures inside pods
- Kubernetes eviction cascade: when one node failure takes down the cluster
- Kubernetes kube-proxy iptables sync stall: causes and recovery
Diagnosis flow (Mermaid source):
flowchart TD
    A[DaemonSet pod Pending] --> B{Pod events mention taint?}
    B -->|Yes| C[Compare node taints to pod tolerations]
    B -->|No| D{Events mention insufficient resources?}
    D -->|Yes| E[Check node allocatable vs pod requests]
    D -->|No| F{Node cordoned or NotReady?}
    F -->|Yes| G[Check unschedulable toleration and CNI dependency]
    F -->|No| H[Check nodeSelector and scheduler health]





