Kubernetes kubelet pod CIDR changes: detection and rolling fix
Pod sandbox creation errors, nodes registering without an IP range, and cross-node traffic failures are symptoms of pod CIDR drift. The kubelet does not assign its own pod CIDR; the kube-controller-manager node IPAM controller writes the range into node.spec.podCIDR and node.spec.podCIDRs. When that assignment fails or diverges, the result is usually a slow fracture: some nodes host pods while others cannot, or pods on different nodes lose connectivity.
This guide shows how to detect node-level pod CIDR mismatches, distinguish built-in IPAM failures from CNI-delegated allocation, and apply a rolling fix that reassigns CIDRs without cluster-wide downtime.
What this means
The canonical source of truth for per-node pod IP allocation in built-in IPAM clusters is the Node object. kube-controller-manager watches for nodes without spec.podCIDRs and assigns one from the pool defined by --cluster-cidr. This requires --allocate-node-cidrs=true.
The plural field node.spec.podCIDRs was added for dual-stack support in Kubernetes v1.16. The singular node.spec.podCIDR (a string) is still populated and must match the first entry of podCIDRs, so the controller writes both fields on nodes that receive an allocation.
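To make the allocation model concrete, the sketch below carves a cluster CIDR into per-node ranges with Python's standard ipaddress module. The 10.244.0.0/16 pool and the /24 node mask are illustrative assumptions (the /24 mirrors the IPv4 default of the controller-manager flag --node-cidr-mask-size), and the real controller does not necessarily hand out ranges in this order.

```python
import ipaddress
from itertools import islice

def node_ranges(cluster_cidr: str, node_mask_size: int):
    """Return an iterator over the per-node subnets that fit inside cluster_cidr."""
    pool = ipaddress.ip_network(cluster_cidr)
    return pool.subnets(new_prefix=node_mask_size)

# Assumed example values, not taken from any real cluster.
first_three = [str(n) for n in islice(node_ranges("10.244.0.0/16", 24), 3)]
print(first_three)
```

Each node receives one such range, so the number of ranges in the pool is the upper bound on nodes that can hold an allocation at the same time.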
Pod CIDR drift usually falls into one of three categories:
- A node loses its CIDR entirely and cannot allocate IPs for new pods.
- The cluster-wide --cluster-cidr is changed, but existing nodes retain old assignments while new nodes receive new ones, creating incompatible pod networks.
- The CIDR pool is exhausted, so new or recreated nodes receive no allocation.
On managed cloud providers (EKS, GKE, AKS), node.spec.podCIDR is often empty because the CNI plugin handles IPAM independently. An empty field on these platforms is normal and should not trigger the same response as an empty field in a kubeadm or bare-metal cluster using built-in IPAM.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| IPAM pool exhaustion | “CIDR allocation failed: out of CIDRs” in controller logs; new nodes have no spec.podCIDR | Controller-manager logs for allocation failures |
| Cluster-wide --cluster-cidr drift | Nodes have mismatched CIDR ranges; routing failures between pods on different nodes | kubectl get nodes -o jsonpath for spec.podCIDRs across all nodes |
| Node recreated without retaining CIDR | Node re-registers with empty spec.podCIDR after drain or delete | Node object status and controller-manager allocation logs |
| Managed cloud CNI delegation | node.spec.podCIDR is empty but pods have valid IPs; IPAM handled by cloud CNI | CNI plugin state (for example, AWS VPC CNI, CiliumNode CRD) |
| Out-of-tree node-ipam-controller | ClusterCIDR CRD exists but node.spec.podCIDRs does not match the CRD pools | ClusterCIDR custom resources and controller logs |
Quick checks
# List every node and its assigned CIDR
kubectl get nodes -o custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR
# Show both IPv4 and IPv6 assignments for dual-stack
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.podCIDRs}{"\n"}{end}'
# Count nodes with a CIDR
kubectl get nodes -o jsonpath='{range .items[?(@.spec.podCIDR)]}{.metadata.name}{"\n"}{end}' | wc -l
# Check controller-manager logs for allocation events
kubectl logs -n kube-system kube-controller-manager-<node> | grep -i "allocated cidr"
# Verify controller-manager is running with IPAM enabled (run on a control plane node)
ps aux | grep kube-controller-manager | grep -E 'allocate-node-cidrs|cluster-cidr'
# Check for CNI-specific IPAM state if using Cilium
kubectl get ciliumnode -o json | jq '[.items[] | {node: .metadata.name, cidr: .spec.ipam.podCIDRs[0]}]'
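When the one-liners above become hard to read, the same check can be done on the JSON form of the node list. The following is a minimal sketch: the function name nodes_without_cidr and the sample document are hypothetical, and the sample only imitates the shape of `kubectl get nodes -o json` output.

```python
import json
from typing import List

def nodes_without_cidr(node_list_json: str) -> List[str]:
    """Return names of nodes whose spec.podCIDRs is missing or empty."""
    items = json.loads(node_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if not item.get("spec", {}).get("podCIDRs")
    ]

# Hypothetical sample shaped like `kubectl get nodes -o json` output.
sample = json.dumps({"items": [
    {"metadata": {"name": "node-a"},
     "spec": {"podCIDR": "10.244.0.0/24", "podCIDRs": ["10.244.0.0/24"]}},
    {"metadata": {"name": "node-b"}, "spec": {}},
]})
print(nodes_without_cidr(sample))
```

To run it against a live cluster, replace the sample with the text read from standard input and pipe the kubectl output into the script. On CNI-delegated clusters an empty result list is not required for health, so interpret the output against your IPAM model.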
How to diagnose it
- Confirm your cluster’s IPAM model. If you are on EKS, GKE, or AKS, or if you explicitly disabled --allocate-node-cidrs, node.spec.podCIDR may be empty by design. Check your CNI plugin’s IPAM state instead of the Node object. If you are on kubeadm or bare metal with built-in IPAM, an empty field is abnormal.
- Audit all nodes for CIDR consistency. Run the jsonpath checks above. In a healthy built-in IPAM cluster, every node should have a non-empty spec.podCIDRs. Look for nodes with empty fields or CIDRs that fall outside the current --cluster-cidr range.
- Check kube-controller-manager logs for allocation errors. Look for “out of CIDRs” or repeated allocation attempts for the same node. If the controller is exhausted, no new CIDRs can be assigned until the pool is expanded or old allocations are freed.
- Verify controller-manager startup flags. The flag --allocate-node-cidrs must be true, and --cluster-cidr must match your operational intent. If the flag was changed on a live cluster without a migration plan, old nodes will hold stale CIDRs.
- For dual-stack clusters, inspect both IPv4 and IPv6 entries in spec.podCIDRs. A missing stack or partial assignment can cause pods to come up with only one family.
- Correlate pod sandbox failures with node CIDR status. If pods on a specific node are stuck in ContainerCreating with network setup errors, and that node lacks a spec.podCIDR, the root cause is likely IPAM failure.
- If using an out-of-tree node-ipam-controller, inspect ClusterCIDR CRDs. The controller allocates only to new nodes; it does not retroactively rewrite existing assignments. Migrating to a new pool requires draining and recreating nodes.
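The dual-stack check in the steps above is easy to get wrong by eye, so here is a small sketch that reports which address family is absent from a node's spec.podCIDRs. The function name missing_families and the example ranges are hypothetical.

```python
import ipaddress
from typing import FrozenSet, List, Set

def missing_families(pod_cidrs: List[str],
                     expected: FrozenSet[int] = frozenset({4, 6})) -> Set[int]:
    """Return the IP versions (4 and/or 6) that have no entry in pod_cidrs."""
    present = {ipaddress.ip_network(cidr).version for cidr in pod_cidrs}
    return set(expected) - present

# Assumed example: a node that received only an IPv4 range.
print(missing_families(["10.244.1.0/24"]))
```

An empty set means both families are present. For single-stack clusters, pass expected=frozenset({4}) or frozenset({6}) so the missing family is not reported as a fault.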
flowchart TD
A[Pod sandbox failures or CIDR warnings] --> B{node.spec.podCIDR empty?}
B -->|Yes| C{Managed cloud provider?}
B -->|No| D[Check CIDR consistency across nodes]
C -->|Yes| E[Inspect CNI plugin IPAM state]
C -->|No| F[Check controller-manager logs for allocation errors]
D --> G{Mismatched or stale CIDRs?}
G -->|Yes| H[Plan rolling drain/delete/restart]
G -->|No| I[Monitor for pool exhaustion]
F --> J{Out of CIDRs?}
J -->|Yes| K[Free pool or plan CIDR migration]
J -->|No| L[Restart kubelet to trigger re-registration]
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Node spec.podCIDR / spec.podCIDRs | Canonical per-node allocation state in built-in IPAM | Empty or missing field on any node |
| Controller-manager CIDR allocation logs | Direct evidence of pool exhaustion or assignment failure | “out of CIDRs” or timeout errors |
| Pod sandbox creation failures | Downstream symptom of missing or invalid CIDR | FailedCreatePodSandBox events clustered on one node |
| Node Ready status | Node may flip to NotReady if kubelet cannot reconcile networking | Ready=False or Unknown correlated with IPAM gaps |
| etcd object count (nodes) | Node churn drives re-allocation pressure | Spike in node object creation/deletion rate |
| CNI IPAM events (cloud/provider-specific) | Managed CNIs track allocation independently of the Node object | IP allocation failures in CNI DaemonSet logs |
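As an illustration of the "clustered on one node" warning sign, the sketch below counts FailedCreatePodSandBox events per node and keeps the nodes that reach a threshold. The input shape (node name, event reason), the function name, and the threshold of 3 are assumptions for the example, not values from any tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def sandbox_failure_hotspots(events: Iterable[Tuple[str, str]],
                             threshold: int = 3) -> Dict[str, int]:
    """Count FailedCreatePodSandBox events per node; keep nodes at or above threshold."""
    counts = Counter(node for node, reason in events
                     if reason == "FailedCreatePodSandBox")
    return {node: n for node, n in counts.items() if n >= threshold}

# Hypothetical event stream as (node, reason) pairs.
events = [
    ("node-a", "FailedCreatePodSandBox"),
    ("node-a", "FailedCreatePodSandBox"),
    ("node-a", "FailedCreatePodSandBox"),
    ("node-b", "FailedCreatePodSandBox"),
    ("node-b", "Pulled"),
]
print(sandbox_failure_hotspots(events))
```

A node that appears here and also lacks a pod CIDR is a strong candidate for the IPAM fixes below; a node that appears here with a valid CIDR points toward the CNI plugin or the container runtime instead.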
Fixes
If the cause is IPAM pool exhaustion
There is no supported in-place resize of --cluster-cidr; the flag is read at controller-manager startup. If you hit “out of CIDRs”, you have two paths:
- Immediate relief: identify unused nodes, drain them, and delete their Node objects to free CIDRs back to the pool. This is destructive to workloads and should be done only if the cluster has excess capacity.
- Proper fix: plan a maintenance window to migrate to a wider cluster CIDR. Recreate nodes under the new configuration to receive fresh allocations.
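Before choosing between the two paths, it helps to know how many per-node ranges the pool can hold at all. The arithmetic is sketched below; the cluster CIDR, the node mask size, and the node counts are assumed example values, and the function names are hypothetical.

```python
import ipaddress

def pool_capacity(cluster_cidr: str, node_mask_size: int) -> int:
    """Number of per-node ranges of size node_mask_size that fit in cluster_cidr."""
    pool = ipaddress.ip_network(cluster_cidr)
    if node_mask_size < pool.prefixlen:
        raise ValueError("node mask must not be shorter than the cluster prefix")
    return 2 ** (node_mask_size - pool.prefixlen)

def remaining_ranges(cluster_cidr: str, node_mask_size: int, allocated: int) -> int:
    """Ranges still free, given how many nodes currently hold an allocation."""
    return pool_capacity(cluster_cidr, node_mask_size) - allocated

# Assumed example: a /22 pool split into /24 node ranges with 4 nodes allocated.
print(pool_capacity("10.244.0.0/22", 24), remaining_ranges("10.244.0.0/22", 24, 4))
```

When the remaining count is zero, deleting unused Node objects buys time, but only a wider cluster CIDR removes the ceiling.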
If the cause is a node that lost its CIDR
Warning: This procedure evicts workloads and deletes the Node object. Do not run it against multiple nodes concurrently. If your cloud provider manages node lifecycle (for example, managed node groups), verify that deleting the Node object will not trigger instance termination.
Apply the fix to one node at a time:
- Cordon and drain the target node:
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
- Delete the Node object:
kubectl delete node <node>
- Restart kubelet on the target host:
systemctl restart kubelet
- Wait for the node to re-register and for the controller-manager to assign a fresh CIDR. Verify:
kubectl get node <node> -o jsonpath='{.spec.podCIDR}'
- Uncordon the node once it reports Ready.
This changes the node’s pod CIDR. Pods that were using the old CIDR are already evicted; new pods receive IPs from the fresh range. Ensure the workload tolerates rescheduling before you start.
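If you script the procedure, keeping the per-node steps as an explicit, ordered plan makes it easier to review before anything runs. The sketch below only builds and prints the commands from the steps above; it executes nothing. The function name rolling_fix_plan and the node name worker-1 are placeholders, and how you reach the host for the kubelet restart is left open.

```python
from typing import List

def rolling_fix_plan(node: str) -> List[List[str]]:
    """Return the ordered commands for one node; nothing is executed here."""
    return [
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        ["kubectl", "delete", "node", node],
        # This step must run on the target host itself, not from the admin machine.
        ["systemctl", "restart", "kubelet"],
        ["kubectl", "get", "node", node, "-o", "jsonpath={.spec.podCIDR}"],
        ["kubectl", "uncordon", node],
    ]

for cmd in rolling_fix_plan("worker-1"):  # "worker-1" is a placeholder name
    print(" ".join(cmd))
```

Generating the plan one node at a time, and confirming a non-empty CIDR before moving to the next node, keeps the fix rolling instead of concurrent.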
If the cause is configuration drift after changing --cluster-cidr
Do not allow old and new CIDRs to coexist without explicit routing. The safest recovery is to rotate nodes:
- Cordon all nodes carrying the old CIDR.
- Drain them one at a time.
- Delete the Node object after each drain.
- Let the node re-register with the new controller-manager configuration to receive a CIDR from the updated pool.
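To decide which nodes need rotation, compare each node's current pod CIDR with the new cluster CIDR. This is a minimal sketch using the standard ipaddress module; the function name stale_nodes, the node names, and the ranges are assumed example values.

```python
import ipaddress
from typing import Dict, List

def stale_nodes(node_cidrs: Dict[str, str], new_cluster_cidr: str) -> List[str]:
    """Return nodes whose pod CIDR is not inside new_cluster_cidr, sorted by name."""
    pool = ipaddress.ip_network(new_cluster_cidr)
    stale = []
    for name, cidr in node_cidrs.items():
        net = ipaddress.ip_network(cidr)
        # subnet_of raises TypeError across IP versions, so check the version first.
        if net.version != pool.version or not net.subnet_of(pool):
            stale.append(name)
    return sorted(stale)

# Assumed example: the pool moved from 10.244.0.0/16 to 10.200.0.0/16.
current = {"node-a": "10.244.1.0/24", "node-b": "10.200.3.0/24"}
print(stale_nodes(current, "10.200.0.0/16"))
```

The returned list is the rotation queue: work through it one node at a time with the drain, delete, and re-register steps.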
If the cause is managed provider CNI behavior
On platforms that delegate IPAM to the CNI plugin, do not manually edit node.spec.podCIDR. Instead:
- For AWS VPC CNI, check the aws-node DaemonSet and instance ENI/IP limits.
- For Cilium, inspect the CiliumNode CRD and verify that spec.ipam.podCIDRs is consistent.
- Follow your provider’s documented expansion path.
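For the ENI/IP limit check, a rough per-node pod ceiling can be worked out by hand. The sketch below uses the commonly cited formula for the AWS VPC CNI in its default secondary-IP mode; the formula's applicability to your configuration (for example with prefix delegation enabled) and the instance figures are assumptions to verify against AWS documentation.

```python
def max_pods_secondary_ip_mode(enis: int, ipv4_per_eni: int) -> int:
    """Approximate pod ceiling: one address per ENI is taken by the ENI itself,
    and the + 2 accounts for pods that use host networking."""
    return enis * (ipv4_per_eni - 1) + 2

# Assumed example: an instance type with 3 ENIs and 10 IPv4 addresses per ENI.
print(max_pods_secondary_ip_mode(3, 10))
```

If the pod count on a node is close to this ceiling, sandbox failures there are more likely an instance limit than a pod CIDR problem.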
Prevention
- Monitor node.spec.podCIDR and spec.podCIDRs as part of your cluster health baseline. Alert when any built-in IPAM node lacks a CIDR.
- Do not treat --cluster-cidr as a value that can be changed in-place. If you must resize it, write a runbook that recreates nodes.
- Size your initial --cluster-cidr with at least 2x headroom for node growth.
- Document whether your environment uses built-in IPAM or CNI-delegated IPAM in your on-call runbook so responders do not chase an empty node.spec.podCIDR that is expected.
- For out-of-tree IPAM, monitor ClusterCIDR CRD capacity and plan node rotations for pool migrations.
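The 2x headroom guideline can be turned into a prefix length with a little integer arithmetic. This is a sketch under stated assumptions: the function name cluster_prefix_for is hypothetical, the node mask defaults to /24, and the node counts are examples.

```python
import math

def cluster_prefix_for(expected_nodes: int, node_mask_size: int = 24,
                       headroom: float = 2.0) -> int:
    """Longest cluster prefix whose pool holds expected_nodes * headroom node ranges."""
    if expected_nodes < 1:
        raise ValueError("expected_nodes must be at least 1")
    needed = math.ceil(expected_nodes * headroom)
    # Bits required to index `needed` ranges, computed with exact integer math.
    bits = (needed - 1).bit_length()
    prefix = node_mask_size - bits
    if prefix < 0:
        raise ValueError("node count does not fit with this node mask size")
    return prefix

# Assumed example: 100 expected nodes with /24 node ranges and 2x headroom.
print(cluster_prefix_for(100))
```

A smaller prefix number means a larger pool, so round toward the smaller number when in doubt; resizing later requires recreating nodes.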
How Netdata helps
- Per-node pod counts and sandbox creation latencies surface nodes that are approaching IP exhaustion before they hit hard limits.
- Correlating node NotReady transitions with pod scheduling failures distinguishes IPAM problems from general kubelet issues.
- Network interface and CNI metrics on the node show when IP allocation slows or fails at the host level, bridging the gap between control-plane IPAM and runtime behavior.






