Kubernetes kubelet pod CIDR changes: detection and rolling fix

Pod sandbox creation errors, nodes registering without an IP range, and cross-node traffic failures are symptoms of pod CIDR drift. The kubelet does not assign its own pod CIDR; the kube-controller-manager node IPAM controller writes the range into node.spec.podCIDR and node.spec.podCIDRs. When that assignment fails or diverges, the result is usually a slow fracture: some nodes host pods while others cannot, or pods on different nodes lose connectivity.

This guide shows how to detect node-level pod CIDR mismatches, distinguish built-in IPAM failures from CNI-delegated allocation, and apply a rolling fix that reassigns CIDRs without cluster-wide downtime.

What this means

The canonical source of truth for per-node pod IP allocation in built-in IPAM clusters is the Node object. kube-controller-manager watches for nodes without spec.podCIDRs and assigns one from the pool defined by --cluster-cidr. This requires --allocate-node-cidrs=true.

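The plural field node.spec.podCIDRs was added for dual-stack support, and the singular node.spec.podCIDR (a single string) remains in the API. When both are set, the first entry of podCIDRs must match podCIDR. The controller writes both fields on nodes that receive an allocation.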

Pod CIDR drift usually falls into one of three categories:

A node loses its CIDR entirely and cannot allocate IPs for new pods.
The cluster-wide --cluster-cidr is changed, but existing nodes retain old assignments while new nodes receive new ones, creating incompatible pod networks.
The CIDR pool is exhausted, so new or recreated nodes receive no allocation.

On managed cloud providers (EKS, GKE, AKS), node.spec.podCIDR can be empty when the CNI plugin handles IPAM independently, as with the AWS VPC CNI on EKS. An empty field in that setup is normal and should not trigger the same response as an empty field in a kubeadm or bare-metal cluster using built-in IPAM.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| IPAM pool exhaustion | “CIDR allocation failed: out of CIDRs” in controller logs; new nodes have no `spec.podCIDR` | Controller-manager logs for allocation failures |
| Cluster-wide `--cluster-cidr` drift | Nodes have mismatched CIDR ranges; routing failures between pods on different nodes | `kubectl get nodes -o jsonpath` for `spec.podCIDRs` across all nodes |
| Node recreated without retaining CIDR | Node re-registers with empty `spec.podCIDR` after drain or delete | Node object status and controller-manager allocation logs |
| Managed cloud CNI delegation | `node.spec.podCIDR` is empty but pods have valid IPs; IPAM handled by cloud CNI | CNI plugin state (for example, AWS VPC CNI, CiliumNode CRD) |
| Out-of-tree node-ipam-controller | `ClusterCIDR` CRD exists but `node.spec.podCIDRs` does not match the CRD pools | ClusterCIDR custom resources and controller logs |

Quick checks

# List every node and its assigned CIDR
kubectl get nodes -o custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR

# Show both IPv4 and IPv6 assignments for dual-stack
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.podCIDRs}{"\n"}{end}'

# Count nodes with a CIDR (kubectl jsonpath has no "if"; custom-columns prints <none> for unset fields)
kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR | awk '$2 != "<none>"' | wc -l

# Check controller-manager logs for CIDR allocation messages (exact wording varies by version, so match broadly)
kubectl logs -n kube-system kube-controller-manager-<node> | grep -i "cidr"

# Verify controller-manager is running with IPAM enabled (run on a control plane node)
ps aux | grep kube-controller-manager | grep -E 'allocate-node-cidrs|cluster-cidr'

# Check for CNI-specific IPAM state if using Cilium
kubectl get ciliumnode -o json | jq '[.items[] | {node: .metadata.name, cidr: .spec.ipam.podCIDRs[0]}]'

How to diagnose it

1. Confirm your cluster’s IPAM model. If you are on EKS, GKE, or AKS, or if you explicitly disabled --allocate-node-cidrs, node.spec.podCIDR may be empty by design. Check your CNI plugin’s IPAM state instead of the Node object. If you are on kubeadm or bare metal with built-in IPAM, an empty field is abnormal.
2. Audit all nodes for CIDR consistency. Run the jsonpath checks above. In a healthy built-in IPAM cluster, every node should have a non-empty spec.podCIDRs. Look for nodes with empty fields or CIDRs that fall outside the current --cluster-cidr range.
3. Check kube-controller-manager logs for allocation errors. Look for “out of CIDRs” or repeated allocation attempts for the same node. If the controller is exhausted, no new CIDRs can be assigned until the pool is expanded or old allocations are freed.
4. Verify controller-manager startup flags. The flag --allocate-node-cidrs must be true, and --cluster-cidr must match your operational intent. If the flag was changed on a live cluster without a migration plan, old nodes will hold stale CIDRs.
5. For dual-stack clusters, inspect both IPv4 and IPv6 entries in spec.podCIDRs. A missing stack or partial assignment can cause pods to come up with only one family.
6. Correlate pod sandbox failures with node CIDR status. If pods on a specific node are stuck in ContainerCreating with network setup errors, and that node lacks a spec.podCIDR, the root cause is likely IPAM failure.
7. If using an out-of-tree node-ipam-controller, inspect ClusterCIDR CRDs. The controller allocates only to new nodes; it does not retroactively rewrite existing assignments. Migrating to a new pool requires draining and recreating nodes.
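The consistency audit in the steps above can be partly scripted. The following is a minimal sketch, not a definitive tool: it handles IPv4 only, the function names `ipv4_cidr_within` and `audit_node_cidrs` are hypothetical, and the range `10.244.0.0/16` in the usage comment is a placeholder for whatever your --cluster-cidr actually is.

```shell
# Sketch: flag nodes whose IPv4 pod CIDR is missing or outside the cluster CIDR.
# Assumptions: IPv4 only; you supply the cluster CIDR yourself.

# ipv4_cidr_within NODE_CIDR CLUSTER_CIDR
# Exit status 0 if NODE_CIDR lies inside CLUSTER_CIDR, 1 otherwise.
ipv4_cidr_within() {
  awk -v n="$1" -v c="$2" 'BEGIN {
    split(n, np, "/"); split(c, cp, "/")
    split(np[1], a, "."); split(cp[1], b, ".")
    nint = ((a[1] * 256 + a[2]) * 256 + a[3]) * 256 + a[4]
    cint = ((b[1] * 256 + b[2]) * 256 + b[3]) * 256 + b[4]
    block = 2 ^ (32 - cp[2])   # number of addresses in the cluster range
    # Inside means: at least as specific, and in the same cluster-sized block.
    ok = ((np[2] + 0) >= (cp[2] + 0)) && (int(nint / block) == int(cint / block))
    exit(ok ? 0 : 1)
  }'
}

# audit_node_cidrs CLUSTER_CIDR
# Reads "name cidr" lines on stdin and prints only the problem nodes.
audit_node_cidrs() {
  while read -r name cidr; do
    if [ -z "$cidr" ] || [ "$cidr" = "<none>" ]; then
      echo "$name MISSING"
    elif ! ipv4_cidr_within "$cidr" "$1"; then
      echo "$name OUTSIDE $cidr"
    fi
  done
}

# Usage (placeholder range):
# kubectl get nodes --no-headers \
#   -o custom-columns=NAME:.metadata.name,CIDR:.spec.podCIDR | audit_node_cidrs 10.244.0.0/16
```

An empty result means every node has an IPv4 range inside the pool. On CNI-delegated clusters, every node reporting MISSING is expected rather than a fault.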

flowchart TD
    A[Pod sandbox failures or CIDR warnings] --> B{node.spec.podCIDR empty?}
    B -->|Yes| C{Managed cloud provider?}
    B -->|No| D[Check CIDR consistency across nodes]
    C -->|Yes| E[Inspect CNI plugin IPAM state]
    C -->|No| F[Check controller-manager logs for allocation errors]
    D --> G{Mismatched or stale CIDRs?}
    G -->|Yes| H[Plan rolling drain/delete/restart]
    G -->|No| I[Monitor for pool exhaustion]
    F --> J{Out of CIDRs?}
    J -->|Yes| K[Free pool or plan CIDR migration]
    J -->|No| L[Restart kubelet to trigger re-registration]

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Node `spec.podCIDR` / `spec.podCIDRs` | Canonical per-node allocation state in built-in IPAM | Empty or missing field on any node |
| Controller-manager CIDR allocation logs | Direct evidence of pool exhaustion or assignment failure | “out of CIDRs” or timeout errors |
| Pod sandbox creation failures | Downstream symptom of missing or invalid CIDR | `FailedCreatePodSandBox` events clustered on one node |
| Node Ready status | Node may flip to NotReady if kubelet cannot reconcile networking | Ready=False or Unknown correlated with IPAM gaps |
| etcd object count (nodes) | Node churn drives re-allocation pressure | Spike in node object creation/deletion rate |
| CNI IPAM events (cloud/provider-specific) | Managed CNIs track allocation independently of the Node object | IP allocation failures in CNI DaemonSet logs |
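To see whether `FailedCreatePodSandBox` events cluster on one node, a small counting helper is enough. This is a sketch under one assumption worth checking in your cluster: that kubelet-emitted events carry the node name in `.source.host`. The helper name `count_by_node` is hypothetical.

```shell
# count_by_node: reads one node name per line on stdin,
# prints "count node" pairs, highest count first.
count_by_node() {
  grep -v '^[[:space:]]*$' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Usage (assumes the node name is recorded in .source.host):
# kubectl get events --all-namespaces --field-selector reason=FailedCreatePodSandBox \
#   -o jsonpath='{range .items[*]}{.source.host}{"\n"}{end}' | count_by_node
```

A single node dominating the list points at that node's allocation state; an even spread suggests a cluster-wide cause such as pool exhaustion or a CNI problem.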

Fixes

If the cause is IPAM pool exhaustion

There is no supported in-place resize of --cluster-cidr; the flag is read at controller-manager startup. If you hit “out of CIDRs”, you have two paths:

Immediate relief: identify unused nodes, drain them, and delete their Node objects to free CIDRs back to the pool. This is destructive to workloads and should be done only if the cluster has excess capacity.
Proper fix: plan a maintenance window to migrate to a wider cluster CIDR. Recreate nodes under the new configuration to receive fresh allocations.

If the cause is a node that lost its CIDR

Warning: This procedure evicts workloads and deletes the Node object. Do not run it against multiple nodes concurrently. If your cloud provider manages node lifecycle (for example, managed node groups), verify that deleting the Node object will not trigger instance termination.

Apply the fix to one node at a time:

1. Cordon and drain the target node:
```
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
```
2. Delete the Node object:
```
kubectl delete node <node>
```
3. Restart kubelet on the target host:
```
systemctl restart kubelet
```
4. Wait for the node to re-register and for the controller-manager to assign a fresh CIDR. Verify:
```
kubectl get node <node> -o jsonpath='{.spec.podCIDR}'
```
5. Once the node reports Ready, confirm it is schedulable. The cordon lived on the deleted Node object, so a re-registered node normally starts schedulable; run kubectl uncordon <node> only if it still shows SchedulingDisabled.
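The verification step can be turned into a polling loop so you do not move on to the next node too early. This is a minimal sketch: the function name `wait_for_pod_cidr` and its default timeout and interval are assumptions, not values taken from Kubernetes.

```shell
# wait_for_pod_cidr NODE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls node.spec.podCIDR until it is non-empty.
# Prints the CIDR and returns 0 on success, returns 1 on timeout.
wait_for_pod_cidr() {
  node=$1
  timeout=${2:-120}    # assumed default, tune for your cluster
  interval=${3:-5}     # assumed default; must be greater than 0
  elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    # The lookup fails while the Node object does not exist yet; treat that as "no CIDR".
    cidr=$(kubectl get node "$node" -o jsonpath='{.spec.podCIDR}' 2>/dev/null)
    if [ -n "$cidr" ]; then
      echo "$cidr"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out waiting for a pod CIDR on $node" >&2
  return 1
}

# Usage:
# wait_for_pod_cidr <node> 180 5 && echo "allocation confirmed"
```

A timeout here is a signal to stop the rollout and read the controller-manager logs rather than to retry the same procedure on another node.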

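This can change the node’s pod CIDR: the controller assigns a range that is free at re-registration, which may or may not be the previous one. Pods that were using the old range are already evicted; new pods receive IPs from the newly assigned range. Ensure the workload tolerates rescheduling before you start.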

If the cause is configuration drift after changing `--cluster-cidr`

Do not allow old and new CIDRs to coexist without explicit routing. The safest recovery is to rotate nodes:

1. Cordon all nodes carrying the old CIDR.
2. Drain them one at a time.
3. Delete the Node object after each drain.
4. Restart kubelet on the host so the node re-registers under the new controller-manager configuration and receives a CIDR from the updated pool.
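Before touching a live cluster, it can help to print the rotation as a reviewable plan. The sketch below only echoes commands and runs nothing; `print_rotation_plan` is a hypothetical helper, and the list of stale node names is something you supply, for example from your own audit of which nodes still carry the old range.

```shell
# print_rotation_plan: reads stale node names on stdin (one per line)
# and prints the commands in the order they should be run. Nothing is executed.
print_rotation_plan() {
  nodes=$(grep -v '^[[:space:]]*$')
  # Phase 1: cordon every stale node first, so evicted pods
  # are not rescheduled onto another node that is about to be rotated.
  for n in $nodes; do
    echo "kubectl cordon $n"
  done
  # Phase 2: rotate strictly one node at a time.
  for n in $nodes; do
    echo "kubectl drain $n --ignore-daemonsets --delete-emptydir-data"
    echo "kubectl delete node $n"
    echo "# on host $n: systemctl restart kubelet, then verify the new CIDR before continuing"
  done
}

# Usage:
# printf 'worker-1\nworker-2\n' | print_rotation_plan
```

Printing instead of executing keeps the decision to proceed after each node with the operator, which matches the one-node-at-a-time warning in the previous section.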

If the cause is managed provider CNI behavior

On platforms that delegate IPAM to the CNI plugin, do not manually edit node.spec.podCIDR. Instead:

For AWS VPC CNI, check the aws-node DaemonSet and instance ENI/IP limits.
For Cilium, inspect the CiliumNode CRD and verify that spec.ipam.podCIDRs is consistent.
Follow your provider’s documented expansion path.

Prevention

Monitor node.spec.podCIDR and spec.podCIDRs as part of your cluster health baseline. Alert when any built-in IPAM node lacks a CIDR.
Do not treat --cluster-cidr as a value that can be changed in-place. If you must resize it, write a runbook that recreates nodes.
Size your initial --cluster-cidr with at least 2x headroom for node growth.
Document whether your environment uses built-in IPAM or CNI-delegated IPAM in your on-call runbook so responders do not chase an empty node.spec.podCIDR that is expected.
For out-of-tree IPAM, monitor ClusterCIDR CRD capacity and plan node rotations for pool migrations.
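The headroom rule can be checked with simple arithmetic: the pool holds 2^(node mask − cluster prefix) per-node ranges. The sketch below assumes IPv4 and a per-node mask set by the kube-controller-manager flag --node-cidr-mask-size (not discussed above; its IPv4 default is 24). The helper names are hypothetical.

```shell
# node_cidr_capacity CLUSTER_PREFIX NODE_MASK
# Prints how many per-node ranges fit in the cluster CIDR.
node_cidr_capacity() {
  echo $(( 1 << ($2 - $1) ))
}

# has_headroom CLUSTER_PREFIX NODE_MASK EXPECTED_NODES
# Exit status 0 if capacity is at least 2x the expected node count.
has_headroom() {
  [ "$(node_cidr_capacity "$1" "$2")" -ge $(( $3 * 2 )) ]
}

# Usage: a /16 cluster CIDR split into /24 node ranges
# node_cidr_capacity 16 24        # prints 256
# has_headroom 16 24 100 && echo "enough headroom for 100 nodes"
```

With these numbers, 100 nodes pass the 2x check (200 ≤ 256) while 150 nodes do not (300 > 256), which would argue for a wider cluster CIDR or smaller per-node ranges at design time.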

How Netdata helps

Per-node pod counts and sandbox creation latencies surface nodes that are approaching IP exhaustion before they hit hard limits.
Correlating node NotReady transitions with pod scheduling failures distinguishes IPAM problems from general kubelet issues.
Network interface and CNI metrics on the node show when IP allocation slows or fails at the host level, bridging the gap between control-plane IPAM and runtime behavior.
