Kubernetes kube-proxy and CNI rule conflicts: detection and fix
Pods stuck in ContainerCreating while the node reports Ready. Services time out despite existing endpoints. Intermittent connection resets during rolling updates. These symptoms usually indicate a conflict between kube-proxy and the container network interface (CNI) plugin over netfilter rules. Both components program the same kernel tables, compete for the same locks, and can corrupt each other’s chains. The data plane degrades while the control plane stays healthy.
This article covers how kube-proxy and CNI plugins interact on the netfilter path, the failure modes that arise when they conflict, and how to distinguish a rule conflict from a generic network outage.
What this means
kube-proxy implements Kubernetes Services by programming netfilter rules. In iptables mode, it writes chains such as KUBE-SERVICES, KUBE-SVC-*, and KUBE-SEP-* into the nat and filter tables. In ipvs mode, it still maintains a small iptables ruleset for masquerading and filtering. CNI plugins such as Calico, Cilium, Flannel, Weave, and AWS VPC CNI independently write their own chains into the same tables to handle pod ingress, egress, IP masquerading, and NetworkPolicy enforcement.
Because both subsystems share the xtables lock and the same kernel tables, three conflict classes appear in production:
- Lock contention. kube-proxy runs `iptables-restore` to atomically replace its chains. That operation holds the xtables lock exclusively. If a CNI plugin or NetworkPolicy controller tries to update rules concurrently, it blocks until kube-proxy finishes. When rule counts are high, the lock can be held for seconds, delaying pod sandbox creation and service updates.
- Rule corruption. Containerized CNI agents and host-level iptables tooling may use different iptables versions or backends. A save/modify/restore cycle from one component can strip match conditions that another component relies on, turning targeted rules into broad DROP statements.
- Sync lag under churn. During rolling updates or horizontal scaling, endpoint changes arrive faster than kube-proxy can program them. If the CNI stack is also rewriting rules for new pods, the combined churn extends the window during which traffic is routed to dead endpoints or new pods lack connectivity.
The result is pod network isolation: the kubelet reports the node as Ready, but new pods cannot start, existing pods lose Service connectivity, and packets are silently dropped in the kernel.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| xtables lock contention | kube-proxy sync duration spikes; CNI plugin timeouts; pods stuck in ContainerCreating | kubeproxy_sync_proxy_rules_duration_seconds p99 |
| iptables version mismatch between host and containers | Chains lose match conditions after a component restart or upgrade, producing bare DROP rules | iptables -t filter -S for malformed KUBE-* or CNI chains |
| Rapid endpoint and pod churn | Sync duration climbs during deployments; endpoint rules lag behind API state | kubeproxy_sync_proxy_rules_endpoint_changes_pending |
| CNI plugin crash or eviction | New pods fail sandbox creation; existing pods retain network but new pods do not | CNI DaemonSet pod status on the node |
| NetworkPolicy rule bloat | Multiple controllers compete for the filter table; sync latency grows linearly with rule count | iptables -t filter -S \| wc -l |
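A raw rule count does not show which component owns the rules. The following is a minimal sketch that tallies rules by chain-name prefix. It assumes the prefix identifies the owner (for example `KUBE` for kube-proxy and `cali` for Calico), which is a naming convention rather than a guarantee, and the function name `rules_by_prefix` is hypothetical.

```shell
# Tally "-A <chain>" rules by chain-name prefix (the text before the first "-").
# Reads `iptables-save` or `iptables -S` output on stdin.
# Assumption: the prefix identifies the owner (KUBE = kube-proxy, cali = Calico).
rules_by_prefix() {
  awk '$1 == "-A" {
         split($2, parts, "-")      # KUBE-SVC-ABC -> parts[1] = "KUBE"
         count[parts[1]]++
       }
       END { for (p in count) print count[p], p }' | sort -rn
}

# Usage on a node (needs root):
#   iptables-save | rules_by_prefix
```

Built-in chains such as INPUT or FORWARD show up under their own names, so the output also shows how many rules sit outside any component-specific chain.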
Quick checks
Run these checks on the affected node before restarting anything.
# Check kube-proxy sync latency and last successful sync timestamp
curl -s localhost:10249/metrics | grep -E "kubeproxy_sync_proxy_rules_duration_seconds|kubeproxy_sync_proxy_rules_last_timestamp_seconds"
# Check kube-proxy logs for xtables lock contention
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=500 | grep -i "xtables lock"
# Check CNI pod health on the node
kubectl get pods -n kube-system -l k8s-app=calico-node --field-selector spec.nodeName=$(hostname)
# Count kube-proxy iptables rules
iptables -t nat -S | grep -c "^-A KUBE-"
# Check conntrack table utilization (percentage)
echo $(( 100 * $(cat /proc/sys/net/netfilter/nf_conntrack_count) / $(cat /proc/sys/net/netfilter/nf_conntrack_max) ))
# List pods stuck in ContainerCreating on this node
kubectl get pods --all-namespaces --field-selector spec.nodeName=$(hostname),status.phase=Pending -o json | jq -r '.items[] | select(.status.containerStatuses[]?.state.waiting.reason == "ContainerCreating") | .metadata.namespace + "/" + .metadata.name'
If kube-proxy is running in ipvs mode, replace the iptables rule count with:
# Count IPVS virtual servers and real servers
ipvsadm -Ln | grep -c "^TCP\|^UDP"
# Require an address after "->" so the "-> RemoteAddress:Port" header line is not counted
ipvsadm -Ln | grep -c '^\s*-> [[0-9]'
If the node is experiencing CNI-related sandbox failures, check the CNI-specific logs. For Calico, read the calico-node pod logs. For containerd-based runtimes, check journalctl -u containerd for CNI invocation errors.
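The last-sync timestamp from the first quick check is easier to judge as an age in seconds. The following is a minimal sketch, assuming the metrics endpoint returns Prometheus text format and the metric carries no labels. The function name `sync_age_seconds` is hypothetical, and the current time is passed in as an argument so the function can be exercised against saved output.

```shell
# Print the age in seconds of kube-proxy's last successful rule sync.
# Reads Prometheus text-format metrics on stdin; $1 is the current Unix time.
# Exits non-zero if the metric is not present in the input.
sync_age_seconds() {
  awk -v now="$1" '
    $1 == "kubeproxy_sync_proxy_rules_last_timestamp_seconds" {
      printf "%d\n", now - $2     # the value may be in exponent notation
      found = 1
    }
    END { if (!found) exit 1 }'
}

# Usage on a node:
#   curl -s localhost:10249/metrics | sync_age_seconds "$(date +%s)"
```

A non-zero exit status means the metric was missing, which usually points at a wrong port or a kube-proxy that is not serving metrics rather than at a stale sync.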
How to diagnose it
1. Confirm kube-proxy sync duration is elevated. Query `kubeproxy_sync_proxy_rules_duration_seconds` from the metrics endpoint. In iptables mode, healthy syncs typically finish in under one second for clusters with fewer than one thousand Services. If p99 exceeds five seconds, the sync loop is stressed.
2. Check for xtables lock errors in kube-proxy logs. Look for messages such as `Another app is currently holding the xtables lock`. If these appear more than once per minute, the node has significant lock contention between kube-proxy and another iptables consumer.
3. Correlate sync spikes with pod lifecycle events. Check whether endpoint churn is outpacing sync capacity by comparing sync duration with the rate of `kubeproxy_sync_proxy_rules_endpoint_changes_total`. Cross-reference with Deployment rollout timestamps.
4. Verify CNI plugin health and rule activity. Check whether the CNI DaemonSet pod on the node is running or restarting. If the CNI plugin is alive, inspect its logs for `iptables-restore` timeouts or netlink errors. These indicate the CNI plugin is waiting for the xtables lock or failing to apply its own rules.
5. Inspect iptables chains for corruption or gaps. Run `iptables -t nat -S` and `iptables -t filter -S`. Look for KUBE-* chains that are missing expected match conditions, such as a bare `-j DROP` in `KUBE-FIREWALL` without a preceding mark match condition. Also verify that expected `KUBE-SVC-*` chains exist for active Services.
6. Check if the conntrack table is under pressure. High conntrack utilization combined with sync lag can cause connection tracking entries to be dropped before rules are fully updated. This produces intermittent connection timeouts that look like routing failures but are actually table exhaustion.
7. Determine whether the issue is node-local or cluster-wide. If only one node is affected, the cause is usually local lock contention or a crashed CNI pod. If all nodes show elevated sync duration simultaneously, look for a cluster-wide event such as a mass Deployment rollout or an API server watch disruption.
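The "more than once per minute" test for lock errors can be applied to a log stream with a short filter. This is a sketch under one stated assumption: kube-proxy logs use klog's default text header, where the first field is the severity letter plus month and day and the second field is the time of day. The function name `lock_errors_per_minute` is hypothetical.

```shell
# Count "xtables lock" log lines per minute.
# Assumes klog's default text header, e.g. "E0312 10:15:42.123456 ...":
# field 1 is severity + MMDD, field 2 is the time of day.
lock_errors_per_minute() {
  grep -i "xtables lock" |
    awk '{ minute = substr($1, 2) " " substr($2, 1, 5); n[minute]++ }
         END { for (m in n) print m, n[m] }' |
    sort
}

# Usage:
#   kubectl logs -n kube-system <kube-proxy-pod> --tail=5000 | lock_errors_per_minute
```

Each output line has the form `MMDD HH:MM count`. Passing a single pod name rather than a label selector keeps the counts specific to the affected node.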
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| kubeproxy_sync_proxy_rules_duration_seconds p99 | Measures how long kube-proxy holds the xtables lock during each sync | p99 > 5 seconds sustained |
| kubeproxy_sync_proxy_rules_last_timestamp_seconds age | A stale timestamp means rules are not being refreshed | Age > 60 seconds |
| kubeproxy_sync_proxy_rules_endpoint_changes_pending | Indicates endpoint churn is outpacing sync capacity | Non-zero and growing for > 2 minutes |
| xtables lock errors in kube-proxy logs | Direct evidence of contention with CNI or other components | > 1 error per minute |
| conntrack table utilization | Shared resource; exhaustion drops new connections silently | > 75% of nf_conntrack_max |
| iptables rule count | Scales O(services * endpoints) in iptables mode; high counts extend lock hold time | Rapid growth or > 10,000 rules |
| CNI plugin pod restart count | A restarting CNI plugin can leave partial or conflicting rules | Any restart on a node with network symptoms |
| Pod sandbox creation latency | CNI timeouts waiting for the xtables lock block pod startup | ContainerCreating > 5 minutes |
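Without a Prometheus server, a rough p99 can still be read directly from the histogram buckets on the metrics endpoint. The sketch below assumes the buckets appear in ascending `le` order and that the metric has no labels other than `le`. Because the counters are cumulative since kube-proxy started, the result is an upper bound on the lifetime p99, not a windowed value of the kind an alert would use. The function name `sync_p99_upper_bound` is hypothetical.

```shell
# Print the smallest bucket bound (le) that covers at least 99% of all
# kube-proxy syncs recorded since the process started.
# Reads Prometheus text-format metrics on stdin; exits non-zero if no
# buckets are found or no syncs have been recorded.
sync_p99_upper_bound() {
  awk '
    /^kubeproxy_sync_proxy_rules_duration_seconds_bucket[{]/ {
      le = $0
      sub(/.*le="/, "", le)       # drop everything before the bound
      sub(/".*/, "", le)          # drop everything after it
      bound[++n] = le
      cum[n] = $NF + 0            # cumulative count up to this bound
    }
    END {
      if (n == 0 || cum[n] == 0) exit 1
      target = 0.99 * cum[n]      # the last bucket (+Inf) holds the total
      for (i = 1; i <= n; i++)
        if (cum[i] >= target) { print bound[i]; exit 0 }
    }'
}

# Usage on a node:
#   curl -s localhost:10249/metrics | sync_p99_upper_bound
```

A result at or above the warning threshold in the table is worth confirming with a windowed query, since one slow period long ago can dominate a lifetime histogram.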
Fixes
If the cause is xtables lock contention
Reduce lock pressure before migrating architectures.
- Increase kube-proxy’s `--iptables-sync-period` to reduce how often it grabs the lock. The tradeoff is that rules stay stale longer between syncs.
- Cordon the node to stop new pod scheduling. Fewer new pods means fewer CNI operations competing for the lock.
- Identify the competing process with `lsof /run/xtables.lock`. If a NetworkPolicy controller or DaemonSet is flapping, restart it to clear the lock storm.
- For clusters with thousands of Services, plan a migration to ipvs mode. IPVS uses hash-based lookups and incremental updates, which dramatically reduces xtables lock contention.
- Evaluate nftables mode if your Kubernetes version and kernel support it. nftables uses per-table transactions and avoids the global xtables lock, though cross-table priority ordering still requires verification with your CNI plugin.
If the cause is rule corruption
Do not restart services blindly. A restart with corrupted rules can make the node unreachable.
- Dump current rules with `iptables-save` and compare them against a healthy node. Look for missing match conditions in `KUBE-FIREWALL` or `KUBE-MARK-MASQ` chains.
- If a specific chain is corrupted, cordon the node, flush only the affected kube-proxy chains with `iptables -t <table> -F <chain>`, then restart kube-proxy to reprogram clean rules. Warning: flushing chains interrupts Service traffic on the node.
- Ensure that containerized CNI agents and the host use compatible iptables backends. A mismatch between legacy iptables and iptables-nft can cause silent rule stripping during save/restore cycles.
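A fully stripped rule, meaning a DROP with no match condition left, can be searched for mechanically in a rule dump. This is a minimal sketch with a hypothetical function name, `find_bare_kube_drops`. It only catches rules of the exact form `-A KUBE-<chain> -j DROP`, so partially stripped rules still need the side-by-side comparison with a healthy node.

```shell
# Print kube-proxy rules that jump to DROP with no match condition at all,
# i.e. exactly "-A KUBE-<chain> -j DROP".
# Reads `iptables-save` or `iptables -S` output on stdin.
find_bare_kube_drops() {
  awk '$1 == "-A" && $2 ~ /^KUBE-/ && NF == 4 && $3 == "-j" && $4 == "DROP"'
}

# Usage on a node (needs root):
#   iptables-save | find_bare_kube_drops
```

Any output is a reason to compare that chain against a healthy node before flushing it; empty output does not prove the ruleset is intact.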
If the cause is CNI plugin failure
- Restart the CNI DaemonSet pod on the affected node. For Calico, delete the `calico-node` pod. For AWS VPC CNI, delete the `aws-node` pod.
- Verify CNI configuration files in `/etc/cni/net.d/` have not been overwritten or truncated.
/etc/cni/net.d/have not been overwritten or truncated. - Check IPAM allocation status. If the CNI cannot assign a pod IP, sandbox creation fails before kube-proxy rules matter.
- If the CNI plugin is being OOM-killed, increase its memory limit. CNI pods are often memory-starved on dense nodes.
If the cause is conntrack exhaustion
- Increase the table size immediately: `sysctl -w net.netfilter.nf_conntrack_max=<higher_value>`. Each entry consumes roughly 300 bytes of kernel memory.
- Identify connection leaks with `conntrack -L` and `conntrack -S`. If TIME_WAIT or UDP entries dominate, tune the respective timeouts or fix the application connection pooling.
- See Kubernetes conntrack exhaustion: dropped connections under load for deeper tuning.
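Before raising `nf_conntrack_max`, it helps to estimate the memory cost of the new value. The sketch below uses the rough figure of 300 bytes per entry quoted above. That figure is an approximation: the real per-entry size depends on the kernel version and enabled conntrack extensions, and the hash table itself is not included. The function name `conntrack_mem_mib` is hypothetical.

```shell
# Estimate kernel memory in MiB for a given nf_conntrack_max, assuming
# roughly 300 bytes per entry (approximation; integer result, rounded down).
conntrack_mem_mib() {
  echo $(( $1 * 300 / 1024 / 1024 ))
}

# Usage:
#   conntrack_mem_mib 262144                                          # prints 75
#   conntrack_mem_mib "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

For example, 262144 entries at 300 bytes each is 78,643,200 bytes, or 75 MiB.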
Prevention
- Monitor `kubeproxy_sync_proxy_rules_duration_seconds` p99 and alert when it exceeds three seconds for more than five minutes. This catches lock contention before it blocks pod creation.
- Monitor xtables lock errors from kube-proxy and CNI logs. Any sustained rate indicates architectural scaling pressure.
- Size nodes with conntrack headroom. Keep utilization below sixty percent during peak traffic to absorb bursts.
- Keep CNI and kube-proxy versions aligned. Run validation after Kubernetes upgrades to confirm that CNI agents still program rules correctly against the host’s iptables or nftables backend.
- For clusters running more than two thousand Services, adopt ipvs mode or nftables mode proactively. iptables mode scales linearly and will eventually overwhelm the sync loop.
- Limit endpoint churn where possible. Avoid rapid scaling events that simultaneously replace hundreds of pods and rewrite thousands of iptables rules.
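One way to run the post-upgrade backend validation is to compare what `iptables --version` reports on the host and inside the CNI agent container. The sketch below parses that output. It assumes iptables 1.8 or later prints the backend in parentheses, such as `(nf_tables)` or `(legacy)`, and treats a missing suffix as legacy. The function name `iptables_backend` is hypothetical, and not every CNI image ships an `iptables` binary under that name.

```shell
# Extract the backend name from the first line of `iptables --version` output.
# iptables 1.8+ prints e.g. "iptables v1.8.7 (nf_tables)" or "... (legacy)";
# a line without a parenthesized suffix is reported as "legacy" (assumption).
iptables_backend() {
  awk 'NR == 1 {
         if (match($0, /\(.*\)/))
           print substr($0, RSTART + 1, RLENGTH - 2)   # text between the parentheses
         else
           print "legacy"
       }'
}

# Usage (the pod name is a placeholder):
#   host=$(iptables --version | iptables_backend)
#   agent=$(kubectl -n kube-system exec <cni-pod> -- iptables --version | iptables_backend)
#   [ "$host" = "$agent" ] || echo "backend mismatch: host=$host agent=$agent"
```

A mismatch does not by itself prove rule stripping is happening, but it identifies the nodes where a save/restore cycle across backends is possible.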
How Netdata helps
- Netdata collects kube-proxy sync duration and rule count metrics from `:10249/metrics`, exposing p99 latency and queue depth per node.
- Conntrack utilization charts (`net.netfilter.nf_conntrack_count` vs `max`) show saturation before packets drop.
- Pod-level network metrics and CNI pod health status are visible alongside kube-proxy data, so you can correlate sandbox creation timeouts with sync latency spikes.
- Node CPU softirq time and network stack latency charts help distinguish xtables lock contention from generic CPU saturation.
Related guides
- Kubernetes conntrack exhaustion: dropped connections under load
- Kubernetes DNS resolution failures inside pods
- Kubernetes Deployment rollout stuck: stalled rollouts and ready replicas
flowchart TD
A[Pod network failure or slow startup] --> B{Check kube-proxy sync duration}
B -->|Elevated| C{Check logs for xtables lock}
B -->|Normal| D[CNI or other network issue]
C -->|Lock errors present| E[xtables lock contention]
C -->|No lock errors| F{Check rule count and chain validity}
F -->|Orphaned or corrupted chains| G[Rule corruption or version skew]
F -->|Rules valid but timestamp stale| H[API server watch failure]
E --> I[Cordon node, increase sync period, plan IPVS or nftables migration]
G --> J[Flush corrupted chains, restart kube-proxy, align iptables versions]
D --> K[Check CNI pod health, IPAM, and sandbox logs]





