ARP cache staleness: when IP-to-MAC mapping goes bad
Hosts on the same subnet stop reaching each other after a VM live-migrates, a container gets rescheduled, or a firewall fails over. ICMP works from some hosts but not others. TCP sessions hang or reset. The data plane is healthy, but the ARP cache on one or more hosts holds a stale IP-to-MAC mapping.
ARP cache staleness is the gap between when a MAC address changes and when every interested host learns about the change. On Linux, this gap is governed by the neighbor (NUD) state machine and its timing parameters. On Windows Vista and later, the neighbor cache follows the same RFC 4861 model. Both platforms default to roughly the same reachable time window: about 15 to 45 seconds before an entry transitions to a stale state, followed by a probe sequence that adds several more seconds before resolution or eviction.
The operational pain comes from the interaction between these timers and workloads that move faster than the cache converges. Container orchestration, VM mobility, and failover clusters routinely change which MAC owns an IP address. If the recovery mechanism (gratuitous ARP) is missing, blocked, or delayed, hosts send traffic to a MAC that no longer owns the destination IP.
Linux NUD state machine
Linux tracks each ARP entry through a Neighbor Unreachability Detection (NUD) state machine. The states and their transitions under default kernel settings:
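- REACHABLE: the entry is valid. Lifetime is randomized between 50% and 150% of base_reachable_time_ms. The default is 30,000 ms, producing a window of 15 to 45 seconds. Positive feedback (a TCP ACK confirming forward progress, or an application sending with MSG_CONFIRM) refreshes the lifetime.
- STALE: the REACHABLE window expired without positive feedback. The kernel does not discard the entry. It defers revalidation until the next time traffic is sent to this neighbor. The state itself has no expiry timer, although garbage collection can remove an unreferenced entry that has gone unused for gc_stale_time (default 60 seconds) once the table holds at least gc_thresh1 entries.
- DELAY: triggered when traffic is sent to a STALE entry. The kernel waits delay_first_probe_time (default 5 seconds) for positive feedback before actively probing.
- PROBE: sends ucast_solicit (default 3) unicast ARP probes to the cached MAC at retrans_time_ms (default 1,000 ms) intervals, followed by mcast_resolicit (default 0) broadcast probes. The mcast_solicit budget (default 3) applies to broadcast probes in INCOMPLETE, not here.
- FAILED: the probe budget is exhausted. Queued packets are dropped and the entry becomes eligible for garbage collection. The next packet sent to this neighbor restarts resolution in INCOMPLETE with broadcast probes.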
stateDiagram-v2
[*] --> INCOMPLETE: first packet to unknown neighbor
INCOMPLETE --> REACHABLE: ARP reply received
INCOMPLETE --> FAILED: no reply after mcast_solicit probes
REACHABLE --> STALE: reachable timer expired
STALE --> DELAY: traffic sent to stale entry
DELAY --> PROBE: delay_first_probe_time 5s
PROBE --> REACHABLE: unicast probe answered
PROBE --> FAILED: probe budget exhausted
FAILED --> [*]: GC evicts entry

Windows Vista and later implement the same RFC 4861 model for both IPv4 and IPv6. Base Reachable Time defaults to 30,000 ms, producing the same 15 to 45 second reachable window. The default neighbor cache limit is 256 entries on client Windows and 1,024 on Windows Server.
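The Linux transitions above can be modeled as a lookup table. This is a simplified sketch, not kernel code: the event names (reply, confirmed, timer_expired, traffic, probes_exhausted) are hypothetical labels, and PERMANENT, NOARP, and garbage collection are left out. It also includes two transitions the diagram omits: positive feedback during DELAY returns the entry to REACHABLE, and traffic sent to a FAILED entry restarts resolution in INCOMPLETE.

```python
from enum import Enum


class Nud(Enum):
    INCOMPLETE = "INCOMPLETE"
    REACHABLE = "REACHABLE"
    STALE = "STALE"
    DELAY = "DELAY"
    PROBE = "PROBE"
    FAILED = "FAILED"


# (state, event) -> next state. Event names are illustrative labels, not kernel identifiers.
TRANSITIONS = {
    (Nud.INCOMPLETE, "reply"): Nud.REACHABLE,
    (Nud.INCOMPLETE, "probes_exhausted"): Nud.FAILED,
    (Nud.REACHABLE, "confirmed"): Nud.REACHABLE,   # positive feedback restarts the timer
    (Nud.REACHABLE, "timer_expired"): Nud.STALE,
    (Nud.STALE, "traffic"): Nud.DELAY,
    (Nud.DELAY, "confirmed"): Nud.REACHABLE,       # feedback arrived during the delay
    (Nud.DELAY, "timer_expired"): Nud.PROBE,       # delay_first_probe_time elapsed
    (Nud.PROBE, "reply"): Nud.REACHABLE,
    (Nud.PROBE, "probes_exhausted"): Nud.FAILED,
    (Nud.FAILED, "traffic"): Nud.INCOMPLETE,       # a new packet restarts resolution
}


def step(state, event):
    """Return the next state; pairs not in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=Nud.INCOMPLETE):
    """Apply events in order and return every state visited, including the start."""
    path = [start]
    for event in events:
        path.append(step(path[-1], event))
    return path
```

For example, run(["reply", "timer_expired", "traffic", "timer_expired", "probes_exhausted", "traffic", "reply"]) walks the MAC-change path INCOMPLETE, REACHABLE, STALE, DELAY, PROBE, FAILED, INCOMPLETE, REACHABLE.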
A stale entry does not immediately cause packet loss. Loss happens only when the MAC address behind the IP has actually changed and the entry has not yet been revalidated. Under default timers, recovery from STALE takes roughly 8 to 11 seconds once traffic triggers revalidation: 5 seconds in DELAY, about 3 seconds of unicast probes that the old MAC never answers, then a fresh broadcast resolution after the entry fails (up to 3 more seconds if replies are lost). If the entry was REACHABLE when the MAC changed, add up to 45 seconds for the reachable timer to expire before revalidation can begin. Without gratuitous ARP, this is how long connectivity can remain broken after a MAC change.
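The arithmetic behind these numbers can be written out as a small calculator. It is a sketch under stated assumptions: traffic flows continuously (so STALE moves to DELAY immediately), the old MAC never answers the unicast probes, and after FAILED the new owner answers one of the mcast_solicit broadcast probes sent from INCOMPLETE. The function name is illustrative; the parameter defaults mirror the kernel sysctls of the same names (mcast_resolicit defaults to 0).

```python
def recovery_window_s(base_reachable_time_ms=30_000, delay_first_probe_time_s=5,
                      ucast_solicit=3, mcast_resolicit=0, mcast_solicit=3,
                      retrans_time_ms=1_000, was_reachable=False):
    """Return (min_s, max_s) of the outage after an IP moves to a new MAC without GARP."""
    retrans_s = retrans_time_ms / 1000
    # PROBE: unicast (plus any mcast_resolicit) probes that the old MAC never answers
    probe_s = (ucast_solicit + mcast_resolicit) * retrans_s
    # Best case: the first INCOMPLETE broadcast after FAILED is answered at once
    low = delay_first_probe_time_s + probe_s
    # Upper bound: count the full mcast_solicit * retrans_time_ms broadcast window
    high = low + mcast_solicit * retrans_s
    if was_reachable:
        # The reachable timer can run up to 150% of base_reachable_time_ms first
        high += 1.5 * base_reachable_time_ms / 1000
    return low, high
```

With defaults this returns (8.0, 11.0), and (8.0, 56.0) when the entry was still REACHABLE at the moment of the move.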
ARP timeouts on network gear are far longer. Cisco IOS defaults to a 4-hour ARP timeout, while Linux entries cycle through the NUD states described above within seconds to minutes, so a router can keep using a moved MAC for hours after Linux hosts on the same segment have revalidated.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| VM migration or failover without GARP | Intermittent connectivity to a specific IP after vMotion, VRRP, or HSRP failover. Some hosts work, others do not. | NUD state of the entry on a failing host: ip neigh show \| grep <ip> |
| ARP cache table overflow (gc_thresh3) | Kernel log: “neighbor table overflow!” or “arp_cache: neighbor table overflow!”. New hosts cannot resolve MACs. Common on container hosts and hypervisors. | Entry count vs gc_thresh values: ip -4 neigh show \| wc -l |
| Flush leaving incomplete entries | After running ip neigh flush all or arp -d, entries show “(incomplete)” and still count against gc_thresh3. | Incomplete entries: ip -s -s neigh show \| grep -i incomplete |
| kube-proxy iptables mode + stale ARP | Intermittent “connection reset by peer” to Kubernetes service backends after pod rescheduling or node restart. Traffic is sent to the wrong node’s MAC. | ARP entries for pod IP ranges on the affected node. |
| Container-dense hosts | Excessive ARP traffic and rapid cache fill on Docker bridge networks with more than ~100 containers. | ARP entry count and gc_thresh on the host. |
| No positive feedback from UDP protocols | NFS over UDP or similar connectionless protocols cause unnecessary MAC reprobes, generating excess ARP traffic. | ARP probe rate in tcpdump for known neighbors. |
Quick checks
# Check all ARP entries with NUD state
ip -4 neigh show
# Detailed view: NUD state plus per-entry timers (used/confirmed/updated) and probe counts
ip -s -s neigh show
# Raw ARP table (no NUD state; Flags 0x0 marks an incomplete entry)
cat /proc/net/arp
# Watch live state changes for a specific neighbor
watch -n 2 'ip neigh show | grep <ip>'
# Count IPv4 entries and compare to gc_thresh limits (the net.ipv4 thresholds cover the ARP table only)
ip -4 neigh show | wc -l
sysctl net.ipv4.neigh.default.gc_thresh1
sysctl net.ipv4.neigh.default.gc_thresh2
sysctl net.ipv4.neigh.default.gc_thresh3
# Check key timing parameters
sysctl net.ipv4.neigh.default.base_reachable_time_ms
sysctl net.ipv4.neigh.default.gc_stale_time
sysctl net.ipv4.neigh.default.delay_first_probe_time
# Check for incomplete entries (still count against gc_thresh3)
ip -s -s neigh show | grep -i incomplete
# SNMP: query ARP table from a network device
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.4.35.1
# On Windows: show neighbor cache
netsh interface ipv4 show neighbors
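The NUD state distribution and the gc_thresh comparison from these commands can be scripted. A minimal sketch, assuming the default text output of ip neigh show (address, dev, an optional lladdr and flags such as router, then the state); the function names are hypothetical. The pressure levels follow the kernel's documented gc_thresh semantics: no garbage collection below gc_thresh1, more aggressive collection above gc_thresh2, and gc_thresh3 as the hard ceiling.

```python
from collections import Counter

# States as printed by iproute2's `ip neigh show`
NUD_STATES = {"PERMANENT", "NOARP", "REACHABLE", "STALE", "NONE",
              "INCOMPLETE", "DELAY", "PROBE", "FAILED"}


def nud_distribution(ip_neigh_output):
    """Count entries per NUD state in `ip neigh show` text output."""
    counts = Counter()
    for line in ip_neigh_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # The state is usually last, but scanning means flags like "router" don't matter
        state = next((t for t in tokens if t in NUD_STATES), "UNKNOWN")
        counts[state] += 1
    return counts


def gc_pressure(entries, gc_thresh1, gc_thresh2, gc_thresh3):
    """Describe where the table size sits relative to the three gc_thresh levels."""
    if entries >= gc_thresh3:
        return "at gc_thresh3: new entries may fail to allocate"
    if entries >= gc_thresh2:
        return "above gc_thresh2: aggressive garbage collection"
    if entries >= gc_thresh1:
        return "above gc_thresh1: periodic garbage collection active"
    return "below gc_thresh1: garbage collection skipped"
```

On a live host, pass nud_distribution the output of subprocess.run(["ip", "-4", "neigh", "show"], capture_output=True, text=True).stdout, and pass gc_pressure the total plus the three sysctl values.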
How to diagnose it
- Identify the affected IP and check its NUD state on the host that cannot connect.
ip neigh show | grep <affected-ip>
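Look for STALE, DELAY, PROBE, or FAILED states. A REACHABLE entry with the wrong MAC is expected for up to 45 seconds after a move, because the reachable timer has not expired yet. If it persists beyond that window, something keeps confirming the old mapping: the previous owner still answering for the IP after a migration race, a duplicate IP, or ARP spoofing.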
- Verify the correct MAC for the target IP. Check on a host that can reach the target, or query the switch directly:
# On a working host
ip neigh show | grep <affected-ip>
# On a Cisco switch
show ip arp <ip>
- If the MAC differs, the stale entry is the problem. The host will not refresh it until the entry ages out of REACHABLE and traffic drives it through DELAY and PROBE. To force immediate resolution:
# Delete the stale entry (safe, triggers re-resolution on next use)
ip neigh del <ip> dev <iface>
- Check for cache overflow. If the entry is missing entirely or in FAILED state, the cache may be full:
dmesg | grep -i "neighbor table overflow"
ip neigh show | wc -l
- On container hosts, check whether Docker bridge networking is exhausting the cache. The Docker bridge network driver relies heavily on the ARP cache and can exhaust defaults on hosts with many containers:
ip neigh show dev docker0 | wc -l
For Kubernetes environments with intermittent connection resets, check ARP entries for pod IP ranges on the affected node. Stale entries for pod IPs that moved to another node cause traffic to be delivered to the wrong MAC.
Capture ARP traffic to verify probes and gratuitous ARP are flowing:
tcpdump -i <iface> -n arp and host <affected-ip>
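To turn the capture into a yes/no answer, a short parser can flag gratuitous requests (sender IP equal to target IP) and replies that announce a new MAC for the affected IP. This sketch assumes tcpdump's -n text format for ARP lines, which can differ slightly between tcpdump versions; the regexes and the arp_events name are illustrative.

```python
import re

# Assumed `tcpdump -n arp` line shapes (details vary between tcpdump versions):
#   "... ARP, Request who-has 10.0.0.5 tell 10.0.0.5, length 28"
#   "... ARP, Reply 10.0.0.5 is-at 52:54:00:aa:bb:cc, length 28"
REQUEST = re.compile(r"Request who-has ([\d.]+)(?: \([0-9a-fA-F:]+\))? tell ([\d.]+)")
REPLY = re.compile(r"Reply ([\d.]+) is-at ([0-9a-fA-F:]{17})")


def arp_events(lines, target_ip):
    """Return ("garp", ip) for gratuitous requests and ("mac_change", old, new)
    whenever replies for target_ip start announcing a different MAC."""
    events = []
    last_mac = None
    for line in lines:
        req = REQUEST.search(line)
        if req:
            # Gratuitous request: the sender asks about its own address
            if req.group(1) == req.group(2) == target_ip:
                events.append(("garp", target_ip))
            continue
        rep = REPLY.search(line)
        if rep and rep.group(1) == target_ip:
            mac = rep.group(2).lower()
            if last_mac is not None and mac != last_mac:
                events.append(("mac_change", last_mac, mac))
            last_mac = mac
    return events
```

Run tcpdump with -l so lines arrive unbuffered when piping them into a script. A mac_change with no preceding garp for that IP suggests a move that neighbors had to discover on their own.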
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| ARP entry count per device | Tracks cache utilization toward gc_thresh limits (Linux) or neighborcachelimit (Windows). | Entry count above 80% of platform limit. |
| NUD state distribution | Shows how many entries are STALE vs REACHABLE, indicating convergence health. | Sudden increase in STALE or FAILED entries. |
| gc_thresh3 exhaustion events | Kernel drops new ARP resolutions when cache is full. New hosts cannot communicate. | dmesg: “neighbor table overflow!” |
| SNMP ipNetToPhysicalEntry (.1.3.6.1.2.1.4.35.1) | Cross-device ARP table via IP-MIB for network gear. Sudden growth signals new subnet, scan, or ARP poisoning. | Growth above 2x rolling baseline. |
| ARP probe rate | High probe rates indicate churn or lack of positive feedback from connectionless protocols. | Sustained high rate of ARP requests for known neighbors. |
| Gratuitous ARP frequency | Normal during HSRP, VRRP, failover, and VM migration. Spikes with no matching event may indicate an unplanned failover or ARP spoofing. | GARP without a corresponding failover or migration event. |
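The two numeric warning signs in this table, 80% of the platform limit and 2x a rolling baseline, reduce to a few comparisons. A sketch assuming the baseline is the median of recent samples; the median choice, the function name, and its defaults are assumptions, not an existing monitoring API.

```python
from statistics import median


def arp_cache_alerts(current, platform_limit, history,
                     util_threshold=0.8, growth_factor=2.0):
    """Return alert messages for the utilization and growth warning signs."""
    alerts = []
    # Warning sign 1: entry count above 80% of the platform limit
    if platform_limit > 0 and current > util_threshold * platform_limit:
        alerts.append(f"utilization {current}/{platform_limit} above {util_threshold:.0%}")
    # Warning sign 2: growth above 2x a rolling baseline (median of recent samples)
    if history:
        baseline = median(history)
        if baseline > 0 and current > growth_factor * baseline:
            alerts.append(f"entry count {current} above {growth_factor:g}x baseline {baseline:g}")
    return alerts
```

A median resists a single noisy sample better than a mean; a time-weighted average would work as well if your collector already computes one.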
Fixes
VM migration or failover without GARP
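The fastest recovery path is gratuitous ARP (GARP). When an IP address moves to a different MAC during failover, vMotion, or live migration, the host that now owns it should broadcast a GARP so neighbors update their caches immediately. On Linux, setting arp_notify=1 makes the kernel send a GARP when the interface comes up or its hardware address changes. Adding an IP address to an interface that is already up does not trigger one, so VIP managers such as keepalived send their own. To check and enable arp_notify: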
# Check current setting
sysctl net.ipv4.conf.<iface>.arp_notify
# Enable GARP on address changes
sysctl -w net.ipv4.conf.<iface>.arp_notify=1
Without GARP, recovery requires waiting for the natural STALE, DELAY, PROBE, FAILED cycle followed by a fresh broadcast resolution. Under default settings this is roughly 8 to 11 seconds from when traffic triggers revalidation, plus up to 45 seconds if the entry is still REACHABLE.
For immediate resolution on a single affected host, delete the stale entry:
ip neigh del <ip> dev <iface>
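A gratuitous ARP is an ordinary broadcast ARP request in which the sender and target IP are both the announced address. The sketch below only builds the 42-byte Ethernet II frame so the layout is visible; actually sending it requires a raw socket and root privileges. It sets the target MAC to zeros, which is what Linux puts in its own gratuitous requests. The build_garp name is hypothetical.

```python
import socket
import struct


def build_garp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request frame (Ethernet II + ARP) for ip at mac."""
    hw = bytes.fromhex(mac.replace(":", ""))
    proto = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC as source, EtherType 0x0806 (ARP)
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                    # hardware type: Ethernet
        0x0800,               # protocol type: IPv4
        6, 4,                 # hardware / protocol address lengths
        1,                    # opcode: request
        hw, proto,            # sender MAC and IP
        b"\x00" * 6, proto,   # target MAC unknown, target IP equals sender IP
    )
    return eth + arp
```

On Linux, the frame could be sent as root through socket.socket(socket.AF_PACKET, socket.SOCK_RAW), bound to the interface with s.bind((iface, 0)), then s.send(frame). In practice, arp_notify or arping -U from iputils is simpler.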
ARP cache table overflow
Upstream kernel defaults are gc_thresh1 = 128, gc_thresh2 = 512, and gc_thresh3 = 1024, but distributions and tuning profiles may override them, so verify yours with sysctl net.ipv4.neigh.default.gc_thresh1 through gc_thresh3. Values that are fine for workstations are insufficient for hypervisors, container hosts, and routers with dense neighbor populations. For large environments:
# WARNING: changes affect ARP cache behavior system-wide. Test in staging.
# Persistent configuration via sysctl.conf or a file in /etc/sysctl.d/:
net.ipv4.neigh.default.gc_thresh1 = 512
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
Apply with sysctl --system, or sysctl -p <file> for a single file (plain sysctl -p reads only /etc/sysctl.conf). Do this before deploying dense workloads, not after overflow symptoms appear.
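If you want a starting point for sizing, one option is to scale the example values above. This is a hypothetical heuristic, not kernel guidance: it sizes gc_thresh3 to twice the expected neighbor count, rounds up to a power of two with a floor at the upstream default of 1024, and keeps the 1:4:8 ratio of the 512/2048/4096 example.

```python
def suggest_gc_thresh(expected_neighbors, headroom=2.0):
    """Hypothetical sizing heuristic for the three net.ipv4.neigh.default thresholds."""
    target = max(1024, int(expected_neighbors * headroom))
    thresh3 = 1 << (target - 1).bit_length()  # next power of two >= target
    return {"gc_thresh1": thresh3 // 8, "gc_thresh2": thresh3 // 2, "gc_thresh3": thresh3}


def to_sysctl_conf(values):
    """Render the thresholds as lines for a file in /etc/sysctl.d/."""
    return "\n".join(f"net.ipv4.neigh.default.{k} = {v}" for k, v in sorted(values.items()))
```

suggest_gc_thresh(1500) reproduces the 512/2048/4096 values shown, and suggest_gc_thresh(100) returns the upstream defaults of 128/512/1024.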
On Windows Server, the default neighbor cache limit is 1,024 entries. If approaching this limit:
netsh interface ipv4 set global neighborcachelimit=<n>
Flush leaving incomplete entries
Using ip -s -s neigh flush all or arp -d removes entries but leaves rows with HWaddress marked “(incomplete)”. These residual entries persist and still count against gc_thresh3. Full removal requires deleting individual entries or waiting for garbage collection:
ip neigh del <ip> dev <iface>
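To find the residual entries to delete, /proc/net/arp is easy to parse: a row without the ATF_COM (0x02, completed) bit in its Flags column is unresolved, which is what arp -n displays as "(incomplete)". A sketch with hypothetical function names; it only builds ip neigh del commands rather than running them.

```python
ATF_COM = 0x02  # "completed entry" flag from <net/if_arp.h>


def incomplete_entries(proc_net_arp_text):
    """Return (ip, device) pairs whose Flags column lacks ATF_COM (unresolved rows)."""
    result = []
    for row in proc_net_arp_text.splitlines()[1:]:  # first line is the column header
        fields = row.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, _hw_addr, _mask, device = fields[:6]
        if not int(flags, 16) & ATF_COM:
            result.append((ip, device))
    return result


def cleanup_commands(entries):
    """Build (but do not run) one `ip neigh del` command per residual entry."""
    return [f"ip neigh del {ip} dev {device}" for ip, device in entries]
```

Review the generated commands before running them, since deleting an entry that is mid-resolution simply forces another lookup.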
kube-proxy and stale ARP
In Kubernetes with iptables-mode kube-proxy, stale ARP entries for pod IPs cause traffic to be sent to the wrong node after pod rescheduling or node restart. The node’s ARP cache holds the old MAC until the entry reaches STALE and is re-probed. Until re-probing completes, packets go to the wrong MAC.
Mitigation: ensure arp_notify=1 on nodes. For Cilium users, upgrade to 1.9.4 or later; stale ARP entries for terminated pod IPs after cluster autoscaler rescheduling were a known issue in versions prior to 1.9.4.
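Checking ARP entries for pod IP ranges can be narrowed to a CIDR with Python's ipaddress module. A sketch assuming ip neigh show text output and a pod CIDR you supply (10.244.1.0/24 in the test is a hypothetical example); it reports each matching entry's MAC and state so you can compare MACs against the node that now hosts each pod.

```python
import ipaddress


def pod_neighbors(ip_neigh_output, pod_cidr):
    """Return (ip, mac_or_None, state) for neighbor entries inside pod_cidr."""
    network = ipaddress.ip_network(pod_cidr)
    matches = []
    for line in ip_neigh_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        try:
            addr = ipaddress.ip_address(tokens[0])
        except ValueError:
            continue
        if addr not in network:  # also False for mismatched IP versions
            continue
        # FAILED and INCOMPLETE entries have no lladdr field
        mac = tokens[tokens.index("lladdr") + 1] if "lladdr" in tokens else None
        matches.append((str(addr), mac, tokens[-1]))  # state is the last token
    return matches
```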
Container-dense host ARP storms
Hosts with more than 100 containers on a single Docker network can trigger excessive ARP traffic and fill the cache rapidly. The Docker bridge network driver relies heavily on the ARP cache.
Mitigation: raise gc_thresh values before deploying the workload. Consider macvlan networking for high-density container hosts to reduce bridge-level ARP dependency.
Prevention
- Set gc_thresh values proactively on hypervisors, container hosts, and routers. Verify current defaults with sysctl before assuming they are sufficient for your workload.
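- Enable arp_notify=1 on hosts that participate in failover or VM mobility. The kernel then sends gratuitous ARP when an interface comes up or its MAC changes; VIPs added to a running interface still need a GARP from the tool that moves them.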
- Monitor ARP entry count continuously. On Linux, /proc/net/stat/arp_cache is the raw kernel source. On network devices, poll IP-MIB ipNetToPhysicalEntry at .1.3.6.1.2.1.4.35.1. Alert when cache size exceeds 80% of the platform limit or grows above 2x the rolling baseline.
- Do not use the deprecated arp command from net-tools. The modern replacement is ip neigh (iproute2 also accepts ip neighbour or ip neighbor).
- For connectionless protocols over UDP (NFS over UDP, some user-space protocols), expect higher ARP probe rates. Unless the application sends with MSG_CONFIRM, UDP traffic gives the kernel's NUD layer no positive feedback, so entries fall back to STALE and get reprobed even while the neighbor is healthy.
- Remember that FDB/ARP freshness is poorly exposed as a metric. IP-MIB defines ipNetToPhysicalLastUpdated and ipNetToPhysicalState, but vendor support varies and the older ipNetToMediaTable carries no timestamp, so staleness usually has to be inferred from polling deltas. Entries may be hours stale; devices may have moved; endpoints may be offline.
- Gratuitous ARP for failover (HSRP, VRRP) is normal. Do not alert on the resulting cache updates.
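Inferring staleness from polling deltas amounts to remembering when each IP was last seen with a different MAC. A sketch, assuming you already collect periodic snapshots as {ip: mac} dictionaries (from ip neigh show on Linux, or ipNetToPhysicalPhysAddress over SNMP); the function name is hypothetical, and a change is only known to within one polling interval.

```python
def track_mappings(snapshots):
    """snapshots: iterable of (timestamp_s, {ip: mac}) in polling order.
    Return {ip: (current_mac, changed_at_s, moves)}, where changed_at_s is the first
    poll that showed the current MAC and moves counts observed MAC changes."""
    seen = {}
    for ts, table in snapshots:
        for ip, mac in table.items():
            if ip not in seen:
                seen[ip] = (mac, ts, 0)
            elif seen[ip][0] != mac:
                seen[ip] = (mac, ts, seen[ip][2] + 1)
    # IPs missing from a later poll keep their last known mapping
    return seen
```

An IP whose moves count keeps rising is a mobility hotspot worth correlating with scheduler or failover logs.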
How Netdata helps
- Netdata collectors can track ARP/neighbor table entry counts per device from /proc/net/stat/arp_cache and related procfs sources, giving visibility into cache utilization before gc_thresh exhaustion triggers kernel-level drops.
- Correlate ARP entry count spikes with container scheduling events, VM migration timestamps, or failover events to identify mobility-driven staleness as the root cause of intermittent connectivity.
- Alert on sudden ARP cache growth above 2x rolling baseline, or when entry count approaches platform limits, before kernel overflow messages appear in dmesg.
- Cross-reference ARP cache state with interface counters, ICMP reachability, and BGP session state to distinguish ARP-driven connectivity loss from routing or physical-layer failures.
- Monitor per-core softirq rates alongside ARP probe storms to distinguish cache pressure from general network load or RSS misconfiguration.
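/proc/net/stat/arp_cache prints a header row and then one row of hexadecimal counters per CPU. The entries column is the table-wide count repeated on every row, while the other counters are per-CPU and need summing. A minimal parser sketch with a hypothetical function name; a rising table_fulls counter corresponds to the kernel's "neighbor table overflow" events.

```python
def parse_arp_cache_stat(text):
    """Parse /proc/net/stat/arp_cache into {counter_name: value}.
    'entries' is table-wide (same on every row); other counters are summed across CPUs."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    totals = {}
    for row in lines[1:]:
        values = [int(v, 16) for v in row.split()]  # every field is hexadecimal
        for name, value in zip(header, values):
            if name == "entries":
                totals[name] = value
            else:
                totals[name] = totals.get(name, 0) + value
    return totals
```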