Locating endpoints behind NAT and wireless: the positioning problem
Endpoint positioning maps a MAC address or IP to a specific switch port, access point, or VLAN. It underpins security investigations, access control enforcement, and day-to-day troubleshooting. When the endpoint sits behind a NAT boundary, the Layer 2 and Layer 3 signals that topology engines rely on (FDB entries, ARP tables, flow records) all report the NAT device’s identity, not the endpoint behind it. The endpoint becomes operationally invisible upstream.
This is not a rare edge case. Wireless controllers performing client NAT, SD-WAN branches that NAT into the overlay, cloud VPC NAT gateways, and container runtimes that NAT pod traffic all create this problem. The symptoms are subtle: topology confidence scores drop silently for affected segments, endpoint-positioning orphan rates climb, and security investigations waste time tracing to the NAT device instead of the actual host.
The positioning problem has two layers. First, identity opacity: flow records and L2 tables carry post-NAT addresses, so the true source IP and MAC are hidden from anything upstream. Second, location opacity: the topology engine places the endpoint at the NAT device’s switch port, which is technically correct for the L2 domain the NAT device occupies but useless for finding the actual machine. Recovery requires NAT translation logs correlated by timestamp and port, which many platforms do not retain long enough to support investigations.
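The timestamp-and-port correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not any platform's API: the `Translation` record layout, its field names, and the `recover_identity` helper are hypothetical, and real translation logs differ by vendor.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Translation:
    """One NAT translation log entry (hypothetical layout)."""
    pre_ip: str      # endpoint's real address behind the NAT
    pre_port: int
    post_ip: str     # address and port seen upstream
    post_port: int
    start: float     # epoch seconds when the mapping was created
    end: float       # epoch seconds when the mapping was torn down


def recover_identity(log: Iterable[Translation], post_ip: str,
                     post_port: int, ts: float) -> Optional[Translation]:
    """Return the translation active for post_ip:post_port at time ts."""
    for entry in log:
        if (entry.post_ip == post_ip and entry.post_port == post_port
                and entry.start <= ts <= entry.end):
            return entry
    return None  # log expired or never collected: identity is unrecoverable


# The same post-NAT port is reused by two endpoints at different times,
# which is why the flow timestamp is part of the lookup key.
log = [
    Translation("10.0.0.42", 54321, "198.51.100.5", 54321, 1000.0, 1060.0),
    Translation("10.0.0.77", 40000, "198.51.100.5", 54321, 2000.0, 2030.0),
]

hit = recover_identity(log, "198.51.100.5", 54321, 1030.0)
miss = recover_identity(log, "198.51.100.5", 54321, 1500.0)
```

Here `hit` resolves to the 10.0.0.42 mapping, while `miss` is `None` because no retained translation covers that timestamp. A linear scan is enough to show the idea; a production pipeline would index the log by post-NAT address and port.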
What it is and why it matters
The endpoint positioning inference module is a probabilistic subsystem. Given a partial snapshot of FDB entries, ARP tables, CDP/LLDP neighbor data, and STP state, it infers which switch port, AP, or VLAN a given endpoint MAC or IP is connected to. Accuracy degrades as input data becomes stale or incomplete.
On a flat, non-NAT segment, positioning works well. The endpoint’s MAC appears in the access switch’s forwarding database tied to a specific port. The IP-to-MAC mapping appears in the gateway router’s ARP cache. The topology engine cross-references FDB and ARP to pin the endpoint to a physical port with high confidence. DHCP snooping binding tables and RADIUS or 802.1X accounting logs provide additional IP-MAC-session bindings that further strengthen positioning.
Behind a NAT device, every upstream-visible signal is replaced by the NAT device’s own identity. The upstream switch’s FDB shows the NAT device’s MAC on the port facing the NAT device. The upstream router’s ARP cache maps the NAT device’s IP to the NAT device’s MAC. Flow records carry the post-NAT source IP and port. The topology engine, lacking the endpoint’s actual MAC or true IP in any upstream table, can only place the endpoint at the NAT device’s port.
Any action that depends on endpoint location (blocking a compromised host, tracing a malicious flow, enforcing a VLAN policy) targets the NAT device, not the endpoint. Security teams lose time. Access policies fail silently. Troubleshooting goes to the wrong physical location.
How NAT and wireless break endpoint positioning
Standard endpoint positioning relies on signal sources being visible upstream of the endpoint. The table below shows what each source normally provides and what NAT replaces it with.
| Signal source | What it normally provides | What NAT substitutes |
|---|---|---|
| FDB (MAC table) on access switch | Endpoint MAC on a specific port | NAT device MAC on the port facing the NAT device |
| ARP cache on gateway router | Endpoint IP mapped to endpoint MAC | NAT device IP mapped to NAT device MAC |
| Flow records (NetFlow, IPFIX, sFlow) | Endpoint IP as source in 5-tuple | Post-NAT IP and port as source |
| CDP/LLDP neighbor data | Device identity on directly connected links | NAT device identity, or nothing if the NAT device does not speak CDP/LLDP |
| DHCP snooping binding table | IP-MAC-port binding for each client | NAT device binding, or nothing if DHCP is proxied |
| RADIUS or 802.1X accounting | Session-level IP-MAC-user binding | Binding for the NAT device itself, not its clients |
When all upstream-visible signals converge on the NAT device’s identity, the topology engine has no data to distinguish individual endpoints behind it. The NAT device’s own internal state (its translation table and internal ARP cache for NAT’d clients) is the only source of truth. That state is not exposed through standard MIBs and is typically not polled by the monitoring platform.
flowchart LR
    EP["Endpoint<br/>10.0.0.42<br/>MAC AA:BB:CC"] --> NAT["NAT boundary<br/>wireless controller<br/>SD-WAN gateway"]
    NAT -->|post-NAT source| FLOW["Flow records<br/>198.51.100.5:54321"]
    NAT -->|FDB and ARP upstream| L2["Upstream tables<br/>NAT device MAC only"]
    NAT -->|internal state| XLOG["Translation table<br/>10.0.0.42:54321<br/>maps to 198.51.100.5:54321"]
    L2 --> TOPO["Topology engine<br/>places endpoint at<br/>NAT device port"]
    FLOW --> TOPO
    XLOG -.->|retained and correlated| RECOVER["Identity recovery<br/>via timestamp and port"]
    XLOG -.->|expired or absent| LOST["Identity unrecoverable"]

You can confirm the opacity by checking whether the endpoint’s MAC or IP appears in upstream tables. If neither does, the endpoint is behind a NAT boundary on that segment.
# Check whether endpoint MAC appears in upstream FDB
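# Uses Q-BRIDGE-MIB dot1qTpFdbPort (VLAN-aware bridge), column .2 of dot1qTpFdbEntry
# The MAC is encoded in the instance OID as six dotted decimal octets
# (for example aa:bb:cc:00:11:22 appears as 170.187.204.0.17.34), so grep for that form
snmpwalk -v2c -c <community> <switch> .1.3.6.1.2.1.17.7.1.2.2.1.2 | grep <endpoint-mac-as-decimal-octets>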
ssh <switch> 'show mac address-table address <endpoint-mac>'
# Check whether endpoint IP appears in upstream ARP
# Uses IP-MIB ipNetToPhysicalTable (replaces legacy ipNetToMediaTable)
snmpwalk -v2c -c <community> <gateway> .1.3.6.1.2.1.4.35.1 | grep <endpoint-ip>
ssh <gateway> 'show ip arp <endpoint-ip>'
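# Identify NAT egress candidates: top source IPs ranked by flow count
# (a NAT egress usually stands out; confirm by checking how many distinct destinations it reaches)
nfdump -R /var/nfdump/ -s srcip/flows -n 20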
# Check firewall or wireless controller session/NAT table
ssh <fw> 'show session info'
# PAN-OS API equivalent (do not embed API keys in shell history or URLs in production)
curl -sk -H "X-PAN-KEY: <apikey>" "https://<fw>/api/?type=op&cmd=<show><session><info></info></session></show>"
If the endpoint MAC is absent from the upstream FDB but the NAT device’s MAC is present on the same segment, the endpoint is behind NAT. If the endpoint IP is absent from upstream ARP but the NAT device’s IP maps to the NAT device’s MAC, the same conclusion holds.
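The "many distinct destinations from one source IP" heuristic can be sketched like this. It is an illustration only: the `(source, destination)` tuple input, the helper names, and the `min_destinations` threshold are assumptions, not output of any flow collector.

```python
from collections import defaultdict


def distinct_destinations(flows):
    """Map each source IP to the number of distinct destination IPs it contacted."""
    seen = defaultdict(set)
    for src, dst in flows:
        seen[src].add(dst)
    return {src: len(dsts) for src, dsts in seen.items()}


def nat_egress_candidates(flows, min_destinations):
    """Return source IPs whose destination fan-out meets the assumed threshold."""
    counts = distinct_destinations(flows)
    return sorted(src for src, n in counts.items() if n >= min_destinations)


flows = [
    ("198.51.100.5", "203.0.113.10"),
    ("198.51.100.5", "203.0.113.11"),
    ("198.51.100.5", "203.0.113.12"),
    ("198.51.100.5", "203.0.113.10"),  # repeat destination, counted once
    ("192.0.2.20", "203.0.113.10"),
]
candidates = nat_egress_candidates(flows, min_destinations=3)
```

High fan-out is a hint, not proof: proxies, resolvers, and scanners show the same pattern, so treat the result as a list of addresses to check against known NAT gateways.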
Wireless networks add a second layer of opacity. iOS 14+ and Android 10+ default to a randomized, per-network MAC address when associating, in addition to randomizing the MAC in probe requests. The factory MAC appears on the network only if the user or a device-management profile disables the private address for that SSID. For platforms that rely on MAC-based device fingerprinting or static MAC whitelists, the randomized MAC means the identity visible to the wireless infrastructure is not the hardware address, may change over time, and does not match any asset inventory entry.
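One way to flag addresses that are probably randomized is to test the locally administered bit of the first octet, which randomized client addresses set. This detail is not stated above and is offered as a hedged sketch; the helper name is hypothetical, and the bit is also set by some virtual interfaces, so a match is a hint rather than proof.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the locally administered (U/L) bit of the first octet is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


randomized_like = is_locally_administered("da:a1:19:00:11:22")   # 0xda has bit 0x02 set
vendor_assigned = is_locally_administered("3c:22:fb:00:11:22")   # 0x3c does not
```

A tracking pipeline could use this check to stop attempting asset-inventory matches by MAC for such clients and fall back to 802.1X identity instead.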
Where it shows up in production
The NAT boundary opacity pattern manifests across several common deployment variants:
Wireless controllers performing client NAT. Guest WLANs and some corporate WLANs NAT at the controller. The controller’s upstream-facing interface carries the post-NAT IP for all client traffic. The access switch sees only the controller’s MAC. Endpoints behind the controller are individually invisible to any upstream monitoring tool. Topology inference confidence for these endpoints drops because the only L2 signal upstream is the controller’s MAC on its uplink port.
SD-WAN overlay NAT at branches. Branch traffic is NAT’d as it enters the SD-WAN overlay tunnel. The overlay’s flow records show the branch gateway’s overlay IP, not the individual endpoint. Underlay monitoring sees encrypted tunnel traffic, not the original flows. The orchestrator’s API may expose per-tunnel SLA data, but per-endpoint identity inside the branch requires the branch gateway’s own translation logs.
Cloud VPC NAT gateways. AWS, Azure, and GCP provide managed NAT gateways for outbound traffic from private subnets. Cloud flow logs (VPC Flow Logs, NSG Flow Logs) show the NAT gateway’s IP as the source for outbound traffic. The instance IP is only visible in the VPC-internal flow records, if they are collected at all. There is no SNMP, no CDP, no LLDP in these environments. The flow record is the only native signal, and it carries the post-NAT address.
Container orchestration NAT. Container runtimes performing NAT for pod traffic create the same opacity. The host node’s IP appears in upstream flow records. Individual pod IPs are visible only within the cluster’s own network namespace. MAC masquerading by virtualization and container platforms further complicates tracking, as MACs may be generated dynamically and change across restarts.
Carrier-grade NAT (CGNAT). ISP-deployed CGNAT uses the shared address space 100.64.0.0/10 (RFC 6598). Multiple subscribers share a single public IP. Port forwarding cannot be configured by the subscriber because the ISP’s CGN box owns the external port mapping. IP-based abuse filters targeting the shared public IP will affect all subscribers behind that address. Detection: check whether the WAN interface address falls within 100.64.0.0/10.
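The CGNAT detection check is a simple range test. A minimal sketch using Python's standard `ipaddress` module follows; the function name is hypothetical, and obtaining the WAN interface address is left to the caller.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598)
CGNAT_SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")


def is_behind_cgnat(wan_address: str) -> bool:
    """True if the WAN interface address falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(wan_address) in CGNAT_SHARED_SPACE


inside = is_behind_cgnat("100.64.0.1")
outside = is_behind_cgnat("198.51.100.5")
```

The range spans 100.64.0.0 through 100.127.255.255, so an address such as 100.128.0.1 is outside it even though it shares the first octet.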
Common misuses and false confidence
Trusting topology confidence without checking data freshness. Topology engines often present endpoint positions with high confidence even when the underlying FDB or ARP data is stale. Many devices learn FDB entries but do not age them aggressively (default CAM aging is often 300 seconds, but misconfigured or static entries persist indefinitely). An FDB entry may long outlast the endpoint’s actual presence on that port, yet the platform still reports “endpoint on port X” with high confidence. See stale FDB/MAC tables for the detailed failure mode.
Not retaining NAT translation logs. NAT translation logs are the only mechanism to recover endpoint identity from post-NAT flow records. If translation logs expire before an investigation begins, the identity is permanently lost. Many platforms default to short retention because translation logs are voluminous. The investigation window and the retention window must be aligned. If your security team’s average investigation starts 24 hours after the event, translation logs must be retained for at least that long.
Using post-NAT IPs in security alerts without correlation. A security alert that fires on a post-NAT source IP leads investigators to the NAT device, not the endpoint. Without translation log correlation built into the alerting pipeline, every NAT’d endpoint alert generates a dead-end investigation. The fix is to enrich flow records with translation log data at ingestion time, so that alerts carry both the post-NAT address (for the flow) and the pre-NAT identity (for the investigation).
Assuming ARP entries are fresh. ARP entry lifetime varies widely by platform. Cisco IOS defaults to a 4-hour timeout. Linux marks an entry stale after roughly 15 to 45 seconds (a randomized interval around the 30-second default of base_reachable_time_ms) and can garbage-collect unused stale entries after gc_stale_time (60 seconds by default). An ARP entry for the NAT device’s IP will be fresh because the NAT device is actively communicating, but any internal mapping the NAT device holds for its clients may have a different lifetime entirely. See ARP cache staleness for how stale ARP entries corrupt topology inference more broadly.
Relying on MAC-based tracking for wireless endpoints. MAC randomization on modern mobile operating systems means the MAC visible during probe and association is not the factory MAC. Asset inventory correlation by MAC fails silently. 802.1X or MAB with a dynamic whitelist is more reliable than static MAC whitelisting for BYOD environments where randomization is in effect.
Ignoring topology confidence for specific endpoint classes. Topology confidence is per-endpoint, not aggregate. An overall high-confidence score can mask low confidence for every endpoint behind a specific wireless controller or SD-WAN gateway. VM mobility events also temporarily drop confidence until convergence. Alert on confidence drops per endpoint class (wired, wireless, virtualized), not just in aggregate.
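The masking effect is easy to show with a small grouping sketch. The input shape (pairs of endpoint class and confidence), the sample values, and the 0.5 alert threshold are all assumptions chosen for illustration, not values from any topology engine.

```python
from collections import defaultdict
from statistics import mean


def confidence_by_class(endpoints):
    """Average positioning confidence per endpoint class."""
    grouped = defaultdict(list)
    for endpoint_class, confidence in endpoints:
        grouped[endpoint_class].append(confidence)
    return {cls: mean(vals) for cls, vals in grouped.items()}


# Eight well-positioned wired endpoints hide two poorly positioned wireless ones.
endpoints = [("wired", 0.95)] * 8 + [("wireless", 0.30)] * 2

overall = mean(conf for _, conf in endpoints)
per_class = confidence_by_class(endpoints)
low_classes = sorted(cls for cls, score in per_class.items() if score < 0.5)
```

The aggregate comes out around 0.82, which looks healthy, while `low_classes` contains only the wireless class. Alerting on the per-class values catches the problem the aggregate hides.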
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
| Topology inference confidence score | Drops for endpoints behind NAT because upstream FDB and ARP show only the NAT device | Sustained low confidence for a class of endpoints (wireless, branch, cloud) |
| Endpoint positioning orphan rate | Measures MACs the engine cannot resolve to a physical switch port | Orphan rate above 2% of total discovered endpoints; above 10% in security zones is critical |
| Flow source IP concentration | Many flows from a single IP to many distinct destinations indicates a NAT egress point | Sudden increase in flows sourced from one IP that is a known NAT gateway |
| NAT and session table utilization | Indicates the NAT boundary is active and how many translations exist | Utilization above 70% sustained; above 90% means new connections may be denied |
| FDB and ARP freshness | Stale entries mean the topology engine builds on outdated data | Entries older than 3 to 4 times the expected refresh interval |
| Topology view consistency (CDP/LLDP vs FDB vs ARP) | Disagreement between sources drops positioning confidence | Persistent inconsistency lasting more than 24 hours on critical infrastructure |
| DHCP snooping binding table coverage | Provides IP-MAC-port bindings independent of upstream NAT | Binding table gaps for segments known to have active endpoints |
Query the topology engine directly to check per-endpoint confidence rather than relying on aggregate dashboards.
# Check topology engine confidence for specific endpoints
# Placeholder path: verify the API endpoint name and path for your platform
curl -s http://localhost:<port>/api/topology/confidence | jq '.'
# Check NAT/session table utilization on firewalls
ssh <fw> 'show session info'
# Monitor endpoint positioning orphan rate from topology engine metrics
curl -s http://localhost:<port>/metrics | grep -E 'orphan|unresolved'
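The orphan-rate and freshness warning signs from the signals table can be turned into simple checks. The sketch below is illustrative: the function names, return labels, and the choice of 3 as the default staleness multiplier (the low end of the "3 to 4 times" range) are assumptions.

```python
def orphan_rate_status(orphans: int, total: int, security_zone: bool = False) -> str:
    """Classify the endpoint positioning orphan rate against the table thresholds."""
    if total == 0:
        return "unknown"  # no discovered endpoints, so no rate to judge
    rate = orphans / total
    if security_zone and rate > 0.10:
        return "critical"
    if rate > 0.02:
        return "warning"
    return "ok"


def is_stale(age_seconds: float, refresh_interval_seconds: float,
             factor: float = 3.0) -> bool:
    """True if an FDB or ARP entry is older than factor times its refresh interval."""
    return age_seconds > factor * refresh_interval_seconds


campus = orphan_rate_status(3, 100)                      # 3% overall
dmz = orphan_rate_status(11, 100, security_zone=True)    # 11% in a security zone
old_fdb_entry = is_stale(1000, 300)                      # 1000 s against a 300 s interval
```

Feeding these checks per endpoint class, rather than once for the whole estate, keeps them consistent with the per-class alerting advice above.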
How Netdata helps
Netdata collects the underlying signals that reveal NAT boundary opacity:
- SNMP polling of FDB tables, ARP caches, and device metrics surfaces stale entries and missing endpoints that indicate NAT is hiding hosts from upstream topology.
- Flow and sFlow collection via the nfacctd plugin captures flow records that, when correlated against NAT translation logs, enable identity recovery during investigations.
- Firewall and load balancer metrics (session counts, NAT pool utilization) from SNMP or agent-based collectors warn before session exhaustion denies new connections.
- Syslog ingestion captures NAT translation events and wireless controller association logs that can be correlated with flow records for identity recovery.
- Per-endpoint metric dimensions allow you to build alerts on confidence drops or orphan rate increases for specific endpoint classes (wireless, branch, cloud) rather than aggregate scores that mask localized problems.
Related guides
- ARP cache staleness: when IP-to-MAC mapping goes bad
- Asymmetric routing: why your path and latency measurements lie
- Audit log gaps: detecting syslog/trap tampering or loss
- BGP flapping: why a peer keeps resetting and how to find the cause
- BGP NOTIFICATION and Cease messages: what each subcode is telling you
- BGP RIB and FIB growth: monitoring route-table size before it bites
- BGP route leak and hijack: the detection signals and alerts that matter
- BGP session Established but stale: detecting silent route loss
- Stale FDB/MAC tables: why endpoint location is wrong
- NetFlow storage sizing: how much disk your flow collector really needs
- Flow export-to-ingest latency: why your NetFlow data is minutes behind
- Network monitoring checklist: the signals every production network needs







