Locating endpoints behind NAT and wireless: the positioning problem

Endpoint positioning maps a MAC address or IP to a specific switch port, access point, or VLAN. It underpins security investigations, access control enforcement, and day-to-day troubleshooting. When the endpoint sits behind a NAT boundary, the Layer 2 and Layer 3 signals that topology engines rely on (FDB entries, ARP tables, flow records) all report the NAT device’s identity, not the endpoint behind it. The endpoint becomes operationally invisible upstream.

This is not a rare edge case. Wireless controllers performing client NAT, SD-WAN branches that NAT into the overlay, cloud VPC NAT gateways, and container runtimes that NAT pod traffic all create this problem. The symptoms are subtle: topology confidence scores drop silently for affected segments, endpoint-positioning orphan rates climb, and security investigations waste time tracing to the NAT device instead of the actual host.

The positioning problem has two layers. First, identity opacity: flow records and L2 tables carry post-NAT addresses, so the true source IP and MAC are hidden from anything upstream. Second, location opacity: the topology engine places the endpoint at the NAT device’s switch port, which is technically correct for the L2 domain the NAT device occupies but useless for finding the actual machine. Recovery requires NAT translation logs correlated by timestamp and port, which many platforms do not retain long enough to support investigations.

What it is and why it matters

The endpoint positioning inference module is a probabilistic subsystem. Given a partial snapshot of FDB entries, ARP tables, CDP/LLDP neighbor data, and STP state, it deduces which switch port, AP, or VLAN a given endpoint MAC or IP is connected to. Accuracy degrades as input data becomes stale or incomplete.

On a flat, non-NAT segment, positioning works well. The endpoint’s MAC appears in the access switch’s forwarding database tied to a specific port. The IP-to-MAC mapping appears in the gateway router’s ARP cache. The topology engine cross-references FDB and ARP to pin the endpoint to a physical port with high confidence. DHCP snooping binding tables and RADIUS or 802.1X accounting logs provide additional IP-MAC-session bindings that further strengthen positioning.
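The cross-reference described above can be sketched as two dictionary lookups. This is a minimal illustration, not the engine's actual implementation: the `FDB` and `ARP` snapshot shapes, the device names, and the `locate` helper are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical snapshot shapes; a real poller's data model will differ.
# FDB: endpoint MAC -> (switch, port) learned on an access port.
# ARP: endpoint IP -> MAC as seen by the gateway router.
FDB = {"00:1b:63:00:11:22": ("sw-access-1", "Gi1/0/7")}
ARP = {"10.0.0.42": "00:1B:63:00:11:22"}


def locate(ip: str, arp: dict, fdb: dict) -> Optional[tuple]:
    """Resolve IP -> MAC via ARP, then MAC -> (switch, port) via FDB."""
    mac = arp.get(ip)
    if mac is None:
        return None  # no upstream L3 binding for this IP
    # Normalize case so ARP and FDB sources with different formatting still match.
    return fdb.get(mac.lower())  # None if the MAC is in no polled FDB


print(locate("10.0.0.42", ARP, FDB))
print(locate("10.0.0.99", ARP, FDB))
```

A `None` result is the interesting case for the rest of this article: it is what the lookup returns for every endpoint that sits behind a NAT boundary.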

Behind a NAT device, every upstream-visible signal is replaced by the NAT device’s own identity. The upstream switch’s FDB shows the NAT device’s MAC on the port facing the NAT device. The upstream router’s ARP cache maps the NAT device’s IP to the NAT device’s MAC. Flow records carry the post-NAT source IP and port. The topology engine, lacking the endpoint’s actual MAC or true IP in any upstream table, can only place the endpoint at the NAT device’s port.

Any action that depends on endpoint location (blocking a compromised host, tracing a malicious flow, enforcing a VLAN policy) targets the NAT device, not the endpoint. Security teams lose time. Access policies fail silently. Troubleshooting goes to the wrong physical location.

How NAT and wireless break endpoint positioning

Standard endpoint positioning relies on signal sources being visible upstream of the endpoint. The table below shows what each source normally provides and what NAT replaces it with.

| Signal source | What it normally provides | What NAT substitutes |
| --- | --- | --- |
| FDB (MAC table) on access switch | Endpoint MAC on a specific port | NAT device MAC on the port facing the NAT device |
| ARP cache on gateway router | Endpoint IP mapped to endpoint MAC | NAT device IP mapped to NAT device MAC |
| Flow records (NetFlow, IPFIX, sFlow) | Endpoint IP as source in 5-tuple | Post-NAT IP and port as source |
| CDP/LLDP neighbor data | Device identity on directly connected links | NAT device identity, or nothing if the NAT device does not speak CDP/LLDP |
| DHCP snooping binding table | IP-MAC-port binding for each client | NAT device binding, or nothing if DHCP is proxied |
| RADIUS or 802.1X accounting | Session-level IP-MAC-user binding | Binding for the NAT device itself, not its clients |

When all upstream-visible signals converge on the NAT device’s identity, the topology engine has no data to distinguish individual endpoints behind it. The NAT device’s own internal state (its translation table and internal ARP cache for NAT’d clients) is the only source of truth. That state is rarely exposed over SNMP in practice (the standard NAT-MIB, RFC 4008, is seldom implemented) and is typically not polled by the monitoring platform.

flowchart LR
    EP["Endpoint<br/>10.0.0.42<br/>MAC AA:BB:CC"] --> NAT["NAT boundary<br/>wireless controller<br/>SD-WAN gateway"]
    NAT -->|post-NAT source| FLOW["Flow records<br/>198.51.100.5:54321"]
    NAT -->|FDB and ARP upstream| L2["Upstream tables<br/>NAT device MAC only"]
    NAT -->|internal state| XLOG["Translation table<br/>10.0.0.42:54321<br/>maps to 198.51.100.5:54321"]
    L2 --> TOPO["Topology engine<br/>places endpoint at<br/>NAT device port"]
    FLOW --> TOPO
    XLOG -.->|retained and correlated| RECOVER["Identity recovery<br/>via timestamp and port"]
    XLOG -.->|expired or absent| LOST["Identity unrecoverable"]

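You can test for this opacity by checking whether the endpoint’s MAC or IP appears in upstream tables. If neither does while the endpoint is known to be active, it is most likely behind a NAT boundary on that segment. An offline endpoint or an aged-out entry produces the same absence, so rule those out first.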

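# Check whether endpoint MAC appears in upstream FDB
# Uses Q-BRIDGE-MIB dot1qTpFdbPort (VLAN-aware bridge). The MAC is part of the row
# index and is printed as six decimal octets (AA:BB:CC:00:11:22 -> 170.187.204.0.17.34)
snmpwalk -v2c -c <community> -On <switch> .1.3.6.1.2.1.17.7.1.2.2.1.2 | grep -F '<endpoint-mac-as-decimal-octets>'
ssh <switch> 'show mac address-table address <endpoint-mac>'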

# Check whether endpoint IP appears in upstream ARP
# Uses IP-MIB ipNetToPhysicalTable (replaces legacy ipNetToMediaTable)
snmpwalk -v2c -c <community> <gateway> .1.3.6.1.2.1.4.35.1 | grep <endpoint-ip>
ssh <gateway> 'show ip arp <endpoint-ip>'

# Identify NAT egress candidates: rank source IPs by flow count
# (a rough proxy for "many distinct destinations from one source IP")
nfdump -R /var/nfdump/ -s srcip/flows -n 20

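# Check firewall or wireless controller session/NAT table (PAN-OS syntax shown)
ssh <fw> 'show session info'
# PAN-OS API equivalent (do not embed API keys in shell history or URLs in production)
# -G with --data-urlencode keeps the XML command properly escaped in the query string;
# -k skips TLS certificate verification and should be dropped once the firewall has a trusted certificate
curl -sk -G -H "X-PAN-KEY: <apikey>" "https://<fw>/api/" --data-urlencode "type=op" --data-urlencode "cmd=<show><session><info></info></session></show>"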

If the endpoint MAC is absent from the upstream FDB but the NAT device’s MAC is present on the same segment, the endpoint is most likely behind NAT, provided it was actively sending traffic when the tables were polled. If the endpoint IP is absent from upstream ARP but the NAT device’s IP maps to the NAT device’s MAC, the same conclusion holds.
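One practical snag when searching SNMP output: rows in the Q-BRIDGE-MIB forwarding table are indexed by FDB ID and MAC address, and numeric `snmpwalk` output renders the MAC portion of that index as decimal octets, so a hex MAC will not match. The helper below is a small sketch of the conversion; the function name `mac_to_snmp_index` is an assumption made for this example.

```python
def mac_to_snmp_index(mac: str) -> str:
    """Convert a hex MAC such as 'aa:bb:cc:00:11:22' to the dotted-decimal
    form used in SNMP table indexes, e.g. '170.187.204.0.17.34'."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    return ".".join(str(int(octet, 16)) for octet in octets)


print(mac_to_snmp_index("AA:BB:CC:00:11:22"))
```

The resulting string can be passed to `grep -F` against numeric OID output (for example, output produced with `snmpwalk -On`).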

Wireless networks add a second layer of opacity. iOS 14+ and Android 10+ default to a randomized, per-network MAC address for association, not only for probe requests, and the device keeps using that address for the whole session. The factory MAC appears on the network only if the user or a device-management profile disables the private address for that SSID. For platforms that rely on MAC-based device fingerprinting or static MAC whitelists, the randomized MAC means the identity visible to the wireless infrastructure can be ephemeral and does not match any asset inventory entry.
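Randomized addresses can usually be recognized because they set the locally administered bit (the second-least-significant bit of the first octet). The sketch below checks that bit; the function name is an assumption for this example, and the result is only a hint, since virtualization and container platforms also generate locally administered MACs.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the MAC has the locally administered bit set,
    which randomized (private) Wi-Fi addresses do."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


print(is_locally_administered("da:a1:19:00:00:01"))  # randomized-style address
print(is_locally_administered("00:1b:63:00:00:01"))  # vendor-assigned (OUI) address
```

A platform could use a check like this to skip asset-inventory matching by MAC for such clients and fall back to session identity from 802.1X or RADIUS accounting.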

Where it shows up in production

The NAT boundary opacity pattern manifests across several common deployment variants:

Wireless controllers performing client NAT. Guest WLANs and some corporate WLANs NAT at the controller. The controller’s upstream-facing interface carries the post-NAT IP for all client traffic. The access switch sees only the controller’s MAC. Endpoints behind the controller are individually invisible to any upstream monitoring tool. Topology inference confidence for these endpoints drops because the only L2 signal upstream is the controller’s MAC on its uplink port.

SD-WAN overlay NAT at branches. Branch traffic is NAT’d as it enters the SD-WAN overlay tunnel. The overlay’s flow records show the branch gateway’s overlay IP, not the individual endpoint. Underlay monitoring sees encrypted tunnel traffic, not the original flows. The orchestrator’s API may expose per-tunnel SLA data, but per-endpoint identity inside the branch requires the branch gateway’s own translation logs.

Cloud VPC NAT gateways. AWS, Azure, and GCP provide managed NAT gateways for outbound traffic from private subnets. Cloud flow logs (VPC Flow Logs, NSG Flow Logs) show the NAT gateway’s IP as the source for outbound traffic. The instance IP is only visible in the VPC-internal flow records, if they are collected at all. There is no SNMP, no CDP, no LLDP in these environments. The flow record is the only native signal, and it carries the post-NAT address.

Container orchestration NAT. Container runtimes performing NAT for pod traffic create the same opacity. The host node’s IP appears in upstream flow records. Individual pod IPs are visible only within the cluster’s own network namespace. MAC masquerading by virtualization and container platforms further complicates tracking, as MACs may be generated dynamically and change across restarts.

Carrier-grade NAT (CGNAT). ISP-deployed CGNAT uses the shared address space 100.64.0.0/10 (RFC 6598). Multiple subscribers share a single public IP. Port forwarding cannot be configured by the subscriber because the ISP’s CGN box owns the external port mapping. IP-based abuse filters targeting the shared public IP will affect all subscribers behind that address. Detection: check whether the WAN interface address falls within 100.64.0.0/10.
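The detection rule above is a single range check. A minimal sketch using Python's standard `ipaddress` module follows; the helper name `is_cgnat` is an assumption for this example.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_cgnat(addr: str) -> bool:
    """Return True if the address falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(addr) in CGNAT_RANGE


print(is_cgnat("100.72.13.9"))
print(is_cgnat("198.51.100.5"))
```

Feed it the address assigned to the WAN interface. A `True` result means the router's "public" side is itself behind the ISP's NAT.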

Common misuses and false confidence

Trusting topology confidence without checking data freshness. Topology engines often present endpoint positions with high confidence even when the underlying FDB or ARP data is stale. Many devices learn FDB entries but do not age them aggressively (default CAM aging is often 300 seconds, but misconfigured or static entries persist indefinitely). An FDB entry may long outlast the endpoint’s actual presence on that port, yet the platform still reports “endpoint on port X” with high confidence. See stale FDB/MAC tables for the detailed failure mode.
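A freshness check can be sketched as follows, assuming the poller records the time it last observed each FDB or ARP entry (SNMP tables generally do not expose entry age, so `last_seen` is bookkeeping the collector would have to maintain). The function name, the data shape, and the default factor of 3 are assumptions for illustration.

```python
def stale_entries(last_seen: dict, now: float,
                  refresh_interval: float, factor: float = 3.0) -> list:
    """Return keys whose last observation is older than factor * refresh_interval.

    last_seen maps an entry key (for example a MAC) to the UNIX timestamp
    at which the poller last saw that entry in the device table.
    """
    limit = refresh_interval * factor
    return sorted(key for key, ts in last_seen.items() if now - ts > limit)


observed = {"00:1b:63:00:00:01": 9_900.0, "00:1b:63:00:00:02": 8_000.0}
print(stale_entries(observed, now=10_000.0, refresh_interval=300.0))
```

Entries returned by such a check should lower the confidence of any position derived from them rather than being reported at face value.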

Not retaining NAT translation logs. NAT translation logs are the only mechanism to recover endpoint identity from post-NAT flow records. If translation logs expire before an investigation begins, the identity is permanently lost. Many platforms default to short retention because translation logs are voluminous. The investigation window and the retention window must be aligned. If your security team’s average investigation starts 24 hours after the event, translation logs must be retained for at least that long.

Using post-NAT IPs in security alerts without correlation. A security alert that fires on a post-NAT source IP leads investigators to the NAT device, not the endpoint. Without translation log correlation built into the alerting pipeline, every NAT’d endpoint alert generates a dead-end investigation. The fix is to enrich flow records with translation log data at ingestion time, so that alerts carry both the post-NAT address (for the flow) and the pre-NAT identity (for the investigation).
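The enrichment step can be sketched as a lookup keyed on post-NAT address, port, protocol, and time. This is an illustrative sketch only: the `Translation` record, its field names, and `recover_identity` are assumptions, and real translation logs vary by vendor. The sample values reuse the addresses from the diagram earlier in this article.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Translation:
    """One NAT translation log entry (hypothetical, vendor-neutral shape)."""
    start: float      # UNIX time the mapping was created
    end: float        # UNIX time the mapping was torn down
    proto: str
    pre_ip: str
    pre_port: int
    post_ip: str
    post_port: int


def recover_identity(flow_ts: float, proto: str, post_ip: str, post_port: int,
                     translations: Iterable[Translation]) -> Optional[Translation]:
    """Find the translation that was active for a post-NAT flow record."""
    for t in translations:
        if (t.proto == proto and t.post_ip == post_ip
                and t.post_port == post_port and t.start <= flow_ts <= t.end):
            return t
    return None  # log expired or never collected: identity is unrecoverable


log = [Translation(1000.0, 1060.0, "tcp", "10.0.0.42", 54321, "198.51.100.5", 54321)]
match = recover_identity(1030.0, "tcp", "198.51.100.5", 54321, log)
print(match.pre_ip if match else "unrecoverable")
```

The time-window condition matters because a NAT device reuses external ports: the same post-NAT address and port can belong to different internal hosts at different times. A linear scan is fine for a sketch; at production volume the log would need an index on the post-NAT tuple.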

Assuming ARP freshness when entries are stale. ARP entry timeout varies by platform: 4 hours by default on Cisco IOS, while on Linux a neighbor entry goes stale after roughly 15 to 45 seconds (a randomized window around base_reachable_time_ms, 30 seconds by default) and becomes eligible for garbage collection once it has been unused for gc_stale_time (60 seconds by default). An ARP entry for the NAT device’s IP will be fresh because the NAT device is actively communicating, but any internal mapping the NAT device holds for its clients may have a different lifetime entirely. See ARP cache staleness for how stale ARP entries corrupt topology inference more broadly.

Relying on MAC-based tracking for wireless endpoints. MAC randomization on modern mobile operating systems means the MAC visible during probe and association is not the factory MAC. Asset inventory correlation by MAC fails silently. 802.1X or MAB with a dynamic whitelist is more reliable than static MAC whitelisting for BYOD environments where randomization is in effect.

Ignoring topology confidence for specific endpoint classes. Topology confidence is per-endpoint, not aggregate. An overall high-confidence score can mask low confidence for every endpoint behind a specific wireless controller or SD-WAN gateway. VM mobility events also temporarily drop confidence until convergence. Alert on confidence drops per endpoint class (wired, wireless, virtualized), not just in aggregate.
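The masking effect is easy to reproduce with a handful of numbers. The sketch below groups per-endpoint confidence scores by class and flags classes whose average falls under a threshold; the function names, the `(class, score)` sample shape, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def confidence_by_class(samples):
    """Average confidence per endpoint class from (class, score) pairs."""
    grouped = defaultdict(list)
    for endpoint_class, score in samples:
        grouped[endpoint_class].append(score)
    return {cls: mean(scores) for cls, scores in grouped.items()}


def low_confidence_classes(samples, threshold=0.6):
    """Classes whose average confidence is below the threshold."""
    by_class = confidence_by_class(samples)
    return sorted(cls for cls, avg in by_class.items() if avg < threshold)


# Eight well-positioned wired endpoints hide two poorly positioned wireless ones.
samples = [("wired", 0.95)] * 8 + [("wireless", 0.30)] * 2
print(mean(score for _, score in samples))  # aggregate still looks healthy
print(low_confidence_classes(samples))
```

In this sample the aggregate average stays above 0.8 even though every wireless endpoint is poorly positioned, which is why the alert belongs on the per-class value.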

Signals to watch in production

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Topology inference confidence score | Drops for endpoints behind NAT because upstream FDB and ARP show only the NAT device | Sustained low confidence for a class of endpoints (wireless, branch, cloud) |
| Endpoint positioning orphan rate | Measures MACs the engine cannot resolve to a physical switch port | Orphan rate above 2% of total discovered endpoints; above 10% in security zones is critical |
| Flow source IP concentration | Many flows from a single IP to many distinct destinations indicates a NAT egress point | Sudden increase in flows sourced from one IP that is a known NAT gateway |
| NAT and session table utilization | Indicates the NAT boundary is active and how many translations exist | Utilization above 70% sustained; above 90% means new connections may be denied |
| FDB and ARP freshness | Stale entries mean the topology engine builds on outdated data | Entries older than 3 to 4 times the expected refresh interval |
| Topology view consistency (CDP/LLDP vs FDB vs ARP) | Disagreement between sources drops positioning confidence | Persistent inconsistency lasting more than 24 hours on critical infrastructure |
| DHCP snooping binding table coverage | Provides IP-MAC-port bindings independent of upstream NAT | Binding table gaps for segments known to have active endpoints |
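Two of the thresholds in the table reduce to simple ratios. The sketch below applies the 70% and 90% session-table levels to a single sample and computes the orphan rate. The function names are assumptions, and the "sustained" condition is deliberately left out: a real alert would evaluate these over a time window.

```python
def utilization_status(used: int, capacity: int) -> str:
    """Classify one NAT/session table sample against the 70% / 90% levels."""
    ratio = used / capacity
    if ratio > 0.90:
        return "critical"  # new connections may be denied
    if ratio > 0.70:
        return "warning"
    return "ok"


def orphan_rate(unresolved: int, discovered: int) -> float:
    """Fraction of discovered endpoints not resolved to a physical port."""
    return unresolved / discovered if discovered else 0.0


print(utilization_status(used=190_000, capacity=262_144))
print(orphan_rate(unresolved=30, discovered=1_000))
```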

Query the topology engine directly to check per-endpoint confidence rather than relying on aggregate dashboards.

# Check topology engine confidence for specific endpoints
# Placeholder endpoint: verify the API name and path for your platform
curl -s "http://localhost:<port>/api/topology/confidence" | jq '.'

# Check NAT/session table utilization on firewalls
ssh <fw> 'show session info'

# Monitor endpoint positioning orphan rate from topology engine metrics
curl -s "http://localhost:<port>/metrics" | grep -E 'orphan|unresolved'
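The flow source IP concentration signal from the table above can be approximated by counting distinct destinations per source. This is a sketch under stated assumptions: flows are reduced to `(src_ip, dst_ip)` pairs, the function names are invented for the example, and the `min_destinations` cutoff of 3 is only sized for the toy data (a real deployment would tune it to its own traffic).

```python
from collections import defaultdict


def distinct_destinations(flows):
    """Count distinct destination IPs per source IP from (src, dst) pairs."""
    dests = defaultdict(set)
    for src, dst in flows:
        dests[src].add(dst)
    return {src: len(seen) for src, seen in dests.items()}


def nat_egress_candidates(flows, min_destinations=3):
    """Source IPs that talk to at least min_destinations distinct hosts."""
    counts = distinct_destinations(flows)
    return sorted(src for src, n in counts.items() if n >= min_destinations)


flows = [
    ("198.51.100.5", "203.0.113.10"),
    ("198.51.100.5", "203.0.113.11"),
    ("198.51.100.5", "203.0.113.12"),
    ("192.0.2.7", "203.0.113.10"),
    ("192.0.2.7", "203.0.113.10"),  # repeat flow, same destination
]
print(nat_egress_candidates(flows))
```

A high count alone does not prove NAT (busy servers and scanners look similar), so treat the output as a list of candidates to check against known gateway addresses.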

How Netdata helps

Netdata collects the underlying signals that reveal NAT boundary opacity:

  • SNMP polling of FDB tables, ARP caches, and device metrics surfaces stale entries and missing endpoints that indicate NAT is hiding hosts from upstream topology.
  • Flow and sFlow collection via the nfacctd plugin captures flow records that, when correlated against NAT translation logs, enable identity recovery during investigations.
  • Firewall and load balancer metrics (session counts, NAT pool utilization) from SNMP or agent-based collectors warn before session exhaustion denies new connections.
  • Syslog ingestion captures NAT translation events and wireless controller association logs that can be correlated with flow records for identity recovery.
  • Per-endpoint metric dimensions allow you to build alerts on confidence drops or orphan rate increases for specific endpoint classes (wireless, branch, cloud) rather than aggregate scores that mask localized problems.