Asymmetric routing: why your path and latency measurements lie

Your monitoring says the path is fine. Ping latency is normal, traceroute shows a clean route, and interface counters look healthy. But applications are slow, TCP sessions stall or reset, and users are complaining. Your tools are measuring only half the path.

In asymmetric routing, traffic from host A to host B takes one path (P1) while return traffic from B to A takes a different path (P2). When P2 is degraded, congested, or broken, round-trip measurements fold the healthy forward path and the impaired return path into a single number. Every acknowledgment and response is fighting through a bad route while the aggregate looks acceptable.

Standard traceroute makes this worse. It reports round-trip time (forward plus return combined) but discovers only the forward path, hop by hop. A latency spike on the return leg cannot be attributed to any hop on the return path. Classic traceroute also produces phantom routes under equal-cost multipath (ECMP): each probe changes a header field that feeds the ECMP hash (classic UDP traceroute increments the destination port per probe), so ECMP routers hash different probes onto different physical paths. The resulting hop list is synthetic. No single packet traversed that route.
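
To make the ECMP effect concrete, here is a toy sketch of per-flow hashing. The `ecmp_path` helper and its CRC-based hash are illustrative assumptions, not how any particular router works; real devices use vendor-specific hash functions and inputs. The point is only that changing one field of the 5-tuple can change the selected next hop, while a constant 5-tuple always selects the same one.

```shell
# Toy model of ECMP next-hop selection (hypothetical; not a real router hash).
# $1 = 5-tuple as one string: "src_ip dst_ip proto src_port dst_port"
# $2 = number of equal-cost next hops
ecmp_path() {
    # cksum prints "<crc> <byte count>"; keep only the CRC
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $(( crc % $2 ))
}

# Example: vary the UDP destination port the way classic traceroute does.
# for port in 33434 33435 33436 33437; do
#     ecmp_path "10.0.0.1 192.0.2.10 17 40000 $port" 4
# done
```

Running the commented loop typically prints more than one path index, whereas repeating the call with an unchanged 5-tuple always prints the same index. That constant-input behaviour is what Paris traceroute relies on.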

What this means

Asymmetric routing is not itself a failure. Many production networks route asymmetrically by design: BGP policy differences across peers, per-flow load balancing across unequal links, policy-based routing for traffic engineering, and cloud provider peering arrangements all create paths where forward and return traffic diverge. The problem arises when one direction degrades and your monitoring cannot see it.

The failure pattern is specific: forward-path probe latency reads healthy, reverse-path probe latency is elevated or shows loss, ICMP RTT shows high variance, and application-layer RTT from flow data shows high p99 with low p50. Traceroute from A to B and from B to A tells different stories. BGP route changes often appear around the same time.

Host A ---- P1: forward path (healthy) ------------> Host B
Host A <--- P2: return path (degraded or lossy) ---- Host B

Measurements must be taken in both directions independently. A single-direction probe, or a round-trip measurement that conflates both directions, will hide the problem.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| BGP policy asymmetry | Routes advertised differently in each direction; one peer preferred outbound, another inbound | show ip bgp summary and show ip bgp <prefix> on both ends; compare AS-path and next-hop |
| Per-flow load balancing (ECMP) with unequal paths | Intermittent loss or latency variance; some flows affected, others fine | Paris traceroute with constant 5-tuple to identify ECMP hashing |
| Route redistribution asymmetry | Different routing protocols redistributing routes differently on each device | Compare route tables on both endpoints: show ip route <prefix> |
| Policy-based routing (PBR) | Forward traffic matches one PBR rule, return matches another | Check PBR policy maps on both devices |
| NAT in one direction | Return traffic sourced from a different IP; stateful firewalls may drop unsolicited return packets | Check NAT translation logs; compare pre-NAT and post-NAT addresses |
| Linux rp_filter in strict mode | Return packets silently dropped by kernel because they arrive on an unexpected interface | sysctl net.ipv4.conf.all.rp_filter (1 = strict, 2 = loose) |
| Stateful firewall on asymmetric path | TCP SYN crosses firewall A; SYN-ACK returns via a different path; firewall A never sees the SYN-ACK, so the handshake never completes in its state table and the session times out | Check firewall session tables for half-open connections |

Quick checks

Run these from both endpoints where possible. All are read-only and non-disruptive.

# Traceroute in both directions - compare hop counts, paths, and latency
traceroute -n <target>
# Then run from the target back to your source

# mtr for sustained path monitoring (shows per-hop loss)
mtr -n -c 100 <target>

# Paris traceroute: holds the 5-tuple constant to defeat ECMP hash variation
paris-traceroute -n <target>

# Check Linux reverse-path filter setting (1=strict, 2=loose, 0=disabled)
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.default.rp_filter
# Also check per-interface overrides (eth0 is an example; substitute your interface name)
sysctl net.ipv4.conf.eth0.rp_filter

# Show which route the kernel uses for a specific destination
ip route get <target_ip>

# Check per-direction interface utilization via SNMP
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.6   # ifHCInOctets
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.10  # ifHCOutOctets

# Compare BGP route tables from both perspectives
ssh <router> 'show ip bgp summary'
ssh <router> 'show ip route <prefix>'
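
One way to compare the two traces is to reduce each to a bare hop list first. The following is a minimal sketch, assuming the common Linux `traceroute -n` output layout (hop number, then the responder address or `*`); `hop_list` is a hypothetical helper name, and other traceroute implementations may format lines differently.

```shell
# Reduce `traceroute -n` output (on stdin) to "hop_number first_responder" lines.
# The banner line is skipped because its first field is not a number.
hop_list() {
    awk '$1 ~ /^[0-9]+$/ { print $1, $2 }'
}

# Example (run one command on each endpoint, then compare the files by eye):
# traceroute -n <target> | hop_list > forward_hops.txt
# traceroute -n <source> | hop_list > reverse_hops.txt
```

Routers usually answer from the interface the probe arrived on, so the same router can show different addresses in each direction. Treat the lists as input for a human comparison rather than something to diff mechanically.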

How to diagnose it

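  1. Run traceroute in both directions. This is the single most important step. If the two traces pass through different routers, routing is asymmetric. Compare not just the hop count but the intermediate routers and the latency at each hop. Keep in mind that a router usually replies from the interface the probe arrived on, so the same router can appear under different addresses in each direction; compare by router or AS rather than by raw address. A 5 ms forward path paired with an 80 ms return path is the signature of asymmetric degradation.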

  2. Use Paris traceroute to rule out ECMP artifacts. Classic traceroute changes a hashed header field on every probe (the UDP variant increments the destination port), causing ECMP routers to hash each probe onto a potentially different physical path. The stitched hop list is synthetic. Paris traceroute holds the full 5-tuple (source IP, destination IP, source port, destination port, protocol) constant so every probe hashes to the same ECMP path.

  3. Check for high RTT variance. High p99 with low p50 RTT on ICMP or application-layer probes means some packets are taking a longer path. This is the signature of partial asymmetry where some flows follow one route and others follow another. A jitter value greater than 0.3 times the mean RTT is a queueing or path-divergence indicator.

  4. Compare forward and reverse flow data. If you collect flow data (NetFlow, IPFIX, sFlow) at both endpoints, compare byte and packet counts for the same 5-tuple in both directions. Asymmetric routing is normal in many networks. What matters is consistency over time. A sudden step change in the forward/reverse ratio indicates a routing change or a failed link.

  5. Check rp_filter on Linux hosts. Strict mode (value 1) drops a packet unless the kernel's best route back to the packet's source address points out the same interface the packet arrived on. In asymmetric routing, legitimate return packets may arrive on the “wrong” interface from the kernel’s perspective. Loose mode (value 2) drops packets only when no route to the source exists at all. Use mode 2 on hosts where asymmetric routing is present. The effective per-interface value is max(all, interface_setting), so setting all=2 overrides any individual interface still set to 1.

  6. Examine BGP route tables from both ends. Compare what each router sees as the best path to the other. Policy differences, AS-path prepending, or community-tag-based local preference can cause each side to prefer different upstreams. Look for BGP route changes around the time the symptoms started.

  7. Check stateful firewall session tables. If a stateful firewall sits on one direction of the path, it expects to see the full TCP handshake (SYN, SYN-ACK, ACK). When the SYN-ACK returns via a different path, the firewall never sees it, so from its perspective the handshake never completes. After the embryonic connection timeout (typically 30-60 seconds depending on vendor), it purges the half-open session. Subsequent packets from the client are silently dropped. Look for half-open connections or a high rate of session table purges.

  8. Verify interface utilization per direction. Asymmetric saturation (one direction at capacity, the other idle) is a symptom, not a cause. If you see 95% utilization inbound on one interface and 5% outbound on the same interface, while the return path uses a different interface, the problem is capacity on the saturated direction.
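
The RTT variance check in step 3 can be sketched as a small filter over raw samples. This is a minimal sketch under stated assumptions: `rtt_stats` is a hypothetical helper, percentiles use the nearest-rank method, and "jitter" is taken to mean the population standard deviation of the samples (the article does not define it). The thresholds are the ones quoted in this article: p99/p50 above 3, or jitter above 0.3 times the mean.

```shell
# Read one RTT sample (milliseconds) per line on stdin and print summary stats.
# suspect=1 when p99/p50 > 3 or jitter > 0.3 * mean.
rtt_stats() {
    sort -n | awk '
        { v[NR] = $1; sum += $1; sumsq += $1 * $1 }
        END {
            if (NR == 0) exit 1
            # nearest-rank percentile: rank = ceil(p/100 * N)
            p50 = v[int((NR * 50 + 99) / 100)]
            p99 = v[int((NR * 99 + 99) / 100)]
            mean = sum / NR
            var = sumsq / NR - mean * mean
            if (var < 0) var = 0          # guard against rounding error
            jitter = sqrt(var)
            ratio = (p50 > 0) ? p99 / p50 : 0
            suspect = (ratio > 3 || jitter > 0.3 * mean) ? 1 : 0
            printf "p50=%.1f p99=%.1f ratio=%.1f mean=%.1f jitter=%.1f suspect=%d\n", p50, p99, ratio, mean, jitter, suspect
        }'
}

# Example, assuming Linux iputils ping output with "time=<value> ms":
# ping -c 100 <target> | sed -n 's/.*time=\([0-9.]*\) ms.*/\1/p' | rtt_stats
```

Because the input is a round-trip time, a flagged result says that some packets took a slower path in one direction or the other; it does not say which. Pair it with per-direction probes to localize the problem.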

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Forward-path probe latency (IPSLA, TWAMP, HTTP) | Measures one direction independently | Healthy while applications are failing |
| Reverse-path probe latency | Measures the return direction independently | Elevated or showing loss while forward path is clean |
| ICMP RTT variance (jitter) | High variance indicates some packets taking different paths | p99/p50 ratio greater than 3 |
| Application-layer RTT from flows | Real user experience, including retransmits | High p99 with low p50 |
| Per-direction interface utilization | Reveals asymmetric saturation | One direction near 100%, other idle |
| Forward/reverse flow byte ratio | Detects routing changes | Sudden step change from baseline |
| BGP prefix count per peer | Route changes cause path shifts | Sudden increase or decrease around symptom onset |
| Linux rp_filter setting | Strict mode silently drops asymmetric return packets | Value = 1 when asymmetric routing is expected |
| Stateful firewall session purge rate | Half-open sessions indicate asymmetric handshakes | High purge rate for incomplete connections |
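
Per-direction utilization is derived from two readings of an octet counter. Here is a minimal sketch of that arithmetic, assuming 64-bit counters (ifHCInOctets or ifHCOutOctets) that did not wrap or reset between samples; `util_pct` is a hypothetical helper, and the link speed in bits per second is supplied by the caller.

```shell
# Utilization (%) of one direction of an interface from two counter samples.
# $1 = octet counter at t0      $2 = octet counter at t1
# $3 = seconds between samples  $4 = link speed in bits per second
util_pct() {
    awk -v a="$1" -v b="$2" -v t="$3" -v s="$4" 'BEGIN {
        # octets -> bits, divided by the bits the link could have carried
        printf "%.1f\n", (b - a) * 8 / (t * s) * 100
    }'
}

# Example: 7,125,000,000 octets received in 60 s on a 1 Gbit/s link
# util_pct 0 7125000000 60 1000000000
```

Compute the value separately for the inbound and outbound counters of each interface. A large gap between the two results is the asymmetric saturation pattern listed in the table above.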

Fixes

Fix Linux rp_filter blocking return traffic

If the return path arrives on a different interface than the kernel expects, strict reverse-path filtering (mode 1) will silently drop those packets. Set loose mode (2), which validates only that a route to the source exists, not that it matches the arriving interface.

# Set to loose mode (takes effect immediately)
sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.default.rp_filter=2

# Persist across reboots
echo "net.ipv4.conf.all.rp_filter=2" >> /etc/sysctl.d/99-asymmetric-routing.conf
echo "net.ipv4.conf.default.rp_filter=2" >> /etc/sysctl.d/99-asymmetric-routing.conf
sysctl --system

Warning: switching from strict to loose mode reduces protection against spoofed source addresses. Apply only where asymmetric routing is known to occur, and ensure perimeter filtering handles spoofing at the network edge.

Some modern distributions default to loose mode (2), but enterprise distributions and hardened baselines often set strict mode (1). If you are running a hardened baseline, this is a common source of mysterious packet loss.
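
Because the kernel applies max(all, interface_setting), reading a single sysctl can mislead. The sketch below reports the effective value per interface. The helper names are hypothetical, and it assumes the standard Linux `/proc/sys/net/ipv4/conf` layout; the optional argument exists only so the function can be pointed at a different directory.

```shell
# Effective rp_filter value for one interface: max(conf.all, per-interface).
# $1 = conf.all value, $2 = per-interface value
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Print "<interface> <effective value>" for every interface.
# $1 = base directory (defaults to the live kernel settings)
report_rp_filter() {
    base=${1:-/proc/sys/net/ipv4/conf}
    all=$(cat "$base/all/rp_filter") || return 1
    for dir in "$base"/*/; do
        iface=$(basename "$dir")
        # "all" and "default" are templates, not real interfaces
        case "$iface" in all|default) continue ;; esac
        [ -r "$dir/rp_filter" ] || continue
        printf '%s %s\n' "$iface" "$(effective_rp_filter "$all" "$(cat "$dir/rp_filter")")"
    done
}
```

Calling `report_rp_filter` with no arguments reads the running kernel's settings. Any interface reporting 1 on a host with asymmetric return traffic is a candidate for the silent drops described above.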

Fix BGP policy asymmetry

If the asymmetry is unintentional, the fix is routing policy correction: align BGP local preference, AS-path prepending, or MED values so that both endpoints prefer the same path for both directions. If the asymmetry is intentional (traffic engineering, cost optimization), the fix is monitoring, not routing. Ensure your probes measure each direction independently.

Fix stateful firewall drops

Stateful firewalls require seeing both directions of a TCP connection. When the handshake is split across paths, the firewall sees only the SYN and never the SYN-ACK. Vendor-specific mechanisms exist to handle this. pfSense offers a “sloppy” state type that does not enforce handshake sequencing.

Alternatively, restructure routing so that both directions of a flow traverse the same firewall. In cloud environments, this is particularly important: AWS Network Firewall expects symmetric flow state, and Azure ExpressRoute takes priority over coexisting Site-to-Site VPN connections, which can silently black-hole return traffic.
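
Where the stateful device is a Linux netfilter firewall (an assumption; other vendors expose session tables through their own commands), half-open sessions can be counted from connection-tracking output. This is a minimal sketch that assumes the usual `conntrack -L` line layout, in which the fourth field of a TCP entry is its state; `tcp_state_counts` is a hypothetical helper.

```shell
# Count TCP conntrack entries per state from `conntrack -L` output on stdin.
# Assumes lines shaped like: "tcp 6 <ttl> <STATE> src=... dst=... ..."
tcp_state_counts() {
    awk '$1 == "tcp" { c[$4]++ } END { for (s in c) print s, c[s] }' | sort
}

# Example (needs root and the conntrack-tools package):
# conntrack -L -p tcp 2>/dev/null | tcp_state_counts
```

A count of SYN_SENT entries that stays high relative to ESTABLISHED is consistent with handshakes whose replies bypass this firewall, though a port scan or an unreachable backend produces the same picture.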

Fix monitoring blind spots

The most important fix is often not a routing change but a monitoring change. If your probes measure only round-trip latency, they will always hide reverse-path degradation. Deploy active probes in both directions: IPSLA from A to B and from B to A, or TWAMP sessions that measure one-way delay. Track forward and reverse flow data separately so you can detect when the ratio changes.
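
Tracking the forward/reverse ratio can start as a simple baseline comparison. The sketch below is illustrative only: `ratio_steps` is a hypothetical helper, the input is assumed to be one line per interval holding forward bytes and reverse bytes for the same flow or prefix pair, and both the three-interval warm-up and the default factor of 2 are arbitrary choices to tune against your own traffic.

```shell
# Flag intervals whose forward/reverse byte ratio departs from the running baseline.
# stdin: "<forward_bytes> <reverse_bytes>" per interval
# $1   : allowed factor of deviation from the baseline (default 2)
ratio_steps() {
    awk -v factor="${1:-2}" '
        $2 > 0 {
            r = $1 / $2
            if (n >= 3) {                      # wait for a minimal baseline
                base = sum / n
                if (r > base * factor || r < base / factor) {
                    printf "interval=%d ratio=%.2f baseline=%.2f\n", NR, r, base
                    next                       # keep outliers out of the baseline
                }
            }
            sum += r; n++
        }'
}
```

Because flagged intervals are excluded from the baseline, a persistent routing change keeps being reported on every interval until someone resets the baseline. That is deliberate in this sketch, since a step change that sticks is the event worth investigating.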

Prevention

  • Measure both directions independently. Round-trip probes are necessary but not sufficient. Forward and reverse probes together reveal asymmetric degradation that aggregate measurements hide.
  • Baseline the forward/reverse flow ratio. Asymmetric routing is normal in many networks. A sudden step change in the ratio is the event that indicates a routing problem.
  • Verify rp_filter settings after provisioning new hosts. Strict mode is the default in some hardened baselines and will silently break asymmetric routing. Check it as part of your host bring-up checklist.
  • Document intentional asymmetry. If traffic engineering intentionally routes return traffic differently, ensure the operations team knows and monitoring measures both paths. Undocumented intentional asymmetry looks identical to a routing failure during an incident.
  • Use Paris traceroute in runbooks. Classic traceroute produces misleading results under ECMP. Standardize on Paris traceroute (or equivalent constant-5-tuple probing) for path diagnosis.

How Netdata helps

  • Per-direction interface utilization collected via the SNMP plugin lets you see asymmetric saturation (one direction at 100%, the other idle) without manual walks.
  • ICMP RTT probes with per-sample granularity reveal high variance that aggregate metrics miss. Correlate RTT p99 spikes with BGP route changes to confirm path divergence.
  • Flow data collection from NetFlow, IPFIX, and sFlow can be correlated across collection points to detect sudden changes in the forward/reverse byte ratio.
  • BGP monitoring tracks prefix counts, session state, and route changes, so you can correlate routing events with path degradation timestamps.
  • Custom alerting on RTT p99/p50 ratio detects partial asymmetry where some flows take a different path, surfacing the problem before users complain.