sFlow sampling rate: why your traffic totals are off by 1000x

Your sFlow-derived bandwidth charts show a 10G link carrying 8.4 Mbps. SNMP counters on the same interface show 8.4 Gbps. The switch is not broken and the collector is not dropping packets. The analytics pipeline is summing raw sampled bytes without multiplying by the sampling rate.

sFlow is not NetFlow. It does not maintain a flow cache on the device, aggregate bytes per conversation, or export summary totals. sFlow exports individual packet samples, packed into UDP datagrams, with each sample carrying the packet’s header data and metadata about the sampling process. The collector is responsible for turning those samples into traffic estimates through multiplication. When that multiplication is missing, every chart, alert, capacity plan, and billing report built on the data is wrong by the sampling factor.

On a typical 1:1000 sampling configuration, the numbers are off by 1000x. A link carrying 8 Gbps appears to carry 8 Mbps. A DDoS attack at 5 Gbps looks like ordinary background traffic. Capacity decisions get made on data that is three orders of magnitude below reality.

What this means

sFlow is a sample-datagram protocol, not a template-flow protocol. The distinction determines where the math happens.

In NetFlow v5, v9, or IPFIX, the device maintains a flow cache. It aggregates bytes and packets per 5-tuple conversation. When a flow expires or the cache fills, the device exports an accumulated record with the total byte and packet counts. The collector receives numbers that already represent observed traffic volume.

sFlow works differently. The device samples packets at a configured rate (1 in N), wraps each sampled packet’s header and metadata into a flow sample, and sends one or more samples to the collector in a UDP datagram. There is no flow cache on the agent. Each flow sample contains:

  • frame_length: the original packet length on the wire. Whether this includes the FCS depends on the implementation and platform.
  • sampling_rate: the integer N used for this sample (a value of 1000 means 1 in 1000 packets was sampled)
  • Additional metadata: interface index, source and destination information, header protocol data
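To make those fields concrete, here is a minimal decoding sketch. It assumes the standard sFlow v5 XDR layout and handles only compact flow samples (format 1) carrying raw packet header records (format 1); expanded flow samples and counter samples are skipped. The function name `parse_flow_samples` is hypothetical, and a production collector should rely on a maintained decoder instead.

```python
import struct

def parse_flow_samples(datagram: bytes):
    """Yield (sampling_rate, sample_pool, drops, frame_length) per flow sample.

    Sketch only: supports compact flow samples (format 1) with raw packet
    header records (format 1). Other sample and record types are skipped.
    """
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"unsupported sFlow version {version}")
    # Agent address is 4 bytes for IPv4 (type 1), 16 bytes for IPv6 (type 2).
    offset = 8 + (4 if addr_type == 1 else 16)
    # sub_agent_id, datagram sequence number, uptime, number of samples
    _, _, _, num_samples = struct.unpack_from("!IIII", datagram, offset)
    offset += 16
    for _ in range(num_samples):
        sample_type, sample_len = struct.unpack_from("!II", datagram, offset)
        body = offset + 8
        offset = body + sample_len  # always advance by the declared length
        if sample_type != 1:  # enterprise 0, format 1 = flow_sample
            continue
        (_seq, _source_id, rate, pool, drops,
         _input, _output, num_records) = struct.unpack_from("!8I", datagram, body)
        rec = body + 32
        for _ in range(num_records):
            rec_type, rec_len = struct.unpack_from("!II", datagram, rec)
            if rec_type == 1:  # raw packet header record
                _proto, frame_length = struct.unpack_from("!II", datagram, rec + 8)
                yield rate, pool, drops, frame_length
            rec += 8 + rec_len
```

Feeding it the UDP payload of a captured datagram yields one tuple per sampled packet, which is enough to confirm what the exporter actually reports for `sampling_rate` and `frame_length`.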

The collector receives raw samples. To estimate actual traffic volume, it must multiply: estimated_bytes = SUM(frame_length) * sampling_rate.

Without this multiplication, you are reporting the volume of sampled packets as if it were the total. On a 1:1000 link, you report 0.1% of reality.
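The gap is easy to reproduce with a few lines of arithmetic. The sketch below uses made-up numbers (40,000 samples of 1,500-byte frames in a 60-second window at 1:1000) and a hypothetical helper named `estimated_bps`; it assumes a constant sampling rate for the whole window.

```python
def estimated_bps(frame_lengths, sampling_rate, window_seconds):
    """Scale sampled bytes up to an estimated bit rate for the window."""
    sampled_bytes = sum(frame_lengths)
    return sampled_bytes * sampling_rate * 8 / window_seconds

# Hypothetical window: 40,000 samples of 1,500-byte frames in 60 s at 1:1000.
frames = [1500] * 40_000
raw_bps = sum(frames) * 8 / 60            # what an unnormalized pipeline reports
scaled_bps = estimated_bps(frames, 1000, 60)
print(f"raw: {raw_bps / 1e6:.0f} Mbps, scaled: {scaled_bps / 1e9:.0f} Gbps")
```

The raw figure lands at 8 Mbps and the scaled figure at 8 Gbps, the same three-orders-of-magnitude gap described above.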

flowchart TD
    A["Device samples<br/>1 in N packets"] --> B["sFlow datagram<br/>frame_length + sampling_rate"]
    B --> C["UDP to collector<br/>port 6343"]
    C --> D["Collector receives<br/>raw samples"]
    D --> E{"Multiply by<br/>sampling_rate?"}
    E -->|Yes| F["Correct traffic<br/>estimate"]
    E -->|No| G["Off by factor N<br/>often 1000x"]

A critical subtlety: in sFlow v5, sampling_rate is embedded per-sample. It can change between samples from the same device. Some platforms adjust the rate dynamically based on interface speed or traffic load. Collectors that hardcode a single rate will produce wrong totals whenever the device adjusts.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Normalization skipped entirely | All traffic uniformly low by the same factor across all exporters | Whether the collector or analytics layer applies sampling_rate multiplication |
| Wrong base metric used | Consistent undercount, at least 14 bytes per packet below real traffic | Verify the pipeline uses frame_length, not IP payload length |
| Variable rate not handled per-sample | Some exporters correct, others off by varying factors over time | Inspect whether sampling_rate changes between samples from the same source |
| Hardware sample drops | Off by a variable factor despite correct nominal rate | Check the drops field in flow_sample structures and compare sample_pool against received count |
| UDP transport loss | Intermittent undercounting, worse during traffic bursts | Check Udp_RcvbufErrors and compare collector inbound rate against device export counters |

Quick checks

These commands are read-only. None modify device or collector state.

# Verify sFlow datagrams are arriving at the collector
tcpdump -i eth0 -nn 'udp port 6343' -c 1000

# Check for UDP socket buffer drops (silent sample loss)
# The first Udp: line lists column names, the second the values; look at RcvbufErrors
grep '^Udp:' /proc/net/snmp

# Check current UDP receive buffer sizing on the sFlow listener
ss -lun '( sport = :6343 )' -m

# Compare SNMP-derived interface utilization against sFlow-derived values
# ifHCInOctets gives ground truth for bytes on the wire
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.6

# Check device-side flow export statistics (Cisco)
# Unverified: this subtree sits under CISCO-NETFLOW-MIB, so it may not expose sFlow export
# counters on every platform, and it may require a table index
snmpwalk -v2c -c <community> <device> .1.3.6.1.4.1.9.9.387.1.4
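If you want the UDP drop counter in a script and not by eye, the two `Udp:` lines can be paired programmatically. This is a small sketch; the function names `udp_counters` and `read_udp_counters` are hypothetical, and it assumes the usual Linux layout of a header line followed by a value line.

```python
def udp_counters(snmp_text: str) -> dict:
    """Pair the 'Udp:' header line with the 'Udp:' value line."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]  # does not match 'UdpLite:'
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, map(int, values)))

def read_udp_counters(path="/proc/net/snmp"):
    """Read the live counters from the kernel (read-only)."""
    with open(path) as fh:
        return udp_counters(fh.read())
```

The counters are cumulative since boot, so take two readings a known interval apart and look at the increase in `RcvbufErrors`, not at the absolute value.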

How to diagnose it

  1. Cross-validate against SNMP counters. Poll ifHCInOctets and ifHCOutOctets on the same interface the sFlow data covers. Compute utilization from SNMP. If SNMP shows 8 Gbps and sFlow analytics show 8 Mbps, the ratio tells you the sampling factor immediately.

  2. Calculate the expected factor. Divide SNMP-derived bytes by sFlow-derived bytes for the same time window and interface. If the ratio is close to the configured sampling rate (1000, 4096, 8192, etc.), normalization is missing. Sampling is statistical, so the ratio will never match exactly, but if it sits consistently above the configured rate, also investigate hardware sample drops or UDP loss.

  3. Inspect raw sFlow samples. Decode a stream of sFlow datagrams and check two fields per sample: frame_length and sampling_rate. Confirm that sampling_rate matches what you expect. Confirm that frame_length reflects full wire-layer length, not just IP payload.

  4. Check for per-sample rate variation. If the device uses adaptive sampling, different samples may carry different sampling_rate values. A collector that applies a single hardcoded rate will miscalculate for any sample whose actual rate differs.

  5. Verify the normalization formula with a worked example. 50 sFlow samples of 1500-byte packets at a 1:64 sampling rate over 60 seconds should yield 50 * 1500 * 64 * 8 / 60 = 640,000 bps. If your analytics report 50 * 1500 * 8 / 60 = 10,000 bps, the sampling rate multiplication is missing.

  6. Check for hardware sample loss. The sample_pool field counts the total packets that were eligible for sampling. Over a time window, divide the increase in sample_pool by the number of samples actually received to get the effective sampling rate. If that is well above the nominal sampling_rate, the device is generating fewer samples than its configured rate implies, often due to hardware rate limiting. The drops field in flow_sample counts sampled packets the agent had to discard because it lacked resources to process them.

  7. Check for UDP transport loss. Compare the device’s export counter against the collector’s receive counter. If the device exported 50,000 datagrams and the collector received 35,000, the 15,000 gap is silent loss that compounds the sampling error. Check Udp_RcvbufErrors (the RcvbufErrors column of the Udp: line in /proc/net/snmp) for kernel-level confirmation.
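Steps 1 and 2 reduce to a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the 15% tolerance is an arbitrary assumption you would tune to your own sampling noise.

```python
COUNTER64_MOD = 2 ** 64

def snmp_bps(octets_start, octets_end, seconds):
    """Bit rate from two ifHCInOctets readings, tolerating one counter wrap."""
    delta = (octets_end - octets_start) % COUNTER64_MOD
    return delta * 8 / seconds

def diagnose(snmp_rate, sflow_rate, configured_rate, tolerance=0.15):
    """Classify the SNMP/sFlow ratio. The tolerance is an assumed value."""
    ratio = snmp_rate / sflow_rate
    if abs(ratio - 1) <= tolerance:
        return ratio, "normalized"
    if abs(ratio / configured_rate - 1) <= tolerance:
        return ratio, "normalization missing"
    return ratio, "investigate sample drops, UDP loss, or a rate mismatch"
```

For example, `diagnose(snmp_bps(0, 63_000_000_000, 60), 8.4e6, 1000)` compares 8.4 Gbps from SNMP with 8.4 Mbps from sFlow and reports that normalization is missing.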

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Flow sampling rate consistency | Detects when the rate reported by the exporter differs from what the collector applies | Any mismatch between reported and applied rate |
| sFlow-derived vs SNMP-derived utilization | Ground truth comparison; SNMP counters reflect actual wire traffic | sFlow total below 50% of SNMP total for the same interface and window |
| UDP socket buffer drops (Udp_RcvbufErrors) | Silent sample loss at the kernel level; compounds sampling error | Any nonzero increment on a flow collector |
| Device-side flow exporter drops | Packets generated but not exported by the device | Drop rate exceeding 0.1% of total exported |
| Per-exporter sampling rate over time | Detects adaptive rate changes that collectors must handle | Rate changing without a corresponding change ticket |

Fixes

Apply normalization in the ingestion pipeline

The fix depends on where the gap is. If the collector library does not apply sampling rate multiplication, add it as a post-ingestion step: for each sample, multiply frame_length by sampling_rate before aggregating. For a constant rate, the formula is estimated_bytes = SUM(frame_length) * sampling_rate. For per-sample rates, use estimated_bytes = SUM(frame_length * sampling_rate).

Use frame_length, not IP payload length

Some pipelines use the Layer 3 total length field from the decoded packet header as the base for multiplication. This omits Layer 2 overhead: Ethernet header (14 bytes), VLAN tags (4 bytes each), and FCS (4 bytes). The sFlow v5 spec defines frame_length as the original packet length on the wire. Use that field as the base for all byte calculations.
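How much this matters depends on packet size. The following sketch estimates the fraction of wire bytes lost when the IP total length is used in place of frame_length. The overhead constants come from the figures above; the function name `l3_undercount` is hypothetical, and whether the FCS is counted is left as a parameter because it varies by platform.

```python
ETHERNET_HEADER = 14  # bytes
VLAN_TAG = 4          # bytes per tag
FCS = 4               # bytes, if the platform includes it in frame_length

def l3_undercount(ip_total_length, vlan_tags=0, fcs_counted=True):
    """Fraction of wire bytes lost when IP total length replaces frame_length."""
    overhead = ETHERNET_HEADER + VLAN_TAG * vlan_tags + (FCS if fcs_counted else 0)
    return overhead / (ip_total_length + overhead)

for ip_len in (46, 576, 1500):
    print(f"IP length {ip_len}: {l3_undercount(ip_len):.1%} undercount")
```

The error is small for full-size packets but grows quickly for small ones, so traffic dominated by short packets is undercounted the most.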

Handle variable sampling rates per-sample

Do not assume a single sampling rate per exporter. Read the sampling_rate field from each flow_sample structure and apply it individually. If your collector or analytics platform only supports a global rate, it will produce wrong totals whenever a device adjusts its sampling rate. This is particularly important on platforms that use adaptive sampling keyed to interface speed.
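The difference between the two approaches shows up as soon as the rate changes mid-window. This is a sketch with invented data; `FlowSample` and both function names are hypothetical and not tied to any particular collector.

```python
from typing import NamedTuple

class FlowSample(NamedTuple):
    frame_length: int
    sampling_rate: int

def estimate_per_sample(samples):
    """Scale each sample by the rate it was actually taken at."""
    return sum(s.frame_length * s.sampling_rate for s in samples)

def estimate_hardcoded(samples, assumed_rate):
    """Scale every sample by one assumed rate, as a global setting would."""
    return sum(s.frame_length for s in samples) * assumed_rate

# Hypothetical exporter that switches from 1:1000 to 1:4096 mid-window.
samples = [FlowSample(1500, 1000)] * 100 + [FlowSample(1500, 4096)] * 100
correct = estimate_per_sample(samples)
wrong = estimate_hardcoded(samples, 1000)
print(f"hardcoded estimate is {wrong / correct:.0%} of the per-sample estimate")
```

With these numbers the hardcoded estimate comes out at well under half of the per-sample estimate, even though half of the samples were scaled correctly.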

Tune UDP buffers if loss is compounding

If the normalization is correct but totals are still low, UDP loss may be dropping samples before they reach the analytics layer. On many Linux distributions the default net.core.rmem_max is well under 1 MB (212992 bytes is common), which is inadequate for high-pps sFlow collectors.

Production deployments should target 16 MB or higher, with explicit SO_RCVBUF set on the listener socket.

# WARNING: These commands change system state and affect ALL UDP sockets on the host.
# Verify no other UDP consumers will be impacted before applying.
# These changes are non-persistent and will be lost on reboot without sysctl.conf or systemd-sysctl entries.
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.rmem_default=16777216
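Raising the sysctl limit only lifts the ceiling; the listener still has to ask for the larger buffer. A minimal sketch of doing that from Python follows. The function name `open_sflow_listener` and its defaults are assumptions, and your collector may expose this as a configuration option instead.

```python
import socket

def open_sflow_listener(host="0.0.0.0", port=6343, rcvbuf=16 * 1024 * 1024):
    """Bind a UDP socket and report the receive buffer the kernel granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request the buffer before binding so it is in place for the first packet.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.bind((host, port))
    return sock, granted
```

On Linux the value read back is normally double the request, and the request is silently capped at net.core.rmem_max. A readback far below twice the requested size therefore suggests the sysctl limit is still too low.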

Prevention

Always cross-validate sFlow against SNMP. Build a periodic check that compares sFlow-derived throughput against ifHCInOctets for critical interfaces. If the ratio drifts from the expected sampling factor, something has changed: the sampling rate, the collector normalization, or the UDP transport.

Track sampling rate changes over time. Log the sampling_rate from each exporter and alert on changes. If a device silently switches from 1:1000 to 1:4096, every downstream chart that still applies the old rate will shift by roughly a factor of 4 without any error message.
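Detecting a change only requires remembering the last rate seen per exporter and data source. The class below is a sketch with hypothetical names; how the change is logged or alerted on is left to the surrounding pipeline.

```python
class RateTracker:
    """Remember the last sampling_rate seen per (agent, data source)."""

    def __init__(self):
        self._last = {}

    def observe(self, agent, source_id, sampling_rate):
        """Return (old, new) when the rate changes, otherwise None."""
        key = (agent, source_id)
        old = self._last.get(key)
        self._last[key] = sampling_rate
        if old is not None and old != sampling_rate:
            return old, sampling_rate
        return None
```

Calling `observe` for every decoded flow sample and logging any non-None result gives a timeline of rate changes that can be matched against change tickets.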

Monitor hardware sample drops. The drops field in flow_sample and the ratio of sample_pool to received samples reveal when the device cannot keep up with its configured rate. This produces a systematic undercount that multiplication alone cannot fix.
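One way to quantify that shortfall is to compare the growth in sample_pool with the number of samples that actually arrived. The sketch below assumes sample_pool is a 32-bit counter that may wrap once within the window; the function name `sampling_shortfall` is hypothetical.

```python
def sampling_shortfall(pool_start, pool_end, samples_received, nominal_rate):
    """Return (effective_rate, shortfall) for one data source over a window.

    effective_rate: packets observed per sample actually received.
    shortfall: fraction of the expected samples that never arrived.
    """
    observed = (pool_end - pool_start) % 2 ** 32  # tolerate one counter wrap
    expected_samples = observed / nominal_rate
    effective_rate = observed / samples_received
    shortfall = 1 - samples_received / expected_samples
    return effective_rate, shortfall
```

With a pool increase of 1,000,000 packets, a nominal rate of 1:1000, and 800 samples received, the effective rate is 1:1250 and the shortfall is 20%. Because sampling is random, small deviations in either direction are normal; a persistent positive shortfall is the signal to act on.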

Validate vendor defaults. Default sampling rates vary by platform and vendor, with common values ranging from 1:1000 to 1:16384 or higher. A device inherited from another team may have a sampling rate that differs from what your analytics assume. Always verify the configured rate on the device, not just in the collector.

How Netdata helps

Netdata collects both SNMP interface metrics (ifHCInOctets, ifHCOutOctets) and system-level UDP statistics (Udp_RcvbufErrors) alongside flow data, enabling direct cross-validation between sFlow-derived and SNMP-derived throughput on the same dashboard. Per-core CPU metrics and softirq percentages help identify RSS misconfiguration that funnels sFlow processing to a single core, causing buffer drops that silently reduce sample counts. UDP socket buffer drop trends correlate with flow data quality degradation, providing an early warning before analytics drift becomes visible in charts.