sFlow sampling rate: why your traffic totals are off by 1000x
Your sFlow-derived bandwidth charts show a 10G link carrying 12 Mbps. SNMP counters on the same interface show 8.4 Gbps. The switch is not broken and the collector is not dropping packets. The analytics pipeline is summing raw sampled bytes without multiplying by the sampling rate.
sFlow is not NetFlow. It does not maintain a flow cache on the device, aggregate bytes per conversation, or export summary totals. Instead, sFlow exports individual packet samples, batched into UDP datagrams, each sample carrying the packet’s header data and metadata about the sampling process. The collector is responsible for turning those samples into traffic estimates through multiplication. When that multiplication is missing, every chart, alert, capacity plan, and billing report built on the data is wrong by the sampling factor.
On a typical 1:1000 sampling configuration, the numbers are off by 1000x. A link carrying 8 Gbps appears to carry 8 Mbps. A DDoS attack at 5 Gbps looks like ordinary background traffic. Capacity decisions get made on data that is three orders of magnitude below reality.
What this means
sFlow is a packet-sampling protocol, not a flow-caching protocol. The distinction determines where the math happens.
In NetFlow v5, v9, or IPFIX, the device maintains a flow cache. It aggregates bytes and packets per 5-tuple conversation. When a flow expires or the cache fills, the device exports an accumulated record with the total byte and packet counts. The collector receives numbers that already represent observed traffic volume.
sFlow works differently. The device samples packets at a configured rate (on average 1 in N), records each sampled packet’s header and metadata as a flow sample, and sends the samples to the collector in UDP datagrams, usually several samples per datagram. There is no flow cache on the agent. Each flow sample contains:
- frame_length: the original packet length on the wire. Whether this includes the FCS depends on the implementation and platform.
- sampling_rate: the integer N used for this sample (a value of 1000 means that, on average, 1 in 1000 packets was sampled)
- Additional metadata: interface index, source and destination information, header protocol data
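The collector receives raw samples. To estimate actual traffic volume, it must scale each sample by its sampling rate: estimated_bytes = SUM(frame_length * sampling_rate), which reduces to SUM(frame_length) * sampling_rate when the rate is constant.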
Without this multiplication, you are reporting the volume of sampled packets as if it were the total. On a 1:1000 link, you report 0.1% of reality.
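The normalization can be sketched in a few lines of Python. This is a minimal illustration, not any particular collector’s API: `FlowSample`, `raw_bps`, and `estimated_bps` are hypothetical names, and the inputs reuse the numbers from the worked example in the diagnosis section (50 samples of 1500-byte packets at 1:64 over 60 seconds).

```python
from dataclasses import dataclass

# Hypothetical names for illustration; a sketch, not a collector API.


@dataclass
class FlowSample:
    frame_length: int   # original packet length, in bytes
    sampling_rate: int  # N: on average 1 in N packets was sampled


def raw_bps(samples, interval_s):
    """What an un-normalized pipeline reports: only the sampled bytes."""
    return sum(s.frame_length for s in samples) * 8 / interval_s


def estimated_bps(samples, interval_s):
    """Scale each sample by its own sampling_rate, then convert bytes to bits per second."""
    return sum(s.frame_length * s.sampling_rate for s in samples) * 8 / interval_s


samples = [FlowSample(frame_length=1500, sampling_rate=64) for _ in range(50)]
print(raw_bps(samples, 60))        # 10000.0
print(estimated_bps(samples, 60))  # 640000.0
```

The gap between the two numbers is exactly the sampling rate, which is the signature of missing normalization.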
```mermaid
flowchart TD
A["Device samples
1 in N packets"] --> B["sFlow datagram
frame_length + sampling_rate"]
B --> C["UDP to collector
port 6343"]
C --> D["Collector receives
raw samples"]
D --> E{"Multiply by
sampling_rate?"}
E -->|Yes| F["Correct traffic
estimate"]
E -->|No| G["Off by factor N
often 1000x"]
```

A critical subtlety: in sFlow v5, sampling_rate is embedded per-sample. It can change between samples from the same device. Some platforms adjust the rate dynamically based on interface speed or traffic load. Collectors that hardcode a single rate will produce wrong totals whenever the device adjusts.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Normalization skipped entirely | All traffic uniformly low by the same factor across all exporters | Whether the collector or analytics layer applies sampling_rate multiplication |
| Wrong base metric used | Consistent undercount of at least 14 bytes per packet (Ethernet header; more with VLAN tags or a counted FCS), most visible on small packets | Verify the pipeline uses frame_length, not IP payload length |
| Variable rate not handled per-sample | Some exporters correct, others off by varying factors over time | Inspect whether sampling_rate changes between samples from the same source |
| Hardware sample drops | Off by a variable factor despite correct nominal rate | Check the drops field in flow_sample structures and compare sample_pool against received count |
| UDP transport loss | Intermittent undercounting, worse during traffic bursts | Check Udp_RcvbufErrors and compare collector inbound rate against device export counters |
Quick checks
These commands are read-only. None modify device or collector state.
```bash
# Verify sFlow datagrams are arriving at the collector
tcpdump -i eth0 -nn -c 1000 'udp port 6343'

# Check for UDP socket buffer drops (silent sample loss); RcvbufErrors is host-wide
grep '^Udp:' /proc/net/snmp

# Check current UDP receive buffer sizing on the sFlow listener
ss -lunm '( sport = :6343 )'

# Compare SNMP-derived interface utilization against sFlow-derived values
# ifHCInOctets and ifHCOutOctets give ground truth for bytes on the wire
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.6
snmpwalk -v2c -c <community> <device> .1.3.6.1.2.1.31.1.1.1.10

# Check device-side flow export statistics (Cisco)
# Verify this OID subtree for your platform before relying on it; it may require a table index
snmpwalk -v2c -c <community> <device> .1.3.6.1.4.1.9.9.387.1.4
```
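The `Udp:` check prints two lines, a header row and a value row, and what matters is how fast `RcvbufErrors` grows. A small parser makes that easy to sample over time. This is a sketch with hypothetical function names; the counter is host-wide, so it covers every UDP socket, not just the sFlow listener.

```python
# Hypothetical helpers: a sketch for sampling /proc/net/snmp on Linux.


def parse_udp_counters(text):
    """Turn the 'Udp:' header and value rows of /proc/net/snmp into {counter_name: int}."""
    # 'UdpLite:' rows do not match because the prefix must be exactly 'Udp:'.
    rows = [line.split()[1:] for line in text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("expected a 'Udp:' header row followed by a value row")
    header, values = rows[0], rows[1]
    return dict(zip(header, (int(v) for v in values)))


def read_udp_counters(path="/proc/net/snmp"):
    with open(path) as f:
        return parse_udp_counters(f.read())


def rcvbuf_errors_delta(before, after):
    """Datagrams the kernel dropped for lack of socket buffer space between two readings."""
    return after["RcvbufErrors"] - before["RcvbufErrors"]
```

Calling `read_udp_counters()` twice, a minute apart, and alerting on any positive `rcvbuf_errors_delta` matches the "any nonzero increment" warning sign in the metrics table.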
How to diagnose it
Cross-validate against SNMP counters. Poll `ifHCInOctets` and `ifHCOutOctets` on the same interface the sFlow data covers and compute utilization from SNMP. If SNMP shows 8 Gbps and sFlow analytics show 8 Mbps, the ratio tells you the sampling factor immediately.

Calculate the expected factor. Divide SNMP-derived bytes by sFlow-derived bytes for the same time window and interface. If the ratio is close to the configured sampling rate (1000, 4096, 8192, etc.), normalization is missing. If the ratio is near the sampling rate but consistently higher, some samples are being lost: investigate hardware sample drops or UDP loss.
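The two steps above can be sketched as follows, assuming you already have two `ifHCInOctets` readings for the interface and an sFlow-derived throughput for the same window. `snmp_bps` and `classify_ratio` are hypothetical helpers, and the candidate rate list and 10% tolerance are illustrative assumptions to tune for your fleet.

```python
# Hypothetical helpers; candidate rates and tolerance are assumptions, not standards.
COUNTER64 = 2 ** 64
CANDIDATE_RATES = (256, 512, 1000, 1024, 2048, 4096, 8192, 16384)


def snmp_bps(octets_start, octets_end, interval_s):
    """Bits per second from two ifHCInOctets readings; the modulo absorbs a single 64-bit wrap."""
    return ((octets_end - octets_start) % COUNTER64) * 8 / interval_s


def classify_ratio(snmp_bits, sflow_bits, rates=CANDIDATE_RATES, tolerance=0.10):
    """Compare SNMP ground truth with sFlow-derived throughput for the same interface and window."""
    if sflow_bits <= 0:
        return None, "no sFlow-derived traffic in this window"
    ratio = snmp_bits / sflow_bits
    if abs(ratio - 1.0) <= tolerance:
        return ratio, "consistent: normalization appears to be applied"
    # Pick the candidate rate closest to the observed ratio in relative terms.
    nearest = min(rates, key=lambda r: abs(ratio - r) / r)
    if abs(ratio - nearest) / nearest <= tolerance:
        return ratio, f"normalization likely missing (ratio close to 1:{nearest})"
    return ratio, "unexplained gap: check sample drops, UDP loss, or rate changes"
```

With SNMP at 8.4 Gbps and sFlow at 8.4 Mbps, `classify_ratio` returns a ratio of about 1000 and flags missing normalization at 1:1000.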
Inspect raw sFlow samples. Decode a stream of sFlow datagrams and check two fields per sample: `frame_length` and `sampling_rate`. Confirm that `sampling_rate` matches what you expect. Confirm that `frame_length` reflects the full Layer 2 frame length, not just the IP payload.

Check for per-sample rate variation. If the device uses adaptive sampling, different samples may carry different `sampling_rate` values. A collector that applies a single hardcoded rate will miscalculate for any sample whose actual rate differs.

Verify the normalization formula with a worked example. 50 sFlow samples of 1500-byte packets at a 1:64 sampling rate over 60 seconds should yield `50 * 1500 * 64 * 8 / 60 = 640,000 bps`. If your analytics report `50 * 1500 * 8 / 60 = 10,000 bps`, the sampling rate multiplication is missing.

Check for hardware sample loss. The `sample_pool` field counts the total packets eligible for sampling. Divide its growth over an interval by `sampling_rate` to get the number of samples the device should have produced, and compare that against the number actually received. A large gap means fewer samples are arriving than the nominal rate implies; on the device side this is often due to hardware rate limiting. The `drops` field in `flow_sample` counts samples the agent had to discard for lack of resources.

Check for UDP transport loss. Compare the device’s export counter against the collector’s receive counter. If the device exported 50,000 datagrams and the collector received 35,000, the 15,000 gap is silent loss that compounds the sampling error. Check `Udp_RcvbufErrors` in `/proc/net/snmp` for kernel-level confirmation.
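The export-versus-receive comparison needs counters from the device. The collector can also estimate transport loss on its own, because every sFlow v5 datagram header carries a sequence number that the sub-agent increments per datagram. The sketch below counts gaps per `(agent, sub_agent)`; the class name is hypothetical, and it treats any backwards step (agent restart, 32-bit wrap, duplicate, or reordering) as a reset rather than trying to tell them apart, so its output is an approximation.

```python
# Hypothetical class: approximate loss estimate from datagram sequence numbers.


class DatagramLossTracker:
    """Estimate UDP datagram loss from gaps in sFlow v5 datagram sequence numbers."""

    def __init__(self):
        self.last_seq = {}  # (agent, sub_agent) -> last sequence number seen
        self.received = 0
        self.missing = 0
        self.resets = 0

    def observe(self, agent, sub_agent, seq):
        key = (agent, sub_agent)
        prev = self.last_seq.get(key)
        self.received += 1
        if prev is not None:
            if seq > prev:
                self.missing += seq - prev - 1  # datagrams skipped since the previous one
            else:
                self.resets += 1  # restart, wrap, duplicate, or reorder: start counting afresh
        self.last_seq[key] = seq

    def loss_ratio(self):
        total = self.received + self.missing
        return self.missing / total if total else 0.0
```

For the example above (35,000 datagrams received, 15,000 missing), `loss_ratio()` would report 0.3.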
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Flow sampling rate consistency | Detects when the rate reported by the exporter differs from what the collector applies | Any mismatch between reported and applied rate |
| sFlow-derived vs SNMP-derived utilization | Ground truth comparison; SNMP counters reflect actual wire traffic | sFlow total below 50% of SNMP total for the same interface and window |
| UDP socket buffer drops (Udp_RcvbufErrors) | Silent sample loss at the kernel level; compounds sampling error | Any nonzero increment on a flow collector |
| Device-side flow exporter drops | Packets generated but not exported by the device | Drop rate exceeding 0.1% of total exported |
| Per-exporter sampling rate over time | Detects adaptive rate changes that collectors must handle | Rate changing without a corresponding change ticket |
Fixes
Apply normalization in the ingestion pipeline
The fix depends on where the gap is. If the collector library does not apply sampling rate multiplication, add it as a post-ingestion step: for each sample, multiply frame_length by sampling_rate before aggregating. For a constant rate, the formula is estimated_bytes = SUM(frame_length) * sampling_rate. For per-sample rates, use estimated_bytes = SUM(frame_length * sampling_rate).
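A post-ingestion normalization step might look like the following sketch. It assumes each decoded sample is a dict with `frame_length`, `sampling_rate`, and some flow key fields; the `src`/`dst` key names and the `aggregate` function are hypothetical. Scaling happens per sample, before aggregation, so the result stays correct when rates vary.

```python
from collections import defaultdict

# Hypothetical function and field names; a sketch of per-sample scaling before aggregation.


def aggregate(samples, key=lambda s: (s["src"], s["dst"])):
    """Estimated bytes and packets per key, scaling every sample by its own sampling_rate."""
    totals = defaultdict(lambda: [0, 0])  # key -> [estimated_bytes, estimated_packets]
    for s in samples:
        entry = totals[key(s)]
        entry[0] += s["frame_length"] * s["sampling_rate"]
        entry[1] += s["sampling_rate"]  # each sample stands for N packets
    return {k: tuple(v) for k, v in totals.items()}
```

Because the key is a parameter, the same function covers per-interface, per-conversation, or per-exporter rollups.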
Use frame_length, not IP payload length
Some pipelines use the Layer 3 total length field from the decoded packet header as the base for multiplication. This omits Layer 2 overhead: Ethernet header (14 bytes), VLAN tags (4 bytes each), and FCS (4 bytes). The sFlow v5 spec defines frame_length as the original packet length on the wire. Use that field as the base for all byte calculations.
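The size of the Layer 3 undercount depends heavily on packet size, which is why it looks negligible on bulk transfers and large on small-packet traffic. A quick sketch, assuming Ethernet with optional 802.1Q tags, an FCS that may or may not be counted, and no padding for frames below the 64-byte minimum; the function names are hypothetical.

```python
# Hypothetical helpers; minimum-frame padding is deliberately ignored.
ETH_HEADER = 14  # destination MAC, source MAC, EtherType
VLAN_TAG = 4     # per 802.1Q tag
FCS = 4          # frame check sequence


def frame_length_from_ip(ip_total_length, vlan_tags=0, include_fcs=True):
    """Approximate wire frame length for an IP packet of the given total length."""
    return ip_total_length + ETH_HEADER + VLAN_TAG * vlan_tags + (FCS if include_fcs else 0)


def undercount_pct(ip_total_length, vlan_tags=0, include_fcs=True):
    """Share of wire bytes lost by using the IP total length instead of frame_length."""
    frame = frame_length_from_ip(ip_total_length, vlan_tags, include_fcs)
    return 100 * (frame - ip_total_length) / frame


for size in (46, 576, 1500):
    print(size, round(undercount_pct(size), 2))
```

A minimum-size frame loses over a quarter of its bytes this way, while a full 1500-byte packet loses a little over 1%.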
Handle variable sampling rates per-sample
Do not assume a single sampling rate per exporter. Read the sampling_rate field from each flow_sample structure and apply it individually. If your collector or analytics platform only supports a global rate, it will produce wrong totals whenever a device adjusts its sampling rate. This is particularly important on platforms that use adaptive sampling keyed to interface speed.
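To see why a global rate fails, consider a hypothetical window in which an exporter moves from 1:1000 to 1:4096 partway through. The sketch compares a per-sample calculation with one that hardcodes the old rate; the function names, sample counts, and sizes are invented for illustration.

```python
# Hypothetical helpers and invented numbers, for illustration only.


def per_sample_bytes(samples):
    """samples: iterable of (frame_length, sampling_rate) pairs."""
    return sum(length * rate for length, rate in samples)


def hardcoded_bytes(samples, assumed_rate):
    """What a collector configured with one global rate computes."""
    return sum(length for length, _ in samples) * assumed_rate


# 100 samples at 1:1000, then 25 samples at 1:4096, all 1000-byte frames.
window = [(1000, 1000)] * 100 + [(1000, 4096)] * 25
correct = per_sample_bytes(window)     # 202,400,000 bytes
wrong = hardcoded_bytes(window, 1000)  # 125,000,000 bytes
print(f"hardcoded rate undercounts by {100 * (1 - wrong / correct):.1f}%")
```

The error grows with the share of samples taken at the new rate, so it can creep in gradually rather than as a clean step.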
Tune UDP buffers if loss is compounding
If normalization is correct but totals are still low, UDP loss may be dropping samples before they reach the analytics layer. The Linux default for net.core.rmem_max is well under 1 MB on most distributions (212992 bytes is a common value), which is inadequate for high-pps sFlow collectors.
Production deployments should target 16 MB or higher, with SO_RCVBUF set explicitly on the listener socket.
```bash
# WARNING: These commands change system state and affect ALL UDP sockets on the host.
# Verify no other UDP consumers will be impacted before applying.
# These changes are non-persistent and will be lost on reboot without sysctl.conf or systemd-sysctl entries.
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.rmem_default=16777216
```
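On the collector side, the listener should request its buffer explicitly rather than relying on `rmem_default`. A minimal Python sketch with a hypothetical function name: on Linux the kernel caps the request at `net.core.rmem_max` and reports back double the granted size, so reading the value back is a cheap way to confirm the sysctl change took effect. The doubling check is Linux-specific.

```python
import socket

# Hypothetical helper; the "2 * rcvbuf" check assumes Linux SO_RCVBUF semantics.


def open_sflow_listener(host="0.0.0.0", port=6343, rcvbuf=16 * 1024 * 1024):
    """Bind a UDP listener with an explicit receive buffer; return (socket, reported_bytes)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request the buffer before bind so it is in place before the first datagram arrives.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    reported = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # Linux reports twice the granted size; less than 2 * rcvbuf means rmem_max capped it.
    if reported < 2 * rcvbuf:
        print(f"warning: receive buffer capped at {reported // 2} bytes; check net.core.rmem_max")
    return sock, reported
```

Logging the warning at startup turns a silently capped buffer into something visible before it shows up as RcvbufErrors.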
Prevention
Always cross-validate sFlow against SNMP. Build a periodic check that compares normalized sFlow-derived throughput against ifHCInOctets for critical interfaces. The two should agree closely; if their ratio drifts (or, for un-normalized data, drifts away from the expected sampling factor), something has changed: the sampling rate, the collector normalization, or the UDP transport.
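A periodic check can be as simple as a rolling window of ratios. One way to do it, sketched with hypothetical names: keep the last few `sflow_bps / snmp_bps` ratios for an interface and alert when their median leaves a band. The lower bound mirrors the 50% warning sign in the metrics table; the upper bound, meant to catch double scaling, is an assumption.

```python
from collections import deque
from statistics import median

# Hypothetical class; window size and band limits are assumptions to tune.


class InterfaceCrossCheck:
    """Rolling comparison of normalized sFlow throughput against SNMP for one interface."""

    def __init__(self, window=12, low=0.5, high=1.5):
        self.ratios = deque(maxlen=window)
        self.low, self.high = low, high

    def add(self, snmp_bps, sflow_bps):
        """Record one polling interval; return an alert string or None."""
        if snmp_bps <= 0:
            return None  # idle interface: nothing to compare
        self.ratios.append(sflow_bps / snmp_bps)
        m = median(self.ratios)
        if m < self.low:
            return f"sFlow at {m:.1%} of SNMP: check normalization, sample drops, UDP loss"
        if m > self.high:
            return f"sFlow at {m:.1%} of SNMP: possible double scaling or wrong rate"
        return None
```

Using the median rather than the latest value keeps a single noisy interval from paging anyone.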
Track sampling rate changes over time. Log the sampling_rate from each exporter and alert on changes. If a device silently switches from 1:1000 to 1:4096 and the collector still applies 1000, every downstream chart drops by a factor of about 4 without any error message.
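Logging rate changes only needs the last rate seen per data source. A minimal sketch with hypothetical names, keyed on `(agent, source)` so that different interfaces on the same device can carry different rates; the input is assumed to come from a decoder that exposes each flow sample’s `sampling_rate`.

```python
# Hypothetical class; feed it from whatever decoder your collector uses.


class SamplingRateLog:
    """Remember the last sampling_rate per (agent, source) and report changes."""

    def __init__(self):
        self.current = {}
        self.changes = []  # (timestamp, agent, source, old_rate, new_rate)

    def observe(self, ts, agent, source, rate):
        key = (agent, source)
        old = self.current.get(key)
        self.current[key] = rate
        if old is not None and old != rate:
            change = (ts, agent, source, old, rate)
            self.changes.append(change)
            return change  # hand this to the alerting system
        return None
```

Comparing each returned change against the change-ticket log gives the "rate changing without a corresponding change ticket" signal from the metrics table.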
Monitor hardware sample drops. The drops field in flow_sample and the ratio of sample_pool growth to received samples reveal when the device cannot keep up with its configured rate. This produces a systematic undercount that multiplying by the nominal sampling_rate alone cannot fix.
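The sample_pool comparison can be turned into numbers. One approach, sketched below with hypothetical names, divides the growth of `sample_pool` for a data source by the number of flow samples actually received from it in the same interval. The result is an effective sampling rate; scaling by it instead of the nominal `sampling_rate` compensates for missing samples, but only under the assumption that the lost samples are a random subset of the traffic. `sample_pool` is a 32-bit counter, so the delta is taken modulo 2^32.

```python
# Hypothetical helpers; assumes lost samples are a random subset of traffic.
COUNTER32 = 2 ** 32


def pool_delta(pool_start, pool_end):
    """Packets eligible for sampling between two readings; tolerates one 32-bit wrap."""
    return (pool_end - pool_start) % COUNTER32


def expected_samples(pool_start, pool_end, sampling_rate):
    """How many samples the nominal rate implies for this interval."""
    return pool_delta(pool_start, pool_end) / sampling_rate


def effective_rate(pool_start, pool_end, samples_received):
    """Eligible packets per sample actually received; None if nothing arrived."""
    if samples_received == 0:
        return None
    return pool_delta(pool_start, pool_end) / samples_received


# Hypothetical interval: nominal 1:1000, pool grew by 10,000,000, only 8,000 samples arrived.
print(expected_samples(0, 10_000_000, 1000))  # 10000.0
print(effective_rate(0, 10_000_000, 8000))    # 1250.0
```

An effective rate persistently above the nominal one is the quantitative form of "the device cannot keep up".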
Validate vendor defaults. Default sampling rates vary by platform and vendor, with common values ranging from 1:1000 to 1:16384 or higher. A device inherited from another team may have a sampling rate that differs from what your analytics assume. Always verify the configured rate on the device, not just in the collector.
How Netdata helps
Netdata collects both SNMP interface metrics (ifHCInOctets, ifHCOutOctets) and system-level UDP statistics (Udp_RcvbufErrors) alongside flow data, enabling direct cross-validation between sFlow-derived and SNMP-derived throughput on the same dashboard. Per-core CPU metrics and softirq percentages help identify RSS misconfiguration that funnels sFlow processing to a single core, causing buffer drops that silently reduce sample counts. UDP socket buffer drop trends correlate with flow data quality degradation, providing an early warning before analytics drift becomes visible in charts.







