Flow export-to-ingest latency: why your NetFlow data is minutes behind
Flow export-to-ingest latency accumulates across a pipeline: the exporter’s active timeout, the UDP transport path, the kernel socket buffer, the collector’s parser, and the storage write queue. Each stage can add seconds or minutes, and each has a different fix.
The most common cause is the active timeout default on most network devices: 30 minutes. Long-lived flows (VPN tunnels, database connections, bulk transfers) are not exported until the timer expires. The collector is not slow and the network is not congested. The device is behaving as configured. But if you need near-real-time visibility, a 30-minute export delay is indistinguishable from broken telemetry.
What this means
Flow export-to-ingest latency is the time between when a flow record leaves the exporter and when the collector ingests it. On NetFlow v5, NetFlow v9, and IPFIX, the export packet header carries the exporter’s wall-clock time, so you can compute this directly from timestamps embedded in the export stream. sFlow datagrams carry only the agent’s uptime, not a wall-clock timestamp, so there you must infer latency by comparing sample arrival against wall clock time or SNMP-derived traffic patterns.
Sustained latency above 30 seconds means something in the pipeline is bottlenecked. Above 5 minutes, the data is too stale for real-time detection use cases such as DDoS visibility, anomaly detection, or operational troubleshooting. The causes fall into two categories: configuration-driven latency (active timeout too long, template refresh misaligned) and resource-driven latency (collector backlog, UDP buffer drops, parser saturation).
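The thresholds above can be turned into a small check. This is a minimal sketch, assuming you already have the exporter-side timestamp and the collector ingest time as epoch seconds; the function name classify_latency and the verdict labels are illustrative, not part of any collector.

```shell
# classify_latency EXPORT_EPOCH INGEST_EPOCH
# Prints "<latency_seconds> <verdict>" using the 30 s and 5 min thresholds.
classify_latency() {
    export_ts=$1    # exporter-side timestamp (epoch seconds)
    ingest_ts=$2    # collector ingest time (epoch seconds)
    latency=$((ingest_ts - export_ts))
    if [ "$latency" -lt 0 ]; then
        verdict="clock-skew"    # negative latency is impossible; suspect NTP
    elif [ "$latency" -le 30 ]; then
        verdict="ok"
    elif [ "$latency" -le 300 ]; then
        verdict="bottleneck"    # above 30 s: something in the pipeline is slow
    else
        verdict="stale"         # above 5 min: too old for real-time detection
    fi
    echo "$latency $verdict"
}

# Example: classify_latency 1700000000 "$(date +%s)"
```

A single sample is not a verdict on the pipeline; the text refers to sustained latency, so apply the check to a rolling window of records.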
flowchart LR
    A["Device flow cache<br/>active timeout: 30 min default"] --> B["Export buffer"]
    B -->|"UDP datagram"| C["Kernel socket buffer<br/>rmem_max varies by distro"]
    C -->|"drain"| D["Parser / aggregator"]
    D --> E["TSDB write queue"]
    E --> F["Stored and queryable"]
    B -.->|"overflow = silent drops"| G["cnfESPktsDropped"]
    C -.->|"overflow = silent drops"| H["Udp_RcvbufErrors"]
    D -.->|"backlog = growing latency"| I["Queue depth rising"]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Active timeout too long (30 min default) | Long-lived flows arrive in batches every 30 minutes; short flows arrive fine | Check the exporter’s active timeout configuration |
| Collector UDP buffer overflow | Traffic charts show declining volume while SNMP interface counters show traffic is normal or rising | Run grep '^Udp:' /proc/net/snmp and check RcvbufErrors |
| Collector parser or write backlog | Flow latency grows steadily; queue depth metric rising | Check collector intake and write queue depth |
| Device-side export buffer drops | Device exports fewer records than expected; gap between device and collector counts | Cisco: snmpget .1.3.6.1.4.1.9.9.387.1.4.6 (cnfESPktsDropped) |
| Template desync (NetFlow v9 / IPFIX) | Datagrams arriving but decoded records are zero or anomalously low | Check collector logs for “template not found” or cache miss |
| Clock skew on exporter | Latency values are impossible (negative or wildly inconsistent) | Check NTP offset on the exporter |
Quick checks
# Kernel UDP socket buffer drops on the collector
cat /proc/net/snmp | grep '^Udp:'
# RcvbufErrors column: any nonzero value is silent data loss
# Current socket buffer fill for the flow listener
ss -lunm 'sport = :2055'
# 2055 is a common NetFlow v5/v9 port, not a standard; IPFIX is assigned 4739 and sFlow 6343
# Current rmem_max setting
sysctl net.core.rmem_max
# Inspect flow record timestamps for latency (nfdump)
nfdump -R /var/nfdump/ -o fmt:'%ts %te %fl' | head -50
# Compare flow end time against current wall clock
# Device-side export drops (Cisco IOS-XE)
snmpget -v2c -c <community> <device> .1.3.6.1.4.1.9.9.387.1.4.6.0
# cnfESPktsDropped: should be near zero relative to exported count
# Note: these Cisco OIDs vary by platform, IOS version, and MIB support; verify them against your specific device
# Device-side total exported (Cisco IOS-XE)
snmpget -v2c -c <community> <device> .1.3.6.1.4.1.9.9.387.1.4.4.0
# cnfESPktsExported: compare rate against collector inbound rate
# Collector CPU per core (look for single-core saturation from RSS)
mpstat -P ALL 1 5
# Watch %soft (softirq) and %sys columns
# Collector write queue depth (vendor-specific stats endpoint)
curl -s http://localhost:<stats-port>/metrics | grep -E 'write|queue'
# NTP offset on the exporter (if SNMP accessible)
snmpget -v2c -c <community> <device> .1.3.6.1.2.1.25.1.2.0
# Returns device-local time (hrSystemDate); compare against collector time
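Because RcvbufErrors is a cumulative counter, the increment over an interval is what matters, not the absolute value. The following is a sketch that looks the column up by name rather than by position, since the column order in /proc/net/snmp can differ between kernel versions. The helper names udp_counter and counter_delta are made up for this example.

```shell
# udp_counter COLUMN [FILE]
# Prints the value of a named column from the "Udp:" lines of /proc/net/snmp.
udp_counter() {
    awk -v col="$1" '
        # First "Udp:" line is the header: remember where the column sits.
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) if ($i == col) idx = i; seen = 1; next }
        # Second "Udp:" line holds the values.
        $1 == "Udp:" && idx   { print $idx; exit }
    ' "${2:-/proc/net/snmp}"
}

# counter_delta BEFORE AFTER
# Prints how much a cumulative counter grew between two samples.
counter_delta() {
    echo $(( $2 - $1 ))
}

# Example (on the collector):
#   before=$(udp_counter RcvbufErrors); sleep 10; after=$(udp_counter RcvbufErrors)
#   echo "datagrams dropped in 10 s: $(counter_delta "$before" "$after")"
```

The counter is system-wide, so a nonzero delta may come from another UDP listener on the same host; the per-socket view from ss narrows it down.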
How to diagnose it
Determine whether the latency is configuration-driven or resource-driven. If all flows from a single exporter are uniformly delayed (long-lived flows arriving at 30-minute intervals), the active timeout is the cause. If latency varies and correlates with traffic volume, the bottleneck is in the collector pipeline.
Check the active timeout on the exporter. On Cisco IOS-XE, inspect the flow monitor configuration. The default is typically 30 minutes. If you need near-real-time visibility, set it to 60 seconds. The tradeoff is more export packets and higher collector load. Juniper and other vendors have equivalent settings under flow sampling or monitoring configuration.
Check for UDP socket buffer drops. Run cat /proc/net/snmp | grep '^Udp:' and look at the RcvbufErrors column. Any nonzero increment means the kernel received datagrams but could not deliver them to the application because the socket buffer was full. This is silent data loss, not just latency. The dropped records never arrive.
Compare device-side export counts against collector inbound counts. Poll cnfESPktsExported on the device and compare the rate against what the collector reports receiving. A gap means records are being lost in transit through the UDP buffer, NIC ring, or network path.
Check collector parser and write queue depth. If the collector exposes a queue depth metric, watch it over time. A growing queue means the parser or storage layer cannot keep up with the incoming rate. The backlog adds latency to every record in the queue.
Verify template state if using NetFlow v9 or IPFIX. After a device reboot or configuration change, templates must be re-received before data records can be decoded. Check collector logs for template-related messages. The blind window can be 5 to 30 minutes depending on the template refresh interval.
Check for clock skew. If the exporter’s clock is off, the computed latency will be wrong. Verify NTP synchronization on the exporter. Even 200ms of drift can break cross-device correlation in postmortem analysis.
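The export-versus-inbound comparison in the steps above comes down to one ratio. As a rough sketch, assuming you have sampled both counters over the same interval and already computed their deltas, the loss percentage can be calculated like this. The function names are illustrative, and the 0.1% default is only the warning level used in this article.

```shell
# export_loss_pct EXPORTED_DELTA RECEIVED_DELTA
# Prints the share of exported packets that never reached the collector.
export_loss_pct() {
    awk -v sent="$1" -v rcvd="$2" 'BEGIN {
        if (sent <= 0) { print "n/a"; exit }   # nothing exported in the window
        printf "%.2f\n", (sent - rcvd) * 100 / sent
    }'
}

# loss_alert EXPORTED_DELTA RECEIVED_DELTA [MAX_PCT]
# Exit status 0 when loss is above MAX_PCT (default 0.1), 1 otherwise.
loss_alert() {
    awk -v sent="$1" -v rcvd="$2" -v max="${3:-0.1}" 'BEGIN {
        exit !((sent > 0) && ((sent - rcvd) * 100 / sent > max))
    }'
}

# Example: the device exported 100000 packets, the collector saw 99500.
#   export_loss_pct 100000 99500
#   loss_alert 100000 99500 && echo "loss above threshold"
```

Both inputs must count the same unit (export packets, not flow records) and cover the same time window, otherwise the ratio is meaningless.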
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Flow export-to-ingest latency | Direct measure of pipeline delay | Sustained above 30 sec; above 5 min means data is stale |
| UDP socket buffer drops (Udp_RcvbufErrors) | Silent data loss at the kernel layer | Any nonzero increment |
| Flow packets received rate | Confirms exporter is sending and collector is receiving | Sudden drop from one exporter means exporter or path issue |
| Flow exporter drops (cnfESPktsDropped) | Device-side overflow before records leave the device | Rising rate during traffic bursts |
| Collector inbound vs device exported rate | End-to-end loss detection | Gap above 0.1% of exported rate |
| Collector CPU per core | Single-core saturation from RSS funneling | One core at 100% while others are idle |
| TSDB write queue depth | Storage-layer backlog | Growing queue without bound |
| NTP offset on exporter | Corrupts latency computation and cross-device correlation | Offset above 100ms |
| Template cache state (v9 / IPFIX) | Records discarded if template is missing | Collector logs showing template cache misses |
Fixes
Active timeout tuning
The active timeout controls how often long-lived flows are exported. The default of 30 minutes means a persistent flow is invisible until either it ends or the timer expires. For real-time use cases, set it to 60 seconds.
The tradeoff is increased export volume. Shorter timeouts generate more flow records and more UDP datagrams. A 60-second active timeout on a busy exporter can increase export packet rate by an order of magnitude compared to 30 minutes. Ensure your collector can handle the additional load before making this change globally.
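To get a feel for the volume increase, a back-of-the-envelope estimate helps. This sketch assumes a continuously active flow is exported once per active-timeout period, which ignores the final export when the flow ends and any cache-pressure exports. The helper name records_per_flow is hypothetical.

```shell
# records_per_flow FLOW_DURATION_S ACTIVE_TIMEOUT_S
# Approximate number of records exported for one long-lived flow:
# ceil(duration / timeout), done with integer arithmetic.
records_per_flow() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: a two-hour VPN tunnel at a 30-minute and a 60-second timeout.
#   records_per_flow 7200 1800
#   records_per_flow 7200 60
```

Short-lived flows are unaffected by the active timeout, so the overall increase depends on how much of the traffic mix is long-lived.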
Cisco ASA note: The flow-export active refresh-interval command was removed in ASA versions 8.5(1) through 9.1(1), causing persistent tunnels to export only at teardown. If you are running an affected version, upgrading to 9.1(2) or later restores the capability.
UDP buffer sizing
If Udp_RcvbufErrors is incrementing, the kernel socket buffer is overflowing. The fix has two parts:
- Raise the system maximum. Run sysctl -w net.core.rmem_max=16777216 to set it to 16 MB. For very high-volume collectors, use 32 MB (33554432 bytes). This is a runtime change; persist it in /etc/sysctl.d/ so it survives reboots.
- Ensure the collector application explicitly sets SO_RCVBUF on its listener socket. Without this, the socket uses rmem_default (typically around 212 KB on most Linux distributions), which is far too small for flow collection regardless of rmem_max.
Also check RSS (Receive Side Scaling) configuration. If all flow traffic is funneled to a single CPU core, that core becomes the bottleneck even though the system has spare capacity. Verify IRQ distribution with cat /proc/interrupts | grep eth0.
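One way to persist the buffer ceiling is a drop-in file under /etc/sysctl.d/. The sketch below only renders the file content and checks a value against a target; the function names and the file name in the usage comment are arbitrary examples, and writing the file requires root.

```shell
# render_rmem_conf [BYTES]
# Prints sysctl drop-in content that raises the SO_RCVBUF ceiling.
render_rmem_conf() {
    bytes=${1:-16777216}    # default: 16 MB
    printf '# Ceiling for SO_RCVBUF requests from the flow collector\n'
    printf 'net.core.rmem_max = %s\n' "$bytes"
}

# rmem_max_sufficient CURRENT_BYTES [WANTED_BYTES]
# Exit status 0 when the current ceiling is at least the wanted size.
rmem_max_sufficient() {
    [ "$1" -ge "${2:-16777216}" ]
}

# Usage (as root):
#   render_rmem_conf 16777216 > /etc/sysctl.d/90-flow-collector.conf
#   sysctl --system
#   rmem_max_sufficient "$(sysctl -n net.core.rmem_max)" || echo "rmem_max too small"
```

Raising rmem_max alone does not enlarge any existing socket; the collector still has to request the larger buffer through SO_RCVBUF when it opens its listener.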
Collector parser backlog
If the write queue is growing, the parser or storage layer is the bottleneck. Options:
- Increase parser thread count if the collector supports it.
- Check whether the parser is doing expensive work per record (regex matching, enrichment lookups) that can be deferred or batched.
- Verify disk IOPS are not saturated. Flow record storage is disk-intensive.
- Separate flow storage from syslog storage on different volumes.
Template refresh alignment
For NetFlow v9 and IPFIX, templates must be received before data records can be decoded. After a device reboot or config change, the collector is blind until the next template arrives. Set the template refresh interval shorter than the active timeout. A misaligned configuration where template refresh is longer than active timeout causes template expiry before data records arrive, leaving the collector unable to decode flows.
RFC 7011 Section 8.4 requires that exporting processes using UDP periodically retransmit active templates. Ensure your exporter’s template refresh rate is configured and that the collector’s template cache lifetime is at least three times the retransmission interval.
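Both alignment rules are simple comparisons once the three timers are known. The following sketch assumes all values are in seconds and that you have read them from the exporter and collector configuration by hand; the function names are illustrative.

```shell
# min_template_lifetime REFRESH_S
# Smallest collector template cache lifetime for a given retransmission interval.
min_template_lifetime() {
    echo $(( $1 * 3 ))
}

# template_lifetime_ok REFRESH_S LIFETIME_S
# Exit status 0 when the cache lifetime is at least three refresh intervals.
template_lifetime_ok() {
    [ "$2" -ge $(( $1 * 3 )) ]
}

# template_refresh_aligned REFRESH_S ACTIVE_TIMEOUT_S
# Exit status 0 when templates are resent more often than long flows are exported.
template_refresh_aligned() {
    [ "$1" -lt "$2" ]
}

# Example: templates resent every 600 s, collector cache lifetime 1800 s.
#   template_lifetime_ok 600 1800 && echo "cache lifetime is sufficient"
```

Some exporters express template refresh in packets rather than seconds, in which case the interval has to be estimated from the export packet rate before these checks apply.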
Prevention
- Set active timeout to 60 seconds on exporters where real-time visibility matters. Document the increased export load and size collectors accordingly.
- Monitor Udp_RcvbufErrors continuously. This is the single most missed signal in flow collection. Any nonzero value represents silent data loss. Set rmem_max to 16 MB or higher and confirm the collector sets SO_RCVBUF.
- Compare device-side export counts against collector inbound counts. This is the only reliable end-to-end loss detection method. A sustained gap means records are lost somewhere in the path.
- Verify NTP on exporters. Clock skew corrupts latency computation and breaks cross-device correlation.
- Test template cache persistence across collector restarts. Some collectors lose in-memory template state on restart, creating a 5 to 30 minute blind window.
How Netdata helps
- UDP buffer drop detection. Netdata monitors Udp_RcvbufErrors from /proc/net/snmp with per-second resolution, catching socket buffer overflow before it becomes a data gap.
- Per-core CPU breakdown. Netdata’s CPU collector shows softirq and system time per core, making RSS funneling visible without manual mpstat sessions.
- Collector process metrics. Netdata monitors the collector’s own CPU, memory, and disk I/O, correlating collector-side bottlenecks with flow data latency.
- NIC-level drop counters. Netdata tracks /proc/net/dev RX drops and ethtool counters such as rx_missed_errors, distinguishing NIC-level drops from socket-buffer drops.
- Cross-signal correlation. When flow data latency spikes, Netdata’s unified timeline lets you correlate it with CPU saturation, disk I/O wait, and UDP buffer errors in a single view.







