Syslog parser backpressure: when one chatty device stalls the pipeline
A single device floods your syslog collector. The parser thread pool saturates, queues fill, and UDP datagrams start dropping at the kernel socket buffer. Critical messages from other devices, including BGP NOTIFICATIONS and hardware alarms, are silently lost. The dashboard shows a normal or slightly elevated syslog rate because dropped packets never reach the application layer.
The collector process is still running. The network is fine. The failure is inside the ingestion pipeline, at the seam between the kernel socket buffer and the parser, where backpressure builds and has nowhere to go.
How backpressure develops
Backpressure cascades through three stages.
Stage one: a single source (a flapping interface, a misconfigured debug level, a compromised host, a device in a boot loop) generates syslog at a rate that exceeds the parser’s drain capacity. The parser thread pool becomes CPU-bound on regex matching, RFC 3164/5424 framing, or enrichment lookups.
Stage two: the collector’s internal queue fills. In rsyslog, if a per-action queue cannot drain in time, messages back up into the main message queue. Once the main queue reaches its high-water mark, the collector attempts to throttle delayable inputs (TCP, RELP, imfile). But UDP syslog is inherently non-delayable. There is no flow control in UDP.
Stage three: the kernel socket receive buffer overflows. Datagrams that arrive while the buffer is full are silently dropped, and the counter UdpRcvbufErrors (the RcvbufErrors column under Udp: in /proc/net/snmp) increments. The collector never sees these messages, and no log entry records the loss. The kernel does not inspect syslog severity, so high-priority messages from other devices that arrive during the burst window are dropped just as readily as the noise that filled the buffer.
flowchart TD
A[Chatty device flood] --> B[Parser thread pool saturates]
B --> C[Internal queue fills to high-water mark]
C --> D{Delayable input?}
D -->|TCP/RELP| E[Sender throttled]
D -->|UDP 514| F[Cannot throttle]
F --> G[Kernel socket buffer overflows]
G --> H[UdpRcvbufErrors increments]
H --> I[Silent loss from ALL devices]
Because the main queue and parser pool are shared resources, backpressure affects every source sending to that collector. A BGP NOTIFICATION from a core router, a hardware alarm from a firewall, an authentication failure from a switch: all can be lost if they arrive during the window when the buffer is full.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Link-flap or STP cascade | Burst of linkDown/linkUp syslog pairs, thousands per second | ifOperStatus history on the flapping interface; correlate with STP topology change count |
| Debug-level logging left on | Sustained high-volume DEBUG or INFO from one device, no severity escalation | Per-source syslog rate breakdown; check device running config for debug enable |
| Device boot loop | Repeating boot sequence messages from one device at regular intervals | sysUpTime for that device; coldStart trap rate |
| Compromised or scanning host | Spike in auth-failure or security syslog from one source IP | Source IP in syslog messages; correlate with SNMP auth failures |
| Collector-side parser bottleneck | CPU-bound parser, high %user or %soft on collector, all sources affected | mpstat per-core CPU; per-thread CPU via top -H |
| Undersized UDP socket buffer | UdpRcvbufErrors incrementing during bursts, receiver process not CPU-bound | sysctl net.core.rmem_max; ss -lun -m for current Recv-Q |
Quick checks
Run these on the syslog collector host.
# Primary silent-loss signal: datagrams dropped because the socket buffer was full
nstat -az UdpRcvbufErrors
# Raw UDP statistics from /proc (look at the RcvbufErrors column under Udp:)
grep '^Udp:' /proc/net/snmp
# Syslog listener socket state and buffer fill level
ss -lun '( sport = :514 )' -m
# Per-core CPU saturation (RSS funneling shows as one core pinned)
mpstat -P ALL 1 5
# NIC ring buffer drops (pre-socket-buffer loss)
# Replace eth0 with your collector interface
ethtool -S eth0 | grep -iE 'drop|miss'
# Current kernel max receive buffer size
sysctl net.core.rmem_max net.core.rmem_default
# rsyslog queue stats (requires impstats module loaded and a destination configured)
# Adjust path to your impstats output file
grep -E 'queue|enqueued|full' /var/log/rsyslog-stats.log
# Per-source syslog volume (field position depends on your syslog format)
# $4 is typical for RFC 3164 traditional format; adjust for your layout
awk '{print $4}' /var/log/network-devices.log | sort | uniq -c | sort -rn | head -20
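A cumulative counter matters less than how fast it grows during a burst. The sketch below is one way to sample that growth directly from /proc/net/snmp, looking the column up by name instead of by position. The function names `udp_counter` and `udp_counter_delta` are hypothetical helpers, not part of any tool mentioned here.

```bash
# Print one named counter from the "Udp:" rows of a /proc/net/snmp-style file.
# The file holds a header row of column names followed by a row of values,
# so the counter is located by name rather than by a fixed position.
udp_counter() {
    local name="$1" file="${2:-/proc/net/snmp}"
    awk -v want="$name" '
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Udp:" && (want in col) { print $(col[want]); exit }
    ' "$file"
}

# Print how much the counter grew over an interval (default: 10 seconds).
udp_counter_delta() {
    local name="$1" interval="${2:-10}" file="${3:-/proc/net/snmp}"
    local before after
    before=$(udp_counter "$name" "$file")
    [ -n "$before" ] || return 1   # counter name not present in the file
    sleep "$interval"
    after=$(udp_counter "$name" "$file")
    echo $(( after - before ))
}

# Example: udp_counter_delta RcvbufErrors 10
```

A result above zero during a suspected burst means datagrams were dropped in that window. The counter covers every UDP socket on the host, so treat it as a host-level signal.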
Systematic diagnosis
Confirm silent loss. Check nstat -az UdpRcvbufErrors. Any nonzero increment means datagrams arrived at the kernel but were dropped because a socket buffer was full. The counter is system-wide, so on a host with other UDP listeners confirm that the syslog socket is the one dropping (the d field in the skmem output of ss -m counts drops per socket). On a dedicated collector, a nonzero increment is definitive evidence the collector cannot keep up.
Determine whether the bottleneck is the parser or the queue. Run mpstat -P ALL 1 5. High %user indicates parser CPU. High %soft indicates kernel packet processing. A single core at 100% with others idle points to RSS funneling, not parser throughput.
Identify the chatty source. Break down syslog volume by source device. The chatty source will dominate by orders of magnitude. Correlate with the device’s operational state: is an interface flapping? Is debug logging enabled? Did the device recently reboot?
Check the collector’s internal queue depth. In rsyslog with impstats, look for queue size approaching the configured maximum and for full or delay indicators. In syslog-ng, check the stats counters for output queue length on each destination.
Verify scope of impact. Compare the syslog receive rate from a known-quiet device during the burst versus outside it. If the quiet device’s messages are missing during the burst, the backpressure is global and the shared pipeline is stalled.
Rule out exporter-side loss. Check the device’s own logging counters to confirm it is sending what you expect. If the device reports a higher send rate than the collector receives, the gap is in transit or at the collector.
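To make "dominates by orders of magnitude" concrete, the per-source breakdown can report each source's share of total volume rather than a raw count. This is a minimal sketch, assuming the source name sits in whitespace field 4 as in the traditional layout; `top_sources` is a hypothetical helper name.

```bash
# Print "count share% source" for the busiest sources in a log file.
#   $1: log file
#   $2: 1-based whitespace field holding the source name (default: 4)
#   $3: number of sources to show (default: 5)
top_sources() {
    local file="$1" field="${2:-4}" limit="${3:-5}"
    awk -v f="$field" '
        NF >= f { n[$f]++; total++ }
        END { for (s in n) printf "%d %.1f%% %s\n", n[s], 100 * n[s] / total, s }
    ' "$file" | sort -k1,1nr | head -n "$limit"
}

# Example: top_sources /var/log/network-devices.log 4 10
```

A single source holding most of the volume while every other device sits in the low single digits is the pattern to look for.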
Metrics to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| UdpRcvbufErrors (/proc/net/snmp) | Only direct signal of UDP datagrams dropped at the socket buffer | Any nonzero increment; proportional to incoming rate means chronic undersizing |
| Syslog receive rate per source | Identifies which device is generating the burst | Single source exceeding 5x its rolling 1-hour average sustained |
| Collector per-core CPU (mpstat) | Detects parser thread pool saturation or RSS funneling | Single core at 100% with others idle, or aggregate %user above 70% |
| rsyslog main queue depth (impstats) | Shows backpressure building before drops occur | Queue size approaching queue.size; full events logged |
| syslog-ng output queue length | Shows destination backpressure | Queue growing without bound for a specific destination |
| NIC RX drops (/proc/net/dev, ethtool -S) | Pre-socket-buffer loss at the ring buffer | RX drop or fifo columns in /proc/net/dev, or driver counters such as rx_missed_errors, incrementing |
| Syslog severity distribution | Distinguishes real events from noise storms | Rate spike without severity escalation means noise (flap, debug) |
| /proc/net/softnet_stat | Kernel packet processing backpressure | Column 2 (dropped) or column 3 (time_squeeze) incrementing |
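The softnet_stat file is awkward to read by eye because its values are hexadecimal with no header. The sketch below decodes the two counters relevant here, assuming the layout used by current kernels: one row per CPU, second column dropped, third column time_squeeze. `softnet_drops` is a hypothetical helper name, and the row index is used as the CPU label, which normally matches when all CPUs are online.

```bash
# Print per-row "dropped" and "time_squeeze" counters from a softnet_stat-style file.
# Values in the file are hexadecimal; bash converts them with the 16# prefix.
softnet_drops() {
    local file="${1:-/proc/net/softnet_stat}" cpu=0
    local processed dropped squeezed rest
    while read -r processed dropped squeezed rest; do
        echo "cpu$cpu dropped=$((16#$dropped)) time_squeeze=$((16#$squeezed))"
        cpu=$((cpu + 1))
    done < "$file"
}

# Example: softnet_drops
```

Run it twice a few seconds apart during a burst and compare. As with the UDP counters, the totals are cumulative since boot, so only growth is meaningful.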
Fixes
Isolate the chatty source in its own queue
The most effective fix is structural: prevent one source from consuming the shared parser and queue resources.
In rsyslog, assign the chatty device to a dedicated ruleset with its own action queue. Configure disk-assisted queuing (queue.filename) for that ruleset so bursts spill to disk rather than backing up into the main queue. Use RainerScript queue.* syntax; legacy dollar-sign directives still work but can produce nondeterministic behavior when mixed with advanced syntax in rsyslog 8.x.
In syslog-ng, route the chatty source to a separate log path with explicit flow-control and disk-based buffering. Without flow-control declared, a slow destination in a shared log path causes silent message drops across all sources in that path. With flow-control, syslog-ng spills to disk before dropping, buying time during downstream outages.
The tradeoff: isolating the source means its messages may be delayed during bursts (disk-assisted queuing adds latency). For a noisy source whose messages are low-value, this is acceptable. For a source whose messages are high-value but high-volume (a core firewall), you need a bigger queue or more parser threads, not isolation alone.
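As an illustration of the rsyslog approach, the sketch below prints a RainerScript fragment that sends one source IP to a ruleset with its own disk-assisted queue. It is a starting point under stated assumptions, not a tested production config: the helper name `rsyslog_isolation_conf`, the ruleset name, the output path under /var/log, and the 1 GB disk cap are all hypothetical, and the source is matched on the fromhost-ip property.

```bash
# Emit an rsyslog (RainerScript) fragment that isolates one source IP
# in a ruleset with its own disk-assisted queue.
# Ruleset name, file paths, and sizes are illustrative assumptions.
rsyslog_isolation_conf() {
    local src_ip="$1" name="${2:-chatty}"
    cat <<EOF
ruleset(name="${name}"
        queue.type="LinkedList"
        queue.filename="${name}_q"
        queue.maxDiskSpace="1g"
        queue.saveOnShutdown="on") {
    action(type="omfile" file="/var/log/${name}.log")
}

if (\$fromhost-ip == "${src_ip}") then {
    call ${name}
    stop
}
EOF
}

# Example: rsyslog_isolation_conf 192.0.2.10 chatty > /etc/rsyslog.d/10-chatty.conf
```

Because the ruleset has its own queue, messages handed to it are processed independently of the rest of the pipeline, and the stop keeps them out of the shared rules that follow. Check the generated file with rsyslogd -N1 before reloading.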
Size the UDP socket buffer correctly
The Linux default net.core.rmem_max varies by distribution and may be as low as 212,992 bytes (208 KB) on some systems. For a syslog collector receiving bursts, this is frequently insufficient. The buffer must be large enough to absorb several seconds of peak burst while the parser catches up.
Set net.core.rmem_max to 16 MB or higher for production syslog collectors, and set SO_RCVBUF explicitly on the listener socket. In rsyslog, use the rcvbufSize parameter on the imudp input. In syslog-ng, use so-rcvbuf() on the UDP source definition.
When sizing, target 1 to 2 seconds of peak-rate headroom. The kernel internally allocates roughly twice the value you request via SO_RCVBUF (capped at rmem_max), so account for that when calculating.
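The sizing arithmetic is simple enough to script. The sketch below multiplies peak rate, average datagram size, and seconds of headroom, then checks the result against a given rmem_max. The helper names and the example figures (20,000 messages per second at 300 bytes each) are hypothetical; substitute rates measured on your own collector.

```bash
# Estimate the receive buffer (bytes) needed to absorb a burst:
# peak messages/second x average datagram size (bytes) x seconds of headroom.
rcvbuf_bytes() {
    local rate="$1" avg_size="$2" seconds="$3"
    echo $(( rate * avg_size * seconds ))
}

# Report whether a requested SO_RCVBUF value fits under a given rmem_max.
rcvbuf_fits() {
    local wanted="$1" rmem_max="$2"
    if [ "$wanted" -le "$rmem_max" ]; then
        echo "fits"
    else
        echo "raise net.core.rmem_max to at least $wanted"
    fi
}

# Example: 20000 msg/s x 300 bytes x 2 s of headroom
#   rcvbuf_bytes 20000 300 2
#   rcvbuf_fits "$(rcvbuf_bytes 20000 300 2)" "$(sysctl -n net.core.rmem_max)"
```

With those example figures the estimate is 12,000,000 bytes, which fits under a 16 MB rmem_max but not under a 212,992-byte one. Treat the result as a floor: the kernel charges per-datagram bookkeeping overhead against the buffer, so it holds fewer messages than payload math alone suggests.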
Scale parser threads
The default number of worker threads per queue in rsyslog is 1. A single worker processing a burst from one source will serialize all parsing for that queue. Increase queue.workerThreads to allow parallel processing.
For syslog-ng versions before 4.2, UDP reception on a given port is single-threaded regardless of CPU cores. Even so-reuseport(yes) routes all packets from one source IP to the same thread. syslog-ng 4.2.0 introduces an ebpf(reuseport(sockets(N))) plugin that distributes a single high-rate UDP source across N worker threads using eBPF SO_REUSEPORT. This plugin is disabled by default at compile time and requires a recent kernel.
Apply rate limiting at ingress
If the chatty source is genuinely noisy and its messages are low-value, rate-limit it at the collector before messages enter the main queue.
In rsyslog, the imuxsock module enforces per-PID rate limiting by default (200 messages per 5-second interval). When exceeded, rsyslog logs imuxsock begins to drop messages from pid XXXX due to rate-limiting. This applies to local Unix socket inputs, not remote UDP. For remote UDP sources, there is no built-in per-source-IP rate limiter in imudp.
Do not disable rate limiting entirely (RateLimit.Burst=0). Without any rate limit, a runaway source can fill /var and take down the collector’s host.
Prevention
- Monitor UdpRcvbufErrors continuously. Any nonzero increment on a syslog collector is abnormal. Alert on it, not just chart it.
- Track syslog receive rate per source. A single source dominating volume is a finding, not just noise.
- Load impstats (rsyslog) or enable stats counters (syslog-ng) permanently. Queue depth is a leading indicator that precedes drops by minutes. Without it, you are blind until the kernel starts dropping.
- Size net.core.rmem_max proactively. Do not wait for the first burst to discover the default is too small. 16 MB is a reasonable starting point for a production syslog collector.
- Verify RSS IRQ distribution. One core at 100% during a syslog burst with other cores idle means RSS is funneling all UDP interrupts to one CPU. Check cat /proc/interrupts | grep eth0 to verify distribution.
- Separate syslog storage from TSDB storage. A syslog flood that fills the disk volume can take down flow collection running on the same host.
- Pre-configure isolation rulesets for known-noisy device classes. Wireless controllers, load balancers, and devices prone to debug-level logging should have dedicated queues from day one.
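The raw /proc/interrupts output is wide and hard to scan on multi-queue NICs. As a rough aid for the RSS check, the sketch below sums interrupt counts per CPU across every row that mentions the interface. `irq_per_cpu` is a hypothetical helper name, and eth0 stands in for your collector interface.

```bash
# Sum interrupt counts per CPU for all /proc/interrupts rows that mention a device.
# The first line of the file names the CPUs; each later row is
# "IRQ: count-per-CPU... controller name".
irq_per_cpu() {
    local dev="$1" file="${2:-/proc/interrupts}"
    awk -v dev="$dev" '
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) hdr[i] = $i; next }
        index($0, dev) { for (i = 1; i <= ncpu; i++) sum[i] += $(i + 1) }
        END { for (i = 1; i <= ncpu; i++) printf "%s %.0f\n", hdr[i], sum[i] }
    ' "$file"
}

# Example: irq_per_cpu eth0
```

One CPU carrying nearly all of the count while the others stay near zero matches the funneling pattern described above. The counts are cumulative since boot, so compare two samples taken during a burst.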
How Netdata helps
- UdpRcvbufErrors monitoring. Netdata charts UDP socket buffer drops system-wide from /proc/net/snmp. Configure an alert on any nonzero increment for hosts running syslog collectors.
- Per-core CPU breakdown. Netdata’s cpu collector provides per-core utilization including softirq time. A single core pinned at 100% during a syslog burst is visible without running mpstat manually.
- NIC ring buffer drops. The netdev collector tracks /proc/net/dev RX drops, which precede socket-buffer drops in the loss cascade.
- Disk space on syslog volumes. Netdata’s disk collector monitors free space and fill rate. A syslog flood filling /var is detected before it causes a hard failure.
- Cross-signal correlation. During a syslog flood, Netdata’s unified timeline lets you correlate the burst with device control-plane CPU, interface state changes, and BGP events on the same dashboard, which helps confirm whether the syslog noise reflects a real network event or is purely a logging artifact.
Related guides
- Network monitoring checklist: the signals every production network needs
- Silent UDP flow data loss: why your NetFlow collector is dropping records
- NetFlow storage sizing: how much disk your flow collector really needs
- Flow export-to-ingest latency: why your NetFlow data is minutes behind
- BGP flapping: why a peer keeps resetting and how to find the cause







