NGINX listen queue overflow: somaxconn, backlog, and silent connection drops
Clients report intermittent connection timeouts. Your load balancer health checks pass. NGINX error logs are clean and access logs show no 5xx spikes. The issue is not in NGINX workers or upstream applications. It is in the kernel accept queue.
When the accept queue fills, the kernel drops new connections silently. NGINX never sees them, so it logs nothing. Evidence is client-side timeouts and the kernel counter TcpExtListenOverflows.
On Linux, the effective listen backlog is min(NGINX backlog, net.core.somaxconn). NGINX defaults to 511 on Linux; modern kernels default somaxconn to 4096, but older systems may use 128. If somaxconn is 128, the effective backlog is 128 regardless of what NGINX requests. Fix both.
What this means
When a client connects, the kernel completes the three-way handshake and places the socket in the listen socket’s accept queue. Workers call accept() to pull connections and process them.
If workers are slow to pull connections, or arrivals exceed the dequeue rate, the queue fills. Once it hits the backlog limit, the kernel drops newly completed connections. These drops happen before NGINX allocates a connection structure or writes an access log entry. From the client’s perspective, the TCP handshake may complete but the HTTP request hangs until timeout.
The accept queue depth limit is min(backlog, somaxconn):
- backlog on the listen directive (default 511 on Linux).
- net.core.somaxconn (default 4096 on Linux 5.4+, 128 on older kernels).
Tuning only one side is a common mistake. Setting listen 80 backlog=4096 while somaxconn remains 128 yields an effective backlog of 128.
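The min() rule can be sketched as a small shell helper. The function name effective_backlog is hypothetical, and its two arguments stand in for the NGINX backlog and the net.core.somaxconn value on your own host.

```shell
# effective_backlog NGINX_BACKLOG SOMAXCONN
# Hypothetical helper: prints the smaller value, which is the limit the kernel applies.
effective_backlog() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

effective_backlog 4096 128    # listen ... backlog=4096 while somaxconn is 128
effective_backlog 511 4096    # NGINX default with a modern somaxconn
```

On a live host, the second argument could come from sysctl -n net.core.somaxconn.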
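With reuseport, the kernel creates a separate listen socket per worker, and each socket has its own accept queue with its own backlog limit. This eliminates thundering herd, but the kernel hashes incoming connections across those sockets, so one shared queue becomes several independent queues: a single slow worker can overflow its queue while other workers sit idle.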
flowchart TD
A[Client timeouts] --> B{NGINX errors?}
B -->|None| C[Check TcpExtListenOverflows]
B -->|Present| X[Different failure mode]
C -->|Increasing| D[Check ss Recv-Q vs Send-Q]
C -->|Zero| Y[Check network or firewall]
D --> E{Recv-Q near Send-Q?}
E -->|Yes| F[Check somaxconn and backlog]
E -->|No| G[Check worker saturation]
F --> H[Raise effective backlog]
G --> I[Scale workers or tune keepalive]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| net.core.somaxconn too low | ss -tlnp Send-Q capped at 128 despite higher NGINX config | sysctl net.core.somaxconn |
| NGINX backlog too low or unset | Send-Q stuck at 511 under burst load | nginx -T 2>/dev/null \| grep -E 'listen.*backlog' |
| Worker saturation | Recv-Q climbs while worker CPU or connection slots max out | Worker CPU and active connection ratio |
| Sudden traffic spike | TcpExtListenOverflows spikes during flash crowds or deployments | Rate of change in nstat -a TcpExtListenOverflows |
Quick checks
Run these read-only commands to confirm whether the kernel is dropping connections.
# Kernel-level listen queue drops (cumulative counter)
nstat -a 2>/dev/null | awk '/ListenOverflows/ {print $2}'
# Direct from /proc if nstat is unavailable
awk '/^TcpExt:/ { if (!seen) { split($0, header); seen = 1; next } for (i=1; i<=NF; i++) if (header[i]=="ListenOverflows") print $i }' /proc/net/netstat
# Current queue depth vs configured backlog for NGINX ports
ss -tlnp '( sport = :80 or sport = :443 )' | awk 'NR>1 {print "Recv-Q:"$2, "Send-Q:"$3, $4}'
# System-wide somaxconn limit
sysctl net.core.somaxconn
# NGINX configured backlog values
nginx -T 2>/dev/null | grep -E 'listen.*backlog'
# NGINX-level drops (accepts without handled); adjust path to your stub_status location
curl -s http://127.0.0.1/nginx_status | awk '/^ / {print "gap:", $1-$2}'
# Worker CPU to detect saturation
ps -eo pid,pcpu,args | grep '[n]ginx: worker'
How to diagnose it
1. Confirm the symptom pattern. Clients see connection timeouts or long hangs, with no matching 5xx status codes in the access log and no [error] entries about upstream failures. This silence is the hallmark of a kernel-level drop.
2. Check for kernel drops. Poll TcpExtListenOverflows twice over a 10-second interval. If the value increases, the kernel is actively dropping connections. Also check TcpExtListenDrops.
3. Inspect queue depth. Run ss -tlnp '( sport = :80 or sport = :443 )'. A Recv-Q approaching Send-Q means completed connections are piling up faster than workers accept them. For a listening socket, Send-Q shows the effective backlog limit.
4. Find the effective backlog limit. Compare sysctl net.core.somaxconn with the backlog values in your NGINX listen directives. Whichever is lower is your actual ceiling. If somaxconn is 128 and NGINX requests 511, the ceiling is 128.
5. Determine if workers are the bottleneck. Check per-worker CPU (not aggregate). If any worker is near 100%, the event loop is stalled and accept() is delayed. Check connection slot utilization: active_connections / (worker_connections * worker_processes). Above 80%, raising the backlog only deepens the queue without preventing drops.
6. Correlate with traffic events. Look for request rate spikes, deployment-related connection storms, or health-check amplification that coincides with the overflow counter.
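The counter polling described in the diagnosis steps can be sketched in shell as follows. The function names read_tcpext and overflow_delta are hypothetical, and the optional file argument is an assumption added only so the parser can be tried against a saved copy of /proc/net/netstat.

```shell
# read_tcpext NAME [FILE]
# Prints the value of one TcpExt counter (for example ListenOverflows).
# FILE defaults to /proc/net/netstat.
read_tcpext() {
    awk -v name="$1" '
        /^TcpExt:/ {
            if (!have_names) {                   # first TcpExt line holds column names
                for (i = 2; i <= NF; i++) col[$i] = i
                have_names = 1
                next
            }
            if (name in col) print $(col[name])  # second TcpExt line holds values
        }' "${2:-/proc/net/netstat}"
}

# overflow_delta SECONDS [FILE]
# Samples ListenOverflows twice, SECONDS apart, and prints the difference.
overflow_delta() {
    before=$(read_tcpext ListenOverflows "$2")
    sleep "$1"
    after=$(read_tcpext ListenOverflows "$2")
    echo $(( ${after:-0} - ${before:-0} ))
}
```

Running overflow_delta 10 on the NGINX host prints how much the counter grew during that window. Any result above zero suggests the kernel dropped connections while the script was waiting.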
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| TcpExtListenOverflows rate | Direct evidence of kernel-side silent drops | Any sustained increase above zero |
| ss -tlnp Recv-Q / Send-Q | Shows accept queue fill level per socket | Recv-Q > 50% of Send-Q sustained |
| accepts - handled gap | NGINX-level connection drops after kernel handoff | Cumulative gap increasing over time |
| Worker CPU per process | Saturated workers cannot pull from the queue fast enough | Any worker > 80% sustained |
| Connection slot utilization | Connection exhaustion mimics queue overflow symptoms | Active / (workers * connections) > 80% |
| Active connections trend | Rising baseline can predict queue pressure before overflow | Monotonic climb over 30+ minutes |
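The Recv-Q / Send-Q signal can be turned into a fill percentage with a short filter. This is a sketch only: the function name queue_fill is hypothetical, the sample rows are made-up numbers, and the column order assumed is the one ss prints for TCP listeners (State, Recv-Q, Send-Q, local address).

```shell
# queue_fill
# Reads ss -tln style lines on stdin and prints each listening socket with
# its accept queue fill percentage, marking sockets above 50%.
queue_fill() {
    awk '$1 == "LISTEN" && $3 > 0 {
        pct = int($2 * 100 / $3)
        printf "%s %d%%%s\n", $4, pct, (pct > 50 ? " WARN" : "")
    }'
}

# Sample input with made-up values; the header row is skipped by the filter.
printf '%s\n' \
    'State  Recv-Q Send-Q Local Address:Port Peer Address:Port' \
    'LISTEN 0      511    0.0.0.0:80         0.0.0.0:*' \
    'LISTEN 300    511    0.0.0.0:443        0.0.0.0:*' | queue_fill
```

On a live host the input would come from ss -tln '( sport = :80 or sport = :443 )' piped into queue_fill. A single sample only shows the queue at one instant, so the warning matters most when it repeats across several runs.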
Fixes
Raise the effective backlog
Set net.core.somaxconn and the NGINX backlog parameter to the same value. 4096 matches the modern Linux default and covers most bursts. Go higher only if you measure larger connection storms.
# Temporary
sudo sysctl -w net.core.somaxconn=4096
# Persistent
echo "net.core.somaxconn=4096" | sudo tee /etc/sysctl.d/99-nginx.conf
sudo sysctl --system
In nginx.conf, set an explicit backlog on each listen directive:
server {
    listen 80 backlog=4096;
    listen 443 ssl backlog=4096;
    ...
}
Restart NGINX to ensure the new backlog takes effect. Changing the backlog parameter may not update already-bound sockets on a reload.
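To double-check that both sides agree after the change, a rough filter over the NGINX configuration can compare each listen directive with somaxconn. The function name check_backlogs is hypothetical, and the parsing is deliberately simple: it assumes each listen directive sits on one line and it ignores comments and include ordering.

```shell
# check_backlogs SOMAXCONN
# Reads NGINX configuration text on stdin and reports, for each listen
# directive, whether the backlog is missing, capped by SOMAXCONN, or fine.
check_backlogs() {
    awk -v max="$1" '
        $1 == "listen" {
            backlog = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^backlog=/) {
                    backlog = $i
                    sub(/^backlog=/, "", backlog)
                    sub(/;$/, "", backlog)
                }
            addr = $2
            sub(/;$/, "", addr)
            if (backlog == "")
                print addr ": no explicit backlog"
            else if (backlog + 0 > max + 0)
                print addr ": backlog " backlog " capped at " max
            else
                print addr ": backlog " backlog " ok"
        }'
}

# Made-up configuration lines checked against a somaxconn of 128.
printf '%s\n' 'listen 80 backlog=4096;' 'listen 443 ssl;' | check_backlogs 128
```

On a live host the input could be nginx -T 2>/dev/null and the argument the output of sysctl -n net.core.somaxconn. A line reported without an explicit backlog is not necessarily a problem, since the same address and port may carry the parameter in another server block.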
Scale worker capacity
If workers are saturated, deepening the queue only delays drops. Address the root cause:
- Increase worker_connections. Proxy traffic uses two slots per request.
- Increase worker_processes if CPU cores are available.
- Reduce keepalive_timeout to free connection slots held by idle keepalive connections.
- Enable or verify reuseport is active. With reuseport, ensure each per-worker accept queue still has headroom under load.
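Before changing worker_connections or worker_processes, it can help to estimate how full the connection slots already are, using the ratio active_connections / (worker_connections * worker_processes). The helper name slot_utilization is hypothetical and the figures in the example call are made up.

```shell
# slot_utilization ACTIVE WORKER_CONNECTIONS WORKER_PROCESSES
# Prints active connections as a whole-number percentage of total slots.
slot_utilization() {
    echo $(( $1 * 100 / ($2 * $3) ))
}

# Made-up figures: 3300 active connections, 4 workers with 1024 slots each.
slot_utilization 3300 1024 4
```

The active connection count would typically come from stub_status, and a result near or above 80 points at slot exhaustion rather than a shallow backlog.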
Mitigate connection storms
- Enable proxy_cache_lock if cache stampedes are driving connection spikes.
- Add rate limiting at the edge load balancer or with limit_conn to prevent single-client floods from filling the queue.
- Scale horizontally by adding NGINX instances behind a load balancer.
Prevention
- Monitor TcpExtListenOverflows from day one. It is a zero-cost kernel counter that reveals silent drops before users complain.
- Tune both sides of the backlog equation: somaxconn and the NGINX listen directive. Never assume the default is sufficient.
- Account for the reuseport queue-splitting effect when sizing backlog.
- Do not rely on NGINX logs as the sole source of connection-health truth. Correlate kernel socket counters with stub_status metrics.
How Netdata helps
- Netdata monitors kernel TCP extended statistics including TcpExtListenOverflows and TcpExtListenDrops, surfacing rate increases invisible to application logs.
- Per-process CPU and file-descriptor charts for NGINX workers help identify saturation that causes the accept queue to back up.
- The NGINX collector exposes stub_status metrics, making it easy to correlate kernel drops with accepts, handled, and active connection counts.
- Charts for connection-state breakdowns show when Reading or Writing counts trend toward capacity limits before queue overflow begins.
Related guides
- How NGINX actually works in production: a mental model for operators
- nginx 413 Request Entity Too Large: client_max_body_size explained
- nginx 499 status code: why clients close connections before the response
- nginx 500 Internal Server Error: how to diagnose it
- nginx 502 Bad Gateway: causes and how to fix it
- nginx 503 Service Temporarily Unavailable: causes and fixes
- nginx 504 Gateway Time-out: causes and fixes
- NGINX active connections climbing: reading, writing, waiting explained
- nginx: bind() to 0.0.0.0:80 failed (98: Address already in use)
- NGINX backend cascade failure: when slow upstreams take down everything
- nginx: a client request body is buffered to a temporary file - what it means
- NGINX proxy cache hit rate is low: measuring and improving it







