NGINX listen queue overflow: somaxconn, backlog, and silent connection drops

Clients report intermittent connection timeouts. Your load balancer health checks pass. NGINX error logs are clean and access logs show no 5xx spikes. The connections are being lost in the kernel accept queue, before NGINX workers or upstream applications ever see them.

When the accept queue fills, the kernel drops new connections silently. NGINX never sees them, so it logs nothing. The evidence shows up only in client-side timeouts and in the kernel counter TcpExtListenOverflows.

On Linux, the effective listen backlog is min(NGINX backlog, net.core.somaxconn). NGINX defaults to 511 on Linux; kernels 5.4 and later default somaxconn to 4096, but older kernels use 128. If somaxconn is 128, the effective backlog is 128 regardless of what NGINX requests, so tune both values.

What this means

When a client connects, the kernel completes the three-way handshake and places the new connection in the listening socket’s accept queue. Workers call accept() to pull connections off the queue and process them.

If workers are slow to call accept(), or connections arrive faster than workers dequeue them, the queue fills. Once it reaches the backlog limit, the kernel stops admitting connections: it discards the client’s final handshake ACK (and may also drop new SYNs) and counts the event in TcpExtListenOverflows. These drops happen before NGINX allocates a connection structure or writes an access log entry. From the client’s perspective, the TCP handshake may appear complete, but the HTTP request hangs until it times out.
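To get a feel for how fast this happens, treat the queue as growing at the difference between the arrival rate and the accept rate. The helper below is a back-of-the-envelope sketch: seconds_to_fill is a hypothetical name, it assumes an initially empty queue and constant rates (real traffic rarely has either), and the rates in the example are illustrative, not measurements.

```shell
#!/bin/sh
# seconds_to_fill BACKLOG ARRIVALS_PER_SEC ACCEPTS_PER_SEC
# Hypothetical helper: seconds until an initially empty accept queue holds
# BACKLOG connections, assuming both rates stay constant.
seconds_to_fill() {
    awk -v backlog="$1" -v arrivals="$2" -v accepts="$3" 'BEGIN {
        growth = arrivals - accepts      # net connections queued per second
        if (growth <= 0) { print "never"; exit }
        printf "%.3f\n", backlog / growth
    }'
}
```

For example, seconds_to_fill 128 2000 1500 prints 0.256: with workers 500 connections per second short, a 128-entry queue overflows in about a quarter of a second, while seconds_to_fill 4096 2000 1500 gives about 8.2 seconds. A deeper queue buys time but does not remove the shortfall.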

The accept queue depth limit is min(backlog, somaxconn):

  • backlog on the listen directive (default 511 on Linux).
  • net.core.somaxconn (default 4096 on Linux 5.4+, 128 on older kernels).

Tuning only one side is a common mistake. Setting listen 80 backlog=4096 while somaxconn remains 128 yields an effective backlog of 128.
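The ceiling is easy to compute on a live host. A minimal sketch, assuming Linux and that you pass the backlog value from your listen directive (or 511 if none is set); effective_backlog is a hypothetical helper, and when the second argument is omitted it reads the running value from /proc/sys/net/core/somaxconn.

```shell
#!/bin/sh
# effective_backlog NGINX_BACKLOG [SOMAXCONN]
# Prints min(NGINX_BACKLOG, SOMAXCONN): the accept-queue limit the kernel
# actually enforces. Without SOMAXCONN, reads the running value from /proc.
effective_backlog() {
    local requested="$1"
    local cap="${2:-$(cat /proc/sys/net/core/somaxconn)}"
    if [ "$requested" -lt "$cap" ]; then
        echo "$requested"
    else
        echo "$cap"
    fi
}
```

effective_backlog 4096 128 prints 128, the misconfiguration described above; effective_backlog 4096 checks the same listen directive against the host's current somaxconn.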

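With reuseport, NGINX opens a separate listening socket for each worker, and the kernel distributes incoming connections among them by hash. This avoids the thundering herd, but it also partitions the accept queue: each socket has its own queue, capped at min(backlog, somaxconn), and connections hashed to a busy worker wait in that worker's queue even while other workers sit idle. One shared queue becomes several independent ones, so a single stalled worker can overflow its queue while the others still have room.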
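Because each per-worker socket is a separate listener, ss -tln shows one LISTEN line per worker for the same port, each with its own Recv-Q and Send-Q. The sketch below turns that output into a per-socket fill percentage so one hot queue stands out. queue_report is a hypothetical name, and it assumes the default ss -tln column order (State, Recv-Q, Send-Q, Local Address:Port, ...).

```shell
#!/bin/sh
# queue_report: reads `ss -tln` output on stdin and prints, for every
# listening socket, its address, Recv-Q, Send-Q, and fill percentage.
# Assumes the default column layout: State Recv-Q Send-Q Local Peer ...
queue_report() {
    awk 'NR > 1 && $1 == "LISTEN" {
        fill = ($3 > 0) ? 100 * $2 / $3 : 0
        printf "%s recvq=%d sendq=%d fill=%.0f%%\n", $4, $2, $3, fill
    }'
}
```

Run it as ss -tln 'sport = :80' | queue_report. A single line near 100% while the others stay low points at one stalled worker rather than an undersized backlog.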

flowchart TD
    A[Client timeouts] --> B{NGINX errors?}
    B -->|None| C[Check TcpExtListenOverflows]
    B -->|Present| X[Different failure mode]
    C -->|Increasing| D[Check ss Recv-Q vs Send-Q]
    C -->|Zero| Y[Check network or firewall]
    D --> E{Recv-Q near Send-Q?}
    E -->|Yes| F[Check somaxconn and backlog]
    E -->|No| G[Check worker saturation]
    F --> H[Raise effective backlog]
    G --> I[Scale workers or tune keepalive]
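The decision tree above can be written as a small shell function, mostly useful as a runbook checklist. It is a sketch: triage is a hypothetical name, its inputs are values you collect with the commands in Quick checks, and the 90% cut-off used for "Recv-Q near Send-Q" is an assumption, not a rule from the kernel or NGINX.

```shell
#!/bin/sh
# triage NGINX_ERRORS OVERFLOW_DELTA RECVQ SENDQ
# Hypothetical encoding of the flowchart:
#   NGINX_ERRORS   "yes" if NGINX logs errors matching the timeouts, else "no"
#   OVERFLOW_DELTA increase in TcpExtListenOverflows over a sampling window
#   RECVQ, SENDQ   accept-queue values from ss for the affected listener
# "Near" is assumed to mean Recv-Q >= 90% of Send-Q.
triage() {
    if [ "$1" = "yes" ]; then
        echo "different failure mode"
    elif [ "$2" -eq 0 ]; then
        echo "check network or firewall"
    elif [ $(( $3 * 10 )) -ge $(( $4 * 9 )) ]; then
        echo "check somaxconn and backlog, then raise effective backlog"
    else
        echo "check worker saturation, then scale workers or tune keepalive"
    fi
}
```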

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| net.core.somaxconn too low | ss -tlnp Send-Q capped at 128 despite a higher NGINX backlog | sysctl net.core.somaxconn |
| NGINX backlog too low or unset | Send-Q stuck at 511 under burst load | nginx -T 2>/dev/null \| grep -E 'listen.*backlog' |
| Worker saturation | Recv-Q climbs while worker CPU or connection slots max out | Worker CPU and active connection ratio |
| Sudden traffic spike | TcpExtListenOverflows spikes during flash crowds or deployments | Rate of change in nstat -a TcpExtListenOverflows |

Quick checks

Run these read-only commands to confirm whether the kernel is dropping connections.

# Kernel-level listen queue drops (cumulative counters; -z keeps zero values visible)
nstat -az TcpExtListenOverflows TcpExtListenDrops

# Direct from /proc if nstat is unavailable (a TcpExt: header line, then a value line)
awk '/^TcpExt:/ { if (!n) { n = split($0, name); next } for (i = 2; i <= NF; i++) if (name[i] == "ListenOverflows" || name[i] == "ListenDrops") print name[i], $i }' /proc/net/netstat

# Current queue depth vs configured backlog for NGINX ports
ss -tlnp '( sport = :80 or sport = :443 )' | awk 'NR>1 {print "Recv-Q:"$2, "Send-Q:"$3, $4}'

# System-wide somaxconn limit
sysctl net.core.somaxconn

# NGINX configured backlog values
nginx -T 2>/dev/null | grep -E 'listen.*backlog'

# NGINX-level drops (accepts without handled); adjust path to your stub_status location
curl -s http://127.0.0.1/nginx_status | awk '/^ / {print "gap:", $1-$2}'

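# Worker CPU to detect saturation. Match on args: comm is just "nginx" for every process.
# pcpu is averaged over each process's lifetime; use top for current load.
ps -eo pid,pcpu,args | grep '[n]ginx: worker'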
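The stub_status check above extracts only the accepts/handled gap. When you also need active connections and the Reading/Writing/Waiting counts (for example, for the slot-utilization check in the diagnosis steps), a small parser helps. This sketch assumes the standard four-line stub_status output; parse_stub_status is a hypothetical name.

```shell
#!/bin/sh
# parse_stub_status: reads NGINX stub_status text on stdin and prints
# key=value pairs, one per line.
parse_stub_status() {
    awk '
        /^Active connections:/ { print "active=" $3 }
        /^ *[0-9]+ +[0-9]+ +[0-9]+/ {
            print "accepts=" $1; print "handled=" $2; print "requests=" $3
            print "accept_gap=" ($1 - $2)    # nonzero when limits were hit
        }
        /^Reading:/ { print "reading=" $2; print "writing=" $4; print "waiting=" $6 }
    '
}
```

Use it as curl -s http://127.0.0.1/nginx_status | parse_stub_status, adjusting the URL to your stub_status location as above.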

How to diagnose it

  1. Confirm the symptom pattern. Clients see connection timeouts or long hangs, with no matching 5xx status codes in the access log and no [error] entries about upstream failures. This silence is the hallmark of a kernel-level drop.

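  2. Check for kernel drops. Poll TcpExtListenOverflows twice, 10 seconds apart. If the value increases, the kernel is actively dropping connections. Also check TcpExtListenDrops: it counts every connection dropped by a listening socket, overflows included, so an increase there without a matching ListenOverflows increase points to a cause other than a full queue.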

  3. Inspect queue depth. Run ss -tlnp '( sport = :80 or sport = :443 )'. For listening sockets, Recv-Q is the number of connections waiting to be accepted and Send-Q is the effective backlog limit. A Recv-Q approaching Send-Q means completed connections are piling up faster than workers accept them.

  4. Find the effective backlog limit. Compare sysctl net.core.somaxconn with the backlog values in your NGINX listen directives. Whichever is lower is your actual ceiling. If somaxconn is 128 and NGINX requests 511, the ceiling is 128.

  5. Determine if workers are the bottleneck. Check per-worker CPU (not aggregate). If any worker is near 100%, the event loop is stalled and accept() is delayed. Check connection slot utilization: active_connections / (worker_connections * worker_processes). Above 80%, raising the backlog only deepens the queue without preventing drops.

  6. Correlate with traffic events. Look for request rate spikes, deployment-related connection storms, or health-check amplification that coincides with the overflow counter.
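Step 2 can be scripted without nstat by reading /proc/net/netstat directly, where each TcpExt: header line is followed by a matching value line. This is a minimal sketch; tcpext_value and overflow_delta are hypothetical helper names, and the optional file argument exists only so the parsing can be exercised against a saved copy.

```shell
#!/bin/sh
# tcpext_value FIELD [FILE]
# Prints one TcpExt counter (e.g. ListenOverflows) from /proc/net/netstat.
tcpext_value() {
    awk -v want="$1" '
        /^TcpExt:/ {
            if (!n) { n = split($0, name); next }   # first TcpExt: line holds names
            for (i = 2; i <= NF; i++)                # second holds the values
                if (name[i] == want) { print $i; exit }
        }' "${2:-/proc/net/netstat}"
}

# overflow_delta SECONDS [FILE]
# Samples ListenOverflows twice, SECONDS apart, and prints the increase.
overflow_delta() {
    local before after
    before=$(tcpext_value ListenOverflows "${2:-}")
    sleep "$1"
    after=$(tcpext_value ListenOverflows "${2:-}")
    echo $(( ${after:-0} - ${before:-0} ))
}
```

overflow_delta 10 prints how many overflows occurred during a 10-second window; any result above zero means the kernel is dropping connections right now. tcpext_value ListenDrops reads the companion counter.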

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| TcpExtListenOverflows rate | Direct evidence of kernel-side silent drops | Any sustained increase above zero |
| ss -tlnp Recv-Q / Send-Q | Shows accept queue fill level per socket | Recv-Q > 50% of Send-Q sustained |
| accepts - handled gap | NGINX-level connection drops after kernel handoff | Cumulative gap increasing over time |
| Worker CPU per process | Saturated workers cannot pull from the queue fast enough | Any worker > 80% sustained |
| Connection slot utilization | Connection exhaustion mimics queue overflow symptoms | Active / (workers * connections) > 80% |
| Active connections trend | Rising baseline can predict queue pressure before overflow | Monotonic climb over 30+ minutes |
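The point-in-time thresholds from this table can be checked with a few lines of shell. This sketch is hypothetical (slot_utilization and check_signals are illustrative names), uses integer arithmetic, and takes its inputs from ss, stub_status, and your nginx.conf. The table's warnings are about sustained values, so run it on a schedule rather than acting on a single result.

```shell
#!/bin/sh
# slot_utilization ACTIVE WORKER_PROCESSES WORKER_CONNECTIONS
# Prints active / (worker_processes * worker_connections) as a whole percentage.
slot_utilization() {
    echo $(( 100 * $1 / ($2 * $3) ))
}

# check_signals RECVQ SENDQ ACTIVE WORKER_PROCESSES WORKER_CONNECTIONS
# Prints one WARN line per threshold crossed, or "ok" if none are.
check_signals() {
    local status=ok util
    if [ $(( $1 * 2 )) -gt "$2" ]; then
        echo "WARN accept queue above 50% (Recv-Q $1 / Send-Q $2)"
        status=warn
    fi
    util=$(slot_utilization "$3" "$4" "$5")
    if [ "$util" -gt 80 ]; then
        echo "WARN connection slots at ${util}%"
        status=warn
    fi
    if [ "$status" = ok ]; then echo ok; fi
}
```

For example, check_signals 300 511 3500 4 1024 reports both an accept queue above 50% and connection slots at 85%.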

Fixes

Raise the effective backlog

Set net.core.somaxconn and the NGINX backlog parameter to the same value. 4096 matches the modern Linux default and is a reasonable starting point. Go higher only if you measure larger connection storms.

# Temporary
sudo sysctl -w net.core.somaxconn=4096

# Persistent
echo "net.core.somaxconn=4096" | sudo tee /etc/sysctl.d/99-nginx.conf
sudo sysctl --system

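In nginx.conf, set an explicit backlog on the listen directives. NGINX accepts socket parameters such as backlog only once per address:port pair, so set them on one listen directive per port (for example in the default_server block) rather than repeating them in every server block: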

server {
    listen 80 backlog=4096;
    listen 443 ssl backlog=4096;
    ...
}

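Restart NGINX after changing either value. The kernel applies the somaxconn cap when listen() is called, so raising somaxconn alone does not change sockets that are already listening, and a reload may not re-create existing sockets. Confirm the new limit in the Send-Q column of ss -tln.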
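After the restart, confirm the kernel applied the new limit: for listening sockets, ss reports the effective backlog in the Send-Q column. verify_backlog is a hypothetical helper that assumes the default ss -tln column order.

```shell
#!/bin/sh
# verify_backlog TARGET: reads `ss -tln` output on stdin and prints every
# LISTEN socket whose Send-Q (the effective backlog) is below TARGET.
# Returns non-zero if any socket falls short.
verify_backlog() {
    awk -v target="$1" '
        NR > 1 && $1 == "LISTEN" && $3 + 0 < target + 0 {
            printf "%s backlog %d < %d\n", $4, $3, target
            short = 1
        }
        END { exit (short ? 1 : 0) }'
}
```

ss -tln '( sport = :80 or sport = :443 )' | verify_backlog 4096 prints any listener still below 4096 and returns non-zero, which makes it usable as a post-deploy check.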

Scale worker capacity

If workers are saturated, deepening the queue only delays drops. Address the root cause:

  • Increase worker_connections. Each proxied request uses two slots: one for the client connection and one for the upstream connection.
  • Increase worker_processes if CPU cores are available.
  • Reduce keepalive_timeout to free connection slots held by idle keepalive connections.
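  • Enable reuseport, or verify it is already active, so the kernel spreads new connections across workers. Each worker's socket then has its own accept queue, so size the backlog for the burst a single busy worker must absorb rather than for total traffic.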
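The two-slots-per-proxied-request rule turns into a quick ceiling on concurrent clients. This is a rough sketch: max_clients is a hypothetical helper, it treats every client as holding exactly one connection, and it ignores file-descriptor limits that may bind first.

```shell
#!/bin/sh
# max_clients WORKER_PROCESSES WORKER_CONNECTIONS [proxy]
# Rough upper bound on concurrent clients. With "proxy", each client
# connection also needs an upstream connection, so usable slots are halved.
max_clients() {
    local slots=$(( $1 * $2 ))
    if [ "${3:-}" = proxy ]; then
        slots=$(( slots / 2 ))
    fi
    echo "$slots"
}
```

max_clients 4 1024 proxy prints 2048: four workers with worker_connections 1024 can serve about 2048 proxied clients at once, half of the 4096 slots.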

Mitigate connection storms

  • Enable proxy_cache_lock if cache stampedes are driving connection spikes.
  • Add rate limiting at the edge load balancer or with limit_conn to prevent single-client floods from filling the queue.
  • Scale horizontally by adding NGINX instances behind a load balancer.

Prevention

  • Monitor TcpExtListenOverflows from day one. It is a zero-cost kernel counter that reveals silent drops before users complain.
  • Tune both sides of the backlog equation: somaxconn and the NGINX listen directive. Never assume the default is sufficient.
  • Account for reuseport's per-worker accept queues when sizing backlog.
  • Do not rely on NGINX logs as the sole source of connection-health truth. Correlate kernel socket counters with stub_status metrics.

How Netdata helps

  • Netdata monitors kernel TCP extended statistics including TcpExtListenOverflows and TcpExtListenDrops, surfacing rate increases invisible to application logs.
  • Per-process CPU and file-descriptor charts for NGINX workers help identify saturation that causes the accept queue to back up.
  • The NGINX collector exposes stub_status metrics, making it easy to correlate kernel drops with accepts, handled, and active connection counts.
  • Charts for connection-state breakdowns show when Reading or Writing counts trend toward capacity limits before queue overflow begins.