NGINX dropped connections: the accepts vs handled gap

Users report intermittent connection timeouts. Your HTTP 5xx rate is flat. The error log is quiet. Something is dropping traffic before it ever becomes a request.

When the stub_status module is enabled, its status page exposes two cumulative counters: accepts and handled. When accepts grows faster than handled, NGINX is taking connections from the kernel and then discarding them. This gap is a leading indicator of connection-slot or file-descriptor exhaustion. It often starts increasing minutes before the system hits the hard wall.

Because the counters are cumulative since process start, a static nonzero difference only proves a past event. Watch the rate of change.

What this means

In stub_status output, the first two numbers on the data line are cumulative counters:

accepts – total connections accepted from the kernel listen queue.
handled – total connections that NGINX actually processed.

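In a healthy instance, the counters increase in lockstep. When accepts exceeds handled, the delta equals the number of dropped connections. The kernel completed the TCP handshake and NGINX accepted the connection, but NGINX could not allocate the resources to process it. The connection is closed without an HTTP response, so it never appears in the access log or in 5xx counts.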

The gap is not a single-point metric. Both counters are monotonic since startup and survive configuration reloads. A nonzero absolute value only tells you that drops happened at some point. What matters is whether the gap is actively growing during a measurement window. If the delta over 60 seconds is greater than zero, connections are being dropped now.
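
The two-sample comparison can be sketched as a pair of small shell helpers. This is a minimal sketch, not a definitive implementation: the function names `gap_from_status` and `gap_growth` are hypothetical, and the input is assumed to be standard stub_status text with the counters on the first line that starts with a digit.

```shell
# Print accepts - handled from stub_status text on stdin.
# Assumes the counters sit on the first line that starts with a digit.
gap_from_status() {
  awk '/^[[:space:]]*[0-9]/ { print $1 - $2; exit }'
}

# gap_growth BEFORE AFTER
# Print how much the gap grew between two earlier readings.
# A positive result means connections were dropped during the window.
gap_growth() {
  echo $(( $2 - $1 ))
}

# Example (adjust the URL to wherever stub_status is exposed):
#   before=$(curl -s http://127.0.0.1/nginx_status | gap_from_status)
#   sleep 60
#   after=$(curl -s http://127.0.0.1/nginx_status | gap_from_status)
#   gap_growth "$before" "$after"
```

Keeping the parsing separate from the sampling means the same helper works on saved status output captured during an incident.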

The usual triggers are resource ceilings: the per-worker worker_connections limit or the process file-descriptor limit (worker_rlimit_nofile or system ulimit). The gap grows because workers cannot create connection structures or cannot open new file descriptors. A full kernel listen backlog is a related failure with the same client-visible symptom, but those drops happen before accept() and therefore never reach either counter.

Common causes

Cause	What it looks like	First thing to check
Connection slot exhaustion	Active connections plateau near `worker_connections x worker_processes`. Waiting (keepalive) or Writing (slow upstream) states dominate.	`stub_status` active connections against configured maximum.
File-descriptor exhaustion	Active connections are below the slot limit, but `accept4() failed (24: Too many open files)` appears in the error log. FD count per worker is pinned near its limit.	`/proc/<pid>/fd` count against `/proc/<pid>/limits` for each worker.
Kernel listen-backlog overflow	`stub_status` shows no stress, but clients see connection timeouts. NGINX logs show no evidence because drops happen before `accept()`.	`TcpExtListenOverflows` or `ss -tlnp` Recv-Q approaching Send-Q.
Worker restart or reload	Brief gap spike that stabilizes within seconds. Correlates with a reload event in the error log.	Error log for `reconfiguring` notices around the same timestamp.

Quick checks

These checks are read-only and safe to run during an incident.

# 1. Print the current cumulative gap (run twice, 30 to 60 seconds apart, to see whether it is growing)
curl -s http://127.0.0.1/nginx_status | awk '/^[[:space:]]*[0-9]/ {print "gap=" $1-$2; exit}'

# 2. Check active connections vs theoretical maximum
active=$(curl -s http://127.0.0.1/nginx_status | awk '/Active connections/ {print $3}')
workers=$(pgrep -a -P $(cat /var/run/nginx.pid) | grep -c 'nginx: worker')
wc=$(nginx -T 2>/dev/null | awk '$1 == "worker_connections" {gsub(/;/, "", $2); print $2; exit}')  # exact match skips commented-out lines
wc=${wc:-512}
echo "Active: $active / Max: $((workers * wc))"

# 3. Check file-descriptor usage per worker
pgrep -a -P $(cat /var/run/nginx.pid) | grep 'nginx: worker' | awk '{print $1}' | while read -r pid; do
  used=$(ls /proc/$pid/fd 2>/dev/null | wc -l)
  echo "Worker $pid: $used FDs"
done

# 4. Check the FD limits on a worker (soft and hard; the soft limit is the one enforced)
prlimit -n -p $(pgrep -a -P $(cat /var/run/nginx.pid) | grep 'nginx: worker' | head -1 | awk '{print $1}') 2>/dev/null

# 5. Check kernel-level listen-queue overflows
nstat -az TcpExtListenOverflows 2>/dev/null | awk '/ListenOverflows/ {print $2}'

# 6. Check current backlog depth on listening sockets
ss -tlnp | awk 'NR>1 && $4 ~ /:(80|443)$/ {print "Recv-Q:"$2, "Send-Q:"$3, $4}'
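
The Recv-Q versus Send-Q comparison from check 6 can be reduced to a fill percentage per listening socket. A minimal sketch, assuming `ss -tln` style input with the State column first; `listen_queue_fill` is a hypothetical helper name, not part of any tool.

```shell
# Read `ss -tln` output on stdin and print the accept-queue fill level for
# each listening socket. On LISTEN sockets, Recv-Q is the current
# accept-queue length and Send-Q is the configured backlog.
listen_queue_fill() {
  awk '$1 == "LISTEN" && $3 > 0 {
    printf "%s %d/%d %d%%\n", $4, $2, $3, int($2 * 100 / $3)
  }'
}

# Example:
#   ss -tln | listen_queue_fill
```

A socket that sits near 100 percent across repeated samples is the one to look at first.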

How to diagnose it

flowchart TD
  A[accepts - handled gap growing] --> B{Active connections near limit?}
  B -->|Yes| C[Connection slot exhaustion]
  B -->|No| D{FD usage near process limit?}
  D -->|Yes| E[File descriptor exhaustion]
  D -->|No| F{TcpExtListenOverflows increasing?}
  F -->|Yes| G[Kernel listen backlog overflow]
  F -->|No| H[Worker restart or transient reload]

Establish that the gap is live. Take two stub_status samples 30 to 60 seconds apart. If (accepts - handled) increases between samples, drops are occurring now. A static gap is historical noise.
Check connection slot utilization. Calculate active connections divided by worker_connections x worker_processes. If the ratio is above 0.8 and climbing, the worker event loops are running out of connection structures. For reverse-proxy traffic, remember the proxy multiplier: each request consumes at least two slots (client-facing and upstream).
Check file-descriptor saturation. Even if slots are available, the OS may refuse to allocate new FDs. Count open FDs per worker against the limit in /proc/<pid>/limits. If usage is above 90 percent, FD exhaustion is the bottleneck. The error log may contain accept4() failed (24: Too many open files).
Check the kernel listen queue. If neither slots nor FDs are saturated, look at the kernel level. Run nstat -az TcpExtListenOverflows. An increasing counter means the kernel is dropping SYNs or established connections before NGINX ever sees them. Use ss -tlnp to compare Recv-Q against Send-Q.
Correlate with error log and process state. Search the error log for accept4() failed or too many open files. Verify that the worker count matches the configured worker_processes; missing workers can concentrate load and accelerate slot exhaustion on the remaining processes.
Determine the root cause category. Slot exhaustion points to capacity or slow upstreams. FD exhaustion points to worker_rlimit_nofile or system ulimit being too low, or an FD leak. Kernel overflow points to net.core.somaxconn or worker CPU saturation preventing accept.
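
The decision order in the steps above can be sketched as one function. This is an illustration under stated assumptions: `classify_drop_cause` is a hypothetical name, the 80 and 90 percent thresholds are the ones quoted in the steps, and the inputs are integers you have already collected with the quick checks.

```shell
# classify_drop_cause ACTIVE MAX_SLOTS FD_USED FD_LIMIT OVERFLOW_DELTA
#   ACTIVE          active connections from stub_status
#   MAX_SLOTS       worker_connections x worker_processes
#   FD_USED         open FDs on the busiest worker
#   FD_LIMIT        soft open-files limit of that worker
#   OVERFLOW_DELTA  growth of TcpExtListenOverflows over the sample window
classify_drop_cause() {
  if [ $(( $1 * 100 )) -gt $(( $2 * 80 )) ]; then
    echo "connection slot exhaustion"
  elif [ $(( $3 * 100 )) -gt $(( $4 * 90 )) ]; then
    echo "file-descriptor exhaustion"
  elif [ "$5" -gt 0 ]; then
    echo "kernel listen-backlog overflow"
  else
    echo "worker restart or transient reload"
  fi
}

# Example:
#   classify_drop_cause 900 1024 300 4096 0
```

The checks run in the same order as the flowchart, so the first saturated resource wins even when several are under pressure.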

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Dropped connection rate (`accepts - handled` delta)	The only NGINX-native signal that proves connections are being discarded after TCP handshake.	Gap increasing over any 60-second window while request rate is above zero.
Connection slot utilization	Reveals whether the event loop is near its hard ceiling.	Active / (`worker_connections` x `worker_processes`) > 0.8.
FD usage per worker	FD limits often bind before connection slots do, especially with keepalive and log files.	Used FDs / limit > 0.8, or any `accept4() failed (24)` in error log.
TcpExtListenOverflows	Kernel drops are invisible to NGINX logs. This OS counter is the only evidence.	Counter increasing over a 60-second window.
Active connection state breakdown	Distinguishes slot exhaustion caused by slow upstreams (Writing) from keepalive waste (Waiting).	Writing > 50% of active with low throughput, or Waiting consuming > 80% of capacity.
Error log severity rate	Resource-limit errors may not produce HTTP status codes but still log at `alert`, `crit`, or `error`.	Any `too many open files`, `worker_connections are not enough`, or `could not allocate node` entries.
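
The state breakdown in the table can be derived from the same stub_status output. A minimal sketch, assuming the standard `Active connections:` and `Reading: … Writing: … Waiting: …` lines; `connection_state_shares` is a hypothetical helper name, and the shares are relative to active connections, not to configured capacity.

```shell
# Read stub_status text on stdin and print each state as a whole-number
# percentage of active connections (fractions are truncated).
connection_state_shares() {
  awk '
    /^Active connections:/ { active = $3 }
    /^Reading:/            { reading = $2; writing = $4; waiting = $6 }
    END {
      if (active > 0)
        printf "reading=%d%% writing=%d%% waiting=%d%%\n",
          int(reading * 100 / active),
          int(writing * 100 / active),
          int(waiting * 100 / active)
    }'
}

# Example:
#   curl -s http://127.0.0.1/nginx_status | connection_state_shares
```

A high writing share points toward slow upstreams or slow clients; a high waiting share points toward idle keepalive connections.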

Fixes

Fix choice depends on which boundary you hit.

Connection slots exhausted

If active connections are near worker_connections x worker_processes, you have three levers.

Increase capacity. Raise worker_connections in the events block and reload. The limit applies per worker, so for a reverse proxy size it so that worker_connections x worker_processes is at least 2x your peak concurrent proxied requests, plus keepalive headroom. Monitor memory: each connection allocates buffers, so doubling slots increases RSS.
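
The sizing arithmetic can be written down as a small helper. This is a sketch under assumptions: `min_worker_connections` is a hypothetical name, the headroom percentage is a parameter you choose, and the total is divided evenly across workers because worker_connections is a per-worker limit.

```shell
# min_worker_connections PEAK_PROXIED WORKERS HEADROOM_PERCENT
# Two slots per proxied request (client side and upstream side), plus
# headroom, divided across workers and rounded up.
min_worker_connections() (
  total=$(( $1 * 2 * (100 + $3) / 100 ))
  echo $(( (total + $2 - 1) / $2 ))
)

# Example: 4000 concurrent proxied requests, 4 workers, 25% headroom
#   min_worker_connections 4000 4 25
```

Even distribution across workers is an idealization, so treat the result as a floor and round up generously.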

Reclaim idle capacity. If Waiting connections dominate active connections, reduce keepalive_timeout or keepalive_requests so idle connections close faster. This is a fast config change with no memory cost.

Fix the upstream bottleneck. If Writing connections dominate, the backend is too slow. Reduce proxy_read_timeout only if you can tolerate faster 504 errors; otherwise add backend capacity.

File descriptors exhausted

Check the actual process limit with prlimit or /proc/<pid>/limits, not just the NGINX config.

Raise the worker FD ceiling. Set worker_rlimit_nofile to a value comfortably above worker_connections plus overhead for log files, upstream sockets, and temp files. Then reload. Note that systemd LimitNOFILE or container runtimes may override this, so verify the process-level limit after restart.
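
To verify the process-level limit, the soft limit can be read from the worker's limits file and compared with its open FD count. A minimal sketch; `soft_nofile_limit` and `fd_usage_percent` are hypothetical helper names, and the parsing assumes the usual `Max open files` row layout of /proc/<pid>/limits.

```shell
# Print the soft "Max open files" limit from /proc/<pid>/limits text on stdin.
# The row reads: Max open files  <soft>  <hard>  files
soft_nofile_limit() {
  awk '/^Max open files/ { print $4; exit }'
}

# fd_usage_percent USED LIMIT
# Print FD usage as a whole-number percentage (fractions are truncated).
fd_usage_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Example for one worker PID held in $pid:
#   limit=$(soft_nofile_limit < /proc/"$pid"/limits)
#   used=$(ls /proc/"$pid"/fd | wc -l)
#   fd_usage_percent "$used" "$limit"
```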

Find a leak. If FD usage grows while active connections stay flat, inspect /proc/<pid>/fd for repeated patterns. Leaked upstream connections, excessive temp files, or per-vhost log proliferation are common culprits.
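
Looking for repeated patterns is easier with a count per target. A minimal sketch, assuming one FD target per line on stdin, as printed by readlink; `fd_target_summary` is a hypothetical helper name.

```shell
# Read FD link targets on stdin, one per line, and print a count per target,
# largest first. Sockets, pipes, and anon inodes are grouped by type because
# each one has a unique inode number; file paths are counted individually.
fd_target_summary() {
  awk '
    /^socket:/     { n["socket"]++; next }
    /^pipe:/       { n["pipe"]++; next }
    /^anon_inode:/ { n["anon_inode"]++; next }
    { n[$0]++ }
    END { for (k in n) print n[k], k }
  ' | sort -rn
}

# Example for one worker PID held in $pid:
#   for fd in /proc/"$pid"/fd/*; do readlink "$fd"; done | fd_target_summary
```

Comparing two summaries taken a few minutes apart shows which target type is growing while active connections stay flat.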

Kernel listen queue overflowing

The effective backlog is the lesser of net.core.somaxconn and the backlog parameter on the listen directive.
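
That rule is a plain minimum, which a short helper makes explicit. A sketch only; `effective_backlog` is a hypothetical name, and the second argument is whatever backlog your listen directive requests (511 is the NGINX default on Linux when none is set).

```shell
# effective_backlog SOMAXCONN LISTEN_BACKLOG
# Print the smaller of the kernel cap and the backlog requested by NGINX.
effective_backlog() {
  if [ "$1" -lt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Example:
#   effective_backlog "$(sysctl -n net.core.somaxconn)" 511
```

If the result equals the kernel value, raising only the listen directive's backlog has no effect.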

Raise the kernel limit. Increase net.core.somaxconn to at least 4096. Linux 5.4 and later already default to 4096; older kernels default to 128.

Raise the socket backlog. Add backlog=4096 to the listen directive and reload.

Speed up accept capacity. If the queue fills despite a deep backlog, workers are not calling accept() fast enough. Check worker CPU saturation or event-loop blocking from synchronous disk I/O.

Prevention

Monitor the rate, not the absolute. Alert when (accepts - handled) increases over a rolling 60-second window, not when the cumulative counter is nonzero.
Size for the proxy multiplier. In reverse-proxy mode, each request needs at least two connections. Size worker_connections x worker_processes at 2x peak proxied concurrency or more.
Set generous FD limits. worker_rlimit_nofile should be at least 2x worker_connections. Verify the systemd or container limit independently.
Watch the kernel queue. Include TcpExtListenOverflows in host-level monitoring. It is the only signal that catches drops before NGINX sees them.

How Netdata helps

Delta rate visualization. Netdata computes the accepts minus handled gap rate from stub_status counters, removing manual two-point sampling.
Per-process FD charts. File-descriptor usage per NGINX worker is shown against process limits, surfacing exhaustion before errors appear.
Kernel drop correlation. Host-level kernel listen-overflow counters are charted next to NGINX connection metrics, linking silent kernel drops to the application.
Slot utilization alerts. Active connections are compared against worker_connections capacity, with alerts when utilization crosses 80 percent.
State breakdown correlation. Reading, Writing, and Waiting proportions are shown alongside upstream latency and error-log rates to distinguish slot exhaustion from backend slowness.

