nginx: worker_connections are not enough — causes and fixes
Your error log shows worker_connections are not enough while connecting to upstream. New clients time out while existing connections may still work. This is a hard capacity cliff: once a worker exhausts its connection slots, it cannot accept new connections until a slot frees. The default limit is 512 per worker, not 1024, and in reverse-proxy mode each request consumes at least two slots. Raising the number is often the first reaction, but if a slow backend is holding connections open, the slots will fill again no matter how high you set the limit.
What this means
worker_connections is a per-worker hard ceiling inside the events block. Every client connection, upstream connection, and idle keepalive connection counts as one slot. When a worker hits the limit, the kernel may still complete TCP handshakes and queue them in the listen backlog, but nginx cannot accept them. The stub_status counters reveal this through a growing gap between accepts and handled.
For reverse-proxy workloads, effective capacity is at most half of worker_connections * worker_processes because each proxied request ties up one slot on the client side and one on the upstream side. File descriptors impose a second, independent ceiling. Each connection needs an FD, so worker_rlimit_nofile must be at least worker_connections * 2 plus headroom for logs and temp files. nginx validates this relationship at startup and emits a warning if the FD limit is lower, but it will still start.
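The arithmetic above can be sketched as two small shell helpers. This is a minimal illustration, not a sizing tool: the function names and the default headroom of 64 descriptors are assumptions made for the example.

```shell
#!/bin/sh
# Effective proxied-request capacity: each proxied request uses two slots
# (client side + upstream side), so halve the total slot count.
# $1 = worker_connections, $2 = worker_processes
proxy_capacity() {
    echo $(( $1 * $2 / 2 ))
}

# Minimum worker_rlimit_nofile: two FDs per slot plus headroom for logs
# and temp files. The default headroom of 64 is an illustrative assumption.
# $1 = worker_connections, $2 = headroom (optional)
min_rlimit_nofile() {
    echo $(( $1 * 2 + ${2:-64} ))
}

proxy_capacity 512 4       # prints 1024
min_rlimit_nofile 4096     # prints 8256
```

With the default of 512 connections and four workers, the proxy ceiling is about 1024 concurrent proxied requests, before counting idle keepalive connections.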
flowchart TD
A[Client request arrives] --> B{Worker has free slot?}
B -->|Yes| C[Accept client connection]
C --> D[Open upstream connection]
D --> E[Process response]
B -->|No| F[Connection dropped]
F --> G[accepts > handled gap grows]
E --> H{Backend slow?}
H -->|Yes| I[Slot held open]
I --> B
H -->|No| J[Close or keepalive]
J --> B
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Slow backend holding connections | Writing state dominates; $upstream_response_time climbs; error log shows upstream timed out | stub_status state breakdown and access log upstream latency |
| worker_connections too low for proxy load | Active connections flat near worker_connections * worker_processes; accepts exceeds handled | stub_status and nginx -T for the configured limit |
| File descriptor limit below connection limit | accept4() failed (24: Too many open files) in error log; FD count near limit per worker | /proc/<worker_pid>/limits or prlimit |
| Excessive keepalive idle connections | Waiting state dominates; low request rate but high active connections | keepalive_timeout and keepalive_requests values |
| Traffic spike or connection flood | Sudden spike in active connections; Reading state may rise | Request rate from access log and ss SYN queue state |
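Several rows in the table depend on which connection state dominates. One way to read that quickly is to turn the last line of stub_status output into percentages. The helper name state_mix is hypothetical, and the sample numbers are made up for the demo.

```shell
#!/bin/sh
# Read stub_status output on stdin and print each state's share of the
# Reading + Writing + Waiting total (percentages are truncated, not rounded).
state_mix() {
    awk '/^Reading:/ {
        total = $2 + $4 + $6
        if (total == 0) { print "no active connections"; exit }
        printf "Reading %d%% Writing %d%% Waiting %d%%\n",
               $2 * 100 / total, $4 * 100 / total, $6 * 100 / total
    }'
}

# Demo with a made-up sample line; in practice the input would come from
# something like: curl -s http://127.0.0.1/nginx_status | state_mix
printf 'Reading: 10 Writing: 60 Waiting: 30 \n' | state_mix
# prints: Reading 10% Writing 60% Waiting 30%
```

A Writing share that dominates points toward the slow-backend row, while a dominant Waiting share points toward keepalive idle connections.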
Quick checks
# Confirm the error and frequency
grep -c "worker_connections are not enough" /var/log/nginx/error.log
# Check connection counts and accepts vs handled gap
curl -s http://127.0.0.1/nginx_status
# Inspect configured limits
nginx -T 2>/dev/null | grep -E 'worker_connections|worker_rlimit_nofile|worker_processes'
# File descriptor usage per worker against its limit
for pid in $(pgrep -P $(cat /var/run/nginx.pid)); do
used=$(ls /proc/$pid/fd 2>/dev/null | wc -l)
max=$(awk '/^Max open files/ {print $4}' /proc/$pid/limits)
echo "Worker $pid: $used / $max FDs"
done
# Kernel-level listen drops (silent to nginx)
nstat -az TcpExtListenOverflows 2>/dev/null | awk '/ListenOverflows/ {print $2}'
# Upstream latency in recent requests (requires $upstream_response_time in log_format)
tail -n 5000 /var/log/nginx/access.log | awk '{print $NF}' | sort -n | \
awk 'BEGIN{c=0} {a[c++]=$1} END{print "p95:", a[int(c*0.95)], "p99:", a[int(c*0.99)]}'
How to diagnose it
- Verify the symptom. Search the error log for worker_connections are not enough. Note whether it correlates with traffic spikes or appears continuously under normal load.
- Calculate slot utilization. Pull Active connections from stub_status and divide by worker_connections * worker_processes. If the result is above 80%, you are in the danger zone. For a reverse proxy, halve the denominator first, because each proxied request uses two slots.
- Check for admission loss. Compare the accepts and handled counters in stub_status. If accepts exceeds handled and the gap is growing, nginx is actively dropping connections.
- Check the FD ceiling. Inspect /proc/<worker_pid>/limits for Max open files. If FD usage is within 20% of the limit, the real bottleneck is worker_rlimit_nofile or the OS ulimit, not worker_connections.
- Identify the connection state mix. High Writing with elevated $upstream_response_time points to slow backends. High Waiting with flat traffic points to keepalive hoarding. High Reading without corresponding throughput points to slow clients or a flood of new connections.
- Look for kernel drops. Check TcpExtListenOverflows. If it is increasing, the kernel is dropping SYNs or completed connections before nginx can accept them. This produces client-side timeouts with no entry in nginx logs.
- Correlate with backend health. If upstream latency rose before the error appeared, the backend is the root cause. Fixing the limit without fixing the backend will only postpone the next outage.
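The slot utilization step can be sketched as a small filter over stub_status output. The function name slot_utilization and its argument order are assumptions for this example; the limits passed in would come from nginx -T.

```shell
#!/bin/sh
# Read stub_status output on stdin and print Active connections as a
# percentage of total slots (truncated to a whole number).
# $1 = worker_connections, $2 = worker_processes,
# $3 = "proxy" to halve the capacity for reverse-proxy workloads (optional)
slot_utilization() {
    awk -v wc="$1" -v wp="$2" -v mode="${3:-}" '
        /^Active connections:/ {
            capacity = wc * wp
            if (mode == "proxy") capacity = capacity / 2
            printf "%d%%\n", $3 * 100 / capacity
        }'
}

# Demo with a made-up sample; in practice:
#   curl -s http://127.0.0.1/nginx_status | slot_utilization 512 4 proxy
printf 'Active connections: 1024 \n' | slot_utilization 512 4         # prints 50%
printf 'Active connections: 1024 \n' | slot_utilization 512 4 proxy   # prints 100%
```

The same 1024 active connections read as comfortable headroom for static serving and as full saturation for a proxy, which is why the halved denominator matters.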
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Connection slot utilization | Reveals how close you are to the hard per-worker cap | >80% of worker_connections * worker_processes sustained |
| Dropped connections (accepts - handled gap) | The earliest indicator that nginx is refusing work | Gap increasing over a 60-second window |
| Active connection state breakdown | Distinguishes backend slowness from keepalive bloat | Writing >60% of active with low throughput |
| File descriptor usage per worker | FD exhaustion mimics connection exhaustion | >80% of worker_rlimit_nofile or OS limit |
| Upstream response time (P95) | Slow backends are the most common hidden cause | P95 trending up or >80% of proxy_read_timeout |
| Kernel listen overflows | Connections dropped before nginx sees them | TcpExtListenOverflows counter increasing |
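The accepts - handled gap is only meaningful as a change over time, so a check needs two samples. The sketch below shows one way to compute that, with hypothetical helper names and made-up counter values standing in for two stub_status reads taken a window apart.

```shell
#!/bin/sh
# Print accepts - handled from stub_status output read on stdin.
# The counters sit on the only line made up of three integers.
handled_gap() {
    awk '/^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { print $1 - $2 }'
}

# Print how much the gap grew between two samples.
# $1 = earlier gap, $2 = later gap
gap_growth() {
    echo $(( $2 - $1 ))
}

# Demo with two made-up samples; in practice each would come from
# something like: curl -s http://127.0.0.1/nginx_status | handled_gap
before=$(printf ' 1000 1000 2400 \n' | handled_gap)
after=$(printf ' 1500 1480 3100 \n' | handled_gap)
gap_growth "$before" "$after"   # prints 20
```

Any positive growth across the window means connections were dropped during that window.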
Fixes
Raise the limit correctly
If the limit is genuinely too low for your traffic, increase worker_connections. Remember the default is 512, not 1024. In nginx.conf, worker_connections goes inside the events block, while worker_rlimit_nofile belongs in the main (top-level) context:
worker_rlimit_nofile 8192;
events {
    worker_connections 4096;
}
For a reverse proxy, worker_rlimit_nofile should be at least double worker_connections to cover client and upstream sockets plus logs and temp files. Reload with nginx -s reload; a restart is not required. If nginx runs under systemd, verify that LimitNOFILE in the service unit is not overriding your config.
Fix the backend first
If Writing connections are high and upstream latency is elevated, raising worker_connections will only delay the inevitable. The correct fix is to restore backend performance. As an emergency mitigation, you can reduce proxy_read_timeout so nginx gives up on slow upstreams faster. This trades 504 Gateway Time-outs for freed connection slots, which is usually preferable to total connection exhaustion.
Reclaim keepalive capacity
If Waiting connections dominate, lower keepalive_timeout or keepalive_requests to recycle idle slots faster. In upstream blocks, ensure the keepalive pool size matches your concurrency; an oversized pool wastes slots, while an undersized pool causes excessive upstream connection churn.
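As a rough illustration of where these directives live, the fragment below sets client-side keepalive limits and an upstream keepalive pool. The upstream name, backend address, and every numeric value are placeholders chosen for the example, not recommendations; tune them against your own Waiting counts and concurrency.

```nginx
http {
    # Client-side keepalive: shorter idle time frees slots sooner.
    keepalive_timeout  15s;
    keepalive_requests 1000;

    upstream app_backend {            # hypothetical upstream name
        server 127.0.0.1:8080;        # hypothetical backend address
        keepalive 32;                 # idle upstream connections cached per worker
    }

    server {
        location / {
            proxy_pass http://app_backend;
            # Commonly needed so upstream connections can actually be reused.
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
```

Note that the upstream keepalive value is counted per worker process, so the total idle pool is that number multiplied by worker_processes.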
Address file descriptor limits
When the error log shows accept4() failed (24: Too many open files), the FD limit is the real ceiling, not worker_connections. Raise worker_rlimit_nofile and verify the OS soft limit with prlimit -n -p <worker_pid>. If systemd manages the process, create a drop-in override for LimitNOFILE, run systemctl daemon-reload, and restart nginx.
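The drop-in override can be sketched as a helper that writes the file into a directory you pass in. The function name, the file name limits.conf, the unit name nginx.service, and the limit of 65535 are all assumptions for illustration; check your distribution's unit name before using anything like this.

```shell
#!/bin/sh
# Write a systemd drop-in that sets LimitNOFILE.
# $1 = drop-in directory, $2 = file descriptor limit
write_nofile_dropin() {
    mkdir -p "$1" &&
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/limits.conf"
}

# Typical use as root, assuming the unit is named nginx.service:
#   write_nofile_dropin /etc/systemd/system/nginx.service.d 65535
#   systemctl daemon-reload
#   systemctl restart nginx
```

After the restart, /proc/<worker_pid>/limits should show the new Max open files value for each worker.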
Prevention
- Size worker_connections for at least 2x your peak proxied request concurrency, then add headroom for keepalive idle connections and WebSockets.
- Set worker_rlimit_nofile to at least twice worker_connections, and verify it is not overridden by the init system or container runtime.
- Monitor the accepts - handled gap from stub_status. A nonzero rate is a leading indicator that appears before users complain.
- Monitor upstream response time percentiles. Rising backend latency is the most reliable predictor of future connection exhaustion.
- Set worker_shutdown_timeout so old workers do not linger indefinitely after reloads, hoarding slots and FDs on long-lived connections.
How Netdata helps
- Correlates nginx active connections, connection state breakdown, and the accepts-handled gap in one view so you can spot admission loss immediately.
- Tracks per-process file descriptor usage and warns when workers approach the worker_rlimit_nofile ceiling.
- Surfaces upstream response time metrics alongside nginx connection metrics, making it obvious when a slow backend is the root cause.
- Monitors kernel-level TcpExtListenOverflows to catch silent drops that never appear in nginx logs.
- Alerts on connection slot utilization ratio so you can act before the hard limit is reached.
Related guides
- nginx 413 Request Entity Too Large: client_max_body_size explained
- nginx 499 status code: why clients close connections before the response
- nginx 500 Internal Server Error: how to diagnose it
- nginx 502 Bad Gateway: causes and how to fix it
- nginx 503 Service Temporarily Unavailable: causes and fixes
- nginx 504 Gateway Time-out: causes and fixes
- NGINX access log performance: buffering, sampling, and the event loop
- NGINX active connections climbing: reading, writing, waiting explained
- nginx: bind() to 0.0.0.0:80 failed (98: Address already in use)
- NGINX backend cascade failure: when slow upstreams take down everything
- NGINX proxy cache hit rate is low: measuring and improving it
- nginx: configuration file test failed - finding the syntax error







