nginx: worker_connections are not enough — causes and fixes

Your error log shows worker_connections are not enough while connecting to upstream. New clients time out while existing connections may still work. This is a hard capacity cliff: once a worker exhausts its connection slots, it cannot accept new connections until a slot frees. The default limit is 512 per worker, not 1024, and in reverse-proxy mode each request consumes at least two slots. Raising the number is often the first reaction, but if a slow backend is holding connections open, the slots will fill again no matter how high you set the limit.

What this means

worker_connections is a per-worker hard ceiling set inside the events block. Every client connection, upstream connection, and idle keepalive connection counts as one slot. When a worker hits the limit, the kernel may still complete TCP handshakes and queue them in the listen backlog, but the worker has no slot to serve them: a connection it does accept is closed immediately, and the rest wait in the backlog. The stub_status counters reveal this through a growing gap between accepts and handled.
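To make the accepts and handled counters concrete, here is a minimal sketch that parses stub_status text from stdin and prints the gap alongside the state counts. The function name parse_stub_status and the sample numbers are illustrative, and the /nginx_status path in the usage comment is simply the one used elsewhere in this guide.

```shell
# Parse stub_status output on stdin and print key=value pairs,
# including the accepts-handled gap. The function name is illustrative.
parse_stub_status() {
  awk '
    /^Active connections:/ { active = $3 }
    # The counter line is the only one made of exactly three integers.
    NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      accepts = $1; handled = $2
    }
    /^Reading:/ { reading = $2; writing = $4; waiting = $6 }
    END {
      printf "active=%d accepts=%d handled=%d gap=%d reading=%d writing=%d waiting=%d\n",
             active, accepts, handled, accepts - handled, reading, writing, waiting
    }'
}

# Sample snapshot with made-up numbers, used instead of a live endpoint.
sample='Active connections: 291
server accepts handled requests
 16630948 16630900 31070465
Reading: 6 Writing: 179 Waiting: 106'

printf '%s\n' "$sample" | parse_stub_status

# Against a live server (assuming stub_status is exposed at this path):
#   curl -s http://127.0.0.1/nginx_status | parse_stub_status
```

A single nonzero gap only says that drops happened at some point since nginx started; comparing two snapshots taken some time apart shows whether the gap is still growing.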

For reverse-proxy workloads, effective capacity is at most half of worker_connections * worker_processes because each proxied request ties up one slot on the client side and one on the upstream side. File descriptors impose a second, independent ceiling. Each connection needs an FD, so worker_rlimit_nofile should be at least worker_connections * 2, plus headroom for logs and temp files. At startup nginx compares worker_connections with the open-file limit and logs a warning if the connection limit is higher, but it still starts.
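The arithmetic in this paragraph can be sketched as two small helpers. Both function names are hypothetical, and the default headroom of 64 descriptors is an assumption chosen for illustration, not an nginx constant.

```shell
# Rough capacity arithmetic for a reverse proxy. Names are illustrative.

# Upper bound on concurrent proxied requests: each one holds a client
# slot and an upstream slot.
# args: worker_connections worker_processes
proxy_capacity() {
  echo $(( $1 * $2 / 2 ))
}

# FD floor per worker following this guide's rule of thumb:
# worker_connections * 2 plus headroom (default 64 is an assumption).
# args: worker_connections [headroom]
min_nofile() {
  echo $(( $1 * 2 + ${2:-64} ))
}

proxy_capacity 512 4   # default limit, four workers
min_nofile 4096        # FD floor for worker_connections 4096
```

With the default of 512 slots and four workers, the ceiling is 1024 concurrent proxied requests, before counting any idle keepalive connections that also occupy slots.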

flowchart TD
    A[Client request arrives] --> B{Worker has free slot?}
    B -->|Yes| C[Accept client connection]
    C --> D[Open upstream connection]
    D --> E[Process response]
    B -->|No| F[Connection dropped]
    F --> G[accepts > handled gap grows]
    E --> H{Backend slow?}
    H -->|Yes| I[Slot held open]
    I --> B
    H -->|No| J[Close or keepalive]
    J --> B

Common causes

Cause | What it looks like | First thing to check
Slow backend holding connections | Writing state dominates; $upstream_response_time climbs; error log shows upstream timed out | stub_status state breakdown and access log upstream latency
worker_connections too low for proxy load | Active connections flat near worker_connections * worker_processes; accepts exceeds handled | stub_status and nginx -T for the configured limit
File descriptor limit below connection limit | accept4() failed (24: Too many open files) in error log; FD count near limit per worker | /proc/<worker_pid>/limits or prlimit
Excessive keepalive idle connections | Waiting state dominates; low request rate but high active connections | keepalive_timeout and keepalive_requests values
Traffic spike or connection flood | Sudden spike in active connections; Reading state may rise | Request rate from access log and ss SYN queue state

Quick checks

# Confirm the error and frequency
grep -c "worker_connections are not enough" /var/log/nginx/error.log

# Check connection counts and accepts vs handled gap
curl -s http://127.0.0.1/nginx_status

# Inspect configured limits
nginx -T 2>/dev/null | grep -E 'worker_connections|worker_rlimit_nofile|worker_processes'

# File descriptor usage per worker against its limit
for pid in $(pgrep -P $(cat /var/run/nginx.pid)); do
  used=$(ls /proc/$pid/fd 2>/dev/null | wc -l)
  max=$(awk '/^Max open files/ {print $4}' /proc/$pid/limits)
  echo "Worker $pid: $used / $max FDs"
done

# Kernel-level listen drops (silent to nginx)
nstat -az TcpExtListenOverflows 2>/dev/null | awk '/ListenOverflows/ {print $2}'

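# Upstream latency in recent requests (assumes $upstream_response_time is the last field of log_format)
# Non-numeric values such as "-" (requests that never reached an upstream) are skipped
tail -n 5000 /var/log/nginx/access.log | awk '$NF ~ /^[0-9.]+$/ {print $NF}' | sort -n | \
  awk 'BEGIN{c=0} {a[c++]=$1} END{if (c) print "p95:", a[int(c*0.95)], "p99:", a[int(c*0.99)]}'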

How to diagnose it

  1. Verify the symptom. Search the error log for worker_connections are not enough. Note whether it correlates with traffic spikes or appears continuously under normal load.
  2. Calculate slot utilization. Pull Active connections from stub_status and divide by worker_connections * worker_processes. For a reverse proxy, halve that denominator first, because Active connections counts only client connections and each one also holds an upstream slot. Above 80%, you are in the danger zone.
  3. Check for admission loss. Compare the accepts and handled counters in stub_status. If accepts exceeds handled and the gap is growing, nginx is actively dropping connections.
  4. Check the FD ceiling. Inspect /proc/<worker_pid>/limits for Max open files. If FD usage is within 20% of the limit, the real bottleneck is worker_rlimit_nofile or the OS ulimit, not worker_connections.
  5. Identify the connection state mix. High Writing with elevated $upstream_response_time points to slow backends. High Waiting with flat traffic points to keepalive hoarding. High Reading without corresponding throughput points to slow clients or a flood of new connections.
  6. Look for kernel drops. Check TcpExtListenOverflows. If it is increasing, the kernel is dropping SYNs or completed connections before nginx can accept them. This produces client-side timeouts with no entry in nginx logs.
  7. Correlate with backend health. If upstream latency rose before the error appeared, the backend is the root cause. Fixing the limit without fixing the backend will only postpone the next outage.
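Steps 2 and 5 above can be sketched as two helpers that take numbers read from stub_status. The function names and the example figures are made up for illustration, and the slots_per_connection argument assumes that Active connections reflects client-side connections only, so a proxied connection counts twice.

```shell
# Slot utilization as an integer percentage (truncated).
# args: active worker_connections worker_processes [slots_per_connection]
# slots_per_connection: 1 for direct serving, 2 for reverse proxying.
slot_utilization() {
  echo $(( $1 * ${4:-1} * 100 / ($2 * $3) ))
}

# Name the dominant connection state and the cause the steps tie to it.
# args: reading writing waiting
dominant_state() {
  if [ "$2" -ge "$1" ] && [ "$2" -ge "$3" ]; then
    echo "writing: check upstream latency"
  elif [ "$3" -ge "$1" ]; then
    echo "waiting: check keepalive settings"
  else
    echo "reading: check for slow clients or a connection flood"
  fi
}

# Example with made-up numbers: 900 active, 512 slots x 4 workers, proxying.
util=$(slot_utilization 900 512 4 2)
echo "utilization: ${util}%"
if [ "$util" -gt 80 ]; then
  echo "above the 80% danger zone"
fi
dominant_state 6 179 106
```

The classification is deliberately crude: it only names the largest state, so treat it as a pointer toward which log or metric to read next rather than a verdict.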

Metrics and signals to monitor

Signal | Why it matters | Warning sign
Connection slot utilization | Reveals how close you are to the hard per-worker cap | >80% of worker_connections * worker_processes sustained
Dropped connections (accepts - handled gap) | The earliest indicator that nginx is refusing work | Gap increasing over a 60-second window
Active connection state breakdown | Distinguishes backend slowness from keepalive bloat | Writing >60% of active with low throughput
File descriptor usage per worker | FD exhaustion mimics connection exhaustion | >80% of worker_rlimit_nofile or OS limit
Upstream response time (P95) | Slow backends are the most common hidden cause | P95 trending up or >80% of proxy_read_timeout
Kernel listen overflows | Connections dropped before nginx sees them | TcpExtListenOverflows counter increasing

Fixes

Raise the limit correctly

If the limit is genuinely too low for your traffic, increase worker_connections. Remember the default is 512, not 1024. In nginx.conf, worker_rlimit_nofile belongs in the main (top-level) context, while worker_connections goes inside the events block:

worker_rlimit_nofile 8192;

events {
    worker_connections 4096;
}

For a reverse proxy, worker_rlimit_nofile should be at least double worker_connections to cover client and upstream sockets plus logs and temp files. Reload with nginx -s reload; a restart is not required. If nginx runs under systemd, verify that LimitNOFILE in the service unit is not overriding your config.

Fix the backend first

If Writing connections are high and upstream latency is elevated, raising worker_connections will only delay the inevitable. The correct fix is to restore backend performance. As an emergency mitigation, you can reduce proxy_read_timeout so nginx gives up on slow upstreams faster. This trades 504 Gateway Time-outs for freed connection slots, which is usually preferable to total connection exhaustion.
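A back-of-the-envelope estimate shows why backend latency matters more than the limit itself. The sketch below applies Little's law (concurrency is roughly arrival rate times time in system) and doubles the result for the client and upstream slots. The function name and the traffic figures are hypothetical, and idle keepalive connections are ignored.

```shell
# Little's law sketch: concurrent requests ~= request rate x latency.
# Each proxied request holds two slots (client + upstream) while in flight.
# args: requests_per_second upstream_latency_seconds
slots_needed() {
  awk -v rps="$1" -v lat="$2" 'BEGIN { printf "%d\n", rps * lat * 2 + 0.5 }'
}

slots_needed 500 0.2   # healthy backend
slots_needed 500 4     # same traffic, degraded backend
```

At the same 500 requests per second, a backend that slows from 0.2 s to 4 s raises the estimate from 200 slots to 4000, which is why a lower proxy_read_timeout can free capacity faster than a higher worker_connections can add it.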

Reclaim keepalive capacity

If Waiting connections dominate, lower keepalive_timeout or keepalive_requests to recycle idle slots faster. In upstream blocks, ensure the keepalive pool size matches your concurrency; an oversized pool wastes slots, while an undersized pool causes excessive upstream connection churn.

Address file descriptor limits

When the error log shows accept4() failed (24: Too many open files), the FD limit is the real ceiling, not worker_connections. Raise worker_rlimit_nofile and verify the OS soft limit with prlimit -n -p <worker_pid>. If systemd manages the process, create a drop-in override for LimitNOFILE, run systemctl daemon-reload, and restart nginx.
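The systemd steps can be sketched as follows. This assumes the unit is named nginx.service; the drop-in file name nofile.conf, the helper name, and the limit of 65536 are illustrative choices rather than required values. The example writes to a scratch directory first so the result can be inspected before touching /etc.

```shell
# Write a systemd drop-in that sets LimitNOFILE.
# args: dropin_dir limit
# The file name "nofile.conf" is an arbitrary, illustrative choice.
write_nofile_dropin() {
  mkdir -p "$1" || return 1
  printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/nofile.conf"
}

# Dry run into a scratch directory to inspect the generated file.
scratch=$(mktemp -d)
write_nofile_dropin "$scratch/nginx.service.d" 65536
cat "$scratch/nginx.service.d/nofile.conf"

# On a real host, as root (assuming the unit is nginx.service):
#   write_nofile_dropin /etc/systemd/system/nginx.service.d 65536
#   systemctl daemon-reload
#   systemctl restart nginx
#   systemctl show nginx --property=LimitNOFILE
```

After the restart, the Max open files line in /proc/<worker_pid>/limits should reflect whichever of LimitNOFILE and worker_rlimit_nofile ends up applying to the workers.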

Prevention

  • Size worker_connections for at least 2x your peak proxied request concurrency, then add headroom for keepalive idle connections and WebSockets.
  • Set worker_rlimit_nofile to at least twice worker_connections, and verify it is not overridden by the init system or container runtime.
  • Monitor the accepts - handled gap from stub_status. A nonzero rate is a leading indicator that appears before users complain.
  • Monitor upstream response time percentiles. Rising backend latency is the most reliable predictor of future connection exhaustion.
  • Set worker_shutdown_timeout so old workers do not linger indefinitely after reloads, hoarding slots and FDs on long-lived connections.

How Netdata helps

  • Correlates nginx active connections, connection state breakdown, and the accepts-handled gap in one view so you can spot admission loss immediately.
  • Tracks per-process file descriptor usage and warns when workers approach the worker_rlimit_nofile ceiling.
  • Surfaces upstream response time metrics alongside nginx connection metrics, making it obvious when a slow backend is the root cause.
  • Monitors kernel-level TcpExtListenOverflows to catch silent drops that never appear in nginx logs.
  • Alerts on connection slot utilization ratio so you can act before the hard limit is reached.