nginx: too many open files - diagnosing file descriptor exhaustion
After a traffic spike, the error log shows accept4() failed (24: Too many open files), then goes silent. Existing connections still serve, but new ones cannot land.
File descriptor exhaustion is a hard failure. Once the limit is hit, nginx cannot accept new connections, open upstream sockets, or write to the error log. Default OS limits of 1024 are too low for production reverse proxies. Each proxied request consumes at least two FDs, and idle keepalive connections hold them indefinitely. The effective limit is the lower of worker_rlimit_nofile and the OS hard limit enforced by systemd or the container runtime.
What this means
Every active resource in nginx consumes one FD per worker: client sockets, upstream sockets, open log files, temporary files for large request or response bodies, entries held by open_file_cache, and internal event notifications. In reverse proxy mode, a single request ties up two sockets simultaneously, so FD demand is at least double the active connection count.
When a worker exhausts its allowance, accept() returns EMFILE. The kernel still completes TCP handshakes and queues connections in the listen backlog, but nginx cannot pull them into the event loop. If the backlog fills, the kernel silently drops new SYN packets. Existing connections continue to process, so the server looks partially healthy from the inside while appearing down to new clients. Because the error log file is also an FD, severe exhaustion can prevent nginx from recording further diagnostics.
flowchart TD
A[accept4 failed 24] --> B{Check proc PID limits}
B -->|Hard limit below config| C[Fix systemd or OS ulimit]
B -->|Limit is high| D{Check FD consumers}
D -->|Waiting high| E[Reduce keepalive timeout]
D -->|Writing high| F[Check upstream latency]
D -->|Files exceed sockets| G[Check temp and cache files]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| OS or systemd limit lower than worker_rlimit_nofile | accept4() failed (24) under moderate load; /proc/<pid>/limits shows a hard limit below the nginx config | awk '/^Max open files/ {print $5}' /proc/<worker_pid>/limits (column 5 is the hard limit, column 4 the soft limit) |
| worker_rlimit_nofile set below connection demand | Active connections plateau below worker_connections but FDs are maxed | Count the entries in /proc/<worker_pid>/fd and compare against worker_rlimit_nofile |
| Keepalive hoarding idle connections | High Waiting count in stub_status; FD usage climbs while throughput is flat | curl -s http://127.0.0.1/nginx_status and compare Waiting to total active |
| Missing upstream keepalive causing churn | Many upstream sockets in TIME_WAIT; high $upstream_connect_time | Line count of ss -tan state time-wait, and access log $upstream_connect_time values |
| Open file cache or temp files consuming headroom | Static file workloads with open_file_cache; FD count exceeds socket count | ls /proc/<worker_pid>/fd and count regular files versus sockets |
| FD leak in a module or configuration | FD count grows monotonically without matching connection growth | FD count sampled every minute from /proc/<pid>/fd |
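The leak symptom in the last row can be checked from a handful of samples. The sketch below is one possible way to do it, not a definitive detector: fd_trend is a hypothetical helper name, and it assumes one FD count per line on standard input, oldest sample first.

```shell
# fd_trend (hypothetical helper): reads one FD count per line, oldest first.
# Prints "growing" when the count never decreases and ends higher than it
# started, otherwise "not-monotonic".
fd_trend() {
  awk '
    NR == 1 { first = $1 + 0; prev = first; next }
    {
      cur = $1 + 0
      if (cur < prev) dropped = 1   # any decrease rules out a steady climb
      prev = cur
    }
    END {
      if (NR > 1 && !dropped && prev > first) print "growing"
      else print "not-monotonic"
    }'
}
```

To feed it, sample a worker once a minute, for example: for i in 1 2 3 4 5; do ls /proc/<worker_pid>/fd | wc -l; sleep 60; done | fd_trend. A "growing" result is only a hint. Compare it with connection counts over the same window before concluding there is a leak.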
Quick checks
Run these read-only commands to confirm the failure and locate the bottleneck.
# Confirm FD exhaustion in the error log
grep 'accept4() failed (24' /var/log/nginx/error.log | tail -5
# FD count per worker
for pid in $(pgrep -P $(cat /var/run/nginx.pid)); do
  echo "Worker $pid: $(ls /proc/$pid/fd 2>/dev/null | wc -l) FDs"
done
# Effective soft and hard limits for a worker
prlimit -n -p $(pgrep -P $(cat /var/run/nginx.pid) | head -1)
# Soft and hard limits from procfs (column 4 is soft, column 5 is hard)
awk '/^Max open files/ {print "Soft limit:", $4, "Hard limit:", $5}' \
  /proc/$(pgrep -P $(cat /var/run/nginx.pid) | head -1)/limits
# Connection state breakdown
curl -s http://127.0.0.1/nginx_status | \
awk '/Reading/ {print "R:"$2, "W:"$4, "Wait:"$6}'
# Kernel listen queue depth and silent drops
ss -tlnp | awk 'NR>1 && /nginx/ {print $4, "Recv-Q:", $2, "Send-Q:", $3}'
# -z also prints the counter while it is still zero
nstat -az TcpExtListenOverflows 2>/dev/null | \
  awk '/ListenOverflows/ {print "Kernel drops:", $2}'
# Relevant configuration directives
nginx -T 2>/dev/null | grep -E 'worker_rlimit_nofile|worker_connections|keepalive|open_file_cache'
How to diagnose it
- Confirm the error pattern. Look for accept4() failed (24: Too many open files) in the error log. If the log has gone silent under load, FD exhaustion may already be preventing new entries.
- Quantify per-worker FD consumption. Count entries in /proc/<worker_pid>/fd for each worker.
- Identify the effective hard limit. Read /proc/<worker_pid>/limits and compare it with worker_rlimit_nofile. The lower value wins.
- Classify FD consumers. Inside /proc/<pid>/fd, sockets dominate for proxy workloads. A high count of regular files points to temp files, logs, or open_file_cache pressure.
- Correlate with connection state. High Waiting means keepalive is hoarding FDs. High Writing with low throughput means slow upstreams are piling up active connections that each hold FDs.
- Check for silent kernel drops. TcpExtListenOverflows increasing, or ss showing Recv-Q near Send-Q, means connections are dropping before nginx can accept them.
- Verify the capacity math. For reverse proxy, worker_rlimit_nofile must cover at least two FDs per worker_connections plus log files, temp files, and cache headroom.
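The "classify FD consumers" step can be sketched as a small helper that groups descriptors by what their symlink points to. This is a minimal sketch under stated assumptions: classify_fds is a hypothetical name, it expects a directory laid out like /proc/<pid>/fd, and the category labels are illustrative choices rather than anything nginx reports.

```shell
# classify_fds (hypothetical helper): takes a directory such as /proc/<pid>/fd
# and prints "<category> <count>" for each kind of descriptor found in it.
classify_fds() {
  local dir=$1 fd target
  for fd in "$dir"/*; do
    # Entries are symlinks; skip anything that vanished or is not a link
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      socket:*)     echo socket ;;      # client and upstream connections
      pipe:*)       echo pipe ;;
      anon_inode:*) echo anon_inode ;;  # epoll, eventfd and similar
      /*)           echo file ;;        # logs, temp files, cached files
      *)            echo other ;;
    esac
  done | sort | uniq -c | awk '{print $2, $1}'
}
```

Run it as classify_fds /proc/<worker_pid>/fd, with enough privileges to read the worker's descriptors. If the file count is close to or above the socket count on a proxy workload, temp files or open_file_cache are worth a closer look.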
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| File descriptor usage per worker | Direct measure of proximity to the hard limit | >75% of limit sustained |
| Active connections vs worker_connections | Shows slot pressure; each proxy request uses two slots | >80% of worker_connections * worker_processes |
| Dropped connections (accepts - handled gap) | Confirms admission loss before total saturation | Gap increasing for >60 seconds |
| Kernel listen overflows (TcpExtListenOverflows) | Reveals silent drops invisible to nginx | Any nonzero rate of increase |
| Connection state breakdown (Reading/Writing/Waiting) | Distinguishes keepalive bloat from slow upstream | Waiting >80% of active |
| Upstream connect time | Detects keepalive pool miss and TIME_WAIT churn | Reuse rate low, connect time nonzero |
| Error log rate and content | FD exhaustion eventually kills logging itself | Emergence of accept4() failed (24) or sudden silence |
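Two of the signals in this table, the accepts - handled gap and the Waiting share of active connections, can be derived from a single stub_status response. The following is a sketch only: stub_status_signals is a hypothetical helper name, and it assumes the standard four-line stub_status text format on standard input.

```shell
# stub_status_signals (hypothetical helper): reads stub_status text on stdin
# and prints the accepts-handled gap and Waiting as a percentage of active.
stub_status_signals() {
  awk '
    /^Active connections:/           { active = $3 + 0 }
    /^ *[0-9]+ +[0-9]+ +[0-9]+/      { accepts = $1 + 0; handled = $2 + 0 }
    /^Reading:/                      { waiting = $6 + 0 }
    END {
      gap = accepts - handled
      pct = (active > 0) ? int(waiting * 100 / active) : 0
      printf "gap=%d waiting_pct=%d\n", gap, pct
    }'
}
```

With the status endpoint used elsewhere in this guide, a call would look like curl -s http://127.0.0.1/nginx_status | stub_status_signals. The gap is a cumulative counter, so what matters is whether it grows between samples, not its absolute value.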
Fixes
Align the OS limit with nginx config
If systemd or the container runtime enforces a hard limit below worker_rlimit_nofile, the configured value cannot take effect. Workers inherit the hard limit the master process was started under, so a reload cannot raise them above it. Raise the OS or container limit (for systemd, the LimitNOFILE= setting of the service unit), then restart nginx.
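A quick way to confirm the mismatch is to compare the hard limit in a worker's limits file with the value nginx is configured to use. The helpers below are a hedged sketch: hard_nofile_limit and limit_covers_config are hypothetical names, and the configured value is passed in by hand rather than parsed from the nginx config.

```shell
# hard_nofile_limit (hypothetical helper): prints the hard "Max open files"
# value (column 5) from a file formatted like /proc/<pid>/limits.
hard_nofile_limit() {
  awk '/^Max open files/ {print $5}' "$1"
}

# limit_covers_config (hypothetical helper): succeeds when the hard limit in
# the given limits file is at least the configured worker_rlimit_nofile.
limit_covers_config() {
  local hard configured=$2
  hard=$(hard_nofile_limit "$1")
  if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$configured" ]; then
    echo "ok: hard limit $hard >= worker_rlimit_nofile $configured"
  else
    echo "too low: hard limit $hard < worker_rlimit_nofile $configured"
    return 1
  fi
}
```

For example, limit_covers_config /proc/<worker_pid>/limits 65535, where 65535 stands in for whatever worker_rlimit_nofile is set to. On systemd hosts, systemctl show nginx -p LimitNOFILE reports the unit's setting, assuming the unit is named nginx.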
Raise worker_rlimit_nofile
Set worker_rlimit_nofile to at least double worker_connections, plus headroom for log files, temp files, and the open file cache. For a reverse proxy, each connection slot can require two FDs. Reload to apply. Tradeoff: FDs are cheap on modern systems, but the master process can only raise the limit up to the OS hard ceiling at worker spawn time.
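The sizing rule above is simple arithmetic, sketched here as a helper. The name required_nofile and the overhead figure are assumptions for illustration. The right headroom depends on how many log files, temp files, and cache entries a worker actually holds.

```shell
# required_nofile (hypothetical helper): applies the rule of thumb from this
# section: two FDs per connection slot plus a fixed headroom for logs,
# temp files, and the open file cache.
required_nofile() {
  local worker_connections=$1 overhead=$2
  echo $(( 2 * worker_connections + overhead ))
}
```

For instance, required_nofile 4096 256 yields 8448, so a worker_rlimit_nofile of 8448 or higher would satisfy the rule for 4096 connection slots with 256 FDs of assumed headroom.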
Shed idle keepalive connections
If Waiting connections dominate active connections, reduce keepalive_timeout for client connections and verify keepalive pool sizing in upstream blocks. Reload to apply. Tradeoff: lower timeouts increase TCP and TLS handshake overhead for repeat clients, but they free FDs immediately.
Enable or tune upstream keepalive
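If every proxied request opens a new upstream socket, configure keepalive inside the upstream block to reuse connections. For HTTP upstreams this typically also requires proxy_http_version 1.1 and an empty Connection header. Reuse replaces one new upstream socket per request with a bounded pool of idle connections per worker, which removes connect churn and TIME_WAIT buildup. Tradeoff: consumes upstream server connection slots and memory.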
Reduce open file cache or temp file pressure
Lower open_file_cache max= or reduce buffer sizes that spill to proxy_temp_path and client_body_temp_path. Tradeoff: slightly higher disk I/O for static files or large responses, but fewer simultaneous open file descriptors.
Emergency load shedding without restart
If you cannot restart, reload with a very low keepalive_timeout to force idle connections to close. If even a reload is too risky, block new traffic at the edge firewall or load balancer to reduce connection creation while preserving existing sessions. Tradeoff: impacts some clients but prevents total lockup.
Prevention
- Set worker_rlimit_nofile to at least 2x worker_connections per worker, plus margin for logs and cache.
- Verify the effective limit in /proc/<pid>/limits after every deployment, not just during config syntax checks.
- Monitor FD utilization percentage per worker as a core saturation signal.
- Keep upstream keepalive pools effective by logging $upstream_connect_time and targeting near-zero connect times.
- Treat TcpExtListenOverflows as a first-class signal. It reveals exhaustion before nginx logs do.
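The utilization check from this list can be sketched as a threshold test. This is an illustrative helper, not a monitoring implementation: fd_utilization is a hypothetical name, and the default threshold of 75 simply mirrors the warning level used in the metrics table.

```shell
# fd_utilization (hypothetical helper): prints WARN or OK plus the integer
# percentage of the limit in use. Third argument is the warning threshold
# in percent and defaults to 75.
fd_utilization() {
  local used=$1 limit=$2 warn=${3:-75}
  local pct=$(( used * 100 / limit ))
  if [ "$pct" -gt "$warn" ]; then
    echo "WARN ${pct}%"
  else
    echo "OK ${pct}%"
  fi
}
```

The inputs would come from the checks shown earlier: the entry count of /proc/<worker_pid>/fd for used, and the soft "Max open files" value for limit. A limit reported as "unlimited" is not handled here and would need a separate branch.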
How Netdata helps
Netdata charts per-process FD utilization for each worker against its limit without manual /proc scraping. It correlates nginx stub_status active connections with kernel TcpExtListenOverflows, and alerts on growing accepts - handled gaps and error log matches for accept4() failed (24). Connection state breakdowns distinguish keepalive bloat from upstream slowness.
Related guides
- How NGINX actually works in production: a mental model for operators
- NGINX active connections climbing: reading, writing, waiting explained
- NGINX connection exhaustion: detection, diagnosis, and prevention
- NGINX dropped connections: the accepts vs handled gap
- NGINX monitoring checklist: the signals every production server needs
- NGINX monitoring maturity model: from survival to expert
- nginx: worker_connections are not enough - causes and fixes
- NGINX worker_connections and worker_processes: sizing for real traffic







