nginx: too many open files - diagnosing file descriptor exhaustion

After a traffic spike, the error log shows accept4() failed (24: Too many open files), then goes silent. Existing connections still serve, but new ones cannot land.

File descriptor exhaustion is a hard failure. Once the limit is hit, nginx cannot accept new connections, open upstream sockets, or open new files such as temp files or logs being reopened after rotation. The common default soft limit of 1024 is too low for production reverse proxies. Each proxied request consumes at least two FDs (client socket and upstream socket), and idle keepalive connections hold theirs until keepalive_timeout expires. The effective limit is the lower of worker_rlimit_nofile and the OS hard limit enforced by systemd or the container runtime.
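The last sentence above comes down to taking a minimum. Below is a minimal POSIX shell sketch. `effective_nofile` is a hypothetical helper. It assumes the hard limit comes from field 5 of the `Max open files` line in `/proc/<pid>/limits` (field 4 is the soft limit).

```shell
#!/bin/sh
# Hypothetical helper: print the lower of the configured
# worker_rlimit_nofile ($1) and the inherited OS hard limit ($2).
effective_nofile() {
    if [ "$1" -le "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Example against a live worker (run as root):
#   pid=$(pgrep -P "$(cat /var/run/nginx.pid)" | head -1)
#   hard=$(awk '/^Max open files/ {print $5}' "/proc/$pid/limits")
#   effective_nofile 65535 "$hard"
```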

What this means

Every open resource in an nginx worker consumes one FD from that worker's limit. That includes client sockets, upstream sockets, open log files, temporary files for large request or response bodies, entries held by open_file_cache, and internal event and notification descriptors such as epoll. In reverse proxy mode, a single request ties up two sockets at once, so FD demand is at least double the number of active client connections.

When a worker exhausts its allowance, accept() returns EMFILE. The kernel still completes TCP handshakes and queues connections in the listen backlog, but nginx cannot pull them into the event loop. If the backlog fills, the kernel drops new connection attempts without nginx ever seeing them. Existing connections continue to process, so the server looks partially healthy from the inside while appearing down to new clients. Log files that are already open keep working. Anything that needs a new descriptor, such as reopening logs after rotation, can fail, so diagnostics may stop appearing where you expect them.

flowchart TD
    A[accept4 failed 24] --> B{Check proc PID limits}
    B -->|Hard limit below config| C[Fix systemd or OS ulimit]
    B -->|Limit is high| D{Check FD consumers}
    D -->|Waiting high| E[Reduce keepalive timeout]
    D -->|Writing high| F[Check upstream latency]
    D -->|Files exceed sockets| G[Check temp and cache files]
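The decision tree above can also be written as a shell function for a runbook. This is a sketch under stated assumptions:

- `fd_triage` is a hypothetical name.
- The inputs are values you have already collected with the quick checks.
- The 80% Waiting threshold is taken from the monitoring table.
- The "Writing high" threshold of half of active connections is an assumption.
- The order in which the branches are tested is a design choice, not part of the original chart.

```shell
#!/bin/sh
# Hypothetical helper walking the FD exhaustion decision tree.
# Arguments (integers):
#   $1 hard limit from /proc/<pid>/limits   $2 configured worker_rlimit_nofile
#   $3 active  $4 writing  $5 waiting       (from stub_status)
#   $6 regular files  $7 sockets            (from /proc/<pid>/fd)
fd_triage() {
    hard=$1 configured=$2 active=$3 writing=$4 waiting=$5 files=$6 sockets=$7
    if [ "$hard" -lt "$configured" ]; then
        echo "fix systemd or OS ulimit"
    elif [ "$files" -gt "$sockets" ]; then
        echo "check temp and cache files"
    elif [ $((waiting * 100)) -ge $((active * 80)) ]; then
        # Waiting at or above 80% of active connections
        echo "reduce keepalive timeout"
    elif [ $((writing * 2)) -ge "$active" ]; then
        # Assumed threshold: Writing at or above half of active
        echo "check upstream latency"
    else
        echo "no single dominant consumer; sample again under load"
    fi
}
```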

Common causes

Cause	What it looks like	First thing to check
OS or systemd limit lower than `worker_rlimit_nofile`	`accept4() failed (24)` under moderate load; `/proc/<pid>/limits` shows a hard limit below the nginx config	`awk '/^Max open files/ {print "soft:", $4, "hard:", $5}' /proc/<worker_pid>/limits`
`worker_rlimit_nofile` set below connection demand	Active connections plateau below `worker_connections` but FDs are maxed	`ls /proc/<worker_pid>/fd | wc -l` against `worker_rlimit_nofile`
Keepalive hoarding idle connections	High `Waiting` count in `stub_status`; FD usage climbs while throughput is flat	`curl -s http://127.0.0.1/nginx_status` and compare `Waiting` to total active
Missing upstream keepalive causing churn	Many upstream sockets in `TIME_WAIT`; high `$upstream_connect_time`	`ss -tan state time-wait | tail -n +2 | wc -l` and access log `$upstream_connect_time` values
Open file cache or temp files consuming headroom	Static file workloads with `open_file_cache`; FD count exceeds socket count	`ls /proc/<worker_pid>/fd` and count regular files versus sockets
FD leak in a module or configuration	FD count grows monotonically without matching connection growth	FD count sampled every minute from `/proc/<pid>/fd`
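For the leak case in the last row, the check is whether per-minute samples only ever go up. This is a minimal sketch. `fd_trend` is a hypothetical helper, and `<worker_pid>` in the usage comment is a placeholder. Monotonic growth alone does not prove a leak. Compare it with connection counts over the same window first.

```shell
#!/bin/sh
# Hypothetical helper: classify a series of FD counts (one per argument).
# "monotonic growth" = never decreases and ends above the first sample.
fd_trend() {
    [ "$#" -gt 0 ] || { echo "no samples"; return 1; }
    first=$1
    prev=$1
    for n in "$@"; do
        if [ "$n" -lt "$prev" ]; then
            echo "not monotonic"
            return 0
        fi
        prev=$n
    done
    if [ "$prev" -gt "$first" ]; then
        echo "monotonic growth"
    else
        echo "flat"
    fi
}

# Collecting five samples one minute apart (run as root):
#   samples=""
#   for i in 1 2 3 4 5; do
#     samples="$samples $(ls /proc/<worker_pid>/fd | wc -l)"
#     sleep 60
#   done
#   fd_trend $samples
```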

Quick checks

Run these read-only commands to confirm the failure and locate the bottleneck.

# Confirm FD exhaustion in the error log
grep 'accept4() failed (24' /var/log/nginx/error.log | tail -5

# FD count per worker
for pid in $(pgrep -P $(cat /var/run/nginx.pid)); do
  echo "Worker $pid: $(ls /proc/$pid/fd 2>/dev/null | wc -l) FDs"
done

# Effective soft and hard limits for a worker
prlimit -n -p $(pgrep -P $(cat /var/run/nginx.pid) | head -1)

# Soft and hard limits from procfs (fields 4 and 5 of "Max open files")
awk '/^Max open files/ {print "Soft limit:", $4, "Hard limit:", $5}' \
  /proc/$(pgrep -P $(cat /var/run/nginx.pid) | head -1)/limits

# Connection state breakdown
curl -s http://127.0.0.1/nginx_status | \
  awk '/Reading/ {print "R:"$2, "W:"$4, "Wait:"$6}'

# Kernel listen queue depth and silent drops
ss -tlnp | awk 'NR>1 && /nginx/ {print $4, "Recv-Q:", $2, "Send-Q:", $3}'
nstat -az TcpExtListenOverflows 2>/dev/null | \
  awk '/ListenOverflows/ {print "Kernel drops:", $2}'

# Relevant configuration directives
nginx -T 2>/dev/null | grep -E 'worker_rlimit_nofile|worker_connections|keepalive|open_file_cache'

How to diagnose it

Confirm the error pattern. Look for accept4() failed (24: Too many open files) in the error log. If the log has gone silent under load, FD exhaustion may already be preventing new entries.
Quantify per-worker FD consumption. Count entries in /proc/<worker_pid>/fd for each worker.
Identify the effective hard limit. Read /proc/<worker_pid>/limits and compare it with worker_rlimit_nofile. The lower value wins.
Classify FD consumers. Inside /proc/<pid>/fd, sockets dominate for proxy workloads. A high count of regular files points to temp files, logs, or open_file_cache pressure.
Correlate with connection state. High Waiting means keepalive is hoarding FDs. High Writing with low throughput means slow upstreams are piling up active connections that each hold FDs.
Check for silent kernel drops. TcpExtListenOverflows increasing, or ss showing Recv-Q near Send-Q, means connections are dropping before nginx can accept them.
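Verify the capacity math. worker_connections already counts upstream connections, and each connection holds one socket FD. worker_rlimit_nofile must therefore exceed worker_connections by enough to cover log files, temp files, and cache entries. A common rule of thumb for reverse proxies is at least 2x worker_connections.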
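Step 4 (classifying FD consumers) can be scripted by reading the symlink targets in an fd directory. This is a minimal sketch. `fd_breakdown` is a hypothetical helper, and reading another process's `/proc/<pid>/fd` requires root. Sockets appear as `socket:[inode]`, pipes as `pipe:[inode]`, epoll and eventfd as `anon_inode:...`, and regular files, including temp files marked `(deleted)`, as paths.

```shell
#!/bin/sh
# Hypothetical helper: count descriptors in an fd directory by type.
fd_breakdown() {
    dir=$1
    sockets=0 pipes=0 anon=0 files=0
    for fd in "$dir"/*; do
        [ -L "$fd" ] || continue        # skips the literal glob if empty
        target=$(readlink "$fd")
        case $target in
            socket:*)     sockets=$((sockets + 1)) ;;
            pipe:*)       pipes=$((pipes + 1)) ;;
            anon_inode:*) anon=$((anon + 1)) ;;
            *)            files=$((files + 1)) ;;
        esac
    done
    echo "sockets=$sockets pipes=$pipes anon=$anon files=$files"
}

# Example (run as root): fd_breakdown /proc/<worker_pid>/fd
```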

Metrics and signals to monitor

Signal	Why it matters	Warning sign
File descriptor usage per worker	Direct measure of proximity to the hard limit	>75% of limit sustained
Active connections vs `worker_connections`	Shows slot pressure; each proxy request uses two slots	>80% of `worker_connections * worker_processes`
Dropped connections (`accepts - handled` gap)	Confirms admission loss before total saturation	Gap increasing for >60 seconds
Kernel listen overflows (`TcpExtListenOverflows`)	Reveals silent drops invisible to nginx	Any nonzero rate of increase
Connection state breakdown (Reading/Writing/Waiting)	Distinguishes keepalive bloat from slow upstream	Waiting >80% of active
Upstream connect time	Detects keepalive pool miss and TIME_WAIT churn	Reuse rate low, connect time nonzero
Error log rate and content	FD exhaustion can block log reopening after rotation, hiding later errors	Emergence of `accept4() failed (24)` or sudden silence
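Two of these signals come straight from stub_status. This is a sketch that reads stub_status text on stdin and prints the `accepts - handled` gap and Waiting as a percentage of active connections. `stub_summary` is a hypothetical helper. The sketch assumes the standard four-line stub_status format and a status location at `/nginx_status`, as in the quick checks.

```shell
#!/bin/sh
# Hypothetical helper: summarize stub_status output read from stdin.
stub_summary() {
    awk '
        /^Active connections/ { active = $3 }
        # The counters line: accepts handled requests
        /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { accepts = $1; handled = $2 }
        /^Reading/ { waiting = $6 }
        END {
            pct = (active > 0) ? int(waiting * 100 / active) : 0
            printf "active=%d dropped=%d waiting_pct=%d\n", active, accepts - handled, pct
        }
    '
}

# Example: curl -s http://127.0.0.1/nginx_status | stub_summary
```

Because the counters are cumulative, the `dropped` value matters most as a trend. Sample it periodically and alert when the gap keeps growing.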

Fixes

Align the OS limit with nginx config

If systemd or the container runtime enforces a hard limit below worker_rlimit_nofile, the config value cannot take effect. Workers keep the limits they inherited, and nginx logs an alert of the form setrlimit(RLIMIT_NOFILE, <value>) failed. A reload does not help, because new workers inherit the master's limits. Raise the OS or container limit, then restart nginx.
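On systemd hosts, the usual way to raise the limit is a drop-in with LimitNOFILE. A single value there sets both the soft and the hard limit. This is a minimal sketch. `write_nofile_dropin` and the file name `nofile.conf` are hypothetical. The unit name `nginx.service` and the value 65535 are assumptions to adjust per host.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in raising the nofile limit.
#   $1 drop-in directory, e.g. /etc/systemd/system/nginx.service.d
#   $2 limit, e.g. 65535 (one value = same soft and hard limit)
write_nofile_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/nofile.conf"
}

# Apply as root, then restart (a reload keeps the old master's limits):
#   write_nofile_dropin /etc/systemd/system/nginx.service.d 65535
#   systemctl daemon-reload && systemctl restart nginx
#   grep 'Max open files' /proc/"$(cat /var/run/nginx.pid)"/limits
# Docker equivalent: docker run --ulimit nofile=65535:65535 <image>
```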

Raise worker_rlimit_nofile

Set worker_rlimit_nofile to at least double worker_connections, plus headroom for log files, temp files, and the open file cache. worker_connections already counts upstream sockets, so the doubling is a safety margin rather than a strict two-FDs-per-slot requirement. Reload to apply. Tradeoff: FDs are cheap on modern systems, but new workers can only raise their limit up to the OS hard ceiling inherited at spawn time.
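To check an existing config against this rule, read `nginx -T` output and compare. This is a sketch under stated assumptions:

- `rlimit_audit` is a hypothetical helper.
- The default headroom of 1024 is an assumed figure for non-connection descriptors.
- Only the first occurrence of each directive is used.

A missing worker_rlimit_nofile is reported as 0, meaning the inherited OS limit applies.

```shell
#!/bin/sh
# Hypothetical helper: compare worker_rlimit_nofile with
# 2 * worker_connections + headroom ($1, default 1024), reading stdin.
rlimit_audit() {
    headroom=${1:-1024}
    awk -v headroom="$headroom" '
        $1 == "worker_rlimit_nofile" && !r { gsub(";", "", $2); r = $2 + 0 }
        $1 == "worker_connections" && !c { gsub(";", "", $2); c = $2 + 0 }
        END {
            need = 2 * c + headroom
            status = (r >= need) ? "ok" : "too low"
            printf "worker_rlimit_nofile=%d suggested>=%d %s\n", r, need, status
        }
    '
}

# Example: nginx -T 2>/dev/null | rlimit_audit 2048
```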

Shed idle keepalive connections

If Waiting connections dominate active connections, reduce keepalive_timeout for client connections and verify keepalive pool sizing in upstream blocks. Reload to apply. Tradeoff: lower timeouts increase TCP and TLS handshake overhead for repeat clients, but idle descriptors are released as soon as the shorter timeout expires.

Enable or tune upstream keepalive

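If every proxied request opens a new upstream socket, configure keepalive inside the upstream block so idle connections are reused. Reuse replaces a connect and close per request, which removes TIME_WAIT churn and upstream connect latency, so each request holds its FDs for less time. Concurrent upstream requests still need one socket each. For HTTP proxying, reuse also requires proxy_http_version 1.1 and proxy_set_header Connection "". Tradeoff: each idle pooled connection holds an FD in every worker and consumes a connection slot and memory on the upstream server.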
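One way to confirm the change works is to count TIME-WAIT sockets toward the upstream before and after enabling keepalive. This is a minimal sketch that reads `ss -tan` output on stdin. `timewait_to_port` is a hypothetical helper, and port 8080 in the example is a placeholder for your upstream port. The peer port is matched as a suffix, so `:18080` is not counted as `:8080`.

```shell
#!/bin/sh
# Hypothetical helper: count TIME-WAIT sockets whose peer port is $1.
# Expects `ss -tan` columns: State Recv-Q Send-Q Local Peer.
timewait_to_port() {
    awk -v p=":$1" '
        $1 == "TIME-WAIT" && substr($5, length($5) - length(p) + 1) == p { n++ }
        END { print n + 0 }
    '
}

# Example: ss -tan | timewait_to_port 8080
```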

Reduce open file cache or temp file pressure

Lower open_file_cache max= so fewer cached descriptors stay open. Raise proxy_buffers or client_body_buffer_size so fewer bodies spill to proxy_temp_path and client_body_temp_path. Tradeoff: a smaller cache means more open() and stat() calls for static files, and larger buffers use more memory per connection.

Emergency load shedding without restart

If you cannot restart, reload with a very low keepalive_timeout to force idle connections to close. If even a reload is too risky, block new traffic at the edge firewall or load balancer to reduce connection creation while preserving existing sessions. Tradeoff: impacts some clients but prevents total lockup.

Prevention

Set worker_rlimit_nofile to at least 2x worker_connections per worker, plus margin for logs and cache.
Verify the effective limit in /proc/<pid>/limits after every deployment, not just during config syntax checks.
Monitor FD utilization percentage per worker as a core saturation signal.
Keep upstream keepalive pools effective by logging $upstream_connect_time and targeting near-zero connect times.
Treat TcpExtListenOverflows as a first-class signal. It reveals exhaustion before nginx logs do.
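FD utilization per worker can be computed from the same procfs sources as the quick checks. This is a sketch under stated assumptions:

- `fd_pct` and `fd_report` are hypothetical helpers.
- The pid file path is an assumption that varies by distribution.
- Reading other processes' `/proc` entries requires root.
- `pgrep -P` also lists cache manager or loader processes when they exist.

The percentage is taken against the soft limit (field 4), the value at which EMFILE is returned. The 75% warning mirrors the monitoring table.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of $1 (used) over $2 (limit).
fd_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Hypothetical helper: report FD usage for each child of the nginx master.
#   $1 pid file (default /var/run/nginx.pid)
fd_report() {
    master=$(cat "${1:-/var/run/nginx.pid}") || return 1
    for pid in $(pgrep -P "$master"); do
        used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
        soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
        pct=$(fd_pct "$used" "$soft")
        echo "worker $pid: $used/$soft FDs ($pct%)"
        [ "$pct" -gt 75 ] && echo "  warning: above 75% of limit"
    done
}

# Example (run as root): fd_report /run/nginx.pid
```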

How Netdata helps

Netdata charts per-process FD utilization for each worker against its limit without manual /proc scraping. It correlates nginx stub_status active connections with kernel TcpExtListenOverflows, and alerts on growing accepts - handled gaps and error log matches for accept4() failed (24). Connection state breakdowns distinguish keepalive bloat from upstream slowness.

