NGINX slowloris and slow-client attacks: detection and mitigation

You check stub_status and the Reading count is triple its normal value and not dropping. Requests per second has collapsed to near zero. Active connections are climbing toward worker_connections * worker_processes while worker CPU stays idle. This is not a backend slowdown. It is a slowloris or slow-client attack: connections open faster than they complete, and NGINX waits for data that arrives one byte at a time.

The default client_header_timeout and client_body_timeout of 60 seconds are too generous for most production traffic. client_body_timeout resets on every successive read, so a client sending one byte every 50 seconds can hold a slot indefinitely. Once enough slots are occupied, legitimate clients cannot connect. The symptom is connection exhaustion, but the root cause is behavioral: attackers abuse the wait state, not bandwidth.

What this means

NGINX allocates a connection structure and a worker event-loop slot for every accepted TCP connection. Normally, a connection moves quickly from Reading (receiving headers or body) to Writing (sending a response) or Waiting (keepalive idle). In a slowloris attack, the client sends headers or body so slowly that the connection stays in Reading for minutes. The worker does not stall because the event loop is non-blocking, but the slot is tied up. When all slots are consumed, the kernel backlog fills and overflows. New connections are then dropped before NGINX can log them.

flowchart TD
  A[Client opens connection] --> B{Sends data slowly?}
  B -->|Headers| C[NGINX stays in Reading]
  B -->|Body| C
  C --> D[Connection slot consumed]
  D --> E{All slots full?}
  E -->|No| C
  E -->|Yes| F[Kernel drops new connections]
  F --> G[Legitimate clients timeout]

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Slowloris header attack | Reading count sustained high; few source IPs hold many connections each; request rate near zero | stub_status Reading ratio and ss peer IP concentration |
| Slow HTTP POST body attack | Reading high during body transfer; throughput collapsed; body arrives byte-by-byte | client_body_timeout value and access log $request_time |
| Slow read attack | Writing count sustained high; upstream response time normal; total request time inflated | $request_time minus $upstream_response_time gap |
| Legitimate slow clients or large uploads | Reading elevated from many diverse IPs; correlates with known upload endpoints | URI patterns and geographic distribution of source IPs |

Quick checks

# Check connection state breakdown from stub_status
curl -s http://127.0.0.1/nginx_status | awk '/Reading/ {print "R:"$2, "W:"$4, "Wait:"$6}'
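# Identify source IPs with the most established connections.
# sport is the local (server) port; with a state filter ss omits the State
# column, so the peer address is field 4. NR > 1 skips the header line.
ss -tn state established '( sport = :80 or sport = :443 )' \
  | awk 'NR > 1 {print $4}' \
  | sed 's/]:[0-9]*$/]/; s/:[0-9]*$//' \
  | sort | uniq -c | sort -rn | head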
# Inspect current timeout directives (warning: nginx -T dumps full config, including secrets)
nginx -T 2>/dev/null | grep -E 'client_header_timeout|client_body_timeout|reset_timedout_connection'
# Calculate connection slot utilization
active=$(curl -s http://127.0.0.1/nginx_status | awk '/Active/ {print $3}')
workers=$(pgrep -c -P $(cat /var/run/nginx.pid))
wc=$(nginx -T 2>/dev/null | grep -m1 'worker_connections' | awk '{print $2}' | tr -d ';')
wc=${wc:-512}
max=$((workers * wc))
echo "Utilization: $(echo "scale=1; $active * 100 / $max" | bc)% ($active / $max)"
# Check if NGINX is already dropping connections
curl -s http://127.0.0.1/nginx_status | awk '/^[[:space:]]*[0-9]/ {print "gap=" ($1 - $2); exit}'
# Check error log for limit or resource exhaustion messages (adjust path if needed)
tail -n 500 /var/log/nginx/error.log | grep -iE 'limiting|accept4\(\) failed|too many open files|worker_connections are not enough'

How to diagnose it

  1. Confirm Reading dominance. A brief spike in Reading is normal during traffic bursts. Sustained Reading above 20% of active connections without a corresponding Writing spike signals a slow-client attack.
  2. Check for throughput collapse. Sample the stub_status requests counter twice, one second apart. If active connections are high but completions per second have collapsed, slots are occupied by incomplete requests.
  3. Identify source IP concentration. Use ss to list peer addresses in ESTABLISHED state. If a handful of IPs hold 50 or more connections each while legitimate traffic typically shows 1-5 connections per IP, you have an attack.
  4. Verify timeout settings. Run nginx -T | grep client_header_timeout. Empty output means the 60-second default is in effect. That is too long for most internet-facing applications.
  5. Distinguish from backend slowness. High Writing and elevated $upstream_response_time point to an upstream bottleneck. In a slowloris attack, Writing is low and upstream time is irrelevant because the request never reaches the upstream phase.
  6. Check admission loss. An increasing accepts - handled gap or climbing TcpExtListenOverflows means capacity is exhausted and the kernel is silently dropping new connections.
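
Steps 1 and 2 can be scripted. The following is a minimal bash sketch, not a finished tool: the function names and the STATUS_URL variable are illustrative, and it assumes stub_status is served at the same URL used in the quick checks.

```shell
# Assumed stub_status location; override via the environment if yours differs.
STATUS_URL="${STATUS_URL:-http://127.0.0.1/nginx_status}"

# Print Reading as a percentage of active connections.
# Reads stub_status text on stdin.
reading_pct() {
  awk '/^Active connections:/ { active = $3 }
       /^Reading:/            { reading = $2 }
       END {
         if (active > 0) printf "%.1f\n", reading * 100 / active
         else            print "0.0"
       }'
}

# Print the cumulative requests counter (third number on the counters line).
requests_total() {
  awk '/^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+/ { print $3; exit }'
}

# Sample the requests counter twice, one second apart, and print the delta.
# Each stub_status fetch is itself a request, so an idle server reports 1.
requests_per_second() {
  local first second
  first=$(curl -s "$STATUS_URL" | requests_total)
  sleep 1
  second=$(curl -s "$STATUS_URL" | requests_total)
  echo $(( second - first ))
}

# Usage:
#   curl -s "$STATUS_URL" | reading_pct
#   requests_per_second
```

For step 6, on Linux hosts with iproute2 installed, nstat -az TcpExtListenOverflows prints the absolute value of the kernel counter; run it twice and compare.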

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Reading / active connections ratio | Direct signature of slow-client attacks | >20% sustained for more than 5 minutes |
| Active connections vs maximum capacity | Cliff-edge exhaustion indicator | >80% of worker_connections * worker_processes |
| Requests per second | Throughput reality check | Falling while active connections rise |
| Accepts minus handled gap | Proof that admission control is failing | Increasing over successive samples |
| Connection slot utilization | Overall capacity headroom | >75% sustained |
| Per-IP connection count | Distinguishes attack concentration from organic growth | Single IP holding >50 connections |

Fixes

Reduce client timeouts

Lower client_header_timeout and client_body_timeout from the 60-second default to values that match your traffic. For typical HTTP APIs, 10-15 seconds is enough. For file uploads, use a location-specific override. Remember that client_body_timeout resets on each successive read. A client sending one byte every 9 seconds will never time out with a 10-second threshold. If you must support slow uploads, pair aggressive timeouts with strict per-IP connection limits instead of relying on timeout generosity alone. Test changes in staging first; a reload applies the new timeout to new connections only.
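
To see why the per-read reset matters, it helps to work the arithmetic. The sketch below is a simplified model, assuming the client delivers exactly one byte per read at a fixed interval just under client_body_timeout; slot_hold_seconds is an illustrative helper name, not an NGINX tool.

```shell
# Approximate seconds one connection stays in Reading when the body is
# trickled one byte at a time.
# Arguments: body size in bytes, seconds between one-byte writes.
slot_hold_seconds() {
  local body_bytes=$1 interval=$2
  echo $(( body_bytes * interval ))
}

slot_hold_seconds 1024 9   # 1 KiB body, one byte every 9 s -> 9216
```

Under these assumptions a 1 KiB body holds a slot for about two and a half hours despite a 10-second client_body_timeout. The declared body size is capped by client_max_body_size, so a smaller cap shortens the worst case.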

Enable reset_timedout_connection

Set reset_timedout_connection to on. By default, NGINX closes timed-out connections with a normal close (TCP FIN). If the client is unresponsive, the already closed socket can linger in FIN_WAIT1 with data still in its buffers, holding kernel memory. With reset_timedout_connection enabled, NGINX resets the connection (TCP RST) and the kernel releases the socket memory immediately. Timed-out keepalive connections are still closed normally. The tradeoff is an abrupt termination for the client, which is acceptable during an attack.
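
One way to check whether lingering sockets are actually a problem on a host is to count sockets in FIN_WAIT1 before and after the change. This is a small sketch assuming a Linux host with ss from iproute2; count_fin_wait1 is an illustrative name.

```shell
# Count TCP sockets currently in FIN-WAIT-1.
# NR > 1 skips the ss header line; prints 0 if ss is missing or nothing matches.
count_fin_wait1() {
  ss -tn state fin-wait-1 2>/dev/null | awk 'NR > 1 { n++ } END { print n + 0 }'
}

count_fin_wait1
```

A count that keeps growing during an attack suggests the default FIN-based close is leaving sockets behind.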

Enforce per-IP connection limits

Add a shared memory zone and a limit_conn rule:

limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn addr 50;

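Once an IP holds 50 concurrent connections, further requests from it are rejected with a 503 response (the limit_conn_status default). NGINX counts a connection toward the limit only after the whole request header has been read, so limit_conn constrains slow-body and slow-read clients, not connections still trickling headers; those are bounded by client_header_timeout. The tradeoff is that NATed users behind a corporate or mobile gateway share an IP. If your user base is heavily NATed, a low limit will block legitimate users. Start with a value above your normal per-IP peak and tune downward. If the zone fills, NGINX does not evict old entries; it returns the error to all further requests. Size the zone for at least 2x your expected peak unique IP count.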
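
Zone sizing can be estimated with simple arithmetic. The sketch below assumes roughly 16,000 states per megabyte, the figure the NGINX documentation gives for 64-byte states on 64-bit platforms, plus the 2x headroom suggested above; zone_size_mb is an illustrative helper, and the result is an estimate to round up from.

```shell
# Megabytes needed for a limit_conn zone keyed on $binary_remote_addr.
# Argument: expected peak number of unique client IPs.
zone_size_mb() {
  local peak_ips=$1 states_per_mb=16000 headroom=2
  # Integer ceiling of (peak_ips * headroom) / states_per_mb.
  echo $(( (peak_ips * headroom + states_per_mb - 1) / states_per_mb ))
}

zone_size_mb 50000   # 100,000 states with headroom -> 7
```

By the same estimate, the 10m zone in the example holds on the order of 160,000 states.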

Block at the firewall

If ss identifies specific attacking IPs, block them at the host firewall or upstream edge. This is faster than allowing the traffic to reach NGINX. For immediate host-level relief, use iptables -I INPUT -s <ip> -j DROP or an equivalent; -I inserts the rule at the top of the chain, whereas a rule appended with -A can land after an existing ACCEPT rule and never match. Distribute the block to your edge firewall if the attack volume threatens NIC saturation before it reaches the host TCP stack. The tradeoff is that distributed attacks rotate IPs, so firewall rules are temporary relief, not a structural fix.
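
When several addresses are involved, generating the rules for review is safer than typing them by hand. The following sketch reads "count address" lines, the shape produced by uniq -c in the quick checks, and prints one command per offender without executing anything. print_block_rules and the default threshold of 50 are illustrative, and bracketed IPv6 addresses are skipped because they need ip6tables.

```shell
# Print (do not run) an iptables command for every IPv4 address whose
# connection count is at or above the threshold.
# Input on stdin: lines of "<count> <address>".
print_block_rules() {
  local threshold=${1:-50}
  awk -v t="$threshold" '$1 >= t && $2 !~ /^\[/ {
    print "iptables -I INPUT -s " $2 " -j DROP"
  }'
}

# Example with documentation-range addresses:
printf '  120 203.0.113.7\n    3 198.51.100.2\n' | print_block_rules 50
```

Review the output before running it, since a NAT gateway serving many legitimate users can also exceed the threshold.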

Increase worker_connections (emergency only)

Raising worker_connections and reloading provides immediate headroom, but only buys time. It does not fix the attack. Every proxied request consumes two connection slots (client-facing plus upstream), so effective proxy capacity is half the configured value.
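
The capacity arithmetic can be sketched as follows. max_clients is an illustrative helper; it assumes every request is proxied when the third argument is 1, which is the worst case for slot consumption.

```shell
# Estimate how many client connections fit in the configured slots.
# Arguments: worker_processes, worker_connections, 1 if requests are proxied.
max_clients() {
  local workers=$1 worker_connections=$2 proxied=${3:-0}
  local slots=$(( workers * worker_connections ))
  if [ "$proxied" -eq 1 ]; then
    echo $(( slots / 2 ))   # each proxied request also holds an upstream slot
  else
    echo "$slots"
  fi
}

max_clients 4 1024      # static serving -> 4096
max_clients 4 1024 1    # reverse proxy  -> 2048
```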

Prevention

  • Set client_header_timeout and client_body_timeout to values that match your traffic profile. Do not leave the 60-second defaults on internet-facing servers.
  • Configure limit_conn_zone and limit_conn before an attack occurs. Monitor zone allocation errors (could not allocate node in the error log) to ensure the zone is large enough.
  • Set reset_timedout_connection on to prevent FIN_WAIT1 accumulation.
  • Monitor the Reading/active ratio proactively. A slow rise is easier to catch than a capacity cliff.
  • Size worker_connections for peak traffic plus attack headroom, and ensure worker_rlimit_nofile is at least double that to accommodate upstream connections, log files, and temp files.

How Netdata helps

Netdata collects stub_status Reading, Writing, and Waiting states every second. A Reading spike is visible as it happens. Active connections and slot-utilization metrics include pre-built thresholds for danger zones above 80%. Netdata tracks the accepts-handled gap and alerts when NGINX starts dropping connections. It correlates NGINX connection states with kernel TCP metrics like listen queue overflows, so you can confirm whether the kernel is already dropping connections.