NGINX active connections climbing: reading, writing, waiting explained

When operators see Active connections climbing in stub_status, the first instinct is often to add capacity. That instinct is usually wrong. The stub_status module exposes exactly seven metrics, and the most useful of them is the breakdown of active connections into Reading, Writing, and Waiting. The absolute number of active connections is almost meaningless without the ratio between these three states. A server with 10,000 active connections where 9,000 are Waiting is healthy. A server with 500 active connections where 400 are Reading may be under a slowloris-style attack.

This article explains how nginx assigns connections to each state, what the ratios reveal about your traffic, and why high numbers in one bucket are healthy while high numbers in another are pathological. If you understand these three numbers, you can stop chasing phantom upstream latency and start diagnosing the real bottleneck.

What active connections actually measure

Active connections is the number of client connections currently held open by nginx worker processes. This includes connections that are mid-request, client connections whose requests are being proxied to an upstream, and idle keepalive connections waiting for the next HTTP request. It does not include half-open TCP connections sitting in the kernel backlog, and it does not count the sockets nginx itself opens to upstream servers.

Each active connection consumes one slot against worker_connections (default 512 per worker) and one file descriptor. In reverse proxy mode, every request occupies at least two slots: one for the client-facing socket and one for the upstream socket. This means effective proxy capacity is at most half the configured limit, minus whatever idle keepalive sockets occupy on either side. Teams that set worker_connections to 1024 and expect 1,024 simultaneous proxied requests are off by a factor of two before accounting for keepalive.
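The two-slots-per-request arithmetic above can be sketched as a small shell helper. The function name `proxy_capacity` and the optional idle-keepalive argument are illustrative assumptions, and the result is a rough upper bound rather than a measured limit.

```shell
# proxy_capacity WORKERS WORKER_CONNECTIONS [IDLE_KEEPALIVE]
# Prints a rough upper bound on simultaneously proxied requests.
# Hypothetical helper: the name and the idle argument are illustrative.
proxy_capacity() {
  workers=$1
  per_worker=$2
  idle=${3:-0}                      # slots held by idle keepalive sockets (assumed input)
  total=$((workers * per_worker))   # total connection slots across all workers
  echo $(( (total - idle) / 2 ))    # each proxied request needs a client and an upstream slot
}
```

For example, `proxy_capacity 1 1024` prints 512, and `proxy_capacity 4 1024 200` prints 1948.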

HTTP/2 multiplexes many requests over a single TCP connection, so a lone active connection in Writing may actually carry dozens of concurrent streams. WebSocket connections count as a single active connection for their entire lifetime regardless of message rate. The active connection count is therefore a measure of socket occupancy, not request concurrency.

Because stub_status samples the state machine at the moment you query it, the numbers are point-in-time snapshots. A connection may transition from Reading to Writing to Waiting in a few milliseconds under normal load. Use sampling intervals of at least five to ten seconds and look at trends, not individual readings.
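One way to collect a trend rather than a single reading is a small polling loop. This is a minimal sketch: the function names, the `STATUS_URL` variable, and the `/nginx_status` location are assumptions, so adjust them to wherever stub_status is exposed in your configuration.

```shell
# Assumed status endpoint; override STATUS_URL to match your stub_status location.
STATUS_URL=${STATUS_URL:-http://127.0.0.1/nginx_status}

# Fetch one raw stub_status snapshot. Kept separate so it can be swapped out.
fetch_status() {
  curl -s "$STATUS_URL"
}

# sample_states COUNT INTERVAL_SECONDS
# Prints one line per sample: epoch_seconds reading writing waiting
sample_states() {
  count=$1
  interval=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    fetch_status | awk -v ts="$(date +%s)" '/Reading/ {print ts, $2, $4, $6}'
    i=$((i + 1))
    # Sleep between samples, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

Calling `sample_states 12 5` would take twelve samples five seconds apart, which is enough to see whether a state is trending or merely spiking.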

How the three states work

flowchart LR
  A[New connection] --> B[Reading]
  B -->|request complete| C[Processing]
  C -->|upstream wait or response send| D[Writing]
  D -->|keepalive enabled| E[Waiting]
  E -->|next request| B
  D -->|connection close| F[Close]
  E -->|keepalive timeout| F

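Reading. The connection is in Reading while nginx consumes the request line and headers from the client; the stub_status documentation defines this state as reading the request header. In normal operation, a connection should pass through Reading in milliseconds. A connection that stays in Reading for seconds or minutes is usually stalled waiting for bytes from a slow client. Request bodies are read after the header has been parsed, so a slow upload may show up under Writing instead; confirm with access logs before attributing sustained Reading to uploads.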

Writing. The connection enters Writing once nginx has finished reading the request and begins producing a response. This is the most misleading label in stub_status, because it also covers connections where nginx is still waiting for an upstream server to respond. During proxying, the connection remains in Writing while the worker waits on the upstream, buffers the upstream response, or streams it to the client. A proxy deployment may therefore spend most of its Writing time waiting on the backend, which shows up in $upstream_header_time, rather than transmitting bytes to the client.

Waiting. These are idle keepalive connections. Mathematically, Waiting equals Active connections minus Reading minus Writing. They hold open file descriptors and a small amount of memory but consume almost no CPU. A high Waiting count is evidence that clients are efficiently reusing connections, which is the intended behavior of HTTP/1.1 keepalive and HTTP/2. Waiting only becomes a problem when it consumes so many slots that new connections cannot be accepted.

You can sample the breakdown manually from the status endpoint:

# Check current state breakdown
curl -s http://127.0.0.1/nginx_status | awk '/Reading/ {print "R:"$2, "W:"$4, "Wait:"$6}'
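Since the ratios matter more than the raw counts, the same output can be turned into percentages of Active. The sketch below assumes the standard four-line stub_status text on stdin; the function name `state_ratios` is illustrative.

```shell
# state_ratios: reads stub_status text on stdin and prints each state as a
# percentage of Active connections. Hypothetical helper name.
state_ratios() {
  awk '
    /^Active connections:/ { active = $3 }
    /Reading:/             { reading = $2; writing = $4; waiting = $6 }
    END {
      if (active == 0) { print "no active connections"; exit }
      printf "active=%d reading=%.1f%% writing=%.1f%% waiting=%.1f%%\n",
             active, 100 * reading / active, 100 * writing / active, 100 * waiting / active
    }'
}
```

Piping the status endpoint into it, for instance `curl -s http://127.0.0.1/nginx_status | state_ratios`, gives a one-line summary suitable for logging.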

What the ratios tell you in production

Use ratios, not absolutes. A baseline of 10,000 active connections tells you nothing. A baseline where 80% are Waiting, 15% are Writing, and 5% are Reading describes a healthy keepalive-heavy workload. The table below maps patterns to their likely operational meaning.

| Pattern | Likely meaning | Correlation to check |
| --- | --- | --- |
| Waiting is 70-90% of Active | Healthy keepalive reuse | Connection slot utilization; if near limit, reduce keepalive_timeout instead of adding capacity |
| Writing is > 50% of Active, normal upstream latency | Slow client downstream (bandwidth or ACK throttling) | Gap between $request_time and $upstream_response_time |
| Writing is > 50% of Active, high upstream latency | Backend bottleneck | Upstream response time P95 and upstream header time |
| Reading is > 20% of Active, low request rate | Slowloris attack or stalled uploads | Per-IP connection concentration from ss or access logs |
| Reading spikes briefly during traffic burst | Legitimate connection initiation | Request rate spike that transitions into Writing within seconds |
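The thresholds in the table can be encoded as a first-pass classifier. This is only a sketch: the function name, the labels, and the order in which the checks run are assumptions, and the output tells you which correlation to check next rather than giving a diagnosis.

```shell
# classify_states ACTIVE READING WRITING WAITING
# Prints a coarse label using the thresholds from the table (20%, 50%, 70%).
# Hypothetical helper: labels and check order are illustrative choices.
classify_states() {
  awk -v a="$1" -v r="$2" -v w="$3" -v k="$4" 'BEGIN {
    if (a <= 0)        { print "idle"; exit }
    if (r / a > 0.20)  { print "reading-heavy"; exit }   # check per-IP concentration and request rate
    if (w / a > 0.50)  { print "writing-heavy"; exit }   # check upstream latency and request_time gap
    if (k / a >= 0.70) { print "waiting-heavy"; exit }   # healthy reuse; check slot utilization
    print "mixed"
  }'
}
```

With the two examples from the introduction, `classify_states 10000 500 500 9000` prints waiting-heavy and `classify_states 500 400 50 50` prints reading-heavy.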

When Writing dominates and upstream latency is elevated, the bottleneck is behind nginx. The worker is holding the connection open waiting for the upstream application. If upstream latency is normal but $request_time is much larger than $upstream_response_time, the client is slow to receive the response. The Writing state does not distinguish these two cases on its own; you must correlate with upstream timing.
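To separate the two cases from access logs, one option is to count requests where $request_time exceeds $upstream_response_time by a wide margin. The sketch below assumes a hypothetical log_format that writes those variables as `rt=` and `urt=` fields; the field names, the function name, and the one-second default threshold are all illustrative.

```shell
# slow_client_share [THRESHOLD_SECONDS]
# Reads access-log lines on stdin that contain rt=<request_time> and
# urt=<upstream_response_time> fields (an assumed, hypothetical log format).
slow_client_share() {
  awk -v thr="${1:-1}" '
    {
      rt = ""; urt = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^rt=/)  rt  = substr($i, 4)
        if ($i ~ /^urt=/) urt = substr($i, 5)
      }
      # Skip lines without a single numeric upstream time (for example "-").
      if (rt !~ /^[0-9.]+$/ || urt !~ /^[0-9.]+$/) next
      total++
      if (rt - urt > thr) slow++
    }
    END { printf "%d of %d requests spent more than %ss outside the upstream\n", slow, total, thr }'
}
```

A high share alongside normal upstream latency points at slow clients; a low share alongside high Writing points back at the backend.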

When Reading dominates with flat or falling throughput, clients are not completing their requests. During a slowloris attack, connections stay in Reading indefinitely because the attacker sends partial headers slowly. Lower client_header_timeout and client_body_timeout from their default 60 seconds and use limit_conn to cap concurrent connections per source IP.
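Before applying limits, it helps to confirm that the connections really are concentrated on a few sources. The helper below is a sketch that counts peer addresses given as `address:port` lines on stdin; the function name is an assumption.

```shell
# top_peers [N]
# Reads "address:port" lines on stdin and prints the N most frequent addresses
# with their connection counts. Hypothetical helper name.
top_peers() {
  # Strip the trailing :port, then count occurrences of each address.
  sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

A possible invocation is `ss -Htn state established '( sport = :80 or sport = :443 )' | awk '{print $4}' | top_peers 5`, assuming the peer address is the fourth column in your version of ss and that nginx listens on ports 80 and 443.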

Common misinterpretations

"High active connections mean imminent overload." This is only true if the count approaches worker_connections × worker_processes. If Waiting dominates the ratio, the sockets are idle and efficient. Check connection slot utilization before ordering more capacity.

"High Writing means nginx is slow." Writing measures time spent waiting for upstream or flushing to the client. Nginx itself is event-driven and non-blocking. If Writing is high, look at the backend or the client’s network, not the nginx host.

"High Reading means heavy request traffic." Sustained high Reading with a flat request rate means bytes are trickling in slowly, not that the server is busy processing. Large legitimate uploads are an exception, but those correlate with specific endpoints and content lengths.

"Waiting connections are a leak." They are the intended behavior of keepalive. They become a problem only when they crowd out capacity. If Waiting approaches your theoretical maximum, tune keepalive_timeout or keepalive_requests before raising worker_connections.

"The default worker_connections is 1024." The compiled-in default is 512. The 1024 figure is widespread because the sample nginx.conf distributed with nginx sets it explicitly, and many tutorials repeat it as if it were the default. Verify your configured ceiling with nginx -T | grep worker_connections rather than assuming.

Signals to watch in production

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Reading / Active ratio | Reveals stalled request intake or slowloris | Sustained > 20% without a large upload workload |
| Writing / Active ratio | Reveals downstream or upstream bottlenecks | Sustained > 50% with flat or falling request rate |
| Waiting / Active ratio | Measures keepalive efficiency | 70-90% is normal; approaching 100% may crowd capacity |
| Connection slot utilization | Hard capacity ceiling | Active / (worker_connections × worker_processes) > 80% |
| Request rate | Distinguishes live traffic from stuck sockets | Low RPS plus high Active means connections are blocked |
| Accepts vs. handled gap | Confirmed admission loss | Growing gap means nginx is dropping new connections |
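The last two capacity signals can be derived from a single stub_status snapshot. The sketch below assumes you pass in worker_processes and worker_connections yourself as positive integers; the function name and output format are illustrative. Because Active counts client connections only, the utilization figure is a lower bound on a proxy, where upstream sockets also consume slots.

```shell
# slot_report WORKER_PROCESSES WORKER_CONNECTIONS
# Reads stub_status text on stdin and prints slot utilization plus the
# cumulative accepts-minus-handled gap. Hypothetical helper name.
slot_report() {
  awk -v slots="$(( $1 * $2 ))" '
    /^Active connections:/ { active = $3 }
    # The counters line is the only one that starts with a number.
    /^ *[0-9]+ +[0-9]+ +[0-9]+/ { accepts = $1; handled = $2 }
    END {
      printf "slot_utilization=%.1f%% dropped=%d\n", 100 * active / slots, accepts - handled
    }'
}
```

The dropped figure is cumulative since nginx started, so compare two samples taken some time apart: only a growing gap indicates that connections are being refused now.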

How Netdata helps

  • Netdata exposes Reading, Writing, and Waiting as separate dimensions under the nginx collector, making ratio shifts visible in real time without parsing stub_status by hand.
  • Correlate the connection state breakdown with requests per second on the same dashboard. A divergence between active connections and throughput immediately reveals stuck sockets.
  • Alert on Reading dominance sustained for multiple minutes alongside low request throughput to catch slowloris patterns before connection slots exhaust.
  • Track connection slot utilization to distinguish healthy keepalive accumulation from genuine capacity pressure.
  • Cross-reference Writing spikes with upstream response time metrics to isolate backend slowdowns from client bandwidth limitations.