
Operations Guides

How NGINX actually works in production: a mental model for operators

NGINX is not a multi-threaded server that spawns a thread per connection. It is an event-driven, non-blocking, single-threaded-per-worker process architecture. If you are debugging a production incident where connections are timing out, memory is climbing, or CPU is pinned, this architecture is the lens through which every symptom must be interpreted.

Most production issues involving NGINX are not NGINX bugs. They are resource accounting problems: file descriptors, connection slots, buffer boundaries, or event loop latency. An operator who understands the internal mechanics can read the signals correctly instead of chasing phantom upstream problems or adding hardware that hits the same limit.

After reading this article, you will understand how the master process, worker event loops, connection state machine, upstream keepalive pools, and the proxy connection multiplier interact. You will know why capacity in NGINX is a set of hard cliffs rather than graceful curves, and which signals predict each cliff.

What it is and why it matters

NGINX uses a master/worker process model. The master process reads configuration, binds to listening ports, and spawns worker processes. It handles signal-based lifecycle events such as reload, upgrade, and shutdown. The master never processes client traffic directly. It holds the listening sockets open and passes them to workers via fork inheritance.

Worker processes handle all traffic. Each worker runs a single-threaded event loop. On Linux this loop uses epoll; on BSD it uses kqueue. The worker registers interest in socket events with the kernel, then iterates through ready events without blocking. This design allows one worker to manage tens of thousands of concurrent connections, but it also means that any blocking operation stalls every connection on that worker.
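The same pattern can be sketched in Python with the standard selectors module. Its DefaultSelector uses epoll on Linux and kqueue on BSD. This is only an illustration of a single-threaded, non-blocking accept-and-read loop, not NGINX's C code. The function name and the stop-after-N-requests condition are invented for the demo.

```python
import selectors
import socket


def run_worker_loop(listener: socket.socket, expected: int,
                    timeout: float = 2.0) -> list:
    """Accept connections and read one request from each, never blocking.

    Hypothetical demo loop: stops after `expected` requests, or when no
    socket becomes ready within `timeout` seconds.
    """
    sel = selectors.DefaultSelector()  # EpollSelector on Linux, KqueueSelector on BSD
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listener")
    requests = []
    try:
        while len(requests) < expected:
            events = sel.select(timeout=timeout)
            if not events:
                break  # nothing became ready in time
            for key, _mask in events:
                if key.data == "listener":
                    conn, _addr = listener.accept()  # ready, so accept() returns at once
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="client")
                else:
                    conn = key.fileobj
                    chunk = conn.recv(4096)  # ready, so recv() returns at once
                    if chunk:
                        requests.append(chunk)
                    sel.unregister(conn)  # one request per connection in this demo
                    conn.close()
    finally:
        sel.close()
    return requests
```

A single call to run_worker_loop services every ready socket in turn. No thread is ever parked waiting on one client.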

The mental model matters because capacity limits in NGINX are hard cliffs. When worker_connections is exhausted, new connections drop immediately. When file descriptors are exhausted, accept() fails with EMFILE. When the event loop stalls because of synchronous disk I/O or a slow system call, every connection on that worker slows down. Understanding these boundaries prevents misdiagnosing a connection limit as an upstream outage.

This architecture also means that monitoring must happen at multiple levels. The worker sees only what the kernel delivers through the event loop. Kernel-level drops, such as listen queue overflows, are invisible to NGINX logs entirely. A complete mental model requires looking at both the process internals and the surrounding OS boundary.

How it works

The architecture has four interacting layers: the master process, the per-worker event loop, the connection state machine, and upstream connection pooling.

```mermaid
flowchart TD
    A[Master Process] -->|fork + inherit sockets| B[Worker 1]
    A -->|fork + inherit sockets| C[Worker 2]
    A -->|fork + inherit sockets| D[Worker N]
    B -->|epoll / kqueue| E[Event Loop]
    C -->|epoll / kqueue| F[Event Loop]
    D -->|epoll / kqueue| G[Event Loop]
    E --> H[Connection States<br/>Reading / Writing / Waiting]
    F --> H
    G --> H
    H --> I[Upstream Keepalive Pool<br/>per worker]
    H --> J[Client Connection]
    I --> K[Upstream Server]
```

Master process. At runtime the master does almost nothing except manage worker lifecycle. It binds to ports before spawning workers so that workers inherit the listening file descriptors. A reload is triggered by a HUP signal or nginx -s reload. On reload, the master spawns new workers with the updated configuration while old workers keep serving existing connections until they drain. A failed reload does not stop NGINX; the previous configuration stays active. This means configuration drift can go undetected if you do not verify that new workers spawned.

Worker event loop. All workers share the listening sockets inherited from the master. With accept_mutex enabled, workers take turns accepting new connections. It has been off by default since NGINX 1.11.3, so the kernel wakes workers and they race to accept. With the reuseport parameter of the listen directive, available since NGINX 1.9.1, each worker gets its own listening socket and the kernel distributes connections among them. Inside the loop, the worker never blocks. It registers interest in the next I/O event and moves to the next ready connection.

Because the worker is single-threaded, some operations occupy it until they complete: the CPU work of a TLS handshake, gzip compression of a large response, or a disk write for a temporary file. Apart from optional thread pools that can offload some file I/O (aio threads), there are no background threads to absorb this work. This is why CPU profiling and disk I/O latency on the NGINX node matter as much as upstream health.
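Python's asyncio loop shares this property, so it works as an analogy. The sketch below measures how late a 10 ms timer fires when a sibling task makes a synchronous call. The function names are hypothetical. time.sleep stands in for blocking work such as a synchronous disk write. This demonstrates the scheduling effect, not NGINX internals.

```python
import asyncio
import time


async def measure_stall(blocking_seconds: float) -> float:
    """Return how late a 10 ms timer fires while a sibling task blocks the loop."""
    loop = asyncio.get_running_loop()

    async def blocking_handler() -> None:
        time.sleep(blocking_seconds)  # synchronous call: no other task runs meanwhile

    async def timer() -> float:
        start = loop.time()
        await asyncio.sleep(0.01)  # intended wake-up after ~10 ms
        return loop.time() - start - 0.01  # lateness beyond the intended delay

    timer_task = asyncio.create_task(timer())
    await asyncio.sleep(0)  # let the timer start before the blocking call
    await blocking_handler()
    return await timer_task
```

With a 0.2 s blocking call, the timer fires roughly 0.19 s late. An NGINX worker shows the same pattern: every connection on the worker waits out the stall.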

Connection state machine. Every accepted connection moves through discrete states: reading request headers, reading request body, processing (proxying, serving static files, executing subrequests), writing response headers, writing response body, and keepalive idle. Each state transition is an event. A connection in Reading is receiving data from the client. A connection in Writing is sending a response or waiting for an upstream response. A connection in Waiting is idle on keepalive between requests.

The stub_status module exposes these states as Reading, Writing, and Waiting counts. High Writing with elevated upstream response time means the upstream is slow. High Reading with low throughput means clients are sending slowly, or a slowloris-style attack is consuming slots. High Waiting is usually healthy connection reuse, but it still consumes a connection slot and a file descriptor.
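The stub_status page is plain text, so the breakdown is easy to scrape. Below is a minimal parser with a deliberately rough classifier. The 30% Reading threshold comes from the signals table later in this guide. The Writing-versus-Waiting rule is an assumption for illustration only.

```python
import re

STUB_STATUS_RE = re.compile(
    r"Active connections:\s*(?P<active>\d+)\s+"
    r"server accepts handled requests\s+"
    r"(?P<accepts>\d+)\s+(?P<handled>\d+)\s+(?P<requests>\d+)\s+"
    r"Reading:\s*(?P<reading>\d+)\s+Writing:\s*(?P<writing>\d+)\s+"
    r"Waiting:\s*(?P<waiting>\d+)"
)


def parse_stub_status(text: str) -> dict:
    """Turn the stub_status page into a dict of integer counters."""
    match = STUB_STATUS_RE.search(text)
    if match is None:
        raise ValueError("input does not look like stub_status output")
    return {name: int(value) for name, value in match.groupdict().items()}


def dominant_state(status: dict) -> str:
    """Rough reading of the state mix; thresholds are illustrative only."""
    active = status["active"] or 1  # avoid dividing by zero on an idle server
    if status["reading"] / active > 0.30:
        return "slow clients or slowloris-style load (Reading)"
    if status["writing"] >= status["waiting"]:
        return "responses in flight or slow upstreams (Writing); check upstream_response_time"
    return "mostly idle keepalive reuse (Waiting)"
```

Feed it the body served by whichever location exposes stub_status in your configuration. Confirm a Writing-heavy result against upstream timing before blaming the backend.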

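Buffer management. NGINX uses fixed-size per-connection buffers. Request headers are read into client_header_buffer_size, with large_client_header_buffers for overflow. Request bodies use client_body_buffer_size. Proxy responses use proxy_buffer_size for headers and proxy_buffers for the body. Headers never spill to disk. An oversized request line or header is rejected with a 414 or 400 error. An upstream header larger than proxy_buffer_size produces a 502. When a request or response body exceeds its buffers, NGINX spills it to a temporary file on disk. The spill is logged only at the warn level, which the default error log level (error) filters out. In practice it is therefore silent, and it produces latency spikes that do not correlate with upstream response time. The gap between request_time and upstream_response_time in the access log often reveals this disk I/O overhead.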
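The gap check can be automated from the access log. The sketch below assumes a hypothetical log_format ending in rt=$request_time urt="$upstream_response_time". The variables are real NGINX variables, but the line layout and function names are illustrative. The gap also includes time spent sending to slow clients, so treat it as a pointer to investigate, not proof of a spill.

```python
import re
from typing import Optional

# Hypothetical log_format tail:  rt=$request_time urt="$upstream_response_time"
TIMING_RE = re.compile(r'rt=(?P<rt>[\d.]+) urt="(?P<urt>[^"]*)"')


def upstream_seconds(field: str) -> Optional[float]:
    """Total of $upstream_response_time across every upstream tried.

    NGINX separates values with commas (several servers tried) and colons
    (internal redirects). "-" is what the access log writes for an empty
    variable, i.e. the request was not proxied.
    """
    values = [v.strip() for v in re.split(r"[,:]", field)]
    numbers = [float(v) for v in values if v not in ("", "-")]
    return sum(numbers) if numbers else None


def non_upstream_seconds(line: str) -> Optional[float]:
    """request_time minus upstream time: client I/O plus buffering and spill."""
    match = TIMING_RE.search(line)
    if match is None:
        return None
    upstream = upstream_seconds(match.group("urt"))
    if upstream is None:
        return None  # not a proxied request
    return float(match.group("rt")) - upstream
```

A request with rt=2.350 and urt="0.120" yields a gap of about 2.23 s, which the upstream cannot explain. If such gaps cluster on large responses, that points at body spooling.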

Upstream keepalive pools. The keepalive directive in an upstream block maintains persistent connections to backends. The pool is per worker, not global across all workers. Without keepalive, every proxied request opens a new TCP connection and, if the upstream uses HTTPS, a new TLS handshake. With keepalive, the worker reuses idle connections from its own pool. For HTTP upstreams, the pool stays empty unless proxy_http_version is set to 1.1 and the Connection header is cleared with proxy_set_header Connection "". Without these, NGINX sends Connection: close to the upstream by default. Total idle capacity equals the keepalive size multiplied by worker_processes. Because the pool is not shared, a worker under heavy load cannot borrow an idle connection from a peer worker.
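A toy Python model makes the per-worker accounting concrete. It is a simplification: it ignores keepalive_timeout and keepalive_requests, and the class and function names are hypothetical. Each worker keeps at most `cap` idle connections, which matches the per-worker meaning of the keepalive directive.

```python
from dataclasses import dataclass


@dataclass
class WorkerPool:
    """Idle upstream connections cached by one worker (the `keepalive N` cap)."""
    cap: int
    idle: int = 0
    opened: int = 0  # new TCP (and possibly TLS) connections to the upstream

    def checkout(self) -> None:
        if self.idle > 0:
            self.idle -= 1    # reuse a cached connection from this worker only
        else:
            self.opened += 1  # pool empty: open a fresh connection

    def checkin(self) -> None:
        if self.idle < self.cap:
            self.idle += 1    # keep it for the next request on this worker
        # otherwise the connection is closed: the pool is already full


def burst(pools: list, worker: int, concurrent: int) -> None:
    """`concurrent` simultaneous requests on one worker, then all complete."""
    pool = pools[worker]
    for _ in range(concurrent):
        pool.checkout()
    for _ in range(concurrent):
        pool.checkin()
```

Consider four workers with a keepalive of 16, warmed up evenly. Together they hold 64 idle connections. A 40-request burst on worker 0 still opens 24 new connections while 48 idle ones sit unused on the other workers.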

The 2x proxy connection multiplier. When NGINX acts as a reverse proxy, each proxied request consumes at least two connections: one from the client to NGINX and one from NGINX to the upstream. Both connections occupy a slot in worker_connections and both consume a file descriptor. Effective proxy capacity is at most half the configured worker_connections value, minus keepalive overhead on both sides. The default worker_connections of 512 means a maximum of roughly 256 simultaneous proxied requests per worker before the cliff.
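The arithmetic fits in a small helper. It gives an upper bound under the assumptions in its docstring. Listening sockets and other internal connections also take slots, so real headroom is slightly lower. The function name is hypothetical.

```python
def max_concurrent_proxied(worker_connections: int, worker_processes: int,
                           idle_upstream_keepalive: int = 0,
                           idle_client_keepalive: int = 0) -> int:
    """Upper bound on in-flight proxied requests across all workers.

    Each in-flight request holds two slots (client side and upstream side).
    Idle keepalive connections on either side hold one slot each; the
    keepalive counts are per worker.
    """
    free_per_worker = (worker_connections - idle_upstream_keepalive
                       - idle_client_keepalive)
    if free_per_worker < 0:
        raise ValueError("idle keepalive connections exceed worker_connections")
    return (free_per_worker // 2) * worker_processes
```

With the default 512 slots, one worker tops out at 256 in-flight proxied requests. Four workers that each hold 32 idle upstream connections top out at 960.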

Resource accounting. NGINX competes for several resources that each have hard limits:

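File descriptors: one per client connection, one per upstream connection, one per open log file, one per temp file. The per-worker ceiling is the process's RLIMIT_NOFILE. That is the value of worker_rlimit_nofile if set; otherwise it is the limit inherited from whatever started the master, such as the shell ulimit or a systemd LimitNOFILE setting. Default system limits, often 1024, are dangerously low.
Connection slots: pre-allocated per worker via worker_connections. The compiled-in default is 512, although the sample nginx.conf shipped with NGINX sets 1024.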
Memory: per-connection buffers, SSL buffers (ssl_buffer_size defaults to 16 KB per connection), and shared memory zones for rate limiting, caching, and session state.
CPU: TLS handshakes dominate at high connection rates. Gzip compression, regex evaluation in location blocks, and module execution also consume cycles.
Disk I/O: access logging is synchronous by default. Temp file spooling for large request or response bodies adds I/O latency.
Network: ephemeral port exhaustion can occur for upstream connections when keepalive is not configured or is ineffective.
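On Linux, both sides of the file descriptor comparison can be read from /proc. The sketch below assumes Linux and permission to read the target process's /proc entries. Inspecting a worker that runs as another user usually requires root. The function name is hypothetical.

```python
import os
from typing import Optional, Tuple


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open file descriptors, soft RLIMIT_NOFILE) for a process.

    Linux-only: counts entries in /proc/<pid>/fd and parses the
    "Max open files" row of /proc/<pid>/limits. None means "unlimited".
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit: Optional[int] = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: Max open files <soft> <hard> files
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit
```

Pass each worker's PID, for example from pgrep -f "nginx: worker", and compare open_fds / soft_limit against the 75% warning level in the signals table below.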

Where it shows up in production

The architecture behaves differently depending on the deployment pattern.

Standalone HTTP server. Fewer failure modes. Focus on connection handling, static file serving, and file descriptor limits. Standalone static file serving is the simplest case. The primary risks are file descriptor exhaustion when serving many small files, and disk I/O latency if the storage subsystem stalls. The connection multiplier does not apply, so worker_connections maps more directly to client capacity.

Reverse proxy / load balancer. Upstream health is the dominant concern. Proxy buffers, timeouts, keepalive pools, and the connection multiplier are critical. The most common failure pattern is backend cascade failure: upstreams slow down, workers hold connections waiting, connection slots fill, and the remaining healthy backends become overloaded.
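Little's law (in-flight requests equal arrival rate times time in the system) shows why this cascade arrives as a cliff. The sketch below assumes every request is proxied and ignores idle keepalive connections. The function names are hypothetical.

```python
def slots_held(requests_per_second: float, upstream_latency_s: float) -> float:
    """Connection slots held at steady state: two per in-flight proxied request."""
    in_flight = requests_per_second * upstream_latency_s  # Little's law
    return 2 * in_flight


def cliff_latency_s(worker_connections: int, worker_processes: int,
                    requests_per_second: float) -> float:
    """Upstream latency at which held slots equal total worker_connections."""
    return worker_connections * worker_processes / (2 * requests_per_second)
```

At 500 requests per second, 50 ms of upstream latency holds about 50 slots. Four workers at the default 512 slots run out once latency passes about 2 s. The request rate never changed; only the upstream got slower.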

SSL termination endpoint. CPU is bound by TLS handshakes. Session cache hit rate and TLS version distribution become key metrics. A TLS handshake storm can pin workers at 100% CPU while request throughput collapses.

Caching proxy. Disk I/O and cache hit rate become primary concerns. Cache zone metadata size matters. When the cache is cold after restart, all traffic hits upstream until the in-memory index rebuilds.

Kubernetes ingress controller. Frequent configuration reloads, dynamic upstream endpoints, old worker accumulation, and health check amplification change the failure modes. Each reload spawns new workers and drains old ones; without worker_shutdown_timeout, old workers with long-lived connections can linger indefinitely.

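Containerized deployments. File descriptor limits are often constrained by default. Logs usually go to stdout and stderr instead of files, which changes how they are collected and rotated. worker_processes auto counts the CPUs the process can see and does not read cgroup CPU quotas. A container with a small CPU limit on a large host may therefore start one worker per host CPU.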

Tradeoffs and when this matters

worker_connections versus memory. Each connection slot allocates memory for buffers and connection state. Raising worker_connections increases concurrency but also raises per-worker memory footprint. The default of 512 is often too low for production proxy workloads, but raising it without raising worker_rlimit_nofile and the OS file descriptor limit simply shifts the bottleneck to EMFILE errors.

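Per-worker keepalive pools. Because keepalive pools are not shared across workers, load imbalance between workers can leave some pools exhausted while others hold idle connections. With reuseport, the kernel assigns each new connection to a worker's socket by hashing its source and destination addresses and ports, and the connection stays on that worker. Traffic sometimes arrives from a few source IPs over long-lived connections, such as from a Layer 4 load balancer. In that case a handful of workers can carry most of the load, which worsens the imbalance.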
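A small simulation shows the effect. It models the kernel's choice as a hash of the connection's addresses and ports modulo the number of worker sockets. Linux uses its own internal hash; SHA-256, the addresses, and the counts here are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def assign_worker(src: tuple, dst: tuple, workers: int) -> int:
    """Model of reuseport selection: hash the 4-tuple, pick one worker socket."""
    key = f"{src[0]}:{src[1]}->{dst[0]}:{dst[1]}".encode()
    digest = hashlib.sha256(key).digest()  # stand-in for the kernel's hash
    return int.from_bytes(digest[:4], "big") % workers


def spread(sources: list, workers: int,
           dst: tuple = ("10.0.0.10", 443)) -> Counter:
    """Count how many connections from `sources` land on each worker."""
    return Counter(assign_worker(src, dst, workers) for src in sources)
```

Thousands of short connections from many ports spread almost evenly across eight workers. Four long-lived connections from one load balancer occupy at most four of the eight, and every request carried over them lands on those workers.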

Buffer sizing. Large proxy_buffers reduce disk I/O but increase memory per connection. Small buffers cause temp file spooling, which silently degrades latency. The gap between request_time and upstream_response_time in the access log reveals when buffering is the bottleneck.

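Shared memory zones. Zones used for rate limiting, connection limiting, and SSL session caching have fixed sizes. When a limit_req zone fills, NGINX evicts the least recently used entries to make room, so evicted clients start again with fresh counters. If a new entry still cannot be created, the request is rejected with an error. When a limit_conn zone fills, NGINX returns an error for all further requests, so an undersized zone becomes an outage rather than a bypass. Zone sizes cannot be changed via reload; they require a full restart.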

Signals to watch in production

Signal	Why it matters	Warning sign
Active connections / (worker_connections * worker_processes)	Connection slot utilization approaching the hard cliff	Sustained ratio above 0.75; above 0.9 is critical
Accepts minus handled (stub_status)	Dropped connections due to slot or FD exhaustion	Gap growing at any sustained rate
Reading / Writing / Waiting breakdown	Reveals whether load is slow clients, slow upstreams, or keepalive reuse	Reading above 30% of active sustained; Writing dominant with low throughput
File descriptors per worker vs limit	FD exhaustion blocks accepts, upstream connects, and log opens	Usage above 75% of limit
Upstream response time vs request time	Isolates backend latency from client send time and buffer spill latency	upstream_response_time is small but request_time is large
Upstream connect time	Connection reuse efficiency and network health to backends	Nonzero values increasing when keepalive is configured
Worker CPU per process	Event loop saturation from TLS, compression, or regex	Sustained above 80% of one core
Error log rate and severity	Leading indicator of resource exhaustion and upstream failures	Any emerg or alert; sustained error rate above baseline
TcpExtListenOverflows	Kernel dropping connections before NGINX sees them	Counter increasing
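Several of these thresholds can be checked mechanically from two consecutive stub_status samples. The sketch below uses hypothetical names. Note that stub_status counts client connections only, so on a reverse proxy the true slot usage is higher than the ratio computed here.

```python
from dataclasses import dataclass


@dataclass
class StubSample:
    active: int
    accepts: int
    handled: int
    reading: int


def slot_alerts(prev: StubSample, now: StubSample,
                worker_connections: int, worker_processes: int) -> list:
    """Apply the table's thresholds to two consecutive stub_status samples."""
    alerts = []
    utilization = now.active / (worker_connections * worker_processes)
    if utilization > 0.9:
        alerts.append(f"critical: slot utilization {utilization:.2f}")
    elif utilization > 0.75:
        alerts.append(f"warning: slot utilization {utilization:.2f}")
    # accepts minus handled only grows when connections are dropped
    dropped = (now.accepts - now.handled) - (prev.accepts - prev.handled)
    if dropped > 0:
        alerts.append(f"dropped {dropped} connections since last sample")
    if now.active and now.reading / now.active > 0.30:
        alerts.append("reading above 30% of active connections")
    return alerts
```

Run it on every scrape interval. A single sample proves little, and the table's warning levels are about sustained values.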

How Netdata helps

Correlates NGINX stub_status metrics (active connections, accepts, handled, requests, reading, writing, waiting) with per-worker CPU and memory to distinguish event loop saturation from upstream slowness.
Tracks file descriptor usage per process against configured limits, surfacing EMFILE risk before connections drop.
Monitors kernel-level TcpExtListenOverflows and socket backlog depth, exposing silent kernel drops invisible to NGINX logs.
Plots upstream response time and connect time from access log parsing alongside request time, making the proxy buffer overhead and backend latency components visible.
Alerts on worker process count deviations and reload events, catching old worker accumulation and failed reloads that leave stale configuration active.

The Netdata solution

Web server monitoring with Netdata

Netdata monitors NGINX with per-second request, connection, and latency metrics plus ML anomaly detection. Correlate connection and file-descriptor exhaustion, upstream cascade failures, buffer spill, and TLS CPU with the host signals behind them.

