NGINX limit_req burst and nodelay tuning: rate limiting without blocking real users

Most production nginx rate limiting configs fall into two camps: no burst at all, which rejects legitimate traffic during harmless spikes, or burst without nodelay, which queues real users into artificial delays that mimic upstream slowness. Neither is what you want. The limit_req module implements a leaky bucket at millisecond granularity, and the interaction between burst and nodelay determines whether a request is delayed, rejected, or forwarded immediately. Understanding that interaction, and sizing the shared memory zone to match your traffic profile, is the difference between rate limiting that protects upstreams and rate limiting that creates incidents during normal user behavior.

What it is and why it matters

limit_req_zone defines a shared memory zone and a sustained request rate. A matching limit_req directive inside a location, server, or http block enforces it. Without both pieces, nothing happens. The zone tracks state per key, typically $binary_remote_addr, and uses a leaky bucket to decide the fate of each request.

The default configuration, limit_req zone=one with no burst argument, rejects any request that arrives before the bucket has leaked enough capacity. At a rate of 10r/s, the bucket leaks one slot every 100 milliseconds. A second request arriving 10 ms after the first is rejected with a 503. This is correct for brute-force protection but catastrophic for legitimate browser bursts, API batching, or password managers that submit multiple forms rapidly.
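A minimal sketch of that default, no-burst setup follows. The zone name one matches the text; the /login/ path and the backend address are illustrative assumptions, not part of any particular deployment.

```nginx
# Fragment for the http context. /login/ and 127.0.0.1:8080 are placeholders.
http {
    # Track clients by binary IP in a 10 MB zone at a sustained 10 requests/second.
    limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

    server {
        listen 80;

        location /login/ {
            # No burst argument: a request arriving less than 100 ms after the
            # previous one from the same key is rejected (503 by default).
            limit_req zone=one;
            proxy_pass http://127.0.0.1:8080;
        }
    }
}
```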

Adding burst changes the behavior from immediate rejection to queuing, but without nodelay that queue translates into artificial latency. The last request in a full burst=20 queue at 10r/s waits up to two seconds (20 × 100 ms) before nginx forwards it to the upstream. If your upstream response time is 50 ms but nginx holds the request for two seconds, clients with short timeouts give up before the request is ever forwarded. Clients that time out generate 499s; retries amplify load.

The goal is absorbing short legitimate bursts without delaying them, while still enforcing a hard ceiling against sustained abuse. That is what nodelay does, and why the zone size matters more than most operators assume.

How it works

The leaky bucket operates at millisecond granularity. A rate of 10r/s means one request every 100 ms, not an average of ten requests over a one-second window. When a request arrives, nginx checks whether the bucket has accumulated at least one slot since the last request. If yes, the request passes immediately. If not, nginx looks at the burst parameter.

Without burst, the request is rejected immediately. The default rejection status is 503, unless you override it with limit_req_status 429.

With burst=N alone, excess requests enter a FIFO queue of length N and are released to the upstream at the configured rate, one by one. The Nth excess request waits up to N multiplied by (1/rate) before reaching the upstream. This queuing happens inside nginx; the upstream does not see the request until it is finally forwarded. During that wait, the client connection is held open. If the client timeout is shorter than the queue delay, the client closes the connection and nginx logs a 499.
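The queuing case can be sketched as follows, with the worst-case wait worked out in the comments. The zone name, the /api/ path, and the backend address are assumptions for illustration.

```nginx
# http context
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

# inside a server block
location /api/ {
    # Up to 20 excess requests per key are queued and released one every
    # 100 ms (1 / 10r/s). Worst-case wait for the last queued request:
    #   20 * 100 ms = 2 s
    # Excess requests beyond the 20 queued ones are rejected.
    limit_req zone=one burst=20;
    proxy_pass http://127.0.0.1:8080;
}
```

If typical client timeouts are shorter than that two-second worst case, this form is likely to produce 499s under bursty load.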

With burst=N nodelay, nginx allocates a pool of N burst slots per key. When a request arrives above the base rate, nginx immediately forwards it and marks one slot as consumed. No queuing delay is introduced. Slots are freed back into the pool at the configured rate. As long as a free slot exists, the request passes through without delay. When all N slots are occupied, subsequent requests are rejected immediately. This preserves the hard rate cap while eliminating artificial latency for bursty traffic.
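The nodelay variant differs from plain burst by a single argument. This sketch reuses the same assumed zone name, path, and backend address as placeholders.

```nginx
# http context
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

# inside a server block
location /api/ {
    # Up to 20 excess requests per key are forwarded immediately.
    # Burst capacity is restored at the configured rate (one slot per 100 ms);
    # once all 20 slots are in use, further requests are rejected at once.
    limit_req zone=one burst=20 nodelay;
    proxy_pass http://127.0.0.1:8080;
}
```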

The delay parameter offers a middle ground. A configuration like limit_req zone=one burst=12 delay=8 forwards the first eight burst requests without delay, then enforces the rate limit for the remaining four before the hard rejection threshold. It is useful when you want to allow a small spike but still apply backpressure before the full burst ceiling.

nodelay behaves like a delay value at least as large as burst: no excess request is ever delayed, and the burst ceiling still applies.
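A sketch of the two-stage form, using the burst=12 delay=8 values from the text. The rate, zone name, path, and backend address are assumptions.

```nginx
# http context
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

# inside a server block
location / {
    # Excess requests 1-8:   forwarded without delay.
    # Excess requests 9-12:  delayed so that the configured rate is respected.
    # Excess requests 13+:   rejected.
    limit_req zone=one burst=12 delay=8;
    proxy_pass http://127.0.0.1:8080;
}
```

Note that the delay argument requires nginx 1.15.7 or later.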

Burst is per-key and per-zone, not per-location. If two location blocks reference the same limit_req_zone, they share the same burst pool for each key. A burst of ten consumed by one location leaves zero available for the other. Order of arrival determines allocation.
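The coupling is easy to miss in a config because each location looks self-contained. In this sketch the zone name shared and both paths are hypothetical.

```nginx
# http context: one zone, therefore one bucket per client IP.
limit_req_zone $binary_remote_addr zone=shared:10m rate=10r/s;

# inside a server block
location /search/ {
    # Draws on the same per-key state as /export/ below.
    limit_req zone=shared burst=10 nodelay;
    proxy_pass http://127.0.0.1:8080;
}

location /export/ {
    # A client that has just used its 10 burst slots on /search/
    # has none left here until the bucket drains.
    limit_req zone=shared burst=10 nodelay;
    proxy_pass http://127.0.0.1:8080;
}
```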

Zone sizing determines how many unique keys nginx can track. On 64-bit systems, each entry consumes roughly 128 bytes, so one megabyte holds about 8,000 states and a 10 MB zone tracks approximately 80,000 unique IPs. Most guides quote the 32-bit figure of about 16,000 states per megabyte, which is double the 64-bit capacity. If you size the zone for 16,000 IPs per megabyte on a 64-bit production server, the zone fills at half the client count you planned for.
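The sizing arithmetic can be written next to the directive it justifies. The peak of 200,000 tracked IPs is a made-up figure for illustration, and the zone name per_ip is an assumption.

```nginx
# http context
# Sizing sketch, assuming ~128 bytes per state on a 64-bit build:
#   1 MB  -> about 8,000 keys
#   10 MB -> about 80,000 keys
# Hypothetical peak of 200,000 tracked client IPs:
#   200,000 * 128 bytes = 25.6 MB, rounded up to 32 MB for headroom.
limit_req_zone $binary_remote_addr zone=per_ip:32m rate=10r/s;
```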

When a limit_req_zone fills, nginx evicts the least recently used entries to make room for new keys. If it cannot free sufficient space, the request receives a 503. This means a sudden influx of previously unseen IPs can evict tracked keys and cause collateral damage to existing clients.

flowchart TD
    A[Request arrives] --> B{Key exists in zone?}
    B -->|No| C[Allocate node]
    C -->|Zone full| D[LRU evict oldest]
    D -->|Still full| E[Reject 503]
    B -->|Yes| F[Check bucket]
    F -->|Within rate| G[Forward immediately]
    F -->|Over rate| H{Burst slots free?}
    H -->|Yes| I{nodelay set?}
    I -->|Yes| J[Forward now, consume slot]
    I -->|No| K[Queue request, release at rate]
    H -->|No| L[Reject 503]
    J --> M[Free slot at leak rate]
    K --> M

Where it shows up in production

API endpoints are the most common site of misconfiguration. A mobile app that batches three requests on launch, or a single-page application that fetches data in parallel, can trigger a burst limit instantly. Without nodelay, those requests queue and the app times out.

Authentication endpoints suffer the same problem. Password managers and security-conscious users may submit login forms multiple times in quick succession. Immediate rejection teaches them the site is broken; queuing without nodelay teaches them the site is slow.

Browser behavior also matters. Modern browsers open six or more parallel connections per host to load assets. If you apply limit_req to static asset locations without nodelay, the browser stalls waiting for CSS or JavaScript while the bucket leaks.

Shared zones across locations create hidden coupling. If /api and /webhook both reference the same zone, a burst of webhooks can consume the entire burst quota and cause API requests to be rejected. Operators often assume each location gets its own burst pool.
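One way to remove that coupling is to give each path its own zone, so each keeps an independent bucket per client. The zone names, rates, and burst values below are illustrative assumptions rather than recommendations.

```nginx
# http context: separate zones mean separate per-key state.
limit_req_zone $binary_remote_addr zone=api:10m     rate=10r/s;
limit_req_zone $binary_remote_addr zone=webhook:10m rate=5r/s;

# inside a server block
location /api/ {
    limit_req zone=api burst=20 nodelay;
    proxy_pass http://127.0.0.1:8080;
}

location /webhook/ {
    # A webhook burst now consumes only the webhook zone's slots.
    limit_req zone=webhook burst=10 nodelay;
    proxy_pass http://127.0.0.1:8080;
}
```

The cost is memory: each zone is sized and filled independently.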

NAT and proxy aggregation silently break per-IP limits. A corporate gateway or a caching proxy may represent dozens of legitimate users with a single IP address. Per-IP rate limiting keyed to $binary_remote_addr treats all of those users as one entity. When the shared zone is undersized, the LRU eviction amplifies the problem by rotating tracked IPs under pressure, destabilizing limits for everyone.
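One possible mitigation, sketched here under assumptions, is to exempt known shared gateways from per-IP limiting by giving them an empty key, since nginx does not account requests whose key is empty. The variable names and the 203.0.113.0/24 range (a documentation-only prefix) are hypothetical; whether exempting a gateway is acceptable depends on your threat model.

```nginx
# http context
# Flag addresses that are known to front many legitimate users.
geo $limit_exempt {
    default         0;
    203.0.113.0/24  1;   # hypothetical corporate NAT range
}

# Exempt addresses get an empty key; everyone else is keyed by IP.
map $limit_exempt $limit_key {
    0 $binary_remote_addr;
    1 "";
}

# Requests with an empty key are not counted against the zone.
limit_req_zone $limit_key zone=per_ip:10m rate=10r/s;
```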

Tradeoffs and common misuses

Burst without nodelay is appropriate only when you want absolute traffic smoothing. The queue enforces a perfectly uniform rate to the upstream, but it adds latency under load. That latency can cascade into client timeouts, 499 errors, and retry storms that increase load instead of reducing it.

Burst with nodelay eliminates artificial delay but enforces a hard cap at rate plus burst. Once the burst pool is exhausted, requests are rejected immediately. This is the right choice for most web applications because it absorbs legitimate spikes without punishing the user, while still protecting the upstream from sustained overload.

The delay parameter is useful when you want to allow a small unconditional burst but still apply backpressure before the ceiling. It sits between the two extremes.

Leaving the default status mixes rate limiting with upstream outage signals, because limit_req_status defaults to 503. If your alerting treats 503 as “all backends are down,” rate-limited legitimate users will trigger a false incident page. Set limit_req_status 429 to separate capacity enforcement from backend health.
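A minimal sketch of that separation; the zone name, path, and backend address are placeholders.

```nginx
# http context
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

# inside a server block
location /api/ {
    limit_req zone=one burst=20 nodelay;
    # Rejected requests return 429 Too Many Requests instead of 503,
    # so 503s in logs and alerts point at the backend, not the limiter.
    limit_req_status 429;
    proxy_pass http://127.0.0.1:8080;
}
```

limit_req_status can also be set once at the http or server level so that it applies to every limited location.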

Undersizing the zone is the most dangerous misconfiguration. When limit_req_zone fills, LRU eviction begins. A flash crowd of new IPs can evict established tracking keys, so the state nginx held for existing clients is lost and their limits no longer behave predictably. If eviction cannot free enough space, nginx cannot create state for the incoming key and the request receives a 503. The error log shows "could not allocate node", but by then the limiter is already rejecting clients that did nothing wrong.

Assuming per-location isolation leads to quota starvation. Because burst is per-key across all locations sharing a zone, one high-traffic path can steal the burst allowance from another.

Signals to watch in production

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Rate limit rejection rate | Measures enforcement activity | Sudden spike indicates attack, flash crowd, or limits set too low |
| $limit_req_status | Distinguishes PASSED, DELAYED, and REJECTED | High DELAYED means burst is queuing without nodelay; unexpected REJECTED means limits are too tight |
| Zone allocation errors | Zone exhaustion breaks enforcement for new keys | Any "could not allocate node" in the error log is critical |
| Unique key count vs zone size | Proactive capacity planning | Unique clients approaching 80% of estimated zone capacity |
| Request time minus upstream response time | Detects artificial queuing delay | Gap growing under load indicates burst without nodelay |
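Several of these signals can be captured in the access log. The sketch below assumes a format name of ratelimit and a log path chosen for illustration; $limit_req_status requires nginx 1.17.6 or later.

```nginx
# http context
# limit: PASSED, DELAYED, or REJECTED (plus dry-run variants)
# rt:    total time nginx spent on the request, including any queuing delay
# urt:   time spent waiting on the upstream ("-" if no upstream was contacted)
log_format ratelimit '$remote_addr [$time_local] "$request" $status '
                     'limit=$limit_req_status '
                     'rt=$request_time urt=$upstream_response_time';

access_log /var/log/nginx/access.log ratelimit;
```

A growing difference between rt and urt on lines marked limit=DELAYED points at queuing inside nginx rather than a slow backend.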

How Netdata helps

  • Correlate 503 or 429 spikes with stub_status request rate and connection state breakdown to determine whether rate limiting or upstream failure is the source.
  • Monitor the nginx error log for "limiting requests" and "could not allocate node" messages to catch zone exhaustion before it disrupts enforcement.
  • Track the gap between $request_time and $upstream_response_time to detect artificial delay introduced by queued bursts.
  • Alert on 5xx rate anomalies that coincide with rate limit zone saturation, separating legitimate enforcement from unintended rejection of real users.