nginx limiting requests, excess – understanding limit_req rejections
When "[error] ... limiting requests, excess" appears in the nginx error log alongside 503 responses in the access log, you need to determine whether you are under attack, misconfigured, or out of shared memory. The ngx_http_limit_req_module implements a leaky bucket rate limiter. How it interacts with burst, nodelay, and shared memory sizing determines whether you reject malicious traffic, delay legitimate users, or quietly lose track of clients when the zone runs out of space.
What it is and why it matters
limit_req is nginx’s request-level rate limiter. It uses a shared memory zone, configured via limit_req_zone, to track request rates per key, typically $binary_remote_addr. The zone is mapped into every worker process. When a request arrives, nginx checks the key’s current rate against the configured limit. Depending on burst and nodelay, it delays the request, rejects it, or processes it immediately.
It is often the only layer between your upstream application and abusive traffic, but it is also a common source of self-inflicted degradation. burst without nodelay creates artificial latency spikes that look like upstream failures. A shared memory zone sized for yesterday's traffic fills up during a spike. nginx then evicts the least recently used states, which silently resets those clients' history. If eviction cannot free enough space, nginx starts rejecting requests outright.
How it works
At the core of limit_req is a leaky bucket algorithm. Each unique key (for example, a client IP) has a corresponding state in the shared memory zone. The configured rate, such as 10r/s in limit_req_zone, is the leak rate. Requests arriving faster than this rate fill the bucket. When the bucket overflows, nginx rejects the request.
A typical configuration:
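```nginx
# http context: 10 MB zone "one", keyed by client address, 10 requests/second per key
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    location /api/ {
        # Allow up to 5 excess requests and serve them without delay
        limit_req zone=one burst=5 nodelay;
    }
}
```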
The limit_req_zone directive defines the zone name, size, key, and rate. The limit_req directive applies that zone to a location. The optional burst parameter defines how many excess requests the bucket can hold above the rate. The optional nodelay parameter determines whether excess requests wait or proceed immediately.
Default behavior with no burst: any request that would exceed the rate is rejected immediately. With rate=10r/s, for example, a request that arrives less than 100 ms after the previous accepted request for the same key is rejected. By default, nginx returns HTTP 503 Service Unavailable. Setting limit_req_status 429; distinguishes rate limits from backend failures.
With burst but without nodelay: excess requests are delayed to conform to the leak rate. If arrivals consistently outpace the leak rate, the burst queue fills, and subsequent excess requests are rejected (503 by default). This queuing adds latency to $request_time but not to $upstream_response_time, which can mislead operators into blaming the backend.
With burst and nodelay: excess requests are processed immediately, but they still occupy burst slots, and those slots drain only at the configured leak rate. A request that arrives while every slot is still occupied is rejected. This absorbs spikes without adding latency but offers no smoothing.
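A minimal Python model of one key's bucket makes the three cases above concrete. It is a simplified sketch of the excess accounting that limit_req performs. Excess is tracked in thousandths of a request, which is why the log shows values like 0.300. The model covers a single key and a single zone, assumes arrivals are sorted by time, and ignores shared memory and the newer delay= parameter. KeyState and simulate are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class KeyState:
    excess_milli: int  # queued excess, in thousandths of a request
    last_ms: int       # time of the last accepted request, in milliseconds


def simulate(arrivals_ms: List[int], rate_rps: int, burst: int,
             nodelay: bool) -> List[Tuple[bool, int]]:
    """Return (accepted, delay_ms) for each arrival of a single key."""
    rate_milli = rate_rps * 1000   # leak rate, scaled like the excess counter
    burst_milli = burst * 1000
    state: Optional[KeyState] = None
    results: List[Tuple[bool, int]] = []
    for now in arrivals_ms:
        if state is None:
            # First request for the key: a fresh state with an empty bucket.
            state = KeyState(excess_milli=0, last_ms=now)
            results.append((True, 0))
            continue
        elapsed = now - state.last_ms
        # Drain what leaked since the last accepted request, then add this one.
        excess = max(state.excess_milli - rate_milli * elapsed // 1000 + 1000, 0)
        if excess > burst_milli:
            # Overflow: reject and leave the stored state untouched.
            results.append((False, 0))
            continue
        state.excess_milli = excess
        state.last_ms = now
        # Without nodelay, the request waits until its excess has leaked away.
        delay_ms = 0 if nodelay else excess * 1000 // rate_milli
        results.append((True, delay_ms))
    return results


if __name__ == "__main__":
    # Eight simultaneous requests with rate=10r/s and burst=5.
    print(simulate([0] * 8, rate_rps=10, burst=5, nodelay=False))
    # [(True, 0), (True, 100), (True, 200), (True, 300), (True, 400), (True, 500), (False, 0), (False, 0)]
    print(simulate([0] * 8, rate_rps=10, burst=5, nodelay=True))
    # [(True, 0), (True, 0), (True, 0), (True, 0), (True, 0), (True, 0), (False, 0), (False, 0)]
```

Both runs accept the same six requests and reject the same two. The only difference is whether the accepted ones wait. With burst=0, the model rejects any request that arrives less than 100 ms after the previous accepted one at 10r/s.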
When a rejection occurs, nginx logs an error line similar to:
[error] 1234#1234: *123 limiting requests, excess: 0.300 by zone "one", client: 192.0.2.1
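The line includes the worker process ID and thread ID (1234#1234), the connection number (*123), the excess value, the zone name, and the client IP. In a real log, the line is prefixed with a timestamp and usually followed by server, request, and host fields. These lines record rejections only. Delayed requests are logged one level lower, which is warn with the default limit_req_log_level of error.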
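Counting rejections per client is one way to tell a broad flood from a single noisy client. The sketch below parses the rejection line shown above. It assumes the default error log layout, tolerates a leading timestamp and trailing fields, and uses the hypothetical names LIMIT_REQ_RE, parse_limit_req, and top_clients.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Matches the rejection line format shown above; a timestamp prefix and
# trailing ", server: ..., request: ..." fields are tolerated by search().
LIMIT_REQ_RE = re.compile(
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#(?P<tid>\d+): \*(?P<conn>\d+) "
    r"limiting requests, excess: (?P<excess>\d+\.\d+) "
    r'by zone "(?P<zone>[^"]+)", client: (?P<client>[^,\s]+)'
)


def parse_limit_req(line: str) -> Optional[dict]:
    """Extract the fields of a limit_req rejection line, or return None."""
    m = LIMIT_REQ_RE.search(line)
    if m is None:
        return None
    return {
        "level": m["level"],
        "pid": int(m["pid"]),
        "tid": int(m["tid"]),
        "conn": int(m["conn"]),
        "excess": float(m["excess"]),
        "zone": m["zone"],
        "client": m["client"],
    }


def top_clients(lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count rejections per client address, most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_limit_req(line)
        if rec is not None:
            counts[rec["client"]] += 1
    return counts.most_common(n)
```

Rejections spread across many clients point toward a flood. Rejections concentrated on one or two addresses point toward a scraper or a misbehaving client.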
Memory sizing is critical. The key $binary_remote_addr uses a fixed-width binary representation: 4 bytes for IPv4 and 16 bytes for IPv6. With IPv4 keys, each state occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms, so a 1 MB zone holds roughly 16,000 or 8,000 states respectively. As a conservative rule of thumb, assume the slab allocator's internal overhead reduces effective capacity to roughly 80 to 85 percent of the configured zone size. On a 64-bit platform, estimate capacity as zone_size * 0.85 / 128.
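The rule of thumb above can be turned into a small calculator. This sketch assumes 128-byte states (64-bit platform) and the 85 percent usable fraction from the text, neither of which is measured here. It also applies the 2x headroom recommended in the zone-sizing advice later in this guide. zone_capacity and required_zone_mib are hypothetical helpers.

```python
import math

MIB = 1024 * 1024      # nginx's "m" size suffix is 1024 * 1024 bytes
STATE_BYTES = 128      # per-state size assumed for 64-bit platforms
USABLE_PERCENT = 85    # rule-of-thumb slab efficiency from the text, not measured


def zone_capacity(zone_bytes: int) -> int:
    """Estimated number of states a zone of zone_bytes can track."""
    return zone_bytes * USABLE_PERCENT // (100 * STATE_BYTES)


def required_zone_mib(peak_unique_keys: int, headroom: int = 2) -> int:
    """Smallest whole-MiB zone size covering headroom x the peak key count."""
    needed_bytes = peak_unique_keys * headroom * STATE_BYTES * 100 / USABLE_PERCENT
    return math.ceil(needed_bytes / MIB)


if __name__ == "__main__":
    print(zone_capacity(10 * MIB))    # 69632 states for zone=one:10m
    print(required_zone_mib(50_000))  # 15, i.e. zone=one:15m
```

Integer arithmetic keeps the capacity estimate free of floating-point rounding. The result is still only as good as the 85 percent assumption.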
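If the zone fills, nginx first removes the least recently used state to make room for the new key. That eviction is silent: no log line is written, and an evicted client simply starts over with an empty bucket the next time it appears. If nginx still cannot allocate a new state after eviction, for example because the slab allocator cannot free enough space, it logs could not allocate node. It then terminates the request with an error (the limit_req_status code, 503 by default). Either way, an undersized zone weakens enforcement without producing any "limiting requests" lines.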
```mermaid
flowchart TD
    A[Request arrives] --> B[Look up key in limit_req_zone]
    B --> C{Within leak rate?}
    C -->|Yes| D[Forward to upstream]
    C -->|No| E{Burst configured?}
    E -->|No| F[Reject with 503]
    E -->|Yes| G{Burst capacity available?}
    G -->|No| F
    G -->|Yes| H{Nodelay configured?}
    H -->|Yes| D
    H -->|No| I[Delay until leak slot available]
    I --> D
```
Where it shows up in production
The most obvious symptom is the error log entry. A sudden flood of "limiting requests, excess" messages from many different IPs suggests an attack or a flash crowd. A steady stream from the same IP suggests a misbehaving client or a scraper. Rejections clustered on authentication endpoints often indicate credential stuffing.
In the access log, the default manifestation is HTTP 503. If you have not changed limit_req_status, your 5xx error rate will include these rejections. This is dangerous for monitoring: a generic 5xx alert cannot distinguish between a rate-limited client and a dead upstream. Many operators explicitly set limit_req_status 429 so that rate limit rejections are visually and programmatically distinct from service failures.
If you have configured burst without nodelay, you may not see 503s at all during moderate spikes. Instead, latency climbs in $request_time while $upstream_response_time remains flat. The gap between the two is time spent queued in the leaky bucket. Browsers and mobile clients may time out before the delayed request ever reaches your application.
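To measure that gap, both variables have to be in the access log. The sketch below assumes a hypothetical log_format ending in rt=$request_time urt="$upstream_response_time". The helper queue_gap is also hypothetical. Because $request_time also covers reading the request and sending the response to the client, treat the gap as an upper bound on limit_req queueing rather than an exact measurement.

```python
import re
from typing import Optional

# Assumes a hypothetical log_format that ends with:
#   rt=$request_time urt="$upstream_response_time"
TIMING_RE = re.compile(r'\brt=(?P<rt>\d+\.\d+) urt="(?P<urt>[^"]*)"')


def queue_gap(line: str) -> Optional[float]:
    """Seconds of $request_time not spent on upstream responses, or None."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    upstream = 0.0
    seen_upstream = False
    # Several upstream attempts are listed as "0.010, 0.020", with ":" between
    # groups after an internal redirect; "-" means no value was recorded.
    for part in re.split(r"[,:]", m["urt"]):
        part = part.strip()
        if part and part != "-":
            upstream += float(part)
            seen_upstream = True
    if not seen_upstream:
        return None  # e.g. a request rejected by limit_req never reached upstream
    return round(float(m["rt"]) - upstream, 3)
```

A gap that grows during spikes while the upstream component stays flat matches the burst-without-nodelay pattern described above.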
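Zone exhaustion is harder to detect. When limit_req_zone fills, nginx evicts the least recently used states without logging anything, so idle clients lose their history and start fresh. Your access logs keep showing 200s. Only when eviction cannot free enough space does nginx log could not allocate node and reject the request. Those rejections carry the same 503 (or 429) status as ordinary rate limiting. Comparing the number of unique keys you see against the zone's estimated capacity is often the only early warning.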
The NAT aggregation caveat is particularly important for consumer-facing services. Corporate proxies, mobile carrier NAT, and CGNAT collapse many users behind a single public IP. A limit keyed to $binary_remote_addr treats that entire population as one client. If one user behind the NAT exceeds the limit, all users behind that IP are delayed or rejected.
Tradeoffs and common misuses
- Burst without nodelay. Operators add burst=20 to accommodate legitimate traffic, but without nodelay those requests queue and drain at the leak rate. During a spike, the queue fills, requests time out, and the remainder are rejected. The result looks like an upstream failure but is a configuration choice.
- Using $remote_addr instead of $binary_remote_addr. The text representation varies in length and is larger for IPv4 addresses. $binary_remote_addr is always 4 or 16 bytes, which keeps state sizes predictable and is more efficient for slab allocation.
- IP-keyed limits behind NAT. Any rate limit based on $binary_remote_addr penalizes all users behind a shared IP. For applications with large populations behind CGNAT or corporate proxies, consider alternative keys such as authenticated user IDs or API tokens, though these require application cooperation.
- Zone undersizing. A limit_req_zone sized for 10,000 unique IPs will start evicting states, and eventually rejecting requests with could not allocate node, when your traffic grows to 50,000. Resizing requires a configuration change, so size for at least 2x your expected peak unique key count.
- Default 503 status. The default limit_req_status is 503, which conflates rate limiting with upstream failures. Change it to 429 unless you have a specific reason not to.
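The key-width difference can be seen with Python's standard ipaddress module. This sketch compares only the key bytes. It assumes the text form matches what $remote_addr would hold. nginx's per-state bookkeeping is not modeled, so this does not predict zone capacity. key_widths is a hypothetical helper.

```python
import ipaddress
from typing import Tuple


def key_widths(addr: str) -> Tuple[int, int]:
    """Bytes used by a binary-form key versus a text-form key for one address."""
    binary = ipaddress.ip_address(addr).packed  # always 4 bytes (IPv4) or 16 bytes (IPv6)
    text = addr.encode("ascii")                 # length depends on how the address is written
    return len(binary), len(text)


if __name__ == "__main__":
    for addr in ("192.0.2.1", "203.0.113.255", "2001:db8:85a3::8a2e:370:7334"):
        print(addr, key_widths(addr))
```

The binary width depends only on the address family. The text width changes from one client to the next.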
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
| Error log “limiting requests, excess” | Confirms the rate limiter is actively rejecting requests (delays are logged one level lower) | Sudden 10x increase indicates attack, flash crowd, or misconfiguration |
| HTTP 503 or 429 rate | Measures the user-visible impact of rejections | Sustained >1% of total traffic may mean limits are too restrictive for legitimate load |
| “could not allocate node” in error log | The zone is full and evicting the least recently used state did not free enough space, so the request is rejected | Any occurrence means the zone is undersized; investigate zone sizing immediately |
| Unique keys vs zone capacity | Ensures the zone can track your actual client population | Unique keys approaching zone_size * 0.85 / 128 (estimated capacity with 128-byte states) |
| Latency gap between $request_time and $upstream_response_time | Detects artificial delay from burst without nodelay | Gap grows during traffic spikes while upstream remains healthy |
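The warning signs in the table can be encoded as simple checks. This sketch evaluates a single interval against a baseline using the table's thresholds, which are starting points rather than universal values. "Sustained" would require the same check to fire over several consecutive intervals. The Interval fields and limit_req_warnings are hypothetical, and how the counts are collected is left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    """Counts for one monitoring interval (field names are hypothetical)."""
    total_requests: int
    rate_limited: int    # 503/429 responses attributed to limit_req
    excess_lines: int    # "limiting requests, excess" error log lines
    alloc_failures: int  # "could not allocate node" error log lines


def limit_req_warnings(current: Interval, baseline: Interval) -> List[str]:
    """Apply the table's warning signs to one interval against a baseline."""
    warnings = []
    if current.total_requests and current.rate_limited / current.total_requests > 0.01:
        warnings.append("rate-limited responses above 1% of traffic")
    # A 10x jump is only meaningful if the baseline saw some rejections.
    if baseline.excess_lines and current.excess_lines >= 10 * baseline.excess_lines:
        warnings.append("limiting requests lines up 10x over baseline")
    if current.alloc_failures:
        warnings.append("could not allocate node: check zone sizing")
    return warnings
```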
How Netdata helps
- Correlate nginx 503/429 spikes with the total request rate from stub_status. If rejections rise without a traffic surge, your limits may be too tight for legitimate load.
- Surface error log rates by parsing nginx logs, including "limiting requests" and "could not allocate node" lines, without manual tailing.
- Compare $request_time against $upstream_response_time to identify queue latency from burst without nodelay.
- Track active connections to help size limit_req_zone capacity against peak concurrent populations.
- Distinguish rate-limit 503s from backend-cascade 503s by correlating with upstream response time and health state.