nginx limiting requests, excess -- understanding limit_req rejections

When [error] ... limiting requests, excess appears in nginx error logs alongside 503 responses in access logs, determine whether you are under attack, misconfigured, or out of shared memory. The ngx_http_limit_req_module implements a leaky bucket rate limiter. Its interaction with burst, nodelay, and shared memory sizing determines whether you reject malicious traffic, delay legitimate users, or evict tracking state and fail requests when the zone runs out of memory.

What it is and why it matters

limit_req is nginx’s request-level rate limiter. It uses a shared memory zone, configured via limit_req_zone, to track request rates per key, typically $binary_remote_addr. The zone is mapped into every worker process. When a request arrives, nginx checks the key’s current rate against the configured limit. Depending on burst and nodelay, it delays the request, rejects it, or processes it immediately.

It is often the only layer between your upstream application and abusive traffic, but also a common source of self-inflicted degradation. burst without nodelay creates artificial latency spikes that look like upstream failures. A shared memory zone sized for yesterday’s traffic can run out of room during a spike: nginx then evicts the least recently used states, which weakens enforcement, and terminates requests with an error if it still cannot allocate a new state.

How it works

At the core of limit_req is a leaky bucket algorithm. Each unique key (for example, a client IP) has a corresponding state in the shared memory zone. The configured rate, such as 10r/s in limit_req_zone, is the leak rate. Requests arriving faster than this rate fill the bucket. When the bucket overflows, nginx rejects the request.
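The bucket arithmetic described above can be sketched in a few lines. This is a simplified model for one key, not nginx's implementation: the class name LeakyBucket and its fields are hypothetical, time is passed in as plain seconds, and it assumes that a rejected request leaves the stored state unchanged.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LeakyBucket:
    """Simplified model of the state kept for one key (hypothetical names)."""
    rate: float                    # leak rate, in requests per second
    burst: int = 0                 # requests tolerated above the rate
    excess: float = 0.0            # current bucket level, in requests
    last: Optional[float] = None   # arrival time of the last accepted request

    def offer(self, now: float) -> Tuple[bool, float]:
        """Return (accepted, excess) for a request arriving at `now` seconds."""
        if self.last is None:
            # First request seen for this key: the bucket starts empty.
            self.last = now
            return True, 0.0
        # Drain at the leak rate since the last accepted request,
        # then add the request that just arrived.
        level = max(self.excess - self.rate * (now - self.last) + 1.0, 0.0)
        if level > self.burst:
            # Overflow: reject. Assumption: stored state is left untouched.
            return False, level
        self.excess, self.last = level, now
        return True, level


if __name__ == "__main__":
    bucket = LeakyBucket(rate=10.0)          # models rate=10r/s with no burst
    for t in (0.0, 0.07, 0.1):
        print(t, bucket.offer(t))
```

At 10r/s with no burst, a second request 70 ms after the first is rejected because only 0.7 of a request has drained; a request at 100 ms is accepted because the bucket is empty again.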

A typical configuration:

limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=one burst=5 nodelay;
    }
}

The limit_req_zone directive defines the zone name, size, key, and rate. The limit_req directive applies that zone to a location. The optional burst parameter defines how many excess requests the bucket can hold above the rate. The optional nodelay parameter determines whether excess requests wait or proceed immediately.

Default behavior with no burst: any request that would exceed the rate is immediately rejected. By default, nginx returns HTTP 503 Service Unavailable. You can change this with limit_req_status 429; to distinguish rate limits from backend failures.

With burst but without nodelay: excess requests are delayed to conform to the leak rate. If arrivals consistently outpace the leak rate, the burst queue fills, and subsequent excess requests are rejected with 503. This queuing creates latency in $request_time but not in $upstream_response_time, which can mislead operators into blaming the backend.

With burst and nodelay: excess requests are processed immediately, but they still consume bucket capacity. The slot each one occupies is freed only at the configured leak rate. If another request arrives while every burst slot is still occupied, it is rejected. This absorbs spikes without adding latency but offers no smoothing.
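To compare the three modes side by side, the following sketch replays a list of arrival times for a single key and labels each request. It is an illustrative model under stated assumptions, not nginx code: the function name simulate is hypothetical, the delay for a queued request is taken as its excess divided by the rate, and rejections are assumed not to change the bucket.

```python
from typing import List, Optional, Tuple


def simulate(arrivals: List[float], rate: float, burst: int,
             nodelay: bool) -> List[Tuple[str, float]]:
    """Label each arrival (seconds, ascending) as pass, delay, or reject.

    Returns (outcome, delay_seconds) pairs for one key.
    """
    excess = 0.0
    last: Optional[float] = None
    outcomes: List[Tuple[str, float]] = []
    for now in arrivals:
        if last is None:
            level = 0.0  # first request: empty bucket
        else:
            level = max(excess - rate * (now - last) + 1.0, 0.0)
        if level > burst:
            outcomes.append(("reject", 0.0))  # bucket state stays as it was
            continue
        excess, last = level, now
        if level > 0 and not nodelay:
            # Queued: held until the bucket has drained this request's excess.
            outcomes.append(("delay", level / rate))
        else:
            outcomes.append(("pass", 0.0))
    return outcomes


if __name__ == "__main__":
    spike = [0.0] * 8  # eight requests arriving at the same instant
    print("no burst:     ", simulate(spike, 10.0, 0, False))
    print("burst=5:      ", simulate(spike, 10.0, 5, False))
    print("burst=5 nodelay:", simulate(spike, 10.0, 5, True))
```

With eight simultaneous requests at 10r/s, the model rejects seven with no burst, queues five for 0.1 to 0.5 seconds with burst=5, and passes six immediately with burst=5 nodelay; in both burst cases the last two are rejected.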

When a rejection occurs, nginx logs an error line similar to:

[error] 1234#1234: *123 limiting requests, excess: 0.300 by zone "one", client: 192.0.2.1

The exact format includes the worker process and thread IDs, the connection number, the excess value, the zone name, and the client IP.
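For ad hoc analysis, the fields listed above can be pulled out with a regular expression. The sketch below is written against the sample line shown in this section only; the pattern name LIMIT_RE and the function parse_limit_line are hypothetical, and real log lines carry a timestamp prefix and extra trailing fields that this pattern deliberately ignores.

```python
import re
from typing import Dict, Optional

# Pattern matched to the sample line in this article; adjust if your
# error log format differs.
LIMIT_RE = re.compile(
    r'\[(?P<level>\w+)\] (?P<pid>\d+)#(?P<tid>\d+): \*(?P<conn>\d+) '
    r'limiting requests, excess: (?P<excess>\d+\.\d+) '
    r'by zone "(?P<zone>[^"]+)", client: (?P<client>[^,\s]+)'
)


def parse_limit_line(line: str) -> Optional[Dict[str, str]]:
    """Return the captured fields as strings, or None if the line does not match."""
    match = LIMIT_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ('[error] 1234#1234: *123 limiting requests, excess: 0.300 '
              'by zone "one", client: 192.0.2.1')
    print(parse_limit_line(sample))
```

Using search rather than match lets the same pattern work whether or not a timestamp precedes the level.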

Memory sizing is critical. The key $binary_remote_addr uses a fixed-width binary representation: 4 bytes for IPv4 and 16 bytes for IPv6. According to the nginx documentation, each stored state occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms, so a 1 MB zone holds about 16,000 or 8,000 states respectively. As a rule of thumb, slab allocator overhead reduces effective capacity to roughly 80-85 percent of the configured zone size, so on 64-bit systems zone capacity can be estimated as zone_size * 0.85 / 128.
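The rule of thumb above is easy to turn into a quick calculation. This sketch only encodes the article's estimate (85 percent usable, 128 bytes per state); the function names and the headroom parameter are hypothetical, and the results are planning figures rather than guarantees from nginx.

```python
def estimate_zone_capacity(zone_bytes: int, state_bytes: int = 128,
                           usable_percent: int = 85) -> int:
    """Approximate number of states a zone of `zone_bytes` can hold.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return zone_bytes * usable_percent // (100 * state_bytes)


def zone_bytes_for(keys: int, headroom: int = 2, state_bytes: int = 128,
                   usable_percent: int = 85) -> int:
    """Approximate zone size needed to hold `keys` states times `headroom`."""
    needed = keys * headroom * state_bytes * 100
    return -(-needed // usable_percent)  # ceiling division


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    print("10m zone holds about", estimate_zone_capacity(ten_mb), "states")
    size = zone_bytes_for(50_000)
    print("50,000 keys with 2x headroom need about",
          round(size / (1024 * 1024), 1), "MiB")
```

Under these assumptions the zone=one:10m example tracks roughly 70,000 keys, and 50,000 peak keys with 2x headroom call for a zone of about 15 MiB.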

If the zone fills, nginx first removes the least recently used state to make room. If it still cannot allocate a new tracking state, it logs could not allocate node and, per the nginx documentation, terminates the request with an error. Eviction is the quieter failure: nothing is rejected, but a client whose state is removed starts again with an empty bucket, so an undersized zone enforces limits less strictly than configured.

flowchart TD
    A[Request arrives] --> B[Look up key in limit_req_zone]
    B --> C{Within leak rate?}
    C -->|Yes| D[Forward to upstream]
    C -->|No| E{Burst configured?}
    E -->|No| F[Reject with 503]
    E -->|Yes| G{Burst capacity available?}
    G -->|No| F
    G -->|Yes| H{Nodelay configured?}
    H -->|Yes| D
    H -->|No| I[Delay until leak slot available]
    I --> D

Where it shows up in production

The most obvious symptom is the error log entry. A sudden flood of limiting requests, excess messages from diverse IPs suggests an attack or flash crowd. A steady stream from the same IP suggests a misbehaving client or scraper. Rejections clustered on authentication endpoints often indicate credential stuffing.
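One way to tell these patterns apart is to look at how rejections are distributed across clients. The sketch below assumes you have already extracted the client IP from each rejection line; the function name summarize_rejections and the returned keys are hypothetical, and any threshold you apply to top_share is a judgment call for your own traffic.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_rejections(clients: Iterable[str]) -> Dict[str, Any]:
    """Summarize how rejections are spread across client IPs."""
    counts = Counter(clients)
    total = sum(counts.values())
    if total == 0:
        return {"total": 0, "unique_clients": 0,
                "top_client": None, "top_share": 0.0}
    top_client, top_count = counts.most_common(1)[0]
    return {
        "total": total,
        "unique_clients": len(counts),
        "top_client": top_client,
        # Share of all rejections caused by the single noisiest client.
        "top_share": top_count / total,
    }


if __name__ == "__main__":
    scraper = ["192.0.2.1"] * 9 + ["198.51.100.7"]
    print(summarize_rejections(scraper))
```

A top_share close to 1 points at a single misbehaving client, while many unique clients with a low top_share is more consistent with an attack or flash crowd.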

In the access log, the default manifestation is HTTP 503. If you have not changed limit_req_status, your 5xx error rate will include these rejections. This is dangerous for monitoring: a generic 5xx alert cannot distinguish between a rate-limited client and a dead upstream. Many operators explicitly set limit_req_status 429 so that rate limit rejections are visually and programmatically distinct from service failures.

If you have configured burst without nodelay, you may not see 503s at all during moderate spikes. Instead, latency climbs in $request_time while $upstream_response_time remains flat. The gap between the two is time spent queued in the leaky bucket. Browsers and mobile clients may time out before the delayed request ever reaches your application.
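The gap can be computed per request once both variables are in your access log. This is a rough sketch: the function name queue_gap is hypothetical, it assumes the two values arrive as the strings nginx logs (with "-" when no upstream was contacted and several values separated by commas or colons when more than one upstream was tried), and the result also includes any time spent reading from or writing to the client, so treat it as an upper bound on queueing delay.

```python
import re
from typing import Optional


def queue_gap(request_time: str, upstream_response_time: str) -> Optional[float]:
    """Seconds of $request_time not accounted for by the upstream.

    Returns None when no upstream time was logged.
    """
    raw = upstream_response_time.strip()
    if raw in ("", "-"):
        return None
    # Several upstream attempts are logged as a separated list; sum them.
    parts = [p.strip() for p in re.split(r"[,:]", raw)]
    upstream_total = sum(float(p) for p in parts if p not in ("", "-"))
    return max(float(request_time) - upstream_total, 0.0)


if __name__ == "__main__":
    print(queue_gap("0.512", "0.012"))
    print(queue_gap("0.100", "-"))
```

A gap that grows during spikes while the upstream value stays flat matches the queueing pattern described above.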

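Zone exhaustion is harder to detect. When the zone defined by limit_req_zone fills, nginx evicts the least recently used states, so clients whose state was dropped are measured from an empty bucket and your access logs keep showing 200s. Only when eviction cannot free enough space does the error log show could not allocate node, and those requests are terminated with an error. Before that point, the main signal is that your upstream receives more traffic than the configured limits should allow.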

The NAT aggregation caveat is particularly important for consumer-facing services. Corporate proxies, mobile carrier NAT, and CGNAT collapse many users behind a single public IP. A limit keyed to $binary_remote_addr treats that entire population as one client. If one user behind the NAT exceeds the limit, all users behind that IP are delayed or rejected.

Tradeoffs and common misuses

Burst without nodelay. Operators add burst=20 to accommodate legitimate traffic, but without nodelay those requests queue and drain at the leak rate. During a spike, the queue fills, requests time out, and the remainder are rejected. The result looks like an upstream failure but is a configuration choice.
Using $remote_addr instead of $binary_remote_addr. The string representation varies in length and uses more memory. $binary_remote_addr is fixed-width and more efficient for slab allocation.
IP-keyed limits behind NAT. Any rate limit based on $binary_remote_addr penalizes all users behind a shared IP. For applications with large populations behind CGNAT or corporate proxies, consider alternative keys such as authenticated user IDs or API tokens, though these require application cooperation.
Zone undersizing. A limit_req_zone sized for 10,000 unique IPs will churn through evictions, and may start failing requests, when your traffic grows to 50,000. Because zone size cannot be changed via reload, resizing requires a full restart. Size for at least 2x your expected peak unique key count.
Default 503 status. The default limit_req_status is 503, which conflates rate limiting with upstream failures. Change it to 429 unless you have a specific reason not to.

Signals to watch in production

Signal	Why it matters	Warning sign
Error log “limiting requests, excess”	Confirms the rate limiter is actively rejecting requests (delayed requests are logged separately as “delaying request”)	Sudden 10x increase indicates attack, flash crowd, or misconfiguration
HTTP 503 or 429 rate	Measures the user-visible impact of rejections	Sustained >1% of total traffic may mean limits are too restrictive for legitimate load
could not allocate node in error log	The zone is exhausted and eviction could not free space; affected requests are terminated with an error	Any occurrence means the zone is too small for the active key population; investigate zone sizing immediately
Unique keys vs zone capacity	Ensures the zone can track your actual client population	Unique key count approaching zone_size * 0.85 / 128
Latency gap between $request_time and $upstream_response_time	Detects artificial delay from burst without nodelay	Gap grows during traffic spikes while upstream remains healthy

How Netdata helps

Correlate nginx 503/429 spikes with total request rate from stub_status. If rejections rise without a traffic surge, your limits may be too tight for legitimate load.
Surface error log rates by parsing nginx logs, including limiting requests and could not allocate node, without manual tailing.
Compare $request_time against $upstream_response_time to identify queue latency from burst without nodelay.
Track active connections to help size limit_req_zone capacity against peak concurrent populations.
Distinguish rate-limit 503s from backend-cascade 503s by correlating with upstream response time and health state.
