nginx 503 service temporarily unavailable: causes and fixes

A 503 Service Temporarily Unavailable from nginx does not always mean the upstream application is broken. A healthy nginx process returns 503 by design when rate limits reject traffic, or when every backend in an upstream block is unavailable. The same status code covers three different failure paths: intentional throttling, upstream exhaustion, and resource saturation on the nginx host. The fix depends on which path the request took.

Distinguishing rate-limit 503s from upstream 503s avoids chasing ghost backend failures. The limit_req and limit_conn modules default to status 503, not 429. This guide maps the three primary failure paths and gives commands to prove which one is active.

What this means

All upstream servers are down or refusing connections. When nginx has no healthy backend to proxy to, it returns 503. This includes PHP-FPM pool exhaustion, where pm.max_children is reached, the listen backlog fills, and new connections are then refused.

Rate limiting or connection limiting is rejecting the request. The limit_req and limit_conn modules default to status 503 when the leaky bucket overflows or the per-key connection count is exceeded. This is an intentional, healthy rejection. nginx 1.3.15 and later let you change the status with limit_req_status and limit_conn_status, but the default remains 503.

Shared memory zone exhaustion or nginx resource pressure. If a limit_req_zone or limit_conn_zone fills completely, nginx may return 503 because it cannot allocate tracking state. Connection slot saturation and file descriptor limits can also manifest as 503s or 502s depending on exactly where the refusal occurs.

flowchart TD
    A[Client receives 503] --> B{"Check error.log for<br/>limiting requests or connections"}
    B -->|Yes| C[Rate limiting rejection]
    B -->|No| D{"Check upstream health<br/>and error.log"}
    D -->|"All backends down<br/>or refusing"| E[Upstream outage or pool exhaustion]
    D -->|Backends healthy| F{"Check connection slots<br/>and K8s endpoints"}
    F -->|Slots saturated| G[nginx connection exhaustion]
    F -->|Stale endpoints| H[Kubernetes ingress drift]
    F -->|Resources normal| I["Check zone exhaustion<br/>or maintenance config"]
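
The first branches of this decision tree can be approximated with a single pass over the error log. The sketch below counts lines per failure path using only the message strings named in this guide. The function name and the category labels are illustrative, not part of nginx.

```shell
# Count error.log lines per 503 failure path. Reads log text on stdin.
# The labels (rate_limit, upstream, zone_exhaustion) are hypothetical names
# chosen for this sketch; the matched strings are the ones this guide cites.
classify_503_paths() {
  awk '
    /limiting requests|limiting connections/ { rate++ }
    /no live upstreams|connect\(\) failed/   { upstream++ }
    /could not allocate node/                { zone++ }
    END {
      printf "rate_limit=%d upstream=%d zone_exhaustion=%d\n", rate, upstream, zone
    }'
}

# Usage (the log path is an assumption; adjust to your error_log location):
#   tail -n 1000 /var/log/nginx/error.log | classify_503_paths
```

A nonzero rate_limit count with zero upstream entries points at the left branch of the chart. If all three counters are zero, continue to the connection slot and Kubernetes checks.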

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Rate limiting rejection (limit_req / limit_conn) | 503s spike during traffic bursts; error.log shows “limiting requests” or “limiting connections” | grep "limiting" /var/log/nginx/error.log |
| All upstream servers unavailable | 503s for all proxied requests; error.log shows “no live upstreams” or connect failures | /dev/tcp probes or PHP-FPM pool status |
| PHP-FPM pool exhaustion | 503s on PHP endpoints; PHP-FPM log shows “server reached pm.max_children” | PHP-FPM error log and pm.max_children |
| Rate limit zone exhausted | Rate limiting silently stops working or returns 503 for new keys; error.log shows “could not allocate node” | Zone size in nginx -T vs unique client count |
| Kubernetes ingress stale endpoints | Intermittent 503s after pod rotations; other ingress paths work fine | Endpoint freshness and service cluster IP routing |
| nginx connection slot exhaustion | Active connections at worker_connections × worker_processes; accepts-handled gap growing | stub_status active connections vs configured maximum |

Quick checks

# Check if rate limiting is the source of 503s
tail -1000 /var/log/nginx/error.log | grep -E "(limiting requests|limiting connections)"

# Check for upstream total failures
tail -1000 /var/log/nginx/error.log | grep -E "(no live upstreams|connect\(\) failed)"

# Check current connection pressure (adjust URL to match your stub_status location)
curl -s http://127.0.0.1/stub_status | awk '/Active connections/ {print $3}'

# Probe upstream backends directly (replace with your backend IPs:ports)
for backend in 10.0.1.10:8080 10.0.1.11:8080; do
  timeout 2 bash -c "echo > /dev/tcp/${backend%:*}/${backend#*:}" 2>/dev/null && \
    echo "$backend: UP" || echo "$backend: DOWN"
done

# Inspect rate limiting configuration and zone sizes
nginx -T 2>/dev/null | grep -E "(limit_req|limit_conn|limit_req_zone|limit_conn_zone)"

# Check if 503s are concentrated on specific paths or IPs
# Assumes combined log format; adjust field numbers if your format differs
tail -10000 /var/log/nginx/access.log | awk '$9 == 503 {print $1, $7}' | sort | uniq -c | sort -rn | head -20

How to diagnose it

  1. Confirm whether nginx or an upstream generated the 503. Check the error log for “limiting requests”, “limiting connections”, or “excess” messages. If these are present, the 503 was generated locally by nginx. If absent, look upstream or at zone allocation failures.

  2. Distinguish rate-limit 503s from upstream 503s. Rate-limit 503s correlate with traffic spikes and appear in the error log as “limiting requests” with the offending IP and zone name. Upstream 503s correlate with backend failures and appear as “no live upstreams” or connect() failed errors.

  3. Check upstream health directly. Use /dev/tcp probes or curl against each backend. If all backends refuse connections, verify whether the backend processes are running, whether the listen ports are bound, and whether pools like PHP-FPM pm.max_children are saturated.

  4. Inspect connection slot utilization. Query stub_status and compare active connections to worker_connections × worker_processes. Each proxied request consumes two slots. If utilization is above 80%, connection exhaustion is either causing 503s directly or preventing recovery during a backend restart.

  5. Look for shared memory zone exhaustion. Search the error log for “could not allocate node”. If a limit_req_zone is full, nginx first evicts the least recently used state; if it still cannot create a new state, the request is terminated with an error. If a limit_conn_zone is full, nginx returns the error to all further requests. Zone size cannot be changed via reload; it requires a restart.

  6. If running Kubernetes ingress-nginx, check endpoint drift. After pod rotations, nginx may retain stale pod IPs. Look for intermittent 503s that resolve briefly after an ingress reload. Routing via the Service cluster IP instead of individual pod IPs removes this drift.

  7. Check access log patterns. Parse the access log for 503 entries. Are they tied to specific locations, user agents, or source IPs? A concentrated burst from a single IP suggests rate limiting is working as intended. Uniform distribution across all clients suggests an upstream outage.
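
Step 7 can be made concrete by measuring how much of the 503 volume comes from the busiest client. This is a minimal sketch that assumes the combined log format (client IP in field 1, status in field 9); the function name is hypothetical.

```shell
# Print the busiest client IP among 503 responses and its share of all 503s.
# Reads access log lines on stdin. Assumes combined log format:
# $1 = client IP, $9 = status code.
top_503_share() {
  awk '$9 == 503 { n[$1]++; total++ }
    END {
      if (total == 0) { print "no 503s"; exit }
      for (ip in n) if (n[ip] > max) { max = n[ip]; top = ip }
      printf "%s %d/%d %.0f%%\n", top, max, total, 100 * max / total
    }'
}

# Usage (the log path is an assumption):
#   tail -n 10000 /var/log/nginx/access.log | top_503_share
```

A high share for one address suggests rate limiting is doing its job. A low share, spread across many clients, is more consistent with an upstream outage.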

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| HTTP 503 response rate | Direct measure of user-visible rejections | Sustained rate above baseline, or any spike correlated with error log entries |
| Error log “limiting requests” rate | Confirms rate limiting is the source | Nonzero rate means 503s are intentional rejections, not upstream failures |
| Active connections vs worker_connections capacity | Connection exhaustion precedes or accompanies 503 storms | Utilization above 80% of (worker_connections × worker_processes) |
| Accepts-handled gap | Reveals silent connection drops that can manifest as 503s | Any growing gap indicates admission loss |
| Upstream connect time | Zero or near-zero means keepalive reuse; high means new connection stress | Sudden spike suggests upstreams are restarting or refusing connections |
| Error log “could not allocate node” | Signals shared memory zone exhaustion | Any occurrence means a security or capacity boundary is failing |
| PHP-FPM pm.max_children warnings | Pool exhaustion causes immediate 503s on PHP paths | Log entries stating the pool reached its limit |
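
The accepts-handled gap in the table is simple to extract from the plain-text stub_status page. The sketch below parses that output and prints the gap; the function name is illustrative, and the URL in the usage line is the same placeholder used in the quick checks.

```shell
# Parse stub_status text on stdin and report active connections and the
# accepts-handled gap (connections accepted but never handled).
stub_status_gap() {
  awk '
    /^Active connections:/ { active = $3 }
    # The counter values sit on the line after this header.
    /^server accepts handled requests/ { getline; accepts = $1; handled = $2 }
    END {
      printf "active=%s accepts=%s handled=%s gap=%d\n", active, accepts, handled, accepts - handled
    }'
}

# Usage (adjust the URL to match your stub_status location):
#   curl -s http://127.0.0.1/stub_status | stub_status_gap
```

The counters are cumulative since nginx started, so a single reading says little. Sample twice a few seconds apart and look at whether the gap value grew between samples.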

Fixes

Rate limiting is too aggressive

If legitimate traffic is being rejected, increase the burst parameter or relax the rate. Review error log entries to confirm rejections match actual abuse before tightening.

Tradeoff: Higher burst increases backend exposure during attacks. If you loosen limits, ensure your backends can absorb the additional load.
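
Before changing burst or rate, it helps to see which zones and clients account for the rejections. The following sketch groups rejection lines by zone and client address. It assumes the usual error log shape for these messages (`... by zone "NAME", client: ADDRESS, ...`); the function name is hypothetical.

```shell
# Summarize limit_req / limit_conn rejections by zone and client.
# Reads error.log text on stdin. Assumes messages shaped like:
#   ... limiting requests, excess: 20.500 by zone "api", client: 203.0.113.7, ...
# Output: "<count> <zone> <client>", highest count first.
rejections_by_zone() {
  awk '/limiting (requests|connections)/ {
      zone = ""; client = ""
      # Skip the 9 characters of: by zone "   and drop the closing quote.
      if (match($0, /by zone "[^"]*"/)) zone = substr($0, RSTART + 9, RLENGTH - 10)
      # Skip the 8 characters of: client:<space>
      if (match($0, /client: [^,]*/)) client = substr($0, RSTART + 8, RLENGTH - 8)
      count[zone " " client]++
    }
    END { for (k in count) print count[k], k }' | sort -rn
}

# Usage (the log path is an assumption):
#   tail -n 5000 /var/log/nginx/error.log | rejections_by_zone | head -20
```

If the top entries are known good clients such as office NAT addresses or monitoring probes, the limit is probably too tight for that zone. If they are unfamiliar addresses with very high counts, the limit is likely working as intended.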

Rate limit zones are exhausted

If the error log shows “could not allocate node”, the shared memory zone is too small for the number of unique tracking keys. Calculate required size using roughly 128 bytes per key on 64-bit platforms. Size the zone for at least 2x your expected peak unique key count. Changing a zone size requires a full nginx restart, not a reload.

Tradeoff: Larger zones consume more RAM, so the 2x headroom is a floor for sizing rather than a reason to allocate far beyond it.
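
The sizing rule works out to a short calculation. This sketch applies the figures given above (roughly 128 bytes per key, 2x headroom) and rounds up to whole megabytes; the function name is illustrative, and the per-key figure is an estimate rather than an exact allocation size.

```shell
# Suggest a zone size for a given peak unique-key count.
# Assumes ~128 bytes per key (64-bit platforms) and 2x headroom.
zone_size_mb() {
  keys=$1
  bytes=$((keys * 128 * 2))
  # Round up to the next whole megabyte (1048576 bytes).
  echo "$(( (bytes + 1048575) / 1048576 ))m"
}

# Usage:
#   zone_size_mb 100000    # prints 25m
```

For 100,000 peak unique keys this gives 100,000 × 128 × 2 = 25,600,000 bytes, which rounds up to 25m for the zone size parameter.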

Upstream servers are all down

If all backends are unavailable, restore at least one backend. If the outage is partial but nginx has marked all servers as failed due to aggressive max_fails settings, increase max_fails or fail_timeout to reduce flapping. Configure a backup server in the upstream block to handle traffic when all primaries fail.

Tradeoff: Looser failure thresholds keep unhealthy servers in rotation longer. Monitor upstream response time closely if you increase max_fails.

PHP-FPM pool exhaustion

When PHP-FPM logs “server reached pm.max_children”, the pool is saturated and cannot serve new connections. Increase pm.max_children if the host has memory headroom, or reduce application response time so workers free up faster. Switching to pm = ondemand can help when the steady-state worker count is low and spikes are rare, because idle workers no longer hold memory that a higher pm.max_children could use.

Tradeoff: Higher pm.max_children uses more memory. Do not raise it without confirming RSS headroom.
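
One common way to check RSS headroom is to divide the memory you are willing to give the pool by the average worker RSS. The sketch below does that arithmetic. Both function names are hypothetical, the memory budget is a number you choose, and the process name in the usage line is an assumption that varies by distribution and PHP version.

```shell
# Average RSS in KiB from "ps -o rss=" style input (one value per line).
avg_rss_kb() {
  awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n; else print 0 }'
}

# max_children_estimate <memory budget in MiB> <average worker RSS in KiB>
# Integer division, so the result rounds down.
max_children_estimate() {
  [ "$2" -gt 0 ] || { echo "average RSS must be greater than 0" >&2; return 1; }
  echo $(( $1 * 1024 / $2 ))
}

# Usage (Linux procps; the process name php-fpm is an assumption):
#   rss=$(ps -o rss= -C php-fpm | avg_rss_kb)
#   max_children_estimate 2048 "$rss"
```

Treat the result as an upper bound, not a target. RSS measured at idle understates what workers use under load, so sample during representative traffic.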

Kubernetes ingress stale endpoints

If pod rotations cause intermittent 503s, configure the ingress to use the Service cluster IP rather than individual pod endpoints. This removes endpoint drift during rollouts.

Tradeoff: You lose direct pod-to-pod load balancing and some session affinity behaviors. Test sticky session requirements before applying broadly.

Connection or file descriptor exhaustion

If active connections are near the configured maximum, increase worker_connections and ensure worker_rlimit_nofile is at least double that value. Reduce keepalive_timeout to reclaim idle connection slots faster. If upstream keepalive pools are churning, verify the backend is honoring keepalive and that keepalive_requests is not forcing premature closure.

Tradeoff: Higher connection limits need more file descriptors and memory. Verify ulimit and container limits are raised in tandem.
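
The 2x relationship between worker_rlimit_nofile and worker_connections can be checked against the live configuration. This is a minimal sketch that reads configuration text on stdin and compares the two directives; the function name is illustrative, and it only inspects the last value it sees for each directive.

```shell
# Check that worker_rlimit_nofile is at least 2x worker_connections.
# Reads nginx configuration text on stdin (for example, nginx -T output).
check_fd_headroom() {
  awk '
    $1 == "worker_connections"   { sub(/;/, "", $2); wc = $2 }
    $1 == "worker_rlimit_nofile" { sub(/;/, "", $2); nofile = $2 }
    END {
      if (wc == "")     { print "worker_connections not set"; exit }
      if (nofile == "") { print "worker_rlimit_nofile not set"; exit }
      if (nofile + 0 >= 2 * wc) print "ok"
      else print "too low: want >= " (2 * wc)
    }'
}

# Usage:
#   nginx -T 2>/dev/null | check_fd_headroom
```

When worker_rlimit_nofile is not set, workers inherit the file descriptor limit of the environment that started nginx, so check ulimit or the service unit limits in that case.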

Prevention

  • Set limit_req_status and limit_conn_status explicitly. If you prefer the 429 Too Many Requests status defined in RFC 6585, set both to 429 (both directives are available since nginx 1.3.15). This prevents confusion between rate-limit 503s and upstream outage 503s.
  • Size rate limit zones for growth. Allocate at least 2x the expected peak unique keys. Monitor for “could not allocate node” errors proactively.
  • Monitor the accepts-handled gap. A growing gap is the earliest indicator that nginx is dropping connections before they are processed.
  • Keep upstream health checks independent of nginx. Open-source nginx uses passive health checking. A sidecar or external monitor can detect backend degradation before real user requests fail.
  • Account for the proxy connection multiplier. Each proxied request uses two connection slots. Capacity planning should use an effective maximum of half worker_connections × worker_processes, minus keepalive overhead.
  • Set worker_shutdown_timeout. Old workers lingering after reloads consume slots and memory. A timeout prevents accumulation during frequent configuration changes.
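
The proxy connection multiplier from the list above can be turned into a quick capacity estimate. The sketch below halves the raw slot count and reports utilization against whichever capacity figure you pass in. The function names are hypothetical, keepalive overhead is not modeled, and worker_processes must be given as a number (resolve `auto` to the actual worker count first).

```shell
# effective_proxy_capacity <worker_connections> <worker_processes>
# Halves the raw slot count because each proxied request holds two slots.
effective_proxy_capacity() {
  echo $(( $1 * $2 / 2 ))
}

# slot_utilization_pct <active connections> <capacity>
# Integer percentage, rounded down.
slot_utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Usage:
#   cap=$(effective_proxy_capacity 1024 4)    # 2048
#   slot_utilization_pct 1700 "$cap"          # 83
```

With worker_connections 1024 and 4 workers, the effective ceiling for proxied traffic is about 2048 concurrent requests, and 1700 active connections already sits above the 80% warning level used in this guide.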

How Netdata helps

  • stub_status metrics (active connections, reading/writing/waiting breakdown, accepts-handled gap) surface capacity exhaustion before 503s spike.
  • Error log severity rates correlate spikes in “limiting requests” or “no live upstreams” with 503 response codes.
  • Upstream response time and connect time percentiles distinguish backend degradation from nginx-side rejections.
  • Alerts on connection slot and file descriptor utilization provide lead time before admission loss.