nginx 502 Bad Gateway: causes and how to fix it

A 502 Bad Gateway means the upstream server returned an invalid response, refused the connection, or terminated before completing the response. Unlike 504, which signals upstream slowness, 502 means the upstream never produced a valid response or nginx could not reach it.

Start with the error log. A single line like connect() failed (111: Connection refused) tells you the upstream is not listening. A line like upstream prematurely closed connection tells you the backend died mid-request. Match the exact message to the root cause.

This guide maps common failure modes from symptom to fix.

What this means

nginx returns 502 when it acts as a reverse proxy and the upstream server sends an invalid response, refuses the connection, or crashes. The nginx source maps unhandled upstream failures to HTTP 502 by default.

This is different from:

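  • 503 Service Unavailable: nginx itself rejects the request, usually because a limit_req or limit_conn limit was hit, or it relays a 503 that the upstream returned. When every server in an upstream group is unavailable, nginx logs no live upstreams and returns 502, not 503.
  • 504 Gateway Timeout: nginx timed out waiting for the upstream, either while connecting (proxy_connect_timeout) or while waiting for the response (proxy_read_timeout).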

In practice, most 502s fall into four categories: the upstream is unreachable (connection refused), the upstream dies during the request (premature close), the response headers exceed nginx buffer limits (too big header), or nginx cannot resolve the upstream hostname (DNS failure). The table below adds two rarer cases: every upstream marked down, and permission errors.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Upstream not listening | connect() failed (111: Connection refused) or connect() failed (113: No route to host) in error log; $upstream_status is 502 even though the upstream sent nothing | Is the upstream process running and bound to the expected IP/port or Unix socket? |
| Upstream crashed mid-request | upstream prematurely closed connection while reading response header or recv() failed (104: Connection reset by peer) | Upstream application logs for OOM, segfault, or worker kill at the same timestamp. |
| Oversized response headers | upstream sent too big header while reading response header from upstream | proxy_buffer_size (or fastcgi_buffer_size for PHP-FPM) versus actual upstream header size. |
| DNS resolution failure | no resolver defined to resolve backend.example.com or resolver timeout messages; only affects variable-based proxy_pass | resolver directive presence and reachability of the configured DNS server. |
| All upstreams marked down | no live upstreams while connecting to upstream | Whether every server in the upstream block has exceeded max_fails within fail_timeout. |
| Permission denied (socket or SELinux) | connect() failed (13: Permission denied) while connecting to upstream | SELinux httpd_can_network_connect boolean (TCP upstreams) or filesystem permissions on the socket file (Unix sockets). |
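
The table can be turned into a small triage filter. The following is a minimal sketch, assuming the error log contains nginx's standard message text. The function name classify_502_cause and the category labels are hypothetical, and the "could not be resolved" pattern is an additional resolver message that the table does not list.

```shell
# Classify nginx error-log lines (read from stdin) by likely 502 cause.
# Category labels are hypothetical names chosen for this sketch.
# Lines that match no known message produce no output.
classify_502_cause() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"(111: Connection refused)"*|*"(113: No route to host)"*)
        echo "upstream-not-listening" ;;
      *"(13: Permission denied)"*)
        echo "permission-denied" ;;
      *"prematurely closed connection"*|*"(104: Connection reset by peer)"*)
        echo "upstream-crashed" ;;
      *"too big header"*)
        echo "oversized-headers" ;;
      *"no resolver defined"*|*"could not be resolved"*)
        echo "dns-failure" ;;
      *"no live upstreams"*)
        echo "all-upstreams-down" ;;
    esac
  done
}

# Usage (the log path is an assumption; use your error_log setting):
#   tail -n 1000 /var/log/nginx/error.log | classify_502_cause | sort | uniq -c | sort -rn
```

Counting the categories this way shows which failure mode dominates before you start changing anything.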

Quick checks

Run these read-only checks before making changes.

# Check error log for upstream failures (last 1000 lines)
tail -n 1000 /var/log/nginx/error.log | grep -E "upstream|connect\(\) failed|502"

# Check if upstreams are reachable from the nginx host
for backend in 10.0.1.10:8080 10.0.1.11:8080; do
  timeout 2 bash -c "echo > /dev/tcp/${backend%:*}/${backend#*:}" 2>/dev/null && \
    echo "$backend: UP" || echo "$backend: DOWN"
done

# Check stub_status for connection pressure
curl -s http://127.0.0.1/stub_status

# Check for silently dropped kernel connections
nstat -az TcpExtListenOverflows 2>/dev/null | grep ListenOverflows

# Count file descriptors per worker (compare against limits)
for pid in $(pgrep -P $(cat /var/run/nginx.pid)); do
  echo "Worker $pid: $(ls /proc/$pid/fd 2>/dev/null | wc -l) FDs"
done

# Verify nginx workers are running
pgrep -c -P $(cat /var/run/nginx.pid)

How to diagnose it

Follow these steps in order. Do not restart nginx until you know why the 502 is happening.

  1. Confirm this nginx instance emitted the 502. Check $upstream_addr in the access log. If there is no upstream address, the failure happened before nginx selected an upstream.
  2. Read the error log for the exact message. The error log line includes the upstream IP or socket path and the system errno. Match the message to the table above.
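  3. Correlate with access log variables. A log format that includes $upstream_addr, $upstream_status, $upstream_connect_time, and $upstream_response_time shows whether the upstream was reached at all. If $upstream_addr and $upstream_status are both -, the request was never proxied and nginx answered it locally. An $upstream_status of 502 with a near-zero $upstream_response_time suggests nginx failed to connect or the upstream closed the connection immediately.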
  4. Check upstream health directly. Use curl, nc, or /dev/tcp from the nginx host to the upstream endpoint. If this fails, the problem is upstream, not nginx.
  5. Check for connection exhaustion. If active connections are near worker_connections * worker_processes, or if the accepts - handled gap is growing, nginx may drop connections before they reach the upstream phase. See NGINX connection exhaustion.
  6. Check DNS if using dynamic upstreams. If the configuration uses proxy_pass http://$variable, verify that a resolver directive is present and that the DNS server is reachable. Without it, hostname resolution fails with 502.
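  7. Check for oversized headers. If the error log says too big header, capture the upstream response headers with curl -s -o /dev/null -D - against the upstream and compare their size against proxy_buffer_size. Avoid curl -I here: it sends a HEAD request, which some backends answer with different headers.

The decision flow as a Mermaid flowchart:

flowchart TD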
    A[502 Bad Gateway] --> B{Check error log message}
    B -->|connect refused| C[Upstream not listening]
    B -->|prematurely closed| D[Upstream crashed]
    B -->|too big header| E[Buffer limit exceeded]
    B -->|resolver error| F[DNS resolution failed]
    B -->|no live upstreams| G[All backends down]
    C --> H[Test upstream port or socket]
    D --> I[Check upstream app logs]
    E --> J[Increase proxy_buffer_size]
    F --> K[Verify resolver directive]
    G --> L[Review max_fails and backend health]
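
Steps 1 and 3 are easier when the access log carries upstream fields as key=value pairs. The sketch below assumes a hypothetical log_format that writes status=$status and upstream_addr=$upstream_addr as space-separated tokens. That layout and the function name are assumptions, not something nginx produces by default. When nginx retries a request, $upstream_addr holds a comma-separated list, and this sketch counts only the first server tried.

```shell
# Count 502 responses per upstream address from an access log on stdin.
# Assumes each line contains tokens "status=<code>" and "upstream_addr=<addr>"
# (a hypothetical key=value log_format).
summarize_502_by_upstream() {
  awk '
    {
      status = ""; addr = "-"
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^status=/)        { status = substr($i, 8) }
        if ($i ~ /^upstream_addr=/) { addr = substr($i, 15) }
      }
      sub(/,$/, "", addr)            # first address of a retry list ends with a comma
      if (status == "502") { count[addr]++ }
    }
    END { for (a in count) printf "%s %d\n", a, count[a] }
  ' | sort -k2,2nr -k1,1
}

# Usage (the log path is an assumption):
#   summarize_502_by_upstream < /var/log/nginx/access.log
```

A single address dominating the output points at one bad backend; a count under "-" means those 502s were generated without an upstream being contacted.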

Fixes

Upstream not listening or refusing connections

If the error log shows 111: Connection refused, the upstream process is down, bound to the wrong address, or exhausted its connection limit.

  • Restart the upstream process. Warning: this drops in-flight requests.
  • Verify the upstream bind address. In Kubernetes, ensure the container binds to 0.0.0.0, not 127.0.0.1.
  • For PHP-FPM, check the FPM error log for server reached pm.max_children setting. Increase pm.max_children or reduce request latency to free workers.
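  • If the error is 13: Permission denied rather than 111 and SELinux is enforcing, run getsebool httpd_can_network_connect. If it is off, enable it with setsebool -P httpd_can_network_connect 1.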

Upstream crashes mid-request

upstream prematurely closed connection or Connection reset by peer means the upstream worker died while processing the request.

  • Check upstream application logs for fatal errors, OOM kills, or timeouts that kill workers.
  • If the crash correlates with a specific request pattern, reproduce it in a staging environment.
  • Ensure the upstream has enough memory and CPU. PHP-FPM workers in particular die when memory limits are hit.
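
To check whether the kernel OOM killer ended upstream workers, you can scan the kernel log for its kill messages. This is a rough sketch, assuming the kernel logs lines of the form "Killed process <pid> (<name>)", which is common on Linux but can vary by kernel version. The function name oom_victims is hypothetical.

```shell
# Print "<count> <process-name>" for each process the kernel OOM killer ended.
# Reads kernel log text on stdin; assumes "Killed process <pid> (<name>)" lines.
oom_victims() {
  sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' \
    | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage:
#   dmesg -T | oom_victims
#   journalctl -k --since "1 hour ago" | oom_victims
```

If the upstream's process name appears here at the same timestamps as the 502s, treat the problem as a memory issue in the upstream rather than an nginx issue.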

Oversized response headers

If the error log says upstream sent too big header, the upstream response headers exceed the buffer allocated by nginx.

  • Increase proxy_buffer_size (for HTTP upstreams) or fastcgi_buffer_size (for PHP-FPM).
  • You may also need to increase proxy_buffers or fastcgi_buffers if the body is large, though the error specifically references headers.
  • Tradeoff: larger buffers increase per-connection memory usage. Do not set them arbitrarily high.
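
Before raising a buffer, it helps to measure how large the upstream headers actually are. The sketch below is one way to do that, assuming you can reach the upstream directly with curl. The function names and the example URL are hypothetical, and the byte count is an approximation of what nginx must fit into proxy_buffer_size (one memory page by default, 4k or 8k depending on the platform).

```shell
# Count the bytes of an HTTP response header block read from stdin.
header_bytes() {
  wc -c | tr -d '[:space:]'
}

# Succeed (exit 0) when the header size exceeds the buffer size.
# $1 = header size in bytes, $2 = buffer size in KiB (4 for "4k").
headers_exceed_buffer() {
  [ "$1" -gt $(( $2 * 1024 )) ]
}

# Usage (the URL is a placeholder for your upstream endpoint):
#   size=$(curl -s -o /dev/null -D - http://10.0.1.10:8080/ | header_bytes)
#   headers_exceed_buffer "$size" 4 && echo "headers ($size bytes) exceed a 4k buffer"
```

Measure a request that reproduces the error, since large cookies or authentication headers often appear only on specific paths.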

DNS resolution failure

If the error log mentions no resolver defined or DNS timeouts, and you are using a variable in proxy_pass, nginx must resolve the hostname at request time.

  • Add a resolver directive in the relevant server or location block. Example: resolver 8.8.8.8 valid=30s;
  • Ensure the resolver IP is reachable from the nginx host.
  • Tradeoff: caching DNS with valid= hides upstream IP changes until the cache expires. Set this based on your infrastructure’s failover speed.
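
A quick way to spot the missing-resolver case is to scan the effective configuration for a variable in proxy_pass. This is a minimal sketch that does a plain text scan, not a real parse: it ignores block scope, so a resolver anywhere in the dump counts as present. The function name check_resolver and its output strings are hypothetical.

```shell
# Warn when a config uses a variable in proxy_pass but defines no resolver.
# Reads nginx configuration text on stdin. Text scan only; block scope is ignored.
check_resolver() {
  awk '
    /^[ \t]*#/ { next }                          # skip comment lines
    /proxy_pass[ \t]+[^;]*\$/ { dynamic = 1 }    # proxy_pass target contains a variable
    /^[ \t]*resolver[ \t]/    { resolver = 1 }   # matches resolver, not resolver_timeout
    END {
      if (dynamic && !resolver) { print "MISSING resolver"; exit 1 }
      if (dynamic)              { print "OK resolver present"; exit 0 }
      print "OK no variable proxy_pass"
    }
  '
}

# Usage (nginx -T dumps the full effective configuration):
#   nginx -T 2>/dev/null | check_resolver
```

The non-zero exit status on a missing resolver makes the check usable in a deploy pipeline.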

All upstreams marked down

If you see no live upstreams, every server in the upstream block has been marked unavailable by nginx’s passive health checking.

  • Check all backends directly. Restore at least one.
  • Review max_fails and fail_timeout. The defaults (max_fails=1, fail_timeout=10s) are aggressive; a single blip removes a server. Consider raising max_fails to 3 if your upstreams are stable but occasionally hiccup.
  • Add a backup server to the upstream block to receive traffic only when primaries fail.

Prevention

  • Log upstream variables. Include $upstream_addr, $upstream_status, $upstream_response_time, and $upstream_connect_time in your access log format. This makes post-incident correlation trivial.
  • Size connection capacity for the proxy multiplier. Every proxied request uses at least two connection slots (client and upstream). Keep active connections well below worker_connections * worker_processes, and remember the default worker_connections is 512.
  • Monitor the accepts-handled gap. A growing gap means nginx is dropping connections. This is a leading indicator for connection exhaustion that produces 502s before capacity formally runs out.
  • Set resolver caching. If you use dynamic upstreams, always set resolver ... valid=30s to avoid DNS latency and reduce dependency on external resolvers.
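  • Tune proxy_next_upstream. The default, error timeout, already moves to the next server when a connection is refused, times out, or closes before a response header arrives. Add http_502 only when the upstream is itself a proxy that can return a 502 response. nginx does not retry non-idempotent requests (POST, LOCK, PATCH) that were already sent to an upstream unless you add non_idempotent; enable that only if your application handles retries safely.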
  • Set worker_shutdown_timeout. In environments with frequent reloads, old workers can linger indefinitely on long-lived connections, causing resource accumulation and unexpected connection behavior.
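
The capacity and accepts-handled checks above can be computed from stub_status output. The sketch below assumes the standard four-line stub_status text format. The function name stub_status_summary and its output fields are hypothetical, and the worker counts are passed in by hand because stub_status does not report them.

```shell
# Summarize stub_status output read from stdin.
# $1 = worker_processes (assumed 1 if omitted)
# $2 = worker_connections (assumed 512, the nginx default, if omitted)
stub_status_summary() {
  awk -v procs="${1:-1}" -v conns="${2:-512}" '
    /^Active connections:/ { active = $3 }
    # The counters line holds exactly three numbers: accepts handled requests
    NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { accepts = $1; handled = $2 }
    END {
      capacity = procs * conns
      printf "active=%d capacity=%d used_pct=%d dropped=%d\n",
             active, capacity, int(active * 100 / capacity), accepts - handled
    }
  '
}

# Usage (the stub_status URL depends on your configuration):
#   curl -s http://127.0.0.1/stub_status | stub_status_summary 4 1024
```

The dropped value is cumulative since nginx started, so compare two samples taken a minute apart to see whether the gap is growing. Because each proxied request holds two connections, treat the usable capacity as roughly half of the reported figure.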

How Netdata helps

Netdata correlates the signals that matter during a 502 incident:

  • 5xx rate and upstream latency: View HTTP 502 spikes alongside upstream response time and connect time trends to see whether the issue is backend refusal, crash, or slowness.
  • Connection saturation: Monitor active connections against worker_connections * worker_processes capacity, and watch for the accepts-handled gap that predicts silent connection drops.
  • Error log classification: Surface nginx error log severity rates to spot upstream prematurely closed connection or connect() failed patterns as they emerge.
  • Per-worker resource usage: Track file descriptor utilization and worker CPU per process to rule out nginx-side resource exhaustion before blaming the upstream.
  • DNS resolution monitoring: For dynamic upstreams, flag resolver-related error log entries and correlate them with 502 response rate increases.