nginx 502 Bad Gateway: causes and how to fix it
A 502 Bad Gateway means the upstream server returned an invalid response, refused the connection, or terminated before completing the response. Unlike 504, which signals upstream slowness, 502 means the upstream never produced a valid response or nginx could not reach it.
Start with the error log. A single line like connect() failed (111: Connection refused) tells you the upstream is not listening. A line like upstream prematurely closed connection tells you the backend died mid-request. Match the exact message to the root cause.
This guide maps common failure modes from symptom to fix.
What this means
nginx returns 502 when it acts as a reverse proxy and the upstream server sends an invalid response, refuses the connection, or crashes. The nginx source maps unhandled upstream failures to HTTP 502 by default.
This is different from:
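- 503 Service Unavailable: nginx itself refuses the request, usually because of rate limiting (limit_req) or connection limiting (limit_conn). When every upstream server is unavailable, nginx logs no live upstreams and returns 502, not 503.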
- 504 Gateway Timeout: the upstream did not respond within proxy_read_timeout, or nginx could not establish the connection within proxy_connect_timeout.
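In practice, most 502s fall into four categories: the upstream is unreachable (connection refused), the upstream dies during the request (premature close), the response headers exceed nginx buffer limits (too big header), or nginx cannot resolve the upstream hostname (DNS failure). The table below adds two configuration-specific variants: every upstream marked down, and connections refused with a permission error.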
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Upstream not listening | connect() failed (111: Connection refused) or connect() failed (113: No route to host) in error log; $upstream_status is - | Is the upstream process running and bound to the expected IP/port or Unix socket? |
| Upstream crashed mid-request | upstream prematurely closed connection while reading response header or recv() failed (104: Connection reset by peer) | Upstream application logs for OOM, segfault, or worker kill at the same timestamp. |
| Oversized response headers | upstream sent too big header while reading response header from upstream | proxy_buffer_size (or fastcgi_buffer_size for PHP-FPM) versus actual upstream header size. |
| DNS resolution failure | no resolver defined to resolve backend.example.com or resolver timeout messages; only affects variable-based proxy_pass | resolver directive presence and reachability of the configured DNS server. |
| All upstreams marked down | no live upstreams while connecting to upstream | Whether every server in the upstream block has reached max_fails failures within fail_timeout (nginx's passive health checks). |
| Permission denied (Unix socket or SELinux) | connect() failed (13: Permission denied) while connecting to upstream | Filesystem permissions on the socket file and its parent directory; for TCP upstreams under SELinux, the httpd_can_network_connect boolean. |
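To apply the table to a busy error log, a small classifier helps. The sketch below is a bash function pair, assuming only that each relevant log line contains one of the message fragments quoted in the table (plus the common "could not be resolved" wording for DNS failures). The names classify_502_line and summarize_502_causes are hypothetical, and any wording the function does not recognise is reported as unclassified and left out of the summary.

```bash
# Sketch: map nginx error-log lines to the 502 causes in the table.
# Patterns are the message fragments quoted in the table; anything else
# is reported as "unclassified".
classify_502_line() {
  case $1 in
    *"111: Connection refused"* | *"113: No route to host"*)
      echo "upstream not listening" ;;
    *"upstream prematurely closed connection"* | *"104: Connection reset by peer"*)
      echo "upstream crashed mid-request" ;;
    *"upstream sent too big header"*)
      echo "oversized response headers" ;;
    *"no resolver defined"* | *"could not be resolved"*)
      echo "DNS resolution failure" ;;
    *"no live upstreams"*)
      echo "all upstreams marked down" ;;
    *"13: Permission denied"*)
      echo "permission denied" ;;
    *)
      echo "unclassified" ;;
  esac
}

# Read an error log on stdin and count recognised causes, most frequent first.
summarize_502_causes() {
  local line cause
  while IFS= read -r line; do
    cause=$(classify_502_line "$line")
    if [ "$cause" != "unclassified" ]; then
      printf '%s\n' "$cause"
    fi
  done | sort | uniq -c | sort -rn
}
```

Usage: tail -n 1000 /var/log/nginx/error.log | summarize_502_causes. The loop spawns a subshell per line, which is fine for a few thousand lines but slow on very large logs.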
Quick checks
Run these read-only checks before making changes.
```bash
# Check error log for upstream failures (last 1000 lines)
tail -n 1000 /var/log/nginx/error.log | grep -E "upstream|connect\(\) failed|502"

# Check if upstreams are reachable from the nginx host
for backend in 10.0.1.10:8080 10.0.1.11:8080; do
  timeout 2 bash -c "echo > /dev/tcp/${backend%:*}/${backend#*:}" 2>/dev/null && \
    echo "$backend: UP" || echo "$backend: DOWN"
done

# Check stub_status for connection pressure (requires a stub_status location)
curl -s http://127.0.0.1/stub_status

# Check for connections the kernel silently dropped (listen queue overflows)
nstat -az TcpExtListenOverflows 2>/dev/null | grep ListenOverflows

# Count file descriptors per worker (compare against limits; run as root).
# The PID file path varies by distribution (e.g. /run/nginx.pid).
for pid in $(pgrep -P "$(cat /var/run/nginx.pid)"); do
  echo "Worker $pid: $(ls /proc/"$pid"/fd 2>/dev/null | wc -l) FDs"
done

# Verify nginx workers are running
pgrep -c -P "$(cat /var/run/nginx.pid)"
```
How to diagnose it
Follow these steps in order. Do not restart nginx until you know why the 502 is happening.
- Confirm this nginx instance emitted the 502. Check $upstream_addr in the access log. If there is no upstream address, the failure happened before nginx selected an upstream.
- Read the error log for the exact message. The error log line includes the upstream IP or socket path and the system errno. Match the message to the table above.
- Correlate with access log variables. A log format that includes $upstream_addr, $upstream_status, $upstream_connect_time, and $upstream_response_time shows whether the upstream was reached at all. If $upstream_status is -, nginx never got a response.
- Check upstream health directly. Use curl, nc, or /dev/tcp from the nginx host to the upstream endpoint. If this fails, the problem is upstream, not nginx.
- Check for connection exhaustion. If active connections are near worker_connections * worker_processes, or if the gap between accepts and handled is growing, nginx may drop connections before they reach the upstream phase. See NGINX connection exhaustion.
- Check DNS if using dynamic upstreams. If the configuration uses proxy_pass http://$variable, verify that a resolver directive is present and that the DNS server is reachable. Without a resolver, hostname resolution fails with 502.
- Check for oversized headers. If the error log says too big header, capture the upstream response headers with curl -I and compare their size against proxy_buffer_size.
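The access-log correlation step can be scripted. This sketch assumes a hypothetical log format that appends $upstream_addr $upstream_status $upstream_connect_time $upstream_response_time to the standard combined format, so $status is the ninth whitespace-separated field and the upstream fields are the last four; summarize_502s is a made-up name. Requests that nginx retried on several upstreams log comma-separated lists and are not parsed correctly by this sketch.

```bash
# Sketch: count 502 responses per upstream address and upstream status.
# ASSUMPTION: access log = "combined" format + the four $upstream_* fields
# appended at the end, one upstream attempt per request.
summarize_502s() {
  awk '
    $9 == 502 {
      addr = $(NF - 3)      # $upstream_addr
      ustatus = $(NF - 2)   # $upstream_status
      count[addr " upstream_status=" ustatus]++
    }
    END { for (key in count) print count[key], key }
  ' | sort -rn
}
```

Usage: summarize_502s < /var/log/nginx/access.log. If one address dominates, start with that backend; if failures are spread evenly across every backend, look at what they share, including nginx itself.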
```mermaid
flowchart TD
    A[502 Bad Gateway] --> B{Check error log message}
    B -->|connect refused| C[Upstream not listening]
    B -->|prematurely closed| D[Upstream crashed]
    B -->|too big header| E[Buffer limit exceeded]
    B -->|resolver error| F[DNS resolution failed]
    B -->|no live upstreams| G[All backends down]
    C --> H[Test upstream port or socket]
    D --> I[Check upstream app logs]
    E --> J[Increase proxy_buffer_size]
    F --> K[Verify resolver directive]
    G --> L[Review max_fails and backend health]
```
Fixes
Upstream not listening or refusing connections
If the error log shows 111: Connection refused, the upstream process is down, bound to the wrong address, or exhausted its connection limit.
- Restart the upstream process. Warning: this drops in-flight requests.
- Verify the upstream bind address. In Kubernetes, ensure the container binds to 0.0.0.0, not 127.0.0.1.
- For PHP-FPM, check the FPM error log for server reached pm.max_children setting. Increase pm.max_children or reduce request latency to free workers.
- If the error log shows 13: Permission denied rather than 111 and SELinux is enforcing, run getsebool httpd_can_network_connect and, for TCP upstreams, enable it with setsebool -P httpd_can_network_connect 1 if needed.
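For the PHP-FPM case, counting how often each pool hit its limit shows whether pm.max_children is the bottleneck. A minimal sketch, assuming FPM writes warnings of the form [pool www] server reached pm.max_children setting (5), consider raising it to its error log; the function name is hypothetical, and the log path in the usage line is an example that varies by distribution and PHP version.

```bash
# Sketch: count "server reached pm.max_children" warnings per FPM pool.
# Reads the PHP-FPM error log on stdin; prints "<count> <pool>" lines.
count_max_children_hits() {
  grep -o '\[pool [^]]*] server reached pm\.max_children' \
    | sed 's/^\[pool \([^]]*\)].*/\1/' \
    | sort | uniq -c | sort -rn
}
```

Usage: count_max_children_hits < /var/log/php-fpm/error.log. Warnings clustered around the 502 timestamps point at worker exhaustion rather than a dead process.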
Upstream crashes mid-request
upstream prematurely closed connection or Connection reset by peer means the upstream worker died while processing the request.
- Check upstream application logs for fatal errors, OOM kills, or timeouts that kill workers.
- If the crash correlates with a specific request pattern, reproduce it in a staging environment.
- Ensure the upstream has enough memory and CPU. PHP-FPM workers in particular die when memory limits are hit.
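To test the OOM theory, match kernel OOM-killer messages for the upstream's process name against the 502 timestamps. A sketch, assuming the kernel's "Out of memory: Killed process <pid> (<name>)" wording (older kernels print "Kill process"); find_oom_kills is a hypothetical helper, and journalctl -k or dmesg -T are two common sources for the kernel log.

```bash
# Sketch: list kernel OOM kills whose process name contains $1.
# Handles both "Killed process" (newer kernels) and "Kill process" (older).
# Note: $1 is used as an extended-regex fragment, so escape metacharacters.
find_oom_kills() {
  local name=$1
  grep -E "Out of memory: Kill(ed)? process [0-9]+ \([^)]*${name}[^)]*\)"
}
```

Usage: journalctl -k --since "1 hour ago" | find_oom_kills php-fpm, or dmesg -T | find_oom_kills php-fpm.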
Oversized response headers
If the error log says upstream sent too big header, the upstream response headers exceed the buffer allocated by nginx.
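- Increase proxy_buffer_size (for HTTP upstreams) or fastcgi_buffer_size (for PHP-FPM). The default is one memory page, 4k or 8k depending on the platform.
- A much larger proxy_buffer_size may also require raising proxy_buffers (and proxy_busy_buffers_size, if you set it), because nginx -t rejects busy-buffer sizes that do not fit within the configured buffers; the fastcgi_ directives follow the same rules. The error itself concerns headers only.
- Tradeoff: larger buffers increase per-connection memory usage. Do not set them arbitrarily high.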
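To compare headers against the buffer, measure them straight from the upstream, bypassing nginx. A sketch with hypothetical helper names; it assumes the upstream is reachable over plain HTTP from the nginx host, and the byte count of curl's header dump (status line plus headers, with CRLFs) is only an approximation of what nginx must fit into proxy_buffer_size.

```bash
# Sketch: compare an upstream response-header block with an nginx buffer size.

# Convert an nginx size value such as 4k, 8K, 1m or 512 to bytes.
nginx_size_to_bytes() {
  local v
  v=$(printf '%s' "$1" | tr 'KM' 'km')
  case $v in
    *k) echo $(( ${v%k} * 1024 )) ;;
    *m) echo $(( ${v%m} * 1024 * 1024 )) ;;
    *)  echo $(( v )) ;;
  esac
}

# Read a raw header dump on stdin; $1 is the buffer size (e.g. 4k).
check_header_fit() {
  local limit bytes
  limit=$(nginx_size_to_bytes "$1")
  bytes=$(wc -c | tr -d ' ')
  if [ "$bytes" -le "$limit" ]; then
    echo "fits: ${bytes} bytes <= ${limit}"
  else
    echo "too big: ${bytes} bytes > ${limit}"
  fi
}
```

Usage, with a hypothetical backend and path: curl -s -D - -o /dev/null http://10.0.1.10:8080/some/path | check_header_fit 4k. Using -D - with -o /dev/null sends a normal GET, so the upstream answers as it would for nginx, unlike a HEAD request from curl -I.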
DNS resolution failure
If the error log mentions no resolver defined or DNS timeouts, and you are using a variable in proxy_pass, nginx must resolve the hostname at request time.
- Add a resolver directive in the relevant http, server, or location block. Example: resolver 8.8.8.8 valid=30s;
- Ensure the resolver IP is reachable from the nginx host.
- Tradeoff: by default nginx caches DNS answers for their TTL; valid= overrides that, and a long value hides upstream IP changes until the cache expires. Set this based on your infrastructure’s failover speed.
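One quick way to spot the missing-resolver case is to scan the full configuration that nginx -T prints. The sketch below is deliberately coarse and uses a hypothetical function name: it only checks that some resolver directive exists anywhere when any proxy_pass contains a variable, without checking scope, and it will also flag variable proxy_pass targets that name a defined upstream block (those do not need a resolver).

```bash
# Sketch: warn when proxy_pass uses a variable but no resolver is configured.
# Reads an nginx configuration (e.g. the output of nginx -T) on stdin.
check_dynamic_proxy_pass() {
  awk '
    /^[ \t]*#/ { next }                          # skip comment lines
    /proxy_pass[ \t]+[^;]*\$/ { dynamic++ }      # variable in proxy_pass
    /(^|[ \t;{])resolver[ \t]/ { resolver++ }    # resolver directive
    END {
      if (dynamic > 0 && resolver == 0) {
        print "WARN: proxy_pass uses a variable but no resolver is set"
        exit 1
      }
      print "ok"
    }
  '
}
```

Usage: nginx -T 2>/dev/null | check_dynamic_proxy_pass. The function exits with status 1 when it prints a warning, so it can gate a deploy script.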
All upstreams marked down
If you see no live upstreams, every server in the upstream block has been marked unavailable by nginx’s passive health checking.
- Check all backends directly. Restore at least one.
- Review max_fails and fail_timeout. The defaults (max_fails=1, fail_timeout=10s) are aggressive: a single failure removes a server for 10 seconds. Consider raising max_fails to 3 if your upstreams are stable but occasionally hiccup.
- Add a backup server to the upstream block to receive traffic only when primaries fail.
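To see which upstream group ran out of servers, and how often, count the upstream field on no live upstreams lines. A sketch assuming the usual error-log layout, where nginx appends upstream: "scheme://name/..." to the message; the function name is hypothetical.

```bash
# Sketch: count "no live upstreams" errors per upstream group name.
# Reads the nginx error log on stdin; prints "<count> <group>" lines.
count_no_live_upstreams() {
  grep 'no live upstreams' \
    | grep -o 'upstream: "[a-z]*://[^/"]*' \
    | sed 's|^upstream: "[a-z]*://||' \
    | sort | uniq -c | sort -rn
}
```

Usage: count_no_live_upstreams < /var/log/nginx/error.log. A group that appears here even though its backends answer direct probes is a candidate for a higher max_fails or a backup server.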
Prevention
- Log upstream variables. Include $upstream_addr, $upstream_status, $upstream_response_time, and $upstream_connect_time in your access log format. This makes post-incident correlation trivial.
- Size connection capacity for the proxy multiplier. Every proxied request uses at least two connection slots (client and upstream), so keep active client connections well below half of worker_connections * worker_processes, and remember the default worker_connections is 512.
- Monitor the accepts-handled gap. A growing gap means nginx is dropping connections. This is a leading indicator of connection exhaustion that produces 502s before capacity formally runs out.
- Set resolver caching. If you use dynamic upstreams, always set resolver ... valid=30s to avoid DNS latency and reduce dependency on external resolvers.
- Tune proxy_next_upstream. The default (error timeout) already fails over to the next server when a backend refuses the connection or closes it before sending response headers, provided the upstream block has more than one server. Add http_502 only if a backend itself returns 502 responses, such as another proxy tier. nginx does not retry POST, LOCK, or PATCH requests that were already sent unless you add non_idempotent; only do that if your application handles retries safely.
- Set worker_shutdown_timeout. In environments with frequent reloads, old workers can otherwise linger indefinitely on long-lived connections, causing resource accumulation and unexpected connection behavior.
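The capacity and accepts-handled checks can be derived from one stub_status sample. A sketch, assuming stub_status is exposed as in the quick checks and that you pass your own worker_connections and worker_processes values (the helper name is hypothetical). It halves the capacity to account for the two slots each proxied request uses. The accepts-handled figure is cumulative since nginx started, so compare successive samples to see whether it is growing.

```bash
# Sketch: summarize one stub_status sample against proxy-adjusted capacity.
# Usage: check_stub_status WORKER_CONNECTIONS WORKER_PROCESSES < stub_status_output
check_stub_status() {
  awk -v wc="$1" -v wp="$2" '
    /^Active connections:/ { active = $3 }
    # The line under "server accepts handled requests" holds three counters.
    /^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+/ { accepts = $1; handled = $2 }
    END {
      capacity = wc * wp / 2    # each proxied request uses about 2 slots
      if (capacity <= 0) { print "error: pass worker_connections and worker_processes"; exit 2 }
      printf "active=%d proxy_capacity=%d usage=%.0f%% accepts_minus_handled=%d\n",
             active, capacity, 100 * active / capacity, accepts - handled
    }
  '
}
```

Usage: curl -s http://127.0.0.1/stub_status | check_stub_status 1024 4. Running it on a schedule and diffing accepts_minus_handled between runs turns the cumulative counter into the growth signal described above.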
How Netdata helps
Netdata correlates the signals that matter during a 502 incident:
- 5xx rate and upstream latency: View HTTP 502 spikes alongside upstream response time and connect time trends to see whether the issue is backend refusal, crash, or slowness.
- Connection saturation: Monitor active connections against worker_connections * worker_processes capacity, and watch for the accepts-handled gap that predicts silent connection drops.
- Error log classification: Surface nginx error log severity rates to spot upstream prematurely closed connection or connect() failed patterns as they emerge.
- Per-worker resource usage: Track file descriptor utilization and worker CPU per process to rule out nginx-side resource exhaustion before blaming the upstream.
- DNS resolution monitoring: For dynamic upstreams, flag resolver-related error log entries and correlate them with 502 response rate increases.
Related guides
- How NGINX actually works in production: a mental model for operators
- NGINX active connections climbing: reading, writing, waiting explained
- NGINX connection exhaustion: detection, diagnosis, and prevention
- NGINX dropped connections: the accepts vs handled gap
- NGINX monitoring checklist: the signals every production server needs
- NGINX monitoring maturity model: from survival to expert
- nginx: worker_connections are not enough - causes and fixes
- NGINX worker_connections and worker_processes: sizing for real traffic