nginx connect() failed (111: Connection refused) while connecting to upstream
HTTP 502 Bad Gateway together with the error connect() failed (111: Connection refused) while connecting to upstream means nginx reached the upstream IP, but the target port actively refused the TCP connection. Either the backend is not running, it is not listening on the address and port nginx expects, or a firewall is rejecting the connection with a TCP reset.
This is distinct from upstream timed out (110: Connection timed out). A timeout means the TCP SYN never received a response, usually because a firewall silently dropped the packet or the host is unreachable. Errno 111 means the network path is open but no process is accepting connections. The error log includes the upstream address, such as upstream: "fastcgi://127.0.0.1:9000". Read that line first to isolate the exact backend.
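To see whether a log is dominated by refusals (111) or timeouts (110), the two messages can be tallied by errno. The following is a minimal sketch, not a definitive tool: the function name count_connect_errnos is hypothetical, and it assumes the Linux errno values and the stock nginx error message wording quoted above.

```shell
# Tally upstream connect failures by errno from an nginx error log on stdin.
# Assumes Linux errno values: 111 = ECONNREFUSED, 110 = ETIMEDOUT.
count_connect_errnos() {
  grep -oE '(connect\(\) failed|upstream timed out) \([0-9]+:' \
    | grep -oE '[0-9]+' \
    | sort | uniq -c \
    | awk '{ print $2 "=" $1 }'   # prints lines such as 111=42
}
```

A typical invocation would be count_connect_errnos < /var/log/nginx/error.log, adjusting the path if your error_log directive points elsewhere.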
If you have multiple upstream servers, stock nginx retries the request on the next peer because connect failures map to the error condition in proxy_next_upstream, which is in the default retry set. With a single upstream, or if all servers are already marked unavailable, the client gets an immediate 502.
What this means
When nginx proxies a request and the target IP is reachable but no process listens on the target port, connect() returns ECONNREFUSED (111). The client receives HTTP 502.
The error log contains the upstream address, for example upstream: "http://10.0.1.10:8080". Use this to identify the exact peer that failed.
Stock nginx uses passive health checks with max_fails=1 and fail_timeout=10s. A single 111 counts as one failure. If a server accumulates max_fails failures within fail_timeout, nginx marks it unavailable for the duration of fail_timeout and stops sending traffic to it until that period expires. With a single upstream server, nginx ignores these parameters and keeps attempting connections regardless of recent failures. Connect failures map to the error condition, which is in the default proxy_next_upstream set, so nginx retries on the next peer when multiple servers are configured.
flowchart TD
A[Error 111 in nginx log] --> B{Backend process running?}
B -->|No| C[Restart backend and check for OOM or crash]
B -->|Yes| D{Listening on expected interface and port?}
D -->|No| E[Fix bind address or upstream port in config]
D -->|Yes| F{Reachable from nginx host?}
F -->|No| G[Check firewall rules or container network]
F -->|Yes| H{Using localhost or 127.0.0.1?}
H -->|Yes| I[Replace with explicit IP or service name]
H -->|No| J{IPv6 localhost mismatch?}
J -->|Yes| K[Use explicit 127.0.0.1 instead of localhost]
J -->|No| L[Review max_fails and fail_timeout state]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Backend process stopped or crashed | 502s start immediately after a deploy, restart, or OOM kill; error log points to the expected backend address | Process status on the backend host: ps aux or container runtime status |
| Backend listening on wrong interface or port | Backend process is running, but direct connection from the nginx host still fails with refusal | ss -tlnp on the backend host to verify bound IP and port |
| Container localhost mismatch | nginx runs inside a container and uses 127.0.0.1 or localhost as upstream, reaching its own loopback instead of the host or another container | Container network mode and DNS or service names |
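| Firewall or host ACL rejecting the port | Backend is healthy and listening, but connections from the nginx host are refused. This requires a rule that rejects with a TCP reset; a rule that silently drops packets produces a timeout (110) instead | iptables -L or firewall-cmd --list-all, then test with nc or bash /dev/tcp from the nginx host |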
| IPv6/IPv4 localhost mismatch | Upstream configured as localhost; backend binds only to IPv4 (127.0.0.1) while nginx resolves to [::1] | Use explicit 127.0.0.1 in the upstream definition instead of localhost |
| Port or upstream definition mismatch | nginx points to port 9000 but the backend listens on 3000, or a typo exists in the upstream block | nginx -T output for the upstream server directive |
Quick checks
Run these read-only checks first.
# Check recent upstream connection errors
tail -1000 /var/log/nginx/error.log | grep -E "connect\(\) failed.*111"
# Validate nginx configuration syntax
nginx -t
# Verify backend process is running (example: php-fpm)
ps aux | grep php-fpm | grep -v grep
# Verify listening sockets on the backend host
ss -tlnp
# Test direct TCP connectivity from the nginx host
timeout 2 bash -c "echo > /dev/tcp/<backend-host>/<port>" 2>/dev/null && echo "UP" || echo "DOWN"
# Identify which upstream server failed from error logs
grep "connect() failed (111" /var/log/nginx/error.log | grep -oP 'upstream: "\K[^"]+' | sort | uniq -c | sort -rn
How to diagnose it
1. Read the error log line. Extract the upstream address from the upstream: field. Confirm the timestamp correlates with the 502 responses.
2. Verify backend process health. On the backend host, check that the application process is running. In container environments, check container or pod status. Look for recent OOM kills or crash loops.
3. Verify the listening socket. Run ss -tlnp on the backend host. Look for the expected port in the Local Address:Port column. If the backend is bound to 127.0.0.1:8080 but nginx connects from another host or container, that explains the refusal. If the port does not appear at all, the backend process failed to start or bound to a different port.
4. Test direct connectivity from the nginx host. Use nc -zv <host> <port> or bash -c 'echo > /dev/tcp/<host>/<port>' from the nginx host or container. If this fails with connection refused, the network path is clear but the port is closed.
5. Check container networking. If nginx and the backend are in separate Docker containers or Kubernetes pods, ensure they share a network or use the correct service DNS name. Never use localhost or 127.0.0.1 to reach another container. In Docker Compose, use the service name; in Kubernetes, use the ClusterIP service DNS or the pod IP if headless.
6. Check firewall rules. If the backend is listening and the port is correct, verify that host firewalls or cloud security groups allow traffic from the nginx host to the backend port.
7. Check for IPv6/IPv4 mismatch. If the upstream is defined as localhost, change it to 127.0.0.1 to rule out [::1] vs 127.0.0.1 binding mismatches.
8. Review passive health check state. If you have multiple upstream servers, check whether the failing server was temporarily marked down. Look for no live upstreams or repeated 111 errors to the same peer. With a single upstream, nginx continues sending traffic to a dead backend.
9. Correlate with access log timing. If your access log format includes $upstream_connect_time, a value of - or a sudden spike indicates connection establishment failures. Compare $upstream_response_time and $request_time to confirm the delay is at connect time, not during response transfer.
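The direct connectivity test can be extended to tell a refusal apart from a timeout. This is a rough sketch under stated assumptions: the function names classify_probe_status and probe_upstream are hypothetical, the probe relies on bash's /dev/tcp redirection, and it assumes GNU coreutils timeout, which exits with status 124 when the time limit is hit.

```shell
# Map the exit status of a timed /dev/tcp probe to a diagnosis.
classify_probe_status() {
  case "$1" in
    0)   echo "open" ;;                    # TCP handshake completed
    124) echo "timeout" ;;                 # no reply within the limit (errno 110 territory)
    *)   echo "refused-or-unreachable" ;;  # immediate failure, typically errno 111
  esac
}

# Probe host:port from the nginx host with a 2 second limit.
probe_upstream() {
  host="$1"
  port="$2"
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
  classify_probe_status "$?"
}
```

For example, probe_upstream 10.0.1.10 8080 run from the nginx host or container. An immediate failure can also come from a DNS error or a missing route, so confirm a refusal with nc -zv before concluding that the port is closed.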
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| HTTP 502 rate | Direct user impact from refused upstream connections | Sustained nonzero rate, or a spike after deploys or restarts |
| $upstream_connect_time | Isolates connection establishment from response time; 111 errors appear as failures or spikes | Values of -, or P95 > 100 ms in a local datacenter |
| $upstream_response_time | Reveals if remaining healthy backends are overloaded after peers drop | P95 trending up after upstream failures |
| Error log connect() failed (111) | Exact symptom count and backend identification | Any sustained rate > 0 |
| Active connections in Writing state | Connections pile up waiting for backends, accelerating exhaustion | Writing sustained above 60% of active connections |
| accepts - handled gap | Detects connection drops if upstream failures cascade to saturation | Gap increasing for > 60 seconds |
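Without a metrics pipeline, the 502 rate can be approximated straight from the access log. The sketch below assumes the stock combined log format, where the timestamp is field 4 and the status code is field 9; the function name count_502_per_minute is hypothetical, and a custom log_format would need different field numbers.

```shell
# Count HTTP 502 responses per minute from an access log on stdin.
# Assumes nginx "combined" format: $4 = [dd/Mon/yyyy:HH:MM:SS, $9 = status.
count_502_per_minute() {
  awk '$9 == 502 { print substr($4, 2, 17) }' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'   # prints "dd/Mon/yyyy:HH:MM count"
}
```

Run it as count_502_per_minute < /var/log/nginx/access.log and compare the busiest minutes against deploy or restart times.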
Fixes
Backend process down
Restart the application process. Check backend logs for OOM kills, segfaults, or CrashLoopBackOff events immediately before the 111 errors started. If the backend is managed by systemd, check systemctl status <service> and journalctl -u <service> -n 50. If the backend uses systemd socket activation, verify the socket unit is listening with systemctl status <service>.socket and that the service unit started.
Wrong bind address or port mismatch
Reconfigure the backend to bind to 0.0.0.0:<port> or the specific interface nginx uses. If the backend intentionally binds to 127.0.0.1 for security, nginx must run on the same host and network namespace, or you must use a Unix domain socket. For a Unix socket, ensure the socket file is readable and writable by the nginx worker process user and that the path matches the upstream block exactly. Update the nginx upstream definition if the backend port changed.
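When scanning ss output for bind problems, it helps to label each listening address by how far it is reachable. The helper below is a minimal sketch; bind_scope is a hypothetical name, and the patterns assume the address:port notation that ss prints in its Local Address:Port column.

```shell
# Classify a listening address by reachability.
bind_scope() {
  case "$1" in
    127.*:*|"[::1]":*)        echo "loopback-only" ;;      # same host and namespace only
    0.0.0.0:*|"*":*|"[::]":*) echo "all-interfaces" ;;     # reachable on every interface
    *)                        echo "specific-interface" ;; # bound to one address
  esac
}
```

One way to apply it on the backend host is ss -tlnH | awk '{print $4}' | while read -r addr; do echo "$addr $(bind_scope "$addr")"; done. A loopback-only result for the port nginx targets from another host or container explains the refusal.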
Container networking mismatch
Replace localhost and 127.0.0.1 in upstream definitions with the Docker Compose service name, Kubernetes service DNS name, or the container bridge IP. Verify both containers share the same network or can route to each other.
Firewall or security group blocking
Add an allow rule for the nginx host to reach the backend port. Verify with nc or /dev/tcp from the nginx host after applying the change.
IPv6/IPv4 mismatch
Change upstream definitions from localhost to the explicit IPv4 or IPv6 address that matches the backend bind configuration.
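Before editing the config, it can be useful to confirm which address families localhost resolves to on the nginx host. This sketch summarizes resolver output such as that of getent ahosts localhost; the function name address_families is hypothetical, and it assumes the address is the first field of each input line.

```shell
# Summarize which address families appear in resolver output on stdin.
# Expects one address per line in the first field, as getent ahosts prints.
address_families() {
  awk '{ print ($1 ~ /:/ ? "IPv6" : "IPv4") }' \
    | sort -u \
    | paste -sd, -   # join the unique families with a comma
}
```

For example, getent ahosts localhost | address_families. If both families are reported while the backend listens only on 127.0.0.1, connections that nginx attempts to [::1] are likely to be refused.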
Passive health check flapping
If transient 111 errors occur during rolling updates or brief restarts, consider increasing max_fails from the default of 1 to 3, or raising fail_timeout. Tradeoff: this delays detection of genuine failures. Only tune this if you observe false positives correlated with normal deployment behavior.
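Before tuning, check which servers already carry explicit max_fails or fail_timeout values. The following is a sketch only: list_upstream_servers is a hypothetical helper, and it assumes each upstream block opens with upstream <name> { on one line and closes with } on a line of its own, which is how most configurations are laid out.

```shell
# Print "<upstream name>: <server directive>" for every server line inside
# upstream blocks, reading nginx configuration text from stdin.
list_upstream_servers() {
  awk '
    /^[[:space:]]*upstream[[:space:]]+[^[:space:]]+[[:space:]]*[{]/ {
      name = $2; sub(/[{].*/, "", name); in_up = 1; next
    }
    in_up && /[}]/ { in_up = 0; next }
    in_up && $1 == "server" {
      line = $0; sub(/^[[:space:]]+/, "", line); print name ": " line
    }
  '
}
```

Feed it the full effective configuration with nginx -T 2>/dev/null | list_upstream_servers. Server lines without explicit parameters use the defaults of max_fails=1 and fail_timeout=10s.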
Prevention
Expose a lightweight health check endpoint on every backend so external monitors detect absence before user traffic is affected. Avoid localhost in upstream definitions; use explicit IPs or service names. Manage firewall rules as code and audit them after infrastructure changes. Ensure container orchestration uses shared networking or proper service discovery rather than loopback assumptions. Size upstream pools so that the loss of one backend does not overload the remainder.
How Netdata helps
- Correlate HTTP 502 spikes with error log entries for connect() failed (111) to confirm the pattern.
- Plot $upstream_connect_time to isolate connection refusal from backend slowness.
- Track active connection states (Reading, Writing, Waiting) to detect upstream failure pile-up.
- Alert on the accepts - handled gap to catch connection exhaustion before it cascades.
- Monitor worker process count and file descriptor utilization to rule out nginx-side saturation that mimics upstream problems.
Related guides
- How NGINX actually works in production: a mental model for operators
- nginx 502 Bad Gateway: causes and how to fix it
- NGINX active connections climbing: reading, writing, waiting explained
- NGINX connection exhaustion: detection, diagnosis, and prevention
- NGINX dropped connections: the accepts vs handled gap
- NGINX monitoring checklist: the signals every production server needs
- NGINX monitoring maturity model: from survival to expert
- NGINX slowloris and slow-client attacks: detection and mitigation
- nginx: too many open files - diagnosing file descriptor exhaustion
- nginx: worker_connections are not enough - causes and fixes
- NGINX worker_connections and worker_processes: sizing for real traffic
- NGINX worker_rlimit_nofile: setting file descriptor limits correctly







