nginx upstream timed out (110: Connection timed out) while connecting/reading

The message upstream timed out (110: Connection timed out) in the nginx error log usually surfaces to clients as a 504 Gateway Timeout. The suffix after the error string tells you which phase failed: connecting, sending, or reading. That phase tells you whether you are looking at a dead backend, a network partition, or a retry storm hiding the real problem.

The defaults are unforgiving. proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout all default to 60 seconds, and proxy_next_upstream implicitly retries on error and timeout. Retries can mask the root cause while exhausting upstream capacity.

This guide maps the exact log message to the directive that fired, reads retry evidence in $upstream_response_time, and fixes the failure without worsening retry behavior.

What this means

nginx distinguishes three timeout phases when talking to an upstream. Each produces a distinct suffix in the error log.

Phase	Error log suffix	Directive	What it measures
Connect	`while connecting to upstream`	`proxy_connect_timeout`	TCP handshake (and TLS handshake if HTTPS). Defaults to 60s. The nginx documentation notes that this timeout cannot usually exceed 75s.
Send	`while sending request to upstream`	`proxy_send_timeout`	Idle time between successive write operations to upstream, not total upload duration. Defaults to 60s.
Read header	`while reading response header from upstream`	`proxy_read_timeout`	Idle time between successive read operations from upstream. Defaults to 60s. If the backend sends headers slowly, this fires.
Read body	`while reading upstream`	`proxy_read_timeout`	Same directive as header read, but fires while streaming the response body.

The client sees a 504 when a timeout ends the request, either because no retry is possible or because every attempt timed out. If the client disconnects first, nginx logs a 499 instead.

Because proxy_next_upstream implicitly defaults to error timeout, nginx may retry the same request on another upstream server when any of these timeouts fires. The retry can succeed and the user never sees an error, but the upstream that timed out is still sick, and the retry adds load to the remaining backends.

flowchart TD
  A[upstream timed out 110] --> B{Which phase?}
  B -->|while connecting| C[Network / TCP / backlog issue]
  B -->|while sending| D[Upstream read stall or buffering]
  B -->|while reading header| E[Backend processing is slow]
  B -->|while reading body| F[Large body / slow transfer]
  C --> G[Check upstream_connect_time and TCP path]
  E --> H[Check upstream_header_time and backend logs]
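
To put numbers on the phase breakdown above, the following sketch counts timeout lines per suffix. It is an illustration under stated assumptions: the function name count_timeout_phases is hypothetical, and the log path in the usage comment depends on your error_log setting.

```shell
#!/usr/bin/env bash
# Count "upstream timed out" error-log lines per phase suffix.
# Reads nginx error-log lines on stdin; prints "<count> <suffix>", highest count first.
count_timeout_phases() {
  awk '
    # Keep only timeout lines, then capture the "while ... upstream" suffix.
    /upstream timed out/ && match($0, /while [a-z ]+upstream/) {
      phase[substr($0, RSTART, RLENGTH)]++
    }
    END { for (p in phase) print phase[p], p }
  ' | sort -rn
}

# Usage (path is an assumption; adjust to your error_log directive):
#   count_timeout_phases < /var/log/nginx/error.log
```

A result dominated by connect-phase lines points at the network or accept path, while header-phase lines point at backend processing time.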

Common causes

Cause	What it looks like	First thing to check
Backend is genuinely slow or dead	Read-phase timeouts dominate; `$upstream_response_time` or `$upstream_header_time` near `proxy_read_timeout`; 504 rate rises	Backend CPU, memory, and application logs
Network partition or firewall between nginx and upstream	Connect-phase timeouts; `$upstream_connect_time` missing or maxed; TCP handshake never completes	Layer-3/4 reachability from the nginx host to the upstream port
Upstream accept queue full	Connect-phase timeouts; backend process is alive but kernel backlog is overflowing	`ss -tlnp` on the upstream and `TcpExtListenOverflows`
Retry storm exhausting upstream capacity	Multiple comma-separated values in `$upstream_response_time`; errors rotate across backends	Whether `proxy_next_upstream_timeout` or `proxy_next_upstream_tries` is unbounded
Keepalive pool exhausted	`$upstream_connect_time` suddenly nonzero and spiking; upstream otherwise healthy	Upstream keepalive connection reuse ratio
Dynamic upstream DNS resolution failing	Variable `proxy_pass` with `resolver`; 502s mixed with timeouts; latency clusters near `resolver_timeout` (default 30s)	`resolver` directive reachability and `valid=` TTL

Quick checks

Run these in order. They are read-only and safe on a live server.

# Identify the exact timeout phase from recent error logs
grep -E 'upstream timed out.*while (connecting|reading|sending)' /var/log/nginx/error.log | tail -20

# Inspect current timeout directive values in the running config
nginx -T 2>/dev/null | grep -E 'proxy_(connect|send|read)_timeout'

# Look for retry evidence in access logs (multiple values in upstream_response_time)
# Assumes $upstream_response_time is the last field of your log_format; nginx joins
# multiple attempts with ", ", so awk's $NF would only ever see the final value
tail -10000 /var/log/nginx/access.log | grep -E ', [0-9.-]+$' | head -10

# Check nginx connection pressure (high Writing + low throughput = slow upstream)
curl -s http://127.0.0.1/stub_status

# Test raw TCP connectivity from nginx to each upstream backend
for backend in 10.0.1.10:8080 10.0.1.11:8080; do
  timeout 2 bash -c "echo > /dev/tcp/${backend%:*}/${backend#*:}" 2>/dev/null && \
    echo "$backend: UP" || echo "$backend: DOWN"
done

# Detect kernel-level accept-queue drops that never reach nginx logs
# (run on the upstream host to check the backend's listen backlog)
nstat -az TcpExtListenOverflows 2>/dev/null | awk '/ListenOverflows/ {print $2}'

# Review proxy_next_upstream settings, including retry limits
# (no proxy_next_upstream_tries / proxy_next_upstream_timeout lines means both are at their unbounded default of 0)
nginx -T 2>/dev/null | grep -E 'proxy_next_upstream'

How to diagnose it

Classify the phase from the error suffix.
while connecting to upstream points to the network or upstream TCP accept path. while reading response header from upstream points to backend application processing. while reading upstream (the response body phase) points to slow payload generation or transfer. while sending request to upstream is rare and usually means the upstream stopped reading.
Correlate with access-log timing.
Log $upstream_response_time, $upstream_connect_time, and $upstream_header_time. If $upstream_header_time is high while the gap between $upstream_response_time and $upstream_header_time is small, the backend is slow to generate the response, not slow to transfer it.
Detect retries from $upstream_response_time punctuation.
A comma separates times for different upstream servers contacted during retries. A colon separates times for different upstream groups when an internal redirect occurred. The last value is the final attempt. The same separators appear in $upstream_addr, so the two variables line up position by position. If you see commas, nginx retried after an earlier attempt failed.
Verify that retries are not making things worse.
Check whether proxy_next_upstream_timeout is set. The default is 0, which disables the limit, so retries can consume unlimited total wall-clock time and a request with several slow attempts can hang for minutes. Also check proxy_next_upstream_tries. The directive has existed since 1.7.5 and also defaults to 0 (no limit), but in practice the number of attempts is capped at the number of servers in the upstream block. With one upstream server, proxy_next_upstream_tries 5 still attempts only once.
Test the upstream directly from the nginx host.
Use curl or a raw TCP connect to bypass nginx entirely. If the direct test also times out, the problem is the backend or the network, not nginx configuration.
Check for 499s preceding 504s.
A spike in 499 status codes means clients are giving up before nginx fires the upstream timeout. This is often the first visible symptom and confirms that total latency is breaching client-side patience thresholds.
Rule out nginx-side saturation.
If active connections are near worker_connections * worker_processes or file descriptors are exhausted, nginx may be too constrained to maintain upstream connections efficiently. Check stub_status and per-worker FD counts.
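
The retry-detection step above can be sketched as a small filter. It assumes a hypothetical log_format that writes the variable as urt="$upstream_response_time"; neither that key name nor the function name list_retried_requests comes from nginx, so adapt both to your own format.

```shell
#!/usr/bin/env bash
# Assumed (hypothetical) log_format fragment:
#   ... urt="$upstream_response_time" ua="$upstream_addr"
# Reads access-log lines on stdin and prints one line per request that
# contacted more than one upstream server (comma-separated times).
list_retried_requests() {
  awk '
    match($0, /urt="[^"]*"/) {
      # Strip the leading urt=" (5 chars) and the closing quote.
      value = substr($0, RSTART + 5, RLENGTH - 6)
      if (index(value, ",") == 0) next          # single attempt, no retry
      n = split(value, parts, /[,:]/)           # commas: retries, colons: group change
      final = parts[n]
      gsub(/ /, "", final)
      print "contacts=" n " final=" final " times=" value
    }
  '
}

# Usage (path is an assumption):
#   tail -10000 /var/log/nginx/access.log | list_retried_requests | head
```

A first value close to the configured timeout followed by a small final value means the retry hid a timed-out backend from the client.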

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`$upstream_response_time`	Isolates backend latency from client slowness	P95 > 80% of `proxy_read_timeout`
`$upstream_connect_time`	Reveals TCP/TLS handshake overhead	Nonzero and spiking when keepalive should reuse connections
`$upstream_header_time`	Shows time to first byte from upstream	Approaching `proxy_read_timeout` while body transfer time remains low
504 rate	Direct client impact of upstream timeouts	Any sustained nonzero rate
499 rate	Clients abandoning before nginx times out	Correlates with rising upstream latency; often precedes 504 spikes
Active connections in Writing state	Connections held waiting for upstream	Writing > 50% of active connections with low request throughput
`$upstream_addr`	Identifies which specific backend is failing	Repeated same-backend failures before a retry succeeds
`proxy_next_upstream_timeout` vs request duration	Total retry budget exhaustion	Requests hanging longer than the primary timeout because retries are unbounded
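
As a rough way to watch the 499 and 504 rows of the table without a metrics pipeline, the sketch below buckets both status codes per minute. It assumes nginx's predefined combined log format, where the status is the ninth whitespace-separated field; the function name count_499_504_per_minute is hypothetical, and a custom log_format will need different field positions.

```shell
#!/usr/bin/env bash
# Reads access-log lines in the "combined" format on stdin.
# Prints "<minute> 499=<n> 504=<n>" for every minute that saw either status.
count_499_504_per_minute() {
  awk '
    $9 == 499 || $9 == 504 {
      # $4 looks like "[10/Oct/2024:13:55:36"; keep day through minute.
      minute = substr($4, 2, 17)
      seen[minute] = 1
      hits[minute, $9]++
    }
    END {
      for (m in seen) printf "%s 499=%d 504=%d\n", m, hits[m, 499], hits[m, 504]
    }
  ' | sort   # lexical sort: chronological within a single day only
}

# Usage (path is an assumption):
#   tail -50000 /var/log/nginx/access.log | count_499_504_per_minute
```

Minutes where 499 counts rise before 504 counts appear match the pattern of clients giving up ahead of the nginx timeout.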

Fixes

Backend is slow or overloaded

Fix the backend. There is no nginx tuning that makes a slow database query fast.

If you need immediate relief, reduce proxy_read_timeout so nginx fails faster and frees connection slots. The tradeoff is more 504s for legitimate long requests. You can also temporarily shrink proxy_next_upstream_tries or remove the timeout keyword from proxy_next_upstream to stop retrying slow backends and avoid amplifying load. If caching is enabled, set proxy_cache_use_stale updating error timeout so nginx serves stale content while the backend recovers.

Network or connect-phase failures

Fix the network path or firewall rule. If the upstream is cross-region and legitimately needs more handshake time, increase proxy_connect_timeout to the minimum necessary value. The nginx documentation notes that this timeout cannot usually exceed 75 seconds, so larger values are unlikely to help. If you are proxying to hostnames resolved per-request, ensure the resolver directive points to a reliable server and add a valid= cache TTL to avoid repeated DNS lookups.

Retry behavior causing cascades

Set proxy_next_upstream_timeout to a finite total budget. The default of 0 means retries can accumulate unlimited wall-clock time. This is a total limit across all attempts, not per-attempt.
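
To confirm whether the retry budget is actually finite, a check along these lines can be run against the dumped configuration. It is a simplified sketch: the function name check_retry_bounds is hypothetical, it reports only the last value it sees rather than resolving per-location inheritance, and it treats only a missing directive or a literal 0 as unbounded.

```shell
#!/usr/bin/env bash
# Reads `nginx -T` output on stdin and reports whether retry limits are finite.
check_retry_bounds() {
  awk '
    $1 == "proxy_next_upstream_timeout" { t = $2; sub(/;$/, "", t); timeout = t }
    $1 == "proxy_next_upstream_tries"   { n = $2; sub(/;$/, "", n); tries = n }
    END {
      # Both directives default to 0, which disables the limit.
      if (timeout == "" || timeout == "0") print "proxy_next_upstream_timeout: UNBOUNDED"
      else print "proxy_next_upstream_timeout: " timeout
      if (tries == "" || tries == "0") print "proxy_next_upstream_tries: UNBOUNDED"
      else print "proxy_next_upstream_tries: " tries
    }
  '
}

# Usage:
#   nginx -T 2>/dev/null | check_retry_bounds
```

As a worked example of why the bound matters: with three upstream servers and the default 60-second proxy_connect_timeout, three failed connection attempts can hold a single request for about 180 seconds when no total budget is set.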

Avoid adding http_500 to proxy_next_upstream unless you understand the interaction with max_fails. The default max_fails is 1 and fail_timeout is 10s. Adding http_500 means a request that returns HTTP 500 can be retried across multiple upstreams; each server that returns 500 is marked failed for 10 seconds. If the underlying bug affects all peers, this can empty the upstream pool.

Do not add non_idempotent to proxy_next_upstream unless the upstream safely handles duplicate non-idempotent requests. Without it, nginx does not pass POST, LOCK, or PATCH requests to the next server once the request has been sent to an upstream. Enabling it risks duplicate side effects if the first attempt partially succeeded.

Capacity and resource limits on nginx

If nginx itself is saturated, timeouts can be a secondary effect. Increase worker_connections and worker_rlimit_nofile to ensure workers can maintain both client and upstream sockets. Each proxied request uses at least two connection slots. Reduce keepalive_timeout on the upstream side if idle connections are filling the upstream’s accept queue.
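
A quick way to gauge saturation is to compare stub_status output with the configured connection capacity. The sketch below assumes you pass worker_processes and worker_connections as plain numbers (resolve auto yourself); the function name connection_pressure is hypothetical. Note that stub_status counts client connections only, so with proxying the real slot usage can be roughly double the reported figure.

```shell
#!/usr/bin/env bash
# Reads stub_status output on stdin.
# Args: <worker_processes> <worker_connections>
connection_pressure() {
  awk -v procs="$1" -v conns="$2" '
    /^Active connections:/ { active = $3 }
    /^Reading:/            { writing = $4 }   # line: Reading: R Writing: W Waiting: K
    END {
      capacity = procs * conns
      if (capacity <= 0) {
        print "usage: connection_pressure <worker_processes> <worker_connections>"
        exit 1
      }
      printf "active=%d capacity=%d used=%.0f%% writing=%.0f%%\n",
        active, capacity, active / capacity * 100,
        (active ? writing / active * 100 : 0)
    }
  '
}

# Usage (URL is an assumption; use wherever stub_status is exposed):
#   curl -s http://127.0.0.1/stub_status | connection_pressure 4 1024
```

A high writing percentage with low request throughput matches the slow-upstream signal in the metrics table, while a high used percentage points at nginx-side limits.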

Prevention

Set proxy_read_timeout based on application SLA, not the default 60s. Endpoints with predictable fast responses should have tight timeouts; genuinely long-polling endpoints should have location-specific overrides.
Log $upstream_response_time, $upstream_connect_time, and $upstream_header_time in your access log. Without them, you cannot distinguish connect latency from processing latency.
Monitor the ratio of P95 $upstream_response_time to proxy_read_timeout. When it crosses 80%, mass timeouts are likely imminent.
Verify upstream keepalive reuse. Without connection reuse, every request pays a TCP handshake tax and contributes to ephemeral port exhaustion.
Set explicit proxy_next_upstream_timeout and proxy_next_upstream_tries to bound retry cost.
Do not rely solely on nginx passive health checking for critical paths. Open-source nginx discovers unhealthy upstreams by sending real user traffic to them. Use external health checks or NGINX Plus active checks if you need probe-based detection.
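
The P95-to-timeout ratio mentioned in this list can be approximated from logged values with a sketch like the one below. It assumes you have already extracted one $upstream_response_time value per line; lines that are not a single number (such as - or comma-separated retry lists) are skipped. The function name p95_vs_timeout is hypothetical, and the 80% threshold is the one this guide uses.

```shell
#!/usr/bin/env bash
# Reads one upstream response time (seconds) per line on stdin.
# Arg: proxy_read_timeout in seconds (defaults to 60).
p95_vs_timeout() {
  local timeout="${1:-60}"
  grep -E '^[0-9]+(\.[0-9]+)?$' | sort -n |
    awk -v timeout="$timeout" '
      { v[NR] = $1 }
      END {
        if (NR == 0) { print "no samples"; exit }
        idx = int((NR * 95 + 99) / 100)      # nearest-rank P95: ceil(0.95 * N)
        pct = v[idx] / timeout * 100
        printf "p95=%.3f timeout=%s ratio=%.0f%% %s\n",
          v[idx], timeout, pct, (pct > 80 ? "WARN" : "ok")
      }
    '
}

# Usage (assumes a log_format where the value is the last field and
# requests without retries; path is an assumption):
#   awk '{print $NF}' /var/log/nginx/access.log | p95_vs_timeout 60
```

Running this per endpoint rather than across the whole log keeps a few long-polling locations from masking tight endpoints that are close to their limit.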

How Netdata helps

Correlate 504 and 499 spikes with upstream response time percentiles to confirm backend degradation versus client impatience.
Monitor active connections and the Writing state ratio to detect upstream slowdown before timeouts fire.
Track file descriptor utilization and connection slot saturation to rule out nginx-side resource exhaustion.
Alert on kernel-level TcpExtListenOverflows and listen backlog depth to catch silent connection drops that never appear in nginx logs.
Parse access-log timing variables to flag P95 upstream latency approaching configured timeout thresholds.
