nginx 499 status code: why clients close connections before the response
Status 499 in nginx access logs means the client closed the TCP connection before nginx finished responding. It is an nginx-specific code that never reaches the client, so it is easy to dismiss. In practice, a 499 surge is an early warning: users or intermediaries abandon requests before upstreams officially time out and before 5xx errors spike. Ignore 499s and you usually see 502s or 504s minutes later.
This guide shows how to read 499s, separate real user pain from load-balancer noise, and fix the root cause.
What this means
499 is an nginx-specific log-only status code. By the time nginx logs it, the client has already closed the TCP connection. The trigger can be a closed browser tab, a mobile app killing its socket, a CDN edge hitting an idle timeout, or a load-balancer probe giving up early.
Because the client is gone, 499 is invisible to end-user metrics and to your upstream application unless you parse nginx access logs. If nginx logs 499 while proxying, the upstream may still be processing the request. That wastes backend capacity and can leave partial side effects, especially for POST or PUT requests, with no successful response delivered.
Operationally, 499 is the canary that precedes 5xx: clients bail before the server officially times out.
flowchart TD
A[Spike in 499s] --> B{Check request_time}
B -->|Long| C[Client impatience]
B -->|Short| D[Network or LB drop]
C --> E[Check upstream_response_time]
D --> F[Check source IP and LB timeouts]
E --> G[Tune backend or timeouts]
F --> H[Align LB and nginx timeouts]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Slow upstream response | 499s rise with high $request_time and $upstream_response_time | Compare $request_time with $upstream_response_time |
| Intermediary idle timeout | 499s spread evenly across endpoints; $request_time on 499s clusters near the intermediary's idle timeout | Load-balancer or CDN idle timeout vs. proxy_read_timeout |
| Aggressive health checks | Short $request_time, concentrated source IP, no user complaints | Source IP distribution in 499 access logs |
| Network-layer disconnect | Short $request_time, random distribution, no upstream latency trend | Kernel drop counters and connection state breakdown |
Quick checks
Run these read-only checks to characterize the scope before making changes. Adjust field numbers if your log_format differs from the default.
# Count 499s in recent traffic
tail -n 10000 /var/log/nginx/access.log | awk '$9 == 499 {c++} END {print "499 count:", c+0}'
# Distribution of status codes
tail -n 10000 /var/log/nginx/access.log | awk '{print $9}' | sort | uniq -c | sort -rn
# Active connection states (requires stub_status on /nginx_status)
curl -s http://127.0.0.1/nginx_status | awk '/Reading/ {print "R:"$2, "W:"$4, "Wait:"$6}'
# Upstream timeout messages in error log
grep -E 'upstream timed out' /var/log/nginx/error.log | tail -5
# Dropped connections: accepts minus handled (requires stub_status)
curl -s http://127.0.0.1/nginx_status | awk '/^ / {print "dropped:", $1-$2}'
# Top source IPs for 499s
awk '$9 == 499 {print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
How to diagnose it
Quantify the scope. Calculate the 499 rate against total requests. As a rule of thumb, a sustained rate above 1 percent signals active user abandonment; above 5 percent you are in outage territory. A brief spike during a deploy may be normal; a sustained climb is not.
Split by $request_time. This is the most important split. Long $request_time on 499s means clients ran out of patience waiting for a slow upstream. Short $request_time means the connection was killed by a network reset, firewall, or intermediary timeout before upstream latency could be a factor.
Check upstream latency. For the long-request-time bucket, check $upstream_response_time on non-499 requests. If upstream P95 is trending upward and approaching proxy_read_timeout, the backend is the bottleneck. If upstream time is normal but $request_time is long, the delay is client-side or in nginx buffering.
Identify the true client. If nginx sits behind a load balancer or CDN, $remote_addr is the intermediary, not the end user. If 499s cluster around a single internal IP, you are likely looking at health-check probes or internal proxy timeouts rather than user abandonment.
Inspect connection states. Query stub_status. If Writing connections dominate while request rate stays flat, nginx is holding connections open waiting for upstreams. That backlog confirms the impatience pattern. If Reading is high instead, you may be looking at slow clients or a slowloris pattern.
Check for admission loss. Look at the accepts - handled gap in stub_status. If the gap grows while 499s rise, nginx is dropping connections because slots are full. The 499s are a symptom of connection exhaustion, not just slow backends.
Validate timeout alignment. If an intermediary sits in front of nginx, compare its idle timeout to nginx proxy and send timeouts. If the intermediary gives up first, nginx logs 499s even though the upstream is still healthy. Either raise the intermediary timeout above the nginx proxy timeouts, or lower the nginx timeouts so nginx gives up first.
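The first two steps (the 499 rate and the long-vs-short split) can be sketched as a single awk pass. This is a rough sketch under stated assumptions, not a definitive tool: the function name summarize_499 is made up for illustration, the status code is assumed to be field 9 as in the default combined layout, and $request_time is assumed to be appended as the last field of every log line, which is not the nginx default. The 1-second cut-off between "long" and "short" is an arbitrary starting point.

```shell
# summarize_499 [THRESHOLD_SECONDS] < access.log
# Hypothetical helper. Assumptions: status is field 9, and $request_time
# is the LAST field on each line (requires a custom log_format).
summarize_499() {
  awk -v t="${1:-1}" '
    { total++ }
    $9 == 499 {
      n++
      # Bucket each 499 by how long nginx had been working on the request
      if ($NF + 0 >= t) { slow++ } else { fast++ }
    }
    END {
      rate = total ? 100 * n / total : 0
      printf "total=%d 499=%d rate=%.2f%% long=%d short=%d\n", total, n, rate, slow, fast
    }'
}
```

A possible invocation is `tail -n 10000 /var/log/nginx/access.log | summarize_499 1`. A mostly "long" result points at the impatience pattern; a mostly "short" result points at resets or intermediary drops.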
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| 499 rate | Canary for user abandonment before hard 5xx failures | Sustained rate above 1%, or a sudden spike correlating with latency |
| $request_time on 499s | Distinguishes impatience from network drops | Long times mean upstream is too slow; short times mean RST or LB timeout |
| $upstream_response_time | Is the backend actually slow? | P95 trending toward proxy_read_timeout |
| Active connections: Writing | Connections stuck waiting for upstream | Writing above 50% of active with low throughput |
| Dropped connections (accepts - handled) | Connection slot exhaustion | Gap growing while 499s rise |
| Error log: upstream timed out | Confirms backend slowness | Any sustained rate of upstream timeout messages |
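Two of the signals in the table, the Writing share and the accepts - handled gap, can be read from one stub_status response. The sketch below assumes the standard four-line stub_status text output; the function name stub_summary is hypothetical, and the result is a point-in-time snapshot, so trends still need repeated sampling.

```shell
# stub_summary < stub_status output
# Hypothetical helper that parses the plain-text stub_status page.
stub_summary() {
  awk '
    /^Active connections:/ { active = $3 }
    # The counters line starts with a space: accepts handled requests
    /^ +[0-9]/             { dropped = $1 - $2 }
    /^Reading:/            { writing = $4 }
    END {
      if (active > 0) {
        printf "writing=%d active=%d share=%.0f%% dropped=%d\n", writing, active, 100 * writing / active, dropped
      } else {
        print "no active connections reported"
      }
    }'
}
```

For example, `curl -s http://127.0.0.1/nginx_status | stub_summary`, assuming stub_status is exposed on /nginx_status as in the quick checks. A share above 50% with flat throughput matches the warning sign in the table.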
Fixes
Slow upstream responses
If $upstream_response_time is high and 499s correlate with long $request_time, the backend is too slow. You can temporarily reduce proxy_read_timeout so nginx fails faster and frees connection slots. The tradeoff is that slow requests from clients who are still connected will return 504 Gateway Timeout instead of eventually becoming 499s. That is usually preferable because it releases resources faster and gives clients a clear error instead of a dropped connection.
Longer-term, scale the upstream, optimize the slow queries or endpoints, or add caching to reduce backend load.
Intermediary timeout mismatch
If a CDN or load balancer closes the connection before nginx finishes, align the timeouts. Either increase the intermediary idle timeout to be larger than proxy_read_timeout, or decrease proxy_read_timeout so nginx times out before the intermediary closes the connection. Reducing nginx proxy timeouts increases 504s but decreases 499s. Raising the LB timeout is usually the better fix unless the upstream is genuinely too slow.
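One way to check the alignment is to pull the explicit proxy timeouts out of the running configuration and compare them with the intermediary's idle timeout. The sketch below is illustrative only: check_timeouts is a hypothetical name, the intermediary timeout is passed in by hand because nginx cannot know it, and only values written as plain seconds (60 or 60s) are parsed. Directives that are not set do not appear at all; nginx then uses its 60-second default for proxy_read_timeout and proxy_send_timeout.

```shell
# check_timeouts LB_IDLE_SECONDS < output of `nginx -T`
# Hypothetical helper. Only parses values like "60" or "60s";
# other nginx time units (m, h, ...) are reported as unparsed.
check_timeouts() {
  awk -v lb="$1" '
    $1 ~ /^proxy_(read|send)_timeout$/ {
      v = $2
      sub(/;$/, "", v)
      if (v ~ /^[0-9]+s?$/) {
        sub(/s$/, "", v)
        # nginx should time out before the intermediary does
        verdict = (v + 0 < lb + 0) ? "ok" : "intermediary gives up first"
        printf "%s %ss: %s\n", $1, v, verdict
      } else {
        printf "%s %s: unparsed\n", $1, v
      }
    }'
}
```

For example, with an intermediary idle timeout of 75 seconds: `nginx -T 2>/dev/null | check_timeouts 75`.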
Aggressive health checks
Health-check probes that open a connection and close it without reading the response generate 499s. These are false positives for user-facing incidents. Fix the health check to wait for a valid HTTP response, use a lightweight dedicated endpoint, or filter health-check traffic from your main access log.
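To judge whether 499s are dominated by a single prober, it can help to measure how concentrated they are on one source address. The following is a minimal sketch, assuming the default combined layout (client address in field 1, status in field 9); top_499_source is a made-up name, and if nginx sits behind a load balancer the address is the intermediary, not the end user.

```shell
# top_499_source < access.log
# Hypothetical helper: prints the busiest 499 source and its share of all 499s.
top_499_source() {
  awk '
    $9 == 499 { n++; c[$1]++ }
    END {
      for (ip in c) {
        if (c[ip] > max) { max = c[ip]; top = ip }
      }
      if (n) {
        printf "%s %d/%d (%.0f%%)\n", top, max, n, 100 * max / n
      } else {
        print "no 499s"
      }
    }'
}
```

A single internal address holding most of the 499s, combined with short $request_time, suggests probe noise rather than user abandonment.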
Connection exhaustion
If stub_status shows active connections near worker_connections * worker_processes and the accepts-handled gap is growing, 499s are a side effect of saturation. Remember that each proxied request uses at least two connection slots. Increase worker_connections and ensure worker_rlimit_nofile is high enough to cover them. You can also reduce keepalive_timeout to reclaim idle keepalive slots faster.
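The capacity arithmetic is simple enough to sketch directly. The helper below is hypothetical (proxied_capacity is not an nginx tool) and uses integer math; it applies the rule from the paragraph above that a proxied request consumes at least two slots, and ignores keepalive overhead, so the real ceiling is lower.

```shell
# proxied_capacity WORKER_PROCESSES WORKER_CONNECTIONS [ACTIVE_CONNECTIONS]
# Hypothetical helper; integer arithmetic, keepalive overhead not modelled.
proxied_capacity() {
  total=$(( $1 * $2 ))      # total connection slots across all workers
  active=${3:-0}            # e.g. "Active connections" from stub_status
  echo "max_connections=$total proxied_requests_at_most=$(( total / 2 )) in_use=$(( 100 * active / total ))%"
}
```

For example, `proxied_capacity 4 1024 3900` describes a server with 4 workers and 1024 connections each that currently reports 3900 active connections.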
Prevention
- Include $request_time, $upstream_response_time, $upstream_connect_time, and $upstream_header_time in your access log format so you can always run the long-vs-short split.
- Ensure that any load-balancer or CDN idle timeout in front of nginx exceeds nginx proxy and send timeouts.
- Size worker_connections for the proxy multiplier: effective proxied capacity is at most half the configured maximum, minus keepalive overhead.
- Monitor 499 rate as an early-warning metric. It should trend with latency before 5xx errors appear.
- Set client_header_timeout and client_body_timeout appropriately so slow clients do not hold slots indefinitely.
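A quick way to audit the first item is to check whether the timing variables appear anywhere in the effective configuration. This is a crude sketch: check_log_format is a hypothetical name, and it only looks for the variable names in the whole config dump, so a match does not prove the variable is in the log_format actually used by a given access_log.

```shell
# check_log_format < output of `nginx -T`
# Hypothetical helper: reports which timing variables appear in the config text.
check_log_format() {
  conf=$(cat)
  for v in request_time upstream_response_time upstream_connect_time upstream_header_time; do
    # -F matches the literal string, including the leading dollar sign
    if printf '%s\n' "$conf" | grep -qF -- "\$$v"; then
      echo "$v: present"
    else
      echo "$v: missing"
    fi
  done
}
```

For example: `nginx -T 2>/dev/null | check_log_format`. Any "missing" line means the long-vs-short split cannot be fully reproduced from the access log as configured.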
How Netdata helps
- Correlates 499 spikes with upstream response time and active connection state charts in real time.
- Surfaces 5xx and 499 rate anomalies together so you see the canary before the cascade.
- Tracks the accepts vs handled gap to alert on connection exhaustion that amplifies 499s.
- Breaks down nginx error log rates to reveal upstream timeout patterns behind the 499s.
Related guides
- How NGINX actually works in production: a mental model for operators
- nginx 502 Bad Gateway: causes and how to fix it
- nginx 503 Service Temporarily Unavailable: causes and fixes
- nginx 504 Gateway Time-out: causes and fixes
- NGINX active connections climbing: reading, writing, waiting explained
- NGINX backend cascade failure: when slow upstreams take down everything
- nginx connect() failed (111: Connection refused) while connecting to upstream
- NGINX connection exhaustion: detection, diagnosis, and prevention
- NGINX dropped connections: the accepts vs handled gap
- NGINX monitoring checklist: the signals every production server needs
- NGINX monitoring maturity model: from survival to expert
- nginx no live upstreams while connecting to upstream: what it means







