nginx 499 status code: why clients close connections before the response

Status 499 in nginx access logs means the client closed the TCP connection before nginx finished responding. nginx writes this non-standard code only to its own log and never sends it to the client, so it is easy to dismiss. In practice, a 499 surge is often an early warning: users or intermediaries abandon requests before nginx's upstream timeouts fire and before 5xx errors spike. If you ignore 499s, 502s or 504s often follow within minutes.

This guide shows how to read 499s, separate real user pain from load-balancer noise, and fix the root cause.

What this means

499 is not a standard HTTP status; it exists only in nginx access logs. By the time nginx logs it, the client has already closed the TCP connection. The trigger can be a closed browser tab, a mobile app killing its socket, a CDN edge hitting an idle timeout, or a load-balancer probe giving up early.

Because the client is gone, 499s are invisible to end-user metrics and to your upstream application unless you parse nginx access logs. If nginx logs a 499 while proxying, the upstream may still be processing the request. That work wastes backend capacity and can leave partial side effects, especially for POST or PUT requests, even though no response is ever delivered.

Operationally, 499s often act as the canary for 5xx errors: clients give up before the server's own timeouts fire.

flowchart TD
  A[Spike in 499s] --> B{Check request_time}
  B -->|Long| C[Client impatience]
  B -->|Short| D[Network or LB drop]
  C --> E[Check upstream_response_time]
  D --> F[Check source IP and LB timeouts]
  E --> G[Tune backend or timeouts]
  F --> H[Align LB and nginx timeouts]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Slow upstream response | 499s rise with high $request_time and $upstream_response_time | Compare $request_time with $upstream_response_time |
| Intermediary idle timeout | 499s spread across endpoints; $request_time clusters at the intermediary's idle timeout | Load-balancer or CDN idle timeout vs. proxy_read_timeout |
| Aggressive health checks | Short $request_time, concentrated source IP, no user complaints | Source IP distribution of 499s in the access log |
| Network-layer disconnect | Short $request_time, random distribution, no upstream latency trend | Kernel drop counters and connection state breakdown |

Quick checks

Run these read-only checks to characterize the scope before making changes. Field numbers assume the default combined log format ($1 is the client address, $9 the status code); adjust them if your log_format differs.

# Count 499s in recent traffic
tail -n 10000 /var/log/nginx/access.log | awk '$9 == 499 {c++} END {print "499 count:", c+0}'

# Distribution of status codes
tail -n 10000 /var/log/nginx/access.log | awk '{print $9}' | sort | uniq -c | sort -rn

# Active connection states (requires stub_status on /nginx_status)
curl -s http://127.0.0.1/nginx_status | awk '/Reading/ {print "R:"$2, "W:"$4, "Wait:"$6}'

# Upstream timeout messages in error log
grep -E 'upstream timed out' /var/log/nginx/error.log | tail -5

# Dropped connections: accepts minus handled (requires stub_status)
curl -s http://127.0.0.1/nginx_status | awk '/^ / {print "dropped:", $1-$2}'

# Top source IPs for 499s
awk '$9 == 499 {print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
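A single count cannot tell a brief spike (for example during a deploy) from a sustained climb, so it helps to bucket 499s per minute. The sketch below assumes the default combined log format, where $4 holds the timestamp and $9 the status, and a log written in chronological order; the function name rate_per_minute is hypothetical.

```shell
# rate_per_minute: read combined-format access log lines on stdin and print,
# for each minute in first-seen order, total requests, 499s, and the 499 rate.
# Assumes $4 looks like "[10/Oct/2024:13:55:36" and $9 is the status code.
rate_per_minute() {
  awk '
    {
      m = substr($4, 2, 17)                  # "10/Oct/2024:13:55"
      if (!(m in total)) order[++n] = m      # keep log order, no date sorting needed
      total[m]++
      if ($9 == 499) bad[m]++
    }
    END {
      for (i = 1; i <= n; i++) {
        m = order[i]
        printf "%s total=%d 499=%d rate=%.2f%%\n", m, total[m], bad[m] + 0, 100 * bad[m] / total[m]
      }
    }'
}
```

For example, tail -n 10000 /var/log/nginx/access.log | rate_per_minute | tail -n 15 shows the last fifteen minutes. A rate that stays elevated across many consecutive minutes matters more than one hot minute.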

How to diagnose it

  1. Quantify the scope. Calculate the 499 rate against total requests. As a rule of thumb, a sustained rate above 1 percent signals active user abandonment; above 5 percent you are in outage territory. A brief spike during a deploy may be normal; a sustained climb is not.

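  2. Split by $request_time. This is the most important split. Long $request_time on 499s means clients ran out of patience waiting for a slow upstream. Short $request_time means something dropped the connection almost immediately: a network reset, a firewall, a probe, or an intermediary with a very short timeout. If $request_time on 499s clusters at one fixed value, compare that value with the idle timeouts of any intermediaries in front of nginx.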

  3. Check upstream latency. For the long-request-time bucket, compare $upstream_response_time on non-499 requests. If upstream P95 is trending upward and approaching proxy_read_timeout, the backend is the bottleneck. If upstream time is normal but $request_time is long, the delay is client-side or in nginx buffering.

  4. Identify the true client. If nginx sits behind a load balancer or CDN, $remote_addr is the intermediary, not the end user. Log $http_x_forwarded_for or configure the realip module to recover the original client address. If 499s cluster around a single internal IP, you are likely looking at health-check probes or internal proxy timeouts rather than user abandonment.

  5. Inspect connection states. Query stub_status. If Writing connections dominate while request rate stays flat, nginx is holding connections open waiting for upstreams. That backlog confirms the impatience pattern. If Reading is high instead, you may be looking at slow clients or a slowloris pattern.

  6. Check for admission loss. Look at the accepts - handled gap in stub_status. If the gap grows while 499s rise, nginx is dropping connections because slots are full. The 499s are a symptom of connection exhaustion, not just slow backends.

  7. Validate timeout alignment. If an intermediary sits in front of nginx, compare its idle timeout to nginx proxy and send timeouts. If the intermediary gives up first, nginx logs 499s even though the upstream is still healthy. Either set the intermediary timeout larger than the nginx proxy timeouts, or lower the nginx timeouts below it.
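Steps 1 to 3 can be scripted against the access log. This is a minimal sketch under a hypothetical log layout: the default combined fields followed by $request_time and $upstream_response_time as the last two fields, with one upstream attempt per request (nginx writes a comma-separated list when it retries upstreams, which this parser does not handle). The 5-second cutoff for a "long" request and the function names are illustrative, and the verdict only applies the 1%/5% rule of thumb to the sample you feed in; it cannot tell whether the rate is sustained.

```shell
# Assumed layout (hypothetical): combined format + "$request_time $upstream_response_time",
# so the status is $9, request time is $(NF-1), and upstream time is $NF.

# triage_499 [long_seconds]: 499 rate with a rule-of-thumb verdict, then the
# long vs short $request_time split for 499s.
triage_499() {
  awk -v long="${1:-5}" '
    { total++ }
    $9 == 499 {
      c++
      if ($(NF-1) + 0 >= long + 0) { l++ } else { s++ }
    }
    END {
      if (total == 0) { print "no requests"; exit }
      rate = 100 * c / total
      if (rate > 5) { verdict = "outage territory" }
      else if (rate > 1) { verdict = "active user abandonment" }
      else { verdict = "below 1%" }
      printf "total=%d 499=%d rate=%.2f%% (%s)\n", total, c + 0, rate, verdict
      printf "499 long=%d short=%d (long means >= %ss)\n", l + 0, s + 0, long
    }'
}

# upstream_p95: nearest-rank P95 of $upstream_response_time over non-499
# requests that reached an upstream (the field is not "-").
upstream_p95() {
  awk '$9 != 499 && $NF != "-" { print $NF }' |
    sort -n |
    awk '
      { v[NR] = $1 }
      END {
        if (NR == 0) { print "no upstream samples"; exit }
        i = int(0.95 * NR)
        if (i < 0.95 * NR) i++               # ceil(0.95 * n)
        print "upstream p95:", v[i]
      }'
}
```

For example, tail -n 10000 /var/log/nginx/access.log | triage_499 5, followed by the same pipeline into upstream_p95. Compare the P95 against your proxy_read_timeout to decide whether the backend is the bottleneck.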

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| 499 rate | Canary for user abandonment before hard 5xx failures | Sustained rate above 1%, or a sudden spike correlating with latency |
| $request_time on 499s | Distinguishes impatience from network drops | Long times mean upstream is too slow; short times mean RST or LB timeout |
| $upstream_response_time | Is the backend actually slow? | P95 trending toward proxy_read_timeout |
| Active connections: Writing | Connections stuck waiting for upstream | Writing above 50% of active with low throughput |
| Dropped connections (accepts - handled) | Connection slot exhaustion | Gap growing while 499s rise |
| Error log: upstream timed out | Confirms backend slowness | Any sustained rate of upstream timeout messages |

Fixes

Slow upstream responses

If $upstream_response_time is high and 499s correlate with long $request_time, the backend is too slow. You can temporarily reduce proxy_read_timeout so nginx fails faster and frees connection slots. The tradeoff is that slow requests from clients who are still connected will return 504 Gateway Timeout instead of eventually becoming 499s. That is usually preferable because it releases resources faster and gives clients a clear error instead of a dropped connection.

Longer-term, scale the upstream, optimize the slow queries or endpoints, or add caching to reduce backend load.
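To decide which endpoints to optimize first, one option is to rank request paths by how many 499s they produce after a long wait. This sketch assumes a hypothetical log layout with the default combined fields ($7 is the request URI, $9 the status) and $request_time as the second-to-last field; the function name and the 5-second default are illustrative.

```shell
# top_slow_499_paths [long_seconds] [n]: count 499s whose $request_time is at
# least long_seconds, grouped by request path with the query string removed.
top_slow_499_paths() {
  awk -v long="${1:-5}" '
    $9 == 499 && $(NF-1) + 0 >= long + 0 {
      path = $7
      sub(/\?.*$/, "", path)                 # drop the query string
      print path
    }' |
    sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

For example, top_slow_499_paths 5 < /var/log/nginx/access.log lists the ten paths with the most long-wait 499s, which are good candidates for query optimization or caching.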

Intermediary timeout mismatch

If a CDN or load balancer closes the connection before nginx finishes, align the timeouts. Either increase the intermediary idle timeout to be larger than proxy_read_timeout, or decrease proxy_read_timeout so nginx times out before the intermediary closes the connection. Reducing nginx proxy timeouts increases 504s but decreases 499s. Raising the LB timeout is usually the better fix unless the upstream is genuinely too slow.
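One way to compare the two sides is to read the effective configuration with nginx -T and check each proxy_read_timeout and proxy_send_timeout against the intermediary's idle timeout, which you have to look up in the load balancer or CDN settings yourself. This is a rough sketch with a hypothetical function name: it understands only plain seconds or a single ms, s, m, or h suffix, falls back to nginx's 60s default when a directive is absent, and does not know which location block serves a given request.

```shell
# check_timeout_alignment LB_IDLE_SECONDS
# Reads nginx configuration text (e.g. from nginx -T) on stdin and flags any
# proxy_read_timeout / proxy_send_timeout that is not below the intermediary
# idle timeout. Exits 1 if anything is flagged.
check_timeout_alignment() {
  awk -v lb="$1" '
    # Convert a bare number or a single-unit nginx time ("500ms", "90s", "5m", "1h") to seconds.
    function secs(v) {
      if (v ~ /ms$/) return substr(v, 1, length(v) - 2) / 1000
      if (v ~ /s$/)  return substr(v, 1, length(v) - 1) + 0
      if (v ~ /m$/)  return substr(v, 1, length(v) - 1) * 60
      if (v ~ /h$/)  return substr(v, 1, length(v) - 1) * 3600
      return v + 0
    }
    function check(name, t) {
      if (t >= lb + 0) { printf "WARN %s=%ss >= LB idle %ss\n", name, t, lb; bad = 1 }
      else printf "ok   %s=%ss < LB idle %ss\n", name, t, lb
    }
    $1 == "proxy_read_timeout" || $1 == "proxy_send_timeout" {
      val = $2
      sub(/;.*$/, "", val)                   # strip the trailing semicolon
      seen[$1] = 1
      check($1, secs(val))
    }
    END {
      # Both directives default to 60s when not set anywhere.
      if (!("proxy_read_timeout" in seen)) check("proxy_read_timeout(default)", 60)
      if (!("proxy_send_timeout" in seen)) check("proxy_send_timeout(default)", 60)
      exit (bad ? 1 : 0)
    }'
}
```

For example, nginx -T 2>/dev/null | check_timeout_alignment 60, where 60 stands in for whatever idle timeout your intermediary actually uses. Any WARN line is a place where the intermediary can close the connection first and turn a slow request into a 499.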

Aggressive health checks

Health-check probes that open a connection and close it without reading the response generate 499s. These are false positives for user-facing incidents. Fix the health check to wait for a valid HTTP response, use a lightweight dedicated endpoint, or filter health-check traffic from your main access log.
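Health-check noise typically shows up as 499s with a very short $request_time from a handful of addresses. The sketch below lists the top source addresses for such 499s. It assumes the default combined fields with $request_time as the second-to-last field (a hypothetical layout); the 0.5-second cutoff and the function name are illustrative choices, not standard values.

```shell
# short_499_sources [cutoff_seconds] [n]: top client addresses ($1) for 499s
# whose $request_time (second-to-last field, assumed layout) is below cutoff.
short_499_sources() {
  awk -v cutoff="${1:-0.5}" '$9 == 499 && $(NF-1) + 0 < cutoff + 0 { print $1 }' |
    sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

If one internal address dominates the output, check whether it belongs to a load balancer or monitoring system before treating the 499s as user-facing.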

Connection exhaustion

If stub_status shows active connections near worker_connections * worker_processes and the accepts-handled gap is growing, 499s are a side effect of saturation. Remember that each proxied request uses at least two connection slots. Increase worker_connections and ensure worker_rlimit_nofile is high enough to cover them. You can also reduce keepalive_timeout to reclaim idle keepalive slots faster.
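The arithmetic behind this check can be scripted against stub_status output. This is a sketch, not part of nginx: the function names and default URL are hypothetical, you pass worker_connections and the number of worker processes yourself (read them from your config), it treats half their product as the proxied ceiling because each proxied request holds a client and an upstream connection, and it flags Writing above 50% of active connections, matching the warning sign in the signals table.

```shell
# stub_status_report WORKER_CONNECTIONS WORKER_PROCESSES
# Reads one stub_status snapshot on stdin and prints saturation figures.
stub_status_report() {
  awk -v wc="$1" -v wp="$2" '
    /^Active connections:/      { active = $3 }
    /^ *[0-9]+ +[0-9]+ +[0-9]+/ { accepts = $1; handled = $2 }
    /^Reading:/                 { writing = $4 }
    END {
      max = wc * wp
      if (max <= 0) { print "need WORKER_CONNECTIONS and WORKER_PROCESSES"; exit 1 }
      ceiling = int(max / 2)      # each proxied request holds a client and an upstream slot
      printf "active=%d max=%d proxied_ceiling=%d (%.1f%% of max in use)\n", active, max, ceiling, 100 * active / max
      share = active ? 100 * writing / active : 0
      flag = ""
      if (share > 50) flag = " WARN"
      printf "writing=%d (%.1f%% of active)%s\n", writing, share, flag
      printf "accepts-handled gap=%d\n", accepts - handled
    }'
}

# gap_growth [url] [interval_seconds]: sample stub_status twice and report how
# much the accepts-handled gap grew in between (line 3 holds the counters).
gap_growth() {
  url="${1:-http://127.0.0.1/nginx_status}"
  interval="${2:-10}"
  g1=$(curl -s "$url" | awk 'NR == 3 { print $1 - $2 }')
  sleep "$interval"
  g2=$(curl -s "$url" | awk 'NR == 3 { print $1 - $2 }')
  echo "gap growth over ${interval}s: $(( ${g2:-0} - ${g1:-0} ))"
}
```

For example, curl -s http://127.0.0.1/nginx_status | stub_status_report 1024 4 for a hypothetical four workers with 1024 connections each. A gap that keeps growing across gap_growth runs while 499s rise points to saturation rather than slow backends alone.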

Prevention

  • Include $request_time, $upstream_response_time, $upstream_connect_time, and $upstream_header_time in your access log format so you can always run the long-vs-short split.
  • Ensure that any load-balancer or CDN idle timeout upstream of nginx exceeds nginx proxy and send timeouts.
  • Size worker_connections for the proxy multiplier: effective proxied capacity is at most half the configured maximum, minus keepalive overhead.
  • Monitor 499 rate as an early-warning metric. It typically rises with latency before 5xx errors appear.
  • Keep client_header_timeout and client_body_timeout (both default to 60s) short enough that slow clients cannot hold connection slots for long.
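To confirm the first prevention item is in place, you can scan the effective configuration for log_format definitions and check that each timing variable appears. This is a rough sketch with a hypothetical function name: it reads nginx -T output on stdin, treats everything from a log_format line to the next line ending in a semicolon as the format, and does not verify that access_log actually uses that format.

```shell
# check_log_format: report which timing variables appear in any log_format
# block of the configuration text on stdin. Exits 1 if any are missing.
check_log_format() {
  awk '
    /^[ \t]*log_format[ \t]/ { inlf = 1 }
    inlf {
      buf = buf $0 "\n"
      if ($0 ~ /;[ \t]*$/) inlf = 0          # a log_format ends at its semicolon
    }
    END {
      n = split("$request_time $upstream_response_time $upstream_connect_time $upstream_header_time", want, " ")
      for (i = 1; i <= n; i++) {
        if (index(buf, want[i])) print "present: " want[i]
        else { print "missing: " want[i]; bad = 1 }
      }
      exit (bad ? 1 : 0)
    }'
}
```

For example, nginx -T 2>/dev/null | check_log_format. Any "missing" line means the long-vs-short split or the upstream timing breakdown will not be possible from that log.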

How Netdata helps

  • Correlates 499 spikes with upstream response time and active connection state charts in real time.
  • Surfaces 5xx and 499 rate anomalies together so you see the canary before the cascade.
  • Tracks the accepts vs handled gap to alert on connection exhaustion that amplifies 499s.
  • Breaks down nginx error log rates to reveal upstream timeout patterns behind the 499s.