Redis connected_clients climbing: connection leak detection
A sustained climb in connected_clients over hours or days while application traffic is flat is a classic Redis connection leak. Each connection costs roughly 10 KB of server-side memory, so ten thousand leaked connections consume ~100 MB independent of your dataset. If the instance is near maxmemory, that overhead can push Redis into eviction or OOM territory.
The default timeout is 0, so idle connections are never closed. Connections orphaned by missing close() calls, misconfigured connection pools, or pub/sub listeners that never unsubscribe accumulate forever.
What this means
A sustained increase in connected_clients that does not track with instantaneous_ops_per_sec or request volume is a connection leak. Redis does not leak connections; the source is almost always the client application, a proxy, or a forgotten monitoring session.
With timeout 0 (the default), there is no automatic cleanup. Idle TCP connections persist indefinitely. The server maintains per-client query and output buffers. At ~10 KB per connection, 10,000 leaked connections consume ~100 MB of RAM independent of your dataset. If the leak continues, the instance hits maxclients (default 10,000, or lower if the OS file-descriptor limit is constrained). Once the limit is reached, Redis increments rejected_connections and new clients get connection errors.
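The arithmetic above can be turned into a rough projection. This is a minimal sketch, assuming a steady leak rate and the ~10 KB per-connection rule of thumb from this guide; `leak_projection` and its parameter names are illustrative, not part of Redis or any client library.

```python
def leak_projection(connected_clients, growth_per_hour,
                    maxclients=10_000, kb_per_conn=10):
    """Estimate client memory overhead (decimal MB) and hours until maxclients.

    Assumes the leak grows linearly and each connection costs about
    kb_per_conn KB; real per-client memory varies with buffer usage.
    """
    overhead_mb = connected_clients * kb_per_conn / 1000
    remaining = max(maxclients - connected_clients, 0)
    if growth_per_hour > 0:
        hours_left = remaining / growth_per_hour
    else:
        hours_left = float("inf")  # no growth means the limit is never reached
    return overhead_mb, hours_left


overhead_mb, hours_left = leak_projection(connected_clients=4_000,
                                          growth_per_hour=250)
print(f"~{overhead_mb:.0f} MB of client overhead, "
      f"~{hours_left:.0f} h until maxclients")
```

Treat the result as an order-of-magnitude estimate: if the OS file-descriptor limit caps the effective maxclients below the configured value, the real deadline is sooner.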
flowchart TD
A[connected_clients climbing] --> B{Correlates with traffic?}
B -->|Yes| C[Capacity issue or organic growth]
B -->|No| D[Connection leak]
D --> E[Check CLIENT LIST idle]
E --> F{Many high-idle connections?}
F -->|Yes| G[Identify source addr and timeout config]
F -->|No| H[Check blocked_clients and pub/sub]
G --> I[Fix application or set timeout]
H --> J[Investigate blocking commands or subscriber leak]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Application connection pool leak | connected_clients climbs steadily while ops/sec stays flat; often after a code deploy or exception-path change | CLIENT LIST sorted by idle; many connections from the same application host with high idle |
| Missing close() in exception handlers | Connections spike during error storms and never drop; high total_connections_received rate but flat throughput | Application exception logs correlated with total_connections_received jumps |
| Pub/Sub subscriber leak | Subscribers accumulate; pubsub_channels or pubsub_patterns grows; subscribers are exempt from timeout | PUBSUB NUMSUB and CLIENT LIST flags containing P |
| Proxy or load-balancer pooling issue | All connections appear from a single source IP (the proxy); low idle but high total count | CLIENT LIST addr field showing single LB IP; proxy connection pool config |
| timeout left at default 0 | Every connection ever opened is still present; age is high but cmd is empty or old | CONFIG GET timeout |
Quick checks
Run these read-only checks to confirm the leak and assess proximity to the limit.
# Check current connection count and hard limit
redis-cli INFO clients | grep -E "connected_clients|blocked_clients"
redis-cli CONFIG GET maxclients
# Measure connection churn: sample total_connections_received twice, 10s apart
redis-cli INFO stats | grep total_connections_received
sleep 10
redis-cli INFO stats | grep total_connections_received
# Inspect idle times and source addresses
redis-cli CLIENT LIST
# Check whether timeout is disabled (default 0)
redis-cli CONFIG GET timeout
# Check if connections are already being rejected
redis-cli INFO stats | grep rejected_connections
# Check for pub/sub subscribers that are exempt from timeout
redis-cli INFO pubsub
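To turn two INFO samples into a connection rate, something like the following could work. It is a sketch that assumes you have captured the raw text of `redis-cli INFO stats` twice; `parse_info` and `connections_per_second` are hypothetical helper names, and the sample values are made up.

```python
def parse_info(text):
    """Parse INFO output (key:value lines, '#' section headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def connections_per_second(first_sample, second_sample, interval_seconds):
    """Rate of new connections between two INFO stats samples."""
    before = int(parse_info(first_sample)["total_connections_received"])
    after = int(parse_info(second_sample)["total_connections_received"])
    return (after - before) / interval_seconds


# Illustrative samples taken 10 seconds apart (values are invented).
sample_1 = "# Stats\r\ntotal_connections_received:120500\r\nrejected_connections:0\r\n"
sample_2 = "# Stats\r\ntotal_connections_received:120950\r\nrejected_connections:0\r\n"
print(connections_per_second(sample_1, sample_2, 10))
```

A negative result usually means the counter was reset between samples (server restart or CONFIG RESETSTAT), so discard that pair and sample again.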
How to diagnose it
1. Correlate connections with traffic. Pull connected_clients and instantaneous_ops_per_sec for the same time window. If the connection count climbs while throughput is flat, the growth is not driven by load.
2. Measure churn with total_connections_received. This is a cumulative counter. Sample it twice over a known interval. If the rate is elevated while your application concurrency is stable, the application is opening new connections faster than it closes them.
3. Inspect CLIENT LIST for idle connections. Look for many connections with large idle values (seconds since last command). If idle exceeds any reasonable application command interval, the connection is stale. Note the addr field to attribute leaks to specific application hosts or proxies.
4. Check the age field. age is seconds since the connection was opened. A cluster of connections with similar age that never drops suggests a one-time event, like a deploy, created a batch of orphaned connections.
5. Verify timeout configuration. Run CONFIG GET timeout. If the value is 0, idle connections will never be closed automatically. This is the default and the most common reason leaks accumulate.
6. Identify Pub/Sub subscribers. Subscribers have a flags value containing P. Pub/Sub clients ignore timeout because idle subscription is expected behavior. If subscriber counts grow without bound, the application is subscribing and never unsubscribing or exiting.
7. Calculate true connection capacity. On a primary with replicas or in Cluster mode, include connected_slaves and cluster_connections in the numerator: (connected_clients + connected_slaves + cluster_connections) / maxclients. If this ratio is above 0.8, you are close to rejection even if the raw client count looks healthy.
8. Check rejected_connections. Any increase here means the leak has already caused client-visible errors. This is a lagging indicator, but it confirms severity.
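The CLIENT LIST inspection in the steps above can be scripted. The sketch below assumes the output of `redis-cli CLIENT LIST` has been saved as text; `parse_client_list`, `stale_by_host`, and the 300-second threshold are illustrative choices, and the sample lines are abbreviated and invented.

```python
from collections import Counter


def parse_client_list(text):
    """Turn CLIENT LIST output (space-separated key=value pairs) into dicts."""
    clients = []
    for line in text.splitlines():
        fields = dict(pair.partition("=")[::2] for pair in line.split())
        if "id" in fields:
            clients.append(fields)
    return clients


def stale_by_host(clients, idle_threshold=300):
    """Count connections idle for at least idle_threshold seconds, per host.

    Pub/Sub subscribers (flag P) are skipped because sitting idle is
    expected for them.
    """
    counts = Counter()
    for client in clients:
        if "P" in client.get("flags", ""):
            continue
        if int(client.get("idle", 0)) >= idle_threshold:
            host = client["addr"].rsplit(":", 1)[0]  # drop the ephemeral port
            counts[host] += 1
    return counts.most_common()


sample = """\
id=7 addr=10.0.0.5:41000 fd=9 name= age=90000 idle=86000 flags=N db=0 cmd=get
id=8 addr=10.0.0.5:41002 fd=10 name= age=90000 idle=85990 flags=N db=0 cmd=get
id=9 addr=10.0.0.9:52000 fd=11 name= age=120 idle=1 flags=N db=0 cmd=set
id=10 addr=10.0.0.7:53000 fd=12 name= age=70000 idle=69000 flags=P db=0 cmd=subscribe
"""
for host, count in stale_by_host(parse_client_list(sample)):
    print(host, count)
```

Grouping by host rather than by full addr is deliberate: the port is ephemeral, so the host is what ties stale connections back to a specific application server or proxy.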
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| connected_clients | Absolute connection count | Sustained climb uncorrelated with traffic |
| total_connections_received rate | Connection churn | Rate increases while instantaneous_ops_per_sec stays flat |
| CLIENT LIST idle | Identifies stale connections | Many connections with idle far exceeding the normal command interval |
| rejected_connections | Hard limit reached | Any rate > 0 |
| Connection capacity ratio | True headroom including replicas/cluster | (clients + slaves + cluster) / maxclients > 0.8 |
| client_recent_max_output_buffer | Memory pressure from buffers | Growing output buffer memory per client |
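The connection capacity ratio from the table could be computed as in the sketch below. It assumes the INFO fields have already been parsed into a dict of strings; `capacity_ratio` is a hypothetical helper, and fields missing from the dict (for example cluster_connections outside Cluster mode) are treated as zero.

```python
def capacity_ratio(info, maxclients):
    """(connected_clients + connected_slaves + cluster_connections) / maxclients."""
    keys = ("connected_clients", "connected_slaves", "cluster_connections")
    used = sum(int(info.get(key, 0)) for key in keys)
    return used / maxclients


# Invented values for a primary in Cluster mode with two replicas.
info = {
    "connected_clients": "7600",
    "connected_slaves": "2",
    "cluster_connections": "398",
}
ratio = capacity_ratio(info, maxclients=10_000)
print(f"capacity ratio: {ratio:.2f}", "WARNING" if ratio > 0.8 else "ok")
```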
Fixes
Immediate relief: kill stale connections
If you are near maxclients and need to buy time, identify and kill the stalest connections. This is disruptive to those clients, which will need to reconnect, but it is safe for the server.
# Inspect CLIENT LIST to find the stale connection addr
redis-cli CLIENT LIST
# Kill a specific connection by address
redis-cli CLIENT KILL <ip:port>
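When there are many stale connections, generating the kill commands for review can be safer than typing them by hand. This is a sketch only: it assumes client records already parsed into dicts with id, idle, and flags keys, uses the `CLIENT KILL ID` form of the command, and prints commands rather than executing them. The function name, threshold, and limit are illustrative.

```python
def kill_commands(clients, idle_threshold=3600, limit=100):
    """Build redis-cli commands for the stalest normal clients, longest idle first.

    Skips Pub/Sub subscribers (P), replicas (S), and the master link (M).
    """
    protected = {"P", "S", "M"}
    stale = [
        client for client in clients
        if int(client["idle"]) >= idle_threshold
        and not (set(client.get("flags", "")) & protected)
    ]
    stale.sort(key=lambda client: int(client["idle"]), reverse=True)
    return [f"redis-cli CLIENT KILL ID {client['id']}" for client in stale[:limit]]


# Invented records, shaped like parsed CLIENT LIST rows.
clients = [
    {"id": "7", "idle": "86000", "flags": "N"},
    {"id": "9", "idle": "1", "flags": "N"},
    {"id": "10", "idle": "69000", "flags": "P"},
    {"id": "11", "idle": "90000", "flags": "N"},
]
for command in kill_commands(clients):
    print(command)
```

Review the printed list before running anything; the `limit` argument caps how many clients are disconnected in one pass so that reconnect storms stay small.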
Set a non-zero timeout
If the leak is in application code that cannot be deployed immediately, set timeout to a value that matches your longest legitimate idle period (in seconds). Common production values are 300 (5 minutes) or 600 (10 minutes).
redis-cli CONFIG SET timeout 300
# Persist the change
redis-cli CONFIG REWRITE
Tradeoffs: Pub/Sub subscribers are exempt from timeout. Long-running blocking commands (BLPOP, WAIT) reset the idle timer when data arrives. If your application legitimately leaves connections idle for longer than the timeout, those connections will be closed and the client must reconnect. Do not set timeout lower than your application’s longest expected idle connection lifetime without testing.
Increase maxclients (if headroom exists)
If the OS file-descriptor limit allows, you can raise maxclients temporarily:
# Check current OS limit
ulimit -n
# Set new limit if OS headroom exists
redis-cli CONFIG SET maxclients 15000
Tradeoffs: This only delays the problem. Each connection still consumes ~10 KB. If the leak continues, you will exhaust memory or file descriptors eventually.
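Before raising maxclients, it helps to check the target against the file-descriptor limit. The sketch below assumes Redis keeps 32 descriptors for internal use (the figure given in the Redis client-handling documentation); the helper names are hypothetical, and the limit you pass in should be the one that applies to the Redis process, which may differ from `ulimit -n` in your shell.

```python
RESERVED_FDS = 32  # assumption: descriptors Redis reserves for internal use


def max_safe_maxclients(nofile_limit, reserved=RESERVED_FDS):
    """Largest maxclients that fits under the given open-file limit."""
    return max(nofile_limit - reserved, 0)


def can_raise_maxclients(target, nofile_limit):
    """True if the target maxclients fits within the descriptor limit."""
    return target <= max_safe_maxclients(nofile_limit)


for limit in (10_032, 65_535):
    verdict = "fits" if can_raise_maxclients(15_000, limit) else "does not fit"
    print(f"nofile={limit}: maxclients 15000 {verdict}")
```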
Enable client eviction (Redis 7.0+)
Redis 7.0 introduces maxmemory-clients. When aggregate client memory exceeds the threshold, Redis disconnects the highest-memory clients first. This is a safety net, not a fix for the root cause.
redis-cli CONFIG SET maxmemory-clients 5%
Tradeoffs: Replica and master connections are exempt. Normal clients and Pub/Sub subscribers can be evicted. If you have monitoring connections that must survive, mark them with CLIENT NO-EVICT ON.
Application-level fix
The permanent fix is to ensure every connection path has a corresponding close path. Common defects include:
- Missing close() in exception handlers.
- Connection pools created per-request instead of per-process.
- pubsub objects not unsubscribed before application shutdown.
- Framework integrations that recreate pools on each worker fork.
Fix the application so connection count tracks active concurrency, not total historical openings.
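The pattern behind these fixes can be sketched without any Redis client at all: one bounded pool per process, and a release path that runs even when the request fails. `FakeConnection` and `BoundedPool` below are hypothetical stand-ins for illustration, not the API of any real library; a real client's pool will differ in detail.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real client connection."""


class BoundedPool:
    """Illustrative pool that never hands out more than max_connections."""

    def __init__(self, max_connections, factory=FakeConnection):
        self.max_connections = max_connections
        self._idle = queue.LifoQueue(maxsize=max_connections)
        for _ in range(max_connections):
            self._idle.put(factory())

    @property
    def in_use(self):
        return self.max_connections - self._idle.qsize()

    @contextmanager
    def connection(self, timeout=1.0):
        # Raises queue.Empty when exhausted instead of opening more connections.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # runs on success and on exceptions alike


POOL = BoundedPool(max_connections=2)  # created once per process, not per request


def handle_request(fail=False):
    with POOL.connection():
        if fail:
            raise RuntimeError("simulated error path")
        return "ok"
```

Because the release sits in a `finally` block, an exception inside `handle_request` still returns the connection, so the number of connections in use tracks active concurrency rather than historical request count.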
Prevention
- Set timeout to a non-zero value in production. Do not rely on the default of 0.
- Monitor total_connections_received rate alongside connected_clients. Churn without growth in throughput is an early leak indicator.
- Monitor connection capacity ratio, not just absolute connected_clients. Include replicas and cluster bus connections in the calculation.
- Use bounded connection pools in application code with a max_connections limit that is known and alertable.
- Configure maxmemory-clients on Redis 7.0+ as a backstop.
- Run periodic CLIENT LIST audits during low-traffic windows to establish a baseline of normal idle distributions.
How Netdata helps
- Correlates connected_clients with instantaneous_ops_per_sec on the same timeline, making leaks visible as a divergence between the two signals.
- Alerts on rejected_connections increases so you know when the leak has crossed from background noise to client impact.
- Tracks total_connections_received as a rate, surfacing connection churn without manual sampling.
- Surfaces memory overhead metrics (used_memory_overhead) alongside connection counts, showing when the leak is consuming meaningful RAM.
- Provides the connection capacity ratio automatically, accounting for replicas and cluster connections where applicable.
Related guides
- How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
- Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
- Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
- Redis BUSY Redis is busy running a script: blocking Lua and how to recover: /guides/redis/redis-busy-running-script/
- Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
- Redis event loop blocked: when one slow command freezes everything: /guides/redis/redis-event-loop-blocked/
- Redis eviction policy tuning: allkeys-lru vs volatile-ttl vs noeviction: /guides/redis/redis-eviction-policy-tuning/
- Redis fork/COW memory storm: why persistence doubles RSS and OOM-kills the box: /guides/redis/redis-fork-cow-storm/
- Redis KEYS command blocking production: why to replace it with SCAN: /guides/redis/redis-keys-command-blocking-production/
- Redis latency spikes: diagnosis with the LATENCY subsystem: /guides/redis/redis-latency-spikes-diagnosis/
- Redis latest_fork_usec too high: THP, NUMA, and fork latency: /guides/redis/redis-latest-fork-usec-high/
- Redis max number of clients reached: maxclients and rejected_connections: /guides/redis/redis-max-number-of-clients-reached/







