Redis connected_clients climbing: connection leak detection

A sustained climb in connected_clients over hours or days while application traffic is flat is a classic Redis connection leak. Each connection costs roughly 10 KB of server-side memory, so ten thousand leaked connections consume ~100 MB independent of your dataset. If the instance is near maxmemory, that overhead can push Redis into eviction or OOM territory.

The default timeout is 0, so idle connections are never closed. Connections orphaned by missing close() calls, misconfigured connection pools, or pub/sub listeners that never unsubscribe accumulate until something explicitly closes them.

What this means

A sustained increase in connected_clients that does not track with instantaneous_ops_per_sec or request volume is a connection leak. Redis does not leak connections; the source is almost always the client application, a proxy, or a forgotten monitoring session.

With timeout 0 (the default), there is no automatic cleanup. Idle TCP connections persist indefinitely. The server maintains per-client query and output buffers. At ~10 KB per connection, 10,000 leaked connections consume ~100 MB of RAM independent of your dataset. If the leak continues, the instance hits maxclients (default 10,000, or lower if the OS file-descriptor limit is constrained). Once the limit is reached, Redis increments rejected_connections and new clients get connection errors.
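
As a rough worked example of the arithmetic above, the sketch below multiplies a connection count by the ~10 KB per-connection estimate. The function name estimate_conn_mem_mb is illustrative, and real per-client overhead varies with buffer usage and Redis version.

```shell
# Estimate memory held by client connections, in MB.
# kb_per_conn defaults to the rough 10 KB figure used in this article;
# it is an estimate, not a measured value.
estimate_conn_mem_mb() {
  local conns=$1
  local kb_per_conn=${2:-10}
  echo $(( conns * kb_per_conn / 1024 ))
}

estimate_conn_mem_mb 10000   # prints 97, i.e. roughly 100 MB
estimate_conn_mem_mb 1000    # prints 9, i.e. roughly 10 MB
```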

flowchart TD
    A[connected_clients climbing] --> B{Correlates with traffic?}
    B -->|Yes| C[Capacity issue or organic growth]
    B -->|No| D[Connection leak]
    D --> E[Check CLIENT LIST idle]
    E --> F{Many high-idle connections?}
    F -->|Yes| G[Identify source addr and timeout config]
    F -->|No| H[Check blocked_clients and pub/sub]
    G --> I[Fix application or set timeout]
    H --> J[Investigate blocking commands or subscriber leak]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Application connection pool leak | connected_clients climbs steadily while ops/sec stays flat; often after a code deploy or exception-path change | CLIENT LIST sorted by idle; many connections from the same application host with high idle |
| Missing close() in exception handlers | Connections spike during error storms and never drop; high total_connections_received rate but flat throughput | Application exception logs correlated with total_connections_received jumps |
| Pub/Sub subscriber leak | Subscribers accumulate; pubsub_channels or pubsub_patterns grows; subscribers are exempt from timeout | PUBSUB NUMSUB and CLIENT LIST flags containing P |
| Proxy or load-balancer pooling issue | All connections appear from a single source IP (the proxy); low idle but high total count | CLIENT LIST addr field showing single LB IP; proxy connection pool config |
| timeout left at default 0 | Every connection ever opened is still present; age is high but cmd is empty or old | CONFIG GET timeout |

Quick checks

Run these read-only checks to confirm the leak and assess proximity to the limit.

# Check current connection count and hard limit
redis-cli INFO clients | grep -E "connected_clients|blocked_clients"
redis-cli CONFIG GET maxclients
# Measure connection churn: sample total_connections_received twice, 10s apart
redis-cli INFO stats | grep total_connections_received
# Inspect idle times and source addresses
redis-cli CLIENT LIST
# Check whether timeout is disabled (default 0)
redis-cli CONFIG GET timeout
# Check if connections are already being rejected
redis-cli INFO stats | grep rejected_connections
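# Check for pub/sub subscribers, which are exempt from timeout
# (channel and pattern counts are reported under INFO stats)
redis-cli INFO stats | grep -E "pubsub_channels|pubsub_patterns"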
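
The two-sample churn measurement can be scripted. This is a minimal sketch, assuming redis-cli reaches the target instance with its default host and port (add -h, -p, or -a as your environment requires). The function names total_connections and connections_per_sec are hypothetical helpers, not part of Redis.

```shell
# Read the cumulative counter from INFO stats.
# tr strips the carriage returns that INFO output carries.
total_connections() {
  redis-cli INFO stats | tr -d '\r' |
    awk -F: '$1 == "total_connections_received" { print $2 }'
}

# New connections per second between two samples taken
# `interval` seconds apart (integer division).
connections_per_sec() {
  local first=$1 second=$2 interval=$3
  echo $(( (second - first) / interval ))
}

# Usage (requires a reachable Redis instance):
#   a=$(total_connections); sleep 10; b=$(total_connections)
#   connections_per_sec "$a" "$b" 10
```

Compare the result with the concurrency you expect from the application. A steady rate of new connections from a pooled client usually means the pool is not reusing connections.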

How to diagnose it

  1. Correlate connections with traffic. Pull connected_clients and instantaneous_ops_per_sec for the same time window. If the connection count climbs while throughput is flat, the growth is not driven by load.

  2. Measure churn with total_connections_received. This is a cumulative counter. Sample it twice over a known interval. If the rate is elevated while your application concurrency is stable, the application is opening new connections faster than it closes them.

  3. Inspect CLIENT LIST for idle connections. Look for many connections with large idle values (seconds since last command). If idle exceeds any reasonable application command interval, the connection is stale. Note the addr field to attribute leaks to specific application hosts or proxies.

  4. Check the age field. age is seconds since the connection was opened. A cluster of connections with similar age that never drops suggests a one-time event, like a deploy, created a batch of orphaned connections.

  5. Verify timeout configuration. Run CONFIG GET timeout. If the value is 0, idle connections will never be closed automatically. This is the default and the most common reason leaks accumulate.

  6. Identify Pub/Sub subscribers. Subscribers have a flags value containing P. Pub/Sub clients ignore timeout because idle subscription is expected behavior. If subscriber counts grow without bound, the application is subscribing and never unsubscribing or exiting.

  7. Calculate true connection capacity. On a primary with replicas or in Cluster mode, include connected_slaves and cluster_connections in the numerator: (connected_clients + connected_slaves + cluster_connections) / maxclients. If this ratio is above 0.8, you are close to rejection even if the raw client count looks healthy.

  8. Check rejected_connections. Any increase here means the leak has already caused client-visible errors. This is a lagging indicator, but it confirms severity.
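
The CLIENT LIST inspection in steps 3 and 4 is tedious by eye once there are thousands of lines. The sketch below groups stale connections by source host. It assumes the usual space-separated key=value format of CLIENT LIST output; stale_by_host and the 300-second default threshold are illustrative choices, not Redis features.

```shell
# Count connections per client host whose idle time is at least
# min_idle seconds. Reads CLIENT LIST output on stdin and prints
# "count host" lines, busiest host first.
stale_by_host() {
  awk -v min_idle="${1:-300}" '
    {
      addr = ""; idle = -1
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^addr=/) addr = substr($i, 6)
        else if ($i ~ /^idle=/) idle = substr($i, 6) + 0
      }
      sub(/:[0-9]+$/, "", addr)   # drop the ephemeral client port
      if (addr != "" && idle >= min_idle) stale[addr]++
    }
    END { for (h in stale) print stale[h], h }
  ' | sort -rn
}

# Usage (requires a reachable Redis instance):
#   redis-cli CLIENT LIST | stale_by_host 600
```

A single host dominating the output points at one application instance or proxy. An even spread across hosts suggests a defect shared by every instance of the application.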

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| connected_clients | Absolute connection count | Sustained climb uncorrelated with traffic |
| total_connections_received rate | Connection churn | Rate increases while instantaneous_ops_per_sec stays flat |
| CLIENT LIST idle | Identifies stale connections | Many connections with idle far exceeding the normal command interval |
| rejected_connections | Hard limit reached | Any rate > 0 |
| Connection capacity ratio | True headroom including replicas/cluster | (clients + slaves + cluster) / maxclients > 0.8 |
| client_recent_max_output_buffer | Memory pressure from buffers | Growing output buffer memory per client |
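
The connection capacity ratio is simple to compute once the four inputs are collected. A minimal sketch follows; capacity_ratio is a hypothetical helper, and the values in the example call are made up. Take connected_clients, connected_slaves, and cluster_connections from INFO (pass 0 for cluster_connections if your Redis version does not report it) and maxclients from CONFIG GET maxclients.

```shell
# Connection capacity ratio:
#   (connected_clients + connected_slaves + cluster_connections) / maxclients
capacity_ratio() {
  awk -v c="$1" -v s="$2" -v b="$3" -v m="$4" \
    'BEGIN { printf "%.2f\n", (c + s + b) / m }'
}

# Example with made-up values: 7800 clients, 2 replicas,
# 198 cluster bus connections, maxclients 10000.
capacity_ratio 7800 2 198 10000   # prints 0.80
```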

Fixes

Immediate relief: kill stale connections

If you are near maxclients and need to buy time, identify and kill the stalest connections. This is disruptive to those clients, which will need to reconnect, but it is safe for the server.

# Inspect CLIENT LIST to find the stale connection addr
redis-cli CLIENT LIST
# Kill a specific connection by address
redis-cli CLIENT KILL <ip:port>
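
When many connections from one host are stale, killing them one address at a time is slow. The sketch below only prints CLIENT KILL ID commands so they can be reviewed before anything is disconnected. kill_commands is a hypothetical helper, and the host and threshold in the usage line are placeholders.

```shell
# Print (do not run) CLIENT KILL commands for connections from one
# host that have been idle at least min_idle seconds.
# Reads CLIENT LIST output on stdin.
kill_commands() {
  awk -v host="$1" -v min_idle="$2" '
    {
      id = ""; addr = ""; idle = -1
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^id=/) id = substr($i, 4)
        else if ($i ~ /^addr=/) addr = substr($i, 6)
        else if ($i ~ /^idle=/) idle = substr($i, 6) + 0
      }
      sub(/:[0-9]+$/, "", addr)   # compare on host only
      if (addr == host && idle >= min_idle)
        print "redis-cli CLIENT KILL ID " id
    }'
}

# Usage: review the output first, then pipe it to sh if it looks right.
#   redis-cli CLIENT LIST | kill_commands 10.0.0.5 3600
```

Killing by ID rather than by address avoids hitting a new connection that has reused the same client port in the meantime.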

Set a non-zero timeout

If the leak is in application code that cannot be deployed immediately, set timeout to a value that matches your longest legitimate idle period (in seconds). Common production values are 300 (5 minutes) or 600 (10 minutes).

redis-cli CONFIG SET timeout 300
# Persist the change
redis-cli CONFIG REWRITE

Tradeoffs: Pub/Sub subscribers are exempt from timeout, so this setting does not clean up subscriber leaks. Clients waiting in a blocking command (BLPOP, WAIT) are not closed while they are blocked. Any other connection that your application legitimately leaves idle for longer than the timeout will be closed and must reconnect. Do not set timeout lower than your application’s longest expected idle connection lifetime without testing.

Increase maxclients (if headroom exists)

If the OS file-descriptor limit allows, you can raise maxclients temporarily:

# Check the file-descriptor limit of the current shell
# (the running redis-server process may have a different limit)
ulimit -n
# Set new limit if OS headroom exists
redis-cli CONFIG SET maxclients 15000
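
To check the limit that applies to the running server process, the sketch below reads the soft "Max open files" value from /proc/<pid>/limits. It assumes Linux and that file's usual column layout; max_clients_for_fd_limit is a hypothetical helper. It subtracts the 32 descriptors that Redis reserves for internal use.

```shell
# Largest maxclients that fits under the soft open-files limit.
# Reads the contents of /proc/<pid>/limits on stdin.
# Columns: "Max open files  <soft>  <hard>  files".
# Prints a meaningless negative number if the limit is "unlimited".
max_clients_for_fd_limit() {
  awk '/^Max open files/ { print $4 - 32 }'
}

# Usage (Linux, assuming one redis-server process on the host):
#   max_clients_for_fd_limit < "/proc/$(pgrep -o redis-server)/limits"
```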

Tradeoffs: This only delays the problem. Each connection still consumes ~10 KB. If the leak continues, you will exhaust memory or file descriptors eventually.

Enable client eviction (Redis 7.0+)

Redis 7.0 introduces maxmemory-clients. When aggregate client memory exceeds the threshold, Redis disconnects the highest-memory clients first. This is a safety net, not a fix for the root cause.

redis-cli CONFIG SET maxmemory-clients 5%

Tradeoffs: Replica and master connections are exempt. Normal clients and Pub/Sub subscribers can be evicted. If you have monitoring connections that must survive, mark them with CLIENT NO-EVICT ON.

Application-level fix

The permanent fix is to ensure every connection path has a corresponding close path. Common defects include:

  • Missing close() in exception handlers.
  • Connection pools created per-request instead of per-process.
  • pubsub objects not unsubscribed before application shutdown.
  • Framework integrations that recreate pools on each worker fork.

Fix the application so connection count tracks active concurrency, not total historical openings.

Prevention

  • Set timeout to a non-zero value in production. Do not rely on the default of 0.
  • Monitor total_connections_received rate alongside connected_clients. Churn without growth in throughput is an early leak indicator.
  • Monitor connection capacity ratio, not just absolute connected_clients. Include replicas and cluster bus connections in the calculation.
  • Use bounded connection pools in application code with a max_connections limit that is known and alertable.
  • Configure maxmemory-clients on Redis 7.0+ as a backstop.
  • Run periodic CLIENT LIST audits during low-traffic windows to establish a baseline of normal idle distributions.

How Netdata helps

  • Correlates connected_clients with instantaneous_ops_per_sec on the same timeline, making leaks visible as a divergence between the two signals.
  • Alerts on rejected_connections increases so you know when the leak has crossed from background noise to client impact.
  • Tracks total_connections_received as a rate, surfacing connection churn without manual sampling.
  • Surfaces memory overhead metrics (used_memory_overhead) alongside connection counts, showing when the leak is consuming meaningful RAM.
  • Provides the connection capacity ratio automatically, accounting for replicas and cluster connections where applicable.