Redis connected_clients climbing: connection leak detection

A sustained climb in connected_clients over hours or days while application traffic is flat is a classic Redis connection leak. Each connection costs roughly 10 KB of server-side memory, so 10,000 leaked connections consume ~100 MB independent of your dataset. If the instance is near maxmemory, that overhead can push Redis into eviction or OOM territory.

The default timeout is 0, so idle connections are never closed. Connections leaked by missing close() calls, misconfigured connection pools, or pub/sub listeners that never unsubscribe accumulate until something closes them explicitly.

What this means

A sustained increase in connected_clients that does not track with instantaneous_ops_per_sec or request volume is a connection leak. Redis does not leak connections; the source is almost always the client application, a proxy, or a forgotten monitoring session.

With timeout 0 (the default), there is no automatic cleanup. Idle TCP connections persist indefinitely. The server maintains per-client query and output buffers. At ~10 KB per connection, 10,000 leaked connections consume ~100 MB of RAM independent of your dataset. If the leak continues, the instance hits maxclients (default 10,000, or lower if the OS file-descriptor limit is constrained). Once the limit is reached, Redis increments rejected_connections and new clients get connection errors.
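
The memory arithmetic above can be sanity-checked with a small shell helper. This is a rough sketch: the function name `leak_overhead_mb` is hypothetical, and the 10 KB default is this guide's estimate rather than a measured value, since real per-client overhead depends on buffer sizes.

```shell
#!/bin/sh
# Hypothetical helper: estimate memory held by N idle connections.
# Usage: leak_overhead_mb <connections> [kb_per_connection]
# The 10 KB default is this guide's rough estimate, not a measurement.
leak_overhead_mb() {
    conns=$1
    kb_per_conn=${2:-10}
    # Integer arithmetic, using 1 MB = 1000 KB to match the guide's rounding.
    echo $(( conns * kb_per_conn / 1000 ))
}

leak_overhead_mb 1000    # prints 10
leak_overhead_mb 10000   # prints 100
```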

flowchart TD
    A[connected_clients climbing] --> B{Correlates with traffic?}
    B -->|Yes| C[Capacity issue or organic growth]
    B -->|No| D[Connection leak]
    D --> E[Check CLIENT LIST idle]
    E --> F{Many high-idle connections?}
    F -->|Yes| G[Identify source addr and timeout config]
    F -->|No| H[Check blocked_clients and pub/sub]
    G --> I[Fix application or set timeout]
    H --> J[Investigate blocking commands or subscriber leak]

Common causes

Cause	What it looks like	First thing to check
Application connection pool leak	`connected_clients` climbs steadily while ops/sec stays flat; often after a code deploy or exception-path change	`CLIENT LIST` sorted by `idle`; many connections from the same application host with high idle
Missing `close()` in exception handlers	Connections spike during error storms and never drop; high `total_connections_received` rate but flat throughput	Application exception logs correlated with `total_connections_received` jumps
Pub/Sub subscriber leak	Subscribers accumulate; `pubsub_channels` or `pubsub_patterns` grows; subscribers are exempt from `timeout`	`PUBSUB CHANNELS`, then `PUBSUB NUMSUB <channel>`, and `CLIENT LIST` flags containing `P`
Proxy or load-balancer pooling issue	All connections appear from a single source IP (the proxy); low `idle` but high total count	`CLIENT LIST` `addr` field showing single LB IP; proxy connection pool config
`timeout` left at default 0	Every connection ever opened is still present; `age` is high but `cmd` is empty or old	`CONFIG GET timeout`

Quick checks

Run these read-only checks to confirm the leak and assess proximity to the limit.

# Check current connection count and hard limit
redis-cli INFO clients | grep -E "connected_clients|blocked_clients"
redis-cli CONFIG GET maxclients

# Measure connection churn: sample total_connections_received twice, 10s apart
redis-cli INFO stats | grep total_connections_received
sleep 10
redis-cli INFO stats | grep total_connections_received

# Inspect idle times and source addresses
redis-cli CLIENT LIST

# Check whether timeout is disabled (default 0)
redis-cli CONFIG GET timeout

# Check if connections are already being rejected
redis-cli INFO stats | grep rejected_connections

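# Check for pub/sub subscribers that are exempt from timeout
# (pubsub_channels and pubsub_patterns are reported in the stats section)
redis-cli INFO stats | grep -E "pubsub_channels|pubsub_patterns"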
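
The two-sample churn check can be turned into a rate with a couple of small functions. This is a minimal sketch: `info_field` and `churn_rate` are hypothetical helper names, and the commented example assumes `redis-cli` can reach the instance without extra flags.

```shell
#!/bin/sh
# Extract one numeric field from INFO output read on stdin.
# INFO lines end with \r\n, so strip the carriage return first.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Connections opened per second between two samples of total_connections_received.
# Usage: churn_rate <first_sample> <second_sample> <interval_seconds>
churn_rate() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Example against a live server:
#   a=$(redis-cli INFO stats | info_field total_connections_received)
#   sleep 10
#   b=$(redis-cli INFO stats | info_field total_connections_received)
#   churn_rate "$a" "$b" 10
```

A rate that stays high while connected_clients keeps growing suggests connections are being opened and not returned, rather than reused from a pool.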

How to diagnose it

Correlate connections with traffic. Pull connected_clients and instantaneous_ops_per_sec for the same time window. If the connection count climbs while throughput is flat, the growth is not driven by load.
Measure churn with total_connections_received. This is a cumulative counter. Sample it twice over a known interval. If the rate is elevated while your application concurrency is stable, the application is opening new connections faster than it closes them.
Inspect CLIENT LIST for idle connections. Look for many connections with large idle values (seconds since last command). If idle exceeds any reasonable application command interval, the connection is stale. Note the addr field to attribute leaks to specific application hosts or proxies.
Check the age field. age is seconds since the connection was opened. A cluster of connections with similar age that never drops suggests a one-time event, like a deploy, created a batch of orphaned connections.
Verify timeout configuration. Run CONFIG GET timeout. If the value is 0, idle connections will never be closed automatically. This is the default and the most common reason leaks accumulate.
Identify Pub/Sub subscribers. Subscribers have a flags value containing P. Pub/Sub clients ignore timeout because idle subscription is expected behavior. If subscriber counts grow without bound, the application is subscribing and never unsubscribing or exiting.
Calculate true connection capacity. On a primary with replicas or in Cluster mode, include connected_slaves and cluster_connections in the numerator: (connected_clients + connected_slaves + cluster_connections) / maxclients. If this ratio is above 0.8, you are close to rejection even if the raw client count looks healthy.
Check rejected_connections. Any increase here means the leak has already caused client-visible errors. This is a lagging indicator, but it confirms severity.
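
Attributing stale connections to a source host, as in the CLIENT LIST step above, is easier with a summary than by reading raw output. The following is a sketch only: `stale_by_host` is a hypothetical helper, and the idle threshold is whatever you judge to exceed your application's normal command interval.

```shell
#!/bin/sh
# Count connections per source host whose idle time meets a threshold.
# Reads CLIENT LIST output on stdin. Usage: stale_by_host <min_idle_seconds>
stale_by_host() {
    awk -v min="$1" '
    {
        addr = ""; idle = -1
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^addr=/) addr = substr($i, 6)        # laddr= is not matched
            else if ($i ~ /^idle=/) idle = substr($i, 6) + 0
        }
        sub(/:[0-9]+$/, "", addr)   # drop the client port, keep the host
        if (addr != "" && idle >= min) stale[addr]++
    }
    END { for (h in stale) print stale[h], h }
    ' | sort -rn
}

# Example against a live server:
#   redis-cli CLIENT LIST | stale_by_host 600
```

Each output line is a count followed by a host, largest count first, so the host at the top is the first place to look for a leak.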

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`connected_clients`	Absolute connection count	Sustained climb uncorrelated with traffic
`total_connections_received` rate	Connection churn	Rate increases while `instantaneous_ops_per_sec` stays flat
`CLIENT LIST` `idle`	Identifies stale connections	Many connections with `idle` far exceeding the normal command interval
`rejected_connections`	Hard limit reached	Any rate > 0
Connection capacity ratio	True headroom including replicas/cluster	`(clients + slaves + cluster) / maxclients` > 0.8
`client_recent_max_output_buffer`	Memory pressure from buffers	Growing output buffer memory per client
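
The connection capacity ratio in the table can be computed directly from INFO output. This is a sketch under assumptions: `capacity_ratio` is a hypothetical helper, and fields that are absent (for example cluster_connections on a non-cluster instance) are simply treated as zero.

```shell
#!/bin/sh
# Compute (connected_clients + connected_slaves + cluster_connections) / maxclients.
# Reads INFO output on stdin. Usage: capacity_ratio <maxclients>
capacity_ratio() {
    tr -d '\r' | awk -F: -v max="$1" '
        $1 == "connected_clients"   { total += $2 }
        $1 == "connected_slaves"    { total += $2 }
        $1 == "cluster_connections" { total += $2 }
        END { if (max > 0) printf "%.2f\n", total / max }
    '
}

# Example against a live server:
#   max=$(redis-cli CONFIG GET maxclients | tail -n 1)
#   redis-cli INFO | capacity_ratio "$max"
```

A result above 0.80 matches the warning threshold used in this guide.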

Fixes

Immediate relief: kill stale connections

If you are near maxclients and need to buy time, identify and kill the stalest connections. This is disruptive to those clients, which will need to reconnect, but it is safe for the server.

# Inspect CLIENT LIST to find the stale connection addr
redis-cli CLIENT LIST
# Kill a specific connection by address
redis-cli CLIENT KILL <ip:port>
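
When many connections are stale, killing them one address at a time is slow. The sketch below prints CLIENT KILL commands for review instead of running them. `stale_kill_commands` is a hypothetical helper, and skipping the P, S, and M flags (Pub/Sub, replica, master) is an assumption about which connections you want to protect.

```shell
#!/bin/sh
# Print (do not run) CLIENT KILL commands for connections idle beyond a threshold.
# Reads CLIENT LIST output on stdin and skips Pub/Sub, replica, and master links.
# Usage: stale_kill_commands <min_idle_seconds>
stale_kill_commands() {
    awk -v min="$1" '
    {
        id = ""; idle = -1; flags = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^id=/) id = substr($i, 4)
            else if ($i ~ /^idle=/) idle = substr($i, 6) + 0
            else if ($i ~ /^flags=/) flags = substr($i, 7)
        }
        if (id != "" && idle >= min && flags !~ /[PSM]/)
            print "redis-cli CLIENT KILL ID " id
    }'
}

# Example against a live server (review the output before executing it):
#   redis-cli CLIENT LIST | stale_kill_commands 3600
```

Killing by ID rather than by address avoids hitting a new connection that has reused the same ip:port since the list was taken.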

Set a non-zero timeout

If the leak is in application code that cannot be deployed immediately, set timeout to a value that matches your longest legitimate idle period (in seconds). Common production values are 300 (5 minutes) or 600 (10 minutes).

redis-cli CONFIG SET timeout 300
# Persist the change
redis-cli CONFIG REWRITE

Tradeoffs: Pub/Sub subscribers are exempt from timeout, and clients currently blocked in commands such as BLPOP or WAIT are not closed while they are blocked. Any other connection that your application legitimately leaves idle for longer than the timeout will be closed and must reconnect. Do not set timeout lower than your application’s longest expected idle connection lifetime without testing.

Increase maxclients (if headroom exists)

If the OS file-descriptor limit allows, you can raise maxclients temporarily:

# Check the file-descriptor limit of the current shell
# (the running redis-server process may have a different limit)
ulimit -n
# Set new limit if OS headroom exists
redis-cli CONFIG SET maxclients 15000

Tradeoffs: This only delays the problem. Each connection still consumes ~10 KB. If the leak continues, you will exhaust memory or file descriptors eventually.
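
Before raising maxclients, it helps to check the limit of the running server process rather than your shell. The sketch below is Linux-specific and the helper names are hypothetical. It relies on Redis reserving 32 file descriptors for internal use, so the largest usable maxclients is the soft limit minus 32.

```shell
#!/bin/sh
# Read the soft "Max open files" limit from /proc/<pid>/limits content on stdin.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Largest maxclients that fits under a given soft limit,
# leaving the 32 descriptors Redis reserves for internal use.
max_clients_for_limit() {
    echo $(( $1 - 32 ))
}

# Example (assumes a single redis-server process on this host):
#   limit=$(soft_nofile_limit < "/proc/$(pidof redis-server)/limits")
#   max_clients_for_limit "$limit"
```

If the value printed is below the maxclients you intend to set, raise the process file-descriptor limit first.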

Enable client eviction (Redis 7.0+)

Redis 7.0 introduces maxmemory-clients. When aggregate client memory exceeds the threshold, Redis disconnects the highest-memory clients first. This is a safety net, not a fix for the root cause.

redis-cli CONFIG SET maxmemory-clients 5%

Tradeoffs: Replica and master connections are exempt. Normal clients and Pub/Sub subscribers can be evicted. If you have monitoring connections that must survive, mark them with CLIENT NO-EVICT ON.

Application-level fix

The permanent fix is to ensure every connection path has a corresponding close path. Common defects include:

Missing close() in exception handlers.
Connection pools created per-request instead of per-process.
pubsub objects not unsubscribed before application shutdown.
Framework integrations that recreate pools on each worker fork.

Fix the application so connection count tracks active concurrency, not total historical openings.

Prevention

Set timeout to a non-zero value in production. Do not rely on the default of 0.
Monitor total_connections_received rate alongside connected_clients. Churn without growth in throughput is an early leak indicator.
Monitor connection capacity ratio, not just absolute connected_clients. Include replicas and cluster bus connections in the calculation.
Use bounded connection pools in application code with a max_connections limit that is known and alertable.
Configure maxmemory-clients on Redis 7.0+ as a backstop.
Run periodic CLIENT LIST audits during low-traffic windows to establish a baseline of normal idle distributions.

How Netdata helps

Correlates connected_clients with instantaneous_ops_per_sec on the same timeline, making leaks visible as a divergence between the two signals.
Alerts on rejected_connections increases so you know when the leak has crossed from background noise to client impact.
Tracks total_connections_received as a rate, surfacing connection churn without manual sampling.
Surfaces memory overhead metrics (used_memory_overhead) alongside connection counts, showing when the leak is consuming meaningful RAM.
Provides the connection capacity ratio automatically, accounting for replicas and cluster connections where applicable.

Related guides

How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
Redis BUSY Redis is busy running a script: blocking Lua and how to recover: /guides/redis/redis-busy-running-script/
Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
Redis event loop blocked: when one slow command freezes everything: /guides/redis/redis-event-loop-blocked/
Redis eviction policy tuning: allkeys-lru vs volatile-ttl vs noeviction: /guides/redis/redis-eviction-policy-tuning/
Redis fork/COW memory storm: why persistence doubles RSS and OOM-kills the box: /guides/redis/redis-fork-cow-storm/
Redis KEYS command blocking production: why to replace it with SCAN: /guides/redis/redis-keys-command-blocking-production/
Redis latency spikes: diagnosis with the LATENCY subsystem: /guides/redis/redis-latency-spikes-diagnosis/
Redis latest_fork_usec too high: THP, NUMA, and fork latency: /guides/redis/redis-latest-fork-usec-high/
Redis max number of clients reached: maxclients and rejected_connections: /guides/redis/redis-max-number-of-clients-reached/

The Netdata solution

Redis monitoring with Netdata

Netdata monitors Redis with per-second metrics and ML anomaly detection. Track memory usage and fragmentation, fork/COW latency, replication backlog, evictions, and connection pressure to spot the failure modes in these runbooks early.

