Redis blocked_clients growing: dead consumers vs healthy queues

blocked_clients in INFO clients is climbing. In queue-based architectures this is often normal: workers call BLPOP, BRPOP, or XREAD BLOCK and wait for producers to push work. When blocked_clients grows while queue depth also grows, consumers are no longer consuming. They may have crashed, been OOM-killed, or stalled on replication lag via WAIT.

blocked_clients counts only clients waiting on explicit blocking commands. It does not capture clients stalled by slow commands like KEYS * or large SMEMBERS. A high value is either a healthy signal of an active queue pattern or a pathological signal of dead connections holding slots open forever, especially with timeout 0.

This guide shows how to tell the difference, what commands to run, and how to fix the underlying cause without restarting Redis.

What this means

blocked_clients increments when a client executes a blocking command and the required condition is not yet met. Commands that increment it include BLPOP, BRPOP, BLMOVE, BZPOPMIN, BZPOPMAX, XREAD BLOCK, XREADGROUP BLOCK, and WAIT. A client blocked on BLPOP with timeout 0 waits indefinitely until data arrives or the connection closes. If the consumer dies without closing its socket cleanly (for example, its host disappears or the network path drops), the TCP connection can stay half-open. Redis keeps counting that client in blocked_clients until it detects the dead peer, which depends on tcp-keepalive (300 seconds by default) and may never happen if keepalive is disabled. Until then the client consumes one connection slot and processes no messages.

WAIT blocks the calling client until prior writes are acknowledged by numreplicas replicas within the timeout. If replicas are down or lagging and the timeout is large, WAIT holds a blocked slot until replication catches up or the timeout fires.

A healthy queue worker pool shows a stable blocked_clients count equal to the number of worker processes. The operator problem is sustained growth above baseline, or a count that nears connected_clients while queue depth increases.
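
The baseline comparison can be sketched as a small shell function. This is a minimal sketch, not a production check: the function name blocked_vs_baseline, the expected worker count, and the tolerance are illustrative assumptions. It reads INFO clients output on stdin and flags a deviation in either direction.

```shell
# Compare blocked_clients from `INFO clients` (stdin) with an expected baseline.
# $1 = expected number of blocked workers (assumed known), $2 = tolerated deviation.
blocked_vs_baseline() {
  awk -F: -v base="$1" -v tol="$2" '
    { gsub(/\r/, "") }                          # INFO lines end in CRLF
    $1 == "blocked_clients"   { blocked = $2 }
    $1 == "connected_clients" { connected = $2 }
    END {
      diff = blocked - base
      if (diff < 0) diff = -diff
      status = (diff > tol) ? "DEVIATION" : "OK"
      printf "%s blocked=%d baseline=%d connected=%d\n", status, blocked, base, connected
    }'
}

# Example against a live instance (50 workers, tolerance of 5):
#   redis-cli INFO clients | blocked_vs_baseline 50 5
```

A single sample says little; run the check on a schedule and treat only sustained DEVIATION results as a signal.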

flowchart TD
    A[blocked_clients growing] --> B{Queue depth growing?}
    B -->|Yes| C[Dead consumers or WAIT]
    B -->|No| D[Producer failure or healthy idle]
    C --> E{Replication lag?}
    E -->|Yes| F[WAIT blocking on lag]
    E -->|No| G[Crashed consumers with infinite timeout]
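
The same decision path can be written as a tiny shell helper. This is a sketch under stated assumptions: classify_blocked is a hypothetical name, queue growth is judged from two depth samples you collect yourself, and the replication-lag answer is passed in as yes or no.

```shell
# Map the flowchart questions to a likely cause.
# $1 = earlier queue depth, $2 = later queue depth, $3 = replication lag present? (yes/no)
classify_blocked() {
  if [ "$2" -gt "$1" ]; then
    # Queue depth is growing: consumers are not draining it
    if [ "$3" = "yes" ]; then
      echo "WAIT blocking on lag"
    else
      echo "Crashed consumers with infinite timeout"
    fi
  else
    echo "Producer failure or healthy idle"
  fi
}

# Example: depth went from 100 to 250 and replicas are keeping up
#   classify_blocked 100 250 no
```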

Common causes

Cause	What it looks like	First thing to check
Crashed consumers with `timeout=0`	`blocked_clients` grows; queue length grows; no processing visible in logs	`CLIENT LIST` for idle blocked connections; `LLEN` or `XLEN`
WAIT blocking on replication lag	`blocked_clients` grows after write bursts; replicas lagging or link down	`INFO replication` offset delta and `master_link_status`
Producer failure	`blocked_clients` stable at worker count; queues stay empty; no new jobs	Application producer logs; `LLEN` / `XLEN` near zero
Stream consumer group with no live consumers	Stream `lag` grows; `blocked_clients` may be flat because `XREADGROUP` sessions died	`XINFO GROUPS` lag and pending counts
Connection leak in blocking clients	`blocked_clients` and `connected_clients` both grow; high idle times	`CLIENT LIST` sorted by idle

Quick checks

Run these read-only commands to characterize the state.

# Confirm blocked client count
redis-cli INFO clients | grep blocked_clients

# Check total connection load
redis-cli INFO clients | grep connected_clients

# Identify blocked clients, their command, and idle time
redis-cli CLIENT LIST

# Check list or stream depth
redis-cli LLEN myqueue
redis-cli XLEN mystream

# Check stream consumer group health (Redis 7.0+ lag field)
redis-cli XINFO GROUPS mystream

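# Check replication lag that can make WAIT block
# (per-replica slaveN lines appear on the primary; slave_repl_offset and master_link_status on replicas)
redis-cli INFO replication | grep -E "^slave[0-9]+:|master_repl_offset|slave_repl_offset|master_link_status"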

# Rule out slow commands (these do NOT increment blocked_clients)
redis-cli SLOWLOG LEN
redis-cli SLOWLOG GET 10

# Check command mix to confirm blocking command usage
redis-cli INFO commandstats | grep -E "cmdstat_blpop|cmdstat_brpop|cmdstat_blmove|cmdstat_wait|cmdstat_xread"

Note: CLIENT LIST output includes flags (which contains b while the client is waiting in a blocking operation), cmd (the last command the client ran), and idle (seconds since its last interaction). High idle on a blocked client suggests a stalled connection; confirm against queue depth and consumer process health before treating it as a zombie.
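
Reading CLIENT LIST by eye gets tedious with hundreds of connections. The following is a minimal sketch of a filter, assuming the usual space-separated key=value layout of CLIENT LIST lines. The function name stale_blocked_clients and the 300-second threshold in the example are illustrative, not recommendations.

```shell
# List blocked clients idle longer than $1 seconds, reading `CLIENT LIST` on stdin.
# Prints: id, addr, idle seconds, last command.
stale_blocked_clients() {
  awk -v min_idle="$1" '
    {
      gsub(/\r/, "")
      split("", f)                               # clear fields from the previous line
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")
        if (eq > 0) f[substr($i, 1, eq - 1)] = substr($i, eq + 1)
      }
      # flags contains a lowercase "b" while the client waits in a blocking operation
      if (index(f["flags"], "b") > 0 && f["idle"] + 0 > min_idle + 0)
        print f["id"], f["addr"], f["idle"], f["cmd"]
    }'
}

# Example against a live instance:
#   redis-cli CLIENT LIST | stale_blocked_clients 300
```

A long idle time alone is not proof of a dead consumer: a healthy worker waiting on an empty queue with timeout 0 looks the same, so compare the result with queue depth.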

How to diagnose it

Establish whether the count is truly anomalous. If your application runs 50 queue workers, a stable blocked_clients of 50 is normal. Alert on deviation from baseline, not on absolute value.
Determine which blocking commands are in use. Check INFO commandstats for cmdstat_blpop, cmdstat_brpop, cmdstat_wait, or cmdstat_xread. If none are present but blocked_clients is high, look for module-issued blocking operations or older commands like brpoplpush (deprecated, replaced by blmove).
Correlate with queue depth. Use LLEN for lists or XLEN for streams. If queue depth grows while blocked_clients also grows, consumers are not draining the queue. They are likely dead or stalled. If queue depth is near zero and blocked_clients is stable, the workers are simply idle.
Check for WAIT-specific lag. If cmdstat_wait is present, compare master_repl_offset on the primary with slave_repl_offset on the replica. A large and growing delta means replicas are behind. master_link_status:down on replicas used by WAIT will cause it to block until timeout.
Inspect individual blocked connections. In CLIENT LIST, look for entries with high idle seconds and cmd equal to a blocking command. If idle exceeds your expected processing interval, the consumer process is likely gone but the TCP connection has not yet timed out.
Check stream consumer groups separately. If you use streams, XINFO GROUPS shows lag (undelivered entries) and pending (delivered but unacknowledged). A growing lag while blocked_clients stays flat means XREADGROUP consumers have died and are not reconnecting. This is not always reflected in blocked_clients because the blocked sessions may already have closed.
Confirm it is not slow-command blocking. Run SLOWLOG GET 50. Slow commands block the event loop but do not increment blocked_clients. If SLOWLOG is full of KEYS or large SMEMBERS, the real problem is command latency, not queue consumers.
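
For the WAIT-specific step, the offset comparison can be computed from the primary alone, because its INFO replication output carries one slaveN line per connected replica. The sketch below assumes that line layout (ip=, port=, state=, offset=, lag= separated by commas). The name replica_offset_lag is hypothetical.

```shell
# Print how many bytes each replica is behind, reading the primary's
# `INFO replication` output on stdin.
replica_offset_lag() {
  awk '
    { gsub(/\r/, "") }
    /^master_repl_offset:/ { split($0, kv, ":"); master = kv[2] }
    /^slave[0-9]+:/        { n++; line[n] = $0 }   # slaveN lines precede the offset line
    END {
      for (i = 1; i <= n; i++) {
        name = line[i]; sub(/:.*/, "", name)
        rest = line[i]; sub(/^[^:]*:/, "", rest)
        m = split(rest, parts, ",")
        ip = ""; port = ""; off = 0
        for (j = 1; j <= m; j++) {
          split(parts[j], p, "=")
          if (p[1] == "ip") ip = p[2]
          else if (p[1] == "port") port = p[2]
          else if (p[1] == "offset") off = p[2]
        }
        printf "%s %s:%s behind_bytes=%d\n", name, ip, port, master - off
      }
    }'
}

# Example against a live primary:
#   redis-cli INFO replication | replica_offset_lag
```

A nonzero value on a busy primary is expected; what matters is whether the number keeps growing between samples.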

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`blocked_clients`	Count of clients in blocking state	Sustained growth above workload baseline
`connected_clients`	Total connection pool pressure	Growing in tandem with `blocked_clients`
List/stream length (`LLEN`/`XLEN`)	Distinguishes consumer death from producer outage	Queue growing while blocked count is high
Replication offset lag	For `WAIT`-induced blocking	`master_repl_offset` minus `slave_repl_offset` growing
`master_link_status`	Replica availability	`down` on replicas that `WAIT` depends on
Stream group `lag` / `pending`	Invisible buildup when stream consumers die	`lag` or `pending` growing continuously
`instantaneous_ops_per_sec`	Queue processing throughput	Drop correlated with rising `blocked_clients`

Fixes

Crashed consumers with infinite timeout

If consumers use BLPOP, BRPOP, or XREAD BLOCK with timeout 0, a crashed consumer leaves a blocked connection open indefinitely.

Use CLIENT UNBLOCK <client-id> TIMEOUT to make the blocked command return nil as if its timeout had fired, or CLIENT UNBLOCK <client-id> ERROR to make it return -UNBLOCKED client unblocked via CLIENT UNBLOCK. Both end the blocked state immediately but leave the connection open, so a dead peer still holds its connection slot.
Kill the stale connection with CLIENT KILL ID <client-id>. CLIENT LIST provides the id field. This is safe but may cause the application to reconnect immediately if it is still alive.
Restart the consumer application to restore throughput.
Tradeoff: Unblocking or killing the connection drops any in-flight blocking context. The consumer must handle a nil response or reconnect gracefully.
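
When many connections need the same treatment, it can help to generate the commands first and review them before anything is sent. This is a minimal sketch: emit_client_fixes is a hypothetical helper that expects client ids in the first column of stdin, for example ids copied from CLIENT LIST.

```shell
# Turn client ids (first column on stdin) into CLIENT commands for review.
# $1 = "kill" for CLIENT KILL ID <id>; anything else emits CLIENT UNBLOCK <id> TIMEOUT.
emit_client_fixes() {
  awk -v mode="$1" '
    $1 ~ /^[0-9]+$/ {                            # skip anything that is not a numeric id
      if (mode == "kill") print "CLIENT KILL ID " $1
      else                print "CLIENT UNBLOCK " $1 " TIMEOUT"
    }'
}

# Review first, then pipe the reviewed output into redis-cli, which reads commands from stdin:
#   printf '7\n12\n' | emit_client_fixes kill
#   printf '7\n12\n' | emit_client_fixes kill | redis-cli
```

Keeping generation and execution separate leaves room to cross-check the ids against running consumer processes.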

WAIT blocking on replication lag

If WAIT is the source of blocked clients:

Fix the replica lag by investigating replica CPU, disk I/O, or network saturation. See Redis latency spikes: diagnosis with the LATENCY subsystem and Redis latest_fork_usec too high: THP, NUMA, and fork latency.
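If replicas are down, restore or replace them. WAIT cannot reach numreplicas while too few replicas are connected, so each call blocks for its full timeout.
Review whether WAIT with a large timeout is necessary. WAIT does not fail on timeout; it returns the number of replicas that acknowledged the writes. A shorter timeout hands that count back sooner and ends the blocked state earlier.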
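
Because WAIT reports a count rather than raising an error, the caller has to compare the reply with what it asked for. The sketch below illustrates that comparison. The name wait_result and the 100 ms timeout in the example are assumptions chosen for illustration.

```shell
# Interpret the integer returned by WAIT.
# $1 = replicas that acknowledged (the WAIT reply), $2 = replicas requested.
wait_result() {
  if [ "$1" -ge "$2" ]; then
    echo "durable: $1/$2 replicas acknowledged"
  else
    echo "degraded: $1/$2 replicas acknowledged before timeout"
  fi
}

# Example against a live primary, asking for 1 replica with a 100 ms timeout:
#   wait_result "$(redis-cli WAIT 1 100)" 1
```

With a short timeout the application decides quickly how to react to a degraded result instead of holding a blocked client while replicas catch up.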

Producer failure with healthy consumers

If blocked_clients is stable at the worker count and queues are empty, but the expected job volume is absent:

Check application producer logs. The issue is upstream of Redis.
This is not a Redis incident. Do not restart Redis.

Stream consumer group death

If XINFO GROUPS shows growing lag or pending but blocked_clients does not reflect active consumers:

Use XAUTOCLAIM or XCLAIM to redistribute pending messages from dead consumers to live ones.
Ensure consumers call XACK after processing. Missing XACK causes pending to grow even when consumers are alive.
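
To spot groups that need attention, XINFO GROUPS output can be filtered by lag. This is a sketch under two assumptions: redis-cli is writing to a pipe, so each field name and value arrives on its own line, and the fields appear in the Redis 7.0+ order with name and pending before lag. The name lagging_groups is hypothetical.

```shell
# Print consumer groups whose lag exceeds $1, reading `XINFO GROUPS <stream>` on stdin.
lagging_groups() {
  awk -v max_lag="$1" '
    { gsub(/\r/, "") }
    NR % 2 == 1      { key = $0; next }          # odd lines are field names
    key == "name"    { group = $0 }
    key == "pending" { pending = $0 }
    key == "lag"     {
      # an empty lag value (nil) is treated as 0
      if ($0 + 0 > max_lag + 0) print group, "lag=" $0, "pending=" pending
    }'
}

# Example against a live instance:
#   redis-cli XINFO GROUPS mystream | lagging_groups 100
# Reclaiming entries idle for over 60 s (group and consumer names are placeholders):
#   redis-cli XAUTOCLAIM mystream workers live-consumer 60000 0 COUNT 100
```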

Prevention

Finite timeouts. A timeout of 0 leaves no recovery path if the consumer stalls or dies; use a bounded wait such as BLPOP key 30 so Redis ends the block automatically and the worker re-issues the command in a loop.
Baseline-relative alerts. Queue architectures have a normal blocked population equal to worker count; alert on deviation from baseline, not absolute value.
Queue depth correlation. blocked_clients alone cannot distinguish a healthy idle worker pool from dead consumers; correlate with LLEN, XLEN, or stream lag.
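Replication backlog sizing. If you use WAIT, a small repl-backlog-size (the default is 1mb) forces full resyncs after brief replica disconnects, which worsens lag; size it for your write rate, for example 100mb or more on busy primaries.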
Consumer liveness checks. Monitor stream consumer idle time via XINFO CONSUMERS and application process health independently of Redis.
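
The finite-timeout pattern can be sketched as a worker step in shell. Real workers usually live in application code, so treat this as an illustration only: REDIS_CLI, handle_job, and consume_once are placeholders, and the sketch assumes redis-cli prints the key and the value on two lines when piped and an empty reply on timeout.

```shell
# Worker step sketch: block for at most 30 s per call instead of forever.
REDIS_CLI=${REDIS_CLI:-redis-cli}                # override to point at another client binary
handle_job() { echo "processing: $1"; }          # placeholder for real job handling

consume_once() {
  # $1 = queue key. Returns 0 after handling a job, 1 on timeout, 2 on client error.
  reply=$($REDIS_CLI BLPOP "$1" 30) || return 2
  [ -z "$reply" ] && return 1                    # timed out: nothing arrived in 30 s
  handle_job "$(printf '%s\n' "$reply" | sed -n 2p)"   # second line is the popped value
}

# Typical loop (back off briefly after a client error):
#   while :; do consume_once myqueue; [ $? -eq 2 ] && sleep 1; done
```

Each timeout returns control to the loop, which gives the worker a regular point to emit a heartbeat or notice that it should shut down.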

How Netdata helps

Charts blocked_clients with connected_clients, instantaneous_ops_per_sec, and replication offset lag to distinguish consumer death from WAIT lag.
Monitors stream consumer group lag and pending counts to catch buildup that blocked_clients misses.
Displays replication lag and master_link_status alongside application metrics for WAIT diagnosis.
