Redis Stream consumer group lag: pending entries and dead consumers
Stream processing falls behind. An alert fires on used_memory, or your application dashboard shows lag. You run XINFO GROUPS and see lag or pending climbing. One means consumers cannot keep up. The other means they are not acknowledging messages. Both increase memory pressure, but they have different fixes. This guide shows how to tell them apart, find dead consumers, and clean up the Pending Entry List before it becomes a memory incident.
What this means
Redis Streams consumer groups track two distinct backlogs.
lag, returned by XINFO GROUPS in Redis 7.0+, counts entries the group has not yet delivered to any consumer. Growing lag means producers outpace consumers. This is a throughput problem.
pending counts entries delivered but not yet acknowledged with XACK. Growing pending means consumers received messages and never confirmed completion. The consumer may have crashed, hung, or omitted XACK. This is a reliability problem.
The Pending Entry List (PEL) is a per-group structure. Every pending entry carries metadata overhead. A large PEL consumes heap memory and slows group operations that inspect pending state.
XLEN measures total stream entries. Without MAXLEN or MINID trimming, XLEN grows without bound and adds its own memory cost. Lag and pending are consumer-group signals. XLEN is a stream-level retention signal. Watch all three.
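The distinction between the two backlogs can be sketched as a small triage function. This is a minimal illustration, not part of Redis or any client library: `classify_backlog` is a hypothetical helper, and it assumes each XINFO GROUPS sample has already been parsed into a mapping with integer `lag` and `pending` values. Redis can report a nil lag when it cannot compute one; this sketch does not handle that case.

```python
from typing import Mapping


def classify_backlog(prev: Mapping[str, int], curr: Mapping[str, int]) -> str:
    """Compare two XINFO GROUPS samples taken some time apart for one group.

    Assumption: both samples carry integer "lag" and "pending" values.
    """
    lag_delta = curr["lag"] - prev["lag"]
    pending_delta = curr["pending"] - prev["pending"]

    if lag_delta <= 0 and pending_delta <= 0:
        return "stable"
    # Whichever backlog grew more between samples decides where to look first.
    if lag_delta >= pending_delta:
        return "throughput"   # producers outpace consumers
    return "reliability"      # entries delivered but never acknowledged
```

A single pair of samples is noisy. In practice you would likely compare samples over several minutes before acting on the result.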
flowchart TD
A[XINFO GROUPS] --> B{lag growing?}
B -->|Yes| C[Scale consumers]
B -->|No| D{pending growing?}
D -->|Yes| E[XINFO CONSUMERS idle]
D -->|No| F[Check XLEN trimming]
E --> G{Idle high?}
G -->|Yes| H[XAUTOCLAIM dead consumer]
G -->|No| I[Fix missing XACK]
F --> J[Add MAXLEN or MINID]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Producer rate exceeds consumer capacity | lag growing steadily; pending stable or slowly rising; high producer ops | XINFO GROUPS lag trend and instantaneous_ops_per_sec |
| Consumer crash or hang without XACK | pending climbing; lag flat or low; one consumer shows high idle time or is missing | XINFO CONSUMERS idle time and application error logs |
| Application logic omits XACK | pending grows linearly with throughput; consumers appear healthy and idle time stays low | XPENDING entry details and application code review |
| Unbounded stream growth | XLEN increases indefinitely; memory climbs even when lag and pending are stable | XLEN and whether MAXLEN or MINID is configured |
| Dead consumer entries never reclaimed | pending accumulates after a consumer disappears; no janitor runs XAUTOCLAIM | XINFO CONSUMERS list compared to expected instances |
Quick checks
Run these read-only commands to triage.
# Check group-level lag and pending counts
redis-cli XINFO GROUPS mystream
# Check per-consumer health and idle time
redis-cli XINFO CONSUMERS mystream mygroup
# Check total stream length
redis-cli XLEN mystream
# Inspect pending entry IDs, idle time, and delivery count
redis-cli XPENDING mystream mygroup - + 10
# Check memory pressure that PEL growth can cause
redis-cli INFO memory | grep -E "used_memory:|used_memory_rss:"
# Check for consumers blocked on stream reads
redis-cli INFO clients | grep blocked_clients
How to diagnose it
- Run XINFO GROUPS <stream> and compare lag and pending. If lag is rising fastest, focus on throughput. If pending is rising fastest, focus on consumer reliability.
- Run XINFO CONSUMERS <stream> <group> and compare idle times against your expected read interval. A consumer whose idle time is many multiples of the interval is likely dead.
- Run XPENDING <stream> <group> - + 10 to inspect individual entries. High idle times on specific entries point to stuck messages. High delivery counts point to messages being redelivered repeatedly.
- Run XLEN <stream> to verify the stream is not growing without bound. If XLEN is orders of magnitude larger than your retention target, trimming is missing.
- Check INFO memory for used_memory and used_memory_rss growth that correlates with backlog growth. The PEL is heap-allocated; large pending lists show up as memory growth.
- Check application logs for consumer crashes, restarts, or exceptions that would prevent an XACK from executing.
- If consumers appear healthy but lag persists, check INFO clients | grep blocked_clients. A drop in blocked clients alongside rising lag can indicate consumers are crashing between reads rather than staying connected and blocked.
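The idle-time comparison in the second step can be automated. The sketch below assumes the XINFO CONSUMERS reply has been parsed into a list of mappings with `name`, `pending`, and `idle` (milliseconds) keys. `find_stale_consumers` and its parameters are hypothetical names, and the default multiple of 10 and the "has pending entries" filter are illustrative choices rather than Redis behavior.

```python
from typing import Any, Iterable, List, Mapping


def find_stale_consumers(
    consumers: Iterable[Mapping[str, Any]],
    expected_interval_ms: int,
    multiple: int = 10,
) -> List[str]:
    """Return names of consumers that look dead and still own pending entries.

    Assumption: each item has "name", "pending" and "idle" (ms) fields,
    as reported by XINFO CONSUMERS.
    """
    threshold_ms = expected_interval_ms * multiple
    return [
        str(c["name"])
        for c in consumers
        # Idle far beyond the read interval AND holding unacknowledged work.
        if int(c["idle"]) > threshold_ms and int(c["pending"]) > 0
    ]
```

Consumers that are idle but hold no pending entries are skipped here because there is nothing to reclaim from them. You may still want to report them separately.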
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| XINFO GROUPS lag | Entries produced but not yet delivered to the group | Sustained growth over multiple minutes |
| XINFO GROUPS pending | Delivered but unacknowledged entries in the PEL | Sustained growth; correlates with memory pressure |
| XLEN | Total entries retained in the stream | Growing without bound when trimming should cap it |
| Consumer idle time from XINFO CONSUMERS | Time since the consumer last interacted with the group | Exceeds 10x the expected read interval |
| used_memory and used_memory_rss | Heap memory including PEL and stream overhead | Growth correlated with pending or stream length |
| blocked_clients | Consumers waiting on blocking commands | Sustained count above baseline for stream workloads |
| instantaneous_ops_per_sec | Overall command throughput | Sudden drop may indicate blocked or crashed consumers |
Fixes
Consumers cannot keep up (growing lag)
Add consumer instances if the workload partitions. Review the batch size in XREADGROUP. Too many entries per fetch increases processing latency; too few increases round-trip overhead. Check consumer CPU and network saturation. If consumers use BLOCK, ensure the timeout is not so long that recovery from a stalled consumer is delayed.
If scaling consumers is not immediate, increase the COUNT argument in XREADGROUP to pull larger batches, or apply backpressure at the producer if the pipeline supports it. Verify consumers are not bottlenecked on downstream dependencies, such as databases or external APIs, which can manifest as low CPU but high lag.
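A rough capacity estimate helps decide how many consumers to add. The arithmetic below is a back-of-the-envelope sketch: the function names are hypothetical, and it assumes the workload partitions evenly and that every consumer sustains the same measured rate, which real pipelines rarely do.

```python
import math


def consumers_needed(peak_producer_rate: float, per_consumer_rate: float) -> int:
    """Smallest consumer count whose combined rate matches the producer peak."""
    return math.ceil(peak_producer_rate / per_consumer_rate)


def drain_seconds(
    lag: int,
    producer_rate: float,
    consumer_count: int,
    per_consumer_rate: float,
) -> float:
    """Estimated time to clear the current lag; infinite if consumers fall behind."""
    surplus = consumer_count * per_consumer_rate - producer_rate
    if surplus <= 0:
        return math.inf  # backlog never shrinks at these rates
    return lag / surplus
```

For example, at 5000 entries per second produced and 800 per consumer, `consumers_needed(5000, 800)` gives 7. Running exactly that many leaves little surplus, so draining an existing backlog takes a long time; extra headroom shortens it.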
Pending entries from dead or crashed consumers
Identify the dead consumer via XINFO CONSUMERS idle time. Run XAUTOCLAIM (Redis 6.2+) to reassign its pending entries to a healthy consumer, or use XCLAIM for fine-grained control. Start with a conservative idle-time threshold in XAUTOCLAIM, at least several times your maximum expected processing time, to avoid stealing active work. After claiming, verify the receiving consumer has idempotent processing logic: XAUTOCLAIM does not deduplicate, so a claimed entry may already have been partly or fully processed by its original consumer.
The claiming consumer must still call XACK after processing, or the entries remain in the PEL. If you do not have a janitor process running XAUTOCLAIM, implement one. Without it, pending entries from crashed consumers accumulate forever.
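A janitor of this kind can be sketched as a loop that pages through XAUTOCLAIM until the cursor returns to 0-0. The code is illustrative only. It assumes a client object exposing `xautoclaim` and `xack` methods shaped like those in redis-py, and it assumes the reply is a sequence whose first two items are the next cursor and the claimed entries, which depends on the client version. Verify both against your library. `reclaim_stale_entries`, `janitor_name`, and `handler` are hypothetical names.

```python
from typing import Any, Callable


def reclaim_stale_entries(
    client: Any,
    stream: str,
    group: str,
    janitor_name: str,
    min_idle_ms: int,
    handler: Callable[[Any, Any], None],
    batch: int = 100,
) -> int:
    """Claim entries idle longer than min_idle_ms, process them, then ack them."""
    cursor: Any = "0-0"
    processed = 0
    while True:
        # Assumed reply shape: [next_cursor, [(entry_id, fields), ...], ...]
        reply = client.xautoclaim(
            stream, group, janitor_name, min_idle_ms, start_id=cursor, count=batch
        )
        cursor, entries = reply[0], reply[1]
        for entry_id, fields in entries:
            handler(entry_id, fields)             # must be idempotent
            client.xack(stream, group, entry_id)  # removes the entry from the PEL
            processed += 1
        if cursor in ("0-0", b"0-0"):             # scan of the PEL is complete
            return processed
```

If `handler` raises, the entry stays pending under the janitor's name and the exception propagates, so a later run can pick it up once it has been idle long enough.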
Application missing XACK
Review application code to ensure XACK runs after successful processing and before any logic that can throw or exit. If your client library offers automatic acknowledgement, verify it is not silently failing on uncaught exceptions. Some libraries run XACK in a finally block that is skipped on process termination.
A missing XACK is a code bug, not a Redis issue. The PEL grows linearly with throughput until the fix is deployed.
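The acknowledgement pattern can be sketched as follows: acknowledge each entry immediately after its handler succeeds, and leave failed entries in the PEL so they can be retried or reclaimed. This assumes a client with `xreadgroup` and `xack` methods shaped like redis-py's, and a reply parsed as a list of `[stream_name, [(entry_id, fields), ...]]` pairs. `consume_once` and `handler` are hypothetical names.

```python
from typing import Any, Callable


def consume_once(
    client: Any,
    stream: str,
    group: str,
    consumer: str,
    handler: Callable[[Any, Any], None],
    count: int = 10,
    block_ms: int = 2000,
) -> int:
    """Read one batch of new entries and ack only those processed successfully."""
    reply = client.xreadgroup(group, consumer, {stream: ">"}, count=count, block=block_ms)
    acked = 0
    for _stream_name, entries in reply or []:  # empty reply when the block times out
        for entry_id, fields in entries:
            try:
                handler(entry_id, fields)
            except Exception:
                # Deliberately not acked: the entry stays pending for a retry.
                continue
            # Ack right after success, before any other logic that could fail.
            client.xack(stream, group, entry_id)
            acked += 1
    return acked
```

Swallowing the exception keeps the loop alive for the rest of the batch. A production consumer would also log the failure and track delivery counts so that poison messages do not cycle forever.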
Unbounded stream growth
Configure MAXLEN or MINID trimming on XADD, or run XTRIM periodically. Use approximate trimming with ~ on large streams. Exact trimming is expensive because it operates on individual entries rather than whole macro-nodes. When using MAXLEN ~ <count>, Redis trims to a node boundary near the limit. If you need strict retention compliance, use exact trimming or MINID with an ID derived from your retention window.
Without trimming, XLEN and memory grow indefinitely.
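Deriving a MINID threshold from a retention window is simple arithmetic on the millisecond timestamp that auto-generated stream IDs embed. The helper below is a hypothetical sketch. It assumes entries were added with auto-generated IDs (`*`), so the first part of each ID is a Unix time in milliseconds; with custom IDs this mapping does not hold.

```python
import time
from typing import Optional


def minid_for_retention(retention_seconds: int, now_ms: Optional[int] = None) -> str:
    """Return a stream ID threshold; entries with smaller IDs fall outside the window.

    Assumption: stream IDs are auto-generated as "<unix-ms>-<seq>".
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = max(now_ms - retention_seconds * 1000, 0)
    return f"{cutoff_ms}-0"
```

The returned value can be passed as the threshold to XTRIM with MINID, or to the MINID option of XADD, to drop entries older than the window.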
Memory pressure from a large PEL
Reclaim and acknowledge entries to shrink the PEL. No INFO field isolates PEL memory, but growth shows up in used_memory and used_memory_rss. If used_memory approaches maxmemory, verify your eviction policy. allkeys-lru can evict the stream key itself, dropping unconsumed entries. If you must protect the stream, ensure it carries no TTL when using volatile-* policies, because those policies evict only keys that have an expiry, or use noeviction and handle write errors explicitly instead of losing data.
Ensure maxmemory is set so that runaway stream growth triggers eviction or alerts before an OOM kill.
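To watch how close used_memory is to maxmemory, the output of INFO memory can be parsed directly. The sketch below is illustrative: `memory_usage` is a hypothetical helper that takes the raw text of an INFO memory reply and relies on the `used_memory`, `maxmemory`, and `maxmemory_policy` fields. It treats a maxmemory of 0 as "no limit configured".

```python
from typing import Dict, Optional, Tuple


def parse_info(text: str) -> Dict[str, str]:
    """Parse "key:value" lines from INFO output, skipping section headers."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def memory_usage(info_text: str) -> Tuple[Optional[float], str]:
    """Return (used_memory / maxmemory, eviction policy); ratio is None if no limit."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    policy = fields.get("maxmemory_policy", "unknown")
    if limit == 0:
        return None, policy  # maxmemory unset: nothing to compare against
    return used / limit, policy
```

A ratio of None is itself worth alerting on for stream workloads, because it means nothing bounds memory growth before the operating system intervenes.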
Prevention
- Configure MAXLEN or MINID trimming to cap XLEN.
- Run a periodic XAUTOCLAIM janitor to recover entries from dead consumers.
- Ensure consumers call XACK immediately after processing.
- Monitor lag and pending as first-class metrics alongside memory.
- Size consumer capacity for peak producer throughput.
- Set maxmemory and an appropriate eviction policy so that runaway growth triggers eviction before OOM.
How Netdata helps
Netdata surfaces Redis INFO metrics alongside system-level signals, so you can correlate stream backlog with resource pressure:
- used_memory and used_memory_rss trends show when PEL or stream growth translates into memory pressure.
- blocked_clients tracking detects changes in blocking consumer behavior.
- instantaneous_ops_per_sec drops reveal consumer slowdowns that precede lag spikes.
- Memory saturation alerts (used_memory versus maxmemory) fire before OOM kills, giving you runway to trim streams or reclaim pending entries.
Related guides
- How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
- Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
- Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
- Redis big keys: finding the giant key that blocks the event loop: /guides/redis/redis-big-keys-latency/
- Redis blocked_clients growing: dead consumers vs healthy queues: /guides/redis/redis-blocked-clients-growing/
- Redis BUSY Redis is busy running a script: blocking Lua and how to recover: /guides/redis/redis-busy-running-script/
- Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
- Redis client output buffer overflow: slow consumers and client-output-buffer-limit: /guides/redis/redis-client-output-buffer-limit/
- Redis cluster_slots_pfail > 0: impending node failure in a cluster: /guides/redis/redis-cluster-slots-pfail/
- Redis CLUSTERDOWN / cluster_state:fail: slot coverage and recovery: /guides/redis/redis-cluster-state-fail/
- Redis connected_clients climbing: connection leak detection: /guides/redis/redis-connected-clients-climbing/
- Redis connected_slaves dropped: detecting replica disconnects on the primary: /guides/redis/redis-connected-slaves-dropped/







