Redis Pub/Sub pattern overhead: PSUBSCRIBE scaling and slow subscribers
You see Redis latency spikes that align with PUBLISH bursts. Subscribers disconnect with output buffer limit errors. In cluster mode, CPU and network usage climb linearly with node count even though PUBLISH volume stays flat. Three mechanisms drive this: PSUBSCRIBE pattern matching adds O(N) work to every PUBLISH, where N is the number of registered patterns; slow subscribers accumulate per-client output memory until Redis cuts them off; and classic Pub/Sub in a cluster broadcasts every PUBLISH to every node. This guide shows how to confirm the bottleneck, apply safe fixes, and decide when to move to sharded Pub/Sub.
What this means
Redis processes every PUBLISH on the single-threaded event loop. If subscribers use PSUBSCRIBE, each PUBLISH iterates all registered glob-style patterns server-wide and tests the channel name against every one. Beyond roughly 1,000 unique patterns, the scan becomes a measurable event-loop cost. Redis 6.0 deduplicates identical patterns across clients, so the cost is driven by the number of unique patterns, not total client subscriptions.
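The per-PUBLISH scan can be pictured with a small shell sketch. It is only an illustration: the function name `count_matching_patterns` and the sample channel and patterns are hypothetical, and shell `case` globbing stands in for Redis glob matching (the two agree on `*`, `?`, and `[...]` but are separate implementations).

```shell
# Sketch: every pattern is tested once per published message,
# whether or not it ends up matching the channel.
# count_matching_patterns CHANNEL PATTERN...
count_matching_patterns() (
  channel=$1
  shift
  tested=0
  matched=0
  for pattern in "$@"; do
    tested=$((tested + 1))            # one match attempt per pattern
    case $channel in
      $pattern) matched=$((matched + 1)) ;;
    esac
  done
  echo "tested=$tested matched=$matched"
)

count_matching_patterns "news.sports" "news.*" "alerts.*" "*.sports" "n?ws.*"
# prints: tested=4 matched=3
```

The `tested` count is what grows with the number of unique patterns; `matched` only determines how many deliveries follow.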
Pub/Sub has no backpressure. When a subscriber cannot read from its socket as fast as Redis publishes, messages buffer in the client’s output queue. The client-output-buffer-limit pubsub policy controls how large that queue may grow before Redis disconnects the client. The default hard limit is 32 MB and the soft limit is 8 MB for 60 seconds. This is generous for fast consumers and lethal for slow ones. Output buffer memory counts against maxmemory, so slow subscribers can push the instance toward eviction or OOM.
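As a rough model of that policy, the sketch below decides whether a client would be disconnected given its current output buffer size and how long it has stayed above the soft limit. The function name `buffer_verdict` is hypothetical and the logic is a simplification of what the server does; the default values are the ones quoted above.

```shell
# Sketch of the hard/soft limit decision (simplified model, not Redis source).
# buffer_verdict OMEM_BYTES SECONDS_OVER_SOFT [HARD] [SOFT] [SOFT_SECONDS]
buffer_verdict() (
  omem=$1
  over_soft_for=$2
  hard=${3:-33554432}       # 32 MB default hard limit
  soft=${4:-8388608}        # 8 MB default soft limit
  soft_seconds=${5:-60}     # how long the soft limit may be exceeded
  if [ "$omem" -ge "$hard" ]; then
    echo "disconnect (hard limit)"
  elif [ "$omem" -ge "$soft" ] && [ "$over_soft_for" -gt "$soft_seconds" ]; then
    echo "disconnect (soft limit held too long)"
  else
    echo "keep"
  fi
)

buffer_verdict 40000000 0     # disconnect (hard limit)
buffer_verdict 10000000 30    # keep
buffer_verdict 10000000 90    # disconnect (soft limit held too long)
```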
In Redis Cluster, classic Pub/Sub broadcasts every PUBLISH to every node, regardless of where subscribers are connected. The cluster bus carries that fan-out. Redis 7.0+ introduces sharded Pub/Sub (SSUBSCRIBE / SPUBLISH), which routes messages only to the nodes owning the relevant hash slot.
flowchart TD
A[Latency spikes or subscriber disconnects during PUBLISH] --> B{pubsub_patterns > 1000?}
B -->|Yes| C[Pattern scan CPU overhead<br/>Reduce or consolidate patterns]
B -->|No| D{Pub/Sub client omem growing?}
D -->|Yes| E[Slow subscriber buffer pressure<br/>Fix consumer or adjust limits]
D -->|No| F{Cluster mode and high cluster bus messages?}
F -->|Yes| G[Classic Pub/Sub broadcast overhead<br/>Use sharded Pub/Sub or reduce fan-out]
F -->|No| H[Check slowlog, commandstats,<br/>and pattern complexity or CVE exposure]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| High unique pattern count | CPU spikes during PUBLISH; LATENCY LATEST events; pubsub_patterns growing past 1,000 | INFO stats for pubsub_patterns; INFO commandstats for cmdstat_publish latency |
| Slow subscriber backpressure | Subscribers disconnected; output buffer memory climbing; possible eviction pressure | CLIENT LIST sorted by omem; logs mentioning “scheduled to be closed ASAP for overcoming of output buffer limits” |
| Cluster broadcast saturation | Cluster bus traffic grows with node count; PUBLISH throughput hits a ceiling | CLUSTER INFO cluster_stats_messages_sent / cluster_stats_messages_received |
| Pattern complexity / CVE-2024-31228 | Latency on PSUBSCRIBE, PUBLISH, or KEYS; unpatched Redis version | INFO server version; inspect pattern and ACL lengths |
| Subscription leak | connected_clients and pubsub_channels climb without traffic growth | INFO stats pubsub_channels; CLIENT LIST age and idle times |
Quick checks
# Measure Pub/Sub scale
redis-cli INFO stats | grep -E "pubsub_channels|pubsub_patterns"
redis-cli PUBSUB NUMPAT
# NUMSUB returns an empty reply unless you name the channels to count
redis-cli PUBSUB NUMSUB <channel> [<channel> ...]
# Find per-publish CPU cost
redis-cli INFO commandstats | grep cmdstat_publish
redis-cli LATENCY LATEST
# Find slow subscribers by output buffer size
redis-cli CLIENT LIST | awk -F'[= ]' '{for(i=1;i<=NF;i++) if($i=="omem") print $(i+1), $0}' | sort -rn | head -20
# Check current Pub/Sub buffer policy
redis-cli CONFIG GET client-output-buffer-limit
# Cluster-only: inspect cluster bus traffic
redis-cli CLUSTER INFO | grep -E "cluster_stats_messages|cluster_state"
# Confirm buffer-limit disconnections (Redis 7.4+)
redis-cli INFO stats | grep client_output_buffer_limit_disconnections
# Inspect recent slow commands
redis-cli SLOWLOG GET 10
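To narrow the omem ranking to connections that actually hold subscriptions, a filter along these lines may help. It is a sketch that assumes the `sub=`, `psub=`, and (on Redis 7.0+) `ssub=` fields printed by CLIENT LIST; the function name `pubsub_clients_by_omem` is hypothetical.

```shell
# Sketch: read CLIENT LIST output on stdin, keep clients with at least one
# subscription, and print them ordered by output buffer memory (largest first).
pubsub_clients_by_omem() {
  awk '{
    id = ""; sub_n = 0; psub_n = 0; ssub_n = 0; omem = 0
    for (i = 1; i <= NF; i++) {
      split($i, kv, "=")                 # each field looks like key=value
      if (kv[1] == "id") id = kv[2]
      else if (kv[1] == "sub") sub_n = kv[2]
      else if (kv[1] == "psub") psub_n = kv[2]
      else if (kv[1] == "ssub") ssub_n = kv[2]
      else if (kv[1] == "omem") omem = kv[2]
    }
    if (sub_n + psub_n + ssub_n > 0)
      printf "%d id=%s sub=%d psub=%d ssub=%d\n", omem, id, sub_n, psub_n, ssub_n
  }' | sort -rn
}

# Usage against a live server:
# redis-cli CLIENT LIST | pubsub_clients_by_omem | head -20
```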
How to diagnose it
1. Quantify the Pub/Sub surface area. Run INFO stats and note pubsub_channels and pubsub_patterns. pubsub_channels is the count of channels with at least one subscriber, not the subscriber count. pubsub_patterns is the count of active pattern subscriptions. Use PUBSUB NUMPAT and PUBSUB NUMSUB to cross-check.
2. Correlate with CPU and latency. Check INFO commandstats for cmdstat_publish and watch usec_per_call. If PUBLISH latency grows as pubsub_patterns grows, pattern matching is the likely bottleneck. Use LATENCY LATEST and LATENCY HISTORY command to confirm event-loop blocking; both return data only when latency-monitor-threshold is set above 0. On Redis 6.2+, check used_cpu_user_main_thread in INFO cpu to isolate main-thread saturation.
3. Find slow subscribers. From CLIENT LIST, look for connections with large omem values. Subscribers carry the P flag and non-zero sub or psub counts, and CLIENT LIST TYPE pubsub returns only those connections. A subscriber whose omem keeps growing while cmdstat_publish is active is likely a slow consumer. Also search Redis logs for lines containing “scheduled to be closed ASAP for overcoming of output buffer limits.”
4. Check cluster broadcast overhead (cluster mode only). Run CLUSTER INFO and compare cluster_stats_messages_sent and cluster_stats_messages_received to your PUBLISH rate. Classic Pub/Sub multiplies PUBLISH traffic by node count. If the cluster bus is saturated while cluster_state remains ok, you are hitting broadcast limits.
5. Inspect pattern health. List active channels with PUBSUB CHANNELS. Review whether many patterns are unique but semantically redundant. Remember that Redis 6.0+ deduplicates identical patterns, so PUBSUB NUMPAT reflects unique patterns. Broad patterns such as * or *foo* are the most expensive.
6. Review pattern safety and version exposure. CVE-2024-31228 allows authenticated users to trigger unbounded recursion via extremely long glob patterns in PSUBSCRIBE, KEYS, SCAN, FUNCTION LIST, COMMAND LIST, and ACL definitions. Affected versions are all Redis >= 2.2.5; patches are in 6.2.16, 7.2.6, and 7.4.1. Check redis_version in INFO server, audit pattern lengths, and inspect ACL LOG if you use Redis 6.0+ ACLs.
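One caveat when reading usec_per_call: it is an average over every call since startup or the last CONFIG RESETSTAT, so a recent regression can hide behind a long history. A sketch that compares two cmdstat_publish samples and reports the average cost for the interval between them follows; the function name `publish_usec_delta` and the 10-second interval in the usage comment are assumptions.

```shell
# Sketch: average PUBLISH cost (usec per call) between two samples of the
# cmdstat_publish line from INFO commandstats.
# publish_usec_delta FIRST_LINE SECOND_LINE
publish_usec_delta() {
  printf '%s\n%s\n' "$1" "$2" | awk -F'[:=,]' '
    { for (i = 2; i < NF; i += 2) v[NR, $i] = $(i + 1) }   # name/value pairs
    END {
      calls = v[2, "calls"] - v[1, "calls"]
      usec  = v[2, "usec"]  - v[1, "usec"]
      if (calls > 0) printf "%.2f\n", usec / calls
      else print "no new PUBLISH calls"
    }'
}

# Usage against a live server:
# before=$(redis-cli INFO commandstats | grep '^cmdstat_publish:')
# sleep 10
# after=$(redis-cli INFO commandstats | grep '^cmdstat_publish:')
# publish_usec_delta "$before" "$after"
```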
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| pubsub_patterns | Each PUBLISH scans all patterns; CPU cost rises with count. | Sustained value above 1,000 unique patterns. |
| cmdstat_publish usec_per_call | Measures event-loop cost per PUBLISH, including pattern scan and fan-out. | Sustained growth or outliers against baseline. |
| LATENCY LATEST / LATENCY HISTORY command | Captures event-loop blocking from publish work. | Recurring spikes above your latency threshold. |
| CLIENT LIST omem for Pub/Sub clients | Output buffer memory for slow subscribers; counts against maxmemory. | Any Pub/Sub client with omem near or above the soft limit, or trending up. |
| cluster_stats_messages_sent / received | Cluster bus traffic; classic Pub/Sub broadcasts every PUBLISH to all nodes. | Growth disproportionate to PUBLISH rate; high baseline. |
| client_output_buffer_limit_disconnections (Redis 7.4+) or equivalent log line | Confirms disconnections due to output buffer overflow. | Any rate above zero, or log lines matching the buffer-limit message. |
| instantaneous_output_kbps | Network fan-out from Pub/Sub and replication. | Output far exceeds input during publish bursts. |
Fixes
Reduce pattern count and complexity
Replace PSUBSCRIBE with SUBSCRIBE to concrete channels wherever possible. Consolidate redundant patterns; Redis 6.0+ deduplicates identical patterns across clients, but many unique patterns still force a scan on every PUBLISH. Avoid overly broad patterns such as * or *foo*. If you need many fine-grained channels, prefer explicit channel names over wildcards.
Handle slow subscribers
First choice is to make consumers faster: scale consumers, reduce per-message processing time, or shard subscriptions across clients. If consumers are legitimately slower than publishers for transient bursts, you can raise the Pub/Sub output buffer limit. This trades memory for availability. For example:
# Adjust Pub/Sub buffer limit: 64 MB hard, 16 MB soft for 120 seconds
redis-cli CONFIG SET client-output-buffer-limit "pubsub 67108864 16777216 120"
The default is 32 MB hard and 8 MB soft for 60 seconds. Raising the limit increases the risk that slow subscribers will consume enough memory to push Redis toward maxmemory. Monitor omem closely if you do this.
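Before picking new numbers, it can help to estimate how much time a given limit actually buys. The sketch below is back-of-the-envelope arithmetic only: the function name `seconds_until_limit` is hypothetical, the message size and rates are made-up inputs, and it ignores protocol overhead and the soft limit.

```shell
# Sketch: seconds until a lagging subscriber reaches a buffer limit.
# seconds_until_limit LIMIT_BYTES MSG_BYTES PUBLISH_PER_SEC CONSUME_PER_SEC
seconds_until_limit() (
  limit=$1
  msg=$2
  pub=$3
  con=$4
  backlog_rate=$(( (pub - con) * msg ))   # bytes of backlog added per second
  if [ "$backlog_rate" -le 0 ]; then
    echo "never"                          # consumer keeps up
  else
    echo $(( limit / backlog_rate ))      # integer seconds, rounded down
  fi
)

seconds_until_limit 33554432 1024 5000 0      # stalled consumer, 32 MB limit: 6
seconds_until_limit 67108864 1024 5000 4000   # lagging by 1000 msg/s, 64 MB limit: 65
```

With these example inputs, doubling the hard limit does little for a fully stalled consumer but gives a merely lagging one about a minute to catch up.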
If a subscriber is stale and you cannot recover it, kill the connection:
# Dangerous: disconnects the client immediately
redis-cli CLIENT KILL ID <client-id>
For workloads that need delivery guarantees or durable replay, consider Redis Streams instead of Pub/Sub.
Mitigate cluster broadcast overhead
In Redis Cluster, migrate from classic Pub/Sub to sharded Pub/Sub (SSUBSCRIBE / SPUBLISH), available in Redis 7.0+. Sharded Pub/Sub routes messages only to the shard that owns the channel’s hash slot, eliminating the all-node broadcast. Note that SSUBSCRIBE and SUBSCRIBE are not interchangeable: a message published with PUBLISH is not received by SSUBSCRIBE clients, and a message published with SPUBLISH is not received by SUBSCRIBE clients. Smart clients must connect to the correct shard.
If you cannot migrate, keep pattern and channel counts low, reduce message sizes, and monitor cluster bus bandwidth.
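To see which slot, and therefore which shard, a sharded channel lands on, the slot can be computed offline. The sketch below assumes shard channels are hashed like keys: CRC16 (XMODEM variant) of the name, or of the text inside the first non-empty `{...}` hash tag, modulo 16384. The function names are hypothetical, and on a live cluster `CLUSTER KEYSLOT <name>` is the authoritative cross-check.

```shell
# crc16 STRING: bitwise CRC16-XMODEM (polynomial 0x1021, initial value 0).
crc16() (
  crc=0
  for byte in $(printf '%s' "$1" | od -An -v -tu1); do
    crc=$(( crc ^ (byte << 8) ))
    bit=0
    while [ "$bit" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      bit=$(( bit + 1 ))
    done
  done
  echo "$crc"
)

# hash_part NAME: text inside the first non-empty {...}, else the whole name.
hash_part() (
  key=$1
  case $1 in
    *\{*\}*)
      rest=${1#*\{}          # everything after the first "{"
      tag=${rest%%\}*}       # up to the first "}" that follows it
      if [ -n "$tag" ]; then key=$tag; fi
      ;;
  esac
  printf '%s' "$key"
)

# channel_slot NAME: hash slot in the range 0..16383.
channel_slot() {
  echo $(( $(crc16 "$(hash_part "$1")") % 16384 ))
}

channel_slot "123456789"   # prints 12739 (CRC16 of this string is 0x31C3)
```

Channels that share a hash tag, such as `orders.{eu}.created` and `orders.{eu}.paid`, map to the same slot and are therefore served by the same shard.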
Patch pattern-matching vulnerabilities
If you are on an affected Redis version, upgrade to 6.2.16, 7.2.6, 7.4.1, or later. Audit existing ACL entries and any user-submitted PSUBSCRIBE patterns for abnormal length. Use ACLs to restrict access to pattern-subscription commands where appropriate.
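For the ACL part of that audit, a filter over ACL LIST output can flag suspiciously long key or channel patterns. This is a sketch: the function name `long_acl_patterns` and the 256-character threshold are arbitrary assumptions, and it only inspects rules that start with `~`, `%`, or `&`.

```shell
# Sketch: read ACL LIST output on stdin and print "username length" for every
# key or channel pattern rule longer than the given limit.
# long_acl_patterns [MAX_LENGTH]
long_acl_patterns() {
  awk -v limit="${1:-256}" '{
    for (i = 3; i <= NF; i++)
      if ($i ~ /^[~&%]/ && length($i) > limit)
        print $2, length($i)      # $2 is the user name in "user <name> ..."
  }'
}

# Usage against a live server:
# redis-cli ACL LIST | long_acl_patterns 256
```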
Prevention
- Establish a baseline for pubsub_patterns, pubsub_channels, and cmdstat_publish usec_per_call. Alert on deviation, not just absolute thresholds.
- Set explicit client-output-buffer-limit pubsub values based on your workload. Do not rely on unlimited defaults for normal clients, and treat Pub/Sub buffers as a first-class memory consumer.
- Design new cluster deployments with sharded Pub/Sub rather than classic Pub/Sub.
- Implement application-level cleanup for stale subscriptions, and monitor connected_clients and pubsub_channels for leaks.
How Netdata helps
- Correlate redis.pubsub_channels and redis.pubsub_patterns with redis.cpu_utilization and redis.operations to identify when PUBLISH work is becoming expensive.
- Track per-client output buffer metrics to catch subscribers nearing buffer limits before Redis disconnects them.
- In cluster mode, monitor redis.cluster_stats_messages_sent and redis.cluster_stats_messages_received to detect classic Pub/Sub broadcast saturation.
- Surface commandstats and latency events to tie latency spikes directly to PUBLISH or PSUBSCRIBE behavior.
- Combine Redis metrics with system CPU, network, and memory charts to distinguish event-loop exhaustion from infrastructure-level contention.