Redis Pub/Sub pattern overhead: PSUBSCRIBE scaling and slow subscribers

You see Redis latency spikes that align with PUBLISH bursts. Subscribers disconnect with output buffer limit errors. In cluster mode, CPU and network climb linearly with node count even though PUBLISH volume stays flat. Three mechanisms drive this. First, PSUBSCRIBE pattern matching adds O(N) work to every PUBLISH, where N is the number of unique patterns. Second, slow subscribers accumulate per-client output memory until Redis cuts them off. Third, in Redis Cluster, classic Pub/Sub broadcasts every message to every node. This guide shows how to confirm the bottleneck, apply safe fixes, and decide when to move to sharded Pub/Sub.

What this means

Redis processes every PUBLISH on the single-threaded event loop. If subscribers use PSUBSCRIBE, each PUBLISH iterates all registered glob-style patterns server-wide and tests the channel name against every one. Beyond roughly 1,000 unique patterns, the scan becomes a measurable event-loop cost. Redis 6.0 deduplicates identical patterns across clients, so the cost is driven by the number of unique patterns, not total client subscriptions.
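The sketch below makes the O(N) cost concrete. It mimics what happens on each PUBLISH: the channel name is tested against every registered pattern, whether or not it matches. This is an approximation, not Redis code. The helper name count_matches is hypothetical. POSIX shell `case` globbing stands in for Redis's own matcher; both handle *, ? and [...], but their escaping rules differ.

```shell
# Hypothetical helper: approximates the per-PUBLISH pattern scan.
# Shell `case` globbing stands in for Redis's internal glob matcher.
# Usage: count_matches CHANNEL PATTERN...
# Prints "<patterns tested> <patterns matched>".
count_matches() {
  channel=$1
  shift
  tested=0
  matched=0
  for pattern in "$@"; do
    # Every pattern is tested, so the work grows with the pattern
    # count, not with the number of matches.
    tested=$((tested + 1))
    # $pattern is left unquoted so `case` treats it as a glob.
    case $channel in
      $pattern) matched=$((matched + 1)) ;;
    esac
  done
  printf '%s %s\n' "$tested" "$matched"
}

count_matches "orders.eu.created" "orders.*" "users.*" "*.created" "payments.?u.*"
# prints: 4 2
```

Doubling the pattern list doubles the tested count even when the matched count stays the same. That growth is the event-loop cost described above.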

Pub/Sub has no backpressure. When a subscriber cannot read from its socket as fast as Redis publishes, messages buffer in the client’s output queue. The client-output-buffer-limit pubsub policy controls how large that queue may grow before Redis disconnects the client. The default hard limit is 32 MB, and the soft limit is 8 MB sustained for 60 seconds. A consumer that keeps up rarely comes near these limits. A consumer that stays slower than the publish rate will eventually hit one and be disconnected. Output buffer memory counts against maxmemory, so slow subscribers can push the instance toward eviction or OOM.
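The soft and hard limit rule can be written as a small decision function. This is a sketch of the documented policy under stated assumptions. The helper name would_disconnect is hypothetical. Its defaults mirror the 32 MB / 8 MB / 60 s values above. The exact boundary comparisons Redis uses internally may differ slightly.

```shell
# Hypothetical helper modeling the pubsub client-output-buffer-limit policy.
# Usage: would_disconnect OMEM_BYTES SECONDS_OVER_SOFT [HARD SOFT SOFT_SECONDS]
# A limit of 0 disables that check, as in the Redis config.
would_disconnect() {
  omem=$1
  over_soft=$2
  hard=${3:-33554432}   # 32 MB default hard limit
  soft=${4:-8388608}    # 8 MB default soft limit
  soft_secs=${5:-60}    # how long the soft limit may be exceeded
  if [ "$hard" -gt 0 ] && [ "$omem" -ge "$hard" ]; then
    echo "disconnect: hard limit"
  elif [ "$soft" -gt 0 ] && [ "$omem" -ge "$soft" ] && [ "$over_soft" -gt "$soft_secs" ]; then
    echo "disconnect: soft limit"
  else
    echo "keep"
  fi
}

would_disconnect 10485760 30   # 10 MB for 30 s: prints "keep"
would_disconnect 10485760 90   # 10 MB for 90 s: prints "disconnect: soft limit"
```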

In Redis Cluster, classic Pub/Sub broadcasts every PUBLISH to every node, regardless of where subscribers are connected. The cluster bus carries that fan-out. Redis 7.0 introduced sharded Pub/Sub (SSUBSCRIBE / SPUBLISH), which routes each message only to the nodes of the shard that owns the channel’s hash slot.
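A back-of-the-envelope comparison shows why node count matters. This sketch makes two assumptions. Each classic PUBLISH is forwarded once to every other node. Each SPUBLISH is forwarded once to every other node in the owning shard. It ignores message size, gossip, and delivery to clients. The helper name bus_fanout is hypothetical.

```shell
# Hypothetical helper: estimated cluster bus messages per second.
# Usage: bus_fanout PUBLISHES_PER_SEC TOTAL_NODES NODES_PER_SHARD
bus_fanout() {
  rate=$1
  nodes=$2
  shard_nodes=$3
  # Classic Pub/Sub: the receiving node forwards to all other nodes.
  classic=$((rate * (nodes - 1)))
  # Sharded Pub/Sub: forwarded only within the owning shard.
  sharded=$((rate * (shard_nodes - 1)))
  echo "classic=$classic sharded=$sharded"
}

# 6 primaries with 1 replica each = 12 nodes, 2 nodes per shard
bus_fanout 10000 12 2   # prints: classic=110000 sharded=10000
```

Under these assumptions, adding shards raises the classic figure but leaves the sharded figure unchanged for the same publish rate.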

flowchart TD
    A[Latency spikes or subscriber disconnects during PUBLISH] --> B{"pubsub_patterns > 1000?"}
    B -->|Yes| C[Pattern scan CPU overhead<br/>Reduce or consolidate patterns]
    B -->|No| D{Pub/Sub client omem growing?}
    D -->|Yes| E[Slow subscriber buffer pressure<br/>Fix consumer or adjust limits]
    D -->|No| F{Cluster mode and high cluster bus messages?}
    F -->|Yes| G[Classic Pub/Sub broadcast overhead<br/>Use sharded Pub/Sub or reduce fan-out]
    F -->|No| H[Check slowlog, commandstats,<br/>and pattern complexity or CVE exposure]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| High unique pattern count | CPU spikes during PUBLISH; LATENCY LATEST events; pubsub_patterns growing past 1,000 | INFO stats for pubsub_patterns; INFO commandstats for cmdstat_publish latency |
| Slow subscriber backpressure | Subscribers disconnected; output buffer memory climbing; possible eviction pressure | CLIENT LIST sorted by omem; logs mentioning “scheduled to be closed ASAP for overcoming of output buffer limits” |
| Cluster broadcast saturation | Cluster bus traffic grows with node count; PUBLISH throughput hits a ceiling | CLUSTER INFO cluster_stats_messages_sent / cluster_stats_messages_received |
| Pattern complexity / CVE-2024-31228 | Latency on PSUBSCRIBE, PUBLISH, or KEYS; unpatched Redis version | INFO server version; inspect pattern and ACL lengths |
| Subscription leak | connected_clients and pubsub_channels climb without traffic growth | INFO stats pubsub_channels; CLIENT LIST age and idle times |

Quick checks

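# Measure Pub/Sub scale
redis-cli INFO stats | grep -E "pubsub_channels|pubsub_patterns"
redis-cli PUBSUB NUMPAT
# NUMSUB needs channel names; with no arguments it returns an empty list
redis-cli PUBSUB CHANNELS | head -20 | xargs redis-cli PUBSUB NUMSUB

# Find per-publish CPU cost
redis-cli INFO commandstats | grep cmdstat_publish
# LATENCY LATEST returns nothing while latency-monitor-threshold is 0 (the default);
# enable it first if needed, e.g. with a 100 ms threshold
redis-cli CONFIG SET latency-monitor-threshold 100
redis-cli LATENCY LATEST

# Find slow Pub/Sub subscribers by output buffer size
redis-cli CLIENT LIST TYPE pubsub | awk -F'[= ]' '{for(i=1;i<=NF;i++) if($i=="omem") print $(i+1), $0}' | sort -rn | head -20

# Check current Pub/Sub buffer policy
redis-cli CONFIG GET client-output-buffer-limit

# Cluster-only: inspect cluster bus traffic
redis-cli CLUSTER INFO | grep -E "cluster_stats_messages|cluster_state"

# Confirm buffer-limit disconnections on versions that report a counter
# (evicted_clients is different: it counts maxmemory-clients evictions)
redis-cli INFO stats | grep client_output_buffer_limit_disconnections

# Inspect recent slow commands
redis-cli SLOWLOG GET 10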
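The first check can be turned into something a cron job or health probe can act on. The sketch below reads INFO stats output on stdin and compares pubsub_patterns with a threshold. The helper name check_pubsub_scale is hypothetical. Its default of 1,000 simply mirrors the rule of thumb used in this guide, so tune it to your own baseline.

```shell
# Hypothetical helper: flags pubsub_patterns above a threshold.
# Usage: redis-cli INFO stats | check_pubsub_scale [THRESHOLD]
check_pubsub_scale() {
  awk -F: -v t="${1:-1000}" '
    $1 == "pubsub_patterns" {
      # INFO lines end in \r\n; adding 0 converts "1200\r" to 1200.
      n = $2 + 0
      if (n > t + 0) print "WARN pubsub_patterns=" n " exceeds " t
      else print "OK pubsub_patterns=" n
    }'
}
```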

How to diagnose it

  1. Quantify the Pub/Sub surface area. Run INFO stats and note pubsub_channels and pubsub_patterns. pubsub_channels is the number of channels with at least one subscriber, not the number of subscribers. pubsub_patterns is the number of subscribed patterns; on Redis 6.0+, identical patterns count once. Cross-check with PUBSUB NUMPAT. You can also use PUBSUB NUMSUB, which needs explicit channel names (list them with PUBSUB CHANNELS).

  2. Correlate with CPU and latency. Check INFO commandstats for cmdstat_publish and watch usec_per_call. If PUBLISH latency grows as pubsub_patterns grows, pattern matching is the likely bottleneck. Use LATENCY LATEST and LATENCY HISTORY command (command is the latency event name) to confirm event-loop blocking. Both return data only after latency-monitor-threshold is set above its default of 0. On Redis 6.2+, check used_cpu_user_main_thread in INFO cpu to isolate main-thread saturation.

  3. Find slow subscribers. Run CLIENT LIST TYPE pubsub to restrict output to Pub/Sub clients (they also carry the P flag in the flags field), and look for large omem values. The sub and psub fields show how many channel and pattern subscriptions each client holds. A Pub/Sub client whose omem keeps growing while PUBLISH traffic is active is likely a slow consumer. Also search Redis logs for lines containing “scheduled to be closed ASAP for overcoming of output buffer limits.”

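  4. Check cluster broadcast overhead (cluster mode only). Run CLUSTER INFO and compare cluster_stats_messages_sent and cluster_stats_messages_received with your PUBLISH rate. Classic Pub/Sub forwards each PUBLISH from the receiving node to every other node, so bus traffic scales with node count. If CLUSTER INFO also reports per-type counters such as cluster_stats_messages_publish_sent, use them to separate Pub/Sub traffic from gossip. If the cluster bus is saturated while cluster_state remains ok, you are hitting broadcast limits.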

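  5. Inspect pattern health. Redis has no command that lists active patterns. PUBSUB CHANNELS returns only concrete channel names, and PUBSUB NUMPAT returns only a count. Use the psub field in CLIENT LIST to find which clients hold pattern subscriptions. Then review those applications for patterns that are unique but semantically redundant. Remember that Redis 6.0+ deduplicates identical patterns, so PUBSUB NUMPAT reflects unique patterns. Broad patterns such as * or *foo* are the most expensive.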

  6. Review pattern safety and version exposure. CVE-2024-31228 lets authenticated users trigger unbounded recursion, leading to stack overflow and a crash, via extremely long glob patterns. Affected inputs include PSUBSCRIBE, KEYS, SCAN, FUNCTION LIST, COMMAND LIST, and ACL definitions. Affected versions are all Redis >= 2.2.5; fixes ship in 6.2.16, 7.2.6, and 7.4.1. Check redis_version in INFO server and audit pattern lengths. On Redis 6.0+, also review ACL LIST output for unusually long key or channel patterns.
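Steps 1 and 3 can be combined into a repeatable filter. The sketch below parses CLIENT LIST output, ideally from CLIENT LIST TYPE pubsub. It keeps clients whose omem is at or above a threshold and prints them largest first, with their sub and psub counts. The helper name slow_subscribers is hypothetical. The 8 MB default threshold is an assumption borrowed from the default pubsub soft limit.

```shell
# Hypothetical helper: lists Pub/Sub clients whose output buffer (omem)
# is at or above MIN_BYTES, largest first.
# Usage: redis-cli CLIENT LIST TYPE pubsub | slow_subscribers [MIN_BYTES]
slow_subscribers() {
  awk -v min="${1:-8388608}" '
    {
      id = ""; omem = 0; sub_n = 0; psub_n = 0
      # Each field is key=value; pick out the ones we report.
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "id") id = kv[2]
        else if (kv[1] == "omem") omem = kv[2] + 0
        else if (kv[1] == "sub") sub_n = kv[2] + 0
        else if (kv[1] == "psub") psub_n = kv[2] + 0
      }
      if (omem >= min + 0) print omem, "id=" id, "sub=" sub_n, "psub=" psub_n
    }' | sort -rn
}
```

The printed id values are the ones CLIENT KILL ID expects. Each line also shows whether a client is a pattern subscriber or a channel subscriber.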

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| pubsub_patterns | Each PUBLISH scans all patterns; CPU cost rises with count. | Sustained value above 1,000 unique patterns. |
| cmdstat_publish usec_per_call | Measures event-loop cost per PUBLISH, including pattern scan and fan-out. | Sustained growth or outliers against baseline. |
| LATENCY LATEST / LATENCY HISTORY command | Captures event-loop blocking from publish work (requires latency-monitor-threshold above 0). | Recurring spikes above your latency threshold. |
| CLIENT LIST omem for Pub/Sub clients | Output buffer memory for slow subscribers; counts against maxmemory. | Any Pub/Sub client with omem near or above the soft limit, or trending up. |
| cluster_stats_messages_sent / received | Cluster bus traffic; classic Pub/Sub broadcasts every PUBLISH to all nodes. | Growth disproportionate to PUBLISH rate; high baseline. |
| client_output_buffer_limit_disconnections (on versions that report it) or the buffer-limit log line | Confirms disconnections due to output buffer overflow. evicted_clients is a different signal: it counts maxmemory-clients evictions. | Any rate above zero, or log lines matching the buffer-limit message. |
| instantaneous_output_kbps | Network fan-out from Pub/Sub and replication. | Output far exceeds input during publish bursts. |
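Alerting on deviation from a baseline, rather than on an absolute number, is easy to sketch for the usec_per_call signal. The helper below reads INFO commandstats output on stdin and extracts usec_per_call for cmdstat_publish. It warns when that value exceeds a multiple of a baseline you supply. The helper name publish_cost_check and the default factor of 2 are assumptions for illustration.

```shell
# Hypothetical helper: compares PUBLISH cost with a recorded baseline.
# Usage: redis-cli INFO commandstats | publish_cost_check BASELINE_USEC [FACTOR]
# Input line shape: cmdstat_publish:calls=N,usec=N,usec_per_call=N.NN,...
publish_cost_check() {
  awk -F'[:,=]' -v base="$1" -v f="${2:-2}" '
    $1 == "cmdstat_publish" {
      cost = 0
      # After the command name, fields alternate: name, value.
      for (i = 2; i < NF; i += 2)
        if ($i == "usec_per_call") cost = $(i + 1) + 0
      if (cost > base * f) printf "WARN usec_per_call=%.2f baseline=%.2f\n", cost, base
      else printf "OK usec_per_call=%.2f\n", cost
    }'
}
```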

Fixes

Reduce pattern count and complexity

Replace PSUBSCRIBE with SUBSCRIBE to concrete channels wherever possible. Consolidate redundant patterns; Redis 6.0+ deduplicates identical patterns across clients, but many unique patterns still force a scan on every PUBLISH. Avoid overly broad patterns such as * or *foo*. If you need many fine-grained channels, prefer explicit channel names over wildcards.
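A quick audit of the patterns your applications register can separate those that need no wildcard from those that do. Redis cannot list active patterns, so the strings must come from application code or configuration. The sketch below takes them as arguments and labels each one. The helper name and the three labels are illustrative assumptions. It also ignores backslash escapes, which Redis glob patterns support.

```shell
# Hypothetical helper: classifies subscription patterns for review.
# Usage: classify_patterns PATTERN...
classify_patterns() {
  for p in "$@"; do
    case $p in
      # Starts with a literal *: can match almost any channel name.
      '*'*) printf 'leading-wildcard: %s\n' "$p" ;;
      # Contains a glob metacharacter somewhere else.
      *'*'* | *'?'* | *'['*) printf 'glob: %s\n' "$p" ;;
      # No metacharacters: PSUBSCRIBE is unnecessary here.
      *) printf 'literal: %s (use SUBSCRIBE)\n' "$p" ;;
    esac
  done
}

classify_patterns "orders.created" "orders.*" "*foo*"
# prints:
# literal: orders.created (use SUBSCRIBE)
# glob: orders.*
# leading-wildcard: *foo*
```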

Handle slow subscribers

The first choice is to make consumers faster: scale consumers, reduce per-message processing time, or shard subscriptions across clients. If consumers fall behind only during transient bursts, you can raise the Pub/Sub output buffer limit, which trades memory for availability. For example:

# Adjust Pub/Sub buffer limit: 64 MB hard, 16 MB soft for 120 seconds
redis-cli CONFIG SET client-output-buffer-limit "pubsub 67108864 16777216 120"

The default is 32 MB hard and 8 MB soft for 60 seconds. Raising the limit increases the risk that slow subscribers consume enough memory to push Redis toward maxmemory, so monitor omem closely if you do this. CONFIG SET changes are lost on restart unless you also run CONFIG REWRITE or update redis.conf.
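CONFIG SET takes byte values here, so converting from megabytes by hand is error-prone. A tiny helper can build the argument string. The name pubsub_limit_arg is hypothetical. It assumes whole-megabyte inputs, with 1 MB = 1,048,576 bytes, which matches the 64 MB = 67108864 example above.

```shell
# Hypothetical helper: builds the pubsub class string for
# client-output-buffer-limit from megabyte values.
# Usage: pubsub_limit_arg HARD_MB SOFT_MB SOFT_SECONDS
pubsub_limit_arg() {
  printf 'pubsub %d %d %d\n' "$(($1 * 1048576))" "$(($2 * 1048576))" "$3"
}

pubsub_limit_arg 64 16 120   # prints: pubsub 67108864 16777216 120

# Applying it:
# redis-cli CONFIG SET client-output-buffer-limit "$(pubsub_limit_arg 64 16 120)"
```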

If a subscriber is stale and you cannot recover it, kill the connection:

# Dangerous: disconnects the client immediately
redis-cli CLIENT KILL ID <client-id>

For workloads that need delivery guarantees or durable replay, consider Redis Streams instead of Pub/Sub.

Mitigate cluster broadcast overhead

In Redis Cluster, migrate from classic Pub/Sub to sharded Pub/Sub (SSUBSCRIBE / SPUBLISH), available in Redis 7.0+. Sharded Pub/Sub routes messages only to the shard that owns the channel’s hash slot, eliminating the all-node broadcast. Note that SSUBSCRIBE and SUBSCRIBE are not interchangeable. A message published with PUBLISH is not received by SSUBSCRIBE clients, and a message published with SPUBLISH is not received by SUBSCRIBE clients. Sharded Pub/Sub also has no pattern-based equivalent of PSUBSCRIBE, so pattern subscribers must move to concrete channels first. Subscribers must connect to a node in the owning shard; cluster-aware clients handle this routing.
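To see which shard a sharded channel lands on, compute its hash slot the same way Redis Cluster does for keys. The slot is CRC16 (the XMODEM variant) of the name, or of its {hash tag} if present, modulo 16384. The sketch below is a pure-shell illustration with hypothetical helper names. It assumes ASCII channel names, because printf returns character codes rather than UTF-8 bytes for non-ASCII input. On a live node, running CLUSTER KEYSLOT with the channel name is simpler.

```shell
# Hypothetical helpers: hash slot for a sharded Pub/Sub channel name.
# CRC16 here is the XMODEM variant (poly 0x1021, init 0) used by Redis Cluster.
crc16() {
  s=$1
  crc=0
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}                 # first character
    s=${s#?}                        # rest of the string
    byte=$(printf '%d' "'$c")       # character code (ASCII assumed)
    crc=$(( crc ^ (byte << 8) ))
    i=0
    while [ "$i" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      i=$((i + 1))
    done
  done
  echo "$crc"
}

channel_slot() {
  name=$1
  # Hash tag: if "{" is followed later by "}" with text in between,
  # only the text between the first "{" and the next "}" is hashed.
  case $name in
    *"{"*"}"*)
      tag=${name#*"{"}
      tag=${tag%%"}"*}
      [ -n "$tag" ] && name=$tag
      ;;
  esac
  echo $(( $(crc16 "$name") % 16384 ))
}

channel_slot "123456789"           # prints: 12739 (CRC16 check value 0x31C3)
channel_slot "{123456789}.events"  # prints: 12739 (same tag, same slot)
```

Channels that share a hash tag land on the same shard. This helps when a subscriber needs several related channels from one connection.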

If you cannot migrate, keep pattern and channel counts low, reduce message sizes, and monitor cluster bus bandwidth.

Patch pattern-matching vulnerabilities

If you are on an affected Redis version, upgrade to 6.2.16, 7.2.6, 7.4.1, or later. Audit existing ACL entries and any user-submitted PSUBSCRIBE patterns for abnormal length. Use ACLs to restrict access to pattern-subscription commands where appropriate.
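A version gate for CVE-2024-31228 can be scripted from the fixed releases listed above. This sketch uses a hypothetical helper, cve_2024_31228_status. It assumes only the 6.2, 7.2, and 7.4 lines received fixes, and that any release line newer than 7.4 includes the fix. Confirm against the official advisory before relying on it.

```shell
# Hypothetical helper: classifies a Redis version against CVE-2024-31228.
# Assumption: fixed in 6.2.16, 7.2.6, 7.4.1 and in later release lines.
# Usage: cve_2024_31228_status MAJOR.MINOR.PATCH
cve_2024_31228_status() {
  v=$1
  major=${v%%.*}; v=${v#"$major"}; v=${v#.}
  minor=${v%%.*}; v=${v#"$minor"}; v=${v#.}
  patch=${v%%.*}
  minor=${minor:-0}
  patch=${patch:-0}
  status=vulnerable
  if [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -gt 4 ]; }; then
    status=patched
  elif [ "$major" -eq 7 ] && [ "$minor" -eq 4 ] && [ "$patch" -ge 1 ]; then
    status=patched
  elif [ "$major" -eq 7 ] && [ "$minor" -eq 2 ] && [ "$patch" -ge 6 ]; then
    status=patched
  elif [ "$major" -eq 6 ] && [ "$minor" -eq 2 ] && [ "$patch" -ge 16 ]; then
    status=patched
  fi
  echo "$status"
}

# Example against a live server:
# v=$(redis-cli INFO server | awk -F: '/^redis_version:/ {print $2}' | tr -d '\r')
# cve_2024_31228_status "$v"
```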

Prevention

  • Establish a baseline for pubsub_patterns, pubsub_channels, and cmdstat_publish usec_per_call. Alert on deviation, not just absolute thresholds.
  • Set explicit client-output-buffer-limit pubsub values based on your workload instead of assuming the defaults fit, and treat Pub/Sub buffers as a first-class memory consumer. Remember that the normal client class has no output buffer limit by default.
  • Design new cluster deployments with sharded Pub/Sub rather than classic Pub/Sub.
  • Implement application-level cleanup for stale subscriptions, and monitor connected_clients and pubsub_channels for leaks.

How Netdata helps

  • Correlate redis.pubsub_channels and redis.pubsub_patterns with redis.cpu_utilization and redis.operations to identify when PUBLISH work is becoming expensive.
  • Track per-client output buffer metrics to catch subscribers nearing buffer limits before Redis disconnects them.
  • In cluster mode, monitor redis.cluster_stats_messages_sent and redis.cluster_stats_messages_received to detect classic Pub/Sub broadcast saturation.
  • Surface commandstats and latency events to tie latency spikes directly to PUBLISH or PSUBSCRIBE behavior.
  • Combine Redis metrics with system CPU, network, and memory charts to distinguish event-loop exhaustion from infrastructure-level contention.