Cassandra thread pool pending and blocked tasks: SEDA backpressure

You run nodetool tpstats and see non-zero values in the Pending or Blocked columns. On a healthy Cassandra node, request-stage pools like MUTATION and READ should show zero pending tasks in steady state. When pending climbs and stays above zero, the SEDA pipeline is backing up. If Blocked also rises, the node has moved from queuing to rejecting work.

This is not a transient spike you can ignore. Sustained pending tasks on the MUTATION or READ stages add latency to every client request. Blocked tasks mean the queue is full and the node is actively shedding load. In the GOSSIP stage, even a small pending backlog is an emergency: it means gossip is falling behind, which leads to false DOWN marking across the cluster.

The root cause is almost always a downstream bottleneck. The thread pool is not the problem; it is the symptom. This guide shows how to read nodetool tpstats, find the bottleneck, and relieve the pressure before dropped mutations or gossip flapping trigger a wider outage.

What this means

Cassandra uses a staged event-driven architecture (SEDA). Writes, reads, gossip, compaction, and flushes each run in dedicated thread pools. When a write arrives, it is handed to the MUTATION pool. If all threads are busy, the task enters the Pending queue. If the queue also fills, the task is counted as Blocked and is rejected or dropped.

nodetool tpstats shows five columns per pool:

| Column | What it tracks |
| --- | --- |
| Active | Threads currently executing tasks |
| Pending | Tasks queued waiting for a thread |
| Completed | Tasks finished since JVM startup |
| Blocked | Tasks that could not enter the queue because it is full |
| All time blocked | Cumulative blocked tasks since JVM startup |

A brief Pending spike during a traffic burst is normal and resolves within seconds. Sustained Pending greater than zero means the pool cannot dequeue work as fast as it arrives. Blocked greater than zero means the queue overflowed and Cassandra rejected the task. For request pools (MUTATION, READ, Native-Transport-Requests), any blocked count is an emergency because client requests are being rejected. For the GOSSIP pool, pending alone is dangerous: backed-up gossip prevents the failure detector from updating peer state and can cause peers to falsely mark the node DOWN.

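In both 3.x and 4.x, nodetool tpstats lists thread pools by stage name, for example MutationStage, ReadStage, and GossipStage, and the JMX thread pool metrics use the same names. Uppercase names such as MUTATION and READ are message types and appear in the Dropped section of the output. This guide uses the short uppercase form (MUTATION, READ, GOSSIP) to refer to the pools.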
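The thresholds above can be applied mechanically to a single tpstats snapshot. The shell function below is a minimal sketch, not a Cassandra tool: it assumes each pool row is the pool name followed by the five numeric columns described above, matches the gossip pool by a case-insensitive "gossip" prefix so that both GossipStage and GOSSIP are caught, and treats any Blocked count as the most urgent signal. The name tpstats_flags is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: flag thread pools with non-zero Pending or Blocked counts.
# Reads `nodetool tpstats` output on stdin. Assumes each pool row is the pool
# name followed by Active, Pending, Completed, Blocked, All time blocked, and
# that the pool section ends at the "Message type" header (Dropped section).

tpstats_flags() {
  awk '
    /^Message type/ { exit }    # stop before the Dropped section
    # Pool rows: a name, then numeric Active and Pending columns.
    NF >= 6 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      name = $1; pending = $3 + 0; blocked = $5 + 0   # "n/a" counts as 0
      if (blocked > 0) {
        print "BLOCKED", name, "pending=" pending, "blocked=" blocked
      } else if (pending > 0 && tolower(name) ~ /^gossip/) {
        print "CRITICAL", name, "pending=" pending    # gossip backlog
      } else if (pending > 0) {
        print "PENDING", name, "pending=" pending
      }
    }
  '
}
```

On a node, pipe live output into it: nodetool tpstats | tpstats_flags. Empty output means no pool had Pending or Blocked above zero at that instant; a single snapshot cannot distinguish a burst from a sustained backlog.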

flowchart TD
    A[Disk CPU or GC saturation] --> B[SEDA stage slows]
    B --> C[Pending tasks > 0]
    C --> D[Blocked tasks > 0]
    D --> E[Dropped messages]
    C --> F[Latency spikes]
    B --> G[GOSSIP backlog]
    G --> H[False DOWN marking]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| CPU saturation | Pending climbing across multiple pools simultaneously; CPU usage near 100% | mpstat -P ALL 1 |
| Disk I/O contention | READ or MUTATION pending with elevated await on data or commitlog devices | iostat -x 1 on the relevant device |
| GC-induced stalls | Pending spikes that correlate with stop-the-world pauses; pending drains once the pause ends | grep -i "pause" /var/log/cassandra/gc.log |
| Compaction debt | CompactionExecutor pending > 50 and growing; SSTable count increasing | nodetool compactionstats |
| Slow replicas or network partitions | Coordinator latency high but local read latency low on the replica; some nodes show higher pending than peers | nodetool proxyhistograms and per-node nodetool tpstats |
| Gossip stage backlog | GOSSIP pool Pending > 0 sustained; nodes flapping UP/DOWN | nodetool status from multiple nodes |

Quick checks

# Check thread pool saturation per stage
nodetool tpstats

# Check heap pressure
nodetool info | grep "Heap Memory"

# Check disk I/O latency on data and commitlog devices
iostat -x 1

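# Check for GC pauses longer than 200 ms (GCInspector logs pauses above
# gc_log_threshold_in_ms, gc_log_threshold in 4.1+, which defaults to 200 ms)
grep GCInspector /var/log/cassandra/system.log | tail -n 20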

# Check compaction backlog
nodetool compactionstats

# Check for node flapping or DOWN states
nodetool status

# Check coordinator latency percentiles
nodetool proxyhistograms
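Several diagnosis steps compare outputs over time, so it can help to save the nodetool checks together. This is a hypothetical helper, capture_quick_checks, sketched under the assumption that nodetool is on PATH and can reach the local node; it leaves out iostat and the log grep because iostat -x 1 runs until interrupted.

```bash
#!/usr/bin/env bash
# Sketch: save the nodetool quick checks into one timestamped directory so
# that runs taken some seconds apart can be compared side by side.
# Assumes nodetool is on PATH. A failing or missing command only leaves its
# error text in the corresponding output file; the loop keeps going.

capture_quick_checks() {   # usage: capture_quick_checks [BASE_DIR]
  local base="${1:-./cassandra-checks}" dir cmd
  dir="$base/$(date +%Y%m%dT%H%M%S)"
  mkdir -p "$dir" || return 1
  for cmd in tpstats info compactionstats status proxyhistograms; do
    # Capture stdout and stderr into e.g. tpstats.txt, status.txt
    nodetool "$cmd" > "$dir/$cmd.txt" 2>&1
  done
  echo "$dir"    # print the snapshot directory for the caller
}
```

Run it twice about 30 seconds apart and diff the two tpstats.txt files to see which counters moved.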

How to diagnose it

  1. Confirm the symptom is sustained. Run nodetool tpstats twice, 30 seconds apart. If Pending on MUTATION or READ is greater than zero both times, the pool is saturated. Note the Blocked and All time blocked columns: if Blocked is greater than zero or All time blocked is climbing, the queue has overflowed.
  2. Check which pools are affected. Is it only MUTATION, only READ, or multiple pools including CompactionExecutor and MemtableFlushWriter? Wide impact suggests CPU, GC, or disk. Narrow impact suggests a stage-specific bottleneck. For example, MemtableFlushWriter pending greater than zero with MUTATION pending indicates the flush pipeline is the root cause.
  3. Check for GC pauses. Run nodetool gcstats or grep GC logs. If pauses exceed 500 ms, they are likely freezing all stages. If pauses exceed 2 seconds, gossip failure detection will trigger.
  4. Check disk I/O. Run iostat -x 1 on the data and commitlog devices. If %util is greater than 80% or await is elevated, disk saturation is the bottleneck.
  5. Check compaction status. Run nodetool compactionstats. If pending tasks are greater than 50 and growing, compaction is stealing I/O bandwidth and read amplification is increasing.
  6. Check for gossip-specific backlog. If the GOSSIP pool shows pending greater than zero, check nodetool status for flapping nodes. Gossip backlog does not self-correct and requires immediate investigation of network connectivity, disk I/O stalls, or GC pauses on the affected node.
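  7. Check commitlog pressure. nodetool info does not report commitlog state; read the CommitLog PendingTasks metric over JMX (org.apache.cassandra.metrics:type=CommitLog,name=PendingTasks). If it is greater than zero, the write path is blocked at the durability layer.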
  8. Correlate with dropped messages. Look at the Dropped section in nodetool tpstats. Sustained drops confirm that work is timing out in queues. Dropped MUTATION means replica divergence that will require repair.
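Step 1 can be scripted against two saved tpstats outputs. The sketch below, with the hypothetical name tpstats_compare, reports pools whose Pending stayed above zero in both snapshots and pools whose All time blocked counter grew between them. It assumes both files use the same tpstats pool layout and ignores the Dropped section.

```bash
#!/usr/bin/env bash
# Sketch for step 1: compare two saved `nodetool tpstats` outputs.
# Prints SUSTAINED for pools with Pending > 0 in both snapshots, and
# BLOCKED+ for pools whose "All time blocked" counter grew between them.

tpstats_compare() {   # usage: tpstats_compare EARLIER_FILE LATER_FILE
  awk '
    FNR == 1 { snap++; in_pools = 1 }     # first line of each file
    /^Message type/ { in_pools = 0 }      # skip the Dropped section
    in_pools && NF >= 6 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      pending[snap, $1] = $3 + 0
      alltime[snap, $1] = $6 + 0
      seen[$1] = 1
    }
    END {
      for (p in seen) {
        if (pending[1, p] > 0 && pending[2, p] > 0) {
          print "SUSTAINED", p, pending[1, p], "->", pending[2, p]
        }
        if (alltime[2, p] > alltime[1, p]) {
          print "BLOCKED+", p, "+" (alltime[2, p] - alltime[1, p])
        }
      }
    }
  ' "$1" "$2" | sort
}
```

For example: nodetool tpstats > t1.txt; sleep 30; nodetool tpstats > t2.txt; tpstats_compare t1.txt t2.txt. The file names are arbitrary.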

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| MUTATION PendingTasks | Write path cannot keep up | > 0 for more than 60 s |
| READ PendingTasks | Read requests piling up | > 0 for more than 60 s |
| CurrentlyBlockedTasks | Queue overflow; work rejected | > 0 in any pool |
| All time blocked rate | Recurring overflow events | Increasing over a 5 min window |
| GOSSIP PendingTasks | False DOWN marking risk | > 0 sustained |
| Dropped MUTATION | Silent replica divergence | Non-zero rate |
| CompactionExecutor pending | Compaction falling behind | > 50 and growing |
| Native-Transport-Requests pending | Client request backlog | > 50% of pool maximum |
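Rate-based signals in this table, such as Dropped MUTATION, need two samples rather than one. A minimal sketch, assuming two tpstats outputs saved a known number of seconds apart; dropped_count and dropped_rate are hypothetical names, and the message type must be spelled exactly as your Cassandra version prints it in the Dropped section.

```bash
#!/usr/bin/env bash
# Sketch: per-second drop rate for one message type, from two saved
# `nodetool tpstats` outputs taken INTERVAL seconds apart. Reads the rows
# after the "Message type" header of the Dropped section.

dropped_count() {   # usage: dropped_count TYPE FILE
  awk -v want="$1" '
    /^Message type/ { in_dropped = 1; next }
    in_dropped && $1 == want { print $2 + 0; found = 1; exit }
    END { if (!found) exit 1 }    # type not present: signal an error
  ' "$2"
}

dropped_rate() {    # usage: dropped_rate TYPE EARLIER_FILE LATER_FILE INTERVAL
  local before after
  before=$(dropped_count "$1" "$2") || return 1
  after=$(dropped_count "$1" "$3") || return 1
  # Counters are cumulative since JVM startup, so a restart between the
  # two snapshots makes this result negative.
  awk -v b="$before" -v a="$after" -v s="$4" \
    'BEGIN { printf "%.2f\n", (a - b) / s }'
}
```

A result above 0.00 for MUTATION matches the warning sign in the table. Discard a negative result, since it only means the node restarted between samples.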

Fixes

CPU saturation

Reduce application write rate or add nodes. There is no Cassandra tuning knob that creates more CPU. If compaction is consuming the majority of cycles, temporarily lower compaction_throughput_mb_per_sec with nodetool setcompactionthroughput to free capacity for requests. This trades compaction debt for request latency.

Disk I/O contention

Verify that commitlog and data directories are on separate devices. If repairs or streaming are active, pause them. If compaction was artificially throttled below the disk capacity, raise compaction_throughput_mb_per_sec. If the device is already at maximum throughput, add IOPS or nodes. For STCS, major compaction can transiently need up to 100% additional space. Running out of room prevents compaction from running, which increases SSTable count and amplifies the I/O problem.
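The STCS space rule can be checked roughly before a major compaction. This sketch compares the data already in a directory with the free space on its filesystem, using POSIX du -sk and df -Pk (sizes in KiB). It is deliberately conservative: it assumes the whole directory might be rewritten at once, and du also counts any snapshots stored under it. The default path /var/lib/cassandra/data is an assumption; pass your own data_file_directories entry. The names headroom_ok and stcs_headroom are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: conservative STCS headroom check. Treats "free space >= data size"
# as the safe margin, since a major compaction can transiently need up to
# 100% additional space. Sizes are KiB from POSIX du -sk / df -Pk.

headroom_ok() {      # usage: headroom_ok USED_KIB AVAIL_KIB
  [ "$2" -ge "$1" ]
}

stcs_headroom() {    # usage: stcs_headroom [DATA_DIR]
  local dir="${1:-/var/lib/cassandra/data}" used avail   # default is an assumption
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 2; }
  used=$(du -sk "$dir" 2>/dev/null | awk '{ print $1 }')
  avail=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$used" ] && [ -n "$avail" ] || return 2
  if headroom_ok "$used" "$avail"; then
    echo "OK: $avail KiB free >= $used KiB of data in $dir"
  else
    echo "LOW: $avail KiB free < $used KiB of data in $dir"
    return 1
  fi
}
```

Exit status 0 means OK, 1 means LOW, and 2 means the directory could not be measured.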

GC-induced stalls

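This pattern is covered in detail in the Cassandra GC death spiral guide. Immediate actions: run nodetool disablebinary to stop new client load while keeping the node in the ring, then look for the heap consumer. nodetool toppartitions shows the most frequently accessed partitions; for large partitions and tombstone-heavy scans, check the partition size percentiles in nodetool tablehistograms and the large-partition and tombstone warnings in system.log. Do not restart the node without identifying the heap consumer; the spiral will resume.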

Commitlog backlog

Ensure the commitlog device is independent from data directories. Check the commitlog_sync mode: batch waits for an fsync before acknowledging writes and is slower than periodic. Do not change this during an incident without understanding the durability tradeoff. If commitlog segments are accumulating because memtable flushes are blocked, investigate MemtableFlushWriter pending tasks.
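Both checks in this fix can be sketched in shell. Under the assumption that comparing mount points from POSIX df -P is a good enough proxy, same_filesystem reports whether two directories share a filesystem; it cannot see that two partitions live on one physical disk and still contend for I/O. commitlog_sync_settings prints the top-level commitlog_sync lines from whichever cassandra.yaml you point it at. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: (1) check whether the commitlog and data directories sit on the
# same filesystem, and (2) show the commitlog_sync settings in cassandra.yaml.
# Compares mount points only; separate partitions on one disk look separate.

mount_point() {        # usage: mount_point DIR
  df -P "$1" | awk 'NR == 2 { print $6 }'
}

same_filesystem() {    # usage: same_filesystem DIR_A DIR_B
  [ "$(mount_point "$1")" = "$(mount_point "$2")" ]
}

commitlog_sync_settings() {   # usage: commitlog_sync_settings CASSANDRA_YAML
  # Uncommented, top-level keys only (commented defaults start with "#").
  grep -E '^commitlog_sync' "$1"
}
```

For example: same_filesystem /var/lib/cassandra/commitlog /var/lib/cassandra/data && echo "commitlog shares a filesystem with data", then commitlog_sync_settings /etc/cassandra/cassandra.yaml. These paths are common package defaults and may differ on your install.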

Gossip stage backlog

Treat this as a PAGE-level event. Check for network partitions by comparing nodetool status output from multiple nodes. Check for GC pauses longer than 2 seconds, which prevent gossip from progressing. Check for disk I/O stalls that block the gossip thread. Gossip backlog does not self-correct.
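Comparing nodetool status across nodes is easier if each view is reduced to address and state. A minimal sketch, assuming each node line starts with a two-letter status/state code (U or D followed by N, L, J, or M) and then the address; status_view and compare_views are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: normalise saved `nodetool status` output to sorted "address state"
# lines so the views from two nodes can be diffed.

status_view() {     # usage: status_view FILE
  awk '$1 ~ /^[UD][NLJM]$/ { print $2, $1 }' "$1" | sort
}

compare_views() {   # usage: compare_views FILE_FROM_NODE_A FILE_FROM_NODE_B
  local a b rc
  a=$(mktemp) && b=$(mktemp) || return 2
  status_view "$1" > "$a"
  status_view "$2" > "$b"
  diff "$a" "$b"; rc=$?    # 0 = same view, 1 = nodes disagree
  rm -f "$a" "$b"
  return "$rc"
}
```

Save nodetool status from two nodes (for example over ssh) and run compare_views on the two files. Any diff line means the nodes disagree about a peer's state, a sign that gossip has not converged.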

Native transport overload

If the Native-Transport-Requests pool shows pending tasks, CQL requests are arriving faster than the pool can execute them. Check driver connection behavior and the connectedNativeClients metric. Increasing native_transport_max_threads buys time but does not fix the underlying bottleneck.

Emergency load shedding

If the node is dropping mutations and approaching unavailability, run nodetool disablebinary to stop accepting new CQL connections without removing the node from the cluster. This prevents further client timeouts while you investigate disk, GC, or compaction issues.

Prevention

Monitor pending task
