Cassandra pending compactions growing: the compaction backlog runbook

Pending tasks climbing in nodetool compactionstats is normal after a bulk load under STCS, but when the number trends upward for hours it signals that your node is producing SSTables faster than compaction can merge them.

This is the leading indicator of the compaction death spiral. Left unchecked, the backlog drives read amplification up, saturates disk I/O, and eventually exhausts disk space as temporary compaction files accumulate. Writes often stay fast while reads degrade, masking the problem.

Fix the bottleneck first. It may be I/O capacity, throttling, tombstone-heavy tables, or competing operations like repair and streaming. This guide covers diagnosis and remediation.

What this means

Compaction merges immutable SSTables, discards tombstones, and reclaims space in the background. When the write path appends data faster than compaction threads can merge it, tasks queue up as PendingTasks in the CompactionManager. Each pending task represents uncompacted SSTables that reads may need to consult, so read amplification grows even while writes remain fast.

Compaction is background work, so it is often ignored until reads slow down. By the time read latency crosses your SLO, the backlog has usually been building for days. Catch the PendingTasks trend early to avoid emergency intervention.

Unlike transient post-restart spikes that resolve in minutes, a monotonic increase over four or more hours means compaction throughput has fallen below the flush rate. LCS pending tasks should stay low by design; sustained elevation is especially dangerous because L0 accumulation propagates latency quickly. STCS tolerates bursts, but a persistent climb forecasts disk space trouble because major compactions can transiently require up to 100 percent additional space. UCS in Cassandra 5.0 distributes tasks more evenly, but a rising trend still signals insufficient throughput.
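
A minimal sketch of the trend test described above, assuming you already collect one PendingTasks value per line, oldest first (for example, one sample every 15 minutes). The function name pending_trend is hypothetical. It treats a series as rising when no sample drops below its predecessor and the last sample exceeds the first.

```shell
# pending_trend: read PendingTasks samples from stdin, one per line, oldest first.
# Prints "rising" when the series never decreases and ends higher than it started,
# otherwise "not-rising". The name and the output strings are illustrative.
pending_trend() {
    awk '
        NF == 0 { next }                       # skip blank lines
        n == 0  { first = $1 + 0; prev = first; n = 1; next }
        {
            v = $1 + 0
            if (v < prev) dipped = 1           # any drop breaks the monotonic trend
            prev = v
            n++
        }
        END {
            if (n >= 2 && !dipped && prev > first) print "rising"
            else print "not-rising"
        }
    '
}
```

Seventeen samples taken 15 minutes apart cover the four-hour window. How each sample is extracted depends on the nodetool compactionstats output of your Cassandra version, so that step is left out of the sketch.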

flowchart TD
    A[High write rate] --> B[SSTables accumulate]
    B --> C[Pending compactions grow]
    C --> D[Read amplification rises]
    D --> E[Read latency spikes]
    C --> F[Disk I/O saturates]
    F --> G[Compaction slows further]
    G --> C
    E --> H[Client timeouts]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Write rate exceeds compaction throughput | PendingTasks rises steadily; write latency normal; read latency climbing | nodetool compactionstats and disk I/O |
| Compaction throttled too aggressively | PendingTasks grows despite low disk utilization; compaction_throughput_mb_per_sec set low | nodetool compactionstats and iostat -x 1 |
| Disk I/O saturation | %util >80% or await high; flush and read stages also backing up | iostat -x 1 on data and commitlog devices |
| Tombstone-heavy compactions slowing merge | Single table lagging; tombstone warnings in logs; repair overdue | nodetool tablestats (tombstones per slice) and repair history |
| Repair or streaming competing for bandwidth | PendingTasks spikes during repair; nodetool netstats shows active streams | nodetool netstats |
| Sudden write traffic spike or bulk load | PendingTasks jumps after batch ingest; write throughput elevated above baseline | nodetool proxyhistograms and write request rate |

Quick checks

# Check compaction backlog and active tasks
nodetool compactionstats

# Check disk saturation on data and commitlog devices
iostat -x 1

# Count SSTables per table to gauge read amplification
nodetool tablestats | grep -E "Table:|SSTable count"

# Check internal thread pool pressure
nodetool tpstats

# View coordinator-level latency percentiles
nodetool proxyhistograms

# Verify filesystem headroom for compaction temp space
df -h /var/lib/cassandra/data

# Scan for tombstone warnings that indicate slow merges
grep -i "tombstone" /var/log/cassandra/system.log

# Check JVM heap usage (nodetool gcstats reports GC pause statistics)
nodetool info | grep -i "Heap Memory"
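
To rank tables by SSTable count rather than scan the raw list, the tablestats output can be paired up with awk. This is a sketch that assumes the layout "Keyspace : name", "Table: name" and "SSTable count: N" on separate lines. The exact layout varies between Cassandra versions, and sstable_counts is a hypothetical helper name.

```shell
# sstable_counts: read `nodetool tablestats` output on stdin and print
# "<count> <keyspace>.<table>" lines, highest count first.
# Assumes "Keyspace : ks" / "Table: t" / "SSTable count: N" lines.
sstable_counts() {
    awk '
        $1 == "Keyspace" || $1 == "Keyspace:" { ks = $NF }
        $1 == "Table:"                         { tbl = $2 }
        $1 == "SSTable" && $2 == "count:"      { print $3, ks "." tbl }
    ' | sort -k1,1nr
}

# Example (not run here):
#   nodetool tablestats | sstable_counts | head -n 10
```

Keeping the parser separate from the nodetool call lets you replay it against saved output from several nodes.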

How to diagnose it

  1. Confirm the trend. Sample nodetool compactionstats at 15-minute intervals. A monotonic increase over 4 or more hours means the node is falling behind. Transient spikes after restarts or bulk loads usually resolve within an hour.

  2. Identify the bottleneck. Run iostat -x 1 on both data and commitlog devices. If %util exceeds 80 percent or await exceeds 10 ms on SSDs, disk saturation is throttling compaction. Pay attention to r_await versus w_await; high write await on the data device means the disk is struggling with flush throughput as well as compaction. If disk metrics look idle, check CPU and GC.

  3. Correlate with write pressure. Check nodetool proxyhistograms. Fast writes with stagnant compactions mean the merge path is the constraint, not the ingest path.

  4. Quantify read amplification. Run nodetool tablestats for each affected keyspace. If the SSTable count (exposed over JMX as LiveSSTableCount) is growing, reads are touching more files and latency will follow.

  5. Inspect internal queues. In nodetool tpstats, look for sustained pending tasks in CompactionExecutor or blocked tasks in Native-Transport-Requests. Sustained pending or blocked tasks in MutationStage or ReadStage, or non-zero counts in the dropped-message section, mean client requests are already queuing or timing out. This confirms resource contention inside the node and indicates the spiral is affecting the front door.

  6. Review operational events. Check for recent repairs, bootstraps, or decommissions. Streaming competes for the same disk I/O and can push compaction into the red. Look at nodetool netstats for active streams. If a bootstrap or repair is in progress, expect elevated pending tasks, but they should stabilize once streaming completes. If pending tasks continue to climb after streaming finishes, the node cannot keep up with the combined load.

  7. Check JVM health. Run nodetool info to verify heap usage. Parse GC logs for pauses over 2 seconds; long GC stalls freeze compaction threads and create artificial backlog.

  8. Look for tombstone drag. High tombstones-per-slice values in nodetool tablestats or tombstone warnings in logs mean compactions are doing extra work to merge delete markers, slowing progress.
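
For step 2, the %util threshold can be checked mechanically. The sketch below locates the %util column from the iostat header instead of hard-coding its position, because the column layout differs between sysstat versions. The helper name saturated_devices is hypothetical, and the 80 percent default follows the threshold used in this guide.

```shell
# saturated_devices [THRESHOLD]: read `iostat -x` output on stdin and print
# "<device> <%util>" for every device row whose %util exceeds THRESHOLD (default 80).
saturated_devices() {
    threshold="${1:-80}"
    awk -v limit="$threshold" '
        $1 ~ /^Device/ {                       # header row: find the %util column
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        NF == 0 { col = 0; next }              # blank line ends the device block
        col && NF >= col && ($col + 0) > (limit + 0) { print $1, $col }
    '
}

# Example (not run here): three one-second reports, C locale for "." decimals.
#   LC_ALL=C iostat -x 1 3 | saturated_devices 80
```

With repeated reports a busy device is printed once per interval, which makes sustained saturation easy to tell apart from a single burst.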

Metrics and signals to monitor

The following signals give you a complete picture of compaction health and its consequences. Monitor them together; no single metric tells the full story.

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| CompactionManager:name=PendingTasks | Direct measure of compaction debt | Trending upward over 4+ hours; >500 sustained for 2+ hours in LCS |
| Table:name=LiveSSTableCount | Proxy for read amplification | Growing steadily regardless of strategy; >50 sustained in STCS or >100 in LCS |
| Disk %util and await | Compaction is I/O-intensive | %util >80% sustained; await >10ms on SSD |
| ClientRequest Read Latency p99 | Consequence of uncompacted SSTables | p99 >3x rolling baseline sustained |
| DroppedMessage (MUTATION/READ) | Node shedding load because it cannot keep up | Non-zero sustained rate |
| ThreadPools:CompactionExecutor pending | Internal compaction queue depth | Pending >0 and growing while active is at max |
| Table:name=TombstoneScannedHistogram | Tombstones force compactions to merge dead data | Sustained tombstone warnings or aborted reads |
| Disk space free | Compaction requires temp space to rewrite files | <50% free for STCS; <30% free for LCS/TWCS |
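
The disk headroom thresholds in the last row can be turned into a simple check. This sketch assumes POSIX df -P output, where the fifth column of the second line is the used percentage. The helper names free_percent and headroom_ok are hypothetical, and the 50 and 30 percent limits are the ones listed above.

```shell
# free_percent: read `df -P <path>` output on stdin, print free space as a whole percent.
free_percent() {
    awk 'NR == 2 { used = $5; sub(/%/, "", used); print 100 - used }'
}

# headroom_ok STRATEGY: read `df -P <path>` output on stdin.
# Succeeds when free space meets the minimum for the strategy
# (50% for STCS, 30% for LCS or TWCS); returns 2 for an unknown strategy.
headroom_ok() {
    case "$1" in
        STCS)     min=50 ;;
        LCS|TWCS) min=30 ;;
        *) echo "unknown strategy: $1" >&2; return 2 ;;
    esac
    free=$(free_percent)
    [ "$free" -ge "$min" ]
}

# Example (not run here):
#   df -P /var/lib/cassandra/data | headroom_ok STCS || echo "low headroom"
```

df rounds the used percentage up, so the computed free figure errs slightly on the cautious side.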

Fixes

Increase compaction throughput

If CPU and disk headroom exist, raise compaction_throughput_mb_per_sec and concurrent_compactors. You can adjust the throughput cap at runtime with nodetool setcompactionthroughput, without a restart; the change lasts until the next restart unless you also update cassandra.yaml. Increase throughput in increments and watch iostat to ensure you are not simply moving the bottleneck from the queue to the disk. Verify the effect by watching the byte progress of active tasks in nodetool compactionstats. Tradeoff: compaction steals I/O bandwidth from reads, which can raise read latency in the short term.
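
A small sketch of the incremental approach, assuming the cap is expressed in whole MB/s. The helper next_throughput is hypothetical, and the step and ceiling values in the example are illustrative rather than recommendations.

```shell
# next_throughput CURRENT STEP CEILING
# Prints the next compaction throughput cap in MB/s: CURRENT + STEP, never above CEILING.
next_throughput() {
    current=$1
    step=$2
    ceiling=$3
    next=$((current + step))
    if [ "$next" -gt "$ceiling" ]; then
        next=$ceiling
    fi
    echo "$next"
}

# Example (not run here): raise a 64 MB/s cap by 16, never past 128.
#   nodetool getcompactionthroughput        # read the current cap
#   nodetool setcompactionthroughput "$(next_throughput 64 16 128)"
#   iostat -x 1                             # confirm the disk still has headroom
```

Capping the value matters because passing 0 to setcompactionthroughput disables throttling entirely, which is rarely what you want on a node that is already I/O-bound.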

Reduce write pressure

Throttle non-critical writes at the application layer. Temporarily stop or postpone repairs, bootstraps, and decommissions that generate additional SSTables or compete for I/O. If you are running a bulk load, pause it until compaction catches up. Tradeoff: slower ingest and delayed topology changes.

Address disk I/O saturation

If commitlog and data share a device, plan to move commitlog to a dedicated volume during the next rolling restart. In the immediate term, reduce other I/O consumers such as backups or analytics queries. On cloud block storage, upgrade IOPS or migrate to instance types with local SSDs. If you are using network-attached storage, check for noisy-neighbor effects or throughput caps imposed by the cloud provider. Tradeoff: infrastructure change requires a maintenance window.

Resolve tombstone-heavy tables

Verify repair has completed within gc_grace_seconds for affected tables. Tombstones become eligible for purge only after gc_grace_seconds has elapsed, and repair must finish inside that window so that purging them cannot resurrect deleted data. If a specific table dominates the backlog, review its TTL and delete patterns. A long-term fix is switching time-series TTL tables to TimeWindowCompactionStrategy. Tradeoff: compaction spikes during strategy changes are CPU and I/O intensive across all nodes.

Recover disk space

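If disk usage is approaching limits, remove forgotten snapshots with nodetool clearsnapshot --all; nodetool listsnapshots shows what would be removed. Warning: confirm your backup retention policy before clearing snapshots. Check du -sh /var/lib/cassandra/hints/ for accumulated hints and clear them only if you understand the consistency impact: a replica whose hints are dropped must be repaired to receive those writes. Note that max_hint_window_in_ms limits how long hints are generated for a down node, not how long they remain valid; hints older than the table's gc_grace_seconds are no longer replayed and can be removed.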

Plan strategy migration

For chronic STCS space amplification on read-heavy workloads, plan a migration to LCS. Tradeoff: ALTER TABLE changes the strategy on every node as soon as the schema change propagates. Existing SSTables start in L0 and must be recompacted into levels before reads benefit, causing a heavy I/O spike. Schedule the change during a maintenance window and monitor PendingTasks closely.

Prevention

  • Monitor the derivative. Alert on the rate of change of PendingTasks, not a static threshold. A steady increase over 24 hours is actionable even if the absolute value is low.
  • Maintain disk headroom. Keep more than 50 percent free for STCS and more than 30 percent free for LCS or TWCS to accommodate temporary compaction files and unexpected ingest spikes.
  • Separate commitlog and data volumes. This prevents commitlog fsync from contending with compaction reads and writes.
  • Schedule maintenance outside peak hours. Run repairs, bootstraps, and snapshot operations when client traffic is low.
  • Size compaction for peaks. Set compaction_throughput_mb_per_sec high enough to cover peak write rates plus headroom, and ensure concurrent_compactors matches available CPU without starving read threads.
  • Baseline per-table SSTable counts. Track LiveSSTableCount per table so you catch divergence before it becomes a cluster-wide backlog.
  • Watch for tombstone growth. Monitor TombstoneScannedHistogram and tombstone log warnings. Tombstones slow compaction and accelerate the spiral.
  • Match strategy to workload. Review your compaction strategy during capacity planning. STCS is write-optimized but requires significant space and I/O headroom. LCS provides steadier latency but demands more compaction throughput. Choose the strategy that fits your access patterns.
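
For the "monitor the derivative" point above, the rate of change can be computed from timestamped samples. This sketch assumes a log with one "epoch_seconds pending_tasks" pair per line, oldest first. The helper name pending_rate_per_hour and the log format are hypothetical.

```shell
# pending_rate_per_hour: read "<epoch_seconds> <pending_tasks>" lines on stdin
# (oldest first) and print the average change in pending tasks per hour
# between the first and last sample, or "n/a" when fewer than two
# distinct timestamps are available.
pending_rate_per_hour() {
    awk '
        NF < 2 { next }                                   # ignore malformed lines
        !seen  { t0 = $1 + 0; p0 = $2 + 0; seen = 1 }     # remember the first sample
               { t1 = $1 + 0; p1 = $2 + 0 }               # track the latest sample
        END {
            if (!seen || t1 == t0) { print "n/a"; exit }
            printf "%.1f\n", (p1 - p0) * 3600 / (t1 - t0)
        }
    '
}

# Example (not run here), with a hypothetical sample log:
#   pending_rate_per_hour < /var/tmp/pending_samples.log
```

A small positive rate sustained over 24 hours is the signal to act on, even when the absolute PendingTasks value still looks harmless.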

How Netdata helps

  • Correlates PendingTasks with per-disk I/O utilization and await in the same time frame to pinpoint whether compaction is I/O-bound or CPU-bound.
  • Surfaces the rate of change of compaction pending tasks, making trends visible before absolute thresholds breach.
  • Displays JVM heap usage and GC pause duration alongside compaction metrics to reveal when GC stalls are creating artificial backlog.
  • Visualizes per-table LiveSSTableCount and read latency percentiles so you can confirm read amplification impact without manually sampling nodetool.
  • Tracks disk space usage with configurable headroom alerts.