Cassandra pending compactions growing: the compaction backlog runbook
Pending tasks climbing in nodetool compactionstats is normal after a bulk load under STCS, but when the number trends upward for hours it signals that your node is producing SSTables faster than compaction can merge them.
This is the leading indicator of the compaction death spiral. Left unchecked, the backlog drives read amplification up, saturates disk I/O, and eventually exhausts disk space as temporary compaction files accumulate. Writes often stay fast while reads degrade, masking the problem.
Fix the bottleneck first. It may be I/O capacity, throttling, tombstone-heavy tables, or competing operations like repair and streaming. This guide covers diagnosis and remediation.
What this means
Compaction merges immutable SSTables, discards tombstones, and reclaims space in the background. When the write path appends data faster than compaction threads can merge it, tasks queue up as PendingTasks in the CompactionManager. Each pending task represents uncompacted SSTables that reads may need to consult, so read amplification grows even while writes remain fast.
Compaction is background work, so it is often ignored until reads slow down. By the time read latency crosses your SLO, the backlog has usually been building for days. Catch the PendingTasks trend early to avoid emergency intervention.
Unlike transient post-restart spikes that resolve in minutes, a monotonic increase over four or more hours means compaction throughput has fallen below the flush rate. LCS pending tasks should stay low by design; sustained elevation is especially dangerous because L0 accumulation propagates latency quickly. STCS tolerates bursts, but a persistent climb forecasts disk space trouble because major compactions can transiently require up to 100 percent additional space. UCS in Cassandra 5.0 distributes tasks more evenly, but a rising trend still signals insufficient throughput.
flowchart TD
A[High write rate] --> B[SSTables accumulate]
B --> C[Pending compactions grow]
C --> D[Read amplification rises]
D --> E[Read latency spikes]
C --> F[Disk I/O saturates]
F --> G[Compaction slows further]
G --> C
E --> H[Client timeouts]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Write rate exceeds compaction throughput | PendingTasks rises steadily; write latency normal; read latency climbing | nodetool compactionstats and disk I/O |
| Compaction throttled too aggressively | PendingTasks grows despite low disk utilization; compaction_throughput_mb_per_sec set low | nodetool compactionstats and iostat -x 1 |
| Disk I/O saturation | %util >80% or await high; flush and read stages also backing up | iostat -x 1 on data and commitlog devices |
| Tombstone-heavy compactions slowing merge | Single table lagging; tombstone warnings in logs; repair overdue | nodetool tablehistograms and repair history |
| Repair or streaming competing for bandwidth | PendingTasks spikes during repair; nodetool netstats shows active streams | nodetool netstats |
| Sudden write traffic spike or bulk load | PendingTasks jumps after batch ingest; write throughput elevated above baseline | nodetool proxyhistograms and write request rate |
Quick checks
# Check compaction backlog and active tasks
nodetool compactionstats
# Check disk saturation on data and commitlog devices
iostat -x 1
# Count SSTables per table to gauge read amplification
nodetool tablestats | grep "SSTable count"
# Check internal thread pool pressure
nodetool tpstats
# View coordinator-level latency percentiles
nodetool proxyhistograms
# Verify filesystem headroom for compaction temp space
df -h /var/lib/cassandra/data
# Scan for tombstone warnings that indicate slow merges
grep -i "tombstone" /var/log/cassandra/system.log
# Check JVM heap usage, then GC pause statistics
nodetool info | grep -i "Heap Memory"
nodetool gcstats
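To track the backlog over time instead of reading it by eye, the total can be extracted from the compactionstats output. This is a minimal sketch, assuming the summary line has the form "pending tasks: N"; the function name pending_tasks is illustrative and not part of Cassandra.

```shell
#!/bin/sh
# Print the total pending compaction count from `nodetool compactionstats`
# output supplied on stdin.
# Assumption: the summary line looks like "pending tasks: <N>".
# Exits non-zero when no such line is found.
pending_tasks() {
    awk -F': *' '
        /^pending tasks:/ { print $2; found = 1; exit }
        END { if (!found) exit 1 }'
}

# Usage on a live node (not executed here):
#   nodetool compactionstats | pending_tasks
```

Reading from stdin keeps the parser separate from the nodetool call, so it can be checked against saved output before it is wired into a cron job or collector.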
How to diagnose it
1. Confirm the trend. Sample nodetool compactionstats at 15-minute intervals. A monotonic increase over 4 or more hours means the node is falling behind. Transient spikes after restarts or bulk loads usually resolve within an hour.
2. Identify the bottleneck. Run iostat -x 1 on both data and commitlog devices. If %util exceeds 80 percent or await exceeds 10 ms on SSDs, disk saturation is throttling compaction. Pay attention to r_await versus w_await; high write await on the data device means the disk is struggling with flush throughput as well as compaction. If disk metrics look idle, check CPU and GC.
3. Correlate with write pressure. Check nodetool proxyhistograms. Fast writes with stagnant compactions mean the merge path is the constraint, not the ingest path.
4. Quantify read amplification. Run nodetool tablestats per keyspace. If the SSTable count (the LiveSSTableCount metric) is growing, reads are touching more files and latency will follow.
5. Inspect internal queues. In nodetool tpstats, look for sustained pending tasks in CompactionExecutor or blocked tasks in Native-Transport-Requests. Blocked tasks in MutationStage or ReadStage mean client requests are already being rejected or timed out. This confirms resource contention inside the node and indicates the spiral is affecting the front door.
6. Review operational events. Check for recent repairs, bootstraps, or decommissions. Streaming competes for the same disk I/O and can push compaction into the red. Look at nodetool netstats for active streams. If a bootstrap or repair is in progress, expect elevated pending tasks, but they should stabilize once streaming completes. If pending tasks continue to climb after streaming finishes, the node cannot keep up with the combined load.
7. Check JVM health. Run nodetool info to verify heap usage. Parse GC logs for pauses over 2 seconds; long GC stalls freeze compaction threads and create artificial backlog.
8. Look for tombstone drag. High tombstones-per-slice values in nodetool tablestats, or tombstone warnings in the logs, mean compactions are doing extra work to merge delete markers, slowing progress.
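The first step, confirming the trend, can be sketched as a check over recorded samples. The sketch below assumes a file of "epoch_seconds pending_tasks" lines, oldest first, and treats "monotonic increase" as never decreasing with a net rise; pending_trend and the sample file layout are illustrative choices, not Cassandra tooling.

```shell
#!/bin/sh
# Read "<epoch_seconds> <pending_tasks>" samples (oldest first) from stdin.
# Print "rising" when the samples span at least MIN_HOURS (default 4),
# never decrease, and end higher than they started; otherwise print "ok".
pending_trend() {
    min_hours=${1:-4}
    awk -v min_hours="$min_hours" '
        NR == 1 { t0 = $1 + 0; p0 = $2 + 0; prev = p0 }
        NR > 1 && $2 + 0 < prev { dipped = 1 }
        { prev = $2 + 0; t1 = $1 + 0; p1 = prev }
        END {
            span = (t1 - t0) / 3600
            if (NR >= 2 && !dipped && p1 > p0 && span >= min_hours)
                print "rising"
            else
                print "ok"
        }'
}

# Collecting samples every 15 minutes on a live node (not executed here):
#   while true; do
#     p=$(nodetool compactionstats | awk -F': *' '/^pending tasks:/ {print $2; exit}')
#     printf '%s %s\n' "$(date +%s)" "$p" >> pending.samples
#     sleep 900
#   done
# Then: pending_trend 4 < pending.samples
```

A single dip resets the verdict to "ok", which is deliberately strict; a slope-based check would tolerate noise better if your workload is bursty.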
Metrics and signals to monitor
The following signals give you a complete picture of compaction health and its consequences. Monitor them together; no single metric tells the full story.
| Signal | Why it matters | Warning sign |
|---|---|---|
| CompactionManager:name=PendingTasks | Direct measure of compaction debt | Trending upward over 4+ hours; >500 sustained for 2+ hours in LCS |
| Table:name=LiveSSTableCount | Proxy for read amplification | Growing steadily regardless of strategy; >50 sustained in STCS or >100 in LCS |
| Disk %util and await | Compaction is I/O-intensive | %util >80% sustained; await >10ms on SSD |
| ClientRequest Read Latency p99 | Consequence of uncompacted SSTables | p99 >3x rolling baseline sustained |
| DroppedMessage (MUTATION/READ) | Node shedding load because it cannot keep up | Non-zero sustained rate |
| ThreadPools:CompactionExecutor pending | Internal compaction queue depth | Pending >0 and growing while active is at max |
| Table:name=TombstoneScannedHistogram | Tombstones force compactions to merge dead data | Sustained tombstone warnings or aborted reads |
| Disk space free | Compaction requires temp space to rewrite files | <50% free for STCS; <30% free for LCS/TWCS |
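The disk warning sign in the table can be checked mechanically. The sketch below assumes the sysstat layout of iostat -x, where a header row starting with "Device" names the columns; because column positions differ between sysstat versions, %util is located by name. saturated_devices is a hypothetical helper name.

```shell
#!/bin/sh
# Read `iostat -x` output from stdin and print "<device> <%util>" for every
# device whose %util exceeds the limit (default 80).
# Assumption: each device table starts with a header row beginning with
# "Device" and ends at a blank line.
saturated_devices() {
    limit=${1:-80}
    awk -v limit="$limit" '
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        NF == 0 { col = 0; next }
        col && $col + 0 > limit { print $1, $col }'
}

# Usage on a live node (not executed here). The first iostat report covers
# the period since boot, so take at least two and read the later one:
#   iostat -x 1 2 | saturated_devices 80
```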
Fixes
Increase compaction throughput
If CPU and disk headroom exist, raise compaction_throughput_mb_per_sec and concurrent_compactors. You can change the throughput cap at runtime with nodetool setcompactionthroughput, without a restart; persist the new value in cassandra.yaml so it survives the next one. Increase throughput in increments and watch iostat to ensure you are not simply moving the bottleneck from the queue to the disk. Verify the effect by watching the active byte progress in nodetool compactionstats. Tradeoff: compaction steals I/O bandwidth from reads, which can raise read latency in the short term.
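Raising the cap in increments can be sketched as a small helper that computes the next value and stops at a ceiling. The step size and ceiling are assumptions to tune for your hardware, and next_throughput is a hypothetical name; the only Cassandra command involved is nodetool setcompactionthroughput, which takes a value in MB/s.

```shell
#!/bin/sh
# Print the next compaction throughput value: current + step, capped at max.
# All three arguments are integers in MB/s.
next_throughput() {
    current=$1
    step=$2
    max=$3
    next=$((current + step))
    if [ "$next" -gt "$max" ]; then
        next=$max
    fi
    echo "$next"
}

# Usage on a live node (not executed here): step from 64 MB/s toward a
# 128 MB/s ceiling, then watch iostat before stepping again.
#   nodetool setcompactionthroughput "$(next_throughput 64 16 128)"
```

Keeping a hard ceiling in the helper guards against a loop or a repeated manual run pushing the cap past what the disk can sustain.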
Reduce write pressure
Throttle non-critical writes at the application layer. Temporarily stop or postpone repairs, bootstraps, and decommissions that generate additional SSTables or compete for I/O. If you are running a bulk load, pause it until compaction catches up. Tradeoff: slower ingest and delayed topology changes.
Address disk I/O saturation
If commitlog and data share a device, plan to move commitlog to a dedicated volume during the next rolling restart. In the immediate term, reduce other I/O consumers such as backups or analytics queries. On cloud block storage, upgrade IOPS or migrate to instance types with local SSDs. If you are using network-attached storage, check for noisy-neighbor effects or throughput caps imposed by the cloud provider. Tradeoff: infrastructure change requires a maintenance window.
Resolve tombstone-heavy tables
Verify that repair completes within gc_grace_seconds for affected tables. Tombstones become eligible for purging only after gc_grace_seconds has elapsed, and repair must reach every replica within that window or purged deletes can resurface as live data. If a specific table dominates the backlog, review its TTL and delete patterns. A long-term fix is switching time-series TTL tables to TimeWindowCompactionStrategy. Tradeoff: compaction spikes during strategy changes are CPU and I/O intensive across all nodes.
Recover disk space
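If disk usage is approaching limits, list snapshots with nodetool listsnapshots and remove forgotten ones with nodetool clearsnapshot --all. Warning: confirm your backup retention policy before clearing snapshots. Check du -sh /var/lib/cassandra/hints/ for accumulated hints and clear them only if you understand the consistency impact: a replica that never receives a hint stays stale until the next repair. Note that max_hint_window_in_ms only limits how long hints are generated for a down node; a stored hint stops being replayable once it is older than the gc_grace_seconds of the table it targets, and such hints can be removed safely.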
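Before and after reclaiming space, the headroom can be checked against the thresholds used in this guide (50 percent free for STCS, 30 percent for LCS or TWCS). This is a minimal sketch that parses the POSIX df -P layout; free_percent and has_headroom are illustrative names, and the result is approximate because df rounds the Capacity column.

```shell
#!/bin/sh
# Read `df -P <path>` output from stdin and print the free percentage
# (100 minus the Capacity column) of the first filesystem listed.
free_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Succeed (exit 0) when the free percentage read from stdin is at least
# the required headroom given as the first argument.
has_headroom() {
    required=$1
    free=$(free_percent)
    [ "$free" -ge "$required" ]
}

# Usage on a live node (not executed here):
#   df -P /var/lib/cassandra/data | has_headroom 50 || echo "below STCS headroom"
```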
Plan strategy migration
For chronic STCS space amplification on read-heavy workloads, plan a migration to LCS. Tradeoff: ALTER TABLE changes the strategy for future compactions only. Existing SSTables must be rewritten by a major compaction to benefit from the new strategy, causing a heavy I/O spike. Schedule that during a maintenance window and monitor PendingTasks closely.
Prevention
- Monitor the derivative. Alert on the rate of change of PendingTasks, not a static threshold. A steady increase over 24 hours is actionable even if the absolute value is low.
- Maintain disk headroom. Keep more than 50 percent free for STCS and more than 30 percent free for LCS or TWCS to accommodate temporary compaction files and unexpected ingest spikes.
- Separate commitlog and data volumes. This prevents commitlog fsync from contending with compaction reads and writes.
- Schedule maintenance outside peak hours. Run repairs, bootstraps, and snapshot operations when client traffic is low.
- Size compaction for peaks. Set compaction_throughput_mb_per_sec high enough to cover peak write rates plus headroom, and ensure concurrent_compactors matches available CPU without starving read threads.
- Baseline per-table SSTable counts. Track LiveSSTableCount per table so you catch divergence before it becomes a cluster-wide backlog.
- Watch for tombstone growth. Monitor TombstoneScannedHistogram and tombstone log warnings. Tombstones slow compaction and accelerate the spiral.
- Match strategy to workload. Review your compaction strategy during capacity planning. STCS is write-optimized but requires significant space and I/O headroom. LCS provides steadier latency but demands more compaction throughput. Choose the strategy that fits your access patterns.
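Where no metrics pipeline is in place yet, a per-table baseline can be approximated from nodetool tablestats. The sketch below assumes the usual layout of "Keyspace : name", "Table: name", and "SSTable count: N" lines, and skips the separate "Old SSTable count" line; sstable_outliers is a hypothetical helper, and the default limit of 50 mirrors the STCS warning sign used earlier in this guide.

```shell
#!/bin/sh
# Read `nodetool tablestats` output from stdin and print
# "<keyspace>.<table> <count>" for tables whose SSTable count exceeds
# the limit (default 50).
# Assumption: keyspace, table, and count appear on lines beginning with
# "Keyspace", "Table:", and "SSTable count:" respectively.
sstable_outliers() {
    limit=${1:-50}
    awk -v limit="$limit" '
        /^Keyspace/ { ks = $NF }
        /^[ \t]*Table:/ { tbl = $NF }
        /^[ \t]*SSTable count:/ && $NF + 0 > limit { print ks "." tbl, $NF }'
}

# Usage on a live node (not executed here):
#   nodetool tablestats | sstable_outliers 50
```

Saving the output on a schedule gives a crude history to compare against, which is enough to spot one table diverging from the rest.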
How Netdata helps
- Correlates PendingTasks with per-disk I/O utilization and await in the same time frame to pinpoint whether compaction is I/O-bound or CPU-bound.
- Surfaces the rate of change of compaction pending tasks, making trends visible before absolute thresholds breach.
- Displays JVM heap usage and GC pause duration alongside compaction metrics to reveal when GC stalls are creating artificial backlog.
- Visualizes per-table LiveSSTableCount and read latency percentiles so you can confirm read amplification impact without manually sampling nodetool.
- Tracks disk space usage with configurable headroom alerts.
Related guides
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra GC death spiral: long pauses, gossip flapping, and recovery
- Cassandra monitoring checklist: the signals every production cluster needs
- Cassandra monitoring maturity model: from survival to expert
- How Cassandra actually works in production: a mental model for operators







