Cassandra Not enough space for compaction: STCS space amplification and recovery
`Not enough space for compaction` in `system.log` means SizeTieredCompactionStrategy (STCS) has hit a structural space-amplification limit: the node does not have enough free space to write the merged output of the tier it selected. Disk usage is often already above 50% when this appears. Cassandra aborts the compaction, skips the tier, and leaves tombstones and old versions unmerged. SSTable count rises, read amplification increases, and free space stops being reclaimed. Left unchecked, this becomes a compaction death spiral that ends in write rejection or disk exhaustion. Recovery requires immediate free space, targeted cleanup, and a headroom plan that accounts for transient STCS amplification.
What this means
STCS compacts SSTables by reading all input files in a tier and writing a new merged file. The input and output sets coexist on disk until the new SSTable is complete and the old ones are removed. If a tier contains four 50 GB SSTables, compaction can write up to ~200 GB of merged output before deleting the sources. For large tiers, the extra transient disk usage can therefore equal the full size of the data being compacted. Treat running STCS above 50% disk utilization as dangerous: any large compaction can exhaust the remaining space.
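The arithmetic above can be sketched as a small shell helper. This is a minimal illustration, not part of Cassandra: `tier_fits` is a hypothetical function name, sizes are whole gigabytes, and it assumes the worst case in which the merged SSTable is as large as the sum of its inputs.

```shell
# tier_fits FREE_GB SIZE_GB [SIZE_GB ...]
# Hypothetical helper: worst case assumes the merged output is as large
# as the sum of the input SSTables (nothing is purged or overwritten).
tier_fits() {
  free_gb=$1
  shift
  needed_gb=0
  for size_gb in "$@"; do
    needed_gb=$((needed_gb + size_gb))
  done
  if [ "$needed_gb" -le "$free_gb" ]; then
    echo "fits: needs ${needed_gb} GB, ${free_gb} GB free"
  else
    echo "does not fit: needs ${needed_gb} GB, ${free_gb} GB free"
  fi
}

# Four 50 GB SSTables on a node with 150 GB free
tier_fits 150 50 50 50 50
```

With 150 GB free the four-SSTable tier from the example does not fit, even though each input file is only a third of the free space.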
When Cassandra skips a compaction due to insufficient space, SSTables accumulate. Higher SSTable counts increase read amplification and leave tombstones in place. Because compaction is the only way to reclaim space from deletes and overwrites, a stall also halts reclamation. The result is a feedback loop: less free space means fewer compactions, which means more SSTables and worse performance.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| STCS headroom exceeded | `Not enough space for compaction` in logs; pending compactions rising while disk is above 50% full | `df -h /var/lib/cassandra/data` and `nodetool info \| grep "Load"` |
| Stale snapshots | Filesystem usage grows faster than Load; old backups never cleaned up | `nodetool listsnapshots` |
| Write rate exceeding compaction throughput | Pending compactions trending up; SSTable count growing; disk I/O saturated | `nodetool compactionstats` and `iostat -x 1` |
| Overwrite workload amplifying space | Rapid disk growth on update-heavy tables; same partitions rewritten repeatedly | `nodetool tablestats <keyspace> \| grep "Space used"` |
| Hint accumulation after node outage | Hints directory large; a node was recently marked DOWN | `du -sh /var/lib/cassandra/hints/` |
Quick checks
# Check filesystem free space on the data volume
df -h /var/lib/cassandra/data
# Check Cassandra's live data size estimate (excludes snapshots and hints)
nodetool info | grep "Load"
# Check pending and active compactions
nodetool compactionstats
# Check SSTable accumulation per table
nodetool tablestats | grep "SSTable count"
# Check for snapshots
nodetool listsnapshots
# Check hint file accumulation
du -sh /var/lib/cassandra/hints/
# Check disk I/O saturation on data and commitlog devices
iostat -x 1
# Check CompactionExecutor thread pool status
nodetool tpstats | grep -A1 "CompactionExecutor"
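The first check can be turned into a pass/fail signal. The sketch below is an assumption-laden example: `stcs_disk_check` is a hypothetical helper that reads `df -Pk` output (POSIX format, 1024-byte blocks) on stdin and flags usage above the 50% STCS threshold used in this guide.

```shell
# stcs_disk_check: read `df -Pk <path>` output on stdin and report whether
# the volume is above the 50% usage threshold this guide uses for STCS.
# Hypothetical helper; columns follow the POSIX df -P layout:
# Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
stcs_disk_check() {
  awk 'NR == 2 {
    used = $5
    sub(/%/, "", used)
    status = (used + 0 > 50) ? "WARN" : "OK"
    printf "%s used=%s%% free_kib=%s\n", status, used, $4
  }'
}

# Example against the data directory used throughout this guide:
# df -Pk /var/lib/cassandra/data | stcs_disk_check
```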
How to diagnose it
- Confirm the exact log error. Search `system.log` for the verbatim message: `grep "Not enough space for compaction" /var/log/cassandra/system.log`. Note the timestamp and table name if included.
- Compare data size to free space. Run `nodetool info | grep "Load"` to see Cassandra’s live data size, then run `df -h` on the data mount. If free space is less than the `Load` value, a major compaction of the full dataset cannot complete. Regular tier compactions can also fail if the target tier is larger than available free space.
- Verify compaction is stalled. Run `nodetool compactionstats`. If pending tasks are high but no compactions are active, and logs show space errors, disk pressure is the direct blocker.
- Find snapshot bloat. Snapshots hold hard links to SSTables. Compaction cannot delete original files that are still linked by snapshots, so snapshot growth directly amplifies space consumption. Check snapshot directories under the data path.
- Quantify SSTable growth. Run `nodetool tablestats <keyspace> <table>` and check `SSTable count`. In STCS, sustained counts above 50 indicate compaction is falling behind.
- Check for storage exceptions. Run `grep -i "FSError\|CorruptSSTable\|IOError" /var/log/cassandra/system.log`. A failing disk can trigger write errors that resemble space exhaustion.
- Correlate write and flush pressure. In `nodetool tpstats`, rising completed tasks on `MemtableFlushWriter` with flat or slow `CompactionExecutor` activity means flush debt is outpacing compaction.
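The Load-versus-free-space comparison in the second step can be scripted. The sketch below makes assumptions: `load_bytes` and `major_compaction_fits` are hypothetical helpers, and the `Load` line is expected as a number followed by a unit (`KiB`/`MiB`/`GiB`/`TiB`, or the older `KB`/`MB`/`GB`/`TB` spelling), which may differ between Cassandra versions.

```shell
# load_bytes: read `nodetool info` output on stdin and print Load in bytes.
# Assumes a line such as "Load                   : 1.5 GiB".
load_bytes() {
  awk -F': *' '
    /^Load/ {
      split($2, part, " ")
      unit["bytes"] = 1
      unit["KiB"] = 1024;      unit["KB"] = 1024
      unit["MiB"] = 1024 ^ 2;  unit["MB"] = 1024 ^ 2
      unit["GiB"] = 1024 ^ 3;  unit["GB"] = 1024 ^ 3
      unit["TiB"] = 1024 ^ 4;  unit["TB"] = 1024 ^ 4
      if (!(part[2] in unit)) exit 1   # unknown unit: print nothing
      printf "%.0f\n", part[1] * unit[part[2]]
      exit
    }'
}

# major_compaction_fits LOAD_BYTES FREE_BYTES
major_compaction_fits() {
  if [ "$2" -ge "$1" ]; then
    echo "free space covers Load"
  else
    echo "free space is below Load"
  fi
}

# Example wiring (requires a running node):
# load=$(nodetool info | load_bytes)
# free=$(( $(df -Pk /var/lib/cassandra/data | awk 'NR == 2 { print $4 }') * 1024 ))
# major_compaction_fits "$load" "$free"
```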
flowchart TD
A[Disk usage exceeds 50% with STCS] --> B{Cassandra selects large tier for compaction}
B -->|Not enough free space| C[Compaction skipped]
C --> D[SSTables accumulate]
D --> E[Read amplification rises]
E --> F[More disk consumed by old SSTables]
F --> B
D --> G[Pending compactions grow unchecked]
G --> H[Compaction death spiral]
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Disk space available (data dir) | STCS needs temporary space equal to the data size being compacted | < 50% free for STCS; < 30% for LCS; < 20% for TWCS |
| Pending compactions | Shows compaction backlog directly | Trending upward for more than 4 hours |
| SSTable count per table | Accumulates when compaction is skipped | STCS sustained count > 50 |
| Storage exceptions | Hardware errors can mimic space issues | Any non-zero rate |
| Disk I/O utilization | Saturated I/O slows compaction, preventing space reclamation | %util > 80% sustained |
| Commitlog pending tasks | Write-path pressure from disk exhaustion | Pending > 0 sustained |
| Client request timeouts | Reads suffer from accumulated SSTables | Rate > 0 sustained |
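The disk-space row of the table can be expressed as a per-strategy check. This is a minimal sketch using the thresholds from the table; `free_space_floor` and `headroom_status` are hypothetical names, and the free-space percentage is passed in as a whole number.

```shell
# free_space_floor STRATEGY: minimum free-space percentage from the table above
free_space_floor() {
  case "$1" in
    STCS) echo 50 ;;
    LCS)  echo 30 ;;
    TWCS) echo 20 ;;
    *)    echo "unknown strategy: $1" >&2; return 1 ;;
  esac
}

# headroom_status STRATEGY FREE_PCT
headroom_status() {
  floor=$(free_space_floor "$1") || return 1
  if [ "$2" -lt "$floor" ]; then
    echo "WARN: $1 has $2% free, below the ${floor}% floor"
  else
    echo "OK: $1 has $2% free, floor is ${floor}%"
  fi
}

headroom_status STCS 42
headroom_status TWCS 25
```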
Fixes
Free disk space immediately
The fastest way to give compaction room is to remove snapshots. Snapshots are hard links to SSTables. They prevent the space from being reclaimed when compaction replaces the original files.
# List snapshots before deleting to confirm they are stale
nodetool listsnapshots
# Remove all snapshots. WARNING: only run if backups are verified or stale.
nodetool clearsnapshot --all
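When only some snapshots are stale, a narrower alternative is to clear a single snapshot by its tag with `nodetool clearsnapshot -t`. The sketch below only prints the command so it can be reviewed first; `clear_snapshot_cmd` is a hypothetical helper, and the tag and keyspace names are placeholders to be replaced with values from `nodetool listsnapshots`.

```shell
# clear_snapshot_cmd TAG [KEYSPACE]
# Dry run: print (do not execute) the command that would remove one snapshot.
clear_snapshot_cmd() {
  tag=${1:-}
  keyspace=${2:-}
  if [ -z "$tag" ]; then
    echo "usage: clear_snapshot_cmd TAG [KEYSPACE]" >&2
    return 1
  fi
  if [ -n "$keyspace" ]; then
    echo "nodetool clearsnapshot -t $tag -- $keyspace"
  else
    echo "nodetool clearsnapshot -t $tag"
  fi
}

# Placeholder tag and keyspace, for illustration only
clear_snapshot_cmd pre-upgrade-2023 app_ks
```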
If the node is on a cloud volume, expand the filesystem after resizing the block device. Adding space is safer than running a major compaction on a nearly full disk, because the compaction itself can transiently double disk usage and push the node into disk-full failure.
Trigger targeted compaction, not a node-wide major compaction
Avoid a full major compaction when disk is low. Instead, target the largest or most bloated tables. Before running, check the table size to ensure the compaction will fit:
# Check target table size
nodetool tablestats <keyspace> <table> | grep "Space used"
# Target a specific high-bloat table
nodetool compact <keyspace> <table>
Tradeoff: Even targeted compaction consumes temporary space, up to the full size of the table in the worst case. Only run this after clearing snapshots or expanding storage. The operation is I/O intensive and will raise read latency while it runs. After starting, run `nodetool compactionstats` to confirm the job is active and progressing.
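The size check before a targeted compaction can be automated. The sketch below assumes `nodetool tablestats` is run without `-H`, so that `Space used (total)` is reported in bytes; `table_total_bytes` and `compaction_precheck` are hypothetical helper names.

```shell
# table_total_bytes: read `nodetool tablestats <keyspace> <table>` output on
# stdin and print the "Space used (total)" value (bytes when -H is not used).
table_total_bytes() {
  awk -F': ' '/Space used \(total\)/ { print $2; exit }'
}

# compaction_precheck TABLE_BYTES FREE_BYTES
# Worst case assumed: the rewritten table is as large as the current one.
compaction_precheck() {
  if [ "$1" -le "$2" ]; then
    echo "safe: table needs $1 bytes, $2 bytes free"
  else
    echo "unsafe: table needs $1 bytes, only $2 bytes free"
  fi
}

# Example wiring (requires a running node):
# size=$(nodetool tablestats <keyspace> <table> | table_total_bytes)
# free=$(( $(df -Pk /var/lib/cassandra/data | awk 'NR == 2 { print $4 }') * 1024 ))
# compaction_precheck "$size" "$free"
```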
Increase compaction throughput temporarily
If disk I/O is not saturated and the bottleneck is compaction speed, raise the throttle:
# Increase compaction throughput (example: 128 MB/s)
nodetool setcompactionthroughput 128
Tradeoff: Higher throughput steals I/O bandwidth from reads and flushes. Monitor iostat -x 1 and client latency while the value is elevated. Return it to baseline once the backlog clears.
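Returning to baseline is easier if the original value is recorded first. The sketch below assumes `nodetool getcompactionthroughput` prints a line of the form `Current compaction throughput: 16 MB/s` (the unit label may differ by version); `current_throughput` is a hypothetical helper and 128 is the same example value used above.

```shell
# current_throughput: read `nodetool getcompactionthroughput` output on stdin
# and print the numeric value of the configured throttle.
current_throughput() {
  awk -F': ' '/^Current compaction throughput:/ {
    split($2, part, " ")
    print part[1]
    exit
  }'
}

# Example wiring (requires a running node):
# baseline=$(nodetool getcompactionthroughput | current_throughput)
# nodetool setcompactionthroughput 128
# ... wait for the backlog to clear, watching iostat and client latency ...
# nodetool setcompactionthroughput "$baseline"
```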
Stop background I/O consumers
Pause repair and streaming operations during recovery. Repair generates anti-compaction SSTables and consumes network and disk I/O. Check for active streams:
nodetool netstats
If a repair is already running, either let it finish or stop it and reschedule it for an off-peak window. Do not start new full repairs while fighting disk pressure.
Change compaction strategy (long-term)
If the workload is read-heavy or time-series, migrate away from STCS. LCS provides steadier space usage and needs roughly 30% headroom. TWCS needs roughly 20% headroom.
Tradeoff: Altering the compaction strategy on an existing table triggers a full recompaction. This requires significant temporary space and I/O. Plan the migration for a maintenance window when the node has adequate free space.
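A strategy change is a single `ALTER TABLE` statement. The sketch below builds that statement so it can be reviewed before it is run; `alter_compaction_cql` is a hypothetical helper, the keyspace and table names are placeholders, and the one-day TWCS window is an illustrative assumption that should be sized to the table's TTL and query pattern.

```shell
# alter_compaction_cql KEYSPACE TABLE STRATEGY
# Print the CQL that switches a table to LCS or TWCS.
alter_compaction_cql() {
  case "$3" in
    LCS)
      opts="'class': 'LeveledCompactionStrategy'"
      ;;
    TWCS)
      # Window settings are illustrative, not a recommendation
      opts="'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': 1"
      ;;
    *)
      echo "unsupported strategy: $3" >&2
      return 1
      ;;
  esac
  echo "ALTER TABLE $1.$2 WITH compaction = {$opts};"
}

# Placeholder keyspace and table, for illustration only
alter_compaction_cql metrics samples LCS

# To apply after review:
# cqlsh -e "$(alter_compaction_cql metrics samples LCS)"
```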
Prevention
- Maintain per-strategy headroom. STCS requires greater than 50% free disk space. LCS requires greater than 30%. TWCS requires greater than 20%.
- Automate snapshot cleanup. Retention scripts should run `nodetool clearsnapshot` after backup verification. Hard-linked snapshot data silently accumulates as compaction progresses.
- Monitor compaction trends, not just absolute pending counts. Alert when pending compactions increase continuously over a 24-hour period.
- Separate commitlog and data directories. Place them on independent volumes so commitlog growth does not steal headroom from compaction.
- Provision for transient amplification, not just live data size. Size disks assuming STCS will need up to 100% additional space during major compaction.
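The headroom rules above translate into a minimum disk size for a given amount of live data. The sketch below is a rough sizing aid under stated assumptions: `min_disk_gb` is a hypothetical helper, it treats the per-strategy free-space figure as an inclusive floor, works in whole gigabytes, and ignores snapshots, hints, and commitlog.

```shell
# min_disk_gb LIVE_GB STRATEGY
# Smallest disk (GB, rounded up) that keeps free space at or above the
# per-strategy floor used in this guide: STCS 50%, LCS 30%, TWCS 20%.
min_disk_gb() {
  case "$2" in
    STCS) floor=50 ;;
    LCS)  floor=30 ;;
    TWCS) floor=20 ;;
    *)    echo "unknown strategy: $2" >&2; return 1 ;;
  esac
  usable=$((100 - floor))
  # Ceiling division: disk = ceil(live * 100 / usable)
  echo $(( ($1 * 100 + usable - 1) / usable ))
}

min_disk_gb 600 STCS
min_disk_gb 600 LCS
min_disk_gb 600 TWCS
```

For 600 GB of live data this gives 1200 GB under STCS, 858 GB under LCS, and 750 GB under TWCS.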
How Netdata helps
- Correlate data volume disk usage with JMX compaction pending tasks and per-table SSTable counts.
- Alert on disk space crossing strategy-specific thresholds before compaction stalls.
- Track read latency spikes alongside SSTable growth to surface the compaction death spiral before disk exhaustion.
- Monitor JMX metrics like `org.apache.cassandra.metrics:type=Compaction,name=PendingTasks` and per-table `LiveSSTableCount` without manual `nodetool` polling.
Related guides
- Cassandra compaction strategies: STCS vs LCS vs TWCS vs UCS
- Cassandra compaction death spiral: when writes outrun compaction throughput
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra zombie data resurrection: gc_grace_seconds and unrepaired tombstones
- Cassandra GC death spiral: long pauses, gossip flapping, and recovery
- Cassandra GC pauses too long: diagnosing G1 stop-the-world pauses
- Cassandra heap pressure: sizing the JVM heap and tuning G1GC
- Cassandra monitoring checklist: the signals every production cluster needs
- Cassandra monitoring maturity model: from survival to expert
- Cassandra java.lang.OutOfMemoryError: Java heap space - causes and recovery
- Cassandra pending compactions growing: the compaction backlog runbook
- Cassandra Scanned over N tombstones warning: finding the offending query







