Cassandra snapshots silently consuming disk: hard links and clearsnapshot
df shows usage climbing toward 90%, but nodetool info Load does not explain the gap. If commitlog and hints are normal and compaction backlog is small, the missing space is likely a snapshot taken days or weeks ago. Cassandra snapshots use hard links, so they allocate no additional blocks at creation. After compaction deletes live SSTables, the snapshot links become the sole owners of old blocks and silently consume gigabytes.
This occurs after backup automation, schema changes, or emergency nodetool snapshot runs that are never cleaned up. With Size-Tiered Compaction Strategy (STCS), major compaction can already require up to 100% additional temporary space. A forgotten snapshot can push a node from comfortable to disk full without any warning in live data metrics.
flowchart TD
A[Snapshot creates hard links to live SSTables] --> B[True size reported as zero]
B --> C{Compaction replaces live SSTable?}
C -->|No| D[Blocks shared with live data]
C -->|Yes| E[Live directory entry removed]
E --> F[Snapshot link now sole owner]
F --> G[True size grows silently]
G --> H[Disk usage increases without Load changing]
What this means
When you run nodetool snapshot, Cassandra creates hard links to current SSTable files inside <data_dir>/<keyspace>/<table>/snapshots/<tag>/. The snapshot and live SSTable share the same inode, so no data is copied and no additional disk blocks are allocated. nodetool listsnapshots reflects this by showing a True size near zero while the original SSTable still exists.
As compaction runs and merges SSTables, the live versions are deleted. Because the snapshot directory still holds a hard link to the original inode, the blocks are not freed. The snapshot True size grows to match the space it exclusively owns. This consumption is invisible to nodetool info Load, which counts only live data files and excludes snapshots, commitlog, hints, and compaction temporary files. A single forgotten snapshot on a 50 GB table can eventually hold 50 GB of dead data, and multiple snapshots compound the problem.
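The hard-link behaviour described above can be reproduced without Cassandra. The following is a minimal sketch that uses a scratch directory and a dummy file; the directory layout and file names are hypothetical stand-ins, not real SSTables. It shows the link count dropping from 2 to 1 when the "live" entry is removed, while the bytes stay on disk.

```shell
#!/usr/bin/env bash
set -eu

# Portable hard-link count: second field of `ls -ld` output.
link_count() { ls -ld -- "$1" | awk '{print $2}'; }

# Hypothetical scratch layout that mimics a table directory.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/snapshots/demo_tag"

# Stand-in for a live SSTable data file (1 MiB of zeros).
dd if=/dev/zero of="$demo_dir/demo-Data.db" bs=1024 count=1024 2>/dev/null

# "Snapshot": hard-link the file instead of copying it.
ln "$demo_dir/demo-Data.db" "$demo_dir/snapshots/demo_tag/demo-Data.db"
links_while_live="$(link_count "$demo_dir/snapshots/demo_tag/demo-Data.db")"

# "Compaction": the live directory entry is removed, the inode survives.
rm "$demo_dir/demo-Data.db"
links_after_compaction="$(link_count "$demo_dir/snapshots/demo_tag/demo-Data.db")"
bytes_still_held="$(wc -c < "$demo_dir/snapshots/demo_tag/demo-Data.db" | tr -d ' ')"

echo "links while live:       $links_while_live"
echo "links after compaction: $links_after_compaction"
echo "bytes still held:       $bytes_still_held"

rm -rf "$demo_dir"
```

A link count of 1 on a file under snapshots/ is the filesystem-level sign that the snapshot is now the only owner of those blocks.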
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Forgotten manual or backup snapshots | Gradual disk growth after a backup window; old tags in listsnapshots | nodetool listsnapshots |
| Auto-snapshot on DROP or TRUNCATE | Sudden disk jumps after schema changes; tags prefixed with dropped- or timestamps | nodetool listsnapshots |
| Incremental backups without cleanup | backups/ directories grow indefinitely; space not reclaimed by clearsnapshot | find /var/lib/cassandra/data -name backups -type d |
| Orphaned snapshots from dropped tables | listsnapshots underreports usage; snapshot directories exist for tables no longer in the schema | find /var/lib/cassandra/data -name snapshots -type d |
Quick checks
# List all snapshots and their true sizes
nodetool listsnapshots
# Check whether auto-snapshot is enabled (default true)
grep "auto_snapshot" /etc/cassandra/cassandra.yaml
# Check whether incremental backups are enabled (default false)
grep "incremental_backups" /etc/cassandra/cassandra.yaml
# Find snapshot directories, including orphans from dropped tables
find /var/lib/cassandra/data -name snapshots -type d
# Check keyspace-level disk usage
du -sh /var/lib/cassandra/data/*/
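A plain grep for these keys also matches commented-out lines and similarly named keys. As a rough sketch, the helper below (yaml_value is a hypothetical name, not part of any Cassandra tooling) prints only the active top-level value and falls back to the defaults stated above when the key is absent. It assumes a flat key: value line and is not a YAML parser.

```shell
#!/usr/bin/env bash
set -eu

# Print the active (uncommented) value of a top-level key in a YAML file,
# or the supplied default when the key is not set.
# $1: file, $2: key, $3: default
yaml_value() {
  local file="$1" key="$2" default="$3" value
  value="$(sed -n "s/^${key}:[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$file" | head -n 1)"
  printf '%s\n' "${value:-$default}"
}

conf="/etc/cassandra/cassandra.yaml"   # path used in the checks above
if [ -r "$conf" ]; then
  echo "auto_snapshot:       $(yaml_value "$conf" auto_snapshot true)"
  echo "incremental_backups: $(yaml_value "$conf" incremental_backups false)"
fi
```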
How to diagnose it
- Compare nodetool info Load to df output. If the filesystem shows significantly more usage, check snapshots, incremental backups, and compaction temp files.
- Run nodetool listsnapshots. A high True size means the snapshot exclusively owns blocks that would otherwise be free.
- Identify old tags. Snapshots older than your retention window are deletion candidates.
- Check for orphaned snapshots. Run find /var/lib/cassandra/data -name snapshots -type d and inspect subdirectories. Snapshot directories for table UUIDs that no longer exist in the schema are leftovers from dropped tables. nodetool listsnapshots does not show snapshots for dropped tables, so they are invisible to standard tooling.
- Check cassandra.yaml for auto_snapshot and incremental_backups. If incremental_backups is true, the backups/ subdirectories grow indefinitely unless you manage them externally.
- Correlate the start of disk growth with snapshot creation or compaction. Snapshots created before major compaction grow in True size once it finishes.
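To cross-check True size from the filesystem side, including directories that listsnapshots may not report, one option is to sum the snapshot files whose hard-link count is 1. This is a sketch under assumptions: exclusive_snapshot_bytes is a hypothetical helper, the data path is the one used in this guide, and a file that is still linked from a backups/ directory will have a link count above 1 and is therefore not counted.

```shell
#!/usr/bin/env bash
set -eu

# Sum the sizes of files under any snapshots/ directory that have exactly one
# hard link, i.e. files no live SSTable still references.
# $1: data directory root
exclusive_snapshot_bytes() {
  find "$1" -path '*/snapshots/*' -type f -links 1 -exec ls -ln {} + 2>/dev/null \
    | awk '{ total += $5 } END { print total + 0 }'
}

data_dir="/var/lib/cassandra/data"   # assumed location; adjust to your data_file_directories
if [ -d "$data_dir" ]; then
  echo "bytes held only by snapshots: $(exclusive_snapshot_bytes "$data_dir")"
fi
```

The result is apparent file size in bytes, so it can differ slightly from block-based tools such as du.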
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Disk space available (df) | Snapshots consume physical blocks not reflected in Cassandra’s Load metric | Free space declining while live data size is stable |
| nodetool listsnapshots True size | Measures disk blocks held exclusively by the snapshot | True size large or growing week over week |
| nodetool info Load vs. filesystem used | Load excludes snapshots; a large gap indicates hidden consumption | df used significantly larger than Load |
| Pending compactions | Compaction deletes live SSTables and converts shared hard links into snapshot-owned blocks | Pending compactions high with simultaneous unexplained disk growth |
| auto_snapshot enabled | Creates automatic snapshots on every DROP and TRUNCATE | Unexpected snapshot tags appearing after schema changes |
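The first signal can be checked from cron with a few lines of shell. This is only a sketch: the CASSANDRA_DATA_DIR variable and the 80% threshold are placeholders chosen for illustration, and the threshold should be tuned to the compaction headroom your tables need.

```shell
#!/usr/bin/env bash
set -eu

# Percentage of the filesystem holding $1 that is in use, as a bare integer.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds (exit 0) while usage $1 is below threshold $2.
below_threshold() {
  [ "$1" -lt "$2" ]
}

data_dir="${CASSANDRA_DATA_DIR:-/var/lib/cassandra/data}"   # hypothetical override variable
max_pct=80                                                  # hypothetical threshold

if [ -d "$data_dir" ]; then
  used="$(disk_used_pct "$data_dir")"
  if below_threshold "$used" "$max_pct"; then
    echo "OK: ${used}% used on $data_dir"
  else
    echo "WARN: ${used}% used on $data_dir; review nodetool listsnapshots"
  fi
fi
```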
Fixes
Remove forgotten snapshots with clearsnapshot
Warning: This permanently deletes snapshot data. Do not remove snapshots that have not been backed up or verified.
Remove all snapshots on the node:
# Remove all snapshots (destructive)
nodetool clearsnapshot --all
Remove a specific snapshot by tag:
# Remove one snapshot tag from a specific keyspace (destructive)
nodetool clearsnapshot -t <tag> <keyspace>
If your version supports it, remove snapshots older than a retention window:
# Remove snapshots older than 7 days (destructive)
nodetool clearsnapshot --older-than 7d
Clean up orphaned snapshots from dropped tables
nodetool listsnapshots does not show snapshots for dropped tables. Identify orphaned directories with find /var/lib/cassandra/data -name snapshots -type d, inspect the contents, and remove snapshot directories for tables that no longer exist. This is destructive and irreversible.
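One way to narrow the search is to list table directories that contain a snapshots/ subdirectory but no files of their own. This is a heuristic sketch, not a definitive orphan detector: list_snapshot_only_table_dirs is a hypothetical helper, it assumes the <data_dir>/<keyspace>/<table>/snapshots layout, and an empty or freshly truncated table that still exists in the schema will also be listed. Treat the output as candidates to verify against the schema, and note that the script deletes nothing.

```shell
#!/usr/bin/env bash
set -eu

# Print table directories that hold a snapshots/ subdirectory but no regular
# files directly inside them (no live SSTables).
# $1: data directory root
list_snapshot_only_table_dirs() {
  local data_dir="$1" snap_dir table_dir
  find "$data_dir" -mindepth 3 -maxdepth 3 -type d -name snapshots | sort |
    while IFS= read -r snap_dir; do
      table_dir="$(dirname "$snap_dir")"
      if [ -z "$(find "$table_dir" -maxdepth 1 -type f | head -n 1)" ]; then
        printf '%s\n' "$table_dir"
      fi
    done
}

if [ -d /var/lib/cassandra/data ]; then
  list_snapshot_only_table_dirs /var/lib/cassandra/data
fi
```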
Purge incremental backup links
nodetool clearsnapshot does not remove hard links created by incremental_backups: true. These accumulate in backups/ subdirectories under each table directory. You must script rotation or removal manually. Leaving incremental_backups enabled without cleanup will fill disks predictably.
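A rotation job could start from a listing like the one below. It is a sketch under assumptions: stale_backup_files is a hypothetical helper, the 7-day window is a placeholder, and age is judged by file modification time, which for a hard link is the modification time of the underlying SSTable. The function only prints paths.

```shell
#!/usr/bin/env bash
set -eu

# Print files under any backups/ directory whose modification time is more
# than $2 full days in the past. Nothing is deleted.
# $1: data directory root, $2: retention in days
stale_backup_files() {
  find "$1" -path '*/backups/*' -type f -mtime +"$2" | sort
}

data_dir="/var/lib/cassandra/data"   # path used elsewhere in this guide
retention_days=7                     # hypothetical retention window

if [ -d "$data_dir" ]; then
  stale_backup_files "$data_dir" "$retention_days"
fi
```

Once the listed files are confirmed to be copied off the node, the same find expression can feed a removal step. Removal is destructive and irreversible.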
Disable automatic snapshot creation
If you frequently DROP or TRUNCATE and do not need the safety net, set auto_snapshot: false in cassandra.yaml. This requires a rolling restart. The tradeoff is that dropped or truncated tables will not be recoverable from local snapshots. With auto_snapshot: true, TRUNCATE operations wait for the snapshot to complete before returning.
Use snapshot TTL
If you are on Cassandra 4.1 or later, use nodetool snapshot --ttl <duration> to set an expiration on new snapshots. The value is a duration with a unit, such as 7d, not a bare number of seconds. This eliminates manual cleanup for snapshots that do not need indefinite retention.
Prevention
- Automate nodetool clearsnapshot after backup verification completes. Never leave snapshots on production nodes longer than necessary.
- Monitor nodetool listsnapshots True size as a time series. A steady upward trend is an early warning.
- If you enable incremental_backups, implement an external rotation job that deletes old hard links from backups/ directories. clearsnapshot will not do this for you.
- Maintain a snapshot naming convention that includes creation dates in the tag, making it easy to identify stale snapshots.
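The naming convention in the last point might look like the sketch below. The <prefix>-YYYYMMDD format and both helper names are assumptions made for illustration; any format that sorts by date works equally well.

```shell
#!/usr/bin/env bash
set -eu

# Build a tag of the form <prefix>-YYYYMMDD from today's UTC date.
make_snapshot_tag() {
  printf '%s-%s\n' "$1" "$(date -u +%Y%m%d)"
}

# Succeeds when the tag ends in -YYYYMMDD and that date is before the cutoff.
# Tags without an 8-digit date suffix are never reported as stale.
# $1: tag, $2: cutoff as YYYYMMDD
tag_is_stale() {
  local tag="$1" cutoff="$2" stamp
  stamp="${tag##*-}"
  case "$stamp" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) [ "$stamp" -lt "$cutoff" ] ;;
    *) return 1 ;;
  esac
}

tag="$(make_snapshot_tag nightly)"
echo "new tag: $tag"
```

The generated tag can be passed to nodetool snapshot -t, and tag_is_stale can filter the tags reported by nodetool listsnapshots before they are handed to nodetool clearsnapshot -t.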
How Netdata helps
- Disk space alerts on Cassandra data directories trigger before compaction headroom is breached, giving time to clear snapshots before writes stall.
- Correlation of disk usage with pending compactions and SSTable count helps distinguish snapshot growth from compaction backlog.
- Tracking filesystem usage independently of Cassandra Load surfaces hidden consumption from snapshots or incremental backups.
- Long-term retention of disk usage trends makes it easy to correlate the start of growth with backup schedules or schema changes.
Related guides
- Cassandra compaction strategies: STCS vs LCS vs TWCS vs UCS
- Cassandra compaction death spiral: when writes outrun compaction throughput
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra zombie data resurrection: gc_grace_seconds and unrepaired tombstones
- Cassandra GC death spiral: long pauses, gossip flapping, and recovery
- Cassandra GC pauses too long: diagnosing G1 stop-the-world pauses
- Cassandra heap pressure: sizing the JVM heap and tuning G1GC
- Cassandra monitoring checklist: the signals every production cluster needs
- Cassandra monitoring maturity model: from survival to expert
- Cassandra java.lang.OutOfMemoryError: Java heap space - causes and recovery
- Cassandra pending compactions growing: the compaction backlog runbook
- Cassandra Scanned over N tombstones warning: finding the offending query







