Cassandra too many SSTables per table: read amplification and how to fix it

Your Cassandra read P99 latency is climbing. Queries that used to take milliseconds now time out. The application reports intermittent ReadTimeoutException. You check nodetool tablestats and see one table sitting at 200 SSTables. For LCS, that is a catastrophe. For STCS, anything sustained above 50 means compaction has fallen behind.

Each extra SSTable adds a bloom filter check and a potential disk seek to every read. The read path must merge fragments from memtables plus every SSTable that might contain the partition. When SSTable counts balloon, the node spends more time checking filters and seeking than returning data. This is read amplification.

nodetool compact can fix it, but running it blindly can temporarily double the table's disk usage and starve reads of I/O. Confirm the diagnosis, choose the right intervention, and fix the root cause so the count stays low.

What this means

The number of SSTables per table determines how many files a read must consult. More SSTables means more bloom filter checks, more index lookups, and more merge-sort work per query.

Healthy thresholds depend on your compaction strategy:

LCS: L1 targets ~10 SSTables, and each higher level holds about 10x more than the one below it. L0 should stay below 32. Sustained counts above 100 in the lower levels (L0 and L1) indicate severe level imbalance.
STCS: should stabilize below 32 in a healthy table. Sustained counts above 50 signal compaction debt. Above 100, reads are effectively broken.
TWCS: old time windows should compact down to one SSTable. Multiple SSTables in old windows indicate problems.
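
The thresholds above can be turned into a small check. This is a minimal sketch, not a nodetool feature: sstable_health is a hypothetical helper name, and the cutoffs are this guide's rules of thumb rather than limits enforced by Cassandra.

```shell
# Classify an SSTable count against this guide's rule-of-thumb thresholds.
# sstable_health is a hypothetical helper, not part of nodetool.
# Usage: sstable_health <STCS|LCS> <count>
sstable_health() {
    strategy=$1
    count=$2
    case "$strategy" in
        STCS)
            if [ "$count" -gt 100 ]; then echo critical
            elif [ "$count" -gt 50 ]; then echo warning
            else echo ok
            fi ;;
        LCS)
            if [ "$count" -gt 100 ]; then echo critical
            else echo ok
            fi ;;
        *)
            # TWCS and other strategies have no single count threshold here.
            echo unknown ;;
    esac
}
```

Only sustained values matter, so a check like this is more useful when fed an average over several samples than a single reading.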

When compaction cannot keep up with the flush rate, SSTables accumulate. This creates a feedback loop: reads slow down, consuming I/O and CPU that compaction also needs, which makes compaction fall further behind.

flowchart TD
    A[High write rate or slow compaction] --> B[SSTables accumulate]
    B --> C[More bloom filters checked per read]
    C --> D[Disk seeks and merge overhead increase]
    D --> E[Read latency P99 spikes]
    E --> F[Compaction starved of I/O]
    F --> B

Common causes

Cause	What it looks like	First thing to check
Write rate exceeds compaction throughput	Pending compactions rising for days; SSTable count growing steadily; disk I/O near saturation	`nodetool compactionstats` and disk `await`
Compaction throttled too aggressively	Low disk I/O despite high SSTable count; throughput limit set too low	`nodetool getcompactionthroughput`
LCS L0 backlog	L0 SSTable count > 32; higher levels look balanced	`nodetool tablestats` SSTables in each level
Memtable flush pressure	Many tiny SSTables (few KB each); flush writers busy	`nodetool tablestats` memtable switch count
Repair or streaming burst	SSTable count spike after bootstrap, decommission, or repair	`nodetool netstats` and repair history
Wrong compaction strategy for workload	Read-heavy workload on STCS with runaway SSTable growth; or time-series data not using TWCS	Schema and access pattern review

Quick checks

# SSTable count for a specific table
nodetool tablestats <keyspace>.<table> | grep "SSTable count"

# Pending compaction tasks
nodetool compactionstats

# Disk I/O latency on data device
iostat -x 1

# Thread pool saturation
nodetool tpstats

# Live SSTable count via JMX
# bean: org.apache.cassandra.metrics:type=Table,keyspace=<ks>,scope=<table>,name=LiveSSTableCount
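
For scripting, the count can be pulled out as a bare number. The sketch below assumes tablestats prints a line of the form "SSTable count: N"; the exact layout can differ between Cassandra versions, and sstable_count is a hypothetical helper name. The pattern is anchored so that a separate "Old SSTable count" line, which some versions print, is not picked up.

```shell
# Print the "SSTable count" value(s) found in nodetool tablestats output on stdin.
# Assumes lines of the form "<whitespace>SSTable count: N".
sstable_count() {
    awk -F': *' '/^[ \t]*SSTable count:/ { print $2 }'
}

# Example (requires a running node; my_ks.my_table is a placeholder):
#   nodetool tablestats my_ks.my_table | sstable_count
```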

How to diagnose it

Confirm the symptom. Run nodetool tablestats <keyspace>.<table> and look for SSTable count. Compare against your compaction strategy threshold (LCS > 100, STCS > 50 sustained).
Determine if compaction is falling behind. Run nodetool compactionstats. If pending tasks are increasing over hours or days, the node is creating SSTables faster than it can merge them.
Check disk I/O. Run iostat -x 1 on the data volume. If %util is above 90% or await is elevated, compaction is likely I/O-starved.
Identify level imbalance for LCS. nodetool tablestats shows SSTables in each level. If L0 is swollen (for example, 100/4, where the number after the slash is the level's target) while L1+ are balanced, L0 compaction is the bottleneck.
Correlate with read latency. Check nodetool proxyhistograms or per-table coordinator latency. Rising P99 with stable write volume strongly suggests read amplification.
Check for tiny SSTables. If memtables are flushing prematurely due to memory pressure, you will see many small SSTables that overwhelm compaction. Review Memtable switch count in nodetool tablestats.
Verify disk space headroom. Run df -h on the data directory. If usage is above 50% with STCS or above 70% with LCS, compaction may be unable to allocate temporary space.
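
The LCS level check in the steps above can be scripted. This is a sketch under an assumption: it expects tablestats to print a line such as "SSTables in each level: [100/4, 10, 100, 0, ...]", which may vary by version. l0_sstables is a hypothetical helper name.

```shell
# Print the L0 SSTable count from nodetool tablestats output on stdin.
# Assumes a line like: "SSTables in each level: [100/4, 10, 100, 0, 0]"
# where the first entry is L0 (optionally followed by "/<target>").
l0_sstables() {
    sed -n 's/.*SSTables in each level: \[\([0-9][0-9]*\).*/\1/p'
}

# Example (requires a running node; my_ks.my_table is a placeholder):
#   l0=$(nodetool tablestats my_ks.my_table | l0_sstables)
#   [ "$l0" -gt 32 ] && echo "L0 backlog: $l0 SSTables"
```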

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`LiveSSTableCount`	Direct read amplification indicator	LCS > 100 or STCS > 50 sustained
Pending compactions	Leading indicator of compaction debt	Trending upward over 4+ hours
Disk I/O `await`	Compaction and reads compete for I/O	SSD `await` > 10ms or HDD > 50ms sustained
Read latency P99	Client-visible impact	P99 > 3x baseline sustained
File descriptor usage	Each SSTable opens ~6 FDs	Open FDs > 80% of ulimit
Off-heap memory	Bloom filters and compression metadata scale with SSTable count	RSS minus heap trending up with SSTable count
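
The file descriptor signal is simple to compute by hand. The following is a minimal sketch for Linux; fd_usage_pct is a hypothetical helper, and CASSANDRA_PID in the commented usage is an assumed variable holding the Cassandra process ID.

```shell
# Integer percentage of the file descriptor limit currently in use.
# Usage: fd_usage_pct <open_fds> <soft_limit>
fd_usage_pct() {
    open=$1
    limit=$2
    echo $(( open * 100 / limit ))
}

# Linux example (assumes CASSANDRA_PID is set and the limit is numeric,
# not "unlimited"):
#   open=$(ls /proc/"$CASSANDRA_PID"/fd | wc -l)
#   limit=$(awk '/Max open files/ { print $4 }' /proc/"$CASSANDRA_PID"/limits)
#   [ "$(fd_usage_pct "$open" "$limit")" -gt 80 ] && echo "FD usage above 80%"
```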

Fixes

Compaction throughput too low

If compaction_throughput_mb_per_sec (renamed compaction_throughput in Cassandra 4.1+) is throttled too low for your write volume, increase it temporarily:

# Check current limit
nodetool getcompactionthroughput

# Increase to 256 MB/s (runtime only; reverts to the cassandra.yaml value on restart)
nodetool setcompactionthroughput 256

Tradeoff: Higher throughput steals I/O bandwidth from reads. Increase it only when reads are already degraded and you need compaction to catch up.

Emergency manual compaction

If a single table is critically bloated and disk space permits:

# WARNING: This creates a new full SSTable alongside old ones.
# Ensure you have at least 30-50% free disk space before running.
nodetool compact <keyspace> <table>

Tradeoff: nodetool compact holds old and new SSTables on disk simultaneously. On a nearly full disk it can trigger compaction failure or node instability. With STCS it merges every SSTable in the table into one large file, which normal compaction may not pick up again for a long time. Use this as a bridge, not a cure.
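
A conservative pre-flight check is to require free space at least equal to the table's current on-disk size, since in the worst case the rewritten data is as large as the original. The sketch below encodes only that comparison; has_compaction_headroom is a hypothetical helper, and the paths in the commented usage assume a default package install.

```shell
# Succeed (exit 0) when free space is at least the table's on-disk size.
# Usage: has_compaction_headroom <table_size_kb> <free_kb>
has_compaction_headroom() {
    table_kb=$1
    free_kb=$2
    [ "$free_kb" -ge "$table_kb" ]
}

# Example (paths are assumptions; du also counts snapshots and backups
# under the table directory, so it errs on the safe side):
#   table_kb=$(du -sk /var/lib/cassandra/data/my_ks/my_table-*/ | awk '{ s += $1 } END { print s }')
#   free_kb=$(df -Pk /var/lib/cassandra/data | awk 'NR == 2 { print $4 }')
#   has_compaction_headroom "$table_kb" "$free_kb" && nodetool compact my_ks my_table
```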

Wrong compaction strategy

If STCS cannot keep up on a read-heavy workload, plan a migration to LCS or UCS (Cassandra 5.0+). Changing strategy triggers a full recompaction, which is disruptive. Schedule it during a maintenance window after verifying disk space.
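
The strategy change itself is a single CQL statement. As a sketch, the helper below only builds that statement so it can be reviewed before being run; compaction_alter_cql is a hypothetical name, and the keyspace and table in the usage are placeholders.

```shell
# Print an ALTER TABLE statement that switches a table's compaction class.
# Usage: compaction_alter_cql <keyspace> <table> <strategy_class>
compaction_alter_cql() {
    printf "ALTER TABLE %s.%s WITH compaction = {'class': '%s'};\n" "$1" "$2" "$3"
}

# Example (review the statement first, then apply it with cqlsh):
#   compaction_alter_cql my_ks my_table LeveledCompactionStrategy
#   cqlsh -e "$(compaction_alter_cql my_ks my_table LeveledCompactionStrategy)"
```

Other compaction sub-options are left at their defaults here; add them to the map if your table needs non-default values.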

LCS L0 backlog

If L0 is swollen but higher levels are healthy, increase concurrent_compactors in cassandra.yaml (requires restart) if CPU allows, and verify that the table's LCS option sstable_size_in_mb is not set below 64 MB (the default is 160). Small SSTables flood L0 faster than compaction can drain it.

Hinted handoff or repair debris

If the spike followed a node recovery or repair, allow compaction to settle. If hints are replaying aggressively and creating compaction debt, reduce hinted_handoff_throttle_in_kb in cassandra.yaml.

Prevention

Monitor the derivative of pending compaction tasks, not just the absolute value. An increasing trend over 24 hours is a leading indicator.
Maintain disk headroom: > 50% free for STCS, > 30% for LCS/TWCS. Compaction cannot run without temporary space.
Place commitlog and data directories on separate devices. Shared I/O between commitlog writes and compaction reads/writes is a common bottleneck.
Review tables with sustained LiveSSTableCount growth weekly. Catching compaction debt at 30 SSTables is easier than at 300.
For time-series workloads with TTL, use TWCS so expired windows drop as units instead of requiring full compaction merges.
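
Tracking the derivative of pending compactions needs only two timestamped samples. The sketch below assumes a log with one "epoch_seconds pending_count" pair per line; pending_trend_per_hour and the log path are hypothetical, and the collection command assumes compactionstats prints a line of the form "pending tasks: N", which may differ between versions.

```shell
# Read "epoch_seconds pending_count" lines on stdin and print the change in
# pending tasks per hour between the first and last sample (integer).
pending_trend_per_hour() {
    awk 'NR == 1 { t0 = $1; p0 = $2 }
         { t1 = $1; p1 = $2 }
         END { if (NR > 1 && t1 > t0) printf "%d\n", (p1 - p0) * 3600 / (t1 - t0) }'
}

# Collecting samples, for example from cron (the path is an assumption):
#   echo "$(date +%s) $(nodetool compactionstats | awk '/^pending tasks:/ { print $3 }')" >> /tmp/pending.log
# Reading the trend:
#   pending_trend_per_hour < /tmp/pending.log
```

A positive value that persists across a day is the leading indicator described above; a single positive reading during a write burst is not.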

How Netdata helps

Correlate per-table LiveSSTableCount with read latency P99 and disk await on one timeline to confirm read amplification.
Track compaction pending tasks with automatic trend detection to surface backlog before it becomes an incident.
Monitor off-heap memory growth alongside SSTable count to catch bloom filter expansion before it triggers OOM.
Alert on file descriptor usage approaching the ulimit.
Flag nodes with read latency deviations from the cluster median to isolate SSTable bloat.

Related guides

Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM: /guides/cassandra/cassandra-consistency-levels-explained/
Cassandra GC death spiral: long pauses, gossip flapping, and recovery: /guides/cassandra/cassandra-gc-death-spiral/
Cassandra GC pauses too long: diagnosing G1 stop-the-world pauses: /guides/cassandra/cassandra-gc-pauses-too-long/
Cassandra heap pressure: sizing the JVM heap and tuning G1GC: /guides/cassandra/cassandra-heap-pressure-tuning/
Cassandra monitoring checklist: the signals every production cluster needs: /guides/cassandra/cassandra-monitoring-checklist/
Cassandra monitoring maturity model: from survival to expert: /guides/cassandra/cassandra-monitoring-maturity-model/
Cassandra java.lang.OutOfMemoryError: Java heap space - causes and recovery: /guides/cassandra/cassandra-out-of-memory-error/
Cassandra pending compactions growing: the compaction backlog runbook: /guides/cassandra/cassandra-pending-compactions-growing/
How Cassandra actually works in production: a mental model for operators: /guides/cassandra/how-cassandra-works-in-production/

The Netdata solution

Cassandra monitoring with Netdata

Netdata monitors Apache Cassandra with per-second metrics and automatic dashboards. Correlate GC pauses, compaction backlog, tombstone rates, pending hints, and disk usage across nodes to catch a creeping cluster before it tips over.

