Cassandra repair not running: the silent gap that resurrects deleted data
Deleted rows reappear in application queries, or system_distributed.repair_history shows a last success older than your gc_grace_seconds window. In Cassandra, a repair that is not running, not completing, or not scheduled is an active data integrity risk.
Cassandra uses tombstones to track deletions and TTL expirations. Those tombstones must survive on every replica until anti-entropy repair propagates the delete. Once gc_grace_seconds passes, compaction drops tombstones. If repair has not finished for a table within that window, nodes that missed the original delete retain live data while the rest have discarded the tombstone. The result is zombie data resurrection.
flowchart TD
A[Repair not completed within gc_grace_seconds] --> B[Tombstones compacted away on replicas that received delete]
B --> C[Unrepaired replicas retain original live data]
C --> D[Deleted data reappears on reads]
What this means
Anti-entropy repair builds Merkle trees across token ranges and streams differences between replicas. It is the only mechanism that guarantees every replica has seen every deletion. gc_grace_seconds defaults to 864,000 seconds (10 days). Compaction will purge tombstones older than this threshold. Repair must complete for every table before that purge happens.
When repair lapses:
- Tombstones compact away on replicas that received the delete.
- Replicas that were down, partitioned, or slow during the delete never received the tombstone.
- Those unrepaired replicas still serve the original data.
- A subsequent read at a consistency level that includes the unrepaired replica returns the deleted row.
This is the deterministic outcome of a missing or incomplete repair cycle.
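The sequence above can be reproduced with a toy model. This sketch is not Cassandra code: the `Replica` class, the single-dictionary storage, and the simplified last-write-wins read are hypothetical stand-ins, chosen only to show why a purged tombstone lets older data win.

```python
from dataclasses import dataclass, field

GC_GRACE_SECONDS = 864_000  # default of 10 days, as stated above
TOMBSTONE = None            # hypothetical marker for a deletion


@dataclass
class Replica:
    # key -> (write_timestamp, value); a value of TOMBSTONE marks a delete
    cells: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        current = self.cells.get(key)
        if current is None or ts > current[0]:
            self.cells[key] = (ts, value)

    def delete(self, key, ts):
        self.write(key, TOMBSTONE, ts)

    def compact(self, now):
        # Purge tombstones older than gc_grace_seconds.
        for key, (ts, value) in list(self.cells.items()):
            if value is TOMBSTONE and now - ts > GC_GRACE_SECONDS:
                del self.cells[key]


def read(replicas, key):
    # Newest timestamp wins among the replicas that still hold the key.
    versions = [r.cells[key] for r in replicas if key in r.cells]
    if not versions:
        return None
    _, value = max(versions, key=lambda v: v[0])
    return value


def simulate(repair_before_purge):
    a, b, c = Replica(), Replica(), Replica()
    for r in (a, b, c):
        r.write("user:1", "alice", ts=0)
    # Replica c is down and misses the delete.
    a.delete("user:1", ts=100)
    b.delete("user:1", ts=100)
    if repair_before_purge:
        c.delete("user:1", ts=100)  # repair propagates the tombstone to c
    now = 100 + GC_GRACE_SECONDS + 1
    for r in (a, b, c):
        r.compact(now)
    return read([a, b, c], "user:1")
```

In this model, `simulate(repair_before_purge=False)` returns the old value because replica c is the only one that still holds anything for the key, while `simulate(repair_before_purge=True)` returns nothing.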
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Repair never scheduled or automation broke | No recent entries in system_distributed.repair_history; no orchestration tool configured | Query system_distributed.repair_history for the latest repair per table |
| Repair starts but fails silently | Repair command returns while background sessions hang or abort; only some token ranges finish | nodetool repair_admin list (4.0+) for incomplete sessions |
| Repair too slow for the dataset | Time since the last successful repair creeps closer to gc_grace_seconds each cycle | Compare last repair time to the table’s gc_grace_seconds |
| Pre-4.0 incremental repair bugs | SSTable proliferation or inconsistent state on Cassandra 3.x clusters | Cassandra version and whether incremental repair is enabled |
| Compaction backlog blocking repair | High pending compactions; anti-compaction cannot split SSTables fast enough | nodetool compactionstats and pending task trends |
| STCS anti-compaction overload on dense nodes | Repair stalls for hours or days; large SSTables rewritten repeatedly | SSTable sizes and compaction strategy configuration |
Quick checks
Run these read-only commands to assess the repair gap without mutating data.
# Check last repair timestamp per table (3.x+)
cqlsh -e "SELECT keyspace_name, columnfamily_name, status, finished_at FROM system_distributed.repair_history LIMIT 50;"
# Get gc_grace_seconds for a specific table
cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='ks' AND table_name='tbl';"
# Check parent repair session status (4.0+)
cqlsh -e "SELECT keyspace_name, started_at, finished_at, exception_message FROM system_distributed.parent_repair_history LIMIT 20;"
# List active and historical repair sessions (4.0+)
nodetool repair_admin list
# Check repaired vs unrepaired data percentage (4.0+)
nodetool tablestats <keyspace>.<table>
# Check compaction backlog that may block repair progress
nodetool compactionstats
# Check active streaming sessions from running repair
nodetool netstats
# Search logs for repair errors or session timeouts
grep -i "repair" /var/log/cassandra/system.log
How to diagnose it
1. Confirm the repair gap. Query system_distributed.repair_history to find the latest completed repair for each keyspace and table. On Cassandra 4.0+, nodetool repair_admin list shows incremental repair sessions and their states. If a table has no completed repair within gc_grace_seconds, the gap is confirmed.
2. Check the deadline. Retrieve the table’s gc_grace_seconds value from system_schema.tables. Compute 80% of that value. If the last repair is older than that threshold, the table is in the danger zone. With the default 10 days, alert at 8 days.
3. Identify silent failures. A repair command can return to the shell while background sessions continue, fail, or hang. Verify session status in repair_admin list, parent_repair_history, or logs. Look for sessions that started but never reached a successful completion.
4. Inspect compaction interaction. Repair generates anti-compaction work. If nodetool compactionstats shows pending tasks trending upward while repair is active, compaction is the bottleneck. On Cassandra 5.0+, check whether you are using STCS with incremental repair. STCS anti-compaction can rewrite multi-hundred gigabyte SSTables repeatedly, stalling repair indefinitely.
5. Review version-specific bugs. On Cassandra 3.x, incremental repair is unreliable and can leave repaired state inconsistent (CASSANDRA-9143). If you are on 3.x and using incremental repair, switch to full subrange repair or upgrade to 4.0+. Check for FINALIZED repair sessions not cleaned up after range movements in 4.0+/5.0+. Also check for incremental repair failures on mixed IPv4/IPv6 clusters.
6. Check node density and vNode count. High num_tokens increases Merkle tree comparisons per repair. Very large SSTables on dense nodes extend repair duration per subrange. Evaluate whether the chosen segment count for subrange repair is appropriate for your SSTable sizes.
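The deadline check in the steps above reduces to a small comparison. A minimal sketch, assuming you have already fetched the last successful repair time and the table's gc_grace_seconds yourself; the function name `repair_gap_status` and its return labels are hypothetical and not part of any Cassandra tooling.

```python
from datetime import datetime, timedelta, timezone


def repair_gap_status(last_repair, gc_grace_seconds, now, warn_fraction=0.8):
    """Classify a table by the age of its last successful repair.

    last_repair: timezone-aware datetime, or None if no repair is recorded.
    warn_fraction: share of gc_grace_seconds at which to warn (80% per the text).
    """
    if last_repair is None:
        return "expired"  # no completed repair on record
    age_seconds = (now - last_repair).total_seconds()
    if age_seconds >= gc_grace_seconds:
        return "expired"  # tombstones may already have been purged
    if age_seconds >= warn_fraction * gc_grace_seconds:
        return "warning"  # inside the danger zone
    return "ok"


# Example with the default gc_grace_seconds of 10 days.
now = datetime(2024, 1, 20, tzinfo=timezone.utc)
status = repair_gap_status(now - timedelta(days=8, hours=1), 864_000, now)
```

With the default window, a repair that finished 8 days and 1 hour ago lands in the "warning" band, matching the 8-day alert point described above.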
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Time since last successful repair | Direct measure of the anti-entropy window | Exceeds 80% of gc_grace_seconds |
| Repair session status | Catches repairs that start but do not finish | Sessions that do not reach successful completion after extended runtime |
| Pending compactions | Anti-compaction and repair-generated SSTables must be processed | Trending upward over hours during a repair cycle |
| Unrepaired SSTable percentage (4.0+) | Shows incremental repair backlog | Percentage growing week over week |
| Tombstone scan warnings | Reads are scanning many tombstones, so many deletes depend on repair finishing before compaction purges them | Sustained log entries exceeding tombstone_warn_threshold |
| Read repair rate | Compensates for full repair gaps but adds latency | Spike not correlated with a node returning from downtime |
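"Trending upward over hours" can be made concrete with a least-squares slope over periodic samples. This is a rough sketch, assuming you collect pending compaction counts yourself (for example from nodetool compactionstats output); the function name and the sample format are hypothetical, and any alert threshold on the slope is yours to choose.

```python
def pending_trend_per_hour(samples):
    """Return the least-squares slope of pending compactions per hour.

    samples: list of (hours_since_start, pending_tasks) pairs.
    A positive result means the backlog is growing over the sampled period.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("samples must span more than one point in time")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return cov_xy / var_x


# Hourly samples taken during a repair cycle (made-up numbers).
growing = pending_trend_per_hour([(0, 10), (1, 20), (2, 30), (3, 40)])
flat = pending_trend_per_hour([(0, 5), (1, 5), (2, 5)])
```

A slope that stays positive across several hours of a repair cycle is the pattern the table flags; a single spike is not.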
Fixes
If repair has never been scheduled
Deploy Reaper or another orchestration tool. Manual cron jobs invoking nodetool repair fail silently because they do not handle token range calculation, failure retries, or progress tracking. Reaper manages subrange scheduling and persists completion state.
If repair is running but too slow
Switch to subrange full repair with smaller segments to reduce per-pass memory and I/O overhead. Increase stream_throughput_outbound_megabits_per_sec (renamed stream_throughput_outbound in 4.1+) temporarily if the network is the bottleneck, but monitor client latency. If the dataset is too large to repair within gc_grace_seconds, you must either increase subrange segment count or increase gc_grace_seconds after confirming repair can complete within the new window.
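Whether a cycle fits inside the window is back-of-envelope arithmetic. The sketch below assumes segments take roughly equal time and that a fixed number run in parallel; `repair_cycle_estimate`, its parameters, and the example figures are hypothetical, and the 80% budget simply reuses the alert threshold from this guide.

```python
def repair_cycle_estimate(segment_count, avg_segment_seconds,
                          gc_grace_seconds=864_000,
                          parallel_segments=1, budget_fraction=0.8):
    """Estimate full-cycle repair duration and whether it fits the budget.

    Returns (estimated_seconds, fits), where fits is True when the estimate
    is within budget_fraction * gc_grace_seconds.
    """
    if segment_count <= 0 or parallel_segments <= 0:
        raise ValueError("segment_count and parallel_segments must be positive")
    estimated = segment_count * avg_segment_seconds / parallel_segments
    budget = budget_fraction * gc_grace_seconds
    return estimated, estimated <= budget


# 3000 segments at about 4 minutes each, run one at a time.
serial_seconds, serial_fits = repair_cycle_estimate(3000, 240)
# The same workload with two segments in flight.
paired_seconds, paired_fits = repair_cycle_estimate(3000, 240, parallel_segments=2)
```

In this made-up example the serial run needs 720,000 seconds, which exceeds the 691,200-second budget for the default window, while running two segments at once brings it to 360,000 seconds. Real segment times vary, so measure a few cycles before relying on an estimate like this.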
If compaction is blocking repair
On Cassandra 5.0+, migrate tables experiencing anti-compaction stalls to UnifiedCompactionStrategy (UCS). UCS handles incremental repair anti-compaction far more efficiently than SizeTieredCompactionStrategy (STCS).
WARNING: Changing compaction strategy requires rewriting all existing SSTables and is extremely disk and I/O intensive. Test on a non-production cluster first, and execute during a maintenance window.
If you must stay on STCS, schedule repair only during off-peak hours and ensure at least 50% free disk space for anti-compaction headroom.
If you are on Cassandra 3.x
Avoid incremental repair entirely. Use full repair only. The incremental repair rewrite in 4.0 made the feature production-ready; on 3.x it can corrupt repaired state and proliferate tiny SSTables.
If data resurrection has already occurred
Run a full repair immediately on affected tables to reconcile replicas. Then audit application data for inconsistencies. There is no automatic rollback for resurrected data; you must identify and re-delete invalid rows at the application layer after repair restores consistency.
Prevention
- Alert on the gap, not the failure. Set an alert when last successful repair exceeds 80% of gc_grace_seconds. Do not wait for data resurrection.
- Verify completion, not just invocation. Monitoring that calls nodetool repair is insufficient. Confirm all token ranges finish successfully.
- Schedule during off-peak windows. Repair generates heavy disk and network I/O. Running it during peak traffic creates the saturation that causes repair to stall.
- Track compaction as a leading indicator. A node with rising pending compactions will soon fail to complete repair within its window. Treat compaction backlog as a repair risk.
- Prefer automated orchestration. Cassandra 6.0 introduces automated repair orchestration, which removes external scheduling dependencies. Until then, use Reaper.
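Verifying completion rather than invocation can be sketched as a check over rows already read from system_distributed.parent_repair_history. The row is modeled here as a plain dictionary, and the column names used (finished_at, exception_message, requested_ranges, successful_ranges) are assumptions to confirm against your version with DESCRIBE TABLE; the function and its labels are hypothetical.

```python
def session_outcome(row):
    """Classify one parent repair session row (a dict of column -> value)."""
    if row.get("finished_at") is None:
        return "unfinished"  # started but no completion recorded
    if row.get("exception_message"):
        return "failed"      # finished with an error
    requested = set(row.get("requested_ranges") or ())
    successful = set(row.get("successful_ranges") or ())
    if requested - successful:
        return "partial"     # some requested ranges never succeeded
    return "complete"


# Made-up rows for illustration only.
rows = [
    {"finished_at": "2024-01-10", "exception_message": None,
     "requested_ranges": {"(0,100]", "(100,200]"},
     "successful_ranges": {"(0,100]", "(100,200]"}},
    {"finished_at": "2024-01-11", "exception_message": None,
     "requested_ranges": {"(0,100]", "(100,200]"},
     "successful_ranges": {"(0,100]"}},
    {"finished_at": None, "exception_message": None,
     "requested_ranges": {"(0,100]"}, "successful_ranges": set()},
]
outcomes = [session_outcome(r) for r in rows]
```

Only sessions classified as "complete" should reset the time-since-last-repair clock used for alerting.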
How Netdata helps
- Correlate repair gaps with compaction backlog, disk I/O saturation, and streaming throughput to identify whether repair is stalled by resource contention.
- Alert on system_distributed.repair_history age or nodetool repair_admin list state via custom data collection.
- Visualize tombstone scan warnings alongside read latency percentiles to spot early signs of unrepaired tombstone accumulation.
- Track per-node SSTable counts and pending compaction trends as leading indicators for repair capacity.
Related guides
- Cassandra compaction strategies: STCS vs LCS vs TWCS vs UCS
- Cassandra clock skew: how NTP drift silently corrupts data
- Cassandra compaction death spiral: when writes outrun compaction throughput
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra zombie data resurrection: gc_grace_seconds and unrepaired tombstones
- Cassandra disk space exhaustion: emergency recovery when the data volume fills
- Cassandra dropped mutations: silent write loss and load shedding
- Cassandra dropped reads and other messages: reading nodetool tpstats Dropped
- Cassandra GC death spiral: long pauses, gossip flapping, and recovery
- Cassandra GC pauses too long: diagnosing G1 stop-the-world pauses
- Cassandra gossip flapping: nodes bouncing UP and DOWN
- Cassandra heap pressure: sizing the JVM heap and tuning G1GC







