Cassandra CorruptSSTableException and FSError: disk failure and recovery
A Cassandra node stops serving traffic or refuses to start. In system.log you see org.apache.cassandra.io.sstable.CorruptSSTableException, FSError, or a JVM shutdown triggered by a filesystem exception. These indicate disk failure, filesystem corruption, or irreversible SSTable damage, not retryable application bugs.
Because SSTables are immutable, a corrupt file cannot be patched in place. Depending on disk_failure_policy, the node either stops serving traffic or shuts down. Recovery requires at least one healthy replica or a usable backup. Without either, the corrupted data is lost.
This guide covers confirming the failure, determining whether the node failed at startup or runtime, and replacing damaged SSTables safely.
What this means
CorruptSSTableException signals an internal consistency failure inside an SSTable: checksum mismatch, corrupt block, malformed row, or unexpected EOF. FSError wraps lower-level filesystem errors such as I/O errors, permission denied, or an unavailable disk. Both trigger the configured disk_failure_policy.
The default policy is stop. A storage exception shuts down gossip and native transport; the node remains alive but stops serving client traffic. Other values:
- die: shut down the JVM immediately.
- best_effort: stop using the failed disk directory and continue on the remaining directories until the node restarts.
- ignore: log the error and let affected requests fail while the node stays in the ring. Never use in production.
The JMX counter org.apache.cassandra.metrics:type=Storage,name=Exceptions tracks uncaught storage subsystem errors. Any non-zero rate, plus any CorruptSSTableException or FSError in the logs, is a PAGE-level event.
```mermaid
flowchart TD
A[Log shows CorruptSSTableException] --> B{Startup or runtime?}
B -->|Startup| C[Node fails to join]
B -->|Runtime| D[Node stops serving traffic]
C --> E[Find bad SSTable path]
D --> E
E --> F[Run nodetool verify]
F --> G[Remove SSTable and repair]
```

Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Disk hardware failure | FSError or IOError in system.log; node exits or stops transport | OS logs (dmesg) and SMART status for disk errors |
| SSTable corruption from failed compaction or bit rot | CorruptSSTableException naming a specific SSTable path during startup or reads | The exact file prefix in the exception message |
| Unexpected shutdown during write or compaction | CorruptSSTableException, EOFException, or checksum errors after a crash | Node uptime, OOM kills, or power events preceding the failure |
| Filesystem or kernel fault | FSError without accompanying SMART errors | Kernel logs and filesystem consistency |
Startup vs runtime behavior
The recovery path depends on when Cassandra encounters the error.
Startup failure. If Cassandra encounters a corrupt SSTable during initialization, it logs the path and either exits (die) or fails to complete startup (stop). In both cases the node does not join the ring. The exception stack trace includes the full path to the component file. Note the keyspace, table name, and SSTable generation number.
Runtime failure. If the error occurs during a read or compaction on a live node, disk_failure_policy determines the result. With stop, the node halts gossip and native transport but remains running, and the exception is logged with the SSTable path. Client connections drop, but the JVM stays up, so you can still inspect the node over JMX (for example, with nodetool). With die, the JVM exits and the node goes down.
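To record the keyspace, table, and generation, pull the path out of the exception message and split it. The following is a minimal sketch. It assumes the path appears verbatim in the log line and follows the `<data_dir>/<keyspace>/<table>-<table id>/<version>-<generation>-<format>-<Component>` layout used by recent Cassandra versions (for example, `nb-42-big-Data.db`). The function names, the regex, and the sample paths are illustrative, not part of Cassandra.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# Assumed shape: an absolute path whose file name ends in "-<Component>.<ext>",
# e.g. ".../nb-42-big-Data.db" or ".../nb-42-big-TOC.txt".
SSTABLE_PATH_RE = re.compile(r"(/\S+-[A-Za-z]+\.(?:db|txt|crc32))")


class SSTableRef(NamedTuple):
    keyspace: str
    table: str
    generation: str
    prefix: str  # shared by every component file, e.g. "nb-42-big"


def find_sstable_paths(log_line: str) -> list[str]:
    """Return every SSTable component path mentioned in one log line."""
    return SSTABLE_PATH_RE.findall(log_line)


def parse_sstable_path(path: str) -> SSTableRef:
    """Split <data_dir>/<keyspace>/<table>-<id>/<ver>-<gen>-<fmt>-<Component>."""
    p = PurePosixPath(path)
    table_dir = p.parent.name            # e.g. "users-1b255f4d..."
    keyspace = p.parent.parent.name      # e.g. "ks1"
    table = table_dir.rsplit("-", 1)[0]  # table names cannot contain "-"
    prefix = p.name.rsplit("-", 1)[0]    # drop the "-Data.db" component suffix
    parts = prefix.split("-")            # ["nb", "42", "big"]
    if len(parts) != 3:
        # Older layouts such as "ks1-users-ka-1-Data.db" are not handled here.
        raise ValueError(f"unrecognized SSTable file name: {p.name}")
    return SSTableRef(keyspace, table, parts[1], prefix)
```

The `prefix` field is the value to match when quarantining, because every component of one SSTable shares it.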
Confirming the failure
Before replacing data, confirm the failure at both the OS and Cassandra layers.
- Inspect system.log for the exact exception. A CorruptSSTableException includes the SSTable path. An FSError includes the underlying cause (for example, java.io.IOError or java.io.FileNotFoundException with I/O details).
- Check OS-level disk health. Run dmesg for I/O errors, bus resets, or filesystem remounts. Run smartctl against the physical device to check reallocated sectors, pending sectors, or command timeouts. These are read-only checks.
- Check filesystem consistency. If the OS reports errors, run a filesystem check. Warning: do not run fsck on a mounted read-write filesystem. Schedule maintenance or boot into recovery mode.
- Verify the SSTable. On a node that is still running (runtime failure under policy stop), run nodetool verify against the table to force checksum validation. nodetool connects over JMX, so it is unavailable once the JVM has exited.
- Check replica availability. Before removing any local data, confirm that other replicas are healthy and current. Run nodetool status to confirm the other nodes are up and normal (UN), and check the keyspace's replication factor. If the replication factor is 1 or the other replicas are down, removing the SSTable causes data loss.
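The replica check can be scripted as a guard before anything is moved. The following is a minimal sketch. It assumes you pass in the text printed by nodetool status and already know the keyspace's replication factor. `parse_status` and `safe_to_quarantine` are hypothetical helpers. Because nodetool status does not show which nodes hold a particular range, the sketch is deliberately conservative and requires every other listed node to be UN.

```python
import re

# Node rows start with a two-letter code: Up/Down, then Normal/Leaving/Joining/Moving.
NODE_ROW_RE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(output: str) -> dict[str, str]:
    """Map each node address in `nodetool status` output to its state code."""
    states = {}
    for line in output.splitlines():
        m = NODE_ROW_RE.match(line)
        if m:
            states[m.group(2)] = m.group(1)
    return states


def safe_to_quarantine(output: str, local_address: str, replication_factor: int) -> bool:
    """Conservative check: RF >= 2 and every *other* listed node is UN.

    `nodetool status` does not say which nodes replicate a given range,
    so all other nodes are treated as candidate replicas.
    """
    if replication_factor < 2:
        return False  # no other copy exists to repair from
    others = {addr: st for addr, st in parse_status(output).items()
              if addr != local_address}
    return bool(others) and all(st == "UN" for st in others.values())
```

After the restart step, `parse_status(output).get(local_address) == "UN"` gives the same check for the recovered node.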
Recovering from corruption
Once you have identified the corrupt SSTable and confirmed healthy replicas exist, quarantine the files and repair the node.
Quarantine the SSTable. Move all files sharing the corrupt SSTable prefix out of the data directory and into a quarantine directory. Do not delete them until the repair completes and you confirm data consistency. The exception message names the base filename; move every file with that prefix. If multiple SSTables are corrupt, quarantine all of them before restarting. Starting the node with a partially removed SSTable (for example, leaving an index or summary file behind) will trigger new errors.
Warning: Do not remove SSTable files while Cassandra is running. If the node is up but transport is stopped, stop Cassandra before moving files to avoid file-handle issues or additional crashes.
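The quarantine step amounts to moving every file that shares the corrupt SSTable's prefix. The following is a minimal sketch under these assumptions:

- Cassandra is already stopped.
- The quarantine root is a hypothetical directory you choose outside every data_file_directories entry.
- File names follow the `<version>-<generation>-<format>-<Component>` pattern, and component names contain no hyphen.

`quarantine_sstable` is an illustrative name, not a Cassandra tool.

```python
import shutil
from pathlib import Path


def quarantine_sstable(component_path: str, quarantine_root: str) -> list[Path]:
    """Move every file sharing the corrupt SSTable's prefix into quarantine.

    Call only while Cassandra is stopped.
    """
    component = Path(component_path)
    prefix = component.name.rsplit("-", 1)[0]  # "nb-42-big-Data.db" -> "nb-42-big"
    table_dir = component.parent
    # Keep the keyspace/table directory names so files can be restored if needed.
    dest = Path(quarantine_root) / table_dir.parent.name / table_dir.name
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # The trailing "-" keeps prefix "nb-4-big" from also matching "nb-42-big-*".
    for f in sorted(table_dir.glob(prefix + "-*")):
        if f.is_file():
            target = dest / f.name
            shutil.move(str(f), str(target))
            moved.append(target)
    return moved
```

Preserving the directory structure in quarantine means a misidentified file can be moved straight back.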
Restart the node. Start Cassandra once the corrupt files are out of the data directory. The node may have exited on its own (die, or a failure during startup), or you may have stopped it to move the files (stop). Either way, a clean start re-enables gossip and native transport and lets the node rejoin the ring. Verify that nodetool status shows the node as UN and that no new exceptions appear in system.log.
Repair the data. Run a full repair (nodetool repair --full) on the affected keyspace or table so the node streams replacement data from healthy replicas. Incremental repair compares only data not yet marked as repaired. It may therefore miss rows that lived in a quarantined SSTable that had already been repaired. Monitor nodetool netstats during the streaming phase to confirm data is moving from the correct replicas. After repair completes, run nodetool verify on the repaired table to confirm the new local SSTables pass validation. Once confirmed, you can safely delete the quarantined files.
Warning: Repair generates significant cluster load. Run it during low-traffic windows and monitor for streaming errors or compaction backlog.
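The repair-and-verify sequence can be captured as a dry-run script, so the exact commands are reviewed before they touch the cluster. This is a minimal sketch. `recovery_commands` and `run` are hypothetical helpers, and the commands use only nodetool status, nodetool repair --full, and nodetool verify with a keyspace and table. With `dry_run=True` the script only prints the commands. Because repair can run for a long time, watch nodetool netstats from a separate shell.

```python
import shlex
import subprocess


def recovery_commands(keyspace: str, table: str, full: bool = True) -> list[list[str]]:
    """Commands to run after the node restarts with the bad SSTable quarantined."""
    repair = ["nodetool", "repair"]
    if full:
        repair.append("--full")  # compare all data, not only unrepaired SSTables
    repair += [keyspace, table]
    return [
        ["nodetool", "status"],                   # review: node should be UN first
        repair,                                   # stream data back from replicas
        ["nodetool", "verify", keyspace, table],  # validate the new local SSTables
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence if any command exits non-zero.
            subprocess.run(argv, check=True)
```

For example, `run(recovery_commands("ks1", "users"))` prints the three commands for a hypothetical ks1.users table without executing them.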
If replication factor is 1. You have no replica source. If the SSTable is unrecoverable, the data is lost. Attempt filesystem-level recovery or restore from backup before starting Cassandra without the file.
Disk failure policy considerations
The choice of disk_failure_policy changes the recovery steps.
- stop (default): The node stops serving client traffic, but the JVM stays running. This limits client-visible errors but requires a restart after you fix the underlying issue.
- die: The JVM exits immediately. Requests in flight on the node fail, and you must restart the node. This is the safest option if you prefer fast failure over a degraded node.
- best_effort: Cassandra marks the failed directory as unusable until the next restart and continues using the remaining directories. This is only useful when multiple data_file_directories are configured. Reads at consistency level ONE can return obsolete data, because SSTables in the failed directory are no longer consulted. Be aware that best_effort can also mask creeping disk failure if one directory degrades while others appear healthy.
- ignore: The node logs the error and continues. Affected reads and writes fail, but the node stays in the ring and keeps receiving traffic. Do not use ignore in production.
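These rules can be checked mechanically during configuration review. The following is a minimal sketch. It assumes you have the cassandra.yaml text and know how many data_file_directories entries the node uses. It reads the policy with a simple line-based regex rather than a full YAML parser, and `read_policy` and `review_policy` are hypothetical names. A missing key is reported instead of being assumed to mean stop, because the value shipped in cassandra.yaml is not necessarily the built-in default.

```python
import re
from typing import Optional

# Matches an uncommented top-level "disk_failure_policy: <value>" line.
POLICY_RE = re.compile(
    r"^disk_failure_policy:[ \t]*([A-Za-z_]+)[ \t]*(?:#.*)?$", re.MULTILINE
)


def read_policy(yaml_text: str) -> Optional[str]:
    """Return the configured policy, or None if the key is not set."""
    m = POLICY_RE.search(yaml_text)
    return m.group(1) if m else None


def review_policy(policy: Optional[str], data_dir_count: int) -> list[str]:
    """Flag settings that conflict with the guidance in this section."""
    findings = []
    if policy is None:
        findings.append("disk_failure_policy is not set; confirm the effective value")
    elif policy == "ignore":
        findings.append("ignore keeps a failing node in the ring; use stop or die")
    elif policy == "best_effort" and data_dir_count < 2:
        findings.append("best_effort only isolates failures across multiple data directories")
    return findings
```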
Monitoring and prevention
Detect these failures before they force a node outage.
- Alert on the JMX Storage/Exceptions counter. Any sustained increase indicates hardware or filesystem issues.
- Alert on CorruptSSTableException and FSError strings in system.log. Cassandra typically logs a concrete FSError subclass such as FSReadError or FSWriteError, so match those names as well.
- Monitor OS disk health metrics (SMART attributes, filesystem errors, dmesg I/O errors) alongside Cassandra logs. A CorruptSSTableException without a preceding OS error suggests bit rot or a Cassandra-level bug. An FSError usually points to the disk or filesystem.
- Keep disk_failure_policy at stop or die. Use best_effort only when you have multiple independent data directories and understand the failure-isolation behavior.
- Correlate Cassandra errors with disk I/O metrics. If you are using Netdata, compare the time of the exception against the disk.await chart for the affected volume. Sustained latency spikes or I/O error counters that line up with FSError logs strongly indicate hardware degradation.
- Schedule regular nodetool verify runs on critical tables during maintenance windows. This catches bit rot and compaction defects before they cause a runtime failure.
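These alerting rules can be expressed as one evaluation function over three inputs:

- two samples of the Storage/Exceptions counter,
- new system.log lines,
- new dmesg lines.

The following is a minimal sketch. How you collect those inputs (JMX exporter, log shipper, and so on) is outside its scope, and `evaluate` and `exceptions_delta` are hypothetical names. The dmesg pattern assumes the common kernel message form "I/O error, dev sdb, sector ...", which varies across kernel versions.

```python
import re

# Matches CorruptSSTableException and FSError, plus FSError subclasses such as
# FSReadError and FSWriteError, which is how the error usually appears in logs.
LOG_RE = re.compile(r"CorruptSSTableException|FS\w*Error")
# Assumed kernel message shape: "... I/O error, dev sdb, sector 2048 ...".
DMESG_IO_RE = re.compile(r"I/O error, dev (\w+)")


def exceptions_delta(previous: int, current: int) -> int:
    """Increase in the Storage/Exceptions counter between two samples.

    JMX counters reset when Cassandra restarts, so a drop is treated as a reset.
    """
    return current - previous if current >= previous else current


def evaluate(prev_count: int, curr_count: int,
             log_lines: list[str], dmesg_lines: list[str]) -> list[str]:
    """Return one alert message per triggered rule (empty list if healthy)."""
    alerts = []
    delta = exceptions_delta(prev_count, curr_count)
    if delta > 0:
        alerts.append(f"Storage/Exceptions increased by {delta}")
    for line in log_lines:
        if LOG_RE.search(line):
            alerts.append("system.log: " + line.strip())
    # Report each failing block device once, in a stable order.
    devices = sorted({m.group(1) for line in dmesg_lines
                      for m in DMESG_IO_RE.finditer(line)})
    for dev in devices:
        alerts.append(f"dmesg: I/O errors on {dev}")
    return alerts
```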