Cassandra commitlog disk full: segment exhaustion and forced flushes
WriteTimeoutException from client drivers, a node that is still UP in gossip, and a commitlog volume nearing 100% with climbing pending tasks in nodetool info together point to commitlog segment exhaustion. Unlike data disk exhaustion, which first slows compaction, commitlog pressure blocks the write path directly: every mutation must be appended to the WAL before it is acknowledged.
Cassandra recycles commitlog segments only after all memtables they reference are flushed to SSTables. When the flush pipeline cannot keep pace, segments accumulate until the total size exceeds commitlog_total_space (or commitlog_total_space_in_mb on older versions) or the filesystem fills. Cassandra then forces flushes of every dirty column family referenced in the oldest segment to free space. If flushes are already backed up, this cascades into blocked segment allocation, dropped mutations, and depending on commit_failure_policy, a node that stops accepting writes entirely.
Recovery requires distinguishing between a tight configuration limit, I/O saturation on a shared disk, and genuine filesystem exhaustion. Do not delete commitlog files while Cassandra is running. Relieve pressure by draining the node, expanding capacity, or tuning the flush pipeline, then fix the root cause so segments recycle faster than they are allocated.
What this means
Cassandra’s commitlog is an append-only WAL organized into segments. When the active segment fills, the CommitLogSegmentManager allocates a new one. If the total on-disk commitlog size exceeds commitlog_total_space (or commitlog_total_space_in_mb on older versions), Cassandra triggers a forced flush of every dirty column family referenced in the oldest segment so that segment can be deleted and its space reused. This is normal backpressure.
The problem arises when the flush pipeline is already saturated. Flushes run on the MemtableFlushWriter thread pool. If pending flushes accumulate, the oldest segments remain pinned until those flushes complete. New writes keep arriving and allocating fresh segments while old ones cannot be freed, and the JMX metric WaitingOnSegmentAllocation climbs. Eventually writes stall, coordinators time out, and replicas may begin dropping mutations. If commitlog writes start failing outright, commit_failure_policy decides what happens next: the node stops committing writes but keeps serving reads (stop_commit), shuts down gossip and native transport (stop), or kills the JVM (die). The default is stop, which leaves a zombie: the process is still running and reachable over JMX, but it no longer serves clients or participates in gossip.
If the commitlog volume itself runs out of filesystem space, segment allocation fails at the OS level. Even with a generous commitlog_total_space setting, the volume cap wins. Cassandra has no room to allocate new segments and the write path blocks immediately.
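The interaction between the configured cap and the volume size is simple arithmetic, sketched below in POSIX shell. The function names are illustrative, and the 32 MiB segment size used in the usage comment is an assumption; read the real value from commitlog_segment_size in your cassandra.yaml.

```shell
# Effective commitlog cap: the smaller of the configured limit and the
# size of the volume that holds the commitlog. All values are whole MiB.
effective_cap_mib() {
    configured_mib=$1
    volume_mib=$2
    if [ "$configured_mib" -lt "$volume_mib" ]; then
        echo "$configured_mib"
    else
        echo "$volume_mib"
    fi
}

# Number of whole segments that fit under a cap.
segment_budget() {
    cap_mib=$1
    segment_mib=$2
    echo $(( cap_mib / segment_mib ))
}

# Usage (assumed values): an 8192 MiB limit on a 4096 MiB volume,
# with 32 MiB segments.
#   cap=$(effective_cap_mib 8192 4096)
#   segment_budget "$cap" 32
```

A small segment budget means the oldest segment becomes the flush trigger sooner, so lightly written tables are flushed more often.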
flowchart TD
A[Write appends to commitlog] --> B{No free segments}
B --> C[Force flush oldest dirty tables]
C --> D[Flush pipeline saturated]
D --> E[Segments stay pinned]
E --> F[WaitingOnSegmentAllocation rises]
F --> G[Write latency spikes]
G --> H[Mutations dropped or blocked]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Commitlog and data share the same disk | High I/O wait on the shared device; flush latency correlates with compaction bursts | iostat -x 2 on the commitlog device |
| commitlog_total_space set too low for the workload | Frequent forced flushes even at moderate write rates; lightly written tables flushed unnecessarily | nodetool info commit log size vs. configured limit |
| MemtableFlushWriter saturation | Pending or blocked flush tasks; commitlog segment count grows while flushes lag | nodetool tpstats MemtableFlushWriter pool |
| Sudden write burst or oversized mutations | Write request rate spikes; commitlog size grows faster than flush throughput | Client request rate and mutation sizes |
| Underprovisioned commitlog disk capacity | Filesystem usage at 100% on the commitlog volume; segment allocation fails at OS level | df -h on the commitlog mount |
Quick checks
# Check commitlog size and pending tasks
nodetool info | grep -E "Commit Log"
# Check flush pipeline saturation
nodetool tpstats | grep -E "Pool Name|MemtableFlushWriter"
# Check disk I/O on the commitlog device
iostat -x 2
# Check commitlog filesystem usage
df -h
# Check commitlog and failure policy configuration
grep -E "commitlog_total_space|commitlog_segment_size|commit_failure_policy" cassandra.yaml
How to diagnose it
- Confirm commitlog pressure. Run nodetool info and look for “Commit Log pending tasks” and “Commit Log size.” Sustained pending tasks > 0 and a size near commitlog_total_space confirm the write path is backing up at the WAL.
- Inspect the flush pipeline. Run nodetool tpstats and examine the MemtableFlushWriter pool. Pending > 0 means flushes are queuing. Nonzero Blocked or All time blocked counts mean the pool is at capacity.
- Correlate disk I/O. Run iostat -x 2 on the device backing commitlog_directory. If commitlog and data share a device, look for %util sustained above 80% or await above 10 ms on SSD (50 ms on HDD). Separation is recommended; on a shared device, compaction and reads starve commitlog fsyncs.
- Check segment allocation waits. Query JMX for org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnSegmentAllocation. Any nonzero value means new writes are blocked waiting for a free segment. This is where client timeouts begin.
- Determine if the volume is truly full. Run df -h against the commitlog filesystem. If it shows 100% use with no space available, the OS rejects allocations regardless of Cassandra’s internal limits. Free space by expanding the volume or moving commitlog to larger storage.
- Review configuration for mismatched limits. Check commitlog_total_space (or commitlog_total_space_in_mb on older versions) and commitlog_segment_size. Avoid setting the limit arbitrarily low; the default (8192 MiB, or a quarter of the commitlog volume if that is smaller) is sufficient for many workloads, but undersizing it relative to write volume forces aggressive flushes on lightly written tables.
- Identify the failure policy behavior. Check commit_failure_policy in cassandra.yaml. stop disables gossip and native transport. stop_commit allows reads but blocks writes. die kills the JVM. This determines what clients see and whether the node needs a restart to recover.
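The flush pipeline check can be scripted. The sketch below is a hypothetical helper that reads nodetool tpstats output on stdin and classifies the MemtableFlushWriter pool. It assumes the column order Pool Name, Active, Pending, Completed, Blocked, All time blocked; confirm that against the header line your Cassandra version prints.

```shell
# Classify the MemtableFlushWriter pool from `nodetool tpstats` output.
# Prints one of: blocked, queuing, ok, unknown.
flush_writer_state() {
    awk '
        $1 == "MemtableFlushWriter" {
            pending = $3    # assumed column: Pending
            blocked = $5    # assumed column: Blocked
            if (blocked > 0) {
                print "blocked"
            } else if (pending > 0) {
                print "queuing"
            } else {
                print "ok"
            }
            found = 1
        }
        END { if (!found) { print "unknown" } }
    '
}

# Usage:
#   nodetool tpstats | flush_writer_state
```

A result of unknown means the pool name was not found, which usually indicates a different output format rather than a healthy node.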
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| TotalCommitLogSize (JMX) | Current byte usage of the commitlog | Approaching commitlog_total_space for sustained periods |
| CommitLog PendingTasks (JMX/nodetool) | Mutations queued waiting for commitlog sync | Sustained value > 0 for more than 60 seconds |
| WaitingOnSegmentAllocation (JMX) | New writes blocked until a segment is freed | Any nonzero value indicates segment exhaustion |
| MemtableFlushWriter pending/blocked (nodetool tpstats) | Ability to recycle segments via flush | Pending > 0 or Blocked increasing |
| Disk I/O await/%util on commitlog device | Physical I/O capacity to sustain WAL append and fsync | await > 10 ms or %util > 80% sustained |
| Write latency P99 (coordinator) | Client-visible impact of write path backpressure | Sustained elevation above baseline or approaching timeout thresholds |
| Dropped MUTATION messages (nodetool tpstats) | Replica-level load shedding when queues overflow | Nonzero rate sustained for more than 60 seconds |
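Several warning signs in this table are phrased as "sustained for more than 60 seconds." One way to express that condition, sketched here under the assumption that the metric is sampled at a fixed interval and written one value per line, is to require that the most recent N samples are all nonzero. The function name and the 10-second interval in the usage comment are illustrative.

```shell
# Reads one numeric sample per line on stdin.
# Prints "alert" if the last $1 samples are all greater than zero, else "ok".
sustained_nonzero() {
    window=$1
    awk -v n="$window" '
        { if ($1 > 0) { run++ } else { run = 0 } }
        END { if (run >= n) { print "alert" } else { print "ok" } }
    '
}

# Usage: with samples collected every 10 seconds (assumed), a window of 6
# corresponds to 60 seconds.
#   sustained_nonzero 6 < pending_tasks.samples
```

Counting only the trailing run avoids alerting on a burst that has already cleared.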
Fixes
Immediate relief: drain the node
If the node is impaired and you must recover commitlog disk space, run nodetool drain. This flushes all memtables and recycles commitlog segments.
Warning: drain stops the node from accepting writes, and the node must be restarted before it serves traffic again. Until then, requests whose consistency level cannot be met by the remaining replicas fail with UnavailableException. Do not delete commitlog files while Cassandra is running. If segments remain after drain, stop Cassandra before manually removing anything.
Separate commitlog from data
Move commitlog_directory to a dedicated volume and restart Cassandra. HDD deployments especially benefit from a separate spindle. Even on SSDs, isolating commitlog prevents compaction and reads from contending with WAL fsyncs. This is a configuration change requiring a rolling restart.
Increase commitlog_total_space
Raise commitlog_total_space (or commitlog_total_space_in_mb on older versions) to give the WAL more room to absorb bursts. Tradeoff: larger commitlogs increase restart replay time and reserve more disk exclusively for WAL. Do not set this above what the underlying volume can actually provide.
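To judge how much headroom a larger limit buys, a rough model is enough: if mutations arrive faster than flushes release segments, the commitlog grows at the difference between the two rates. The sketch below applies that model. It is a simplification (real segment recycling is not a smooth rate), and the function name and figures are illustrative.

```shell
# Seconds a burst can last before the commitlog cap is reached.
# Arguments: headroom in MiB, ingest rate in MiB/s, flush-release rate in MiB/s.
burst_seconds() {
    headroom_mib=$1
    ingest=$2
    flush=$3
    awk -v h="$headroom_mib" -v w="$ingest" -v f="$flush" 'BEGIN {
        if (w <= f) {
            print "unbounded"           # flushes keep up; no net growth
        } else {
            printf "%d\n", h / (w - f)  # truncated to whole seconds
        }
    }'
}

# Usage (assumed rates): 8192 MiB of headroom, 120 MiB/s in, 80 MiB/s released.
#   burst_seconds 8192 120 80
```

If the result is shorter than your typical burst, raising the limit only delays the stall; the flush pipeline or the disk is the real constraint.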
Add storage to the commitlog volume
If the filesystem is full, expand the underlying block device (LVM, cloud volume resize, etc.) and restart Cassandra if the mount requires it. If expansion is impossible, move commitlog_directory to a larger filesystem, update cassandra.yaml, and restart.
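After expanding or moving the volume, it helps to confirm the new usage figure before restarting. A minimal sketch, assuming POSIX df -P output (Capacity in the fifth column) and a commitlog path of /var/lib/cassandra/commitlog, which you should replace with your commitlog_directory:

```shell
# Extract the use percentage from `df -P <path>` output read on stdin.
# df -P prints a header line followed by one line per filesystem.
volume_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage (path is an assumption):
#   df -P /var/lib/cassandra/commitlog | volume_use_pct
```

The -P flag keeps each filesystem on a single line, which makes the column position reliable even when device names are long.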
Tune flush concurrency
If the storage subsystem has headroom but flush throughput is the bottleneck, increase memtable_flush_writers. The default is conservative; raise it only if iostat shows the commitlog device is not already saturated. More concurrent flush I/O helps segments recycle faster.
Reduce write pressure
Throttle client write rate, break large batches into smaller ones, and ensure individual mutations are sized well below the segment size. Large mutations consume disproportionate segment space and flush time.
Prevention
- Place commitlog_directory on a dedicated volume, physically separate from data directories where possible.
- Size commitlog_total_space to absorb normal diurnal write bursts without forced flushes on idle tables.
- Monitor TotalCommitLogSize as a time series to spot gradual growth toward the limit before hard blocking occurs.
- Keep memtable_flush_writers matched to storage IOPS; the default is conservative for fast SSDs.
- After any configuration change, verify that commitlog_segment_size is at least twice max_mutation_size.
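The last check in this list is a single comparison. The sketch below takes both sizes in KiB as arguments rather than parsing cassandra.yaml, because the setting names and units differ between Cassandra versions; the function name is illustrative.

```shell
# Succeeds (exit status 0) when the segment size is at least twice the
# maximum mutation size. Both arguments are whole KiB.
segment_fits_mutation() {
    segment_kib=$1
    max_mutation_kib=$2
    [ "$segment_kib" -ge $(( max_mutation_kib * 2 )) ]
}

# Usage (assumed values): a 32 MiB segment and a 16 MiB mutation limit.
#   if segment_fits_mutation 32768 16384; then
#       echo "segment size OK"
#   else
#       echo "raise commitlog_segment_size or lower max_mutation_size"
#   fi
```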
How Netdata helps
Netdata correlates per-device disk I/O so you can distinguish commitlog fsync latency from compaction I/O when volumes are separated. JVM heap and GC pause metrics sit alongside commitlog pending tasks and flush writer pool state. Write latency percentiles and dropped mutation rates show backpressure before clients fail. Disk space alerts on the commitlog volume fire independently from data directory alerts.
Related guides
- Cassandra adding and removing nodes safely: vnodes, tokens, and cleanup
- Cassandra node stuck in joining (UJ): bootstrap diagnosis
- Cassandra compaction strategies: STCS vs LCS vs TWCS vs UCS
- Cassandra clock skew: how NTP drift silently corrupts data
- Cassandra commitlog pending tasks: write-path I/O pressure
- Cassandra compaction death spiral: when writes outrun compaction throughput
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra zombie data resurrection: gc_grace_seconds and unrepaired tombstones
- Cassandra disk space exhaustion: emergency recovery when the data volume fills
- Cassandra dropped mutations: silent write loss and load shedding
- Cassandra dropped reads and other messages: reading nodetool tpstats Dropped
- Cassandra GC death spiral: long pauses, gossip flapping, and recovery







