Cassandra commitlog disk full: segment exhaustion and forced flushes

Taken together, WriteTimeoutException from client drivers, a node still UP in gossip, and a commitlog volume nearing 100% with climbing pending tasks in nodetool info point to commitlog segment exhaustion. Unlike data disk exhaustion, which slows compaction, commitlog pressure blocks the write path directly: every mutation must be durably appended to the WAL before acknowledgment.

Cassandra recycles commitlog segments only after all memtables they reference are flushed to SSTables. When the flush pipeline cannot keep pace, segments accumulate until the total size exceeds commitlog_total_space (or commitlog_total_space_in_mb on older versions) or the filesystem fills. Cassandra then forces flushes of every dirty column family referenced in the oldest segment to free space. If flushes are already backed up, this cascades into blocked segment allocation, dropped mutations, and, depending on commit_failure_policy, a node that stops accepting writes entirely.

Recovery requires distinguishing between a tight configuration limit, I/O saturation on a shared disk, and genuine filesystem exhaustion. Do not delete commitlog files while Cassandra is running. Relieve pressure by draining the node, expanding capacity, or tuning the flush pipeline, then fix the root cause so segments recycle faster than they are allocated.

What this means

Cassandra’s commitlog is an append-only WAL organized into segments. When the active segment fills, the CommitLogSegmentManager allocates a new one. If the total on-disk commitlog size exceeds commitlog_total_space (or commitlog_total_space_in_mb on older versions), Cassandra triggers a forced flush of every dirty column family referenced in the oldest segment so that segment can be deleted and its space reused. This is normal backpressure.

The problem arises when the flush pipeline is already saturated. Flushes run on the MemtableFlushWriter thread pool. If pending flushes accumulate, the oldest segments remain pinned until those flushes complete. New writes continue arriving, allocating fresh segments, while old ones cannot be freed. The JMX metric WaitingOnSegmentAllocation climbs. Eventually writes stall, coordinators time out, and replicas may begin dropping mutations. If commitlog writes then fail outright, commit_failure_policy decides what happens next: the node may stop committing writes (stop_commit), shut down gossip and native transport (stop), or kill the JVM (die). The default is stop, which leaves the node in a zombie state: the JVM is still running and reachable over JMX, but the node no longer gossips or accepts client requests.

If the commitlog volume itself runs out of filesystem space, segment allocation fails at the OS level. Even with a generous commitlog_total_space setting, the volume cap wins. Cassandra has no room to allocate new segments and the write path blocks immediately.

flowchart TD
    A[Write appends to commitlog] --> B{No free segments}
    B --> C[Force flush oldest dirty tables]
    C --> D[Flush pipeline saturated]
    D --> E[Segments stay pinned]
    E --> F[WaitingOnSegmentAllocation rises]
    F --> G[Write latency spikes]
    G --> H[Mutations dropped or blocked]
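
A rough way to reason about how quickly pinned segments become a problem is to divide the remaining headroom by the net growth rate. The sketch below assumes constant write and recycle rates that you measure yourself; the function name seconds_until_limit and its arguments are illustrative and not part of Cassandra.

```shell
# Estimate seconds until the commitlog reaches its size limit.
# Arguments (hypothetical inputs you measure yourself):
#   $1 limit in MiB, $2 current size in MiB,
#   $3 write rate in MiB/s, $4 recycle (flush) rate in MiB/s
seconds_until_limit() {
  awk -v limit="$1" -v cur="$2" -v in_rate="$3" -v out_rate="$4" 'BEGIN {
    net = in_rate - out_rate
    # Segments recycle at least as fast as they fill: no exhaustion expected.
    if (net <= 0) { print "never"; exit }
    printf "%d\n", (limit - cur) / net
  }'
}

# Example: 8192 MiB limit, 2048 MiB used, 40 MiB/s written, 10 MiB/s recycled
# seconds_until_limit 8192 2048 40 10    # prints 204
```

Real workloads are bursty, so treat the result as an order-of-magnitude estimate of how long the node can absorb a flush backlog before writes start blocking.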

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Commitlog and data share the same disk | High I/O wait on the shared device; flush latency correlates with compaction bursts | iostat -x 2 on the commitlog device |
| commitlog_total_space set too low for the workload | Frequent forced flushes even at moderate write rates; lightly written tables flushed unnecessarily | nodetool info commit log size vs. configured limit |
| MemtableFlushWriter saturation | Pending or blocked flush tasks; commitlog segment count grows while flushes lag | nodetool tpstats MemtableFlushWriter pool |
| Sudden write burst or oversized mutations | Write request rate spikes; commitlog size grows faster than flush throughput | Client request rate and mutation sizes |
| Underprovisioned commitlog disk capacity | Filesystem usage at 100% on the commitlog volume; segment allocation fails at OS level | df -h on the commitlog mount |

Quick checks

# Check commitlog size and pending tasks
nodetool info | grep -E "Commit Log"

# Check flush pipeline saturation
nodetool tpstats | grep -E "Pool Name|MemtableFlushWriter"

# Check disk I/O on the commitlog device
iostat -x 2

# Check commitlog filesystem usage
df -h

# Check commitlog and failure policy configuration
grep -E "commitlog_total_space|commitlog_segment_size|commit_failure_policy" cassandra.yaml

How to diagnose it

  1. Confirm commitlog pressure. Run nodetool info and look for “Commit Log pending tasks” and “Commit Log size.” Sustained pending tasks > 0 and a size near commitlog_total_space confirm the write path is backing up at the WAL.
  2. Inspect the flush pipeline. Run nodetool tpstats and examine the MemtableFlushWriter pool. Pending > 0 means flushes are queuing. Blocked or all-time-blocked counts mean the pool is at capacity.
  3. Correlate disk I/O. Run iostat -x 2 on the device backing commitlog_directory. If commitlog and data share a device, look for %util sustained above 80% or await above 10 ms on SSD (50 ms on HDD). Separation is recommended; if they share a spindle, compaction and reads starve commitlog fsyncs.
  4. Check segment allocation waits. Query JMX for org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnSegmentAllocation. Any nonzero value means new writes are blocked waiting for a free segment. This is where client timeouts begin.
  5. Determine if the volume is truly full. Run df -h against the commitlog filesystem. If Use% is 100% and no space is available, the OS rejects allocations regardless of Cassandra’s internal limits. Free space by expanding the volume or moving commitlog to larger storage.
  6. Review configuration for mismatched limits. Check commitlog_total_space (or commitlog_total_space_in_mb on older versions) and commitlog_segment_size. Avoid setting the limit arbitrarily low; the default (the smaller of 8192 MiB and one quarter of the commitlog volume) is sufficient for many workloads, but undersizing it relative to write volume forces aggressive flushes on lightly written tables.
  7. Identify the failure policy behavior. Check commit_failure_policy in cassandra.yaml. stop disables gossip and native transport. stop_commit allows reads but blocks writes. die kills the JVM. This determines what clients see and whether the node needs a restart to recover.
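
Step 2 above can be scripted so the flush pool state is easy to log or alert on. This is a minimal sketch that assumes the usual nodetool tpstats column order (Pool Name, Active, Pending, Completed, Blocked, All time blocked); the helper name flush_pool_status is hypothetical, and the column layout should be verified against your Cassandra version.

```shell
# Read `nodetool tpstats` output on stdin and summarize the flush pool.
# Assumed column order:
#   Pool Name, Active, Pending, Completed, Blocked, All time blocked
flush_pool_status() {
  awk '$1 == "MemtableFlushWriter" {
    found = 1
    printf "pending=%s blocked=%s all_time_blocked=%s\n", $3, $5, $6
  }
  END { if (!found) print "MemtableFlushWriter row not found" }'
}

# Usage on a live node:
# nodetool tpstats | flush_pool_status
```

Reading from stdin keeps the parser independent of nodetool, so it can also be run against saved tpstats output collected during an incident.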

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| TotalCommitLogSize (JMX) | Current byte usage of the commitlog | Approaching commitlog_total_space for sustained periods |
| CommitLog PendingTasks (JMX/nodetool) | Mutations queued waiting for commitlog sync | Sustained value > 0 for more than 60 seconds |
| WaitingOnSegmentAllocation (JMX) | New writes blocked until a segment is freed | Any nonzero value indicates segment exhaustion |
| MemtableFlushWriter pending/blocked (nodetool tpstats) | Ability to recycle segments via flush | Pending > 0 or Blocked increasing |
| Disk I/O await/%util on commitlog device | Physical I/O capacity to sustain WAL append and fsync | await > 10 ms or %util > 80% sustained |
| Write latency P99 (coordinator) | Client-visible impact of write path backpressure | Sustained elevation above baseline or approaching timeout thresholds |
| Dropped MUTATION messages (nodetool tpstats) | Replica-level load shedding when queues overflow | Nonzero rate sustained for more than 60 seconds |
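
Several of these warning signs depend on a value staying above zero for a sustained period rather than spiking once. One way to express that, assuming you already collect one sample per line at a fixed interval, is the hypothetical helper below; how the samples are gathered (JMX exporter, nodetool, or a monitoring agent) is left open.

```shell
# Read one numeric sample per line on stdin (oldest first) and report
# whether the most recent $1 samples were all above zero.
# Sampling every 10 s with $1=6 approximates "nonzero for 60 seconds".
sustained_nonzero() {
  awk -v need="$1" '
    { run = ($1 > 0) ? run + 1 : 0 }   # length of the current nonzero streak
    END { if (run >= need) print "sustained"; else print "ok" }'
}

# Example: pending-task samples taken every 10 seconds
# printf '0\n3\n1\n2\n' | sustained_nonzero 3
```

Counting only the trailing streak means a single zero sample resets the check, which avoids alerting on brief bursts that the commitlog absorbs normally.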

Fixes

Immediate relief: drain the node

If the node is impaired and you must recover commitlog disk space, run nodetool drain. This flushes all memtables and recycles commitlog segments.

Warning: drain stops the node from accepting writes, and a drained node must be restarted before it serves traffic again. Until then, clients can see UnavailableException when the remaining replicas cannot satisfy the requested consistency level. Do not delete commitlog files while Cassandra is running. If segments remain after drain, stop Cassandra before manually removing anything.
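
The order of operations matters here, so it can help to rehearse it before running it. The sketch below prints the commands by default and executes them only when DRY_RUN=0 is set. The systemd unit name cassandra is an assumption, and the run and drain_and_stop helpers are illustrative names.

```shell
# Print (DRY_RUN=1, the default) or execute a command.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

drain_and_stop() {
  # Flush memtables and stop accepting writes; abort if drain fails.
  run nodetool drain || return 1
  # Stop the process before touching anything in the commitlog directory.
  # The unit name "cassandra" is an assumption; adjust it for your install.
  run systemctl stop cassandra
}

# Rehearse:  drain_and_stop
# Execute:   DRY_RUN=0 drain_and_stop
```

Aborting when drain fails keeps the sequence from stopping a node whose memtables were never flushed, which would leave more commitlog to replay on restart.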

Separate commitlog from data

Move commitlog_directory to a dedicated volume and restart Cassandra. HDD deployments especially benefit from a separate spindle. Even on SSDs, isolating commitlog prevents compaction and reads from contending with WAL fsyncs. This is a configuration change requiring a rolling restart.
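
A quick way to check whether two directories currently sit on the same filesystem is to compare their device numbers. The sketch assumes GNU stat (BSD stat uses different flag syntax), and the paths in the usage comment are placeholders to replace with the values from your cassandra.yaml. Different device numbers only show that the filesystems differ; two partitions or logical volumes can still share one physical disk.

```shell
# Succeeds (exit 0) when both paths live on the same filesystem device.
# Relies on GNU stat, where %d prints the device number.
same_device() {
  [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Usage (both paths are assumptions; read yours from cassandra.yaml):
# if same_device /var/lib/cassandra/commitlog /var/lib/cassandra/data; then
#   echo "commitlog and data share a filesystem"
# fi
```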

Increase commitlog_total_space

Raise commitlog_total_space (or commitlog_total_space_in_mb on older versions) to give the WAL more room to absorb bursts. Tradeoff: larger commitlogs increase restart replay time and reserve more disk exclusively for WAL. Do not set this above what the underlying volume can actually provide.
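
Before raising the limit, it is worth confirming that the proposed value fits on the volume at all. The hypothetical helper below compares a limit in MiB against the total size of the filesystem that holds a given path, using POSIX df output. It is a lower bound on safety only: it does not account for other files on the volume or for the margin you want to keep free.

```shell
# Succeeds when a proposed limit (MiB, $1) is no larger than the total
# size of the filesystem holding path $2.
limit_fits_volume() {
  # df -Pk prints one line per filesystem; column 2 is the size in KiB.
  fs_kib=$(df -Pk "$2" | awk 'NR == 2 { print $2 }')
  [ $(( $1 * 1024 )) -le "$fs_kib" ]
}

# Usage (the path is an assumption; use your commitlog_directory):
# limit_fits_volume 16384 /var/lib/cassandra/commitlog || echo "limit exceeds volume size"
```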

Add storage to the commitlog volume

If the filesystem is full, expand the underlying block device (LVM, cloud volume resize, etc.) and restart Cassandra if the mount requires it. If expansion is impossible, move commitlog_directory to a larger filesystem, update cassandra.yaml, and restart.

Tune flush concurrency

If the storage subsystem has headroom but flush throughput is the bottleneck, increase memtable_flush_writers. The default is conservative; raise it only if iostat shows that the devices receiving the flushed SSTables (the data directories, plus the commitlog device if it is shared) are not already saturated. More concurrent flush I/O helps segments recycle faster.

Reduce write pressure

Throttle client write rate, break large batches into smaller ones, and ensure individual mutations are sized well below the segment size. Large mutations consume disproportionate segment space and flush time.

Prevention

  • Place commitlog_directory on a dedicated volume, physically separate from data directories where possible.
  • Size commitlog_total_space to absorb normal diurnal write bursts without forced flushes on idle tables.
  • Monitor TotalCommitLogSize as a time series to spot gradual growth toward the limit before hard blocking occurs.
  • Keep memtable_flush_writers matched to storage IOPS; the default is conservative for fast SSDs.
  • After any configuration change, verify that commitlog_segment_size is at least twice max_mutation_size.
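
The last check in the list is simple arithmetic and easy to automate after a configuration change. The sketch below takes the segment size in MiB and the maximum mutation size in KiB, units chosen to match the older commitlog_segment_size_in_mb and max_mutation_size_in_kb settings; the function name segment_size_ok is illustrative, and newer versions that accept unit suffixes would need the values converted first.

```shell
# Succeeds when the segment size (MiB, $1) is at least twice the maximum
# mutation size (KiB, $2).
segment_size_ok() {
  [ $(( $1 * 1024 )) -ge $(( $2 * 2 )) ]
}

# Example: a 32 MiB segment with a 16384 KiB (16 MiB) mutation cap
# segment_size_ok 32 16384 && echo "ok"
```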

How Netdata helps

Netdata correlates per-device disk I/O so you can distinguish commitlog fsync latency from compaction I/O when volumes are separated. JVM heap and GC pause metrics sit alongside commitlog pending tasks and flush writer pool state. Write latency percentiles and dropped mutation rates show backpressure before clients fail. Disk space alerts on the commitlog volume fire independently from data directory alerts.