Cassandra commitlog pending tasks: write-path I/O pressure

Sustained non-zero CommitLog PendingTasks means a Cassandra node’s write path is backing up. Every write is appended to the commitlog, and the configured sync mode determines when that data is fsynced to disk relative to the replica's acknowledgment. When the sync thread cannot keep up, mutations queue. This starts as elevated write latency; if the queue persists, it forces emergency memtable flushes, overwhelms the flush and compaction pipeline, and produces dropped mutations.

This is a durability bottleneck that affects every write on the node. Because the commitlog sits at the start of the write path, a slowdown cascades predictably: delayed acknowledgments, segment allocation pressure, forced flushes, then load shedding. The root cause is almost always I/O saturation on the commitlog device, an undersized or shared disk, or a mismatch between commitlog_sync mode and hardware.

Operators usually notice only after client write timeouts or dropped mutation alerts fire. By then the node has been under pressure for minutes. Treat CommitLog PendingTasks as an early warning, not background noise.

What this means

Cassandra’s write path is append-only. A replica receives a mutation and writes it sequentially to the current commitlog segment; the sync strategy then decides when the write counts as durable. Under commitlog_sync: periodic (the default), writes are acknowledged once appended, and the sync thread fsyncs every commitlog_sync_period_in_ms (default 10000 ms, that is, 10 seconds); writers stall only when a sync falls behind schedule. Under commitlog_sync: batch, the replica does not acknowledge a write until its write group has been physically synced. In both modes, a sync thread that falls behind ends up gating acknowledgment.

CommitLog PendingTasks counts mutations that have been written to the commitlog but not yet synced. A transient spike during a burst is normal, but a sustained value above 0 means the sync thread is falling behind. Meanwhile, segments fill faster than they are recycled, because a segment cannot be discarded until every memtable that references it is flushed. If the flush pipeline is busy, commitlog space pressure builds and writers start waiting, which shows up in the WaitingOnSegmentAllocation and WaitingOnCommit metrics. At that point the node is actively stalling writes.

With segments retained longer, the commitlog directory grows. Cassandra forces memtable flushes to free segments, but those flushes compete with compaction for disk I/O. If the commitlog shares a spindle with data directories, contention worsens: flushes write to the same device struggling to fsync the commitlog. The flush pipeline saturates, memtables grow, and the mutation stage drops messages. A slow disk becomes a cluster-wide write reliability risk.

flowchart TD
  A[Slow commitlog fsync] --> B[PendingTasks increases]
  B --> C[Write ack delayed]
  A --> D[Slow segment recycle]
  D --> E[Commitlog space pressure]
  E --> F[Forced memtable flushes]
  F --> G[Flush pipeline saturated]
  G --> H[Compaction debt rises]
  H --> I[Dropped mutations]
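
The directory growth described above can be tracked against the configured cap. The following is a minimal sketch, not an official tool: the function name commitlog_usage is hypothetical, and the cap must be supplied by the caller from commitlog_total_space_in_mb in cassandra.yaml. It sums apparent file sizes, which can differ slightly from what du reports.

```python
import os


def commitlog_usage(path, total_space_mb):
    """Return (used_mb, fraction_of_cap) for a commitlog directory.

    path           -- commitlog directory, e.g. /var/lib/cassandra/commitlog
    total_space_mb -- value of commitlog_total_space_in_mb (assumed to be
                      looked up by the caller; this sketch does not read
                      cassandra.yaml)
    """
    used_bytes = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Count regular files only; segments live directly in this directory.
            if entry.is_file(follow_symlinks=False):
                used_bytes += entry.stat(follow_symlinks=False).st_size
    used_mb = used_bytes / (1024 * 1024)
    return used_mb, used_mb / total_space_mb
```

Sampling this periodically and watching the fraction trend toward 1.0 gives an early view of segment retention, before WaitingOnSegmentAllocation becomes visible.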

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Commitlog disk saturation (shared with data or slow storage) | PendingTasks > 0, high w_await or %util on the commitlog device, data disk may look normal | iostat -x 1 on the commitlog device |
| commitlog_sync: batch on undersized I/O | High PendingTasks even at moderate throughput; every batch waits for a dedicated fsync | commitlog_sync mode in cassandra.yaml |
| Memtable flush bottleneck blocking segment recycle | Growing commitlog directory (du -sh), MemtableFlushWriter pending > 0 in nodetool tpstats | nodetool tpstats and nodetool compactionstats |
| Commitlog total space pressure | WaitingOnSegmentAllocation > 0, commitlog size approaching commitlog_total_space_in_mb | du -sh on the commitlog path and df -h on the volume |

Quick checks

# Check commitlog directory size
du -sh /var/lib/cassandra/commitlog

# Check commitlog backlog (PendingTasks) via JMX or your metrics pipeline
# MBean example: org.apache.cassandra.metrics:type=CommitLog,name=PendingTasks (verify the name and case for your version)

# Check thread pool saturation, especially MutationStage and MemtableFlushWriter
nodetool tpstats

# Check commitlog device I/O latency and utilization
iostat -x 1

# Check commitlog volume free space
df -h

# Check whether compaction and flush are keeping up
nodetool compactionstats
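
When these checks are repeated often, it can help to pull the relevant rows out of nodetool tpstats programmatically. The sketch below assumes the common whitespace-separated layout (a "Pool Name" table followed by a "Message type" table); the column layout varies between Cassandra versions, the function name parse_tpstats is hypothetical, and the sample text is invented for illustration rather than captured from a real node.

```python
def parse_tpstats(text):
    """Parse `nodetool tpstats` output into (pools, dropped) dictionaries."""
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.startswith("Pool Name"):
            section = "pools"
        elif line.startswith("Message type"):
            section = "dropped"
        elif section == "pools" and len(fields) >= 5 and fields[1].isdigit():
            # Columns assumed: name, active, pending, completed, blocked, ...
            pools[fields[0]] = {
                "active": int(fields[1]),
                "pending": int(fields[2]),
                "blocked": int(fields[4]),
            }
        elif section == "dropped" and len(fields) >= 2 and fields[1].isdigit():
            # Columns assumed: message type, dropped count, ...
            dropped[fields[0]] = int(fields[1])
    return pools, dropped


# Illustrative sample only; real output has more pools and message types.
SAMPLE = """\
Pool Name                    Active   Pending   Completed   Blocked  All time blocked
MutationStage                    32       418     9120344         0                 0
MemtableFlushWriter               2         7        1831         0                 0

Message type           Dropped
READ                         0
MUTATION                   153
"""

pools, dropped = parse_tpstats(SAMPLE)
print(pools["MutationStage"]["pending"], pools["MemtableFlushWriter"]["pending"], dropped["MUTATION"])
```

On a live node the input text could come from subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout.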

How to diagnose it

  1. Confirm the symptom is sustained. Sample CommitLog PendingTasks at 10-second intervals via JMX or your metrics pipeline. A transient spike during a bulk load differs from a sustained plateau.
  2. Isolate the commitlog disk. Run iostat -x 1 on the commitlog device. Look for w_await > 10 ms on SSD or > 50 ms on HDD, or %util > 80%. If commitlog and data share the same device, I/O contention is likely.
  3. Inspect the flush pipeline. In nodetool tpstats, check MemtableFlushWriter for pending or blocked tasks. If flushes back up, segments cannot be recycled. Run nodetool compactionstats to see if compaction tasks are accumulating.
  4. Check segment allocation pressure. If your monitoring exposes JMX WaitingOnSegmentAllocation or WaitingOnCommit, any non-zero value indicates the commitlog cannot acquire or recycle segments. This is a stronger signal than PendingTasks alone.
  5. Correlate with write-path errors. Check nodetool tpstats Dropped section for MUTATION drops. Cross-reference with client write timeout metrics. If dropped mutations rise while commitlog pending stays high, the node is shedding load.
  6. Review sync mode and throughput. Check commitlog_sync in cassandra.yaml. If set to batch, every write group waits for its own fsync instead of sharing one fsync per sync period. On spinning disks or variable-latency cloud storage, this often saturates the device even at moderate write rates.
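
Step 1 above, telling a transient spike from a sustained plateau, can be sketched as a small check over sampled values. This is an illustration under stated assumptions: the function names are hypothetical, samples are PendingTasks readings taken at a fixed interval (10 seconds here), and the 60-second threshold mirrors the alerting guidance in this article.

```python
def longest_backlog_span_s(samples, interval_s=10):
    """Longest stretch (in seconds) of consecutive non-zero samples.

    A run of n consecutive samples spans (n - 1) * interval_s seconds
    between its first and last reading.
    """
    longest = run = 0
    for value in samples:
        run = run + 1 if value > 0 else 0
        longest = max(longest, run)
    return max(longest - 1, 0) * interval_s


def is_sustained(samples, interval_s=10, min_duration_s=60):
    """True when the backlog stayed above zero for more than min_duration_s."""
    return longest_backlog_span_s(samples, interval_s) > min_duration_s


burst = [0, 5, 40, 3, 0, 0, 0, 0]   # short spike during a bulk load
plateau = [0] + [12] * 8            # backlog that never drains
print(is_sustained(burst), is_sustained(plateau))
```

Using the longest run rather than the average keeps a single large spike from being mistaken for a plateau.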

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| CommitLog PendingTasks | Direct measure of commitlog sync backlog | > 0 sustained for > 60 s |
| WaitingOnSegmentAllocation | Segment allocation blocked; flushes cannot free space fast enough | Any non-zero value |
| WaitingOnCommit | Mutations queued behind fsync | Any non-zero sustained |
| CommitLog TotalCommitLogSize | Growth means segments are retained and not recycled | Growing beyond steady-state baseline |
| MemtableFlushWriter pending | Flush backlog prevents segment reuse | > 0 sustained |
| MutationStage pending | Writes queuing behind the commitlog stage | > 0 sustained |
| Dropped MUTATION | Active write loss from overload | Any sustained non-zero rate |
| Disk w_await (commitlog device) | fsync latency directly gates write acknowledgments | > 10 ms on SSD, > 50 ms on HDD |
| Client write latency P99 | End-user impact of commitlog delay | > 3x baseline or sustained > 100 ms |
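
The warning signs in this table can be encoded as a simple rule check. The sketch below is one possible shape, not a reference implementation: the WritePathSnapshot type and its field names are hypothetical, the caller is assumed to have already decided what counts as "sustained", and signals that need a baseline trend (such as TotalCommitLogSize) are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class WritePathSnapshot:
    pending_tasks_sustained: bool      # PendingTasks > 0 for > 60 s
    segment_allocation_waits: int      # new WaitingOnSegmentAllocation events
    flush_pending_sustained: bool      # MemtableFlushWriter pending > 0
    mutation_pending_sustained: bool   # MutationStage pending > 0
    dropped_mutations_per_s: float
    commitlog_w_await_ms: float
    commitlog_is_ssd: bool
    write_p99_ms: float
    baseline_write_p99_ms: float


def warning_signs(s):
    """Return the names of table signals whose warning threshold is crossed."""
    signs = []
    if s.pending_tasks_sustained:
        signs.append("CommitLog PendingTasks")
    if s.segment_allocation_waits > 0:
        signs.append("WaitingOnSegmentAllocation")
    if s.flush_pending_sustained:
        signs.append("MemtableFlushWriter pending")
    if s.mutation_pending_sustained:
        signs.append("MutationStage pending")
    if s.dropped_mutations_per_s > 0:
        signs.append("Dropped MUTATION")
    # 10 ms on SSD, 50 ms on HDD, as in the table.
    if s.commitlog_w_await_ms > (10 if s.commitlog_is_ssd else 50):
        signs.append("Disk w_await (commitlog device)")
    if s.write_p99_ms > 3 * s.baseline_write_p99_ms or s.write_p99_ms > 100:
        signs.append("Client write latency P99")
    return signs


healthy = WritePathSnapshot(False, 0, False, False, 0.0, 2.0, True, 8.0, 6.0)
stressed = WritePathSnapshot(True, 3, True, False, 0.0, 25.0, True, 40.0, 10.0)
print(warning_signs(healthy))
print(warning_signs(stressed))
```

Returning the list of tripped signals, rather than a single boolean, keeps the distinction between disk saturation and flush pipeline trouble visible to whoever reads the alert.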

Fixes

Move commitlog to a dedicated disk

The commitlog workload is purely sequential write and fsync; data directories handle random reads, large sequential compaction writes, and flushes. When they share a device, head movement (on spinning disks) and queue depth contention drive up fsync latency. The move requires provisioning a separate volume for the commitlog path, pointing commitlog_directory in cassandra.yaml at it, and restarting the node. Do not do this during heavy write load without a maintenance window.

Switch from batch to periodic sync

If the workload does not require per-write-group durability, change commitlog_sync from batch to periodic. The default interval of 10 seconds batches fsyncs, dramatically reducing IOPS demand. The tradeoff is a larger window of uncommitted data on power loss. For most workloads, periodic with a dedicated disk provides sufficient durability.
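
The tradeoff can be put into rough numbers. The following is a back-of-envelope sketch, not a model of Cassandra internals: the function names are hypothetical, the 20 MB/s write rate and 10 ms fsync latency are illustrative inputs, and the batch figure assumes syncs run back to back with nothing else competing for the device.

```python
def periodic_sync_tradeoff(write_mb_per_s, sync_period_ms=10_000):
    """Return (fsyncs_per_second, worst_case_unsynced_mb) for periodic mode.

    Up to one full sync period of commitlog writes on this node may not
    have reached disk when power is lost.
    """
    fsyncs_per_s = 1000 / sync_period_ms
    unsynced_mb = write_mb_per_s * sync_period_ms / 1000
    return fsyncs_per_s, unsynced_mb


def batch_sync_ceiling(fsync_latency_ms):
    """Upper bound on sync groups per second when every group needs an fsync."""
    return 1000 / fsync_latency_ms


print(periodic_sync_tradeoff(20))   # 20 MB/s of commitlog writes, 10 s period
print(batch_sync_ceiling(10))       # device that takes 10 ms per fsync
```

With these inputs, periodic mode issues one fsync every 10 seconds and leaves up to about 200 MB unsynced on the node, while batch mode is capped at roughly 100 sync groups per second by the device.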

Throttle write pressure

If a traffic spike or bulk load exceeds provisioned IOPS, reduce the incoming write rate at the application or coordinator level. Pause non-critical batch jobs, reduce unlogged batch sizes, or temporarily reroute traffic away from the affected replica. This buys time without restarting the node.

Increase flush concurrency

If flushes are too slow to recycle segments and the disk has unused IOPS headroom, increase memtable_flush_writers. This allows more concurrent flush threads. On a shared disk, additional flush writers increase contention rather than help.

Scale commitlog IOPS

On cloud or virtualized infrastructure, upgrade the commitlog volume to a higher-IOPS tier or move to local NVMe. Remote storage with high fsync variance commonly causes commitlog backup. On-premise, verify the disk is not degraded and no other services share the spindle.
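
Before and after changing storage, a rough fsync latency probe can show whether the volume behaves as expected. This is a minimal sketch rather than a benchmark: probe_fsync_latency is a hypothetical helper, the percentile is approximate, and it should be pointed at a scratch directory on the commitlog volume (not the live commitlog directory itself), ideally while the node is not already struggling, since the probe adds its own I/O.

```python
import os
import statistics
import tempfile
import time


def probe_fsync_latency(directory, writes=50, payload_bytes=4096):
    """Append small payloads to a temp file and time each os.fsync call."""
    payload = b"\0" * payload_bytes
    latencies_ms = []
    fd, path = tempfile.mkstemp(prefix="fsync_probe_", dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the appended data to stable storage
            latencies_ms.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.unlink(path)  # always remove the probe file
    latencies_ms.sort()
    return {
        "median_ms": statistics.median(latencies_ms),
        # Approximate 95th percentile by index into the sorted samples.
        "p95_ms": latencies_ms[int(0.95 * (len(latencies_ms) - 1))],
        "max_ms": latencies_ms[-1],
    }
```

A wide gap between the median and the maximum suggests the high fsync variance described above; a dedicated tool such as fio gives more rigorous numbers.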

Prevention

  • Dedicated commitlog disk. Provision a separate device at deployment time. Never share it with data, hints, or snapshots.
  • Monitor commitlog pending as a first-class signal. Include CommitLog PendingTasks in your paging alerts. Page on sustained > 0 for more than 60 seconds.
  • Size for peak fsync IOPS. Size the commitlog disk for the fsync rate and write bandwidth it must sustain at peak write volume, not for average throughput.
  • Validate sync mode against hardware. Only use batch sync if storage can sustain the fsync rate at peak write volume.
  • Watch the flush pipeline. Monitor MemtableFlushWriter pending tasks and commitlog size trends. Segment recycling depends on healthy flushes.

How Netdata helps

  • Correlate CommitLog PendingTasks with per-disk I/O latency and utilization to spot commitlog device saturation.
  • Track dropped mutations, write latency percentiles, and commitlog size to visualize the cascade from fsync delay to load shedding.
  • Alert on sustained commitlog backlog without manual JMX sampling.
  • Surface memtable flush and compaction pressure alongside commitlog metrics to distinguish disk saturation from flush pipeline failure.
  • Baseline write-path latency per node to distinguish normal spikes from sustained pressure before mutations drop.