Cassandra commitlog pending tasks: write-path I/O pressure

Sustained non-zero CommitLog PendingTasks means a Cassandra node’s write path is backing up. Every write is appended to the commitlog before the replica acknowledges it, and depending on the sync mode the acknowledgment either waits for the fsync or can stall behind it. When the fsync thread cannot keep up, mutations queue. This starts as elevated write latency; if the queue persists, it forces emergency memtable flushes, overwhelms the flush and compaction pipeline, and produces dropped mutations.

This is a durability bottleneck that affects every write the replica handles. Because the commitlog sits at the start of the write path, a slowdown cascades predictably: delayed acknowledgments, segment allocation pressure, forced flushes, then load shedding. The root cause is almost always I/O saturation on the commitlog device, an undersized or shared disk, or a mismatch between commitlog_sync mode and hardware.

Operators usually notice only after client write timeouts or dropped mutation alerts fire. By then the node has been under pressure for minutes. Treat CommitLog PendingTasks as an early warning, not background noise.

What this means

Cassandra’s write path is append-only. A replica receives a mutation and writes it sequentially to the current commitlog segment; when it acknowledges the write depends on the sync mode. Under commitlog_sync: periodic (the default), the sync thread fsyncs every commitlog_sync_period_in_ms (default 10000 ms, that is 10 seconds), and writes are acknowledged without waiting for the next sync unless syncs fall behind schedule, in which case writes block until the sync catches up. Under commitlog_sync: batch, the replica does not acknowledge a write until it has been physically synced to disk. In both modes, a sync that cannot keep up ends up gating acknowledgment.

CommitLog PendingTasks tracks mutations waiting for sync or segment allocation. A transient spike during a burst is normal, but sustained > 0 means the sync thread is falling behind. Segments fill faster than they are recycled. Segments cannot be discarded until every memtable that references them is flushed. If the flush pipeline is busy, commitlog space pressure builds, triggering WaitingOnSegmentAllocation and WaitingOnCommit. At that point the node is actively stalling writes.

With segments retained longer, the commitlog directory grows. Cassandra forces memtable flushes to free segments, but those flushes compete with compaction for disk I/O. If the commitlog shares a spindle with data directories, contention worsens: flushes write to the same device struggling to fsync the commitlog. The flush pipeline saturates, memtables grow, and the mutation stage drops messages. A slow disk becomes a cluster-wide write reliability risk.

flowchart TD
  A[Slow commitlog fsync] --> B[PendingTasks increases]
  B --> C[Write ack delayed]
  A --> D[Slow segment recycle]
  D --> E[Commitlog space pressure]
  E --> F[Forced memtable flushes]
  F --> G[Flush pipeline saturated]
  G --> H[Compaction debt rises]
  H --> I[Dropped mutations]

Common causes

Cause	What it looks like	First thing to check
Commitlog disk saturation (shared with data or slow storage)	PendingTasks > 0, high `w_await` or `%util` on the commitlog device, data disk may look normal	`iostat -x 1` on the commitlog device
`commitlog_sync: batch` on undersized I/O	High PendingTasks even at moderate throughput; every batch waits for a dedicated fsync	`commitlog_sync` mode in `cassandra.yaml`
Memtable flush bottleneck blocking segment recycle	Growing commitlog directory (`du -sh`), MemtableFlushWriter pending > 0 in `nodetool tpstats`	`nodetool tpstats` and `nodetool compactionstats`
Commitlog total space pressure	`WaitingOnSegmentAllocation` > 0, commitlog size approaching `commitlog_total_space_in_mb`	`du -sh` on the commitlog path and `df -h` on the volume
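The first row of the table can be checked with a small filter over iostat output. This is a minimal sketch, not a definitive tool: the function name iostat_device_stats and the device name nvme1n1 are hypothetical, and it assumes the extended report has a header line starting with "Device" and columns named w_await and %util, which is worth confirming against your sysstat version.

```shell
# Print "w_await %util" for one device from `iostat -x` output on stdin.
# Columns are located by header name because their positions differ
# between sysstat versions.
iostat_device_stats() {
  awk -v dev="$1" '
    $1 ~ /^Device/ {
      w = 0; u = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "w_await") w = i
        if ($i == "%util")   u = i
      }
      next
    }
    $1 == dev && w && u { print $w, $u }
  '
}

# Usage (nvme1n1 is a hypothetical commitlog device name):
#   iostat -x 1 3 | iostat_device_stats nvme1n1 | tail -n 1
```

The usage line takes three reports and keeps the last one because the first iostat report shows averages since boot rather than current load.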

Quick checks

# Check commitlog directory size
du -sh /var/lib/cassandra/commitlog

# Check commitlog backlog (PendingTasks) via JMX or your metrics pipeline
# MBean example: org.apache.cassandra.metrics:type=CommitLog,name=PendingTasks (confirm the name in your Cassandra version)

# Check thread pool saturation, especially MutationStage and MemtableFlushWriter
nodetool tpstats

# Check commitlog device I/O latency and utilization
iostat -x 1

# Check commitlog volume free space
df -h /var/lib/cassandra/commitlog

# Check whether compaction and flush are keeping up
nodetool compactionstats
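To turn the directory-size check into a number you can compare against commitlog_total_space_in_mb, a sketch along these lines may help. The helper names dir_size_mb and pct_of_limit are hypothetical, and the limit of 8192 in the usage line is only an example value; read the real setting from your cassandra.yaml.

```shell
# Size of a directory in whole MiB, as reported by du.
dir_size_mb() {
  du -sm "$1" | awk '{ print $1 }'
}

# Integer percentage of a limit that is in use.
pct_of_limit() {
  used_mb=$1
  limit_mb=$2
  echo $(( used_mb * 100 / limit_mb ))
}

# Usage (path taken from the quick checks; 8192 is an example limit):
#   pct_of_limit "$(dir_size_mb /var/lib/cassandra/commitlog)" 8192
```

For example, 6144 MiB used against a limit of 8192 MiB yields 75.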

How to diagnose it

Confirm the symptom is sustained. Sample CommitLog PendingTasks at 10-second intervals via JMX or your metrics pipeline. A transient spike during a bulk load differs from a sustained plateau.
Isolate the commitlog disk. Run iostat -x 1 on the commitlog device. Look for w_await > 10 ms on SSD or > 50 ms on HDD, or %util > 80%. If commitlog and data share the same device, I/O contention is likely.
Inspect the flush pipeline. In nodetool tpstats, check MemtableFlushWriter for pending or blocked tasks. If flushes back up, segments cannot be recycled. Run nodetool compactionstats to see if compaction tasks are accumulating.
Check segment allocation pressure. If your monitoring exposes JMX WaitingOnSegmentAllocation or WaitingOnCommit, any non-zero value indicates the commitlog cannot acquire or recycle segments. This is a stronger signal than PendingTasks alone.
Correlate with write-path errors. Check the dropped-message section of nodetool tpstats for MUTATION drops. Cross-reference with client write timeout metrics. If dropped mutations rise while commitlog pending stays high, the node is shedding load.
Review sync mode and throughput. Check commitlog_sync in cassandra.yaml. If set to batch, the commitlog thread fsyncs each group of writes before they are acknowledged instead of syncing on a timer. On spinning disks or variable-latency cloud storage, this often saturates the device even at moderate write rates.
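The tpstats checks in the steps above can be scripted. The following is a sketch under an assumption about the output layout: pool rows are taken to have the columns Pool Name, Active, Pending, Completed, Blocked, All time blocked, and the dropped-message section is taken to list the count in its second column. The function names pool_pending and dropped_mutations are hypothetical; verify the layout against your Cassandra version before relying on them.

```shell
# Print the Pending column for one thread pool from `nodetool tpstats`
# output on stdin (assumes Pending is the third column).
pool_pending() {
  awk -v pool="$1" '$1 == pool { print $3; exit }'
}

# Print the Dropped count for MUTATION messages from the same output
# (assumes the count is the second column of the dropped-message section).
dropped_mutations() {
  awk '$1 == "MUTATION" { print $2; exit }'
}

# Usage: capture one snapshot so all values come from the same moment.
#   stats=$(nodetool tpstats)
#   printf '%s\n' "$stats" | pool_pending MemtableFlushWriter
#   printf '%s\n' "$stats" | pool_pending MutationStage
#   printf '%s\n' "$stats" | dropped_mutations
```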

Metrics and signals to monitor

Signal	Why it matters	Warning sign
CommitLog PendingTasks	Direct measure of commitlog sync backlog	> 0 sustained for > 60 s
WaitingOnSegmentAllocation	Segment allocation blocked; flushes cannot free space fast enough	Any non-zero value
WaitingOnCommit	Mutations queued behind fsync	Any non-zero sustained
CommitLog TotalCommitLogSize	Growth means segments are retained and not recycled	Growing beyond steady-state baseline
MemtableFlushWriter pending	Flush backlog prevents segment reuse	> 0 sustained
MutationStage pending	Writes queuing behind the commitlog stage	> 0 sustained
Dropped MUTATION	Active write loss from overload	Any sustained non-zero rate
Disk `w_await` (commitlog device)	fsync latency directly gates write acknowledgments	> 10 ms on SSD, > 50 ms on HDD
Client write latency P99	End-user impact of commitlog delay	> 3x baseline or sustained > 100 ms
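The "sustained" warning signs in the table can be evaluated mechanically. As a rough sketch: with one PendingTasks sample every 10 seconds, six consecutive non-zero samples cover about a minute. The function name is_sustained is hypothetical, and how you obtain the samples (JMX, a metrics pipeline export) is left open.

```shell
# Read one numeric sample per line on stdin and succeed (exit 0) only if
# the most recent N samples are all greater than zero.
is_sustained() {
  awk -v need="$1" '
    { if ($1 + 0 > 0) run++; else run = 0 }
    END { exit ((run >= need) ? 0 : 1) }
  '
}

# Usage (samples.txt is a hypothetical file with one sample per line,
# oldest first, collected every 10 seconds):
#   if is_sustained 6 < samples.txt; then echo "commitlog backlog sustained"; fi
```

Counting only the trailing run of non-zero samples means a single zero resets the check, so a burst that clears on its own does not trigger it.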

Fixes

Move commitlog to a dedicated disk

The commitlog workload is purely sequential write and fsync; data directories handle random reads, large sequential compaction writes, and flushes. When they share a device, queue depth contention (and head movement on spinning disks) ruins fsync latency. Moving the commitlog requires provisioning a separate volume, pointing commitlog_directory in cassandra.yaml at it, and restarting the node. Do not do this during heavy write load without a maintenance window.

Switch from batch to periodic sync

If the workload does not require per-write-group durability, change commitlog_sync from batch to periodic. The default interval of 10 seconds batches fsyncs, dramatically reducing IOPS demand. The tradeoff is a larger window of unsynced writes, up to the sync period, that can be lost on power failure. For most workloads, periodic with a dedicated disk provides sufficient durability.
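Before changing the mode, it helps to see what the node is actually configured with. A minimal sketch, assuming settings sit at the start of a line in cassandra.yaml; the function name commitlog_sync_settings is hypothetical and the path in the usage line is an assumption that depends on how Cassandra was installed.

```shell
# Print the active (uncommented) commitlog_sync* settings from a
# cassandra.yaml file. Commented-out lines start with '#' and are skipped
# because the pattern is anchored to the start of the line.
commitlog_sync_settings() {
  grep -E '^commitlog_sync' "$1"
}

# Usage (the path is an assumption; adjust for your installation):
#   commitlog_sync_settings /etc/cassandra/cassandra.yaml
```

Matching on the prefix also picks up the related period or window settings, so the mode and its timing are shown together.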

Throttle write pressure

If a traffic spike or bulk load exceeds provisioned IOPS, reduce the incoming write rate at the application or coordinator level. Pause non-critical batch jobs, reduce unlogged batch sizes, or temporarily reroute traffic away from the affected replica. This buys time without restarting the node.

Increase flush concurrency

If flushes are too slow to recycle segments and the disk has unused IOPS headroom, increase memtable_flush_writers. This allows more concurrent flush threads. On a shared disk, additional flush writers increase contention rather than help.

Scale commitlog IOPS

On cloud or virtualized infrastructure, upgrade the commitlog volume to a higher-IOPS tier or move to local NVMe. Remote storage with high fsync variance is a common cause of commitlog backlog. On-premise, verify the disk is not degraded and no other services share the spindle.

Prevention

Dedicated commitlog disk. Provision a separate device at deployment time. Never share it with data, hints, or snapshots.
Monitor commitlog pending as a first-class signal. Include CommitLog PendingTasks in your paging alerts. Page on sustained > 0 for more than 60 seconds.
Size for peak fsync IOPS. Size the commitlog disk for peak write throughput and the fsync rate your sync mode generates at that peak, not for average throughput.
Validate sync mode against hardware. Only use batch sync if storage can sustain the fsync rate at peak write volume.
Watch the flush pipeline. Monitor MemtableFlushWriter pending tasks and commitlog size trends. Segment recycling depends on healthy flushes.

How Netdata helps

Correlate CommitLog PendingTasks with per-disk I/O latency and utilization to spot commitlog device saturation.
Track dropped mutations, write latency percentiles, and commitlog size to visualize the cascade from fsync delay to load shedding.
Alert on sustained commitlog backlog without manual JMX sampling.
Surface memtable flush and compaction pressure alongside commitlog metrics to distinguish disk saturation from flush pipeline failure.
Baseline write-path latency per node to distinguish normal spikes from sustained pressure before mutations drop.
