Cassandra WriteTimeoutException: coordinator write timeouts and writeType

A WriteTimeoutException means the coordinator did not receive enough replica acknowledgments before write_request_timeout_in_ms expired. It does not mean the write failed; one or more replicas may have already persisted the mutation, so the outcome is ambiguous. The writeType field determines whether a client-side retry is safe.

Diagnosing it takes three steps: distinguish a slow replica from an unavailable one, understand the idempotency contract for the write type, and correlate coordinator timeouts with replica-side saturation. The default write_request_timeout_in_ms is 2000 ms. If too few replicas append to the commitlog, update the memtable, and acknowledge within that window, the coordinator throws this exception.

What this means

The coordinator forwards the write to the partition replicas and waits for blockFor acknowledgments. If it receives fewer than blockFor before the timeout fires, it throws WriteTimeoutException. The exception includes the coordinator address, consistency level, acknowledgments received, and writeType.
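
For QUORUM writes, blockFor follows directly from the replication factor. The arithmetic can be sketched as below; the function name is illustrative, and other consistency levels (ONE, ALL, EACH_QUORUM, and so on) use different rules.

```shell
# Hypothetical helper: number of acknowledgments a QUORUM write waits for,
# given the keyspace replication factor. Uses integer division.
quorum_block_for() {
  local rf=$1
  echo $(( rf / 2 + 1 ))
}

quorum_block_for 3   # prints 2
quorum_block_for 5   # prints 3
```

With a replication factor of 3, a QUORUM write tolerates one slow replica; a second slow replica is enough to trigger the timeout.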

writeType values:

  • SIMPLE: a standard single-partition write.
  • BATCH: a logged atomic batch.
  • COUNTER: a counter increment.
  • BATCH_LOG: an internal write to the distributed batch log.
  • CAS: a conditional lightweight transaction (Paxos).

The driver is conservative: if a statement is not marked idempotent, the driver does not invoke the retry policy on a write timeout. The DefaultRetryPolicy only retries BATCH_LOG automatically. For SIMPLE, retry only if the statement is declared idempotent. Never blindly retry COUNTER or CAS. Counter increments are read-modify-write operations; replaying them corrupts the value. CAS timeouts leave the Paxos round in an unknown state, and retrying without reading first produces indeterminate results.
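
The rules above can be written down as a small decision function. This is only a sketch of the reasoning, not a driver API: the function name, its arguments, and the three result strings are made up for illustration.

```shell
# Hypothetical helper mirroring the retry rules described above.
# Args: $1 = writeType from the exception
#       $2 = "true" if the statement is declared idempotent (default: false)
retry_decision() {
  local write_type=$1 idempotent=${2:-false}
  case "$write_type" in
    BATCH_LOG)
      echo "retry" ;;
    SIMPLE)
      if [ "$idempotent" = "true" ]; then echo "retry"; else echo "no-retry"; fi ;;
    COUNTER|CAS)
      # Outcome is unknown: read the current state before deciding anything.
      echo "verify-first" ;;
    *)
      # Any write type not covered above is treated conservatively.
      echo "no-retry" ;;
  esac
}

retry_decision SIMPLE true    # prints retry
retry_decision COUNTER true   # prints verify-first
```

The idempotency flag is deliberately ignored for COUNTER and CAS, since marking such a statement idempotent does not make a replay safe.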

flowchart TD
    A[Client write arrives at coordinator] --> B{Received >= blockFor before timeout?}
    B -->|Yes| C[Return success]
    B -->|No| D[Throw WriteTimeoutException]
    D --> E[Check writeType]
    E --> F{BATCH_LOG?}
    F -->|Yes| G[Driver may retry safely]
    F -->|No| H{Idempotent SIMPLE?}
    H -->|Yes| I[Retry is safe]
    H -->|No| J[Do not retry]
    E --> K[COUNTER / CAS]
    K --> L[Never blindly retry]
    D --> M[Correlate with replica-side signals]
    M --> N[Dropped MUTATION]
    M --> O[CommitLog PendingTasks]
    M --> P[MutationStage PendingTasks]

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Commitlog disk saturation | Write latency spikes; CommitLog PendingTasks > 0; commitlog shares a volume with data | iostat -x on the commitlog device; the CommitLog PendingTasks metric (exposed over JMX) |
| MutationStage saturation | High write throughput; mutation pending tasks sustained > 0; CPU busy | nodetool tpstats: MutationStage pending and blocked counts |
| GC pause on replica | Node gossip flaps; long pauses in gc.log; timeouts correlate with GC events | grep -i "pause" /var/log/cassandra/gc.log; nodetool info for heap usage |
| Compaction backlog blocking flushes | Pending compactions growing; memtable flush writers backed up; commitlog segments accumulating | nodetool compactionstats; MemtableFlushWriter pending in nodetool tpstats |
| Cross-DC or internode latency | Timeouts without local saturation; nodes are UP but slow; EACH_QUORUM in use | nodetool status from multiple nodes; compare internode latency |
| Counter or CAS workload | Timeouts isolated to counter or LWT writes; normal latency for standard writes | Application query patterns; ClientRequest metrics for the CASWrite scope |

Quick checks

Confirm the failure mode without changing cluster state:

# Check for dropped mutations and thread pool saturation
nodetool tpstats

# Check write latency percentiles at the coordinator
nodetool proxyhistograms

# Verify node liveness and identify DOWN replicas
nodetool status

# Check compaction backlog that may be blocking flushes
nodetool compactionstats

# Check heap pressure (commitlog pending tasks are not shown here; read the
# JMX metric org.apache.cassandra.metrics:type=CommitLog,name=PendingTasks)
nodetool info | grep "Heap Memory"

# Check disk I/O latency on commitlog and data devices
iostat -x 1

# Search logs for recent write timeouts or GC pauses reported by GCInspector
grep -iE "writetimeout|GCInspector" /var/log/cassandra/system.log

How to diagnose it

  1. Confirm the exception type. WriteTimeoutException means replicas are alive but slow. UnavailableException means the coordinator could not find enough live replicas. Check nodetool status and ClientRequest,scope=Write,name=Unavailables to rule out quorum loss.
  2. Read writeType. If it is COUNTER or CAS, treat the outcome as unknown and do not retry without an application-level read. If it is BATCH_LOG, the driver may retry safely. If it is SIMPLE, retry only if the statement is idempotent.
  3. Check the timeout rate. Correlate ClientRequest,scope=Write,name=Timeouts with baseline write throughput. A sustained rate above zero is abnormal.
  4. Inspect replica health. Look for high write latency or elevated DroppedMessage,scope=MUTATION on individual nodes. One slow replica can drag down a QUORUM write.
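  5. Check the mutation stage. In nodetool tpstats, sustained pending tasks on the MutationStage pool mean the node cannot process local writes fast enough. Blocked tasks mean the queue is full. In the dropped-message section of the same output, the write counter is named MUTATION in 3.x and MUTATION_REQ in 4.x.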
  6. Check commitlog pressure. Non-zero CommitLog PendingTasks means the commitlog device cannot absorb the append rate. This is common when commitlog and data directories share a disk.
  7. Review GC logs. Stop-the-world pauses longer than a few hundred milliseconds block replica-side write acknowledgment. If pauses approach write_request_timeout_in_ms (default 2000 ms), the coordinator will time out.
  8. Look for compaction blocking flushes. Pending tasks in MemtableFlushWriter and growing pending compactions mean memtables cannot flush. Unflushed memtables prevent commitlog segment recycling, which backpressures the write path.
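
Steps 4 and 5 can be condensed into a single filter over nodetool tpstats output. The sketch below assumes the usual column layout (pool rows as name, active, pending, completed, blocked, all-time blocked; dropped-message rows as type, dropped count); the function name is illustrative, and the layout should be verified against the Cassandra version in use.

```shell
# Reads `nodetool tpstats` output on stdin and prints the mutation-path counters.
# Assumed layout:
#   pool rows:    name active pending completed blocked all-time-blocked
#   dropped rows: type dropped ...
mutation_summary() {
  awk '
    $1 == "MutationStage"                    { pending = $3; blocked = $5 }
    $1 == "MUTATION" || $1 == "MUTATION_REQ" { dropped = $2 }
    END { printf "pending=%d blocked=%d dropped=%d\n", pending, blocked, dropped }
  '
}

# Usage on a live node:
#   nodetool tpstats | mutation_summary
```

Running it on every replica of an affected partition and comparing the lines side by side makes a single slow node stand out.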

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| ClientRequest,scope=Write,name=Timeouts | Direct measure of coordinator write timeouts | Sustained rate > 0 |
| DroppedMessage,scope=MUTATION,name=Dropped | Replica is shedding writes it cannot process | Non-zero rate indicates overload or GC stalls |
| CommitLog,name=PendingTasks | Commitlog I/O cannot keep up with write rate | Sustained value > 0 |
| Mutation stage pending tasks | Write execution is queuing on the replica | Pending tasks sustained > 0 for > 60 seconds |
| GC pause duration | STW pauses block replica acks | Pauses > 500 ms; pauses approaching timeout threshold |
| Disk await on commitlog device | Physical I/O latency on the durability path | await > 10 ms sustained on SSD |
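
A "sustained" condition such as pending tasks above zero for 60 seconds can be checked over a series of samples. The sketch below is one possible reading of that rule, assuming the samples arrive one per line, oldest first, at a fixed interval; the function name and arguments are illustrative.

```shell
# Reads one pending-task sample per line (oldest first) on stdin.
# Succeeds when the trailing run of samples above zero spans at least
# the requested window.
# Args: $1 = seconds between samples, $2 = window length in seconds.
sustained_above_zero() {
  local interval=$1 window=$2
  # N consecutive samples span (N - 1) * interval seconds, hence the + 1.
  awk -v need=$(( (window + interval - 1) / interval + 1 )) '
    { if ($1 > 0) run++; else run = 0 }
    END { exit (run >= need) ? 0 : 1 }
  '
}

# Example: samples taken every 10 seconds, 60-second window.
printf '%s\n' 0 3 5 8 12 9 14 20 | sustained_above_zero 10 60 && echo "sustained"
```

A single zero sample resets the run, so short bursts that drain between samples do not trigger the condition.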

Fixes

Commitlog I/O saturation

Move the commitlog directory to a dedicated volume separate from the data directories. Shared devices cause sequential commitlog writes to contend with random reads and compaction I/O. If separation is not possible immediately, postpone repairs and streaming until peak write load subsides.
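
A quick way to see whether the two directories currently share a filesystem is to compare the device that df reports for each. This is a sketch: the function name is illustrative, and the paths are common package defaults used here as assumptions, so check commitlog_directory and data_file_directories in cassandra.yaml for the real locations.

```shell
# Succeeds when both paths resolve to the same filesystem source device.
same_device() {
  local a b
  a=$(df -P "$1" 2>/dev/null | awk 'NR == 2 { print $1 }')
  b=$(df -P "$2" 2>/dev/null | awk 'NR == 2 { print $1 }')
  # An empty result means df could not resolve the path; treat as "not the same".
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Assumed default paths; adjust to match cassandra.yaml.
if same_device /var/lib/cassandra/commitlog /var/lib/cassandra/data; then
  echo "commitlog and data share a device"
fi
```

Different device names do not prove physical separation: two partitions or logical volumes can still sit on the same disk, so confirm the mapping with the storage layout as well.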

MutationStage saturation

Reduce the application write rate or break large batches into smaller ones. Stop background repair and streaming if they overlap with peak traffic. If a hot partition causes a thundering herd, rate-limit that key at the application layer.

GC pauses on replicas

Treat this as a GC death spiral warning. See Cassandra GC death spiral: long pauses, gossip flapping, and recovery.

Warning: Disabling native transport stops all client traffic to the node. Only do this if the node is actively degrading the cluster.

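Disable native transport with nodetool disablebinary to stop new client load, then investigate heap usage, large partition reads, and row cache misconfiguration. Re-enable it with nodetool enablebinary once the node has recovered.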

Compaction backlog

Use nodetool setcompactionthroughput to temporarily raise the compaction throttle if CPU and disk headroom exist. This change takes effect immediately. If pending compactions have been growing for hours, the node is in a compaction debt spiral. Stop non-critical writes and see Cassandra compaction death spiral: when writes outrun compaction throughput.

Counter and CAS timeouts

Do not retry these blindly. For counters, design the application to tolerate incomplete increments or use an idempotent alternative. For CAS, read the current state before deciding whether to re-attempt the conditional write.

Prevention

  • Keep commitlog and data directories on separate physical devices.
  • Monitor CommitLog PendingTasks and mutation stage pending tasks as leading indicators. A value sustained above zero precedes timeouts by seconds or minutes.
  • Monitor the trend of pending compactions, not just the absolute value. A rising trend over hours predicts flush backpressure.
  • Ensure GC pause duration stays well below write_request_timeout_in_ms. Parse GC logs to track old-generation pause times.
  • Mark writes idempotent in the driver when the data model allows it. This lets the driver retry SIMPLE timeouts safely.
  • Monitor ClientRequest,scope=Write,name=Timeouts and alert on any sustained non-zero rate.
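
The GC guidance above can be turned into a simple classifier once pause durations have been extracted from the GC log. The sketch assumes the durations are already available in milliseconds, one per line, because the extraction step depends on the JVM version and log format; the function name and labels are illustrative, and the default thresholds are the 500 ms warning level and 2000 ms timeout mentioned in this article.

```shell
# Reads GC pause durations in milliseconds, one per line, and labels each one.
# Args: $1 = warning threshold in ms (default 500)
#       $2 = write timeout in ms (default 2000)
classify_pauses() {
  awk -v warn="${1:-500}" -v timeout="${2:-2000}" '
    $1 + 0 >= timeout { print $1, "CRITICAL"; next }
    $1 + 0 >  warn    { print $1, "WARNING";  next }
                      { print $1, "ok" }
  '
}

printf '%s\n' 120 650 2100 | classify_pauses
```

A pause labeled CRITICAL is long enough on its own to make every in-flight write on that replica miss the coordinator's deadline.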

How Netdata helps

Netdata correlates ClientRequest Write Timeouts with DroppedMessage MUTATION and CommitLog PendingTasks on the same time axis to separate replica overload from commitlog I/O issues. It tracks ThreadPools MutationStage PendingTasks as a leading indicator and overlays JVM GC pause duration with write timeout spikes to flag stop-the-world events. Per-device disk await distinguishes commitlog saturation from data disk contention. P99 write latency trends catch slowdowns before they breach the timeout threshold.