Cassandra dropped mutations: silent write loss and load shedding
Your application logs successful writes, but reads return stale or missing data. An alert fires on the DroppedMessage rate for the MUTATION scope. The client never received an error, yet a replica discarded the write after it sat in the MutationStage queue past the timeout. This is Cassandra load shedding. The loss is silent whenever enough other replicas succeed to meet the consistency level: the coordinator reports success while one replica never applied the write.
Dropped mutations are a lagging indicator. By the time they appear, the replica is already choking on commitlog I/O, CPU starvation, GC pauses, or thread pool exhaustion. The DroppedMessage counter is cumulative since JVM start; a single large value means nothing unless the rate is increasing. Any sustained non-zero rate is abnormal and demands immediate investigation.
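Because the counter is cumulative, the number to act on is the change between two samples. The following is a minimal sketch of that arithmetic. The function name drop_rate is illustrative, and treating a decreasing counter as a JVM restart is an assumption.

```shell
# drop_rate PREV CUR INTERVAL_SECONDS
# Prints drops per second between two samples of a cumulative counter.
# Assumption: if CUR is lower than PREV, the JVM restarted and the counter
# started again from zero, so CUR itself is taken as the delta.
drop_rate() {
  awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
    delta = (cur >= prev) ? cur - prev : cur
    printf "%.2f\n", delta / secs
  }'
}

# Example: drop_rate 1200 1500 60   prints 5.00
```

A counter that moved from 1200 to 1500 over 60 seconds works out to 5 drops per second, which is worth investigating no matter how large or small the absolute value looks.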
What this means
When a coordinator forwards a write to a replica, the replica places the mutation into the MutationStage queue. A worker thread eventually picks it up, appends it to the commitlog, inserts it into the memtable, and sends an ACK. If the queue is backed up because the node cannot process mutations fast enough, the mutation ages past write_request_timeout_in_ms (default 2000 ms) and is silently discarded.
The coordinator may have already ACKed the write to the client if enough other replicas responded. The drop is invisible to the application. The missing data on that replica creates an inconsistency that will not self-heal unless the key is later touched by read repair or by an anti-entropy repair run within gc_grace_seconds. Hinted handoff does not protect against this: hints are generated only when a replica is marked DOWN by gossip, not when it is UP but overloaded and dropping mutations.
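To get a feel for how quickly a backed-up queue turns into drops, the sketch below uses an idealized model: mutations arrive at a constant rate, the stage drains them at a lower constant rate, and a mutation is dropped once its queue wait reaches the timeout. The function name and the constant-rate assumption are illustrative only; real workloads are bursty and Cassandra's internal accounting is more involved.

```shell
# seconds_until_drops ARRIVALS_PER_SEC SERVICE_PER_SEC TIMEOUT_MS
# Idealized model (assumption): the backlog grows by (arrivals - service)
# per second, and a newly queued mutation waits backlog / service seconds.
# Drops start when that wait reaches the timeout:
#   t = service * (timeout_ms / 1000) / (arrivals - service)
seconds_until_drops() {
  awk -v a="$1" -v s="$2" -v t="$3" 'BEGIN {
    if (a <= s) { print "never"; exit }
    printf "%.1f\n", s * (t / 1000) / (a - s)
  }'
}

# Example: seconds_until_drops 12000 10000 2000   prints 10.0
```

Under these assumptions, a replica that can apply 10000 mutations per second but receives 12000 starts dropping after about 10 seconds at the default 2000 ms timeout.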
flowchart TD
A[Coordinator forwards write] --> B[Replica MutationStage queue]
B --> C[Thread processes mutation]
C --> D[Commitlog append + memtable insert]
D --> E[ACK to coordinator]
F[Commitlog disk slow] --> G[Queue backlog]
H[GC pause] --> G
I[CPU saturation] --> G
G --> J[Mutation exceeds timeout]
J --> K[MUTATION silently dropped]
K --> L[Data loss on replica]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Commitlog disk I/O saturation | High w_await or %util on the commitlog device; commitlog pending tasks grow | iostat -x 1 on the commitlog volume |
| MutationStage thread pool saturation | Pending > 0 or Blocked > 0 in MutationStage; CPU high | nodetool tpstats |
| Long GC pauses | GC logs show pauses > 500 ms; heap after GC > 75%; drops correlate with GC spikes | nodetool info and GC logs |
| Compaction backlog starving disk I/O | PendingCompactions trending up; disk %util > 80% | nodetool compactionstats |
| CPU starvation | High sys/user CPU with low idle; multiple thread pools lag | mpstat or top |
Quick checks
Run these safe, read-only commands to triage the scope of the drops and identify the saturated resource.
# Check dropped mutation counts and thread pool state
nodetool tpstats
# Check commitlog and data disk saturation
iostat -x 1
# Check compaction debt
nodetool compactionstats
# Check heap usage and pressure
nodetool info | grep "Heap Memory"
# Check for long GC pauses in recent logs
grep -iE "pause|stopped" /var/log/cassandra/gc.log | tail -20
# Check coordinator write latency
nodetool proxyhistograms
# Check node liveness and cluster view
nodetool status
# Check disk space on commitlog and data volumes
df -h /var/lib/cassandra/commitlog /var/lib/cassandra/data
How to diagnose it
- Confirm a sustained drop rate. Run nodetool tpstats and note the Dropped section. The counters are cumulative. Sample the value, wait 60 seconds, and sample again. Any increase in the MUTATION row is a problem.
- Inspect the MutationStage queue. In nodetool tpstats, look for Pending > 0 or Blocked > 0 under the request stages. Sustained pending means the write path cannot keep up; blocked means the queue is full and incoming work cannot be enqueued.
- Check commitlog disk latency. Use iostat -x on the commitlog device. If await is elevated or %util is near 100%, fsync stalls are backing up the write pipeline. If commitlog and data share a device, separation is strongly recommended.
- Check GC health. Parse GC logs for stop-the-world pauses. Pauses > 500 ms freeze all stage processing and cause queued mutations to expire. If heap after full GC > 75% of max, memory pressure is the root cause.
- Check compaction status. Run nodetool compactionstats. If pending tasks are trending upward over hours, compaction is stealing I/O bandwidth from commitlog writes and read processing.
- Check for asymmetric patterns. If only one rack or node shows drops, inspect that specific host for hardware degradation, uneven traffic, or a hot partition. Use nodetool tablehistograms to see if a single table dominates write latency.
- Verify hinted handoff is not masking the issue. Hints are stored when a replica is marked DOWN by gossip. An overloaded node that remains UP will not receive hints for dropped mutations, so do not rely on hint replay to backfill lost data.
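The first two steps can be scripted. The helpers below are a sketch that reads nodetool tpstats output on standard input. The function names are illustrative, and the column layout is an assumption based on common 3.x and 4.x output (pool rows as name, active, pending, completed, blocked; dropped rows as message type followed by the count, with the type reported as MUTATION or MUTATION_REQ depending on version). Check the output of your own version before relying on them.

```shell
# dropped_mutations: print the dropped count from the MUTATION row.
# Exits non-zero if no such row is found.
dropped_mutations() {
  awk '$1 == "MUTATION" || $1 == "MUTATION_REQ" { n = $2; found = 1 }
       END { if (!found) exit 1; print n }'
}

# mutation_stage_backlog: print pending and blocked for MutationStage.
mutation_stage_backlog() {
  awk '$1 == "MutationStage" { print "pending=" $3, "blocked=" $5 }'
}

# Example (not run here): sample twice, 60 seconds apart.
#   before=$(nodetool tpstats | dropped_mutations)
#   sleep 60
#   after=$(nodetool tpstats | dropped_mutations)
#   echo "dropped in last 60s: $((after - before))"
```

Running the same pair of samples on every replica in the affected token range also makes it easier to spot the asymmetric case, where only one host shows a growing count.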
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| DroppedMessage rate for MUTATION | Direct measure of silent write loss | Any sustained non-zero rate |
| MutationStage pending tasks | Backpressure on the write path | Pending > 0 for > 60 seconds |
| Commitlog pending tasks | Commitlog fsync cannot keep up | Pending > 0 sustained |
| Disk await on commitlog device | Physical I/O bottleneck delaying durability | await > 10 ms sustained |
| GC pause duration | STW freezes all stage processing | Pause > 500 ms |
| Compaction pending tasks | Background I/O debt steals bandwidth from writes | Trending upward over hours |
| Client write timeouts | Coordinator-side view of replica slowness | Rate > 0.1% of write rate |
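Several of these warning signs depend on a value staying above zero for a period rather than on a single reading. One way to express "sustained" is sketched below. The function name and the one-sample-per-line input format are assumptions for illustration, not part of any monitoring tool.

```shell
# sustained_nonzero STEP_SECONDS MIN_SECONDS
# Reads one numeric sample per line, taken STEP_SECONDS apart.
# Exits 0 if some unbroken run of samples > 0 spans at least MIN_SECONDS,
# otherwise exits 1. A run of n samples spans (n - 1) * STEP_SECONDS.
sustained_nonzero() {
  awk -v step="$1" -v min="$2" '
    $1 > 0 { n++; if ((n - 1) * step >= min) hit = 1; next }
    { n = 0 }
    END { exit (hit ? 0 : 1) }'
}

# Example: pending-task samples every 10 seconds, threshold 60 seconds.
#   printf "%s\n" 3 5 9 12 20 31 40 | sustained_nonzero 10 60 && echo "sustained"
```

A single zero sample resets the run, so short bursts that drain on their own do not trigger the check.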
Fixes
Fix the root cause; do not restart Cassandra as a first response. Restarting clears the backlog temporarily but destroys the diagnostic state and does not prevent immediate recurrence.
Commitlog I/O saturation
If the commitlog device is saturated, move commitlog to a dedicated volume. This requires updating cassandra.yaml and restarting the node, so plan for maintenance. As a temporary relief, reduce write throughput from clients or batch jobs. If you are using commitlog_sync: batch, which fsyncs before ACK, switching to periodic lowers write latency at the cost of durability: a crash can lose up to commitlog_sync_period_in_ms (default 10000 ms) of acknowledged writes on that node. Understand the data-loss implications before changing this.
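Before planning the move, it helps to confirm whether the commitlog and data directories really share a filesystem. The helper below is a rough sketch that compares the source device reported by df -P for two paths. The function name is illustrative, and it only compares filesystem sources, so two partitions or logical volumes on the same physical disk will still look separate.

```shell
# same_device PATH_A PATH_B
# Exits 0 if both paths are on the same filesystem source as reported by
# df -P, non-zero otherwise (including when either path cannot be read).
# Runs in a subshell so its variables do not leak into the caller.
same_device() (
  dev_a=$(df -P "$1" | awk 'NR == 2 { print $1 }')
  dev_b=$(df -P "$2" | awk 'NR == 2 { print $1 }')
  [ -n "$dev_a" ] && [ "$dev_a" = "$dev_b" ]
)

# Example (paths taken from the quick checks above):
#   same_device /var/lib/cassandra/commitlog /var/lib/cassandra/data \
#     && echo "commitlog and data share a filesystem"
```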
MutationStage saturation from CPU pressure
Thread pool saturation is usually a symptom, not a root cause. Identify what is starving CPU. If compaction is the consumer, you can temporarily increase compaction_throughput_mb_per_sec to clear debt faster, or decrease it to throttle compaction and leave headroom for writes. If GC is consuming CPU, see the Cassandra GC death spiral guide. Identify large partitions, tombstone scans, or misconfigured caches rather than blindly resizing pools.
Disk I/O contention from compaction backlog
Increase compaction_throughput_mb_per_sec to let compaction catch up, but monitor read latency because this adds I/O load. Postpone repairs, bootstraps, and streaming that compete for disk. If you use SizeTieredCompactionStrategy and disk usage is above 50%, you are at risk of space exhaustion during a major compaction. Clear old snapshots and add capacity before compaction can recover.
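The throughput limit can also be changed at runtime with nodetool, which avoids a restart while you observe the effect. The wrapper below is a small sketch; the function name is illustrative and the value you pass is your own choice, in MB/s. A runtime change made this way is not written back to cassandra.yaml, so it reverts on the next restart.

```shell
# raise_compaction_throughput MB_PER_SEC
# Shows the current limit, applies the new one, then shows it again so the
# change is visible in the terminal scrollback.
raise_compaction_throughput() {
  echo "before: $(nodetool getcompactionthroughput)"
  nodetool setcompactionthroughput "$1"
  echo "after: $(nodetool getcompactionthroughput)"
}

# Example (not run here): raise_compaction_throughput 64
```

Change the limit in modest steps and watch disk await and read latency after each one, since the extra compaction I/O competes with the same device.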
Memory pressure and GC pauses
Reduce in-flight writes by throttling clients at the application layer. Check for large batch statements or unbounded partitions that allocate massive objects on heap. If heap after full GC > 85%, a rolling restart with increased heap may be needed. Do not exceed roughly 16 GB with G1GC, or pause times will worsen.
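For a quick look at heap occupancy, the Heap Memory line of nodetool info can be turned into a percentage. The sketch below assumes the line has the form "Heap Memory (MB) : used / max", which may differ between versions, and the function name is illustrative. It reports instantaneous usage, not heap after a full GC, so treat it as a rough signal and confirm against GC logs.

```shell
# heap_used_pct: read `nodetool info` output on stdin and print heap usage
# as a whole-number percentage of the maximum heap.
# The ^Heap anchor skips the separate "Off Heap Memory" line.
heap_used_pct() {
  awk -F'[:/]' '/^Heap Memory/ { printf "%.0f\n", 100 * $2 / $3 }'
}

# Example (not run here): nodetool info | heap_used_pct
```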
Prevention
- Alert on the rate of DroppedMessage, not the cumulative count. Counters reset on JVM restart, so rate is the only actionable signal.
- Keep commitlog on a dedicated disk, separate from data directories.
- Watch MutationStage pending tasks as a leading indicator. Any sustained pending predicts future drops.
- Monitor compaction pending trends, not just absolute values. A rising trend means I/O debt is accumulating.
- Keep JVM heap after full GC below 75% of max. Track this from GC logs, not just HeapMemoryUsage.
- Run anti-entropy repair within gc_grace_seconds so that any inconsistencies from transient drops are eventually reconciled.
How Netdata helps
Netdata collects DroppedMessage rates and MutationStage pending tasks from JMX and places them on the same timeline as disk I/O latency and GC pauses. This correlation shows whether drops lag a commitlog stall or a GC spike by seconds.
Per-disk await and utilization metrics for the commitlog volume let you distinguish disk saturation from CPU saturation without logging into the node.
Process RSS tracking surfaces off-heap memory pressure that JVM heap metrics miss, catching OOM-killer risk before it triggers.
Composite alerting on drops plus thread pool saturation plus GC pauses reduces false positives from single-metric blips.
Related guides
- Cassandra compaction strategies: STCS vs LCS vs TWCS vs UCS
- Cassandra compaction death spiral: when writes outrun compaction throughput
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra zombie data resurrection: gc_grace_seconds and unrepaired tombstones
- Cassandra disk space exhaustion: emergency recovery when the data volume fills
- Cassandra GC death spiral: long pauses, gossip flapping, and recovery
- Cassandra GC pauses too long: diagnosing G1 stop-the-world pauses
- Cassandra heap pressure: sizing the JVM heap and tuning G1GC
- Cassandra monitoring checklist: the signals every production cluster needs
- Cassandra monitoring maturity model: from survival to expert
- Cassandra Not enough space for compaction: STCS space amplification and recovery
- Cassandra java.lang.OutOfMemoryError: Java heap space - causes and recovery







