Cassandra dropped reads and other messages: reading nodetool tpstats Dropped

When nodetool tpstats reports non-zero values in the Dropped section, the node discarded internal messages that exceeded their stage timeout. These counters are cumulative since JVM startup, not rates. A rising value warrants investigation: the timeouts are defined in cassandra.yaml by settings such as read_request_timeout_in_ms and write_request_timeout_in_ms (5 seconds and 2 seconds by default), so a drop means the message sat in the queue for seconds, far longer than a healthy node needs to service it.

Dropped messages are a lagging indicator. Correlate the drop type with the matching thread pool pending count, disk I/O latency, and GC pause duration to find the root cause.

What it is and why it matters

Cassandra routes internal operations through dedicated thread pools: ReadStage, MutationStage, HintsDispatcher, and RequestResponseStage. Every message carries a timeout. If a pool cannot process the message before expiration, the stage discards it and increments the Dropped counter for that scope.

Because these counters are cumulative since JVM startup, a large absolute number on a long-running node may reflect a past incident. What matters is the rate of change. A healthy node in steady state drops zero messages. Any sustained non-zero rate indicates overload.

How Cassandra decides to drop a message

When a message arrives, Cassandra places it in its assigned stage queue: ReadStage for reads, MutationStage for standard writes and read repairs, HintsDispatcher for hints, and RequestResponseStage for internode responses. Worker threads pull from these queues.

If threads are blocked on disk I/O, stalled by a GC pause, or outpaced by arrival rate, queue depth grows. Once a message’s age exceeds its timeout, the stage drops it. The timeout is defined in cassandra.yaml by settings such as read_request_timeout_in_ms and write_request_timeout_in_ms; exceeding it means the queue was severely backed up or the JVM was frozen.

flowchart TD
    A[Message arrives at stage queue] --> B{Queued longer than timeout?}
    B -->|Yes| C[Increment Dropped counter]
    B -->|No| D[Thread processes request]
    C --> E[READ / RANGE_SLICE]
    C --> F[MUTATION / COUNTER / BATCH]
    C --> G[HINT]
    C --> H[REQUEST_RESPONSE]
    E --> I[ReadStage pending / Disk I/O / GC]
    F --> J[MutationStage pending / Commitlog]
    G --> K[Target node health / Repair]
    H --> L[Cross-node latency / CPU]
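The drop decision described above can be modeled in a few lines. This is a conceptual sketch in Python, not Cassandra's actual implementation: the Message class, the drain function, and the 5-second timeout are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    verb: str           # message type, e.g. "READ" or "MUTATION"
    enqueued_at: float  # arrival time in seconds on any monotonic clock


def drain(queue, now, timeout_s, dropped):
    """Pull every queued message; drop those that waited longer than timeout_s.

    Returns the messages that were still fresh enough to process.
    """
    processed = []
    while queue:
        msg = queue.popleft()
        if now - msg.enqueued_at > timeout_s:
            # Expired while waiting: count the drop and skip the work.
            dropped[msg.verb] = dropped.get(msg.verb, 0) + 1
        else:
            processed.append(msg)
    return processed


# Three reads queued at t=0.0, 4.0 and 5.5 s; a worker frees up at t=6.0 s.
queue = deque([Message("READ", 0.0), Message("READ", 4.0), Message("READ", 5.5)])
dropped = {}
processed = drain(queue, now=6.0, timeout_s=5.0, dropped=dropped)
print(dropped)         # {'READ': 1}
print(len(processed))  # 2
```

Only the oldest message is dropped here, which mirrors why a short stall produces a burst of drops for whatever was already waiting in the queue.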

What each drop type means

READ and RANGE_SLICE

READ drops mean a point read request was abandoned after exceeding its timeout in ReadStage. RANGE_SLICE drops mean a range scan suffered the same fate. Both indicate that the read path on this replica could not keep up.

Common causes include GC pauses freezing the stage, disk I/O saturation preventing SSTable lookups, or thread pool saturation from large partition reads and tombstone scans. When these drop, the client has already received a timeout. Check ReadStage pending tasks with nodetool tpstats | grep -E "ReadStage|MutationStage", disk await with iostat -x 1 on the data volume, and GC logs for stop-the-world pauses.
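To track pending tasks over time instead of reading them by eye, the thread pool table can be parsed. The sketch below assumes the usual column order (pool name, active, pending, completed, ...) and uses made-up sample text; parse_pool_pending is a hypothetical helper, and the exact layout can differ between Cassandra versions.

```python
def parse_pool_pending(tpstats_text):
    """Return {pool_name: pending} from the thread pool table of tpstats output."""
    pending = {}
    in_pools = False
    for line in tpstats_text.splitlines():
        fields = line.split()
        if line.startswith("Pool Name"):
            in_pools = True
            continue
        if in_pools:
            if not fields:
                break  # a blank line ends the pool table
            # Assumed columns: name, active, pending, completed, ...
            if len(fields) >= 3 and fields[2].isdigit():
                pending[fields[0]] = int(fields[2])
    return pending


# Illustrative sample, not captured from a real node.
SAMPLE_POOLS = """\
Pool Name                    Active   Pending   Completed   Blocked  All time blocked
ReadStage                        32       118     9041233         0                 0
MutationStage                     4         0    18220411         0                 0

Message type           Dropped
READ                       512
"""

print(parse_pool_pending(SAMPLE_POOLS))  # {'ReadStage': 118, 'MutationStage': 0}
```

On a live node the text could come from subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout.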

MUTATION, COUNTER_MUTATION, BATCH_STORE, and BATCH_REMOVE

These are write-path drops routed through MutationStage. MUTATION drops indicate a standard write was discarded on this replica. If the consistency level was met by other replicas, the client may think the write succeeded. The data on this node stays inconsistent until hinted handoff, read repair, or an explicit repair reconciles it.

COUNTER_MUTATION drops point to counter writes, which are read-modify-write operations and more expensive than standard writes. BATCH_STORE and BATCH_REMOVE relate to logged batch processing. BATCH_STORE drops indicate the batchlog metadata write failed. BATCH_REMOVE drops indicate batch cleanup failed.

For all four, check MutationStage pending tasks with nodetool tpstats, and use nodetool compactionstats to see whether a compaction backlog is stealing disk throughput from the write path.

READ_REPAIR

READ_REPAIR drops occur when a read repair mutation is discarded. Because read repair is implemented as a mutation sent to inconsistent replicas, it routes through MutationStage and is governed by the write request timeout. If read repair mutations are dropping, the write path is overloaded and replica inconsistencies may persist. Do not tune read repair parameters first; investigate MutationStage saturation, commitlog latency, and disk await.

HINT

HINT drops occur in HintsDispatcher. They are almost always symptomatic of a problem elsewhere, not a primary bottleneck on the coordinator. Either the target replica is down or unreachable, or the node is overwhelmed by accumulated hint volume. Sustained HINT drops usually mean a node has been down long enough that hints are backing up, or hint replay is overwhelming a recovering node. Verify target node liveness with nodetool status, check the configured hints directory size on disk, and schedule repair. Do not tune the hint dispatcher before confirming replica health.

REQUEST_RESPONSE

REQUEST_RESPONSE drops mean a reply from another node reached this node's RequestResponseStage but was not processed before the originating request's timeout expired. Either the reply arrived late because the remote replica or the network was slow, or the local node was too busy or paused to handle it in time. Check cross-node latency with ping or mtr, run nodetool netstats for pending internode messages, and review GC logs and CPU saturation on the local node.

Common misreads and missteps

Cumulative counters mask rate. Sample the counter twice with a known interval to compute the current rate:

date && nodetool tpstats | grep -A 20 "Message type"

On a long-running node, thousands of drops may reflect one past incident.
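The two-sample approach can be automated along these lines. This is a minimal sketch: parse_dropped and drop_rates are hypothetical helpers, the sample text is made up, and the parser assumes only that the count is the second column after the "Message type" header. Message type names differ between Cassandra versions.

```python
def parse_dropped(tpstats_text):
    """Return {message_type: dropped_count} from the 'Message type' table."""
    counts = {}
    in_table = False
    for line in tpstats_text.splitlines():
        fields = line.split()
        if line.startswith("Message type"):
            in_table = True
            continue
        if in_table and len(fields) >= 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def drop_rates(before, after, interval_s):
    """Per-second drop rate for each message type between two samples."""
    rates = {}
    for verb, count in after.items():
        delta = count - before.get(verb, 0)
        if delta < 0:
            # The counter went backwards, so the JVM restarted between samples.
            delta = count
        rates[verb] = delta / interval_s
    return rates


# Illustrative samples taken 60 seconds apart.
SAMPLE_T0 = "Message type           Dropped\nREAD          512\nMUTATION       40\nHINT            0\n"
SAMPLE_T1 = "Message type           Dropped\nREAD          512\nMUTATION      100\nHINT            0\n"

rates = drop_rates(parse_dropped(SAMPLE_T0), parse_dropped(SAMPLE_T1), interval_s=60)
print(rates)  # {'READ': 0.0, 'MUTATION': 1.0, 'HINT': 0.0}
```

In this example the 512 READ drops are historical, while MUTATION is dropping one message per second right now.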

Client success does not mean zero data loss. At QUORUM, a dropped mutation on one replica may be invisible to the client if other replicas acknowledged. The missed replica is now inconsistent. If you only monitor client errors, you will miss replica-side drops.
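The arithmetic behind this is short. The sketch below uses the standard quorum formula (half the replication factor, rounded down, plus one); the function names are illustrative.

```python
def quorum(replication_factor):
    """Number of replica acknowledgements required at QUORUM."""
    return replication_factor // 2 + 1


def client_sees_success(replication_factor, replicas_acked):
    """True if enough replicas acknowledged for a QUORUM write to succeed."""
    return replicas_acked >= quorum(replication_factor)


rf = 3
acked = 2  # the third replica dropped the mutation
print(quorum(rf))                      # 2
print(client_sees_success(rf, acked))  # True
print(rf - acked)                      # 1 replica is missing the write
```

With a replication factor of 3, the write succeeds from the client's point of view even though one replica never applied it.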

GC pauses cause multi-scope drops. A long stop-the-world pause freezes every stage simultaneously. If you see READ, MUTATION, and REQUEST_RESPONSE all increasing together, check GC logs before tuning thread pools or disk layout.
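A simple heuristic can flag this pattern from per-type deltas between two samples. This is only a sketch: likely_node_wide_stall and the min_scopes threshold of 3 are assumptions chosen for illustration, and a positive result is a hint to read the GC logs, not a diagnosis.

```python
def likely_node_wide_stall(deltas, min_scopes=3):
    """Return (flag, rising_types) where flag is True if at least
    min_scopes message types increased during the same interval."""
    rising = sorted(verb for verb, delta in deltas.items() if delta > 0)
    return len(rising) >= min_scopes, rising


# Increase in dropped counts over one sampling interval (illustrative values).
deltas = {"READ": 40, "MUTATION": 12, "REQUEST_RESPONSE": 7, "HINT": 0}
flag, rising = likely_node_wide_stall(deltas)
print(flag)    # True
print(rising)  # ['MUTATION', 'READ', 'REQUEST_RESPONSE']
```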

Do not ignore HINT drops. They are not harmless metadata. They indicate either a downed replica or a coordinator that cannot keep up with replay, both of which require operational action.

Signals to watch in production

| Signal | Why it matters | Warning sign | How to check |
|---|---|---|---|
| ReadStage / MutationStage pending tasks | Leading indicator that a stage is backing up before drops occur. | Sustained pending greater than 0 for more than 60 seconds | nodetool tpstats |
| GC pause duration | Long pauses freeze all stages and directly cause message timeouts. | Pause exceeds 2 seconds, or old-gen collections exceed 1 per minute | GC logs (location depends on JVM flags) |
| Disk I/O await | Saturated I/O prevents reads and writes from completing in time. | await greater than 10 ms on SSD sustained for 5 minutes | iostat -x 1 on the data volume |
| Compaction pending tasks | Compactions steal disk I/O and CPU from the write path. | Pending compactions sustained above baseline | nodetool compactionstats |
| Client request timeouts | Coordinator-level view of the same pressure causing drops on replicas. | Non-zero timeout rate sustained for more than 60 seconds | Application metrics |
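Several of these warning signs depend on a value staying above a threshold for a period, not on a single reading. One way to express that check is sketched below; sustained_above is a hypothetical helper, and the 15-second sampling interval is an assumption.

```python
def sustained_above(samples, threshold, min_duration_s):
    """samples: list of (timestamp_s, value) pairs in ascending time order.

    True if the value stayed strictly above threshold for one continuous
    run of samples spanning at least min_duration_s.
    """
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts
            if ts - run_start >= min_duration_s:
                return True
        else:
            run_start = None  # the run was interrupted; start over
    return False


# ReadStage pending sampled every 15 seconds (illustrative values).
backed_up = [(0, 3), (15, 7), (30, 12), (45, 9), (60, 4)]
brief_spike = [(0, 0), (15, 20), (30, 0), (45, 0), (60, 0)]

print(sustained_above(backed_up, threshold=0, min_duration_s=60))    # True
print(sustained_above(brief_spike, threshold=0, min_duration_s=60))  # False
```

The same function covers the disk await row by passing threshold=10 and min_duration_s=300.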

How Netdata helps

  • Plots per-scope dropped-message rates from JMX in real time, so you do not need to sample nodetool tpstats manually.
  • Correlates MUTATION or READ spikes with thread pool pending tasks and GC pause duration on the same timeline.
  • Tracks disk I/O latency per device to distinguish commitlog contention from data directory saturation.
  • Alerts on non-zero dropped message rates by scope, treating any sustained rate as an anomaly.