Cassandra hint overflow: max_hint_window expiry and silent data divergence
You restart a node after a four-hour outage. Gossip converges, nodetool status shows UN, and clients reconnect. Reads at consistency level ONE return stale data. The node has never been repaired.
The problem is hint overflow: the outage lasted longer than max_hint_window_in_ms (default three hours), so coordinators stopped saving hints after the window expired. Writes accepted during the final hour of the outage are missing from that replica. Coordinator logs show no errors; write acknowledgments succeeded because other replicas responded. Only anti-entropy repair closes the gap. Without it, the missing data sits on that replica indefinitely, surfacing as inconsistent reads or resurrected deletes.
What this means
Hinted handoff is a temporary durability mechanism. When a write’s target replica is unreachable, the coordinator stores a hint locally on disk at /var/lib/cassandra/hints/. When the replica returns, the coordinator replays those hints as mutations. Hints carry the original mutation timestamp, so replay is idempotent and will not overwrite newer data.
The mechanism is bounded by max_hint_window_in_ms (default: 10800000 ms, three hours). Once a node has been down longer than this window, the cluster stops creating hints for that replica. Writes continue to the remaining replicas, but the down node receives nothing. When the node recovers, it replays whatever hints remain from the initial window, but all writes from the post-window period are permanently absent from that replica unless you run a full anti-entropy repair.
The cluster appears healthy throughout. Coordinators do not fail writes when the replication factor and consistency level are satisfied, and clients see normal acknowledgments. The missing data is only detectable when the under-replicated partition is read from the recovered node, or when a repair compares Merkle trees.
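The size of the uncovered gap follows directly from the two numbers above. A minimal sketch, assuming you know the outage duration from your own monitoring; `hint_gap_seconds` is a hypothetical helper name, not a Cassandra tool:

```shell
#!/bin/sh
# Estimate how many seconds of an outage were NOT covered by hints.
# Arguments: outage duration in seconds, hint window in milliseconds.
hint_gap_seconds() {
    outage_s=$1
    window_s=$(( $2 / 1000 ))
    if [ "$outage_s" -gt "$window_s" ]; then
        echo $(( outage_s - window_s ))
    else
        echo 0
    fi
}

# Example: a four-hour outage against the default three-hour window.
hint_gap_seconds 14400 10800000   # prints 3600
```

Any non-zero result means the recovered replica is missing writes that hint replay cannot restore.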
flowchart LR
A[Replica DOWN] --> B[Coordinators store hints]
B --> C{Outage > max_hint_window?}
C -->|No| D[Replay all hints]
D --> E[Replica consistent]
C -->|Yes| F[Hint creation stops]
F --> G[Post-window writes lost]
G --> H[Replica recovers]
H --> I[Repair required]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Node down longer than max_hint_window_in_ms (default 3h) | Hints stop accumulating after the window expires; recovered replica misses post-window mutations | nodetool status to confirm downtime duration versus the window |
| Hinted handoff disabled | No hints stored even for brief outages; immediate divergence on any replica downtime | nodetool statushandoff |
| CASSANDRA-19495 (Cassandra 4.1.0-4.1.4) | Node recovers then fails again; no hints created on the second outage even if total time is within window | nodetool version to confirm if the fix is present |
| Coordinator disk saturated by hint backlog | Hints directory grows to tens of gigabytes; disk pressure triggers broader write-path degradation | df -h /var/lib/cassandra/hints/ or data root |
| Hint delivery throttle too low for backlog | Recovered node cannot drain hints fast enough; replay stalls and the node may become overloaded | nodetool tpstats HintsDispatcher pending |
Quick checks
# Is hinted handoff enabled?
nodetool statushandoff
# Current max hint window (Cassandra 4.0+)
nodetool getmaxhintwindow
# Hints directory size on coordinators
du -sh /var/lib/cassandra/hints/
# Hint delivery thread pool activity
nodetool tpstats | grep -A1 "HintsDispatcher"
# Down nodes
nodetool status
# Active streaming and hint delivery sessions
nodetool netstats
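When scripting these checks, it may help to extract only the nodes that are currently down. The sketch below is one way to do that; `down_nodes` is a hypothetical helper, and it assumes the usual nodetool status layout in which the first column is the two-letter state and the second is the address:

```shell
#!/bin/sh
# Print the address of every node whose status starts with D (Down),
# in any of the states Normal, Leaving, Joining, or Moving.
# Reads nodetool status output on stdin so it also works on saved output.
down_nodes() {
    awk '$1 ~ /^D[NLJM]$/ { print $2 }'
}

# Usage on a live node (requires nodetool on PATH):
#   nodetool status | down_nodes
```

Reading from stdin keeps the helper usable against output captured during an incident.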
How to diagnose it
- Confirm the outage exceeded the hint window. Check nodetool status and your monitoring for the DOWN timestamp. Compare the duration against max_hint_window_in_ms (default 3h). If the node was down longer than the window, assume divergence.
- Verify hint accumulation stopped mid-outage. On coordinators, check the hints directory size with du -sh /var/lib/cassandra/hints/. If the directory stopped growing while the node was still DOWN, the window expired and hint creation ceased.
- Check the cumulative hint counter. Monitor the Count attribute of the JMX metric org.apache.cassandra.metrics:type=Storage,name=TotalHints. A rising counter during the outage indicates hint generation; a flat counter after the window indicates the cluster moved on without that replica.
- Inspect hint delivery after recovery. Run nodetool tpstats and look at HintsDispatcher. Active or Pending tasks indicate replay is in progress. Completed should increase over time.
- Validate repair status. Check system_distributed.repair_history or nodetool repair_admin list (4.0+). If no repair has run since the node recovered, the post-window gap is still present.
- Check for CASSANDRA-19495 if the node went down twice. In Cassandra 4.1.0 through 4.1.4, if a node recovers and then fails again, and the total elapsed time since the first downtime start exceeds max_hint_window, no hints are created on the second outage at all. If this applies, upgrade to 4.1.5+ or 5.0+ and run full repair.
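The hint delivery check above can be reduced to a one-line summary. This is a sketch only: `hints_dispatcher_summary` is a hypothetical helper, and it assumes the tpstats column order is Pool Name, Active, Pending, Completed, which you should confirm against the header row on your version:

```shell
#!/bin/sh
# Summarize the HintsDispatcher row of nodetool tpstats read from stdin.
# Assumed column order: Pool Name, Active, Pending, Completed, ...
hints_dispatcher_summary() {
    awk '$1 == "HintsDispatcher" {
        printf "active=%s pending=%s completed=%s\n", $2, $3, $4
        found = 1
    }
    END { if (!found) print "HintsDispatcher row not found" }'
}

# Usage on a live node (requires nodetool on PATH):
#   nodetool tpstats | hints_dispatcher_summary
```

Running it a few minutes apart and comparing the completed value shows whether replay is progressing or stalled.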
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Storage.TotalHints Count | Cumulative hints written since restart; growth during a node outage confirms hint generation | Continuously increasing while a node is DOWN; sudden stop indicates window expiry |
| Hints directory size | Hints consume disk on coordinators; large backlogs risk disk exhaustion and I/O contention | du -sh /var/lib/cassandra/hints/ growing past 1 GB or trending upward for hours |
| HintsDispatcher pool (nodetool tpstats) | Tracks active hint delivery threads replaying to recovered nodes | Pending greater than 0 or Active greater than 0 sustained after recovery; Completed stalling |
| HintsService.HintsFailed / HintsTimedOut | Failed hint delivery means the target node is rejecting or missing the replay | Non-zero rate sustained for more than 5 minutes |
| Repair completion (repair_history) | The only mechanism that reconciles post-window divergence | No successful repair session after a node recovery from an extended outage |
| Client write timeouts / unavailables | May indicate the down node is affecting quorum or that coordinators are struggling | Sustained increase correlating with node DOWN events |
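The hints directory size signal is easy to turn into a scripted check. A minimal sketch, assuming a threshold expressed in KiB (1048576 KiB approximates the 1 GB warning level in the table); `hints_dir_check` is a hypothetical helper name:

```shell
#!/bin/sh
# Warn when a hints directory reaches a size threshold.
# Arguments: directory path, threshold in KiB (1048576 KiB = 1 GiB).
hints_dir_check() {
    size_kb=$(du -sk "$1" 2>/dev/null | awk '{ print $1 }')
    if [ -z "$size_kb" ]; then
        echo "UNKNOWN: cannot read $1"
    elif [ "$size_kb" -ge "$2" ]; then
        echo "WARN: $1 is ${size_kb} KiB (threshold $2 KiB)"
    else
        echo "OK: $1 is ${size_kb} KiB"
    fi
}

# Usage on a coordinator:
#   hints_dir_check /var/lib/cassandra/hints 1048576
```

A single reading only shows the current size; run it on a schedule to see the upward trend the table describes.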
Fixes
Replica recovered but outage exceeded the window
Run full anti-entropy repair on the recovered node. In Cassandra 4.0+, run nodetool repair -full. If you use Reaper, schedule a full repair job. If you are uncertain whether the outage exceeded the window, run repair anyway; it is the only safe path. Do not rely on read repair to close the gap.
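If you prefer to repair keyspace by keyspace, a small wrapper can print the commands before you commit to running them. This is a sketch under assumptions: `full_repair` and the `DRY_RUN` variable are hypothetical, and the keyspace names are placeholders for your own:

```shell
#!/bin/sh
# Print (or run) a full repair for each named keyspace on the local node.
# DRY_RUN defaults to 1, which only prints; set DRY_RUN=0 to execute.
full_repair() {
    for ks in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "nodetool repair -full $ks"
        else
            nodetool repair -full "$ks" || return 1
        fi
    done
}

# Example with hypothetical keyspace names; prints two commands.
full_repair app_data app_events
```

Run it on the recovered node so the repair covers the ranges that replica owns.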
Hints are draining too slowly or overloading the recovered node
The default hinted_handoff_throttle_in_kb is 1024 KiB/s per delivery thread. For large backlogs, this can be too slow. In Cassandra 4.0+, increase the throttle dynamically with nodetool sethintedhandoffthrottlekb to drain faster, but monitor the target node for compaction and GC pressure. If the node struggles, reduce the throttle to prevent re-failure.
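To judge whether the throttle is too low, estimate how long the backlog would take to drain. The sketch below gives a rough lower bound only: `hint_drain_seconds` is a hypothetical helper, the delivery thread count of two is an assumption to check against your configuration, and the effective rate in a real cluster may be lower than the configured maximum:

```shell
#!/bin/sh
# Rough lower bound on hint drain time, in seconds.
# Arguments: backlog in KiB, throttle in KiB/s per thread, delivery threads.
hint_drain_seconds() {
    rate=$(( $2 * $3 ))
    # Round up so a partial second still counts.
    echo $(( ($1 + rate - 1) / rate ))
}

# 10 GiB backlog, 1024 KiB/s throttle, two delivery threads (assumed).
hint_drain_seconds 10485760 1024 2   # prints 5120
```

If the estimate is far longer than you can tolerate, raise the throttle in steps, for example with nodetool sethintedhandoffthrottlekb 2048, and watch the recovered node between steps.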
Coordinator disk pressure from hint accumulation
If the hints directory is filling the disk and the target node is still down, you have two options. The risky one is to clear hints manually from /var/lib/cassandra/hints/ and restart the coordinator. This destroys the hints and makes a full repair of the target node mandatory. Only do this in a disk-exhaustion emergency. The safer path is to add storage or bring the target node back so that hints drain normally.
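Deciding whether you are in a genuine disk-exhaustion emergency is easier with a number in hand. A minimal sketch, assuming a site-specific emergency threshold (the 90 used in the usage line is illustrative, not a Cassandra default); both function names are hypothetical:

```shell
#!/bin/sh
# Print the used-space percentage of the filesystem holding a path.
disk_used_pct() {
    df -Pk "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Decide which path applies.
# Arguments: used-space percentage, percentage that counts as an emergency.
hints_disk_verdict() {
    if [ "$1" -ge "$2" ]; then
        echo "EMERGENCY: $1% used; clearing hints may be justified, full repair required afterwards"
    else
        echo "OK: $1% used; prefer adding storage or recovering the target node"
    fi
}

# Usage on a coordinator:
#   hints_disk_verdict "$(disk_used_pct /var/lib/cassandra/hints)" 90
```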
Affected by CASSANDRA-19495
If you run Cassandra 4.1.0 through 4.1.4 and the node experienced a second outage, upgrade to 4.1.5 or 5.0+ before bringing the node back. After upgrade, run full repair.
Temporarily extending the window
If your mean time to recovery is consistently near or above three hours, raise max_hint_window_in_ms. In Cassandra 4.0+, use nodetool setmaxhintwindow to adjust the running JVM. A longer window increases disk usage on every coordinator and prolongs hint replay.
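The window is expressed in milliseconds, which is easy to mistype. A small sketch that converts hours and prints the command rather than running it; `hours_to_ms` is a hypothetical helper and the six-hour value is only an example:

```shell
#!/bin/sh
# Convert a window in hours to the millisecond value nodetool expects.
hours_to_ms() {
    echo $(( $1 * 60 * 60 * 1000 ))
}

# Print the command for a six-hour window instead of running it.
echo "nodetool setmaxhintwindow $(hours_to_ms 6)"
```

A change made this way applies to the running process only, so update the configuration file as well if the new value should survive a restart.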
Prevention
- Treat a node DOWN for more than 80% of max_hint_window_in_ms as a repair mandate. If you cannot recover the node within roughly two and a half hours, schedule full repair before it rejoins or immediately after.
- Monitor hints directory size on all coordinators. A runaway hint backlog indicates an extended outage or a very slow replica.
- Automate repair scheduling with Reaper or equivalent to ensure repairs complete within gc_grace_seconds. Repair is the only safety net once hints expire.
- Size the hint delivery throttle for your workload. After any outage longer than thirty minutes, check whether the default 1024 KiB/s throttle will drain hints before compaction or GC pressure builds. Adjust proactively with nodetool sethintedhandoffthrottlekb.
- Do not disable hinted handoff. Running with hinted_handoff_enabled: false removes the safety net entirely and guarantees divergence on any replica downtime.
- Upgrade past CASSANDRA-19495. If you run Cassandra 4.1.x, ensure you are on 4.1.5 or newer.
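The 80% rule in the first item can be computed rather than remembered. A minimal sketch with hypothetical helper names; with the default window it yields 8640 seconds, which is two hours and 24 minutes:

```shell
#!/bin/sh
# Seconds of downtime after which a repair should be planned.
# Arguments: hint window in milliseconds, threshold percentage.
repair_deadline_seconds() {
    echo $(( $1 / 1000 * $2 / 100 ))
}

# Succeeds (exit 0) when downtime has reached the threshold.
# Arguments: downtime in seconds, hint window in ms, threshold percentage.
needs_repair_plan() {
    [ "$1" -ge "$(repair_deadline_seconds "$2" "$3")" ]
}

repair_deadline_seconds 10800000 80   # prints 8640
```

A monitoring job could call needs_repair_plan with the observed downtime and open a repair ticket as soon as it succeeds.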
How Netdata helps
- Correlate Storage.TotalHints counter growth with node liveness transitions to identify when hint generation started and whether it stopped before recovery.
- Monitor hints directory size growth rate alongside disk space alerts to catch coordinator disk exhaustion before accumulated hint files fill the volume.
- Cross-reference HintsDispatcher activity with write latency spikes on recovered nodes to detect replay overload before it cascades into GC pressure.
- Alert when a node remains DOWN for a sustained duration approaching max_hint_window_in_ms. This gives a proactive window to schedule repair before silent divergence begins.
Related guides
- Cassandra compaction strategies: STCS vs LCS vs TWCS vs UCS
- Cassandra compaction death spiral: when writes outrun compaction throughput
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra zombie data resurrection: gc_grace_seconds and unrepaired tombstones
- Cassandra disk space exhaustion: emergency recovery when the data volume fills
- Cassandra dropped mutations: silent write loss and load shedding
- Cassandra dropped reads and other messages: reading nodetool tpstats Dropped
- Cassandra GC death spiral: long pauses, gossip flapping, and recovery
- Cassandra GC pauses too long: diagnosing G1 stop-the-world pauses
- Cassandra gossip flapping: nodes bouncing UP and DOWN
- Cassandra heap pressure: sizing the JVM heap and tuning G1GC
- Cassandra monitoring checklist: the signals every production cluster needs







