Cassandra GC pauses too long: diagnosing G1 stop-the-world pauses

ReadTimeoutException and WriteTimeoutException errors on clients, GCInspector warnings in system.log, and nodes flapping between UP and DOWN in nodetool status without a JVM restart together point to long G1 stop-the-world pauses. Common root causes are promotion pressure, humongous objects, and allocation bursts. Left unchecked, one node’s pauses trigger gossip failures, retries, and hint replay that degrade the whole cluster.

What this means

G1GC is the default collector for Cassandra 4.x on JDK 11+. During a stop-the-world pause, every application thread freezes, including gossip, native transport, and compaction. Cassandra logs GCInspector warnings when a pause exceeds the configured threshold, commonly 500 ms. Pauses longer than ~2 seconds cause gossip rounds to be missed; under the default phi accrual failure detector threshold of 8, sustained pauses of tens of seconds result in the node being marked DOWN. While the JVM is paused, mutations queue, reads stall, hints accumulate on peers, and clients retry. On recovery, hint replay and retry bursts raise allocation pressure, creating a self-reinforcing spiral.

flowchart TD
    A[Heap pressure] --> B[Long G1 STW pause]
    B --> C[Gossip missed]
    C --> D[Node marked DOWN]
    D --> E[Hints and retries]
    E --> F[Replay burst on recovery]
    F --> A
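
To see how often pauses cross the GCInspector threshold described above, the log lines can be filtered by duration. This is a minimal sketch, assuming GCInspector lines contain the text "GC in <N>ms" (the usual wording in system.log); the function name gc_pauses_over is hypothetical.

```shell
# Print "<pause_ms><TAB><log line>" for every GCInspector line whose pause
# is at or above the threshold in ms ($1, default 500).
# Reads system.log content on stdin.
gc_pauses_over() {
  awk -v min="${1:-500}" '
    /GCInspector/ && match($0, /GC in [0-9]+ms/) {
      # "GC in " is 6 characters and "ms" is 2, so the digits sit in between.
      ms = substr($0, RSTART + 6, RLENGTH - 8) + 0
      if (ms >= min + 0) print ms "\t" $0
    }'
}
```

Typical use would be gc_pauses_over 2000 < /var/log/cassandra/system.log to list only the pauses long enough to threaten gossip; the log path is an assumption and varies by install.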

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Undersized heap or high promotion pressure | Heap floor after full GC stays above 70-75% of max; old-gen occupancy trending up | nodetool info and GC log “Heap after GC” lines |
| Large partition reads or tombstone scans | P99 latency spikes correlate with specific queries; tombstone warnings in logs | nodetool toppartitions and log greps |
| Humongous objects from undersized G1 regions | GC log shows frequent humongous allocations with auto-calculated regions | grep -i "humongous" /var/log/cassandra/gc.log* |
| Surge in allocation rate (batches, hint replay) | Young GC frequency rises sharply; hint dispatcher active after node recovery | jstat -gcutil YGC/YGCT columns; nodetool tpstats |
| Oversized on-heap caches | Heap floor is high but allocation rate is normal; key cache or row cache large | nodetool info cache lines; jstat -gcutil O column |

Quick checks

# Heap used vs max
nodetool info | grep -i "Heap Memory"

# Cumulative GC counts and time
nodetool gcstats

# Recent pause durations
grep -i "pause" /var/log/cassandra/gc.log* | tail -20

# Old gen occupancy live (O column)
jstat -gcutil <cassandra_pid> 1000

# Dropped messages indicating overload
nodetool tpstats

# Node liveness flapping
nodetool status

# Compaction backlog adding heap pressure
nodetool compactionstats
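
The heap check above prints used and max in MB; turning that into a percentage makes it easier to compare against the 70-75% guideline. A small sketch, assuming the nodetool info line has the form "Heap Memory (MB) : used / max"; heap_used_pct is a hypothetical helper name. Note that this is instantaneous usage, not the post-GC floor.

```shell
# Read `nodetool info` output on stdin and print heap used as a percentage of max.
heap_used_pct() {
  awk -F: '/^Heap Memory \(MB\)/ {
    split($2, parts, "/")            # parts[1] = used MB, parts[2] = max MB
    if (parts[2] + 0 > 0) printf "%.1f\n", 100 * parts[1] / parts[2]
  }'
}
```

Usage: nodetool info | heap_used_pct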

How to diagnose it

  1. Quantify pauses. Parse the GC log for G1 Young and G1 Old pause times. Look for GCInspector lines in system.log exceeding 500 ms. Sustained pauses over 2 seconds predict gossip disruption.
  2. Find the heap floor. After a full GC completes, check the “Heap after GC” value in the log, or sample old generation occupancy with jstat -gcutil. If the floor is above 70-75% of max, the heap is effectively full of long-lived objects.
  3. Map pauses to workload. Correlate pause timestamps with query patterns. Use nodetool toppartitions to identify large partitions being read. Search system.log for tombstone scan warnings around those times. Check nodetool tablestats for max partition size per table.
  4. Check G1 region sizing. If using G1 with auto-calculated regions, inspect the GC log for humongous object allocations. G1 allocates any object of about half a region or larger as humongous, so small auto-calculated regions turn medium-sized objects into humongous allocations. Set an explicit -XX:G1HeapRegionSize matched to your heap; it must be a power of two between 1 MB and 32 MB.
  5. Measure allocation rate. A climbing young generation collection count (YGC) with short intervals indicates high allocation pressure from batches, retries, or hint replay.
  6. Check cache footprint. Run nodetool info and review key cache and row cache size. Row cache is disabled by default and should generally stay disabled; if enabled and large, it competes for old generation space.
  7. Assess cluster-wide impact. Check nodetool status for flapping and nodetool tpstats for dropped MUTATION or READ messages. If dropped mutations are nonzero, replicas have already missed writes and stay inconsistent until hints, read repair, or a full repair close the gap.
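
Steps 2 and 5 can be approximated from a single jstat -gcutil capture. The sketch below assumes the HotSpot column order in which O (old generation %) is the 4th column and YGC (young collection count) is the 7th; check the header line of your own output before relying on it. Both function names are hypothetical.

```shell
# Lowest old-gen occupancy (%) across samples: a rough stand-in for the heap
# floor, since the minimum is normally observed just after a collection.
old_gen_floor() {
  awk '$4 ~ /^[0-9.]+$/ { if (min == "" || $4 + 0 < min) min = $4 + 0 }
       END { if (min != "") print min }'
}

# Average young collections per sampling interval:
# (last YGC - first YGC) / number of intervals.
ygc_per_interval() {
  awk '$7 ~ /^[0-9]+$/ { if (n == 0) first = $7; last = $7; n++ }
       END { if (n > 1) printf "%.2f\n", (last - first) / (n - 1) }'
}
```

For example, jstat -gcutil <cassandra_pid> 1000 60 > /tmp/gcutil.txt captures one minute of samples that can be fed to either function. A floor that stays above 70-75%, or a young-collection rate that jumps after a node rejoins, matches the patterns described in the list above.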

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| GC pause duration | Direct STW impact on availability | >500 ms (GCInspector warns); >2s risks gossip failure |
| Heap floor after full GC | Leading indicator of promotion pressure; old gen cannot be reclaimed | >70-75% of -Xmx sustained after old GC |
| Old generation % used (jstat -gcutil O) | Old gen fill rate between collections | Trending upward over days |
| Dropped messages (all types) | Node is shedding load while paused | Non-zero rate sustained for >60s |
| Node liveness transitions | Phi detector convicting node due to missed heartbeats | >2 UP/DOWN transitions in 10 minutes |
| Pending compactions | Compaction debt increases metadata and bloom filter heap pressure | Trending upward over >4 hours |
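
Without a metrics pipeline, the liveness-transition signal can be estimated from a peer's system.log. This sketch assumes the Gossiper messages "InetAddress <addr> is now UP" and "InetAddress <addr> is now DOWN"; count_liveness_flaps is a hypothetical name, and the function does no time windowing, so pre-filter the log to the period of interest.

```shell
# Count UP/DOWN gossip transitions per peer address, busiest peer first.
# Reads system.log content on stdin.
count_liveness_flaps() {
  awk '/InetAddress .* is now (UP|DOWN)/ {
         # The address is the field immediately after the word "InetAddress".
         for (i = 1; i <= NF; i++) if ($i == "InetAddress") count[$(i + 1)]++
       }
       END { for (peer in count) print count[peer], peer }' | sort -rn
}
```

Usage: grep '2024-01-01 10:0' /var/log/cassandra/system.log | count_liveness_flaps would count transitions inside one ten-minute window; the timestamp and path here are placeholders.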

Fixes

Immediate load reduction

If the node is flapping and dropping messages, run nodetool disablebinary to stop the native (CQL) transport while keeping the node in the ring. The node stops coordinating client requests but still serves replica traffic from its peers, which cuts allocation pressure without a restart. Warning: this disconnects every client from the node; use it only when the node is already missing client deadlines. Re-enable client traffic with nodetool enablebinary once pauses subside. Identify the memory consumer before bouncing the process.
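
The load-shedding step can be wrapped so that the transport state is always printed after a change. A minimal sketch: disablebinary, enablebinary, and statusbinary are standard nodetool subcommands, while the function names and the NODETOOL override variable are hypothetical conveniences.

```shell
# Allow the nodetool binary to be overridden (for example, a wrapper that adds
# JMX credentials).
NODETOOL="${NODETOOL:-nodetool}"

# Stop the native (CQL) transport, then report its state.
shed_client_load() {
  "$NODETOOL" disablebinary && "$NODETOOL" statusbinary
}

# Re-enable client traffic once pauses have subsided, then report state.
restore_client_load() {
  "$NODETOOL" enablebinary && "$NODETOOL" statusbinary
}
```

Run shed_client_load on the affected node only, and follow up with restore_client_load after GC pauses return to normal.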

Heap sizing

If the heap floor is above 75%, increase the heap only if it is below 16 GB. G1 pause duration scales with heap size and object graph complexity; heaps above 16 GB often see worse STW behavior under write load. Stay below the compressed-OOPs threshold (roughly 32 GB; 31 GB is a safe ceiling). Always set -Xms equal to -Xmx to avoid resize cost.
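
The sizing rules above (equal -Xms and -Xmx, and a ceiling of about 31 GB) can be checked mechanically against a JVM options file. This is a sketch under assumptions: flags appear one per line and uncommented, and check_heap_flags is a hypothetical helper, not a Cassandra tool.

```shell
# Read JVM options on stdin (one flag per line) and validate -Xms/-Xmx.
# Prints a one-line verdict; exits non-zero on any problem.
check_heap_flags() {
  awk '
    function to_mb(v,   unit) {
      unit = tolower(substr(v, length(v), 1))
      if (unit == "g") return (v + 0) * 1024
      if (unit == "m") return v + 0
      if (unit == "k") return (v + 0) / 1024
      return (v + 0) / 1048576            # no suffix means bytes
    }
    /^-Xms/ { xms = to_mb(substr($1, 5)) }
    /^-Xmx/ { xmx = to_mb(substr($1, 5)) }
    END {
      if (!xms || !xmx)    { print "missing -Xms or -Xmx"; exit 1 }
      if (xms != xmx)      { print "mismatch: Xms=" xms "MB Xmx=" xmx "MB"; exit 1 }
      if (xmx > 31 * 1024) { print "too large for compressed OOPs: " xmx "MB"; exit 1 }
      print "ok: " xmx "MB"
    }'
}
```

Usage: check_heap_flags < /etc/cassandra/jvm-server.options, where the path is an assumption that depends on how Cassandra was installed. If the heap is set through environment variables instead, the flags will not appear in that file and the check reports them as missing.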

G1 tuning

If the heap is appropriately sized but pauses remain long:

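  • Set an explicit -XX:G1HeapRegionSize to avoid small auto-calculated regions that inflate humongous object overhead. Choose a power of two between 1 MB and 32 MB that is large enough for your typical large allocations to stay below half a region, the size at which G1 treats an object as humongous.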
  • On hosts with many cores, raise -XX:ParallelGCThreads and -XX:ConcGCThreads if JVM defaults leave cores idle during STW phases.
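
To judge whether auto-calculated regions are too small, it helps to estimate what the JVM would pick. The sketch below approximates the G1 ergonomic rule (target roughly 2048 regions, round down to a power of two, clamp to 1-32 MB) for a heap where -Xms equals -Xmx. The exact heuristic varies between JDK releases, so treat the result as an estimate and confirm the real value in the GC log. Both function names are hypothetical.

```shell
# Approximate ergonomic G1 region size in MB for a fixed heap of $1 MB.
g1_auto_region_mb() {
  target=$(( $1 / 2048 ))          # heap divided by the target region count
  region=1                         # lower clamp: 1 MB
  while [ $(( region * 2 )) -le "$target" ] && [ "$region" -lt 32 ]; do
    region=$(( region * 2 ))       # largest power of two not above target
  done
  echo "$region"
}

# Approximate humongous threshold in KB for a region size of $1 MB:
# objects of about half a region or more are allocated as humongous.
g1_humongous_threshold_kb() {
  echo $(( $1 * 1024 / 2 ))
}
```

For an 8 GB heap, g1_auto_region_mb 8192 yields 4, and g1_humongous_threshold_kb 4 yields 2048, meaning allocations of roughly 2 MB and up would be humongous. Raising the region size to 16 MB moves that threshold to about 8 MB.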

Workload and schema fixes

  • Reduce batch statement sizes and limit in-flight requests.
  • Fix large partitions identified by nodetool toppartitions; redesign data models to bound partition size.
  • For TTL-heavy workloads, switch affected tables to TimeWindowCompactionStrategy so tombstones are efficiently dropped in whole time windows, reducing tombstone-induced GC pressure.
  • Disable row cache if enabled; it stores full rows on-heap and is disabled by default for good reason.

Prevention

  • Monitor the heap floor after full GC, not just peak heap usage. The floor is the irreducible minimum of long-lived objects; when it trends upward, promotion pressure is building.
  • Keep G1 heaps in the 8-16 GB range for most workloads. Larger heaps increase STW duration under G1.
  • Set explicit G1HeapRegionSize during provisioning rather than relying on JVM auto-calculation.
  • Track pending compactions and SSTable count trends. Compaction backlog increases on-heap metadata and bloom filter overhead.
  • Monitor tombstone warnings and partition size distributions with nodetool toppartitions or slow query logging to catch data-model issues before they trigger allocation spikes.
  • Run repair during off-peak hours; the I/O and streaming load competes for resources and can prolong pauses.

How Netdata helps

  • Correlate G1 Young and Old Generation CollectionTime with node liveness transitions and dropped messages on the same timeline to confirm GC-induced gossip failures.
  • Alert on post-collection old-generation occupancy drawn from JMX to catch heap floor growth before pauses breach 500 ms.
  • Visualize thread pool pending tasks and compaction queues alongside GC metrics to distinguish pure heap pressure from compaction-driven memory growth.
  • Track client request timeout rates against GC pause duration to quantify the latency cost of each STW event.
  • Monitor off-heap memory growth (RSS minus JVM heap) to catch Linux OOM kills caused by native memory exhaustion rather than GC pressure.

Related guides

  • Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM: /guides/cassandra/cassandra-consistency-levels-explained/
  • Cassandra GC death spiral: long pauses, gossip flapping, and recovery: /guides/cassandra/cassandra-gc-death-spiral/
  • Cassandra monitoring checklist: the signals every production cluster needs: /guides/cassandra/cassandra-monitoring-checklist/
  • Cassandra monitoring maturity model: from survival to expert: /guides/cassandra/cassandra-monitoring-maturity-model/
  • How Cassandra actually works in production: a mental model for operators: /guides/cassandra/how-cassandra-works-in-production/