Cassandra java.lang.OutOfMemoryError: Java heap space - causes and recovery

Your Cassandra log shows java.lang.OutOfMemoryError: Java heap space. The node stops responding to client requests, gossip marks it DOWN, and the JVM may crash. Because Cassandra runs as a single JVM process, heap exhaustion stalls every subsystem that allocates on the heap: memtables, prepared statement caches, the key cache, and in-flight request buffers.

The heap may climb toward its limit for hours, or a single massive allocation from a large partition read or oversized batch can push it over immediately. Recovery depends on whether the root cause is capacity, data modeling, or a traffic flood.

What this means

Cassandra keeps memtables (in part, depending on memtable_allocation_type), key cache entries, and in-flight requests on the JVM heap; depending on version and configuration, the row cache and compression metadata may add to it as well. When garbage collection cannot free enough space for a new allocation, the JVM throws OutOfMemoryError in the allocating thread. The node may hang in repeated full GC cycles, drop messages until queues expire, or crash. Once the heap is exhausted, the node cannot even allocate gossip messages, so peers mark it DOWN while the process is still alive.

This is distinct from the Linux OOM killer. The OS terminates the process when total RSS, including off-heap bloom filters and direct buffers, exceeds available RAM. Because off-heap memory is invisible to JVM heap metrics, the OS can kill the node while HeapMemoryUsage looks healthy. A process death without a Java stack trace points to off-heap or RSS pressure.
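To make the distinction concrete, here is a minimal shell sketch that classifies a process death from two log files. The function name classify_oom is hypothetical, and the kernel message wording and the process name java are assumptions that vary by kernel version and by how Cassandra is launched.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a dead Cassandra process was killed by the kernel
# or ran out of JVM heap. Both log paths are supplied by the caller.

# classify_oom KERNEL_LOG SYSTEM_LOG
# Prints one of: kernel-oom-killer, jvm-heap-oom, unknown
classify_oom() {
  local kernel_log="$1" system_log="$2"
  # Assumed kernel wording: "Kill process" (older kernels) or
  # "Killed process" (newer kernels), then the PID and command name.
  if grep -Eq 'Out of memory: Kill(ed)? process [0-9]+ \(java\)' "$kernel_log" 2>/dev/null; then
    echo "kernel-oom-killer"
  elif grep -q 'java.lang.OutOfMemoryError: Java heap space' "$system_log" 2>/dev/null; then
    echo "jvm-heap-oom"
  else
    echo "unknown"
  fi
}
```

One way to use it is to save the kernel ring buffer first, for example with dmesg -T > /tmp/kernel.log, and then run classify_oom /tmp/kernel.log /var/log/cassandra/system.log.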

flowchart TD
    A[Large partition read or oversized batch] --> B[Heap fills rapidly]
    C[Row cache enabled] --> B
    D[Repair Merkle trees] --> B
    B --> E[GC pauses lengthen]
    E --> F[Gossip times out]
    F --> G[Node marked DOWN]
    G --> H[Hints accumulate on peers]
    H --> I[Hint replay floods node on recovery]
    I --> B
    E --> J[OutOfMemoryError]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Heap undersized for workload | Heap used/max ratio climbs steadily after startup and stays above 75% | `nodetool info \| grep -i "Heap Memory"` |
| Large partition reads | Sudden OOM during specific queries or ETL jobs; GC logs show humongous allocations | `nodetool tablestats <keyspace>` (max partition size) |
| Oversized batch statements | Coordinator OOM under write load; batch size warnings in system logs | `grep "exceeding specified threshold" /var/log/cassandra/system.log` |
| Row cache misconfigured | Heap pressure on read-heavy nodes with row cache enabled | `nodetool info \| grep -i "Row Cache"` |
| In-flight request flood | Heap spikes during traffic surges or client retry storms | `nodetool tpstats` (Native-Transport-Requests pending) |
| Concurrent repair or anti-compaction | OOM during repair windows; Merkle trees consume substantial heap | `nodetool compactionstats` and repair schedule |

Quick checks

# Heap utilization and max
nodetool info | grep -i "Heap Memory"

# Generation breakdown without triggering GC (GC.heap_info requires JDK 9+)
jcmd $(pgrep -f CassandraDaemon) GC.heap_info

# Sample GC live
jstat -gcutil $(pgrep -f CassandraDaemon) 1000

# Inspect GC logs for long pauses or full collections.
# Format varies by JDK and GC flags; look for repeated Full GC events
# that reclaim little memory.
grep -Ei "pause|Full GC" /var/log/cassandra/gc.log*

# System logs for OOM or GC warnings
grep -i "OutOfMemoryError\|GCInspector" /var/log/cassandra/system.log

# Thread pool saturation
nodetool tpstats

# Max partition size per table
nodetool tablestats <keyspace>

# Hot partitions (sample 1 second)
nodetool toppartitions <keyspace> <table> 1000

# Active repair streams
nodetool netstats

# Row cache metrics
nodetool info | grep -i "Row Cache"

# Compaction backlog
nodetool compactionstats
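
The heap check above can be turned into a single number for scripting. This is a minimal sketch, assuming the nodetool info line has the form "Heap Memory (MB) : used / max"; the function name heap_used_pct is hypothetical.

```shell
#!/usr/bin/env bash
# heap_used_pct: reads `nodetool info` output on stdin and prints heap
# used as an integer percentage of max (truncated, not rounded).
heap_used_pct() {
  # Anchor on the start of the line so "Off Heap Memory (MB)" is skipped.
  # Splitting on ":" and "/" leaves used in $2 and max in $3.
  awk -F'[:/]' '/^Heap Memory/ && $3 + 0 > 0 { printf "%d\n", ($2 / $3) * 100 }'
}
```

Typical use would be nodetool info | heap_used_pct. Note that grep -i "Heap Memory" on its own also matches the Off Heap Memory line, which is why the sketch anchors the pattern.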

How to diagnose it

  1. Confirm heap exhaustion. Run nodetool info | grep -i "Heap Memory" and compare used to max. If used is near max and the node is still running, the JVM is likely in a GC death spiral.
  2. Inspect GC logs. Look for full GC events that reclaim little memory. Sustained old-generation pauses over 2 seconds indicate the heap is functionally exhausted. On G1GC, watch for to-space exhausted or humongous allocation messages, which signal massive object promotion or region size pressure.
  3. Check generation sizing. jcmd <pid> GC.heap_info reports Eden, Survivor, and Old space utilization without triggering a collection. If Old Gen is nearly full and does not recover between cycles, the heap is exhausted.
  4. Identify the allocation source. If the node is running but impaired, jmap -histo:live <pid> | head -30 shows the largest live object types. This triggers a full GC and can freeze the JVM for seconds to minutes. Use only on an already impaired node.
  5. Find large partitions. Run nodetool tablestats <keyspace> and check the maximum partition size, reported per table as Compacted partition maximum bytes. If any partition approaches a significant fraction of the heap, reading it can trigger OOM.
  6. Review batch warnings. Search /var/log/cassandra/system.log for exceeding specified threshold to find oversized batches.
  7. Check cache configuration. If row cache is enabled, verify whether the hit rate justifies the heap cost. A low hit rate means wasted memory.
  8. Correlate with traffic. Check nodetool tpstats for pending tasks in Native-Transport-Requests and client request rates. Retry storms after timeouts can flood the heap.
  9. Check repair and compaction timing. Anti-entropy repair builds Merkle trees that consume substantial heap. Multiple concurrent repairs multiply this cost. Overlapping repair with peak traffic exhausts heap quickly. Verify with nodetool netstats if streams are active and nodetool compactionstats if compactions are backlogged.
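
The traffic correlation in step 8 can be sketched as a small filter over nodetool tpstats output. It assumes the pool table keeps the column order Pool Name, Active, Pending, which may differ between versions; the function name pools_with_pending is hypothetical.

```shell
#!/usr/bin/env bash
# pools_with_pending: reads `nodetool tpstats` output on stdin and prints
# "<pool> <pending>" for each watched pool whose Pending column is non-zero.
# Assumes column 1 is the pool name and column 3 is Pending.
pools_with_pending() {
  awk '$1 ~ /^(ReadStage|MutationStage|Native-Transport-Requests)$/ && $3 + 0 > 0 { print $1, $3 }'
}
```

Running nodetool tpstats | pools_with_pending a few times in a row shows whether the backlog is transient or sustained; a single non-zero sample is not conclusive.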

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| JVM Heap Usage (HeapMemoryUsage.used / max) | Direct memory pressure indicator | > 75% sustained after full GC |
| GC Pause Duration | Long pauses freeze gossip and client threads | > 2 seconds |
| DroppedMessage rate | Node shedding load when queues expire | Non-zero rate of MUTATION or READ drops |
| ThreadPools pending tasks | Internal backpressure building | Pending > 0 in ReadStage or MutationStage for > 60 s |
| LiveSSTableCount | More SSTables increase metadata overhead and read amplification | Growing steadily over days |
| RowCache hit rate | Wasted heap if enabled but ineffective | Enabled and hit rate low |
| ClientRequest timeouts | Tail latency from GC pressure | Sustained timeout rate > 0 |
| RSS minus JVM heap | Off-heap memory can trigger Linux OOM killer | Total RSS approaching system RAM |
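
The last signal is simple arithmetic, sketched below. The function name non_heap_rss_mb is hypothetical, and the result is only a rough estimate: it assumes the whole heap is resident, which holds when -Xms equals -Xmx and the pages have been touched.

```shell
#!/usr/bin/env bash
# non_heap_rss_mb RSS_KB HEAP_MAX_MB
# Prints resident memory not accounted for by the maximum heap, in MB.
# RSS_KB is in kilobytes, as reported by `ps -o rss=`.
non_heap_rss_mb() {
  local rss_kb="$1" heap_max_mb="$2"
  echo $(( rss_kb / 1024 - heap_max_mb ))
}
```

For example, non_heap_rss_mb "$(ps -o rss= -p "$pid")" 8192, where pid holds the Cassandra process ID and 8192 is the configured maximum heap in MB.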

Fixes

Increase heap size if undersized

If the workload legitimately requires more heap than allocated, increase -Xmx and -Xms together. The default startup scripts usually set them to the same value; if you override heap sizing manually, keep them equal. The recommended range is 8-16 GB for most workloads. Beyond 16 GB, G1GC pause duration usually increases, and at roughly 32 GB the JVM disables compressed oops, so object references double in size and a heap slightly above that threshold holds less than one slightly below it. Before expanding, benchmark pause behavior with your actual SSTable count and query pattern. Do not raise the heap without confirming the root cause; a memory leak or an unbounded partition will eventually fill a heap of any size.
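
A quick way to verify that the two flags match is sketched below. The function name check_heap_flags is hypothetical, and the file to check depends on the installation: conf/jvm.options on 3.x and conf/jvm-server.options on 4.x are common locations, while some setups set MAX_HEAP_SIZE in cassandra-env.sh instead, which this sketch does not read.

```shell
#!/usr/bin/env bash
# check_heap_flags FILE
# Prints "ok <size>" when -Xms and -Xmx match, "mismatch <xms> <xmx>" when
# they differ, and "unset" when either flag is missing or commented out.
# Sizes are compared as text, so 8G and 8192M count as a mismatch.
check_heap_flags() {
  local file="$1" xms xmx
  # Only uncommented lines that start with the flag are considered;
  # if a flag appears more than once, the last occurrence is used.
  xms=$(sed -n 's/^-Xms\([0-9][0-9]*[kKmMgG]\).*/\1/p' "$file" | tail -n 1)
  xmx=$(sed -n 's/^-Xmx\([0-9][0-9]*[kKmMgG]\).*/\1/p' "$file" | tail -n 1)
  if [ -z "$xms" ] || [ -z "$xmx" ]; then
    echo "unset"
  elif [ "$xms" = "$xmx" ]; then
    echo "ok $xms"
  else
    echo "mismatch $xms $xmx"
  fi
}
```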

Stop large partition reads

Large partition reads can allocate buffers proportional to partition size. Use nodetool tablestats <keyspace> to find the maximum partition size, and nodetool toppartitions <keyspace> <table> 1000 to spot hot partitions in real time. If a partition approaches a significant fraction of the heap, fix the data model: switch to a composite partition key or add a bucketing column (for example, a date bucket) so no single partition grows unbounded. Enforce driver fetchSize limits so wide partitions are paged instead of materialized whole on the coordinator. As an emergency measure, block the offending query pattern at the application layer or run nodetool disablebinary to stop accepting new requests while you stabilize the node.
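
Scanning tablestats output by eye gets tedious on keyspaces with many tables. The sketch below filters it down to tables whose largest partition exceeds a byte limit. It assumes each table section starts with a "Table: <name>" line and includes a "Compacted partition maximum bytes: <n>" line; the function name large_partitions and the 100 MB limit in the usage note are illustrative, not recommendations from Cassandra.

```shell
#!/usr/bin/env bash
# large_partitions LIMIT_BYTES
# Reads `nodetool tablestats <keyspace>` output on stdin and prints
# "<table> <max partition bytes>" for each table above LIMIT_BYTES.
large_partitions() {
  awk -v limit="$1" '
    # Remember the most recent table name seen.
    /^[[:space:]]*Table:/ { table = $NF }
    # The last field on this line is the size in bytes.
    /Compacted partition maximum bytes:/ && $NF + 0 > limit + 0 { print table, $NF }
  '
}
```

For example, nodetool tablestats <keyspace> | large_partitions 104857600 lists tables with a partition larger than 100 MB.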

Eliminate oversized batches

Unlogged batches spanning many partitions still create coordinator-side heap pressure because the node must hold and fan out every mutation. Search system logs for the phrase exceeding specified threshold to identify the application code generating them. Reduce batch sizes and switch to smaller, asynchronous writes. Set batch_size_fail_threshold_in_kb in cassandra.yaml to reject oversized batches at the coordinator.
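
To see which tables the oversized batches target, the warnings can be grouped by table. This is a sketch under the assumption that each warning contains the text "for [keyspace.table]" before "exceeding specified threshold"; the exact wording differs between Cassandra versions, and the function name batch_offenders is hypothetical.

```shell
#!/usr/bin/env bash
# batch_offenders: reads system.log lines on stdin and prints a count of
# batch-size warnings per table list, most frequent first.
batch_offenders() {
  grep 'exceeding specified threshold' \
    | grep -o 'for \[[^]]*]' \
    | sed 's/^for //' \
    | sort | uniq -c | sort -rn \
    | sed 's/^ *//'
}
```

A typical invocation is batch_offenders < /var/log/cassandra/system.log. The table names then point to the application code paths that build the batches.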

Disable or shrink row cache

Row cache is disabled by default. If you enabled it and the hit rate is low, the cache consumes old-generation space for no benefit. Disable it or reduce row_cache_size_in_mb in cassandra.yaml (named row_cache_size from Cassandra 4.1 onward).
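
Whether the hit rate is "low" can be checked from the Row Cache line of nodetool info. The sketch below assumes that line contains the fragments "<n> hits," and "<n> requests,"; the function name row_cache_hit_pct is hypothetical. It reports the cumulative rate since startup, which can differ from the recent hit rate that nodetool prints.

```shell
#!/usr/bin/env bash
# row_cache_hit_pct: reads `nodetool info` output on stdin and prints the
# cumulative row cache hit rate as an integer percentage (truncated),
# or "no-requests" when the cache has served no requests.
row_cache_hit_pct() {
  awk '/^Row Cache/ {
    # Each counter is the field immediately before its label.
    for (i = 2; i <= NF; i++) {
      if ($i == "hits,") hits = $(i - 1)
      if ($i == "requests,") reqs = $(i - 1)
    }
    if (reqs + 0 > 0) printf "%d\n", (hits / reqs) * 100
    else print "no-requests"
  }'
}
```

Use it as nodetool info | row_cache_hit_pct and compare the result against the heap the cache occupies.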

Throttle in-flight requests

Client retry storms after timeouts can create a feedback loop: more requests fill the heap, causing more timeouts, causing more retries. Reduce driver connection pool sizes and disable speculative retry temporarily. In an emergency, run nodetool disablebinary to stop accepting new native transport requests while the node drains in-flight work.

Reschedule repair and compaction

Merkle trees consume substantial heap per repair session. Running multiple repairs concurrently or overlapping repair with peak traffic can exhaust heap. Schedule repair during off-peak hours and limit concurrent repairs. For compaction, ensure compaction_throughput_mb_per_sec is not set so low that compaction falls behind, because compaction debt also increases heap pressure from SSTable metadata.
