Cassandra killed by the Linux OOM killer: off-heap memory and RSS
The JVM heap chart shows 50% utilization and a flat line. There is no OutOfMemoryError. Then the Cassandra process vanishes. dmesg shows the OOM killer terminated the JVM: Killed process 12345 (java). The JVM heap metric does not include native allocations: bloom filters, compression metadata, index summaries, direct buffers, and chunk cache. When heap plus off-heap RSS exceeds available RAM, the kernel kills the process. This guide covers how to confirm that pattern, reduce off-heap footprint, and prevent recurrence.
What this means
Cassandra stores critical data structures in native memory, outside JVM GC control. Bloom filters are off-heap by default since 3.x. Compression metadata, index summaries, and direct ByteBuffers for network I/O also live off-heap. Cassandra 4.0 and later add an off-heap chunk cache for SSTable data. If you configure off-heap memtables, cell data bypasses the heap entirely. None of these appear in standard JVM heap metrics.
The OOM killer evaluates total process RSS. If heap is 8 GB and off-heap structures consume another 8 GB, the process needs 16 GB of RAM. If the node or container only has 14 GB available, the kernel kills Cassandra even though the JVM believes it has headroom. A common misdiagnosis is assuming healthy heap usage means headroom exists, while RSS climbs toward the physical limit.
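The arithmetic in this example can be checked with a few lines of shell. This is a minimal sketch: the helper name check_budget is illustrative, all values are in megabytes, and the off-heap figure is an estimate you supply rather than something the JVM reports.

```shell
# Hypothetical helper: compare heap + estimated off-heap against available RAM.
# All three arguments are in megabytes.
check_budget() {
    heap_mb=$1
    offheap_mb=$2
    available_mb=$3
    needed_mb=$((heap_mb + offheap_mb))
    if [ "$needed_mb" -gt "$available_mb" ]; then
        echo "OVER by $((needed_mb - available_mb)) MB (needs $needed_mb MB, has $available_mb MB)"
    else
        echo "OK with $((available_mb - needed_mb)) MB spare (needs $needed_mb MB, has $available_mb MB)"
    fi
}

# The scenario from the text: 8 GB heap, 8 GB off-heap, 14 GB available.
check_budget 8192 8192 14336
```

For the scenario above the helper reports a 2048 MB shortfall, which is the gap the kernel closes by killing the process.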
nodetool info reports an incomplete picture. It shows JVM heap and a subset of off-heap components: bloom filters, compression metadata, index summaries, and memtable off-heap usage. It does not include the chunk cache, key cache, direct buffers, or memory-mapped file cache. A node that reports 8 GB of heap and 4 GB of off-heap via nodetool info can easily consume 16 GB or more of RSS.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Bloom filter growth from SSTable accumulation | RSS grows while heap stays flat; read latency climbs from read amplification | nodetool tablestats bloom filter off-heap size and SSTable count |
| Chunk cache or off-heap memtables oversized | RSS jumps after upgrading to Cassandra 4.0+ or after changing allocation type | file_cache_size_in_mb and memtable_offheap_space_in_mb in cassandra.yaml |
| Direct buffer accumulation | High connection count or large payloads; Netty allocates direct ByteBuffers off-heap | Connected native client count and connection patterns |
| Compaction debt amplifying metadata | Pending compactions rise for days; each SSTable carries bloom filter and index overhead | nodetool compactionstats and nodetool tablestats per table |
| Container cgroup limit too tight | Process dies on a container well before node RAM is exhausted | Container memory limit versus JVM max heap plus off-heap estimate |
Quick checks
Run these read-only commands to confirm the gap between heap and total RSS.
# Total process RSS in kilobytes
PID=$(pgrep -f CassandraDaemon | head -n 1)
[ -n "$PID" ] && grep VmRSS /proc/$PID/status
# JVM heap and tracked off-heap usage reported by Cassandra (the pattern matches both lines)
nodetool info | grep -i "Heap Memory"
# Per-table off-heap breakdown for a suspect keyspace
nodetool tablestats <keyspace>
# Compaction backlog: pending tasks drive SSTable count and bloom filter growth
nodetool compactionstats
# Open file descriptor count and limit (surrogate for SSTable proliferation)
PID=$(pgrep -f CassandraDaemon | head -n 1)
[ -n "$PID" ] && echo "FDs: $(ls /proc/$PID/fd | wc -l)" && grep "Max open files" /proc/$PID/limits
# Live SSTable count for a specific table
nodetool tablestats <keyspace>.<table> | grep "SSTable count"
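To turn the per-table output into one number per component, the tablestats text can be summed with awk. This is a sketch under the assumption that the output uses the field labels named in this guide (Bloom filter off heap memory used, Compression metadata off heap memory used, Index summary off heap memory used) and prints raw bytes; the function name sum_offheap_bytes is made up for illustration.

```shell
# Sum the three per-table off-heap counters from `nodetool tablestats` output
# read on stdin. Values are expected in bytes, so do not pass -H to nodetool.
sum_offheap_bytes() {
    awk -F': ' '
        /Bloom filter off heap memory used/         { bloom += $2 }
        /Compression metadata off heap memory used/ { meta += $2 }
        /Index summary off heap memory used/        { index_summary += $2 }
        END {
            printf "bloom=%.0f compression_metadata=%.0f index_summary=%.0f total=%.0f\n",
                   bloom, meta, index_summary, bloom + meta + index_summary
        }'
}

# Usage (read-only): nodetool tablestats <keyspace> | sum_offheap_bytes
```

The total covers only these three components, so it is a lower bound on the native footprint, not a replacement for the RSS check.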
How to diagnose it
- Confirm the OOM kill in kernel logs. Run dmesg -T | grep -i "killed process" or check journalctl -k for entries naming the Java process. The log line reports the process's resident memory (anon-rss and file-rss) at the time of the kill.
- Compare RSS to heap. Take the VmRSS value from /proc/<pid>/status and subtract the JVM max heap (-Xmx). The remainder is the approximate native footprint, including off-heap structures, metaspace, thread stacks, and code cache. If this delta is large and growing, off-heap allocations are the primary driver.
- Identify the largest off-heap consumers. Use nodetool tablestats to inspect Bloom filter off heap memory used, Compression metadata off heap memory used, and Index summary off heap memory used per table. In most clusters, bloom filters dominate.
- Correlate with SSTable count. High SSTable counts directly increase bloom filter and index summary memory. Use nodetool tablestats to find tables with an unexpectedly high SSTable count. If compaction is behind, pending tasks in nodetool compactionstats will be growing.
- Check for chunk cache pressure. If you are running Cassandra 4.0 or later, review whether file_cache_size_in_mb is sized aggressively. The chunk cache lives off-heap and competes for RAM with everything else on the node.
- Validate container limits. If the node runs in Kubernetes or Docker, compare the container memory limit to the sum of JVM max heap, estimated off-heap, and OS overhead. A container limit set close to the JVM heap size leaves no room for native allocations.
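The first two steps can be sketched as small shell helpers. The function names are illustrative, and the sed pattern assumes the kernel log line contains an anon-rss:<n>kB field for the java process, which is common but can vary between kernel versions.

```shell
# Extract the anon-rss value (kB) from a kernel "Killed process" log line on stdin.
oom_anon_rss_kb() {
    sed -n 's/.*Killed process [0-9]* (java).*anon-rss:\([0-9]*\)kB.*/\1/p'
}

# Approximate native footprint in MB: resident set (kB) minus max heap (MB).
# Assumes the whole heap is resident; if it is not, this underestimates
# the native share.
native_delta_mb() {
    rss_kb=$1
    xmx_mb=$2
    echo $(( rss_kb / 1024 - xmx_mb ))
}

# Usage on a live node (8192 is a hypothetical -Xmx of 8 GB):
#   dmesg -T | oom_anon_rss_kb
#   rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/<pid>/status)
#   native_delta_mb "$rss_kb" 8192
```

Sampling the delta periodically and watching the trend is more informative than a single reading, since the guide's concern is growth over days.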
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Total process RSS | The OOM killer acts on RSS, not JVM heap | RSS exceeding 80% of system RAM |
| Heap-to-RSS delta | Reveals off-heap growth invisible to JVM metrics | Delta growing over days |
| Bloom filter off-heap memory | Usually the largest per-table off-heap consumer | Growing faster than table data volume |
| SSTable count per table | Each SSTable adds bloom filter and index summary overhead | Count trending upward for days |
| Compression metadata off-heap | Scales with on-disk data and chunk length | Large relative to table size |
| File descriptor usage | Proxy for SSTable proliferation | Exceeding 80% of ulimit |
Fixes
Shrink bloom filter memory
Bloom filters scale with SSTable count and partition key count. The fastest way to reduce their footprint is to raise bloom_filter_fp_chance on the table, which trades a small increase in disk reads from extra false positives for lower memory usage. The default is 0.01 for most tables and 0.1 for LCS tables. The change is applied live with ALTER TABLE, but existing SSTables keep their current filters until compaction rewrites them. A major compaction rebuilds the filters for the whole table at once, at the cost of significant I/O.
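To get a feel for the trade-off, the textbook bloom filter formula gives the bits needed per key for a target false-positive rate. The sketch below uses that formula as an approximation only; Cassandra's actual sizing may differ. The helper names and the keyspace and table names are placeholders.

```shell
# Theoretical bits per key for an optimally sized bloom filter:
#   bits = -ln(p) / (ln 2)^2
# Treat this as a rough guide, not Cassandra's exact allocation.
bloom_bits_per_key() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", -log(p) / (log(2) * log(2)) }'
}

# Build the CQL statement for a given keyspace, table, and fp chance.
bloom_alter_cql() {
    echo "ALTER TABLE $1.$2 WITH bloom_filter_fp_chance = $3;"
}

bloom_bits_per_key 0.01   # default for most tables
bloom_bits_per_key 0.1    # default for LCS tables

# To apply (assumes cqlsh can reach the node; names are placeholders):
#   cqlsh -e "$(bloom_alter_cql my_keyspace my_table 0.05)"
```

By this estimate, moving from 0.01 to 0.1 roughly halves the filter size per key, which is why the setting is the first lever to consider.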
Reduce SSTable count
If bloom filters are large because compaction has fallen behind, increase compaction_throughput_mb_per_sec temporarily to let compaction catch up. Once SSTable count drops, bloom filter and index summary memory will follow. Long term, place commitlog and data directories on separate devices so compaction I/O does not compete with commitlog writes.
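The throughput limit can also be changed at runtime with nodetool, without editing cassandra.yaml. A minimal sketch follows; the wrapper name and the NODETOOL override are conveniences introduced here, not part of Cassandra.

```shell
# NODETOOL can be overridden (for example with "echo") for a dry run.
NODETOOL=${NODETOOL:-nodetool}

# Show the current limit, then set a new one.
# The value is in MB/s; 0 disables throttling entirely.
raise_compaction_throughput() {
    new_mb_per_sec=$1
    $NODETOOL getcompactionthroughput
    $NODETOOL setcompactionthroughput "$new_mb_per_sec"
}

# Usage: raise_compaction_throughput 64   (64 is an arbitrary example value)
```

A value set this way does not survive a restart, which suits a temporary catch-up window. Watch disk latency while the higher limit is in place.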
Resize the chunk cache
In Cassandra 4.0 and later, the chunk cache caches SSTable chunks off-heap. If file_cache_size_in_mb is configured too large for the node’s RAM budget, lower it in cassandra.yaml and perform a rolling restart. Review this first on nodes with small RAM footprints.
Tune off-heap memtables
If you use off-heap memtables, review memtable_offheap_space_in_mb. Lowering the limit forces more frequent flushes, which increases compaction load but caps native memory usage. Test this on a single node first because it shifts pressure from native memory to the flush and compaction pipeline.
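Before changing either limit, it helps to see what is explicitly set on the node. The sketch below greps cassandra.yaml for the keys discussed in this guide; the function name is illustrative, the file path varies by install, and newer releases may name these keys differently, so adjust the pattern to your version.

```shell
# Print the off-heap sizing keys that are explicitly set in a cassandra.yaml.
# Commented-out keys (where Cassandra's defaults apply) are not printed.
show_offheap_settings() {
    yaml_path=$1
    grep -E '^(file_cache_size_in_mb|memtable_offheap_space_in_mb|memtable_allocation_type):' "$yaml_path"
}

# Usage: show_offheap_settings /etc/cassandra/cassandra.yaml
```

No output means none of the keys are set explicitly and the node is running with defaults.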
Adjust container memory limits
In containerized deployments, the cgroup memory limit is the hard ceiling that triggers the OOM killer, even if the host has free RAM. Set the container limit to accommodate JVM max heap plus estimated off-heap plus a margin for the OS and page cache. A limit equal to the JVM heap size all but guarantees an eventual OOM kill.
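The effective limit can be read from inside the container. The sketch below assumes the usual mount points, memory.max for cgroup v2 and memory/memory.limit_in_bytes for cgroup v1, under /sys/fs/cgroup; the function name and its optional root argument are introduced here for illustration and testing.

```shell
# Read the container memory limit in MB. Checks the cgroup v2 file first,
# then the cgroup v1 file. On cgroup v1 an unlimited container shows up
# as a very large number rather than "max".
cgroup_limit_mb() {
    root=${1:-/sys/fs/cgroup}
    for f in "$root/memory.max" "$root/memory/memory.limit_in_bytes"; do
        if [ -r "$f" ]; then
            limit=$(cat "$f")
            if [ "$limit" = "max" ]; then
                echo "unlimited"
            else
                echo $(( limit / 1024 / 1024 ))
            fi
            return 0
        fi
    done
    echo "unknown"
}

# Usage inside the container: cgroup_limit_mb
```

Compare the result with JVM max heap plus your off-heap estimate plus a margin; if the sum is above the limit, raise the limit or shrink the components.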
Emergency relief
Warning: These steps are disruptive.
If a node is cycling toward another OOM kill and you need immediate stability, run nodetool disablebinary to stop accepting client connections on that node. This reduces allocation pressure from coordinated requests and network direct buffers, although the node still receives replica traffic from other nodes. Use nodetool drain before any restart to flush memtables cleanly. Restarting clears transient native allocations, but it is only a temporary reprieve unless you fix the underlying RSS growth.
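The sequence above can be sketched as a small wrapper. The function name and the NODETOOL override are illustrative, and the restart itself is left out because it depends on how Cassandra is managed on your hosts.

```shell
# NODETOOL can be overridden (for example with "echo") for a dry run.
NODETOOL=${NODETOOL:-nodetool}

# Stop client traffic, then flush memtables ahead of a restart.
# Stops at the first step that fails.
quiesce_node() {
    $NODETOOL disablebinary && $NODETOOL drain
}

# Usage: quiesce_node, then restart Cassandra with your service manager.
```

After a drain the node no longer accepts writes, so run this only when you intend to restart.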
Prevention
- Monitor RSS, not just heap. Track the delta between total process RSS and JVM max heap as a first-class metric. Alert when RSS approaches 80% of system RAM or when the off-heap delta trends upward over multiple days.
- Keep compaction healthy. Monitor pending tasks under the CompactionExecutor. A steadily growing queue means SSTable count will rise, which in turn grows bloom filter and index summary memory.
- Size heap conservatively. The recommended JVM heap for most Cassandra workloads is 8-16 GB. Setting a heap larger than 16 GB increases GC pause risk and leaves less room for off-heap structures. Keep total Cassandra memory footprint, including heap and off-heap, within roughly 50-75% of system RAM.
- Plan container budgets with off-heap headroom. In Kubernetes or Docker, treat the JVM heap as only one component of the container’s memory budget. Leave several gigabytes of headroom for bloom filters, compression metadata, chunk cache, and direct buffers.
- Review table definitions for bloom filter settings. Tables that do not rely heavily on single-partition lookups can tolerate a higher bloom_filter_fp_chance. Evaluate this during schema design rather than during an incident.
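The 80% RSS threshold from the first item can be sketched as a check suitable for a cron job or a custom collector. The helper names are illustrative, and the threshold is the rule of thumb from this guide, not a kernel constant.

```shell
# Percentage of total RAM used by a process's resident set (integer, rounded down).
rss_percent() {
    rss_kb=$1
    total_kb=$2
    echo $(( rss_kb * 100 / total_kb ))
}

# Exit status 0 when RSS is at or above the threshold (default 80%).
rss_over_threshold() {
    [ "$(rss_percent "$1" "$2")" -ge "${3:-80}" ]
}

# Usage on a live node:
#   rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/<pid>/status)
#   total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   rss_over_threshold "$rss_kb" "$total_kb" && echo "RSS above 80% of RAM"
```

In a container, substitute the cgroup memory limit for MemTotal, since that is the ceiling the OOM killer enforces there.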
How Netdata helps
- Correlate process RSS with JVM heap on the same time-series chart to expose the off-heap gap.
- Alert on system RAM utilization before the OOM killer fires.
- Track per-table SSTable count and pending compaction tasks to catch bloom filter growth early.
- Monitor file descriptor usage and JVM GC pause duration alongside RSS to distinguish off-heap pressure from other resource exhaustion patterns.
Related guides
- Cassandra adding and removing nodes safely: vnodes, tokens, and cleanup
- Cassandra node stuck in joining (UJ): bootstrap diagnosis
- Cassandra compaction strategies: STCS vs LCS vs TWCS vs UCS
- Cassandra clock skew: how NTP drift silently corrupts data
- Cassandra commitlog disk full: segment exhaustion and forced flushes
- Cassandra commitlog pending tasks: write-path I/O pressure
- Cassandra compaction death spiral: when writes outrun compaction throughput
- Cassandra consistency levels explained: QUORUM, ONE, LOCAL_QUORUM, and EACH_QUORUM
- Cassandra zombie data resurrection: gc_grace_seconds and unrepaired tombstones
- Cassandra disk space exhaustion: emergency recovery when the data volume fills
- Cassandra dropped mutations: silent write loss and load shedding
- Cassandra dropped reads and other messages: reading nodetool tpstats Dropped







