Elasticsearch node OOM-killed: heap ceiling, page cache, and container limits
An Elasticsearch node leaves the cluster, restarts seconds later via systemd or a supervisor, and is killed again. Kernel logs show the OOM-killer terminated the Java process. heap.percent often looks reasonable right up until the kill.
The JVM heap is only one component of resident set size. Off-heap allocations, memory-mapped Lucene segments, and co-located processes all compete for the same memory budget. In containers, the cgroup limit is the hard boundary, not the host’s physical RAM.
Setting -Xmx caps the JVM heap, not the process RSS. Elasticsearch uses off-heap buffers for network I/O via Netty 4. The JVM allocates Metaspace, JIT code cache, and thread stacks outside the heap. Lucene accesses index segments via memory-mapped files, which consume OS page cache. The page cache drives search performance, but also contributes to RSS.
In a containerized deployment, the OOM-killer triggers when the cgroup’s total memory usage reaches the container limit. This can happen even when heap.percent is below 75% because the heap is not the only consumer.
The parent circuit breaker defaults to 95% of JVM heap with real memory tracking. It rejects operations that would push heap usage too high, but it does not account for Lucene mmap regions, direct ByteBuffers, or memory used by other processes sharing the cgroup. Consequently, the breaker may never trip before the kernel kills the process.
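To make the budget arithmetic concrete, here is a minimal sketch that treats the full -Xmx as spoken for and shows what remains for everything else. The function names and the 8 GiB / 6 GiB figures are illustrative assumptions, not measurements from a real node.

```bash
# Sketch: what is left of a container limit once the heap ceiling is reserved.
# Values are in MiB. The function names are hypothetical helpers.

# non_heap_headroom_mib LIMIT_MIB XMX_MIB
# Prints the memory left for direct buffers, Metaspace, code cache,
# thread stacks, page cache, and any other process in the cgroup.
non_heap_headroom_mib() {
  local limit_mib="$1"
  local xmx_mib="$2"
  echo $(( limit_mib - xmx_mib ))
}

# heap_share_percent LIMIT_MIB XMX_MIB
# Prints the heap ceiling as an integer percentage of the limit.
heap_share_percent() {
  local limit_mib="$1"
  local xmx_mib="$2"
  echo $(( xmx_mib * 100 / limit_mib ))
}

# Example: an 8 GiB limit with a 6 GiB heap leaves 2 GiB for all non-heap use.
echo "headroom: $(non_heap_headroom_mib 8192 6144) MiB"
echo "heap share: $(heap_share_percent 8192 6144)%"
```

With these numbers the heap alone claims 75% of the limit, so a node reporting a moderate heap.percent can still be one burst of off-heap allocation away from the cgroup boundary.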
flowchart TD
A[Bulk indexing or aggregations] --> B[JVM heap fills]
B --> C[Circuit breaker may trip]
A --> D[Netty direct buffers grow]
A --> E[Lucene mmap segments expand]
D --> F[Container RSS hits memory limit]
E --> F
B --> F
F --> G[Kernel OOM-killer sends SIGKILL]
G --> H[Node exits 137]
H --> I[Master removes node]
I --> J[Shard reallocation starts]
J --> K[Remaining nodes absorb load]
K --> A
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Container memory limit too tight | Node restarts in a loop with exit code 137; dmesg shows oom-killer | Container memory limit vs -Xmx plus headroom |
| Heap sized above 50% of available RAM | Frequent OOM despite moderate heap percent; search latency high from cold page cache | _cat/nodes heap.max vs container or host total memory |
| Off-heap pressure from segments and buffers | RSS grows steadily while heap stays flat; many open file descriptors | _cat/nodes segments.count and segments.memory |
| Startup RSS spike | Node killed during bootstrap before handling traffic | Service logs for early exit, dmesg timestamp vs start time |
| Co-located services in pod or on host | ES process alone fits budget, but total RSS exceeds limit | Per-process RSS with ps or container sidecar metrics |
Quick checks
# Confirm kernel OOM-killer killed the Java process
dmesg | grep -i "killed process"
# Same check via journalctl if dmesg is empty or rotated
journalctl -k | grep -i "killed process"
# Check JVM heap max and current usage
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.max,heap.percent'
# Check segment count and off-heap segment memory per node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.count,segments.memory'
# Inspect circuit breaker state
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers'
# Check for restart loops in systemd logs
systemctl status elasticsearch --no-pager
# Show process RSS on the host
ps -o pid,rss,comm -p $(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
# Read container memory limit from inside the pod/container
cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || cat /sys/fs/cgroup/memory.max 2>/dev/null
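The limit check above can be extended into a usage-versus-limit reading. The sketch below assumes the standard cgroup v2 files (memory.max, memory.current) and cgroup v1 files (memory.limit_in_bytes, memory.usage_in_bytes) at their usual mount points. The helper names are hypothetical, and the 19-digit test for an unlimited cgroup v1 value is a heuristic.

```bash
# Sketch: report cgroup memory usage as a percentage of the limit.

# read_first FILE... : print the contents of the first readable file.
read_first() {
  local f
  for f in "$@"; do
    if [ -r "$f" ]; then cat "$f"; return 0; fi
  done
  return 1
}

# limit_is_unbounded RAW : succeed when the raw limit means "no limit".
# cgroup v2 reports the word "max"; cgroup v1 reports a number near 2^63,
# detected here by its length (heuristic).
limit_is_unbounded() {
  local raw="$1"
  [ "$raw" = "max" ] || [ "${#raw}" -ge 19 ]
}

# usage_percent USED_BYTES LIMIT_BYTES : integer percentage, rounded down.
usage_percent() {
  local used="$1"
  local limit="$2"
  echo $(( used * 100 / limit ))
}

report_cgroup_memory() {
  local limit used
  limit=$(read_first /sys/fs/cgroup/memory.max \
                     /sys/fs/cgroup/memory/memory.limit_in_bytes) || limit=max
  used=$(read_first /sys/fs/cgroup/memory.current \
                    /sys/fs/cgroup/memory/memory.usage_in_bytes) || used=0
  if limit_is_unbounded "$limit"; then
    echo "no cgroup memory limit set"
  else
    echo "cgroup memory usage: $(usage_percent "$used" "$limit")% of limit"
  fi
}

report_cgroup_memory
```

Run it inside the container so the paths resolve to the Elasticsearch cgroup rather than the host's.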
How to diagnose it
1. Verify the OOM kill. Run dmesg | grep -i "out of memory" or journalctl -k | grep -i "killed process". Look for lines naming the Java PID and reporting anon-rss. Note the timestamp. If the node is in a container, check the host dmesg, not the container.
2. Confirm the restart pattern. Check systemctl status elasticsearch or the container runtime for exit code 137 (128 + SIGKILL 9). Rapid uptime resets in _cat/nodes indicate the supervisor is respawning the process.
3. Compare heap to limit. Query _cat/nodes?v&h=name,heap.max,heap.percent. Convert heap.max to the same unit as the container limit or host RAM. If heap.max exceeds 50% of the limit, the configuration violates the headroom guideline.
4. Measure off-heap growth. Check _cat/nodes?v&h=name,segments.count,segments.memory. High segment count increases mmap pressure and file descriptor usage. Correlate with _nodes/stats/jvm?filter_path=nodes.*.jvm.mem to see the gap between heap committed and process RSS.
5. Check circuit breaker history. Query _nodes/stats/breaker. If the parent breaker tripped count is zero, the OOM was caused by untracked memory. If it tripped repeatedly, heap pressure preceded the kill but was not the only factor.
6. Identify co-located consumers. On Kubernetes, check the pod spec for sidecar containers. On bare metal or VMs, sum RSS across all processes. Non-ES consumers can push total usage over the limit even when ES itself is sized correctly.
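Two of the checks in this procedure are simple arithmetic: decoding the exit code and comparing heap.max to half the limit. The sketch below uses hypothetical helper names and assumes both sizes are already in bytes. One way to get heap.max in bytes is the cat API's bytes=b parameter, noted in a comment.

```bash
# Sketch: decode a supervisor exit code and test the 50% heap guideline.

# signal_from_exit_code CODE : print the signal number for codes above 128,
# or 0 when the process exited without being signalled.
signal_from_exit_code() {
  local code="$1"
  if [ "$code" -gt 128 ]; then
    echo $(( code - 128 ))
  else
    echo 0
  fi
}

# heap_exceeds_half HEAP_MAX_BYTES LIMIT_BYTES : succeed when the heap
# ceiling is more than 50% of the container limit or host RAM.
heap_exceeds_half() {
  local heap="$1"
  local limit="$2"
  [ $(( heap * 2 )) -gt "$limit" ]
}

# heap.max in bytes can be fetched with, for example:
#   curl -s 'http://localhost:9200/_cat/nodes?h=name,heap.max&bytes=b'

# Example: exit code 137, and a 6 GiB heap under an 8 GiB limit.
echo "signal: $(signal_from_exit_code 137)"
if heap_exceeds_half 6442450944 8589934592; then
  echo "heap.max is above 50% of the limit"
fi
```

Signal 9 is SIGKILL, which is what the kernel OOM-killer sends. A graceful shutdown or a JVM-level OutOfMemoryError produces a different exit status.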
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| jvm.mem.heap_used_percent | Largest controllable memory consumer | Sustained >75% |
| breakers.parent.tripped | Indicates heap pressure before OOM | Any delta > 0 |
| segments.memory | Segment metadata and mmap pressure | Growing without index growth |
| process.open_file_descriptors | Proxies for segment count and mmap regions | >80% of max |
| Container or host memory usage vs limit | Hard boundary for OOM-killer | Usage >80% of limit |
| Node uptime / restart frequency | Catches supervisor respawn loops | Unexpected restart within 10 minutes |
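The warning signs in the table reduce to two kinds of test: a ratio against a maximum, and a delta on a counter. A minimal sketch of both follows, with hypothetical helper names and made-up sample values. Treating a counter that went backwards as restarted from zero is an assumption about how the node resets its stats after a restart.

```bash
# Sketch: evaluate ratio and counter-delta thresholds from two samples.

# ratio_over VALUE MAX THRESHOLD_PERCENT : succeed when VALUE is more than
# THRESHOLD_PERCENT of MAX (integer arithmetic, no rounding error).
ratio_over() {
  local value="$1"
  local max="$2"
  local threshold="$3"
  [ $(( value * 100 )) -gt $(( max * threshold )) ]
}

# counter_delta PREVIOUS CURRENT : print the increase of a monotonic counter.
# If the counter went backwards, assume it restarted from zero.
counter_delta() {
  local prev="$1"
  local curr="$2"
  if [ "$curr" -ge "$prev" ]; then
    echo $(( curr - prev ))
  else
    echo "$curr"
  fi
}

# Example: 52500 open descriptors out of 65535, and a tripped count of 3 then 5.
if ratio_over 52500 65535 80; then
  echo "file descriptors above 80% of max"
fi
echo "parent breaker trips since last sample: $(counter_delta 3 5)"
```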
Fixes
Raise the container memory limit
If the container limit is artificially low, increase it. Do not raise -Xmx to consume all the extra space. Keep -Xmx at no more than 50% of the container limit, capped at roughly 26-30 GB to keep compressed OOPs enabled.
Lower -Xmx to free headroom
If you cannot raise the limit, reduce -Xmx. This requires a rolling restart. A smaller heap gives more room to the OS page cache and off-heap allocations. Tradeoff: young GC frequency rises and heavy aggregation loads are more likely to trip the parent circuit breaker.
Reduce segment and shard pressure
High segment counts increase off-heap memory and file descriptor usage. Force merge read-only indices to reduce segments. Delete old indices or close them. Warning: force merge is I/O-intensive and temporarily doubles disk usage for the segments involved.
Isolate co-located workloads
Move monitoring agents, log shippers, and sidecars out of the Elasticsearch pod or off the host. If that is impossible, size their memory and subtract it from the available budget before setting -Xmx.
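The sizing rules in these fixes can be combined into one calculation: subtract what co-located processes need, take half of the remainder, and cap the result. The sketch below is one way to express that. The helper name is hypothetical, and the 26624 MiB default cap is an assumed value taken from the low end of the 26-30 GB range.

```bash
# Sketch: derive a heap size from the container limit and sidecar budget.

# recommend_xmx_mib LIMIT_MIB [OTHER_MIB] [CAP_MIB]
# LIMIT_MIB : container memory limit
# OTHER_MIB : memory reserved for co-located processes (default 0)
# CAP_MIB   : upper bound for the heap (default 26624, an assumption)
recommend_xmx_mib() {
  local limit_mib="$1"
  local other_mib="${2:-0}"
  local cap_mib="${3:-26624}"
  local half=$(( (limit_mib - other_mib) / 2 ))
  if [ "$half" -gt "$cap_mib" ]; then
    echo "$cap_mib"
  else
    echo "$half"
  fi
}

# Example: a 16 GiB limit with 1 GiB reserved for a log shipper.
heap=$(recommend_xmx_mib 16384 1024)
echo "-Xms${heap}m"
echo "-Xmx${heap}m"
```

Treat the result as an upper bound. A node with heavy search traffic may do better with a smaller heap and more page cache.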
Correct CPU container detection
If running in a container with CPU limits, set -XX:ActiveProcessorCount to match the limit. Thread pools sized for too many cores allocate excessive thread stacks, adding to RSS. This also requires a rolling restart.
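A value for -XX:ActiveProcessorCount can be derived from the CPU quota and period, which cgroup v2 exposes together in cpu.max and cgroup v1 in cpu.cfs_quota_us and cpu.cfs_period_us. The sketch below rounds fractional CPUs up. The helper name is hypothetical, and the rounding choice is an assumption you may want to tune.

```bash
# Sketch: turn a CPU quota/period pair into a processor count.

# cpus_from_quota QUOTA PERIOD : print ceil(QUOTA / PERIOD).
# Fails without output when no quota is set ("max" on cgroup v2,
# -1 on cgroup v1).
cpus_from_quota() {
  local quota="$1"
  local period="$2"
  if [ "$quota" = "max" ] || [ "$quota" -le 0 ] 2>/dev/null; then
    return 1
  fi
  echo $(( (quota + period - 1) / period ))
}

# Example: a 2.5 CPU limit expressed as 250000us per 100000us period.
if count=$(cpus_from_quota 250000 100000); then
  echo "-XX:ActiveProcessorCount=${count}"
else
  echo "no CPU quota set; leave the JVM default"
fi
```

Inside a cgroup v2 container the two inputs can be read with: read quota period < /sys/fs/cgroup/cpu.max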
Prevention
- Size heap to half the budget. Set -Xms and -Xmx to no more than 50% of the memory available to the node, with a ceiling of roughly 26-30 GB.
- Leave headroom for page cache. Elasticsearch relies on the OS page cache for Lucene segment access. Starving the page cache increases search latency and does not prevent OOM.
- Monitor total memory usage, not just heap. Heap percentage is a sawtooth that hides off-heap growth. Track process or container memory usage against the limit.
- Account for startup spikes. Some versions briefly allocate extra memory during bootstrap. Size container limits to handle startup, not just steady state.
- Watch for respawn loops. A supervisor restarting the process after exit 137 creates a flapping node that triggers unnecessary shard reallocation. Alert on unexpected node uptime resets.
How Netdata helps
- Correlates elasticsearch.jvm_heap_used_percent with system RAM and cgroup memory usage, revealing when RSS diverges from heap.
- Surfaces kernel OOM-killer events from system logs without manual dmesg searches.
- Tracks elasticsearch.thread_pool_queued_operations and elasticsearch.breaker_tripped to identify memory pressure before the kernel intervenes.
- Alerts on node uptime drops and process restarts, catching supervisor respawn loops that mask chronic OOM kills.
- Monitors per-process RSS and open file descriptors to expose segment-related off-heap growth.
Related guides
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor
- Elasticsearch monitoring checklist: the signals every production cluster needs
- Elasticsearch monitoring maturity model: from survival to expert
- Elasticsearch long GC pauses: old-generation stop-the-world and node drops
- How Elasticsearch actually works in production: a mental model for operators







