The heap pressure death spiral
Something fills the heap: segment metadata from too many shards, a mapping explosion bloating the cluster state, fielddata on a text field, or a giant aggregation. Old-generation GC runs more often, each pause is stop-the-world, and once a pause exceeds the fault-detection timeout the master removes the node. Its shards are reassigned to the survivors, raising their heap pressure and making them the next candidates to stall. The cascade can take down the cluster. Warning signs:
- jvm.mem.heap_used_percent sustained above 85% with the post-GC floor rising
- jvm.gc.collectors.old.collection_time climbing, individual pauses over 10s
- breakers.parent.tripped incrementing and write/search queues growing
- number_of_nodes dropping during GC peaks, then unassigned shards appearing
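
The signals above can be checked mechanically by comparing two consecutive stats snapshots. The following is a minimal sketch, not a production monitor. It assumes the inputs are already-parsed dicts shaped like one node's entry in the node stats response and like the cluster health response. The function names `node_signals` and `cluster_signals` and the signal labels are hypothetical. The thresholds are the ones from the list. Two simplifications are worth noting: the GC check uses the average old-collection pause over the interval (a single long pause can hide inside the mean), and "heap above 85% in both snapshots" only approximates a rising post-GC floor. Queue growth is left out for brevity.

```python
HEAP_PCT_LIMIT = 85      # heap_used_percent threshold from the list above
PAUSE_MS_LIMIT = 10_000  # 10s pause threshold from the list above


def node_signals(prev, curr):
    """Compare two snapshots of one node's stats; return the signals that fired."""
    signals = []

    # Heap above the limit in both snapshots (a crude stand-in for "sustained").
    heap_prev = prev["jvm"]["mem"]["heap_used_percent"]
    heap_curr = curr["jvm"]["mem"]["heap_used_percent"]
    if min(heap_prev, heap_curr) > HEAP_PCT_LIMIT:
        signals.append("heap_high")

    # Average old-GC pause over the interval, derived from the cumulative counters.
    old_prev = prev["jvm"]["gc"]["collectors"]["old"]
    old_curr = curr["jvm"]["gc"]["collectors"]["old"]
    runs = old_curr["collection_count"] - old_prev["collection_count"]
    spent_ms = (old_curr["collection_time_in_millis"]
                - old_prev["collection_time_in_millis"])
    if runs > 0 and spent_ms / runs > PAUSE_MS_LIMIT:
        signals.append("old_gc_long_pauses")

    # The parent breaker's trip counter is cumulative, so any increase is a new trip.
    if curr["breakers"]["parent"]["tripped"] > prev["breakers"]["parent"]["tripped"]:
        signals.append("parent_breaker_tripping")

    return signals


def cluster_signals(prev_health, curr_health):
    """Compare two cluster health snapshots; return the signals that fired."""
    signals = []
    if curr_health["number_of_nodes"] < prev_health["number_of_nodes"]:
        signals.append("node_left")
    if curr_health["unassigned_shards"] > 0:
        signals.append("unassigned_shards")
    return signals
```

Seeing `heap_high` and `old_gc_long_pauses` on one node while `node_left` fires at the cluster level matches the spiral described above: the node is stalling in GC and the master is starting to drop members.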







