Elasticsearch high disk watermark [90%] exceeded: shard relocation and the cascade
When Elasticsearch logs high disk watermark [90%] exceeded on [node] ... shards will be relocated away, the allocator immediately begins moving shards off the affected node. Relocation generates disk I/O and network traffic on both source and target. If targets were already close to their own watermarks, incoming shards can push them past 90%. This is the disk watermark cascade: a self-reinforcing loop where one full node triggers relocations that make other nodes full, eventually leaving the cluster with no legal allocation target and, if flood stage is hit, read-only indices.
The high watermark (90% by default) is an active intervention. Unlike the low watermark (85%), which only blocks new shard allocation, the high watermark forces existing shards to move. In clusters with tight disk margins or uneven shard distribution, that intervention can be more disruptive than the original disk pressure. If the condition progresses to the flood-stage watermark (95%), Elasticsearch sets index.blocks.read_only_allow_delete on every index with a shard on the affected node, stopping writes.
What this means
Elasticsearch uses three disk-based thresholds: low (85%), high (90%), and flood stage (95%). A node above the low watermark receives no new shard allocations. A node above the high watermark has shards relocated away from it asynchronously. A node above the flood stage causes every index with a shard on it to become read-only, with deletes still allowed.
The relocation process is the risk. Moving a shard requires reading the full segment set from the source and writing it to the target, consuming disk bandwidth on both sides and temporarily increasing disk usage on the target until the old copy is deleted from the source. If the allocator chooses a target already at 87%, that target may cross 90% during the relocation, triggering another round of moves. In severe cases, every data node is above the low watermark and no legal allocation target remains. If a node uses multiple data paths, the watermark check applies to each path independently, so one full path can trigger relocations even if another path has space.
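To make the 87% example concrete, the sketch below projects a target's disk percentage after it receives one shard. It is a rough illustration only: the helper name is hypothetical, sizes are assumed to be in GB, the 90% default is used unless a fourth argument overrides it, and concurrent indexing or merges on the target are ignored.

```shell
# Hypothetical helper: projected disk percent on a target after one shard arrives.
# Arguments: used_gb total_gb shard_gb [high_watermark_pct, default 90]
projected_target_pct() {
  awk -v used="$1" -v total="$2" -v shard="$3" -v high="${4:-90}" 'BEGIN {
    pct = (used + shard) / total * 100
    verdict = (pct >= high) ? "crosses-high" : "ok"
    printf "%.1f %s\n", pct, verdict
  }'
}

# A 1000 GB node at 87% receiving a 40 GB shard
projected_target_pct 870 1000 40   # prints: 91.0 crosses-high
```

Running the same calculation for each candidate target before a large relocation gives a quick sense of whether the move is likely to trigger a second round.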
Since Elasticsearch 7.4, the flood-stage read-only block is removed automatically when disk usage on the node falls below the high watermark. Until space is freed, the block persists and writes remain blocked. On earlier versions, or if the block lingers after cleanup, clear it manually.
flowchart TD
A[Node crosses high watermark 90%] --> B[Allocator relocates shards away]
B --> C[Relocation I/O consumes disk and network]
C --> D[Target node disk rises]
D --> E{Target crosses 90%?}
E -->|Yes| F[New relocations triggered]
E -->|No| G[Cluster stabilizes]
F --> C
D --> H[Node crosses flood stage 95%]
H --> I[Indices set read-only]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Data growth without ILM cleanup | All nodes climbing steadily toward 90% | _cat/indices for old or large indices and _ilm/explain |
| Uneven shard distribution | One node at 90% while others sit at 60% | _cat/allocation?v&s=disk.percent:desc |
| Large merge temporarily doubling disk | Sudden spike during heavy indexing | _cat/nodes?v&h=name,merges.current,merges.current_size |
| Translog accumulation on lagging replicas | Disk growing on nodes with recovering shards | _nodes/stats/indices/translog |
| All nodes above low watermark | Relocations stall; cluster health turns yellow or red | disk.percent in _cat/allocation for every node |
Quick checks
# Disk usage per node, sorted by fullness
curl -s 'http://localhost:9200/_cat/allocation?v&s=disk.percent:desc'
# Cluster health and active relocations
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,relocating_shards,unassigned_shards'
# Active recoveries (relocation generates recovery traffic)
curl -s 'http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,source_node,target_node,bytes_percent'
# Indices blocked by flood stage
curl -s 'http://localhost:9200/_all/_settings/index.blocks.read_only_allow_delete?pretty'
# Why an unassigned shard cannot be allocated (with no request body, Elasticsearch picks one; returns an error if none are unassigned)
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'
# Merge activity that may be temporarily inflating disk
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,merges.current,merges.current_size'
# OS-level disk check (adjust path to your data directory)
df -h /var/lib/elasticsearch
How to diagnose it
- Identify the fullest nodes. Use _cat/allocation sorted by disk.percent. If one node is at 90% and the rest are below 75%, the problem is hot-spotting. Check _cat/shards?v&s=store:desc to see which indices are largest on that node. If multiple nodes are near 90%, the problem is cluster-wide capacity.
- Correlate with shard relocation activity. Check relocating_shards in _cluster/health. If the count is high and disk is rising on target nodes, the cascade is active. Note the indices involved; large shards (hundreds of GB) move slowly and keep the cluster in a stressed state longer.
- Check for flood-stage blocks. Query _all/_settings for index.blocks.read_only_allow_delete. If present, writes are already blocked for those indices.
- Determine whether targets have room. If every data node is above the low watermark (85%), Elasticsearch cannot legally allocate new shards anywhere. Relocations stall and shards become unassigned.
- Compare OS disk usage with Elasticsearch-reported disk. If the OS shows 92% but _cat/allocation shows 85%, non-Elasticsearch data such as logs or snapshots stored on the same mount are consuming the difference.
- Investigate the root capacity trend. Check ILM status with GET /*/_ilm/explain?only_errors=true&only_managed=true to see if old indices are stuck. Check _cat/indices for unexpectedly large indices. Check _cat/segments/<index> for high segment counts; if count is very high relative to shard count, a merge backlog is inflating disk.
- Check recovery progress. Use _cat/recovery?active_only=true. If bytes_percent is moving slowly, disk I/O or network is saturated. If it is stuck near the same value for minutes, the target may have hit its own watermark mid-relocation.
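The first check in the list above (hot-spotting versus cluster-wide pressure) can be approximated with a small filter. This is a sketch, not a definitive rule: the function name is hypothetical, the 90% and 75% cut-offs are taken from the text, and the input is assumed to be two whitespace-separated columns (node name and disk.percent).

```shell
# Hypothetical classifier. Reads "node disk.percent" pairs on stdin, for example:
#   curl -s 'http://localhost:9200/_cat/allocation?h=node,disk.percent' | classify_disk_pressure
classify_disk_pressure() {
  awk -v high=90 -v calm=75 '
    $2 ~ /^[0-9]+$/ {              # skip rows without a numeric percentage (e.g. UNASSIGNED)
      n++
      if ($2 >= high) hot++
      else if ($2 < calm) cool++
    }
    END {
      if (n == 0)                          print "no-data"
      else if (hot == 0)                   print "below-high-watermark"
      else if (hot == 1 && cool == n - 1)  print "hot-spot"
      else                                 print "cluster-wide-or-mixed"
    }'
}
```

A "hot-spot" result points toward shard distribution; "cluster-wide-or-mixed" suggests looking at capacity and retention first.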
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Disk percent per node (_cat/allocation) | Watermarks apply per node, not cluster-wide average | Any node above 85%; multiple nodes trending toward 90% simultaneously |
| Relocating shard count | Active relocations are the mechanism of the cascade | Sustained high relocating_shards while disk is rising on target nodes |
| Indexing rate | Flood stage blocks stop writes | Sudden drop to zero on active indices |
| Disk I/O wait (OS-level) | Relocation and merge traffic compete with indexing and search | iowait >30% sustained during relocation spikes |
| Flood stage block presence | Indicates writes are blocked | index.blocks.read_only_allow_delete: true on any active index |
| Unassigned shard count | Stalled relocations leave shards homeless | Replicas unassigned for >30 minutes during non-maintenance windows |
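The per-node disk signal in the table can be checked with a filter like the one below. It is a sketch that assumes the default 85/90/95 thresholds and two whitespace-separated input columns (node name and disk.percent); the function name is hypothetical.

```shell
# Hypothetical check: print each node that has reached a default watermark,
# labelled with the highest one it has reached. Example input source:
#   curl -s 'http://localhost:9200/_cat/allocation?h=node,disk.percent' | flag_watermarks
flag_watermarks() {
  awk -v low=85 -v high=90 -v flood=95 '
    $2 ~ /^[0-9]+$/ {              # ignore rows without a numeric percentage
      state = "ok"
      if ($2 >= low)   state = "low"
      if ($2 >= high)  state = "high"
      if ($2 >= flood) state = "flood"
      if (state != "ok") print $1, $2 "%", state
    }'
}
```

No output means every node is below the low watermark; clusters with customized watermarks would need the three values changed to match.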
Fixes
Free disk space immediately
The fastest way to break the cascade is to delete data. Target the oldest indices first. Use _cat/indices?v&s=store.size:desc to identify the largest consumers, then verify they are managed by ILM or have snapshots before deleting. For non-critical indices, reducing replica counts frees space on every node that holds a copy.
Warning: Deleting indices is destructive and irreversible. Ensure snapshots exist before deleting. Reducing replica counts lowers redundancy for the affected indices, and any rebalancing that follows adds disk I/O and network load while the cluster settles.
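A minimal sketch of the two space-freeing actions, assuming the same unauthenticated localhost endpoint as the quick checks. The function names and the index names in the usage comments are hypothetical; the calls themselves use the standard update-index-settings and delete-index APIs.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"  # assumption: adjust for your cluster

# Set the replica count of one index. A count of 0 removes all redundancy
# for that index, so treat it as a temporary measure.
set_replicas() {  # usage: set_replicas <index> <count>
  curl -s -X PUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "{\"index\": {\"number_of_replicas\": $2}}"
}

# Delete one index. Irreversible: confirm a snapshot exists first.
delete_index() {  # usage: delete_index <index>
  curl -s -X DELETE "$ES_URL/$1"
}

# Example calls (hypothetical index names):
#   set_replicas logs-2023.01 0
#   delete_index logs-2022.12
```

Passing an explicit index name rather than a wildcard keeps the blast radius of a mistake small.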
Handle flood-stage blocks
In Elasticsearch 7.4 and later, flood-stage blocks are automatically removed when a node's disk usage drops below the high watermark. If you free space and the block does not clear within a few minutes, or you run an earlier version, remove it manually:
# Remove read-only block from all indices after freeing disk
curl -X PUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'
Add capacity
If the cluster is uniformly above 80%, deleting data is only a temporary fix. Add data nodes or expand attached volumes. Expanding volumes is often faster in cloud environments, but ensure the filesystem is resized before expecting Elasticsearch to use the new space.
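For a rough sense of how much time a cleanup or expansion buys, the sketch below estimates the number of days until a node reaches a watermark. It assumes linear daily growth, sizes in GB, and the default 85% low watermark unless a fourth argument is given; the helper name is hypothetical and merge spikes are not modelled.

```shell
# Hypothetical helper: whole days until a node reaches a watermark at linear growth.
# Arguments: used_gb total_gb daily_growth_gb [watermark_pct, default 85]
days_until_watermark() {
  awk -v used="$1" -v total="$2" -v growth="$3" -v wm="${4:-85}" 'BEGIN {
    headroom = total * wm / 100 - used
    if (headroom <= 0) { print 0; exit }       # already at or past the watermark
    if (growth <= 0)   { print "inf"; exit }   # not growing
    printf "%d\n", headroom / growth           # truncated to whole days
  }'
}

# 1000 GB node with 800 GB used, growing 5 GB per day
days_until_watermark 800 1000 5   # prints: 10
```

If the result is shorter than the lead time for adding nodes or expanding volumes, capacity work should start before the next cleanup.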
Fix stuck ILM policies
If disk growth is caused by indices accumulating, use GET /*/_ilm/explain?only_errors=true&only_managed=true to find stuck indices. Retry with POST /<index>/_ilm/retry after resolving the blocker (usually disk space or missing aliases).
Avoid force merges during disk pressure
Force merging reduces segment count but temporarily requires disk space for both old and new segments. Running a force merge when a node is near 90% can push it into flood stage.
Prevention
- Monitor disk usage per node, not cluster-wide averages. Hot-spotting is invisible in aggregate.
- Keep sustained disk usage below 70%. Plan capacity so that daily growth plus merge overhead still leaves headroom below the low watermark.
- Verify ILM execution regularly. A stuck ILM policy is a slow-motion disk exhaustion event.
- Account for merge overhead in capacity planning. A node at 80% can briefly spike to 90% during large merges.
- In hot-warm-cold architectures, monitor tiers independently. Warm and cold nodes have different baseline I/O and fill rates.
- Do not raise watermarks to mask capacity problems. If you must adjust cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high, do so only as a temporary bridge while adding capacity.
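If a temporary watermark adjustment is unavoidable, it can be applied and later reverted through the cluster settings API. The sketch below assumes the localhost endpoint used elsewhere in this guide; the function names are hypothetical and the percentages in the usage comment are illustrative, not recommendations. Setting a value to null restores its default.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"  # assumption: adjust for your cluster

# Apply temporary watermark values.
set_watermarks() {  # usage: set_watermarks <low> <high> <flood_stage>
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "{\"persistent\": {
          \"cluster.routing.allocation.disk.watermark.low\": \"$1\",
          \"cluster.routing.allocation.disk.watermark.high\": \"$2\",
          \"cluster.routing.allocation.disk.watermark.flood_stage\": \"$3\"}}"
}

# Revert to the defaults once capacity has been added.
reset_watermarks() {
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d '{"persistent": {
          "cluster.routing.allocation.disk.watermark.low": null,
          "cluster.routing.allocation.disk.watermark.high": null,
          "cluster.routing.allocation.disk.watermark.flood_stage": null}}'
}

# Example (illustrative values only):
#   set_watermarks 87% 92% 96%
#   reset_watermarks
```

Keeping the reset next to the override makes it harder to forget the second half of the change.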
How Netdata helps
Netdata tracks the signals that predict and identify a watermark cascade:
- Per-node disk utilization, saturation, and I/O wait. Hot-spotted nodes show elevated utilization before the allocator triggers relocations.
- Disk latency and utilization charts distinguish normal indexing from cascade-induced I/O storms.
- The Elasticsearch collector exposes cluster health, shard counts, and relocating shards, correlating disk pressure with allocator behavior.
- Alerts on disk percent per node set below the low watermark (for example, 80%) give operators time to act before Elasticsearch intervenes.
- Indexing rate and search latency charts identify when write blocks or relocation overhead impact workload performance.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor
- Elasticsearch this action would add too many shards: max_shards_per_node limit
- Elasticsearch monitoring checklist: the signals every production cluster needs
- Elasticsearch monitoring maturity model: from survival to expert
- Elasticsearch long GC pauses: old-generation stop-the-world and node drops
- Elasticsearch node OOM-killed: heap ceiling, page cache, and container limits







