Elasticsearch high disk watermark [90%] exceeded: shard relocation and the cascade

When Elasticsearch logs "high disk watermark [90%] exceeded on [node] ... shards will be relocated away", the allocator begins moving shards off the affected node. Relocation generates disk I/O and network traffic on both source and target. If the targets were already close to their own watermarks, incoming shards can push them past 90%. This is the disk watermark cascade: a self-reinforcing loop where one full node triggers relocations that fill other nodes, eventually leaving the cluster with no legal allocation target and, if flood stage is hit, read-only indices.

The high watermark (90% by default) is an active intervention. Unlike the low watermark (85%), which only stops new shards from being allocated to the node, the high watermark forces existing shards to move. In clusters with tight disk margins or uneven shard distribution, that intervention can be more disruptive than the original disk pressure. If the condition progresses to the flood-stage watermark (95%), Elasticsearch sets index.blocks.read_only_allow_delete on every index with a shard on the affected node, stopping writes.

What this means

Elasticsearch uses three disk-based thresholds, each evaluated per node: low (85%), high (90%), and flood stage (95%). Exceeding the low watermark stops new shards from being allocated to that node. Exceeding the high watermark triggers asynchronous relocation of shards away from it. Exceeding the flood stage makes every index with a shard on that node read-only (deletes are still allowed).
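
As a minimal sketch of the three thresholds, the shell function below maps an integer disk percentage to a state. The function name watermark_state and the state labels are illustrative, not Elasticsearch output, and the defaults assume the stock 85/90/95 settings. A value exactly at a threshold is treated as exceeded, which is a deliberately conservative choice for whole-number input.

```shell
# watermark_state PCT [LOW] [HIGH] [FLOOD]
# Hypothetical helper: classify an integer disk-usage percentage.
# Defaults (85/90/95) are the default watermarks; override them if
# your cluster settings differ.
watermark_state() {
  pct=$1
  low=${2:-85}
  high=${3:-90}
  flood=${4:-95}
  if [ "$pct" -ge "$flood" ]; then
    echo "flood_stage"   # indices on this node become read-only
  elif [ "$pct" -ge "$high" ]; then
    echo "high"          # shards are relocated away
  elif [ "$pct" -ge "$low" ]; then
    echo "low"           # no new shards allocated here
  else
    echo "ok"
  fi
}
```

For example, watermark_state 92 reports high, and watermark_state 88 80 85 90 evaluates the same check against custom thresholds.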

The relocation process is the risk. Moving a shard requires reading the full segment set from the source and writing it to the target, consuming disk bandwidth on both sides and temporarily increasing disk usage on the target until the old copy is deleted from the source. If the allocator chooses a target already at 87%, that target may cross 90% during the relocation, triggering another round of moves. In severe cases, every data node is above the low watermark and no legal allocation target remains. If a node uses multiple data paths, the watermark check applies to each path independently, so one full path can trigger relocations even if another path has space.
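
The arithmetic behind that risk is simple enough to check by hand before or during an incident. The sketch below is a back-of-the-envelope estimate, not a model of the allocator's own accounting; the function name projected_pct is hypothetical, and all three arguments must use the same unit (GB here).

```shell
# projected_pct USED TOTAL SHARD_SIZE
# Hypothetical helper: estimate a target node's disk percentage after
# it receives one more shard. All arguments share the same unit.
projected_pct() {
  awk -v used="$1" -v total="$2" -v shard="$3" \
    'BEGIN { printf "%.1f\n", (used + shard) * 100 / total }'
}

# A 1000 GB node at 87% (870 GB used) receiving a 50 GB shard:
projected_pct 870 1000 50   # prints 92.0, past the 90% high watermark
```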

In Elasticsearch 7.4 and later, including 8.x, the flood-stage read-only block is removed automatically once disk usage on the affected node falls below the high watermark. Until you free space, the block persists and writes stay blocked. You can also clear it manually after cleanup.

flowchart TD
    A[Node crosses high watermark 90%] --> B[Allocator relocates shards away]
    B --> C[Relocation I/O consumes disk and network]
    C --> D[Target node disk rises]
    D --> E{Target crosses 90%?}
    E -->|Yes| F[New relocations triggered]
    E -->|No| G[Cluster stabilizes]
    F --> C
    D --> H[Node crosses flood stage 95%]
    H --> I[Indices set read-only]

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Data growth without ILM cleanup | All nodes climbing steadily toward 90% | _cat/indices for old or large indices and _ilm/explain |
| Uneven shard distribution | One node at 90% while others sit at 60% | _cat/allocation?v&s=disk.percent:desc |
| Large merge temporarily doubling disk | Sudden spike during heavy indexing | _cat/nodes?v&h=name,merges.current,merges.current_size |
| Translog accumulation on lagging replicas | Disk growing on nodes with recovering shards | _nodes/stats/indices/translog |
| All nodes above low watermark | Relocations stall; cluster health turns yellow or red | _cat/allocation disk.percent on every node |

Quick checks

# Disk usage per node, sorted by fullness
curl -s 'http://localhost:9200/_cat/allocation?v&s=disk.percent:desc'

# Cluster health and active relocations
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,relocating_shards,unassigned_shards'

# Active recoveries (relocation generates recovery traffic)
curl -s 'http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,source_node,target_node,bytes_percent'

# Indices blocked by flood stage
curl -s 'http://localhost:9200/_all/_settings/index.blocks.read_only_allow_delete?pretty'

# Why a shard is unassigned (with no request body, explains the first unassigned shard found)
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'

# Merge activity that may be temporarily inflating disk
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,merges.current,merges.current_size'

# OS-level disk check (adjust path to your data directory)
df -h /var/lib/elasticsearch

How to diagnose it

  1. Identify the fullest nodes. Use _cat/allocation sorted by disk.percent. If one node is at 90% and the rest are below 75%, the problem is hot-spotting. Check _cat/shards?v&s=store:desc to see which indices are largest on that node. If multiple nodes are near 90%, the problem is cluster-wide capacity.
  2. Correlate with shard relocation activity. Check relocating_shards in _cluster/health. If the count is high and disk is rising on target nodes, the cascade is active. Note the indices involved; large shards (hundreds of GB) move slowly and keep the cluster in a stressed state longer.
  3. Check for flood-stage blocks. Query _all/_settings for index.blocks.read_only_allow_delete. If present, writes are already blocked for those indices.
  4. Determine whether targets have room. If every data node is above the low watermark (85%), Elasticsearch has no node it can legally allocate shards to. Relocations stall, and any shard that is already unassigned (for example, a replica lost when a node restarted) stays unassigned.
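  5. Compare Elasticsearch shard data with total disk usage. In _cat/allocation, disk.indices counts only shard data, while disk.used and disk.percent reflect the whole filesystem, as df does. If disk.used is much larger than disk.indices, non-Elasticsearch data such as logs or snapshots stored on the same mount is consuming the difference, and it counts toward the watermarks.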
  6. Investigate the root capacity trend. Check ILM status with GET /*/_ilm/explain?only_errors=true&only_managed=true to see if old indices are stuck. Check _cat/indices for unexpectedly large indices. Check _cat/segments/<index> for high segment counts; if count is very high relative to shard count, a merge backlog is inflating disk.
  7. Check recovery progress. Use _cat/recovery?active_only=true. If bytes_percent is moving slowly, disk I/O or network is saturated. If it is stuck near the same value for minutes, the target may have hit its own watermark mid-relocation.
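
Steps 1 and 4 can be combined into one quick check. The sketch below assumes input in the form produced by _cat/allocation?h=disk.percent,node (one "percent node" pair per line); the function name allocation_targets and its output labels are illustrative. Rows without a numeric percentage, such as the UNASSIGNED row, are skipped.

```shell
# allocation_targets [LOW_WATERMARK_PCT]
# Hypothetical helper: reads "disk.percent node" lines on stdin, lists
# nodes at or above the low watermark, and reports whether any node
# is still a legal allocation target.
allocation_targets() {
  awk -v low="${1:-85}" '
    $1 ~ /^[0-9]+$/ && NF >= 2 {
      total++
      if ($1 + 0 >= low) { full++; printf "above_low %s %s%%\n", $2, $1 }
    }
    END {
      printf "nodes=%d above_low=%d\n", total, full
      if (total > 0 && full == total) print "no_legal_target"
    }'
}

# Usage against a live cluster (not run here):
#   curl -s 'http://localhost:9200/_cat/allocation?h=disk.percent,node' | allocation_targets
```

If the last line of output is no_legal_target, every data node is at or above the low watermark and the only ways forward are freeing space or adding capacity.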

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Disk percent per node (_cat/allocation) | Watermarks apply per node, not cluster-wide average | Any node above 85%; multiple nodes trending toward 90% simultaneously |
| Relocating shard count | Active relocations are the mechanism of the cascade | Sustained high relocating_shards while disk is rising on target nodes |
| Indexing rate | Flood stage blocks stop writes | Sudden drop to zero on active indices |
| Disk I/O wait (OS-level) | Relocation and merge traffic compete with indexing and search | iowait >30% sustained during relocation spikes |
| Flood stage block presence | Indicates writes are blocked | index.blocks.read_only_allow_delete: true on any active index |
| Unassigned shard count | Stalled relocations leave shards homeless | Replicas unassigned for >30 minutes during non-maintenance windows |

Fixes

Free disk space immediately

The fastest way to break the cascade is to delete data. Target the oldest indices first. Use _cat/indices?v&s=store.size:desc to identify the largest consumers, then verify they are managed by ILM or have snapshots before deleting. For non-critical indices, reducing replica counts frees space on every node that holds a copy.

Warning: Deleting indices is destructive and irreversible. Ensure snapshots exist before deleting. Reducing replica counts removes redundancy: with no replicas, losing a node makes its shards unavailable until they are restored. The freed space can also prompt the allocator to rebalance, which temporarily adds disk I/O and network load.
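
Reducing replicas on a non-critical index goes through the index settings API. The sketch below wraps that call in two helpers; the function names, the ES_URL variable, and the index name in the usage comment are all hypothetical, and the cluster address matches the localhost examples used elsewhere in this guide.

```shell
# Hypothetical helper: build the settings body for a replica count.
replica_settings_body() {
  printf '{"index":{"number_of_replicas":%d}}\n' "$1"
}

# set_replicas INDEX COUNT
# Hypothetical helper: apply the replica count to one index.
# ES_URL is an assumed variable; it defaults to the local node.
set_replicas() {
  curl -s -X PUT "${ES_URL:-http://localhost:9200}/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replica_settings_body "$2")"
}

# Usage (index name is a placeholder, not run here):
#   set_replicas logs-old-example 0
```

Restore the original replica count with the same call once disk pressure is resolved.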

Handle flood-stage blocks

In Elasticsearch 7.4 and later, flood-stage blocks are removed automatically once a node’s disk usage drops below the high watermark. If you free space and the block does not clear within a few minutes, remove it manually:

# Remove read-only block from all indices after freeing disk
curl -X PUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'

Add capacity

If the cluster is uniformly above 80%, deleting data is only a temporary fix. Add data nodes or expand attached volumes. Expanding volumes is often faster in cloud environments, but ensure the filesystem is resized before expecting Elasticsearch to use the new space.

Fix stuck ILM policies

If disk growth is caused by indices accumulating, use GET /*/_ilm/explain?only_errors=true&only_managed=true to find stuck indices. Retry with POST /<index>/_ilm/retry after resolving the blocker (usually disk space or missing aliases).

Avoid force merges during disk pressure

Force merging reduces segment count but temporarily requires disk space for both old and new segments. Running a force merge when a node is near 90% can push it into flood stage.

Prevention

  • Monitor disk usage per node, not cluster-wide averages. Hot-spotting is invisible in aggregate.
  • Keep sustained disk usage below 70%. Plan capacity so that daily growth plus merge overhead still leaves headroom below the low watermark.
  • Verify ILM execution regularly. A stuck ILM policy is a slow-motion disk exhaustion event.
  • Account for merge overhead in capacity planning. A node at 80% can briefly spike to 90% during large merges.
  • In hot-warm-cold architectures, monitor tiers independently. Warm and cold nodes have different baseline I/O and fill rates.
  • Do not raise watermarks to mask capacity problems. If you must adjust cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high, do so only as a temporary bridge while adding capacity.
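
The headroom guidance above reduces to a short calculation. The sketch below assumes linear daily growth and ignores merge spikes, so treat the result as an upper bound; the function name days_until_watermark is hypothetical, and the first three arguments must share one unit (GB here).

```shell
# days_until_watermark USED TOTAL DAILY_GROWTH [WATERMARK_PCT]
# Hypothetical helper: whole days until a node reaches the watermark
# (default 85, the low watermark), assuming constant daily growth.
days_until_watermark() {
  awk -v used="$1" -v total="$2" -v growth="$3" -v wm="${4:-85}" 'BEGIN {
    headroom = total * wm / 100 - used
    if (headroom <= 0) { print 0; exit }      # already at or past it
    if (growth <= 0)   { print "inf"; exit }  # not growing
    printf "%d\n", int(headroom / growth)
  }'
}

# A 2000 GB node at 70% (1400 GB used) growing 20 GB per day:
days_until_watermark 1400 2000 20   # prints 15
```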

How Netdata helps

Netdata tracks the signals that predict and identify a watermark cascade:

  • Per-node disk utilization, saturation, and I/O wait. Hot-spotted nodes show elevated utilization before the allocator triggers relocations.
  • Disk latency and utilization charts distinguish normal indexing from cascade-induced I/O storms.
  • The Elasticsearch collector exposes cluster health, shard counts, and relocating shards, correlating disk pressure with allocator behavior.
  • Alerts on disk percent per node set below the low watermark (for example, 80%) give operators time to act before Elasticsearch intervenes.
  • Indexing rate and search latency charts identify when write blocks or relocation overhead impact workload performance.