
Elasticsearch unassigned shards: reading allocation explain and fixing each reason

Yellow or red cluster health with unassigned_shards > 0 means the allocator cannot place one or more shard copies on any node. Missing primaries block queries and risk data loss; missing replicas only cost redundancy. Do not guess from the cluster color. The allocator already knows why it rejected every node. Ask it.

What this means

Unassigned primaries make their data unreachable. Affected indices return partial results or fail. Unassigned replicas remove redundancy: if a node holding one of those primaries then fails, that shard's data is lost. The allocator on the elected master evaluates every node through a chain of deciders: disk watermarks, allocation filters, awareness attributes, the same-shard rule, and retry limits. When every node is rejected, the shard stays UNASSIGNED until the blocking condition clears or you intervene.

flowchart TD
    A[Unassigned shards detected] --> B[GET /_cluster/allocation/explain]
    B --> C{unassigned_info.reason}
    C -->|NODE_LEFT| D[Check node count and delayed_timeout]
    C -->|ALLOCATION_FAILED| E[Check failed_allocations count]
    E -->|>= 5| F[POST /_cluster/reroute?retry_failed=true]
    C -->|Disk watermark| G[Check /_cat/allocation disk percent]
    C -->|Filter / Awareness| H[Check /_cat/nodeattrs and index routing settings]
    C -->|Same shard| I[Reduce replicas or add data nodes]

Common causes

Cause	What it looks like	First thing to check
Disk watermark breached	Nodes above low/high/flood watermarks; new allocations blocked	`GET /_cat/allocation?v`
NODE_LEFT with delayed timeout	Node departed; replicas unassigned but waiting	`GET /_cat/nodes` and `index.unassigned.node_left.delayed_timeout`
ALLOCATION_FAILED at max retries	Shard copy corrupt or failed validation; never retries again	`GET /_cluster/allocation/explain` for `failed_allocations` count
Allocation filter or awareness mismatch	Index requires a node attribute that no node carries	`GET /_cat/nodeattrs?v` and index `routing.allocation` settings
Same-shard rule / insufficient nodes	Replica count equals or exceeds data node count	`GET /_cat/nodes` count vs replica count
Allocation explicitly disabled	`cluster.routing.allocation.enable` set to `none` or `primaries`	`GET /_cluster/settings?flat_settings=true`

Quick checks

# Cluster health and unassigned count
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,unassigned_shards,number_of_nodes'

# Unassigned shards with reasons
curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state' | grep UNASSIGNED

# The single most useful diagnostic
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'

# Disk usage per node
curl -s 'http://localhost:9200/_cat/allocation?v'

# Current allocation enablement settings
curl -s 'http://localhost:9200/_cluster/settings?flat_settings=true&filter_path=**.cluster.routing.allocation.enable'

# Node awareness attributes
curl -s 'http://localhost:9200/_cat/nodeattrs?v'
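When many shards are unassigned, it helps to separate primaries from replicas before opening allocation explain. A minimal POSIX shell sketch, assuming `_cat/shards` output with the columns index, shard, prirep, state, and unassigned.reason in that order; `summarize_unassigned` is a hypothetical helper name, not part of any tool:

```shell
# Summarize unassigned shard copies from `_cat/shards` text read on stdin.
# Assumes the columns index, shard, prirep, state, unassigned.reason, in
# that order. Header rows and assigned shards are skipped because their
# state column is not UNASSIGNED.
summarize_unassigned() {
  awk '
    $4 == "UNASSIGNED" {
      if ($3 == "p") primaries++
      else replicas++
      reason[$5]++
    }
    END {
      printf "primaries=%d replicas=%d\n", primaries, replicas
      for (r in reason) printf "reason %s=%d\n", r, reason[r]
    }
  ' | LC_ALL=C sort
  # sort gives a stable order: the totals line sorts ahead of "reason" lines.
}
```

Feed it the shard listing directly, for example `curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | summarize_unassigned`. Any nonzero primaries count means data is unreachable right now.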

How to diagnose it

Confirm severity. An unassigned primary is an immediate incident. An unassigned replica is a ticket unless recovery stalls past your SLO.
Run GET /_cluster/allocation/explain. Without a body, it explains an arbitrary unassigned shard. For a specific shard, pass {"index":"name","shard":0,"primary":true}.
Read unassigned_info.reason. Values like NODE_LEFT or CLUSTER_RECOVERED often self-heal. ALLOCATION_FAILED stops self-healing once failed_allocations reaches index.allocation.max_retries.
Read the can_allocate field and the decider list. The decider name tells you which rule rejected each node: disk_threshold, filter, same_shard, awareness, throttling, max_retry, and so on.
Check allocate_explanation for the summary and node_allocation_decisions for per-node rejections. Pass ?include_yes_decisions=true to also list the deciders that returned YES; by default only NO and THROTTLE decisions appear.
Correlate with node count drops, disk usage, and recent cluster settings changes.
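For the targeted form of the explain call, the body only needs the index, the shard number, and whether the copy is a primary. The sketch below builds that body in shell and assumes curl is available for the request itself; `explain_body` and `my-index` are hypothetical names:

```shell
# Build the JSON body for a targeted allocation explain request.
# Arguments: index name, shard number, and "true" (primary) or "false"
# (replica). explain_body is a hypothetical helper, not part of any tool.
explain_body() {
  case "$2" in
    ''|*[!0-9]*) echo "shard must be a non-negative integer" >&2; return 1 ;;
  esac
  case "$3" in
    true|false) ;;
    *) echo "primary must be true or false" >&2; return 1 ;;
  esac
  printf '{"index":"%s","shard":%s,"primary":%s}\n' "$1" "$2" "$3"
}

# Example request (placeholder names; not run here):
# curl -s -X GET 'http://localhost:9200/_cluster/allocation/explain?pretty&include_yes_decisions=true' \
#   -H 'Content-Type: application/json' \
#   -d "$(explain_body my-index 0 false)"
```

The commented request adds include_yes_decisions=true, which also lists the passing deciders; that helps when a single node rejects the shard for an unexpected reason.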

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`unassigned_shards`	Direct measure of stuck shards	Nonzero for more than 5 minutes outside maintenance
Disk used percent per node	Watermarks block allocation	Any node above the low watermark
`number_of_nodes` / `number_of_data_nodes`	Node loss triggers reallocation	Unexpected drop from baseline
`relocating_shards`	Recovery or rebalance storm	Sudden spike without planned change
`cluster.routing.allocation.enable`	Admin or automation may disable allocation	Value is not `all`
`index.allocation.max_retries` exceeded	Shards stuck forever without operator action	`ALLOCATION_FAILED` with `failed_allocations >= 5`

Fixes

Disk watermark breach

If a node crosses the low watermark (85%), the allocator stops sending new shards to it. At the high watermark (90%), Elasticsearch actively relocates shards away. At flood stage (95%), it sets index.blocks.read_only_allow_delete on indices with shards on that node.
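To see which threshold each node has crossed, the per-node disk percentage can be compared against those defaults. A sketch assuming the default percentage watermarks (85/90/95) and `_cat/allocation?h=node,disk.percent` output on stdin; `watermark_report` is a hypothetical name, and clusters that set byte-based watermarks or max_headroom need different logic. It uses a strict "above" comparison, and disk.percent is a whole number, so readings right at a threshold are approximate:

```shell
# Label each node's disk usage against the default percentage watermarks:
# low 85%, high 90%, flood stage 95%. Assumes those defaults are in effect.
# Input: `_cat/allocation?h=node,disk.percent` text on stdin.
watermark_report() {
  awk -v low=85 -v high=90 -v flood=95 '
    # Skip header rows and the UNASSIGNED row, which have no percentage.
    $2 !~ /^[0-9]+(\.[0-9]+)?$/ { next }
    {
      pct = $2 + 0
      if (pct > flood) state = "above_flood"
      else if (pct > high) state = "above_high"
      else if (pct > low) state = "above_low"
      else state = "ok"
      print $1, $2, state
    }
  '
}
```

Usage: `curl -s 'http://localhost:9200/_cat/allocation?h=node,disk.percent' | watermark_report`.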

Immediate response: free disk by deleting old indices, lowering replica counts, or expanding storage. Since Elasticsearch 7.4, the flood-stage block is released automatically once the node's disk usage drops below the high watermark. If it does not clear, or you need to unblock immediately after confirming sufficient space:

# Remove read-only block after freeing disk
curl -X PUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'

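Tradeoff: deleting indices is destructive. Reducing replica count frees space but reduces redundancy. On very large disks a fixed percentage reserves a lot of free space; Elasticsearch 8.5 and later add max_headroom settings (for example cluster.routing.allocation.disk.watermark.high.max_headroom) that cap how much free space each watermark requires.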

NODE_LEFT and delayed timeout

When a node leaves, replicas go unassigned. By default, the master waits index.unassigned.node_left.delayed_timeout (one minute) before reallocating, in case the node restarts. Once the timeout expires, recovery starts automatically. Lower the timeout to recover faster; raise it during rolling restarts to suppress unnecessary movement.
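The delay is an index setting, so it can be changed through the index settings API. A sketch assuming the `_all` target and Elasticsearch time units; `delayed_timeout_body` is a hypothetical helper, and its validation accepts only whole numbers with an ms, s, m, h, or d suffix:

```shell
# Build an index settings body that changes the NODE_LEFT delay.
# Accepts time values such as 30s, 5m, or 1h.
# delayed_timeout_body is a hypothetical helper name.
delayed_timeout_body() {
  if ! printf '%s\n' "$1" | grep -Eq '^[0-9]+(ms|s|m|h|d)$'; then
    echo "expected a time value such as 30s or 5m, got: $1" >&2
    return 1
  fi
  printf '{"settings":{"index.unassigned.node_left.delayed_timeout":"%s"}}\n' "$1"
}

# Example: wait five minutes before reallocating replicas (not run here).
# curl -s -X PUT 'http://localhost:9200/_all/_settings' \
#   -H 'Content-Type: application/json' \
#   -d "$(delayed_timeout_body 5m)"
```

Replace `_all` with an index name or pattern to limit the change to specific indices.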

ALLOCATION_FAILED and max retries

If a shard fails allocation, Elasticsearch retries up to index.allocation.max_retries (default 5). After that, the shard stays unassigned indefinitely, even if the root cause is fixed.

After fixing the underlying issue (disk space, hardware, corrupt translog), trigger a retry:

# Reset retry counter and attempt allocation again
curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'

Warning: retry_failed=true does not bypass allocation rules. If the disk is still full or the filter still mismatches, the retry fails again and burns another retry cycle. Always read the allocation explain output before retrying.
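Before calling retry_failed, it is worth confirming that the shard really is at the retry limit. A rough text-matching sketch that reads pretty-printed allocation explain output on stdin and compares failed_allocations with the limit (default 5; pass a different value as the first argument if the index overrides index.allocation.max_retries). `retries_exhausted` is a hypothetical name, and a JSON tool such as jq is sturdier than sed for real use:

```shell
# Compare failed_allocations in allocation explain output (stdin) with the
# retry limit. Exit status: 0 = limit reached, 1 = automatic retries
# remain, 2 = no failed_allocations field found.
retries_exhausted() {
  max_retries="${1:-5}"
  failed=$(sed -n 's/.*"failed_allocations"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$failed" ]; then
    echo "no failed_allocations field found"
    return 2
  fi
  if [ "$failed" -ge "$max_retries" ]; then
    echo "exhausted ($failed/$max_retries): fix the cause, then retry_failed=true"
    return 0
  fi
  echo "retrying ($failed/$max_retries): automatic retries remain"
  return 1
}
```

A missing field usually means the shard never failed allocation, so the explain decisions, not the retry counter, hold the answer.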

Allocation filters and awareness attributes

Stale index.routing.allocation.require.*, include.*, or exclude.* settings can pin shards to nodes that no longer exist. Forced awareness (cluster.routing.allocation.awareness.force.*) strands replicas when an attribute value is missing from the cluster. Verify node attributes with GET /_cat/nodeattrs?v, then update or remove the offending index setting. This is common after node replacement if the new node advertises different attributes.
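To confirm that a require, include, or awareness value can actually be satisfied, check whether any node advertises it. A sketch that reads `_cat/nodeattrs?h=node,attr,value` output on stdin; `nodes_with_attr`, `box_type`, and `my-index` are hypothetical. Built-in filters such as `_name` or `_ip` match node names and addresses rather than custom attributes, so this check does not cover them:

```shell
# Print the nodes that advertise attribute $1 with value $2, reading
# `_cat/nodeattrs?h=node,attr,value` text on stdin. Exits 1 if none do.
nodes_with_attr() {
  awk -v a="$1" -v v="$2" '
    $2 == a && $3 == v { print $1; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# If no node matches, clear the stale filter (placeholder names; not run):
# curl -s -X PUT 'http://localhost:9200/my-index/_settings' \
#   -H 'Content-Type: application/json' \
#   -d '{"index.routing.allocation.require.box_type": null}'
```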

Same-shard rule and shard limits

The same-shard rule forbids a primary and its replica from sharing a node. If the replica count equals or exceeds the data node count, at least one replica stays unassigned. Reduce replicas or add nodes. Also watch cluster.max_shards_per_node (default 1000 per non-frozen data node). Once the cluster's open shards reach that value times the number of non-frozen data nodes, requests that would add shards, such as creating an index, fail with a maximum shards open error.
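The same-shard arithmetic is simple: each shard has one primary plus R replicas, no two copies may share a node, so at most D copies fit on D data nodes. A sketch of that calculation; `unassignable_copies` is a hypothetical helper, and it ignores filters, awareness, disk, and data tiers, any of which can shrink the usable node count further:

```shell
# Copies per shard that cannot be placed under the same-shard rule.
# Arguments: number_of_replicas, number of data nodes.
unassignable_copies() {
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*) echo "arguments must be non-negative integers" >&2; return 1 ;;
    esac
  done
  # Total copies = replicas + 1; at most one copy per data node.
  excess=$(( $1 + 1 - $2 ))
  if [ "$excess" -lt 0 ]; then excess=0; fi
  echo "$excess"
}
```

For example, one replica on a single data node leaves one copy per shard unassigned, which is why single-node clusters with default replica settings report yellow.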

Last-resort manual allocation

If the only copies of a primary are gone, you may need allocate_empty_primary or allocate_stale_primary. Both require "accept_data_loss": true. These are destructive and should only be used when the original data is provably gone and recovery from snapshot is not faster.
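Both commands go through the cluster reroute API with an explicit accept_data_loss flag. A sketch that builds the request body; `primary_reroute_body`, `my-index`, and `node-1` are placeholders. For allocate_stale_primary, the named node must still hold a copy of the shard, so check allocation explain to find such a node first:

```shell
# Build a reroute body for a last-resort primary allocation.
# Arguments: stale|empty, index, shard number, target node.
# Both commands discard data; primary_reroute_body is hypothetical.
primary_reroute_body() {
  case "$1" in
    stale) cmd=allocate_stale_primary ;;
    empty) cmd=allocate_empty_primary ;;
    *) echo "first argument must be stale or empty" >&2; return 1 ;;
  esac
  printf '{"commands":[{"%s":{"index":"%s","shard":%s,"node":"%s","accept_data_loss":true}}]}\n' \
    "$cmd" "$2" "$3" "$4"
}

# Example (placeholder names; not run here):
# curl -s -X POST 'http://localhost:9200/_cluster/reroute?pretty' \
#   -H 'Content-Type: application/json' \
#   -d "$(primary_reroute_body stale my-index 0 node-1)"
```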

Prevention

Project disk time-to-watermark and expand before the low watermark.
Keep replica counts below the data node count.
Audit allocation filter settings during node replacements and tier migrations.
Watch shard density per node; avoid approaching cluster.max_shards_per_node.
During rolling restarts, set cluster.routing.allocation.enable to primaries (or none) to prevent rebalancing storms, then reset it to all once the node rejoins.
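The rolling-restart step in the list above can be scripted as two cluster settings calls. The Elasticsearch rolling restart procedure uses primaries, which allows only primary shards to be allocated; setting the value to null restores the default of all. `allocation_enable_body` is a hypothetical helper:

```shell
# Build a persistent cluster settings body for allocation enablement.
# Valid values: all, primaries, new_primaries, none; "reset" sends null.
allocation_enable_body() {
  case "$1" in
    all|primaries|new_primaries|none)
      printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}\n' "$1" ;;
    reset)
      printf '{"persistent":{"cluster.routing.allocation.enable":null}}\n' ;;
    *)
      echo "unknown value: $1" >&2; return 1 ;;
  esac
}

# Before stopping a node (not run here):
# curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
#   -H 'Content-Type: application/json' -d "$(allocation_enable_body primaries)"
# After the node rejoins:
# curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
#   -H 'Content-Type: application/json' -d "$(allocation_enable_body reset)"
```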

How Netdata helps

Correlate unassigned_shards with per-node disk usage to spot the blocking node.
Alert on unexpected node count drops before NODE_LEFT events.
Track JVM heap and old-generation GC pauses; long GC causes node removal and unassigned shards.
Correlate relocating_shards spikes with network and disk I/O to spot recovery storms.
Track changes to cluster.routing.allocation.enable to catch accidental allocation locks.

The Netdata solution

Elasticsearch monitoring with Netdata

Netdata monitors Elasticsearch with per-second metrics and ML anomaly detection. Correlate JVM heap pressure, shard counts, disk watermarks, mapping growth, and merge activity with cluster and node health in one view.

