Elasticsearch cluster health red: unassigned primaries and how to recover

Cluster health red means at least one primary shard is unassigned. Queries against affected indices return partial results or fail, and writes routed to the unassigned shards are rejected. Yellow signals only missing replicas; red signals active data unavailability.

Cluster health is a lagging indicator. A red status sustained longer than two minutes after the cluster has formed is a real fault; a brief flash during startup is normal. By the time the status turns red, a node has likely departed, a disk has crossed a watermark, or a shard copy has been rejected as corrupt.

The master allocates shards based on disk watermarks (low 85%, high 90%, flood stage 95%), allocation filtering rules, awareness attributes, and the validity of existing shard copies. When a primary goes unassigned, the allocator has evaluated every candidate node and found none acceptable. Your job is to discover which constraint blocked placement, then either remove the constraint or recover the data through other means.
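As a rough illustration of the watermark thresholds above, the following sketch classifies a disk usage percentage. The function name `classify_disk_pct` is hypothetical, and the numbers assume the default watermarks have not been overridden in cluster settings.

```shell
# Classify an integer disk usage percentage against the default
# Elasticsearch watermarks (low 85, high 90, flood stage 95).
# Hypothetical helper: adjust the numbers if your cluster overrides them.
classify_disk_pct() {
  local pct=$1
  if   [ "$pct" -gt 95 ]; then echo "flood_stage"
  elif [ "$pct" -gt 90 ]; then echo "high"
  elif [ "$pct" -gt 85 ]; then echo "low"
  else                         echo "ok"
  fi
}
```

A node classified as `low` no longer receives new shards, `high` means the allocator tries to move shards away, and `flood_stage` means indices with shards on that node become read-only.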

flowchart TD
    A[Cluster health red] --> B{"Sustained >2m, uptime >600s"}
    B -- No --> C["Transient startup or restart"]
    B -- Yes --> D["GET /_cluster/health?level=indices"]
    D --> E["List unassigned primaries"]
    E --> F["POST /_cluster/allocation/explain"]
    F --> G{Root cause}
    G -- NODE_LEFT --> H["Check node logs for GC or OOM"]
    G -- WATERMARK --> I["Free disk and clear read_only blocks"]
    G -- ALLOC_FAILED --> J["reroute?retry_failed"]
    G -- NO_VALID_COPY --> K["Restore snapshot or accept data loss"]
    G -- FILTER --> L["Fix allocation settings"]

Common causes

Cause	What it looks like	First check
Node loss (crash, OOM kill, GC pause, network partition)	Node count drops in `/_cluster/health`; unassigned reason shows `NODE_LEFT`	`GET /_cat/nodes` and node logs for OOM killer messages or fatal GC errors
Disk watermark exceeded	Shards refuse to allocate; indices become read-only at flood stage	`GET /_cat/allocation?v` for disk usage percent on every data node
Corrupt shard copy or max retries exceeded	Unassigned reason `ALLOCATION_FAILED`; shard repeatedly fails to initialize	`POST /_cluster/allocation/explain` for `can_allocate` and failure details
Allocation filtering or awareness misconfiguration	Shards stay unassigned despite healthy nodes and adequate disk space	`POST /_cluster/allocation/explain` for the blocking decider; `GET /_cluster/settings`
Insufficient nodes to host all copies	All nodes that held valid copies have left; remaining nodes cannot satisfy replication rules	`POST /_cluster/allocation/explain` per shard for `no_valid_shard_copy`

Quick checks

Run these read-only commands to scope the incident.

# Check cluster health and identify red indices
curl -s 'http://localhost:9200/_cluster/health?level=indices&filter_path=status,unassigned_shards,indices.*.status'

# List unassigned shards and their reasons
curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state' | grep UNASSIGNED

# Explain why the first unassigned shard is blocked
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'

# Check node membership and basic health
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,cpu,load_1m,disk.used_percent'

# Check per-node disk usage against watermarks
curl -s 'http://localhost:9200/_cat/allocation?v'

# Verify allocation has not been disabled globally
curl -s 'http://localhost:9200/_cluster/settings?flat_settings=true&filter_path=persistent.cluster.routing.allocation.enable,transient.cluster.routing.allocation.enable'

How to diagnose it

1. Confirm the state is sustained. If nodes have been running for less than 600 seconds, wait briefly. Initial cluster formation and shard discovery can transiently show red.
2. Identify affected indices. Use GET /_cluster/health?level=indices. Note which indices report red; these own the unassigned primaries.
3. List unassigned primaries. Query /_cat/shards and filter to UNASSIGNED. Focus on rows where prirep is p. Note the unassigned.reason value.
4. Get the allocator’s exact reasoning. Call POST /_cluster/allocation/explain. The response contains can_allocate (no, throttled, no_valid_shard_copy, allocation_delayed) and a per-node breakdown of why each node rejected the shard.
5. Correlate with node and disk health. A NODE_LEFT reason paired with a lower node count points to a departed node. If nodes are present but disk usage is high, watermark deciders are blocking placement.
6. Inspect logs. On departed or target nodes, check Elasticsearch logs for OutOfMemoryError, long GC pauses exceeding the fault detection timeout, or disk I/O errors that caused the allocator to reject a shard copy.
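The listing step above can be sketched as a small filter over `/_cat/shards` output. This is a minimal example, assuming the column order `index,shard,prirep,state,unassigned.reason` used in the quick checks; the function names are hypothetical.

```shell
# Print "index shard reason" for every unassigned primary.
# Expects _cat/shards rows with the columns:
# index shard prirep state unassigned.reason
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2, $5 }'
}

# Count unassigned primaries per reason, for example "2 NODE_LEFT".
count_by_reason() {
  unassigned_primaries | awk '{ n[$3]++ } END { for (r in n) print n[r], r }' | sort -k2
}

# Example (not executed here):
# curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#   | count_by_reason
```

Grouping by reason shows at a glance whether the incident is one departed node (all `NODE_LEFT`) or a mix of causes that need separate fixes.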

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Cluster health status	Top-level indicator of primary availability	`red` sustained longer than 2 minutes after JVM uptime exceeds 600 seconds
Unassigned shard count	Quantifies scope; reason points to root cause	Any unassigned primary sustained beyond the startup window
Node count	A drop means a node left and triggered reallocation	Unexpected decrease in `number_of_data_nodes`
Disk usage and watermarks	Disk above 85% blocks new allocation; 95% blocks writes	Disk usage above 85% on any data node
JVM heap and GC activity	Long stop-the-world GC causes node removal via fault detection	Heap usage above 85% with increasing old GC duration
Master stability	Master instability stalls all allocation decisions	Pending cluster tasks growing or master identity changing
Pending cluster tasks	Backlogged tasks delay shard allocation decisions	More than 20 pending tasks or any task older than 30 seconds
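The warning condition for cluster health status in the table can be expressed as a simple predicate. This is a sketch only: the function name is hypothetical, and the three inputs (current status, seconds spent in red, and JVM uptime in seconds) are assumed to come from your own monitoring.

```shell
# Return success (0) when a red status should be treated as a real fault:
# the status is red, it has been red for more than 120 seconds, and
# JVM uptime exceeds 600 seconds (past the startup window).
is_sustained_red() {
  local status=$1 red_seconds=$2 uptime_seconds=$3
  [ "$status" = "red" ] && [ "$red_seconds" -gt 120 ] && [ "$uptime_seconds" -gt 600 ]
}

# Example: red for 3 minutes on a node that has been up for 15 minutes.
# is_sustained_red red 180 900 && echo "page the on-call"
```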

Fixes

Transient node restart or rolling maintenance

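If a node restarted, Elasticsearch delays reallocating the replicas that node held by index.unassigned.node_left.delayed_timeout (default 1 minute) to give the node time to rejoin. A primary whose only copy lived on that node stays unassigned until the node returns. During rolling restarts, restrict allocation before stopping nodes to prevent a recovery storm: set cluster.routing.allocation.enable to primaries, the value Elastic's rolling-restart procedure uses, or to none. Reset the setting once the node has rejoined, because a leftover none blocks allocation of every unassigned shard. If the delay has passed and primaries remain unassigned, move to the specific cause below.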
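A minimal sketch of changing cluster.routing.allocation.enable around maintenance follows. The helper name `allocation_enable_body` is hypothetical, and using the persistent scope is an assumption; use transient if that is what your runbook expects.

```shell
# Build the request body for PUT /_cluster/settings that sets
# cluster.routing.allocation.enable. Pass "null" to remove the override
# and return to the default behavior.
allocation_enable_body() {
  local value=$1
  if [ "$value" = "null" ]; then
    printf '{"persistent":{"cluster.routing.allocation.enable":null}}\n'
  else
    printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}\n' "$value"
  fi
}

# Example (not executed here): restore the default after maintenance.
# curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
#   -H 'Content-Type: application/json' -d "$(allocation_enable_body null)"
```

Resetting with `null` rather than writing `all` removes the override entirely, so the setting no longer appears in `GET /_cluster/settings`.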

Disk watermark and flood stage

If /_cat/allocation shows nodes above the high watermark (90%) or flood stage (95%), free disk space immediately. Delete old indices, force-merge read-only indices to reclaim space, or remove unneeded snapshots. When flood stage is reached, affected indices are automatically set to index.blocks.read_only_allow_delete. Elasticsearch 7.4 and later release the block automatically once disk usage falls back below the high watermark. On earlier versions, or if the block was set by hand, clear it manually after freeing disk space:

# Clear flood-stage read-only block after freeing disk space
curl -X PUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'

Do not clear the block before freeing space, or Elasticsearch will reapply it as soon as it next checks disk usage.
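One way to guard against clearing the block too early is to check every node's disk usage first. The sketch below is an assumption-laden example: the function name is hypothetical, the default limit of 90 mirrors the default high watermark, and the input is expected in the form produced by `/_cat/allocation?h=disk.percent,node`.

```shell
# Succeed only if every node reports disk usage below the given limit.
# Reads rows of the form "disk.percent node"; rows without a numeric first
# column (such as the UNASSIGNED summary row) are ignored.
# Nodes at or above the limit are printed as "node percent".
below_high_watermark() {
  local limit=${1:-90}
  awk -v limit="$limit" '
    $1 ~ /^[0-9]+$/ && $1 + 0 >= limit + 0 { bad = 1; print $2, $1 }
    END { exit bad + 0 }
  '
}

# Example (not executed here):
# curl -s 'http://localhost:9200/_cat/allocation?h=disk.percent,node' \
#   | below_high_watermark && echo "safe to clear the block"
```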

Max retries exceeded or corrupt shard

Shards with reason ALLOCATION_FAILED have exhausted automatic retries (index.allocation.max_retries, default 5). Fix the failure that allocation explain reports, then trigger a new allocation attempt:

# Retry shards that failed automatic allocation
curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed'

If the shard copy is corrupt and no valid copy exists on another node, choose between restoring from snapshot or accepting data loss with a manual override.

No valid shard copy

When allocation/explain returns no_valid_shard_copy, no node currently in the cluster holds an in-sync copy of the shard. If you have a recent snapshot, restore the index. Snapshot restore is always preferable to forcing a partial allocation.

If no snapshot exists and you must recover the index, you can allocate a stale copy or an empty primary. Both require explicitly accepting data loss. The following example forces allocation of a stale primary to a node that holds an older copy:

# Force allocate a stale primary - DESTRUCTIVE, may lose data
curl -X POST 'http://localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "target-node-name",
        "accept_data_loss": true
      }
    }
  ]
}'

If no stale copy exists anywhere, allocate_empty_primary creates a new empty shard on the named node. These commands are destructive. They may result in partial or total data loss for that shard and should only be used when snapshot restore is impossible.
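For comparison with the stale-primary example, here is a sketch that builds the reroute body for either command. The helper name `reroute_body` is hypothetical, and the index, shard, and node values are placeholders. The commented example uses the reroute API's `dry_run` parameter to preview the result without changing the cluster.

```shell
# Build a _cluster/reroute body containing one allocation command.
# command: allocate_stale_primary or allocate_empty_primary
# DESTRUCTIVE when sent without dry_run: both commands accept data loss.
reroute_body() {
  local command=$1 index=$2 shard=$3 node=$4
  printf '{"commands":[{"%s":{"index":"%s","shard":%s,"node":"%s","accept_data_loss":true}}]}\n' \
    "$command" "$index" "$shard" "$node"
}

# Example (not executed here): simulate creating an empty primary.
# curl -s -X POST 'http://localhost:9200/_cluster/reroute?dry_run=true' \
#   -H 'Content-Type: application/json' \
#   -d "$(reroute_body allocate_empty_primary my-index 0 target-node-name)"
```

Running with `dry_run=true` first lets you confirm that the target node and shard number are the ones you intend before accepting data loss for real.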

Allocation filtering or awareness misconfiguration

If the allocation explain output names a filter or awareness decider, review cluster.routing.allocation.* and index.routing.allocation.* settings. Correct the attribute mismatch or remove the errant filter, then allow the allocator to retry.
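To review those settings quickly, you can filter flat settings output down to the filter and awareness keys. This is a minimal sketch: the function name is hypothetical, `my-index` is a placeholder, and the input is assumed to be the pretty-printed `flat_settings=true` form with one setting per line.

```shell
# Keep only allocation filter and awareness settings from flat settings
# output, at both index and cluster level.
allocation_filters() {
  grep -E '"(index|cluster)\.routing\.allocation\.(require|include|exclude|awareness)\.'
}

# Examples (not executed here):
# curl -s 'http://localhost:9200/my-index/_settings?flat_settings=true&pretty' | allocation_filters
# curl -s 'http://localhost:9200/_cluster/settings?flat_settings=true&pretty' | allocation_filters
```

An empty result means no filter or awareness rule is set at that level, so the blocking decider named by allocation explain must come from the other level or from a different constraint.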

Prevention

Monitor per-node disk usage and project time-to-watermark; keep routine usage below 70% to absorb merge spikes.
Use ILM to roll over and delete time-series indices before disks fill.
Maintain tested snapshots; a successful snapshot does not guarantee a successful restore.
Deploy dedicated master nodes to avoid master instability causing allocation stalls.
Monitor JVM heap floor and old GC frequency to predict node loss from GC pressure before fault detection removes the node.
Keep Elasticsearch versions uniform across the cluster; in a mixed-version cluster, a primary on an upgraded node cannot replicate to a node still running the older version, which can leave shards unassigned.
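The time-to-watermark projection mentioned in the first item can be approximated with simple arithmetic. The sketch below assumes linear growth, which real disk usage rarely follows exactly; the function name and the growth-rate input are hypothetical and would come from your own metrics.

```shell
# Estimate hours until a node reaches a watermark, assuming linear growth.
# $1: current disk usage percent
# $2: observed growth in percentage points per hour
# $3: watermark to project against (defaults to the low watermark, 85)
hours_to_watermark() {
  awk -v used="$1" -v growth="$2" -v limit="${3:-85}" 'BEGIN {
    if (used + 0 >= limit + 0) { print "0.0"; exit }
    if (growth + 0 <= 0)       { print "never"; exit }
    printf "%.1f\n", (limit - used) / growth
  }'
}

# Example: 70% used, growing 0.5 points per hour.
# hours_to_watermark 70 0.5      # prints 30.0
```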

How Netdata helps

Netdata collects Elasticsearch metrics that correlate red health with its leading indicators:

Cluster health status and uptime: Alert on red sustained longer than 2 minutes when JVM uptime exceeds 600 seconds, filtering out startup noise.
Per-node disk usage: Correlate unassigned shards with nodes crossing the 85%, 90%, or 95% disk watermarks before flood stage blocks writes.
JVM heap usage and GC latency: Rising old-generation GC duration predicts node departures that trigger unassigned primaries.
Node count and unassigned shard count: Surface unexpected node drops and quantify how many primaries are affected.
Thread pool rejections and circuit breaker trips: Identify heap pressure and saturation that precede node removal and cascading failures.
