Elasticsearch all shards failed: diagnosing search_phase_execution_exception

You run a search and Elasticsearch returns search_phase_execution_exception with reason all shards failed. Every shard copy involved in the query returned a failure to the coordinating node. The outer error is a container; the actual root cause lives in the per-shard failure reasons inside the response body. Do not assume the cluster is down. This error fires on clusters with green health and stable nodes when a query is malformed, a mapping is incompatible, or a resource limit is breached uniformly across every target shard.

The read path uses a two-phase scatter-gather. In the query phase, the coordinating node broadcasts the query to one copy of each relevant shard, and each shard executes the query locally and returns document IDs and sort values. In the fetch phase, the coordinating node retrieves the full documents for the merged top hits. When every targeted shard fails in the query phase, the coordinating node has no partial result to merge and throws the composite exception. The first step is to determine whether the failure is data unavailability, a query error, or resource exhaustion.

What this means

Cluster health green does not prevent this error. Green only means all primaries and replicas are assigned. It says nothing about query correctness or node resource headroom. A malformed query against a healthy cluster produces all shards failed.
The error is a composite signal. It can represent unassigned primaries (cluster red), a bad query that fails identically on every shard, a mapping mismatch, or a uniform resource limit breach such as circuit breakers, thread pool rejections, or disk flood-stage blocks.
The per-shard reason is the diagnosis. If every shard reports the same query-level error (query_shard_exception, illegal_argument_exception, script_exception, or mapper_parsing_exception), the problem is the query. If the reason references an unassigned shard or a tripped circuit breaker, the problem is infrastructure.
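
To make "read the per-shard reason" concrete, here is a minimal, dependency-free sketch that counts the failure types in a saved error response. It assumes the per-shard reasons appear as "reason":{"type":"..."} objects, as they do in the failed_shards array of a search error body; the function name summarize_shard_failures is hypothetical. A JSON-aware tool such as jq would be more robust if it is available.

```shell
# Count per-shard failure types found on stdin (a search error response body).
# Assumption: each shard failure carries "reason":{"type":"<exception type>", ...}.
summarize_shard_failures() {
  # Strip whitespace so pretty-printed and compact JSON match the same pattern.
  tr -d ' \n\t\r' \
    | grep -o '"reason":{"type":"[^"]*"' \
    | cut -d'"' -f6 \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'   # normalize to "<count> <type>"
}
```

For example, save the failing response with curl -s ... > error.json and run summarize_shard_failures < error.json. A single type across all shards points at the query; mixed types point at infrastructure.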

flowchart TD
    A[search_phase_execution_exception<br/>all shards failed] --> B{Read per-shard<br/>failure reasons}
    B -->|mapper_parsing_exception<br/>or query error| C[Fix query / mapping]
    B -->|es_rejected_execution_exception| D[Check thread pool<br/>saturation]
    B -->|circuit_breaking_exception| E[Check heap and<br/>circuit breakers]
    B -->|unassigned or node_left| F[Check cluster health<br/>and allocation explain]
    B -->|read_only_allow_delete| G[Check disk watermark<br/>and clear blocks]
    C --> H[Cancel bad tasks<br/>and reissue]
    D --> I[Reduce concurrency<br/>or scale out]
    E --> J[Reduce aggregation<br/>cardinality / add heap]
    F --> K[Retry failed or<br/>restore nodes]
    G --> L[Free disk and<br/>remove blocks]
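
The decision flow above can be sketched as a small shell function that maps a failure type or reason string to the branch to investigate. This is an illustration only: the function name triage_failure is hypothetical, and query_shard_exception and script_exception are added here as further query-level types beyond those named in the flowchart.

```shell
# Map a per-shard failure type (or reason text) to the triage branch.
# $1: the failure type or reason string copied from the error response.
triage_failure() {
  case "$1" in
    *es_rejected_execution_exception*)
      echo "thread pool: check search thread pool saturation" ;;
    *circuit_breaking_exception*)
      echo "memory: check heap and circuit breakers" ;;
    *read_only_allow_delete*)
      echo "disk: check disk watermark and clear blocks" ;;
    *unassigned*|*node_left*)
      echo "availability: check cluster health and allocation explain" ;;
    *mapper_parsing_exception*|*illegal_argument_exception*|*query_shard_exception*|*script_exception*)
      echo "query: fix query or mapping" ;;
    *)
      echo "unknown: read the full reason text" ;;
  esac
}
```

For example, triage_failure "circuit_breaking_exception" prints the memory branch. Anything unrecognized falls through to the last case, so the full reason text still needs to be read.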

Common causes

Cause	What it looks like	First thing to check
Unassigned or initializing primary shards	Cluster health red; affected indices return complete query failure	`GET /_cluster/health` and `GET /_cluster/allocation/explain`
Query or mapping error hitting every shard identically	Cluster health green; error text references unknown fields, missing `.keyword` sub-field, or script errors	The per-shard `reason` block in the error response
Circuit breaker tripped (`request` or `parent`)	Heap pressure sustained above 85 percent; heavy aggregations fail uniformly	`GET /_nodes/stats/breaker` for `tripped` counters and `estimated_size_in_bytes`
Search thread pool saturation	Query load spikes; rejections inside shard failures	`GET /_cat/thread_pool/search?v&h=node_name,active,queue,rejected`
Disk flood-stage watermark / read-only block	Disk above 95 percent; indexing also failing	`GET /_cat/allocation?v` and index settings for `index.blocks.read_only_allow_delete`
High-cardinality aggregation exceeding memory limits	Deep terms aggregations fail across all shards; `request` breaker may trip	Slow log and `GET /_nodes/stats/breaker`

Quick checks

# Check cluster health and unassigned shards
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,unassigned_shards,number_of_nodes'

# Check search thread pool saturation
curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected'

# Check circuit breaker trips and estimated sizes
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers.parent,nodes.*.breakers.request'

# Check disk watermark proximity and blocks
curl -s 'http://localhost:9200/_cat/allocation?v&h=node,disk.percent,disk.used,disk.total'

# Check JVM heap pressure on data nodes
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors.old'

# Check for active read-only blocks on indices
curl -s 'http://localhost:9200/_all/_settings?filter_path=*.settings.index.blocks.read_only_allow_delete'

How to diagnose it

Read the shard-level failure reasons. Each failed shard includes a reason object with type and reason fields. If every shard reports the same parsing or scripting error, the problem is the query.
Check cluster health. Run GET /_cluster/health. If the status is red, unassigned primaries are the cause. Use GET /_cluster/allocation/explain to find the allocation block. See Elasticsearch cluster health red: unassigned primaries and how to recover and Elasticsearch unassigned shards: reading allocation explain and fixing each reason.
Check for circuit breaker trips. Run GET /_nodes/stats/breaker. If parent.tripped or request.tripped are increasing, the query is consuming too much heap. Look for heavy aggregations, high-cardinality terms, or loading fielddata on text fields. See Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix and Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes.
Check search thread pool saturation. Run GET /_cat/thread_pool/search. If rejected is increasing, the search queue is full. This surfaces as es_rejected_execution_exception inside shard failures. Reduce query concurrency or add data nodes.
Check disk watermark and index blocks. Run GET /_cat/allocation. If any node is above the flood-stage watermark (95 percent disk usage by default), Elasticsearch sets index.blocks.read_only_allow_delete on every index that has a shard on that node. Free disk space and clear the block. See Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks.
Check the slow log for expensive queries. If shard failures correlate with a specific query pattern, the slow log reveals the aggregation or script that breached limits. Cancel the task if it is still running via POST /_tasks/{task_id}/_cancel.
Check for node departures mid-query. If shard failures reference a node that left the cluster, correlate with GET /_cat/nodes and GC logs. Long GC pauses above 10 seconds can cause fault detection checks to time out and the node to be removed from the cluster. See Elasticsearch long GC pauses: old-generation stop-the-world and node drops.
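
Several of the steps above depend on whether a counter is increasing, not on its absolute value. As a rough sketch, the function below compares two saved snapshots of the search thread pool output and prints the nodes whose rejected count grew. The function name rejected_delta and the snapshot file names are hypothetical; it assumes the column order node_name, active, queue, rejected used in the quick checks and node names without spaces.

```shell
# Print "<node> <increase>" for nodes whose rejected count grew between snapshots.
# $1: earlier snapshot file, $2: later snapshot file, both taken from
#     _cat/thread_pool/search?v&h=node_name,active,queue,rejected
rejected_delta() {
  awk '
    $1 == "node_name" { next }                    # skip the header row printed by ?v
    NR == FNR { before[$1] = $4 + 0; next }       # first file: remember count per node
    ($1 in before) && ($4 + 0) > before[$1] {     # second file: report growth only
      print $1, ($4 + 0) - before[$1]
    }
  ' "$1" "$2"
}
```

Take one snapshot, wait an interval that matches your alerting window, take a second, then run rejected_delta before.txt after.txt. Empty output means no node rejected searches in that window.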

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Cluster health status	Unassigned primaries make data unavailable for queries	Red sustained for more than 2 minutes
Search thread pool rejections	Direct cause of shard-level query rejections under load	Sustained rate above 0 per minute for more than 5 minutes
Circuit breaker `parent` or `request` trips	Memory limits prevent query execution or aggregation completion	Any delta greater than 0 per interval
JVM heap used percent	Heap pressure triggers breakers and long GC pauses	Sustained above 85 percent
Disk usage vs watermarks	Flood stage blocks writes and can cause query failures	Above 95 percent or `read_only_allow_delete` block present
Unassigned shard count	Missing primaries guarantee `all shards failed` on affected indices	Any unassigned primary
Search latency (query phase)	Slow shards drag down the scatter-gather response	Sustained above 5 times baseline
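
The percentage thresholds in the table can be checked with a small filter over the two-column _cat output. This is a sketch under assumptions: the function name flag_over_threshold is hypothetical, and the input is expected to be "name value" rows such as those returned by _cat/nodes?h=name,heap.percent.

```shell
# Print rows whose second column exceeds a numeric threshold.
# $1: threshold (for example 85); stdin: "<name> <value>" rows.
flag_over_threshold() {
  awk -v limit="$1" '
    # Only consider rows with a numeric value; this also skips any header row.
    $2 ~ /^[0-9]+(\.[0-9]+)?$/ && ($2 + 0) > (limit + 0) { print $1, $2 }
  '
}
```

For example, curl -s 'http://localhost:9200/_cat/nodes?h=name,heap.percent' | flag_over_threshold 85 lists nodes above the heap warning level at that moment. A single reading is only a point sample, so repeat it over time before treating the value as sustained.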

Fixes

Unassigned primaries. Use GET /_cluster/allocation/explain to find the specific blocker. If shards are stuck in ALLOCATION_FAILED after max retries, run POST /_cluster/reroute?retry_failed=true. If disk watermarks are the cause, free space or add nodes.

Query or mapping errors. Fix the query client-side. Use .keyword sub-fields for aggregations and sorting instead of analyzed text fields. Verify that field names in the query match the current mapping. If a bad query is still running, identify it via GET /_tasks?detailed=true&actions=*search* and cancel it.

Circuit breaker trips. Reduce aggregation cardinality by lowering the size parameter or adding pre-filters. If fielddata is the culprit, reindex with a keyword multi-field. Do not raise breaker limits to mask the problem; this risks OOM. See Elasticsearch heap pressure death spiral: GC, node removal, and the cascade.

Thread pool saturation. Add data nodes or reduce concurrent search load. Do not increase the search queue size blindly; larger queues increase memory pressure and delay rejection without fixing throughput.

Disk watermark / read-only blocks. Delete old indices or reduce replica count to free space. After freeing disk, remove the block with PUT /_all/_settings {"index.blocks.read_only_allow_delete": null}. In 7.4 and later, Elasticsearch releases the block automatically once disk usage falls below the high watermark, but only if space was actually freed.

Prevention

Monitor leading indicators, not just cluster health. Green status does not mean queries will succeed. Track JVM heap floor, search thread pool queue depth, and disk growth rate.
Enforce query review. Catch expensive aggregations and text-field sorts in development.
Cap shard count. Too many shards amplify the blast radius of any query error and increase heap pressure from segment metadata. Use ILM to roll over and delete on schedule.
Set slow log thresholds. Configure index.search.slowlog.threshold.query.warn to catch pathological queries before they trigger breakers.
Maintain disk headroom. Keep data nodes below 70 percent disk usage. Merges temporarily require extra space, and flood-stage blocks are a hard stop.
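
As an illustration of the slow log item above, the sketch below builds a settings body for the query-phase slow log. The function name slowlog_settings_body, the index name my-index, and the threshold values are placeholders rather than recommendations; pick thresholds from your own latency baseline.

```shell
# Print a JSON settings body for the query-phase search slow log.
# $1: warn threshold, $2: info threshold (Elasticsearch time values such as 10s).
slowlog_settings_body() {
  printf '{"index.search.slowlog.threshold.query.warn":"%s","index.search.slowlog.threshold.query.info":"%s"}\n' "$1" "$2"
}
```

It could then be applied to a hypothetical index with curl -s -X PUT 'http://localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d "$(slowlog_settings_body 10s 5s)".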

How Netdata helps

Correlate JVM heap, old GC pauses, and circuit breaker trips to catch heap pressure before it causes shard failures.
Alert on search thread pool queue depth and rejections before queries fail.
Track disk usage against watermark thresholds to warn before flood-stage blocks.
Monitor unassigned shards and cluster health transitions alongside node departures and GC activity.
Surface search latency spikes against query rates to distinguish capacity exhaustion from a single bad query.

The Netdata solution

Elasticsearch monitoring with Netdata

Netdata monitors Elasticsearch with per-second metrics and ML anomaly detection. Correlate JVM heap pressure, shard counts, disk watermarks, mapping growth, and merge activity with cluster and node health in one view.

