Elasticsearch all shards failed: diagnosing search_phase_execution_exception
You run a search and Elasticsearch returns search_phase_execution_exception with reason all shards failed. Every shard copy involved in the query returned a failure to the coordinating node. The outer error is a container; the actual root cause lives in the per-shard failure reasons inside the response body. Do not assume the cluster is down. This error fires on clusters with green health and stable nodes when a query is malformed, a mapping is incompatible, or a resource limit is breached uniformly across every target shard.
The read path uses a two-phase scatter-gather. The coordinating node broadcasts the query to one copy of each relevant shard. Each shard executes the query locally and returns document IDs and sort values. When every shard copy fails, the coordinating node has no partial result to merge and throws the composite exception. Determine whether the failure is data unavailability, a query error, or resource exhaustion.
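Because the diagnosis lives in the per-shard reasons, a quick first pass is to list which exception types appear in the error body. The sketch below is text-based and assumes only grep, sed, sort, and uniq are available; the function name shard_failure_types and the index name my-index in the usage comment are hypothetical. Elasticsearch can group identical shard failures in the response, so treat the counts as occurrences in the body, not as shard counts.

```shell
# Tally the exception types found in an Elasticsearch error body on stdin.
# Text-based sketch: it matches every "type" key, so the outer
# search_phase_execution_exception wrapper is counted as well.
shard_failure_types() {
  grep -o '"type" *: *"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort | uniq -c | sort -rn
}

# Hypothetical usage against a local node:
#   curl -s 'http://localhost:9200/my-index/_search?q=*' | shard_failure_types
```

If a single type other than the wrapper dominates the output, that type usually tells you which branch of the diagnosis to follow.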
What this means
- Cluster health green does not prevent this error. Green only means all primaries and replicas are assigned. It says nothing about query correctness or node resource headroom. A malformed query against a healthy cluster produces all shards failed.
- The error is a composite signal. It can represent unassigned primaries (cluster red), a bad query that fails identically on every shard, a mapping mismatch, or a uniform resource limit breach such as circuit breakers, thread pool rejections, or disk flood-stage blocks.
- The per-shard reason is the diagnosis. If every shard reports the same mapper_parsing_exception or illegal_argument_exception, the problem is the query. If the reason references an unassigned shard or a tripped circuit breaker, the problem is infrastructure.
flowchart TD
A[search_phase_execution_exception all shards failed] --> B{Read per-shard failure reasons}
B -->|mapper_parsing_exception or query error| C[Fix query / mapping]
B -->|es_rejected_execution_exception| D[Check thread pool saturation]
B -->|circuit_breaking_exception| E[Check heap and circuit breakers]
B -->|unassigned or node_left| F[Check cluster health and allocation explain]
B -->|read_only_allow_delete| G[Check disk watermark and clear blocks]
C --> H[Cancel bad tasks and reissue]
D --> I[Reduce concurrency or scale out]
E --> J[Reduce aggregation cardinality / add heap]
F --> K[Retry failed or restore nodes]
G --> L[Free disk and remove blocks]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Unassigned or initializing primary shards | Cluster health red; affected indices return complete query failure | GET /_cluster/health and GET /_cluster/allocation/explain |
| Query or mapping error hitting every shard identically | Cluster health green; error text references unknown fields, missing .keyword sub-field, or script errors | The per-shard reason block in the error response |
| Circuit breaker tripped (request or parent) | Heap pressure sustained above 85 percent; heavy aggregations fail uniformly | GET /_nodes/stats/breaker for tripped counters and estimated_size_in_bytes |
| Search thread pool saturation | Query load spikes; rejections inside shard failures | GET /_cat/thread_pool/search?v&h=node_name,active,queue,rejected |
| Disk flood-stage watermark / read-only block | Disk above 95 percent; indexing also failing | GET /_cat/allocation?v and index settings for index.blocks.read_only_allow_delete |
| High-cardinality aggregation exceeding memory limits | Deep terms aggregations fail across all shards; request breaker may trip | Slow log and GET /_nodes/stats/breaker |
Quick checks
# Check cluster health and unassigned shards
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,unassigned_shards,number_of_nodes'
# Check search thread pool saturation
curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected'
# Check circuit breaker trips and estimated sizes
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers.parent,nodes.*.breakers.request'
# Check disk watermark proximity and blocks
curl -s 'http://localhost:9200/_cat/allocation?v&h=node,disk.percent,disk.used,disk.total'
# Check JVM heap pressure on data nodes
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors.old'
# Check for active read-only blocks on indices
curl -s 'http://localhost:9200/_all/_settings?filter_path=*.settings.index.blocks.read_only_allow_delete'
How to diagnose it
- Read the shard-level failure reasons. Each failed shard includes a reason object with type and reason fields. If every shard reports the same parsing or scripting error, the problem is the query.
- Check cluster health. Run GET /_cluster/health. If the status is red, unassigned primaries are the cause. Use GET /_cluster/allocation/explain to find the allocation block. See Elasticsearch cluster health red: unassigned primaries and how to recover and Elasticsearch unassigned shards: reading allocation explain and fixing each reason.
- Check for circuit breaker trips. Run GET /_nodes/stats/breaker. If parent.tripped or request.tripped is increasing, the query is consuming too much heap. Look for heavy aggregations, high-cardinality terms, or loading fielddata on text fields. See Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix and Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes.
- Check search thread pool saturation. Run GET /_cat/thread_pool/search. If rejected is increasing, the search queue is full. This surfaces as es_rejected_execution_exception inside shard failures. Reduce query concurrency or add data nodes.
- Check disk watermark and index blocks. Run GET /_cat/allocation. If any node is above 95 percent disk usage, Elasticsearch sets index.blocks.read_only_allow_delete. Free disk space and clear the block. See Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks.
- Check the slow log for expensive queries. If shard failures correlate with a specific query pattern, the slow log reveals the aggregation or script that breached limits. Cancel the task if it is still running via POST /_tasks/{task_id}/_cancel.
- Check for node departures mid-query. If shard failures reference a node that left the cluster, correlate with GET /_cat/nodes and GC logs. Long GC pauses above 10 seconds cause fault detection timeouts and node removal. See Elasticsearch long GC pauses: old-generation stop-the-world and node drops.
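The rejected counter from the thread pool check is cumulative since node start, so a single reading cannot tell you whether it is increasing. One way to see growth is to diff two snapshots. This is a minimal sketch assuming awk and node names without spaces; the function name search_rejected_delta and the snapshot file paths are hypothetical.

```shell
# Print per-node growth in the cumulative search "rejected" counter between
# two snapshots of: _cat/thread_pool/search?h=node_name,active,queue,rejected
# (no "v" parameter, so there is no header row).
# $1 = earlier snapshot file, $2 = later snapshot file.
search_rejected_delta() {
  awk 'NR == FNR { before[$1] = $4; next }
       ($1 in before) && ($4 - before[$1] > 0) { print $1, $4 - before[$1] }' "$1" "$2"
}

# Hypothetical usage: two snapshots taken 60 seconds apart.
#   url='http://localhost:9200/_cat/thread_pool/search?h=node_name,active,queue,rejected'
#   curl -s "$url" > /tmp/tp.before; sleep 60; curl -s "$url" > /tmp/tp.after
#   search_rejected_delta /tmp/tp.before /tmp/tp.after
```

Nodes that restarted between snapshots are not handled specially: their counter resets, so they simply produce no output.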
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Cluster health status | Unassigned primaries make data unavailable for queries | Red sustained for more than 2 minutes |
| Search thread pool rejections | Direct cause of shard-level query rejections under load | Sustained rate above 0 per minute for more than 5 minutes |
| Circuit breaker parent or request trips | Memory limits prevent query execution or aggregation completion | Any delta greater than 0 per interval |
| JVM heap used percent | Heap pressure triggers breakers and long GC pauses | Sustained above 85 percent |
| Disk usage vs watermarks | Flood stage blocks writes and can cause query failures | Above 95 percent or read_only_allow_delete block present |
| Unassigned shard count | Missing primaries guarantee all shards failed on affected indices | Any unassigned primary |
| Search latency (query phase) | Slow shards drag down the scatter-gather response | Sustained above 5 times baseline |
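Several warning signs in the table are about sustained values, which one sample cannot show. As an illustration for heap, the sketch below reports nodes that stayed at or above a limit in every sample collected. It assumes awk and sort, and input rows of the form "name heap.percent" from repeated calls to _cat/nodes?h=name,heap.percent; the function name heap_sustained_over is hypothetical, and the default of 85 is taken from the table above.

```shell
# Print nodes whose heap.percent was at or above the limit in EVERY sample.
# stdin: concatenated "name heap.percent" rows, one row per node per sample.
# $1 = limit in percent (defaults to 85).
heap_sustained_over() {
  awk -v limit="${1:-85}" '
    { seen[$1]++ }                          # samples seen per node
    $2 + 0 >= limit + 0 { hot[$1]++ }       # samples at or above the limit
    END { for (n in seen) if (hot[n] == seen[n]) print n }
  ' | sort
}

# Hypothetical usage: five samples, 60 seconds apart.
#   for i in 1 2 3 4 5; do
#     curl -s 'http://localhost:9200/_cat/nodes?h=name,heap.percent'; sleep 60
#   done | heap_sustained_over 85
```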
Fixes
Unassigned primaries. Use GET /_cluster/allocation/explain to find the specific blocker. If shards are stuck in ALLOCATION_FAILED after max retries, run POST /_cluster/reroute?retry_failed=true. If disk watermarks are the cause, free space or add nodes.
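Before calling allocation explain, it can help to list exactly which primaries are unassigned and the reason Elasticsearch recorded for each. This is a minimal sketch assuming awk and the column order shown in the comment; the function name unassigned_primaries and the index name logs are hypothetical.

```shell
# List unassigned primary shards as "index shard reason".
# stdin: rows from _cat/shards?h=index,shard,prirep,state,unassigned.reason
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2, ($5 == "" ? "-" : $5) }'
}

# Hypothetical usage, followed by an explain request for one listed shard:
#   curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | unassigned_primaries
#   curl -s -H 'Content-Type: application/json' \
#     'http://localhost:9200/_cluster/allocation/explain' \
#     -d '{"index":"logs","shard":1,"primary":true}'
```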
Query or mapping errors. Fix the query client-side. Use .keyword sub-fields for aggregations and sorting instead of analyzed text fields. Verify that field names in the query match the current mapping. If a bad query is still running, identify it via GET /_tasks?detailed=true&actions=*search* and cancel it.
Circuit breaker trips. Reduce aggregation cardinality by lowering the size parameter or adding pre-filters. If fielddata is the culprit, reindex with a keyword multi-field. Do not raise breaker limits to mask the problem; this risks OOM. See Elasticsearch heap pressure death spiral: GC, node removal, and the cascade.
Thread pool saturation. Add data nodes or reduce concurrent search load. Do not increase the search queue size blindly; larger queues increase memory pressure and delay rejection without fixing throughput.
Disk watermark / read-only blocks. Delete old indices or reduce replica count to free space. After freeing disk, remove the block with PUT /_all/_settings {"index.blocks.read_only_allow_delete": null}. In 7.4 and later, the block is removed automatically once disk usage falls below the high watermark, but only if space was actually freed.
Prevention
- Monitor leading indicators, not just cluster health. Green status does not mean queries will succeed. Track JVM heap floor, search thread pool queue depth, and disk growth rate.
- Enforce query review. Catch expensive aggregations and text-field sorts in development.
- Cap shard count. Too many shards amplify the blast radius of any query error and increase heap pressure from segment metadata. Use ILM to roll over and delete on schedule.
- Set slow log thresholds. Configure index.search.slowlog.threshold.query.warn to catch pathological queries before they trigger breakers.
- Maintain disk headroom. Keep data nodes below 70 percent disk usage. Merges temporarily require extra space, and flood-stage blocks are a hard stop.
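The disk headroom guideline can be checked from the same _cat/allocation output used in the quick checks. The sketch below assumes awk and input rows of the form "node disk.percent"; the function name disk_headroom and the labels it prints are hypothetical, and the 70 and 95 percent thresholds are the ones used in this guide, not values read from the cluster settings.

```shell
# Label nodes by disk usage using this guide's thresholds.
# stdin: rows from _cat/allocation?h=node,disk.percent
# Rows without a disk value (such as the UNASSIGNED row) are skipped.
disk_headroom() {
  awk '$2 == "" { next }
       $2 + 0 >= 95 { print $1, $2, "flood-stage"; next }
       $2 + 0 >= 70 { print $1, $2, "low-headroom" }'
}

# Hypothetical usage:
#   curl -s 'http://localhost:9200/_cat/allocation?h=node,disk.percent' | disk_headroom
```

If your cluster overrides the default watermarks, adjust the two numbers to match.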
How Netdata helps
- Correlate JVM heap, old GC pauses, and circuit breaker trips to catch heap pressure before it causes shard failures.
- Alert on search thread pool queue depth and rejections before queries fail.
- Track disk usage against watermark thresholds to warn before flood-stage blocks.
- Monitor unassigned shards and cluster health transitions alongside node departures and GC activity.
- Surface search latency spikes against query rates to distinguish capacity exhaustion from a single bad query.
Related guides
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor
- Elasticsearch monitoring checklist: the signals every production cluster needs
- Elasticsearch monitoring maturity model: from survival to expert
- Elasticsearch long GC pauses: old-generation stop-the-world and node drops
- Elasticsearch node OOM-killed: heap ceiling, page cache, and container limits
- Elasticsearch unassigned shards: reading allocation explain and fixing each reason
- How Elasticsearch actually works in production: a mental model for operators







