Elasticsearch this action would add too many shards: max_shards_per_node limit
Creating an index or rolling over a data stream returns HTTP 400 validation_exception: “this action would add [N] shards, but this cluster currently has [X]/[Y] maximum normal shards open”. The cluster has reached the ceiling set by cluster.max_shards_per_node, which defaults to 1000 open shards per non-frozen data node; [Y] is that value multiplied by the number of non-frozen data nodes. Raising the limit via _cluster/settings unblocks writes but only postpones the outage. The durable fix is consolidation.
What this means
Every shard is a Lucene index. Each consumes file descriptors, heap for segment metadata, and cluster state entries that the master publishes to every node on every change. cluster.max_shards_per_node guards against over-sharding, where excessive shard counts slow cluster state updates and pressure master and data node heap. The limit is validated against the cluster-wide total of open shards, not against any single node. Once that total is reached, the master rejects any request that would add shards. Existing indices remain searchable, but index creation, rollovers, snapshot restores, reopening closed indices, and replica increases are blocked.
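The arithmetic behind the error message can be sketched in a few lines of shell. This is an illustration only, assuming the check compares current open shards plus requested shards against cluster.max_shards_per_node multiplied by the number of non-frozen data nodes. The function names are hypothetical and not part of Elasticsearch.

```shell
# Hypothetical helpers illustrating the shard-limit arithmetic.

# shard_budget MAX_PER_NODE DATA_NODES
# Prints the cluster-wide ceiling on open shards ([Y] in the error message).
shard_budget() {
  echo $(( $1 * $2 ))
}

# would_exceed CURRENT_OPEN NEW_SHARDS MAX_PER_NODE DATA_NODES
# Succeeds (exit 0) when adding NEW_SHARDS would push the cluster past the ceiling.
would_exceed() {
  [ $(( $1 + $2 )) -gt "$(shard_budget "$3" "$4")" ]
}
```

With three data nodes at the default of 1000, `shard_budget 1000 3` prints 3000, and `would_exceed 2998 4 1000 3` succeeds because 3002 is over the ceiling.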
flowchart TD
A[Index creation fails with max_shards_per_node] --> B[Check active shards per data node]
B --> C{Nearing 1000 per node?}
C -->|Yes| D[Identify largest index consumers via _cat/indices]
C -->|No| E[Check _cluster/allocation/explain for other blocks]
D --> F{Are indices old or empty?}
F -->|Yes| G[Delete or close abandoned indices]
F -->|No| H{Can they be made read-only?}
H -->|Yes| I[Shrink to fewer primary shards]
H -->|No| J[Reindex into fewer shards or reduce replicas]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Time-series indices accumulating without deletion | Shard count grows linearly; old daily or weekly indices remain open | GET /_cat/indices?v&h=index,pri,rep,store.size&s=index:desc for old, small indices |
| Index templates defaulting to too many primary shards | Every new index creates multiple primaries regardless of data volume | Check the active template for the index pattern and its default number_of_shards |
| Excess replica counts for the current node count | Replicas multiply total shards without adding usable redundancy on small clusters | GET /_cluster/health?filter_path=active_shards,active_primary_shards |
| Abandoned empty or tiny indices | Many indices with near-zero documents still consuming shard slots | GET /_cat/indices?v&h=index,docs.count,store.size&s=store.size:desc |
Quick checks
Run these in sequence to triage:
# Cluster health and total shard counts
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,number_of_nodes,number_of_data_nodes,active_shards,active_primary_shards,unassigned_shards'
# Shards and their states
curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state'
# Indices by name, descending (date-suffixed indices list newest first), with shard counts and size
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size,pri.store.size&s=index:desc' | head -20
# Per-node shard allocation and disk usage
curl -s 'http://localhost:9200/_cat/allocation?v'
# Segment memory per node
curl -s 'http://localhost:9200/_nodes/stats/indices/segments?filter_path=nodes.*.name,nodes.*.indices.segments.memory_in_bytes'
# Master task backlog
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'
# Estimate cluster state complexity
curl -s 'http://localhost:9200/_cluster/stats?filter_path=indices.mappings.total_field_count'
How to diagnose it
- Confirm the breach. Run _cluster/health and divide active_shards by number_of_data_nodes. If the average is near 1000, the limit is the binding constraint. Use _cat/allocation to check for skew. The limit is checked against the cluster-wide total, so skew does not trigger the error on its own, but a node holding far more shards than its peers carries more of the heap cost.
- Find the fastest wins. Use _cat/indices sorted by age or size. Look for time-series prefixes with many small indices. Indices older than your retention requirement that are still open are immediate deletion candidates.
- Check for stuck ILM policies. If you use ILM, an index that should have been deleted or shrunk may be stalled. Run GET /<index>/_ilm/explain on the oldest managed indices. Look for an ERROR step or a stuck shrink action. Resolve the blocker, then call POST /<index>/_ilm/retry.
- Validate index template defaults. An outdated template may set a high number_of_shards for every new data stream. Review the template matching the failing index pattern and ensure the primary count aligns with actual data volume. Use GET /_index_template/<name> for composable templates or GET /_template/<name> for legacy templates.
- Assess heap and cluster state impact. Check _nodes/stats/indices/segments for memory per node. If segment memory rises with shard count, over-sharding is already pressuring heap. Check _cluster/stats for total field count; if it is also elevated, the cluster state is bloated.
- Check allocation explain for blocked shards. If some shards are unassigned, run _cluster/allocation/explain to see whether a separate allocation block, such as a disk watermark, is compounding the problem.
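The first diagnostic step can be sketched as a small script. It is a rough check, assuming the default limit of 1000 unless you pass another value, and it reads plain-text columns from _cat/health and _cat/allocation so that no JSON parser is needed. The function names are hypothetical.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# percent_of_limit ACTIVE_SHARDS DATA_NODES MAX_PER_NODE
# Prints the share of the cluster-wide ceiling in use, as an integer percent.
percent_of_limit() {
  echo $(( $1 * 100 / ($2 * $3) ))
}

# check_shard_usage [MAX_PER_NODE]
# Reads active shards and data node count from _cat/health and reports usage.
# Active shards exclude unassigned shards, so this can slightly undercount.
check_shard_usage() {
  local limit="${1:-1000}"
  curl -s "$ES_URL/_cat/health?h=shards,node.data" | {
    read -r shards nodes
    echo "$shards shards on $nodes data nodes: $(percent_of_limit "$shards" "$nodes" "$limit")% of limit"
  }
}

# shards_per_node: per-node shard counts, highest first, to spot skew.
shards_per_node() {
  curl -s "$ES_URL/_cat/allocation?h=shards,node&s=shards:desc"
}
```

Run `check_shard_usage` for the headline number, then `shards_per_node` to see whether a few nodes carry a disproportionate share.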
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Active shards per data node | Shows how close the cluster is to the hard limit | Sustained >800 per node against a 1000 default |
| Cluster state field count | Many shards usually mean many indices and mappings, which bloat state | indices.mappings.total_field_count growing without bound |
| Segment memory per node | Each open shard carries segment metadata overhead in heap | segments.memory_in_bytes rising in lockstep with shard count |
| Pending cluster tasks | A large cluster state slows the master’s ability to publish changes | >20 tasks or any task older than 30 seconds |
| JVM heap used percent | Shard metadata accumulates in heap; sustained pressure leads to a GC death spiral | Sustained >75% with a rising post-GC floor |
Fixes
Delete or close abandoned indices
The fastest recovery is removing data you no longer need. Identify old, empty, or superseded indices via _cat/indices and delete them. Deletion frees shard slots, disk, and cluster state immediately. Closing an index removes its active shards from the cluster, though the metadata remains in state. Prefer deletion for true orphans.
WARNING: DELETE /<index> is destructive and cannot be undone without a snapshot. Verify the index name and retention policy before executing.
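Before deleting anything, it can help to produce a candidate list first. The sketch below is a dry run under simple assumptions: it treats any non-system index with zero documents as a candidate and deletes nothing. The function names are hypothetical.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# filter_empty: reads "index docs.count" lines on stdin and prints the names of
# non-system indices (no leading dot) whose document count is zero.
# Lines without a count (for example closed indices) are skipped.
filter_empty() {
  awk 'NF >= 2 && $2 == 0 && $1 !~ /^\./ { print $1 }'
}

# list_empty_indices: dry run only; prints candidates and deletes nothing.
list_empty_indices() {
  curl -s "$ES_URL/_cat/indices?h=index,docs.count&s=index" | filter_empty
}
```

Review the output by hand. An empty index may be the current write index of a data stream or a freshly rolled-over index, and those should stay.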
Reduce replica counts
If indices carry more replicas than needed for the current node count, lower number_of_replicas. Each replica removed drops one full copy of every primary: going from 2 replicas to 1 cuts an index's shard count by a third, and going from 1 to 0 halves it. The tradeoff is reduced redundancy and potentially lower search throughput. Do not reduce replicas on critical indices during an active node outage.
curl -X PUT 'http://localhost:9200/<index>/_settings' -H 'Content-Type: application/json' -d '{
"index": { "number_of_replicas": 1 }
}'
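To estimate how many shard slots a replica change frees, the arithmetic is primaries multiplied by one plus the replica count. A minimal sketch with hypothetical helper names:

```shell
# total_shards PRIMARIES REPLICAS
# Prints the shards an index occupies: each primary plus its replica copies.
total_shards() {
  echo $(( $1 * (1 + $2) ))
}

# shards_freed PRIMARIES OLD_REPLICAS NEW_REPLICAS
# Prints how many shard slots are released by lowering the replica count.
shards_freed() {
  echo $(( $(total_shards "$1" "$2") - $(total_shards "$1" "$3") ))
}
```

An index with 5 primaries and 2 replicas occupies 15 shards; `shards_freed 5 2 1` prints 5.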
Shrink read-only indices
For indices that are no longer written to, use the Shrink API to reduce the primary shard count. You must first set index.blocks.write=true and relocate a copy of every shard to a single node. The target shard count must be a factor of the original; 12 primaries can shrink to 6, 4, 3, 2, or 1. The tradeoff is temporary disk space for the new index and a brief maintenance window. After shrinking, delete the source index to reclaim its shard slots.
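The shrink steps can be sketched as follows. This is a minimal outline, not a hardened procedure: the function names, the node name argument, and the target index name are placeholders you supply, and you should wait for relocation to finish (cluster health green for the source index) between the two calls.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# valid_shrink_targets N: prints every factor of N smaller than N, largest first.
valid_shrink_targets() {
  local n="$1" i out=""
  i=$(( n - 1 ))
  while [ "$i" -ge 1 ]; do
    if [ $(( n % i )) -eq 0 ]; then
      out="$out $i"
    fi
    i=$(( i - 1 ))
  done
  echo $out
}

# prepare_shrink INDEX NODE_NAME
# Blocks writes and pulls a copy of every shard onto one node.
prepare_shrink() {
  curl -s -X PUT "$ES_URL/$1/_settings" -H 'Content-Type: application/json' -d "{
    \"settings\": {
      \"index.routing.allocation.require._name\": \"$2\",
      \"index.blocks.write\": true
    }
  }"
}

# run_shrink SOURCE TARGET SHARDS
# Creates TARGET with SHARDS primaries and clears the settings copied from SOURCE.
run_shrink() {
  curl -s -X POST "$ES_URL/$1/_shrink/$2" -H 'Content-Type: application/json' -d "{
    \"settings\": {
      \"index.number_of_shards\": $3,
      \"index.routing.allocation.require._name\": null,
      \"index.blocks.write\": null
    }
  }"
}
```

`valid_shrink_targets 12` prints `6 4 3 2 1`, matching the factors listed above.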
Reindex active indices into fewer shards
For indices still receiving writes, create a new index with fewer primary shards and use the Reindex API to copy data. Set refresh_interval=-1 and number_of_replicas=0 on the destination during the copy to reduce overhead, then restore them after cutover. Switch aliases or data streams to the new index once caught up. The tradeoff is duplicated disk usage and additional I/O. Run this during low-traffic hours.
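A minimal sketch of the reindex sequence for a plain index (data streams need additional handling that is not shown). The helper names are hypothetical, and the shard and replica counts are arguments you choose; the request bodies are built by small functions so they can be inspected before anything is sent.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# dest_settings_body SHARDS: destination index tuned for bulk copy.
dest_settings_body() {
  printf '{"settings":{"index":{"number_of_shards":%s,"number_of_replicas":0,"refresh_interval":"-1"}}}\n' "$1"
}

# reindex_body SRC DEST: copy every document from SRC into DEST.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# restore_settings_body REPLICAS: restore replicas and the default refresh interval.
restore_settings_body() {
  printf '{"index":{"number_of_replicas":%s,"refresh_interval":null}}\n' "$1"
}

# reindex_into_fewer_shards SRC DEST SHARDS REPLICAS
reindex_into_fewer_shards() {
  curl -s -X PUT "$ES_URL/$2" -H 'Content-Type: application/json' \
    -d "$(dest_settings_body "$3")"
  curl -s -X POST "$ES_URL/_reindex?wait_for_completion=true" \
    -H 'Content-Type: application/json' -d "$(reindex_body "$1" "$2")"
  curl -s -X PUT "$ES_URL/$2/_settings" -H 'Content-Type: application/json' \
    -d "$(restore_settings_body "$4")"
}
```

For large indices, consider running the reindex as a background task rather than waiting for completion, and verify document counts before switching aliases.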
Fix ILM retention
If ILM is supposed to manage index lifecycle but has stalled, the root cause is often a missing rollover alias, insufficient disk space for a shrink step, or a policy conflict. Resolve the specific error, then call POST /<index>/_ilm/retry on stuck indices. Long-term, ensure your ILM delete phase aligns with actual retention needs.
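To find stalled indices without checking them one by one, the ILM explain API accepts an only_errors filter. A short sketch with hypothetical wrapper names:

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# ilm_errors_url [PATTERN]
# Prints the explain URL restricted to ILM-managed indices in an error state.
# PATTERN defaults to all indices.
ilm_errors_url() {
  echo "$ES_URL/${1:-*}/_ilm/explain?only_errors=true"
}

# ilm_errors [PATTERN]: fetch the failing step and reason for each stuck index.
ilm_errors() {
  curl -s "$(ilm_errors_url "$1")"
}

# ilm_retry INDEX: re-run the failed step after the blocker is resolved.
ilm_retry() {
  curl -s -X POST "$ES_URL/$1/_ilm/retry"
}
```

Read the step_info field in the explain output before retrying; retrying without fixing the cause returns the index to the same error.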
Temporarily raise the limit (emergency only)
You can raise cluster.max_shards_per_node dynamically to unblock writes while you consolidate. Treat this as a stopgap, not a fix. The cluster will continue to degrade from metadata overhead, and you will hit the new ceiling again. Raise it only to buy minutes, not days.
WARNING: This masks the root cause. Use only to prevent a complete write outage while you delete or shrink indices.
curl -X PUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
"persistent": { "cluster.max_shards_per_node": 2000 }
}'
Prevention
Monitor shards per node as a first-class capacity metric alongside disk and heap. Review index templates quarterly to ensure default primary shard counts match expected data volume. Consolidate time-series data into fewer, larger indices instead of many small ones. Let ILM delete or shrink indices on schedule, and alert when ILM transitions fail. On clusters with dedicated master nodes, watch pending tasks and cluster state size; they are early indicators that shard accumulation is becoming a coordination problem.
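As one illustration of scheduled retention, the sketch below builds an ILM policy that rolls over by size or age and deletes old indices. The thresholds used in the example call (50gb, 30d, 90d) and the policy name are placeholder assumptions, not recommendations from this guide, and max_primary_shard_size assumes a reasonably recent Elasticsearch version.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# ilm_policy_body ROLLOVER_SIZE ROLLOVER_AGE DELETE_AFTER
# Prints a policy that rolls over in the hot phase and deletes after DELETE_AFTER.
ilm_policy_body() {
  printf '{"policy":{"phases":{"hot":{"actions":{"rollover":{"max_primary_shard_size":"%s","max_age":"%s"}}},"delete":{"min_age":"%s","actions":{"delete":{}}}}}}\n' "$1" "$2" "$3"
}

# put_ilm_policy NAME ROLLOVER_SIZE ROLLOVER_AGE DELETE_AFTER
put_ilm_policy() {
  curl -s -X PUT "$ES_URL/_ilm/policy/$1" -H 'Content-Type: application/json' \
    -d "$(ilm_policy_body "$2" "$3" "$4")"
}

# Example (placeholder values): put_ilm_policy logs-retention 50gb 30d 90d
```

Rolling over on size rather than on a fixed daily schedule is what keeps low-volume streams from producing many tiny indices.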
How Netdata helps
- Correlate the shard limit breach with cluster health state, JVM heap, and segment memory. Netdata collects these out of the box.
- Per-node charts for segment memory and heap percent show which nodes are paying metadata overhead before index creation is blocked.
- Alerts on pending task backlog and thread pool rejections fire while the cluster is still functional, giving you runway to delete or shrink indices instead of reacting to HTTP 400 errors.
- Per-node disk and shard counts expose uneven distribution that concentrates shard overhead on a few nodes.
Related guides
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor
- Elasticsearch monitoring checklist: the signals every production cluster needs
- Elasticsearch monitoring maturity model: from survival to expert
- Elasticsearch long GC pauses: old-generation stop-the-world and node drops
- Elasticsearch node OOM-killed: heap ceiling, page cache, and container limits
- Elasticsearch unassigned shards: reading allocation explain and fixing each reason
- How Elasticsearch actually works in production: a mental model for operators







