Elasticsearch too many shards per node: overallocation and the heap tax
Searches slow from milliseconds to seconds. Nodes drop out after long GC pauses, triggering shard relocations that overload the remaining nodes. Cluster health stays green while _cat/nodes shows climbing heap and segments.memory growing steadily. The cause is usually shard overallocation: thousands of open shards with individual nodes carrying more than their heap can sustain.
Each shard is a self-contained Lucene index. Elasticsearch keeps per-segment metadata in heap memory, and that overhead is charged per field, not per segment. A node with 1,000 shards may be spending multiple gigabytes of heap on metadata alone, leaving less room for caches, aggregations, and in-flight requests.
Small shards are worse than large ones: they accumulate more segments per gigabyte of data, and because merges are less efficient on small indices, the segment count and per-field overhead remain high.
The default cluster.max_shards_per_node is 1,000 for normal data nodes. Hitting it blocks new index creation with an HTTP 400 error. However, the practical comfort threshold is much lower. Per-node counts above 800 are critical; below 200 is comfortable. Between 500 and 800, cluster state management slows, master nodes burn CPU on allocation decisions, and heap pressure becomes visible in old GC metrics.
flowchart TD
A[Too many shards per node] -- segment metadata --> B[Heap pressure rises]
B -- stop-the-world GC --> C[Long old-gen pauses]
C -- misses checks --> D[Node removed by master]
D -- triggers --> E[Shard reallocation]
E -- adds overhead --> F[Remaining nodes overloaded]
F -- feeds back --> ACommon causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Time-series indices with short retention and no ILM cleanup | Shard count grows daily; old indices remain open | _cat/indices?v&s=index:desc and ILM policy status |
| Over-sharded indices (many primaries for small data volumes) | Many tiny shards with low document count | _cat/indices?v&h=index,pri,rep,docs.count,store.size |
| High replica count on many small indices | Replica shards double the count without proportional value | _cat/shards?v&h=index,shard,prirep,state |
| Age-based ILM rollover without size checks | Low-document indices that still consume a full shard quota | _cat/indices?v&h=index,pri,rep,docs.count&s=docs.count:asc |
| Missing ILM delete or shrink phases | Indices accumulate in hot/warm forever | GET /*/_ilm/explain?only_errors=true&only_managed=true |
Quick checks
Run these safe, read-only commands to assess the scope of overallocation.
# Count shards per node (includes both primary and replica)
curl -s 'http://localhost:9200/_cat/shards?h=node,shard,prirep,index' | awk '{print $1}' | sort | uniq -c | sort -rn | head
# Check segment memory and total segments per node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.count,segments.memory,heap.percent'
# Verify the cluster-wide shard limit
curl -s 'http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node'
# List indices by primary shard count and store size
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size&s=pri:desc' | head -20
# Find low-document indices that may be empty or near-empty
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size&s=docs.count:asc' | head -20
# Identify the largest heap consumers on each node
curl -s 'http://localhost:9200/_nodes/stats/indices/segments,fielddata,query_cache,request_cache,completion'
# Check if unassigned shards are blocked by the limit
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'
How to diagnose it
Establish the per-node shard count. Use
_cat/shardsto count how many shards each node is actually hosting. Include both primary and replica shards. If any non-frozen data node is above 500, you are in the danger zone.Correlate shard count with heap usage. Pull
_cat/nodesforsegments.memoryandheap.percent. Ifsegments.memoryis growing linearly with shard count and heap is sustained above 75%, the shards are the likely heap consumer.Identify the indices driving the count. Use
_cat/indicessorted by primary shard count or by ascending document count. Look for indices with many primary shards but tiny store sizes (over-sharded), or large numbers of old time-series indices that should have been deleted.Check for low-document indices. Indices created by age-based rollover with little or no data still count toward the shard limit. Use
_cat/indicessorted bydocs.countto find them.Check ILM status. If ILM manages your time-series indices, run
_ilm/explainto see whether indices are stuck before the delete or shrink phase. A stuck ILM policy is a common hidden driver of shard accumulation.Confirm the cluster state is not the primary heap consumer. Large cluster states from mapping explosions can also spike heap. Use
_cluster/statsto check mapping field counts. If field count is under control and shards are high, overallocation is the primary cause.Check for allocation blocks. If the cluster has already hit
max_shards_per_node, new indices will be rejected. Use_cluster/allocation/explainon an unassigned shard to confirm whether the limit is the blocker.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Shards per node | Direct measure of overallocation burden | >500 sustained; approaching 800 critical |
segments.memory per node | Heap consumed by segment metadata | Growing linearly with shard count; >10% of heap |
| JVM heap used percent | Overall memory pressure on the node | Sustained >75%; old GC frequency increasing |
| Old GC collection time | Stop-the-world pauses that trigger node removal | Individual pauses >10s; frequency increasing |
| Cluster state field count | Rules out mapping explosion as the heap driver | Field count growing without bound |
| Unassigned shard count | Indicates the hard limit has been reached | New indices rejected; unassigned primaries |
Fixes
Close or delete obsolete indices
Deleting an index is the fastest way to reclaim shard quota and heap.
- Destructive: Verify the index name and ensure it is not needed for compliance or recovery before deleting.
- Check document count first:
_cat/indices?v&h=index,pri,rep,docs.count,store.size - Delete:
DELETE /<index> - For indices you must keep but do not search frequently:
POST /<index>/_close. Closed indices do not count towardmax_shards_per_node.
Reduce replica count on low-value indices
If you have many small indices with replicas you do not need for redundancy, lowering index.number_of_replicas removes one set of replica shards per index.
- Tradeoff: This reduces redundancy. Do not do this on critical data without accepting the risk.
- Example:
PUT /<index>/_settings {"index.number_of_replicas": 0}
Shrink existing indices
For indices that are no longer written to, use the Shrink API to reduce the primary shard count. This creates a new index with fewer shards and copies the data.
- Shrink requires the source index to be read-only and all primary shards on one node.
- The new index must have a factor of the original primary shard count (for example, 6 shards can shrink to 3, 2, or 1).
Reindex into fewer, larger shards
For indices that are still active or cannot be shrunk, reindexing into a new index with a lower number_of_shards is the cleanest fix.
- Shards are immutable after creation. You cannot resize a shard in place.
- Create the target index with the correct shard count.
- Use the Reindex API with
"source": {"index": "<old>"}and"dest": {"index": "<new>"}. - After reindexing, update aliases or write targets, then delete the old index.
Forcemerge read-only indices
For time-series indices that are no longer written, forcemerge to one segment reduces segment metadata heap and improves search speed.
- Warning: Only forcemerge indices that are fully written. Running forcemerge on a live-writing index consumes significant heap and I/O.
- Command:
POST /<index>/_forcemerge?max_num_segments=1
Tune ILM to prevent recurrence
- Use rollover based on
max_sizeormax_docsinstead ofmax_agealone to avoid creating low-value indices. - Ensure the delete phase is configured and functioning.
- Add a shrink action in the warm phase to consolidate shards on older indices.
Prevention
- Target <200 shards per data node. If you are routinely above 500, you need fewer indices, fewer primary shards per index, or more data nodes.
- Size primary shards to match data volume. Avoid creating five primary shards for an index that will never exceed a few gigabytes. One or two shards is often enough for small indices.
- Use ILM rollover with size or doc limits. Age-based rollover alone creates wasteful indices during low-traffic periods. Size-based rollover ensures each shard reaches a reasonable threshold before rolling over.
- Audit shard count before adding nodes. Adding nodes spreads shards but does not reduce per-shard overhead. If your indices are over-sharded, fix the index template first.
- Monitor
segments.memoryand shard count as first-class metrics. They are leading indicators; heap percentage is a lagging one.
How Netdata helps
- Per-node shard count: Chart shards per data node to spot imbalance before heap pressure appears.
- Segment memory correlation: Plot
segments.memoryagainst JVM heap to confirm heap growth is driven by segment metadata rather than caches or aggregations. - Old GC pause alerting: Alert on old GC duration spikes, which are often the first visible symptom of per-shard heap overhead crossing into the danger zone.
- Index growth tracking: Track index count growth against ILM execution to catch policies that are not cleaning up old shards.
Related guides
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor
- Elasticsearch monitoring checklist: the signals every production cluster needs
- Elasticsearch monitoring maturity model: from survival to expert
- Elasticsearch long GC pauses: old-generation stop-the-world and node drops
- Elasticsearch node OOM-killed: heap ceiling, page cache, and container limits
- Elasticsearch unassigned shards: reading allocation explain and fixing each reason
- How Elasticsearch actually works in production: a mental model for operators







