Elasticsearch cluster state too large: field count, index count, and per-node heap
Every node holds a copy of the cluster state in heap. When it grows large, the cost is paid everywhere: 200 MB of state consumes 200 MB on every node, and the elected master burns additional CPU and heap serializing and publishing updates. Symptoms show up indirectly: the master feels sluggish, pending tasks queue for minutes, heap pressure climbs on nodes that should be idle, and master elections stall indexing and shard allocation. The usual drivers are too many indices from per-minute or per-hour time-series patterns; a mapping explosion from uncontrolled dynamic fields; excessive aliases; or churn from frequent template and setting changes. Raw size measured via /_cluster/state is a rough proxy. The indicators that matter are field-count growth, cluster state version churn, pending-task age, and master node heap pressure.
What this means
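The cluster state describes every index, shard, mapping, alias, ingest pipeline, and node. The elected master maintains it and publishes every change to every node, normally as a diff against the previous version. The master waits for nodes to acknowledge each update (or for the publication to time out) before it starts the next one, so a single slow node can hold up every change queued behind it.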
A large state hurts in two ways. First, every node holds the full serialized state in heap. A 200 MB state burns 200 MB on every node, including dedicated masters and idle data nodes. Second, the master must serialize the state and publish it on every update. As the state grows, serialization and network transfer time grow with it. The master falls behind, pending tasks queue, and cluster operations that should take milliseconds start taking seconds. If the backlog grows too large, nodes miss updates and force unnecessary master elections. The state does not need to reach a hard cap to cause damage; it only needs to be large enough that the master cannot keep up with the rate of change.
```mermaid
flowchart TD
    A[Mapping explosion or excessive indices and aliases] --> B[Cluster state grows]
    B --> C[Heap consumed on every node]
    B --> D[Master serialization cost rises]
    D --> E[Pending tasks queue]
    E --> F[Publication slows or times out]
    F --> G[Master instability or node eviction]
    C --> G
```
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Mapping explosion from dynamic fields | Heap rising on all nodes; put-mapping tasks dominating the pending queue; queries against affected indices slow or fail | GET /_cluster/stats?filter_path=indices.mappings for total field count, and per-index mappings for runaway objects |
| Excessive index count (per-minute or per-hour time-series) | Cluster state version increments rapidly; master CPU and heap elevated; index creation latency spikes | GET /_cluster/state?filter_path=version sampled over time, and index count via /_cat/indices |
| Alias and template bloat | State size grows even though document volume is stable; administrative operations slow without obvious mapping or indexing pressure | GET /_cat/aliases?v and review of registered templates via the cluster state or index template APIs |
| Frequent pipeline, script, or setting changes | Version churn with few new indices; master logs show constant cluster state updates; brief write stalls correlated with template updates | GET /_cluster/pending_tasks and review of recent cluster settings changes |
Quick checks
```bash
# Check cluster state version (sample twice, 60s apart, to compute churn)
curl -s 'http://localhost:9200/_cluster/state?filter_path=version'

# Check total field count as a proxy for mapping complexity
curl -s 'http://localhost:9200/_cluster/stats?filter_path=indices.mappings'

# Check pending tasks and their age
curl -s 'http://localhost:9200/_cat/pending_tasks?v'

# Check master node identity; flapping masters show instability
curl -s 'http://localhost:9200/_cat/master?v'

# Rough estimate of serialized state size (can be very large; do not poll)
curl -s 'http://localhost:9200/_cluster/state' | wc -c

# Check master-eligible node heap and GC behavior
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem,nodes.*.jvm.gc'
```
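The first check asks for two samples taken 60 seconds apart. A minimal Python sketch of that calculation follows; the function names are hypothetical, and `fetch_state_version` assumes an unauthenticated node at `http://localhost:9200`, as in the curl examples.

```python
import json
import urllib.request


def churn_per_second(first_version: int, second_version: int, interval_seconds: float) -> float:
    """Return cluster state version increments per second between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (second_version - first_version) / interval_seconds


def fetch_state_version(base_url: str = "http://localhost:9200") -> int:
    """Read the current version with the same request as the curl check.

    Assumption: no authentication or TLS; adjust for a secured cluster.
    """
    url = f"{base_url}/_cluster/state?filter_path=version"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["version"]


# Worked example with made-up samples: 180 increments over 60 seconds.
print(churn_per_second(1000, 1180, 60))  # 3.0
```

To use it against a live cluster, call `fetch_state_version()` twice with a sleep in between and pass the two values and the elapsed seconds to `churn_per_second`.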
How to diagnose it
- Correlate master node heap with cluster state changes. If master heap climbs while traffic is flat, the state itself is likely the consumer.
- Measure field count growth. Use `GET /_cluster/stats?filter_path=indices.mappings` to establish a baseline and re-check daily. Unbounded growth indicates a mapping explosion.
- Measure version churn. Sample `/_cluster/state?filter_path=version` at 60-second intervals. Sustained increments above a few per second mean excessive metadata updates.
- Inspect pending tasks. `GET /_cat/pending_tasks` should normally show tasks aging under one second. `put-mapping` or `create-index` tasks pending longer than 30 seconds confirm the master is falling behind.
- Identify the index or template responsible. If field count is high, check indices with the most fields using `/_cat/indices` or the mappings API. If index count is high, look for time-based patterns creating indices too frequently.
- Check for alias and template overhead. Count aliases with `/_cat/aliases` and audit templates using the cluster state or index template APIs. Even without documents, these contribute to state size.
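To find the index responsible for a high field count, one option is to walk the response of the mappings API (`GET /_mapping`) and count fields per index. The sketch below is an approximation: it counts every entry under `properties` and every multi-field under `fields`, which may not match Elasticsearch's own accounting for `index.mapping.total_fields.limit` exactly. The function names and the sample data are hypothetical.

```python
def count_fields(mapping: dict) -> int:
    """Approximate the number of mapped fields in one mapping (or sub-object)."""
    total = 0
    for definition in mapping.get("properties", {}).values():
        total += 1                                   # the field or object itself
        total += count_fields(definition)            # sub-fields of object fields
        total += len(definition.get("fields", {}))   # multi-fields such as .keyword
    return total


def rank_indices_by_fields(mapping_response: dict) -> list:
    """Sort the indices of a GET /_mapping response by descending field count."""
    counts = {
        index: count_fields(body.get("mappings", {}))
        for index, body in mapping_response.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Made-up response in the shape returned by GET /_mapping.
sample = {
    "metrics": {"mappings": {"properties": {"value": {"type": "double"}}}},
    "logs-app": {"mappings": {"properties": {
        "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "user": {"properties": {"id": {"type": "keyword"}, "name": {"type": "text"}}},
    }}},
}

print(rank_indices_by_fields(sample))  # [('logs-app', 5), ('metrics', 1)]
```

Indices at the top of the ranking are the first candidates to inspect for runaway dynamic objects.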
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Field count growth | Direct indicator of mapping explosion | Growing without bound, or indices approaching index.mapping.total_fields.limit (default 1000) |
| Cluster state version churn rate | Measures how fast the state is mutating | Sustained increments above a few per second |
| Pending task age | Master cannot keep up with state changes | Any URGENT task older than 30 seconds, or backlog above 20 tasks |
| Master node heap used percent | Master burns heap on serialization and publication | Sustained above 75 percent, or floor rising between collections |
| Post-GC heap floor (all nodes) | Cluster state is a long-lived object in old gen | Minimum heap after old GC trending upward over days |
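The pending-task thresholds in the table can be checked mechanically against the JSON returned by `GET /_cluster/pending_tasks`. This is a sketch rather than a finished alert rule: the function name is hypothetical, the defaults mirror the table (30 seconds, 20 tasks), and the sample response is made up.

```python
def pending_task_warnings(response: dict, max_urgent_age_ms: int = 30_000,
                          max_backlog: int = 20) -> list:
    """Return warnings for a GET /_cluster/pending_tasks response."""
    tasks = response.get("tasks", [])
    warnings = []
    if len(tasks) > max_backlog:
        warnings.append(f"backlog of {len(tasks)} tasks exceeds {max_backlog}")
    for task in tasks:
        age_ms = task.get("time_in_queue_millis", 0)
        source = task.get("source", "unknown")
        if task.get("priority") == "URGENT" and age_ms > max_urgent_age_ms:
            warnings.append(f"URGENT task '{source}' queued for {age_ms / 1000:.0f}s")
    return warnings


# Made-up response: one stale URGENT task and one HIGH task that is ignored.
sample_response = {"tasks": [
    {"insert_order": 101, "priority": "URGENT", "source": "put-mapping",
     "time_in_queue_millis": 45000},
    {"insert_order": 102, "priority": "HIGH", "source": "refresh-mapping",
     "time_in_queue_millis": 60000},
]}

print(pending_task_warnings(sample_response))
```

Only the URGENT task is reported here, because the table's age threshold applies to URGENT tasks and the backlog is below 20.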
Fixes
Cap and clean up mappings. Set index.mapping.total_fields.limit and index.mapping.depth.limit in index templates before creation. The defaults are 1000 and 20, respectively. If an index has already exceeded its limit, identify whether the fields are coming from unstructured JSON, nested objects, or multi-fields. Switch to explicit mappings and disable dynamic mapping where possible; test strict mappings on new indices first, because disabling dynamic mapping on existing indices can reject documents. For indices that have already exploded, deleting the index immediately frees heap on every node. If you must keep the data, reindex into a clean index with a strict mapping. Reindexing generates load and takes time, so schedule it during low traffic. Do not simply raise the limit without fixing the source; that only delays the failure and makes future cleanup harder.
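As an illustration of setting the limits at the template level, the following Python builds a request body for the composable index template API (`PUT /_index_template/<name>`). It is a sketch under assumptions: the helper name `capped_template`, the pattern `logs-app-*`, and the two example fields are hypothetical, and the limits are the defaults quoted above. With `dynamic` set to `strict`, documents containing unmapped fields are rejected, so try it on new indices first.

```python
import json


def capped_template(index_patterns, total_fields_limit=1000, depth_limit=20):
    """Build an index template body with explicit limits and a strict mapping."""
    return {
        "index_patterns": list(index_patterns),
        "template": {
            "settings": {
                "index.mapping.total_fields.limit": total_fields_limit,
                "index.mapping.depth.limit": depth_limit,
            },
            "mappings": {
                "dynamic": "strict",  # reject documents with unmapped fields
                "properties": {
                    # Hypothetical fields; list the real schema here.
                    "@timestamp": {"type": "date"},
                    "message": {"type": "text"},
                },
            },
        },
    }


body = capped_template(["logs-app-*"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the template request with curl or any HTTP client.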
Consolidate time-series indices. Replace per-minute or per-hour index creation with ILM rollover based on age, size, or document count. Deleting empty or unused indices removes their metadata from the state instantly. For old, read-only time-series indices, use the shrink API to reduce shard count, which also shrinks the routing table portion of the state. Shrink requires the source index to be read-only and can temporarily increase disk usage during the operation.
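A quick calculation shows why the creation interval matters, followed by a sketch of an ILM policy body for `PUT /_ilm/policy/<name>`. The 30-day retention, the `1d` and `50gb` thresholds, and the variable names are illustrative assumptions; `max_primary_shard_size` is only available in more recent Elasticsearch versions, and `max_size` or `max_docs` are alternative rollover conditions.

```python
import math


def index_count(indices_per_day: float, retention_days: int) -> int:
    """Number of indices held in the cluster state for one data stream or pattern."""
    return math.ceil(indices_per_day * retention_days)


retention_days = 30  # assumed retention for the comparison
print("per-minute:", index_count(24 * 60, retention_days))  # 43200
print("per-hour:  ", index_count(24, retention_days))       # 720
print("daily:     ", index_count(1, retention_days))        # 30

# Sketch of an ILM policy: roll over on age or shard size, delete after 30 days.
rollover_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_age": "1d",
                        "max_primary_shard_size": "50gb",
                    }
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}
```

The daily figure assumes the age condition triggers before the size condition; a busy index that hits the size threshold first will roll over more often.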
Audit aliases, templates, and pipelines. Use /_cat/aliases to count aliases. Review registered templates, ingest pipelines, and stored scripts. Every object in the state consumes heap on every node, even if it is never queried. Removing a single unused template frees a small amount of heap on every node immediately, and the savings compound.
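For the alias audit, one possible approach is to request `/_cat/aliases?format=json` and group the rows by index. The sketch below assumes each row carries `alias` and `index` keys, as in the default cat output; the function name and the sample rows are hypothetical.

```python
from collections import Counter


def aliases_per_index(cat_aliases_rows: list) -> Counter:
    """Count aliases per index from /_cat/aliases?format=json rows."""
    return Counter(row["index"] for row in cat_aliases_rows)


# Made-up rows in the shape of the cat aliases JSON output.
rows = [
    {"alias": "logs-write", "index": "logs-000001"},
    {"alias": "logs-read", "index": "logs-000001"},
    {"alias": "metrics-read", "index": "metrics-000007"},
]

counts = aliases_per_index(rows)
print(sum(counts.values()), "aliases in total")
print(counts.most_common(1))  # [('logs-000001', 2)]
```

Indices with unusually many aliases are a reasonable place to start looking for entries that no active workload references.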
Relieve master pressure. If the cluster is already unstable, pause unnecessary index creation and mapping updates. Ensure the cluster uses dedicated master-eligible nodes (three is standard) with heap sized for metadata scale, not just query traffic. If a master-eligible node is also serving data or ingest traffic, migrate it to a dedicated role. The master must serialize and publish state without competing for CPU and heap with indexing and search.
Prevention
- Use ILM rollover for all time-series data instead of time-based index naming.
- Enforce `index.mapping.total_fields.limit` and `index.mapping.depth.limit` at the template level before indices are created.
- Run a regular audit of aliases, templates, ingest pipelines, and stored scripts. Remove anything not referenced by active workloads.
- Monitor field count and cluster state version churn as leading indicators. A rising trend gives days or weeks of runway before the master becomes unstable.
- Size master nodes for peak metadata scale. A cluster with thousands of indices or tens of thousands of fields needs masters with sufficient heap and CPU to serialize and publish state without backlog.
How Netdata helps
- Correlate JVM heap usage across node roles to spot masters disproportionately burdened by cluster state serialization.
- Track old GC pause duration and frequency on master-eligible nodes; rising pauses alongside flat query load point to metadata pressure.
- Monitor pending task counts to detect master backlog before it triggers elections.
- Alert on per-node heap floor trends. A rising post-GC minimum on data nodes with stable indexing often means the cluster state itself is growing.
- Cross-reference disk and indexing metrics with index creation rates to catch runaway time-series patterns early.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster_block_exception: blocked by, the read-only blocks explained
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch disk full: emergency recovery and freeing space safely
- Elasticsearch disk watermark cascade: from low watermark to cluster-wide read-only
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch FORBIDDEN/12/index read-only / allow delete (api) - flood stage recovery
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch high disk watermark [90%] exceeded: shard relocation and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor







