Elasticsearch coordinating node overload: aggregation merge, heap spikes, and 429s
HTTP 429 or 503 responses appear on search requests while data nodes look healthy. Heap spikes on one node while others stay flat. The slow log shows heavy aggregation queries. That node is the coordinator, and it is running out of heap during the reduce phase.
Every node can act as a coordinating node. For each search, the coordinator broadcasts the query to relevant shards, collects partial results, and merges them. Aggregations compute locally per shard and reduce in memory on the coordinator. High-cardinality terms aggregations, deep pagination, or large fetch sizes force the coordinator to hold massive intermediate structures in heap. If the estimate exceeds the circuit breaker limit, the request is rejected. If the breaker is too slow, the node may suffer long GC pauses or disconnect from the cluster.
The key symptom is asymmetry: the coordinator shows high heap, breaker trips, and search thread pool queues, while data nodes handling the same query show normal heap and CPU.
What this means
Elasticsearch search follows a scatter-gather model. The coordinating node sends the query phase to every target shard. Each shard executes locally, builds partial aggregation states, and returns document IDs with sort values. The coordinator reduces these partial results into a single response. During the reduce phase, aggregation buckets from every shard are materialized simultaneously in heap. The slowest shard determines overall query latency, and the coordinator must hold all intermediate state until the last shard responds.
High-cardinality terms aggregations are the most common trigger. A terms aggregation on a field with millions of unique values forces each shard to return a large bucket list. Nested sub-aggregations multiply memory cost. Deep pagination and large fetch sizes add document sources on top of aggregation structures. If the estimated memory exceeds the request circuit breaker limit (default 60% of JVM heap), Elasticsearch rejects the query with HTTP 429. If the parent circuit breaker trips (default 95% of heap with real memory tracking), the node is protecting itself from OOM. Repeated trips on the coordinator while data nodes remain calm point directly to reduce-phase pressure.
flowchart LR
Client -->|1. Search| Coord[Coordinating Node]
Coord -->|2. Query phase| S1[Shard 1]
Coord -->|2. Query phase| S2[Shard 2]
Coord -->|2. Query phase| SN[Shard N]
S1 -->|3. Partial results| Coord
S2 -->|3. Partial results| Coord
SN -->|3. Partial results| Coord
Coord -->|4. Response| ClientCommon causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| High-cardinality terms aggregation with many shards | request or parent breaker trips on the coordinator; slow log shows terms aggs on fields with millions of unique values; data nodes show normal heap | Slow log for aggregation type and target field |
Deep pagination with large size or from | Heap spikes correlate with large size parameters; fetch latency is elevated while query latency is normal | Query parameters for from and size |
| Large fetch sizes combined with aggregations | Compound memory pressure from holding both aggregation buckets and full document sources at the coordinator | Request payload for size and aggregation depth |
| Too many shards per query | Coordinator merges results from hundreds of shards; fixed overhead per shard dominates | Shard count for target indices via _cat/shards |
| Permissive breaker limits masking pressure | Breaker trips are rare but the coordinator experiences long GC pauses or OOMs | Breaker estimated_size_in_bytes vs limit_size_in_bytes |
Quick checks
# Compare heap asymmetry across nodes to spot the coordinator under pressure
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,heap.percent,cpu,load_1m'
# Inspect circuit breaker trips and current estimated sizes
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers'
# Check search thread pool queues and rejections on the coordinator
curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected'
# List active search tasks to identify heavy queries
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*'
# Verify JVM heap and GC behavior per node
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem,nodes.*.jvm.gc'
# Count shards for the index being queried to assess fan-out
curl -s 'http://localhost:9200/_cat/shards/<index>?v&h=index,shard,prirep,state'
# Sample hot threads to see what is consuming CPU on the coordinator
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=10'
How to diagnose it
- Confirm asymmetry. Run
_cat/nodesand sort byheap.percent. One node significantly higher than others while serving the same traffic is likely the coordinator absorbing merge overhead. - Identify the breaker. Run
_nodes/stats/breakerand comparetrippedcounters. Look forrequestorparentbreakers incrementing on the coordinator. Compareestimated_size_in_bytestolimit_size_in_bytes. - Correlate with queries. Run
_tasks?detailed=true&actions=*search*to see active searches. Look for largesize, deepfrom, or heavy aggregations. Cross-reference with the slow log. - Check shard fan-out. Use
_cat/shardsto see how many shards the query hits. Many small indices or excessive shards multiply merge overhead. - Verify GC behavior. Use
_nodes/stats/jvmto check for frequent old-generation GC or long pauses on the coordinator while data nodes are stable. - Check search queues. Use
_cat/thread_pool/searchto confirm queue depth grows on the coordinator before rejections begin.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
heap.percent on coordinating nodes | Heap spikes are the primary symptom of merge pressure | Sustained >75% while data nodes are <50% |
request breaker tripped | Indicates single-request aggregation structures are too large | Any delta > 0 during normal traffic |
parent breaker tripped | Last line of defense; node is near OOM | Any delta > 0 on coordinator |
| Old GC collection time | Long old GC pauses cause node removal | Pauses >10s correlated with heap spikes |
thread_pool.search.queue | Precursor to rejection and latency | Sustained queue growth on coordinator |
thread_pool.search.rejected | Direct measure of failed searches | Nonzero rate sustained >5 minutes |
| Query latency vs. end-to-end latency | Distinguishes shard slowness from coordinator merge cost | Shard query latency normal but total latency high |
| Shard count per query | Merge overhead scales with shard count | Queries hitting >100 shards regularly |
Fixes
Reduce aggregation cardinality
Rewrite queries to avoid high-cardinality terms aggregations where possible. Pre-filter with a selective query clause, or use sampler and diversified_sampler aggregations to reduce the bucket count returned to the coordinator.
Limit fetch size and pagination depth
Lower the size parameter. Avoid deep from + size pagination for large result sets; use search_after instead.
Reduce shards per query
Target fewer indices or use the shrink API to reduce shard count for read-only indices. Avoid broad wildcard patterns that fan out to hundreds of shards.
Add dedicated coordinating nodes
Deploy nodes with no data, master, or ingest roles to handle client requests. This isolates merge heap pressure from data nodes. Size heap generously, staying under the compressed OOPs threshold (usually 30 GB or less).
Cancel abusive queries
Find long-running searches via _tasks and cancel them. This terminates the query for the client but protects the cluster:
# Cancel a specific search task (disruptive: terminates the query)
curl -X POST 'http://localhost:9200/_tasks/<task_id>/_cancel'
Block reallocation during recovery
If a coordinating node drops out due to GC and triggers a reallocation storm, pause allocation until the root query is fixed. This prevents shard recovery:
# Temporarily disable shard reallocation (disruptive: prevents recovery)
curl -X PUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent":{"cluster.routing.allocation.enable":"none"}}'
Re-enable after fixing the query.
Prevention
- Size coordinating capacity independently. Deploy dedicated coordinating-only nodes for heavy analytics, and monitor their heap, thread pools, and circuit breakers separately from data nodes.
- Set slow log thresholds. Configure
index.search.slowlog.threshold.query.warnto catch expensive aggregations before they trip breakers. - Limit shard fan-out. Design index patterns so routine queries hit a manageable number of shards. Use time-based aliases or data streams that exclude cold indices from active searches.
- Monitor the heap floor. Track the post-old-GC heap minimum on coordinating nodes. A rising floor predicts merge pressure before breakers trip.
- Stage-test aggregation queries. Run high-cardinality aggregations and deep pagination against representative data and shard counts before releasing to production.
How Netdata helps
- Real-time JVM heap per node, surfacing coordinator asymmetry without manual
_cat/nodescross-referencing. searchthread pool queue depth and rejections per node, highlighting the bottleneck before clients see HTTP 429.- Circuit breaker
trippedcounters and estimated vs. limit sizes from_nodes/stats/breaker. - Old-generation GC pause duration alongside search latency to show when merge pressure causes stop-the-world pauses.
- Alerts on sustained heap usage above baseline per node, distinguishing the overloaded coordinator from healthy data nodes.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch authentication failures: audit logs, brute force, and credential drift
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster_block_exception: blocked by, the read-only blocks explained
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch cluster state too large: field count, index count, and per-node heap
- Elasticsearch disk full: emergency recovery and freeing space safely
- Elasticsearch disk watermark cascade: from low watermark to cluster-wide read-only
- Elasticsearch document indexing failures: index_failed, bulk item errors, and version conflicts
- Elasticsearch EsRejectedExecutionException: write thread pool rejections and HTTP 429
- Elasticsearch exposed without authentication: open clusters and snapshot exfiltration







