Elasticsearch search thread pool rejections: failed queries and small search queues
Your application logs show HTTP 429 responses from Elasticsearch. The error is es_rejected_execution_exception and the message points to the search thread pool. User-facing queries are failing, not just slowing down.
The search thread pool uses a fixed number of threads and a bounded queue. The default queue size is 1000. Search is a scatter-gather operation: each request runs as one task per targeted shard. A single expensive query can therefore occupy search threads on many data nodes for seconds, and the coordinating node cannot respond until every shard has answered. The queue drains slowly, and under burst or pathological load it fills fast. Once it is full, Elasticsearch rejects new search requests immediately.
This guide shows how to distinguish between expensive queries, coordinating-node overload, hot-spotting, and GC stalls. It also shows how to fix each one without masking the problem by raising queue limits.
What this means
The search thread pool executes the query phase on every targeted shard. The coordinating node fans out the request, waits for all shard responses, merges results, and then runs the fetch phase. Every shard query consumes one search thread on the data node. If threads are busy, new requests enter the bounded queue. When the queue reaches its limit, Elasticsearch rejects the request immediately with HTTP 429.
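The size of the pool comes from the node's processor count. The sketch below assumes the default documented for recent Elasticsearch versions: `int((allocated_processors * 3) / 2) + 1` search threads and a queue of 1000. Older releases and explicit `thread_pool.search.size` overrides change these numbers, so compare the result with what `_cat/thread_pool` reports. The function name `search_pool_size` is illustrative.

```bash
# search_pool_size N
# Default size of the fixed "search" thread pool for a node with N allocated
# processors, using integer division: int((N * 3) / 2) + 1.
# Assumption: recent Elasticsearch defaults; check the docs for your version.
search_pool_size() {
  echo $(( ($1 * 3) / 2 + 1 ))
}

# Compare with the live values on each node:
#   curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,size,queue_size'
search_pool_size 8   # prints 13
```

Take a node with 8 processors. It can run 13 shard-level search tasks at once and hold up to 1000 more in the queue. If none of the earlier tasks have finished, the 1014th concurrent task on that node is rejected.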
The rejected counter is cumulative per node. A sustained delta means the cluster cannot keep up with the search workload. Raising thread_pool.search.queue_size is operationally risky: it delays rejection, increases memory pressure, and does nothing to speed up the slow queries that are holding threads hostage.
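The counter only grows until the node restarts, so the useful number is how much it changes between two samples. The sketch below assumes you saved two `_cat/thread_pool` snapshots to files some interval apart. The function and file names are hypothetical, and node names are assumed to contain no spaces.

```bash
# rejection_delta OLD NEW
# OLD and NEW each contain "node_name rejected" lines captured some interval apart.
# Prints "node_name delta" for every node present in both snapshots.
rejection_delta() {
  awk '
    NR == FNR { before[$1] = $2; next }   # first file: remember old counters
    ($1 in before) {
      delta = $2 - before[$1]
      if (delta < 0) delta = $2            # counter went backwards: node restarted
      print $1, delta
    }
  ' "$1" "$2"
}

# Example:
#   curl -s 'http://localhost:9200/_cat/thread_pool/search?h=node_name,rejected' > before.txt
#   sleep 60
#   curl -s 'http://localhost:9200/_cat/thread_pool/search?h=node_name,rejected' > after.txt
#   rejection_delta before.txt after.txt
```

A node whose delta stays above zero across several consecutive intervals is rejecting steadily, not just absorbing a single burst.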
```mermaid
flowchart TD
    A[Expensive query or GC stall] --> B[Search threads blocked]
    B --> F[Queue drains slowly]
    F --> C[Queue fills toward 1000]
    C --> G[Latency spikes before rejection]
    G --> D[HTTP 429 rejected]
    D --> E[Client retries amplify load]
    E --> C
```
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Expensive queries (wildcards, deep aggregations, script scoring) | Slow log entries exceeding your SLA; specific tasks holding threads for seconds | GET /_tasks?detailed=true&actions=*search* and the slow log |
| Coordinating node overload (large result merges, high-cardinality aggs) | Circuit breaker trips on the coordinator; heap spikes there while data nodes look healthy | _nodes/stats/breaker and thread_pool.search on the coordinating node versus data nodes |
| JVM GC pauses / heap pressure | Rejections correlate with old GC time spikes; node may briefly drop from cluster | _nodes/stats/jvm old collection time and heap_used_percent |
| Hot-spotting / uneven shard distribution | One node shows high search queue and rejections while others are idle; asymmetric CPU | _cat/thread_pool and _cat/nodes for CPU and load imbalance |
| Excessive shards or segment count per query | High query_total relative to user queries; elevated latency across many indices | _cat/indices shard count and pri.segments.count |
Quick checks
```bash
# Check search thread pool rejections and queue depth per node
curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected&s=rejected:desc'
# Check search throughput and cumulative latency counters
curl -s 'http://localhost:9200/_nodes/stats/indices/search?filter_path=nodes.*.indices.search'
# List running search tasks with detailed descriptions
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*'
# Check old GC collection time and heap pressure
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc,nodes.*.jvm.mem.heap_used_percent'
# Check circuit breaker estimated sizes and trip counts
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers'
# Check segment count per index and node-level segment memory
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,pri.segments.count&s=pri.segments.count:desc' | head -20
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.count,segments.memory'
# Check CPU and heap imbalance across nodes
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,cpu,heap.percent,load_1m'
```
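The search stats from `_nodes/stats/indices/search` are lifetime counters. Dividing them gives an average since the node started, and that average hides recent spikes. The sketch below computes the average for one interval instead. It assumes you read `query_time_in_millis` and `query_total` for one node at two points in time. The function name is hypothetical, and the result is shard-level query-phase time, not end-to-end request latency.

```bash
# interval_latency_ms OLD_TIME_MS OLD_TOTAL NEW_TIME_MS NEW_TOTAL
# Average query-phase latency per shard query between two samples of
# indices.search.query_time_in_millis and indices.search.query_total.
# Prints "n/a" if no queries completed in the interval or the counters reset.
interval_latency_ms() {
  awk -v t0="$1" -v n0="$2" -v t1="$3" -v n1="$4" 'BEGIN {
    if (n1 - n0 <= 0 || t1 - t0 < 0) { print "n/a"; exit }
    printf "%.1f\n", (t1 - t0) / (n1 - n0)
  }'
}

interval_latency_ms 500000 100000 560000 101000   # prints 60.0
```

In this example the lifetime average (560000 / 101000) is about 5.5 ms, while the last interval averaged 60 ms.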
How to diagnose it
1. Confirm the pool and the nodes. Use `_cat/thread_pool` to see which nodes have rising `rejected` counters and a deep `queue`. Rejections on a single node point to hot-spotting. Rejections across many nodes point to a change in query patterns or to cluster-wide resource pressure.
2. Check the coordinating node first. In multi-node clusters, the coordinating node fans out the search and merges the results. If its `thread_pool.search.queue` is full or its circuit breakers (`parent`, `request`) are tripping, the bottleneck is upstream of the data nodes. Compare the coordinator's stats with those of the data nodes.
3. Identify expensive queries. Use `GET /_tasks?detailed=true&actions=*search*` to see long-running tasks. Cross-reference them with the slow log (`index.search.slowlog.threshold.query.warn`) to find queries that hold threads for seconds. Look for leading wildcards, regex, deeply nested aggregations, or high `size` parameters.
4. Correlate with GC and heap. Pull `_nodes/stats/jvm`. If spikes in old GC collection time coincide with rejection spikes, stop-the-world pauses are freezing the pool. Check whether the post-GC heap floor is rising. A rising floor means the pressure is structural, not transient.
5. Evaluate shard and segment fan-out. Check `_cat/indices` for the targeted indices. If a single query fans out to hundreds of shards, every shard request occupies a search thread on its data node, and the coordinating node must wait for all of them. High segment counts per shard (more than 100) also increase per-shard query time, so threads are held longer.
6. Check for a cold OS page cache. After a restart, or on memory-constrained nodes, segment files may not be in the OS page cache. This shows up as elevated query and fetch latency with high disk I/O wait and low CPU. Check OS-level cached memory. This is an environmental problem, not an Elasticsearch fault.
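`_cat/tasks` can report running time in nanoseconds, which helps shortlist the long-running searches from the expensive-query step. The sketch below assumes your version exposes the `task_id`, `action`, and `running_time_ns` columns. The `_cat` APIs are intended for humans, so prefer the JSON `_tasks` API for automation. The function name is hypothetical.

```bash
# slow_search_tasks THRESHOLD_SECONDS < tasks.txt
# Input: "task_id action running_time_ns" lines, for example from:
#   curl -s 'http://localhost:9200/_cat/tasks?actions=*search*&h=task_id,action,running_time_ns'
# Prints tasks running longer than the threshold, longest first, as
# "task_id action seconds".
slow_search_tasks() {
  awk -v limit="$1" '$3 / 1e9 > limit { print $3, $1, $2 }' |
    sort -rn |
    awk '{ printf "%s %s %.1fs\n", $2, $3, $1 / 1e9 }'
}
```

When cancelling, target the parent `indices:data/read/search` task rather than its `[phase/query]` children, because cancelling a parent task also cancels its children. For example: `curl -s -X POST 'http://localhost:9200/_tasks/<task_id>/_cancel'`.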
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| thread_pool.search.rejected (delta) | Direct count of rejected shard-level search tasks, which surface as failed or partial queries | Sustained greater than 0 per minute for more than 5 minutes |
| thread_pool.search.queue | Precursor to rejection; queueing adds latency | Sustained greater than 100 with default max 1000 |
| Search query latency (delta query_time_in_millis / delta query_total) | User-visible search slowness | Sustained greater than 5x baseline |
| Old GC collection time | Stop-the-world pauses freeze all search threads | Individual pauses greater than 5 seconds or increasing frequency |
| Circuit breaker estimated_size_in_bytes vs limit_size_in_bytes | Memory pressure on coordinating node or large aggregations | Consistently greater than 70% of limit; any tripped delta |
| Segment count per shard | More segments slow queries and increase thread hold time | Greater than 100 segments per active shard |
| Per-node CPU / load | Hot-spotting or expensive query execution | Asymmetric load or sustained greater than 80% |
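The "sustained" qualifier in this table matters, because a single deep queue reading during a burst is normal. The sketch below checks for a sustained condition. It assumes you append one search queue reading per interval for a node to a file. The function name, file name, and node name are hypothetical.

```bash
# sustained_queue THRESHOLD SAMPLES < queue_samples.txt
# Input: one search queue reading per line for a single node, oldest first.
# Exit status 0 (alert) only if at least SAMPLES readings exist and each of
# the last SAMPLES readings exceeds THRESHOLD; 1 otherwise.
sustained_queue() {
  tail -n "$2" | awk -v limit="$1" -v want="$2" '
    $1 > limit { hits++ }
    END { exit !(NR == want && hits == NR) }'
}

# Collect one reading per minute for a hypothetical node "node-1":
#   curl -s 'http://localhost:9200/_cat/thread_pool/search?h=node_name,queue' |
#     awk '$1 == "node-1" { print $2 }' >> queue_samples.txt
#   sustained_queue 100 5 < queue_samples.txt && echo "search queue above 100 for 5 samples"
```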
Fixes
Expensive queries
Do not just raise queue size. Find the query. Use /_tasks to identify long-running searches. Cancel abusive tasks with POST /_tasks/{task_id}/_cancel if they are non-essential. Fix patterns: replace leading wildcards with n-grams or match queries; aggregate on keyword sub-fields instead of analyzed text; reduce aggregation cardinality; avoid script-based sorting on large result sets. Enable the slow log at multiple thresholds to catch these before they saturate the pool.
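Slow log thresholds are dynamic index settings, so you can change them without a restart. The sketch below builds the settings body. It assumes example query-phase thresholds of 5s (warn), 2s (info), and 500ms (debug); pick values relative to your own SLA. The function name and the `my-index-*` pattern are hypothetical.

```bash
# slowlog_settings WARN INFO DEBUG
# Prints a JSON body for PUT /<index>/_settings that sets query-phase
# search slow log thresholds.
slowlog_settings() {
  printf '{"index.search.slowlog.threshold.query.warn":"%s","index.search.slowlog.threshold.query.info":"%s","index.search.slowlog.threshold.query.debug":"%s"}\n' \
    "$1" "$2" "$3"
}

# Apply to a hypothetical index pattern:
#   curl -s -X PUT 'http://localhost:9200/my-index-*/_settings' \
#     -H 'Content-Type: application/json' \
#     -d "$(slowlog_settings 5s 2s 500ms)"
```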
Coordinating node overload
High-cardinality aggregations or large result sizes can spike heap on the coordinating node while the data nodes look healthy. If that happens, reduce the fan-out and the merge work. Target fewer shards per query with tighter index patterns, or add dedicated coordinating nodes with sufficient heap. If clients are pulling large result sets, reduce the size parameter or page through the results with search_after. Shrink read-only indices, or reindex active data into fewer shards.
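You can test live breakers against the 70% warning level from the metrics table by flattening `_nodes/stats/breaker` into one line per breaker. The sketch below assumes `jq` is available for the flattening step, which is shown in a comment, and that node names contain no spaces. The function name is hypothetical.

```bash
# breakers_over PERCENT < breakers.txt
# Input: "node breaker estimated_bytes limit_bytes" lines, for example from:
#   curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.name,nodes.*.breakers' |
#     jq -r '.nodes[] | .name as $n | .breakers | to_entries[]
#            | "\($n) \(.key) \(.value.estimated_size_in_bytes) \(.value.limit_size_in_bytes)"'
# Prints breakers whose estimated size exceeds PERCENT of their limit.
breakers_over() {
  awk -v pct="$1" '$4 > 0 && $3 / $4 * 100 > pct {
    printf "%s %s %.0f%%\n", $1, $2, $3 / $4 * 100
  }'
}
```

If the coordinating node shows up here while the data nodes do not, the merge work is the bottleneck.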
GC stalls and heap pressure
If rejections correlate with old GC spikes, the cluster is in or near a heap pressure death spiral. Identify heap consumers: check fielddata.memory_size for text-field aggregations, segments.memory for too many shards or segments, and cluster state size. Cancel heavy queries. Do not raise circuit breaker limits; that risks OOM. If the post-GC heap floor is rising, add nodes or reduce shard count. See Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor.
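To put a number on a correlation with old GC spikes, compare the old collector's time counter across an interval. The sketch below assumes two readings of `jvm.gc.collectors.old.collection_time_in_millis` for one node, taken a known number of seconds apart. The function name is hypothetical.

```bash
# gc_stall_pct OLD_GC_MS NEW_GC_MS INTERVAL_SECONDS
# Percentage of the interval spent in old-generation collections, from two
# samples of jvm.gc.collectors.old.collection_time_in_millis, for example:
#   curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.gc.collectors.old'
gc_stall_pct() {
  awk -v g0="$1" -v g1="$2" -v secs="$3" 'BEGIN {
    printf "%.1f\n", (g1 - g0) / (secs * 1000) * 100
  }'
}

gc_stall_pct 40000 52000 60   # prints 20.0
```

Twelve seconds of old GC in a one-minute window means, to the extent that time is spent in stop-the-world pauses, search threads on that node were frozen for about a fifth of the interval.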
Hot-spotting
Uneven shard distribution causes one node to reject while others are idle. Use /_cat/allocation and /_cat/shards to identify concentrated indices. Reroute shards temporarily if needed, but prefer structural fixes: adjust the shard count per index, cap shards per node with index.routing.allocation.total_shards_per_node, or add data nodes.
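One quick way to spot the hot node is to compare each node's search queue with the cluster mean. The sketch below assumes `_cat/thread_pool` output whose node names contain no spaces. The function name and the factor are illustrative. In very small clusters, use a lower factor: one hot node also raises the mean, and with two nodes neither can exceed twice the mean.

```bash
# hot_search_nodes FACTOR < queues.txt
# Input: "node_name queue" lines, for example from:
#   curl -s 'http://localhost:9200/_cat/thread_pool/search?h=node_name,queue'
# Prints nodes whose search queue is more than FACTOR times the cluster mean.
hot_search_nodes() {
  awk -v factor="$1" '
    { name[NR] = $1; q[NR] = $2 + 0; sum += $2 }
    END {
      if (NR == 0 || sum == 0) exit
      mean = sum / NR
      for (i = 1; i <= NR; i++)
        if (q[i] > factor * mean) print name[i], q[i]
    }'
}
```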
Segment and shard bloat
For read-only indices, run POST /<index>/_forcemerge?max_num_segments=1 during low-traffic windows. Warning: this blocks until completion and is I/O-intensive. Do not force-merge live indices receiving writes. For active indices, ensure refresh_interval is not too low. If indices have too many shards, use the shrink API (index must be read-only) or reindex into fewer shards.
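To find indices above the 100-segments-per-shard mark, divide each index's primary segment count by its primary shard count. The sketch below assumes `_cat/indices` output with the `index`, `pri`, and `pri.segments.count` columns. The result is an average per primary shard, so a single bloated shard can hide behind it. The function name is hypothetical.

```bash
# merge_candidates MAX_SEGMENTS < indices.txt
# Input: "index pri pri.segments.count" lines, for example from:
#   curl -s 'http://localhost:9200/_cat/indices?h=index,pri,pri.segments.count'
# Prints indices averaging more than MAX_SEGMENTS segments per primary shard.
merge_candidates() {
  awk -v max="$1" '$2 > 0 && $3 / $2 > max { printf "%s %.0f\n", $1, $3 / $2 }'
}

# Then, for a read-only index on the list, during a quiet window:
#   curl -s -X POST 'http://localhost:9200/<index>/_forcemerge?max_num_segments=1'
```

Only force-merge indices on the list that no longer receive writes.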
Prevention
- Monitor search rejection deltas. Cumulative counters hide bursts; alerting on the rate of change catches sustained overload early.
- Alert on search queue depth. Queueing precedes rejection; a sustained queue greater than 100 signals that latency is turning into failures.
- Configure slow log thresholds. Catching queries exceeding your SLA before they saturate the pool gives you a fixable target instead of an outage.
- Force-merge read-only indices via ILM. A high segment count increases per-shard query time, so each search thread stays busy longer.
- Right-size shard counts per query. Avoid querying across hundreds of shards; use shrink or fewer indices to reduce coordinating-node merge work.
- Track the post-GC heap floor. A rising floor means long-lived objects are accumulating, and longer, more frequent old GC pauses are likely to follow.
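You can read the floor roughly from periodic samples by keeping only the readings that follow a large drop, since those land just after a collection. The sketch below assumes one `heap.percent` reading per line for one node, taken at a fixed interval. The function name and the 10-point drop threshold are illustrative, and coarse sampling can miss the true trough.

```bash
# post_gc_floors DROP_POINTS < heap_samples.txt
# Input: one heap usage percentage per line for a single node, oldest first,
# for example collected for a hypothetical node "node-1" with:
#   curl -s 'http://localhost:9200/_cat/nodes?h=name,heap.percent' |
#     awk '$1 == "node-1" { print $2 }' >> heap_samples.txt
# Prints each reading that comes right after a drop of more than DROP_POINTS
# percentage points, an approximation of the post-GC floor.
post_gc_floors() {
  awk -v drop="$1" 'NR > 1 && prev - $1 > drop { print $1 } { prev = $1 }'
}
```

A sequence such as 42, 47, 52 from successive collections is the rising floor described above.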
How Netdata helps
- Netdata collects `_nodes/stats/thread_pool` and surfaces per-node `search.rejected` and `search.queue` in real time, making it easy to spot which nodes are rejecting.
- Correlate search rejection spikes with per-node JVM heap charts and old GC pause duration to immediately identify GC-induced stalls.
- Overlay search latency with OS-level disk I/O wait and page cache metrics to distinguish expensive queries from I/O-bound cold cache scenarios.
- Alert on sustained search queue depth and rejection rate deltas without manual polling of cumulative counters.
- Correlate with per-node CPU and segment memory to catch hot-spotting before rejections start.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster_block_exception: blocked by, the read-only blocks explained
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch disk full: emergency recovery and freeing space safely
- Elasticsearch disk watermark cascade: from low watermark to cluster-wide read-only
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch FORBIDDEN/12/index read-only / allow delete (api) — flood stage recovery
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch high disk watermark [90%] exceeded: shard relocation and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor







