Elasticsearch search thread pool rejections: failed queries and small search queues

Your application logs show HTTP 429 responses from Elasticsearch. The error is es_rejected_execution_exception and the message points to the search thread pool. User-facing queries are failing, not just slowing down.

The search thread pool uses a fixed number of threads and a bounded queue; the default queue size is 1000. Because search is a scatter-gather operation across shards, a single expensive query occupies a search thread on every shard it touches, sometimes for seconds, and the coordinating node cannot answer until the slowest shard responds. The queue drains slowly, and under bursty or pathological load it fills fast. Once it is full, Elasticsearch rejects new search requests immediately.
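As a rough illustration of why a full queue drains slowly, the sketch below does the arithmetic in shell. It assumes the default search pool size documented for recent Elasticsearch versions, int((allocated_processors * 3) / 2) + 1 (verify for your version), and it assumes every shard-level query takes the same average time with no new arrivals, which real traffic never does. Both function names are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Back-of-envelope estimate of search queue drain time (illustrative only).

# Default search pool size as documented for recent Elasticsearch versions
# (assumption; check your version): int((allocated_processors * 3) / 2) + 1
search_pool_size() {
  local procs=$1
  echo $(( (procs * 3) / 2 + 1 ))
}

# Seconds to drain `queued` tasks with `threads` workers, assuming each
# shard-level query takes avg_ms milliseconds and nothing new arrives.
queue_drain_seconds() {
  local queued=$1 threads=$2 avg_ms=$3
  awk -v q="$queued" -v t="$threads" -v ms="$avg_ms" \
    'BEGIN { printf "%.1f\n", (q / t) * ms / 1000 }'
}

threads=$(search_pool_size 8)          # 8 allocated processors -> 13 threads
queue_drain_seconds 1000 "$threads" 2000
```

With 8 allocated processors (13 threads) and 2-second shard queries, clearing 1000 queued tasks takes about 154 seconds even with no new traffic, so any burst that fills the queue keeps producing rejections for minutes.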

The goal is to tell apart expensive queries, coordinating-node overload, hot-spotting, and GC stalls, and to fix each one without masking the problem by raising queue limits.

What this means

The search thread pool executes shard-level search work on the data nodes. The coordinating node fans the request out to every targeted shard, waits for all query-phase responses, merges them, and then sends fetch requests to the shards that hold the top hits. Each shard-level query occupies one search thread on its data node until it completes. If all threads are busy, new shard requests wait in the bounded queue. When the queue reaches its limit, Elasticsearch rejects new work immediately and the search fails with HTTP 429.

The rejected counter is cumulative per node since the node started, so a single reading says little. A count that keeps increasing between samples means the cluster cannot keep up with the search workload. Raising thread_pool.search.queue_size is operationally risky: it delays rejection, increases memory pressure from queued requests, and does nothing to speed up the slow queries that are holding threads.
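Because the counter is cumulative, the useful number is its change between two samples. A minimal sketch, assuming two snapshots of _cat/thread_pool/search saved to files with only the node_name and rejected columns; the function name rejection_delta is hypothetical. A negative difference is treated as a counter reset after a node restart.

```shell
#!/usr/bin/env bash
# Per-node increase of the cumulative search "rejected" counter between two
# snapshots, each saved from:
#   curl -s 'http://localhost:9200/_cat/thread_pool/search?h=node_name,rejected'
# (no "v" parameter, so there is no header line to skip).
rejection_delta() {
  local before=$1 after=$2
  awk '
    FILENAME == ARGV[1] { prev[$1] = $2; next }   # first snapshot: old counts
    {
      old = ($1 in prev) ? prev[$1] : 0           # new node: treat old as 0
      d = $2 - old
      if (d < 0) d = $2                           # counter reset after restart
      print $1, d
    }
  ' "$before" "$after"
}

# Example: two snapshots a minute apart, then compare.
#   url='http://localhost:9200/_cat/thread_pool/search?h=node_name,rejected'
#   curl -s "$url" > /tmp/tp.1; sleep 60; curl -s "$url" > /tmp/tp.2
#   rejection_delta /tmp/tp.1 /tmp/tp.2
```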

flowchart TD
    A[Expensive query or GC stall] --> B[Search threads blocked]
    B --> C[Queue drains slowly and fills toward 1000]
    C --> D[Latency spikes while requests wait]
    D --> E[Queue full, HTTP 429 rejected]
    E --> F[Client retries amplify load]
    F --> C

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Expensive queries (wildcards, deep aggregations, script scoring) | Slow log entries exceeding your SLA; specific tasks holding threads for seconds | GET /_tasks?detailed=true&actions=*search* and the slow log |
| Coordinating node overload (large result merges, high-cardinality aggs) | Circuit breaker trips on the coordinator; heap spikes there while data nodes look healthy | _nodes/stats/breaker and thread_pool.search on the coordinating node versus data nodes |
| JVM GC pauses / heap pressure | Rejections correlate with old GC time spikes; node may briefly drop from cluster | _nodes/stats/jvm old collection time and heap_used_percent |
| Hot-spotting / uneven shard distribution | One node shows high search queue and rejections while others are idle; asymmetric CPU | _cat/thread_pool and _cat/nodes for CPU and load imbalance |
| Excessive shards or segment count per query | High query_total relative to user queries; elevated latency across many indices | _cat/indices shard count and pri.segments.count |

Quick checks

# Check search thread pool rejections and queue depth per node
curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected&s=rejected:desc'
# Check search throughput and cumulative latency counters
curl -s 'http://localhost:9200/_nodes/stats/indices/search?filter_path=nodes.*.indices.search'
# List running search tasks with detailed descriptions
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*'
# Check old GC collection time and heap pressure
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc,nodes.*.jvm.mem.heap_used_percent'
# Check circuit breaker estimated sizes and trip counts
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers'
# Check segment count per index and node-level segment memory
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,pri.segments.count&s=pri.segments.count:desc' | head -20
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.count,segments.memory'
# Check CPU and heap imbalance across nodes
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,cpu,heap.percent,load_1m'

How to diagnose it

  1. Confirm the pool and the nodes. Use _cat/thread_pool to see which nodes have rising rejected counters and deep queues. Rejections on a single node point to hot-spotting. Rejections across many nodes point to query pattern changes or cluster-wide resource pressure.

  2. Check the coordinating node first. In multi-node clusters, the coordinating node fans out the search and merges results. If its thread_pool.search.queue is full or its circuit breakers (parent, request) are tripping, the bottleneck is upstream of the data nodes. Compare coordinator stats to data nodes.

  3. Identify expensive queries. Use GET /_tasks?detailed=true&actions=*search* to see long-running tasks. Cross-reference with the slow log (index.search.slowlog.threshold.query.warn) to find queries holding threads for seconds. Look for leading wildcards, regex, deeply nested aggregations, or large size parameters.

  4. Correlate with GC and heap. Pull _nodes/stats/jvm. If old GC collection time spikes coincide with rejection spikes, the pool is frozen by stop-the-world pauses. Check if the post-GC heap floor is rising. A rising floor means pressure is structural, not transient.

  5. Evaluate shard and segment fan-out. Check _cat/indices for targeted indices. If a single query fans out to hundreds of shards, the coordinating node must wait for every shard. High segment counts per shard (greater than 100) also increase per-shard query time, holding threads longer.

  6. Check for a cold OS page cache. After a restart or on memory-constrained nodes, segment files may not be in the OS page cache, so queries read from disk. This shows up as elevated query and fetch latency with high I/O wait and low CPU. Check OS-level cached memory. This is an environmental cause, not an Elasticsearch fault.
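Step 1's single-node versus many-node distinction can be made mechanical. This is a rough sketch, assuming per-node rejection deltas (for example, the difference between two _cat/thread_pool/search snapshots) arrive on stdin as a node name and a count per line; the 80% cutoff and the function name classify_rejections are illustrative assumptions, not Elasticsearch defaults.

```shell
#!/usr/bin/env bash
# Rough triage for step 1: are recent search rejections concentrated on one
# node (hot-spotting) or spread across nodes (query-pattern change or
# cluster-wide pressure)?
# Input on stdin: "node_name rejected_delta" per line.
# The 0.8 default share is an illustrative threshold, not an Elasticsearch value.
classify_rejections() {
  local share=${1:-0.8}
  awk -v share="$share" '
    BEGIN { max = -1 }
    { total += $2; if ($2 > max) { max = $2; top = $1 } }
    END {
      if (total == 0)                print "no rejections"
      else if (max / total >= share) print "concentrated on " top
      else                           print "spread across nodes"
    }
  '
}
```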

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| thread_pool.search.rejected (delta) | Count of rejected shard-level search executions; each one fails or degrades a user query | Sustained greater than 0 per minute for more than 5 minutes |
| thread_pool.search.queue | Precursor to rejection; queueing adds latency | Sustained greater than 100 (default maximum is 1000) |
| Average query latency (delta query_time_in_millis / delta query_total) | User-visible search slowness | Sustained greater than 5x baseline |
| Old GC collection time | Stop-the-world pauses freeze all search threads | Individual pauses greater than 5 seconds (visible in GC logs) or increasing frequency |
| Circuit breaker estimated_size_in_bytes vs limit_size_in_bytes | Memory pressure on the coordinating node or from large aggregations | Consistently greater than 70% of limit; any increase in the tripped count |
| Segment count per shard | More segments slow queries and increase thread hold time | Greater than 100 segments per active shard |
| Per-node CPU / load | Hot-spotting or expensive query execution | Asymmetric load or sustained greater than 80% |
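Since query_time_in_millis and query_total are cumulative, the latency signal in the table above has to be computed from deltas. A minimal sketch with hypothetical function names, assuming you already extracted the two counters from _nodes/stats/indices/search at the start and end of an interval. These counters measure shard-level query phases on a node, not end-to-end request latency.

```shell
#!/usr/bin/env bash
# Interval average shard-level query latency in ms from two readings of the
# cumulative counters indices.search.query_time_in_millis (t) and
# indices.search.query_total (n) in _nodes/stats/indices/search.
avg_query_ms() {
  local t0=$1 n0=$2 t1=$3 n1=$4
  awk -v t0="$t0" -v n0="$n0" -v t1="$t1" -v n1="$n1" 'BEGIN {
    dn = n1 - n0
    if (dn <= 0) { print "n/a"; exit }     # no queries (or a counter reset)
    printf "%.1f\n", (t1 - t0) / dn
  }'
}

# Exit status 0 if current latency exceeds factor x baseline
# (default factor 5, matching the warning sign in the table).
latency_regressed() {
  local current=$1 baseline=$2 factor=${3:-5}
  awk -v c="$current" -v b="$baseline" -v f="$factor" 'BEGIN { exit !(c > f * b) }'
}
```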

Fixes

Expensive queries

Do not just raise queue size. Find the query. Use /_tasks to identify long-running searches. Cancel abusive tasks with POST /_tasks/{task_id}/_cancel if they are non-essential. Fix patterns: replace leading wildcards with n-grams or match queries; aggregate on keyword sub-fields instead of analyzed text; reduce aggregation cardinality; avoid script-based sorting on large result sets. Enable the slow log at multiple thresholds to catch these before they saturate the pool.
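Finding long-running searches by hand gets tedious during an incident. The sketch below filters a task listing down to parent search tasks older than a threshold. It assumes the _cat/tasks columns task_id, action, and running_time_ns (confirm with _cat/tasks?help on your version) and that cancelling the parent indices:data/read/search task also cancels its shard-level children. It only prints candidate IDs; the function name long_search_tasks is hypothetical, and the cancel decision is left to a human.

```shell
#!/usr/bin/env bash
# List parent search tasks running longer than min_seconds, as candidates for
# POST /_tasks/<task_id>/_cancel. Input on stdin is assumed to come from:
#   curl -s 'http://localhost:9200/_cat/tasks?actions=*search*&h=task_id,action,running_time_ns'
# Column names are an assumption; confirm them with _cat/tasks?help.
long_search_tasks() {
  local min_seconds=$1
  awk -v limit_ns="$((min_seconds * 1000000000))" '
    # Keep only the parent request, not shard-level children such as
    # indices:data/read/search[phase/query].
    $2 == "indices:data/read/search" && $3 + 0 >= limit_ns + 0 { print $1 }
  '
}

# Review the list before cancelling anything, e.g.:
#   ... | long_search_tasks 30 | xargs -I{} curl -s -X POST "http://localhost:9200/_tasks/{}/_cancel"
```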

Coordinating node overload

If high-cardinality aggregations or large result sets spike heap on the coordinating node while data nodes look healthy, reduce the fan-out. Target fewer shards per query with tighter index patterns, or add dedicated coordinating nodes with sufficient heap. Lower the size parameter if clients are pulling large result sets. Shrink read-only indices to fewer shards, or reindex into fewer shards.
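To judge how much work an index pattern creates, it helps to count the shards behind it. A minimal sketch, assuming a search queries one copy of every shard in the matched indices, so the fan-out is the sum of primary shard counts; it ignores shards skipped by the can_match pre-filter phase. The pattern logs-* and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Shard-level requests created by one search over an index pattern: one copy
# of every targeted shard is queried, so sum the primary shard counts.
# Input on stdin is assumed to be the output of (hypothetical pattern logs-*):
#   curl -s 'http://localhost:9200/_cat/indices/logs-*?h=index,pri'
# Shards skipped by the can_match pre-filter phase are not accounted for.
shard_fanout() {
  awk '{ shards += $2 } END { print shards + 0 }'
}

# Search-thread time one user query consumes across the cluster, in seconds,
# assuming every shard-level query takes avg_ms milliseconds.
thread_seconds_per_query() {
  local fanout=$1 avg_ms=$2
  awk -v f="$fanout" -v ms="$avg_ms" 'BEGIN { printf "%.2f\n", f * ms / 1000 }'
}
```

For example, a pattern that resolves to 300 shards with 50 ms shard queries ties up about 15 thread-seconds per user query; tighter patterns or fewer shards shrink that budget directly.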

GC stalls and heap pressure

If rejections correlate with old GC spikes, the cluster is in or near a heap pressure death spiral. Identify heap consumers: check fielddata.memory_size for text-field aggregations, segments.memory for too many shards or segments, and cluster state size. Cancel heavy queries. Do not raise circuit breaker limits; that risks OOM. If the post-GC heap floor is rising, add nodes or reduce shard count. See Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor.
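One way to express the GC correlation as a number is the share of wall-clock time a node spent in old-generation collection. The sketch assumes two readings of jvm.gc.collectors.old.collection_time_in_millis from _nodes/stats/jvm taken a known number of seconds apart; the function name old_gc_fraction is hypothetical, and how much of that time is fully stop-the-world depends on the garbage collector in use.

```shell
#!/usr/bin/env bash
# Share of an interval a node spent in old-generation GC, from two readings of
#   nodes.<id>.jvm.gc.collectors.old.collection_time_in_millis
# in _nodes/stats/jvm taken interval_s seconds apart.
old_gc_fraction() {
  local ms0=$1 ms1=$2 interval_s=$3
  awk -v a="$ms0" -v b="$ms1" -v s="$interval_s" \
    'BEGIN { printf "%.3f\n", (b - a) / (s * 1000) }'
}

# Example: 9 s of old GC in a 60 s window prints 0.150 (15% of the window).
#   old_gc_fraction 1000 10000 60
```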

Hot-spotting

Uneven shard distribution causes one node to reject while others are idle. Use /_cat/allocation and /_cat/shards to identify concentrated indices. Temporarily move shards with the cluster reroute API if needed, but prefer structural fixes: adjust the shard count per index, cap per-node placement with index.routing.allocation.total_shards_per_node, or add data nodes.
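A quick way to see which indices are concentrated on one node is to count shard copies per index and node. A sketch assuming the input comes from _cat/shards with only the index and node columns; the function name shards_per_node is hypothetical.

```shell
#!/usr/bin/env bash
# Count shard copies of each index on each node. Input on stdin is assumed
# to be the output of:
#   curl -s 'http://localhost:9200/_cat/shards?h=index,node'
# Lines without exactly two fields are skipped: unassigned shards (empty node
# column) and relocating shards (node column contains "->").
shards_per_node() {
  awk 'NF == 2 { count[$1 " " $2]++ }
       END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}
```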

Segment and shard bloat

For read-only indices, run POST /<index>/_forcemerge?max_num_segments=1 during low-traffic windows. Warning: the request blocks until the merge completes, and the merge is I/O-intensive. Do not force-merge indices that are still receiving writes. For active indices, avoid setting refresh_interval lower than you need, because each refresh with pending changes creates a new small segment. If indices have too many shards, use the shrink API (index must be read-only) or reindex into fewer shards.
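Before force-merging, it can help to list the shard copies that actually exceed the segment threshold. A sketch assuming _cat/segments output with the columns index, shard, prirep, ip, and segment, one line per segment; keying on ip assumes one Elasticsearch node per host, and the function name busy_shards is hypothetical.

```shell
#!/usr/bin/env bash
# Flag shard copies whose segment count exceeds max (default 100, the warning
# level used in this article). Input on stdin is assumed to come from:
#   curl -s 'http://localhost:9200/_cat/segments?h=index,shard,prirep,ip,segment'
# _cat/segments prints one line per segment, so counting lines per
# (index, shard, prirep, ip) gives each shard copy's segment count.
busy_shards() {
  local max=${1:-100}
  awk -v max="$max" '
    { n[$1 " " $2 " " $3 " " $4]++ }
    END { for (k in n) if (n[k] > max) print k, n[k] }
  ' | LC_ALL=C sort
}
```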

Prevention

  • Monitor search rejection deltas. Cumulative counters hide bursts; alerting on the rate of change catches sustained overload early.
  • Alert on search queue depth. Queueing precedes rejection; a sustained queue greater than 100 signals that latency is turning into failures.
  • Configure slow log thresholds. Catching queries exceeding your SLA before they saturate the pool gives you a fixable target instead of an outage.
  • Force-merge read-only indices via ILM. A high segment count increases per-shard query time and ties up search threads longer.
  • Right-size shard counts per query. Avoid querying across hundreds of shards; use shrink or fewer indices to reduce coordinating-node merge work.
  • Track the post-GC heap floor. A rising floor means long-lived objects are accumulating and old GC pauses are approaching.
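The queue-depth alert only works if it ignores brief spikes. A minimal sketch of a "sustained" check, assuming a collector already records thread_pool.search.queue samples one per line, oldest first; the threshold of 100 comes from this article, while the five-sample window and the function name queue_sustained are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Succeed only if each of the last n samples of thread_pool.search.queue
# exceeds the threshold (defaults: 100 and 5 samples, e.g. one per minute).
# A single spike, or too few samples, does not trigger it.
# Input on stdin: one queue-depth sample per line, oldest first.
queue_sustained() {
  local threshold=${1:-100} n=${2:-5}
  tail -n "$n" | awk -v t="$threshold" -v n="$n" '
    { if ($1 > t) above++; seen++ }
    END { exit !(seen == n && above == n) }
  '
}
```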

How Netdata helps

  • Netdata collects _nodes/stats/thread_pool and surfaces per-node search.rejected and search.queue in real time, making it easy to spot which nodes are rejecting.
  • Correlate search rejection spikes with per-node JVM heap charts and old GC pause duration to immediately identify GC-induced stalls.
  • Overlay search latency with OS-level disk I/O wait and page cache metrics to distinguish expensive queries from I/O-bound cold cache scenarios.
  • Alert on sustained search queue depth and rejection rate deltas without manual polling of cumulative counters.
  • Correlate with per-node CPU and segment memory to catch hot-spotting before rejections start.