Elasticsearch indexing pressure rejections: memory backpressure before heap failure

Bulk indexing clients report rejections while cluster health is green, disks are below the high watermark, and write thread pool queues are not saturated. Yet the nodes are pushing back. Pull _nodes/stats/indexing_pressure and you will see climbing coordinating_rejections, primary_rejections, or replica_rejections. This is the indexing pressure framework, introduced in Elasticsearch 7.9, enforcing memory-based backpressure. It tracks in-flight indexing bytes at the coordinating, primary, and replica stages. The default limit is 10% of the JVM heap for coordinating and primary work, and 1.5 times that limit for replica operations. It can fire before the write thread pool rejects anything, which tells you the constraint is the memory held by in-flight writes, not disk or CPU.

Unlike thread pool rejections, which signal queue saturation, indexing pressure rejections mean the node is protecting its heap from unbounded growth. Each bulk request holds memory until it is fully acknowledged. Large batches, sudden ingest surges, or replica recovery traffic can push one or more stages over the limit. The cluster is not broken; admission must slow down before the parent circuit breaker or an OOM kill intervenes.

What this means

Indexing pressure is admission control measured in bytes, not queue slots. When a document arrives, the coordinating node accounts for its size in an in-flight memory budget; the primary and later the replica do the same. If a stage’s accumulated bytes hit the node’s threshold, new operations at that stage are rejected. The counters roll up under _nodes/stats/indexing_pressure as cumulative rejection counts per stage.

Because the limit is a percentage of heap, the absolute ceiling scales with node size, but the mechanism behaves the same regardless of heap size. A node with a 30 GB heap gets a 3 GB default budget. A single oversized bulk request or a wave of concurrent large documents can consume it quickly. Replica operations are allowed up to 1.5 times the configured limit, or 4.5 GB on that same node. This provides headroom for catch-up traffic without blocking the primary, but it is still a hard cap.
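The arithmetic above can be sketched in a few lines. This is an illustration only: the function name and the limit_percent parameter are made up for the example, and the 10% and 1.5 times figures are the defaults described in this article, not values read from a live node.

```python
GIB = 1024 ** 3


def indexing_pressure_limits(heap_max_bytes: int, limit_percent: int = 10) -> dict:
    """Byte ceilings for a given max heap, using the defaults described above."""
    # Coordinating and primary work share the base limit (10% of heap by default).
    base = heap_max_bytes * limit_percent // 100
    return {
        "coordinating_and_primary_limit_bytes": base,
        # Replica operations are allowed 1.5 times the base limit.
        "replica_limit_bytes": base * 3 // 2,
    }


limits = indexing_pressure_limits(30 * GIB)
print(limits["coordinating_and_primary_limit_bytes"] / GIB)  # 3.0
print(limits["replica_limit_bytes"] / GIB)                   # 4.5
```

Compare the result with limit_in_bytes from _nodes/stats/indexing_pressure; a mismatch suggests the limit was changed from its default on that node.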

The critical distinction: indexing pressure rejections can happen while the write thread pool still has free queue slots and idle threads. If you see indexing pressure rejections with zero or low write pool rejections, the bottleneck is memory residency of in-flight writes, not disk I/O or thread starvation. Bumping thread_pool.write.queue_size will not help and will likely worsen memory pressure.
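As a rough sketch of that distinction, the function below labels a sampling interval from two numbers: the increase in indexing pressure rejections (all stages summed) and the increase in the write pool's rejected counter. The function name and return labels are invented for the example, and treating "low" as exactly zero is a simplifying assumption.

```python
def classify_write_rejections(indexing_pressure_delta: int, write_pool_rejected_delta: int) -> str:
    """Label an interval by which rejection counters moved (simplified: low == 0)."""
    memory = indexing_pressure_delta > 0
    queue = write_pool_rejected_delta > 0
    if memory and not queue:
        return "memory backpressure"   # in-flight bytes over the limit
    if queue and not memory:
        return "write queue saturation"  # thread pool queue is full
    if memory and queue:
        return "both"
    return "no rejections"


print(classify_write_rejections(120, 0))  # memory backpressure
print(classify_write_rejections(0, 35))   # write queue saturation
```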

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Oversized bulk requests | Spikes in current coordinating or primary bytes during bulk windows | _nodes/stats/indexing_pressure deltas aligned with client batch jobs |
| Sudden indexing surge | primary_rejections rising steadily across multiple data nodes | Per-node indexing rate compared to baseline |
| Replica recovery or catch-up | replica_rejections appear after a node restart or during relocation | _cat/recovery?v&active_only for active shard copies |
| Heap pressure from other consumers | Indexing pressure limit is 10% of heap, but total heap is sustained above 85% | _nodes/stats/jvm for heap_used_percent and old GC activity |
| Uneven coordinating load | Rejections concentrated on one node receiving most client traffic | _cat/nodes for uneven load or connection concentration |

Quick checks

# Check indexing pressure stats and rejection counters
curl -s 'http://localhost:9200/_nodes/stats/indexing_pressure?filter_path=nodes.*.indexing_pressure'

# Check write thread pool rejections to distinguish memory pressure from queue exhaustion
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,name,active,queue,rejected'

# Check JVM heap usage to see if total memory pressure is broad
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent'

# Check active shard recoveries (common source of replica pressure)
curl -s 'http://localhost:9200/_cat/recovery?v&active_only&h=index,shard,stage,source_host,target_host,bytes_percent'

# Check cluster health and node count to rule out primary unavailability
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,number_of_nodes,unassigned_shards'

# Check segment memory overhead, which competes for the same heap
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.memory,heap.percent'

How to diagnose it

  1. Identify the rejecting stage. Query _nodes/stats/indexing_pressure and compare cumulative coordinating_rejections, primary_rejections, and replica_rejections. If only coordinating rejections are increasing, the bottleneck is on the node receiving client traffic. If primary rejections are rising, the data nodes holding primaries are saturated. Replica rejections usually correlate with recovery or a node that is falling behind.

  2. Compare with write thread pool rejections. Use _cat/thread_pool/write. If indexing pressure rejections are nonzero while write pool rejections are zero or low, the problem is strictly in-flight memory. Do not increase the write queue size; that delays rejection but adds memory pressure.

  3. Correlate with heap usage. Pull _nodes/stats/jvm. If heap_used_percent is sustained above 75-85%, the node is under broad memory pressure. Indexing pressure is working as designed by rejecting early. The fix is to reduce memory demand, not to raise the indexing pressure limit.

  4. Check for recovery storms. Run _cat/recovery?v&active_only. Active recoveries generate replica replay traffic. If many shards are relocating or initializing, replica bytes can spike and hit the 1.5 times threshold even though normal ingest is moderate.

  5. Evaluate bulk sizing. Large bulk requests hold coordinating and primary bytes simultaneously until all items are processed. Check your client-side batch configuration. Reduce batch size and monitor whether current bytes and rejections drop.

  6. Inspect segment memory. High segment counts consume heap for metadata. Use _cat/nodes?v&h=name,segments.memory. If segment memory is growing, merges may be behind or shard count may be excessive, leaving less effective headroom for indexing.
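Steps 1 and 2 depend on deltas, because the rejection counters are cumulative. A minimal sketch, assuming you have saved two parsed JSON responses from _nodes/stats/indexing_pressure taken some interval apart; the helper names are illustrative and the sample data is made up.

```python
STAGES = ("coordinating_rejections", "primary_rejections", "replica_rejections")


def rejection_deltas(before: dict, after: dict) -> dict:
    """Per-node increase in each rejection counter between two stats responses."""
    deltas = {}
    for node_id, node in after.get("nodes", {}).items():
        new_total = node["indexing_pressure"]["memory"]["total"]
        old_node = before.get("nodes", {}).get(node_id, {})
        old_total = old_node.get("indexing_pressure", {}).get("memory", {}).get("total", {})
        deltas[node_id] = {
            # Counters restart from zero when a node restarts; clamp negatives to 0.
            stage: max(0, new_total.get(stage, 0) - old_total.get(stage, 0))
            for stage in STAGES
        }
    return deltas


def rejecting_stages(deltas: dict) -> dict:
    """Keep only nodes with at least one rising counter, listing the rising stages."""
    return {
        node_id: [stage for stage, delta in stages.items() if delta > 0]
        for node_id, stages in deltas.items()
        if any(stages.values())
    }


# Made-up sample: only the coordinating counter moved on node-a.
before = {"nodes": {"node-a": {"indexing_pressure": {"memory": {"total": {
    "coordinating_rejections": 10, "primary_rejections": 0, "replica_rejections": 0}}}}}}
after = {"nodes": {"node-a": {"indexing_pressure": {"memory": {"total": {
    "coordinating_rejections": 42, "primary_rejections": 0, "replica_rejections": 0}}}}}}

print(rejecting_stages(rejection_deltas(before, after)))
# {'node-a': ['coordinating_rejections']}
```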

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| coordinating_rejections | Front-door memory pressure on the coordinating node | Delta > 0 over two or more sampling intervals |
| primary_rejections | Primary shard in-flight memory saturated | Delta > 0 while indexing rate is sustained |
| replica_rejections | Replica stage overwhelmed, often during recovery | Delta > 0 correlated with active recovery |
| Current coordinating bytes | Real-time memory consumption at the coordinating stage | Sustained > 80% of limit_in_bytes |
| Current primary bytes | Real-time memory consumption at the primary stage | Sustained > 80% of limit_in_bytes |
| Current replica bytes | Real-time memory consumption at the replica stage | Sustained > 80% of the 1.5 times limit |
| limit_in_bytes | The actual threshold on the node | Verify it is 10% of heap_max |
| Write thread pool rejected | Distinguishes memory backpressure from queue exhaustion | Zero or low while indexing pressure rejections are high |
| JVM heap_used_percent | Total heap headroom | Sustained > 75% with indexing pressure near limit |
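The byte-level warning signs can be evaluated with a small helper. This sketch assumes the input is the indexing_pressure.memory object for one node from _nodes/stats/indexing_pressure. The function names, the 0.8 threshold default, and the two-sample definition of "sustained" are illustrative choices rather than anything Elasticsearch defines.

```python
def stage_utilization(memory: dict) -> dict:
    """Fraction of the applicable ceiling currently used at each stage."""
    limit = memory["limit_in_bytes"]
    current = memory["current"]
    return {
        "coordinating": current["coordinating_in_bytes"] / limit,
        "primary": current["primary_in_bytes"] / limit,
        # The replica stage is measured against 1.5 times the limit.
        "replica": current["replica_in_bytes"] / (1.5 * limit),
    }


def sustained(samples, threshold: float = 0.8, min_consecutive: int = 2) -> bool:
    """True if at least min_consecutive samples in a row exceed the threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up numbers for one node.
memory = {
    "limit_in_bytes": 1000,
    "current": {"coordinating_in_bytes": 850, "primary_in_bytes": 400, "replica_in_bytes": 600},
}
print(stage_utilization(memory)["coordinating"])  # 0.85
print(sustained([0.5, 0.85, 0.9]))                # True
```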

Fixes

Reduce bulk request size. The most common trigger is bulk batches that are too large. Each document in a bulk request consumes heap at the coordinating and primary stages until acknowledged. Cut batch size and monitor current bytes and rejections. This is safe and requires no cluster changes.
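One way to cap batch size is to bound each bulk request by serialized bytes rather than document count. The sketch below is client-side only and does not call Elasticsearch. batches_by_bytes and max_bytes are names chosen for the example, and the serialized JSON length is an approximation of what the node will account for, not the exact figure.

```python
import json
from typing import Iterable, Iterator, List


def batches_by_bytes(docs: Iterable[dict], max_bytes: int) -> Iterator[List[dict]]:
    """Group documents into batches whose serialized size stays under max_bytes."""
    batch, size = [], 0
    for doc in docs:
        doc_size = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before it would exceed the cap.
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch


docs = [{"msg": "x" * 100} for _ in range(5)]  # each serializes to 111 bytes
print([len(b) for b in batches_by_bytes(docs, max_bytes=250)])  # [2, 2, 1]
```

A document larger than max_bytes still goes out, alone in its own batch, so the cap is a target rather than a guarantee.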

Spread coordinating load. If rejections are isolated to one or two nodes, your clients may be targeting a single node. Distribute bulk traffic across multiple data nodes or use dedicated coordinating nodes. Check _cat/nodes for uneven CPU or connection distribution.

Let recovery finish. If replica_rejections spike during a rolling restart or node replacement, the replica stage is catching up. You can temporarily reduce client indexing rate, or simply wait for recovery to complete. Avoid restarting additional nodes while recoveries are active; that compounds replica pressure.

Address heap pressure root causes. If total heap is above 85% and indexing pressure is near its limit, the node is short on memory overall. Look for high segment memory, fielddata cache, or an oversized cluster state. Fixing these reduces competition for the heap and gives indexing pressure more effective headroom.

Raise the indexing pressure limit only as a last resort. The default is 10% of heap. Raising it reduces the safety margin before heap exhaustion, and it is not a dynamic change: indexing_pressure.memory.limit is a static node setting, so it must be set in elasticsearch.yml and takes effect only after a node restart. Only consider this if you have verified that heap is healthy (sustained below 75%), GC is clean, and the workload legitimately needs more in-flight memory. Adding nodes is usually safer.

Throttle non-critical indexing. If the workload includes reindexing jobs, log backfill, or batch imports, pause them until pressure drops. Unlike live traffic, batch jobs can usually be rescheduled without user impact.
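Batch jobs can also slow themselves down. Elasticsearch reports rejected writes to the client with HTTP status 429, so a job can back off and retry instead of resending immediately. The sketch below uses exponential backoff with jitter. Here send is a hypothetical callable that submits one bulk request and returns its HTTP status, and the retry count and delay values are arbitrary defaults to tune.

```python
import random
import time


def backoff_delays(max_retries, base_seconds=1.0, cap_seconds=60.0, rng=random.random):
    """Yield jittered delays whose ceiling doubles per attempt, up to cap_seconds."""
    for attempt in range(max_retries):
        ceiling = min(cap_seconds, base_seconds * (2 ** attempt))
        yield ceiling * rng()


def send_with_backoff(send, max_retries=5, base_seconds=1.0, cap_seconds=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() and retry after a delay while it keeps returning 429."""
    status = send()
    for delay in backoff_delays(max_retries, base_seconds, cap_seconds, rng):
        if status != 429:
            break
        sleep(delay)
        status = send()
    return status
```

When only some items in a bulk response are rejected, retry just those items so the retry does not add to the in-flight bytes that caused the rejection.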

Prevention

Monitor indexing pressure as a leading indicator. Thread pool rejections and circuit breaker trips are lagging indicators of pain. Indexing pressure rejections fire earlier. Track rejection counters and current byte levels, and alert on sustained deltas before clients complain.

Size bulk requests under load. Test bulk sizing against production-like document sizes and mapping complexity. Monitor indexing_pressure.memory.current values during load tests to find the inflection point where memory pressure rises nonlinearly.

Maintain heap headroom. Keep sustained heap below 75% and the post-GC floor below 50%. Indexing pressure cannot protect you if the heap is already full of segment metadata, fielddata, or cluster state bloat. See "Elasticsearch cluster state too large: field count, index count, and per-node heap" and "Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix".

Use ILM and force-merge for time-series data. Unmanaged time-series indices accumulate shards and segments. This grows segment metadata in heap and increases the baseline memory footprint, leaving less room for in-flight indexing. ILM rollover, shrink, and delete phases keep shard counts bounded.

Avoid restart storms. Restarting multiple nodes simultaneously creates a wave of replica recoveries. Each recovery increases replica-stage bytes. Stagger restarts and verify _cat/recovery is quiescent before proceeding to the next node.
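A rolling-restart script can gate each step on recovery being quiescent. In this sketch, fetch_rows is a hypothetical callable that returns the parsed JSON list from _cat/recovery?format=json&active_only=true. The poll interval and poll count are arbitrary defaults.

```python
import time


def active_recoveries(rows):
    """Recovery rows that have not reached the 'done' stage."""
    return [row for row in rows if row.get("stage") != "done"]


def wait_for_quiescent_recovery(fetch_rows, poll_seconds=10.0, max_polls=60, sleep=time.sleep):
    """Poll until no recovery is active; return False if max_polls is exhausted."""
    for _ in range(max_polls):
        if not active_recoveries(fetch_rows()):
            return True
        sleep(poll_seconds)
    return False
```

Restart the next node only after this returns True. A False result means recoveries were still running when polling stopped, which is a reason to pause the rollout and investigate.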

How Netdata helps

  • Netdata collects _nodes/stats/indexing_pressure and charts coordinating_rejections, primary_rejections, and replica_rejections per node, so you can identify the rejecting stage without running curl during an incident.
  • It correlates indexing pressure with JVM heap usage, write thread pool rejections, and disk I/O on the same dashboard, letting you distinguish memory-bound backpressure from disk-bound slowdown.
  • Historical per-node context shows whether rejections are steady-state capacity or a transient spike tied to a deployment or recovery event.
  • Alerts on heap pressure and thread pool rejections complement indexing pressure monitoring, giving you a layered view of memory saturation before it cascades to an outage.