Elasticsearch indexing pressure rejections: memory backpressure before heap failure

Bulk indexing clients report rejections while cluster health is green, disks are below the high watermark, and write thread pool queues are not saturated. Yet the nodes are pushing back. Pull _nodes/stats/indexing_pressure and you will see coordinating_rejections, primary_rejections, or replica_rejections climbing. This is the indexing pressure framework, introduced in Elasticsearch 7.9, enforcing memory-based backpressure. It tracks in-flight indexing bytes at the coordinating, primary, and replica stages. The default limit is 10% of the JVM heap for coordinating and primary work combined, and 1.5 times that limit for replica operations. Because it can fire before the write thread pool rejects anything, it indicates that the constraint is in-flight write memory rather than disk or CPU.

Unlike thread pool rejections, which signal queue saturation, indexing pressure rejections mean the node is protecting its heap from unbounded growth. Each bulk request holds memory until it is fully acknowledged. Large batches, sudden ingest surges, or replica recovery traffic can push one or more stages over the limit. The cluster is not broken; admission must slow down before the parent circuit breaker or an OOM kill intervenes.

What this means

Indexing pressure is admission control measured in bytes, not queue slots. When a document arrives, the coordinating node accounts for its size in an in-flight memory budget; the primary and later the replica do the same. If a stage’s accumulated bytes hit the node’s threshold, new operations at that stage are rejected. The counters roll up under _nodes/stats/indexing_pressure as cumulative rejection counts per stage.
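
To make the counter layout concrete, here is a minimal Python sketch that pulls the per-stage rejection counters out of a parsed _nodes/stats/indexing_pressure response. The sample payload is invented and trimmed to the fields used here, and `rejections_by_node` is a hypothetical helper name, not part of any client library.

```python
# Illustrative, trimmed response shape; the node ID and values are invented.
SAMPLE_STATS = {
    "nodes": {
        "node-a-id": {
            "name": "data-1",
            "indexing_pressure": {
                "memory": {
                    "current": {
                        "coordinating_in_bytes": 1_200_000,
                        "primary_in_bytes": 800_000,
                        "replica_in_bytes": 0,
                    },
                    "total": {
                        "coordinating_rejections": 12,
                        "primary_rejections": 0,
                        "replica_rejections": 3,
                    },
                    "limit_in_bytes": 3_221_225_472,
                }
            },
        }
    }
}

STAGES = ("coordinating", "primary", "replica")


def rejections_by_node(stats):
    """Map node name -> {stage: cumulative rejection count}."""
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        total = node["indexing_pressure"]["memory"]["total"]
        # "name" is absent when filter_path strips it, so fall back to the ID.
        name = node.get("name", node_id)
        result[name] = {s: total.get(f"{s}_rejections", 0) for s in STAGES}
    return result
```

The counters are cumulative since node start, so a single reading only shows that rejections have happened at some point; comparing two readings shows whether they are happening now.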

Because the limit is a percentage of heap, the absolute ceiling scales with node size while the mechanism stays the same. A node with a 30 GB heap gets a 3 GB default budget, and a single oversized bulk request or a wave of concurrent large documents can consume it quickly. Replica operations are allowed up to 1.5 times that limit (4.5 GB on the same node). This provides headroom for catch-up traffic without blocking the primary, but it is still a hard cap.
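
The arithmetic can be sketched in a few lines of Python. This assumes the default 10% limit and the 1.5 times replica allowance described above, and treats "GB" as GiB for simplicity; `indexing_pressure_budgets` and the `replica_limit_in_bytes` key are hypothetical names used only for illustration.

```python
GIB = 1024 ** 3


def indexing_pressure_budgets(heap_bytes, limit_percent=10):
    """Return the default in-flight byte budgets described in the text."""
    limit = heap_bytes * limit_percent // 100  # coordinating and primary work
    replica = limit * 3 // 2                   # replica stage: 1.5x the limit
    return {"limit_in_bytes": limit, "replica_limit_in_bytes": replica}


budgets = indexing_pressure_budgets(30 * GIB)
print(budgets["limit_in_bytes"] / GIB)          # 3.0
print(budgets["replica_limit_in_bytes"] / GIB)  # 4.5
```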

The critical distinction: indexing pressure rejections can happen while the write thread pool still has free queue slots and idle threads. If you see indexing pressure rejections with zero or low write pool rejections, the bottleneck is memory residency of in-flight writes, not disk I/O or thread starvation. Raising thread_pool.write.queue_size will not help and will likely worsen memory pressure, because a deeper queue keeps more requests resident in memory.
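
As a rough illustration of that distinction, the sketch below classifies one sampling window from two counter increases. The function name, the labels, and the `low_write_threshold` parameter (what counts as "low") are assumptions made for this example.

```python
def classify_write_rejections(pressure_delta, write_pool_delta, low_write_threshold=0):
    """Rough triage of which admission mechanism is pushing back.

    Both inputs are counter increases over the same sampling window.
    """
    pool_quiet = write_pool_delta <= low_write_threshold
    if pressure_delta > 0 and pool_quiet:
        return "in-flight memory (indexing pressure)"
    if pressure_delta == 0 and not pool_quiet:
        return "write queue saturation (thread pool)"
    if pressure_delta > 0:
        return "both memory and queue limits are being hit"
    return "no significant rejections in this window"
```

For example, `classify_write_rejections(40, 0)` returns the in-flight memory label, which is the case where enlarging the write queue would not help.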

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Oversized bulk requests | Spikes in current coordinating or primary bytes during bulk windows | `_nodes/stats/indexing_pressure` deltas aligned with client batch jobs |
| Sudden indexing surge | `primary_rejections` rising steadily across multiple data nodes | Per-node indexing rate compared to baseline |
| Replica recovery or catch-up | `replica_rejections` appear after a node restart or during relocation | `_cat/recovery?v&active_only` for active shard copies |
| Heap pressure from other consumers | Indexing pressure limit is 10% of heap, but total heap is sustained above 85% | `_nodes/stats/jvm` for `heap_used_percent` and old GC activity |
| Uneven coordinating load | Rejections concentrated on one node receiving most client traffic | `_cat/nodes` for uneven load or connection concentration |

Quick checks

# Check indexing pressure stats and rejection counters
curl -s 'http://localhost:9200/_nodes/stats/indexing_pressure?filter_path=nodes.*.indexing_pressure'

# Check write thread pool rejections to distinguish memory pressure from queue exhaustion
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,name,active,queue,rejected'

# Check JVM heap usage to see whether memory pressure is broad or confined to indexing
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent'

# Check active shard recoveries (common source of replica pressure)
curl -s 'http://localhost:9200/_cat/recovery?v&active_only&h=index,shard,stage,source_host,target_host,bytes_percent'

# Check cluster health and node count to rule out primary unavailability
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,number_of_nodes,unassigned_shards'

# Check segment memory overhead, which competes for the same heap
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.memory,heap.percent'

How to diagnose it

Identify the rejecting stage. Query _nodes/stats/indexing_pressure and compare cumulative coordinating_rejections, primary_rejections, and replica_rejections. If only coordinating rejections are increasing, the bottleneck is on the node receiving client traffic. If primary rejections are rising, the data nodes holding primaries are saturated. Replica rejections usually correlate with recovery or a node that is falling behind.
Compare with write thread pool rejections. Use _cat/thread_pool/write. If indexing pressure rejections are nonzero while write pool rejections are zero or low, the problem is strictly in-flight memory. Do not increase the write queue size; that delays rejection but adds memory pressure.
Correlate with heap usage. Pull _nodes/stats/jvm. If heap_used_percent is sustained above 75-85%, the node is under broad memory pressure. Indexing pressure is working as designed by rejecting early. The fix is to reduce memory demand, not to raise the indexing pressure limit.
Check for recovery storms. Run _cat/recovery?v&active_only. Active recoveries generate replica replay traffic. If many shards are relocating or initializing, replica bytes can spike and hit the 1.5 times threshold even though normal ingest is moderate.
Evaluate bulk sizing. Large bulk requests hold coordinating and primary bytes simultaneously until all items are processed. Check your client-side batch configuration. Reduce batch size and monitor whether current bytes and rejections drop.
Inspect segment memory. High segment counts consume heap for metadata. Use _cat/nodes?v&h=name,segments.memory. If segment memory is growing, merges may be behind or shard count may be excessive, leaving less effective headroom for indexing.
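
The first step above, identifying the rejecting stage, comes down to comparing two snapshots of the cumulative counters. A minimal sketch, assuming each snapshot is the `total` object for one node and using hypothetical helper names:

```python
def rejection_deltas(previous, current):
    """Per-stage increase between two cumulative counter snapshots."""
    deltas = {}
    for stage in ("coordinating", "primary", "replica"):
        key = f"{stage}_rejections"
        # Counters restart from zero when a node restarts; clamp so a
        # restart does not show up as a negative delta.
        deltas[stage] = max(0, current.get(key, 0) - previous.get(key, 0))
    return deltas


def rejecting_stages(previous, current):
    """Stages whose rejection counter grew, largest increase first."""
    deltas = rejection_deltas(previous, current)
    active = [(stage, delta) for stage, delta in deltas.items() if delta > 0]
    return sorted(active, key=lambda item: item[1], reverse=True)
```

With a previous snapshot of 10/5/0 and a current one of 40/5/7 (coordinating/primary/replica), `rejecting_stages` returns `[("coordinating", 30), ("replica", 7)]`, pointing first at the node receiving client traffic.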

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| `coordinating_rejections` | Front-door memory pressure on the coordinating node | Delta > 0 over two or more sampling intervals |
| `primary_rejections` | Primary shard in-flight memory saturated | Delta > 0 while indexing rate is sustained |
| `replica_rejections` | Replica stage overwhelmed, often during recovery | Delta > 0 correlated with active recovery |
| Current coordinating bytes | Real-time memory consumption at the coordinating stage | Sustained > 80% of `limit_in_bytes` |
| Current primary bytes | Real-time memory consumption at the primary stage | Sustained > 80% of `limit_in_bytes` |
| Current replica bytes | Real-time memory consumption at the replica stage | Sustained > 80% of the replica cap (1.5 times `limit_in_bytes`) |
| `limit_in_bytes` | The actual threshold on the node | Verify it is 10% of `heap_max_in_bytes` unless the limit was overridden |
| Write thread pool rejected | Distinguishes memory backpressure from queue exhaustion | Zero or low while indexing pressure rejections are high |
| JVM `heap_used_percent` | Total heap headroom | Sustained > 75% with indexing pressure near limit |
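
The byte-level warning signs in the table can be checked with a small helper. This is a sketch under the table's own thresholds (80% of the limit, and 1.5 times the limit for replicas); the function names are hypothetical, and a real alert would require the condition to hold across several samples rather than one.

```python
def stage_utilization(memory_stats):
    """Current bytes per stage as a fraction of that stage's cap.

    `memory_stats` is the `indexing_pressure.memory` object for one node.
    """
    limit = memory_stats["limit_in_bytes"]
    if limit <= 0:
        raise ValueError("limit_in_bytes must be positive")
    current = memory_stats["current"]
    replica_limit = limit * 3 // 2  # replica stage is capped at 1.5x the limit
    return {
        "coordinating": current["coordinating_in_bytes"] / limit,
        "primary": current["primary_in_bytes"] / limit,
        "replica": current["replica_in_bytes"] / replica_limit,
    }


def stages_over(memory_stats, threshold=0.80):
    """Names of stages whose utilization exceeds the threshold, sorted."""
    usage = stage_utilization(memory_stats)
    return sorted(stage for stage, used in usage.items() if used > threshold)
```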

Fixes

Reduce bulk request size. The most common trigger is bulk batches that are too large. Each document in a bulk request consumes heap at the coordinating and primary stages until acknowledged. Cut batch size and monitor current bytes and rejections. This is safe and requires no cluster changes.
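
One way to cut batch size on the client is to cap each batch by serialized bytes instead of document count, since the budget is measured in bytes. The sketch below is a generic batching helper, not a feature of any Elasticsearch client; `batches_by_bytes` is a hypothetical name, and it measures only the JSON source of each document, ignoring the bulk action lines.

```python
import json


def batches_by_bytes(docs, max_batch_bytes):
    """Yield lists of docs whose serialized size does not exceed the cap.

    A single doc larger than the cap is yielded alone rather than dropped.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        # +1 approximates the newline that terminates each bulk line.
        size = len(json.dumps(doc).encode("utf-8")) + 1
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded list would then be sent as one bulk request. Start with a conservative cap and raise it while watching current bytes and rejection deltas.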

Spread coordinating load. If rejections are isolated to one or two nodes, your clients may be targeting a single node. Distribute bulk traffic across multiple data nodes or use dedicated coordinating nodes. Check _cat/nodes for uneven CPU or connection distribution.

Let recovery finish. If replica_rejections spike during a rolling restart or node replacement, the replica stage is catching up. You can temporarily reduce client indexing rate, or simply wait for recovery to complete. Avoid restarting additional nodes while recoveries are active; that compounds replica pressure.

Address heap pressure root causes. If total heap is above 85% and indexing pressure is near its limit, the node is short on memory overall. Look for high segment memory, fielddata cache, or an oversized cluster state. Fixing these reduces competition for the heap and gives indexing pressure more effective headroom.

Raise the indexing pressure limit only as a last resort. The limit is controlled by indexing_pressure.memory.limit and defaults to 10% of heap. It is a static node setting, so changing it means editing elasticsearch.yml and restarting the node, and raising it reduces the safety margin before heap exhaustion. Only consider this if you have verified that heap is healthy (sustained below 75%), GC is clean, and the workload legitimately needs more in-flight memory. Adding nodes is usually safer.

Throttle non-critical indexing. If the workload includes reindexing jobs, log backfill, or batch imports, pause them until pressure drops. Unlike live traffic, batch jobs can usually be rescheduled without user impact.
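
For batch jobs that cannot simply be paused, a common way to slow admission is to retry only the rejected items with capped exponential backoff. The sketch below assumes that rejections reach the client as status 429, either for the whole request or per item inside a bulk response; the helper names and the default delays are illustrative choices, not recommendations from Elasticsearch.

```python
def backoff_delays(max_retries, base_seconds=1.0, cap_seconds=60.0):
    """Delay before each retry: base * 2**attempt, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * (2 ** attempt))
            for attempt in range(max_retries)]


def rejected_positions(bulk_response):
    """Positions of bulk items that came back with status 429."""
    positions = []
    for index, item in enumerate(bulk_response.get("items", [])):
        # Each item is a single-key dict such as {"index": {...}}.
        result = next(iter(item.values()))
        if result.get("status") == 429:
            positions.append(index)
    return positions
```

A job would resend only the documents at `rejected_positions(...)`, sleeping for the next value from `backoff_delays(...)` between attempts. Adding random jitter to each delay helps avoid many clients retrying at the same moment.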

Prevention

Monitor indexing pressure as a leading indicator. Thread pool rejections and circuit breaker trips are lagging indicators of pain. Indexing pressure rejections fire earlier. Track rejection counters and current byte levels, and alert on sustained deltas before clients complain.
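
A sustained-delta alert can be sketched as a check that a cumulative rejection counter rose in each of the most recent sampling intervals. The function name and the default of two intervals are assumptions for this example; in practice this logic would live in your monitoring system's alert rules.

```python
def sustained_rejections(counter_samples, min_intervals=2):
    """True if the counter rose in each of the last `min_intervals` intervals.

    `counter_samples` is a list of cumulative counter readings, oldest first.
    """
    if len(counter_samples) < min_intervals + 1:
        return False  # not enough samples to cover the required intervals
    recent = counter_samples[-(min_intervals + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

For example, readings of `[5, 5, 6, 9]` trigger the check because the counter rose in both of the last two intervals, while `[5, 6, 6]` does not.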

Size bulk requests under load. Test bulk sizing against production-like document sizes and mapping complexity. Monitor indexing_pressure.memory.current values during load tests to find the inflection point where memory pressure rises nonlinearly.

Maintain heap headroom. Keep sustained heap below 75% and the post-GC floor below 50%. Indexing pressure cannot protect you if the heap is already full of segment metadata, fielddata, or cluster state bloat. See the related guides "Elasticsearch cluster state too large: field count, index count, and per-node heap" and "Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix".

Use ILM and force-merge for time-series data. Unmanaged time-series indices accumulate shards and segments. This grows segment metadata in heap and increases the baseline memory footprint, leaving less room for in-flight indexing. ILM rollover, shrink, and delete phases keep shard counts bounded.

Avoid restart storms. Restarting multiple nodes simultaneously creates a wave of replica recoveries. Each recovery increases replica-stage bytes. Stagger restarts and verify _cat/recovery is quiescent before proceeding to the next node.
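
A rolling-restart script can gate each step on recovery state. The sketch below assumes the rows come from _cat/recovery requested with format=json and active_only=true, so each row is a dict with a `stage` field. The helper names are hypothetical, and requiring green cluster health is an extra assumption of this example, not something the check above demands.

```python
def recoveries_quiescent(recovery_rows):
    """True when no recovery is still in progress.

    With active_only the list is empty once recoveries have finished;
    any remaining row must already be in the "done" stage.
    """
    return all(row.get("stage") == "done" for row in recovery_rows)


def safe_to_restart_next(recovery_rows, cluster_status):
    """Gate for the next node restart (green health is an added assumption)."""
    return cluster_status == "green" and recoveries_quiescent(recovery_rows)
```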

How Netdata helps

Netdata collects _nodes/stats/indexing_pressure and charts coordinating_rejections, primary_rejections, and replica_rejections per node, so you can identify the rejecting stage without running curl during an incident.
It correlates indexing pressure with JVM heap usage, write thread pool rejections, and disk I/O on the same dashboard, letting you distinguish memory-bound backpressure from disk-bound slowdown.
Historical per-node context shows whether rejections are steady-state capacity or a transient spike tied to a deployment or recovery event.
Alerts on heap pressure and thread pool rejections complement indexing pressure monitoring, giving you a layered view of memory saturation before it cascades to an outage.
