Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes

When a search or indexing request returns HTTP 429 with CircuitBreakingException: [parent] Data too large, data for [<http_request>] would be [X], which is larger than the limit of [Y], the parent circuit breaker has rejected the operation. This is Elasticsearch protecting the JVM from an out-of-memory kill, not a client-side rate limit.

Since version 7.0, the parent breaker tracks real memory usage by default, so it can trip even when every individual child breaker is within its limit. When it does, the node is under genuine heap pressure. Determine quickly whether the cause is a single abusive query or structural memory exhaustion.

What this means

The parent circuit breaker guards total JVM heap consumption. When indices.breaker.total.use_real_memory is true (default since Elasticsearch 7.0), the breaker sums actual heap utilization plus the bytes the current operation would reserve. If the sum exceeds the limit, Elasticsearch rejects the operation with HTTP 429. The error message includes real usage, new bytes reserved, and a per-child breakdown inside usages[].

This breakdown identifies the subsystem consuming the most memory. Because the parent uses real memory, it can trip when fielddata or segment metadata fills the heap, even if no single request is large. The parent threshold (indices.breaker.total.limit) defaults to 95% of the JVM heap when real-memory tracking is enabled, and to 70% when it is disabled.
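
The check described above reduces to a single comparison. The following is a minimal sketch of that arithmetic, not Elasticsearch's internal code; the function name and the sample byte values are hypothetical, and limit_bytes is assumed to be the limit_size_in_bytes value reported for the parent breaker.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def parent_breaker_would_trip(heap_used_bytes: int,
                              new_bytes_reserved: int,
                              limit_bytes: int) -> bool:
    """Return True when real heap usage plus the new reservation exceeds the limit."""
    return heap_used_bytes + new_bytes_reserved > limit_bytes


# Hypothetical node: 8 GiB parent limit, 7.9 GiB of heap already in use.
heap_used = int(7.9 * GIB)
limit = 8 * GIB

# A 200 MiB request pushes the sum past the limit, a 50 MiB request does not.
print(parent_breaker_would_trip(heap_used, 200 * MIB, limit))
print(parent_breaker_would_trip(heap_used, 50 * MIB, limit))
```

Note that the request itself can be small: the rejection depends on the sum, so a node with a nearly full heap rejects even modest operations.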

The use_real_memory setting is static. Changing it requires a node restart, and disabling it masks real heap pressure. All other breaker limits are dynamic and can be updated via PUT /_cluster/settings. Do not raise the parent threshold as a routine fix; it is the last defense before cascading heap pressure failure.

flowchart TD
    A[Parent breaker trips<br/>HTTP 429] --> B{Check _nodes/stats/breaker}
    B -->|estimated_size near limit<br/>no single large usage| C[Systemic heap pressure]
    B -->|usages shows large fielddata| D[Fielddata cache load]
    B -->|Single request oversized| E[Abusive query or bulk]
    C --> F[Check segments.memory<br/>and shard count]
    D --> G[Clear fielddata cache<br/>fix text field mappings]
    E --> H[Cancel task via _tasks<br/>reduce batch size]

Common causes

Cause	What it looks like	First thing to check
Fielddata cache on text fields	Error `usages[]` shows a large `fielddata` component; aggregations or sorts on analyzed text fields	`GET /_nodes/stats/indices/fielddata?fields=*`
Oversized bulk or indexing batches	Trips during ingest spikes; `write` thread pool queues and rejections increase	`GET /_cat/thread_pool/write` and bulk request sizes
Runaway search or aggregation	Coordinating node heap spikes; single query reserves massive request structures	`GET /_tasks?detailed=true&actions=*search*`
Structural heap pressure	Post-GC heap floor rising across nodes; `segments.memory` growing steadily	`GET /_cat/nodes?h=name,segments.memory,heap.percent`

Quick checks

Run these read-only commands to triage without changing cluster state.

# Check which breaker tripped and by how much
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers'

# Check JVM heap percent
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'

# Check old GC collection count and time
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc.collectors.old'

# Check fielddata cache size and eviction churn
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,fielddata.memory_size,fielddata.evictions'

# Check segment metadata heap usage
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.memory'

# Check for expensive in-flight search tasks
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*'

# Check write thread pool rejections under ingest load
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected'

How to diagnose it

Confirm the breaker. Query /_nodes/stats/breaker and verify the parent breaker tripped count has incremented. Compare estimated_size_in_bytes to limit_size_in_bytes. Note the current limit: with real-memory tracking enabled the default is 95% of heap (70% when it is disabled), and if the limit was raised above its default, the cluster is running without adequate headroom. If estimated_size_in_bytes is consistently above 70% of limit_size_in_bytes, the node is operating with thin margins even when not actively tripping.
Read the error breakdown. If the exception usages[] shows a large fielddata component, the cause is text-field aggregations. If request is large, a single query is at fault. If both are small but real usage is high, the heap is consumed by segments or other long-lived structures.
Find the abusive task. Check /_tasks?detailed=true&actions=*search* for long-running searches. If one task dominates memory, note its task_id and cancel it. If no single task stands out, the pressure is systemic.
Inspect the heap floor. Look at heap.percent and old GC counts. If heap usage is sustained above 85% and old GC frequency is climbing, the node is entering a heap pressure death spiral. Check segments.memory and total shard count per node.
Correlate with workload. If the trip coincides with bulk ingest, check thread_pool.write queue and rejection rates. If it coincides with a new query deployment, review the slow log for expensive aggregations.
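
The first two steps can be sketched as a small classifier over one node's breakers object from /_nodes/stats/breaker. This is an illustration only: the function name and the 50% dominance threshold are assumptions, and the sample numbers are made up.

```python
def classify_trip(breakers: dict, dominant_share: float = 0.5) -> str:
    """Name the child breaker that dominates parent usage, or 'systemic'.

    `breakers` is one node's `breakers` object from /_nodes/stats/breaker.
    `dominant_share` is an assumed cut-off, not an Elasticsearch setting.
    """
    parent_used = breakers["parent"].get("estimated_size_in_bytes", 0)
    children = {
        name: stats.get("estimated_size_in_bytes", 0)
        for name, stats in breakers.items()
        if name != "parent"
    }
    if not children or parent_used <= 0:
        return "systemic"
    name, size = max(children.items(), key=lambda item: item[1])
    # A child counts as the cause only if it holds a large share of parent usage.
    if size / parent_used >= dominant_share:
        return name
    return "systemic"


# Hypothetical sample: fielddata holds two thirds of what the parent sees.
sample = {
    "parent": {"estimated_size_in_bytes": 900, "limit_size_in_bytes": 1000},
    "fielddata": {"estimated_size_in_bytes": 600},
    "request": {"estimated_size_in_bytes": 10},
}
print(classify_trip(sample))
```

A result of "systemic" corresponds to the case where the children are small but real usage is high, which points at segments or other long-lived structures.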

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`breakers.parent.tripped` (delta)	Count of parent rejections; any sustained delta means the node is near OOM	Delta > 0 per minute
`jvm.mem.heap_used_percent`	Current heap utilization; sustained elevation precedes breaker trips	Sustained >85%
Post-GC heap floor	Minimum heap after old GC; a rising floor means long-lived objects are accumulating	Trending upward over days
`fielddata.memory_size_in_bytes`	Text field data loaded into heap; should be minimal with doc_values	>25% of heap
`segments.memory`	Segment metadata lives in heap; grows with shard and segment count	Steady growth without index deletion
`thread_pool.write.rejected`	Write pool pushback; often precedes parent trips under ingest load	Sustained nonzero rate
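
The tripped counter is cumulative, so the useful signal is the change between two samples. A minimal sketch of that delta, assuming both arguments are full /_nodes/stats/breaker responses taken some interval apart; the function name and the restart handling are illustrative choices.

```python
def tripped_delta(previous: dict, current: dict, breaker: str = "parent") -> dict:
    """Return per-node increase in a breaker's tripped count between two samples."""
    deltas = {}
    for node_id, node in current.get("nodes", {}).items():
        now = node["breakers"][breaker]["tripped"]
        old_node = previous.get("nodes", {}).get(node_id)
        if old_node is None:
            continue  # node was not present in the earlier sample
        before = old_node["breakers"][breaker]["tripped"]
        # The counter restarts from zero when a node restarts; in that case
        # everything counted so far happened after the earlier sample.
        deltas[node_id] = now - before if now >= before else now
    return deltas


# Hypothetical samples: n1 tripped three more times, n2 restarted, n3 is new.
earlier = {"nodes": {"n1": {"breakers": {"parent": {"tripped": 4}}},
                     "n2": {"breakers": {"parent": {"tripped": 10}}}}}
later = {"nodes": {"n1": {"breakers": {"parent": {"tripped": 7}}},
                   "n2": {"breakers": {"parent": {"tripped": 2}}},
                   "n3": {"breakers": {"parent": {"tripped": 1}}}}}
print(tripped_delta(earlier, later))
```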

Fixes

Immediate relief for runaway queries

Find and cancel the abusive task:

# List active search tasks
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*&filter_path=nodes.*.tasks'
# Cancel by task ID
curl -X POST 'http://localhost:9200/_tasks/<task_id>/_cancel'

Cancel only the specific task. Mass cancellation requests can generate enough internal memory traffic to trip breakers on remote nodes.
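
To choose which task to cancel, it helps to rank the in-flight tasks. The sketch below sorts a /_tasks?detailed=true response by running time; the function name is hypothetical, and running time is only a proxy, since the tasks API does not report per-task memory.

```python
def longest_running_tasks(tasks_response: dict, top: int = 3) -> list:
    """Return (seconds_running, task_id, description) for the longest cancellable tasks."""
    rows = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if not task.get("cancellable", False):
                continue  # nothing to gain from listing tasks that cannot be cancelled
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            rows.append((seconds, task_id, task.get("description", "")))
    rows.sort(reverse=True)
    return rows[:top]


# Hypothetical response with two search tasks on one node.
response = {"nodes": {"nodeA": {"tasks": {
    "nodeA:12": {"action": "indices:data/read/search", "cancellable": True,
                 "running_time_in_nanos": 95_000_000_000,
                 "description": "indices[logs-*], search_type[QUERY_THEN_FETCH]"},
    "nodeA:30": {"action": "indices:data/read/search", "cancellable": True,
                 "running_time_in_nanos": 2_000_000_000,
                 "description": "indices[metrics], search_type[QUERY_THEN_FETCH]"},
}}}}
for seconds, task_id, description in longest_running_tasks(response):
    print(f"{seconds:.0f}s {task_id} {description}")
```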

Fielddata pressure

Clear the cache to recover headroom:

curl -X POST 'http://localhost:9200/_cache/clear?fielddata=true'

Expect temporary query latency degradation until hot data reloads.

Then fix the mapping. Use _nodes/stats/indices/fielddata?fields=* to identify offending text fields. Replace text-field aggregations and sorts with keyword sub-fields. The fielddata breaker defaults to 40% of heap, so on a 30 GB heap fielddata can grow to 12 GB before its own breaker trips. Once segment metadata, in-flight requests, and other heap usage are added on top, the parent breaker has little margin left. Any significant fielddata usage in modern Elasticsearch indicates a mapping problem.
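
Ranking fields by fielddata size shows which mappings to fix first. A minimal sketch, assuming the response of _nodes/stats/indices/fielddata?fields=* nests per-field sizes under nodes.<id>.indices.fielddata.fields; the function name and sample data are hypothetical.

```python
def top_fielddata_fields(stats: dict, top: int = 5) -> list:
    """Sum fielddata bytes per field across nodes and return the largest fields."""
    totals = {}
    for node in stats.get("nodes", {}).values():
        fields = node.get("indices", {}).get("fielddata", {}).get("fields", {})
        for name, info in fields.items():
            totals[name] = totals.get(name, 0) + info.get("memory_size_in_bytes", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical two-node response: "message" is loaded on both nodes.
stats = {"nodes": {
    "a": {"indices": {"fielddata": {"fields": {
        "message": {"memory_size_in_bytes": 300},
        "host": {"memory_size_in_bytes": 50}}}}},
    "b": {"indices": {"fielddata": {"fields": {
        "message": {"memory_size_in_bytes": 200}}}}},
}}
print(top_fielddata_fields(stats))
```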

Bulk indexing pressure

Reduce bulk batch size. A common starting point is 1,000 to 5,000 documents per batch, adjusted downward for large documents. Large in-flight requests add temporary heap pressure, and combined with existing segment or fielddata load, the parent can trip even for moderate batches. Also check for very large individual documents that inflate request overhead.
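
One way to keep batches bounded on the client side is to cap both document count and serialized size. The sketch below is a generic chunking helper, not part of any Elasticsearch client; the function name and the 5 MiB byte cap are assumptions chosen for illustration.

```python
import json
from typing import Iterable, Iterator, List


def chunk_bulk(docs: Iterable[dict],
               max_docs: int = 1000,
               max_bytes: int = 5 * 1024 * 1024) -> Iterator[List[dict]]:
    """Yield batches that stay under both a document count and a byte budget."""
    batch: List[dict] = []
    batch_bytes = 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before adding a document that would overflow it.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


# 2,500 small documents split on the count limit.
docs = [{"n": i} for i in range(2500)]
print([len(b) for b in chunk_bulk(docs, max_docs=1000)])
```

A single document larger than max_bytes still goes out alone in its own batch, which makes oversized documents easy to spot in client logs.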

Structural heap pressure

If segments.memory is large and shard count per node is high, reduce the shard burden:

Close or delete old indices.
Force-merge read-only indices to 1 segment only if they are no longer written and you accept the I/O cost.
Shrink indices with the shrink API.
Add data nodes to redistribute shards.

Check _cat/allocation to confirm per-node shard distribution before expanding the cluster.
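
The distribution check can be sketched as follows, assuming the rows come from _cat/allocation?format=json, where each row carries a node name and a shards count as a string. The function name and the returned fields are hypothetical.

```python
def shard_skew(allocation_rows: list) -> dict:
    """Summarize shards per node and identify the most loaded node."""
    counts = {
        row["node"]: int(row["shards"])
        for row in allocation_rows
        if row.get("node") != "UNASSIGNED"  # unassigned shards are not on any node
    }
    if not counts:
        return {"counts": {}, "mean": 0.0, "max_node": None}
    mean = sum(counts.values()) / len(counts)
    max_node = max(counts, key=counts.get)
    return {"counts": counts, "mean": mean, "max_node": max_node}


# Hypothetical cluster where one node carries most of the shards.
rows = [
    {"node": "data-1", "shards": "900"},
    {"node": "data-2", "shards": "150"},
    {"node": "data-3", "shards": "150"},
    {"node": "UNASSIGNED", "shards": "4"},
]
print(shard_skew(rows))
```

If one node sits far above the mean, rebalancing may relieve it without adding hardware; if all nodes are similarly loaded, the shard count itself is the problem.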

Do not disable real-memory tracking. Setting indices.breaker.total.use_real_memory: false masks heap pressure and leads to OOM kills. This is a static setting that requires a restart, so it is not a viable incident response.

Temporary dynamic tuning

You can raise child breaker limits dynamically to avoid false trips while fixing the root cause:

curl -X PUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": {
    "indices.breaker.fielddata.limit": "45%",
    "indices.breaker.request.limit": "65%"
  }
}'

Tradeoff: higher limits delay rejection but increase OOM risk. Revert these changes once the root cause is fixed. Do not raise the parent limit.
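
Because these overrides are meant to be temporary, it is worth generating the revert request at the same time as the override. The sketch below builds both request bodies for PUT /_cluster/settings; the helper names are hypothetical, and the revert relies on Elasticsearch resetting a cluster setting to its default when it is assigned null.

```python
import json


def breaker_override(limits: dict) -> str:
    """Build the request body that applies temporary breaker limits."""
    return json.dumps({"transient": limits}, indent=2)


def breaker_revert(limits: dict) -> str:
    """Build the request body that resets the same settings to their defaults."""
    return json.dumps({"transient": {key: None for key in limits}}, indent=2)


limits = {
    "indices.breaker.fielddata.limit": "45%",
    "indices.breaker.request.limit": "65%",
}
print(breaker_override(limits))
print(breaker_revert(limits))  # None is serialized as JSON null
```

Storing the revert body alongside the incident notes makes it harder to forget the raised limits after the root cause is fixed.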

Prevention

Monitor the post-GC heap floor, not just the peak. A rising floor is the best leading indicator of the heap pressure death spiral.
Keep fielddata near zero by using doc_values and keyword fields for aggregations and sorting.
Size bulk batches conservatively and monitor indexing pressure if you run Elasticsearch 7.9 or later.
Use ILM to delete or roll up old indices before shard count and segment overhead exhaust heap.
Keep JVM heap at or below 31 GB to stay within compressed OOPs, and leave at least 50% of system RAM for the OS page cache.
Alert on breakers.parent.estimated_size_in_bytes consistently above 70% of limit_size_in_bytes.
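
The last alert condition can be sketched as a check for a sustained run of high samples rather than a single spike. This is an illustration under stated assumptions: the function name is hypothetical, and requiring five consecutive samples is an arbitrary choice that should match your scrape interval.

```python
from typing import Iterable, Tuple


def sustained_above(samples: Iterable[Tuple[int, int]],
                    threshold: float = 0.70,
                    min_consecutive: int = 5) -> bool:
    """Return True if estimated/limit exceeds the threshold for enough samples in a row.

    Each sample is (estimated_size_in_bytes, limit_size_in_bytes) for the parent breaker.
    """
    run = 0
    for estimated, limit in samples:
        if limit > 0 and estimated / limit > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a single sample below the threshold breaks the streak
    return False


# Hypothetical series: steady 80% usage alerts, alternating 80%/60% does not.
steady = [(80, 100)] * 5
spiky = [(80, 100), (60, 100)] * 5
print(sustained_above(steady), sustained_above(spiky))
```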

How Netdata helps

Netdata correlates elasticsearch.breakers.parent.tripped with per-node JVM heap usage and old-generation GC pauses to distinguish a bad query from systemic pressure. Per-node charts for elasticsearch.fielddata.memory_size_in_bytes and elasticsearch.segments.memory_in_bytes identify the heap consumer without manual API sampling. Alert on jvm.mem.heap_used_percent and GC collection time for a leading indicator before the breaker fires. Thread pool rejection rates and shard counts per node surface gradual accumulation that eventually triggers trips.
