Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
When a search or indexing request returns HTTP 429 with CircuitBreakingException: [parent] Data too large, data for [<http_request>] would be [X], which is larger than the limit of [Y], the parent circuit breaker has rejected the operation. This is Elasticsearch protecting the JVM from an out-of-memory kill, not a client-side rate limit.
Since version 7.0, the parent breaker tracks real memory usage by default, so it can trip even when every child breaker is within its own limit. When it does, the node is under genuine heap pressure. The first job is to determine quickly whether the cause is a single abusive query or structural memory exhaustion.
What this means
The parent circuit breaker guards total JVM heap consumption. When indices.breaker.total.use_real_memory is true (default since Elasticsearch 7.0), the breaker sums actual heap utilization plus the bytes the current operation would reserve. If the sum exceeds the limit, Elasticsearch rejects the operation with HTTP 429. The error message includes real usage, new bytes reserved, and a per-child breakdown inside usages[].
This breakdown identifies the subsystem consuming the most memory. Because the parent uses real memory, it can trip when fielddata or segment metadata fills the heap, even if no single request is large. With real-memory tracking enabled, the parent limit (indices.breaker.total.limit) defaults to 95% of the JVM heap; it defaults to 70% only when real-memory tracking is disabled.
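The check described above can be sketched in a few lines. This is an illustration of the arithmetic only, not Elasticsearch's implementation; the function name and the limit_fraction parameter are hypothetical, and 0.95 is used purely as an example value for whatever indices.breaker.total.limit is on your cluster.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def parent_breaker_would_trip(real_usage_bytes, new_bytes_reserved,
                              heap_max_bytes, limit_fraction):
    """Return True if real heap usage plus the new reservation exceeds the limit.

    limit_fraction stands in for indices.breaker.total.limit (an assumption
    of this sketch; pass the value configured on your cluster).
    """
    limit_bytes = int(heap_max_bytes * limit_fraction)
    would_be = real_usage_bytes + new_bytes_reserved
    return would_be > limit_bytes


# A 30 GiB heap already holding 28.4 GiB: a modest 200 MiB request is rejected.
busy_node = parent_breaker_would_trip(int(28.4 * GIB), 200 * MIB, 30 * GIB, 0.95)
# The same request on a node holding 10 GiB passes comfortably.
idle_node = parent_breaker_would_trip(10 * GIB, 200 * MIB, 30 * GIB, 0.95)
print(busy_node, idle_node)
```

The point of the example is that the size of the rejected request says little on its own: the same 200 MiB reservation passes or fails depending on what already occupies the heap.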
The use_real_memory setting is static. Changing it requires a node restart, and disabling it masks real heap pressure. All other breaker limits are dynamic and can be updated via PUT /_cluster/settings. Do not raise the parent threshold as a routine fix; it is the last defense before cascading heap pressure failure.
flowchart TD
A[Parent breaker trips<br/>HTTP 429] --> B{Check _nodes/stats/breaker}
B -->|estimated_size near limit<br/>no single large usage| C[Systemic heap pressure]
B -->|usages shows large fielddata| D[Fielddata cache load]
B -->|Single request oversized| E[Abusive query or bulk]
C --> F[Check segments.memory<br/>and shard count]
D --> G[Clear fielddata cache<br/>fix text field mappings]
E --> H[Cancel task via _tasks<br/>reduce batch size]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Fielddata cache on text fields | Error usages[] shows a large fielddata component; aggregations or sorts on analyzed text fields | GET /_nodes/stats/indices/fielddata?fields=* |
| Oversized bulk or indexing batches | Trips during ingest spikes; write thread pool queues and rejections increase | GET /_cat/thread_pool/write and bulk request sizes |
| Runaway search or aggregation | Coordinating node heap spikes; single query reserves massive request structures | GET /_tasks?detailed=true&actions=*search* |
| Structural heap pressure | Post-GC heap floor rising across nodes; segments.memory growing steadily | GET /_cat/nodes?h=name,segments.memory,heap.percent |
Quick checks
Run these read-only commands to triage without changing cluster state.
# Check which breaker tripped and by how much
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers'
# Check JVM heap percent
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'
# Check old GC collection count and time
curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.gc.collectors.old'
# Check fielddata cache size and eviction churn
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,fielddata.memory_size,fielddata.evictions'
# Check segment metadata heap usage
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.memory'
# Check for expensive in-flight search tasks
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*'
# Check write thread pool rejections under ingest load
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected'
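If you capture the breaker stats as JSON, a short script can rank breakers by how close they sit to their limits. The sketch below assumes the response shape of the _nodes/stats/breaker call (nodes, then breakers, each with limit_size_in_bytes, estimated_size_in_bytes, and tripped); the sample payload and the function name are made up for illustration.

```python
import json

# Illustrative payload only; real responses contain more breakers and fields.
SAMPLE_STATS = json.loads("""
{
  "nodes": {
    "node-a": {
      "breakers": {
        "parent":    {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 900, "tripped": 3},
        "fielddata": {"limit_size_in_bytes": 400,  "estimated_size_in_bytes": 100, "tripped": 0}
      }
    }
  }
}
""")


def breaker_utilization(stats):
    """Return (node_id, breaker, used/limit ratio, tripped) rows, fullest first."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            used = breaker.get("estimated_size_in_bytes", 0)
            ratio = used / limit if limit > 0 else 0.0
            rows.append((node_id, name, round(ratio, 3), breaker.get("tripped", 0)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


rows = breaker_utilization(SAMPLE_STATS)
for row in rows:
    print(row)
```

To use it against a live node, save the output of the first curl command above to a file and load it with json.load instead of the embedded sample.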
How to diagnose it
1. Confirm the breaker. Query /_nodes/stats/breaker and verify the parent breaker tripped count has incremented. Compare estimated_size_in_bytes to limit_size_in_bytes. Note the current limit; if it was already raised above the default (95% of heap with real-memory tracking, 70% without), the cluster is running without adequate headroom. If the ratio is consistently above 70%, the node is operating with thin margins even when not actively tripping.
2. Read the error breakdown. If the exception usages[] shows a large fielddata component, the cause is text-field aggregations. If request is large, a single query is at fault. If both are small but real usage is high, the heap is consumed by segments or other long-lived structures.
3. Find the abusive task. Check /_tasks?detailed=true&actions=*search* for long-running searches. If one task dominates memory, note its task_id and cancel it. If no single task stands out, the pressure is systemic.
4. Inspect the heap floor. Look at heap.percent and old GC counts. If heap usage is sustained above 85% and old GC frequency is climbing, the node is entering a heap pressure death spiral. Check segments.memory and total shard count per node.
5. Correlate with workload. If the trip coincides with bulk ingest, check thread_pool.write queue and rejection rates. If it coincides with a new query deployment, review the slow log for expensive aggregations.
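Reading the error breakdown can be automated when the exception text is available in client logs. The sketch below parses a message of the form Elasticsearch emits (limit, real usage, new bytes reserved, usages) and applies the rule of thumb from the second step. The sample message is fabricated for illustration, the exact wording can vary by version, and the 25% cut-off for "large" is an arbitrary assumption, not an Elasticsearch constant.

```python
import re

# Fabricated example in the shape of a parent-breaker message.
SAMPLE_MESSAGE = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[1020054732/972.8mb], which is larger than the limit of "
    "[1000000000/953.6mb], real usage: [1010000000/963.2mb], "
    "new bytes reserved: [10054732/9.5mb], usages [request=0/0b, "
    "fielddata=650000000/619.8mb, in_flight_requests=10054732/9.5mb, "
    "accounting=2000000/1.9mb]"
)


def parse_parent_trip(message):
    """Extract byte counts from a parent-breaker exception message."""
    def grab(label):
        match = re.search(re.escape(label) + r"\s*\[(\d+)/", message)
        return int(match.group(1)) if match else None

    usages = {}
    match = re.search(r"usages \[(.*?)\]", message)
    if match:
        for part in match.group(1).split(","):
            name, _, value = part.strip().partition("=")
            usages[name] = int(value.split("/")[0])
    return {
        "limit": grab("limit of"),
        "real_usage": grab("real usage:"),
        "new_bytes": grab("new bytes reserved:"),
        "usages": usages,
    }


def classify_trip(parsed, large_fraction=0.25):
    """Map the child-breaker breakdown to a likely cause (heuristic only)."""
    limit = parsed["limit"]
    if not limit:
        return "unknown"
    usages = parsed["usages"]
    if usages.get("fielddata", 0) >= large_fraction * limit:
        return "fielddata"
    if usages.get("request", 0) >= large_fraction * limit:
        return "single-request"
    return "systemic"


parsed = parse_parent_trip(SAMPLE_MESSAGE)
print(classify_trip(parsed), parsed["usages"])
```

A "systemic" result here only means no child breaker stands out; confirming it still requires the heap-floor and shard-count checks from the later steps.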
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| breakers.parent.tripped (delta) | Count of parent rejections; any sustained delta means the node is near OOM | Delta > 0 per minute |
| jvm.mem.heap_used_percent | Current heap utilization; sustained elevation precedes breaker trips | Sustained >85% |
| Post-GC heap floor | Minimum heap after old GC; a rising floor means long-lived objects are accumulating | Trending upward over days |
| fielddata.memory_size_in_bytes | Text field data loaded into heap; should be minimal with doc_values | >25% of heap |
| segments.memory | Segment metadata lives in heap; grows with shard and segment count | Steady growth without index deletion |
| thread_pool.write.rejected | Write pool pushback; often precedes parent trips under ingest load | Sustained nonzero rate |
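The tripped counter is cumulative, so the useful signal is its change between two samples. A minimal sketch of that delta, assuming two snapshots of the _nodes/stats/breaker response taken a minute apart; the function name and sample data are hypothetical, and a counter that goes backwards is treated as a node restart.

```python
def parent_trip_deltas(previous, current):
    """Return {node_id: new parent-breaker trips since the previous sample}."""
    deltas = {}
    for node_id, node in current.get("nodes", {}).items():
        now = node["breakers"]["parent"]["tripped"]
        before = (
            previous.get("nodes", {})
            .get(node_id, {})
            .get("breakers", {})
            .get("parent", {})
            .get("tripped", 0)
        )
        # The counter resets to zero when a node restarts.
        deltas[node_id] = now - before if now >= before else now
    return deltas


def snapshot(tripped_by_node):
    """Build a minimal stats-shaped dict for the example."""
    return {
        "nodes": {
            node_id: {"breakers": {"parent": {"tripped": count}}}
            for node_id, count in tripped_by_node.items()
        }
    }


earlier = snapshot({"node-a": 10, "node-b": 4, "node-c": 7})
later = snapshot({"node-a": 13, "node-b": 4, "node-c": 2})
deltas = parent_trip_deltas(earlier, later)
alerting = sorted(node for node, delta in deltas.items() if delta > 0)
print(deltas, alerting)
```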
Fixes
Immediate relief for runaway queries
Find and cancel the abusive task:
# List active search tasks
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*&filter_path=nodes.*.tasks'
# Cancel by task ID
curl -X POST 'http://localhost:9200/_tasks/<task_id>/_cancel'
Cancel only the specific task. Mass cancellation requests can generate enough internal memory traffic to trip breakers on remote nodes.
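Picking the single task to cancel is easier with the task list sorted. The sketch below assumes the JSON shape of the _tasks response (nodes, then tasks keyed by task ID, each with running_time_in_nanos, cancellable, and description). It uses running time as a proxy for cost, which is an assumption: the tasks API does not report per-task heap usage. The function name, threshold, and sample data are illustrative.

```python
def slowest_search_tasks(tasks_response, min_seconds=30.0):
    """Return cancellable tasks running at least min_seconds, slowest first."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if not task.get("cancellable", False):
                continue
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            if seconds >= min_seconds:
                found.append((task_id, seconds, task.get("description", "")))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Illustrative payload only.
SAMPLE_TASKS = {
    "nodes": {
        "nodeA": {
            "tasks": {
                "nodeA:101": {
                    "action": "indices:data/read/search",
                    "description": "indices[logs-*], search_type[QUERY_THEN_FETCH]",
                    "running_time_in_nanos": 95_000_000_000,
                    "cancellable": True,
                },
                "nodeA:102": {
                    "action": "indices:data/read/search",
                    "description": "indices[metrics], search_type[QUERY_THEN_FETCH]",
                    "running_time_in_nanos": 2_000_000_000,
                    "cancellable": True,
                },
            }
        }
    }
}

candidates = slowest_search_tasks(SAMPLE_TASKS)
for task_id, seconds, description in candidates:
    print(f"POST /_tasks/{task_id}/_cancel  # {seconds:.0f}s  {description}")
```

The script only prints the cancel call for review; issuing it remains a deliberate manual step.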
Fielddata pressure
Clear the cache to recover headroom:
curl -X POST 'http://localhost:9200/_cache/clear?fielddata=true'
Expect temporary query latency degradation until hot data reloads.
Then fix the mapping. Use _nodes/stats/indices/fielddata?fields=* to identify offending text fields. Replace text-field aggregations and sorts with keyword sub-fields. The fielddata breaker default is 40% of heap, which is 12 GB on a 30 GB heap; fielddata near that level leaves the parent breaker little margin once segments and in-flight requests are added. Any significant fielddata usage in modern Elasticsearch indicates a mapping problem.
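As an illustration of the mapping fix, the sketch below builds a mapping with a keyword sub-field and an aggregation that targets it, so the terms aggregation reads doc_values instead of loading fielddata. The index name articles and the field name title are hypothetical, and ignore_above is set to 256 only as a common choice.

```python
import json

# Mapping body for a hypothetical index: full-text search on "title",
# aggregations and sorting on "title.keyword".
mapping_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
            }
        }
    }
}

# Aggregate on the keyword sub-field rather than the analyzed text field.
aggregation_body = {
    "size": 0,
    "aggs": {
        "top_titles": {"terms": {"field": "title.keyword", "size": 10}}
    },
}

print("PUT /articles")
print(json.dumps(mapping_body, indent=2))
print("POST /articles/_search")
print(json.dumps(aggregation_body, indent=2))
```

Existing indices keep their old mapping for already-indexed documents, so applying a change like this generally means reindexing into a new index.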
Bulk indexing pressure
Reduce bulk batch size. Standard practice is 1,000 to 5,000 documents per batch. Large in-flight requests add temporary heap pressure; combined with existing segment or fielddata load, they can trip the parent breaker even at moderate batch sizes. Also check for very large individual documents that inflate request overhead.
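One way to keep batches bounded is to cap them by both document count and serialized size, so a few very large documents cannot inflate a request. This is a client-side sketch, not part of any Elasticsearch client library; the function name is hypothetical and the 5 MiB byte cap is an assumed value to tune for your cluster.

```python
import json


def chunk_bulk(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of docs, each within max_docs and roughly max_bytes of JSON."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Flush before adding a doc that would break either cap.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = [{"n": i} for i in range(2500)]
by_count = [len(batch) for batch in chunk_bulk(docs, max_docs=1000)]
by_bytes = [len(batch) for batch in chunk_bulk(docs[:5], max_bytes=20)]
print(by_count, by_bytes)
```

A single document larger than max_bytes still goes out alone in its own batch, which makes oversized documents easy to spot in request logs.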
Structural heap pressure
If segments.memory is large and shard count per node is high, reduce the shard burden:
- Close or delete old indices.
- Force-merge read-only indices to 1 segment only if they are no longer written and you accept the I/O cost.
- Shrink indices with the shrink API.
- Add data nodes to redistribute shards.
Check _cat/allocation to confirm per-node shard distribution before expanding the cluster.
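To spot uneven shard distribution, the allocation output can be compared against the cluster mean. The sketch assumes rows from _cat/allocation?format=json, where each row carries node and shards as strings and unassigned shards appear under the node name UNASSIGNED. The function name and the 1.5x tolerance are illustrative assumptions.

```python
def overloaded_nodes(allocation_rows, tolerance=1.5):
    """Return {node: shard_count} for nodes above tolerance x the mean count."""
    counts = {
        row["node"]: int(row["shards"])
        for row in allocation_rows
        if row.get("node") != "UNASSIGNED"
    }
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {node: count for node, count in counts.items() if count > tolerance * mean}


# Illustrative rows only; real output has additional disk columns.
SAMPLE_ALLOCATION = [
    {"node": "node-1", "shards": "900"},
    {"node": "node-2", "shards": "300"},
    {"node": "node-3", "shards": "300"},
    {"node": "UNASSIGNED", "shards": "12"},
]

skewed = overloaded_nodes(SAMPLE_ALLOCATION)
print(skewed)
```

If every node is evenly loaded but the absolute count is high, rebalancing will not help and the options in the list above (deleting, shrinking, or adding nodes) apply instead.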
Do not disable real-memory tracking. Setting indices.breaker.total.use_real_memory: false masks heap pressure and leads to OOM kills. This is a static setting that requires a restart, so it is not a viable incident response.
Temporary dynamic tuning
You can raise child breaker limits dynamically to avoid false trips while fixing the root cause:
curl -X PUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
"transient": {
"indices.breaker.fielddata.limit": "45%",
"indices.breaker.request.limit": "65%"
}
}'
Tradeoff: higher limits delay rejection but increase OOM risk. Revert these changes once the root cause is fixed. Do not raise the parent limit.
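Reverting is done by sending the same settings with null values, which resets each one to its default. A small sketch that builds that request body for PUT /_cluster/settings; the helper name is hypothetical.

```python
import json


def revert_transient(setting_names):
    """Build a cluster-settings body that resets the given transient settings."""
    # A JSON null value removes the override and restores the default.
    return {"transient": {name: None for name in setting_names}}


revert_body = revert_transient([
    "indices.breaker.fielddata.limit",
    "indices.breaker.request.limit",
])
revert_json = json.dumps(revert_body)
print(revert_json)
```

Afterwards, GET /_cluster/settings should no longer list the two breaker limits under transient.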
Prevention
- Monitor the post-GC heap floor, not just the peak. A rising floor is the best leading indicator of the heap pressure death spiral.
- Keep fielddata near zero by using doc_values and keyword fields for aggregations and sorting.
- Size bulk batches conservatively and monitor indexing pressure if you run Elasticsearch 7.9 or later.
- Use ILM to delete or roll up old indices before shard count and segment overhead exhaust heap.
- Keep JVM heap at or below 31 GB to stay within compressed OOPs, and leave at least 50% of system RAM for the OS page cache.
- Alert on breakers.parent.estimated_size_in_bytes consistently above 70% of limit_size_in_bytes.
How Netdata helps
Netdata correlates elasticsearch.breakers.parent.tripped with per-node JVM heap usage and old-generation GC pauses to distinguish a bad query from systemic pressure. Per-node charts for elasticsearch.fielddata.memory_size_in_bytes and elasticsearch.segments.memory_in_bytes identify the heap consumer without manual API sampling. Alert on jvm.mem.heap_used_percent and GC collection time for a leading indicator before the breaker fires. Thread pool rejection rates and shard counts per node surface gradual accumulation that eventually triggers trips.







