Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix

Queries return CircuitBreakingException: [fielddata] Data too large... and HTTP 429s while JVM heap on one or more data nodes climbs toward the breaker limit. The node rejects queries to protect itself before OOM. This almost always means a query is aggregating, sorting, or scripting against an analyzed text field that lacks a keyword sub-field, forcing Elasticsearch to load an expensive fielddata cache into heap.

Analyzed text fields are tokenized and optimized for full-text search, not columnar operations such as terms aggregations or sorting. Running these against a raw text field forces Elasticsearch to uninvert the inverted index per segment into an in-memory structure called fielddata. This cache consumes JVM heap proportional to unique term cardinality. On large indices it approaches the fielddata circuit breaker limit, which defaults to 40% of the JVM heap.

The correct fix is not to raise the breaker or enable fielddata on the text field. It is to use a keyword sub-field, which stores values in doc_values off-heap by default, and update the query to target that sub-field. In a healthy cluster, fielddata.memory_size_in_bytes should stay near zero and fielddata.evictions should be zero.

What this means

Text fields are analyzed at index time. The inverted index maps terms to documents, which is efficient for search but useless for aggregations that need a columnar view. keyword, numeric, date, and other structured types use doc_values by default. doc_values live on disk and are accessed outside the JVM heap, so aggregations and sorts are memory-efficient.

Elasticsearch does not use doc_values on text fields because the tokenized representation is not useful for most aggregations. When a query aggregates or sorts on a text field, Elasticsearch builds fielddata on the fly by uninverting the inverted index per segment. The cache lives in the JVM heap until eviction or restart. It has no size cap by default and grows with unique term count, so the fielddata circuit breaker acts as the backstop: it rejects any operation that would push estimated fielddata past its limit (40% of the heap by default) rather than let the node run out of memory.
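To make the cost of uninverting concrete, here is a toy sketch in Python. It is an illustration only, not how Lucene builds fielddata; the `inverted_index` dictionary and the `uninvert` helper are hypothetical. The point it shows is that the per-document view must hold an entry for every term of every document, so its size follows the number of unique terms.

```python
from collections import defaultdict

def uninvert(inverted_index):
    """Turn a term -> doc ids mapping into a doc id -> terms mapping.

    Toy model of what an aggregation on a text field needs: a per-document
    view of terms, rebuilt in memory from the search-oriented index.
    """
    per_doc = defaultdict(list)
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            per_doc[doc_id].append(term)
    return {doc_id: sorted(terms) for doc_id, terms in per_doc.items()}

# Hypothetical analyzed "message" field across three documents.
inverted_index = {
    "connection": [0, 2],
    "refused": [0],
    "timeout": [1, 2],
}

columnar = uninvert(inverted_index)
print(columnar)
```

A keyword field skips this step because the per-document view is already written to disk as doc_values at index time.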

Any significant fielddata usage is a configuration or query bug. The breaker is working as intended. Give the query a structured field.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Terms aggregation or sort on a text field | The fielddata breaker trips when a specific dashboard or query runs; per-field stats show one text field consuming the majority of fielddata memory | Query source for aggs or sort on a text field name |
| Legacy mapping with fielddata: true | Fielddata loads successfully for a while on small indices, then trips the breaker as the corpus grows; evictions may appear before the trip | Index mappings and legacy templates for fielddata: true |
| Third-party tool or ad-hoc query | Breaker trips correlate with usage of a BI tool, security platform, or Kibana Discover sort on a message field | Slow log or application logs for unexpected aggregations on text fields |

Quick checks

Run these read-only commands to confirm the breaker type, locate the node, and identify the offending field.

# Check fielddata circuit breaker estimated size and trip count
curl -s 'http://localhost:9200/_nodes/stats/breaker?filter_path=nodes.*.breakers'

# Check total fielddata memory and evictions per node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,fielddata.memory_size,fielddata.evictions'

# Check per-field fielddata breakdown to identify the offender
curl -s 'http://localhost:9200/_nodes/stats/indices/fielddata?fields=*&filter_path=nodes.*.indices.fielddata.fields'

# Check JVM heap context to confirm memory pressure
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'

# Check for active searches that may be loading fielddata
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=*search*'
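The per-field breakdown can be long on a multi-node cluster. As a minimal sketch, the Python below totals the bytes per field across nodes and sorts them so the offender is first. It assumes the response has the shape nodes.<id>.indices.fielddata.fields.<field>.memory_size_in_bytes; the function name and the sample payload are hypothetical.

```python
def rank_fielddata_fields(stats):
    """Sum per-field fielddata bytes across all nodes, largest first."""
    totals = {}
    for node in stats.get("nodes", {}).values():
        fields = node.get("indices", {}).get("fielddata", {}).get("fields", {})
        for name, info in fields.items():
            totals[name] = totals.get(name, 0) + info.get("memory_size_in_bytes", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical response from the per-field fielddata stats call.
sample_stats = {
    "nodes": {
        "nodeA": {"indices": {"fielddata": {"fields": {
            "message": {"memory_size_in_bytes": 800},
            "host.name": {"memory_size_in_bytes": 100},
        }}}},
        "nodeB": {"indices": {"fielddata": {"fields": {
            "message": {"memory_size_in_bytes": 500},
            "description": {"memory_size_in_bytes": 50},
        }}}},
    }
}

ranked = rank_fielddata_fields(sample_stats)
for field, size in ranked:
    print(f"{field}: {size} bytes")
```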

How to diagnose it

  1. Confirm the breaker type. In the _nodes/stats/breaker output, look for the fielddata section. If tripped is incrementing and estimated_size_in_bytes is near limit_size_in_bytes, the fielddata breaker is the active constraint. Distinguish this from parent breaker trips, which indicate total heap pressure from multiple sources, or request breaker trips, which indicate a single query’s aggregation structures are too large.
  2. Find the node. Use _cat/nodes with fielddata.memory_size to see which data nodes are holding the cache. Fielddata is loaded per node, so the breaker may trip on one heavily queried node before others.
  3. Find the field. Use _nodes/stats/indices/fielddata?fields=* to retrieve the per-field breakdown. The field with a large byte count is the culprit. This is often a high-cardinality text field like message, description, or host.name if it was mapped as text without a sub-field.
  4. Find the query. Check application logs, the Elasticsearch slow log (index.search.slowlog.threshold.query.warn), or the _tasks output for long-running searches. Look for terms, significant_terms, cardinality, or sort clauses targeting the raw text field name.
  5. Inspect the mapping. Verify the field is type: text and check whether it lacks a keyword multi-field or, worse, has fielddata: true explicitly set. Fielddata is disabled by default on text fields, so a text field that is actually filling the cache usually has the flag set in its mapping or in a template. If the mapping was created by dynamic mapping on raw JSON, string fields usually default to text with a keyword sub-field, but legacy templates or manual mappings may omit the sub-field.
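The mapping inspection in the last step can be automated. The sketch below walks the "properties" block of a mapping and reports text fields that either set fielddata: true or have no keyword sub-field. It assumes you pass in the properties dictionary taken from a GET /<index>/_mapping response; `risky_text_fields` and the sample mapping are hypothetical.

```python
def risky_text_fields(properties, prefix=""):
    """Yield (path, reason) for text fields that can lead to fielddata loading."""
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "text":
            if spec.get("fielddata") is True:
                yield path, "fielddata: true"
            subfields = spec.get("fields", {})
            if not any(sub.get("type") == "keyword" for sub in subfields.values()):
                yield path, "no keyword sub-field"
        # Object and nested fields carry their own "properties" block.
        if "properties" in spec:
            yield from risky_text_fields(spec["properties"], path + ".")

# Hypothetical mapping properties for illustration.
sample_properties = {
    "message": {"type": "text"},
    "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "legacy": {
        "type": "text",
        "fielddata": True,
        "fields": {"keyword": {"type": "keyword"}},
    },
    "host": {"properties": {"name": {"type": "text"}}},
}

findings = list(risky_text_fields(sample_properties))
for path, reason in findings:
    print(f"{path}: {reason}")
```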

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| fielddata.memory_size_in_bytes | Direct measure of heap wasted on text-field term loading | Nonzero on a cluster using doc_values correctly |
| fielddata.evictions | Evictions mean the cache is churning under pressure and reloading data | Any nonzero value; should be zero |
| breakers.fielddata.tripped | Cumulative count of queries rejected to prevent OOM | Any delta > 0 over a monitoring interval |
| breakers.fielddata.estimated_size_in_bytes / limit_size_in_bytes | Proximity to the 40% heap default limit | Ratio consistently above 70% |
| jvm.mem.heap_used_percent | Fielddata pressure contributes to overall heap saturation | Sustained above 75% correlated with fielddata growth |
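Two of these signals need a small calculation: the tripped counter is cumulative, so only its delta matters, and the size signal is a ratio. A minimal sketch follows, assuming each sample is the breakers.fielddata object for one node from _nodes/stats/breaker. The function name is hypothetical, and a single sample above the threshold is treated as a warning here, whereas the table asks for a sustained ratio.

```python
def fielddata_breaker_warnings(previous, current, ratio_threshold=0.70):
    """Compare two samples of breakers.fielddata taken from the same node."""
    warnings = []
    # "tripped" is cumulative since node start, so compare against the last sample.
    tripped_delta = current["tripped"] - previous["tripped"]
    if tripped_delta > 0:
        warnings.append(f"breaker tripped {tripped_delta} time(s) since last sample")
    limit = current["limit_size_in_bytes"]
    ratio = current["estimated_size_in_bytes"] / limit if limit else 0.0
    if ratio > ratio_threshold:
        warnings.append(f"estimated size at {ratio:.0%} of limit")
    return warnings

# Hypothetical samples taken one monitoring interval apart.
earlier = {"tripped": 3, "estimated_size_in_bytes": 500, "limit_size_in_bytes": 1000}
later = {"tripped": 5, "estimated_size_in_bytes": 800, "limit_size_in_bytes": 1000}

alerts = fielddata_breaker_warnings(earlier, later)
print(alerts)
```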

Fixes

Add a keyword sub-field and update queries (canonical fix)

Update the index mapping to add a keyword sub-field:

PUT /my-index/_mapping
{
  "properties": {
    "my_field": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}

Change all aggregations, sorts, and scripts to use my_field.keyword instead of my_field. The keyword field uses doc_values by default, so these operations read columnar data from disk through the filesystem cache instead of loading fielddata into the JVM heap.

Tradeoffs: This mapping change only affects documents indexed after it is applied. Existing documents have no values in the new sub-field until you reindex them, re-run them through the update by query API, or roll over to a new index. You must also update every query, dashboard, and alert that targets the raw text field for aggregations. This is the only fix that removes the fielddata heap pressure permanently.
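If many saved queries need the same change, the rewrite can be scripted. The sketch below is one possible approach, not an Elasticsearch feature: it copies a search body and appends .keyword to the listed text fields wherever they appear as an aggregation "field" value or as a sort key. The helper name is hypothetical, it leaves the query clause alone so full-text matching still uses the analyzed field, and it would also rewrite "field" values nested inside filter aggregations, so review the output before deploying it.

```python
import copy

def retarget_aggs_and_sort(body, text_fields, suffix=".keyword"):
    """Return a copy of a search body with aggs and sort pointed at <field>.keyword."""
    body = copy.deepcopy(body)

    def fix_aggs(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "field" and isinstance(value, str) and value in text_fields:
                    node[key] = value + suffix
                else:
                    fix_aggs(value)
        elif isinstance(node, list):
            for item in node:
                fix_aggs(item)

    def fix_sort_entry(entry):
        if isinstance(entry, str):
            return entry + suffix if entry in text_fields else entry
        if isinstance(entry, dict):
            return {(k + suffix if k in text_fields else k): v for k, v in entry.items()}
        return entry

    for key in ("aggs", "aggregations"):
        if key in body:
            fix_aggs(body[key])
    if "sort" in body:
        sort = body["sort"]
        if isinstance(sort, list):
            body["sort"] = [fix_sort_entry(e) for e in sort]
        else:
            body["sort"] = fix_sort_entry(sort)
    return body

# Hypothetical dashboard query that aggregates and sorts on a raw text field.
original = {
    "query": {"match": {"message": "error"}},
    "aggs": {"top_messages": {"terms": {"field": "message"}}},
    "sort": [{"message": "asc"}, "_score"],
}
fixed = retarget_aggs_and_sort(original, {"message"})
print(fixed)
```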

Do not enable fielddata: true

Some legacy documentation suggests setting fielddata: true on the text field to suppress the error. Do not do this. It lets fielddata load into heap and grow without a cap until the breaker trips anyway. It turns a fast failure into a slow heap-pressure incident that can destabilize the node.

Lower the breaker limit to fail faster (temporary relief)

If you need to protect the node while deploying the mapping fix, temporarily lower indices.breaker.fielddata.limit to force earlier rejection:

PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "30%"
  }
}

Warning: This does not fix the root cause. It only reduces the blast radius by rejecting bad queries sooner, giving you runway to fix the mapping and reindex. Revert the setting after the fix is deployed.

Prevention

Dynamic templates with keyword sub-fields. Configure index templates to map strings as text with a keyword sub-field by default. Every text field then has a .keyword alternative without manual mapping changes.
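As a sketch of what such a template can contain, the Python below builds a request body intended for PUT /_index_template/<name>. The template key "strings_as_text_with_keyword", the helper name, and the "logs-*" pattern are placeholders; adjust them to your own naming.

```python
import json

def strings_as_text_with_keyword(index_patterns):
    """Build a composable index template body that maps strings as text + keyword."""
    return {
        "index_patterns": index_patterns,
        "template": {
            "mappings": {
                "dynamic_templates": [
                    {
                        "strings_as_text_with_keyword": {
                            # Applies to any JSON string seen by dynamic mapping.
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "text",
                                "fields": {"keyword": {"type": "keyword"}},
                            },
                        }
                    }
                ]
            }
        },
    }

template_body = strings_as_text_with_keyword(["logs-*"])
print(json.dumps(template_body, indent=2))
```

For very long strings you may want to add ignore_above to the keyword sub-field so oversized values are not indexed there.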

Explicit fielddata: false in templates. fielddata: false is already the default for text fields, but stating it explicitly in index templates where aggregations are never intended documents the intent and makes a later switch to fielddata: true stand out in review.

Query auditing in staging. Run dashboards, alerts, and ad-hoc queries through staging first. Aggregations or sorts on a raw text field should fail loudly in testing.

Fielddata monitoring in CI. Treat nonzero fielddata.memory_size in staging as a build-time error. A healthy cluster should show near-zero fielddata.
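A CI gate for this can be a few lines. The sketch below assumes the rows come from _cat/nodes?format=json&bytes=b&h=name,fielddata.memory_size,fielddata.evictions, which returns one object per node with string values. The function name and the max_bytes tolerance are hypothetical.

```python
def fielddata_gate(rows, max_bytes=0):
    """Return the names of nodes that exceed max_bytes of fielddata or report evictions."""
    offenders = []
    for row in rows:
        size = int(row["fielddata.memory_size"])
        evictions = int(row["fielddata.evictions"])
        if size > max_bytes or evictions > 0:
            offenders.append(row["name"])
    return offenders

# Hypothetical rows as returned by the cat API in JSON format.
staging_rows = [
    {"name": "data-1", "fielddata.memory_size": "0", "fielddata.evictions": "0"},
    {"name": "data-2", "fielddata.memory_size": "52428800", "fielddata.evictions": "0"},
    {"name": "data-3", "fielddata.memory_size": "0", "fielddata.evictions": "4"},
]

failing_nodes = fielddata_gate(staging_rows)
print(failing_nodes)
```

In a pipeline, a non-empty result would fail the build; raise max_bytes slightly if a strict zero proves too noisy in your staging cluster.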

How Netdata helps

  • Netdata collects elasticsearch.fielddata.memory_size_in_bytes per node. Use it to spot the offending node without running _cat/nodes.
  • Alert on elasticsearch.fielddata.evictions. Any nonzero value indicates cache churn.
  • Track elasticsearch.breaker_fielddata_tripped against elasticsearch.jvm_heap_used_percent to confirm heap pressure is driven by fielddata rather than segment metadata or cluster state bloat.
  • Per-node fielddata metrics distinguish a data-node hotspot from a coordinating-node bottleneck.
  • Correlate fielddata growth with search thread pool rejections to surface the problem before widespread query failures.