Elasticsearch refresh_interval tuning: search visibility vs indexing throughput

Elasticsearch makes newly indexed documents searchable within one second by default, through a periodic operation called refresh. Every refresh creates a new Lucene segment, invalidates per-segment query caches, and adds merge debt. During bulk loads, reindexing, or high-throughput indexing, the default interval becomes a throughput bottleneck. Tuning refresh_interval requires understanding its interaction with the write path, the merge scheduler, and the query cache.

In self-managed clusters, the default index.refresh_interval is one second, though recent versions implement lazy refresh: an index refreshes every second only if it received a search request in the last 30 seconds. If idle, auto-refresh stops until the next search arrives. Explicitly setting refresh_interval overrides lazy refresh and forces refreshes on the interval regardless of search traffic. In Elastic Cloud Serverless, the default is five seconds with a hard floor of five seconds (or -1). Setting the interval to -1 disables automatic refresh.
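The lazy-refresh rules above can be sketched as a small decision function. This is a toy model of the behavior described in this section, not Elasticsearch code: the function name and parameters are hypothetical, and the 30-second window is the idle period mentioned above.

```python
from typing import Optional

# Idle window described above; treated here as a fixed assumption.
SEARCH_IDLE_AFTER_S = 30.0


def periodic_refresh_runs(explicit_interval: Optional[str],
                          seconds_since_last_search: float) -> bool:
    """Toy model: does the scheduled refresh fire for this index?"""
    if explicit_interval == "-1":
        return False  # automatic refresh is disabled
    if explicit_interval is not None:
        return True   # an explicit interval opts out of lazy refresh
    # Unset (default) interval: refresh only while searches keep arriving.
    return seconds_since_last_search <= SEARCH_IDLE_AFTER_S
```

For example, an index with no explicit setting and no search for two minutes is skipped, while the same index with an explicit "1s" keeps refreshing every second.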

What it is and why it matters

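refresh_interval is an index-level dynamic setting that controls how often Elasticsearch executes a refresh. A refresh makes documents in the in-memory buffer searchable by writing them to a new Lucene segment. It does not, by itself, make them durable on disk, because the new segment is not yet part of a Lucene commit. Durability comes from the translog, which by default is fsynced before a write is acknowledged, and from flush, which writes a Lucene commit point and allows the translog to be trimmed. During incidents, operators often conflate refresh with durability. Until the next flush, a refreshed document survives a crash only through translog replay, and if translog durability is relaxed to async, recently indexed documents that are already searchable can still be lost. Do not treat searchable as synonymous with persisted.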

Because each refresh produces an immutable segment, the interval directly controls segment creation velocity. A one-second interval on a heavy indexing stream can generate thousands of small segments per hour. The merge scheduler consolidates them in the background, consuming disk I/O, CPU, and file descriptors. If merge throughput cannot keep up, segment count grows, search latency degrades, and heap pressure rises from segment metadata. Since Elasticsearch 7.7, a significant share of segment metadata has moved off-heap, reducing per-segment heap overhead, but high segment counts still increase file descriptor usage and search-path overhead.
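A back-of-the-envelope estimate makes the "thousands of small segments per hour" figure concrete. The sketch below assumes, for illustration only, that every refresh finds new documents and writes one segment per shard; concurrent indexing threads may produce more, and merges remove segments continuously, so treat the result as a creation rate rather than a live segment count.

```python
def refreshes_per_hour(interval_seconds: float) -> float:
    """Number of scheduled refreshes in one hour for a given interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive; -1 means no automatic refresh")
    return 3600.0 / interval_seconds


def new_segments_per_hour(interval_seconds: float, shards: int) -> float:
    """Assumed creation rate: one new segment per shard per refresh."""
    return refreshes_per_hour(interval_seconds) * shards
```

Under these assumptions, a single shard at one second creates 3600 segments per hour, while five shards at thirty seconds create 600 in total.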

Refresh invalidates the node query cache on a per-segment basis. The query cache stores filter results at the segment level; when a new segment appears, cached results for affected segments drop. On actively written indices, frequent refresh keeps the query cache cold, which manifests as higher search latency even when indexing throughput looks acceptable.

How it works

When a document arrives at a primary shard, the shard validates it, adds it to an in-memory buffer managed by Lucene’s IndexWriter, and records it in the translog. The document is not yet searchable. On refresh, Elasticsearch writes that buffer out as a new Lucene segment and makes it visible to the search path.

The following diagram shows the refresh decision gate in the write path.

flowchart TD
    A[Document write] --> B[In-memory buffer]
    B --> C{Auto refresh interval}
    C -->|Elapsed| D[New Lucene segment]
    C -->|Disabled -1| B
    D --> E[Searchable]
    D --> F[Query cache invalidated]
    D --> G[Background merge]
    B --> H[Flush]
    H --> I[Durable commit point]

Segments are immutable. To control proliferation, a background merge scheduler continuously combines small segments into larger ones. If refresh_interval is too low relative to indexing volume, segments accumulate faster than merges consolidate them. High segment counts degrade search performance because the query path must check each segment, and they increase resource consumption across the node.

When you need immediate search visibility for a specific operation, use the refresh query parameter on indexing or bulk requests. refresh=false is the default and cheapest: the document follows the normal interval. refresh=true forces an immediate synchronous refresh, creating a segment and making the document searchable before the call returns. refresh=wait_for blocks until the next periodic refresh occurs. If refresh_interval is set to -1, a wait_for request blocks indefinitely unless a manual refresh is triggered elsewhere. This is a common trap during bulk loads: operators disable auto-refresh to speed up ingestion, then issue writes with refresh=wait_for and wonder why the call hangs.
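The three parameter values and the wait_for trap can be captured in a small pre-flight check. This is a minimal sketch based only on the behavior described above; the function name and message wording are hypothetical, and a real client would read the index's current refresh_interval from its settings before calling it.

```python
REFRESH_PARAM_BEHAVIOR = {
    "false": "document becomes searchable at the next scheduled refresh",
    "true": "forces an immediate refresh before the call returns",
    "wait_for": "blocks until the next scheduled refresh makes the document visible",
}


def describe_write(refresh_param: str, refresh_interval: str) -> str:
    """Explain what a write with this refresh parameter will do."""
    if refresh_param not in REFRESH_PARAM_BEHAVIOR:
        raise ValueError(f"unknown refresh parameter: {refresh_param!r}")
    note = REFRESH_PARAM_BEHAVIOR[refresh_param]
    # The trap: nothing is scheduled when automatic refresh is disabled.
    if refresh_param == "wait_for" and refresh_interval.strip() == "-1":
        note += "; WARNING: refresh_interval is -1, so the call waits for a manual refresh"
    return note
```

Running this check in a bulk-load script before sending requests is cheaper than debugging a hung call afterward.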

Where it shows up in production

High-throughput log ingestion is the most common place where the default interval hurts. A hot logging index receiving tens of thousands of documents per second with a one-second refresh creates thousands of small segments per hour. Operators often misdiagnose the resulting slowdown as a disk throughput problem and provision faster storage. In reality, the bottleneck is segment metadata overhead and merge scheduling, not raw write bandwidth, so the cluster may appear I/O-bound when the real problem is excessive refresh. As the merge scheduler falls behind, the merge backlog grows, indexing latency rises, segment memory creeps up, and eventually search latency degrades even though indexing still succeeds.

Bulk reindexing and migration jobs are another hotspot. The indexing rate can plateau because the cluster spends cycles creating and merging segments every second, not because of CPU or disk limits. Set refresh_interval to -1 before the job and restore it after to maximize throughput. Forgetting to restore the interval is a frequent post-incident finding.

Rarely searched indices present the opposite case. Lazy refresh already skips auto-refresh when no recent searches have arrived. Explicitly setting a low refresh_interval on such an index opts out of that optimization and wastes resources forcing unneeded refreshes. If your index is rarely searched, leave refresh_interval unset and let lazy refresh work.

Mixed workloads on shared clusters require per-index tuning. A product catalog serving interactive search should not share the same refresh interval as a metrics index queried in aggregate every minute. Global tuning forces one workload to suffer for the other. Evaluate each index pattern independently based on read latency requirements and write volume.

Query cache thrashing is a subtler symptom. On an index with a stable query pattern and frequent refreshes, the query cache hit rate drops continuously. The cache invalidates on every refresh, so the working set never stays warm. This shows up as elevated search latency on an index that should be fast.

Tradeoffs and when to use it

Default one second (or Serverless five seconds): Use for general-purpose indices serving interactive search and moderate indexing. If search traffic is continuous, lazy refresh provides the expected behavior. Explicitly setting one second on an index with few searches opts out of lazy refresh and pays the segment cost for no benefit. In Serverless, attempting to set an interval below five seconds is rejected.

Extended interval (ten to thirty seconds): Use for write-heavy indices where search freshness tolerates a short delay. This dramatically reduces segment creation and merge pressure. It is a good middle ground for time-series data queried in aggregate rather than by exact last event. Many logging clusters benefit from moving hot indices to ten or thirty seconds during peak ingestion.

Disabled (-1): Use only for bulk loads, reindexing, or initial backfills. Set refresh_interval to -1 before the load starts, run the bulk job, then restore the interval afterward. Do not leave it at -1 permanently: searches return no new results and wait_for requests stall. After restoring the interval, issue a manual POST /index/_refresh if you need immediate visibility for the newly loaded documents. Verify the restoration with GET /index/_settings to avoid leaving the index invisible to search.
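The disable, load, restore, refresh, verify sequence is easy to get wrong when a job fails midway, so it helps to wrap it in a construct that always runs the restore step. The sketch below is one possible shape, not an official client API: `send` is a hypothetical transport callable of the form (method, path, body) that returns parsed JSON, and the verification assumes the settings response is keyed by index name.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, path, body_or_None) -> parsed JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


@contextmanager
def refresh_disabled(send: Transport, index: str,
                     restore_to: Optional[str] = None) -> Iterator[None]:
    """Disable automatic refresh while the body of the with-block runs.

    restore_to=None sends a JSON null, which resets the setting to its default.
    """
    send("PUT", f"/{index}/_settings", {"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Runs even if the load raises, so the index is not left at -1.
        send("PUT", f"/{index}/_settings",
             {"index": {"refresh_interval": restore_to}})
        # Make the loaded documents visible immediately.
        send("POST", f"/{index}/_refresh", None)
        # Verify the restoration instead of trusting the PUT.
        settings = send("GET", f"/{index}/_settings", None)
        for name, entry in settings.items():
            current = entry.get("settings", {}).get("index", {}).get("refresh_interval")
            if current == "-1":
                raise RuntimeError(f"{name} still has refresh_interval=-1")
```

Usage is `with refresh_disabled(send, "my-index", restore_to="30s"): run_bulk_load()`, where `run_bulk_load` stands in for whatever performs the ingestion.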

Explicit refresh on demand: For workflows that index a document and immediately read it back, prefer refresh=wait_for over lowering the index's refresh_interval. If you use this pattern heavily, batch documents into a single bulk request rather than chaining sequential wait_for calls. Sequential calls serialize throughput and create head-of-line blocking, because each one waits for its own refresh cycle.
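Batching for wait_for can be sketched as building one bulk request body instead of many single-document requests. The helper name below is hypothetical; the newline-delimited body, with an action line followed by a source line per document, follows the bulk API format, and the body is returned rather than sent so the example stays free of network code.

```python
import json
from typing import Iterable, Tuple


def build_bulk_request(index: str, docs: Iterable[dict]) -> Tuple[str, str]:
    """Return (path, ndjson_body) for one bulk call that waits for a single refresh."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    if not lines:
        raise ValueError("no documents to index")
    # The bulk body is newline-delimited and must end with a newline.
    return "/_bulk?refresh=wait_for", "\n".join(lines) + "\n"
```

All documents in the batch become visible on the same refresh, so the caller blocks once instead of once per document.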

Anti-patterns: Setting refresh_interval to a sub-second value like 500ms is rarely beneficial and accelerates segment explosion. Forgetting to restore the interval after setting it to -1 for a bulk job leaves newly indexed documents invisible to search. Using refresh=true on every bulk request in a high-throughput pipeline destroys indexing performance by forcing foreground refreshes. Do not drop the interval to 1ms to “fix” search lag; fix the query pattern or use explicit refresh instead.
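Some of these anti-patterns can be caught before a settings change is applied. The following is a simplified lint sketch under stated assumptions: the function names are hypothetical, the parser accepts only whole-number values with the common time-unit suffixes, and the warning texts paraphrase the guidance above.

```python
import re
from typing import List, Optional

_UNIT_SECONDS = {"nanos": 1e-9, "micros": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def interval_seconds(value: str) -> Optional[float]:
    """Parse values such as '500ms' or '30s'; return None for '-1' (disabled)."""
    text = value.strip()
    if text == "-1":
        return None
    match = re.fullmatch(r"(\d+)(nanos|micros|ms|s|m|h|d)", text)
    if match is None:
        raise ValueError(f"unrecognized refresh_interval: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def lint_refresh_interval(value: str) -> List[str]:
    """Return warnings for a proposed refresh_interval value."""
    seconds = interval_seconds(value)
    if seconds is None:
        return ["automatic refresh is disabled; restore the interval after the bulk load"]
    if seconds < 1.0:
        return ["sub-second interval: expect faster segment creation and more merge work"]
    return []
```

A check like this fits naturally in a deployment pipeline or a review script for index templates.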

Signals to watch in production

Signal	Why it matters	Warning sign
refresh.total_time_in_millis	Rising refresh time indicates disk I/O contention or too many shards refreshing concurrently.	Average time per refresh sustained >1s or 3x above hourly baseline.
Segment count per shard	Frequent refresh creates small segments. High counts degrade search and increase heap.	>100 segments per shard on active indices.
Merge activity (merges.current, merges.total_time_in_millis)	Merges combine segments. If merge time grows while segment count still rises, the scheduler is falling behind.	merges.current at max thread count for sustained periods.
Indexing latency	Segment creation and cache invalidation add overhead per document.	Sustained latency >2x baseline under constant load.
Query cache hit rate	Refresh invalidates per-segment caches.	Hit rate dropping on indices with stable query patterns.
segments.memory per node	Each segment carries metadata overhead.	Growing linearly with shard and segment count.

Correlate refresh time with write thread pool queue depth and indexing rate. If refresh cost rises while the write queue grows and indexing rate stays flat, the node spends cycles on segment management instead of indexing.
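Because refresh.total_time_in_millis is a cumulative counter, the refresh warning sign in the table only makes sense as a rate between two samples. The sketch below assumes you have already collected two readings of the refresh counters (total and total_time_in_millis) from the stats API; the class and function names are hypothetical, and the thresholds are the ones from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RefreshSample:
    total: int                 # cumulative number of refreshes
    total_time_in_millis: int  # cumulative time spent refreshing


def avg_refresh_ms(earlier: RefreshSample, later: RefreshSample) -> Optional[float]:
    """Average duration of the refreshes that happened between two samples."""
    count = later.total - earlier.total
    if count <= 0:
        return None  # no refreshes in the window, or counters were reset
    return (later.total_time_in_millis - earlier.total_time_in_millis) / count


def refresh_warning(avg_ms: float, baseline_ms: float) -> bool:
    """Apply the table's thresholds: over 1s, or over 3x the baseline."""
    return avg_ms > 1000.0 or avg_ms > 3.0 * baseline_ms
```

For instance, 60 refreshes taking 6000 ms in total between samples averages 100 ms each, which is below both thresholds against a 40 ms baseline.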

How Netdata helps

Correlate refresh.total_time_in_millis with per-node disk I/O wait and merge activity to distinguish refresh-induced saturation from generic disk pressure.
Track segment count and JVM heap trends on the same time axis to spot gradual buildup before a heap pressure cascade.
Alert on indexing latency deviations and write thread pool queue depth to catch throttling before rejections begin.
Monitor query cache hit rate drops alongside refresh rate spikes to identify indices where frequent refresh destroys cache efficiency.
Compare per-node segment memory to detect hot-spotting from uneven shard distribution.
Watch for divergence between indexing rate and merge rate; when merge rate flatlines while indexing climbs, the merge scheduler is falling behind.

