Elasticsearch slow search after restart: cold OS page cache and warmup
Restart an Elasticsearch node and searches that normally return in tens of milliseconds now take seconds. CPU stays low, disk read throughput spikes, and iowait climbs. This is not a failing disk, a runaway query, or a JVM heap problem. It is a cold OS page cache.
Elasticsearch relies on the OS filesystem cache to serve search requests from Lucene segment files. After any restart, that cache is empty. The kernel must read segments from disk into memory on demand. Until the working set is resident, queries incur disk I/O that should have been cache hits. Depending on the ratio of dataset size to available RAM, warmup can last minutes to hours.
Elasticsearch 7.0 introduced a second mechanism that produces nearly identical symptoms with a different fix. Shards that receive no search or GET request for index.search.idle.after (default 30s) enter an idle state. The next query triggers a synchronous refresh before executing, adding latency that can be mistaken for cache coldness. Telling the two apart determines whether you wait, tune a setting, or query differently.
What this means
Elasticsearch keeps index data out of the JVM heap. The heap holds segment metadata, query structures, and caches; the OS page cache holds the actual inverted indices and stored fields. This design makes the page cache the single most important resource for search performance.
After a restart, the page cache is empty. The first queries to each shard trigger on-demand disk reads as the kernel faults in the parts of each segment file the query touches. Because the query itself is not CPU-intensive, the node sits with low CPU and high disk wait until the kernel caches the relevant segment files. Latency can be 10-100x higher than normal during this window. Search is scatter-gather across target shards, so the slowest cold shard sets overall latency.
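For a rough sense of how long that window can last, divide the working set by the read throughput the disk can sustain. The sketch below is only a lower bound under stated assumptions: the function name and the example figures (200 GiB of hot segment data, 500 MiB/s of sustained reads) are illustrative, and real warmup is usually slower because query-driven reads do not stream the files end to end.

```shell
# Lower-bound warmup estimate: time to read the working set once from disk.
# $1 = working set in GiB, $2 = sustained read throughput in MiB/s (both illustrative).
estimate_warmup_seconds() (
  working_set_gib=$1
  read_mib_per_s=$2
  # Convert GiB to MiB, then integer-divide by throughput (shell arithmetic truncates).
  echo $(( working_set_gib * 1024 / read_mib_per_s ))
)

# Example: 200 GiB at 500 MiB/s
estimate_warmup_seconds 200 500
```

With those example numbers the estimate is 409 seconds, or roughly seven minutes, before random I/O and competing traffic are taken into account.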
The idle-shard refresh behavior is separate and per-shard. When a shard goes idle, Elasticsearch stops background refreshes to save indexing overhead. The next search must wait for a refresh to complete. This applies only to indices that leave index.refresh_interval at its default; an explicitly set refresh interval keeps background refreshes running. The behavior does not require a restart, but it is often first noticed after a restart when traffic resumes unevenly across shards.
flowchart TD
A[High search latency after restart] --> B{Low CPU and high disk reads?}
B -->|Yes| C[Cold OS page cache]
B -->|No| D{Slow log shows refresh wait?}
D -->|Yes| E[Idle shard refresh block]
D -->|No| F[Investigate queries, segments, heap]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Cold OS page cache after restart | Query and fetch latency elevated, low CPU, high disk I/O, no slow log entries | free -w or /proc/meminfo for cached memory, plus _cat/allocation for segment store size |
| Idle-shard refresh blocking (Elasticsearch 7.0+) | First search after idle period is slow; subsequent queries on the same shard are fast | GET /<index>/_settings/index.search.idle.after and slow logs |
| Excessive Linux readahead thrashing the cache | Sustained high disk throughput but poor cache residency, common on LVM or RAID | lsblk or blockdev readahead values |
| Insufficient RAM for the page cache | Chronic high latency even after warmup; node memory is too small for the working set | Compare total index size on node to available system memory after heap |
Quick checks
# OS page cache and memory layout
free -w
# Elasticsearch node-level OS memory stats
curl -s 'http://localhost:9200/_nodes/stats/os?filter_path=nodes.*.os.mem'
# Index settings that affect warmup and idle refresh (include_defaults also shows values left at their defaults)
curl -s 'http://localhost:9200/<index>/_settings?include_defaults=true&filter_path=**.index.store.preload,**.index.search.idle.after'
# Slow queries with refresh waits (path and format depend on log4j2 configuration)
grep "took" /var/log/elasticsearch/*_slowlog.log | tail -20
# Linux readahead settings for block devices
lsblk -o NAME,RA,MOUNTPOINT,TYPE,SIZE
# Segment store size per node to estimate working set
curl -s 'http://localhost:9200/_cat/allocation?v&h=node,disk.indices,disk.used,disk.total'
How to diagnose it
Confirm the signature. High search latency with low CPU and elevated disk reads points to I/O-bound cold data. If CPU is high, look elsewhere: expensive queries, merge storms, or segment explosion.
Check the slow log. Cold page cache misses do not appear in the slow log because the query plan and execution are fast; only the disk read is slow. If the slow log shows entries with refresh wait times, the cause is idle-shard synchronous refresh rather than cold cache.
Verify page cache headroom. On the node OS, compare cached memory (free -w or /proc/meminfo) to the total size of segment files the node holds. If the dataset is larger than RAM, the cache will churn. If the dataset fits but cached memory is low, another process may be consuming memory, or the heap may be oversized.
Inspect index.store.preload. If the index is configured to preload specific file extensions at open time, verify the list is narrow. Preloading too many files can evict hot data and degrade search performance when the cache cannot hold the preload plus the working set.
Inspect index.search.idle.after. The default is 30s. If your workload has natural idle periods and latency spikes consistently on the first query after each idle window, this setting is the culprit.
On Linux, verify block device readahead. Values much larger than 128 KiB can cause the kernel to read more data than necessary, polluting the page cache and delaying residency of the actual Lucene segments.
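The page cache headroom comparison from the steps above can be sketched as a small helper. This is a minimal sketch, assuming a Linux node with /proc/meminfo; the function names and verdict strings are illustrative. Note that the kernel's Cached figure covers every cached file on the host, not only Elasticsearch segments, so treat the result as an upper bound on how much index data is resident.

```shell
# Read one field (value in KiB) from /proc/meminfo, for example Cached or MemTotal.
meminfo_kib() {
  awk -v key="$1:" '$1 == key {print $2}' /proc/meminfo
}

# Classify headroom from three sizes in KiB:
# $1 = index data on disk, $2 = current page cache, $3 = total RAM.
cache_verdict() (
  index_kib=$1
  cache_kib=$2
  mem_total_kib=$3
  if [ "$index_kib" -gt "$mem_total_kib" ]; then
    echo "exceeds-ram"          # dataset cannot fit; expect cache churn
  elif [ "$cache_kib" -lt "$index_kib" ]; then
    echo "cache-below-index"    # still warming, or memory is held elsewhere
  else
    echo "cache-covers-index"   # cache is large enough to hold the index data
  fi
)

# Example call; the data path is an assumption and depends on your installation:
# cache_verdict "$(du -sk /var/lib/elasticsearch | cut -f1)" \
#   "$(meminfo_kib Cached)" "$(meminfo_kib MemTotal)"
```

A result of cache-below-index shortly after a restart is expected; if it persists long after warmup, check for other memory consumers or an oversized heap.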
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Search latency (query and fetch phases) | Primary user-facing symptom | Sustained >5x baseline after restart |
| OS CPU percent | Distinguishes CPU-bound from I/O-bound search | Low CPU despite high latency |
| Disk I/O wait and read throughput | Confirms disk-bound reads from cold cache | iowait >30% or sustained high read throughput |
| OS page cache / available memory | Measures cache adequacy for segment files; Elasticsearch does not expose page cache directly | Available memory + cache < working set size, or chronic swapping |
| index.search.idle.after | Idle-shard penalty affects first search after idle | Latency spikes correlated with idle periods >30s |
| Thread pool search queue | Queuing indicates pressure from slow shards | Queue depth growing while latency is high |
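The search queue signal in the last row can be read from the _cat/thread_pool API. The filter below is a sketch, assuming the columns are requested in the order node_name, queue, rejected; the function name and the threshold are illustrative, not recommended values.

```shell
# Read rows of "node_name queue rejected" on stdin and print the nodes
# whose search queue depth is above the limit given as $1.
queued_nodes() {
  awk -v limit="$1" '$2 + 0 > limit {print $1 " queue=" $2}'
}

# Example call against a local node:
# curl -s 'http://localhost:9200/_cat/thread_pool/search?h=node_name,queue,rejected' | queued_nodes 0
```

A non-empty result while CPU stays low is consistent with searches waiting on disk rather than on compute.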
Fixes
Cold page cache
Wait. If the cluster is green and the node is healthy, the cache will warm naturally as queries run. This is not a bug and does not require a restart or configuration change.
Run warm-up queries. Before returning a restarted node to the load balancer, issue representative searches against critical indices. Use the same query patterns your application uses so the relevant segment files are loaded into cache. This shifts the latency impact from users to the maintenance window. For large datasets, warmup can still take hours, so plan maintenance windows accordingly.
Use index.store.preload with caution. You can configure an index to eagerly load specific Lucene file types into the filesystem cache when the index is opened. For example, ["cfs", "dvm", "tim"]. However, preloading too many files on too many indices will evict hot data and make search slower if the cache cannot hold everything. Reserve this for small, critical indices.
Ensure adequate RAM. Elasticsearch recommends leaving at least half of available system memory for the filesystem cache. If the JVM heap consumes too much of the node RAM, the remaining space for the page cache shrinks and warmup becomes ineffective. Keep heap sized appropriately (typically no more than 26-30 GB).
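The warm-up query step above can be scripted along these lines. This is a minimal sketch, assuming the restarted node answers on http://localhost:9200, that representative query bodies are saved one JSON object per line in a file, and that the preference=_only_local search parameter is acceptable so that only shard copies on the node you are talking to are exercised. The function names, the ES_URL variable, and the file layout are illustrative.

```shell
# Base URL of the node being warmed; override through the environment if needed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the search URL for one index, restricted to shard copies on this node.
search_url() {
  printf '%s/%s/_search?preference=_only_local' "$ES_URL" "$1"
}

# Replay every query body in a file against an index.
# $1 = index name, $2 = file with one JSON query body per line.
# Responses are discarded; only the total request time is printed.
warm_up_index() {
  index=$1
  queries_file=$2
  while IFS= read -r body; do
    [ -n "$body" ] || continue   # skip blank lines
    curl -s -o /dev/null -w '%{time_total}s\n' \
      -H 'Content-Type: application/json' \
      -d "$body" "$(search_url "$index")"
  done < "$queries_file"
}
```

Calling warm_up_index my-index queries.ndjson (both names hypothetical) prints one timing per query. Timings that fall on a second pass suggest the relevant segment files are becoming resident. Shards with no copy on the local node will report failures under _only_local, which is harmless for this purpose.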
Idle-shard refresh blocking
Tune index.search.idle.after. The default 30s is appropriate for continuously searched indices. For batch or intermittently searched workloads, raising this value (for example, to 3600s) keeps background refreshes active longer, preventing the synchronous refresh penalty on the next query. The tradeoff is slightly higher indexing overhead. Do not raise this on always-on search paths unless you understand the indexing cost.
Keep traffic continuous. Sending a lightweight periodic search or GET request to critical shards prevents them from entering the idle state. This avoids the synchronous refresh penalty entirely.
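Because index.search.idle.after is a dynamic index setting, it can be changed on a live index through the update settings API. The helper below is a sketch under assumptions: the function names and the ES_URL variable are illustrative, and the index name and value are supplied by the caller.

```shell
# Base URL of any node in the cluster; override through the environment if needed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for the setting; $1 is a time value such as 3600s.
idle_after_body() {
  printf '{"index.search.idle.after":"%s"}' "$1"
}

# Apply the setting to one index. $1 = index name, $2 = time value.
set_idle_after() {
  curl -s -X PUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(idle_after_body "$2")"
}

# Example call (index name is hypothetical):
# set_idle_after my-index 3600s
```

Keeping the body builder separate makes it easy to inspect the exact payload before sending it to a production cluster.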
Linux readahead thrashing
Set readahead to 128 KiB. High readahead values, common on LVM, software RAID, or dm-crypt devices, cause the kernel to read more data than necessary. This pollutes the page cache and delays the loading of the actual Lucene segments.
# Check current readahead
blockdev --getra /dev/<device>
# Set to 128 KiB (256 sectors)
blockdev --setra 256 /dev/<device>
This change takes effect immediately but is lost on reboot. Persist it via udev rules or your init system.
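One way to persist the value is a udev rule that sets the block queue's read_ahead_kb attribute whenever the device appears. The sketch below only prints such a rule; the function names and the device pattern are illustrative, and the pattern must be adapted to your devices (for example dm-* for LVM volumes). Note the unit change: blockdev counts 512-byte sectors, while read_ahead_kb counts KiB.

```shell
# Convert a blockdev readahead value (512-byte sectors) to KiB.
sectors_to_kib() {
  echo $(( $1 * 512 / 1024 ))
}

# Print a udev rule that applies the readahead to matching block devices.
# $1 = kernel device name pattern, $2 = readahead in 512-byte sectors.
readahead_rule() {
  printf 'ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="%s", ATTR{queue/read_ahead_kb}="%s"\n' \
    "$1" "$(sectors_to_kib "$2")"
}

# Example: 256 sectors (128 KiB) for sda, sdb, and so on
readahead_rule 'sd[a-z]' 256
```

Redirect the output into a rules file under /etc/udev/rules.d/ (the file name is your choice), then run udevadm control --reload-rules and udevadm trigger to apply it without a reboot.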
Prevention
Warm-up procedure. After any restart, run a scripted warm-up pass against critical indices before returning the node to the load balancer. This moves the latency cost from production traffic into the maintenance window.
Right-size the heap. Keep the Elasticsearch JVM heap sized appropriately (typically no more than 26-30 GB) so the OS has sufficient remaining RAM for the page cache. An oversized heap starves the cache and makes cold-start recovery longer and less stable.
Monitor page cache headroom. Track OS-level cached memory and available memory alongside the total size of index data on the node. Elasticsearch does not report page cache usage directly. If the gap shrinks over time, you are approaching chronic cache pressure that will amplify any restart.
Avoid aggressive readahead. Configure Linux block devices with a 128 KiB readahead, especially under LVM or RAID. This prevents cache pollution during warmup and steady-state operation.
Tune idle-shard behavior for your workload. If your traffic pattern is naturally bursty and you observe refresh-related latency spikes, adjust index.search.idle.after proactively rather than reacting to user complaints.
How Netdata helps
- Correlate Elasticsearch search latency with per-disk iowait and read throughput to confirm a cold-cache bottleneck rather than CPU or heap pressure.
- Track system RAM and cached memory to verify page cache headroom after restarts.
- Alert on search latency spikes paired with low CPU, which points to cold page cache or idle-shard I/O waits.
- Visualize per-node disk I/O to distinguish warmup reads from merge or recovery traffic.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch authentication failures: audit logs, brute force, and credential drift
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster_block_exception: blocked by, the read-only blocks explained
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch cluster state too large: field count, index count, and per-node heap
- Elasticsearch disk full: emergency recovery and freeing space safely
- Elasticsearch disk I/O saturation: merges, fsync, and page-cache starvation
- Elasticsearch disk watermark cascade: from low watermark to cluster-wide read-only
- Elasticsearch document indexing failures: index_failed, bulk item errors, and version conflicts
- Elasticsearch EsRejectedExecutionException: write thread pool rejections and HTTP 429







