Elasticsearch too many segments per shard: search slowdown and force-merge
Searches that used to complete in tens of milliseconds are now taking hundreds. CPU on data nodes climbs without a traffic spike. Cluster health stays green, but query latency degrades and timeouts increase. The problem is invisible at the cluster level: shards have accumulated too many Lucene segments, and every query scans every segment in every relevant shard. When a shard holds more than roughly 100 segments, the background merge policy has fallen behind. Search slows, file descriptor usage rises, and merge I/O competes with indexing and queries. Since Elasticsearch 7.7, most segment metadata moved off-heap, so the node may not run out of JVM heap, but search performance degrades all the same.
What this means
A Lucene segment is an immutable chunk of an inverted index. During the write path, documents collect in an in-memory buffer; on refresh, Elasticsearch writes a new segment that is immediately searchable. On flush, segments become durable and the translog is truncated. Background merges combine small segments into larger ones, reclaim space from deleted documents, and improve search performance. Healthy shards typically hold 10 to 50 segments. When the count exceeds 100 per shard, merges are not keeping up with the creation rate.
Elasticsearch uses a scatter-gather model: the coordinating node broadcasts the query to one copy of every relevant shard. Each shard executes the query against its local segments and returns results. The slowest shard determines overall latency. When segment counts are high, every shard scan opens more files, iterates more data structures, and consumes more CPU. The node also holds more open file descriptors, which can approach system limits. Merge operations themselves consume disk I/O and CPU, so a merge backlog can saturate storage and slow indexing even further.
flowchart TD
A[High indexing rate or frequent refresh] --> B[Many small segments created]
C[Update-heavy workload] --> B
B --> D[Merge backlog]
D --> E[Segment count exceeds 100 per shard]
E --> F[Search latency increases]
E --> G[File descriptor pressure]
D --> H[Disk I/O saturation]
H --> I[Indexing slowdown]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Refresh interval too aggressive | Segment count rising on actively indexed indices; high refresh throughput | /_nodes/stats/indices/refresh for rising total time |
| Merge I/O saturation | Segment count growing despite steady load; merge time increasing | /_nodes/stats/indices/merges for current, plus OS iostat |
| Indexing burst outpacing merge threads | Many small segments; merges persistently active | /_nodes/stats/indices/merges for current and total |
| Update-heavy workload with deleted docs | Merges are large and slow; space not reclaimed after deletes | /_nodes/stats/indices/merges for total_size_in_bytes |
| Force merge blocking background merges | Segment count flat or rising while a force merge is running | /_nodes/stats/indices/merges current metric |
Quick checks
Read-only checks for segment density, merge health, and search impact.
# Segment counts per index, sorted by primary segment count descending
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size,pri.segments.count&s=pri.segments.count:desc' | head -20
# Node-level segment count and segment memory
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,segments.count,segments.memory'
# Node-level merge concurrency and cumulative merges
curl -s 'http://localhost:9200/_nodes/stats/indices/merges?filter_path=nodes.*.indices.merges.current,nodes.*.indices.merges.total'
# Detailed merge stats: current merges, total time, and size
curl -s 'http://localhost:9200/_nodes/stats/indices/merges?filter_path=nodes.*.indices.merges'
# Search latency: query and fetch phase cumulative counters
curl -s 'http://localhost:9200/_nodes/stats/indices/search?filter_path=nodes.*.indices.search.query_total,nodes.*.indices.search.query_time_in_millis,nodes.*.indices.search.fetch_total,nodes.*.indices.search.fetch_time_in_millis'
# Refresh and flush performance
curl -s 'http://localhost:9200/_nodes/stats/indices/refresh,flush?filter_path=nodes.*.indices.refresh,nodes.*.indices.flush'
# File descriptor utilization per node
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,file_desc.current,file_desc.max,file_desc.percent'
How to diagnose it
- Confirm segment counts per shard. Use
_cat/indiceswithpri.segments.count. Sort descending. Healthy shards show 10 to 50 segments. Anything above 100 suggests a merge backlog. - Check if merges are keeping up. Use
_nodes/stats/indices/merges. Ifcurrentis persistently non-zero and segment counts continue to rise, the node cannot consolidate segments faster than they are created. - Inspect merge throughput and cost. Use
_nodes/stats/indices/mergesto seetotal_time_in_millisandtotal_size_in_bytes. Rapid growth in merge time without a corresponding drop in segment count indicates I/O-bound merges. - Correlate with search latency. Sample
_nodes/stats/indices/searchtwice over a 30-second interval. Computedelta(query_time) / delta(query_total). Elevated query latency with stable query complexity points to segment scan overhead. - Evaluate refresh behavior. Check
_nodes/stats/indices/refreshfortotal_time_in_millis. If refresh time is high orrefresh_intervalis set very low, the node is creating segments faster than the merge policy can absorb them. - Check file descriptor pressure. Use
_cat/nodeswithfile_desc.percent. Each segment consists of multiple files. Segment explosion drives file descriptor growth, which can approach the per-process limit.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pri.segments.count per index | Direct measure of per-shard segment density | >100 per shard |
segments.count per node | Node-wide load and file descriptor pressure | Growing monotonically |
segments.memory per node | Memory consumed by segment-related structures | Sustained upward trend |
| Search query latency | User-facing impact of segment scan overhead | >2x baseline sustained |
merges.current | Merge concurrency saturation | Persistently non-zero while segment counts rise |
| Merge total time | Background I/O and CPU cost | Growing faster than segment count drops |
file_desc.percent | Segments consume multiple files per shard | >80% of system limit |
Fixes
Reduce refresh frequency on write-heavy indices
For indices still receiving writes, increase index.refresh_interval to 30s or higher during bulk loads. This reduces the creation rate of small segments and gives the TieredMergePolicy room to consolidate them. Restore 1s only after ingestion completes and segment counts normalize. Do not set refresh_interval: -1 on live production search indices unless you can tolerate delayed searchability.
Tune merge concurrency for storage type
If merges are falling behind on spinning disks, reduce index.merge.scheduler.max_thread_count to 1 to lower random I/O contention. On SSDs, the default scheduler concurrency is tuned for available cores. Raising the thread count above the default rarely helps and usually increases contention with indexing and search.
Force-merge read-only indices only
For time-series indices that are no longer written to, force merge to one segment per shard:
# WARNING: blocking and resource-intensive. Only for read-only indices.
POST /<index>/_forcemerge?max_num_segments=1
This can take hours on large shards and will spike CPU and disk I/O while it runs. Never run force merge on an index that is still receiving writes. On a live write index, force merge creates very large segments that the background merge policy will not efficiently recombine with future small segments. This wastes disk space and CPU and can trigger subsequent merge storms. After a force merge, new writes create new segments alongside the force-merged giant, leaving you with worse fragmentation than before.
Address over-sharding
If segment counts are high because shards are undersized, use the shrink API to reduce the primary shard count on read-only indices, or reindex into a new index with fewer, larger shards. Target shard sizes in the 10 to 50 GB range. Fewer shards means fewer total segments and lower fixed overhead.
Prevention
- Set
refresh_intervalto-1during large bulk imports and restore it afterward. - Use ILM to transition time-series indices to read-only and force-merge them automatically once writes stop.
- Monitor
pri.segments.countandsegments.countper node as leading indicators, not just lagging symptoms. - Size disks with merge headroom in mind. A large merge can temporarily require disk space for both old and new segments.
- Avoid dynamic index creation rates that outpace merge capacity. Consolidate time-series granularity where possible.
How Netdata helps
Netdata correlates Elasticsearch segment counts with per-node search latency and CPU to isolate segment pressure from query load. It tracks file descriptor utilization per node alongside segment growth, surfaces OS disk I/O wait alongside merge activity to distinguish merge saturation from generic disk pressure, and charts JVM heap and old GC behavior against segment memory trends. Alert on monotonically rising segment counts per node before query performance degrades.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch authentication failures: audit logs, brute force, and credential drift
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster_block_exception: blocked by, the read-only blocks explained
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch cluster state too large: field count, index count, and per-node heap
- Elasticsearch disk full: emergency recovery and freeing space safely
- Elasticsearch disk watermark cascade: from low watermark to cluster-wide read-only
- Elasticsearch document indexing failures: index_failed, bulk item errors, and version conflicts
- Elasticsearch EsRejectedExecutionException: write thread pool rejections and HTTP 429
- Elasticsearch exposed without authentication: open clusters and snapshot exfiltration







