ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool
When insert latency climbs and system.merges is empty while active parts grow, the background merge pool is likely stalled or starved. ClickHouse relies on background merges to consolidate immutable parts after each INSERT. Without merges, parts accumulate: query scans open more files, memory pressure shifts to the mark cache and file descriptor tables, and the system approaches the TOO_MANY_PARTS threshold.
A stalled pool is not always obvious. The server accepts queries, replication appears healthy, and the HTTP ping endpoint returns Ok.. The real signals are rising part counts per partition, insert delays, and a merge pool that is either fully occupied or idle. Because merges and mutations share the background pool, a heavy ALTER UPDATE or ALTER DELETE can silently monopolize the threads that should consolidate parts.
Distinguish a genuinely stuck merge from an oversubscribed pool, identify the root cause, and recover before inserts are rejected.
What this means
INSERTs create immutable data parts. ClickHouse schedules background merges to combine smaller parts into larger ones. Fewer parts means fewer file descriptors, faster index lookups, and lower query latency. This is the only automatic way to reduce part count.
When merges are not keeping up, either they run too slowly to offset new part creation, or they are blocked entirely. Both lead to rising active part counts, which eventually trigger DelayedInserts and then RejectedInserts.
The background pool is governed by background_merges_mutations_concurrency_ratio (default 2). This controls how many merge and mutation tasks run concurrently relative to the physical thread pool. When slots are saturated by slow operations, or when a merge cannot start because disk space is insufficient, the pool bottlenecks. Because merges are the only way to reduce part count, a stalled pool creates a one-way ratchet toward insert rejection.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Mutations monopolizing the pool | system.merges shows only is_mutation = 1; part count rises while mutations run | system.mutations for is_done = 0 entries and flat parts_to_do |
| Disk I/O saturation | Merge threads exist but elapsed grows while progress crawls; query latency also elevated | iostat -xz 1 5 and merge bytes/sec from system.part_log |
| Merge pool fully saturated | BackgroundMergesAndMutationsPoolTask at or near BackgroundMergesAndMutationsPoolSize | system.metrics for pool utilization ratio |
| Insufficient disk space for merge output | system.merges is empty or merges refuse to start; disk usage high | system.disks for free_space and unreserved_space |
| Hung individual merge | One merge in system.merges with identical progress across samples | Resample system.merges over 60 seconds |
Quick checks
Run these read-only checks in order.
Check what is actually running in the merge pool:
-- Active merges and mutations with resource usage
SELECT
database,
table,
elapsed,
progress,
num_parts,
is_mutation,
formatReadableSize(total_size_bytes_compressed) AS total_size,
formatReadableSize(memory_usage) AS mem_used
FROM system.merges
ORDER BY elapsed DESC;
If this returns zero rows while parts are actively growing, the pool is blocked.
Count merges versus mutations:
-- Distinguish regular merges from mutation tasks
SELECT
count(*) AS active_merges,
countIf(is_mutation = 1) AS mutations,
countIf(is_mutation = 0) AS regular_merges
FROM system.merges;
Check pool utilization directly:
-- Background pool active tasks vs configured size
SELECT metric, value
FROM system.metrics
WHERE metric LIKE '%Background%Pool%';
Check part count at the partition level, because limits are per-partition:
-- Part count per partition (the critical granularity)
SELECT
database,
table,
partition_id,
count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 20;
Check for insert backpressure:
-- Insert backpressure counters
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
Check for mutation backlog:
-- Pending mutations that may be consuming pool slots
SELECT
database,
table,
mutation_id,
command,
create_time,
is_done,
parts_to_do,
latest_fail_time,
latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
Check disk space across all configured volumes:
-- Free space including reservation accounting
SELECT
name,
path,
formatReadableSize(free_space) AS free,
formatReadableSize(total_space) AS total,
round(100 * (1 - free_space / total_space), 1) AS used_pct,
formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;
Check OS-level I/O latency:
# Device-level I/O latency and utilization
iostat -xz 1 5
Check recent merge throughput to see if completion rate has collapsed:
-- Merge completion trend over the last hour
SELECT
toStartOfMinute(event_time) AS minute,
count() AS merges_completed,
sum(rows) AS rows_merged,
avg(duration_ms) AS avg_merge_duration_ms
FROM system.part_log
WHERE event_type = 'MergeParts'
AND event_time > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;
How to diagnose it
- Confirm merges are absent or stuck. Query
system.merges. Zero rows while parts are growing means the pool is blocked. If rows exist, sampleprogressandelapsedtwice with a 60-second interval. Identicalprogressmeans a hung merge. - Determine whether mutations own the pool. If
system.mergesshowsis_mutation = 1for every task, querysystem.mutationsforis_done = 0. A mutation with flat or slowly decreasingparts_to_dois consuming slots without freeing them quickly. - Inspect pool utilization. In
system.metrics, compareBackgroundMergesAndMutationsPoolTaskagainstBackgroundMergesAndMutationsPoolSize. If the active task count is at the size limit, the pool is fully saturated. - Rule out disk space. Merges need temporary space to write the merged part before deleting sources. Check
system.disks. Iffree_spaceorunreserved_spaceis near zero, ClickHouse will not schedule new merges. - Rule out I/O saturation. Run
iostat -xz 1 5. Ifawaitis elevated on the data volume while merge throughput insystem.part_logis near zero, disk I/O is the bottleneck even though threads are allocated. - Measure insert pressure. Check
system.eventsforDelayedInserts. If this counter is climbing, the merge backlog is already throttling the write pipeline. - Check for orphaned merge tasks. If a single merge has occupied a slot for hours with no progress, and disk and I/O are healthy, the task may be stuck in the scheduler. Note the
result_part_namefromsystem.merges. Clearing it may require detaching and reattaching the table, which interrupts queries on that table.
flowchart TD
A[Parts count rising] --> B{system.merges empty?}
B -->|Yes| C[Check disk space and pool saturation]
B -->|No| D{Progress advancing?}
D -->|No| E[Hung merge]
D -->|Yes| F{Only mutations?}
F -->|Yes| G[Mutation monopoly]
F -->|No| H[Check I/O saturation and pool size]
C --> I[Disk full or pool blocked]
H --> J[Scale pool or reduce insert rate]Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Active parts per partition | parts_to_delay_insert and parts_to_throw_insert apply per partition, not per table | Count exceeds 50% of your configured delay threshold |
BackgroundMergesAndMutationsPoolTask vs BackgroundMergesAndMutationsPoolSize | Shows whether merge tasks are queuing because all slots are full | Ratio sustained above 0.9 for more than 10 minutes |
Merge progress in system.merges | A merge with static progress is hung and may block scheduling | No change across 60-second samples |
Mutation parts_to_do in system.mutations | Mutations rewrite entire parts and compete for pool slots | is_done = 0 with flat or barely decreasing parts_to_do |
Disk unreserved_space | Merges temporarily duplicate data during execution | Approaching zero on any data volume |
DelayedInserts / RejectedInserts | Direct evidence that merge backlog is constricting the write pipeline | Any non-zero RejectedInserts is critical |
Fixes
Kill blocking mutations
If system.merges is dominated by mutations and system.mutations shows a long-running ALTER UPDATE or ALTER DELETE, kill it with KILL MUTATION. This frees pool slots immediately and allows regular merges to resume. Warning: the data change is abandoned. Reissue it later when the system is healthy.
Throttle or pause inserts
If the pool is saturated because part creation exceeds merge throughput, reduce the insert rate from upstream or batch inserts into larger blocks. Tradeoff: data pipeline lag increases, but it stops the part count from compounding and gives merges runway to catch up.
Reclaim disk space
If system.disks shows low unreserved_space, merges cannot write their output. Identify large tables with system.parts, then detach old partitions with ALTER TABLE ... DETACH PARTITION to reclaim space immediately. Warning: detached data is not queryable until reattached. Ensure it is not needed for compliance or backfill before detaching.
Increase pool resources
If CPU and I/O headroom exist, increase background_merges_mutations_concurrency_ratio to allow more concurrent merge and mutation tasks. Tradeoff: more concurrent merges increase I/O and CPU load, which can degrade query latency and may saturate disk bandwidth further if I/O is already near limits.
Address hung merges
If a single merge has stuck progress and you have ruled out disk and I/O, the merge task may be hung. There is no direct KILL command for a stuck merge. Reducing load and, if necessary, detaching and reattaching the affected table may clear the scheduler. Warning: this interrupts queries on that table.
Prevention
- Monitor part count at the partition level, not just the table level. The per-partition limit is what triggers insert delays and rejections. Table-level aggregates hide hotspots.
- Track the ratio of part creation rate to merge completion rate. If part creation exceeds merges for more than a brief burst, investigate before parts accumulate.
- Alert on
system.mutationsqueue depth and duration. A mutation backlog is the most common silent cause of merge starvation. - Keep disk usage below 80-85% and ensure
unreserved_spaceremains well above the size of your largest active partition. Merges need headroom to write temporary output. - Review partition granularity. Over-partitioning multiplies part count across the same data volume and exhausts merge capacity faster than the same data in coarser partitions.
How Netdata helps
- Correlate
MaxPartCountForPartitionfromsystem.asynchronous_metricswithBackgroundMergesAndMutationsPoolTaskto see pool saturation and part growth without manual sampling. - Alert on
RejectedInsertsandDelayedInsertsfromsystem.eventsbefore the crisis becomes user-visible. - Visualize disk space alongside merge activity to catch the early signature of a merge death spiral: parts rising while disk space flattens or falls.
- Track per-partition part counts to surface hotspots that table-level aggregates hide.
- Monitor OS-level disk I/O latency and iowait to distinguish pool saturation from I/O starvation.
Related guides
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate: /guides/clickhouse/clickhouse-merge-death-spiral/
- ClickHouse monitoring checklist: the signals every production cluster needs: /guides/clickhouse/clickhouse-monitoring-checklist/
- ClickHouse monitoring maturity model: from survival to expert: /guides/clickhouse/clickhouse-monitoring-maturity-model/
- ClickHouse DB::Exception: Too many parts - causes and fixes: /guides/clickhouse/clickhouse-too-many-parts/
- How ClickHouse actually works in production: a mental model for operators: /guides/clickhouse/how-clickhouse-works-in-production/







