ClickHouse mutations silently blocking merges: the hidden cause of part growth
You are watching part counts climb on a ClickHouse node. system.merges shows active background tasks, health checks return Ok, and the log shows no mutation errors. Yet inserts are slowing and MaxPartCountForPartition is trending toward the delay threshold. The pool looks busy, so merges should be keeping up. They are not: some of those busy slots are mutations, and mutations starve merges silently.
ClickHouse implements ALTER UPDATE and ALTER DELETE as mutations: asynchronous background jobs that rewrite matching data parts. Mutations and regular merges share the same BackgroundMergesAndMutationsPool. A mutation holds one pool slot for minutes to hours. If the backlog is large enough, merges fall behind insert-driven part creation and the system drifts toward the Too many parts threshold without logging a mutation-specific error.
By default, a mutation starts only when at least number_of_free_entries_in_pool_to_execute_mutation slots in the pool are free. The default is commonly cited as 20 on a 32-slot pool . On a busy server with steady inserts, merges rarely free that many slots, so mutations sit while parts accumulate.
What this means
Every INSERT creates immutable data parts. Background merges consolidate smaller parts to keep query performance stable and avoid file descriptor exhaustion. ALTER UPDATE and ALTER DELETE are not in-place edits; they create mutation tasks that rewrite every matching part. Each mutation consumes one slot in the shared pool, runs to completion or until killed with KILL MUTATION, and survives server restarts.
The default pool size is background_pool_size (16) multiplied by background_merges_mutations_concurrency_ratio (2), giving 32 slots. On an active node, merges keep the pool occupied, mutations cannot start, and the queue grows invisible to operators who only watch part counts.
When a mutation finally runs, it monopolizes its slot. Regular merges are starved, inserts create new parts that are not consolidated, and part count climbs. system.merges shows active tasks, but the work is mutation, not merge.
flowchart TD
A[ALTER UPDATE/DELETE creates mutation] --> B[Mutation waits for pool slot]
B --> C[Mutation occupies pool slot]
C --> D[Regular merges starved]
D --> E[Parts accumulate from inserts]
E --> F[DelayedInserts then RejectedInserts]
C --> G[system.merges shows is_mutation=1]
G --> H[Operator misses the blockage]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
Heavy ALTER UPDATE/DELETE on a large table | system.mutations shows is_done = 0 with parts_to_do not decreasing | SELECT * FROM system.mutations WHERE is_done = 0 |
| Mutation queue backlog | Multiple mutations queued per table; each runs sequentially | SELECT database, table, count() FROM system.mutations WHERE is_done = 0 GROUP BY database, table |
| Pool slot monopolized by a long mutation | system.merges shows is_mutation = 1 with high elapsed | SELECT is_mutation, elapsed, progress FROM system.merges ORDER BY elapsed DESC |
| Replicated replica lagging on mutations | One replica has mutations stuck while others are caught up | Query system.mutations on each replica or use clusterAllReplicas |
| Insufficient pool headroom for burst workload | BackgroundMergesAndMutationsPoolTask near pool size even without mutations | SELECT metric, value FROM system.metrics WHERE metric LIKE 'Background%Pool%' |
Quick checks
-- Count incomplete mutations
SELECT count(*) AS running_mutations FROM system.mutations WHERE is_done = 0;
-- List active mutations with remaining work
SELECT database, table, mutation_id, command, create_time, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
-- Distinguish merges from mutations in the pool
SELECT
count(*) AS active_tasks,
countIf(is_mutation = 1) AS mutations,
countIf(is_mutation = 0) AS regular_merges
FROM system.merges;
-- Check pool utilization
SELECT metric, value FROM system.metrics WHERE metric LIKE 'Background%Pool%';
-- Find worst partition part counts
SELECT database, table, partition_id, count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 10;
-- Check if inserts are already suffering
SELECT event, value FROM system.events WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- Check for failed mutations that will never complete
SELECT database, table, mutation_id, latest_fail_reason
FROM system.mutations
WHERE latest_fail_reason != '';
-- Check mutation status across a cluster
SELECT hostName() AS replica, database, table, mutation_id, parts_to_do
FROM clusterAllReplicas('your_cluster_name', system.mutations)
WHERE is_done = 0
ORDER BY replica, create_time;
How to diagnose it
Confirm part count is growing. Query
system.partsfor active parts per partition twice, 60 seconds apart. If the worst partition is growing by more than a few parts per minute and regular merges are not completing, the merge subsystem is falling behind.Inspect what is running in the pool. Query
system.merges. Look foris_mutation = 1. If mutations account for most or all active tasks, they are the blockage. Noteelapsedandprogress. A mutation withprogressstuck for minutes is likely the culprit.Inspect the mutation queue. Query
system.mutationswhereis_done = 0. Checkparts_to_do. If it is flat or increasing over a 5-minute window, the mutation is not making progress. Checklatest_fail_reason; a non-empty value means the mutation is retrying and will not self-resolve.Measure pool saturation. Query
system.metricsforBackgroundMergesAndMutationsPoolTask. If this value is at or near the pool size, every slot is occupied. When mutations hold those slots, regular merges cannot enter the pool.Check insert health. Query
system.eventsforDelayedInsertsandRejectedInserts. Even small increments ofDelayedInsertsconfirm that part accumulation is already affecting the write pipeline.Verify cross-replica state for replicated tables. Run the
clusterAllReplicasquery againstsystem.mutations. If one replica is mutation-blocked while others are healthy, the issue is local to that replica’s background pool or disk.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
system.mutations.parts_to_do | Tracks remaining work per mutation | Flat or increasing over 30 minutes |
system.merges.is_mutation | Distinguishes mutation tasks from merges in the pool | Mutations dominate while regular merges drop |
BackgroundMergesAndMutationsPoolTask | Reveals total pool saturation | Sustained near pool size for more than 10 minutes |
MaxPartCountForPartition | Measures distance to insert throttling | More than 50% of parts_to_delay_insert |
DelayedInserts / RejectedInserts | Confirms write-path impact | Any sustained increase |
system.mutations.latest_fail_reason | Identifies permanently stuck mutations | Non-empty string |
| Per-replica mutation lag | Catches asymmetric stalls in clusters | Same mutation is_done on some replicas but not others |
Fixes
Kill a non-critical or stuck mutation
Stop a stuck or mistaken mutation with KILL MUTATION:
KILL MUTATION WHERE database = 'db_name' AND table = 'table_name' AND mutation_id = 'mutation_id';
Warning: Killing a mutation is disruptive. Parts already rewritten remain changed; the mutation does not roll back completed work. Subsequent merges and mutations process those parts normally. Use this only when the mutation is safe to discard.
Lower the mutation idle-slot gate
number_of_free_entries_in_pool_to_execute_mutation controls how many slots must be free before a mutation starts. Lowering it lets mutations start with fewer idle slots, which can help a critical mutation finish so merges can resume.
Tradeoff: A lower gate lets mutations start sooner, but mutations still consume pool slots and starve merges while they run. This is a temporary relief valve, not a fix for chronic mutation overload.
Increase background pool capacity
If CPU and I/O headroom exist, raise background_merges_mutations_concurrency_ratio (or background_pool_size on older configurations) to add slots. These settings typically require a server restart or config reload.
Tradeoff: More threads increase disk and CPU contention. Verify that storage latency and CPU utilization are below 70% before raising the pool size, or you may turn merge starvation into I/O saturation.
Reduce mutation frequency
Avoid treating ClickHouse like an OLTP store. Batch changes instead of issuing many small ALTER UPDATE statements. If your version supports lightweight deletes and the use case fits, prefer DELETE FROM over ALTER DELETE; lightweight deletes mark rows with a hidden column rather than rewriting parts immediately, though they still require eventual cleanup merges.
For replicated tables: clear per-replica blockages
Each replica processes mutations independently. If one replica is mutation-blocked while others are healthy, run the diagnostic queries locally on that replica and kill the stuck mutation there.
Do not restart the server to clear a mutation. Mutations are persisted and resume after restart, so a restart only prolongs the stall.
Prevention
- Monitor mutations as a first-class signal. A dashboard that shows
system.partsbut notsystem.mutationswill miss the root cause of merge starvation. - Alert on mutation stagnation. Alert when
parts_to_dois not decreasing over a 15-minute window, and when any mutation has a non-emptylatest_fail_reason. - Reserve pool headroom. Target background pool utilization below 70% during normal operation. This leaves room for mutations without choking merges.
- Review schema-change patterns. Flag application code or ETL pipelines that issue frequent
ALTER UPDATE/DELETE. ClickHouse mutations are designed for occasional bulk corrections, not continuous row-level updates. - Test mutation duration. Before running a mutation on a billion-row table, test it on a representative partition and measure elapsed time and disk I/O.
How Netdata helps
- Correlate active mutation count from
system.mutationswith active part count and merge pool utilization. A flat mutation line alongside a climbing part-count line is the classic starvation signature. - Alert on the ratio of mutation tasks to total background tasks. When mutations dominate the pool, part growth follows within minutes.
- Combine
DelayedInsertsrate with mutation backlog to distinguish mutation blocking from insert overload or poor batching. - Surface per-replica mutation states in clustered deployments to catch asymmetric stalls that cluster-wide averages hide.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse async inserts: when async_insert fixes too-many-parts and when it hides it
- ClickHouse DelayedInserts climbing: the warning before too-many-parts
- ClickHouse insert latency rising: the leading indicator of write-pipeline trouble
- ClickHouse Memory limit (for query) exceeded: per-query limits and GROUP BY/JOIN blowups
- ClickHouse Memory limit (total) exceeded - server-wide memory pressure and fixes
- ClickHouse memory pressure death spiral: runaway queries, retries, and OOM
- ClickHouse MemoryTracking vs MemoryResident: reading the memory gap correctly
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate
- ClickHouse merge duration climbing: the leading indicator of part explosion
- ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool
- ClickHouse monitoring checklist: the signals every production cluster needs







