$ guides / clickhouse / clickhouse-alter-update-delete-overuse ▌

Operations Guides

ClickHouse ALTER UPDATE/DELETE overuse: why mutations are not row updates

Your inserts are slowing down. system.merges shows long-running background tasks with is_mutation = 1, while the active part count climbs toward the parts_to_delay_insert threshold. Write latency rises and DelayedInserts increases even though memory, disk, and CPU are not exhausted. The culprit is usually an application treating ClickHouse as an OLTP store, issuing ALTER TABLE UPDATE or ALTER TABLE DELETE as routine operations.

In ClickHouse, these statements do not perform row-level updates. They create mutations: asynchronous background jobs that rewrite entire data parts. Mutations queue sequentially per table, consume the same background pool that performs merges, and can silently starve the merge process until parts accumulate and inserts begin to reject.

What this means

When you issue ALTER TABLE UPDATE or ALTER TABLE DELETE, ClickHouse inserts a job into system.mutations. That job rewrites every active part in the target table. Multiple mutations stack up and execute one at a time per table, creating an O(mutations x parts) workload. Each mutation occupies a slot in the background merge and mutation pool, which defaults to a concurrency ratio of 2. While mutations run, regular merges are blocked or delayed.

The result is silent merge starvation. Parts that would normally be consolidated continue to accumulate. Query performance degrades as file counts rise. Eventually insert throttling begins, followed by hard rejections when a partition crosses parts_to_throw_insert.

Lightweight DELETE FROM uses a different mechanism based on mask files. It is less disruptive than classic ALTER TABLE DELETE mutations, though deleted rows are physically removed only during subsequent merges. For update-heavy workloads, avoid mutations entirely and use ReplacingMergeTree or CollapsingMergeTree, which handle changes through insert semantics and background deduplication rather than part rewrites.

flowchart TD
    A[Frequent ALTER UPDATE or DELETE] --> B[Mutation queued per table]
    B --> C[Rewrites every active part]
    C --> D[Background merge pool consumed]
    D --> E[Regular merges blocked]
    E --> F[Active part count rises]
    F --> G[Inserts delayed then rejected]

Common causes

Cause	What it looks like	First thing to check
Application using `ALTER TABLE UPDATE` for incremental changes	`system.mutations` shows multiple pending mutations on the same table with `is_done = 0`	`SELECT ... FROM system.mutations WHERE is_done = 0`
Batch purging with `ALTER TABLE DELETE`	`parts_to_do` decreasing very slowly or stalled	`system.merges` for `is_mutation = 1` count
ORM or BI tool emitting `ALTER` statements	Frequent `Alter` entries in `system.query_log` from application users	Query log filtered by `query_kind = 'Alter'`
One-off corrections queuing behind a large table mutation	Small mutation blocked for hours behind a rewrite of a billion-row table	`create_time` and `parts_to_do` per mutation
Lack of `ReplacingMergeTree` or `CollapsingMergeTree` for mutable data	Plain `MergeTree` engine with frequent “fix” operations	`SHOW CREATE TABLE` for engine type

Quick checks

-- Active mutation queue and remaining work
SELECT database, table, mutation_id, command, create_time, is_done, parts_to_do
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;

-- Distinguish mutations from regular merges in the background pool
SELECT count(*) AS active_merges,
       countIf(is_mutation = 1) AS mutations,
       countIf(is_mutation = 0) AS regular_merges
FROM system.merges;

-- Check if inserts are already being throttled or rejected
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');

-- Worst-case part count across all partitions
SELECT value
FROM system.asynchronous_metrics
WHERE metric = 'MaxPartCountForPartition';

-- Per-table active part counts to spot accumulation
SELECT database, table, count(*) AS active_parts
FROM system.parts
WHERE active = 1
GROUP BY database, table
ORDER BY active_parts DESC
LIMIT 10;

# Recent mutation-related log lines (path varies by installation)
grep -i mutation /var/log/clickhouse-server/*.log | tail -20

How to diagnose it

Confirm mutation backlog. Query system.mutations WHERE is_done = 0. If multiple rows exist for the same table, they are queuing sequentially and each must process every part.
Check pool allocation. Query system.merges. If is_mutation = 1 dominates the active task list while part counts are high, mutations are monopolizing the background pool.
Correlate with part growth. Compare MaxPartCountForPartition over time. A sustained rise while mutations run indicates merges are starved.
Check insert pipeline pressure. Verify DelayedInserts and RejectedInserts counters. Any sustained increase confirms the write path is backing up.
Identify the source. Search system.query_log for query_kind = 'Alter' to find which user or service account is issuing mutations.
Assess replication impact. For ReplicatedMergeTree, compare system.mutations across replicas. If parts_to_do lags on one replica, that node is the bottleneck for cluster-wide completion.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`system.mutations` queue depth (`is_done = 0`)	Direct measure of pending mutation work	More than one pending mutation per table sustained
`parts_to_do` flat or decreasing slowly	Indicates stalled or extremely slow mutation	No progress for longer than 30 minutes
`system.merges` with `is_mutation = 1`	Shows pool capacity consumed by mutations	All active background tasks are mutations
`MaxPartCountForPartition`	Predicts “too many parts” before inserts fail	Greater than 500 parts per partition
`DelayedInserts` / `RejectedInserts`	Confirms merge or mutation backlog affecting writes	Any sustained increase
Active background pool tasks	Merge and mutation tasks share this pool	`system.merges` task count near the concurrency ratio limit with growing parts

Fixes

Kill unnecessary mutations

If a mutation was issued by mistake or is no longer needed, remove it from the queue.

-- Identify the mutation ID
SELECT database, table, mutation_id
FROM system.mutations
WHERE is_done = 0;

-- Kill the mutation
KILL MUTATION WHERE database = 'db' AND table = 'table' AND mutation_id = 'mutation_id';

Warning: Killing a mutation is disruptive and may leave parts in an intermediate state. Test cancellation behavior on your ClickHouse version before using this during an incident.

Switch to ReplacingMergeTree or CollapsingMergeTree

For workloads that require updates or deletes, redesign the table to avoid mutations:

ReplacingMergeTree: Insert new rows with updated values and a version column. Background merges retain the highest version per ORDER BY key. Use FINAL or deduplicate at query time.
CollapsingMergeTree: Insert cancel rows (sign = -1) and replacement rows (sign = +1). Background merges collapse matching pairs.

Tradeoff: These engines shift effort from write-time mutation to read-time filtering. Queries must handle duplicate rows that have not yet been merged.

Use lightweight DELETE

If the use case is row deletion, prefer DELETE FROM table WHERE ... over ALTER TABLE DELETE. This applies a mask file instead of rewriting entire parts. It is significantly less I/O-intensive, though cleanup still requires subsequent merges.

Collapse mutations into fewer statements

If multiple column updates are needed, batch them into a single ALTER TABLE UPDATE with a compound condition. One mutation rewriting parts once is far better than five sequential mutations each rewriting the same parts.

Increase background pool headroom

If mutations are legitimate and the server has CPU and disk headroom, you can increase merge and mutation concurrency:

SELECT name, value
FROM system.settings
WHERE name = 'background_merges_mutations_concurrency_ratio';

Only increase this if disk I/O and CPU are not saturated. More concurrency amplifies resource usage without solving the root cause of excessive mutations.

Address stuck or failed mutations

If latest_fail_reason is non-empty in system.mutations, the mutation is failing repeatedly. A failed mutation can block all subsequent mutations on that table. Check disk space, part integrity, and permissions. Resolve the underlying failure or kill the mutation to unblock the queue.

Prevention

Never use ALTER TABLE UPDATE/DELETE as a routine application pattern. Treat it as a rare administrative tool.
Design tables for immutable inserts. Use ReplacingMergeTree, CollapsingMergeTree, or versioned insert logic instead of in-place updates.
Batch corrections into single statements. Avoid issuing many small mutations.
Monitor system.mutations proactively. Alert on any is_done = 0 mutation that lasts longer than expected for your table size.
Audit query logs for ALTER patterns. Application service accounts should not hold privileges to issue ALTER TABLE UPDATE/DELETE.
Watch background pool saturation. If background task utilization is consistently near the concurrency limit, investigate whether mutations are the cause before part counts explode.

How Netdata helps

Correlates mutation backlog with merge starvation: Netdata collects system.mutations queue depth alongside system.merges activity, surfacing when is_mutation tasks dominate the background pool.
Tracks part count growth: Charts MaxPartCountForPartition and per-table active parts to expose accumulation caused by blocked merges.
Surfaces insert pipeline pressure: Alerts on DelayedInserts and RejectedInserts to catch downstream write-path impact.
Links resource usage: Correlates mutation-driven I/O with disk latency and memory pressure, distinguishing mutation overload from query load.

The Netdata solution

ClickHouse monitoring with Netdata

Netdata monitors ClickHouse with per-second metrics and ML anomaly detection. Track merge debt, memory usage, replication lag, Keeper/ZooKeeper saturation, and disk headroom against the host signals that drive them.

See ClickHouse monitoring → Start monitoring free

ClickHouse ALTER UPDATE/DELETE overuse: why mutations are not row updates

ClickHouse ALTER UPDATE/DELETE overuse: why mutations are not row updates

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Kill unnecessary mutations

Switch to ReplacingMergeTree or CollapsingMergeTree

Use lightweight DELETE

Collapse mutations into fewer statements

Increase background pool headroom

Address stuck or failed mutations

Prevention

How Netdata helps

Related guides

ClickHouse monitoring with Netdata