ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool

When insert latency climbs and system.merges is empty while active parts grow, the background merge pool is likely stalled or starved. ClickHouse relies on background merges to consolidate immutable parts after each INSERT. Without merges, parts accumulate: query scans open more files, memory pressure shifts to the mark cache and file descriptor tables, and the system approaches the TOO_MANY_PARTS threshold.

A stalled pool is not always obvious. The server accepts queries, replication appears healthy, and the HTTP ping endpoint returns Ok.. The real signals are rising part counts per partition, insert delays, and a merge pool that is either fully occupied or idle. Because merges and mutations share the background pool, a heavy ALTER UPDATE or ALTER DELETE can silently monopolize the threads that should consolidate parts.

Distinguish a genuinely stuck merge from an oversubscribed pool, identify the root cause, and recover before inserts are rejected.

What this means

INSERTs create immutable data parts. ClickHouse schedules background merges to combine smaller parts into larger ones. Fewer parts means fewer file descriptors, faster index lookups, and lower query latency. This is the only automatic way to reduce part count.

When merges are not keeping up, either they run too slowly to offset new part creation, or they are blocked entirely. Both lead to rising active part counts, which eventually trigger DelayedInserts and then RejectedInserts.

The background pool is governed by background_merges_mutations_concurrency_ratio (default 2). This controls how many merge and mutation tasks run concurrently relative to the physical thread pool. When slots are saturated by slow operations, or when a merge cannot start because disk space is insufficient, the pool bottlenecks. Because merges are the only way to reduce part count, a stalled pool creates a one-way ratchet toward insert rejection.

Common causes

CauseWhat it looks likeFirst thing to check
Mutations monopolizing the poolsystem.merges shows only is_mutation = 1; part count rises while mutations runsystem.mutations for is_done = 0 entries and flat parts_to_do
Disk I/O saturationMerge threads exist but elapsed grows while progress crawls; query latency also elevatediostat -xz 1 5 and merge bytes/sec from system.part_log
Merge pool fully saturatedBackgroundMergesAndMutationsPoolTask at or near BackgroundMergesAndMutationsPoolSizesystem.metrics for pool utilization ratio
Insufficient disk space for merge outputsystem.merges is empty or merges refuse to start; disk usage highsystem.disks for free_space and unreserved_space
Hung individual mergeOne merge in system.merges with identical progress across samplesResample system.merges over 60 seconds

Quick checks

Run these read-only checks in order.

Check what is actually running in the merge pool:

-- Active merges and mutations with resource usage
SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts,
    is_mutation,
    formatReadableSize(total_size_bytes_compressed) AS total_size,
    formatReadableSize(memory_usage) AS mem_used
FROM system.merges
ORDER BY elapsed DESC;

If this returns zero rows while parts are actively growing, the pool is blocked.

Count merges versus mutations:

-- Distinguish regular merges from mutation tasks
SELECT
    count(*) AS active_merges,
    countIf(is_mutation = 1) AS mutations,
    countIf(is_mutation = 0) AS regular_merges
FROM system.merges;

Check pool utilization directly:

-- Background pool active tasks vs configured size
SELECT metric, value
FROM system.metrics
WHERE metric LIKE '%Background%Pool%';

Check part count at the partition level, because limits are per-partition:

-- Part count per partition (the critical granularity)
SELECT
    database,
    table,
    partition_id,
    count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 20;

Check for insert backpressure:

-- Insert backpressure counters
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');

Check for mutation backlog:

-- Pending mutations that may be consuming pool slots
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    is_done,
    parts_to_do,
    latest_fail_time,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;

Check disk space across all configured volumes:

-- Free space including reservation accounting
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(100 * (1 - free_space / total_space), 1) AS used_pct,
    formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;

Check OS-level I/O latency:

# Device-level I/O latency and utilization
iostat -xz 1 5

Check recent merge throughput to see if completion rate has collapsed:

-- Merge completion trend over the last hour
SELECT
    toStartOfMinute(event_time) AS minute,
    count() AS merges_completed,
    sum(rows) AS rows_merged,
    avg(duration_ms) AS avg_merge_duration_ms
FROM system.part_log
WHERE event_type = 'MergeParts'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;

How to diagnose it

  1. Confirm merges are absent or stuck. Query system.merges. Zero rows while parts are growing means the pool is blocked. If rows exist, sample progress and elapsed twice with a 60-second interval. Identical progress means a hung merge.
  2. Determine whether mutations own the pool. If system.merges shows is_mutation = 1 for every task, query system.mutations for is_done = 0. A mutation with flat or slowly decreasing parts_to_do is consuming slots without freeing them quickly.
  3. Inspect pool utilization. In system.metrics, compare BackgroundMergesAndMutationsPoolTask against BackgroundMergesAndMutationsPoolSize. If the active task count is at the size limit, the pool is fully saturated.
  4. Rule out disk space. Merges need temporary space to write the merged part before deleting sources. Check system.disks. If free_space or unreserved_space is near zero, ClickHouse will not schedule new merges.
  5. Rule out I/O saturation. Run iostat -xz 1 5. If await is elevated on the data volume while merge throughput in system.part_log is near zero, disk I/O is the bottleneck even though threads are allocated.
  6. Measure insert pressure. Check system.events for DelayedInserts. If this counter is climbing, the merge backlog is already throttling the write pipeline.
  7. Check for orphaned merge tasks. If a single merge has occupied a slot for hours with no progress, and disk and I/O are healthy, the task may be stuck in the scheduler. Note the result_part_name from system.merges. Clearing it may require detaching and reattaching the table, which interrupts queries on that table.
flowchart TD
    A[Parts count rising] --> B{system.merges empty?}
    B -->|Yes| C[Check disk space and pool saturation]
    B -->|No| D{Progress advancing?}
    D -->|No| E[Hung merge]
    D -->|Yes| F{Only mutations?}
    F -->|Yes| G[Mutation monopoly]
    F -->|No| H[Check I/O saturation and pool size]
    C --> I[Disk full or pool blocked]
    H --> J[Scale pool or reduce insert rate]

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Active parts per partitionparts_to_delay_insert and parts_to_throw_insert apply per partition, not per tableCount exceeds 50% of your configured delay threshold
BackgroundMergesAndMutationsPoolTask vs BackgroundMergesAndMutationsPoolSizeShows whether merge tasks are queuing because all slots are fullRatio sustained above 0.9 for more than 10 minutes
Merge progress in system.mergesA merge with static progress is hung and may block schedulingNo change across 60-second samples
Mutation parts_to_do in system.mutationsMutations rewrite entire parts and compete for pool slotsis_done = 0 with flat or barely decreasing parts_to_do
Disk unreserved_spaceMerges temporarily duplicate data during executionApproaching zero on any data volume
DelayedInserts / RejectedInsertsDirect evidence that merge backlog is constricting the write pipelineAny non-zero RejectedInserts is critical

Fixes

Kill blocking mutations

If system.merges is dominated by mutations and system.mutations shows a long-running ALTER UPDATE or ALTER DELETE, kill it with KILL MUTATION. This frees pool slots immediately and allows regular merges to resume. Warning: the data change is abandoned. Reissue it later when the system is healthy.

Throttle or pause inserts

If the pool is saturated because part creation exceeds merge throughput, reduce the insert rate from upstream or batch inserts into larger blocks. Tradeoff: data pipeline lag increases, but it stops the part count from compounding and gives merges runway to catch up.

Reclaim disk space

If system.disks shows low unreserved_space, merges cannot write their output. Identify large tables with system.parts, then detach old partitions with ALTER TABLE ... DETACH PARTITION to reclaim space immediately. Warning: detached data is not queryable until reattached. Ensure it is not needed for compliance or backfill before detaching.

Increase pool resources

If CPU and I/O headroom exist, increase background_merges_mutations_concurrency_ratio to allow more concurrent merge and mutation tasks. Tradeoff: more concurrent merges increase I/O and CPU load, which can degrade query latency and may saturate disk bandwidth further if I/O is already near limits.

Address hung merges

If a single merge has stuck progress and you have ruled out disk and I/O, the merge task may be hung. There is no direct KILL command for a stuck merge. Reducing load and, if necessary, detaching and reattaching the affected table may clear the scheduler. Warning: this interrupts queries on that table.

Prevention

  • Monitor part count at the partition level, not just the table level. The per-partition limit is what triggers insert delays and rejections. Table-level aggregates hide hotspots.
  • Track the ratio of part creation rate to merge completion rate. If part creation exceeds merges for more than a brief burst, investigate before parts accumulate.
  • Alert on system.mutations queue depth and duration. A mutation backlog is the most common silent cause of merge starvation.
  • Keep disk usage below 80-85% and ensure unreserved_space remains well above the size of your largest active partition. Merges need headroom to write temporary output.
  • Review partition granularity. Over-partitioning multiplies part count across the same data volume and exhausts merge capacity faster than the same data in coarser partitions.

How Netdata helps

  • Correlate MaxPartCountForPartition from system.asynchronous_metrics with BackgroundMergesAndMutationsPoolTask to see pool saturation and part growth without manual sampling.
  • Alert on RejectedInserts and DelayedInserts from system.events before the crisis becomes user-visible.
  • Visualize disk space alongside merge activity to catch the early signature of a merge death spiral: parts rising while disk space flattens or falls.
  • Track per-partition part counts to surface hotspots that table-level aggregates hide.
  • Monitor OS-level disk I/O latency and iowait to distinguish pool saturation from I/O starvation.
  • ClickHouse merge death spiral: when parts accumulate faster than merges consolidate: /guides/clickhouse/clickhouse-merge-death-spiral/
  • ClickHouse monitoring checklist: the signals every production cluster needs: /guides/clickhouse/clickhouse-monitoring-checklist/
  • ClickHouse monitoring maturity model: from survival to expert: /guides/clickhouse/clickhouse-monitoring-maturity-model/
  • ClickHouse DB::Exception: Too many parts - causes and fixes: /guides/clickhouse/clickhouse-too-many-parts/
  • How ClickHouse actually works in production: a mental model for operators: /guides/clickhouse/how-clickhouse-works-in-production/