ClickHouse active part count growing: reading MaxPartCountForPartition before it pages

Rising MaxPartCountForPartition is the leading indicator for the most common ClickHouse production failure: parts accumulating faster than background merges can consolidate them. A single partition crossing 500 active parts means you have hours, not days, before inserts delay and eventually fail with TOO_MANY_PARTS.

The thresholds are per-partition. A table with ten partitions at fifty parts each is healthy; one partition at 950 parts is approaching throttling. Projections create hidden parts inside the same table that count toward the same limits. Materialized views route inserts to separate target tables that can hit their own limits independently.

What this means

Every INSERT creates immutable data parts. Background merges combine smaller parts into larger ones. When inserts outpace merges, merge debt grows exponentially because merge cost increases with part count.

By default, parts_to_delay_insert is 1000 per partition; above this ClickHouse slows inserts. At parts_to_throw_insert (default 3000 in recent releases, 300 in older releases) , inserts are rejected. These defaults are configurable per table, but the mechanics are the same: once merge debt accumulates, recovery becomes harder.

flowchart TD
    A[High insert rate or micro-batches] --> B[Active parts per partition rise]
    B --> C[Merges fall behind]
    C --> D[Part count crosses delay threshold]
    D --> E[Insert throttling begins]
    E --> F[Part count crosses throw threshold]
    F --> G[Inserts rejected with TOO_MANY_PARTS]
    C --> H[Query latency degrades
more files to scan]

Common causes

CauseWhat it looks likeFirst thing to check
Micro-batch insertsPart count rises steadily; many insert queries but few rows per insertsystem.query_log for rows written per insert
Mutations blocking mergesPart count grows despite active inserts; few regular merges runningsystem.mutations for is_done = 0 entries
Disk space too low for merge temp spaceParts high, merges absent or stuck; disk near fullsystem.disks unreserved_space
Over-partitioned tableOne or a few partitions explode while others are flat; hourly or daily partition keys on high-volume tablessystem.parts grouped by partition_id
Materialized view amplificationBase table part count normal, but MV target tables grow rapidlyActive parts in target tables of attached materialized views
Projection bloatBase table within limits but hidden projection parts drive total count and latencyWhether projections are defined on the table

Quick checks

Run these read-only queries in order. None block writes or mutate state.

-- Server-wide worst partition
SELECT value FROM system.asynchronous_metrics WHERE metric = 'MaxPartCountForPartition';
-- Per-partition breakdown
SELECT database, table, partition_id, count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 20;
-- Insert batching pattern from the last hour
SELECT
    query,
    count() AS inserts,
    quantiles(0.5, 0.99)(written_rows)[1] AS p50_rows,
    quantiles(0.5, 0.99)(written_rows)[2] AS p99_rows
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY query
ORDER BY inserts DESC
LIMIT 10;
-- Currently running merges and mutations
SELECT database, table, elapsed, progress, num_parts, is_mutation
FROM system.merges
ORDER BY elapsed DESC;
-- Pending mutations that may be consuming merge threads
SELECT database, table, mutation_id, command, parts_to_do
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
-- Insert throttling and rejection counters
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- Disk space available for merge temporary output
SELECT name, path,
       formatReadableSize(free_space) AS free,
       formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;
-- Background pool saturation
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%Pool%';

How to diagnose it

  1. Confirm the scope is per-partition. Run the per-partition query from Quick checks. A table-level aggregate hides hotspots. One partition at 900 parts is an emergency even if the table average is 100.
  2. Determine if merges are keeping up. Check system.merges. If merges are running and progress is advancing, the system is working but may be under-provisioned for the insert rate. If no merges are running despite high part counts, the pool is blocked or starved.
  3. Check for mutation blockage. Run the system.mutations query. Mutations rewrite entire parts and share the background pool with merges. A single long-running mutation on a large table can monopolize threads and silently allow parts to accumulate.
  4. Verify disk headroom for temporary merge output. Merges write the full merged result before deleting source parts. If unreserved_space in system.disks is smaller than the partition being merged, merges will stall. There is no fixed multiplier; ensure enough free space for the largest expected merge output.
  5. Inspect insert batching behavior. Query system.query_log for insert patterns. Consistently under 1000 rows per insert means micro-batching. This is the most frequent root cause of part accumulation.
  6. Check for hidden amplification. Materialized views write to separate target tables; check their active part counts. Projections create hidden parts within the base table and inflate the same per-partition limits.
  7. Correlate with event counters. If DelayedInserts is increasing, the system is already throttling. Any non-zero RejectedInserts means inserts are being dropped and the situation is critical.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
MaxPartCountForPartitionOne metric reveals the worst partition across all tablesSustained value above 500, or positive derivative over 30 minutes
Active parts per partitionLimit enforcement and performance degradation are per-partitionAny partition above 50% of parts_to_delay_insert
Merge activity (system.merges)Proof that merges are executing and completingZero merges running while parts are high and inserts are active
Background pool utilizationWhether threads are available to process mergesSustained above 90% with growing part counts
DelayedInserts / RejectedInsertsClickHouse signaling write-path stressAny increase in RejectedInserts; steady growth in DelayedInserts
Disk unreserved_spaceMerges require temporary space to write outputApproaching less than the largest expected merge output
Insert latency P99Leading indicator before delay counters incrementSustained elevation above 2x baseline

Fixes

Reduce insert rate or increase batch size

If micro-batching is the cause, throttle ingestors or increase client batch sizes to 1000+ rows per insert. If you run ClickHouse 23.x or later, enable async_insert and tune its timeout settings to let the server buffer small inserts into fewer parts. Tradeoff: higher latency before data becomes visible.

Kill mutations that block merges

If system.mutations shows long-running mutations and system.merges is dominated by is_mutation = 1 entries, kill non-critical mutations to free pool capacity:

KILL MUTATION WHERE database = '...' AND table = '...' AND mutation_id = '...';

Tradeoff: the mutation must be reissued later. Do not kill mutations if you rely on their result for data correctness.

Force a merge manually (use with caution)

For a specific hot partition:

OPTIMIZE TABLE db.table PARTITION ID '...' FINAL;

This is CPU and I/O intensive and can contend with inserts on that partition. Run only during low-traffic windows. It is not a substitute for fixing the root cause.

Increase merge concurrency if headroom exists

If CPU and disk I/O are not saturated, you can increase background_merges_mutations_concurrency_ratio. The default is 2. Raising it allows more concurrent merges but increases resource competition with queries. Monitor query latency after the change.

Free disk space immediately

If disk space is the blocker, identify the largest tables:

SELECT database, table,
       formatReadableSize(sum(bytes_on_disk)) AS disk_size
FROM system.parts
WHERE active = 1
GROUP BY database, table
ORDER BY disk_size DESC
LIMIT 10;

Detach old partitions to reclaim space quickly. Warning: detached partitions are unavailable for queries until reattached.

ALTER TABLE db.table DETACH PARTITION ID '...';

Do not restart ClickHouse to resolve disk pressure; address the space issue directly.

Prevention

  • Alert on the derivative of part count, not just the absolute value. A partition crossing 300 parts is concerning, but a partition growing at 50 parts per hour is an emergency even from a lower base.
  • Enforce batch sizes at the client. Target 1000 to 10000 rows per insert. Use async_insert if you cannot control client behavior.
  • Keep partition cardinality low. Prefer monthly partitioning (toYYYYMM) over daily (toYYYYMMDD) for high-volume tables. High-cardinality partition keys multiply parts independently.
  • Monitor mutations as first-class signals. A forgotten ALTER UPDATE can block merges for hours. Watch system.mutations parts_to_do for stall.
  • Maintain disk headroom below 80-85% usage. Keep enough unreserved space to write the largest expected merge output; do not rely on a fixed multiplier.
  • Account for projections and materialized views. Projections store additional parts inside the same table. Materialized views insert into separate tables that accumulate their own parts. Factor both into part-count budgets.

How Netdata helps

Netdata exposes MaxPartCountForPartition from system.asynchronous_metrics as a gauge, so you see the worst partition without manual queries. Correlate it with background pool task counts, disk I/O latency, and running merge activity on the same timeline to distinguish merge debt from replication lag or query storms. Alert on the derivative of active parts per partition to catch backlog before the delay threshold. For ReplicatedMergeTree, cross-reference with ZooKeeper session health and replication queue depth to separate local merge capacity issues from coordination-layer degradation.