ClickHouse DB::Exception: Too many parts - causes and fixes

When logs show DB::Exception: Too many parts (N). Merges are processing significantly slower than inserts (exception code 252), INSERTs that succeeded yesterday are now failing. Data backs up in ingestors, message queues, or client buffers, and every retry adds load to a system already drowning in small files.

At least one partition in a MergeTree table has exceeded the hard limit for active data parts, and background merges cannot consolidate them fast enough. Do not restart the server or raise the limit indefinitely. Reduce the part creation rate, remove blockers from the merge pipeline, and give background threads runway to catch up.

What this means

Each INSERT into a MergeTree table creates one or more immutable data parts on disk: self-contained directories of column files, indexes, and metadata. Background merges combine smaller parts into larger ones. Fewer parts means faster queries and lower file descriptor pressure.

The safety limits are per-partition, not per-table. The two thresholds that matter are:

  • parts_to_delay_insert (default 1000): When a partition exceeds this, ClickHouse slows INSERTs artificially to give merges time to catch up.
  • parts_to_throw_insert (default 3000): When a partition exceeds this, ClickHouse throws exception 252 and rejects the INSERT entirely.

The delay logic is active only if the average part size in that partition is less than or equal to max_average_part_size_for_partition (default 1 GiB). If your parts are smaller than that, the throttle applies.

Root cause: parts are created faster than merges can retire them.

flowchart TD
    A[High-frequency small inserts] --> B[Active parts per partition grow]
    B --> C[Merges fall behind]
    H[Blocking mutations] --> C
    I[Disk full or I/O saturated] --> C
    C --> D[Part count hits delay threshold]
    D --> E[DelayedInserts increase]
    E --> F[Part count hits throw threshold]
    F --> G[RejectedInserts: TOO_MANY_PARTS]

Common causes

CauseWhat it looks likeFirst thing to check
Many small insertsPart count grows steadily; writers send single-row or low-batch insertssystem.query_log for insert frequency versus rows per query
Unbatched or mis-tuned async insertsAsync inserts flush too aggressively, creating many small partssystem.query_log for frequent small flushes from async insert settings
Blocking mutationsMerge pool threads consumed by ALTER UPDATE/DELETE; parts accumulate silently while the mutation runssystem.mutations for entries where is_done = 0
Insufficient merge throughputMerges are running but completing too slowly due to disk saturation or CPU limitssystem.merges for progress over time, and OS disk latency
Over-partitioningPartition key has too much cardinality (e.g., by hour or by high-cardinality ID), multiplying parts across many partitionssystem.parts grouped by partition_id to see skew
Disk space pressureFree space is too low for merges to write temporary output; merges halt before the limit is reachedsystem.disks for free_space and unreserved_space
Replication fetch backlogReplicas fetch many small parts instead of merging locally; local part counts risesystem.replication_queue for repeated GET_PART entries

Quick checks

Run these read-only probes to assess scope and severity.

# Check if inserts are already being rejected
clickhouse-client -q "SELECT value FROM system.events WHERE event = 'RejectedInserts'"
# Check the worst partition server-wide
clickhouse-client -q "SELECT value FROM system.asynchronous_metrics WHERE metric = 'MaxPartCountForPartition'"
# Top partitions by active part count
clickhouse-client -q "SELECT database, table, partition_id, count() AS parts, sum(rows) AS rows, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts WHERE active = 1 GROUP BY database, table, partition_id ORDER BY parts DESC LIMIT 20"
# Currently running merges and mutations
clickhouse-client -q "SELECT database, table, elapsed, progress, num_parts, is_mutation, formatReadableSize(memory_usage) AS mem FROM system.merges ORDER BY elapsed DESC"
# Active mutations that may be starving merges
clickhouse-client -q "SELECT database, table, mutation_id, parts_to_do, latest_fail_reason FROM system.mutations WHERE is_done = 0 ORDER BY create_time"
# Disk free space on all configured volumes
clickhouse-client -q "SELECT name, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total, round(100 * (1 - free_space / total_space), 1) AS used_pct FROM system.disks"
# Insert batch size pattern over the last 10 minutes
clickhouse-client -q "SELECT quantile(0.50)(written_rows) AS p50_rows, quantile(0.99)(written_rows) AS p99_rows, count() AS insert_queries FROM system.query_log WHERE query_kind = 'Insert' AND type = 'QueryFinish' AND event_time > now() - INTERVAL 10 MINUTE"

How to diagnose it

  1. Identify affected partitions. Use the system.parts query above. The limit is per-partition, so a table with 50 partitions can have one at 2900 parts while the rest are healthy. That single partition rejects inserts.
  2. Check if merges are running. Query system.merges. If merges are active and progress is advancing, the system is working but overloaded. If no merges run despite high part counts, merges are blocked.
  3. Look for blocking mutations. Query system.mutations. Any is_done = 0 entry with high parts_to_do or a flat parts_to_do over time means a mutation is monopolizing merge threads.
  4. Check disk headroom. Query system.disks. If free_space or unreserved_space is near zero, merges cannot write temporary output and will stop. This is a common death spiral.
  5. Analyze insert patterns. Use system.query_log or compare InsertQuery and InsertedRows from system.events. Thousands of insert queries with only dozens of rows each means writers are under-batching.
  6. Check replication queue (if replicated). Query system.replication_queue for stuck GET_PART entries with high num_tries. Replicas fetching many small parts inflate local part counts.
  7. Correlate with delay and rejection counters. Query DelayedInserts and RejectedInserts in system.events. Rising DelayedInserts means you are in the grace period before hard failures.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
MaxPartCountForPartitionServer-wide worst partition; direct predictor of imminent rejection> 50% of parts_to_throw_insert (default 1500)
RejectedInsertsHard insert failures; data loss if the upstream pipeline lacks bufferingAny sustained increase over zero
DelayedInsertsInserts being throttled; merge stress is developingSteady increase over a 10-minute window
Active parts per partitionDistance to the hard limit; reveals per-partition hotspotsAny partition > 1000 parts
Merge activity (system.merges)Whether merges are actually executing and completingZero merges running while parts are elevated
Mutation queue (system.mutations)Mutations consume the same thread pool and can silently block mergesAny is_done = 0 with parts_to_do flat for > 30 minutes
Disk free space (system.disks)Merges need temporary space to write combined parts before deleting sources< 20% free, or unreserved_space approaching zero
Background pool utilizationMerge thread saturationSustained active tasks near pool limit with growing parts
Insert rows per queryBatch size efficiency; small batches multiply partsP50 or average < 1000 rows per insert

Fixes

Throttle or pause ingestors

The fastest way to stop the spiral is to reduce the rate of new part creation. Pause non-critical ingestion pipelines or reduce producer throughput. This gives background merges runway to consolidate the backlog. The tradeoff is upstream backpressure.

Kill blocking mutations

If system.mutations shows long-running ALTER UPDATE/DELETE operations consuming merge threads, kill them:

# List active mutations
clickhouse-client -q "SELECT database, table, mutation_id FROM system.mutations WHERE is_done = 0"

Use KILL MUTATION for the specific mutation. This immediately frees merge pool capacity. The tradeoff is that partial work remains applied, and you must reissue the operation later when the system is healthy.

Increase batch size on writers

Target at least 1,000 rows per INSERT, ideally 10,000-100,000. Aim for approximately one INSERT per second in synchronous mode. The tradeoff is higher latency on the ingest path, which usually requires client-side buffering or micro-batching in the pipeline.

Enable or tune async inserts

For high-frequency small writes, use server-side batching. Set async_insert = 1, wait_for_async_insert = 1, and tune flush thresholds to your tolerance:

  • async_insert_max_data_size: default 100 MiB; consider 5-10 MiB if you need faster flush under pressure.
  • async_insert_busy_timeout_ms: default 200 ms on self-hosted, 1000 ms on Cloud; consider raising to 2000-5000 ms to allow larger buffers.
  • async_insert_max_query_number: default 450.

Async inserts do not apply to INSERT INTO ... SELECT. Schema validation is deferred to flush time; if any row fails validation, the entire payload is rejected.

Force an emergency merge

As a temporary stopgap, run:

OPTIMIZE TABLE <table> FINAL;

Warning: This is resource-intensive and will spike CPU and disk I/O. Use it only to buy time while you fix the insert pattern, not as a routine operation.

If the crisis is isolated to one partition, use OPTIMIZE TABLE <table> PARTITION '<partition_id>' FINAL to reduce blast radius.

Adjust thresholds (temporary relief)

You can raise parts_to_delay_insert and parts_to_throw_insert at the table or server level. This raises the ceiling but does not fix the underlying merge deficit. If you use this, treat it as a bandage while you restructure ingestion.

Free disk space

If merges are halted by insufficient space, create headroom immediately. Identify large tables with system.parts, detach old partitions if they are no longer queried, or expand storage.

ALTER TABLE <table> DETACH PARTITION '<partition_id>';

Warning: Detached data is unreadable until reattached or dropped. Detached partitions stop consuming merge budget, but ensure you have a retention plan before running this.

Prevention

  • Monitor MaxPartCountForPartition and the rate of part count growth, not just the absolute value. A partition at 500 parts and climbing fast is more dangerous than one at 1500 and stable.
  • Enforce minimum batch sizes in application code or ingestion middleware.
  • If you need high-frequency inserts, use async inserts with conservative, workload-tested flush thresholds.
  • Avoid over-partitioning. Partition by day unless your data volume justifies hour-level partitioning.
  • Keep disk usage below 80-85% to ensure merges always have room to write temporary output.
  • Monitor system.mutations and alert on any non-zero is_done = 0 queue lasting more than 30 minutes.
  • Treat ZooKeeper or ClickHouse Keeper as a first-class dependency. Session expiry and replication stalls on replicated tables can indirectly suppress merge progress.

How Netdata helps

  • Correlate MaxPartCountForPartition with disk I/O latency and background pool utilization to distinguish a merge backlog from disk saturation.
  • Alert on derivatives of RejectedInserts and DelayedInserts to catch the crisis before inserts hard-fail.
  • Track per-table insert rates alongside query latency to spot batch-size regressions immediately.
  • Surface ClickHouse metrics next to OS memory and file descriptor usage to expose resource competition between merges and queries.