ClickHouse DB::Exception: Too many parts - causes and fixes
When logs show DB::Exception: Too many parts (N). Merges are processing significantly slower than inserts (exception code 252), INSERTs that succeeded yesterday are now failing. Data backs up in ingestors, message queues, or client buffers, and every retry adds load to a system already drowning in small files.
At least one partition in a MergeTree table has exceeded the hard limit for active data parts, and background merges cannot consolidate them fast enough. Do not restart the server or raise the limit indefinitely. Reduce the part creation rate, remove blockers from the merge pipeline, and give background threads runway to catch up.
What this means
Each INSERT into a MergeTree table creates one or more immutable data parts on disk: self-contained directories of column files, indexes, and metadata. Background merges combine smaller parts into larger ones. Fewer parts means faster queries and lower file descriptor pressure.
The safety limits are per-partition, not per-table. The two thresholds that matter are:
parts_to_delay_insert(default 1000): When a partition exceeds this, ClickHouse slows INSERTs artificially to give merges time to catch up.parts_to_throw_insert(default 3000): When a partition exceeds this, ClickHouse throws exception 252 and rejects the INSERT entirely.
The delay logic is active only if the average part size in that partition is less than or equal to max_average_part_size_for_partition (default 1 GiB). If your parts are smaller than that, the throttle applies.
Root cause: parts are created faster than merges can retire them.
flowchart TD
A[High-frequency small inserts] --> B[Active parts per partition grow]
B --> C[Merges fall behind]
H[Blocking mutations] --> C
I[Disk full or I/O saturated] --> C
C --> D[Part count hits delay threshold]
D --> E[DelayedInserts increase]
E --> F[Part count hits throw threshold]
F --> G[RejectedInserts: TOO_MANY_PARTS]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Many small inserts | Part count grows steadily; writers send single-row or low-batch inserts | system.query_log for insert frequency versus rows per query |
| Unbatched or mis-tuned async inserts | Async inserts flush too aggressively, creating many small parts | system.query_log for frequent small flushes from async insert settings |
| Blocking mutations | Merge pool threads consumed by ALTER UPDATE/DELETE; parts accumulate silently while the mutation runs | system.mutations for entries where is_done = 0 |
| Insufficient merge throughput | Merges are running but completing too slowly due to disk saturation or CPU limits | system.merges for progress over time, and OS disk latency |
| Over-partitioning | Partition key has too much cardinality (e.g., by hour or by high-cardinality ID), multiplying parts across many partitions | system.parts grouped by partition_id to see skew |
| Disk space pressure | Free space is too low for merges to write temporary output; merges halt before the limit is reached | system.disks for free_space and unreserved_space |
| Replication fetch backlog | Replicas fetch many small parts instead of merging locally; local part counts rise | system.replication_queue for repeated GET_PART entries |
Quick checks
Run these read-only probes to assess scope and severity.
# Check if inserts are already being rejected
clickhouse-client -q "SELECT value FROM system.events WHERE event = 'RejectedInserts'"
# Check the worst partition server-wide
clickhouse-client -q "SELECT value FROM system.asynchronous_metrics WHERE metric = 'MaxPartCountForPartition'"
# Top partitions by active part count
clickhouse-client -q "SELECT database, table, partition_id, count() AS parts, sum(rows) AS rows, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts WHERE active = 1 GROUP BY database, table, partition_id ORDER BY parts DESC LIMIT 20"
# Currently running merges and mutations
clickhouse-client -q "SELECT database, table, elapsed, progress, num_parts, is_mutation, formatReadableSize(memory_usage) AS mem FROM system.merges ORDER BY elapsed DESC"
# Active mutations that may be starving merges
clickhouse-client -q "SELECT database, table, mutation_id, parts_to_do, latest_fail_reason FROM system.mutations WHERE is_done = 0 ORDER BY create_time"
# Disk free space on all configured volumes
clickhouse-client -q "SELECT name, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total, round(100 * (1 - free_space / total_space), 1) AS used_pct FROM system.disks"
# Insert batch size pattern over the last 10 minutes
clickhouse-client -q "SELECT quantile(0.50)(written_rows) AS p50_rows, quantile(0.99)(written_rows) AS p99_rows, count() AS insert_queries FROM system.query_log WHERE query_kind = 'Insert' AND type = 'QueryFinish' AND event_time > now() - INTERVAL 10 MINUTE"
How to diagnose it
- Identify affected partitions. Use the
system.partsquery above. The limit is per-partition, so a table with 50 partitions can have one at 2900 parts while the rest are healthy. That single partition rejects inserts. - Check if merges are running. Query
system.merges. If merges are active andprogressis advancing, the system is working but overloaded. If no merges run despite high part counts, merges are blocked. - Look for blocking mutations. Query
system.mutations. Anyis_done = 0entry with highparts_to_door a flatparts_to_doover time means a mutation is monopolizing merge threads. - Check disk headroom. Query
system.disks. Iffree_spaceorunreserved_spaceis near zero, merges cannot write temporary output and will stop. This is a common death spiral. - Analyze insert patterns. Use
system.query_logor compareInsertQueryandInsertedRowsfromsystem.events. Thousands of insert queries with only dozens of rows each means writers are under-batching. - Check replication queue (if replicated). Query
system.replication_queuefor stuckGET_PARTentries with highnum_tries. Replicas fetching many small parts inflate local part counts. - Correlate with delay and rejection counters. Query
DelayedInsertsandRejectedInsertsinsystem.events. RisingDelayedInsertsmeans you are in the grace period before hard failures.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
MaxPartCountForPartition | Server-wide worst partition; direct predictor of imminent rejection | > 50% of parts_to_throw_insert (default 1500) |
RejectedInserts | Hard insert failures; data loss if the upstream pipeline lacks buffering | Any sustained increase over zero |
DelayedInserts | Inserts being throttled; merge stress is developing | Steady increase over a 10-minute window |
| Active parts per partition | Distance to the hard limit; reveals per-partition hotspots | Any partition > 1000 parts |
Merge activity (system.merges) | Whether merges are actually executing and completing | Zero merges running while parts are elevated |
Mutation queue (system.mutations) | Mutations consume the same thread pool and can silently block merges | Any is_done = 0 with parts_to_do flat for > 30 minutes |
Disk free space (system.disks) | Merges need temporary space to write combined parts before deleting sources | < 20% free, or unreserved_space approaching zero |
| Background pool utilization | Merge thread saturation | Sustained active tasks near pool limit with growing parts |
| Insert rows per query | Batch size efficiency; small batches multiply parts | P50 or average < 1000 rows per insert |
Fixes
Throttle or pause ingestors
The fastest way to stop the spiral is to reduce the rate of new part creation. Pause non-critical ingestion pipelines or reduce producer throughput. This gives background merges runway to consolidate the backlog. The tradeoff is upstream backpressure.
Kill blocking mutations
If system.mutations shows long-running ALTER UPDATE/DELETE operations consuming merge threads, kill them:
# List active mutations
clickhouse-client -q "SELECT database, table, mutation_id FROM system.mutations WHERE is_done = 0"
Use KILL MUTATION for the specific mutation. This immediately frees merge pool capacity. The tradeoff is that partial work remains applied, and you must reissue the operation later when the system is healthy.
Increase batch size on writers
Target at least 1,000 rows per INSERT, ideally 10,000-100,000. Aim for approximately one INSERT per second in synchronous mode. The tradeoff is higher latency on the ingest path, which usually requires client-side buffering or micro-batching in the pipeline.
Enable or tune async inserts
For high-frequency small writes, use server-side batching. Set async_insert = 1, wait_for_async_insert = 1, and tune flush thresholds to your tolerance:
async_insert_max_data_size: default 100 MiB; consider 5-10 MiB if you need faster flush under pressure.async_insert_busy_timeout_ms: default 200 ms on self-hosted, 1000 ms on Cloud; consider raising to 2000-5000 ms to allow larger buffers.async_insert_max_query_number: default 450.
Async inserts do not apply to INSERT INTO ... SELECT. Schema validation is deferred to flush time; if any row fails validation, the entire payload is rejected.
Force an emergency merge
As a temporary stopgap, run:
OPTIMIZE TABLE <table> FINAL;
Warning: This is resource-intensive and will spike CPU and disk I/O. Use it only to buy time while you fix the insert pattern, not as a routine operation.
If the crisis is isolated to one partition, use OPTIMIZE TABLE <table> PARTITION '<partition_id>' FINAL to reduce blast radius.
Adjust thresholds (temporary relief)
You can raise parts_to_delay_insert and parts_to_throw_insert at the table or server level. This raises the ceiling but does not fix the underlying merge deficit. If you use this, treat it as a bandage while you restructure ingestion.
Free disk space
If merges are halted by insufficient space, create headroom immediately. Identify large tables with system.parts, detach old partitions if they are no longer queried, or expand storage.
ALTER TABLE <table> DETACH PARTITION '<partition_id>';
Warning: Detached data is unreadable until reattached or dropped. Detached partitions stop consuming merge budget, but ensure you have a retention plan before running this.
Prevention
- Monitor
MaxPartCountForPartitionand the rate of part count growth, not just the absolute value. A partition at 500 parts and climbing fast is more dangerous than one at 1500 and stable. - Enforce minimum batch sizes in application code or ingestion middleware.
- If you need high-frequency inserts, use async inserts with conservative, workload-tested flush thresholds.
- Avoid over-partitioning. Partition by day unless your data volume justifies hour-level partitioning.
- Keep disk usage below 80-85% to ensure merges always have room to write temporary output.
- Monitor
system.mutationsand alert on any non-zerois_done = 0queue lasting more than 30 minutes. - Treat ZooKeeper or ClickHouse Keeper as a first-class dependency. Session expiry and replication stalls on replicated tables can indirectly suppress merge progress.
How Netdata helps
- Correlate
MaxPartCountForPartitionwith disk I/O latency and background pool utilization to distinguish a merge backlog from disk saturation. - Alert on derivatives of
RejectedInsertsandDelayedInsertsto catch the crisis before inserts hard-fail. - Track per-table insert rates alongside query latency to spot batch-size regressions immediately.
- Surface ClickHouse metrics next to OS memory and file descriptor usage to expose resource competition between merges and queries.







