ClickHouse small inserts anti-pattern: why single-row inserts melt the merge pool

DB::Exception: Too many parts spikes, query latency climbs, and the background merge pool runs flat out while making no progress. Disk, CPU, and memory look healthy. The cause is usually single-row INSERTs from the application.

Every INSERT into a MergeTree table creates at least one immutable data part on disk. ClickHouse is optimized for batch inserts. When clients send single-row or micro-batch inserts, parts are created faster than background merges can consolidate them. This is the number one driver of part accumulation in production clusters.

flowchart LR
    A[Single-row INSERT] --> B[New part created]
    B --> C[Part count rises]
    C --> D[Merge pool falls behind]
    D --> E[DelayedInserts]
    E --> F[RejectedInserts]
    F --> G[Write halt]

What this means

Each part is a self-contained directory with compressed column files, indexes, checksums, and metadata. ClickHouse continuously runs background merges that combine smaller parts into larger ones. This is the core storage maintenance mechanism. Fewer parts means fewer file descriptors during queries, fewer index lookups, and lower metadata overhead.

With proper batching, new parts are large and merges keep pace. With tiny inserts, part creation exceeds merge throughput. Each micro-part adds fixed overhead: file descriptors, metadata entries, and merge candidate evaluation. On ReplicatedMergeTree tables, excessive parts also increase replication queue entries and ZooKeeper metadata pressure, which can slow replica synchronization. Merges are I/O-intensive and CPU-intensive. They read all source parts, decompress, sort, recompress, and write the result before deleting sources. The background merge pool has finite threads controlled by background_merges_mutations_concurrency_ratio. As the part count climbs, merges take longer because there are more candidate combinations to evaluate and more files to process.

When the per-partition part count crosses parts_to_delay_insert, ClickHouse artificially delays INSERT operations to slow creation. If the pattern continues, the count crosses parts_to_throw_insert and inserts are rejected entirely. Recovery requires stopping the bad insert pattern and waiting for merges to drain the backlog, which can take minutes to hours depending on how deep the debt is.

Common causes

CauseWhat it looks likeFirst thing to check
ORM or application emitting per-row INSERTsInsertQuery rate is high but InsertedRows is nearly identical to query countsystem.query_log for average rows per insert
ETL pipeline flushing micro-batches too frequentlyRows per insert in the 10-100 range on a steady clockQuery log grouped by time window
Async insert fire-and-forgetClient sees low latency but server-side parts still explodeWhether async inserts are configured to wait for flush
Materialized views on a high-frequency insert targetOne logical insert creates parts across multiple tableswritten_rows in system.query_log versus the target table

Quick checks

Run these read-only checks to confirm the anti-pattern and assess merge pool health. Run them during low traffic if possible; querying system.query_log on an already stressed system adds marginal load.

-- Check average insert batch size over the last hour
SELECT
    count() AS insert_count,
    sum(written_rows) AS total_rows,
    round(avg(written_rows)) AS avg_rows_per_insert
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 1 HOUR;

Healthy batching averages 1,000 or more rows per insert. Sustained averages below 100 indicate the small-insert anti-pattern.

-- Find partition hotspots
SELECT
    database,
    table,
    partition_id,
    count(*) AS parts_in_partition
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts_in_partition DESC
LIMIT 20;
-- Check if inserts are already being delayed or rejected
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
-- See if merges are actually running
SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts
FROM system.merges
ORDER BY elapsed DESC;
-- Check background pool saturation
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%Pool%';

How to diagnose it

  1. Quantify insert granularity. Query system.query_log to compute rows per insert over a representative window. If the average is below 1,000 and the insert rate is high, you have a batching failure. Consistently falling below that threshold elevates part accumulation risk. Group by client_hostname or user if multiple services write to the same cluster.

  2. Map part hotspots per partition. Query system.parts grouped by partition_id. High counts concentrated in specific partitions confirm exactly where merges are losing ground. Remember that parts_to_delay_insert and parts_to_throw_insert apply per partition, not per table, so a single hot partition can throttle an entire table.

  3. Correlate merge throughput with part creation. If system.merges shows active merges but system.parts counts are still climbing, part creation is exceeding consolidation. Check whether the background pool is fully utilized via system.metrics. If utilization is pinned and part counts rise, the pool is starved.

  4. Check for mutations compounding the problem. Run SELECT * FROM system.mutations WHERE is_done = 0. Mutations rewrite entire parts and compete with merges for the same pool threads. A mutation backlog starves merges even when insert volume is reasonable.

  5. Verify async insert behavior if enabled. If the application uses async inserts, confirm that wait_for_async_insert is set to 1 so the client blocks until the server flushes. Immediate acknowledgement returns success as soon as data is buffered, but parts are still created on flush. If the buffer empties frequently with tiny payloads, the anti-pattern persists server-side despite low client latency.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
InsertQuery rate vs InsertedRowsReveals batch size collapseAvg rows per insert < 1,000 sustained
Active parts per partitionDistance to the “too many parts” cliffCount > 50% of parts_to_delay_insert
DelayedInsertsServer is throttling writes to slow accumulationCounter increasing over time
RejectedInsertsHard insert failures are occurringAny non-zero value
Background merge pool utilizationMerge thread starvation> 90% sustained with growing parts
Merge throughputWhether consolidation keeps up with creationMerge completion rate below part creation rate

Fixes

Batch client-side

The only root-cause fix is to stop creating so many parts. Buffer rows in the application and issue one INSERT per 1-2 seconds containing at least 1,000 rows. A simple in-memory accumulator or message consumer batch window directly reduces part creation pressure. For distributed tables, batch before the Distributed table proxy rather than inserting into underlying shards directly; this keeps the batch intact as one insert per shard. Tradeoff: slightly higher end-to-end write latency because data waits in the application buffer before becoming visible.

Enable async inserts with safe settings

If immediate client-side batching is not feasible, set async_insert=1 and wait_for_async_insert=1 so the client receives acknowledgement only after the server flushes. This buffers small inserts server-side and flushes them as larger blocks. Tradeoff: data is not durable when the client receives the ACK if the server crashes before flush. Schema validation is deferred to flush time, so one malformed row can reject an entire buffered batch.

Audit materialized views

If the insert target has materialized views attached, a single INSERT creates parts in every downstream table. The aggregate part creation rate across the table and its views can overwhelm the shared merge pool. Check system.query_log for high written_rows spread across multiple tables from a single query ID. Tradeoff: removing or simplifying views reduces write amplification but may shift computation to query time.

Increase merge pool capacity (temporary relief)

If CPU and disk I/O have headroom, raise background_merges_mutations_concurrency_ratio. This gives merges more threads to drain the backlog faster. Tradeoff: merges consume CPU and I/O that would otherwise serve queries. This is a temporary bridge, not a cure. If the insert pattern does not change, the pool will saturate again at the new limit.

Kill blocking mutations

If system.mutations shows long-running mutations monopolizing the merge pool, kill them with KILL MUTATION. This frees threads to resume consolidation.

Warning: KILL MUTATION discards the mutation. You must reissue it after the merge backlog clears.

Prevention

  • Enforce application-level batching discipline. Target 1,000+ rows per INSERT and avoid row-at-a-time patterns from ORMs.
  • Monitor rows per insert as a first-class pipeline metric, not just insert latency. Alert when the ratio of InsertQuery to InsertedRows indicates micro-batching.
  • If using async inserts, set wait_for_async_insert=1 in production and understand server-side flush behavior.
  • Review partition keys. Even well-batched inserts multiply parts if a single batch scatters across many partition values. A time-based partition key with too fine a granularity, such as hourly instead of daily, can amplify part counts under high-frequency batching.

How Netdata helps

Netdata charts the following without manual query_log analysis:

  • InsertQuery rate against InsertedRows to reveal batch size collapse in real time.
  • Active part count per table and alerts when it crosses safe ratios of parts_to_delay_insert.
  • DelayedInserts and RejectedInserts so you catch throttling before hard failures begin.
  • Background pool utilization and merge activity to distinguish small-insert pressure from genuine I/O starvation.