ClickHouse DelayedInserts climbing: the warning before too-many-parts

Insert latency is climbing and system.events.DelayedInserts is no longer flat. ClickHouse is sleeping during INSERT because at least one partition has crossed parts_to_delay_insert. The database still accepts writes, but injects a sleep before each insert commits. This is the warning window before hard failure. If the merge backlog is not resolved, DelayedInserts climbs until RejectedInserts starts ticking and clients receive DB::Exception: Too many parts.

DelayedInserts and RejectedInserts are cumulative counters in system.events. Alert on their rate of change, not absolute value. A sustained positive rate means the merge pipeline is losing to the insert pipeline.

What this means

Every INSERT into a MergeTree table creates one or more immutable data parts. ClickHouse runs background merges to consolidate smaller parts into larger ones. This keeps query performance stable and limits file descriptor usage. If parts accumulate faster than merges consolidate them, the active part count per partition rises.

When the active part count in a single partition exceeds parts_to_delay_insert (default 1000), ClickHouse throttles inserts into that partition by injecting a sleep. The delay grows as the part count rises further above the threshold. When the count exceeds parts_to_throw_insert (default 3000), inserts are rejected entirely. Both thresholds are configurable per table via SETTINGS in the CREATE TABLE statement or server defaults. There is also a global cap, max_parts_in_total (default 100000), which throws regardless of per-partition state.

DelayedInserts is the earliest operational signal that a partition is heading toward the hard limit. Insert latency is the client-visible symptom. Because these counters are cumulative since server start, monitor their rate. A flat value is healthy; a sustained positive rate means the throttle is active and the situation is worsening.

flowchart TD
    A[INSERT arrives] --> B{Active parts in partition < parts_to_delay_insert?}
    B -->|Yes| C[Insert commits normally]
    B -->|No| D{Active parts < parts_to_throw_insert?}
    D -->|Yes| E[Apply delay and increment DelayedInserts]
    E --> F[Insert commits after sleep]
    D -->|No| G[Reject insert and increment RejectedInserts]

Common causes

CauseWhat it looks likeFirst thing to check
Merge throughput below insert ratesystem.merges shows few or very slow merges; part count rising steadilySELECT database, table, partition_id, count() AS parts FROM system.parts WHERE active = 1 GROUP BY database, table, partition_id ORDER BY parts DESC LIMIT 10
Many small insertsHigh InsertQuery count but low InsertedRows per query; each insert creates a new partSELECT event, value FROM system.events WHERE event IN ('InsertQuery', 'InsertedRows') and compare ratios
Mutations blocking the merge poolsystem.mutations shows is_done = 0 with parts_to_do stalledSELECT database, table, mutation_id, parts_to_do FROM system.mutations WHERE is_done = 0 ORDER BY create_time
Over-partitioningA single table has dozens or hundreds of partition_ids, each accumulating partsPer-partition part count query above; look for many partition_ids under one table
Projections or materialized views multiplying partsBase table looks healthy but hidden parts or downstream tables are accumulatingCheck part counts in materialized view target tables; for projections, compare total active parts against expected base table parts

Quick checks

Run these read-only checks to confirm the scope and locate the bottleneck.

# Check current DelayedInserts value (sample twice to compute delta)
clickhouse-client -q "SELECT event, value FROM system.events WHERE event = 'DelayedInserts'"
-- Find the worst partition and how close it is to limits
SELECT
    database,
    table,
    partition_id,
    count() AS active_parts,
    1000 AS delay_threshold,
    3000 AS throw_threshold
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 10;

-- Check currently running merges and their progress
SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts,
    is_mutation,
    formatReadableSize(total_size_bytes_compressed) AS merge_size
FROM system.merges
ORDER BY elapsed DESC;

-- Check for active mutations that may be starving merges
SELECT
    database,
    table,
    mutation_id,
    parts_to_do,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;

-- Evaluate insert batching efficiency
SELECT
    event,
    value
FROM system.events
WHERE event IN ('InsertQuery', 'InsertedRows');

-- Check disk space available for merges to complete
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(100 * free_space / total_space, 1) AS free_pct
FROM system.disks;

How to diagnose it

  1. Confirm the symptom is current. DelayedInserts is cumulative since server start. Take two samples 60 seconds apart and compute the delta. A sustained rate above zero is the signal.
  2. Identify the affected partition. The limit is per-partition, not per-table. Use the per-partition part count query to find the exact table and partition_id that crossed the threshold.
  3. Determine if merges are running. Query system.merges. If merges are active and progress is advancing, the system may be catching up from a burst. If no merges are running despite high part counts, the merge pool may be blocked or the disk may be too full to allow merge output.
  4. Check for merge blockers. Look at system.mutations for long-running or stuck mutations. Look at system.disks for free space below 20%. Look at background pool utilization in system.metrics .
  5. Correlate with insert patterns. Compare InsertQuery to InsertedRows. If the ratio shows many small inserts, the root cause is client-side batching. If insert volume jumped recently, the root cause is insufficient merge capacity for the new insert rate.
  6. Check for hidden parts. If the base table part count is moderate but the throttle is firing, verify whether projections or materialized views are creating hidden parts that also count toward limits.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
DelayedInserts rateDirect measure of insert throttlingSustained delta > 0 over 5 minutes
RejectedInserts rateHard insert failures have begunAny non-zero sustained rate
Active parts per partitionDistance to the delay and throw thresholds> 500 parts (50% of default delay threshold)
Merge activity (system.merges)Whether consolidation is actually happeningZero active merges while parts > 500
Insert latency P99Client-visible impact of throttling> 2x baseline sustained for 15 minutes
Mutation queue depthHidden consumer of merge threadsAny is_done = 0 with parts_to_do flat for > 30 minutes
Disk free spaceMerges require temporary headroom< 20% free or unreserved_space trending toward zero
Background pool utilizationThread availability for merge workSustained > 90% with part count growing

Fixes

Merge pipeline is behind after an insert burst. If the burst was temporary, merges may self-recover. Reduce insert rate from the largest writers temporarily. Do not restart the server; restarting forces the server to reload all parts from disk and delays recovery.

Mutations are monopolizing the merge pool. Identify the mutation with the highest parts_to_do and longest runtime. If it is non-critical, kill it:

KILL MUTATION WHERE database = 'db' AND table = 'table' AND mutation_id = '0000000000';

Merges will resume after the mutation thread is released.

Client is sending many small inserts. Fix batching upstream. ClickHouse prefers batches of at least 1,000 rows. Asynchronous inserts buffer small inserts server-side, but they return success before data is written to a part. Monitor server-side part creation, not just client-side acknowledgments.

Disk space is low. Merges stop when ClickHouse estimates there is insufficient free space to write the merged result. Free space must be at least large enough to accommodate the biggest pending merge. Detach old partitions or add storage:

-- Identify the largest partition
SELECT database, table, partition_id, formatReadableSize(sum(bytes_on_disk))
FROM system.parts WHERE active = 1 GROUP BY database, table, partition_id ORDER BY sum(bytes_on_disk) DESC LIMIT 5;

Then use ALTER TABLE ... DETACH PARTITION ... to reclaim space if data retention policy allows it. Detached partitions are no longer queryable unless reattached.

Over-partitioning is creating too many partition_ids. If a high-cardinality partition key (for example, partitioning by hour or by a high-cardinality column) is causing part multiplication, the fix requires changing the partition key. This means creating a new table with the correct key, backfilling, and cutting over. It is not an in-place fix.

Prevention

  • Monitor active part count at the partition level, not the table level. The limit applies per-partition, so a table-level average hides hotspots.
  • Alert on the rate of DelayedInserts, not the cumulative value. A cumulative counter that is flat is healthy; a rising rate is not.
  • Keep the ratio of part creation rate to merge completion rate below 1.0 sustained. If you are creating parts faster than you merge them, you are borrowing time.
  • Maintain disk usage below 80-85% of capacity. Merges need temporary space roughly equal to the size of the parts being merged.
  • Batch inserts client-side to at least 1,000 rows per INSERT. Fewer rows per insert directly increases part count pressure.
  • Review partition granularity during schema design. Daily or monthly partitioning is usually sufficient; hourly partitioning requires extremely high merge capacity.
  • Track mutation queue depth as part of routine monitoring. Mutations are a common hidden cause of merge starvation.

How Netdata helps

  • Correlates DelayedInserts with insert latency, active part count, and merge activity on the same charts to show whether the throttle is growing or recovering.
  • Surfaces per-partition part count trends without manual system.parts queries during an incident.
  • Tracks the rate of change for cumulative counters like DelayedInserts and RejectedInserts automatically.
  • Provides background pool utilization metrics alongside disk I/O and memory pressure to distinguish merge backlog from I/O starvation or mutation blocking.