ClickHouse insert latency rising: the leading indicator of write-pipeline trouble
Your ClickHouse inserts are taking longer. A query that committed in 200 ms last week is now taking 5 seconds, then 15, then 30. In most databases this signals slow disks or lock contention. In ClickHouse, sustained insert latency is the earliest operational signal that the write pipeline is congesting. It precedes DelayedInserts, part-count alerts, and the hard stop of RejectedInserts by minutes to hours. Wait for the error and the merge debt is already severe.
ClickHouse measures insert latency in system.query_log as query_duration_ms for query_kind = 'Insert' and type = 'QueryFinish'. For synchronous inserts this captures the full path from statement arrival to part commit. For asynchronous inserts the client-visible number reflects buffering, not the final disk flush, so the trend matters more than the absolute value. In either mode, a sustained rise means backpressure is building somewhere between the TCP handler and the frozen part on disk.
Congestion usually starts in the merge subsystem. Each INSERT creates immutable parts. Background merges combine smaller parts into larger ones to keep query performance stable and file descriptor usage bounded. When parts accumulate faster than merges consolidate them, ClickHouse throttles inserts with exponential backoff. Rising latency is the first observable symptom. After latency climbs, DelayedInserts appears. Then RejectedInserts. Then writes halt entirely.
flowchart TD
A[High insert rate] -->|creates| B[Growing active parts]
B -->|exceeds| C[Merge throughput]
C -->|produces| D[Rising insert latency]
D -->|precedes| E[DelayedInserts]
E -->|escalates to| F[RejectedInserts]
F -->|blocks| G[Write halt]
B -->|consumes| H[Disk space]
H -->|stalls| CWhat this means
Rising insert latency means the pipeline cannot absorb the current write load at its usual speed. The bottleneck is rarely the insert itself. More often, the background merge pool, disk I/O, or coordination service cannot keep up with the part creation rate, and the system starts queuing or throttling writes.
Because ClickHouse delays inserts using an exponential formula once the active part count in a partition exceeds parts_to_delay_insert , latency growth is non-linear. As the part count approaches parts_to_throw_insert , delays grow super-exponentially. By the time you see multi-second insert times, the merge subsystem is already under water. Sustained latency above 30 seconds means the write pipeline is effectively blocking.
This signal is workload-facing. Unlike internal merge counters, insert latency is what your application feels. It is the canary for the entire part lifecycle.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Merge backlog | Active part count growing steadily; merges running but not fast enough to reduce parts | system.parts active count per partition versus system.merges progress |
| Disk I/O saturation | Disk latency elevated; merge throughput low despite active threads; query latency also climbing | Disk await and system.merges bytes/sec |
| Mutation backlog blocking merges | system.mutations shows is_done = 0 and merge pool threads are consumed by mutations | system.mutations and background pool utilization in system.metrics |
| Replication or ZooKeeper overhead | Latency spikes on replicated tables; ZooKeeper session metrics show stress | Replica queue_size and ZooKeeper session state |
| Small insert anti-pattern | Many insert queries with few rows each; part count rises faster than row count | Rows per insert from system.query_log |
| Async insert buffering | Client latency reflects buffer time, not flush; actual land time diverges from ack time | system.query_log for insert latency trend versus part creation timing |
Quick checks
Run these safe, read-only checks to confirm the trend and locate the bottleneck.
# Check insert latency trend
SELECT
toStartOfFiveMinutes(event_time) AS window,
quantile(0.99)(query_duration_ms / 1000) AS p99_sec,
avg(query_duration_ms / 1000) AS avg_sec,
count() AS inserts
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_kind = 'Insert'
AND event_time > now() - INTERVAL 1 HOUR
GROUP BY window
ORDER BY window DESC;
# Check active parts per partition
SELECT
database,
table,
partition_id,
count() AS parts
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY parts DESC
LIMIT 20;
# Check running merges and mutations
SELECT
database,
table,
elapsed,
progress,
num_parts,
is_mutation,
formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges
ORDER BY elapsed DESC;
# Check insert throttling events
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
# Check mutation backlog
SELECT
database,
table,
mutation_id,
parts_to_do,
latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
# Check background pool saturation
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%Pool%';
# Check disk space
SELECT
name,
path,
formatReadableSize(free_space) AS free,
round(100 * (1 - free_space / total_space), 1) AS used_pct
FROM system.disks;
# Check replication queue depth if using replicated tables
SELECT
database,
table,
queue_size,
log_max_index - log_pointer AS entries_behind
FROM system.replicas
WHERE queue_size > 0
ORDER BY queue_size DESC;
How to diagnose it
Confirm the deviation. Use
system.query_logto verify that P99 insert latency is sustained above 5x your baseline or above 30 seconds. One slow query is noise; a 20-minute upward trend is signal.Find part hotspots. Query
system.partsgrouped bypartition_id. If any partition has more than 500 active parts, or is approachingparts_to_delay_insert, merges are falling behind.Inspect merge progress. Check
system.merges. Ifprogressis stalled across multiple merges, orelapsedis growing without forward movement, merges are stuck. Note whether entries areis_mutation = 1, which means mutations are consuming merge slots.Look for active throttling. Query
system.eventsforDelayedInsertsandRejectedInserts. Sustained growth inDelayedInsertsconfirms ClickHouse is throttling writes. Non-zeroRejectedInsertsmeans the hard limit is reached.Rule out disk space. Check
system.disks. If free space is below 20%, merges may halt because they cannot write temporary output before deleting sources. This is a common cause of merge starvation that looks like a merge performance problem.Check for blocking mutations. If
system.mutationsshows pending work and the background pool is fully occupied by mutations, merges are starved. Mutations rewrite entire parts and can monopolize threads for hours.Validate insert batching. Compute rows per insert from
system.query_log. If your application is sending many inserts with fewer than 1,000 rows each, it is creating excessive parts. This is the most common root cause of merge backlog.Correlate with resource metrics. Check disk write throughput and background pool utilization. If the pool is fully utilized and parts are still growing, the merge subsystem lacks CPU or I/O headroom to keep up.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Insert latency P99 | Earliest workload-facing indicator of write pipeline congestion | Sustained > 5x baseline or > 30 seconds |
| Active parts per partition | Distance to insert throttling and rejection | > 500 or steady growth over 30 minutes |
| DelayedInserts | ClickHouse is actively throttling writes | Counter increasing over time |
| RejectedInserts | Hard insert failures due to part limits | Any non-zero sustained value |
| Merge activity | Whether background merges are keeping up | Zero merges running while parts are high and inserts are active |
| Mutation queue depth | Mutations block merges and consume pool threads | is_done = 0 with flat or non-decreasing parts_to_do |
| Disk free space | Merges need temporary space to write output | < 20% or unreserved_space near zero |
| Background pool utilization | Capacity to run merges, fetches, and mutations | Fully utilized with growing part count |
| Insert batch size | Small batches create parts faster than merges can consolidate | < 1,000 rows per insert sustained |
Fixes
Merge backlog and too many parts
Throttle or pause upstream inserts to reduce the part creation rate. Increase client batch sizes so each INSERT creates fewer parts. If mutations are monopolizing the pool, kill non-critical mutations with KILL MUTATION. You can increase merge concurrency only if CPU and disk I/O have clear headroom; the default background_merges_mutations_concurrency_ratio is 2, and raising it on an already saturated disk will make latency worse.
Disk I/O saturation
Temporarily reduce heavy analytical query load to free I/O bandwidth for merges. Check whether large queries are spilling to disk and competing with merge writes. Ensure the ClickHouse data directory is not sharing physical disks with ZooKeeper transaction logs or system logs.
Replication and ZooKeeper overhead
Check ZooKeeper session metrics for expired sessions or elevated latency. If ZooKeeper is slow, do not restart ClickHouse nodes; reconnection storms amplify coordination load. Pause non-critical DDL operations to reduce metadata pressure on the coordination service.
Small inserts
Fix client logic to batch rows. Aim for at least 1,000 rows per INSERT. If you cannot control the client, server-side async insert buffering changes the latency profile, but you must then monitor actual part creation and flush timing rather than client acknowledgment time.
Mutations blocking merges
Identify long-running mutations in system.mutations. Kill mutations that are not business-critical. Plan heavy ALTER UPDATE or ALTER DELETE operations during low-traffic windows, and avoid treating ClickHouse like an OLTP database for frequent small mutations.
Low disk space
Drop old partitions to reclaim space. Add disk capacity; merges require temporary headroom to complete. Review TTL policies, but TTL enforcement depends on merges running. If merges are stalled, expired data persists.
Prevention
Monitor active part count at the partition level, not just the table level. The parts_to_delay_insert and parts_to_throw_insert limits apply per partition, and a single hot partition can throttle an entire table. Track the ratio of part creation rate to merge completion rate; if it stays above 1.0 for more than a brief burst, you are accumulating debt. Keep disk usage below 80% as an operational target, because merges need free space to write temporary output. Monitor system.mutations proactively; a forgotten mutation is a common silent cause of merge starvation. Enforce minimum insert batch sizes in your ingestion pipeline. Alert on insert latency percentiles from system.query_log as a primary signal. It fires before part-count alerts, giving you minutes to hours of runway.
How Netdata helps
- Netdata collects insert latency percentiles from
system.query_logand correlates them with merge activity, active part count, and disk I/O in real time. - Built-in alerts fire when
DelayedInsertsorRejectedInsertsincrease, and when active part count crosses safe thresholds per partition. - Disk I/O latency and background pool utilization charts help distinguish merge backlog from storage saturation without ad hoc queries.
- For replicated clusters, ZooKeeper connection health and replication queue depth are automatically correlated with insert latency trends.
- Per-partition part count visualization surfaces hotspots that table-level aggregates hide.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse DelayedInserts climbing: the warning before too-many-parts
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate
- ClickHouse merge duration climbing: the leading indicator of part explosion
- ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool
- ClickHouse monitoring checklist: the signals every production cluster needs
- ClickHouse monitoring maturity model: from survival to expert
- ClickHouse projections and hidden parts: the part count you can’t see
- ClickHouse small inserts anti-pattern: why single-row inserts melt the merge pool
- ClickHouse too many partitions: why over-partitioning multiplies your part count
- ClickHouse DB::Exception: Too many parts - causes and fixes
- How ClickHouse actually works in production: a mental model for operators







