ClickHouse disk space monitoring: free_space, unreserved_space, and the 80% target

ClickHouse disk space is not a simple capacity gauge. Merges write new parts before deleting old ones, so free space is an operational dependency of the storage engine. Running short means more than “running low”: background merges stall, parts accumulate, and the system enters a self-reinforcing death spiral.

This guide covers the system.disks metrics, why unreserved_space matters more than free_space, and operational targets that keep merges alive.

What it is and why it matters

ClickHouse stores data in immutable parts. Every insert creates at least one part, and background merges continuously combine smaller parts into larger ones to keep query performance healthy and storage efficient. A merge reads a set of source parts from one partition, writes a new merged part to the same disk, and only then removes the sources. This means a single merge temporarily requires space for both the sources and the result.

If the disk is too full to accommodate that temporary duplication, ClickHouse will not start the merge. When merges stop, small parts accumulate. Each unmerged part carries metadata overhead and consumes file descriptors. The disk fills faster because space is no longer being reclaimed through consolidation. TTL expirations also rely on merges to remove old data, so expired data stays on disk. The result is a cliff-edge failure: gradual fill becomes sudden collapse.
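
Part accumulation is visible in system.parts before it becomes a failure. The query below is a minimal sketch that lists the partitions with the most active parts; the LIMIT of 20 is an arbitrary choice, and what counts as "too many" depends on your insert pattern.

```sql
-- Partitions with the most active (unmerged) parts.
SELECT
    database,
    table,
    partition,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition
ORDER BY active_parts DESC
LIMIT 20;
```

A count that keeps rising for the same partition across successive runs suggests merges are not keeping up with inserts.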

system.disks exposes the metrics you need to catch this before it becomes an incident:

  • free_space: bytes available on the volume from the OS perspective.
  • total_space: total bytes on the volume.
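  • keep_free_space: the amount of space, in bytes, that should stay free on the disk. It is set with the keep_free_space_bytes parameter in the disk configuration.
  • unreserved_space: free space not already taken by reservations from running merges, inserts, and other writes. This is the space ClickHouse actually considers available for new writes and merges.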

Do not rely on df or free_space alone. df does not know about ClickHouse’s internal reservation, and free_space ignores the merge headroom requirement. The operational target is to keep disk usage below 80-85% and to maintain absolute headroom of at least 2x the size of the largest active part.

How it works

The merge headroom requirement

When the merge scheduler selects parts to combine, it estimates the output size and checks whether unreserved_space can accommodate it. If not, the merge does not start. There is no queue-with-backpressure: the merge simply stalls.

The largest possible temporary expansion is roughly the size of the largest active part in the table being merged. A conservative safety margin is therefore 2x the size of the largest active part on the volume: one copy for the sources, one for the result. In practice, keeping disk usage below 80-85% usually satisfies this for most workloads, but the percentage alone is not sufficient if your parts are unusually large.

You can check the largest active part directly:

SELECT
    database,
    table,
    name,
    formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE active = 1
ORDER BY bytes_on_disk DESC
LIMIT 1;
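
The 2x rule can also be checked per disk in one query. This is a sketch under the assumption that comparing unreserved_space with twice the largest active part on the same disk is an acceptable approximation of merge headroom; the alias names (largest_part_bytes, meets_2x_headroom) are illustrative.

```sql
-- Compare each disk's unreserved space with 2x its largest active part.
SELECT
    d.name AS disk,
    formatReadableSize(d.unreserved_space) AS unreserved,
    formatReadableSize(p.largest_part_bytes) AS largest_active_part,
    d.unreserved_space >= 2 * p.largest_part_bytes AS meets_2x_headroom
FROM system.disks AS d
LEFT JOIN
(
    -- Largest active part stored on each disk.
    SELECT disk_name, max(bytes_on_disk) AS largest_part_bytes
    FROM system.parts
    WHERE active = 1
    GROUP BY disk_name
) AS p ON d.name = p.disk_name
ORDER BY disk;
```

Disks that hold no parts show a largest part of 0 bytes with default join settings, so they trivially pass the check.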

Reserved space and unreserved_space

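keep_free_space_bytes is configured per disk in the ClickHouse storage configuration and appears in system.disks as the keep_free_space column. ClickHouse holds this amount back when deciding whether a write or merge fits on the disk. unreserved_space is free_space minus the space already reserved by merges, inserts, and other writes in flight.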

If keep_free_space_bytes is set too low (or left at zero), ClickHouse will schedule merges that bring OS-level free space dangerously close to zero. If it is set too high, you strand capacity that could absorb normal ingest. Most production deployments set this to a fixed value representing the expected largest merge temporary space, or leave it unset and enforce the 80-85% target externally.

Query the current state:

SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(100 * (1 - toFloat64(free_space) / total_space), 1) AS used_pct,
    formatReadableSize(unreserved_space) AS unreserved,
    formatReadableSize(keep_free_space) AS keep_free
FROM system.disks;

flowchart TD
    A[system.disks total_space] --> B[Used by data parts and system tables]
    A --> C[free_space]
    C --> D[keep_free_space_bytes]
    C --> E[unreserved_space]
    E --> F{Headroom > 2x largest active part?}
    F -->|Yes| G[Merges complete<br/>Old parts deleted]
    G --> H[Space reclaimed]
    F -->|No| I[Merges stall]
    I --> J[Parts accumulate]
    J --> K[TTL cleanup stops]
    K --> L[Disk fills faster]
    L --> I

Tiered storage and local cache

In tiered storage deployments, hot data may reside on local NVMe while cold data is on S3 or another object store. The local disk can still fill from cached blocks from cold data reads, write buffers before offloading to remote storage, and temporary merge artifacts for parts that have not yet been moved. Always monitor system.disks for every configured volume, not just the primary data path.
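
To see every disk behind every volume in one result, system.storage_policies can be joined to system.disks. This is a sketch: remote or object-storage disks may report placeholder values for total_space and free_space, so treat used_pct as meaningful only for local disks.

```sql
-- One row per (policy, volume, disk) with its current space figures.
SELECT
    sp.policy_name,
    sp.volume_name,
    d.name AS disk,
    d.path,
    formatReadableSize(d.unreserved_space) AS unreserved,
    round(100 * (1 - d.free_space / d.total_space), 1) AS used_pct
FROM
(
    -- Expand the array of disk names so each disk gets its own row.
    SELECT policy_name, volume_name, arrayJoin(disks) AS disk_name
    FROM system.storage_policies
) AS sp
INNER JOIN system.disks AS d ON sp.disk_name = d.name
ORDER BY sp.policy_name, sp.volume_name, disk;
```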

Where it shows up in production

Ingest rate exceeding cleanup rate. If daily inserts grow faster than TTL expiry plus merge consolidation, net disk usage trends upward. The runway is (unreserved_space - safety_margin) / daily_net_growth_rate. When this falls below your operational planning horizon (typically 7 days), act.
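
The runway formula can be sketched directly in SQL. The growth figure below (50 GiB per day) is a made-up placeholder that you would replace with a value measured from your own history, and the safety margin is assumed to be 2x the largest active part across all disks, which is conservative on multi-disk nodes.

```sql
WITH
    -- ASSUMPTION: replace with your measured daily net growth in bytes.
    toUInt64(50) * 1024 * 1024 * 1024 AS daily_net_growth_bytes,
    -- Safety margin: 2x the largest active part on the server.
    (SELECT 2 * max(bytes_on_disk) FROM system.parts WHERE active = 1) AS safety_margin_bytes
SELECT
    name AS disk,
    formatReadableSize(unreserved_space) AS unreserved,
    formatReadableSize(safety_margin_bytes) AS safety_margin,
    -- Guard against unsigned underflow when the margin already exceeds unreserved space.
    if(unreserved_space > safety_margin_bytes,
       round((unreserved_space - safety_margin_bytes) / daily_net_growth_bytes, 1),
       0) AS runway_days
FROM system.disks;
```

A runway_days value below your planning horizon is the signal to act.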

Mutation backlog. ALTER UPDATE and ALTER DELETE rewrite entire parts. A large mutation temporarily increases disk usage by the size of the affected parts until the mutated parts replace the originals. A full disk during a mutation can stall both the mutation and all merges on that volume.
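
Unfinished mutations are listed in system.mutations. A minimal query for the backlog, oldest first:

```sql
-- Mutations that have not finished, with remaining work and last failure.
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    parts_to_do,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
```

A non-empty latest_fail_reason or a parts_to_do value that does not decrease over time indicates a stuck mutation.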

System table growth. system.query_log, system.part_log, and system.text_log are MergeTree tables. On high-QPS systems they accumulate parts and bytes like any other table. If their TTL is not configured, they can consume tens or hundreds of gigabytes and themselves contribute to the disk pressure that blocks merges.
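
The footprint of the system log tables can be measured the same way as any other table. A sketch that sums active parts in the system database:

```sql
-- Disk usage of tables in the system database, largest first.
SELECT
    table,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active = 1 AND database = 'system'
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;
```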

Silent TTL failure. TTL cleanup is executed during merges. If merges stall because of low disk space, TTL stops removing expired data. The disk pressure therefore increases from both sides: new inserts arrive and old data refuses to leave.
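
Whether merges are actually running can be confirmed in system.merges, which lists only merges and part mutations currently in progress. An empty result during sustained inserts is a hint, not proof, that merges have stalled.

```sql
-- Merges and part mutations currently executing, longest-running first.
SELECT
    database,
    table,
    round(elapsed, 1) AS elapsed_seconds,
    round(100 * progress, 1) AS progress_pct,
    num_parts,
    formatReadableSize(total_size_bytes_compressed) AS source_size,
    is_mutation
FROM system.merges
ORDER BY elapsed DESC;
```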

Tradeoffs and common misuses

  • Monitoring df instead of system.disks: df shows the OS view, but ClickHouse respects keep_free_space_bytes and storage policies. Use system.disks as the primary source of truth for ClickHouse-specific capacity decisions.
  • Treating 100% as the limit: ClickHouse needs free space to function, not just to store data. The operational limit is 80-85%, not 100%. Above roughly 85%, merges become likely to stall and the death spiral can begin.
  • Ignoring absolute headroom: A 1 TB volume at 80% usage still has 200 GB free. If your largest active part is 150 GB, the 2x rule calls for 300 GB of headroom, so a merge involving that part may not be scheduled. Percentage targets are necessary but not sufficient without checking the largest part size.
  • Forgetting tiered storage cache disks: Object storage backends still require local cache or buffer disks. These volumes can fill independently and block writes even though “most data is in S3.”
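
The absolute-headroom example above can be worked through as plain arithmetic. The constants are the figures from the bullet (1 TB taken as 1000 GB, 80% used, 150 GB largest part); nothing here reads from a live server.

```sql
WITH
    1000 AS total_gb,
    80 AS used_pct,
    150 AS largest_part_gb
SELECT
    intDiv(total_gb * (100 - used_pct), 100) AS free_gb,       -- 200
    2 * largest_part_gb AS headroom_needed_gb,                 -- 300
    free_gb >= headroom_needed_gb AS meets_2x_headroom;        -- 0 (false)
```

The volume passes the 80% target and still fails the 2x headroom check.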

Signals to watch in production

Signal | Why it matters | Warning sign
system.disks.unreserved_space | The actual space ClickHouse can use for merges and inserts | Trending downward or approaching 2x largest active part
system.disks used percentage | Proximity to merge stall threshold | Sustained > 80% or > 85%
Largest active part size | Determines minimum temporary space for a merge | Larger than half of unreserved_space
Active part count per partition | Stalled merges cause parts to accumulate | Growing while merge activity is flat or zero
Merge activity (system.merges) | Confirms merges are running and completing | No merges running despite active inserts
Daily net growth rate | Runway estimation for capacity planning | (unreserved - margin) / net_growth < 7 days
System table disk usage (system.parts.bytes_on_disk) | Unbounded system logs can fill the disk | query_log or part_log growing without TTL
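
The insert-side signals can be read from the server's built-in counters. This sketch assumes the DelayedInserts metric and the DelayedInserts and RejectedInserts events are present in your version; system.events lists only events that have occurred at least once, so an empty second result is normal on a healthy server.

```sql
-- INSERT queries currently throttled because a partition has too many active parts.
SELECT metric, value
FROM system.metrics
WHERE metric = 'DelayedInserts';

-- Cumulative throttled and rejected inserts since server start.
SELECT event, value
FROM system.events
WHERE event IN ('DelayedInserts', 'RejectedInserts');
```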
Insert latency / DelayedInsertsEarly warning that parts are accumulatingInsert latency rising before rejections appear

How Netdata helps

Netdata collects system.disks metrics (free_space, unreserved_space, total_space, keep_free_space_bytes) and visualizes them alongside OS disk metrics. Disk space alerts can be correlated with active part count and merge activity to expose the death spiral before inserts are rejected. Netdata tracks insert latency and DelayedInserts as leading indicators. In tiered storage setups, local cache disk utilization is monitored separately from overall node storage, surfacing cache-fill issues that df on the data directory would miss. Long-term disk growth trends and ingest-rate baselines estimate runway days without manual SQL queries.