ClickHouse mark cache and uncompressed cache: reading low hit rates

ClickHouse query latency spikes after a restart, or monitoring shows a sustained drop in mark cache hit rate. Before increasing mark_cache_size, determine whether you are seeing normal warmup or a cache that is too small for the working set.

The mark cache stores marks: the offsets that tell ClickHouse where each granule begins in a column file. The uncompressed cache stores decompressed column blocks. A low hit rate in each produces different symptoms. This guide covers how to read the metrics, distinguish warmup from real problems, and decide when tuning is warranted.

What these caches are and why they matter

ClickHouse uses the primary key to locate column blocks. The mark cache keeps granule positions from mark files in memory. Without it, every query reads these small index files from disk, turning memory-bound seeks into random I/O. On a server with many parts, this random I/O dominates query latency.

The uncompressed cache sits downstream. When enabled, it holds decompressed column blocks in memory. Re-reading the same block returns the cached copy instead of reading compressed data from disk and decompressing it. This cache is optional and disabled by default.

Both are bounded by server settings. mark_cache_size limits the mark cache. uncompressed_cache_size limits the uncompressed cache. They compete for memory with query execution buffers, merges, and the OS page cache.

How the caches work

ClickHouse populates the mark cache lazily. The first query that touches a part reads its mark files from disk and inserts them into the cache. Subsequent queries reuse the cached positions. Mark files are small and numerous; a server with thousands of active parts generates heavy random read I/O during initial population. The aggregate footprint of all mark files across active parts represents the theoretical memory needed to cache the full working set, though in practice only hot parts need to be resident.

The uncompressed cache behaves similarly when enabled. A block is decompressed on first read and cached. It is most effective when a small set of column blocks is read repeatedly, such as dashboard queries scanning the same recent partition. Ad-hoc analytical queries that scan cold partitions rarely benefit and can pollute the cache.

Merges invalidate cached state. A merged part has new mark files, and cached marks for the source parts are useful only until those parts are marked obsolete. Queries must then warm the cache for the new merged part, so merge storms produce temporary churn and higher miss rates. DROP TABLE or TRUNCATE also invalidates the relevant entries immediately, which can appear as a sudden hit rate drop if a monitoring window overlaps with DDL.

After a restart, the mark cache is empty. ClickHouse reads mark files aggressively as queries arrive, causing a burst of disk read IOPS and elevated CPU. On servers with many parts, this can last from minutes to over an hour. Query latency during this window is often significantly higher than baseline. If the OS page cache still held mark files from before the restart, warmup may finish sooner; after memory reclamation or on a new node, the disk read burst is unavoidable. This is expected, not a configuration error.

Inspect cumulative counters in system.events:

SELECT event, value FROM system.events
WHERE event IN ('MarkCacheHits', 'MarkCacheMisses',
                'UncompressedCacheHits', 'UncompressedCacheMisses');

The values accumulate from server start, so a single reading reflects the whole uptime rather than current behavior. Compute the hit rate across a time window by sampling twice and taking the ratio of deltas:

-- Run twice with a known interval, then calculate:
-- (MarkCacheHits_delta) / (MarkCacheHits_delta + MarkCacheMisses_delta)
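
The delta calculation above can be sketched in Python. This is a minimal illustration, not part of ClickHouse: the function name `hit_rate` and its parameters are hypothetical, and the four inputs are assumed to be values copied from two runs of the system.events query.

```python
from typing import Optional


def hit_rate(hits_before: int, misses_before: int,
             hits_after: int, misses_after: int) -> Optional[float]:
    """Hit rate over the window between two samples of cumulative counters.

    Returns None when the window cannot be evaluated.
    """
    hits_delta = hits_after - hits_before
    misses_delta = misses_after - misses_before

    # A negative delta means the counters restarted from zero between the
    # two samples (for example, a server restart), so the window is invalid.
    if hits_delta < 0 or misses_delta < 0:
        return None

    lookups = hits_delta + misses_delta
    if lookups == 0:
        return None  # no cache lookups happened in the window

    return hits_delta / lookups


# Example with made-up counter values: 900 hits and 100 misses in the window.
print(hit_rate(1000, 200, 1900, 300))  # 0.9
```

Returning None instead of 0.0 for an empty or invalid window keeps an idle server from looking like a server with a cold cache.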

Current memory footprint is visible in system.asynchronous_metrics:

SELECT metric, value FROM system.asynchronous_metrics
WHERE metric IN ('MarkCacheBytes', 'UncompressedCacheBytes');

Check whether the uncompressed cache is enabled:

SELECT name, value FROM system.settings WHERE name = 'use_uncompressed_cache';

Estimate the working set size in parts:

SELECT
    database,
    table,
    count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC;

The read path through both caches, as a Mermaid flowchart:

flowchart TD
    Query[Query planner] --> Marks{Mark cache lookup}
    Marks -->|Hit| Seek[Fast granule seek]
    Marks -->|Miss| DiskMarks[Read mark file from disk]
    DiskMarks --> PopMarks[Populate mark cache]
    Seek --> Data{Uncompressed cache lookup}
    PopMarks --> Data
    Data -->|Hit| Return[Return block]
    Data -->|Miss| ReadComp[Read compressed from disk]
    ReadComp --> Decomp[Decompress]
    Decomp --> PopUnc[Populate uncompressed cache if enabled]
    PopUnc --> Return
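
The mark cache branch of this read path (lookup, read from disk on a miss, populate) can be modeled with a toy cache. This is a sketch under loud assumptions: it uses plain LRU eviction and counts capacity in entries rather than bytes, which is a simplification and not a description of ClickHouse's actual eviction policy. The names `ToyCache` and `replay` are hypothetical.

```python
from collections import OrderedDict


class ToyCache:
    """Lazily populated cache with LRU eviction (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # assumption: capacity in entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        # Miss: load from "disk", then populate the cache.
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value


def replay(capacity: int, accessed_parts):
    """Replay a sequence of part accesses and return (hits, misses)."""
    cache = ToyCache(capacity)
    for part in accessed_parts:
        cache.get_or_load(part, lambda p: f"marks for {p}")
    return cache.hits, cache.misses


# Working set fits: only the first touch of each part misses.
print(replay(3, ["p1", "p2", "p3"] * 2))  # (3, 3)
# Working set is one part too large: every access misses.
print(replay(2, ["p1", "p2", "p3"] * 2))  # (0, 6)
```

The second call shows why an undersized cache can look far worse than its size deficit suggests: under this simplified policy, a cyclic access pattern slightly larger than capacity evicts each entry just before it is needed again.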

Where low hit rates show up in production

Low mark cache hit rate appears as elevated disk read IOPS during queries that should be cache-bound. If storage metrics show sustained random reads on small files while query latency is high and CPU is not saturated, the mark cache is likely undersized or cold. You may also see elevated FileOpen counts in system.events as queries open mark files.

Low uncompressed cache hit rate appears as elevated CPU usage during repeated scans of the same data. If the cache is enabled but hit rates stay low, either the working set exceeds cache capacity or the access pattern does not revisit blocks often enough.

After a restart, both metrics look bad by design. Latency spikes, disk IOPS surge, and CPU climbs. On servers with many parts, reaching steady state can take from minutes to over an hour. Monitoring that does not account for this warmup will raise false alerts.

Merge activity also produces transient drops. When a large merge completes, the new part’s marks are uncached until queries access it. During heavy merge backlogs, churn can keep hit rates depressed longer than a restart warmup.
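
One way to keep monitoring from firing during these windows is to require a sustained run of low samples and to ignore samples taken during warmup or merge churn. The sketch below is an assumption-laden illustration: the function name `should_alert`, the 0.8 threshold, and the five-sample run length are all hypothetical and need tuning per workload.

```python
from typing import Iterable, Optional, Tuple


def should_alert(samples: Iterable[Tuple[Optional[float], bool]],
                 threshold: float = 0.8,
                 min_consecutive: int = 5) -> bool:
    """Decide whether the most recent samples justify an alert.

    Each sample is (hit_rate, excluded), oldest first. `excluded` is True
    when the sample falls in a known warmup or merge window. A hit_rate of
    None means the window had no lookups.
    """
    streak = 0
    for rate, excluded in samples:
        if excluded or rate is None or rate >= threshold:
            # Excluded, empty, or healthy samples break the low streak.
            streak = 0
        else:
            streak += 1
    return streak >= min_consecutive


low = (0.5, False)
print(should_alert([low] * 5))                   # True
print(should_alert([low] * 4 + [(0.5, True)]))   # False
```

Resetting the streak on an excluded sample is the conservative choice: the alert fires only after a full run of low samples outside any known window.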

Distinguishing normal warmup from real problems

A low hit rate is only a problem when it persists outside warmup and merge churn.

Restart warmup. Hit rate starts near zero and climbs. Disk IOPS and CPU spike together. Latency recovers as the cache fills. This is normal.

Merge churn. Hit rate dips when large merges complete and recovers as queries access new parts. The dip correlates with elevated entries in system.merges. Correlating system.merges start time with cache metrics confirms the pattern.

Chronic undersizing. Hit rate stays low during steady-state operation. Disk read IOPS remains elevated for index lookups, and query latency stays high. Before resizing, confirm that MarkCacheBytes is near mark_cache_size. A full cache with a low hit rate means the working set is larger than the limit; if the cache is not full, capacity is not the constraint and raising the limit will not help.

Memory pressure eviction. If MemoryTracking approaches the server limit or OSMemoryAvailable is low, ClickHouse may evict cache entries to make room for query buffers. Hit rates drop even though mark_cache_size is theoretically sufficient. In this case, add memory or reduce query concurrency before increasing cache limits.
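
The four cases above can be expressed as a small decision function. This is a sketch, not a diagnostic tool: the `Snapshot` fields, the function name `classify`, the order of the checks, and every threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    hit_rate: float              # mark cache hit rate over a recent window
    uptime_seconds: int          # time since server start
    active_merges: int           # rows currently in system.merges
    cache_bytes: int             # MarkCacheBytes
    cache_limit_bytes: int       # mark_cache_size
    memory_tracking_bytes: int   # MemoryTracking
    memory_limit_bytes: int      # server memory limit


def classify(s: Snapshot,
             low_hit_rate: float = 0.8,    # assumed threshold
             warmup_seconds: int = 3600,   # assumed warmup allowance
             full_ratio: float = 0.9,      # "near mark_cache_size"
             memory_ratio: float = 0.9) -> str:
    if s.hit_rate >= low_hit_rate:
        return "healthy"
    if s.uptime_seconds < warmup_seconds:
        return "restart warmup"
    if s.memory_tracking_bytes >= memory_ratio * s.memory_limit_bytes:
        return "memory pressure"
    if s.active_merges > 0:
        return "merge churn"
    if s.cache_bytes >= full_ratio * s.cache_limit_bytes:
        return "chronic undersizing"
    # Low hit rate with a cache that is not full: look at the access pattern.
    return "unexplained"


GiB = 1024 ** 3
steady = Snapshot(hit_rate=0.4, uptime_seconds=86_400, active_merges=0,
                  cache_bytes=5 * GiB, cache_limit_bytes=5 * GiB,
                  memory_tracking_bytes=20 * GiB, memory_limit_bytes=64 * GiB)
print(classify(steady))  # chronic undersizing
```

A single snapshot is only a hint. Confirm any classification by correlating it with the trend over time, as described above.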

Tradeoffs and tuning considerations

Increasing mark_cache_size reduces disk seeks but consumes memory that could otherwise hold query working sets or the uncompressed cache. Treat the total mark file footprint across active parts as an upper bound, and size the cache to cover the parts that queries actually touch. Many wide tables with high part counts need more space than a single large table with few parts, because each part carries marks for every column and granule. Increase the setting incrementally; large jumps can starve queries and merges of memory, causing spill-to-disk or OOM kills elsewhere.
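
Incremental sizing can be sketched as a bounded step. Everything here is an assumption for illustration: the function name `next_mark_cache_size`, the 25% step, and the rule of leaving half of the currently free memory untouched are not ClickHouse recommendations.

```python
def next_mark_cache_size(current_bytes: int,
                         marks_footprint_bytes: int,
                         free_memory_bytes: int,
                         step: float = 0.25,
                         reserve: float = 0.5) -> int:
    """Propose the next mark_cache_size value, in bytes.

    Grows by at most `step` of the current size, never beyond the total
    mark file footprint, and never by more than the share of free memory
    left after keeping `reserve` untouched for queries and merges.
    """
    target = min(marks_footprint_bytes, int(current_bytes * (1 + step)))
    if target <= current_bytes:
        return current_bytes  # already covers the footprint
    growth = target - current_bytes
    budget = int(free_memory_bytes * (1 - reserve))
    return current_bytes + min(growth, budget)


GiB = 1024 ** 3
# 5 GiB cache, 12 GiB of marks on disk, 8 GiB free: step up to 6.25 GiB.
print(next_mark_cache_size(5 * GiB, 12 * GiB, 8 * GiB) / GiB)  # 6.25
```

After each step, re-measure hit rate, latency, and memory headroom before stepping again.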

The uncompressed cache is not universally beneficial. It helps when queries repeatedly scan the same column blocks, but on workloads with large, non-repeating scans, it wastes memory. Enable it only after confirming that the workload revisits the same blocks and that memory headroom exists.

Do not react to hit rate alone. Production nodes must balance cache size against query concurrency and merge memory. If P99 latency is acceptable and disk IOPS is within capacity, a moderate hit rate may be fine.

Signals to watch in production

| Signal | Why it matters | Warning sign |
|---|---|---|
| Mark cache hit rate | Index lookups from disk add latency | Sustained low rate outside restart or merge windows |
| Uncompressed cache hit rate | Repeated decompression wastes CPU | Sustained low rate when the cache is enabled |
| Disk read IOPS | Spikes indicate mark cache misses | Sustained elevation correlated with low mark cache hit rate |
| Query latency P99 | User-facing impact of cache misses | Sustained elevation above baseline during normal operation |
| Memory tracking | Caches compete with query working memory | MemoryTracking approaching server limit, with hit rate dropping |
| Active parts count | More parts mean more marks to cache | Part growth exceeding the cache’s ability to hold working set marks |
| Merge activity | New parts invalidate old cached marks | Hit rate dips that track with merge completion |

How Netdata helps

Use Netdata to correlate mark cache hit rate with disk IOPS and query latency. This distinguishes cache misses from storage saturation. Track MemoryTracking alongside cache byte metrics to detect memory pressure evictions. Compare CPU with disk read IOPS to identify whether a spike is seek-bound or decompression-bound. Alert on sustained low mark cache hit rate only after excluding known warmup and merge windows.