ClickHouse Memory limit (total) exceeded - server-wide memory pressure and fixes
Code: 241. DB::Exception: Memory limit (total) exceeded: would use X bytes, current RSS Y, maximum Z. is the server-level cap, not a per-query limit. When ClickHouse’s MemoryTracking hits max_server_memory_usage (default 90% of physical RAM), the server kills the heaviest running queries to protect the process. New and existing queries fail until memory drops. Find the largest memory consumer and stop it before the OOM killer does.
Unlike a per-query max_memory_usage breach, the (total) variant means the aggregate across all queries, background merges, caches, and dictionaries has crossed the server ceiling. It often arrives suddenly: a runaway JOIN, a burst of concurrent heavy GROUP BY queries, or a background merge spike pushes the tracked total over the line. ClickHouse uses a hierarchical memory tracker (server to user to query), and the server limit is the final backstop.
This error does not mean you need more RAM. It means the current workload is unsafe for the configured limit. Diagnose whether the pressure comes from a single query, concurrency, or background work, and recover without restarting the server.
What this means
ClickHouse tracks allocations through a hierarchy: a server-level MemoryTracker, then per-user trackers, then per-query trackers. The server limit is controlled by max_server_memory_usage. Left at default, ClickHouse derives the limit from physical RAM and caps tracked memory at 90%. That value lives in system.metrics as MemoryTracking.
When MemoryTracking approaches that ceiling, the server kills the heaviest queries to free memory. The error message Memory limit (total) exceeded is exception code 241. It is distinct from a query-level breach, which references per-query max_memory_usage instead of the server total.
The tracked value is an approximation. Resident Set Size (RSS) reported by the OS can exceed MemoryTracking because of allocator overhead, memory-mapped files, and untracked allocations. Relying on MemoryTracking alone can miss fragmentation that still exposes the process to the OOM killer.
flowchart TD
A[Physical RAM] --> B[max_server_memory_usage]
B --> C[Server MemoryTracker]
C --> D[User tracker]
C --> E[Background merges and caches]
D --> F[Query A]
D --> G[Query B]
C -.-> H[Kill heaviest query when limit exceeded]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Runaway query with bad JOIN or huge GROUP BY | One query in system.processes has peak_memory_usage far above the rest, often with long elapsed time | system.processes ordered by memory_usage DESC |
| Concurrent heavy query storm | Many queries each using moderate memory; sum(memory_usage) from system.processes is near the server limit | Total query memory versus MemoryTracking |
| Background merge or mutation spike | Merge memory_usage in system.merges is elevated while active query count is low | SELECT sum(memory_usage) FROM system.merges |
| Cache or dictionary growth | MemoryTracking climbs steadily without heavy queries; dictionaries may be reloading | system.dictionaries and cache metrics in system.asynchronous_metrics |
| Spill-to-disk feedback loop | /var/lib/clickhouse/tmp/ is growing; query latency rises as I/O substitutes for RAM | Temp directory size and query_log for spilling patterns |
Quick checks
# Check OS-level resident memory for the ClickHouse process
for pid in $(pgrep clickhouse-server); do echo "PID $pid"; grep -E 'VmRSS|VmSize|VmPeak' /proc/$pid/status; done
-- Server-level tracked memory and OS-level resident memory
SELECT
(SELECT value FROM system.metrics WHERE metric = 'MemoryTracking') AS tracked_bytes,
(SELECT formatReadableSize(value) FROM system.asynchronous_metrics WHERE metric = 'MemoryResident') AS rss,
(SELECT formatReadableSize(value) FROM system.asynchronous_metrics WHERE metric = 'OSMemoryAvailable') AS os_available
-- Heaviest running queries by current memory
SELECT
query_id,
user,
elapsed,
formatReadableSize(memory_usage) AS current_mem,
formatReadableSize(peak_memory_usage) AS peak_mem,
substring(query, 1, 120) AS query_prefix
FROM system.processes
WHERE memory_usage > 0
ORDER BY memory_usage DESC
LIMIT 10;
-- Total memory held by all running queries
SELECT formatReadableSize(sum(memory_usage)) AS total_query_memory FROM system.processes;
-- Total memory held by background merges
SELECT formatReadableSize(sum(memory_usage)) AS total_merge_memory FROM system.merges;
-- Recent MEMORY_LIMIT_EXCEEDED errors
SELECT
event_time,
query_id,
user,
substring(query, 1, 120) AS query_prefix
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND exception_code = 241
AND event_time > now() - INTERVAL 10 MINUTE
ORDER BY event_time DESC;
# Check for spill-to-disk activity
ls -lh /var/lib/clickhouse/tmp/ 2>/dev/null | head -20
How to diagnose it
- Confirm the server limit is the problem. Query
system.metricsforMemoryTracking. If it is nearmax_server_memory_usageand the error says(total), you are at the server cap. - Classify the consumer. Compare total query memory (
sumfromsystem.processes) against total merge memory (sumfromsystem.merges). If queries dominate, the fix is query-side. If merges dominate, the pressure is background workload. - Find the runaway query. Sort
system.processesbymemory_usageDESC. Notequery_id,user,elapsed, andpeak_memory_usage. A single query using more than 50% of the server limit is the obvious target, and it is likely the one the server will kill first. - Check for a concurrency pile-up. If no single query is huge but the sum of query memory exceeds 70% of the server limit, too many concurrent heavy queries are the issue.
- Inspect recent exceptions. Query
system.query_logforexception_code = 241in the last 10 minutes. If the same query pattern repeats, it is a bad query. If many distinct queries fail, the system is universally saturated. - Look for spill-to-disk. If
max_bytes_before_external_group_byis configured, queries may spill to/var/lib/clickhouse/tmp/instead of failing. Large temp files mean memory pressure has already forced queries onto disk, which can starve merges of I/O and worsen the cascade. - Compare RSS to tracked memory. Check
system.asynchronous_metricsMemoryResidentagainstMemoryTracking. If RSS is significantly higher, untracked allocations or allocator fragmentation are contributing. This explains why the OOM killer may strike before ClickHouse throttles queries. - If queries and merges do not explain the pressure, check
system.dictionariesfor recent reloads and review mark cache size insystem.asynchronous_metrics. Large dictionaries and oversized caches can consume server-level memory outside of query execution.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
MemoryTracking / max_server_memory_usage | Tracked allocation approaching the server cap | Tracked memory > 80% of limit |
MemoryResident (RSS) | Actual OS memory; the OOM killer sees this, not the internal tracker | RSS > 90% of physical RAM |
Per-query memory_usage | Identifies runaway queries before they threaten the server | Single query > 50% of server limit |
Sum of query memory_usage | Reveals concurrency-driven pressure | Total query memory > 70% of server limit |
Merge memory_usage from system.merges | Background merges can consume significant RAM during large consolidations | Merge memory unexpectedly high while parts accumulate |
Exception code 241 in query_log | Early indicator of memory pressure trend | Sustained rate above baseline |
OSMemoryAvailable | System-level headroom before OOM | Available memory < 5% of total |
| Spill-to-disk temp directory growth | Signals queries are trading memory for I/O, which can cascade into merge starvation | /var/lib/clickhouse/tmp/ growing steadily |
Fixes
Kill the runaway query
Use KILL QUERY WHERE query_id = '...' on the heaviest consumer from system.processes. This is the fastest way to drop memory usage. The tradeoff is lost work. For ad-hoc analytics this is usually acceptable. For ETL jobs, coordinate with the owner before killing.
Reduce concurrent query load
If many moderate queries exhaust memory together, lower max_concurrent_queries temporarily to force queuing instead of simultaneous execution. This prevents aggregate memory from spiking. The tradeoff is increased query latency, but it keeps the server stable.
Enforce per-query memory limits
Set max_memory_usage in user profiles so individual queries hit their own ceiling before threatening the server limit. The tradeoff is that legitimate large queries fail earlier, but they fail predictably and do not destabilize the node.
Tune spill-to-disk behavior
If you configure max_bytes_before_external_group_by, queries spill rather than fail. When spill volume in /var/lib/clickhouse/tmp/ grows, disk I/O rises and merges slow down. Raise the threshold only if you have proven I/O headroom, or lower it to fail fast and force query rewriting. The tradeoff is query failure versus system-wide I/O saturation.
Reduce cache or dictionary memory
If mark cache or uncompressed cache is oversized relative to RAM, reduce mark_cache_size or uncompressed_cache_size in the server configuration. If large dictionaries reload during peak hours, reschedule reloads or reduce dictionary size. The tradeoff is more disk I/O on cold queries.
Optimize the underlying queries
Fix Cartesian products by adding proper JOIN conditions. Add LIMIT to high-cardinality GROUP BY queries. Reduce DISTINCT on large datasets. These changes reduce peak_memory_usage at the source. This is the only fix that does not trade capacity for stability, but it requires development time.
Prevention
- Configure
max_memory_usageper user profile so analytical users cannot allocate unbounded memory. - Set
max_concurrent_queriesbased on worst-case memory: typical heavy query memory multiplied by desired concurrency should stay below 70% ofmax_server_memory_usage. - Monitor the 95th percentile of
MemoryTrackingover 5-minute windows. If the peak trends upward over days, investigate before it intersects the limit. - Require
EXPLAINor dry-run review for new JOIN andGROUP BYpatterns against large tables. - Alert when
MemoryResidentexceedsMemoryTrackingby a large ratio. This catches allocator fragmentation and untracked growth that the internal tracker misses. - Disable or minimally configure swap. ClickHouse on swap is effectively unresponsive.
How Netdata helps
Netdata tracks the clickhouse-server process RSS alongside system-level memory metrics. Compare RSS against ClickHouse’s internal MemoryTracking to spot divergence before the OOM killer intervenes.
The ClickHouse collector exposes MemoryTracking, query counts, and exception rates. Correlate spikes in code 241 with query concurrency and memory metrics.
Disk latency and I/O utilization charts show when spill-to-disk from memory pressure begins starving background merges.
Process memory charts show whether clickhouse-server RSS climbs independently of tracked query memory, indicating untracked allocation or cache growth.
Query latency metrics help spot the performance degradation that precedes explicit memory limit errors.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse async inserts: when async_insert fixes too-many-parts and when it hides it
- ClickHouse DelayedInserts climbing: the warning before too-many-parts
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate
- ClickHouse merge duration climbing: the leading indicator of part explosion
- ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool
- ClickHouse monitoring checklist: the signals every production cluster needs
- ClickHouse monitoring maturity model: from survival to expert
- ClickHouse projections and hidden parts: the part count you can’t see
- ClickHouse small inserts anti-pattern: why single-row inserts melt the merge pool
- ClickHouse too many partitions: why over-partitioning multiplies your part count
- ClickHouse DB::Exception: Too many parts - causes and fixes







