$ guides / clickhouse / clickhouse-memory-limit-for-query-exceeded ▌

Operations Guides

ClickHouse Memory limit (for query) exceeded: per-query limits and GROUP BY/JOIN blowups

Code: 241. DB::Exception: Memory limit (for query) exceeded means a single query’s allocations breached the max_memory_usage ceiling. ClickHouse tracks memory in a hierarchy: server-wide, per-user, and per-query. The server kills the query to protect the rest of the workload.

This differs from Memory limit (total) exceeded (server-level pressure) and Memory limit (for user) exceeded (profile-level pressure). A query-level breach usually stems from one of three patterns: a high-cardinality GROUP BY, an unbounded DISTINCT, or a JOIN that materializes more rows than expected.

What this means

The per-query memory tracker enforces max_memory_usage. The effective value comes from the query SETTINGS clause, the user profile, or the global default. Verify the effective setting rather than assuming a specific value.

When ClickHouse kills a query, it frees the allocations, but that memory may not return to the operating system immediately because of allocator retention (jemalloc). MemoryResident (RSS) can stay high after the query dies. If your client retries immediately, the freed memory may be reallocated before it can be reused, creating a retry loop that amplifies pressure.

Common causes

Cause	What it looks like	First thing to check
High-cardinality `GROUP BY`	Query groups on a column with millions of distinct values (e.g., UUID, device ID) and builds a large hash table	`system.query_log` for queries with `GROUP BY` and high `read_rows`
Unbounded `DISTINCT`	`SELECT DISTINCT high_cardinality_column FROM huge_table` with no `LIMIT`	`system.query_log` for `DISTINCT` patterns and peak memory
Cartesian-product `JOIN`	`JOIN` with a missing or weak condition, producing billions of candidate rows	`system.query_log` for `JOIN` queries where `read_rows` far exceeds table sizes
Concurrent heavy queries	Several large queries run together; none is individually huge, but aggregate memory approaches the server limit	`system.processes` sorted by `memory_usage DESC`
Missing per-query limits	Queries run with `max_memory_usage` effectively unbounded for the user profile	Effective settings for the query or user profile
Spill-to-disk disabled	A query that could spill intermediate state to disk instead holds everything in RAM because thresholds are unset	`max_bytes_before_external_group_by` and `max_bytes_before_external_sort` values

Quick checks

-- Server tracked memory vs available OS memory
SELECT
    (SELECT value FROM system.metrics WHERE metric = 'MemoryTracking') AS tracked,
    (SELECT value FROM system.asynchronous_metrics WHERE metric = 'OSMemoryAvailable') AS available;

-- Top memory consumers right now
SELECT
    query_id,
    user,
    formatReadableSize(memory_usage) AS mem,
    formatReadableSize(peak_memory_usage) AS peak,
    substring(query, 1, 120) AS query_prefix
FROM system.processes
WHERE memory_usage > 0
ORDER BY memory_usage DESC
LIMIT 10;

-- Recent memory limit failures
SELECT
    event_time,
    query_id,
    user,
    exception,
    substring(query, 1, 120) AS query_prefix
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
  AND exception LIKE '%Memory limit%'
  AND event_time > now() - INTERVAL 30 MINUTE
ORDER BY event_time DESC;

# Approximate temp directory usage. Path may vary based on clickhouse-server config.
du -sh /var/lib/clickhouse/tmp/

# OS-level memory for the ClickHouse process
cat /proc/$(pgrep clickhouse-server)/status | grep -E 'VmRSS|VmPeak'

How to diagnose it

Confirm the scope. Memory limit (for query) exceeded is a per-query kill. Memory limit (total) exceeded means the server-level max_server_memory_usage is at risk.
Find the culprit. If the error is ongoing, query system.processes ORDER BY memory_usage DESC. If it is historical, use system.query_log filtered for exception code 241 in the relevant window.
Count the incidents. A single query_id failing with code 241 is usually a bad ad-hoc query. A sustained rate of different query IDs failing with 241 indicates systemic pressure. Compute the failure rate over a 10-minute window.
Inspect the query pattern. Look for:
- GROUP BY with high-cardinality keys.
- DISTINCT without a LIMIT.
- JOIN with missing or weak ON conditions.
- Missing WHERE clauses on large tables.
Check effective memory settings. Verify max_memory_usage for the query’s user profile. Also check max_bytes_before_external_group_by and max_bytes_before_external_sort. If these are zero, the query cannot spill and must hold all intermediate state in memory.
Check for spill-to-disk side effects. If spill thresholds are set but the query still fails, temporary data may have hit disk-space constraints, or the query may have exhausted the limit before spill could activate. Check /var/lib/clickhouse/tmp/ growth during execution.
Correlate with server memory. Compare MemoryTracking to max_server_memory_usage. If tracked memory is above 70% of the server limit, query-level failures are symptoms of overallocation. Check for concurrent heavy queries, large merges, or cache eviction.

flowchart TD
    A[Error 241: query killed] --> B{One query?}
    B -->|Yes| C[Inspect pattern]
    B -->|No| D[Server pressure]
    C --> E[GROUP BY / DISTINCT / JOIN]
    E --> F[Check spill settings]
    E --> G[Check join limits]
    D --> H[Kill heaviest query]
    D --> I[Reduce concurrency]
    F --> J[Tune thresholds]
    G --> J

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`MemoryTracking` / `max_server_memory_usage`	Server-level tracked memory approaching the limit leaves no room for bursts	Sustained > 70% of limit
Per-query `peak_memory_usage` in `query_log`	Reveals which query patterns consume the most memory	A single query using > 50% of server limit
Exception code 241 rate in `query_log`	Direct measure of query-level memory kills	Sustained rate > 0.5% of total queries
`system.processes` memory usage	Live view of current allocation during incidents	Aggregate query memory approaching 70% of server limit
Temp directory size (`/var/lib/clickhouse/tmp/`)	Spill-to-disk turns memory pressure into I/O pressure	Growth during query execution
Mark cache hit rate	Memory pressure can evict caches, increasing disk I/O and slowing queries	Hit rate dropping below 80%

Fixes

High-cardinality GROUP BY

If the query groups by a high-cardinality column, the aggregation hash table grows linearly with the number of distinct keys. LIMIT does not reduce memory use because ClickHouse builds the full hash table before applying the limit.

Filter before grouping. Narrow the scan with WHERE or subqueries so fewer distinct keys enter the aggregation.
Enable external aggregation. Set max_bytes_before_external_group_by to a value lower than max_memory_usage so ClickHouse spills intermediate state to disk. The tradeoff is slower execution and temporary disk I/O.
Pre-aggregate. Replace the raw scan with a materialized view or an aggregation table that already groups by the required keys.

Unbounded DISTINCT

SELECT DISTINCT builds a hash set of all unique values in memory. There is no disk-spill equivalent for DISTINCT, and LIMIT does not reduce memory use because deduplication happens before the limit is applied. Rewrite the query to use GROUP BY (which can spill) or narrow the scan with WHERE.

Cartesian-product JOINs

A JOIN with a missing or weak condition can materialize far more rows than expected. If the join key has many duplicates on both sides, the result set explodes in memory.

Fix the condition. Ensure the JOIN has an explicit, selective ON clause. Pre-filter both tables in subqueries or CTEs before joining.
Set join limits. Configure max_bytes_in_join to a non-zero value so ClickHouse kills the query before it exhausts memory. join_overflow_mode controls whether it throws (default) or breaks.
Avoid unbounded cross products. Do not run JOINs without ON clauses on large tables unless the product is intentionally small.

Systemic memory pressure

If many queries fail with 241 and MemoryTracking is near the server limit:

Kill the heaviest running query. Use KILL QUERY WHERE query_id = '...'. This is disruptive to that client.
Reduce concurrency. If retries are immediate and compounding, temporarily lower max_concurrent_queries to reduce aggregate allocation.
Review cache sizes. mark_cache_size and uncompressed_cache_size count toward MemoryTracking. A large cache leaves less headroom for queries.

Prevention

Set max_memory_usage per user profile based on expected workload. Ad-hoc analytics users should have lower limits than ETL service accounts.
Configure max_bytes_before_external_group_by and max_bytes_before_external_sort for profiles running large aggregations.
Configure max_bytes_in_join for profiles that run JOINs.
Monitor peak_memory_usage by query pattern in system.query_log to catch new expensive queries before they hit production.
Avoid materializing cross products in JOINs on high-cardinality keys unless the application genuinely needs every match.
Do not rely on DISTINCT for large unbounded scans; use GROUP BY with spill configured.

How Netdata helps

Correlate MemoryTracking with MemoryResident (RSS) on the same chart to spot untracked allocator overhead or fragmentation that ClickHouse’s internal tracker misses.
Alert on FailedQuery growth from system.events, or configure log scraping for exception code 241, to detect memory limit errors without manual polling.
Correlate query latency spikes with memory saturation to identify when spill-to-disk is slowing execution instead of preventing failure.
Monitor temp directory growth and disk I/O to catch the side effects of external aggregation and sorting.
Track per-query memory attribution via system.processes during active incidents to identify runaway queries in real time.

The Netdata solution

ClickHouse monitoring with Netdata

Netdata monitors ClickHouse with per-second metrics and ML anomaly detection. Track merge debt, memory usage, replication lag, Keeper/ZooKeeper saturation, and disk headroom against the host signals that drive them.

See ClickHouse monitoring → Start monitoring free

ClickHouse Memory limit (for query) exceeded: per-query limits and GROUP BY/JOIN blowups

ClickHouse Memory limit (for query) exceeded: per-query limits and GROUP BY/JOIN blowups

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

High-cardinality GROUP BY

Unbounded DISTINCT

Cartesian-product JOINs

Systemic memory pressure

Prevention

How Netdata helps

Related guides

ClickHouse monitoring with Netdata