ClickHouse query error rate high: reading exception codes 241, 252, and 999
What this means
ClickHouse logs failed queries to system.query_log with type = ‘ExceptionWhileProcessing’ and a numeric exception_code. The FailedQuery counter in system.events increments for each failure. Keep the FailedQuery / Query ratio below 1%. A sustained climb above that threshold signals a systemic issue, not a few bad queries.
The codes that dominate production incidents fall into five classes.
- 241 MEMORY_LIMIT_EXCEEDED
- 252 TOO_MANY_PARTS
- 159 read timeout on distributed queries (shard unavailable)
- 999 / 210 ZooKeeper or network session errors
- 60 / 81 schema drift or missing objects
flowchart TD
A[Query error rate above 1 percent] --> B[Query system.query_log by exception_code]
B --> C{Code 241}
B --> D{Code 252}
B --> E{Code 159}
B --> F{Code 999 or 210}
B --> G{Code 60 or 81}
C --> H[Memory pressure or runaway query]
D --> I[Merge crisis or too many parts]
E --> J[Shard unavailable or network timeout]
F --> K[ZooKeeper session loss or coordination failure]
G --> L[Schema drift or missing table]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Runaway query or cache eviction | Code 241 spike; MemoryTracking near server limit | system.processes ordered by memory_usage |
| Merge debt or insert micro-batching | Code 252 spike; insert latency climbs before rejections | system.parts per partition; system.events RejectedInserts |
| Shard unavailable or slow replica | Code 159 spike on distributed tables | system.replicas absolute_delay and queue_size on target shards |
| ZooKeeper / Keeper session loss | Codes 999/210 spike; replicated tables become readonly | system.zookeeper_connection is_expired; system.replicas is_session_expired |
| Schema drift after deploy | Codes 60/81 spike following a migration | system.distributed_ddl_queue status; query text in system.query_log |
Quick checks
Run these read-only queries to classify the failure.
Failed versus total query ratio:
SELECT event, value
FROM system.events
WHERE event IN ('FailedQuery', 'FailedSelectQuery', 'FailedInsertQuery',
'Query', 'SelectQuery', 'InsertQuery');
Compare FailedQuery to Query. If the ratio is above 0.01 for more than five minutes, investigate.
Exception codes during the incident window:
SELECT exception_code,
count() AS count,
any(exception) AS sample_exception
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND event_time > now() - INTERVAL 10 MINUTE
GROUP BY exception_code
ORDER BY count DESC;
Top memory consumers right now:
SELECT query_id, user, memory_usage, peak_memory_usage, elapsed,
substring(query, 1, 100) AS query_prefix
FROM system.processes
WHERE memory_usage > 0
ORDER BY memory_usage DESC
LIMIT 10;
Part count per partition:
SELECT database, table, partition_id, count() AS active_parts
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 20;
Replica and coordination health:
# Replicas that cannot accept writes
SELECT database, table, is_readonly, is_session_expired, queue_size
FROM system.replicas
WHERE is_readonly = 1 OR is_session_expired = 1;
# ZooKeeper / Keeper connection state
SELECT name, host, port, is_expired, session_uptime_elapsed_seconds
FROM system.zookeeper_connection;
Stuck distributed DDL:
SELECT entry, host_name, status, exception_text
FROM system.distributed_ddl_queue
WHERE status != 'Finished'
ORDER BY entry DESC
LIMIT 10;
How to diagnose it
- Confirm the error rate. Query system.events for FailedQuery and Query. Calculate the ratio over a five-minute window. Exclude monitoring probes by filtering out known health-check users or query patterns.
- Classify by exception code. Query system.query_log for ExceptionWhileProcessing grouped by exception_code over the incident window.
- For code 241 (MEMORY_LIMIT_EXCEEDED): Check system.processes for queries with high peak_memory_usage. Check MemoryTracking in system.metrics against max_server_memory_usage. Look for spill-to-disk activity in the server tmp directory.
- For code 252 (TOO_MANY_PARTS): Check active part count per partition in system.parts. Check RejectedInserts and DelayedInserts in system.events. Verify that system.merges shows merges running; if not, the pool may be blocked by mutations or I/O starvation.
- For code 159 (read timeout): Check system.replicas on each shard for absolute_delay and queue_size. A replica with high lag or a stuck replication_queue entry slows distributed queries. Verify inter-server network latency between the coordinator and target shards.
- For codes 999/210 (ZooKeeper / network): Check system.zookeeper_connection for is_expired. Check system.replicas for is_session_expired. Test coordination service responsiveness with
echo ruok | nc zk-host 2181andecho mntr | nc zk-host 2181. - For codes 60/81 (schema drift): Check system.distributed_ddl_queue for entries with status != ‘Finished’. Compare CREATE TABLE statements across shards. Review recent migrations or ALTER operations.
- Correlate with deployment times. A code change or config rollout preceding the spike suggests schema or query plan regression.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| FailedQuery / Query ratio | Overall query health | > 1% sustained for five minutes |
| exception_code distribution | Identifies the failing domain | Any code class spiking above baseline |
| MemoryTracking | Memory pressure precursor | > 80% of max_server_memory_usage |
| MaxPartCountForPartition | Distance from too-many-parts cliff | > 50% of your configured parts_to_throw_insert |
| RejectedInserts | Hard insert failure | Any non-zero sustained rate |
| absolute_delay | Replica data freshness | > 120 seconds sustained |
| ZooKeeper latency | Coordination health | > 10 ms p99 or session expiry |
| system.distributed_ddl_queue status | Schema propagation | Any non-Finished entry persisting > five minutes |
Fixes
Group your response by the exception code class dominating the failure.
Memory limit exceeded (241)
Identify the heaviest consumer in system.processes and kill it:
Warning: KILL QUERY aborts running work and returns an error to the client.
KILL QUERY WHERE query_id = '...';
Set or lower per-query memory limits via the user profile max_memory_usage. Enable max_bytes_before_external_group_by to spill heavy GROUP BY operations to disk. If the load is concentrated under specific users, restrict their concurrency via profile limits.
Tradeoff: Spilling to disk saves queries from being killed, but it increases disk I/O load and query latency.
Too many parts (252)
Pause or throttle insert sources immediately. Increase batch sizes to at least 1000 rows per INSERT. Check system.mutations for blocking background work and kill non-critical mutations if they are consuming merge threads.
Warning: KILL MUTATION is disruptive and can leave partial changes in place until the next merge.
Review whether async_insert is appropriate for your write pattern.
Tradeoff: Pausing inserts stops data loss but creates upstream backlog. Recovery can take minutes to hours as merges catch up.
Shard unavailable / timeout (159)
Identify the slow shard by checking system.replicas on each node for replication lag and queue depth. If a specific replica is lagging, route traffic away from it temporarily if your client supports it. For stuck replication entries, SYSTEM RESTART REPLICA can force re-evaluation.
Warning: SYSTEM RESTART REPLICA is disruptive and interrupts replication on that table.
Tradeoff: Rerouting reads reduces load on the struggling shard but concentrates it elsewhere.
ZooKeeper / network (999/210)
Pause all DDL operations immediately to reduce coordination load. Check ensemble health with ruok and mntr. Do NOT restart ClickHouse nodes during a coordination overload; reconnection storms worsen the problem. If using JVM-based ZooKeeper, check GC logs and disk latency for the transaction log. ClickHouse Keeper does not have GC pauses, but disk I/O saturation still hurts it.
Tradeoff: Halting DDL buys recovery time but blocks schema changes.
Schema drift (60/81)
Review system.distributed_ddl_queue for exceptions tied to specific hosts. Align CREATE TABLE statements across shards, or re-run the migration. If one replica missed a DDL entry, apply the change locally.
Tradeoff: Fixing drift may require brief table unavailability on affected shards.
Prevention
- Monitor active part count per partition, not just per table. Alert when any partition crosses 50% of your configured parts_to_delay_insert threshold.
- Track the merge-to-insert ratio. If part creation rate exceeds merge completion rate for more than 30 minutes, investigate before RejectedInserts appears.
- Cap per-query memory with max_memory_usage and set max_server_memory_usage leaving at least 20% headroom.
- Monitor ZooKeeper or Keeper latency as a first-class signal, not just process liveness.
- Exclude monitoring probes from error-rate calculations by filtering on user or query text pattern.
- Validate schema changes with distributed DDL and verify completion on every node before declaring a migration done.
How Netdata helps
- Correlates FailedQuery rate with MemoryTracking, active parts, and background pool utilization.
- Surfaces exception_code spikes from system.query_log without manual polling.
- Alerts on MaxPartCountForPartition and RejectedInserts before code 252 errors flood clients.
- Tracks ZooKeeper session state and replication lag alongside query errors to distinguish code 999 from application issues.
- Excludes internal health-check traffic from error-rate calculations via user or query pattern filtering.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse ALTER UPDATE/DELETE overuse: why mutations are not row updates
- ClickHouse async inserts: when async_insert fixes too-many-parts and when it hides it
- ClickHouse background pool saturation: when merges and mutations starve
- ClickHouse mark cache and uncompressed cache: reading low hit rates
- ClickHouse client connections climbing: TCP 9000, HTTP 8123, and connection leaks
- ClickHouse DelayedInserts climbing: the warning before too-many-parts
- ClickHouse detached parts piling up: reading system.detached_parts and reclaiming space
- ClickHouse disk space collapse: why merges need free space and how the spiral starts
- ClickHouse disk space monitoring: free_space, unreserved_space, and the 80% target
- ClickHouse distributed DDL stuck: ON CLUSTER queries that never finish
- ClickHouse distributed query amplification: one coordinator, many shard subqueries







