ClickHouse Table is in readonly mode: is_readonly on replicated tables and how to fix it
Inserts fail with TABLE_IS_READ_ONLY. system.replicas shows is_readonly = 1 for affected tables and the replica rejects writes. For replicated tables, this is a coordination failure, not a disk or memory problem: the replica has lost its session with ClickHouse Keeper or ZooKeeper, or the ensemble is unreachable.
Reads from local parts may still succeed, which hides the failure from load balancers and monitoring probes that rely on SELECT 1 or the HTTP ping endpoint. Brief readonly states lasting seconds are normal during Keeper leader elections. Sustained readonly lasting minutes means the replica is falling behind its peers while inserts fail or are routed elsewhere.
This guide shows how to distinguish transient blips from sustained failures, diagnose whether the root cause is a network partition, ensemble degradation, or session deadlock, and recover without making the incident worse.
What this means
Every ReplicatedMergeTree table relies on a persistent session with the coordination service to register the replica, track the replication log, and serialize distributed DDL. When the session is lost or cannot be reinitialized after a network blip or ensemble event, the replica sets is_readonly = 1 to prevent divergent writes. is_session_expired in system.replicas is usually 1 alongside it, confirming session termination rather than a local disk issue.
While readonly, the replica stops accepting inserts and stops scheduling merges that require coordination. Queries that only read local parts may still succeed, so load balancer health checks based on SELECT 1 or the HTTP ping endpoint often miss this condition. If the replica is the only writable node for its shard, or if the application does not retry against other replicas, the impact is a hard write outage. If other replicas remain healthy, the impact is limited to replication lag and reduced redundancy.
flowchart TD
A[is_readonly detected] --> B{session expired?}
B -->|Yes| C[Check Keeper health]
B -->|No| D[Check manual readonly config]
C --> E{Keeper responsive?}
E -->|No| F[Fix ensemble first]
E -->|Yes| G[Restart ClickHouse]
F --> H[Pause DDL check leader]
G --> I[Monitor replication_queue]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Lost Keeper or ZooKeeper session | is_readonly = 1 and is_session_expired = 1 on the replica | system.zookeeper_connection and system.zookeeper connectivity |
| Keeper ensemble degraded or partitioned | Multiple replicas readonly; DDL hangs; replication queues grow on many tables | Keeper node liveness and leader election status |
| Coordination overload | Session flapping or timeouts under heavy DDL or a very high replicated table count | Keeper latency and system.distributed_ddl_queue depth |
| Network partition between replica and Keeper | Single replica readonly while peers remain writable | Network path and latency from the affected host to Keeper nodes |
| Brief leader election | Readonly lasts seconds and self-heals | system.replicas shows recovery without intervention |
Quick checks
-- Check replica readonly and session state
SELECT
database,
table,
is_readonly,
is_session_expired,
is_leader,
total_replicas,
active_replicas
FROM system.replicas
WHERE engine LIKE '%Replicated%';
-- Check Keeper connection details and expiry
SELECT
name,
host,
port,
is_expired,
session_uptime_elapsed_seconds,
session_timeout_ms
FROM system.zookeeper_connection;
-- Test live coordination connectivity
SELECT * FROM system.zookeeper WHERE path = '/' LIMIT 1;
# Check external ZooKeeper responsiveness
echo ruok | nc <zookeeper-host> 2181
# Check ClickHouse Keeper responsiveness
echo ruok | nc localhost 9181
-- Find stuck replication queue entries
SELECT
database,
table,
type,
num_tries,
last_exception
FROM system.replication_queue
WHERE num_tries > 0
ORDER BY num_tries DESC
LIMIT 20;
How to diagnose it
1. Confirm the blast radius. Run the system.replicas query to see whether one table, one replica, or the entire cluster is affected. If all replicas for a shard are readonly, the coordination service itself is likely down. If only one replica is affected, suspect a network or session issue localized to that node.
2. Verify session expiry. If is_session_expired = 1, the replica has lost its coordination session. If is_readonly = 1 but is_session_expired = 0, check whether the table was placed in readonly mode intentionally through configuration.
3. Test coordination connectivity. Query system.zookeeper or use echo ruok | nc against each Keeper endpoint. If the connection test fails or hangs, the problem is below ClickHouse. Address the ensemble before restarting anything.
4. Inspect the DDL queue. Query system.distributed_ddl_queue for entries that are not finished. A DDL storm can saturate Keeper with metadata operations and delay session recovery for all replicas.
5. Examine replication queues. Query system.replication_queue for entries with high num_tries and non-empty last_exception. A replica that cannot apply log entries may remain in a bad state even after its session reconnects, because the stuck queue entry prevents progression.
6. Determine if the event is transient. If is_readonly flips to 1 and back within seconds during a known Keeper leader election, no action is needed. If it persists for more than five minutes, proceed to the fixes section.
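The DDL queue inspection described above has no query in the quick checks. The following is a minimal sketch, assuming that completed entries report a status of Finished and that these column names match your ClickHouse version; verify both before relying on it.

```sql
-- Sketch: list distributed DDL entries that have not finished.
-- Assumption: completed entries have status 'Finished'.
-- toString() keeps the comparison valid whatever the enum's members are.
SELECT
    entry,
    cluster,
    host,
    status,
    query_create_time,
    substring(query, 1, 120) AS query_head  -- truncated for readability
FROM system.distributed_ddl_queue
WHERE toString(status) != 'Finished'
ORDER BY query_create_time
LIMIT 20;
```

A long list of old, unfinished entries during a readonly incident would be consistent with the coordination overload cause listed earlier.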
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| system.replicas.is_readonly | Direct indicator that a replica cannot accept writes | = 1 sustained for more than 5 minutes |
| system.replicas.is_session_expired | Confirms the replica has lost its coordination session | = 1 on any production replica |
| system.zookeeper_connection.is_expired | Shows session expiry from the connection table | = 1 or the connection query fails |
| Keeper round-trip latency | High latency precedes session timeouts | Sustained latency above 30 percent of session_timeout_ms |
| system.replicas.active_replicas vs total_replicas | Reveals replica visibility loss | active_replicas is less than total_replicas |
| system.replication_queue.num_tries | Stuck entries block catch-up and can hold state | Any entry with num_tries growing past 5 |
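Several of these signals can be collapsed into a point-in-time snapshot for a monitoring job to scrape. The queries below are a sketch rather than a definitive check. The result aliases are illustrative names, and the second query assumes session_timeout_ms is exposed by system.zookeeper_connection in your version, as the quick checks above presume.

```sql
-- Sketch: one summary row per server for alerting.
-- Note: active_replicas and total_replicas are read from the coordination
-- service, so this query can be slow or return zeros while the session is down.
SELECT
    count() AS replicated_tables,
    countIf(is_readonly) AS readonly_tables,
    countIf(is_session_expired) AS expired_session_tables,
    countIf(active_replicas < total_replicas) AS tables_missing_replicas
FROM system.replicas;

-- Sketch: derive the 30 percent latency warning threshold per connection.
SELECT
    name,
    session_timeout_ms,
    session_timeout_ms * 0.3 AS latency_warning_ms
FROM system.zookeeper_connection;
```

With a session timeout of 30000 ms, for example, the warning threshold works out to 9000 ms. Both queries return only a snapshot, so the five-minute sustained condition has to be enforced by the alerting system that consumes them.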
Fixes
If the Keeper ensemble is degraded
Do not restart ClickHouse nodes while the coordination service is struggling. Restarts create reconnection storms that amplify overload.
- Verify all Keeper nodes are running and a leader is elected.
- Check disk latency on Keeper nodes. Transaction log writes are synchronous, and slow disks are a common bottleneck.
- Pause non-critical DDL to reduce metadata load.
- If you use external ZooKeeper, check JVM heap and garbage collection. GC pauses are a known cause of session timeouts. ClickHouse Keeper is not JVM-based, but it can still suffer from Raft leader election latency under load.
If the replica has an expired session but Keeper is healthy
When the ensemble is responsive but a replica remains at is_session_expired = 1, the session may not self-heal. Restart the ClickHouse server process on the affected replica to force reconnection. This is the reliable path to reinitialize the session and clear the readonly state.
Warning: Restarting ClickHouse is disruptive. It aborts in-flight queries and forces metadata reload.
- After restart, watch system.replicas for is_readonly returning to 0 and queue_size beginning to drain.
- If multiple replicas are affected, restart them one at a time to avoid dropping all shard capacity simultaneously.
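One way to watch recovery after a restart is to re-run a query like the following sketch until it returns no rows. It assumes the queue_size and absolute_delay columns of system.replicas are available in your version.

```sql
-- Sketch: tables that are still readonly or still have replication work queued.
-- An empty result suggests the replica has caught up.
SELECT
    database,
    table,
    is_readonly,
    is_session_expired,
    queue_size,
    absolute_delay  -- how far the replica lags, in seconds
FROM system.replicas
WHERE is_readonly = 1 OR queue_size > 0
ORDER BY queue_size DESC;
```

When restarting replicas one at a time, waiting for this result to empty on the current node before moving to the next keeps shard capacity available.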
If replication queues are stuck
A replica can reconnect to Keeper but remain blocked by a replication queue entry that repeatedly fails.
- Inspect system.replication_queue for last_exception messages indicating fetch or merge failures.
- If the root cause for the queue failure has been resolved and the entry does not advance, a process restart may clear the stale replication task state.
If tables entered readonly at startup
If ClickHouse started while Keeper was unreachable, replicated tables initialize as readonly. Once Keeper becomes reachable, sessions should reestablish automatically. If they do not, restart the ClickHouse process to force reconnection.
Prevention
- Monitor Keeper latency as a leading indicator, not just process liveness. Latency trending up predicts session timeouts before they happen.
- Gate alerts on is_readonly with a sustained duration of at least 5 minutes to avoid noise from brief leader elections.
- Avoid DDL operations during peak ingest. DDL competes for the same coordination resources as replication.
- Keep network paths between ClickHouse and Keeper stable and within low latency bounds.
- Do not place Keeper transaction logs on slow or shared storage. Synchronous log writes make disk I/O a frequent bottleneck.
How Netdata helps
Netdata collects system.replicas metrics including is_readonly and is_session_expired across the cluster, so you can spot affected replicas without running SQL on each node. Keeper connection health and coordination latency are tracked alongside replica state to correlate readonly events with ensemble degradation. Replication queue depth and active replica counts help distinguish a single-node network partition from a cluster-wide quorum failure. Alerts on sustained readonly state and session expiry use windows that exclude brief leader-election blips.
Related guides
- ClickHouse active part count growing: reading MaxPartCountForPartition before it pages
- ClickHouse async inserts: when async_insert fixes too-many-parts and when it hides it
- ClickHouse DelayedInserts climbing: the warning before too-many-parts
- ClickHouse insert latency rising: the leading indicator of write-pipeline trouble
- ClickHouse Memory limit (for query) exceeded: per-query limits and GROUP BY/JOIN blowups
- ClickHouse Memory limit (total) exceeded - server-wide memory pressure and fixes
- ClickHouse memory pressure death spiral: runaway queries, retries, and OOM
- ClickHouse MemoryTracking vs MemoryResident: reading the memory gap correctly
- ClickHouse merge death spiral: when parts accumulate faster than merges consolidate
- ClickHouse merge duration climbing: the leading indicator of part explosion
- ClickHouse merges not keeping up: diagnosing a stalled or starved merge pool
- ClickHouse monitoring checklist: the signals every production cluster needs







