ClickHouse Table is in readonly mode: is_readonly on replicated tables and how to fix it

Inserts fail with TABLE_IS_READ_ONLY. system.replicas shows is_readonly = 1 for affected tables and the replica rejects writes. For replicated tables, this is a coordination failure, not a disk or memory problem: the replica has lost its session with ClickHouse Keeper or ZooKeeper, or the ensemble is unreachable.

Reads from local parts may still succeed, which hides the failure from load balancers and monitoring probes that rely on SELECT 1 or the HTTP ping endpoint. Brief readonly states lasting seconds are normal during Keeper leader elections. Sustained readonly lasting minutes means the replica is falling behind its peers and inserts are failing or being routed elsewhere.

This guide shows how to distinguish transient blips from sustained failures, diagnose whether the root cause is a network partition, ensemble degradation, or session deadlock, and recover without making the incident worse.

What this means

Every ReplicatedMergeTree table depends on a persistent session with the coordination service to register the replica and track the replication log; distributed DDL is serialized through the same service. When the session is lost or cannot be reinitialized after a network blip or ensemble event, the replica sets is_readonly = 1 to prevent divergent writes. is_session_expired in system.replicas is usually 1 alongside it, confirming session termination rather than a local disk issue.

While readonly, the replica stops accepting inserts and stops scheduling merges that require coordination. Queries that only read local parts may still succeed, so load balancer health checks based on SELECT 1 or the HTTP ping endpoint often miss this condition. If the replica is the only writable node for its shard, or if the application does not retry against other replicas, the impact is a hard write outage. If other replicas remain healthy, the impact is limited to replication lag and reduced redundancy.

flowchart TD
    A[is_readonly detected] --> B{session expired?}
    B -->|Yes| C[Check Keeper health]
    B -->|No| D[Check manual readonly config]
    C --> E{Keeper responsive?}
    E -->|No| F[Fix ensemble first]
    E -->|Yes| G[Restart ClickHouse]
    F --> H[Pause DDL, check leader]
    G --> I[Monitor replication_queue]
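
The decision flow above can also be written as a small function, which may help when wiring the triage into a runbook script. This is a minimal sketch in Python, not part of ClickHouse: the name next_action and the returned strings are illustrative assumptions, and the three booleans are expected to come from system.replicas and from a Keeper connectivity test run separately.

```python
def next_action(is_readonly: bool, is_session_expired: bool, keeper_responsive: bool) -> str:
    """Map replica state to the next triage step from the flowchart.

    Hypothetical helper: nothing here talks to ClickHouse or Keeper.
    The caller supplies values read from system.replicas and from a
    separate Keeper connectivity check.
    """
    if not is_readonly:
        return "no action: replica is writable"
    if not is_session_expired:
        # Readonly without session expiry points away from coordination loss.
        return "check manual readonly config"
    if not keeper_responsive:
        # Restarting ClickHouse now would only add load to a struggling ensemble.
        return "fix ensemble first: pause DDL, check leader"
    return "restart ClickHouse, then monitor replication_queue"
```

For example, next_action(True, True, False) returns the "fix ensemble first" branch, which matches the rule that the ensemble is addressed before any ClickHouse restart.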

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Lost Keeper or ZooKeeper session | is_readonly = 1 and is_session_expired = 1 on the replica | system.zookeeper_connection and system.zookeeper connectivity |
| Keeper ensemble degraded or partitioned | Multiple replicas readonly; DDL hangs; replication queues grow on many tables | Keeper node liveness and leader election status |
| Coordination overload | Session flapping or timeouts under heavy DDL or a very high replicated table count | Keeper latency and system.distributed_ddl_queue depth |
| Network partition between replica and Keeper | Single replica readonly while peers remain writable | Network path and latency from the affected host to Keeper nodes |
| Brief leader election | Readonly lasts seconds and self-heals | system.replicas shows recovery without intervention |

Quick checks

-- Check replica readonly and session state
SELECT
    database,
    table,
    is_readonly,
    is_session_expired,
    is_leader,
    total_replicas,
    active_replicas
FROM system.replicas
WHERE engine LIKE '%Replicated%';

-- Check Keeper connection details and expiry
SELECT
    name,
    host,
    port,
    is_expired,
    session_uptime_elapsed_seconds,
    session_timeout_ms
FROM system.zookeeper_connection;

-- Test live coordination connectivity
SELECT * FROM system.zookeeper WHERE path = '/' LIMIT 1;

# Check external ZooKeeper responsiveness (a healthy node replies imok)
echo ruok | nc <zookeeper-host> 2181

# Check ClickHouse Keeper responsiveness (a healthy node replies imok)
echo ruok | nc localhost 9181

-- Find stuck replication queue entries
SELECT
    database,
    table,
    type,
    num_tries,
    last_exception
FROM system.replication_queue
WHERE num_tries > 0
ORDER BY num_tries DESC
LIMIT 20;

How to diagnose it

  1. Confirm the blast radius. Run the system.replicas query to see whether one table, one replica, or the entire cluster is affected. If all replicas for a shard are readonly, the coordination service itself is likely down. If only one replica is affected, suspect a network or session issue localized to that node.

  2. Verify session expiry. If is_session_expired = 1, the replica has lost its coordination session. If is_readonly = 1 but is_session_expired = 0, check whether the table was placed in readonly mode intentionally through configuration.

  3. Test coordination connectivity. Query system.zookeeper or use echo ruok | nc against each Keeper endpoint. If the connection test fails or hangs, the problem is below ClickHouse. Address the ensemble before restarting anything.

  4. Inspect the DDL queue. Query system.distributed_ddl_queue for entries that are not finished. A DDL storm can saturate Keeper with metadata operations and delay session recovery for all replicas.

  5. Examine replication queues. Query system.replication_queue for entries with high num_tries and non-empty last_exception. A replica that cannot apply log entries may remain in a bad state even after its session reconnects, because the stuck queue entry prevents progression.

  6. Determine if the event is transient. If is_readonly flips to 1 and back within seconds during a known Keeper leader election, no action is needed. If it persists for more than five minutes, proceed to the fixes section.
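
Step 1 above (confirming the blast radius) can be sketched as a small grouping routine once the system.replicas rows from every node have been collected in one place. This is an illustrative sketch only: the ReplicaRow type, the shard label, and the three category names are assumptions introduced here, and the shard label is expected to be attached by whatever collects the rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class ReplicaRow(NamedTuple):
    shard: str        # hypothetical label attached by the collector
    replica: str      # host or replica name the row was read from
    table: str
    is_readonly: int  # value of system.replicas.is_readonly


def blast_radius(rows: Iterable[ReplicaRow]) -> Dict[str, str]:
    """Classify each shard as 'healthy', 'partial', or 'all_readonly'."""
    members = defaultdict(set)
    readonly = defaultdict(set)
    for row in rows:
        members[row.shard].add(row.replica)
        if row.is_readonly:
            readonly[row.shard].add(row.replica)

    result = {}
    for shard, replicas in members.items():
        bad = readonly[shard]
        if not bad:
            result[shard] = "healthy"
        elif bad == replicas:
            # Every replica of the shard is readonly: suspect the ensemble itself.
            result[shard] = "all_readonly"
        else:
            # Only some replicas are affected: suspect a node-local network or session issue.
            result[shard] = "partial"
    return result
```

A shard reported as all_readonly points toward the coordination service, while partial points toward the individual node, mirroring the reasoning in step 1.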

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| system.replicas.is_readonly | Direct indicator that a replica cannot accept writes | = 1 sustained for more than 5 minutes |
| system.replicas.is_session_expired | Confirms the replica has lost its coordination session | = 1 on any production replica |
| system.zookeeper_connection.is_expired | Shows session expiry from the connection table | = 1 or the connection query fails |
| Keeper round-trip latency | High latency precedes session timeouts | Sustained latency above 30 percent of session_timeout_ms |
| system.replicas.active_replicas vs total_replicas | Reveals replica visibility loss | active_replicas is less than total_replicas |
| system.replication_queue.num_tries | Stuck entries block catch-up and can hold state | Any entry with num_tries growing past 5 |
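
The numeric thresholds in the table can be checked mechanically. The sketch below evaluates a single sample against three of them; the function name, parameter names, and returned labels are assumptions made for illustration, and it does not judge whether a condition is sustained or growing over time, which needs a time window on top.

```python
from typing import List


def warning_signs(
    keeper_latency_ms: float,
    session_timeout_ms: float,
    active_replicas: int,
    total_replicas: int,
    max_num_tries: int,
) -> List[str]:
    """Return the labels of the warning signs present in one sample.

    Hypothetical helper: inputs are assumed to be collected elsewhere
    (Keeper latency probe, system.replicas, system.replication_queue).
    """
    signs = []
    # Latency above 30 percent of the session timeout precedes session loss.
    if keeper_latency_ms > 0.30 * session_timeout_ms:
        signs.append("keeper_latency_high")
    # Fewer active than total replicas means some replicas are not visible.
    if active_replicas < total_replicas:
        signs.append("replica_visibility_loss")
    # Queue entries retried more than 5 times are likely stuck.
    if max_num_tries > 5:
        signs.append("replication_queue_stuck")
    return signs
```

With a 30000 ms session timeout, a 10000 ms round trip crosses the 30 percent line (9000 ms) and is flagged, while a 1000 ms round trip is not.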

Fixes

If the Keeper ensemble is degraded

Do not restart ClickHouse nodes while the coordination service is struggling. Restarts create reconnection storms that amplify overload.

  • Verify all Keeper nodes are running and a leader is elected.
  • Check disk latency on Keeper nodes. Transaction log writes are synchronous, and slow disks are a common bottleneck.
  • Pause non-critical DDL to reduce metadata load.
  • If you use external ZooKeeper, check JVM heap and garbage collection. GC pauses are a known cause of session timeouts. ClickHouse Keeper is not JVM-based, but it can still suffer from Raft leader election latency under load.
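
When checking that a leader can be elected, the underlying rule is that a majority-based ensemble needs more than half of its configured nodes alive. A minimal sketch of that arithmetic follows; the function name is an assumption, and the node counts must come from your own liveness checks.

```python
def has_quorum(alive_nodes: int, ensemble_size: int) -> bool:
    """Return True if a strict majority of the ensemble is alive."""
    if ensemble_size <= 0:
        raise ValueError("ensemble_size must be positive")
    return alive_nodes > ensemble_size // 2
```

For a three-node ensemble this means two nodes must be up; with only one node alive, no leader can be elected and every replica that depends on the ensemble will stay readonly.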

If the replica has an expired session but Keeper is healthy

When the ensemble is responsive but a replica remains is_session_expired = 1, the session may not self-heal. Restart the ClickHouse server process on the affected replica to force reconnection. It is the reliable path to reinitialize the session and clear readonly state.

Warning: Restarting ClickHouse is disruptive. It aborts in-flight queries and forces metadata reload.

  • After restart, watch system.replicas for is_readonly returning to 0 and queue_size beginning to drain.
  • If multiple replicas are affected, restart them one at a time to avoid dropping all shard capacity simultaneously.
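
The one-at-a-time restart rule can be sketched as a loop that refuses to move on until the previous replica is writable again. This is an illustration under stated assumptions: rolling_restart, restart, and is_writable are hypothetical names, and the two callables are placeholders for however you restart a node and read is_readonly from system.replicas in your environment.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    replicas: Iterable[str],
    restart: Callable[[str], None],
    is_writable: Callable[[str], bool],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart replicas one at a time and return those that recovered.

    Stops at the first replica that stays readonly, so the remaining
    replicas are left untouched and shard capacity is not dropped further.
    """
    recovered = []
    for replica in replicas:
        restart(replica)
        for _ in range(max_polls):
            if is_writable(replica):
                recovered.append(replica)
                break
            sleep(poll_seconds)
        else:
            # The replica never left readonly within the polling budget.
            return recovered
    return recovered
```

Halting on the first failure is a deliberate choice here: if a restart does not clear readonly state, the cause is probably not the session alone, and restarting more nodes would only reduce capacity.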

If replication queues are stuck

A replica can reconnect to Keeper but remain blocked by a replication queue entry that repeatedly fails.

  • Inspect system.replication_queue for last_exception messages indicating fetch or merge failures.
  • If the root cause for the queue failure has been resolved and the entry does not advance, a process restart may clear the stale replication task state.

If tables entered readonly at startup

If ClickHouse started while Keeper was unreachable, replicated tables initialize as readonly and remain readonly until a session is established. Once Keeper becomes reachable, sessions should reestablish automatically. If they do not, restart the ClickHouse process to force reconnection.

Prevention

  • Monitor Keeper latency as a leading indicator, not just process liveness. Latency trending up predicts session timeouts before they happen.
  • Gate alerts on is_readonly with a sustained duration of at least 5 minutes to avoid noise from brief leader elections.
  • Avoid DDL operations during peak ingest. DDL competes for the same coordination resources as replication.
  • Keep network paths between ClickHouse and Keeper stable and within low latency bounds.
  • Do not place Keeper transaction logs on slow or shared storage. Synchronous log writes make disk I/O a frequent bottleneck.
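
The sustained-duration gate from the list above can be sketched as a check over time-ordered samples of is_readonly. The function name and the sample format are assumptions for illustration; the 300-second default mirrors the 5-minute window suggested in this section.

```python
from typing import Iterable, Tuple


def sustained_readonly(
    samples: Iterable[Tuple[float, int]],
    min_duration_s: float = 300.0,
) -> bool:
    """Return True if is_readonly has been 1 without interruption for at
    least min_duration_s, measured up to the most recent sample.

    samples: (unix_timestamp, is_readonly) pairs in ascending time order.
    """
    streak_start = None
    last_ts = None
    for ts, is_readonly in samples:
        if is_readonly:
            if streak_start is None:
                streak_start = ts  # a new readonly streak begins
        else:
            streak_start = None    # any writable sample resets the streak
        last_ts = ts
    if streak_start is None:
        return False
    return last_ts - streak_start >= min_duration_s
```

A replica that flips to readonly for a few seconds during a leader election resets the streak as soon as it recovers, so it never reaches the threshold and does not page anyone.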

How Netdata helps

Netdata collects system.replicas metrics including is_readonly and is_session_expired across the cluster, so you can spot affected replicas without running SQL on each node. Keeper connection health and coordination latency are tracked alongside replica state to correlate readonly events with ensemble degradation. Replication queue depth and active replica counts help distinguish a single-node network partition from a cluster-wide quorum failure. Alerts on sustained readonly state and session expiry use windows that exclude brief leader-election blips.