ClickHouse ReplicatedDataLoss > 0: detecting and responding to lost parts
ReplicatedDataLoss > 0 is a hard signal in ClickHouse. A nonzero value in system.events means the server has determined that a data part is missing and cannot be retrieved from any available replica. This is not replication lag, a transient fetch failure, or the normal ReplicatedPartFetchesOfMerged optimization.
Queries that touch the affected part can return incomplete results or errors. The immediate risk is silent divergence between replicas, where one replica serves stale or incomplete results without failing the query. Confirm the event, identify the scope, and determine whether a healthy peer still has the part.
What this means
ReplicatedDataLoss increments only after the server exhausts recovery options for a part. Distinguish it from:
- ReplicatedPartFailedFetches: temporary fetch degradation that may self-resolve when the source recovers.
- ReplicatedPartChecksFailed: integrity problems that may escalate to declared loss if left unaddressed.
- ReplicatedPartFetchesOfMerged: normal optimization where replicas fetch already-merged parts from peers instead of merging locally. This counter grows steadily and is expected behavior.
When ReplicatedDataLoss, ReplicatedPartChecksFailed, and ReplicatedPartFailedFetches climb together, a replica is failing to fetch a part, failing to verify it, and ultimately giving up.
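The distinction between these counters can be sketched in Python. This is a minimal illustration, assuming you sample the four counters from system.events twice and pass both snapshots in as dictionaries. The function name and the returned labels are hypothetical and are not part of ClickHouse or Netdata.

```python
from typing import Dict

LOSS = "ReplicatedDataLoss"
CHECKS = "ReplicatedPartChecksFailed"
FETCH_FAIL = "ReplicatedPartFailedFetches"
MERGED = "ReplicatedPartFetchesOfMerged"


def classify_counters(before: Dict[str, int], after: Dict[str, int]) -> str:
    """Classify the change between two snapshots of system.events counters."""
    delta = {}
    for name in (LOSS, CHECKS, FETCH_FAIL, MERGED):
        # Events that have never fired are absent from system.events, so default to 0.
        change = after.get(name, 0) - before.get(name, 0)
        # Counters start again from zero after a server restart;
        # treat a drop as a fresh count rather than a negative change.
        delta[name] = change if change >= 0 else after.get(name, 0)

    # Most severe signal wins.
    if delta[LOSS] > 0:
        return "data_loss"
    if delta[CHECKS] > 0:
        return "integrity_failures"
    if delta[FETCH_FAIL] > 0:
        return "fetch_degradation"
    if delta[MERGED] > 0:
        return "normal_merged_fetches"
    return "quiet"
```

For example, snapshots in which only ReplicatedPartFetchesOfMerged moved classify as normal_merged_fetches, while any increase in ReplicatedDataLoss classifies as data_loss regardless of the other counters.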
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Source replica disk corruption or checksum failure | ReplicatedPartChecksFailed climbing alongside ReplicatedDataLoss; parts may appear in system.detached_parts | system.replication_queue.last_exception for checksum or corrupt part errors |
| Source replica unavailable or network partitioned | ReplicatedPartFailedFetches increasing; queue entries retrying against one source | system.replicas for active_replicas < total_replicas or is_session_expired = 1 |
| Part merged or dropped on source before replica fetches it | Fetch fails because the unmerged source part no longer exists; queue shows GET_PART with high num_tries | system.parts on the source for the merged part; check whether ReplicatedPartFetchesOfMerged is incrementing |
| Accidental local deletion or detached parts | Local parts missing but other replicas healthy; queue may be empty while data diverges | Cross-replica row and partition counts |
| Hardware or filesystem-level corruption | Detached parts with corruption-related reasons; errors in OS logs | system.detached_parts.reason and dmesg for hardware errors |
Quick checks
Run these safe, read-only checks to orient yourself during the first minutes of the incident.
-- Check event counters for loss and related errors
SELECT event, value FROM system.events
WHERE event IN (
'ReplicatedPartFailedFetches',
'ReplicatedPartChecksFailed',
'ReplicatedDataLoss',
'ReplicatedPartFetchesOfMerged'
);
-- Inspect the replication queue for stuck entries
SELECT database, table, type, source_replica,
num_tries, last_exception,
create_time, last_attempt_time
FROM system.replication_queue
WHERE num_tries > 3
ORDER BY num_tries DESC;
-- Check replica availability and session state
SELECT database, table, is_leader, is_readonly, is_session_expired,
active_replicas, total_replicas
FROM system.replicas
WHERE is_session_expired = 1 OR is_readonly = 1;
-- Look for parts detached due to corruption or fetch failures
SELECT database, table, name, reason, modification_time
FROM system.detached_parts
ORDER BY modification_time DESC;
# Search ClickHouse logs for corruption indicators (shell, not SQL)
grep -Ei 'checksum|corrupt|Broken part|Cannot read all data|Mismatch' /var/log/clickhouse-server/*.log | tail -100
# Check OS-level hardware errors (shell, not SQL)
dmesg | tail -100
-- Compare row counts across replicas for suspected tables
-- Run this on each replica and compare results
SELECT count() FROM your_db.your_table;
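-- Compare partition-level counts across replicas
-- Run this on each replica; the database and table filter keeps partitions from other tables out of the totals
SELECT partition_id, sum(rows) AS total_rows
FROM system.parts
WHERE active AND database = 'your_db' AND table = 'your_table'
GROUP BY partition_id
ORDER BY partition_id;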
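Comparing the partition-level results by hand is error-prone once there are more than a few partitions. The following Python sketch shows one way to diff them, assuming you have already collected each replica's result into a dictionary of partition_id to row total. The function name and the replica labels are hypothetical.

```python
from typing import Dict, Set


def diff_partitions(rows_by_replica: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Return {partition_id: {replica: rows}} for partitions whose totals differ."""
    all_partitions: Set[str] = set()
    for parts in rows_by_replica.values():
        all_partitions.update(parts)

    mismatched: Dict[str, Dict[str, int]] = {}
    for pid in sorted(all_partitions):
        # A partition that a replica does not report counts as 0 rows there.
        counts = {replica: parts.get(pid, 0) for replica, parts in rows_by_replica.items()}
        if len(set(counts.values())) > 1:
            mismatched[pid] = counts
    return mismatched
```

A mismatch can also reflect inserts or fetches that are still in flight, so it is worth re-running the comparison before treating a difference as confirmed divergence.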
How to diagnose it
flowchart TD
A[ReplicatedDataLoss alert] --> B{Distinguish from normal fetches}
B --> C[Check system.replication_queue]
C --> D{Stuck entries with exceptions?}
D -->|Yes| E[Check source replica health]
D -->|No| F[Compare row counts across replicas]
E --> G{Source has the part?}
G -->|Yes| H[Restart replica to re-fetch]
G -->|No| I[Assess blast radius]
F --> J[Divergence found] --> I
F --> K[No divergence] --> L[Investigate detached parts]
H --> M[Monitor queue for completion]
I --> N[Restore from peer or rebuild partition]
1. Confirm the event. Query system.events for ReplicatedDataLoss. If it is nonzero, check ReplicatedPartFetchesOfMerged at the same time. If only ReplicatedPartFetchesOfMerged is moving and the others are flat, this is normal optimization traffic, not data loss.
2. Identify the affected table and replica. Use system.replication_queue to find entries with high num_tries and non-empty last_exception. The database, table, and source_replica columns tell you which peer the replica was trying to fetch from when it failed.
3. Check source replica health. On the source replica, verify it is not readonly or session-expired using system.replicas. Check its system.parts to see if the missing part still exists there. If the source has the part but is refusing fetches due to network or load, the loss may be recoverable once the source stabilizes.
4. Assess whether the part was merged away. If the queue shows GET_PART for an unmerged part that no longer exists on the source, check whether the merged result is available. ClickHouse often fetches the merged part instead, which is counted in ReplicatedPartFetchesOfMerged. If the merged part is also missing everywhere, proceed to blast-radius assessment.
5. Measure blast radius. Run SELECT count() FROM your_db.your_table on every replica. If counts differ, the loss has already caused divergence. Use SELECT partition_id, sum(rows) FROM system.parts WHERE active AND database = 'your_db' AND table = 'your_table' GROUP BY partition_id on each replica to identify exactly which partitions are affected. Check system.detached_parts to see if the part was locally detached rather than lost from the cluster.
6. Check for systemic hardware issues. If last_exception mentions checksum failures or system.detached_parts shows corruption-related reasons, check dmesg for disk or memory errors on both the affected replica and the source. Repeated corruption on the same host indicates a hardware problem.
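The queue triage in the steps above can be sketched in Python. This is a rough illustration, assuming the rows from system.replication_queue have already been fetched as dictionaries keyed by column name. The function name and the "integrity" and "other" labels are hypothetical; the keyword pattern reuses the one from the log search in Quick checks.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Same keywords as the grep in Quick checks; extend for your environment.
CORRUPTION = re.compile(
    r"checksum|corrupt|Broken part|Cannot read all data|Mismatch", re.IGNORECASE
)


def triage_queue(rows: Iterable[dict], min_tries: int = 3) -> Dict[Tuple[str, str, str], Dict[str, int]]:
    """Count stuck queue entries per (database, table, source_replica) by exception kind."""
    summary = defaultdict(lambda: {"integrity": 0, "other": 0})
    for row in rows:
        # Skip entries that are still retrying normally or have no recorded error.
        if row["num_tries"] <= min_tries or not row["last_exception"]:
            continue
        key = (row["database"], row["table"], row["source_replica"])
        kind = "integrity" if CORRUPTION.search(row["last_exception"]) else "other"
        summary[key][kind] += 1
    return dict(summary)
```

Grouping by source_replica makes it easier to see whether the failures concentrate on one peer, which points at that peer's health, or are spread across peers, which points at the fetching replica or the network.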
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| ReplicatedDataLoss | Direct indicator that a part is irretrievably lost | Any nonzero value |
| ReplicatedPartChecksFailed | Integrity check failures can escalate into declared data loss | Sustained increase over multiple minutes |
| ReplicatedPartFailedFetches | Degraded replication; source replica may be failing or unreachable | Nonzero rate sustained outside of restart recovery |
| system.replication_queue.num_tries | Permanently stuck entries will not self-heal | Any entry with num_tries > 10 |
| system.detached_parts | Parts removed due to corruption or failed operations | Unexpected growth with corruption-related reasons |
| Cross-replica row counts | Detects silent divergence when the replication queue appears healthy | Mismatch between replicas for the same table |
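The "sustained increase" warning signs in the table can be made concrete with a small check. The Python sketch below assumes you record one sample of a cumulative counter per minute, oldest first. The function name and the default of three consecutive intervals are illustrative choices, not recommended thresholds.

```python
from typing import Sequence


def sustained_increase(samples: Sequence[int], min_consecutive: int = 3) -> bool:
    """True if the counter rose in each of the last `min_consecutive` intervals."""
    # N intervals need N + 1 samples.
    if len(samples) < min_consecutive + 1:
        return False
    recent = samples[-(min_consecutive + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

With per-minute samples, sustained_increase([5, 5, 6, 8, 9]) is true because the counter rose in each of the last three minutes, while a single flat interval in that window makes it false.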
Fixes
If a healthy replica still has the part
When another replica holds the missing part, force the affected replica to re-evaluate its state against ZooKeeper or ClickHouse Keeper and re-fetch.
-- Force re-check against coordination service state
SYSTEM RESTART REPLICA db.table;
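After running this, monitor system.replication_queue to confirm that the part is being fetched and that the stuck entries clear. If the replica's metadata in ZooKeeper or ClickHouse Keeper is itself missing and the table has gone read-only, use:
-- Recreates the replica's metadata in the coordination service from locally present parts
SYSTEM RESTORE REPLICA db.table;
Warning: SYSTEM RESTORE REPLICA works only on read-only replicated tables whose coordination metadata was lost. It re-registers the parts found locally rather than re-downloading everything, but parts that are missing or outdated locally are still fetched from peers, which can be disruptive and I/O-intensive.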
If no replica has the part
When ReplicatedDataLoss has incremented and no peer can provide the part, the data is confirmed lost. Determine the blast radius: which table, which partition, and what time range. Recover from your organization’s backup procedures if they cover the affected partition. If no backup is available, you may need to drop or detach the affected partition to prevent queries from failing on missing data.
Handle detached parts
If system.detached_parts shows parts with corruption-related reasons, do not reattach them blindly. The reason column explains why ClickHouse removed them. Investigate the underlying cause, which is often hardware or filesystem corruption. If a healthy source replica exists, let the replica re-fetch the part instead of reattaching a potentially corrupt local copy.
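To avoid reattaching suspect parts by accident, it can help to list them explicitly first. The Python sketch below assumes rows from system.detached_parts have been fetched as dictionaries, and that corruption-related reasons contain the word "broken" (for example broken or broken-on-start). That marker is an assumption to verify against your ClickHouse version, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def suspicious_detached(
    rows: Iterable[dict], markers: Sequence[str] = ("broken",)
) -> Dict[Tuple[str, str], List[str]]:
    """Group detached part names by (database, table) when the reason looks corruption-related."""
    flagged = defaultdict(list)
    for row in rows:
        # The reason may be empty or missing for manually detached parts.
        reason = row.get("reason") or ""
        if any(marker in reason for marker in markers):
            flagged[(row["database"], row["table"])].append(row["name"])
    return dict(flagged)
```

Parts flagged this way are candidates for re-fetching from a healthy peer rather than reattaching, in line with the guidance above.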
Address underlying hardware
If dmesg or system.detached_parts points to disk or memory corruption, replace the affected hardware before restoring the replica. Re-fetching parts onto the same failing disk will reproduce the corruption and trigger another data loss event.
Prevention
- Monitor ReplicatedPartChecksFailed and ReplicatedPartFailedFetches as leading indicators. Address them before they compound into ReplicatedDataLoss.
- Monitor system.replication_queue for stuck entries with high num_tries. A queue entry that is not making progress is a data-loss risk.
- Run periodic cross-replica row count and partition-level comparisons. Silent divergence produces zero queue entries and no standard replication alerts.
- Do not treat ZooKeeper or ClickHouse Keeper as fire-and-forget infrastructure. Session expiry and coordination latency directly cascade into replication failures.
- Investigate unexpected system.detached_parts immediately. They are often the first visible sign of disk or filesystem corruption.
How Netdata helps
- Correlates ReplicatedDataLoss with ReplicatedPartFailedFetches and ReplicatedPartChecksFailed so you can see whether the loss followed a fetch degradation or an integrity failure.
- Alerts on nonzero ReplicatedDataLoss.
- Surfaces replication queue depth, stuck entries, and replica session state alongside the event to accelerate root cause analysis.
- Correlates replication errors with disk I/O, network throughput, and ZooKeeper health signals to distinguish source replica pressure from coordination failures.
- Tracks per-replica lag and availability, helping you identify which peer should serve as the recovery source.







