PostgreSQL logical replication failures: conflicts, schema drift, and recovery
Your subscriber is behind. pg_stat_subscription shows a stalled LSN, the apply worker is throwing errors, or worse: replication appears healthy while the subscriber silently diverges from the publisher. Logical replication does not replicate DDL, does not resolve row conflicts automatically, and will halt on the first integrity error it encounters. When it breaks, the failure is often on the subscriber, but the root cause may be schema drift on either side, a forgotten replication slot on the publisher, or a local write that violated the assumption that the subscriber is read-only.
This guide covers the operational failure modes you will see in production: subscriber conflicts (PG17+), schema synchronization gaps, replica identity misconfiguration, and the recovery procedures to get replication moving again without causing more divergence.
What this means
Logical replication streams row-level changes (INSERT, UPDATE, DELETE) from a publication to a subscription via WAL decoding. Unlike physical streaming replication, the subscriber is a writable PostgreSQL instance with its own schema. The apply worker on the subscriber executes the changes as the subscription owner. If a row already exists when an INSERT arrives, if an UPDATE targets a missing row, or if the subscriber table structure does not match the incoming tuple, the apply worker stops and the subscription stalls. Replication slots on the publisher retain WAL until the subscriber acknowledges receipt, so a stalled subscription can also cause WAL accumulation and disk pressure upstream.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Schema drift (DDL mismatch) | Subscriber logs show type mismatches or missing columns; apply worker exits | Compare \d output for published tables on both publisher and subscriber |
| Subscriber conflict (unique violation) | PG17+ logs show insert_exists or update_exists; apply worker stops | pg_stat_subscription_stats for conflict counts; subscriber table for duplicate keys |
| Missing replica identity | Updates and deletes are not replicated; or apply worker scans entire table | pg_class.relreplident for published tables |
| Local writes on subscriber | Application wrote to a replicated table; subsequent publisher INSERT conflicts | pg_stat_user_tables.n_tup_ins on the subscriber for unexpected inserts |
| Initial data sync collision | CREATE SUBSCRIPTION with copy_data = true fails with duplicate key errors | Check if subscriber tables already contain data before initial sync |
| Replication slot retention | Publisher disk fills; pg_wal grows despite normal traffic | pg_replication_slots for inactive logical slots with old restart_lsn |
Quick checks
Run these safe, read-only queries to orient yourself.
-- Check subscription apply state and message timing
SELECT subname, pid, received_lsn, latest_end_lsn,
last_msg_send_time, last_msg_receipt_time
FROM pg_stat_subscription;
-- Check logical slot health on the publisher
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- Check subscriber-side conflict counts (PG17+)
SELECT subname, conflict_type, conflict_count
FROM pg_stat_subscription_stats;
-- Verify replica identity settings on published tables
SELECT relname, relreplident
FROM pg_class
WHERE oid = ANY (
SELECT prrelid FROM pg_publication_rel
);
-- Detect local writes on subscriber tables that should be read-only
SELECT schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;
-- Measure WAL retained for logical slots on the publisher
SELECT slot_name,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';
How to diagnose it
Follow this sequence to isolate the failure layer.
flowchart TD
A[Subscription LSN not advancing] --> B{Subscriber log shows conflict?}
B -->|Yes| C[Classify conflict type from PG17+ log]
B -->|No| D[Check publisher slot active?]
C --> E{Schema mismatch?}
E -->|Yes| F[Compare column types and constraints]
E -->|No| G[Check for local writes or duplicate keys]
D -->|Inactive| H[Slot retaining WAL or consumer dead]
D -->|Active| I[Check replica identity and RLS]- Confirm the stall. Run the
pg_stat_subscriptionquery. Ifreceived_lsnis static andlast_msg_receipt_timeis stale while the publisher is active, the apply worker is blocked. - Read the subscriber logs. In PG17+, conflict errors include the conflict type, transaction LSN, and key values. Look for
insert_exists,update_exists,update_missing, ordelete_missing. - Check the publisher slot. If the slot is inactive and
restart_lsnlagspg_current_wal_lsn()by gigabytes, the publisher is retaining WAL for a dead consumer. This does not explain a subscriber-side error, but it will fill the disk if the subscription is broken. - Compare schemas. Logical replication does not replicate DDL. If a column was added to the publisher but not the subscriber, or types were narrowed, the apply worker will error. Verify column names, types, and order.
- Check for local writes. Query
pg_stat_user_tableson the subscriber. Any INSERT or UPDATE on a replicated table risks a unique-conflict halt. - Verify replica identity. If UPDATE or DELETE operations are failing or causing sequential scans, check
pg_class.relreplident. Tables without a primary key and withoutREPLICA IDENTITY FULLcannot replicate deletes efficiently. Tables withREPLICA IDENTITY FULLrequire a unique index on the subscriber for efficient row lookup. - Test for same-cluster hangs. Creating a subscription between two databases in the same PostgreSQL cluster hangs indefinitely. This is a known limitation.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_stat_subscription.received_lsn | Tracks how far the subscriber has applied | Stalls for >60 seconds during active publisher writes |
pg_stat_subscription_stats.conflict_count (PG17+) | Counts apply conflicts by type | Non-zero insert_exists or update_exists |
pg_replication_slots.restart_lsn lag | Measures publisher WAL retained for the slot | >1 GB retained for an inactive logical slot |
pg_stat_subscription.last_msg_receipt_time | Last message received from publisher | Timestamp older than publisher activity window |
Subscriber n_tup_ins on replicated tables | Detects local writes that will conflict | Inserts appearing on tables that should be read-only |
pg_class.relreplident | Confirms tables can replicate updates/deletes | Value n (nothing) on tables with updates/deletes |
Fixes
Resolve subscriber conflicts
When the apply worker halts on a conflict, PG17+ logs the finish LSN of the offending transaction. You have three resolution paths, listed from least to most destructive.
Delete the conflicting row and restart. If a local INSERT caused an insert_exists conflict, delete the conflicting row on the subscriber so the publisher INSERT can apply. This is the safest path when the subscriber row is a stale duplicate.
Skip the transaction (PG17+). If the conflict is part of a large transaction and you accept losing all changes in that transaction:
ALTER SUBSCRIPTION subname SKIP (lsn = '0/034F4090');
This drops every change in the transaction, not just the conflicting row. Use only when inconsistency is acceptable or when you will resync the table afterward.
Advance the origin past the conflict. Alternatively, use pg_replication_origin_advance() to move the origin LSN beyond the failed transaction. This also skips the transaction entirely and carries the same consistency risk.
Fix schema drift
Logical replication requires manual DDL synchronization.
- Additive changes (add column, add index): apply to the subscriber first, then the publisher. This ensures the subscriber never rejects an incoming tuple.
- Non-additive changes (drop column, alter type, rename column): apply to the publisher first, then the subscriber. Be aware that in-flight changes may fail during the window.
After applying DDL, monitor pg_stat_subscription.last_msg_receipt_time to confirm replication resumed.
Correct replica identity
If updates or deletes are not replicating, or are causing full-table scans on the subscriber:
-- Use the primary key (default if one exists)
ALTER TABLE table_name REPLICA IDENTITY DEFAULT;
-- Or use a specific unique index
ALTER TABLE table_name REPLICA IDENTITY USING INDEX index_name;
-- Last resort: log full row (increases WAL size significantly)
ALTER TABLE table_name REPLICA IDENTITY FULL;
REPLICA IDENTITY FULL increases WAL size because the entire row becomes the key. On the subscriber side, PostgreSQL 16+ can use any unique index for efficient row lookup even with REPLICA IDENTITY FULL; without one, the subscriber performs a sequential scan per change.
Handle initial sync duplicates
If you create a subscription with copy_data = true and the subscriber already contains data, duplicate key errors occur during the initial table copy. Resolution: disable the subscription, truncate the conflicting table, then re-enable or refresh the subscription. If the data already matches, create the subscription with copy_data = false.
Address row-level security
Replication runs as the subscription owner. If row-level security is enabled on target tables and the subscription owner is subject to RLS policies, replication will block even if the policy would ordinarily allow the operation. Disable RLS on replicated tables:
ALTER TABLE table_name DISABLE ROW LEVEL SECURITY;
Prevention
- Treat subscribers as read-only for replicated relations. Route all writes to the publisher.
- Apply DDL through a pipeline that targets both publisher and subscriber in the correct order.
- Verify replica identity before adding tables to a publication.
- Monitor
pg_stat_subscription_stats(PG17+) for early conflict detection. - Monitor
pg_replication_slotsfor inactive logical slots before they fill the publisher disk. - Set
max_slot_wal_keep_size(PG13+) to bound WAL retention for logical slots. - Avoid creating subscriptions between databases in the same cluster.
How Netdata helps
- Correlate stalled
received_lsnwith publisher disk I/O to detect WAL retention pressure before the volume fills. - Alert on inactive logical replication slots using slot activity metrics.
- Flag unexpected subscriber table activity that indicates local writes conflicting with the apply worker.
- Track PostgreSQL connection state distributions to spot apply worker restarts or connection churn during conflict recovery.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
- PostgreSQL autovacuum blocked by long-running transaction: detection and fix
- PostgreSQL autovacuum not running: detection, causes, and fixes
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads
- PostgreSQL backup strategy: pg_dump, pg_basebackup, and pgBackRest compared
- PostgreSQL blocking queries: finding the root blocker in a lock cascade
- PostgreSQL checkpoint storms: detection, causes, and tuning
- PostgreSQL: checkpoints are occurring too frequently – what to tune
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL: database is not accepting commands to avoid wraparound data loss






