PostgreSQL logical replication failures: conflicts, schema drift, and recovery

Your subscriber is behind. pg_stat_subscription shows a stalled LSN, the apply worker is throwing errors, or worse: replication appears healthy while the subscriber silently diverges from the publisher. Logical replication does not replicate DDL, does not resolve row conflicts automatically, and will halt on the first integrity error it encounters. When it breaks, the failure is often on the subscriber, but the root cause may be schema drift on either side, a forgotten replication slot on the publisher, or a local write that violated the assumption that the subscriber is read-only.

This guide covers the operational failure modes you will see in production: subscriber conflicts (PG17+), schema synchronization gaps, replica identity misconfiguration, and the recovery procedures to get replication moving again without causing more divergence.

What this means

Logical replication streams row-level changes (INSERT, UPDATE, DELETE) from a publication to a subscription via WAL decoding. Unlike physical streaming replication, the subscriber is a writable PostgreSQL instance with its own schema. The apply worker on the subscriber executes the changes as the subscription owner. If a row already exists when an INSERT arrives, if an UPDATE targets a missing row, or if the subscriber table structure does not match the incoming tuple, the apply worker stops and the subscription stalls. Replication slots on the publisher retain WAL until the subscriber acknowledges receipt, so a stalled subscription can also cause WAL accumulation and disk pressure upstream.

Common causes

CauseWhat it looks likeFirst thing to check
Schema drift (DDL mismatch)Subscriber logs show type mismatches or missing columns; apply worker exitsCompare \d output for published tables on both publisher and subscriber
Subscriber conflict (unique violation)PG17+ logs show insert_exists or update_exists; apply worker stopspg_stat_subscription_stats for conflict counts; subscriber table for duplicate keys
Missing replica identityUpdates and deletes are not replicated; or apply worker scans entire tablepg_class.relreplident for published tables
Local writes on subscriberApplication wrote to a replicated table; subsequent publisher INSERT conflictspg_stat_user_tables.n_tup_ins on the subscriber for unexpected inserts
Initial data sync collisionCREATE SUBSCRIPTION with copy_data = true fails with duplicate key errorsCheck if subscriber tables already contain data before initial sync
Replication slot retentionPublisher disk fills; pg_wal grows despite normal trafficpg_replication_slots for inactive logical slots with old restart_lsn

Quick checks

Run these safe, read-only queries to orient yourself.

-- Check subscription apply state and message timing
SELECT subname, pid, received_lsn, latest_end_lsn,
       last_msg_send_time, last_msg_receipt_time
FROM pg_stat_subscription;
-- Check logical slot health on the publisher
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- Check subscriber-side conflict counts (PG17+)
SELECT subname, conflict_type, conflict_count
FROM pg_stat_subscription_stats;
-- Verify replica identity settings on published tables
SELECT relname, relreplident
FROM pg_class
WHERE oid = ANY (
    SELECT prrelid FROM pg_publication_rel
);
-- Detect local writes on subscriber tables that should be read-only
SELECT schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;
-- Measure WAL retained for logical slots on the publisher
SELECT slot_name,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';

How to diagnose it

Follow this sequence to isolate the failure layer.

flowchart TD
    A[Subscription LSN not advancing] --> B{Subscriber log shows conflict?}
    B -->|Yes| C[Classify conflict type from PG17+ log]
    B -->|No| D[Check publisher slot active?]
    C --> E{Schema mismatch?}
    E -->|Yes| F[Compare column types and constraints]
    E -->|No| G[Check for local writes or duplicate keys]
    D -->|Inactive| H[Slot retaining WAL or consumer dead]
    D -->|Active| I[Check replica identity and RLS]
  1. Confirm the stall. Run the pg_stat_subscription query. If received_lsn is static and last_msg_receipt_time is stale while the publisher is active, the apply worker is blocked.
  2. Read the subscriber logs. In PG17+, conflict errors include the conflict type, transaction LSN, and key values. Look for insert_exists, update_exists, update_missing, or delete_missing.
  3. Check the publisher slot. If the slot is inactive and restart_lsn lags pg_current_wal_lsn() by gigabytes, the publisher is retaining WAL for a dead consumer. This does not explain a subscriber-side error, but it will fill the disk if the subscription is broken.
  4. Compare schemas. Logical replication does not replicate DDL. If a column was added to the publisher but not the subscriber, or types were narrowed, the apply worker will error. Verify column names, types, and order.
  5. Check for local writes. Query pg_stat_user_tables on the subscriber. Any INSERT or UPDATE on a replicated table risks a unique-conflict halt.
  6. Verify replica identity. If UPDATE or DELETE operations are failing or causing sequential scans, check pg_class.relreplident. Tables without a primary key and without REPLICA IDENTITY FULL cannot replicate deletes efficiently. Tables with REPLICA IDENTITY FULL require a unique index on the subscriber for efficient row lookup.
  7. Test for same-cluster hangs. Creating a subscription between two databases in the same PostgreSQL cluster hangs indefinitely. This is a known limitation.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_stat_subscription.received_lsnTracks how far the subscriber has appliedStalls for >60 seconds during active publisher writes
pg_stat_subscription_stats.conflict_count (PG17+)Counts apply conflicts by typeNon-zero insert_exists or update_exists
pg_replication_slots.restart_lsn lagMeasures publisher WAL retained for the slot>1 GB retained for an inactive logical slot
pg_stat_subscription.last_msg_receipt_timeLast message received from publisherTimestamp older than publisher activity window
Subscriber n_tup_ins on replicated tablesDetects local writes that will conflictInserts appearing on tables that should be read-only
pg_class.relreplidentConfirms tables can replicate updates/deletesValue n (nothing) on tables with updates/deletes

Fixes

Resolve subscriber conflicts

When the apply worker halts on a conflict, PG17+ logs the finish LSN of the offending transaction. You have three resolution paths, listed from least to most destructive.

Delete the conflicting row and restart. If a local INSERT caused an insert_exists conflict, delete the conflicting row on the subscriber so the publisher INSERT can apply. This is the safest path when the subscriber row is a stale duplicate.

Skip the transaction (PG17+). If the conflict is part of a large transaction and you accept losing all changes in that transaction:

ALTER SUBSCRIPTION subname SKIP (lsn = '0/034F4090');

This drops every change in the transaction, not just the conflicting row. Use only when inconsistency is acceptable or when you will resync the table afterward.

Advance the origin past the conflict. Alternatively, use pg_replication_origin_advance() to move the origin LSN beyond the failed transaction. This also skips the transaction entirely and carries the same consistency risk.

Fix schema drift

Logical replication requires manual DDL synchronization.

  • Additive changes (add column, add index): apply to the subscriber first, then the publisher. This ensures the subscriber never rejects an incoming tuple.
  • Non-additive changes (drop column, alter type, rename column): apply to the publisher first, then the subscriber. Be aware that in-flight changes may fail during the window.

After applying DDL, monitor pg_stat_subscription.last_msg_receipt_time to confirm replication resumed.

Correct replica identity

If updates or deletes are not replicating, or are causing full-table scans on the subscriber:

-- Use the primary key (default if one exists)
ALTER TABLE table_name REPLICA IDENTITY DEFAULT;

-- Or use a specific unique index
ALTER TABLE table_name REPLICA IDENTITY USING INDEX index_name;

-- Last resort: log full row (increases WAL size significantly)
ALTER TABLE table_name REPLICA IDENTITY FULL;

REPLICA IDENTITY FULL increases WAL size because the entire row becomes the key. On the subscriber side, PostgreSQL 16+ can use any unique index for efficient row lookup even with REPLICA IDENTITY FULL; without one, the subscriber performs a sequential scan per change.

Handle initial sync duplicates

If you create a subscription with copy_data = true and the subscriber already contains data, duplicate key errors occur during the initial table copy. Resolution: disable the subscription, truncate the conflicting table, then re-enable or refresh the subscription. If the data already matches, create the subscription with copy_data = false.

Address row-level security

Replication runs as the subscription owner. If row-level security is enabled on target tables and the subscription owner is subject to RLS policies, replication will block even if the policy would ordinarily allow the operation. Disable RLS on replicated tables:

ALTER TABLE table_name DISABLE ROW LEVEL SECURITY;

Prevention

  • Treat subscribers as read-only for replicated relations. Route all writes to the publisher.
  • Apply DDL through a pipeline that targets both publisher and subscriber in the correct order.
  • Verify replica identity before adding tables to a publication.
  • Monitor pg_stat_subscription_stats (PG17+) for early conflict detection.
  • Monitor pg_replication_slots for inactive logical slots before they fill the publisher disk.
  • Set max_slot_wal_keep_size (PG13+) to bound WAL retention for logical slots.
  • Avoid creating subscriptions between databases in the same cluster.

How Netdata helps

  • Correlate stalled received_lsn with publisher disk I/O to detect WAL retention pressure before the volume fills.
  • Alert on inactive logical replication slots using slot activity metrics.
  • Flag unexpected subscriber table activity that indicates local writes conflicting with the apply worker.
  • Track PostgreSQL connection state distributions to spot apply worker restarts or connection churn during conflict recovery.