PostgreSQL replication lag: detection, diagnosis, and fixes

Replication lag is the distance between the last WAL record generated on the primary and the last record applied on a replica. In asynchronous streaming replication, a few seconds of lag is normal. When lag grows without bound, your recovery point objective becomes fiction.

Lag often grows silently. Replication processes stay connected, WAL streams flow, and uptime checks stay green while the byte gap creeps from megabytes to gigabytes. Promoting a replica that is hours behind destroys the consistency your application assumes.

This guide covers physical streaming replication lag. It shows how to decompose lag into network, disk, and apply components using pg_stat_replication, how to distinguish a slow network from a blocked replica, and how to fix the root cause without restarting PostgreSQL.

What this means

PostgreSQL appends every change to a WAL stream. The primary ships WAL to replica walreceiver processes; the replica applies records in strict LSN order. The primary exposes per-replica progress in pg_stat_replication via four LSNs: sent_lsn, write_lsn, flush_lsn, and replay_lsn. Subtracting them with pg_wal_lsn_diff yields exact byte distances, which is essential when the lag interval columns are NULL or when you need byte-based thresholds.

  • write_lag: time from WAL generation to the replica’s WAL receiver writing it.
  • flush_lag: time to flush that WAL to replica disk.
  • replay_lag: time for the startup process to apply the WAL so it is visible to queries.

These three intervals isolate network, disk, and apply bottlenecks.

On the replica, pg_last_xact_replay_timestamp() returns the commit timestamp of the last replayed transaction. now() - pg_last_xact_replay_timestamp() gives a coarse time-based estimate, but it stalls during quiet periods when the primary commits nothing.

Lag is fundamentally a byte offset, but operators usually read it as time. A replica 30 seconds behind an async primary is often acceptable. If the primary fails, those 30 seconds of committed writes are gone.

Common causes

CauseWhat it looks likeFirst thing to check
Network bottleneckwrite_lag grows on all replicas equally; primary WAL generation is highNetwork utilization and latency between primary and replicas
Replica disk saturationreplay_lag grows while write_lag and flush_lag stay smallReplica disk I/O latency and throughput
Long-running query blocking replayreplay_lag spikes; pg_stat_database.conflicts incrementsReplica pg_stat_activity for active queries older than the streaming delay window
Hot standby feedback bloatGradual lag growth; high dead tuple count on primaryhot_standby_feedback setting on replica; primary pg_stat_user_tables.n_dead_tup
Forgotten replication slotPrimary pg_wal directory grows; disk pressure on primarypg_replication_slots for inactive slots with a stale restart_lsn
Synchronous standby mismatchCommits hang on primary; throughput collapsesynchronous_standby_names versus actual connected standbys in pg_stat_replication

Quick checks

Run these safe, read-only checks on the primary and replica to characterize lag without changing state.

-- Decompose lag per replica on the primary
SELECT client_addr, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
-- Compute raw byte lag when interval columns are not available
SELECT client_addr,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
       pg_wal_lsn_diff(sent_lsn, flush_lsn) AS flush_lag_bytes,
       pg_wal_lsn_diff(sent_lsn, write_lsn) AS write_lag_bytes
FROM pg_stat_replication;
-- Detect recovery conflicts on the replica
SELECT datname, conflicts, confl_lock, confl_snapshot
FROM pg_stat_database
WHERE conflicts > 0;
-- Find long-running queries on the replica that may block replay
SELECT pid, usename, query_start, query
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < NOW() - INTERVAL '30 seconds';
-- Check replication slot health on the primary
SELECT slot_name, active, restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;
-- Verify hot standby feedback on the replica
SHOW hot_standby_feedback;
-- Check replay delay threshold on the replica
SHOW max_standby_streaming_delay;
-- Check for tables with high bloat on the primary when hot_standby_feedback is enabled
SELECT schemaname, relname, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;
# Check replica disk latency during apply
iostat -xz 1 5
# Measure network round-trip between primary and replica
ping -c 10 <replica_ip>

How to diagnose it

  1. On the primary, run the pg_stat_replication lag query. Determine which component dominates:

    • write_lag points to network or WAL sender delay.
    • flush_lag points to replica disk write speed.
    • replay_lag points to apply slowness.
  2. If replay_lag is dominant, check the replica’s pg_stat_database.conflicts. Non-zero values mean queries were canceled because they conflicted with WAL replay. Inspect pg_stat_activity on the replica for active queries holding locks that block replay. Compare query duration to max_standby_streaming_delay; queries exceeding this are candidates for cancellation.

  3. If write_lag is dominant, check whether all replicas show similar growth. If yes, the primary’s WAL generation likely exceeds network capacity or the WAL sender process is delayed. Check network bandwidth and latency. If only one replica is affected, inspect its network path and connection state.

  4. Check pg_replication_slots on the primary. An inactive slot with a stale restart_lsn retains WAL and can create primary-side disk pressure that indirectly throttles WAL archiving and streaming.

  5. On the replica, correlate replay_lag with OS-level disk metrics. If iostat shows high await and queue depths during lag spikes, the replica cannot write or read fast enough to keep up with replay.

  6. Check hot_standby_feedback on the replica. If enabled, verify whether primary table bloat is growing because vacuum is being held back. Bloat increases index scan cost and can trigger autovacuum storms that consume CPU and I/O, further degrading replay.

flowchart TD
    A[Replication lag growing] --> B{Which lag column dominates?}
    B -->|write_lag high| C[Network bottleneck or WAL sender delay]
    B -->|flush_lag high| D[Replica disk write slow]
    B -->|replay_lag high| E{Recovery conflicts on replica?}
    E -->|conflicts > 0| F[Long query blocking replay or hot_standby_feedback]
    E -->|conflicts = 0| G[Replica disk or CPU saturated]

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_stat_replication.replay_lagFailover readiness and RPO reality> 30 seconds on async replicas
pg_stat_replication.write_lagNetwork or WAL sender healthGrowing while flush and replay remain flat
pg_stat_replication.flush_lagReplica disk write speed> 1 second sustained
pg_stat_database.conflictsQuery cancellations from replay conflictsAny non-zero value
pg_replication_slots.retained_wal_bytesRisk of primary disk exhaustion from WAL retentionInactive slot retaining more than 1 GB of WAL
pg_stat_activity.query_start on replicaReplay blocking by long queriesActive query duration approaching max_standby_streaming_delay
Primary pg_stat_user_tables.n_dead_tupBloat from deferred vacuum when hot_standby_feedback is onDead tuple ratio > 20% on active tables

Fixes

Network bottleneck

Increase bandwidth between primary and replicas, or establish a dedicated replication network segment. If WAN replication is required, reduce bulk write volume on the primary. Tradeoff: network upgrades add infrastructure cost; batching writes may increase latency for individual transactions.

Replica disk saturation

Scale the replica’s storage IOPS or migrate to faster disk. If the replica uses network-attached storage, check for burst-balance exhaustion or cross-tenant noise. Reduce read load on the affected replica by moving long-running reports to a dedicated analytics replica or running them during off-peak windows. Tradeoff: faster storage increases cost; shifting queries may overload another replica.

Long-running queries blocking replay

Terminate offending queries on the replica if they are non-critical.

-- WARNING: Destructive. Terminates a running query.
SELECT pg_terminate_backend(pid);

Set statement_timeout and idle_in_transaction_session_timeout for roles that connect to replicas. If the workload requires long reads, enable hot_standby_feedback = on to defer primary vacuum rather than canceling queries. Lower max_standby_streaming_delay to force earlier cancellation when you prioritize replication over queries. Tradeoff: terminating queries aborts reports; hot_standby_feedback causes table bloat on the primary that must be managed with aggressive autovacuum tuning.

Forgotten replication slot

Warning: Dropping a slot is destructive. The consumer will require a full re-sync when it returns, and the slot holds back WAL truncation on the primary.

Drop the slot only if the consumer is permanently gone.

-- WARNING: Destructive. Only run if the consumer is permanently gone.
SELECT pg_drop_replication_slot('slot_name');

If the consumer is temporarily offline, provision additional primary disk space until it recovers, then monitor catch-up time.

Synchronous standby mismatch

If synchronous_standby_names names a standby that is not connected, the primary blocks commits. Update the parameter to match currently healthy standbys and reload. If you require zero data loss, do not reduce the count below the number of healthy sync standbys you actually have; instead, investigate why the standby disconnected. Tradeoff: you temporarily operate in asynchronous mode, which increases RPO until the synchronous standby returns.

Prevention

  • Alert at 50% of RPO. If your RPO is 60 seconds, page when replay_lag exceeds 30 seconds. Lag discovered during failover is too late.

  • Enforce query timeouts on replicas. Reporting and ETL connections should have statement_timeout values well below max_standby_streaming_delay.

  • Monitor replication slots as closely as replication lag. An inactive slot is a disk-full incident waiting to happen.

  • Validate replica disk I/O headroom during peak load. Replay is write-heavy; a replica sized for reads may saturate its disk during high primary write throughput.

  • Review replica query patterns quarterly. Introducing new reporting queries to a replica without timeout guardrails is a common cause of sudden lag spikes.

  • Run failover drills monthly. Lag is only a problem when you fail over. Drills expose hidden lag and validate your RPO assumptions.

  • Tune primary autovacuum aggressively when using hot_standby_feedback. The feedback prevents vacuum from reclaiming tuples the replica needs, so the primary must clean more eagerly to avoid bloat that slows everything down.

How Netdata helps

  • Charts for pg_stat_replication.write_lag, flush_lag, and replay_lag decompose lag without manual SQL polling.

  • Alerts on replay_lag relative to your RPO catch growth before failover.

  • Correlation of replica pg_stat_database.conflicts with CPU and disk metrics distinguishes replay blocking from resource saturation.

  • Tracking pg_replication_slots and WAL volume on the primary flags inactive slots that retain WAL.

  • Dashboards showing primary WAL generation rate against replica apply rate surface write/replay mismatches.