PostgreSQL replication lag: detection, diagnosis, and fixes
Replication lag is the distance between the last WAL record generated on the primary and the last record applied on a replica. In asynchronous streaming replication, a few seconds of lag is normal. When lag grows without bound, your recovery point objective becomes fiction.
Lag often grows silently. Replication processes stay connected, WAL streams flow, and uptime checks stay green while the byte gap creeps from megabytes to gigabytes. Promoting a replica that is hours behind destroys the consistency your application assumes.
This guide covers physical streaming replication lag. It shows how to decompose lag into network, disk, and apply components using pg_stat_replication, how to distinguish a slow network from a blocked replica, and how to fix the root cause without restarting PostgreSQL.
What this means
PostgreSQL appends every change to a WAL stream. The primary ships WAL to replica walreceiver processes; the replica applies records in strict LSN order. The primary exposes per-replica progress in pg_stat_replication via four LSNs: sent_lsn, write_lsn, flush_lsn, and replay_lsn. Subtracting them with pg_wal_lsn_diff yields exact byte distances, which is essential when the lag interval columns are NULL or when you need byte-based thresholds.
write_lag: time from WAL generation to the replica’s WAL receiver writing it.flush_lag: time to flush that WAL to replica disk.replay_lag: time for the startup process to apply the WAL so it is visible to queries.
These three intervals isolate network, disk, and apply bottlenecks.
On the replica, pg_last_xact_replay_timestamp() returns the commit timestamp of the last replayed transaction. now() - pg_last_xact_replay_timestamp() gives a coarse time-based estimate, but it stalls during quiet periods when the primary commits nothing.
Lag is fundamentally a byte offset, but operators usually read it as time. A replica 30 seconds behind an async primary is often acceptable. If the primary fails, those 30 seconds of committed writes are gone.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Network bottleneck | write_lag grows on all replicas equally; primary WAL generation is high | Network utilization and latency between primary and replicas |
| Replica disk saturation | replay_lag grows while write_lag and flush_lag stay small | Replica disk I/O latency and throughput |
| Long-running query blocking replay | replay_lag spikes; pg_stat_database.conflicts increments | Replica pg_stat_activity for active queries older than the streaming delay window |
| Hot standby feedback bloat | Gradual lag growth; high dead tuple count on primary | hot_standby_feedback setting on replica; primary pg_stat_user_tables.n_dead_tup |
| Forgotten replication slot | Primary pg_wal directory grows; disk pressure on primary | pg_replication_slots for inactive slots with a stale restart_lsn |
| Synchronous standby mismatch | Commits hang on primary; throughput collapse | synchronous_standby_names versus actual connected standbys in pg_stat_replication |
Quick checks
Run these safe, read-only checks on the primary and replica to characterize lag without changing state.
-- Decompose lag per replica on the primary
SELECT client_addr, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
-- Compute raw byte lag when interval columns are not available
SELECT client_addr,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
pg_wal_lsn_diff(sent_lsn, flush_lsn) AS flush_lag_bytes,
pg_wal_lsn_diff(sent_lsn, write_lsn) AS write_lag_bytes
FROM pg_stat_replication;
-- Detect recovery conflicts on the replica
SELECT datname, conflicts, confl_lock, confl_snapshot
FROM pg_stat_database
WHERE conflicts > 0;
-- Find long-running queries on the replica that may block replay
SELECT pid, usename, query_start, query
FROM pg_stat_activity
WHERE state = 'active'
AND query_start < NOW() - INTERVAL '30 seconds';
-- Check replication slot health on the primary
SELECT slot_name, active, restart_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;
-- Verify hot standby feedback on the replica
SHOW hot_standby_feedback;
-- Check replay delay threshold on the replica
SHOW max_standby_streaming_delay;
-- Check for tables with high bloat on the primary when hot_standby_feedback is enabled
SELECT schemaname, relname, n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;
# Check replica disk latency during apply
iostat -xz 1 5
# Measure network round-trip between primary and replica
ping -c 10 <replica_ip>
How to diagnose it
On the primary, run the
pg_stat_replicationlag query. Determine which component dominates:write_lagpoints to network or WAL sender delay.flush_lagpoints to replica disk write speed.replay_lagpoints to apply slowness.
If
replay_lagis dominant, check the replica’spg_stat_database.conflicts. Non-zero values mean queries were canceled because they conflicted with WAL replay. Inspectpg_stat_activityon the replica for active queries holding locks that block replay. Compare query duration tomax_standby_streaming_delay; queries exceeding this are candidates for cancellation.If
write_lagis dominant, check whether all replicas show similar growth. If yes, the primary’s WAL generation likely exceeds network capacity or the WAL sender process is delayed. Check network bandwidth and latency. If only one replica is affected, inspect its network path and connection state.Check
pg_replication_slotson the primary. An inactive slot with a stalerestart_lsnretains WAL and can create primary-side disk pressure that indirectly throttles WAL archiving and streaming.On the replica, correlate
replay_lagwith OS-level disk metrics. Ifiostatshows highawaitand queue depths during lag spikes, the replica cannot write or read fast enough to keep up with replay.Check
hot_standby_feedbackon the replica. If enabled, verify whether primary table bloat is growing because vacuum is being held back. Bloat increases index scan cost and can trigger autovacuum storms that consume CPU and I/O, further degrading replay.
flowchart TD
A[Replication lag growing] --> B{Which lag column dominates?}
B -->|write_lag high| C[Network bottleneck or WAL sender delay]
B -->|flush_lag high| D[Replica disk write slow]
B -->|replay_lag high| E{Recovery conflicts on replica?}
E -->|conflicts > 0| F[Long query blocking replay or hot_standby_feedback]
E -->|conflicts = 0| G[Replica disk or CPU saturated]Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_stat_replication.replay_lag | Failover readiness and RPO reality | > 30 seconds on async replicas |
pg_stat_replication.write_lag | Network or WAL sender health | Growing while flush and replay remain flat |
pg_stat_replication.flush_lag | Replica disk write speed | > 1 second sustained |
pg_stat_database.conflicts | Query cancellations from replay conflicts | Any non-zero value |
pg_replication_slots.retained_wal_bytes | Risk of primary disk exhaustion from WAL retention | Inactive slot retaining more than 1 GB of WAL |
pg_stat_activity.query_start on replica | Replay blocking by long queries | Active query duration approaching max_standby_streaming_delay |
Primary pg_stat_user_tables.n_dead_tup | Bloat from deferred vacuum when hot_standby_feedback is on | Dead tuple ratio > 20% on active tables |
Fixes
Network bottleneck
Increase bandwidth between primary and replicas, or establish a dedicated replication network segment. If WAN replication is required, reduce bulk write volume on the primary. Tradeoff: network upgrades add infrastructure cost; batching writes may increase latency for individual transactions.
Replica disk saturation
Scale the replica’s storage IOPS or migrate to faster disk. If the replica uses network-attached storage, check for burst-balance exhaustion or cross-tenant noise. Reduce read load on the affected replica by moving long-running reports to a dedicated analytics replica or running them during off-peak windows. Tradeoff: faster storage increases cost; shifting queries may overload another replica.
Long-running queries blocking replay
Terminate offending queries on the replica if they are non-critical.
-- WARNING: Destructive. Terminates a running query.
SELECT pg_terminate_backend(pid);
Set statement_timeout and idle_in_transaction_session_timeout for roles that connect to replicas. If the workload requires long reads, enable hot_standby_feedback = on to defer primary vacuum rather than canceling queries. Lower max_standby_streaming_delay to force earlier cancellation when you prioritize replication over queries. Tradeoff: terminating queries aborts reports; hot_standby_feedback causes table bloat on the primary that must be managed with aggressive autovacuum tuning.
Forgotten replication slot
Warning: Dropping a slot is destructive. The consumer will require a full re-sync when it returns, and the slot holds back WAL truncation on the primary.
Drop the slot only if the consumer is permanently gone.
-- WARNING: Destructive. Only run if the consumer is permanently gone.
SELECT pg_drop_replication_slot('slot_name');
If the consumer is temporarily offline, provision additional primary disk space until it recovers, then monitor catch-up time.
Synchronous standby mismatch
If synchronous_standby_names names a standby that is not connected, the primary blocks commits. Update the parameter to match currently healthy standbys and reload. If you require zero data loss, do not reduce the count below the number of healthy sync standbys you actually have; instead, investigate why the standby disconnected. Tradeoff: you temporarily operate in asynchronous mode, which increases RPO until the synchronous standby returns.
Prevention
Alert at 50% of RPO. If your RPO is 60 seconds, page when
replay_lagexceeds 30 seconds. Lag discovered during failover is too late.Enforce query timeouts on replicas. Reporting and ETL connections should have
statement_timeoutvalues well belowmax_standby_streaming_delay.Monitor replication slots as closely as replication lag. An inactive slot is a disk-full incident waiting to happen.
Validate replica disk I/O headroom during peak load. Replay is write-heavy; a replica sized for reads may saturate its disk during high primary write throughput.
Review replica query patterns quarterly. Introducing new reporting queries to a replica without timeout guardrails is a common cause of sudden lag spikes.
Run failover drills monthly. Lag is only a problem when you fail over. Drills expose hidden lag and validate your RPO assumptions.
Tune primary autovacuum aggressively when using
hot_standby_feedback. The feedback prevents vacuum from reclaiming tuples the replica needs, so the primary must clean more eagerly to avoid bloat that slows everything down.
How Netdata helps
Charts for
pg_stat_replication.write_lag,flush_lag, andreplay_lagdecompose lag without manual SQL polling.Alerts on
replay_lagrelative to your RPO catch growth before failover.Correlation of replica
pg_stat_database.conflictswith CPU and disk metrics distinguishes replay blocking from resource saturation.Tracking
pg_replication_slotsand WAL volume on the primary flags inactive slots that retain WAL.Dashboards showing primary WAL generation rate against replica apply rate surface write/replay mismatches.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
- PostgreSQL autovacuum blocked by long-running transaction: detection and fix
- PostgreSQL autovacuum not running: detection, causes, and fixes
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads
- PostgreSQL blocking queries: finding the root blocker in a lock cascade
- PostgreSQL checkpoint storms: detection, causes, and tuning
- PostgreSQL: checkpoints are occurring too frequently – what to tune
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL: database is not accepting commands to avoid wraparound data loss
- PostgreSQL dead tuples piling up: why autovacuum can’t keep up






