PostgreSQL synchronous_commit: durability vs throughput trade-offs

synchronous_commit sets how much durability work must complete before a client receives COMMIT OK, and therefore what a crash can still lose afterward. The default, on, waits for the local WAL flush to disk. With synchronous replication enabled, it also waits for a standby to flush. The five values (off, local, on, remote_write, and remote_apply) trade latency against survival guarantees. The wrong choice either accepts unplanned data loss or strangles write throughput with network round-trips.

Use this guide when investigating a sudden throughput drop after enabling high availability, or when deciding whether remote_apply is necessary for your failover target.

What it is and why it matters

synchronous_commit is a GUC that controls how long the backend waits before returning success to the client after COMMIT. It does not change WAL content; it changes when the backend tells the client the work is done.

Unlike fsync = off, which risks database corruption on an OS crash, synchronous_commit = off only risks losing recently committed transactions. The database remains internally consistent after crash recovery because replay stops at the last flushed WAL record.

The parameter can be set globally in postgresql.conf, or per database, per user, per session, or per transaction. You do not have to run the entire instance in one mode.

How it works

Behavior splits into two regimes depending on whether synchronous_standby_names is configured.

When synchronous_standby_names is empty, the replication-aware settings collapse to local-only behavior. Only off is different.

off: Return immediately. The WAL writer flushes dirty WAL buffers later, governed by wal_writer_delay (default 200 ms). If the primary crashes before flush, the committed transactions are lost. Recovery replays only to the last flushed record. Utility commands (DROP TABLE, TRUNCATE, CLUSTER, REINDEX, PREPARE TRANSACTION) ignore this setting and always wait for a local flush.
local: Wait for local WAL flush. Survives a primary crash. Does not wait for any standby.
on, remote_write, remote_apply: With no standby configured, all three wait only for local WAL flush, identical to local.

When synchronous_standby_names is populated, the modes diverge across the replication stream.

on: Wait for the synchronous standby to receive and flush WAL to durable storage. Survives a primary crash and a standby OS crash.
remote_write: Wait for the standby to write WAL to its kernel page cache. Survives a standby PostgreSQL crash but not a standby OS crash. Many operators mistakenly treat this as fully durable.
remote_apply: Wait for the standby to receive, flush, and replay WAL so the committed data is visible to queries on the standby. Adds replay latency on top of network and flush latency.
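The two regimes above can be summarized as a small lookup. The Python sketch below is an illustrative model, not PostgreSQL code: the function name `commit_wait_point` and the stage labels are invented for this example, and each label names the furthest point a commit waits for.

```python
# Illustrative model only: maps a synchronous_commit value to the furthest
# point the backend waits for before acknowledging COMMIT.
# The stage labels and function name are hypothetical, not PostgreSQL identifiers.

WITH_SYNC_STANDBY = {
    "off": "none",
    "local": "local_flush",
    "remote_write": "standby_write",
    "on": "standby_flush",
    "remote_apply": "standby_apply",
}


def commit_wait_point(mode: str, has_sync_standby: bool) -> str:
    """Return the stage a commit waits for under the given mode."""
    if mode not in WITH_SYNC_STANDBY:
        raise ValueError(f"unknown synchronous_commit value: {mode!r}")
    if mode == "off":
        return "none"
    if not has_sync_standby:
        # With synchronous_standby_names empty, every mode except off
        # collapses to a local WAL flush.
        return "local_flush"
    return WITH_SYNC_STANDBY[mode]


if __name__ == "__main__":
    for mode in WITH_SYNC_STANDBY:
        print(mode, commit_wait_point(mode, False), commit_wait_point(mode, True))
```

Running the script prints one line per mode, showing the wait point without and with a synchronous standby side by side.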

You do not need one global mode. SET LOCAL synchronous_commit TO OFF inside a transaction block scopes the change to that transaction only. Session-level SET persists for the session. Database-level and user-level defaults via ALTER DATABASE and ALTER USER are overridable at the session or transaction level. Use this when a bulk load can tolerate loss but the rest of the workload cannot.
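As a minimal sketch of the scoping described above, the Python helper below builds the statement sequence for a loss-tolerant bulk load. The helper name, the `events` table, and the `telemetry` and `batch_loader` names are placeholders chosen for illustration; the statements would be sent through whatever driver the application already uses.

```python
# Sketch: build the statement sequence for a loss-tolerant bulk load.
# The helper name and all object names are hypothetical.

def async_commit_transaction(statements):
    """Wrap statements so synchronous_commit is off for this transaction only."""
    return [
        "BEGIN",
        # SET LOCAL reverts automatically at COMMIT or ROLLBACK.
        "SET LOCAL synchronous_commit TO OFF",
        *statements,
        "COMMIT",
    ]


# Broader scopes, shown for comparison. Database and user names are placeholders.
SCOPE_EXAMPLES = {
    "session": "SET synchronous_commit TO OFF",
    "database": "ALTER DATABASE telemetry SET synchronous_commit TO OFF",
    "user": "ALTER USER batch_loader SET synchronous_commit TO OFF",
}


if __name__ == "__main__":
    for stmt in async_commit_transaction(["COPY events FROM STDIN"]):
        print(stmt + ";")
```

Keeping the override inside the transaction block means the next transaction on the same connection falls back to the session or database default without any cleanup step.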

flowchart LR
    Client[Client COMMIT] --> Primary[Primary WAL]
    Primary --> off[synchronous_commit = off]
    Primary --> local[synchronous_commit = local]
    Primary --> on[synchronous_commit = on]
    Primary --> rw[synchronous_commit = remote_write]
    Primary --> ra[synchronous_commit = remote_apply]
    off --> R1[Return immediately<br/>WAL flushed later]
    local --> R2[Return after<br/>primary fsync]
    rw --> R3[Return after<br/>standby OS write]
    on --> R4[Return after<br/>standby fsync]
    ra --> R5[Return after<br/>standby replay]

Where it shows up in production

Write-heavy OLTP. Moving from on to off removes commit-time fsync latency, often doubling throughput for small-write OLTP. The WAL writer flushes every wal_writer_delay (default 200 ms), and the worst-case loss window is three times that interval, or 600 ms at the default. Use this only for data where losing the last few hundred milliseconds of commits is acceptable, such as telemetry or ephemeral session state.
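The loss-window arithmetic can be written out as a short Python sketch. The function names are hypothetical, and the commit rate in the usage lines is an assumed figure for illustration, not a measurement.

```python
# Sketch of the worst-case exposure when synchronous_commit = off.
# Function names are hypothetical; the 3x factor is the worst case stated above.

def worst_case_loss_window_ms(wal_writer_delay_ms: float = 200.0) -> float:
    """Worst-case window of acknowledged but unflushed commits, in milliseconds."""
    return 3 * wal_writer_delay_ms


def commits_at_risk(commits_per_second: float,
                    wal_writer_delay_ms: float = 200.0) -> float:
    """Upper bound on acknowledged commits that a crash could discard."""
    window_s = worst_case_loss_window_ms(wal_writer_delay_ms) / 1000.0
    return commits_per_second * window_s


if __name__ == "__main__":
    print(worst_case_loss_window_ms())   # default wal_writer_delay
    print(commits_at_risk(5000))         # assumed rate of 5000 commits/s
```

At the default wal_writer_delay the window is 600 ms, so a primary sustaining an assumed 5000 commits per second could lose up to 3000 acknowledged commits in a crash.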

Synchronous replication for HA. Enabling remote_apply because “we cannot lose data” adds network RTT, standby fsync, and replay time to every commit. On a cross-region link, this can add tens of milliseconds and collapse throughput. If the synchronous standby fails, commits block until a synchronous standby reconnects or an operator reduces the durability requirement.
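To make the cost concrete, the sketch below adds up per-commit latency for each mode. It is a deliberately simplified additive model with hypothetical names: it assumes the stages run one after another and ignores group commit, WAL batching, and concurrency, so treat the results as rough estimates for a single connection committing serially.

```python
# Simplified additive latency model; all names and numbers are assumptions.
from dataclasses import dataclass


@dataclass
class CommitCosts:
    local_flush_ms: float      # primary WAL fsync
    network_rtt_ms: float      # round trip between primary and standby
    standby_write_ms: float    # standby writes WAL to the OS page cache
    standby_flush_ms: float    # standby fsyncs WAL
    standby_replay_ms: float   # standby replays WAL


def commit_latency_ms(mode: str, c: CommitCosts) -> float:
    """Estimated commit wait for one transaction with a synchronous standby."""
    if mode == "off":
        return 0.0
    latency = c.local_flush_ms
    if mode == "local":
        return latency
    latency += c.network_rtt_ms + c.standby_write_ms
    if mode == "remote_write":
        return latency
    latency += c.standby_flush_ms
    if mode == "on":
        return latency
    if mode == "remote_apply":
        return latency + c.standby_replay_ms
    raise ValueError(f"unknown synchronous_commit value: {mode!r}")


def serial_commits_per_second(mode: str, c: CommitCosts) -> float:
    """Ceiling for one connection that commits one transaction at a time."""
    latency = commit_latency_ms(mode, c)
    return float("inf") if latency == 0 else 1000.0 / latency


if __name__ == "__main__":
    # Assumed cross-region figures, for illustration only.
    costs = CommitCosts(1.0, 40.0, 0.5, 1.5, 2.0)
    for mode in ("off", "local", "remote_write", "on", "remote_apply"):
        print(mode, commit_latency_ms(mode, costs),
              serial_commits_per_second(mode, costs))
```

With the assumed 40 ms round trip, the network term dominates every replicated mode, which is why the choice between on and remote_apply matters less on such a link than the decision to replicate synchronously across regions at all.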

Per-transaction tuning. A large COPY or batch INSERT can use SET LOCAL synchronous_commit TO OFF to reduce commit overhead, while financial ledger writes in the same session keep the default. Utility commands such as DROP TABLE and two-phase commit operations such as PREPARE TRANSACTION commit synchronously regardless of the setting.

Connection pooling interaction. A session-level SET in PgBouncer transaction mode does not survive across transactions. If the application relies on session-level overrides, use session-mode pooling or apply the setting inside each transaction block with SET LOCAL.

Immediate shutdown equivalence. pg_ctl stop -m immediate is an unclean shutdown equivalent to a crash. Any unflushed asynchronous commits are lost. Do not expect immediate shutdown to be clean.

Startup race condition. On older unpatched PostgreSQL versions, a brief window after primary startup allowed commits to bypass synchronous standby waits even when synchronous_standby_names was enabled. If you observe intermittent replication lag spikes right after a primary restart on older minors, this bug is a likely cause.

Tradeoffs and when to use it

Mode	Durability boundary	Typical latency cost	Appropriate use
`off`	None at commit time; WAL flushed later by WAL writer	Lowest	Transient data, caches, telemetry where sub-second loss is acceptable
`local`	Primary disk flush	Local fsync	Single-node OLTP; topologies where standby durability is handled separately
`on`	Standby disk flush	Network RTT + standby fsync	HA requiring standby-side disk flush
`remote_write`	Standby OS page cache	Network RTT + OS write	Lower latency than `on`, but does not survive standby OS crash
`remote_apply`	Standby query visibility	Network RTT + fsync + replay	Read-after-write consistency across nodes; failover with zero stale reads

local is rarely the right answer in a replication topology. Use it when you need local durability but deliberately do not want to wait for the standby, such as bulk-loading non-critical data onto the primary.

remote_write is frequently misunderstood. If the standby loses power, unflushed page-cache data is lost. Use on or remote_apply if you need durability against standby OS crashes.

remote_apply adds replay latency on top of on. If replay is slow, commit latency on the primary spikes proportionally. Only use remote_apply when you need queries on the standby to see the data immediately after the primary commits.

Signals to watch in production

Signal	Why it matters	Warning sign
`pg_stat_replication.write_lag` / `flush_lag` / `replay_lag`	Decomposes network vs. disk vs. apply bottlenecks	`replay_lag` growing under `remote_apply` indicates standby replay cannot keep pace
`pg_stat_replication.sync_state`	Confirms whether the expected standby is synchronous	Expected `sync` standby showing `async` after failover or restart
Transaction rate from `pg_stat_database.xact_commit`	Measures throughput impact of stricter modes	Sudden TPS drop after enabling synchronous replication
Mean execution time for write queries in `pg_stat_statements`	Reveals commit latency inflation	Latency spikes matching network RTT to the standby
`pg_stat_activity.wait_event` (`WALSync`, `SyncRep`)	Identifies backends blocked on commit durability	`WALSync` dominating indicates local fsync pressure; `SyncRep` indicates synchronous standby wait

Query these signals on the primary. pg_stat_replication shows one row per standby. pg_stat_activity filtered to wait_event IN ('WALSync', 'SyncRep') shows backends currently blocked by commit durability.
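A minimal sketch of those two checks follows, with the queries held as Python string constants and a helper that splits the three lag columns into components. The constant and function names are hypothetical. The helper assumes the interval columns have already been converted to milliseconds, for example with EXTRACT(EPOCH FROM write_lag) * 1000, and it treats write_lag as a rough proxy for network time.

```python
# Sketch: queries to run on the primary, plus a lag decomposition helper.
# Constant and function names are hypothetical.

REPLICATION_LAG_SQL = """
SELECT application_name, sync_state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication
"""

BLOCKED_ON_COMMIT_SQL = """
SELECT pid, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE wait_event IN ('WALSync', 'SyncRep')
"""


def lag_breakdown_ms(write_lag_ms: float, flush_lag_ms: float,
                     replay_lag_ms: float) -> dict:
    """Split cumulative lag values into per-stage contributions."""
    return {
        # write_lag covers shipping WAL and the standby's OS-level write.
        "network_and_write": write_lag_ms,
        "standby_flush": max(flush_lag_ms - write_lag_ms, 0.0),
        "standby_replay": max(replay_lag_ms - flush_lag_ms, 0.0),
    }


def dominant_component(write_lag_ms: float, flush_lag_ms: float,
                       replay_lag_ms: float) -> str:
    """Name the stage contributing the most lag."""
    breakdown = lag_breakdown_ms(write_lag_ms, flush_lag_ms, replay_lag_ms)
    return max(breakdown, key=breakdown.get)


if __name__ == "__main__":
    # Assumed sample values in milliseconds.
    print(lag_breakdown_ms(2.0, 3.0, 50.0))
    print(dominant_component(2.0, 3.0, 50.0))
```

A result of standby_replay under remote_apply points at the standby's replay rate rather than the network, which matches the warning sign listed for replay_lag in the table above.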

How Netdata helps

Correlate application commit latency spikes with pg_stat_replication.flush_lag and replay_lag to determine whether the standby or the network is the bottleneck.
Alert on replication lag crossing your RPO threshold to detect standby replay that is too slow for remote_apply, or to catch WAL retention risk.
Track pg_stat_database.xact_commit rate alongside query latency to quantify the throughput cost of switching from off to on or enabling remote replication.
Monitor WAL generation rate to detect when standby lag risks disk exhaustion from unreplicated WAL.
Surface pg_stat_replication.sync_state changes to catch standby demotions that silently reduce durability guarantees.

