PostgreSQL checkpoint storms: detection, causes, and tuning
If query latency spikes and TPS drops coincide with your checkpoint_timeout schedule or follow a large bulk load, you are likely hitting a checkpoint storm. PostgreSQL checkpoints guarantee that dirty buffers are on disk. In a healthy system, timed checkpoints occur at checkpoint_timeout intervals and the background writer spreads that I/O. A storm happens when WAL generation hits max_wal_size before the timeout, forcing a requested checkpoint that flushes most dirty buffers at once, saturating disk bandwidth and stalling queries.
flowchart TD
A[Heavy write load or small max_wal_size] --> B[WAL fills before timeout]
B --> C[Forced checkpoint requested]
C --> D[Dirty buffer flush at full speed]
D --> E[Disk I/O saturation]
E --> F[Query latency spikes and TPS drops]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
Undersized max_wal_size | checkpoints_req rising steadily alongside checkpoints_timed; logs note WAL-driven checkpoints | Ratio of checkpoints_req to checkpoints_timed in pg_stat_bgwriter |
Short checkpoint_timeout with heavy writes | Predictable spikes on a fixed schedule; buffers_checkpoint dominates buffers_clean | pg_stat_bgwriter checkpoint timing and count |
| Bulk load exceeding WAL limits | Sudden spikes during COPY, CREATE INDEX, or REINDEX; cascading forced checkpoints | pg_stat_activity for large write operations and current WAL LSN growth |
| Slow storage or excessive dirty pages | High disk queue depth during flushes; backends doing their own fsync | OS I/O metrics and buffers_backend_fsync in pg_stat_bgwriter |
Quick checks
Run these read-only queries on a live primary. Compare counters against your last statistics reset to judge trends.
# 1. Checkpoint counts. Rising checkpoints_req means WAL is driving checkpoints.
psql -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"
# 2. I/O volume and timing. Large buffers_checkpoint with high write/sync time signals a heavy flush.
psql -c "SELECT buffers_checkpoint, buffers_clean, buffers_backend, checkpoint_write_time, checkpoint_sync_time FROM pg_stat_bgwriter;"
# 3. Current settings.
psql -c "SELECT current_setting('checkpoint_timeout') AS timeout, current_setting('max_wal_size') AS max_wal, current_setting('checkpoint_completion_target') AS target;"
# 4. Estimate WAL generation rate over 60 seconds.
START=$(psql -Atc "SELECT pg_current_wal_lsn()") && sleep 60 && psql -c "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '$START') AS wal_bytes"
# 5. OS disk pressure during spikes.
iostat -xz 1 5
# 6. Backends forced to write their own dirty pages.
psql -c "SELECT buffers_backend_fsync FROM pg_stat_bgwriter;"
How to diagnose it
- Correlate latency spikes with checkpoint events. Compare application P99 latency with checkpoint counts from
pg_stat_bgwriteror PostgreSQL logs. If the spikes align with checkpoint intervals, the storm is confirmed. - Determine if checkpoints are forced. Query
pg_stat_bgwriter. Ifcheckpoints_reqis more than 10% ofcheckpoints_timed,max_wal_sizeis likely too small for your WAL generation rate. - Measure checkpoint duration. Sample
checkpoint_write_timeandcheckpoint_sync_timeover an interval, then divide by the number of new checkpoints in that period. Averages over 30 seconds indicate the I/O system cannot absorb the flush. - Check dirty page distribution. If
buffers_checkpointis consistently greater than 70% of total buffers written (buffers_checkpoint + buffers_clean + buffers_backend), the background writer is not keeping up. - Validate storage throughput. Run
iostatduring a spike. If disk latency jumps above 20 ms and queue depth rises, storage is the bottleneck. - Identify the write source. Query
pg_stat_activityfor activeCOPY,CREATE INDEX,REINDEX, or largeINSERT ... SELECTstatements. If the spike has passed, check PostgreSQL logs for recent completion of large write operations.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
checkpoints_req / checkpoints_timed | Forced checkpoints mean WAL is filling faster than the timeout interval | Requested checkpoints exceed 10% of timed |
checkpoint_write_time + checkpoint_sync_time | Total duration of disk flush and fsync | Per-checkpoint average over 30 seconds |
buffers_checkpoint / total_writes | Fraction of dirty pages written by checkpoints rather than bgwriter or backends | Consistently above 70% |
buffers_backend_fsync | Backends forced to fsync because bgwriter or checkpoint could not keep up | Non-zero or rising count |
WAL generation rate vs max_wal_size | Whether WAL fills before the timeout | LSN advances faster than max_wal_size can absorb within checkpoint_timeout |
Fixes
Increase max_wal_size
If checkpoints_req is high, raise max_wal_size to absorb write bursts. Start with 4 GB for many production workloads and scale upward for heavy write loads. Monitor disk usage; WAL segments are recycled, but the ceiling must fit your peak burst.
Tradeoff: more disk space consumed by WAL segments, but far fewer forced checkpoints.
Requires a configuration reload.
Lengthen checkpoint_timeout
If spikes are frequent because the interval is too short, increase checkpoint_timeout from the default 5 minutes to 15-30 minutes. This spreads the same dirty-page workload over fewer events.
Tradeoff: more dirty pages accumulate between checkpoints, but each checkpoint is less frequent and easier to smooth.
Requires a configuration reload.
Raise checkpoint_completion_target
Set checkpoint_completion_target to 0.9 so the background writer spreads checkpoint I/O over 90% of the interval. The default on many versions is 0.5. With 0.9, PostgreSQL starts writing dirty pages early and aims to finish 90% of the way through the interval, turning a spike into a sustained, low-rate background task.
Tradeoff: checkpoint writes start earlier and run continuously, slightly increasing baseline I/O but eliminating spikes.
Requires a configuration reload.
Break bulk operations into batches
A single COPY or CREATE INDEX can generate enough WAL to force an immediate checkpoint. Commit between batches so WAL is flushed incrementally rather than forcing a single massive checkpoint.
Tradeoff: slightly more complex application logic, but it avoids cascading forced checkpoints.
Improve storage throughput
If checkpoint duration remains high after tuning parameters, the storage layer is undersized. For new clusters, place WAL on a dedicated volume with low-latency disks to reduce contention between sequential WAL writes and random data page access. For existing clusters, migrating pg_wal requires a maintenance window.
Tradeoff: infrastructure cost.
Prevention
- Monitor the checkpoint ratio. Alert when
checkpoints_reqexceeds 10% ofcheckpoints_timed. - Size
max_wal_sizefor peaks, not averages. A bulk load or ETL job should not force a checkpoint. - Keep
checkpoint_completion_targetat 0.9 unless you have verified that your storage handles aggressive writeback. - Schedule bulk loads during low-traffic windows.
- Separate WAL and data volumes at initialization. On existing clusters, migration requires downtime.
How Netdata helps
- Correlate PostgreSQL checkpoint metrics from
pg_stat_bgwriterwith per-second disk I/O latency and utilization. If checkpoint counts rise at the same time as disk await spikes, storage saturation is confirmed. - Alert when the ratio of requested to timed checkpoints exceeds your threshold, catching undersized
max_wal_sizebefore it causes outages. - Visualize WAL generation rate alongside replication lag to distinguish primary write pressure from replica apply delays.
- Track buffer cache efficiency and backend fsync counts to spot background writer stress before it becomes a storm.
Related guides
- How PostgreSQL actually works in production: a mental model for operators: /guides/postgres/how-postgres-works-in-production/
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns: /guides/postgres/postgres-alter-table-blocked/
- PostgreSQL autovacuum blocked by long-running transaction: detection and fix: /guides/postgres/postgres-autovacuum-blocked-by-long-transaction/
- PostgreSQL autovacuum not running: detection, causes, and fixes: /guides/postgres/postgres-autovacuum-not-running/
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads: /guides/postgres/postgres-autovacuum-tuning/
- PostgreSQL blocking queries: finding the root blocker in a lock cascade: /guides/postgres/postgres-blocking-queries/
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention: /guides/postgres/postgres-connection-exhaustion/
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis: /guides/postgres/postgres-connection-refused/
- PostgreSQL dead tuples piling up: why autovacuum can’t keep up: /guides/postgres/postgres-dead-tuples-piling-up/
- PostgreSQL deadlock detected: how to diagnose and prevent deadlocks: /guides/postgres/postgres-deadlock-detected/
- PostgreSQL frozen XID monitoring: catching wraparound 6 months early: /guides/postgres/postgres-frozen-xid-monitoring/
- PostgreSQL idle in transaction: detecting and killing zombie sessions: /guides/postgres/postgres-idle-in-transaction/






