PostgreSQL temp file explosion: work_mem tuning and disk pressure
Disk usage on your PostgreSQL primary is climbing by gigabytes per hour. Query latency has doubled, iowait is spiking, and you have not deployed new code. The database is writing large temporary files to disk. In-memory operations such as sorts, hash joins, and materialization are spilling because they exceed the per-operation memory budget. The root cause is almost always a mismatch between work_mem and actual query memory demand, compounded by the fact that PostgreSQL applies this limit per plan node, not per query or session.
What this means
PostgreSQL executes queries using a tree of plan nodes. When a node needs to sort, hash, or materialize rows, it allocates memory up to work_mem. If the input exceeds that limit, the executor spills overflow to temporary files on disk. Because work_mem is per operation, a single query with four hash joins and two sorts can consume six times work_mem in that backend alone. With parallel query active, each worker gets its own allocation, multiplying usage further. Many backends spilling simultaneously can saturate disk I/O, trigger checkpoint pressure, and even cause the Linux OOM killer to terminate the postmaster if work_mem is set too aggressively.
flowchart TD
A[Sort or hash operation] --> B{Memory exceeds work_mem?}
B -->|Yes| C[Spill to temp file]
B -->|No| D[In-memory execution]
C --> E[Disk I/O increases]
E --> F[Query latency spikes]
C --> G[Multiple ops per query]
G --> H[Memory usage multiplies]
H --> I[OOM killer risk]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| work_mem too low for query complexity | pg_stat_database.temp_bytes growing rapidly; long-running queries with sorts or hashes | temp_bytes and temp_files deltas in pg_stat_database |
| work_mem set too high causing OOM | Postmaster killed by OOM killer in dmesg; cluster restarts; temp files appear after recovery | System OOM logs and pg_stat_activity backend count |
| Parallel query multiplying memory usage | Spikes correlate with high CPU; plans show Gather nodes | max_parallel_workers_per_gather and active worker count |
| Missing indexes forcing large sorts | Sequential scans on large tables paired with Sort nodes | pg_stat_user_tables.seq_scan vs. idx_scan |
| Long-running analytical queries on the primary | Sudden temp growth during batch jobs or reports | pg_stat_activity for active queries with long query_start |
Quick checks
# Check cumulative temp file volume per database
psql -c "SELECT datname, temp_files, pg_size_pretty(temp_bytes) FROM pg_stat_database WHERE temp_bytes > 0 ORDER BY temp_bytes DESC;"
# Find long-running active queries that may be spilling
psql -c "SELECT pid, usename, state, query_start, LEFT(query, 100) AS query FROM pg_stat_activity WHERE state = 'active' AND query_start < NOW() - INTERVAL '2 minutes';"
# Inspect memory and parallelism settings
psql -c "SELECT name, setting, unit FROM pg_settings WHERE name IN ('work_mem', 'max_connections', 'max_parallel_workers_per_gather');"
# Identify large tables using sequential scans
psql -c "SELECT schemaname, relname, seq_scan, idx_scan, n_live_tup FROM pg_stat_user_tables WHERE n_live_tup > 100000 ORDER BY seq_scan DESC LIMIT 10;"
# Check data volume disk usage
df -h "$(psql -Atc 'SHOW data_directory;')"
# Check for OOM kills (may require root)
dmesg | grep -i "out of memory"
How to diagnose it
- Confirm temp file volume is abnormal. Take two snapshots of
pg_stat_databasefive minutes apart and comparetemp_bytesandtemp_files. A sustained increase confirms active spilling. These counters reset on PostgreSQL crash or if you runpg_stat_reset(). - Identify contributing queries. If
pg_stat_statementsis enabled, sort bytotal_exec_timeand look for queries containing sorts, hashes, or window functions. Correlate execution times with the temp growth window. - Inspect query plans. Capture an
auto_explainlog for a representative query, or runEXPLAIN (ANALYZE, BUFFERS)inside a transaction that you roll back. Warning:EXPLAIN ANALYZEexecutes the query. Do not use it on DML (INSERT, UPDATE, DELETE) unless you understand the side effects. Even inside a rolled-back transaction, DML takes locks, fires triggers, and modifies rows. Look forSort Method: external merge Disk: ...or Hash nodeBatches: ...greater than one. Hightemp readandtemp writtenvalues in theBUFFERSoutput also confirm spill activity. - Check
work_memand parallelism. Verify the globalwork_memvalue and whether the query uses parallel workers. Each worker allocates the same per-operation budget independently. - Look for missing indexes. If
pg_stat_user_tablesshows highseq_scancounts on large tables involved in the slow queries, the planner may be sorting rows that an index could return pre-ordered. - Review OS memory pressure. Calculate approximate worst-case memory usage as
work_memmultiplied by typical plan nodes per query, active backends, and parallel workers (max_parallel_workers_per_gather + 1including the leader). If this approaches available RAM, OOM is the likely next failure.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| pg_stat_database.temp_bytes | Cumulative temp file write volume | Growing faster than baseline over 15 minutes |
| pg_stat_database.temp_files | Count of temp files created | Sustained increase indicating repeated spills |
| Disk utilization (data volume) | Temp files share disk with data and WAL | > 80% and climbing during peak load |
| pg_stat_activity long active queries | Spills occur during query execution | Active queries older than 60 seconds with sort or hash patterns |
| seq_scan / idx_scan ratio | Missing indexes force materialization and sort | High seq_scan on tables with > 100k rows |
| OOM kills (dmesg) | Excessive work_mem causes memory overcommit | Postmaster or backends killed by OOM killer |
Fixes
Right-size work_mem
Do not set work_mem based on RAM / max_connections. That calculation ignores the per-operation multiplier. A conservative global default is 4-16 MB. For known analytical roles, raise work_mem per role with ALTER ROLE ... SET work_mem = '64MB' rather than globally. Tradeoff: lower values increase disk spill and query latency; higher values risk OOM.
Add indexes to eliminate sorts
If plans show sequential scans followed by sorts on large tables, a covering index or partial index may let the planner return rows in the desired order without a separate sort node. Tradeoff: indexes slow writes and consume storage.
Reduce parallel worker multiplication
If the query plan shows a Gather node and temp spikes correlate with CPU saturation, lower max_parallel_workers_per_gather globally or per-query with SET max_parallel_workers_per_gather = 0. Tradeoff: queries run longer but use less aggregate memory per backend.
Isolate temp I/O
If temp file volume is large and sustained, route temp files to a dedicated tablespace using the temp_tablespaces setting. Place the tablespace on fast ephemeral storage to prevent temp I/O from competing with WAL and data file I/O on the same volume. Tradeoff: additional storage to manage; this does not reduce total temp volume.
Recover from OOM
If the Linux OOM killer terminated the postmaster while temp files were active, work_mem is likely too high for the concurrency level. Reduce work_mem in postgresql.conf, restart PostgreSQL, then reconnect applications. Verify with the memory formula in Prevention.
Prevention
- Log temp file creation. Set
log_temp_filesto a threshold matching yourwork_membudget so you catch spills early without flooding logs. - Weekly query review. Review
pg_stat_statementsfor new queries with hightotal_exec_timeand sort or hash patterns. - Memory budget math. Size
work_memso thatwork_memmultiplied by typical plan nodes per query, active backends, and parallel workers stays safely below available RAM. - Connection pooling. Use a pooler to limit active backends, which indirectly caps total memory usage.
- Query timeouts. Set
statement_timeoutandidle_in_transaction_session_timeoutto prevent runaway queries from holding temp files indefinitely.
How Netdata helps
Netdata correlates pg_stat_database.temp_bytes and temp_files with disk I/O wait and utilization on the same node. It alerts on disk utilization above 80% to catch pressure before queries fail. It tracks pg_stat_activity active connection states and query start times to identify long-running operations that coincide with temp spikes. It monitors system memory usage and Linux OOM kill events alongside PostgreSQL backend counts, and surfaces pg_stat_user_tables sequential scan rates to flag missing index candidates that drive sorts.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
- PostgreSQL autovacuum blocked by long-running transaction: detection and fix
- PostgreSQL autovacuum not running: detection, causes, and fixes
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads
- PostgreSQL blocking queries: finding the root blocker in a lock cascade
- PostgreSQL checkpoint storms: detection, causes, and tuning
- PostgreSQL: checkpoints are occurring too frequently – what to tune
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL: database is not accepting commands to avoid wraparound data loss
- PostgreSQL dead tuples piling up: why autovacuum can’t keep up






