PostgreSQL out of memory: OOM killer, shared_buffers, and work_mem

Your PostgreSQL primary restarts without warning, or individual backends vanish from the process list. The kernel log shows Out of Memory: Killed process 12345 (postgres). Existing connections may survive, but new connections fail until the postmaster recovers. This is a memory accounting mismatch between Linux overcommit, PostgreSQL shared and private memory allocation, and how you size shared_buffers and work_mem.

Inside Kubernetes or containers, the symptom is identical but the mechanism differs: cgroup v2 memory.max triggers an immediate SIGKILL with no ENOMEM grace period. In both cases, stop guessing at memory limits and start budgeting.

What this means

PostgreSQL splits memory into shared and private. Shared memory, dominated by shared_buffers, is allocated once at startup and mapped into every backend. Private memory is per-process: each backend consumes work_mem for every sort, hash join, and hash aggregate it runs concurrently. Autovacuum workers use maintenance_work_mem each, unless autovacuum_work_mem is set. Temp buffers and prepared-transaction overhead add per-connection weight.

When shared memory plus peak private memory exceeds what the kernel grants, the OOM killer ranks processes by oom_score and sends SIGKILL. If the postmaster dies, the cluster restarts. If a backend dies, the application sees a dropped connection.

flowchart TD
    A[Total RAM + Swap] --> B[shared_buffers
allocated at startup] A --> C[Per-backend private memory] C --> D[work_mem x concurrent operators] C --> E[temp_buffers per connection] A --> F[Autovacuum workers] F --> G[maintenance_work_mem
or autovacuum_work_mem] B --> H{Sum exceeds
kernel limit?} D --> H E --> H G --> H H -->|Yes| I[OOM killer targets
highest oom_score] H -->|No| J[System stable]

Common causes

CauseWhat it looks likeFirst thing to check
shared_buffers oversubscribedOOM soon after startup or during daily peaks; postmaster killedshared_buffers vs total RAM minus headroom for backends
work_mem multiplied by concurrent operatorsSudden backend death during large sorts, hash joins, or parallel queries; “Cannot allocate memory” in logsEXPLAIN output for Memory nodes; pg_stat_statements around incident time
Connection storm without poolingGradual RSS growth across many backends; system-wide OOMpg_stat_activity count vs max_connections; temp_buffers usage
Autovacuum worker pile-upMemory pressure during heavy churn table maintenanceautovacuum_max_workers x maintenance_work_mem or autovacuum_work_mem
Container cgroup hard limitImmediate SIGKILL at exact memory limit; host dmesg silentContainer memory limit and cgroup memory.max vs PostgreSQL budget

Quick checks

# Confirm the Linux OOM killer fired
dmesg -T | grep -i "killed process.*postgres"

# Inspect PostgreSQL memory parameters
psql -c "SHOW shared_buffers; SHOW work_mem; SHOW maintenance_work_mem;"
psql -c "SHOW max_connections; SHOW autovacuum_max_workers; SHOW temp_buffers;"

# Count current backends and sum RSS
psql -c "SELECT state, COUNT(*) FROM pg_stat_activity GROUP BY state;"
ps -o pid,rss,comm -p $(pgrep postgres) | awk 'NR>1 {sum+=$2} END {print "Total RSS KB:", sum}'

# Check kernel overcommit behavior
sysctl vm.overcommit_memory

# Check postmaster OOM score adjustment
cat /proc/$(pgrep -o postgres)/oom_score_adj

How to diagnose it

  1. Confirm OOM vs ENOMEM. Check dmesg, journalctl -k, or container status for OOMKilled. PostgreSQL logs may show FATAL: out of memory or the connection may simply drop if SIGKILL arrived first.
  2. Build a memory ceiling. Start with shared_buffers. Add autovacuum_max_workers x COALESCE(autovacuum_work_mem, maintenance_work_mem) for worst-case autovacuum. Add a realistic allowance for concurrent query operators (work_mem each) and per-connection overhead. The total must leave headroom for the OS page cache and other processes; if your PostgreSQL budget exceeds ~80% of RAM, you are overcommitted.
  3. Find the query. Query pg_stat_statements for entries with high mean_exec_time or stddev_exec_time around the incident window. Run EXPLAIN (ANALYZE, BUFFERS) on suspects and inspect Memory: lines for hash or sort nodes. Warning: EXPLAIN ANALYZE executes the query. Run it during low traffic or against a restored backup.
  4. Count the multiplier. work_mem is per operator, not per query. A complex analytical query with four hash joins and two sorts uses six times work_mem (or work_mem x hash_mem_multiplier for hashes on PostgreSQL 13+), multiplied again by any parallel_workers. This can exceed a naive connections x work_mem estimate by an order of magnitude.
  5. Inspect the container boundary. If the workload runs under cgroup v2, memory.max is a hard ceiling that kills the highest-scoring process immediately when breached. There is no ENOMEM equivalent inside the cgroup. Check cat /sys/fs/cgroup/memory.max or equivalent.
  6. Check autovacuum pressure. During high churn, multiple autovacuum workers can spawn simultaneously. If autovacuum_work_mem is unset, each worker inherits maintenance_work_mem, which defaults to 64 MB. Four workers equal 256 MB of concurrent allocation.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
System available memoryOOM fires when RAM + swap are exhaustedAvailable < 10% of total RAM
PostgreSQL backend RSSPrivate memory growth per processBackends growing well beyond idle baselines
Active connectionsMultiplier for private memory and temp spaceActive count > 70% of max_connections without a pooler
pg_stat_database.temp_bytesSpills indicate heavy sort/hash demand or undersized work_memSustained rapid growth
pg_stat_statements stddev timeReveals queries with unstable memory consumptionstddev_exec_time spikes on queries with hash or sort nodes
Autovacuum active workersEach worker holds maintenance memoryMultiple workers active during memory pressure windows

Fixes

Bare metal and VMs: use strict overcommit and protect the postmaster

Set vm.overcommit_memory = 2 and size vm.overcommit_ratio (or vm.overcommit_kbytes) so the commit limit covers all system allocations. This forces the kernel to return ENOMEM instead of sending SIGKILL. The tradeoff is that you must account for every committed allocation; if the limit is too low, legitimate requests fail system-wide. PostgreSQL backends can then fail the specific query rather than taking down the postmaster.

Protect the postmaster by setting OOMScoreAdjust=-1000 in the systemd unit. Alternatively, export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj and PG_OOM_ADJUST_VALUE=0 before starting postgres so child backends revert to a neutral score while the postmaster stays immune. The tradeoff is that the OOM killer selects another process, which may still disrupt a running query.

Right-size shared_buffers

If shared_buffers exceeds ~40% of RAM, reduce it. For OLTP, start at 25% of system memory. For OLAP workloads, keep it at or below 25% because the OS page cache handles sequential scans more efficiently. The tradeoff is slightly higher latency for hot pages that no longer fit in PostgreSQL’s dedicated cache, but this is preferable to an OOM-induced restart.

Cap per-backend memory without crippling queries

Lower work_mem globally to a conservative 4-16 MB. For specific heavy queries, raise it with SET LOCAL work_mem = '64MB' inside the transaction instead of inflating the global default. On PostgreSQL 13 and newer, prefer raising hash_mem_multiplier over raising work_mem if hash operations are spilling. This isolates extra allocation to hash workloads without inflating sort budgets.

Set autovacuum_work_mem independently from maintenance_work_mem so vacuum workers do not each inherit a large global value. The tradeoff is that aggressive vacuum on very large tables may take longer with a lower memory cap, but this prevents background maintenance from triggering system-wide OOM.

Stop connection bloat

Deploy PgBouncer in transaction mode and lower max_connections to roughly CPU count x 2-3, plus replication and superuser reservations. Do not raise max_connections to absorb connection storms; each backend increases private memory pressure. The tradeoff is that session-scoped features such as advisory locks, temporary tables, and prepared statements across transactions require session-mode pooling or direct connections.

Container and Kubernetes specifics

Accept that vm.overcommit_memory = 2 is evaluated at the host level and does not prevent cgroup-level OOM kills. If your container orchestrator supports node-level swap, enable it. Swap extends the memory envelope and defers OOM kills until both RAM and swap are exhausted.

Increase the container memory limit to at least your calculated PostgreSQL budget plus 15% headroom for the operating system and monitoring agents. The tradeoff is higher node utilization, but this is cheaper than an unplanned restart.

Prevention

  • Calculate the memory budget before deploying configuration changes. Keep the sum of shared_buffers, concurrent operator memory, and autovacuum worker memory below ~80% of available RAM.
  • Monitor pg_stat_database.temp_bytes as an early proxy for queries approaching memory limits.
  • Test new queries with EXPLAIN (ANALYZE, BUFFERS) in staging. Inspect Memory: usage in hash and sort nodes.
  • Review per-table autovacuum settings so high-churn tables do not spawn excessive concurrent workers.

How Netdata helps

  • Correlates system memory pressure (mem.available, Used RAM) with PostgreSQL connection counts and active backends to highlight when private memory multipliers are the culprit.
  • Surfaces kernel OOM kill events alongside PostgreSQL process metrics so you can distinguish host-level OOM from cgroup memory.max kills in containers.
  • Tracks pg_stat_database.temp_bytes to flag queries that are spilling to disk before they trigger an OOM.
  • Monitors autovacuum worker count and connection utilization to catch memory multiplier spikes early.