PostgreSQL shared_buffers tuning: 25% of RAM and why that rule of thumb breaks

The PostgreSQL documentation suggests 25% of RAM as a starting value for shared_buffers on dedicated servers and warns that values above 40% rarely improve performance. In production, operators routinely use 25% as a default and then encounter checkpoint storms, OOM kills, or degraded analytical query performance. The rule breaks because PostgreSQL does not coordinate with the Linux kernel page cache. The same page can exist in both shared_buffers and the OS page cache, so an oversized PostgreSQL cache starves the kernel of memory needed for sequential scans, WAL, and temporary files.

This guidance covers self-managed PostgreSQL on Linux. Managed services such as AWS Aurora use different defaults and failure modes, noted where relevant.

What shared_buffers is and why it matters

shared_buffers is PostgreSQL’s primary page cache. The default of 128 MB is inadequate for production. Pages read from disk load into this cache, and backends read from it directly. When a backend modifies a page, the change is written to shared_buffers and the page is marked dirty. Dirty pages flush to disk via the checkpointer or background writer. Changing shared_buffers requires a server restart; a reload is not sufficient.

Because shared_buffers caches table and index pages, its hit ratio indicates memory health. In OLTP workloads, aim for a cache hit ratio above 99%. If the working set exceeds shared_buffers, backends read from disk, adding milliseconds to queries that should complete in microseconds. However, increasing shared_buffers is not free. Each page requires management overhead, and the checkpointer must eventually flush every dirty page.

Check the current value and hit ratio:

SELECT name, setting, unit FROM pg_settings WHERE name = 'shared_buffers';
SELECT blks_hit, blks_read,
       blks_hit::float/(blks_hit+blks_read) AS ratio
FROM pg_stat_database
WHERE datname = current_database();

A low ratio alone does not mandate a larger shared_buffers. It may indicate the working set exceeds total RAM.

How the dual-cache architecture works

PostgreSQL does not inspect the OS page cache before storing a block in shared_buffers, and the kernel does not inspect shared_buffers before caching a file page. The same 8 KB page routinely exists in both caches simultaneously. This double buffering means total memory consumption for a single page can reach 16 KB or more.

flowchart TD
    query["Backend query"]
    disk["Disk storage"]
    sb["shared_buffers"]
    pc["OS page cache"]
    dirty["Dirty pages"]
    chk["Checkpointer"]

    disk -->|read on miss| sb
    disk -->|also cached by kernel| pc
    sb -->|query hit| query
    sb -.->|same page duplicated| pc
    query -->|writes| dirty
    dirty -->|flush| chk
    chk -->|checkpoint I/O spike| disk

When PostgreSQL evicts a clean page from shared_buffers, the kernel may still hold it in the page cache. A shared_buffers miss does not always mean a physical disk read. The OS page cache acts as a second line of defense, particularly for sequential scans that PostgreSQL may not keep in shared_buffers.

The downside emerges when shared_buffers is set too high. On a dedicated database server, RAM is finite. An oversized shared_buffers leaves less memory for the kernel page cache, slowing large sequential scans and increasing disk I/O. It also leaves less headroom for backend memory, connection overhead, and temporary files.

Checkpoint behavior compounds the problem. The checkpointer flushes dirty shared_buffers pages to disk at regular intervals. A larger cache allows more dirty pages to accumulate, so each checkpoint writes a larger volume. If storage cannot absorb the burst, query latency spikes until the flush completes. On Linux, PostgreSQL’s shared memory is allocated via mmap and is not swappable. If the sum of shared_buffers, connection overhead, and work_mem approaches available RAM, the kernel may OOM-kill backends under pressure.

Where the 25% rule breaks in production

The 25% guideline assumes a moderate-size OLTP server where the working set fits comfortably in RAM and queries access pages randomly. This assumption fails in several common scenarios.

OLAP and sequential scan workloads. Analytical queries scan large data ranges. The OS page cache is optimized for this access pattern because it caches underlying file blocks efficiently and evicts them under pressure. PostgreSQL’s shared_buffers uses a clock sweep algorithm that is less effective for massive sequential scans. For OLAP, values at or below 25% often perform better because the OS cache handles the scan load.

Large-memory systems. On servers with 512 GB or more of RAM, 25% yields 128 GB of shared_buffers or more. Beyond roughly 128 GB, the overhead of managing PostgreSQL’s internal buffer mapping table can decrease performance. Operators on very large systems often cap shared_buffers at 128-256 GB and let the OS page cache absorb the remainder.

Managed services with aggressive defaults. AWS Aurora PostgreSQL is reported to default shared_buffers to a large percentage of RAM. Combined with high connection counts and per-operation work_mem allocations, this can trigger OOM kills. Reducing shared_buffers to 25-40% may resolve these incidents. Always verify current defaults in your specific Aurora version and instance class.

Checkpoint amplification. Larger shared_buffers means more dirty pages can accumulate between checkpoints. When a checkpoint triggers, the checkpointer must flush all dirty buffers. Without adequate I/O bandwidth or tuning, this creates a latency spike that stalls queries.

Memory-constrained deployments. On instances with 16 GB or less of RAM, setting shared_buffers to 25% leaves only 12 GB for the OS, connection overhead, and work_mem. If max_connections is raised without pooling, or if work_mem is tuned aggressively, the remaining memory exhausts quickly. A smaller shared_buffers combined with greater reliance on the OS page cache often performs better and is less likely to trigger OOM kills.

Tradeoffs and when to deviate

Sizing shared_buffers is a tradeoff between cache locality and total system memory pressure. There is no single correct percentage.

OLTP vs. OLAP. For OLTP with random page access, shared_buffers in the 25-40% range is often appropriate. For OLAP, values at or below 25% are usually better because the OS page cache handles sequential scans more efficiently. If your workload mixes both, start at 25% and measure checkpoint duration and cache hit ratio before raising it.

Huge pages. On Linux, PostgreSQL can use explicit huge pages to reduce TLB misses for large shared_buffers allocations. The huge_pages parameter accepts off, try, or on. Setting it to try allows PostgreSQL to use them if available and fall back gracefully. Transparent Huge Pages (THP), however, are discouraged by the official PostgreSQL documentation because the kernel’s compaction process can cause latency spikes. Disable THP on database servers.

Calculate required huge pages before enabling:

# Check current huge page size and availability
grep Huge /proc/meminfo

# Estimate pages needed (shared_buffers in bytes / huge page size)
# For example, with shared_buffers = 64GB and Hugepagesize = 2048 kB:
# 64 * 1024*1024*1024 / (2048 * 1024) = 32768

If huge_pages = on, PostgreSQL will fail to start if the exact number of huge pages is not reserved in /proc/sys/vm/nr_hugepages.

Memory accounting. shared_buffers is not the only RAM consumer. The total footprint includes backend processes, autovacuum workers, maintenance_work_mem, wal_buffers, and per-query work_mem allocations. A safe heuristic is to ensure that shared_buffers plus expected concurrent memory usage stays below 80% of available RAM.

Setting vm.overcommit_memory = 2 on Linux makes the kernel strictly enforce overcommit limits and can prevent the OOM killer from targeting PostgreSQL backends. However, this is a system-wide, disruptive change that requires careful tuning of vm.overcommit_ratio or vm.overcommit_kbytes. If misconfigured, it can cause allocation failures for unrelated processes. Test in staging before applying in production.

Checkpoint tuning. If you increase shared_buffers, tune checkpoint behavior to avoid I/O storms. Set checkpoint_completion_target = 0.9 so the checkpointer spreads writes over most of the checkpoint interval. Increase max_wal_size so checkpoints are driven by timeout rather than WAL volume. Forced checkpoints produce I/O spikes.

Monitor checkpoint behavior:

SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;

Rising checkpoints_req relative to checkpoints_timed indicates WAL-driven checkpoints, which increases burst I/O.

Container and cgroup limits. In containers, shared_buffers must fit within the cgroup memory limit alongside the kernel page cache, connection overhead, and tmpfs usage. If the container lacks an adequate memory limit or swap configuration, the host OOM killer terminates the container during checkpoint spikes. Size shared_buffers to leave headroom for at least one checkpoint’s worth of dirty pages plus concurrent backend allocations.

Monitoring signals that indicate misconfiguration

When debugging production issues, correlate shared_buffers sizing with these signals:

Cache hit ratio sustained below 95% in OLTP. Check pg_stat_database. If disk latency is low and throughput is acceptable, the miss rate may be tolerable. If disk latency spikes, shared_buffers may be too small or the working set exceeds total RAM.
Checkpoint write time spikes. In Netdata, look for postgresql.checkpoint_write_time rising sharply at regular intervals. This indicates the checkpointer is flushing too many dirty pages at once. Either reduce shared_buffers or increase max_wal_size and checkpoint_completion_target.
System RAM exhaustion without obvious backend growth. If pg_stat_activity shows moderate connections but the OS reports high memory usage, check for double buffering. The kernel page cache plus shared_buffers may exceed physical RAM. Use free -m or /proc/meminfo to verify available memory and page cache size.
OOM kills targeting postgres backends. Check dmesg or the kernel log for OOM killer activity. If backends die while shared_buffers is above 40%, reduce it and verify work_mem and max_connections are not compounding the pressure.

Sizing workflow

Establish baseline. Record current shared_buffers, cache hit ratio, and checkpoint frequency.
Classify workload. OLTP favors 25-40%. OLAP favors 10-25%. Mixed workloads start at 25%.
Check total memory pressure. Ensure shared_buffers + (max_connections * work_mem * average concurrent sorts) + maintenance_work_mem + OS overhead stays below 80% of RAM.
Tune checkpoints. Set max_wal_size high enough to time out, and checkpoint_completion_target = 0.9.
Restart and measure. Because shared_buffers requires a restart, schedule the change during a maintenance window. Compare checkpoint write times and query latency for at least 24 hours before further adjustments.

The Netdata solution

PostgreSQL monitoring with Netdata

Netdata monitors PostgreSQL with per-second metrics, pre-built dashboards, and ML-powered anomaly detection. Correlate connection saturation, lock waits, autovacuum progress, replication lag, and checkpoint I/O against the rest of your stack so you catch the incidents in these runbooks before they page anyone.

See PostgreSQL monitoring → Start monitoring free