$ guides / postgres / postgres-pgbouncer-pool-exhausted ▌

Operations Guides

PgBouncer pool exhausted: how to diagnose and fix client waits

Connection timeouts appear in application logs, but pg_stat_activity shows plenty of idle PostgreSQL backends. The bottleneck is usually the connection pooler. When PgBouncer exhausts server connections, clients queue at the pooler instead of reaching PostgreSQL. Sustained queuing raises latency and can cause timeouts before the database sees the query.

Confirm pool exhaustion, distinguish it from PostgreSQL-side connection limits, and fix the root cause without restarting services.

What this means

PgBouncer maps many lightweight client connections to a smaller pool of PostgreSQL backends. Each distinct (database, user) pair gets its own pool bounded by default_pool_size unless overridden.

In transaction mode, a client borrows a server connection only for the duration of a transaction. In session mode, the server connection is held for the entire client lifetime, even when idle. Pool exhaustion means every server connection in a pool is occupied or unavailable, so additional clients wait. Sustained cl_waiting is the telltale sign.

flowchart TD
    A[Client connects to PgBouncer] --> B{Server connection available?}
    B -->|Yes| C[Linked to server: sv_active]
    B -->|No| D[Client queues: cl_waiting]
    D --> E{Why no server?}
    E -->|Long queries| F[sv_active at limit]
    E -->|Pool too small| G[default_pool_size too low]
    E -->|Session idle| H[sv_active held by idle clients]
    E -->|PG exhausted| I[max_connections reached]

Common causes

Cause	What it looks like	First thing to check
Long-running queries saturating the pool	`sv_active` at the pool limit, `cl_waiting` growing, PostgreSQL `pg_stat_activity` shows active queries	`SHOW POOLS` and `pg_stat_activity.state = 'active'`
Pool size too small for concurrent load	`cl_waiting` spikes under normal load, `sv_active` pegged at `default_pool_size`	`SHOW POOLS` pool size versus peak `sv_active`
Session pooling with idle clients	`sv_active` is high while `cl_waiting` grows; idle clients hold linked server connections	`SHOW CLIENTS` `link` column and `request_time`
PostgreSQL `max_connections` exhausted	PgBouncer cannot create new server connections; low `sv_active` but high `cl_waiting`	PostgreSQL `pg_stat_activity` count versus `max_connections`
Connection churn during deploys	Rolling deploy creates new connections before old ones close; brief `cl_waiting` spikes	PgBouncer `SHOW CLIENTS` state during deploy window

Quick checks

Run these read-only checks from the PgBouncer admin console or PostgreSQL.

# PgBouncer admin console
psql -p 6432 pgbouncer -c "SHOW POOLS;"
psql -p 6432 pgbouncer -c "SHOW CLIENTS;"

# PostgreSQL backend states
psql -d postgres -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"

# PostgreSQL capacity
psql -d postgres -c "SELECT count(*), current_setting('max_connections')::int FROM pg_stat_activity;"

# Long-running active queries
psql -d postgres -c "SELECT pid, usename, now() - query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' AND query_start < now() - interval '30 seconds';"

How to diagnose it

Confirm pool exhaustion. Run SHOW POOLS on the PgBouncer admin console. Sustained cl_waiting > 0 means clients are queued. Transient spikes during deploys are normal; sustained growth is not.
Check server utilization. For the affected pool, if sv_active is at the limit and cl_waiting grows, the pool is saturated by active work.
Identify the workload on PostgreSQL. Check pg_stat_activity. Many idle backends despite high sv_active suggests session mode clients are holding connections. Many active backends points to long-running queries.
Distinguish transaction versus session mode. In transaction mode, idle clients do not hold server connections; in session mode, they do. If you use session pooling, check SHOW CLIENTS for clients with a link and old request_time.
Verify PostgreSQL headroom. Low sv_active + sv_idle with high cl_waiting can mean PgBouncer cannot create new server connections because PostgreSQL max_connections is exhausted. Check pg_stat_activity counts.
Check for deploy or retry storms. Correlate with deploy timestamps. Rolling deploys that open new connections before draining old ones can exhaust pools and trigger retry loops.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`cl_waiting`	Direct indicator of client queuing	Sustained > 0 for more than 30 seconds
`sv_active / default_pool_size`	Pool utilization by active queries	Ratio > 0.8 sustained
`sv_idle`	Unused pooled server connections	High `sv_idle` with no waiting clients means over-allocation; near-zero `sv_idle` with waiting clients means saturation
Client wait time	User-facing latency from queueing	P99 wait time > 1 second
PostgreSQL active connections	Distinguishes pool saturation from backend exhaustion	Count approaching `max_connections`
`query_wait_timeout` disconnects	Clients giving up and reconnecting, worsening storms	Increasing error rate of wait timeouts

Fixes

Resize the pool

If sv_active consistently hits the limit and queries are short, increase default_pool_size or the per-pool override. Calculate the aggregate footprint across all PgBouncer instances and application pools. Tradeoff: more PostgreSQL backends consume more memory and CPU context-switch capacity. Do not raise pool size without confirming PostgreSQL can handle the extra backends. The general guidance is to size the aggregate pool footprint to roughly 2 to 3 times the CPU cores on the PostgreSQL server.

Reduce idle server connection lifetime

Lower server_idle_timeout to close idle server connections faster and return them to the pool. Tradeoff: higher connection setup overhead and potential thundering herd on reconnect. This helps only if sv_idle is high relative to the pool size.

Switch pooling mode

If you currently use session mode and your application does not rely on session-level features (advisory locks, SET commands, temporary tables, LISTEN/NOTIFY), switch to transaction mode. Tradeoff: session features break. This is the single most effective change for connection efficiency.

Terminate long-running queries

If a few queries monopolize the pool, terminate them cautiously:

-- WARNING: Destructive. Terminates the query and rolls back the transaction.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < now() - interval '5 minutes';

Tradeoff: data modifications are rolled back and application logic must handle query failure. Always check what the query is doing before terminating.

Add burst reserve

Configure reserve_pool_size and reserve_pool_timeout to allow brief bursts beyond default_pool_size before refusing work. Tradeoff: higher PostgreSQL load during spikes, but fewer failed client requests.

Scale PostgreSQL backends

If the root cause is that PostgreSQL max_connections is too low for the aggregate pool size across all PgBouncer instances, add read replicas or increase max_connections after verifying memory capacity. Do not raise max_connections to solve pool pressure without a pooler; that causes backend bloat and context-switch overhead.

Prevention

Size default_pool_size based on PostgreSQL CPU cores, not application connection counts. The pooler handles thousands of clients; PostgreSQL handles tens to low hundreds of backends efficiently.
Use transaction pooling mode unless a specific workload requires session features.
Set server_reset_query = DISCARD ALL to clear session state between transactions and prevent leakage.
Monitor cl_waiting as a primary pool-health signal, not just connection counts.
Cap application-side pool sizes so that deploys do not create connection surges that overwhelm PgBouncer.
Test deploy procedures to ensure old instances release connections before new instances acquire them.

How Netdata helps

Correlate cl_waiting with PostgreSQL pg_stat_activity metrics to pinpoint whether the bottleneck is pool saturation or backend slowness.
Alert on PostgreSQL connection utilization before PgBouncer queues form.
Track query latency percentiles to detect when pool queuing becomes user-facing latency.
Monitor system CPU and memory alongside connection counts to validate headroom before increasing default_pool_size.
Track replication lag if reads are offloaded to replicas.

The Netdata solution

PostgreSQL monitoring with Netdata

Netdata monitors PostgreSQL with per-second metrics, pre-built dashboards, and ML-powered anomaly detection. Correlate connection saturation, lock waits, autovacuum progress, replication lag, and checkpoint I/O against the rest of your stack so you catch the incidents in these runbooks before they page anyone.

See PostgreSQL monitoring → Start monitoring free

PgBouncer pool exhausted: how to diagnose and fix client waits

PgBouncer pool exhausted: how to diagnose and fix client waits

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Resize the pool

Reduce idle server connection lifetime

Switch pooling mode

Terminate long-running queries

Add burst reserve

Scale PostgreSQL backends

Prevention

How Netdata helps

Related guides

PostgreSQL monitoring with Netdata