PgBouncer pool exhausted: how to diagnose and fix client waits

Connection timeouts appear in application logs, but pg_stat_activity shows plenty of idle PostgreSQL backends. The bottleneck is usually the connection pooler. When PgBouncer exhausts server connections, clients queue at the pooler instead of reaching PostgreSQL. Sustained queuing raises latency and can cause timeouts before the database sees the query.

Confirm pool exhaustion, distinguish it from PostgreSQL-side connection limits, and fix the root cause without restarting services.

What this means

PgBouncer maps many lightweight client connections to a smaller pool of PostgreSQL backends. Each distinct (database, user) pair gets its own pool bounded by default_pool_size unless overridden.

In transaction mode, a client borrows a server connection only for the duration of a transaction. In session mode, the server connection is held for the entire client lifetime, even when idle. Pool exhaustion means every server connection in a pool is occupied or unavailable, so additional clients wait. Sustained cl_waiting is the telltale sign.

flowchart TD
    A[Client connects to PgBouncer] --> B{Server connection available?}
    B -->|Yes| C[Linked to server: sv_active]
    B -->|No| D[Client queues: cl_waiting]
    D --> E{Why no server?}
    E -->|Long queries| F[sv_active at limit]
    E -->|Pool too small| G[default_pool_size too low]
    E -->|Session idle| H[sv_active held by idle clients]
    E -->|PG exhausted| I[max_connections reached]

Common causes

CauseWhat it looks likeFirst thing to check
Long-running queries saturating the poolsv_active at the pool limit, cl_waiting growing, PostgreSQL pg_stat_activity shows active queriesSHOW POOLS and pg_stat_activity.state = 'active'
Pool size too small for concurrent loadcl_waiting spikes under normal load, sv_active pegged at default_pool_sizeSHOW POOLS pool size versus peak sv_active
Session pooling with idle clientssv_active is high while cl_waiting grows; idle clients hold linked server connectionsSHOW CLIENTS link column and request_time
PostgreSQL max_connections exhaustedPgBouncer cannot create new server connections; low sv_active but high cl_waitingPostgreSQL pg_stat_activity count versus max_connections
Connection churn during deploysRolling deploy creates new connections before old ones close; brief cl_waiting spikesPgBouncer SHOW CLIENTS state during deploy window

Quick checks

Run these read-only checks from the PgBouncer admin console or PostgreSQL.

# PgBouncer admin console
psql -p 6432 pgbouncer -c "SHOW POOLS;"
psql -p 6432 pgbouncer -c "SHOW CLIENTS;"

# PostgreSQL backend states
psql -d postgres -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"

# PostgreSQL capacity
psql -d postgres -c "SELECT count(*), current_setting('max_connections')::int FROM pg_stat_activity;"

# Long-running active queries
psql -d postgres -c "SELECT pid, usename, now() - query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' AND query_start < now() - interval '30 seconds';"

How to diagnose it

  1. Confirm pool exhaustion. Run SHOW POOLS on the PgBouncer admin console. Sustained cl_waiting > 0 means clients are queued. Transient spikes during deploys are normal; sustained growth is not.
  2. Check server utilization. For the affected pool, if sv_active is at the limit and cl_waiting grows, the pool is saturated by active work.
  3. Identify the workload on PostgreSQL. Check pg_stat_activity. Many idle backends despite high sv_active suggests session mode clients are holding connections. Many active backends points to long-running queries.
  4. Distinguish transaction versus session mode. In transaction mode, idle clients do not hold server connections; in session mode, they do. If you use session pooling, check SHOW CLIENTS for clients with a link and old request_time.
  5. Verify PostgreSQL headroom. Low sv_active + sv_idle with high cl_waiting can mean PgBouncer cannot create new server connections because PostgreSQL max_connections is exhausted. Check pg_stat_activity counts.
  6. Check for deploy or retry storms. Correlate with deploy timestamps. Rolling deploys that open new connections before draining old ones can exhaust pools and trigger retry loops.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
cl_waitingDirect indicator of client queuingSustained > 0 for more than 30 seconds
sv_active / default_pool_sizePool utilization by active queriesRatio > 0.8 sustained
sv_idleUnused pooled server connectionsHigh sv_idle with no waiting clients means over-allocation; near-zero sv_idle with waiting clients means saturation
Client wait timeUser-facing latency from queueingP99 wait time > 1 second
PostgreSQL active connectionsDistinguishes pool saturation from backend exhaustionCount approaching max_connections
query_wait_timeout disconnectsClients giving up and reconnecting, worsening stormsIncreasing error rate of wait timeouts

Fixes

Resize the pool

If sv_active consistently hits the limit and queries are short, increase default_pool_size or the per-pool override. Calculate the aggregate footprint across all PgBouncer instances and application pools. Tradeoff: more PostgreSQL backends consume more memory and CPU context-switch capacity. Do not raise pool size without confirming PostgreSQL can handle the extra backends. The general guidance is to size the aggregate pool footprint to roughly 2 to 3 times the CPU cores on the PostgreSQL server.

Reduce idle server connection lifetime

Lower server_idle_timeout to close idle server connections faster and return them to the pool. Tradeoff: higher connection setup overhead and potential thundering herd on reconnect. This helps only if sv_idle is high relative to the pool size.

Switch pooling mode

If you currently use session mode and your application does not rely on session-level features (advisory locks, SET commands, temporary tables, LISTEN/NOTIFY), switch to transaction mode. Tradeoff: session features break. This is the single most effective change for connection efficiency.

Terminate long-running queries

If a few queries monopolize the pool, terminate them cautiously:

-- WARNING: Destructive. Terminates the query and rolls back the transaction.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < now() - interval '5 minutes';

Tradeoff: data modifications are rolled back and application logic must handle query failure. Always check what the query is doing before terminating.

Add burst reserve

Configure reserve_pool_size and reserve_pool_timeout to allow brief bursts beyond default_pool_size before refusing work. Tradeoff: higher PostgreSQL load during spikes, but fewer failed client requests.

Scale PostgreSQL backends

If the root cause is that PostgreSQL max_connections is too low for the aggregate pool size across all PgBouncer instances, add read replicas or increase max_connections after verifying memory capacity. Do not raise max_connections to solve pool pressure without a pooler; that causes backend bloat and context-switch overhead.

Prevention

  • Size default_pool_size based on PostgreSQL CPU cores, not application connection counts. The pooler handles thousands of clients; PostgreSQL handles tens to low hundreds of backends efficiently.
  • Use transaction pooling mode unless a specific workload requires session features.
  • Set server_reset_query = DISCARD ALL to clear session state between transactions and prevent leakage.
  • Monitor cl_waiting as a primary pool-health signal, not just connection counts.
  • Cap application-side pool sizes so that deploys do not create connection surges that overwhelm PgBouncer.
  • Test deploy procedures to ensure old instances release connections before new instances acquire them.

How Netdata helps

  • Correlate cl_waiting with PostgreSQL pg_stat_activity metrics to pinpoint whether the bottleneck is pool saturation or backend slowness.
  • Alert on PostgreSQL connection utilization before PgBouncer queues form.
  • Track query latency percentiles to detect when pool queuing becomes user-facing latency.
  • Monitor system CPU and memory alongside connection counts to validate headroom before increasing default_pool_size.
  • Track replication lag if reads are offloaded to replicas.