PgBouncer pool exhausted: how to diagnose and fix client waits
Connection timeouts appear in application logs, but pg_stat_activity shows plenty of idle PostgreSQL backends. The bottleneck is usually the connection pooler. When PgBouncer exhausts server connections, clients queue at the pooler instead of reaching PostgreSQL. Sustained queuing raises latency and can cause timeouts before the database sees the query.
Confirm pool exhaustion, distinguish it from PostgreSQL-side connection limits, and fix the root cause without restarting services.
What this means
PgBouncer maps many lightweight client connections to a smaller pool of PostgreSQL backends. Each distinct (database, user) pair gets its own pool bounded by default_pool_size unless overridden.
In transaction mode, a client borrows a server connection only for the duration of a transaction. In session mode, the server connection is held for the entire client lifetime, even when idle. Pool exhaustion means every server connection in a pool is occupied or unavailable, so additional clients wait. Sustained cl_waiting is the telltale sign.
flowchart TD
A[Client connects to PgBouncer] --> B{Server connection available?}
B -->|Yes| C[Linked to server: sv_active]
B -->|No| D[Client queues: cl_waiting]
D --> E{Why no server?}
E -->|Long queries| F[sv_active at limit]
E -->|Pool too small| G[default_pool_size too low]
E -->|Session idle| H[sv_active held by idle clients]
E -->|PG exhausted| I[max_connections reached]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Long-running queries saturating the pool | sv_active at the pool limit, cl_waiting growing, PostgreSQL pg_stat_activity shows active queries | SHOW POOLS and pg_stat_activity.state = 'active' |
| Pool size too small for concurrent load | cl_waiting spikes under normal load, sv_active pegged at default_pool_size | SHOW POOLS pool size versus peak sv_active |
| Session pooling with idle clients | sv_active is high while cl_waiting grows; idle clients hold linked server connections | SHOW CLIENTS link column and request_time |
PostgreSQL max_connections exhausted | PgBouncer cannot create new server connections; low sv_active but high cl_waiting | PostgreSQL pg_stat_activity count versus max_connections |
| Connection churn during deploys | Rolling deploy creates new connections before old ones close; brief cl_waiting spikes | PgBouncer SHOW CLIENTS state during deploy window |
Quick checks
Run these read-only checks from the PgBouncer admin console or PostgreSQL.
# PgBouncer admin console
psql -p 6432 pgbouncer -c "SHOW POOLS;"
psql -p 6432 pgbouncer -c "SHOW CLIENTS;"
# PostgreSQL backend states
psql -d postgres -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"
# PostgreSQL capacity
psql -d postgres -c "SELECT count(*), current_setting('max_connections')::int FROM pg_stat_activity;"
# Long-running active queries
psql -d postgres -c "SELECT pid, usename, now() - query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' AND query_start < now() - interval '30 seconds';"
How to diagnose it
- Confirm pool exhaustion. Run
SHOW POOLSon the PgBouncer admin console. Sustainedcl_waiting > 0means clients are queued. Transient spikes during deploys are normal; sustained growth is not. - Check server utilization. For the affected pool, if
sv_activeis at the limit andcl_waitinggrows, the pool is saturated by active work. - Identify the workload on PostgreSQL. Check
pg_stat_activity. Manyidlebackends despite highsv_activesuggestssessionmode clients are holding connections. Manyactivebackends points to long-running queries. - Distinguish transaction versus session mode. In
transactionmode, idle clients do not hold server connections; insessionmode, they do. If you usesessionpooling, checkSHOW CLIENTSfor clients with alinkand oldrequest_time. - Verify PostgreSQL headroom. Low
sv_active + sv_idlewith highcl_waitingcan mean PgBouncer cannot create new server connections because PostgreSQLmax_connectionsis exhausted. Checkpg_stat_activitycounts. - Check for deploy or retry storms. Correlate with deploy timestamps. Rolling deploys that open new connections before draining old ones can exhaust pools and trigger retry loops.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
cl_waiting | Direct indicator of client queuing | Sustained > 0 for more than 30 seconds |
sv_active / default_pool_size | Pool utilization by active queries | Ratio > 0.8 sustained |
sv_idle | Unused pooled server connections | High sv_idle with no waiting clients means over-allocation; near-zero sv_idle with waiting clients means saturation |
| Client wait time | User-facing latency from queueing | P99 wait time > 1 second |
| PostgreSQL active connections | Distinguishes pool saturation from backend exhaustion | Count approaching max_connections |
query_wait_timeout disconnects | Clients giving up and reconnecting, worsening storms | Increasing error rate of wait timeouts |
Fixes
Resize the pool
If sv_active consistently hits the limit and queries are short, increase default_pool_size or the per-pool override. Calculate the aggregate footprint across all PgBouncer instances and application pools. Tradeoff: more PostgreSQL backends consume more memory and CPU context-switch capacity. Do not raise pool size without confirming PostgreSQL can handle the extra backends. The general guidance is to size the aggregate pool footprint to roughly 2 to 3 times the CPU cores on the PostgreSQL server.
Reduce idle server connection lifetime
Lower server_idle_timeout to close idle server connections faster and return them to the pool. Tradeoff: higher connection setup overhead and potential thundering herd on reconnect. This helps only if sv_idle is high relative to the pool size.
Switch pooling mode
If you currently use session mode and your application does not rely on session-level features (advisory locks, SET commands, temporary tables, LISTEN/NOTIFY), switch to transaction mode. Tradeoff: session features break. This is the single most effective change for connection efficiency.
Terminate long-running queries
If a few queries monopolize the pool, terminate them cautiously:
-- WARNING: Destructive. Terminates the query and rolls back the transaction.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
AND query_start < now() - interval '5 minutes';
Tradeoff: data modifications are rolled back and application logic must handle query failure. Always check what the query is doing before terminating.
Add burst reserve
Configure reserve_pool_size and reserve_pool_timeout to allow brief bursts beyond default_pool_size before refusing work. Tradeoff: higher PostgreSQL load during spikes, but fewer failed client requests.
Scale PostgreSQL backends
If the root cause is that PostgreSQL max_connections is too low for the aggregate pool size across all PgBouncer instances, add read replicas or increase max_connections after verifying memory capacity. Do not raise max_connections to solve pool pressure without a pooler; that causes backend bloat and context-switch overhead.
Prevention
- Size
default_pool_sizebased on PostgreSQL CPU cores, not application connection counts. The pooler handles thousands of clients; PostgreSQL handles tens to low hundreds of backends efficiently. - Use
transactionpooling mode unless a specific workload requires session features. - Set
server_reset_query = DISCARD ALLto clear session state between transactions and prevent leakage. - Monitor
cl_waitingas a primary pool-health signal, not just connection counts. - Cap application-side pool sizes so that deploys do not create connection surges that overwhelm PgBouncer.
- Test deploy procedures to ensure old instances release connections before new instances acquire them.
How Netdata helps
- Correlate
cl_waitingwith PostgreSQLpg_stat_activitymetrics to pinpoint whether the bottleneck is pool saturation or backend slowness. - Alert on PostgreSQL connection utilization before PgBouncer queues form.
- Track query latency percentiles to detect when pool queuing becomes user-facing latency.
- Monitor system CPU and memory alongside connection counts to validate headroom before increasing
default_pool_size. - Track replication lag if reads are offloaded to replicas.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL monitoring checklist: the signals every production database needs
- PostgreSQL monitoring maturity model: from reactive to self-healing
- PostgreSQL FATAL: too many connections - causes and fixes






