$ guides / postgres / postgres-connection-exhaustion ▌

Operations Guides

PostgreSQL connection exhaustion: detection, diagnosis, and prevention

Application logs show FATAL: sorry, too many clients already. Health checks are failing. A rolling deploy just finished, and now the database is rejecting connections. Because PostgreSQL uses one process per connection, every slot consumes memory and scheduler overhead. Once max_connections is reached, the server refuses new backends entirely. Raising max_connections without fixing the root cause increases memory pressure and context-switch thrashing. This guide covers how to distinguish a true capacity shortage from a leak or pool misconfiguration, how to recover safely, and how to prevent recurrence.

What this means

max_connections is a hard limit. Changing it requires a server restart. The default is 100, including all client backends, replication connections, and reserved superuser slots. When the limit is reached, PostgreSQL returns a fatal error to every new connection attempt. PostgreSQL has no built-in connection pooling: every application connection becomes a dedicated backend process. Without an external pooler such as PgBouncer, connection churn from deploys, health checks, or retry storms can exhaust slots in seconds.

Connection exhaustion is not only about total count. The state of those connections matters. An idle backend holds a slot and memory but no locks. An idle in transaction backend also holds a snapshot, blocks VACUUM, and can accelerate table bloat and transaction-ID wraparound pressure. An active backend running a slow query or waiting on a lock consumes a slot while a queue builds behind it.

flowchart TD
  A["Connection errors: too many clients"] --> B{"pg_stat_activity count >= max_connections?"}
  B -->|Yes| C["Server limit hit"]
  B -->|No| D["Pooler limit hit"]
  C --> E{"Dominant state?"}
  E -->|"idle in transaction"| F["App leak or missing timeout"]
  E -->|"idle"| G["Pool size too large or missing pooler"]
  E -->|"active"| H["Slow queries or lock waits"]
  D --> I["Check PgBouncer SHOW POOLS"]
  F --> J["Terminate idle; set idle_in_transaction_session_timeout"]
  G --> K["Deploy PgBouncer; reduce app pool size"]
  H --> L["Tune query or kill blocker"]

Common causes

Cause	What it looks like	First thing to check
Application connection leak	`idle in transaction` count grows monotonically; errors during steady-state traffic	`pg_stat_activity` filtered by `state = 'idle in transaction'` and `application_name`
Missing or misconfigured pooler	`max_connections` hit during deploys or spikes; hundreds of `idle` backends visible directly in PostgreSQL	Whether PgBouncer or an equivalent pooler is deployed, and its `SHOW POOLS` output
Idle-in-transaction sessions	Old `state_change` timestamps; `VACUUM` blocked; accelerating table bloat	`pg_stat_activity` ordered by `state_change` for `idle in transaction` rows
Thundering herd on restart or deploy	Connection count spikes immediately after a rolling deploy or primary restart; spike correlates with deploy timestamps	Correlation between deploy events and `pg_stat_activity` count
Long-running queries or lock waits	`active` connections with old `query_start`; a queue of ungranted locks in `pg_locks`	`pg_locks` where `granted = false`, joined to `pg_stat_activity`

Quick checks

Run these read-only checks before making any changes.

# Check current connection count grouped by state
psql -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"

# Check the hard server limit and reserved slots
psql -c "SHOW max_connections; SHOW superuser_reserved_connections;"

# Identify idle-in-transaction sessions and their age
psql -c "SELECT pid, usename, application_name, state, now() - state_change AS idle_time FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY idle_time DESC;"

# Check for lock waits that inflate required concurrency
psql -c "SELECT l.pid, a.usename, l.mode, l.granted, a.query FROM pg_locks l JOIN pg_stat_activity a ON l.pid = a.pid WHERE NOT l.granted LIMIT 10;"

# Inspect PgBouncer pool state if a pooler is in use
psql -p 6432 pgbouncer -c "SHOW POOLS;"

# Attribute connections by application source
psql -c "SELECT application_name, count(*) FROM pg_stat_activity GROUP BY application_name ORDER BY count(*) DESC;"

How to diagnose it

Confirm the boundary. Compare count(*) FROM pg_stat_activity against max_connections. If the server limit is not yet reached, the exhaustion may be happening in an intermediate pooler or application-side pool.
Classify states. High idle suggests the application is opening connections it does not need, or the pooler is not multiplexing. High idle in transaction points to an application logic leak. High active with old query_start indicates slow queries or lock contention.
Attribute by source. Use application_name and client_addr to find the specific service, host, or batch job responsible for the spike.
Check for lock pressure. Query pg_locks for ungranted locks. A single blocked UPDATE can hold a row lock while a dozen connections queue behind it, artificially inflating the required connection count.
Correlate with events. Match the timestamp of the spike against deploys, restarts, cron jobs, or ETL batches. A rolling deploy that opens new connections before draining old ones creates a transient doubling of slots.
Inspect the pooler. If PgBouncer exists, run SHOW POOLS. Check the cl_waiting column. If clients are waiting, the pooler is saturated. If client and server active counts are nearly equal, the pooler is not multiplexing effectively, or the pool size is too large.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Connection utilization (`count(*) / max_connections`)	Reaching the hard limit causes immediate fatal application errors	Sustained utilization above 80%
Idle-in-transaction count	Holds snapshots, blocks `VACUUM`, and wastes slots without doing work	Any session in this state for more than 5 minutes
Pooler client vs. server active ratio	Reveals whether exhaustion is at the pooler or the database	`cl_waiting` greater than zero, or client and server counts near parity in transaction mode
Lock wait queue depth	Queued locks increase the number of concurrent connections needed to maintain throughput	Ungranted locks persisting longer than 10 seconds
Connections by `application_name`	Identifies which service is leaking or over-connecting	A single application name consuming more than 50% of slots

Fixes

Terminate unsafe idle sessions for immediate relief

If idle in transaction sessions are consuming slots, terminate them to free capacity and unblock VACUUM.

-- Terminate idle-in-transaction sessions older than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < NOW() - INTERVAL '5 minutes';

Warning: This aborts open transactions and requires privileges to terminate backends. Any in-flight application request will fail and roll back. Do not terminate actively running queries unless you have confirmed they are safe to abort.

Deploy or reconfigure connection pooling

If connections are hitting PostgreSQL directly, deploy PgBouncer in transaction mode. Size the PostgreSQL-side pool to roughly 2-3 times the number of CPU cores, and set server_reset_query = DISCARD ALL to clear session state between transactions.

Tradeoff: Transaction mode breaks session-level features such as advisory locks, LISTEN/NOTIFY, temporary tables, and prepared statements that span multiple transactions. If your application requires these, use session mode and accept lower efficiency.

Fix application connection leaks

Set idle_in_transaction_session_timeout to 2 minutes. On PostgreSQL 14 and later, also set idle_session_timeout to 30 minutes. Audit application code to ensure transactions are committed or rolled back promptly, including in exception paths.

Tradeoff: Aggressive timeouts may abort legitimate long-running transactions. Tune the threshold based on your longest expected interactive transaction, and set stricter limits on batch and health-check roles.

Eliminate thundering herd on deploy

Implement connection draining so old instances release backends before new instances connect. Stagger rolling restarts. Reduce health-check frequency if each check opens a new connection.

Tradeoff: This requires coordination between application deploy pipelines and the infrastructure layer. It is easier to enforce when a pooler sits in front of the database.

Increase max_connections only as a last resort

If a pooler is already in place and the legitimate active workload still exceeds the limit, calculate a new max_connections based on pool_size multiplied by the number of poolers, plus replication connections, superuser reserved slots, and a small spare margin. Then restart PostgreSQL.

Tradeoff: Higher max_connections increases per-process memory overhead and OS scheduler pressure. Do not raise the limit above 200-300 without aggressive pooling and memory validation.

Prevention

PgBouncer in transaction mode. Multiplex hundreds of client connections over a small pool of PostgreSQL backends.
idle_in_transaction_session_timeout. Set to 2 minutes to prevent abandoned transactions from holding slots and blocking VACUUM.
Connection count alerts. Page when connection utilization exceeds 70% so you can investigate before hard rejection begins.
Deploy-time connection draining. Ensure rolling deployments close old connections before opening new ones to avoid double-counting slots.
Application-side pool limits. Cap application pool sizes so the aggregate across all instances stays well below the server or pooler limit.

How Netdata helps

Correlates PostgreSQL connection count with application error rates to distinguish database saturation from pooler misconfiguration.
Tracks pg_stat_activity state breakdowns (active, idle, idle in transaction).
Alerts on connection utilization percentage before max_connections is reached.
Surfaces memory pressure and context-switch rates, which typically increase when max_connections is raised without pooling.

PostgreSQL monitoring checklist: the signals every production database needs

The Netdata solution

PostgreSQL monitoring with Netdata

Netdata monitors PostgreSQL with per-second metrics, pre-built dashboards, and ML-powered anomaly detection. Correlate connection saturation, lock waits, autovacuum progress, replication lag, and checkpoint I/O against the rest of your stack so you catch the incidents in these runbooks before they page anyone.

See PostgreSQL monitoring → Start monitoring free

PostgreSQL connection exhaustion: detection, diagnosis, and prevention

PostgreSQL connection exhaustion: detection, diagnosis, and prevention

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Terminate unsafe idle sessions for immediate relief

Deploy or reconfigure connection pooling

Fix application connection leaks

Eliminate thundering herd on deploy

Increase max_connections only as a last resort

Prevention

How Netdata helps

Related guides

PostgreSQL monitoring with Netdata