PostgreSQL idle in transaction: detecting and killing zombie sessions
When pg_stat_activity fills with idle in transaction sessions, autovacuum stalls, dead tuples accumulate, and applications throw “too many clients” errors while backends sit idle. These zombie sessions hold a transaction snapshot. Under MVCC, this prevents VACUUM from reclaiming dead tuples created after the transaction started, causing table bloat, blocked DDL, and transaction-ID wraparound pressure.
This guide shows how to detect these sessions, determine whether they are blocking work, terminate them safely, and prevent recurrence.
What this means
A backend moves to idle in transaction when the client has issued a BEGIN (or an implicit equivalent) and at least one query, but has not yet ended the transaction. The backend waits for the next client command. If the client crashes, disconnects uncleanly, or forgets to commit, the backend stays in this state indefinitely.
idle in transaction (aborted) occurs when a statement fails inside a transaction and the client does not issue ROLLBACK. The transaction is aborted, but the backend remains connected and idle. These sessions still hold locks and block VACUUM exactly like non-aborted variants. Detection queries must include both states.
The transaction’s snapshot becomes the oldest active snapshot in the cluster. Autovacuum cannot freeze tuples or remove dead rows that remain visible to this snapshot. The result is table and index bloat, slower scans over dead rows, replication lag on hot standbys if the primary’s vacuum is stalled, and eventual connection exhaustion.
flowchart TD
A[Connections climbing] --> B[Query pg_stat_activity]
B --> C{Idle in transaction?}
C -->|Yes| D[Check xact_start age]
D --> E{Blocking locks?}
E -->|Yes| F[pg_terminate_backend]
E -->|No| G[Set session timeout]
F --> H[Vacuum resumes]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Application missing commit/rollback | state is idle in transaction, query shows the last statement, application_name matches the app | Application logs for unhandled exceptions after BEGIN |
| ORM auto-begin without explicit close | Many short query strings from the same app, connection returned to pool in an open transaction | ORM configuration (for example, Django ATOMIC_REQUESTS, SQLAlchemy session lifecycle) |
| Client crash or network partition | client_addr shows a host with no active connections, state_change is old | Network or client logs for disconnects around xact_start time |
| Interactive psql or admin tool left open | usename is a human account, application_name is psql or a GUI client | Whether the user still needs the session |
| Driver context cancellation without rollback | state is idle in transaction (aborted), driver is Go pgx or Python psycopg2 | Driver logs for client-side timeouts or context cancellation |
Quick checks
Run these read-only queries to assess scope and impact.
-- Count idle-in-transaction sessions by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
GROUP BY state;
-- List oldest idle-in-transaction sessions
SELECT pid, usename, application_name, client_addr,
state, xact_start, state_change,
now() - xact_start AS xact_age,
query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;
-- Check if a specific backend holds locks
SELECT mode, granted, relation::regclass
FROM pg_locks
WHERE pid = <pid> AND granted = true;
-- Check dead tuple accumulation on tables
SELECT schemaname, relname, n_dead_tup, n_live_tup,
last_autovacuum, last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
-- Aggregate idle-in-transaction time for this database
SELECT datname, idle_in_transaction_time
FROM pg_stat_database
WHERE datname = current_database();
-- Connection utilization by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
How to diagnose it
- Confirm zombies and measure age. Query
pg_stat_activityfor bothidle in transactionandidle in transaction (aborted). Sort byxact_start. Do not usequery_startto judge transaction age; for idle-in-transaction sessions,query_startreflects the last completed statement, not the transaction boundary. - Map sessions to applications. Group by
application_nameandclient_addrto identify whether the leak is coming from a specific service, a connection pool, or an ad-hoc client. - Check lock impact. Join
pg_stat_activitywithpg_locksonpid. Look forAccessExclusiveLockor other granted locks that block DDL or queries. - Correlate with bloat. Compare
pg_stat_user_tables.n_dead_tupon write-heavy tables againstlast_autovacuum. If dead tuples grow while autovacuum is active on other tables, the zombie snapshot is likely holding back vacuum. - Check for the aborted variant. Filter for
idle in transaction (aborted)specifically. These often follow a client-side error or driver context cancellation and indicate an application bug rather than a simple missing commit.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_stat_activity count of idle in transaction and idle in transaction (aborted) | Direct measure of open, uncommitted transactions | Any session with xact_start older than 5 minutes |
pg_stat_activity.xact_start age | Transaction duration determines how far back VACUUM is blocked | now() - xact_start exceeds 5 minutes |
pg_stat_database.idle_in_transaction_time | Aggregate milliseconds spent in idle-in-transaction states per database | Sustained increase between samples |
pg_stat_user_tables.n_dead_tup | Dead tuples that autovacuum cannot reclaim while blocked | Growing on active tables despite running autovacuum workers |
pg_stat_user_tables.last_autovacuum | Whether autovacuum has recently processed the table | Stale timestamp while n_dead_tup rises |
| Connection utilization | Zombies consume slots that active queries need | Sustained above 70 percent with many idle-in-transaction sessions |
Fixes
Immediate termination
If a session is blocking autovacuum or DDL and the application owner confirms it is not needed, terminate it. pg_terminate_backend(pid) sends SIGTERM to forcibly disconnect the backend and roll back its open transaction. pg_cancel_backend(pid) sends SIGINT to cancel the active query; for idle-in-transaction sessions there is no active query, so cancel has no effect. Use terminate for cleanup.
-- Terminate zombies older than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
AND xact_start < now() - interval '5 minutes';
Caution: terminating a backend rolls back its transaction. Any uncommitted work is lost. Verify with the application team before terminating sessions that may hold legitimate long-running batch work. This requires superuser or membership in the pg_signal_backend role.
Enable idle_in_transaction_session_timeout
The permanent fix is to let PostgreSQL clean these sessions automatically. The parameter defaults to 0 (disabled). When set to a positive value, PostgreSQL terminates any session that remains idle inside an open transaction longer than the specified duration.
Set it at the database level:
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '2min';
Or in postgresql.conf:
idle_in_transaction_session_timeout = 2min
A starting point of 2 minutes is pragmatic for OLTP. Tune based on your workload: 30 seconds for aggressive protection, up to 5 minutes for batch applications with legitimate pauses.
Application-level fixes
- Ensure every code path that begins a transaction ends with
COMMITorROLLBACK, including exception handlers. - Review ORM configurations. Framework defaults can leave transactions open if exceptions are swallowed.
- For drivers that cancel query contexts, ensure the connection issues an explicit
ROLLBACKbefore reuse. Otherwise the server side remains inidle in transaction (aborted). - Use connection pooling that resets session state between checkouts, such as PgBouncer with
server_reset_query = DISCARD ALL.
Addressing the aborted variant
Sessions in idle in transaction (aborted) have already failed a statement. They cannot execute further commands until rolled back. If your detection shows many aborted variants, focus on application error handling. The fix is almost always in the client code: catch the error, roll back, and either retry or close the connection.
Prevention
- Set
idle_in_transaction_session_timeoutas a baseline before the problem occurs. - Alert on
pg_stat_activitysessions inidle in transactionwithxact_startolder than 5 minutes. - Monitor
pg_stat_database.idle_in_transaction_timefor trend increases. - Include transaction boundary checks in code review.
- Run
pg_stat_statementsreviews to catch queries that might be part of leaked transactions.
How Netdata helps
- Netdata surfaces
pg_stat_activitystate breakdowns, including idle-in-transaction counts. - Alerts on connection saturation and transaction age catch zombies before they block vacuum.
- Connection state metrics correlated with
pg_stat_databaseactivity distinguish leaks from legitimate load spikes. - Per-database
idle_in_transaction_timehighlights chronic transaction hygiene issues.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL monitoring checklist: the signals every production database needs
- PostgreSQL monitoring maturity model: from reactive to self-healing
- PostgreSQL FATAL: too many connections - causes and fixes






