PostgreSQL idle in transaction: detecting and killing zombie sessions

When pg_stat_activity fills with idle in transaction sessions, autovacuum stalls, dead tuples accumulate, and applications throw “too many clients” errors while backends sit idle. These zombie sessions hold a transaction snapshot. Under MVCC, this prevents VACUUM from reclaiming dead tuples created after the transaction started, causing table bloat, blocked DDL, and transaction-ID wraparound pressure.

This guide shows how to detect these sessions, determine whether they are blocking work, terminate them safely, and prevent recurrence.

What this means

A backend moves to idle in transaction when the client has issued a BEGIN (or an implicit equivalent) and at least one query, but has not yet ended the transaction. The backend waits for the next client command. If the client crashes, disconnects uncleanly, or forgets to commit, the backend stays in this state indefinitely.

idle in transaction (aborted) occurs when a statement fails inside a transaction and the client does not issue ROLLBACK. The transaction is aborted, but the backend remains connected and idle. These sessions still hold locks and block VACUUM exactly like non-aborted variants. Detection queries must include both states.

The transaction’s snapshot becomes the oldest active snapshot in the cluster. Autovacuum cannot freeze tuples or remove dead rows that remain visible to this snapshot. The result is table and index bloat, slower scans over dead rows, replication lag on hot standbys if the primary’s vacuum is stalled, and eventual connection exhaustion.

flowchart TD
    A[Connections climbing] --> B[Query pg_stat_activity]
    B --> C{Idle in transaction?}
    C -->|Yes| D[Check xact_start age]
    D --> E{Blocking locks?}
    E -->|Yes| F[pg_terminate_backend]
    E -->|No| G[Set session timeout]
    F --> H[Vacuum resumes]

Common causes

CauseWhat it looks likeFirst thing to check
Application missing commit/rollbackstate is idle in transaction, query shows the last statement, application_name matches the appApplication logs for unhandled exceptions after BEGIN
ORM auto-begin without explicit closeMany short query strings from the same app, connection returned to pool in an open transactionORM configuration (for example, Django ATOMIC_REQUESTS, SQLAlchemy session lifecycle)
Client crash or network partitionclient_addr shows a host with no active connections, state_change is oldNetwork or client logs for disconnects around xact_start time
Interactive psql or admin tool left openusename is a human account, application_name is psql or a GUI clientWhether the user still needs the session
Driver context cancellation without rollbackstate is idle in transaction (aborted), driver is Go pgx or Python psycopg2Driver logs for client-side timeouts or context cancellation

Quick checks

Run these read-only queries to assess scope and impact.

-- Count idle-in-transaction sessions by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
GROUP BY state;
-- List oldest idle-in-transaction sessions
SELECT pid, usename, application_name, client_addr,
       state, xact_start, state_change,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;
-- Check if a specific backend holds locks
SELECT mode, granted, relation::regclass
FROM pg_locks
WHERE pid = <pid> AND granted = true;
-- Check dead tuple accumulation on tables
SELECT schemaname, relname, n_dead_tup, n_live_tup,
       last_autovacuum, last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
-- Aggregate idle-in-transaction time for this database
SELECT datname, idle_in_transaction_time
FROM pg_stat_database
WHERE datname = current_database();
-- Connection utilization by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

How to diagnose it

  1. Confirm zombies and measure age. Query pg_stat_activity for both idle in transaction and idle in transaction (aborted). Sort by xact_start. Do not use query_start to judge transaction age; for idle-in-transaction sessions, query_start reflects the last completed statement, not the transaction boundary.
  2. Map sessions to applications. Group by application_name and client_addr to identify whether the leak is coming from a specific service, a connection pool, or an ad-hoc client.
  3. Check lock impact. Join pg_stat_activity with pg_locks on pid. Look for AccessExclusiveLock or other granted locks that block DDL or queries.
  4. Correlate with bloat. Compare pg_stat_user_tables.n_dead_tup on write-heavy tables against last_autovacuum. If dead tuples grow while autovacuum is active on other tables, the zombie snapshot is likely holding back vacuum.
  5. Check for the aborted variant. Filter for idle in transaction (aborted) specifically. These often follow a client-side error or driver context cancellation and indicate an application bug rather than a simple missing commit.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_stat_activity count of idle in transaction and idle in transaction (aborted)Direct measure of open, uncommitted transactionsAny session with xact_start older than 5 minutes
pg_stat_activity.xact_start ageTransaction duration determines how far back VACUUM is blockednow() - xact_start exceeds 5 minutes
pg_stat_database.idle_in_transaction_timeAggregate milliseconds spent in idle-in-transaction states per databaseSustained increase between samples
pg_stat_user_tables.n_dead_tupDead tuples that autovacuum cannot reclaim while blockedGrowing on active tables despite running autovacuum workers
pg_stat_user_tables.last_autovacuumWhether autovacuum has recently processed the tableStale timestamp while n_dead_tup rises
Connection utilizationZombies consume slots that active queries needSustained above 70 percent with many idle-in-transaction sessions

Fixes

Immediate termination

If a session is blocking autovacuum or DDL and the application owner confirms it is not needed, terminate it. pg_terminate_backend(pid) sends SIGTERM to forcibly disconnect the backend and roll back its open transaction. pg_cancel_backend(pid) sends SIGINT to cancel the active query; for idle-in-transaction sessions there is no active query, so cancel has no effect. Use terminate for cleanup.

-- Terminate zombies older than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
  AND xact_start < now() - interval '5 minutes';

Caution: terminating a backend rolls back its transaction. Any uncommitted work is lost. Verify with the application team before terminating sessions that may hold legitimate long-running batch work. This requires superuser or membership in the pg_signal_backend role.

Enable idle_in_transaction_session_timeout

The permanent fix is to let PostgreSQL clean these sessions automatically. The parameter defaults to 0 (disabled). When set to a positive value, PostgreSQL terminates any session that remains idle inside an open transaction longer than the specified duration.

Set it at the database level:

ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '2min';

Or in postgresql.conf:

idle_in_transaction_session_timeout = 2min

A starting point of 2 minutes is pragmatic for OLTP. Tune based on your workload: 30 seconds for aggressive protection, up to 5 minutes for batch applications with legitimate pauses.

Application-level fixes

  • Ensure every code path that begins a transaction ends with COMMIT or ROLLBACK, including exception handlers.
  • Review ORM configurations. Framework defaults can leave transactions open if exceptions are swallowed.
  • For drivers that cancel query contexts, ensure the connection issues an explicit ROLLBACK before reuse. Otherwise the server side remains in idle in transaction (aborted).
  • Use connection pooling that resets session state between checkouts, such as PgBouncer with server_reset_query = DISCARD ALL.

Addressing the aborted variant

Sessions in idle in transaction (aborted) have already failed a statement. They cannot execute further commands until rolled back. If your detection shows many aborted variants, focus on application error handling. The fix is almost always in the client code: catch the error, roll back, and either retry or close the connection.

Prevention

  • Set idle_in_transaction_session_timeout as a baseline before the problem occurs.
  • Alert on pg_stat_activity sessions in idle in transaction with xact_start older than 5 minutes.
  • Monitor pg_stat_database.idle_in_transaction_time for trend increases.
  • Include transaction boundary checks in code review.
  • Run pg_stat_statements reviews to catch queries that might be part of leaked transactions.

How Netdata helps

  • Netdata surfaces pg_stat_activity state breakdowns, including idle-in-transaction counts.
  • Alerts on connection saturation and transaction age catch zombies before they block vacuum.
  • Connection state metrics correlated with pg_stat_database activity distinguish leaks from legitimate load spikes.
  • Per-database idle_in_transaction_time highlights chronic transaction hygiene issues.