PostgreSQL blocking queries: finding the root blocker in a lock cascade

A query that normally finishes in milliseconds is now running for minutes. pg_stat_activity shows a queue of sessions with wait_event_type = 'Lock'. You identify one session holding the contested lock, but terminating it does not clear the queue. That session was itself blocked by another, which was blocked by another. Until you find the session at the head of the chain, the cascade continues.

Use this guide to traverse the lock graph, distinguish direct blockers from the root blocker, and decide whether to terminate or wait. It applies to self-managed PostgreSQL, RDS, Aurora, and containerized deployments where you have access to pg_stat_activity and pg_locks.

What this means

PostgreSQL queues lock requests per object. When a session cannot acquire a lock, it waits. If that session already holds locks on other objects, it becomes a blocker for those objects, creating a cascade. The session you see in pg_locks with granted = true is often only the direct blocker, not the root cause.

pg_blocking_pids(int) returns the direct blockers for a given backend PID (one hop only). To find the root blocker in a cascade, walk the chain recursively. The root is the session that is not itself waiting for a lock, or is waiting on something other than a lock (such as I/O) while still holding the contested resource.

Common lock types in these incidents include relation (table-level), transactionid (row-level contention under MVCC), virtualxid, and advisory locks. Note that pg_locks does not expose which parallel workers belong to which client session, so cascades involving parallel query workers can be harder to trace.

flowchart TD
    A[Client session requests lock] --> B{Lock available?}
    B -->|Yes| C[Lock granted]
    B -->|No| D[Session waits]
    D --> E[Direct blocker holds lock]
    E --> F{Blocker is waiting?}
    F -->|Yes| G[Intermediate blocker]
    F -->|No| H[Root blocker]
    G --> E
    H --> I{Root type}
    I --> J[Active query]
    I --> K[Idle in transaction]
    I --> L[Prepared transaction]

Common causes

CauseWhat it looks likeFirst thing to check
Long-running DDLALTER TABLE, CREATE INDEX, or VACUUM FULL holding AccessExclusiveLockpg_locks for mode = 'AccessExclusiveLock' and granted = true
Idle in transaction holding locksstate = 'idle in transaction' with an old xact_start and active entries in pg_lockspg_stat_activity for idle transactions older than 5 minutes
Heavy UPDATE/DELETE without index supportRow contention visible as transactionid waits in pg_locksRoot blocker query text in pg_stat_activity
Prepared transactionpg_blocking_pids returns an empty array or the wait involves no running PIDpg_prepared_xacts for stale entries
Autovacuum blocked by long query or DDLAutovacuum worker in pg_stat_activity with wait_event_type = 'Lock'Whether the root blocker is a long-running query or DDL

Quick checks

Run these in psql or your monitoring SQL console. They are read-only and safe.

# Show current ungranted locks
psql -c "SELECT pid, locktype, mode, granted, relation::regclass FROM pg_locks WHERE NOT granted LIMIT 5;"
# Show sessions waiting on locks with their direct blockers
psql -c "SELECT pid, wait_event_type, wait_event, pg_blocking_pids(pid) AS blockers, query FROM pg_stat_activity WHERE wait_event_type = 'Lock';"
# Show idle-in-transaction sessions that may be holding locks
psql -c "SELECT pid, usename, state, xact_start, query FROM pg_stat_activity WHERE state = 'idle in transaction' AND xact_start < NOW() - INTERVAL '5 minutes';"
# Show DDL operations holding exclusive locks
psql -c "SELECT pid, mode, relation::regclass, query FROM pg_locks JOIN pg_stat_activity USING (pid) WHERE mode = 'AccessExclusiveLock' AND granted;"
# Check for prepared transactions that cannot be terminated via pg_terminate_backend
psql -c "SELECT gid, prepared, owner FROM pg_prepared_xacts WHERE prepared < NOW() - INTERVAL '1 hour';"

How to diagnose it

  1. Confirm the waits. Query pg_stat_activity where wait_event_type = 'Lock'. Record the pid, query, and wait start time. On PostgreSQL 13+, pg_locks.waitstart gives the exact timestamptz the wait began. On earlier versions, derive approximate wait time from pg_stat_activity.query_start.

  2. Find direct blockers. Use pg_blocking_pids(pid) for each waiting backend. PostgreSQL recommends this function over self-joining pg_locks, which can mislead under heavy lock churn or fast-path lock activity.

  3. Walk the chain to the root. If a direct blocker is also waiting on a lock, it is an intermediate node. Repeat until you reach a backend that is not waiting for a lock. A recursive CTE automates the traversal and avoids revisiting the same PID:

WITH RECURSIVE lock_chain AS (
    SELECT
        pid,
        pg_blocking_pids(pid) AS blockers,
        query,
        0 AS depth,
        ARRAY[pid] AS path
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0
    UNION ALL
    SELECT
        a.pid,
        pg_blocking_pids(a.pid),
        a.query,
        lc.depth + 1,
        lc.path || a.pid
    FROM pg_stat_activity a
    JOIN lock_chain lc
        ON a.pid = ANY(lc.blockers)
        AND NOT a.pid = ANY(lc.path)
)
SELECT * FROM lock_chain ORDER BY depth, pid;
  1. Classify the root blocker. Inspect the root’s state in pg_stat_activity.

    • active: running a query. Check whether it is legitimate or runaway.
    • idle in transaction: a zombie session holding a snapshot and locks.
    • If pg_blocking_pids returns an empty array or the chain ends without a running PID, check pg_prepared_xacts. A prepared transaction holds locks but does not appear in pg_stat_activity with a running PID.
  2. Map the lock object. Join pg_locks with pg_stat_activity using pid.

    • locktype = 'relation' with AccessExclusiveLock: DDL is the culprit.
    • locktype = 'transactionid': row-level contention from concurrent UPDATE/DELETE.
    • locktype = 'advisory': application-level locking.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_stat_activity.wait_event_type = 'Lock'Indicates active lock waitsSustained rows for > 30 seconds during peak traffic
pg_locks.waitstart age (PG 13+)Exact lock wait durationAny wait exceeding your query latency SLO
pg_stat_database.deadlocksDeadlock detector firingIncrease over a 5-minute window
log_lock_waits outputLogs blocker and blocked PIDs for waits that exceed deadlock_timeoutFrequent entries in PostgreSQL logs
pg_stat_activity.state = 'idle in transaction' countZombie sessions hold locks and block vacuumAny session in this state for > 5 minutes
Connection utilizationLock cascades consume the connection budgetActive + waiting connections > 80% of max_connections

Fixes

Terminate the root blocker

If the root is an idle in transaction zombie or a runaway query, use:

SELECT pg_terminate_backend(<pid>);

This forces rollback, releasing all locks held by that backend. Every waiter in the cascade then retries. Do not terminate if the root is a legitimate long-running batch job or an ALTER TABLE in progress; rollback may take minutes and hold locks until it completes.

Terminate intermediate waiters

If the root blocker is legitimate and must finish, but the cascade is consuming all connections and risking an outage, terminate some of the waiting sessions. This preserves connection headroom but does not restore throughput until the root finishes.

Resolve prepared transactions

A prepared transaction cannot be killed with pg_terminate_backend. If the root is a stale prepared transaction identified in pg_prepared_xacts, commit it if the application logic allows, or roll it back:

ROLLBACK PREPARED '<gid>';

Tune timeouts

Set idle_in_transaction_session_timeout = '2min' to prevent abandoned transactions from holding locks indefinitely. Set lock_timeout = '1s' for application-side DDL or schema migration scripts so they fail fast instead of queuing behind existing locks. Set statement_timeout appropriate to your workload to cap runaway queries.

Prevention

  • Enable log_lock_waits = on. Combined with the default deadlock_timeout = 1s, this logs the blocker PID, blocked PID, lock type, and query text for waits exceeding one second. Essential for post-incident forensics.
  • Schedule heavy DDL during maintenance windows. ALTER TABLE, TRUNCATE, and VACUUM FULL acquire AccessExclusiveLock and block all concurrent access.
  • Index foreign keys. Unindexed foreign keys on child tables can turn parent updates into long-lasting table locks.
  • Commit promptly. Do not open a transaction, perform a query, and then wait for an external service before committing. Connection pooling does not fix application-level transaction boundaries.
  • Monitor replication slots. An inactive logical replication slot retains WAL and can indirectly pressure the primary, though it does not directly cause lock cascades.

How Netdata helps

Netdata correlates postgres.locks_wait metrics with query duration and connection state charts, showing when lock waits spike relative to active query load. Alerts on sustained lock wait time or a growing count of idle in transaction connections can surface cascades before connection exhaustion.

The pg_stat_activity wait event breakdown shows Lock waits alongside IO and Client waits without manual SQL. Track pg_stat_database.deadlocks rate to flag when contention turns into deadlocks. Use checkpoint and replication context to distinguish lock contention from checkpoint I/O stalls that can look similar in application latency.