PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns

An ALTER TABLE to add a column or change a type hangs. Application queries time out. Connection pools saturate. What looked like a simple schema change becomes a production incident.

By default, ALTER TABLE acquires an ACCESS EXCLUSIVE lock. It conflicts with every other lock mode, including ACCESS SHARE held by a plain SELECT. Once the DDL statement queues behind a blocker, every subsequent read and write on that table queues behind the waiting DDL. The table goes offline before the ALTER TABLE executes any work.

The outage lasts until the blocker releases its lock plus the time for the DDL to complete. If the blocker is idle in transaction and has been open for hours, the outage extends until the session is terminated or times out.

flowchart TD
    A[Running query holds AccessShareLock] --> B[ALTER TABLE waiting for AccessExclusiveLock]
    B --> C[New queries queue here]
    B --> D[Application timeouts spike]
    E[AccessExclusiveLock blocks all other locks] -.-> A

Common causes

CauseWhat it looks likeFirst thing to check
Long-running query or transaction holding ACCESS SHAREALTER TABLE hangs immediately; application queries were running fine before the DDLpg_stat_activity for old query_start or xact_start on the target relation
idle in transaction session holding a relation lockNo active query visible, but pg_locks shows granted locks on the table; state = 'idle in transaction'pg_stat_activity.state_change age for idle sessions
DDL already queued and causing a lock cascadeMultiple sessions in wait_event_type = 'Lock' on the same relation; normally fast queries are stalledpg_locks for ungranted AccessExclusiveLock and queue depth
Unsafe DDL pattern that could use a weaker-lock alternativeCREATE INDEX without CONCURRENTLY, ADD FOREIGN KEY without NOT VALID, or SET NOT NULL directly on a large tableThe DDL statement itself for missing CONCURRENTLY, NOT VALID, or VALIDATE keywords

Quick checks

Run these in order. The first three are read-only.

-- Check 1: Find ungranted locks and the queries waiting for them
SELECT pid, mode, granted, relation::regclass, query_start, query
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE NOT granted
ORDER BY query_start;
-- Check 2: Identify the blocking PIDs behind Lock waits
SELECT DISTINCT unnest(pg_blocking_pids(pid)) AS blocker_pid
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
-- Check 3: Details of the blocking sessions
SELECT pid, usename, state, query_start, xact_start, query
FROM pg_stat_activity
WHERE pid = ANY(
  ARRAY(SELECT DISTINCT unnest(pg_blocking_pids(pid)) FROM pg_stat_activity WHERE wait_event_type = 'Lock')
);
-- Check 4: Lock state for a specific table
SELECT mode, granted, pid, query
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE relation = 'my_table'::regclass;
-- Check 5: Current lock timeout (0 means the DDL can wait forever)
SHOW lock_timeout;

How to diagnose it

  1. Confirm the DDL is waiting. Query pg_stat_activity for the ALTER TABLE session. Look for wait_event_type = 'Lock' and a query_start older than the command should take.
  2. Find the blocking session. Use pg_blocking_pids() or join pg_locks with pg_stat_activity. Note the blocker’s state: active, idle in transaction, or idle.
  3. Check transaction age. If xact_start is old but query_start is recent, the transaction has been open a long time and may be holding locks despite appearing idle.
  4. Assess queue depth. Count how many sessions are waiting on the same relation. A long queue means a full lock cascade is in progress.
  5. Decide on intervention. If the blocker is idle in transaction with no valuable work, terminating it is usually safe. If it is a long-running report or backup query, weigh the cost of canceling against the cost of the table remaining locked.
  6. Break the cascade. If the DDL has not yet been granted its lock, cancel the DDL itself to immediately free the queue. Retry later with a safer pattern or a short lock_timeout.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_stat_activity with wait_event_type = 'Lock' on the target relationDirect indicator of lock contentionSustained Lock wait > 30 seconds
pg_locks.granted = false with mode = AccessExclusiveLockShows DDL is queued and blocking subsequent trafficAny ungranted AccessExclusiveLock on a production table
pg_stat_activity.state = 'idle in transaction' with old state_changeIdle transactions hold locks and block vacuum and DDLSessions idle in transaction > 5 minutes
pg_stat_database.deadlocksDeadlocks can form when lock queues interact with concurrent updatesIncrementing deadlocks during DDL windows
Query latency p99 on the tableLock cascades manifest as application timeoutsp99 latency > 10x baseline during schema changes

Fixes

Terminate the blocker

If the blocker is idle in transaction and has been idle longer than your application tolerance, terminate it. This kills that session and breaks the logjam.

-- Terminate an idle-in-transaction blocker
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = <blocker_pid>
  AND state = 'idle in transaction';

If the blocker is an active long-running query, try pg_cancel_backend first. Use pg_terminate_backend only if cancel does not respond.

Cancel the queued DDL to stop the cascade

If the ALTER TABLE is still waiting, cancel it. The lock queue behind it will clear, and normal traffic resumes. Retry the operation during a quieter window or with a safer pattern.

-- Cancel the waiting DDL
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query LIKE '%ALTER TABLE%'
  AND wait_event_type = 'Lock';

Use lock_timeout to fail fast

Before running any DDL, set a short lock_timeout so the statement aborts rather than camping in the queue.

-- Fail DDL if lock is not available in 2 seconds
SET LOCAL lock_timeout = '2s';
ALTER TABLE ...

When the timeout fires, PostgreSQL returns an error and releases the queue position. Retry with exponential backoff from your migration tool.

Replace plain CREATE INDEX with CONCURRENTLY

Plain CREATE INDEX acquires SHARE lock, which blocks writes. CREATE INDEX CONCURRENTLY uses SHARE UPDATE EXCLUSIVE, allowing reads and writes to continue. It takes longer and uses more CPU, but it does not block traffic. It cannot run inside a transaction block.

CREATE INDEX CONCURRENTLY idx_name ON table_name(column_name);

Split constraint addition into NOT VALID and VALIDATE

Adding a CHECK or FOREIGN KEY constraint directly performs a full table scan under ACCESS EXCLUSIVE. Split the operation:

-- Step 1: Add without validating (brief lock, metadata-only change)
ALTER TABLE orders ADD CONSTRAINT check_positive CHECK (amount > 0) NOT VALID;

-- Step 2: Validate in a separate statement (weaker lock, allows reads and writes)
ALTER TABLE orders VALIDATE CONSTRAINT check_positive;

The VALIDATE step acquires SHARE UPDATE EXCLUSIVE, not ACCESS EXCLUSIVE.

Use expand-contract for hazardous changes

For column type changes, renames, or backfill requirements, use the expand-contract pattern. Add the new column or table alongside the old one, backfill asynchronously, switch application logic, then drop the old artifact. This avoids table rewrites and long-lived exclusive locks.

Use online rewrite tools for large tables

When PostgreSQL’s built-in options are insufficient, use pg_repack for online table reorganization. It builds a shadow table, replays changes, and swaps the tables with only a brief ACCESS EXCLUSIVE lock at the end. It requires a primary key or unique NOT NULL index and approximately 2x disk space. PgOnlineSchemaChange (pg-osc) is another option for minimal-locking schema changes on large tables.

Prevention

  • Review DDL in CI. Flag any migration that runs ALTER TABLE without CONCURRENTLY, NOT VALID, or VALIDATE.
  • Set lock_timeout by default. Never run production DDL with an indefinite lock wait. A 1-5 second timeout turns a queue camp into a fast retry.
  • Schedule heavy DDL in maintenance windows. Run risky changes when long-running reports and analytic queries are not active.
  • Prevent idle-in-transaction blockers. Set idle_in_transaction_session_timeout to a reasonable value so zombie sessions cannot hold locks for hours.
  • Maintain a runbook. Document which application PIDs and users are safe to terminate, and who must approve a blocker kill during business hours.

How Netdata helps

  • Netdata collects pg_stat_activity state breakdowns, including Lock waits and idle in transaction sessions, so you can spot blocker patterns without querying the system catalogs.
  • Per-table query latency percentiles help confirm whether a migration caused a traffic stall and whether recovery worked.
  • Correlating pg_stat_database.deadlocks with deployment timestamps exposes schema changes that introduced contention.
  • Connection state charts show the moment a DDL queue consumes available connection headroom, before the application returns FATAL: too many connections errors.