PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
An ALTER TABLE to add a column or change a type hangs. Application queries time out. Connection pools saturate. What looked like a simple schema change becomes a production incident.
By default, ALTER TABLE acquires an ACCESS EXCLUSIVE lock. It conflicts with every other lock mode, including ACCESS SHARE held by a plain SELECT. Once the DDL statement queues behind a blocker, every subsequent read and write on that table queues behind the waiting DDL. The table goes offline before the ALTER TABLE executes any work.
The outage lasts until the blocker releases its lock plus the time for the DDL to complete. If the blocker is idle in transaction and has been open for hours, the outage extends until the session is terminated or times out.
flowchart TD
A[Running query holds AccessShareLock] --> B[ALTER TABLE waiting for AccessExclusiveLock]
B --> C[New queries queue here]
B --> D[Application timeouts spike]
E[AccessExclusiveLock blocks all other locks] -.-> ACommon causes
| Cause | What it looks like | First thing to check |
|---|---|---|
Long-running query or transaction holding ACCESS SHARE | ALTER TABLE hangs immediately; application queries were running fine before the DDL | pg_stat_activity for old query_start or xact_start on the target relation |
idle in transaction session holding a relation lock | No active query visible, but pg_locks shows granted locks on the table; state = 'idle in transaction' | pg_stat_activity.state_change age for idle sessions |
| DDL already queued and causing a lock cascade | Multiple sessions in wait_event_type = 'Lock' on the same relation; normally fast queries are stalled | pg_locks for ungranted AccessExclusiveLock and queue depth |
| Unsafe DDL pattern that could use a weaker-lock alternative | CREATE INDEX without CONCURRENTLY, ADD FOREIGN KEY without NOT VALID, or SET NOT NULL directly on a large table | The DDL statement itself for missing CONCURRENTLY, NOT VALID, or VALIDATE keywords |
Quick checks
Run these in order. The first three are read-only.
-- Check 1: Find ungranted locks and the queries waiting for them
SELECT pid, mode, granted, relation::regclass, query_start, query
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE NOT granted
ORDER BY query_start;
-- Check 2: Identify the blocking PIDs behind Lock waits
SELECT DISTINCT unnest(pg_blocking_pids(pid)) AS blocker_pid
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
-- Check 3: Details of the blocking sessions
SELECT pid, usename, state, query_start, xact_start, query
FROM pg_stat_activity
WHERE pid = ANY(
ARRAY(SELECT DISTINCT unnest(pg_blocking_pids(pid)) FROM pg_stat_activity WHERE wait_event_type = 'Lock')
);
-- Check 4: Lock state for a specific table
SELECT mode, granted, pid, query
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE relation = 'my_table'::regclass;
-- Check 5: Current lock timeout (0 means the DDL can wait forever)
SHOW lock_timeout;
How to diagnose it
- Confirm the DDL is waiting. Query
pg_stat_activityfor theALTER TABLEsession. Look forwait_event_type = 'Lock'and aquery_startolder than the command should take. - Find the blocking session. Use
pg_blocking_pids()or joinpg_lockswithpg_stat_activity. Note the blocker’sstate:active,idle in transaction, oridle. - Check transaction age. If
xact_startis old butquery_startis recent, the transaction has been open a long time and may be holding locks despite appearing idle. - Assess queue depth. Count how many sessions are waiting on the same relation. A long queue means a full lock cascade is in progress.
- Decide on intervention. If the blocker is
idle in transactionwith no valuable work, terminating it is usually safe. If it is a long-running report or backup query, weigh the cost of canceling against the cost of the table remaining locked. - Break the cascade. If the DDL has not yet been granted its lock, cancel the DDL itself to immediately free the queue. Retry later with a safer pattern or a short
lock_timeout.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_stat_activity with wait_event_type = 'Lock' on the target relation | Direct indicator of lock contention | Sustained Lock wait > 30 seconds |
pg_locks.granted = false with mode = AccessExclusiveLock | Shows DDL is queued and blocking subsequent traffic | Any ungranted AccessExclusiveLock on a production table |
pg_stat_activity.state = 'idle in transaction' with old state_change | Idle transactions hold locks and block vacuum and DDL | Sessions idle in transaction > 5 minutes |
pg_stat_database.deadlocks | Deadlocks can form when lock queues interact with concurrent updates | Incrementing deadlocks during DDL windows |
| Query latency p99 on the table | Lock cascades manifest as application timeouts | p99 latency > 10x baseline during schema changes |
Fixes
Terminate the blocker
If the blocker is idle in transaction and has been idle longer than your application tolerance, terminate it. This kills that session and breaks the logjam.
-- Terminate an idle-in-transaction blocker
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = <blocker_pid>
AND state = 'idle in transaction';
If the blocker is an active long-running query, try pg_cancel_backend first. Use pg_terminate_backend only if cancel does not respond.
Cancel the queued DDL to stop the cascade
If the ALTER TABLE is still waiting, cancel it. The lock queue behind it will clear, and normal traffic resumes. Retry the operation during a quieter window or with a safer pattern.
-- Cancel the waiting DDL
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query LIKE '%ALTER TABLE%'
AND wait_event_type = 'Lock';
Use lock_timeout to fail fast
Before running any DDL, set a short lock_timeout so the statement aborts rather than camping in the queue.
-- Fail DDL if lock is not available in 2 seconds
SET LOCAL lock_timeout = '2s';
ALTER TABLE ...
When the timeout fires, PostgreSQL returns an error and releases the queue position. Retry with exponential backoff from your migration tool.
Replace plain CREATE INDEX with CONCURRENTLY
Plain CREATE INDEX acquires SHARE lock, which blocks writes. CREATE INDEX CONCURRENTLY uses SHARE UPDATE EXCLUSIVE, allowing reads and writes to continue. It takes longer and uses more CPU, but it does not block traffic. It cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_name ON table_name(column_name);
Split constraint addition into NOT VALID and VALIDATE
Adding a CHECK or FOREIGN KEY constraint directly performs a full table scan under ACCESS EXCLUSIVE. Split the operation:
-- Step 1: Add without validating (brief lock, metadata-only change)
ALTER TABLE orders ADD CONSTRAINT check_positive CHECK (amount > 0) NOT VALID;
-- Step 2: Validate in a separate statement (weaker lock, allows reads and writes)
ALTER TABLE orders VALIDATE CONSTRAINT check_positive;
The VALIDATE step acquires SHARE UPDATE EXCLUSIVE, not ACCESS EXCLUSIVE.
Use expand-contract for hazardous changes
For column type changes, renames, or backfill requirements, use the expand-contract pattern. Add the new column or table alongside the old one, backfill asynchronously, switch application logic, then drop the old artifact. This avoids table rewrites and long-lived exclusive locks.
Use online rewrite tools for large tables
When PostgreSQL’s built-in options are insufficient, use pg_repack for online table reorganization. It builds a shadow table, replays changes, and swaps the tables with only a brief ACCESS EXCLUSIVE lock at the end. It requires a primary key or unique NOT NULL index and approximately 2x disk space. PgOnlineSchemaChange (pg-osc) is another option for minimal-locking schema changes on large tables.
Prevention
- Review DDL in CI. Flag any migration that runs
ALTER TABLEwithoutCONCURRENTLY,NOT VALID, orVALIDATE. - Set lock_timeout by default. Never run production DDL with an indefinite lock wait. A 1-5 second timeout turns a queue camp into a fast retry.
- Schedule heavy DDL in maintenance windows. Run risky changes when long-running reports and analytic queries are not active.
- Prevent idle-in-transaction blockers. Set
idle_in_transaction_session_timeoutto a reasonable value so zombie sessions cannot hold locks for hours. - Maintain a runbook. Document which application PIDs and users are safe to terminate, and who must approve a blocker kill during business hours.
How Netdata helps
- Netdata collects
pg_stat_activitystate breakdowns, includingLockwaits andidle in transactionsessions, so you can spot blocker patterns without querying the system catalogs. - Per-table query latency percentiles help confirm whether a migration caused a traffic stall and whether recovery worked.
- Correlating
pg_stat_database.deadlockswith deployment timestamps exposes schema changes that introduced contention. - Connection state charts show the moment a DDL queue consumes available connection headroom, before the application returns
FATAL: too many connectionserrors.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL deadlock detected: how to diagnose and prevent deadlocks
- PostgreSQL idle in transaction: detecting and killing zombie sessions
- PostgreSQL ERROR: could not obtain lock - diagnosis and recovery
- PostgreSQL monitoring checklist: the signals every production database needs
- PostgreSQL monitoring maturity model: from reactive to self-healing
- PgBouncer pool exhausted: how to diagnose and fix client waits
- PgBouncer vs Pgpool-II vs Odyssey: choosing a PostgreSQL connection pooler
- PostgreSQL FATAL: too many connections - causes and fixes






