MySQL metadata lock cascade: how one ALTER TABLE freezes a whole table
You run an ALTER TABLE to add an index. Seconds later, health checks fail for queries that touch only that table. Other tables work fine. CPU and disk are idle. The connection pool fills. SHOW ENGINE INNODB STATUS shows no row lock waits. The culprit is a metadata lock cascade, not InnoDB contention.
This happens when DDL requests an exclusive metadata lock (MDL) on a table already protected by a shared MDL held by a long-running or idle transaction. The DDL waits. Subsequent DML on that table queues behind it. The queue grows until the connection pool exhausts and the application times out. Because the outage is isolated to one table and leaves no trace in InnoDB lock metrics, it is easy to misdiagnose as a network blip or a runaway query.
This guide covers formation, diagnosis, and safe remediation.
What this means
MySQL uses metadata locks (MDL) to protect table definitions. When a transaction accesses a table, it acquires a shared MDL. DDL statements such as ALTER TABLE require an exclusive MDL. If a transaction is still open, its shared lock blocks the exclusive request.
Once DDL queues for the exclusive lock, every new DML statement on that table also needs a shared lock. Because the DDL request is already ahead of them in the queue, and because MySQL orders lock requests to prevent writer starvation, the DML sessions pile up behind the DDL. The original transaction may be completely idle, perhaps with trx_query IS NULL, but its MDL is held until the transaction commits or rolls back.
The cascade is contained: queries against other tables continue normally. This partial outage signature is one of the strongest diagnostic clues.
flowchart TD
A[Long-running transaction holds SHARED MDL on table T] --> B[ALTER TABLE requests EXCLUSIVE MDL on T]
B --> C[ALTER enters Waiting for table metadata lock]
C --> D[New DML on T requests SHARED MDL]
D --> E[DML queues behind ALTER]
E --> F[Connection pool fills]
F --> G[Partial outage on table T only]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
ALTER TABLE during peak traffic | SHOW FULL PROCESSLIST shows the ALTER with state Waiting for table metadata lock | The query text in the Info column |
| Idle transaction holding MDL | INNODB_TRX shows an old transaction with trx_query IS NULL | SELECT * FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 1 |
| Monitoring or backup tool with open transaction | A connection from a monitoring host has been open for minutes or hours | SHOW FULL PROCESSLIST Host column mapped to INNODB_TRX |
| ORM auto-begin without commit | Application connections are in Sleep state but still hold open transactions | trx_mysql_thread_id from INNODB_TRX matched to PROCESSLIST.ID |
Quick checks
Run these read-only queries to confirm the pattern.
-- Count pending metadata locks per object
SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, COUNT(*) AS waiting
FROM performance_schema.metadata_locks
WHERE LOCK_STATUS = 'PENDING'
GROUP BY OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS;
-- See who is waiting and what they are running
SHOW FULL PROCESSLIST;
-- Find the oldest open transaction (likely the blocker)
SELECT trx_id, trx_mysql_thread_id, trx_started, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC LIMIT 1;
-- Confirm this is NOT an InnoDB row lock problem
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';
-- Check connection saturation
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
-- Check active execution queue
SHOW GLOBAL STATUS LIKE 'Threads_running';
-- Sample throughput: take two readings 10 seconds apart
SHOW GLOBAL STATUS LIKE 'Questions';
How to diagnose it
Confirm the scope. Run
SHOW FULL PROCESSLIST. If many threads showWaiting for table metadata lockand they all reference the same table, while queries against other tables complete normally, you are looking at an MDL cascade.Identify the DDL. Look for an
ALTER TABLE,DROP TABLE,RENAME TABLE, orTRUNCATE TABLEin the processlist. It will usually have a highTimevalue and stateWaiting for table metadata lock. Note itsId.Identify the blocker. Query
information_schema.INNODB_TRXordered bytrx_started. The oldest transaction is almost always the one holding the shared metadata lock. Note itstrx_mysql_thread_id. Cross-reference withSHOW FULL PROCESSLISTto see connection details. The blocker may show no active query because it is idle between statements.Rule out InnoDB row contention. Verify that
Innodb_row_lock_current_waitsis zero or near zero. If it is elevated, you may have a composite problem or a misdiagnosis.Assess cascade severity. Check whether
Threads_connectedis rising towardmax_connectionsand whether theQuestionsrate has dropped sharply from baseline. If both are true, the cascade is active. You may have seconds to minutes before connection exhaustion causes a wider outage.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
performance_schema.metadata_locks with LOCK_STATUS = 'PENDING' | This is the only place MDL waits are visible before the cascade spreads | More than 3 sessions pending on the same object for longer than 30 seconds |
Threads_running | Measures active query concurrency, not idle connections | Rising steadily while the workload appears to stall |
Threads_connected / max_connections | Connection pool saturation | Ratio above 0.90 during an incident |
Questions rate | Throughput collapse | Drop greater than 50% from baseline while active clients remain present |
Innodb_row_lock_current_waits | Distinguishes MDL from InnoDB row contention | Stays near zero during an MDL cascade |
Fixes
Kill the DDL
If the schema change is not urgent, kill the DDL session. This immediately removes the exclusive lock request. Every DML session queued behind it can then acquire its shared lock and resume.
KILL <ddl_processlist_id>;
Warning: This aborts the schema change. Even ALGORITHM=INSTANT requires an exclusive MDL before execution, so it can still seed a cascade. If the DDL had already begun copying data, rollback duration varies by algorithm and table size. In most cases where the DDL was stuck waiting for the MDL, the main phase had not started, so rollback is fast. Verify application recovery before retrying.
Kill the blocking transaction
If the oldest transaction in INNODB_TRX is idle, unimportant, or abandoned, kill it. This releases the shared metadata lock and allows the DDL to proceed.
KILL <blocker_thread_id>;
Warning: Rolling back a long-running transaction can be expensive if it has modified many rows. Check trx_rows_modified in INNODB_TRX before killing. A large value means a brief I/O spike as undo is applied. If the transaction has held a REPEATABLE READ snapshot for a long time, history list length may also lag after rollback. Do not kill replication threads or known backup processes unless you are prepared to restart them.
If you are using an online schema change tool
Tools such as pt-online-schema-change or gh-ost can also trigger MDL waits during their cutover or copy phases. If the tool is holding or waiting for a metadata lock, consult its specific pause or throttle controls rather than killing the MySQL connection directly. Killing the connection may leave a temporary table or trigger behind.
Prevention
- Inspect
INNODB_TRXbefore scheduling DDL. If any transaction is older than a few seconds on the target table, wait or kill it first. - Run DDL during low-traffic windows. Even a briefly idle transaction is enough to seed a cascade during peak load.
- Ensure monitoring and backup tools do not leave transactions open. Explicitly commit or close connections after checks. Avoid
REPEATABLE READsnapshot sessions unless necessary. - Review ORM transaction boundaries. Disable auto-begin if the application issues a
BEGINand then waits for user input or external API calls before committing. - Reserve admin connections. MySQL reserves one connection above
max_connectionsfor users withSUPERorCONNECTION_ADMIN. Ensure your on-call credential has this privilege so you can diagnose even during connection exhaustion.
How Netdata helps
- Correlates
Threads_connected,Threads_running, andQuestionsrate so you can see the cascade form. - Tracks
mysql.connection_errors_max_connectionsto detect when the server has begun rejecting new sessions. - Surfaces
Innodb_row_lock_current_waitsto help rule out InnoDB row contention during diagnosis. - Provides pre-built alerts on
Threads_connectedapproachingmax_connections, giving early warning before the pool exhausts.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction
- MySQL gap locks and next-key locks: surprising deadlocks under REPEATABLE READ
- MySQL ERROR 1205: Lock wait timeout exceeded; try restarting transaction
- MySQL: got a packet bigger than ‘max_allowed_packet’ bytes - causes and fixes
- MySQL monitoring checklist: the signals every production instance needs
- MySQL monitoring maturity model: from survival to expert
- MySQL InnoDB row lock contention: finding who blocks whom
- MySQL Threads_created climbing: thread cache churn and missing pooling
- MySQL Threads_connected vs Threads_running: which one to actually alert on







