MySQL metadata lock cascade: how one ALTER TABLE freezes a whole table

You run an ALTER TABLE to add an index. Seconds later, health checks fail for queries that touch only that table. Other tables work fine. CPU and disk are idle. The connection pool fills. SHOW ENGINE INNODB STATUS shows no row lock waits. The culprit is a metadata lock cascade, not InnoDB contention.

This happens when DDL requests an exclusive metadata lock (MDL) on a table already protected by a shared MDL held by a long-running or idle transaction. The DDL waits. Subsequent DML on that table queues behind it. The queue grows until the connection pool exhausts and the application times out. Because the outage is isolated to one table and leaves no trace in InnoDB lock metrics, it is easy to misdiagnose as a network blip or a runaway query.

This guide covers formation, diagnosis, and safe remediation.

What this means

MySQL uses metadata locks (MDL) to protect table definitions. When a transaction accesses a table, it acquires a shared MDL. DDL statements such as ALTER TABLE require an exclusive MDL. If a transaction is still open, its shared lock blocks the exclusive request.

Once DDL queues for the exclusive lock, every new DML statement on that table also needs a shared lock. Because the DDL request is already ahead of them in the queue, and because MySQL orders lock requests to prevent writer starvation, the DML sessions pile up behind the DDL. The original transaction may be completely idle, perhaps with trx_query IS NULL, but its MDL is held until the transaction commits or rolls back.

The cascade is contained: queries against other tables continue normally. This partial outage signature is one of the strongest diagnostic clues.

flowchart TD
    A[Long-running transaction holds SHARED MDL on table T] --> B[ALTER TABLE requests EXCLUSIVE MDL on T]
    B --> C[ALTER enters Waiting for table metadata lock]
    C --> D[New DML on T requests SHARED MDL]
    D --> E[DML queues behind ALTER]
    E --> F[Connection pool fills]
    F --> G[Partial outage on table T only]

Common causes

CauseWhat it looks likeFirst thing to check
ALTER TABLE during peak trafficSHOW FULL PROCESSLIST shows the ALTER with state Waiting for table metadata lockThe query text in the Info column
Idle transaction holding MDLINNODB_TRX shows an old transaction with trx_query IS NULLSELECT * FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 1
Monitoring or backup tool with open transactionA connection from a monitoring host has been open for minutes or hoursSHOW FULL PROCESSLIST Host column mapped to INNODB_TRX
ORM auto-begin without commitApplication connections are in Sleep state but still hold open transactionstrx_mysql_thread_id from INNODB_TRX matched to PROCESSLIST.ID

Quick checks

Run these read-only queries to confirm the pattern.

-- Count pending metadata locks per object
SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, COUNT(*) AS waiting
FROM performance_schema.metadata_locks
WHERE LOCK_STATUS = 'PENDING'
GROUP BY OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS;
-- See who is waiting and what they are running
SHOW FULL PROCESSLIST;
-- Find the oldest open transaction (likely the blocker)
SELECT trx_id, trx_mysql_thread_id, trx_started, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC LIMIT 1;
-- Confirm this is NOT an InnoDB row lock problem
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';
-- Check connection saturation
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
-- Check active execution queue
SHOW GLOBAL STATUS LIKE 'Threads_running';
-- Sample throughput: take two readings 10 seconds apart
SHOW GLOBAL STATUS LIKE 'Questions';

How to diagnose it

  1. Confirm the scope. Run SHOW FULL PROCESSLIST. If many threads show Waiting for table metadata lock and they all reference the same table, while queries against other tables complete normally, you are looking at an MDL cascade.

  2. Identify the DDL. Look for an ALTER TABLE, DROP TABLE, RENAME TABLE, or TRUNCATE TABLE in the processlist. It will usually have a high Time value and state Waiting for table metadata lock. Note its Id.

  3. Identify the blocker. Query information_schema.INNODB_TRX ordered by trx_started. The oldest transaction is almost always the one holding the shared metadata lock. Note its trx_mysql_thread_id. Cross-reference with SHOW FULL PROCESSLIST to see connection details. The blocker may show no active query because it is idle between statements.

  4. Rule out InnoDB row contention. Verify that Innodb_row_lock_current_waits is zero or near zero. If it is elevated, you may have a composite problem or a misdiagnosis.

  5. Assess cascade severity. Check whether Threads_connected is rising toward max_connections and whether the Questions rate has dropped sharply from baseline. If both are true, the cascade is active. You may have seconds to minutes before connection exhaustion causes a wider outage.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
performance_schema.metadata_locks with LOCK_STATUS = 'PENDING'This is the only place MDL waits are visible before the cascade spreadsMore than 3 sessions pending on the same object for longer than 30 seconds
Threads_runningMeasures active query concurrency, not idle connectionsRising steadily while the workload appears to stall
Threads_connected / max_connectionsConnection pool saturationRatio above 0.90 during an incident
Questions rateThroughput collapseDrop greater than 50% from baseline while active clients remain present
Innodb_row_lock_current_waitsDistinguishes MDL from InnoDB row contentionStays near zero during an MDL cascade

Fixes

Kill the DDL

If the schema change is not urgent, kill the DDL session. This immediately removes the exclusive lock request. Every DML session queued behind it can then acquire its shared lock and resume.

KILL <ddl_processlist_id>;

Warning: This aborts the schema change. Even ALGORITHM=INSTANT requires an exclusive MDL before execution, so it can still seed a cascade. If the DDL had already begun copying data, rollback duration varies by algorithm and table size. In most cases where the DDL was stuck waiting for the MDL, the main phase had not started, so rollback is fast. Verify application recovery before retrying.

Kill the blocking transaction

If the oldest transaction in INNODB_TRX is idle, unimportant, or abandoned, kill it. This releases the shared metadata lock and allows the DDL to proceed.

KILL <blocker_thread_id>;

Warning: Rolling back a long-running transaction can be expensive if it has modified many rows. Check trx_rows_modified in INNODB_TRX before killing. A large value means a brief I/O spike as undo is applied. If the transaction has held a REPEATABLE READ snapshot for a long time, history list length may also lag after rollback. Do not kill replication threads or known backup processes unless you are prepared to restart them.

If you are using an online schema change tool

Tools such as pt-online-schema-change or gh-ost can also trigger MDL waits during their cutover or copy phases. If the tool is holding or waiting for a metadata lock, consult its specific pause or throttle controls rather than killing the MySQL connection directly. Killing the connection may leave a temporary table or trigger behind.

Prevention

  • Inspect INNODB_TRX before scheduling DDL. If any transaction is older than a few seconds on the target table, wait or kill it first.
  • Run DDL during low-traffic windows. Even a briefly idle transaction is enough to seed a cascade during peak load.
  • Ensure monitoring and backup tools do not leave transactions open. Explicitly commit or close connections after checks. Avoid REPEATABLE READ snapshot sessions unless necessary.
  • Review ORM transaction boundaries. Disable auto-begin if the application issues a BEGIN and then waits for user input or external API calls before committing.
  • Reserve admin connections. MySQL reserves one connection above max_connections for users with SUPER or CONNECTION_ADMIN. Ensure your on-call credential has this privilege so you can diagnose even during connection exhaustion.

How Netdata helps

  • Correlates Threads_connected, Threads_running, and Questions rate so you can see the cascade form.
  • Tracks mysql.connection_errors_max_connections to detect when the server has begun rejecting new sessions.
  • Surfaces Innodb_row_lock_current_waits to help rule out InnoDB row contention during diagnosis.
  • Provides pre-built alerts on Threads_connected approaching max_connections, giving early warning before the pool exhausts.