MySQL Waiting for table metadata lock: diagnosing the DDL stall

You run SHOW PROCESSLIST and see a wall of threads in Waiting for table metadata lock. An ALTER TABLE that should finish in seconds hangs for minutes. Queries against one table stop returning, the application connection pool drains, and CPU and disk look calm. This is a metadata lock (MDL) stall. It is a queueing failure, not resource exhaustion. Left alone, it cascades into connection exhaustion and a partial outage where every other table continues to work normally.

What this means

MySQL uses metadata locks to protect table definitions. When a transaction accesses a table, it acquires a shared MDL. DDL statements such as ALTER TABLE, DROP TABLE, RENAME TABLE, or TRUNCATE TABLE need an exclusive MDL. If a long-running transaction or an idle connection with an open transaction holds a shared MDL, the DDL waits. Once the DDL is waiting for the exclusive lock, every new DML query against that table queues behind the DDL request. The shared lock from the original transaction blocks the exclusive request, and the exclusive request blocks all subsequent shared requests. Within minutes the queue grows, Threads_connected climbs toward max_connections, and the Questions rate for that table drops to zero.

Metadata locks are separate from InnoDB row locks. They do not appear in SHOW ENGINE INNODB STATUS or data_lock_waits. That is why the stall is invisible to InnoDB lock monitoring, and why operators search for the exact state string Waiting for table metadata lock.

flowchart TD
    T[Idle transaction with open trx] -->|holds SHARED MDL| Tab[Target table]
    DDL[DDL e.g. ALTER TABLE] -->|needs EXCLUSIVE MDL| Tab
    DML[Incoming DML] -->|needs SHARED MDL| Tab
    T -->|blocks| DDL
    DDL -->|blocks| DML

Common causes

CauseWhat it looks likeFirst thing to check
DDL during active loadOne thread running ALTER TABLE, DROP TABLE, RENAME TABLE, or TRUNCATE TABLE; all waiters name the same tableSHOW FULL PROCESSLIST filtered by the table name
Idle transaction holding shared MDLOldest row in INNODB_TRX has trx_query IS NULL and age in minutes or hoursSELECT trx_mysql_thread_id, trx_started, trx_query FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 5
Backup or dump with table lockingConnection from a backup host running FLUSH TABLES WITH READ LOCK or LOCK TABLESSHOW FULL PROCESSLIST for a backup user or LOCK TABLES statements
ORM or connection pool leakMultiple application connections in Sleep with high Time values and open transactionsSHOW FULL PROCESSLIST for Command = 'Sleep' from application hosts

Quick checks

# Count metadata lock waiters across all tables
mysql -e "SELECT OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, COUNT(*) AS waiting FROM performance_schema.metadata_locks WHERE LOCK_STATUS = 'PENDING' AND OBJECT_TYPE = 'TABLE' GROUP BY OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE;"
# Compare active connections to the configured limit
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW GLOBAL VARIABLES LIKE 'max_connections';"
# Rule out InnoDB row lock contention
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';"
# Find the oldest open transaction
mysql -e "SELECT trx_mysql_thread_id, trx_started, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds, trx_rows_modified, trx_query FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 5;"
# Isolate metadata lock states and DDL in the processlist
mysql -e "SHOW FULL PROCESSLIST" | grep -E "Waiting for table metadata lock|ALTER TABLE|DROP TABLE|RENAME TABLE|TRUNCATE TABLE"
# Sample Questions to check for throughput collapse
mysql -e "SHOW GLOBAL STATUS LIKE 'Questions';"
# Take a second sample after 10 seconds and compute the rate.

How to diagnose it

  1. Confirm the scope. Run SHOW FULL PROCESSLIST. If every waiter names the same table and the state is Waiting for table metadata lock, you have a targeted MDL stall. If many unrelated tables are affected, look for a global read lock from a backup instead.

  2. Find the DDL. Look for a thread whose Info contains ALTER TABLE, DROP TABLE, RENAME TABLE, or TRUNCATE TABLE. Note its Id and Time. If there is no DDL visible, a prior DDL may have timed out and left the queue in a confused state, or the stall is caused by a LOCK TABLES statement.

  3. Find the blocking transaction. Query information_schema.INNODB_TRX ordered by trx_started. The oldest transaction is almost always the blocker. Pay attention to trx_mysql_thread_id and trx_query. If trx_query is NULL, the connection is idle between statements but still holds the lock. This is the most common production scenario.

  4. Map the lock ownership in Performance Schema. Query performance_schema.metadata_locks for rows with LOCK_STATUS = 'GRANTED' and OBJECT_NAME matching the affected table. If this table is empty, the metadata_locks instrument may not be enabled.

  5. Measure the cascade. Compare Threads_running and Threads_connected to max_connections. A rising Threads_connected with a flat or low Threads_running while Questions drops indicates the queue is deepening. Flat Threads_running means work is not progressing; the threads are parked.

  6. Exclude InnoDB row lock contention. Check SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';. If this is zero or low while dozens of threads are stuck, the stall is purely at the MDL layer.

  7. Decide who to kill. If the long-running transaction is expendable, kill it. If it belongs to a critical reporting job or an uninterruptible backup, kill the DDL instead and reschedule the schema change. Consider whether killing a modifying transaction requires downstream reconciliation before you run KILL.

Metrics and signals

SignalWhy it mattersWarning sign
performance_schema.metadata_locks PENDING countDirect measure of MDL queue depth on specific objectsMore than 3 sessions pending on the same table for over 30 seconds
SHOW PROCESSLIST state Waiting for table metadata lockImmediate user-facing symptomSustained presence during any DDL window
Threads_runningActive work piling up behind the lockRising steadily while CPU and disk remain normal
Threads_connected / max_connectionsProximity to connection exhaustionRatio above 0.80 and climbing
Questions rateConfirms throughput collapseDrop greater than 50 percent from baseline while clients remain connected
Innodb_row_lock_current_waitsDistinguishes MDL stalls from row lock contentionNear zero despite many stuck threads

Fixes

Kill the DDL. If the schema change is not urgent, identify the DDL thread ID from SHOW FULL PROCESSLIST and run KILL <thread_id>;. If the DDL has not yet acquired the exclusive lock, this removes the request instantly. If it has already acquired the lock and is executing, KILL initiates rollback; relief is not instant on large tables and may hold the MDL until rollback completes. The tradeoff is that the schema change must be reattempted later, ideally during a maintenance window after confirming no long transactions.

Kill the blocking transaction. If the oldest transaction in INNODB_TRX is idle or expendable, run KILL <trx_mysql_thread_id>;. This rolls back uncommitted work and releases the shared MDL. The DDL then proceeds, followed by the queued DML. The tradeoff is application errors for whatever work was in flight, plus any downstream effects if the transaction had already modified data.

Do not kill the waiting queries. Sessions showing Waiting for table metadata lock are victims, not blockers. Killing them frees no locks and can trigger application retry loops that make the queue worse.

Address backup and monitoring connections. If a backup job caused the stall, stop the backup tool or switch to a non-locking method such as --single-transaction without --lock-tables or --master-data if consistency allows. For monitoring tools that open transactional connections, configure them to use READ COMMITTED and to commit or disconnect promptly.

Prevention

  • Inspect INNODB_TRX before DDL. Always check SELECT COUNT(*), MAX(TIMESTAMPDIFF(SECOND, trx_started, NOW())) AS max_age FROM information_schema.INNODB_TRX; before starting an ALTER TABLE. If transactions exist, wait or notify owners.

  • Keep application transactions short. Ensure ORMs and connection pools commit or rollback promptly. An idle connection with an open transaction is the most common blocker.

  • Reserve an admin connection. MySQL reserves one connection above max_connections for users with SUPER or CONNECTION_ADMIN. Ensure your on-call credential has this privilege so you can diagnose even during connection exhaustion.

  • Schedule DDL during low-traffic windows. Even if the DDL itself is online, the exclusive MDL acquisition at the beginning and end can stall if transactions are active.

  • Monitor open transaction age. Alert on any transaction older than a few minutes in OLTP workloads. This catches idle blockers before they cause a cascade.

How Netdata helps

  • A Threads_running spike with flat CPU and disk points to a lock stall rather than resource saturation.
  • Threads_connected against max_connections warns before connection exhaustion.
  • A Questions rate drop during active hours is visible immediately.
  • Innodb_row_lock_current_waits distinguishes MDL stalls from row lock contention.
  • Historical context shows when a DDL began and when the query backlog started forming.