MySQL online DDL still blocking: ALGORITHM, LOCK, and the copy phase
An ALTER TABLE ... ALGORITHM=INPLACE, LOCK=NONE does not guarantee zero disruption. “Online” means selected algorithms permit concurrent DML during the main execution phase, not that the operation is lock-free. Every DDL still transitions through metadata locks (MDL). A long-running transaction holding a shared MDL can stall phase 1 or phase 3, causing new DML to queue behind the DDL. This is the metadata lock cascade.
Operators often assume LOCK=NONE means no blocking. In reality, NONE refers to DML concurrency during phase 2; it does not remove the brief exclusive MDL at commit, and it does not help if phase 1 cannot acquire the initial shared upgradeable lock. When a DDL waits, all subsequent queries on that table queue behind it. On a busy system the connection pool drains within minutes while other tables continue to work, making the root cause non-obvious.
What it is and why it matters
MySQL selects an algorithm based on the operation type and version. Always specify ALGORITHM explicitly; if the server cannot honor it, the statement returns an error rather than silently degrading to a more locking-intensive method.
ALGORITHM=INSTANT modifies only the data dictionary. No rows are copied, and concurrent DML proceeds.
ALGORITHM=INPLACE rebuilds the table internally without creating a full user-visible copy, though it may still rebuild secondary indexes or reorder rows.
ALGORITHM=COPY creates a temporary table, copies every row, and swaps. DML is blocked for the entire duration.
The LOCK clause controls concurrency independently. LOCK=NONE permits concurrent queries and DML. LOCK=SHARED permits queries but blocks DML. LOCK=EXCLUSIVE blocks everything. LOCK=DEFAULT lets MySQL choose the least restrictive level supported. Omitting LOCK is equivalent to LOCK=DEFAULT.
The critical operational point is that INPLACE with LOCK=NONE still requires a brief exclusive MDL at the beginning and end of the operation. The MySQL 8.4 manual notes that an exclusive metadata lock may be taken briefly during execution and is upgraded to exclusive at commit regardless of locking mode. If any session holds a shared MDL on the table, the DDL waits. All new DML then queues behind the DDL’s exclusive request.
How it works
Every online DDL passes through three phases.
Phase 1, initialization, acquires a shared upgradeable metadata lock. MySQL checks feasibility of the requested algorithm and lock level. If another transaction already holds a shared MDL, this phase waits.
Phase 2, execution, performs the schema change. For INPLACE with LOCK=NONE, InnoDB allows concurrent DML by logging changes to an online alter log. If the log exceeds innodb_online_alter_log_max_size, the DDL fails and must be restarted. For INSTANT, only the data dictionary changes. For COPY, there is no concurrent window; DML is blocked throughout.
Phase 3, commit table definition, upgrades the MDL to exclusive. The old table definition is evicted and the new one committed. This exclusive lock is brief, but real. For INPLACE operations that rebuilt the table, InnoDB replays the online alter log during this window. A large innodb_online_alter_log_max_size extends the final exclusive-MDL window because more concurrent DML must be merged.
Under ALGORITHM=INSTANT, only LOCK=DEFAULT is permitted; explicit LOCK=NONE or SHARED produces an error.
flowchart TD
A[Client submits ALTER TABLE] --> B[Phase 1: Initialization
Shared upgradeable MDL]
B --> C{Conflicting shared MDL?}
C -->|Yes| D[DDL waits]
C -->|No| E[Phase 2: Execution
Concurrent DML permitted]
D --> F[New DML queues
behind DDL]
F --> G[Metadata lock cascade]
E --> H[Phase 3: Commit
Brief exclusive MDL]
H --> I[New table definition visible]
E -.->|If commit blocked| DWhere it shows up in production
In production the brief exclusive MDL at commit is normally sub-second and invisible. It becomes visible when a long-running transaction holds a shared MDL. Common culprits are an explicit BEGIN in an interactive session, a reporting query under REPEATABLE READ, a monitoring connection with autocommit=0, or a backup using --single-transaction. The DDL queues behind it. Because the DDL requests an exclusive MDL, every new query on that table queues behind the DDL. The application sees timeouts. Threads_connected climbs toward max_connections. Other tables continue to work, which makes diagnosis confusing.
Identify the blocker before acting:
SELECT thr.processlist_id,
thr.processlist_info AS current_query,
ml.lock_type,
ml.lock_duration
FROM performance_schema.metadata_locks ml
JOIN performance_schema.threads thr ON ml.owner_thread_id = thr.thread_id
WHERE ml.object_name = 'your_table'
AND ml.object_type = 'TABLE'
AND ml.lock_status = 'GRANTED';
If performance_schema is unavailable, SHOW PROCESSLIST showing multiple Waiting for table metadata lock states on the same table confirms the cascade is forming.
External tools are not exempt. pt-online-schema-change creates triggers, copies rows in chunks, and swaps tables. It requires exclusive access during trigger creation and the final rename. It cannot run on tables that already have triggers.
gh-ost streams row-based binlog events instead of using triggers, so it works on tables with existing triggers. It also holds MDL during the copy phase and the final cut-over. Both tools avoid locking the table for the full duration of a COPY alter, but they do not eliminate MDL contention.
Tradeoffs and when to use it
Use ALGORITHM=INSTANT whenever the operation supports it. In MySQL 8.4, this is the default for supported operations such as adding or dropping a column, or renaming a column. It is the least intrusive option. Each INSTANT add or drop column increments TOTAL_ROW_VERSIONS in INFORMATION_SCHEMA.INNODB_TABLES. After 64 instant operations, an explicit ALGORITHM=INSTANT request returns an error; if you omit ALGORITHM, MySQL falls back to INPLACE or COPY. Monitor the counter before running DDL:
SELECT NAME, TOTAL_ROW_VERSIONS
FROM INFORMATION_SCHEMA.INNODB_TABLES
WHERE TOTAL_ROW_VERSIONS > 0;
Rebuild the table with OPTIMIZE TABLE or ALTER TABLE ... ENGINE=InnoDB to reset the counter. Both rebuild the table; schedule them during low traffic.
Use ALGORITHM=INPLACE with LOCK=NONE for operations that INSTANT cannot handle, such as adding a secondary index or changing ROW_FORMAT. Rebuilds consume disk space equal to the original table size in the InnoDB temporary directory. The final exclusive-MDL window grows with the volume of concurrent DML logged during execution. If write volume is high, consider throttling DML or increasing innodb_online_alter_log_max_size before the alter.
Use ALGORITHM=COPY only when you require the most compatible behavior or when INPLACE is not supported. It is fully blocking.
Use pt-online-schema-change or gh-ost when the table is very large, the operation is not supported online, or you cannot tolerate even the brief exclusive MDL of an INPLACE alter. These tools add operational complexity: trigger overhead for pt-osc, binlog parsing overhead for gh-ost, and MDL timeout risk during cut-over on both. Both also generate replication traffic; watch replica lag.
Setting old_alter_table=1 forces COPY behavior and bypasses online DDL entirely. Avoid this in production unless explicitly required for compatibility.
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
performance_schema.metadata_locks pending counts | Early indicator of a cascade developing | >3 sessions in PENDING state on the same table, sustained >30 s |
SHOW PROCESSLIST state Waiting for table metadata lock | Confirms DDL is blocked and queue is forming | Multiple threads on the same table with this state |
Threads_running vs Questions rate | Throughput collapse while load persists | Threads_running rises while Questions rate drops sharply |
Open transaction age (INNODB_TRX) | Finds the long-running transaction holding shared MDL | Transaction >5 min in OLTP; idle trx_query IS NULL with trx_rows_modified > 0 |
Connection utilization (Threads_connected / max_connections) | Cascade consumes connection pool | Ratio >0.95 sustained for >5 min |
trx_rseg_history_len (history list) | Long transactions may also block purge | Growing history list with an old idle transaction |
How Netdata helps
Netdata surfaces Threads_running, Threads_connected, and Questions in real time. Correlate a flat or dropping
[OUTPUT TRUNCATED: Response exceeded output token limit.]







