MySQL long-running transactions: detecting and killing the silent blocker
Threads_running climbs. A DDL operation that should take seconds is stuck for minutes. Storage grows steadily with no matching data increase. The cause is often a single idle transaction in INNODB_TRX, holding row locks and an MVCC read view long after its last query finished. It does not appear in the slow query log and may have no active query string. Left alone, it blocks InnoDB purge, inflates the history list, and can trigger a metadata lock cascade that fills the connection pool.
Find the blocker, confirm the damage, and terminate it safely.
What this means
InnoDB uses multi-version concurrency control. When a transaction modifies data, it creates undo log records that let other transactions see a consistent snapshot as of the transaction start. The purge thread asynchronously removes undo records that are no longer needed by any active transaction.
A transaction holds its read view and row-level locks until it commits or rolls back. If it stays open and goes idle, it continues to hold these resources. An idle transaction with trx_query IS NULL and trx_rows_modified > 0 is particularly dangerous: it holds both locks and a read view, causing lock contention and purge lag. The trx_mysql_thread_id column in INNODB_TRX maps to PROCESSLIST.ID, the target for KILL.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Forgotten interactive session | A developer ran BEGIN and disconnected without committing. | trx_query IS NULL in INNODB_TRX and an old trx_started timestamp. |
| Application connection leak | A framework auto-begins a transaction and an exception or missing rollback leaves it open. | Same connection host and user pattern in PROCESSLIST; repeated thread IDs over time. |
| Monitoring or backup tool | A monitoring connection uses a transactional isolation level and stays open. | USER and HOST in PROCESSLIST match known tools; low trx_rows_modified but long age. |
| Long analytical query on OLTP | A reporting query runs for hours inside an explicit transaction. | trx_query shows a large SELECT; trx_rows_locked may be zero but the read view is held. |
Quick checks
Run these safe, read-only checks.
# Oldest open transactions
SELECT trx_mysql_thread_id, trx_started,
TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
trx_rows_modified, trx_rows_locked, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC LIMIT 10;
# History list length (purge backlog)
SELECT COUNT FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';
# Alternative for all versions
SHOW ENGINE INNODB STATUS\G
-- Look for "History list length" in the TRANSACTIONS section
# Current row lock waits
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';
# Metadata lock waits (MySQL 8.0+)
SELECT OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS
FROM performance_schema.metadata_locks
WHERE LOCK_STATUS = 'PENDING'
ORDER BY OBJECT_NAME;
# Map thread ID to processlist details
SHOW PROCESSLIST;
-- Match Id to trx_mysql_thread_id from INNODB_TRX
How to diagnose it
- Find the oldest transaction. Query
information_schema.INNODB_TRXordered bytrx_startedascending. The oldest row is your primary suspect. - Compute its age.
TIMESTAMPDIFF(SECOND, trx_started, NOW()). In OLTP, anything over 60 seconds warrants investigation; over 5 minutes is a problem. - Identify the idle-but-holding pattern. If
trx_query IS NULLandtrx_rows_modified > 0, the transaction is idle and holding both locks and a read view. This is the classic forgotten transaction. - Map to the connection.
trx_mysql_thread_idequalsPROCESSLIST.ID. UseSHOW PROCESSLISTto identify the user, host, and database. Determine whether the connection is expendable. - Check for lock waits. Query
performance_schema.data_lock_waits(MySQL 8.0+) orsys.innodb_lock_waits. If the idle transaction is blocking others,blocking_pidmatchestrx_mysql_thread_id. If the blocking session is idle, the active query text may be unavailable in lock wait views, so map the thread ID back toINNODB_TRXandPROCESSLIST. - Confirm purge impact. Check
trx_rseg_history_lenviaINNODB_METRICSorSHOW ENGINE INNODB STATUS. If it is above 1,000,000 and growing, the transaction is preventing purge. - Check for metadata lock cascade. Query
performance_schema.metadata_locksforPENDINGlocks on the same table. If a DDL is waiting behind the transaction, all new DML on that table will queue behind it. - Decide on intervention. If the transaction is unapproved, idle, and causing impact, kill it. If it is a legitimate long-running operation, coordinate with the owner rather than killing.
flowchart TD
A[Idle transaction
trx_query IS NULL
trx_rows_modified > 0] --> B[Holds row locks]
A --> C[Holds MVCC read view]
C --> D[Blocks purge thread]
D --> E[History list grows
undo tablespace expands]
B --> F[Blocks DML on same rows]
A --> G[Blocks DDL metadata lock]
G --> H[Metadata lock cascade]
F --> I[Lock waits pile up]
E --> J[All reads slow down]
H --> K[Connection pool fills]
I --> L[Incident]
J --> L
K --> LMetrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Open Transaction Age | Directly identifies the blocker before it causes secondary damage. | Any transaction older than 60 seconds in OLTP; idle transaction older than 5 minutes. |
History List Length (trx_rseg_history_len) | Measures MVCC debt. Growth means purge is blocked. | Above 100,000 sustained; above 1,000,000 indicates active blockage. |
| InnoDB Row Lock Current Waits | Confirms the idle transaction is blocking other work. | Greater than 0 sustained for more than 30 seconds with rising Threads_running. |
| Metadata Lock Waits | Catches the DDL cascade before connection exhaustion. | More than 3 sessions pending on the same table for more than 30 seconds. |
| Undo Tablespace Size | Storage impact of blocked purge. | Steady growth correlated with history list length. |
Fixes
Kill the blocking connection
If the transaction is expendable, terminate the connection using the trx_mysql_thread_id from INNODB_TRX. This maps to PROCESSLIST.ID.
KILL <trx_mysql_thread_id>;
KILL with no modifier defaults to KILL CONNECTION, which tears down the session and rolls back the transaction. KILL QUERY terminates only the active statement and leaves the connection alive. For an idle transaction with no active query, use KILL CONNECTION.
After the kill, the transaction state in INNODB_TRX shows ROLLING BACK while InnoDB undoes its modifications. Do not restart the server during this phase.
Handle the metadata lock cascade
If a DDL operation is waiting behind the idle transaction and the query queue is growing:
- Kill the DDL to instantly unblock DML, then retry the schema change during a maintenance window.
- Kill the idle transaction if it is expendable, allowing the DDL to proceed.
Killing the DDL is the safer immediate relief if the long transaction must survive. Killing the idle transaction is the correct fix if the transaction was abandoned.
Address application-level leaks
Killing treats the symptom. To fix the root cause:
- Add explicit
COMMITorROLLBACKin application code paths, including exception handlers. - Ensure connection pools return connections to a clean state. Many pools can issue
ROLLBACKbefore returning a connection to the pool. - Set
wait_timeoutandinteractive_timeoutlow enough that idle connections close before they become blockers, but high enough to avoid disrupting legitimate connection pooling.
Prevention
- Alert on transaction age. Any InnoDB transaction older than 60 seconds in an OLTP workload should trigger a ticket. Idle transactions older than 5 minutes should page.
- Enforce connection timeouts.
wait_timeoutcloses idle connections that the application abandoned. Pair this with application-side connection validation. - Audit autocommit behavior. Applications that explicitly disable autocommit must guarantee commit or rollback on every code path.
- Schedule DDL during low-traffic windows. Always check
INNODB_TRXfor old transactions before startingALTER TABLE. - Monitor history list length proactively. A steadily growing history list is an early warning that purge is blocked, even before query latency degrades. If needed,
innodb_max_purge_lagcan throttle DML to let purge catch up, though this deliberately slows application writes.
How Netdata helps
- Correlates open transaction age with history list length in the same time window.
- Surfaces InnoDB row lock waits alongside metadata lock waits to distinguish row-level blocking from DDL cascades.
- Tracks
Threads_runningandQuestionsrate to confirm when a silent blocker is impacting throughput. - Alerts on idle-transaction patterns without requiring manual polling of
INNODB_TRX.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction
- MySQL gap locks and next-key locks: surprising deadlocks under REPEATABLE READ
- MySQL ERROR 1205: Lock wait timeout exceeded; try restarting transaction
- MySQL: got a packet bigger than ‘max_allowed_packet’ bytes - causes and fixes
- MySQL monitoring checklist: the signals every production instance needs
- MySQL monitoring maturity model: from survival to expert
- MySQL InnoDB row lock contention: finding who blocks whom
- MySQL Threads_created climbing: thread cache churn and missing pooling
- MySQL Threads_connected vs Threads_running: which one to actually alert on







