MySQL long-running transactions: detecting and killing the silent blocker

Threads_running climbs. A DDL operation that should take seconds is stuck for minutes. Storage grows steadily with no matching data increase. The cause is often a single idle transaction in INNODB_TRX, holding row locks and an MVCC read view long after its last query finished. It does not appear in the slow query log and may have no active query string. Left alone, it blocks InnoDB purge, inflates the history list, and can trigger a metadata lock cascade that fills the connection pool.

Find the blocker, confirm the damage, and terminate it safely.

What this means

InnoDB uses multi-version concurrency control. When a transaction modifies data, it creates undo log records that let other transactions see a consistent snapshot as of the transaction start. The purge thread asynchronously removes undo records that are no longer needed by any active transaction.

A transaction holds its read view and row-level locks until it commits or rolls back. If it stays open and goes idle, it continues to hold these resources. An idle transaction with trx_query IS NULL and trx_rows_modified > 0 is particularly dangerous: it holds both locks and a read view, causing lock contention and purge lag. The trx_mysql_thread_id column in INNODB_TRX maps to PROCESSLIST.ID, the target for KILL.

Common causes

CauseWhat it looks likeFirst thing to check
Forgotten interactive sessionA developer ran BEGIN and disconnected without committing.trx_query IS NULL in INNODB_TRX and an old trx_started timestamp.
Application connection leakA framework auto-begins a transaction and an exception or missing rollback leaves it open.Same connection host and user pattern in PROCESSLIST; repeated thread IDs over time.
Monitoring or backup toolA monitoring connection uses a transactional isolation level and stays open.USER and HOST in PROCESSLIST match known tools; low trx_rows_modified but long age.
Long analytical query on OLTPA reporting query runs for hours inside an explicit transaction.trx_query shows a large SELECT; trx_rows_locked may be zero but the read view is held.

Quick checks

Run these safe, read-only checks.

# Oldest open transactions
SELECT trx_mysql_thread_id, trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_rows_modified, trx_rows_locked, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC LIMIT 10;
# History list length (purge backlog)
SELECT COUNT FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';
# Alternative for all versions
SHOW ENGINE INNODB STATUS\G
-- Look for "History list length" in the TRANSACTIONS section
# Current row lock waits
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';
# Metadata lock waits (MySQL 8.0+)
SELECT OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS
FROM performance_schema.metadata_locks
WHERE LOCK_STATUS = 'PENDING'
ORDER BY OBJECT_NAME;
# Map thread ID to processlist details
SHOW PROCESSLIST;
-- Match Id to trx_mysql_thread_id from INNODB_TRX

How to diagnose it

  1. Find the oldest transaction. Query information_schema.INNODB_TRX ordered by trx_started ascending. The oldest row is your primary suspect.
  2. Compute its age. TIMESTAMPDIFF(SECOND, trx_started, NOW()). In OLTP, anything over 60 seconds warrants investigation; over 5 minutes is a problem.
  3. Identify the idle-but-holding pattern. If trx_query IS NULL and trx_rows_modified > 0, the transaction is idle and holding both locks and a read view. This is the classic forgotten transaction.
  4. Map to the connection. trx_mysql_thread_id equals PROCESSLIST.ID. Use SHOW PROCESSLIST to identify the user, host, and database. Determine whether the connection is expendable.
  5. Check for lock waits. Query performance_schema.data_lock_waits (MySQL 8.0+) or sys.innodb_lock_waits. If the idle transaction is blocking others, blocking_pid matches trx_mysql_thread_id. If the blocking session is idle, the active query text may be unavailable in lock wait views, so map the thread ID back to INNODB_TRX and PROCESSLIST.
  6. Confirm purge impact. Check trx_rseg_history_len via INNODB_METRICS or SHOW ENGINE INNODB STATUS. If it is above 1,000,000 and growing, the transaction is preventing purge.
  7. Check for metadata lock cascade. Query performance_schema.metadata_locks for PENDING locks on the same table. If a DDL is waiting behind the transaction, all new DML on that table will queue behind it.
  8. Decide on intervention. If the transaction is unapproved, idle, and causing impact, kill it. If it is a legitimate long-running operation, coordinate with the owner rather than killing.
flowchart TD
    A[Idle transaction
trx_query IS NULL
trx_rows_modified > 0] --> B[Holds row locks] A --> C[Holds MVCC read view] C --> D[Blocks purge thread] D --> E[History list grows
undo tablespace expands] B --> F[Blocks DML on same rows] A --> G[Blocks DDL metadata lock] G --> H[Metadata lock cascade] F --> I[Lock waits pile up] E --> J[All reads slow down] H --> K[Connection pool fills] I --> L[Incident] J --> L K --> L

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Open Transaction AgeDirectly identifies the blocker before it causes secondary damage.Any transaction older than 60 seconds in OLTP; idle transaction older than 5 minutes.
History List Length (trx_rseg_history_len)Measures MVCC debt. Growth means purge is blocked.Above 100,000 sustained; above 1,000,000 indicates active blockage.
InnoDB Row Lock Current WaitsConfirms the idle transaction is blocking other work.Greater than 0 sustained for more than 30 seconds with rising Threads_running.
Metadata Lock WaitsCatches the DDL cascade before connection exhaustion.More than 3 sessions pending on the same table for more than 30 seconds.
Undo Tablespace SizeStorage impact of blocked purge.Steady growth correlated with history list length.

Fixes

Kill the blocking connection

If the transaction is expendable, terminate the connection using the trx_mysql_thread_id from INNODB_TRX. This maps to PROCESSLIST.ID.

KILL <trx_mysql_thread_id>;

KILL with no modifier defaults to KILL CONNECTION, which tears down the session and rolls back the transaction. KILL QUERY terminates only the active statement and leaves the connection alive. For an idle transaction with no active query, use KILL CONNECTION.

After the kill, the transaction state in INNODB_TRX shows ROLLING BACK while InnoDB undoes its modifications. Do not restart the server during this phase.

Handle the metadata lock cascade

If a DDL operation is waiting behind the idle transaction and the query queue is growing:

  • Kill the DDL to instantly unblock DML, then retry the schema change during a maintenance window.
  • Kill the idle transaction if it is expendable, allowing the DDL to proceed.

Killing the DDL is the safer immediate relief if the long transaction must survive. Killing the idle transaction is the correct fix if the transaction was abandoned.

Address application-level leaks

Killing treats the symptom. To fix the root cause:

  • Add explicit COMMIT or ROLLBACK in application code paths, including exception handlers.
  • Ensure connection pools return connections to a clean state. Many pools can issue ROLLBACK before returning a connection to the pool.
  • Set wait_timeout and interactive_timeout low enough that idle connections close before they become blockers, but high enough to avoid disrupting legitimate connection pooling.

Prevention

  • Alert on transaction age. Any InnoDB transaction older than 60 seconds in an OLTP workload should trigger a ticket. Idle transactions older than 5 minutes should page.
  • Enforce connection timeouts. wait_timeout closes idle connections that the application abandoned. Pair this with application-side connection validation.
  • Audit autocommit behavior. Applications that explicitly disable autocommit must guarantee commit or rollback on every code path.
  • Schedule DDL during low-traffic windows. Always check INNODB_TRX for old transactions before starting ALTER TABLE.
  • Monitor history list length proactively. A steadily growing history list is an early warning that purge is blocked, even before query latency degrades. If needed, innodb_max_purge_lag can throttle DML to let purge catch up, though this deliberately slows application writes.

How Netdata helps

  • Correlates open transaction age with history list length in the same time window.
  • Surfaces InnoDB row lock waits alongside metadata lock waits to distinguish row-level blocking from DDL cascades.
  • Tracks Threads_running and Questions rate to confirm when a silent blocker is impacting throughput.
  • Alerts on idle-transaction patterns without requiring manual polling of INNODB_TRX.