MySQL purge lag from an idle transaction: the slow bleed

Your OLTP workload slows over hours. No single query dominates the slow log. CPU and disk I/O are not saturated. Yet read latency climbs, throughput drifts downward, and the undo tablespace grows. The culprit is often a single idle transaction that opened a read view and never closed it. Under InnoDB’s default REPEATABLE READ isolation, that transaction pins the purge boundary. Every subsequent write appends to the undo log. The history list length grows without bound, and all MVCC reads traverse longer version chains.

Unlike a checkpoint stall or metadata lock cascade, there is no sudden freeze. Connections stay open and replication may keep pace. The damage accumulates in the undo log, invisible to standard connection monitoring and often absent from the slow query log until the backlog is severe.

What this means

InnoDB stores old row versions in the undo log so that consistent reads can reconstruct data as of a transaction’s start time. A background purge thread removes these versions once no active transaction needs them. The read view of the oldest active transaction defines the horizon. Anything newer cannot be purged.

When a connection issues BEGIN and then sits idle without COMMIT or ROLLBACK, its read view stays open indefinitely. Every INSERT, UPDATE, and DELETE from other connections adds undo records that must be retained. The history list length, visible as trx_rseg_history_len in INFORMATION_SCHEMA.INNODB_METRICS, tracks this backlog. As it grows, read queries spend more time following version chains. The degradation is gradual, proportional to the history length, and affects the whole workload rather than one query.

InnoDB must retain every old row version from the point of the oldest active transaction forward. A DELETE does not physically remove the row. It marks it and writes an undo record. An UPDATE copies the old row to undo. Under REPEATABLE READ, a transaction that started at 09:00 sees the database exactly as it was at 09:00, so purge cannot touch anything newer. If that transaction is still open at 15:00, six hours of writes remain in the history list. Each consistent read must skip over those versions, adding CPU and I/O overhead that grows linearly with the backlog.

Common causes

CauseWhat it looks likeFirst thing to check
Developer or ORM abandoned transactionConnection pool is fully utilized while Threads_running stays low; INNODB_TRX shows an old transaction with trx_query = NULLSELECT * FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 1
Connection pool returning without rollbackTransaction age spikes after application exceptions or restarts; the pool validates liveness but does not reset transaction statePool configuration for rollbackOnReturn or connection validation queries
Monitoring or reporting tool holding a snapshotA long-running trx_mysql_thread_id maps to a monitoring host, BI client, or backup subnetSHOW PROCESSLIST Host column for known tooling subnets
mysqldump --single-transactionHistory list rises during backup windows; the oldest transaction belongs to mysqldumpBackup schedule overlap with write-heavy periods

Quick checks

# Current history list length (MySQL 8.0+)
mysql -e "SELECT COUNT FROM information_schema.INNODB_METRICS WHERE NAME = 'trx_rseg_history_len';"
# Alternative via SHOW ENGINE INNODB STATUS
mysql -e "SHOW ENGINE INNODB STATUS\G" | grep "History list length"
# Active InnoDB transactions ordered by age
mysql -e "SELECT trx_mysql_thread_id, trx_started, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds, trx_rows_modified, trx_query FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 5;"
# Map transaction threads to client connections
mysql -e "SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE FROM information_schema.PROCESSLIST WHERE ID IN (SELECT trx_mysql_thread_id FROM information_schema.INNODB_TRX);"
# Undo tablespace disk usage (default datadir)
du -sh /var/lib/mysql/undo_*
# Purge lag throttling configuration
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_max_purge_lag';"
# SQL text that started the idle transaction via Performance Schema
# Requires performance_schema and events_statements_history enabled
mysql -e "SELECT t.trx_mysql_thread_id, t.trx_started, s.SQL_TEXT FROM information_schema.INNODB_TRX t JOIN performance_schema.threads th ON th.PROCESSLIST_ID = t.trx_mysql_thread_id LEFT JOIN performance_schema.events_statements_history s ON s.THREAD_ID = th.THREAD_ID WHERE t.trx_started < NOW() - INTERVAL 5 MINUTE ORDER BY t.trx_started ASC LIMIT 1;"

How to diagnose it

  1. Sample trx_rseg_history_len twice, one minute apart. If it is growing, purge is falling behind writes.
  2. Query information_schema.INNODB_TRX ordered by trx_started. The oldest row is the read view pinning the purge horizon.
  3. Check trx_query. If it is NULL, the transaction is idle between statements but still holds its read view.
  4. Map trx_mysql_thread_id to information_schema.PROCESSLIST to see the user, host, and whether the command is Sleep.
  5. If the source of the transaction is unclear, query performance_schema.events_statements_history via the thread mapping to find the SQL that initiated it.
  6. Check undo tablespace size on disk to gauge space impact.
  7. If the old transaction is a replication thread, do not kill it without planning for replica rebuild.
  8. Before killing, consider whether the transaction has modified many rows. A large rollback can itself take minutes and generate significant redo.
flowchart TD
    A[History list growing] --> B{Oldest INNODB_TRX idle?}
    B -->|Yes| C[KILL thread_id]
    B -->|No, active query| D[Long-running query]
    B -->|No old trx| E[Check purge threads]
    C --> F[History list drains]
    D --> G[Wait or optimize]
    E --> H[Scale purge threads]

Metrics and signals to monitor

SignalWhy it mattersWarning sign
trx_rseg_history_len (INNODB_METRICS)Measures unpurged undo entries and MVCC debt> 100,000 sustained, or > 1,000,000 and growing
Oldest transaction age (INNODB_TRX)Identifies the read view blocking purge> 5 minutes in OLTP; > 30 minutes if idle
Undo tablespace size on diskRetained history consumes storageRapid growth correlating with history list
Read query latencyMVCC traversal overhead slows all readsLatency rises without lock waits or CPU saturation
innodb_max_purge_lagIf set above zero, DML is throttled to let purge catch upApplication write latency spikes proportionally

Fixes

Kill the idle transaction

If the transaction is not a replication thread and is expendable, run KILL <thread_id>;. The history list will begin to drain immediately. Caution: if the transaction modified many rows, the rollback phase can take minutes and generate significant redo. Monitor Innodb_row_lock_time and buffer pool activity after the kill.

Fix the application or pool

Ensure all code paths issue COMMIT or ROLLBACK. Add a connection validation query that checks for open transactions before returning a connection to the pool. Catch exceptions and roll back explicitly.

Reschedule long snapshots

If mysqldump --single-transaction or a reporting tool is the cause, move the job to a replica, a low-traffic window, or use a non-transactional backup method such as a physical snapshot.

Tune purge capacity only when appropriate

If no old transaction exists and the history list grows purely from write volume, you can increase innodb_purge_threads. InnoDB adjusts the active thread count automatically, but you can raise the maximum if the write rate consistently outstrips purge capacity.

Use purge lag throttling deliberately

Setting innodb_max_purge_lag above zero causes InnoDB to delay DML proportionally. This is a backpressure mechanism, not a fix. Raising the threshold allows more backlog but delays the symptom. Lowering it slows the application further. The only durable fix is releasing the read view.

Prevention

  • Monitor oldest transaction age. Alert on any INNODB_TRX entry older than 60 seconds for OLTP workloads.
  • Enforce connection hygiene. Configure pools to validate connections and rollback on return. Ensure exception handlers always close transactions.
  • Review tooling. Audit monitoring and backup scripts for REPEATABLE READ snapshots that outlive their usefulness.
  • Choose isolation level with intent. If your application does not require REPEATABLE READ, READ COMMITTED reduces read view lifetime and gap locking. Validate semantics before switching.
  • Set connection timeouts aggressively enough to prune abandoned connections, but not so low that legitimate long queries are terminated.

How Netdata helps

Netdata collects mysql.innodb_history_list_length without parsing SHOW ENGINE INNODB STATUS. It tracks active InnoDB transaction count and age so you can spot the blocker before the history list reaches 1 million. Correlating history list growth with query latency and buffer pool metrics confirms MVCC overhead. Alerts on rising undo tablespace size alongside transaction age distinguish purge lag from generic disk space issues.

  • How MySQL actually works in production: a mental model for operators: /guides/mysql/how-mysql-works-in-production/
  • MySQL Aborted_connects and Aborted_clients climbing: diagnosis: /guides/mysql/mysql-aborted-connections/
  • MySQL adaptive hash index latch contention: high CPU, low throughput: /guides/mysql/mysql-adaptive-hash-index-latch-contention/
  • MySQL InnoDB buffer pool hit ratio collapse: the cliff edge: /guides/mysql/mysql-buffer-pool-hit-ratio-collapse/
  • MySQL slow after restart: buffer pool warm-up and the cold cache: /guides/mysql/mysql-buffer-pool-not-warming-up/
  • MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks: /guides/mysql/mysql-buffer-pool-sizing/
  • MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure: /guides/mysql/mysql-buffer-pool-wait-free/
  • MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches: /guides/mysql/mysql-checkpoint-age-monitoring/
  • MySQL connection exhaustion: detection, diagnosis, and prevention: /guides/mysql/mysql-connection-exhaustion/
  • MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck: /guides/mysql/mysql-deadlock-detect-off-high-concurrency/
  • MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction: /guides/mysql/mysql-deadlock-found/
  • MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server: /guides/mysql/mysql-flush-tables-with-read-lock-stall/