MySQL purge lag from an idle transaction: the slow bleed
Your OLTP workload slows over hours. No single query dominates the slow log. CPU and disk I/O are not saturated. Yet read latency climbs, throughput drifts downward, and the undo tablespace grows. The culprit is often a single idle transaction that opened a read view and never closed it. Under InnoDB’s default REPEATABLE READ isolation, that transaction pins the purge boundary. Every subsequent write appends to the undo log. The history list length grows without bound, and all MVCC reads traverse longer version chains.
Unlike a checkpoint stall or metadata lock cascade, there is no sudden freeze. Connections stay open and replication may keep pace. The damage accumulates in the undo log, invisible to standard connection monitoring and often absent from the slow query log until the backlog is severe.
What this means
InnoDB stores old row versions in the undo log so that consistent reads can reconstruct data as of a transaction’s start time. A background purge thread removes these versions once no active transaction needs them. The read view of the oldest active transaction defines the horizon. Anything newer cannot be purged.
When a connection issues BEGIN and then sits idle without COMMIT or ROLLBACK, its read view stays open indefinitely. Every INSERT, UPDATE, and DELETE from other connections adds undo records that must be retained. The history list length, visible as trx_rseg_history_len in INFORMATION_SCHEMA.INNODB_METRICS, tracks this backlog. As it grows, read queries spend more time following version chains. The degradation is gradual, proportional to the history length, and affects the whole workload rather than one query.
InnoDB must retain every old row version from the point of the oldest active transaction forward. A DELETE does not physically remove the row. It marks it and writes an undo record. An UPDATE copies the old row to undo. Under REPEATABLE READ, a transaction that started at 09:00 sees the database exactly as it was at 09:00, so purge cannot touch anything newer. If that transaction is still open at 15:00, six hours of writes remain in the history list. Each consistent read must skip over those versions, adding CPU and I/O overhead that grows linearly with the backlog.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Developer or ORM abandoned transaction | Connection pool is fully utilized while Threads_running stays low; INNODB_TRX shows an old transaction with trx_query = NULL | SELECT * FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 1 |
| Connection pool returning without rollback | Transaction age spikes after application exceptions or restarts; the pool validates liveness but does not reset transaction state | Pool configuration for rollbackOnReturn or connection validation queries |
| Monitoring or reporting tool holding a snapshot | A long-running trx_mysql_thread_id maps to a monitoring host, BI client, or backup subnet | SHOW PROCESSLIST Host column for known tooling subnets |
mysqldump --single-transaction | History list rises during backup windows; the oldest transaction belongs to mysqldump | Backup schedule overlap with write-heavy periods |
Quick checks
# Current history list length (MySQL 8.0+)
mysql -e "SELECT COUNT FROM information_schema.INNODB_METRICS WHERE NAME = 'trx_rseg_history_len';"
# Alternative via SHOW ENGINE INNODB STATUS
mysql -e "SHOW ENGINE INNODB STATUS\G" | grep "History list length"
# Active InnoDB transactions ordered by age
mysql -e "SELECT trx_mysql_thread_id, trx_started, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds, trx_rows_modified, trx_query FROM information_schema.INNODB_TRX ORDER BY trx_started ASC LIMIT 5;"
# Map transaction threads to client connections
mysql -e "SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE FROM information_schema.PROCESSLIST WHERE ID IN (SELECT trx_mysql_thread_id FROM information_schema.INNODB_TRX);"
# Undo tablespace disk usage (default datadir)
du -sh /var/lib/mysql/undo_*
# Purge lag throttling configuration
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_max_purge_lag';"
# SQL text that started the idle transaction via Performance Schema
# Requires performance_schema and events_statements_history enabled
mysql -e "SELECT t.trx_mysql_thread_id, t.trx_started, s.SQL_TEXT FROM information_schema.INNODB_TRX t JOIN performance_schema.threads th ON th.PROCESSLIST_ID = t.trx_mysql_thread_id LEFT JOIN performance_schema.events_statements_history s ON s.THREAD_ID = th.THREAD_ID WHERE t.trx_started < NOW() - INTERVAL 5 MINUTE ORDER BY t.trx_started ASC LIMIT 1;"
How to diagnose it
- Sample
trx_rseg_history_lentwice, one minute apart. If it is growing, purge is falling behind writes. - Query
information_schema.INNODB_TRXordered bytrx_started. The oldest row is the read view pinning the purge horizon. - Check
trx_query. If it isNULL, the transaction is idle between statements but still holds its read view. - Map
trx_mysql_thread_idtoinformation_schema.PROCESSLISTto see the user, host, and whether the command isSleep. - If the source of the transaction is unclear, query
performance_schema.events_statements_historyvia the thread mapping to find the SQL that initiated it. - Check undo tablespace size on disk to gauge space impact.
- If the old transaction is a replication thread, do not kill it without planning for replica rebuild.
- Before killing, consider whether the transaction has modified many rows. A large rollback can itself take minutes and generate significant redo.
flowchart TD
A[History list growing] --> B{Oldest INNODB_TRX idle?}
B -->|Yes| C[KILL thread_id]
B -->|No, active query| D[Long-running query]
B -->|No old trx| E[Check purge threads]
C --> F[History list drains]
D --> G[Wait or optimize]
E --> H[Scale purge threads]Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
trx_rseg_history_len (INNODB_METRICS) | Measures unpurged undo entries and MVCC debt | > 100,000 sustained, or > 1,000,000 and growing |
| Oldest transaction age (INNODB_TRX) | Identifies the read view blocking purge | > 5 minutes in OLTP; > 30 minutes if idle |
| Undo tablespace size on disk | Retained history consumes storage | Rapid growth correlating with history list |
| Read query latency | MVCC traversal overhead slows all reads | Latency rises without lock waits or CPU saturation |
innodb_max_purge_lag | If set above zero, DML is throttled to let purge catch up | Application write latency spikes proportionally |
Fixes
Kill the idle transaction
If the transaction is not a replication thread and is expendable, run KILL <thread_id>;. The history list will begin to drain immediately. Caution: if the transaction modified many rows, the rollback phase can take minutes and generate significant redo. Monitor Innodb_row_lock_time and buffer pool activity after the kill.
Fix the application or pool
Ensure all code paths issue COMMIT or ROLLBACK. Add a connection validation query that checks for open transactions before returning a connection to the pool. Catch exceptions and roll back explicitly.
Reschedule long snapshots
If mysqldump --single-transaction or a reporting tool is the cause, move the job to a replica, a low-traffic window, or use a non-transactional backup method such as a physical snapshot.
Tune purge capacity only when appropriate
If no old transaction exists and the history list grows purely from write volume, you can increase innodb_purge_threads. InnoDB adjusts the active thread count automatically, but you can raise the maximum if the write rate consistently outstrips purge capacity.
Use purge lag throttling deliberately
Setting innodb_max_purge_lag above zero causes InnoDB to delay DML proportionally. This is a backpressure mechanism, not a fix. Raising the threshold allows more backlog but delays the symptom. Lowering it slows the application further. The only durable fix is releasing the read view.
Prevention
- Monitor oldest transaction age. Alert on any
INNODB_TRXentry older than 60 seconds for OLTP workloads. - Enforce connection hygiene. Configure pools to validate connections and rollback on return. Ensure exception handlers always close transactions.
- Review tooling. Audit monitoring and backup scripts for
REPEATABLE READsnapshots that outlive their usefulness. - Choose isolation level with intent. If your application does not require
REPEATABLE READ,READ COMMITTEDreduces read view lifetime and gap locking. Validate semantics before switching. - Set connection timeouts aggressively enough to prune abandoned connections, but not so low that legitimate long queries are terminated.
How Netdata helps
Netdata collects mysql.innodb_history_list_length without parsing SHOW ENGINE INNODB STATUS. It tracks active InnoDB transaction count and age so you can spot the blocker before the history list reaches 1 million. Correlating history list growth with query latency and buffer pool metrics confirms MVCC overhead. Alerts on rising undo tablespace size alongside transaction age distinguish purge lag from generic disk space issues.
Related guides
- How MySQL actually works in production: a mental model for operators: /guides/mysql/how-mysql-works-in-production/
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis: /guides/mysql/mysql-aborted-connections/
- MySQL adaptive hash index latch contention: high CPU, low throughput: /guides/mysql/mysql-adaptive-hash-index-latch-contention/
- MySQL InnoDB buffer pool hit ratio collapse: the cliff edge: /guides/mysql/mysql-buffer-pool-hit-ratio-collapse/
- MySQL slow after restart: buffer pool warm-up and the cold cache: /guides/mysql/mysql-buffer-pool-not-warming-up/
- MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks: /guides/mysql/mysql-buffer-pool-sizing/
- MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure: /guides/mysql/mysql-buffer-pool-wait-free/
- MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches: /guides/mysql/mysql-checkpoint-age-monitoring/
- MySQL connection exhaustion: detection, diagnosis, and prevention: /guides/mysql/mysql-connection-exhaustion/
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck: /guides/mysql/mysql-deadlock-detect-off-high-concurrency/
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction: /guides/mysql/mysql-deadlock-found/
- MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server: /guides/mysql/mysql-flush-tables-with-read-lock-stall/







