MySQL InnoDB history list length growing: purge lag explained

If query latency climbs across every table and no single query stands out, check the InnoDB history list length. It tracks unpurged undo records: MVCC debt. When purge falls behind, every consistent read walks longer version chains, and the slowdown is global. Unlike a slow query, this debt is invisible to conventional query analysis.

In MySQL 8.0, the programmatic source for this metric is trx_rseg_history_len in information_schema.INNODB_METRICS. Despite being labeled status_counter in the metrics table, trx_rseg_history_len behaves as a gauge: it rises when purge lags and falls when purge catches up.

What it is and why it matters

History list length is the count of unpurged undo log entries in InnoDB rollback segments. Every UPDATE, DELETE, and some INSERT operations create undo records so transactions can roll back and other transactions can read older row versions. These records live in undo tablespaces. In MySQL 8.0 these are automatic and typically reside in the data directory with .ibu extensions. History list length counts the logical undo records; the physical size on disk can remain large even after purge because truncation is asynchronous.

The purge coordinator and its worker threads remove undo records no longer needed by any active transaction. When purge keeps up, the history list stays flat. When purge falls behind, the list grows with the write rate.

The damage is not limited to hot tables. Because InnoDB uses MVCC for all consistent reads, every query touching any InnoDB table may traverse longer version chains to reconstruct the correct snapshot. The symptom is global latency degradation without a single query regression.

In many OLTP workloads, a history list length below 10,000 is benign. Above 100,000 the system is accumulating meaningful MVCC debt. Above 1,000,000 something is actively blocking purge, and read performance suffers fleet-wide.

How it works

When a transaction modifies a row, InnoDB writes the before-image to an undo log record and links it to the row’s rollback pointer. The purge thread discards the record once no active read view needs it.

A read view is the snapshot of committed data a transaction sees. Under the default REPEATABLE READ isolation level, the first consistent read creates a read view that stays open until the transaction commits or rolls back. This is the critical mechanism: one long-running transaction holds a read view, and purge cannot remove undo records newer than that view. One idle transaction blocks global cleanup.

As writes continue, the history list lengthens. Every subsequent modification adds another undo record. MVCC-consistent reads must traverse these chains to find the correct row version. The cost is proportional to chain length, so the slowdown is gradual, predictable, and global.

flowchart TD
    A[DML modifies a row] --> B[Undo log record created]
    B --> C[History list length grows]
    D[Long-running transaction holds read view] --> E[Purge cannot remove old undo]
    E --> C
    C --> F[MVCC reads traverse longer chains]
    F --> G[Query latency degrades across all tables]

Where it shows up in production

The most common trigger is an idle open transaction. A developer runs BEGIN and walks away. An application framework auto-begins a transaction and a code path exits without commit or rollback. A connection pool returns a connection without rolling back. A monitoring tool opens a transactional connection and holds it.

Backups are another expected source. mysqldump --single-transaction acquires a read view and holds it for the dump duration. On a busy primary, this is a deliberate tradeoff, but the history list grows throughout the backup. Long-running reporting queries on the primary under REPEATABLE READ have the same effect.

Less commonly, extreme write pressure outpaces purge even without a long transaction. In these cases the history list rises during write bursts and recovers during quiet periods. If it does not recover, look for a hidden read view or purge competing for I/O and buffer pool resources.

Tradeoffs and when this matters

Switching from REPEATABLE READ to READ COMMITTED narrows the scope and lifetime of read views, reducing the chance that a long-running select blocks purge. Under READ COMMITTED, each consistent read acquires its own fresh read view. This is an application-level consistency tradeoff, not a database knob to flip during an incident.

The innodb_max_purge_lag variable sets a threshold beyond which InnoDB intentionally delays DML. When the history list exceeds this value, write operations are paused to give purge time to catch up. The delay is proportional to the overshoot, capped by innodb_max_purge_lag_delay. This protects against unbounded growth by slowing the application.

Killing the oldest idle connection is usually the fastest recovery. When the connection dies, its transaction rolls back and the read view is released. History list growth stops immediately, and purge begins draining the backlog.

Warning: KILL CONNECTION <thread_id> is disruptive. Before killing, confirm the connection is expendable in information_schema.PROCESSLIST or performance_schema.threads. An idle read-only transaction releases its read view instantly on disconnect. If the transaction has uncommitted writes, InnoDB must roll them back, which can take time and generate additional I/O. Do not kill replication threads, backup connections, or application threads that own state without a maintenance plan.

In MySQL 8.0.14 and later, undo tablespace truncation is automatic but asynchronous. Even after the history list drops, undo files may stay large on disk until truncation completes. innodb_undo_tablespaces is deprecated in 8.0 and ignored; the server manages undo tablespace creation automatically.

Checks during an incident

To confirm purge lag and find the blocking transaction:

Check the current history list length:

SELECT `COUNT` FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';

Find the oldest active transactions, including idle ones:

SELECT trx_id,
       trx_mysql_thread_id,
       trx_state,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS trx_idle_seconds,
       LEFT(trx_query, 100) AS query_preview
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC
LIMIT 5;

Map the thread ID to a connection and host:

SELECT t.trx_mysql_thread_id,
       p.user,
       p.host,
       p.db,
       p.command,
       t.trx_started
FROM information_schema.INNODB_TRX t
JOIN information_schema.PROCESSLIST p
  ON t.trx_mysql_thread_id = p.id
ORDER BY t.trx_started ASC
LIMIT 5;

An idle transaction holding a read view will show trx_query IS NULL and a command of Sleep in PROCESSLIST.

SHOW ENGINE INNODB STATUS also prints the History list length near the top of the output, along with the oldest transaction view. This is useful when querying INNODB_METRICS is slow or blocked.

To release the blocking view after confirming the connection is expendable:

KILL CONNECTION <trx_mysql_thread_id>;

Use KILL CONNECTION, not KILL QUERY. KILL QUERY terminates the current statement but leaves the transaction and its read view open. KILL CONNECTION closes the session and rolls back the transaction.

If the history list remains high after killing the blocker, look for additional idle transactions or check disk I/O and buffer pool pressure for signs that purge is resource-starved.

Signals to watch in production

SignalWhy it mattersWarning sign
trx_rseg_history_len (INNODB_METRICS)Direct measure of unpurged undo debt and MVCC traversal cost> 100,000 sustained, or > 1,000,000 at any time
Oldest transaction age (INNODB_TRX)Identifies the read view blocking purgeAny transaction active > 5 minutes in OLTP; idle with trx_query IS NULL
SHOW ENGINE INNODB STATUS History list lengthSame metric accessible via CLI when querying metrics tables is slow or blockedSame thresholds as trx_rseg_history_len
Read query latencyLonger undo chains slow all consistent readsLatency rising across all tables without a single query regression
Undo tablespace size on diskPhysical footprint of retained undoRapid growth tracking history list spike
Threads_runningQueries slow and pile upRising while throughput is flat or falling

How Netdata helps

  • Netdata collects trx_rseg_history_len from INFORMATION_SCHEMA.INNODB_METRICS and charts it as a gauge, making the slope visible immediately. A steep climb over minutes indicates purge is falling behind writes.
  • Correlate history list length with active transaction age from INNODB_TRX on the same dashboard to identify the blocking read view.
  • Alert on thresholds such as > 100,000 or rapid growth over a 15-minute window, before latency degrades.
  • Overlay read query latency and Threads_running to confirm when purge lag is translating into user-facing impact.
  • Surface SHOW ENGINE INNODB STATUS metrics without manual parsing through the MySQL collector.