MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches
Checkpoint age is the distance between the current InnoDB log sequence number (LSN) and the last checkpoint LSN. It measures how much of the circular redo log is occupied by changes not yet flushed to data files. When this age approaches redo log capacity, InnoDB escalates flushing aggressiveness until it runs out of options and stalls all writes.
Standard MySQL does not expose Innodb_checkpoint_age as a status variable. Depending on version, you must parse SHOW ENGINE INNODB STATUS, enable a disabled-by-default INNODB_METRICS counter, or compute the delta from MySQL 8.0.30+ redo status variables. MySQL 8.0.30 replaced innodb_log_file_size and innodb_log_files_in_group with innodb_redo_log_capacity.
This guide covers how checkpoint age works, why default instrumentation hides the signal, and the thresholds that separate healthy operation from synchronous write stalls.
What checkpoint age is and why it matters
InnoDB uses a write-ahead log. Every modification is appended to the redo log before it is applied to buffer pool pages. The redo log is a fixed-size circular buffer. To reuse a portion of the log, InnoDB must flush the corresponding dirty pages to the tablespace and advance the checkpoint LSN.
Checkpoint age is:
checkpoint_age = current_lsn - last_checkpoint_lsn
This is how far write activity has outpaced the background page cleaner threads. As the age grows, InnoDB’s adaptive flushing algorithm works harder. The margin between healthy and stalled is narrow. The operational baseline considers 50% of capacity comfortable, 75% the start of aggressive flushing, and 85% stall imminent. Above that ratio, InnoDB enters synchronous flush mode and all write operations block until the checkpoint advances.
Unlike buffer pool pressure, which degrades read performance gradually, redo log exhaustion is a hard cliff. A system at 70% checkpoint age can appear fine, then spike to 90% during a write burst and freeze within seconds.
How checkpoint age works
Transactions generate redo records; the log writer appends them to the current log file. Page cleaner threads flush dirty pages in the background. When a flush completes, the checkpoint LSN moves forward, reclaiming log space.
flowchart LR
A[Current LSN] -->|minus| B[Checkpoint LSN]
B --> C[Checkpoint age]
C --> D{Redo capacity}
D -->|below 50%| E[Adaptive flushing]
D -->|50-75%| F[Watch closely]
D -->|75-85%| G[Aggressive flushing]
D -->|above 85%| H[Synchronous stall]Under normal load, adaptive flushing keeps checkpoint age below 50%. During write bursts, bulk loads, or when storage cannot keep up with innodb_io_capacity_max, the age climbs. At around 75%, InnoDB pushes page cleaners toward their maximum I/O rate. Between 75% and 85%, flushing consumes most disk bandwidth and write latency degrades. Above 85%, InnoDB forces a sharp checkpoint, stalling all writes until space is reclaimed.
Do not confuse this with log buffer exhaustion. Innodb_log_waits tracks when the log buffer is too small to hold a transaction’s redo entries. That is a buffer-sizing issue, not a checkpoint capacity issue.
Where it shows up in production
Checkpoint stalls rarely log a clear error. The first symptom is usually application write timeouts while reads remain functional. Threads_running climbs because write transactions are stuck waiting for the checkpoint to advance. Questions rate drops because new writes cannot commit. Disk write latency may spike, but CPU can look idle because transactions are blocked on fsync and page flush.
This pattern is easily mistaken for disk failure or lock contention. The distinguishing feature is the correlation between rising checkpoint age and collapsing write throughput. If you see Threads_running piling up, Innodb_row_lock_current_waits near zero, and buffer pool metrics stable, check the redo log immediately.
Reads continue to serve from the buffer pool or disk, so the application may report a partial outage rather than total unavailability. If you run pt-stalk or a similar high-frequency diagnostic collector, capture SHOW ENGINE INNODB STATUS during the stall. The Log sequence number and Last checkpoint at values will be frozen or advancing only in small jumps.
Checkpoint pressure also appears during unbatched bulk loads, large schema migrations, or heavy batch UPDATE operations. A subtler source is the post-upgrade configuration gap. A team migrating from MySQL 5.7 to 8.0.30+ may carry forward innodb_log_file_size tuned for the old workload without realizing that innodb_redo_log_capacity now controls effective size. The legacy variables are deprecated and no longer authoritative on 8.0.30+. If the new capacity variable is unset, the effective redo space can be smaller than before, and the first heavy write burst triggers an unexpected stall.
Why the signal is invisible by default
Standard MySQL does not ship an Innodb_checkpoint_age status variable.
Before 8.0.30, the only structured source was the log_lsn_checkpoint_age counter in INFORMATION_SCHEMA.INNODB_METRICS, disabled by default. Until you run SET GLOBAL innodb_monitor_enable = 'log_lsn_checkpoint_age', queries return stale or incoherent values. Tools that only poll SHOW GLOBAL STATUS never see the metric.
MySQL 8.0.30 introduced dedicated redo status variables:
Innodb_redo_log_checkpoint_lsnInnodb_redo_log_current_lsnInnodb_redo_log_flushed_to_disk_lsnInnodb_redo_log_logical_sizeInnodb_redo_log_physical_size
These make checkpoint age computable from SHOW GLOBAL STATUS, but only if you manually compute the delta and divide by capacity. The ratio is not exposed directly.
How to collect checkpoint age
Use the method that matches your version.
MySQL 8.0.30 and later:
SELECT
((SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_redo_log_current_lsn') -
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_redo_log_checkpoint_lsn'))
/ @@innodb_redo_log_capacity AS checkpoint_age_ratio;
Before MySQL 8.0.30:
Enable the counter:
SET GLOBAL innodb_monitor_enable = 'log_lsn_checkpoint_age';
Then query:
SELECT COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS
WHERE NAME = 'log_lsn_checkpoint_age';
Capacity is innodb_log_file_size * innodb_log_files_in_group.
All versions (fallback):
SHOW ENGINE INNODB STATUS\G
Parse the LOG section for Log sequence number and Last checkpoint at. Subtract and compare to total capacity.
Thresholds and operational semantics
Monitor checkpoint age as a ratio, never as an absolute LSN.
| Ratio | Zone | Meaning |
|---|---|---|
| below 50% | Comfortable | Normal adaptive flushing, healthy headroom |
| 50-75% | Watch | Trending upward during peaks; plan capacity review |
| above 75% | Aggressive flushing | Page cleaners at high I/O; query latency degrades |
| above 85% | Stall imminent | Synchronous flush likely; writes will freeze |
Alerting severity:
- PLAN: Peak checkpoint age trending upward over weeks.
- TICKET: Checkpoint age above 75% of capacity sustained.
- PAGE: Checkpoint age above 90% of capacity AND commit rate or
Questionsrate is collapsing, sustained for more than 60 seconds. The throughput collapse confirms that writes are stalling, not merely running hot.
Do not page on checkpoint age alone during a known bulk load.
Capacity sizing tradeoffs
Redo log capacity trades write headroom against crash recovery duration. A larger log gives InnoDB more time to flush dirty pages in the background, reducing the risk of synchronous stalls. A larger log also means more redo to scan during crash recovery, extending startup time after an unclean shutdown.
Before 8.0.30, changing capacity required setting innodb_log_file_size and restarting the server. This made proactive resizing expensive, so many operators ran with conservative sizes until the first stall. MySQL 8.0.30 makes innodb_redo_log_capacity dynamic. You can raise it online during a peak, then lower it afterward.
If checkpoint age is spiking and you cannot resize immediately, temporarily increase innodb_io_capacity_max to let page cleaners consume more disk bandwidth. This trades query I/O for flush throughput. Killing the largest running write transaction is an emergency option. Only use it when the transaction is expendable. Neither fixes an undersized log; they buy minutes, not hours.
How Netdata helps
Netdata correlates checkpoint age with related signals:
- Computes the checkpoint age ratio automatically on MySQL 8.0.30+ from
Innodb_redo_log_current_lsnandInnodb_redo_log_checkpoint_lsn. - Surfaces
Innodb_buffer_pool_pages_dirty,Innodb_os_log_fsyncs, and disk write latency on the same chart to distinguish I/O saturation from capacity saturation. - Includes checkpoint age in the MySQL monitoring maturity model, aligned with the PLAN/TICKET/PAGE severity framework.
- Composite alerting requires both high checkpoint age and collapsing commit rate before paging, reducing false alarms during scheduled bulk operations.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction
- MySQL gap locks and next-key locks: surprising deadlocks under REPEATABLE READ
- MySQL ERROR 1205: Lock wait timeout exceeded; try restarting transaction
- MySQL long-running transactions: detecting and killing the silent blocker
- MySQL: got a packet bigger than ‘max_allowed_packet’ bytes - causes and fixes
- MySQL metadata lock cascade: how one ALTER TABLE freezes a whole table
- MySQL monitoring checklist: the signals every production instance needs
- MySQL monitoring maturity model: from survival to expert







