MySQL Innodb_log_waits > 0: the log buffer is too small (not a checkpoint stall)
InnoDB increments Innodb_log_waits every time a thread waits because the log buffer is full. A sustained nonzero rate means the buffer cannot absorb your write burst, or the storage layer cannot drain it fast enough.
This is not a checkpoint stall. Checkpoint stalls happen when the on-disk redo log fills and InnoDB forces synchronous dirty-page flushing. Innodb_log_waits measures in-memory log buffer pressure only. Confusing the two leads to tuning redo log file size while the real problem persists.
What this means
InnoDB writes redo records to the log buffer before commit. The buffer is sized by innodb_log_buffer_size. When a thread cannot reserve space, it waits for the log writer to flush to disk. Each wait increments Innodb_log_waits.
In a healthy system, this counter stays at zero. A sustained positive rate means the buffer is undersized for the write rate, or storage fsync is too slow. A single large transaction can also spike the counter by generating more redo than the buffer holds in one shot.
Checkpoint age pressure is a different mechanism. Checkpoint age tracks how much on-disk redo log space has been written since the last checkpoint. When it nears capacity, InnoDB stalls writes to flush dirty pages. That stall is measured by checkpoint age, not Innodb_log_waits. Both can coexist, but Innodb_log_waits alone does not indicate a checkpoint stall.
flowchart TD
A[Innodb_log_waits > 0] --> B{Checkpoint age high?}
B -->|Yes| C[Checkpoint stall
address redo capacity
and flush rate]
B -->|No| D{Pending fsyncs > 0?}
D -->|Yes| E[Storage fsync bottleneck
check disk latency]
D -->|No| F[Log buffer too small]
F --> G[Check INNODB_TRX
for large transactions]
G --> H[Increase buffer
or split commits]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
innodb_log_buffer_size too small | Sustained Innodb_log_waits during normal write load; rate scales with transaction throughput | SHOW GLOBAL VARIABLES LIKE 'innodb_log_buffer_size'; |
| Slow redo log fsync | Innodb_log_waits paired with Innodb_os_log_pending_fsyncs > 0 or rising disk write latency | SHOW GLOBAL STATUS LIKE 'Innodb_os_log_pending_fsyncs'; plus OS disk metrics |
| Large transaction | Brief spike correlating with a single transaction modifying many rows | SELECT * FROM information_schema.INNODB_TRX ORDER BY trx_rows_modified DESC; |
Quick checks
Run these read-only checks to confirm the signal and separate it from a checkpoint stall.
-- Confirm the counter and current buffer size
SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';
SHOW GLOBAL VARIABLES LIKE 'innodb_log_buffer_size';
-- Check redo write rate and pending I/O
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_pending_fsyncs';
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_pending_writes';
-- Check for large active transactions
SELECT trx_id, trx_mysql_thread_id, trx_started, trx_rows_modified, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_rows_modified DESC LIMIT 5;
-- MySQL 8.0.30+: compute checkpoint age to rule out stall
SELECT
CAST((SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_redo_log_current_lsn') AS UNSIGNED) -
CAST((SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_redo_log_checkpoint_lsn') AS UNSIGNED)
AS checkpoint_age;
-- Pre-8.0.30: read checkpoint age from INNODB_METRICS
-- Note: enable the metric first if it returns zero
SELECT `COUNT` FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME = 'log_lsn_checkpoint_age';
# OS-level write latency on the redo log device
iostat -x 1 5
How to diagnose it
Confirm the signal is sustained. Take two samples of
Innodb_log_waitsone minute apart. A single increment during a burst may be harmless. A steady climb means active pressure.Rule out a checkpoint stall. Compute checkpoint age as a ratio of total redo log capacity. In MySQL 8.0.30+, capacity is
innodb_redo_log_capacity. In earlier versions, it isinnodb_log_file_sizemultiplied byinnodb_log_files_in_group. If the ratio is below 75%, the redo log files are not the problem. If it is above 75%, you have a checkpoint stall in addition to, or instead of, a buffer issue.Check for storage fsync pressure. If
Innodb_os_log_pending_fsyncsorInnodb_os_log_pending_writesis sustained above zero, the disk subsystem cannot keep up with the flush rate. This causes the buffer to back up even if it is nominally large enough.Look for a large transaction. Query
information_schema.INNODB_TRXordered bytrx_rows_modified. A transaction with millions of modified rows can fill the log buffer regardless of the configured size.Baseline the write rate. Divide the delta of
Innodb_os_log_writtenby the elapsed seconds between samples. If the rate is high and steady, the buffer is simply too small for your workload.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
Innodb_log_waits rate | Direct indicator of log buffer pressure | Any sustained nonzero rate |
Innodb_os_log_pending_fsyncs | Storage cannot keep up with redo durability requests | Sustained nonzero value |
Innodb_os_log_pending_writes | Storage cannot keep up with redo log write volume | Sustained nonzero value |
| Checkpoint age / redo capacity ratio | Distinguishes buffer waits from checkpoint stall; both may coexist | > 75% of capacity |
Innodb_os_log_written rate | Quantifies the write load driving buffer turnover | Sudden burst correlating with waits |
| Disk write latency on redo device | Reveals storage-level slowness independent of buffer size | Latency spikes correlating with wait episodes |
Fixes
Increase the log buffer size
If checkpoint age is healthy and storage latency is low, increase innodb_log_buffer_size. The default in MySQL 8.0 is 16 MB. Production write-heavy workloads often need more headroom. This variable is not dynamic; schedule a restart to apply the change.
Reduce redo log fsync latency
If Innodb_os_log_pending_fsyncs is sustained above zero, the bottleneck is storage throughput or latency, not buffer size. Investigate the disk subsystem hosting the redo log. In MySQL 8.0.30+, redo log files live in the #innodb_redo/ directory inside the data directory. In earlier versions, they are the ib_logfile* files in the data directory. If the underlying storage is shared, degraded, or misconfigured, moving to dedicated, low-latency storage may be necessary.
Warning: Reducing fsync frequency by changing
innodb_flush_log_at_trx_commitsacrifices durability and risks data loss on crash. Do not use this as a permanent fix.
Split large transactions
If a single transaction in INNODB_TRX is responsible for the spike, break the work into smaller commits. Each commit flushes the buffer, preventing it from filling monotonically during one massive operation.
Prevention
- Monitor
Innodb_log_waitsas a binary health signal. Zero should be the steady state. - Baseline
Innodb_os_log_writtenduring your peak write window before deploying new write-heavy features. - Track OS-level disk write latency on the volume hosting the redo log so storage degradation does not masquerade as a buffer sizing issue.
- Review application batch jobs and ETL processes to ensure they commit in bounded-size chunks.
How Netdata helps
- Netdata collects
Innodb_log_waitsandInnodb_os_log_pending_fsyncsfromSHOW GLOBAL STATUS. - Disk latency charts on the redo log partition are correlated with the MySQL metrics, distinguishing a storage bottleneck from an undersized buffer.
- Checkpoint age is tracked alongside log waits, preventing the common misdiagnosis of checkpoint stalls.
- Per-second resolution catches brief spikes from large transactions that slower polling intervals might smooth over.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction
- MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server
- MySQL gap locks and next-key locks: surprising deadlocks under REPEATABLE READ
- MySQL ERROR 1205: Lock wait timeout exceeded; try restarting transaction
- MySQL long-running transactions: detecting and killing the silent blocker
- MySQL: got a packet bigger than ‘max_allowed_packet’ bytes - causes and fixes
- MySQL metadata lock cascade: how one ALTER TABLE freezes a whole table
- MySQL monitoring checklist: the signals every production instance needs







