MySQL slow commits with idle CPU: redo and binlog fsync pressure
Slow commits or transaction timeouts with idle CPU usually mean the bottleneck is in the durability path, not the query engine.
With innodb_flush_log_at_trx_commit=1 and sync_binlog=1, every COMMIT triggers an InnoDB redo log fsync() and a binary log fsync() before returning to the client. These are sequential, latency-bound operations. When storage cannot drain fsync requests as fast as the server generates them, commits queue up while the CPU waits.
This is not a checkpoint stall. A checkpoint stall happens when the redo log fills and InnoDB must flush dirty pages synchronously to reclaim log capacity. Fsync pressure happens when the log flusher thread cannot complete fsync() calls fast enough to keep up with the commit rate, even though redo log capacity is abundant. Check Innodb_os_log_pending_fsyncs to tell the difference.
What this means
When a client issues COMMIT, InnoDB writes redo entries to the log buffer. A dedicated log flusher thread issues fsync() to make them durable. If binary logging is enabled and sync_binlog >= 1, the server also fsyncs the binary log in the ordered commit pipeline. User threads wait until their log sequence number is flushed.
If storage latency is elevated, the flusher thread falls behind. Pending fsyncs accumulate in OS and device queues. Committing transactions block on the durability barrier. Because the wait is I/O-bound, CPU stays low while query latency spikes.
The critical signals are Innodb_os_log_pending_fsyncs and Innodb_os_log_pending_writes. In a healthy system these are zero. Sustained values greater than zero mean storage cannot keep up with the commit rate.
flowchart TD
A[Client COMMIT] --> B[Redo log buffer write]
B --> C[Redo log fsync]
C --> D[Binlog fsync]
D --> E[Commit returns]
C -.->|Slow storage| F[Pending redo fsyncs]
D -.->|Slow storage| G[Pending binlog fsyncs]
F --> H[Innodb_os_log_pending_fsyncs > 0]
G --> I[Commit latency spikes]
H --> ICommon causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Storage latency spike | Innodb_os_log_pending_fsyncs > 0 sustained; OS disk await elevated; no query plan regressions | iostat -x on the redo and binlog devices |
| Durability settings too strict for storage tier | Every commit triggers two fsyncs; commit rate exceeds storage IOPS ceiling | innodb_flush_log_at_trx_commit and sync_binlog |
| Redo log and binlog sharing the same device | Both sequential fsync streams compete for the same disk queue; latency doubles under load | Filesystem paths for redo log and binlog |
| Shared storage contention | Periodic latency spikes that correlate with neighbor activity; no single query culprit | Hypervisor or cloud volume metrics |
| Network-attached storage latency | Fsync latency measured in tens to hundreds of milliseconds; affects all durable writes | NFS or SAN latency metrics |
Quick checks
-- Pending redo fsyncs and writes
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_pending%';
-- Aggregate pending fsyncs across all tablespaces
SHOW GLOBAL STATUS LIKE 'Innodb_data_pending_fsyncs';
-- Durability configuration
SHOW GLOBAL VARIABLES WHERE Variable_name IN ('innodb_flush_log_at_trx_commit', 'sync_binlog', 'innodb_flush_method');
-- InnoDB FILE I/O for pending log flushes
SHOW ENGINE INNODB STATUS\G
-- Look for: Pending flushes (fsync) log: N
-- Checkpoint age to rule out capacity stall
SHOW ENGINE INNODB STATUS\G
-- Compare Log sequence number to Last checkpoint at
# OS-level disk latency on the transaction log device
iostat -x 1
How to diagnose it
- Confirm the pattern. Check
SHOW GLOBAL STATUSforThreads_runningandThreads_connected. IfThreads_runningis well below the CPU core count, the buffer pool hit ratio is healthy, and queries are still slow, suspect the durability path rather than query complexity or CPU saturation. - Check pending fsyncs. Run
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_pending%';. Sustained nonzero values forInnodb_os_log_pending_fsyncsorInnodb_os_log_pending_writesindicate the redo log flusher thread is behind and fsyncs are stacking up in the storage queue. - Read InnoDB status. Run
SHOW ENGINE INNODB STATUS;and look at theFILE I/Osection.Pending flushes (fsync) log: NwithN > 0confirms the storage layer is not draining fsync requests fast enough. - Rule out checkpoint stall. Calculate checkpoint age from the
LOGsection (Log sequence numberminusLast checkpoint at) and compare it to total redo log capacity. In MySQL 8.0.30+, checkInnodb_redo_log_capacity_resized. If checkpoint age is well below 75% of capacity but fsyncs are pending, the issue is fsync speed, not log capacity. - Quantify the fsync load. Check
innodb_flush_log_at_trx_commitandsync_binlog. If both are1, every commit generates two fsyncs. Multiply your commit rate by two and compare that to the sustained IOPS and latency profile of your storage. If the product exceeds what the device can deliver at low latency, you have a mathematical mismatch. - Inspect disk latency. Run
iostat -x 1on the devices hosting the redo logs and binary logs. Elevatedawaitwith low throughput means the device is struggling with synchronous operations, not bandwidth. Watchaqu-sz(average queue size); values above the device queue-depth limit indicate saturation. - Check for large transactions. Large transactions increase the amount of data flushed per fsync and hold resources in the commit pipeline longer. Review
Binlog_cache_disk_useandinformation_schema.INNODB_TRXfor abnormally large writers or long-running transactions. - Correlate with application metrics. If end-to-end commit latency tracks storage fsync latency one-for-one, the diagnosis is confirmed.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
Innodb_os_log_pending_fsyncs | Direct gauge of redo log flusher thread backlog | Sustained value > 0 |
Innodb_os_log_pending_writes | Pending log writes waiting for I/O | Sustained value > 0 |
Innodb_data_pending_fsyncs | Aggregate fsync backlog across InnoDB data files | Sustained value > 0 |
| Checkpoint age / redo log capacity | Distinguishes fsync pressure from capacity exhaustion | Age < 75% but commits are still slow = fsync speed issue |
Innodb_os_log_fsyncs rate | Baseline for fsync throughput capacity | Rate collapses while pending fsyncs rise = storage is stalling |
Questions or commit rate | Application-visible throughput | Sudden drop with steady Threads_connected while CPU is idle |
Fixes
Reduce fsync frequency
If the storage tier cannot sustain the current fsync rate, relax durability knobs. Each change trades safety for speed.
- Set
innodb_flush_log_at_trx_commit=2. Redo log entries are written to the OS buffer on every commit, butfsync()happens only once per second. This eliminates the per-commit redo fsync. The tradeoff is up to one second of data loss on an OS crash or power failure. It is safe against a mysqld process crash. - Set
sync_binlog=0. The binary log fsync is deferred to the operating system. The tradeoff is risk of binlog corruption or data loss on crash. Alternatively, setsync_binlogto a value greater than 1 to fsync the binlog once every N commits instead of every commit. - Do not change both settings to zero unless you explicitly accept the risk of losing committed transactions.
Isolate redo and binlog I/O
If the redo log and binary log reside on the same physical device, their sequential fsync streams compete for the same disk queue. Move them to separate volumes backed by distinct physical devices or controllers so that a spike in one does not inflate latency for the other. Update innodb_log_group_home_dir and log_bin accordingly; the change requires a restart.
Upgrade or tune storage
If the storage subsystem is the bottleneck, increase its synchronous IOPS capacity. Options include moving to local NVMe, increasing provisioned IOPS on cloud volumes, resolving RAID rebuilds, or moving off oversubscribed shared storage. If the storage has a write-back cache, ensure it is battery-backed or non-volatile before relying on it for durability.
Prevention
- Monitor
Innodb_os_log_pending_fsyncscontinuously. Any sustained nonzero value means storage headroom is gone and commit latency will follow. - Size storage for peak commit rate multiplied by the fsync multiplier imposed by your durability settings. With
innodb_flush_log_at_trx_commit=1andsync_binlog=1, every commit generates two fsyncs. - Keep transaction logs and binlogs on separate volumes when infrastructure allows.
- Avoid colocating high-fsync workloads such as backups, schema migrations, or batch imports on the same storage controller as the transaction logs.
- Track transaction latency in
performance_schemaso that fsync pressure is visible before application timeouts trigger.
How Netdata helps
- Netdata tracks
Innodb_os_log_pending_fsyncsandInnodb_os_log_pending_writeswithout manual polling ofSHOW GLOBAL STATUS. - It plots MySQL commit latency against OS disk
awaitand utilization on the same timeline, confirming whether storage is the bottleneck. - Checkpoint age and pending fsyncs are visualized together, distinguishing a capacity stall from a write-speed stall in one view.
- Alerts on sustained nonzero pending fsyncs provide early warning before commit latency degrades into application timeouts.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction
- MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server
- MySQL gap locks and next-key locks: surprising deadlocks under REPEATABLE READ
- MySQL ERROR 1205: Lock wait timeout exceeded; try restarting transaction
- MySQL long-running transactions: detecting and killing the silent blocker
- MySQL: got a packet bigger than ‘max_allowed_packet’ bytes - causes and fixes
- MySQL metadata lock cascade: how one ALTER TABLE freezes a whole table







