MySQL innodb_redo_log_capacity sizing: how big should the redo log be

An undersized InnoDB redo log is a common cause of MySQL write stalls. When redo generation outpaces dirty-page flushing, checkpoint age advances until InnoDB enters synchronous flushing and user writes block.

Since MySQL 8.0.30, innodb_redo_log_capacity replaces innodb_log_file_size and innodb_log_files_in_group. Resizing no longer requires a restart, but the sizing logic is unchanged: capacity must absorb peak write rates without pushing checkpoint age into the synchronous-flush zone.

What it is and why it matters

InnoDB is a write-ahead log engine. Modifications append redo records before dirty pages are written to data files. The redo log is circular: once the oldest entries are flushed during a checkpoint, that space is recycled.

Total redo capacity limits how far the current log sequence number (LSN) can advance ahead of the last checkpoint LSN. This gap is checkpoint age. As checkpoint age grows, InnoDB must flush dirty pages to advance the checkpoint and free log space.

In MySQL 8.0.30+, innodb_redo_log_capacity is the single control. The server maintains 32 files in #innodb_redo/, each sized to capacity / 32. The default is 100 MB, which is too small for most production write workloads.

Before 8.0.30, capacity was fixed at startup by innodb_log_file_size * innodb_log_files_in_group, stored as ib_logfile*. Changing it required shutdown, removal of old log files, and restart. That operational cost led many teams to undersize the log.

How it works

Transactions append redo, advancing the current LSN. Page cleaners flush dirty pages, advancing the checkpoint LSN. The difference is checkpoint age.

InnoDB ramps flushing as checkpoint age grows:

  • Below roughly 50% of capacity: adaptive flushing handles dirty pages without stressing I/O.
  • 50% to 75%: flushing becomes aggressive, consuming additional I/O bandwidth.
  • Above roughly 85%: InnoDB enters synchronous flushing, stalling user writes until the checkpoint advances.

This is a cliff, not a slope. A system running at 80% can hit a wall at 90% during a write burst.

flowchart TD
    W[Write workload appends redo] --> C[Checkpoint age grows]
    C --> Z{Age vs capacity}
    Z -->|"< 50%"| A[Adaptive flushing]
    Z -->|"50-75%"| B[Aggressive flushing]
    Z -->|"> 85%"| S[Synchronous flush stall]
    A --> F[Dirty pages flushed]
    B --> F
    S --> F
    F --> P[Checkpoint LSN advances]
    P --> C

Capacity buys runway for page cleaners to flush at a sustainable rate. It is only useful if storage and page cleaners can actually use that runway.

Sizing methodology

Target at least 300 seconds of peak write rate without checkpoint age exceeding 75% of total capacity. This means the redo log must hold at least 300 seconds of peak redo generation, plus margin for bursts and measurement jitter.

Step 1: Measure peak redo write rate

Use Innodb_os_log_written:

SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';

Wait 60 to 120 seconds during sustained peak load, then sample again:

redo_write_rate = (sample2 - sample1) / elapsed_seconds

Repeat during your highest sustained write window - bulk ETL, batch settlement, or peak OLTP. Average daily load is not relevant.

Step 2: Calculate minimum capacity

minimum_capacity = peak_write_rate * 300

Many operators target significantly more if write bursts are unpredictable or if maintenance windows restrict aggressive flushing. The tradeoff is longer crash recovery.

Step 3: Compare and set

On MySQL 8.0.30+:

SHOW GLOBAL VARIABLES LIKE 'innodb_redo_log_capacity';
SET GLOBAL innodb_redo_log_capacity = <bytes>;

Increasing capacity adds files gradually. Decreasing capacity triggers aggressive flushing to reclaim space before shrinking, which can cause a temporary I/O spike. Resize downward during low-traffic windows.

Before 8.0.30, calculate current capacity from:

SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_log_files_in_group';
-- Capacity = innodb_log_file_size * innodb_log_files_in_group

To change it, update both variables in the configuration file, shut down MySQL, remove the existing ib_logfile* files from the data directory, and start the server.

WARNING: Do not delete ib_logfile* while the server is running. Doing so destroys the redo log and can corrupt the instance. Always take a backup before changing redo log configuration.

Storage and flush alignment

Redo capacity is necessary but not sufficient. If storage cannot flush dirty pages fast enough, a large log only delays the stall. Align three variables with your hardware:

  • innodb_io_capacity: baseline IOPS InnoDB expects from storage.
  • innodb_io_capacity_max: maximum IOPS available for bursts such as adaptive flushing.
  • innodb_page_cleaners: number of threads dedicated to flushing dirty pages. Defaults to innodb_buffer_pool_instances.

If checkpoint age stays above 50% of capacity after you increase log size, the bottleneck is flush bandwidth or page cleaner count. Raise innodb_io_capacity_max to match actual storage throughput, or raise innodb_page_cleaners if storage has headroom but cleaners cannot saturate it.

Tradeoffs and common mistakes

Undersized log. The most common mistake. Teams size for average write rates or inherit a 100 MB default. During peaks, checkpoint age enters aggressive flushing, latency rises, and writes eventually block. The stall is often misdiagnosed as disk failure or lock contention.

Oversized log. A larger log improves stability but extends crash recovery because InnoDB replays more redo on startup. It also increases shutdown time with innodb_fast_shutdown=0 because more dirty pages must be flushed cleanly. Size for your peak headroom target, not the largest value the server accepts.

Reducing capacity at runtime. Decreasing innodb_redo_log_capacity on a busy system triggers space reclamation. Doing this during peak hours can induce the stall you were trying to avoid.

Log buffer saturation. innodb_log_buffer_size must hold redo generated between flushes. If Innodb_log_waits is nonzero, transactions are waiting for log buffer space. This is independent of log file capacity. Increase the buffer size if you see sustained log waits while checkpoint age is healthy.

Signals to watch in production

SignalWhy it mattersWarning sign
Checkpoint age / redo log capacityDirect measure of remaining headroom before stallsSustained > 75%
Innodb_os_log_written ratePeak write throughput used to size capacityTrending up over weeks
Innodb_log_waitsLog buffer is too small for burst writesNonzero sustained rate
Innodb_data_pending_fsyncsStorage cannot keep up with flush demandNonzero sustained
Innodb_buffer_pool_pages_dirty / totalDirty page backlog that must be flushed before checkpoint advances> 50% and climbing

On MySQL 8.0.30+, compute checkpoint age directly from status variables:

SELECT
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_redo_log_current_lsn')
  -
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_redo_log_checkpoint_lsn')
  AS checkpoint_age_bytes;

On pre-8.0.30 instances, parse SHOW ENGINE INNODB STATUS for Log sequence number and Last checkpoint at in the LOG section, or enable the log_lsn_checkpoint_age metric in INFORMATION_SCHEMA.INNODB_METRICS.

How Netdata helps

Netdata correlates checkpoint age as a percentage of innodb_redo_log_capacity with application-visible latency on the same timeline. It tracks Innodb_os_log_written rate for peak-load baselines, alerts when checkpoint age exceeds 75% of capacity, and surfaces Innodb_log_waits and Innodb_buffer_pool_wait_free to separate log buffer pressure from checkpoint stalls. Dirty page ratio is visualized alongside redo utilization to identify whether the bottleneck is log size or flush bandwidth.