$ guides / mysql / mysql-crash-recovery-slow-startup ▌

Operations Guides

MySQL stuck in InnoDB crash recovery: why startup hangs after an unclean shutdown

After an unclean shutdown, mysqld may look healthy: the process is running, port 3306 accepts connections, and SHOW GLOBAL STATUS LIKE 'Uptime' counts up. Yet applications time out, SELECT 1 hangs, and the error log prints InnoDB recovery messages. InnoDB is replaying the redo log to make data files consistent, rolling back incomplete transactions, merging the change buffer, and running purge. Until that finishes, queries are rejected.

Recovery time is proportional to unflushed work at crash time. A small redo log and clean transaction set recovers in seconds; gigabytes of redo, many dirty pages, or a long uncommitted transaction can take minutes or hours. Do not mistake the open TCP port for “MySQL is ready.” Do not restart because queries are slow, and do not confuse a single long recovery with a restart loop.

What this means

InnoDB is a write-ahead logging (WAL) engine. Committed changes go to the redo log first; dirty pages flush to tablespaces later. After a crash, the redo log may contain changes not yet in the data files. On startup, InnoDB automatically discovers tablespaces, applies the redo log, rolls back uncommitted transactions, merges the change buffer, and purges old row versions. See InnoDB recovery.

During redo log application, mysqld may accept TCP connections, but queries are rejected. Uptime counts from process start, so it increases while recovery runs. MySQL 8.0+ accepts connections early after redo application and finishes rollback in the background; until then, the server is not usable. If a tablespace referenced by the redo log is missing, recovery terminates. If the redo log is corrupt or rollback cannot complete, recovery may hang indefinitely.

flowchart TD
    A[Unclean shutdown] --> B[Redo log holds unflushed changes]
    B --> C[mysqld starts and listens on 3306]
    C --> D[InnoDB replays redo log]
    D --> E[Queries are rejected]
    E --> F{Recovery complete?}
    F -->|Yes| G[Queries accepted]
    F -->|No| H[Wait, or use force recovery for extraction]

Common causes

Cause	What it looks like	First thing to check
Large redo log after a write burst	Error log shows log scan progressing slowly through many records; recovery takes minutes	Redo log capacity and checkpoint age before the crash
Long-running uncommitted transaction at crash	Error log shows “Rolling back trx id” for one transaction with little other progress	`INNODB_TRX` history or error log for the transaction ID
Missing tablespace referenced in redo	Recovery aborts with “cannot find tablespace”	`.ibd` file exists and `innodb_directories` covers all paths
Corrupt redo log or data page	Repeated errors, assertion failures, or looped “Out of space in the redo log”	Error log for stack traces and corruption messages
Restart loop misidentified as slow recovery	`Uptime` resets every few minutes; error log repeats crash and recovery	Process start time versus `Uptime`; systemd/kubelet restart events

Quick checks

Run these read-only checks before making changes.

# mysqld is up, but recovery may still be running
mysqladmin -u monitor -p status

-- Server uptime
SHOW GLOBAL STATUS LIKE 'Uptime';

# Current recovery phase; adjust path to match log_error
tail -f /var/log/mysql/error.log

-- MySQL 8.0.30+: estimate redo log replay scope
SELECT
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_redo_log_current_lsn') -
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_redo_log_checkpoint_lsn')
  AS checkpoint_age;

-- Redo log capacity
SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_capacity_resized';

# Confirm whether queries execute or block
time mysql -u monitor -p -e "SELECT 1;"

# Compare process age to Uptime to detect a restart loop
ps -o pid,etime,cmd -p $(pgrep -x mysqld)

-- MySQL 8.0+: transactions rolling back after startup
SELECT COUNT(*) FROM information_schema.INNODB_TRX WHERE trx_state = 'ROLLING BACK';

How to diagnose it

Confirm the symptom. mysqld is running, TCP port 3306 is open, but SELECT 1 hangs or errors. Uptime is low and climbing. Do not restart the server; a restart replays redo from the beginning.
Read the error log from crash time forward. Look for “InnoDB: Starting crash recovery,” “Doing recovery,” “Rolling back trx id,” or “Applying log.” The log tells you whether InnoDB is making progress. Progress can be slow on large redo logs.
Estimate recovery scope. On MySQL 8.0.30+, compute checkpoint age from the Quick checks and compare it to Innodb_redo_log_capacity_resized. A checkpoint age close to capacity means large replay. On earlier versions, parse SHOW ENGINE INNODB STATUS for “Log sequence number” and “Last checkpoint at.” There is no built-in ETA; divide the gap by your storage throughput to approximate minutes, not seconds.
Identify long rollback work. If the error log shows one transaction rolling back for a long time, that transaction may have run for hours before the crash. Rollback of a large transaction can take longer than the original transaction. The safe options are to wait, or to start with innodb_force_recovery and drop the offending table. Killing mysqld does not speed this up.
Distinguish a restart loop. Compare the mysqld process start time from ps with Uptime. If Uptime resets while the process is unchanged, MySQL is crashing and recovering in a loop. If the process is recreated, check systemd, Kubernetes, or your orchestrator for back-to-back restarts. Restart loops have different root causes: corrupt system tables, full disk, or invalid configuration.
Handle stuck or corrupt recovery. If the error log shows no progress for a long period, repeated “Out of space in the redo log” messages (MySQL bug #107594), or corruption errors, use innodb_force_recovery. The scale runs from 1 to 6; levels 4 through 6 place InnoDB in read-only mode. Use these only to extract data with mysqldump or SELECT, not for normal writes. See Forcing InnoDB Recovery.
Check tablespace discovery. Recovery must find every tablespace referenced in the redo log. If a file was moved, deleted, or stored outside datadir, recovery terminates. Verify that innodb_directories, datadir, innodb_data_home_dir, and innodb_undo_directory include the correct paths.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`Uptime`	Detects unexpected restarts	Sudden drop or repeated resets within an hour
Checkpoint age / redo log capacity	Scope of redo replay after crash	Gap > 50% of capacity before crash, or large absolute gap
Error log recovery phase	Shows whether recovery is progressing	Same phase and trx id stuck for many minutes
`Questions` rate	Confirms the server is executing queries	Near zero after TCP is listening while connections exist
`Threads_connected`	Misleading during recovery	High value before `Uptime` stabilizes and queries return
`INNODB_TRX` rollback count	Reveals long rollback work	Transactions in `ROLLING BACK` state persisting

Fixes

Wait for automatic recovery

If the error log shows steady progress, wait. Restarting mysqld aborts current recovery and restarts redo replay from the beginning, making the outage longer. Monitor the error log for “ready for connections.” Do not schedule backups or run FLUSH commands during recovery.

Start with `innodb_force_recovery` to extract data

If a runaway rollback blocks recovery, add innodb_force_recovery=3 to my.cnf, start, drop the offending table, then remove the setting and restart. This bypasses normal safety checks. Use it only to extract or remove the blocking object, not to resume production writes. Levels 4 through 6 make InnoDB read-only; plan for a full restore from backup afterward.

Restore missing tablespaces

If recovery terminates because a tablespace is missing, restore the missing .ibd file from backup or place it back in the path recorded in innodb_directories. Tablespace discovery must happen in the original location. If the tablespace is unimportant and has no foreign key relationships, you may be able to discard it after starting with innodb_force_recovery, but that risks data loss.

Break a restart loop

A restart loop is not a slow recovery. Identify the crash trigger from the error log just before each restart: disk full, assertion failure, corrupt data dictionary, or invalid system variable. Fix the underlying condition. Do not raise innodb_force_recovery above 3 unless you are extracting data.

Resize the redo log after recovery

If the root cause was an undersized redo log that repeatedly fills, plan a maintenance window to increase innodb_redo_log_capacity on MySQL 8.0.30+, or innodb_log_file_size and innodb_log_files_in_group on older versions. This does not speed up current recovery, but it reduces future replay size.

Never delete redo logs manually

Deleting #innodb_redo files or old ib_logfile* files to skip recovery causes data loss and can make the server unstartable. Remove redo logs only after a clean shutdown with innodb_fast_shutdown=0 or 1, and only when intentionally reinitializing the instance.

Prevention

Use clean shutdowns: mysqladmin shutdown, systemd graceful stops, or the SHUTDOWN SQL statement. Avoid kill -9 on mysqld.
Before planned maintenance, set innodb_fast_shutdown=0 to perform a full purge and change buffer merge. This shortens recovery on restart at the cost of a slower shutdown. The default of 1 defers that work.
Size the redo log for your write rate. On MySQL 8.0.30+, use innodb_redo_log_capacity. On earlier versions, use innodb_log_file_size and innodb_log_files_in_group. Monitor checkpoint age and keep peak usage below 75% of capacity.
Keep transactions short. Long uncommitted transactions block purge, locks, and create long rollback times after a crash.
Do not move the data directory after a crash. Ensure innodb_directories and related path variables cover all tablespaces, undo files, and external tables.
Monitor Uptime, error log recovery keywords, and process restart events to distinguish a long recovery from a restart loop.
Maintain tested backups. innodb_force_recovery is an extraction tool, not a repair tool; final recovery usually involves a restore.

How Netdata helps

Detects unexpected Uptime drops and restart-loop patterns.
Tracks Innodb_redo_log_current_lsn, Innodb_redo_log_checkpoint_lsn, and capacity on MySQL 8.0.30+, showing replay scope.
Monitors Questions rate and Threads_connected to expose when the server accepts TCP connections but does not execute queries.
Surfaces MySQL error log anomalies, including recovery phase keywords.
Correlates MySQL signals with OS metrics such as disk I/O, memory pressure, and OOM kills to identify the original cause of the unclean shutdown.

The Netdata solution

MySQL monitoring with Netdata

Netdata monitors MySQL and MariaDB with per-second metrics and ML anomaly detection. Track connection usage, query throughput, slow queries, redo-log pressure, and replication lag alongside the host and storage signals that explain them.

See MySQL monitoring → Start monitoring free

MySQL stuck in InnoDB crash recovery: why startup hangs after an unclean shutdown

MySQL stuck in InnoDB crash recovery: why startup hangs after an unclean shutdown

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Wait for automatic recovery

Start with innodb_force_recovery to extract data

Restore missing tablespaces

Break a restart loop

Resize the redo log after recovery

Never delete redo logs manually

Prevention

How Netdata helps

Related guides

MySQL monitoring with Netdata

Start with `innodb_force_recovery` to extract data