MySQL Got error 28 from storage engine / No space left on device — recovery

When Got error 28 from storage engine appears, the underlying filesystem is at 100%. MySQL needs writable space for InnoDB data files, redo logs, binary logs, temporary tables, relay logs, and slow query logs. Once the filesystem is full, writes either fail with an error returned to the client or stall: affected threads wait and retry, logging warnings until an operator frees space or kills the operation.

The most dangerous scenario is the redo log filling up. If checkpoint age approaches capacity, InnoDB forces aggressive synchronous flushing. Under sustained pressure this can trigger an unclean shutdown and crash recovery on restart. Replicas are similarly vulnerable: a full relay log partition stops the I/O thread and lets lag grow unboundedly, risking binlog expiry on the source before catch-up.

What this means

Errno 28 is ENOSPC. MySQL surfaces it as Got error 28 from storage engine, Can't create/write to file, or The table is full. This is not an internal MySQL limit; the underlying filesystem has run out of free blocks (or, less commonly, free inodes, which df -i reports).

For ordinary writes, MySQL pauses the affected thread and retries. Per the MySQL manual, the server checks once a minute whether space has become available and writes a disk-full warning to the error log every 10 minutes. This applies to binary log writes, data file writes, and temporary table creation. The exceptions are REPAIR TABLE, OPTIMIZE TABLE, LOAD DATA INFILE, and ALTER TABLE, which create large temporary files. If the disk fills during one of these operations, MySQL removes the temporary files and marks the table as crashed. For ALTER TABLE, the original table is left unchanged but the operation is lost.

Keep these consumers distinct:

  • Binary logs: sequential files that accumulate until expiry or purge.
  • Redo log: circular, but since MySQL 8.0.30 lives in #innodb_redo/ under the datadir with capacity set by innodb_redo_log_capacity.
  • Relay logs: on replicas, fetched from the source and purged after apply.
  • Tmpdir: hidden files for sorts, joins, and online DDL.
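  • Temporary tablespaces: ibtmp1 (the global temporary tablespace), which grows under heavy temp table load and does not shrink until the server restarts. On MySQL 8.0.13+, session temporary tablespaces live separately in #innodb_temp/.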

flowchart TD
    A[Disk partition reaches 100%] --> B{Which consumer?}
    B -->|Binlogs| C[Binlog writes block<br/>Transactions wait]
    B -->|Redo log| D[Sharp checkpoint forced<br/>Unclean shutdown risk]
    B -->|Tmpdir| E[Temp queries stall<br/>Online DDL aborts]
    B -->|Relay logs| F[Replica IO thread stops<br/>Lag grows unboundedly]
    B -->|Data files| G[All writes fail<br/>Error 28 to clients]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Binary log accumulation | Binlog partition full; replication may break if primary cannot write binlogs | SHOW BINARY LOGS; and SHOW GLOBAL VARIABLES LIKE 'binlog_expire_logs_seconds'; |
| Temporary table spill to tmpdir | Queries with large GROUP BY or ORDER BY create hidden files in tmpdir | SHOW GLOBAL STATUS LIKE 'Created_tmp%'; and SHOW PROCESSLIST; |
| Redo log or temporary tablespace growth | #innodb_redo/ or ibtmp1 consuming unexpected space on the data volume | SHOW ENGINE INNODB STATUS\G and du -sh on the data directory |
| Relay log accumulation on replica | Replica disk full; Seconds_Behind_Source growing without bound | SHOW REPLICA STATUS\G and the Relay_Log_Space field |
| General or slow query log without rotation | A single log file growing to fill its partition | SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file'; and filesystem size |

Quick checks

These checks identify which filesystem and subsystem are responsible.

# Identify the full filesystem
df -h

-- List binlog files and sizes
SHOW BINARY LOGS;

-- Check automatic expiry policy (MySQL 8.0+)
SHOW GLOBAL VARIABLES LIKE 'binlog_expire_logs_seconds';
-- For MySQL 5.7, check the deprecated variable:
SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';

-- Temporary table spill ratio
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';

-- On replicas, check relay log footprint
-- (use SHOW SLAVE STATUS\G before MySQL 8.0.22)
SHOW REPLICA STATUS\G
-- Look for Relay_Log_Space

-- Find long-running operations that may be building temp files
SHOW PROCESSLIST;

-- Check redo log checkpoint age and warnings
SHOW ENGINE INNODB STATUS\G
-- Look for LOG section: Log sequence number vs Last checkpoint at

# Estimate redo log size on MySQL 8.0.30+
du -sh /var/lib/mysql/#innodb_redo/
# Pre-8.0.30 redo logs:
ls -lh /var/lib/mysql/ib_logfile*

# Check global temporary tablespace size
# (session temporary tablespaces are in #innodb_temp/ on MySQL 8.0.13+)
ls -lh /var/lib/mysql/ibtmp1
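
The df step can be narrowed to only the filesystems that matter. The following is a minimal sketch, not a production check: fs_over_threshold is a hypothetical helper name, it reads POSIX-format df -P output on stdin, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print mount points at or above a usage threshold.
# Reads `df -P` output on stdin. Assumes no spaces in device or mount names.
fs_over_threshold() {
    threshold="${1:-80}"
    # df -P columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                 # strip the trailing percent sign
        if (pct + 0 >= t + 0) print $6, pct "%"
    }'
}
```

Run it as df -P | fs_over_threshold 80 and compare the mount points it prints against the MySQL datadir, tmpdir, and log locations.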

How to diagnose it

  1. Confirm the filesystem with df -h. tmpdir, relay logs, and slow query logs often live on separate mount points or on root. Do not assume the data partition is the culprit.

  2. Map the full partition to a MySQL consumer. Binlog directory full: SHOW BINARY LOGS shows large cumulative sizes. tmpdir full: the error log references temp files and errno: 28. Data directory full: inspect #innodb_redo/, ibtmp1, and .ibd files.

  3. Check the error log for the exact file path. MySQL logs the specific file it was writing when ENOSPC occurred.

  4. Correlate with status variables. A high Created_tmp_disk_tables ratio with a full tmpdir points to a runaway query. A large Relay_Log_Space on a replica with normal apply threads means fetch outpaces consumption. Binlogs with no expiry policy point to configuration drift.

  5. Identify whether the growth is sudden or gradual. Sudden spikes usually mean a single large operation (ALTER TABLE, LOAD DATA INFILE, or an unoptimized aggregation). Gradual growth suggests missing expiry, data growth, or undo accumulation from a long transaction.
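
Steps 2 and 3 amount to mapping a file path from the error log to a consumer. The sketch below shows one way to do that with shell pattern matching. classify_path is a hypothetical helper, and the patterns assume default file naming (binlog.NNNNNN or mysql-bin.NNNNNN for binary logs, a relay log basename containing "relay"); adjust them if log_bin or relay_log is customized.

```shell
#!/bin/sh
# Hypothetical helper: guess which MySQL consumer owns a file path.
# Patterns assume default naming conventions and are checked in order.
classify_path() {
    case "$1" in
        *'#innodb_redo'*|*/ib_logfile*)    echo "redo log" ;;
        */ibtmp1)                          echo "global temporary tablespace" ;;
        *'#innodb_temp'*)                  echo "session temporary tablespace" ;;
        *relay*)                           echo "relay log" ;;
        *binlog.[0-9]*|*mysql-bin.[0-9]*)  echo "binary log" ;;
        *'#sql'*)                          echo "temporary file (tmpdir or DDL)" ;;
        *.ibd)                             echo "InnoDB data file" ;;
        *)                                 echo "unknown" ;;
    esac
}
```

For example, classify_path '/var/lib/mysql/binlog.000123' prints binary log. Treat the result as a hint to confirm with du, not as a diagnosis.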

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Disk utilization on data, binlog, tmpdir, and relay partitions | MySQL has no graceful degradation for ENOSPC; writes stop at 100% | Sustained > 80% on any MySQL-related filesystem |
| Binary log size and age | Unexpired binlogs accumulate indefinitely if a replica lags or if expiry is disabled | Oldest binlog age exceeds binlog_expire_logs_seconds by a day or more |
| Created_tmp_disk_tables / Created_tmp_tables ratio | Indicates queries spilling from memory to tmpdir, which can silently fill a partition | Ratio > 25% and climbing |
| Relay_Log_Space | Replicas that cannot apply events fast enough accrue relay logs until the disk fills | Growing monotonically alongside replication lag |
| Checkpoint age / redo log capacity | An undersized redo log forces sharp checkpoints and can cause unclean shutdowns | Checkpoint age > 75% of innodb_redo_log_capacity |
| ibtmp1 size | The InnoDB global temporary tablespace does not shrink while the server is running; sustained temp table load keeps growing it | File size trending upward over days |
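
Two of these signals are simple ratios. The sketch below computes them from values copied out of SHOW GLOBAL STATUS and SHOW ENGINE INNODB STATUS. The function names are hypothetical, and comparing checkpoint age directly against innodb_redo_log_capacity follows the threshold in the table above rather than InnoDB's internal flushing limits.

```shell
#!/bin/sh
# Hypothetical helper: NUMERATOR DENOMINATOR -> percentage with one decimal.
# Prints 0.0 when the denominator is zero.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        if (d + 0 == 0) { print "0.0" } else { printf "%.1f\n", 100 * n / d }
    }'
}

# Temp table spill: Created_tmp_disk_tables as a share of Created_tmp_tables.
tmp_disk_ratio() {
    pct "$1" "$2"
}

# Checkpoint age = "Log sequence number" - "Last checkpoint at",
# expressed as a share of innodb_redo_log_capacity (all values in bytes).
checkpoint_age_pct() {
    awk -v lsn="$1" -v ckpt="$2" -v cap="$3" 'BEGIN {
        if (cap + 0 == 0) { print "0.0" } else { printf "%.1f\n", 100 * (lsn - ckpt) / cap }
    }'
}
```

For example, tmp_disk_ratio 2500 10000 prints 25.0, which sits right at the warning threshold. Both counters are cumulative since server start, so compare deltas between two samples when looking for a trend.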

Fixes

Binary logs consuming the partition

If SHOW BINARY LOGS confirms binlogs are the dominant consumer and replicas have consumed the older files:

-- Verify replica positions first if not using GTID auto-positioning
PURGE BINARY LOGS BEFORE '2024-06-07 00:00:00';

For a permanent fix, set binlog_expire_logs_seconds (MySQL 8.0+) instead of the deprecated expire_logs_days. In MySQL 5.7, expire_logs_days defaults to 0, meaning binlogs never expire.

Tradeoff: Premature purging breaks replication if a replica has not fetched the events. Always confirm replica catch-up before manual purge.
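
Before purging, it helps to know how much space the binlogs actually hold. This is a minimal sketch that sums the File_size column from SHOW BINARY LOGS. binlog_total_bytes is a hypothetical helper and expects the tab-separated, headerless output that mysql -N -B produces.

```shell
#!/bin/sh
# Hypothetical helper: sum binlog sizes in bytes.
# Reads tab-separated rows (Log_name, File_size, ...) on stdin.
binlog_total_bytes() {
    # %.0f avoids integer overflow in awk builds with 32-bit %d formatting
    awk -F '\t' '{ total += $2 } END { printf "%.0f\n", total }'
}
```

Assuming client credentials are already configured, run it as mysql -N -B -e 'SHOW BINARY LOGS' | binlog_total_bytes and compare the result with the used space that df reports for the binlog partition.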

Tmpdir exhausted by a query

Identify the query via SHOW PROCESSLIST. Look for Copying to tmp table or Sending data (reported as executing from MySQL 8.0.17) paired with a heavy GROUP BY, ORDER BY, or DISTINCT. Kill it:

KILL <processlist_id>;

MySQL removes the hidden temp files on abort.

Tradeoff: The operation fails and must be retried. If the query is an online ALTER TABLE, the table is preserved but the alteration is lost.
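
On a busy server the process list is long, so a filter helps. The sketch below picks out threads that have been in a temp-table state for at least a given number of seconds. tmp_table_threads is a hypothetical helper, and it assumes tab-separated SHOW PROCESSLIST output (Id, User, Host, db, Command, Time, State, Info) as produced by mysql -N -B.

```shell
#!/bin/sh
# Hypothetical helper: list threads whose State mentions "tmp table"
# and whose Time is at least MIN_SECONDS (default 60).
# Prints: Id <TAB> Time <TAB> State
tmp_table_threads() {
    min_seconds="${1:-60}"
    awk -F '\t' -v m="$min_seconds" '
        $6 + 0 >= m + 0 && tolower($7) ~ /tmp table/ {
            print $1 "\t" $6 "\t" $7
        }'
}
```

Run it as mysql -N -B -e 'SHOW PROCESSLIST' | tmp_table_threads 120, then review the Info column for each candidate Id before issuing KILL.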

General or slow query log filling the disk

Do not rm the file while MySQL holds the file descriptor. Truncate it instead:

# On the host, after copying the log elsewhere if needed
> /var/lib/mysql/slow.log

Then tell MySQL to reopen the file handle:

FLUSH LOGS;

Tradeoff: Truncation destroys log history. Copy the file to another partition first if forensic analysis is required.
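
The copy-then-truncate sequence can be wrapped so that truncation only happens if the copy succeeded. This is a sketch under stated assumptions: archive_and_truncate is a hypothetical helper, the destination directory is on a different partition with free space, and any lines MySQL writes between the copy and the truncate are lost.

```shell
#!/bin/sh
# Hypothetical helper: copy a log elsewhere, then truncate it in place.
# Usage: archive_and_truncate /path/to/slow.log /other/partition/archive
archive_and_truncate() {
    src="$1"
    dest_dir="$2"
    # Copy first; bail out without truncating if the copy fails.
    cp -p "$src" "$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S)" || return 1
    # Truncate in place: the inode is unchanged, so the descriptor
    # the server holds open remains valid.
    : > "$src"
}
```

Follow it with FLUSH LOGS; as described above so the server reopens the file handle.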

Redo log pressure

You cannot delete redo log files while the server is running. Never remove #innodb_redo/ files or ib_logfile* after an unclean shutdown and expect clean recovery. If the redo log partition is full because capacity is undersized, free space by removing non-essential files on that filesystem, or expand the volume.

On MySQL 8.0.30+, increase capacity dynamically if the server is still responsive:

SET GLOBAL innodb_redo_log_capacity = <new_size>;

Pre-8.0.30, changing redo log size requires a clean shutdown with innodb_fast_shutdown=0 and a restart.

Tradeoff: Increasing redo log capacity consumes more disk permanently. Plan the partition size accordingly.

Relay logs filling the replica disk

If the replica cannot apply events fast enough and the relay log partition is critical, pause the I/O thread to stop fetching new events:

STOP REPLICA IO_THREAD;

This prevents disk exhaustion at the cost of increasing lag. The permanent fix is to increase replica apply capacity (use multi-threaded apply via replica_parallel_workers, which is enabled by default from MySQL 8.0.27; scale CPU/disk) or increase the relay log partition size.

Tradeoff: Stopping the I/O thread accepts unbounded lag. If the source’s binlog expires before you resume, the replica requires a full rebuild.

Prevention

  • Set binary log expiry explicitly. Use binlog_expire_logs_seconds in MySQL 8.0+. Do not rely on defaults, because MySQL 5.7 defaults to never expiring.
  • Monitor tmpdir separately. Place tmpdir on its own partition or volume so that a runaway query cannot take down the data directory.
  • Size redo log capacity for peak write rate. On MySQL 8.0.30+, set innodb_redo_log_capacity so checkpoint age stays below 50% during normal peaks.
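  • Set relay_log_space_limit on replicas. This caps total relay log space: when the limit is reached, the I/O thread pauses fetching until the applier frees relay logs, so set it below the partition size to stop growth before the filesystem fills.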
  • Keep 20% headroom on all MySQL filesystems. Track daily growth rates and alert at 80%, not at 95%.
  • Watch the temp table disk ratio. A climbing Created_tmp_disk_tables ratio is an early signal that queries are spilling and tmpdir pressure is building.
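
The headroom rule turns into a runway estimate once the daily growth rate is known. The sketch below assumes linear growth, which is a simplification, and uses a hypothetical helper name. All sizes must be in the same unit (for example GiB).

```shell
#!/bin/sh
# Hypothetical helper: whole days until usage reaches THRESHOLD_PCT of TOTAL.
# Usage: days_until_threshold TOTAL USED DAILY_GROWTH [THRESHOLD_PCT]
# Assumes linear growth; prints 0 if already past the threshold,
# "inf" if growth is zero or negative.
days_until_threshold() {
    total="$1"; used="$2"; daily="$3"; threshold="${4:-80}"
    awk -v t="$total" -v u="$used" -v g="$daily" -v p="$threshold" 'BEGIN {
        limit = t * p / 100
        if (u + 0 >= limit) { print 0; exit }
        if (g + 0 <= 0)     { print "inf"; exit }
        printf "%d\n", (limit - u) / g      # truncates to whole days
    }'
}
```

For example, days_until_threshold 1000 600 10 80 prints 20: a 1000 GiB volume at 600 GiB, growing 10 GiB per day, reaches the 80% alert line in 20 days.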

How Netdata helps

  • Correlates disk utilization per mount point with MySQL internal signals to distinguish a full binlog volume from a full tmpdir.
  • Tracks Created_tmp_disk_tables and Created_tmp_tables to surface temp-table spill before the partition fills.
  • Monitors Relay_Log_Space alongside replication lag to show when relay log growth outpaces apply rate.
  • Surfaces redo log checkpoint age as a percentage of capacity for early warning before sharp checkpoint stalls.
  • Alerts on disk space thresholds with enough runway to purge binlogs or kill a runaway query before errno 28 hits.