MySQL binary logs filling the disk: expiry, lagging replicas, and purge
You get a disk-full alert on the MySQL primary. df -h shows the data partition at 95%, and du points to /var/lib/mysql/binlog.* consuming hundreds of gigabytes. Writes are about to fail.
Binary logs are append-only. MySQL rotates to a new file at max_binlog_size (default 1 GB), but rotation does not delete old files. Deletion happens only through automatic expiry or PURGE BINARY LOGS, and neither removes a file that a connected replica is still reading, or any file after it. In MySQL 5.7, the default is to never expire (expire_logs_days = 0). In 8.0, the default is 30 days (binlog_expire_logs_seconds = 2592000). A replica that lags past the expiry window can therefore keep files on disk long after they should have expired, and growth becomes unbounded.
If you are responding to an incident, confirm whether binlogs can be safely purged without breaking replication, then free space. If you are reading to prevent the next incident, configure explicit expiry and monitor the signals that predict exhaustion.
What this means
The binary log records every data-modifying event on the primary. Replicas pull events via the I/O thread and apply them. MySQL’s purge logic is conservative: it will not remove a binlog file still needed by any connected replica. A single lagging replica can hold the entire retention window hostage. The primary generates new binlog files proportional to write workload. Without purge, disk usage grows linearly.
When the binlog partition fills, the primary cannot write new events and commits stall. On replicas, a full relay log partition stops the I/O thread, causing replication lag to grow. If the primary purges events before a replica has fetched them, that replica cannot resume and must be rebuilt from a fresh backup.
flowchart TD
A[Disk alert on primary] --> B[SHOW BINARY LOGS]
B --> C{Binlogs dominant?}
C -->|No| D[Check ibdata1, tmpdir, undo]
C -->|Yes| E[Check expiry config]
E --> F{Expiry set?}
F -->|No| G[Configure expiry]
F -->|Yes| H[Check replica status]
H --> I{Replica blocking?}
I -->|Yes| J[Fix or remove replica]
I -->|No| K[Check for bulk loads]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| No expiry configured | MySQL 5.7 with expire_logs_days = 0; hundreds of binlog files | SHOW GLOBAL VARIABLES LIKE 'expire_logs%' |
| Lagging replica | Seconds_Behind_Source growing or NULL; Relay_Log_Space accumulating | SHOW REPLICA STATUS\G on every replica |
| Large transactions spilling to disk | Sudden binlog growth after a bulk load; Binlog_cache_disk_use increasing | SHOW GLOBAL STATUS LIKE 'Binlog_cache_%' |
| Delayed replica | SQL_Delay > 0; replica intentionally lags by hours or days | SHOW REPLICA STATUS\G and check SQL_Delay |
Quick checks
# Check filesystem utilization for the MySQL data directory
df -h /var/lib/mysql
-- List binary log files and their sizes (sum File_size for the total)
SHOW BINARY LOGS;
-- Check automatic expiry configuration for your version
SHOW GLOBAL VARIABLES LIKE 'binlog_expire_logs_seconds';
SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';
-- Check replication thread state and lag (8.0.22+)
SHOW REPLICA STATUS\G
-- For MySQL 5.7: SHOW SLAVE STATUS\G
-- Check if large transactions are spilling to disk
SHOW GLOBAL STATUS LIKE 'Binlog_cache_%';
# Check current binlog file sizes directly on disk
du -sh /var/lib/mysql/binlog.*
How to diagnose it
- Confirm binlogs are the primary disk consumer. Run SHOW MASTER STATUS; (SHOW BINARY LOG STATUS; on 8.4 and later) to identify the current file, then SHOW BINARY LOGS; to list all files and sum the File_size column. If binlogs are not the majority of consumed space, investigate ibdata1, undo tablespaces, or temp table spills instead.
- Verify expiry configuration. On MySQL 8.0, binlog_expire_logs_seconds should match your recovery window. On 5.7, expire_logs_days defaults to 0, which means logs accumulate forever.
- Inspect every replica. On each replica, run SHOW REPLICA STATUS\G (or SHOW SLAVE STATUS\G on 5.7). Check Replica_IO_Running, Replica_SQL_Running, Seconds_Behind_Source, and Source_Log_File (Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master, and Master_Log_File on 5.7). A thread that is not Yes, or lag that is growing, means that replica is blocking purge.
- Determine if growth is organic or from a recent burst. Compare the current SHOW BINARY LOGS file count to yesterday's baseline. A sudden spike after a bulk LOAD DATA or large UPDATE indicates a one-time event. Steady linear growth indicates missing expiry or persistent lag.
- Calculate runway before disk full. Measure the daily growth rate of the binlog directory and divide remaining free space by that rate. If the volume also holds data files, redo logs, or temp tables, leave at least 30% headroom.
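The first and last steps above come down to simple arithmetic. The following is a minimal sketch, assuming the mysql client is run in batch mode so that SHOW BINARY LOGS arrives as tab-separated rows with the file name in column 1 and File_size in bytes in column 2. The function names sum_binlog_bytes and runway_days are hypothetical helpers, not MySQL tools.

```shell
# Hypothetical helpers; names and units are illustrative assumptions.

# Sum the File_size column (bytes) from tab-separated SHOW BINARY LOGS rows
# read on stdin, and print "<file count> <total bytes>".
sum_binlog_bytes() {
    awk -F '\t' 'NF >= 2 { n++; total += $2 } END { printf "%d %.0f\n", n, total }'
}

# Whole days until the volume fills: free space divided by daily growth.
# Both arguments must use the same unit (for example GiB).
runway_days() {
    free=$1
    growth_per_day=$2
    awk -v f="$free" -v g="$growth_per_day" 'BEGIN {
        if (g <= 0) { print "inf"; exit }
        printf "%d\n", int(f / g)
    }'
}

# Example against a live server (requires client credentials):
#   mysql -N -B -e 'SHOW BINARY LOGS' | sum_binlog_bytes
#   runway_days 120 15
```

Recording the total once a day gives the growth rate that runway_days expects. The result is rounded down, so treat it as an optimistic upper bound if a bulk load is scheduled.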
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Binary log partition utilization | Direct measure of space exhaustion | > 70% of partition capacity |
| Seconds_Behind_Source | Lagging replicas block automatic purge because the primary must retain logs until the replica reads them | > 300 seconds and growing |
| Oldest binlog age vs expiry window | Confirms automatic purge is actually working and not blocked by a replica | Oldest file age > expiry setting + 1 day |
| Binlog_cache_disk_use rate | Large transactions accelerate space consumption and often replicate slower | Sharp increase after bulk operations |
| Relay_Log_Space on replicas | Predicts cascading replica disk failure; a full replica stops replication and falls further behind | Growing continuously with lag |
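The partition utilization signal can be checked from a cron job or a monitoring hook. This is a rough sketch, assuming POSIX df -P output (one header line, then one line per filesystem with the percentage in column 5). The helper names used_pct_from_df and over_threshold are hypothetical.

```shell
# Hypothetical helpers for the utilization warning in the table above.

# Print the Capacity column (without the % sign) from `df -P <path>`
# output read on stdin.
used_pct_from_df() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage is strictly above the threshold.
# $1 = current usage in percent, $2 = threshold in percent.
over_threshold() {
    [ "$1" -gt "$2" ]
}

# Example: warn above 70% on the volume that holds the binlogs.
#   pct=$(df -P /var/lib/mysql | used_pct_from_df)
#   over_threshold "$pct" 70 && echo "binlog volume at ${pct}%"
```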
Fixes
Configure automatic expiry
For MySQL 8.0, set binlog_expire_logs_seconds to your recovery window. For example, seven days is 604800. For MySQL 5.7, set expire_logs_days explicitly; the default of 0 means logs never expire. Both are dynamic variables: apply with SET GLOBAL and persist in the configuration file. Tradeoff: a shorter window reduces disk usage but narrows the recovery window after a backup.
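As an illustration, the statement for each version can be generated from a retention period in days. This is a sketch, not a required procedure: the function name expiry_statement is hypothetical, and on 8.0 it uses SET PERSIST, which applies the value and also records it in mysqld-auto.cnf, as one possible way to persist the setting.

```shell
# Hypothetical helper: print the SQL that sets binlog expiry.
# $1 = server version ("5.7" or "8.x"), $2 = retention in whole days.
expiry_statement() {
    version=$1
    days=$2
    case "$version" in
        5.7)
            # 5.7 only has expire_logs_days. SET GLOBAL does not survive
            # a restart, so also add the value to the configuration file.
            printf 'SET GLOBAL expire_logs_days = %d;\n' "$days"
            ;;
        8.*)
            # 8.0 takes seconds; SET PERSIST also writes mysqld-auto.cnf.
            printf 'SET PERSIST binlog_expire_logs_seconds = %d;\n' $((days * 86400))
            ;;
        *)
            echo "unsupported version: $version" >&2
            return 1
            ;;
    esac
}

# Example: seven days on 8.0, applied through the mysql client.
#   expiry_statement 8.0 7 | mysql
```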
Safely purge binary logs manually
If disk space is critical and automatic expiry is not removing files fast enough, use PURGE BINARY LOGS BEFORE 'YYYY-MM-DD hh:mm:ss';. You can also use PURGE BINARY LOGS TO 'binlog.000999'; to avoid timestamp ambiguity. The TO form removes every file before the named one and keeps the named file itself. Before running either, verify every replica has consumed events past the cutoff. On each replica, run SHOW REPLICA STATUS\G (or SHOW SLAVE STATUS\G on 5.7) and confirm Source_Log_File (Master_Log_File on 5.7) is at or past the file you name in TO. If a replica is stopped, do not purge past its last read position unless you plan to rebuild that replica.
Warning: do not rm binlog files from the shell. Removing files directly orphans entries in the index file and corrupts MySQL’s log state.
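One way to pick the TO target is to collect the file each replica is currently reading and take the oldest. The sketch below assumes you have already gathered one file name per replica (Source_Log_File, or Master_Log_File on 5.7), including the last value from any stopped replica you want to keep. It also assumes binlog names end in a numeric sequence after the last dot. The function name purge_target is hypothetical.

```shell
# Hypothetical helper: read one binlog file name per line on stdin (the
# file each replica is reading) and print a PURGE statement that keeps
# the oldest of them. Returns 1 when no file names were supplied.
purge_target() {
    awk 'NF {
        n = split($1, parts, ".")      # sequence number follows the last dot
        seq = parts[n] + 0
        if (!seen || seq < min) { min = seq; name = $1; seen = 1 }
    }
    END {
        if (!seen) exit 1
        printf "PURGE BINARY LOGS TO '\''%s'\'';\n", name
    }'
}

# Example: three replicas reading different files.
#   printf 'binlog.000412\nbinlog.000398\nbinlog.000405\n' | purge_target
```

Review the printed statement before running it on the primary. Because the TO form keeps the named file, the slowest replica in the list still finds the file it is reading.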
Address replication lag
If a replica is lagging because of apply bottlenecks, the primary retains binlogs until the replica catches up. Check Seconds_Behind_Source, Relay_Log_Space, and whether the replica's SQL thread is applying events slower than the I/O thread fetches them. For replicas configured with MASTER_DELAY (SOURCE_DELAY from 8.0.23), the delay window directly extends binlog retention on the primary. Size the binlog partition to tolerate the delay window plus normal growth, or reduce the delay.
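To compare these fields across several replicas, it helps to reduce the \G output to the lines that matter. This is a minimal sketch, assuming the 8.0.22+ field names and the usual "name: value" layout of \G output. The function name replica_summary is hypothetical.

```shell
# Hypothetical helper: filter SHOW REPLICA STATUS\G output (on stdin)
# down to the fields used for lag diagnosis, printed as key=value.
replica_summary() {
    awk -F ': ' '
        { key = $1; gsub(/^[ \t]+/, "", key) }   # strip the left padding
        key == "Replica_IO_Running" || key == "Replica_SQL_Running" ||
        key == "Seconds_Behind_Source" || key == "Relay_Log_Space" ||
        key == "Source_Log_File" || key == "SQL_Delay" { print key "=" $2 }
    '
}

# Example against a live replica (requires client credentials):
#   mysql -e 'SHOW REPLICA STATUS\G' | replica_summary
```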
Reduce transaction size
If Binlog_cache_disk_use is increasing, transactions are spilling from memory to disk before being written to the binlog. Break bulk operations into smaller commits. This reduces the per-transaction binlog footprint and often improves replication apply performance on the replica side.
Prevention
- Set explicit expiry on every primary. Do not rely on defaults, especially on MySQL 5.7.
- Monitor binlog partition utilization and alert at 70%, not 90%.
- Monitor replica lag with a heartbeat table or GTID set comparison. Seconds_Behind_Source is unreliable for critical decisions.
- Size the binlog partition with at least 30% free headroom if replication lag is common.
- Track binlog growth rate daily. Bulk loads can spike growth by an order of magnitude.
How Netdata helps
- Disk utilization alerts per mount point catch binlog partition growth before it becomes critical.
- MySQL collector exposes Binlog_cache_disk_use and Binlog_cache_use, letting you correlate sudden binlog spikes with large transactions.
- Replication lag monitoring per replica identifies which downstream host is blocking purge on the primary.
- Long-term retention of binlog directory growth rate makes runway estimation automatic.