Redis aof_last_write_status:err: AOF write failures and recovery
INFO persistence showing aof_last_write_status:err means Redis failed to write its Append-Only File buffer to disk on the last attempt. If you depend on AOF for durability, the instance is no longer persisting writes. With appendfsync everysec (the default) or no, Redis logs the failure, keeps the unwritten data in its buffer, and retries; while the status is err, the server rejects write commands with a MISCONF error. With appendfsync always, Redis cannot recover from a failed write and exits instead. MISCONF is also the prefix used when RDB background saves fail, so read the full message text before deciding which persistence path is broken; assuming RDB by default often misleads first-line diagnosis.
What this means
Redis appends writes to an in-memory AOF buffer and flushes it to disk according to appendfsync; with everysec, the fsync runs on a background thread. The aof_last_write_status field in INFO persistence tracks whether the last write or fsync succeeded. A value of err is sticky: it remains until the next successful flush. It can be set by an open(), write(), or fsync() failure on the AOF file. A disk that is merely slow, so that fsync is delayed rather than failing, is counted in aof_delayed_fsync instead.
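While aof_last_write_status is err, Redis rejects write commands with a MISCONF error that names the AOF file and the underlying OS error, for example when the disk is full:
(error) MISCONF Errors writing to the AOF file: No space left on device
The better-known variant below is returned when an RDB background save has failed and stop-writes-on-bgsave-error is yes (the default). Seeing it during an AOF incident usually means RDB saves are failing on the same disk:
(error) MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk.
Either rejection protects against silent data loss, but turns a disk issue into a write availability incident.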
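Because both persistence paths use the MISCONF prefix, it can help to classify the reply text before escalating. The sketch below is a minimal example, not part of Redis: misconf_source is a hypothetical helper name, and the matched strings are the beginnings of the two messages as they appear in the Redis source, so verify them against the version you run.

```shell
# Classify a Redis error reply by the persistence path it refers to.
# Prints one of: aof, rdb, unknown-misconf, not-misconf.
misconf_source() {
    case "$1" in
        *"MISCONF Errors writing to the AOF file"*)
            echo "aof" ;;
        *"MISCONF Redis is configured to save RDB snapshots"*)
            echo "rdb" ;;
        *MISCONF*)
            echo "unknown-misconf" ;;
        *)
            echo "not-misconf" ;;
    esac
}
```

Pass it the error string captured by your client or test command, for example misconf_source "$reply".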
flowchart TD
A[Disk full / I/O stall / permissions] --> B[fsync or write fails]
B --> C[aof_last_write_status:err]
C --> E[Redis rejects writes with MISCONF]
C --> F[Redis retries the pending AOF write about once per second]
F -->|write succeeds| H[aof_last_write_status:ok and writes resume]
A --> G[aof_delayed_fsync rises]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Disk full on persistence volume | aof_last_write_status:err, OS ENOSPC alerts, write rejection | df -h on the directory holding the AOF file |
| Disk I/O saturation | aof_delayed_fsync increasing, high iowait, fsync latency spikes | iostat -x 1 or cloud volume burst metrics |
| Filesystem permissions error | Redis logs permission denied, often after migrations or package updates | ls -ld on the Redis dir and AOF path |
| Filesystem-level I/O error | Kernel logs show ext4/xfs errors or EIO | dmesg -T and filesystem health checks |
| AOF rewrite failure | aof_last_bgrewrite_status:err, aof_current_size growing without compaction | INFO persistence size and status fields |
Quick checks
Run these safe, read-only commands to triage.
# Check AOF state, rewrite health, and delayed fsync count
redis-cli INFO persistence | grep -E "aof_last_write_status|aof_enabled|aof_last_bgrewrite_status|aof_delayed_fsync|aof_current_size|aof_base_size"
# Check if writes are already being rejected
redis-cli INFO stats | grep total_error_replies
redis-cli INFO errorstats
# Verify fsync policy and write-stop behavior
redis-cli CONFIG GET appendfsync
redis-cli CONFIG GET stop-writes-on-bgsave-error
# Check disk space on the persistence volume (run on the Redis host)
df -h "$(redis-cli CONFIG GET dir | tail -n1)"
# Inspect recent kernel storage errors
dmesg -T | grep -iE "error|ext4|xfs|scsi|block"
# Verify AOF directory permissions (run on the Redis host)
ls -ld "$(redis-cli CONFIG GET dir | tail -n1)"
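The checks above can be folded into a single verdict for runbooks or cron jobs. This is a minimal sketch under stated assumptions: info_field and aof_triage are hypothetical helper names, and the verdict strings are illustrative. redis-cli emits CRLF line endings, so the carriage return is stripped before parsing.

```shell
# Print the value of one INFO field; reads INFO text on stdin.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2; exit }'
}

# Summarize AOF health from INFO persistence text passed as $1.
aof_triage() {
    _enabled=$(printf '%s\n' "$1" | info_field aof_enabled)
    _write=$(printf '%s\n' "$1" | info_field aof_last_write_status)
    _rewrite=$(printf '%s\n' "$1" | info_field aof_last_bgrewrite_status)
    if [ "$_enabled" != "1" ]; then
        echo "aof-disabled"
    elif [ "$_write" = "err" ]; then
        echo "aof-write-failing"
    elif [ "$_rewrite" = "err" ]; then
        echo "aof-rewrite-failing"
    else
        echo "aof-ok"
    fi
}

# Example (requires a reachable Redis):
#   aof_triage "$(redis-cli INFO persistence)"
```

The write status is checked before the rewrite status because a failing write blocks clients, while a failing rewrite only lets the file grow.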
How to diagnose it
- Confirm the error and scope. Run the INFO persistence checks. Verify aof_enabled is 1. If AOF is disabled, the status is irrelevant and you are running without AOF persistence. Note aof_last_bgrewrite_status as well; a rewrite failure compounds the problem by allowing unbounded AOF growth.
- Determine if clients are impacted. Check the total_error_replies rate, then attempt a test SET from a non-production client. A MISCONF error confirms that write rejection is active.
- Inspect disk space and inodes. AOF appends continuously and rewrites temporarily need up to the dataset size in additional space. Use df -h and df -i on the persistence volume. If usage is near 100%, this is the likely cause.
- Check logs for specific errors. Review the Redis log and dmesg for “No space left on device”, “Permission denied”, “Read-only file system”, or block-layer I/O errors.
- Correlate with aof_delayed_fsync. A rising counter means the background fsync thread is stalling. Cross-reference with disk latency metrics. During RDB snapshots or AOF rewrite, temporary spikes are expected; sustained growth is not.
- Verify permissions. Ensure the Redis process owner can write to the configured dir. Ownership changes after OS upgrades or volume remounts are common culprits.
- Differentiate transient from persistent. Because aof_last_write_status is sticky, it may reflect an error whose cause has already cleared. Free space or restore I/O capacity, then re-check the field. Redis retries the pending AOF write on its own, so the status should return to ok shortly after a write succeeds.
- If Redis will not start. If the instance crashes on startup due to AOF issues, check the startup log. You may need to run redis-check-aof --fix against your AOF file before the server can load.
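The log strings listed in the steps above map onto the causes table, so a small classifier can speed up triage. A minimal sketch: classify_storage_error is a hypothetical helper, the category names are illustrative, and the patterns are the standard OS error strings rather than an exhaustive list of what Redis or the kernel may log.

```shell
# Map a log line to a likely cause category.
# Matching is case-insensitive; unmatched lines print "unclassified".
classify_storage_error() {
    _line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$_line" in
        *"no space left on device"*)
            echo "disk-full" ;;
        *"permission denied"*)
            echo "permissions" ;;
        *"read-only file system"*)
            echo "read-only-fs" ;;
        *"i/o error"*|*"input/output error"*)
            echo "io-error" ;;
        *)
            echo "unclassified" ;;
    esac
}

# Example, assuming the Redis log path is stored in $REDIS_LOG:
#   tail -n 200 "$REDIS_LOG" | while IFS= read -r l; do classify_storage_error "$l"; done | sort | uniq -c
```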
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| aof_last_write_status | Binary AOF health | err |
| aof_delayed_fsync | Leading indicator of disk I/O saturation | Rate increasing per minute |
| aof_last_bgrewrite_status | Rewrite failure prevents AOF compaction | err |
| aof_current_size / aof_base_size | AOF bloat when rewrite is stuck | Ratio > 2 for extended periods |
| total_error_replies | Application-visible command failures | Rate > 0 |
| errorstat_MISCONF / errorstat_OOM | Separates write rejections caused by persistence failures (MISCONF) from maxmemory rejections under noeviction (OOM) | Rate > 0 |
| latest_fork_usec | Background persistence latency and memory pressure | Spikes correlating with AOF issues |
| Disk free space | AOF needs space to append and rewrite | < 3x dataset size |
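The size ratio in the table can be computed directly from INFO persistence output. A minimal sketch, assuming both size fields are present (they are reported only when AOF is enabled); aof_growth_ratio is a hypothetical helper name.

```shell
# Print aof_current_size / aof_base_size with two decimals.
# Reads INFO persistence text on stdin; prints "n/a" when the base
# size is missing or zero, to avoid dividing by zero.
aof_growth_ratio() {
    tr -d '\r' | awk -F: '
        $1 == "aof_current_size" { cur = $2 }
        $1 == "aof_base_size"    { base = $2 }
        END {
            if (base > 0) printf "%.2f\n", cur / base
            else print "n/a"
        }'
}

# Example (requires a reachable Redis):
#   redis-cli INFO persistence | aof_growth_ratio
```

A single reading above 2 is not alarming on its own; the warning sign in the table is a ratio that stays high, which suggests rewrites are not completing.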
Fixes
Disk full
Free space on the persistence volume by removing logs, rotating files, or expanding storage. Do not delete or truncate the active AOF file while Redis is running. Once space is available, Redis should succeed on the next fsync. Verify recovery:
redis-cli INFO persistence | grep aof_last_write_status
If the status does not return to ok after a successful write cycle, check for lingering filesystem errors. A controlled restart is a last resort if you suspect a stale file descriptor.
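Rather than re-running the check by hand, a bounded polling loop can confirm recovery. This is a sketch, not a definitive implementation: wait_for_aof_ok and fetch_info are hypothetical names, and the defaults of 30 attempts with a 1-second pause are arbitrary.

```shell
# wait_for_aof_ok FETCH_CMD [ATTEMPTS] [PAUSE_SECONDS]
# FETCH_CMD is the name of a command or function that prints
# INFO persistence text. Returns 0 as soon as aof_last_write_status
# is ok, or 1 if it is still not ok after ATTEMPTS polls.
wait_for_aof_ok() {
    _fetch=$1
    _attempts=${2:-30}
    _pause=${3:-1}
    _i=0
    while [ "$_i" -lt "$_attempts" ]; do
        _status=$("$_fetch" | tr -d '\r' |
            awk -F: '$1 == "aof_last_write_status" { print $2; exit }')
        if [ "$_status" = "ok" ]; then
            return 0
        fi
        _i=$((_i + 1))
        sleep "$_pause"
    done
    return 1
}

# Example (requires a reachable Redis):
#   fetch_info() { redis-cli INFO persistence; }
#   wait_for_aof_ok fetch_info 60 1 || echo "AOF still failing"
```

Passing the fetch command as an argument keeps the loop testable without a live server and lets you add host, port, or auth options in one place.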
Disk I/O saturation
Identify competing I/O consumers. RDB snapshots, AOF rewrites, log shippers, and backup agents all contend for the same volume. If possible, move AOF to a dedicated fast disk. Reduce write volume temporarily at the application layer. Ensure that AOF rewrite scheduling does not compound normal fsync load.
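To tell sustained saturation from a one-off spike, sample aof_delayed_fsync twice and look at the growth between samples. A minimal sketch with hypothetical helper names; the 60-second interval in the example is an arbitrary choice.

```shell
# Extract aof_delayed_fsync from INFO persistence text on stdin.
read_delayed_fsync() {
    tr -d '\r' | awk -F: '$1 == "aof_delayed_fsync" { print $2; exit }'
}

# Print how much the counter grew between two samples.
# $1 = earlier sample, $2 = later sample.
delayed_fsync_delta() {
    if [ "$2" -lt "$1" ]; then
        # The counter went backwards, for example after a restart,
        # so the later sample is itself the growth since the reset.
        echo "$2"
    else
        echo $(( $2 - $1 ))
    fi
}

# Example (requires a reachable Redis):
#   before=$(redis-cli INFO persistence | read_delayed_fsync)
#   sleep 60
#   after=$(redis-cli INFO persistence | read_delayed_fsync)
#   delayed_fsync_delta "$before" "$after"
```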
Permissions or filesystem errors
Fix directory ownership so the Redis user can write to the configured dir and AOF path. If the filesystem has remounted read-only due to corruption, resolve the filesystem health before allowing Redis to continue writing.
AOF corruption preventing startup
If Redis detects corruption at startup and refuses to load, use redis-check-aof --fix against your configured AOF file. This truncates the last incomplete command. After repair, start Redis and confirm aof_last_write_status:ok. Be aware that --fix discards data at the tail of the file.
Emergency: allow writes while storage is unavailable
If you cannot restore storage quickly and need to prevent a total outage, you can temporarily disable the write rejection that follows failed RDB background saves:
# DANGER: allows writes without durability. Use only as a temporary bridge.
redis-cli CONFIG SET stop-writes-on-bgsave-error no
All writes accepted during this window are at risk of loss. Re-enable the setting immediately after the storage issue is resolved. This setting does not lift rejections reported as MISCONF Errors writing to the AOF file; those clear once Redis can write to the AOF again.
Prevention
- Monitor disk free space independently. Maintain at least 3x the dataset size as free space on the persistence volume to accommodate AOF growth, temporary rewrite files, and RDB snapshots.
- Alert on aof_delayed_fsync rate increases. It is the earliest signal of disk I/O pressure before aof_last_write_status flips to err.
- Audit permissions on the Redis dir after deployments, volume mounts, or OS upgrades.
- Track the ratio aof_current_size / aof_base_size. If it climbs steadily, AOF rewrite is failing or disabled.
- For automated backup scripts, ensure you capture a consistent AOF state. Disable automatic rewrites during the backup window if your tooling supports it.
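The 3x free-space guideline is easy to turn into a check. A minimal sketch: has_rewrite_headroom is a hypothetical helper, and using used_memory from INFO memory as the dataset size is an assumption that only approximates on-disk size.

```shell
# has_rewrite_headroom FREE DATASET [MULTIPLE]
# Succeeds when FREE is at least MULTIPLE x DATASET (default 3).
# Both values must be in the same unit, for example KiB.
has_rewrite_headroom() {
    [ "$1" -ge $(( $2 * ${3:-3} )) ]
}

# Example (run on the Redis host, requires a reachable Redis):
#   dir=$(redis-cli CONFIG GET dir | tail -n1)
#   free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
#   used=$(redis-cli INFO memory | tr -d '\r' | awk -F: '$1 == "used_memory" { print $2 }')
#   has_rewrite_headroom "$free_kb" $(( used / 1024 )) || echo "low headroom on $dir"
```

Working in KiB keeps the values as plain integers that shell arithmetic handles without relying on awk number formatting.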
How Netdata helps
- Netdata collects aof_last_write_status, aof_delayed_fsync, and aof_last_bgrewrite_status from INFO persistence without extra configuration.
- Correlate rising aof_delayed_fsync with node-level disk.await and utilization to distinguish disk saturation from a configuration issue.
- Alert on aof_last_write_status:err with a duration threshold to avoid paging on transient stalls.
- Cross-reference redis.total_error_replies and redis.errorstat_OOM to detect whether persistence failure has escalated to write rejection.
- Disk space charts on the persistence volume show remaining capacity before Redis hits ENOSPC.