

Redis rdb_last_bgsave_status:err: diagnosing failed background saves

When INFO persistence returns rdb_last_bgsave_status:err, the last background save failed. The flag is sticky: it stays err until a later save succeeds, so the failure may be hours old. If stop-writes-on-bgsave-error is enabled (the default) and save points are configured, Redis rejects write commands and your application sees MISCONF errors. If the setting is disabled, writes continue but durability is broken, and the exposure window grows with every update.

The failure modes are a narrow set: fork failure, disk full, filesystem write rejection, or child death before completion. Follow this sequence to separate them without restarting Redis.

What this means

rdb_last_bgsave_status reports the result of the most recent background RDB save attempt. Valid values are ok and err. Redis initializes this status to ok on startup, so an err after a restart means a save has already failed since then. Do not dismiss an old err as a transient blip. Until a save succeeds, nothing shows that the cause is gone.

When the status is err, check three companion fields immediately:

rdb_last_save_time: the Unix epoch timestamp of the last successful save. It does not advance when saves fail.
rdb_changes_since_last_save: the number of write operations that modified the dataset since that last successful save. A key written 100 times counts 100 times.
rdb_bgsave_in_progress: 1 when a child is currently serializing the dataset.

If rdb_last_bgsave_status is ok but rdb_last_save_time is hours old, no save was attempted. The usual reasons are that the configured save thresholds have not been met or that save points are disabled. This is normal for idle instances. This guide covers explicit save failures.

While a BGSAVE runs, rdb_bgsave_in_progress is 1. Redis does not run RDB and AOF background operations at the same time, so a long-running AOF rewrite can delay an automatic RDB save. A delayed save does not produce err; it waits. err appears only when a save attempt fails: either the fork fails, or the child exits with an error or is killed.
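The interpretation rules above can be folded into a small helper. This is a sketch under assumptions. It reads `redis-cli INFO persistence` output on stdin and expects the usual field:value lines. The function names info_field and rdb_state are hypothetical and are not part of Redis.

```shell
# Minimal sketch (POSIX sh): classify RDB state from `INFO persistence` text on stdin.

# Print the value of one INFO field; INFO lines end in CRLF, so strip the CR.
info_field() {
  tr -d '\r' | sed -n "s/^$1://p"
}

# Print RUNNING, FAILED or OK plus the age of the last successful save.
# Optional argument: "now" in epoch seconds (defaults to the current time).
rdb_state() {
  info=$(cat)
  status=$(printf '%s\n' "$info" | info_field rdb_last_bgsave_status)
  running=$(printf '%s\n' "$info" | info_field rdb_bgsave_in_progress)
  last_ok=$(printf '%s\n' "$info" | info_field rdb_last_save_time)
  now=${1:-$(date +%s)}
  age=$((now - ${last_ok:-0}))
  if [ "$running" = "1" ]; then
    # A child is still serializing; ok/err describes the previous attempt.
    echo "RUNNING age=${age}s"
  elif [ "$status" = "err" ]; then
    echo "FAILED age=${age}s"
  else
    # ok with a large age usually means the save thresholds were not met.
    echo "OK age=${age}s"
  fi
}
```

For example, redis-cli INFO persistence | rdb_state prints a line such as FAILED age=5400s.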

flowchart TD
    A[BGSAVE triggered by threshold or command] --> B{fork succeeds?}
    B -->|No| C[Status err<br/>fork failure]
    B -->|Yes| D{Disk space<br/>sufficient?}
    D -->|No| E[Status err<br/>disk full]
    D -->|Yes| F{Permissions<br/>valid?}
    F -->|No| G[Status err<br/>permission denied]
    F -->|Yes| H{Child survives<br/>COW pressure?}
    H -->|No| I[Status err<br/>child OOM killed]
    H -->|Yes| J[Atomic rename<br/>Status ok]

Common causes

Cause	What it looks like	First thing to check
Disk full on persistence volume	`err` status, Redis read-only if `stop-writes-on-bgsave-error yes`, no new RDB file	Free space on the directory configured by the `dir` directive
Fork failure (memory/overcommit)	Redis log contains `Can't save in background: fork: Cannot allocate memory`; `latest_fork_usec` is not updated by the failed attempt	`vm.overcommit_memory` and available RAM versus `used_memory_rss`
Permission denied	Status flips to `err` after deployment or migration; log shows `Permission denied` when opening the temp RDB file	Ownership and permissions on the `dir` directory and `dbfilename` path
Child OOM killed during COW	Kernel OOM killer log entry; Redis log shows `Background saving terminated by signal 9`; `rdb_bgsave_in_progress` drops from 1 to 0 with `err`	`dmesg` or `/var/log/kern.log` for OOM events
Slow fork (THP, NUMA)	Latency spikes during saves and `latest_fork_usec` far above its usual value; usually causes slowness rather than `err` directly	`/sys/kernel/mm/transparent_hugepage/enabled`
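The log signatures in the table can be checked mechanically. The sketch below is illustrative. The match strings are assumptions based on common Redis log text and C library (strerror) error messages, which can differ across versions and platforms. The labels it prints were made up for this guide.

```shell
# Minimal sketch (POSIX sh): label Redis log lines with the likely cause of a
# failed BGSAVE. Reads log text on stdin; patterns are assumptions.
classify_bgsave_log() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *"fork: Cannot allocate memory"*)            echo "fork-failure" ;;
      *"No space left on device"*)                 echo "disk-full" ;;
      *"Permission denied"*)                       echo "permission-denied" ;;
      *"Read-only file system"*)                   echo "read-only-filesystem" ;;
      *"Background saving terminated by signal"*)  echo "child-killed" ;;
    esac
  done | sort -u   # one label per cause, however often it repeats
}
```

Feed it the Redis log file, whose path CONFIG GET logfile reports. An empty value means Redis logs to stdout, which is typically captured by journald.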

Quick checks

Run these read-only commands to orient yourself before making changes.

# Check persistence status and companion fields
redis-cli INFO persistence | grep -E "rdb_last_bgsave_status|rdb_last_save_time|rdb_changes_since_last_save|rdb_bgsave_in_progress"

# Check how saves are configured
redis-cli CONFIG GET save

# Check where Redis is trying to write
redis-cli CONFIG GET dir
redis-cli CONFIG GET dbfilename

# Check whether failed saves are blocking writes
redis-cli CONFIG GET stop-writes-on-bgsave-error

# Check fork latency from the last attempt
redis-cli INFO stats | grep latest_fork_usec

# Check memory footprint
redis-cli INFO memory | grep used_memory_rss

# Check THP status, a common cause of fork slowness
cat /sys/kernel/mm/transparent_hugepage/enabled
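If you run these checks often, a wrapper saves typing. This sketch assumes redis-cli is on PATH and that its output is piped, so it prints raw values. It also adds a disk-space check on the configured dir. The rdb_quick_checks function and the REDIS_CLI variable are hypothetical conveniences.

```shell
# Minimal sketch (POSIX sh): run the read-only quick checks in one pass.
# REDIS_CLI is a hypothetical knob for this script; set it to add connection
# flags, e.g. REDIS_CLI="redis-cli -h 10.0.0.5 -p 6380".
rdb_quick_checks() {
  cli=${REDIS_CLI:-redis-cli}
  # When piped, CONFIG GET prints two lines: the parameter name, then its value.
  for param in save dir dbfilename stop-writes-on-bgsave-error; do
    printf '%s: %s\n' "$param" "$($cli CONFIG GET "$param" | sed -n 2p)"
  done
  $cli INFO persistence | tr -d '\r' |
    grep -E '^rdb_(last_bgsave_status|last_save_time|changes_since_last_save|bgsave_in_progress):'
  $cli INFO stats | tr -d '\r' | grep '^latest_fork_usec:'
  $cli INFO memory | tr -d '\r' | grep '^used_memory_rss:'
  # Free space on the volume that holds the RDB directory.
  df -h "$($cli CONFIG GET dir | sed -n 2p)"
  cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null ||
    echo "THP setting not readable on this host"
}
```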

How to diagnose it

Follow these steps in order. Each one targets a specific failure mode.

1. Confirm a save was attempted.
   Check rdb_bgsave_in_progress. If it is 1, a child process is currently serializing the dataset. The err status still describes the previous attempt, so wait for the child to finish before acting on it. Also check aof_rewrite_in_progress: a running AOF rewrite delays a scheduled RDB save without causing an error.
2. Check the configured save directory and file name.
   Run CONFIG GET dir and CONFIG GET dbfilename and verify that the path is the one you expect. If the instance was recently migrated or started with a different configuration, Redis may be trying to write to a path that does not exist or is on the wrong volume.
3. Check disk space on the persistence volume.
   The child writes the snapshot to a temporary file (temp-<pid>.rdb) in the dir directory, then atomically renames it over the target file. Until the rename, the old RDB and the new temporary file coexist, so the volume needs free space roughly equal to the size of the new snapshot. Run df -h against the directory from step 2. If the volume is full, this is your root cause.
4. Check for fork failure in the Redis log.
   If the log shows Can't save in background: fork: Cannot allocate memory, the child process could not be created. On Linux this is most often caused by vm.overcommit_memory set to 0. In that heuristic mode, the kernel may refuse to fork a large process when free memory looks too small for a full copy. It does this even though copy-on-write only duplicates pages that are modified during the save. Check with sysctl vm.overcommit_memory. If it returns 0, run sudo sysctl vm.overcommit_memory=1 and add vm.overcommit_memory = 1 to /etc/sysctl.conf to persist the change.
5. Check kernel logs for child OOM kills.
   Even when fork succeeds, the OOM killer can kill the child during copy-on-write if write traffic is heavy and the host is memory-constrained. Check dmesg or /var/log/kern.log for OOM killer events that target the Redis child. The Redis log usually shows Background saving terminated by signal 9 at the same time. If you find these events, the host lacks headroom for COW during persistence. As a conservative rule of thumb, keep used_memory_rss below roughly 50 percent of physical RAM on persistent instances, because in the worst case COW duplicates every page.
6. Check directory and file permissions.
   If the disk has space and fork succeeded, verify that the user running Redis can create files in the dir directory. The temporary file is created there and then renamed over dbfilename. Running ls -ld on the directory is usually enough to spot an ownership mismatch after a deployment or restore.
7. Check fork latency and THP.
   If latest_fork_usec is extremely high (hundreds of milliseconds or more) relative to your dataset size, the fork itself is slow. A large RSS and some virtualized environments are common reasons. Also check /sys/kernel/mm/transparent_hugepage/enabled. If the selected value (shown in brackets) is not [never], disable THP. With THP enabled, copy-on-write copies 2 MB huge pages instead of 4 KB pages, so writes during a save inflate memory use and latency and raise the risk of the child being OOM killed. A slow fork blocks the main thread and usually causes latency rather than an outright save failure. A long enough stall can still trigger client timeouts or external health checks that restart the instance before the save completes.
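For the kernel-log check in the diagnosis steps, a filter over kernel log text can surface OOM kills quickly. Treat the pattern as an assumption, because kernel wording varies between versions. The pattern also expects the killed process name to start with redis.

```shell
# Minimal sketch (POSIX sh): find OOM-killer events that hit a Redis process.
# Reads kernel log text on stdin. The pattern covers the older "Kill process"
# and newer "Killed process" wording, including cgroup ("Memory cgroup out of
# memory") kills. Exit status is 0 only if at least one event matched.
scan_kernel_oom() {
  grep -Ei 'out of memory: kill(ed)? process [0-9]+ \(redis'
}
```

For example: sudo dmesg | scan_kernel_oom, or scan_kernel_oom < /var/log/kern.log.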

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`rdb_last_bgsave_status`	Binary result of the last save attempt	Any `err` value
`rdb_last_save_time`	Age of the last successful snapshot	Older than twice the expected save interval
`rdb_changes_since_last_save`	Writes not yet captured by an RDB snapshot	Growing while status is `err`
`latest_fork_usec`	How long the main thread was blocked by the last fork, in microseconds	> 500000 (500 ms), or trending upward
`used_memory_rss`	Physical memory footprint of the main process; COW during saves adds memory on top of it (`rdb_last_cow_size` reports how much)	Approaching host or container memory limit
Disk free space on `dir` volume	The temporary RDB file needs roughly the new snapshot's size while the old file still exists	Less than 2x current RDB size

Fixes

Disk full or insufficient space

Free up space on the volume that holds the dir directory, or grow the volume. The temporary RDB file needs roughly the size of the final snapshot until the atomic rename. As a rule of thumb, keep free space of at least three times the dataset size on the persistence volume to cover RDB snapshots, AOF, and temporary rewrite files.
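To check whether the next snapshot will fit, compare free space with the current RDB size. This minimal sketch assumes you pass the values of dir and dbfilename from CONFIG GET. rdb_space_check is a hypothetical helper, and its thresholds are this guide's rules of thumb, not Redis limits.

```shell
# Minimal sketch (POSIX sh): compare free space in the RDB directory with the
# current RDB size. The temp file needs about one RDB's worth of space; 3x is
# the comfortable margin. Sizes are in KiB.
rdb_space_check() {
  dir=$1; file=$2
  # -P gives one POSIX-format line per filesystem; column 4 is available KiB.
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ -f "$dir/$file" ]; then
    rdb_kb=$(( ($(wc -c < "$dir/$file") + 1023) / 1024 ))
  else
    rdb_kb=0
  fi
  if [ "$free_kb" -lt "$rdb_kb" ]; then
    echo "FAIL free=${free_kb}KiB rdb=${rdb_kb}KiB: the temp RDB file will not fit"
    return 2
  elif [ "$free_kb" -lt $((rdb_kb * 3)) ]; then
    echo "WARN free=${free_kb}KiB rdb=${rdb_kb}KiB: below 3x the RDB size"
    return 1
  fi
  echo "OK free=${free_kb}KiB rdb=${rdb_kb}KiB"
}
```

For example: rdb_space_check /var/lib/redis dump.rdb (example path). It returns 2 when the temp file cannot fit, 1 when free space is below the 3x margin, and 0 otherwise.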

If you are blocked by stop-writes-on-bgsave-error yes and need to restore writes immediately while you arrange more disk space, you can run CONFIG SET stop-writes-on-bgsave-error no. Use this only as an emergency workaround, and set it back to yes once saves succeed again. Your persistence guarantee remains broken until the root cause is fixed.

If the volume is shared with logs or AOF, consider moving RDB to a dedicated mount. The tradeoff is operational complexity: another volume to monitor and back up. Isolating RDB I/O prevents log rotation or AOF growth from stepping on snapshot space.
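Once the underlying cause is fixed, trigger a save and confirm that the status clears instead of waiting for the next automatic save. This is a sketch under assumptions. REDIS_CLI, POLL_SECS and MAX_POLLS are hypothetical knobs for this script, not Redis settings. It treats a save as successful only if rdb_last_save_time also advances, so a refused BGSAVE is not mistaken for success.

```shell
# Minimal sketch (POSIX sh): request a new background save and wait for it.
bgsave_and_wait() {
  cli=${REDIS_CLI:-redis-cli}
  # Read one field from INFO persistence (INFO lines end in CRLF).
  persist_field() { $cli INFO persistence | tr -d '\r' | sed -n "s/^$1://p"; }

  before=$(persist_field rdb_last_save_time)
  # The reply goes to stderr; it explains a refusal, e.g. an AOF rewrite in progress.
  $cli BGSAVE >&2
  polls=0
  while [ "$(persist_field rdb_bgsave_in_progress)" = "1" ]; do
    polls=$((polls + 1))
    if [ "$polls" -ge "${MAX_POLLS:-600}" ]; then
      echo "timeout: save still running"
      return 2
    fi
    sleep "${POLL_SECS:-1}"
  done
  status=$(persist_field rdb_last_bgsave_status)
  # Require both ok and a new rdb_last_save_time.
  if [ "$status" = "ok" ] && [ "$(persist_field rdb_last_save_time)" != "$before" ]; then
    echo "ok"
    return 0
  fi
  echo "failed: status=$status"
  return 1
}
```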

Fork failure or child OOM kill

Set vm.overcommit_memory to 1. Redis recommends this because, in the default heuristic mode (0), the kernel may refuse to fork a large process when free memory looks insufficient for a full copy. It refuses even though copy-on-write only duplicates the pages modified during the save. Run sudo sysctl vm.overcommit_memory=1 and add vm.overcommit_memory = 1 to /etc/sysctl.conf to persist the change.
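A quick way to verify the setting on each host is a small check like the one below. check_overcommit is a hypothetical helper, and its optional path argument exists only so it can be tested against a fake file.

```shell
# Minimal sketch (POSIX sh): verify vm.overcommit_memory is 1.
check_overcommit() {
  path=${1:-/proc/sys/vm/overcommit_memory}
  mode=$(cat "$path" 2>/dev/null) || { echo "cannot read $path"; return 2; }
  case $mode in
    1) echo "ok: vm.overcommit_memory=1" ;;
    *) echo "warn: vm.overcommit_memory=$mode (Redis recommends 1)"; return 1 ;;
  esac
}
```

To persist the setting through a drop-in file instead of editing /etc/sysctl.conf, run echo 'vm.overcommit_memory = 1' | sudo tee /etc/sysctl.d/99-redis.conf, then sudo sysctl --system. The file name is an example.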

If the child is being OOM killed, reduce memory pressure. Lower maxmemory, add physical RAM, or shed write load during the save window. Sharding the dataset across multiple Redis instances reduces the memory footprint per node, which lowers both fork latency and COW risk. The tradeoff is application complexity: you must handle multi-key operations that cross slot boundaries.
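To see how close a node is to the 50 percent guideline, compare used_memory_rss with the host's MemTotal. This sketch assumes a Linux host. Inside a container, the relevant limit is the cgroup memory limit rather than MemTotal, so adjust for that. rss_headroom is a hypothetical helper.

```shell
# Minimal sketch (POSIX sh): used_memory_rss as a percentage of host RAM,
# flagged at the ~50 percent rule of thumb for persistent instances.
rss_headroom() {
  rss_bytes=$1
  meminfo=${2:-/proc/meminfo}
  total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
  [ -n "$total_kb" ] || { echo "MemTotal not found in $meminfo"; return 2; }
  pct=$(( rss_bytes * 100 / (total_kb * 1024) ))
  if [ "$pct" -ge 50 ]; then
    echo "warn: RSS is ${pct}% of RAM; little headroom for copy-on-write"
    return 1
  fi
  echo "ok: RSS is ${pct}% of RAM"
}
```

For example: rss_headroom "$(redis-cli INFO memory | tr -d '\r' | sed -n 's/^used_memory_rss://p')".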

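If you do not need RDB snapshots, disable them with CONFIG SET save "" and remove the save lines from the configuration file. If you require durability, make sure AOF is enabled and healthy. AOF rewrites also fork, so the overcommit and memory-headroom advice above still applies.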
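Before disabling RDB, confirm that AOF is actually carrying durability. This sketch reads INFO persistence on stdin and checks the aof_enabled, aof_last_write_status and aof_last_bgrewrite_status fields. check_aof_health is a hypothetical name.

```shell
# Minimal sketch (POSIX sh): is AOF enabled and did its last write and rewrite succeed?
check_aof_health() {
  info=$(tr -d '\r')
  aof_field() { printf '%s\n' "$info" | sed -n "s/^$1://p"; }
  if [ "$(aof_field aof_enabled)" != "1" ]; then
    echo "AOF disabled: turning off RDB would leave no persistence"
    return 1
  fi
  if [ "$(aof_field aof_last_write_status)" != "ok" ] ||
     [ "$(aof_field aof_last_bgrewrite_status)" != "ok" ]; then
    echo "AOF enabled but unhealthy"
    return 1
  fi
  echo "AOF enabled and healthy"
}
```

For example: redis-cli INFO persistence | check_aof_health && redis-cli CONFIG SET save "".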

Permission denied

Fix ownership and permissions on the dir directory so the user running Redis can create files there. You can also point Redis at another directory at runtime with CONFIG SET dir <path>. Persist the change with CONFIG REWRITE or by editing the configuration file so it survives a restart.
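To confirm the fix, test write access as the user Redis runs as. check_dir_writable is a hypothetical helper. The user name redis in the usage example is an assumption; ps -o user= -p <pid>, with the process_id from INFO server, shows the real one. Root passes the test regardless of permission bits.

```shell
# Minimal sketch (POSIX sh): can the current user create files in the RDB dir?
check_dir_writable() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir does not exist"
    return 2
  fi
  # Creating a file needs write and search (x) permission on the directory.
  if [ -w "$dir" ] && [ -x "$dir" ]; then
    echo "ok: $(id -un) can create files in $dir"
  else
    echo "denied: $(id -un) cannot create files in $dir"
    ls -ld "$dir"
    return 1
  fi
}
```

For example, save the function in a file and run sudo -u redis sh -c '. /path/to/checks.sh && check_dir_writable /var/lib/redis'. Both paths are placeholders.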

THP or NUMA-induced slowness

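Disable Transparent Huge Pages at runtime by running echo never > /sys/kernel/mm/transparent_hugepage/enabled as root. With sudo, use echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled, because a plain redirection runs without root privileges. THP is a sysfs setting, not a sysctl. To make the change survive reboot, run the same command from a boot script or init unit, or add transparent_hugepage=never to the kernel command line. Restart Redis afterwards; Redis's own startup warning notes that a restart is required after THP is disabled. Also check NUMA binding if running on bare metal, because misconfigured NUMA can inflate fork latency.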
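To read the active THP mode in scripts, extract the bracketed word from the sysfs file (for example, always madvise [never] means never). thp_mode is a hypothetical helper, and the path argument exists for testing.

```shell
# Minimal sketch (POSIX sh): print the active THP mode (the bracketed word).
thp_mode() {
  path=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$path"
}
```

For example: [ "$(thp_mode)" = never ] || echo "THP is $(thp_mode); disable it and restart Redis".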

Prevention

Set stop-writes-on-bgsave-error yes so that a failed save becomes an immediate availability signal rather than a silent durability gap.
Monitor disk free space on the persistence volume; alert when it drops below three times the current RDB size.
Ensure vm.overcommit_memory is 1 on all Linux hosts running Redis.
Disable THP before putting Redis into production.
Keep used_memory_rss below 50 percent of physical memory on persistent instances to leave headroom for COW.
Place dir on a volume with dedicated IOPS, not shared with logs or noisy neighbors.
Review save thresholds against your actual change rate. Short intervals such as save 60 10000, which is part of the default configuration, can trigger a fork roughly every minute on a busy instance, increasing fork frequency and COW pressure. Relaxing them widens your RPO window.
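When reviewing thresholds, it helps to spell out what each save pair means. Each pair is <seconds> <changes>. Redis starts a background save once more than that many seconds have passed since the last successful save and at least that many changes are pending. This sketch expands the value returned by CONFIG GET save. describe_save_points is a hypothetical helper.

```shell
# Minimal sketch (POSIX sh): expand a `save` value such as "3600 1 300 100 60 10000".
describe_save_points() {
  set -- $1   # split the space-separated value into positional parameters
  if [ "$#" -eq 0 ]; then
    echo "RDB snapshots disabled (save \"\")"
    return 0
  fi
  while [ "$#" -ge 2 ]; do
    echo "save if >${1}s since last save and >=${2} changes"
    shift 2
  done
}
```

For example: describe_save_points "$(redis-cli CONFIG GET save | sed -n 2p)".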

How Netdata helps

Plots rdb_last_bgsave_status and rdb_last_save_time alongside disk space and memory utilization to correlate err flips with disk-full or RSS-spike events.
Alerts when rdb_last_bgsave_status:err persists longer than one expected save cycle, with filtering to suppress noise on idle instances where saves do not trigger.
Tracks rdb_changes_since_last_save to quantify how many writes are at risk during a persistence outage.
Plots latest_fork_usec spikes against save failures to flag THP or memory-pressure root causes.

The Netdata solution

Redis monitoring with Netdata

Netdata monitors Redis with per-second metrics and ML anomaly detection. Track memory usage and fragmentation, fork/COW latency, replication backlog, evictions, and connection pressure to spot the failure modes in these runbooks early.

