MISCONF Redis is configured to save RDB snapshots - what it means and how to fix it

Applications receive a MISCONF error on every write: "MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk" (the exact wording varies by Redis version). Reads still work, but SET, HSET, LPUSH, and all other mutating commands are rejected.

Redis makes itself read-only when the last background save failed and stop-writes-on-bgsave-error is yes (the default). The error persists until a subsequent BGSAVE succeeds. Retrying writes will not help. The instance is protecting you from accepting writes that can never be persisted. To recover, fix the underlying persistence failure and clear the error state with a successful save.

What this means

Redis persists data to disk by forking a child process that writes the dataset to an RDB file. This happens automatically based on the save directives in redis.conf, or manually via BGSAVE. If the child fails, rdb_last_bgsave_status flips to err and stays there. It is sticky: it remains err across client reconnections until a later save succeeds.

With stop-writes-on-bgsave-error yes (the default), a failed background save stops all writes. If the server cannot snapshot its data, continuing to accept writes creates a silent durability gap. Every write accepted after the failure would be lost in a crash.

The MISCONF error is a safety mechanism, not a configuration mistake. Do not simply disable the setting unless you are running a pure cache that can repopulate from an authoritative source. Resolve why BGSAVE cannot complete.
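
The blocking rule described above can be sketched as a small shell function. This is a simplified model of the behaviour, not Redis's own code: writes_blocked and its three arguments are hypothetical names, and the sketch assumes that writes are only blocked while at least one save point is configured.

```shell
# Simplified model of when Redis rejects writes with MISCONF.
# writes_blocked is a hypothetical helper, not part of redis-cli.
#   $1: rdb_last_bgsave_status (ok | err)
#   $2: stop-writes-on-bgsave-error (yes | no)
#   $3: value of the save directive ("" means no save points; assumption)
writes_blocked() {
    if [ "$1" = "err" ] && [ "$2" = "yes" ] && [ -n "$3" ]; then
        echo "blocked"
    else
        echo "accepted"
    fi
}
```

For example, writes_blocked err yes "3600 1" prints blocked, while the same call with ok as the first argument prints accepted.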

flowchart TD
    A[Client sees MISCONF on write] --> B{Check INFO persistence}
    B -->|rdb_last_bgsave_status:err| C{Check disk space}
    C -->|Full or quota| D[Free disk space]
    C -->|Adequate space| E{Check logs and fork}
    E -->|fork failed| F[Fix memory or overcommit]
    E -->|Permission denied| G[Fix dir or file ownership]
    E -->|Child OOM killed| H[Add memory headroom for COW]
    D --> I[Trigger manual BGSAVE]
    F --> I
    G --> I
    H --> I
    I -->|BGSAVE ok| J[Writes resume]
    I -->|BGSAVE fails| K[Re-check root cause]

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Disk full or I/O errors | rdb_last_bgsave_status:err, disk utilization at 100%, or filesystem remounted read-only. | df -h on the volume holding the Redis dir. |
| Fork failure from memory pressure | Log contains Can't save in background: fork: Cannot allocate memory. Common on large instances or containers with tight memory limits. | vm.overcommit_memory and the gap between used_memory_rss and total RAM. |
| Permission denied on RDB path | The child process cannot open or rename the RDB file in the configured directory. Often after permission changes or SELinux policy updates. | Ownership and mode of the dir and dbfilename path. |
| Child process killed by OOM killer | The forked child exceeds container or host memory limits during copy-on-write and is killed mid-save. | dmesg or kernel logs for OOM killer events targeting the Redis child. |
| Conflicting RDB paths in containers | Multiple Redis instances share a volume and fight over the same dump.rdb file. Common in Docker Compose or misconfigured StatefulSets. | Whether CONFIG GET dir and dbfilename point to unique paths per instance. |

Quick checks

Run these read-only checks to confirm state and narrow the cause.

# Confirm the error state and see when the last save succeeded
redis-cli INFO persistence | grep -E "rdb_last_bgsave_status|rdb_last_save_time|rdb_bgsave_in_progress"

# Check if error replies are accumulating (total_error_replies requires Redis 6.2+)
redis-cli INFO stats | grep total_error_replies

# Check per-error counters on Redis 6.2+
redis-cli INFO errorstats

# Check free disk space on the persistence volume
df -h "$(redis-cli CONFIG GET dir | tail -1)"

# Check ownership and mode of the directory, then of the RDB file itself
RDB_DIR="$(redis-cli CONFIG GET dir | tail -1)"
ls -ld "$RDB_DIR"
ls -l "$RDB_DIR/$(redis-cli CONFIG GET dbfilename | tail -1)"

# Check if the kernel allows fork overcommit
cat /proc/sys/vm/overcommit_memory

# Check for recent OOM killer or fork failures in kernel logs
dmesg | grep -iE "oom|fork|redis"
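
If you script these checks, note that INFO output uses CRLF line endings, so values taken with grep or cut carry a trailing carriage return that breaks string comparisons. A minimal sketch of a field extractor follows; info_field is a hypothetical helper name, not a redis-cli feature.

```shell
# info_field is a hypothetical helper: read INFO output on stdin and
# print the value of one field, without the trailing carriage return.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" \
        '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Example against a live server (not run here):
#   redis-cli INFO persistence | info_field rdb_last_bgsave_status
```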

How to diagnose it

  1. Verify the sticky error. Run redis-cli INFO persistence and look for rdb_last_bgsave_status:err. If it is ok but writes are still failing, the issue is likely AOF-related or the error is coming from a replica with a different condition. If it is err, note rdb_last_save_time to see how stale the last successful snapshot is.

  2. Check disk space. Redis needs enough free space to write a temporary RDB file before atomically renaming it to the final file. Run df -h on the persistence volume. If the volume is full, that is the root cause. Also check inode usage with df -i, since a filesystem that holds many small files can run out of inodes while df -h still shows free space.

  3. Inspect logs for fork failures. If disk space is adequate, check the Redis server log and dmesg for messages like fork: Cannot allocate memory or OOM killer events. The Redis child process duplicates page tables during fork(). On a memory-heavy instance, this can fail if vm.overcommit_memory is 0 or if a container memory limit leaves no room for page table overhead.

  4. Validate permissions. Ensure the user running redis-server has write access to the directory returned by CONFIG GET dir and can create and rename files there. Do not assume the directory is correct after a migration or container restart.

  5. Check for path collisions. If you are running multiple Redis instances on the same host or shared volume, confirm each instance has a unique dir and dbfilename. Two instances writing the same dump.rdb simultaneously will corrupt it or cause one to fail.

  6. Assess whether AOF is active. If aof_enabled is 1 and aof_last_write_status is ok, you still have append-only durability while RDB is failing. This does not fix the MISCONF block, but it changes the urgency: you are not fully without persistence. If AOF is also failing, treat this as a critical data-loss risk.
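
The log inspection in steps 2 to 4 can be sketched as a small classifier that maps one log line to a likely cause. It matches standard errno and kernel OOM strings rather than exact Redis log formats, which vary by version; classify_save_error and its category labels are hypothetical.

```shell
# classify_save_error is a hypothetical helper. It maps one Redis or
# kernel log line to a likely cause by matching common error strings.
classify_save_error() {
    case "$1" in
        *"No space left on device"*|*"Read-only file system"*|*"Input/output error"*)
            echo "disk" ;;
        *"Cannot allocate memory"*)
            echo "fork-memory" ;;
        *"Permission denied"*)
            echo "permissions" ;;
        *"Out of memory"*|*"oom-kill"*|*"Killed process"*)
            echo "oom-killer" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example:
#   classify_save_error "Can't save in background: fork: Cannot allocate memory"
```

An unknown result means the line did not match any pattern, not that the save is healthy; read the surrounding log lines in that case.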

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| rdb_last_bgsave_status | Binary health of the last background save. | err persisting beyond one save interval. |
| Age of rdb_last_save_time | How long since the last successful snapshot. | Older than 2x the configured save interval. |
| Disk free on persistence volume | RDB and AOF need sufficient free space for temp files. | Less than 3x the current dataset size. |
| latest_fork_usec | The main thread is blocked during fork(). | Greater than 500ms, or trending upward with dataset growth. |
| total_error_replies rate | Direct evidence that clients are seeing failures. | Increasing while rdb_last_bgsave_status is err. |
| used_memory_rss vs host memory | Predicts fork failure and COW-driven OOM kills. | RSS exceeding 80% of available RAM. |
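
Two of these thresholds are simple arithmetic and can be sketched in shell. The helper names are hypothetical, and the limits are the ones from the table: snapshot age above 2x the save interval, and fork time above 500 ms (latest_fork_usec is reported in microseconds, so 500000).

```shell
# snapshot_is_stale and fork_is_slow are hypothetical helpers.

# snapshot_is_stale NOW LAST_SAVE INTERVAL
#   NOW and LAST_SAVE are Unix timestamps, INTERVAL is in seconds.
snapshot_is_stale() {
    age=$(( $1 - $2 ))
    if [ "$age" -gt $(( 2 * $3 )) ]; then
        echo "stale"
    else
        echo "fresh"
    fi
}

# fork_is_slow LATEST_FORK_USEC
#   500 ms expressed in microseconds is 500000.
fork_is_slow() {
    if [ "$1" -gt 500000 ]; then
        echo "slow"
    else
        echo "ok"
    fi
}
```

In practice NOW would come from date +%s and the other values from rdb_last_save_time and latest_fork_usec.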

Fixes

Disk full or I/O errors

Free space on the persistence volume. Delete old logs, oversized backups, or orphaned temp files left by interrupted saves. After cleanup, trigger a manual save:

redis-cli BGSAVE

Watch rdb_bgsave_in_progress and rdb_last_bgsave_status until it returns ok. Do not simply restart Redis; a restart with a full disk will fail to save during shutdown and may leave you with no recent RDB on startup.
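
Watching those two fields can be sketched as a polling loop. wait_for_bgsave is a hypothetical helper; the command used to fetch INFO is passed in as an argument so the loop can be tried against a stub, and the poll count and delay defaults are arbitrary.

```shell
# wait_for_bgsave is a hypothetical helper.
#   $1: command that prints INFO persistence
#       (default: redis-cli INFO persistence)
#   $2: maximum number of polls (default 60)
#   $3: seconds to sleep between polls (default 1)
# Prints the final rdb_last_bgsave_status, or "timeout".
wait_for_bgsave() {
    fetch=${1:-"redis-cli INFO persistence"}
    tries=${2:-60}
    delay=${3:-1}
    while [ "$tries" -gt 0 ]; do
        # Strip carriage returns: INFO uses CRLF line endings.
        info=$($fetch | tr -d '\r')
        busy=$(printf '%s\n' "$info" |
            awk -F: '$1 == "rdb_bgsave_in_progress" { print $2 }')
        if [ "$busy" = "0" ]; then
            printf '%s\n' "$info" |
                awk -F: '$1 == "rdb_last_bgsave_status" { print $2 }'
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep "$delay"
    done
    echo "timeout"
    return 1
}
```

Run redis-cli BGSAVE first, then wait_for_bgsave with no arguments. A printed ok means writes should resume; err means the root cause is still present.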

Fork failure or memory pressure

If the fork fails with a memory error and the host is not actually out of RAM, the most common cause is vm.overcommit_memory=0. In that mode the kernel can refuse the fork because the child could, in the worst case, need as much memory as the parent. Set it to 1 so the kernel allows the fork and relies on copy-on-write instead:

sysctl -w vm.overcommit_memory=1

Make it permanent by adding vm.overcommit_memory = 1 to /etc/sysctl.conf. This is a system-wide setting. If Redis runs inside a container, increase the container memory limit to account for copy-on-write overhead. Plan for at least 50% headroom above used_memory_rss on persistent instances; heavy writes during a fork can temporarily double RSS.
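
The 50% headroom rule is easy to check with integer arithmetic. A minimal sketch, assuming both values are in bytes; has_cow_headroom is a hypothetical helper name.

```shell
# has_cow_headroom is a hypothetical helper.
#   $1: used_memory_rss in bytes
#   $2: container or host memory limit in bytes
# Prints "ok" when the limit is at least 1.5x RSS (50% headroom).
has_cow_headroom() {
    needed=$(( $1 + $1 / 2 ))
    if [ "$2" -ge "$needed" ]; then
        echo "ok"
    else
        echo "tight"
    fi
}
```

For a 4 GiB RSS (4294967296 bytes) the limit must be at least 6 GiB (6442450944 bytes) for the check to print ok.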

If freeing memory or adding capacity is not immediately possible and you need writes to resume, see the temporary workarounds below. Only use them if you understand the durability tradeoff.

Permission or ownership issues

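Fix ownership so the Redis process can write to its configured directory. The example assumes the redis user and group and /var/lib/redis as the data directory; substitute the values from your deployment: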

chown redis:redis /var/lib/redis

If you use SELinux or AppArmor, check for denials in the audit log and adjust the profile. After fixing permissions, run BGSAVE and verify rdb_last_bgsave_status returns ok.

Path collisions in containers

If multiple instances target the same dump.rdb, reconfigure each instance with a unique dir or unique dbfilename, then restart the affected instances. Restarting is acceptable here because the root cause is configuration, not resource exhaustion.
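
To spot collisions across many instances, collect each instance's full RDB path (dir plus dbfilename) and look for duplicates. A minimal sketch follows; find_rdb_collisions is a hypothetical helper, and it compares paths as plain strings, so it will not notice two different paths that resolve to the same file through symlinks or bind mounts.

```shell
# find_rdb_collisions is a hypothetical helper. It reads one
# "dir/dbfilename" path per line on stdin and prints each path
# that appears more than once.
find_rdb_collisions() {
    sort | uniq -d
}

# Example:
#   printf '%s\n' /data/a/dump.rdb /data/b/dump.rdb /data/a/dump.rdb |
#       find_rdb_collisions
```

Empty output means no two instances in the list share the same path string.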

Temporary workarounds and their tradeoffs

If you must unblock writes immediately and you are running a pure cache workload where data can be reconstructed from another source, you can disable the write block:

redis-cli CONFIG SET stop-writes-on-bgsave-error no

This makes the instance accept writes again even though RDB saves are failing. Your data loss window then covers everything written since the last successful save, and it keeps growing until a save succeeds. If you do this, also call CONFIG REWRITE to persist the change, or it will revert on restart.

Alternatively, if you rely on AOF for durability and do not need RDB snapshots at all, you can disable automatic RDB saves:

redis-cli CONFIG SET save ""

Again, follow with CONFIG REWRITE if you want the change to survive restart.

Do not use these workarounds on primary databases where RDB is the only persistence mechanism. The correct path is to fix the underlying failure and then run a successful BGSAVE.

Prevention

  • Monitor disk space proactively. Maintain at least 3x the dataset size as free space on the persistence volume to accommodate temp files, AOF rewrites, and RDB snapshots simultaneously.
  • Set vm.overcommit_memory=1 on every Redis host. Fork failures on healthy machines are almost always caused by this setting.
  • Size memory for copy-on-write. If you use RDB or AOF rewrite, keep used_memory_rss below roughly 50% of physical RAM on persistent instances. On cache-only instances, a higher ceiling of about 75% is acceptable.
  • Audit persistence completion. Do not assume that because save directives exist, saves are succeeding. Alert on rdb_last_bgsave_status:err and on stale rdb_last_save_time.
  • Validate container and orchestration configs. Ensure each Redis pod or container has a distinct persistent volume and does not share a dump.rdb path with another instance.

How Netdata helps

  • Netdata collects rdb_last_bgsave_status and rdb_last_save_time from INFO persistence, so you can alert on a sticky err before application write failures spike.
  • Disk space and memory metrics on the same node are correlated with Redis persistence health, making it faster to distinguish a disk-full incident from a fork-failure incident.
  • total_error_replies and throughput charts let you confirm the exact moment writes started failing and whether the failure correlates with a scheduled BGSAVE.
  • RSS and used_memory are plotted together, which helps you forecast when copy-on-write overhead during a fork will exceed available RAM.
  • Fork duration (latest_fork_usec) is tracked over time, revealing when Transparent Huge Pages or memory pressure are slowing the background save process before it fails entirely.