Redis LOADING Redis is loading the dataset in memory - why and how long

What this means

When INFO persistence returns loading:1, Redis is reading an RDB dump or replaying an AOF log into memory. This happens after any restart where persistence files are present. The server accepts connections and still answers INFO, but most other commands, including PING, GET, SET, and HGETALL, are rejected with a -LOADING error. The loading_loaded_perc, loading_loaded_bytes, and loading_eta_seconds fields expose real-time progress. While loading is active, latency, hit rate, and eviction metrics are meaningless. Load duration ranges from seconds for small RDB snapshots to tens of minutes, or even hours, for large AOF files on slow disks.

If the instance is a replica, the LOADING error can also appear when the replica reconnects to a master that is itself in the loading phase. In that case, the replica is healthy; the master is not ready for replication connections. Other symptoms, such as 100% cache misses or replica disconnects, are side effects of the same root cause. The loading state is expected after restart, but if it persists far beyond your baseline, it indicates slow disk I/O, an unexpectedly large persistence file, or corruption causing Redis to hang or abort.

flowchart TD
    A[Client receives -LOADING error] --> B[Check INFO persistence]
    B --> C{loading:1?}
    C -->|Yes| D{Is uptime low?}
    D -->|Yes| E[Expected startup restore]
    D -->|No| F[Unexpected reload or hung process]
    C -->|No| G[Replica reading from a loading master]
    E --> H{Progress advancing?}
    H -->|Yes| I[Monitor disk I/O and ETA]
    H -->|No| J[Run redis-check-aof and redis-check-rdb]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Normal RDB restore after restart | loading:1 after an uptime reset; loading_loaded_perc climbs steadily | INFO persistence progress fields |
| Normal AOF restore after restart | Load duration is much longer than RDB baseline; progress advances slowly | AOF file size in your Redis data directory |
| Corrupt AOF preventing startup | Redis aborts with a bad file format error, or load hangs indefinitely | redis-check-aof output on the AOF file |
| Corrupt RDB preventing startup | Startup fails after partial RDB load | redis-check-rdb on the dump file |
| Slow disk I/O | loading_eta_seconds grows or progress stalls; disk latency spikes | OS-level disk metrics |
| Replica connecting to loading master | Replica logs show LOADING errors but its own loading is 0 | Master’s INFO persistence state |

Quick checks

# Check loading state and progress
redis-cli INFO persistence | grep -E "loading:|loading_loaded_perc|loading_eta_seconds|loading_loaded_bytes"

# Confirm recent restart
redis-cli INFO server | grep uptime_in_seconds

# Determine which persistence method is active
redis-cli INFO persistence | grep -E "aof_enabled|rdb_last_save_time"

# Check persistence file sizes
ls -lh <your-redis-data-directory>

# Verify AOF file integrity
redis-check-aof <path-to-aof>

# Verify RDB file integrity
redis-check-rdb <path-to-rdb>

Replace <your-redis-data-directory> and file paths with the locations configured for your deployment.
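The grep above prints whole lines. When a script needs just one value, a small helper can extract it. This is a minimal sketch: the function name info_field is made up for illustration, and it only assumes the key:value layout and CRLF line endings of INFO output.

```shell
# Print the value of one field from INFO-style "key:value" text on stdin.
# INFO replies use CRLF line endings, so the carriage returns are stripped first.
# The match is anchored on "key:" so "loading" does not match "loading_loaded_perc".
info_field() {
    tr -d '\r' | awk -v key="$1" '
        index($0, key ":") == 1 { print substr($0, length(key) + 2); exit }
    '
}

# Usage against a live server (not run here):
#   redis-cli INFO persistence | info_field loading
#   redis-cli INFO persistence | info_field loading_eta_seconds
```

If the field is absent, the helper prints nothing, which callers can treat as "unknown" rather than as 0.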

How to diagnose it

  1. Verify loading:1 in INFO persistence. If it is 0 and a replica reports LOADING, the master is still restoring. Diagnose the master.
  2. Check uptime_in_seconds in INFO server. A low value confirms a recent restart. A high value with loading:1 suggests an unexpected internal reload or a hung process.
  3. Inspect progress fields. loading_loaded_perc should increase and loading_eta_seconds should trend toward zero. If either is stuck, the load is not progressing.
  4. Identify the persistence source. If aof_enabled is 1 and the AOF file exists, Redis is replaying AOF. If only RDB is configured, it is loading the RDB snapshot. AOF replay is sequential and typically much slower than loading an RDB snapshot because it replays every write operation.
  5. Compare duration to your baseline. RDB load time is roughly proportional to dataset size divided by disk read throughput. AOF load time depends on the number of write operations logged, so a high-write workload with a long AOF history takes significantly longer.
  6. If progress is stalled or Redis failed to start, run redis-check-aof on the AOF file or redis-check-rdb on the dump file. Look for corruption, truncation, or bad format errors.
  7. Check OS-level disk I/O metrics. If disk latency or queue depth is elevated, the storage subsystem is the bottleneck. In virtualized environments, also check for hypervisor-level I/O throttling.
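Steps 3 and 5 can be approximated in a script. The sketch below compares loading_loaded_bytes across two INFO snapshots to detect a stall, and computes the rough size-over-throughput estimate for an RDB load. The function names and the sample figures in the usage comments are illustrative assumptions, not measured values.

```shell
# Print the value of one field from INFO-style text on stdin (CRLF tolerated).
info_field() {
    tr -d '\r' | awk -v key="$1" '
        index($0, key ":") == 1 { print substr($0, length(key) + 2); exit }
    '
}

# Succeed (exit 0) only if loading_loaded_bytes grew between two INFO snapshots.
# $1 = earlier INFO persistence text, $2 = later INFO persistence text.
load_is_progressing() (
    before=$(printf '%s\n' "$1" | info_field loading_loaded_bytes)
    after=$(printf '%s\n' "$2" | info_field loading_loaded_bytes)
    [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]
)

# Rough RDB load estimate in whole seconds: file size / sustained read throughput.
# $1 = file size in bytes, $2 = throughput in bytes per second.
estimate_load_seconds() {
    awk -v size="$1" -v rate="$2" 'BEGIN {
        if (rate <= 0) exit 1
        print int(size / rate)
    }'
}

# Usage against a live server (not run here):
#   first=$(redis-cli INFO persistence); sleep 5; second=$(redis-cli INFO persistence)
#   load_is_progressing "$first" "$second" || echo "load appears stalled"
#   estimate_load_seconds 10737418240 209715200   # 10 GiB file at 200 MiB/s
```

The estimate covers disk reads only. Actual load time also includes parsing and memory allocation, and it does not apply to AOF replay, which scales with the number of logged operations.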

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| loading | Binary state of startup restoration | Remains 1 for longer than your baseline |
| loading_loaded_perc | Percentage of file loaded | Stuck or advancing slower than expected |
| loading_eta_seconds | Estimated time to completion | Growing instead of shrinking |
| loading_loaded_bytes | Bytes loaded so far | Not increasing between checks |
| uptime_in_seconds | Time since restart | Low value confirms cold start |
| Disk I/O latency | Loading is disk-bound | Sustained high latency or queue depth |

Fixes

Normal restore: wait

If loading_loaded_perc is advancing and disk I/O is healthy, wait. Do not restart Redis; a restart resets progress to zero and begins the load again. If a load balancer or orchestrator sits in front of Redis, ensure its health check treats loading:1 as not-ready. During loading the port is open and Redis replies to every command, so a plain TCP probe, or a PING probe that only checks that some reply arrived, falsely reports the instance as healthy. PING itself is answered with -LOADING rather than PONG. A proper readiness check should require an exact PONG, attempt a lightweight data command, or parse INFO persistence directly.
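A readiness check that parses INFO persistence could look like the following sketch. The function name redis_ready is hypothetical, and how the exit status is wired into a load balancer or orchestrator depends on your platform.

```shell
# Exit 0 only when INFO persistence text on stdin reports loading:0.
# Empty input (for example, redis-cli could not connect) counts as not ready.
redis_ready() (
    state=$(tr -d '\r' | awk -F: '$1 == "loading" { print $2; exit }')
    [ "$state" = "0" ]
)

# Usage as a probe command (not run here):
#   redis-cli INFO persistence | redis_ready
```

Treating missing output as not-ready is a deliberate choice here: a probe that fails open would mark an unreachable instance as healthy.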

Slow AOF restore: reduce startup time

If AOF replay is too slow for your startup time requirements, consider switching to RDB snapshots or the RDB-AOF hybrid mode for faster restarts. The tradeoff is coarser durability: RDB snapshots can lose more data on crash than AOF. You can also trigger an AOF rewrite before planned restarts to compact the AOF file and reduce replay time.
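For the planned-restart case, one way to compact the AOF first is to issue BGREWRITEAOF and poll the aof_rewrite_in_progress and aof_rewrite_scheduled fields of INFO persistence until both are 0. The sketch below assumes redis-cli reaches the right instance without extra options; the function names, the 2-second poll interval, and the 150-poll limit are arbitrary choices for illustration.

```shell
# Succeed when INFO persistence text on stdin shows no AOF rewrite running or queued.
# Both fields must be present and 0; missing fields count as "not idle".
aof_rewrite_idle() {
    tr -d '\r' | awk -F: '
        $1 == "aof_rewrite_in_progress" || $1 == "aof_rewrite_scheduled" {
            seen++
            if ($2 != 0) busy = 1
        }
        END { if (seen == 2 && !busy) exit 0; exit 1 }
    '
}

# Trigger a rewrite, then poll until it finishes or the attempt budget runs out.
# $1 = maximum number of polls (default 150, about 5 minutes at 2 s each).
compact_aof_before_restart() (
    tries=${1:-150}
    # This only catches redis-cli failing to run or connect.
    redis-cli BGREWRITEAOF || exit 1
    while [ "$tries" -gt 0 ]; do
        sleep 2
        if redis-cli INFO persistence | aof_rewrite_idle; then
            exit 0
        fi
        tries=$((tries - 1))
    done
    exit 1
)

# Usage before a planned restart (not run here):
#   compact_aof_before_restart && echo "AOF compacted, safe to restart"
```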

Corrupt AOF: repair and truncate

If redis-check-aof reports corruption, run redis-check-aof --fix <file>. This is destructive: it discards everything from the corruption point to the end of the file. After running it, restart Redis. If aof-load-truncated is enabled (the default since Redis 3.0), Redis already handles cleanly truncated AOF files automatically. Byte-level corruption in the middle of the file, however, requires manual repair and results in data loss from the corruption point forward.
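Because --fix truncates the file in place, it is prudent to copy the AOF aside first. The following is a sketch under assumptions: the function name and backup suffix are made up, CHECK_AOF is a hypothetical override hook for dry runs, Redis is assumed to be stopped, and the AOF is assumed to be a single file at the path you pass in.

```shell
# Copy the AOF aside, then run the repair on the original file.
# $1 = path to the AOF file. Prints the backup location before repairing.
repair_aof_with_backup() (
    aof=$1
    if [ ! -f "$aof" ]; then
        echo "no such file: $aof" >&2
        exit 1
    fi
    backup="$aof.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves mode and timestamps so the copy mirrors the original.
    cp -p "$aof" "$backup" || exit 1
    echo "backup written to $backup"
    # CHECK_AOF defaults to the real tool; set it to another command to dry-run.
    ${CHECK_AOF:-redis-check-aof} --fix "$aof"
)

# Usage (not run here):
#   repair_aof_with_backup <path-to-aof>
```

Keeping the untouched copy means the discarded tail can still be inspected later if the data loss turns out to matter.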

Corrupt RDB: diagnose and restore

Use redis-check-rdb to confirm RDB corruption. If AOF is enabled and its file is healthy, Redis loads the AOF instead of the RDB at startup, so the corrupt RDB does not block the restart. If only RDB exists and it is corrupt, restore from a backup RDB file or start with an empty dataset, accepting data loss. To start empty, move the corrupt RDB file out of the way (or delete it) before restarting Redis.
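Moving the file aside is usually safer than deleting it, since the corrupt dump stays available for later analysis. A minimal sketch, assuming Redis is stopped and using a made-up function name and suffix:

```shell
# Rename a corrupt RDB so Redis no longer finds it at startup.
# $1 = path to the RDB file. Prints the new location on success.
quarantine_rdb() (
    rdb=$1
    if [ ! -f "$rdb" ]; then
        echo "no such file: $rdb" >&2
        exit 1
    fi
    dest="$rdb.corrupt.$(date +%Y%m%d%H%M%S)"
    mv "$rdb" "$dest" && echo "$dest"
)

# Usage (not run here):
#   quarantine_rdb <path-to-rdb>
```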

Disk bottleneck: upgrade or relocate

If disk latency is the constraint, move Redis persistence to faster storage. For containerized deployments, ensure the persistence volume is not sharing IOPS with noisy neighbors. In cloud environments, check for volume throughput caps.

Prevention

  • Configure readiness probes that poll INFO persistence and wait for loading:0 before marking the instance ready.
  • Establish a baseline for loading duration after each restart. Alert when the loading phase exceeds that baseline by a significant margin.
  • If startup time matters, prefer RDB snapshots over AOF alone, or ensure AOF rewrites run frequently enough to keep the AOF file compact.
  • Keep valid backup RDB snapshots for recovery from corruption.
  • Monitor disk I/O latency as part of your standard Redis health checks so you can forecast when storage will become a bottleneck.
  • For replicas, monitor the master's loading state as well: a replica that reconnects to a master in the loading phase surfaces the LOADING error downstream.
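The baseline alert can be sketched with the loading_start_time field of INFO persistence, which reports when the current load began in Unix epoch seconds. The function name and the 600-second threshold in the usage comment are illustrative assumptions; choose the baseline and margin from your own measurements.

```shell
# Exit 0 if loading is active and has run longer than the allowed limit.
# stdin = INFO persistence text, $1 = current epoch seconds, $2 = limit in seconds.
loading_overdue() {
    tr -d '\r' | awk -F: -v now="$1" -v limit="$2" '
        $1 == "loading" { active = $2 }
        $1 == "loading_start_time" { start = $2 }
        END { if (active == 1 && start > 0 && now - start > limit) exit 0; exit 1 }
    '
}

# Usage from a cron job or monitoring hook (not run here):
#   redis-cli INFO persistence | loading_overdue "$(date +%s)" 600 && echo "loading overdue"
```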

How Netdata helps

  • Correlates the loading state with uptime_in_seconds to immediately distinguish a startup restore from an unexpected failure.
  • Tracks loading_loaded_perc and loading_eta_seconds so you can visualize restore progress and share time estimates during incidents.
  • Alerts when loading remains active beyond your configured baseline, flagging disk stalls or corruption before operators manually check.
  • Correlates the loading phase with disk I/O saturation metrics to identify storage bottlenecks.
  • Uses the loading state to suppress false alerts for eviction, hit rate, and latency while the dataset is being restored.