Redis BUSY Redis is busy running a script: blocking Lua and how to recover
redis-cli returns (error) BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE. Normal commands stall. Redis is not down, but it might as well be: a Lua script is holding the single event loop hostage and will not yield until it finishes or you intervene.
Redis executes EVAL and EVALSHA atomically. While a script runs, no other command processes. Once execution exceeds lua-time-limit (default 5000 ms), Redis replies with BUSY to other clients. The script itself continues until it finishes, is killed, or the server shuts down. Whether the script has already performed writes determines whether you can kill it safely or must choose between waiting and a hard shutdown.
What this means
Redis runs Lua scripts in the main thread. Atomicity guarantees that no other client sees intermediate script state, but the tradeoff is total event-loop blocking for as long as the script runs.
SCRIPT KILL is the primary recovery command. It succeeds only if the running script has not yet executed any write command. If the script has written even once, Redis returns (error) UNKILLABLE Sorry the script already executed write commands. Redis cannot leave the dataset in a partially modified state, so it refuses to interrupt a script that has performed writes. When a script is unkillable, the only in-process recovery option is SHUTDOWN NOSAVE, which kills the Redis process immediately and discards any data written since the last RDB snapshot or AOF fsync.
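The outcomes of SCRIPT KILL can be told apart by matching on the reply text. The following is a minimal shell sketch rather than an official tool: the function names and the REDIS_CLI override variable are assumptions introduced for illustration, and the NOTBUSY branch covers the reply Redis gives when no script is running.

```shell
# Hypothetical helper: map a SCRIPT KILL reply to a short status word.
classify_kill_reply() {
  case "$1" in
    OK*)          echo "killed" ;;      # script had not written; it was terminated
    *UNKILLABLE*) echo "unkillable" ;;  # script already wrote; wait or SHUTDOWN NOSAVE
    *NOTBUSY*)    echo "not-busy" ;;    # no script is running right now
    *)            echo "unknown" ;;     # anything else needs a human to look at it
  esac
}

# Hypothetical wrapper: send SCRIPT KILL and classify the reply.
# REDIS_CLI is an assumed override so host, port, or auth flags can be added.
try_script_kill() {
  reply=$(${REDIS_CLI:-redis-cli} SCRIPT KILL 2>&1)
  classify_kill_reply "$reply"
}
```

Matching on substrings keeps the helper working whether or not redis-cli prefixes the reply with (error), which depends on whether its output is a terminal.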
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Unbounded iteration over large keys inside Lua | BUSY appears after a batch job starts; instantaneous_ops_per_sec drops to zero | SLOWLOG GET and CLIENT LIST to identify the source client and key |
| Application script logic that scales with data volume | Script runtime grows as sets, lists, or hashes grow; recurring BUSY from the same client IP | Key cardinality with SCARD, LLEN, HLEN, and the script source code |
| Blocking command misuse inside script | Script polls in a loop and exceeds lua-time-limit without making progress, because blocking commands return immediately inside scripts instead of waiting | Script content for BLPOP, BRPOP, WAIT, or XREAD calls |
| Reliance on key expiry during execution (Redis 7.2+) | Script loops waiting for a TTL that never arrives because time sampling is frozen | Script logic and INFO server version |
Quick checks
Run these read-only commands to assess the situation without making it worse. Note that once a script exceeds lua-time-limit, Redis rejects all commands except SCRIPT KILL and SHUTDOWN NOSAVE. If Redis is already BUSY, run CLIENT LIST, SLOWLOG, and INFO from a replica or after recovery.
# Verify Redis is in BUSY state and measure response time
time timeout 10 redis-cli PING
# Check the configured lua-time-limit
redis-cli CONFIG GET lua-time-limit
# Identify the client running the script
redis-cli CLIENT LIST
# Review recent slow commands for EVAL/EVALSHA
redis-cli SLOWLOG GET 10
# Check how often scripts are invoked
redis-cli INFO commandstats | grep -E "cmdstat_eval|cmdstat_evalsha"
# Check current throughput to confirm event-loop blocking
redis-cli INFO stats | grep instantaneous_ops_per_sec
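The first check above can be wrapped so that a hang and a BUSY reply are reported separately. This is a sketch under stated assumptions: probe_redis, REDIS_CLI, and PROBE_TIMEOUT are hypothetical names, and it relies on the coreutils timeout command exiting with status 124 when the time limit is hit.

```shell
# Hypothetical helper: classify a single PING probe.
# Prints "ok", "busy", "timeout", or "error".
probe_redis() {
  reply=$(timeout "${PROBE_TIMEOUT:-5}" ${REDIS_CLI:-redis-cli} PING 2>&1)
  status=$?
  if [ "$status" -eq 124 ]; then
    # No reply within the limit: the script has probably not yet
    # crossed lua-time-limit, so Redis is not answering at all.
    echo "timeout"
    return
  fi
  case "$reply" in
    PONG)   echo "ok" ;;     # event loop is serving commands
    *BUSY*) echo "busy" ;;   # script has exceeded lua-time-limit
    *)      echo "error" ;;  # connection refused, auth failure, and so on
  esac
}
```

A "timeout" result followed a few seconds later by "busy" is consistent with a script that has just crossed the configured limit.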
How to diagnose it
- Confirm event-loop blocking. Run timeout 5 redis-cli PING. If it hangs or returns BUSY, the main thread is blocked.
- Identify the offending client. If you caught the incident before the timeout, CLIENT LIST shows active connections. Look for cmd=eval or cmd=evalsha. Otherwise, check application logs or the Redis slowlog after recovery.
- Inspect slowlog history. SLOWLOG GET 50 after recovery reveals whether the same script pattern appears repeatedly with high execution times.
- Attempt SCRIPT KILL. This is both diagnosis and recovery. OK means the script had not written and is terminated. UNKILLABLE means it has performed writes and cannot be interrupted.
- Assess resource context. Check INFO memory for pressure toward maxmemory, and INFO replication for replica timeouts caused by the blocked primary.
- Determine scope of impact. If the script is unkillable, estimate completion time from its logic and data size. Compare that against your recovery time objective.
flowchart TD
A[BUSY error received] --> B[Attempt SCRIPT KILL]
B -->|OK| C[Script terminated]
B -->|UNKILLABLE error| D{Can you wait?}
D -->|Yes| E[Wait for completion]
D -->|No| F[SHUTDOWN NOSAVE]
F --> G[Process exits unwritten data lost]
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| instantaneous_ops_per_sec | Event-loop blocking drops throughput to zero | Sustained drop while clients remain connected |
| Slowlog EVAL / EVALSHA entries | Identifies specific slow scripts before they trigger BUSY | Entries > 100 ms or rapid slowlog growth |
| used_cpu_user_main_thread (Redis 6.2+) | Main-thread saturation from script execution | Rate approaching 1.0 during script workloads |
| Replication offset lag | A blocked primary stops sending replication traffic | Lag growing while ops per sec is zero |
| evicted_keys rate | Long scripts can aggravate memory pressure | Spike concurrent with or following long script execution |
| total_error_replies rate | BUSY errors contribute to client-visible failures | Rate increase correlating with script execution window |
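Several of these signals come from INFO output, which is plain name:value text. The sketch below shows one way to pull a field out of a captured INFO sample and flag the "zero throughput with connected clients" pattern. The function names are hypothetical. Because Redis rejects INFO while it is BUSY, the sample would have to come from a scrape taken around the stall, not during it.

```shell
# Hypothetical helper: read INFO output on stdin and print one field's value.
# INFO lines end in CRLF, so the carriage returns are stripped first.
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2; exit }'
}

# Hypothetical check: succeed when throughput is zero while clients are
# still connected. Reads a full INFO sample on stdin.
stall_suspected() {
  info=$(cat)
  ops=$(printf '%s\n' "$info" | info_field instantaneous_ops_per_sec)
  clients=$(printf '%s\n' "$info" | info_field connected_clients)
  [ "${ops:-1}" -eq 0 ] && [ "${clients:-0}" -gt 0 ]
}
```

For example, redis-cli INFO | stall_suspected && echo "possible event-loop stall" would print the message only when both conditions hold in that sample.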
Fixes
Kill a read-only script
If SCRIPT KILL returns OK, the script had not yet executed a write command. The script client receives an error, but the dataset remains consistent. This is the safest recovery path. After killing it, identify the source client via CLIENT LIST and audit the script before running it again.
Wait for an unkillable script to finish
If SCRIPT KILL returns UNKILLABLE, the script has already modified the dataset. Redis cannot roll back partial script effects, so waiting is often the least destructive option if the script is expected to finish in a predictable time and your application tolerates the delay. CLIENT LIST is rejected while Redis is BUSY, so use PING to confirm the script is still executing: a BUSY reply means it is still running, and PONG means it has finished.
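Since Redis answers ordinary commands with BUSY for as long as the script runs, waiting can be expressed as a bounded PING poll. The following is a sketch only: wait_for_script, MAX_POLLS, POLL_INTERVAL, and REDIS_CLI are hypothetical names, and the defaults are arbitrary.

```shell
# Hypothetical helper: poll with PING until Redis stops answering BUSY
# or the poll budget is used up.
# Returns 0 once PONG is seen, 1 if the budget runs out first.
wait_for_script() {
  polls=0
  while [ "$polls" -lt "${MAX_POLLS:-60}" ]; do
    reply=$(${REDIS_CLI:-redis-cli} PING 2>&1)
    case "$reply" in
      PONG)   return 0 ;;  # event loop is serving commands again
      *BUSY*) ;;           # script still running; keep waiting
      *)      echo "unexpected reply: $reply" >&2 ;;
    esac
    polls=$((polls + 1))
    sleep "${POLL_INTERVAL:-1}"
  done
  return 1
}
```

A non-zero return is the signal to revisit the decision between waiting longer and a hard shutdown; the helper deliberately takes no destructive action itself.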
Use SHUTDOWN NOSAVE as last resort
When a script is unkillable, blocking all clients, and you cannot wait, SHUTDOWN NOSAVE forces immediate process termination. All data not yet persisted to disk is discarded: not only the script’s in-flight changes but also any other writes since the last RDB save or AOF fsync. Use this only when the cost of continued downtime exceeds the cost of data loss.
Audit and harden the script
After recovery, inspect the Lua source.
- Replace unbounded loops over large collections with application-side SCAN or pagination.
- Use redis.pcall() instead of redis.call() to catch errors without aborting the entire script.
- Remove any reliance on print(). Use redis.log(redis.LOG_WARNING, msg) for debug output.
- Remove dependency on key expiry occurring mid-script. Since Redis 7.2, time sampling is frozen during scripts, so keys do not expire while a script runs.
- Avoid blocking commands such as BLPOP, BRPOP, or WAIT inside scripts.
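Application-side pagination, as recommended in the first item, can be sketched with a cursor loop over SSCAN, so that no single command walks the whole collection. The function name, REDIS_CLI, and PAGE_SIZE are assumptions made for this example. It also assumes members contain no newlines, and that the caller tolerates duplicates, which SSCAN may return.

```shell
# Hypothetical helper: print every member of a set, one page at a time.
scan_set_members() {
  key=$1
  cursor=0
  while :; do
    # In non-interactive mode redis-cli prints the next cursor on the
    # first line, followed by one member per line.
    reply=$(${REDIS_CLI:-redis-cli} SSCAN "$key" "$cursor" COUNT "${PAGE_SIZE:-100}") || return 1
    cursor=$(printf '%s\n' "$reply" | head -n 1)
    # Stop on anything that is not a numeric cursor, such as an error reply.
    case "$cursor" in
      ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$reply" | tail -n +2
    # A cursor of 0 means the iteration is complete.
    [ "$cursor" = "0" ] && break
  done
}
```

Each SSCAN call does a bounded amount of work, so other clients get served between pages. The tradeoff is that the iteration is no longer atomic.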
Prevention
- Tune lua-time-limit. The default of 5000 ms is generous. For latency-sensitive workloads, lower it so runaway scripts surface as BUSY sooner, giving you more time to react before clients time out.
- Review script complexity in staging. Test every production Lua script against data volumes at least as large as production. A script that runs in 10 ms against 100 elements may take minutes against 10 million.
- Restrict script execution via ACLs. Use Redis 6.0+ ACLs to limit which users can run EVAL, EVALSHA, and FUNCTION LOAD. This mitigates both accidental production scripts and exploitation of Lua engine vulnerabilities such as CVE-2025-49844.
- Monitor slowlog for script patterns. Any recurring EVAL or EVALSHA in the slowlog is a candidate for refactoring.
- Do not rely on key expiry inside scripts. Because keys cannot expire during script execution, any logic that spins waiting for a TTL will loop forever.
- Set maxmemory and an eviction policy. If maxmemory is reached and no keys can be evicted, a write inside a script fails and aborts the script unless redis.pcall() catches the error.
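The first and third items can be applied at runtime with CONFIG SET and ACL SETUSER. The sketch below is illustrative only: the function name, the app_user account, the app:* key pattern, the APP_PASSWORD variable, and the 1000 ms limit are all placeholders, not recommendations.

```shell
# Hypothetical helper: lower the script time limit and create a user
# that cannot run scripting commands.
harden_scripting() {
  cli=${REDIS_CLI:-redis-cli}
  # Reply BUSY after 1 second of script execution instead of 5.
  $cli CONFIG SET lua-time-limit 1000
  # Allow all commands on app:* keys except the @scripting category.
  $cli ACL SETUSER app_user on ">$APP_PASSWORD" "~app:*" "+@all" "-@scripting"
}
```

CONFIG SET changes only the running instance, so the same value would also need to go into the configuration file to survive a restart. Passing a password on the command line exposes it to process listings, so a real deployment would likely manage users through an ACL file instead.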
How Netdata helps
- Correlate throughput drops with CPU saturation and replication lag to confirm event-loop blocking.
- Alert on unexpected uptime_in_seconds resets after SHUTDOWN NOSAVE.
- Track evicted_keys and used_memory alongside BUSY events to spot memory pressure from scripts.
- Surface slowlog metrics and per-command statistics to spot EVAL/EVALSHA outliers before they trigger BUSY.
- Monitor total_error_replies to quantify client impact.
Related guides
- How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
- Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
- Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
- Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
- Redis event loop blocked: when one slow command freezes everything: /guides/redis/redis-event-loop-blocked/
- Redis eviction policy tuning: allkeys-lru vs volatile-ttl vs noeviction: /guides/redis/redis-eviction-policy-tuning/
- Redis fork/COW memory storm: why persistence doubles RSS and OOM-kills the box: /guides/redis/redis-fork-cow-storm/
- Redis KEYS command blocking production: why to replace it with SCAN: /guides/redis/redis-keys-command-blocking-production/
- Redis latest_fork_usec too high: THP, NUMA, and fork latency: /guides/redis/redis-latest-fork-usec-high/
- Redis maxmemory not set: why every production instance needs a memory limit: /guides/redis/redis-maxmemory-not-set/
- MISCONF Redis is configured to save RDB snapshots - what it means and how to fix it: /guides/redis/redis-misconf-rdb-snapshots/
- Redis monitoring checklist: the signals every production instance needs: /guides/redis/redis-monitoring-checklist/







