Redis BUSY Redis is busy running a script: blocking Lua and how to recover

redis-cli returns "(error) BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE." Every other command gets the same reply. Redis is not down, but it might as well be: a Lua script is holding the single event loop hostage and will not yield until it finishes or you intervene.

Redis executes EVAL and EVALSHA atomically. While a script runs, no other command is processed. Once execution exceeds lua-time-limit (default 5000 ms), Redis replies with BUSY to other clients. The script itself continues until it finishes, is killed, or the server shuts down. Whether the script has already performed writes determines whether you can kill it safely or must choose between waiting and a hard shutdown.

What this means

Redis runs Lua scripts in the main thread. Atomicity guarantees that no other client sees intermediate script state, but the tradeoff is total event-loop blocking. Until lua-time-limit elapses, commands from other clients simply wait unanswered; after it elapses, Redis starts answering them with BUSY while the script keeps running.

SCRIPT KILL is the primary recovery command. It succeeds only if the running script has not yet executed any write command. If the script has written even once, Redis returns (error) UNKILLABLE Sorry the script already executed write commands. Redis cannot leave the dataset in a partially modified state, so it refuses to interrupt a script that has performed writes. When a script is unkillable, the only in-process recovery option is SHUTDOWN NOSAVE, which kills the Redis process immediately and discards any data written since the last RDB snapshot or AOF fsync.
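To see both outcomes of SCRIPT KILL first-hand, the two cases can be rehearsed on a throwaway instance. The sketch below is only that: it assumes a disposable Redis listening on a hypothetical port (DEMO_PORT, default 6390), and the function names and the demo:busy key are made up for illustration. Never run it against a production server, because both scripts spin forever by design.

```shell
# Assumption: a disposable Redis instance on this port. DEMO_PORT is hypothetical.
DEMO_PORT="${DEMO_PORT:-6390}"

# Starts a script that never writes, so SCRIPT KILL is allowed to stop it.
start_killable_script() {
  redis-cli -p "$DEMO_PORT" EVAL 'while true do end' 0 &
}

# Starts a script that writes once and then spins, so SCRIPT KILL is refused.
start_unkillable_script() {
  redis-cli -p "$DEMO_PORT" EVAL "redis.call('SET', KEYS[1], '1') while true do end" 1 demo:busy &
}

# Attempts the recovery command; run it from a second terminal
# once lua-time-limit has elapsed.
try_kill() {
  redis-cli -p "$DEMO_PORT" SCRIPT KILL
}
```

After start_unkillable_script, the only way out on the demo instance is redis-cli -p "$DEMO_PORT" SHUTDOWN NOSAVE, which is exactly the trade-off described above.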

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Unbounded iteration over large keys inside Lua | BUSY appears after a batch job starts; instantaneous_ops_per_sec drops to zero | SLOWLOG GET and CLIENT LIST to identify the source client and key |
| Application script logic that scales with data volume | Script runtime grows as sets, lists, or hashes grow; recurring BUSY from the same client IP | Key cardinality with SCARD, LLEN, HLEN, and the script source code |
| Blocking command misuse inside script | Script hangs indefinitely or exceeds lua-time-limit without obvious CPU burn | Script content for BLPOP, BRPOP, WAIT, or XREAD calls |
| Reliance on key expiry during execution (Redis 7.2+) | Script loops waiting for a TTL that never arrives because time sampling is frozen | Script logic and INFO server version |

Quick checks

Run these read-only commands to assess the situation without making it worse. Note that once a script exceeds lua-time-limit, Redis rejects all commands except SCRIPT KILL and SHUTDOWN NOSAVE. If Redis is already BUSY, run CLIENT LIST, SLOWLOG, and INFO from a replica or after recovery.

# Verify Redis is in BUSY state and measure response time
time timeout 10 redis-cli PING

# Check the configured lua-time-limit
redis-cli CONFIG GET lua-time-limit

# Identify the client running the script
redis-cli CLIENT LIST

# Review recent slow commands for EVAL/EVALSHA
redis-cli SLOWLOG GET 10

# Check how often scripts are invoked
redis-cli INFO commandstats | grep -E "cmdstat_eval|cmdstat_evalsha"

# Check current throughput to confirm event-loop blocking
redis-cli INFO stats | grep instantaneous_ops_per_sec
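INFO replies are CRLF-terminated key:value lines, so a plain grep leaves a trailing carriage return on the value. A small filter like the following sketch makes single fields easier to use in scripts; the function name info_field is an illustrative choice, not part of redis-cli.

```shell
# Reads INFO output on stdin and prints the value of one field.
# $1: exact field name, for example instantaneous_ops_per_sec
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}
```

Typical use would be redis-cli INFO stats | info_field instantaneous_ops_per_sec, run against a replica or after recovery, since the primary rejects INFO while it is BUSY.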

How to diagnose it

  1. Confirm event-loop blocking. Run timeout 5 redis-cli PING. If it hangs or returns BUSY, the main thread is blocked.
  2. Identify the offending client. CLIENT LIST cannot run on the primary while the script holds the event loop, so use it right after recovery and look for connections with cmd=eval or cmd=evalsha. If the client has already disconnected, check application logs or the Redis slowlog.
  3. Inspect slowlog history. SLOWLOG GET 50 after recovery reveals whether the same script pattern appears repeatedly with high execution times.
  4. Attempt SCRIPT KILL. This is both diagnosis and recovery. OK means the script had not written and is terminated. UNKILLABLE means it has performed writes and cannot be interrupted.
  5. Assess resource context. Check INFO memory for pressure toward maxmemory, and INFO replication for replica timeouts caused by the blocked primary.
  6. Determine scope of impact. If the script is unkillable, estimate completion time from its logic and data size. Compare that against your recovery time objective.

flowchart TD
    A[BUSY error received] --> B[Attempt SCRIPT KILL]
    B -->|OK| C[Script terminated]
    B -->|UNKILLABLE error| D{Can you wait?}
    D -->|Yes| E[Wait for completion]
    D -->|No| F[SHUTDOWN NOSAVE]
    F --> G[Process exits, unsaved data lost]
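
The decision flow above can be sketched as a small shell function that maps the SCRIPT KILL reply to the next step. This is an illustration rather than a tool to run unattended: the function name and the returned labels are made up, and whether you can wait remains an operator judgment passed in as the second argument.

```shell
# Maps a SCRIPT KILL reply to the next recovery step.
# $1: reply text from redis-cli SCRIPT KILL
# $2: "yes" if the outage can be tolerated until the script finishes
next_step_after_kill() {
  case "$1" in
    *UNKILLABLE*)
      # The script has written; choose between waiting and a hard stop.
      if [ "$2" = "yes" ]; then echo "wait"; else echo "shutdown-nosave"; fi ;;
    *NOTBUSY*)
      # No script is executing any more; it finished on its own.
      echo "no-script-running" ;;
    OK)
      echo "terminated" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, next_step_after_kill "$(redis-cli SCRIPT KILL)" no prints shutdown-nosave only when Redis refused the kill, which keeps the destructive branch explicit.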

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| instantaneous_ops_per_sec | Event-loop blocking drops throughput to zero | Sustained drop while clients remain connected |
| Slowlog EVAL / EVALSHA entries | Identifies specific slow scripts before they trigger BUSY | Entries > 100 ms or rapid slowlog growth |
| used_cpu_user_main_thread (Redis 6.2+) | Main-thread saturation from script execution | Rate approaching 1.0 during script workloads |
| Replication offset lag | A blocked primary stops sending replication traffic | Lag growing while ops per sec is zero |
| evicted_keys rate | Long scripts can aggravate memory pressure | Spike concurrent with or following long script execution |
| total_error_replies rate | BUSY errors contribute to client-visible failures | Rate increase correlating with script execution window |
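
used_cpu_user_main_thread is a cumulative counter in seconds, so the "rate" in the table is the difference between two readings divided by the wall-clock time between them. A minimal sketch of that arithmetic, with a hypothetical function name and sample values chosen only for illustration:

```shell
# Computes CPU seconds consumed per wall-clock second by the main thread.
# $1, $2: earlier and later readings of used_cpu_user_main_thread
# $3: seconds elapsed between the two readings (must be positive)
main_thread_cpu_rate() {
  LC_ALL=C awk -v a="$1" -v b="$2" -v dt="$3" \
    'BEGIN { if (dt <= 0) exit 1; printf "%.2f\n", (b - a) / dt }'
}
```

With readings of 10.0 and 14.5 taken 5 seconds apart, main_thread_cpu_rate 10.0 14.5 5 prints 0.90, meaning the main thread spent about 90% of that window in user-space CPU.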

Fixes

Kill a read-only script

If SCRIPT KILL returns OK, the script had not yet executed a write command. The script client receives an error, but the dataset remains consistent. This is the safest recovery path. After killing it, identify the source client via CLIENT LIST and audit the script before running it again.

Wait for an unkillable script to finish

If SCRIPT KILL returns UNKILLABLE, the script has already modified the dataset. Redis cannot roll back partial script effects, so waiting is often the least destructive option if the script is expected to finish in a predictable time and your application tolerates the delay. CLIENT LIST is rejected while the server is BUSY, so poll with PING instead: a BUSY reply means the script is still running, and PONG means it has finished.
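
One way to wait with a deadline is a polling loop around PING. The sketch below is an assumption-laden example: the REDIS_CLI variable exists only so the client command can be swapped out, the function name and printed labels are made up, and both arguments must be whole seconds.

```shell
# Command used to reach Redis; overridable so the loop can be tried against a stub.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Polls PING until the script finishes or the deadline passes.
# $1: maximum seconds to wait; $2: poll interval in whole seconds (default 1)
wait_for_script() {
  limit=$1
  interval=${2:-1}
  waited=0
  while :; do
    reply=$("$REDIS_CLI" PING 2>&1) || true
    case "$reply" in
      *BUSY*) ;;                                   # script still running
      *PONG*) echo "finished"; return 0 ;;
      *) echo "unexpected: $reply"; return 2 ;;    # auth error, connection refused, ...
    esac
    if [ "$waited" -ge "$limit" ]; then
      echo "deadline"
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

A call such as wait_for_script 120 5 gives the script two minutes before you fall back to the last-resort option below; the right limit depends on your recovery time objective.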

Use SHUTDOWN NOSAVE as last resort

When a script is unkillable, blocking all clients, and you cannot wait, SHUTDOWN NOSAVE forces immediate process termination. All unsaved data is discarded, including not only the script’s in-flight changes but any other writes since the last RDB save or AOF fsync. Use this only when the cost of continued downtime exceeds the cost of data loss.

Audit and harden the script

After recovery, inspect the Lua source.

  • Replace unbounded loops over large collections with application-side SCAN or pagination.
  • Use redis.pcall() instead of redis.call() to catch errors without aborting the entire script.
  • Remove any reliance on print(). Use redis.log(redis.LOG_WARNING, msg) for debug output.
  • Remove dependency on key expiry occurring mid-script. Since Redis 7.2, time sampling is frozen during scripts, so keys do not expire while a script runs.
  • Avoid blocking commands such as BLPOP, BRPOP, or WAIT inside scripts.
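
As an illustration of the first point, iteration over a large set can move out of Lua and into the client, one SSCAN batch at a time, so each round trip is short and other commands run in between. This is a minimal sketch under assumptions: the REDIS_CLI variable and function name are hypothetical, and it relies on redis-cli printing the cursor on the first line and one member per following line when its output is piped.

```shell
# Command used to reach Redis; overridable so the loop can be tried against a stub.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Prints every member of a set, fetching it in bounded batches.
# $1: key name; $2: COUNT hint per SSCAN call (default 100)
scan_set_in_batches() {
  key=$1
  batch=${2:-100}
  cursor=0
  while :; do
    reply=$("$REDIS_CLI" SSCAN "$key" "$cursor" COUNT "$batch") || return 1
    cursor=$(printf '%s\n' "$reply" | head -n 1)   # first line: next cursor
    printf '%s\n' "$reply" | tail -n +2            # remaining lines: members
    [ "$cursor" = "0" ] && break                   # cursor 0 ends the iteration
  done
}
```

The trade-off is that the scan is no longer atomic: members added or removed mid-scan may or may not be returned, so any per-member work should tolerate that.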

Prevention

  • Tune lua-time-limit. The default of 5000 ms is generous. For latency-sensitive workloads, lower it so runaway scripts surface as BUSY sooner, giving you more time to react before clients time out.
  • Review script complexity in staging. Test every production Lua script against data volumes at least as large as production. A script that runs in 10 ms against 100 elements may take minutes against 10 million.
  • Restrict script execution via ACLs. Use Redis 6.0+ ACLs to limit which users can run EVAL, EVALSHA, and FUNCTION LOAD. This mitigates both accidental production scripts and exploitation of Lua engine vulnerabilities such as CVE-2025-49844.
  • Monitor slowlog for script patterns. Any recurring EVAL or EVALSHA in the slowlog is a candidate for refactoring.
  • Do not rely on key expiry inside scripts. Because keys cannot expire during script execution, any logic that spins waiting for a TTL will loop forever.
  • Set maxmemory and an eviction policy. If maxmemory is reached and no keys can be evicted, a write inside a script fails and aborts the script unless redis.pcall() catches the error.
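
The first and third points translate into two commands. The sketch below wraps them in a function for illustration only: the 1000 ms threshold, the user name app-reader, the placeholder password, and the app:* key pattern are all hypothetical values to replace with your own, and REDIS_CLI is an assumed variable for swapping the client command.

```shell
# Command used to reach Redis; overridable so the calls can be tried against a stub.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

apply_script_guardrails() {
  # Surface BUSY after 1 s instead of the default 5 s (illustrative value).
  "$REDIS_CLI" CONFIG SET lua-time-limit 1000
  # Hypothetical user: may read keys under app:* but may not use scripting commands.
  "$REDIS_CLI" ACL SETUSER app-reader on '>change-me' '~app:*' '+@read' '-@scripting'
}
```

Both changes live only in the running server's memory unless you also persist them, for example in redis.conf or an ACL file, so treat this as a starting point rather than a complete rollout.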

How Netdata helps

  • Correlate throughput drops with CPU saturation and replication lag to confirm event-loop blocking.
  • Alert on unexpected uptime_in_seconds resets after SHUTDOWN NOSAVE.
  • Track evicted_keys and used_memory alongside BUSY events to spot memory pressure from scripts.
  • Surface slowlog metrics and per-command statistics to spot EVAL / EVALSHA outliers before they trigger BUSY.
  • Monitor total_error_replies to quantify client impact.

Related guides

  • How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
  • Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
  • Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
  • Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
  • Redis event loop blocked: when one slow command freezes everything: /guides/redis/redis-event-loop-blocked/
  • Redis eviction policy tuning: allkeys-lru vs volatile-ttl vs noeviction: /guides/redis/redis-eviction-policy-tuning/
  • Redis fork/COW memory storm: why persistence doubles RSS and OOM-kills the box: /guides/redis/redis-fork-cow-storm/
  • Redis KEYS command blocking production: why to replace it with SCAN: /guides/redis/redis-keys-command-blocking-production/
  • Redis latest_fork_usec too high: THP, NUMA, and fork latency: /guides/redis/redis-latest-fork-usec-high/
  • Redis maxmemory not set: why every production instance needs a memory limit: /guides/redis/redis-maxmemory-not-set/
  • MISCONF Redis is configured to save RDB snapshots - what it means and how to fix it: /guides/redis/redis-misconf-rdb-snapshots/
  • Redis monitoring checklist: the signals every production instance needs: /guides/redis/redis-monitoring-checklist/