The LSM compaction death spiral
The spiral starts when the sustained write rate exceeds the disk's compaction throughput. L0 SSTables accumulate, storage_l0_sublevels climbs past 10 and then past 20, and read amplification rises. Higher read amplification makes compaction itself slower, which closes a positive feedback loop. Eventually Pebble stalls writes; the node can no longer service its Raft log, loses its leases, and appears partially unavailable. If several nodes hit this at once, the cluster goes down.
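The feedback loop can be illustrated with a toy model. This is only a sketch under invented assumptions and does not reflect Pebble internals: the function name simulate_l0, the linear slowdown factor, the stall threshold, and every number are hypothetical and chosen for illustration.

```python
def simulate_l0(write_rate, base_compaction_rate,
                slowdown_per_sublevel=0.02, steps=60, stall_threshold=20):
    """Toy model of L0 backlog over time, in arbitrary 'sublevel' units.

    All parameters are illustrative assumptions, not measured values.
    """
    backlog = 0.0
    history = []
    for _ in range(steps):
        # Assumption: compaction throughput degrades as the backlog grows,
        # standing in for the cost of higher read amplification.
        effective = base_compaction_rate / (1.0 + slowdown_per_sublevel * backlog)
        # Assumption: incoming writes stall once the backlog crosses a threshold.
        incoming = 0.0 if backlog >= stall_threshold else write_rate
        backlog = max(0.0, backlog + incoming - effective)
        history.append(backlog)
    return history


healthy = simulate_l0(write_rate=1.0, base_compaction_rate=1.2)
overloaded = simulate_l0(write_rate=1.5, base_compaction_rate=1.2, steps=200)
```

In this model the healthy run never builds a backlog, while the overloaded run grows faster with every step until it reaches the stall threshold, because each added unit of backlog further reduces the effective compaction rate.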
Symptoms:
- storage_l0_sublevels rising past 20 and not decreasing
- storage_write_stalls incrementing at more than one per second
- KV write latency climbing from milliseconds to seconds
- admission control store-write queue deep, with disk I/O utilization pinned at 100%
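The first two symptoms can be checked mechanically against scraped metric samples. The following is a minimal sketch, assuming the samples have already been collected into memory; the Sample type, the spiral_symptoms function, and the returned symptom names are hypothetical and not part of any CockroachDB tooling. The thresholds mirror the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ts: float            # sample time, in seconds
    l0_sublevels: float  # gauge value of storage_l0_sublevels
    write_stalls: float  # counter value of storage_write_stalls


def spiral_symptoms(samples, sublevel_limit=20, stall_rate_limit=1.0):
    """Return the names of the symptoms that fire over the sample window."""
    if len(samples) < 2:
        return []
    first, last = samples[0], samples[-1]
    fired = []
    # Sublevels above the limit for the whole window and not decreasing.
    if (all(s.l0_sublevels > sublevel_limit for s in samples)
            and last.l0_sublevels >= first.l0_sublevels):
        fired.append("l0_sublevels_high")
    # Average write-stall rate over the window, in stalls per second.
    elapsed = last.ts - first.ts
    if elapsed > 0 and (last.write_stalls - first.write_stalls) / elapsed > stall_rate_limit:
        fired.append("write_stalls_rate")
    return fired


spiralling = [Sample(0, 22, 100), Sample(30, 25, 160), Sample(60, 27, 220)]
healthy = [Sample(0, 3, 100), Sample(60, 2, 100)]
```

Here spiral_symptoms(spiralling) reports both symptoms and spiral_symptoms(healthy) reports none. Write latency and admission queue depth are left out because the list above gives no numeric threshold for them.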







