Elasticsearch FORBIDDEN/12/index read-only / allow delete (api) — flood stage recovery
When Elasticsearch returns cluster_block_exception with FORBIDDEN/12/index read-only / allow delete (api), every write, update, and index creation fails with HTTP 403. Search continues to work.
This happens when a data node crosses the flood-stage disk watermark (95% by default). Elasticsearch auto-applies index.blocks.read_only_allow_delete to every index with a shard on that node, preventing writes to avoid Lucene segment corruption from a full disk.
In 7.4 and later (including 8.x), the block auto-clears once disk usage on the affected node drops below the high watermark (90%). If you cannot reach that threshold, or if you need writes to resume immediately after freeing space, clear the block manually. Until then, ingestion pipelines back up, and any time-series data that clients cannot buffer or retry is lost.
What this means
Elasticsearch uses three disk watermarks to govern shard allocation. The defaults are percentages of used disk space:
- Low (85%): no new shards are allocated to the node.
- High (90%): Elasticsearch actively relocates shards away from the node.
- Flood stage (95%): Elasticsearch sets index.blocks.read_only_allow_delete: true on every index that has at least one shard on the affected node.
The block permits deletes so you can remove data to recover. The index otherwise becomes read-only. In 7.4 and later, the block auto-clears once disk usage on the triggering node falls below the high watermark. However, if you remove the block manually while disk remains above the flood stage, Elasticsearch reapplies it on the next disk usage check (every 30 seconds by default). Sending the unblock API call without freeing space creates a loop.
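As a rough illustration of the three thresholds, the sketch below classifies an integer disk-used percentage against the default watermarks. The function name and the LOW_WM, HIGH_WM, and FLOOD_WM environment variables are hypothetical, introduced here only so the defaults can be overridden if your cluster uses different values.

```shell
# Classify a node's disk-used percentage (an integer, as reported in the
# disk.percent column of _cat/allocation) against the disk watermarks.
# Defaults are 85 / 90 / 95. The LOW_WM, HIGH_WM, and FLOOD_WM variables are
# hypothetical overrides, not Elasticsearch settings.
watermark_state() {
  local pct="$1"
  local low="${LOW_WM:-85}"
  local high="${HIGH_WM:-90}"
  local flood="${FLOOD_WM:-95}"

  if [ "$pct" -ge "$flood" ]; then
    echo "flood_stage"   # writes are blocked on affected indices
  elif [ "$pct" -ge "$high" ]; then
    echo "high"          # shards are relocated away from the node
  elif [ "$pct" -ge "$low" ]; then
    echo "low"           # no new shards are allocated to the node
  else
    echo "ok"
  fi
}
```

For example, `watermark_state 92` falls in the "high" band: relocation is already under way, but writes are not yet blocked.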
This error is a symptom. The root cause is unmanaged data growth, uneven shard distribution, or a transient disk spike (for example, a large merge temporarily doubling segment size).
flowchart TD
A[Node disk crosses 95% flood stage] --> B[Elasticsearch sets index.blocks.read_only_allow_delete on affected indices]
B --> C{Free enough disk to drop below 90% high watermark?}
C -->|Yes| D[Block automatically clears and writes resume]
C -->|No| E[Writes stay blocked]
E --> F{Operator frees disk manually?}
F -->|Yes| G[Disk below 95% but may still be above 90%]
G --> H[Manually clear block or wait for auto-clear below 90%]
F -->|No| I[Add capacity or delete data to unblock]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Unmanaged index growth | Old time-series indices never deleted; disk trends linearly upward | GET /_cat/indices?v&s=store.size:desc and ILM explain |
| Uneven shard distribution | One node at 96% while others sit at 60% | GET /_cat/allocation?v |
| Merge spike | Disk usage jumps rapidly without a matching increase in document count | GET /_nodes/stats/indices/merge plus OS disk metrics |
| Non-Elasticsearch data on the volume | _cat/allocation shows less usage than the OS | df -h on the affected node |
| Shard relocations from another high-watermark node | Incoming shards push a previously safe node over 95% | GET /_cluster/health and GET /_cat/recovery |
Quick checks
Run these read-only commands to confirm the failure scope and identify the affected node.
# Confirm cluster health and check for active shard relocations
curl -s 'http://localhost:9200/_cluster/health?filter_path=status,number_of_nodes,relocating_shards,unassigned_shards'
# Identify which nodes are above watermarks and by how much
curl -s 'http://localhost:9200/_cat/allocation?v'
# Verify that the read-only block is present on indices
curl -s 'http://localhost:9200/_all/_settings?filter_path=*.settings.index.blocks.read_only_allow_delete'
# Inspect current watermark thresholds and defaults
curl -s 'http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk.watermark.*'
# OS-level disk utilization on the node (run locally on the host)
df -h
# List largest indices to identify deletion candidates
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size,pri.store.size&s=store.size:desc'
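To turn the allocation output into a short list of nodes in trouble, a small filter like the following may help. It is a sketch that assumes the `h=disk.percent,node` column selection and node names without spaces; the function name is hypothetical.

```shell
# Reads `_cat/allocation?h=disk.percent,node` output on stdin and prints
# "<node> <percent>" for every node at or above the threshold (default 90).
# Rows without a numeric percentage (for example the UNASSIGNED row) are skipped.
nodes_over_threshold() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '$1 ~ /^[0-9]+$/ && $1 + 0 >= t + 0 { print $2, $1 }'
}

# Example against a live cluster (same localhost:9200 endpoint as above):
# curl -s 'http://localhost:9200/_cat/allocation?h=disk.percent,node' | nodes_over_threshold 90
```

Passing 95 instead of 90 narrows the list to nodes that are at or above the flood stage.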
How to diagnose it
- Confirm the block. The write error confirms the index is read-only. Use the settings API to see how many indices carry index.blocks.read_only_allow_delete. If the flood stage hit a node holding shards for many indices, the impact is cluster-wide.
- Find the triggering node. _cat/allocation shows disk.percent per node. Any node at or above 95% is the trigger. If multiple nodes are near the limit, you are in a disk watermark cascade.
- Compare Elasticsearch data with total disk. The disk.indices column of _cat/allocation counts shard data, not every file on the volume. Run df -h on the node. If OS disk is much higher than ES-reported shard size, non-Elasticsearch files (logs, snapshots on local disk, other services) are consuming space.
- Determine why disk is full. Check the largest indices, ILM execution status (GET /<index>/_ilm/explain), and merge activity. A sudden jump often correlates with a large force merge or translog accumulation during recovery.
- Check for active relocations. If another node recently crossed the high watermark, shards may be relocating onto the flooded node, accelerating the problem. GET /_cluster/health shows relocating_shards.
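The comparison between Elasticsearch shard data and total disk usage can be approximated from the API alone. This sketch assumes that the disk.used column reflects usage reported by the node's operating system while disk.indices covers shard data only, and that `bytes=b` is used so both values are plain byte counts. The function name is hypothetical, and df -h on the host remains the authoritative check.

```shell
# Reads `_cat/allocation?bytes=b&h=disk.indices,disk.used,node` output on stdin
# and prints "<node> <GiB>" where GiB is disk.used minus disk.indices, that is,
# space on the volume not accounted for by shard data.
non_es_usage() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
    printf "%s %.1f\n", $3, ($2 - $1) / 1073741824
  }'
}

# Example against a live cluster:
# curl -s 'http://localhost:9200/_cat/allocation?bytes=b&h=disk.indices,disk.used,node' | non_es_usage
```

A large value on one node suggests looking at logs, local snapshots, or other services on that volume before deleting any indices.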
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Disk used percent per node (_cat/allocation) | The direct trigger; one node at 95% blocks writes for all indices with shards on it | Any data node >90% |
| Shard relocation count (_cluster/health) | Relocations from high-watermark nodes add I/O load and can push targets toward flood stage | relocating_shards rising while disk >85% |
| Indexing rate (_nodes/stats/indices/indexing) | Confirms business impact and data loss window | Rate drops to zero or spikes with rejections |
| Write thread pool rejections (_cat/thread_pool) | After the block lifts, a backlog of writes may overwhelm the queue | write rejections sustained >0/min |
| Cluster health status | Red or yellow health during the incident indicates relocation or primary shard problems | Status yellow/red combined with disk >90% |
Fixes
Free disk space immediately
The only durable fix is to reduce disk usage on the affected node. The fastest methods are:
- Delete old or non-critical indices. Use DELETE /<index>. This is the most effective immediate action.
- Drop unnecessary replicas temporarily. If an index has multiple replicas and you need space urgently, reduce the replica count. Warning: this costs I/O later when Elasticsearch re-replicates those shards, and you lose redundancy until replication completes.
- Remove non-Elasticsearch files on the same volume, such as old application logs or temporary exports. Do not manually delete Elasticsearch translog files, segment data, or snapshot repository contents managed by the cluster; this will corrupt shards.
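Reducing the replica count is a single index settings update. The wrapper below is a minimal sketch: the function name, the ES_URL and DRY_RUN environment variables, and the index name in the usage line are hypothetical. With DRY_RUN set, it prints the curl command instead of running it so you can review it first.

```shell
# Set index.number_of_replicas on one index through the update index settings API.
# ES_URL and DRY_RUN are hypothetical convenience variables for this sketch.
set_replicas() {
  local index="$1"
  local count="$2"

  # Refuse anything that is not a non-negative integer.
  case "$count" in
    ''|*[!0-9]*) echo "replica count must be a non-negative integer" >&2; return 1 ;;
  esac

  ${DRY_RUN:+echo} curl -s -X PUT "${ES_URL:-http://localhost:9200}/${index}/_settings" \
    -H 'Content-Type: application/json' \
    -d "{\"index\":{\"number_of_replicas\":${count}}}"
}

# Review first, then run for real (index name is a placeholder):
# DRY_RUN=1 set_replicas my-old-index 0
# set_replicas my-old-index 0
```

Record the original replica count before changing it so you can restore redundancy once disk pressure is resolved.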
After deleting data, wait for the disk monitor to refresh (default interval is 30 seconds, controlled by cluster.info.update.interval), then verify the node reports lower usage in _cat/allocation.
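To estimate how much needs to go before deleting anything, you can compute the excess above the high watermark per node. This is a sketch under the assumption that disk.used and disk.total from `_cat/allocation?bytes=b` describe the data volume; the function name is hypothetical.

```shell
# Reads `_cat/allocation?bytes=b&h=disk.used,disk.total,node` output on stdin.
# For each node above the target percentage (default 90, the high watermark),
# prints "<node> <GiB>" where GiB is the amount to free to get back down to
# that percentage. Nodes at or below the target are not printed.
gib_to_free() {
  local target="${1:-90}"
  awk -v t="$target" '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $2 > 0 {
      excess = $1 - ($2 * t / 100)   # bytes above the target usage
      if (excess > 0) printf "%s %.1f\n", $3, excess / 1073741824
    }'
}

# Example against a live cluster:
# curl -s 'http://localhost:9200/_cat/allocation?bytes=b&h=disk.used,disk.total,node' | gib_to_free 90
```

Treat the result as a minimum: usage has to fall below the high watermark for the block to auto-clear, and ongoing indexing or merges will consume some of the space you free.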
Clear the read-only block
In 7.4 and later, the block auto-clears when disk drops below the high watermark (90%). If you need writes to resume before reaching that threshold, or if the auto-clear does not trigger promptly, remove the block manually:
# Clear the block on all indices
curl -X PUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' -d '
{
"index.blocks.read_only_allow_delete": null
}'
If disk is still above the flood stage when you run this, Elasticsearch re-applies the block on the next disk usage check (within about 30 seconds by default). Do not script this call in a retry loop without first freeing space.
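One way to avoid the re-apply loop is to gate the unblock call on a disk check. The guard below is a sketch, not an official procedure: the function name is hypothetical, it assumes the `h=disk.percent,node` column selection, and it fails closed when it receives no usable data (for example, if the curl request fails).

```shell
# Reads `_cat/allocation?h=disk.percent,node` output on stdin.
# Exit status 0 only if at least one node was seen and none is at or above
# the flood stage (default 95). Empty or unparsable input returns non-zero.
below_flood_stage() {
  local flood="${1:-95}"
  awk -v f="$flood" '
    $1 ~ /^[0-9]+$/ { seen = 1; if ($1 + 0 >= f + 0) bad = 1 }
    END { exit ((seen && !bad) ? 0 : 1) }'
}

# Usage with the unblock call shown above:
# if curl -s 'http://localhost:9200/_cat/allocation?h=disk.percent,node' | below_flood_stage; then
#   curl -X PUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' \
#     -d '{"index.blocks.read_only_allow_delete": null}'
# else
#   echo "a node is still at or above flood stage; free space first" >&2
# fi
```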
If you cannot delete data
If the data must be retained and disk cannot be freed quickly:
- Add data nodes. New nodes give the allocator space to relocate shards, which reduces pressure on the full node. This is slower but preserves data.
- Expand the underlying volume. If running on cloud block storage or a SAN, volume expansion may be faster than adding nodes, though it requires a filesystem resize and possibly a rolling restart depending on the OS and deployment.
Do not mask the problem
Disabling cluster.routing.allocation.disk.threshold_enabled stops Elasticsearch from enforcing disk watermarks and prevents shard relocations driven by disk pressure. Existing read-only blocks may remain, and the node can then fill its disk completely, risking segment corruption, translog failures, and crashes. Use this only as a temporary emergency measure while you add capacity, and re-enable it immediately afterward.
Prevention
- Alert on the high watermark (90%), not the flood stage. By the time flood stage triggers, writes are already blocked. A 90% alert gives you time to act.
- Monitor per-node disk, not cluster-wide averages. A cluster can report 70% disk used while one node is at 93%. Hot-spotted shards cause isolated floods.
- Enforce ILM or Curator deletion policies. Time-series clusters generate data continuously. If old indices are not deleted on schedule, flood stage is inevitable.
- Account for merge overhead in capacity planning. A large merge can temporarily require disk space for both old and new segments. Maintain at least 20% free space below the high watermark to absorb these spikes.
- Review shard allocation regularly. Use shard allocation awareness and rebalancing settings to prevent a single node from accumulating disproportionately large shards.
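As an illustration of a scheduled deletion policy, the sketch below creates an ILM policy that contains only a delete phase. The function name, the ES_URL and DRY_RUN environment variables, and the policy name and retention period in the usage lines are hypothetical. The policy has no effect until indices reference it through index.lifecycle.name, typically set in an index template.

```shell
# Create or update an ILM policy whose only phase deletes an index once it
# reaches the given age. ES_URL and DRY_RUN are hypothetical convenience
# variables; with DRY_RUN set, the curl command is printed instead of run.
put_delete_policy() {
  local name="$1"
  local min_age="$2"
  local body="{\"policy\":{\"phases\":{\"delete\":{\"min_age\":\"${min_age}\",\"actions\":{\"delete\":{}}}}}}"

  ${DRY_RUN:+echo} curl -s -X PUT "${ES_URL:-http://localhost:9200}/_ilm/policy/${name}" \
    -H 'Content-Type: application/json' \
    -d "$body"
}

# Review first, then apply (policy name and age are placeholders):
# DRY_RUN=1 put_delete_policy logs-retention 30d
# put_delete_policy logs-retention 30d
```

Without a rollover action, min_age is measured from index creation, so pick a retention period that matches how the indices are named and created.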
How Netdata helps
- Per-node disk utilization charts show which node is approaching watermarks before the block fires.
- Indexing rate and write-thread-pool rejection metrics correlate with the block onset, showing the business impact in the same timeline as disk saturation.
- Shard relocation and cluster health status show a disk watermark cascade before it reaches flood stage.
- JVM heap and segment memory metrics distinguish pure disk pressure from composite failures involving heap saturation.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch heap pressure death spiral: GC, node removal, and the cascade
- Elasticsearch JVM heap usage high: reading the sawtooth and the post-GC floor
- Elasticsearch this action would add too many shards: max_shards_per_node limit
- Elasticsearch monitoring checklist: the signals every production cluster needs
- Elasticsearch monitoring maturity model: from survival to expert
- Elasticsearch long GC pauses: old-generation stop-the-world and node drops
- Elasticsearch node OOM-killed: heap ceiling, page cache, and container limits







