ClickHouse too many open files: file descriptors, part count, and nofile limits

ClickHouse aborts queries with “Too many open files”, or the server process dies outright. Logs show failures to open column files or metadata. This is not a traditional leak; it is a capacity cliff. Every active MergeTree part keeps multiple files open, background merges temporarily spike that count, and the Linux nofile limit (the per-process cap on open file descriptors) is usually the bottleneck. The common default soft limit of 1024 is far too low for production ClickHouse.

By the rule of thumb of one descriptor per column per part, a table with 100 columns and 200 active parts needs roughly 20,000 file descriptors for column files alone (100 × 200), plus metadata, sockets, and log files. A merge that combines several source parts into one opens all source and target column files simultaneously. When the process reaches its nofile limit, the next open() fails with EMFILE, which crashes the server or causes cascading query failures.
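The estimate above is plain multiplication, and it can be handy to rerun it for your own tables. The following is a minimal sketch, assuming the rule of thumb of one descriptor per column per active part; `estimate_baseline_fds` is a hypothetical helper, and its optional overhead argument (metadata, sockets, logs) is an assumed placeholder rather than a measured value.

```shell
# Rough baseline: one FD per column per active part, plus a flat overhead.
# usage: estimate_baseline_fds COLUMNS ACTIVE_PARTS [OVERHEAD]
estimate_baseline_fds() {
    columns=$1
    parts=$2
    overhead=${3:-0}   # assumed allowance for metadata, sockets, and logs
    echo $(( columns * parts + overhead ))
}

estimate_baseline_fds 100 200        # prints 20000
estimate_baseline_fds 100 200 500    # prints 20500
```

Comparing the result with the process limit gives a first impression of how much headroom is left before merges are taken into account.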

What this means

Each MergeTree data part is a directory containing one file per column plus index, checksum, and metadata files. ClickHouse opens these files to read or write the part. More active parts means more open files. Wide tables amplify this because file count scales linearly with columns.

Background merges make usage spiky. When ClickHouse merges N source parts into one target part, it opens all source and all target column files concurrently. A single merge can spike file descriptor usage far above the steady-state level. If the nofile limit is close to typical usage, merges push you over.
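To put a number on the spike, the sketch below counts the column files a single merge touches: N source parts plus one target part, each with one file per column. It is an approximation under the same one-file-per-column assumption used in this article, and `estimate_merge_fds` is a hypothetical helper name.

```shell
# Column files touched by one merge: (source parts + 1 target part) * columns.
# usage: estimate_merge_fds COLUMNS SOURCE_PARTS
estimate_merge_fds() {
    columns=$1
    source_parts=$2
    echo $(( columns * (source_parts + 1) ))
}

estimate_merge_fds 100 10    # prints 1100
estimate_merge_fds 500 20    # prints 10500
```

Several merges running at once multiply this figure, which is why a baseline that looks safe can still end in EMFILE.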

The failure mode is a hard cliff. Once the process hits the nofile limit, open() returns EMFILE. ClickHouse may crash, refuse new connections, or fail queries with errors about unreadable parts. There is no graceful degradation.

flowchart TD
    A[Inserts create parts] --> B[Active part count grows]
    B --> C[Column files + metadata kept open]
    C --> D[Baseline FD usage rises]
    D --> E[Merge opens all source and target files]
    E --> F[FD usage spikes]
    F --> G[Hit nofile limit]
    G --> H["open fails with EMFILE: server crash or query errors"]

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Part count explosion | FD count tracks active parts; DelayedInserts is climbing | system.parts active count per partition |
| Low nofile limit | FD usage is near the limit even under normal load | /proc/<pid>/limits for Max open files |
| Wide tables | FD count is disproportionately high relative to part count | Compare columns per table against FD baseline |
| Merge storms | Sudden FD spikes that correlate with heavy merge activity | system.merges for num_parts and elapsed time |
| Connection leaks | FD count is high but active part count is normal | system.metrics for TCPConnection and HTTPConnection |

Quick checks

Run these read-only checks from the server host.

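# Resolve the server PID once (run as root or the clickhouse user).
# -f matches the full command line because process names are truncated to
# 15 characters; adjust the pattern if it matches more than the server.
CH_PID=$(pgrep -n -f clickhouse-server)

# Current open FD count for the ClickHouse process
ls /proc/"$CH_PID"/fd | wc -l

# Soft and hard nofile limits
grep "Max open files" /proc/"$CH_PID"/limits

# What kinds of FDs are open (files grouped by directory, sockets, pipes)
ls -l /proc/"$CH_PID"/fd | awk 'NR > 1 {print $NF}' \
  | sed -E 's/:\[[0-9]+\]$//; s#^(/.*)/[^/]+$#\1#' | sort | uniq -c | sort -nr | head

-- ClickHouse-tracked open file handles
-- (metric names can differ between versions; verify them against system.metrics)
SELECT metric, value
FROM system.metrics
WHERE metric IN ('OpenFileForRead', 'OpenFileForWrite');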

-- Active parts per partition (the primary FD driver)
SELECT
    database,
    table,
    partition_id,
    count() AS active_parts
FROM system.parts
WHERE active = 1
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 20;

-- Active merges (temporary FD spikes)
SELECT
    database,
    table,
    elapsed,
    num_parts,
    is_mutation
FROM system.merges
ORDER BY elapsed DESC;

-- Background pool saturation
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'Background%Pool%';

-- Client connection count
SELECT metric, value
FROM system.metrics
WHERE metric IN ('TCPConnection', 'HTTPConnection', 'InterserverConnection');

How to diagnose it

  1. Confirm the limit is being hit. Compare the Max open files line in /proc/<pid>/limits with the output of ls /proc/<pid>/fd | wc -l. If usage is within 10-20% of the soft limit, you are in the danger zone. Search the ClickHouse server logs for “Too many open files” or “EMFILE”.

  2. Correlate FD usage with part count. Run the system.parts query. Each active part contributes roughly one FD per column plus a few for metadata. If your top partitions have hundreds of active parts and your tables have dozens or hundreds of columns, the math explains the FD count.

  3. Check for merge-induced spikes. Query system.merges. If merges are combining many parts or have been running for a long time, they are the likely trigger that pushed FD usage over the limit even if the baseline seemed safe.

  4. Identify wide-table hotspots. If one table dominates the FD count despite having fewer parts than others, it likely has many columns. Wide tables are FD-intensive because every part opens a file for every column.

  5. Rule out connection leaks. If TCPConnection or HTTPConnection is unexpectedly high and stable while query load is low, clients may be leaking connections. Each connection consumes an FD independently of parts.

  6. Verify systemd or container limits. Even if you raised /etc/security/limits.conf, a ClickHouse process started by systemd or inside a container inherits limits from the runtime, not from PAM. Check /proc/<pid>/limits to see the effective limit.
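Step 1 can be turned into a small check. The sketch below is one possible way to do it, assuming the 80% "danger zone" from step 1 and a 70% early-warning level; `fd_status` is a hypothetical helper, and both thresholds are adjustable assumptions rather than ClickHouse defaults.

```shell
# Classify FD usage against the soft limit.
# usage: fd_status OPEN_FDS SOFT_LIMIT
fd_status() {
    open=$1
    soft=$2
    pct=$(( open * 100 / soft ))   # integer percentage, rounded down
    if [ "$pct" -ge 80 ]; then
        echo "danger ${pct}%"
    elif [ "$pct" -ge 70 ]; then
        echo "warning ${pct}%"
    else
        echo "ok ${pct}%"
    fi
}

# Example wiring on a Linux host, with PID set to the server's process ID:
#   open=$(ls /proc/"$PID"/fd | wc -l)
#   soft=$(awk '/^Max open files/ {print $4}' /proc/"$PID"/limits)
#   fd_status "$open" "$soft"

fd_status 900 1024    # prints danger 87%
fd_status 100 1024    # prints ok 9%
```

Reading the soft limit from /proc/<pid>/limits rather than from configuration files matters here, because it reflects what the running process actually inherited.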

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Open FDs vs nofile limit | Proximity to the hard cliff | > 70% of soft limit sustained |
| Active parts per partition | Primary driver of baseline FD usage | > 50% of parts_to_delay_insert default (1000) |
| OpenFileForRead / OpenFileForWrite | ClickHouse internal tracking of open handles | Steady growth without corresponding insert rate increase |
| Background pool utilization | Saturated pool delays merges, letting parts accumulate | > 90% sustained for > 10 minutes |
| Merge activity | Merges temporarily spike FD usage | Long-running merges with high num_parts |
| Client connection count | Connections consume FDs independently of data parts | > 80% of max_connections |

Fixes

Raise the nofile limit

Set the limit to at least 100000 before ClickHouse starts; ClickHouse recommends 262144. The process inherits its limit when it starts, so applying a new value means restarting ClickHouse. Plan this during a maintenance window.

  • For systemd: set LimitNOFILE=262144 in the service unit, run systemctl daemon-reload, then restart ClickHouse.
  • For containers: configure the limit at the runtime or orchestrator level.
  • For bare metal or init.d: set the limit in the environment that starts the process.

Do not rely on /etc/security/limits.conf alone for systemd or container deployments. Verify with /proc/<pid>/limits after restart.
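For the systemd case, a drop-in file is one common way to set LimitNOFILE without editing the packaged unit. The sketch below writes such a drop-in; it assumes the unit is named clickhouse-server.service, and both the helper `write_nofile_dropin` and the file name nofile.conf are hypothetical choices made for illustration.

```shell
# Write a systemd drop-in that sets LimitNOFILE.
# usage: write_nofile_dropin DROPIN_DIR [LIMIT]
write_nofile_dropin() {
    dir=$1
    limit=${2:-262144}
    mkdir -p "$dir"
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/nofile.conf"
}

# On a host, as root (unit name is an assumption; check yours first):
#   write_nofile_dropin /etc/systemd/system/clickhouse-server.service.d
#   systemctl daemon-reload
#   systemctl restart clickhouse-server
#   grep "Max open files" /proc/<pid>/limits
#
# For Docker, the runtime-level equivalent is:
#   docker run --ulimit nofile=262144:262144 ...
```

The final grep is the important step: it confirms the running process picked up the new value rather than trusting the configuration on disk.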

Reduce part count pressure immediately

If you cannot restart, reduce active parts to lower open files:

  • Throttle or pause inserts to stop new part creation.
  • Kill blocking mutations with KILL MUTATION if system.mutations shows long-running mutations consuming the background pool. See ClickHouse ALTER UPDATE/DELETE overuse.
  • Detach old partitions to remove their files from the active set. WARNING: This makes data unavailable until reattached.
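The last two mitigations map to single SQL statements. The sketch below only builds the statement text so it can be reviewed before anything runs; the helper names are hypothetical, the database, table, partition ID, and mutation ID are placeholder values, and the arguments are not escaped, so pass only trusted values taken from system.parts and system.mutations.

```shell
# Build (but do not run) the statement that detaches one partition by ID.
# usage: detach_partition_sql DATABASE TABLE PARTITION_ID
detach_partition_sql() {
    printf "ALTER TABLE %s.%s DETACH PARTITION ID '%s'\n" "$1" "$2" "$3"
}

# Build (but do not run) the statement that kills one mutation.
# usage: kill_mutation_sql DATABASE TABLE MUTATION_ID
kill_mutation_sql() {
    printf "KILL MUTATION WHERE database = '%s' AND table = '%s' AND mutation_id = '%s'\n" "$1" "$2" "$3"
}

# Review the output, then run it with, for example:
#   clickhouse-client --query "$(detach_partition_sql mydb events 202401)"
detach_partition_sql mydb events 202401
kill_mutation_sql mydb events mutation_42.txt
```

Printing the statement first is a deliberate choice for an incident: detaching the wrong partition makes data unavailable, so a review step is cheap insurance.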

Tune merge capacity

If the merge pool is saturated, merges cannot complete fast enough to close files and free FDs:

  • Check system.merges and background pool metrics. If merges are stuck due to disk space, see ClickHouse disk space collapse.
  • Ensure background_merges_mutations_concurrency_ratio is appropriate for your CPU and I/O capacity.

Fix connection leaks

If connection count is the primary FD consumer rather than parts:

  • Identify leaking clients via system.processes and system.metrics.
  • Set aggressive client-side connection timeouts and pool limits.
  • Consider lowering max_connections temporarily so that clients queue instead of opening sockets without bound.
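Before blaming clients, it helps to confirm that sockets rather than data files dominate the descriptor table. The sketch below tallies descriptor targets by type; `classify_fds` is a hypothetical helper that reads one link target per line, in the form that readlink prints for entries under /proc/<pid>/fd.

```shell
# Tally FD link targets read from stdin by type.
# usage: <targets, one per line> | classify_fds
classify_fds() {
    awk '
        /^socket:/     { sockets++; next }
        /^pipe:/       { pipes++;   next }
        /^anon_inode:/ { anon++;    next }
        /^\//          { files++;   next }
                       { other++ }
        END {
            printf "files=%d sockets=%d pipes=%d anon=%d other=%d\n",
                   files, sockets, pipes, anon, other
        }
    '
}

# Example wiring on a Linux host, with PID set to the server's process ID:
#   for fd in /proc/"$PID"/fd/*; do readlink "$fd"; done | classify_fds

printf '%s\n' '/var/lib/clickhouse/data/a.bin' 'socket:[101]' 'socket:[102]' 'pipe:[7]' \
    | classify_fds    # prints files=1 sockets=2 pipes=1 anon=0 other=0
```

A socket count that rivals or exceeds the file count, while the active part count looks normal, points toward connection handling rather than part pressure.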

Prevention

  • Set nofile to 262144 at the OS and runtime level. The Linux default of 1024 is insufficient for production ClickHouse, and the limit is fixed at process start.
  • Monitor part count at the partition level. FD cost scales with parts and columns, and parts only merge within their own partition, so a single hot partition can accumulate parts and exhaust the limit even when the table total looks safe.
  • Batch inserts to 1000+ rows per INSERT. Many small inserts create parts faster than merges can consolidate them, so the active part count and FD usage keep climbing.
  • Alert on the merge-to-insert ratio. When part creation chronically exceeds merge completion, FD usage trends upward until it hits the nofile limit.
  • Account for wide tables in capacity planning. A table with 500 columns opens roughly 50 times as many files per part as a table with 10 columns.
  • Verify container runtime limits independently. Orchestrators and container runtimes can impose their own nofile ceilings that override OS settings.

How Netdata helps

  • Correlates open file descriptor usage with the process nofile limit, showing proximity to the hard cliff before crashes.
  • Tracks active part count per partition and MaxPartCountForPartition to identify FD growth drivers.
  • Monitors background pool utilization and merge activity to flag spikes in temporary FD consumption.
  • Surfaces DelayedInserts and RejectedInserts, which often rise alongside FD pressure.
  • Displays per-query memory and I/O alongside FD metrics to distinguish part-driven exhaustion from connection leaks or query storms.