ClickHouse authentication failures: system.session_log, brute force, and credential drift

You notice a spike in failed connection attempts to ClickHouse: a security scanner flags repeated TCP 9000 probes, or an application logs connection timeouts after a secrets rotation. ClickHouse exposes authentication events through system.session_log, but only if the feature is enabled. Without it, fall back to server error logs and system.query_log exceptions.

The failures split into two patterns. Malicious: brute-force or credential-scanning campaigns against exposed TCP 9000 or HTTP 8123. Operational drift: a rotated password not updated in a client config, or a deployment shipping an old connection string. Distinguish them fast. Block an external attacker at the network layer; fix credential drift on the client.

This guide shows how to read system.session_log, correlate failures with connection counts and network exposure, and fix the root cause.

What this means

system.session_log records session lifecycle events, including LoginFailure entries with the username, source IP (client_address), authentication type, and failure reason. More than ten LoginFailure events per minute from one IP is a strong brute-force signal. Any failure from a new source IP should be traced to a known application or user.
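Counting over a long window can hide short bursts, so it helps to bucket failures by minute before applying the ten-per-minute rule. The following is a minimal sketch, assuming clickhouse-client is installed and can connect with its default settings; the function name failures_per_minute_by_ip and the CH_CLIENT override variable are hypothetical conveniences, not part of ClickHouse.

```shell
#!/bin/sh
# Hypothetical override: point CH_CLIENT at another binary or a test stub.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Show every minute in the last hour in which a single source IP produced
# more LoginFailure events than the threshold (default 10), plus how many
# distinct usernames it tried (many usernames suggests credential scanning).
failures_per_minute_by_ip() {
    threshold="${1:-10}"
    # Keep the threshold numeric before splicing it into the SQL text.
    case "$threshold" in
        ''|*[!0-9]*) echo "threshold must be a whole number" >&2; return 1 ;;
    esac
    "$CH_CLIENT" --query "
        SELECT
            toStartOfMinute(event_time) AS minute,
            client_address,
            count() AS failures,
            uniqExact(user) AS distinct_users
        FROM system.session_log
        WHERE type = 'LoginFailure'
          AND event_time > now() - INTERVAL 1 HOUR
        GROUP BY minute, client_address
        HAVING failures > ${threshold}
        ORDER BY minute DESC, failures DESC
        FORMAT PrettyCompact"
}
```

Run failures_per_minute_by_ip for the default threshold, or pass a number such as 20 for a stricter cut-off.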

If session_log is disabled, the same events may appear in ClickHouse server logs or as AUTHENTICATION_FAILED exceptions in system.query_log. These are harder to aggregate and may lack source-address detail. Regardless of source, failures correlate with network exposure. If listen_host is bound to 0.0.0.0 or a public interface, any host that can reach ports 9000 and 8123 is in the attack surface.
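A quick way to spot the wildcard case is to filter listening sockets for the two ClickHouse ports. This sketch reads `ss -tln` output on standard input and only recognizes the wildcard forms 0.0.0.0, [::], and *; a binding to a specific public IP is not flagged, so still review the full output. The function name is hypothetical.

```shell
#!/bin/sh
# Read `ss -tln` output on stdin and print listeners on the ClickHouse native
# (9000) and HTTP (8123) ports that are bound to a wildcard address.
# Sketch only: a specific public IP is NOT flagged.
flag_wildcard_listeners() {
    awk '
        NR > 1 && $1 == "LISTEN" {
            local_addr = $4                   # e.g. 0.0.0.0:9000 or [::]:8123
            port = local_addr
            sub(/.*:/, "", port)              # keep the text after the last colon
            host = substr(local_addr, 1, length(local_addr) - length(port) - 1)
            if ((port == "9000" || port == "8123") &&
                (host == "0.0.0.0" || host == "[::]" || host == "*")) {
                print "EXPOSED " local_addr
            }
        }
    '
}
```

Typical use is `ss -tln | flag_wildcard_listeners`; the -p flag is not needed here because only addresses and ports are inspected.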

flowchart TD
    A[Auth failures detected] --> B{session_log enabled?}
    B -->|No| C[Enable session_log or use query_log fallback]
    B -->|Yes| D[Aggregate by client_address and user]
    D --> E{Single IP > 10/min?}
    E -->|Yes| F[Brute force or scanning]
    E -->|No| G{Service account failing?}
    G -->|Yes| H[Credential drift]
    G -->|No| I[Check network exposure and client configs]
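The flowchart's decisions can be expressed as a small function, which is handy when scripting this triage or wiring it into an alert. This is a sketch only: the function name and its three inputs (whether session_log is enabled, the highest per-minute failure count seen from a single IP, and whether a known service account is failing) are hypothetical, and the threshold of 10 is the rule of thumb used throughout this guide.

```shell
#!/bin/sh
# Hypothetical triage helper that mirrors the flowchart.
# $1: "yes" if system.session_log is enabled, otherwise "no"
# $2: highest LoginFailure count seen from one IP in any single minute
# $3: "yes" if a known service account is among the failing users
classify_auth_failures() {
    session_log_enabled="${1:-no}"
    max_per_ip_per_minute="${2:-0}"
    service_account_failing="${3:-no}"

    if [ "$session_log_enabled" != "yes" ]; then
        echo "enable-session-log-or-use-query-log"
    elif [ "$max_per_ip_per_minute" -gt 10 ]; then
        echo "brute-force-or-scanning"
    elif [ "$service_account_failing" = "yes" ]; then
        echo "credential-drift"
    else
        echo "check-network-exposure-and-client-configs"
    fi
}
```

For example, `classify_auth_failures yes 42 no` prints brute-force-or-scanning, because 42 failures in one minute from one IP exceeds the threshold.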

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Brute force or credential scanning | More than 10 LoginFailure events per minute from one IP; many distinct usernames | system.session_log aggregated by client_address |
| Credential rotation drift | Steady failures from a known app server or service account | user and client_address in system.session_log; secrets manager sync status |
| Application misconfiguration | Failures begin after a deployment; usually one user | Deployment timeline and user in system.query_log |
| Overly permissive network binding | External IPs reaching TCP 9000 or HTTP 8123 at all | ss -tlnp output; listen_host setting |
| Misconfigured monitoring probes | Regular, low-rate failures from internal infrastructure hosts | Source IP of the failures against the known probe configuration |
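For the drift and misconfiguration rows, the first check is which addresses a given user is failing from. A minimal sketch, assuming clickhouse-client and the session_log columns interface and client_name (present in recent versions; confirm with DESCRIBE system.session_log on yours). It passes the username as a query parameter (--param_ plus a {name:Type} placeholder) instead of splicing it into the SQL. The function and parameter names are hypothetical.

```shell
#!/bin/sh
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# For one failing user, list each source address with the interface and
# client name it used over the last day, so it can be mapped to an app.
failing_clients_for_user() {
    user="${1:-}"
    if [ -z "$user" ]; then
        echo "usage: failing_clients_for_user <user>" >&2
        return 1
    fi
    "$CH_CLIENT" --param_failing_user="$user" --query "
        SELECT
            client_address,
            interface,
            client_name,
            count() AS failures,
            max(event_time) AS last_failure
        FROM system.session_log
        WHERE type = 'LoginFailure'
          AND user = {failing_user:String}
          AND event_time > now() - INTERVAL 1 DAY
        GROUP BY client_address, interface, client_name
        ORDER BY failures DESC
        FORMAT PrettyCompact"
}
```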

Quick checks

Run these read-only checks to characterize the failure pattern without changing any state.

-- Recent authentication failures from session_log
SELECT event_time, user, client_address, auth_type, failure_reason
FROM system.session_log
WHERE type = 'LoginFailure'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC;
-- Aggregate failures by source IP to detect brute force
-- (more than 100 failures in 10 minutes = more than 10 per minute on average)
SELECT
    client_address,
    count(*) AS failures,
    uniqExact(user) AS distinct_users,
    max(event_time) AS last_failure
FROM system.session_log
WHERE type = 'LoginFailure'
  AND event_time > now() - INTERVAL 10 MINUTE
GROUP BY client_address
HAVING failures > 100
ORDER BY failures DESC;
-- Fallback: authentication errors from query_log
SELECT event_time, user, exception, query_id
FROM system.query_log
WHERE exception LIKE '%AUTHENTICATION_FAILED%'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC;
# Check network exposure: what interfaces is ClickHouse bound to
ss -tlnp | grep clickhouse
-- Check connection volume for context
SELECT metric, value
FROM system.metrics
WHERE metric IN ('TCPConnection', 'HTTPConnection');

If system.session_log does not exist, the feature is not enabled; enable it in the server configuration to capture these events. If the table exists but the queries above return no rows, there were no failed logins in the selected window.
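Enabling session_log through a config.d override might look like the sketch below, assuming a package install with the default configuration directory. The session_log element and its database, table, partition_by, ttl, and flush_interval_milliseconds children follow the pattern of ClickHouse's other system log tables; check your version's server settings documentation. The file name, the 30-day TTL, the CONFIG_DIR variable, and the function name are illustrative assumptions. Older releases use yandex rather than clickhouse as the root element. Restart the server after adding the file; earlier failures are not captured retroactively.

```shell
#!/bin/sh
# Package-install default; override CONFIG_DIR to write elsewhere.
CONFIG_DIR="${CONFIG_DIR:-/etc/clickhouse-server/config.d}"

# Write an override that turns on system.session_log with 30-day retention.
write_session_log_override() {
    mkdir -p "$CONFIG_DIR"
    cat > "$CONFIG_DIR/session_log.xml" <<'EOF'
<clickhouse>
    <session_log>
        <database>system</database>
        <table>session_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <!-- Illustrative retention; size it to your investigation window -->
        <ttl>event_date + INTERVAL 30 DAY DELETE</ttl>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </session_log>
</clickhouse>
EOF
}
```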

How to diagnose it

  1. Confirm the event source. Query system.session_log for type = 'LoginFailure'. If the table does not exist, use the system.query_log fallback with exception LIKE '%AUTHENTICATION_FAILED%'. Note that query_log may lack the precise source-address detail of session_log.

  2. Identify the failure pattern. Aggregate by client_address and user. A single IP with more than ten failures per minute suggests brute force or automated scanning. A single service account failing from a known application host suggests credential drift.

  3. Correlate with changes. Check whether the onset of failures aligns with a recent deployment, secrets rotation, or infrastructure change. Credential drift usually starts soon after a password or key rotation, as clients still holding the old value reconnect.

  4. Audit network exposure. Run ss -tlnp | grep clickhouse and inspect the bound addresses. If ClickHouse is listening on 0.0.0.0 or a public interface and you see brute-force attempts from external IPs, the immediate priority is reducing that exposure.

  5. Review server error logs. Check the ClickHouse server log file for connection failure details. On standard Linux installations this is /var/log/clickhouse-server/clickhouse-server.log. Look for unknown user, wrong password, or protocol mismatch messages.

  6. Map internal failures to consumers. For operational drift, filter session_log by the failing user and map client_address to known applications or hosts. Verify the connection strings and credentials in the corresponding secrets manager or configuration store.
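For step 5, a grep over the server log narrows a large file quickly. This is a sketch under stated assumptions: the default package log path, and match patterns (Authentication failed, AUTHENTICATION_FAILED, and error code 516) that reflect common message wording but may differ in your version. LOG_FILE and the function name are hypothetical.

```shell
#!/bin/sh
# Log path used by standard Linux packages; override LOG_FILE if yours differs.
LOG_FILE="${LOG_FILE:-/var/log/clickhouse-server/clickhouse-server.log}"

# Print the most recent authentication-related lines (default: last 50).
# The patterns are assumptions about message wording and vary by version.
recent_auth_errors() {
    limit="${1:-50}"
    grep -E -i 'authentication failed|AUTHENTICATION_FAILED|Code: 516' "$LOG_FILE" \
        | tail -n "$limit"
}
```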

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| system.session_log LoginFailure rate | Captures every failed authentication event with source IP and reason | More than 10 failures per minute from one IP |
| system.query_log AUTHENTICATION_FAILED | Fallback when session_log is disabled; tracks exceptions across all queries | Sustained failures from service accounts |
| Client connection count | Distinguishes brute force from connection leaks or retry storms | Spike in TCPConnection matching auth failure times |
| Network interface binding | Unnecessary exposure invites scanning and widens the blast radius | Listening on 0.0.0.0 or public interfaces |
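system.metrics only shows the current connection count, so matching a connection spike to failure times needs history. This sketch joins per-minute averages of CurrentMetric_TCPConnection from system.metric_log with per-minute LoginFailure counts, assuming metric_log is enabled on your server (it is in many default configurations). The function name and CH_CLIENT variable are hypothetical.

```shell
#!/bin/sh
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Per-minute average TCP connections next to per-minute LoginFailure counts
# for the last hour. Minutes without failures show 0 (default join_use_nulls).
connections_vs_failures() {
    "$CH_CLIENT" --query "
        SELECT minute, avg_tcp_connections, login_failures
        FROM
        (
            SELECT toStartOfMinute(event_time) AS minute,
                   avg(CurrentMetric_TCPConnection) AS avg_tcp_connections
            FROM system.metric_log
            WHERE event_time > now() - INTERVAL 1 HOUR
            GROUP BY minute
        ) AS m
        LEFT JOIN
        (
            SELECT toStartOfMinute(event_time) AS minute,
                   count() AS login_failures
            FROM system.session_log
            WHERE type = 'LoginFailure'
              AND event_time > now() - INTERVAL 1 HOUR
            GROUP BY minute
        ) AS f USING (minute)
        ORDER BY minute
        FORMAT PrettyCompact"
}
```

A connection spike without matching failures points toward leaks or retry storms rather than brute force.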

Fixes

Brute force or credential scanning

Block the source IP at your firewall, cloud security group, or reverse proxy. Do not rely on ClickHouse for rate limiting. If ClickHouse is directly exposed because of a permissive listen_host, restrict it to internal interfaces or specific addresses. If the source is an internal misconfigured health check, fix the checker instead of blocking the IP.
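On a host that filters with iptables, blocking one source and tightening the binding could look like the sketch below. Assumptions: iptables is the host firewall (cloud security groups or nftables work equally well), the source is IPv4 (IPv6 needs ip6tables), and the config.d path is the package default. The function names, the IPTABLES and CONFIG_DIR overrides, and the example addresses are hypothetical. Restart ClickHouse after changing listen_host and check that no other config.d file still sets it to 0.0.0.0 or ::; older releases use yandex rather than clickhouse as the root element.

```shell
#!/bin/sh
# Overridable for dry runs or non-default layouts (hypothetical variables).
IPTABLES="${IPTABLES:-iptables}"
CONFIG_DIR="${CONFIG_DIR:-/etc/clickhouse-server/config.d}"

# Drop TCP traffic from one IPv4 source to the native (9000) and HTTP (8123)
# ports. Needs root; the rules are not persisted across reboots.
block_clickhouse_source() {
    src="${1:-}"
    if [ -z "$src" ]; then
        echo "usage: block_clickhouse_source <source-ip>" >&2
        return 1
    fi
    for port in 9000 8123; do
        "$IPTABLES" -I INPUT -s "$src" -p tcp --dport "$port" -j DROP || return 1
    done
}

# Write a config.d override that binds ClickHouse to one internal address.
write_listen_host_override() {
    addr="${1:-}"
    if [ -z "$addr" ]; then
        echo "usage: write_listen_host_override <internal-address>" >&2
        return 1
    fi
    mkdir -p "$CONFIG_DIR"
    cat > "$CONFIG_DIR/listen_host.xml" <<EOF
<clickhouse>
    <!-- Listen only on this internal address instead of 0.0.0.0 or :: -->
    <listen_host>${addr}</listen_host>
</clickhouse>
EOF
}
```

For example, `block_clickhouse_source 203.0.113.45` followed by `write_listen_host_override 10.0.0.5`, where both addresses are placeholders.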

Credential drift after rotation

Identify the failing user from system.session_log.user. Update the password or key in the application’s connection string, environment variable, or secrets manager. Restart or reload the client to clear cached credentials, then confirm that LoginFailure entries for that user stop. If old and new credentials overlap during rotation, revoke the old credential as soon as every client has switched, to prevent silent fallback.
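Confirming the fix is easier when failures and successes for the user are counted side by side: failures should drop to zero while successful logins continue. A sketch, assuming clickhouse-client and query-parameter support; the function name, the CH_CLIENT variable, and the 5-minute default are hypothetical choices.

```shell
#!/bin/sh
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Count recent failed and successful logins for one user (default: 5 minutes).
login_outcomes_for_user() {
    user="${1:-}"
    minutes="${2:-5}"
    if [ -z "$user" ]; then
        echo "usage: login_outcomes_for_user <user> [minutes]" >&2
        return 1
    fi
    case "$minutes" in
        ''|*[!0-9]*) echo "minutes must be a whole number" >&2; return 1 ;;
    esac
    "$CH_CLIENT" --param_target_user="$user" --query "
        SELECT
            countIf(type = 'LoginFailure') AS failures,
            countIf(type = 'LoginSuccess') AS successes,
            max(event_time) AS last_event
        FROM system.session_log
        WHERE user = {target_user:String}
          AND event_time > now() - INTERVAL ${minutes} MINUTE
        FORMAT PrettyCompact"
}
```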

Application misconfiguration

Correlate the start of failures with a deployment timestamp. If failures are ongoing, roll back or patch the configuration. To identify the emitting host, use client_address from system.session_log; if session detail is unavailable, match user and event_time in system.query_log against the deployment record.
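To line failures up with a deployment timestamp, it helps to list when each user and source first started failing. A sketch following the same hypothetical CH_CLIENT convention; the function name and the 24-hour default lookback are arbitrary choices.

```shell
#!/bin/sh
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# First and last LoginFailure per user and source IP in the lookback window
# (hours, default 24), so onset can be compared with deploy and rotation times.
failure_onset_by_user() {
    hours="${1:-24}"
    case "$hours" in
        ''|*[!0-9]*) echo "hours must be a whole number" >&2; return 1 ;;
    esac
    "$CH_CLIENT" --query "
        SELECT
            user,
            client_address,
            min(event_time) AS first_failure,
            max(event_time) AS last_failure,
            count() AS failures
        FROM system.session_log
        WHERE type = 'LoginFailure'
          AND event_time > now() - INTERVAL ${hours} HOUR
        GROUP BY user, client_address
        ORDER BY first_failure
        FORMAT PrettyCompact"
}
```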

Missing session_log coverage

If system.session_log is disabled, failed authentication events are invisible to native SQL audit. Enable it in the ClickHouse server configuration. Until then, use system.query_log and server error logs as fallbacks.

Prevention

  • Enable and retain system.session_log.
  • Bind ClickHouse to specific internal interfaces via listen_host; audit with ss -tlnp after any configuration change.
  • Store credentials in a secrets manager and automate rotation with application restarts or hot-reload.
  • Monitor for LoginFailure spikes as an infrastructure security signal, not just a database issue.
  • Run periodic audits of active users and their expected source IP ranges.
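The periodic audit of users and source ranges can start from system.users. A sketch, assuming a recent version where host_ip is an array of allowed networks and an unrestricted user shows '::/0'; verify that representation on your server before relying on the any_host column. The function name and CH_CLIENT variable are hypothetical.

```shell
#!/bin/sh
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# List each user's allowed source hosts; any_host = 1 marks users whose
# host_ip contains '::/0' (assumed to mean "any address"), listed first.
audit_user_hosts() {
    "$CH_CLIENT" --query "
        SELECT
            name,
            storage,
            host_ip,
            host_names,
            has(host_ip, '::/0') AS any_host
        FROM system.users
        ORDER BY any_host DESC, name
        FORMAT PrettyCompact"
}
```

For a user that should only connect from one subnet, a statement such as ALTER USER app_reader HOST IP '10.0.0.0/8' restricts it; app_reader and the subnet are placeholders.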

How Netdata helps

Netdata collects ClickHouse TCPConnection and HTTPConnection metrics and query error rates. Correlate connection spikes with error-rate jumps to distinguish brute-force scans from client misconfiguration. Set alerts on unusual connection counts or error rates to catch authentication issues without manual polling.