Kubernetes API server audit logging: policy, backends, and forensics
Kubernetes API server audit logging is the authoritative record of every request that reaches the control plane. It captures the identity of the caller, the resource and verb, the timestamp, the stage, and the outcome. Without it, a security investigation into unauthorized access, a compliance audit, or a postmortem into a failed certificate rotation is built on inference rather than evidence.
This article walks through enabling and configuring audit logging for a self-managed cluster. It covers writing an audit policy that balances signal and noise, configuring the file and webhook backends, verifying that events are captured correctly, and running common forensic queries against the resulting logs. It does not cover managed control planes where the provider controls the API server flags.
What this enables
Audit logging produces a structured JSON line for every request that passes through the API server. At the Metadata level, you get the user, source IP, resource, verb, and response code. At RequestResponse, you also get the request and response bodies. This is the data source you use to answer: who created this cluster-admin binding, who read a Secret outside business hours, or why did mass authentication failures start at 02:00. It is also required by compliance frameworks such as CIS Kubernetes Benchmark v1.10 for sensitive resources.
Prerequisites
- Administrative access to control plane nodes to edit the kube-apiserver static pod manifest.
- Kubernetes 1.27 or later. The stable audit policy API is audit.k8s.io/v1; the v1beta1 variant is deprecated and should not be used.
- Sufficient disk capacity on control plane nodes if using the file backend, or a reachable HTTPS endpoint if using the webhook backend.
- A maintenance window if you run a single API server instance, because applying the policy requires restarting the API server. HA deployments can be rolled without downtime.
Procedure
1. Write the audit policy
Create a policy file that the API server will read at startup. The policy uses apiVersion: audit.k8s.io/v1 and kind: Policy. Rules are evaluated in order; the first matching rule wins.
A minimal production policy looks like this:
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"
omitManagedFields: true
rules:
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services"]
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods", "serviceaccounts"]
      - group: "apps"
        resources: ["deployments"]
  - level: Metadata
    omitStages:
      - "ResponseStarted"
Key decisions in this file:
- omitStages: Suppress RequestReceived globally. This stage fires as soon as the audit handler receives the request, before the request is processed, so it largely duplicates the later ResponseComplete event and produces noise without adding security value.
- omitManagedFields: Set to true to strip server-side apply field manager metadata from events. This significantly reduces log volume on clusters with many controllers.
- Levels: Use Metadata for resources whose bodies are sensitive, such as Secrets. Using RequestResponse on Secrets would write secret values into the log, which is a security vulnerability. Use RequestResponse for mutating operations and RBAC changes where the full body is needed for forensics.
- Order: The None rule for kube-proxy watch traffic prevents endpoint watch floods from drowning out important events. Because the first matching rule wins, place the most specific rules first.
2. Configure the file backend
Mount the policy into the kube-apiserver static pod and add the following flags:
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxsize=100
--audit-log-maxbackup=10
--audit-log-maxage=30
Setting --audit-log-path=- writes audit events to stdout, which is useful if your control plane log pipeline already collects the API server's standard output (for example through journald). If you write to a file, ensure the directory exists and the API server process has write permissions.
The maxsize, maxbackup, and maxage flags control rotation. Without rotation, a busy cluster can fill a disk in hours.
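As a rough sizing check, the worst-case disk footprint of the file backend can be estimated as the active file plus the retained backups. The sketch below assumes every file reaches the full --audit-log-maxsize (in megabytes) and that --audit-log-maxbackup is a positive number; the function name audit_log_budget_mb is illustrative and not part of any Kubernetes tooling.

```shell
# Hypothetical helper: upper bound on disk used by audit logs, in MB.
# Assumes one active file plus up to <maxbackup> rotated files,
# each at most <maxsize> MB.
audit_log_budget_mb() {
  maxsize="$1"
  maxbackup="$2"
  echo $(( maxsize * (maxbackup + 1) ))
}
```

With the flags shown above, audit_log_budget_mb 100 10 gives 1100, so roughly 1.1 GB should be reserved for audit logs on each control plane node, before accounting for anything a log shipper keeps locally.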
3. Configure the webhook backend (optional)
You can send events to an external SIEM or log aggregator simultaneously with the file backend. Add:
--audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml
The referenced file is a kubeconfig-style document that points to the remote HTTPS endpoint and includes the CA bundle for verifying the remote server. The webhook backend buffers events before sending. If the destination is unreachable, events can be dropped when the buffer overflows, so treat the file backend as your durable source of truth.
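How much the webhook can affect API requests depends on --audit-webhook-mode. In the blocking modes (blocking and blocking-strict), the webhook is called in the request path, so a slow or unreachable endpoint can stall API requests. The default batch mode buffers events and sends them asynchronously, which mitigates this, but you should still monitor the delivery path.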
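A minimal sketch of the kubeconfig-style webhook file follows, written from a shell function so it can be generated on each control plane node. The endpoint URL, the certificate paths, and the names audit-sink and kube-apiserver are placeholders chosen for illustration; substitute the values for your own collector.

```shell
# Sketch: write a kubeconfig-style audit webhook config to the given path.
# All URLs, names, and certificate paths below are illustrative placeholders.
write_audit_webhook_config() {
  out="$1"
  cat > "$out" <<'EOF'
apiVersion: v1
kind: Config
clusters:
  - name: audit-sink
    cluster:
      server: https://audit.example.internal/k8s-audit
      certificate-authority: /etc/kubernetes/pki/audit-webhook-ca.crt
users:
  - name: kube-apiserver
    user:
      client-certificate: /etc/kubernetes/pki/audit-webhook-client.crt
      client-key: /etc/kubernetes/pki/audit-webhook-client.key
contexts:
  - name: default
    context:
      cluster: audit-sink
      user: kube-apiserver
current-context: default
EOF
}
```

Calling write_audit_webhook_config /etc/kubernetes/audit-webhook.yaml produces the file referenced by the flag above. The client certificate entries are only needed if the receiving endpoint authenticates callers with mutual TLS.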
4. Mount and restart the API server
Add hostPath volumes for the policy file and webhook config to the kube-apiserver static pod manifest, then mount them into the container at the paths referenced by the flags.
There is no dynamic reload for audit policy or webhook configuration. Changing the policy requires restarting the API server pod. In a single-instance control plane, schedule this during a maintenance window. In an HA deployment, restart instances one at a time and confirm /readyz passes before proceeding to the next.
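The readiness check between restarts can be scripted. The sketch below polls /readyz through kubectl until it returns ok or a retry limit is reached. It assumes kubectl is pointed at the specific instance being restarted (for example with --server) rather than at a load balancer; the function name and default values are illustrative.

```shell
# Sketch: poll the API server's /readyz endpoint until it reports "ok".
# Arguments: maximum number of polls (default 30), seconds between polls (default 2).
wait_for_readyz() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(kubectl get --raw='/readyz' 2>/dev/null)" = "ok" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1   # never became ready within the allowed attempts
}
```

A rolling restart would call wait_for_readyz after each instance comes back and stop if it returns a non-zero status.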
Verifying it works
After restart, generate a test event and inspect the output.
Create a test resource:
kubectl create configmap audit-test --from-literal=key=value
Read the audit log:
grep '"objectRef":{"resource":"configmaps"' /var/log/kubernetes/audit.log | tail -1 | jq .
Confirm the expected fields are present: user.username, verb, requestURI, responseStatus.code, and stageTimestamp.
If you enabled RequestResponse for configmaps, confirm that requestObject contains the configmap data, and that no secret values appear for Secret reads if you restricted those to Metadata.
Check for audit annotations that record the authorization decision:
grep 'authorization.k8s.io/decision' /var/log/kubernetes/audit.log | head -5
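The field check above can be automated with a small filter. This is a sketch that uses plain substring matching on a single event line rather than real JSON parsing, so it confirms that the keys exist somewhere in the event, not that they sit at the expected nesting level. The function name has_audit_fields is illustrative.

```shell
# Sketch: succeed only if the audit event on stdin contains the expected keys.
# Substring checks only; this does not validate JSON structure.
has_audit_fields() {
  event="$(cat)"
  for key in '"username":' '"verb":' '"requestURI":' '"code":' '"stageTimestamp":'; do
    case "$event" in
      *"$key"*) ;;                                  # key present, keep checking
      *) echo "missing $key" >&2; return 1 ;;       # report the first missing key
    esac
  done
  return 0
}
```

For example, piping the last configmap event into it (grep '"objectRef":{"resource":"configmaps"' /var/log/kubernetes/audit.log | tail -1 | has_audit_fields) exits non-zero and names the first missing key if the event is incomplete.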
Common pitfalls
- Secret leakage in RequestResponse logs: At RequestResponse level, the full request and response bodies are logged. If a Secret is created, updated, or read at that level, its values appear verbatim in the audit log. Use Metadata level for all operations on Secrets, not only reads.
- Log volume and disk exhaustion: A busy cluster can generate gigabytes of audit data per hour. Pair rotation with host-level log shipping or a sidecar that moves logs to cold storage before deletion.
- Performance impact on slow disks: Writing high-volume audit logs to a shared or network-attached disk can stall the API server. Place the audit log on a local SSD or fast volume.
- Truncation disabled by default: Large payloads from large ConfigMaps or CustomResources can produce oversized events. Truncation is controlled by --audit-log-truncate-enabled (and --audit-webhook-truncate-enabled for the webhook backend), not by the rotation flag --audit-log-maxsize. If truncation is not enabled, oversized events are not shortened and may be dropped or rejected further down the pipeline. The audit.k8s.io/truncated annotation is only emitted when truncation is active.
- No hot reload: Operators sometimes edit the policy file and forget to restart the API server. Verify the running pod spec matches the intended configuration after any change.
- Cloud provider defaults vary: EKS, GKE, and AKS each ship default audit policies that are typically less verbose than a hardened policy. On these managed control planes the API server pod spec is not accessible, so verify the effective coverage through the provider's audit logging configuration rather than assuming it.
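To guard against the "no hot reload" pitfall on a self-managed cluster, the audit flags of the running API server pods can be listed and compared with the intended configuration. The sketch below assumes a kubeadm-style cluster where the static pods carry the label component=kube-apiserver in the kube-system namespace, and that no flag value contains spaces; adjust the selector for other installers.

```shell
# Sketch: list the distinct --audit-* flags of the running kube-apiserver pods.
# Assumes kubeadm-style labels; splits the command array on spaces.
running_audit_flags() {
  kubectl -n kube-system get pods -l component=kube-apiserver \
    -o jsonpath='{.items[*].spec.containers[0].command[*]}' \
    | tr ' ' '\n' \
    | grep -- '^--audit-' \
    | sort -u
}
```

If the output lacks --audit-policy-file, or shows a path other than the one you edited, the running API server is not using the policy you expect.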
Forensics
Once logging is active, use jq or grep to answer common investigation questions.
Find gaps in the log that might indicate API server unavailability or backend failure:
# Requires GNU date for -d; timestamps are RFC 3339 as written by the API server.
tail -1000 /var/log/kubernetes/audit.log | \
  jq -r '.stageTimestamp' | \
  while read -r ts; do
    if [ -n "$prev" ]; then
      diff=$(( $(date -d "$ts" +%s) - $(date -d "$prev" +%s) ))
      if [ "$diff" -gt 60 ]; then
        echo "Gap of ${diff}s between $prev and $ts"
      fi
    fi
    prev="$ts"
  done
Find anonymous requests:
grep '"username":"system:anonymous"' /var/log/kubernetes/audit.log | tail -20
Find RBAC privilege escalation:
jq -c 'select(.objectRef.resource == "clusterrolebindings" and (.verb == "create" or .verb == "update" or .verb == "patch"))' \
  /var/log/kubernetes/audit.log | grep -i "cluster-admin" | tail -20
Find secret access across namespaces:
grep '"resource":"secrets"' /var/log/kubernetes/audit.log | tail -50
Find deprecated API usage:
grep 'k8s.io/deprecated' /var/log/kubernetes/audit.log | \
jq -r '.annotations["k8s.io/removed-release"]' | sort | uniq -c
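For triage it often helps to aggregate before reading individual events. The sketch below counts Secret-related events per username using only grep and sed. It assumes each event is a single JSON line in which the user object begins with the username field, as in the events written by the file backend; the function name is illustrative.

```shell
# Sketch: count audit events that touch Secrets, grouped by requesting user.
# Argument: path to the audit log. Output: "<count> <username>", busiest first.
secret_access_by_user() {
  grep '"resource":"secrets"' "$1" \
    | sed -n 's/.*"user":{"username":"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

Running secret_access_by_user /var/log/kubernetes/audit.log puts the busiest identities at the top, which makes an unexpected service account or human user easier to spot than scrolling through raw events.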
Signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Audit log gaps > 60s | Unexplained gaps indicate API server downtime or audit backend failure | Any gap during active cluster hours |
| Anonymous request rate | Anonymous requests may indicate misconfiguration or probing | Spike above baseline or successful access to non-public resources |
| Audit log disk usage | Full disk stops audit logging and can crash the API server | Disk > 80% of capacity on the audit volume |
| RBAC modification rate | Unexpected bindings may indicate compromise | Any cluster-admin binding outside change management |
| Webhook delivery failures | Dropped events break the forensics trail | Failed webhook delivery rate > 0 |
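The disk usage signal in the table can be checked from a cron job or a node-level health script. The sketch below reads the used-space percentage of the filesystem that holds a given path from df -P and compares it with a threshold that defaults to the 80% warning level above. Both function names are illustrative.

```shell
# Sketch: print the used-space percentage of the filesystem holding a path.
audit_volume_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is strictly above the threshold (default: 80 percent).
audit_volume_over_threshold() {
  used="$(audit_volume_used_pct "$1")"
  [ "$used" -gt "${2:-80}" ]
}
```

For example, audit_volume_over_threshold /var/log/kubernetes && echo "audit volume above 80%" would emit a warning line that an existing alerting pipeline can pick up.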
How Netdata helps
- Correlate audit log gaps with the apiserver_request_total error rate and etcd health metrics to distinguish API server crashes from backend disk failures.
- Track control plane node disk utilization to receive advance warning before audit logs fill the volume.
- Monitor API server request latency spikes that coincide with webhook backend latency, identifying when the audit pipeline is adding synchronous overhead.
Related guides
- How the Kubernetes control plane works: a mental model for operators
- Kubernetes anonymous API access: detection, audit, and lockdown
- Kubernetes API server certificate rotation: detection and grace handling
- Kubernetes API server etcd latency: detection and cascading failures
- Kubernetes API server FlowSchemas and PriorityLevels: design and tuning
- Kubernetes API server memory pressure: OOM cycle and tuning
- Kubernetes API server rate limiting: APF priority levels and starvation
- Kubernetes API server slow or unresponsive: causes and fixes
- Kubernetes API server watch storm: re-list cascades and connection floods
- Kubernetes bound service account tokens: rotation, audience, and expiry
- Kubernetes conntrack exhaustion: dropped connections under load
- Kubernetes controller-manager leader election failures






