Monitoring Kubernetes across vanilla, EKS, GKE, AKS, k3s, and rke2

Every Kubernetes cluster runs the same core control plane components, but what you can actually scrape, query, and alert on depends on how the distribution packages and exposes them. An operator migrating between vanilla kubeadm, Amazon EKS, Google GKE, Azure AKS, k3s, and rke2 needs to know which metrics endpoints exist, which are reachable without extra tooling, and which are gated behind vendor services or architectural quirks. This article maps the native monitoring surface of each variant so you can adapt your collection strategy without discovering gaps during an incident.

Vanilla and kubeadm

Vanilla Kubernetes and clusters bootstrapped with kubeadm give you full, unfiltered access to the entire control plane. The API server serves /metrics, /livez, and /readyz on port 6443. The kube-controller-manager and kube-scheduler expose their own metrics endpoints on ports 10257 and 10259, which kubeadm binds to 127.0.0.1 by default. etcd serves metrics on its client port 2379 (a TLS client certificate is required), and kubeadm also configures a plain-HTTP metrics listener on 127.0.0.1:2381. The kubelet serves cAdvisor container metrics, resource metrics, and probe tallies on port 10250. The unauthenticated read-only port 10255 is disabled in kubeadm's default kubelet configuration, but it is still commonly found enabled on older or hand-configured self-managed clusters. Control plane components run as static pods on control plane nodes, so their logs land in container runtime log files (the kubelet itself logs to journald), and the kubelet manages their lifecycle directly.
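As a rough illustration, the endpoints named above can be written down as a scrape-target map for one control plane node. This is a minimal sketch, not a collector: the function name, the example IP address, and the use of HTTPS for every target are assumptions, and the kubelet sub-paths are the standard /metrics/cadvisor, /metrics/resource, and /metrics/probes handlers.

```python
# Ports come from the paragraph above; the helper name and the sample
# node address are hypothetical and exist only for illustration.
KUBEADM_ENDPOINTS = {
    "kube-apiserver": (6443, "/metrics"),
    "kube-controller-manager": (10257, "/metrics"),
    "kube-scheduler": (10259, "/metrics"),
    "etcd": (2379, "/metrics"),
    "kubelet": (10250, "/metrics"),
    "kubelet-cadvisor": (10250, "/metrics/cadvisor"),
    "kubelet-resource": (10250, "/metrics/resource"),
    "kubelet-probes": (10250, "/metrics/probes"),
}


def scrape_targets(node_ip: str) -> dict[str, str]:
    """Return component name -> HTTPS metrics URL for one control plane node."""
    return {
        name: f"https://{node_ip}:{port}{path}"
        for name, (port, path) in KUBEADM_ENDPOINTS.items()
    }


if __name__ == "__main__":
    for name, url in scrape_targets("10.0.0.10").items():
        print(f"{name:24} {url}")
```

Whether each URL answers from off the node depends on the component's bind address and on the credentials your scraper presents, so treat the map as a checklist of what to verify rather than a guaranteed set of reachable targets.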

What you do not get out of the box is any aggregation layer. metrics-server, kube-state-metrics, and node-exporter are not installed by default in a raw kubeadm cluster. You must bring your own scraping infrastructure and secure it. Stacked etcd is common, which means etcd shares disk I/O with the API server and other control plane components. That co-location makes etcd_disk_wal_fsync_duration_seconds a critical signal to monitor locally, because disk saturation on the control plane node will cascade into API server write latency.
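To make the fsync signal concrete, the sketch below estimates a quantile from the cumulative buckets of a histogram such as etcd_disk_wal_fsync_duration_seconds. It follows the same linear-interpolation idea as PromQL's histogram_quantile() but is not a reimplementation of it. The bucket values and the 10 ms alert threshold are assumptions chosen for illustration, and in practice you would feed it bucket increases over a time window rather than lifetime totals.

```python
import math


def histogram_quantile(q: float, buckets: dict[float, float]) -> float:
    """Estimate quantile q from cumulative buckets {upper_bound: count}.

    The mapping must include math.inf as the final bound.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(buckets)
    if not bounds or bounds[-1] != math.inf:
        raise ValueError("buckets must include the +Inf bound")
    total = buckets[math.inf]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            if bound == math.inf:
                # The quantile lies in the overflow bucket, so the best
                # available answer is the highest finite bound.
                return prev_bound
            # Interpolate linearly inside [prev_bound, bound].
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical cumulative counts for WAL fsync durations, in seconds.
    sample = {0.001: 0, 0.002: 50, 0.004: 90, 0.008: 98, 0.016: 100, math.inf: 100}
    p99 = histogram_quantile(0.99, sample)
    threshold = 0.010  # assumed alert threshold of 10 ms
    print(f"p99 fsync = {p99 * 1000:.1f} ms, over threshold: {p99 > threshold}")
```

With the sample buckets the estimate lands between the 8 ms and 16 ms bounds, which is the kind of reading that should prompt a look at disk contention on the control plane node.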

The read-only kubelet port 10255 is another operational liability. It is unauthenticated; most managed offerings disable it by default, but it is often left open on self-managed clusters. Recent security advisories note that unauthenticated kubelet APIs can be used for denial of service via checkpoint abuse. Production kubeadm clusters should disable port 10255 (readOnlyPort: 0 in the kubelet configuration, or --read-only-port=0 on the command line) and enforce --authorization-mode=Webhook on the kubelet.
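The two hardening checks can be expressed as a small audit over a parsed kubelet configuration. This is a sketch under assumptions: the input is a dictionary already loaded from the kubelet's configuration file, the function name is hypothetical, and a missing authorization mode is treated conservatively as a finding even though the effective default depends on how the kubelet was started. The field names readOnlyPort and authorization.mode are those of the KubeletConfiguration API.

```python
def audit_kubelet_config(cfg: dict) -> list[str]:
    """Return human-readable findings for a parsed KubeletConfiguration."""
    findings = []

    # A value of 0 (or an absent field) means the read-only port is off.
    read_only_port = cfg.get("readOnlyPort", 0)
    if read_only_port != 0:
        findings.append(f"read-only port {read_only_port} is enabled")

    # Anything other than an explicit Webhook mode is reported.
    auth_mode = cfg.get("authorization", {}).get("mode")
    if auth_mode != "Webhook":
        findings.append(f"authorization.mode is {auth_mode!r}, expected 'Webhook'")

    return findings


if __name__ == "__main__":
    risky = {"readOnlyPort": 10255, "authorization": {"mode": "AlwaysAllow"}}
    for finding in audit_kubelet_config(risky):
        print("FINDING:", finding)
```

Running the same function across every node's configuration gives a quick inventory of which kubelets still need attention before the port is closed fleet-wide.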

Amazon EKS

EKS runs the control plane on AWS-managed infrastructure. You cannot SSH into the control plane nodes, and there is no per-component host and port to scrape: the API server is reachable only through the cluster endpoint (port 443, not 6443), and the scheduler, controller manager, and etcd are not addressable from your worker nodes. Until EKS 1.28, this meant control plane observability was limited to what you could infer from worker-side signals and API audit logs. Starting with EKS 1.28, core control plane metrics (API server request rates, etcd database size, scheduler attempts) are automatically shipped to Amazon CloudWatch under the AWS/EKS namespace at no extra cost. CloudWatch Container Insights with enhanced observability adds kube-apiserver, etcd, and kube-scheduler telemetry, and newer CloudWatch agent versions support sub-minute GPU monitoring on EKS.

What remains missing is direct access to etcd disk-level telemetry. You do not get etcd_disk_wal_fsync_duration_seconds or raw etcd peer round-trip times from CloudWatch. You infer etcd health from the API server metrics that AWS surfaces, or from elevated API server latency on the worker side. If you run a Prometheus-compatible stack inside the cluster, you scrape kubelets and node-level exporters, but you still need CloudWatch for the control plane.
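One way to approximate etcd health from the API server side is to track the mean duration of storage requests between two scrapes. The sketch below assumes you can obtain the _sum and _count series of the API server's etcd_request_duration_seconds histogram from whatever feed is available to you; the function name and the sample numbers are hypothetical.

```python
from typing import Optional


def mean_latency(
    prev: tuple[float, float], curr: tuple[float, float]
) -> Optional[float]:
    """Mean seconds per request between two (sum, count) counter samples.

    Returns None when there was no traffic or a counter reset occurred.
    """
    delta_sum = curr[0] - prev[0]
    delta_count = curr[1] - prev[1]
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


if __name__ == "__main__":
    # Two hypothetical scrapes taken one minute apart.
    earlier = (10.0, 1000.0)
    later = (16.0, 1200.0)
    latency = mean_latency(earlier, later)
    if latency is not None:
        print(f"mean etcd request latency: {latency * 1000:.0f} ms")
```

A mean hides tail behavior, so this is a coarse trend indicator rather than a substitute for the disk-level histograms that EKS does not expose.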

A common silent failure on EKS is metrics-server returning no data because the worker node security group blocks inbound TCP on port 10250. EKS requires this path for metrics-server to reach kubelet metrics. Note that metrics-server v0.7.0 and later changed the default container port from 4443 to 10250 to align with existing AWS security group conventions, which simplifies network policy but means older firewall assumptions may break during upgrades.
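A quick pre-flight check for this failure is to test whether any inbound rule on the worker security group covers TCP 10250. The sketch assumes the rules are dictionaries shaped like the IpPermissions entries returned by the EC2 DescribeSecurityGroups API (IpProtocol, FromPort, ToPort). It deliberately ignores the rule's source, so a match only shows that the port is open to someone, not that the traffic metrics-server sends is allowed.

```python
def allows_tcp_port(ip_permissions: list[dict], port: int = 10250) -> bool:
    """Return True if any inbound rule covers the given TCP port."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            return True
        if protocol != "tcp":
            continue
        low, high = rule.get("FromPort"), rule.get("ToPort")
        if low is not None and high is not None and low <= port <= high:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical rule set: HTTPS only, so the kubelet port is blocked.
    rules = [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443}]
    print("10250 reachable:", allows_tcp_port(rules))
```

Extending the check to compare each rule's source against the security group used by the metrics-server pods would make it a more faithful test of the actual path.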

Google GKE

GKE Standard clusters expose Kubernetes control plane metrics (API server, scheduler, controller manager) through the Observability tab and Cloud Monitoring. GKE Autopilot, however, does not expose these metrics at all. Autopilot clusters come with Cloud Logging, Cloud Monitoring, and Google Cloud Managed Service for Prometheus enabled by default, but the control plane itself remains opaque. On Standard, you get the telemetry; on Autopilot, you get node and workload telemetry only, and must infer control plane health from client-side latency and error rates.

GKE disables the kubelet read-only port 10255 by default. Any workload that relies on unauthenticated kubelet access for node filesystem statistics or legacy metrics must be updated to use port 10250 with proper authentication. Google also enforces strict RBAC around control plane interactions, which means standard kube-state-metrics ClusterRole manifests may not apply cleanly. You either use Google’s managed kube-state-metrics offering or maintain modified RBAC with narrower permissions.

The monitoring split between Standard and Autopilot is sharp. If your organization runs both, you need two runbooks: one that checks Cloud Monitoring for control plane signals on Standard, and one that treats the control plane as a black box on Autopilot.

Azure AKS

AKS surfaces control plane metrics (API server CPU and memory, etcd utilization, scheduler health) through Azure Monitor Managed Prometheus. This is not enabled automatically in the default Container Insights experience. Without configuring the managed Prometheus addon, and optionally Azure Managed Grafana, control plane visibility is effectively absent. Diagnostic settings on the AKS resource can route control plane logs to Log Analytics, but that is a logging path, not a metrics path.

A critical transition to note is the retirement of AKS Container Insights custom metrics, which reached end-of-life on May 31, 2024. Teams that relied on that preview feature must migrate to Azure Monitor Managed Prometheus or Azure Managed Grafana to retain metric visibility. On the worker nodes, standard kubelet and container runtime metrics are available, but the control plane telemetry is strictly opt-in.

If you are sizing an AKS monitoring deployment, budget for the managed Prometheus addon and verify that your alerting stack can consume data from Azure Monitor, because the control plane will not serve Prometheus scrape targets directly.

k3s

k3s collapses the API server, etcd, kube-controller-manager, kube-scheduler, kubelet, kube-proxy, and CNI into a single process. Because Kubernetes uses one Prometheus metrics registry per process, all component metrics are multiplexed onto every exposed metrics endpoint. Scraping the kubelet /metrics path, the API server /metrics path, and any other component endpoint from the same node will yield overlapping series. Operators must deduplicate in their time-series database or scrape only one endpoint per node to avoid duplicate data.
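The overlap is easy to see by merging two /metrics payloads from the same node and keeping one sample per series. The sketch below works on the Prometheus text exposition format and identifies a series by its metric name and label set. The function names and the sample payloads are made up for illustration, and the parsing is intentionally simple rather than a full implementation of the format.

```python
def series_key(line: str) -> str:
    """Return the metric name plus label set, dropping value and timestamp."""
    end = line.rfind("}")
    if end != -1:
        return line[: end + 1]
    # No labels: the series identity is just the metric name.
    return line.split()[0]


def dedupe(*payloads: str) -> dict[str, str]:
    """Merge several /metrics payloads, keeping the first sample per series."""
    merged: dict[str, str] = {}
    for payload in payloads:
        for line in payload.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and HELP/TYPE comments
            merged.setdefault(series_key(line), line)
    return merged


if __name__ == "__main__":
    # Hypothetical payloads from two endpoints of the same k3s process.
    from_kubelet = (
        '# TYPE apiserver_request_total counter\n'
        'apiserver_request_total{code="200",verb="GET"} 10\n'
        'kubelet_running_pods 5\n'
    )
    from_apiserver = (
        'apiserver_request_total{code="200",verb="GET"} 10\n'
        'etcd_db_total_size_in_bytes 4096\n'
    )
    merged = dedupe(from_kubelet, from_apiserver)
    print(f"{len(merged)} unique series from 4 samples")
```

In a real Prometheus setup each scrape job attaches its own job and instance labels, so the duplicates arrive as distinct series; that is why scraping a single endpoint per node is usually simpler than deduplicating after ingestion.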

k3s uses embedded SQLite by default on single-node installations, and embedded etcd when running in multi-node HA mode. In both cases the API server still reports its client-side storage metrics, but SQLite is served through an etcd-compatible shim rather than a real etcd server, so it does not produce the WAL or compaction metrics that etcd-heavy operators rely on for storage health. If you need full etcd disk and revision telemetry, run HA embedded etcd.

By default, k3s does not expose etcd metrics externally. You must set --etcd-expose-metrics=true and configure etcd-arg: ["listen-metrics-urls=http://0.0.0.0:2381"] in the k3s config to make etcd metrics scrapable from outside the node. Without this step, etcd disk latency and WAL metrics are invisible, and you will diagnose storage issues only after they surface as API server latency spikes.

rke2

rke2 launches the control plane (kube-apiserver, etcd, kube-controller-manager, kube-scheduler) as static pods managed by the kubelet, rather than embedding them in a single process like k3s. This means metrics are not multiplexed and duplicated; you can scrape each component endpoint independently as you would on a kubeadm cluster. However, rke2 adds a supervisor layer that also exposes metrics. When supervisor-metrics: true is set in the rke2 config, the supervisor serves metrics on port 9345 via /metrics. These include certificate expiration, load balancer health, etcd snapshot timing, and lasso controller metrics.

What is missing by default is supervisor telemetry. The setting is opt-in, so many clusters run without visibility into the supervisor layer until a certificate or snapshot issue appears. rke2 also bundles an ingress controller (historically ingress-nginx, with Traefik available as an alternative in newer releases), which exposes its own metrics endpoint. If you rely on the ingress layer for traffic health, you need to scrape the ingress controller separately.

Because control plane components are static pods, their logs and metrics behave like standard Kubernetes workloads: logs go through the container runtime, and metrics are reachable via the node network or host networking. This is more familiar to operators coming from kubeadm, but it requires monitoring the supervisor endpoint explicitly to cover the Rancher-specific management plane.

Comparison summary

Variant	Control plane visibility	etcd disk metrics	Key caveat
Vanilla / kubeadm	Full direct access	Direct on port 2379	You build and secure all collection layers
Amazon EKS	CloudWatch (1.28+), opaque before	Indirect only	Worker SG must allow 10250 for metrics-server
Google GKE	Standard: Cloud Monitoring; Autopilot: none	Indirect on Standard, none on Autopilot	10255 disabled by default; RBAC strict for kube-state-metrics
Azure AKS	Azure Monitor Managed Prometheus (addon)	Indirect via managed Prometheus	Default Container Insights omits control plane; custom metrics retired May 2024
k3s	Multiplexed single-process endpoints	Requires `--etcd-expose-metrics=true`; SQLite lacks WAL metrics	Scraping multiple endpoints produces duplicate series
rke2	Static pods, full component access	Direct on port 2379	Supervisor metrics on 9345 require explicit opt-in

How Netdata helps

Netdata agents run on every node and can collect kubelet metrics, cAdvisor container metrics, and node-level resource usage regardless of which distribution you use. For managed services, this gives you a consistent node-level baseline while you supplement with cloud provider control plane feeds. For k3s, Netdata can be configured to scrape a single endpoint per node to avoid the duplicate-series problem caused by single-process metric multiplexing. For rke2, Netdata can reach supervisor metrics on port 9345 alongside standard kubelet and container runtime metrics. Netdata also tracks kubelet PLEG relist latency, certificate TTL, and disk latency on the node, which are leading indicators of trouble even when the control plane is opaque.

Related guides

How the Kubernetes control plane works: a mental model for operators: /guides/kubernetes/how-kubernetes-control-plane-works/
Kubernetes anonymous API access: detection, audit, and lockdown: /guides/kubernetes/kubernetes-anonymous-access-detection/
Kubernetes API server audit logging: policy, backends, and forensics: /guides/kubernetes/kubernetes-api-server-audit-logging/
Kubernetes API server certificate rotation: detection and grace handling: /guides/kubernetes/kubernetes-api-server-certificate-rotation/
Kubernetes API server etcd latency: detection and cascading failures: /guides/kubernetes/kubernetes-api-server-etcd-latency/
Kubernetes API server FlowSchemas and PriorityLevels: design and tuning: /guides/kubernetes/kubernetes-api-server-flow-schemas/
Kubernetes API server memory pressure: OOM cycle and tuning: /guides/kubernetes/kubernetes-api-server-memory-pressure/
Kubernetes API server rate limiting: APF priority levels and starvation: /guides/kubernetes/kubernetes-api-server-rate-limited/
Kubernetes API server slow or unresponsive: causes and fixes: /guides/kubernetes/kubernetes-api-server-slow/
Kubernetes API server watch storm: re-list cascades and connection floods: /guides/kubernetes/kubernetes-api-server-watch-storm/
Kubernetes bound service account tokens: rotation, audience, and expiry: /guides/kubernetes/kubernetes-bound-service-account-tokens/
Kubernetes conntrack exhaustion: dropped connections under load: /guides/kubernetes/kubernetes-conntrack-exhaustion/
