Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. It is extremely important to monitor the state of the Kubernetes Cluster to ensure all the applications are running as expected.
The prerequisites for monitoring Kubernetes with Netdata are to have an online Kubernetes Cluster and Netdata installed on your system.
Netdata auto-discovers hundreds of services, and for those it doesn’t, turning on manual discovery is a one-line configuration. For more information on configuring Netdata for Kubernetes cluster state monitoring, please read the collector documentation.
You should now see the Kubernetes State section on the Overview tab in Netdata Cloud already populated with charts about all the metrics you care about.
Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.
This metric indicates the percentage of CPU requests that have been used on a node. It is calculated by dividing the total CPU requests used on the node by the total CPU requests that are allocatable for the node. This metric is important for understanding if the node has enough resources to meet the demands of its workloads. If the utilization is too high, it could lead to performance issues or even failures.
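The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Netdata's implementation: the function name `utilization_percent` and the sample numbers are assumptions made for the example.

```python
def utilization_percent(used: float, allocatable: float) -> float:
    """Return used/allocatable as a percentage (0.0 when nothing is allocatable)."""
    if allocatable <= 0:
        return 0.0
    return used / allocatable * 100.0


# Hypothetical node: pods request 1500 millicpu out of 4000 allocatable.
print(utilization_percent(1500, 4000))  # 37.5
```

The same ratio applies to the CPU limits, memory requests, memory limits, and pod allocation percentages described in this section; only the inputs change.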
This metric indicates the amount of CPU requests that have been used on a node. It is measured in millicpu and can be used to determine the amount of CPU resources that have been allocated to the various workloads running on the node. Monitoring this metric can help identify if there is an imbalance of resources being allocated, which could cause performance issues.
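Kubernetes manifests express CPU either in whole or fractional cores (`"2"`, `"0.5"`) or directly in millicpu (`"500m"`). The sketch below shows how such values map to the millicpu unit used by this chart. The helper name `cpu_to_millicpu` is hypothetical, and it only handles these two common forms.

```python
def cpu_to_millicpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to millicpu."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Already expressed in millicpu, e.g. "250m".
        return int(quantity[:-1])
    # Expressed in cores, e.g. "2" or "0.5"; one core is 1000 millicpu.
    return round(float(quantity) * 1000)


print(cpu_to_millicpu("250m"))  # 250
print(cpu_to_millicpu("0.5"))   # 500
print(cpu_to_millicpu("2"))     # 2000
```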
This metric indicates the percentage of CPU limits that have been used on a node. It is calculated by dividing the total CPU limits used on the node by the total CPU limits that are allocatable for the node. This metric is important for understanding if the node has enough resources to meet the demands of its workloads. If the utilization is too high, it could lead to performance issues or even failures.
This metric indicates the amount of CPU limits that have been used on a node. It is measured in millicpu and can be used to determine the amount of CPU resources that have been allocated to the various workloads running on the node. Monitoring this metric can help identify if there is an imbalance of resources being allocated, which could cause performance issues.
This metric indicates the percentage of memory requests that have been used on a node. It is calculated by dividing the total memory requests used on the node by the total memory requests that are allocatable for the node. This metric is important for understanding if the node has enough resources to meet the demands of its workloads. If the utilization is too high, it could lead to performance issues or even failures.
This metric indicates the amount of memory requests that have been used on a node. It is measured in bytes and can be used to determine the amount of memory resources that have been allocated to the various workloads running on the node. Monitoring this metric can help identify if there is an imbalance of resources being allocated, which could cause performance issues.
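Memory requests are usually written with a suffix in pod specs (`"128Mi"`, `"1Gi"`), while the chart reports plain bytes. The following sketch converts between the two. It is an illustration only: the helper name `memory_to_bytes` is hypothetical and it covers a subset of the suffixes Kubernetes accepts.

```python
# Binary (power-of-two) and decimal suffixes; a subset of what Kubernetes accepts.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def memory_to_bytes(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as "128Mi" to bytes."""
    quantity = quantity.strip()
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    # No suffix: the value is already in bytes.
    return int(quantity)


print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("1Gi"))    # 1073741824
print(memory_to_bytes("500M"))   # 500000000
```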
This metric indicates the percentage of memory limits that have been used on a node. It is calculated by dividing the total memory limits used on the node by the total memory limits that are allocatable for the node. This metric is important for understanding if the node has enough resources to meet the demands of its workloads. If the utilization is too high, it could lead to performance issues or even failures.
This metric indicates the amount of memory limits that have been used on a node. It is measured in bytes and can be used to determine the amount of memory resources that have been allocated to the various workloads running on the node. Monitoring this metric can help identify if there is an imbalance of resources being allocated, which could cause performance issues.
This metric indicates the percentage of pods that have been allocated on a node. It is calculated by dividing the total number of allocated pods on the node by the total number of pods that are allocatable for the node. This metric is important for understanding if the node has enough resources to meet the demands of its workloads. If the utilization is too high, it could lead to performance issues or even failures.
This metric indicates the number of pods that have been allocated on the node, as well as the number of pods that are available for allocation. It is measured in pods, and can be used to determine the number of workloads running on the node and if the node has enough resources to satisfy the demands of its workloads. Monitoring this metric can help identify any imbalances in resource allocation, which could cause performance issues.
This metric indicates the current condition of the node. It is a dynamic metric and can take on different values based on the current state of the node. Monitoring this metric can help detect any changes in the node’s condition, which could lead to performance issues or even failure.
This metric indicates whether or not a node is schedulable. It reports one of two states, “schedulable” or “unschedulable”, and can be used to determine if the node is able to accept any new workloads. Monitoring this metric can help identify nodes that have been cordoned, for example for maintenance, and therefore will not receive new pods.
This metric indicates the percentage of pods that are ready on a node. It is calculated by dividing the number of ready pods on the node by the total number of pods on the node. This metric is important for understanding if the node has enough resources to meet the demands of its workloads. If the readiness is too low, it could lead to performance issues or even failures.
This metric indicates the number of ready and unready pods on the node. It is measured in pods, and can be used to determine the number of workloads that are ready to run on the node and if the node has enough resources to satisfy the demands of its workloads. Monitoring this metric can help identify any imbalances in resource allocation, which could cause performance issues.
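To make the ready/unready split concrete, here is a small sketch that derives both counts and the readiness percentage from pod objects shaped like the Kubernetes API's JSON output. The function names and the sample pods are assumptions for illustration; a pod is treated as ready when its `Ready` condition has status `"True"`.

```python
def is_ready(pod: dict) -> bool:
    """A pod counts as ready when its "Ready" condition has status "True"."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def readiness(pods: list) -> tuple:
    """Return (ready, unready, percent_ready) for a list of pods."""
    ready = sum(1 for p in pods if is_ready(p))
    unready = len(pods) - ready
    percent = ready / len(pods) * 100.0 if pods else 0.0
    return ready, unready, percent


# Hypothetical node with four pods, one of which is not ready.
ready_pod = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
unready_pod = {"status": {"conditions": [{"type": "Ready", "status": "False"}]}}
print(readiness([ready_pod, ready_pod, ready_pod, unready_pod]))  # (3, 1, 75.0)
```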
This metric indicates the current condition of the pods on the node. It is a dynamic metric and can take on different values based on the current state of the pods. Monitoring this metric can help detect any changes in the pods' condition, which could lead to performance issues or even failure.
This metric indicates the phase of the pods on the node. It is measured in a state of either “running”, “failed”, “succeeded”, or “pending” and can be used to determine the status of the pods running on the node. Monitoring this metric can help identify any issues with the pods on the node, which could lead to performance issues or even failure.
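As a rough illustration of what this chart aggregates, the sketch below tallies pods by phase. The pod dictionaries mimic the `status.phase` field of the Kubernetes API; the function name `count_phases` and the sample data are assumptions.

```python
from collections import Counter


def count_phases(pods: list) -> Counter:
    """Count pods per phase; pods without a reported phase count as "Unknown"."""
    return Counter(p.get("status", {}).get("phase", "Unknown") for p in pods)


# Hypothetical pods scheduled on one node.
pods = [
    {"status": {"phase": "Running"}},
    {"status": {"phase": "Running"}},
    {"status": {"phase": "Pending"}},
    {"status": {"phase": "Failed"}},
]
print(dict(count_phases(pods)))  # {'Running': 2, 'Pending': 1, 'Failed': 1}
```

A growing count of pending or failed pods on a single node is usually the signal worth investigating.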
This metric indicates the number of containers and init containers running on the node. It is measured in containers and can be used to determine the number of workloads running on the node. Monitoring this metric can help identify any imbalances in resource allocation, which could cause performance issues.
This metric indicates the state of the containers on the node. It is measured in either “running”, “waiting”, or “terminated” and can be used to determine the status of the containers running on the node. Monitoring this metric can help identify any issues with the containers on the node, which could lead to performance issues or even failure.
This metric indicates the state of the init containers on the node. It is measured in either “running”, “waiting”, or “terminated” and can be used to determine the status of the init containers running on the node. Monitoring this metric can help identify any issues with the init containers on the node, which could lead to performance issues or even failure.
This metric indicates the age of the node. It is measured in seconds and can be used to determine the age of the node and if the node needs to be replaced or upgraded. Monitoring this metric can help identify if the node is outdated or not, which could lead to performance issues or even failure.
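Age in seconds is simply the time elapsed since the object's creation timestamp. The sketch below assumes the timestamp uses the `YYYY-MM-DDTHH:MM:SSZ` form that Kubernetes reports in `metadata.creationTimestamp`; the function name `age_seconds` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def age_seconds(creation_timestamp: str, now: Optional[datetime] = None) -> float:
    """Return the seconds elapsed since a UTC timestamp like 2023-01-01T00:00:00Z."""
    created = datetime.strptime(creation_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - created).total_seconds()


# A node created exactly one day before the reference time.
reference = datetime(2023, 1, 2, tzinfo=timezone.utc)
print(age_seconds("2023-01-01T00:00:00Z", now=reference))  # 86400.0
```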
The amount of CPU requested by all containers within a pod. This value is the sum of the CPU requests for each container in the pod. Monitoring the amount of CPU requested by a pod allows for capacity planning, understanding of resource utilization, and workload optimization. If the CPU requests are consistently higher than the actual usage, there is an opportunity to reduce costs by reducing the requested resources.
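The per-pod value is a sum over the pod's containers, which can be sketched as follows. The pod dictionary mirrors the `spec.containers[].resources` layout of a pod manifest; the function name `pod_cpu_millicpu`, the `kind` parameter, and the sample pod are assumptions for illustration.

```python
def pod_cpu_millicpu(pod: dict, kind: str = "requests") -> int:
    """Sum the CPU requests (or limits) of every container in a pod, in millicpu."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        cpu = container.get("resources", {}).get(kind, {}).get("cpu")
        if cpu is None:
            continue  # This container sets no CPU value of this kind.
        # "250m" is already millicpu; "0.5" is cores and is scaled by 1000.
        total += int(cpu[:-1]) if cpu.endswith("m") else round(float(cpu) * 1000)
    return total


# Hypothetical pod with an application container and a sidecar.
pod = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {"requests": {"cpu": "250m"}, "limits": {"cpu": "1"}}},
            {"name": "sidecar", "resources": {"requests": {"cpu": "0.5"}}},
        ]
    }
}
print(pod_cpu_millicpu(pod))            # 750
print(pod_cpu_millicpu(pod, "limits"))  # 1000
```

Comparing this total with the pod's observed CPU usage over time is what reveals over-provisioned requests.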
The amount of CPU allocated to all containers within a pod. This value is the sum of the CPU limits for each container in the pod. Monitoring the amount of CPU limits used by a pod allows for capacity planning, understanding of resource utilization, and workload optimization. If the CPU limits are consistently higher than the actual usage, there is an opportunity to reduce costs by reducing the requested resources.
The amount of Memory requested by all containers within a pod. This value is the sum of the Memory requests for each container in the pod. Monitoring the amount of Memory requested by a pod allows for capacity planning, understanding of resource utilization, and workload optimization. If the Memory requests are consistently higher than the actual usage, there is an opportunity to reduce costs by reducing the requested resources.
The amount of Memory allocated to all containers within a pod. This value is the sum of the Memory limits for each container in the pod. Monitoring the amount of Memory limits used by a pod allows for capacity planning, understanding of resource utilization, and workload optimization. If the Memory limits are consistently higher than the actual usage, there is an opportunity to reduce costs by reducing the requested resources.
The status of a pod, expressed through its conditions. Each condition (pod scheduled, initialized, containers ready, and pod ready) reports whether it currently holds for the pod. Monitoring the condition of a pod allows for the identification of potential issues such as the pod not being scheduled, its init containers not having completed, or its containers not being ready. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
The phase of a pod. The phase is a high-level summary of where the pod is in its lifecycle: pending, running, succeeded, or failed. Monitoring the phase of a pod allows for the identification of potential issues such as pods that stay in the pending phase or pods that end up in the failed phase. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
The age of a pod. This value is the time elapsed since the pod was created. Monitoring the age of a pod allows for the identification of potential issues such as pods that are recreated far more often than expected or pods that have been running longer than intended. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
The number of containers in a pod. This value is the count of containers that belong to the pod. Monitoring the number of containers in a pod allows for the identification of potential issues such as a pod having fewer or more containers than expected. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
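A pod's conditions are reported in `status.conditions`, each with a `type` and a `status`. The sketch below lists the conditions that are not currently true for a pod, which is typically the first thing to check when this chart changes. The function name `failing_conditions` and the sample pod are assumptions for illustration.

```python
# The four standard pod condition types, in the order they are usually satisfied.
POD_CONDITIONS = ("PodScheduled", "Initialized", "ContainersReady", "Ready")


def failing_conditions(pod: dict) -> list:
    """Return the standard pod conditions whose status is not "True"."""
    reported = {
        c["type"]: c["status"] for c in pod.get("status", {}).get("conditions", [])
    }
    return [name for name in POD_CONDITIONS if reported.get(name) != "True"]


# Hypothetical pod that is scheduled and initialized but not yet ready.
pod = {
    "status": {
        "conditions": [
            {"type": "PodScheduled", "status": "True"},
            {"type": "Initialized", "status": "True"},
            {"type": "ContainersReady", "status": "False"},
            {"type": "Ready", "status": "False"},
        ]
    }
}
print(failing_conditions(pod))  # ['ContainersReady', 'Ready']
```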
The state of containers in a pod. This value is the sum of the states of each container in the pod. Monitoring the state of containers in a pod allows for the identification of potential issues such as containers being in a running state, waiting state, or terminated state. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
The state of init containers in a pod. This value is the sum of the states of each init container in the pod. Monitoring the state of init containers in a pod allows for the identification of potential issues such as init containers being in a running state, waiting state, or terminated state. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
The readiness state of a container in a pod. This value is the sum of the readiness states of each container in the pod. Monitoring the readiness state of a container in a pod allows for the identification of potential issues such as containers not being ready. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
The number of restarts of a container in a pod. This value is the sum of the restarts of each container in the pod. Monitoring the number of restarts of a container in a pod allows for the identification of potential issues such as containers restarting too frequently. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
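One simple way to act on restart counts is to flag containers whose count crosses a threshold. The sketch below reads `restartCount` from `status.containerStatuses`, as reported by the Kubernetes API. The function name `restarting_containers` and the threshold of 5 are arbitrary assumptions, not a recommended value.

```python
def restarting_containers(pod: dict, threshold: int = 5) -> list:
    """Return the names of containers restarted at least `threshold` times."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [cs["name"] for cs in statuses if cs.get("restartCount", 0) >= threshold]


# Hypothetical pod where the "worker" container keeps crashing.
pod = {
    "status": {
        "containerStatuses": [
            {"name": "web", "restartCount": 0},
            {"name": "worker", "restartCount": 12},
        ]
    }
}
print(restarting_containers(pod))  # ['worker']
```

In practice the rate of change matters more than the absolute count, since a long-lived container can accumulate restarts slowly without being unhealthy.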
The state of a container in a pod. This value is the sum of the states of each container in the pod. Monitoring the state of a container in a pod allows for the identification of potential issues such as containers being in a running state, waiting state, or terminated state. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues. ![Pod Container State](https://user-images.githubusercontent.com/96257330/215512422-639f434c-8265-4dbe-8046-704a8977188c.png)
The reason for container being in a waiting state in a pod. This value is the sum of the reasons for each container being in a waiting state in the pod. Monitoring the reason for container being in a waiting state in a pod allows for the identification of potential issues such as containers waiting for resources or containers waiting for an image pull. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
The reason for container being in a terminated state in a pod. This value is the sum of the reasons for each container being in a terminated state in the pod. Monitoring the reason for container being in a terminated state in a pod allows for the identification of potential issues such as containers being terminated due to an error or containers being terminated due to an out of memory error. By monitoring this metric, potential issues can be identified and addressed before they cause any outages or performance issues.
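Waiting and terminated reasons both come from the `state` field of each entry in `status.containerStatuses`. The sketch below tallies the reasons across a set of pods. The function name `state_reasons` and the sample data are assumptions; `OOMKilled` and `ImagePullBackOff` are reason strings that Kubernetes reports for out-of-memory kills and failed image pulls.

```python
from collections import Counter


def state_reasons(pods: list, state: str) -> Counter:
    """Count the reasons for containers being in `state` ("waiting" or "terminated")."""
    reasons = Counter()
    for pod in pods:
        for cs in pod.get("status", {}).get("containerStatuses", []):
            detail = cs.get("state", {}).get(state)
            if detail is not None:
                reasons[detail.get("reason", "Unknown")] += 1
    return reasons


# Hypothetical pods: two containers were OOM-killed, one cannot pull its image.
pods = [
    {"status": {"containerStatuses": [
        {"name": "app", "state": {"terminated": {"reason": "OOMKilled"}}},
        {"name": "init-db", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
    ]}},
    {"status": {"containerStatuses": [
        {"name": "app", "state": {"terminated": {"reason": "OOMKilled"}}},
    ]}},
]
print(dict(state_reasons(pods, "terminated")))  # {'OOMKilled': 2}
print(dict(state_reasons(pods, "waiting")))     # {'ImagePullBackOff': 1}
```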
This metric reports the state of the running discoverers.