

# Amazon EKS and Kubernetes Container Insights metrics
<a name="Container-Insights-metrics-EKS"></a>

The following tables list the metrics and dimensions that Container Insights collects for Amazon EKS and Kubernetes. These metrics are in the `ContainerInsights` namespace. For more information, see [Metrics](cloudwatch_concepts.md#Metric).

If you do not see any Container Insights metrics in your console, be sure that you have completed the setup of Container Insights. Metrics do not appear before Container Insights has been set up completely. For more information, see [Setting up Container Insights](deploy-container-insights.md).


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `cluster_failed_node_count`  |  `ClusterName`  |  The number of failed worker nodes in the cluster. A node is considered failed if it is suffering from any *node conditions*. For more information, see [Conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) in the Kubernetes documentation.  | 
|  `cluster_node_count`  |  `ClusterName`  |  The total number of worker nodes in the cluster.  | 
|  `namespace_number_of_running_pods`  |  `Namespace` `ClusterName` `ClusterName`  |  The number of pods running per namespace in the resource that is specified by the dimensions that you're using.  | 
|  `node_cpu_limit`  |  `ClusterName`   |  The maximum number of CPU units that can be assigned to a single node in this cluster.  | 
|  `node_cpu_reserved_capacity`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of CPU units that are reserved for node components, such as kubelet, kube-proxy, and Docker. Formula: `node_cpu_request / node_cpu_limit`  `node_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_cpu_usage_total`  |  `ClusterName`  |  The number of CPU units being used on the nodes in the cluster.  | 
|  `node_cpu_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total percentage of CPU units being used on the nodes in the cluster. Formula: `node_cpu_usage_total / node_cpu_limit`  | 
|  `node_gpu_limit` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The total number of GPU(s) available on the node.  | 
|  `node_gpu_usage_total` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The number of GPU(s) being used by the running pods on the node.  | 
|  `node_gpu_reserved_capacity` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The percentage of GPU currently being reserved on the node. The formula is, `node_gpu_request / node_gpu_limit`.  `node_gpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).   | 
|  `node_filesystem_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total percentage of file system capacity being used on nodes in the cluster. Formula: `node_filesystem_usage / node_filesystem_capacity`  `node_filesystem_usage` and `node_filesystem_capacity` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_memory_limit`  |  `ClusterName`  |  The maximum amount of memory, in bytes, that can be assigned to a single node in this cluster.  | 
|  `node_memory_reserved_capacity`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of memory currently being used on the nodes in the cluster. Formula: `node_memory_request / node_memory_limit`  `node_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_memory_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of memory currently being used by the node or nodes. It is the percentage of node memory usage divided by the node memory limitation. Formula: `node_memory_working_set / node_memory_limit`.   | 
|  `node_memory_working_set`  |  `ClusterName`   |  The amount of memory, in bytes, being used in the working set of the nodes in the cluster.  | 
|  `node_network_total_bytes`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total number of bytes per second transmitted and received over the network per node in a cluster. Formula: `node_network_rx_bytes + node_network_tx_bytes`  `node_network_rx_bytes` and `node_network_tx_bytes` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_number_of_running_containers`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The number of running containers per node in a cluster.  | 
|  `node_number_of_running_pods`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The number of running pods per node in a cluster.  | 
|  `pod_cpu_reserved_capacity`  |  `PodName`, `Namespace`, `ClusterName` `ClusterName`  |  The CPU capacity that is reserved per pod in a cluster. Formula: `pod_cpu_request / node_cpu_limit`  `pod_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_cpu_utilization`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of CPU units being used by pods. Formula: `pod_cpu_usage_total / node_cpu_limit`  | 
|  `pod_cpu_utilization_over_pod_limit`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of CPU units being used by pods relative to the pod limit. Formula: `pod_cpu_usage_total / pod_cpu_limit`  | 
|  `pod_gpu_request` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The GPU requests for the pod. This value must always be equal to `pod_gpu_limit`.  | 
|  `pod_gpu_limit` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The maximum number of GPU(s) that can be assigned to the pod in a node.  | 
|  `pod_gpu_usage_total` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The number of GPU(s) being allocated on the pod.  | 
|  `pod_gpu_reserved_capacity` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The percentage of GPU currently being reserved for the pod. The formula is - pod\$1gpu\$1request / node\$1gpu\$1reserved\$1capacity.  | 
|  `pod_memory_reserved_capacity`  |  `PodName`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of memory that is reserved for pods. Formula: `pod_memory_request / node_memory_limit`  `pod_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_memory_utilization`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of memory currently being used by the pod or pods. Formula: `pod_memory_working_set / node_memory_limit`  | 
|  `pod_memory_utilization_over_pod_limit`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of memory that is being used by pods relative to the pod limit. If any containers in the pod don't have a memory limit defined, this metric doesn't appear. Formula: `pod_memory_working_set / pod_memory_limit`  | 
|  `pod_network_rx_bytes`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The number of bytes per second being received over the network by the pod. Formula: `sum(pod_interface_network_rx_bytes)`  `pod_interface_network_rx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_network_tx_bytes`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The number of bytes per second being transmitted over the network by the pod. Formula: `sum(pod_interface_network_tx_bytes)`  `pod_interface_network_tx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_number_of_container_restarts`  |  `PodName`, `Namespace`, `ClusterName`  |  The total number of container restarts in a pod.  | 
|  `service_number_of_running_pods`  |  `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The number of pods running the service or services in the cluster.  | 

## Kueue metrics
<a name="Container-Insights-metrics-Kueue"></a>

Beginning with version `v2.4.0-eksbuild.1` of the the CloudWatch Observability EKS add-on, Container Insights for Amazon EKS supports collecting Kueue metrics from Amazon EKS clusters. For more information about the add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md).

For information about enabling the metrics, see [Enable Kueue metrics](install-CloudWatch-Observability-EKS-addon.md#enable-Kueue-metrics) to enable them.

The Kueue metrics that are collected are listed in the following table. These metrics are published into the `ContainerInsights/Prometheus` namespace in CloudWatch. Some of these metrics use the following dimensions:
+ `ClusterQueue` is the name of the ClusterQueue
+ The possible values of `Status` are `active` and `inadmissible`
+ The possible values of `Reason` are `Preempted`, `PodsReadyTimeout`, `AdmissionCheck`, `ClusterQueueStopped`, and `InactiveWorkload`
+ `Flavor` is the referenced flavor.
+ `Resource` refers to cluster computer resources, such as `cpu`, `memory`, `gpu`, and so on. 


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `kueue_pending_workloads` |  `ClusterName`, `ClusterQueue`, `Status` `ClusterName`, `ClusterQueue` `ClusterName`, `Status` `ClusterName`  |  The number of pending workloads.  | 
|  `kueue_evicted_workloads_total` |  `ClusterName`, `ClusterQueue`, `Reason` `ClusterName`, `ClusterQueue` `ClusterName`, `Reason` `ClusterName`  |  The total number of evicted workloads.  | 
|  `kueue_admitted_active_workloads` |  `ClusterName`, `ClusterQueue` `ClusterName`  |  The number of admitted workloads that are active (unsuspended and not finished).  | 
|  `kueue_cluster_queue_resource_usage` |  `ClusterName`, `ClusterQueue`, `Resource`, `Flavor` `ClusterName`, `ClusterQueue`, `Resource` `ClusterName`, `ClusterQueue`, `Flavor` `ClusterName`, `ClusterQueue` `ClusterName`  |  Reports the total resource usage of the ClusterQueue.  | 
|  `kueue_cluster_queue_nominal_quota` |  `ClusterName`, `ClusterQueue`, `Resource`, `Flavor` `ClusterName`, `ClusterQueue`, `Resource` `ClusterName`, `ClusterQueue`, `Flavor` `ClusterName`, `ClusterQueue` `ClusterName`  |  Reports the resource quota of the ClusterQueue.  | 