

# Metrics for multi-container endpoints with direct invocation
<a name="multi-container-metrics"></a>

In addition to the endpoint metrics that are listed in [Amazon SageMaker AI metrics in Amazon CloudWatch](monitoring-cloudwatch.md), SageMaker AI also provides per-container metrics.

Per-container metrics for multi-container endpoints with direct invocation are located in CloudWatch and categorized into two namespaces: `AWS/SageMaker` and `aws/sagemaker/Endpoints`. The `AWS/SageMaker` namespace includes invocation-related metrics, and the `aws/sagemaker/Endpoints` namespace includes memory and CPU utilization metrics.

The following table lists the per-container metrics for multi-container endpoints with direct invocation. All the metrics use the [`EndpointName, VariantName, ContainerName`] dimension, which filters metrics at a specific endpoint, for a specific variant and corresponding to a specific container. These metrics share the same metric names as in those for inference pipelines, but at a per-container level [`EndpointName, VariantName, ContainerName`].

 


|  |  |  |  | 
| --- |--- |--- |--- |
|  Metric Name  |  Description  |  Dimension  |  NameSpace  | 
|  Invocations  |  The number of InvokeEndpoint requests sent to a container inside an endpoint. To get the total number of requests sent to that container, use the Sum statistic. Units: None Valid statistics: Sum, Sample Count |  EndpointName, VariantName, ContainerName  | AWS/SageMaker | 
|  Invocation4XX Errors  |  The number of InvokeEndpoint requests that the model returned a 4xx HTTP response code for on a specific container. For each 4xx response, SageMaker AI sends a 1. Units: None Valid statistics: Average, Sum  |  EndpointName, VariantName, ContainerName  | AWS/SageMaker | 
|  Invocation5XX Errors  |  The number of InvokeEndpoint requests that the model returned a 5xx HTTP response code for on a specific container. For each 5xx response, SageMaker AI sends a 1. Units: None Valid statistics: Average, Sum  |  EndpointName, VariantName, ContainerName  | AWS/SageMaker | 
|  ContainerLatency  |  The time it took for the target container to respond as viewed from SageMaker AI. ContainerLatency includes the time it took to send the request, to fetch the response from the model's container, and to complete inference in the container. Units: Microseconds Valid statistics: Average, Sum, Min, Max, Sample Count |  EndpointName, VariantName, ContainerName  | AWS/SageMaker | 
|  OverheadLatency  |  The time added to the time taken to respond to a client request by SageMaker AI for overhead. OverheadLatency is measured from the time that SageMaker AI receives the request until it returns a response to the client, minus theModelLatency. Overhead latency can vary depending on request and response payload sizes, request frequency, and authentication or authorization of the request, among other factors. Units: Microseconds Valid statistics: Average, Sum, Min, Max, `Sample Count `  |  EndpointName, VariantName, ContainerName  | AWS/SageMaker | 
|  CPUUtilization  | The percentage of CPU units that are used by each container running on an instance. The value ranges from 0% to 100%, and is multiplied by the number of CPUs. For example, if there are four CPUs, CPUUtilization can range from 0% to 400%. For endpoints with direct invocation, the number of CPUUtilization metrics equals the number of containers in that endpoint. Units: Percent  |  EndpointName, VariantName, ContainerName  | aws/sagemaker/Endpoints | 
|  MemoryUtilizaton  |  The percentage of memory that is used by each container running on an instance. This value ranges from 0% to 100%. Similar as CPUUtilization, in endpoints with direct invocation, the number of MemoryUtilization metrics equals the number of containers in that endpoint. Units: Percent  |  EndpointName, VariantName, ContainerName  | aws/sagemaker/Endpoints | 

All the metrics in the previous table are specific to multi-container endpoints with direct invocation. Besides these special per-container metrics, there are also metrics at the variant level with dimension `[EndpointName, VariantName]` for all the metrics in the table expect `ContainerLatency`.