Amazon SageMaker AI enhanced metrics for inference endpoints
Enhanced metrics provide instance-level and container-level monitoring data for Amazon SageMaker AI
real-time endpoints. When you enable enhanced metrics, Amazon CloudWatch metrics can include
InstanceId, ContainerId, and AcceleratorId dimensions
(availability varies by namespace) for granular per-instance, per-container, and per-GPU
visibility. Enhanced metrics are available
for single-model endpoints and inference components. Multi-Container Endpoints (MCE) support
instance-level enhanced metrics but not container-level metrics.
Key characteristics of enhanced metrics:
- Instance-level granularity. Utilization and invocation metrics include an InstanceId dimension that identifies the specific instance hosting the endpoint. This is available for all real-time endpoints.
- Container-level granularity. For endpoints that use inference components, metrics include a ContainerId dimension that identifies the specific container running the model. Container-level dimensions appear in both the AWS/SageMaker namespace (invocation metrics) and the /aws/sagemaker/InferenceComponents namespace (utilization metrics).
- Per-GPU granularity. GPU utilization metrics include an AcceleratorId dimension that identifies the specific GPU on an instance.
- Configurable publishing frequency. You can configure the metric publishing interval to 10, 30, 60, 120, 180, 240, or 300 seconds. The default is 60 seconds. This interval applies to utilization metrics regardless of whether enhanced metrics are enabled. With enhanced metrics enabled, it also applies to invocation metrics.
Enabling enhanced metrics
You enable enhanced metrics by setting EnableEnhancedMetrics to
True in the MetricsConfig parameter when you
call the CreateEndpointConfig API.
The MetricsConfig parameter has the following fields:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| EnableEnhancedMetrics | Boolean | No | False | Enables instance-level and container-level metric dimensions. |
| MetricPublishFrequencyInSeconds | Integer | No | 60 | The interval, in seconds, at which metrics are published to Amazon CloudWatch. Defaults to 60. |
Note
MetricsConfig is set at the endpoint configuration level. You cannot
configure different settings for individual inference components on the same
endpoint.
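As a minimal sketch of the request described above, the following Python builds the arguments for CreateEndpointConfig with MetricsConfig set. The configuration name, model name, variant name, and instance type are hypothetical placeholders. The commented-out boto3 call assumes an SDK version recent enough to recognize the MetricsConfig parameter.

```python
def build_endpoint_config_request(config_name, model_name, frequency_seconds=60):
    """Return keyword arguments for CreateEndpointConfig with enhanced metrics enabled."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",      # placeholder variant name
                "ModelName": model_name,
                "InstanceType": "ml.g5.xlarge",   # placeholder instance type
                "InitialInstanceCount": 1,
            }
        ],
        # MetricsConfig applies to the whole endpoint configuration.
        "MetricsConfig": {
            "EnableEnhancedMetrics": True,
            "MetricPublishFrequencyInSeconds": frequency_seconds,
        },
    }


request = build_endpoint_config_request("my-enhanced-config", "my-model", 30)

# To send the request (assumes AWS credentials and a boto3 version that
# supports MetricsConfig):
# import boto3
# boto3.client("sagemaker").create_endpoint_config(**request)
```

Building the request as a plain dictionary keeps it easy to inspect or log before it is sent.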
To enable enhanced metrics on an existing endpoint, create a new endpoint
configuration with the desired MetricsConfig settings, and then call UpdateEndpoint with the
new endpoint configuration name. This triggers a blue/green or rolling deployment.
Enhanced metrics do not appear until the deployment completes. The same process applies
when changing MetricsConfig settings on an already-configured endpoint.
When you configure MetricsConfig, both DescribeEndpoint and
DescribeEndpointConfig return MetricsConfig in the response.
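The update-and-verify flow above might be sketched as two small helpers. Both take a SageMaker client as an argument (for example, one created with boto3.client("sagemaker")). The helper names are hypothetical, and the sketch assumes that MetricsConfig appears as a top-level key in the DescribeEndpoint response.

```python
def switch_endpoint_config(sagemaker_client, endpoint_name, new_config_name):
    """Point an existing endpoint at a new endpoint configuration."""
    sagemaker_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )


def read_metrics_config(sagemaker_client, endpoint_name):
    """Return the MetricsConfig reported by DescribeEndpoint, or None if absent."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response.get("MetricsConfig")
```

Because enhanced metrics do not appear until the deployment completes, a caller would typically wait for the endpoint to return to the InService status (for example, with the boto3 endpoint_in_service waiter) before checking CloudWatch.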
When you enable enhanced metrics, SageMaker AI adds dimensions to metrics across three
CloudWatch namespaces: /aws/sagemaker/Endpoints for utilization metrics,
AWS/SageMaker for invocation metrics, and
/aws/sagemaker/InferenceComponents for inference component utilization
metrics.
Instance-level utilization metrics
The /aws/sagemaker/Endpoints namespace includes utilization
metrics for all real-time endpoints, including those that use inference components.
When you enable enhanced metrics, the InstanceId and
AcceleratorId (GPU metrics only) dimensions become available alongside the
existing namespace dimensions. For a complete list of metrics and
dimensions, see SageMaker AI endpoint metrics.
When you enable enhanced metrics, the following additional dimensions are available:
| Dimension | Description |
|---|---|
| InstanceId | Filters utilization metrics for a specific instance. |
| AcceleratorId | (GPU metrics only) Filters utilization metrics for a specific GPU. |
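One way to find which InstanceId values exist for an endpoint is to list the metrics that carry that dimension. The sketch below uses the CloudWatch ListMetrics paginator with a name-only dimension filter, which matches any value. The helper names are hypothetical. The CPUUtilization metric and the EndpointName dimension come from the general SageMaker AI endpoint metrics rather than from this section, so treat them as assumptions to confirm for your endpoint.

```python
def instance_ids_from_metrics(metrics):
    """Collect distinct InstanceId values from CloudWatch ListMetrics entries."""
    ids = set()
    for metric in metrics:
        for dimension in metric.get("Dimensions", []):
            if dimension["Name"] == "InstanceId":
                ids.add(dimension["Value"])
    return sorted(ids)


def list_endpoint_instance_ids(cloudwatch_client, endpoint_name,
                               metric_name="CPUUtilization"):
    """Return the instance IDs that publish metric_name for the endpoint."""
    paginator = cloudwatch_client.get_paginator("list_metrics")
    pages = paginator.paginate(
        Namespace="/aws/sagemaker/Endpoints",
        MetricName=metric_name,
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "InstanceId"},  # name only: match any instance
        ],
    )
    found = []
    for page in pages:
        found.extend(page["Metrics"])
    return instance_ids_from_metrics(found)
```

A caller would pass a CloudWatch client, for example boto3.client("cloudwatch"), and the endpoint name.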
Instance and container-level invocation metrics
The AWS/SageMaker namespace includes invocation metrics.
When you enable enhanced metrics, the InstanceId and
ContainerId (inference components only) dimensions become available alongside
the existing namespace dimensions. For a complete list of metrics and
dimensions, see SageMaker AI endpoint invocation metrics.
When you enable enhanced metrics, the following additional dimensions are available:
| Dimension | Description |
|---|---|
| InstanceId | Filters invocation metrics for a specific instance. |
| ContainerId | (Inference components only) Filters invocation metrics for a specific container. |
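To illustrate filtering invocation metrics by instance, the sketch below builds the arguments for a CloudWatch GetMetricStatistics request. The function name and its default values are hypothetical. The Invocations metric and the EndpointName and VariantName dimensions come from the general SageMaker AI invocation metrics, and the exact dimension combination published with enhanced metrics is an assumption. CloudWatch only returns data when the dimension set matches exactly, so it is worth confirming the combination with ListMetrics first.

```python
from datetime import datetime, timedelta, timezone


def per_instance_invocations_query(endpoint_name, variant_name, instance_id,
                                   period_seconds=60, minutes=30, now=None):
    """Build GetMetricStatistics arguments for one instance's invocation count."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocations",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
            {"Name": "InstanceId", "Value": instance_id},  # added by enhanced metrics
        ],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": period_seconds,
        "Statistics": ["Sum"],
    }
```

The resulting dictionary can be passed to a boto3 CloudWatch client as get_metric_statistics(**query).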
Container-level utilization metrics
The /aws/sagemaker/InferenceComponents namespace includes
utilization metrics for endpoints that use inference components.
When you enable enhanced metrics, the InstanceId,
ContainerId, and AcceleratorId (GPU metrics only) dimensions become
available alongside the existing namespace dimensions. For a
complete list of metrics and dimensions, see SageMaker AI inference component metrics.
When you enable enhanced metrics, the following additional dimensions are available:
| Dimension | Description |
|---|---|
| InstanceId | Filters utilization metrics for a specific instance. |
| ContainerId | Filters utilization metrics for a specific container. |
| AcceleratorId | (GPU metrics only) Filters utilization metrics for a specific GPU. |
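The dimensions added in each of the three namespaces can be summarized as a small lookup. This is only a restatement of the tables in this section as data. The constant and function names are hypothetical.

```python
# Dimensions that enhanced metrics add, per CloudWatch namespace.
ENHANCED_DIMENSIONS = {
    "/aws/sagemaker/Endpoints": ("InstanceId", "AcceleratorId"),
    "AWS/SageMaker": ("InstanceId", "ContainerId"),
    "/aws/sagemaker/InferenceComponents": ("InstanceId", "ContainerId", "AcceleratorId"),
}


def enhanced_dimensions(namespace, uses_inference_components=False, gpu_metric=False):
    """Return the enhanced dimensions that apply to a metric in the namespace."""
    # The InferenceComponents namespace only covers inference component endpoints.
    if namespace == "/aws/sagemaker/InferenceComponents":
        uses_inference_components = True
    dims = []
    for name in ENHANCED_DIMENSIONS[namespace]:
        if name == "ContainerId" and not uses_inference_components:
            continue  # ContainerId applies to inference components only
        if name == "AcceleratorId" and not gpu_metric:
            continue  # AcceleratorId applies to GPU metrics only
        dims.append(name)
    return dims
```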
Configurable metric frequency
You can configure the interval at which metrics are published to CloudWatch. The default frequency is 60 seconds.
Valid values: 10, 30, 60, 120, 180, 240, or 300 seconds.
When EnableEnhancedMetrics is set to False, this frequency
applies to utilization metrics only; invocation metrics continue to be published at the
default 60-second interval. When set to True, this frequency applies
to both utilization and invocation metrics.
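The interaction between the publishing frequency and EnableEnhancedMetrics can be expressed as a short helper. This is a sketch of the rule stated above, not part of any SDK. The function and constant names are hypothetical.

```python
VALID_FREQUENCIES = (10, 30, 60, 120, 180, 240, 300)
DEFAULT_FREQUENCY = 60


def effective_intervals(frequency_seconds=DEFAULT_FREQUENCY, enhanced_metrics=False):
    """Return the publishing interval, in seconds, for each metric category."""
    if frequency_seconds not in VALID_FREQUENCIES:
        raise ValueError(
            f"frequency must be one of {VALID_FREQUENCIES}, got {frequency_seconds}"
        )
    # Invocation metrics follow the configured frequency only when
    # enhanced metrics are enabled; otherwise they stay at the default.
    invocation = frequency_seconds if enhanced_metrics else DEFAULT_FREQUENCY
    return {"utilization": frequency_seconds, "invocation": invocation}
```

For example, a frequency of 10 seconds with enhanced metrics disabled yields 10-second utilization metrics and 60-second invocation metrics.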
Note
Metrics published at intervals less than 60 seconds (high-resolution) are retained for 3 hours.
Standard CloudWatch pricing applies per metric per unique dimension combination. Enhanced
metrics increase the number of metric streams because each instance, container, and GPU creates
additional dimension combinations. For pricing details, see Amazon CloudWatch pricing.
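The retention limit in the note matters when you query sub-minute data. As a rough sketch, the helper below checks whether a query start time still falls inside the 3-hour window for high-resolution data. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

HIGH_RESOLUTION_RETENTION = timedelta(hours=3)


def high_resolution_data_expected(publish_frequency_seconds, start_time, now=None):
    """Return True if sub-minute data points should still exist at start_time."""
    if publish_frequency_seconds >= 60:
        return False  # not published at high resolution
    now = now or datetime.now(timezone.utc)
    return now - start_time <= HIGH_RESOLUTION_RETENTION
```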
Code examples: configure enhanced metrics
The following examples show how to create an endpoint configuration with enhanced metrics enabled and how to verify the configuration.