Amazon SageMaker AI enhanced metrics for inference endpoints

Enhanced metrics provide instance-level and container-level monitoring data for Amazon SageMaker AI real-time endpoints. When you enable enhanced metrics, Amazon CloudWatch metrics can include InstanceId, ContainerId, and AcceleratorId dimensions (availability varies by namespace) for granular per-instance, per-container, and per-GPU visibility. Enhanced metrics are available for single-model endpoints and inference components. Multi-Container Endpoints (MCE) support instance-level enhanced metrics but not container-level metrics.

Key characteristics of enhanced metrics:

  • Instance-level granularity. Utilization and invocation metrics include an InstanceId dimension that identifies the specific instance hosting the endpoint. This is available for all real-time endpoints.

  • Container-level granularity. For endpoints that use inference components, metrics include a ContainerId dimension that identifies the specific container running the model. Container-level dimensions appear in both the AWS/SageMaker namespace (invocation metrics) and the /aws/sagemaker/InferenceComponents namespace (utilization metrics).

  • Per-GPU granularity. GPU utilization metrics include an AcceleratorId dimension that identifies the specific GPU on an instance.

  • Configurable publishing frequency. You can set the metric publishing interval to 10, 30, 60, 120, 180, 240, or 300 seconds. The default is 60 seconds. This interval applies to utilization metrics regardless of whether enhanced metrics are enabled. With enhanced metrics enabled, it also applies to invocation metrics.

Enabling enhanced metrics

You enable enhanced metrics by setting EnableEnhancedMetrics to True in the MetricsConfig parameter when you call the CreateEndpointConfig API.

The MetricsConfig parameter has the following fields:

MetricsConfig parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| EnableEnhancedMetrics | Boolean | No | False | Enables instance-level and container-level metric dimensions. |
| MetricPublishFrequencyInSeconds | Integer | No | 60 | The interval, in seconds, at which metrics are published to Amazon CloudWatch. Valid values: 10, 30, 60, 120, 180, 240, 300. When EnableEnhancedMetrics is set to False, this interval applies to utilization metrics only; invocation metrics continue to be published at the default 60-second interval. When EnableEnhancedMetrics is set to True, this interval applies to both utilization and invocation metrics. |

Note

MetricsConfig is set at the endpoint configuration level. You cannot configure different settings for individual inference components on the same endpoint.

To enable enhanced metrics on an existing endpoint, create a new endpoint configuration with the desired MetricsConfig settings, and then call UpdateEndpoint with the new endpoint configuration name. This triggers a blue/green or rolling deployment. Enhanced metrics do not appear until the deployment completes. The same process applies when changing MetricsConfig settings on an already-configured endpoint.
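The update procedure described above can be sketched as a small helper. This is a minimal illustration, not a definitive implementation: the function name enable_enhanced_metrics and its parameters are hypothetical, and the client argument is assumed to be a Boto3 SageMaker client (or any object exposing the same create_endpoint_config and update_endpoint calls).

```python
def enable_enhanced_metrics(sagemaker_client, endpoint_name, new_config_name,
                            production_variants, publish_frequency=60):
    """Create a new endpoint configuration with MetricsConfig and apply it.

    Hypothetical helper: sagemaker_client is assumed to behave like
    boto3.client('sagemaker').
    """
    # Step 1: create a new endpoint configuration with the desired settings.
    sagemaker_client.create_endpoint_config(
        EndpointConfigName=new_config_name,
        ProductionVariants=production_variants,
        MetricsConfig={
            "EnableEnhancedMetrics": True,
            "MetricPublishFrequencyInSeconds": publish_frequency,
        },
    )
    # Step 2: point the existing endpoint at the new configuration.
    # This starts the deployment; it does not wait for it to finish.
    return sagemaker_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
    )
```

Because enhanced metrics do not appear until the deployment completes, a caller would typically wait for the endpoint to return to service (for example, with the Boto3 endpoint_in_service waiter) before querying CloudWatch.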

When you configure MetricsConfig, both DescribeEndpoint and DescribeEndpointConfig return MetricsConfig in the response.

When you enable enhanced metrics, SageMaker AI adds dimensions to metrics in three CloudWatch namespaces: /aws/sagemaker/Endpoints for utilization metrics, AWS/SageMaker for invocation metrics, and /aws/sagemaker/InferenceComponents for inference component utilization metrics.

Instance-level utilization metrics

The /aws/sagemaker/Endpoints namespace includes utilization metrics for all real-time endpoints, including those that use inference components. When you enable enhanced metrics, the InstanceId and AcceleratorId (GPU metrics only) dimensions become available alongside the existing namespace dimensions. For a complete list of metrics and dimensions, see SageMaker AI endpoint metrics.

When you enable enhanced metrics, the following additional dimensions are available:

Additional dimensions for instance-level utilization metrics

| Dimension | Description |
| --- | --- |
| InstanceId | Filters utilization metrics for a specific instance. |
| AcceleratorId | (GPU metrics only) Filters utilization metrics for a specific GPU. |
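To illustrate how these dimensions might be used as filters, the following sketch builds the arguments for a CloudWatch GetMetricStatistics request scoped to one instance. The helper name instance_utilization_query is hypothetical. The sketch assumes that EndpointName and VariantName are the existing namespace dimensions, that CPUUtilization is the metric of interest, and that the enhanced dimensions are published in combination with the existing ones; verify the exact dimension combinations for your endpoint before relying on it.

```python
from datetime import datetime, timedelta, timezone


def instance_utilization_query(endpoint_name, variant_name, instance_id,
                               metric_name="CPUUtilization",
                               accelerator_id=None, period=60,
                               lookback_minutes=30, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Hypothetical helper. The dimension set below is an assumption about
    how enhanced metrics are published.
    """
    now = now or datetime.now(timezone.utc)
    dimensions = [
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
        {"Name": "InstanceId", "Value": instance_id},
    ]
    # AcceleratorId applies to GPU metrics only.
    if accelerator_id is not None:
        dimensions.append({"Name": "AcceleratorId", "Value": accelerator_id})
    return {
        "Namespace": "/aws/sagemaker/Endpoints",
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": now - timedelta(minutes=lookback_minutes),
        "EndTime": now,
        "Period": period,
        "Statistics": ["Average"],
    }
```

With Boto3, the resulting dictionary can be passed directly to the CloudWatch client, for example boto3.client('cloudwatch').get_metric_statistics(**query).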

Instance and container-level invocation metrics

The AWS/SageMaker namespace includes invocation metrics. When you enable enhanced metrics, the InstanceId and ContainerId (inference components only) dimensions become available alongside the existing namespace dimensions. For a complete list of metrics and dimensions, see SageMaker AI endpoint invocation metrics.

When you enable enhanced metrics, the following additional dimensions are available:

Additional dimensions for invocation metrics

| Dimension | Description |
| --- | --- |
| InstanceId | Filters invocation metrics for a specific instance. |
| ContainerId | (Inference components only) Filters invocation metrics for a specific container. |

Container-level utilization metrics

The /aws/sagemaker/InferenceComponents namespace includes utilization metrics for endpoints that use inference components. When you enable enhanced metrics, the InstanceId, ContainerId, and AcceleratorId (GPU metrics only) dimensions become available alongside the existing namespace dimensions. For a complete list of metrics and dimensions, see SageMaker AI inference component metrics.

When you enable enhanced metrics, the following additional dimensions are available:

Additional dimensions for container-level utilization metrics

| Dimension | Description |
| --- | --- |
| InstanceId | Filters utilization metrics for a specific instance. |
| ContainerId | Filters utilization metrics for a specific container. |
| AcceleratorId | (GPU metrics only) Filters utilization metrics for a specific GPU. |
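Before you can filter by InstanceId, ContainerId, or AcceleratorId, you need to know which values exist. One possible approach, sketched below, is to list the metrics in a namespace and collect the distinct values of a dimension. The helper name discover_dimension_values is hypothetical, and the client argument is assumed to be a Boto3 CloudWatch client.

```python
def discover_dimension_values(cloudwatch_client, namespace, dimension_name):
    """Return the sorted distinct values of one dimension in a namespace.

    Hypothetical helper: cloudwatch_client is assumed to behave like
    boto3.client('cloudwatch').
    """
    paginator = cloudwatch_client.get_paginator("list_metrics")
    values = set()
    # A dimension filter with only a Name matches metrics that carry
    # that dimension, whatever its value.
    pages = paginator.paginate(
        Namespace=namespace,
        Dimensions=[{"Name": dimension_name}],
    )
    for page in pages:
        for metric in page.get("Metrics", []):
            for dimension in metric.get("Dimensions", []):
                if dimension["Name"] == dimension_name:
                    values.add(dimension["Value"])
    return sorted(values)
```

For example, calling this helper with the /aws/sagemaker/InferenceComponents namespace and the ContainerId dimension would return the container IDs that have recently published utilization metrics.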

Configurable metric frequency

You can configure the interval at which metrics are published to CloudWatch. The default frequency is 60 seconds.

Valid values: 10, 30, 60, 120, 180, 240, or 300 seconds.

When EnableEnhancedMetrics is set to False, this frequency applies to utilization metrics only; invocation metrics continue to be published at the default 60-second interval. When set to True, this frequency applies to both utilization and invocation metrics.
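The interaction between the two MetricsConfig fields can be summarized in a short sketch. The helper name effective_publish_intervals is hypothetical; it only restates the rules above and does not call any AWS API.

```python
VALID_PUBLISH_FREQUENCIES = (10, 30, 60, 120, 180, 240, 300)
DEFAULT_PUBLISH_FREQUENCY = 60


def effective_publish_intervals(enable_enhanced_metrics=False,
                                frequency=DEFAULT_PUBLISH_FREQUENCY):
    """Return the publishing interval, in seconds, for each metric category."""
    if frequency not in VALID_PUBLISH_FREQUENCIES:
        raise ValueError(
            f"frequency must be one of {VALID_PUBLISH_FREQUENCIES}, got {frequency}"
        )
    # Utilization metrics always follow the configured frequency.
    # Invocation metrics follow it only when enhanced metrics are enabled.
    invocation = frequency if enable_enhanced_metrics else DEFAULT_PUBLISH_FREQUENCY
    return {"utilization": frequency, "invocation": invocation}


print(effective_publish_intervals(False, 10))
# {'utilization': 10, 'invocation': 60}
print(effective_publish_intervals(True, 10))
# {'utilization': 10, 'invocation': 10}
```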

Note

Metrics published at intervals less than 60 seconds (high-resolution) are retained for 3 hours.

Standard CloudWatch pricing applies per metric per unique dimension combination. Enhanced metrics increase the number of metric streams because each instance, container, and GPU creates additional dimension combinations. For pricing details, see Amazon CloudWatch pricing.
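To get a rough sense of how dimension combinations multiply, consider the following back-of-the-envelope sketch. It is an illustration only, under simplifying assumptions that are not stated in this documentation: each non-GPU metric produces one stream per instance, each GPU metric produces one stream per GPU on each instance, and container-level and pre-existing aggregate streams are ignored. The function name and the example counts are hypothetical.

```python
def estimate_enhanced_streams(instances, gpus_per_instance,
                              non_gpu_metric_count, gpu_metric_count):
    """Estimate added instance-level and GPU-level metric streams.

    Assumption: one stream per (metric, instance) for non-GPU metrics and
    one stream per (metric, instance, GPU) for GPU metrics.
    """
    per_instance_streams = non_gpu_metric_count * instances
    per_gpu_streams = gpu_metric_count * instances * gpus_per_instance
    return per_instance_streams + per_gpu_streams


# Hypothetical endpoint: 2 instances, 4 GPUs each,
# 3 non-GPU metrics and 2 GPU metrics.
print(estimate_enhanced_streams(2, 4, 3, 2))
# 22
```

Actual stream counts depend on which metrics your endpoint emits and on how many containers it runs, so treat this estimate as a starting point for a pricing review rather than a billing calculation.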

Code examples: configure enhanced metrics

The following examples show how to create an endpoint configuration with enhanced metrics enabled and how to verify the configuration.

Create an endpoint configuration with enhanced metrics

AWS SDK for Python (Boto3)
Example Create an endpoint configuration with enhanced metrics
    import boto3

    sagemaker_client = boto3.client('sagemaker')

    response = sagemaker_client.create_endpoint_config(
        EndpointConfigName='my-enhanced-metrics-config',
        ProductionVariants=[
            {
                'VariantName': 'AllTraffic',
                'ModelName': 'my-model',
                'InstanceType': 'ml.m5.xlarge',
                'InitialInstanceCount': 2,
            }
        ],
        MetricsConfig={
            'EnableEnhancedMetrics': True,
            'MetricPublishFrequencyInSeconds': 60
        }
    )
AWS CLI
Example Create an endpoint configuration with enhanced metrics
    aws sagemaker create-endpoint-config \
        --endpoint-config-name my-enhanced-metrics-config \
        --production-variants file://production-variants.json \
        --metrics-config file://metrics-config.json

Where metrics-config.json contains:

    {
        "EnableEnhancedMetrics": true,
        "MetricPublishFrequencyInSeconds": 60
    }

Verify enhanced metrics configuration

AWS SDK for Python (Boto3)
Example Verify enhanced metrics configuration
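    import boto3

    sagemaker_client = boto3.client('sagemaker')

    # Check the settings stored on the endpoint configuration.
    response = sagemaker_client.describe_endpoint_config(
        EndpointConfigName='my-enhanced-metrics-config'
    )
    print(response['MetricsConfig'])
    # {'EnableEnhancedMetrics': True, 'MetricPublishFrequencyInSeconds': 60}

    # Check the settings reported by the deployed endpoint.
    response = sagemaker_client.describe_endpoint(
        EndpointName='my-endpoint'
    )
    print(response['MetricsConfig'])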
AWS CLI
Example Verify enhanced metrics configuration
    aws sagemaker describe-endpoint-config \
        --endpoint-config-name my-enhanced-metrics-config \
        --query 'MetricsConfig'

    aws sagemaker describe-endpoint \
        --endpoint-name my-endpoint \
        --query 'MetricsConfig'