Multi-container endpoints

SageMaker AI multi-container endpoints enable customers to deploy multiple containers that use different models or frameworks on a single SageMaker AI endpoint. The containers can run in sequence as an inference pipeline, or each container can be accessed individually by using direct invocation. Hosting several containers on one endpoint improves endpoint utilization and helps optimize costs.
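As a rough illustration of the two modes, the following sketch builds the request for the Boto 3 create_model call, in which InferenceExecutionConfig selects either Serial (inference pipeline) or Direct (direct invocation). The helper name build_model_request, the container hostnames, the image URIs, and the role ARN are placeholders assumed for this example and do not come from this guide.

```python
def build_model_request(model_name, role_arn, containers, mode="Direct"):
    """Build keyword arguments for the SageMaker create_model call.

    Hypothetical helper for illustration only. `containers` is a list of
    (container_hostname, image_uri) pairs.
    """
    if mode not in ("Serial", "Direct"):
        raise ValueError("mode must be 'Serial' or 'Direct'")
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {"ContainerHostname": hostname, "Image": image_uri}
            for hostname, image_uri in containers
        ],
        # "Serial" runs the containers as a pipeline; "Direct" lets callers
        # target one container per request.
        "InferenceExecutionConfig": {"Mode": mode},
    }


# Placeholder values, assumed for this example.
model_request = build_model_request(
    model_name="example-multi-container-model",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    containers=[
        ("firstContainer", "111122223333.dkr.ecr.us-east-1.amazonaws.com/first:latest"),
        ("secondContainer", "111122223333.dkr.ecr.us-east-1.amazonaws.com/second:latest"),
    ],
    mode="Direct",
)

# To send the request (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_model(**model_request)
```

Building the request separately from the client call keeps the sketch runnable without AWS credentials and makes the chosen mode easy to inspect before anything is created.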

For information about invoking the containers in a multi-container endpoint in sequence, see Inference pipelines in Amazon SageMaker AI.

For information about invoking a specific container in a multi-container endpoint, see Invoke a multi-container endpoint with direct invocation.
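A minimal sketch of a direct invocation request follows. It assumes an endpoint whose model was created in Direct mode. The helper name build_invoke_request, the endpoint name, the container hostname, and the payload are hypothetical. TargetContainerHostname is the invoke_endpoint parameter that names the container to call.

```python
import json


def build_invoke_request(endpoint_name, container_hostname, payload,
                         content_type="application/json"):
    """Build keyword arguments for the SageMaker runtime invoke_endpoint call.

    Hypothetical helper for illustration only.
    """
    return {
        "EndpointName": endpoint_name,
        # Routes the request to one container on the endpoint.
        "TargetContainerHostname": container_hostname,
        "ContentType": content_type,
        "Body": json.dumps(payload),
    }


# Placeholder values, assumed for this example.
invoke_request = build_invoke_request(
    endpoint_name="example-endpoint",
    container_hostname="secondContainer",
    payload={"instances": [[1.0, 2.0, 3.0]]},
)

# To send the request (requires boto3 and AWS credentials):
#   import boto3
#   response = boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_request)
#   result = response["Body"].read()
```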

The following policy allows invoke_endpoint requests only when the value of the TargetContainerHostname field matches one of the specified regular expressions.
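The policy text itself is not reproduced here, so the following is only a sketch of what such an allow policy might look like, built as a Python dictionary and serialized to JSON. It assumes the sagemaker:TargetContainerHostname condition key with the StringLike operator, which matches wildcard patterns (* and ?). The endpoint ARN, the hostname patterns, and the helper name build_allow_policy are placeholders for illustration.

```python
import json


def build_allow_policy(endpoint_arn, hostname_patterns):
    """Return an IAM policy that allows InvokeEndpoint for matching containers.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sagemaker:InvokeEndpoint"],
                "Resource": endpoint_arn,
                "Condition": {
                    # Assumed operator: StringLike with wildcard patterns.
                    "StringLike": {
                        "sagemaker:TargetContainerHostname": list(hostname_patterns)
                    }
                },
            }
        ],
    }


# Placeholder ARN and patterns, assumed for this example.
allow_policy = build_allow_policy(
    "arn:aws:sagemaker:us-east-1:111122223333:endpoint/example-endpoint",
    ["customIps*", "common*"],
)
allow_policy_json = json.dumps(allow_policy, indent=4)
```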

The following policy denies invoke_endpoint requests when the value of the TargetContainerHostname field matches one of the specified regular expressions in the Deny statement.
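The policy text itself is not reproduced here, so the following is only a sketch of a policy with a Deny statement. It assumes one statement that allows InvokeEndpoint on the endpoint and a second statement that denies it when sagemaker:TargetContainerHostname matches a StringLike wildcard pattern. In IAM, an explicit Deny takes precedence over an Allow. The endpoint ARN, the pattern, and the helper name build_deny_policy are placeholders for illustration.

```python
import json


def build_deny_policy(endpoint_arn, denied_hostname_patterns):
    """Return an IAM policy that allows InvokeEndpoint except for matching containers.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Baseline permission to invoke the endpoint.
                "Effect": "Allow",
                "Action": ["sagemaker:InvokeEndpoint"],
                "Resource": endpoint_arn,
            },
            {
                # Explicit Deny overrides the Allow for matching hostnames.
                "Effect": "Deny",
                "Action": ["sagemaker:InvokeEndpoint"],
                "Resource": endpoint_arn,
                "Condition": {
                    # Assumed operator: StringLike with wildcard patterns.
                    "StringLike": {
                        "sagemaker:TargetContainerHostname": list(denied_hostname_patterns)
                    }
                },
            },
        ],
    }


# Placeholder ARN and pattern, assumed for this example.
deny_policy = build_deny_policy(
    "arn:aws:sagemaker:us-east-1:111122223333:endpoint/example-endpoint",
    ["special*"],
)
deny_policy_json = json.dumps(deny_policy, indent=4)
```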

For information about SageMaker AI condition keys, see Condition Keys for SageMaker AI in the AWS Identity and Access Management User Guide.
