Automatic scaling of Amazon SageMaker AI models
Amazon SageMaker AI supports automatic scaling (auto scaling) for your hosted models. Auto
scaling dynamically adjusts the number of instances provisioned for a model
in response to changes in your workload. When the workload increases, auto scaling brings
more instances online. When the workload decreases, auto scaling removes unnecessary
instances so that you don't pay for provisioned instances that you aren't using. For
more information about using per-instance metrics for scaling decisions, see Amazon SageMaker AI enhanced metrics for inference endpoints and Enhanced metrics for Amazon SageMaker AI endpoints.
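
As a rough illustration of the behavior described above, the following Python sketch builds the two requests that are typically sent to Application Auto Scaling for a hosted model variant: one that registers the variant's instance count as a scalable target, and one that attaches a target tracking policy. The endpoint name, variant name, policy name, capacity limits, target value, and cooldowns are illustrative assumptions, not recommendations, and the helper functions are hypothetical. The boto3 calls are wrapped in a function that is not invoked here, because running it requires AWS credentials and an existing endpoint.

```python
from typing import Any, Dict


def _resource_id(endpoint_name: str, variant_name: str) -> str:
    # Resource ID format used for SageMaker AI endpoint variants.
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def build_scalable_target(endpoint_name: str, variant_name: str,
                          min_instances: int, max_instances: int) -> Dict[str, Any]:
    """Build parameters for register_scalable_target (hypothetical helper)."""
    if min_instances < 0 or max_instances < min_instances:
        raise ValueError("expected 0 <= min_instances <= max_instances")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": _resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def build_target_tracking_policy(endpoint_name: str, variant_name: str,
                                 invocations_per_instance: float,
                                 policy_name: str = "example-invocations-policy"
                                 ) -> Dict[str, Any]:
    """Build parameters for put_scaling_policy (hypothetical helper)."""
    return {
        "PolicyName": policy_name,  # assumed name, choose your own
        "ServiceNamespace": "sagemaker",
        "ResourceId": _resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Instances are added when the metric rises above the target
            # and removed when it falls below it.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds, assumed value
            "ScaleOutCooldown": 300,  # seconds, assumed value
        },
    }


def apply_auto_scaling(target: Dict[str, Any], policy: Dict[str, Any]) -> None:
    """Send both requests to AWS. Requires boto3, credentials, and an endpoint."""
    import boto3  # imported lazily so the builders work without boto3 installed

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


if __name__ == "__main__":
    # "my-endpoint" and "AllTraffic" are placeholder names.
    target = build_scalable_target("my-endpoint", "AllTraffic", 1, 4)
    policy = build_target_tracking_policy("my-endpoint", "AllTraffic", 70.0)
    print(target)
    print(policy)
```

Keeping the request construction separate from the AWS calls lets you inspect or log the parameters before applying them. To apply them, call apply_auto_scaling(target, policy) from an environment with the appropriate permissions.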