ProductionVariantManagedInstanceScalingScaleInPolicy - Amazon SageMaker

ProductionVariantManagedInstanceScalingScaleInPolicy

Configures the scale-in behavior for managed instance scaling.

Contents

Strategy

The strategy that the endpoint uses to scale in instances. Specify one of the following values:

IDLE_RELEASE

Releases instances that don't host any inference component copies.

CONSOLIDATION

Consolidates inference component copies onto fewer instances so that the instances freed up can be released. Consolidation honors the scheduling configuration of each inference component. For example, if an inference component specifies Availability Zone balance, consolidation only proceeds when the resulting distribution does not increase the imbalance.

Type: String

Valid Values: IDLE_RELEASE | CONSOLIDATION

Required: Yes
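The difference between the two Strategy values can be illustrated with a toy model. The sketch below is hypothetical and is not SageMaker's placement algorithm. It assumes that every instance can host at most `CAPACITY` copies, that Availability Zone imbalance is measured as the gap in copy counts between the busiest and the emptiest zone, and that consolidation also releases instances that are already idle. The helper names `idle_release`, `az_imbalance`, and `consolidate` are illustrative only.

```python
from collections import Counter

# Hypothetical toy model, not SageMaker's actual placement logic.
CAPACITY = 4  # assumed maximum number of copies one instance can host


def idle_release(instances):
    """IDLE_RELEASE: release instances with zero copies, leave the rest alone."""
    kept = {iid: inst for iid, inst in instances.items() if inst["copies"] > 0}
    released = sorted(iid for iid, inst in instances.items() if inst["copies"] == 0)
    return kept, released


def az_imbalance(instances, zones):
    """Assumed metric: copies in the busiest zone minus copies in the emptiest."""
    per_zone = Counter({zone: 0 for zone in zones})
    for inst in instances.values():
        per_zone[inst["zone"]] += inst["copies"]
    return max(per_zone.values()) - min(per_zone.values())


def consolidate(instances, az_balance=False):
    """CONSOLIDATION: repack copies to empty instances, then release them.

    Assumes idle instances are released first. When az_balance is True, a
    repacking is rejected if it would increase the AZ imbalance.
    """
    current, released = idle_release(instances)
    zones = {inst["zone"] for inst in instances.values()}
    progress = True
    while progress:
        progress = False
        # Try to empty the least-loaded instance first.
        for src in sorted(current, key=lambda iid: current[iid]["copies"]):
            trial = {iid: dict(inst) for iid, inst in current.items()}
            remaining = trial[src]["copies"]
            # Fill the most-loaded other instances first.
            for dst in sorted(trial, key=lambda iid: -trial[iid]["copies"]):
                if dst == src or remaining == 0:
                    continue
                moved = min(max(0, CAPACITY - trial[dst]["copies"]), remaining)
                trial[dst]["copies"] += moved
                remaining -= moved
            if remaining > 0:
                continue  # the copies on src do not fit anywhere else
            del trial[src]
            if az_balance and az_imbalance(trial, zones) > az_imbalance(current, zones):
                continue  # would worsen AZ balance, so skip this repacking
            current = trial
            released.append(src)
            progress = True
            break  # re-sort and look for the next instance to empty
    return current, released


FLEET = {
    "i-1": {"zone": "a", "copies": 3},
    "i-2": {"zone": "b", "copies": 1},
    "i-3": {"zone": "b", "copies": 2},
    "i-4": {"zone": "a", "copies": 0},
}
print(idle_release(FLEET)[1])                  # ['i-4']
print(consolidate(FLEET)[1])                   # ['i-4', 'i-2']
print(consolidate(FLEET, az_balance=True)[1])  # ['i-4']
```

In this example the two zones start with three copies each. Without the balance constraint, the copy on `i-2` moves to `i-1` and `i-2` is released; with the constraint, every possible repacking would skew the zones, so only the idle `i-4` is released.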

CooldownInMinutes

The cooldown period, in minutes, that the endpoint waits after the last endpoint operation before it evaluates consolidation scale-in opportunities.

Default value: 20.

Type: Integer

Valid Range: Minimum value of 5. Maximum value of 1440.

Required: No
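The cooldown rule can be expressed as a small time calculation. The sketch below is a hypothetical client-side helper, not part of any AWS SDK. It reads "the last endpoint operation" as the most recent one, applies the documented default of 20 minutes when no value is given, and rejects values outside the documented range of 5 to 1440.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_COOLDOWN_MINUTES = 20  # documented default
MIN_COOLDOWN, MAX_COOLDOWN = 5, 1440  # documented valid range, in minutes


def next_consolidation_check(operation_times, cooldown_in_minutes=None):
    """Earliest time consolidation scale-in could be evaluated (hypothetical helper).

    The cooldown is measured from the most recent endpoint operation, so a
    newer operation pushes the evaluation time later.
    """
    if cooldown_in_minutes is None:
        cooldown_in_minutes = DEFAULT_COOLDOWN_MINUTES
    if not isinstance(cooldown_in_minutes, int) or not (
        MIN_COOLDOWN <= cooldown_in_minutes <= MAX_COOLDOWN
    ):
        raise ValueError("CooldownInMinutes must be an integer from 5 to 1440")
    last_operation = max(operation_times)
    return last_operation + timedelta(minutes=cooldown_in_minutes)


ops = [
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 15, tzinfo=timezone.utc),
]
print(next_consolidation_check(ops))      # 2024-01-01 12:35:00+00:00
print(next_consolidation_check(ops, 60))  # 2024-01-01 13:15:00+00:00
```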

MaximumStepSize

The maximum number of instances that the endpoint can terminate at a time during a consolidation scale-in operation.

Default value: 1.

Type: Integer

Valid Range: Minimum value of 1. Maximum value of 100.

Required: No
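A minimal sketch, assuming the instances selected for release are already known, of how a `MaximumStepSize` cap splits a consolidation scale-in into batches. The helper `termination_steps` is hypothetical. The documentation does not say how the endpoint chooses instances or what happens between steps, so the sketch models only the per-step cap, the default of 1, and the range of 1 to 100.

```python
DEFAULT_MAX_STEP_SIZE = 1  # documented default
MIN_STEP, MAX_STEP = 1, 100  # documented valid range


def termination_steps(releasable, maximum_step_size=None):
    """Split instances chosen for consolidation scale-in into batches.

    Hypothetical helper: each batch holds at most maximum_step_size
    instances, mirroring the documented cap on instances terminated at a time.
    """
    if maximum_step_size is None:
        maximum_step_size = DEFAULT_MAX_STEP_SIZE
    if not isinstance(maximum_step_size, int) or not (
        MIN_STEP <= maximum_step_size <= MAX_STEP
    ):
        raise ValueError("MaximumStepSize must be an integer from 1 to 100")
    return [
        releasable[i : i + maximum_step_size]
        for i in range(0, len(releasable), maximum_step_size)
    ]


ids = ["i-1", "i-2", "i-3", "i-4", "i-5"]
print(termination_steps(ids))     # [['i-1'], ['i-2'], ['i-3'], ['i-4'], ['i-5']]
print(termination_steps(ids, 2))  # [['i-1', 'i-2'], ['i-3', 'i-4'], ['i-5']]
```

With the default step size of 1, five releasable instances take five separate termination steps; raising the value shortens the sequence at the cost of removing more capacity at once.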

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: