ServerlessProductionVariantProps

class aws_cdk.aws_sagemaker_alpha.ServerlessProductionVariantProps(*, max_concurrency, memory_size_in_mb, model, variant_name, initial_variant_weight=None, provisioned_concurrency=None)

Bases: object

(experimental) Construction properties for a serverless production variant.

Parameters:
  • max_concurrency (Union[int, float]) – (experimental) The maximum number of concurrent invocations your serverless endpoint can process. Valid range: 1-200

  • memory_size_in_mb (Union[int, float]) – (experimental) The memory size of your serverless endpoint. Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.

  • model (IModel) – (experimental) The model to host.

  • variant_name (str) – (experimental) Name of the production variant.

  • initial_variant_weight (Union[int, float, None]) – (experimental) Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. The traffic to a production variant is determined by the ratio of the variant weight to the sum of all variant weight values across all production variants. Default: 1.0

  • provisioned_concurrency (Union[int, float, None]) – (experimental) The number of concurrent invocations that are provisioned and ready to respond to your endpoint. Valid range: 1-200, must be less than or equal to max_concurrency. Default: - none

Stability:

experimental

Example:

import aws_cdk.aws_sagemaker_alpha as sagemaker

# model: sagemaker.Model


endpoint_config = sagemaker.EndpointConfig(self, "ServerlessEndpointConfig",
    serverless_production_variant=sagemaker.ServerlessProductionVariantProps(
        model=model,
        variant_name="serverlessVariant",
        max_concurrency=10,
        memory_size_in_mb=2048,
        provisioned_concurrency=5
    )
)

Attributes

initial_variant_weight

(experimental) Determines initial traffic distribution among all of the models that you specify in the endpoint configuration.

The traffic to a production variant is determined by the ratio of the variant weight to the sum of all variant weight values across all production variants.

Default:

1.0

Stability:

experimental
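
The ratio rule above can be illustrated with a small standalone sketch. The helper `traffic_shares` and the variant names are hypothetical and are not part of `aws_cdk.aws_sagemaker_alpha`; the sketch only reproduces the arithmetic described for `initial_variant_weight`.

```python
def traffic_shares(weights):
    """Map each variant name to its fraction of endpoint traffic.

    `weights` is a dict of variant name -> initial variant weight.
    Hypothetical helper for illustration; not a CDK API.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant weight must be positive")
    # Each variant's share is its weight divided by the sum of all weights.
    return {name: weight / total for name, weight in weights.items()}


# Two hypothetical variants: one keeps the default weight of 1.0,
# the other is given a weight of 3.0.
shares = traffic_shares({"variantA": 1.0, "variantB": 3.0})
print(shares)
```

With these weights, `variantA` receives a quarter of the traffic and `variantB` the remaining three quarters.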

max_concurrency

(experimental) The maximum number of concurrent invocations your serverless endpoint can process.

Valid range: 1-200

Stability:

experimental

memory_size_in_mb

(experimental) The memory size of your serverless endpoint.

Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.

Stability:

experimental
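
As a minimal sketch, the allowed values listed above could be checked before the properties are constructed. The constant `VALID_MEMORY_SIZES_MB` and the function `check_memory_size` are hypothetical names introduced for illustration; whether and when the CDK itself performs a similar check is not stated here.

```python
# Allowed sizes taken from the description above (1 GB increments).
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def check_memory_size(memory_size_in_mb):
    """Return the value unchanged if it is an allowed size, else raise.

    Hypothetical helper for illustration; not a CDK API.
    """
    if memory_size_in_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {VALID_MEMORY_SIZES_MB}, "
            f"got {memory_size_in_mb}"
        )
    return memory_size_in_mb


checked = check_memory_size(2048)
```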

model

(experimental) The model to host.

Stability:

experimental

provisioned_concurrency

(experimental) The number of concurrent invocations that are provisioned and ready to respond to your endpoint.

Valid range: 1-200, must be less than or equal to max_concurrency.

Default:

none

Stability:

experimental
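
The two concurrency constraints (each value within 1-200, and provisioned concurrency no greater than the maximum) can be sketched as a standalone pre-check. The function `check_concurrency` is a hypothetical helper, not part of the CDK, and assumes the documented ranges are inclusive.

```python
def check_concurrency(max_concurrency, provisioned_concurrency=None):
    """Raise ValueError if the documented concurrency limits are violated.

    Hypothetical helper for illustration; not a CDK API.
    Assumes the 1-200 range is inclusive at both ends.
    """
    if not 1 <= max_concurrency <= 200:
        raise ValueError("max_concurrency must be between 1 and 200")
    if provisioned_concurrency is None:
        # No provisioned concurrency requested (the documented default).
        return
    if not 1 <= provisioned_concurrency <= 200:
        raise ValueError("provisioned_concurrency must be between 1 and 200")
    if provisioned_concurrency > max_concurrency:
        raise ValueError(
            "provisioned_concurrency must be less than or equal to max_concurrency"
        )


# Mirrors the values used in the example above: 5 provisioned out of 10.
check_concurrency(max_concurrency=10, provisioned_concurrency=5)
```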

variant_name

(experimental) Name of the production variant.

Stability:

experimental