interface ServerlessProductionVariantProps
| Language | Type name |
|---|---|
| .NET | Amazon.CDK.AWS.Sagemaker.Alpha.ServerlessProductionVariantProps |
| Go | github.com/aws/aws-cdk-go/awscdksagemakeralpha/v2#ServerlessProductionVariantProps |
| Java | software.amazon.awscdk.services.sagemaker.alpha.ServerlessProductionVariantProps |
| Python | aws_cdk.aws_sagemaker_alpha.ServerlessProductionVariantProps |
| TypeScript (source) | @aws-cdk/aws-sagemaker-alpha » ServerlessProductionVariantProps |
Construction properties for a serverless production variant.
Example
```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.Model;

const endpointConfig = new sagemaker.EndpointConfig(this, 'ServerlessEndpointConfig', {
  serverlessProductionVariant: {
    model: model,
    variantName: 'serverlessVariant',
    maxConcurrency: 10,
    memorySizeInMB: 2048,
    provisionedConcurrency: 5, // optional
  },
});
```
Properties
| Name | Type | Description |
|---|---|---|
| maxConcurrency | number | The maximum number of concurrent invocations your serverless endpoint can process. |
| memorySizeInMB | number | The memory size of your serverless endpoint, in MB. |
| model | IModel | The model to host. |
| variantName | string | Name of the production variant. |
| initialVariantWeight? | number | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. |
| provisionedConcurrency? | number | The number of concurrent invocations that are provisioned and ready to respond to your endpoint. |
maxConcurrency
Type: number
The maximum number of concurrent invocations your serverless endpoint can process.
Valid range: 1-200
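The documented range can be checked before synthesis with a small helper like the one below. This is a minimal sketch: `isValidMaxConcurrency` is a hypothetical name, not part of @aws-cdk/aws-sagemaker-alpha, and it assumes that concurrency is a whole number of invocations, which this page does not state explicitly.

```typescript
// Hypothetical helper, not part of @aws-cdk/aws-sagemaker-alpha.
// Checks a maxConcurrency value against the documented valid range (1-200).
const MIN_MAX_CONCURRENCY = 1;
const MAX_MAX_CONCURRENCY = 200;

function isValidMaxConcurrency(value: number): boolean {
  // Assumption: concurrency is a whole number of invocations.
  const isWholeNumber = Math.floor(value) === value;
  return isWholeNumber && value >= MIN_MAX_CONCURRENCY && value <= MAX_MAX_CONCURRENCY;
}

console.log(isValidMaxConcurrency(10));  // true
console.log(isValidMaxConcurrency(0));   // false
console.log(isValidMaxConcurrency(201)); // false
```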
memorySizeInMB
Type: number
The memory size of your serverless endpoint, in MB.
Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.
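The six valid values are exactly the multiples of 1024 MB from 1024 to 6144, so a range check plus a divisibility check is enough to validate them. The sketch below illustrates this; `isValidMemorySizeInMB` is a hypothetical helper, not part of the CDK library.

```typescript
// Hypothetical helper, not part of @aws-cdk/aws-sagemaker-alpha.
// Valid values are 1024-6144 MB in 1024 MB (1 GB) steps.
const MEMORY_STEP_MB = 1024;
const MIN_MEMORY_MB = 1024;
const MAX_MEMORY_MB = 6144;

function isValidMemorySizeInMB(value: number): boolean {
  // A remainder of 0 also rules out fractional values.
  return value >= MIN_MEMORY_MB && value <= MAX_MEMORY_MB && value % MEMORY_STEP_MB === 0;
}

// Enumerate every accepted value by stepping through the range.
const validSizes: number[] = [];
for (let mb = MIN_MEMORY_MB; mb <= MAX_MEMORY_MB; mb += MEMORY_STEP_MB) {
  validSizes.push(mb);
}

console.log(validSizes.join(", "));     // 1024, 2048, 3072, 4096, 5120, 6144
console.log(isValidMemorySizeInMB(2048)); // true
console.log(isValidMemorySizeInMB(1536)); // false (not a 1 GB multiple)
```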
model
Type: IModel
The model to host.
variantName
Type: string
Name of the production variant.
initialVariantWeight?
Type: number
(optional, default: 1.0)
Determines initial traffic distribution among all of the models that you specify in the endpoint configuration.
The traffic to a production variant is determined by the ratio of the variant weight to the sum of all variant weight values across all production variants.
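The weighting rule can be illustrated with a short calculation: each variant receives weight / (sum of all weights) of the traffic, and an omitted weight counts as the default of 1.0. The helper `trafficShares` and the variant names below are hypothetical and only demonstrate the arithmetic; they are not part of the CDK library.

```typescript
// Sketch of the documented weighting rule: a variant's share of traffic is
// its weight divided by the sum of all variants' weights.
// The helper and variant names are hypothetical.
interface VariantWeightInput {
  variantName: string;
  initialVariantWeight?: number; // omitted => documented default of 1.0
}

function trafficShares(variants: VariantWeightInput[]): Record<string, number> {
  // Apply the default weight of 1.0 to any variant that omits it.
  const weights = variants.map((v) =>
    v.initialVariantWeight === undefined ? 1.0 : v.initialVariantWeight,
  );
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (total <= 0) {
    throw new Error("The sum of all variant weights must be positive");
  }
  const shares: Record<string, number> = {};
  variants.forEach((v, i) => {
    shares[v.variantName] = weights[i] / total;
  });
  return shares;
}

const shares = trafficShares([
  { variantName: "variantA" }, // weight defaults to 1.0
  { variantName: "variantB", initialVariantWeight: 3.0 },
]);
console.log(shares["variantA"]); // 0.25
console.log(shares["variantB"]); // 0.75
```

With a total weight of 4.0, the default-weighted variant receives 1/4 of the traffic and the variant with weight 3.0 receives 3/4.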
provisionedConcurrency?
Type: number
(optional, default: none)
The number of concurrent invocations that are provisioned and ready to respond to your endpoint.
Valid range: 1-200. Must be less than or equal to maxConcurrency.
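Because provisionedConcurrency is constrained both by its own range and by maxConcurrency, a cross-field check is the natural way to validate it. The sketch below collects any violations into a list of messages; `checkProvisionedConcurrency` is a hypothetical helper, not a CDK API, and it does not reflect whatever validation the construct itself may perform.

```typescript
// Hypothetical check, not part of @aws-cdk/aws-sagemaker-alpha.
// Documented constraint: provisionedConcurrency is optional; when set it must
// be in 1-200 and less than or equal to maxConcurrency.
function checkProvisionedConcurrency(
  maxConcurrency: number,
  provisionedConcurrency?: number,
): string[] {
  const errors: string[] = [];
  if (provisionedConcurrency === undefined) {
    return errors; // optional property: omitting it is always allowed
  }
  if (provisionedConcurrency < 1 || provisionedConcurrency > 200) {
    errors.push(`provisionedConcurrency must be in 1-200, got ${provisionedConcurrency}`);
  }
  if (provisionedConcurrency > maxConcurrency) {
    errors.push(
      `provisionedConcurrency (${provisionedConcurrency}) must be <= maxConcurrency (${maxConcurrency})`,
    );
  }
  return errors;
}

console.log(checkProvisionedConcurrency(10, 5).length);  // 0
console.log(checkProvisionedConcurrency(10).length);     // 0
console.log(checkProvisionedConcurrency(10, 20).length); // 1 (exceeds maxConcurrency)
```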