

# Deployment guardrails for updating models in production
<a name="deployment-guardrails"></a>

Deployment guardrails are a set of model deployment options in Amazon SageMaker AI Inference to update your machine learning models in production. Using the fully managed deployment options, you can control the switch from the current model in production to a new one. Traffic shifting modes in blue/green deployments, such as canary and linear, give you granular control over the traffic shifting process from your current model to the new one during the course of the update. There are also built-in safeguards such as auto-rollbacks that help you catch issues early and automatically take corrective action before they significantly impact production.

Deployment guardrails provide the following benefits:
+ **Deployment safety while updating production environments.** A regressive update to a production environment can cause unplanned downtime and business impact, such as increased model latency and high error rates. Deployment guardrails help you mitigate those risks by providing best practices and built-in operational safety guardrails.
+ **Fully managed deployment.** SageMaker AI sets up and orchestrates these deployments and integrates them with endpoint update mechanisms. You do not need to build and maintain orchestration, monitoring, or rollback mechanisms, so you can focus on using ML in your applications.
+ **Visibility.** You can track the progress of your deployment through the [DescribeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeEndpoint.html) API or through Amazon CloudWatch Events (for [supported endpoints](deployment-guardrails-exclusions.md)). To learn more about events in SageMaker AI, see the Endpoint deployment state change section in [Events that Amazon SageMaker AI sends to Amazon EventBridge](automating-sagemaker-with-eventbridge.md). Note that if your endpoint uses any of the features in the [Exclusions](deployment-guardrails-exclusions.md) page, you cannot use CloudWatch Events.

**Note**  
Deployment guardrails only apply to [Asynchronous inference](async-inference.md) and [Real-time inference](realtime-endpoints.md) endpoint types.

## How to get started
<a name="deployment-guardrails-get-started"></a>

We support two types of deployments to update models in production: blue/green deployments and rolling deployments.
+ [Blue/Green Deployments](deployment-guardrails-blue-green.md): You can shift traffic from your old fleet (the blue fleet) to a new fleet (green fleet) with the updates. Blue/green deployments offer [multiple traffic shifting modes](https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails-blue-green.html). A traffic shifting mode is a configuration that specifies how SageMaker AI routes endpoint traffic to a new fleet containing your updates. The following traffic shifting modes provide you with different levels of control over the endpoint update process:
  + [Use all at once traffic shifting](deployment-guardrails-blue-green-all-at-once.md) shifts all of your endpoint traffic from the blue fleet to the green fleet. Once the traffic shifts to the green fleet, your pre-specified Amazon CloudWatch alarms begin monitoring the green fleet for a set amount of time (the *baking period*). If no alarms trip during the baking period, then SageMaker AI terminates the blue fleet.
  + [Use canary traffic shifting](deployment-guardrails-blue-green-canary.md) shifts one small portion of your traffic (a *canary*) to the green fleet and monitors it for a baking period. If the canary succeeds on the green fleet, then SageMaker AI shifts the rest of the traffic from the blue fleet to the green fleet before terminating the blue fleet.
  + [Use linear traffic shifting](deployment-guardrails-blue-green-linear.md) provides even more customization over the number of traffic-shifting steps and the percentage of traffic to shift for each step. While canary shifting lets you shift traffic in two steps, linear shifting extends this to *n* linearly spaced steps.
+ [Use rolling deployments](deployment-guardrails-rolling.md): You can update your endpoint as SageMaker AI incrementally provisions capacity and shifts traffic to a new fleet in steps of a batch size that you specify. Instances on the new fleet are updated with the new deployment configuration, and if no CloudWatch alarms trip during the baking period, then SageMaker AI cleans up instances on the old fleet. This option gives you granular control over the instance count or capacity percentage shifted during each step.

You can create and manage your deployment through the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) and [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) SageMaker API and AWS Command Line Interface commands. See the individual deployment pages for more details on how to set up your deployment. Note that if your endpoint uses any of the features listed in the [Exclusions](deployment-guardrails-exclusions.md) page, you cannot use deployment guardrails.

To follow guided examples that show how to use deployment guardrails, see our example [Jupyter notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-inference-deployment-guardrails) for the canary and linear traffic shifting modes.

# Auto-Rollback Configuration and Monitoring
<a name="deployment-guardrails-configuration"></a>

Amazon CloudWatch alarms are a prerequisite for using baking periods in deployment guardrails. You can only use the auto-rollback functionality in deployment guardrails if you set up CloudWatch alarms that can monitor an endpoint. If any of your alarms trip during the specified monitoring period, SageMaker AI initiates a complete rollback to the old endpoint to protect your application. If you do not have any CloudWatch alarms set up to monitor your endpoint, then the auto-rollback functionality does not work during your deployment.

To learn more about Amazon CloudWatch, see [What is Amazon CloudWatch?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) in the *Amazon CloudWatch User Guide*.

**Note**  
Ensure that your IAM execution role has permission to perform the `cloudwatch:DescribeAlarms` action on the auto-rollback alarms you specify.
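
As a rough illustration of the permission described in this note, the following sketch builds an IAM policy document that allows `cloudwatch:DescribeAlarms` on specific alarms. The helper name `describe_alarms_policy`, the Region, and the account ID are hypothetical placeholders, and scoping the statement to alarm ARNs is an assumption; adjust the `Resource` element to match your own security requirements.

```python
import json


def describe_alarms_policy(region, account_id, alarm_names):
    """Build a policy document that allows DescribeAlarms on the named alarms."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "cloudwatch:DescribeAlarms",
                # One alarm ARN per auto-rollback alarm (placeholder values).
                "Resource": [
                    f"arn:aws:cloudwatch:{region}:{account_id}:alarm:{name}"
                    for name in alarm_names
                ],
            }
        ],
    }


policy = describe_alarms_policy("us-east-1", "111122223333", ["<your-cw-alarm>"])
print(json.dumps(policy, indent=2))
```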

## Alarm Examples
<a name="deployment-guardrails-configuration-alarm-examples"></a>

To help you get started, we provide the following examples to demonstrate the capabilities of CloudWatch alarms. In addition to using or modifying the following examples, you can create your own alarms and configure the alarms to monitor various metrics on the specified fleets for a certain period of time. To see more SageMaker AI metrics and dimensions you can add to your alarms, see [Amazon SageMaker AI metrics in Amazon CloudWatch](monitoring-cloudwatch.md).

**Topics**
+ [Monitor invocation errors on both old and new fleets](#deployment-guardrails-configuration-alarm-examples-errors-both)
+ [Monitor model latency on the new fleet](#deployment-guardrails-configuration-alarm-examples-latency-new)

### Monitor invocation errors on both old and new fleets
<a name="deployment-guardrails-configuration-alarm-examples-errors-both"></a>

The following CloudWatch alarm monitors an endpoint's average error rate. You can use this alarm with any deployment guardrails traffic shifting type to provide overall monitoring on both the old and new fleets. If the alarm trips, then SageMaker AI initiates a rollback to the old fleet.

Invocation errors coming from both the old fleet and new fleet contribute to the average error rate. If the average error rate exceeds the specified threshold, then the alarm trips. This particular example monitors the 4xx errors (client errors) on both the old and new fleets for the duration of a deployment. You can also monitor the 5xx errors (server errors) by using the metric `Invocation5XXErrors`.

**Note**  
For this alarm type, if your old fleet trips the alarm during the deployment, SageMaker AI terminates your deployment. Therefore, if your current production fleet already causes errors, consider using or modifying one of the following examples that only monitors the new fleet for errors.

```
#Applied deployment type: all types
{
    "AlarmName": "EndToEndDeploymentHighErrorRateAlarm",
    "AlarmDescription": "Monitors the error rate of 4xx errors",
    "MetricName": "Invocation4XXErrors",
    "Namespace": "AWS/SageMaker",
    "Statistic": "Average",
    "Dimensions": [
        {
            "Name": "EndpointName",
            "Value": "<your-endpoint-name>"
        },
        {
            "Name": "VariantName",
            "Value": "AllTraffic"
        }
    ],
    "Period": 600,
    "EvaluationPeriods": 2,
    "Threshold": 1,
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData": "notBreaching"
}
```

In the previous example, note the values for the following fields:
+ For `AlarmName` and `AlarmDescription`, enter a name and description you choose for the alarm.
+ For `MetricName`, use the value `Invocation4XXErrors` to monitor for 4xx errors on the endpoint.
+ For `Namespace`, use the value `AWS/SageMaker`. You can also specify your own custom metric, if applicable.
+ For `Statistic`, use `Average`. This means that the alarm takes the average error rate over the evaluation periods when calculating whether the error rate has exceeded the threshold.
+ For the dimension `EndpointName`, use the name of the endpoint you are updating as the value.
+ For the dimension `VariantName`, use the value `AllTraffic` to specify all endpoint traffic.
+ For `Period`, use `600`. This sets the alarm’s evaluation periods to 10 minutes long.
+ For `EvaluationPeriods`, use `2`. This value tells the alarm to consider the two most recent evaluation periods when determining the alarm status.
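
If you prefer to create this alarm programmatically, the following minimal sketch passes the same fields to the CloudWatch `put_metric_alarm` operation in boto3. The helper names `error_rate_alarm_params` and `create_alarm` are hypothetical, and the sketch assumes that boto3 is installed and AWS credentials are configured before you call `create_alarm`.

```python
def error_rate_alarm_params(endpoint_name, metric_name="Invocation4XXErrors"):
    """Build put_metric_alarm arguments that mirror the alarm definition above."""
    return {
        "AlarmName": "EndToEndDeploymentHighErrorRateAlarm",
        "AlarmDescription": f"Monitors the average of {metric_name}",
        "MetricName": metric_name,
        "Namespace": "AWS/SageMaker",
        "Statistic": "Average",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        "Period": 600,           # seconds per evaluation period
        "EvaluationPeriods": 2,  # number of most recent periods to consider
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above runs without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


params = error_rate_alarm_params("<your-endpoint-name>")
window_minutes = params["Period"] * params["EvaluationPeriods"] // 60
print(f"{params['AlarmName']} looks back over {window_minutes} minutes")
```

To monitor server errors instead, you could call `error_rate_alarm_params("<your-endpoint-name>", metric_name="Invocation5XXErrors")`.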

### Monitor model latency on the new fleet
<a name="deployment-guardrails-configuration-alarm-examples-latency-new"></a>

The following CloudWatch alarm example monitors the new fleet’s model latency during your deployment. You can use this alarm to monitor only the new fleet and exclude the old fleet. The alarm lasts for the entire deployment. This example gives you comprehensive, end-to-end monitoring of the new fleet and initiates a rollback to the old fleet if the new fleet has any response time issues.

CloudWatch publishes the metrics with the dimension `EndpointConfigName:{New-Ep-Config}` after the new fleet starts receiving traffic, and these metrics last even after the deployment is complete.

You can use the following alarm example with any deployment type.

```
#Applied deployment type: all types
{
    "AlarmName": "NewEndpointConfigVersionHighModelLatencyAlarm",
    "AlarmDescription": "Monitors the model latency on new fleet",
    "MetricName": "ModelLatency",
    "Namespace": "AWS/SageMaker",
    "Statistic": "Average",
    "Dimensions": [
        {
            "Name": "EndpointName",
            "Value": "<your-endpoint-name>"
        },
        {
            "Name": "VariantName",
            "Value": "AllTraffic"
        },
        {
            "Name": "EndpointConfigName",
            "Value": "<your-config-name>"
        }
    ],
    "Period": 300,
    "EvaluationPeriods": 2,
    "Threshold": 100000, # 100ms
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData": "notBreaching"
}
```

In the previous example, note the values for the following fields:
+ For `MetricName`, use the value `ModelLatency` to monitor the model’s response time.
+ For `Namespace`, use the value `AWS/SageMaker`. You can also specify your own custom metric, if applicable.
+ For the dimension `EndpointName`, use the name of the endpoint you are updating as the value.
+ For the dimension `VariantName`, use the value `AllTraffic` to specify all endpoint traffic.
+ For the dimension `EndpointConfigName`, the value should refer to the endpoint configuration name for your new or updated endpoint.

**Note**  
If you want to monitor your old fleet instead of the new fleet, you can change the dimension `EndpointConfigName` to specify the name of your old fleet’s configuration.
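
The following sketch shows one way to parameterize the latency alarm so that the same definition can target either fleet. It converts a threshold in milliseconds to the microsecond value used in the example (100 ms becomes `100000`). The helper name `latency_alarm_params`, the old fleet alarm name, and the configuration name placeholders are hypothetical.

```python
def latency_alarm_params(endpoint_name, endpoint_config_name, threshold_ms,
                         alarm_name="NewEndpointConfigVersionHighModelLatencyAlarm"):
    """Build alarm fields for ModelLatency scoped to one endpoint configuration."""
    return {
        "AlarmName": alarm_name,
        "MetricName": "ModelLatency",
        "Namespace": "AWS/SageMaker",
        "Statistic": "Average",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": "AllTraffic"},
            # The fleet is selected by the endpoint configuration name.
            {"Name": "EndpointConfigName", "Value": endpoint_config_name},
        ],
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold_ms * 1000,  # milliseconds to microseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


new_fleet_alarm = latency_alarm_params(
    "<your-endpoint-name>", "<your-new-config-name>", threshold_ms=100
)
old_fleet_alarm = latency_alarm_params(
    "<your-endpoint-name>", "<your-old-config-name>", threshold_ms=100,
    alarm_name="OldEndpointConfigVersionHighModelLatencyAlarm",
)
print(new_fleet_alarm["Threshold"])
```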

# Blue/Green Deployments
<a name="deployment-guardrails-blue-green"></a>

When you update your endpoint, Amazon SageMaker AI automatically uses a blue/green deployment to maximize the availability of your endpoints. In a blue/green deployment, SageMaker AI provisions a new fleet with the updates (the green fleet). Then, SageMaker AI shifts traffic from the old fleet (the blue fleet) to the green fleet. Once the green fleet operates smoothly for a set evaluation period (called the baking period), SageMaker AI terminates the blue fleet. With the additional capabilities in blue/green deployments, you can utilize traffic shifting modes and auto-rollback monitoring to protect your endpoint from significant production impact.

The following list describes the key features of blue/green deployments in SageMaker AI:
+ **Traffic shifting modes.** The traffic shifting modes for deployment guardrails let you control the volume of traffic and number of traffic-shifting steps between the blue fleet and the green fleet. This capability gives you the ability to progressively evaluate the performance of the green fleet without fully committing to a 100% traffic shift.
+ **Baking period.** The baking period is a set amount of time to monitor the green fleet before proceeding to the next deployment stage. If any of the pre-specified alarms trip during any baking period, then all endpoint traffic rolls back to the blue fleet. The baking period helps you to build confidence in your update before making the traffic shift permanent.
+ **Auto-rollbacks.** You can specify Amazon CloudWatch alarms that SageMaker AI uses to monitor the green fleet. If an issue with the updated code trips any of the alarms, SageMaker AI initiates an auto-rollback to the blue fleet to maintain availability and minimize risk.

## Traffic Shifting Modes
<a name="deployment-guardrails-blue-green-traffic-modes"></a>

The various traffic shifting modes in blue/green deployments give you more granular control over traffic shifting between the blue fleet and the green fleet. The available traffic shifting modes for blue/green deployments are all at once, canary, and linear. The following table shows a comparison of the options.

**Important**  
For blue/green deployments that involve multiple-stage traffic shifting or baking periods, you are billed for both fleets for the duration of the update, irrespective of the traffic to each fleet. This is in contrast to blue/green deployments with all at once traffic shifting and no baking periods, where you are only billed for one fleet during the course of the update.


| Name | What is it? | Pros | Cons | Recommendation | 
| --- | --- | --- | --- | --- | 
| All at once | Shifts all of the traffic to the new fleet in a single step. | Minimizes the overall update duration. | Regressive updates affect 100% of the traffic. | Use this option to minimize update time and cost. | 
| Canary | Traffic shifts in two steps. The first (canary) step shifts a small portion of the traffic followed by the second step, which shifts the remainder of the traffic. | Confines the blast radius of regressive updates to only the canary fleet. | Both fleets are operational in parallel for the entire deployment. | Use this option to balance between minimizing the blast radius of regressive updates and minimizing the time that two fleets are operational. | 
| Linear | A fixed portion of the traffic shifts in a pre-specified number of equally spaced steps. | Minimizes the risk of regressive updates by shifting traffic over several steps. | The update duration and cost are proportional to the number of steps. | Use this option to minimize risk by spreading out deployment across multiple steps. | 
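
To make the comparison concrete, the following sketch models the share of traffic that the green fleet serves after each shifting step in each mode. It is a simplified illustration rather than the service's actual routing logic: the function name `green_traffic_schedule` is hypothetical, and the model treats the configured step size as a percentage of traffic.

```python
def green_traffic_schedule(mode, step_percent=None):
    """Return the cumulative percent of traffic on the green fleet after each step."""
    if mode == "ALL_AT_ONCE":
        return [100]
    if step_percent is None or not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in the range (0, 100]")
    if mode == "CANARY":
        if step_percent > 50:
            raise ValueError("the canary should be at most 50%")
        # Two steps: the canary first, then the remainder.
        return [step_percent, 100]
    if mode == "LINEAR":
        schedule, shifted = [], 0
        # Equal steps until all traffic is on the green fleet.
        while shifted < 100:
            shifted = min(100, shifted + step_percent)
            schedule.append(shifted)
        return schedule
    raise ValueError(f"unknown mode: {mode}")


print(green_traffic_schedule("ALL_AT_ONCE"))              # [100]
print(green_traffic_schedule("CANARY", step_percent=30))  # [30, 100]
print(green_traffic_schedule("LINEAR", step_percent=25))  # [25, 50, 75, 100]
```

Each entry in the returned list corresponds to one traffic shift, so the length of the list is a rough proxy for how the update duration grows with the number of steps.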

## Get Started
<a name="deployment-guardrails-blue-green-get-started"></a>

Once you specify your desired deployment configuration, SageMaker AI handles provisioning new instances, terminating old instances, and shifting traffic for you. You can create and manage your deployment through the existing [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) and [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) SageMaker API and AWS Command Line Interface commands. Note that if your endpoint uses any of the features listed in the [Exclusions](deployment-guardrails-exclusions.md) page, you cannot use deployment guardrails. See the individual deployment pages for more details on how to set up your deployment:
+ [Blue/Green Update with All At Once Traffic Shifting](deployment-guardrails-blue-green-all-at-once.md)
+ [Blue/Green Update with Canary Traffic Shifting](deployment-guardrails-blue-green-canary.md)
+ [Blue/Green Update with Linear Traffic Shifting](deployment-guardrails-blue-green-linear.md)

To follow guided examples that show how to use deployment guardrails, see our example [Jupyter notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-inference-deployment-guardrails) for the canary and linear traffic shifting modes.

# Use all at once traffic shifting
<a name="deployment-guardrails-blue-green-all-at-once"></a>

With all at once traffic shifting, you can quickly roll out an endpoint update using the safety guardrails of a blue/green deployment. You can use this traffic shifting option to minimize the update duration while still taking advantage of the availability guarantees of blue/green deployments. The baking period feature helps you to monitor the performance and functionality of your new instances before terminating your old instances, ensuring that your new fleet is fully operational.

The following diagram shows how all at once traffic shifting manages the old and new fleets.

![\[A successful 100% traffic shift from the old fleet to the new fleet.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/deployment-guardrails-blue-green-all-at-once.png)


When you use all at once traffic shifting, SageMaker AI routes 100% of the traffic to the new fleet (green fleet). Once the green fleet starts receiving traffic, the baking period begins. The baking period is a set amount of time in which pre-specified Amazon CloudWatch alarms monitor the performance of the green fleet. If no alarms trip during the baking period, SageMaker AI terminates the old fleet (blue fleet). If any alarms trip during the baking period, then an auto-rollback initiates and 100% of the traffic shifts back to the blue fleet.

## Prerequisites
<a name="deployment-guardrails-blue-green-all-at-once-prereqs"></a>

Before setting up a deployment with all at once traffic shifting, you must create Amazon CloudWatch alarms to watch metrics from your endpoint. If any of the alarms trip during the baking period, then the traffic rolls back to your blue fleet. To learn how to set up CloudWatch alarms on an endpoint, see the prerequisite page [Auto-Rollback Configuration and Monitoring](deployment-guardrails-configuration.md). To learn more about CloudWatch alarms, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.

## Configure All At Once Traffic Shifting
<a name="deployment-guardrails-blue-green-all-at-once-configure"></a>

Once you are ready for your deployment and have set up CloudWatch alarms for your endpoint, you can use either the SageMaker AI [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API or the [update-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-endpoint.html) command in the AWS Command Line Interface to initiate the deployment.

**Topics**
+ [How to update an endpoint (API)](#deployment-guardrails-blue-green-all-at-once-configure-api-update)
+ [How to update an endpoint with an existing blue/green update policy (API)](#deployment-guardrails-blue-green-all-at-once-configure-api-existing)
+ [How to update an endpoint (CLI)](#deployment-guardrails-blue-green-all-at-once-configure-cli-update)

### How to update an endpoint (API)
<a name="deployment-guardrails-blue-green-all-at-once-configure-api-update"></a>

The following example shows how you can update your endpoint with all at once traffic shifting using [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) in the Amazon SageMaker API.

```
import boto3
client = boto3.client("sagemaker")

response = client.update_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-config-name>",
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "ALL_AT_ONCE"
            },
            "TerminationWaitInSeconds": 600,
            "MaximumExecutionTimeoutInSeconds": 1800
        },
        "AutoRollbackConfiguration": {
            "Alarms": [
                {
                    "AlarmName": "<your-cw-alarm>"
                },
            ]
        }
    }
)
```

To configure the all at once traffic shifting option, do the following:
+ For `EndpointName`, use the name of the existing endpoint you want to update.
+ For `EndpointConfigName`, use the name of the endpoint configuration you want to use.
+ Under `DeploymentConfig` and `BlueGreenUpdatePolicy`, in `TrafficRoutingConfiguration`, set the `Type` parameter to `ALL_AT_ONCE`. This specifies that the deployment uses the all at once traffic shifting mode.
+ For `TerminationWaitInSeconds`, use `600`. This parameter tells SageMaker AI to wait for the specified amount of time (in seconds) after your green fleet is fully active before terminating the instances in the blue fleet. In this example, SageMaker AI waits for 10 minutes after the final baking period before terminating the blue fleet.
+ For `MaximumExecutionTimeoutInSeconds`, use `1800`. This parameter sets the maximum amount of time that the deployment can run before it times out. In the preceding example, your deployment has a limit of 30 minutes to finish.
+ In `AutoRollbackConfiguration`, within the `Alarms` field, you can add your CloudWatch alarms by name. Create one `AlarmName: <your-cw-alarm>` entry for each alarm you want to use.
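
After you start the update, you can track its progress with the DescribeEndpoint API. The following sketch polls `describe_endpoint` until the endpoint leaves its transitional statuses. The helper name `wait_for_endpoint`, the polling interval, and the `sleep` parameter (included so the loop can be exercised without waiting) are illustrative choices, not part of the SageMaker API.

```python
import time

# Statuses that indicate the endpoint is still changing.
TRANSITIONAL_STATUSES = {"Creating", "Updating", "SystemUpdating", "RollingBack"}


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, max_polls=120,
                      sleep=time.sleep):
    """Poll DescribeEndpoint and return the first response with a settled status."""
    for _ in range(max_polls):
        response = client.describe_endpoint(EndpointName=endpoint_name)
        if response["EndpointStatus"] not in TRANSITIONAL_STATUSES:
            return response
        sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} did not settle after {max_polls} polls")


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   result = wait_for_endpoint(boto3.client("sagemaker"), "<your-endpoint-name>")
#   print(result["EndpointStatus"], result["EndpointConfigName"])
```

Because an auto-rollback also returns the endpoint to service, consider comparing `EndpointConfigName` in the response with the configuration you deployed to confirm which fleet is live.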

### How to update an endpoint with an existing blue/green update policy (API)
<a name="deployment-guardrails-blue-green-all-at-once-configure-api-existing"></a>

When you use the [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) API to create an endpoint, you can optionally specify a deployment configuration to reuse for future endpoint updates. You can use the same `DeploymentConfig` options as the previous UpdateEndpoint API example. There are no changes to the CreateEndpoint API behavior. Specifying the deployment configuration does not automatically perform a blue/green update on your endpoint.

You can reuse that deployment configuration when you later call the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API to update your endpoint. When updating your endpoint, use the `RetainDeploymentConfig` option to keep the deployment configuration you specified when you created the endpoint.

When calling the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API, set `RetainDeploymentConfig` to `True` to keep the `DeploymentConfig` options from your original endpoint configuration.

```
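import boto3
client = boto3.client("sagemaker")

# Reuse the DeploymentConfig that was specified when the endpoint was created.
response = client.update_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-config-name>",
    RetainDeploymentConfig=True
)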
```
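
For context, the following sketch shows what the earlier `create_endpoint` call might look like when it stores an all at once deployment configuration for later reuse. The helper name `create_endpoint_kwargs` is hypothetical, and the timing values are the same illustrative values used on this page.

```python
def create_endpoint_kwargs(endpoint_name, endpoint_config_name, alarm_names):
    """Build create_endpoint arguments that store a reusable DeploymentConfig."""
    return {
        "EndpointName": endpoint_name,
        "EndpointConfigName": endpoint_config_name,
        "DeploymentConfig": {
            "BlueGreenUpdatePolicy": {
                "TrafficRoutingConfiguration": {"Type": "ALL_AT_ONCE"},
                "TerminationWaitInSeconds": 600,
                "MaximumExecutionTimeoutInSeconds": 1800,
            },
            "AutoRollbackConfiguration": {
                "Alarms": [{"AlarmName": name} for name in alarm_names]
            },
        },
    }


kwargs = create_endpoint_kwargs(
    "<your-endpoint-name>", "<your-config-name>", ["<your-cw-alarm>"]
)
# With boto3 and AWS credentials available:
#   boto3.client("sagemaker").create_endpoint(**kwargs)
print(kwargs["DeploymentConfig"]["BlueGreenUpdatePolicy"]["TrafficRoutingConfiguration"])
```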

### How to update an endpoint (CLI)
<a name="deployment-guardrails-blue-green-all-at-once-configure-cli-update"></a>

If you are using the AWS CLI, the following example shows how to start a blue/green all at once deployment using the [update-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-endpoint.html) command.

```
aws sagemaker update-endpoint \
    --endpoint-name <your-endpoint-name> \
    --endpoint-config-name <your-config-name> \
    --deployment-config '{"BlueGreenUpdatePolicy": {"TrafficRoutingConfiguration": {"Type": "ALL_AT_ONCE"},
    "TerminationWaitInSeconds": 600, "MaximumExecutionTimeoutInSeconds": 1800},
    "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": "<your-alarm>"}]}}'
```

To configure the all at once traffic shifting option, do the following:
+ For `endpoint-name`, use the name of the endpoint you want to update.
+ For `endpoint-config-name`, use the name of the endpoint configuration you want to use.
+ For `deployment-config`, use a JSON object that contains a [BlueGreenUpdatePolicy](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_BlueGreenUpdatePolicy.html) and, optionally, an `AutoRollbackConfiguration`, as in the preceding example.

**Note**  
If you would rather save your JSON object in a file, see [Generating AWS CLI skeleton and input parameters](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-skeleton.html) in the *AWS CLI User Guide*.
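
As one possible approach, the following sketch writes the deployment configuration to a JSON file that you can pass to the AWS CLI with the `file://` prefix. The helper name `write_deployment_config` and the file name `deployment-config.json` are arbitrary choices for illustration.

```python
import json
from pathlib import Path

deployment_config = {
    "BlueGreenUpdatePolicy": {
        "TrafficRoutingConfiguration": {"Type": "ALL_AT_ONCE"},
        "TerminationWaitInSeconds": 600,
        "MaximumExecutionTimeoutInSeconds": 1800,
    },
    "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": "<your-alarm>"}]},
}


def write_deployment_config(config, path):
    """Serialize the deployment configuration to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2))
    return path


config_path = write_deployment_config(deployment_config, "deployment-config.json")
# Then reference the file from the CLI, for example:
#   aws sagemaker update-endpoint ... --deployment-config file://deployment-config.json
print(config_path)
```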

# Use canary traffic shifting
<a name="deployment-guardrails-blue-green-canary"></a>

With canary traffic shifting, you can test a portion of your endpoint traffic on the new fleet while the old fleet serves the remainder of the traffic. This testing step is a safety guardrail that validates the new fleet’s functionality before shifting all of your traffic to the new fleet. You still have the benefits of a blue/green deployment, and the added canary feature lets you ensure that your new (green) fleet can serve inference before letting it handle 100% of the traffic.

The portion of your green fleet that turns on to receive traffic is called the canary, and you can choose the size of this canary. Note that the canary size should be less than or equal to 50% of the new fleet's capacity. Once the baking period finishes and no pre-specified Amazon CloudWatch alarms trip, the rest of the traffic shifts from the old (blue) fleet to the green fleet. Canary traffic shifting provides you with more safety during your deployment since any issues with the updated model only impact the canary.

The following diagram shows how canary traffic shifting manages the distribution of traffic between the blue and green fleets.

![\[A successful two step canary traffic shift from the old fleet to the new fleet.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/deployment-guardrails-blue-green-canary.png)


Once SageMaker AI provisions the green fleet, SageMaker AI routes a portion of the incoming traffic (for example, 25%) to the canary. Then the baking period begins, during which your CloudWatch alarms monitor the performance of the green fleet. During this time, both the blue fleet and green fleet are partially active and receiving traffic. If any of the alarms trip during the baking period, then SageMaker AI initiates a rollback and all traffic returns to the blue fleet. If none of the alarms trip, then all of the traffic shifts to the green fleet and there is a final baking period. If the final baking period finishes without tripping any alarms, then the green fleet serves all traffic and SageMaker AI terminates the blue fleet.
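
The sequence described above can be sketched as a small simulation. This is an illustrative model of the stages, not SageMaker AI code: the function name `simulate_canary` and the stage labels are hypothetical, and the two Boolean arguments stand in for whether any CloudWatch alarm trips during each baking period.

```python
def simulate_canary(canary_alarm_tripped, final_alarm_tripped, canary_percent=25):
    """Return the stages of a canary deployment as (stage, blue %, green %) tuples."""
    stages = [("canary_baking", 100 - canary_percent, canary_percent)]
    if canary_alarm_tripped:
        # An alarm during the canary baking period returns all traffic to blue.
        stages.append(("rolled_back", 100, 0))
        return stages
    stages.append(("final_baking", 0, 100))
    if final_alarm_tripped:
        # An alarm during the final baking period also triggers a rollback.
        stages.append(("rolled_back", 100, 0))
        return stages
    stages.append(("blue_terminated", 0, 100))
    return stages


for stage in simulate_canary(canary_alarm_tripped=False, final_alarm_tripped=False):
    print(stage)
```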

## Prerequisites
<a name="deployment-guardrails-blue-green-canary-prereqs"></a>

Before setting up a deployment with canary traffic shifting, you must create Amazon CloudWatch alarms to monitor metrics from your endpoint. The alarms are active during the baking period, and if any alarms trip, then all endpoint traffic rolls back to the blue fleet. To learn how to set up CloudWatch alarms on an endpoint, see the prerequisite page [Auto-Rollback Configuration and Monitoring](deployment-guardrails-configuration.md). To learn more about CloudWatch alarms, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.

## Configure Canary Traffic Shifting
<a name="deployment-guardrails-blue-green-canary-configure"></a>

Once you are ready for your deployment and have set up Amazon CloudWatch alarms for your endpoint, you can use either the Amazon SageMaker AI [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API or the [update-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-endpoint.html) command in the AWS CLI to initiate the deployment.

**Topics**
+ [How to update an endpoint (API)](#deployment-guardrails-blue-green-canary-configure-api-update)
+ [How to update an endpoint with an existing blue/green update policy (API)](#deployment-guardrails-blue-green-canary-configure-api-existing)
+ [How to update an endpoint (CLI)](#deployment-guardrails-blue-green-canary-configure-cli-update)

### How to update an endpoint (API)
<a name="deployment-guardrails-blue-green-canary-configure-api-update"></a>

The following example of the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API shows how you can update an endpoint with canary traffic shifting.

```
import boto3
client = boto3.client("sagemaker")

response = client.update_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-config-name>",
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {
                    "Type": "CAPACITY_PERCENT",
                    "Value": 30
                },
                "WaitIntervalInSeconds": 600
            },
            "TerminationWaitInSeconds": 600,
            "MaximumExecutionTimeoutInSeconds": 1800
        },
        "AutoRollbackConfiguration": {
            "Alarms": [
                {
                    "AlarmName": "<your-cw-alarm>"
                }
            ]
        }
    }
)
```

To configure the canary traffic shifting option, do the following:
+ For `EndpointName`, use the name of the existing endpoint you want to update.
+ For `EndpointConfigName`, use the name of the endpoint configuration you want to use.
+ Under `DeploymentConfig` and `BlueGreenUpdatePolicy`, in `TrafficRoutingConfiguration`, set the `Type` parameter to `CANARY`. This specifies that the deployment uses canary traffic shifting.
+ In the `CanarySize` field, you can change the size of the canary by modifying the `Type` and `Value` parameters. For `Type`, use `CAPACITY_PERCENT`, meaning the percentage of your green fleet you want to use as the canary, and then set `Value` to `30`. In this example, you use 30% of the green fleet’s capacity as the canary. Note that the canary size should be equal to or less than 50% of the green fleet's capacity.
+ For `WaitIntervalInSeconds`, use `600`. The parameter tells SageMaker AI to wait for the specified amount of time (in seconds) between each interval shift. This interval is the duration of the canary baking period. In the preceding example, SageMaker AI waits for 10 minutes after the canary shift and then completes the second and final traffic shift.
+ For `TerminationWaitInSeconds`, use `600`. This parameter tells SageMaker AI to wait for the specified amount of time (in seconds) after your green fleet is fully active before terminating the instances in the blue fleet. In this example, SageMaker AI waits for 10 minutes after the final baking period before terminating the blue fleet.
+ For `MaximumExecutionTimeoutInSeconds`, use `1800`. This parameter sets the maximum amount of time that the deployment can run before it times out. In the preceding example, your deployment has a limit of 30 minutes to finish.
+ In `AutoRollbackConfiguration`, within the `Alarms` field, you can add your CloudWatch alarms by name. Create one `AlarmName: <your-cw-alarm>` entry for each alarm you want to use.
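The settings in the list above can also be assembled programmatically. The following is a minimal sketch, not an official helper: the function name `build_canary_deployment_config` and its parameter names are hypothetical, and the defaults mirror the values used in the preceding example. It enforces the 50% canary limit mentioned above and creates one `AlarmName` entry per alarm.

```python
def build_canary_deployment_config(alarm_names, canary_percent=30,
                                   wait_interval_s=600,
                                   termination_wait_s=600,
                                   max_execution_timeout_s=1800):
    """Return a DeploymentConfig dict for canary traffic shifting (sketch)."""
    # The guide says the canary should be at most 50% of the green fleet.
    if not 0 < canary_percent <= 50:
        raise ValueError("canary_percent must be greater than 0 and at most 50")
    # Assumption of this sketch: require at least one alarm, because the
    # guide lists CloudWatch alarms as a prerequisite.
    if not alarm_names:
        raise ValueError("provide at least one CloudWatch alarm name")
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                "WaitIntervalInSeconds": wait_interval_s,
            },
            "TerminationWaitInSeconds": termination_wait_s,
            "MaximumExecutionTimeoutInSeconds": max_execution_timeout_s,
        },
        "AutoRollbackConfiguration": {
            # One {"AlarmName": ...} entry for each alarm.
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }


deployment_config = build_canary_deployment_config(["<your-cw-alarm>"])
```

You could then pass the returned dictionary as the `DeploymentConfig` argument of `update_endpoint`.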

### How to update an endpoint with an existing blue/green update policy (API)
<a name="deployment-guardrails-blue-green-canary-configure-api-existing"></a>

When you use the [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) API to create an endpoint, you can optionally specify a deployment configuration to reuse for future endpoint updates. You can use the same `DeploymentConfig` options as the previous UpdateEndpoint API example. There are no changes to the CreateEndpoint API behavior. Specifying the deployment configuration does not automatically perform a blue/green update on your endpoint.

You can reuse a previous deployment configuration when you call the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API to update your endpoint. The `RetainDeploymentConfig` option keeps the deployment configuration you specified when you created the endpoint.

When calling the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API, set `RetainDeploymentConfig` to `True` to reuse the `DeploymentConfig` options that you specified previously.

```
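import boto3

client = boto3.client("sagemaker")

# Reuse the deployment configuration that was specified previously.
response = client.update_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-config-name>",
    RetainDeploymentConfig=True
)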
```
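To show both halves of this workflow together, the following sketch stores a deployment configuration at creation time and reuses it on a later update. The two function names are hypothetical. The `client` argument is assumed to be a Boto3 SageMaker AI client, for example `boto3.client("sagemaker")`, and the update call is assumed to happen only after the endpoint is in service.

```python
def create_endpoint_with_policy(client, endpoint_name, endpoint_config_name,
                                deployment_config):
    """Create an endpoint and store a deployment configuration for later reuse."""
    # Specifying DeploymentConfig here does not trigger a blue/green update.
    return client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
        DeploymentConfig=deployment_config,
    )


def update_endpoint_with_retained_policy(client, endpoint_name,
                                         new_endpoint_config_name):
    """Update the endpoint, reusing the stored deployment configuration."""
    return client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_endpoint_config_name,
        RetainDeploymentConfig=True,
    )
```

Keeping the two calls in separate functions reflects that they run at different times: once when the endpoint is first created, and again for each later update.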

### How to update an endpoint (CLI)
<a name="deployment-guardrails-blue-green-canary-configure-cli-update"></a>

If you are using the AWS CLI, the following example shows how to start a blue/green canary deployment using the [update-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-endpoint.html) command.

```
aws sagemaker update-endpoint \
    --endpoint-name <your-endpoint-name> \
    --endpoint-config-name <your-config-name> \
    --deployment-config '{"BlueGreenUpdatePolicy": {"TrafficRoutingConfiguration": {"Type": "CANARY",
    "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 30}, "WaitIntervalInSeconds": 600},
    "TerminationWaitInSeconds": 600, "MaximumExecutionTimeoutInSeconds": 1800},
    "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": "<your-alarm>"}]}}'
```

To configure the canary traffic shifting option, do the following:
+ For `endpoint-name`, use the name of the endpoint you want to update.
+ For `endpoint-config-name`, use the name of the endpoint configuration you want to use.
+ For `deployment-config`, use a JSON object that contains a [BlueGreenUpdatePolicy](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_BlueGreenUpdatePolicy.html) object and, optionally, an `AutoRollbackConfiguration` object.

**Note**  
If you would rather save your JSON object in a file, see [Generating AWS CLI skeleton and input parameters](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-skeleton.html) in the *AWS CLI User Guide*.
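As one way to produce such a file, the following sketch writes the canary deployment configuration from this page to disk with Python's standard `json` module. The function name and the file name `deployment-config.json` are hypothetical choices for illustration.

```python
import json

# Canary deployment configuration, using the same values as the CLI example.
deployment_config = {
    "BlueGreenUpdatePolicy": {
        "TrafficRoutingConfiguration": {
            "Type": "CANARY",
            "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 30},
            "WaitIntervalInSeconds": 600,
        },
        "TerminationWaitInSeconds": 600,
        "MaximumExecutionTimeoutInSeconds": 1800,
    },
    "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": "<your-alarm>"}]},
}


def write_deployment_config(path, config):
    """Write the deployment configuration to a JSON file and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return path
```

After calling `write_deployment_config("deployment-config.json", deployment_config)`, you can pass the file to the AWS CLI with the `file://` prefix, for example `--deployment-config file://deployment-config.json`.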

# Use linear traffic shifting
<a name="deployment-guardrails-blue-green-linear"></a>

Linear traffic shifting enables you to gradually shift traffic from your old fleet (blue fleet) to your new fleet (green fleet). With linear traffic shifting, you can shift traffic in multiple steps, minimizing the chance of a disruption to your endpoint. This blue/green deployment option gives you the most granular control over traffic shifting.

You can choose either the number of instances or the percentage of the green fleet’s capacity to activate during each step. Each linear step should be between 10% and 50% of the green fleet's capacity. For each step, there is a baking period during which your pre-specified Amazon CloudWatch alarms monitor metrics on the green fleet. Once the baking period finishes and no alarms trip, the active portion of your green fleet continues receiving traffic and a new step begins. If alarms trip during any of the baking periods, 100% of the endpoint traffic rolls back to the blue fleet.

The following diagram shows how linear traffic shifting routes traffic to the blue and green fleets.

![\[A successful three step linear traffic shift from the old fleet to the new fleet.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/deployment-guardrails-blue-green-linear.png)


Once SageMaker AI provisions the new fleet, the first portion of the green fleet turns on and receives traffic. SageMaker AI deactivates the same size portion of the blue fleet, and the baking period begins. If any alarms trip, all of the endpoint traffic rolls back to the blue fleet. If the baking period finishes, then the next step begins. Another portion of the green fleet activates and receives traffic, part of the blue fleet deactivates, and another baking period begins. The same process repeats until the blue fleet is fully deactivated and the green fleet is fully active and receiving all traffic. If an alarm goes off at any point, SageMaker AI terminates the shifting process and 100% of the traffic rolls back to the blue fleet.
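The step-by-step progression described above can be illustrated with a small calculation. This is only a sketch of the arithmetic, not of how SageMaker AI implements traffic shifting. The function name is hypothetical, and capping the final step at 100% when the step size does not divide 100 evenly is an assumption.

```python
def linear_shift_schedule(step_percent):
    """Return (green_percent, blue_percent) after each linear step."""
    # The guide recommends steps of 10% to 50% of the green fleet's capacity.
    if not 10 <= step_percent <= 50:
        raise ValueError("step_percent should be between 10 and 50")
    schedule = []
    green = 0
    while green < 100:
        # Assumption: the last step is capped so that green never exceeds 100%.
        green = min(green + step_percent, 100)
        schedule.append((green, 100 - green))
    return schedule


for step, (green, blue) in enumerate(linear_shift_schedule(20), start=1):
    print(f"step {step}: green {green}%, blue {blue}%")
```

With a 20% step size, the schedule has five steps, and each step is followed by a baking period before the next one begins.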

## Prerequisites
<a name="deployment-guardrails-blue-green-linear-prereqs"></a>

Before setting up a deployment with linear traffic shifting, you must create CloudWatch alarms to monitor metrics from your endpoint. The alarms are active during the baking period, and if any alarms trip, then all endpoint traffic rolls back to the blue fleet. To learn how to set up CloudWatch alarms on an endpoint, see the prerequisite page [Auto-Rollback Configuration and Monitoring](deployment-guardrails-configuration.md). To learn more about CloudWatch alarms, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.

## Configure Linear Traffic Shifting
<a name="deployment-guardrails-blue-green-linear-configure"></a>

Once you are ready for your deployment and have set up CloudWatch alarms for your endpoint, you can use either the Amazon SageMaker AI [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API or the [update-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-endpoint.html) command in the AWS CLI to initiate the deployment.

**Topics**
+ [How to update an endpoint (API)](#deployment-guardrails-blue-green-linear-configure-api-update)
+ [How to update an endpoint with an existing blue/green update policy (API)](#deployment-guardrails-blue-green-linear-configure-api-existing)
+ [How to update an endpoint (CLI)](#deployment-guardrails-blue-green-canary-configure-cli-update)

### How to update an endpoint (API)
<a name="deployment-guardrails-blue-green-linear-configure-api-update"></a>

The following example of the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API shows how you can update an endpoint with linear traffic shifting.

```
import boto3
client = boto3.client("sagemaker")

response = client.update_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-config-name>",
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "LINEAR",
                "LinearStepSize": {
                    "Type": "CAPACITY_PERCENT",
                    "Value": 20
                },
                "WaitIntervalInSeconds": 300
            },
            "TerminationWaitInSeconds": 300,
            "MaximumExecutionTimeoutInSeconds": 3600
        },
        "AutoRollbackConfiguration": {
            "Alarms": [
                {
                    "AlarmName": "<your-cw-alarm>"
                }
            ]
        }
    }
)
```

To configure the linear traffic shifting option, do the following:
+ For `EndpointName`, use the name of the existing endpoint you want to update.
+ For `EndpointConfigName`, use the name of the endpoint configuration you want to use.
+ Under `DeploymentConfig` and `BlueGreenUpdatePolicy`, in `TrafficRoutingConfiguration`, set the `Type` parameter to `LINEAR`. This specifies that the deployment uses linear traffic shifting.
+ In the `LinearStepSize` field, you can change the size of the steps by modifying the `Type` and `Value` parameters. For `Type`, use `CAPACITY_PERCENT`, meaning the percentage of your green fleet you want to use as the step size, and then set `Value` to `20`. In this example, you turn on 20% of the green fleet’s capacity for each traffic shifting step. Note that when customizing your linear step size, you should only use steps that are between 10% and 50% of the green fleet's capacity.
+ For `WaitIntervalInSeconds`, use `300`. This parameter tells SageMaker AI how long to wait (in seconds) between traffic shifts. This interval is the duration of the baking period between each linear step. In the preceding example, SageMaker AI waits for 5 minutes between each traffic shift.
+ For `TerminationWaitInSeconds`, use `300`. This parameter tells SageMaker AI to wait for the specified amount of time (in seconds) after your green fleet is fully active before terminating the instances in the blue fleet. In this example, SageMaker AI waits for 5 minutes after the final baking period before terminating the blue fleet.
+ For `MaximumExecutionTimeoutInSeconds`, use `3600`. This parameter sets the maximum amount of time that the deployment can run before it times out. In the preceding example, your deployment has a limit of 1 hour to finish.
+ In `AutoRollbackConfiguration`, within the `Alarms` field, you can add your CloudWatch alarms by name. Create one `AlarmName: <your-cw-alarm>` entry for each alarm you want to use.
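When choosing these values, it can help to confirm that the waiting time alone fits within `MaximumExecutionTimeoutInSeconds`. The following sketch estimates a lower bound on that waiting time. It assumes one baking period after every step, including the last one, and it ignores the time needed to provision the green fleet. The function name is hypothetical.

```python
import math


def minimum_linear_wait_seconds(step_percent, wait_interval_s, termination_wait_s):
    """Estimate the total waiting time of a linear deployment (lower bound)."""
    # Number of traffic shifts needed to reach 100% of the green fleet.
    steps = math.ceil(100 / step_percent)
    # Assumption: one baking period per step, then the termination wait.
    return steps * wait_interval_s + termination_wait_s


total_wait = minimum_linear_wait_seconds(20, 300, 300)
print(total_wait)          # 1800
print(total_wait <= 3600)  # True
```

For the values in the preceding example, five baking periods of 300 seconds plus a 300 second termination wait come to 1800 seconds, which leaves the rest of the 3600 second limit for provisioning and other overhead.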

### How to update an endpoint with an existing blue/green update policy (API)
<a name="deployment-guardrails-blue-green-linear-configure-api-existing"></a>

When you use the [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) API to create an endpoint, you can optionally specify a deployment configuration to reuse for future endpoint updates. You can use the same `DeploymentConfig` options as the previous UpdateEndpoint API example. There are no changes to the CreateEndpoint API behavior. Specifying the deployment configuration does not automatically perform a blue/green update on your endpoint.

You can reuse a previous deployment configuration when you call the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API to update your endpoint. The `RetainDeploymentConfig` option keeps the deployment configuration you specified when you created the endpoint.

When calling the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API, set `RetainDeploymentConfig` to `True` to reuse the `DeploymentConfig` options that you specified previously.

```
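import boto3

client = boto3.client("sagemaker")

# Reuse the deployment configuration that was specified previously.
response = client.update_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-config-name>",
    RetainDeploymentConfig=True
)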
```

### How to update an endpoint (CLI)
<a name="deployment-guardrails-blue-green-canary-configure-cli-update"></a>

If you are using the AWS CLI, the following example shows how to start a blue/green linear deployment using the [update-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-endpoint.html) command.

```
aws sagemaker update-endpoint \
    --endpoint-name <your-endpoint-name> \
    --endpoint-config-name <your-config-name> \
    --deployment-config '{"BlueGreenUpdatePolicy": {"TrafficRoutingConfiguration": {"Type": "LINEAR",
    "LinearStepSize": {"Type": "CAPACITY_PERCENT", "Value": 20}, "WaitIntervalInSeconds": 300},
    "TerminationWaitInSeconds": 300, "MaximumExecutionTimeoutInSeconds": 3600},
    "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": "<your-alarm>"}]}}'
```

To configure the linear traffic shifting option, do the following:
+ For `endpoint-name`, use the name of the endpoint you want to update.
+ For `endpoint-config-name`, use the name of the endpoint configuration you want to use.
+ For `deployment-config`, use a JSON object that contains a [BlueGreenUpdatePolicy](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_BlueGreenUpdatePolicy.html) object and, optionally, an `AutoRollbackConfiguration` object.

**Note**  
If you would rather save your JSON object in a file, see [Generating AWS CLI skeleton and input parameters](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-skeleton.html) in the *AWS CLI User Guide*.

# Use rolling deployments
<a name="deployment-guardrails-rolling"></a>

When you update your endpoint, you can specify a rolling deployment to gradually shift traffic from your old fleet to a new fleet. You can control the size of the traffic shifting steps, as well as specify an evaluation period to monitor the new instances for issues before terminating instances from the old fleet. With rolling deployments, instances on the old fleet are cleaned up after each traffic shift to the new fleet, reducing the number of additional instances needed to update your endpoint. This is especially useful for accelerated instances that are in high demand.

Rolling deployments gradually replace the previous deployment of your model version with the new version by updating your endpoint in configurable batch sizes. The traffic shifting behavior of rolling deployments is similar to the [linear traffic shifting mode](https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails-blue-green-linear.html) in blue/green deployments, but rolling deployments provide you with the benefit of reduced capacity requirements when compared to blue/green deployments. With rolling deployments, fewer instances are active at a time, and you have more granular control over how many instances you want to update in the new fleet. You should consider using a rolling deployment instead of a blue/green deployment if you have large models or a large endpoint with many instances.

The following list describes the key features of rolling deployments in Amazon SageMaker AI:
+ **Baking period.** The baking period is a set amount of time to monitor the new fleet before proceeding to the next deployment stage. If any of the pre-specified alarms trip during any baking period, then all endpoint traffic rolls back to the old fleet. The baking period helps you to build confidence in your update before making the traffic shift permanent.
+ **Rolling batch size.** You have granular control over the size of each batch for traffic shifting, or the number of instances you want to update in each batch. This number can range from 5% to 50% of the size of your fleet. You can specify the batch size as a number of instances or as the overall percentage of your fleet.
+ **Auto-rollbacks.** You can specify Amazon CloudWatch alarms that SageMaker AI uses to monitor the new fleet. If an issue with the updated code trips any of the alarms, SageMaker AI initiates an auto-rollback to the old fleet in order to maintain availability, thereby minimizing risk.

**Note**  
If your endpoint uses any of the features listed in the [Exclusions](https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails-exclusions.html) page, you cannot use rolling deployments.

## How it works
<a name="deployment-guardrails-rolling-how-it-works"></a>

During a rolling deployment, SageMaker AI provides the infrastructure to shift traffic from the old fleet to the new fleet without having to provision all of the new instances at once. SageMaker AI uses the following steps to shift traffic:

1. SageMaker AI provisions the first batch of instances in the new fleet.

1. A portion of traffic is shifted from the old instances to the first batch of new instances.

1. After the baking period, if no Amazon CloudWatch alarms are tripped, then SageMaker AI cleans up a batch of old instances.

1. SageMaker AI continues to provision, shift, and clean up instances in batches until the deployment is complete.

If an alarm is tripped during one of the baking periods, then traffic is rolled back to the old fleet in batches of a size that you specify. Alternatively, you can specify the rolling deployment to shift 100% of the traffic back to the old fleet if an alarm is tripped.

The following diagram shows the progression of a successful rolling deployment, as described in the previous steps.

![\[The steps of a rolling deployment's traffic shifting successfully from the old to the new fleet.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/deployment-guardrails-rolling-diagram.png)


To create a rolling deployment, you only have to specify your desired deployment configuration. Then SageMaker AI handles provisioning new instances, terminating old instances, and shifting traffic for you. You can create and manage your deployment through the existing [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) and [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) SageMaker AI APIs and the corresponding AWS Command Line Interface commands.

## Prerequisites
<a name="deployment-guardrails-prereqs"></a>

Before setting up a rolling deployment, you must create Amazon CloudWatch alarms to watch metrics from your endpoint. If any of the alarms trip during the baking period, then the traffic begins rolling back to your old fleet. To learn how to set up CloudWatch alarms on an endpoint, see the prerequisite page [Auto-Rollback Configuration and Monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails-configuration.html). To learn more about CloudWatch alarms, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.

Also, review the [Exclusions](https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails-exclusions.html) page to make sure that your endpoint meets the requirements for a rolling deployment.

## Determine the rolling batch size
<a name="deployment-guardrails-rolling-batch-size"></a>

Before updating your endpoint, determine the batch size that you want to use for incrementally shifting traffic to the new fleet.

For rolling deployments, you can specify a batch size that is 5–50% of the capacity of your fleet. If you choose a larger batch size, the deployment completes more quickly. However, the endpoint requires more capacity during the update, roughly equal to the batch size. If you choose a smaller batch size, the deployment takes longer, but you use less capacity during the deployment.
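To make this trade-off concrete, the following sketch estimates the number of batches and the peak number of instances for a given fleet size and batch percentage. It is an approximation under stated assumptions: the function name is hypothetical, fractional batch sizes are rounded up to whole instances, and each old batch is assumed to be cleaned up before the next new batch is provisioned.

```python
import math


def rolling_capacity_plan(fleet_size, batch_percent):
    """Estimate batches and peak instance count for a rolling deployment."""
    # The guide allows batch sizes of 5% to 50% of the fleet.
    if not 5 <= batch_percent <= 50:
        raise ValueError("batch_percent should be between 5 and 50")
    # Assumption: round up to a whole number of instances, at least one.
    batch_instances = max(1, math.ceil(fleet_size * batch_percent / 100))
    batches = math.ceil(fleet_size / batch_instances)
    # Peak capacity is the full old fleet plus one batch of new instances.
    peak_instances = fleet_size + batch_instances
    return {
        "batch_instances": batch_instances,
        "batches": batches,
        "peak_instances": peak_instances,
    }


print(rolling_capacity_plan(10, 20))
print(rolling_capacity_plan(10, 50))
```

For a fleet of 10 instances, a 20% batch size gives 5 batches with a peak of 12 instances, while a 50% batch size gives 2 batches with a peak of 15 instances. By comparison, a blue/green deployment provisions the entire new fleet up front.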

## Configure a rolling deployment
<a name="deployment-guardrails-rolling-configure"></a>

Once you are ready for your deployment and have set up CloudWatch alarms for your endpoint, you can use the SageMaker AI [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API or the [update-endpoint](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/update-endpoint.html) command in the AWS Command Line Interface to initiate the deployment.

**How to update an endpoint**

The following example shows how you can update your endpoint with a rolling deployment using the [update\_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/update_endpoint.html) method of the Boto3 SageMaker AI client.

To configure a rolling deployment, use the following example and fields:
+ For `EndpointName`, use the name of the existing endpoint you want to update.
+ For `EndpointConfigName`, use the name of the endpoint configuration you want to use.
+ In the `AutoRollbackConfiguration` object, within the `Alarms` field, you can add your CloudWatch alarms by name. Create one `AlarmName: <your-cw-alarm>` entry for each alarm you want to use.
+ Under `DeploymentConfig`, for the `RollingUpdatePolicy` object, specify the following fields:
  + `MaximumExecutionTimeoutInSeconds` — The time limit for the total deployment. Exceeding this limit causes a timeout. The maximum value you can specify for this field is 28800 seconds, or 8 hours.
  + `WaitIntervalInSeconds` — The length of the baking period, during which SageMaker AI monitors alarms for each batch on the new fleet.
  + `MaximumBatchSize` — Specify the `Type` of batch you want to use (either instance count or overall percentage of your fleet) and the `Value`, or the size of each batch.
  + `RollbackMaximumBatchSize` — Use this object to specify the rollback strategy in case an alarm trips. Specify the `Type` of batch you want to use (either instance count or overall percentage of your fleet), and the `Value`, or the size of each batch. If you don’t specify these fields, or if you set the value to 100% of your endpoint, then SageMaker AI uses a blue/green rollback strategy and rolls all traffic back to the old fleet when an alarm trips.

```
import boto3

client = boto3.client("sagemaker")

# The numeric values below are example values only; replace them with your own.
response = client.update_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-config-name>",
    DeploymentConfig={
        "AutoRollbackConfiguration": {
            "Alarms": [
                {
                    "AlarmName": "<your-cw-alarm>"
                },
            ]
        },
        "RollingUpdatePolicy": {
            # Time limit for the whole deployment, in seconds (maximum 28800).
            "MaximumExecutionTimeoutInSeconds": 3600,
            # Length of the baking period for each batch, in seconds.
            "WaitIntervalInSeconds": 300,
            "MaximumBatchSize": {
                # Either "INSTANCE_COUNT" or "CAPACITY_PERCENT".
                "Type": "CAPACITY_PERCENT",
                "Value": 20
            },
            "RollbackMaximumBatchSize": {
                # Either "INSTANCE_COUNT" or "CAPACITY_PERCENT".
                "Type": "CAPACITY_PERCENT",
                "Value": 20
            },
        }
    }
)
```

After updating your endpoint, you might want to check the status of your rolling deployment and check the health of your endpoint. You can review your endpoint’s status in the SageMaker AI console, or you can review the status of your endpoint by using the [DescribeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeEndpoint.html) API.

In the `VariantStatus` object returned by the `DescribeEndpoint` API, the `Status` field tells you the current deployment or operational status of your endpoint. For more information about the possible statuses and what they mean, see [ProductionVariantStatus](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProductionVariantStatus.html).
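As an illustration of checking status programmatically, the following sketch polls `describe_endpoint` until the endpoint leaves a transitional state. The function names, the polling defaults, and the set of statuses treated as transitional are choices made for this sketch. The `client` argument is assumed to be a Boto3 SageMaker AI client. The sketch reads the endpoint-level `EndpointStatus` field and, separately, the per-variant `VariantStatus` entries.

```python
import time


def variant_statuses(response):
    """Return (variant name, status) pairs from a DescribeEndpoint response."""
    pairs = []
    for variant in response.get("ProductionVariants", []):
        for entry in variant.get("VariantStatus", []):
            pairs.append((variant["VariantName"], entry["Status"]))
    return pairs


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, max_polls=120,
                      sleep=time.sleep):
    """Poll until the endpoint is no longer creating, updating, or rolling back."""
    transitional = {"Creating", "Updating", "SystemUpdating", "RollingBack"}
    for _ in range(max_polls):
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status not in transitional:
            # For example "InService", "Failed", or "UpdateRollbackFailed".
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} did not settle after {max_polls} polls")
```

The `sleep` parameter exists only so that the delay can be replaced in tests. In normal use you would call `wait_for_endpoint(client, "<your-endpoint-name>")`.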

If you attempted to do a rolling deployment and the status of your endpoint is `UpdateRollbackFailed`, see the following section for troubleshooting help.

## Failure handling
<a name="deployment-guardrails-rolling-failures"></a>

If your rolling deployment fails and the auto-rollback fails as well, your endpoint can be left with a status of `UpdateRollbackFailed`. This status means that different endpoint configurations are deployed to the instances behind your endpoint, and your endpoint is in service with a mix of old and new endpoint configurations.

You can make another call to the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API to return your endpoint to a healthy state. Specify your desired endpoint configuration and deployment configuration (either as a rolling deployment, a blue/green deployment, or neither) to update your endpoint.

You can call the [DescribeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeEndpoint.html) API to check the health of your endpoint again. The overall status of the endpoint is returned in the `EndpointStatus` field, and the deployment status of each variant is returned in the `VariantStatus` object as the `Status` field. If your update is successful, your endpoint’s `EndpointStatus` returns to `InService`.
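The recovery flow described in this section might look like the following sketch. The function name is hypothetical, and the `client` argument is assumed to be a Boto3 SageMaker AI client. The sketch issues a new update only when the endpoint status is `UpdateRollbackFailed`, and it includes a deployment configuration only if you supply one.

```python
def recover_from_failed_rollback(client, endpoint_name, endpoint_config_name,
                                 deployment_config=None):
    """Start a new update if the endpoint is stuck in UpdateRollbackFailed.

    Returns True if a new update was started, otherwise False.
    """
    response = client.describe_endpoint(EndpointName=endpoint_name)
    if response["EndpointStatus"] != "UpdateRollbackFailed":
        return False
    request = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": endpoint_config_name,
    }
    # Optional: a rolling or blue/green policy for the recovery update.
    if deployment_config is not None:
        request["DeploymentConfig"] = deployment_config
    client.update_endpoint(**request)
    return True
```

After the call returns `True`, you would check the endpoint status again to confirm that the endpoint has returned to `InService`.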

# Exclusions
<a name="deployment-guardrails-exclusions"></a>

When doing a blue/green or rolling deployment, your new endpoint configuration must have the same variant name as the old endpoint configuration. There are also feature-based exclusions that make your endpoint incompatible with deployment guardrails at this time. If your endpoint uses any of the following features, you cannot use deployment guardrails on your endpoint, and your endpoint will fall back to using a blue/green deployment with all-at-once traffic shifting and no final baking period:
+ Marketplace containers
+ Endpoints that use Inf1 (Inferentia-based) instances

If you're doing a rolling deployment, there are additional feature-based exclusions:
+ Serverless inference endpoints
+ Multi-variant inference endpoints
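Some of these exclusions can be checked before you start an update. The following sketch inspects two endpoint configurations, in the dictionary form returned by the `describe_endpoint_config` method of the Boto3 SageMaker AI client, and lists reasons why a rolling deployment would not apply. The function name is hypothetical, the check is not exhaustive, and it does not detect Marketplace containers.

```python
def rolling_deployment_blockers(old_config, new_config):
    """Return reasons why a rolling deployment would not apply (partial check)."""
    reasons = []
    old_names = sorted(v["VariantName"] for v in old_config["ProductionVariants"])
    new_variants = new_config["ProductionVariants"]
    new_names = sorted(v["VariantName"] for v in new_variants)
    if old_names != new_names:
        reasons.append("variant names differ between old and new configurations")
    if len(new_variants) > 1:
        reasons.append("multi-variant endpoint")
    for variant in new_variants:
        name = variant["VariantName"]
        if "ServerlessConfig" in variant:
            reasons.append(f"variant {name} is serverless")
        if variant.get("InstanceType", "").startswith("ml.inf1."):
            reasons.append(f"variant {name} uses an Inf1 instance")
    return reasons
```

An empty list means that none of the checked exclusions apply. It does not guarantee that the endpoint is eligible.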