

# Baseline calculation, drift detection and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Pipelines
<a name="pipelines-quality-clarify-baseline-lifecycle"></a>

The following topic discusses how baselines and model versions evolve in Amazon SageMaker Pipelines when you use the [`ClarifyCheck`](build-and-manage-steps-types.md#step-type-clarify-check) and [`QualityCheck`](build-and-manage-steps-types.md#step-type-quality-check) steps.

For the `ClarifyCheck` step, a baseline is a single file that resides in the step properties with the suffix `constraints`. For the `QualityCheck` step, a baseline is a combination of two files that reside in the step properties: one with the suffix `statistics` and the other with the suffix `constraints`. The following topics refer to these properties with a prefix that describes how they are used, which determines baseline behavior and lifecycle in these two pipeline steps. For example, the `ClarifyCheck` step always calculates and assigns new baselines to the `CalculatedBaselineConstraints` property, and the `QualityCheck` step does the same for the `CalculatedBaselineConstraints` and `CalculatedBaselineStatistics` properties.
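The property naming can be sketched as follows. This is a toy illustration, not the SageMaker SDK: property names follow a `<prefix><suffix>` pattern, where the prefix (`CalculatedBaseline` or `BaselineUsedForDriftCheck`) describes how the baseline is used and the suffix (`Statistics` or `Constraints`) identifies the file. The S3 paths are hypothetical placeholders.

```python
# Toy sketch of baseline-related step properties (not the SageMaker SDK).
# ClarifyCheck baselines are a single constraints file.
clarify_check_properties = {
    "CalculatedBaselineConstraints": "s3://example-bucket/clarify/constraints.json",
    "BaselineUsedForDriftCheckConstraints": "s3://example-bucket/previous/constraints.json",
}

# QualityCheck baselines are a statistics + constraints file pair.
quality_check_properties = {
    "CalculatedBaselineStatistics": "s3://example-bucket/quality/statistics.json",
    "CalculatedBaselineConstraints": "s3://example-bucket/quality/constraints.json",
    "BaselineUsedForDriftCheckStatistics": "s3://example-bucket/previous/statistics.json",
    "BaselineUsedForDriftCheckConstraints": "s3://example-bucket/previous/constraints.json",
}
```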

## Baseline calculation and registration for ClarifyCheck and QualityCheck steps
<a name="pipelines-quality-clarify-baseline-calculations"></a>

Both the `ClarifyCheck` and `QualityCheck` steps always calculate new baselines from the step inputs through the underlying processing job run. These newly calculated baselines are accessible through the properties with the prefix `CalculatedBaseline`. You can record these properties as the `ModelMetrics` of your model package in the [Model step](build-and-manage-steps-types.md#step-type-model). This model package can be registered with five different baselines, one for each check type: data bias, model bias, and model explainability from running the `ClarifyCheck` step, and data quality and model quality from running the `QualityCheck` step. The `register_new_baseline` parameter dictates the value set in the properties with the prefix `BaselineUsedForDriftCheck` after a step runs.
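The five baseline check types and the step that produces each can be summarized as a simple mapping. This is an illustrative grouping taken from the paragraph above, not an SDK structure:

```python
# Which pipeline step produces each of the five baseline check types
# a model package can be registered with (illustrative only).
BASELINE_CHECK_TYPES = {
    "ClarifyCheck": ["DataBias", "ModelBias", "ModelExplainability"],
    "QualityCheck": ["DataQuality", "ModelQuality"],
}

all_check_types = [c for checks in BASELINE_CHECK_TYPES.values() for c in checks]
```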

The following table of potential use cases shows different behaviors resulting from the step parameters you can set for the `ClarifyCheck` and `QualityCheck` steps:


| Possible use case that you may consider for selecting this configuration  | `skip_check` / `register_new_baseline` | Does step do a drift check? | Value of step property `CalculatedBaseline` | Value of step property `BaselineUsedForDriftCheck` | 
| --- | --- | --- | --- | --- | 
| You are doing regular retraining with checks enabled to get a new model version, but you *want to carry over the previous baselines* as the `DriftCheckBaselines` in the model registry for your new model version. | False/ False | Drift check runs against existing baselines | New baselines calculated by running the step | Baseline from the latest approved model in Model Registry or the baseline supplied as step parameter | 
| You are doing regular retraining with checks enabled to get a new model version, but you *want to refresh the `DriftCheckBaselines` in the model registry with the newly calculated baselines* for your new model version. | False/ True | Drift check runs against existing baselines | New baselines calculated by running the step | Newly calculated baseline by running the step (value of property CalculatedBaseline) | 
| You are initiating the pipeline to retrain a new model version because there is a violation detected by Amazon SageMaker Model Monitor on an endpoint for a particular type of check, and you want to *skip this type of check against the previous baseline, but carry over the previous baseline as `DriftCheckBaselines` in the model registry* for your new model version. | True/ False | No drift check | New baselines calculated by running the step | Baseline from the latest approved model in the model registry or the baseline supplied as step parameter | 
| This happens in the following cases: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html)  | True/ True | No drift check | New baselines calculated by running the step | Newly calculated baseline by running the step (value of property CalculatedBaseline) | 
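The table above can be condensed into a small decision function. The following is a minimal sketch of that logic (not the SageMaker SDK): `skip_check` controls whether the drift check runs, and `register_new_baseline` controls which baseline lands in the `BaselineUsedForDriftCheck` properties.

```python
# Toy model of the ClarifyCheck/QualityCheck parameter behavior
# summarized in the table above (not the SageMaker SDK).
def check_step_behavior(skip_check: bool, register_new_baseline: bool,
                        previous_baseline: str, calculated_baseline: str):
    """Return (runs_drift_check, baseline_used_for_drift_check)."""
    # skip_check=False means a drift check runs against existing baselines.
    runs_drift_check = not skip_check
    if register_new_baseline:
        # Refresh: carry forward the baseline calculated by this run.
        baseline_used = calculated_baseline
    else:
        # Carry over the previous approved (or supplied) baseline.
        baseline_used = previous_baseline
    return runs_drift_check, baseline_used
```

Note that in every row of the table, the `CalculatedBaseline` properties hold the newly calculated baselines; only the `BaselineUsedForDriftCheck` value varies with `register_new_baseline`.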

**Note**  
If you use scientific notation in your constraints file, you need to convert the values to floats. For an example preprocessing script that shows how to do this, see [Create a Model Quality Baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-baseline.html).

When you register a model with the [Model step](build-and-manage-steps-types.md#step-type-model), you can record the `BaselineUsedForDriftCheck` properties as the `DriftCheckBaselines` of the model package. Model Monitor can then use these baseline files for model and data quality checks. In addition, the `ClarifyCheck` and `QualityCheck` steps can use these baselines in future pipeline runs to compare newly trained models against the models already registered in the model registry.

## Drift detection against previous baselines in Pipelines
<a name="pipelines-quality-clarify-baseline-drift-detection"></a>

In the case of the `QualityCheck` step, when you initiate the pipeline for regular retraining to get a new model version, you may not want to run the training step if the data quality or data bias check produces violations (see [Schema for Violations (constraint_violations.json file)](model-monitor-interpreting-violations.md)) against the baselines of your previously approved model version. Similarly, you may not want to register the newly trained model version if the model quality, model bias, or model explainability check of the `ClarifyCheck` step violates the registered baseline of your previously approved model version. In these cases, you can enable the checks you want by setting the `skip_check` parameter of the corresponding check step to `False`, so that the `ClarifyCheck` or `QualityCheck` step fails if a violation is detected against the previous baselines. The pipeline run then stops, and the model that drifted from the baseline isn't registered. The `ClarifyCheck` and `QualityCheck` steps can retrieve the `DriftCheckBaselines` of the latest approved model version in a given model package group to compare against. You can also supply previous baselines directly through the `supplied_baseline_constraints` parameter (and `supplied_baseline_statistics` in the case of the `QualityCheck` step); supplied baselines always take priority over any baselines pulled from the model package group. 
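The two rules in this section can be sketched as follows. This is a minimal toy model, not the SageMaker SDK: baselines supplied as step parameters win over the latest approved model version's baselines, and when `skip_check` is `False` a detected violation fails the step so the pipeline run stops.

```python
# Toy model of drift-check behavior in a check step (not the SageMaker SDK).
def resolve_drift_check_baseline(supplied_baseline, latest_approved_baseline):
    """Baselines supplied as step parameters always take priority."""
    if supplied_baseline is not None:
        return supplied_baseline
    return latest_approved_baseline

def run_drift_check(skip_check, baseline, has_violations):
    """Fail (raise) if the check runs and finds violations against the baseline."""
    if skip_check:
        return "check skipped"
    if has_violations(baseline):
        # The step fails, so the pipeline stops and the drifted
        # model version is never registered.
        raise RuntimeError(f"violations detected against baseline {baseline}")
    return "check passed"
```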

## Baseline and model version lifecycle and evolution with Pipelines
<a name="pipelines-quality-clarify-baseline-evolution"></a>

By setting `register_new_baseline` to `False` in your `ClarifyCheck` and `QualityCheck` steps, you make the previous baseline accessible through the step properties with the prefix `BaselineUsedForDriftCheck`. You can then register these baselines as the `DriftCheckBaselines` of the new model version when you register a model with the [Model step](build-and-manage-steps-types.md#step-type-model). After you approve this new model version in the model registry, its `DriftCheckBaselines` become available to the `ClarifyCheck` and `QualityCheck` steps in the next pipeline run. If you want to refresh the baseline of a certain check type for future model versions, set `register_new_baseline` to `True` so that the properties with the prefix `BaselineUsedForDriftCheck` take the newly calculated baseline. In this way, you can preserve your preferred baselines for models trained in the future, or refresh the baselines for drift checks when needed, managing baseline evolution and lifecycle throughout your model training iterations. 
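The lifecycle described above can be simulated across a few model versions. This is a toy sketch, not the SageMaker SDK; the version names are hypothetical. With `register_new_baseline=False` the approved baseline is carried forward unchanged, and with `True` it is refreshed to the baseline calculated in that run.

```python
# Toy simulation of baseline evolution across model versions
# (not the SageMaker SDK).
def next_drift_check_baseline(current_baseline, calculated_baseline,
                              register_new_baseline):
    return calculated_baseline if register_new_baseline else current_baseline

baseline = "v1-baseline"  # baseline of the first approved model version
runs = [("v2-baseline", False),  # run 2: carry over the v1 baseline
        ("v3-baseline", True)]   # run 3: refresh to the new baseline
for calculated, refresh in runs:
    baseline = next_drift_check_baseline(baseline, calculated, refresh)
# After run 2: still "v1-baseline"; after run 3: "v3-baseline"
```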

The following diagram illustrates a model-version-centric view of the baseline evolution and lifecycle.

![\[A model-version-centric view of the baseline evolution and lifecycle.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/Baseline-Lifecycle.png)
