

# Evaluate data quality transform
<a name="evaluate-data-quality-transform"></a>

The Evaluate Data Quality transform validates data against a set of rules as the data flows through your Visual ETL job. You define the rules in Data Quality Definition Language (DQDL), a domain-specific language that provides 31 built-in rule types.

**To add an Evaluate Data Quality transform**

1. On the Visual ETL canvas, choose the plus icon to open the **Add nodes** panel.

1. Under the **Transforms** tab, search for **Evaluate Data Quality**.

1. Select the node to add it to the canvas, then connect it to your source.  
![The Amazon SageMaker Unified Studio UI showing the Evaluate Data Quality node added to the Visual ETL canvas.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-evaluate-dq-add-node.png)

**To configure the rule set**

1. Select the node to open the configuration panel.

1. For **Ruleset name** (optional), enter a name for the evaluation context of this node. The name defaults to the node name. If your job has multiple Evaluate Data Quality nodes, give each one a unique name so that you can tell their results apart later.

1. Expand **Rule Types Reference** to see the 31 available rule types. For details on each rule type and its syntax, see [DQDL rule types](https://docs.aws.amazon.com/glue/latest/dg/dqdl-rule-types.html) in the AWS Glue documentation.

1. For **Ruleset**, define rules using DQDL syntax. The editor provides autocomplete for rule types and column names from the input schema. Use Ctrl\+Tab to trigger column suggestions.  
![The Amazon SageMaker Unified Studio UI showing the ruleset configuration for the Evaluate Data Quality transform.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-evaluate-dq-ruleset.png)

The following example shows a rule set with two rules:

```
Rules = [ColumnExists "phone", ColumnLength "account_length" > 10]
```

**Note**  
The rule set cannot be empty. You must define at least one rule.

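If you generate rules outside the editor, the `Rules = [...]` form shown above can be assembled programmatically. The following is a minimal sketch, not part of the product: `build_ruleset` is a hypothetical helper name, and it assumes that each rule is already a valid DQDL rule string. It also rejects an empty list, mirroring the requirement in the note above.

```python
def build_ruleset(rules):
    """Join individual DQDL rule strings into one ruleset definition.

    Hypothetical helper for illustration; it does not validate DQDL syntax.
    """
    # Drop blank entries and surrounding whitespace.
    cleaned = [rule.strip() for rule in rules if rule.strip()]
    if not cleaned:
        # A rule set must contain at least one rule.
        raise ValueError("The rule set cannot be empty.")
    return "Rules = [" + ", ".join(cleaned) + "]"


# Rebuild the two-rule example; paste the printed text into the Ruleset field.
ruleset = build_ruleset(['ColumnExists "phone"', 'ColumnLength "account_length" > 10'])
print(ruleset)
```
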
Select an output to add it as a child node that downstream transforms can read from. You can add multiple output nodes and route them independently. The following table describes the available outputs.


| \# | Output | Description | 
| --- | --- | --- | 
| 1 | Original data | Outputs the original data. This option is ideal if you want to stop the job when quality issues are detected. | 
| 2 | Evaluation results | Outputs the configured rules and their pass or fail status. This option is useful if you want to take a custom action based on the results. | 
| 3 | Row level results | Outputs the original data with additional columns that show the rule result for each row. This option is best for row-specific processing based on the result. | 
| 4 | Row level results - Failed rows | Outputs the same data as **Row level results**, filtered to only the rows that failed the data quality evaluation checks. | 
| 5 | Row level results - Passed rows | Outputs the same data as **Row level results**, filtered to only the rows that passed the data quality evaluation checks. | 

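To illustrate how the row-level outputs relate to each other, the following sketch splits row-level results into passed and failed rows in plain Python. It rests on an assumption that this page does not confirm: that each row carries a result column named `DataQualityEvaluationResult` with the value `Passed` or `Failed`, as AWS Glue row-level results do. Check the output schema of your node before relying on that name. The sample rows and the `split_by_result` helper are made up for illustration.

```python
# Assumption: the row-level output adds a column with this name whose value
# is "Passed" or "Failed". Verify against the actual schema of your node.
RESULT_COLUMN = "DataQualityEvaluationResult"


def split_by_result(rows, result_column=RESULT_COLUMN):
    """Return (passed_rows, failed_rows) from a list of row dictionaries."""
    passed = [row for row in rows if row.get(result_column) == "Passed"]
    failed = [row for row in rows if row.get(result_column) == "Failed"]
    return passed, failed


# Illustrative sample data, not real output.
rows = [
    {"phone": "555-0100", "account_length": 128, RESULT_COLUMN: "Passed"},
    {"phone": None, "account_length": 7, RESULT_COLUMN: "Failed"},
]

passed, failed = split_by_result(rows)
print(len(passed), "passed;", len(failed), "failed")
```
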
## Additional options
<a name="evaluate-data-quality-additional-options"></a>

The Evaluate Data Quality node includes options for publishing results and controlling job behavior on failure.


| \# | Option | Description | 
| --- | --- | --- | 
| 1 | Publish results to Amazon CloudWatch | Send evaluation metrics to CloudWatch for monitoring and alerting. | 
| 2 | Publish data quality evaluation results to S3 | Write detailed results to an Amazon S3 folder. Choose **Browse S3** to select the target location. | 
| 3 | Stop job on rule set failure | Halt the job if any rule fails, preventing bad data from flowing downstream. | 

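The stop-on-failure behavior can be pictured as a check over the evaluation results: if any rule failed, raise an error so that nothing continues downstream. The sketch below is only a conceptual model, not how the service implements the option. The `Rule` and `Outcome` field names follow the evaluation results format used by AWS Glue and are an assumption here, and `RulesetFailedError` and `check_results` are hypothetical names.

```python
class RulesetFailedError(Exception):
    """Raised when at least one rule in the evaluation results failed."""


def check_results(results):
    """Raise if any rule failed; otherwise return the number of rules checked.

    Assumption: each result is a dictionary with "Rule" and "Outcome" keys,
    where "Outcome" is "Passed" or "Failed".
    """
    failed = [result["Rule"] for result in results if result["Outcome"] == "Failed"]
    if failed:
        raise RulesetFailedError("Failed rules: " + "; ".join(failed))
    return len(results)


# Illustrative results in which every rule passed, so no error is raised.
results = [
    {"Rule": 'ColumnExists "phone"', "Outcome": "Passed"},
    {"Rule": 'ColumnLength "account_length" > 10', "Outcome": "Passed"},
]
print(check_results(results), "rules passed")
```
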
## Viewing results after a job runs
<a name="evaluate-data-quality-viewing-results"></a>

After the job completes, results are available on the **Data quality** tab of the data processing job detail page. For details, see [Monitor data quality in data processing jobs](sagemaker-data-quality-monitoring.md).