

# SQL analysis templates
<a name="sql-analysis-templates"></a>

SQL analysis templates enable you to query and analyze data across different datasets within a collaboration. You can use these templates to perform various types of analysis, such as identifying audience overlaps and calculating aggregated metrics.

With SQL analysis templates, you can:
+ Write standard SQL queries
+ Add parameters to make your queries dynamic
+ Control access to specific columns and tables
+ Set aggregation requirements for sensitive data
+ Define input data for the generation of privacy-enhanced synthetic datasets for custom machine learning (ML) models

**Topics**
+ [Creating a SQL analysis template](create-sql-analysis-template.md)
+ [Reviewing a SQL analysis template](review-analysis-template.md)

# Creating a SQL analysis template
<a name="create-sql-analysis-template"></a>

**Prerequisites**

 Before you create a SQL analysis template, you must have:
+ An active AWS Clean Rooms collaboration
+ Access to at least one configured table in the collaboration

  For information about configuring tables in AWS Clean Rooms, see [Creating a configured table in AWS Clean Rooms](create-configured-table.md).
+ Permissions to create analysis templates
+ Basic knowledge of SQL query syntax

The following procedure describes the process of creating a SQL analysis template using the [AWS Clean Rooms console](https://console.aws.amazon.com/cleanrooms/home).

For information about how to create a SQL analysis template using the AWS SDKs, see the [AWS Clean Rooms API Reference](https://docs.aws.amazon.com/clean-rooms/latest/apireference/Welcome.html).

**To create a SQL analysis template**

1. Sign in to the AWS Management Console and open the [AWS Clean Rooms console](https://console.aws.amazon.com/cleanrooms/home) with the AWS account that will function as the collaboration creator.

1. In the left navigation pane, choose **Collaborations**.

1. Choose the collaboration.

1. On the **Templates** tab, go to the **Analysis templates created by you** section.

1. Choose **Create analysis template**.

1. On the **Create analysis template** page, for **Details**, 

   1. Enter a **Name** for the analysis template.

   1. (Optional) Enter a **Description**.

   1. For **Format**, leave the **SQL** option selected.

1. For **Tables**, view the configured tables associated with the collaboration.

1. For **Definition**,

   1. Enter the definition for the analysis template.

   1. Choose **Import from** to import a definition.

   1. (*Optional*) Specify a parameter in the SQL editor by entering a colon (`:`) in front of the parameter name.

      For example: 

      `WHERE table1.date + :date_period > table1.date`

1. If you added parameters previously, under **Parameters – optional**, for each **Parameter name**, choose the **Type** and **Default value** (optional).

1. For **Synthetic data**, if you want to generate synthetic data for model training, select the **Require analysis template output to be synthetic** checkbox.

   For more information, see [Privacy-enhanced synthetic dataset generation](synthetic-data-generation.md).

   1. For **Column classification**, choose a **Column** from the dropdown list. At least five columns are required.

      1. Choose a **Classification** from the dropdown list. This identifies the data type for each column.

         Classification types include:
         + **Numerical** – Continuous numerical values such as measurements or counts
         + **Categorical** – Discrete values or categories such as labels or types

      1. To remove a column, select **Remove**.

      1. To add another column, select **Add another column**. Choose the **Column** and **Classification** from the dropdown lists.

      1. For **Predictive value**, choose a **Column** from the dropdown list. This is the column the custom model uses for prediction after it's trained on the synthetic dataset.

   1. **Advanced settings** allow you to set the **Privacy level** and **Privacy threshold**. Adjust the settings to fit your needs.

      1. For **Privacy level**, enter an epsilon value to determine how much noise the synthetic model adds to protect privacy in your generated dataset. The value must be between 0.0001 and 10.
        + Lower values add more noise, providing stronger privacy protection but potentially reducing utility for downstream custom model trained on this data.
        + Higher values add less noise, providing more accuracy but potentially reducing privacy protection.

        For **Privacy threshold**, enter the highest allowed probability that a membership inference attack could identify members of the original dataset. The value must be between 50.0 and 100.
        + Scores of 50% indicate that a membership inference attack can't successfully distinguish members from non-members better than a random guess.
        + For no privacy limit, enter 100%.

        The optimal value depends on your specific use case and privacy requirements. If the privacy threshold is exceeded, the ML input channel creation fails, and you can't use the synthetic dataset to train a model.
**Warning**  
Synthetic data generation protects against inferring individual attributes whether specific individuals are present in the original dataset or learning attributes of those individuals are present. However, it doesn't prevent literal values from the original dataset, including personally identifiable information (PII) from appearing in the synthetic dataset.  
We recommend avoiding values in the input dataset that are associated with only one data subject because these may re-identify a data subject. For example, if only one user lives in a zip code, the presence of that zip code in the synthetic dataset would confirm that user was in the original dataset. Techniques like truncating high precision values or replacing uncommon catalogues with *other* can be used to mitigate this risk. These transformations can be part of the query used to create the ML input channel.

1. If you want to enable **Tags** for the resource, choose **Add new tag** and then enter the **Key** and **Value** pair.

1. Choose **Create**.

1. You are now ready to inform your collaboration member that they can [Review an analysis template](review-analysis-template.md). (Optional if you want to query your own data.)

# Reviewing a SQL analysis template
<a name="review-analysis-template"></a>

After a collaboration member has created a SQLanalysis template, you can review and approve it. After the analysis template and approved, it can be used in a query in AWS Clean Rooms.

**Note**  
When you bring your analysis code into a collaboration, be aware of the following:   
AWS Clean Rooms does not validate or guarantee the behavior of the analysis code.   
If you need to ensure certain behavior, review the code of your collaboration partner directly or work with a trusted third-party auditor to review it.
In the shared security model:  
You (the customer) are responsible for the security of the code running in the environment.
AWS Clean Rooms is responsible for the security of the environment, ensuring that  
only the approved code runs 
only specified configured tables are accessible 
the only output destination is the result receiver's S3 bucket.

**To review a SQL analysis template using the AWS Clean Rooms console**

1. Sign in to the AWS Management Console and open the [AWS Clean Rooms console](https://console.aws.amazon.com/cleanrooms/home) with the AWS account that will function as the collaboration creator.

1. In the left navigation pane, choose **Collaborations**.

1. Choose the collaboration.

1. On the **Templates** tab, go to the **Analysis templates created by other members** section.

1. Choose the analysis template that has the **Can run status** of **No requires your review**.

1. Choose **Review**.

1. Review the analysis rule **Overview**, **Definition**, and **Parameters** (if any). 

1. Review the configured tables listed under **Tables referenced in definition**. 

   The **Status** next to each table will read **Template not allowed**.

1. Choose a table.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/clean-rooms/latest/userguide/review-analysis-template.html)

You are now ready to query the configured table using a SQL analysis template. For more information, see [Running SQL queries](running-sql-queries.md).