Running SQL queries

Note

You can run queries only if the member responsible for paying query compute costs has joined the collaboration as an active member.

As the member who can query, you can run a SQL query by:

  • Building a SQL query manually using the SQL code editor.

  • Using an approved SQL analysis template.

  • Using the Analysis builder UI to build a query without having to write SQL code.

When the member who can query runs a SQL query on the tables in the collaboration, AWS Clean Rooms assumes the relevant roles to access the tables on their behalf. AWS Clean Rooms applies the analysis rules as necessary to the input query and its output.

The analysis rules and output constraints are enforced automatically. AWS Clean Rooms only returns the results that comply with the defined analysis rules.

The SQL that AWS Clean Rooms supports can differ from that of other query engines. For specifications, see the AWS Clean Rooms SQL Reference. If you want to run queries on data tables protected with differential privacy, make sure that your queries are compatible with the general-purpose query structure of AWS Clean Rooms Differential Privacy.
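
A minimal sketch of a query with that general-purpose structure, written here as a Python string the way it might be held before submission. The table and column names (provider_impressions, advertiser_users, hashed_email, campaign_id) are hypothetical placeholders; the key point is that the outermost SELECT returns only aggregated values rather than row-level data.

  # Hypothetical aggregation query: the outermost SELECT returns only
  # aggregates (COUNT DISTINCT), the kind of structure expected for
  # tables protected with differential privacy.
  OVERLAP_QUERY = """
  SELECT
      p.campaign_id,
      COUNT(DISTINCT p.hashed_email) AS matched_users
  FROM provider_impressions p
  INNER JOIN advertiser_users a
      ON p.hashed_email = a.hashed_email
  GROUP BY p.campaign_id
  """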

Note

When using Cryptographic Computing for Clean Rooms, not all SQL operations generate valid results. For example, you can perform a COUNT on an encrypted column, but performing a SUM on encrypted numbers or on sealed columns produces errors. In addition, some queries appear to succeed but yield incorrect results. For example, a GROUP BY query over sealed columns seems to succeed, but it produces different groups than a GROUP BY query over the cleartext would.

The member paying for query compute costs is charged for the queries run in the collaboration.

The member who can query can select multiple members who can receive results to receive the results from a single query. For more information, see Querying configured tables using the SQL code editor. For general information about receiving query results, see Receiving and using analysis results.

Prerequisites

Before you run a SQL query, make sure that you have the following:

  • An active membership in an AWS Clean Rooms collaboration

  • Access to at least one configured table in the collaboration

  • Confirmation that the member responsible for query compute costs is an active collaboration member

For information about how to query data or view queries by calling the AWS Clean Rooms StartProtectedQuery API operation directly or by using the AWS SDKs, see the AWS Clean Rooms API Reference.
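
As a sketch of that programmatic path, the following uses the AWS SDK for Python (Boto3) to call StartProtectedQuery. The membership ID, query text, and S3 bucket are placeholders to replace with your own values, and the full set of supported parameters is defined in the API Reference.

  import boto3

  # Hypothetical placeholder values -- substitute your own collaboration details.
  MEMBERSHIP_ID = "your-membership-id"
  RESULTS_BUCKET = "your-results-bucket"

  client = boto3.client("cleanrooms")

  # Submit a SQL query as a protected query. AWS Clean Rooms applies the
  # collaboration's analysis rules before returning any results.
  response = client.start_protected_query(
      type="SQL",
      membershipIdentifier=MEMBERSHIP_ID,
      sqlParameters={
          "queryString": "SELECT COUNT(DISTINCT hashed_email) FROM provider_table"
      },
      resultConfiguration={
          "outputConfiguration": {
              "s3": {
                  "resultFormat": "CSV",
                  "bucket": RESULTS_BUCKET,
                  "keyPrefix": "clean-rooms-results/",
              }
          }
      },
  )

  # The query runs asynchronously; keep the ID to check its status later.
  print(response["protectedQuery"]["id"])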

For information about query logging, see Analysis logging in AWS Clean Rooms.

Note

If you run a query on encrypted data tables, the results from the encrypted columns are encrypted.

Spark properties configuration for SQL queries

When you use the Spark analytics engine, AWS Clean Rooms lets you optionally customize Spark runtime behavior by configuring supported Spark properties for SQL queries. This feature is available only for analyses that use the Spark analytics engine in AWS Clean Rooms, not for the AWS Clean Rooms analytics engine. These properties let you fine-tune performance, memory usage, and query execution parameters, giving you greater control over how your Spark-based queries are processed and allowing optimization based on your specific workload requirements.

You can adjust settings such as shuffle partitions, broadcast join thresholds, and adaptive query execution parameters directly from the AWS Clean Rooms console for Spark analytics engine analyses. This is particularly useful for complex queries or large datasets where the default configuration might not be optimal. By fine-tuning these Spark properties, you can potentially improve query performance, reduce resource consumption, and better manage memory usage for your Spark-based collaboration analyses.

To use this feature, open the Spark properties section in the query interface for Spark analytics engine analyses. You can select from a list of supported properties and specify custom values. You can also configure Spark properties programmatically by using the StartProtectedQuery API. This configuration option lets data analysts and engineers tune analyses that use the Spark analytics engine for efficiency and scalability.
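
As a rough sketch of the kinds of values involved, the following shows commonly tuned Spark properties as a Python mapping. The property names come from Apache Spark itself; the values are hypothetical examples rather than recommendations, and the exact request shape for supplying them through the console or the StartProtectedQuery API is described in the AWS Clean Rooms documentation.

  # Commonly tuned Spark properties (names defined by Apache Spark).
  # The values shown are hypothetical examples, not recommended settings.
  spark_properties = {
      "spark.sql.shuffle.partitions": "400",           # partitions used when shuffling data for joins and aggregations
      "spark.sql.autoBroadcastJoinThreshold": "64MB",  # largest table size that is broadcast to all workers in a join
      "spark.sql.adaptive.enabled": "true",            # adaptive query execution re-optimizes plans at runtime
  }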

For more information about Spark properties, including default values, see Spark Properties in the Apache Spark documentation.

The following topics explain how to query data in a collaboration using the AWS Clean Rooms console.