Partitioning hints - AWS Clean Rooms

Partitioning hints

Partitioning hints control data distribution across executor nodes. When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.

COALESCE

Reduces the number of partitions to the specified number of partitions.

Parameters: Numeric value (required) - must be a positive integer between 1 and 2147483647

Examples:

-- Reduce to 5 partitions SELECT /*+ COALESCE(5) */ employee_id, salary FROM employees;

REPARTITION

Repartitions data to the specified number of partitions using the specified partitioning expressions. Uses round-robin distribution.

Parameters:

  • Numeric value (optional) - number of partitions; Must be a positive integer between 1 and 2147483647

  • Column identifiers (optional) - columns to partition by; These columns must exist in the input schema.

  • If both are specified, numeric value must come first

Examples:

-- Repartition to 10 partitions SELECT /*+ REPARTITION(10) */ * FROM employees; -- Repartition by column SELECT /*+ REPARTITION(department) */ * FROM employees; -- Repartition to 8 partitions by department SELECT /*+ REPARTITION(8, department) */ * FROM employees; -- Repartition by multiple columns SELECT /*+ REPARTITION(8, department, location) */ * FROM employees;

REPARTITION_BY_RANGE

Repartitions data to the specified number of partitions using range partitioning on the specified columns.

Parameters:

  • Numeric value (optional) - number of partitions; Must be a positive integer between 1 and 2147483647

  • Column identifiers (optional) - columns to partition by; These columns must exist in the input schema.

  • If both are specified, numeric value must come first

Examples:

SELECT /*+ REPARTITION_BY_RANGE(10) */ * FROM employees; -- Repartition by range on age column SELECT /*+ REPARTITION_BY_RANGE(age) */ * FROM employees; -- Repartition to 5 partitions by range on age SELECT /*+ REPARTITION_BY_RANGE(5, age) */ * FROM employees; -- Repartition by range on multiple columns SELECT /*+ REPARTITION_BY_RANGE(5, age, salary) */ * FROM employees;

REBALANCE

Rebalances the query result output partitions so that every partition is of a reasonable size (not too small and not too big). This is a best-effort operation: if there are skews, AWS Clean Rooms will split the skewed partitions to make them not too big. This hint is useful when you need to write the result of a query to a table to avoid too small or too big files.

Parameters:

  • Numeric value (optional) - number of partitions; Must be a positive integer between 1 and 2147483647

  • Column identifiers (optional) - columns must appear in the SELECT output list

  • If both are specified, numeric value must come first

Examples:

-- Rebalance to 10 partitions SELECT /*+ REBALANCE(10) */ employee_id, name FROM employees; -- Rebalance by specific columns in output SELECT /*+ REBALANCE(employee_id, name) */ employee_id, name FROM employees; -- Rebalance to 8 partitions by specific columns SELECT /*+ REBALANCE(8, employee_id, name) */ employee_id, name, department FROM employees;