Join hints
Join hints suggest join strategies for query execution. The syntax, arguments, and some examples come from the Apache Spark SQL Reference
BROADCAST
Suggests that AWS Clean Rooms use broadcast join. The join side with the hint will be broadcast regardless of autoBroadcastJoinThreshold. If both sides of the join have the broadcast hints, the one with the smaller size (based on stats) will be broadcast.
Aliases: BROADCASTJOIN, MAPJOIN
Parameters: Table identifiers (optional)
Examples:
-- Broadcast a specific table SELECT /*+ BROADCAST(students) */ e.name, s.course FROM employees e JOIN students s ON e.id = s.id; -- Broadcast multiple tables SELECT /*+ BROADCASTJOIN(s, d) */ * FROM employees e JOIN students s ON e.id = s.id JOIN departments d ON e.dept_id = d.id;
MERGE
Suggests that AWS Clean Rooms use shuffle sort merge join.
Aliases: SHUFFLE_MERGE, MERGEJOIN
Parameters: Table identifiers (optional)
Examples:
-- Use merge join for a specific table SELECT /*+ MERGE(employees) */ * FROM employees e JOIN students s ON e.id = s.id; -- Use merge join for multiple tables SELECT /*+ MERGEJOIN(e, s, d) */ * FROM employees e JOIN students s ON e.id = s.id JOIN departments d ON e.dept_id = d.id;
SHUFFLE_HASH
Suggests that AWS Clean Rooms use shuffle hash join. If both sides have the shuffle hash hints, the query optimizer chooses the smaller side (based on stats) as the build side.
Parameters: Table identifiers (optional)
Examples:
-- Use shuffle hash join SELECT /*+ SHUFFLE_HASH(students) */ * FROM employees e JOIN students s ON e.id = s.id;
SHUFFLE_REPLICATE_NL
Suggests that AWS Clean Rooms use shuffle-and-replicate nested loop join.
Parameters: Table identifiers (optional)
Examples:
-- Use shuffle-replicate nested loop join SELECT /*+ SHUFFLE_REPLICATE_NL(students) */ * FROM employees e JOIN students s ON e.id = s.id;
Troubleshooting Hints in Spark SQL
The following table shows common scenarios where hints are not applied in SparkSQL. For additional information, see Considerations and limitations.
| Use Case | Example Query |
|---|---|
| Table reference not found |
|
| Table not participating in join operation |
|
| Table reference in nested subquery |
|
| Column name instead of table reference |
|
| Hint without required parameters |
|
| Base table name instead of table alias |
|