

**Note**  
Amazon Redshift will no longer support the creation of new Python UDFs starting with Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/).

# Amazon Redshift best practices for designing tables

As you plan your database, certain key table design decisions heavily influence overall query performance. These choices also have a significant effect on storage requirements, which in turn affects query performance: tables that use less storage require fewer I/O operations and less memory to process queries.

In this section, you can find a summary of the most important design decisions and best practices for optimizing query performance. [Automatic table optimization](t_Creating_tables.md) provides more detailed explanations and examples of table design options.

**Topics**
+ [Choose the best sort key](c_best-practices-sort-key.md)
+ [Choose the best distribution style](c_best-practices-best-dist-key.md)
+ [Let COPY choose compression encodings](c_best-practices-use-auto-compression.md)
+ [Define primary key and foreign key constraints](c_best-practices-defining-constraints.md)
+ [Use the smallest possible column size](c_best-practices-smallest-column-size.md)
+ [Use date/time data types for date columns](c_best-practices-timestamp-date-columns.md)

# Choose the best sort key

Amazon Redshift stores your data on disk in sorted order according to the sort key. The Amazon Redshift query optimizer uses sort order when it determines optimal query plans. 

**Note**  
When you use automatic table optimization, you don't need to choose the sort key of your table. For more information, see [Automatic table optimization](t_Creating_tables.md).

Some suggestions for the best approach follow:
+ To have Amazon Redshift choose the appropriate sort order, specify `AUTO` for the sort key. 
+ If recent data is queried most frequently, specify the timestamp column as the leading column for the sort key. 

  Queries are more efficient because they can skip entire blocks that fall outside the time range.
+ If you do frequent range filtering or equality filtering on one column, specify that column as the sort key. 

   Amazon Redshift can skip reading entire blocks of data for that column. It can do so because it tracks the minimum and maximum column values stored on each block and can skip blocks that don't apply to the predicate range.
+ If you frequently join a table, specify the join column as both the sort key and the distribution key. 

  Doing this enables the query optimizer to choose a sort merge join instead of a slower hash join. Because the data is already sorted on the join key, the query optimizer can bypass the sort phase of the sort merge join.
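The suggestions above can be sketched in DDL. The table and column names here are hypothetical, chosen only to illustrate each option:

```sql
-- Option 1: let Amazon Redshift manage the sort order.
CREATE TABLE sales (
    sale_id     BIGINT,
    sold_at     TIMESTAMP,
    customer_id INTEGER,
    amount      DECIMAL(12,2)
)
SORTKEY AUTO;

-- Option 2: recent data is queried most often,
-- so lead the sort key with the timestamp column.
CREATE TABLE sales_by_time (
    sale_id     BIGINT,
    sold_at     TIMESTAMP,
    customer_id INTEGER,
    amount      DECIMAL(12,2)
)
SORTKEY (sold_at);

-- Option 3: the table is joined frequently on customer_id, so make the
-- join column both the sort key and the distribution key.
CREATE TABLE sales_by_customer (
    sale_id     BIGINT,
    sold_at     TIMESTAMP,
    customer_id INTEGER,
    amount      DECIMAL(12,2)
)
DISTKEY (customer_id)
SORTKEY (customer_id);
```

With option 2, a predicate such as `WHERE sold_at >= '2025-01-01'` lets Amazon Redshift skip blocks whose minimum and maximum `sold_at` values fall outside the range.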

# Choose the best distribution style

When you run a query, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of the redistribution step by locating the data where it needs to be before the query is run. 

**Note**  
When you use automatic table optimization, you don't need to choose the distribution style of your table. For more information, see [Automatic table optimization](t_Creating_tables.md).

Some suggestions for the best approach follow:

1. Distribute the fact table and one dimension table on their common columns.

   Your fact table can have only one distribution key. Any tables that join on another key aren't collocated with the fact table. Choose one dimension to collocate based on how frequently it is joined and the size of the joining rows. Designate both the dimension table's primary key and the fact table's corresponding foreign key as the DISTKEY. 

1. Choose the largest dimension based on the size of the filtered dataset. 

   Only the rows that are used in the join must be distributed, so consider the size of the dataset after filtering, not the size of the table. 

1. Choose a column with high cardinality in the filtered result set. 

   If you distribute a sales table on a date column, for example, you should probably get fairly even data distribution, unless most of your sales are seasonal. However, if you commonly use a range-restricted predicate to filter for a narrow date period, most of the filtered rows occur on a limited set of slices and the query workload is skewed. 

1. Change some dimension tables to use ALL distribution.

   If a dimension table cannot be collocated with the fact table or other important joining tables, you can improve query performance significantly by distributing the entire table to all of the nodes. Using ALL distribution multiplies storage space requirements and increases load times and maintenance operations, so you should weigh all factors before choosing ALL distribution.

To have Amazon Redshift choose the appropriate distribution style, specify `AUTO` for the distribution style. 
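As a sketch of these guidelines, consider a hypothetical star schema. The fact table and its most frequently joined dimension share a DISTKEY so their joining rows are collocated, a dimension that can't be collocated uses ALL distribution, and a remaining table uses AUTO:

```sql
-- Collocate the fact table with its most frequently joined dimension
-- by distributing both on the common join column.
CREATE TABLE dim_customer (
    customer_id   INTEGER,
    customer_name VARCHAR(100)
)
DISTKEY (customer_id);

CREATE TABLE fact_sales (
    sale_id     BIGINT,
    customer_id INTEGER,        -- joins to dim_customer
    product_id  INTEGER,        -- joins to dim_product
    amount      DECIMAL(12,2)
)
DISTKEY (customer_id);

-- dim_product joins on a different key and can't be collocated,
-- so replicate the entire table to every node.
CREATE TABLE dim_product (
    product_id   INTEGER,
    product_name VARCHAR(100)
)
DISTSTYLE ALL;

-- Let Amazon Redshift choose the distribution style for the rest.
CREATE TABLE dim_region (
    region_id   INTEGER,
    region_name VARCHAR(50)
)
DISTSTYLE AUTO;
```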

For more information about choosing distribution styles, see [Data distribution for query optimization](t_Distributing_data.md).

# Let COPY choose compression encodings

You can specify compression encodings when you create a table, but in most cases, automatic compression produces the best results.

ENCODE AUTO is the default for tables. When a table is set to ENCODE AUTO, Amazon Redshift automatically manages compression encoding for all columns in the table. For more information, see [CREATE TABLE](r_CREATE_TABLE_NEW.md) and [ALTER TABLE](r_ALTER_TABLE.md).

The COPY command analyzes your data and applies compression encodings to an empty table automatically as part of the load operation. 

Automatic compression balances overall performance when choosing compression encodings. Range-restricted scans might perform poorly if sort key columns are compressed much more highly than other columns in the same query. As a result, automatic compression chooses a less efficient compression encoding to keep the sort key columns balanced with other columns.

Suppose that your table's sort key is a date or timestamp and the table uses many large varchar columns. In this case, you might get better performance by not compressing the sort key column at all. Run the [ANALYZE COMPRESSION](r_ANALYZE_COMPRESSION.md) command on the table, then use the encodings to create a new table, but leave out the compression encoding for the sort key.

There is a performance cost for automatic compression encoding, but only if the table is empty and does not already have compression encoding. For short-lived tables and tables that you create frequently, such as staging tables, load the table once with automatic compression or run the ANALYZE COMPRESSION command. Then use those encodings to create new tables. You can add the encodings to the CREATE TABLE statement, or use CREATE TABLE LIKE to create a new table with the same encoding. 
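A possible workflow for short-lived tables, using hypothetical table names and encodings (run ANALYZE COMPRESSION once, then reuse its suggestions, leaving the sort key uncompressed):

```sql
-- Measure suggested encodings on a table that already holds representative data.
ANALYZE COMPRESSION staging_sales;

-- Apply the suggested encodings explicitly, but leave the sort key column raw
-- so range-restricted scans stay balanced with the other columns.
CREATE TABLE sales_reporting (
    sold_at TIMESTAMP ENCODE RAW,      -- sort key left uncompressed
    notes   VARCHAR(5000) ENCODE LZO,
    amount  DECIMAL(12,2) ENCODE AZ64
)
SORTKEY (sold_at);

-- CREATE TABLE LIKE copies the column encodings from the existing table,
-- so later staging tables skip the compression analysis entirely.
CREATE TABLE sales_staging (LIKE sales_reporting);
```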

For more information, see [Loading tables with automatic compression](c_Loading_tables_auto_compress.md).

# Define primary key and foreign key constraints

Define primary key and foreign key constraints between tables wherever appropriate. Even though they are informational only, the query optimizer uses those constraints to generate more efficient query plans.

However, do not define primary key and foreign key constraints unless your application enforces them. Amazon Redshift does not enforce unique, primary-key, or foreign-key constraints, and because the query optimizer assumes that declared constraints are valid, queries can return incorrect results if the data violates them.
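For example, informational constraints on a hypothetical pair of tables might look like this. The syntax is ordinary DDL; the point is that the optimizer uses the declarations for planning while your load process remains responsible for keeping the data valid:

```sql
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,   -- informational only, not enforced
    customer_name VARCHAR(100)
);

CREATE TABLE fact_sales (
    sale_id     BIGINT PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer (customer_id),
    amount      DECIMAL(12,2)
);
```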

See [Table constraints](t_Defining_constraints.md) for additional information about how Amazon Redshift uses constraints.

# Use the smallest possible column size

Don't make it a practice to use the maximum column size for convenience. 

Instead, consider the largest values you are likely to store in your columns and size them accordingly. For instance, a CHAR column for storing U.S. state and territory abbreviations used by the post office only needs to be CHAR(2).
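A minimal sketch of right-sizing, using a hypothetical table:

```sql
CREATE TABLE addresses (
    state CHAR(2),   -- two-letter postal abbreviation; no need for VARCHAR(256)
    zip   CHAR(5)    -- five-digit ZIP code
);
```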

# Use date/time data types for date columns

Amazon Redshift stores DATE and TIMESTAMP data more efficiently than CHAR or VARCHAR, which results in better query performance. Use the DATE or TIMESTAMP data type, depending on the resolution you need, rather than a character type when storing date/time information. For more information, see [Datetime types](r_Datetime_types.md).
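For example, a hypothetical events table would declare its date and time columns with native types rather than character types:

```sql
CREATE TABLE events (
    event_date DATE,        -- rather than CHAR(10) holding 'YYYY-MM-DD'
    event_ts   TIMESTAMP    -- rather than VARCHAR holding a formatted string
);
```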