# Use partition projection with Amazon Athena

You can use partition projection in Athena to speed up query processing of highly partitioned tables and automate partition management.

In partition projection, Athena calculates partition values and locations using the table properties that you configure directly on your table in AWS Glue. The table properties allow Athena to 'project', or determine, the necessary partition information instead of having to do a more time-consuming metadata lookup in the AWS Glue Data Catalog. Because in-memory operations are often faster than remote operations, partition projection can reduce the runtime of queries against highly partitioned tables. Depending on the specific characteristics of the query and underlying data, partition projection can significantly reduce query runtime for queries that are constrained on partition metadata retrieval.

## Understand partition pruning vs. partition projection


Partition pruning gathers metadata and "prunes" it to only the partitions that apply to your query. This often speeds up queries. Athena uses partition pruning for all tables with partition columns, including those tables configured for partition projection.

Normally, when processing queries, Athena makes a `GetPartitions` call to the AWS Glue Data Catalog before performing partition pruning. If a table has a large number of partitions, using `GetPartitions` can affect performance negatively. To avoid this, you can use partition projection. Partition projection allows Athena to avoid calling `GetPartitions` because the partition projection configuration gives Athena all of the necessary information to build the partitions itself.

## How to use partition projection


To use partition projection, you specify the ranges of partition values and projection types for each partition column in the table properties in the AWS Glue Data Catalog or in your [external Hive metastore](connect-to-data-source-hive.md). These custom properties on the table allow Athena to know what partition patterns to expect when it runs a query on the table. During query execution, Athena uses this information to project the partition values instead of retrieving them from the AWS Glue Data Catalog or external Hive metastore. This not only reduces query execution time but also automates partition management because it removes the need to manually create partitions in Athena, AWS Glue, or your external Hive metastore.

**Important**  
Enabling partition projection on a table causes Athena to ignore any partition metadata registered to the table in the AWS Glue Data Catalog or Hive metastore.

## Some use cases


Scenarios in which partition projection is useful include the following:
+ Queries against a highly partitioned table do not complete as quickly as you would like.
+ You regularly add partitions to tables as new date or time partitions are created in your data. With partition projection, you configure relative date ranges that can be used as new data arrives. 
+ You have highly partitioned data in Amazon S3. The data is impractical to model in your AWS Glue Data Catalog or Hive metastore, and your queries read only small parts of it.

### Projectable partition structures


Partition projection is most easily configured when your partitions follow a predictable pattern such as, but not limited to, the following:
+ **Integers** – Any continuous sequence of integers such as `[1, 2, 3, 4, ..., 1000]` or `[0500, 0550, 0600, ..., 2500]`.
+ **Dates** – Any continuous sequence of dates or datetimes such as `[20200101, 20200102, ..., 20201231]` or `[1-1-2020 00:00:00, 1-1-2020 01:00:00, ..., 12-31-2020 23:00:00]`.
+ **Enumerated values** – A finite set of enumerated values such as airport codes or AWS Regions.
+ **AWS service logs** – AWS service logs typically have a known structure whose partition scheme you can specify in AWS Glue and that Athena can therefore use for partition projection.

### How to customize the partition path template


By default, Athena builds partition locations using the form `s3://amzn-s3-demo-bucket/<table-root>/partition-col-1=<partition-col-1-val>/partition-col-2=<partition-col-2-val>/`, but if your data is organized differently, Athena offers a mechanism for customizing this path template. For steps, see [How to specify custom S3 storage locations](partition-projection-setting-up.md#partition-projection-specifying-custom-s3-storage-locations).

## Considerations and limitations


The following considerations apply:
+ Partition projection eliminates the need to specify partitions manually in AWS Glue or an external Hive metastore.
+ When you enable partition projection on a table, Athena ignores any partition metadata in the AWS Glue Data Catalog or external Hive metastore for that table.
+ If a projected partition does not exist in Amazon S3, Athena will still project the partition. Athena does not throw an error, but no data is returned. However, if too many of your partitions are empty, performance can be slower compared to traditional AWS Glue partitions. If more than half of your projected partitions are empty, it is recommended that you use traditional partitions.
+ Queries for values that are beyond the range bounds defined for partition projection do not return an error. Instead, the query runs, but returns zero rows. For example, if you have time-related data that starts in 2020 and is defined as `'projection.timestamp.range'='2020/01/01,NOW'`, a query like `SELECT * FROM table-name WHERE timestamp = '2019/02/02'` will complete successfully, but return zero rows.
+ Partition projection is usable only when the table is queried through Athena. If the same table is read through another service such as Amazon Redshift Spectrum, Athena for Spark, or Amazon EMR, the standard partition metadata is used.
+ Because partition projection is a DML-only feature, `SHOW PARTITIONS` does not list partitions that are projected by Athena but not registered in the AWS Glue catalog or external Hive metastore. 
+ Athena does not use the table properties of views as configuration for partition projection. To work around this limitation, configure and enable partition projection in the table properties for the tables that the views reference.

## Video


The following video shows how to use partition projection to improve the performance of your queries in Athena.

[![AWS Videos](http://img.youtube.com/vi/iUD5pPpcyZk/0.jpg)](http://www.youtube.com/watch?v=iUD5pPpcyZk)


**Topics**
+ [Understand partition pruning vs. partition projection](#partition-projection-pruning-vs-projection)
+ [How to use partition projection](#partition-projection-using)
+ [Some use cases](#partition-projection-use-cases)
+ [Considerations and limitations](#partition-projection-considerations-and-limitations)
+ [Video](#partition-projection-video)
+ [Set up partition projection](partition-projection-setting-up.md)
+ [Supported types for partition projection](partition-projection-supported-types.md)
+ [Use dynamic ID partitioning](partition-projection-dynamic-id-partitioning.md)
+ [Amazon Data Firehose example](partition-projection-kinesis-firehose-example.md)

# Set up partition projection


Setting up partition projection in a table's properties is a two-step process:

1. Specify the data ranges and relevant patterns for each partition column, or use a custom template.

1. Enable partition projection for the table.

**Note**  
Before you add partition projection properties to an existing table, the partition column for which you are setting up partition projection properties must already exist in the table schema. If the partition column does not yet exist, you must add a partition column to the existing table manually. AWS Glue does not perform this step for you automatically. 

This section shows how to set the table properties for AWS Glue. To set them, you can use the AWS Glue console, Athena [CREATE TABLE](create-table.md) queries, or [AWS Glue API](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html) operations. The following procedure shows how to set the properties in the AWS Glue console.

**To configure and enable partition projection using the AWS Glue console**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. Choose the **Tables** tab.

   On the **Tables** tab, you can edit existing tables, or choose **Add tables** to create new ones. For information about adding tables manually or with a crawler, see [Working with tables on the AWS Glue console](https://docs.aws.amazon.com/glue/latest/dg/console-tables.html) in the *AWS Glue Developer Guide*.

1. In the list of tables, choose the link for the table that you want to edit.  
![\[In the AWS Glue console, choose a table to edit.\]](http://docs.aws.amazon.com/athena/latest/ug/images/partition-projection-1.png)

1. Choose **Actions**, **Edit table**.

1. On the **Edit table** page, in the **Table properties** section, for each partitioned column, add the following key-value pair:

   1. For **Key**, add `projection.columnName.type`.

   1. For **Value**, add one of the supported types: `enum`, `integer`, `date`, or `injected`. For more information, see [Supported types for partition projection](partition-projection-supported-types.md).

1. Following the guidance in [Supported types for partition projection](partition-projection-supported-types.md), add additional key-value pairs according to your configuration requirements.

   The following example table configuration configures the `year` column for partition projection, restricting the values that can be returned to a range from 2010 through 2016.  
![\[Configuring partition projection for a partition column in the AWS Glue console table properties.\]](http://docs.aws.amazon.com/athena/latest/ug/images/partition-projection-3.png)

1. Add a key-value pair to enable partition projection. For **Key**, enter `projection.enabled`, and for its **Value**, enter `true`.
**Note**  
You can disable partition projection on this table at any time by setting `projection.enabled` to `false`.

1. When you are finished, choose **Save**.

1. In the Athena Query Editor, test query the columns that you configured for the table.

   The following example query uses `SELECT DISTINCT` to return the unique values from the `year` column. The database contains data from 1987 to 2016, but the `projection.year.range` property restricts the values returned to the years 2010 to 2016.  
![\[Querying a column that uses partition projection.\]](http://docs.aws.amazon.com/athena/latest/ug/images/partition-projection-5.png)
**Note**  
If you set `projection.enabled` to `true` but fail to configure one or more partition columns, you receive an error message like the following:  
`HIVE_METASTORE_ERROR: Table database_name.table_name is configured for partition projection, but the following partition columns are missing projection configuration: [column_name] (table database_name.table_name)`.

## How to specify custom S3 storage locations


When you edit table properties in AWS Glue, you can also specify a custom Amazon S3 path template for the projected partitions. A custom template enables Athena to properly map partition values to custom Amazon S3 file locations that do not follow a typical `.../column=value/...` pattern. 

Using a custom template is optional. However, if you use a custom template, the template must contain a placeholder for each partition column. Templated locations must end with a forward slash so that the partitioned data files live in a "folder" per partition.

**To specify a custom partition location template**

1. Following the steps to [configure and enable partition projection using the AWS Glue console](#partition-projection-setting-up-procedure), add an additional key-value pair that specifies a custom template as follows:

   1. For **Key**, enter `storage.location.template`.

   1. For **Value**, specify a location that includes a placeholder for every partition column. Make sure that each placeholder (and the S3 path itself) is terminated by a single forward slash.

      The following example template values assume a table with partition columns `a`, `b`, and `c`.

      ```
      s3://amzn-s3-demo-bucket/table_root/a=${a}/${b}/some_static_subdirectory/${c}/      
      ```

      ```
      s3://amzn-s3-demo-bucket/table_root/c=${c}/${b}/some_static_subdirectory/${a}/${b}/${c}/${c}/      
      ```

      For the same table, the following example template value is invalid because it contains no placeholder for column `c`.

      ```
      s3://amzn-s3-demo-bucket/table_root/a=${a}/${b}/some_static_subdirectory/         
      ```

1. Choose **Apply**.
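
The two template rules above (one placeholder per partition column, and a trailing forward slash) can be checked before you save the table properties. The following is a minimal sketch, not part of Athena or AWS Glue: the `check_location_template` helper is hypothetical and only models the rules as this section states them.

```python
import re

# Matches ${column} placeholders in a storage.location.template value.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def check_location_template(template, partition_columns):
    """Return a list of problems found in a template (empty if none)."""
    problems = []
    used = set(PLACEHOLDER.findall(template))
    for column in partition_columns:
        if column not in used:
            problems.append(f"no placeholder for column {column!r}")
    if not template.endswith("/"):
        problems.append("template does not end with a forward slash")
    return problems

valid = "s3://amzn-s3-demo-bucket/table_root/a=${a}/${b}/some_static_subdirectory/${c}/"
invalid = "s3://amzn-s3-demo-bucket/table_root/a=${a}/${b}/some_static_subdirectory/"

print(check_location_template(valid, ["a", "b", "c"]))    # []
print(check_location_template(invalid, ["a", "b", "c"]))  # ["no placeholder for column 'c'"]
```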

# Supported types for partition projection


A table can have any combination of `enum`, `integer`, `date`, or `injected` partition column types.

## Enum type


Use the `enum` type for partition columns whose values are members of an enumerated set (for example, airport codes or AWS Regions).

Define the partition properties in the table as follows:



| Property name | Example values | Description | 
| --- | --- | --- | 
| projection.columnName.type |  `enum`  | Required. The projection type to use for column columnName. The value must be enum (case insensitive) to signal the use of the enum type. Leading and trailing white space is allowed. | 
| projection.columnName.values |  `A,B,C,D,E,F,G,Unknown`  | Required. A comma-separated list of enumerated partition values for column columnName. Any white space is considered part of an enum value. | 

**Note**  
As a best practice, we recommend limiting `enum`-based partition projections to a few dozen values or fewer. Although there is no specific limit for `enum` projections, the total size of your table's metadata cannot exceed the AWS Glue limit of about 1 MB when gzip compressed. This limit is shared across key parts of your table, such as column names, location, and storage format. If you find yourself using more than a few dozen unique IDs in your `enum` projection, consider an alternative approach such as bucketing into a smaller number of unique values in a surrogate field. By trading off cardinality, you can control the number of unique values in your `enum` field. 
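
One way to picture the surrogate-field idea in the preceding note is to hash each high-cardinality ID into a small, fixed number of buckets and partition on the bucket label instead of the ID. The sketch below is only an illustration: the `bucket_for` helper, the use of CRC32, the `b00`-style labels, and the bucket count of 16 are all assumptions, not something Athena or AWS Glue prescribes.

```python
import zlib

BUCKET_COUNT = 16  # assumed value; small enough to list in an enum projection

def bucket_for(identifier, bucket_count=BUCKET_COUNT):
    """Map an arbitrary ID to a stable bucket label such as 'b07'."""
    checksum = zlib.crc32(identifier.encode("utf-8"))
    return f"b{checksum % bucket_count:02d}"

# The enum projection then lists only the bucket labels, not every ID.
enum_values = ",".join(f"b{n:02d}" for n in range(BUCKET_COUNT))
print(enum_values)
print(bucket_for("customer-1234"))
```

With a scheme like this, the process that writes the data would have to compute the same bucket for each record, and queries would filter on both the bucket column and the original ID column.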

## Integer type


Use the integer type for partition columns whose possible values are interpretable as integers within a defined range. Projected integer columns are currently limited to the range of a Java signed long (-2^63 to 2^63-1 inclusive).



| Property name | Example values | Description | 
| --- | --- | --- | 
| projection.columnName.type |  `integer`  | Required. The projection type to use for column columnName. The value must be integer (case insensitive) to signal the use of the integer type. Leading and trailing white space is allowed. | 
| projection.columnName.range |  `0,10` `-1,8675309` `0001,9999`  | Required. A two-element comma-separated list that provides the minimum and maximum range values to be returned by queries on the column columnName. Note that the values must be separated by a comma, not a hyphen. These values are inclusive, can be negative, and can have leading zeroes. Leading and trailing white space is allowed. | 
| projection.columnName.interval |  `1` `5`  | Optional. A positive integer that specifies the interval between successive partition values for the column columnName. For example, a range value of "1,3" with an interval value of "1" produces the values 1, 2, and 3. The same range value with an interval value of "2" produces the values 1 and 3, skipping 2. Leading and trailing white space is allowed. The default is 1. | 
| projection.columnName.digits |  `1` `5`  | Optional. A positive integer that specifies the number of digits to include in the partition value's final representation for column columnName. For example, a range value of "1,3" that has a digits value of "1" produces the values 1, 2, and 3. The same range value with a digits value of "2" produces the values 01, 02, and 03. Leading and trailing white space is allowed. The default is no static number of digits and no leading zeroes. | 
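
To see how `range`, `interval`, and `digits` combine, the following sketch enumerates the values that the descriptions in the table imply. It is a rough model written for this guide, not Athena's implementation; the `project_integers` function is hypothetical and takes the property values as strings, the way they appear in table properties.

```python
def project_integers(range_value, interval="1", digits=None):
    """Enumerate partition values for an integer-typed projection column."""
    low_text, high_text = (part.strip() for part in range_value.split(","))
    low, high = int(low_text), int(high_text)  # leading zeroes are accepted
    step = int(interval)
    values = []
    for number in range(low, high + 1, step):  # the range is inclusive
        text = str(number)
        if digits is not None:
            text = text.zfill(int(digits))     # pad with leading zeroes
        values.append(text)
    return values

print(project_integers("1,3"))                # ['1', '2', '3']
print(project_integers("1,3", interval="2"))  # ['1', '3']
print(project_integers("1,3", digits="2"))    # ['01', '02', '03']
```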

## Date type


Use the date type for partition columns whose values are interpretable as dates (with optional times) within a defined range.

**Important**  
Projected date columns are generated in Coordinated Universal Time (UTC) at query execution time.



| Property name | Example values | Description | 
| --- | --- | --- | 
| projection.columnName.type |  `date`  | Required. The projection type to use for column columnName. The value must be date (case insensitive) to signal the use of the date type. Leading and trailing white space is allowed. | 
| projection.columnName.range |  `201701,201812` `01-01-2010,12-31-2018` `NOW-3YEARS,NOW` `201801,NOW+1MONTH`  |  Required. A two-element, comma-separated list which provides the minimum and maximum `range` values for the column *columnName*. These values are inclusive and can use any format compatible with the Java `java.time.*` date types. Both the minimum and maximum values must use the same format. The format specified in the `.format` property must be the format used for these values. This column can also contain relative date strings, formatted in this regular expression pattern: `\s*NOW\s*(([\+\-])\s*([0-9]+)\s*(YEARS?\|MONTHS?\|WEEKS?\|DAYS?\|HOURS?\|MINUTES?\|SECONDS?)\s*)?` White spaces are allowed, but in date literals are considered part of the date strings themselves.  | 
| projection.columnName.format |  `yyyyMM` `dd-MM-yyyy` `dd-MM-yyyy-HH-mm-ss`  | Required. A date format string based on the Java date format [DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). Can be any supported `java.time.*` type. | 
| projection.columnName.interval |  `1` `5`  |  A positive integer that specifies the interval between successive partition values for column *columnName*. For example, a `range` value of `2017-01,2018-12` with an `interval` value of `1` and an `interval.unit` value of `MONTHS` produces the values 2017-01, 2017-02, 2017-03, and so on. The same `range` value with an `interval` value of `2` and an `interval.unit` value of `MONTHS` produces the values 2017-01, 2017-03, 2017-05, and so on. Leading and trailing white space is allowed. When the provided dates are at single-day or single-month precision, the `interval` is optional and defaults to 1 day or 1 month, respectively. Otherwise, `interval` is required.  | 
| projection.columnName.interval.unit |  `YEARS` `MONTHS` `WEEKS` `DAYS` `HOURS` `MINUTES` `SECONDS` `MILLIS`  |  A time unit word that represents the serialized form of a [ChronoUnit](https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoUnit.html). Possible values are `YEARS`, `MONTHS`, `WEEKS`, `DAYS`, `HOURS`, `MINUTES`, `SECONDS`, or `MILLIS`. These values are case insensitive. When the provided dates are at single-day or single-month precision, the `interval.unit` is optional and defaults to 1 day or 1 month, respectively. Otherwise, the `interval.unit` is required.  | 

**Example – Partitioning by month**  
The following example table configuration partitions data by month from 2015 to the present.  

```
'projection.month.type'='date', 
'projection.month.format'='yyyy-MM', 
'projection.month.interval'='1', 
'projection.month.interval.unit'='MONTHS', 
'projection.month.range'='2015-01,NOW', 
...
```
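
As a rough picture of what this configuration produces, the following sketch lists the monthly values between the start of the range and the current time. It is an illustration only: the `project_months` function is hypothetical, it handles just the `yyyy-MM` format with a one-month interval, and it assumes that the month containing `NOW` is included because range values are inclusive. The current time is passed in explicitly so that the output is reproducible.

```python
from datetime import datetime, timezone

def project_months(start, now):
    """List 'yyyy-MM' values from start through the month that contains now."""
    year, month = (int(part) for part in start.split("-"))
    values = []
    while (year, month) <= (now.year, now.month):
        values.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return values

# A fixed stand-in for NOW; Athena evaluates NOW in UTC at query time.
fixed_now = datetime(2015, 4, 10, tzinfo=timezone.utc)
print(project_months("2015-01", fixed_now))
# ['2015-01', '2015-02', '2015-03', '2015-04']
```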

## Injected type


Use the injected type for partition columns with possible values that cannot be procedurally generated within some logical range but that are provided in a query's `WHERE` clause as a single value.

It is important to keep in mind the following points:
+ Queries on injected columns fail if a filter expression is not provided for each injected column.
+ Queries with multiple values for a filter expression on an injected column succeed only if the values are disjunct.
+ Only columns of `string` type are supported.
+ When you use the `WHERE IN` clause with an injected partition column, there is a limit of 1,000 values that you can specify in the `IN` list. To query a dataset with more than 1,000 partitions for an injected column, split the query into multiple smaller queries, each with up to 1,000 values in the `WHERE IN` clause, and then aggregate the results.



| Property name | Value | Description | 
| --- | --- | --- | 
| projection.columnName.type |  `injected`  | Required. The projection type to use for the column columnName. Only the string type is supported. The value specified must be injected (case insensitive). Leading and trailing white space is allowed. | 

For more information, see [When to use the `injected` projection type](partition-projection-dynamic-id-partitioning.md#partition-projection-injection).
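
The 1,000-value limit on `IN` lists described earlier in this section can be handled on the client side by splitting the values into batches and generating one query per batch. The following sketch shows one possible approach; the helper names, the table name `device_events`, and the column name `device_id` are placeholders chosen for illustration, and running the generated queries and combining their results is left to your own tooling.

```python
def chunk_values(values, limit=1000):
    """Split a list into consecutive batches of at most `limit` items."""
    return [values[i:i + limit] for i in range(0, len(values), limit)]

def build_queries(table, column, values, limit=1000):
    """Build one COUNT query per batch of injected partition values."""
    queries = []
    for chunk in chunk_values(values, limit):
        # Quote each value as a SQL string literal, doubling single quotes.
        in_list = ", ".join("'" + v.replace("'", "''") + "'" for v in chunk)
        queries.append(
            f"SELECT COUNT(*) FROM {table} WHERE {column} IN ({in_list})"
        )
    return queries

device_ids = [f"device-{n}" for n in range(2500)]
queries = build_queries("device_events", "device_id", device_ids)
print(len(queries))  # 3
```

For a count, the per-query results can be added together afterward; other aggregations might need a different way of combining the partial results.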

# Use dynamic ID partitioning


When your data is partitioned by a property with high cardinality or when the values cannot be known in advance, you can use the `injected` projection type. Examples of such properties are user names, and IDs of devices or products. When you use the `injected` projection type to configure a partition key, Athena uses values from the query itself to compute the set of partitions that will be read.

For Athena to be able to run a query on a table that has a partition key configured with the `injected` projection type, the following must be true:
+ Your query must include at least one value for the partition key.
+ The value(s) must be literals or expressions that can be evaluated without reading any data.

If any of these criteria are not met, your query fails with the following error:

CONSTRAINT\_VIOLATION: For the injected projected partition column *column\_name*, the WHERE clause must contain only static equality conditions, and at least one such condition must be present.

## When to use the `injected` projection type


Imagine you have a data set that consists of events from IoT devices, partitioned on the devices' IDs. This data set has the following characteristics:
+ The device IDs are generated randomly.
+ New devices are provisioned frequently.
+ There are currently hundreds of thousands of devices, and in the future there will be millions.

This data set is difficult to manage using traditional metastores. It is difficult to keep the partitions in sync between the data storage and the metastore, and filtering partitions can be slow during query planning. But if you configure a table to use partition projection and use the `injected` projection type, you have two advantages: you don't have to manage partitions in the metastore, and your queries don't have to look up partition metadata.

The following `CREATE TABLE` example creates a table for the device event data set just described. The table uses the injected projection type.

```
CREATE EXTERNAL TABLE device_events (
  event_time TIMESTAMP,
  data STRING,
  battery_level INT
)
PARTITIONED BY (
  device_id STRING
)
LOCATION "s3://amzn-s3-demo-bucket/prefix/"
TBLPROPERTIES (
  "projection.enabled" = "true",
  "projection.device_id.type" = "injected",
  "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/${device_id}"
)
```

The following example query looks up the number of events received from three specific devices over the course of 12 hours.

```
SELECT device_id, COUNT(*) AS events
FROM device_events
WHERE device_id IN (
  '4a770164-0392-4a41-8565-40ed8cec737e',
  'f71d12cf-f01f-4877-875d-128c23cbde17',
  '763421d8-b005-47c3-ba32-cc747ab32f9a'
)
AND event_time BETWEEN TIMESTAMP '2023-11-01 20:00' AND TIMESTAMP '2023-11-02 08:00'
GROUP BY device_id
```

When you run this query, Athena sees the three values for the `device_id` partition key and uses them to compute the partition locations. Athena uses the value for the `storage.location.template` property to generate the following locations:
+ `s3://amzn-s3-demo-bucket/prefix/4a770164-0392-4a41-8565-40ed8cec737e`
+ `s3://amzn-s3-demo-bucket/prefix/f71d12cf-f01f-4877-875d-128c23cbde17`
+ `s3://amzn-s3-demo-bucket/prefix/763421d8-b005-47c3-ba32-cc747ab32f9a`

If you leave out the `storage.location.template` property from the partition projection configuration, Athena uses Hive-style partitioning to project partition locations based on the value in `LOCATION` (for example, `s3://amzn-s3-demo-bucket/prefix/device_id=4a770164-0392-4a41-8565-40ed8cec737e`).
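
The two cases (with and without `storage.location.template`) can be summarized in a few lines. This is a simplified model of the behavior described above rather than Athena's code; the `project_location` function is hypothetical and handles a single partition column.

```python
def project_location(table_location, column, value, template=None):
    """Return the location projected for one partition value."""
    if template is not None:
        # Substitute the value for the ${column} placeholder.
        return template.replace("${" + column + "}", value)
    # No template: fall back to Hive-style column=value under LOCATION.
    return f"{table_location.rstrip('/')}/{column}={value}"

location = "s3://amzn-s3-demo-bucket/prefix/"
template = "s3://amzn-s3-demo-bucket/prefix/${device_id}"
device = "4a770164-0392-4a41-8565-40ed8cec737e"

print(project_location(location, "device_id", device, template))
# s3://amzn-s3-demo-bucket/prefix/4a770164-0392-4a41-8565-40ed8cec737e
print(project_location(location, "device_id", device))
# s3://amzn-s3-demo-bucket/prefix/device_id=4a770164-0392-4a41-8565-40ed8cec737e
```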

# Amazon Data Firehose example


When you use Firehose to deliver data to Amazon S3, the default configuration writes objects with keys that look like the following example:

```
s3://amzn-s3-demo-bucket/prefix/yyyy/MM/dd/HH/file.extension
```

To create an Athena table that finds the partitions automatically at query time, instead of having to add them to the AWS Glue Data Catalog as new data arrives, you can use partition projection.

The following `CREATE TABLE` example uses the default Firehose configuration.

```
CREATE EXTERNAL TABLE my_ingested_data (
 ...
)
...
PARTITIONED BY (
 datehour STRING
)
LOCATION "s3://amzn-s3-demo-bucket/prefix/"
TBLPROPERTIES (
 "projection.enabled" = "true",
 "projection.datehour.type" = "date",
 "projection.datehour.format" = "yyyy/MM/dd/HH",
 "projection.datehour.range" = "2021/01/01/00,NOW",
 "projection.datehour.interval" = "1",
 "projection.datehour.interval.unit" = "HOURS",
 "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/${datehour}/"
)
```

The `TBLPROPERTIES` clause in the `CREATE TABLE` statement tells Athena the following:
+ Use partition projection when querying the table.
+ The partition key `datehour` is of type `date` (which includes an optional time).
+ How the dates are formatted.
+ The range of date times. Note that the values must be separated by a comma, not a hyphen.
+ Where to find the data on Amazon S3.

When you query the table, Athena calculates the values for `datehour` and uses the storage location template to generate a list of partition locations.

**Topics**
+ [How to use the `date` type](partition-projection-kinesis-firehose-example-using-the-date-type.md)
+ [How to choose partition keys](partition-projection-kinesis-firehose-example-choosing-partition-keys.md)
+ [How to use custom prefixes and dynamic partitioning](partition-projection-kinesis-firehose-example-using-custom-prefixes-and-dynamic-partitioning.md)

# How to use the `date` type


When you use the `date` type for a projected partition key, you must specify a range. Because you have no data for dates before the Firehose delivery stream was created, you can use the date of creation as the start. And because you do not have data for dates in the future, you can use the special token `NOW` as the end.

In the `CREATE TABLE` example, the start date is specified as January 1, 2021 at midnight UTC.

**Note**  
Configure a range that matches your data as closely as possible so that Athena looks only for existing partitions.

When a query is run on the sample table, Athena uses the conditions on the `datehour` partition key in combination with the range to generate values. Consider the following query:

```
SELECT *
FROM my_ingested_data
WHERE datehour >= '2020/12/15/00'
AND datehour < '2021/02/03/15'
```

The first condition in the `SELECT` query uses a date that is before the start of the date range specified by the `CREATE TABLE` statement. Because the partition projection configuration specifies no partitions for dates before January 1, 2021, Athena looks for data only in the following locations, and ignores the earlier dates in the query.

```
s3://amzn-s3-demo-bucket/prefix/2021/01/01/00/
s3://amzn-s3-demo-bucket/prefix/2021/01/01/01/
s3://amzn-s3-demo-bucket/prefix/2021/01/01/02/
...
s3://amzn-s3-demo-bucket/prefix/2021/02/03/12/
s3://amzn-s3-demo-bucket/prefix/2021/02/03/13/
s3://amzn-s3-demo-bucket/prefix/2021/02/03/14/
```

Similarly, if the query ran at a date and time before February 3, 2021 at 15:00, the last partition would reflect the current date and time, not the date and time in the query condition.
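
The clamping described above can be modeled as intersecting the query's bounds with the configured range and the current time. The following sketch is an approximation written for illustration, not Athena's implementation: the `project_hours` function is hypothetical, the Python format string `%Y/%m/%d/%H` stands in for the Java pattern `yyyy/MM/dd/HH`, and the current time is passed in as a naive UTC value so that the result is reproducible.

```python
from datetime import datetime, timedelta

FORMAT = "%Y/%m/%d/%H"  # Python stand-in for yyyy/MM/dd/HH
ONE_HOUR = timedelta(hours=1)

def project_hours(query_start, query_end_exclusive, range_start, now):
    """List datehour values for: query_start <= datehour < query_end_exclusive."""
    parse = lambda text: datetime.strptime(text, FORMAT)
    # Lower bound: the later of the query condition and the configured range start.
    first = max(parse(query_start), parse(range_start))
    # Upper bound: the earlier of the query condition and NOW (truncated to the hour).
    last = min(parse(query_end_exclusive) - ONE_HOUR,
               now.replace(minute=0, second=0, microsecond=0))
    values = []
    current = first
    while current <= last:
        values.append(current.strftime(FORMAT))
        current += ONE_HOUR
    return values

hours = project_hours("2020/12/15/00", "2021/02/03/15", "2021/01/01/00",
                      now=datetime(2021, 11, 9, 12, 30))
print(hours[0], hours[-1], len(hours))  # 2021/01/01/00 2021/02/03/14 807
```

Calling the same function with `now=datetime(2021, 1, 1, 2, 45)` yields only the first three hours of January 1, 2021, which mirrors the case where the query runs before the end date in its condition.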

If you want to query for the most recent data, you can take advantage of the fact that Athena does not generate future dates and specify only a beginning `datehour`, as in the following example.

```
SELECT *
FROM my_ingested_data
WHERE datehour >= '2021/11/09/00'
```

# How to choose partition keys


You can specify how partition projection maps the partition locations to partition keys. In the `CREATE TABLE` example in the previous section, the date and hour were combined into one partition key called `datehour`, but other schemes are possible. For example, you could also configure a table with separate partition keys for the year, month, day, and hour. 

However, splitting dates into year, month, and day means that the `date` partition projection type can't be used. An alternative is to separate the date from the hour to still leverage the `date` partition projection type, but make queries that specify hour ranges easier to read.

With that in mind, the following `CREATE TABLE` example separates the date from the hour. Because `date` is a reserved word in SQL, the example uses `day` as the name for the partition key that represents the date.

```
CREATE EXTERNAL TABLE my_ingested_data2 (
 ...
)
...
PARTITIONED BY (
 day STRING,
 hour INT
)
LOCATION "s3://amzn-s3-demo-bucket/prefix/"
TBLPROPERTIES (
 "projection.enabled" = "true",
 "projection.day.type" = "date",
 "projection.day.format" = "yyyy/MM/dd",
 "projection.day.range" = "2021/01/01,NOW",
 "projection.day.interval" = "1",
 "projection.day.interval.unit" = "DAYS",
 "projection.hour.type" = "integer",
 "projection.hour.range" = "0,23",
 "projection.hour.digits" = "2",
 "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/${day}/${hour}/"
)
```

In the example `CREATE TABLE` statement, the hour is a separate partition key, configured as an integer. The configuration for the hour partition key specifies the range 0 to 23, and that the hour should be formatted with two digits when Athena generates the partition locations.

A query for the `my_ingested_data2` table might look like this:

```
SELECT *
FROM my_ingested_data2
WHERE day = '2021/11/09'
AND hour > 3
```

## Understand partition key and partition projection data types


Note that the `datehour` key in the first `CREATE TABLE` example is configured as `date` in the partition projection configuration, but the type of the partition key is `string`. The same is true for `day` in the second example. The types in the partition projection configuration only tell Athena how to format the values when it generates the partition locations. The types that you specify do not change the type of the partition key: in queries, `datehour` and `day` are of type `string`.

When a query includes a condition like `day = '2021/11/09'`, Athena parses the string on the right side of the expression using the date format specified in the partition projection configuration. After Athena verifies that the date is within the configured range, it uses the date format again to insert the date as a string into the storage location template.

Similarly, for a query condition like `day > '2021/11/09'`, Athena parses the right side and generates a list of all matching dates within the configured range. It then uses the date format to insert each date into the storage location template to create the list of partition locations.

Writing the same condition as `day > '2021-11-09'` or `day > DATE '2021-11-09'` does not work. In the first case, the date format does not match (note the hyphens instead of the forward slashes), and in the second case, the data types do not match.
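
The format mismatch in the first case can be reproduced with any strict date parser. The sketch below uses Python's `datetime.strptime` as a stand-in for the Java formatter that Athena uses, so the format string `%Y/%m/%d` is an assumed equivalent of `yyyy/MM/dd`, and the `matches_projection_format` helper is hypothetical.

```python
from datetime import datetime

DAY_FORMAT = "%Y/%m/%d"  # assumed Python equivalent of yyyy/MM/dd

def matches_projection_format(literal):
    """Return True if the string literal parses with the configured format."""
    try:
        datetime.strptime(literal, DAY_FORMAT)
    except ValueError:
        return False
    return True

print(matches_projection_format("2021/11/09"))  # True
print(matches_projection_format("2021-11-09"))  # False
```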

# How to use custom prefixes and dynamic partitioning

Firehose can be configured with [custom prefixes](https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html) and [dynamic partitioning](https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html). Using these features, you can configure the Amazon S3 keys and set up partitioning schemes that better support your use case. You can also use partition projection with these partitioning schemes and configure them accordingly.

For example, you could use the custom prefix feature to get Amazon S3 keys that have ISO formatted dates instead of the default `yyyy/MM/dd/HH` scheme.

You can also combine custom prefixes with dynamic partitioning to extract a property like `customer_id` from Firehose messages, as in the following example.

```
prefix/!{timestamp:yyyy}-!{timestamp:MM}-!{timestamp:dd}/!{partitionKeyFromQuery:customer_id}/
```

With that Amazon S3 prefix, the Firehose delivery stream would write objects to keys such as `s3://amzn-s3-demo-bucket/prefix/2021-11-01/customer-1234/file.extension`. For a property like `customer_id`, where the values may not be known in advance, you can use the partition projection type `injected` and use a `CREATE TABLE` statement like the following:

```
CREATE EXTERNAL TABLE my_ingested_data3 (
 ...
)
...
PARTITIONED BY (
 day STRING,
 customer_id STRING
)
LOCATION "s3://amzn-s3-demo-bucket/prefix/"
TBLPROPERTIES (
 "projection.enabled" = "true",
 "projection.day.type" = "date",
 "projection.day.format" = "yyyy-MM-dd",
 "projection.day.range" = "2021-01-01,NOW",
 "projection.day.interval" = "1",
 "projection.day.interval.unit" = "DAYS",
 "projection.customer_id.type" = "injected",
 "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/${day}/${customer_id}/"
)
```

When you query a table that has a partition key of type `injected`, your query must include a value for that partition key. A query for the `my_ingested_data3` table might look like this:

```
SELECT *
FROM my_ingested_data3
WHERE day BETWEEN '2021-11-01' AND '2021-11-30'
AND customer_id = 'customer-1234'
```

## Use the DATE type for the day partition key


Because the values for the `day` partition key are ISO formatted, you can also use the `DATE` type for the day partition key instead of `STRING`, as in the following example:

```
PARTITIONED BY (day DATE, customer_id STRING)
```

When you query, this strategy allows you to use date functions on the partition key without parsing or casting, as in the following example:

```
SELECT *
FROM my_ingested_data3
WHERE day > CURRENT_DATE - INTERVAL '7' DAY
AND customer_id = 'customer-1234'
```

**Note**  
Specifying a partition key of the `DATE` type assumes that you have used the [custom prefix](https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html) feature to create Amazon S3 keys that have ISO formatted dates. If you are using the default Firehose format of `yyyy/MM/dd/HH`, you must specify the partition key as type `string` even though the corresponding table property is of type `date`, as in the following example:  

```
PARTITIONED BY ( 
  `mydate` string)
TBLPROPERTIES (
  'projection.enabled'='true', 
   ...
  'projection.mydate.type'='date',
  'storage.location.template'='s3://amzn-s3-demo-bucket/prefix/${mydate}')
```