

# Data types in Amazon Athena

When you run `CREATE TABLE`, you specify column names and the data type that each column can contain. The tables that you create are stored in the AWS Glue Data Catalog. 

To facilitate interoperability with other query engines, Athena uses [Apache Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types) data type names for DDL statements like `CREATE TABLE`. For DML queries like `SELECT`, `CTAS`, and `INSERT INTO`, Athena uses [Trino](https://trino.io/docs/current/language/types.html) data type names. The following table shows the data types supported in Athena. Where DDL and DML types differ in terms of name, availability, or syntax, they are shown in separate columns.


[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/athena/latest/ug/data-types.html)
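To sketch the naming difference (the table name, columns, and Amazon S3 location here are hypothetical), the following DDL statement uses the Hive type names `int` and `string`, while a DML query over the same table uses the Trino names `integer` and `varchar`:

```
CREATE EXTERNAL TABLE example_table (
  id int,
  name string
)
LOCATION 's3://amzn-s3-demo-bucket/example/';

SELECT CAST(id AS integer) AS id_int,
       CAST(name AS varchar) AS name_varchar
FROM example_table;
```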

**Topics**
+ [Data type examples](data-types-examples.md)
+ [Considerations for data types](data-types-considerations.md)
+ [Work with timestamp data](data-types-timestamps.md)

# Data type examples


The following table shows example literals for DML data types.



| Data type | Examples | 
| --- | --- | 
| BOOLEAN |  `true`, `false`  | 
| TINYINT |  `TINYINT '123'`  | 
| SMALLINT |  `SMALLINT '123'`  | 
| INT, INTEGER |  `123456790`  | 
| BIGINT |  `BIGINT '1234567890'`, `2147483648`  | 
| REAL |  `REAL '123456.78'`  | 
| DOUBLE |  `1.234`  | 
| DECIMAL(precision, scale) |  `DECIMAL '123.456'`  | 
| CHAR, CHAR(length) |  `CHAR 'hello world'`, `CHAR 'hello ''world''!'`  | 
| VARCHAR, VARCHAR(length) |  `VARCHAR 'hello world'`, `VARCHAR 'hello ''world''!'`  | 
| VARBINARY |  `X'00 01 02'`  | 
| TIME, TIME(precision) |  `TIME '10:11:12'`, `TIME '10:11:12.345'`  | 
| TIME WITH TIME ZONE |  `TIME '10:11:12.345 -06:00'`  | 
| DATE |  `DATE '2024-03-25'`  | 
|  TIMESTAMP, TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP(*precision*), TIMESTAMP(*precision*) WITHOUT TIME ZONE   |  `TIMESTAMP '2024-03-25 11:12:13'`, `TIMESTAMP '2024-03-25 11:12:13.456'`  | 
| TIMESTAMP WITH TIME ZONE, TIMESTAMP(precision) WITH TIME ZONE |  `TIMESTAMP '2024-03-25 11:12:13.456 Europe/Berlin'`  | 
| INTERVAL YEAR TO MONTH |  `INTERVAL '3' MONTH`  | 
| INTERVAL DAY TO SECOND |  `INTERVAL '2' DAY`  | 
| ARRAY[element_type] |  `ARRAY['one', 'two', 'three']`  | 
| MAP(key_type, value_type) |  `MAP(ARRAY['one', 'two', 'three'], ARRAY[1, 2, 3])` Note that maps are created from an array of keys and an array of values. The following example creates a table that maps strings to integers. <pre>CREATE TABLE map_table(col1 map<string, integer>) LOCATION '...';<br />INSERT INTO map_table values(MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]));</pre>  | 
| ROW(field_name_1 field_type_1, field_name_2 field_type_2, …) |  `ROW('one', 'two', 'three')` Note that rows created this way have no column names. To add column names, you can use `CAST`, as in the following example: <pre>CAST(ROW(1, 2, 3) AS ROW(one INT, two INT, three INT))</pre>  | 
| JSON |  `JSON '{"one":1, "two": 2, "three": 3}'`  | 
| UUID |  `UUID '12345678-90ab-cdef-1234-567890abcdef'`  | 
| IPADDRESS |  `IPADDRESS '10.0.0.1'`, `IPADDRESS '2001:db8::1'`  | 

# Considerations for data types


## Size limits


For data types that do not specify a size limit, keep in mind that there is a practical limit of 32MB for all of the data in a single row. For more information, see [Row or column size limitation](other-notable-limitations.md#sql-limitations-rowsize) in [Considerations and limitations for SQL queries in Amazon Athena](other-notable-limitations.md).

## CHAR and VARCHAR


A `CHAR(n)` value always has a count of `n` characters. For example, if you cast 'abc' to `CHAR(7)`, 4 trailing spaces are added. 

Comparisons of `CHAR` values include leading and trailing spaces. 
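A minimal illustration of the padding behavior:

```
-- 'abc' cast to CHAR(7) is padded with 4 trailing spaces: 'abc    '
SELECT CAST('abc' AS CHAR(7));
```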

If a length is specified for `CHAR` or `VARCHAR`, strings are truncated at the specified length when read. If the underlying data string is longer, the underlying data string remains unchanged.

To escape a single quote in a `CHAR` or `VARCHAR`, use an additional single quote.

To cast a non-string data type to a string in a DML query, cast to the `VARCHAR` data type.
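For example, the following query casts an integer and a date to `VARCHAR` before concatenating them:

```
-- Returns '1234-2024-03-25'
SELECT CAST(1234 AS VARCHAR) || '-' || CAST(DATE '2024-03-25' AS VARCHAR);
```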

To use the `substr` function to return a substring of specified length from a `CHAR` data type, you must first cast the `CHAR` value as a `VARCHAR`. In the following example, `col1` uses the `CHAR` data type.

```
substr(CAST(col1 AS VARCHAR), 1, 4)
```

## DECIMAL


To specify decimal values as literals in `SELECT` queries, such as when selecting rows with a specific decimal value, you can specify the `DECIMAL` type and list the decimal value as a literal in single quotes in your query, as in the following examples.

```
SELECT * FROM my_table
WHERE decimal_value = DECIMAL '0.12'
```

```
SELECT DECIMAL '44.6' + DECIMAL '77.2'
```

# Work with timestamp data


This section describes some considerations for working with timestamp data in Athena.

**Note**  
The treatment of timestamps has changed somewhat between previous engine versions and Athena engine version 3. For information about timestamp-related errors that can occur in Athena engine version 3 and suggested solutions, see [Timestamp changes](engine-versions-reference-0003.md#engine-versions-reference-0003-timestamp-changes) in the [Athena engine version 3](engine-versions-reference-0003.md) reference.

## Format for writing timestamp data to Amazon S3 objects


The format in which timestamp data should be written into Amazon S3 objects depends on both the column data type and the [SerDe library](https://docs.aws.amazon.com/athena/latest/ug/supported-serdes.html) that you use.
+ If you have a table column of type `DATE`, Athena expects the corresponding column or property of the data to be a string in the ISO format `YYYY-MM-DD`, or a built-in date type like those for Parquet or ORC.
+ If you have a table column of type `TIME`, Athena expects the corresponding column or property of the data to be a string in the ISO format `HH:MM:SS`, or a built-in time type like those for Parquet or ORC.
+ If you have a table column of type `TIMESTAMP`, Athena expects the corresponding column or property of the data to be a string in the format `YYYY-MM-DD HH:MM:SS.SSS` (note the space between the date and time), or a built-in time type like those for Parquet, ORC, or Ion. Note that Athena does not guarantee the behavior for timestamps that are invalid (for example, `0000-00-00 08:00:00.000`).
**Note**  
OpenCSVSerDe timestamps are an exception and must be encoded as millisecond resolution UNIX epochs.
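As an illustrative sketch (the table name, column names, and bucket are hypothetical), a CSV file read with LazySimpleSerDe would contain timestamp strings in the format described above, and the corresponding DDL would declare the column as `timestamp`:

```
-- Example data row in s3://amzn-s3-demo-bucket/events/:
--   2024-03-25 11:12:13.456,login

CREATE EXTERNAL TABLE events (
  event_time timestamp,
  event_type string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://amzn-s3-demo-bucket/events/';
```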

## Ensuring that time-partitioned data matches the timestamp field in a record


The producer of the data must make sure partition values align with the data within the partition. For example, if your data has a `timestamp` property and you use Firehose to load the data into Amazon S3, you must use [dynamic partitioning](https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html) because the default partitioning of Firehose is wall-clock-based.

## Use string as the data type for partition keys


For performance reasons, it is preferable to use `STRING` as the data type for partition keys. Even though Athena recognizes partition values in the format `YYYY-MM-DD` as dates when you use the `DATE` type, this can lead to poor performance. For this reason, we recommend that you use the `STRING` data type for partition keys instead.
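The following sketch (names are hypothetical) shows a partition key declared as `string` even though its values are date-like:

```
CREATE EXTERNAL TABLE events (
  event_time timestamp
)
PARTITIONED BY (event_date string)
LOCATION 's3://amzn-s3-demo-bucket/events/';

-- Filter on the partition key as a string literal
SELECT * FROM events WHERE event_date = '2024-03-25';
```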

## How to write queries for timestamp fields that are also time-partitioned


How you write queries for timestamp fields that are time-partitioned depends on the type of table that you want to query.

### Hive tables


With the Hive tables most commonly used in Athena, the query engine has no knowledge of relationships between columns and partition keys. For this reason, you must always add predicates in your queries for both the column and the partition key.

For example, suppose you have an `event_time` column and an `event_date` partition key and want to query events between 23:00 and 03:00. In this case, you must include predicates in your query for both the column and the partition key, as in the following example.

```
WHERE event_time BETWEEN start_time AND end_time 
  AND event_date BETWEEN start_time_date AND end_time_date
```
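For instance, with hypothetical literal values, the overnight window from 23:00 on March 25 to 03:00 on March 26 would constrain both the column and the partition key:

```
WHERE event_time BETWEEN TIMESTAMP '2024-03-25 23:00:00' AND TIMESTAMP '2024-03-26 03:00:00'
  AND event_date BETWEEN '2024-03-25' AND '2024-03-26'
```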

### Iceberg tables


With Iceberg tables, you can use computed partition values, which simplifies your queries. For example, suppose your Iceberg table was created with a `PARTITIONED BY` clause like the following:

```
PARTITIONED BY (event_date month(event_time))
```

In this case, the query engine automatically prunes partitions based on the values of the `event_time` predicates. Because of this, your query only needs to specify a predicate for `event_time`, as in the following example.

```
WHERE event_time BETWEEN start_time AND end_time
```

For more information, see [Create Iceberg tables](querying-iceberg-creating-tables.md).

When using Iceberg's hidden partitioning for a timestamp column, Iceberg might create a partition on a constructed table column derived from a timestamp column and transformed into a date for more effective partitioning. For example, it might create `event_date` from the timestamp column `event_time` and automatically partition on `event_date`. In this case, the partition **type** is a **date**.
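As a sketch of such a table (the name, columns, and location are hypothetical), the following DDL uses Athena's Iceberg table type with a `day` partition transform on the timestamp column:

```
CREATE TABLE events (
  event_time timestamp,
  event_type string
)
PARTITIONED BY (day(event_time))
LOCATION 's3://amzn-s3-demo-bucket/events/'
TBLPROPERTIES ('table_type' = 'ICEBERG');
```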

For optimal query performance when you use the partition, filter on full day ranges to enable predicate pushdown. For example, the following query wouldn't be pushed down because the range can't be converted to a single date partition, even though it falls within a single day:

```
WHERE event_time >= TIMESTAMP '2024-04-18 00:00:00' AND event_time < TIMESTAMP '2024-04-18 12:00:00'
```

Instead, use a full day range to allow predicate pushdown and improve query performance, as in the following example.

```
WHERE event_time >= TIMESTAMP '2024-04-18 00:00:00' AND event_time < TIMESTAMP '2024-04-19 00:00:00'
```

You can also use the `BETWEEN start_time AND end_time` syntax or use multi-day ranges, as long as the timestamp portions are `00:00:00`.
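For example, the following hypothetical multi-day range can still be pushed down because both boundaries fall at midnight:

```
WHERE event_time BETWEEN TIMESTAMP '2024-04-18 00:00:00' AND TIMESTAMP '2024-04-20 00:00:00'
```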

For more information, see the [Trino blog post](https://trino.io/blog/2023/04/11/date-predicates.html).