Amazon Ion SerDe property reference
This topic contains information about the SerDe properties for CREATE TABLE
statements in Athena. For more information and examples of Amazon Ion SerDe property usage,
see SerDe properties
How to specify Amazon Ion SerDe properties
To specify properties for the Amazon Ion Hive SerDe in your CREATE TABLE
statement, use the WITH SERDEPROPERTIES clause. Because WITH
SERDEPROPERTIES is a subfield of the ROW FORMAT SERDE clause,
you must specify ROW FORMAT SERDE and the Amazon Ion Hive SerDe class path
first, as the following syntax shows.
... ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe' WITH SERDEPROPERTIES ( 'property' = 'value', 'property' = 'value', ... )
Note that although the ROW FORMAT SERDE clause is required if you want to
use WITH SERDEPROPERTIES, you can use either STORED AS ION or
the longer INPUTFORMAT and OUTPUTFORMAT syntax to specify the
Amazon Ion format.
Amazon Ion SerDe properties
Following are the Amazon Ion SerDe properties that can be used in CREATE
TABLE statements in Athena.
- ion.encoding
-
Optional
Default:
BINARYValues:
BINARY,TEXTThis property declares whether new values added are serialized as Amazon Ion binary
or Amazon Ion text format. The following SerDe property example specifies Amazon Ion text format.
'ion.encoding' = 'TEXT' - ion.fail_on_overflow
-
Optional
Default:
trueValues:
true,falseAmazon Ion allows for arbitrarily large numerical types while Hive does not. By default, the SerDe fails if the Amazon Ion value does not fit the Hive column, but you can use the
fail_on_overflowconfiguration option to let the value overflow instead of failing.This property can be set at either the table or column level. To specify it at the table level, specify
ion.fail_on_overflowas in the following example. This sets the default behavior for all columns.'ion.fail_on_overflow' = 'true'To control a specific column, specify the column name between
ionandfail_on_overflow, delimited by periods, as in the following example.'ion.<column>.fail_on_overflow' = 'false' - ion.path_extractor.case_sensitive
-
Optional
Default:
falseValues:
true,falseDetermines whether to treat Amazon Ion field names as case sensitive. When
false, the SerDe ignores case parsing Amazon Ion field names.For example, suppose you have a Hive table schema that defines a field
aliasin lower case and an Amazon Ion document with both analiasfield and anALIASfield, as in the following example.-- Hive Table Schema alias: STRING -- Amazon Ion Document { 'ALIAS': 'value1'} { 'alias': 'value2'}The following example shows SerDe properties and the resulting extracted table when case sensitivity is set to
false:-- Serde properties 'ion.alias.path_extractor' = '(alias)' 'ion.path_extractor.case_sensitive' = 'false' --Extracted Table | alias | |----------| | "value1" | | "value2" |The following example shows SerDe properties and the resulting extracted table when case sensitivity is set to
true:-- Serde properties 'ion.alias.path_extractor' = '(alias)' 'ion.path_extractor.case_sensitive' = 'true' --Extracted Table | alias | |----------| | "value2" |In the second case,
value1for theALIASfield is ignored when case sensitivity is set totrueand the path extractor is specified asalias. - ion.
<column>.path_extractor -
Optional
Default: NA
Values: String with search path
Creates a path extractor with the specified search path for the given column. Path extractors map Amazon Ion fields to Hive columns. If no path extractors are specified, Athena dynamically creates path extractors at run time based on column names.
The following example path extractor maps the
example_ion_fieldto theexample_hive_column.'ion.example_hive_column.path_extractor' = '(example_ion_field)'For more information about path extractors and search paths, see Use path extractors.
- ion.timestamp.serialization_offset
-
Optional
Default:
'Z'Values:
OFFSET, whereOFFSETis represented as. Example values:<signal>hh:mm01:00,+01:00,-09:30,Z(UTC, same as 00:00)Unlike Apache Hive timestamps
, which have no built-in time zone and are stored as an offset from the UNIX epoch, Amazon Ion timestamps do have an offset. Use this property to specify the offset when you serialize to Amazon Ion. The following example adds an offset of one hour.
'ion.timestamp.serialization_offset' = '+01:00' - ion.serialize_null
-
Optional
Default:
OMITValues:
OMIT,UNTYPED,TYPEDThe Amazon Ion SerDe can be configured to either serialize or omit columns that have null values. You can choose to write out strongly typed nulls (
TYPED) or untyped nulls (UNTYPED). Strongly typed nulls are determined based on the default Amazon Ion to Hive type mapping.The following example specifies strongly typed nulls.
'ion.serialize_null'='TYPED' - ion.ignore_malformed
-
Optional
Default:
falseValues:
true,falseWhen
true, ignores malformed entries or the whole file if the SerDe is unable to read it. For more information, see Ignore malformedin the documentation on GitHub. - ion.
<column>.serialize_as -
Optional
Default: Default type for the column.
Values: String containing Amazon Ion type
Determines the Amazon Ion data type in which a value is serialized. Because Amazon Ion and Hive types do not always have a direct mapping, a few Hive types have multiple valid data types for serialization. To serialize data as a non-default data type, use this property. For more information about type mapping, see the Amazon Ion Type mapping
page on GitHub. By default, binary Hive columns are serialized as Amazon Ion blobs, but they can also be serialized as an Amazon Ion clob
(character large object). The following example serializes the column example_hive_binary_columnas a clob.'ion.example_hive_binary_column.serialize_as' = 'clob'