

# Create tables for ETL jobs
<a name="schema-classifier"></a>

You can use Athena to create tables that AWS Glue can use for ETL jobs. AWS Glue jobs perform ETL operations. An AWS Glue job runs a script that extracts data from sources, transforms the data, and loads it into targets. For more information, see [Authoring Jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html) in the *AWS Glue Developer Guide*.

## Creating Athena tables for AWS Glue ETL jobs
<a name="schema-etl-tables"></a>

Tables that you create in Athena must have a table property added to them called a `classification`, which identifies the format of the data. This allows AWS Glue to use the tables for ETL jobs. The classification values can be `avro`, `csv`, `json`, `orc`, `parquet`, or `xml`. An example `CREATE TABLE` statement in Athena follows:

```
CREATE EXTERNAL TABLE sampleTable (
  column1 INT,
  column2 INT
  ) STORED AS PARQUET
  TBLPROPERTIES (
  'classification'='parquet')
```

If the `classification` table property was not added when the table was created, you can add it using the AWS Glue console.

**To add the classification table property using the AWS Glue console**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the console navigation pane, choose **Tables**.

1. Choose the link for the table that you want to edit, and then choose **Actions**, **Edit table**.

1. Scroll down to the **Table properties** section.

1. Choose **Add**.

1. For **Key**, enter **classification**.

1. For **Value**, enter a data type (for example, **json**).

1. Choose **Save**.

   In the **Table details** section, the data type that you entered appears in the **Classification** field for the table.

For more information, see [Working with tables](https://docs.aws.amazon.com/glue/latest/dg/console-tables.html) in the *AWS Glue Developer Guide*.

## Use ETL jobs to optimize query performance
<a name="schema-etl-performance"></a>

AWS Glue jobs can help you transform data to a format that optimizes query performance in Athena. Data formats have a large impact on query performance and query costs in Athena.

AWS Glue supports writing to the Parquet and ORC data formats. You can use this feature to transform your data for use in Athena. For more information about using Parquet and ORC, and other ways to improve performance in Athena, see [Top 10 performance tuning tips for Amazon Athena](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/).

**Note**  
To reduce the likelihood that Athena is unable to read the `SMALLINT` and `TINYINT` data types produced by an AWS Glue ETL job, convert `SMALLINT` and `TINYINT` to `INT` when you create an ETL job that converts data to ORC.

## Automate AWS Glue jobs for ETL
<a name="schema-etl-automate"></a>

You can configure AWS Glue ETL jobs to run automatically based on triggers. This feature is ideal when data from outside AWS is being pushed to an Amazon S3 bucket in an otherwise suboptimal format for querying in Athena. For more information, see [Triggering AWS Glue jobs](https://docs.aws.amazon.com/glue/latest/dg/trigger-job.html) in the *AWS Glue Developer Guide*.