Use the Athena console to connect to a data source
You can use the Athena console to create and configure a data source connection.
To create a connection to a data source
-
Open the Athena console at https://console.aws.amazon.com/athena/.
-
If the console navigation pane is not visible, choose the expansion menu on the left.
-
In the navigation pane, choose Data sources and catalogs.
-
On the Data sources and catalogs page, choose Create data source.
-
For Choose a data source, choose the data source that you want Athena to query, considering the following guidelines:
-
Choose a connection option that corresponds to your data source. Athena has prebuilt data source connectors that you can configure for sources including MySQL, Amazon DocumentDB, and PostgreSQL.
-
Choose S3 - AWS Glue Data Catalog if you want to query data in Amazon S3 and you are not using an Apache Hive metastore or one of the other federated query data source options on this page. Athena uses the AWS Glue Data Catalog to store metadata and schema information for data sources in Amazon S3. This is the default (non-federated) option. For more information, see Use AWS Glue Data Catalog to connect to your data. For steps using this workflow, see Register and use data catalogs in Athena.
-
Choose S3 - Apache Hive metastore to query data sets in Amazon S3 that use an Apache Hive metastore. For more information about this option, see Connect Athena to an Apache Hive metastore.
-
Choose Custom or shared connector if you want to create your own data source connector for use with Athena. For information about writing a data source connector, see Develop a data source connector using the Athena Query Federation SDK.
-
Choose Next.
-
On the Enter data source details page, for Data source name, use the autogenerated name, or enter a unique name to use in your SQL statements when you query the data source from Athena. The name can be up to 127 characters long, must be unique within your account, and cannot be changed after you create it. Valid characters are a-z, A-Z, 0-9, _ (underscore), @ (at sign), and - (hyphen). The names awsdatacatalog, hive, jmx, and system are reserved by Athena and cannot be used as data source names.
-
If the data source that you chose uses an AWS Glue connection, do the following:
-
For AWS Glue connection details, enter the information required. A connection contains the properties that are required to connect to a particular data source. The properties required vary depending on the connection type. For more information on properties related to your connector, see Available data source connectors. For information about additional connection properties, see AWS Glue connection properties in the AWS Glue User Guide.
Warning
The following properties cannot be updated in an existing AWS Glue connection. To change them, you must create a new connection.
-
VPC configuration – security_group_ids, subnet_ids
-
-
-
For Lambda execution IAM role, choose one of the following:
Note
For Glue Data Catalog IAM Role, see AWS Glue Data Catalog federated connectors without Lambda permissions.
-
Create and use a new execution role – (Default) Athena creates an execution role that it will then use to access resources in AWS Lambda on your behalf. Athena requires this role to create your federated data source.
-
Use an existing execution role – Use this option to choose an existing execution role. For this option, choose the execution role that you want to use from the Execution role drop-down list.
-
-
-
If the data source that you chose does not use an AWS Glue connection, do the following:
-
For Lambda function, choose Create Lambda function. The function page for the connector that you chose opens in the AWS Lambda console. The page includes detailed information about the connector.
-
Under Application settings, read the description for each application setting carefully, and then enter values that correspond to your requirements.
The application settings that you see vary depending on the connector for your data source. The minimum required settings include:
-
AthenaCatalogName – A name, in lowercase, for the Lambda function that indicates the data source that it targets, such as cloudwatchlogs.
-
SpillBucket – An Amazon S3 bucket in your account to store data that exceeds Lambda function response size limits.
Note
Spilled data is not reused in subsequent executions and can be safely deleted. Athena does not delete this data for you. To manage these objects, consider adding an object lifecycle policy that deletes old data from your Amazon S3 spill bucket. For more information, see Managing your storage lifecycle in the Amazon S3 User Guide.
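The lifecycle policy suggested in the note above can be sketched in Python. This is a minimal illustration, not an official configuration: the prefix athena-spill/, the rule ID, and the one-day expiration are assumptions chosen for the example, and the bucket name is a placeholder. The optional apply step assumes that boto3 is installed and that your credentials allow changes to the bucket's lifecycle configuration.

```python
import json


def build_spill_lifecycle(prefix: str, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires old spill objects.

    The prefix and the retention period are illustrative assumptions;
    choose values that match your own connector settings.
    """
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-athena-spill-objects",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_spill_lifecycle(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (needs boto3 and AWS credentials).

    This call replaces any lifecycle configuration that the bucket
    already has, so merge existing rules first if there are any.
    """
    import boto3  # imported here so that the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Print the policy for review instead of applying it right away.
    print(json.dumps(build_spill_lifecycle("athena-spill/", 1), indent=2))
```

Keeping the builder separate from the apply step lets you review the generated rule before anything in your account changes.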
-
Select I acknowledge that this app creates custom IAM roles and resource policies. For more information, choose the Info link.
-
Choose Deploy. When the deployment is complete, the Lambda function appears in the Resources section in the Lambda console.
After you deploy the data source connector to your account, you can connect Athena to it.
-
Return to the Enter data source details page of the Athena console.
-
In the Connection details section, choose the refresh icon next to the Select or enter a Lambda function search box.
-
Choose the name of the function that you just created in the Lambda console. The ARN of the Lambda function displays.
-
-
(Optional) For Tags, add key-value pairs to associate with this data source. For more information about tags, see Tag Athena resources.
-
Choose Next.
-
On the Review and create page, review the data source details. To make changes, choose Edit.
-
Read the information in Athena will create resources in your account. If you agree, select I acknowledge that Athena will create resources on my behalf.
-
Choose Create data source. Athena will create the following resources in your account.
-
For AWS Glue Data Catalog federated connectors without Lambda
Note
If your data source is in a VPC, Athena also creates an Elastic Network Interface (ENI) in your account to connect to the VPC.
-
AWS Glue connection
-
AWS Glue catalog
-
-
For AWS Glue Data Catalog federated connectors with Lambda
-
AWS Glue connection
-
Lambda execution IAM role
-
Lambda function
-
-
For Athena data catalog federated connectors
-
Lambda execution IAM role
-
Lambda function
-
-
The Data source details section of the page for your data source shows information about your new connector. You can now use the connector in your Athena queries.
For information about using data connectors in queries, see Run federated queries.
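As a rough sketch of how the data source name is used once the connector exists, the following Python example checks a candidate name against the naming rules from the Enter data source details step and then builds a query that references the data source by that name. The names my_mysql_source, sales, and orders are hypothetical. Treating the reserved names as case-insensitive is an assumption, as is quoting every identifier; quoting is simply a cautious choice for names that contain characters such as @ or a hyphen.

```python
import re

# Naming rules from the procedure: 1 to 127 characters drawn from
# a-z, A-Z, 0-9, underscore, at sign, and hyphen.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_@-]{1,127}")

# Names reserved by Athena. Comparing them case-insensitively is an
# assumption made for this sketch.
_RESERVED_NAMES = {"awsdatacatalog", "hive", "jmx", "system"}


def is_valid_data_source_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming rules."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    return name.lower() not in _RESERVED_NAMES


def quote_identifier(identifier: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote."""
    return '"' + identifier.replace('"', '""') + '"'


def build_sample_query(data_source: str, database: str, table: str,
                       limit: int = 10) -> str:
    """Build a SELECT statement that addresses a table through a data source."""
    if not is_valid_data_source_name(data_source):
        raise ValueError(f"invalid data source name: {data_source!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    qualified = ".".join(
        quote_identifier(part) for part in (data_source, database, table)
    )
    return f"SELECT * FROM {qualified} LIMIT {limit}"


if __name__ == "__main__":
    # Hypothetical data source, database, and table names.
    print(build_sample_query("my_mysql_source", "sales", "orders"))
    # prints: SELECT * FROM "my_mysql_source"."sales"."orders" LIMIT 10
```

You can paste the resulting statement into the Athena query editor. The check runs locally and does not confirm that the name is unique within your account.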