
Configuring a zero-ETL integration target

AWS offers several target options for a zero-ETL integration. The target can be an encrypted Amazon Redshift data warehouse or an Amazon SageMaker Lakehouse catalog.

Before you select the target for the zero-ETL integration, configure one of the following target resources:

An Amazon SageMaker Lakehouse catalog and database configured with regular Amazon S3 storage. See Configuring an Amazon SageMaker Lakehouse catalog with regular S3 storage.
An Amazon SageMaker Lakehouse catalog configured with an Amazon S3 table bucket. See Configuring Amazon S3 tables as a target.
An Amazon SageMaker Lakehouse catalog configured with Amazon Redshift managed storage. See Configuring an Amazon SageMaker Lakehouse catalog with Amazon Redshift managed storage.
An Amazon Redshift data warehouse identified by a Redshift namespace. See Configuring an Amazon Redshift data warehouse target.

Note

You cannot modify the target of a zero-ETL integration after creation.

Configuring an Amazon SageMaker Lakehouse catalog with regular S3 storage

This section describes the prerequisites and setup steps for configuring a regular Amazon S3 bucket as storage for your Amazon SageMaker Lakehouse catalog target in a zero-ETL integration.

Prerequisites for setting up an integration

Before creating a zero-ETL integration with an Amazon SageMaker Lakehouse catalog using regular S3 storage, you need to complete the following setup tasks:

Set up an AWS Glue database
Provide Catalog RBAC policy
Create target IAM role

After configuring the Amazon SageMaker Lakehouse catalog with regular Amazon S3 storage, you can proceed to Configuring the integration with your target to complete the integration setup.

Configuring Amazon S3 tables as a target

This section describes the prerequisites and setup steps for configuring Amazon S3 Tables as a target for your zero-ETL integration.

Prerequisites for setting up an integration

Before creating a zero-ETL integration with Amazon S3 Tables as a target, you need to complete the following setup tasks:

Set up an Amazon S3 table bucket
Provide Catalog RBAC policy
Create target IAM role

Set up an Amazon S3 table bucket

Create an S3 table bucket in your account by following the instructions at Getting started with Amazon S3 Tables.
Enable analytics integrations for your S3 table bucket by following the instructions at Integrating AWS services with Amazon S3 Tables.

Provide Catalog RBAC policy

Add the following permissions to the Catalog RBAC policy to allow integrations between the source and the Amazon S3 Tables catalog target.

The target AWS Glue Catalog resource policy must grant the AWS Glue service the glue:AuthorizeInboundIntegration permission. In addition, the glue:CreateInboundIntegration permission is required either on the source principal that creates the integration or in the target AWS Glue Catalog resource policy.

Note

In a cross-account scenario, both the source principal and the target AWS Glue Catalog resource policy must include the glue:CreateInboundIntegration permission on the resource.

Note

Replace <s3tablescatalog> with the catalog name of your S3 tables.
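
As a minimal sketch, a catalog resource policy carrying the two permissions described above could be assembled in Python as follows. The function name, the catalog ARN layout, the optional aws:SourceArn condition, and the sample account ID and role name are assumptions made for illustration, not values taken from this guide; check them against your own catalog before applying anything.

```python
import json


def build_catalog_resource_policy(region, account_id, catalog_name,
                                  source_principal_arn, source_arn=None):
    """Build a Glue catalog resource policy for an inbound zero-ETL integration.

    Hypothetical helper: the ARN layout and the optional aws:SourceArn
    condition are assumptions of this sketch.
    """
    # Assumed ARN shape for the S3 Tables catalog and the resources beneath it.
    catalog_arn = f"arn:aws:glue:{region}:{account_id}:catalog/{catalog_name}/*"

    create_stmt = {
        # Lets the principal that creates the integration target this catalog.
        "Effect": "Allow",
        "Principal": {"AWS": [source_principal_arn]},
        "Action": ["glue:CreateInboundIntegration"],
        "Resource": [catalog_arn],
    }
    authorize_stmt = {
        # Lets the AWS Glue service authorize the inbound integration.
        "Effect": "Allow",
        "Principal": {"Service": ["glue.amazonaws.com"]},
        "Action": ["glue:AuthorizeInboundIntegration"],
        "Resource": [catalog_arn],
    }
    if source_arn is not None:
        # Optional narrowing to a single integration source (assumption).
        for stmt in (create_stmt, authorize_stmt):
            stmt["Condition"] = {"StringEquals": {"aws:SourceArn": source_arn}}

    return {"Version": "2012-10-17", "Statement": [create_stmt, authorize_stmt]}


policy = build_catalog_resource_policy(
    region="us-east-1",
    account_id="111122223333",                 # placeholder account ID
    catalog_name="s3tablescatalog",            # replace with your catalog name
    source_principal_arn="arn:aws:iam::111122223333:role/example-source-role",
)
print(json.dumps(policy, indent=2))
```

In a cross-account setup, the same glue:CreateInboundIntegration action would also need to appear in the identity policy of the source principal.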

Create target IAM role

Create a target IAM role with the following permissions and trust relationships:

Example IAM policy:

Add the following trust policy to the target IAM role to allow the AWS Glue service to assume it:
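
A minimal sketch of such a trust policy, built in Python. The policy document uses the standard form for letting a service principal assume a role; the helper names and the role name passed to create_target_role are hypothetical, and the boto3 call runs only if you invoke that function with valid AWS credentials.

```python
import json


def build_glue_trust_policy():
    """Return a trust policy document that lets AWS Glue assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_target_role(role_name):
    """Create the role with this trust policy (hypothetical helper).

    boto3 is imported lazily so the policy builder works without it.
    """
    import boto3

    iam = boto3.client("iam")
    return iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_glue_trust_policy()),
    )


print(json.dumps(build_glue_trust_policy(), indent=2))
```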

Note

Make sure the S3 table bucket resource policy contains no explicit DENY statement for this target IAM role. An explicit DENY overrides any ALLOW permissions and prevents the integration from working properly.

Configuring an Amazon SageMaker Lakehouse catalog with Amazon Redshift managed storage

This section describes the prerequisites and setup steps for configuring an Amazon SageMaker Lakehouse catalog with Amazon Redshift managed storage (RMS) as a target for your zero-ETL integration.

Prerequisites for setting up an integration

Before creating a zero-ETL integration with an Amazon SageMaker Lakehouse catalog using Redshift managed storage, you need to complete the following setup tasks:

Set up an Amazon Redshift cluster or Serverless workgroup
Register the Amazon Redshift integration with Lake Formation
Create a managed catalog in Lake Formation
Configure IAM permissions

Setting up Amazon Redshift managed storage

To set up Amazon Redshift managed storage for your zero-ETL integration:

Create an Amazon Redshift cluster or Serverless workgroup, or use an existing one. For the integration to succeed, the target cluster or workgroup must have the enable_case_sensitive_identifier parameter turned on. For more information on enabling case sensitivity, see Turn on case sensitivity for your data warehouse in the Amazon Redshift Management Guide.
Register an integration from Redshift into the catalog in AWS Lake Formation. See Registering Amazon Redshift clusters and namespaces to the AWS Glue Data Catalog.
Create a federated or managed catalog in AWS Lake Formation. For more information, see:
- Bringing Amazon Redshift data into the AWS Glue Data Catalog
- Creating an Amazon Redshift managed catalog in the AWS Glue Data Catalog
Configure IAM permissions for the target role. The role needs permissions to access both Redshift and Lake Formation resources. At minimum, the role should have:
- Permissions to access the Redshift cluster or workgroup
- Permissions to access the Lake Formation catalog
- Permissions to create and manage tables in the catalog
- CloudWatch and CloudWatch Logs permissions for monitoring
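
The case-sensitivity requirement in the first step can be sketched with boto3 as follows. The helper names are hypothetical, and the sketch assumes that a provisioned cluster uses a custom (non-default) parameter group and that you have permission to modify it. The request builders are pure functions, so they can be inspected without AWS access.

```python
PARAM = "enable_case_sensitive_identifier"


def cluster_parameter_request(parameter_group_name):
    """Keyword arguments for the Redshift modify_cluster_parameter_group call."""
    return {
        "ParameterGroupName": parameter_group_name,
        "Parameters": [{"ParameterName": PARAM, "ParameterValue": "true"}],
    }


def workgroup_parameter_request(workgroup_name):
    """Keyword arguments for the Redshift Serverless update_workgroup call."""
    return {
        "workgroupName": workgroup_name,
        "configParameters": [{"parameterKey": PARAM, "parameterValue": "true"}],
    }


def enable_case_sensitivity(target_type, name):
    """Apply the setting; needs boto3 and AWS credentials (imported lazily)."""
    import boto3

    if target_type == "cluster":
        client = boto3.client("redshift")
        return client.modify_cluster_parameter_group(**cluster_parameter_request(name))
    if target_type == "workgroup":
        client = boto3.client("redshift-serverless")
        return client.update_workgroup(**workgroup_parameter_request(name))
    raise ValueError("target_type must be 'cluster' or 'workgroup'")


print(cluster_parameter_request("example-parameter-group"))
print(workgroup_parameter_request("example-workgroup"))
```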

After configuring the Amazon SageMaker Lakehouse catalog with Amazon Redshift managed storage, you can proceed to Configuring the integration with your target to complete the integration setup.

Configuring an Amazon Redshift data warehouse target

This section describes the prerequisites and setup steps for configuring an Amazon Redshift data warehouse as a target for your zero-ETL integration.

Prerequisites for setting up an integration

Before creating a zero-ETL integration with an Amazon Redshift data warehouse target, you need to complete the following setup tasks:

Set up an Amazon Redshift cluster or Serverless workgroup
Configure case sensitivity
Configure IAM permissions

Setting up the Amazon Redshift data warehouse

To set up an Amazon Redshift data warehouse for your zero-ETL integration:

Navigate to the Amazon Redshift console and choose Create cluster, or use an existing cluster. For Amazon Redshift Serverless, choose Create workgroup.
If creating a new cluster, choose an appropriate cluster size and ensure your cluster is encrypted. For Serverless, configure the workgroup settings according to your requirements.
For the integration to succeed, the target Amazon Redshift workgroup or cluster must have the enable_case_sensitive_identifier parameter turned on. For more information on enabling case sensitivity, see Turn on case sensitivity for your data warehouse in the Amazon Redshift Management Guide.
Configure IAM permissions to allow the zero-ETL integration to access your Amazon Redshift data warehouse. You'll need to create an IAM role with the following permissions:
- Permissions to access the Amazon Redshift cluster or workgroup
- Permissions to create and manage databases and tables in Amazon Redshift
- CloudWatch and Amazon CloudWatch Logs permissions for monitoring
After the Amazon Redshift workgroup or cluster setup is complete, you need to configure your data warehouse for zero-ETL integrations. See Getting started with zero-ETL integrations in the Amazon Redshift Management Guide for more information.
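
Because the target must be an encrypted data warehouse, it can help to check existing clusters before creating the integration. The sketch below inspects a response in the shape returned by the boto3 Redshift describe_clusters call and lists clusters that are not encrypted. The helper name and the sample data are hypothetical.

```python
def unencrypted_clusters(describe_clusters_response):
    """Return identifiers of clusters whose description does not report encryption."""
    return [
        cluster["ClusterIdentifier"]
        for cluster in describe_clusters_response.get("Clusters", [])
        if not cluster.get("Encrypted", False)
    ]


# Hand-written sample in the shape of a describe_clusters response.
sample_response = {
    "Clusters": [
        {"ClusterIdentifier": "etl-target", "Encrypted": True},
        {"ClusterIdentifier": "legacy", "Encrypted": False},
    ]
}
print(unencrypted_clusters(sample_response))  # prints ['legacy']
```

With live credentials, the same function could be passed the result of boto3.client("redshift").describe_clusters().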

Note

When you use an Amazon Redshift data warehouse as a target, the integration creates a schema in the specified database to store the replicated data. The schema name is derived from the integration name.

After configuring the Amazon Redshift data warehouse, you can proceed to Configuring the integration with your target to complete the integration setup.

Configuring the integration with your target

After you have configured your target resources, selected your connection, and specified a source IAM role, follow these steps to complete the integration setup:

Specify the target you've configured in the previous steps.
Select the AWS Glue Fix it for me option. For the Amazon Redshift target, this will:
- Apply an authorized service principal on the Amazon Redshift cluster or Serverless workgroup.
- Apply an authorized AWS Glue source ARN to the Amazon Redshift cluster or Serverless workgroup.
- Associate a new parameter group with enable_case_sensitive_identifier = true.
Provide the integration name and choose Create and launch Integration.
Once your integration is in the active state, navigate to the integration details page and choose Create a database from integration.
Finally, navigate to the Redshift query editor and connect to your database to validate the snapshot and incremental data.
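
For an Amazon Redshift target, the console action for creating a database from the integration has a SQL counterpart. The sketch below composes such a statement in Python. It assumes the Redshift CREATE DATABASE ... FROM INTEGRATION syntax and the SVV_INTEGRATION system view apply to your integration type; the helper name and the sample values are hypothetical.

```python
import re

# Query assumed to list integration IDs visible to the data warehouse.
LIST_INTEGRATIONS_SQL = "SELECT integration_id FROM svv_integration;"


def create_database_statement(database_name, integration_id):
    """Compose the SQL that creates a destination database from an integration."""
    # Keep the identifier to lowercase letters, digits, and underscores.
    if re.fullmatch(r"[a-z_][a-z0-9_]*", database_name) is None:
        raise ValueError(f"unsupported database name: {database_name!r}")
    # Reject quotes so the ID cannot break out of the string literal.
    if "'" in integration_id or not integration_id:
        raise ValueError("integration_id must be non-empty and contain no quotes")
    return f"CREATE DATABASE {database_name} FROM INTEGRATION '{integration_id}';"


print(LIST_INTEGRATIONS_SQL)
print(create_database_statement("zero_etl_db", "example-integration-id"))
```

The resulting statements can be run in the Redshift query editor once the integration is active.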

Note

You can use only lowercase alphanumeric characters and underscores in the namespace or catalog name. This differs from the AWS Glue Data Catalog, which lets you create a database with any name, including names that contain special characters.
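
The naming rule in this note can be checked before you create the integration. A minimal sketch, with a hypothetical function name; it encodes only the character rule stated above and does not enforce any length limit.

```python
import re

# Lowercase letters, digits, and underscores only; at least one character.
_TARGET_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_target_name(name):
    """Return True if the name uses only lowercase alphanumerics and underscores."""
    return _TARGET_NAME.fullmatch(name) is not None


for candidate in ("sales_db2", "Sales-DB", "orders.2024"):
    print(candidate, is_valid_target_name(candidate))
```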
