

# Using the Aurora PostgreSQL Limitless Database data loading utility
Using the Limitless Database data loading utility

Aurora provides a utility for loading data directly into Limitless Database from an Aurora PostgreSQL DB cluster or RDS for PostgreSQL DB instance.

You perform the following steps to use the data loading utility:

1. [Prerequisites](#limitless-load.prereqs)

1. [Preparing the source database](#limitless-load.source)

1. [Preparing the destination database](#limitless-load.destination)

1. [Creating database credentials](#limitless-load.users)

1. One of the following:
   + [Setting up database authentication and resource access using a script](limitless-load.script.md) (recommended)
   + [Setting up database authentication and resource access manually](limitless-load.manual.md)

1. [Loading data from an Aurora PostgreSQL DB cluster or RDS for PostgreSQL DB instance](limitless-load.data.md)

## Limitations


The data loading utility has the following limitations:
+ The following data types aren't supported: `enum`, `ARRAY`, `BOX`, `CIRCLE`, `LINE`, `LSEG`, `PATH`, `PG_LSN`, `PG_SNAPSHOT`, `POLYGON`, `TSQUERY`, `TSVECTOR`, and `TXID_SNAPSHOT`.
+ Leading zeroes (`0`) are stripped from the `VARBIT` data type during loading.
+ Data migration fails when there are foreign keys on the destination tables.
+ The data loading utility supports the following source configurations for Amazon RDS for PostgreSQL Multi-AZ DB clusters:
  + Primary instance
    + Supported modes: `snapshot`, `snapshot_then_cdc`
  + Replica instance
    + Supported mode: `snapshot` only
      + Requirement: `hot_standby_feedback` must be enabled
    + Not supported: `snapshot_then_cdc`

## Prerequisites


The data loading utility has the following prerequisites:
+ The source database runs Aurora PostgreSQL or RDS for PostgreSQL version 11.x or higher.
+ The source database is in the same AWS account and AWS Region as the destination DB shard group.
+ The source DB cluster or DB instance is in the `available` state.
+ Tables on the source database and limitless database have the same table names, column names, and column data types.
+ The source and destination tables have primary keys that use the same columns and column orders.
+ You must have an environment for connecting to a limitless database to run data loading commands. Available commands are the following:
  + `rds_aurora.limitless_data_load_start`
  + `rds_aurora.limitless_data_load_cancel`
+ For CDC:
  + Both the source database and the destination DB shard group must use the same DB subnet group, VPC security group, and database port. These setups are for network connections to both the source database and the routers in the DB shard group.
  + You must enable logical replication on the source database. The source database user must have privileges to use logical replication.

## Preparing the source database


To access the source database for data loading, you must allow incoming network traffic to it. Perform the following steps.

**To allow network traffic to the source database**

1. Sign in to the AWS Management Console and open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. Navigate to the **Security groups** page.

1. Choose the **Security group ID** for the security group used by the source DB cluster or instance.

   In this example, the security group ID is `sg-056a84f1712b77926`.

1. On the **Inbound rules** tab:

   1. Choose **Edit inbound rules**.

   1. Add a new inbound rule for the source DB cluster or instance:
      + Port range – Database port for the source database, usually `5432`
      + Security group ID – `sg-056a84f1712b77926` in this example  
![\[Add inbound rule for the source database.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/limitless_self_access_inbound_rule.png)

1. On the **Outbound rules** tab:

   1. Choose **Edit outbound rules**.

   1. Add a new outbound rule for the source DB cluster or instance:
      + Port range – `All traffic` (includes ports `0-65535`)
      + Security group ID – `sg-056a84f1712b77926` in this example  
![\[Add outbound rule for the source database.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/limitless_self_access_outbound_rule.png)

1. Sign in to the AWS Management Console and open the Amazon VPC console at [https://console.aws.amazon.com/vpc/](https://console.aws.amazon.com/vpc/).

1. Navigate to the **Network ACLs** page.

1. Make sure that the network ACL uses the default configuration, as outlined in [Default network ACL](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html#default-network-acl).

## Preparing the destination database


Follow the procedures in [Creating Aurora PostgreSQL Limitless Database tables](limitless-creating.md) to create the destination tables in the DB shard group.

Your destination tables must have the same schemas, table names, and primary keys as the source tables.

## Creating database credentials


You must create database users in the source and destination databases, and grant necessary privileges to the users. For more information, see [CREATE USER](https://www.postgresql.org/docs/current/sql-createuser.html) and [GRANT](https://www.postgresql.org/docs/current/sql-grant.html) in the PostgreSQL documentation.

### Create the source database credentials


The source database user is passed in the command to start loading. This user must have privileges to perform replication from the source database.

1. Use the database master user (or another user with the `rds_superuser` role) to create a source database user with `LOGIN` privileges.

   ```
   CREATE USER source_db_username WITH PASSWORD 'source_db_user_password';
   ```

1. Grant the `rds_superuser` role to your source database user.

   ```
   GRANT rds_superuser to source_db_username;
   ```

1. If you're using `full_load_and_cdc` mode, grant the `rds_replication` role to your source database user. The `rds_replication` role grants permissions to manage logical slots and to stream data using logical slots.

   ```
   GRANT rds_replication to source_db_username;
   ```

### Create the destination database credentials


The destination database user must have permission to write to the destination tables in the DB shard group.

1. Use the database master user (or another user with the `rds_superuser` role) to create a destination database user with `LOGIN` privileges.

   ```
   CREATE USER destination_db_username WITH PASSWORD 'destination_db_user_password';
   ```

1. Grant the `rds_superuser` role to your destination database user.

   ```
   GRANT rds_superuser to destination_db_username;
   ```

# Setting up database authentication and resource access using a script
Setting up access using a script

The setup script creates one customer-managed AWS KMS key, one AWS Identity and Access Management (IAM) role, and two AWS Secrets Manager secrets.

Perform the following steps to use the setup script:

1. Make sure that you have the AWS CLI installed and configured with your AWS account credentials.

1. Install the `jq` command-line JSON processor. For more information, see [jqlang/jq](https://github.com/jqlang/jq).

1. Copy the [data_loading_script.zip](samples/data_loading_script.zip) file to your computer, and extract the `data_load_aws_setup_script.sh` file from it.

1. Edit the script to replace the placeholder variables with the appropriate values for the following:
   + Your AWS account
   + The AWS Region
   + Source database credentials
   + Destination database credentials

1. Open a new terminal on your computer and run the following command:

   ```
   bash ./data_load_aws_setup_script.sh
   ```
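Before you run the script, you can sanity-check that no placeholder values remain in the variable section. The following is a small sketch (not part of the utility); the placeholder strings it searches for are the ones shipped in the script, and the demo runs against a temporary copy of an unedited variable section:

```shell
#!/bin/bash
# Count placeholder values left in a setup-script file. A nonzero count
# means the variable section still needs editing.
count_placeholders() {
  grep -cE '12-digit_AWS_account_ID|AWS_Region|db_username|db_password' "$1"
}

# Demo against an unedited variable section written to a temp file:
tmp=$(mktemp)
printf 'ACCOUNT_ID="12-digit_AWS_account_ID"\nREGION="AWS_Region"\n' > "$tmp"
count=$(count_placeholders "$tmp")
echo "placeholders remaining: $count"   # should be 0 for your edited script
rm -f "$tmp"
```

Run the same `grep` against your edited `data_load_aws_setup_script.sh`; a count of `0` means every placeholder was replaced.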

## Setup script for the data loading utility


We provide the text of the `data_load_aws_setup_script.sh` file here for reference.

```
#!/bin/bash
# Aurora Limitless data loading - AWS resources setup script #
# Set up the account credentials in advance. #
# Update the following script variables. #

###################################
#### Start of variable section ####

ACCOUNT_ID="12-digit_AWS_account_ID"
REGION="AWS_Region"
DATE=$(date +'%m%d%H%M%S')
RANDOM_SUFFIX="${DATE}"
SOURCE_SECRET_NAME="secret-source-${DATE}"
SOURCE_USERNAME="source_db_username"
SOURCE_PASSWORD="source_db_password"
DESTINATION_SECRET_NAME="secret-destination-${DATE}"
DESTINATION_USERNAME="destination_db_username"
DESTINATION_PASSWORD="destination_db_password"
DATA_LOAD_IAM_ROLE_NAME="aurora-data-loader-${RANDOM_SUFFIX}"
TMP_WORK_DIR="./tmp_data_load_aws_resource_setup/"

#### End of variable section ####
#################################

# Main logic start
echo "DATE - [${DATE}]"
echo "RANDOM_SUFFIX - [${RANDOM_SUFFIX}]"
echo 'START!'

mkdir -p $TMP_WORK_DIR

# Create the symmetric KMS key for encryption and decryption.
TMP_FILE_PATH="${TMP_WORK_DIR}tmp_create_key_response.txt"
aws kms create-key --region $REGION | tee $TMP_FILE_PATH
KMS_KEY_ARN=$(cat $TMP_FILE_PATH | jq -r '.KeyMetadata.Arn')
aws kms create-alias \
    --alias-name alias/"${DATA_LOAD_IAM_ROLE_NAME}-key" \
    --target-key-id $KMS_KEY_ARN \
    --region $REGION

# Create the source secret.
TMP_FILE_PATH="${TMP_WORK_DIR}tmp_create_source_secret_response.txt"
aws secretsmanager create-secret \
    --name $SOURCE_SECRET_NAME \
    --kms-key-id $KMS_KEY_ARN \
    --secret-string "{\"username\":\"$SOURCE_USERNAME\",\"password\":\"$SOURCE_PASSWORD\"}" \
    --region $REGION \
    | tee $TMP_FILE_PATH
SOURCE_SECRET_ARN=$(cat $TMP_FILE_PATH | jq -r '.ARN')

# Create the destination secret.
TMP_FILE_PATH="${TMP_WORK_DIR}tmp_create_destination_secret_response.txt"
aws secretsmanager create-secret \
    --name $DESTINATION_SECRET_NAME \
    --kms-key-id $KMS_KEY_ARN \
    --secret-string "{\"username\":\"$DESTINATION_USERNAME\",\"password\":\"$DESTINATION_PASSWORD\"}" \
    --region $REGION \
    | tee $TMP_FILE_PATH
DESTINATION_SECRET_ARN=$(cat $TMP_FILE_PATH | jq -r '.ARN')

# Create the RDS trust policy JSON file.
# Use only rds.amazonaws.com for RDS production use cases.
TRUST_POLICY_PATH="${TMP_WORK_DIR}rds_trust_policy.json"
echo '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "rds.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}' > $TRUST_POLICY_PATH

# Create the IAM role.
TMP_FILE_PATH="${TMP_WORK_DIR}tmp_create_iam_role_response.txt"
aws iam create-role \
    --role-name $DATA_LOAD_IAM_ROLE_NAME \
    --assume-role-policy-document "file://${TRUST_POLICY_PATH}" \
    --tags Key=assumer,Value=aurora_limitless_table_data_load \
    --region $REGION \
    | tee $TMP_FILE_PATH
IAM_ROLE_ARN=$(cat $TMP_FILE_PATH | jq -r '.Role.Arn')

# Create the permission policy JSON file.
PERMISSION_POLICY_PATH="${TMP_WORK_DIR}data_load_permission_policy.json"
permission_json_policy=$(cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Ec2Permission",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeNetworkInterfaces",
                "ec2:CreateNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DescribeNetworkInterfacePermissions",
                "ec2:ModifyNetworkInterfaceAttribute",
                "ec2:DescribeNetworkInterfaceAttribute",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeRegions",
                "ec2:DescribeVpcs",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeNetworkAcls"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SecretsManagerPermissions",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": [
                "$SOURCE_SECRET_ARN",
                "$DESTINATION_SECRET_ARN"
            ]
        },
        {
            "Sid": "KmsPermissions",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:DescribeKey",
                "kms:GenerateDataKey"
            ],
            "Resource": "$KMS_KEY_ARN"
        },
        {
            "Sid": "RdsPermissions",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBClusters",
                "rds:DescribeDBInstances"
            ],
            "Resource": "*"
        }
    ]
}
EOF
)
echo "$permission_json_policy" > "$PERMISSION_POLICY_PATH"

# Add the inline policy.
aws iam put-role-policy \
    --role-name $DATA_LOAD_IAM_ROLE_NAME \
    --policy-name aurora-limitless-data-load-policy \
    --policy-document "file://${PERMISSION_POLICY_PATH}" \
    --region $REGION

# Create the key policy JSON file.
KEY_POLICY_PATH="${TMP_WORK_DIR}data_load_key_policy.json"
key_json_policy=$(cat <<EOF
{
    "Id": "key-aurora-limitless-data-load-$RANDOM_SUFFIX",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::$ACCOUNT_ID:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "$IAM_ROLE_ARN"
            },
            "Action": [
                "kms:Decrypt",
                "kms:DescribeKey",
                "kms:GenerateDataKey"
            ],
            "Resource": "*"
        }
    ]
}
EOF
)
echo "$key_json_policy" > "$KEY_POLICY_PATH"

# Add the key policy.
TMP_FILE_PATH="${TMP_WORK_DIR}tmp_put_key_policy_response.txt"
sleep 10 # wait 10 seconds for the IAM role to become ready
aws kms put-key-policy \
    --key-id $KMS_KEY_ARN \
    --policy-name default \
    --policy "file://${KEY_POLICY_PATH}" \
    --region $REGION \
    | tee $TMP_FILE_PATH

echo 'DONE!'

echo "ACCOUNT_ID : [${ACCOUNT_ID}]"
echo "REGION : [${REGION}]"
echo "RANDOM_SUFFIX : [${RANDOM_SUFFIX}]"
echo "IAM_ROLE_ARN : [${IAM_ROLE_ARN}]"
echo "SOURCE_SECRET_ARN : [${SOURCE_SECRET_ARN}]"
echo "DESTINATION_SECRET_ARN : [${DESTINATION_SECRET_ARN}]"

# Example of a successful run:
# ACCOUNT_ID : [012345678912]
# REGION : [ap-northeast-1]
# RANDOM_SUFFIX : [0305000703]
# IAM_ROLE_ARN : [arn:aws:iam::012345678912:role/aurora-data-loader-0305000703]
# SOURCE_SECRET_ARN : [arn:aws:secretsmanager:ap-northeast-1:012345678912:secret:secret-source-0305000703-yQDtow]
# DESTINATION_SECRET_ARN : [arn:aws:secretsmanager:ap-northeast-1:012345678912:secret:secret-destination-0305000703-5d5Jy8]

# If you want to manually clean up failed resources,
# remove them in the following order:
# 1. IAM role:
# aws iam delete-role-policy --role-name Test-Role --policy-name ExamplePolicy --region us-east-1
# aws iam delete-role --role-name Test-Role --region us-east-1
# 2. Source and destination secrets:
# aws secretsmanager delete-secret --secret-id MyTestSecret --force-delete-without-recovery --region us-east-1
# 3. KMS key:
# aws kms schedule-key-deletion --key-id arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab --pending-window-in-days 7 --region us-east-1
```

## Output from the data loading utility setup script


The following example shows the output from a successful run of the script.

```
% bash ./data_load_aws_setup_script.sh 
DATE - [0305000703]
RANDOM_SUFFIX - [0305000703]
START!
{
    "KeyMetadata": {
        "AWSAccountId": "123456789012",
        "KeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
        "Arn": "arn:aws:kms:ap-northeast-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab",
        "CreationDate": "2024-03-05T00:07:49.852000+00:00",
        "Enabled": true,
        "Description": "",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyState": "Enabled",
        "Origin": "AWS_KMS",
        "KeyManager": "CUSTOMER",
        "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
        "KeySpec": "SYMMETRIC_DEFAULT",
        "EncryptionAlgorithms": [
            "SYMMETRIC_DEFAULT"
        ],
        "MultiRegion": false
    }
}
{
    "ARN": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:secret-source-0305000703-yQDtow",
    "Name": "secret-source-0305000703",
    "VersionId": "a017bebe-a71b-4220-b923-6850c2599c26"
}
{
    "ARN": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:secret-destination-0305000703-5d5Jy8",
    "Name": "secret-destination-0305000703",
    "VersionId": "32a1f989-6391-46b1-9182-f65d242f5eb6"
}
{
    "Role": {
        "Path": "/",
        "RoleName": "aurora-data-loader-0305000703",
        "RoleId": "AROAYPX63ITQOYORQSC6U",
        "Arn": "arn:aws:iam::123456789012:role/aurora-data-loader-0305000703",
        "CreateDate": "2024-03-05T00:07:54+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": [
                            "rds.amazonaws.com"
                        ]
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Tags": [
            {
                "Key": "assumer",
                "Value": "aurora_limitless_table_data_load"
            }
        ]
    }
}
DONE!
ACCOUNT_ID : [123456789012]
REGION : [ap-northeast-1]
RANDOM_SUFFIX : [0305000703]
IAM_ROLE_ARN : [arn:aws:iam::123456789012:role/aurora-data-loader-0305000703]
SOURCE_SECRET_ARN : [arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:secret-source-0305000703-yQDtow]
DESTINATION_SECRET_ARN : [arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:secret-destination-0305000703-5d5Jy8]
```
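You need the IAM role and secret ARNs from these summary lines later, when you start the data load. If you lose the terminal output, the ARNs can be recovered from a saved copy of the summary with a small `sed` sketch (the sample line below is copied from the example run above):

```shell
# Extract an ARN from a saved summary line of the setup script's output.
summary='SOURCE_SECRET_ARN : [arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:secret-source-0305000703-yQDtow]'
# Capture everything between the square brackets on the matching line.
arn=$(echo "$summary" | sed -n 's/^SOURCE_SECRET_ARN : \[\(.*\)\]$/\1/p')
echo "$arn"
```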

## Cleaning up failed resources


If you want to clean up failed resources manually, remove them in the following order:

1. IAM role, for example:

   ```
   aws iam delete-role-policy \
   --role-name Test-Role \
   --policy-name ExamplePolicy
   
   aws iam delete-role \
   --role-name Test-Role
   ```

1. Source and destination secrets, for example:

   ```
   aws secretsmanager delete-secret \
   --secret-id MyTestSecret \
   --force-delete-without-recovery
   ```

1. KMS key, for example:

   ```
   aws kms schedule-key-deletion \
   --key-id arn:aws:kms:us-west-2:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab \
   --pending-window-in-days 7
   ```

Then you can retry the script.

# Setting up database authentication and resource access manually
Setting up access manually

The manual process for setting up database authentication and resource access has the following steps:

1. [Creating the customer-managed AWS KMS key](#limitless-load.auth.create-kms)

1. [Adding the IAM role permission policies](#limitless-load.auth.iam-policy)

1. [Creating the database secrets](#limitless-load.auth.secrets)

1. [Creating the IAM role](#limitless-load.auth.iam-role)

1. [Updating the customer-managed AWS KMS key](#limitless-load.auth.update-kms)

This process is an alternative to [Setting up database authentication and resource access using a script](limitless-load.script.md) and performs the same tasks manually. We recommend using the script.

## Creating the customer-managed AWS KMS key
Creating the KMS key

Follow the procedures in [Creating symmetric encryption keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html#create-symmetric-cmk) to create a customer-managed KMS key. You can also use an existing key if it meets these requirements.

**To create a customer-managed KMS key**

1. Sign in to the AWS Management Console and open the AWS KMS console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. Navigate to the **Customer managed keys** page.

1. Choose **Create key**.

1. On the **Configure key** page:

   1. For **Key type**, select **Symmetric**.

   1. For **Key usage**, select **Encrypt and decrypt**.

   1. Choose **Next**.

1. On the **Add labels** page, enter an **Alias** such as **limitless**, then choose **Next**.

1. On the **Define key administrative permissions** page, make sure that the **Allow key administrators to delete this key** check box is selected, then choose **Next**.

1. On the **Define key usage permissions** page, choose **Next**.

1. On the **Review** page, choose **Finish**.

   You update the key policy later.

Record the Amazon Resource Name (ARN) of the KMS key to use in [Adding the IAM role permission policies](#limitless-load.auth.iam-policy).

For information on using the AWS CLI to create the customer-managed KMS key, see [create-key](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kms/create-key.html) and [create-alias](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kms/create-alias.html).

## Creating the database secrets


To allow the data loading utility to access the source and destination database tables, you create two secrets in AWS Secrets Manager: one for the source database and one for the destination database. These secrets store the usernames and passwords for accessing the source and destination databases.

Follow the procedures in [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) to create the key-value pair secrets.

**To create the database secrets**

1. Open the Secrets Manager console at [https://console.aws.amazon.com/secretsmanager/](https://console.aws.amazon.com/secretsmanager/).

1. Choose **Store a new secret**.

1. On the **Choose secret type** page:

   1. For **Secret type**, select **Other type of secret**.

   1. For **Key/value pairs**, choose the **Plaintext** tab.

   1. Enter the following JSON code, where `sourcedbreader` and `sourcedbpassword` are the credentials for the source database user from [Create the source database credentials](limitless-load.utility.md#limitless-load.users.source).

      ```
      {
          "username":"sourcedbreader",
          "password":"sourcedbpassword"
      }
      ```

   1. For **Encryption key**, choose the KMS key that you created in [Creating the customer-managed AWS KMS key](#limitless-load.auth.create-kms), for example `limitless`.

   1. Choose **Next**.

1. On the **Configure secret** page, enter a **Secret name**, such as **source_DB_secret**, then choose **Next**.

1. On the **Configure rotation - *optional*** page, choose **Next**.

1. On the **Review** page, choose **Store**.

1. Repeat the procedure for the destination database secret:

   1. Enter the following JSON code, where `destinationdbwriter` and `destinationdbpassword` are the credentials for the destination database user from [Create the destination database credentials](limitless-load.utility.md#limitless-load.users.destination).

      ```
      {
          "username":"destinationdbwriter",
          "password":"destinationdbpassword"
      }
      ```

   1. Enter a **Secret name**, such as **destination_DB_secret**.

Record the ARNs of the secrets to use in [Adding the IAM role permission policies](#limitless-load.auth.iam-policy).
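If you prefer the AWS CLI to the console for this step, the same key-value secret can be assembled in shell and passed to `aws secretsmanager create-secret --secret-string`, as the setup script does. The following is a minimal sketch using the placeholder credentials from this topic:

```shell
# Build the secret value JSON from shell variables; printf keeps the
# quoting intact. These credentials are placeholder values.
username="sourcedbreader"
password="sourcedbpassword"
secret_json=$(printf '{"username":"%s","password":"%s"}' "$username" "$password")
echo "$secret_json"
```

Pass the result as `--secret-string "$secret_json"`, together with `--kms-key-id` for the KMS key that you created.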

## Creating the IAM role


Data loading requires you to provide access to AWS resources. To provide access, you create the `aurora-data-loader` IAM role by following the procedures in [Creating a role to delegate permissions to an IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html).

**To create the IAM role**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Navigate to the **Roles** page.

1. Choose **Create role**.

1. On the **Select trusted entity** page:

   1. For **Trusted entity type**, select **Custom trust policy**.

   1. Enter the following JSON code for the custom trust policy:

      ```
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "Service": [
                          "rds.amazonaws.com"
                      ]
                  },
                  "Action": "sts:AssumeRole"
              }
          ]
      }
      ```


   1. Choose **Next**.

1. On the **Add permissions** page, choose **Next**.

1. On the **Name, review, and create** page:

   1. For **Role name**, enter **aurora-data-loader** or another name that you prefer.

   1. Choose **Add tag**, and enter the following tag:
      + **Key**: **assumer**
      + **Value**: **aurora_limitless_table_data_load**
**Important**  
The Aurora PostgreSQL Limitless Database can only assume an IAM role that has this tag.

   1. Choose **Create role**.

## Updating the customer-managed AWS KMS key
Updating the KMS key

Follow the procedures in [Changing a key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-modifying.html) to add the IAM role `aurora-data-loader` to the default key policy.

**To add the IAM role to the key policy**

1. Sign in to the AWS Management Console and open the AWS KMS console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. Navigate to the **Customer managed keys** page.

1. Choose the KMS key that you created in [Creating the customer-managed AWS KMS key](#limitless-load.auth.create-kms), for example `limitless`.

1. On the **Key policy** tab, for **Key users**, choose **Add**.

1. In the **Add key users** window, select the name of the IAM role that you created in [Creating the IAM role](#limitless-load.auth.iam-role), for example **aurora-data-loader**.

1. Choose **Add**.

## Adding the IAM role permission policies


You must add permission policies to the IAM role that you created. This allows the Aurora PostgreSQL Limitless Database data loading utility to access related AWS resources for building network connections and retrieving the source and destination DB credential secrets.

For more information, see [Modifying a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage_modify.html#roles-modify_gen-policy).

**To add the permission policies**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Navigate to the **Roles** page.

1. Choose the IAM role that you created in [Creating the IAM role](#limitless-load.auth.iam-role), for example **aurora-data-loader**.

1. On the **Permissions** tab, for **Permissions policies** choose **Add permissions**, then **Create inline policy**.

1. On the **Specify permissions** page, choose the **JSON** editor.

1. Copy and paste the following template into the JSON editor, replacing the placeholders with the ARNs for your database secrets and KMS key.

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "Ec2Permission",
               "Effect": "Allow",
               "Action": [
                   "ec2:DescribeNetworkInterfaces",
                   "ec2:CreateNetworkInterface",
                   "ec2:DeleteNetworkInterface",
                   "ec2:CreateNetworkInterfacePermission",
                   "ec2:DeleteNetworkInterfacePermission",
                   "ec2:DescribeNetworkInterfacePermissions",
                   "ec2:ModifyNetworkInterfaceAttribute",
                   "ec2:DescribeNetworkInterfaceAttribute",
                   "ec2:DescribeAvailabilityZones",
                   "ec2:DescribeRegions",
                   "ec2:DescribeVpcs",
                   "ec2:DescribeSubnets",
                   "ec2:DescribeSecurityGroups",
                   "ec2:DescribeNetworkAcls"
               ],
               "Resource": "*"
           },
           {
               "Sid": "SecretsManagerPermissions",
               "Effect": "Allow",
               "Action": [
                   "secretsmanager:GetSecretValue",
                   "secretsmanager:DescribeSecret"
               ],
               "Resource": [
                   "arn:aws:secretsmanager:us-east-1:123456789012:secret:source_DB_secret-ABC123",
                   "arn:aws:secretsmanager:us-east-1:123456789012:secret:destination_DB_secret-456DEF"
               ]
           },
           {
               "Sid": "KmsPermissions",
               "Effect": "Allow",
               "Action": [
                   "kms:Decrypt",
                   "kms:DescribeKey",
                   "kms:GenerateDataKey"
               ],
               "Resource": "arn:aws:kms:us-east-1:123456789012:key/aa11bb22-####-####-####-fedcba123456"
           },
           {
               "Sid": "RdsPermissions",
               "Effect": "Allow",
               "Action": [
                   "rds:DescribeDBClusters",
                   "rds:DescribeDBInstances"
               ],
               "Resource": "*"
           }
       ]
   }
   ```


1. Check for errors and correct them.

1. Choose **Next**.

1. On the **Review and create** page, enter a **Policy name** such as **data_loading_policy**, then choose **Create policy**.

# Loading data from an Aurora PostgreSQL DB cluster or RDS for PostgreSQL DB instance
Loading data into Limitless Database

After you complete the resource and authentication setup, connect to the cluster endpoint and call the `rds_aurora.limitless_data_load_start` stored procedure from a limitless database, such as `postgres_limitless`. The limitless database is a database on the DB shard group into which you want to migrate data.

This procedure connects asynchronously in the background to the source database that you specify, reads the data from the source, and loads it onto the shards. For better performance, the data is loaded using parallel threads. The procedure retrieves a point-in-time table snapshot by running a `SELECT` command on the tables specified in the call.

You can load data into sharded, reference, and standard tables.

You can load data at the database, schema, or table level in `rds_aurora.limitless_data_load_start` calls.
+ Database – You can load one database at a time in each call, with no limit on the schema or table count within the database.
+ Schema – You can load a maximum of 15 schemas in each call, with no limit on the table count within each schema.
+ Table – You can load a maximum of 15 tables in each call.

**Note**  
This feature doesn't use Amazon RDS snapshots or point-in-time isolation of the database. For consistency across tables, we recommend cloning the source database and pointing to that cloned database as the source.

The stored procedure uses the following syntax:

```
CALL rds_aurora.limitless_data_load_start('source_type',
    'source_DB_cluster_or_instance_ID',
    'source_database_name',
    'streaming_mode',
    'data_loading_IAM_role_arn',
    'source_DB_secret_arn',
    'destination_DB_secret_arn',
    'ignore_primary_key_conflict_boolean_flag',
    'is_dry_run',
    (optional parameter) schemas/tables => ARRAY['name1', 'name2', ...]);
```

The input parameters are the following:
+ `source_type` – The source type: `aurora_postgresql` or `rds_postgresql`
+ `source_DB_cluster_or_instance_ID` – The source Aurora PostgreSQL DB cluster identifier or RDS for PostgreSQL DB instance identifier
+ `source_database_name` – The source database name, such as *postgres*
+ `streaming_mode` – Whether to include change data capture (CDC): `full_load` or `full_load_and_cdc`
+ `data_loading_IAM_role_arn` – The IAM role Amazon Resource Name (ARN) for `aurora-data-loader`
+ `source_DB_secret_arn` – The source DB secret ARN
+ `destination_DB_secret_arn` – The destination DB secret ARN
+ `ignore_primary_key_conflict_boolean_flag` – Whether to continue if a primary key conflict occurs:
  + If set to `true`, data loading ignores new changes for rows with a primary key conflict.
  + If set to `false`, data loading overwrites the existing rows on destination tables when it encounters a primary key conflict.
+ `is_dry_run` – Whether to test that the data loading job can connect to the source and destination databases:
  + If set to `true`, tests the connections without loading data
  + If set to `false`, loads the data
+ (optional) `schemas` or `tables` – An array of schemas or tables to load. You can specify either of the following:
  + A list of tables in the format `tables => ARRAY['schema1.table1', 'schema1.table2', 'schema2.table1', ...]`
  + A list of schemas in the format `schemas => ARRAY['schema1', 'schema2', ...]`

  If you don't include this parameter, the entire specified source database is migrated.

The output is the job ID, returned with an informational message.

The following example shows how to use the `rds_aurora.limitless_data_load_start` stored procedure to load data from an Aurora PostgreSQL DB cluster.

```
CALL rds_aurora.limitless_data_load_start('aurora_postgresql',
    'my-db-cluster',
    'postgres',
    'full_load_and_cdc',
    'arn:aws:iam::123456789012:role/aurora-data-loader-8f2c66',
    'arn:aws:secretsmanager:us-east-1:123456789012:secret:secret-source-8f2c66-EWrr0V',
    'arn:aws:secretsmanager:us-east-1:123456789012:secret:secret-destination-8f2c66-d04fbD',
    'true',
    'false',
    tables => ARRAY['public.customer', 'public.order', 'public.orderdetails']);

INFO: limitless data load job id 1688761223647 is starting.
```
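Before starting an actual load, you can validate connectivity by setting the `is_dry_run` parameter to `'true'`. The following sketch reuses the cluster identifier, role ARN, secret ARNs, and table names from the preceding example, which are placeholders for your own resources:

```
CALL rds_aurora.limitless_data_load_start('aurora_postgresql',
    'my-db-cluster',
    'postgres',
    'full_load_and_cdc',
    'arn:aws:iam::123456789012:role/aurora-data-loader-8f2c66',
    'arn:aws:secretsmanager:us-east-1:123456789012:secret:secret-source-8f2c66-EWrr0V',
    'arn:aws:secretsmanager:us-east-1:123456789012:secret:secret-destination-8f2c66-d04fbD',
    'true',
    'true',
    tables => ARRAY['public.customer', 'public.order', 'public.orderdetails']);
```

Because `is_dry_run` is `'true'`, the call tests the source and destination connections without loading any data. Rerun the call with `is_dry_run` set to `'false'` to perform the load.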

# Monitoring data loading


Aurora PostgreSQL Limitless Database provides several ways to monitor data loading jobs:
+ [Listing data loading jobs](#limitless-load.monitor-list)
+ [Viewing details of data loading jobs using the job ID](#limitless-load.monitor-describe)
+ [Monitoring the Amazon CloudWatch log group](#limitless-load.monitor-cwl)
+ [Monitoring RDS events](#limitless-load.monitor-events)

## Listing data loading jobs


You can connect to the cluster endpoint and use the `rds_aurora.limitless_data_load_jobs` view to list data loading jobs.

```
postgres_limitless=> SELECT * FROM rds_aurora.limitless_data_load_jobs LIMIT 6;

    job_id     |  status   | message |     source_db_identifier      | source_db_name | full_load_complete_time |                                                                progress_details                                                                 |       start_time       |   last_updated_time    |  streaming_mode   | source_engine_type | ignore_primary_key_conflict | is_dryrun 
---------------+-----------+---------+-------------------------------+----------------+-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+------------------------+-------------------+--------------------+-----------------------------+-----------
 1725697520693 | COMPLETED |         | persistent-kdm-auto-source-01 | postgres       | 2024-09-07 08:48:15+00  | {"FULL_LOAD": {"STATUS": "COMPLETED", "DETAILS": "9 of 9 tables loaded", "COMPLETED_AT": "2024/09/07 08:48:15+00", "RECORDS_MIGRATED": 600003}} | 2024-09-07 08:47:13+00 | 2024-09-07 08:48:15+00 | full_load         | aurora_postgresql  | t                           | f
 1725696114225 | COMPLETED |         | persistent-kdm-auto-source-01 | postgres       | 2024-09-07 08:24:20+00  | {"FULL_LOAD": {"STATUS": "COMPLETED", "DETAILS": "3 of 3 tables loaded", "COMPLETED_AT": "2024/09/07 08:24:20+00", "RECORDS_MIGRATED": 200001}} | 2024-09-07 08:23:56+00 | 2024-09-07 08:24:20+00 | full_load         | aurora_postgresql  | t                           | f
 1725696067630 | COMPLETED |         | persistent-kdm-auto-source-01 | postgres       | 2024-09-07 08:23:45+00  | {"FULL_LOAD": {"STATUS": "COMPLETED", "DETAILS": "6 of 6 tables loaded", "COMPLETED_AT": "2024/09/07 08:23:45+00", "RECORDS_MIGRATED": 400002}} | 2024-09-07 08:23:10+00 | 2024-09-07 08:23:45+00 | full_load         | aurora_postgresql  | t                           | f
 1725694221753 | CANCELED  |         | persistent-kdm-auto-source-01 | postgres       |                         | {}                                                                                                                                              | 2024-09-07 07:31:18+00 | 2024-09-07 07:51:49+00 | full_load_and_cdc | aurora_postgresql  | t                           | f
 1725691698210 | COMPLETED |         | persistent-kdm-auto-source-01 | postgres       | 2024-09-07 07:10:51+00  | {"FULL_LOAD": {"STATUS": "COMPLETED", "DETAILS": "1 of 1 tables loaded", "COMPLETED_AT": "2024/09/07 07:10:51+00", "RECORDS_MIGRATED": 100000}} | 2024-09-07 07:10:42+00 | 2024-09-07 07:10:52+00 | full_load         | aurora_postgresql  | t                           | f
 1725691695049 | COMPLETED |         | persistent-kdm-auto-source-01 | postgres       | 2024-09-07 07:10:48+00  | {"FULL_LOAD": {"STATUS": "COMPLETED", "DETAILS": "1 of 1 tables loaded", "COMPLETED_AT": "2024/09/07 07:10:48+00", "RECORDS_MIGRATED": 100000}} | 2024-09-07 07:10:41+00 | 2024-09-07 07:10:48+00 | full_load         | aurora_postgresql  | t                           | f
(6 rows)
```

Job records are deleted after 90 days.
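To narrow the output to jobs that are still in progress, you can filter on the `status` column. This is an ordinary `SELECT` on the view; the sketch below assumes the status values shown elsewhere in this topic (`RUNNING`, `COMPLETED`, `CANCELED`):

```
SELECT job_id, status, streaming_mode, start_time
FROM rds_aurora.limitless_data_load_jobs
WHERE status NOT IN ('COMPLETED', 'CANCELED')
ORDER BY start_time DESC;
```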

## Viewing details of data loading jobs using the job ID


If you know a job ID, you can connect to the cluster endpoint and use the `rds_aurora.limitless_data_load_job_details` view to see the details of that data loading job, including the table name, job status, and number of rows loaded. You can get the job ID from the response to the `rds_aurora.limitless_data_load_start` call, or from the `rds_aurora.limitless_data_load_jobs` view.

```
postgres_limitless=> SELECT * FROM rds_aurora.limitless_data_load_job_details WHERE job_id='1725696114225';

job_id        | destination_table_name | destination_schema_name | start_time             | status    | full_load_rows | full_load_total_rows | full_load_complete_time | cdc_insert | cdc_update | cdc_delete
--------------+------------------------+-------------------------+------------------------+-----------+----------------+----------------------+-------------------------+------------+------------+------------
1725696114225 | standard_1             | public                  | 2024-09-07 08:23:57+00 | COMPLETED | 100000         | 100000               | 2024-09-07 08:24:08+00  | 0          | 0          | 0
1725696114225 | standard_2             | public                  | 2024-09-07 08:24:08+00 | COMPLETED | 100000         | 100000               | 2024-09-07 08:24:17+00  | 0          | 0          | 0
1725696114225 | standard_3             | public                  | 2024-09-07 08:24:18+00 | COMPLETED | 1              | 1                    | 2024-09-07 08:24:20+00  | 0          | 0          | 0
1725696114225 | standard_4             | public                  | 2024-09-07 08:23:58+00 | PENDING   | 0              | 0                    |                         | 0          | 0          | 0
(4 rows)
```

Job records are deleted after 90 days.
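You can also summarize a job's progress across all of its tables instead of reading row by row. The following sketch aggregates the details view for the job ID used in the previous example:

```
SELECT status,
       count(*)            AS table_count,
       sum(full_load_rows) AS rows_loaded
FROM rds_aurora.limitless_data_load_job_details
WHERE job_id = '1725696114225'
GROUP BY status;
```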

## Monitoring the Amazon CloudWatch log group


After the data loading job status changes to `RUNNING`, you can check the runtime progress using Amazon CloudWatch Logs.

**To monitor CloudWatch log streams**

1. Sign in to the AWS Management Console and open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Navigate to **Logs**, then **Log groups**.

1. Choose the **/aws/rds/aurora-limitless-database** log group.

1. Search for the log stream of your data loading job by **job\_id**.

   The log stream name has the pattern **Data-Load-Job-*job\_id***.

1. Choose the log stream to see the log events.

Each log stream shows events containing the job status and the number of rows loaded to the Aurora PostgreSQL Limitless Database destination tables. If a data loading job fails, an error log is also created that shows the failure status and the reason.

Job records are deleted after 90 days.

## Monitoring RDS events


The data loading job also publishes RDS events, for example when a job succeeds, fails, or is canceled. You can view the events from the destination database.

For more information, see [DB shard group events](USER_Events.Messages.md#USER_Events.Messages.shard-group).

# Canceling data loading


To cancel a data loading job, call the `rds_aurora.limitless_data_load_cancel` stored procedure, with the job ID as the input parameter. You call this stored procedure from the same database in the DB shard group where the specific data loading job was started. For example:

```
CALL rds_aurora.limitless_data_load_cancel(12345);

INFO: limitless data load job with id 12345 is canceling without rollback.
```

You can't cancel a data loading job that doesn't exist or isn't running in the same DB shard group.

The Aurora PostgreSQL Limitless Database data loading utility leaves loaded data in the destination tables without rollback, as the response shows. If you don’t want to keep the loaded data, we recommend truncating the destination tables.
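For example, if a canceled job had partially loaded the tables from the earlier example, you could discard that data with a `TRUNCATE` on the destination. The table names below are the hypothetical ones used previously:

```
-- "order" must be double-quoted here because ORDER is a reserved SQL keyword
TRUNCATE TABLE public.customer, public."order", public.orderdetails;
```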