

# Bulk import data into a graph
<a name="bulk-import"></a>

 The task system in Neptune Analytics provides a powerful and flexible way to bulk import data into your graph. The `import` task is specifically designed to handle large-scale data ingestion from various data [formats](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/loading-data-formats.html). 

 To initiate a bulk data import, you would first create an import task by specifying the data source, the target graph, and any necessary configuration options. This can be done through the AWS console or programmatically via the API. 

 Throughout the import process, you can monitor the progress of the import task through the user interface or via API calls. Progress reports and any errors or warnings are published to your CloudWatch account, allowing for close monitoring and [troubleshooting](bulk-import-troubleshooting.md) if needed. 

 Data import through an import task is supported in two ways: 
+  During graph creation: [Create a graph from Amazon S3, a Neptune cluster, or a snapshot](bulk-import-into-a-graph.md) 
+  On an existing empty graph: [Bulk import data into an existing Neptune Analytics graph](loading-data-existing-graph.md) 

# Create a graph from Amazon S3, a Neptune cluster, or a snapshot
<a name="bulk-import-into-a-graph"></a>

 You can create a Neptune Analytics graph directly from Amazon S3 or from Neptune using the [CreateGraphUsingImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html) API. This is recommended for importing large graphs from files in Amazon S3 (>50GB of data), importing from existing Neptune clusters, or importing from existing Neptune snapshots. This API automatically analyzes the data, provisions a new graph based on the analysis, and imports data as one atomic operation using maximum available resources. 

**Note**  
 The graph is made available for querying only after the data loading is completed successfully. 

 If errors are encountered during the import process, Neptune Analytics will automatically roll back the provisioned resources, and perform the cleanup. No manual cleanup actions are needed. Error details are available in the CloudWatch logs. See [troubleshooting](bulk-import-troubleshooting.md) for more details. 

**Topics**
+ [Creating a Neptune Analytics graph from Amazon S3](bulk-import-create-from-s3.md)
+ [Creating a Neptune Analytics graph from Neptune cluster or snapshot](bulk-import-create-from-neptune.md)

# Creating a Neptune Analytics graph from Amazon S3
<a name="bulk-import-create-from-s3"></a>

 Neptune Analytics supports bulk importing of CSV, ntriples, and Parquet data directly from Amazon S3 into a Neptune Analytics graph using the `CreateGraphUsingImportTask` API. The data formats supported are listed in [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md). It is recommended that you try the batch load process with a subset of your data first to validate that it is correctly formatted. Once you have validated that your data files are fully compatible with Neptune Analytics, you can prepare your full dataset and perform the bulk import using the steps below. 

 A quick summary of steps needed to import a graph from Amazon S3: 
+  [Copy the data files to an Amazon S3 bucket](#create-bucket-copy-data): Copy the data files to an Amazon Simple Storage Service bucket in the same region where you want the Neptune Analytics graph to be created. See [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md) for the details of the format when loading data from Amazon S3 into Neptune Analytics. 
+  [Create your IAM role for Amazon S3 access](#create-iam-role-for-s3-access): Create an IAM role with `read` and `list` access to the bucket and a trust relationship that allows Neptune Analytics graphs to use your IAM role for importing. 
+  Use the `CreateGraphUsingImportTask` API to import from Amazon S3: Create a graph using the `CreateGraphUsingImportTask` API. This will generate a `taskId` for the operation. 
+  Use the `GetImportTask` API to get the details of the import task. The response will indicate the status of the task (for example, `INITIALIZING`, `ANALYZING_DATA`, or `IMPORTING`). 
+  Once the task has completed successfully, you will see a `COMPLETED` status for the import task and also the `graphId` for the newly created graph. 
+  Use the `GetGraphs` API to fetch all the details about your new graph, including the ARN, endpoint, etc. 
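The steps above can be sketched programmatically. The following is a minimal Python sketch; the role ARN, bucket path, and the set of accepted format names are illustrative assumptions, and the boto3 call is shown only as a comment:

```python
# Minimal sketch: assemble a CreateGraphUsingImportTask request.
# All ARNs and the accepted format names below are illustrative assumptions.

def build_create_request(graph_name, s3_source, data_format, role_arn):
    """Return keyword arguments for create_graph_using_import_task."""
    if data_format not in ("CSV", "OPEN_CYPHER", "NTRIPLES", "PARQUET"):
        raise ValueError(f"unsupported format: {data_format}")
    return {
        "graphName": graph_name,
        "source": s3_source,
        "format": data_format,
        "roleArn": role_arn,
    }

request = build_create_request(
    "graph-1",
    "s3://bucket-name/gremlin-format-dataset/",
    "CSV",
    "arn:aws:iam::123456789012:role/NeptuneAnalyticsImportRole",
)

# With boto3, the call would look roughly like (not executed here):
# client = boto3.client("neptune-graph")
# task = client.create_graph_using_import_task(**request)
# task_id = task["taskId"]
```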

**Note**  
 If you're creating a private graph endpoint, the following permissions are required:   
+ `ec2:CreateVpcEndpoint`
+ `ec2:DescribeAvailabilityZones`
+ `ec2:DescribeSecurityGroups`
+ `ec2:DescribeSubnets`
+ `ec2:DescribeVpcAttribute`
+ `ec2:DescribeVpcEndpoints`
+ `ec2:DescribeVpcs`
+ `ec2:ModifyVpcEndpoint`
+ `route53:AssociateVPCWithHostedZone`
 For more information about required permissions, see [ Actions defined by Neptune Analytics](https://docs.aws.amazon.com//service-authorization/latest/reference/list_amazonneptuneanalytics.html#amazonneptuneanalytics-actions-as-permissions). 

## Copy the data files to an Amazon S3 bucket
<a name="create-bucket-copy-data"></a>

 The Amazon S3 bucket must be in the same AWS region as the graph that loads the data. You can use the following AWS CLI command to copy the files to the bucket. 

```
aws s3 cp data-file-name s3://bucket-name/object-key-name
```

**Note**  
 In Amazon S3, an object key name is the entire path of a file, including the file name.   
 In the command   

```
aws s3 cp datafile.txt s3://examplebucket/mydirectory/datafile.txt
```
 the object key name is `mydirectory/datafile.txt` 
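As an illustration, the object key can be derived from a full `s3://` URI with a small Python helper (this mirrors the simple rule described above; it is not an official API):

```python
# Sketch: everything after the bucket name in an s3:// URI is the object key.
def object_key(s3_uri):
    without_scheme = s3_uri.removeprefix("s3://")
    _bucket, _, key = without_scheme.partition("/")
    return key

print(object_key("s3://examplebucket/mydirectory/datafile.txt"))
# mydirectory/datafile.txt
```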

 You can also use the AWS management console to upload files to the Amazon S3 bucket. Open the Amazon S3 [console](https://console.aws.amazon.com/s3/), and choose a bucket. In the upper-left corner, choose **Upload** to upload files. 

## Create your IAM role for Amazon S3 access
<a name="create-iam-role-for-s3-access"></a>

 Create an IAM role with permissions to `read` and `list` the contents of your bucket. Add a trust relationship that allows Neptune Analytics to assume this role for doing the import task. You could do this using the AWS console, or through the CLI/SDK. 

1.  Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Choose **Roles**, and then choose **Create Role**. 

1.  Provide a role name. 

1.  Choose **Amazon S3** as the AWS service. 

1.  In the **permissions** section, choose `AmazonS3ReadOnlyAccess`. 
**Note**  
 This policy grants `s3:Get*` and `s3:List*` permissions to all buckets. Later steps restrict access to the role using the trust policy. The loader only requires `s3:Get*` and `s3:List*` permissions on the bucket you are loading from, so you can also restrict these permissions by the Amazon S3 resource. If your Amazon S3 bucket is encrypted, you need to add the `kms:Decrypt` permission as well; `kms:Decrypt` is needed for data exported from Neptune Database. 

1.  On the **Trust Relationships** tab, choose **Edit trust relationship**, and paste the following trust policy. Choose **Save** to save the trust relationship. 

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "neptune-graph.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

Your IAM role is now ready for import.
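If you create the role with the CLI or SDK instead of the console, the same trust policy can be assembled programmatically. A minimal Python sketch (the file name and role name you would pass to IAM are your own choices):

```python
import json

# Sketch: build the trust policy shown above, suitable for
# `aws iam create-role --assume-role-policy-document file://trust.json`.
def neptune_trust_policy():
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["neptune-graph.amazonaws.com"]},
                "Action": "sts:AssumeRole",
            }
        ],
    }

trust_json = json.dumps(neptune_trust_policy(), indent=2)
```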

## Use the CreateGraphUsingImportTask API to import from Amazon S3
<a name="use-createGraphUsingImportTask-to-import"></a>

 You can perform this operation from the Neptune console as well as from the AWS CLI/SDK. For more information about the parameters, see [CreateGraphUsingImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html). 

**Via CLI/SDK**

```
aws neptune-graph create-graph-using-import-task \
  --graph-name <name> \
  --format <format> \
  --source <s3 path> \
  --role-arn <role arn> \
  [--blank-node-handling "convertToIri"] \
  [--fail-on-error | --no-fail-on-error] \
  [--deletion-protection | --no-deletion-protection] \
  [--public-connectivity | --no-public-connectivity] \
  [--min-provisioned-memory] \
  [--max-provisioned-memory] \
  [--vector-search-configuration]
```
+  **Different Minimum and Maximum Provisioned Memory**: When different `--min-provisioned-memory` and `--max-provisioned-memory` values are specified, the graph is created with the maximum provisioned memory specified by `--max-provisioned-memory`. 
+  **Single Provisioned Memory Value**: When only one of `--min-provisioned-memory` or `--max-provisioned-memory` is provided, the graph is created with the specified memory value. 
+  **No Provisioned Memory Values**: If neither `--min-provisioned-memory` nor `--max-provisioned-memory` is provided, the graph is created with a default provisioned memory of 128 m-NCU (memory optimized Neptune Compute Units). 
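The memory rules above can be summarized in a small Python sketch. This is only an illustration of the documented behavior; the service itself determines the final provisioned size:

```python
DEFAULT_M_NCU = 128  # documented default when neither value is provided

def provisioned_memory(min_mem=None, max_mem=None):
    """Mirror the documented min/max provisioned-memory rules."""
    if min_mem is not None and max_mem is not None:
        return max_mem       # maximum wins when both are specified
    if min_mem is not None:
        return min_mem       # a single value is used as-is
    if max_mem is not None:
        return max_mem
    return DEFAULT_M_NCU
```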

 Example 1: Create a graph from Amazon S3, with no min/max provisioned memory. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV
```

 Example 2: Create a graph from Amazon S3, with min & max provisioned memory. A graph with m-NCU of 1024 is created. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --min-provisioned-memory 128 \
  --max-provisioned-memory 1024
```

Example 3: Create a graph from Amazon S3, and not fail on parsing errors.

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --no-fail-on-error
```

Example 4: Create a graph from Amazon S3, with 2 replicas.

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --replica-count 2
```

Example 5: Create a graph from Amazon S3 with vector search index.

**Note**  
 The `dimension` must match the dimension of the embeddings in the vertex files. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --replica-count 2 \
  --vector-search-configuration "{\"dimension\":768}"
```
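Before importing with a vector search configuration, it may help to sanity-check the embedding dimension in your vertex files against the configured value. A hedged Python sketch follows; it assumes the embeddings appear as `;`-separated numbers in a single column, which you should verify against your own files:

```python
# Sketch: check that an embedding value has the declared dimension (e.g. 768).
# The ';' separator is an assumption about the file layout, not a guarantee.
def embedding_matches(embedding_value, expected_dim, sep=";"):
    return len(embedding_value.split(sep)) == expected_dim
```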

**Via Neptune console**

1. Start the Create Graph wizard and choose **Create graph from existing source**.  
![\[Step 1 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-1.png)

1. Choose type of source as Amazon S3, minimum and maximum provisioned memory, Amazon S3 path, and load role ARN.  
![\[Step 2 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-2.png)

1. Choose the Network Settings and Replica counts.  
![\[Step 3 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-3.png)

1. Create graph.

# Creating a Neptune Analytics graph from Neptune cluster or snapshot
<a name="bulk-import-create-from-neptune"></a>

 Neptune Analytics provides an easy way to bulk import data from an existing Neptune Database cluster or snapshot into a new Neptune Analytics graph, using the `CreateGraphUsingImportTask` API. Data from your source cluster or snapshot is bulk exported into an Amazon S3 bucket that you configure, analyzed to find the right memory configuration, and bulk imported into a new Neptune Analytics graph. You can check the progress of your bulk import at any time using the `GetImportTask` API as well. 

 A few things to consider while using this feature: 
+  You can only import from Neptune Database clusters and snapshots running on a version newer than or equal to 1.3.0. 
+  Import from an existing Neptune Database cluster only supports ingesting property graph data. RDF data within a Neptune Database cluster cannot be ingested using an import task. To ingest RDF data into Neptune Analytics, first export the data manually from the Neptune Database cluster to an Amazon S3 bucket, and then ingest it using an import task with an Amazon S3 bucket source. 
+  The exported data from your source Neptune Database cluster or snapshot will reside in your buckets only, and will be encrypted using a KMS key that you provide. The exported data is not directly consumable in any other way into Neptune outside of the `CreateGraphUsingImportTask` API. The exported data is not used after the lifetime of the request, and can be deleted by the user. 
+  You need to provide permissions to perform the export task on the Neptune Database cluster or snapshot, write to your Amazon S3 bucket, and use your KMS key while writing data. 
+  If your source is a Neptune Database cluster, a clone is taken from it and used for export. The original Neptune Database cluster will not be impacted. The cloned cluster is internally managed by the service and is deleted upon completion. 
+  If your source is a Neptune snapshot, a restored DBCluster is created from it, and used for export. The restored cluster is internally managed by the service and is deleted upon completion. 
+  This process is not recommended for small sized graphs. The export process is async, and works best for medium/large sized graphs with a size greater than 25GB. For smaller graphs, a better alternative is to use the [Neptune export](https://docs.aws.amazon.com//neptune/latest/userguide/neptune-export.html) feature to generate CSV data directly from your source, upload that to Amazon S3 and then use the [Batch load](batch-load.md) API instead. 

 A quick summary of steps to import from a Neptune cluster or a Neptune snapshot: 

1.  [Obtain the ARN of your Neptune cluster or snapshot](#obtain-arn-of-neptune-cluster): This can be done from the AWS console or using the Neptune CLI. 

1.  [Create an IAM role with permissions to export from Neptune to Neptune Analytics](#iam-create-role-export-neptune-analytics): Create an IAM role that has permissions to perform an export of your Neptune graph, write to Amazon S3 and use your KMS key for writing data in Amazon S3. 

1.  Use the `CreateGraphUsingImportTask` API with source = NEPTUNE, and provide the ARN of your source, Amazon S3 path to export the data, KMS key to use for exporting data and additional arguments for your Neptune Analytics graph. This should return a `task-id`. 

1.  Use `GetImportTask` API to get the details of your task. 

## Obtain the ARN of your Neptune cluster or snapshot
<a name="obtain-arn-of-neptune-cluster"></a>

 The following instructions demonstrate how to obtain the Amazon Resource Name (ARN) for an existing Amazon Neptune database cluster or snapshot using the AWS Command Line Interface (CLI). The ARN is a unique identifier for an AWS resource, such as a Neptune cluster or snapshot, and is commonly used when interacting with AWS services programmatically or through the AWS management console. 

**Via the CLI:**

```
# Obtain the ARN of an existing DB cluster
aws neptune describe-db-clusters \
    --db-cluster-identifier <name> \
    --query 'DBClusters[0].DBClusterArn'

# Obtain the ARN of an existing DB cluster snapshot
aws neptune describe-db-cluster-snapshots \
    --db-cluster-snapshot-identifier <snapshot name> \
    --query 'DBClusterSnapshots[0].DBClusterSnapshotArn'
```
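The two kinds of ARN can be told apart by their resource-type segment. A small Python sketch of this check (the ARN layout follows the examples above; names are placeholders):

```python
# Sketch: RDS-style ARNs look like arn:aws:rds:<region>:<account>:<type>:<name>.
def arn_resource_type(arn):
    parts = arn.split(":")
    if len(parts) < 7 or parts[2] != "rds":
        raise ValueError(f"not an RDS-style ARN: {arn}")
    return parts[5]  # "cluster" or "cluster-snapshot"
```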

**Via the AWS console:** The ARN can be found on the cluster details page.

![\[Cluster details option 1.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/cluster-details-1.png)


![\[Cluster details option 2.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/cluster-details-2.png)


## Create an IAM role with permissions to export from Neptune to Neptune Analytics
<a name="iam-create-role-export-neptune-analytics"></a>

1.  Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Choose **Roles**, and then choose **Create Role**. 

1.  Provide a role name. 

1.  Choose **Amazon S3** as the AWS service. 

1.  In the **permissions** section, choose: 
   + `AmazonS3FullAccess`
   + `NeptuneFullAccess`
   + `AmazonRDSFullAccess`

1.  Also create a custom policy with at least the following permissions for the AWS KMS key used: 
   + `kms:ListGrants`
   + `kms:CreateGrant`
   + `kms:RevokeGrant`
   + `kms:DescribeKey`
   + `kms:GenerateDataKey`
   + `kms:Encrypt`
   + `kms:ReEncrypt*`
   + `kms:Decrypt`
**Note**  
 Make sure there are no resource-level `Deny` policies attached to your AWS KMS key. If there are, explicitly allow the AWS KMS permissions for the `Export` role. 

1.  On the **Trust Relationships** tab, choose **Edit trust relationship**, and paste the following trust policy. Choose **Save** to save the trust relationship. 

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "export.rds.amazonaws.com",
                       "neptune-graph.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

Your IAM role is now ready for import.

**Via CLI/SDK**

 For importing data from Neptune, the API expects additional import options as defined in [NeptuneImportOptions](https://docs.aws.amazon.com/neptune-analytics/latest/apiref/API_NeptuneImportOptions.html). 

Example 1: Create a graph from a Neptune cluster.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId":"arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": :"<s3 path for exported data>"
   }}'
```

Example 2: Create a graph from a Neptune cluster with default vertex labels preserved.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId":"arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": :"<s3 path for exported data>",
      "preserveDefaultVertexLabels" : true 
   }}'
```

Example 3: Create a graph from a Neptune cluster with edge IDs preserved.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId":"arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": :"<s3 path for exported data>",
      "preserveEdgeIds" : true 
   }}'
```
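The `--import-options` payload in the examples above can be assembled programmatically. A minimal Python sketch, using the key names shown in the examples; the KMS key ARN and S3 path are placeholders:

```python
import json

# Sketch: build the JSON string passed to --import-options for a Neptune source.
def neptune_import_options(kms_key_arn, s3_export_path,
                           preserve_default_vertex_labels=False,
                           preserve_edge_ids=False):
    neptune = {
        "s3ExportKmsKeyId": kms_key_arn,
        "s3ExportPath": s3_export_path,
    }
    if preserve_default_vertex_labels:
        neptune["preserveDefaultVertexLabels"] = True
    if preserve_edge_ids:
        neptune["preserveEdgeIds"] = True
    return json.dumps({"neptune": neptune})
```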

# Bulk import data into an existing Neptune Analytics graph
<a name="loading-data-existing-graph"></a>

 Neptune Analytics allows you to efficiently import large datasets into an already provisioned graph using the `StartImportTask` API. This API facilitates the direct loading of data from an Amazon S3 bucket into an **empty** Neptune Analytics graph. 

Two common use cases for using this feature:

1.  Bulk importing data multiple times without provisioning a new graph for each dataset. This helps during the development phase of a project where datasets are being converted into Neptune Analytics compatible load formats. 

1.  Use cases where graph provisioning privileges need to be separated from data operation privileges. For example, scenarios where graph provisioning is done only by the infrastructure team, and data loading and querying are done by the data engineering team. 

 For use cases where you want to create a new graph loaded with data, use the `CreateGraphUsingImportTask` API instead. 

 For incrementally loading data from Amazon S3 you can use the loader integration with the openCypher `CALL` clause. For more information see [Batch load](batch-load.md). 

**Prerequisites**
+  An empty Amazon Neptune Analytics graph. 
+  Data stored in an Amazon S3 bucket in the same region as the graph. 
+  An IAM role with permissions to access the Amazon S3 bucket. For more information, see [Create your IAM role for Amazon S3 access](bulk-import-create-from-s3.md#create-iam-role-for-s3-access). 

**Important considerations**
+  **Data integrity**: The `StartImportTask` API is designed to work with empty graphs; if the import task finds that the graph is not empty, the operation fails. If your graph contains data, you can first reset the graph using the [reset-graph](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_ResetGraph.html) API. This operation deletes all data from the graph, so ensure you have backups if necessary. You can use the [create-graph-snapshot](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphSnapshot.html) API to create a snapshot of your existing graph. 
+  **Atomic Operation**: The data import is atomic, meaning it either completes fully or does not apply at all. If the import fails, the graph is reset back to an empty state. 
+  **Format Support**: Loading data supports the same data formats as `create-graph-using-import-task` and `neptune.load()`. This API does not support importing data from Neptune. 
+  **Queries**: Queries stop working while the import is in progress. You will get a `Cannot execute any query until bulk import is complete` error until the import finishes. 

**Steps for bulk importing data**

1.  Resetting the graph (if necessary): 

    If your graph is not empty, reset it using the following command: 

   ```
   aws neptune-graph reset-graph --graph-identifier <graph-id>
   ```
**Note**  
 This command will completely remove all existing data from your graph. It is recommended that you take a graph snapshot before performing this action. 

1.  Start the import task: 

    To load data into your Neptune graph, use the `start-import-task` command as follows: 

   ```
   aws neptune-graph start-import-task \
   --graph-identifier <graph-id> \
   --source <s3-path-to-data> \
   --format <data-format> \
   --role-arn <IAM-role-ARN> \
   [--fail-on-error | --no-fail-on-error]
   ```
   +  `graph-identifier`: The unique identifier of your Neptune graph. 
   +  `source`: An Amazon S3 URI prefix. All object names with matching prefixes are loaded. See [ Neptune loader request parameters](/neptune/latest/userguide/load-api-reference-load.html#load-api-reference-load-parameters) for Amazon S3 URI prefix examples. 
   +  `format`: The data format of the Amazon S3 data to be loaded, either `csv`, `openCypher`, or `ntriples`. For more information, see [Data formats](loading-data-formats.md). 
   +  `role-arn`: The ARN of the IAM role that Neptune Analytics can assume to access your Amazon S3 data. 
   +  `(--no-)fail-on-error`: (Optional) Whether to stop the import when an error is encountered. By default, the import stops at the first error. 
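The parameters above can be checked before shelling out to the CLI. A hedged Python sketch; the accepted format names follow the list above:

```python
VALID_FORMATS = {"csv", "openCypher", "ntriples"}

def start_import_args(graph_id, source, data_format, role_arn, fail_on_error=True):
    """Assemble the start-import-task CLI invocation as an argument list."""
    if data_format not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {data_format}")
    return ["aws", "neptune-graph", "start-import-task",
            "--graph-identifier", graph_id,
            "--source", source,
            "--format", data_format,
            "--role-arn", role_arn,
            "--fail-on-error" if fail_on_error else "--no-fail-on-error"]
```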

## Troubleshooting bulk import
<a name="loading-data-existing-graph-troubleshooting"></a>

 The following troubleshooting guidance is for common errors encountered during bulk import of data into a Neptune Analytics graph. It covers these common issues: the Amazon S3 bucket and the graph being in different regions; the IAM role lacking the correct permissions; the `AssumeRole` trust relationship not being granted to Neptune Analytics; and bulk load files in a public Amazon S3 bucket not being publicly readable. 

**Common errors**

1. The Amazon S3 bucket and your graph are in different regions.

   Verify that your graph and the Amazon S3 bucket are in the same region. Neptune Analytics only supports loading data in the same region.

   ```
   export GRAPH_ID="<graphId>"            # Replace with your graph identifier
   export S3_BUCKET_NAME="<bucketName>"   # Replace with the S3 bucket that contains your graph data files

   # Make sure your graph and S3 bucket are in the same region
   aws neptune-graph get-graph --graph-identifier $GRAPH_ID
   aws s3api get-bucket-location --bucket $S3_BUCKET_NAME
   ```

1. The IAM role used does not have the correct permissions.

   Verify that you have created the IAM role correctly with read permission to Amazon S3 - see [ Create your IAM role for Amazon S3 access](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/bulk-import-create-from-s3.html#create-iam-role-for-s3-access).

   ```
   export GRAPH_EXEC_ROLE="GraphExecutionRole"
   aws iam list-attached-role-policies --role-name $GRAPH_EXEC_ROLE
   # Output should contain "PolicyName": "AmazonS3*Access".
   ```

1. The `AssumeRole` permission is not granted to Neptune Analytics through the AssumeRolePolicy.

   Verify that you have attached the policy that allows Neptune Analytics to assume the IAM role to access the Amazon S3 bucket. See [ Create your IAM role for Amazon S3 access](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/bulk-import-create-from-s3.html#create-iam-role-for-s3-access).

   ```
   export GRAPH_EXEC_ROLE="GraphExecutionRole"   # Replace with your IAM role

   # Check that Neptune Analytics can assume this role to read from the specified S3 bucket
   aws iam get-role --role-name $GRAPH_EXEC_ROLE --query 'Role.AssumeRolePolicyDocument' --output text
   # Output should contain: SERVICE neptune-graph.amazonaws.com
   ```

1.  The bulk load files are in a public Amazon S3 bucket, but the files themselves are not made public for reading. 

    When adding bulk load files to a public Amazon S3 bucket, ensure that each file's access control list (ACL) is set to allow public reads. For example, to set this through the AWS CLI: 

   ```
     aws s3 cp <FileSourceLocation> <FileTargetLocation> --acl public-read
   ```

    This setting can also be done through the Amazon S3 console or the AWS SDKs. For more details, refer to the documentation for [ Configuring ACLs](https://docs.aws.amazon.com//AmazonS3/latest/userguide/managing-acls.html). 
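The region check in the first item above has one subtlety worth sketching: `get-bucket-location` reports a null `LocationConstraint` for buckets in us-east-1. A small Python helper (an illustration, not an official API) normalizes this before comparing:

```python
# Sketch: a null LocationConstraint from get-bucket-location means us-east-1,
# so normalize it before comparing with the graph's region.
def same_region(graph_region, bucket_location_constraint):
    return graph_region == (bucket_location_constraint or "us-east-1")
```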

# Checking the details and progress of an import task
<a name="bulk-import-checking-details"></a>

 You can use the [GetImportTask](https://docs.aws.amazon.com/neptune-analytics/latest/apiref/API_GetImportTask.html) API to track the progress and status of your import task. 

```
aws neptune-graph get-import-task --task-id <task-id>
```

 An import task can be in one of the following states: 
+  **INITIALIZING**: The task is preparing for import, including provisioning a graph when using the `CreateGraphUsingImportTask` API. 
+  **ANALYZING_DATA**: The task is taking an initial pass through the dataset to determine the optimal configuration for the graph. 
+  **IMPORTING**: The data is being loaded into the graph. 
+  **EXPORTING**: Data is being exported from the Neptune cluster or snapshot. This is only applicable when performing an import task with a source of Neptune and through the `CreateGraphUsingImportTask` API. 
+  **ROLLING_BACK**: The import task encountered an error. Refer to the [troubleshooting](bulk-import-troubleshooting.md) section to investigate the errors. The import task will be rolled back and eventually marked as `FAILED`. 
+  **SUCCEEDED**: Graph creation and data loading have succeeded. Use the `get-graph` API to view details of the final graph. 
+  **REPROVISIONING**: A temporary state while the graph is being reconfigured during the import task. 
+  **FAILED**: Graph creation or data loading has failed. Refer to the [troubleshooting](bulk-import-troubleshooting.md) section to understand the reason for the failure. 
+  **CANCELLING**: The user has cancelled the import task, and cancellation is in progress. 
+  **CANCELLED**: The import task has been cancelled, and all resources have been released. 

 Additionally, the import task details can be used to track the progress of the load, the error count, and the graph summary. 
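A polling loop only needs to know which of the states above are terminal. A minimal Python sketch (the actual SDK call is shown only as a comment):

```python
# Sketch: terminal states for an import task, per the list above.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}

def is_terminal(status):
    return status in TERMINAL_STATES

# A polling loop (not executed here) would call GetImportTask until this
# returns True for the "status" field of the response, e.g.:
# while not is_terminal(get_import_task_response["status"]): wait and retry
```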

# Canceling an import task
<a name="bulk-import-cancelling-import"></a>

 You can cancel a running import task by using the [CancelImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CancelImportTask.html) API. 

```
aws neptune-graph cancel-import-task \
    --task-id <task-id>
```

 The import task will be canceled and all changes rolled back. The state of the import task switches to `CANCELLING` after the `cancel-import-task` API is called, and eventually becomes `CANCELLED` when the rollback finishes. You can check the current state of your import task using the [GetImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/userguide/bulk-import-checking-details.html) API. 

```
aws neptune-graph get-import-task \
--task-id <task-id>
```

# Troubleshooting
<a name="bulk-import-troubleshooting"></a>

 For both bulk load and batch load, all errors and a summary of the load are sent to a CloudWatch log group in your account. To view the logs, go to CloudWatch, choose **Log groups** in the left column, then search for and choose `/aws/neptune/import-task-logs/`. 

1.  **Batch Load**: The logs for each load are saved under the `/aws/neptune/import-task-logs/<graph-id>/<load-id>` CloudWatch log stream. 

1.  **Bulk Load using Import Task**: The logs are saved under `/aws/neptune/import-task-logs/<graph-id>/<task-id>` CloudWatch log stream. 
+  **S3_ACCESS_DENIED**: The server does not have permissions to list or download the given file. Fix the permissions and retry. See [Create your IAM role for Amazon S3 access](bulk-import-create-from-s3.md#create-iam-role-for-s3-access) for help setting up the Amazon S3 permissions. 
+  **LARGE_STRING_ERROR**: One or more strings exceeded the limit on string size. This data cannot be inserted as is. Update the strings exceeding the limit and retry. 
+  **PARSING_ERROR**: Error parsing the given value(s). Correct the value(s) and retry. More information on different parsing errors is provided in this section. 
+  **OUT_OF_MEMORY**: No more data can be loaded with the current m-NCU. If encountered during an import task, set a higher m-NCU and retry. If encountered during batch load, scale the number of m-NCU and retry the batch load. 
+  **PARTITION_FULL_ERROR**: No more data can be loaded with the internal server configuration. If encountered during an import task, the import workflow changes the server configuration and retries. If encountered during batch load, reach out to the AWS service team to unblock loading of new data. 

**Common parsing errors and solutions**


| Error template | Solution | 
| --- | --- | 
|  Invalid data type encountered for header val:badtype when parsing line `[:ID,firstName:String,val:badtype,:LABEL]`.  |  Incorrect Datatype provided. Check the documentation for supported data types. See [Data formats](loading-data-formats.md) for more information.  | 
|  Multi-valued columns are not supported `firstName:String[]` when parsing line `[:ID,firstName:String[],val:String,:LABEL]`.  |  The `opencypher` format does not support multivalued user defined properties. Try using the `csv` format to insert multivalued vertex properties, or remove multivalued properties.  | 
|  Bad header for a file in '`OPEN_CYPHER`' format, could not determine node or relationship file, found system columns from '`csv`' format when parsing line `[~id,firstName:String,val:int,:LABEL]`.  |  Both the `opencypher` and `csv` formats expect certain header columns to be present. Make sure you have entered them correctly. Check the [Data formats](loading-data-formats.md) documentation for required fields by format.  | 
|  Bad header for a file in '`OPEN_CYPHER`' format, could not determine node or relationship file.  |  The header of the files does not have the required system columns. Check the [Data formats](loading-data-formats.md) for required fields by format.  | 
|  Relationship file in '`OPEN_CYPHER`' format should contain both `:START_ID` and `:END_ID` columns when parsing line `[:START_ID,firstName:String]`.  |  The header of the edge files does not have all the required system columns. Check the [Data formats](loading-data-formats.md) for required fields by format.  | 
|  Invalid data type. Found system columns from '`OPEN_CYPHER`' format `:ID` when parsing line `[:ID,firstName:String,val:Int,~label]`.  |  The `opencypher` and `csv` formats have different system column names, and they begin with `:` and `~` respectively. User defined properties cannot begin with those reserved prefixes in the respective formats. Confirm the format name and system column names, or update user defined properties to not use reserved prefixes.  | 
|  Named column name is not present for header field `:BLAH` when parsing line `[:ID,:BLAH,firstName:String]`.  |  The `opencypher` and `csv` formats have different system column names, and they begin with `:` and `~` respectively. User defined properties cannot begin with those reserved prefixes in the respective formats. Confirm the format name and system column names, or update user defined properties to not use reserved prefixes.  | 
|  System column other than `ID` cannot be stored as a property: <columnHeader>.  |  The `opencypher` and `csv` formats have different system column names, and they begin with `:` and `~` respectively. User defined properties cannot begin with those reserved prefixes in the respective formats. Confirm the format name and system column names, or update user defined properties to not use reserved prefixes.  | 
|  Duplicate user column `firstName` when parsing line `[:ID,:LABEL, firstName:String, firstName:String]`.  |  The file contains duplicate user defined property column names in the header. Remove all of the duplicate columns.  | 
|  Duplicate system column `:ID` found when parsing line `[:ID,:ID,firstName:String,:LABEL]`.  |  The file contains duplicate system column names in the header. Remove all of the duplicate columns.  | 
|  Invalid column name provided for loading embeddings: `[abcd]` for filename: someFilename. Embedding column name must be the same as their corresponding vector index name when parsing line `[:ID,firstName:String,abcd:Vector,:LABEL] in [filename]`.  |  An incorrect name is used for the vector embedding column. The embedding column name must match the name of the corresponding vector index.  | 
|  "date" type is currently not supported. "datetime" may be an alternative type.  |  Use `datetime` as the field type; the `date` type is not yet supported in Neptune Analytics.  | 
|  Headers must be non-empty.  |  Headers need to be non-empty. If the file has an empty line at the beginning, remove the empty line.  | 
|  Failure encountered while parsing the `csv` file.  |  The likely reason is that the number of columns in a row doesn't match the number of columns in the header. If you don't have a value for a column, provide an empty value. For example: `123,vertex,,,`.  | 
|  Could not process value of type:`http://www.w3.org/2001/XMLSchema#int` for value: `a` when parsing line `[v1,v19683,con,a]` in [file].  |  There is a mismatch between the type of the value provided for that column in the row and the type specified in the header. In this specific case the column header is annotated with integer type, but `a` is not parseable as an integer.  | 
|  Could not load vector embedding: `[a,bc]`. Check the dimensionality for this vector.  |  The size of the vector does not match the dimension defined in the vector search configuration for the graph.  | 
|  Could not load vector embedding: `[a,NaN]`. Check the value for this vector.  |  Float and double values in scientific notation are currently not supported. Also `Infinity`, `-Infinity`, `INF`, `-INF`, and `NaN` are not recognized.  | 
|  Could not process value of type: date for value: "2024-11-22T21:40:40Z".  |   The values in columns of type 'date' must not contain time. For instance, "2024-11-22T21:40:40Z" is not a valid value for the 'date' column since it contains the time component '21:40:40Z'. Change the column type to 'dateTime' or remove the time from the column values.   | 
|   Please check if you are loading lines longer than 65536.   |   The CSV format does not support lines longer than 65536 characters. Check if some lines are unexpectedly longer than 65536 characters, and fix those. Also check for properties with long string values and consider excluding those. For files with vector embeddings, if vector embeddings are too long then consider shortening the precision of floating point values. Alternatively, try the Parquet format to ingest data with long lines.   | 
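Many of the header errors above come down to malformed header rows. As an illustrative sketch (the labels, property names, and IDs here are made up), a well-formed node file in the `opencypher` format, followed by a matching relationship file header, might look like:

```
:ID,firstName:String,age:Int,:LABEL
v1,Alice,30,Person

:START_ID,:END_ID,:TYPE,since:Int
v1,v2,KNOWS,2020
```

Note that system columns begin with `:` in the `opencypher` format (and with `~` in the `csv` format), and user-defined property columns must not use those reserved prefixes.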