

# Create a graph from Amazon S3, a Neptune cluster, or a snapshot


 You can create a Neptune Analytics graph directly from Amazon S3 or from Neptune using the [CreateGraphUsingImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html) API. This is recommended for importing large graphs from files in Amazon S3 (>50GB of data), importing from existing Neptune clusters, or importing from existing Neptune snapshots. This API automatically analyzes the data, provisions a new graph based on the analysis, and imports data as one atomic operation using maximum available resources. 

**Note**  
 The graph is made available for querying only after the data loading is completed successfully. 

 If errors are encountered during the import process, Neptune Analytics automatically rolls back the provisioned resources and performs cleanup; no manual cleanup actions are needed. Error details are available in the CloudWatch logs. See [troubleshooting](bulk-import-troubleshooting.md) for more details. 

**Topics**
+ [Creating a Neptune Analytics graph from Amazon S3](bulk-import-create-from-s3.md)
+ [Creating a Neptune Analytics graph from Neptune cluster or snapshot](bulk-import-create-from-neptune.md)

# Creating a Neptune Analytics graph from Amazon S3


 Neptune Analytics supports bulk importing CSV, N-Triples, and Parquet data directly from Amazon S3 into a Neptune Analytics graph using the `CreateGraphUsingImportTask` API. The supported data formats are listed in [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md). It is recommended that you first try the batch load process with a subset of your data to validate that it is correctly formatted. Once you have validated that your data files are fully compatible with Neptune Analytics, you can prepare your full dataset and perform the bulk import using the steps below. 

 A quick summary of steps needed to import a graph from Amazon S3: 
+  [Copy the data files to an Amazon S3 bucket](#create-bucket-copy-data): Copy the data files to an Amazon Simple Storage Service bucket in the same region where you want the Neptune Analytics graph to be created. See [Data format for loading from Amazon S3 into Neptune Analytics](loading-data-formats.md) for the details of the format when loading data from Amazon S3 into Neptune Analytics. 
+  [Create your IAM role for Amazon S3 access](#create-iam-role-for-s3-access): Create an IAM role with `read` and `list` access to the bucket and a trust relationship that allows Neptune Analytics graphs to use your IAM role for importing. 
+  Use the `CreateGraphUsingImportTask` API to import from Amazon S3: Create a graph using the `CreateGraphUsingImportTask` API. This will generate a `taskId` for the operation. 
+  Use the `GetImportTask` API to get the details of the import task. The response indicates the status of the task (for example, INITIALIZING, ANALYZING_DATA, IMPORTING). 
+  Once the task has completed successfully, you will see a `COMPLETED` status for the import task and also the `graphId` for the newly created graph. 
+  Use the `GetGraphs` API to fetch all the details about your new graph, including the ARN, endpoint, etc. 
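The status-polling step above can be sketched as follows. This is a hypothetical helper, not part of any AWS SDK; it abstracts the `GetImportTask` call behind a callable so the loop can be shown (and run) without live credentials, and it assumes `COMPLETED` and `FAILED` are the terminal states.

```python
import time

def wait_for_import(get_status, poll_seconds=60, sleep=time.sleep):
    """Poll until the import task reaches a terminal state.

    get_status: zero-argument callable returning the current task status
    string (for example, a thin wrapper around the GetImportTask API).
    """
    terminal = {"COMPLETED", "FAILED"}  # assumed terminal states
    while True:
        status = get_status()
        if status in terminal:
            return status
        sleep(poll_seconds)

# Simulate a task progressing through the documented states.
states = iter(["INITIALIZING", "ANALYZING_DATA", "IMPORTING", "COMPLETED"])
result = wait_for_import(lambda: next(states), sleep=lambda _: None)
print(result)  # COMPLETED
```

In a real script, `get_status` would call the `GetImportTask` API with the `taskId` returned by `CreateGraphUsingImportTask`.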

**Note**  
 If you're creating a private graph endpoint, the following permissions are required:   
+ `ec2:CreateVpcEndpoint`
+ `ec2:DescribeAvailabilityZones`
+ `ec2:DescribeSecurityGroups`
+ `ec2:DescribeSubnets`
+ `ec2:DescribeVpcAttribute`
+ `ec2:DescribeVpcEndpoints`
+ `ec2:DescribeVpcs`
+ `ec2:ModifyVpcEndpoint`
+ `route53:AssociateVPCWithHostedZone`
 For more information about required permissions, see [ Actions defined by Neptune Analytics](https://docs.aws.amazon.com//service-authorization/latest/reference/list_amazonneptuneanalytics.html#amazonneptuneanalytics-actions-as-permissions). 

## Copy the data files to an Amazon S3 bucket


 The Amazon S3 bucket must be in the same AWS Region where the Neptune Analytics graph will be created. You can use the following AWS CLI command to copy the files to the bucket. 

```
aws s3 cp data-file-name s3://bucket-name/object-key-name
```

**Note**  
 In Amazon S3, an object key name is the entire path of a file, including the file name.   
 In the following command,   

```
aws s3 cp datafile.txt s3://examplebucket/mydirectory/datafile.txt
```
 the object key name is `mydirectory/datafile.txt`. 

 You can also use the AWS management console to upload files to the Amazon S3 bucket. Open the Amazon S3 [console](https://console.aws.amazon.com/s3/), and choose a bucket. In the upper-left corner, choose **Upload** to upload files. 
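For scripting, the object-key convention from the note above can be captured in a small helper. This is an illustrative function, not part of any AWS SDK:

```python
def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket name, object key name)."""
    if not uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

print(split_s3_uri("s3://examplebucket/mydirectory/datafile.txt"))
# ('examplebucket', 'mydirectory/datafile.txt')
```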

## Create your IAM role for Amazon S3 access


 Create an IAM role with permissions to `read` and `list` the contents of your bucket. Add a trust relationship that allows Neptune Analytics to assume this role for the import task. You can do this using the AWS console or through the CLI/SDK. 

1.  Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Choose **Roles**, and then choose **Create Role**. 

1.  Provide a role name. 

1.  Choose **Amazon S3** as the AWS service. 

1.  In the **permissions** section, choose `AmazonS3ReadOnlyAccess`. 
**Note**  
 This policy grants `s3:Get*` and `s3:List*` permissions to all buckets. Later steps restrict access to the role using the trust policy. The loader only requires `s3:Get*` and `s3:List*` permissions on the bucket you are loading from, so you can also scope these permissions to that Amazon S3 resource. If your Amazon S3 bucket is encrypted, you need to add `kms:Decrypt` permission as well. The `kms:Decrypt` permission is also needed for data exported from Neptune Database. 

1.  On the **Trust Relationships** tab, choose **Edit trust relationship**, and paste the following trust policy. Choose **Save** to save the trust relationship. 

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "neptune-graph.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

Your IAM role is now ready for import.
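If you create the role programmatically instead of through the console, the same trust policy can be built as a dictionary and serialized, for example to pass as the role's assume-role policy document in an SDK call. A minimal sketch:

```python
import json

# The trust policy from the step above, built as a Python dict so it can
# be serialized and passed to an SDK call that creates the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["neptune-graph.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}
document = json.dumps(trust_policy)
```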

## Use the CreateGraphUsingImportTask API to import from Amazon S3

 You can perform this operation from the Neptune console as well as from the AWS CLI/SDK. For more information on the parameters, see [CreateGraphUsingImportTask](https://docs.aws.amazon.com//neptune-analytics/latest/apiref/API_CreateGraphUsingImportTask.html). 

**Via CLI/SDK**

```
aws neptune-graph create-graph-using-import-task \
  --graph-name <name> \
  --format <format> \
  --source <s3 path> \
  --role-arn <role arn> \
  [--blank-node-handling "convertToIri"] \
  [--fail-on-error | --no-fail-on-error] \
  [--deletion-protection | --no-deletion-protection] \
  [--public-connectivity | --no-public-connectivity] \
  [--min-provisioned-memory <value>] \
  [--max-provisioned-memory <value>] \
  [--vector-search-configuration <value>]
```
+  **Different Minimum and Maximum Provisioned Memory**: When `--min-provisioned-memory` and `--max-provisioned-memory` are set to different values, the graph is created with the maximum provisioned memory specified by `--max-provisioned-memory`. 
+  **Single Provisioned Memory Value**: When only one of `--min-provisioned-memory` or `--max-provisioned-memory` is provided, the graph is created with the specified memory value. 
+  **No Provisioned Memory Values**: If neither `--min-provisioned-memory` nor `--max-provisioned-memory` is provided, the graph is created with a default provisioned memory of 128 m-NCU (memory optimized Neptune Compute Units). 
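The three rules above can be expressed as a small function. This is an illustrative sketch of the documented behavior, not Neptune Analytics code; the 128 m-NCU default comes from the last rule:

```python
DEFAULT_M_NCU = 128  # default provisioned memory, per the rule above

def effective_provisioned_memory(min_memory=None, max_memory=None):
    """Provisioned memory (m-NCU) the graph is created with."""
    if min_memory is not None and max_memory is not None:
        return max_memory  # when both are given, the maximum wins
    if max_memory is not None:
        return max_memory
    if min_memory is not None:
        return min_memory
    return DEFAULT_M_NCU

print(effective_provisioned_memory(128, 1024))  # 1024
print(effective_provisioned_memory())           # 128
```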

 Example 1: Create a graph from Amazon S3, with no min/max provisioned memory. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV
```

 Example 2: Create a graph from Amazon S3, with minimum and maximum provisioned memory specified. A graph with 1024 m-NCU is created. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --min-provisioned-memory 128 \
  --max-provisioned-memory 1024
```

Example 3: Create a graph from Amazon S3 that does not fail on parsing errors.

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --no-fail-on-error
```

Example 4: Create a graph from Amazon S3, with 2 replicas.

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --replica-count 2
```

Example 5: Create a graph from Amazon S3 with a vector search index.

**Note**  
 The `dimension` must match the dimension of the embeddings in the vertex files. 

```
aws neptune-graph create-graph-using-import-task \
  --graph-name 'graph-1' \
  --source "s3://bucket-name/gremlin-format-dataset/" \
  --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
  --format CSV \
  --replica-count 2 \
  --vector-search-configuration "{\"dimension\":768}"
```
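The `--vector-search-configuration` value is a JSON string; building it with a JSON serializer avoids shell-quoting mistakes. A small sketch (the `vector_search_configuration` helper is hypothetical, not part of any SDK):

```python
import json

def vector_search_configuration(dimension):
    """JSON string for --vector-search-configuration; `dimension` must
    match the embedding dimension in the vertex files."""
    return json.dumps({"dimension": dimension})

print(vector_search_configuration(768))  # {"dimension": 768}
```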

**Via Neptune console**

1. Start the Create Graph wizard and choose **Create graph from existing source**.  
![\[Step 1 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-1.png)

1. Choose Amazon S3 as the source type, and specify the minimum and maximum provisioned memory, the Amazon S3 path, and the load role ARN.  
![\[Step 2 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-2.png)

1. Choose the network settings and replica count.  
![\[Step 3 of import using console.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/import-step-3.png)

1. Choose **Create graph**.

# Creating a Neptune Analytics graph from Neptune cluster or snapshot


 Neptune Analytics provides an easy way to bulk import data from an existing Neptune Database cluster or snapshot into a new Neptune Analytics graph, using the `CreateGraphUsingImportTask` API. Data from your source cluster or snapshot is bulk exported into an Amazon S3 bucket that you configure, analyzed to find the right memory configuration, and bulk imported into a new Neptune Analytics graph. You can check the progress of your bulk import at any time using the `GetImportTask` API as well. 

 A few things to consider while using this feature: 
+  You can only import from Neptune Database clusters and snapshots running engine version 1.3.0 or later. 
+  Import from an existing Neptune Database cluster only supports the ingest of property graph data. RDF data within a Neptune Database cluster cannot be ingested using an import task. To ingest RDF data into Neptune Analytics, first manually export it from the Neptune Database cluster to an Amazon S3 bucket, then ingest it using an import task with an Amazon S3 bucket source. 
+  The exported data from your source Neptune Database cluster or snapshot will reside in your buckets only, and will be encrypted using a KMS key that you provide. The exported data is not directly consumable in any other way into Neptune outside of the `CreateGraphUsingImportTask` API. The exported data is not used after the lifetime of the request, and can be deleted by the user. 
+  You need to provide permissions to perform the export task on the Neptune Database cluster or snapshot, write to your Amazon S3 bucket, and use your KMS key while writing data. 
+  If your source is a Neptune Database cluster, a clone is taken from it and used for export. The original Neptune Database cluster will not be impacted. The cloned cluster is internally managed by the service and is deleted upon completion. 
+  If your source is a Neptune snapshot, a restored DBCluster is created from it, and used for export. The restored cluster is internally managed by the service and is deleted upon completion. 
+  This process is not recommended for small graphs. The export process is asynchronous and works best for medium and large graphs larger than 25GB. For smaller graphs, a better alternative is to use the [Neptune export](https://docs.aws.amazon.com//neptune/latest/userguide/neptune-export.html) feature to generate CSV data directly from your source, upload that to Amazon S3, and then use the [Batch load](batch-load.md) API instead. 
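The size guidance in the last point can be summarized as a simple decision helper. This is illustrative only; the 25GB threshold is the one suggested above:

```python
def recommended_ingest_route(graph_size_gb):
    """Pick an ingest route based on the size guidance above."""
    if graph_size_gb > 25:
        # Large enough for the asynchronous export/import pipeline.
        return "import task (CreateGraphUsingImportTask)"
    # Small graphs: Neptune export to CSV in S3, then batch load.
    return "Neptune export + batch load"

print(recommended_ingest_route(100))  # import task (CreateGraphUsingImportTask)
```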

 A quick summary of steps to import from a Neptune cluster or a Neptune snapshot: 

1.  [Obtain the ARN of your Neptune cluster or snapshot](#obtain-arn-of-neptune-cluster): This can be done from the AWS console or using the Neptune CLI. 

1.  [Create an IAM role with permissions to export from Neptune to Neptune Analytics](#iam-create-role-export-neptune-analytics): Create an IAM role that has permissions to perform an export of your Neptune graph, write to Amazon S3 and use your KMS key for writing data in Amazon S3. 

1.  Use the `CreateGraphUsingImportTask` API with source = NEPTUNE, and provide the ARN of your source, Amazon S3 path to export the data, KMS key to use for exporting data and additional arguments for your Neptune Analytics graph. This should return a `task-id`. 

1.  Use `GetImportTask` API to get the details of your task. 

## Obtain the ARN of your Neptune cluster or snapshot


 The following instructions demonstrate how to obtain the Amazon Resource Name (ARN) for an existing Amazon Neptune database cluster or snapshot using the AWS Command Line Interface (CLI). The ARN is a unique identifier for an AWS resource, such as a Neptune cluster or snapshot, and is commonly used when interacting with AWS services programmatically or through the AWS management console. 

**Via the CLI:**

```
# Obtain the ARN of an existing DB cluster
aws neptune describe-db-clusters \
    --db-cluster-identifier <name> \
    --query 'DBClusters[0].DBClusterArn'

# Obtain the ARN of an existing DB cluster snapshot
aws neptune describe-db-cluster-snapshots \
    --db-cluster-snapshot-identifier <snapshot name> \
    --query 'DBClusterSnapshots[0].DBClusterSnapshotArn'
```
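When scripting, you can sanity-check whether an ARN refers to a cluster or a snapshot before passing it as the import source. This helper is illustrative and relies only on the standard ARN layout (`arn:aws:rds:<region>:<account>:cluster:<name>` or `...:cluster-snapshot:<name>`):

```python
def neptune_source_type(arn):
    """Classify a Neptune ARN as a cluster or a cluster snapshot."""
    parts = arn.split(":")
    if len(parts) < 7 or parts[2] != "rds":
        raise ValueError("not an RDS/Neptune ARN")
    resource_type = parts[5]
    if resource_type == "cluster":
        return "cluster"
    if resource_type == "cluster-snapshot":
        return "snapshot"
    raise ValueError(f"unexpected resource type: {resource_type}")

print(neptune_source_type("arn:aws:rds:us-east-1:123456789101:cluster:neptune-cluster"))
# cluster
```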

**Via the AWS console:** The ARN can be found on the cluster details page.

![\[Cluster details option 1.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/cluster-details-1.png)


![\[Cluster details option 2.\]](http://docs.aws.amazon.com/neptune-analytics/latest/userguide/images/bulk-import/cluster-details-2.png)


## Create an IAM role with permissions to export from Neptune to Neptune Analytics


1.  Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Choose **Roles**, and then choose **Create Role**. 

1.  Provide a role name. 

1.  Choose **Amazon S3** as the AWS service. 

1.  In the **permissions** section, choose: 
   + `AmazonS3FullAccess`
   + `NeptuneFullAccess`
   + `AmazonRDSFullAccess`

1.  Also create a custom policy with at least the following permissions for the AWS KMS key used: 
   + `kms:ListGrants`
   + `kms:CreateGrant`
   + `kms:RevokeGrant`
   + `kms:DescribeKey`
   + `kms:GenerateDataKey`
   + `kms:Encrypt`
   + `kms:ReEncrypt*`
   + `kms:Decrypt`
**Note**  
 Make sure there are no resource-level `Deny` policies attached to your AWS KMS key. If there are, explicitly allow the AWS KMS permissions for the `Export` role. 

1.  On the **Trust Relationships** tab, choose **Edit trust relationship**, and paste the following trust policy. Choose **Save** to save the trust relationship. 

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "export.rds.amazonaws.com",
                       "neptune-graph.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

Your IAM role is now ready for import.

**Via CLI/SDK**

 When importing data from Neptune, the API expects additional import options as defined in [NeptuneImportOptions](https://docs.aws.amazon.com/neptune-analytics/latest/apiref/API_NeptuneImportOptions.html). 

Example 1: Create a graph from a Neptune cluster.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId":"arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": "<s3 path for exported data>"
   }}'
```

Example 2: Create a graph from a Neptune cluster with default vertex labels preserved.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId":"arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": "<s3 path for exported data>",
      "preserveDefaultVertexLabels" : true 
   }}'
```

Example 3: Create a graph from a Neptune cluster with default edge IDs preserved.

```
aws neptune-graph create-graph-using-import-task \
   --graph-name <graph-name> \
   --source arn:aws:rds:<region>:123456789101:cluster:neptune-cluster \
   --min-provisioned-memory 1024 \
   --max-provisioned-memory 1024 \
   --role-arn <role-arn> \
   --import-options '{"neptune": {
      "s3ExportKmsKeyId":"arn:aws:kms:<region>:<account>:key/<key>",
      "s3ExportPath": "<s3 path for exported data>",
      "preserveEdgeIds" : true 
   }}'
```
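The `--import-options` JSON used in the three examples can be assembled with a serializer to avoid shell-quoting mistakes. The `neptune_import_options` helper below is hypothetical; the field names (`s3ExportKmsKeyId`, `s3ExportPath`, `preserveDefaultVertexLabels`, `preserveEdgeIds`) are the ones shown in the examples above:

```python
import json

def neptune_import_options(kms_key_arn, s3_export_path,
                           preserve_default_vertex_labels=False,
                           preserve_edge_ids=False):
    """Assemble the --import-options JSON string shown in the examples."""
    neptune = {
        "s3ExportKmsKeyId": kms_key_arn,
        "s3ExportPath": s3_export_path,
    }
    if preserve_default_vertex_labels:
        neptune["preserveDefaultVertexLabels"] = True
    if preserve_edge_ids:
        neptune["preserveEdgeIds"] = True
    return json.dumps({"neptune": neptune})

# Placeholder ARN and path, as in the examples above.
example = neptune_import_options(
    "arn:aws:kms:<region>:<account>:key/<key>",
    "<s3 path for exported data>",
    preserve_edge_ids=True,
)
```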