

# Using the Amazon Neptune bulk loader to ingest data

Amazon Neptune provides a `Loader` command for loading data from external files directly into a Neptune DB cluster. You can use this command instead of executing a large number of `INSERT` statements, `addV` and `addE` steps, or other API calls.

The Neptune **Loader** command is faster, has less overhead, is optimized for large datasets, and supports both Gremlin data and the RDF (Resource Description Framework) data used by SPARQL.

The following diagram shows an overview of the load process:

![Diagram showing the basic steps involved in loading data into Neptune.](http://docs.aws.amazon.com/neptune/latest/userguide/images/load-diagram.png)


Here are the steps of the loading process:

1. Copy the data files to an Amazon Simple Storage Service (Amazon S3) bucket.

1. Create an IAM role with Read and List access to the bucket.

1. Create an Amazon S3 VPC endpoint.

1. Start the Neptune loader by sending a request via HTTP to the Neptune DB instance.

1. The Neptune DB instance assumes the IAM role to load the data from the bucket.
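Step 4 amounts to an HTTP POST against the cluster's `/loader` endpoint. As a minimal sketch, the request body can be built with Python's standard library (the bucket, role ARN, and Region values below are placeholders, not real resources):

```python
import json

# Placeholder values -- substitute your own bucket, role ARN, and Region.
payload = {
    "source": "s3://bucket-name/object-key-name",   # a file or folder in Amazon S3
    "format": "csv",                                # csv, opencypher, ntriples, nquads, rdfxml, or turtle
    "iamRoleArn": "arn:aws:iam::account-id:role/role-name",
    "region": "us-east-1",                          # must match the bucket's Region
    "failOnError": "TRUE",
}

# This JSON string is POSTed to https://your-neptune-endpoint:port/loader
# with a Content-Type: application/json header.
body = json.dumps(payload)
print(body)
```

The full set of loader request parameters is described in the Neptune Loader Reference.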

**Note**  
You can load encrypted data from Amazon S3 if it was encrypted using either the Amazon S3 `SSE-S3` or `SSE-KMS` mode, provided that the role you use for the bulk load has access to the Amazon S3 object and, in the case of `SSE-KMS`, to `kms:Decrypt`. Neptune can then impersonate your credentials and issue `s3:GetObject` calls on your behalf.  
However, Neptune does not currently support loading data encrypted using the `SSE-C` mode.

The following sections provide instructions for preparing and loading data into Neptune.

**Topics**
+ [Prerequisites: IAM Role and Amazon S3 Access](bulk-load-tutorial-IAM.md)
+ [Load Data Formats](bulk-load-tutorial-format.md)
+ [Example: Loading Data into a Neptune DB Instance](bulk-load-data.md)
+ [Optimizing an Amazon Neptune bulk load](bulk-load-optimize.md)
+ [Neptune Loader Reference](load-api-reference.md)

# Prerequisites: IAM Role and Amazon S3 Access

Loading data from an Amazon Simple Storage Service (Amazon S3) bucket requires an AWS Identity and Access Management (IAM) role that has access to the bucket. Amazon Neptune assumes this role to load the data.

**Note**  
You can load encrypted data from Amazon S3 if it was encrypted using the Amazon S3 `SSE-S3` mode. In that case, Neptune can impersonate your credentials and issue `s3:GetObject` calls on your behalf.  
You can also load encrypted data from Amazon S3 that was encrypted using the `SSE-KMS` mode, as long as your IAM role includes the necessary permissions to access AWS KMS. Without proper AWS KMS permissions, the bulk load operation fails and returns a `LOAD_FAILED` response.  
Neptune does not currently support loading Amazon S3 data encrypted using the `SSE-C` mode.

The following sections show how to use a managed IAM policy to create an IAM role for accessing Amazon S3 resources, and then attach the role to your Neptune cluster.

**Topics**
+ [Creating an IAM role to allow Amazon Neptune to access Amazon S3 resources](bulk-load-tutorial-IAM-CreateRole.md)
+ [Adding the IAM Role to an Amazon Neptune Cluster](bulk-load-tutorial-IAM-add-role-cluster.md)
+ [Creating the Amazon S3 VPC Endpoint](bulk-load-tutorial-vpc.md)
+ [Chaining IAM roles in Amazon Neptune](bulk-load-tutorial-chain-roles.md)

**Note**  
These instructions require that you have access to the IAM console and permissions to manage IAM roles and policies. For more information, see [Permissions for Working in the AWS Management Console](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_permissions-required.html#Credentials-Permissions-overview-console) in the *IAM User Guide*.  
The Amazon Neptune console requires the user to have the following IAM permissions to attach the role to the Neptune cluster:  

```
iam:GetAccountSummary on resource: *
iam:ListAccountAliases on resource: *
iam:PassRole on resource: * with iam:PassedToService restricted to rds.amazonaws.com
```

# Creating an IAM role to allow Amazon Neptune to access Amazon S3 resources

Use the `AmazonS3ReadOnlyAccess` managed IAM policy to create a new IAM role that will allow Amazon Neptune access to Amazon S3 resources.

**To create a new IAM role that allows Neptune access to Amazon S3**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**.

1. Choose **Create role**.

1. Under **AWS service**, choose **S3**.

1. Choose **Next: Permissions**.

1. Use the filter box to filter by the term **S3** and check the box next to **AmazonS3ReadOnlyAccess**.
**Note**  
This policy grants `s3:Get*` and `s3:List*` permissions to all buckets. Later steps restrict access to the role using the trust policy.  
The loader only requires `s3:Get*` and `s3:List*` permissions on the bucket you are loading from, so you can also restrict these permissions by Amazon S3 resource.  
If your S3 bucket is encrypted, you also need to add the `kms:Decrypt` permission.

1. Choose **Next: Review**.

1. Set **Role Name** to a name for your IAM role, for example: `NeptuneLoadFromS3`. You can also add an optional **Role Description** value, such as "Allows Neptune to access Amazon S3 resources on your behalf."

1. Choose **Create Role**.

1. In the navigation pane, choose **Roles**.

1. In the **Search** field, enter the name of the role you created, and choose the role when it appears in the list.

1. On the **Trust Relationships** tab, choose **Edit trust relationship**.

1. In the text field, paste the following trust policy.

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "",
         "Effect": "Allow",
         "Principal": {
           "Service": [
             "rds.amazonaws.com"
           ]
         },
         "Action": "sts:AssumeRole"
       }
     ]
   }
   ```

------

1. Choose **Update trust policy**.

1. Complete the steps in [Adding the IAM Role to an Amazon Neptune Cluster](bulk-load-tutorial-IAM-add-role-cluster.md).

# Adding the IAM Role to an Amazon Neptune Cluster

Use the console to add the IAM role to an Amazon Neptune cluster. This allows any Neptune DB instance in the cluster to assume the role and load from Amazon S3.

**Note**  
The Amazon Neptune console requires the user to have the following IAM permissions to attach the role to the Neptune cluster:  

```
iam:GetAccountSummary on resource: *
iam:ListAccountAliases on resource: *
iam:PassRole on resource: * with iam:PassedToService restricted to rds.amazonaws.com
```

**To add an IAM role to an Amazon Neptune cluster**

1. Sign in to the AWS Management Console, and open the Amazon Neptune console at [https://console.aws.amazon.com/neptune/home](https://console.aws.amazon.com/neptune/home).

1. In the navigation pane, choose **Databases**.

1. Choose the cluster identifier for the cluster that you want to modify.

1. Choose the **Connectivity & Security** tab.

1. In the IAM Roles section, choose the role you created in the previous section.

1. Choose **Add role**.

1. Wait until the IAM role becomes accessible to the cluster before you use it.

# Creating the Amazon S3 VPC Endpoint

The Neptune loader requires a VPC endpoint of type Gateway for Amazon S3.

**To set up access for Amazon S3**

1. Sign in to the AWS Management Console and open the Amazon VPC console at [https://console.aws.amazon.com/vpc/](https://console.aws.amazon.com/vpc/).

1. In the navigation pane, choose **Endpoints**.

1. Choose **Create Endpoint**.

1. Choose the **Service Name** `com.amazonaws.region.s3` for the Gateway type endpoint, where *region* is the Region of your cluster.
**Note**  
If the Region shown here is incorrect, make sure that the console is set to the correct Region.

1. Choose the VPC that contains your Neptune DB instance (it is listed for your DB instance in the Neptune console).

1. Select the check box next to the route tables that are associated with the subnets related to your cluster. If you only have one route table, you must select that box.

1. Choose **Create Endpoint**.

For information about creating the endpoint, see [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html#create-vpc-endpoint) in the *Amazon VPC User Guide*. For information about the limitations of VPC endpoints, see [VPC Endpoints for Amazon S3](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html).

**Next Steps**  
Now that you have granted access to the Amazon S3 bucket, you can prepare to load data. For information about supported formats, see [Load Data Formats](bulk-load-tutorial-format.md).

# Chaining IAM roles in Amazon Neptune

**Important**  
The new bulk load cross-account feature introduced in [engine release 1.2.1.0.R3](engine-releases-1.2.1.0.R3.md) that takes advantage of chaining IAM roles may in some cases cause you to observe degraded bulk load performance. As a result, upgrades to engine releases that support this feature have been temporarily suspended until this problem is resolved.

When you attach a role to your cluster, your cluster can assume that role to gain access to data stored in Amazon S3. Starting with [engine release 1.2.1.0.R3](engine-releases-1.2.1.0.R3.md), if that role doesn't have access to all the resources you need, you can chain one or more additional roles that your cluster can assume to gain access to other resources. Each role in the chain assumes the next role in the chain, until your cluster has assumed the role at the end of the chain.

To chain roles, you establish a trust relationship between them. For example, to chain `RoleB` onto `RoleA`, `RoleA` must have a permissions policy that allows it to assume `RoleB`, and `RoleB` must have a trust policy that allows it to pass its permissions back to `RoleA`. For more information, see [Using IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html).

The first role in a chain must be attached to the cluster that is loading data.

The first role in the chain, and each subsequent role that assumes the next role in the chain, must have:
+ A policy that includes a specific statement with the `Allow` effect on the `sts:AssumeRole` action.
+ The Amazon Resource Name (ARN) of the next role in a `Resource` element.

**Note**  
The target Amazon S3 bucket must be in the same AWS Region as the cluster.

## Cross-account access using chained roles

You can grant cross-account access by chaining a role or roles that belong to another account. When your cluster temporarily assumes a role belonging to another account, it can gain access to resources there.

For example, suppose **Account A** wants to access data in an Amazon S3 bucket that belongs to **Account B**:
+ **Account A** creates an AWS service role for Neptune named `RoleA` and attaches it to a cluster.
+ **Account B** creates a role named `RoleB` that's authorized to access the data in an **Account B** bucket.
+ **Account A** attaches a permissions policy to `RoleA` that allows it to assume `RoleB`.
+ **Account B** attaches a trust policy to `RoleB` that allows it to pass its permissions back to `RoleA`.
+ To access the data in the **Account B** bucket, **Account A** runs a loader command using an `iamRoleArn` parameter that chains `RoleA` and `RoleB`. For the duration of the loader operation, `RoleA` then temporarily assumes `RoleB` to access the Amazon S3 bucket in **Account B**.

![Diagram illustrating cross-account access using chained roles.](http://docs.aws.amazon.com/neptune/latest/userguide/images/cross-account-bulk-load.png)


For example, `RoleA` would have a trust policy that establishes a trust relationship with Neptune:

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
          "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

------

`RoleA` would also have a permission policy that allows it to assume `RoleB`, which is owned by **Account B**:

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "Stmt1487639602000",
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": "arn:aws:iam::111122223333:role/RoleB"
        }
    ]
}
```

------

Conversely, `RoleB` would have a trust policy to establish a trust relationship with `RoleA`:

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/RoleA"
            }
        }
    ]
}
```

------

`RoleB` would also need permission to access data in the Amazon S3 bucket located in **Account B**.
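In the loader request, the chain is then expressed as a single comma-separated `iamRoleArn` value, with the role attached to the cluster first. A sketch (the account IDs and role names below are placeholders):

```python
# Hypothetical ARNs; the first role must be the one attached to the cluster,
# and each role must trust the one before it.
role_a = "arn:aws:iam::111111111111:role/RoleA"
role_b = "arn:aws:iam::222222222222:role/RoleB"

# The loader's iamRoleArn parameter takes the chain as one comma-separated string.
iam_role_arn = ",".join([role_a, role_b])
print(iam_role_arn)
# arn:aws:iam::111111111111:role/RoleA,arn:aws:iam::222222222222:role/RoleB
```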

## Creating an AWS Security Token Service (STS) VPC endpoint

When you chain IAM roles, the Neptune loader requires a VPC endpoint for AWS STS so that it can reach the AWS STS APIs privately, through private IP addresses. A VPC endpoint lets you connect directly from your Amazon VPC to AWS STS in a secure and scalable manner. An interface VPC endpoint also provides a better security posture, because you don't need to open outbound traffic firewalls, and it offers the other benefits of Amazon VPC endpoints.

When you use a VPC endpoint, traffic to AWS STS is not transmitted over the internet and never leaves the Amazon network. Your VPC is securely connected to AWS STS without availability risks or bandwidth constraints on your network traffic. For more information, see [Using AWS STS interface VPC endpoints](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_sts_vpce.html).

**To set up access for AWS Security Token Service (STS)**

1. Sign in to the AWS Management Console and open the Amazon VPC console at [https://console.aws.amazon.com/vpc/](https://console.aws.amazon.com/vpc/).

1. In the navigation pane, choose **Endpoints**.

1. Choose **Create Endpoint**.

1. Choose the **Service Name**: `com.amazonaws.region.sts` for the Interface type endpoint.

1. Choose the **VPC** that contains your Neptune DB instance and EC2 instance.

1. Select the check box next to the subnet in which your EC2 instance is present. You can't select multiple subnets from the same Availability Zone.

1. For IP address type, choose from the following options:
   + **IPv4** – Assign IPv4 addresses to your endpoint network interfaces. This option is supported only if all selected subnets have IPv4 address ranges.
   + **IPv6** – Assign IPv6 addresses to your endpoint network interfaces. This option is supported only if all selected subnets are IPv6-only subnets.
   + **Dualstack** – Assign both IPv4 and IPv6 addresses to your endpoint network interfaces. This option is supported only if all selected subnets have both IPv4 and IPv6 address ranges.

1. For **Security groups**, select the security groups to associate with the endpoint network interfaces for the VPC endpoint. You need to select all the security groups that are attached to your Neptune DB instance and EC2 instance.

1. For **Policy**, select **Full access** to allow all operations by all principals on all resources over the VPC endpoint. Otherwise, select **Custom** to attach a VPC endpoint policy that controls the permissions that principals have for performing actions on resources over the VPC endpoint. This option is available only if the service supports VPC endpoint policies. For more information, see [Endpoint policies](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-access.html).

1. (*Optional*) To add a tag, choose **Add new tag** and enter the tag key and the tag value you want.

1. Choose **Create endpoint**.

For information about creating the endpoint, see [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html) in the *Amazon VPC User Guide*. Note that an AWS STS VPC endpoint is a required prerequisite for IAM role chaining.

Now that you have granted access to the AWS STS endpoint, you can prepare to load data. For information about supported formats, see [Load Data Formats](bulk-load-tutorial-format.md).

## Chaining roles within a loader command

You can specify role chaining when you run a loader command by including a comma-separated list of role ARNs in the `iamRoleArn` parameter.

Although you'll usually need only two roles in a chain, it is possible to chain three or more. For example, this loader command chains three roles:

------
#### [ AWS CLI ]

```
aws neptunedata start-loader-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --source "s3://(the target bucket name)/(the target data file name)" \
  --format "csv" \
  --iam-role-arn "arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)" \
  --s3-bucket-region "us-east-1"
```

For more information, see [start-loader-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-loader-job.html) in the *AWS CLI Command Reference*.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_loader_job(
    source='s3://(the target bucket name)/(the target data file name)',
    format='csv',
    iamRoleArn='arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)',
    s3BucketRegion='us-east-1'
)

print(response)
```

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/loader \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "source" : "s3://(the target bucket name)/(the target date file name)",
        "iamRoleArn" : "arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)",
        "format" : "csv",
        "region" : "us-east-1"
      }'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/loader \
  -H 'Content-Type: application/json' \
  -d '{
        "source" : "s3://(the target bucket name)/(the target date file name)",
        "iamRoleArn" : "arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)",
        "format" : "csv",
        "region" : "us-east-1"
      }'
```

------

# Load Data Formats

The Amazon Neptune `Load` API supports loading data in a variety of formats.

**Property-graph load formats**

Data loaded in one of the following property-graph formats can then be queried using both Gremlin and openCypher:
+ [Gremlin load data format](bulk-load-tutorial-format-gremlin.md) (`csv`): a comma-separated values (CSV) format.
+ [openCypher data load format](bulk-load-tutorial-format-opencypher.md) (`opencypher`): a comma-separated values (CSV) format.

**RDF load formats**

To load Resource Description Framework (RDF) data that you query using SPARQL, you can use one of the following standard formats as specified by the World Wide Web Consortium (W3C):
+ N-Triples (`ntriples`) from the specification at [https://www.w3.org/TR/n-triples/](https://www.w3.org/TR/n-triples/).
+ N-Quads (`nquads`) from the specification at [https://www.w3.org/TR/n-quads/](https://www.w3.org/TR/n-quads/).
+ RDF/XML (`rdfxml`) from the specification at [https://www.w3.org/TR/rdf-syntax-grammar/](https://www.w3.org/TR/rdf-syntax-grammar/).
+ Turtle (`turtle`) from the specification at [https://www.w3.org/TR/turtle/](https://www.w3.org/TR/turtle/).

**Load data must use UTF-8 encoding**

**Important**  
All load data files must be UTF-8 encoded. If a file is not UTF-8 encoded, Neptune tries to load it as UTF-8 anyway.

For N-Quads and N-Triples data that includes Unicode characters, `\uxxxxx` escape sequences are supported. However, Neptune does not support normalization. If a value is present that requires normalization, it will not match byte-to-byte during querying. For more information about normalization, see the [Normalization](https://unicode.org/faq/normalization.html) page on [Unicode.org](https://unicode.org).
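The normalization caveat can be seen directly: a composed and a decomposed form of the same character render identically but differ byte-for-byte, so a value stored in one form will not match a query in the other. A quick illustration in Python:

```python
import unicodedata

composed = "caf\u00e9"      # 'café' with a single precomposed é (NFC form)
decomposed = "cafe\u0301"   # 'café' as plain 'e' + combining acute accent (NFD form)

# The two strings render the same but are not byte-identical, so without
# normalization they do not match byte-to-byte.
assert composed != decomposed
assert composed.encode("utf-8") != decomposed.encode("utf-8")

# Normalizing both to the same form makes them equal again -- this is the
# step Neptune does NOT perform for you.
assert unicodedata.normalize("NFC", decomposed) == composed
```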

If your data is not in a supported format, you must convert it before you load it.

A tool for converting GraphML to the Neptune CSV format is available in the [GraphML2CSV project](https://github.com/awslabs/amazon-neptune-tools/blob/master/graphml2csv/README.md) on [GitHub](https://github.com/).

## Compression support for load-data files

Neptune supports compression of individual files in `gzip` or `bzip2` format.

The compressed file must have a `.gz` or `.bz2` extension, and must be a single text file encoded in UTF-8 format. You can load multiple files, but each one must be a separate `.gz`, `.bz2`, or uncompressed text file. Archive files with extensions such as `.tar`, `.tar.gz`, and `.tgz` are not supported.
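As a sketch, this is what compressing a single load file with `gzip` looks like (the file content below is made up; the key point is that each file is compressed on its own, never bundled into a `.tar` archive):

```python
import gzip

# Hypothetical vertex file content; the uncompressed data must be UTF-8 text.
csv_text = "~id,name:String,~label\nv1,marko,person\n"

# Produce the bytes you would upload as vertices.csv.gz -- the object name
# must end in .gz (or .bz2 for bzip2), one text file per compressed file.
compressed = gzip.compress(csv_text.encode("utf-8"))

# After decompression, the loader sees the original UTF-8 text.
assert gzip.decompress(compressed).decode("utf-8") == csv_text
```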

The following sections describe the formats in more detail.

**Topics**
+ [Compression support for load-data files](#bulk-load-tutorial-format-compression)
+ [Gremlin load data format](bulk-load-tutorial-format-gremlin.md)
+ [Load format for openCypher data](bulk-load-tutorial-format-opencypher.md)
+ [RDF load data formats](bulk-load-tutorial-format-rdf.md)

# Gremlin load data format

To load Apache TinkerPop Gremlin data using the CSV format, you must specify the vertices and the edges in separate files.

The loader can load from multiple vertex files and multiple edge files in a single load job.

For each load command, the set of files to be loaded must be in the same folder in the Amazon S3 bucket, and you specify the folder name for the `source` parameter. The file names and file name extensions are not important.

The Amazon Neptune CSV format follows the RFC 4180 CSV specification. For more information, see [Common Format and MIME Type for CSV Files](https://tools.ietf.org/html/rfc4180) on the Internet Engineering Task Force (IETF) website.

**Note**  
All files must be encoded in UTF-8 format.

Each file has a comma-separated header row. The header row consists of both system column headers and property column headers.

## System Column Headers

The required and allowed system column headers are different for vertex files and edge files.

Each system column can appear only once in a header.

All labels are case sensitive.

**Vertex headers**
+ `~id` - **Required**

  An ID for the vertex.
+ `~label`

  A label for the vertex. Multiple label values are allowed, separated by semicolons (`;`).

  If `~label` is not present, TinkerPop supplies a label with the value `vertex`, because every vertex must have at least one label.

**Edge headers**
+ `~id` - **Required**

  An ID for the edge.
+ `~from` - **Required**

  The vertex ID of the *from* vertex.
+ `~to` - **Required**

  The vertex ID of the *to* vertex.
+ `~label`

  A label for the edge. Edges can only have a single label.

  If `~label` is not present, TinkerPop supplies a label with the value `edge`, because every edge must have a label.

## Property Column Headers

You can specify a column for a property by using the following syntax. The type names are not case sensitive. Note, however, that if a colon (`:`) appears within a property name, it must be escaped by preceding it with a backslash: `\:`.

```
propertyname:type
```

**Note**  
Space, comma, carriage return and newline characters are not allowed in the column headers, so property names cannot include these characters.

You can specify a column for an array type by adding `[]` to the type:

```
propertyname:type[]
```

**Note**  
Edge properties can have only a single value. Specifying an array type, or supplying a second value, causes an error.

The following example shows the column header for a property named `age` of type `Int`.

```
age:Int
```

Every row in the file must then contain an integer in that position or leave it empty.

Arrays of strings are allowed, but strings in an array cannot include the semicolon (`;`) character unless it is escaped using a backslash (like this: `\;`).

**Specifying the Cardinality of a Column**

The column header can be used to specify *cardinality* for the property identified by the column. This allows the bulk loader to honor cardinality similarly to the way Gremlin queries do.

You specify the cardinality of a column like this:

```
propertyname:type(cardinality)
```

The *cardinality* value can be either `single` or `set`. The default is `set`, meaning that the column can accept multiple values. In edge files, cardinality is always `single`, and specifying any other cardinality causes the loader to throw an exception.

If the cardinality is `single`, the loader throws an error if a previous value is already present when a value is loaded, or if multiple values are loaded. This behavior can be overridden so that an existing value is replaced when a new value is loaded by using the `updateSingleCardinalityProperties` flag. See [Loader Command](load-api-reference-load.md).

It is possible to use a cardinality setting with an array type, although this is not generally necessary. Here are the possible combinations:
+ `name:type`   –   the cardinality is `set`, and the content is single-valued.
+ `name:type[]`   –   the cardinality is `set`, and the content is multi-valued.
+ `name:type(single)`   –   the cardinality is `single`, and the content is single-valued.
+ `name:type(set)`   –   the cardinality is `set`, which is the same as the default, and the content is single-valued.
+ `name:type(set)[]`   –   the cardinality is `set`, and the content is multi-valued.
+ `name:type(single)[]`   –   this is contradictory and causes an error to be thrown.
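The header grammar above is compact enough to parse mechanically. Here is a sketch in Python (the `parse_header` function is illustrative, not part of any Neptune tooling; it applies the default cardinality, the `\:` escape, and the contradictory-combination rule described above):

```python
import re

# propertyname:type, optionally followed by (cardinality) and/or [].
# A colon inside the property name must be escaped as \: .
HEADER_RE = re.compile(
    r"^(?P<name>(?:[^:\\]|\\:)+):(?P<type>\w+)"
    r"(?:\((?P<card>single|set)\))?(?P<array>\[\])?$"
)

def parse_header(header):
    """Illustrative parser for a Gremlin CSV property column header."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"bad property column header: {header!r}")
    cardinality = m.group("card") or "set"      # default cardinality is set
    is_array = m.group("array") is not None
    if cardinality == "single" and is_array:
        raise ValueError("single cardinality with an array type is contradictory")
    name = m.group("name").replace("\\:", ":")  # unescape \: in the name
    return name, m.group("type").lower(), cardinality, is_array

print(parse_header("age:Int"))              # ('age', 'int', 'set', False)
print(parse_header("name:String(single)"))  # ('name', 'string', 'single', False)
print(parse_header("tags:String[]"))        # ('tags', 'string', 'set', True)
```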

The following section lists all the available Gremlin data types.

## Gremlin Data Types

This is a list of the allowed property types, with a description of each type.

**Bool (or Boolean)**  
Indicates a Boolean field. Allowed values: `false`, `true`

**Note**  
Any value other than `true` is treated as `false`.

**Whole Number Types**  
Values outside of the defined ranges result in an error.


| Type | Range |
| --- | --- |
| Byte | -128 to 127 |
| Short | -32768 to 32767 |
| Int | -2^31 to 2^31-1 |
| Long | -2^63 to 2^63-1 |

**Decimal Number Types**  
Supports both decimal notation and scientific notation. Also allows the symbols (+/-) Infinity and NaN. `INF` is not supported.

| Type | Range |
| --- | --- |
| Float | 32-bit IEEE 754 floating point |
| Double | 64-bit IEEE 754 floating point |

Float and double values that are too long are loaded and rounded to the nearest value for 24-bit (float) and 53-bit (double) precision. A midway value is rounded to 0 for the last remaining digit at the bit level.

**String**  
Quotation marks are optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks (`"`). *Example:* `"Hello, World"`

To include quotation marks in a quoted string, you can escape the quotation mark by using two in a row: *Example:* `"Hello ""World"""`

Arrays of strings are allowed, but strings in an array cannot include the semicolon (`;`) character unless it is escaped using a backslash (like this: `\;`).

If you want to surround strings in an array with quotation marks, you must surround the whole array with one set of quotation marks. *Example:* `"String one; String 2; String 3"`

**Date**  
Java date in ISO-8601 format. Supports the following formats: `yyyy-MM-dd`, `yyyy-MM-ddTHH:mm`, `yyyy-MM-ddTHH:mm:ss`, `yyyy-MM-ddTHH:mm:ssZ`. The values are converted to epoch time and stored.

**Datetime**  
Java date in ISO-8601 format. Supports the following formats: `yyyy-MM-dd`, `yyyy-MM-ddTHH:mm`, `yyyy-MM-ddTHH:mm:ss`, `yyyy-MM-ddTHH:mm:ssZ`. The values are converted to epoch time and stored.
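The conversion to epoch time can be sketched with Python's standard library, here for the `yyyy-MM-ddTHH:mm:ssZ` variant (this is an illustration of the conversion, not Neptune's internal code):

```python
from datetime import datetime, timezone

# An ISO-8601 value of the form yyyy-MM-ddTHH:mm:ssZ, as it would
# appear in a Date or Datetime column.
value = "2000-01-01T00:00:00Z"

# Parse it as UTC and convert to epoch seconds -- the representation
# the value is stored in.
dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
epoch = int(dt.timestamp())
print(epoch)  # 946684800
```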

## Gremlin Row Format

**Delimiters**  
Fields in a row are separated by a comma. Records are separated by a newline or a newline followed by a carriage return.

**Blank Fields**  
Blank fields are allowed for non-required columns (such as user-defined properties). A blank field still requires a comma separator. Blank fields in required columns result in a parsing error. An empty string value (`""`) is interpreted as an empty string value for the field, not as a blank field. The example in the next section has a blank field in each example vertex.

**Vertex IDs**  
`~id` values must be unique for all vertices in every vertex file. Multiple vertex rows with identical `~id` values are applied to a single vertex in the graph. Empty string (`""`) is a valid id, and the vertex is created with an empty string as the id.

**Edge IDs**  
`~id` values must also be unique for all edges in every edge file. Multiple edge rows with identical `~id` values are applied to a single edge in the graph. Empty string (`""`) is a valid id, and the edge is created with an empty string as the id.

**Labels**  
Labels are case sensitive and cannot be empty. A value of `""` will result in an error.

**String Values**  
Quotation marks are optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks (`"`). Empty string values (`""`) are interpreted as an empty string value for the field, not as a blank field.

## CSV Format Specification

The Neptune CSV format follows the RFC 4180 CSV specification, including the following requirements.
+ Both Unix and Windows style line endings are supported (`\n` or `\r\n`).
+ Any field can be quoted (using double quotation marks).
+ Fields containing a line break, double quotation mark, or comma must be quoted. (If they are not, the load aborts immediately.)
+ A double quotation mark character (`"`) in a field must be represented by two double quotation mark characters. For example, the string `Hello "World"` must appear as `"Hello ""World"""` in the data.
+ Spaces around delimiters are ignored. If a row is present as `value1, value2`, the values are stored as `"value1"` and `"value2"`.
+ Any other escape characters are stored verbatim. For example, `"data1\tdata2"` is stored as `"data1\tdata2"`. No further escaping is needed as long as these characters are enclosed within quotation marks.
+ Blank fields are allowed. A blank field is considered an empty value.
+ Multiple values for a field are specified with a semicolon (`;`) between values.

For more information, see [Common Format and MIME Type for CSV Files](https://tools.ietf.org/html/rfc4180) on the Internet Engineering Task Force (IETF) website.
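Python's standard `csv` module already implements these RFC 4180 rules, which makes it a convenient way to produce valid rows (a sketch; the field values below are made up):

```python
import csv
import io

# csv.QUOTE_MINIMAL quotes only fields containing a comma, quotation mark,
# or line break, and doubles any embedded quotation marks -- matching the
# RFC 4180 behavior described above.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(["v1", 'Hello "World"', "plain", "a, b"])

print(buf.getvalue())  # v1,"Hello ""World""",plain,"a, b"
```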

## Gremlin Example
Example

The following diagram shows an example of two vertices and an edge taken from the TinkerPop Modern Graph.

![\[Diagram depicting two vertices and an edge, contains marko age 29 and lop software with lang: java.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/tiny-modern-graph.png)


The following is the graph in Neptune CSV load format.

Vertex file:

```
~id,name:String,age:Int,lang:String,interests:String[],~label
v1,"marko",29,,"sailing;graphs",person
v2,"lop",,"java",,software
```

Tabular view of the vertex file:

| ~id | name:String | age:Int | lang:String | interests:String[] | ~label | 
| --- |--- |--- |--- |--- |--- |
| v1 | "marko" | 29 |  | ["sailing", "graphs"] | person | 
| v2 | "lop" |  | "java" |  | software | 

Edge file:

```
~id,~from,~to,~label,weight:Double
e1,v1,v2,created,0.4
```

Tabular view of the edge file:

| ~id | ~from | ~to | ~label | weight:Double | 
| --- |--- |--- |--- |--- |
| e1 | v1 | v2 | created | 0.4 | 

**Next Steps**  
Now that you know more about the loading formats, see [Example: Loading Data into a Neptune DB Instance](bulk-load-data.md).

# Load format for openCypher data
openCypher data format

To load openCypher data using the openCypher CSV format, you must specify nodes and relationships in separate files. A single load job can load from multiple node files and multiple relationship files.

For each load command, the set of files to be loaded must share the same path prefix in an Amazon Simple Storage Service (Amazon S3) bucket. You specify that prefix in the `source` parameter. The actual file names and extensions do not matter.

In Amazon Neptune, the openCypher CSV format conforms to the RFC 4180 CSV specification. For more information, see [Common Format and MIME Type for CSV Files](https://tools.ietf.org/html/rfc4180) on the Internet Engineering Task Force (IETF) website.

**Note**  
These files MUST be encoded in UTF-8 format.

Each file has a comma-separated header row that contains both system column headers and property column headers.

## System column headers in openCypher data loading files
System column headers

A given system column can only appear once in each file. All system column header labels are case-sensitive.

The system column headers that are required and allowed are different for openCypher node load files and relationship load files:

### System column headers in node files
Node file system headers
+ **`:ID`**   –   (Required) An ID for the node.

  An optional ID space can be added to the node `:ID` column header like this: `:ID(ID Space)`. An example is `:ID(movies)`.

  When loading relationships that connect the nodes in this file, use the same ID spaces in the relationship files' `:START_ID` and/or `:END_ID` columns.

  The node `:ID` column can optionally be stored as a property in the form, `property name:ID`. An example is `name:ID`.

  Node IDs should be unique across all node files in the current and previous loads. If an ID space is used, node IDs should be unique across all node files that use the same ID space in the current and previous loads.
+ **`:LABEL`**   –   A label for the node.

  When using multiple label values for a single node, separate the labels with semicolons (`;`).
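The `:ID` column header comes in several variants (`:ID`, `:ID(ID Space)`, `property name:ID`, `property name:ID(ID Space)`). The following is a sketch of parsing those variants into a property name and an ID space; the regular expression is one reading of the format described above, not the loader's actual parser.

```python
import re

# Sketch: split a node :ID column header into (property_name, id_space).
# Either part may be absent, in which case None is returned for it.
HEADER_RE = re.compile(r"^(?P<prop>[^:()]*):ID(?:\((?P<space>[^)]+)\))?$")

def parse_id_header(header):
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"not an :ID header: {header}")
    return m.group("prop") or None, m.group("space")

print(parse_id_header(":ID"))              # (None, None)
print(parse_id_header(":ID(movies)"))      # (None, 'movies')
print(parse_id_header("name:ID"))          # ('name', None)
print(parse_id_header("name:ID(movies)"))  # ('name', 'movies')
```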

### System column headers in relationship files
Relationship file system headers
+ **`:ID`**   –   An ID for the relationship. This is required when `userProvidedEdgeIds` is `true` (the default), but not allowed when it is `false`.

  Relationship IDs should be unique across all relationship files in the current and previous loads.
+ **`:START_ID`**   –   (*Required*) The node ID of the node this relationship starts from.

  Optionally, an ID space can be associated with the start ID column in the form `:START_ID(ID Space)`. The ID space assigned to the start node ID should match the ID space assigned to the node in its node file.
+ **`:END_ID`**   –   (*Required*) The node ID of the node this relationship ends at.

  Optionally, an ID space can be associated with the end ID column in the form `:END_ID(ID Space)`. The ID space assigned to the end node ID should match the ID space assigned to the node in its node file.
+ **`:TYPE`**   –   A type for the relationship. Relationships can only have a single type.

**Note**  
See [Loading openCypher data](load-api-reference-load.md#load-api-reference-load-parameters-opencypher) for information about how duplicate node or relationship IDs are handled by the bulk load process.

### Property column headers in openCypher data loading files
Property column headers

You can specify that a column holds the values for a particular property using a property column header in the following form:

```
propertyname:type
```

Space, comma, carriage return and newline characters are not allowed in the column headers, so property names cannot include these characters. Here is an example of a column header for a property named `age` of type `Int`:

```
age:Int
```

The column with `age:Int` as a column header would then have to contain either an integer or an empty value in every row.
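That header-to-column contract can be sketched in a few lines. The following splits a `propertyname:type` header and checks that a cell is either empty or a valid value of the declared type; the validation rules shown are an illustrative subset (only `Int` is checked), not the loader's exact behavior.

```python
# Sketch: parse a property column header like "age:Int" and validate
# cells against it, per the rule that every row must hold either a
# value of the declared type or an empty value.
def parse_header(header):
    name, _, typename = header.rpartition(":")
    return name, typename

def cell_ok(value, typename):
    if value == "":            # blank fields are allowed
        return True
    if typename == "Int":
        try:
            n = int(value)
        except ValueError:
            return False
        return -2**31 <= n <= 2**31 - 1
    return True                # other types not checked in this sketch

name, typename = parse_header("age:Int")
print(name, typename)          # age Int
print(cell_ok("29", "Int"))    # True
print(cell_ok("abc", "Int"))   # False
```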

## Data types in Neptune openCypher data loading files
Data types
+ **`Bool`** or **`Boolean`**   –   A Boolean field. Allowed values are `true` and `false`.

  Any value other than `true` is treated as `false`.
+ **`Byte`**   –   A whole number in the range `-128` through `127`.
+ **`Short`**   –   A whole number in the range `-32,768` through `32,767`.
+ **`Int`**   –   A whole number in the range `-2^31` through `2^31 - 1`.
+ **`Long`**   –   A whole number in the range `-2^63` through `2^63 - 1`.
+ **`Float`**   –   A 32-bit IEEE 754 floating point number. Decimal notation and scientific notation are both supported. `Infinity`, `-Infinity`, and `NaN` are all recognized, but `INF` is not.

  Values with too many digits to fit are rounded to the nearest value (a midway value is rounded to 0 for the last remaining digit at the bit level).
+ **`Double`**   –   A 64-bit IEEE 754 floating point number. Decimal notation and scientific notation are both supported. `Infinity`, `-Infinity`, and `NaN` are all recognized, but `INF` is not.

  Values with too many digits to fit are rounded to the nearest value (a midway value is rounded to 0 for the last remaining digit at the bit level).
+ **`String`**   –   Quotation marks are optional. Comma, newline, and carriage return characters are automatically escaped if they are included in a string that is surrounded by double quotation marks (`"`) like `"Hello, World"`.

  You can include quotation marks in a quoted string by using two in a row, like `"Hello ""World"""`.
+ **`DateTime`**   –   A Java date in one of the following ISO-8601 formats:
  + `yyyy-MM-dd`
  + `yyyy-MM-ddTHH:mm`
  + `yyyy-MM-ddTHH:mm:ss`
  + `yyyy-MM-ddTHH:mm:ssZ`
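A value can be checked against the four `DateTime` layouts above before loading. The `strptime` format strings below are one translation of those ISO-8601 patterns; the loader's own parser may be more lenient.

```python
from datetime import datetime

# Sketch: accept a value if it matches any of the four DateTime
# layouts listed above, and reject it otherwise.
FORMATS = ["%Y-%m-%d", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M:%SZ"]

def is_valid_datetime(value):
    for fmt in FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

print(is_valid_datetime("2024-05-01"))            # True
print(is_valid_datetime("2024-05-01T10:30:00Z"))  # True
print(is_valid_datetime("05/01/2024"))            # False
```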

### Auto-cast data types in Neptune openCypher data loading files
Auto-cast types

Auto-cast data types are provided to load data types not currently supported natively by Neptune. Data in such columns are stored as strings, verbatim with no verification against their intended formats. The following auto-cast data types are allowed:
+ **`Char`**   –   A `Char` field. Stored as a string.
+ **`Date`**, **`LocalDate`**, and **`LocalDateTime`**   –   See [Neo4j Temporal Instants](https://neo4j.com/docs/cypher-manual/current/values-and-types/temporal/#cypher-temporal-instants) for a description of the `date`, `localdate`, and `localdatetime` types. The values are loaded verbatim as strings, without validation.
+ **`Duration`**   –   See the [Neo4j Duration format](https://neo4j.com/docs/cypher-manual/current/values-and-types/temporal/#cypher-temporal-durations). The values are loaded verbatim as strings, without validation.
+ **`Point`**   –   A point field, for storing spatial data. See [Spatial instants](https://neo4j.com/docs/cypher-manual/current/values-and-types/spatial/#spatial-values-spatial-instants). The values are loaded verbatim as strings, without validation.

## Example of the openCypher load format
openCypher example

The following diagram taken from the TinkerPop Modern Graph shows an example of two nodes and a relationship:

![\[Diagram of two nodes and a relationship between them.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/tinkerpop-2-nodes-and-relationship.png)


The following is the graph in the normal Neptune openCypher load format.

**Node file:**

```
:ID,name:String,age:Int,lang:String,:LABEL
v1,"marko",29,,person
v2,"lop",,"java",software
```

**Relationship file:**

```
:ID,:START_ID,:END_ID,:TYPE,weight:Double
e1,v1,v2,created,0.4
```

Alternatively, you could use ID spaces and ID as a property, as follows:

**First node file:**

```
name:ID(person),age:Int,lang:String,:LABEL
"marko",29,,person
```

**Second node file:**

```
name:ID(software),age:Int,lang:String,:LABEL
"lop",,"java",software
```

**Relationship file:**

```
:ID,:START_ID(person),:END_ID(software),:TYPE,weight:Double
e1,"marko","lop",created,0.4
```
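When ID spaces are used, each `:START_ID` and `:END_ID` value must resolve to a node defined in the matching ID space. The following sketch runs that check over the example files above, mirroring the situation the loader reports as `FROM_OR_TO_VERTEX_ARE_MISSING`; the check itself is illustrative, not part of the loader.

```python
import csv
import io

# The ID-space example files from above, inline for the sketch.
person_file = 'name:ID(person),age:Int,lang:String,:LABEL\n"marko",29,,person\n'
software_file = 'name:ID(software),age:Int,lang:String,:LABEL\n"lop",,"java",software\n'
rel_file = ':ID,:START_ID(person),:END_ID(software),:TYPE,weight:Double\ne1,"marko","lop",created,0.4\n'

def ids_from(node_file, column):
    # Collect all node IDs declared in the given :ID column.
    return {row[column] for row in csv.DictReader(io.StringIO(node_file))}

persons = ids_from(person_file, "name:ID(person)")
software = ids_from(software_file, "name:ID(software)")

# Relationships whose endpoints are missing from their ID spaces.
missing = [
    row[":ID"]
    for row in csv.DictReader(io.StringIO(rel_file))
    if row[":START_ID(person)"] not in persons
    or row[":END_ID(software)"] not in software
]
print(missing)  # []
```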

# RDF load data formats
RDF data formats

To load Resource Description Framework (RDF) data, you can use one of the following standard formats as specified by the World Wide Web Consortium (W3C):
+ N-Triples (`ntriples`) from the specification at [https://www.w3.org/TR/n-triples/](https://www.w3.org/TR/n-triples/)
+ N-Quads (`nquads`) from the specification at [https://www.w3.org/TR/n-quads/](https://www.w3.org/TR/n-quads/)
+ RDF/XML (`rdfxml`) from the specification at [https://www.w3.org/TR/rdf-syntax-grammar/](https://www.w3.org/TR/rdf-syntax-grammar/)
+ Turtle (`turtle`) from the specification at [https://www.w3.org/TR/turtle/](https://www.w3.org/TR/turtle/)

**Important**  
All files must be encoded in UTF-8 format.  
For N-Quads and N-Triples data that includes Unicode characters, `\uxxxxx` escape sequences are supported. However, Neptune does not support normalization. If a value is present that requires normalization, it will not match byte-to-byte during querying. For more information about normalization, see the [Normalization](https://unicode.org/faq/normalization.html) page on [Unicode.org](https://unicode.org).

**Next Steps**  
Now that you know more about the loading formats, see [Example: Loading Data into a Neptune DB Instance](bulk-load-data.md).

# Example: Loading Data into a Neptune DB Instance
Loading Example

This example shows how to load data into Amazon Neptune. Unless stated otherwise, you must follow these steps from an Amazon Elastic Compute Cloud (Amazon EC2) instance in the same Amazon Virtual Private Cloud (VPC) as your Neptune DB instance.

## Prerequisites for the Data Loading Example
Prerequisites

Before you begin, you must have the following:
+ A Neptune DB instance.

  For information about launching a Neptune DB instance, see [Creating an Amazon Neptune cluster](get-started-create-cluster.md).
+ An Amazon Simple Storage Service (Amazon S3) bucket to put the data files in.

  You can use an existing bucket. If you don't have an S3 bucket, see [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/CreatingABucket.html) in the *[Amazon S3 Getting Started Guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/)*.
+ Graph data to load, in one of the formats supported by the Neptune loader:

  If you are using Gremlin to query your graph, Neptune can load data in a comma-separated-values (`CSV`) format, as described in [Gremlin load data format](bulk-load-tutorial-format-gremlin.md).

  If you are using openCypher to query your graph, Neptune can also load data in an openCypher-specific `CSV` format, as described in [Load format for openCypher data](bulk-load-tutorial-format-opencypher.md).

  If you are using SPARQL, Neptune can load data in a number of RDF formats, as described in [RDF load data formats](bulk-load-tutorial-format-rdf.md).
+ An IAM role that the Neptune DB instance can assume, with an IAM policy that allows access to the data files in the S3 bucket. The policy must grant Read and List permissions.

   For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see [Prerequisites: IAM Role and Amazon S3 Access](bulk-load-tutorial-IAM.md).
**Note**  
The Neptune `Load` API needs read access to the data files only. The IAM policy doesn't need to allow write access or access to the entire bucket.
+ An Amazon S3 VPC endpoint. For more information, see the [Creating an Amazon S3 VPC Endpoint](#bulk-load-prereqs-s3) section.

### Creating an Amazon S3 VPC Endpoint
Amazon S3 VPC Endpoint

The Neptune loader requires a VPC endpoint for Amazon S3.

**To set up access for Amazon S3**

1. Sign in to the AWS Management Console and open the Amazon VPC console at [https://console.aws.amazon.com/vpc/](https://console.aws.amazon.com/vpc/).

1. In the left navigation pane, choose **Endpoints**.

1. Choose **Create Endpoint**.

1. Choose the **Service Name** `com.amazonaws.region.s3`.
**Note**  
The Region in the service name must match the Region of your cluster. If it doesn't, check that the console is set to the correct Region.

1. Choose the VPC that contains your Neptune DB instance.

1. Select the check box next to the route tables that are associated with the subnets related to your cluster. If you have only one route table, select its check box.

1. Choose **Create Endpoint**.

For information about creating the endpoint, see [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html#create-vpc-endpoint) in the *Amazon VPC User Guide*. For information about the limitations of VPC endpoints, see [VPC Endpoints for Amazon S3](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html).
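The console steps above correspond to a single `CreateVpcEndpoint` API call. The following sketch builds the request parameters; the VPC ID and route table ID are placeholders, and actually creating the endpoint (commented out) requires AWS credentials with `ec2:CreateVpcEndpoint` permission.

```python
# Sketch: parameters for an ec2.create_vpc_endpoint call that creates
# a gateway endpoint for Amazon S3, matching the console steps above.
def s3_endpoint_params(region, vpc_id, route_table_ids):
    return {
        "VpcEndpointType": "Gateway",
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcId": vpc_id,                  # the VPC containing your Neptune DB instance
        "RouteTableIds": route_table_ids,  # route tables for your cluster's subnets
    }

params = s3_endpoint_params("us-east-1", "vpc-0abc", ["rtb-0123"])
print(params["ServiceName"])  # com.amazonaws.us-east-1.s3

# With credentials configured (placeholder IDs above are not real):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# ec2.create_vpc_endpoint(**params)
```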

**To load data into a Neptune DB instance**

1. Copy the data files to an Amazon S3 bucket. The S3 bucket must be in the same AWS Region as the cluster that loads the data.

   You can use the following AWS CLI command to copy the files to the bucket.
**Note**  
This command does not need to be run from the Amazon EC2 instance.

   ```
   aws s3 cp data-file-name s3://bucket-name/object-key-name
   ```
**Note**  
In Amazon S3, an **object key name** is the entire path of a file, including the file name.  
*Example:* In the command `aws s3 cp datafile.txt s3://examplebucket/mydirectory/datafile.txt`, the object key name is **`mydirectory/datafile.txt`**.

   Alternatively, you can use the AWS Management Console to upload files to the S3 bucket. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/), and choose a bucket. In the upper-left corner, choose **Upload** to upload files.

1. From a command line window, enter the following to run the Neptune loader, using the correct values for your endpoint, Amazon S3 path, format, and IAM role ARN.

   The `format` parameter can be any of the following values: `csv` for Gremlin, `opencypher` for openCypher, or `ntriples`, `nquads`, `turtle`, and `rdfxml` for RDF. For information about the other parameters, see [Neptune Loader Command](load-api-reference-load.md).

   For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   The Region parameter must match the Region of the cluster and the S3 bucket.

   Amazon Neptune is available in the following AWS Regions:
   + US East (N. Virginia):   `us-east-1`
   + US East (Ohio):   `us-east-2`
   + US West (N. California):   `us-west-1`
   + US West (Oregon):   `us-west-2`
   + Canada (Central):   `ca-central-1`
   + Canada West (Calgary):   `ca-west-1`
   + South America (São Paulo):   `sa-east-1`
   + Europe (Stockholm):   `eu-north-1`
   + Europe (Spain):   `eu-south-2`
   + Europe (Ireland):   `eu-west-1`
   + Europe (London):   `eu-west-2`
   + Europe (Paris):   `eu-west-3`
   + Europe (Frankfurt):   `eu-central-1`
   + Middle East (Bahrain):   `me-south-1`
   + Middle East (UAE):   `me-central-1`
   + Israel (Tel Aviv):   `il-central-1`
   + Africa (Cape Town):   `af-south-1`
   + Asia Pacific (Hong Kong):   `ap-east-1`
   + Asia Pacific (Tokyo):   `ap-northeast-1`
   + Asia Pacific (Seoul):   `ap-northeast-2`
   + Asia Pacific (Osaka):   `ap-northeast-3`
   + Asia Pacific (Singapore):   `ap-southeast-1`
   + Asia Pacific (Sydney):   `ap-southeast-2`
   + Asia Pacific (Jakarta):   `ap-southeast-3`
   + Asia Pacific (Melbourne):   `ap-southeast-4`
   + Asia Pacific (Malaysia):   `ap-southeast-5`
   + Asia Pacific (Mumbai):   `ap-south-1`
   + Asia Pacific (Hyderabad):   `ap-south-2`
   + China (Beijing):   `cn-north-1`
   + China (Ningxia):   `cn-northwest-1`
   + AWS GovCloud (US-West):   `us-gov-west-1`
   + AWS GovCloud (US-East):   `us-gov-east-1`

------
#### [ AWS CLI ]

   ```
   aws neptunedata start-loader-job \
     --endpoint-url https://your-neptune-endpoint:port \
     --source "s3://bucket-name/object-key-name" \
     --format "format" \
     --iam-role-arn "arn:aws:iam::account-id:role/role-name" \
     --s3-bucket-region "region" \
     --no-fail-on-error \
     --parallelism "MEDIUM" \
     --no-update-single-cardinality-properties \
     --queue-request \
     --dependencies "load_A_id" "load_B_id"
   ```

   For more information, see [start-loader-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-loader-job.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

   ```
   import boto3
   from botocore.config import Config
   
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://your-neptune-endpoint:port',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   response = client.start_loader_job(
       source='s3://bucket-name/object-key-name',
       format='format',
       iamRoleArn='arn:aws:iam::account-id:role/role-name',
       s3BucketRegion='region',
       failOnError=False,
       parallelism='MEDIUM',
       updateSingleCardinalityProperties=False,
       queueRequest=True,
       dependencies=['load_A_id', 'load_B_id']
   )
   
   print(response)
   ```

------
#### [ awscurl ]

   ```
   awscurl https://your-neptune-endpoint:port/loader \
     --region us-east-1 \
     --service neptune-db \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{
           "source" : "s3://bucket-name/object-key-name",
           "format" : "format",
           "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
           "region" : "region",
           "failOnError" : "FALSE",
           "parallelism" : "MEDIUM",
           "updateSingleCardinalityProperties" : "FALSE",
           "queueRequest" : "TRUE",
           "dependencies" : ["load_A_id", "load_B_id"]
         }'
   ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

   ```
   curl -X POST https://your-neptune-endpoint:port/loader \
     -H 'Content-Type: application/json' \
     -d '{
           "source" : "s3://bucket-name/object-key-name",
           "format" : "format",
           "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
           "region" : "region",
           "failOnError" : "FALSE",
           "parallelism" : "MEDIUM",
           "updateSingleCardinalityProperties" : "FALSE",
           "queueRequest" : "TRUE",
           "dependencies" : ["load_A_id", "load_B_id"]
         }'
   ```

------

   For information about creating and associating an IAM role with a Neptune cluster, see [Prerequisites: IAM Role and Amazon S3 Access](bulk-load-tutorial-IAM.md).
**Note**  
See [Neptune Loader Request Parameters](load-api-reference-load.md#load-api-reference-load-parameters) for detailed information about load request parameters. In brief:  
The `source` parameter accepts an Amazon S3 URI that points to either a single file or a folder. If you specify a folder, Neptune loads every data file in the folder.  
The folder can contain multiple vertex files and multiple edge files.  
The URI can be in any of the following formats.  
`s3://bucket_name/object-key-name`
`https://s3.amazonaws.com/bucket_name/object-key-name`
`https://s3-us-east-1.amazonaws.com/bucket_name/object-key-name`
The `format` parameter can be one of the following:  
Gremlin CSV format (`csv`) for Gremlin property graphs
openCypher CSV format (`opencypher`) for openCypher property graphs
N-Triples (`ntriples`) format for RDF / SPARQL
N-Quads (`nquads`) format for RDF / SPARQL
RDF/XML (`rdfxml`) format for RDF / SPARQL
Turtle (`turtle`) format for RDF / SPARQL
The optional `parallelism` parameter lets you restrict the number of threads used in the bulk load process. It can be set to `LOW`, `MEDIUM`, `HIGH`, or `OVERSUBSCRIBE`.  
When `updateSingleCardinalityProperties` is set to `"FALSE"`, the loader returns an error if more than one value is provided in a source file being loaded for an edge or single-cardinality vertex property.  
Setting `queueRequest` to `"TRUE"` causes the load request to be placed in a queue if there is already a load job running.  
The `dependencies` parameter makes execution of the load request contingent on the successful completion of one or more load jobs that have already been placed in the queue.

1. The Neptune loader returns a job `id` that allows you to check the status or cancel the loading process; for example:

   ```
   {
       "status" : "200 OK",
       "payload" : {
           "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
       }
   }
   ```

1. Enter the following to get the status of the load with the `loadId` from **Step 3**:

------
#### [ AWS CLI ]

   ```
   aws neptunedata get-loader-job-status \
     --endpoint-url https://your-neptune-endpoint:port \
     --load-id ef478d76-d9da-4d94-8ff1-08d9d4863aa5
   ```

   For more information, see [get-loader-job-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-loader-job-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

   ```
   import boto3
   from botocore.config import Config
   
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://your-neptune-endpoint:port',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   response = client.get_loader_job_status(
       loadId='ef478d76-d9da-4d94-8ff1-08d9d4863aa5'
   )
   
   print(response)
   ```

------
#### [ awscurl ]

   ```
   awscurl 'https://your-neptune-endpoint:port/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5' \
     --region us-east-1 \
     --service neptune-db
   ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

   ```
   curl -G 'https://your-neptune-endpoint:port/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5'
   ```

------

   If the status of the load lists an error, you can request more detailed status and a list of the errors. For more information and examples, see [Neptune Loader Get-Status API](load-api-reference-status.md).

1. (Optional) Cancel the `Load` job.

   Enter the following to `Delete` the loader job with the job `id` from **Step 3**:

------
#### [ AWS CLI ]

   ```
   aws neptunedata cancel-loader-job \
     --endpoint-url https://your-neptune-endpoint:port \
     --load-id ef478d76-d9da-4d94-8ff1-08d9d4863aa5
   ```

   For more information, see [cancel-loader-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/cancel-loader-job.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

   ```
   import boto3
   from botocore.config import Config
   
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://your-neptune-endpoint:port',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   response = client.cancel_loader_job(
       loadId='ef478d76-d9da-4d94-8ff1-08d9d4863aa5'
   )
   
   print(response)
   ```

------
#### [ awscurl ]

   ```
   awscurl 'https://your-neptune-endpoint:port/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5' \
     --region us-east-1 \
     --service neptune-db \
     -X DELETE
   ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

   ```
   curl -X DELETE 'https://your-neptune-endpoint:port/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5'
   ```

------

   The `DELETE` command returns the HTTP code `200 OK` upon successful cancellation.

   Data from any files that had already finished loading when the job was canceled is not rolled back. That data remains in the Neptune DB instance.
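The status and cancel steps above all reference the job id returned in step 3. A minimal sketch of extracting it from the loader's JSON response:

```python
import json

# Sketch: the loader response is JSON; the job id lives under
# payload.loadId and is what the status and cancel calls take.
response_text = """
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
    }
}
"""
load_id = json.loads(response_text)["payload"]["loadId"]
print(load_id)  # ef478d76-d9da-4d94-8ff1-08d9d4863aa5
```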

# Optimizing an Amazon Neptune bulk load
Optimizing a bulk load

Use the following strategies to keep the load time to a minimum for a Neptune bulk load:
+ **Clean your data:**
  + Be sure to convert your data into a [supported data format](bulk-load-tutorial-format.md) before loading.
  + Remove any duplicates or known errors.
  + Reduce the number of unique predicates (such as properties of edges and vertices) as much as you can.
+ **Optimize your files:**
  + If you load large files such as CSV files from an Amazon S3 bucket, the loader manages concurrency for you by parsing them into chunks that it can load in parallel. Using a very large number of tiny files can slow this process.
  + If you load multiple files from an Amazon S3 prefix, the loader first scans all the files to determine their contents, so that any vertex files found are loaded before any edge files. If you know that you are loading only edge files, you can set `edgeOnlyLoad` to `TRUE` to skip that scanning pass, which can speed up the load significantly when many edge files are involved. If vertex files are nevertheless present under the same Amazon S3 prefix (the `source` parameter), they are still loaded, but with no ordering guarantees relative to the other files, and if some `from` or `to` vertices are not yet in the database, edge insertion may report errors with the message `FROM_OR_TO_VERTEX_ARE_MISSING`. As a best practice, put vertex files and edge files under separate Amazon S3 prefixes.
+ **Check your loader settings:**
  + If you don't need to perform any other operations during the load, use the [`OVERSUBSCRIBE`  `parallelism`](load-api-reference-load.md#load-api-reference-load-syntax) parameter. This parameter setting causes the bulk loader to use all available CPU resources when it runs. It generally takes 60%-70% of CPU capacity to keep the operation running as fast as I/O constraints permit.
**Note**  
When `parallelism` is set to `OVERSUBSCRIBE` or `HIGH` (the default setting), there is the risk when loading openCypher data that threads may encounter a race condition and deadlock, resulting in a `LOAD_DATA_DEADLOCK` error. In this case, set `parallelism` to a lower setting and retry the load.
  + If your load job will include multiple load requests, use the `queueRequest` parameter. Setting `queueRequest` to `TRUE` lets Neptune queue up your requests so you don't have to wait for one to finish before issuing another.
  +  If your load requests are being queued, you can set up levels of dependency using the `dependencies` parameter, so that the failure of one job causes dependent jobs to fail. This can prevent inconsistencies in the loaded data.
  + If a load job is going to involve updating previously loaded values, be sure to set the `updateSingleCardinalityProperties` parameter to `TRUE`. If you don't, the loader will treat an attempt to update an existing single-cardinality value as an error. For Gremlin data, cardinality is also specified in property column headers (see [Property Column Headers](bulk-load-tutorial-format-gremlin.md#bulk-load-tutorial-format-gremlin-propheaders)).
**Note**  
The `updateSingleCardinalityProperties` parameter is not available for Resource Description Framework (RDF) data.
  + You can use the `failOnError` parameter to determine whether bulk load operations should fail or continue when an error is encountered. Also, you can use the `mode` parameter to be sure that a load job resumes loading from the point where a previous job failed rather than reloading data that had already been loaded.
+ **Scale up**   –   Set the writer instance of your DB cluster to the maximum size before bulk loading. Note that if you do this, you must either scale up any read-replica instances in the DB cluster as well, or remove them until you have finished loading the data.

   When your bulk load is complete, be sure to scale the writer instance down again. 

**Important**  
If you experience a cycle of repeated read-replica restarts because of replication lag during a bulk load, your replicas are likely unable to keep up with the writer in your DB cluster. Either scale the readers to be larger than the writer, or temporarily remove them during the bulk load and then recreate them after it completes.

See [Request Parameters](load-api-reference-load.md#load-api-reference-load-parameters) for more details about setting loader request parameters.
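The `queueRequest` and `dependencies` strategies above can be combined to submit a chain of load jobs where each waits for the previous one. The following sketch only builds the request payloads; `load_0_id` and the other ids are placeholders for the `loadId` values the real calls (CLI, SDK, or HTTP) would return.

```python
# Sketch: payloads for a chain of queued loader requests, where each
# request depends on the previously submitted job's loadId.
def chained_load_payloads(sources, iam_role_arn, region):
    payloads, previous_id = [], None
    for i, source in enumerate(sources):
        payload = {
            "source": source,
            "format": "csv",
            "iamRoleArn": iam_role_arn,
            "region": region,
            "queueRequest": "TRUE",    # queue instead of rejecting if a job is running
        }
        if previous_id is not None:
            payload["dependencies"] = [previous_id]  # fail if the prior job fails
        payloads.append(payload)
        previous_id = f"load_{i}_id"   # placeholder for the returned loadId
    return payloads

jobs = chained_load_payloads(
    ["s3://bucket/a/", "s3://bucket/b/", "s3://bucket/c/"],
    "arn:aws:iam::123456789012:role/NeptuneLoadFromS3", "us-east-1")
print([p.get("dependencies") for p in jobs])  # [None, ['load_0_id'], ['load_1_id']]
```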

# Neptune Loader Reference
Loader Reference

This section describes the `Loader` APIs for Amazon Neptune that are available from the HTTP endpoint of a Neptune DB instance.

**Note**  
See [Neptune Loader Error and Feed Messages](loader-message.md) for a list of the error and feed messages returned by the loader in case of errors.

**Contents**
+ [

# Neptune Loader Command
](load-api-reference-load.md)
  + [

## Neptune Loader Request Syntax
](load-api-reference-load.md#load-api-reference-load-syntax)
  + [

## Neptune Loader Request Parameters
](load-api-reference-load.md#load-api-reference-load-parameters)
    + [

### Special considerations for loading openCypher data
](load-api-reference-load.md#load-api-reference-load-parameters-opencypher)
  + [

## Neptune Loader Response Syntax
](load-api-reference-load.md#load-api-reference-load-return)
  + [

# Neptune Loader Errors
](load-api-reference-load-errors.md)
  + [

# Neptune Loader Examples
](load-api-reference-load-examples.md)
+ [

# Neptune Loader Get-Status API
](load-api-reference-status.md)
  + [

# Neptune Loader Get-Status requests
](load-api-reference-status-requests.md)
    + [

## Loader Get-Status request syntax
](load-api-reference-status-requests.md#load-api-reference-status-request-syntax)
    + [

## Neptune Loader Get-Status request parameters
](load-api-reference-status-requests.md#load-api-reference-status-parameters)
  + [

# Neptune Loader Get-Status Responses
](load-api-reference-status-response.md)
    + [

## Neptune Loader Get-Status Response JSON layout
](load-api-reference-status-response.md#load-api-reference-status-response-layout)
    + [

## Neptune Loader Get-Status `overallStatus` and `failedFeeds` response objects
](load-api-reference-status-response.md#load-api-reference-status-response-objects)
    + [

## Neptune Loader Get-Status `errors` response object
](load-api-reference-status-response.md#load-api-reference-status-errors)
    + [

## Neptune Loader Get-Status `errorLogs` response object
](load-api-reference-status-response.md#load-api-reference-error-logs)
  + [

# Neptune Loader Get-Status Examples
](load-api-reference-status-examples.md)
    + [

## Example request for load status
](load-api-reference-status-examples.md#load-api-reference-status-examples-status-request)
    + [

## Example request for loadIds
](load-api-reference-status-examples.md#load-api-reference-status-examples-loadId-request)
    + [

## Example request for detailed status
](load-api-reference-status-examples.md#load-api-reference-status-examples-details-request)
  + [

# Neptune Loader Get-Status `errorLogs` examples
](load-api-reference-error-logs-examples.md)
    + [

## Example detailed status response when errors occurred
](load-api-reference-error-logs-examples.md#load-api-reference-status-examples-details-request-errors)
    + [

## Example of a `Data prefetch task interrupted` error
](load-api-reference-error-logs-examples.md#load-api-reference-status-examples-task-interrupted)
+ [

# Neptune Loader Cancel Job
](load-api-reference-cancel.md)
  + [

## Cancel Job request syntax
](load-api-reference-cancel.md#load-api-reference-cancel-syntax)
  + [

## Cancel Job Request Parameters
](load-api-reference-cancel.md#load-api-reference-cancel-parameters)
  + [

## Cancel Job Response Syntax
](load-api-reference-cancel.md#load-api-reference-cancel-parameters-response)
  + [

## Cancel Job Errors
](load-api-reference-cancel.md#load-api-reference-cancel-parameters-errors)
  + [

## Cancel Job Error Messages
](load-api-reference-cancel.md#load-api-reference-cancel-parameters-errors-messages)
  + [

## Cancel Job Examples
](load-api-reference-cancel.md#load-api-reference-cancel-examples)

# Neptune Loader Command
Loader Command

Loads data from an Amazon S3 bucket into a Neptune DB instance.

To load data, you must send an HTTP `POST` request to the `https://your-neptune-endpoint:port/loader` endpoint. The parameters for the `loader` request can be sent in the `POST` body or as URL-encoded parameters.

**Important**  
The MIME type must be `application/json`.

The Amazon S3 bucket must be in the same AWS Region as the cluster.

**Note**  
You can load encrypted data from Amazon S3 if it was encrypted using the Amazon S3 `SSE-S3` mode. In that case, Neptune is able to impersonate your credentials and issue `s3:getObject` calls on your behalf.  
You can also load encrypted data from Amazon S3 that was encrypted using the `SSE-KMS` mode, as long as your IAM role includes the necessary permissions to access AWS KMS. Without proper AWS KMS permissions, the bulk load operation fails and returns a `LOAD_FAILED` response.  
Neptune does not currently support loading Amazon S3 data encrypted using the `SSE-C` mode.

You don't have to wait for one load job to finish before you start another one. Neptune can queue up as many as 64 load requests at a time, provided that their `queueRequest` parameters are all set to `"TRUE"`. Queued jobs run in first-in-first-out (FIFO) order. If you don't want a load job to be queued, you can set its `queueRequest` parameter to `"FALSE"` (the default), in which case the load job fails if another one is already in progress.

You can use the `dependencies` parameter to queue up a job that must only be run after specified previous jobs in the queue have completed successfully. If you do that and any of those specified jobs fails, your job will not be run and its status will be set to `LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED`.
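
As a sketch of how `queueRequest` and `dependencies` fit together, the following Python snippet builds the request bodies for a dependent job chain. The bucket names, role ARN, and load IDs are placeholders, and `loader_request` is an illustrative helper, not part of any Neptune API:

```python
# A sketch of the request bodies for a queued, dependent load chain.
# The bucket names, role ARN, and load IDs below are placeholders.

def loader_request(source, data_format, iam_role_arn, region,
                   queue_request=True, dependencies=None):
    """Build the JSON body for a POST to the /loader endpoint."""
    body = {
        "source": source,
        "format": data_format,
        "iamRoleArn": iam_role_arn,
        "region": region,
        "queueRequest": "TRUE" if queue_request else "FALSE",
    }
    if dependencies:
        # Run this job only after the listed load IDs complete successfully.
        body["dependencies"] = dependencies
    return body

# Jobs A and B are independent; job C must wait for both.
role = "arn:aws:iam::123456789012:role/NeptuneLoadFromS3"
job_a = loader_request("s3://example-bucket/a/", "csv", role, "us-east-1")
job_b = loader_request("s3://example-bucket/b/", "csv", role, "us-east-1")
job_c = loader_request("s3://example-bucket/c/", "csv", role, "us-east-1",
                       dependencies=["job_A_load_id", "job_B_load_id"])
```

If either of the first two jobs fails, the third is not run and its status becomes `LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED`.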

## Neptune Loader Request Syntax
Request Syntax

```
{
  "source" : "string",
  "format" : "string",
  "iamRoleArn" : "string",
  "mode": "NEW|RESUME|AUTO",
  "region" : "us-east-1",
  "failOnError" : "string",
  "parallelism" : "string",
  "parserConfiguration" : {
    "baseUri" : "http://base-uri-string",
    "namedGraphUri" : "http://named-graph-string"
  },
  "updateSingleCardinalityProperties" : "string",
  "queueRequest" : "TRUE",
  "dependencies" : ["load_A_id", "load_B_id"]
}
```

**edgeOnlyLoad Syntax**  
For an `edgeOnlyLoad`, the syntax is:

```
{
  "source" : "string",
  "format" : "string",
  "iamRoleArn" : "string",
  "mode": "NEW|RESUME|AUTO",
  "region" : "us-east-1",
  "failOnError" : "string",
  "parallelism" : "string",
  "edgeOnlyLoad" : "string",
  "parserConfiguration" : {
    "baseUri" : "http://base-uri-string",
    "namedGraphUri" : "http://named-graph-string"
  },
  "updateSingleCardinalityProperties" : "string",
  "queueRequest" : "TRUE",
  "dependencies" : ["load_A_id", "load_B_id"]
}
```

## Neptune Loader Request Parameters
Request Parameters
+ **`source`**   –   An Amazon S3 URI.

  The `SOURCE` parameter accepts an Amazon S3 URI that identifies a single file, multiple files, a folder, or multiple folders. Neptune loads every data file in any folder that is specified.

  The URI can be in any of the following formats.
  + `s3://bucket_name/object-key-name`
  + `https://s3.amazonaws.com/bucket_name/object-key-name`
  + `https://s3.us-east-1.amazonaws.com/bucket_name/object-key-name`

  The `object-key-name` element of the URI is equivalent to the [prefix](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html#API_ListObjects_RequestParameters) parameter in an Amazon S3 [ListObjects](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html) API call. It identifies all the objects in the specified Amazon S3 bucket whose names begin with that prefix. That can be a single file or folder, or multiple files and/or folders.

  The specified folder or folders can contain multiple vertex files and multiple edge files.

  For example, suppose an Amazon S3 bucket named `bucket-name` contains the following folder structure and files:

  ```
  s3://bucket-name/a/bc
  s3://bucket-name/ab/c
  s3://bucket-name/ade
  s3://bucket-name/bcd
  ```

  If the source parameter is specified as `s3://bucket-name/a`, the first three files are loaded:

  ```
  s3://bucket-name/a/bc
  s3://bucket-name/ab/c
  s3://bucket-name/ade
  ```
+ **`format`**   –   The format of the data. For more information about data formats for the Neptune `Loader` command, see [Load Data Formats](bulk-load-tutorial-format.md).

**Allowed values**
  + **`csv`** for the [Gremlin CSV data format](bulk-load-tutorial-format-gremlin.md).
  + **`opencypher`** for the [openCypher CSV data format](bulk-load-tutorial-format-opencypher.md).
  + **`ntriples`** for the [N-Triples RDF data format](https://www.w3.org/TR/n-triples/).
  + **`nquads`** for the [N-Quads RDF data format](https://www.w3.org/TR/n-quads/).
  + **`rdfxml`** for the [RDF/XML RDF data format](https://www.w3.org/TR/rdf-syntax-grammar/).
  + **`turtle`** for the [Turtle RDF data format](https://www.w3.org/TR/turtle/).
+ **`iamRoleArn`**   –   The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see [Prerequisites: IAM Role and Amazon S3 Access](bulk-load-tutorial-IAM.md).

  Starting with [engine release 1.2.1.0.R3](engine-releases-1.2.1.0.R3.md), you can also chain multiple IAM roles if the Neptune DB instance and the Amazon S3 bucket are located in different AWS Accounts. In this case, `iamRoleArn` contains a comma-separated list of role ARNs, as described in [Chaining IAM roles in Amazon Neptune](bulk-load-tutorial-chain-roles.md). For example:

------
#### [ AWS CLI ]

  ```
  aws neptunedata start-loader-job \
    --endpoint-url https://your-neptune-endpoint:port \
    --source "s3://(the target bucket name)/(the target data file name)" \
    --format "csv" \
    --iam-role-arn "arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)" \
    --s3-bucket-region "us-east-1"
  ```

  For more information, see [start-loader-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-loader-job.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

  ```
  import boto3
  from botocore.config import Config
  
  client = boto3.client(
      'neptunedata',
      endpoint_url='https://your-neptune-endpoint:port',
      config=Config(read_timeout=None, retries={'total_max_attempts': 1})
  )
  
  response = client.start_loader_job(
      source='s3://(the target bucket name)/(the target data file name)',
      format='csv',
      iamRoleArn='arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)',
      s3BucketRegion='us-east-1'
  )
  
  print(response)
  ```

------
#### [ awscurl ]

  ```
  awscurl https://your-neptune-endpoint:port/loader \
    --region us-east-1 \
    --service neptune-db \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
          "source" : "s3://(the target bucket name)/(the target data file name)",
          "iamRoleArn" : "arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)",
          "format" : "csv",
          "region" : "us-east-1"
        }'
  ```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

  ```
  curl -X POST https://your-neptune-endpoint:port/loader \
    -H 'Content-Type: application/json' \
    -d '{
          "source" : "s3://(the target bucket name)/(the target data file name)",
          "iamRoleArn" : "arn:aws:iam::(Account A ID):role/(RoleA),arn:aws:iam::(Account B ID):role/(RoleB),arn:aws:iam::(Account C ID):role/(RoleC)",
          "format" : "csv",
          "region" : "us-east-1"
        }'
  ```

------
+ **`region`**   –   The `region` parameter must match the AWS Region of the cluster and the S3 bucket.

  Amazon Neptune is available in the following Regions:
  + US East (N. Virginia):   `us-east-1`
  + US East (Ohio):   `us-east-2`
  + US West (N. California):   `us-west-1`
  + US West (Oregon):   `us-west-2`
  + Canada (Central):   `ca-central-1`
  + Canada West (Calgary):   `ca-west-1`
  + South America (São Paulo):   `sa-east-1`
  + Europe (Stockholm):   `eu-north-1`
  + Europe (Spain):   `eu-south-2`
  + Europe (Ireland):   `eu-west-1`
  + Europe (London):   `eu-west-2`
  + Europe (Paris):   `eu-west-3`
  + Europe (Frankfurt):   `eu-central-1`
  + Middle East (Bahrain):   `me-south-1`
  + Middle East (UAE):   `me-central-1`
  + Israel (Tel Aviv):   `il-central-1`
  + Africa (Cape Town):   `af-south-1`
  + Asia Pacific (Hong Kong):   `ap-east-1`
  + Asia Pacific (Tokyo):   `ap-northeast-1`
  + Asia Pacific (Seoul):   `ap-northeast-2`
  + Asia Pacific (Osaka):   `ap-northeast-3`
  + Asia Pacific (Singapore):   `ap-southeast-1`
  + Asia Pacific (Sydney):   `ap-southeast-2`
  + Asia Pacific (Jakarta):   `ap-southeast-3`
  + Asia Pacific (Melbourne):   `ap-southeast-4`
  + Asia Pacific (Malaysia):   `ap-southeast-5`
  + Asia Pacific (Mumbai):   `ap-south-1`
  + Asia Pacific (Hyderabad):   `ap-south-2`
  + China (Beijing):   `cn-north-1`
  + China (Ningxia):   `cn-northwest-1`
  + AWS GovCloud (US-West):   `us-gov-west-1`
  + AWS GovCloud (US-East):   `us-gov-east-1`
+ **`mode`**   –   The load job mode.

  *Allowed values*: `RESUME`, `NEW`, `AUTO`.

  *Default value*: `AUTO`

****
  + `RESUME`   –   In RESUME mode, the loader looks for a previous load from this source, and if it finds one, resumes that load job. If no previous load job is found, the loader stops.

    The loader avoids reloading files that were successfully loaded in a previous job. It only tries to process failed files. If you dropped previously loaded data from your Neptune cluster, that data is not reloaded in this mode. If a previous load job loaded all files from the same source successfully, nothing is reloaded, and the loader returns success.
  + `NEW`   –   In NEW mode, the loader creates a new load request regardless of any previous loads. You can use this mode to reload all the data from a source after dropping previously loaded data from your Neptune cluster, or to load new data available at the same source.
  + `AUTO`   –   In AUTO mode, the loader looks for a previous load job from the same source, and if it finds one, resumes that job, just as in `RESUME` mode.

    If the loader doesn't find a previous load job from the same source, it loads all data from the source, just as in `NEW` mode.
+  **`edgeOnlyLoad`**   –   A flag that controls file processing order during bulk loading. 

  *Allowed values*: `"TRUE"`, `"FALSE"`.

  *Default value*: `"FALSE"`.

  When this parameter is set to `"FALSE"`, the loader first scans all the files to determine their contents (vertices or edges), and then loads all vertex files before any edge files. When it is set to `"TRUE"`, the loader skips the initial scanning phase and immediately loads all files in the order they appear. For more information, see [Optimizing an Amazon Neptune bulk load](bulk-load-optimize.md).
+ **`failOnError`**   –   A flag to toggle a complete stop on an error.

  *Allowed values*: `"TRUE"`, `"FALSE"`.

  *Default value*: `"TRUE"`.

  When this parameter is set to `"FALSE"`, the loader tries to load all the data in the location specified, skipping any entries with errors.

  When this parameter is set to `"TRUE"`, the loader stops as soon as it encounters an error. Data loaded up to that point persists.
+ **`parallelism`**   –   This is an optional parameter that can be set to reduce the number of threads used by the bulk load process.

  *Allowed values*:
  + `LOW` –   The number of threads used is the number of available vCPUs divided by 8.
  + `MEDIUM` –   The number of threads used is the number of available vCPUs divided by 2.
  + `HIGH` –   The number of threads used is the same as the number of available vCPUs.
  + `OVERSUBSCRIBE` –   The number of threads used is the number of available vCPUs multiplied by 2. If this value is used, the bulk loader takes up all available resources.

    This does not mean, however, that the `OVERSUBSCRIBE` setting results in 100% CPU utilization. Because the load operation is I/O bound, the highest CPU utilization to expect is in the 60% to 70% range.

  *Default value*: `HIGH`

  The `parallelism` setting can sometimes result in a deadlock between threads when loading openCypher data. When this happens, Neptune returns the `LOAD_DATA_DEADLOCK` error. You can generally fix the issue by setting `parallelism` to a lower setting and retrying the load command.
+ **`parserConfiguration`**   –   An optional object with additional parser configuration values. Each of the child parameters is also optional:    
  + `namedGraphUri`   –   The default graph for all RDF formats when no graph is specified (for non-quads formats, and for N-Quads entries with no graph).
  + `baseUri`   –   The base URI for the RDF/XML and Turtle formats.
  + `allowEmptyStrings`   –   (Gremlin only) When set to `true`, empty string values (`""`) in CSV data are loaded as empty strings; when set to `false` (the default), they are treated as nulls and are not loaded.

  For more information, see [SPARQL Default Graph and Named Graphs](feature-sparql-compliance.md#sparql-default-graph).
+ **`updateSingleCardinalityProperties`**   –   This is an optional parameter that controls how the bulk loader treats a new value for single-cardinality vertex or edge properties. This is not supported for loading openCypher data (see [Loading openCypher data](#load-api-reference-load-parameters-opencypher)).

  *Allowed values*: `"TRUE"`, `"FALSE"`.

  *Default value*: `"FALSE"`.

  By default, or when `updateSingleCardinalityProperties` is explicitly set to `"FALSE"`, the loader treats a new value as an error, because it violates single cardinality.

  When `updateSingleCardinalityProperties` is set to `"TRUE"`, on the other hand, the bulk loader replaces the existing value with the new one. If multiple edge or single-cardinality vertex property values are provided in the source file(s) being loaded, the final value at the end of the bulk load could be any one of those new values. The loader only guarantees that the existing value has been replaced by one of the new ones.
+ **`queueRequest`**   –   This is an optional flag parameter that indicates whether the load request can be queued up or not. 

  You don't have to wait for one load job to complete before issuing the next one, because Neptune can queue up as many as 64 jobs at a time, provided that their `queueRequest` parameters are all set to `"TRUE"`. The queue order of the jobs will be first-in-first-out (FIFO). 

  If the `queueRequest` parameter is omitted or set to `"FALSE"`, the load request will fail if another load job is already running.

  *Allowed values*: `"TRUE"`, `"FALSE"`.

  *Default value*: `"FALSE"`.
+ **`dependencies`**   –   This is an optional parameter that can make a queued load request contingent on the successful completion of one or more previous jobs in the queue.

  Neptune can queue up as many as 64 load requests at a time, if their `queueRequest` parameters are set to `"TRUE"`. The `dependencies` parameter lets you make execution of such a queued request dependent on the successful completion of one or more specified previous requests in the queue.

  For example, if load `Job-A` and `Job-B` are independent of each other, but load `Job-C` needs `Job-A` and `Job-B` to be finished before it begins, proceed as follows:

  1. Submit `load-job-A` and `load-job-B` one after another in any order, and save their load-ids.

  1. Submit `load-job-C` with the load-ids of the two jobs in its `dependencies` field:

  ```
    "dependencies" : ["job_A_load_id", "job_B_load_id"]
  ```

  Because of the `dependencies` parameter, the bulk loader will not start `Job-C` until `Job-A` and `Job-B` have completed successfully. If either one of them fails, Job-C will not be executed, and its status will be set to `LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED`.

  You can set up multiple levels of dependency in this way, so that the failure of one job will cause all requests that are directly or indirectly dependent on it to be cancelled.
+ **`userProvidedEdgeIds`**   –   This parameter is required only when loading openCypher data that contains relationship IDs. It must be included and set to `True` when openCypher relationship IDs are explicitly provided in the load data (recommended).

  When `userProvidedEdgeIds` is absent or set to `True`, an `:ID` column must be present in every relationship file in the load.

  When `userProvidedEdgeIds` is present and set to `False`, relationship files in the load **must not** contain an `:ID` column. Instead, the Neptune loader automatically generates an ID for each relationship.

  It's useful to provide relationship IDs explicitly so that the loader can resume loading after errors in the CSV data have been fixed, without having to reload any relationships that were already loaded. If relationship IDs have not been explicitly assigned, the loader cannot resume a failed load if any relationship file has had to be corrected, and must instead reload all the relationships.
+ `accessKey`   –   **[deprecated]** An access key ID of an IAM role with access to the S3 bucket and data files.

  The `iamRoleArn` parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see [Prerequisites: IAM Role and Amazon S3 Access](bulk-load-tutorial-IAM.md).

  For more information, see [Access keys (access key ID and secret access key)](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys).
+ `secretKey`   –   **[deprecated]** The `iamRoleArn` parameter is recommended instead. For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see [Prerequisites: IAM Role and Amazon S3 Access](bulk-load-tutorial-IAM.md).

  For more information, see [Access keys (access key ID and secret access key)](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys).
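
The prefix matching performed by the `source` parameter (described above) can be sketched in a few lines of Python. The keys mirror the `bucket-name` example earlier in this section, and `keys_for_source` is a hypothetical helper, not part of any Neptune API:

```python
# A sketch of the loader's prefix matching for the source parameter.
# The keys mirror the bucket-name example in the source description.
keys = ["a/bc", "ab/c", "ade", "bcd"]

def keys_for_source(prefix, all_keys):
    """Return every object key a given source prefix would match."""
    return [key for key in all_keys if key.startswith(prefix)]

# source = s3://bucket-name/a matches the first three objects only.
matched = keys_for_source("a", keys)
```

This is the same semantics as the `prefix` parameter of the Amazon S3 `ListObjects` API: a plain string prefix on the object key, not a path or glob.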

### Special considerations for loading openCypher data
Loading openCypher data
+ When loading openCypher data in CSV format, the format parameter must be set to `opencypher`.
+ The `updateSingleCardinalityProperties` parameter is not supported for openCypher loads because all openCypher properties have single cardinality. The openCypher load format does not support arrays, and if an ID value appears more than once, it is treated as a duplicate or an insertion error (see below).
+ The Neptune loader handles duplicates that it encounters in openCypher data as follows:
  + If the loader encounters multiple rows with the same node ID, they are merged using the following rule:
    + All the labels in the rows are added to the node.
    + For each property, only one of the property values is loaded. The selection of the one to load is non-deterministic.
  + If the loader encounters multiple rows with the same relationship ID, only one of them is loaded. The selection of the one to load is non-deterministic.
  + The loader never updates property values of an existing node or relationship in the database if it encounters load data having the ID of the existing node or relationship. However, it does load node labels and properties that are not present in the existing node or relationship. 
+ Although you don't have to assign IDs to relationships, it is usually a good idea (see the `userProvidedEdgeIds` parameter above). Without explicit relationship IDs, the loader must reload all relationships in case of an error in a relationship file, rather than resuming the load from where it failed.

  Also, if the load data doesn't contain explicit relationship IDs, the loader has no way of detecting duplicate relationships.
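
As an illustrative model only (not Neptune source code), the merge rule for duplicate node rows can be sketched like this: labels are unioned, and for each property a single value is kept:

```python
# An illustrative model (not Neptune source code) of the node-row
# merge rule: labels are unioned; one value per property survives.

def merge_node_rows(rows):
    """Merge CSV rows that share the same node ID."""
    merged = {"labels": set(), "properties": {}}
    for row in rows:
        merged["labels"].update(row.get("labels", []))
        for name, value in row.get("properties", {}).items():
            # Any one of the conflicting values may win; the loader's
            # choice is non-deterministic. Here the last row seen wins.
            merged["properties"][name] = value
    return merged

rows = [
    {"labels": ["Person"], "properties": {"name": "Ann", "age": 40}},
    {"labels": ["Employee"], "properties": {"name": "Ann B."}},
]
merged = merge_node_rows(rows)
```

In this sketch the merged node carries both labels, keeps `age` from the first row, and ends up with one of the two `name` values.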

Here is an example of an openCypher load command:

------
#### [ AWS CLI ]

```
aws neptunedata start-loader-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --source "s3://bucket-name/object-key-name" \
  --format "opencypher" \
  --user-provided-edge-ids \
  --iam-role-arn "arn:aws:iam::account-id:role/role-name" \
  --s3-bucket-region "region" \
  --no-fail-on-error \
  --parallelism "MEDIUM"
```

For more information, see [start-loader-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-loader-job.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_loader_job(
    source='s3://bucket-name/object-key-name',
    format='opencypher',
    userProvidedEdgeIds=True,
    iamRoleArn='arn:aws:iam::account-id:role/role-name',
    s3BucketRegion='region',
    failOnError=False,
    parallelism='MEDIUM'
)

print(response)
```

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/loader \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "source" : "s3://bucket-name/object-key-name",
        "format" : "opencypher",
        "userProvidedEdgeIds": "TRUE",
        "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
        "region" : "region",
        "failOnError" : "FALSE",
        "parallelism" : "MEDIUM"
      }'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/loader \
  -H 'Content-Type: application/json' \
  -d '{
        "source" : "s3://bucket-name/object-key-name",
        "format" : "opencypher",
        "userProvidedEdgeIds": "TRUE",
        "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
        "region" : "region",
        "failOnError" : "FALSE",
        "parallelism" : "MEDIUM"
      }'
```

------

The loader response is the same as normal. For example:

```
{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "guid_as_string"
  }
}
```

## Neptune Loader Response Syntax
Response Syntax

```
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "guid_as_string"
    }
}
```

**200 OK**  
A successfully started load job returns a `200` code.

# Neptune Loader Errors
Errors

When an error occurs, a JSON object is returned in the `BODY` of the response. The `message` object contains a description of the error.

**Error Categories**
+ `Error 400`   –   Syntax errors return an HTTP `400` bad request error. The message describes the error.
+ `Error 500`   –   A valid request that cannot be processed returns an HTTP `500` internal server error. The message describes the error.

The following are possible error messages from the loader with a description of the error.

**Loader Error Messages**
+ `Couldn't find the AWS credential for iam_role_arn`  (HTTP 400)

  The credentials were not found. Verify the supplied credentials against the IAM console or AWS CLI output. Make sure that you have added the IAM role specified in `iamRoleArn` to the cluster.
+ `S3 bucket not found for source`  (HTTP 400)

  The S3 bucket does not exist. Check the name of the bucket.
+ `The source source-uri does not exist/not reachable`  (HTTP 400)

  No matching files were found in the S3 bucket.
+ `Unable to connect to S3 endpoint. Provided source = source-uri and region = aws-region`  (HTTP 500)

  Unable to connect to Amazon S3. Region must match the cluster Region. Ensure that you have a VPC endpoint. For information about creating a VPC endpoint, see [Creating an Amazon S3 VPC Endpoint](bulk-load-data.md#bulk-load-prereqs-s3).
+ `Bucket is not in provided Region (aws-region)`  (HTTP 400)

  The bucket must be in the same AWS Region as your Neptune DB instance.
+ `Unable to perform S3 list operation`  (HTTP 400)

  The IAM user or role provided does not have `List` permissions on the bucket or the folder. Check the policy or the access control list (ACL) on the bucket.
+ `Start new load operation not permitted on a read replica instance`  (HTTP 405)

  Loading is a write operation. Retry load on the read/write cluster endpoint.
+ `Failed to start load because of unknown error from S3`  (HTTP 500)

  Amazon S3 returned an unknown error. Contact [AWS Support](https://aws.amazon.com/premiumsupport/).
+ `Invalid S3 access key`  (HTTP 400)

  Access key is invalid. Check the provided credentials.
+ `Invalid S3 secret key`  (HTTP 400)

  Secret key is invalid. Check the provided credentials.
+ `Max concurrent load limit breached`  (HTTP 400)

  If a load request is submitted without `"queueRequest" : "TRUE"`, and a load job is currently running, the request will fail with this error.
+ `Failed to start new load for the source "source name". Max load task queue size limit breached. Limit is 64`  (HTTP 400)

  Neptune supports queuing up as many as 64 loader jobs at a time. If an additional load request is submitted to the queue when it already contains 64 jobs, the request fails with this message.

# Neptune Loader Examples
Examples

This example shows how to use the Neptune loader to load data in the Gremlin CSV format. The request is sent as an HTTP POST to the Neptune loader endpoint, with a body that specifies the data source, format, IAM role, and other configuration options. The response includes a load ID that you can use to track the progress of the load.

**Example Request**  
The following request loads a file in the Neptune CSV format. For more information, see [Gremlin load data format](bulk-load-tutorial-format-gremlin.md).  

------
#### [ AWS CLI ]

```
aws neptunedata start-loader-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --source "s3://bucket-name/object-key-name" \
  --format "csv" \
  --iam-role-arn "ARN for the IAM role you are using" \
  --s3-bucket-region "region" \
  --no-fail-on-error \
  --parallelism "MEDIUM" \
  --no-update-single-cardinality-properties \
  --no-queue-request
```
For more information, see [start-loader-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-loader-job.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_loader_job(
    source='s3://bucket-name/object-key-name',
    format='csv',
    iamRoleArn='ARN for the IAM role you are using',
    s3BucketRegion='region',
    failOnError=False,
    parallelism='MEDIUM',
    updateSingleCardinalityProperties=False,
    queueRequest=False
)

print(response)
```

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/loader \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "source" : "s3://bucket-name/object-key-name",
        "format" : "csv",
        "iamRoleArn" : "ARN for the IAM role you are using",
        "region" : "region",
        "failOnError" : "FALSE",
        "parallelism" : "MEDIUM",
        "updateSingleCardinalityProperties" : "FALSE",
        "queueRequest" : "FALSE"
      }'
```
**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/loader \
  -H 'Content-Type: application/json' \
  -d '{
        "source" : "s3://bucket-name/object-key-name",
        "format" : "csv",
        "iamRoleArn" : "ARN for the IAM role you are using",
        "region" : "region",
        "failOnError" : "FALSE",
        "parallelism" : "MEDIUM",
        "updateSingleCardinalityProperties" : "FALSE",
        "queueRequest" : "FALSE"
      }'
```

------

**Example Response**  

```
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
    }
}
```
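
A client typically captures the `loadId` from this response for later Get-Status calls. Here is a minimal sketch, assuming the response body has already been read as text:

```python
import json

# A minimal sketch: capture the loadId from the loader response
# shown above, for use in later Get-Status requests.
response_text = """
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
    }
}
"""
load_id = json.loads(response_text)["payload"]["loadId"]
# Poll https://your-neptune-endpoint:port/loader?loadId=<load_id> for status.
```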

# Neptune Loader Get-Status API
Get-Status API

Gets the status of a `loader` job.

To get load status, you must send an HTTP `GET` request to the `https://your-neptune-endpoint:port/loader` endpoint. To get the status for a particular load request, you must include the `loadId` as a URL parameter, or append the `loadId` to the URL path.

Neptune only keeps track of the most recent 1,024 bulk load jobs, and only stores the last 10,000 error details per job. 

See [Neptune Loader Error and Feed Messages](loader-message.md) for a list of the error and feed messages returned by the loader in case of errors.

**Contents**
+ [

# Neptune Loader Get-Status requests
](load-api-reference-status-requests.md)
  + [

## Loader Get-Status request syntax
](load-api-reference-status-requests.md#load-api-reference-status-request-syntax)
  + [

## Neptune Loader Get-Status request parameters
](load-api-reference-status-requests.md#load-api-reference-status-parameters)
+ [

# Neptune Loader Get-Status Responses
](load-api-reference-status-response.md)
  + [

## Neptune Loader Get-Status Response JSON layout
](load-api-reference-status-response.md#load-api-reference-status-response-layout)
  + [

## Neptune Loader Get-Status `overallStatus` and `failedFeeds` response objects
](load-api-reference-status-response.md#load-api-reference-status-response-objects)
  + [

## Neptune Loader Get-Status `errors` response object
](load-api-reference-status-response.md#load-api-reference-status-errors)
  + [

## Neptune Loader Get-Status `errorLogs` response object
](load-api-reference-status-response.md#load-api-reference-error-logs)
+ [

# Neptune Loader Get-Status Examples
](load-api-reference-status-examples.md)
  + [

## Example request for load status
](load-api-reference-status-examples.md#load-api-reference-status-examples-status-request)
  + [

## Example request for loadIds
](load-api-reference-status-examples.md#load-api-reference-status-examples-loadId-request)
  + [

## Example request for detailed status
](load-api-reference-status-examples.md#load-api-reference-status-examples-details-request)
+ [

# Neptune Loader Get-Status `errorLogs` examples
](load-api-reference-error-logs-examples.md)
  + [

## Example detailed status response when errors occurred
](load-api-reference-error-logs-examples.md#load-api-reference-status-examples-details-request-errors)
  + [

## Example of a `Data prefetch task interrupted` error
](load-api-reference-error-logs-examples.md#load-api-reference-status-examples-task-interrupted)

# Neptune Loader Get-Status requests
Requests

## Loader Get-Status request syntax
Syntax

```
GET https://your-neptune-endpoint:port/loader?loadId=loadId
```

```
GET https://your-neptune-endpoint:port/loader/loadId
```

```
GET https://your-neptune-endpoint:port/loader
```

## Neptune Loader Get-Status request parameters
Request parameters
+ **`loadId`**   –   The ID of the load job. If you do not specify a `loadId`, a list of load IDs is returned.
+ **`details`**   –   Include details beyond overall status.

  *Allowed values*: `TRUE`, `FALSE`.

  *Default value*: `FALSE`.
+ **`errors`**   –   Include the list of errors.

  *Allowed values*: `TRUE`, `FALSE`.

  *Default value*: `FALSE`.

  The list of errors is paged. The `page` and `errorsPerPage` parameters allow you to page through all the errors.
+ **`page`**   –   The error page number. Only valid with the `errors` parameter set to `TRUE`.

  *Allowed values*: Positive integers.

  *Default value*: 1.
+ **`errorsPerPage`**   –   The number of errors returned per page. Only valid with the `errors` parameter set to `TRUE`.

  *Allowed values*: Positive integers.

  *Default value*: 10.
+ **`limit`**   –   The number of load IDs to list. Only valid when requesting a list of load IDs by sending a `GET` request with no `loadId` specified.

  *Allowed values*: Positive integers from 1 through 100.

  *Default value*: 100.
+ **`includeQueuedLoads`**   –   An optional parameter that can be used to exclude the load IDs of queued load requests when a list of load IDs is requested.

  By default, the load IDs of all load jobs with status `LOAD_IN_QUEUE` are included in such a list. They appear before the load IDs of other jobs, sorted by the time they were added to the queue from most recent to earliest.

  *Allowed values*: `TRUE`, `FALSE`.

  *Default value*: `TRUE`.
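The defaults and allowed values above can be captured in a small client-side helper. This is an illustrative sketch only; the loader itself validates these parameters server-side, and the helper name is hypothetical.

```python
# Sketch: apply the documented Get-Status parameter defaults and reject
# values outside the documented ranges before sending a request.
DEFAULTS = {
    "details": "FALSE",
    "errors": "FALSE",
    "page": 1,
    "errorsPerPage": 10,
    "limit": 100,
    "includeQueuedLoads": "TRUE",
}

def status_params(**overrides):
    """Merge overrides into the documented defaults, rejecting bad values."""
    params = dict(DEFAULTS, **overrides)
    for flag in ("details", "errors", "includeQueuedLoads"):
        if params[flag] not in ("TRUE", "FALSE"):
            raise ValueError(f"{flag} must be TRUE or FALSE")
    if params["page"] < 1 or params["errorsPerPage"] < 1:
        raise ValueError("page and errorsPerPage must be positive")
    if not 1 <= params["limit"] <= 100:
        raise ValueError("limit must be between 1 and 100")
    # page and errorsPerPage only apply when errors=TRUE.
    if params["errors"] == "FALSE" and ("page" in overrides
                                        or "errorsPerPage" in overrides):
        raise ValueError("page/errorsPerPage are only valid when errors=TRUE")
    return params
```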

# Neptune Loader Get-Status Responses
Responses

The following sections describe the overall structure of a Get-Status response, the fields it contains and their data types, and the error-handling and error-log details.

## Neptune Loader Get-Status Response JSON layout
JSON layout

The general layout of a loader status response is as follows:

```
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_FAILED" : number
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://bucket/key",
            "runNumber" : number,
            "retryNumber" : number,
            "status" : "string",
            "totalTimeSpent" : number,
            "startTime" : number,
            "totalRecords" : number,
            "totalDuplicates" : number,
            "parsingErrors" : number,
            "datatypeMismatchErrors" : number,
            "insertErrors" : number,
        },
        "failedFeeds" : [
            {
                "fullUri" : "s3://bucket/key",
                "runNumber" : number,
                "retryNumber" : number,
                "status" : "string",
                "totalTimeSpent" : number,
                "startTime" : number,
                "totalRecords" : number,
                "totalDuplicates" : number,
                "parsingErrors" : number,
                "datatypeMismatchErrors" : number,
                "insertErrors" : number,
            }
        ],
        "errors" : {
            "startIndex" : number,
            "endIndex" : number,
            "loadId" : "string",
            "errorLogs" : [ ]
        }
    }
}
```
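For example, a response parsed from this layout can be reduced to a few headline numbers with a small helper like the following. This is an illustrative sketch; the field names come from the layout above, and the helper name is hypothetical.

```python
# Sketch: summarize a parsed Get-Status response (the JSON layout above).
def summarize_status(response):
    """Return the status, record counts, and error totals for a load."""
    payload = response["payload"]
    overall = payload["overallStatus"]
    return {
        "status": overall["status"],
        "records": overall["totalRecords"],
        "duplicates": overall["totalDuplicates"],
        # The three error counters reported per load or feed.
        "errors": (overall["parsingErrors"]
                   + overall["datatypeMismatchErrors"]
                   + overall["insertErrors"]),
        "failedFeeds": len(payload.get("failedFeeds", [])),
    }
```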

## Neptune Loader Get-Status `overallStatus` and `failedFeeds` response objects
overallStatus objects

The fields returned for each failed feed in the `failedFeeds` array, including the error descriptions, are the same as those in the `overallStatus` object of a `Get-Status` response.

The following fields appear in the `overallStatus` object for all loads, and the `failedFeeds` object for each failed feed:
+ **`fullUri`**   –   The URI of the file or files to be loaded.

  *Type:* *string*

  *Format*: `s3://bucket/key`.
+ **`runNumber`**   –   The run number of this load or feed. This is incremented when the load is restarted.

  *Type:* *unsigned long*.
+ **`retryNumber`**   –   The retry number of this load or feed. This is incremented when the loader automatically retries a feed or load.

  *Type:* *unsigned long*.
+ **`status`**   –   The returned status of the load or feed. `LOAD_COMPLETED` indicates a successful load with no problems. For a list of other load-status messages, see [Neptune Loader Error and Feed Messages](loader-message.md).

  *Type:* *string*.
+ **`totalTimeSpent`**   –   The time, in seconds, spent to parse and insert data for the load or feed. This does not include the time spent fetching the list of source files.

  *Type:* *unsigned long*.
+ **`startTime`**   –   The time, in Unix epoch seconds, at which the load or feed started.

  *Type:* *unsigned long*.
+ **`totalRecords`**   –   Total records loaded or attempted to load.

  *Type:* *unsigned long*.

  Note that when loading from a CSV file, the record count does not refer to the number of lines loaded, but rather to the number of individual records in those lines. For example, take a tiny CSV file like this:

  ```
  ~id,~label,name,team
  P-1,Player,Stokes,England
  ```

  Neptune would consider this file to contain 3 records, namely:

  ```
  P-1  label Player
  P-1  name  Stokes
  P-1  team  England
  ```
+ **`totalDuplicates`**   –   The number of duplicate records encountered.

  *Type:* *unsigned long*.

  As in the case of the `totalRecords` count, this value contains the number of individual duplicate records in a CSV file, not the number of duplicate lines. Take this small CSV file, for example:

  ```
  ~id,~label,name,team
  P-2,Player,Kohli,India
  P-2,Player,Kohli,India
  ```

  The status returned after loading it would look like this, reporting 6 total records, of which 3 are duplicates:

  ```
  {
    "status": "200 OK",
    "payload": {
      "feedCount": [
        {
          "LOAD_COMPLETED": 1
        }
      ],
      "overallStatus": {
        "fullUri": "(the URI of the CSV file)",
        "runNumber": 1,
        "retryNumber": 0,
        "status": "LOAD_COMPLETED",
        "totalTimeSpent": 3,
        "startTime": 1662131463,
        "totalRecords": 6,
        "totalDuplicates": 3,
        "parsingErrors": 0,
        "datatypeMismatchErrors": 0,
        "insertErrors": 0
      }
    }
  }
  ```

  For openCypher loads, a duplicate is counted when:
  + The loader detects that a row in a node file has an ID without an ID space that is the same as another ID value without an ID space, either in another row or belonging to an existing node.
  + The loader detects that a row in a node file has an ID with an ID space that is the same as another ID value with ID space, either in another row or belonging to an existing node.

  See [Special considerations for loading openCypher data](load-api-reference-load.md#load-api-reference-load-parameters-opencypher).
+ **`parsingErrors`**   –   The number of parsing errors encountered.

  *Type:* *unsigned long*.
+ **`datatypeMismatchErrors`**   –   The number of records with a data type that did not match the given data.

  *Type:* *unsigned long*.
+ **`insertErrors`**   –   The number of records that could not be inserted due to errors.

  *Type:* *unsigned long*.
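The record and duplicate counting described for `totalRecords` and `totalDuplicates` can be approximated for a Gremlin node CSV as follows. This is an illustrative sketch, not the loader's implementation: it only detects duplicates within a single file, whereas the loader also counts duplicates of records already in the database, and the function names are hypothetical.

```python
# Sketch: count individual records in a Gremlin node CSV the way the
# loader status reports them: one per ~label value plus one per
# non-empty property column in each row (~id itself is not a record).
import csv
import io
from collections import Counter

def node_records(csv_text):
    """List (id, key, value) records contained in the CSV rows."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node_id = row["~id"]
        for column, value in row.items():
            if column != "~id" and value:
                records.append((node_id, column.lstrip("~"), value))
    return records

def record_counts(csv_text):
    """Return (totalRecords, totalDuplicates) for a single file."""
    counts = Counter(node_records(csv_text))
    total = sum(counts.values())
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return total, duplicates
```

Applied to the two-row Kohli file above, this yields 6 total records with 3 duplicates, matching the example status response.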

## Neptune Loader Get-Status `errors` response object
errors object

Errors fall into the following categories:
+ **`Error 400`**   –   An invalid `loadId` returns an HTTP `400` bad request error. The message describes the error.
+ **`Error 500`**   –   A valid request that cannot be processed returns an HTTP `500` internal server error. The message describes the error.

See [Neptune Loader Error and Feed Messages](loader-message.md) for a list of the error and feed messages returned by the loader in case of errors.

When an error occurs, a JSON `errors` object is returned in the `BODY` of the response, with the following fields:
+ **`startIndex`**   –   The index of the first included error.

  *Type:* *unsigned long*.
+ **`endIndex`**   –   The index of the last included error.

  *Type:* *unsigned long*.
+ **`loadId`**   –   The ID of the load. You can use this ID to print the errors for the load by setting the `errors` parameter to `TRUE`.

  *Type:* *string*.
+ **`errorLogs`**   –   A list of the errors.

  *Type:* *list*.
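As a sketch of how the paging fields fit together, the following hypothetical helper computes the 1-based `startIndex`/`endIndex` ranges that successive `page` requests with a given `errorsPerPage` would cover (the 1-based convention follows the example responses shown later, where page 1 with 3 errors per page reports `startIndex` 1 and `endIndex` 3).

```python
# Sketch: the (page, startIndex, endIndex) ranges produced by paging
# through an errors list with the page and errorsPerPage parameters.
def error_pages(total_errors, errors_per_page=10):
    """Yield (page, startIndex, endIndex) tuples covering every error."""
    page, start = 1, 1
    while start <= total_errors:
        end = min(start + errors_per_page - 1, total_errors)
        yield page, start, end
        page, start = page + 1, end + 1
```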

## Neptune Loader Get-Status `errorLogs` response object
errorLogs object

The `errorLogs` object under `errors` in the loader Get-Status response contains an object describing each error using the following fields:
+ **`errorCode`**   –   Identifies the nature of the error.

  It can take one of the following values:
  + `PARSING_ERROR`
  + `S3_ACCESS_DENIED_ERROR`
  + `FROM_OR_TO_VERTEX_ARE_MISSING`
  + `ID_ASSIGNED_TO_MULTIPLE_EDGES`
  + `SINGLE_CARDINALITY_VIOLATION`
  + `FILE_MODIFICATION_OR_DELETION_ERROR`
  + `OUT_OF_MEMORY_ERROR`
  + `INTERNAL_ERROR` (returned when the bulk loader cannot determine the type of the error).
+ **`errorMessage`**   –   A message describing the error.

  This can be a generic message associated with the error code or a specific message containing details, for example about a missing from/to vertex or about a parsing error.
+ **`fileName`**   –   The name of the feed.
+ **`recordNum`**   –   In the case of a parsing error, this is the record number in the file of the record that could not be parsed. It is set to zero if the record number is not applicable to the error, or if it could not be determined.

For example, the bulk loader would generate a parsing error if it encountered a faulty row such as the following in an RDF `nquads` file:

```
<http://base#subject> |http://base#predicate> <http://base#true> .
```

As you can see, the second `http` in the row above should be preceded by `<` rather than `|`. The resulting error object under `errorLogs` in a status response would look like this:

```
{
    "errorCode" : "PARSING_ERROR",
    "errorMessage" : "Expected '<', found: |",
    "fileName" : "s3://bucket/key",
    "recordNum" : 12345
},
```
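When a load produces many such entries, a first triage step might group them by `errorCode` and collect the files affected, for example (an illustrative sketch; the helper name is hypothetical):

```python
# Sketch: group errorLogs entries by errorCode and list affected files.
from collections import Counter

def triage(error_logs):
    """Return (count per errorCode, sorted list of affected file names)."""
    by_code = Counter(e["errorCode"] for e in error_logs)
    files = sorted({e["fileName"] for e in error_logs})
    return by_code, files
```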

# Neptune Loader Get-Status Examples
Examples

The following examples show how to use the Neptune loader Get-Status API to retrieve information about the status of your data loads into Neptune. They cover three main scenarios: retrieving the status of a specific load, listing the available load IDs, and requesting detailed status information for a specific load.

## Example request for load status
Example status request

The following examples send the status request using the AWS CLI, the AWS SDK for Python (Boto3), `awscurl`, and `curl`.

------
#### [ AWS CLI ]

```
aws neptunedata get-loader-job-status \
  --endpoint-url https://your-neptune-endpoint:port \
  --load-id loadId (a UUID)
```

For more information, see [get-loader-job-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-loader-job-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.get_loader_job_status(
    loadId='loadId (a UUID)'
)

print(response)
```

------
#### [ awscurl ]

```
awscurl 'https://your-neptune-endpoint:port/loader/loadId (a UUID)' \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl -X GET 'https://your-neptune-endpoint:port/loader/loadId (a UUID)'
```

------

**Example Response**  

```
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "datatypeMismatchErrors" : 0,
            "fullUri" : "s3://bucket/key",
            "insertErrors" : 0,
            "parsingErrors" : 5,
            "retryNumber" : 0,
            "runNumber" : 1,
            "status" : "LOAD_FAILED",
            "totalDuplicates" : 0,
            "totalRecords" : 5,
            "totalTimeSpent" : 3.0
        }
    }
}
```

## Example request for loadIds
Example loadIds request

The following examples send the request for a list of load IDs using the AWS CLI, the AWS SDK for Python (Boto3), `awscurl`, and `curl`.

------
#### [ AWS CLI ]

```
aws neptunedata list-loader-jobs \
  --endpoint-url https://your-neptune-endpoint:port \
  --limit 3
```

For more information, see [list-loader-jobs](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/list-loader-jobs.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.list_loader_jobs(
    limit=3
)

print(response)
```

------
#### [ awscurl ]

```
awscurl 'https://your-neptune-endpoint:port/loader?limit=3' \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl -X GET 'https://your-neptune-endpoint:port/loader?limit=3'
```

------

**Example Response**  

```
{
    "status" : "200 OK",
    "payload" : {
         "loadIds" : [
            "a2c0ce44-a44b-4517-8cd4-1dc144a8e5b5",
            "09683a01-6f37-4774-bb1b-5620d87f1931",
            "58085eb8-ceb4-4029-a3dc-3840969826b9"
        ]
    }
}
```

## Example request for detailed status
Example details request

The following examples send the detailed status request using the AWS CLI, the AWS SDK for Python (Boto3), `awscurl`, and `curl`.

------
#### [ AWS CLI ]

```
aws neptunedata get-loader-job-status \
  --endpoint-url https://your-neptune-endpoint:port \
  --load-id loadId (a UUID) \
  --details
```

For more information, see [get-loader-job-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-loader-job-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.get_loader_job_status(
    loadId='loadId (a UUID)',
    details=True
)

print(response)
```

------
#### [ awscurl ]

```
awscurl 'https://your-neptune-endpoint:port/loader/loadId (a UUID)?details=true' \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl -X GET 'https://your-neptune-endpoint:port/loader/loadId (a UUID)?details=true'
```

------

**Example Response**  

```
{
    "status" : "200 OK",
    "payload" : {
        "failedFeeds" : [
            {
                "datatypeMismatchErrors" : 0,
                "fullUri" : "s3://bucket/key",
                "insertErrors" : 0,
                "parsingErrors" : 5,
                "retryNumber" : 0,
                "runNumber" : 1,
                "status" : "LOAD_FAILED",
                "totalDuplicates" : 0,
                "totalRecords" : 5,
                "totalTimeSpent" : 3.0
            }
        ],
        "feedCount" : [
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "datatypeMismatchErrors" : 0,
            "fullUri" : "s3://bucket/key",
            "insertErrors" : 0,
            "parsingErrors" : 5,
            "retryNumber" : 0,
            "runNumber" : 1,
            "status" : "LOAD_FAILED",
            "totalDuplicates" : 0,
            "totalRecords" : 5,
            "totalTimeSpent" : 3.0
        }
    }
}
```

# Neptune Loader Get-Status `errorLogs` examples
errorLogs examples

The following examples show the detailed status response that the Neptune loader returns when errors occurred during the load, including information about failed feeds, overall status, and detailed error logs.

## Example detailed status response when errors occurred
Example details with errors

The following examples send the request using the AWS CLI, the AWS SDK for Python (Boto3), `awscurl`, and `curl`.

------
#### [ AWS CLI ]

```
aws neptunedata get-loader-job-status \
  --endpoint-url https://your-neptune-endpoint:port \
  --load-id 0a237328-afd5-4574-a0bc-c29ce5f54802 \
  --details \
  --errors \
  --errors-per-page 3 \
  --page 1
```

For more information, see [get-loader-job-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-loader-job-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.get_loader_job_status(
    loadId='0a237328-afd5-4574-a0bc-c29ce5f54802',
    details=True,
    errors=True,
    errorsPerPage=3,
    page=1
)

print(response)
```

------
#### [ awscurl ]

```
awscurl 'https://your-neptune-endpoint:port/loader/0a237328-afd5-4574-a0bc-c29ce5f54802?details=true&errors=true&page=1&errorsPerPage=3' \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl -X GET 'https://your-neptune-endpoint:port/loader/0a237328-afd5-4574-a0bc-c29ce5f54802?details=true&errors=true&page=1&errorsPerPage=3'
```

------

**Example of a detailed response when errors have occurred**  
This is an example of the response that you might get from the query above, with an `errorLogs` object listing the load errors encountered:  

```
{
    "status" : "200 OK",
    "payload" : {
        "failedFeeds" : [
            {
                "datatypeMismatchErrors" : 0,
                "fullUri" : "s3://bucket/key",
                "insertErrors" : 0,
                "parsingErrors" : 5,
                "retryNumber" : 0,
                "runNumber" : 1,
                "status" : "LOAD_FAILED",
                "totalDuplicates" : 0,
                "totalRecords" : 5,
                "totalTimeSpent" : 3.0
            }
        ],
        "feedCount" : [
            {
                "LOAD_FAILED" : 1
            }
        ],
        "overallStatus" : {
            "datatypeMismatchErrors" : 0,
            "fullUri" : "s3://bucket/key",
            "insertErrors" : 0,
            "parsingErrors" : 5,
            "retryNumber" : 0,
            "runNumber" : 1,
            "status" : "LOAD_FAILED",
            "totalDuplicates" : 0,
            "totalRecords" : 5,
            "totalTimeSpent" : 3.0
        },
        "errors" : {
            "endIndex" : 3,
            "errorLogs" : [
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "Expected '<', found: |",
                    "fileName" : "s3://bucket/key",
                    "recordNum" : 1
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "Expected '<', found: |",
                    "fileName" : "s3://bucket/key",
                    "recordNum" : 2
                },
                {
                    "errorCode" : "PARSING_ERROR",
                    "errorMessage" : "Expected '<', found: |",
                    "fileName" : "s3://bucket/key",
                    "recordNum" : 3
                }
            ],
            "loadId" : "0a237328-afd5-4574-a0bc-c29ce5f54802",
            "startIndex" : 1
        }
    }
}
```

## Example of a `Data prefetch task interrupted` error
Task-interrupted error

Occasionally when you get a `LOAD_FAILED` status and then request more detailed information, the error returned may be a `PARSING_ERROR` with a `Data prefetch task interrupted` message, like this:

```
"errorLogs" : [
    {
        "errorCode" : "PARSING_ERROR",
        "errorMessage" : "Data prefetch task interrupted: Data prefetch task for 11467 failed",
        "fileName" : "s3://amzn-s3-demo-bucket/some-source-file",
        "recordNum" : 0
    }
]
```

This error indicates a temporary interruption of the load process, typically not caused by your request or your data. You can usually resolve it simply by running the bulk load request again. If you are using the default settings, namely `"mode" : "AUTO"` and `"failOnError" : "TRUE"`, the loader skips the files that it already loaded successfully and resumes loading the files it had not yet loaded when the interruption occurred.
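A retry script might detect this transient case before deciding to resubmit, for example (an illustrative sketch; the message prefix is taken from the example above, and the function name is hypothetical):

```python
# Sketch: decide whether a LOAD_FAILED result looks like the transient
# "Data prefetch task interrupted" case, which a resubmit usually fixes.
def is_transient_prefetch_failure(error_logs):
    """True if any errorLogs entry is the interrupted-prefetch error."""
    return any(
        e["errorCode"] == "PARSING_ERROR"
        and e["errorMessage"].startswith("Data prefetch task interrupted")
        for e in error_logs
    )
```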

# Neptune Loader Cancel Job
Cancel Job

Cancels a load job.

To cancel a job, you must send an HTTP `DELETE` request to the `https://your-neptune-endpoint:port/loader` endpoint. The `loadId` can be appended to the `/loader` URL path, or included as a URL parameter.

## Cancel Job request syntax
Request Syntax

```
DELETE https://your-neptune-endpoint:port/loader?loadId=loadId
```

```
DELETE https://your-neptune-endpoint:port/loader/loadId
```

## Cancel Job Request Parameters
Request Parameters

**loadId**  
The ID of the load job.

## Cancel Job Response Syntax
Response Syntax

```
no response body
```

**200 OK**  
A successfully canceled load job returns an HTTP `200` code.
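As an illustration, the `DELETE` request can be built and sent with only the Python standard library. This is a sketch: the endpoint and load ID are placeholders, and it does not sign the request, so it would not work as-is against a cluster with IAM authentication enabled.

```python
# Sketch: cancel a load job with an HTTP DELETE to the /loader endpoint.
import urllib.request

def cancel_request(endpoint, load_id):
    """Build the DELETE request for /loader/loadId."""
    url = endpoint.rstrip("/") + "/loader/" + load_id
    return urllib.request.Request(url, method="DELETE")

def cancel_load(endpoint, load_id):
    """Send the cancellation; an HTTP 200 status means the job was canceled."""
    with urllib.request.urlopen(cancel_request(endpoint, load_id)) as resp:
        return resp.status
```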

## Cancel Job Errors
Errors

When an error occurs, a JSON object is returned in the `BODY` of the response. The `message` object contains a description of the error.

**Error Categories**
+ **`Error 400`**   –   An invalid `loadId` returns an HTTP `400` bad request error. The message describes the error.
+ **`Error 500`**   –   A valid request that cannot be processed returns an HTTP `500` internal server error. The message describes the error.

## Cancel Job Error Messages
Error Messages

The following are possible error messages from the cancel API, with a description of each error.
+ `The load with id = load_id does not exist or not active` (HTTP 404)   –   The load was not found. Check the value of the `loadId` parameter.
+ `Load cancellation is not permitted on a read replica instance.` (HTTP 405)   –   Loading is a write operation. Retry the load on the read/write cluster endpoint.

## Cancel Job Examples
Examples

**Example Request**  
The following examples cancel a load job using the AWS CLI, the AWS SDK for Python (Boto3), `awscurl`, and `curl`.  

```
aws neptunedata cancel-loader-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --load-id 0a237328-afd5-4574-a0bc-c29ce5f54802
```
For more information, see [cancel-loader-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/cancel-loader-job.html) in the AWS CLI Command Reference.

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.cancel_loader_job(
    loadId='0a237328-afd5-4574-a0bc-c29ce5f54802'
)

print(response)
```

```
awscurl 'https://your-neptune-endpoint:port/loader/0a237328-afd5-4574-a0bc-c29ce5f54802' \
  --region us-east-1 \
  --service neptune-db \
  -X DELETE
```
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

```
curl -X DELETE 'https://your-neptune-endpoint:port/loader/0a237328-afd5-4574-a0bc-c29ce5f54802'
```