

# Exporting data from a Neptune DB cluster
<a name="neptune-data-export"></a>

There are several good ways to export data from a Neptune DB cluster:
+ For small amounts of data, simply use the results of a query or queries.
+ For RDF data, the [Graph Store Protocol (GSP)](sparql-graph-store-protocol.md) can make exporting easy. For example:

  ```
  curl --request GET \
    'https://your-neptune-endpoint:port/sparql/gsp/?graph=http://www.example.com/named/graph'
  ```
+ There is also a powerful and flexible open-source tool for exporting Neptune data, namely [https://github.com/aws/neptune-export](https://github.com/aws/neptune-export). The following sections describe the features of this tool and how to use it.

**Topics**
+ [Using `neptune-export`](neptune-export.md)
+ [Using the Neptune-Export service to export Neptune data](export-service.md)
+ [Using the `neptune-export` command-line tool to export data from Neptune](export-utility.md)
+ [Files exported by Neptune-Export and `neptune-export`](exported-files.md)
+ [Parameters used to control the Neptune export process](export-parameters.md)
+ [Troubleshooting the Neptune export process](export-troubleshooting.md)
+ [Exporting Gremlin query results to Amazon S3](exporting-gremlin.md)

# Using `neptune-export`
<a name="neptune-export"></a>

You can use the open-source [https://github.com/aws/neptune-export](https://github.com/aws/neptune-export) tool in two different ways:
+ **As the [Neptune-Export service](export-service.md)**.   When you export data from Neptune using the Neptune-Export service, you trigger and monitor export jobs through a REST API.
+ **As the [`neptune-export` Java command-line utility](export-utility.md)**.   To use this command-line tool to export Neptune data, you have to run it in an environment where your Neptune DB cluster is accessible.

Both the Neptune-Export service and the `neptune-export` command line tool publish data to Amazon Simple Storage Service (Amazon S3), encrypted using Amazon S3 server-side encryption (`SSE-S3`).

**Note**  
It is a best practice to [enable access logging](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-server-access-logging.html) on all Amazon S3 buckets, to let you audit all access to those buckets.

If you try to export data from a Neptune DB cluster whose data is changing while the export is happening, the consistency of the exported data is not guaranteed. That is, if your cluster is servicing write traffic while an export job is in progress, there may be inconsistencies in the exported data. This is true whether you export from the primary instance in the cluster or from one or more read replicas.

To guarantee that exported data is consistent, it is best to export from a [clone of your DB cluster](manage-console-cloning.md). This both provides the export tool with a static version of your data and ensures that the export job doesn't slow down queries in your original DB cluster.

To make this easier, you can indicate that you want to clone the source DB cluster when you trigger an export job. If you do, the export process automatically creates the clone, uses it for the export, and then deletes it when the export is finished.

# Using the Neptune-Export service to export Neptune data
<a name="export-service"></a>

You can use the following steps to export data from your Neptune DB cluster to Amazon S3 using the Neptune-Export service:

## Installing the Neptune-Export service
<a name="export-service-install"></a>

Use an AWS CloudFormation template to create the stack:

**To install the Neptune-Export service**

1. Launch the CloudFormation stack on the CloudFormation console by choosing one of the **Launch Stack** buttons for your Region, listed on the [AWS documentation website](http://docs.aws.amazon.com/neptune/latest/userguide/export-service.html).

1. On the **Select Template** page, choose **Next**.

1. On the **Specify Details** page, set the following template parameters:
   + **`VPC`**   –   The easiest way to set up the Neptune-Export service is to install it in the same Amazon VPC as your Neptune database. If you want to install it in a separate VPC you can use [VPC peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) to establish connectivity between the Neptune DB cluster's VPC and the Neptune-Export service VPC.
   + **`Subnet1`**   –   The Neptune-Export service must be installed in a subnet in your VPC that allows outbound IPv4 HTTPS traffic from the subnet to the internet. This is so that the Neptune-Export service can call the [AWS Batch API](https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/) to create and run an export job.

     If you created your Neptune cluster using the CloudFormation template on the [Create Neptune cluster](get-started-create-cluster.md) page in the Neptune documentation, you can use the `PrivateSubnet1` and `PrivateSubnet2` outputs from that stack to populate this and the next parameter.
   + **`Subnet2`**   –   A second subnet in the VPC that allows outbound IPv4 HTTPS traffic from the subnet to the internet.
   + **`EnableIAM`**   –   Set this to `true` to secure the Neptune-Export API using AWS Identity and Access Management (IAM). We recommend that you do so.

     If you do enable IAM authentication, you must `Sigv4` sign all HTTPS requests to the endpoint. You can use a tool such as [awscurl](https://github.com/okigan/awscurl) to sign requests on your behalf.
   + **`VPCOnly`**   –   Setting this to `true` makes the export endpoint VPC-only, so that you can only access it from within the VPC where the Neptune-Export service is installed. This restricts the Neptune-Export API to being used only from within that VPC.

     We recommend that you set `VPCOnly` to `true`.
   + **`NumOfFilesULimit`**   –   Specify a value between 10,000 and 1,000,000 for `nofile` in the `ulimits` container property. The default is 10,000, and we recommend keeping the default unless your graph contains a large number of unique labels.
   + **`PrivateDnsEnabled`** (Boolean)   –   Indicates whether to associate a private hosted zone with the specified VPC. The default value is `true`.

     When a VPC endpoint is created with this flag enabled, all API Gateway traffic is routed through the VPC endpoint, and calls to the public API Gateway endpoint are disabled. If you set `PrivateDnsEnabled` to `false`, the public API Gateway endpoint is enabled, but the Neptune-Export service cannot be reached through the private DNS endpoint. You can instead use a public DNS endpoint for the VPC endpoint to call the export service, as detailed [here](https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-private-api-test-invoke-url.html#apigateway-private-api-public-dns).
   + **`NeptuneExportVersion`**   –   Specify the version of the Neptune export utility to use. All versions greater than or equal to `v1.1.11` are supported. Specify `v2.latest` to automatically receive minor updates. The full list of available versions, along with patch notes, can be found in the open-source [GitHub releases](https://github.com/aws/neptune-export/releases).

1. Choose **Next**.

1. On the **Options** page, choose **Next**.

1. On the **Review** page, select the first check box to acknowledge that CloudFormation will create IAM resources. Select the second check box to acknowledge `CAPABILITY_AUTO_EXPAND` for the new stack. 
**Note**  
`CAPABILITY_AUTO_EXPAND` explicitly acknowledges that macros will be expanded when creating the stack, without prior review. Users often create a change set from a processed template so that the changes made by macros can be reviewed before actually creating the stack. For more information, see the CloudFormation [CreateStack](https://docs.aws.amazon.com/AWSCloudFormation/latest/APIReference/API_CreateStack.html) API.

   Then choose **Create**.

## Enable access to Neptune from Neptune-Export
<a name="export-service-access-to-neptune"></a>

After the Neptune-Export installation has completed, update your [Neptune VPC security group](get-started-vpc.md#security-vpc-security-group) to allow access from Neptune-Export. When the Neptune-Export CloudFormation stack has been created, the **Outputs** tab includes a `NeptuneExportSecurityGroup` ID. Update your Neptune VPC security group to allow access from this Neptune-Export security group.

## Enable access to the Neptune-Export endpoint from a VPC-based EC2 instance
<a name="export-service-access-to-service"></a>

If you make your Neptune-Export endpoint VPC-only, you can only access it from within the VPC in which the Neptune-Export service is installed. To allow connectivity from an Amazon EC2 instance in the VPC from which you can make Neptune-Export API calls, attach the `NeptuneExportSecurityGroup` created by the CloudFormation stack to that Amazon EC2 instance.

# Run a Neptune-Export job using the Neptune-Export API
<a name="export-service-run-export"></a>

The **Outputs** tab of the CloudFormation stack also includes the `NeptuneExportApiUri`. Use this URI whenever you send a request to the Neptune-Export endpoint.

**Run an export job**
+ Be sure that the user or role under which the export runs has been granted `execute-api:Invoke` permission.
+ If you set the `EnableIAM` parameter to `true` in the CloudFormation stack when you installed Neptune-Export, you need to `Sigv4` sign all requests to the Neptune-Export API. We recommend using [awscurl](https://github.com/okigan/awscurl) to make requests to the API. All the examples here assume that IAM auth is enabled.
+ If you set the `VPCOnly` parameter to `true` in the CloudFormation stack when you installed Neptune-Export, you must call the Neptune-Export API from within the VPC, typically from an Amazon EC2 instance located in the VPC.

To start exporting data, send a request to the `NeptuneExportApiUri` endpoint with `command` and `outputS3Path` request parameters and an `endpoint` export parameter.

The following is an example of a request that exports property-graph data from Neptune and publishes it to Amazon S3:

```
curl \
  (your NeptuneExportApiUri) \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "command": "export-pg",
        "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export",
        "params": { "endpoint": "(your Neptune endpoint DNS name)" }
      }'
```

Similarly, here is an example of a request that exports RDF data from Neptune to Amazon S3:

```
curl \
  (your NeptuneExportApiUri) \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "command": "export-rdf",
        "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export",
        "params": { "endpoint": "(your Neptune endpoint DNS name)" }
      }'
```
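If you installed the service with `EnableIAM` set to `true`, the same requests must be Sigv4-signed. The following sketch uses [awscurl](https://github.com/okigan/awscurl); the API Gateway service name for signing is `execute-api`, and the Region shown and all parenthesized values are placeholders that you must replace:

```
awscurl --service execute-api \
  --region us-east-1 \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "command": "export-pg",
        "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export",
        "params": { "endpoint": "(your Neptune endpoint DNS name)" }
      }' \
  '(your NeptuneExportApiUri)'
```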

If you omit the `command` request parameter, by default Neptune-Export attempts to export property-graph data from Neptune.

If the previous command ran successfully, the output would look like this:

```
{
  "jobName": "neptune-export-abc12345-1589808577790",
  "jobId": "c86258f7-a9c9-4f8c-8f4c-bbfe76d51c8f"
}
```

## Monitor the export job you just started
<a name="export-service-monitor"></a>

To monitor a running job, append its `jobId` to your `NeptuneExportApiUri`, like this:

```
curl \
  (your NeptuneExportApiUri)/(the job ID)
```

If the service had not yet started the export job, the response would look like this:

```
{
  "jobId": "c86258f7-a9c9-4f8c-8f4c-bbfe76d51c8f",
  "status": "pending"
}
```

When you repeat the command after the export job has started, the response would look something like this:

```
{
  "jobId": "c86258f7-a9c9-4f8c-8f4c-bbfe76d51c8f",
  "status": "running",
  "logs": "https://us-east-1.console.aws.amazon.com/cloudwatch/home?..."
}
```
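To wait for completion unattended, you can poll the status endpoint until the job leaves the `pending` and `running` states. The following is a minimal sketch, assuming IAM auth is disabled and that [jq](https://github.com/jqlang/jq) is installed; substitute `awscurl` for `curl` if IAM auth is enabled:

```
# Placeholders in parentheses must be replaced before running.
API_URI='(your NeptuneExportApiUri)'
JOB_ID='(the job ID)'

while true; do
  STATUS=$(curl -s "${API_URI}/${JOB_ID}" | jq -r '.status')
  echo "status: ${STATUS}"
  case "${STATUS}" in
    pending|running) sleep 30 ;;   # still in progress; check again shortly
    *) break ;;                    # any other status means the job is done
  esac
done
```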

If you open the logs in CloudWatch Logs using the URI provided by the status call, you can then monitor the progress of the export in detail:

![\[Screenshot of the CloudWatch Logs display.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/export-job-monitor.png)


## Cancel a running export job
<a name="export-service-cancel-job"></a>

**To cancel a running export job using the AWS Management Console**

1. Open the AWS Batch console at [https://console.aws.amazon.com/batch/](https://console.aws.amazon.com/batch/).

1. Choose **Jobs**.

1. Locate the running job that you want to cancel, based on its `jobID`.

1. Select **Cancel job**.

**To cancel a running export job using the Neptune-Export API**

Send an `HTTP DELETE` request to the `NeptuneExportApiUri` with the `jobID` appended, like this:

```
curl -X DELETE \
  (your NeptuneExportApiUri)/(the job ID)
```

# Using the `neptune-export` command-line tool to export data from Neptune
<a name="export-utility"></a>

You can use the following steps to export data from your Neptune DB cluster to Amazon S3 using the `neptune-export` command-line utility:

## Prerequisites for using the `neptune-export` command-line utility
<a name="export-utility-setup"></a>

**Before you start**
+ **Have version 8 of the JDK**   –   You need version 8 of the [Java SE Development Kit (JDK)](https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html) installed.
+ **Download the neptune-export utility**   –   Download and install the [neptune-export.jar](https://s3.amazonaws.com/aws-neptune-customer-samples/neptune-export/bin/neptune-export.jar) file.
+ **Make sure `neptune-export` has access to your Neptune VPC**   –   Run neptune-export from a location where it can access the VPC where your Neptune DB cluster is located.

  For example, you can run it on an Amazon EC2 instance within the Neptune VPC, or in a separate VPC that is peered with the Neptune VPC, or on a separate bastion host.
+ **Make sure the VPC security groups grant access to `neptune-export`**   –   Check that the VPC security group(s) attached to the Neptune VPC allow access to your DB cluster from the IP address or security group associated with the `neptune-export` environment.
+ **Set up the necessary IAM permissions**   –   If your database has AWS Identity and Access Management (IAM) database authentication enabled, make sure that the role under which `neptune-export` runs is associated with an IAM policy that allows connections to Neptune. For information about Neptune policies, see [Using IAM policies](security-iam-access-manage.md).

  If you want to use the `clusterId` export parameter in your query requests, the role under which `neptune-export` runs requires the following IAM permissions:
  + `rds:DescribeDBClusters`
  + `rds:DescribeDBInstances`
  + `rds:ListTagsForResource`

  If you want to export from a cloned cluster, the role under which `neptune-export` runs requires the following IAM permissions:
  + `rds:AddTagsToResource`
  + `rds:DescribeDBClusters`
  + `rds:DescribeDBInstances`
  + `rds:ListTagsForResource`
  + `rds:DescribeDBClusterParameters`
  + `rds:DescribeDBParameters`
  + `rds:ModifyDBParameterGroup`
  + `rds:ModifyDBClusterParameterGroup`
  + `rds:RestoreDBClusterToPointInTime`
  + `rds:DeleteDBInstance`
  + `rds:DeleteDBClusterParameterGroup`
  + `rds:DeleteDBParameterGroup`
  + `rds:DeleteDBCluster`
  + `rds:CreateDBInstance`
  + `rds:CreateDBClusterParameterGroup`
  + `rds:CreateDBParameterGroup`

  To publish the exported data to Amazon S3, the role under which `neptune-export` runs requires the following IAM permissions for the Amazon S3 location(s):
  + `s3:PutObject`
  + `s3:PutObjectTagging`
  + `s3:GetObject`
+ **Set the `SERVICE_REGION` environment variable**   –   Set the `SERVICE_REGION` environment variable to identify the Region where your DB cluster is located (see [Connecting to Neptune](iam-auth-connecting-gremlin-java.md) for a list of Region identifiers).
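A quick sanity check of these prerequisites from the shell might look like this (the Region value is only an example; use the Region where your DB cluster is located):

```
# Region in which your Neptune DB cluster is located
export SERVICE_REGION=us-east-1

# Confirm that a version-8 JDK is on the path
java -version

# Confirm that the utility downloaded correctly
ls -lh neptune-export.jar
```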

## Running the `neptune-export` utility to initiate an export operation
<a name="export-utility-running"></a>

Use the following command to run neptune-export from the command line and start an export operation:

```
java -jar neptune-export.jar nesvc \
  --root-path (path to a local directory) \
  --json (the JSON file that defines the export)
```

The command has two parameters:

**Parameters for neptune-export when starting an export**
+ **`--root-path`**   –   Path to a local directory where export files are written before being published to Amazon S3.
+ **`--json`**   –   A JSON object that defines the export.

## Example commands using the `neptune-export` command line utility
<a name="export-utility-examples"></a>

To export property-graph data directly from your source DB cluster:

```
java -jar neptune-export.jar nesvc \
  --root-path /home/ec2-user/neptune-export \
  --json '{
            "command": "export-pg",
            "outputS3Path" : "s3://(your Amazon S3 bucket)/neptune-export",
            "params": {
              "endpoint" : "(your neptune DB cluster endpoint)"
            }
          }'
```

To export RDF data directly from your source DB cluster:

```
java -jar neptune-export.jar nesvc \
  --root-path /home/ec2-user/neptune-export \
  --json '{
            "command": "export-rdf",
            "outputS3Path" : "s3://(your Amazon S3 bucket)/neptune-export",
            "params": {
              "endpoint" : "(your neptune DB cluster endpoint)"
            }
          }'
```

If you omit the `command` request parameter, the `neptune-export` utility exports property-graph data from Neptune by default.

To export from a clone of your DB cluster:

```
java -jar neptune-export.jar nesvc \
  --root-path /home/ec2-user/neptune-export \
  --json '{
            "command": "export-pg",
            "outputS3Path" : "s3://(your Amazon S3 bucket)/neptune-export",
            "params": {
              "endpoint" : "(your neptune DB cluster endpoint)",
              "cloneCluster" : true
            }
          }'
```

To export from your DB cluster using IAM authentication:

```
java -jar neptune-export.jar nesvc \
  --root-path /home/ec2-user/neptune-export \
  --json '{
            "command": "export-pg",
            "outputS3Path" : "s3://(your Amazon S3 bucket)/neptune-export",
            "params": {
              "endpoint" : "(your neptune DB cluster endpoint)"
              "useIamAuth" : true
            }
          }'
```

# Files exported by Neptune-Export and `neptune-export`
<a name="exported-files"></a>

When an export is complete, the export files are published to the Amazon S3 location you specified. All files published to Amazon S3 are encrypted using Amazon S3 server-side encryption (`SSE-S3`). The folders and files published to Amazon S3 vary depending on whether you are exporting property-graph or RDF data. If you open the Amazon S3 location where the files are published, you see the following content:

**Locations of exported files in Amazon S3**
+ **`nodes/`**   –   This folder contains node data files in either a comma-separated value (CSV) or a JSON format.

   In Neptune, nodes can have one or more labels. Nodes with different individual labels (or different combinations of multiple labels) are written to different files, meaning that no individual file contains data for nodes with different combinations of labels. If a node has multiple labels, these labels are sorted alphabetically before they are assigned to a file.
+ **`edges/`**   –   This folder contains edge data files in either a comma-separated value (CSV) or a JSON format.

  As with the nodes files, edge data is written to different files based on combinations of their labels. For model-training purposes, edge data is assigned to different files based on a combination of the edge's label plus the labels of the edge's start and end nodes.
+ **`statements/`**   –   This folder contains **RDF** data files in Turtle, N-Quads, N-Triples, or JSON format.
+ **`config.json`**   –   This file contains the *schema* of the graph as inferred by the export process.
+ **`lastEventId.json`**   –   This file contains the `commitNum` and `opNum` of the last event on the database's Neptune streams. The export process only includes this file if you set the `includeLastEventId` export parameter to `true`, and the database from which you are exporting data has [Neptune streams](streams-using.md) enabled.
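As an illustration, you can inspect a completed export with the AWS CLI. The exact keys depend on your data and export options, but a property-graph export generally follows this layout:

```
aws s3 ls --recursive s3://(your Amazon S3 bucket)/neptune-export/

# Typical prefixes in the listing:
#   .../nodes/             node data files (CSV or JSON)
#   .../edges/             edge data files (CSV or JSON)
#   .../config.json        the inferred graph schema
#   .../lastEventId.json   only if includeLastEventId is true and streams are enabled
```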

# Parameters used to control the Neptune export process
<a name="export-parameters"></a>

Whether you are using the Neptune-Export service or the `neptune-export` command line utility, the parameters you use to control the export are mostly the same. They take the form of a JSON object that is passed either to the Neptune-Export endpoint or to `neptune-export` on the command line.

The object passed in to the export process has up to five top-level fields:

```
-d '{
      "command" : "(either export-pg or export-rdf)",
      "outputS3Path" : "s3:/(your Amazon S3 bucket)/(path to the folder for exported data)",
      "jobSize" : "(for Neptune-Export service only)",
      "params" : { (a JSON object that contains export-process parameters) },
      "additionalParams": { (a JSON object that contains parameters for training configuration) }
    }'
```

**Contents**
+ [The `command` parameter](#export-parameters-command)
+ [The `outputS3Path` parameter](#export-parameters-outputS3Path)
+ [The `jobSize` parameter](#export-parameters-jobSize)
+ [The `params` object](#export-parameters-params)
+ [The `additionalParams` object](#export-parameters-additionalParams)
+ [Export parameter fields in the `params` top-level JSON object](export-params-fields.md)
  + [List of possible fields in the export parameters `params` object](export-params-fields.md#export-params-fields-list)
    + [List of fields common to all types of export](export-params-fields.md#export-params-common-fields-list)
    + [List of fields for property-graph exports](export-params-fields.md#export-params-property-graph-fields-list)
    + [List of fields for RDF exports](export-params-fields.md#export-params-RDF-fields-list)
  + [Fields common to all types of export](export-params-fields.md#export-params-common-fields)
    + [`cloneCluster` field in `params`](export-params-fields.md#export-params-cloneCluster)
    + [`cloneClusterInstanceType` field in `params`](export-params-fields.md#export-params-cloneClusterInstanceType)
    + [`cloneClusterReplicaCount` field in `params`](export-params-fields.md#export-params-cloneClusterReplicaCount)
    + [`cloneClusterEnableAuditLogs` field in `params`](export-params-fields.md#export-params-cloneClusterEnableAuditLogs)
    + [`clusterId` field in `params`](export-params-fields.md#export-params-clusterId)
    + [`endpoint` field in `params`](export-params-fields.md#export-params-endpoint)
    + [`endpoints` field in `params`](export-params-fields.md#export-params-endpoints)
    + [`profile` field in `params`](export-params-fields.md#export-params-profile)
    + [`useIamAuth` field in `params`](export-params-fields.md#export-params-useIamAuth)
    + [`includeLastEventId` field in `params`](export-params-fields.md#export-params-includeLastEventId)
  + [Fields for property-graph export](export-params-fields.md#export-params-property-graph-fields)
    + [`concurrency` field in `params`](export-params-fields.md#export-params-concurrency)
    + [`edgeLabels` field in `params`](export-params-fields.md#export-params-edgeLabels)
    + [`filter` field in `params`](export-params-fields.md#export-params-filter)
    + [`filterConfigFile` field in `params`](export-params-fields.md#export-params-filterConfigFile)
    + [`format` field used for property-graph data in `params`](export-params-fields.md#export-params-format-pg)
    + [`gremlinFilter` field in `params`](export-params-fields.md#export-params-gremlinFilter)
    + [`gremlinNodeFilter` field in `params`](export-params-fields.md#export-params-gremlinNodeFilter)
    + [`gremlinEdgeFilter` field in `params`](export-params-fields.md#export-params-gremlinEdgeFilter)
    + [`nodeLabels` field in `params`](export-params-fields.md#export-params-nodeLabels)
    + [`scope` field in `params`](export-params-fields.md#export-params-scope)
  + [Fields for RDF export](export-params-fields.md#export-params-rdf-fields)
    + [`format` field used for RDF data in `params`](export-params-fields.md#export-params-format-rdf)
    + [`rdfExportScope` field in `params`](export-params-fields.md#export-params-rdfExportScope)
    + [`sparql` field in `params`](export-params-fields.md#export-params-sparql)
    + [`namedGraph` field in `params`](export-params-fields.md#namedgraph-params-sparql)
+ [Examples of filtering what is exported](export-filtering-examples.md)
  + [Filtering the export of property-graph data](export-filtering-examples.md#export-property-graph-filtering-examples)
    + [Example of using `scope` to export only edges](export-filtering-examples.md#export-property-graph-filtering-scope-example)
    + [Example of using `nodeLabels` and `edgeLabels` to export only nodes and edges having specific labels](export-filtering-examples.md#export-property-graph-filtering-labels-example)
    + [Example of using `filter` to export only specified nodes, edges and properties](export-filtering-examples.md#export-property-graph-filtering-filter-example)
    + [Example that uses `gremlinFilter`](export-filtering-examples.md#export-property-graph-filtering-gremlinFilter-example)
    + [Example that uses `gremlinNodeFilter`](export-filtering-examples.md#export-property-graph-filtering-gremlinNodeFilter-example)
    + [Example that uses `gremlinEdgeFilter`](export-filtering-examples.md#export-property-graph-filtering-gremlinEdgeFilter-example)
    + [Combining `filter`, `gremlinNodeFilter`, `nodeLabels`, `edgeLabels` and `scope`](export-filtering-examples.md#export-property-graph-filtering-combo-example)
  + [Filtering the export of RDF data](export-filtering-examples.md#export-RDF-filtering-examples)
    + [Using `rdfExportScope` and `sparql` to export specific edges](export-filtering-examples.md#export-RDF-filtering-rdfExportScope-sparql-example)
    + [Using `namedGraph` to export a single named graph](export-filtering-examples.md#export-RDF-filtering-rdfExportScope-sparql-namedGraph-example)

## The `command` parameter
<a name="export-parameters-command"></a>

The `command` top-level parameter determines whether to export property-graph data or RDF data. If you omit the `command` parameter, the export process defaults to exporting property-graph data.
+ **`export-pg`**   –   Export property-graph data.
+ **`export-rdf`**   –   Export RDF data.

## The `outputS3Path` parameter
<a name="export-parameters-outputS3Path"></a>

The `outputS3Path` top-level parameter is required, and must contain the URI of an Amazon S3 location to which the exported files can be published:

```
  "outputS3Path" : "s3://(your Amazon S3 bucket)/(path to output folder)"
```

The value must begin with `s3://`, followed by a valid bucket name and optionally a folder path within the bucket.
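For illustration only, that rule can be expressed as a small shell check (this helper is hypothetical, not part of the export tooling, and is not the service's own validation):

```
# Hypothetical helper: true if the value begins with s3:// and names a bucket.
is_valid_output_s3_path() {
  case "$1" in
    s3://?*) return 0 ;;
    *)       return 1 ;;
  esac
}

is_valid_output_s3_path "s3://my-export-bucket/neptune-export" && echo "valid"
is_valid_output_s3_path "my-export-bucket/neptune-export"      || echo "invalid: must begin with s3://"
```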

## The `jobSize` parameter
<a name="export-parameters-jobSize"></a>

The `jobSize` top-level parameter is optional, and is used only with the Neptune-Export service, not with the `neptune-export` command line utility. It lets you characterize the size of the export job you are starting, which helps determine the amount of compute resources devoted to the job and its maximum concurrency level.

```
  "jobSize" : "(one of four size descriptors)"
```

The four valid size descriptors are:
+ `small`   –   Maximum concurrency: 8. Suitable for storage volumes up to 10 GB.
+ `medium`   –   Maximum concurrency: 32. Suitable for storage volumes up to 100 GB.
+ `large`   –   Maximum concurrency: 64. Suitable for storage volumes over 100 GB but less than 1 TB.
+ `xlarge`   –   Maximum concurrency: 96. Suitable for storage volumes over 1 TB.

By default, an export initiated on the Neptune-Export service runs as a `small` job.

The performance of an export depends not only on the `jobSize` setting, but also on the number of database instances that you're exporting from, the size of each instance, and the effective concurrency level of the job.

For property-graph exports, you can configure the number of database instances using the [cloneClusterReplicaCount](export-params-fields.md#export-params-cloneClusterReplicaCount) parameter, and you can configure the job's effective concurrency level using the [concurrency](export-params-fields.md#export-params-concurrency) parameter.
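Putting these settings together, a Neptune-Export service request for a larger database might look like the following sketch (all parenthesized values are placeholders, and the `jobSize`, `cloneClusterReplicaCount`, and `concurrency` values shown are illustrative):

```
curl \
  (your NeptuneExportApiUri) \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "command": "export-pg",
        "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export",
        "jobSize": "large",
        "params": {
          "endpoint": "(your Neptune endpoint DNS name)",
          "cloneCluster": true,
          "cloneClusterReplicaCount": 2,
          "concurrency": 64
        }
      }'
```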

## The `params` object
<a name="export-parameters-params"></a>

The `params` top-level parameter is a JSON object that contains parameters that you use to control the export process itself, as explained in [Export parameter fields in the `params` top-level JSON object](export-params-fields.md). Some of the fields in the `params` object are specific to property-graph exports, some to RDF.

## The `additionalParams` object
<a name="export-parameters-additionalParams"></a>

The `additionalParams` top-level parameter is a JSON object that contains parameters you can use to control actions that are applied to the data after it has been exported. At present, `additionalParams` is used only for exporting training data for [Neptune ML](machine-learning-additionalParams.md).

# Export parameter fields in the `params` top-level JSON object
<a name="export-params-fields"></a>

The Neptune export `params` JSON object allows you to control the export, including the type and format of the exported data.

## List of possible fields in the export parameters `params` object
<a name="export-params-fields-list"></a>

Listed below are all the possible top-level fields that can appear in a `params` object. Only a subset of these fields appear in any one object.

### List of fields common to all types of export
<a name="export-params-common-fields-list"></a>
+ [`cloneCluster`](#export-params-cloneCluster)
+ [`cloneClusterInstanceType`](#export-params-cloneClusterInstanceType)
+ [`cloneClusterReplicaCount`](#export-params-cloneClusterReplicaCount)
+ [`cloneClusterEnableAuditLogs`](#export-params-cloneClusterEnableAuditLogs)
+ [`clusterId`](#export-params-clusterId)
+ [`endpoint`](#export-params-endpoint)
+ [`endpoints`](#export-params-endpoints)
+ [`profile`](#export-params-profile)
+ [`useIamAuth`](#export-params-useIamAuth)
+ [`includeLastEventId`](#export-params-includeLastEventId)

### List of fields for property-graph exports
<a name="export-params-property-graph-fields-list"></a>
+ [`concurrency`](#export-params-concurrency)
+ [`edgeLabels`](#export-params-edgeLabels)
+ [`filter`](#export-params-filter)
+ [`filterConfigFile`](#export-params-filterConfigFile)
+ [`gremlinFilter`](#export-params-gremlinFilter)
+ [`gremlinNodeFilter`](#export-params-gremlinNodeFilter)
+ [`gremlinEdgeFilter`](#export-params-gremlinEdgeFilter)
+ [`format`](#export-params-format-pg)
+ [`nodeLabels`](#export-params-nodeLabels)
+ [`scope`](#export-params-scope)

### List of fields for RDF exports
<a name="export-params-RDF-fields-list"></a>
+ [`format`](#export-params-format-rdf)
+ [`rdfExportScope`](#export-params-rdfExportScope)
+ [`sparql`](#export-params-sparql)
+ [`namedGraph`](#namedgraph-params-sparql)

## Fields common to all types of export
<a name="export-params-common-fields"></a>

### `cloneCluster` field in `params`
<a name="export-params-cloneCluster"></a>

*(Optional)*. Default: `false`.

If the `cloneCluster` parameter is set to `true`, the export process uses a fast clone of your DB cluster:

```
  "cloneCluster" : true
```

By default, the export process exports data from the DB cluster that you specify using the `endpoint`, `endpoints` or `clusterId` parameters. However, if your DB cluster is in use while the export is going on, and data is changing, the export process cannot guarantee the consistency of the data being exported.

To ensure that the exported data is consistent, use the `cloneCluster` parameter to export from a static clone of your DB cluster instead.

The cloned DB cluster is created in the same VPC as the source DB cluster and inherits the security group, subnet group and IAM database authentication settings of the source. When the export is complete, Neptune deletes the cloned DB cluster.

By default, a cloned DB cluster consists of a single instance of the same instance type as the primary instance in the source DB cluster. You can change the instance type used for the cloned DB cluster by specifying a different one using `cloneClusterInstanceType`.

**Note**  
If you don't use the `cloneCluster` option, and are exporting directly from your main DB cluster, you might need to increase the timeout on the instances from which data is being exported. For large data sets, the timeout should be set to several hours.

### `cloneClusterInstanceType` field in `params`
<a name="export-params-cloneClusterInstanceType"></a>

*(Optional)*.

If the `cloneCluster` parameter is present and set to `true`, you can use the `cloneClusterInstanceType` parameter to specify the instance type used for the cloned DB cluster:

```
  "cloneClusterInstanceType" : "(for example, r5.12xlarge)"
```

By default, a cloned DB cluster consists of a single instance of the same instance type as the primary instance in the source DB cluster.

### `cloneClusterReplicaCount` field in `params`
<a name="export-params-cloneClusterReplicaCount"></a>

*(Optional)*.

If the `cloneCluster` parameter is present and set to `true`, you can use the `cloneClusterReplicaCount` parameter to specify the number of read-replica instances created in the cloned DB cluster:

```
  "cloneClusterReplicaCount" : (for example, 3)
```

By default, a cloned DB cluster consists of a single primary instance. The `cloneClusterReplicaCount` parameter lets you specify how many additional read-replica instances should be created.
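Taken together, the clone-related fields might look like this in a `params` object (the instance type and replica count shown here are purely illustrative):

```
  "cloneCluster" : true,
  "cloneClusterInstanceType" : "r5.12xlarge",
  "cloneClusterReplicaCount" : 2
```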

### `cloneClusterEnableAuditLogs` field in `params`
<a name="export-params-cloneClusterEnableAuditLogs"></a>

*(Optional)*. Default: `false`.

If the `cloneCluster` parameter is present and set to `true`, you can use the `cloneClusterEnableAuditLogs` parameter to enable or disable audit logs in the cloned cluster.

By default, audit logging is disabled.

```
"cloneClusterEnableAuditLogs" : true
```

### `clusterId` field in `params`
<a name="export-params-clusterId"></a>

*(Optional)*.

The `clusterId` parameter specifies the ID of a DB cluster to use:

```
  "clusterId" : "(the ID of your DB cluster)"
```

If you use the `clusterId` parameter, the export process uses all available instances in that DB cluster to extract data.

**Note**  
The `endpoint`, `endpoints`, and `clusterId` parameters are mutually exclusive. Use one and only one of them.
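To illustrate this constraint, a client could validate its `params` object before submitting an export request. The following is a hypothetical helper sketch, not part of the export tool:

```python
def validate_target_fields(params):
    """Check that exactly one of endpoint, endpoints, or clusterId is set."""
    targets = [k for k in ("endpoint", "endpoints", "clusterId") if k in params]
    if len(targets) != 1:
        raise ValueError(
            "Use one and only one of endpoint, endpoints, or clusterId; got: %s"
            % (targets or "none")
        )
    return targets[0]

# Valid: exactly one target field is present
print(validate_target_fields({"clusterId": "my-cluster"}))  # clusterId
```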

### `endpoint` field in `params`
<a name="export-params-endpoint"></a>

*(Optional)*.

Use `endpoint` to specify an endpoint of a Neptune instance in your DB cluster that the export process can query to extract data (see [Endpoint Connections](feature-overview-endpoints.md)). This is the DNS name only, and does not include the protocol or port:

```
  "endpoint" : "(a DNS endpoint of your DB cluster)"
```

Use a cluster or instance endpoint, but not the main reader endpoint.

**Note**  
The `endpoint`, `endpoints`, and `clusterId` parameters are mutually exclusive. Use one and only one of them.

### `endpoints` field in `params`
<a name="export-params-endpoints"></a>

*(Optional)*.

Use `endpoints` to specify a JSON array of endpoints in your DB cluster that the export process can query to extract data (see [Endpoint Connections](feature-overview-endpoints.md)). These are DNS names only, and do not include the protocol or port:

```
  "endpoints": [
    "(one endpoint in your DB cluster)",
    "(another endpoint in your DB cluster)",
    "(a third endpoint in your DB cluster)"
    ]
```

If you have multiple instances in your cluster (a primary and one or more read replicas), you can improve export performance by using the `endpoints` parameter to distribute queries across a list of those endpoints.

**Note**  
The `endpoint`, `endpoints`, and `clusterId` parameters are mutually exclusive. Use one and only one of them.

### `profile` field in `params`
<a name="export-params-profile"></a>

*(Required to export training data for Neptune ML, unless the `neptune_ml` field is present in the `additionalParams` field)*.

The `profile` parameter provides sets of pre-configured parameters for specific workloads. At present, the export process supports only the `neptune_ml` profile.

If you are exporting training data for Neptune ML, add the following parameter to the `params` object:

```
  "profile" : "neptune_ml"
```

### `useIamAuth` field in `params`
<a name="export-params-useIamAuth"></a>

*(Optional)*. Default: `false`.

If the database from which you are exporting data has [IAM authentication enabled](iam-auth-enable.md), you must include the `useIamAuth` parameter set to `true`:

```
  "useIamAuth" : true
```

### `includeLastEventId` field in `params`
<a name="export-params-includeLastEventId"></a>

If you set `includeLastEventId` to `true`, and the database from which you are exporting data has [Neptune Streams](streams-using.md) enabled, the export process writes a `lastEventId.json` file to your specified export location. This file contains the `commitNum` and `opNum` of the last event in the stream.

```
  "includeLastEventId" : true
```

A cloned database created by the export process inherits the streams setting of its parent. If the parent has streams enabled, the clone will likewise have streams enabled. The contents of the stream on the clone will reflect the contents of the parent (including the same event IDs) at the point in time the clone was created.
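As a sketch, the contents of the `lastEventId.json` file take a form like the following (the values shown here are purely illustrative):

```
  {
    "commitNum" : 12345,
    "opNum" : 1
  }
```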

## Fields for property-graph export
<a name="export-params-property-graph-fields"></a>

### `concurrency` field in `params`
<a name="export-params-concurrency"></a>

*(Optional)*. Default: `4`.

The `concurrency` parameter specifies the number of parallel queries that the export process should use:

```
  "concurrency" : (for example, 24)
```

A good guideline is to set the concurrency level to twice the number of vCPUs on all the instances from which you are exporting data. An r5.xlarge instance, for example, has 4 vCPUs. If you are exporting from a cluster of 3 r5.xlarge instances, you can set the concurrency level to 24 (= 3 x 2 x 4).

If you are using the Neptune-Export service, the concurrency level is limited by the [jobSize](export-parameters.md#export-parameters-jobSize) setting. A small job, for example, supports a concurrency level of 8. If you try to specify a concurrency level of 24 for a small job using the `concurrency` parameter, the effective level remains at 8.

If you export from a cloned cluster, the export process calculates an appropriate concurrency level based on the size of the cloned instances and the job size.
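The guideline and the job-size cap described above can be sketched as a small calculation. The small-job cap of 8 comes from the example above; caps for other job sizes aren't shown:

```python
def suggested_concurrency(num_instances, vcpus_per_instance, job_size_cap=None):
    """Twice the total vCPU count across the exporting instances,
    optionally capped by the Neptune-Export jobSize limit."""
    level = 2 * num_instances * vcpus_per_instance
    return min(level, job_size_cap) if job_size_cap is not None else level

# Three r5.xlarge instances (4 vCPUs each): 3 x 2 x 4 = 24
print(suggested_concurrency(3, 4))                   # 24
# With a small job (cap of 8), the effective level stays at 8
print(suggested_concurrency(3, 4, job_size_cap=8))   # 8
```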

### `edgeLabels` field in `params`
<a name="export-params-edgeLabels"></a>

*(Optional)*.

Use `edgeLabels` to export only those edges that have labels that you specify:

```
  "edgeLabels" : ["(a label)", "(another label)"]
```

Each label in the JSON array must be a single, simple label.

The `scope` parameter takes precedence over the `edgeLabels` parameter, so if the `scope` value does not include edges, the `edgeLabels` parameter has no effect.

### `filter` field in `params`
<a name="export-params-filter"></a>

*(Optional)*.

Use `filter` to specify that only nodes and/or edges with specific labels should be exported, and to filter the properties that are exported for each node or edge.

The general structure of a `filter` object, either inline or in a filter-configuration file, is as follows:

```
  "filter" : {
    "nodes": [ (array of node label and properties objects) ],
    "edges": [ (array of edge definition and properties objects) ]
  }
```
+ **`nodes`**   –   Contains a JSON array of nodes and node properties in the following form:

  ```
      "nodes" : [
        {
          "label": "(node label)",
          "properties": [ "(a property name)", "(another property name)", ( ... ) ]
        }
      ]
  ```
  + `label`  –   The node's property-graph label or labels.

    Takes a single value or, if the node has multiple labels, an array of values.
  + `properties`  –   Contains an array of the names of the node's properties that you want to export.
+ **`edges`**   –   Contains a JSON array of edge definitions in the following form:

  ```
      "edges" : [
        {
          "label": "(edge label)",
          "properties": [ "(a property name)", "(another property name)", ( ... ) ]
        }
      ]
  ```
  + `label`   –   The edge's property graph label. Takes a single value.
  + `properties`  –   Contains an array of the names of the edge's properties that you want to export.

### `filterConfigFile` field in `params`
<a name="export-params-filterConfigFile"></a>

*(Optional)*.

Use `filterConfigFile` to specify a JSON file that contains a filter configuration in the same form that the `filter` parameter takes:

```
  "filterConfigFile" : "s3://(your Amazon S3 bucket)/neptune-export/(the name of the JSON file)"
```

See [filter](#export-params-filter) for the format of the `filterConfigFile` file.

### `format` field used for property-graph data in `params`
<a name="export-params-format-pg"></a>

*(Optional)*. *Default*: `csv` (comma-separated values)

The `format` parameter specifies the output format of the exported property graph data:

```
  "format" : (one of: csv, csvNoHeaders, json, neptuneStreamsJson)
```
+ **`csv`**   –   Comma-separated value (CSV) formatted output, with column headings formatted according to the [Gremlin load data format](bulk-load-tutorial-format-gremlin.md).
+ **`csvNoHeaders`**   –   CSV formatted data with no column headings.
+ **`json`**   –   JSON formatted data.
+ **`neptuneStreamsJson`**   –   JSON formatted data that uses the [GREMLIN\$1JSON change serialization format](streams-change-formats.md).

### `gremlinFilter` field in `params`
<a name="export-params-gremlinFilter"></a>

*(Optional)*.

The `gremlinFilter` parameter allows you to supply a Gremlin snippet, such as a `has()` step, that is used to filter both nodes and edges:

```
  "gremlinFilter" : (a Gremlin snippet)
```

Field names and string values should be surrounded by escaped double quotes. For dates and times, you can use the [datetime](best-practices-gremlin-datetime.md) method.

The following example exports only those nodes and edges with a date-created property whose value is greater than 2021-10-10:

```
  "gremlinFilter" : "has(\"created\", gt(datetime(\"2021-10-10\")))"
```
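Because the filter is embedded in a JSON request, the inner double quotes must be escaped. If you build the request programmatically, a JSON serializer handles the escaping for you; for example, in Python:

```python
import json

# The Gremlin snippet is a plain string; json.dumps escapes the inner quotes.
params = {"gremlinFilter": 'has("created", gt(datetime("2021-10-10")))'}
print(json.dumps(params))
# {"gremlinFilter": "has(\"created\", gt(datetime(\"2021-10-10\")))"}
```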

### `gremlinNodeFilter` field in `params`
<a name="export-params-gremlinNodeFilter"></a>

*(Optional)*.

The `gremlinNodeFilter` parameter allows you to supply a Gremlin snippet, such as a `has()` step, that is used to filter nodes:

```
  "gremlinNodeFilter" : (a Gremlin snippet)
```

Field names and string values should be surrounded by escaped double quotes. For dates and times, you can use the [datetime](best-practices-gremlin-datetime.md) method.

The following example exports only those nodes with a `deleted` Boolean property whose value is `true`:

```
  "gremlinNodeFilter" : "has(\"deleted\", true)"
```

### `gremlinEdgeFilter` field in `params`
<a name="export-params-gremlinEdgeFilter"></a>

*(Optional)*.

The `gremlinEdgeFilter` parameter allows you to supply a Gremlin snippet, such as a `has()` step, that is used to filter edges:

```
  "gremlinEdgeFilter" : (a Gremlin snippet)
```

Field names and string values should be surrounded by escaped double quotes. For dates and times, you can use the [datetime](best-practices-gremlin-datetime.md) method.

The following example exports only those edges with a `strength` numerical property whose value is 5:

```
  "gremlinEdgeFilter" : "has(\"strength\", 5)"
```

### `nodeLabels` field in `params`
<a name="export-params-nodeLabels"></a>

*(Optional)*.

Use `nodeLabels` to export only those nodes that have labels you specify:

```
  "nodeLabels" : ["(a label)", "(another label)"]
```

Each label in the JSON array must be a single, simple label.

The `scope` parameter takes precedence over the `nodeLabels` parameter, so if the `scope` value does not include nodes, the `nodeLabels` parameter has no effect.

### `scope` field in `params`
<a name="export-params-scope"></a>

*(Optional)*. Default: `all`.

The `scope` parameter specifies whether to export only nodes, or only edges, or both nodes and edges:

```
  "scope" : (one of: nodes, edges, or all)
```
+ `nodes`   –   Export nodes and their properties only.
+ `edges`   –   Export edges and their properties only.
+ `all`   –   Export both nodes and edges and their properties (the default).

## Fields for RDF export
<a name="export-params-rdf-fields"></a>

### `format` field used for RDF data in `params`
<a name="export-params-format-rdf"></a>

*(Optional)*. *Default*: `turtle`

The `format` parameter specifies the output format of the exported RDF data:

```
  "format" : (one of: turtle, nquads, ntriples, neptuneStreamsJson)
```
+ **`turtle`**   –   Turtle formatted output.
+ **`nquads`**   –   N-Quads formatted data.
+ **`ntriples`**   –   N-Triples formatted data.
+ **`neptuneStreamsJson`**   –   JSON formatted data that uses the [SPARQL NQUADS change serialization format](streams-change-formats.md).

### `rdfExportScope` field in `params`
<a name="export-params-rdfExportScope"></a>

*(Optional)*. Default: `graph`.

The `rdfExportScope` parameter specifies the scope of the RDF export:

```
  "rdfExportScope" : (one of: graph, edges, or query)
```
+ `graph`   –   Export all RDF data.
+ `edges`   –   Export only those triples that represent edges.
+ `query`   –   Export data retrieved by a SPARQL query that is supplied using the `sparql` field.

### `sparql` field in `params`
<a name="export-params-sparql"></a>

*(Optional)*.

The `sparql` parameter allows you to specify a SPARQL query to retrieve the data to export:

```
  "sparql" : (a SPARQL query)
```

If you supply a query using the `sparql` field, you must also set the `rdfExportScope` field to `query`.

### `namedGraph` field in `params`
<a name="namedgraph-params-sparql"></a>

*(Optional)*.

The `namedGraph` parameter allows you to specify an IRI to limit the export to a single named graph:

```
  "namedGraph" : (Named graph IRI)
```

The `namedGraph` parameter can only be used with the `rdfExportScope` field set to `graph`.

# Examples of filtering what is exported
<a name="export-filtering-examples"></a>

Here are examples that illustrate ways to filter the data that is exported.

## Filtering the export of property-graph data
<a name="export-property-graph-filtering-examples"></a>

### Example of using `scope` to export only edges
<a name="export-property-graph-filtering-scope-example"></a>

```
{
  "command": "export-pg",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "scope": "edges"
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

### Example of using `nodeLabels` and `edgeLabels` to export only nodes and edges having specific labels
<a name="export-property-graph-filtering-labels-example"></a>

The `nodeLabels` parameter in the following example specifies that only nodes having a `Person` label or a `Post` label should be exported. The `edgeLabels` parameter specifies that only edges with a `likes` label should be exported:

```
{
  "command": "export-pg",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "nodeLabels": ["Person", "Post"],
    "edgeLabels": ["likes"]
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

### Example of using `filter` to export only specified nodes, edges and properties
<a name="export-property-graph-filtering-filter-example"></a>

The `filter` object in this example exports `country` nodes with their `type`, `code` and `desc` properties, and also `route` edges with their `dist` property.

```
{
  "command": "export-pg",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "filter": {
      "nodes": [
        {
          "label": "country",
          "properties": [
            "type",
            "code",
            "desc"
          ]
        }
      ],
      "edges": [
        {
          "label": "route",
          "properties": [
            "dist"
          ]
        }
      ]
    }
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

### Example that uses `gremlinFilter`
<a name="export-property-graph-filtering-gremlinFilter-example"></a>

This example uses `gremlinFilter` to export only those nodes and edges created after 2021-10-10 (that is, with a `created` property whose value is greater than 2021-10-10):

```
{
  "command": "export-pg",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "gremlinFilter" : "has(\"created\", gt(datetime(\"2021-10-10\")))"
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

### Example that uses `gremlinNodeFilter`
<a name="export-property-graph-filtering-gremlinNodeFilter-example"></a>

This example uses `gremlinNodeFilter` to export only deleted nodes (nodes with a Boolean `deleted` property whose value is `true`):

```
{
  "command": "export-pg",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "gremlinNodeFilter" : "has(\"deleted\", true)"
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

### Example that uses `gremlinEdgeFilter`
<a name="export-property-graph-filtering-gremlinEdgeFilter-example"></a>

This example uses `gremlinEdgeFilter` to export only edges with a `strength` numerical property whose value is 5:

```
{
  "command": "export-pg",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "gremlinEdgeFilter" : "has(\"strength\", 5)"
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

### Combining `filter`, `gremlinNodeFilter`, `nodeLabels`, `edgeLabels` and `scope`
<a name="export-property-graph-filtering-combo-example"></a>

The `filter` object in this example exports:
+ `country` nodes with their `type`, `code` and `desc` properties
+ `airport` nodes with their `code`, `icao` and `runways` properties
+ `route` edges with their `dist` property

The `gremlinNodeFilter` parameter filters the nodes so that only nodes with a `code` property whose value begins with A are exported.

The `nodeLabels` and `edgeLabels` parameters further restrict the output so that only `airport` nodes and `route` edges are exported.

Finally, the `scope` parameter eliminates edges from the export, which leaves only the designated `airport` nodes in the output.

```
{
  "command": "export-pg",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "filter": {
      "nodes": [
        {
          "label": "airport",
          "properties": [
            "code",
            "icao",
            "runways"
          ]
        },
        {
          "label": "country",
          "properties": [
            "type",
            "code",
            "desc"
          ]
        }
      ],
      "edges": [
        {
          "label": "route",
          "properties": [
            "dist"
          ]
        }
      ]
    },
    "gremlinNodeFilter": "has(\"code\", startingWith(\"A\"))",
    "nodeLabels": [
      "airport"
    ],
    "edgeLabels": [
      "route"
    ],
    "scope": "nodes"
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

## Filtering the export of RDF data
<a name="export-RDF-filtering-examples"></a>

### Using `rdfExportScope` and `sparql` to export specific edges
<a name="export-RDF-filtering-rdfExportScope-sparql-example"></a>

This example exports triples whose predicate is `http://kelvinlawrence.net/air-routes/objectProperty/route` and whose object is not a literal:

```
{
  "command": "export-rdf",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "rdfExportScope": "query",
    "sparql": "CONSTRUCT { ?s <http://kelvinlawrence.net/air-routes/objectProperty/route> ?o } WHERE { ?s ?p ?o . FILTER(!isLiteral(?o)) }"
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

### Using `namedGraph` to export a single named graph
<a name="export-RDF-filtering-rdfExportScope-sparql-namedGraph-example"></a>

This example exports triples belonging to the named graph `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`:

```
{
  "command": "export-rdf",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "rdfExportScope": "graph",
    "namedGraph": "http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph"
  },
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export"
}
```

# Troubleshooting the Neptune export process
<a name="export-troubleshooting"></a>

The Amazon Neptune export process uses [AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/) to provision the compute and storage resources necessary to export your Neptune data. When an export is running, you can use the link in the `logs` field to access the CloudWatch logs for the export job.

However, the CloudWatch logs for the AWS Batch job that performs the export are only available when the AWS Batch job is running. If Neptune export reports that an export is in a pending state, there won’t be a logs link through which you can access CloudWatch logs. If an export job remains in the `pending` state for more than a few minutes, there may be a problem provisioning the underlying AWS Batch resources.

When the export job leaves the pending state, you can check its status as follows:

**To check the status of an AWS Batch job**

1. Open the AWS Batch console at [https://console.aws.amazon.com/batch/](https://console.aws.amazon.com/batch/).

1. Select the neptune-export job queue.

1. Look for the job whose name matches the `jobName` returned by Neptune export when you started the export.

![\[Screenshot of the AWS Batch console when checking for status\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/batch-console-checking-export.png)


If the job remains stuck in a `RUNNABLE` state, it may be because networking or security issues are preventing the container instance from joining the underlying Amazon Elastic Container Service (Amazon ECS) cluster. See the section about verifying network and security settings of the compute environment in [this support article](https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/).

Another thing you can check is for problems with auto-scaling:

**To check the Amazon EC2 auto-scaling group for the AWS Batch compute environment**

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. Select the **Auto Scaling** group for the neptune-export compute environment.

1. Open the **Activity** tab and check the activity history for unsuccessful events.

![\[Screenshot of the Amazon EC2 console when checking for Auto Scaling problems\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/ec2-console-checking-auto-scaling.png)


## Neptune Export common errors
<a name="export-troubleshooting-errors"></a>

### `org.eclipse.rdf4j.query.QueryEvaluationException: Tag mismatch!`
<a name="export-troubleshooting-errors-tag-mismatch"></a>

If an `export-rdf` job regularly fails with a `Tag mismatch!` `QueryEvaluationException`, the Neptune instance is likely undersized for the large, long-running queries that the export process uses.

You can avoid getting this error by scaling up to a larger Neptune instance or by configuring the job to export from a large cloned cluster, like this:

```
{
  "command": "export-rdf",
  "outputS3Path": "s3://(your Amazon S3 bucket)/neptune-export",
  "params": {
    "endpoint": "(your Neptune endpoint DNS name)",
    "cloneCluster": true,
    "cloneClusterInstanceType" : "r5.24xlarge"
  }
}
```

# Exporting Gremlin query results to Amazon S3
<a name="exporting-gremlin"></a>

 Starting in engine release 1.4.3.0, Amazon Neptune supports exporting Gremlin query results directly to Amazon S3. This feature allows you to handle large query results efficiently by exporting them to an Amazon S3 bucket instead of returning them as a query response. 

 To export query results to Amazon S3, use the `call()` step with the `neptune.query.exportToS3` service name as the final step in your Gremlin query. A [terminal step](https://tinkerpop.apache.org/docs/current/reference/#terminal-steps) can be added after the `call()` step in TinkerPop drivers that use bytecode. The export parameters must be provided as string values. 

**Note**  
 The query fails if the `call()` step with `neptune.query.exportToS3` is not the final step. Gremlin clients that use bytecode can still append terminal steps. See [Gremlin best practices](https://docs.aws.amazon.com//neptune/latest/userguide/best-practices-gremlin-java-bytecode.html) in the Amazon Neptune documentation for more information. 

```
g.V()
  ...
  .call('neptune.query.exportToS3', [
    'destination': 's3://your-bucket/path/result.json',
    'format': 'GraphSONv3',
    'keyArn': 'optional-kms-key-arn'
  ])
```

**Parameters**
+  `destination`: required - The Amazon S3 URI where results will be written. 
+  `format`: required - The output format, currently only supports '[GraphSONv3](https://tinkerpop.apache.org/docs/3.7.3/dev/io/#graphson-3d0)'. 
+  `keyArn`: optional - The ARN of an AWS KMS key for Amazon S3 [server-side encryption](https://docs.aws.amazon.com//AmazonS3/latest/userguide/serv-side-encryption.html). 

## Examples
<a name="exporting-gremlin-examples"></a>

 **Example query** 

```
g.V().
    hasLabel('Comment').
    valueMap().
    call('neptune.query.exportToS3', [
    'destination': 's3://your-bucket/path/result.json',
    'format': 'GraphSONv3',
    'keyArn': 'optional-kms-key-arn'
  ])
```

 **Example query response** 

```
{
    "destination": "s3://your-bucket/path/result.json",
    "exportedResults": 100,
    "exportedBytes": 102400
}
```

## Prerequisites
<a name="exporting-gremlin-prerequisites"></a>
+  Your Neptune DB instance must have access to Amazon S3 through a VPC endpoint of type gateway. 
+  To use custom AWS KMS encryption in the query, an Interface-type VPC endpoint for AWS KMS is required to allow Neptune to communicate with AWS KMS. 
+  You must enable IAM authentication on Neptune and have the IAM permissions needed to write to the target Amazon S3 bucket. Otherwise, the request fails with a `400` bad request error: "Cluster must have IAM authentication enabled for S3 Export". 
+  The target Amazon S3 bucket: 
  +  The target Amazon S3 bucket must not be public. `Block public access` must be enabled. 
  +  The target Amazon S3 destination must be empty. 
  +  The target Amazon S3 bucket must have a lifecycle rule on `Delete expired object delete markers or incomplete multipart uploads` with `Delete incomplete multipart uploads` set to a value higher than query evaluation will take (for example, 7 days). This rule deletes incomplete multipart uploads (which are not directly visible, but would incur costs) if Neptune cannot complete or abort them (for example, because of instance or engine failures). See [Amazon S3 lifecycle management update - support for multipart uploads and delete markers](https://aws.amazon.com/blogs/aws/s3-lifecycle-management-update-support-for-multipart-uploads-and-delete-markers/) for more information.   
![\[An image showing the lifecycle rule actions, and the delete expired object delete markers.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/lifecycleRuleActionsDelete.png)

**Important considerations**
+  The export step must be the last step in your Gremlin query. 
+  If an object already exists at the specified Amazon S3 location, the query will fail. 
+  Maximum query execution time for export queries is limited to 11 hours and 50 minutes. This feature uses [forward access sessions](https://docs.aws.amazon.com//IAM/latest/UserGuide/access_forward_access_sessions.html), and the limit avoids token expiry issues. 
**Note**  
 The export query still honors the query timeout. For large exports, you should use an appropriate query timeout. 
+  All new object uploads to Amazon S3 are automatically encrypted. 
+  To avoid storage costs from incomplete multipart uploads in the event of errors or crashes, we recommend setting up a lifecycle rule with `Delete incomplete multipart uploads` on your Amazon S3 bucket. 
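As a sketch, such a lifecycle rule can be expressed in the Amazon S3 lifecycle configuration format roughly like this (the rule ID and the 7-day value are illustrative):

```
{
  "Rules": [
    {
      "ID": "abort-incomplete-neptune-exports",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
```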

## Response format
<a name="exporting-gremlin-response"></a>

 Rather than returning the query results directly, the query returns metadata about the export operation, including status and export details. The query results in Amazon S3 will be in [GraphSONv3](https://tinkerpop.apache.org/docs/3.7.3/dev/io/#graphson-3d0) format. 

```
{
  "data": {
    "@type": "g:List",
    "@value": [
      {
        "@type": "g:Map",
        "@value": [
          "browserUsed",
          {
            "@type": "g:List",
            "@value": [
              "Safari"
            ]
          },
          "length",
          {
            "@type": "g:List",
            "@value": [
              {
                "@type": "g:Int32",
                "@value": 7
              }
            ]
          },
          "locationIP",
          {
            "@type": "g:List",
            "@value": [
              "192.0.2.0/24"
            ]
          },
          "creationDate",
          {
            "@type": "g:List",
            "@value": [
              {
                "@type": "g:Date",
                "@value": 1348341961000
              }
            ]
          },
          "content",
          {
            "@type": "g:List",
            "@value": [
              "no way!"
            ]
          }
        ]
      },
      {
        "@type": "g:Map",
        "@value": [
          "browserUsed",
          {
            "@type": "g:List",
            "@value": [
              "Firefox"
            ]
          },
          "length",
          {
            "@type": "g:List",
            "@value": [
              {
                "@type": "g:Int32",
                "@value": 2
              }
            ]
          },
          "locationIP",
          {
            "@type": "g:List",
            "@value": [
              "203.0.113.0/24"
            ]
          },
          "creationDate",
          {
            "@type": "g:List",
            "@value": [
              {
                "@type": "g:Date",
                "@value": 1348352960000
              }
            ]
          },
          "content",
          {
            "@type": "g:List",
            "@value": [
              "ok"
            ]
          }
        ]
      },
      
      
      ...
      
      
    ]
  }
}
```

**Security**
+  All data transferred to Amazon S3 is encrypted in transit using SSL. 
+  You can specify an AWS KMS key for server-side encryption of the exported data. Amazon S3 encrypts new data by default. If the bucket is configured to use a specific AWS KMS key, then that key is used. 
+  Neptune verifies that the target bucket is not public before starting the export. 
+  Cross-account and cross-region exports are not supported. 

**Error handling**

The export fails with an error in cases such as the following:
+  The target Amazon S3 bucket is public. 
+  The specified object already exists. 
+  You don't have sufficient permissions to write to the Amazon S3 bucket. 
+  The query execution exceeds the maximum time limit. 

**Best practices**
+  Use Amazon S3 bucket lifecycle rules to clean up incomplete multipart uploads. 
+  Monitor your export operations using Neptune logs and metrics. You can check the [Gremlin status endpoint](https://docs.aws.amazon.com//neptune/latest/userguide/gremlin-api-status.html) to see whether a query is currently running. As long as the client has not received a response, the query is assumed to be running. 
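
The first practice above can be sketched as a bucket lifecycle configuration. The rule ID and the seven-day window in the following example are illustrative choices, not requirements; you can apply the configuration to your bucket with the `aws s3api put-bucket-lifecycle-configuration` command:

```
{
  "Rules": [
    {
      "ID": "AbortIncompleteNeptuneExportUploads",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
```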

# Granting access for Gremlin Amazon S3 export feature
<a name="granting-access-gremlin"></a>

 **Required IAM policies** 

1.  **Neptune query read access** 

   ```
   {
     "Sid": "NeptuneQueryRead",
     "Effect": "Allow",
     "Action": ["neptune-db:Read*"],
     "Resource": "arn:aws:neptune-db:us-east-1:123456789012:cluster-ABCD12/*"
   }
   ```

    **Why it's needed:** This permission allows reading data from the Neptune database, which is necessary to run the Gremlin queries whose results are exported. The preceding example grants read-only access; a query that also writes or deletes data additionally requires write and delete permissions. 

1.  **Amazon S3 export permissions** 

   ```
   {
     "Sid": "NeptuneS3Export",
     "Effect": "Allow",
     "Action": [
       "s3:ListBucket",
       "s3:PutObject",
       "s3:AbortMultipartUpload",
       "s3:GetBucketPublicAccessBlock"
     ],
     "Resource": [
       "arn:aws:s3:::neptune-export-bucket",
       "arn:aws:s3:::neptune-export-bucket/*"
     ]
   }
   ```

    **Why each permission is needed:** 
   +  `s3:ListBucket`: Required to verify that the bucket exists and to list its contents. This bucket-level action matches the bucket ARN itself, not the object ARN. 
   +  `s3:PutObject`: Required to write the exported data to Amazon S3. 
   +  `s3:AbortMultipartUpload`: Required to clean up incomplete multipart uploads if the export fails. 
   +  `s3:GetBucketPublicAccessBlock`: Required as a security measure to verify that the bucket is not public before exporting data. 

1.  **AWS KMS permissions** - optional. Only required if you use a custom AWS KMS key for encryption. 

   ```
   {
     "Sid": "NeptuneS3ExportKMS",
     "Effect": "Allow",
     "Action": [
       "kms:Decrypt",
       "kms:GenerateDataKey",
       "kms:DescribeKey"
     ],
     "Resource": "arn:aws:kms:<REGION>:<AWS_ACCOUNT_ID>:key/mrk-48971c37",
     "Condition": {
       "StringEquals": {
         "kms:ViaService": [
           "s3.<REGION>.amazonaws.com",
           "rds.<REGION>.amazonaws.com"
         ]
       }
     }
   }
   ```

    **Why each permission is needed:** 
   +  `kms:Decrypt`: Required to decrypt the data keys used to encrypt the exported data. 
   +  `kms:GenerateDataKey`: Required to generate data keys for encrypting the exported data. 
   +  `kms:DescribeKey`: Required to verify and retrieve information about the AWS KMS key. 
   +  `kms:ViaService`: Increases security by restricting this role's use of the key to requests made through the listed services, so the key can't be used with any other AWS service. 

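Taken together, the statements above form a single policy document. The following sketch combines them, keeping the same placeholder Region, account ID, cluster resource ID, bucket name, and key ID used in the examples; it also lists the bucket ARN itself alongside the object ARN, because bucket-level actions such as `s3:ListBucket` match the bucket ARN rather than object ARNs:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "NeptuneQueryRead",
      "Effect": "Allow",
      "Action": ["neptune-db:Read*"],
      "Resource": "arn:aws:neptune-db:us-east-1:123456789012:cluster-ABCD12/*"
    },
    {
      "Sid": "NeptuneS3Export",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:GetBucketPublicAccessBlock"
      ],
      "Resource": [
        "arn:aws:s3:::neptune-export-bucket",
        "arn:aws:s3:::neptune-export-bucket/*"
      ]
    },
    {
      "Sid": "NeptuneS3ExportKMS",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:<REGION>:<AWS_ACCOUNT_ID>:key/mrk-48971c37",
      "Condition": {
        "StringEquals": {
          "kms:ViaService": [
            "s3.<REGION>.amazonaws.com",
            "rds.<REGION>.amazonaws.com"
          ]
        }
      }
    }
  ]
}
```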
**Important prerequisites**
+  **IAM authentication:** Must be enabled on the Neptune cluster to enforce these permissions. 
+  **VPC endpoint:** 
  +  A Gateway-type VPC endpoint for Amazon S3 is required to allow Neptune to communicate with Amazon S3. 
  +  To use custom AWS KMS encryption in the query, an Interface-type VPC endpoint for AWS KMS is required to allow Neptune to communicate with AWS KMS. 
+  **Amazon S3 bucket configuration:** 
  +  Must not be public. 
  +  Should have a lifecycle rule to clean up incomplete multipart uploads. 
  +  Will automatically encrypt new objects. 

 These permissions and prerequisites ensure secure and reliable export of Gremlin query results while maintaining proper access controls and data protection measures. 