

# Accessing a Neptune graph with Gremlin
<a name="access-graph-gremlin"></a>

Amazon Neptune is compatible with Apache TinkerPop and Gremlin. This means that you can connect to a Neptune DB instance and use the Gremlin traversal language to query the graph (see [The Graph](https://tinkerpop.apache.org/docs/current/reference/#graph) in the Apache TinkerPop documentation). For differences in the Neptune implementation of Gremlin, see [Gremlin standards compliance](access-graph-gremlin-differences.md).

A *traversal* in Gremlin is a series of chained steps. It starts at a vertex (or edge) and walks the graph by following the outgoing edges of each vertex it reaches. Each step is an operation in the traversal. For more information, see [The Traversal](https://tinkerpop.apache.org/docs/current/reference/#traversal) in the TinkerPop documentation.
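
The chained-step idea can be sketched in a few lines of plain Python (a toy adjacency list, not Gremlin or Neptune code; the graph and vertex names here are invented for illustration):

```python
# Toy graph: each vertex maps to the vertices its outgoing edges point at.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

def out(vertices):
    """One out() step: follow every outgoing edge of each input vertex."""
    return [neighbor for v in vertices for neighbor in graph[v]]

# Equivalent in spirit to the chained traversal g.V("a").out().out()
step1 = out(["a"])    # follow edges from "a"
step2 = out(step1)    # then follow edges from those vertices
print(step2)          # one arrival per distinct path
```

Each step consumes the output of the previous one, which is exactly how chained Gremlin steps compose.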

Different Neptune engine versions support different Gremlin versions. Check the [engine release page](engine-releases.md) of the Neptune version you are running to determine which Gremlin release it supports, or consult the following table, which lists the earliest and latest TinkerPop versions supported by each Neptune engine version:


| Neptune Engine Version | Minimum TinkerPop Version | Maximum TinkerPop Version | 
| --- | --- | --- | 
| `1.3.2.0 and newer` | `3.7.1` | `3.7.3` | 
| `1.3.1.0` | `3.6.2` | `3.6.5` | 
| `1.3.0.0` | `3.6.2` | `3.6.4` | 
| `1.2.1.0` to `1.2.1.2` | `3.6.2` | `3.6.2` | 
| `1.1.1.0` to `1.2.0.2` | `3.5.5` | `3.5.6` | 
| `1.1.0.0 and older` | `(deprecated)` | `(deprecated)` | 

TinkerPop clients are usually backward compatible within a series (for example, `3.6.x` or `3.7.x`). Although clients can often work across series boundaries, the table above recommends the version combinations to use for the best compatibility. Unless otherwise advised, it is generally best to adhere to these guidelines and upgrade client applications to match the version of TinkerPop your engine supports.

When upgrading TinkerPop versions, always refer to [TinkerPop's upgrade documentation](http://tinkerpop.apache.org/docs/current/upgrade/), which identifies new features you can take advantage of, as well as issues to be aware of as you plan your upgrade. You should typically expect existing queries and features to work after an upgrade unless a particular issue is called out. Finally, note that if the version you upgrade to includes a feature introduced later than the version Neptune supports, you can't use that feature.
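
As a rough sketch of how the table above can be applied, the following hypothetical Python helper (not part of any AWS SDK; the table keys are shorthand for the rows above) checks whether a client's TinkerPop version falls in the supported range for an engine version:

```python
# A few rows from the support table above, as (minimum, maximum) pairs.
SUPPORT = {
    "1.3.2.0+": ("3.7.1", "3.7.3"),
    "1.3.1.0":  ("3.6.2", "3.6.5"),
    "1.3.0.0":  ("3.6.2", "3.6.4"),
}

def parse(version):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def tinkerpop_supported(engine, client):
    """True if the client TinkerPop version is within the engine's range."""
    lo, hi = SUPPORT[engine]
    return parse(lo) <= parse(client) <= parse(hi)

print(tinkerpop_supported("1.3.1.0", "3.6.4"))
```

Tuple comparison handles multi-digit components correctly (for example, `3.7.10` sorts after `3.7.2`), which naive string comparison would not.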

There are Gremlin language variants and support for Gremlin access in various programming languages. For more information, see [On Gremlin Language Variants](https://tinkerpop.apache.org/docs/current/reference/#gremlin-drivers-variants) in the TinkerPop documentation.

This documentation describes how to access Neptune with the following variants and programming languages:
+ [Set up the Gremlin console to connect to a Neptune DB instance](access-graph-gremlin-console.md)
+ [Using the HTTPS REST endpoint to connect to a Neptune DB instance](access-graph-gremlin-rest.md)
+ [Java-based Gremlin clients to use with Amazon Neptune](access-graph-gremlin-client.md)
+ [Using Python to connect to a Neptune DB instance](access-graph-gremlin-python.md)
+ [Using .NET to connect to a Neptune DB instance](access-graph-gremlin-dotnet.md)
+ [Using Node.js to connect to a Neptune DB instance](access-graph-gremlin-node-js.md)
+ [Using Go to connect to a Neptune DB instance](access-graph-gremlin-go.md)

As discussed in [Encrypting connections to your Amazon Neptune database with SSL/HTTPS](security-ssl.md), you must use Transport Layer Security/Secure Sockets Layer (TLS/SSL) when connecting to Neptune in all AWS Regions.
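
Because TLS is mandatory, any client you write should verify the server certificate rather than disable verification. In Python, for example, the standard library's default SSL context already enforces this (a general illustration, not Neptune-specific code):

```python
import ssl

# The default context verifies the server certificate chain and the
# hostname, which is what you want when connecting to a Neptune
# endpoint over TLS.
ctx = ssl.create_default_context()

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # certificate verification is on
print(ctx.check_hostname)                    # hostname checking is on
```

Most Gremlin drivers expose an equivalent "enable SSL" switch; avoid any option that turns certificate verification off.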

Before you begin, you must have the following:
+ A Neptune DB instance. For information about creating a Neptune DB instance, see [Creating an Amazon Neptune cluster](get-started-create-cluster.md).
+ An Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

For more information about loading data into Neptune, including prerequisites, loading formats, and load parameters, see [Loading data into Amazon Neptune](load-data.md).

**Topics**
+ [Set up the Gremlin console to connect to a Neptune DB instance](access-graph-gremlin-console.md)
+ [Using the HTTPS REST endpoint to connect to a Neptune DB instance](access-graph-gremlin-rest.md)
+ [Java-based Gremlin clients to use with Amazon Neptune](access-graph-gremlin-client.md)
+ [Using Python to connect to a Neptune DB instance](access-graph-gremlin-python.md)
+ [Using .NET to connect to a Neptune DB instance](access-graph-gremlin-dotnet.md)
+ [Using Node.js to connect to a Neptune DB instance](access-graph-gremlin-node-js.md)
+ [Using Go to connect to a Neptune DB instance](access-graph-gremlin-go.md)
+ [Using the AWS SDK to run Gremlin queries](access-graph-gremlin-sdk.md)
+ [Gremlin query hints](gremlin-query-hints.md)
+ [Gremlin query status API](gremlin-api-status.md)
+ [Gremlin query cancellation](gremlin-api-status-cancel.md)
+ [Support for Gremlin script-based sessions](access-graph-gremlin-sessions.md)
+ [Gremlin transactions in Neptune](access-graph-gremlin-transactions.md)
+ [Streaming query results with Gremlin](access-graph-gremlin-streaming.md)
+ [Using the Gremlin API with Amazon Neptune](gremlin-api-reference.md)
+ [Caching query results in Amazon Neptune Gremlin](gremlin-results-cache.md)
+ [Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps](gremlin-efficient-upserts.md)
+ [Making efficient Gremlin upserts with `fold()/coalesce()/unfold()`](gremlin-efficient-upserts-pre-3.6.md)
+ [Analyzing Neptune query execution using Gremlin `explain`](gremlin-explain.md)
+ [Using Gremlin with the Neptune DFE query engine](gremlin-with-dfe.md)

# Set up the Gremlin console to connect to a Neptune DB instance
<a name="access-graph-gremlin-console"></a>

The Gremlin Console allows you to experiment with TinkerPop graphs and queries in a REPL (read-eval-print loop) environment.

## Installing the Gremlin console and connecting to it in the usual way
<a name="access-graph-gremlin-console-usual-connect"></a>

You can use the Gremlin Console to connect to a remote graph database. The following section walks you through installing and configuring the Gremlin Console to connect remotely to a Neptune DB instance. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

For help connecting to Neptune with SSL/TLS (which is required), see [SSL/TLS configuration](access-graph-gremlin-java.md#access-graph-gremlin-java-ssl).

**Note**  
If you have [IAM authentication enabled](iam-auth-enable.md) on your Neptune DB cluster, follow the instructions in [Connecting to Amazon Neptune databases using IAM authentication with Gremlin console](iam-auth-connecting-gremlin-console.md) to install the Gremlin console rather than the instructions here.

**To install the Gremlin Console and connect to Neptune**

1. The Gremlin Console binaries require Java 8 or Java 11. These instructions assume usage of Java 11. You can install Java 11 on your EC2 instance as follows:
   + If you're using [Amazon Linux 2 (AL2)](https://aws.amazon.com/amazon-linux-2):

     ```
     sudo amazon-linux-extras install java-openjdk11
     ```
   + If you're using [Amazon Linux 2023 (AL2023)](https://docs.aws.amazon.com/linux/al2023/ug/what-is-amazon-linux.html):

     ```
     sudo yum install java-11-amazon-corretto-devel
     ```
   + For other distributions, use whichever of the following is appropriate:

     ```
     sudo yum install java-11-openjdk-devel
     ```

     or:

     ```
     sudo apt-get install openjdk-11-jdk
     ```

1. Enter the following to set Java 11 as the default runtime on your EC2 instance.

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 11.

1. Download the appropriate version of the Gremlin Console from the Apache website. Check [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to determine which Gremlin version your version of Neptune supports. For example, if you need version 3.7.2, you can download the [Gremlin console](https://archive.apache.org/dist/tinkerpop/3.7.2/apache-tinkerpop-gremlin-console-3.7.2-bin.zip) from the [Apache TinkerPop](https://tinkerpop.apache.org/download.html) website onto your EC2 instance like this:

   ```
   wget https://archive.apache.org/dist/tinkerpop/3.7.2/apache-tinkerpop-gremlin-console-3.7.2-bin.zip
   ```

1. Unzip the Gremlin Console zip file.

   ```
   unzip apache-tinkerpop-gremlin-console-3.7.2-bin.zip
   ```

1. Change directories into the unzipped directory.

   ```
   cd apache-tinkerpop-gremlin-console-3.7.2
   ```

1. In the `conf` subdirectory of the extracted directory, create a file named `neptune-remote.yaml` with the following text. Replace *your-neptune-endpoint* with the hostname or IP address of your Neptune DB instance. The square brackets (`[ ]`) are required.
**Note**  
For information about finding the hostname of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   hosts: [your-neptune-endpoint]
   port: 8182
   connectionPool: { enableSsl: true }
   serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1,
                 config: { serializeResultToString: true }}
   ```
**Note**  
Serializers were moved from the `gremlin-driver` module to the new `gremlin-util` module in TinkerPop version 3.7.0. The package changed from `org.apache.tinkerpop.gremlin.driver.ser` to `org.apache.tinkerpop.gremlin.util.ser`.

1. In a terminal, navigate to the Gremlin Console directory (`apache-tinkerpop-gremlin-console-3.7.2`), and then enter the following command to run the Gremlin Console.

   ```
   bin/gremlin.sh
   ```

   You should see the following output:

   ```
            \,,,/
            (o o)
   -----oOOo-(3)-oOOo-----
   plugin activated: tinkerpop.server
   plugin activated: tinkerpop.utilities
   plugin activated: tinkerpop.tinkergraph
   gremlin>
   ```

   You are now at the `gremlin>` prompt. You will enter the remaining steps at this prompt.

1. At the `gremlin>` prompt, enter the following to connect to the Neptune DB instance.

   ```
   :remote connect tinkerpop.server conf/neptune-remote.yaml
   ```

1. At the `gremlin>` prompt, enter the following to switch to remote mode. This sends all Gremlin queries to the remote connection.

   ```
   :remote console
   ```

1. Enter the following to send a query to the Gremlin Graph.

   ```
   g.V().limit(1)
   ```

1. When you are finished, enter the following to exit the Gremlin Console.

   ```
   :exit
   ```

**Note**  
Use a semicolon (`;`) or a newline character (`\n`) to separate each statement.   
Each traversal preceding the final traversal must end in `next()` to be executed. Only the data from the final traversal is returned.

For more information on the Neptune implementation of Gremlin, see [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md).

# An alternate way to connect to the Gremlin console
<a name="access-graph-gremlin-console-connect"></a>

**Drawbacks of the normal connection approach**

The most common way to connect to the Gremlin console is the one explained above, using commands like this at the `gremlin>` prompt:

```
gremlin> :remote connect tinkerpop.server conf/(file name).yaml
gremlin> :remote console
```

This works well, and lets you send queries to Neptune. However, it takes the Groovy script engine out of the loop, so Neptune treats all queries as pure Gremlin. This means that the following query forms fail:

```
gremlin> 1 + 1
gremlin> x = g.V().count()
```

The closest you can get to using a variable when connected this way is to use the `result` variable maintained by the console and send the query using `:>`, like this:

```
gremlin> :remote console
==>All scripts will now be evaluated locally - type ':remote console' to return to remote mode for Gremlin Server - [krl-1-cluster.cluster-ro-cm9t6tfwbtsr.us-east-1.neptune.amazonaws.com/172.31.19.217:8182]
gremlin> :> g.V().count()
==>4249

gremlin> println(result)
[result{object=4249 class=java.lang.Long}]

gremlin> println(result['object'])
[4249]
```

 

**A different way to connect**

You can also connect to the Gremlin console in a different way, which you may find nicer, like this:

```
gremlin> g = traversal().withRemote('conf/neptune.properties')
```

Here `neptune.properties` takes this form:

```
gremlin.remote.remoteConnectionClass=org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
gremlin.remote.driver.clusterFile=conf/my-cluster.yaml
gremlin.remote.driver.sourceName=g
```

The `my-cluster.yaml` file should look like this:

```
hosts: [my-cluster-abcdefghijk.us-east-1.neptune.amazonaws.com]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1,
              config: { serializeResultToString: false } }
connectionPool: { enableSsl: true }
```

**Note**  
Serializers were moved from the `gremlin-driver` module to the new `gremlin-util` module in TinkerPop version 3.7.0. The package changed from `org.apache.tinkerpop.gremlin.driver.ser` to `org.apache.tinkerpop.gremlin.util.ser`.

Configuring the Gremlin console connection like that lets you make the following kinds of queries successfully:

```
gremlin> 1+1
==>2

gremlin> x=g.V().count().next()
==>4249

gremlin> println("The answer was ${x}")
The answer was 4249
```

You can avoid displaying the result, like this:

```
gremlin> x=g.V().count().next();[]
gremlin> println(x)
4249
```

All the usual ways of querying (without the terminal step) continue to work. For example:

```
gremlin> g.V().count()
==>4249
```

You can even use the [io() step](https://tinkerpop.apache.org/docs/current/reference/#io-step) to load a file with this kind of connection.

## IAM authentication
<a name="access-graph-gremlin-console-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from the Gremlin console, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin console](iam-auth-connecting-gremlin-console.md).
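
Signature Version 4 signing is covered in detail in the linked pages. The key-derivation portion of the scheme can be sketched with the Python standard library as follows (the `neptune-db` service name comes from Neptune's IAM documentation; the credentials and date below are placeholders, and this is an illustration of the derivation chain, not a complete signer):

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key, date_stamp, region, service="neptune-db"):
    """Derive a Signature Version 4 signing key (standard AWS derivation)."""
    def sign(key, msg):
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)  # e.g. "20240101"
    k_region = sign(k_date, region)                                   # e.g. "us-east-1"
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")

key = sigv4_signing_key("EXAMPLE-SECRET-KEY", "20240101", "us-east-1")
print(key.hex())
```

A full SigV4 signer also builds a canonical request and a string to sign; for Gremlin console use, the linked IAM instructions handle all of this for you.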

# Using the HTTPS REST endpoint to connect to a Neptune DB instance
<a name="access-graph-gremlin-rest"></a>

Amazon Neptune provides an HTTPS endpoint for Gremlin queries. The REST interface is compatible with whatever Gremlin version your DB cluster is using (see the [engine release page](engine-releases.md) of the Neptune engine version you are running to determine which Gremlin release it supports).

**Note**  
As discussed in [Encrypting connections to your Amazon Neptune database with SSL/HTTPS](security-ssl.md), Neptune now requires that you connect using HTTPS instead of HTTP. In addition, Neptune does not currently support HTTP/2 for REST API requests. Clients must use HTTP/1.1 when connecting to endpoints.

The following instructions walk you through connecting to the Gremlin endpoint using the `curl` command and HTTPS. You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

The HTTPS endpoint for Gremlin queries to a Neptune DB instance is `https://your-neptune-endpoint:port/gremlin`.

**Note**  
For information about finding the hostname of your Neptune DB instance, see [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md).

## To connect to Neptune using the HTTP REST endpoint
<a name="access-graph-gremlin-rest-connect"></a>

The following examples show how to submit a Gremlin query to the REST endpoint. You can use the AWS SDK, the AWS CLI, or **curl**.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.V().limit(1)"
```

For more information, see [execute-gremlin-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
import json
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_query(
    gremlinQuery='g.V().limit(1)',
    serializer='application/vnd.gremlin-v3.0+json;types=false'
)

print(json.dumps(response['result'], indent=2))
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().limit(1)"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

The following example uses **curl** to submit a Gremlin query through HTTP **POST**. The query is submitted in JSON format in the body of the post as the `gremlin` property.

```
curl -X POST -d '{"gremlin":"g.V().limit(1)"}' https://your-neptune-endpoint:port/gremlin
```

Although HTTP **POST** requests are recommended for sending Gremlin queries, it is also possible to use HTTP **GET** requests:

```
curl -G "https://your-neptune-endpoint:port/gremlin?gremlin=g.V().count()"
```

------

These examples return the first vertex in the graph by using the `g.V().limit(1)` traversal. You can query for something else by replacing it with another Gremlin traversal.
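
Under the hood, all of these tabs send the same JSON payload to the `/gremlin` endpoint. A minimal Python standard-library sketch (the endpoint is a placeholder, and the request is built but not sent here):

```python
import json
from urllib import request

# Placeholder; replace with your cluster endpoint and port.
endpoint = "https://your-neptune-endpoint:8182/gremlin"

# The request body is a JSON object whose "gremlin" property holds the query.
body = json.dumps({"gremlin": "g.V().limit(1)"}).encode("utf-8")

req = request.Request(
    endpoint,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would submit the query from inside the VPC.
print(body.decode("utf-8"))
```

If IAM authentication is enabled on your cluster, this request would additionally need SigV4 signing headers, as the **awscurl** tab shows.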

**Important**  
By default, the REST endpoint returns all results in a single JSON result set. If this result set is too large, an `OutOfMemoryError` exception can occur on the Neptune DB instance.  
You can avoid this by enabling chunked responses (results returned in a series of separate responses). See [Use optional HTTP trailing headers to enable multi-part Gremlin responses](access-graph-gremlin-rest-trailing-headers.md).

**Note**  
Neptune does not support the `bindings` property.

# Use optional HTTP trailing headers to enable multi-part Gremlin responses
<a name="access-graph-gremlin-rest-trailing-headers"></a>

By default, the HTTP response to Gremlin queries is returned in a single JSON result set. In the case of a very large result set, this can cause an `OutOfMemoryError` exception on the DB instance.

However, you can enable *chunked* responses (responses that are returned in multiple separate parts). You do this by including a transfer-encoding (TE) trailers header (`te: trailers`) in your request. See [the MDN page about TE request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/TE) for more information about TE headers.

When a response is returned in multiple parts, it can be hard to diagnose a problem that occurs after the first part is received, since the first part arrives with an HTTP status code of `200` (OK). A subsequent failure usually results in a message body containing a corrupt response, at the end of which Neptune appends an error message.

To make detection and diagnosis of this kind of failure easier, Neptune also includes two new header fields within the trailing headers of every response chunk:
+ `X-Neptune-Status`  –   contains the response code followed by a short name. For instance, on success the trailing header is `X-Neptune-Status: 200 OK`. On failure, the response code is one of the [Neptune engine error codes](errors-engine-codes.md), such as `X-Neptune-Status: 500 TimeLimitExceededException`.
+ `X-Neptune-Detail`  –   is empty for successful requests. For errors, it contains the JSON error message. Because only ASCII characters are allowed in HTTP header values, the JSON string is URL-encoded.
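
A client consuming chunked responses can inspect these trailers after each chunk. The following Python sketch decodes sample trailer values (the values shown are illustrative, and the `detailedMessage` field name is an assumption about the shape of the error JSON):

```python
import json
from urllib.parse import unquote

# Illustrative trailer values, not actual Neptune output.
trailers = {
    "X-Neptune-Status": "500 TimeLimitExceededException",
    "X-Neptune-Detail": "%7B%22detailedMessage%22%3A%22A%20timeout%20occurred%22%7D",
}

# Split "500 TimeLimitExceededException" into code and short name.
code, _, name = trailers["X-Neptune-Status"].partition(" ")
ok = code == "200"

detail = {}
if not ok and trailers["X-Neptune-Detail"]:
    # The JSON error message is URL-encoded because header values are ASCII-only.
    detail = json.loads(unquote(trailers["X-Neptune-Detail"]))

print(name, detail)
```

Checking the trailers of every chunk, not just the first response status, is what lets you detect a failure that occurs after the initial `200 OK`.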

**Note**  
Neptune does not currently support `gzip` compression of chunked responses. If the client requests both chunked encoding and compression at the same time, Neptune skips the compression.

# Java-based Gremlin clients to use with Amazon Neptune
<a name="access-graph-gremlin-client"></a>

You can use either of two open-source Java-based Gremlin clients with Amazon Neptune: the [Apache TinkerPop Java Gremlin client](https://search.maven.org/artifact/org.apache.tinkerpop/gremlin-driver), or the [Gremlin client for Amazon Neptune](https://search.maven.org/artifact/software.amazon.neptune/gremlin-client).

## Apache TinkerPop Java Gremlin client
<a name="access-graph-gremlin-java-driver"></a>

The Apache TinkerPop Java [gremlin-driver](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java) is the standard, official Gremlin client that works with any TinkerPop-enabled graph database. Use this client when you need maximum compatibility with the broader TinkerPop development space, when you're working with multiple graph database systems, or when you don't require the advanced cluster management and load balancing features specific to Neptune. This client is also suitable for simple applications that connect to a single Neptune instance or when you prefer to handle load balancing at the infrastructure level rather than within the client.

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

**Note**  
The table that helps you determine the correct Apache TinkerPop version to use with Neptune has moved to [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md). The table was previously located on this page, and is now centralized for reference across all the programming languages that TinkerPop supports.

## Gremlin Java client for Amazon Neptune
<a name="access-graph-neptune-gremlin-client"></a>

The Gremlin client for Amazon Neptune is an [open-source Java-based Gremlin client](https://github.com/aws/neptune-gremlin-client) that acts as a drop-in replacement for the standard TinkerPop Java client.

The Neptune Gremlin client is optimized for Neptune clusters. It lets you manage traffic distribution across multiple instances in a cluster, and adapts to changes in cluster topology when you add or remove a replica. You can even configure the client to distribute requests across a subset of instances in your cluster, based on role, instance type, availability zone (AZ), or tags associated with instances.

The [latest version of the Neptune Gremlin Java client](https://search.maven.org/artifact/software.amazon.neptune/gremlin-client) is available on Maven Central.

For more information about the Neptune Gremlin Java client, see [this blog post](https://aws.amazon.com/blogs/database/load-balance-graph-queries-using-the-amazon-neptune-gremlin-client/). For code samples and demos, check out the [client's GitHub project](https://github.com/aws/neptune-gremlin-client).

When choosing the version of the Neptune Gremlin client, you need to consider the underlying TinkerPop version in relation to your Neptune engine version. Refer to the compatibility table at [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md) to determine the correct TinkerPop version for your Neptune engine, then use the following table to select the appropriate Neptune Gremlin client version:


**Neptune Gremlin client version compatibility**  

| Neptune Gremlin client version | TinkerPop version | 
| --- | --- | 
| 3.x | 3.7.x (AWS SDK for Java 2.x/1.x) | 
| 2.1.x | 3.7.x (AWS SDK for Java 1.x) | 
| 2.0.x | 3.6.x | 
| 1.12 | 3.5.x | 

# Using a Java client to connect to a Neptune DB instance
<a name="access-graph-gremlin-java"></a>

The following section walks you through running a complete Java sample that connects to a Neptune DB instance and performs a Gremlin traversal using the Apache TinkerPop Gremlin client.

These instructions must be followed from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

**To connect to Neptune using Java**

1. Install Apache Maven on your EC2 instance. If using Amazon Linux 2023 (preferred), use:

   ```
   sudo dnf update -y
   sudo dnf install maven -y
   ```

   If using Amazon Linux 2, download the latest binary from [https://maven.apache.org/download.cgi](https://maven.apache.org/download.cgi):

   ```
   sudo yum remove maven -y
   wget https://dlcdn.apache.org/maven/maven-3/<version>/binaries/apache-maven-<version>-bin.tar.gz
   sudo tar -xzf apache-maven-<version>-bin.tar.gz -C /opt/
   sudo ln -sf /opt/apache-maven-<version> /opt/maven
   echo 'export MAVEN_HOME=/opt/maven' >> ~/.bashrc
   echo 'export PATH=$MAVEN_HOME/bin:$PATH' >> ~/.bashrc
   source ~/.bashrc
   ```

1. **Install Java.** The Gremlin libraries need Java 8 or 11. You can install Java 11 as follows:
   + If you're using [Amazon Linux 2 (AL2)](https://aws.amazon.com/amazon-linux-2):

     ```
     sudo amazon-linux-extras install java-openjdk11
     ```
   + If you're using [Amazon Linux 2023 (AL2023)](https://docs.aws.amazon.com/linux/al2023/ug/what-is-amazon-linux.html):

     ```
     sudo yum install java-11-amazon-corretto-devel
     ```
   + For other distributions, use whichever of the following is appropriate:

     ```
     sudo yum install java-11-openjdk-devel
     ```

     or:

     ```
     sudo apt-get install openjdk-11-jdk
     ```

1. **Set Java 11 as the default runtime on your EC2 instance:** Enter the following to set Java 11 as the default runtime:

   ```
   sudo /usr/sbin/alternatives --config java
   ```

   When prompted, enter the number for Java 11.

1. **Create a new directory named `gremlinjava`:**

   ```
   mkdir gremlinjava
   cd gremlinjava
   ```

1.  In the `gremlinjava` directory, create a `pom.xml` file, and then open it in a text editor:

   ```
   nano pom.xml
   ```

1. Copy the following into the `pom.xml` file and save it:

   ```
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
     <properties>
       <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     </properties>
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.amazonaws</groupId>
     <artifactId>GremlinExample</artifactId>
     <packaging>jar</packaging>
     <version>1.0-SNAPSHOT</version>
     <name>GremlinExample</name>
     <url>https://maven.apache.org</url>
     <dependencies>
       <dependency>
         <groupId>org.apache.tinkerpop</groupId>
         <artifactId>gremlin-driver</artifactId>
         <version>3.7.2</version>
       </dependency>
       <dependency>
         <groupId>org.slf4j</groupId>
         <artifactId>slf4j-jdk14</artifactId>
         <version>1.7.25</version>
       </dependency>
     </dependencies>
     <build>
       <plugins>
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-compiler-plugin</artifactId>
           <version>2.5.1</version>
           <configuration>
             <source>11</source>
             <target>11</target>
           </configuration>
         </plugin>
           <plugin>
             <groupId>org.codehaus.mojo</groupId>
             <artifactId>exec-maven-plugin</artifactId>
             <version>1.3</version>
             <configuration>
               <executable>java</executable>
               <arguments>
                 <argument>-classpath</argument>
                 <classpath/>
                 <argument>com.amazonaws.App</argument>
               </arguments>
               <mainClass>com.amazonaws.App</mainClass>
                <complianceLevel>11</complianceLevel>
               <killAfter>-1</killAfter>
             </configuration>
           </plugin>
       </plugins>
     </build>
   </project>
   ```
**Note**  
If you are modifying an existing Maven project, the required dependency is the `gremlin-driver` artifact in the preceding code.

1. Create subdirectories for the example source code (`src/main/java/com/amazonaws/`) by typing the following at the command line:

   ```
   mkdir -p src/main/java/com/amazonaws/
   ```

1. In the `src/main/java/com/amazonaws/` directory, create a file named `App.java`, and then open it in a text editor.

   ```
   nano src/main/java/com/amazonaws/App.java
   ```

1. Copy the following into the `App.java` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance. Do *not* include the `https://` prefix in the `addContactPoint` method.
**Note**  
For information about finding the hostname of your Neptune DB instance, see [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md).

   ```
   package com.amazonaws;
   import org.apache.tinkerpop.gremlin.driver.Cluster;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
   import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
   import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
   import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
   import org.apache.tinkerpop.gremlin.structure.T;
   
   public class App
   {
     public static void main( String[] args )
     {
       Cluster.Builder builder = Cluster.build();
       builder.addContactPoint("your-neptune-endpoint");
       builder.port(8182);
       builder.enableSsl(true);
   
       Cluster cluster = builder.create();
   
       GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
   
       // Add a vertex.
       // Note that a Gremlin terminal step, e.g. iterate(), is required to make a request to the remote server.
       // The full list of Gremlin terminal steps is at https://tinkerpop.apache.org/docs/current/reference/#terminal-steps
       g.addV("Person").property("Name", "Justin").iterate();
   
       // Add a vertex with a user-supplied ID.
       g.addV("Custom Label").property(T.id, "CustomId1").property("name", "Custom id vertex 1").iterate();
       g.addV("Custom Label").property(T.id, "CustomId2").property("name", "Custom id vertex 2").iterate();
   
       g.addE("Edge Label").from(__.V("CustomId1")).to(__.V("CustomId2")).iterate();
   
       // This gets the vertices, only.
       GraphTraversal t = g.V().limit(3).elementMap();
   
    t.forEachRemaining(
      e -> System.out.println(e)
    );
   
       cluster.close();
     }
   }
   ```

   For help connecting to Neptune with SSL/TLS (which is required), see [SSL/TLS configuration](#access-graph-gremlin-java-ssl).

1. Compile and run the sample using the following Maven command:

   ```
   mvn compile exec:exec
   ```

The preceding example returns a map of the keys and values of each property for the first three vertices in the graph by using the `g.V().limit(3).elementMap()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

**Note**  
A Gremlin traversal must end with a terminal step, such as `toList()` or `iterate()`, to be submitted to the server for evaluation. If you don't include a terminal step, the query is not submitted to the Neptune DB instance.  
You must also append a terminal step when you add a vertex or edge, such as after the `addV()` step.

The following methods submit the query to the Neptune DB instance:
+ `toList()`
+ `toSet()`
+ `next()`
+ `nextTraverser()`
+ `iterate()`
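The distinction between building a traversal and submitting it is similar to lazy evaluation in other languages. As a rough analogy only (plain Python, not Gremlin), a generator pipeline does no work until a terminal operation consumes it:

```python
# Rough analogy: like a Gremlin traversal, a Python generator pipeline
# is just a description of work until a terminal operation
# (list(), next(), a for-loop) consumes it.
executed = []

def step(x):
    executed.append(x)  # records when the step actually runs
    return x * 2

pipeline = (step(x) for x in range(3))  # nothing runs yet
assert executed == []                   # no steps have executed

results = list(pipeline)                # "terminal step": now it runs
assert results == [0, 2, 4]
assert executed == [0, 1, 2]
```

In the same way, a `GraphTraversal` that is never iterated is never sent to Neptune.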

## SSL/TLS configuration for Gremlin Java client
<a name="access-graph-gremlin-java-ssl"></a>

Neptune requires SSL/TLS to be enabled by default. Typically, if the Java driver is configured with `enableSsl(true)`, it can connect to Neptune without having to set up a `trustStore()` or `keyStore()` with a local copy of a certificate.

However, if the instance with which you are connecting doesn't have an internet connection through which to verify a public certificate, or if the certificate you're using isn't public, you can take the following steps to configure a local certificate copy:

**Setting up a local certificate copy to enable SSL/TLS**

1. Make sure that [keytool](https://docs.oracle.com/javase/9/tools/keytool.htm#JSWOR-GUID-5990A2E4-78E3-47B7-AE75-6D1826259549) is available. It is included with the Java Development Kit (JDK) and makes setting up the local key store much easier.

1. Download the `SFSRootCAG2.pem` CA certificate (the Gremlin Java SDK requires a certificate to verify the remote certificate):

   ```
   wget https://www.amazontrust.com/repository/SFSRootCAG2.pem
   ```

1. Create a key store in either JKS or PKCS12 format. This example uses JKS. Answer the questions that follow at the prompt. The password that you create here will be needed later:

   ```
   keytool -genkey -alias (host name) -keyalg RSA -keystore server.jks
   ```

1. Import the `SFSRootCAG2.pem` file that you downloaded into the newly created key store:

   ```
   keytool -import -keystore server.jks -file SFSRootCAG2.pem
   ```

1. Configure the `Cluster` object programmatically:

   ```
   Cluster cluster = Cluster.build("(your neptune endpoint)")
                            .port(8182)
                            .enableSsl(true)
                            .keyStore("server.jks")
                            .keyStorePassword("(the password from step 3)")
                            .create();

   You can do the same thing in a configuration file if you want, as you might do with the Gremlin console:

   ```
   hosts: [(your neptune endpoint)]
   port: 8182
   connectionPool: { enableSsl: true, keyStore: server.jks, keyStorePassword: (the password from step 3) }
   serializer: { className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1, config: { serializeResultToString: true }}
   ```

## IAM authentication
<a name="access-graph-gremlin-java-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Java client, see [Connecting to Amazon Neptune databases using IAM with Gremlin Java](iam-auth-connecting-gremlin-java.md).

# Java example of connecting to a Neptune DB instance with re-connect logic
<a name="access-graph-gremlin-java-reconnect-example"></a>

The following Java example demonstrates how to connect to Neptune with the Gremlin Java client, using reconnect logic to recover from an unexpected disconnect.

For detailed guidance on developing a practical retry strategy, including exponential backoff with jitter, see [Exception Handling and Retries](transactions-exceptions.md).

It has the following dependencies:

```
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>${gremlin.version}</version>
</dependency>

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>amazon-neptune-sigv4-signer</artifactId>
    <version>${sig4.signer.version}</version>
</dependency>

<dependency>
    <groupId>com.evanlennick</groupId>
    <artifactId>retry4j</artifactId>
    <version>0.15.0</version>
</dependency>
```

Here is the sample code:

**Important**  
 The `CallExecutor` from Retry4j may not be thread-safe. Consider having each thread use its own `CallExecutor` instance, or use a different retrying library. 

**Note**  
 The following example uses `requestInterceptor()`, which was added in TinkerPop 3.6.6. Prior to TinkerPop 3.6.6, the code example used `handshakeInterceptor()`, which was deprecated in that release. 

```
public static void main(String args[]) {
  boolean useIam = true;

  // Create Gremlin cluster and traversal source
  Cluster.Builder builder = Cluster.build()
         .addContactPoint(System.getenv("neptuneEndpoint"))
         .port(Integer.parseInt(System.getenv("neptunePort")))
         .enableSsl(true)
         .minConnectionPoolSize(1)
         .maxConnectionPoolSize(1)
         .serializer(Serializers.GRAPHBINARY_V1D0)
         .reconnectInterval(2000);

  if (useIam) {
      builder.requestInterceptor( r -> {
         try {
            NeptuneNettyHttpSigV4Signer sigV4Signer =
                        new NeptuneNettyHttpSigV4Signer("(your region)", new DefaultAWSCredentialsProviderChain());
            sigV4Signer.signRequest(r);
         } catch (NeptuneSigV4SignerException e) {
            throw new RuntimeException("Exception occurred while signing the request", e);
         }
         return r;
      });
   }

  Cluster cluster = builder.create();

  GraphTraversalSource g = AnonymousTraversalSource
      .traversal()
      .withRemote(DriverRemoteConnection.using(cluster));

  // Configure retries
  // NOTE: This example uses a fixed backoff for simplicity.
  // In a production application consider use of exponential backoff with jitter.
  RetryConfig retryConfig = new RetryConfigBuilder()
      .retryOnCustomExceptionLogic(getRetryLogic())
      .withDelayBetweenTries(1000, ChronoUnit.MILLIS)
      .withMaxNumberOfTries(5)
      .withFixedBackoff()
      .build();

  @SuppressWarnings("unchecked")
  CallExecutor<Object> retryExecutor = new CallExecutorBuilder<Object>()
      .config(retryConfig)
      .build();

  // Do lots of queries
  for (int i = 0; i < 100; i++){
    String id = String.valueOf(i);

    @SuppressWarnings("unchecked")
    Callable<Object> query = () -> g.mergeV(Map.of(T.id, id))
        .option(Merge.onCreate, Map.of(T.label, "Person"))
        .id().next();

    // Retry query
    // If there are connection failures, the Java Gremlin client will automatically
    // attempt to reconnect in the background, so all we have to do is wait and retry.
    Status<Object> status = retryExecutor.execute(query);

    System.out.println(status.getResult().toString());
  }

  cluster.close();
}

private static Function<Exception, Boolean> getRetryLogic() {

  return e -> {

    Class<? extends Exception> exceptionClass = e.getClass();

    // Capture the stack trace as a string so that nested exception
    // messages can be matched below.
    StringWriter stringWriter = new StringWriter();
    e.printStackTrace(new PrintWriter(stringWriter));
    String message = stringWriter.toString();

    if (RemoteConnectionException.class.isAssignableFrom(exceptionClass)){
      System.out.println("Retrying because RemoteConnectionException");
      return true;
    }

    // Check for connection issues
    if (message.contains("Timed out while waiting for an available host") ||
        message.contains("Timed-out") && message.contains("waiting for connection on Host") ||
        message.contains("Connection to server is no longer active") ||
        message.contains("Connection reset by peer") ||
        message.contains("SSLEngine closed already") ||
        message.contains("Pool is shutdown") ||
        message.contains("ExtendedClosedChannelException") ||
        message.contains("Broken pipe") ||
        message.contains(System.getenv("neptuneEndpoint")))
    {
      System.out.println("Retrying because connection issue");
      return true;
    }

    // Concurrent writes can sometimes trigger a ConcurrentModificationException.
    // In these circumstances you may want to backoff and retry.
    if (message.contains("ConcurrentModificationException")) {
      System.out.println("Retrying because ConcurrentModificationException");
      return true;
    }

    // If the primary fails over to a new instance, existing connections to the old primary will
    // throw a ReadOnlyViolationException. You may want to back off and retry.
    if (message.contains("ReadOnlyViolationException")) {
      System.out.println("Retrying because ReadOnlyViolationException");
      return true;
    }

    System.out.println("Not a retriable error");
    return false;
  };
}
```
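The Retry4j configuration above uses a fixed backoff to keep the example short. A production retry loop typically uses exponential backoff with jitter, as recommended in [Exception Handling and Retries](transactions-exceptions.md). The following is a minimal language-neutral sketch in Python of the two pieces the Java example combines: the delay calculation, and message-based classification of retriable errors (the marker strings mirror the ones checked in `getRetryLogic()` above):

```python
import random

def backoff_delay(attempt, base_ms=100, cap_ms=10_000):
    """Full-jitter exponential backoff: a random delay in
    [0, min(cap_ms, base_ms * 2**attempt)] milliseconds."""
    return random.uniform(0, min(cap_ms, base_ms * (2 ** attempt)))

# A subset of the retriable markers from the Java example above.
RETRIABLE_MARKERS = (
    "Timed out while waiting for an available host",
    "Connection to server is no longer active",
    "ConcurrentModificationException",
    "ReadOnlyViolationException",
)

def is_retriable(message):
    return any(marker in message for marker in RETRIABLE_MARKERS)

# Delays grow exponentially but never exceed the cap.
for attempt in range(8):
    delay = backoff_delay(attempt)
    assert 0 <= delay <= min(10_000, 100 * 2 ** attempt)

assert is_retriable("... ReadOnlyViolationException ...")
assert not is_retriable("MalformedQueryException")
```

The randomized ("jittered") delay spreads out retries from many clients so they don't all hit the cluster again at the same instant.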

# Using Python to connect to a Neptune DB instance
<a name="access-graph-gremlin-python"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section walks you through the running of a Python sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Download and install Python 3.6 or later from the [Python.org website](https://www.python.org/downloads/).
+ Verify that you have **pip** installed. If you don't have **pip** or you're not sure, see [Do I need to install pip?](https://pip.pypa.io/en/stable/installing/#do-i-need-to-install-pip) in the **pip** documentation.
+ Note that the `concurrent.futures` module that `gremlinpython` depends on is included in Python 3, so no separate `futures` installation is needed on Python 3.6 or later.



**To connect to Neptune using Python**

1. Enter the following to install the `gremlinpython` package:

   ```
   pip install --user gremlinpython
   ```

1. Create a file named `gremlinexample.py`, and then open it in a text editor.

1. Copy the following into the `gremlinexample.py` file. Replace *your-neptune-endpoint* with the address of your Neptune DB cluster and *your-neptune-port* with the port of your Neptune DB cluster (default: 8182). 

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

    The example below demonstrates how to connect with Gremlin Python. 

   ```
   from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
   from gremlin_python.process.anonymous_traversal import traversal
   
   database_url = "wss://your-neptune-endpoint:your-neptune-port/gremlin"
   
   remoteConn = DriverRemoteConnection(database_url, "g")
   
   g = traversal().withRemote(remoteConn)
   
   print(g.V().limit(2).toList())
   remoteConn.close()
   ```

1. Enter the following command to run the sample:

   ```
   python gremlinexample.py
   ```

   The Gremlin query at the end of this example returns the vertices (`g.V().limit(2)`) in a list. This list is then printed with the standard Python `print` function.
**Note**  
The final part of the Gremlin query, `toList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

   The following methods submit the query to the Neptune DB instance:
   + `toList()`
   + `toSet()`
   + `next()`
   + `nextTraverser()`
   + `iterate()`

   

   The preceding example returns the first two vertices in the graph by using the `g.V().limit(2).toList()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-python-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Python client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin Python](gremlin-python-iam-auth.md).

# Using .NET to connect to a Neptune DB instance
<a name="access-graph-gremlin-dotnet"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section contains a code example written in C# that connects to a Neptune DB instance and performs a Gremlin traversal.

Connections to Amazon Neptune must be from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance. This sample code was tested on an Amazon EC2 instance running Ubuntu.

Before you begin, do the following:
+ Install .NET on the Amazon EC2 instance. To get instructions for installing .NET on multiple operating systems, including Windows, Linux, and macOS, see [Get Started with .NET](https://www.microsoft.com/net/learn/get-started/).
+ Install Gremlin.NET by running `dotnet add package gremlin.net` for your package. For more information, see [Gremlin.NET](https://tinkerpop.apache.org/docs/current/reference/#gremlin-DotNet) in the TinkerPop documentation.



**To connect to Neptune using Gremlin.NET**

1. Create a new .NET project.

   ```
   dotnet new console -o gremlinExample
   ```

1. Change directories into the new project directory.

   ```
   cd gremlinExample
   ```

1. Copy the following into the `Program.cs` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   using System;
   using System.Threading.Tasks;
   using System.Collections.Generic;
   using Gremlin.Net;
   using Gremlin.Net.Driver;
   using Gremlin.Net.Driver.Remote;
   using Gremlin.Net.Structure;
   using static Gremlin.Net.Process.Traversal.AnonymousTraversalSource;
   namespace gremlinExample
   {
     class Program
     {
       static void Main(string[] args)
       {
         try
         {
           var endpoint = "your-neptune-endpoint";
           // This uses the default Neptune and Gremlin port, 8182
           var gremlinServer = new GremlinServer(endpoint, 8182, enableSsl: true );
           var gremlinClient = new GremlinClient(gremlinServer);
           var remoteConnection = new DriverRemoteConnection(gremlinClient, "g");
           var g = Traversal().WithRemote(remoteConnection);
           g.AddV("Person").Property("Name", "Justin").Iterate();
           g.AddV("Custom Label").Property("name", "Custom id vertex 1").Iterate();
           g.AddV("Custom Label").Property("name", "Custom id vertex 2").Iterate();
           var output = g.V().Limit<Vertex>(3).ToList();
           foreach(var item in output) {
               Console.WriteLine(item);
           }
         }
         catch (Exception e)
         {
             Console.WriteLine("{0}", e);
         }
       }
     }
   }
   ```

1. Enter the following command to run the sample:

   ```
   dotnet run
   ```

   The Gremlin query at the end of this example returns the first three vertices in a list, which is then printed to the console.
**Note**  
The final part of the Gremlin query, `ToList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

   The following methods submit the query to the Neptune DB instance:
   + `ToList()`
   + `ToSet()`
   + `Next()`
   + `NextTraverser()`
   + `Iterate()`

   Use `Next()` if you need the query results to be serialized and returned, or `Iterate()` if you don't.

   The preceding example returns a list by using the `g.V().Limit(3).ToList()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-dotnet-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a .NET client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin .NET](gremlin-dotnet-iam-auth.md).

# Using Node.js to connect to a Neptune DB instance
<a name="access-graph-gremlin-node-js"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The following section walks you through the running of a Node.js sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Verify that Node.js version 8.11 or higher is installed. If it is not, download and install Node.js from the [Nodejs.org website](https://nodejs.org).

**To connect to Neptune using Node.js**

1. Enter the following to install the `gremlin-javascript` package:

   ```
   npm install gremlin
   ```

1. Create a file named `gremlinexample.js` and open it in a text editor.

1. Copy the following into the `gremlinexample.js` file. Replace *your-neptune-endpoint* with the address of your Neptune DB instance.

   For information about finding the address of your Neptune DB instance, see the [Connecting to Amazon Neptune Endpoints](feature-overview-endpoints.md) section.

   ```
   const gremlin = require('gremlin');
   const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
   const Graph = gremlin.structure.Graph;
   
   const dc = new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', {});
   
   const graph = new Graph();
   const g = graph.traversal().withRemote(dc);
   
   g.V().limit(1).count().next().
       then(data => {
           console.log(data);
           dc.close();
       }).catch(error => {
           console.log('ERROR', error);
           dc.close();
       });
   ```

1. Enter the following command to run the sample:

   ```
   node gremlinexample.js
   ```

The preceding example returns the count of a single vertex in the graph by using the `g.V().limit(1).count().next()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

**Note**  
The final part of the Gremlin query, `next()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

The following methods submit the query to the Neptune DB instance:
+ `toList()`
+ `toSet()`
+ `next()`
+ `nextTraverser()`
+ `iterate()`

Use `next()` if you need the query results to be serialized and returned, or `iterate()` if you don't.

**Important**  
This is a standalone Node.js example. If you are planning to run code like this in an AWS Lambda function, see [Lambda function examples](lambda-functions-examples.md) for details about using JavaScript efficiently in a Neptune Lambda function.

## IAM authentication
<a name="access-graph-gremlin-nodejs-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a JavaScript client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin JavaScript](gremlin-javascript-iam-auth.md).

# Using Go to connect to a Neptune DB instance
<a name="access-graph-gremlin-go"></a>

**Important**  
Choosing the correct Apache TinkerPop Gremlin driver version is critical for compatibility with your Neptune engine version. Using an incompatible version can result in connection failures or unexpected behavior. For detailed version compatibility information, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

**Note**  
The gremlingo 3.5.x versions are backwards compatible with TinkerPop 3.4.x versions as long as you only use 3.4.x features in the Gremlin queries you write.

The following section walks you through the running of a Go sample that connects to an Amazon Neptune DB instance and performs a Gremlin traversal.

You must follow these instructions from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB instance.

Before you begin, do the following:
+ Download and install Go 1.17 or later from the [go.dev](https://go.dev/dl/) website.

**To connect to Neptune using Go**

1. Starting from an empty directory, initialize a new Go module:

   ```
   go mod init example.com/gremlinExample
   ```

1. Add gremlin-go as a dependency of your new module:

   ```
   go get github.com/apache/tinkerpop/gremlin-go/v3/driver
   ```

1. Create a file named `gremlinExample.go` and then open it in a text editor.

1. Copy the following into the `gremlinExample.go` file, replacing *`(your neptune endpoint)`* with the address of your Neptune DB instance:

   ```
   package main
   
   import (
     "fmt"
     gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
   )
   
   func main() {
     // Creating the connection to the server.
     driverRemoteConnection, err := gremlingo.NewDriverRemoteConnection("wss://(your neptune endpoint):8182/gremlin",
       func(settings *gremlingo.DriverRemoteConnectionSettings) {
         settings.TraversalSource = "g"
       })
     if err != nil {
       fmt.Println(err)
       return
     }
     // Cleanup
     defer driverRemoteConnection.Close()
   
     // Creating graph traversal
     g := gremlingo.Traversal_().WithRemote(driverRemoteConnection)
   
     // Perform traversal
     results, err := g.V().Limit(2).ToList()
     if err != nil {
       fmt.Println(err)
       return
     }
     // Print results
     for _, r := range results {
       fmt.Println(r.GetString())
     }
   }
   ```
**Note**  
The Neptune TLS certificate format is not currently supported on Go 1.18 and higher with macOS, and may give an x509 error when trying to initiate a connection. For local testing, this can be skipped by adding "crypto/tls" to the imports and modifying the `DriverRemoteConnection` settings as follows:  

   ```
   // Creating the connection to the server.
   driverRemoteConnection, err := gremlingo.NewDriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin",
     func(settings *gremlingo.DriverRemoteConnectionSettings) {
         settings.TraversalSource = "g"
         settings.TlsConfig = &tls.Config{InsecureSkipVerify: true}
     })
   ```

1. Enter the following command to run the sample:

   ```
   go run gremlinExample.go
   ```

The Gremlin query at the end of this example returns the vertices (`g.V().Limit(2)`) in a slice. This slice is then iterated through and printed with the standard `fmt.Println` function.

**Note**  
The final part of the Gremlin query, `ToList()`, is required to submit the traversal to the server for evaluation. If you don't include that method or another equivalent method, the query is not submitted to the Neptune DB instance.

The following methods submit the query to the Neptune DB instance:
+ `ToList()`
+ `ToSet()`
+ `Next()`
+ `GetResultSet()`
+ `Iterate()`

The preceding example returns the first two vertices in the graph by using the `g.V().Limit(2).ToList()` traversal. To query for something else, replace it with another Gremlin traversal with one of the appropriate ending methods.

## IAM authentication
<a name="access-graph-gremlin-go-iam"></a>

Neptune supports [IAM authentication](iam-auth-enable.md) to control access to your DB cluster. If you have IAM authentication enabled, you need to use Signature Version 4 signing to authenticate your requests. For detailed instructions and code examples for connecting from a Go client, see [Connecting to Amazon Neptune databases using IAM authentication with Gremlin Go](gremlin-go-iam-auth.md).

# Using the AWS SDK to run Gremlin queries
<a name="access-graph-gremlin-sdk"></a>

With the AWS SDK, you can run Gremlin queries against your Neptune graph using a programming language of your choice. The Neptune data API SDK (service name `neptunedata`) provides the [ExecuteGremlinQuery](https://docs.aws.amazon.com/neptune/latest/data-api/API_ExecuteGremlinQuery.html) action for submitting Gremlin queries.

You must run these examples from an Amazon EC2 instance in the same virtual private cloud (VPC) as your Neptune DB cluster, or from a location that has network connectivity to your cluster endpoint.

Direct links to the API reference documentation for the `neptunedata` service in each SDK language can be found below:


| Programming language | neptunedata API reference | 
| --- | --- | 
| C++ | [https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-neptunedata/html/annotated.html) | 
| Go | [https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/](https://docs.aws.amazon.com/sdk-for-go/api/service/neptunedata/) | 
| Java | [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/neptunedata/package-summary.html) | 
| JavaScript | [https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-neptunedata/) | 
| Kotlin | [https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html](https://sdk.amazonaws.com/kotlin/api/latest/neptunedata/index.html) | 
| .NET | [https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html](https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/Neptunedata/NNeptunedata.html) | 
| PHP | [https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html](https://docs.aws.amazon.com/aws-sdk-php/v3/api/namespace-Aws.Neptunedata.html) | 
| Python | [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html) | 
| Ruby | [https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html](https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/Neptunedata.html) | 
| Rust | [https://crates.io/crates/aws-sdk-neptunedata](https://crates.io/crates/aws-sdk-neptunedata) | 
| CLI | [https://docs.aws.amazon.com/cli/latest/reference/neptunedata/](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/) | 

## Gremlin AWS SDK examples
<a name="access-graph-gremlin-sdk-examples"></a>

The following examples show how to set up a `neptunedata` client, run a Gremlin query, and print the results. Replace *YOUR\_NEPTUNE\_HOST* and *YOUR\_NEPTUNE\_PORT* with the endpoint and port of your Neptune DB cluster.

**Client-side timeout and retry configuration**  
The SDK client timeout controls how long the *client* waits for a response. It does not control how long the query runs on the server. If the client times out before the server finishes, the query may continue running on Neptune while the client has no way to retrieve the results.  
We recommend setting the client-side read timeout to `0` (no timeout) or to a value that is at least a few seconds longer than the server-side [neptune\_query\_timeout](parameters.md#parameters-db-cluster-parameters-neptune_query_timeout) setting on your Neptune DB cluster. This lets Neptune control when queries time out.  
We also recommend setting the maximum retry attempts to `1` (no retries). If the SDK retries a query that is still running on the server, it can result in duplicate operations. This is especially important for mutation queries, where a retry could cause unintended duplicate writes.
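As an illustration of the recommendation above, a hypothetical helper (the function name and the five-second buffer are illustrative choices, not part of any SDK) that derives a safe client-side read timeout from the cluster's `neptune_query_timeout` value:

```python
def client_read_timeout(neptune_query_timeout_ms, buffer_s=5):
    """Return a client-side read timeout in seconds that is a few
    seconds longer than the server-side neptune_query_timeout, so
    that Neptune, not the SDK, decides when a query times out."""
    return neptune_query_timeout_ms / 1000 + buffer_s

# With the default neptune_query_timeout of 120,000 ms (2 minutes),
# a client read timeout of 125 seconds lets the server time out first.
assert client_read_timeout(120_000) == 125.0
```

Passing the resulting value as the SDK's read timeout (or disabling the client timeout entirely) keeps timeout behavior under the server's control.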

------
#### [ Python ]

1. Follow the [installation instructions](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html) to install Boto3.

1. Create a file named `gremlinExample.py` and paste the following code:

   ```
   import boto3
   import json
   from botocore.config import Config
   
   # Disable the client-side read timeout and retries so that
   # Neptune's server-side neptune_query_timeout controls query duration.
   client = boto3.client(
       'neptunedata',
       endpoint_url='https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT',
       config=Config(read_timeout=None, retries={'total_max_attempts': 1})
   )
   
   # Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   response = client.execute_gremlin_query(
       gremlinQuery='g.V().limit(1)',
       serializer='application/vnd.gremlin-v3.0+json;types=false'
   )
   
   print(json.dumps(response['result'], indent=2))
   ```

1. Run the example: `python gremlinExample.py`

------
#### [ Java ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-java/latest/developer-guide/setup.html) to set up the AWS SDK for Java.

1. Use the following code to set up a `NeptunedataClient`, run a Gremlin query, and print the result:

   ```
   import java.net.URI;
   import java.time.Duration;
   import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
   import software.amazon.awssdk.core.retry.RetryPolicy;
   import software.amazon.awssdk.services.neptunedata.NeptunedataClient;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteGremlinQueryRequest;
   import software.amazon.awssdk.services.neptunedata.model.ExecuteGremlinQueryResponse;
   
   // Disable the client-side timeout and retries so that
   // Neptune's server-side neptune_query_timeout controls query duration.
   NeptunedataClient client = NeptunedataClient.builder()
       .endpointOverride(URI.create("https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT"))
       .overrideConfiguration(ClientOverrideConfiguration.builder()
           .apiCallTimeout(Duration.ZERO)
           .retryPolicy(RetryPolicy.none())
           .build())
       .build();
   
   // Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   ExecuteGremlinQueryRequest request = ExecuteGremlinQueryRequest.builder()
       .gremlinQuery("g.V().limit(1)")
       .serializer("application/vnd.gremlin-v3.0+json;types=false")
       .build();
   
   ExecuteGremlinQueryResponse response = client.executeGremlinQuery(request);
   
   System.out.println(response.result().toString());
   ```

------
#### [ JavaScript ]

1. Follow the [installation instructions](https://docs.aws.amazon.com//sdk-for-javascript/v3/developer-guide/getting-started-nodejs.html) to set up the AWS SDK for JavaScript. Install the neptunedata client package: `npm install @aws-sdk/client-neptunedata`.

1. Create a file named `gremlinExample.js` and paste the following code:

   ```
   import { NeptunedataClient, ExecuteGremlinQueryCommand } from "@aws-sdk/client-neptunedata";
   import { NodeHttpHandler } from "@smithy/node-http-handler";
   
   const config = {
       endpoint: "https://YOUR_NEPTUNE_HOST:YOUR_NEPTUNE_PORT",
       // Disable the client-side request timeout so that
       // Neptune's server-side neptune_query_timeout controls query duration.
       requestHandler: new NodeHttpHandler({
           requestTimeout: 0
       }),
       maxAttempts: 1
   };
   
   const client = new NeptunedataClient(config);
   
   // Use the untyped GraphSON v3 serializer for a cleaner JSON response.
   const input = {
       gremlinQuery: "g.V().limit(1)",
       serializer: "application/vnd.gremlin-v3.0+json;types=false"
   };
   
   const command = new ExecuteGremlinQueryCommand(input);
   const response = await client.send(command);
   
   console.log(JSON.stringify(response, null, 2));
   ```

1. Run the example: `node gremlinExample.js`

------

# Gremlin query hints
<a name="gremlin-query-hints"></a>

You can use query hints to specify optimization and evaluation strategies for a particular Gremlin query in Amazon Neptune. 

Query hints are specified by adding a `withSideEffect` step to the query with the following syntax.

```
g.withSideEffect(hint, value)
```
+ *hint* – Identifies the type of the hint to apply.
+ *value* – Determines the behavior of the system aspect under consideration.

For example, the following shows how to include a `repeatMode` hint in a Gremlin traversal.

**Note**  
All Gremlin query hint side effects are prefixed with `Neptune#`.

```
g.withSideEffect('Neptune#repeatMode', 'DFS').V("3").repeat(out()).times(10).limit(1).path()
```

The preceding query instructs the Neptune engine to traverse the graph *Depth First* (`DFS`) rather than using the Neptune default, *Breadth First* (`BFS`).
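Because a hint is just part of the Gremlin text you submit, you can attach one programmatically before sending a query through the HTTP API or an SDK. The following Python sketch is purely illustrative (the `with_hint` helper is a hypothetical name, not part of any Neptune API):

```python
def with_hint(traversal, hint, value):
    """Prepend a Neptune query hint to a Gremlin traversal string.

    The traversal must start with the 'g.' prefix, which is replaced
    by a g.withSideEffect(...) call carrying the hint.
    """
    if not traversal.startswith("g."):
        raise ValueError("expected a traversal starting with 'g.'")
    # Quote string values; render booleans/numbers as Gremlin literals.
    literal = f"'{value}'" if isinstance(value, str) else str(value).lower()
    return f"g.withSideEffect('Neptune#{hint}', {literal}).{traversal[2:]}"

query = with_hint("g.V('3').repeat(out()).times(10).limit(1).path()",
                  "repeatMode", "DFS")
print(query)
# g.withSideEffect('Neptune#repeatMode', 'DFS').V('3').repeat(out()).times(10).limit(1).path()
```

The resulting string can then be passed as the `gremlinQuery` parameter of `execute_gremlin_query`, exactly like the literal query strings in the earlier examples.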

The following sections provide more information about the available query hints and their usage.

**Topics**
+ [Gremlin repeatMode query hint](gremlin-query-hints-repeatMode.md)
+ [Gremlin noReordering query hint](gremlin-query-hints-noReordering.md)
+ [Gremlin typePromotion query hint](gremlin-query-hints-typePromotion.md)
+ [Gremlin useDFE query hint](gremlin-query-hints-useDFE.md)
+ [Gremlin query hints for using the results cache](gremlin-query-hints-results-cache.md)

# Gremlin repeatMode query hint
<a name="gremlin-query-hints-repeatMode"></a>

The Neptune `repeatMode` query hint specifies how the Neptune engine evaluates the `repeat()` step in a Gremlin traversal: breadth first, depth first, or chunked depth first.

The evaluation mode of the `repeat()` step is important when it is used to find or follow a path, rather than simply repeating a step a limited number of times.

## Syntax
<a name="gremlin-query-hints-repeatMode-syntax"></a>

The `repeatMode` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#repeatMode', 'mode').gremlin-traversal
```

**Note**  
All Gremlin query hint side effects are prefixed with `Neptune#`.

**Available Modes**
+ `BFS`

  Breadth-First Search

  Default execution mode for the `repeat()` step. This gets all sibling nodes before going deeper along the path.

  This version is memory-intensive and frontiers can get very large. There is a higher risk that the query will run out of memory and be cancelled by the Neptune engine. This most closely matches other Gremlin implementations.
+ `DFS`

  Depth-First Search

  Follows each path to the maximum depth before moving on to the next solution.

  This uses less memory. It may provide better performance in situations such as finding a single path that extends multiple hops out from a starting point.
+ `CHUNKED_DFS`

  Chunked Depth-First Search

  A hybrid approach that explores the graph depth-first in chunks of 1,000 nodes, rather than 1 node (`DFS`) or all nodes (`BFS`).

  The Neptune engine will get up to 1,000 nodes at each level before following the path deeper.

  This is a balanced approach between speed and memory usage. 

  It is also useful if you want to use `BFS`, but the query is using too much memory.



## Example
<a name="gremlin-query-hints-repeatMode-example"></a>

The following section describes the effect of the repeat mode on a Gremlin traversal.

In Neptune the default mode for the `repeat()` step is to perform a breadth-first (`BFS`) execution strategy for all traversals. 

In most cases, the TinkerGraph implementation uses the same execution strategy, but in some cases it alters the execution of a traversal. 

For example, the TinkerGraph implementation modifies the following query.

```
g.V("3").repeat(out()).times(10).limit(1).path()
```

The `repeat()` step in this traversal is "unrolled" into the following traversal, which results in a depth-first (`DFS`) strategy.

```
g.V(<id>).out().out().out().out().out().out().out().out().out().out().limit(1).path()
```

**Important**  
The Neptune query engine does not do this automatically.

Breadth-first (`BFS`) is the default execution strategy, and is similar to TinkerGraph in most cases. However, there are certain cases where depth-first (`DFS`) strategies are preferable.

 

**BFS (Default)**  
Breadth-first (BFS) is the default execution strategy for the `repeat()` operator.

```
g.V("3").repeat(out()).times(10).limit(1).path()
```

The Neptune engine fully explores the first nine-hop frontiers before finding a solution ten hops out. This is effective in many cases, such as a shortest-path query.

However, for the preceding example, the traversal would be much faster using the depth-first (`DFS`) mode for the `repeat()` operator.

**DFS**  
The following query uses the depth-first (`DFS`) mode for the `repeat()` operator.

```
g.withSideEffect("Neptune#repeatMode", "DFS").V("3").repeat(out()).times(10).limit(1)
```

This follows each individual solution out to the maximum depth before exploring the next solution. 

# Gremlin noReordering query hint
<a name="gremlin-query-hints-noReordering"></a>

When you submit a Gremlin traversal, the Neptune query engine investigates the structure of the traversal and reorders parts of the query, trying to minimize the amount of work required for evaluation and query response time. For example, a traversal with multiple constraints, such as multiple `has()` steps, is typically not evaluated in the given order. Instead it is reordered after the query is checked with static analysis.

The Neptune query engine tries to identify which constraint is more selective and runs that one first. This often results in better performance, but the order in which Neptune chooses to evaluate the query might not always be optimal.

If you know the exact characteristics of the data and want to manually dictate the order of the query execution, you can use the Neptune `noReordering` query hint to specify that the traversal be evaluated in the order given.

## Syntax
<a name="gremlin-query-hints-noReordering-syntax"></a>

The `noReordering` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#noReordering', true or false).gremlin-traversal
```

**Note**  
All Gremlin query hint side effects are prefixed with `Neptune#`.

**Available Values**
+ `true`
+ `false`

# Gremlin typePromotion query hint
<a name="gremlin-query-hints-typePromotion"></a>

When you submit a Gremlin traversal that filters on a numerical value or range, the Neptune query engine must normally use type promotion when it executes the query. This means that it has to examine values of every type that could hold the value you are filtering on.

For example, if you are filtering for values equal to 55, the engine must look for integers equal to 55, long integers equal to 55L, floats equal to 55.0, and so forth. Each type promotion requires an additional lookup on storage, which can cause an apparently simple query to take an unexpectedly long time to complete.
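The extra work can be pictured with a purely conceptual sketch. This is not how Neptune stores data internally; it only illustrates why each promotable type adds a lookup:

```python
# Hypothetical store: each numeric value is kept with its concrete type.
stored = [
    {"type": "int",    "value": 55},
    {"type": "long",   "value": 55},
    {"type": "float",  "value": 55.0},
    {"type": "double", "value": 55.0},
]

# Without type promotion, only one type bucket is examined:
exact_matches = [v for v in stored if v["type"] == "int" and v["value"] == 55]

# With type promotion, every numeric bucket must be examined,
# which is one extra lookup per promotable type:
promoted_matches = [v for v in stored if v["value"] == 55]

print(len(exact_matches), len(promoted_matches))  # 1 4
```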

Suppose you are searching for all vertices with a `customerAge` property greater than 5:

```
g.V().has('customerAge', gt(5))
```

To execute that traversal thoroughly, Neptune must expand the query to examine every numeric type that the value you are querying for could be promoted to. In this case, the `gt` filter has to be applied for any integer over 5, any long over 5L, any float over 5.0, and any double over 5.0. Because each of these type promotions requires an additional lookup on storage, you will see multiple filters per numeric filter when you run the [Gremlin `profile` API](gremlin-profile-api.md) for this query, and it will take significantly longer to complete than you might expect.

Often type promotion is unnecessary because you know in advance that you only need to find values of one specific type. When this is the case, you can speed up your queries dramatically by using the `typePromotion` query hint to turn off type promotion.

## Syntax
<a name="gremlin-query-hints-typePromotion-syntax"></a>

The `typePromotion` query hint is specified by adding a `withSideEffect` step to the query.

```
g.withSideEffect('Neptune#typePromotion', true or false).gremlin-traversal
```

**Note**  
All Gremlin query hint side effects are prefixed with `Neptune#`.

**Available Values**
+ `true`
+ `false`

To turn off type promotion for the query above, you would use:

```
g.withSideEffect('Neptune#typePromotion', false).V().has('customerAge', gt(5))
```

# Gremlin useDFE query hint
<a name="gremlin-query-hints-useDFE"></a>

Use this query hint to enable use of the DFE engine for executing a query. By default, Neptune does not use the DFE unless this query hint is set to `true`, because the [neptune_dfe_query_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter defaults to `viaQueryHint`. If you set that instance parameter to `enabled`, the DFE engine is used for all queries except those that have the `useDFE` query hint set to `false`.

Example of enabling the DFE for a query:

```
g.withSideEffect('Neptune#useDFE', true).V().out()
```

# Gremlin query hints for using the results cache
<a name="gremlin-query-hints-results-cache"></a>

The following query hints can be used when the [query results cache](gremlin-results-cache.md) is enabled.

## Gremlin `enableResultCache` query hint
<a name="gremlin-query-hints-results-cache-enableResultCache"></a>

The `enableResultCache` query hint with a value of `true` causes query results to be returned from the cache if they have already been cached. If not, the query returns new results and caches them until they are cleared from the cache. For example:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Later, you can access the cached results by issuing exactly the same query again.

If the value of this query hint is `false`, or if it isn't present, query results are not cached. However, setting it to `false` does not clear existing cached results. To clear cached results, use the `invalidateResultCache` or `invalidateResultCacheKey` hint.

## Gremlin `enableResultCacheWithTTL` query hint
<a name="gremlin-query-hints-results-cache-enableResultCacheWithTTL"></a>

The `enableResultCacheWithTTL` query hint also returns cached results if there are any, without affecting the TTL of results already in the cache. If there are currently no cached results, the query returns new results and caches them for the time to live (TTL) specified by the `enableResultCacheWithTTL` query hint. That time to live is specified in seconds. For example, the following query specifies a time to live of sixty seconds:

```
g.with('Neptune#enableResultCacheWithTTL', 60)
 .V().has('genre','drama').in('likes')
```

Before the 60-second time-to-live is over, you can use the same query (here, `g.V().has('genre','drama').in('likes')`) with either the `enableResultCache` or the `enableResultCacheWithTTL` query hint to access the cached results.

**Note**  
The time to live specified with `enableResultCacheWithTTL` does not affect results that have already been cached.  
If results were previously cached using `enableResultCache`, the cache must first be explicitly cleared before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.  
If results were previously cached using `enableResultCacheWithTTL`, that previous TTL must first expire before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.

After the time to live has passed, the cached results for the query are cleared, and a subsequent instance of the same query then returns new results. If `enableResultCacheWithTTL` is attached to that subsequent query, the new results are cached with the TTL that it specifies.
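The TTL rules above can be modeled with a toy cache. This is purely illustrative (it is not Neptune's implementation); the key point it captures is that a cache hit never modifies an entry's existing expiry:

```python
import time

class ToyResultCache:
    """Toy model of the documented TTL behavior: a TTL supplied with a
    query affects only entries created by that query, never entries
    that are already cached."""

    def __init__(self):
        self._entries = {}  # key -> (results, expires_at or None)

    def get_or_put(self, key, compute, ttl=None):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and (entry[1] is None or entry[1] > now):
            return entry[0]  # cache hit; existing expiry is left untouched
        results = compute()
        expires = None if ttl is None else now + ttl
        self._entries[key] = (results, expires)
        return results

cache = ToyResultCache()
# First run caches with no TTL (like enableResultCache):
first = cache.get_or_put("g.V().has('genre','drama').in('likes')", lambda: "v1")
# A later run with a TTL (like enableResultCacheWithTTL) still gets the
# old entry, and does not attach a TTL to it:
second = cache.get_or_put("g.V().has('genre','drama').in('likes')", lambda: "v2", ttl=60)
print(first, second)  # v1 v1
```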

## Gremlin `invalidateResultCacheKey` query hint
<a name="gremlin-query-hints-results-cache-invalidateResultCacheKey"></a>

The `invalidateResultCacheKey` query hint can take a `true` or `false` value. A `true` value causes cached results for the query to which `invalidateResultCacheKey` is attached to be cleared. For example, the following causes results cached for the query key `g.V().has('genre','drama').in('likes')` to be cleared:

```
g.with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

The example query above does not cause its new results to be cached. You can include `enableResultCache` (or `enableResultCacheWithTTL`) in the same query if you want to cache the new results after clearing the existing cached ones:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

## Gremlin `invalidateResultCache` query hint
<a name="gremlin-query-hints-results-cache-invalidateResultCache"></a>

The `invalidateResultCache` query hint can take a `true` or `false` value. A `true` value causes all results in the results cache to be cleared. For example:

```
g.with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

The example query above does not cause its results to be cached. You can include `enableResultCache` (or `enableResultCacheWithTTL`) in the same query if you want to cache new results after completely clearing the existing cache:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

## Gremlin `numResultsCached` query hint
<a name="gremlin-query-hints-results-cache-numResultsCached"></a>

The `numResultsCached` query hint can only be used with queries that contain `iterate()`, and it specifies the maximum number of results to cache for the query to which it is attached. Note that the results cached when `numResultsCached` is present are not returned, only cached.

For example, the following query specifies that up to 100 of its results should be cached, but none of those cached results returned:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#numResultsCached', 100)
 .V().has('genre','drama').in('likes').iterate()
```

You can then use a query like the following to retrieve a range of the cached results (here, the first ten):

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#numResultsCached', 100)
 .V().has('genre','drama').in('likes').range(0, 10)
```

## Gremlin `noCacheExceptions` query hint
<a name="gremlin-query-hints-results-cache-noCacheExceptions"></a>

The `noCacheExceptions` query hint can take a `true` or `false` value. A `true` value causes any exceptions related to the results cache to be suppressed. For example:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#noCacheExceptions', true)
 .V().has('genre','drama').in('likes')
```

In particular, this suppresses the `QueryLimitExceededException`, which is raised if the results of a query are too large to fit in the results cache.

# Gremlin query status API
<a name="gremlin-api-status"></a>

You can list all active Gremlin queries or get the status of a specific query. The underlying HTTP endpoint for both operations is `https://your-neptune-endpoint:port/gremlin/status`.

## Listing active Gremlin queries
<a name="gremlin-api-status-list"></a>

To list all active Gremlin queries, call the endpoint with no `queryId` parameter.

### Request parameters
<a name="gremlin-api-status-list-request"></a>
+ **includeWaiting** (*optional*)   –   If set to `TRUE`, the response includes waiting queries in addition to running queries.

### Response syntax
<a name="gremlin-api-status-list-response"></a>

```
{
  "acceptedQueryCount": integer,
  "runningQueryCount": integer,
  "queries": [
    {
      "queryId": "guid",
      "queryEvalStats": {
        "waited": integer,
        "elapsed": integer,
        "cancelled": boolean
      },
      "queryString": "string"
    }
  ]
}
```
+ **acceptedQueryCount**   –   The number of queries that have been accepted but not yet completed, including queries in the queue.
+ **runningQueryCount**   –   The number of currently running Gremlin queries.
+ **queries**   –   A list of the current Gremlin queries.

### Example
<a name="gremlin-api-status-list-example"></a>

------
#### [ AWS CLI ]

```
aws neptunedata list-gremlin-queries \
  --endpoint-url https://your-neptune-endpoint:port
```

For more information, see [list-gremlin-queries](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/list-gremlin-queries.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.list_gremlin_queries()

print(response)
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status
```

------

The following output shows a single running query.

```
{
  "acceptedQueryCount": 9,
  "runningQueryCount": 1,
  "queries": [
    {
      "queryId": "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
      "queryEvalStats": {
        "waited": 0,
        "elapsed": 23,
        "cancelled": false
      },
      "queryString": "g.V().out().count()"
    }
  ]
}
```

## Getting the status of a specific Gremlin query
<a name="gremlin-api-status-get-single"></a>

To get the status of a specific Gremlin query, provide the `queryId` parameter.

### Request parameters
<a name="gremlin-api-status-get-request"></a>
+ **queryId** (*required*)   –   The ID of the Gremlin query. Neptune automatically assigns this ID value to each query, or you can assign your own ID (see [Inject a Custom ID Into a Neptune Gremlin or SPARQL Query](features-query-id.md)).

### Response syntax
<a name="gremlin-api-status-get-response-syntax"></a>

```
{
  "queryId": "guid",
  "queryString": "string",
  "queryEvalStats": {
    "waited": integer,
    "elapsed": integer,
    "cancelled": boolean,
    "subqueries": document
  }
}
```
+ **queryId**   –   The ID of the query.
+ **queryString**   –   The submitted query. This is truncated to 1024 characters if it is longer than that.
+ **queryEvalStats**   –   Statistics for the query, including `waited` (wait time in milliseconds), `elapsed` (run time in milliseconds), `cancelled` (whether the query was cancelled), and `subqueries` (the number of subqueries).

### Example
<a name="gremlin-api-status-get-example"></a>

------
#### [ AWS CLI ]

```
aws neptunedata get-gremlin-query-status \
  --endpoint-url https://your-neptune-endpoint:port \
  --query-id "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

For more information, see [get-gremlin-query-status](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/get-gremlin-query-status.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.get_gremlin_query_status(
    queryId='fb34cd3e-f37c-4d12-9cf2-03bb741bf54f'
)

print(response)
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status/fb34cd3e-f37c-4d12-9cf2-03bb741bf54f \
  --region us-east-1 \
  --service neptune-db
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status/fb34cd3e-f37c-4d12-9cf2-03bb741bf54f
```

------

The following is an example response.

```
{
  "queryId": "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f",
  "queryString": "g.V().out().count()",
  "queryEvalStats": {
    "waited": 0,
    "elapsed": 23,
    "cancelled": false
  }
}
```

# Gremlin query cancellation
<a name="gremlin-api-status-cancel"></a>

To cancel a Gremlin query, use HTTP `GET` or `POST` to make a request to the `https://your-neptune-endpoint:port/gremlin/status` endpoint with the `cancelQuery` parameter.

## Gremlin query cancellation request parameters
<a name="gremlin-api-status-cancel-request"></a>
+ **cancelQuery**   –   Required for cancellation. This parameter has no corresponding value.
+ **queryId**   –   The ID of the running Gremlin query to cancel.

## Gremlin query cancellation example
<a name="gremlin-api-status-cancel-example"></a>

The following is an example of cancelling a query.

------
#### [ AWS CLI ]

```
aws neptunedata cancel-gremlin-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --query-id "fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

For more information, see [cancel-gremlin-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/cancel-gremlin-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.cancel_gremlin_query(
    queryId='fb34cd3e-f37c-4d12-9cf2-03bb741bf54f'
)

print(response)
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/status \
  --region us-east-1 \
  --service neptune-db \
  --data-urlencode "cancelQuery" \
  --data-urlencode "queryId=fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl https://your-neptune-endpoint:port/gremlin/status \
  --data-urlencode "cancelQuery" \
  --data-urlencode "queryId=fb34cd3e-f37c-4d12-9cf2-03bb741bf54f"
```

------

Successful cancellation returns HTTP `200` OK.
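The status and cancellation operations combine naturally into a simple watchdog. The sketch below is illustrative only (the function name and `max_elapsed_ms` parameter are made up); it accepts any object exposing the `list_gremlin_queries` and `cancel_gremlin_query` methods of the boto3 `neptunedata` client, so you can pass a real client unchanged:

```python
def cancel_long_running_queries(client, max_elapsed_ms):
    """Cancel every running Gremlin query whose elapsed time (in
    milliseconds) exceeds max_elapsed_ms; return the cancelled IDs."""
    status = client.list_gremlin_queries()
    cancelled = []
    for query in status["queries"]:
        if query["queryEvalStats"]["elapsed"] > max_elapsed_ms:
            client.cancel_gremlin_query(queryId=query["queryId"])
            cancelled.append(query["queryId"])
    return cancelled
```

With a real client, you would call it as, for example, `cancel_long_running_queries(boto3.client('neptunedata', endpoint_url=...), 60000)` to cancel queries running longer than a minute.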

# Support for Gremlin script-based sessions
<a name="access-graph-gremlin-sessions"></a>

You can use Gremlin sessions with implicit transactions in Amazon Neptune. For information about Gremlin sessions, see [Considering Sessions](http://tinkerpop.apache.org/docs/current/reference/#sessions) in the Apache TinkerPop documentation. The sections below describe how to use Gremlin sessions with Java.

**Important**  
Currently, the longest time Neptune can keep a script-based session open is 10 minutes. If you don't close a session before that, the session times out and everything in it is rolled back.

**Topics**
+ [Gremlin sessions on the Gremlin console](#access-graph-gremlin-sessions-console)
+ [Gremlin sessions in the Gremlin Language Variant](#access-graph-gremlin-sessions-glv)

## Gremlin sessions on the Gremlin console
<a name="access-graph-gremlin-sessions-console"></a>

If you create a remote connection on the Gremlin Console without the `session` parameter, the remote connection is created in *sessionless* mode. In this mode, each request that is submitted to the server is treated as a complete transaction in itself, and no state is saved between requests. If a request fails, only that request is rolled back.

If you create a remote connection that *does* use the `session` parameter, you create a script-based session that lasts until you close the remote connection. Every session is identified by a unique UUID that the console generates and returns to you.

The following is an example of one console call that creates a session. After queries are submitted, another call closes the session and commits the queries.

**Note**  
The Gremlin client must always be closed to release server-side resources.

```
gremlin> :remote connect tinkerpop.server conf/neptune-remote.yaml session
  . . .
  . . .
gremlin> :remote close
```

For more information and examples, see [Sessions](http://tinkerpop.apache.org/docs/current/reference/#console-sessions) in the TinkerPop documentation.

All the queries that you run during a session form a single transaction that isn't committed until all the queries succeed and you close the remote connection. If a query fails, or if you don't close the connection within the maximum session lifetime that Neptune supports, the session transaction is not committed, and all the queries in it are rolled back.

## Gremlin sessions in the Gremlin Language Variant
<a name="access-graph-gremlin-sessions-glv"></a>

In the Gremlin language variant (GLV), you need to create a `SessionedClient` object to issue multiple queries in a single transaction, as in the following example.

```
Cluster cluster = Cluster.open();                    // line 1
Client client = cluster.connect("sessionName");      // line 2
try {
   ...
   ...
} finally {
  // Always close. If there are no errors, the transaction is committed; otherwise, it's rolled back.
  client.close();
  cluster.close();
}
```

Line 2 in the preceding example creates the `SessionedClient` object according to the configuration options set for the cluster in question. The *sessionName* string that you pass to the `connect` method becomes the unique name of the session. To avoid collisions, use a UUID for the name.

The client starts a session transaction when it is initialized. All the queries that you run during the session form a single transaction that is committed only when you call `client.close()`. Again, if a single query fails, or if you don't close the connection within the maximum session lifetime that Neptune supports, the session transaction fails, and all the queries in it are rolled back.

**Note**  
The Gremlin client must always be closed to release server-side resources.

Alternatively, you can use Gremlin's `tx()` syntax to commit or roll back the transaction explicitly:

```
GraphTraversalSource g = traversal().withRemote(conn);

Transaction tx = g.tx();

// Spawn a GraphTraversalSource from the Transaction.
// Traversals spawned from gtx are executed within a single transaction.
GraphTraversalSource gtx = tx.begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();

    tx.commit();
} finally {
    if (tx.isOpen()) {
        tx.rollback();
    }
}
```

# Gremlin transactions in Neptune
<a name="access-graph-gremlin-transactions"></a>

There are several contexts within which Gremlin [transactions](transactions.md) are executed. When working with Gremlin, it is important to understand which context you are working in and what its implications are:
+ **`Script-based`**   –   Requests are made using text-based Gremlin strings, for example:
  + Using the Java driver and `Client.submit(string)`.
  + Using the Gremlin console and `:remote connect`.
  + Using the HTTP API.
+ **`Bytecode-based`**   –   Requests are made using serialized Gremlin bytecode typical of [Gremlin Language Variants](https://tinkerpop.apache.org/docs/current/reference/#gremlin-drivers-variants) (GLV).

  For example, using the Java driver, `g = traversal().withRemote(...)`.

For either of the above contexts, there is the additional context of the request being sent as sessionless or as bound to a session.

**Note**  
 Gremlin transactions must always either be committed or rolled back, so that server-side resources can be released. In the event of an error during the transaction, it is important to retry the entire transaction and not just the particular request that failed. 

## Sessionless requests
<a name="access-graph-gremlin-transactions-sessionless"></a>

 When sessionless, a request is equivalent to a single transaction.

For scripts, the implication is that one or more Gremlin statements sent in a single request will commit or roll back as a single transaction. For example:

```
Cluster cluster = Cluster.open();
Client client = cluster.connect(); // sessionless
// 3 vertex additions in one request/transaction:
client.submit("g.addV();g.addV();g.addV()").all().get();
```

For bytecode, a sessionless request is made for each traversal spawned and executed from `g`:

```
GraphTraversalSource g = traversal().withRemote(...);

// 3 vertex additions in three individual requests/transactions:
g.addV().iterate();
g.addV().iterate();
g.addV().iterate();

// 3 vertex additions in one single request/transaction:
g.addV().addV().addV().iterate();
```

## Requests bound to a session
<a name="access-graph-gremlin-transactions-session-bound"></a>

When bound to a session, multiple requests can be applied within the context of a single transaction.

For scripts, the implication is that there is no need to concatenate all of the graph operations into a single embedded string value:

```
Cluster cluster = Cluster.open();
Client client = cluster.connect(sessionName); // session
try {
    // 3 vertex additions in one request/transaction:
    client.submit("g.addV();g.addV();g.addV()").all().get();
} finally {
    client.close();
}

try {
    // 3 vertex additions in three requests, but one transaction:
    client.submit("g.addV()").all().get(); // starts a new transaction with the same sessionName
    client.submit("g.addV()").all().get();
    client.submit("g.addV()").all().get();
} finally {
    client.close();
}
```

For script-based sessions, closing the client with `client.close()` commits the transaction. There is no explicit rollback command available in script-based sessions. To force a rollback, you can cause the transaction to fail by issuing a query such as `g.inject(0).fail('rollback')` before closing the client.

**Note**  
A query like `g.inject(0).fail('rollback')`, used to intentionally throw an error to force a rollback, produces an exception on the client. Catch and discard the resulting exception before closing the client.

For bytecode, the transaction can be explicitly controlled and the session managed transparently. Gremlin Language Variants (GLV) support Gremlin's `tx()` syntax to `commit()` or `rollback()` a transaction as follows:

```
GraphTraversalSource g = traversal().withRemote(conn);

Transaction tx = g.tx();

// Spawn a GraphTraversalSource from the Transaction.
// Traversals spawned from gtx are executed within a single transaction.
GraphTraversalSource gtx = tx.begin();
try {
    gtx.addV("person").iterate();
    gtx.addV("software").iterate();

    tx.commit();
} finally {
    if (tx.isOpen()) {
        tx.rollback();
    }
}
```

Although the example above is written in Java, you can also use this `tx()` syntax in other languages. For language-specific transaction syntax, see the Transactions section of the Apache TinkerPop documentation for [Java](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java-transactions), [Python](https://tinkerpop.apache.org/docs/current/reference/#gremlin-python-transactions), [Javascript](https://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript-transactions), [.NET](https://tinkerpop.apache.org/docs/current/reference/#gremlin-dotnet-transactions), and [Go](https://tinkerpop.apache.org/docs/current/reference/#gremlin-go-transactions).

**Warning**  
Sessionless read-only queries are executed under [SNAPSHOT](transactions-isolation-levels.md) isolation, but read-only queries run within an explicit transaction are executed under [SERIALIZABLE](transactions-isolation-levels.md) isolation. The read-only queries executed under `SERIALIZABLE` isolation incur higher overhead and can block or get blocked by concurrent writes, unlike those run under `SNAPSHOT` isolation.

## Timeout behavior for bytecode commit and rollback
<a name="access-graph-gremlin-transactions-commit-rollback-timeout"></a>

When you use bytecode-based transactions with the `tx()` syntax, the `commit()` and `rollback()` operations are not subject to query timeout settings. Neither the global `neptune_query_timeout` parameter nor per-query timeout values set through `evaluationTimeout` apply to these operations. On the server, `commit()` and `rollback()` run without a time limit until they complete or encounter an error.

On the client side, the Gremlin driver's `tx.commit()` and `tx.rollback()` calls will not complete until the server responds. Depending on the language, this might manifest as a blocking call or an unresolved async operation. No driver provides a built-in timeout setting that bounds these calls. Consult the API documentation for your specific Gremlin Language Variant for details on concurrency behavior around these transaction features.

**Important**  
If a `commit()` or `rollback()` call takes longer than expected, it might be blocked by lock contention from a concurrent transaction. For more information about lock conflicts, see [Conflict Resolution Using Lock-Wait Timeouts](transactions-neptune.md#transactions-neptune-conflicts).

If you need to bound the time your application waits for a `commit()` or `rollback()`, you can use your language's concurrency features to apply a client-side timeout. If the client-side timeout fires, the server continues processing the operation. The server-side operation holds a worker thread until it completes. After a client-side timeout, close the connection and create a new one rather than reusing the existing connection, because the transaction state is indeterminate.
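A client-side bound of this kind can be sketched with standard concurrency primitives. The sketch below is generic Python, not a Gremlin driver API; `slow_commit` is a hypothetical stand-in for a call such as `tx.commit()`. Note that, as described above, the underlying call keeps running after the timeout fires; only the wait is bounded.

```python
# Illustrative only: bounding the wait on a blocking call.
# slow_commit is a stand-in for a driver call such as tx.commit().
import concurrent.futures
import time

def run_with_timeout(fn, timeout_seconds):
    """Run fn on a worker thread; raise TimeoutError if the wait exceeds the bound.

    The worker thread (like the server-side operation) continues running to
    completion even after the timeout is raised; only the caller's wait ends.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fn)
        return future.result(timeout=timeout_seconds)

def slow_commit():
    time.sleep(0.5)   # stands in for a commit delayed by lock contention
    return "committed"
```

After a timeout like this, remember to close the connection and open a new one, because the transaction state is indeterminate.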

### Server-side transaction cleanup
<a name="access-graph-gremlin-transactions-server-side-cleanup"></a>

If a client disconnects or abandons a transaction without committing or rolling back, Neptune has server-side mechanisms that eventually clean up the orphaned transaction:
+ **Session timeout**   –   Bytecode-based sessions that remain idle for longer than the maximum session lifetime (10 minutes) are closed, and any open transaction is rolled back.
+ **Connection idle timeout**   –   Neptune closes WebSocket connections that are idle for approximately 20 minutes. When the connection closes, the server rolls back any open transaction associated with that connection.

These cleanup mechanisms are safety nets. We recommend that you always explicitly commit or roll back transactions when you are finished with them.

# Streaming query results with Gremlin
<a name="access-graph-gremlin-streaming"></a>

When you run a Gremlin traversal that returns a large number of results, Neptune streams them back to the client in batches over the WebSocket connection. Neptune sends result batches as they are produced, without waiting for the client to request more. This can be advantageous if you want to process results as they are being returned from the server, but requires using lazy iteration patterns to avoid collecting the full result set into memory.

Neptune sends results in batches of 64 per WebSocket frame by default. You cannot change this server-side default, but the batch size can be overridden on a per-request basis from the client using the [`batchSize`](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java-configuration) request option (called `Tokens.ARGS_BATCH_SIZE` in the Java driver, or `connectionPool.resultIterationBatchSize` as a driver-level default).

For details on configuring `batchSize` in other language drivers, see the Configuration section for each driver in the [Apache TinkerPop Gremlin Drivers and Variants](https://tinkerpop.apache.org/docs/current/reference/#gremlin-drivers-variants) documentation.

Because the server pushes results automatically, client-side backpressure is handled implicitly through TCP and WebSocket flow control. If the client is slow to read from the socket, the server's writes will eventually block until the client catches up.

**Important**  
Streaming is most effective with traversals that can produce results incrementally. Traversals that include `order()`, `groupCount()`, `group()`, `dedup()`, or other steps that require the full traversal to complete before emitting results will cause Neptune to materialize the entire result set in memory before streaming begins. In these cases, batching still reduces per-frame serialization overhead, but does not reduce server-side memory usage.

## Consuming results incrementally
<a name="access-graph-gremlin-streaming-usage"></a>

To process results as they arrive, iterate lazily using `hasNext()` / `next()` or equivalent APIs rather than collecting all results into a list. You can use `next(batchSize)` to pull results in application-level batches, allowing you to perform intermediate work between batches while the server continues producing results.

**Example Java (GLV bytecode)**  

```
GraphTraversalSource g = traversal().withRemote(connection);

int batchSize = 10;
int batchNum = 0;
var traversal = g.V().hasLabel("movie").values("title").limit(1000);
while (traversal.hasNext()) {
    var batch = traversal.next(batchSize);
    batchNum++;
    for (var title : batch) {
        System.out.println("  " + title);
    }

    // Do other intermediary work here between batch calls
    System.out.println("Batch " + batchNum + " processing complete\n");
}
```

**Example Python**  

```
g = traversal().with_remote(connection)

BATCH_SIZE = 10
batch_num = 0
t = g.V().has_label('movie').values('title').limit(1000)
while t.has_next():
    batch = t.next(BATCH_SIZE)
    batch_num += 1
    for title in batch:
        print(f"  {title}")

    # Do other intermediary work here between batch calls
    print(f"Batch {batch_num} processing complete\n")
```

**Example Go**  

```
// The Go driver does not support next(n), so batches are accumulated manually.
g := gremlingo.Traversal_().WithRemote(connection)

resultSet, err := g.V().HasLabel("movie").Values("title").Limit(1000).GetResultSet()
if err != nil {
    log.Fatal(err)
}

batchSize := 10
batchNum := 0
for {
    var batch []interface{}
    for i := 0; i < batchSize; i++ {
        result, ok, err := resultSet.One() // returns (value, ok, error); ok is false when results are exhausted
        if err != nil {
            log.Fatal(err)
        }
        if !ok {
            break
        }
        batch = append(batch, result)
    }
    if len(batch) == 0 {
        break
    }
    batchNum++
    for _, v := range batch {
        fmt.Printf("  %v\n", v)
    }

    // Do other intermediary work here between batch calls
    fmt.Printf("Batch %d processing complete\n\n", batchNum)
}
```

**Example .NET**  

```
var g = Traversal().WithRemote(connection);

var batchSize = 10;
var batchNum = 0;
var traversal = g.V().HasLabel("movie").Values<string>("title").Limit<string>(1000);
while (traversal.HasNext())
{
    var batch = traversal.Next(batchSize);
    batchNum++;
    foreach (var title in batch)
    {
        Console.WriteLine($"  {title}");
    }

    // Do other intermediary work here between batch calls
    Console.WriteLine($"Batch {batchNum} processing complete\n");
}
```

**Example Node.js**  

```
// The Node.js driver does not support next(n), so batches are accumulated manually.
const g = traversal().withRemote(connection);

const batchSize = 10;
let batchNum = 0;
const t = g.V().hasLabel('movie').values('title').limit(1000);
while (true) {
    const batch = [];
    for (let i = 0; i < batchSize; i++) {
        const result = await t.next();
        if (result.done) break;
        batch.push(result.value);
    }
    if (batch.length === 0) break;
    batchNum++;
    for (const title of batch) {
        console.log(`  ${title}`);
    }

    // Do other intermediary work here between batch calls
    console.log(`Batch ${batchNum} processing complete\n`);
}
```

## Eager vs. incremental consumption
<a name="access-graph-gremlin-streaming-avoid"></a>

Streaming allows you to process results incrementally as additional data is being fetched and returned. The following methods block until the entire result set is collected into memory, preventing your application from acting on results as they arrive:
+ **Java:** `toList()` or `toSet()`
+ **Python:** `toList()` or `toSet()`
+ **Go:** `ToList()`, `ToSet()`, or `GetResultSet().GetAll()`
+ **.NET:** `ToList()` or `Promise()`
+ **Node.js:** `toList()`

**Note**  
Data still flows incrementally over the WebSocket connection even when using these methods. The difference is that your application cannot process individual results until the entire collection is complete. To process results as they arrive, use the lazy iteration or batch patterns shown in the examples above.

# Using the Gremlin API with Amazon Neptune
<a name="gremlin-api-reference"></a>

**Note**  
Amazon Neptune does not support the `bindings` property.

Gremlin HTTPS requests all use a single endpoint: `https://your-neptune-endpoint:port/gremlin`. All Neptune connections must use HTTPS.

You can connect the Gremlin Console to a Neptune graph directly through WebSockets.

For more information about connecting to the Gremlin endpoint, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).

The Amazon Neptune implementation of Gremlin has specific details and differences that you need to consider. For more information, see [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md).

For information about the Gremlin language and traversals, see [The Traversal](https://tinkerpop.apache.org/docs/current/reference/#traversal) in the Apache TinkerPop documentation.

# Caching query results in Amazon Neptune Gremlin
<a name="gremlin-results-cache"></a>

Amazon Neptune supports a results cache for Gremlin queries.

You can enable the query results cache and then use a query hint to cache the results of a Gremlin read-only query.

Any re-run of the query then retrieves the cached results with low latency and no I/O costs, as long as they are still in the cache. This works for queries submitted either to the HTTP endpoint or over WebSockets, whether as bytecode or in string form.

**Note**  
Queries sent to the profile endpoint are not cached even when the query cache is enabled.

You can control how the Neptune query results cache behaves in several ways. For example:
+ You can get cached results paginated, in blocks.
+ You can specify the time-to-live (TTL) for specified queries.
+ You can clear the cache for specified queries.
+ You can clear the entire cache.
+ You can choose to be notified if results exceed the cache size.

The cache is maintained using a least-recently-used (LRU) policy, meaning that once the space allotted to the cache is full, the least-recently-used results are removed to make room when new results are being cached.
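The LRU policy can be illustrated with a minimal sketch. This is a count-based toy model, not Neptune's implementation (which accounts for memory consumed, not entry counts):

```python
# Toy model of least-recently-used eviction; illustrative only.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict least recently used
```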

**Important**  
The query-results cache is not available on `t3.medium` or `t4g.medium` instance types.

## Enabling the query results cache in Neptune
<a name="gremlin-results-cache-enabling"></a>

 The query results cache can be enabled across all instances in a cluster or per-instance. To enable the results cache on all instances in a cluster, set the `neptune_result_cache` parameter in the cluster's `cluster-parameter-group` to `1`. To enable this on a specific instance, set the `neptune_result_cache` parameter in the instance's `instance-parameter-group` to `1`. The cluster parameter group setting will override the instance parameter group value. 

 A restart is required on any affected instances for the results cache parameter settings to be applied. While you can enable the results cache across all instances in a cluster via the `cluster-parameter-group`, each instance maintains its own cache. The query results cache feature is not a cluster-wide cache. 

Once the results cache is enabled, Neptune sets aside a portion of current memory for caching query results. The larger the instance type you're using and the more memory is available, the more memory Neptune sets aside for the cache.

If the results cache memory fills up, Neptune automatically drops least-recently-used (LRU) cached results to make way for new ones.

You can check the current status of the results cache using the [Instance Status](access-graph-status.md) command.

## Using hints to cache query results
<a name="gremlin-results-cache-using"></a>

Once the query results cache is enabled, you use query hints to control query caching. All the examples below apply to the same query traversal, namely:

```
g.V().has('genre','drama').in('likes')
```

### Using `enableResultCache`
<a name="using-enableResultCache"></a>

With the query results cache enabled, you can cache the results of a Gremlin query using the `enableResultCache` query hint, as follows:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Neptune then returns the query results to you, and also caches them. Later, you can access the cached results by issuing exactly the same query again:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

The cache key that identifies the cached results is the query string itself, namely:

```
g.V().has('genre','drama').in('likes')
```

### Using `enableResultCacheWithTTL`
<a name="using-enableResultCacheWithTTL"></a>

You can specify how long the query results should be cached for by using the `enableResultCacheWithTTL` query hint. For example, the following query specifies that the query results should expire after 120 seconds:

```
g.with('Neptune#enableResultCacheWithTTL', 120)
 .V().has('genre','drama').in('likes')
```

Again, the cache key that identifies the cached results is the base query string:

```
g.V().has('genre','drama').in('likes')
```

And again, you can access the cached results using that query string with the `enableResultCache` query hint:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

If 120 or more seconds have passed since the results were cached, that query will return new results, and cache them, without any time-to-live.

You can also access the cached results by issuing the same query again with the `enableResultCacheWithTTL` query hint. For example:

```
g.with('Neptune#enableResultCacheWithTTL', 140)
 .V().has('genre','drama').in('likes')
```

Until 120 seconds have passed (that is, the TTL currently in effect), this new query using the `enableResultCacheWithTTL` query hint returns the cached results. After 120 seconds, it would return new results and cache them with a time-to-live of 140 seconds.

**Note**  
If results for a query key are already cached, then the same query key with `enableResultCacheWithTTL` does not generate new results and has no effect on the time-to-live of the currently cached results.  
If results were previously cached using `enableResultCache`, the cache must first be cleared before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.  
If results were previously cached using `enableResultCacheWithTTL`, that previous TTL must first expire before `enableResultCacheWithTTL` generates new results and caches them for the TTL that it specifies.
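
These TTL rules can be modeled with a small sketch. It is illustrative only, with an injected clock standing in for real time:

```python
# Toy model of the results-cache TTL rules; not Neptune code.
class TtlResultCache:
    def __init__(self):
        self._entries = {}   # key -> (results, expires_at or None)

    def _live(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        results, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._entries[key]       # TTL elapsed: entry is gone
            return None
        return results

    def get(self, key, now):
        return self._live(key, now)

    def put_if_absent(self, key, results, now, ttl=None):
        """Cache results only if no live entry exists.

        An existing live entry is never replaced and its TTL is never
        refreshed, matching the rules described above."""
        if self._live(key, now) is None:
            expires_at = None if ttl is None else now + ttl
            self._entries[key] = (results, expires_at)
```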

### Using `invalidateResultCacheKey`
<a name="using-invalidateResultCacheKey"></a>

You can use the `invalidateResultCacheKey` query hint to clear cached results for one particular query. For example:

```
g.with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

That query clears the cache for the query key, `g.V().has('genre','drama').in('likes')`, and returns new results for that query.

You can also combine `invalidateResultCacheKey` with `enableResultCache` or `enableResultCacheWithTTL`. For example, the following query clears the current cached results, caches new results, and returns them:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCacheKey', true)
 .V().has('genre','drama').in('likes')
```

### Using `invalidateResultCache`
<a name="using-invalidateResultCache"></a>

You can use the `invalidateResultCache` query hint to clear all cached results in the query result cache. For example:

```
g.with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

That query clears the entire result cache and returns new results for the query.

You can also combine `invalidateResultCache` with `enableResultCache` or `enableResultCacheWithTTL`. For example, the following query clears the entire results cache, caches new results for this query, and returns them:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#invalidateResultCache', true)
 .V().has('genre','drama').in('likes')
```

## Paginating cached query results
<a name="gremlin-results-cache-paginating"></a>

Suppose you have already cached a large number of results like this:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes')
```

Now suppose you issue the following range query:

```
g.with('Neptune#enableResultCache', true)
 .V().has('genre','drama').in('likes').range(0,10)
```

Neptune first looks for the full cache key, namely `g.V().has('genre','drama').in('likes').range(0,10)`. If that key doesn't exist, Neptune next looks to see if there is a key for that query string without the range (namely `g.V().has('genre','drama').in('likes')`). When it finds that key, Neptune then fetches the first ten results from its cache, as the range specifies.
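This two-step lookup can be sketched as follows. It is a string-based toy model; Neptune's real cache keys are derived from the bytecode representation:

```python
# Toy model of the range-fallback lookup; illustrative only.
import re

RANGE_SUFFIX = re.compile(r"\.range\((\d+),\s*(\d+)\)$")

def lookup(cache, query):
    if query in cache:                       # 1. exact key, including the range
        return cache[query]
    match = RANGE_SUFFIX.search(query)
    if match:
        base = query[:match.start()]
        if base in cache:                    # 2. key without the trailing range
            lo, hi = int(match.group(1)), int(match.group(2))
            return cache[base][lo:hi]        # serve the slice from the cache
    return None                              # cache miss
```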

**Note**  
If you use the `invalidateResultCacheKey` query hint with a query that ends in a range and Neptune doesn't find an exact match for the query with the range, it clears the cache for the query without the range instead.

### Using `numResultsCached` with `.iterate()`
<a name="gremlin-results-cache-paginating-numResultsCached"></a>

Using the `numResultsCached` query hint, you can populate the results cache without returning all the results being cached, which can be useful when you prefer to paginate a large number of results.

The `numResultsCached` query hint only works with queries that end with `iterate()`.

For example, if you want to cache the first 50 results of the sample query:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').iterate()
```

In this case the query key in the cache is: `g.with("Neptune#numResultsCached", 50).V().has('genre','drama').in('likes')`. You can now retrieve the first ten of the cached results with this query:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').range(0, 10)
```

And, you can retrieve the next ten results from the query as follows:

```
g.with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').range(10, 20)
```

Don't forget to include the `numResultsCached` hint! It is an essential part of the query key and must therefore be present in order to access the cached results.

**Some things to keep in mind when using `numResultsCached`**
+ **The number you supply with `numResultsCached` is applied at the end of the query.**   This means, for example, that the following query actually caches results in the range `(1000, 1500)`:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).iterate()
  ```
+ **The number you supply with `numResultsCached` specifies the maximum number of results to cache.**   This means, for example, that the following query actually caches results in the range `(1000, 2000)`:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 100000)
   .V().range(1000, 2000).iterate()
  ```
+ **Results cached by queries that end with `.range().iterate()` have their own range.**   For example, suppose you cache results using a query like this:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).iterate()
  ```

  To retrieve the first 100 results from the cache, you would write a query like this:

  ```
  g.with("Neptune#enableResultCache", true)
   .with("Neptune#numResultsCached", 500)
   .V().range(1000, 2000).range(0, 100)
  ```

  Those hundred results would be equivalent to results from the base query in the range `(1000, 1100)`.
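The arithmetic in the bullets above can be captured in a small helper. This is a sketch of the described behavior, not a Neptune API:

```python
# The numResultsCached cap applies after the query's own range(), and caps
# (rather than pads) the number of results cached.
def cached_range(range_low, range_high, num_results_cached):
    """Return the (low, high) range of base-query results that get cached."""
    count = min(range_high - range_low, num_results_cached)
    return (range_low, range_low + count)
```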

## The query cache keys used to locate cached results
<a name="gremlin-results-cache-query-keys"></a>

After the results of a query have been cached, subsequent queries with the same *query cache key* retrieve results from the cache rather than generating new ones. The query cache key of a query is evaluated as follows:

1. All the cache-related query hints are ignored, except for `numResultsCached`.

1. A final `iterate()` step is ignored.

1. The rest of the query is ordered according to its bytecode representation.

The resulting string is matched against an index of the query results already in the cache to determine whether there is a cache hit for the query.

For example, take this query:

```
g.withSideEffect('Neptune#typePromotion', false).with("Neptune#enableResultCache", true)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes').iterate()
```

It will be stored as the bytecode version of this:

```
g.withSideEffect('Neptune#typePromotion', false)
 .with("Neptune#numResultsCached", 50)
 .V().has('genre','drama').in('likes')
```
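The derivation steps can be sketched as a string transformation. This is illustrative only; Neptune normalizes the bytecode representation, not the raw query string:

```python
# Toy model of cache-key derivation; illustrative only.
import re

CACHE_HINTS = (
    "Neptune#enableResultCache",
    "Neptune#enableResultCacheWithTTL",
    "Neptune#invalidateResultCacheKey",
    "Neptune#invalidateResultCache",
    "Neptune#noCacheExceptions",
)

def cache_key(query):
    # 1. Drop cache-related hints, except numResultsCached.
    for hint in CACHE_HINTS:
        pattern = r"\.with\(['\"]" + re.escape(hint) + r"['\"][^)]*\)"
        query = re.sub(pattern, "", query)
    # 2. Ignore a final iterate() step.
    query = re.sub(r"\.iterate\(\)$", "", query)
    return query
```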

## Exceptions related to the results cache
<a name="gremlin-results-cache-exceptions"></a>

If the results of a query that you are trying to cache are too large to fit in the cache memory even after removing everything previously cached, Neptune raises a `QueryLimitExceededException` fault. No results are returned, and the exception generates the following error message:

```
The result size is larger than the allocated cache,
      please refer to results cache best practices for options to rerun the query.
```

You can suppress this message using the `noCacheExceptions` query hint, as follows:

```
g.with('Neptune#enableResultCache', true)
 .with('Neptune#noCacheExceptions', true)
 .V().has('genre','drama').in('likes')
```

# Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps
<a name="gremlin-efficient-upserts"></a>

An upsert (or conditional insert) reuses a vertex or edge if it already exists, or creates it if it doesn't. Efficient upserts can make a significant difference in the performance of Gremlin queries.

Upserts allow you to write idempotent insert operations: no matter how many times you run such an operation, the overall outcome is the same. This is useful in highly concurrent write scenarios where concurrent modifications to the same part of the graph can force one or more transactions to roll back with a `ConcurrentModificationException`, thereby necessitating retries.

For example, the following query upserts a vertex by using the supplied `Map` to first try to find a vertex with a `T.id` of `"v-1"`. If that vertex is found, it is returned. If it is not found, a vertex with that ID and the properties specified in the `onCreate` clause is created.

```
g.mergeV([(id):'v-1']).
  option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org'])
```

## Batching upserts to improve throughput
<a name="gremlin-upserts-batching"></a>

For high throughput write scenarios, you can chain `mergeV()` and `mergeE()` steps together to upsert vertices and edges in batches. Batching reduces the transactional overhead of upserting large numbers of vertices and edges. You can then further improve throughput by upserting batch requests in parallel using multiple clients.

As a rule of thumb we recommend upserting approximately 200 records per batch request. A record is a single vertex or edge label or property. A vertex with a single label and 4 properties, for example, creates 5 records. An edge with a label and a single property creates 2 records. If you wanted to upsert batches of vertices, each with a single label and 4 properties, you should start with a batch size of 40, because `200 / (1 + 4) = 40`.
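The rule-of-thumb arithmetic above can be expressed as a small helper. Treat the result as a starting point to tune, not a hard rule or a Neptune API:

```python
# Suggested batch size from the ~200-records-per-batch rule of thumb.
TARGET_RECORDS_PER_BATCH = 200

def records_per_element(num_labels, num_properties):
    """A record is a single label or property on a vertex or edge."""
    return num_labels + num_properties

def suggested_batch_size(num_labels, num_properties):
    """How many elements of this shape fit in one ~200-record batch."""
    return TARGET_RECORDS_PER_BATCH // records_per_element(num_labels, num_properties)
```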

You can experiment with the batch size. 200 records per batch is a good starting point, but the ideal batch size may be higher or lower depending on your workload. Note, however, that Neptune may limit the overall number of Gremlin steps per request. This limit is not documented, but to be on the safe side, try to ensure that your requests contain no more than 1,500 Gremlin steps. Neptune may reject large batch requests with more than 1,500 steps.

To increase throughput, you can upsert batches in parallel using multiple clients (see [Creating Efficient Multithreaded Gremlin Writes](best-practices-gremlin-multithreaded-writes.md)). The number of clients should be the same as the number of worker threads on your Neptune writer instance, which is typically 2 x the number of vCPUs on the server. For instance, an `r5.8xlarge` instance has 32 vCPUs and 64 worker threads. For high-throughput write scenarios using an `r5.8xlarge`, you would use 64 clients writing batch upserts to Neptune in parallel.

Each client should submit a batch request and wait for the request to complete before submitting another request. Although the multiple clients run in parallel, each individual client submits requests in a serial fashion. This ensures that the server is supplied with a steady stream of requests that occupy all the worker threads without flooding the server-side request queue (see [Sizing DB instances in a Neptune DB cluster](feature-overview-db-clusters.md#feature-overview-sizing-instances)).
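This submission pattern can be sketched with a worker pool in which each worker submits serially. In the sketch below, `submit_batch` is a hypothetical stand-in for a real Gremlin client call:

```python
# Sketch of parallel clients, each submitting batches serially.
# submit_batch stands in for a blocking Gremlin client request.
import queue
import threading

def upsert_in_parallel(batches, num_clients, submit_batch):
    work = queue.Queue()
    for batch in batches:
        work.put(batch)

    def client_loop():
        # Each "client" submits one batch at a time and waits for it
        # to complete before taking the next one.
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return
            submit_batch(batch)   # blocks until this request completes

    threads = [threading.Thread(target=client_loop) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Sizing `num_clients` to match the writer instance's worker threads (2 x vCPUs) keeps all workers busy without flooding the server-side request queue.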

## Try to avoid steps that generate multiple traversers
<a name="gremlin-upserts-single-traverser"></a>

When a Gremlin step executes, it takes an incoming traverser, and emits one or more output traversers. The number of traversers emitted by a step determines the number of times the next step is executed.

Typically, when performing batch operations you want each operation, such as upsert vertex A, to execute once, so that the sequence of operations looks like this: upsert vertex A, then upsert vertex B, then upsert vertex C, and so on. As long as a step creates or modifies only one element, it emits only one traverser, and the steps that represent the next operation are executed only once. If, on the other hand, an operation creates or modifies more than one element, it emits multiple traversers, which in turn cause the subsequent steps to be executed multiple times, once per emitted traverser. This can result in the database performing unnecessary additional work, and in some cases can result in the creation of unwanted additional vertices, edges or property values.

An example of how things can go wrong is with a query like `g.V().addV()`. This simple query adds a vertex for every vertex found in the graph, because `V()` emits a traverser for each vertex in the graph and each of those traversers triggers a call to `addV()`.

See [Mixing upserts and inserts](#gremlin-upserts-and-inserts) for ways to deal with operations that can emit multiple traversers.
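The fan-out behavior can be modeled with a toy calculation of how many times each chained step executes:

```python
# Toy model: each step maps one incoming traverser to some number of
# outgoing traversers, so the next step runs once per output traverser.
def traversers_after(steps, initial=1):
    """steps: per-traverser output counts for each chained step.

    Returns (executions per step, final traverser count)."""
    count = initial
    executions = []
    for outputs_per_traverser in steps:
        executions.append(count)   # the step runs once per incoming traverser
        count *= outputs_per_traverser
    return executions, count
```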

## Upserting vertices
<a name="gremlin-upserts-vertices"></a>

The `mergeV()` step is specifically designed for upserting vertices. It takes as an argument a `Map` that represents elements to match for existing vertices in the graph, and if an element is not found, uses that `Map` to create a new vertex. The step also allows you to alter the behavior in the event of a creation or a match, where the `option()` modulator can be applied with `Merge.onCreate` and `Merge.onMatch` tokens to control those respective behaviors. See the TinkerPop [Reference Documentation](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step) for further information about how to use this step.

You can use a vertex ID to determine whether a specific vertex exists. This is the preferred approach, because Neptune optimizes upserts for highly concurrent use cases around IDs. As an example, the following query creates a vertex with a given vertex ID if it doesn't already exist, or reuses it if it does:

```
g.mergeV([(T.id): 'v-1']).
    option(onCreate, [(T.label): 'PERSON', email: 'person-1@example.org', age: 21]).
    option(onMatch, [age: 22]).
  id()
```

Note that this query ends with an `id()` step. While not strictly necessary for upserting the vertex, appending an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the vertex properties back to the client, which helps reduce the locking cost of the query.

Alternatively, you can use a vertex property to identify a vertex:

```
g.mergeV([email: 'person-1@example.org']).
    option(onCreate, [(T.label): 'PERSON', age: 21]).
    option(onMatch, [age: 22]).
  id()
```

If possible, use your own user-supplied IDs to create vertices, and use these IDs to determine whether a vertex exists during an upsert operation. This lets Neptune optimize the upserts. An ID-based upsert can be significantly more efficient than a property-based upsert when concurrent modifications are common.

### Chaining vertex upserts
<a name="gremlin-upserts-vertices-chaining"></a>

You can chain vertex upserts together to insert them in a batch:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .id()
```

Alternatively, you can use this `mergeV()` syntax:

```
g.mergeV([(T.id): 'v-1', (T.label): 'PERSON', email: 'person-1@example.org']).
  mergeV([(T.id): 'v-2', (T.label): 'PERSON', email: 'person-2@example.org']).
  mergeV([(T.id): 'v-3', (T.label): 'PERSON', email: 'person-3@example.org'])
```

However, because this form of the query includes elements in the search criteria that are superfluous to the basic lookup by `id`, it isn't as efficient as the previous query.

## Upserting edges
<a name="gremlin-upserts-edges"></a>

The `mergeE()` step is specifically designed for upserting edges. It takes as an argument a `Map` that represents elements to match for existing edges in the graph, and if an element is not found, uses that `Map` to create a new edge. The step also allows you to alter the behavior in the event of a creation or a match, where the `option()` modulator can be applied with `Merge.onCreate` and `Merge.onMatch` tokens to control those respective behaviors. See the TinkerPop [Reference Documentation](https://tinkerpop.apache.org/docs/current/reference/#mergeedge-step) for further information about how to use this step.

You can use edge IDs to upsert edges in the same way you upsert vertices using custom vertex IDs. Again, this is the preferred approach because it allows Neptune to optimize the query. For example, the following query creates an edge based on its edge ID if it doesn't already exist, or reuses it if it does. The query also uses the IDs of the `Direction.from` and `Direction.to` vertices if it needs to create a new edge:

```
g.mergeE([(T.id): 'e-1']).
    option(onCreate, [(from): 'v-1', (to): 'v-2', weight: 1.0]).
    option(onMatch, [weight: 0.5]).
  id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the edge, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the edge properties back to the client, which helps reduce the locking cost of the query.

Many applications use custom vertex IDs, but leave Neptune to generate edge IDs. If you don't know the ID of an edge, but you do know the `from` and `to` vertex IDs, you can use this kind of query to upsert an edge:

```
g.mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  id()
```

All vertices referenced by `mergeE()` must exist for the step to create the edge.

### Chaining edge upserts
<a name="gremlin-upserts-edges-chaining"></a>

As with vertex upserts, it's straightforward to chain `mergeE()` steps together for batch requests:

```
g.mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  mergeE([(from): 'v-2', (to): 'v-3', (T.label): 'KNOWS']).
  mergeE([(from): 'v-3', (to): 'v-4', (T.label): 'KNOWS']).
  id()
```

## Combining vertex and edge upserts
<a name="gremlin-upserts-vertexes-and-edges"></a>

Sometimes you may want to upsert both vertices and the edges that connect them. You can mix the batch examples presented here. The following example upserts 3 vertices and 2 edges:

```
g.mergeV([(id):'v-1']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org']).
  mergeV([(id):'v-2']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-2@example.org']).
  mergeV([(id):'v-3']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-3@example.org']).
  mergeE([(from): 'v-1', (to): 'v-2', (T.label): 'KNOWS']).
  mergeE([(from): 'v-2', (to): 'v-3', (T.label): 'KNOWS']).
 id()
```

## Mixing upserts and inserts
<a name="gremlin-upserts-and-inserts"></a>

Upserts typically proceed one element at a time. If you stick to the upsert patterns presented here, each upsert operation emits a single traverser, which causes the subsequent operation to be executed just once.

However, sometimes you may want to mix upserts with inserts. This can be the case, for example, if you use edges to represent instances of actions or events. A request might use upserts to ensure that all necessary vertices exist, and then use inserts to add edges. With requests of this kind, pay attention to the potential number of traversers being emitted from each operation.

Consider the following example, which mixes upserts and inserts to add edges that represent events into the graph:

```
// Fully optimized, but inserts too many edges
g.mergeV([(id):'v-1']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-1@example.org']).
  mergeV([(id):'v-2']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-2@example.org']).
  mergeV([(id):'v-3']).
    option(onCreate, [(label): 'PERSON', 'email': 'person-3@example.org']).
  mergeV([(T.id): 'c-1', (T.label): 'CITY', name: 'city-1']).
  V('v-1', 'v-2').
  addE('FOLLOWED').to(V('v-1')).
  V('v-1', 'v-2', 'v-3').
  addE('VISITED').to(V('c-1')).
  id()
```

The query should insert 5 edges: 2 FOLLOWED edges and 3 VISITED edges. However, the query as written inserts 8 edges: 2 FOLLOWED and 6 VISITED. The reason for this is that the operation that inserts the 2 FOLLOWED edges emits 2 traversers, causing the subsequent insert operation, which inserts 3 edges, to be executed twice.
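The traverser arithmetic can be checked with a short calculation (plain Python, modeling only the traverser counts, not actual Gremlin execution):

```python
# Model how traversers multiply through the mis-ordered insert query.
followed_sources = 2                   # the first V() step emits one traverser per vertex
followed_edges = followed_sources * 1  # addE('FOLLOWED') adds one edge per incoming traverser

# Each of the 2 traversers emitted by the FOLLOWED addE() re-runs the next V()
# step, which itself emits 3 traversers per execution.
visited_sources = followed_edges * 3
visited_edges = visited_sources * 1    # addE('VISITED') adds one edge per incoming traverser

total_edges = followed_edges + visited_edges
print(followed_edges, visited_edges, total_edges)  # 2 6 8
```

The intended result was 2 + 3 = 5 edges; the model shows how the extra multiplication produces 8.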

The fix is to add a `fold()` step after each operation that can potentially emit more than one traverser:

```
g.mergeV([(T.id): 'v-1', (T.label): 'PERSON', email: 'person-1@example.org']).
  mergeV([(T.id): 'v-2', (T.label): 'PERSON', email: 'person-2@example.org']).
  mergeV([(T.id): 'v-3', (T.label): 'PERSON', email: 'person-3@example.org']).
  mergeV([(T.id): 'c-1', (T.label): 'CITY', name: 'city-1']).
  V('v-1', 'v-2').
  addE('FOLLOWED').
    to(V('v-1')).
  fold().
  V('v-1', 'v-2', 'v-3').
  addE('VISITED').
    to(V('c-1')).
  id()
```

Here we’ve inserted a `fold()` step after the operation that inserts FOLLOWED edges. This results in a single traverser, which then causes the subsequent operation to be executed only once.

The downside of this approach is that the query is now not fully optimized, because `fold()` is not optimized. The insert operation that follows `fold()` will now also not be optimized.

If you need to use `fold()` to reduce the number of traversers on behalf of subsequent steps, try to order your operations so that the least expensive ones occupy the non-optimized part of the query.

## Setting Cardinality
<a name="gremlin-upserts-setting-cardinality"></a>

The default cardinality for vertex properties in Neptune is `set`, which means that when using `mergeV()`, the values supplied in the map are all given that cardinality. To use `single` cardinality, you must request it explicitly. Starting in TinkerPop 3.7.0, a new syntax allows the cardinality to be supplied as part of the map, as shown in the following example:

```
g.mergeV([(T.id): '1234']).
  option(onMatch, ['age': single(20), 'name': single('alice'), 'city': set('miami')])
```

 Alternatively, you may set the cardinality as a default for that `option` as follows: 

```
// age and name are set to single cardinality by default
g.mergeV([(T.id): '1234']).
  option(onMatch, ['age': 22, 'name': 'alice', 'city': set('boston')], single)
```

 There are fewer options for setting cardinality in `mergeV()` prior to version 3.7.0. The general approach is to fall back to the `property()` step as follows: 

```
g.mergeV([(T.id): '1234']). 
  option(onMatch, sideEffect(property(single,'age', 20).
  property(set,'city','miami')).constant([:]))
```

**Note**  
This approach only works when `mergeV()` is used as a start step. You therefore can't chain `mergeV()` steps that use this syntax within a single traversal, because a `mergeV()` that receives a graph element as its incoming traverser produces an error. In this case, break up your `mergeV()` calls into multiple requests, each of which can begin with a start step.

# Making efficient Gremlin upserts with `fold()/coalesce()/unfold()`
<a name="gremlin-efficient-upserts-pre-3.6"></a>

An upsert (or conditional insert) reuses a vertex or edge if it already exists, or creates it if it doesn't. Efficient upserts can make a significant difference in the performance of Gremlin queries.

This page shows how to use the `fold()/coalesce()/unfold()` Gremlin pattern to make efficient upserts. However, with the release of TinkerPop 3.6.x, introduced in Neptune engine version [1.2.1.0](engine-releases-1.2.1.0.md), the new `mergeV()` and `mergeE()` steps are preferable in most cases. The `fold()/coalesce()/unfold()` pattern described here may still be useful in some complex situations, but in general use `mergeV()` and `mergeE()` if you can, as described in [Making efficient upserts with Gremlin `mergeV()` and `mergeE()` steps](gremlin-efficient-upserts.md).

Upserts allow you to write idempotent insert operations: no matter how many times you run such an operation, the overall outcome is the same. This is useful in highly concurrent write scenarios where concurrent modifications to the same part of the graph can force one or more transactions to roll back with a `ConcurrentModificationException`, thereby necessitating a retry.

For example, the following query upserts a vertex by first looking for the specified vertex in the dataset, and then folding the results into a list. In the first traversal supplied to the `coalesce()` step, the query then unfolds this list. If the unfolded list is not empty, the results are emitted from the `coalesce()`. If, however, the `unfold()` returns an empty collection because the vertex does not currently exist, `coalesce()` moves on to evaluate the second traversal with which it has been supplied, and in this second traversal the query creates the missing vertex.

```
g.V('v-1').fold()
          .coalesce(
             unfold(),
             addV('Person').property(id, 'v-1')
                           .property('email', 'person-1@example.org')
           )
```

## Use an optimized form of `coalesce()` for upserts
<a name="gremlin-upserts-pre-3.6-coalesce"></a>

Neptune can optimize the `fold().coalesce(unfold(), ...)` idiom to make high-throughput updates, but this optimization only works if both parts of the `coalesce()` return either a vertex or an edge but nothing else. If you try to return something different, such as a property, from any part of the `coalesce()`, the Neptune optimization does not occur. The query may succeed, but it will not perform as well as an optimized version, particularly against large datasets.

Because unoptimized upsert queries increase execution times and reduce throughput, it's worth using the Gremlin `explain` endpoint to determine whether an upsert query is fully optimized. When reviewing `explain` plans, look for lines that begin with `+ not converted into Neptune steps` and `WARNING: >>`. For example:

```
+ not converted into Neptune steps: [FoldStep, CoalesceStep([[UnfoldStep], [AddEdgeSte...
WARNING: >> FoldStep << is not supported natively yet
```

These warnings can help you identify the parts of a query that are preventing it from being fully optimized.

Sometimes it isn't possible to optimize a query fully. In these situations you should try to put the steps that cannot be optimized at the end of the query, thereby allowing the engine to optimize as many steps as possible. This technique is used in some of the batch upsert examples, where all optimized upserts for a set of vertices or edges are performed before any additional, potentially unoptimized modifications are applied to the same vertices or edges.

## Batching upserts to improve throughput
<a name="gremlin-upserts-pre-3.6-batching"></a>

For high throughput write scenarios, you can chain upsert steps together to upsert vertices and edges in batches. Batching reduces the transactional overhead of upserting large numbers of vertices and edges. You can then further improve throughput by upserting batch requests in parallel using multiple clients.

As a rule of thumb we recommend upserting approximately 200 records per batch request. A record is a single vertex or edge label or property. A vertex with a single label and 4 properties, for example, creates 5 records. An edge with a label and a single property creates 2 records. If you wanted to upsert batches of vertices, each with a single label and 4 properties, you should start with a batch size of 40, because `200 / (1 + 4) = 40`.
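The record arithmetic above can be expressed as a small helper (plain Python; the 200-records-per-batch figure is this section's rule-of-thumb starting point, not a hard limit):

```python
TARGET_RECORDS_PER_BATCH = 200  # rule-of-thumb starting point from this section

def batch_size(num_labels: int, num_properties: int) -> int:
    """Number of elements per batch so that each batch holds ~200 records.

    A record is a single label or property, so an element with one label
    and four properties contributes five records.
    """
    records_per_element = num_labels + num_properties
    return TARGET_RECORDS_PER_BATCH // records_per_element

# A vertex with one label and four properties -> batches of 40 vertices.
print(batch_size(1, 4))  # 40
# An edge with one label and one property -> batches of 100 edges.
print(batch_size(1, 1))  # 100
```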

You can experiment with the batch size. 200 records per batch is a good starting point, but the ideal batch size may be higher or lower depending on your workload. Note, however, that Neptune may limit the overall number of Gremlin steps per request. This limit is not documented, but to be safe, try to ensure that your requests contain no more than 1500 Gremlin steps, because Neptune may reject larger batch requests.

To increase throughput, you can upsert batches in parallel using multiple clients (see [Creating Efficient Multithreaded Gremlin Writes](best-practices-gremlin-multithreaded-writes.md)). The number of clients should be the same as the number of worker threads on your Neptune writer instance, which is typically 2 x the number of vCPUs on the server. For instance, an `r5.8xlarge` instance has 32 vCPUs and 64 worker threads. For high-throughput write scenarios using an `r5.8xlarge`, you would use 64 clients writing batch upserts to Neptune in parallel.

Each client should submit a batch request and wait for the request to complete before submitting another request. Although the multiple clients run in parallel, each individual client submits requests in a serial fashion. This ensures that the server is supplied with a steady stream of requests that occupy all the worker threads without flooding the server-side request queue (see [Sizing DB instances in a Neptune DB cluster](feature-overview-db-clusters.md#feature-overview-sizing-instances)).

## Try to avoid steps that generate multiple traversers
<a name="gremlin-upserts-pre-3.6-single-traverser"></a>

When a Gremlin step executes, it takes an incoming traverser, and emits one or more output traversers. The number of traversers emitted by a step determines the number of times the next step is executed.

Typically, when performing batch operations you want each operation, such as upsert vertex A, to execute once, so that the sequence of operations looks like this: upsert vertex A, then upsert vertex B, then upsert vertex C, and so on. As long as a step creates or modifies only one element, it emits only one traverser, and the steps that represent the next operation are executed only once. If, on the other hand, an operation creates or modifies more than one element, it emits multiple traversers, which in turn cause the subsequent steps to be executed multiple times, once per emitted traverser. This can result in the database performing unnecessary additional work, and in some cases can result in the creation of unwanted additional vertices, edges or property values.

An example of how things can go wrong is with a query like `g.V().addV()`. This simple query adds a vertex for every vertex found in the graph, because `V()` emits a traverser for each vertex in the graph and each of those traversers triggers a call to `addV()`.

See [Mixing upserts and inserts](#gremlin-upserts-pre-3.6-and-inserts) for ways to deal with operations that can emit multiple traversers.

## Upserting vertices
<a name="gremlin-upserts-pre-3.6-vertices"></a>

You can use a vertex ID to determine whether a corresponding vertex exists. This is the preferred approach, because Neptune optimizes upserts for highly concurrent use cases around IDs. As an example, the following query creates a vertex with a given vertex ID if it doesn't already exist, or reuses it if it does:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .id()
```

Note that this query ends with an `id()` step. While not strictly necessary for the purpose of upserting the vertex, adding an `id()` step to the end of an upsert query ensures that the server doesn't serialize all the vertex properties back to the client, which helps reduce the locking cost of the query.

Alternatively, you can use a vertex property to determine whether the vertex exists:

```
g.V()
 .hasLabel('Person')
 .has('email', 'person-1@example.org')
 .fold()
 .coalesce(unfold(),
           addV('Person').property('email', 'person-1@example.org'))
 .id()
```

If possible, use your own user-supplied IDs to create vertices, and use these IDs to determine whether a vertex exists during an upsert operation. This lets Neptune optimize upserts around the IDs. An ID-based upsert can be significantly more efficient than a property-based upsert in highly concurrent modification scenarios.

### Chaining vertex upserts
<a name="gremlin-upserts-pre-3.6-vertices-chaining"></a>

You can chain vertex upserts together to insert them in a batch:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .id()
```

## Upserting edges
<a name="gremlin-upserts-pre-3.6-edges"></a>

You can use edge IDs to upsert edges in the same way you upsert vertices using custom vertex IDs. Again, this is the preferred approach because it allows Neptune to optimize the query. For example, the following query creates an edge based on its edge ID if it doesn't already exist, or reuses it if it does. The query also uses the IDs of the `from` and `to` vertices if it needs to create a new edge.

```
g.E('e-1')
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2'))
                        .property(id, 'e-1'))
 .id()
```

Many applications use custom vertex IDs, but leave Neptune to generate edge IDs. If you don't know the ID of an edge, but you do know the `from` and `to` vertex IDs, you can use this formulation to upsert an edge:

```
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2')))
 .id()
```

Note that the vertex step in the `where()` clause should be `inV()` (or `outV()` if you've used `inE()` to find the edge), not `otherV()`. Do not use `otherV()` here, or the query will not be optimized and performance will suffer. For example, Neptune would not optimize the following query:

```
// Unoptimized upsert, because of otherV()
g.V('v-1')
 .outE('KNOWS')
 .where(otherV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           addE('KNOWS').from(V('v-1'))
                        .to(V('v-2')))
 .id()
```

If you don't know the edge or vertex IDs up front, you can upsert using vertex properties:

```
g.V()
 .hasLabel('Person')
 .has('name', 'person-1')
 .outE('LIVES_IN')
 .where(inV().hasLabel('City').has('name', 'city-1'))
 .fold()
 .coalesce(unfold(),
           addE('LIVES_IN').from(V().hasLabel('Person')
                                    .has('name', 'person-1'))
                           .to(V().hasLabel('City')
                                  .has('name', 'city-1')))
 .id()
```

As with vertex upserts, it's preferable to use ID-based edge upserts using either an edge ID or `from` and `to` vertex IDs, rather than property-based upserts, so that Neptune can fully optimize the upsert.

### Checking for `from` and `to` vertex existence
<a name="gremlin-upserts-pre-3.6-edges-checking"></a>

Note the construction of the steps that create a new edge: `addE().from().to()`. This construction ensures that the query checks the existence of both the `from` and the `to` vertex. If either of these does not exist, the query returns an error as follows:

```
{
  "detailedMessage": "Encountered a traverser that does not map to a value for child...
  "code": "IllegalArgumentException",
  "requestId": "..."
}
```

If it's possible that either the `from` or the `to` vertex doesn't exist, you should attempt to upsert them before upserting the edge between them. See [Combining vertex and edge upserts](#gremlin-upserts-pre-3.6-vertexes-and-edges).

There's an alternative construction for creating an edge that you shouldn't use: `V().addE().to()`. It only adds an edge if the `from` vertex exists. If the `to` vertex doesn't exist, the query generates an error, as described previously, but if the `from` vertex doesn't exist, it silently fails to insert an edge, without generating any error. For example, the following upsert completes without upserting an edge if the `from` vertex doesn't exist:

```
// Will not insert edge if from vertex does not exist
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2')))
 .id()
```

### Chaining edge upserts
<a name="gremlin-upserts-pre-3.6-edges-chaining"></a>

If you want to chain edge upserts together to create a batch request, you must begin each upsert with a vertex lookup, even if you already know the edge IDs.

If you do already know the IDs of the edges you want to upsert, and the IDs of the `from` and `to` vertices, you can use this formulation:

```
g.V('v-1')
 .outE('KNOWS')
 .hasId('e-1')
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2'))
                   .property(id, 'e-1'))
 .V('v-3')
 .outE('KNOWS')
 .hasId('e-2').fold()
 .coalesce(unfold(),
           V('v-3').addE('KNOWS')
                   .to(V('v-4'))
                   .property(id, 'e-2'))
 .V('v-5')
 .outE('KNOWS')
 .hasId('e-3')
 .fold()
 .coalesce(unfold(),
           V('v-5').addE('KNOWS')
                   .to(V('v-6'))
                   .property(id, 'e-3'))
 .id()
```

Perhaps the most common batch edge upsert scenario is that you know the `from` and `to` vertex IDs, but don't know the IDs of the edges you want to upsert. In that case, use the following formulation:

```
g.V('v-1')
 .outE('KNOWS')
 .where(inV().hasId('v-2'))
 .fold()
 .coalesce(unfold(),
           V('v-1').addE('KNOWS')
                   .to(V('v-2')))

 .V('v-3')
 .outE('KNOWS')
 .where(inV().hasId('v-4'))
 .fold()
 .coalesce(unfold(),
           V('v-3').addE('KNOWS')
                   .to(V('v-4')))
 .V('v-5')
 .outE('KNOWS')
 .where(inV().hasId('v-6'))
 .fold()
 .coalesce(unfold(),
           V('v-5').addE('KNOWS').to(V('v-6')))
 .id()
```

If you know the IDs of the edges you want to upsert, but don’t know the IDs of the `from` and `to` vertices (this is unusual), you can use this formulation:

```
g.V()
 .hasLabel('Person')
 .has('email', 'person-1@example.org')
 .outE('KNOWS')
 .hasId('e-1')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-1@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-2@example.org'))
               .property(id, 'e-1'))
 .V()
 .hasLabel('Person')
 .has('email', 'person-3@example.org')
 .outE('KNOWS')
 .hasId('e-2')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-3@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-4@example.org'))
              .property(id, 'e-2'))
 .V()
 .hasLabel('Person')
 .has('email', 'person-5@example.org')
 .outE('KNOWS')
 .hasId('e-3')
 .fold()
 .coalesce(unfold(),
           V().hasLabel('Person')
              .has('email', 'person-5@example.org')
              .addE('KNOWS')
              .to(V().hasLabel('Person')
                     .has('email', 'person-6@example.org'))
               .property(id, 'e-3'))
 .id()
```

## Combining vertex and edge upserts
<a name="gremlin-upserts-pre-3.6-vertexes-and-edges"></a>

Sometimes you may want to upsert both vertices and the edges that connect them. You can mix the batch examples presented here. The following example upserts 3 vertices and 2 edges:

```
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                         .property('email', 'person-2@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1')
 .outE('LIVES_IN')
 .where(inV().hasId('c-1'))
 .fold()
 .coalesce(unfold(),
           V('p-1').addE('LIVES_IN')
                   .to(V('c-1')))
 .V('p-2')
 .outE('LIVES_IN')
 .where(inV().hasId('c-1'))
 .fold()
 .coalesce(unfold(),
           V('p-2').addE('LIVES_IN')
                   .to(V('c-1')))
 .id()
```

## Mixing upserts and inserts
<a name="gremlin-upserts-pre-3.6-and-inserts"></a>

Upserts typically proceed one element at a time. If you stick to the upsert patterns presented here, each upsert operation emits a single traverser, which causes the subsequent operation to be executed just once.

However, sometimes you may want to mix upserts with inserts. This can be the case, for example, if you use edges to represent instances of actions or events. A request might use upserts to ensure that all necessary vertices exist, and then use inserts to add edges. With requests of this kind, pay attention to the potential number of traversers being emitted from each operation.

Consider the following example, which mixes upserts and inserts to add edges that represent events into the graph:

```
// Fully optimized, but inserts too many edges
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                         .property('email', 'person-2@example.org'))
 .V('p-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-3')
                         .property('email', 'person-3@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1', 'p-2')
 .addE('FOLLOWED')
 .to(V('p-1'))
 .V('p-1', 'p-2', 'p-3')
 .addE('VISITED')
 .to(V('c-1'))
 .id()
```

The query should insert 5 edges: 2 FOLLOWED edges and 3 VISITED edges. However, the query as written inserts 8 edges: 2 FOLLOWED and 6 VISITED. The reason for this is that the operation that inserts the 2 FOLLOWED edges emits 2 traversers, causing the subsequent insert operation, which inserts 3 edges, to be executed twice.

The fix is to add a `fold()` step after each operation that can potentially emit more than one traverser:

```
g.V('p-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-1')
                         .property('email', 'person-1@example.org'))
 .V('p-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-2')
                         .property('email', 'person-2@example.org'))
 .V('p-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'p-3')
                         .property('email', 'person-3@example.org'))
 .V('c-1')
 .fold()
 .coalesce(unfold(),
           addV('City').property(id, 'c-1')
                       .property('name', 'city-1'))
 .V('p-1', 'p-2')
 .addE('FOLLOWED')
 .to(V('p-1'))
 .fold()
 .V('p-1', 'p-2', 'p-3')
 .addE('VISITED')
 .to(V('c-1'))
 .id()
```

Here we’ve inserted a `fold()` step after the operation that inserts FOLLOWED edges. This results in a single traverser, which then causes the subsequent operation to be executed only once.

The downside of this approach is that the query is now not fully optimized, because `fold()` is not optimized. The insert operation that follows `fold()` will now not be optimized.

If you need to use `fold()` to reduce the number of traversers on behalf of subsequent steps, try to order your operations so that the least expensive ones occupy the non-optimized part of the query.

## Upserts that modify existing vertices and edges
<a name="gremlin-upserts-pre-3.6-that-modify"></a>

Sometimes you want to create a vertex or edge if it doesn’t exist, and then add or update one of its properties, regardless of whether the element is new or existing.

To add or modify a property, use the `property()` step. Use this step outside the `coalesce()` step. If you try to modify the property of an existing vertex or edge inside the `coalesce()` step, the query may not be optimized by the Neptune query engine.

The following query adds or updates a counter property on each upserted vertex. Each `property()` step has single cardinality to ensure that the new values replace any existing values, rather than being added to a set of existing values.

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .property(single, 'counter', 1)
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .property(single, 'counter', 2)
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .property(single, 'counter', 3)
 .id()
```

If you have a property value, such as a `lastUpdated` timestamp value, that applies to all upserted elements, you can add or update it at the end of the query:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org'))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org'))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org'))
 .V('v-1', 'v-2', 'v-3')
 .property(single, 'lastUpdated', datetime('2020-02-08'))
 .id()
```

If there are additional conditions that determine whether or not a vertex or edge should be further modified, you can use a `has()` step to filter the elements to which a modification will be applied. The following example uses a `has()` step to filter upserted vertices based on the value of their `version` property. The query then sets the `version` property to 3 for any vertex whose `version` is less than 3:

```
g.V('v-1')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-1')
                         .property('email', 'person-1@example.org')
                         .property('version', 3))
 .V('v-2')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-2')
                         .property('email', 'person-2@example.org')
                         .property('version', 3))
 .V('v-3')
 .fold()
 .coalesce(unfold(),
           addV('Person').property(id, 'v-3')
                         .property('email', 'person-3@example.org')
                         .property('version', 3))
 .V('v-1', 'v-2', 'v-3')
 .has('version', lt(3))
 .property(single, 'version', 3)
 .id()
```
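When upserting many elements, it can help to generate these chained queries programmatically. The following is a minimal sketch, not an official Neptune API: a hypothetical helper that builds the batched conditional-upsert query string shown above from a list of `(id, email)` pairs.

```python
# A minimal sketch, not an official Neptune API: generating the batched
# conditional-upsert query string shown above from (id, email) pairs.
def build_upsert_query(people, version=3):
    parts = ["g"]
    for vid, email in people:
        parts.append(
            f".V('{vid}').fold()"
            f".coalesce(unfold(), addV('Person').property(id, '{vid}')"
            f".property('email', '{email}')"
            f".property('version', {version}))"
        )
    ids = ", ".join(f"'{vid}'" for vid, _ in people)
    parts.append(
        f".V({ids}).has('version', lt({version}))"
        f".property(single, 'version', {version}).id()"
    )
    return "".join(parts)

query = build_upsert_query([("v-1", "person-1@example.org"),
                            ("v-2", "person-2@example.org")])
```

The resulting string can then be submitted like any other Gremlin query; generating it keeps the per-element `coalesce()` blocks consistent as the batch size changes.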

# Analyzing Neptune query execution using Gremlin `explain`
<a name="gremlin-explain"></a>

Amazon Neptune has added a Gremlin feature named *explain*. This feature is a self-service tool for understanding the execution approach taken by the Neptune engine. You invoke it by adding an `explain` parameter to an HTTP call that submits a Gremlin query.

The `explain` feature provides information about the logical structure of query execution plans. You can use this information to identify potential evaluation and execution bottlenecks and tune your query, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). You can also use [query hints](gremlin-query-hints.md) to improve query execution plans.

**Topics**
+ [Understanding how Gremlin queries work in Neptune](gremlin-explain-background.md)
+ [Using the Gremlin `explain` API in Neptune](gremlin-explain-api.md)
+ [Gremlin `profile` API in Neptune](gremlin-profile-api.md)
+ [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md)
+ [Native Gremlin step support in Amazon Neptune](gremlin-step-support.md)

# Understanding how Gremlin queries work in Neptune
<a name="gremlin-explain-background"></a>

To take full advantage of the Gremlin `explain` and `profile` reports in Amazon Neptune, it is helpful to understand some background information about Gremlin queries.

**Topics**
+ [Gremlin statements in Neptune](gremlin-explain-background-statements.md)
+ [How Neptune processes Gremlin queries using statement indexes](gremlin-explain-background-indexing-examples.md)
+ [How Gremlin queries are processed in Neptune](gremlin-explain-background-querying.md)

# Gremlin statements in Neptune
<a name="gremlin-explain-background-statements"></a>

Property graph data in Amazon Neptune is composed of four-position (quad) statements. Each of these statements represents an individual atomic unit of property graph data. For more information, see [Neptune Graph Data Model](feature-overview-data-model.md). Similar to the Resource Description Framework (RDF) data model, these four positions are as follows:
+ `subject (S)`
+ `predicate (P)`
+ `object (O)`
+ `graph (G)`

Each statement is an assertion about one or more resources. For example, a statement can assert the existence of a relationship between two resources, or it can attach a property (key-value pair) to some resource.

You can think of the predicate as the verb of the statement, describing the type of relationship or property. The object is the target of the relationship, or the value of the property. The graph position is optional and can be used in many different ways. For the Neptune property graph (PG) data, it is either unused (null graph) or it is used to represent the identifier for an edge. A set of statements with shared resource identifiers creates a graph.
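The four-position model can be sketched as a simple tuple. The following Python model is purely illustrative (Neptune's internal representation is not exposed); it mirrors the notation that the `explain` and `profile` reports display:

```python
from collections import namedtuple

# Illustrative only: modeling Neptune quad statements in the notation
# that the explain/profile reports display.
Quad = namedtuple("Quad", ["s", "p", "o", "g"])

NULL_GRAPH = "<~>"   # default (null) graph identifier
LABEL = "<~label>"   # fixed predicate used for vertex labels

# A vertex label statement: implies vertex v1 exists, with label Person.
vertex_label = Quad("<v1>", LABEL, "<Person>", NULL_GRAPH)

# An edge statement: v1 -knows-> v2, with the edge identifier e1 in G.
edge = Quad("<v1>", "<knows>", "<v2>", "<e1>")

# A property statement: a primitive value in O, the null graph in G.
prop = Quad("<v1>", "<name>", '"John"', NULL_GRAPH)
```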

There are three classes of statements in the Neptune property graph data model:

**Topics**
+ [Vertex Label Statements](#gremlin-explain-background-vertex-labels)
+ [Edge Statements](#gremlin-explain-background-edge-statements)
+ [Property Statements](#gremlin-explain-background-property-statements)

## Gremlin Vertex Label Statements
<a name="gremlin-explain-background-vertex-labels"></a>

Vertex label statements in Neptune serve two purposes:
+ They track the labels for a vertex.
+ The presence of at least one of these statements is what implies the existence of a particular vertex in the graph.

The subject of these statements is a vertex identifier, and the object is a label, both of which are specified by the user. You use a special fixed predicate for these statements, displayed as `<~label>`, and a default graph identifier (the null graph), displayed as `<~>`.

For example, consider the following `addV` traversal.

```
g.addV("Person").property(id, "v1")
```

This traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <~label> <Person> <~>) .]
```

## Gremlin Edge Statements
<a name="gremlin-explain-background-edge-statements"></a>

A Gremlin edge statement is what implies the existence of an edge between two vertices in a graph in Neptune. The subject (S) of an edge statement is the source `from` vertex. The predicate (P) is a user-supplied edge label. The object (O) is the target `to` vertex. The graph (G) is a user-supplied edge identifier.

For example, consider the following `addE` traversal.

```
g.addE("knows").from(V("v1")).to(V("v2")).property(id, "e1")
```

The traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <knows> <v2> <e1>) .]
```

## Gremlin Property Statements
<a name="gremlin-explain-background-property-statements"></a>

A Gremlin property statement in Neptune asserts an individual property value for a vertex or edge. The subject is a user-supplied vertex or edge identifier. The predicate is the property name (key), and the object is the individual property value. The graph (G) is again the default graph identifier, the null graph, displayed as `<~>`.

Consider the following vertex property example.

```
g.V("v1").property("name", "John")
```

This statement results in the following.

```
StatementEvent[Added(<v1> <name> "John" <~>) .]
```

Property statements differ from the other classes in that their object is a primitive value (a `string`, `date`, `byte`, `short`, `int`, `long`, `float`, or `double`), not a resource identifier that could be used as the subject of another assertion.
For multi-properties, each individual property value in the set receives its own statement, as in the following example.

```
g.V("v1").property(set, "phone", "956-424-2563").property(set, "phone", "956-354-3692")
```

This results in the following.

```
StatementEvent[Added(<v1> <phone> "956-424-2563" <~>) .]
StatementEvent[Added(<v1> <phone> "956-354-3692" <~>) .]
```

Edge properties are handled similarly to vertex properties, but use the edge identifier in the (S) position. For example, adding a property to an edge:

```
g.E("e1").property("weight", 0.8)
```

This results in the following statement being added to the graph.

```
StatementEvent[Added(<e1> <weight> 0.8 <~>) .]
```

# How Neptune processes Gremlin queries using statement indexes
<a name="gremlin-explain-background-indexing-examples"></a>

Statements are accessed in Amazon Neptune by way of three statement indexes, as detailed in [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md). Neptune extracts a statement *pattern* from a Gremlin query in which some positions are known, and the rest are left for discovery by index search.

Neptune assumes that the size of the property graph schema is not large. This means that the number of distinct edge labels and property names, and therefore the total number of distinct predicates, is fairly low. Neptune tracks distinct predicates in a separate index and uses this cache of predicates to do a union scan of `{ all P x POGS }` rather than maintaining an OSGP index. Avoiding the need for a reverse-traversal OSGP index saves both storage space and load throughput.
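This union-scan strategy can be sketched in a few lines of Python. The index contents and helper below are hypothetical, shown only to illustrate how the cached predicate list substitutes for a reverse-traversal index:

```python
# Hypothetical sketch: answering g.V('v1').in() without an OSGP index.
# Neptune instead scans POGS once per distinct cached predicate.
pogs_index = sorted([
    ("<knows>", "<v1>", "<e1>", "<v0>"),   # (P, O, G, S): v0 -knows-> v1
    ("<knows>", "<v2>", "<e2>", "<v1>"),   # v1 -knows-> v2
    ("<likes>", "<v1>", "<e3>", "<v3>"),   # v3 -likes-> v1
])
cached_predicates = ["<knows>", "<likes>"]

def incoming_vertices(target):
    # Union of one prefix scan per predicate: { all P } x POGS
    return [s for p in cached_predicates
              for (p2, o, g, s) in pogs_index
              if p2 == p and o == target]

# incoming_vertices("<v1>") -> ["<v0>", "<v3>"]
```

The cost of this strategy grows with the number of distinct predicates, which is why a large predicate count can hurt reverse-traversal performance.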

The Neptune Gremlin `explain` and `profile` APIs let you obtain the predicate count in your graph. You can then determine whether your application violates the Neptune assumption that the property graph schema is small.

The following examples help illustrate how Neptune uses indexes to process Gremlin queries.

**Question: What are the labels of vertex `v1`?**

```
  Gremlin code:      g.V('v1').label()
  Pattern:           (<v1>, <~label>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<~label>:*
```

**Question: What are the 'knows' out-edges of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows')
  Pattern:           (<v1>, <knows>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<knows>:*
```

**Question: Which vertices have a `Person` vertex label?**

```
  Gremlin code:      g.V().hasLabel('Person')
  Pattern:           (?, <~label>, <Person>, <~>)
  Known positions:   POG
  Lookup positions:  S
  Index:             POGS
  Key range:         <~label>:<Person>:<~>:*
```

**Question: What are the from/to vertices of a given edge `e1`?**

```
  Gremlin code:      g.E('e1').bothV()
  Pattern:           (?, ?, ?, <e1>)
  Known positions:   G
  Lookup positions:  SPO
  Index:             GPSO
  Key range:         <e1>:*
```

One statement index that Neptune does **not** have is a reverse traversal OSGP index. This index could be used to gather all incoming edges across all edge labels, as in the following example.

**Question: What are the incoming adjacent vertices of vertex `v1`?**

```
  Gremlin code:      g.V('v1').in()
  Pattern:           (?, ?, <v1>, ?)
  Known positions:   O
  Lookup positions:  SPG
  Index:             OSGP  // <-- Index does not exist
```
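The key-range lookups in the examples above amount to prefix scans over a sorted index. The following Python sketch is illustrative only (the index contents and helper are made up, not Neptune internals):

```python
# Illustrative sketch only: a key-range lookup is a prefix scan over a
# sorted index of statements.
spog_index = sorted([
    ("<v1>", "<~label>", "<Person>", "<~>"),
    ("<v1>", "<knows>", "<v2>", "<e1>"),
    ("<v1>", "<knows>", "<v3>", "<e2>"),
    ("<v2>", "<~label>", "<Person>", "<~>"),
])

def prefix_scan(index, *known):
    # Keep every statement whose leading positions match the known values,
    # e.g. known=("<v1>", "<knows>") is the key range <v1>:<knows>:*
    return [stmt for stmt in index if stmt[:len(known)] == known]

# g.V('v1').out('knows'): known positions SP, lookup positions OG.
neighbors = [o for (_, _, o, _) in prefix_scan(spog_index, "<v1>", "<knows>")]
# neighbors -> ["<v2>", "<v3>"]
```

The same scan shape answers each of the questions above; only the index (SPOG, POGS, or GPSO) and the known prefix change.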

# How Gremlin queries are processed in Neptune
<a name="gremlin-explain-background-querying"></a>

In Amazon Neptune, more complex traversals can be represented by a series of patterns that create a relation based on the definition of named variables that can be shared across patterns to create joins. This is shown in the following example.

**Question: What is the two-hop neighborhood of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows').out('knows').path()
  Pattern:           (?1=<v1>, <knows>, ?2, ?) X Pattern(?2, <knows>, ?3, ?)

  The pattern produces a three-column relation (?1, ?2, ?3) like this:
                     ?1     ?2     ?3
                     ================
                     v1     v2     v3
                     v1     v2     v4
                     v1     v5     v6
```

By sharing the `?2` variable across the two patterns (at the O position in the first pattern and the S position of the second pattern), you create a join from the first hop neighbors to the second hop neighbors. Each Neptune solution has bindings for the three named variables, which can be used to re-create a [TinkerPop Traverser](http://tinkerpop.apache.org/docs/current/reference/#_the_traverser) (including path information).
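The join on `?2` can be sketched as follows; the solution tuples are exactly the rows of the relation shown above:

```python
# Sketch of the join above: the two patterns share variable ?2, so the
# (?1, ?2) solutions of hop 1 join with the (?2, ?3) solutions of hop 2.
hop1 = [("v1", "v2"), ("v1", "v5")]                 # (?1, ?2) from pattern 1
hop2 = [("v2", "v3"), ("v2", "v4"), ("v5", "v6")]   # (?2, ?3) from pattern 2

solutions = [(a, b, c)
             for (a, b) in hop1
             for (b2, c) in hop2
             if b == b2]
# solutions -> the (?1, ?2, ?3) rows of the three-column relation above
```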


The first step in Gremlin query processing is to parse the query into a TinkerPop [Traversal](http://tinkerpop.apache.org/docs/current/reference/#traversal) object, composed of a series of TinkerPop [steps](http://tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps). These steps, which are part of the open-source [Apache TinkerPop project](http://tinkerpop.apache.org/), are both the logical and physical operators that compose a Gremlin traversal in the reference implementation: they represent the model of the query, and they are executable operators that can produce solutions according to the semantics of the operator they represent. For example, `.V()` is both represented and executed by the TinkerPop [GraphStep](http://tinkerpop.apache.org/docs/current/reference/#graph-step).

Because these off-the-shelf TinkerPop steps are executable, such a TinkerPop Traversal can execute any Gremlin query and produce the correct answer. However, when executed against a large graph, TinkerPop steps can sometimes be very inefficient and slow. Instead of using them, Neptune tries to convert the traversal into a declarative form composed of groups of patterns, as described previously.

Neptune doesn't currently support all Gremlin operators (steps) in its native query engine. So it tries to collapse as many steps as possible down into a single `NeptuneGraphQueryStep`, which contains the declarative logical query plan for all the steps that have been converted. Ideally, all steps are converted. But when a step is encountered that can't be converted, Neptune breaks out of native execution and defers all query execution from that point forward to the TinkerPop steps. It doesn't try to weave in and out of native execution.

After the steps are translated into a logical query plan, Neptune runs a series of query optimizers that rewrite the query plan based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.

After an optimized query plan is produced, Neptune creates a pipeline of physical operators that do the work of executing the query. This includes reading data from the statement indices, performing joins of various types, filtering, ordering, and so on. The pipeline produces a solution stream that is then converted back into a stream of TinkerPop Traverser objects.

## Serialization of query results
<a name="gremlin-explain-background-querying-serialization"></a>

Amazon Neptune currently relies on the TinkerPop response message serializers to convert query results (TinkerPop Traversers) into the serialized data to be sent over the wire back to the client. These serialization formats tend to be quite verbose.

For example, to serialize the result of a vertex query such as `g.V().limit(1)`, the Neptune query engine must perform a single search to produce the query result. However, the `GraphSON` serializer would perform a large number of additional searches to package the vertex into the serialization format. It would have to perform one search to get the label, one to get the property keys, and one search per property key for the vertex to get all the values for each key.

Some of the serialization formats are more efficient, but all require additional searches. Additionally, the TinkerPop serializers don't try to avoid duplicated searches, often resulting in many searches being repeated unnecessarily.

This makes it very important to write your queries so that they ask specifically just for the information they need. For example, `g.V().limit(1).id()` would return just the vertex ID and eliminate all the additional serializer searches. The [Gremlin `profile` API in Neptune](gremlin-profile-api.md) allows you to see how many search calls are made during query execution and during serialization.
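As a rough sketch, the extra serializer searches for one fully packaged vertex add up quickly. The counts below mirror the GraphSON behavior described above but are illustrative, not measured:

```python
# Back-of-envelope sketch; the counts mirror the GraphSON behavior
# described above and are illustrative, not measured.
def graphson_vertex_searches(num_property_keys):
    label_search = 1                     # one search for the vertex label
    key_search = 1                       # one search for the property keys
    value_searches = num_property_keys   # one search per key for its values
    return label_search + key_search + value_searches

# A vertex with 5 property keys costs 7 extra searches to serialize;
# g.V().limit(1).id() avoids them all, because the ID is already in hand.
```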

# Using the Gremlin `explain` API in Neptune
<a name="gremlin-explain-api"></a>

The Amazon Neptune Gremlin `explain` API returns the query plan that would be executed if a specified query were run. Because the API doesn't actually run the query, the plan is returned almost instantaneously.

It differs from the TinkerPop `.explain()` step in that it can report information specific to the Neptune engine.

## Information contained in a Gremlin `explain` report
<a name="gremlin-explain-api-results"></a>

An `explain` report contains the following information:
+ The query string as requested.
+ **The original traversal.** This is the TinkerPop Traversal object produced by parsing the query string into TinkerPop steps. It is equivalent to the original query produced by running `.explain()` on the query against the TinkerPop TinkerGraph.
+ **The converted traversal.** This is the Neptune Traversal produced by converting the TinkerPop Traversal into the Neptune logical query plan representation. In many cases the entire TinkerPop traversal is converted into two Neptune steps: one that executes the entire query (`NeptuneGraphQueryStep`) and one that converts the Neptune query engine output back into TinkerPop Traversers (`NeptuneTraverserConverterStep`).
+ **The optimized traversal.** This is the optimized version of the Neptune query plan after it has been run through a series of static work-reducing optimizers that rewrite the query based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.
+ **The predicate count.** Because of the Neptune indexing strategy described earlier, having a large number of different predicates can cause performance problems. This is especially true for queries that use reverse traversal operators with no edge label (`.in` or `.both`). If such operators are used and the predicate count is high enough, the `explain` report displays a warning message.
+ **DFE information.** When the DFE alternative engine is enabled, the following traversal components may show up in the optimized traversal:
  + **`DFEStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFENode`. `DFEStep` represents the part of the query plan that is executed in the DFE engine.
  + **`DFENode`**   –   Contains the intermediate representation as one or more child `DFEJoinGroupNodes`.
  + **`DFEJoinGroupNode`**   –   Represents a join of one or more `DFENode` or `DFEJoinGroupNode` elements.
  + **`NeptuneInterleavingStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFEStep`.

    Also contains a `stepInfo` element that contains information about the traversal, such as the frontier element, the path elements used, and so on. This information is used to process the child `DFEStep`.

  An easy way to find out whether your query is being evaluated by the DFE is to check whether the `explain` output contains a `DFEStep`. Any part of the traversal that is not part of a `DFEStep` is executed by the TinkerPop engine rather than by the DFE.

  See [Example with DFE enabled](#gremlin-explain-dfe) for a sample report.

## Gremlin `explain` syntax
<a name="gremlin-explain-api-syntax"></a>

The syntax of the `explain` API is the same as that for the HTTP API for query, except that it uses `/gremlin/explain` as the endpoint instead of `/gremlin`, as in the following examples.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.V().limit(1)"
```

For more information, see [execute-gremlin-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_explain_query(
    gremlinQuery='g.V().limit(1)'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/explain \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().limit(1)"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain \
  -d '{"gremlin":"g.V().limit(1)"}'
```

------

The preceding query would produce the following output.

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().limit(1)

Original Traversal
==================
[GraphStep(vertex,[]), RangeGlobalStep(0,1)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Unconverted TinkerPop Steps
<a name="gremlin-explain-unconverted-steps"></a>

Ideally, all TinkerPop steps in a traversal have native Neptune operator coverage. When this isn't the case, Neptune falls back on TinkerPop step execution for gaps in its operator coverage. If a traversal uses a step for which Neptune does not yet have native coverage, the `explain` report displays a warning showing where the gap occurred.

When a step without a corresponding native Neptune operator is encountered, the entire traversal from that point forward is run using TinkerPop steps, even if subsequent steps do have native Neptune operators.

The exception is when Neptune full-text search is invoked: `NeptuneSearchStep` implements steps that have no native equivalent as full-text search steps.

## Example of `explain` output where all steps in a query have native equivalents
<a name="gremlin-explain-all-steps-converted"></a>

The following is an example `explain` report for a query where all steps have native equivalents:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().out()

Original Traversal
==================
[GraphStep(vertex,[]), VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Example where some steps in a query do not have native equivalents
<a name="gremlin-explain-not-all-steps-converted"></a>

Neptune handles both `GraphStep` and `VertexStep` natively, but if you introduce a `FoldStep` and `UnfoldStep`, the resulting `explain` output is different:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().fold().unfold().out()

Original Traversal
==================
[GraphStep(vertex,[]), FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep,
    NeptuneMemoryTrackerStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

WARNING: >> FoldStep << is not supported natively yet
```

In this case, the `FoldStep` breaks you out of native execution. But even the subsequent `VertexStep` is no longer handled natively because it appears downstream of the `Fold/Unfold` steps.

For performance and cost-savings, it's important that you try to formulate traversals so that the maximum amount of work possible is done natively inside the Neptune query engine, instead of by the TinkerPop step implementations.

## Example of a query that uses Neptune full-text-search
<a name="gremlin-explain-full-text-search-steps"></a>

The following query uses Neptune full-text search:

```
g.withSideEffect("Neptune#fts.endpoint", "some_endpoint")
  .V()
  .tail(100)
  .has("name", "Neptune#fts mark*")
  .has("Person", "name", "Neptune#fts mark*")
```

The `.has("name", "Neptune#fts mark*")` part limits the search to vertices with a `name` property, while `.has("Person", "name", "Neptune#fts mark*")` limits the search to vertices with a `name` property and the label `Person`. This results in the following traversal in the `explain` report:

```
Final Traversal
[NeptuneGraphQueryStep(Vertex) {
    JoinGroupNode {
        PatternNode[(?1, termid(1,URI), ?2, termid(0,URI)) . project distinct ?1 .], {estimatedCardinality=INFINITY}
    }, annotations={path=[Vertex(?1):GraphStep], maxVarId=4}
}, NeptuneTraverserConverterStep, NeptuneTailGlobalStep(10), NeptuneTinkerpopTraverserConverterStep, NeptuneSearchStep {
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
}]
```

## Example of using `explain` when the DFE is enabled
<a name="gremlin-explain-dfe"></a>

The following is an example of an `explain` report when the DFE alternative query engine is enabled:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().as("a").out().has("name", "josh").out().in().where(eq("a"))


Original Traversal
==================
[GraphStep(vertex,[])@[a], VertexStep(OUT,vertex), HasStep([name.eq(josh)]), VertexStep(OUT,vertex), VertexStep(IN,vertex), WherePredicateStep(eq(a))]

Converted Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, ?2, <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>) . project DISTINCT[?1] {rangeCountEstimate=unknown}],
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: HasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


Optimized Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: NeptuneHasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneMemoryTrackerStep,
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


WARNING: >> [NeptuneHasStep([name.eq(josh)]), WherePredicateStep(eq(a))] << (or one of the children for each step) is not supported natively yet

Predicates
==========
# of predicates: 8
```

See [Information in `explain`](#gremlin-explain-api-results) for a description of the DFE-specific sections in the report.

# Gremlin `profile` API in Neptune
<a name="gremlin-profile-api"></a>

The Neptune Gremlin `profile` API runs a specified Gremlin traversal, collects various metrics about the run, and produces a profile report as output.

It differs from the TinkerPop `.profile()` step in that it can report information specific to the Neptune engine.

The profile report includes the following information about the query plan:
+ The physical operator pipeline
+ The index operations for query execution and serialization
+ The size of the result

The `profile` API uses an extended version of the HTTP API syntax for query, with `/gremlin/profile` as the endpoint instead of `/gremlin`.

## Parameters specific to Neptune Gremlin `profile`
<a name="gremlin-profile-api-parameters"></a>
+ **profile.results** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `TRUE`.

  If true, the query results are gathered and displayed as part of the `profile` report. If false, only the result count is displayed.
+ **profile.chop** – `int`, default value: 250.

  If non-zero, causes the results string to be truncated at that number of characters. This does not keep all results from being captured. It simply limits the size of the string in the profile report. If set to zero, the string contains all the results.
+ **profile.serializer** – `string`, default value: `<null>`.

  If non-null, the gathered results are returned in a serialized response message in the format specified by this parameter. The number of index operations necessary to produce that response message is reported along with the size in bytes to be sent to the client.

  Allowed values are `<null>` or any of the valid MIME type or TinkerPop driver "Serializers" enum values.

  ```
  "application/json" or "GRAPHSON"
  "application/vnd.gremlin-v1.0+json" or "GRAPHSON_V1"
  "application/vnd.gremlin-v1.0+json;types=false" or "GRAPHSON_V1_UNTYPED"
  "application/vnd.gremlin-v2.0+json" or "GRAPHSON_V2"
  "application/vnd.gremlin-v2.0+json;types=false" or "GRAPHSON_V2_UNTYPED"
  "application/vnd.gremlin-v3.0+json" or "GRAPHSON_V3"
  "application/vnd.gremlin-v3.0+json;types=false" or "GRAPHSON_V3_UNTYPED"
  "application/vnd.graphbinary-v1.0" or "GRAPHBINARY_V1"
  ```
+ **profile.indexOps** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `FALSE`.

  If true, shows a detailed report of all index operations that took place during query execution and serialization. Warning: This report can be verbose.
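These parameters are passed as extra fields alongside the query in the request body sent to `/gremlin/profile`. As a sketch using only the Python standard library (the query and parameter values here are illustrative, not from the sample above), this is how such a payload might be assembled:

```
import json

# Build a request body for the /gremlin/profile endpoint. The
# Neptune-specific profile.* fields ride alongside the query itself.
payload = {
    "gremlin": "g.V().hasLabel('airport').has('code', 'AUS').out().limit(10)",
    "profile.results": True,     # include query results in the report
    "profile.chop": 500,         # truncate the results string at 500 characters
    "profile.serializer": "application/vnd.gremlin-v3.0+json",
    "profile.indexOps": False,   # omit the verbose index-operation report
}

body = json.dumps(payload)
print(body)
```

You would POST this body with a `Content-Type: application/json` header, as the `curl` and `awscurl` examples in the next section do.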



## Sample output of Neptune Gremlin `profile`
<a name="gremlin-profile-sample-output"></a>

The following is a sample `profile` query.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query 'g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)' \
  --serializer "application/vnd.gremlin-v3.0+json"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery='g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)',
    serializer='application/vnd.gremlin-v3.0+json'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

------

This query generates the following `profile` report when executed on the air-routes sample graph from the blog post, [Let Me Graph That For You – Part 1 – Air Routes](https://aws.amazon.com/blogs/database/let-me-graph-that-for-you-part-1-air-routes/).

```
*******************************************************
                Neptune Gremlin Profile
*******************************************************

Query String
==================
g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([~label.eq(airport), code.eq(AUS)]), RepeatStep(emit(true),[VertexStep(IN,vertex), PathFilterStep(simple), RepeatEndStep],until(loops(2))), RangeGlobalStep(0,100)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true, joinTime=3, actualTotalOutput=1}
            PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true, joinTime=0, actualTotalOutput=61}
            RepeatNode {
                Repeat {
                    PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0, joinTime=3}
                }
                Emit {
                    Filter(true)
                }
                LoopsCondition {
                    LoopsFilter([?1, ?3],eq(2))
                }
            }, annotations={repeatMode=BFS, emitFirst=true, untilFirst=false, leftVar=?1, rightVar=?3}
        }, finishers=[limit(100)], annotations={path=[Vertex(?1):GraphStep, Repeat[Vertex(?3):VertexStep]], joinStats=true, optimizationTime=495, maxVarId=7, executionTime=323}
    },
    NeptuneTraverserConverterStep
]

Physical Pipeline
=================
NeptuneGraphQueryStep
    |-- StartOp
    |-- JoinGroupOp
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true})
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true})
        |-- RepeatOp
            |-- <upstream input> (Iteration 0) [visited=1, output=1 (until=0, emit=1), next=1]
            |-- BindingSetQueue (Iteration 1) [visited=61, output=61 (until=0, emit=61), next=61]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
            |-- BindingSetQueue (Iteration 2) [visited=38, output=38 (until=38, emit=0), next=0]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
        |-- LimitOp(100)

Runtime (ms)
============
Query Execution:  392.686
Serialization:   2636.380

Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                        100         100         314.162    82.78
NeptuneTraverserConverterStep                                        100         100          65.333    17.22
                                            >TOTAL                     -           -         379.495        -

Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        1        1        0        1        1
        1       61       61        0       61       61
        2       38       38       38        0        0
------------------------------------------------------
               100      100       38       62       62

Predicates
==========
# of predicates: 16

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance

Results
=======
Count: 100
Output: [v[3], v[3600], v[3614], v[4], v[5], v[6], v[7], v[8], v[9], v[10], v[11], v[12], v[47], v[49], v[136], v[13], v[15], v[16], v[17], v[18], v[389], v[20], v[21], v[22], v[23], v[24], v[25], v[26], v[27], v[28], v[416], v[29], v[30], v[430], v[31], v[9...
Response serializer: GRYO_V3D0
Response size (bytes): 23566

Index Operations
================
Query execution:
    # of statement index ops: 3
    # of unique statement index ops: 3
    Duplication ratio: 1.0
    # of terms materialized: 0
Serialization:
    # of statement index ops: 200
    # of unique statement index ops: 140
    Duplication ratio: 1.43
    # of terms materialized: 393
```

In addition to the query plans returned by a call to Neptune `explain`, the `profile` results include runtime statistics around query execution. Each Join operation is tagged with the time it took to perform its join as well as the actual number of solutions that passed through it.

The `profile` output includes the time taken during the core query execution phase, as well as the serialization phase if the `profile.serializer` option was specified.

The breakdown of the index operations performed during each phase is also included at the bottom of the `profile` output.

Note that consecutive runs of the same query may show different results in terms of run-time and index operations because of caching.

For queries using the `repeat()` step, a breakdown of the frontier on each iteration is available if the `repeat()` step was pushed down as part of a `NeptuneGraphQueryStep`.
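Because the profile report is plain text, the headline numbers are easy to extract programmatically, for example when comparing consecutive runs of the same query. The following sketch uses an abbreviated report string; in practice it would be the full text returned by the `/gremlin/profile` endpoint:

```
import re

# Abbreviated profile report (the Runtime section from the sample above).
report = """
Runtime (ms)
============
Query Execution:  392.686
Serialization:   2636.380
"""

# Pull out the runtime figures with a regular expression.
times = {
    label: float(value)
    for label, value in re.findall(r"(Query Execution|Serialization):\s+([\d.]+)", report)
}
print(times)
```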

## Differences in `profile` reports when DFE is enabled
<a name="gremlin-profile-dfe-output"></a>

When the Neptune DFE alternative query engine is enabled, `profile` output differs in the following ways:

**Optimized Traversal:** This section is similar to the one in `explain` output, but contains additional information. This includes the type of DFE operators that were considered in planning, and the associated worst case and best case cost estimates.

**Physical Pipeline:** This section captures the operators that are used to execute the query. `DFESubQuery` elements abstract the physical plan that is used by DFE to execute the portion of the plan it is responsible for. The `DFESubQuery` elements are unfolded in the following section where DFE statistics are listed.

**DFEQueryEngine Statistics:** This section shows up only when at least part of the query is executed by DFE. It outlines various runtime statistics that are specific to DFE, and contains a detailed breakdown of the time spent in the various parts of the query execution, by `DFESubQuery`.

Nested subqueries in different `DFESubQuery` elements are flattened in this section, and each is identified by a header that starts with `subQuery=`.

**Traversal Metrics:** This section shows step-level traversal metrics, and when the DFE engine runs all or part of the query, displays metrics for `DFEStep` and/or `NeptuneInterleavingStep`. See [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md).

**Note**  
Because DFE support for Gremlin is an experimental feature, the exact format of the `profile` output is subject to change.

## Sample `profile` output when the Neptune Dataflow engine (DFE) is enabled
<a name="gremlin-profile-sample-dfe-output"></a>

When the DFE engine is being used to run Gremlin queries, output of the [Gremlin `profile` API](#gremlin-profile-api) is formatted as shown in the example below.

Query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery="g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

------

Output:

```
    *******************************************************
                    Neptune Gremlin Profile
    *******************************************************

    Query String
    ==================
    g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()

    Original Traversal
    ==================
    [GraphStep(vertex,[]), HasStep([code.eq(ATL)]), VertexStep(OUT,vertex)]

    Optimized Traversal
    ===================
    Neptune steps:
    [
        DFEStep(Vertex) {
          DFENode {
            DFEJoinGroupNode[null](
              children=[
                DFEPatternNode((?1, vp://code[419430926], ?4, defaultGraph[526]) . project DISTINCT[?1] objectFilters=(in(ATL[452987149]) . ), {rangeCountEstimate=1},
                  opInfo=(type=PipelineJoin, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00))))),
                DFEPatternNode((?1, ?5, ?6, ?7) . project ALL[?1, ?6] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807})],
              opInfo=[
                OperatorInfoWithAlternative[
                  rec=(type=PipelineJoin, cost=(exp=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))),
                  alt=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))]])
          } [Vertex(?1):GraphStep, Vertex(?6):VertexStep]
        } ,
        NeptuneTraverserConverterDFEStep,
        DFECleanupStep
    ]


    Physical Pipeline
    =================
    DFEStep
        |-- DFESubQuery1

    DFEQueryEngine Statistics
    =================
    DFESubQuery1
    ╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection  │ solutions=[]                                                                                                 │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                       │ outSchema=[]                                                                                                 │      │          │           │        │           ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1 │ -    │ 1        │ 1         │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 3      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2 │ -    │ 1        │ 242       │ 242.00 │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 4      │ -      │ DFEMergeChunks        │ -                                                                                                            │ -    │ 242      │ 242       │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                   │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFEPipelineScan      │ pattern=Node(?1) with property 'code' as ?4 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.22      ║
    ║    │        │        │                      │ inlineFilters=[(?4 IN ["ATL"])]                             │      │          │           │       │           ║
    ║    │        │        │                      │ patternEstimate=1                                           │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEMergeChunks       │ -                                                           │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 2  │ 4      │ -      │ DFERelationalJoin    │ joinVars=[]                                                 │ -    │ 2        │ 1         │ 0.50  │ 0.09      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 3  │ 2      │ -      │ DFESolutionInjection │ solutions=[]                                                │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
    ║    │        │        │                      │ outSchema=[]                                                │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain             │ -                                                           │ -    │ 1        │ 0         │ 0.00  │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                           │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection │ solutions=[]                        │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                      │ outSchema=[?1]                      │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ 3      │ DFETee               │ -                                   │ -    │ 1        │ 2         │ 2.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?1                           │ -    │ 1        │ 1         │ 1.00   │ 0.21      ║
    ║    │        │        │                      │ ordered=false                       │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?1]                           │ -    │ 1        │ 1         │ 1.00   │ 0.03      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Edge((?1)-[?7:?5]->(?6))    │ -    │ 1        │ 242       │ 242.00 │ 0.51      ║
    ║    │        │        │                      │ constraints=[]                      │      │          │           │        │           ║
    ║    │        │        │                      │ patternEstimate=9223372036854775807 │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 5  │ 6      │ 7      │ DFESync              │ -                                   │ -    │ 243      │ 243       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 1        │ 1         │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 242      │ 242       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                   │ -    │ 243      │ 242       │ 1.00   │ 0.31      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 9  │ -      │ -      │ DFEDrain             │ -                                   │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    Runtime (ms)
    ============
    Query Execution: 11.744

    Traversal Metrics
    =================
    Step                                                               Count  Traversers       Time (ms)    % Dur
    -------------------------------------------------------------------------------------------------------------
    DFEStep(Vertex)                                                      242         242          10.849    95.48
    NeptuneTraverserConverterDFEStep                                     242         242           0.514     4.52
                                                >TOTAL                     -           -          11.363        -

    Predicates
    ==========
    # of predicates: 18

    Results
    =======
    Count: 242


    Index Operations
    ================
    Query execution:
        # of statement index ops: 0
        # of terms materialized: 0
```

**Note**  
Because DFE support for Gremlin is an experimental feature, the exact format of the `profile` output is subject to change.

# Tuning Gremlin queries using `explain` and `profile`
<a name="gremlin-traversal-tuning"></a>

You can often tune your Gremlin queries in Amazon Neptune to get better performance, using the information available to you in the reports you get from the Neptune [explain](gremlin-explain-api.md) and [profile](gremlin-profile-api.md) APIs. To do so, it helps to understand how Neptune processes Gremlin traversals.

**Important**  
A change was made in TinkerPop version 3.4.11 that improves correctness of how queries are processed, but for the moment can sometimes seriously impact query performance.  
For example, a query of this sort may run significantly slower:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  out()
```
The vertices after the `limit()` step are now fetched in a non-optimal way because of the TinkerPop 3.4.11 change. To avoid this, you can modify the query by adding a `barrier()` step at any point after the `order().by()`. For example:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  barrier().
  out()
```
TinkerPop 3.4.11 was enabled in Neptune [engine version 1.0.5.0](engine-releases-1.0.5.0.md).

## Understanding Gremlin traversal processing in Neptune
<a name="gremlin-traversal-processing"></a>

When a Gremlin traversal is sent to Neptune, there are three main processes that transform the traversal into an underlying execution plan for the engine to execute. These are parsing, conversion, and optimization:

![\[3 processes transform a Gremlin query into an execution plan.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_traversal_processing.png)


### The traversal parsing process
<a name="gremlin-traversal-processing-parsing"></a>

The first step in processing a traversal is to parse it into a common language. In Neptune, that common language is the set of TinkerPop steps that are part of the [TinkerPop API](http://tinkerpop.apache.org/javadocs/3.4.8/full/org/apache/tinkerpop/gremlin/process/traversal/Step.html). Each of these steps represents a unit of computation within the traversal.

You can send a Gremlin traversal to Neptune either as a string or as bytecode. The REST endpoint and the Java client driver `submit()` method send traversals as strings, as in this example:

```
client.submit("g.V()")
```

Applications and language drivers using [Gremlin language variants (GLV)](https://tinkerpop.apache.org/docs/current/tutorials/gremlin-language-variants/) send traversals in bytecode.
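The string form maps directly onto an HTTP request against the `/gremlin` endpoint. As a minimal sketch using only the Python standard library (the endpoint is a placeholder, 8182 is Neptune's default port, and the request is constructed but not sent):

```
import json
import urllib.request

# Build (but don't send) a string-based Gremlin request for the HTTP
# endpoint. GLV applications would instead serialize bytecode through
# a driver such as gremlinpython over a WebSocket connection.
body = json.dumps({"gremlin": "g.V().limit(1)"}).encode("utf-8")
req = urllib.request.Request(
    "https://your-neptune-endpoint:8182/gremlin",  # placeholder endpoint
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment to actually send
print(req.get_method(), req.full_url)
```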

### The traversal conversion process
<a name="gremlin-traversal-processing-conversion"></a>

The second step in processing a traversal is to convert its TinkerPop steps into a set of converted and non-converted Neptune steps. Most steps in the Apache TinkerPop Gremlin query language are converted to Neptune-specific steps that are optimized to run on the underlying Neptune engine. When a TinkerPop step without a Neptune equivalent is encountered in a traversal, that step and all subsequent steps in the traversal are processed by the TinkerPop query engine.

For more information about what steps can be converted under what circumstances, see [Gremlin step support](gremlin-step-support.md).

### The traversal optimization process
<a name="gremlin-traversal-processing-optimization"></a>

The final step in traversal processing is to run the series of converted and non-converted steps through the optimizer, to try to determine the best execution plan. The output of this optimization is the execution plan that the Neptune engine processes.

## Using the Neptune Gremlin `explain` API to tune queries
<a name="gremlin-traversal-tuning-explain"></a>

The Neptune `explain` API is not the same as the Gremlin `explain()` step. It returns the final execution plan that the Neptune engine would process when running the query. Because it does not actually execute the query, its output contains no statistics about actual execution.

Consider the following simple traversal that finds all the airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

There are two ways you can run this traversal through the Neptune `explain` API. The first way is to make a REST call to the explain endpoint, like this:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain -d '{"gremlin":"g.V().has(\"code\",\"ANC\")"}'
```

The second way is to use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `explain` parameter. This passes the traversal contained in the cell body to the Neptune `explain` API and then displays the resulting output when you run the cell:

```
%%gremlin explain

g.V().has('code','ANC')
```

The resulting `explain` API output describes Neptune's execution plan for the traversal. As you can see in the image below, the plan includes each of the three steps in the processing pipeline:

![\[Explain API output for a simple Gremlin traversal.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_1.png)


### Tuning a traversal by looking at steps that are not converted
<a name="gremlin-traversal-tuning-explain-non-converted-steps"></a>

One of the first things to look for in the Neptune `explain` API output is Gremlin steps that are not converted to Neptune-native steps. In a query plan, when a step is encountered that cannot be converted to a Neptune-native step, it and all subsequent steps in the plan are processed by the Gremlin server.

In the example above, all the steps in the traversal were converted. Now let's examine the `explain` API output for this traversal:

```
g.V().has('code','ANC').out().choose(hasLabel('airport'), values('code'), constant('Not an airport'))
```

As you can see in the image below, Neptune could not convert the `choose()` step:

![\[Explain API output in which not all steps can be converted.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_2.png)


There are several things you could do to tune the performance of the traversal. The first would be to rewrite it in such a way as to eliminate the step that could not be converted. Another would be to move the step to the end of the traversal so that all other steps can be converted to native ones.

A query plan with steps that are not converted does not always need to be tuned. If the steps that cannot be converted are at the end of the traversal, and are related to how output is formatted rather than how the graph is traversed, they may have little effect on performance.

### Tuning a traversal by looking at steps that do not use indexes
<a name="gremlin-traversal-tuning-explain-unindexed-lookups"></a>

Another thing to look for when examining output from the Neptune `explain` API is steps that do not use indexes. The following traversal finds all airports with flights that land in Anchorage:

```
g.V().has('code','ANC').in().values('code')
```

Output from the explain API for this traversal is:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in().values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
```

The `WARNING` message at the bottom of the output occurs because the `in()` step in the traversal cannot be handled using one of the three indexes that Neptune maintains (see [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md) and [Gremlin statements in Neptune](gremlin-explain-background-statements.md)). Because the `in()` step contains no edge filter, it cannot be resolved using the `SPOG`, `POGS`, or `GPSO` index. Instead, Neptune must perform a union scan to find the requested vertices, which is much less efficient.
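Because these warnings appear as plain lines in the report, you can scan for them automatically when testing queries. Below is a minimal sketch in Python; the helper function and trimmed-down sample text are illustrative, not part of the Neptune API:

```python
# Hypothetical helper: collect WARNING lines from Neptune explain output so
# that unindexed traversal steps can be flagged automatically (e.g. in CI).
def find_explain_warnings(explain_output: str) -> list:
    return [line.strip()
            for line in explain_output.splitlines()
            if line.strip().startswith("WARNING")]

# Trimmed-down sample of the report shown above.
sample = """\
Predicates
==========
# of predicates: 26

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
"""

warnings = find_explain_warnings(sample)
print(warnings)
```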

There are two ways to tune the traversal in this situation. The first is to add one or more filtering criteria to the `in()` step so that an indexed lookup can be used to resolve the query. For the example above, this might be:

```
g.V().has('code','ANC').in('route').values('code')
```

Output from the Neptune `explain` API for the revised traversal no longer contains the `WARNING` message:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in('route').values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,[route],vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . ContainsFilter(?5 in (<route>)) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5=<route>, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=32042}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26
```

Another option if you are running many traversals of this kind is to run them in a Neptune DB cluster that has the optional `OSGP` index enabled (see [Enabling an OSGP Index](feature-overview-storage-indexing.md#feature-overview-storage-indexing-osgp)). Enabling an `OSGP` index has drawbacks:
+ It must be enabled in a DB cluster before any data is loaded.
+ Insertion rates for vertices and edges may slow by up to 23%.
+ Storage usage will increase by around 20%.
+ Read queries that scatter requests across all indexes may have increased latencies.

Having an `OSGP` index makes a lot of sense for a restricted set of query patterns, but unless you are running those frequently, it is usually preferable to try to ensure that the traversals you write can be resolved using the three primary indexes.

### Using a large number of predicates
<a name="gremlin-traversal-tuning-explain-many-predicates"></a>

Neptune treats each edge label and each distinct vertex or edge property name in your graph as a predicate, and is designed by default to work with a relatively low number of distinct predicates. When you have more than a few thousand predicates in your graph data, performance can degrade.

Neptune `explain` output will warn you if this is the case:

```
Predicates
==========
# of predicates: 9549
WARNING: high predicate count (# of distinct property names and edge labels)
```
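Because every distinct property name and edge label counts as one predicate, you can estimate the predicate count of a planned data model before loading any data. Here is a rough sketch; the property and label names below are made up for illustration:

```python
# Estimate Neptune's predicate count for a data model: each distinct
# vertex/edge property name and each distinct edge label is one predicate.
def estimated_predicate_count(property_names, edge_labels):
    return len(set(property_names) | set(edge_labels))

# Illustrative air-routes-style model; a name shared between vertices and
# edges, like 'dist', counts only once.
vertex_properties = ["code", "city", "country", "dist"]
edge_properties = ["dist"]
edge_labels = ["route", "contains"]

count = estimated_predicate_count(vertex_properties + edge_properties,
                                  edge_labels)
print(count)  # 6 distinct predicates
```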

If it is not convenient to rework your data model to reduce the number of labels and properties, and therefore the number of predicates, the best way to tune traversals is to run them in a DB cluster that has the `OSGP` index enabled, as discussed above.

## Using the Neptune Gremlin `profile` API to tune traversals
<a name="gremlin-traversal-tuning-profile"></a>

The Neptune `profile` API is quite different from the Gremlin `profile()` step. Like the `explain` API, its output includes the query plan that the Neptune engine uses when executing the traversal. In addition, the `profile` output includes actual execution statistics for the traversal, given how its parameters are set.

Again, take the simple traversal that finds all airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

As with the `explain` API, you can invoke the `profile` API using a REST call:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.V().has(\"code\",\"ANC\")"}'
```
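If you are calling the endpoint from code rather than from curl, letting a JSON library build the request body avoids shell-quoting problems around the quotes inside the Gremlin string. Here is a sketch using only the Python standard library; the endpoint and port are placeholders, and the `/gremlin/profile` path matches the curl example above:

```python
import json
from urllib import request

def build_profile_request(endpoint: str, gremlin_query: str) -> request.Request:
    # json.dumps escapes the quotes inside the Gremlin string for us.
    body = json.dumps({"gremlin": gremlin_query}).encode("utf-8")
    return request.Request(
        url=f"https://{endpoint}/gremlin/profile",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_profile_request("your-neptune-endpoint:8182",
                            "g.V().has('code','ANC')")
print(req.full_url)
# response = request.urlopen(req)  # uncomment to send against a live cluster
```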

You can also use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `profile` parameter. This passes the traversal contained in the cell body to the Neptune `profile` API and then displays the resulting output when you run the cell:

```
%%gremlin profile

g.V().has('code','ANC')
```

The resulting `profile` API output contains both Neptune's execution plan for the traversal and statistics about the plan's execution, as you can see in this image:

![\[An example of Neptune profile API output.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_profile_output_1.png)


In `profile` output, the execution plan section only contains the final execution plan for the traversal, not the intermediate steps. The pipeline section contains the physical pipeline operations that were performed as well as the actual time (in milliseconds) that traversal execution took. The runtime metric is extremely helpful in comparing the times that two different versions of a traversal take as you are optimizing them.

**Note**  
The initial runtime of a traversal is generally longer than subsequent runtimes, because the first one causes the relevant data to be cached.

The third section of the `profile` output contains execution statistics and the results of the traversal. To see how this information can be useful in tuning a traversal, consider the following traversal, which uses full-text search to find every airport whose city name begins with "Anchora", along with all the airports reachable in two hops from those airports, returning airport codes, flight routes, and distances:

```
%%gremlin profile

g.withSideEffect("Neptune#fts.endpoint", "your-OpenSearch-endpoint-URL").
    V().has("city", "Neptune#fts Anchora~").
    repeat(outE('route').inV().simplePath()).times(2).
    project('Destination', 'Route').
        by('code').
        by(path().by('code').by('dist'))
```

### Traversal metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-traversal-metrics"></a>

The first set of metrics that is available in all `profile` output is the traversal metrics. These are similar to the Gremlin `profile()` step metrics, with a few differences:

```
Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                       3856        3856          91.701     9.09
NeptuneTraverserConverterStep                                       3856        3856          38.787     3.84
ProjectStep([Destination, Route],[value(code), ...                  3856        3856         878.786    87.07
  PathStep([value(code), value(dist)])                              3856        3856         601.359
                                            >TOTAL                     -           -        1009.274        -
```

The first column of the traversal-metrics table lists the steps executed by the traversal. The first two steps are generally the Neptune-specific steps, `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep`.

`NeptuneGraphQueryStep` represents the execution time for the entire portion of the traversal that could be converted and executed natively by the Neptune engine.

`NeptuneTraverserConverterStep` represents the process of converting the output of the converted steps into TinkerPop traversers, either so that any steps that could not be converted can be processed, or so that the results can be returned in a TinkerPop-compatible format.

In the example above, several steps could not be converted, so each of those TinkerPop steps (`ProjectStep`, `PathStep`) appears as its own row in the table.

The second column in the table, `Count`, reports the number of *represented* traversers that passed through the step, while the third column, `Traversers`, reports the number of bulked traversers that passed through it, as explained in the [TinkerPop profile step documentation](https://tinkerpop.apache.org/docs/current/reference/#profile-step).

In our example there are 3,856 vertices and 3,856 traversers returned by the `NeptuneGraphQueryStep`, and these numbers remain the same throughout the remaining processing because `ProjectStep` and `PathStep` are formatting the results, not filtering them.

**Note**  
Unlike TinkerPop, the Neptune engine does not optimize performance by *bulking* in its `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep` steps. Bulking is the TinkerPop operation that combines traversers on the same vertex to reduce operational overhead, and it is what causes the `Count` and `Traversers` numbers to differ. Because bulking only occurs in steps that Neptune delegates to TinkerPop, and not in steps that Neptune handles natively, the `Count` and `Traversers` columns seldom differ.
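The bulking arithmetic described in the note can be illustrated with a few lines of plain Python; this is an illustration of the TinkerPop concept, not Neptune code:

```python
from collections import Counter

# Ten raw traversers, identified by the vertex each one currently sits on.
raw_traversers = ["v1", "v2", "v1", "v3", "v1", "v2", "v1", "v2", "v3", "v1"]

# Bulking merges traversers on the same element into one traverser that
# carries a bulk count.
bulked = Counter(raw_traversers)

represented = sum(bulked.values())  # what the Count column reports
traversers = len(bulked)            # what the Traversers column reports
print(represented, traversers)      # 10 represented traversers, 3 after bulking
```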

The `Time` column reports the number of milliseconds that the step took, and the `% Dur` column reports what percentage of the total processing time the step took. Together, these metrics tell you where to focus your tuning efforts, by showing which steps took the most time.
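For a traversal with many steps, it can help to rank the timings programmatically. A sketch using the rows from the sample table above (the tuple representation is hypothetical; `profile` output is plain text, not a structured API):

```python
# Rank traversal-metrics rows by time to see where tuning effort should go.
def slowest_step(metrics_rows):
    # metrics_rows: list of (step_name, time_ms) tuples.
    return max(metrics_rows, key=lambda row: row[1])

rows = [
    ("NeptuneGraphQueryStep(Vertex)", 91.701),
    ("NeptuneTraverserConverterStep", 38.787),
    ("ProjectStep([Destination, Route], ...)", 878.786),
]

name, time_ms = slowest_step(rows)
print(name)  # ProjectStep dominates, matching its ~87% share of the duration
```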

### Index operation metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-index-operations"></a>

Another set of metrics in the output of the Neptune profile API is the index operations:

```
Index Operations
================
Query execution:
    # of statement index ops: 23191
    # of unique statement index ops: 5960
    Duplication ratio: 3.89
    # of terms materialized: 0
```

These report:
+ The total number of index lookups.
+ The number of unique index lookups performed.
+ The ratio of total index lookups to unique ones. A lower ratio indicates less redundancy.
+ The number of terms materialized from the term dictionary.
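The duplication ratio is simply the total number of statement index operations divided by the number of unique ones; the figures below are taken from the sample output above:

```python
# Reproduce the duplication ratio from the Index Operations section above.
total_index_ops = 23191
unique_index_ops = 5960

duplication_ratio = round(total_index_ops / unique_index_ops, 2)
print(duplication_ratio)  # 3.89, as reported
```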

### Repeat metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-repeat-metrics"></a>

If your traversal uses a `repeat()` step as in the example above, then a section containing repeat metrics appears in the `profile` output:

```
Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        2        0        0        0        2
        1       53        0        0        0       53
        2     3856     3856     3856        0        0
------------------------------------------------------
              3911     3856     3856        0       55
```

These report:
+ The loop count for a row (the `Iteration` column).
+ The number of elements visited by the loop (the `Visited` column).
+ The number of elements output by the loop (the `Output` column).
+ The number of elements that satisfied the `until()` condition (the `Until` column).
+ The number of elements emitted by the loop (the `Emit` column).
+ The number of elements passed on to the next iteration of the loop (the `Next` column).

These repeat metrics are very helpful in understanding the branching factor of your traversal, to get a feeling for how much work is being done by the database. You can use these numbers to diagnose performance problems, especially when the same traversal performs dramatically differently with different parameters.
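As a quick sketch, the `Visited` column from the sample table above can be turned into per-iteration branching factors to quantify the fan-out:

```python
# Visited counts per iteration, from the Repeat Metrics sample above.
visited = [2, 53, 3856]

# Ratio of each iteration's visits to the previous iteration's.
branching_factors = [round(visited[i] / visited[i - 1], 1)
                     for i in range(1, len(visited))]
print(branching_factors)  # the second hop fans out far more than the first
```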

### Full-text search metrics in Neptune `profile` API output
<a name="gremlin-traversal-tuning-profile-fts-metrics"></a>

When a traversal uses a [full-text search](full-text-search.md) lookup, as in the example above, then a section containing the full-text search (FTS) metrics appears in the `profile` output:

```
FTS Metrics
==============
SearchNode[(idVar=?1, query=Anchora~, field=city) . project ?1 .],
    {endpoint=your-OpenSearch-endpoint-URL, incomingSolutionsThreshold=1000, estimatedCardinality=INFINITY,
    remoteCallTimeSummary=[total=65, avg=32.500000, max=37, min=28],
    remoteCallTime=65, remoteCalls=2, joinTime=0, indexTime=0, remoteResults=2}

    2 result(s) produced from SearchNode above
```

This shows the query sent to the OpenSearch cluster and reports several metrics about the interaction with OpenSearch that can help you pinpoint performance problems relating to full-text search:
+ Summary information about the calls made to the OpenSearch index:
  + The total number of milliseconds required by all remote calls to satisfy the query (`total`).
  + The average number of milliseconds spent in a remote call (`avg`).
  + The minimum number of milliseconds spent in a remote call (`min`).
  + The maximum number of milliseconds spent in a remote call (`max`).
+ The total time consumed by remote calls to OpenSearch (`remoteCallTime`).
+ The number of remote calls made to OpenSearch (`remoteCalls`).
+ The number of milliseconds spent joining OpenSearch results (`joinTime`).
+ The number of milliseconds spent in index lookups (`indexTime`).
+ The total number of results returned by OpenSearch (`remoteResults`).
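These summary fields are internally consistent, which you can verify from the numbers in the sample above: dividing the total remote-call time by the number of calls reproduces the reported average:

```python
# Figures from the remoteCallTimeSummary in the FTS metrics sample above.
remote_call_time_total = 65   # total milliseconds across all remote calls
remote_calls = 2              # number of calls made to the search cluster

average_ms = remote_call_time_total / remote_calls
print(average_ms)  # matches avg=32.500000 in the report
```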

# Native Gremlin step support in Amazon Neptune
<a name="gremlin-step-support"></a>

The Amazon Neptune engine does not currently have full native support for all Gremlin steps, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). Current support falls into four categories:
+ [Gremlin steps that can always be converted to native Neptune engine operations](#gremlin-steps-always)
+ [Gremlin steps that can be converted to native Neptune engine operations in some cases](#gremlin-steps-sometimes) 
+ [Gremlin steps that are never converted to native Neptune engine operations](#gremlin-steps-never) 
+ [Gremlin steps that are not supported in Neptune at all](#neptune-gremlin-steps-unsupported) 

## Gremlin steps that can always be converted to native Neptune engine operations
<a name="gremlin-steps-always"></a>

Many Gremlin steps can be converted to native Neptune engine operations as long as they meet the following conditions:
+ They are not preceded in the query by a step that cannot be converted.
+ Their parent step, if any, can be converted.
+ All their child traversals, if any, can be converted.

The following Gremlin steps are always converted to native Neptune engine operations if they meet those conditions:
+ [and( )](http://tinkerpop.apache.org/docs/current/reference/#and-step)
+ [as( )](http://tinkerpop.apache.org/docs/current/reference/#as-step)
+ [count( )](http://tinkerpop.apache.org/docs/current/reference/#count-step)
+ [E( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [emit( )](http://tinkerpop.apache.org/docs/current/reference/#emit-step)
+ [explain( )](http://tinkerpop.apache.org/docs/current/reference/#explain-step)
+ [group( )](http://tinkerpop.apache.org/docs/current/reference/#group-step)
+ [groupCount( )](http://tinkerpop.apache.org/docs/current/reference/#groupcount-step)
+ [identity( )](http://tinkerpop.apache.org/docs/current/reference/#identity-step)
+ [is( )](http://tinkerpop.apache.org/docs/current/reference/#is-step)
+ [key( )](http://tinkerpop.apache.org/docs/current/reference/#key-step)
+ [label( )](http://tinkerpop.apache.org/docs/current/reference/#label-step)
+ [limit( )](http://tinkerpop.apache.org/docs/current/reference/#limit-step)
+ [local( )](http://tinkerpop.apache.org/docs/current/reference/#local-step)
+ [loops( )](http://tinkerpop.apache.org/docs/current/reference/#loops-step)
+ [not( )](http://tinkerpop.apache.org/docs/current/reference/#not-step)
+ [or( )](http://tinkerpop.apache.org/docs/current/reference/#or-step)
+ [profile( )](http://tinkerpop.apache.org/docs/current/reference/#profile-step)
+ [properties( )](http://tinkerpop.apache.org/docs/current/reference/#properties-step)
+ [subgraph( )](http://tinkerpop.apache.org/docs/current/reference/#subgraph-step)
+ [until( )](http://tinkerpop.apache.org/docs/current/reference/#until-step)
+ [V( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [value( )](http://tinkerpop.apache.org/docs/current/reference/#value-step)
+ [valueMap( )](http://tinkerpop.apache.org/docs/current/reference/#valuemap-step)
+ [values( )](http://tinkerpop.apache.org/docs/current/reference/#values-step)

## Gremlin steps that can be converted to native Neptune engine operations in some cases
<a name="gremlin-steps-sometimes"></a>

Some Gremlin steps can be converted to native Neptune engine operations in some situations but not in others:
+ [addE( )](http://tinkerpop.apache.org/docs/current/reference/#addedge-step)   –   The `addE()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key.
+ [addV( )](http://tinkerpop.apache.org/docs/current/reference/#addvertex-step)   –   The `addV()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key, or unless multiple labels are assigned.
+ [aggregate( )](http://tinkerpop.apache.org/docs/current/reference/#store-step)   –   The `aggregate()` step can generally be converted to a native Neptune engine operation, unless the step is used in a child traversal or sub-traversal, or unless the value being stored is something other than a vertex, edge, id, label or property value.

  In the example below, `aggregate()` is not converted because it is being used in a child traversal:

  ```
  g.V().has('code','ANC').as('a')
       .project('flights').by(select('a')
       .outE().aggregate('x'))
  ```

  In this example, `aggregate()` is not converted because what is stored is the `min()` of a value:

  ```
  g.V().has('code','ANC').outE().aggregate('x').by(values('dist').min())
  ```
+ [barrier( )](http://tinkerpop.apache.org/docs/current/reference/#barrier-step)   –   The `barrier()` step can generally be converted to a native Neptune engine operation, unless the step following it is not converted.
+ [cap( )](http://tinkerpop.apache.org/docs/current/reference/#cap-step)   –   The only case in which the `cap()` step is converted is when it is combined with the `unfold()` step to return an unfolded version of an aggregate of vertex, edge, id, or property values. In this example, `cap()` will be converted because it is followed by `.unfold()`:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```

  However, if you remove the `.unfold()`, `cap()` will not be converted:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport')
  ```
+ [coalesce( )](http://tinkerpop.apache.org/docs/current/reference/#coalesce-step)   –   The only case where the `coalesce()` step is converted is when it follows the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/); other `coalesce()` patterns are not converted. Conversion is limited to the case where all child traversals can be converted, they all produce the same type as output (vertex, edge, id, value, key, or label), they all traverse to a new element, and none of them contains a `repeat()` step.
+ [constant( )](http://tinkerpop.apache.org/docs/current/reference/#constant-step)   –   The `constant()` step is currently only converted if it is used within a `sack().by()` part of a traversal to assign a constant value, like this:

  ```
  g.V().has('code','ANC').sack(assign).by(constant(10)).out().limit(2)
  ```
+ [cyclicPath( )](http://tinkerpop.apache.org/docs/current/reference/#cyclicpath-step)   –   The `cyclicPath()` step can generally be converted to a native Neptune engine operation, unless the step is used with `by()`, `from()`, or `to()` modulators. In the following queries, for example, `cyclicPath()` is not converted:

  ```
  g.V().has('code','ANC').as('a').out().out().cyclicPath().by('code')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().from('a')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().to('a')
  ```
+ [drop( )](http://tinkerpop.apache.org/docs/current/reference/#drop-step)   –   The `drop()` step can generally be converted to a native Neptune engine operation, unless the step is used inside a `sideEffect(`) or `optional()` step.
+ [fold( )](http://tinkerpop.apache.org/docs/current/reference/#fold-step)   –   There are only two situations in which the `fold()` step can be converted: when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), and when it is used in a `group().by()` context like this:

  ```
  g.V().has('code','ANC').out().group().by().by(values('code', 'city').fold())
  ```
+  [has( )](http://tinkerpop.apache.org/docs/current/reference/#has-step)   –   The `has()` step can generally be converted to a native Neptune engine operation, provided that queries with `T` use the predicate `P.eq`, `P.neq`, or `P.contains`. Expect variations of `has()` that imply those instances of `P` to convert to native as well, such as `hasId('id1234')`, which is equivalent to `has(T.id, eq('id1234'))`.
+ [id( )](http://tinkerpop.apache.org/docs/current/reference/#id-step)   –   The `id()` step is converted unless it is used on a property, like this:

  ```
  g.V().has('code','ANC').properties('code').id()
  ```
+  [mergeE()](https://tinkerpop.apache.org/docs/current/reference/#mergeedge-step)   –   The `mergeE()` step can be converted to a native Neptune engine operation if its parameters (the merge condition, the `onCreate`, and the `onMatch`) are constant (either `null`, a constant `Map`, or a `select()` of a `Map`). All of the examples in [upserting edges](gremlin-efficient-upserts.md#gremlin-upserts-edges) can be converted.
+  [mergeV()](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step)   –   The `mergeV()` step can be converted to a native Neptune engine operation if its parameters (the merge condition, the `onCreate`, and the `onMatch`) are constant (either `null`, a constant `Map`, or a `select()` of a `Map`). All of the examples in [upserting vertices](gremlin-efficient-upserts.md#gremlin-upserts-vertices) can be converted.
+ [order( )](http://tinkerpop.apache.org/docs/current/reference/#order-step)   –   The `order()` step can generally be converted to a native Neptune engine operation, unless one of the following is true:
  + The `order()` step is within a nested child traversal, like this:

    ```
    g.V().has('code','ANC').where(V().out().order().by(id))
    ```
  + Local ordering is being used, as for example with `order(local)`.
  + A custom comparator is being used in the `by()` modulation to order by. An example is this use of `sack()`:

    ```
    g.withSack(0).
      V().has('code','ANC').
          repeat(outE().sack(sum).by('dist').inV()).times(2).limit(10).
          order().by(sack())
    ```
  + There are multiple orderings on the same element.
+ [project( )](http://tinkerpop.apache.org/docs/current/reference/#project-step)   –   The `project()` step can generally be converted to a native Neptune engine operation, unless the number of `by()` statements following the `project()` does not match the number of labels specified, as here:

  ```
  g.V().has('code','ANC').project('x', 'y').by(id)
  ```
+ [range( )](http://tinkerpop.apache.org/docs/current/reference/#range-step)   –   The `range()` step is only converted when the lower end of the range in question is zero (for example, `range(0,3)`).
+ [repeat( )](http://tinkerpop.apache.org/docs/current/reference/#repeat-step)   –   The `repeat()` step can generally be converted to a native Neptune engine operation, unless it is nested within another `repeat()` step, like this:

  ```
  g.V().has('code','ANC').repeat(out().repeat(out()).times(2)).times(2)
  ```
+ [sack( )](http://tinkerpop.apache.org/docs/current/reference/#sack-step)   –   The `sack()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + If a non-numeric sack operator is being used.
  + If a numeric sack operator other than `+`, `-`, `mult`, `div`, `min` and `max` is being used.
  + If `sack()` is used inside a `where()` step to filter based on a sack value, as here:

    ```
    g.V().has('code','ANC').sack(assign).by(values('code')).where(sack().is('ANC'))
    ```
+ [sum( )](http://tinkerpop.apache.org/docs/current/reference/#sum-step)   –   The `sum()` step can generally be converted to a native Neptune engine operation, but not when used to calculate a global summation, like this:

  ```
  g.V().has('code','ANC').outE('routes').values('dist').sum()
  ```
+ [union( )](http://tinkerpop.apache.org/docs/current/reference/#union-step)   –   The `union()` step can be converted to a native Neptune engine operation as long as it is the last step in the query aside from the terminal step.
+ [unfold( )](http://tinkerpop.apache.org/docs/current/reference/#unfold-step)   –   The `unfold()` step can only be converted to a native Neptune engine operation when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), and when it is used together with `cap()` like this:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```
+ [where( )](http://tinkerpop.apache.org/docs/current/reference/#where-step)   –   The `where()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + When `by()` modulations are used, like this:

    ```
    g.V().hasLabel('airport').as('a')
         .where(gt('a')).by('runways')
    ```
  + When comparison operators other than `eq`, `neq`, `within`, and `without` are used.
  + When user-supplied aggregations are used.

## Gremlin steps that are never converted to native Neptune engine operations
<a name="gremlin-steps-never"></a>

The following Gremlin steps are supported in Neptune but are never converted to native Neptune engine operations. Instead, they are executed by the Gremlin server.
+ [choose( )](http://tinkerpop.apache.org/docs/current/reference/#choose-step)
+ [coin( )](http://tinkerpop.apache.org/docs/current/reference/#coin-step)
+ [inject( )](http://tinkerpop.apache.org/docs/current/reference/#inject-step)
+ [match( )](http://tinkerpop.apache.org/docs/current/reference/#match-step)
+ [math( )](http://tinkerpop.apache.org/docs/current/reference/#math-step)
+ [max( )](http://tinkerpop.apache.org/docs/current/reference/#max-step)
+ [mean( )](http://tinkerpop.apache.org/docs/current/reference/#mean-step)
+ [min( )](http://tinkerpop.apache.org/docs/current/reference/#min-step)
+ [option( )](http://tinkerpop.apache.org/docs/current/reference/#option-step)
+ [optional( )](http://tinkerpop.apache.org/docs/current/reference/#optional-step)
+ [path( )](http://tinkerpop.apache.org/docs/current/reference/#path-step)
+ [propertyMap( )](http://tinkerpop.apache.org/docs/current/reference/#propertymap-step)
+ [sample( )](http://tinkerpop.apache.org/docs/current/reference/#sample-step)
+ [skip( )](http://tinkerpop.apache.org/docs/current/reference/#skip-step)
+ [tail( )](http://tinkerpop.apache.org/docs/current/reference/#tail-step)
+ [timeLimit( )](http://tinkerpop.apache.org/docs/current/reference/#timelimit-step)
+ [tree( )](http://tinkerpop.apache.org/docs/current/reference/#tree-step)

## Gremlin steps that are not supported in Neptune at all
<a name="neptune-gremlin-steps-unsupported"></a>

The following Gremlin steps are not supported at all in Neptune. In most cases this is because they require a `GraphComputer`, which Neptune does not currently support.
+ [connectedComponent( )](http://tinkerpop.apache.org/docs/current/reference/#connectedcomponent-step)
+ [io( )](http://tinkerpop.apache.org/docs/current/reference/#io-step)
+ [shortestPath( )](http://tinkerpop.apache.org/docs/current/reference/#shortestpath-step)
+ [withComputer( )](http://tinkerpop.apache.org/docs/current/reference/#with-step)
+ [pageRank( )](http://tinkerpop.apache.org/docs/current/reference/#pagerank-step)
+ [peerPressure( )](http://tinkerpop.apache.org/docs/current/reference/#peerpressure-step)
+ [program( )](http://tinkerpop.apache.org/docs/current/reference/#program-step)

The `io()` step is actually partially supported, in that it can be used to `read()` from a URL but not to `write()`.

# Using Gremlin with the Neptune DFE query engine
<a name="gremlin-with-dfe"></a>

If you enable Neptune's [alternative query engine](neptune-dfe-engine.md), known as the DFE, by setting the [neptune\_dfe\_query\_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter to `enabled`, then Neptune translates read-only Gremlin traversals into an intermediate logical representation and runs them on the DFE engine whenever possible.

However, the DFE does not yet support all Gremlin steps. When a step can't be run natively on the DFE, Neptune falls back on TinkerPop to run the step. The `explain` and `profile` reports include warnings when this happens.

# Gremlin step coverage in DFE
<a name="gremlin-step-coverage-in-DFE"></a>

Gremlin support on the DFE is an experimental feature that you can use either by enabling the instance parameter or by using the `Neptune#useDFE` query hint. For more information, see [Using Gremlin with the Neptune DFE query engine](#gremlin-with-dfe).
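For example, the `Neptune#useDFE` query hint can direct a single traversal to the DFE without changing the instance parameter (the traversal itself is illustrative):

```
// Ask Neptune to run this read-only traversal on the DFE engine
g.with('Neptune#useDFE', true)
 .V().has('code','AGR').out().values('code')
```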

The following Gremlin steps are available for use with the DFE engine.

## Path and traversal steps
<a name="DFE-path-and-traversal"></a>

[asDate()](https://tinkerpop.apache.org/docs/current/reference/#asdate-step), [barrier()](https://tinkerpop.apache.org/docs/current/reference/#barrier-step), [call()](https://tinkerpop.apache.org/docs/current/reference/#call-step), [cap()](https://tinkerpop.apache.org/docs/current/reference/#cap-step), [dateAdd()](https://tinkerpop.apache.org/docs/current/reference/#dateadd-step), [dateDiff()](https://tinkerpop.apache.org/docs/current/reference/#datediff-step), [disjunct()](https://tinkerpop.apache.org/docs/current/reference/#disjunct-step), [drop()](https://tinkerpop.apache.org/docs/current/reference/#drop-step), [fail()](https://tinkerpop.apache.org/docs/current/reference/#fail-step), [filter()](https://tinkerpop.apache.org/docs/current/reference/#filter-step), [flatMap()](https://tinkerpop.apache.org/docs/current/reference/#flatmap-step), [id()](https://tinkerpop.apache.org/docs/current/reference/#id-step), [identity()](https://tinkerpop.apache.org/docs/current/reference/#identity-step), [index()](https://tinkerpop.apache.org/docs/current/reference/#index-step), [intersect()](https://tinkerpop.apache.org/docs/current/reference/#intersect-step), [inject()](https://tinkerpop.apache.org/docs/current/reference/#inject-step), [label()](https://tinkerpop.apache.org/docs/current/reference/#label-step), [length()](https://tinkerpop.apache.org/docs/current/reference/#length-step), [loops()](https://tinkerpop.apache.org/docs/current/reference/#loops-step), [map()](https://tinkerpop.apache.org/docs/current/reference/#map-step), [order()](https://tinkerpop.apache.org/docs/current/reference/#order-step), [order(local)](https://tinkerpop.apache.org/docs/current/reference/#order-step), [path()](https://tinkerpop.apache.org/docs/current/reference/#path-step), [project()](https://tinkerpop.apache.org/docs/current/reference/#project-step), [range()](https://tinkerpop.apache.org/docs/current/reference/#range-step), [repeat()](https://tinkerpop.apache.org/docs/current/reference/#repeat-step), [reverse()](https://tinkerpop.apache.org/docs/current/reference/#reverse-step), [sack()](https://tinkerpop.apache.org/docs/current/reference/#sack-step), [sample()](https://tinkerpop.apache.org/docs/current/reference/#sample-step), [select()](https://tinkerpop.apache.org/docs/current/reference/#select-step), [sideEffect()](https://tinkerpop.apache.org/docs/current/reference/#sideeffect-step), [split()](https://tinkerpop.apache.org/docs/current/reference/#split-step), [unfold()](https://tinkerpop.apache.org/docs/current/reference/#unfold-step), [union()](https://tinkerpop.apache.org/docs/current/reference/#union-step)

## Aggregate and collection steps
<a name="DFE-aggregate-and-collection"></a>

[aggregate(global)](https://tinkerpop.apache.org/docs/current/reference/#aggregate-step), [combine()](https://tinkerpop.apache.org/docs/current/reference/#combine-step), [count()](https://tinkerpop.apache.org/docs/current/reference/#count-step), [dedup()](https://tinkerpop.apache.org/docs/current/reference/#dedup-step), [dedup(local)](https://tinkerpop.apache.org/docs/current/reference/#dedup-step), [fold()](https://tinkerpop.apache.org/docs/current/reference/#fold-step), [group()](https://tinkerpop.apache.org/docs/current/reference/#group-step), [groupCount()](https://tinkerpop.apache.org/docs/current/reference/#groupcount-step)
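As a small sketch of how some of these steps combine (using the `code` property key that appears in the examples later in this section):

```
// Count the distinct one-hop neighbors of the AGR vertex,
// grouped by vertex label
g.V().has('code','AGR').out().dedup().groupCount().by(label())
```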

## Mathematical steps
<a name="DFE-mathematical"></a>

[max()](https://tinkerpop.apache.org/docs/current/reference/#max-step), [mean()](https://tinkerpop.apache.org/docs/current/reference/#mean-step), [min()](https://tinkerpop.apache.org/docs/current/reference/#min-step), [sum()](https://tinkerpop.apache.org/docs/current/reference/#sum-step)

## Element steps
<a name="DFE-element"></a>

[element()](https://tinkerpop.apache.org/docs/current/reference/#element-step), [elementMap()](https://tinkerpop.apache.org/docs/current/reference/#elementmap-step), [V()](https://tinkerpop.apache.org/docs/current/reference/#graph-step), and the vertex steps [out(), in(), both(), outE(), inE(), bothE(), outV(), inV(), bothV(), otherV()](https://tinkerpop.apache.org/docs/current/reference/#vertex-step)
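For example (the `code` property key follows the other examples in this section):

```
// Traverse outgoing edges from the AGR vertex and return an
// element map for each neighboring vertex
g.V().has('code','AGR').out().elementMap('code')
```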

## Property steps
<a name="DFE-property"></a>

[properties()](https://tinkerpop.apache.org/docs/current/reference/#properties-step), [key()](https://tinkerpop.apache.org/docs/current/reference/#key-step), [valueMap()](https://tinkerpop.apache.org/docs/current/reference/#valuemap-step), [value()](https://tinkerpop.apache.org/docs/current/reference/#value-step)

## Filter steps
<a name="DFE-filter"></a>

[and()](https://tinkerpop.apache.org/docs/current/reference/#and-step), [coalesce()](https://tinkerpop.apache.org/docs/current/reference/#coalesce-step), [coin()](https://tinkerpop.apache.org/docs/current/reference/#coin-step), [has()](https://tinkerpop.apache.org/docs/current/reference/#has-step), [is()](https://tinkerpop.apache.org/docs/current/reference/#is-step), [local()](https://tinkerpop.apache.org/docs/current/reference/#local-step), [none()](https://tinkerpop.apache.org/docs/current/reference/#none-step), [not()](https://tinkerpop.apache.org/docs/current/reference/#not-step), [or()](https://tinkerpop.apache.org/docs/current/reference/#or-step), [where()](https://tinkerpop.apache.org/docs/current/reference/#where-step)
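A small illustrative combination of these filter steps:

```
// Keep the AGR vertex only if it has an outgoing edge to a FRA vertex
g.V().has('code','AGR').where(out().has('code','FRA'))
```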

## String manipulation steps
<a name="DFE-string-manipulation"></a>

[concat()](https://tinkerpop.apache.org/docs/current/reference/#concat-step), [lTrim()](https://tinkerpop.apache.org/docs/current/reference/#ltrim-step), [rTrim()](https://tinkerpop.apache.org/docs/current/reference/#rtrim-step), [substring()](https://tinkerpop.apache.org/docs/current/reference/#substring-step), [toLower()](https://tinkerpop.apache.org/docs/current/reference/#tolower-step), [toUpper()](https://tinkerpop.apache.org/docs/current/reference/#toupper-step), [trim()](https://tinkerpop.apache.org/docs/current/reference/#trim-step)
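For example (the `city` property key is hypothetical):

```
// Trim whitespace from a string property value and upper-case it
g.V().has('code','AGR').values('city').trim().toUpper()
```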

## Predicates
<a name="DFE-predicates"></a>
+ [Compare: eq, neq, lt, lte, gt, gte](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [Contains: within, without](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [TextP: endingWith, containing, notStartingWith, notEndingWith, notContaining](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
+ [P: and, or, between, outside, inside](https://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
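For example, these predicates can be used inside `has()` filters (the `longest` property key is hypothetical):

```
// Numeric range predicate
g.V().has('longest', P.between(10000, 12000))

// String predicate
g.V().has('code', TextP.containing('GR'))
```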

## Limitations
<a name="gremlin-with-dfe-limitations"></a>

The `repeat()` step is not yet supported on the DFE when the repeated traversal contains `limit()`, labels (`as()`), or `dedup()`, as in the following examples:

```
// With limit() inside the repeat traversal
g.V().has('code','AGR').repeat(out().limit(5)).until(has('code','FRA'))

// With labels inside the repeat traversal
g.V().has('code','AGR').repeat(out().as('a')).until(has('code','FRA'))

// With dedup() inside the repeat traversal
g.V().has('code','AGR').repeat(out().dedup()).until(has('code','FRA'))
```

The `path()` step is not yet supported when the traversal contains nested `repeat()` steps or branching steps, as in the following examples:

```
// Path with branching steps
g.V().has('code','AGR').union(identity(), outE().inV()).path().by('code')

// Path with a nested repeat
g.V().has('code','AGR').repeat(out().union(identity(), out())).path().by('code')
```

## Query planning interleaving
<a name="gremlin-with-dfe-interleaving"></a>

When the translation process encounters a Gremlin step that has no corresponding native DFE operator, Neptune tries to find other intermediate parts of the query that can run natively on the DFE before falling back to TinkerPop. It does this by applying interleaving logic to the top-level traversal, so that supported steps run on the DFE wherever possible.

Any such intermediate, non-prefix query translation is represented using `NeptuneInterleavingStep` in the `explain` and `profile` outputs.

For performance comparisons, you might want to turn off interleaving for a query while still using the DFE engine to run the prefix part, or use only the TinkerPop engine for the non-prefix part of the query. You can do this with the `disableInterleaving` query hint.

Just as the [useDFE](gremlin-query-hints-useDFE.md) query hint with a value of `false` prevents a query from being run on the DFE at all, the `disableInterleaving` query hint with a value of `true` turns off DFE interleaving for translation of a query. For example:

```
g.with('Neptune#disableInterleaving', true)
 .V().has('genre','drama').in('likes')
```

## Updated Gremlin `explain` and `profile` output
<a name="gremlin-with-dfe-explain-update"></a>

Gremlin [explain](gremlin-explain.md) provides details about the optimized traversal that Neptune uses to run a query. See the [sample DFE `explain` output](gremlin-explain-api.md#gremlin-explain-dfe) for an example of what `explain` output looks like when the DFE engine is enabled.

The [Gremlin `profile` API](gremlin-profile-api.md) runs a specified Gremlin traversal, collects various metrics about the run, and produces a profile report that contains details about the optimized query plan and the runtime statistics of various operators. See [sample DFE `profile` output](gremlin-profile-api.md#gremlin-profile-sample-dfe-output) for an example of what `profile` output looks like when the DFE engine is enabled.

**Note**  
Because DFE support for Gremlin is an experimental feature, the exact format of the `explain` and `profile` output is subject to change.