

# Analyzing Neptune query execution using Gremlin `explain`

Amazon Neptune has added a Gremlin feature named *explain*. This feature is a self-service tool for understanding the execution approach taken by the Neptune engine. You invoke it by submitting a Gremlin query over HTTP to the `/gremlin/explain` endpoint instead of the usual `/gremlin` endpoint.

The `explain` feature provides information about the logical structure of query execution plans. You can use this information to identify potential evaluation and execution bottlenecks and tune your query, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). You can also use [query hints](gremlin-query-hints.md) to improve query execution plans.

**Topics**
+ [Understanding how Gremlin queries work in Neptune](gremlin-explain-background.md)
+ [Using the Gremlin `explain` API in Neptune](gremlin-explain-api.md)
+ [Gremlin `profile` API in Neptune](gremlin-profile-api.md)
+ [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md)
+ [Native Gremlin step support in Amazon Neptune](gremlin-step-support.md)

# Understanding how Gremlin queries work in Neptune

To take full advantage of the Gremlin `explain` and `profile` reports in Amazon Neptune, it is helpful to understand some background information about Gremlin queries.

**Topics**
+ [Gremlin statements in Neptune](gremlin-explain-background-statements.md)
+ [How Neptune processes Gremlin queries using statement indexes](gremlin-explain-background-indexing-examples.md)
+ [How Gremlin queries are processed in Neptune](gremlin-explain-background-querying.md)

# Gremlin statements in Neptune

Property graph data in Amazon Neptune is composed of four-position (quad) statements. Each of these statements represents an individual atomic unit of property graph data. For more information, see [Neptune Graph Data Model](feature-overview-data-model.md). Similar to the Resource Description Framework (RDF) data model, these four positions are as follows:
+ `subject (S)`
+ `predicate (P)`
+ `object (O)`
+ `graph (G)`

Each statement is an assertion about one or more resources. For example, a statement can assert the existence of a relationship between two resources, or it can attach a property (key-value pair) to some resource.

You can think of the predicate as the verb of the statement, describing the type of relationship or property. The object is the target of the relationship, or the value of the property. The graph position is optional and can be used in many different ways. For the Neptune property graph (PG) data, it is either unused (null graph) or it is used to represent the identifier for an edge. A set of statements with shared resource identifiers creates a graph.

There are three classes of statements in the Neptune property graph data model:

**Topics**
+ [Vertex Label Statements](#gremlin-explain-background-vertex-labels)
+ [Edge Statements](#gremlin-explain-background-edge-statements)
+ [Property Statements](#gremlin-explain-background-property-statements)

## Gremlin Vertex Label Statements

Vertex label statements in Neptune serve two purposes:
+ They track the labels for a vertex.
+ The presence of at least one of these statements is what implies the existence of a particular vertex in the graph.

The subject of these statements is a vertex identifier, and the object is a label, both of which are specified by the user. These statements use a special fixed predicate, displayed as `<~label>`, and the default graph identifier (the null graph), displayed as `<~>`.

For example, consider the following `addV` traversal.

```
g.addV("Person").property(id, "v1")
```

This traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <~label> <Person> <~>) .]
```
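
To make the statement layout concrete, here is a minimal Python sketch that models vertex label statements as plain tuples. The `Quad` type, the `"~"` stand-in for the null graph `<~>`, and the helper names are hypothetical illustrations, not Neptune internals or APIs. The second helper reflects the rule above that a vertex exists only if at least one label statement names it.

```python
from typing import Iterable, List, NamedTuple

# Hypothetical stand-ins for the displayed forms <~label> and <~> (the null graph).
LABEL_PREDICATE = "~label"
NULL_GRAPH = "~"


class Quad(NamedTuple):
    """One (S, P, O, G) statement."""
    s: str
    p: str
    o: str
    g: str


def vertex_label_statements(vertex_id: str, labels: List[str]) -> List[Quad]:
    """Build one label statement per label, as in the addV example above."""
    if not labels:
        # This sketch simply rejects an empty label list.
        raise ValueError("a vertex needs at least one label statement to exist")
    return [Quad(vertex_id, LABEL_PREDICATE, label, NULL_GRAPH) for label in labels]


def vertex_exists(statements: Iterable[Quad], vertex_id: str) -> bool:
    """A vertex exists if at least one label statement has it as the subject."""
    return any(q.s == vertex_id and q.p == LABEL_PREDICATE for q in statements)


stmts = vertex_label_statements("v1", ["Person"])
print(stmts)  # [Quad(s='v1', p='~label', o='Person', g='~')]
print(vertex_exists(stmts, "v1"), vertex_exists(stmts, "v2"))  # True False
```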

## Gremlin Edge Statements

A Gremlin edge statement is what implies the existence of an edge between two vertices in a graph in Neptune. The subject (S) of an edge statement is the source `from` vertex. The predicate (P) is a user-supplied edge label. The object (O) is the target `to` vertex. The graph (G) is a user-supplied edge identifier.

For example, consider the following `addE` traversal.

```
g.addE("knows").from(V("v1")).to(V("v2")).property(id, "e1")
```

The traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <knows> <v2> <e1>) .]
```
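
The following Python sketch shows the same mapping for edges: the source vertex goes in S, the edge label in P, the target vertex in O, and the edge ID in G. Because the edge ID sits in the G position, the endpoints of an edge can be recovered by matching on G alone. The `Quad` type and function names are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Quad(NamedTuple):
    """One (S, P, O, G) statement."""
    s: str
    p: str
    o: str
    g: str


def edge_statement(edge_id: str, label: str, from_id: str, to_id: str) -> Quad:
    """S = from vertex, P = edge label, O = to vertex, G = edge ID."""
    return Quad(from_id, label, to_id, edge_id)


def edge_endpoints(statements: Iterable[Quad], edge_id: str) -> Optional[Tuple[str, str]]:
    """Find the (from, to) vertices of an edge by matching on the G position."""
    for q in statements:
        if q.g == edge_id:
            return q.s, q.o
    return None


e1 = edge_statement("e1", "knows", "v1", "v2")
print(e1)                          # Quad(s='v1', p='knows', o='v2', g='e1')
print(edge_endpoints([e1], "e1"))  # ('v1', 'v2')
```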

## Gremlin Property Statements

A Gremlin property statement in Neptune asserts an individual property value for a vertex or edge. The subject is a user-supplied vertex or edge identifier. The predicate is the property name (key), and the object is the individual property value. The graph (G) is again the default graph identifier, the null graph, displayed as `<~>`.

Consider the following vertex property example.

```
g.V("v1").property("name", "John")
```

This traversal results in the following statement being added to the graph.

```
StatementEvent[Added(<v1> <name> "John" <~>) .]
```

Property statements differ from others in that their object is a primitive value (a `string`, `date`, `byte`, `short`, `int`, `long`, `float`, or `double`). Their object is not a resource identifier that could be used as the subject of another assertion.

For multi-properties, each individual property value in the set receives its own statement.

```
g.V("v1").property(set, "phone", "956-424-2563").property(set, "phone", "956-354-3692")
```

This results in the following.

```
StatementEvent[Added(<v1> <phone> "956-424-2563" <~>) .]
StatementEvent[Added(<v1> <phone> "956-354-3692" <~>) .]
```

Edge properties are handled similarly to vertex properties, but use the edge identifier in the (S) position. For example, adding a property to an edge:

```
g.E("e1").property("weight", 0.8)
```

This results in the following statement being added to the graph.

```
StatementEvent[Added(<e1> <weight> 0.8 <~>) .]
```
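
Here is a minimal Python sketch of property statements for both vertices and edges. It emits one statement per distinct value, which mirrors the multi-property example above, and puts a primitive value (not a resource identifier) in the O position. The type, the `"~"` null-graph stand-in, and the function name are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Union

# Primitive property values (a subset of the types listed above).
Value = Union[str, int, float]
NULL_GRAPH = "~"  # hypothetical stand-in for <~>


class Quad(NamedTuple):
    """One (S, P, O, G) statement whose object is a primitive value."""
    s: str
    p: str
    o: Value
    g: str


def property_statements(element_id: str, key: str, values: Iterable[Value]) -> List[Quad]:
    """One statement per distinct value; S can be a vertex ID or an edge ID."""
    distinct: List[Value] = []
    for value in values:
        if value not in distinct:  # set cardinality: repeated values collapse
            distinct.append(value)
    return [Quad(element_id, key, value, NULL_GRAPH) for value in distinct]


phones = property_statements("v1", "phone", ["956-424-2563", "956-354-3692"])
weight = property_statements("e1", "weight", [0.8])
for q in phones + weight:
    print(q)
```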

# How Neptune processes Gremlin queries using statement indexes

Statements are accessed in Amazon Neptune by way of three statement indexes, as detailed in [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md). Neptune extracts a statement *pattern* from a Gremlin query in which some positions are known, and the rest are left for discovery by index search.

Neptune assumes that the property graph schema is small. This means that the number of distinct edge labels and property names is fairly low, resulting in a low total number of distinct predicates. Neptune tracks distinct predicates in a separate index. When only the object position of a pattern is known, it uses this cache of predicates to do a union scan of `{ all P x POGS }` rather than use an OSGP index. Avoiding the need for a reverse traversal OSGP index saves both storage space and load throughput.
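
The following Python sketch illustrates the idea of answering an object-only lookup with one POGS range scan per known predicate. The in-memory sorted list, the `"~"` null-graph stand-in, and the function names are hypothetical simplifications; Neptune's real storage layer is not modeled. Note that the number of range scans grows with the number of distinct predicates, which is why the size of the schema matters.

```python
from bisect import bisect_left
from typing import Iterable, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # (S, P, O, G)


def build_pogs(quads: Iterable[Quad]) -> List[Quad]:
    """Lay each statement out in POGS key order and sort it, like an index."""
    return sorted((p, o, g, s) for s, p, o, g in quads)


def pogs_scan(index: List[Quad], p: str, o: str) -> List[Quad]:
    """Range scan for keys starting with (p, o); returns (S, P, O, G) quads."""
    out = []
    i = bisect_left(index, (p, o))  # (p, o) sorts before every (p, o, g, s)
    while i < len(index) and index[i][0] == p and index[i][1] == o:
        p_, o_, g, s = index[i]
        out.append((s, p_, o_, g))
        i += 1
    return out


def statements_with_object(index: List[Quad], predicates: Set[str], o: str) -> List[Quad]:
    """Object-only lookup as a union of POGS scans over every known predicate."""
    result: List[Quad] = []
    for p in sorted(predicates):  # one range scan per distinct predicate
        result.extend(pogs_scan(index, p, o))
    return result


quads = [
    ("v1", "knows", "v2", "e1"),
    ("v3", "likes", "v2", "e2"),
    ("v2", "knows", "v4", "e3"),
    ("v2", "~label", "Person", "~"),
]
index = build_pogs(quads)
predicates = {q[1] for q in quads}  # the separately tracked distinct predicates
print(statements_with_object(index, predicates, "v2"))
# [('v1', 'knows', 'v2', 'e1'), ('v3', 'likes', 'v2', 'e2')]
```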

The Neptune Gremlin `explain` and `profile` APIs let you obtain the predicate count in your graph. You can then determine whether your application invalidates Neptune's assumption that your property graph schema is small.
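
If you want to track the predicate count programmatically, a small parser over the report text is enough. This Python sketch assumes the report contains a `# of predicates: N` line in the format shown in the sample `explain` output in this guide; the function name is hypothetical, and you would pass it the report text returned by the API.

```python
import re
from typing import Optional

# Matches the "# of predicates: N" line in the Predicates section of the report.
_PREDICATE_LINE = re.compile(r"^# of predicates:\s*(\d+)\s*$", re.MULTILINE)


def predicate_count(report: str) -> Optional[int]:
    """Return the predicate count from a report, or None if the line is absent."""
    match = _PREDICATE_LINE.search(report)
    return int(match.group(1)) if match else None


sample = "Predicates\n==========\n# of predicates: 18\n"
print(predicate_count(sample))  # 18
```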

The following examples help illustrate how Neptune uses indexes to process Gremlin queries.

**Question: What are the labels of vertex `v1`?**

```
  Gremlin code:      g.V('v1').label()
  Pattern:           (<v1>, <~label>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<~label>:*
```

**Question: What are the 'knows' out-edges of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows')
  Pattern:           (<v1>, <knows>, ?, ?)
  Known positions:   SP
  Lookup positions:  OG
  Index:             SPOG
  Key range:         <v1>:<knows>:*
```

**Question: Which vertices have a `Person` vertex label?**

```
  Gremlin code:      g.V().hasLabel('Person')
  Pattern:           (?, <~label>, <Person>, <~>)
  Known positions:   POG
  Lookup positions:  S
  Index:             POGS
  Key range:         <~label>:<Person>:<~>:*
```

**Question: What are the from/to vertices of a given edge `e1`?**

```
  Gremlin code:      g.E('e1').bothV()
  Pattern:           (?, ?, ?, <e1>)
  Known positions:   G
  Lookup positions:  SPO
  Index:             GPSO
  Key range:         <e1>:*
```

One statement index that Neptune does **not** have is a reverse traversal OSGP index. This index could be used to gather all incoming edges across all edge labels, as in the following example.

**Question: What are the incoming adjacent vertices of `v1`?**

```
  Gremlin code:      g.V('v1').in()
  Pattern:           (?, ?, <v1>, ?)
  Known positions:   O
  Lookup positions:  SPG
  Index:             OSGP  // <-- Index does not exist
```
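
The mapping from known positions to an index in the examples above can be sketched as choosing the index whose key prefix covers the most known positions. This Python function is a hypothetical simplification for illustration; it returns `None` for an object-only pattern because none of the three indexes starts with O, which is the case Neptune handles with the predicate union scan described earlier.

```python
from typing import Optional, Set

# The three statement indexes described above, written as key position orders.
INDEXES = ("SPOG", "POGS", "GPSO")


def leading_known(index: str, known: Set[str]) -> int:
    """How many leading key positions of the index are known in the pattern."""
    count = 0
    for position in index:
        if position not in known:
            break
        count += 1
    return count


def choose_index(known_positions: str) -> Optional[str]:
    """Pick the index whose key prefix covers the most known positions.

    Returns None when no index starts with a known position, as for the
    object-only pattern produced by g.V('v1').in().
    """
    known = set(known_positions)
    best = max(INDEXES, key=lambda index: leading_known(index, known))
    return best if leading_known(best, known) > 0 else None


for known in ("SP", "POG", "G", "O"):
    print(known, "->", choose_index(known))
```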

# How Gremlin queries are processed in Neptune

In Amazon Neptune, more complex traversals can be represented as a series of patterns. The patterns define named variables, and sharing a variable across patterns creates a join; together, the joined patterns produce a relation. This is shown in the following example.

**Question: What is the two-hop neighborhood of vertex `v1`?**

```
  Gremlin code:      g.V('v1').out('knows').out('knows').path()
  Pattern:           (?1=<v1>, <knows>, ?2, ?) X Pattern(?2, <knows>, ?3, ?)

  The pattern produces a three-column relation (?1, ?2, ?3) like this:
                     ?1     ?2     ?3
                     ================
                     v1     v2     v3
                     v1     v2     v4
                     v1     v5     v6
```

By sharing the `?2` variable across the two patterns (at the O position in the first pattern and the S position of the second pattern), you create a join from the first hop neighbors to the second hop neighbors. Each Neptune solution has bindings for the three named variables, which can be used to re-create a [TinkerPop Traverser](http://tinkerpop.apache.org/docs/current/reference/#_the_traverser) (including path information).
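
The join on `?2` can be sketched in Python as a nested loop over two pattern matches. The sample edges reproduce the relation shown above, but the edge IDs are made up, and the nested-loop strategy is only for illustration; this page says only that Neptune performs joins of various types.

```python
from typing import List, Optional, Tuple

Quad = Tuple[str, str, str, str]  # (S, P, O, G)

# Sample edge statements matching the relation above; edge IDs are made up.
EDGES: List[Quad] = [
    ("v1", "knows", "v2", "e1"),
    ("v2", "knows", "v3", "e2"),
    ("v2", "knows", "v4", "e3"),
    ("v1", "knows", "v5", "e4"),
    ("v5", "knows", "v6", "e5"),
]


def match(quads: List[Quad], s: Optional[str], p: str) -> List[Quad]:
    """Evaluate a pattern (s, p, ?, ?); s=None leaves the subject unbound."""
    return [q for q in quads if (s is None or q[0] == s) and q[1] == p]


def two_hop(quads: List[Quad], start: str, label: str) -> List[Tuple[str, str, str]]:
    """Join (?1=start, label, ?2, ?) with (?2, label, ?3, ?) on the shared ?2."""
    rows = []
    for s1, _, o1, _ in match(quads, start, label):  # binds ?1 and ?2
        for _, _, o2, _ in match(quads, o1, label):  # ?2 reused as the subject
            rows.append((s1, o1, o2))                # one (?1, ?2, ?3) solution
    return rows


for row in two_hop(EDGES, "v1", "knows"):
    print(row)
```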

The first step in Gremlin query processing is to parse the query into a TinkerPop [Traversal](http://tinkerpop.apache.org/docs/current/reference/#traversal) object, composed of a series of TinkerPop [steps](http://tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps). These steps, which are part of the open-source [Apache TinkerPop project](http://tinkerpop.apache.org/), are both the logical and the physical operators that compose a Gremlin traversal in the reference implementation: they represent the model of the query, and they are also executable operators that produce solutions according to the semantics of the operator they represent. For example, `.V()` is both represented and executed by the TinkerPop [GraphStep](http://tinkerpop.apache.org/docs/current/reference/#graph-step).

Because these off-the-shelf TinkerPop steps are executable, such a TinkerPop Traversal can execute any Gremlin query and produce the correct answer. However, when executed against a large graph, TinkerPop steps can sometimes be very inefficient and slow. Instead of using them, Neptune tries to convert the traversal into a declarative form composed of groups of patterns, as described previously.

Neptune doesn't currently support all Gremlin operators (steps) in its native query engine. So it tries to collapse as many steps as possible down into a single `NeptuneGraphQueryStep`, which contains the declarative logical query plan for all the steps that have been converted. Ideally, all steps are converted. But when a step is encountered that can't be converted, Neptune breaks out of native execution and defers all query execution from that point forward to the TinkerPop steps. It doesn't try to weave in and out of native execution.

After the steps are translated into a logical query plan, Neptune runs a series of query optimizers that rewrite the query plan based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.
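
As a rough illustration of reordering by range counts, the following Python sketch applies a common greedy join-ordering heuristic: start with the pattern that has the smallest estimated range count, then keep choosing the cheapest pattern that shares a variable with those already chosen. This is a generic technique shown under stated assumptions (hypothetical `Pattern` type and counts), not a description of Neptune's actual optimizers, which weigh more factors.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Pattern(NamedTuple):
    name: str
    variables: FrozenSet[str]
    range_count: int  # estimated number of matching statements


def greedy_order(patterns: List[Pattern]) -> List[str]:
    """Cheapest pattern first; afterwards prefer patterns sharing a bound variable."""
    remaining = list(patterns)
    bound: Set[str] = set()
    order: List[str] = []
    while remaining:
        # Patterns that join with what is already bound avoid cross products.
        connected = [p for p in remaining if p.variables & bound]
        candidates = connected or remaining
        chosen = min(candidates, key=lambda p: p.range_count)
        order.append(chosen.name)
        bound |= chosen.variables
        remaining.remove(chosen)
    return order


patterns = [
    Pattern("A", frozenset({"?1", "?2"}), 1000),
    Pattern("B", frozenset({"?2", "?3"}), 10),
    Pattern("C", frozenset({"?1", "?4"}), 50),
]
print(greedy_order(patterns))  # ['B', 'A', 'C']
```

Sorting purely by range count would give `B, C, A`, where `C` shares no variable with `B` and would therefore be joined as a cross product; the connectivity check avoids that.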

After an optimized query plan is produced, Neptune creates a pipeline of physical operators that do the work of executing the query. This includes reading data from the statement indexes, performing joins of various types, filtering, ordering, and so on. The pipeline produces a solution stream that is then converted back into a stream of TinkerPop Traverser objects.

## Serialization of query results

Amazon Neptune currently relies on the TinkerPop response message serializers to convert query results (TinkerPop Traversers) into the serialized data to be sent over the wire back to the client. These serialization formats tend to be quite verbose.

For example, to serialize the result of a vertex query such as `g.V().limit(1)`, the Neptune query engine must perform a single search to produce the query result. However, the `GraphSON` serializer would perform a large number of additional searches to package the vertex into the serialization format. It would have to perform one search to get the label, one to get the property keys, and one search per property key for the vertex to get all the values for each key.

Some of the serialization formats are more efficient, but all require additional searches. Additionally, the TinkerPop serializers don't try to avoid duplicated searches, often resulting in many searches being repeated unnecessarily.

This makes it very important to write your queries so that they ask specifically just for the information they need. For example, `g.V().limit(1).id()` would return just the vertex ID and eliminate all the additional serializer searches. The [Gremlin `profile` API in Neptune](gremlin-profile-api.md) allows you to see how many search calls are made during query execution and during serialization.
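
As a back-of-the-envelope illustration of the two preceding paragraphs, the following Python function counts searches for the single vertex returned by `g.V().limit(1)`: one search for the query, then one for the label, one for the property keys, and one per property key. The function name is hypothetical, and real counts depend on the serializer and on repeated searches, so use the `profile` API to measure actual numbers.

```python
def estimated_searches(num_property_keys: int, id_only: bool = False) -> int:
    """Rough search count for serializing the vertex from g.V().limit(1)."""
    query_searches = 1  # the query engine finds the vertex with a single search
    if id_only:
        # g.V().limit(1).id(): no additional serializer searches are needed.
        return query_searches
    label_search = 1  # one search for the vertex label
    keys_search = 1   # one search for the property keys
    value_searches = num_property_keys  # one search per property key
    return query_searches + label_search + keys_search + value_searches


print(estimated_searches(5))                # 8
print(estimated_searches(5, id_only=True))  # 1
```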

# Using the Gremlin `explain` API in Neptune

The Amazon Neptune Gremlin `explain` API returns the query plan that would be executed if a specified query were run. Because the API doesn't actually run the query, the plan is returned almost instantaneously.

It differs from the TinkerPop `.explain()` step in that it reports information specific to the Neptune engine.

## Information contained in a Gremlin `explain` report

An `explain` report contains the following information:
+ The query string as requested.
+ **The original traversal.** This is the TinkerPop Traversal object produced by parsing the query string into TinkerPop steps. It is equivalent to the original traversal shown when you run `.explain()` on the query against the TinkerPop TinkerGraph.
+ **The converted traversal.** This is the Neptune Traversal produced by converting the TinkerPop Traversal into the Neptune logical query plan representation. In many cases the entire TinkerPop traversal is converted into two Neptune steps: one that executes the entire query (`NeptuneGraphQueryStep`) and one that converts the Neptune query engine output back into TinkerPop Traversers (`NeptuneTraverserConverterStep`).
+ **The optimized traversal.** This is the optimized version of the Neptune query plan after it has been run through a series of static work-reducing optimizers that rewrite the query based on static analysis and estimated cardinalities. These optimizers do things like reorder operators based on range counts, prune unnecessary or redundant operators, rearrange filters, push operators into different groups, and so on.
+ **The predicate count.** Because of the Neptune indexing strategy described earlier, having a large number of different predicates can cause performance problems. This is especially true for queries that use reverse traversal operators with no edge label (`.in()` or `.both()`), because those lookups require a scan for every predicate. If such operators are used and the predicate count is high enough, the `explain` report displays a warning message.
+ **DFE information.** When the DFE alternative engine is enabled, the following traversal components may show up in the optimized traversal:
  + **`DFEStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFENode`. `DFEStep` represents the part of the query plan that is executed in the DFE engine.
  + **`DFENode`**   –   Contains the intermediate representation as one or more child `DFEJoinGroupNodes`.
  + **`DFEJoinGroupNode`**   –   Represents a join of one or more `DFENode` or `DFEJoinGroupNode` elements.
  + **`NeptuneInterleavingStep`**   –   A Neptune optimized DFE step in the traversal that contains a child `DFEStep`.

    Also contains a `stepInfo` element that contains information about the traversal, such as the frontier element, the path elements used, and so on. This information is used to process the child `DFEStep`.

  An easy way to find out if your query is being evaluated by DFE is to check whether the `explain` output contains a `DFEStep`. Any part of the traversal that is not part of the `DFEStep` will not be executed by DFE and will be executed by the TinkerPop engine.

  See [Example with DFE enabled](#gremlin-explain-dfe) for a sample report.
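
The DFE check described above is easy to automate. This Python sketch assumes the report text uses step names in the format shown in the sample reports; the function name is hypothetical. It matches a standalone `DFEStep(` so that names such as `NeptuneTraverserConverterDFEStep`, which merely end in `DFEStep`, don't count. It only tells you whether some part of the query runs on the DFE; any part outside a `DFEStep` is still executed by the TinkerPop engine.

```python
import re

# Matches a standalone DFEStep(...) but not names such as
# NeptuneTraverserConverterDFEStep that merely end in "DFEStep".
_DFE_STEP = re.compile(r"\bDFEStep\(")


def uses_dfe(explain_report: str) -> bool:
    """True if the explain report contains at least one DFEStep."""
    return _DFE_STEP.search(explain_report) is not None


print(uses_dfe("    DFEStep(Vertex) {\n    NeptuneTraverserConverterDFEStep"))  # True
print(uses_dfe("    NeptuneGraphQueryStep(Vertex) {\n    NeptuneTraverserConverterStep"))  # False
```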

## Gremlin `explain` syntax

The syntax of the `explain` API is the same as that for the HTTP API for query, except that it uses `/gremlin/explain` as the endpoint instead of `/gremlin`, as in the following examples.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-explain-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.V().limit(1)"
```

For more information, see [execute-gremlin-explain-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-explain-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_explain_query(
    gremlinQuery='g.V().limit(1)'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/explain \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().limit(1)"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain \
  -d '{"gremlin":"g.V().limit(1)"}'
```

------

The preceding query would produce the following output.

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().limit(1)

Original Traversal
==================
[GraphStep(vertex,[]), RangeGlobalStep(0,1)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, finishers=[limit(1)], annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Unconverted TinkerPop Steps

Ideally, all TinkerPop steps in a traversal have native Neptune operator coverage. When this isn't the case, Neptune falls back on TinkerPop step execution for gaps in its operator coverage. If a traversal uses a step for which Neptune does not yet have native coverage, the `explain` report displays a warning showing where the gap occurred.

When a step without a corresponding native Neptune operator is encountered, the entire traversal from that point forward is run using TinkerPop steps, even if subsequent steps do have native Neptune operators.

The exception is Neptune full-text search: the `NeptuneSearchStep` implements steps that have no native equivalents as full-text search steps.

## Example of `explain` output where all steps in a query have native equivalents

The following is an example `explain` report for a query where all steps have native equivalents:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().out()

Original Traversal
==================
[GraphStep(vertex,[]), VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, ?5, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep], maxVarId=7}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 18
```

## Example where some steps in a query do not have native equivalents

Neptune handles both `GraphStep` and `VertexStep` natively, but if you introduce a `FoldStep` and `UnfoldStep`, the resulting `explain` output is different:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.V().fold().unfold().out()

Original Traversal
==================
[GraphStep(vertex,[]), FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Vertex(?1):GraphStep], maxVarId=3}
    },
    NeptuneTraverserConverterStep,
    NeptuneMemoryTrackerStep
]
+ not converted into Neptune steps: [FoldStep, UnfoldStep, VertexStep(OUT,vertex)]

WARNING: >> FoldStep << is not supported natively yet
```

In this case, the `FoldStep` breaks you out of native execution. Even the subsequent `VertexStep` is no longer handled natively, because it appears downstream of the `FoldStep` and `UnfoldStep`.

For performance and cost-savings, it's important that you try to formulate traversals so that the maximum amount of work possible is done natively inside the Neptune query engine, instead of by the TinkerPop step implementations.
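
The fallback rule can be sketched in Python as splitting the step list at the first step without native coverage. The `NATIVE_STEPS` set here is hypothetical and contains only what this example needs (see [Native Gremlin step support in Amazon Neptune](gremlin-step-support.md) for the real coverage), and the sketch ignores the full-text search exception described earlier.

```python
from typing import List, Set, Tuple

# Hypothetical set of natively supported steps, just enough for this example.
NATIVE_STEPS: Set[str] = {"GraphStep", "VertexStep"}


def split_at_first_unsupported(steps: List[str], native: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (steps run natively, steps handed to TinkerPop)."""
    for i, step in enumerate(steps):
        if step not in native:
            # Everything from here on runs as TinkerPop steps, even supported ones.
            return steps[:i], steps[i:]
    return steps, []


print(split_at_first_unsupported(["GraphStep", "VertexStep"], NATIVE_STEPS))
# (['GraphStep', 'VertexStep'], [])
print(split_at_first_unsupported(["GraphStep", "FoldStep", "UnfoldStep", "VertexStep"], NATIVE_STEPS))
# (['GraphStep'], ['FoldStep', 'UnfoldStep', 'VertexStep'])
```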

## Example of a query that uses Neptune full-text-search

The following query uses Neptune full-text search:

```
g.withSideEffect("Neptune#fts.endpoint", "some_endpoint")
  .V()
  .tail(100)
  .has("Neptune#fts mark*")
  -------
  .has("name", "Neptune#fts mark*")
  .has("Person", "name", "Neptune#fts mark*")
```

The `.has("name", "Neptune#fts mark*")` part limits the search to vertices with a `name` property, while `.has("Person", "name", "Neptune#fts mark*")` limits the search to vertices with a `name` property and the label `Person`. This results in the following traversal in the `explain` report:

```
Final Traversal
[NeptuneGraphQueryStep(Vertex) {
    JoinGroupNode {
        PatternNode[(?1, termid(1,URI), ?2, termid(0,URI)) . project distinct ?1 .], {estimatedCardinality=INFINITY}
    }, annotations={path=[Vertex(?1):GraphStep], maxVarId=4}
}, NeptuneTraverserConverterStep, NeptuneTailGlobalStep(10), NeptuneTinkerpopTraverserConverterStep, NeptuneSearchStep {
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
    JoinGroupNode {
        SearchNode[(idVar=?3, query=mark*, field=name) . project ask .], {endpoint=some_endpoint}
    }
}]
```

## Example of using `explain` when the DFE is enabled

The following is an example of an `explain` report when the DFE alternative query engine is enabled:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().as("a").out().has("name", "josh").out().in().where(eq("a"))


Original Traversal
==================
[GraphStep(vertex,[])@[a], VertexStep(OUT,vertex), HasStep([name.eq(josh)]), VertexStep(OUT,vertex), VertexStep(IN,vertex), WherePredicateStep(eq(a))]

Converted Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, ?2, <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph>) . project DISTINCT[?1] {rangeCountEstimate=unknown}],
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: HasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> . ), {rangeCountEstimate=unknown}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


Optimized Traversal
===================
Neptune steps:
[
    DFEStep(Vertex) {
      DFENode {
        DFEJoinGroupNode[ children={
          DFEPatternNode[(?1, ?3, ?4, ?5) . project ALL[?1, ?4] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
        }, {rangeCountEstimate=unknown}
        ]
      } [Vertex(?1):GraphStep@[a], Vertex(?4):VertexStep]
    } ,
    NeptuneTraverserConverterDFEStep
]
+ not converted into Neptune steps: NeptuneHasStep([name.eq(josh)]),
Neptune steps:
[
    NeptuneMemoryTrackerStep,
    NeptuneInterleavingStep {
      StepInfo[joinVars=[?7, ?1], frontierElement=Vertex(?7):HasStep, pathElements={a=(last,Vertex(?1):GraphStep@[a])}, listPathElement={}, indexTime=0ms],
      DFEStep(Vertex) {
        DFENode {
          DFEJoinGroupNode[ children={
            DFEPatternNode[(?7, ?8, ?9, ?10) . project ALL[?7, ?9] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}],
            DFEPatternNode[(?12, ?11, ?9, ?13) . project ALL[?9, ?12] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807}]
          }, {rangeCountEstimate=unknown}
          ]
        } [Vertex(?9):VertexStep, Vertex(?12):VertexStep]
      } 
    }
]
+ not converted into Neptune steps: WherePredicateStep(eq(a)),
Neptune steps:
[
    DFECleanupStep
]


WARNING: >> [NeptuneHasStep([name.eq(josh)]), WherePredicateStep(eq(a))] << (or one of the children for each step) is not supported natively yet

Predicates
==========
# of predicates: 8
```

See [Information in `explain`](#gremlin-explain-api-results) for a description of the DFE-specific sections in the report.

# Gremlin `profile` API in Neptune

The Neptune Gremlin `profile` API runs a specified Gremlin traversal, collects various metrics about the run, and produces a profile report as output.

It differs from the TinkerPop `.profile()` step in that it reports information specific to the Neptune engine.

The profile report includes the following information about the query plan:
+ The physical operator pipeline
+ The index operations for query execution and serialization
+ The size of the result

The `profile` API uses an extended version of the HTTP API syntax for query, with `/gremlin/profile` as the endpoint instead of `/gremlin`.

## Parameters specific to Neptune Gremlin `profile`
+ **profile.results** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `TRUE`.

  If true, the query results are gathered and displayed as part of the `profile` report. If false, only the result count is displayed.
+ **profile.chop** – `int`, default value: 250.

  If non-zero, causes the results string to be truncated at that number of characters. This doesn't prevent all results from being captured; it only limits the size of the results string in the profile report. If set to zero, the string contains all the results.
+ **profile.serializer** – `string`, default value: `<null>`.

  If non-null, the gathered results are returned in a serialized response message in the format specified by this parameter. The number of index operations necessary to produce that response message is reported along with the size in bytes to be sent to the client.

  Allowed values are `<null>` or any of the valid MIME type or TinkerPop driver "Serializers" enum values.

  ```
  "application/json" or "GRAPHSON"
  "application/vnd.gremlin-v1.0+json" or "GRAPHSON_V1"
  "application/vnd.gremlin-v1.0+json;types=false" or "GRAPHSON_V1_UNTYPED"
  "application/vnd.gremlin-v2.0+json" or "GRAPHSON_V2"
  "application/vnd.gremlin-v2.0+json;types=false" or "GRAPHSON_V2_UNTYPED"
  "application/vnd.gremlin-v3.0+json" or "GRAPHSON_V3"
  "application/vnd.gremlin-v3.0+json;types=false" or "GRAPHSON_V3_UNTYPED"
  "application/vnd.graphbinary-v1.0" or "GRAPHBINARY_V1"
  ```
+ **profile.indexOps** – `boolean`, allowed values: `TRUE` and `FALSE`, default value: `FALSE`.

  If true, shows a detailed report of all index operations that took place during query execution and serialization. Warning: This report can be verbose.
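
If you build `profile` requests yourself, a small helper keeps the parameter names straight. This Python sketch produces a JSON body shaped like the one in the `curl` and `awscurl` examples in this section, which pass `profile.serializer` in the JSON body. It assumes the endpoint also accepts JSON booleans and integers for the other parameters; the function name is hypothetical. Options left as `None` are omitted so the documented defaults apply, while `chop=0` is still sent because zero is meaningful.

```python
import json
from typing import Dict, Optional


def profile_request_body(
    gremlin: str,
    results: Optional[bool] = None,
    chop: Optional[int] = None,
    serializer: Optional[str] = None,
    index_ops: Optional[bool] = None,
) -> str:
    """Build a JSON body for POST /gremlin/profile; unset options keep server defaults."""
    body: Dict[str, object] = {"gremlin": gremlin}
    options = {
        "profile.results": results,
        "profile.chop": chop,
        "profile.serializer": serializer,
        "profile.indexOps": index_ops,
    }
    # Omit anything left as None so the documented defaults apply.
    body.update({key: value for key, value in options.items() if value is not None})
    return json.dumps(body)


print(profile_request_body("g.V().limit(1)", chop=0, index_ops=True))
# {"gremlin": "g.V().limit(1)", "profile.chop": 0, "profile.indexOps": true}
```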



## Sample output of Neptune Gremlin `profile`

The following is a sample `profile` query.

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query 'g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)' \
  --serializer "application/vnd.gremlin-v3.0+json"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery='g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)',
    serializer='application/vnd.gremlin-v3.0+json'
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.V().hasLabel(\"airport\").has(\"code\", \"AUS\").emit().repeat(in().simplePath()).times(2).limit(100)", "profile.serializer":"application/vnd.gremlin-v3.0+json"}'
```

------

This query generates the following `profile` report when executed on the air-routes sample graph from the blog post, [Let Me Graph That For You – Part 1 – Air Routes](https://aws.amazon.com/blogs/database/let-me-graph-that-for-you-part-1-air-routes/).

```
*******************************************************
                Neptune Gremlin Profile
*******************************************************

Query String
==================
g.V().hasLabel("airport").has("code", "AUS").emit().repeat(in().simplePath()).times(2).limit(100)

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([~label.eq(airport), code.eq(AUS)]), RepeatStep(emit(true),[VertexStep(IN,vertex), PathFilterStep(simple), RepeatEndStep],until(loops(2))), RangeGlobalStep(0,100)]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Vertex) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true, joinTime=3, actualTotalOutput=1}
            PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true, joinTime=0, actualTotalOutput=61}
            RepeatNode {
                Repeat {
                    PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0, joinTime=3}
                }
                Emit {
                    Filter(true)
                }
                LoopsCondition {
                    LoopsFilter([?1, ?3],eq(2))
                }
            }, annotations={repeatMode=BFS, emitFirst=true, untilFirst=false, leftVar=?1, rightVar=?3}
        }, finishers=[limit(100)], annotations={path=[Vertex(?1):GraphStep, Repeat[Vertex(?3):VertexStep]], joinStats=true, optimizationTime=495, maxVarId=7, executionTime=323}
    },
    NeptuneTraverserConverterStep
]

Physical Pipeline
=================
NeptuneGraphQueryStep
    |-- StartOp
    |-- JoinGroupOp
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <code>, "AUS", ?) . project ?1 .], {estimatedCardinality=1, indexTime=84, hashJoin=true})
        |-- SpoolerOp(100)
        |-- DynamicJoinOp(PatternNode[(?1, <~label>, ?2=<airport>, <~>) . project ask .], {estimatedCardinality=3374, indexTime=29, hashJoin=true})
        |-- RepeatOp
            |-- <upstream input> (Iteration 0) [visited=1, output=1 (until=0, emit=1), next=1]
            |-- BindingSetQueue (Iteration 1) [visited=61, output=61 (until=0, emit=61), next=61]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
            |-- BindingSetQueue (Iteration 2) [visited=38, output=38 (until=38, emit=0), next=0]
                |-- SpoolerOp(100)
                |-- DynamicJoinOp(PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . SimplePathFilter(?1, ?3)) .], {hashJoin=true, estimatedCardinality=50148, indexTime=0})
        |-- LimitOp(100)

Runtime (ms)
============
Query Execution:  392.686
Serialization:   2636.380

Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                        100         100         314.162    82.78
NeptuneTraverserConverterStep                                        100         100          65.333    17.22
                                            >TOTAL                     -           -         379.495        -

Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        1        1        0        1        1
        1       61       61        0       61       61
        2       38       38       38        0        0
------------------------------------------------------
               100      100       38       62       62

Predicates
==========
# of predicates: 16

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance

Results
=======
Count: 100
Output: [v[3], v[3600], v[3614], v[4], v[5], v[6], v[7], v[8], v[9], v[10], v[11], v[12], v[47], v[49], v[136], v[13], v[15], v[16], v[17], v[18], v[389], v[20], v[21], v[22], v[23], v[24], v[25], v[26], v[27], v[28], v[416], v[29], v[30], v[430], v[31], v[9...
Response serializer: GRYO_V3D0
Response size (bytes): 23566

Index Operations
================
Query execution:
    # of statement index ops: 3
    # of unique statement index ops: 3
    Duplication ratio: 1.0
    # of terms materialized: 0
Serialization:
    # of statement index ops: 200
    # of unique statement index ops: 140
    Duplication ratio: 1.43
    # of terms materialized: 393
```

In addition to the query plans returned by a call to Neptune `explain`, the `profile` results include runtime statistics about query execution. Each join operation is tagged with the time it took to perform the join and with the actual number of solutions that passed through it.

The `profile` output includes the time taken during the core query execution phase, as well as the serialization phase if the `profile.serializer` option was specified.
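The `% Dur` column in the Traversal Metrics section appears to be each step's time as a share of the `>TOTAL` time. The sample numbers above are consistent with that reading (314.162 / 379.495 is roughly 82.78%), but treat it as an inference rather than a documented formula. The following Python sketch, with hypothetical function names, parses that section from a report string and recomputes the percentages, which can help when comparing many reports programmatically. It assumes the column layout shown in the sample.

```python
def parse_traversal_metrics(report):
    """Return ([(step, count, traversers, time_ms), ...], total_ms)."""
    lines = [line.strip() for line in report.splitlines()]
    start = lines.index("Traversal Metrics")
    steps, total = [], None
    # Skip the title, its underline, the column header, and the dashed rule.
    for line in lines[start + 4:]:
        if not line:
            break  # a blank line ends the section
        # Split from the right: step names may contain spaces, but the four
        # numeric columns (Count, Traversers, Time, % Dur) never do.
        name, count, traversers, time_ms, _pct = line.rsplit(None, 4)
        if name == ">TOTAL":
            total = float(time_ms)
        else:
            steps.append((name, int(count), int(traversers), float(time_ms)))
    return steps, total


def percent_durations(report):
    """Recompute % Dur as each step's time divided by the >TOTAL time."""
    steps, total = parse_traversal_metrics(report)
    return {name: round(100.0 * t / total, 2) for name, _, _, t in steps}


METRICS_SAMPLE = """\
Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                        100         100         314.162    82.78
NeptuneTraverserConverterStep                                        100         100          65.333    17.22
                                            >TOTAL                     -           -         379.495        -
"""

print(percent_durations(METRICS_SAMPLE))
```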

The breakdown of the index operations performed during each phase is also included at the bottom of the `profile` output.
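In the sample report, `Duplication ratio` equals the number of statement index operations divided by the number of unique ones (for serialization, 200 / 140 rounds to 1.43). Assuming that relationship holds in general, the following Python sketch parses the Index Operations section and recomputes the ratio; a value well above 1.0 means the same index lookups were repeated. The parser reads to the end of the report because Index Operations is the last section in the samples shown here, and the function names are hypothetical.

```python
def parse_index_operations(report):
    """Return {phase: {metric: value}} from the Index Operations section.

    Reads to the end of the report, since Index Operations is the last
    section in the sample reports.
    """
    lines = [line.strip() for line in report.splitlines()]
    start = lines.index("Index Operations")
    summary, phase = {}, None
    for line in lines[start + 2:]:  # skip the title and its underline
        if not line:
            continue
        if line.endswith(":"):  # a phase header such as "Serialization:"
            phase = line[:-1]
            summary[phase] = {}
        else:
            metric, _, value = line.partition(":")
            summary[phase][metric.strip()] = float(value)
    return summary


def duplication_ratio(stats):
    """Total index ops divided by unique ones (inferred from the sample)."""
    total = stats["# of statement index ops"]
    unique = stats["# of unique statement index ops"]
    return round(total / unique, 2)


INDEX_SAMPLE = """\
Index Operations
================
Query execution:
    # of statement index ops: 3
    # of unique statement index ops: 3
    Duplication ratio: 1.0
    # of terms materialized: 0
Serialization:
    # of statement index ops: 200
    # of unique statement index ops: 140
    Duplication ratio: 1.43
    # of terms materialized: 393
"""

for phase, stats in parse_index_operations(INDEX_SAMPLE).items():
    print(phase, duplication_ratio(stats), stats["Duplication ratio"])
```

Reports that omit the unique-operation count, such as the DFE sample later in this page, would need a guard before calling `duplication_ratio`.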

Consecutive runs of the same query may show different runtimes and index operation counts because of caching.
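Because caching can skew any single run, one option is to discard a warm-up run and compare the median `Query Execution` time across several runs. The following Python sketch shows only the parsing and aggregation. It assumes a hypothetical, caller-supplied `run_profile` function that performs the `profile` request (for example, by wrapping one of the requests shown earlier) and returns the report as text; the other function names are also hypothetical.

```python
import re
import statistics

# Matches the "Query Execution:  392.686" line in the Runtime (ms) section.
_EXECUTION = re.compile(r"Query Execution:\s+([0-9.]+)")


def query_execution_ms(report):
    """Extract the Query Execution time in milliseconds from a profile report."""
    match = _EXECUTION.search(report)
    if match is None:
        raise ValueError("no 'Query Execution' line in report")
    return float(match.group(1))


def median_execution_ms(run_profile, runs=5, warmup=1):
    """Median Query Execution time over `runs` calls, after `warmup` discarded calls.

    run_profile is a hypothetical zero-argument function that runs the
    profile request and returns the report text.
    """
    for _ in range(warmup):
        run_profile()  # discarded: early runs may populate caches
    return statistics.median(query_execution_ms(run_profile()) for _ in range(runs))
```

A median is less sensitive than a mean to a single slow, cold-cache run, which is why it is used here.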

For queries using the `repeat()` step, a breakdown of the frontier on each iteration is available if the `repeat()` step was pushed down as part of a `NeptuneGraphQueryStep`.
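The Repeat Metrics table can be checked the same way. The following Python sketch parses the per-iteration rows and sums each column; for the sample above, the sums reproduce the totals row (100, 100, 38, 62, 62). In this sample, the Emit and Until totals (62 + 38) also add up to the 100 results returned. The parsing assumes the exact layout shown in the sample, and the function names are hypothetical.

```python
def parse_repeat_metrics(report):
    """Return one dict per iteration row of the Repeat Metrics section."""
    lines = [line.strip() for line in report.splitlines()]
    start = lines.index("Repeat Metrics")
    columns = lines[start + 2].lower().split()  # iteration, visited, output, ...
    rows = []
    for line in lines[start + 4:]:  # skip title, underline, header, dashed rule
        if not line or line.startswith("-"):
            break  # the dashed rule before the totals row ends the iterations
        rows.append(dict(zip(columns, map(int, line.split()))))
    return rows


def repeat_totals(rows):
    """Sum every column except the iteration number."""
    return {col: sum(row[col] for row in rows) for col in rows[0] if col != "iteration"}


REPEAT_SAMPLE = """\
Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        1        1        0        1        1
        1       61       61        0       61       61
        2       38       38       38        0        0
------------------------------------------------------
               100      100       38       62       62
"""

print(repeat_totals(parse_repeat_metrics(REPEAT_SAMPLE)))
```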

## Differences in `profile` reports when DFE is enabled
DFE `profile` reports

When the Neptune DFE alternative query engine is enabled, `profile` output is somewhat different:

**Optimized Traversal:** This section is similar to the one in `explain` output, but contains additional information, including the types of DFE operators that were considered in planning and their associated worst-case and best-case cost estimates.

**Physical Pipeline:** This section captures the operators that are used to execute the query. `DFESubQuery` elements abstract the physical plan that is used by DFE to execute the portion of the plan it is responsible for. The `DFESubQuery` elements are unfolded in the following section where DFE statistics are listed.

**DFEQueryEngine Statistics:** This section shows up only when at least part of the query is executed by DFE. It outlines various runtime statistics that are specific to DFE, and contains a detailed breakdown of the time spent in the various parts of the query execution, by `DFESubQuery`.

Nested subqueries in different `DFESubQuery` elements are flattened in this section, and unique identifiers are marked with a header that starts with `subQuery=`.

**Traversal Metrics:** This section shows step-level traversal metrics. When the DFE engine runs all or part of the query, it displays metrics for `DFEStep` and/or `NeptuneInterleavingStep`. For details, see [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md).

**Note**  
DFE is an experimental feature released in lab mode, so the exact format of the `profile` output is still subject to change.
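In the DFE statistics tables of the sample output in the next section, the `Ratio` column matches `Units Out / Units In`, with 0.00 reported when no units flow in. Assuming that holds in general, the following Python sketch parses one table and flags operators that fan out (ratio above 1), such as the `DFEPipelineJoin` that turns 1 input into 242 outputs. The sample string is an excerpt of rows 0, 4, and 8 of the `graph_2` subquery table. Because the DFE output format is subject to change, the parsing is a sketch against the current layout, and the function names are hypothetical.

```python
def parse_dfe_table(table):
    """Return one dict per operator row of a DFE statistics table."""
    header, rows = None, []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("║"):
            continue  # border lines start with ╔, ╠, ╟ or ╚
        cells = [cell.strip() for cell in line.strip("║").split("│")]
        if header is None:
            header = cells
        elif cells[0]:  # wrapped Arguments lines have an empty ID cell
            rows.append(dict(zip(header, cells)))
    return rows


def unit_ratio(row):
    """Units Out / Units In, or 0.0 when nothing flows in (as in the sample)."""
    units_in, units_out = int(row["Units In"]), int(row["Units Out"])
    return 0.0 if units_in == 0 else round(units_out / units_in, 2)


def fan_out_operators(rows, threshold=1.0):
    """Names of operators that emit more units than they receive."""
    return [row["Name"] for row in rows if unit_ratio(row) > threshold]


DFE_SAMPLE = """\
╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                           │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
║ 0  │ 1      │ -      │ DFESolutionInjection │ solutions=[]                        │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
║    │        │        │                      │ outSchema=[?1]                      │      │          │           │        │           ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Edge((?1)-[?7:?5]->(?6))    │ -    │ 1        │ 242       │ 242.00 │ 0.51      ║
║    │        │        │                      │ constraints=[]                      │      │          │           │        │           ║
║    │        │        │                      │ patternEstimate=9223372036854775807 │      │          │           │        │           ║
╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                   │ -    │ 243      │ 242       │ 1.00   │ 0.31      ║
╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝
"""

print(fan_out_operators(parse_dfe_table(DFE_SAMPLE)))
```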

## Sample `profile` output when the Neptune Dataflow engine (DFE) is enabled
DFE `profile` output example

When the DFE engine is being used to run Gremlin queries, output of the [Gremlin `profile` API](#gremlin-profile-api) is formatted as shown in the example below.

Query:

------
#### [ AWS CLI ]

```
aws neptunedata execute-gremlin-profile-query \
  --endpoint-url https://your-neptune-endpoint:port \
  --gremlin-query "g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
```

For more information, see [execute-gremlin-profile-query](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/execute-gremlin-profile-query.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.execute_gremlin_profile_query(
    gremlinQuery="g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()"
)

print(response['output'])
```

For AWS SDK examples in other languages like Java, .NET, and more, see [AWS SDK](access-graph-gremlin-sdk.md).

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/gremlin/profile \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

For more information about using **awscurl** with IAM authentication, see [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl).

------
#### [ curl ]

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile \
  -d '{"gremlin":"g.withSideEffect('"'"'Neptune#useDFE'"'"', true).V().has('"'"'code'"'"', '"'"'ATL'"'"').out()"}'
```

------

```
*******************************************************
                    Neptune Gremlin Profile
    *******************************************************

    Query String
    ==================
    g.withSideEffect('Neptune#useDFE', true).V().has('code', 'ATL').out()

    Original Traversal
    ==================
    [GraphStep(vertex,[]), HasStep([code.eq(ATL)]), VertexStep(OUT,vertex)]

    Optimized Traversal
    ===================
    Neptune steps:
    [
        DFEStep(Vertex) {
          DFENode {
            DFEJoinGroupNode[null](
              children=[
                DFEPatternNode((?1, vp://code[419430926], ?4, defaultGraph[526]) . project DISTINCT[?1] objectFilters=(in(ATL[452987149]) . ), {rangeCountEstimate=1},
                  opInfo=(type=PipelineJoin, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00),wc=(in=1.00,out=1.00,io=0.00,comp=0.00,mem=34.00))))),
                DFEPatternNode((?1, ?5, ?6, ?7) . project ALL[?1, ?6] graphFilters=(!= defaultGraph[526] . ), {rangeCountEstimate=9223372036854775807})],
              opInfo=[
                OperatorInfoWithAlternative[
                  rec=(type=PipelineJoin, cost=(exp=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00),wc=(in=1.00,out=27.76,io=0.00,comp=0.00,mem=0.00)),
                    disc=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))),
                  alt=(type=PipelineScan, cost=(exp=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00),wc=(in=1.00,out=27.76,io=Infinity,comp=0.00,mem=295147905179352830000.00)))]])
          } [Vertex(?1):GraphStep, Vertex(?6):VertexStep]
        } ,
        NeptuneTraverserConverterDFEStep,
        DFECleanupStep
    ]


    Physical Pipeline
    =================
    DFEStep
        |-- DFESubQuery1

    DFEQueryEngine Statistics
    =================
    DFESubQuery1
    ╔════╤════════╤════════╤═══════════════════════╤══════════════════════════════════════════════════════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                  │ Arguments                                                                                                    │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪═══════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection  │ solutions=[]                                                                                                 │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                       │ outSchema=[]                                                                                                 │      │          │           │        │           ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1 │ -    │ 1        │ 1         │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 3      │ -      │ DFEChunkLocalSubQuery │ subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2 │ -    │ 1        │ 242       │ 242.00 │ 0.02      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 4      │ -      │ DFEMergeChunks        │ -                                                                                                            │ -    │ 242      │ 242       │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼───────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain              │ -                                                                                                            │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧═══════════════════════╧══════════════════════════════════════════════════════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_1
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════════════════════════════╤══════╤══════════╤═══════════╤═══════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                                                   │ Mode │ Units In │ Units Out │ Ratio │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════════════════════════════╪══════╪══════════╪═══════════╪═══════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFEPipelineScan      │ pattern=Node(?1) with property 'code' as ?4 and label 'ALL' │ -    │ 0        │ 1         │ 0.00  │ 0.22      ║
    ║    │        │        │                      │ inlineFilters=[(?4 IN ["ATL"])]                             │      │          │           │       │           ║
    ║    │        │        │                      │ patternEstimate=1                                           │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 1  │ 2      │ -      │ DFEMergeChunks       │ -                                                           │ -    │ 1        │ 1         │ 1.00  │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 2  │ 4      │ -      │ DFERelationalJoin    │ joinVars=[]                                                 │ -    │ 2        │ 1         │ 0.50  │ 0.09      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 3  │ 2      │ -      │ DFESolutionInjection │ solutions=[]                                                │ -    │ 0        │ 1         │ 0.00  │ 0.01      ║
    ║    │        │        │                      │ outSchema=[]                                                │      │          │           │       │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────────────────────┼──────┼──────────┼───────────┼───────┼───────────╢
    ║ 4  │ -      │ -      │ DFEDrain             │ -                                                           │ -    │ 1        │ 0         │ 0.00  │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════════════════════════════╧══════╧══════════╧═══════════╧═══════╧═══════════╝


    subQuery=http://aws.amazon.com/neptune/vocab/v01/dfe/past/graph#089f43e3-4d71-4259-8d19-254ff63cee04/graph_2
    ╔════╤════════╤════════╤══════════════════════╤═════════════════════════════════════╤══════╤══════════╤═══════════╤════════╤═══════════╗
    ║ ID │ Out #1 │ Out #2 │ Name                 │ Arguments                           │ Mode │ Units In │ Units Out │ Ratio  │ Time (ms) ║
    ╠════╪════════╪════════╪══════════════════════╪═════════════════════════════════════╪══════╪══════════╪═══════════╪════════╪═══════════╣
    ║ 0  │ 1      │ -      │ DFESolutionInjection │ solutions=[]                        │ -    │ 0        │ 1         │ 0.00   │ 0.01      ║
    ║    │        │        │                      │ outSchema=[?1]                      │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 1  │ 2      │ 3      │ DFETee               │ -                                   │ -    │ 1        │ 2         │ 2.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 2  │ 4      │ -      │ DFEDistinctColumn    │ column=?1                           │ -    │ 1        │ 1         │ 1.00   │ 0.21      ║
    ║    │        │        │                      │ ordered=false                       │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 3  │ 5      │ -      │ DFEHashIndexBuild    │ vars=[?1]                           │ -    │ 1        │ 1         │ 1.00   │ 0.03      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 4  │ 5      │ -      │ DFEPipelineJoin      │ pattern=Edge((?1)-[?7:?5]->(?6))    │ -    │ 1        │ 242       │ 242.00 │ 0.51      ║
    ║    │        │        │                      │ constraints=[]                      │      │          │           │        │           ║
    ║    │        │        │                      │ patternEstimate=9223372036854775807 │      │          │           │        │           ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 5  │ 6      │ 7      │ DFESync              │ -                                   │ -    │ 243      │ 243       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 6  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 1        │ 1         │ 1.00   │ 0.01      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 7  │ 8      │ -      │ DFEForwardValue      │ -                                   │ -    │ 242      │ 242       │ 1.00   │ 0.02      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 8  │ 9      │ -      │ DFEHashIndexJoin     │ -                                   │ -    │ 243      │ 242       │ 1.00   │ 0.31      ║
    ╟────┼────────┼────────┼──────────────────────┼─────────────────────────────────────┼──────┼──────────┼───────────┼────────┼───────────╢
    ║ 9  │ -      │ -      │ DFEDrain             │ -                                   │ -    │ 242      │ 0         │ 0.00   │ 0.01      ║
    ╚════╧════════╧════════╧══════════════════════╧═════════════════════════════════════╧══════╧══════════╧═══════════╧════════╧═══════════╝


    Runtime (ms)
    ============
    Query Execution: 11.744

    Traversal Metrics
    =================
    Step                                                               Count  Traversers       Time (ms)    % Dur
    -------------------------------------------------------------------------------------------------------------
    DFEStep(Vertex)                                                      242         242          10.849    95.48
    NeptuneTraverserConverterDFEStep                                     242         242           0.514     4.52
                                                >TOTAL                     -           -          11.363        -

    Predicates
    ==========
    # of predicates: 18

    Results
    =======
    Count: 242


    Index Operations
    ================
    Query execution:
        # of statement index ops: 0
        # of terms materialized: 0
```

**Note**  
Because the DFE engine is an experimental feature released in lab mode, the exact format of the `profile` output is subject to change.

# Tuning Gremlin queries using `explain` and `profile`
Tuning Gremlin queries

You can often improve the performance of your Gremlin queries in Amazon Neptune by using the information in the reports returned by the Neptune [explain](gremlin-explain-api.md) and [profile](gremlin-profile-api.md) APIs. To do so, it helps to understand how Neptune processes Gremlin traversals.

**Important**  
A change made in TinkerPop version 3.4.11 improves the correctness of query processing, but for the moment it can sometimes seriously degrade query performance.  
For example, a query of this sort may run significantly slower:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  out()
```
Because of the TinkerPop 3.4.11 change, the vertices after the `limit()` step are now fetched in a non-optimal way. To avoid this, you can add a `barrier()` step at any point after the `order().by()`. For example:  

```
g.V().hasLabel('airport').
  order().
    by(out().count(),desc).
  limit(10).
  barrier().
  out()
```
TinkerPop 3.4.11 was enabled in Neptune [engine version 1.0.5.0](engine-releases-1.0.5.0.md).

## Understanding Gremlin traversal processing in Neptune
Traversal processing

When a Gremlin traversal is sent to Neptune, there are three main processes that transform the traversal into an underlying execution plan for the engine to execute. These are parsing, conversion, and optimization:

![\[3 processes transform a Gremlin query into an execution plan.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_traversal_processing.png)


### The traversal parsing process
Parsing

The first step in processing a traversal is to parse it into a common language. In Neptune, that common language is the set of TinkerPop steps that are part of the [TinkerPop API](http://tinkerpop.apache.org/javadocs/3.4.8/full/org/apache/tinkerpop/gremlin/process/traversal/Step.html). Each of these steps represents a unit of computation within the traversal.

You can send a Gremlin traversal to Neptune either as a string or as bytecode. The REST endpoint and the Java client driver `submit()` method send traversals as strings, as in this example:

```
client.submit("g.V()")
```

Applications and language drivers using [Gremlin language variants (GLV)](https://tinkerpop.apache.org/docs/current/tutorials/gremlin-language-variants/) send traversals in bytecode.

### The traversal conversion process
Conversion

The second step in processing a traversal is to convert its TinkerPop steps into a set of converted and non-converted Neptune steps. Most steps in the Apache TinkerPop Gremlin query language are converted to Neptune-specific steps that are optimized to run on the underlying Neptune engine. When a TinkerPop step without a Neptune equivalent is encountered in a traversal, that step and all subsequent steps in the traversal are processed by the TinkerPop query engine.

For more information about what steps can be converted under what circumstances, see [Gremlin step support](gremlin-step-support.md).
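The rule above, in which the first step without a Neptune equivalent hands itself and every later step to the TinkerPop engine, can be modeled in a few lines. In the following Python sketch, the set of convertible step names is a hypothetical stand-in supplied by the caller, not Neptune's actual support list, and the function name is hypothetical as well.

```python
def split_at_first_unconverted(steps, convertible):
    """Split a traversal's steps into a Neptune-native prefix and a TinkerPop suffix.

    `convertible` is a hypothetical, caller-supplied set of step names; it is
    not Neptune's real support list. Everything from the first step that is
    not in the set onward is returned as the TinkerPop-processed suffix.
    """
    steps = list(steps)
    for i, step in enumerate(steps):
        if step not in convertible:
            return steps[:i], steps[i:]
    return steps, []


# Hypothetical support set, for illustration only.
CONVERTIBLE = {"V", "has", "out", "in", "values"}

native, tinkerpop = split_at_first_unconverted(["V", "has", "out", "choose"], CONVERTIBLE)
print("native:", native, "tinkerpop:", tinkerpop)
```

With the same hypothetical set, `["V", "choose", "out"]` leaves only `V` as native, which illustrates why an unconvertible step placed early in a traversal costs more than one placed at the end.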

### The traversal optimization process
Optimization

The final step in traversal processing is to run the series of converted and non-converted steps through the optimizer, to try to determine the best execution plan. The output of this optimization is the execution plan that the Neptune engine processes.

## Using the Neptune Gremlin `explain` API to tune queries
Using explain output to tune

The Neptune `explain` API is not the same as the Gremlin `explain()` step. It returns the final execution plan that the Neptune engine would process when executing the query. Because it does not actually run the query, it returns the same plan regardless of the parameters used, and its output contains no statistics about actual execution.

Consider the following simple traversal that finds all the airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

There are two ways you can run this traversal through the Neptune `explain` API. The first way is to make a REST call to the explain endpoint, like this:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/explain -d '{"gremlin":"g.V().has(\"code\",\"ANC\")"}'
```
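Escaping quotes by hand inside a shell argument is error-prone: single quotes in the traversal end a single-quoted `-d` argument early. One alternative is to build the request body with a JSON encoder. The following Python sketch uses only the standard library to construct, but not send, a POST request to the `/gremlin/explain` endpoint. Like the `curl` example, it does no IAM (SigV4) signing, so it assumes a cluster without IAM authentication. The function name is hypothetical, and 8182 is used as the port because it is Neptune's default.

```python
import json
import urllib.request


def build_explain_request(endpoint, gremlin):
    """Build (without sending) a POST request to the Neptune /gremlin/explain endpoint.

    json.dumps escapes any double quotes in the traversal, and the body never
    passes through a shell, so both quoting styles of Gremlin string literals
    survive intact. No IAM (SigV4) signing is performed.
    """
    body = json.dumps({"gremlin": gremlin}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/gremlin/explain",
        data=body,
        method="POST",
    )


request = build_explain_request("https://your-neptune-endpoint:8182", "g.V().has('code','ANC')")
print(request.full_url)
print(request.data.decode("utf-8"))
```

To send the request, pass it to `urllib.request.urlopen()` and decode the response body; like `curl -d`, `urllib` sends a form-encoded `Content-Type` by default when no header is set.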

The second way is to use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `explain` parameter. This passes the traversal contained in the cell body to the Neptune `explain` API and then displays the resulting output when you run the cell:

```
%%gremlin explain

g.V().has('code','ANC')
```

The resulting `explain` API output describes Neptune's execution plan for the traversal. As you can see in the image below, the plan includes each of the 3 steps in the processing pipeline:

![\[Explain API output for a simple Gremlin traversal.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_1.png)


### Tuning a traversal by looking at steps that are not converted
Steps that are not converted

One of the first things to look for in the Neptune `explain` API output is Gremlin steps that are not converted to Neptune native steps. When a step in a query plan cannot be converted to a Neptune native step, that step and all subsequent steps in the plan are processed by the Gremlin server.

In the example above, all steps in the traversal were converted. Let's examine `explain` API output for this traversal:

```
g.V().has('code','ANC').out().choose(hasLabel('airport'), values('code'), constant('Not an airport'))
```

As you can see in the image below, Neptune could not convert the `choose()` step:

![\[Explain API output in which not all steps can be converted.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_explain_output_2.png)


There are several ways you could tune the performance of this traversal. One is to rewrite it so that it no longer uses the step that could not be converted. Another is to move that step to the end of the traversal so that all the other steps can be converted to native ones.

A query plan with steps that are not converted does not always need to be tuned. If the steps that cannot be converted are at the end of the traversal, and are related to how output is formatted rather than how the graph is traversed, they may have little effect on performance.

### Tuning a traversal by looking at steps that do not use indexes


Another thing to look for when examining output from the Neptune `explain` API is steps that do not use indexes. The following traversal finds all airports with flights that land in Anchorage:

```
g.V().has('code','ANC').in().values('code')
```

Output from the explain API for this traversal is:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in().values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=INFINITY}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
```

The `WARNING` message at the bottom of the output occurs because the `in()` step in the traversal cannot be resolved using any of the three indexes that Neptune maintains (see [How Statements Are Indexed in Neptune](feature-overview-storage-indexing.md) and [Gremlin statements in Neptune](gremlin-explain-background-statements.md)). Because the `in()` step specifies no edge label, it cannot be resolved using the `SPOG`, `POGS`, or `GPSO` index. Instead, Neptune must perform a union scan to find the requested vertices, which is much less efficient.
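If you collect `explain` output in scripts, a small check like the following can surface these warnings automatically. It is a minimal sketch that assumes, based only on the sample output above, that each warning appears on its own line beginning with `WARNING:`; the function name `explain_warnings` is hypothetical and not part of any Neptune API.

```python
def explain_warnings(explain_text: str) -> list[str]:
    """Return the message of every WARNING line in Neptune explain output.

    Assumes (from the sample output) that warnings start with "WARNING:".
    """
    warnings = []
    for line in explain_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("WARNING:"):
            # Keep only the message text after the "WARNING:" prefix
            warnings.append(stripped[len("WARNING:"):].strip())
    return warnings


if __name__ == "__main__":
    sample = """Predicates
==========
# of predicates: 26

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
"""
    for message in explain_warnings(sample):
        print(message)
```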

There are two ways to tune the traversal in this situation. The first is to add one or more edge labels to the `in()` step so that an indexed lookup can be used to resolve the query. For the example above, this might be:

```
g.V().has('code','ANC').in('route').values('code')
```

Output from the Neptune `explain` API for the revised traversal no longer contains the `WARNING` message:

```
*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============

g.V().has('code','ANC').in('route').values('code')

Original Traversal
==================
[GraphStep(vertex,[]), HasStep([code.eq(ANC)]), VertexStep(IN,[route],vertex), PropertiesStep([code],value)]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <~label>, ?2, <~>) . project distinct ?1 .]
            PatternNode[(?1, <code>, "ANC", ?) . project ask .]
            PatternNode[(?3, ?5, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) . ContainsFilter(?5 in (<route>)) .]
            PatternNode[(?3, <~label>, ?4, <~>) . project ask .]
            PatternNode[(?3, ?7, ?8, <~>) . project ?3,?8 . ContainsFilter(?7 in (<code>)) .]
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(PropertyValue) {
        JoinGroupNode {
            PatternNode[(?1, <code>, "ANC", ?) . project ?1 .], {estimatedCardinality=1}
            PatternNode[(?3, ?5=<route>, ?1, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=32042}
            PatternNode[(?3, ?7=<code>, ?8, <~>) . project ?3,?8 .], {estimatedCardinality=7564}
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, PropertyValue(?8):PropertiesStep], maxVarId=9}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 26
```

If you run many traversals of this kind, another option is to run them in a Neptune DB cluster that has the optional `OSGP` index enabled (see [Enabling an OSGP Index](feature-overview-storage-indexing.md#feature-overview-storage-indexing-osgp)). Enabling an `OSGP` index has drawbacks:
+ It must be enabled in a DB cluster before any data is loaded.
+ Insertion rates for vertices and edges may slow by up to 23%.
+ Storage usage will increase by around 20%.
+ Read queries that scatter requests across all indexes may have increased latencies.

Having an `OSGP` index makes a lot of sense for a restricted set of query patterns, but unless you are running those frequently, it is usually preferable to try to ensure that the traversals you write can be resolved using the three primary indexes.

### Using a large number of predicates
Many predicates

Neptune treats each edge label and each distinct vertex or edge property name in your graph as a predicate, and is designed by default to work with a relatively low number of distinct predicates. When you have more than a few thousand predicates in your graph data, performance can degrade.

Neptune `explain` output will warn you if this is the case:

```
Predicates
==========
# of predicates: 9549
WARNING: high predicate count (# of distinct property names and edge labels)
```

If it is not convenient to rework your data model to reduce the number of labels and properties, and therefore the number of predicates, the best way to tune traversals is to run them in a DB cluster that has the `OSGP` index enabled, as discussed above.
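To track this over time, you could pull the predicate count out of `explain` output and compare it against a limit of your choosing. This is a minimal sketch: the threshold of 2,000 is a hypothetical stand-in for "more than a few thousand", not a documented Neptune limit, and the parsing assumes the `# of predicates:` line format shown above.

```python
import re
from typing import Optional

# Hypothetical threshold; the guidance above only says "more than a few thousand".
HIGH_PREDICATE_THRESHOLD = 2000


def predicate_count(explain_text: str) -> Optional[int]:
    """Return the number from the '# of predicates:' line, or None if it is missing."""
    match = re.search(r"^# of predicates:\s*(\d+)", explain_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def predicate_count_is_high(explain_text: str,
                            threshold: int = HIGH_PREDICATE_THRESHOLD) -> bool:
    """True when the reported predicate count exceeds the chosen threshold."""
    count = predicate_count(explain_text)
    return count is not None and count > threshold


if __name__ == "__main__":
    sample = "Predicates\n==========\n# of predicates: 9549\n"
    print(predicate_count(sample), predicate_count_is_high(sample))  # 9549 True
```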

## Using the Neptune Gremlin `profile` API to tune traversals
Using profile output to tune

The Neptune `profile` API is quite different from the Gremlin `profile()` step. Like the `explain` API, its output includes the query plan that the Neptune engine uses when executing the traversal. In addition, the `profile` output includes actual execution statistics gathered by running the traversal with the parameter values you supply.

Again, take the simple traversal that finds all airport vertices for Anchorage:

```
g.V().has('code','ANC')
```

As with the `explain` API, you can invoke the `profile` API using a REST call:

```
curl -X POST https://your-neptune-endpoint:port/gremlin/profile -d '{"gremlin":"g.V().has(\"code\",\"ANC\")"}'
```
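The same request can be sent from a program. The sketch below uses only the Python standard library and builds the request body with `json.dumps`, which avoids the nested-quote problems that are easy to hit when a Gremlin string is embedded in a shell command. It assumes the cluster is reachable from where the code runs and that IAM database authentication is not enabled (an IAM-enabled cluster requires Signature Version 4 signed requests, which this sketch does not produce). The helper name `build_gremlin_request` is hypothetical; 8182 is Neptune's default port.

```python
import json
import urllib.request


def build_gremlin_request(endpoint: str, port: int, query: str,
                          api: str = "profile") -> urllib.request.Request:
    """Build a POST request for the Neptune /gremlin/explain or /gremlin/profile endpoint."""
    if api not in ("explain", "profile"):
        raise ValueError("api must be 'explain' or 'profile'")
    # json.dumps handles all quoting, so the Gremlin string can keep its single quotes
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{endpoint}:{port}/gremlin/{api}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_gremlin_request("your-neptune-endpoint", 8182, "g.V().has('code','ANC')")
    # Sending it requires network access to the cluster (and SigV4 signing if IAM auth is on):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
    print(req.full_url)
```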

You can also use the Neptune workbench's [%%gremlin](notebooks-magics.md#notebooks-cell-magics-gremlin) cell magic with the `profile` parameter. This passes the traversal contained in the cell body to the Neptune `profile` API and then displays the resulting output when you run the cell:

```
%%gremlin profile

g.V().has('code','ANC')
```

The resulting `profile` API output contains both Neptune's execution plan for the traversal and statistics about the plan's execution, as you can see in this image:

![\[An example of Neptune profile API output.\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/Gremlin_profile_output_1.png)


In `profile` output, the execution plan section contains only the final execution plan for the traversal, not the intermediate steps. The pipeline section contains the physical pipeline operations that were performed, as well as the actual time (in milliseconds) that traversal execution took. This runtime metric is especially useful for comparing two versions of a traversal as you optimize it.

**Note**  
The initial runtime of a traversal is generally longer than subsequent runtimes, because the first one causes the relevant data to be cached.
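Because the first run warms the cache, comparisons between two versions of a traversal are fairer if that cold run is set aside. The sketch below assumes you have already collected several runtimes (in milliseconds) for each version, for example by calling the `profile` API repeatedly. Discarding exactly one run and comparing medians is an illustrative choice rather than a Neptune recommendation, and the runtimes in the usage example are made up.

```python
from statistics import median


def warm_median_ms(runtimes_ms: list[float]) -> float:
    """Median runtime after discarding the first (cold-cache) run."""
    if len(runtimes_ms) < 2:
        raise ValueError("need one cold run plus at least one warm run")
    return median(runtimes_ms[1:])


def faster_version(a_runs: list[float], b_runs: list[float]) -> str:
    """Return 'A' or 'B' for the version with the lower warm median (ties go to 'A')."""
    return "A" if warm_median_ms(a_runs) <= warm_median_ms(b_runs) else "B"


if __name__ == "__main__":
    # Hypothetical timings: version A had a slow cold run, version B did not.
    version_a = [5000.0, 100.0, 110.0, 105.0]
    version_b = [200.0, 150.0, 160.0, 155.0]
    print(faster_version(version_a, version_b))  # A
```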

The third section of the `profile` output contains execution statistics and the results of the traversal. To see how this information can be useful in tuning a traversal, consider the following traversal, which uses full-text search to find every airport whose `city` value is a fuzzy match for "Anchora", and then all the airports reachable in two hops from those airports, returning airport codes, flight routes, and distances:

```
%%gremlin profile

g.withSideEffect("Neptune#fts.endpoint", "your-OpenSearch-endpoint-URL").
    V().has("city", "Neptune#fts Anchora~").
    repeat(outE('route').inV().simplePath()).times(2).
    project('Destination', 'Route').
        by('code').
        by(path().by('code').by('dist'))
```

### Traversal metrics in Neptune `profile` API output
Traversal metrics

The first set of metrics that is available in all `profile` output is the traversal metrics. These are similar to the Gremlin `profile()` step metrics, with a few differences:

```
Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                                       3856        3856          91.701     9.09
NeptuneTraverserConverterStep                                       3856        3856          38.787     3.84
ProjectStep([Destination, Route],[value(code), ...                  3856        3856         878.786    87.07
  PathStep([value(code), value(dist)])                              3856        3856         601.359
                                            >TOTAL                     -           -        1009.274        -
```

The first column of the traversal-metrics table lists the steps executed by the traversal. The first two steps are generally the Neptune-specific steps, `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep`.

`NeptuneGraphQueryStep` represents the execution time for the entire portion of the traversal that could be converted and executed natively by the Neptune engine.

`NeptuneTraverserConverterStep` represents the process of converting the output of those converted steps into TinkerPop traversers. This allows any steps that could not be converted to be processed, or allows the results to be returned in a TinkerPop-compatible format.

In the example above there are several unconverted steps, and each of these TinkerPop steps (`ProjectStep`, `PathStep`) appears as a row in the table.

The second column in the table, `Count`, reports the number of *represented* traversers that passed through the step, while the third column, `Traversers`, reports the number of traversers which passed through that step, as explained in the [TinkerPop profile step documentation](https://tinkerpop.apache.org/docs/current/reference/#profile-step).

In our example there are 3,856 vertices and 3,856 traversers returned by the `NeptuneGraphQueryStep`, and these numbers remain the same throughout the remaining processing because `ProjectStep` and `PathStep` are formatting the results, not filtering them.

**Note**  
Unlike TinkerPop, the Neptune engine does not optimize performance by *bulking* in its `NeptuneGraphQueryStep` and `NeptuneTraverserConverterStep` steps. Bulking is the TinkerPop operation that combines traversers on the same vertex to reduce operational overhead, and that is what causes the `Count` and `Traversers` numbers to differ. Because bulking only occurs in steps that Neptune delegates to TinkerPop, and not in steps that Neptune handles natively, the `Count` and `Traversers` columns seldom differ.

The `Time (ms)` column reports the number of milliseconds that the step took, and the `% Dur` column reports what percent of the total processing time the step took. These are the metrics that tell you where to focus your tuning efforts, because they show which steps took the most time.
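As a worked check on how `% Dur` relates to `Time (ms)`: in the sample above, the three top-level rows (91.701 + 38.787 + 878.786) add up exactly to the `>TOTAL` of 1009.274 ms, so the indented `PathStep` row appears to be nested inside `ProjectStep` rather than counted separately. The sketch below reproduces the percentages on that assumption. It parses only the column layout shown here, and the regular expression and function name are illustrative rather than part of any Neptune tooling.

```python
import re

# Step name, Count, Traversers, Time (ms), and an optional % Dur column.
ROW = re.compile(
    r"^(?P<indent>\s*)(?P<step>\S.*?)\s+(?P<count>\d+)\s+(?P<traversers>\d+)"
    r"\s+(?P<time>\d+\.\d+)(?:\s+(?P<dur>\d+\.\d+))?\s*$"
)


def duration_shares(metrics_text: str) -> dict[str, float]:
    """Recompute % Dur for each top-level step from its Time (ms) value."""
    times = {}
    for line in metrics_text.splitlines():
        match = ROW.match(line)
        # Skip headers, the >TOTAL row, and indented rows nested under a parent step
        if match and not match.group("indent"):
            times[match.group("step")] = float(match.group("time"))
    total = sum(times.values())
    return {step: round(100 * t / total, 2) for step, t in times.items()}


if __name__ == "__main__":
    table = """\
Step                                                  Count  Traversers  Time (ms)  % Dur
------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(Vertex)                          3856        3856     91.701   9.09
NeptuneTraverserConverterStep                          3856        3856     38.787   3.84
ProjectStep([Destination, Route],[value(code), ...     3856        3856    878.786  87.07
  PathStep([value(code), value(dist)])                 3856        3856    601.359
                              >TOTAL                      -           -   1009.274      -
"""
    shares = duration_shares(table)
    print(shares)
    # The step with the largest share is where tuning effort pays off most
    print(max(shares, key=shares.get))
```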

### Index operation metrics in Neptune `profile` API output
Index operations

Another set of metrics in the output of the Neptune `profile` API is the index operations:

```
Index Operations
================
Query execution:
    # of statement index ops: 23191
    # of unique statement index ops: 5960
    Duplication ratio: 3.89
    # of terms materialized: 0
```

These report:
+ The total number of index lookups.
+ The number of unique index lookups performed.
+ The ratio of total index lookups to unique ones. A lower ratio indicates less redundancy.
+ The number of terms materialized from the term dictionary.
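The duplication ratio is the first count divided by the second. Using the figures above, 23,191 / 5,960 ≈ 3.89, which means each distinct lookup was performed almost four times on average, and 23,191 - 5,960 = 17,231 lookups repeated earlier work. The sketch below derives these numbers from the text block; the parsing assumes the labels shown above, and `index_op_summary` is a hypothetical name.

```python
import re


def _field(label: str, text: str) -> int:
    """Read the integer after '# of <label>:' or fail clearly if the line is absent."""
    match = re.search(rf"# of {label}:\s*(\d+)", text)
    if match is None:
        raise ValueError(f"missing '# of {label}' line")
    return int(match.group(1))


def index_op_summary(text: str) -> dict[str, float]:
    """Summarize the 'Index Operations' block of Neptune profile output."""
    total = _field("statement index ops", text)
    unique = _field("unique statement index ops", text)
    return {
        "total": total,
        "unique": unique,
        "redundant": total - unique,          # lookups that repeated an earlier one
        "duplication_ratio": total / unique,  # 1.0 would mean no repeated lookups
    }


if __name__ == "__main__":
    block = """Query execution:
    # of statement index ops: 23191
    # of unique statement index ops: 5960
    Duplication ratio: 3.89
    # of terms materialized: 0
"""
    summary = index_op_summary(block)
    print(summary["redundant"], f"{summary['duplication_ratio']:.2f}")  # 17231 3.89
```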

### Repeat metrics in Neptune `profile` API output
Repeat metrics

If your traversal uses a `repeat()` step as in the example above, then a section containing repeat metrics appears in the `profile` output:

```
Repeat Metrics
==============
Iteration  Visited   Output    Until     Emit     Next
------------------------------------------------------
        0        2        0        0        0        2
        1       53        0        0        0       53
        2     3856     3856     3856        0        0
------------------------------------------------------
              3911     3856     3856        0       55
```

These report:
+ The loop count for a row (the `Iteration` column).
+ The number of elements visited by the loop (the `Visited` column).
+ The number of elements output by the loop (the `Output` column).
+ The number of elements that met the loop's termination condition (the `Until` column).
+ The number of elements emitted by the loop (the `Emit` column).
+ The number of elements passed from the loop to the subsequent loop (the `Next` column).

These repeat metrics help you understand the branching factor of your traversal and get a sense of how much work the database is doing. You can use these numbers to diagnose performance problems, especially when the same traversal performs dramatically differently with different parameters.
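One way to read a branching factor from the table is to divide the elements visited in each iteration by the elements passed on from the iteration before. With the sample numbers, 2 starting vertices lead to 53 visits (a factor of 26.5), and those 53 lead to 3,856 (about 72.75), which shows that most of the work happens in the second hop. This definition is an illustrative assumption rather than a metric Neptune reports, and `branching_factors` is a hypothetical helper.

```python
from typing import Optional


def branching_factors(rows: list[tuple[int, int, int]]) -> list[Optional[float]]:
    """rows holds (Iteration, Visited, Next) tuples in iteration order.

    Returns Visited[i + 1] / Next[i] for each consecutive pair of iterations,
    or None where nothing was passed on to the next iteration.
    """
    factors: list[Optional[float]] = []
    for (_, _, passed_on), (_, visited_next, _) in zip(rows, rows[1:]):
        factors.append(visited_next / passed_on if passed_on else None)
    return factors


if __name__ == "__main__":
    # Iteration, Visited, and Next columns from the Repeat Metrics sample above
    repeat_rows = [(0, 2, 2), (1, 53, 53), (2, 3856, 0)]
    print([round(f, 2) for f in branching_factors(repeat_rows)])  # [26.5, 72.75]
```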

### Full-text search metrics in Neptune `profile` API output
Full-text search metrics

When a traversal uses a [full-text search](full-text-search.md) lookup, as in the example above, then a section containing the full-text search (FTS) metrics appears in the `profile` output:

```
FTS Metrics
==============
SearchNode[(idVar=?1, query=Anchora~, field=city) . project ?1 .],
    {endpoint=your-OpenSearch-endpoint-URL, incomingSolutionsThreshold=1000, estimatedCardinality=INFINITY,
    remoteCallTimeSummary=[total=65, avg=32.500000, max=37, min=28],
    remoteCallTime=65, remoteCalls=2, joinTime=0, indexTime=0, remoteResults=2}

    2 result(s) produced from SearchNode above
```

This shows the query sent to the OpenSearch cluster and reports several metrics about the interaction with OpenSearch that can help you pinpoint performance problems relating to full-text search:
+ Summary information about the calls into the OpenSearch index:
  + The total number of milliseconds required by all remote calls to satisfy the query (`total`).
  + The average number of milliseconds spent in a remote call (`avg`).
  + The minimum number of milliseconds spent in a remote call (`min`).
  + The maximum number of milliseconds spent in a remote call (`max`).
+ The total time, in milliseconds, consumed by remote calls to OpenSearch (`remoteCallTime`).
+ The number of remote calls made to OpenSearch (`remoteCalls`).
+ The number of milliseconds spent in joins of OpenSearch results (`joinTime`).
+ The number of milliseconds spent in index lookups (`indexTime`).
+ The total number of results returned by OpenSearch (`remoteResults`).
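These fields can be cross-checked against each other: in the sample, `remoteCallTime` (65 ms) divided by `remoteCalls` (2) gives the reported `avg` of 32.5 ms, and `max` + `min` (37 + 28) equals the `total`, as expected with two calls. The sketch below extracts the fields so that such checks can be scripted. It assumes the `key=value` layout shown in the sample, and `parse_fts_metrics` is a hypothetical name.

```python
import re


def parse_fts_metrics(text: str) -> dict:
    """Extract timing and count fields from a SearchNode block in profile output."""
    fields: dict = {}
    summary = re.search(r"remoteCallTimeSummary=\[([^\]]*)\]", text)
    if summary:
        # e.g. "total=65, avg=32.500000, max=37, min=28"
        pairs = re.findall(r"(\w+)=([\d.]+)", summary.group(1))
        fields["summary"] = {key: float(value) for key, value in pairs}
    for key in ("remoteCallTime", "remoteCalls", "joinTime", "indexTime", "remoteResults"):
        # Requiring "=" right after the key keeps remoteCallTimeSummary from matching
        match = re.search(rf"\b{key}=(\d+)", text)
        if match:
            fields[key] = int(match.group(1))
    return fields


if __name__ == "__main__":
    sample = """SearchNode[(idVar=?1, query=Anchora~, field=city) . project ?1 .],
    {endpoint=your-OpenSearch-endpoint-URL, incomingSolutionsThreshold=1000, estimatedCardinality=INFINITY,
    remoteCallTimeSummary=[total=65, avg=32.500000, max=37, min=28],
    remoteCallTime=65, remoteCalls=2, joinTime=0, indexTime=0, remoteResults=2}
"""
    m = parse_fts_metrics(sample)
    print(m["remoteCallTime"] / m["remoteCalls"], m["summary"]["avg"])  # 32.5 32.5
```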

# Native Gremlin step support in Amazon Neptune
Gremlin step support

The Amazon Neptune engine does not currently have full native support for all Gremlin steps, as explained in [Tuning Gremlin queries](gremlin-traversal-tuning.md). Current support falls into four categories:
+ [Gremlin steps that can always be converted to native Neptune engine operations](#gremlin-steps-always)
+ [Gremlin steps that can be converted to native Neptune engine operations in some cases](#gremlin-steps-sometimes) 
+ [Gremlin steps that are never converted to native Neptune engine operations](#gremlin-steps-never) 
+ [Gremlin steps that are not supported in Neptune at all](#neptune-gremlin-steps-unsupported) 

## Gremlin steps that can always be converted to native Neptune engine operations
Always converted

Many Gremlin steps can be converted to native Neptune engine operations as long as they meet the following conditions:
+ They are not preceded in the query by a step that cannot be converted.
+ Their parent step, if any, can be converted.
+ All their child traversals, if any, can be converted.

The following Gremlin steps are always converted to native Neptune engine operations if they meet those conditions:
+ [and( )](http://tinkerpop.apache.org/docs/current/reference/#and-step)
+ [as( )](http://tinkerpop.apache.org/docs/current/reference/#as-step)
+ [count( )](http://tinkerpop.apache.org/docs/current/reference/#count-step)
+ [E( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [emit( )](http://tinkerpop.apache.org/docs/current/reference/#emit-step)
+ [explain( )](http://tinkerpop.apache.org/docs/current/reference/#explain-step)
+ [group( )](http://tinkerpop.apache.org/docs/current/reference/#group-step)
+ [groupCount( )](http://tinkerpop.apache.org/docs/current/reference/#groupcount-step)
+ [identity( )](http://tinkerpop.apache.org/docs/current/reference/#identity-step)
+ [is( )](http://tinkerpop.apache.org/docs/current/reference/#is-step)
+ [key( )](http://tinkerpop.apache.org/docs/current/reference/#key-step)
+ [label( )](http://tinkerpop.apache.org/docs/current/reference/#label-step)
+ [limit( )](http://tinkerpop.apache.org/docs/current/reference/#limit-step)
+ [local( )](http://tinkerpop.apache.org/docs/current/reference/#local-step)
+ [loops( )](http://tinkerpop.apache.org/docs/current/reference/#loops-step)
+ [not( )](http://tinkerpop.apache.org/docs/current/reference/#not-step)
+ [or( )](http://tinkerpop.apache.org/docs/current/reference/#or-step)
+ [profile( )](http://tinkerpop.apache.org/docs/current/reference/#profile-step)
+ [properties( )](http://tinkerpop.apache.org/docs/current/reference/#properties-step)
+ [subgraph( )](http://tinkerpop.apache.org/docs/current/reference/#subgraph-step)
+ [until( )](http://tinkerpop.apache.org/docs/current/reference/#until-step)
+ [V( )](http://tinkerpop.apache.org/docs/current/reference/#graph-step)
+ [value( )](http://tinkerpop.apache.org/docs/current/reference/#value-step)
+ [valueMap( )](http://tinkerpop.apache.org/docs/current/reference/#valuemap-step)
+ [values( )](http://tinkerpop.apache.org/docs/current/reference/#values-step)
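The three conditions above interact: once one step stays in TinkerPop, nothing after it can be converted, and a step whose child traversal contains an unconvertible step is not converted either. The toy model below illustrates only those rules. It is not Neptune's query planner: it treats a hypothetical subset of names from the list above as convertible, treats every other name (for example `choose` or `path`, which appear in the never-converted list) as unconvertible, and ignores the steps that are converted only in some cases.

```python
from dataclasses import dataclass, field

# Illustrative subset of the always-converted steps listed above.
ALWAYS_CONVERTED = {"V", "E", "as", "count", "identity", "limit", "not", "and", "or", "values"}


@dataclass
class Step:
    name: str
    # Each child traversal is a list of steps, for example the argument of not(...)
    children: "list[list[Step]]" = field(default_factory=list)


def subtree_convertible(step: Step, convertible: set) -> bool:
    """A step qualifies only if it and every step in its child traversals qualify."""
    return step.name in convertible and all(
        subtree_convertible(child_step, convertible)
        for child in step.children
        for child_step in child
    )


def converted_flags(traversal: list, convertible: set = ALWAYS_CONVERTED) -> list:
    """One flag per top-level step: True if the model would run it natively."""
    flags = []
    native_so_far = True
    for step in traversal:
        # An unconverted step blocks conversion of every step that follows it
        native_so_far = native_so_far and subtree_convertible(step, convertible)
        flags.append(native_so_far)
    return flags


if __name__ == "__main__":
    # Roughly g.V().choose(values(...)).count(): choose() stops conversion early
    print(converted_flags([Step("V"), Step("choose", [[Step("values")]]), Step("count")]))
    # [True, False, False]
    # Roughly g.V().not(path()): the child traversal blocks not()
    print(converted_flags([Step("V"), Step("not", [[Step("path")]])]))
    # [True, False]
```

In this model, moving the unconvertible step to the end of a traversal leaves every earlier flag `True`, which matches the tuning advice given for `explain` output.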

## Gremlin steps that can be converted to native Neptune engine operations in some cases
Sometimes converted

Some Gremlin steps can be converted to native Neptune engine operations in some situations but not in others:
+ [addE( )](http://tinkerpop.apache.org/docs/current/reference/#addedge-step)   –   The `addE()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key.
+ [addV( )](http://tinkerpop.apache.org/docs/current/reference/#addvertex-step)   –   The `addV()` step can generally be converted to a native Neptune engine operation, unless it is immediately followed by a `property()` step containing a traversal as a key, or unless multiple labels are assigned.
+ [aggregate( )](http://tinkerpop.apache.org/docs/current/reference/#store-step)   –   The `aggregate()` step can generally be converted to a native Neptune engine operation, unless the step is used in a child traversal or sub-traversal, or unless the value being stored is something other than a vertex, edge, id, label or property value.

  In the example below, `aggregate()` is not converted because it is being used in a child traversal:

  ```
  g.V().has('code','ANC').as('a')
       .project('flights').by(select('a')
       .outE().aggregate('x'))
  ```

  In this example, `aggregate()` is not converted because what is stored is the `min()` of a value:

  ```
  g.V().has('code','ANC').outE().aggregate('x').by(values('dist').min())
  ```
+ [barrier( )](http://tinkerpop.apache.org/docs/current/reference/#barrier-step)   –   The `barrier()` step can generally be converted to a native Neptune engine operation, unless the step following it is not converted.
+ [cap( )](http://tinkerpop.apache.org/docs/current/reference/#cap-step)   –   The only case in which the `cap()` step is converted is when it is combined with the `unfold()` step to return an unfolded version of an aggregate of vertex, edge, id, or property values. In this example, `cap()` will be converted because it is followed by `.unfold()`:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```

  However, if you remove the `.unfold()`, `cap()` will not be converted:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport')
  ```
+ [coalesce( )](http://tinkerpop.apache.org/docs/current/reference/#coalesce-step)   –   The only case where the `coalesce()` step is converted is when it follows the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/). Other `coalesce()` patterns are not converted. Conversion is limited to the case where all child traversals can be converted, they all produce the same type as output (vertex, edge, id, value, key, or label), they all traverse to a new element, and they do not contain the `repeat()` step.
+ [constant( )](http://tinkerpop.apache.org/docs/current/reference/#constant-step)   –   The `constant()` step is currently only converted if it is used within a `sack().by()` part of a traversal to assign a constant value, like this:

  ```
  g.V().has('code','ANC').sack(assign).by(constant(10)).out().limit(2)
  ```
+ [cyclicPath( )](http://tinkerpop.apache.org/docs/current/reference/#cyclicpath-step)   –   The `cyclicPath()` step can generally be converted to a native Neptune engine operation, unless the step is used with `by()`, `from()`, or `to()` modulators. In the following queries, for example, `cyclicPath()` is not converted:

  ```
  g.V().has('code','ANC').as('a').out().out().cyclicPath().by('code')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().from('a')
  g.V().has('code','ANC').as('a').out().out().cyclicPath().to('a')
  ```
+ [drop( )](http://tinkerpop.apache.org/docs/current/reference/#drop-step)   –   The `drop()` step can generally be converted to a native Neptune engine operation, unless the step is used inside a `sideEffect()` or `optional()` step.
+ [fold( )](http://tinkerpop.apache.org/docs/current/reference/#fold-step)   –   There are only two situations where the `fold()` step can be converted: when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), and when it is used in a `group().by()` context like this:

  ```
  g.V().has('code','ANC').out().group().by().by(values('code', 'city').fold())
  ```
+ [has( )](http://tinkerpop.apache.org/docs/current/reference/#has-step)   –   The `has()` step can generally be converted to a native Neptune engine operation, provided that queries with `T` use the predicate `P.eq`, `P.neq`, or `P.contains`. Variations of `has()` that imply those instances of `P` can be expected to convert to native operations as well, such as `hasId('id1234')`, which is equivalent to `has(T.id, P.eq('id1234'))`.
+ [id( )](http://tinkerpop.apache.org/docs/current/reference/#id-step)   –   The `id()` step is converted unless it is used on a property, like this:

  ```
  g.V().has('code','ANC').properties('code').id()
  ```
+ [mergeE()](https://tinkerpop.apache.org/docs/current/reference/#mergeedge-step)   –   The `mergeE()` step can be converted to a native Neptune engine operation if the parameters (the merge condition, the `onCreate` and `onMatch`) are constant (either `null`, a constant `Map`, or `select()` of a `Map`). All examples in [upserting edges](https://docs.aws.amazon.com//neptune/latest/userguide/gremlin-efficient-upserts.html#gremlin-upserts-edges) can be converted.
+ [mergeV()](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step)   –   The `mergeV()` step can be converted to a native Neptune engine operation if the parameters (the merge condition, the `onCreate` and `onMatch`) are constant (either `null`, a constant `Map`, or `select()` of a `Map`). All examples in [upserting vertices](https://docs.aws.amazon.com//neptune/latest/userguide/gremlin-efficient-upserts.html#gremlin-upserts-vertices) can be converted.
+ [order( )](http://tinkerpop.apache.org/docs/current/reference/#order-step)   –   The `order()` step can generally be converted to a native Neptune engine operation, unless one of the following is true:
  + The `order()` step is within a nested child traversal, like this:

    ```
    g.V().has('code','ANC').where(V().out().order().by(id))
    ```
  + Local ordering is being used, as for example with `order(local)`.
  + A custom comparator is being used in the `by()` modulation to order by. An example is this use of `sack()`:

    ```
    g.withSack(0).
      V().has('code','ANC').
          repeat(outE().sack(sum).by('dist').inV()).times(2).limit(10).
          order().by(sack())
    ```
  + There are multiple orderings on the same element.
+ [project( )](http://tinkerpop.apache.org/docs/current/reference/#project-step)   –   The `project()` step can generally be converted to a native Neptune engine operation, unless the number of `by()` statements following the `project()` does not match the number of labels specified, as here:

  ```
  g.V().has('code','ANC').project('x', 'y').by(id)
  ```
+ [range( )](http://tinkerpop.apache.org/docs/current/reference/#range-step)   –   The `range()` step is only converted when the lower end of the range in question is zero (for example, `range(0,3)`).
+ [repeat( )](http://tinkerpop.apache.org/docs/current/reference/#repeat-step)   –   The `repeat()` step can generally be converted to a native Neptune engine operation, unless it is nested within another `repeat()` step, like this:

  ```
  g.V().has('code','ANC').repeat(out().repeat(out()).times(2)).times(2)
  ```
+ [sack( )](http://tinkerpop.apache.org/docs/current/reference/#sack-step)   –   The `sack()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + If a non-numeric sack operator is being used.
  + If a numeric sack operator other than `sum`, `minus`, `mult`, `div`, `min`, and `max` is being used.
  + If `sack()` is used inside a `where()` step to filter based on a sack value, as here:

    ```
    g.V().has('code','ANC').sack(assign).by(values('code')).where(sack().is('ANC'))
    ```
+ [sum( )](http://tinkerpop.apache.org/docs/current/reference/#sum-step)   –   The `sum()` step can generally be converted to a native Neptune engine operation, but not when used to calculate a global summation, like this:

  ```
  g.V().has('code','ANC').outE('route').values('dist').sum()
  ```
+ [union( )](http://tinkerpop.apache.org/docs/current/reference/#union-step)   –   The `union()` step can be converted to a native Neptune engine operation as long as it is the last step in the query aside from the terminal step.
+ [unfold( )](http://tinkerpop.apache.org/docs/current/reference/#unfold-step)   –   The `unfold()` step can only be converted to a native Neptune engine operation when it is used in the [Upsert pattern](http://tinkerpop.apache.org/docs/current/recipes/#element-existence) recommended on the [TinkerPop recipes page](http://tinkerpop.apache.org/docs/current/recipes/), or when it is used together with `cap()` like this:

  ```
  g.V().has('airport','country','IE').aggregate('airport').limit(2)
       .cap('airport').unfold()
  ```
+ [where( )](http://tinkerpop.apache.org/docs/current/reference/#where-step)   –   The `where()` step can generally be converted to a native Neptune engine operation, except in the following cases:
  + When `by()` modulations are used, like this:

    ```
    g.V().hasLabel('airport').as('a')
         .where(gt('a')).by('runways')
    ```
  + When comparison operators other than `eq`, `neq`, `within`, and `without` are used.
  + When user-supplied aggregations are used.

## Gremlin steps that are never converted to native Neptune engine operations
Never converted

The following Gremlin steps are supported in Neptune but are never converted to native Neptune engine operations. Instead, they are executed by the Gremlin server.
+ [choose( )](http://tinkerpop.apache.org/docs/current/reference/#choose-step)
+ [coin( )](http://tinkerpop.apache.org/docs/current/reference/#coin-step)
+ [inject( )](http://tinkerpop.apache.org/docs/current/reference/#inject-step)
+ [match( )](http://tinkerpop.apache.org/docs/current/reference/#match-step)
+ [math( )](http://tinkerpop.apache.org/docs/current/reference/#math-step)
+ [max( )](http://tinkerpop.apache.org/docs/current/reference/#max-step)
+ [mean( )](http://tinkerpop.apache.org/docs/current/reference/#mean-step)
+ [min( )](http://tinkerpop.apache.org/docs/current/reference/#min-step)
+ [option( )](http://tinkerpop.apache.org/docs/current/reference/#option-step)
+ [optional( )](http://tinkerpop.apache.org/docs/current/reference/#optional-step)
+ [path( )](http://tinkerpop.apache.org/docs/current/reference/#path-step)
+ [propertyMap( )](http://tinkerpop.apache.org/docs/current/reference/#propertymap-step)
+ [sample( )](http://tinkerpop.apache.org/docs/current/reference/#sample-step)
+ [skip( )](http://tinkerpop.apache.org/docs/current/reference/#skip-step)
+ [tail( )](http://tinkerpop.apache.org/docs/current/reference/#tail-step)
+ [timeLimit( )](http://tinkerpop.apache.org/docs/current/reference/#timelimit-step)
+ [tree( )](http://tinkerpop.apache.org/docs/current/reference/#tree-step)

## Gremlin steps that are not supported in Neptune at all
Not supported at all

The following Gremlin steps are not supported at all in Neptune. In most cases this is because they require a `GraphComputer`, which Neptune does not currently support.
+ [connectedComponent( )](http://tinkerpop.apache.org/docs/current/reference/#connectedcomponent-step)
+ [io( )](http://tinkerpop.apache.org/docs/current/reference/#io-step)
+ [shortestPath( )](http://tinkerpop.apache.org/docs/current/reference/#shortestpath-step)
+ [withComputer( )](http://tinkerpop.apache.org/docs/current/reference/#with-step)
+ [pageRank( )](http://tinkerpop.apache.org/docs/current/reference/#pagerank-step)
+ [peerPressure( )](http://tinkerpop.apache.org/docs/current/reference/#peerpressure-step)
+ [program( )](http://tinkerpop.apache.org/docs/current/reference/#program-step)

The `io()` step is actually partially supported, in that it can be used to `read()` from a URL but not to `write()`.