

# Overview of Amazon Neptune features
<a name="feature-overview"></a>

**Note**  
This section does not cover using the query languages that you can use to access the data in a Neptune graph.  
For information about how to connect to a running Neptune DB cluster with Gremlin, see [Accessing a Neptune graph with Gremlin](access-graph-gremlin.md).  
For information about how to connect to a running Neptune DB cluster with openCypher, see [Accessing the Neptune Graph with openCypher](access-graph-opencypher.md).  
For information about how to connect to a running Neptune DB cluster with SPARQL, see [Accessing the Neptune graph with SPARQL](access-graph-sparql.md).

This section provides an overview of specific Neptune features, including:
+ [Neptune compliance with query-language standards](feature-overview-standards-compliance.md).
+ [Neptune's graph data model](feature-overview-data-model.md).
+ [An explanation of Neptune transaction semantics](transactions.md).
+ [An introduction to Neptune clusters and instances](feature-overview-db-clusters.md).
+ [Neptune's storage, reliability and availability](feature-overview-storage.md).
+ [An explanation of Neptune endpoints](feature-overview-endpoints.md).
+ [Using Neptune's *lab mode* to enable experimental features](features-lab-mode.md).
+ [A description of Neptune's DFE engine](neptune-dfe-engine.md).
+ [Neptune's JDBC connectivity](neptune-jdbc.md).
+ [A list of Neptune engine releases and how to update your engine](engine-releases.md).

# Notes on Amazon Neptune Standards Compliance
<a name="feature-overview-standards-compliance"></a>

In most cases, Amazon Neptune complies with the applicable standards in implementing the Gremlin and SPARQL graph query languages.

These sections describe the standards as well as those areas where Neptune extends or diverges from them.

**Topics**
+ [Gremlin standards compliance in Amazon Neptune](access-graph-gremlin-differences.md)
+ [SPARQL standards compliance in Amazon Neptune](feature-sparql-compliance.md)
+ [openCypher specification compliance in Amazon Neptune](feature-opencypher-compliance.md)

# Gremlin standards compliance in Amazon Neptune
<a name="access-graph-gremlin-differences"></a>

The following sections provide an overview of the Neptune implementation of Gremlin and how it differs from the Apache TinkerPop implementation.

Neptune implements some Gremlin steps natively in its engine, and uses the Apache TinkerPop Gremlin implementation to process others (see [Native Gremlin step support in Amazon Neptune](gremlin-step-support.md)).

**Note**  
For some concrete examples of these implementation differences shown in Gremlin Console and Amazon Neptune, see the [Using Gremlin to access graph data in Amazon Neptune](get-started-graph-gremlin.md) section of the Quick Start.

**Topics**
+ [Applicable Standards for Gremlin](#feature-gremlin-applicable-standards)
+ [Variables and parameters in scripts](#feature-gremlin-differences-variables)
+ [TinkerPop enumerations](#feature-gremlin-differences-tinkerpop)
+ [Java code](#feature-gremlin-differences-java)
+ [Properties on elements](#feature-gremlin-differences-properties-on-elements)
+ [Script execution](#feature-gremlin-differences-script)
+ [Sessions](#feature-gremlin-differences-sessions)
+ [Transactions](#feature-gremlin-differences-transactions)
+ [Vertex and edge IDs](#feature-gremlin-differences-vertex-edge-ids)
+ [User-supplied IDs](#feature-gremlin-differences-user-supplied-ids)
+ [Vertex property IDs](#feature-gremlin-differences-vertex-property-ids)
+ [Cardinality of vertex properties](#feature-gremlin-differences-vertex-property-cardinality)
+ [Updating a vertex property](#feature-gremlin-differences-vertex-property-update)
+ [Labels](#feature-gremlin-differences-labels)
+ [Escape characters](#feature-gremlin-differences-escapes)
+ [Groovy limitations](#feature-gremlin-differences-groovy)
+ [Serialization](#feature-gremlin-differences-serialization)
+ [Lambda steps](#feature-gremlin-differences-lambda)
+ [Unsupported Gremlin methods](#feature-gremlin-differences-unsupported-methods)
+ [Unsupported Gremlin steps](#feature-gremlin-differences-unsupported-steps)
+ [Gremlin graph features in Neptune](#gremlin-api-reference-features)

## Applicable Standards for Gremlin
<a name="feature-gremlin-applicable-standards"></a>
+ The Gremlin language is defined by [Apache TinkerPop Documentation](http://tinkerpop.apache.org/docs/current/reference/) and the Apache TinkerPop implementation of Gremlin rather than by a formal specification.
+ For numeric formats, Gremlin follows the IEEE 754 standard ([IEEE 754-2019 - IEEE Standard for Floating-Point Arithmetic](https://standards.ieee.org/content/ieee-standards/en/standard/754-2019.html)). For more information, see also the [Wikipedia IEEE 754 page](https://en.wikipedia.org/wiki/IEEE_754).

## Variables and parameters in scripts
<a name="feature-gremlin-differences-variables"></a>

Where pre-bound variables are concerned, the traversal object `g` is pre-bound in Neptune, and the `graph` object is not supported.

Although Neptune does not support Gremlin variables or parameterization in scripts, you may often encounter sample scripts for Gremlin Server on the Internet that contain variable declarations, such as:

```
String query = "x = 1; g.V(x)";
List<Result> results = client.submit(query).all().get();
```

There are also many examples that make use of [parameterization](https://tinkerpop.apache.org/docs/current/reference/#parameterized-scripts) (or bindings) when submitting queries, such as:

```
Map<String,Object> params = new HashMap<>();
params.put("x",1);
String query = "g.V(x)";
List<Result> results = client.submit(query).all().get();
```

The parameterization examples are usually accompanied by warnings about the performance penalty of failing to parameterize when possible. You may encounter a great many such examples for TinkerPop, and they make a convincing case for parameterizing.

However, both the variables declarations feature and the parameterization feature (along with the warnings) only apply to TinkerPop's Gremlin Server when it is using the `GremlinGroovyScriptEngine`. They do not apply when Gremlin Server uses Gremlin's `gremlin-language` ANTLR grammar to parse queries. The ANTLR grammar doesn't support either variable declarations or parameterization, so when using ANTLR, you don't have to worry about failing to parameterize. Because the ANTLR grammar is a newer component of TinkerPop, older content you may encounter on the Internet doesn't generally reflect this distinction.

Neptune uses the ANTLR grammar in its query processing engine rather than the `GremlinGroovyScriptEngine`, so it does not support variables or parameterization or the `bindings` property. As a result, the problems related to failing to parameterize do not apply in Neptune. Using Neptune, it's perfectly safe simply to submit the query as-is where one would normally parameterize. As a result, the previous example can be simplified without any performance penalty as follows:

```
String query = "g.V(1)";
List<Result> results = client.submit(query).all().get();
```

## TinkerPop enumerations
<a name="feature-gremlin-differences-tinkerpop"></a>

Neptune does not support fully qualified class names for enumeration values. For example, you must use `single` and not `org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality.single` in your Groovy request.

The enumeration type is determined by the parameter type.

The following table shows the allowed enumeration values and the related TinkerPop fully qualified name.

| Allowed Values | Class | 
| --- |--- |
| id, key, label, value | [org.apache.tinkerpop.gremlin.structure.T](https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/T.html) | 
| T.id, T.key, T.label, T.value | [org.apache.tinkerpop.gremlin.structure.T](https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/T.html) | 
| set, single | [org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality](https://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/VertexProperty.Cardinality.html) | 
| asc, desc, shuffle | [org.apache.tinkerpop.gremlin.process.traversal.Order](https://tinkerpop.apache.org/javadocs/3.7.2/full/org/apache/tinkerpop/gremlin/process/traversal/Order.html) | 
| Order.asc, Order.desc, Order.shuffle | [org.apache.tinkerpop.gremlin.process.traversal.Order](https://tinkerpop.apache.org/javadocs/3.7.2/full/org/apache/tinkerpop/gremlin/process/traversal/Order.html) | 
| global, local | [org.apache.tinkerpop.gremlin.process.traversal.Scope](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html) | 
| Scope.global, Scope.local | [org.apache.tinkerpop.gremlin.process.traversal.Scope](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/process/traversal/Scope.html) | 
| all, first, last, mixed | [org.apache.tinkerpop.gremlin.process.traversal.Pop](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/process/traversal/Pop.html) | 
| normSack | [org.apache.tinkerpop.gremlin.process.traversal.SackFunctions.Barrier](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/process/traversal/SackFunctions.Barrier.html) | 
| addAll, and, assign, div, max, min, minus, mult, or, sum, sumLong | [org.apache.tinkerpop.gremlin.process.traversal.Operator](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/process/traversal/Operator.html) | 
| keys, values | [org.apache.tinkerpop.gremlin.structure.Column](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/structure/Column.html) | 
| BOTH, IN, OUT | [org.apache.tinkerpop.gremlin.structure.Direction](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/structure/Direction.html) | 
| any, none | [org.apache.tinkerpop.gremlin.process.traversal.step.TraversalOptionParent.Pick](https://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/process/traversal/Pick.html) | 
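For example, assuming a vertex with a hypothetical ID `'v1'` exists, only the unqualified enumeration value is accepted by Neptune:

```
// Accepted: unqualified enumeration value
g.V('v1').property(single, 'age', 25)

// Rejected by Neptune: fully qualified class name
g.V('v1').property(org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality.single, 'age', 25)
```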

## Java code
<a name="feature-gremlin-differences-java"></a>

Neptune does not support calls to arbitrary Java methods or Java library methods outside the supported Gremlin APIs. For example, `java.lang.*`, `Date()`, and `g.V().tryNext().orElseGet()` are not allowed.

## Properties on elements
<a name="feature-gremlin-differences-properties-on-elements"></a>

Neptune does not support the `materializeProperties` flag that was introduced in TinkerPop 3.7.0 to return properties on elements. As a result, Neptune still returns vertices and edges only as references, with just their `id` and `label`.

## Script execution
<a name="feature-gremlin-differences-script"></a>

All queries must begin with `g`, the traversal object. 

In String query submissions, multiple traversals can be issued separated by a semicolon (`;`) or a newline character (`\n`). To be executed, every statement other than the last must end with an `.iterate()` step. Only the final traversal's data is returned. Note that this does not apply to GLV bytecode query submissions.
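For example, the following string submission (with a hypothetical ID) issues two traversals; the first ends with `.iterate()` so it executes, and only the results of the second traversal are returned:

```
g.addV('person').property(id, 'p1').iterate(); g.V('p1').valueMap()
```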

## Sessions
<a name="feature-gremlin-differences-sessions"></a>

Sessions in Neptune are limited to only 10 minutes in duration. See [Gremlin script-based sessions](access-graph-gremlin-sessions.md) and the [TinkerPop Session Reference](https://tinkerpop.apache.org/docs/current/reference/#console-sessions) for more information.

## Transactions
<a name="feature-gremlin-differences-transactions"></a>

Neptune opens a new transaction at the beginning of each Gremlin traversal and closes the transaction upon the successful completion of the traversal. The transaction is rolled back when there is an error. 

 Multiple statements separated by a semicolon (`;`) or a newline character (`\n`) are included in a single transaction. Every statement other than the last must end with a `next()` step to be executed. Only the final traversal data is returned.

Manual transaction logic using `tx.commit()` and `tx.rollback()` is not supported.

**Important**  
This ***only*** applies to methods where you send the Gremlin query as a ***text string*** (see [Gremlin transactions](access-graph-gremlin-transactions.md)).
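As an illustration, the following text-string submission (with hypothetical IDs) runs both statements in a single transaction; if the second statement fails, the vertex added by the first is rolled back as well:

```
g.addV('person').property(id, 'p1').next(); g.addV('person').property(id, 'p2')
```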

## Vertex and edge IDs
<a name="feature-gremlin-differences-vertex-edge-ids"></a>

Neptune Gremlin Vertex and Edge IDs must be of type `String`. These ID strings support Unicode characters, and cannot exceed 55 MB in size.

User-supplied IDs are supported, but they are optional in normal usage. If you don't provide an ID when you add a vertex or an edge, Neptune generates a UUID and converts it to a string, in a form like this: `"48af8178-50ce-971a-fc41-8c9a954cea62"`. These UUIDs do not conform to the RFC 4122 standard, so if you need standard UUIDs you should generate them externally and provide them when you add vertices or edges.

**Note**  
The Neptune `Load` command requires that you provide IDs, using the **~id** field in the Neptune CSV format.

## User-supplied IDs
<a name="feature-gremlin-differences-user-supplied-ids"></a>

User-supplied IDs are allowed in Neptune Gremlin with the following stipulations.
+ Supplied IDs are optional.
+ Only vertices and edges are supported.
+ Only type `String` is supported.

To create a new vertex with a custom ID, use the `property` step with the `id` keyword: `g.addV().property(id, 'customid')`.

**Note**  
 Do not put quotation marks around the `id` keyword. It refers to `T.id`.

All vertex IDs must be unique, and all edge IDs must be unique. However, Neptune does allow a vertex and an edge to have the same ID.

If you try to create a new vertex using `g.addV()` with an ID that already exists, the operation fails. The exception is when you specify a new label for the vertex: in that case, the operation succeeds, adding the new label and any additional properties specified to the existing vertex. Nothing is overwritten, and no new vertex is created. The vertex ID does not change and remains unique.

For example, the following Gremlin Console commands succeed:

```
gremlin> g.addV('label1').property(id, 'customid')
gremlin> g.addV('label2').property(id, 'customid')
gremlin> g.V('customid').label()
==>label1::label2
```

## Vertex property IDs
<a name="feature-gremlin-differences-vertex-property-ids"></a>

Vertex property IDs are generated automatically and can show up as positive or negative numbers when queried.

## Cardinality of vertex properties
<a name="feature-gremlin-differences-vertex-property-cardinality"></a>

Neptune supports set cardinality and single cardinality. If cardinality isn't specified, set cardinality is used. This means that setting a property value adds a new value to the property, but only if it doesn't already appear in the set of values. This is the Gremlin enumeration value of [Set](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/structure/VertexProperty.Cardinality.html).

`List` is not supported. For more information about property cardinality, see the [Vertex](https://tinkerpop.apache.org/javadocs/3.7.2/core/org/apache/tinkerpop/gremlin/structure/Vertex.html#property-org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality-java.lang.String-V-java.lang.Object...-) topic in the Gremlin JavaDoc.
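For example, with a hypothetical vertex `'exampleid01'`, setting the same value twice does not create a duplicate, while a different value is added alongside the existing one, producing output similar to the following:

```
gremlin> g.V('exampleid01').property('skill', 'java').iterate()
gremlin> g.V('exampleid01').property('skill', 'java').iterate()
gremlin> g.V('exampleid01').property('skill', 'gremlin').iterate()
gremlin> g.V('exampleid01').values('skill')
==>java
==>gremlin
```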

## Updating a vertex property
<a name="feature-gremlin-differences-vertex-property-update"></a>

To update a property value without adding an additional value to the set of values, specify `single` cardinality in the `property` step.

```
g.V('exampleid01').property(single, 'age', 25)
```

This removes all existing values for the property.

## Labels
<a name="feature-gremlin-differences-labels"></a>

Neptune supports multiple labels for a vertex. When you create a label, you can specify multiple labels by separating them with `::`. For example, `g.addV("Label1::Label2::Label3")` adds a vertex with three different labels. The `hasLabel` step matches this vertex with any of those three labels: `hasLabel("Label1")`, `hasLabel("Label2")`, and `hasLabel("Label3")`. 

**Important**  
The `::` delimiter is reserved for this use only. You cannot specify multiple labels in the `hasLabel` step. For example, `hasLabel("Label1::Label2")` does not match anything.
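For example, using a hypothetical ID, a multi-label vertex matches each of its labels individually, but not the combined string:

```
gremlin> g.addV('Person::Employee').property(id, 'p1').iterate()
gremlin> g.V('p1').hasLabel('Person').count()
==>1
gremlin> g.V('p1').hasLabel('Person::Employee').count()
==>0
```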

## Escape characters
<a name="feature-gremlin-differences-escapes"></a>

Neptune resolves all escape characters as described in the [Escaping Special Characters]( http://groovy-lang.org/syntax.html#_escaping_special_characters) section of the Apache Groovy language documentation.

## Groovy limitations
<a name="feature-gremlin-differences-groovy"></a>

Neptune doesn't support Groovy commands that don't start with `g`. This includes math (for example, `1+1`), system calls (for example, `System.nanoTime()`), and variable definitions (for example, `x = 1`).

**Important**  
Neptune does not support fully qualified class names. For example, you must use `single` and not `org.apache.tinkerpop.gremlin.structure.VertexProperty.Cardinality.single` in your Groovy request.

## Serialization
<a name="feature-gremlin-differences-serialization"></a>

Neptune supports the following serializations based on the requested MIME type.

Neptune exposes all of the serializers that TinkerPop does, including the various versions and configurations of GraphSON and GraphBinary. Despite the many options available, the guidance on which to use is straightforward:
+ If you are using Apache TinkerPop drivers, prefer the driver's default rather than specifying a serializer explicitly. Unless you have a very specific reason to do otherwise, you likely don't need to specify the serializer in your driver initialization. In general, the default used by the drivers is `application/vnd.graphbinary-v1.0`.
+ If you are connecting to Neptune over HTTP, prefer `application/vnd.gremlin-v3.0+json;types=false`, because the embedded types in the typed version of GraphSON 3 make it more complicated to work with.
+ The `application/vnd.graphbinary-v1.0-stringd` format is generally only useful in conjunction with the [Gremlin Console](https://docs.aws.amazon.com//neptune/latest/userguide/access-graph-gremlin-console.html), as it converts all results to a string representation for simple display.
+ The remaining formats are present for legacy reasons and should typically not be used with drivers without clear cause.

| MIME type | Serialization | Configuration | 
| --- |--- |--- |
| `application/vnd.gremlin-v1.0+json` | GraphSONMessageSerializerV1 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV1] | 
| `application/vnd.gremlin-v1.0+json;types=false` | GraphSONUntypedMessageSerializerV1 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV1] | 
| `application/vnd.gremlin-v2.0+json` | GraphSONMessageSerializerV2 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV2] | 
| `application/vnd.gremlin-v2.0+json;types=false` | GraphSONUntypedMessageSerializerV2 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV2] | 
| `application/vnd.gremlin-v3.0+json` | GraphSONMessageSerializerV3 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3] | 
| `application/vnd.gremlin-v3.0+json;types=false` | GraphSONUntypedMessageSerializerV3 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3] | 
| `application/json` | GraphSONUntypedMessageSerializerV3 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV1] | 
| `application/vnd.graphbinary-v1.0` | GraphBinaryMessageSerializerV1 |  | 
| `application/vnd.graphbinary-v1.0-stringd` | GraphBinaryMessageSerializerV1 | serializeResultToString: true | 
| `application/vnd.gremlin-v1.0+json` | GraphSONMessageSerializerGremlinV1 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV1] | 
| `application/vnd.gremlin-v2.0+json` | GraphSONMessageSerializerV2   (only works with WebSockets) | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV2] | 
| `application/vnd.gremlin-v3.0+json` | `GraphSONMessageSerializerV3` |  | 
| `application/json` | GraphSONMessageSerializerV3 | ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3] | 
| `application/vnd.graphbinary-v1.0` | GraphBinaryMessageSerializerV1 |  | 

**Note**  
 The serializer table shown here refers to naming as of TinkerPop 3.7.0. If you would like to know more about this change, please see the [TinkerPop upgrade documentation](https://tinkerpop.apache.org/docs/current/upgrade/#_serializer_renaming). Gryo serialization support was deprecated in 3.4.3 and was officially removed in 3.6.0. If you are explicitly using Gryo or on a driver version that uses it by default, then you should switch to GraphBinary or upgrade your driver. 

## Lambda steps
<a name="feature-gremlin-differences-lambda"></a>

Neptune does not support Lambda Steps.

## Unsupported Gremlin methods
<a name="feature-gremlin-differences-unsupported-methods"></a>

Neptune does not support the following Gremlin methods:
+ `org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.program(org.apache.tinkerpop.gremlin.process.computer.VertexProgram)`
+ `org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.sideEffect(java.util.function.Consumer)`
+ `org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.from(org.apache.tinkerpop.gremlin.structure.Vertex)`
+ `org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.to(org.apache.tinkerpop.gremlin.structure.Vertex)`

For example, the following traversal is not allowed: `g.V().addE('something').from(__.V().next()).to(__.V().next())`.

**Important**  
This ***only*** applies to methods where you send the Gremlin query as a ***text string***.

## Unsupported Gremlin steps
<a name="feature-gremlin-differences-unsupported-steps"></a>

Neptune does not support the following Gremlin steps:
+ The Gremlin [io( ) Step](http://tinkerpop.apache.org/docs/3.7.2/reference/#io-step) is only partially supported in Neptune. It can be used in a read context, as in `g.io(url).read()`, but not to write.
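For example, the read form below is allowed, while the write form fails (the URLs are placeholders):

```
// Supported: reading data from a URL
g.io('https://example.com/data/airports.graphml').read().iterate()

// Not supported: writing data out
g.io('https://example.com/data/output.graphml').write().iterate()
```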

## Gremlin graph features in Neptune
<a name="gremlin-api-reference-features"></a>

The Neptune implementation of Gremlin does not expose the `graph` object. The following tables list Gremlin features and indicate whether or not Neptune supports them.

### Neptune support for `graph` features
<a name="gremlin-api-graph-features"></a>

The Neptune graph features, where supported, are the same as would be returned by the `graph.features()` command.


| Graph feature | Enabled? | 
| --- |--- |
| Transactions |  true | 
| ThreadedTransactions |  false | 
| Computer |  false | 
| Persistence |  true | 
| ConcurrentAccess |  true | 

### Neptune support for variable features
<a name="gremlin-api-variable-features"></a>


| Variable feature | Enabled? | 
| --- |--- |
| Variables |  false | 
| SerializableValues |  false | 
| UniformListValues |  false | 
| BooleanArrayValues |  false | 
| DoubleArrayValues |  false | 
| IntegerArrayValues |  false | 
| StringArrayValues |  false | 
| BooleanValues |  false | 
| ByteValues |  false | 
| DoubleValues |  false | 
| FloatValues |  false | 
| IntegerValues |  false | 
| LongValues |  false | 
| MapValues |  false | 
| MixedListValues |  false | 
| StringValues |  false | 
| ByteArrayValues |  false | 
| FloatArrayValues |  false | 
| LongArrayValues |  false | 

### Neptune support for vertex features
<a name="gremlin-api-vertex-features"></a>


| Vertex feature | Enabled? | 
| --- |--- |
| MetaProperties |  false | 
| DuplicateMultiProperties |  false | 
| AddVertices |  true | 
| RemoveVertices |  true | 
| MultiProperties |  true | 
| UserSuppliedIds |  true | 
| AddProperty |  true | 
| RemoveProperty |  true | 
| NumericIds |  false | 
| StringIds |  true | 
| UuidIds |  false | 
| CustomIds |  false | 
| AnyIds |  false | 

### Neptune support for vertex property features
<a name="gremlin-api-vertex-property-features"></a>


| Vertex property feature | Enabled? | 
| --- |--- |
| UserSuppliedIds |  false | 
| AddProperty |  true | 
| RemoveProperty |  true | 
| NumericIds |  true | 
| StringIds |  true | 
| UuidIds |  false | 
| CustomIds |  false | 
| AnyIds |  false | 
| Properties |  true | 
| SerializableValues |  false | 
|  UniformListValues |  false | 
| BooleanArrayValues |  false | 
| DoubleArrayValues |  false | 
| IntegerArrayValues |  false | 
| StringArrayValues |  false | 
| BooleanValues |  true | 
| ByteValues |  true | 
| DoubleValues |  true | 
| FloatValues |  true | 
| IntegerValues |  true | 
| LongValues |  true | 
| MapValues |  false | 
| MixedListValues |  false | 
| StringValues |  true | 
| ByteArrayValues |  false | 
| FloatArrayValues |  false | 
| LongArrayValues |  false | 

### Neptune support for edge features
<a name="gremlin-api-edge-features"></a>


| Edge feature | Enabled? | 
| --- |--- |
| AddEdges |  true | 
| RemoveEdges |  true | 
| UserSuppliedIds |  true | 
| AddProperty |  true | 
| RemoveProperty |  true | 
| NumericIds |  false | 
| StringIds |  true | 
| UuidIds |  false | 
| CustomIds |  false | 
| AnyIds |  false | 

### Neptune support for edge property features
<a name="gremlin-api-edge-property-features"></a>


| Edge property feature | Enabled? | 
| --- |--- |
| Properties |  true | 
| SerializableValues |  false | 
| UniformListValues |  false | 
| BooleanArrayValues |  false | 
| DoubleArrayValues |  false | 
| IntegerArrayValues |  false | 
| StringArrayValues |  false | 
| BooleanValues |  true | 
| ByteValues |  true | 
| DoubleValues |  true | 
| FloatValues |  true | 
| IntegerValues |  true | 
| LongValues |  true | 
| MapValues |  false | 
| MixedListValues |  false | 
| StringValues |  true | 
| ByteArrayValues |  false | 
| FloatArrayValues |  false | 
| LongArrayValues |  false | 

# SPARQL standards compliance in Amazon Neptune
<a name="feature-sparql-compliance"></a>

After listing applicable SPARQL standards, the following sections provide specific details about how Neptune's SPARQL implementation extends or diverges from those standards.

**Topics**
+ [Applicable Standards for SPARQL](#feature-sparql-applicable-standards)
+ [Default Namespace Prefixes in Neptune SPARQL](#sparql-default-prefixes)
+ [SPARQL Default Graph and Named Graphs](#sparql-default-graph)
+ [SPARQL XPath Constructor Functions Supported by Neptune](#access-graph-sparql-xpath-constructors)
+ [Default base IRI for queries and updates](#opencypher-compliance-default-iri)
+ [xsd:dateTime Values in Neptune](#access-graph-sparql-xsd-date-time)
+ [Neptune Handling of Special Floating Point Values](#feature-overview-special-values-comparisons)
+ [Neptune Limitation of Arbitrary-Length Values](#feature-overview-arbitrary-length-values)
+ [Neptune Extends Equals Comparison in SPARQL](#feature-overview-sparql-not-equal)
+ [Handling of Out-of-Range Literals in Neptune SPARQL](#feature-overview-sparql-out-of-range)

Amazon Neptune complies with the following standards in implementing the SPARQL graph query language.

## Applicable Standards for SPARQL
<a name="feature-sparql-applicable-standards"></a>
+ SPARQL is defined by the W3C [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) recommendation of March 21, 2013.
+ The SPARQL Update protocol and query language are defined by the W3C [SPARQL 1.1 Update](https://www.w3.org/TR/sparql11-update/) specification.
+ For numeric formats, SPARQL follows the [W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes](https://www.w3.org/TR/xmlschema11-2/) specification, which is consistent with the IEEE 754 specification ([IEEE 754-2019 - IEEE Standard for Floating-Point Arithmetic](https://standards.ieee.org/content/ieee-standards/en/standard/754-2019.html)). For more information, see also the [Wikipedia IEEE 754 page](https://en.wikipedia.org/wiki/IEEE_754). However, features that were introduced after the `IEEE 754-1985` version are not included in the specification.

## Default Namespace Prefixes in Neptune SPARQL
<a name="sparql-default-prefixes"></a>

Neptune defines the following prefixes by default for use in SPARQL queries. For more information, see [Prefixed Names](https://www.w3.org/TR/sparql11-query/#prefNames) in the SPARQL specification.
+ `rdf`  – `http://www.w3.org/1999/02/22-rdf-syntax-ns#`
+ `rdfs` – `http://www.w3.org/2000/01/rdf-schema#`
+ `owl`  – `http://www.w3.org/2002/07/owl#`
+ `xsd`  – `http://www.w3.org/2001/XMLSchema#`
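Because these prefixes are predefined, a query like the following (over hypothetical data) can use them without any `PREFIX` declarations:

```
SELECT ?class ?label
WHERE {
    ?class rdf:type owl:Class .
    ?class rdfs:label ?label .
}
```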

## SPARQL Default Graph and Named Graphs
<a name="sparql-default-graph"></a>

Amazon Neptune associates every triple with a named graph. The default graph is defined as the union of all named graphs. 

**Default Graph for Queries**  
If you submit a SPARQL query without explicitly specifying a graph via the `GRAPH` keyword or constructs such as `FROM NAMED`, Neptune always considers all triples in your DB instance. For example, the following query returns all triples from a Neptune SPARQL endpoint: 

`SELECT * WHERE { ?s ?p ?o }`

Triples that appear in more than one graph are returned only once.

For information about the default graph specification, see the [RDF Dataset](https://www.w3.org/TR/sparql11-query/#rdfDataset) section of the SPARQL 1.1 Query Language specification.

**Specifying the Named Graph for Loading, Inserts, or Updates**  
If you don't specify a named graph when loading, inserting, or updating triples, Neptune uses the fallback named graph defined by the URI `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`.

When you issue a Neptune `Load` request using a triple-based format, you can specify the named graph to use for all triples by using the `parserConfiguration: namedGraphUri` parameter. For information about the `Load` command syntax, see [Neptune Loader Command](load-api-reference-load.md).

**Important**  
 If you don't use this parameter, and you don't specify a named graph, the fallback URI is used: `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph`.

This fallback named graph is also used if you load triples via `SPARQL UPDATE` without explicitly providing a named graph target.
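For example, the following update (all IRIs are hypothetical) inserts a triple without naming a graph, so Neptune places it in the fallback named graph:

```
INSERT DATA { <http://example.org/node1> <http://example.org/id> "n1" }
```

A later query can then address that graph explicitly:

```
SELECT ?s ?p ?o
WHERE {
    GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> { ?s ?p ?o }
}
```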

You can use the quads-based format N-Quads to specify a named graph for each triple in the database. 

**Note**  
Using N-Quads allows you to leave the named graph blank. In this case, `http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph` is used.  
You can override the default named graph for N-Quads using the `namedGraphUri` parser configuration option.

## SPARQL XPath Constructor Functions Supported by Neptune
<a name="access-graph-sparql-xpath-constructors"></a>

The SPARQL standard allows SPARQL engines to support an extensible set of XPath constructor functions. Neptune currently supports the following constructor functions, where the `xsd` prefix is defined as `http://www.w3.org/2001/XMLSchema#`:
+ `xsd:boolean`
+ `xsd:integer`
+ `xsd:double`
+ `xsd:float`
+ `xsd:decimal`
+ `xsd:long`
+ `xsd:unsignedLong`
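For example, a constructor function can cast a value before a numeric comparison (the predicate IRI and data are hypothetical):

```
SELECT ?item
WHERE {
    ?item <http://example.org/price> ?price .
    FILTER (xsd:decimal(?price) > 10.5)
}
```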

## Default base IRI for queries and updates
<a name="opencypher-compliance-default-iri"></a>

Because a Neptune cluster has several different endpoints, using the request URL of a query or update as the base IRI could lead to unexpected results when resolving relative IRIs.

As of [engine release 1.2.1.0](engine-releases-1.2.1.0.md), Neptune uses `http://aws.amazon.com/neptune/default/` as the base IRI if an explicit base IRI is not part of the request.

In the following request, the base IRI is part of the request:

```
BASE <http://example.org/default/>
INSERT DATA { <node1> <id> "n1" }

BASE <http://example.org/default/>
SELECT * { <node1> ?p ?o }
```

And the result would be:

```
?p                                                   ?o
http://example.org/default/id                        n1
```

In this request, however, no base IRI is included:

```
INSERT DATA { <node1> <id> "n1" }

SELECT * { <node1> ?p ?o }
```

In that case, the result would be:

```
?p                                                   ?o
http://aws.amazon.com/neptune/default/id             n1
```

## xsd:dateTime Values in Neptune
<a name="access-graph-sparql-xsd-date-time"></a>

For performance reasons, Neptune always stores date/time values as Coordinated Universal Time (UTC). This makes direct comparisons very efficient.

This also means that if you enter a `dateTime` value that specifies a particular time zone, Neptune translates the value to UTC and discards that time-zone information. Then, when you retrieve the `dateTime` value later, it is expressed in UTC, not the time of the original time zone, and you can no longer tell what that original time zone was.
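This normalization can be illustrated client-side with Python's standard `datetime` module (a sketch of the behavior, not Neptune code):

```python
from datetime import datetime, timezone, timedelta

# A dateTime literal with an explicit -05:00 offset:
# "2024-03-01T09:30:00-05:00"
local = datetime(2024, 3, 1, 9, 30,
                 tzinfo=timezone(timedelta(hours=-5)))

# Neptune-style normalization: convert to UTC, then discard the offset
stored = local.astimezone(timezone.utc).replace(tzinfo=None)

print(stored.isoformat())  # 2024-03-01T14:30:00 -- the -05:00 offset is gone
```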

## Neptune Handling of Special Floating Point Values
<a name="feature-overview-special-values-comparisons"></a>

Neptune handles special floating-point values in SPARQL as follows.

### SPARQL NaN Handling in Neptune
<a name="feature-overview-NaN-comparisons"></a>

In Neptune, SPARQL can accept a value of `NaN` in a query. No distinction is made between signaling and quiet `NaN` values. Neptune treats all `NaN` values as quiet.

Semantically, no comparison of a `NaN` is possible, because nothing is greater than, less than, or equal to a `NaN`. This means that a value of `NaN` on one side of a comparison in theory never matches *anything* on the other side.

 However, the [XSD specification](https://www.w3.org/TR/xmlschema-2/#double) does treat two `xsd:double` or `xsd:float` `NaN` values as equal. Neptune follows this for the `IN` filter, for the equal operator in filter expressions, and for exact match semantics (having a `NaN` in the object position of a triple pattern).
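The difference between the IEEE comparison semantics and the XSD-style value equality that Neptune applies for `IN` filters and exact matches can be sketched in Python:

```python
import math

nan = float('nan')

# IEEE 754 semantics: NaN never compares equal, less, or greater
assert not (nan == nan)
assert not (nan < 1.0) and not (nan > 1.0)

# XSD-style value equality, as applied by Neptune for IN filters,
# the equality operator in filter expressions, and exact matches:
# two NaN values are treated as the same value
def xsd_equal(a: float, b: float) -> bool:
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b

assert xsd_equal(nan, nan)
```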

### SPARQL Infinite Value Handling in Neptune
<a name="feature-overview-infinity-comparisons"></a>

In Neptune, SPARQL can accept a value of `INF` or `-INF` in a query. `INF` compares as greater than any other numeric value, and `-INF` compares as less than any other numeric value.

Two INF values with matching signs compare as equal to each other regardless of their type (for example, a float `-INF` compares as equal to a double `-INF`).

Of course, no comparison with a `NaN` is possible because nothing is greater than, less than, or equal to a `NaN`.

### SPARQL Negative Zero Handling in Neptune
<a name="feature-overview-zero-comparisons"></a>

Neptune normalizes a negative zero value to an unsigned zero. You can use negative zero values in a query, but they aren't recorded as such in the database, and they compare as equal to unsigned zeros.
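In Python terms, the normalization amounts to replacing any zero with an unsigned `0.0` before storage (illustrative only):

```python
import struct

# -0.0 and 0.0 are distinct IEEE 754 bit patterns...
assert struct.pack('>d', -0.0) != struct.pack('>d', 0.0)
# ...but they compare as equal, which is what Neptune's normalization
# to unsigned zero guarantees for stored values
assert -0.0 == 0.0

# Sketch of the normalization step: any zero becomes +0.0 before storage
def normalize(x: float) -> float:
    return 0.0 if x == 0.0 else x

assert struct.pack('>d', normalize(-0.0)) == struct.pack('>d', 0.0)
```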

## Neptune Limitation of Arbitrary-Length Values
<a name="feature-overview-arbitrary-length-values"></a>

Neptune limits the storage size of XSD integer, floating point, and decimal values in SPARQL to 64 bits. Using larger values results in an `InvalidNumericDataException` error.
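A client-side pre-check for this limit might look like the following sketch. It assumes the 64-bit limit means the signed 64-bit range, and the exception message simply echoes Neptune's error name:

```python
# Bounds of a signed 64-bit integer
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def check_int64(value: int) -> int:
    """Reject values Neptune would refuse with InvalidNumericDataException."""
    if not (INT64_MIN <= value <= INT64_MAX):
        raise ValueError("InvalidNumericDataException: value exceeds 64 bits")
    return value

check_int64(2**63 - 1)   # largest storable value passes
# check_int64(2**63) would raise ValueError
```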

## Neptune Extends Equals Comparison in SPARQL
<a name="feature-overview-sparql-not-equal"></a>

The SPARQL standard defines a ternary logic for value expressions, in which a value expression can evaluate to `true`, `false`, or `error`. The default semantics for term equality defined in the [SPARQL 1.1 specification](https://www.w3.org/TR/sparql11-query/#func-RDFterm-equal), which apply to `=` and `!=` comparisons in `FILTER` conditions, produce an `error` when comparing data types that are not explicitly comparable in the [operators table](https://www.w3.org/TR/sparql11-query/#OperatorMapping) in the specification.

This behavior can lead to unintuitive results, as in the following example.

Data:

```
<http://example.com/Server/1> <http://example.com/ip> "127.0.0.1"^^<http://example.com/datatype/IPAddress>
```

Query 1:

```
SELECT * WHERE {
    <http://example.com/Server/1> <http://example.com/ip> ?o .
    FILTER(?o = "127.0.0.2"^^<http://example.com/datatype/IPAddress>)
}
```

Query 2:

```
SELECT * WHERE {
    <http://example.com/Server/1> <http://example.com/ip> ?o .
    FILTER(?o != "127.0.0.2"^^<http://example.com/datatype/IPAddress>)
}
```

With the default SPARQL semantics that Neptune used before release 1.0.2.1, both queries would return an empty result. The reason is that `?o = "127.0.0.2"^^<http://example.com/datatype/IPAddress>`, when evaluated for `?o := "127.0.0.1"^^<http://example.com/datatype/IPAddress>`, produces an `error` rather than `false`, because no explicit comparison rules are specified for the custom data type `<http://example.com/datatype/IPAddress>`. As a result, the negated version in the second query also produces an `error`. In both queries, the `error` causes the candidate solution to be filtered out.

Starting with release 1.0.2.1, Neptune has extended the SPARQL inequality operator in accord with the specification. See the [SPARQL 1.1 section on operator extensibility](https://www.w3.org/TR/sparql11-query/#operatorExtensibility), which allows engines to define additional rules on how to compare across user-defined and non-comparable built-in data types.

Using this extension, Neptune now treats a comparison of any two data types that is not explicitly defined in the operator-mapping table as evaluating to `true` if the literal values and data types are syntactically equal, and to `false` otherwise. An `error` is never produced in either case.

Under these extended semantics, the second query returns `"127.0.0.1"^^<http://example.com/datatype/IPAddress>` instead of an empty result.

## Handling of Out-of-Range Literals in Neptune SPARQL
<a name="feature-overview-sparql-out-of-range"></a>

XSD semantics define a bounded value space for each numeric type except `integer` and `decimal`, limiting each such type to a range of values. For example, the range of `xsd:byte` is from -128 to +127, inclusive. Any value outside of this range is considered invalid.

If you try to assign a literal value outside of the value space of a type (for example, if you try to set an `xsd:byte` to a literal value of 999), Neptune accepts the out-of-range value as-is, without rounding or truncating it. But it doesn't persist it as a numeric value because the given type can't represent it.

That is, Neptune accepts `"999"^^xsd:byte` even though it is a value outside of the defined `xsd:byte` value range. However, after the value is persisted in the database, it can only be used in exact match semantics, in an object position of a triple pattern. No range filter can be executed on it because out-of-range literals are not treated as numeric values.

The SPARQL 1.1 specification defines [range operators](https://www.w3.org/TR/sparql11-query/#OperatorMapping) in the form `numeric`*-operator-*`numeric`, `string`*-operator-*`string`, `literal`*-operator-*`literal`, and so forth. Neptune can't execute a range comparison of the form `invalid-literal`*-operator-*`numeric-value`.

# openCypher specification compliance in Amazon Neptune
<a name="feature-opencypher-compliance"></a>

The Amazon Neptune release of openCypher generally supports the clauses, operators, expressions, functions, and syntax defined by the current openCypher specification, which is the [Cypher Query Language Reference Version 9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf). Limitations and differences in Neptune support for openCypher are called out below.

 Amazon Neptune also supports several features beyond the scope of the openCypher specification. Refer to [openCypher extensions in Amazon Neptune](access-graph-opencypher-extensions.md) for details. 

**Note**  
The current Neo4j implementation of Cypher contains functionality that is not contained in the openCypher specification mentioned above. If you are migrating current Cypher code to Neptune, see [Neptune compatibility with Neo4j](migration-compatibility.md) and [Rewriting Cypher queries to run in openCypher on Neptune](migration-opencypher-rewrites.md) for more information.

## Support for openCypher clauses in Neptune
<a name="opencypher-compliance-clauses"></a>

Neptune supports the following clauses, except as noted:
+ `MATCH`   –   Supported, except that *`shortestPath()`* and *`allShortestPaths()`* are not currently supported.
+ `OPTIONAL MATCH`
+ *`MANDATORY MATCH`*   –   is **not** currently supported in Neptune. Neptune does, however, support [custom ID values](access-graph-opencypher-extensions.md#opencypher-compliance-custom-ids) in `MATCH` queries.
+ `RETURN`   –   Supported, except when used with non-static values for `SKIP` or `LIMIT`. For example, the following currently does not work:

  ```
  MATCH (n)
  RETURN n LIMIT toInteger(rand())    // Does NOT work!
  ```
+ `WITH`   –   Supported, except when used with non-static values for `SKIP` or `LIMIT`. For example, the following currently does not work:

  ```
  MATCH (n)
  WITH n SKIP toInteger(rand())
  WITH count() AS count
  RETURN count > 0 AS nonEmpty    // Does NOT work!
  ```
+ `UNWIND`
+ `WHERE`
+ `ORDER BY`
+ `SKIP`
+ `LIMIT`
+ `CREATE`   –   Neptune lets you create [custom ID values](access-graph-opencypher-extensions.md#opencypher-compliance-custom-ids) in `CREATE` queries.
+ `DELETE`
+ `SET`
+ `REMOVE`
+ `MERGE`   –   Neptune supports [custom ID values](access-graph-opencypher-extensions.md#opencypher-compliance-custom-ids) in `MERGE` queries.
+ *`CALL[YIELD...]`*   –   is **not** currently supported in Neptune.
+ `UNION, UNION ALL`   –   read-only queries are supported, but mutation queries are **not** currently supported.
+ `USING`   –   supported starting with engine release [1.3.2.0](https://docs.aws.amazon.com//neptune/latest/userguide/engine-releases-1.3.2.0.html). See [Query hints](https://docs.aws.amazon.com//neptune/latest/userguide/opencypher-query-hints.html) for more information.

## Support for openCypher operators in Neptune
<a name="opencypher-compliance-operators"></a>

Neptune supports the following operators, except as noted:

**General operators**
+ `DISTINCT`
+ The `.` operator for accessing properties of a nested literal map.

**Mathematical operators**
+ The `+` addition operator.
+ The `-` subtraction operator.
+ The `*` multiplication operator.
+ The `/` division operator.
+ The `%` modulo division operator.
+ The `^` exponentiation operator *is NOT supported*.

**Comparison operators**
+ The `=` equality operator.
+ The `<>` inequality operator.
+ The `<` less-than operator is supported except when either of the arguments is a Path, List, or Map.
+ The `>` greater-than operator is supported except when either of the arguments is a Path, List, or Map.
+ The `<=` less-than-or-equal-to operator is supported except when either of the arguments is a Path, List, or Map.
+ The `>=` greater-than-or-equal-to operator is supported except when either of the arguments is a Path, List, or Map.
+ `IS NULL`
+ `IS NOT NULL`
+ `STARTS WITH` is supported if the data being searched for is a string.
+ `ENDS WITH` is supported if the data being searched for is a string.
+ `CONTAINS` is supported if the data being searched for is a string.

**Boolean operators**
+ `AND`
+ `OR`
+ `XOR`
+ `NOT`

**String operators**
+ The `+` concatenation operator.

**List operators**
+ The `+` concatenation operator.
+ `IN` (checks for the presence of an item in the list)

## Support for openCypher expressions in Neptune
<a name="opencypher-compliance-expressions"></a>

Neptune supports the following expressions, except as noted:
+ `CASE`
+ The `[]` expression is **not** currently supported in Neptune for accessing dynamically computed property keys within a node, relationship, or map. For example, the following does not work:

  ```
  MATCH (n)
  WITH [5, n, {key: 'value'}] AS list
  RETURN list[1].name
  ```

## Support for openCypher functions in Neptune
<a name="opencypher-compliance-functions"></a>

Neptune supports the following functions, except as noted:

**Predicate functions**
+ `exists()`

**Scalar functions**
+ `coalesce()`
+ `endNode()`
+ `epochmillis()`
+ `head()`
+ `id()`
+ `last()`
+ `length()`
+ `randomUUID()`
+ `properties()`
+ `removeKeyFromMap`
+ `size()`   –   this overloaded method currently only works for pattern expressions, lists, and strings
+ `startNode()`
+ `timestamp()`
+ `toBoolean()`
+ `toFloat()`
+ `toInteger()`
+ `type()`

**Aggregating functions**
+ `avg()`
+ `collect()`
+ `count()`
+ `max()`
+ `min()`
+ `percentileDisc()`
+ `stDev()`
+ `percentileCont()`
+ `stDevP()`
+ `sum()`

**List functions**
+ [`join()`](access-graph-opencypher-extensions.md#opencypher-compliance-join-function) (concatenates strings in a list into a single string)
+ `keys()`
+ `labels()`
+ `nodes()`
+ `range()`
+ `relationships()`
+ `reverse()`
+ `tail()`

**Mathematical functions – numeric**
+ `abs()`
+ `ceil()`
+ `floor()`
+ `rand()`
+ `round()`
+ `sign()`

**Mathematical functions – logarithmic**
+ `e()`
+ `exp()`
+ `log()`
+ `log10()`
+ `sqrt()`

**Mathematical functions – trigonometric**
+ `acos()`
+ `asin()`
+ `atan()`
+ `atan2()`
+ `cos()`
+ `cot()`
+ `degrees()`
+ `pi()`
+ `radians()`
+ `sin()`
+ `tan()`

**String functions**
+ [`join()`](access-graph-opencypher-extensions.md#opencypher-compliance-join-function) (concatenates strings in a list into a single string)
+ `left()`
+ `lTrim()`
+ `replace()`
+ `reverse()`
+ `right()`
+ `rTrim()`
+ `split()`
+ `substring()`
+ `toLower()`
+ `toString()`
+ `toUpper()`
+ `trim()`

**User-defined functions**

*User-defined functions* are **not** currently supported in Neptune.

## Neptune-specific openCypher implementation details
<a name="opencypher-compliance-differences"></a>

The following sections describe ways in which the Neptune implementation of openCypher may differ from or go beyond the [openCypher spec](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf).

### Variable-length path (VLP) evaluations in Neptune
<a name="opencypher-compliance-differences-vlp"></a>

Variable length path (`VLP`) evaluations discover paths between nodes in the graph. Path length can be unrestricted in a query. To prevent cycles, the [openCypher spec](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf) specifies that each edge must be traversed at most once per solution.

For VLPs, the Neptune implementation deviates from the openCypher spec in that it only supports constant values for property equality filters. Take the following query:

```
MATCH (x)-[:route*1..2 {dist:33, code:x.name}]->(y) return x,y
```

Because the `x.name` property equality filter value is not a constant, this query results in an `UnsupportedOperationException` with the message: `Property predicate over variable-length relationships with non-constant expression is not supported in this release.`

### Temporal support in the Neptune openCypher implementation (Neptune database 1.3.1.0 and below)
<a name="opencypher-compliance-time"></a>

Neptune currently provides limited support for temporal functions in openCypher, with `DateTime` as its only supported temporal data type.

The `datetime()` function can be used to get the current UTC date and time like this:

```
RETURN  datetime() as res
```

Date and time values can be parsed from strings in a `"`*date*`T`*time*`"` format where *date* and *time* are both expressed in one of the supported forms below:

**Supported date formats**
+ `yyyy-MM-dd`
+ `yyyyMMdd`
+ `yyyy-MM`
+ `yyyy-DDD`
+ `yyyyDDD`
+ `yyyy`

**Supported time formats**
+ `HH:mm:ssZ`
+ `HHmmssZ`
+ `HH:mmZ`
+ `HHmmZ`
+ `HHZ`
+ `HHmmss`
+ `HH:mm:ss`
+ `HH:mm`
+ `HHmm`
+ `HH`

For example:

```
RETURN datetime('2022-01-01T00:01')      // or another example:
RETURN datetime('2022T0001')
```

Note that all date/time values in Neptune openCypher are stored and retrieved as UTC values.

Neptune openCypher uses a `statement` clock, meaning that the same instant in time is used throughout the duration of a query. A different query within the same transaction may use a different instant in time.

Neptune doesn't support use of a function within a call to `datetime()`. For example, the following won't work:

```
CREATE (:n {date:datetime(tostring(2021))})  // ---> NOT ALLOWED!
```

Neptune does support the `epochmillis()` function, which converts a `datetime` value to milliseconds since the Unix epoch. For example:

```
MATCH (n) RETURN epochMillis(n.someDateTime)

// example result: 1698972364782
```

Neptune doesn't currently support other functions and operations on `DateTime` objects, such as addition and subtraction.
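The `epochmillis()` conversion is straightforward to mirror client-side. This is a Python sketch of the equivalent computation, not Neptune code:

```python
from datetime import datetime, timezone

# Express a UTC datetime as milliseconds since the Unix epoch,
# mirroring what Neptune's epochmillis() returns for a stored value.
def epoch_millis(dt: datetime) -> int:
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

print(epoch_millis(datetime(2024, 1, 1)))  # 1704067200000
```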

### Temporal support in the Neptune openCypher implementation (Neptune Analytics and Neptune Database 1.3.2.0 and above)
<a name="opencypher-compliance-time-na"></a>

The following openCypher datetime functionality applies to Neptune Analytics. Alternatively, you can use the lab mode parameter `DatetimeMillisecond=enabled` to enable the same functionality on Neptune engine release 1.3.2.0 and above. For more details about using this functionality in lab mode, see [Extended datetime support](features-lab-mode.md#labmode-extended-datetime-support).
+ Support for milliseconds. Datetime literals are always returned with milliseconds, even when the millisecond value is 0. (The previous behavior was to truncate milliseconds.)

  ```
  CREATE (:event {time: datetime('2024-04-01T23:59:59Z')})
  
  # Returning the date returns with 000 suffixed representing milliseconds
  MATCH(n:event)
  RETURN n.time as datetime
  
  {
    "results" : [ {
      "n" : {
        "~id" : "0fe88f7f-a9d9-470a-bbf2-fd6dd5bf1a7d",
        "~entityType" : "node",
        "~labels" : [ "event" ],
        "~properties" : {
          "time" : "2024-04-01T23:59:59.000Z"
        }
      }
    } ]
  }
  ```
+ Support for calling the datetime() function over stored properties or intermediate results. For example, the following queries were not possible prior to this feature.

  Datetime() over properties:

  ```
  // Create node with property 'time' stored as string
  CREATE (:event {time: '2024-04-01T23:59:59Z'})
  
  // Match and return this property as datetime
  MATCH(n:event)
  RETURN datetime(n.time) as datetime
  ```

  Datetime() over intermediate results:

  ```
  // Parse datetime from parameter
  UNWIND $list as myDate
  RETURN datetime(myDate) as d
  ```
+ It is now also possible to save datetime properties that are created in the cases mentioned above.

  Saving a datetime value parsed from a string property, either back to the same node or to another node's property:

  ```
  // Create node with property 'time' stored as string
  CREATE (:event {time: '2024-04-01T23:59:59Z', name: 'crash'})
  
  // Match and update the same property to datetime type
  MATCH(n:event {name: 'crash'})
  SET n.time = datetime(n.time)
  
  // Match and update another node's property
  MATCH(e:event {name: 'crash'})
  MATCH(n:server {name: e.servername})
  SET n.time = datetime(e.time)
  ```

  Batch create nodes from a parameter with a datetime property:

  ```
  // Batch create from parameter
  UNWIND $list as events
  CREATE (n:crash {time: datetime(events.time)})
  // Parameter value
  {
    "list":[
      {"time":"2024-01-01T23:59:29", "name":"crash1"},
      {"time":"2023-01-01T00:00:00Z", "name":"crash2"}
    ]
  }
  ```
+ Support for a larger subset of ISO8601 datetime formats. See below.

**Supported formats**

The format of a datetime value is `[Date]T[Time][Timezone]`, where `T` is the separator. If an explicit timezone is not provided, UTC (`Z`) is assumed as the default.

**Timezone**

Supported timezone formats are:
+ `+/-HH:mm`
+ `+/-HHmm`
+ `+/-HH`

The presence of a timezone in a datetime string is optional. If the timezone offset is 0, `Z` can be used instead of the timezone postfix above to indicate UTC time. The supported range of a timezone offset is from -14:00 to +14:00.

**Date**

If no timezone is present, or the timezone is UTC (Z), the supported date formats are as follows:

**Note**  
DDD refers to an ordinal date, which represents a day of the year from 001 to 365 (366 in leap years). For example, 2024-002 represents Jan 2, 2024.
+ `yyyy-MM-dd`
+ `yyyyMMdd`
+ `yyyy-MM`
+ `yyyyMM`
+ `yyyy-DDD`
+ `yyyyDDD`
+ `yyyy`

If a timezone other than Z is chosen, the supported date formats are limited to the following:
+ `yyyy-MM-dd`
+ `yyyy-DDD`
+ `yyyyDDD`

The supported range for dates is from 1400-01-01 to 9999-12-31.
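The ordinal-date formats above can be checked against Python's `%j` directive, which parses the same day-of-year form (an illustrative sketch, not Neptune code):

```python
from datetime import datetime

# 2024-002 in yyyy-DDD ordinal form is January 2, 2024
d = datetime.strptime('2024-002', '%Y-%j')
print(d.date().isoformat())  # 2024-01-02
```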

**Time**

If no timezone is present, or the timezone is UTC (Z), the supported time formats are:
+ `HH:mm:ss.SSS`
+ `HH:mm:ss`
+ `HHmmss.SSS`
+ `HHmmss`
+ `HH:mm`
+ `HHmm`
+ `HH`

If a timezone other than Z is chosen, the supported time formats are limited to the following:
+ `HH:mm:ss`
+ `HH:mm:ss.SSS`

### Differences in Neptune openCypher language semantics
<a name="opencypher-compliance-semantics"></a>

Neptune represents node and relationship IDs as strings rather than integers. The ID equals the ID supplied via the data loader or, if the column has a namespace, the namespace plus the ID. Consequently, the `id` function returns a string instead of an integer.

The `INTEGER` datatype is limited to 64 bits. When converting larger floating point or string values to an integer using the `TOINTEGER` function, negative values are truncated to `LLONG_MIN` and positive values are truncated to `LLONG_MAX`.

For example:

```
RETURN TOINTEGER(2^100)
>  9223372036854775807

RETURN TOINTEGER(-1 * 2^100)
>  -9223372036854775808
```
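The truncation rule amounts to clamping at the signed 64-bit extremes (`LLONG_MIN` / `LLONG_MAX`), as this Python sketch of the behavior shows:

```python
# Clamp an out-of-range conversion result to the signed 64-bit extremes,
# mirroring the TOINTEGER truncation behavior described above.
LLONG_MIN, LLONG_MAX = -2**63, 2**63 - 1

def to_integer(x: float) -> int:
    if x >= LLONG_MAX:
        return LLONG_MAX
    if x <= LLONG_MIN:
        return LLONG_MIN
    return int(x)

assert to_integer(2**100) == 9223372036854775807      # clamps to LLONG_MAX
assert to_integer(-(2**100)) == -9223372036854775808  # clamps to LLONG_MIN
```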

### Multi-valued properties
<a name="openCypher-compliance-mvp"></a>

 Although openCypher CREATE does not create multi-valued properties, they can exist in data created using Gremlin (Neptune Database) or when loading data (Neptune Database and Neptune Analytics). If Neptune openCypher encounters a multi-value property, one of the values is arbitrarily chosen, creating a non-deterministic result. 

### Handling of NaN values
<a name="openCypher-compliance-handling-nan"></a>

 Neptune’s handling of the comparison of `NaN` property values is undefined. Relying on such comparisons may result in unexpected or non-deterministic results. 

# Neptune Graph Data Model
<a name="feature-overview-data-model"></a>

The basic unit of Amazon Neptune graph data is a four-position (quad) element, which is similar to a Resource Description Framework (RDF) quad. The following are the four positions of a Neptune quad:
+ `subject    (S)`
+ `predicate  (P)`
+ `object     (O)`
+ `graph      (G)`

Each quad is a statement that makes an assertion about one or more resources. A statement can assert the existence of a relationship between two resources, or it can attach a property (key-value pair) to a resource. You can think of the quad predicate value generally as the verb of the statement. It describes the type of relationship or property that's being defined. The object is the target of the relationship, or the value of the property. The following are examples:
+ A relationship between two vertices can be represented by storing the source vertex identifier in the `S` position, the target vertex identifier in the `O` position, and the edge label in the `P` position.
+ A property can be represented by storing the element identifier in the `S` position, the property key in the `P` position, and the property value in the `O` position.

The graph position `G` is used differently in the different stacks. For RDF data in Neptune, the `G` position contains a [named graph identifier](https://www.w3.org/TR/rdf11-concepts/#section-dataset). For property graphs in Gremlin, it is used to store the edge ID value in the case of an edge. In all other cases, it defaults to a fixed value.

A set of quad statements with shared resource identifiers creates a graph.

# Dictionary of user-facing values
<a name="feature-overview-storage-dictionary"></a>

Neptune does not store most user-facing values directly in the various indexes it maintains. Instead, it stores them separately in a dictionary and replaces them in the indexes with 8-byte identifiers.
+ All user-facing values that would go in `S`, `P`, or `G` indexes are stored in the dictionary in this way.
+ In the `O` index, numeric values are stored directly in the index (inlined). This includes `date` and `datetime` values (represented as milliseconds from the epoch).
+ All other user-facing values that would go in the `O` index are stored in the dictionary and represented in the index by IDs.

The dictionary contains a forward mapping of user-facing values to 8-byte IDs in a `value_to_id` index.

It stores the reverse mapping of 8-byte IDs to values in one of two indexes, depending on the size of the values:
+ An `id_to_value` index maps IDs to user-facing values that are smaller than 767 bytes after internal encoding.
+ An `id_to_blob` index maps IDs to user-facing values that are larger.
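The layout can be sketched as a toy interning table. The 767-byte threshold comes from the text above; the function and variable names are purely illustrative:

```python
# Toy model of the dictionary: forward mapping value -> ID, and two
# reverse indexes chosen by the encoded size of the value.
SMALL_LIMIT = 767

value_to_id = {}
id_to_value = {}   # values encoded under 767 bytes
id_to_blob = {}    # larger values
next_id = 0

def intern(value: str) -> int:
    global next_id
    if value in value_to_id:          # already interned: reuse the ID
        return value_to_id[value]
    vid = next_id
    next_id += 1
    value_to_id[value] = vid
    encoded = value.encode('utf-8')
    target = id_to_value if len(encoded) < SMALL_LIMIT else id_to_blob
    target[vid] = value
    return vid

a = intern("http://example.org/knows")
assert intern("http://example.org/knows") == a   # interning is idempotent
assert a in id_to_value                          # short value: id_to_value
assert intern("x" * 1000) in id_to_blob          # large value: id_to_blob
```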

# How Statements Are Indexed in Neptune
<a name="feature-overview-storage-indexing"></a>

When you query a graph of quads, for each quad position, you can either specify a value constraint, or not. The query returns all the quads that match the value constraints that you specified.

Neptune uses indexes to resolve graph query patterns. These indexes cover the four primary components of a graph edge: Subject (source vertex in LPG); Predicate (RDF), or Property or Edge Label (LPG); Object (target vertex or property value in LPG); and Graph (RDF) or Edge Identifier (LPG). There are 16 (2^4) possible access patterns for these four quad positions. You can query all 16 patterns efficiently, without scanning and filtering, by using six indexes. Each quad statement index uses a key composed of the four position values concatenated in a different order. One possible combination of quad statement indexes that covers all 16 access paths is:

```
       Access Pattern                                     Index key order
  ----------------------------------------------------    ---------------
   1.  ????  (No constraints; returns every quad)             SPOG
   2.  SPOG  (Every position is constrained)                  SPOG
   3.  SPO?  (S, P, and O are constrained; G is not)          SPOG
   4.  SP??  (S and P are constrained; O and G are not)       SPOG
   5.  S???  (S is constrained; P, O, and G are not)          SPOG
   6.  S??G  (S and G are constrained; P and O are not)       SPOG

   7.  ?POG  (P, O, and G are constrained; S is not)          POGS
   8.  ?PO?  (P and O are constrained; S and G are not)       POGS
   9.  ?P??  (P is constrained; S, O, and G are not)          POGS

  10.  ?P?G  (P and G are constrained; S and O are not)       GPSO
  11.  SP?G  (S, P, and G are constrained; O is not)          GPSO
  12.  ???G  (G is constrained; S, P, and O are not)          GPSO

  13.  S?OG  (S, O, and G are constrained; P is not)          OGSP
  14.  ??OG  (O and G are constrained; S and P are not)       OGSP
  15.  ??O?  (O is constrained; S, P, and G are not)          OGSP

  16.  S?O?  (S and O are constrained; P and G are not)       OSGP
```

Neptune creates and maintains only three out of those six indexes by default:
+ `SPOG –  ` Uses a key composed of `Subject + Predicate + Object + Graph`.
+ `POGS –  ` Uses a key composed of `Predicate + Object + Graph + Subject`.
+ `GPSO –  ` Uses a key composed of `Graph + Predicate + Subject + Object`.

These three indexes handle many of the most common access patterns. Maintaining only three full statement indexes instead of six greatly reduces the resources that you need to support rapid access without scanning and filtering. For example, the `SPOG` index allows efficient lookup whenever a prefix of the positions, such as the vertex or vertex and property identifier, is bound. The `POGS` index allows efficient access when only the edge or property label stored in `P` position is bound.

The low-level API for finding statements takes a statement pattern in which some positions are known and the rest are left for discovery by index search. By composing the known positions into a key prefix according to the index key order for one of the statement indexes, Neptune performs a range scan to retrieve all the statements matching the known positions.
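The prefix-based index selection can be sketched as follows. This toy chooser just picks, among the key orders listed in the table above, the one whose key starts with the longest run of bound positions:

```python
# Key orders from the access-pattern table above
INDEXES = ["SPOG", "POGS", "GPSO", "OGSP", "OSGP"]

def choose_index(bound: set) -> str:
    """Pick the index whose key order has the longest bound prefix."""
    best, best_len = None, -1
    for idx in INDEXES:
        n = 0
        while n < 4 and idx[n] in bound:   # length of the bound prefix
            n += 1
        if n > best_len:
            best, best_len = idx, n
    return best

assert choose_index({'S', 'P'}) == "SPOG"   # SP?? scans the SPOG index
assert choose_index({'P', 'O'}) == "POGS"   # ?PO? scans the POGS index
assert choose_index({'G'}) == "GPSO"        # ???G scans the GPSO index
```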

However, one of the statement indexes that Neptune does *not* create by default is a reverse traversal `OSGP` index, which can gather predicates across objects and subjects. Instead, Neptune by default tracks distinct predicates in a separate index that it uses to do a union scan of `{all P x POGS}`. When you are working with Gremlin, a predicate corresponds to a property or an edge label.

If the number of distinct predicates in a graph becomes large, the default Neptune access strategy can become inefficient. In Gremlin, for example, an `in()` step where no edge labels are given, or any step that uses `in()` internally such as `both()` or `drop()`, may become quite inefficient.

## Enabling OSGP Index Creation Using Lab Mode
<a name="feature-overview-storage-indexing-osgp"></a>

If your data model creates a large number of distinct predicates, you may experience reduced performance and higher operational costs that can be dramatically improved by using Lab Mode to enable the [OSGP index](features-lab-mode.md#features-lab-mode-features-osgp-index) in addition to the three indexes that Neptune maintains by default.

Enabling the OSGP index can have a few down-sides:
+ The insert rate may slow by up to 23%.
+ Storage increases by up to 20%.
+ Read queries that touch all indexes equally (which is quite rare) may have increased latencies.

In general, however, it is worth enabling the OSGP index for DB Clusters with a large number of distinct predicates. Object-based searches become highly efficient (for example, finding all incoming edges to a vertex, or all subjects connected to a given object), and as a result dropping vertices becomes much more efficient too.

**Important**  
You can only enable the OSGP index in an empty DB cluster, before you load any data into it.

   

## Gremlin statements in the Neptune data model
<a name="feature-overview-storage-indexing-gremlin"></a>

Gremlin property-graph data is expressed in the SPOG model using three classes of statements, namely:
+ [Vertex Label Statements](gremlin-explain-background-statements.md#gremlin-explain-background-vertex-labels)
+ [Edge Statements](gremlin-explain-background-statements.md#gremlin-explain-background-edge-statements) 
+ [Property Statements](gremlin-explain-background-statements.md#gremlin-explain-background-property-statements) 

For an explanation of how these are used in Gremlin queries, see [Understanding how Gremlin queries work in Neptune](gremlin-explain-background.md).

# The Neptune lookup cache can accelerate read queries
<a name="feature-overview-lookup-cache"></a>

Amazon Neptune implements a lookup cache that uses the `R5d` instance's NVMe-based SSD to improve read performance for queries with frequent, repetitive lookups of property values or RDF literals. The lookup cache temporarily stores these values in the NVMe SSD volume where they can be accessed rapidly.

Read queries that return the properties of a large number of vertices and edges, or many RDF triples, can have a high latency if the property values or literals need to be retrieved from cluster storage volumes rather than memory. Examples include long-running read queries that return a large number of full names from an identity graph, or of IP addresses from a fraud-detection graph. As the number of property values or RDF literals returned by your query increases, available memory decreases and your query execution can significantly degrade.

# Use cases for the Neptune lookup cache
<a name="feature-overview-lookup-cache-when-to-use"></a>

The lookup cache only helps when your read queries are returning the properties of a very large number of vertices and edges, or of RDF triples.

To optimize query performance, Amazon Neptune uses the `R5d` instance type to create a large cache for such property values or literals. Retrieving them from the cache is then much faster than retrieving them from cluster storage volumes.

As a rule of thumb, it's only worthwhile to enable the lookup cache if all three of the following conditions are met:
+ You have been observing increased latency in read queries.
+ You're also observing a drop in the `BufferCacheHitRatio` [CloudWatch metric](cw-metrics.md#cw-metrics-available) when running read queries (see [Monitoring Neptune Using Amazon CloudWatch](cloudwatch.md)).
+ Your read queries are spending a lot of time in materializing return values prior to rendering the results (see the Gremlin-profile example below for a way to determine how many property values are being materialized for a query).

**Note**  
This feature is helpful *only* in the specific scenario described above. For example, the lookup cache doesn't help aggregation queries at all. Unless you are running queries that would benefit from the lookup cache, there is no reason to use an `R5d` instance type instead of an equivalent and less expensive `R5` instance type.

If you're using Gremlin, you can assess the materialization costs of a query with the [Gremlin `profile` API](gremlin-profile-api.md). Under "Index Operations", it shows the number of terms materialized during execution:

```
Index Operations
Query execution:
    # of statement index ops: 3
    # of unique statement index ops: 3
    Duplication ratio: 1.0
    # of terms materialized: 5273
Serialization:
    # of statement index ops: 200
    # of unique statement index ops: 140
    Duplication ratio: 1.43
    # of terms materialized: 32693
```

The number of non-numerical terms that are materialized is directly proportional to the number of term look-ups that Neptune has to perform.
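To put a number on this, you can sum the `# of terms materialized` counters from the profile report programmatically. The sketch below is illustrative; it simply parses the text format shown above (the helper name and sample values are not part of any Neptune API):

```python
import re

def materialized_terms(profile_text: str) -> int:
    """Sum every '# of terms materialized' line in a Gremlin profile report."""
    counts = re.findall(r"# of terms materialized:\s*(\d+)", profile_text)
    return sum(int(c) for c in counts)

sample = """\
Index Operations
Query execution:
    # of terms materialized: 5273
Serialization:
    # of terms materialized: 32693
"""
# A large total suggests the query could benefit from the lookup cache.
print(materialized_terms(sample))  # 37966
```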

# Using the lookup cache
<a name="feature-overview-lookup-cache-using"></a>

The lookup cache is only available on an `R5d` instance type, where it is automatically enabled by default. Neptune `R5d` instances have the same specifications as `R5` instances, plus up to 1.8 TB of local NVMe-based SSD storage. Lookup caches are instance-specific, and workloads that benefit can be directed specifically to `R5d` instances in a Neptune cluster, while other workloads can be directed to `R5` or other instance types.

To use the lookup cache on a Neptune instance, simply upgrade that instance to the `R5d` instance type. When you do, Neptune automatically sets the [neptune_lookup_cache](parameters.md#parameters-db-cluster-parameters-neptune_lookup_cache) DB cluster parameter to `1` (enabled), and creates the lookup cache on that particular instance. You can then use the [Instance Status](access-graph-status.md) API to confirm that the cache has been enabled.

Similarly, to disable the lookup cache on a given instance, scale the instance down from an `R5d` instance type to an equivalent `R5` instance type.

When an `R5d` instance is launched, the lookup cache is enabled and in cold-start mode, meaning that it is empty. While processing queries, Neptune first checks the lookup cache for property values or RDF literals, and adds them if they are not yet present. This gradually warms up the cache.

When you direct the read queries that require property-value or RDF-literal lookups to an `R5d` *reader* instance, read performance degrades slightly while its cache is warming up. Once the cache is warm, however, read performance speeds up significantly, and you may also see a drop in I/O costs because lookups hit the cache rather than cluster storage. Memory utilization also improves.

If your *writer* instance is an `R5d`, it warms up its lookup cache automatically on every write operation. This approach does increase latency for write queries slightly, but warms up the lookup cache more efficiently. Then if you direct the read queries that require property-value or RDF-literal lookups to the writer instance, you start getting improved read performance immediately, since the values have already been cached there.

Also, if you are running the bulk loader on an `R5d` writer instance, you may notice that its performance is slightly degraded because of the cache.

Because the lookup cache is specific to each node, host replacement resets the cache to a cold start.

You can temporarily disable the lookup cache on all instances in your DB cluster by setting the [neptune_lookup_cache](parameters.md#parameters-db-cluster-parameters-neptune_lookup_cache) DB cluster parameter to `0` (disabled). In general, however, it makes more sense to disable the cache on specific instances by scaling them down from `R5d` to `R5` instance types.

# Transaction Semantics in Neptune
<a name="transactions"></a>

Amazon Neptune is designed to support highly concurrent online transactional processing (OLTP) workloads over data graphs. The [W3C SPARQL Query Language for RDF](https://www.w3.org/TR/rdf-sparql-query/) specification and the [Apache TinkerPop Gremlin Graph Traversal Language](http://tinkerpop.apache.org/gremlin.html) documentation do not define transaction semantics for concurrent query processing. Because ACID support and well-defined transaction guarantees can be very important, we enforce strict semantics to help avoid data anomalies.

This section defines these semantics and illustrates how they apply to some common use cases in Neptune.

**Topics**
+ [Definition of Isolation Levels](transactions-isolation-levels.md)
+ [Transaction Isolation Levels in Neptune](transactions-neptune.md)
+ [Examples of Neptune transaction semantics](transactions-examples.md)

# Definition of Isolation Levels
<a name="transactions-isolation-levels"></a>

The "I" in `ACID` stands for *isolation*. The isolation level of a transaction determines how much or how little concurrent transactions can affect the data that it operates on.

The [SQL:1992 Standard](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) created a vocabulary for describing isolation levels. It defines three types of interactions (that it calls *phenomena*) that can occur between two concurrent transactions, `Tx1` and `Tx2`:
+ `Dirty read` – This occurs when `Tx1` modifies an item, and then `Tx2` reads that item before `Tx1` has committed the change. Then, if `Tx1` never succeeds in committing the change, or rolls it back, `Tx2` has read a value that never made it into the database.
+ `Non-repeatable read` – This happens when `Tx1` reads an item, then `Tx2` modifies or deletes that item and commits the change, and then `Tx1` tries to reread the item. `Tx1` now reads a different value than before, or finds that the item no longer exists.
+ `Phantom read` – This happens when `Tx1` reads a set of items that satisfy a search criterion, and then `Tx2` adds a new item that satisfies the search criterion, and then `Tx1` repeats the search. `Tx1` now obtains a different set of items than it did before.

Each of these three types of interaction can cause inconsistencies in the resulting data in a database.

The SQL:1992 standard defined four isolation levels that have different guarantees in terms of the three types of interaction and the inconsistencies that they can produce. At all four levels, a transaction can be guaranteed to execute completely or not at all:
+ `READ UNCOMMITTED` – Allows all three kinds of interaction (that is, dirty reads, non-repeatable reads, and phantom reads).
+ `READ COMMITTED` – Dirty reads are not possible, but nonrepeatable and phantom reads are.
+ `REPEATABLE READ` – Neither dirty reads nor nonrepeatable reads are possible, but phantom reads still are.
+ `SERIALIZABLE` – None of the three types of interaction phenomena can occur.

Multiversion concurrency control (MVCC) allows one other kind of isolation, namely *SNAPSHOT* isolation. This guarantees that a transaction operates on a snapshot of data as it exists when the transaction begins, and that no other transaction can change that snapshot.
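The key idea of snapshot isolation under MVCC can be illustrated with a toy versioned store (purely illustrative, not Neptune's implementation): each committed write produces a new version, and a reader stays pinned to the version that was current when it began.

```python
# Toy MVCC store: readers pinned to a snapshot never see later writes.
class MVCCStore:
    def __init__(self):
        self.version = 0
        self.versions = {0: {}}  # version number -> immutable key/value state

    def begin_read(self):
        # A read-only transaction remembers the version at which it began.
        return self.version

    def read(self, snapshot, key):
        return self.versions[snapshot].get(key)

    def write(self, key, value):
        # Each committed write creates a new version; old versions are kept.
        self.version += 1
        state = dict(self.versions[self.version - 1])
        state[key] = value
        self.versions[self.version] = state

store = MVCCStore()
store.write("x", 1)
snap = store.begin_read()   # read-only transaction begins here
store.write("x", 2)         # a concurrent transaction commits a change
print(store.read(snap, "x"))           # 1: the snapshot is unaffected
print(store.read(store.version, "x"))  # 2: a new transaction sees the update
```

Because reads go to an immutable snapshot, no locks are needed and writers are never blocked by readers, which is why Neptune's read-only queries don't block mutation queries.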

# Transaction Isolation Levels in Neptune
<a name="transactions-neptune"></a>

Amazon Neptune implements different transaction isolation levels for read-only queries and for mutation queries. SPARQL and Gremlin queries are classified as read-only or mutation based on the following criteria:
+ In SPARQL, there is a clear distinction between read queries (`SELECT`, `ASK`, `CONSTRUCT`, and `DESCRIBE` as defined in the [SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) specification), and mutation queries (`INSERT` and `DELETE` as defined in the [SPARQL 1.1 Update](https://www.w3.org/TR/sparql11-update/) specification).

  Note that Neptune treats multiple mutation queries submitted together (for example, in a `POST` message, separated by semicolons) as a single transaction. They are guaranteed either to succeed or fail as an atomic unit, and in the case of failure, partial changes are rolled back.
+ In Gremlin, by contrast, Neptune classifies a query as a read-only query or a mutation query based on whether it contains any query-path steps that manipulate data, such as `addE()`, `addV()`, `property()`, or `drop()`. If the query contains any such path step, it is classified and executed as a mutation query.

It is also possible to use standing sessions in Gremlin. For more information, see [Gremlin script-based sessions](access-graph-gremlin-sessions.md). In these sessions, all queries, including read-only queries, are executed under the same isolation as mutation queries on the writer endpoint.

When you use Bolt read-write sessions in openCypher, all queries, including read-only queries, are executed on the writer endpoint under the same isolation as mutation queries.

**Topics**
+ [Read-only query isolation in Neptune](#transactions-neptune-read-only)
+ [Mutation query isolation in Neptune](#transactions-neptune-mutation)
+ [Conflict Resolution Using Lock-Wait Timeouts](#transactions-neptune-conflicts)
+ [Range locks and false conflicts](#transactions-neptune-false-conflicts)

## Read-only query isolation in Neptune
<a name="transactions-neptune-read-only"></a>

Neptune evaluates read-only queries under snapshot isolation semantics. This means that a read-only query logically operates on a consistent snapshot of the database taken when query evaluation begins. Neptune can then guarantee that none of the following phenomena will happen:
+ `Dirty reads` – Read-only queries in Neptune will never see uncommitted data from a concurrent transaction.
+ `Non-repeatable reads` – A read-only transaction that reads the same data more than once will always get back the same values.
+ `Phantom reads` – A read-only transaction will never read data that was added after the transaction began.

Because snapshot isolation is achieved using multiversion concurrency control (MVCC), read-only queries have no need to lock data and therefore do not block mutation queries.

Read replicas only accept read-only queries, so all queries against read replicas execute under `SNAPSHOT` isolation semantics.

The only additional consideration when querying a read replica is that there can be a small replication lag between the writer and the read replicas. This means that an update made on the writer might take a short time to propagate to the read replica you are reading from. The actual replication time depends on the write load against the primary instance. Neptune's architecture supports low-latency replication, and the replication lag is reported in an Amazon CloudWatch metric.

Still, because of the `SNAPSHOT` isolation level, read queries always see a consistent state of the database, even if it is not the most recent one.

In cases where you require a strong guarantee that a query observes the result of a previous update, send the query to the writer endpoint itself rather than to a read replica.

## Mutation query isolation in Neptune
<a name="transactions-neptune-mutation"></a>

Reads made as part of mutation queries are executed under `READ COMMITTED` transaction isolation, which rules out the possibility of dirty reads. Going beyond the usual guarantees provided for `READ COMMITTED` transaction isolation, Neptune provides the strong guarantee that neither `NON-REPEATABLE` nor `PHANTOM` reads can happen.

These strong guarantees are achieved by locking records and ranges of records when reading data. This prevents concurrent transactions from making insertions or deletions in index ranges after they have been read, thus guaranteeing repeatable reads.

**Note**  
However, a concurrent mutation transaction `Tx2` could begin after the start of mutation transaction `Tx1`, and could commit a change before `Tx1` had locked data to read it. In that case, `Tx1` would see `Tx2`'s change just as if `Tx2` had completed before `Tx1` started. Because this only applies to committed changes, a `dirty read` could never occur.

To understand the locking mechanism that Neptune uses for mutation queries, it helps first to understand the details of the Neptune [Graph Data Model](feature-overview-data-model.md) and [Indexing Strategy](feature-overview-storage-indexing.md). Neptune manages data using three indexes, namely `SPOG`, `POGS`, and `GPSO`.

To achieve repeatable reads for the `READ COMMITTED` transaction level, Neptune takes range locks in the index that is being used. For example, if a mutation query reads all properties and outgoing edges of a vertex named `person1`, the node would lock the entire range defined by the prefix `S=person1` in the `SPOG` index before reading the data.

The same mechanism applies when using other indexes. For example, when a mutation transaction looks up all the source-target vertex pairs for a given edge label using the `POGS` index, the range for the edge label in the `P` position would be locked. Any concurrent transaction, regardless of whether it was a read-only or mutation query, could still perform reads within the locked range. However, any mutation involving insertion or deletion of new records in the locked prefix range would require an exclusive lock and would be prevented.

In other words, when a range of the index has been read by a mutation transaction, there is a strong guarantee that this range will not be modified by any concurrent transactions until the end of the reading transaction. This guarantees that no `non-repeatable reads` will occur.
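The prefix-based range locking described above can be sketched as a simple lock table (illustrative only, not Neptune internals; all names are made up): a mutation transaction registers the index prefix it has read, and any other transaction's insert or delete whose key falls under that prefix is blocked.

```python
# Illustrative prefix range-lock table for an index such as SPOG.
class PrefixLockTable:
    def __init__(self):
        self.locked_prefixes = []  # list of (owning txn id, prefix tuple)

    def lock_range(self, txn_id, prefix):
        """Record that txn_id has read (and locked) an index prefix range."""
        self.locked_prefixes.append((txn_id, tuple(prefix)))

    def can_insert(self, txn_id, key):
        """An insert is blocked if its key falls under another txn's locked prefix."""
        for owner, prefix in self.locked_prefixes:
            if owner != txn_id and tuple(key[:len(prefix)]) == prefix:
                return False
        return True

locks = PrefixLockTable()
# Tx1 reads all properties and edges of person1: lock prefix S=person1 in SPOG.
locks.lock_range("tx1", ("person1",))
# Tx2 inserting a new edge for person1 falls inside the locked range: blocked.
print(locks.can_insert("tx2", ("person1", "knows", "person9", "edge7")))  # False
# Tx2 inserting for a different subject is unaffected.
print(locks.can_insert("tx2", ("person2", "age", 30, "default_graph")))   # True
```

Note that, as in Neptune, concurrent *reads* within the locked range would still be allowed; only inserts and deletes in the range are excluded.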

## Conflict Resolution Using Lock-Wait Timeouts
<a name="transactions-neptune-conflicts"></a>

If a second transaction tries to modify a record in a range that a first transaction has locked, Neptune detects the conflict immediately and blocks the second transaction.

If no dependency deadlock is detected, Neptune automatically applies a lock-wait timeout mechanism, in which the blocked transaction waits for up to 60 seconds for the transaction that holds the lock to finish and release the lock.
+ If the lock-wait timeout expires before the lock is released, the blocked transaction is rolled back.
+ If the lock is released within the lock-wait timeout, the second transaction is unblocked and can finish successfully without needing to retry.

However, if Neptune detects a dependency deadlock between the two transactions, automatic reconciliation of the conflict is not possible. In this case, Neptune immediately cancels and rolls back one of the two transactions without initiating a lock-wait timeout. Neptune makes a best effort to roll back the transaction that has the fewest records inserted or deleted.

### Measuring lock-wait time (engine ≥ 1.4.5.0)
<a name="transactions-neptune-lock-wait-metrics"></a>

Starting with engine version 1.4.5.0, you can observe exactly how long a mutation query was blocked by using two slow-query-log counters:


| Counter | Description | 
| --- | --- | 
| `sharedLocksWaitTimeMillis` | Time spent waiting to obtain shared (S) locks, which allow multiple readers but block writers. | 
| `exclusiveLocksWaitTimeMillis` | Time spent waiting to obtain exclusive (X) locks, which block all other access. | 

These two fields appear in the `storageCounters` object only when you enable slow-query logging in `debug` mode (`neptune_enable_slow_query_log=debug`).

**Tip**  
If `sharedLocksWaitTimeMillis + exclusiveLocksWaitTimeMillis` approaches the query's `overallRunTimeMs`, the query is bottlenecked by lock contention rather than CPU, network, or I/O.
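As a rough sketch, the check in the tip above can be automated against a parsed slow-query-log entry. The field layout of `entry` below is an assumption for illustration; consult your actual debug-mode log output for the exact shape:

```python
def lock_wait_fraction(entry: dict) -> float:
    """Fraction of a query's runtime spent waiting on shared or exclusive locks."""
    counters = entry["storageCounters"]
    waited = (counters.get("sharedLocksWaitTimeMillis", 0)
              + counters.get("exclusiveLocksWaitTimeMillis", 0))
    return waited / entry["overallRunTimeMs"]

entry = {  # values are made up for illustration
    "overallRunTimeMs": 1000,
    "storageCounters": {
        "sharedLocksWaitTimeMillis": 450,
        "exclusiveLocksWaitTimeMillis": 400,
    },
}
# A fraction near 1.0 means the query is dominated by lock contention
# rather than CPU, network, or I/O.
print(lock_wait_fraction(entry))  # 0.85
```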

Practical tips for reducing contention:
+ **Stagger conflicting jobs** – Run heavy batch mutations during periods of lower user activity.
+ **Break large mutations into smaller chunks** – Smaller transactions hold locks for less time, reducing the chance of timeouts.

## Range locks and false conflicts
<a name="transactions-neptune-false-conflicts"></a>

Neptune takes range locks using gap locks. A gap lock is a lock on a gap between index records, or a lock on the gap before the first or after the last index record.

Neptune uses a so-called dictionary table to associate numeric ID values with specific string literals. Here is a sample state of such a dictionary table:


| String | ID | 
| --- | --- | 
| type | 1 | 
| default_graph | 2 | 
| person_3 | 3 | 
| person_1 | 5 | 
| knows | 6 | 
| person_2 | 7 | 
| age | 8 | 
| edge_1 | 9 | 
| lives_in | 10 | 
| New York | 11 | 
| Person | 12 | 
| Place | 13 | 
| edge_2 | 14 | 

The strings above belong to a property-graph model, but the concepts apply equally to RDF graph models.

The corresponding state of the SPOG (Subject-Predicate-Object-Graph) index is shown below on the left. On the right, the corresponding strings are shown, to help understand what the index data means.


| S (ID) | P (ID) | O (ID) | G (ID) |  | S (string) | P (string) | O (string) | G (string) | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| 3 | 1 | 12 | 2 |  | person_3 | type | Person | default_graph | 
| 5 | 1 | 12 | 2 |  | person_1 | type | Person | default_graph | 
| 5 | 6 | 3 | 9 |  | person_1 | knows | person_3 | edge_1 | 
| 5 | 8 | 40 | 2 |  | person_1 | age | 40 | default_graph | 
| 5 | 10 | 11 | 14 |  | person_1 | lives_in | New York | edge_2 | 
| 7 | 1 | 12 | 2 |  | person_2 | type | Person | default_graph | 
| 11 | 1 | 13 | 2 |  | New York | type | Place | default_graph | 

Now, if a mutation query reads all properties and outgoing edges of a vertex named `person_1`, the node would lock the entire range defined by the prefix `S=person_1` in the SPOG index before reading the data. The range lock would place gap locks on all matching records and the first record that is not a match. Matching records would be locked, and non-matching records would not be locked. Neptune would place the gap-locks as follows:
+ ` 5 1 12 2 ` *(gap 1)*
+ ` 5 6 3 9 ` *(gap 2)*
+ ` 5 8 40 2 ` *(gap 3)*
+ ` 5 10 11 14 ` *(gap 4)*
+ ` 7 1 12 2 ` *(gap 5)*

This locks the following records:
+ ` 5 1 12 2`
+ ` 5 6 3 9`
+ ` 5 8 40 2`
+ ` 5 10 11 14`

In this state, the following operations are legitimately blocked:
+ Insertion of a new property or edge for `S=person_1`. A new property different from `type` or a new edge would have to go in either gap 2, gap 3, gap 4, or gap 5, all of which are locked.
+ Deletion of any of the existing records.

At the same time, a few concurrent operations would be blocked falsely (generating false conflicts):
+ Any property or edge insertions for `S=person_3` are blocked because they would have to go in gap 1.
+ Any new vertex insertion which gets assigned an ID between 3 and 5 would be blocked because it would have to go in gap 1.
+ Any new vertex insertion which gets assigned an ID between 5 and 7 would be blocked because it would have to go in gap 5.

Gap locks are not precise enough to restrict a gap to one specific prefix (for example, to lock gap 5 only for records with `S=5`).
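The locked and falsely locked cases above can be reproduced with a small sketch over the sample SPOG index (illustrative only; Neptune's actual gap-lock implementation is internal). Locking the prefix `S=5` covers the matching records plus the gaps around them, up to the first non-matching record:

```python
import bisect

# The sample SPOG index from the tables above, as sorted tuples of numeric IDs.
index = [
    (3, 1, 12, 2),
    (5, 1, 12, 2),
    (5, 6, 3, 9),
    (5, 8, 40, 2),
    (5, 10, 11, 14),
    (7, 1, 12, 2),
    (11, 1, 13, 2),
]

def insert_blocked_by_prefix_lock(record, locked_s=5):
    """Would inserting `record` land in a gap covered by the S=locked_s range lock?"""
    pos = bisect.bisect_left(index, record)  # insertion position = which gap
    matches = [i for i, r in enumerate(index) if r[0] == locked_s]
    first, last = matches[0], matches[-1]
    # Gaps 1..5 correspond to insertion positions first .. last + 1.
    return first <= pos <= last + 1

print(insert_blocked_by_prefix_lock((5, 12, 99, 2)))   # True: new edge/property for person_1
print(insert_blocked_by_prefix_lock((4, 1, 12, 2)))    # True: false conflict in gap 1
print(insert_blocked_by_prefix_lock((6, 1, 12, 2)))    # True: false conflict in gap 5
print(insert_blocked_by_prefix_lock((20, 1, 12, 2)))   # False: beyond the locked range
```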

The range locks are only placed in the index where the read happens. In the case above, records are locked only in the SPOG index, not in POGS or GPSO. Reads for a query may be performed across all indexes depending on the access patterns, which you can list using the `explain` APIs (for [SPARQL](sparql-explain-examples.md) and for [Gremlin](gremlin-explain.md)).

**Note**  
Gap locks can also be taken for safe concurrent updates on underlying indexes, which can also lead to false conflicts. These gap locks are placed independent of isolation level or read operations performed by the transaction.

False conflicts can happen not only when *concurrent* transactions collide because of gap locks, but also in some cases when a transaction is being retried after any sort of failure. If the roll-back that was triggered by the failure is still in progress and the locks previously taken for the transaction have not yet been fully released, the retry will encounter a false conflict and fail.

Under a high load, you might typically find that 3-4% of write queries fail because of false conflicts. For an external client, such false conflicts are hard to predict, and should be handled using [retries](transactions-exceptions.md).
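A minimal client-side retry sketch looks like the following. The exception class, attempt limit, and delays are illustrative assumptions; in practice you would catch the specific conflict error your driver surfaces (for example, `ConcurrentModificationException`):

```python
import random
import time

class ConcurrentModificationError(Exception):
    """Stand-in for the conflict error a Neptune client would receive."""

def with_retries(op, max_attempts=5, base_delay=0.05):
    """Run `op`, retrying on conflict errors with jittered exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except ConcurrentModificationError:
            if attempt == max_attempts:
                raise
            # Jittered exponential backoff spreads out competing retries.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())

attempts = {"n": 0}
def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:          # simulate two false conflicts, then success
        raise ConcurrentModificationError()
    return "committed"

print(with_retries(flaky_write))  # committed (after two retried conflicts)
```

The backoff-with-jitter design matters here: if all blocked clients retried at the same fixed interval, they would tend to collide again on the same locked ranges.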

# Examples of Neptune transaction semantics
<a name="transactions-examples"></a>

The following examples illustrate different use cases for transaction semantics in Amazon Neptune.

**Topics**
+ [Conditional Insertion of a Property](#transactions-examples-conditional-insertion)
+ [Property Value Uniqueness](#transactions-examples-unique-property)
+ [Conditional Property Change](#transactions-examples-conditional-edit)
+ [Replacing a Property](#transactions-examples-replace)
+ [Avoiding Dangling Elements](#transactions-examples-dangling)

## Example 1 – Inserting a Property Only If It Does Not Exist
<a name="transactions-examples-conditional-insertion"></a>

Suppose that you want to ensure that a property is set only once. For example, suppose that multiple queries are trying to assign a person a credit score concurrently. You only want one instance of the property to be inserted, and the other queries to fail because the property has already been set.

```
# GREMLIN:
g.V('person1').hasLabel('Person').coalesce(has('creditScore'), property('creditScore', 'AAA+'))

# SPARQL:
INSERT { :person1 :creditScore "AAA+" .}
WHERE  { :person1 rdf:type :Person .
         FILTER NOT EXISTS { :person1 :creditScore ?o .} }
```

The Gremlin `property()` step inserts a property with the given key and value. The `coalesce()` step evaluates its first argument and, only if that produces no result, evaluates the second. Here, if `has('creditScore')` finds an existing value, the property is not inserted again.

Before inserting the value for the `creditScore` property for a given `person1` vertex, a transaction must try to read the possibly non-existent `creditScore` value for `person1`. This attempted read locks the `SP` range for `S=person1` and `P=creditScore` in the `SPOG` index where the `creditScore` value either exists or will be written.

Taking this range lock prevents any concurrent transaction from inserting a `creditScore` value concurrently. When there are multiple parallel transactions, at most one of them can update the value at a time. This rules out the anomaly of more than one `creditScore` property being created.

## Example 2 – Asserting That a Property Value Is Globally Unique
<a name="transactions-examples-unique-property"></a>

Suppose that you want to insert a person with a Social Security number as a primary key. You would want your mutation query to guarantee that, at a global level, no one else in the database has that same Social Security number:

```
# GREMLIN:
g.V().has('ssn', 123456789).fold()
  .coalesce(__.unfold(),
            __.addV('Person').property('name', 'John Doe').property('ssn', 123456789))

# SPARQL:
INSERT { :person1 rdf:type :Person .
         :person1 :name "John Doe" .
         :person1 :ssn 123456789 .}
WHERE  { FILTER NOT EXISTS { ?person :ssn 123456789 } }
```

This example is similar to the previous one. The main difference is that the range lock is taken on the `POGS` index rather than the `SPOG` index.

The transaction executing the query must read the pattern, `?person :ssn 123456789`, in which the `P` and `O` positions are bound. The range lock is taken on the `POGS` index for `P=ssn` and `O=123456789`.
+ If the pattern does exist, no action is taken.
+ If it does not exist, the lock prevents any concurrent transaction from also inserting that Social Security number.

## Example 3 – Changing a Property If Another Property Has a Specified Value
<a name="transactions-examples-conditional-edit"></a>

Suppose that various events in a game move a person from level one to level two, and assign them a new `level2Score` property set to zero. You need to be sure that multiple concurrent instances of such a transaction could not create multiple instances of the level-two score property. The queries in Gremlin and SPARQL might look like the following.

```
# GREMLIN:
g.V('person1').hasLabel('Person').has('level', 1)
 .property('level2Score', 0)
 .property(Cardinality.single, 'level', 2)

# SPARQL:
DELETE { :person1 :level 1 .}
INSERT { :person1 :level2Score 0 .
         :person1 :level 2 .}
WHERE  { :person1 rdf:type :Person .
         :person1 :level 1 .}
```

In Gremlin, when `Cardinality.single` is specified, the `property()` step either adds a new property or replaces an existing property value with the new value that is specified.

Any update to a property value, such as increasing the `level` from 1 to 2, is implemented as a deletion of the current record and insertion of a new record with the new property value. In this case, the record with level number 1 is deleted and a record with level number 2 is reinserted.

For the transaction to be able to add `level2Score` and update the `level` from 1 to 2, it must first validate that the `level` value is currently equal to 1. In doing so, it takes a range lock on the `SPO` prefix for `S=person1`, `P=level`, and `O=1` in the `SPOG` index. This lock prevents concurrent transactions from deleting the version 1 triple, and as a result, no conflicting concurrent updates can happen.

## Example 4 – Replacing an Existing Property
<a name="transactions-examples-replace"></a>

Certain events might update a person's credit score to a new value (here `BBB`). But you want to be sure that concurrent events of that type can't create multiple credit score properties for a person.

```
# GREMLIN:
g.V('person1').hasLabel('Person')
 .sideEffect(properties('creditScore').drop())
 .property('creditScore', 'BBB')

# SPARQL:
DELETE { :person1 :creditScore ?o .}
INSERT { :person1 :creditScore "BBB" .}
WHERE  { :person1 rdf:type :Person .
         :person1 :creditScore ?o .}
```

This case is similar to example 3, except that instead of locking the `SPO` prefix, Neptune locks the `SP` prefix with `S=person1` and `P=creditScore` only. This prevents concurrent transactions from inserting or deleting any triples with the `creditScore` property for the `person1` subject.

## Example 5 – Avoiding Dangling Properties or Edges
<a name="transactions-examples-dangling"></a>

An update to an entity should not leave a dangling element, that is, a property or edge attached to an entity that has no type. This is only an issue in SPARQL, because Gremlin has built-in constraints that prevent dangling elements.

```
# SPARQL:
tx1: INSERT { :person1 :age 23 } WHERE { :person1 rdf:type :Person }
tx2: DELETE { :person1 ?p ?o }
```

The `INSERT` query must read and lock the `SPO` prefix with `S=person1`, `P=rdf:type`, and `O=Person` in the `SPOG` index. The lock prevents the `DELETE` query from succeeding in parallel.

In the race between the `DELETE` query trying to delete the `:person1 rdf:type :Person` record and the `INSERT` query reading the record and creating a range lock on its `SPO` in the `SPOG` index, the following outcomes are possible:
+ If the `INSERT` query commits before the `DELETE` query reads and deletes all records for `:person1`, `:person1` is removed entirely from the database, including the newly inserted record.
+ If the `DELETE` query commits before the `INSERT` query tries to read the `:person1 rdf:type :Person` record, the read observes the committed change. That is, it does not find any `:person1 rdf:type :Person` record and hence becomes a no-op.
+ If the `INSERT` query reads before the `DELETE` query does, the `:person1 rdf:type :Person` triple is locked, and the `DELETE` query is blocked until the `INSERT` query commits, as in the first case above.
+ If the `DELETE` reads before the `INSERT` query, and the `INSERT` query tries to read and take a lock on the `SPO` prefix for the record, a conflict is detected. This is because the triple has been marked for removal, and the `INSERT` then fails.

In all these different possible sequences of events, no dangling edge is created.

# Amazon Neptune DB Clusters and Instances
<a name="feature-overview-db-clusters"></a>

An Amazon Neptune *DB cluster* manages access to your data through queries. A cluster consists of:
+ One *primary DB instance*.
+ Up to 15 *read-replica DB instances*.

All the instances in a cluster share the same [underlying managed storage volume](feature-overview-storage.md), which is designed for reliability and high availability.

You connect to the DB instances in your DB cluster through [Neptune endpoints](feature-overview-endpoints.md).

## The primary DB instance in a Neptune DB cluster
<a name="feature-overview-primary-instance"></a>

The primary DB instance coordinates all write operations to the DB cluster's underlying storage volume. It also supports read operations.

There can only be one primary DB instance in a Neptune DB cluster. If the primary instance becomes unavailable, Neptune automatically fails over to one of the read-replica instances with a priority that you can specify.

## Read-replica DB instances in a Neptune DB cluster
<a name="feature-overview-read-replicas"></a>

After you create the primary instance for a DB cluster, you can create up to 15 read-replica instances in your DB cluster to support read-only queries.

Neptune read-replica DB instances work well for scaling read capacity because they are fully dedicated to read operations on your cluster volume. All write operations are managed by the primary instance. Each read-replica DB instance has its own endpoint.

Because the cluster storage volume is shared among all instances in a cluster, all read-replica instances return the same data for query results with very little replication lag. This lag is usually much less than 100 milliseconds after the primary instance writes an update, although it can be somewhat longer when the volume of write operations is very large.

Having one or more read-replica instances available in different Availability Zones can increase availability, because read-replicas serve as failover targets for the primary instance. That is, if the primary instance fails, Neptune promotes a read-replica instance to become the primary instance. When this happens, there is a brief interruption while the promoted instance is rebooted, during which read and write requests made to the primary instance fail with an exception.

By contrast, if your DB cluster doesn't include any read-replica instances, your DB cluster remains unavailable when the primary instance fails until it has been re-created. Re-creating the primary instance takes considerably longer than promoting a read-replica.

To ensure high availability, we recommend that you create one or more read-replica instances that have the same DB instance class as the primary instance and are located in different Availability Zones than the primary instance. See [Fault tolerance for a Neptune DB cluster](backup-restore-overview-fault-tolerance.md).

Using the console, you can create a Multi-AZ deployment by simply specifying Multi-AZ when creating a DB cluster. If a DB cluster is in a single Availability Zone, you can make it a Multi-AZ DB cluster by adding a Neptune replica in a different Availability Zone.

**Note**  
You can't create an encrypted read-replica instance for an unencrypted Neptune DB cluster, or an unencrypted read-replica instance for an encrypted Neptune DB cluster.

For details on how to create a Neptune read-replica DB instance, see [Creating a Neptune reader instance using the console](manage-console-create-replica.md).

## Sizing DB instances in a Neptune DB cluster
<a name="feature-overview-sizing-instances"></a>

Size the instances in your Neptune DB cluster based on your CPU and memory requirements. The number of vCPUs on an instance determines the number of query threads that handle incoming queries. The amount of memory on an instance determines the size of the buffer cache, used for storing copies of data pages fetched from the underlying storage volume.

Each Neptune DB instance has a number of query threads equal to two times the number of vCPUs on that instance. An `r5.4xlarge`, for example, with 16 vCPUs, has 32 query threads and can therefore process 32 queries concurrently.

Additional queries that arrive while all query threads are occupied are put into a server-side queue, and are processed in FIFO order as query threads become available. This server-side queue can hold approximately 8000 pending requests. Once it is full, Neptune responds to additional requests with a `ThrottlingException`. You can monitor the number of pending requests with the `MainRequestQueuePendingRequests` CloudWatch metric, or by using the [Gremlin query status endpoint](gremlin-api-status.md) with the `includeWaiting` parameter.

Query execution time from a client perspective includes any time spent in the queue, in addition to the time taken to actually execute the query.
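The thread and queue behavior described above can be sketched as a small model. This is an illustrative sketch, not Neptune's actual implementation; the 2 × vCPUs thread count and the approximate 8000-request queue capacity come from this section.

```python
# Sketch of Neptune's admission behavior as described above. The queue
# capacity is approximate; this is a conceptual model, not Neptune code.

QUEUE_CAPACITY = 8000  # approximate server-side queue size

def query_threads(vcpus: int) -> int:
    """Each instance has 2 query threads per vCPU."""
    return 2 * vcpus

def admit(active_queries: int, queued: int, vcpus: int) -> str:
    """Decide how a newly arriving request is handled."""
    if active_queries < query_threads(vcpus):
        return "execute"              # a query thread is free
    if queued < QUEUE_CAPACITY:
        return "queue"                # held in the FIFO server-side queue
    return "ThrottlingException"      # queue is full

# An r5.4xlarge has 16 vCPUs -> 32 query threads.
print(admit(active_queries=10, queued=0, vcpus=16))     # execute
print(admit(active_queries=32, queued=100, vcpus=16))   # queue
print(admit(active_queries=32, queued=8000, vcpus=16))  # ThrottlingException
```

Client-observed latency corresponds to the time spent in the `"queue"` state plus actual execution time.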

A sustained concurrent write load that utilizes all the query threads on the primary DB instance ideally shows 90% or more CPU utilization, which indicates that all the query threads on the server are actively engaged in doing useful work. However, actual CPU utilization is often somewhat lower, even under a sustained concurrent write load. This is usually because query threads are waiting on I/O operations to the underlying storage volume to complete. Neptune uses quorum writes that make six copies of your data across three Availability Zones, and four out of those six storage nodes must acknowledge a write for it to be considered durable. While a query thread waits for this quorum from the storage volume, it is stalled, which reduces CPU utilization.
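The 4-of-6 write quorum described above can be illustrated with a minimal sketch. The 2-copies-per-AZ layout is an assumption drawn from the six-copies-across-three-AZs description; this is a conceptual model only.

```python
# Hedged sketch of the storage write quorum: 6 copies spread across
# 3 Availability Zones (assumed 2 per AZ); a write is durable once at
# least 4 of the 6 storage nodes acknowledge it.

COPIES_PER_AZ = 2
WRITE_QUORUM = 4

def write_durable(acked_azs: int) -> bool:
    """True once acknowledgments from these AZs reach the 4-of-6 quorum."""
    return acked_azs * COPIES_PER_AZ >= WRITE_QUORUM

print(write_durable(2))  # True: 2 AZs = 4 acks, quorum met even with one AZ down
print(write_durable(1))  # False: only 2 acks so far; the query thread stalls
```

While a query thread waits for that fourth acknowledgment, it does no useful CPU work, which is why sustained write loads rarely show 100% CPU utilization.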

If you have a serial write load where you perform one write after another, waiting for each to complete before beginning the next, you can expect CPU utilization to be lower still. The exact amount is a function of the number of vCPUs and query threads (the more query threads there are, the lower the CPU utilization each query represents), with some further reduction caused by waiting for I/O.

For more information about how best to size DB instances, see [Choosing instance types for Amazon Neptune](instance-types.md). For the pricing of each instance type, see the [Neptune pricing page](https://aws.amazon.com/neptune/pricing/).

## Monitoring DB instance performance in Neptune
<a name="feature-overview-monitoring-instances"></a>

You can use CloudWatch metrics in Neptune to monitor the performance of your DB instances and keep track of query latency as observed by the client. See [Using CloudWatch to monitor DB instance performance in Neptune](cloudwatch-monitoring-instances.md).

# Amazon Neptune storage, reliability and availability
<a name="feature-overview-storage"></a>

Amazon Neptune uses a distributed and shared storage architecture that scales automatically as your database storage needs grow.

Neptune data is stored in a cluster volume, which is a single, virtual volume that uses Non-Volatile Memory Express (NVMe) SSD-based drives. The cluster volume consists of a collection of logical blocks known as segments. Each of these segments is allocated 10 gigabytes (GB) of storage. The data in each segment is replicated into six copies, which are then allocated across three Availability Zones (AZs) in the AWS Region where the DB cluster resides.

When a Neptune DB cluster is created, it is allocated a single segment of 10 GB. As the volume of data increases and exceeds the currently allocated storage, Neptune automatically expands the cluster volume by adding new segments. A Neptune cluster volume can grow to a maximum size of 128 tebibytes (TiB) in all supported regions except China and GovCloud, where it is limited to 64 TiB. For engine releases earlier than [Release: 1.0.2.2 (2020-03-09)](engine-releases-1.0.2.2.md), however, the size of cluster volumes is limited to 64 TiB in all regions.

The DB cluster volume contains all your user data, indices, and dictionaries (described in the [Neptune Graph Data Model](feature-overview-data-model.md) section), as well as internal metadata such as internal transaction logs. Together, this graph data, including indices and internal logs, cannot exceed the maximum size of the cluster volume.

## I/O–Optimized storage option
<a name="feature-overview-storage-iops"></a>

Neptune offers two pricing models for storage:
+ **Standard storage**   –   Standard storage provides cost-effective database storage for applications with moderate to low I/O usage.
+ **I/O–Optimized storage**   –   With I/O–Optimized storage, you pay only for the storage you are using, at a higher cost than for standard storage, and you pay nothing for the I/O that you use.

  I/O–Optimized storage is designed to meet the needs of I/O–intensive graph workloads at a predictable cost, with low I/O latency and consistent I/O throughput.

  For more information, see [I/O–Optimized storage](storage-types.md#provisioned-iops-storage).

## Neptune storage allocation
<a name="feature-overview-storage-allocation"></a>

Even though a Neptune cluster volume can grow to 128 TiB (or 64 TiB in a few regions), you are only charged for the space actually allocated. The total space allocated is determined by the storage *high water mark*, which is the maximum amount allocated to the cluster volume at any time during its existence.

This means that even if user data is removed from a cluster volume, such as by a drop query like `g.V().drop()`, the total allocated space remains the same. Neptune does automatically optimize the unused allocated space for reuse in the future.
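A toy model makes the high-water-mark behavior concrete. The segment-based allocation follows the storage description earlier in this section; the accounting logic here is an illustrative simplification, not Neptune's actual allocator.

```python
# Toy model of the storage high water mark: allocation grows in 10 GB
# segments as data grows, but dropping data never shrinks the allocation;
# freed space is only reused for future writes.

SEGMENT_GB = 10  # each cluster-volume segment is 10 GB

class ClusterVolume:
    def __init__(self):
        self.used_gb = 0
        self.high_water_mark_gb = SEGMENT_GB  # new clusters start with one segment

    def write(self, gb):
        self.used_gb += gb
        # Allocate whole segments as needed; allocation never shrinks.
        needed = -(-self.used_gb // SEGMENT_GB) * SEGMENT_GB  # ceiling to segment
        self.high_water_mark_gb = max(self.high_water_mark_gb, needed)

    def drop(self, gb):
        # e.g. g.V().drop(): the data is gone, but the allocation is not.
        self.used_gb = max(0, self.used_gb - gb)

vol = ClusterVolume()
vol.write(95)                   # grows the allocation to 100 GB (10 segments)
vol.drop(95)                    # frees the space for reuse...
print(vol.high_water_mark_gb)   # 100 -- the billed allocation is unchanged
```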

In addition to user data, two additional types of content consume internal storage space, namely dictionary data and internal transaction logs. Although dictionary data is stored with graph data, it persists indefinitely, even when the graph data it supports has been deleted, which means that entries can be re-used if data is re-introduced. Internal log data is stored in a separate internal storage space that has its own high water mark. When an internal log expires, the storage it occupied can be re-used for other logs, but not for graph data. The amount of internal space that has been allocated for logs is included in the total space reported by the `VolumeBytesUsed` [CloudWatch metric](cloudwatch.md).

Check [Storage best practices](#feature-overview-storage-best-practices) for ways to keep allocated storage to a minimum and to re-use space.

## Neptune storage billing
<a name="feature-overview-storage-billing"></a>

Storage costs are billed based on the storage *high water mark*, as described above. Although your data is replicated into six copies, you are only billed for one copy of the data.

You can determine what the current storage high water mark of your DB cluster is by monitoring the `VolumeBytesUsed` CloudWatch metric (see [Monitoring Neptune Using Amazon CloudWatch](cloudwatch.md)).
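Given a series of `VolumeBytesUsed` datapoints (fetched, for example, with the AWS CLI or an SDK), a small hypothetical helper can report the high water mark and how close it is to the volume limit. The helper name and sample values are illustrative, not part of any Neptune API.

```python
# Hypothetical helper for interpreting VolumeBytesUsed datapoints.
# Billing follows the metric's maximum (the high water mark), and you
# pay for one logical copy even though six physical copies are stored.

TIB = 1024 ** 4
MAX_VOLUME_TIB = 128  # 64 TiB in China and GovCloud regions

def storage_summary(volume_bytes_used_samples):
    high_water = max(volume_bytes_used_samples)
    return {
        "high_water_tib": round(high_water / TIB, 3),
        "pct_of_max": round(100 * high_water / (MAX_VOLUME_TIB * TIB), 2),
    }

samples = [2 * TIB, 3 * TIB, 2.5 * TIB]  # example datapoints
print(storage_summary(samples))
# {'high_water_tib': 3.0, 'pct_of_max': 2.34}
```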

Other factors that can affect your Neptune storage costs include database snapshots and backup, which are billed separately as backup storage and are based on the Neptune storage costs (see [CloudWatch metrics that are useful for managing Neptune backup storage](backup-restore-overview-metrics.md)).

If you create a [clone](manage-console-cloning.md) of your database, however, the clone points to the same cluster volume that your DB cluster itself uses, so there is no additional storage charge for the original data. Subsequent changes to the clone use the [copy-on-write protocol](manage-console-cloning.md#manage-console-cloning-protocol), and do result in additional storage costs.

For more Neptune pricing information, see [Amazon Neptune Pricing](https://aws.amazon.com/neptune/pricing).

## Neptune storage best practices
<a name="feature-overview-storage-best-practices"></a>

Because certain types of data consume permanent storage in Neptune, use these best practices to avoid large spikes in storage growth:
+ When designing your graph data model, avoid as much as possible using property keys and user-facing values that are temporary in nature.
+ If you plan to change your data model, do not load data that uses the new model into an existing DB cluster until you have cleared the existing data using the [fast reset API](manage-console-fast-reset.md). It is often best to load data that uses a new model into a new DB cluster.
+ Transactions that operate on large amounts of data generate correspondingly large internal logs, which can permanently increase the high water mark of the internal log space. For example, a single transaction that deletes all the data in your DB cluster could generate a huge internal log that would require allocating a great deal of internal storage and thus permanently reduce space available for graph data.

  To avoid this, split large transactions into smaller ones and allow time between them so that the associated internal logs have a chance to expire and release their internal storage for re-use by subsequent logs.
+ For monitoring the growth of your Neptune cluster volume, you can set a CloudWatch alarm on the `VolumeBytesUsed` CloudWatch metric. This can be particularly helpful if your data is reaching the maximum size of the cluster volume. For more information, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html).
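The advice above about splitting large transactions can be sketched as follows. This is a hedged example: `submit` is a placeholder for your Gremlin client's query-submission call, not a real Neptune API, and the batch size and pause are illustrative.

```python
# Sketch of batched deletion: instead of one huge transaction such as
# g.V().drop(), delete in small batches with a pause between them so the
# internal logs of each small transaction have a chance to expire.

import time

def drop_all_vertices(submit, batch_size=1000, pause_seconds=1.0):
    """Drop all vertices in small transactions rather than one giant one."""
    while submit("g.V().limit(1).count()") > 0:        # anything left?
        submit(f"g.V().limit({batch_size}).drop()")    # one small transaction
        time.sleep(pause_seconds)  # let internal log storage be reclaimed

# Demo with a fake client standing in for a real Gremlin connection:
state = {"vertices": 2500}

def fake_submit(query):
    if "count" in query:
        return 1 if state["vertices"] else 0
    state["vertices"] = max(0, state["vertices"] - 1000)

drop_all_vertices(fake_submit, pause_seconds=0)
print(state["vertices"])  # 0
```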

The only way to shrink the storage space used by your DB cluster when you have a large amount of unused allocated space is to export all the data in your graph and then reload it into a new DB cluster. See [Neptune's data export service and utility](machine-learning-data-export.md) for an easy way to export data from a DB cluster, and [Neptune's bulk loader](bulk-load.md) for an easy way to import data back into Neptune.

**Note**  
Creating and restoring a [snapshot](backup-restore-restore-snapshot.md) does not reduce the amount of storage allocated for your DB cluster, because a snapshot retains the original image of the cluster's underlying storage. If a substantial amount of your allocated storage is not being used, the only way to shrink the amount of allocated storage is to export your graph data and reload it into a new DB cluster.

## Neptune storage reliability and high availability
<a name="feature-overview-storage-reliability"></a>

Amazon Neptune is designed to be reliable, durable, and fault tolerant.

The fact that six copies of your Neptune data are maintained across three Availability Zones (AZs) ensures that storage of the data is highly durable, with very low likelihood of data loss. The data is replicated automatically across the Availability Zones regardless of whether there are DB instances in them, and the amount of replication is independent of the number of DB instances in your cluster.

This means that you can add a read-replica quickly, because Neptune doesn't make a new copy of the graph data. Instead, the read-replica connects to the cluster volume that already contains your data. Similarly, removing a read-replica doesn't remove any of the underlying data.

You can delete the cluster volume and its data only after deleting all of its DB instances.

Neptune also automatically detects failures in the segments that make up the cluster volume. When a copy of the data in a segment is corrupted, Neptune immediately repairs that segment, using other copies of the data within the same segment to ensure that the repaired data is current. As a result, Neptune avoids data loss and reduces the need to perform a point-in-time restore to recover from a disk failure.

# Connecting to Amazon Neptune Endpoints
<a name="feature-overview-endpoints"></a>

Amazon Neptune uses a cluster of DB instances rather than a single instance. Each Neptune connection is handled by a specific DB instance. When you connect to a Neptune cluster, the host name and port that you specify point to an intermediate handler called an *endpoint*. An endpoint is a URL that contains a host address and a port. Neptune endpoints use encrypted Transport Layer Security/Secure Sockets Layer (TLS/SSL) connections.

Neptune uses the endpoint mechanism to abstract these connections so that you don't have to hardcode the hostnames, or write your own logic for rerouting connections when some DB instances are unavailable.

Using endpoints, you can map each connection to the appropriate instance or group of instances, depending on your use case. Custom endpoints let you connect to subsets of DB instances. The following endpoints are available in a Neptune DB cluster:

## Neptune cluster endpoints
<a name="feature-overview-cluster-endpoints"></a>

A cluster endpoint is an endpoint for a Neptune DB cluster that connects to the current primary DB instance for that DB cluster. Each Neptune DB cluster has a cluster endpoint and one primary DB instance.

The cluster endpoint provides failover support for read/write connections to the DB cluster. Use the cluster endpoint for all write operations on the DB cluster, including inserts, updates, deletes, and data definition language (DDL) changes. You can also use the cluster endpoint for read operations, such as queries.

If the current primary DB instance of a DB cluster fails, Neptune automatically fails over to a new primary DB instance. During a failover, the DB cluster continues to serve connection requests to the cluster endpoint from the new primary DB instance, with minimal interruption of service.

The following example illustrates a cluster endpoint for a Neptune DB cluster.

`mydbcluster.cluster-123456789012.us-east-1.neptune.amazonaws.com:8182`

## Neptune reader endpoints
<a name="feature-overview-reader-endpoints"></a>

A reader endpoint is an endpoint for a Neptune DB cluster that connects to one of the available Neptune replicas for that DB cluster. Each Neptune DB cluster has a reader endpoint. If there is more than one Neptune replica, the reader endpoint directs each connection request to one of the Neptune replicas.

The reader endpoint provides round-robin routing for read-only connections to the DB cluster. Use the reader endpoint for read operations, such as queries.

You can't use the reader endpoint for write operations unless you have a single-instance cluster (a cluster with no read-replicas). In that case and that case only, the reader can be used for write operations as well as read operations.

The reader endpoint round-robin routing works by changing the host that the DNS entry points to. Each time you resolve the DNS name, you get a different IP address, and connections are opened against those IPs. After a connection is established, all requests on that connection are sent to the same host. To reach a potentially different read replica, the client must create a new connection and resolve the DNS record again.
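The interaction between round-robin DNS and connection pinning can be modeled with a few lines. This is a conceptual sketch, not how the reader endpoint is actually implemented; the IP addresses are invented.

```python
# Toy model of reader-endpoint DNS round-robin: each fresh resolution
# returns the next replica address, but an established connection stays
# pinned to whichever host it resolved initially.

from itertools import cycle

class ReaderEndpointDNS:
    def __init__(self, replica_ips):
        self._ring = cycle(replica_ips)

    def resolve(self):
        return next(self._ring)  # round-robin: a new answer on each lookup

dns = ReaderEndpointDNS(["10.0.1.11", "10.0.2.12"])
conn_host = dns.resolve()  # this connection is now pinned to one replica...
print(conn_host)           # 10.0.1.11
print(dns.resolve())       # ...only a NEW resolution reaches the other: 10.0.2.12
```

This is why a client that caches the first answer, or keeps one long-lived WebSockets connection open, sends everything to a single replica.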

**Note**  
WebSockets connections are often kept alive for long periods. To get different read replicas, do the following:  
Ensure that your client resolves the DNS entry each time it connects.
Close the connection and reconnect. 

Various client software might resolve DNS in different ways. For example, if your client resolves DNS and then uses the IP for every connection, it directs all requests to a single host. 

If a client or proxy caches DNS results, it resolves the DNS name to the same host from its cache each time. This is a problem for both round-robin routing and failover scenarios.

**Note**  
Disable any DNS caching settings to force DNS resolution each time.

The DB cluster distributes connection requests to the reader endpoint among available Neptune replicas. If the DB cluster contains only a primary DB instance, the reader endpoint serves connection requests from the primary DB instance. If a Neptune replica is created for that DB cluster, the reader endpoint continues to serve connection requests to the reader endpoint from the new Neptune replica, with minimal interruption in service.

The following example illustrates a reader endpoint for a Neptune DB cluster.

`mydbcluster.cluster-ro-123456789012.us-east-1.neptune.amazonaws.com:8182`

## Neptune instance endpoints
<a name="feature-overview-instance-endpoints"></a>

An instance endpoint is an endpoint for a DB instance in a Neptune DB cluster that connects to that specific DB instance. Each DB instance in a DB cluster, regardless of instance type, has its own unique instance endpoint. So, there is one instance endpoint for the current primary DB instance of the DB cluster. There is also one instance endpoint for each of the Neptune replicas in the DB cluster.

The instance endpoint provides direct control over connections to the DB cluster, for scenarios where using the cluster endpoint or reader endpoint might not be appropriate. For example, your client application might require fine-grained load balancing based on workload type. In this case, you can configure multiple clients to connect to different Neptune replicas in a DB cluster to distribute read workloads.

The following example illustrates an instance endpoint for a DB instance in a Neptune DB cluster.

`mydbinstance.123456789012.us-east-1.neptune.amazonaws.com:8182`

## Neptune custom endpoints
<a name="feature-overview-custom-endpoints"></a>

A custom endpoint for a Neptune cluster represents a set of DB instances that you choose. When you connect to the endpoint, Neptune chooses one of the instances in the group to handle the connection. You define which instances this endpoint refers to, and you decide what purpose the endpoint serves.

A Neptune DB cluster has no custom endpoints until you create one, and you can create up to five custom endpoints for each provisioned Neptune cluster.

The custom endpoint provides load-balanced database connections based on criteria other than the read-only or read/write capability of the DB instances. Because the connection can go to any DB instance associated with the endpoint, make sure that all the instances within that group share the same performance and memory capacity characteristics. When you use custom endpoints, you typically don't use the reader endpoint for that cluster.

This feature is intended for advanced users with specialized kinds of workloads where it isn't practical to keep all the Neptune Replicas in the cluster identical. With custom endpoints, you can adjust the capacity of the DB instances used with each connection.

For example, if you define several custom endpoints that connect to groups of instances with different instance classes, you can then direct users with different performance needs to the endpoints that best suit their use cases.

The following example illustrates a custom endpoint for a DB instance in a Neptune DB cluster:

`myendpoint.cluster-custom-123456789012.us-east-1.neptune.amazonaws.com:8182`

See [Working with custom endpoints](feature-custom-endpoint-membership.md) for more information.

## Neptune endpoint considerations
<a name="feature-overview-endpoint-considerations"></a>

Consider the following when working with Neptune endpoints:
+ Before using an instance endpoint to connect to a specific DB instance in a DB cluster, consider using the cluster endpoint or reader endpoint for the DB cluster instead.

  The cluster endpoint and reader endpoint provide support for high-availability scenarios. If the primary DB instance of a DB cluster fails, Neptune automatically fails over to a new primary DB instance. It does so by either promoting an existing Neptune replica to a new primary DB instance or creating a new primary DB instance. If a failover occurs, you can use the cluster endpoint to reconnect to the newly promoted or created primary DB instance, or use the reader endpoint to reconnect to one of the other Neptune replicas in the DB cluster. 

  If you don't take this approach, you can still make sure that you're connecting to the right DB instance in the DB cluster for the intended operation. To do so, you can manually or programmatically discover the resulting set of available DB instances in the DB cluster and confirm their instance types after failover, before using the instance endpoint of a specific DB instance.

  For more information about failovers, see [Fault tolerance for a Neptune DB cluster](backup-restore-overview-fault-tolerance.md).

   
+ The reader endpoint only directs connections to available Neptune replicas in a Neptune DB cluster. It does not direct specific queries. 
**Important**  
Neptune does not load balance.

  If you want to load balance queries to distribute the read workload for a DB cluster, you must manage that in your application. You must use instance endpoints to connect directly to Neptune replicas to balance the load.

   
+ The reader endpoint round-robin routing works by changing the host that the DNS entry points to. The client must create a new connection and resolve the DNS record again to get a connection to a potentially new read replica.

   
+ During a failover, the reader endpoint might direct connections to the new primary DB instance of a DB cluster for a short time, when a Neptune replica is promoted to the new primary DB instance.

# Working with custom endpoints in Neptune
<a name="feature-custom-endpoint-membership"></a>

When you add a DB instance to a custom endpoint or remove it from a custom endpoint, any existing connections to that DB instance remain active.

You can define a list of DB instances to include in a custom endpoint (a *static* list), or a list of DB instances to exclude from it (an *exclusion* list). You can use the inclusion/exclusion mechanism to subdivide the DB instances in a cluster into groups and make sure that the custom endpoints cover all the DB instances in the cluster. Each custom endpoint can contain only one of these list types.

In the AWS Management Console, the choice is represented by the check box **Attach future instances added to this cluster**. When you leave the check box clear, the custom endpoint uses a static list containing only the DB instances specified in the dialog. When you select the check box, the custom endpoint uses an exclusion list. In that case, the custom endpoint represents all DB instances in the cluster (including any that you add in the future) except the ones left unselected in the dialog.

Neptune doesn't change the DB instances specified in the static or exclusion lists when DB instances change roles between primary instance and Neptune Replica because of failover or promotion. 

You can associate a DB instance with more than one custom endpoint. For example, when you add a new DB instance to a cluster, it is added to every custom endpoint for which it is eligible, as determined by each endpoint's static or exclusion list.

If an endpoint includes a static list of DB instances, newly added Neptune Replicas aren't added to it. Conversely, if the endpoint has an exclusion list, newly added Neptune Replicas are added to it provided that they aren't named in the exclusion list.
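The membership rules above can be expressed as a small function. This is an illustrative sketch of the semantics, assuming hypothetical instance names; it is not a Neptune API.

```python
# Sketch of custom-endpoint membership: an endpoint holds either a static
# (inclusion) list or an exclusion list. New replicas join only endpoints
# that use an exclusion list (and aren't named in it).

def endpoint_members(cluster_instances, static=None, exclusion=None):
    """Return the instances a custom endpoint distributes connections among."""
    if static is not None:  # static list: membership is fixed
        return [i for i in cluster_instances if i in static]
    return [i for i in cluster_instances if i not in (exclusion or [])]

cluster = ["writer", "replica-1", "replica-2"]
cluster.append("replica-3")  # a newly added replica...

print(endpoint_members(cluster, static=["replica-1"]))
# ['replica-1'] -- static endpoint unchanged by the new replica
print(endpoint_members(cluster, exclusion=["writer"]))
# ['replica-1', 'replica-2', 'replica-3'] -- exclusion endpoint picks it up
```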

 If a Neptune Replica becomes unavailable, it remains associated with its custom endpoints. This is true whether it is unhealthy, stopped, rebooting, or unavailable for another reason. However, as long as it remains unavailable you can't connect to it through any endpoint.

Because newly created Neptune clusters have no custom endpoints, you must create and manage them yourself. This is also true for Neptune clusters restored from snapshots, because custom endpoints are not included in the snapshot. You have to create them again after restoring, and choose new endpoint names if the restored cluster is in the same region as the original one.

## Creating a custom endpoint
<a name="feature-custom-endpoint-create"></a>

You manage custom endpoints using the Neptune console, by navigating to the details page for your Neptune cluster and using the controls in the **Endpoints** section.

1. Sign in to the AWS Management Console, and open the Amazon Neptune console at [https://console.aws.amazon.com/neptune/home](https://console.aws.amazon.com/neptune/home).

1. Navigate to the cluster detail page.

1. Choose the `Create custom endpoint` action in the **Endpoints** section.

1. Choose a name for the custom endpoint that is unique for your user ID and region. The name must be 63 characters or less in length and take the following form:

   `endpointName.cluster-custom-customerDnsIdentifier.dnsSuffix`

   Because custom endpoint names don't include the name of your cluster, you don't have to change those names if you rename a cluster. However, you can't reuse the same custom endpoint name for more than one cluster in the same region. Give each custom endpoint a name that is unique across the clusters owned by your user ID within a particular region.

1. To choose a list of DB instances that remains the same even as the cluster expands, keep the check box **Attach future instances added to this cluster** clear. When that check box is selected, the custom endpoint dynamically adds any new instances as they are added to the cluster.
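The naming rules in step 4 can be checked with a hypothetical validator. The 63-character limit and per-region uniqueness come from this page; the character-set rule below (lowercase letters, digits, hyphens, starting with a letter) follows general DNS-label conventions and is an assumption, not stated here.

```python
# Hypothetical validator for custom endpoint names: at most 63 characters
# and unique among the custom endpoints you own in this region. The
# character-set rule is an assumed DNS-label convention.

import re

def valid_custom_endpoint_name(name: str, existing_names: set) -> bool:
    if len(name) > 63 or name in existing_names:
        return False
    return re.fullmatch(r"[a-z][a-z0-9-]*", name) is not None

taken = {"analytics-readers"}
print(valid_custom_endpoint_name("reporting-readers", taken))  # True
print(valid_custom_endpoint_name("analytics-readers", taken))  # False (name already used)
```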

## Viewing custom endpoints
<a name="feature-custom-endpoints-view"></a>

1. Sign in to the AWS Management Console, and open the Amazon Neptune console at [https://console.aws.amazon.com/neptune/home](https://console.aws.amazon.com/neptune/home).

1. Navigate to the cluster detail page of your DB cluster.

1. The **Endpoints** section only contains information about custom endpoints (details about the built-in endpoints are listed in the main **Details** section). To see details for a specific custom endpoint, select its name to bring up the detail page for that endpoint.

## Editing a custom endpoint
<a name="feature-custom-endpoint-edit"></a>

You can edit the properties of a custom endpoint to change which DB instances are associated with it. You can also switch between a static list and an exclusion list.

You can't connect to or use a custom endpoint while changes from an edit action are in progress. It might take a few minutes after you make a change before the endpoint status returns to **Available** and you can connect again.

1. Sign in to the AWS Management Console, and open the Amazon Neptune console at [https://console.aws.amazon.com/neptune/home](https://console.aws.amazon.com/neptune/home).

1. Navigate to the cluster detail page.

1. In the **Endpoints** section, choose the name of the custom endpoint you want to edit.

1. In the detail page for that endpoint, choose the **Edit** action.

## Deleting a custom endpoint
<a name="feature-custom-endpoint-delete"></a>

1. Sign in to the AWS Management Console, and open the Amazon Neptune console at [https://console.aws.amazon.com/neptune/home](https://console.aws.amazon.com/neptune/home).

1. Navigate to the cluster detail page.

1. In the **Endpoints** section, choose the name of the custom endpoint you want to delete.

1. In the detail page for that endpoint, choose the **Delete** action.

# Neptune Lab Mode
<a name="features-lab-mode"></a>

You can use Amazon Neptune *lab mode* to enable new features that are in the current Neptune engine release, but that aren't yet ready for production use and aren't enabled by default. This lets you try out these features in your development and test environments.

## Using Neptune Lab Mode
<a name="features-lab-mode-using"></a>

Use the [`neptune_lab_mode` DB cluster parameter](parameters.md#parameters-db-cluster-parameters-neptune_lab_mode) to enable or disable features. You do this by including `(feature name)=enabled` or `(feature name)=disabled` in the value of the `neptune_lab_mode` parameter in the DB Cluster Parameter group.

For example, in this engine release you might set the `neptune_lab_mode` parameter to `Streams=disabled, ReadWriteConflictDetection=enabled`.
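The `neptune_lab_mode` value is just a comma-separated list of `name=enabled|disabled` pairs, so a small helper can build or inspect it before you apply it to the parameter group. This is an illustrative utility, not part of any AWS SDK.

```python
# Helpers for the comma-separated neptune_lab_mode value format,
# e.g. "Streams=disabled, ReadWriteConflictDetection=enabled".

def parse_lab_mode(value: str) -> dict:
    """Split a neptune_lab_mode value into a {feature: setting} dict."""
    pairs = (item.split("=", 1) for item in value.split(",") if item.strip())
    return {k.strip(): v.strip() for k, v in pairs}

def format_lab_mode(settings: dict) -> str:
    """Build a neptune_lab_mode value from a {feature: setting} dict."""
    return ", ".join(f"{k}={v}" for k, v in settings.items())

settings = parse_lab_mode("Streams=disabled, ReadWriteConflictDetection=enabled")
print(settings["ReadWriteConflictDetection"])  # enabled
print(format_lab_mode(settings))  # Streams=disabled, ReadWriteConflictDetection=enabled
```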

For information about how to edit the DB cluster parameter group for your database, see [Editing a Parameter Group](parameter-groups.md#parameters-editgroup). Note that you cannot edit the default DB cluster parameter group; if you are using the default group, you must create a new DB cluster parameter group before you can set the `neptune_lab_mode` parameter.

**Note**  
When you make a change to a static DB cluster parameter such as `neptune_lab_mode`, you must re-start the primary (writer) instance of the cluster for the change to take effect. Before [Release: 1.2.0.0 (2022-07-21)](engine-releases-1.2.0.0.md), all the read-replicas in a DB cluster would then automatically be rebooted when the primary instance restarted.  
Beginning with [Release: 1.2.0.0 (2022-07-21)](engine-releases-1.2.0.0.md), restarting the primary instance does not cause any of the replicas to restart. This means that you must restart each instance separately to pick up a DB cluster parameter change (see [Parameter groups](parameter-groups.md)).

**Important**  
At present, if you supply the wrong lab-mode parameters or your request fails for another reason, you may not be notified of the failure. You should always verify that a lab-mode change request has succeeded by calling the [status API](access-graph-status.md) as shown below:  

```
curl -G https://your-neptune-endpoint:port/status
```
The status results include lab-mode information which will show whether or not the changes you requested were made:  

```
{
  "status":"healthy",
  "startTime":"Wed Dec 29 02:29:24 UTC 2021",
  "dbEngineVersion":"development",
  "role":"writer",
  "dfeQueryEngine":"viaQueryHint",
  "gremlin":{"version":"tinkerpop-3.5.2"},
  "sparql":{"version":"sparql-1.1"},
  "opencypher":{"version":"Neptune-9.0.20190305-1.0"},
  "labMode":{
    "ObjectIndex":"disabled",
    "ReadWriteConflictDetection":"enabled"
  },
  "features":{
    "LookupCache":{"status":"Available"},
    "ResultCache":{"status":"disabled"},
    "IAMAuthentication":"disabled",
    "Streams":"disabled",
    "AuditLog":"disabled"
  },
  "settings":{"clusterQueryTimeoutInMs":"120000"}
}
```
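Checking the `labMode` section of the status response against the settings you requested can be automated. This sketch parses a trimmed version of the sample response above; fetching the real response (for example, with the `curl` call shown earlier) is left to your client.

```python
# Sketch of verifying a lab-mode change: parse the /status response and
# compare its labMode section to the settings you requested.

import json

def lab_mode_applied(status_json: str, requested: dict) -> bool:
    """True if every requested lab-mode setting appears in the status."""
    lab_mode = json.loads(status_json).get("labMode", {})
    return all(lab_mode.get(k) == v for k, v in requested.items())

# Trimmed version of the sample status response above:
status = ('{"status": "healthy", "labMode": '
          '{"ObjectIndex": "disabled", "ReadWriteConflictDetection": "enabled"}}')

print(lab_mode_applied(status, {"ReadWriteConflictDetection": "enabled"}))  # True
print(lab_mode_applied(status, {"ObjectIndex": "enabled"}))                 # False
```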

The following features are currently accessed using lab mode:

## The OSGP index
<a name="features-lab-mode-features-osgp-index"></a>

Neptune can now maintain a fourth index, namely the OSGP index, which is useful for data sets having a large number of predicates (see [Enabling an OSGP Index](feature-overview-storage-indexing.md#feature-overview-storage-indexing-osgp)).

You can enable an OSGP index in a new, empty Neptune DB cluster by setting `ObjectIndex=enabled` in the `neptune_lab_mode` DB cluster parameter. An OSGP index can **only** be enabled in a new, empty DB cluster.

By default, the OSGP index is disabled.

**Note**  
After setting the `neptune_lab_mode` DB cluster parameter so as to enable the OSGP index, you must restart the writer instance of the cluster for the change to take effect.

**Warning**  
If you disable an enabled OSGP index by setting `ObjectIndex=disabled` and then later re-enable it after adding more data, the index will not build correctly. On-demand rebuilding of the index is not supported, so you should only enable the OSGP index when the database is empty.

## Enabling dictionary garbage collection
<a name="features-lab-mode-features-gc"></a>

When Neptune streams are not enabled, you can enable dictionary garbage collection for property-graph data using the `DictionaryGCMode` parameter, and control its concurrency using the `DictionaryGCConcurrency` parameter. See [Dictionary garbage collection](storage-gc.md) for more details.

## Formalized Transaction Semantics
<a name="features-lab-mode-features-transaction-semantics"></a>

Neptune has updated the formal semantics for concurrent transactions (see [Transaction Semantics in Neptune](transactions.md)).

Use `ReadWriteConflictDetection` as the key in the `neptune_lab_mode` parameter to enable or disable formalized transaction semantics.

By default, formalized transaction semantics are already enabled. If you want to revert to the earlier behavior, include `ReadWriteConflictDetection=disabled` in the value set for the DB Cluster `neptune_lab_mode` parameter.

## Extended datetime support
<a name="labmode-extended-datetime-support"></a>

Neptune has extended support for datetime functionality. To enable datetime values with extended formats, include `DatetimeMillisecond=enabled` in the value set for the DB Cluster `neptune_lab_mode` parameter.

## StrictTimeoutValidation
<a name="labmode-StrictTimeoutValidation"></a>

**Note**  
This feature is available starting in [Neptune engine release 1.3.2.0](engine-releases-1.3.2.0.md).

 Default value: enabled (disabled by default prior to [Neptune engine release 1.4.0.0](engine-releases-1.4.0.0.md)) 

 Allowed values: enabled/disabled 

 When this parameter is `enabled`, a per-query timeout value specified as a request option or a query hint cannot exceed the value set globally in the [`neptune_query_timeout`](parameters.md#parameters-db-cluster-parameters-neptune_query_timeout) parameter group setting. If the per-query timeout exceeds the global setting, Neptune throws an `InvalidParameterException`. In engine versions prior to 1.4.0.0, this parameter was `disabled` by default and had to be explicitly enabled. 

You can confirm this setting in a response from the `/status` endpoint, which reports it when the value is `disabled`. 

 For more information, see [Per-query timeouts](best-practices-gremlin-java-per-query-timeout.md). 
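
Client-side, you can mirror this validation before submitting a query, using the `clusterQueryTimeoutInMs` value reported by the status endpoint. The following Python sketch illustrates the rule; it is not Neptune's implementation:

```python
def validate_per_query_timeout(requested_ms, cluster_timeout_ms):
    """Mirror of the StrictTimeoutValidation rule: a per-query timeout
    (request option or query hint) may not exceed the cluster-wide
    neptune_query_timeout value."""
    if requested_ms > cluster_timeout_ms:
        raise ValueError(
            "per-query timeout %d ms exceeds neptune_query_timeout %d ms"
            % (requested_ms, cluster_timeout_ms))
    return requested_ms

# clusterQueryTimeoutInMs in the earlier status example is 120000.
print(validate_per_query_timeout(60000, 120000))
# → 60000  (accepted; a value above 120000 would raise)
```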

## AccurateQRCMemoryEstimation
<a name="labmode-AccurateQRCMemoryEstimation"></a>

**Note**  
This feature is available starting in [Neptune engine release 1.4.0.0](engine-releases-1.4.0.0.md).

 Default value: disabled 

 Allowed values: enabled/disabled 

 When the [Gremlin query result cache](https://docs.aws.amazon.com//neptune/latest/userguide/gremlin-results-cache.html) is enabled, query results can be cached on the database. By default, an approximate estimate is used to determine the size of each cached result. When the `AccurateQRCMemoryEstimation` lab-mode parameter is enabled, Neptune uses accurate size estimates for cached results instead. 

# The Amazon Neptune alternative query engine (DFE)
<a name="neptune-dfe-engine"></a>

Amazon Neptune has an alternative query engine known as the DFE that uses DB instance resources such as CPU cores, memory, and I/O more efficiently than the original Neptune engine.

**Note**  
With large data sets, the DFE engine may not run well on t3 instances.

The DFE engine runs SPARQL, Gremlin and openCypher queries, and supports a wide variety of plan types, including left-deep, bushy, and hybrid ones. Plan operators can invoke both compute operations, which run on a reserved set of compute cores, and I/O operations, each of which runs on its own thread in an I/O thread pool.

The DFE uses pre-generated statistics about your Neptune graph data to make informed decisions about how to structure queries. See [DFE statistics](neptune-dfe-statistics.md) for information about how these statistics are generated.

The choice of plan type and the number of compute threads used is made automatically based on pre-generated statistics and on the resources that are available in the Neptune head node. The order of results is not predetermined for plans that have internal compute parallelism.

# Controlling where the Neptune DFE engine is used
<a name="neptune-dfe-enabling-disabling"></a>

By default, the [neptune_dfe_query_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter of an instance is set to `viaQueryHint`, which causes the DFE engine to be used only for openCypher queries and for Gremlin and SPARQL queries that explicitly include the `useDFE` query hint set to `true`.

You can fully enable the DFE engine so that it is used wherever possible by setting the `neptune_dfe_query_engine` instance parameter to `enabled`.

You can also disable the DFE for a particular [Gremlin query](gremlin-query-hints-useDFE.md) or [SPARQL query](sparql-query-hints-useDFE.md) by setting the `useDFE` query hint to `false`, which prevents the DFE from executing that query.
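
For example, a SPARQL query that opts out of the DFE might include the hint like this (a sketch following Neptune's query-hint conventions; see the linked query-hint pages for the authoritative syntax):

```
PREFIX hint: <http://aws.amazon.com/neptune/vocab/v01/QueryHints#>

SELECT ?s ?o WHERE {
    hint:Query hint:useDFE "false" .
    ?s ?p ?o .
}
```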

You can determine whether or not the DFE is enabled in an instance using an [Instance Status](access-graph-status.md) call, like this:

```
curl -G https://your-neptune-endpoint:port/status
```

The status response then specifies whether the DFE is enabled or not:

```
{
  "status":"healthy",
  "startTime":"Wed Dec 29 02:29:24 UTC 2021",
  "dbEngineVersion":"development",
  "role":"writer",
  "dfeQueryEngine":"viaQueryHint",
  "gremlin":{"version":"tinkerpop-3.5.2"},
  "sparql":{"version":"sparql-1.1"},
  "opencypher":{"version":"Neptune-9.0.20190305-1.0"},
  "labMode":{
    "ObjectIndex":"disabled",
    "ReadWriteConflictDetection":"enabled"
  },
  "features":{
    "ResultCache":{"status":"disabled"},
    "IAMAuthentication":"disabled",
    "Streams":"disabled",
    "AuditLog":"disabled"
  },
  "settings":{"clusterQueryTimeoutInMs":"120000"}
}
```

The Gremlin `explain` and `profile` results tell you whether a query is being executed by the DFE. See [Information contained in a Gremlin `explain` report](gremlin-explain-api.md#gremlin-explain-api-results) for `explain` and [DFE `profile` reports](gremlin-profile-api.md#gremlin-profile-dfe-output) for `profile`.

Similarly, SPARQL `explain` tells you whether a SPARQL query is being executed by the DFE. See [Example of SPARQL `explain` output when the DFE is enabled](sparql-explain-examples.md#sparql-explain-output-dfe) and [`DFENode` operator](sparql-explain-operators.md#sparql-explain-operator-dfenode) for more details.

# Query constructs supported by the Neptune DFE
<a name="neptune-dfe-suppoorts-subset"></a>

Currently, the Neptune DFE supports a subset of SPARQL and Gremlin query constructs.

For SPARQL, this is the subset of conjunctive [basic graph patterns](https://www.w3.org/TR/sparql11-query/#BasicGraphPatterns).

For Gremlin, it is generally the subset of queries that consist of a chain of traversal steps and do not include some of the more complex steps.

You can find out whether one of your queries is being executed in whole or in part by the DFE as follows:
+ In Gremlin, `explain` and `profile` results tell you what parts of a query are being executed by the DFE, if any. See [Information contained in a Gremlin `explain` report](gremlin-explain-api.md#gremlin-explain-api-results) for `explain` and [DFE `profile` reports](gremlin-profile-api.md#gremlin-profile-dfe-output) for `profile`. Also, see [Tuning Gremlin queries using `explain` and `profile`](gremlin-traversal-tuning.md).

  Details about Neptune engine support for individual Gremlin steps are documented in [Gremlin step support](gremlin-step-support.md).
+ Similarly, SPARQL `explain` tells you whether a SPARQL query is being executed by the DFE. See [Example of SPARQL `explain` output when the DFE is enabled](sparql-explain-examples.md#sparql-explain-output-dfe) and [`DFENode` operator](sparql-explain-operators.md#sparql-explain-operator-dfenode) for more details.

# Managing statistics for the Neptune DFE to use
<a name="neptune-dfe-statistics"></a>

**Note**  
Support for openCypher depends on the DFE query engine in Neptune.  
The DFE engine was first available in lab mode in [Neptune engine release 1.0.3.0](engine-releases-1.0.3.0.md), and starting in [Neptune engine release 1.0.5.0](engine-releases-1.0.5.0.md), it became enabled by default, but only for use with query hints and for openCypher support.   
Beginning with [Neptune engine release 1.1.1.0](engine-releases-1.1.1.0.md) the DFE engine is no longer in lab mode, and is now controlled using the [neptune_dfe_query_engine](parameters.md#parameters-instance-parameters-neptune_dfe_query_engine) instance parameter in an instance's DB parameter group.

The DFE engine uses information about the data in your Neptune graph to make effective trade-offs when planning query execution. This information takes the form of statistics that include so-called characteristic sets and predicate statistics that can guide query planning.

Starting with [engine release 1.2.1.0](engine-releases-1.2.1.0.md), you can retrieve [summary information](neptune-graph-summary.md) about your graph from these statistics using the [GetGraphSummary](iam-dp-actions.md#getgraphsummary) API or the `summary` endpoint.

These DFE statistics are currently re-generated whenever either more than 10% of data in your graph has changed or when the latest statistics are more than 10 days old. However, these triggers may change in the future.

**Note**  
Statistics generation is disabled on `T3` and `T4g` instances because it can exceed the memory capacity of those instance types.

You can manage the generation of DFE statistics through one of the following endpoints:
+ `https://your-neptune-host:port/rdf/statistics`    (for SPARQL).
+ `https://your-neptune-host:port/propertygraph/statistics`    (for Gremlin and openCypher), and its alternate form: `https://your-neptune-host:port/pg/statistics`.

**Note**  
As of [engine release 1.1.1.0](engine-releases-1.1.1.0.md), the Gremlin statistics endpoint (`https://your-neptune-host:port/gremlin/statistics`) is being deprecated in favor of the `propertygraph` or `pg` endpoint. It is still supported for backward compatibility but may be removed in future releases.  
As of [engine release 1.2.1.0](engine-releases-1.2.1.0.md), the SPARQL statistics endpoint (`https://your-neptune-host:port/sparql/statistics`) is being deprecated in favor of the `rdf` endpoint. It is still supported for backward compatibility but may be removed in future releases.

In the examples below, `$STATISTICS_ENDPOINT` stands for any of these endpoint URLs.
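
For example, to target the property-graph statistics endpoint:

```shell
STATISTICS_ENDPOINT="https://your-neptune-host:port/pg/statistics"
```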

**Note**  
If a DFE statistics endpoint is on a reader instance, the only requests that it can process are [status requests](#neptune-dfe-statistics-status). Other requests will fail with a `ReadOnlyViolationException`.

## Size limits for DFE statistic generation
<a name="neptune-dfe-statistics-limits"></a>

Currently, DFE statistics generation halts if either of the following size limits is reached:
+ The number of characteristic sets generated may not exceed 50,000.
+ The number of predicate statistics generated may not exceed one million.

These limits may change.

## Current status of DFE statistics
<a name="neptune-dfe-statistics-status"></a>

You can check the current status of DFE statistics using the following `curl` request:

```
curl -G "$STATISTICS_ENDPOINT"
```

The response to a status request contains the following fields:
+ `status`  –   the HTTP return code of the request. If the request succeeded, the code is `200`. See [Common errors](#neptune-dfe-statistics-errors) for a list of common errors.
+ `payload`:
  + `autoCompute`  –   (Boolean) Indicates whether or not automatic statistics generation is enabled.
  + `active`  –   (Boolean) Indicates whether or not DFE statistics generation is enabled at all.
  + `statisticsId`  –   Reports the ID of the current statistics generation run. A value of `-1` indicates that no statistics have been generated.
  + `date`  –   The UTC time at which DFE statistics have most recently been generated, in ISO 8601 format.
**Note**  
Prior to [engine release 1.2.1.0](engine-releases-1.2.1.0.md), this was represented with minute precision, but from engine release 1.2.1.0 forward, it is represented with millisecond precision (for example, `2023-01-24T00:47:43.319Z`).
  + `note`  –   A note about problems in the case where statistics are invalid.
  + `signatureInfo`  –   Contains information about the characteristic sets generated in the statistics (prior to [engine release 1.2.1.0](engine-releases-1.2.1.0.md), this field was named `summary`). These are generally not directly actionable:
    + `signatureCount`  –   The total number of signatures across all characteristic sets.
    + `instanceCount`  –   The total number of characteristic-set instances.
    + `predicateCount`  –   The total number of unique predicates.

The response to a status request when no statistics have been generated looks like this:

```
{
  "status" : "200 OK",
  "payload" : {
    "autoCompute" : true,
    "active" : false,
    "statisticsId" : -1
   }
}
```

If DFE statistics are available, the response looks like this:

```
{
  "status" : "200 OK",
  "payload" : {
    "autoCompute" : true,
    "active" : true,
    "statisticsId" : 1588893232718,
    "date" : "2020-05-07T23:13Z",
    "summary" : {
      "signatureCount" : 5,
      "instanceCount" : 1000,
      "predicateCount" : 20
    }
  }
}
```

If the generation of DFE statistics failed, for example because it exceeded the [statistics size limitation](#neptune-dfe-statistics-limits), the response looks like this:

```
{
  "status" : "200 OK",
  "payload" : {
    "autoCompute" : true,
    "active" : false,
    "statisticsId" : 1588713528304,
    "date" : "2020-05-05T21:18Z",
    "note" : "Limit reached: Statistics are not available"
  }
}
```
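
A client can distinguish these three response shapes programmatically. The following is a minimal Python sketch, assuming that the `note` field is only present when generation failed (the sample payloads are condensed from the responses above):

```python
import json

def statistics_state(response_json):
    """Classify a statistics status response as 'none' (no statistics
    generated yet), 'failed' (generation halted with a note), or
    'available'."""
    payload = json.loads(response_json)["payload"]
    if payload.get("statisticsId", -1) == -1:
        return "none"
    if "note" in payload:
        return "failed"
    return "available"

no_stats = '{"status": "200 OK", "payload": {"autoCompute": true, "active": false, "statisticsId": -1}}'
available = '{"status": "200 OK", "payload": {"autoCompute": true, "active": true, "statisticsId": 1588893232718, "date": "2020-05-07T23:13Z"}}'
failed = '{"status": "200 OK", "payload": {"autoCompute": true, "active": false, "statisticsId": 1588713528304, "note": "Limit reached: Statistics are not available"}}'

print(statistics_state(no_stats))   # → none
print(statistics_state(available))  # → available
print(statistics_state(failed))     # → failed
```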

## Disabling automatic generation of DFE statistics
<a name="neptune-dfe-statistics-auto-disable"></a>

By default, auto-generation of DFE statistics is enabled when you enable DFE.

You can disable auto-generation as follows:

```
curl -X POST "$STATISTICS_ENDPOINT" -d '{ "mode" : "disableAutoCompute" }'
```

If the request is successful, the HTTP response code is `200` and the response is:

```
{
  "status" : "200 OK"
}
```

You can confirm that automatic generation is disabled by issuing a [status request](#neptune-dfe-statistics-status) and checking that the `autoCompute` field in the response is set to `false`.

Disabling auto-generation of statistics does not terminate a statistics computation that is in progress.

If you make a request to disable auto-generation to a reader instance rather than the writer instance of your DB cluster, the request fails with an HTTP return code of 400 and output like the following:

```
{
  "detailedMessage" : "Writes are not permitted on a read replica instance",
  "code" : "ReadOnlyViolationException",
  "requestId":"8eb8d3e5-0996-4a1b-616a-74e0ec32d5f7"
}
```

See [Common errors](#neptune-dfe-statistics-errors) for a list of other common errors.

## Re-enabling automatic generation of DFE statistics
<a name="neptune-dfe-statistics-auto-re-enable"></a>

By default, auto-generation of DFE statistics is already enabled when you enable DFE. If you disable auto-generation, you can re-enable it later as follows:

```
curl -X POST "$STATISTICS_ENDPOINT" -d '{ "mode" : "enableAutoCompute" }'
```

If the request is successful, the HTTP response code is `200` and the response is:

```
{
  "status" : "200 OK"
}
```

You can confirm that automatic generation is enabled by issuing a [status request](#neptune-dfe-statistics-status) and checking that the `autoCompute` field in the response is set to `true`.

## Manually triggering the generation of DFE statistics
<a name="neptune-dfe-statistics-manual"></a>

You can initiate DFE statistics generation manually as follows:

```
curl -X POST "$STATISTICS_ENDPOINT" -d '{ "mode" : "refresh" }'
```

If the request succeeds, the output is as follows, with an HTTP return code of 200:

```
{
  "status" : "200 OK",
  "payload" : {
    "statisticsId" : 1588893232718
  }
}
```

The `statisticsId` in the output is the ID of the statistics generation run now under way. If a run was already in progress when you made the request, the request returns the ID of that run rather than starting a new one. Only one statistics generation run can occur at a time.

If a fail-over happens while DFE statistics are being generated, the new writer node will pick up the last processed checkpoint and resume the statistics run from there.

## Using the `StatsNumStatementsScanned` CloudWatch metric to monitor statistics computation
<a name="neptune-dfe-statistics-monitoring"></a>

The `StatsNumStatementsScanned` CloudWatch metric returns the total number of statements scanned for statistics computation since the server started. It is updated at each statistics computation slice.

Every time statistics computation is triggered, this number increases, and when no computation is happening, it remains constant. Looking at a plot of `StatsNumStatementsScanned` values over time therefore gives you a pretty clear picture of when statistics computation was happening and how fast:

![\[Graph of StatsNumStatementsScanned metric values\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/StatsNumStatementsScanned-graph.png)


When computation is happening, the slope of the graph shows you how fast (the steeper the slope, the faster statistics are being computed).
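
That slope can also be computed numerically from metric samples. The following is a minimal Python sketch (timestamps and values are made up for illustration):

```python
def scan_rate(samples):
    """Given (timestamp_seconds, StatsNumStatementsScanned) samples,
    return per-interval scan rates in statements/second.  A zero rate
    means no statistics computation ran during that interval."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates

# Metric flat, then climbing while a computation runs, then flat again.
samples = [(0, 1000), (60, 1000), (120, 31000), (180, 61000), (240, 61000)]
print(scan_rate(samples))
# → [0.0, 500.0, 500.0, 0.0]
```

The two 500-statement/second intervals correspond to the steep section of the graph, when statistics were being computed.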

If the graph is simply a flat line at 0, the statistics feature has been enabled, but no statistics have been computed. If the statistics feature has been disabled, or if you're using an engine version that does not support statistics computation, the `StatsNumStatementsScanned` metric does not exist.

As mentioned earlier, you can disable statistics computation using the statistics API, but leaving it off can result in statistics not being up to date, which in turn can result in poor query plan generation for the DFE engine.

See [Monitoring Neptune Using Amazon CloudWatch](cloudwatch.md) for information about how to use CloudWatch.

## Using AWS Identity and Access Management (IAM) authentication with DFE statistics endpoints
<a name="neptune-dfe-statistics-iam-auth"></a>

You can access DFE statistics endpoints securely with IAM authentication by using [awscurl](https://github.com/okigan/awscurl) or any other tool that works with HTTPS and IAM. See [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl) to see how to set up the proper credentials. Once you have done that, you can then make a status request like this:

```
awscurl "$STATISTICS_ENDPOINT" \
    --region (your region) \
    --service neptune-db
```

Or, for example, you can create a JSON file named `request.json` that contains:

```
{ "mode" : "refresh" }
```

You can then manually initiate statistics generation like this:

```
awscurl "$STATISTICS_ENDPOINT" \
    --region (your region) \
    --service neptune-db \
    -X POST -d @request.json
```

## Deleting DFE statistics
<a name="neptune-dfe-statistics-delete"></a>

You can delete all statistics in the database by making an HTTP DELETE request to the statistics endpoint:

```
curl -X "DELETE" "$STATISTICS_ENDPOINT"
```

Valid HTTP return codes are:
+ `200`   –   the delete was successful.

  In this case, a typical response would look like:

  ```
  {
    "status" : "200 OK",
    "payload" : {
        "active" : false,
        "statisticsId" : -1
    }
  }
  ```
+ `204`   –   there were no statistics to delete.

  In this case, the response body is empty.

If you send a delete request to a statistics endpoint on a reader node, a `ReadOnlyViolationException` is thrown.

## Common error codes for DFE statistics request
<a name="neptune-dfe-statistics-errors"></a>

The following is a list of common errors that can occur when you make a request to a statistics endpoint:
+ `AccessDeniedException`   –   *Return code:* `400`. *Message:* `Missing Authentication Token`.
+ `BadRequestException` (for Gremlin and openCypher)   –   *Return code:* `400`. *Message:* `Bad route: /pg/statistics`.
+ `BadRequestException` (for RDF data)   –   *Return code:* `400`. *Message:* `Bad route: /rdf/statistics`.
+ `InvalidParameterException`   –   *Return code:* `400`. *Message:* `Statistics command parameter 'mode' has unsupported value 'the invalid value'`.
+ `MissingParameterException`   –   *Return code:* `400`. *Message:* `Content-type header not specified.`.
+ `ReadOnlyViolationException`   –   *Return code:* `400`. *Message:* `Writes are not permitted on a read replica instance`.

For example, if you make a request when the DFE and statistics are not enabled, you would get a response like the following:

```
{
  "code" : "BadRequestException",
  "requestId" : "b2b8f8ee-18f1-e164-49ea-836381a3e174",
  "detailedMessage" : "Bad route: /sparql/statistics"
}
```

# Getting a quick summary report about your graph
<a name="neptune-graph-summary"></a>

The Neptune graph summary API retrieves the following information about your graph:
+ For property (PG) graphs, the graph summary API returns a read-only list of node and edge labels and property keys, along with counts of nodes, edges, and properties.
+ For resource description framework (RDF) graphs, the graph summary API returns a read-only list of classes and predicate keys, along with counts of quads, subjects, and predicates.

**Note**  
The graph summary API was introduced in Neptune [engine release 1.2.1.0](engine-releases-1.2.1.0.md).

With the graph summary API, you can quickly gain a high-level understanding of your graph data size and content. You can also use the API interactively within a Neptune notebook using the [`%summary`](notebooks-magics.md#notebooks-line-magics-summary) Neptune Workbench magic. In a graph application, the API can be used to improve search results by providing discovered node or edge labels as part of the search.

Graph summary data is drawn from the [DFE statistics](neptune-dfe-statistics.md) computed by the [Neptune DFE engine](neptune-dfe-engine.md) during runtime, and is available whenever DFE statistics are available. Statistics are enabled by default when you create a new Neptune DB cluster.

**Note**  
Statistics generation is disabled on `t3` and `t4g` instance types (that is, on `db.t3.medium` and `db.t4g.medium` instances) to conserve memory. As a result, graph summary data is also unavailable on those instance types.

You can check the status of DFE statistics using the [statistics status API](neptune-dfe-statistics.md#neptune-dfe-statistics-status). As long as auto-generation of statistics has not [been disabled](neptune-dfe-statistics.md#neptune-dfe-statistics-auto-disable), statistics are automatically updated periodically.

If you want to be sure that statistics are as up to date as possible when you request a graph summary, you can [manually trigger a statistics update](neptune-dfe-statistics.md#neptune-dfe-statistics-manual) right before retrieving the summary. If the graph is changing while the statistics are being computed, they will necessarily lag slightly behind, but not by much.
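
For a property graph, refreshing the statistics and then requesting the summary looks like this (substitute your own Neptune host and port):

```
curl -X POST "https://your-neptune-host:port/pg/statistics" -d '{ "mode" : "refresh" }'
curl -G "https://your-neptune-host:port/pg/statistics/summary"
```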

## Using the graph summary API to retrieve graph summary information
<a name="neptune-graph-summary-retrieving"></a>

For a property graph that you query using Gremlin or openCypher, you can retrieve a graph summary from the property-graph summary endpoint. There is both a long and a short URI for this endpoint:
+ `https://your-neptune-host:port/propertygraph/statistics/summary`
+ `https://your-neptune-host:port/pg/statistics/summary`

For an RDF graph that you query using SPARQL, you can retrieve a graph summary from the RDF summary endpoint:
+ `https://your-neptune-host:port/rdf/statistics/summary`

These endpoints are read-only, and only support an HTTP `GET` operation. If `$GRAPH_SUMMARY_ENDPOINT` is set to the address of whichever endpoint you want to query, you can retrieve the summary data using `curl` and HTTP `GET` as follows:

```
curl -G "$GRAPH_SUMMARY_ENDPOINT"
```

If no statistics are available when you try to retrieve a graph summary, the response looks like this:

```
{
  "detailedMessage": "Statistics are not available. Summary can only be generated after statistics are available.",
  "requestId": "48c1f788-f80b-b69c-d728-3f6df579a5f6",
  "code": "StatisticsNotAvailableException"
}
```

## The `mode` URL query parameter for the graph summary API
<a name="neptune-graph-summary-mode"></a>

The graph summary API accepts a URL query parameter named `mode`, which can take one of two values, namely `basic` (the default) and `detailed`. For an RDF graph, the `detailed` mode graph summary response contains an additional `subjectStructures` field. For a property graph, the detailed graph summary response contains two additional fields, namely `nodeStructures` and `edgeStructures`.

To request a `detailed` graph summary response, include the `mode` parameter as follows:

```
curl -G "$GRAPH_SUMMARY_ENDPOINT?mode=detailed"
```

If the `mode` parameter isn't present, `basic` mode is used by default, so while it is possible to specify `?mode=basic` explicitly, this is not necessary.

## Graph summary response for a property graph (PG)
<a name="neptune-graph-summary-pg-response"></a>

For an empty property graph, the detailed graph summary response looks like this:

```
{
  "status" : "200 OK",
  "payload" : {
    "version" : "v1",
    "lastStatisticsComputationTime" : "2023-01-10T07:58:47.972Z",
    "graphSummary" : {
      "numNodes" : 0,
      "numEdges" : 0,
      "numNodeLabels" : 0,
      "numEdgeLabels" : 0,
      "nodeLabels" : [ ],
      "edgeLabels" : [ ],
      "numNodeProperties" : 0,
      "numEdgeProperties" : 0,
      "nodeProperties" : [ ],
      "edgeProperties" : [ ],
      "totalNodePropertyValues" : 0,
      "totalEdgePropertyValues" : 0,
      "nodeStructures" : [ ],
      "edgeStructures" : [ ]
    }
  }
}
```

A property graph (PG) summary response has the following fields:
+ **`status`**   –   the HTTP return code of the request. If the request succeeded, the code is 200.

  See [Common graph summary errors](#neptune-graph-summary-errors) for a list of common errors.
+ **`payload`**
  + **`version`**   –   The version of this graph summary response.
  + **`lastStatisticsComputationTime`**   –   The timestamp, in ISO 8601 format, of the time at which Neptune last computed [statistics](neptune-dfe-statistics.md).
  + **`graphSummary`**
    + **`numNodes`**   –   The number of nodes in the graph.
    + **`numEdges`**   –   The number of edges in the graph.
    + **`numNodeLabels`**   –   The number of distinct node labels in the graph.
    + **`numEdgeLabels`**   –   The number of distinct edge labels in the graph.
    + **`nodeLabels`**   –   List of distinct node labels in the graph.
    + **`edgeLabels`**   –   List of distinct edge labels in the graph.
    + **`numNodeProperties`**   –   The number of distinct node properties in the graph.
    + **`numEdgeProperties`**   –   The number of distinct edge properties in the graph.
    + **`nodeProperties`**   –   List of distinct node properties in the graph, along with the count of nodes where each property is used.
    + **`edgeProperties`**   –   List of distinct edge properties in the graph along with the count of edges where each property is used.
    + **`totalNodePropertyValues`**   –   Total number of usages of all node properties.
    + **`totalEdgePropertyValues`**   –   Total number of usages of all edge properties.
    + **`nodeStructures`**   –   *This field is only present when `mode=detailed` is specified in the request.* It contains a list of node structures, each of which contains the following fields:
      + **`count`**   –   Number of nodes that have this specific structure.
      + **`nodeProperties`**   –   List of node properties present in this specific structure.
      + **`distinctOutgoingEdgeLabels`**   –   List of distinct outgoing edge labels present in this specific structure.
    + **`edgeStructures`**   –   *This field is only present when `mode=detailed` is specified in the request.* It contains a list of edge structures, each of which contains the following fields:
      + **`count`**   –   Number of edges that have this specific structure.
      + **`edgeProperties`**   –   List of edge properties present in this specific structure.

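Several of these count fields are redundant with the lists they describe, which makes a quick client-side sanity check possible. The following is a minimal Python sketch using a tiny hand-made summary payload (not a real Neptune response):

```python
def check_pg_summary(graph_summary):
    """Return the names of count fields that disagree with the
    lengths of the lists they describe."""
    problems = []
    if len(graph_summary["nodeLabels"]) != graph_summary["numNodeLabels"]:
        problems.append("numNodeLabels")
    if len(graph_summary["edgeLabels"]) != graph_summary["numEdgeLabels"]:
        problems.append("numEdgeLabels")
    if len(graph_summary["nodeProperties"]) != graph_summary["numNodeProperties"]:
        problems.append("numNodeProperties")
    return problems

# Hand-made, internally consistent summary fragment for illustration.
summary = {
    "numNodes": 2, "numEdges": 1,
    "numNodeLabels": 2, "numEdgeLabels": 1,
    "nodeLabels": ["airport", "country"],
    "edgeLabels": ["route"],
    "numNodeProperties": 1,
    "nodeProperties": [{"code": 2}],
}
print(check_pg_summary(summary))
# → []  (all counts agree with their lists)
```

In a `detailed` response, the `count` values of the `nodeStructures` entries should also sum to `numNodes`.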
## Graph summary response for an RDF graph
<a name="neptune-graph-summary-rdf-response"></a>

For an empty RDF graph, the detailed graph summary response looks like this:

```
{
  "status" : "200 OK",
  "payload" : {
    "version" : "v1",
    "lastStatisticsComputationTime" : "2023-01-10T07:58:47.972Z",
    "graphSummary" : {
      "numDistinctSubjects" : 0,
      "numDistinctPredicates" : 0,
      "numQuads" : 0,
      "numClasses" : 0,
      "classes" : [ ],
      "predicates" : [ ],
      "subjectStructures" : [ ]
    }
  }
}
```

An RDF graph summary response has the following fields:
+ **`status`**   –   the HTTP return code of the request. If the request succeeded, the code is 200.

  See [Common graph summary errors](#neptune-graph-summary-errors) for a list of common errors.
+ **`payload`**
  + **`version`**   –   The version of this graph summary response.
  + **`lastStatisticsComputationTime`**   –   The timestamp, in ISO 8601 format, of the time at which Neptune last computed [statistics](neptune-dfe-statistics.md).
  + **`graphSummary`**
    + **`numDistinctSubjects`**   –   The number of distinct subjects in the graph.
    + **`numDistinctPredicates`**   –   The number of distinct predicates in the graph.
    + **`numQuads`**   –   The number of quads in the graph.
    + **`numClasses`**   –   The number of classes in the graph.
    + **`classes`**   –   List of classes in the graph.
    + **`predicates`**   –   List of predicates in the graph, along with the predicate counts.
    + **`subjectStructures`**   –   *This field is only present when `mode=detailed` is specified in the request.* It contains a list of subject structures, each of which contains the following fields:
      + **`count`**   –   Number of occurrences of this specific structure.
      + **`predicates`**   –   List of predicates present in this specific structure.

## Sample property-graph (PG) summary response
<a name="neptune-graph-summary-sample-pg-response"></a>

Here is the detailed summary response for a property graph that contains the [sample property-graph air routes dataset](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/seed/queries/propertygraph/gremlin/airports):

```
{
  "status" : "200 OK",
  "payload" : {
    "version" : "v1",
    "lastStatisticsComputationTime" : "2023-03-01T14:35:03.804Z",
    "graphSummary" : {
      "numNodes" : 3748,
      "numEdges" : 51300,
      "numNodeLabels" : 4,
      "numEdgeLabels" : 2,
      "nodeLabels" : [
        "continent",
        "country",
        "version",
        "airport"
      ],
      "edgeLabels" : [
        "contains",
        "route"
      ],
      "numNodeProperties" : 14,
      "numEdgeProperties" : 1,
      "nodeProperties" : [
        {
          "desc" : 3748
        },
        {
          "code" : 3748
        },
        {
          "type" : 3748
        },
        {
          "country" : 3503
        },
        {
          "longest" : 3503
        },
        {
          "city" : 3503
        },
        {
          "lon" : 3503
        },
        {
          "elev" : 3503
        },
        {
          "icao" : 3503
        },
        {
          "region" : 3503
        },
        {
          "runways" : 3503
        },
        {
          "lat" : 3503
        },
        {
          "date" : 1
        },
        {
          "author" : 1
        }
      ],
      "edgeProperties" : [
        {
          "dist" : 50532
        }
      ],
      "totalNodePropertyValues" : 42773,
      "totalEdgePropertyValues" : 50532,
      "nodeStructures" : [
        {
          "count" : 3471,
          "nodeProperties" : [
            "city",
            "code",
            "country",
            "desc",
            "elev",
            "icao",
            "lat",
            "lon",
            "longest",
            "region",
            "runways",
            "type"
          ],
          "distinctOutgoingEdgeLabels" : [
            "route"
          ]
        },
        {
          "count" : 161,
          "nodeProperties" : [
            "code",
            "desc",
            "type"
          ],
          "distinctOutgoingEdgeLabels" : [
            "contains"
          ]
        },
        {
          "count" : 83,
          "nodeProperties" : [
            "code",
            "desc",
            "type"
          ],
          "distinctOutgoingEdgeLabels" : [ ]
        },
        {
          "count" : 32,
          "nodeProperties" : [
            "city",
            "code",
            "country",
            "desc",
            "elev",
            "icao",
            "lat",
            "lon",
            "longest",
            "region",
            "runways",
            "type"
          ],
          "distinctOutgoingEdgeLabels" : [ ]
        },
        {
          "count" : 1,
          "nodeProperties" : [
            "author",
            "code",
            "date",
            "desc",
            "type"
          ],
          "distinctOutgoingEdgeLabels" : [ ]
        }
      ],
      "edgeStructures" : [
        {
          "count" : 50532,
          "edgeProperties" : [
            "dist"
          ]
        }
      ]
    }
  }
}
```

## Sample RDF graph summary response
<a name="neptune-graph-summary-sample-rdf-response"></a>

Here is the detailed summary response for an RDF graph that contains the [sample RDF air routes dataset](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/seed/queries/rdf/sparql/airports):

```
{
  "status" : "200 OK",
  "payload" : {
    "version" : "v1",
    "lastStatisticsComputationTime" : "2023-03-01T14:54:13.903Z",
    "graphSummary" : {
      "numDistinctSubjects" : 54403,
      "numDistinctPredicates" : 19,
      "numQuads" : 158571,
      "numClasses" : 4,
      "classes" : [
        "http://kelvinlawrence.net/air-routes/class/Version",
        "http://kelvinlawrence.net/air-routes/class/Airport",
        "http://kelvinlawrence.net/air-routes/class/Continent",
        "http://kelvinlawrence.net/air-routes/class/Country"
      ],
      "predicates" : [
        {
          "http://kelvinlawrence.net/air-routes/objectProperty/route" : 50656
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/dist" : 50656
        },
        {
          "http://kelvinlawrence.net/air-routes/objectProperty/contains" : 7004
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/code" : 3747
        },
        {
          "http://www.w3.org/2000/01/rdf-schema#label" : 3747
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/type" : 3747
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/desc" : 3747
        },
        {
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" : 3747
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/icao" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/lat" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/region" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/runways" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/longest" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/elev" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/lon" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/country" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/city" : 3502
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/author" : 1
        },
        {
          "http://kelvinlawrence.net/air-routes/datatypeProperty/date" : 1
        }
      ],
      "subjectStructures" : [
        {
          "count" : 50656,
          "predicates" : [
            "http://kelvinlawrence.net/air-routes/datatypeProperty/dist"
          ]
        },
        {
          "count" : 3471,
          "predicates" : [
            "http://kelvinlawrence.net/air-routes/datatypeProperty/city",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/code",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/country",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/desc",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/elev",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/icao",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/lat",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/lon",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/longest",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/region",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/runways",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/type",
            "http://kelvinlawrence.net/air-routes/objectProperty/route",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "http://www.w3.org/2000/01/rdf-schema#label"
          ]
        },
        {
          "count" : 238,
          "predicates" : [
            "http://kelvinlawrence.net/air-routes/datatypeProperty/code",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/desc",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/type",
            "http://kelvinlawrence.net/air-routes/objectProperty/contains",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "http://www.w3.org/2000/01/rdf-schema#label"
          ]
        },
        {
          "count" : 31,
          "predicates" : [
            "http://kelvinlawrence.net/air-routes/datatypeProperty/city",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/code",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/country",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/desc",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/elev",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/icao",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/lat",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/lon",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/longest",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/region",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/runways",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/type",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "http://www.w3.org/2000/01/rdf-schema#label"
          ]
        },
        {
          "count" : 6,
          "predicates" : [
            "http://kelvinlawrence.net/air-routes/datatypeProperty/code",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/desc",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/type",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "http://www.w3.org/2000/01/rdf-schema#label"
          ]
        },
        {
          "count" : 1,
          "predicates" : [
            "http://kelvinlawrence.net/air-routes/datatypeProperty/author",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/code",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/date",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/desc",
            "http://kelvinlawrence.net/air-routes/datatypeProperty/type",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "http://www.w3.org/2000/01/rdf-schema#label"
          ]
        }
      ]
    }
  }
}
```

## Using AWS Identity and Access Management (IAM) authentication with graph summary endpoints
<a name="neptune-graph-summary-iam"></a>

You can access graph summary endpoints securely with IAM authentication by using [awscurl](https://github.com/okigan/awscurl) or any other tool that works with HTTPS and IAM. See [Using `awscurl` with temporary credentials to securely connect to a DB cluster with IAM authentication enabled](iam-auth-connect-command-line.md#iam-auth-connect-awscurl) for how to set up the proper credentials. Once you have done that, you can make requests like this:

```
awscurl "$GRAPH_SUMMARY_ENDPOINT" \
    --region (your region) \
    --service neptune-db
```

**Important**  
The IAM identity or role that creates the temporary credentials must have an IAM policy attached that allows the [GetGraphSummary](iam-dp-actions.md#getgraphsummary) IAM action.

See [IAM Authentication Errors](errors-engine-codes.md#errors-iam-auth) for a list of common IAM errors that you may encounter.

## Common error codes that a graph summary request may return
<a name="neptune-graph-summary-errors"></a>

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/neptune/latest/userguide/neptune-graph-summary.html)

For example, if you make a request to a graph summary endpoint in a Neptune database that has IAM authentication enabled, and the necessary permissions are not present in the requester's IAM policy, you get a response like the following:

```
{
  "detailedMessage": "User: arn:aws:iam::(account ID):(user or user name) is not authorized to perform: neptune-db:GetGraphSummary on resource: arn:aws:neptune-db:(region):(account ID):(cluster resource ID)/*",
  "requestId": "7ac2b98e-b626-d239-1d05-74b4c88fce82",
  "code": "AccessDeniedException"
}
```

# Amazon Neptune JDBC connectivity
<a name="neptune-jdbc"></a>

Amazon Neptune has released an [open-source JDBC driver](https://github.com/aws/amazon-neptune-jdbc-driver) that supports openCypher, Gremlin, SQL-Gremlin, and SPARQL queries. JDBC connectivity makes it easy to connect to Neptune with business intelligence (BI) tools such as Tableau. There is no additional cost to using the JDBC driver with Neptune — you still pay only for the Neptune resources that are consumed.

The driver is compatible with JDBC 4.2, and requires at least Java 8. See the [JDBC API documentation](https://docs.oracle.com/javase/8/docs/technotes/guides/jdbc/) for information about how to use a JDBC driver.

The GitHub project, where you can file issues and open feature requests, contains detailed documentation for the driver:

**[JDBC Driver for Amazon Neptune](https://github.com/aws/amazon-neptune-jdbc-driver#readme)**
+ [Using SQL with the JDBC driver](https://github.com/aws/amazon-neptune-jdbc-driver/blob/develop/markdown/sql.md)
+ [Using Gremlin with the JDBC Driver](https://github.com/aws/amazon-neptune-jdbc-driver/blob/develop/markdown/gremlin.md)
+ [Using openCypher with the JDBC Driver](https://github.com/aws/amazon-neptune-jdbc-driver/blob/develop/markdown/opencypher.md)
+ [Using SPARQL with the JDBC Driver](https://github.com/aws/amazon-neptune-jdbc-driver/blob/develop/markdown/sparql.md)

# Getting started with the Neptune JDBC driver
<a name="neptune-jdbc-getting-started"></a>

To use the Neptune JDBC driver to connect to a Neptune instance, either the driver must be deployed on an Amazon EC2 instance in the same VPC as your Neptune DB cluster, or the Neptune instance must be reachable through an SSH tunnel or load balancer. An SSH tunnel can be set up internally by the driver, or externally.

You can download the driver [here](https://github.com/aws/amazon-neptune-jdbc-driver/releases). The driver comes packaged as a single JAR file with a name like `neptune-jdbc-1.0.0-all.jar`. To use it, place the JAR file in the `classpath` of your application. Or, if your application uses Maven or Gradle, you can use the appropriate Maven or Gradle commands to install the driver from the JAR.

The driver needs a JDBC connection URL to connect with Neptune, in a form like this:

```
jdbc:neptune:(connection type)://(host);property=value;property=value;...;property=value
```

The sections for each query language in the GitHub project describe the properties that you can set in the JDBC connection URL for that query language.
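As a sketch of this URL shape, the following helper assembles a connection URL from a connection type, a host, and a property map. The endpoint and the use of the `logLevel` property here are illustrative only; consult the GitHub documentation for the properties each query language actually supports.

```
import java.util.LinkedHashMap;
import java.util.Map;

public class NeptuneJdbcUrl {
    // Assemble jdbc:neptune:(connection type)://(host);property=value;...
    static String buildUrl(String connectionType, String host,
                           Map<String, String> properties) {
        StringBuilder url = new StringBuilder("jdbc:neptune:")
                .append(connectionType).append("://").append(host);
        for (Map.Entry<String, String> property : properties.entrySet()) {
            url.append(';').append(property.getKey())
               .append('=').append(property.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("logLevel", "trace"); // example property
        System.out.println(buildUrl("opencypher",
                "bolt://neptune-example.com:8182", properties));
        // prints jdbc:neptune:opencypher://bolt://neptune-example.com:8182;logLevel=trace
    }
}
```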

If the JAR file is in your application's `classpath`, no other configuration is necessary. You can connect the driver using the JDBC `DriverManager` interface and a Neptune connection string. For example, if your Neptune DB cluster is accessible through the endpoint `neptune-example.com` on port 8182, you would be able to connect with openCypher like this:

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

void example() throws SQLException {
    String url = "jdbc:neptune:opencypher://bolt://neptune-example.com:8182";

    // try-with-resources closes the statement and connection
    // even if a query fails
    try (Connection connection = DriverManager.getConnection(url);
         Statement statement = connection.createStatement()) {
        // Run queries with the statement here.
    }
}
```

The documentation sections in the GitHub project for each query language describe how to construct the connection string when using that query language.

# Using Tableau with the Neptune JDBC driver
<a name="neptune-jdbc-tableau"></a>

To use Tableau with the Neptune JDBC driver, start by downloading and installing the most recent version of [Tableau Desktop](https://www.tableau.com/products/desktop). Download the JAR file for the Neptune JDBC driver, and also the Neptune Tableau connector file (a `.taco` file).

**To connect to Tableau for Neptune on a Mac**

1. Place the Neptune JDBC driver JAR file in the `/Users/(your user name)/Library/Tableau/Drivers` folder.

1. Place the Neptune Tableau connector `.taco` file in the `/Users/(your user name)/Documents/My Tableau Repository/Connectors` folder.

1. If you have IAM authentication enabled, set up the environment for it. Note that environment variables set in `.zprofile`, `.zshenv`, `.bash_profile`, and so forth will not work. The environment variables must be set so that they can be loaded by a GUI application.

   One way to set your credentials is by placing your access key and secret key in the `/Users/(your user name)/.aws/credentials` file.

   An easy way to set the service region is to open a terminal and enter the following command, using your application's region (for example, `us-east-1`):

   ```
   launchctl setenv SERVICE_REGION region name
   ```

   There are other ways to set environment variables that persist after a restart, but whatever technique you use must set variables that are accessible to a GUI application.

1. To start Tableau with these environment variables loaded, launch it from a terminal with this command:

   ```
   /Applications/Tableau/Desktop/2021.1.app/Contents/MacOS/Tableau
   ```

**To connect to Tableau for Neptune on a Windows machine**

1. Place the Neptune JDBC driver JAR file in the `C:\Program Files\Tableau\Drivers` folder.

1. Place the Neptune Tableau connector `.taco` file in the `C:\Users\(your user name)\Documents\My Tableau Repository\Connectors` folder.

1. If you have IAM authentication enabled, set up the environment for it.

   This can be as simple as setting user `ACCESS_KEY`, `SECRET_KEY`, and `SERVICE_REGION` environment variables.

With Tableau open, select **More** on the left side of the window. If the Tableau connector file is properly located, you can select **Amazon Neptune by AWS** in the list that appears:

![\[Choosing SQL in Tableau\]](http://docs.aws.amazon.com/neptune/latest/userguide/images/tableau-sql-gremlin.png)

You should not have to edit the port or add any connection options. Enter your Neptune endpoint and set your IAM and SSL configuration (you must enable SSL if you are using IAM).

When you select **Sign In**, it may take more than 30 seconds to connect if your graph is large. During this time, Tableau collects the vertex and edge tables, joins vertices on edges, and creates visualizations.

# Troubleshooting a JDBC driver connection
<a name="neptune-jdbc-troubleshooting"></a>

If the driver fails to connect to the server, use the `isValid` function of the JDBC `Connection` object to check whether the connection is valid. If the function returns `false`, meaning that the connection is invalid, check that the endpoint being connected to is correct and that you are in the VPC of your Neptune DB cluster or that you have a valid SSH tunnel to the cluster.

If you get a `No suitable driver found for (connection string)` response from the `DriverManager.getConnection` call, there is likely an issue at the beginning of your connection string. Make sure that your connection string starts like this:

```
jdbc:neptune:opencypher://...
```
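
These two checks can be sketched together as follows. The endpoint below is hypothetical, `hasNeptunePrefix` simply encodes the prefix advice above, and `isValid` is the standard JDBC validity check:

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectionDiagnostics {
    // A connection string must start with the jdbc:neptune: prefix
    // for DriverManager to find the Neptune driver.
    static boolean hasNeptunePrefix(String url) {
        return url != null && url.startsWith("jdbc:neptune:");
    }

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:neptune:opencypher://bolt://neptune-example.com:8182";
        if (!hasNeptunePrefix(url)) {
            throw new IllegalArgumentException(
                    "Connection string must start with jdbc:neptune:");
        }
        try (Connection connection = DriverManager.getConnection(url)) {
            // isValid waits up to 5 seconds for the server to respond.
            if (!connection.isValid(5)) {
                System.err.println("Invalid connection: check the endpoint, "
                        + "your VPC placement, or your SSH tunnel.");
            }
        }
    }
}
```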

To gather more information about the connection, you can add a `LogLevel` to your connection string like this:

```
jdbc:neptune:opencypher://(host):(port);logLevel=trace
```

Alternatively, you can add `properties.put("logLevel", "trace")` in your input properties to log trace information.

# Amazon Neptune engine updates
<a name="features-engine-updates"></a>

Amazon Neptune releases engine updates regularly. You can determine which engine release version you currently have installed using the [instance-status API](access-graph-status.md). 

Engine releases are listed at [Engine releases for Amazon Neptune](engine-releases.md), and patches are listed at [Latest Updates](doc-history.md).

You can find more information about how the updates are released and how to upgrade the Neptune engine in your database at [Cluster maintenance](cluster-maintenance.md). For example, version numbering is explained in [Engine version numbers](cluster-maintenance.md#engine-version-numbers).

# Exception Handling and Retries
<a name="transactions-exceptions"></a>

Building robust applications on Neptune often means preparing for the unexpected, especially when it comes to handling errors returned by the database. One of the most common responses to server-side exceptions is to retry the failed operation. While retry logic is essential for resilient systems, you need to recognize that not all errors should be treated the same way. Rather than relying on generic retry behaviors, a thoughtful approach can help you build more reliable and efficient applications.

## Why retry logic matters
<a name="why-retry-logic-matters"></a>

Retry logic is a critical component of any distributed application. Transient issues such as network instability, temporary resource constraints, or concurrent modification conflicts can cause operations to fail. In many cases, these failures don't indicate a permanent problem and can be resolved by waiting and trying again. Implementing a solid retry strategy acknowledges the reality of imperfect environments in distributed systems, ensuring stronger reliability and continuity with less need for manual intervention.

## The risks of indiscriminate retries
<a name="risks-of-indiscriminate-retries"></a>

Retrying every error by default can lead to several unintended consequences:
+ **Increased contention** – When operations that fail due to high concurrency are retried repeatedly, the overall contention can get worse. This might result in a cycle of failed transactions and degraded performance.
+ **Resource exhaustion** – Indiscriminate retries can consume additional system resources, both on the client and server side. This can potentially lead to throttling or even service degradation.
+ **Increased latency for clients** – Excessive retries can cause significant delays for client applications, especially if each retry involves waiting periods. This can negatively impact user experience and downstream processes.

## Developing a practical retry strategy
<a name="developing-practical-retry-strategy"></a>

To build a resilient and efficient application, develop a retry strategy that's tailored to the specific error conditions your application might encounter. Here are some considerations to guide your approach:
+ **Identify retryable errors** – Not all exceptions should be retried. For example, syntax errors, authentication failures, or invalid queries should not trigger a retry. Neptune provides [error codes](errors-engine-codes.md) and general recommendations for which errors are safe to retry, but you need to implement the logic that fits your use case.
+ **Implement exponential backoff** – For transient errors, use an exponential backoff strategy to progressively increase the wait time between retries. This helps alleviate contention and reduces the risk of cascading failures.
+ **Consider initial pause length** – Performing the first retry too quickly might just end with the same error if the server hasn't been given enough time to release resources that the query needs to succeed. A longer pause in the right situations could reduce wasted requests and server pressure.
+ **Add jitter to backoff** – While exponential backoff is effective, it can still lead to synchronized retry storms if many clients fail at the same time and then retry together. Adding jitter, a small random variation to the backoff delay, helps spread out retry attempts, reducing the chance that all clients retry simultaneously and cause another spike in load.
+ **Limit retry attempts** – Set a reasonable maximum number of retries to prevent infinite loops and resource exhaustion.
+ **Monitor and adjust** – Continuously monitor your application's error rate and adjust your retry strategy as needed. If you notice a high number of retries for a particular operation, consider whether the operation can be optimized or serialized.
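
The considerations above can be sketched as a small retry wrapper. Everything here is an assumption to tune for your workload: the set of retryable error codes, the `RetryableQueryException` stand-in for however your client surfaces Neptune error codes, and all timing values.

```
import java.util.Random;
import java.util.Set;
import java.util.concurrent.Callable;

public class RetryHelper {
    // Hypothetical retryable error codes; build your own list from the
    // Neptune engine error-code documentation.
    static final Set<String> RETRYABLE = Set.of(
            "ConcurrentModificationException", "ThrottlingException");

    static boolean isRetryable(String errorCode) {
        return RETRYABLE.contains(errorCode);
    }

    // Exponential backoff with full jitter: a random delay in
    // [0, min(maxDelayMillis, initialPauseMillis * 2^attempt)].
    static long delayMillis(int attempt, long initialPauseMillis,
                            long maxDelayMillis, Random random) {
        long capped = Math.min(maxDelayMillis,
                initialPauseMillis << Math.min(attempt, 20));
        return (long) (random.nextDouble() * capped);
    }

    static <T> T withRetries(Callable<T> operation, int maxAttempts,
                             long initialPauseMillis, long maxDelayMillis)
            throws Exception {
        Random random = new Random();
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (RetryableQueryException e) {
                if (!isRetryable(e.errorCode) || attempt + 1 >= maxAttempts) {
                    throw e; // not transient, or retry budget exhausted
                }
                Thread.sleep(delayMillis(attempt, initialPauseMillis,
                        maxDelayMillis, random));
            }
        }
    }

    // Stand-in for however your client exposes Neptune error codes.
    static class RetryableQueryException extends Exception {
        final String errorCode;
        RetryableQueryException(String errorCode) { this.errorCode = errorCode; }
    }
}
```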

## Example scenarios
<a name="example-scenarios"></a>

The right retry strategy depends on the nature of the failure, the workload, and the error patterns you observe. The following table summarizes some common failure scenarios and how the retry strategy considerations apply to each. Explanatory paragraphs follow for additional context.


|  Scenario  |  Retryable?  |  Backoff & Jitter  |  Initial Pause  |  Retry Limit  |  Monitor & Adjust  | 
| --- | --- | --- | --- | --- | --- | 
|  Occasional CME on Short Queries  |  Yes  |  Short backoff, add jitter  |  Short (for example, 100ms)  |  High  |  Watch for rising CME rates  | 
|  Frequent CME on Longer-Running Queries  |  Yes  |  Longer backoff, add jitter  |  Longer (for example, 2s)  |  Moderate  |  Investigate and reduce contention  | 
|  Memory Limits on Expensive Queries  |  Yes  |  Long backoff  |  Long (for example, 5-10s)  |  Low  |  Optimize query, alert if persistent  | 
|  Timeout on Moderate Queries  |  Maybe  |  Moderate backoff, add jitter  |  Moderate (for example, 1s)  |  Low to Moderate  |  Assess server load and query design  | 

### Scenario 1: Occasional CME on short queries
<a name="scenario-1-occasional-cme"></a>

For a workload where `ConcurrentModificationException` appears infrequently during short, simple updates, these errors are typically transient and safe to retry. Use a short initial pause (for example, 100 milliseconds) before the first retry. This time allows any brief lock to clear. Combine this with a short exponential backoff and jitter to avoid synchronized retries. Since the cost of retrying is low, a higher retry limit is reasonable. Still, monitor the CME rate to catch any trend toward increased contention in your data.

### Scenario 2: Frequent CME on long-running queries
<a name="scenario-2-frequent-cme"></a>

If your application sees frequent CMEs on long-running queries, this suggests more severe contention. In this case, start with a longer initial pause (for example, 2 seconds), to give the current query holding the lock enough time to complete. Use a longer exponential backoff and add jitter. Limit the number of retries to avoid excessive delays and resource usage. If contention persists, review your workload for patterns and consider serializing updates or reducing concurrency to address the root cause.

### Scenario 3: Memory limits on expensive queries
<a name="scenario-3-memory-limits"></a>

When memory-based errors occur during a known resource-intensive query, retries can make sense, but only after a long initial pause (for example, 5 to 10 seconds or more) to allow the server to release resources. Use a long backoff strategy and set a low retry limit, since repeated failures are unlikely to resolve without changes to the query or workload. Persistent errors should trigger alerts and prompt a review of query complexity and resource usage.

### Scenario 4: Timeout on moderate queries
<a name="scenario-4-timeout-moderate"></a>

A timeout on a moderately expensive query is a more ambiguous case. Sometimes, a retry might succeed if the timeout was due to a temporary spike in server load or network conditions. Start with a moderate initial pause (for example, 1 second) to give the system a chance to recover. Apply a moderate backoff and add jitter to avoid synchronized retries. Keep the retry limit low to moderate, since repeated timeouts might indicate a deeper issue with the query or the server's capacity. Monitor for patterns: if timeouts become frequent, assess whether the query needs optimization or if the Neptune cluster is under-provisioned.

## Monitoring and observability
<a name="monitoring-and-observability"></a>

Monitoring is a critical part of any retry strategy. Effective observability helps you understand how well your retry logic is working and provides early signals when something in your workload or cluster configuration needs attention.

### MainRequestQueuePendingRequests
<a name="main-request-queue"></a>

This CloudWatch metric tracks the number of requests waiting in Neptune's input queue. A rising value indicates that queries are backing up, which can be a sign of excessive contention, under-provisioned resources, or retry storms. Monitoring this metric helps you spot when your retry strategy is causing or compounding queuing issues, and can prompt you to adjust your approach before failures escalate.

### Other CloudWatch metrics
<a name="other-cloudwatch-metrics"></a>

Other [Neptune metrics](cw-metrics.md) like `CPUUtilization`, `TotalRequestsPerSecond`, and query latency provide additional context. For example, high CPU and I/O combined with growing queue lengths might indicate that your cluster is overloaded, or that queries are too large or too frequent. [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) can be set on these metrics to alert you to abnormal behavior and help you correlate spikes in errors or retries with underlying resource constraints.

### Neptune Status and Query APIs
<a name="neptune-status-query-apis"></a>

The Neptune [Status API for Gremlin](gremlin-api-status.md) and the analogous APIs for [openCypher](access-graph-opencypher-status.md) and [SPARQL](sparql-api-status.md) give a real-time view of the queries accepted and running on the cluster, which is useful for diagnosing bottlenecks or understanding the impact of retry logic in real time.

By combining these monitoring tools, you can:
+ Detect when retries are contributing to queuing and performance degradation.
+ Identify when to scale your Neptune cluster or optimize queries.
+ Validate that your retry strategy is resolving transient failures without masking deeper issues.
+ Receive early warnings about emerging contention or resource exhaustion.

Proactive monitoring and alerting are essential for maintaining a healthy Neptune deployment, especially as your application's concurrency and complexity grow.