

# Migrating from Neo4j to Amazon Neptune
Migrating from Neo4j

Neo4j and Amazon Neptune are both graph databases designed for online transactional graph workloads, and both support the labeled property graph data model. These similarities make Neptune a common choice for customers seeking to migrate their existing Neo4j applications. However, these migrations are not simply lift-and-shift, because the two databases differ in language and feature support, operational characteristics, server architecture, and storage capabilities.

This page frames the migration process and highlights things to consider before migrating a Neo4j graph application to Neptune. These considerations apply to any Neo4j graph application, whether it runs on a Community, Enterprise, or Aura database. Although each solution is unique and may require additional procedures, all migrations follow the same general pattern.

Each of the steps described in the following sections includes considerations and recommendations to simplify the migration process. Additionally, there are [open-source tools and blog posts](migration-resources.md) that describe the process, and a [feature compatibility section](migration-compatibility.md) with recommended architectural options.

**Topics**
+ [General information about migrating from Neo4j to Neptune](migrating-from-neo4j-general.md)
+ [Preparing to migrate from Neo4j to Neptune](preparing-to-migrate-from-neo4j.md)
+ [Provisioning infrastructure when migrating from Neo4j to Neptune](migration-provisioning-infrastructure.md)
+ [Data migration from Neo4j to Neptune](migration-data-migration.md)
+ [Application migration from Neo4j to Neptune](migration-app-migration.md)
+ [Neptune compatibility with Neo4j](migration-compatibility.md)
+ [Rewriting Cypher queries to run in openCypher on Neptune](migration-opencypher-rewrites.md)
+ [Resources for migrating from Neo4j to Neptune](migration-resources.md)

# General information about migrating from Neo4j to Neptune
General information

With Neptune [support for the openCypher query language](feature-opencypher-compliance.md), you can move most Neo4j workloads that use the Bolt protocol or HTTPS to Neptune. However, openCypher is an open-source specification that contains most but not all of the functionality supported by other databases such as Neo4j.

Although compatible in many ways, Neptune is not a drop-in replacement for Neo4j. Neptune is a fully managed graph database service with enterprise features such as high availability and high durability, and it is architecturally different from Neo4j. Neptune is instance-based, with a single primary writer instance and up to 15 read-replica instances that let you scale read capacity horizontally. Using [Neptune Serverless](neptune-serverless.md), you can automatically scale your compute capacity up and down depending on query volume. This is independent of Neptune storage, which scales automatically as you add data.

Neptune supports the open-source [openCypher standard specification, version 9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf). At AWS, we believe that open source is good for everyone and we are committed both to bringing the value of open source to our customers, and to bringing the operational excellence of AWS to open source communities.

However, many applications running on Neo4j also use proprietary features that are not open-sourced and that Neptune doesn't support. For example, Neptune doesn't support APOC procedures, some Cypher-specific clauses and functions, or the `Char` and `Duration` data types. Neptune automatically casts unsupported data types to [data types that are supported](bulk-load-tutorial-format-opencypher.md#bulk-load-tutorial-format-opencypher-data-types).

In addition to openCypher, Neptune also supports the [Apache TinkerPop Gremlin](https://tinkerpop.apache.org/docs/current/reference/#traversal) query language for property graphs (as well as SPARQL for RDF data). Gremlin can interoperate with openCypher on the same property graph, and in many cases you can use Gremlin to supply functionality that openCypher does not provide. Below is a quick comparison of the two languages:


|  | openCypher | Gremlin | 
| --- | --- | --- | 
| Style | Declarative | Imperative | 
| Syntax |  Pattern matching <pre>MATCH p=(a)-[:route]->(d)<br />WHERE a.code='ANC'<br />RETURN p<br /></pre>  |  Traversal based <pre>g.V().has('code', 'ANC').<br />out('route').path().<br />by(elementMap())</pre>  | 
| Ease of use | SQL-inspired, readable by non-programmers | Steeper learning curve, similar to programming languages like Java | 
| Flexibility | Low | High | 
| Query support | String-based queries | String-based queries or in-line code supported by client libraries | 
| Clients | HTTPS and Bolt | HTTPS and Websockets | 

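To make the comparison concrete, here is a minimal sketch of how the same lookup could be submitted to Neptune's HTTPS query endpoints in each language. The `/openCypher` and `/gremlin` endpoint paths and the `query`/`gremlin` payload fields follow Neptune's HTTPS query interface; the endpoint hostname is a placeholder, and no request is actually sent here.

```python
# Placeholder for your cluster endpoint and port.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"

# The same route lookup from the table above, in each language.
opencypher_query = (
    "MATCH p=(a)-[:route]->(d) "
    "WHERE a.code='ANC' "
    "RETURN p"
)
gremlin_query = "g.V().has('code', 'ANC').out('route').path().by(elementMap())"

def build_requests(endpoint):
    """Build the (url, payload) pair for each of Neptune's HTTPS query endpoints."""
    return {
        "openCypher": (f"{endpoint}/openCypher", {"query": opencypher_query}),
        "gremlin": (f"{endpoint}/gremlin", {"gremlin": gremlin_query}),
    }

for language, (url, payload) in build_requests(NEPTUNE_ENDPOINT).items():
    print(language, url)
```

Either payload could then be POSTed with an HTTP client of your choice; both languages query the same underlying property graph.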
In general, it isn't necessary to change your data model to migrate from Neo4j to Neptune, because both Neo4j and Neptune support labeled property graph (LPG) data. However, Neptune has some architectural and data model differences that you can take advantage of to optimize performance. For example:
+ Neptune IDs are treated as first-class citizens.
+ Neptune uses [AWS Identity and Access Management (IAM) policies](iam-auth.md) to secure access to your graph data in flexible and granular ways.
+ Neptune provides several ways to [use Jupyter notebooks](graph-notebooks.md) to run queries and [visualize the results](notebooks-visualization.md). Neptune also works with [third-party visualization tools](visualization-tools.md).
+ Although Neptune has no drop-in replacement for the Neo4j Graph Data Science (GDS) library, Neptune supports graph analytics today through a variety of solutions. For example, several [sample notebooks](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/01-Neptune-Database/03-Sample-Applications/06-Data-Science-Samples) demonstrate how to leverage the Neptune [integration with the AWS SDK for pandas](https://github.com/aws/aws-sdk-pandas) within Python environments to run analytics on graph data.

Please reach out to AWS support or engage your AWS account team if you have questions. We use your feedback to prioritize new features that will meet your needs.

# Preparing to migrate from Neo4j to Neptune
Preparing for migration

Migrating from the Neo4j graph database to the Neptune graph database service can be approached in one of two main ways: re-platforming or refactoring/re-architecting. The re-platforming approach involves modifying the existing data model and application architecture to best leverage Neptune's capabilities, while the refactoring approach focuses on finding equivalent components in Neptune to build a comparable implementation. In practice, a combination of these strategies is often used, because the migration process involves balancing the target Neptune architecture with constraints and requirements from the existing Neo4j implementation. Regardless of the approach, the key is to work backwards from the application's use cases to design the data model, queries, and overall architecture that best address your needs.

## Approaches to migrating
Approaches

When migrating a Neo4j application to Neptune, we recommend one of two strategies: either re-platforming, or refactoring/re-architecting. For more information about migration strategies, see [6 Strategies for Migrating Applications to the Cloud](https://aws.amazon.com/blogs/enterprise-strategy/6-strategies-for-migrating-applications-to-the-cloud/), a blog post by Stephen Orban.

The *re-platforming approach*, sometimes called *lift-tinker-and-shift*, involves the following steps:
+ Identify the use cases your application is intended to satisfy.
+ Modify the existing graph data model and application architecture to best address these workload needs using Neptune's capabilities.
+ Determine how to migrate data, queries and other parts of the source application into the target model and architecture.

This working-backwards approach lets you migrate your application to the kind of Neptune solution you might design if this were a brand-new project.

The *refactoring approach*, by contrast, involves:
+ Identifying the components of the existing implementation, including infrastructure, data, queries, and application capabilities.
+ Finding equivalents in Neptune that can be used to build a comparable implementation.

This working-forwards approach seeks to swap one implementation for another.

In practice, you're likely to adopt a mix of these two approaches. You might start with a use case, design the target Neptune architecture, but then turn to the existing Neo4j implementation to identify constraints and invariants you'll have to maintain. For example, you might have to continue integrating with other external systems, or continue offering specific APIs to consumers of your graph application. With this information, you can determine what data already exists to move to your target model, and what must be sourced elsewhere.

At other points, you might start by analyzing a specific piece of your Neo4j implementation as the best source of information about the job your application is intended to do. That kind of archaeology in the existing application can help define a use case that you can then design towards using Neptune's capabilities.

Whether you're building a new application using Neptune or migrating an existing application from Neo4j, we recommend working backwards from use cases to design a data model, a set of queries, and an application architecture that address your business needs.

# Architectural differences between Neptune and Neo4j
Architectural differences

When customers first consider migrating an application from Neo4j to Neptune, it is often tempting to perform a like-to-like comparison based on instance size. However, the architectures of Neo4j and Neptune have fundamental differences. Neo4j is based on an all-in-one approach where data loading, data ETL, application queries, data storage, and management operations all happen in the same set of compute resources, such as EC2 instances.

Neptune, by contrast, is an OLTP focused graph database where the architecture separates responsibilities and where resources are decoupled so they can scale dynamically and independently.

When migrating from Neo4j to Neptune, determine the data durability, availability and scalability requirements of your application. Neptune's cluster architecture simplifies the design of applications that require high durability, availability and scalability. With an understanding of Neptune's cluster architecture, you can then design a Neptune cluster topology to satisfy these requirements.

## Neo4j's cluster architecture
Neo4j clusters

Many production applications use Neo4j's [causal clustering](https://neo4j.com/docs/operations-manual/current/clustering/introduction/) to provide data durability, high availability and scalability. Neo4j's clustering architecture uses core-server and read-replica instances:
+ Core servers provide for data durability and fault tolerance by replicating data using the Raft protocol.
+ Read replicas use transaction log shipping to asynchronously replicate data for high read throughput workloads.

Every instance in a cluster, whether core server or read replica, contains a full copy of the graph data.

## Neptune's cluster architecture
Neptune clusters

[A Neptune cluster](feature-overview-db-clusters.md) is made up of a primary writer instance and up to 15 read replica instances. All the instances in the cluster share the same underlying distributed storage service that is separate from the instances.
+ The primary writer instance coordinates all write operations to the database and is vertically scalable to provide flexible support for different write workloads. It also supports read operations.
+ Read replica instances support read operations from the underlying storage volume, and allow you to scale horizontally to support high read workloads. They also provide for high availability by serving as failover targets for the primary instance.
**Note**  
For heavy write workloads, it is best to scale the read replica instances to the same size as the writer instance, to ensure that the readers can stay consistent with the data changes.
+ The underlying storage volume scales storage capacity automatically as the data in your database increases, up to 128 tebibytes (TiB) of storage.

Instance sizes are dynamic and independent. Each instance can be resized while the cluster is running, and read replicas can be added or removed while the cluster is running.

The [Neptune Serverless](neptune-serverless.md) feature can scale your compute capacity up and down automatically as demand rises and falls. Not only can this decrease your administrative overhead, it also lets you configure the database to handle large spikes in demand without degrading performance or requiring you to over-provision.

You can stop a Neptune cluster for up to 7 days.

Neptune also supports [auto-scaling](manage-console-autoscaling.md), which adjusts the number of read-replica instances automatically based on workload.

Using Neptune's [global database feature](neptune-global-database.md), you can mirror a cluster in up to 5 other regions.

Neptune is also [fault tolerant by design](backup-restore-overview-fault-tolerance.md):
+ The cluster volume that provides data storage to all the instances in the cluster spans multiple Availability Zones (AZs) in a single AWS Region. Each AZ contains a full copy of the cluster data.
+ If the primary instance becomes unavailable, Neptune automatically fails over to an existing read replica with zero data loss, typically in under 30 seconds. If there are no existing read replicas in the cluster, Neptune automatically provisions a new primary instance – again, with zero data loss.

What all this means is that when migrating from a Neo4j causal cluster to Neptune, you don't have to architect the cluster topology explicitly for high data durability and high availability. This leaves you to size your cluster for expected read and write workloads, and any increased availability requirements you may have, in just a few ways:
+ To scale read operations, [add read replica instances](feature-overview-db-clusters.md#feature-overview-read-replicas) or enable [Neptune Serverless](neptune-serverless.md) functionality.
+ To improve availability, distribute the primary instance and read replicas in your cluster over multiple Availability Zones (AZs).
+ To reduce failover time, provision at least one read replica instance that can serve as a failover target for the primary. You can determine the order in which read replica instances are promoted to primary after a failure by [assigning each replica a priority](manage-console-add-replicas.md). It's a best practice to ensure that a failover target has an instance class capable of handling your application's write workload if promoted to primary.

# Data storage differences between Neptune and Neo4j
Data storage differences

Neptune uses a [graph data model](feature-overview-data-model.md) based on a native quad model. When migrating your data to Neptune, there are several differences in the architecture of the data model and storage layer that you should be aware of to make optimal use of the distributed and scalable shared storage that Neptune provides:
+ Neptune doesn't use any explicitly defined schema or constraints. It lets you add nodes, edges, and properties dynamically without having to define the schema ahead of time. Neptune doesn't limit the values and types of data stored, except as noted in [Neptune limits](limits.md#limits-properties). As part of Neptune's storage architecture, data is also [automatically indexed](feature-overview-storage-indexing.md) in a way that handles many of the most common access patterns. This storage architecture removes the operational overhead of creation and management of database schema and index optimization.
+ Neptune provides a unique distributed and shared storage architecture that automatically scales in 10 GB chunks as the storage needs of your database grow, up to 128 tebibytes (TiB). This storage layer is reliable, durable, and fault-tolerant, with data copied 6 times, twice in each of 3 Availability Zones. It provides all Neptune clusters with a highly available and fault-tolerant data storage layer by default. Neptune's storage architecture reduces costs and removes the need to provision or over-provision storage to handle future data growth.

Before migrating your data to Neptune, it's good to familiarize yourself with Neptune's [property graph data model](feature-overview-storage-indexing.md#feature-overview-storage-indexing-gremlin) and [transaction semantics](transactions.md). 

# Operational differences between Neptune and Neo4j
Operational differences

Neptune is a fully managed service that automates many of the routine operational tasks you would otherwise perform when using on-premises or self-managed databases such as Neo4j Enterprise or Community Edition:
+ **[Automated backups](backup-restore.md#backup-restore-overview-backups)**   –   Neptune backs up your cluster volume automatically and retains the backup for a retention period that you specify (from 1 to 35 days). These backups are continuous and incremental, so you can quickly restore to any point within the retention period. No performance impact or interruption of database service occurs as backup data is being written.
+ **[Manual Snapshots](backup-restore.md)**   –   Neptune lets you make a storage-volume snapshot of your DB cluster to back up the entire DB cluster. This kind of snapshot can then be used to restore the database, make a copy of it, and share it across accounts.
+ **[Cloning](manage-console-cloning.md)**   –   Neptune supports a cloning feature that lets you create cost-effective clones of a database quickly. The clones use a copy-on-write protocol to require only minimal additional space after they are created. Database cloning is an effective way to try out new Neptune features or upgrades with no disruption to the originating cluster.
+ **[Monitoring](monitoring.md)**   –   Neptune provides various methods to monitor the performance and usage of your cluster, including:
  + Instance status
  + Integration with Amazon CloudWatch and AWS CloudTrail
  + Audit log capabilities
  + Event notifications
  + Tagging
+ **[Security](security.md)**   –   Neptune provides a secure environment by default. A cluster resides within a private VPC that provides network isolation from other resources. All traffic is encrypted via SSL, and all data is encrypted at rest using AWS KMS.

  In addition, Neptune integrates with AWS Identity and Access Management (IAM) to provide [authentication](iam-auth.md). By specifying [IAM condition keys](iam-condition-keys.md), you can use IAM policies to provide fine-grained access control over [data actions](iam-data-access-policies.md).
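As an illustration, the following is a hedged sketch of an IAM policy that grants read-only query access over openCypher. The action and condition-key names (`neptune-db:ReadDataViaQuery`, `neptune-db:GetQueryStatus`, `neptune-db:QueryLanguage`) follow Neptune's IAM reference; the Region, account ID, and cluster resource ID shown are placeholders you would replace with your own.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "neptune-db:ReadDataViaQuery",
        "neptune-db:GetQueryStatus"
      ],
      "Resource": "arn:aws:neptune-db:us-east-1:123456789012:cluster-ABC123DEFGHI/*",
      "Condition": {
        "StringEquals": {
          "neptune-db:QueryLanguage": "OpenCypher"
        }
      }
    }
  ]
}
```

Attaching a policy like this to a role lets that role run read queries but not mutate data, a level of granularity that has no direct equivalent in self-managed Neo4j deployments.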

## Tooling and integration differences between Neptune and Neo4j
Tooling and integration differences

Neptune has a different architecture for integrations and tools than Neo4j, which may impact the architecture of your application. Neptune uses the compute resources of the cluster to process queries, but leverages other best-in-class AWS services for functionality like full-text search (using Amazon OpenSearch Service), ETL (using AWS Glue), and so forth. For a full listing of these integrations, see [Neptune integrations](integrations.md).

# Provisioning infrastructure when migrating from Neo4j to Neptune
Provisioning infrastructure

Amazon Neptune clusters are built to scale in three dimensions: storage, write capacity, and read capacity. The sections below discuss specific options to consider when migrating.

## Provisioning storage
Storage

The storage for any Neptune cluster is automatically provisioned, without any administrative overhead on your part. It resizes dynamically in 10 GB chunks as the storage needs of the cluster increase. As a result, there is no need to estimate and provision or over-provision storage to handle future data growth.

## Provisioning write capacity
Write capacity

Neptune provides a single writer instance that can be scaled vertically to any instance size available on the [Neptune pricing page](https://aws.amazon.com/neptune/pricing/). When reading and writing data to a writer instance, all transactions are ACID compliant, with data isolation as defined in [Transaction Isolation Levels in Neptune](transactions-neptune.md).

Choosing an optimal size for a writer instance requires running load tests to determine the optimal instance size for your workload. Any instance within Neptune can be resized at any time by [modifying the DB instance class](manage-console-instances-modify.md). You can estimate a starting instance size based on concurrency and average query latency as described below in [Estimating optimal instance size when provisioning your cluster](#migration-provisioning-instance-sizing).

## Provisioning read capacity
Read capacity

Neptune is built to scale read-replica instances both horizontally, by adding up to 15 of them within a cluster (or more in a [Neptune global database](neptune-global-database.md)), and vertically to any instance size available on the [Neptune pricing page](https://aws.amazon.com/neptune/pricing/). All Neptune read-replica instances use the same underlying storage volume, enabling transparent replication of data with minimal lag.

In addition to enabling horizontal scaling of read requests within a Neptune cluster, read replicas also act as failover targets for the writer instance to enable high availability. See [Amazon Neptune basic operational guidelines](best-practices-general-basic.md) for suggestions about how to determine the appropriate number and placement of read replicas in your cluster.

For applications where connectivity and workload are unpredictable, Neptune also supports [an auto-scaling feature](manage-console-autoscaling.md) that can automatically adjust the number of Neptune replicas based on criteria that you specify.

To determine an optimal size and number of read replica instances requires running load tests to determine the characteristics of the read workload they must support. Any instance within Neptune can be resized at any time by [modifying the DB instance class](manage-console-instances-modify.md). You can estimate a starting instance size based on concurrency and average query latency, as described in the [next section](#migration-provisioning-instance-sizing).

## Use Neptune Serverless to scale reader and writer instances automatically as needed
Using Neptune Serverless

While it is often helpful to be able to estimate the compute capacity that your anticipated workloads will require, you can configure the [Neptune Serverless](neptune-serverless.md) feature to scale read and write capacity up and down automatically. This can help you meet peak requirements while also scaling back automatically when demand decreases.

## Estimating optimal instance size when provisioning your cluster
Optimal instance size

Estimating the optimal instance size requires knowing the average query latency in Neptune while your workload is running, as well as the number of concurrent queries being processed. A rough estimate of instance size can be calculated as the average query latency multiplied by the number of concurrent queries. This gives the average number of concurrent threads needed to handle the workload.

Each vCPU in a Neptune instance can support two concurrent query threads, so dividing the threads by 2 provides the number of vCPUs required, which can then be correlated to the appropriate instance size on the [Neptune pricing page](https://aws.amazon.com/neptune/pricing/). For example:

```
Average Query Latency:         30ms (0.03s)
Number of concurrent queries:  1000/second

Number of threads needed:      0.03 x 1000 = 30 threads
Number of vCPUs needed:        30 / 2 = 15 vCPUs
```

Correlating this to the number of vCPUs per instance type gives a rough estimate that an `r5.4xlarge` (16 vCPUs) would be the recommended instance to try for this workload. This estimate is rough and only meant to provide initial guidance on instance-size selection. Any application should go through a right-sizing exercise to determine the number and type(s) of instances that are appropriate for the workload.
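The arithmetic above can be captured in a small helper. This is only a sketch of the rule of thumb from this section (latency times concurrency gives threads, two threads per vCPU); the function name and signature are illustrative, not part of any Neptune API.

```python
import math

def estimate_vcpus(avg_latency_ms, queries_per_second, threads_per_vcpu=2):
    """Rough sizing estimate: average latency x concurrency = concurrent
    threads needed; each Neptune vCPU supports ~2 concurrent query threads."""
    threads = avg_latency_ms * queries_per_second / 1000.0
    vcpus = math.ceil(threads / threads_per_vcpu)
    return threads, vcpus

# The worked example from above: 30 ms latency at 1000 queries/second.
threads, vcpus = estimate_vcpus(30, 1000)
print(f"{threads:.0f} threads -> {vcpus} vCPUs")  # 30 threads -> 15 vCPUs
```

You would then pick the smallest instance type on the pricing page with at least that many vCPUs as a starting point for load testing.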

Memory requirements should also be taken into account, as well as processing requirements. Neptune is most performant when the data being accessed by queries is available in the main-memory buffer pool cache. Provisioning sufficient memory can also reduce I/O costs significantly.

Additional details and guidance on sizing of instances in a Neptune cluster can be found on the [Sizing DB instances in a Neptune DB cluster](feature-overview-db-clusters.md#feature-overview-sizing-instances) page.

# Data migration from Neo4j to Neptune
Data migration

When performing a migration from Neo4j to Amazon Neptune, migrating the data is a major step in the process. There are multiple approaches to migrating data. The right approach is determined by the needs of the application, the size of the data, and the type of migration desired. However, most migrations require assessing the same considerations, several of which are highlighted below.

**Note**  
See [Migrating a Neo4j graph database to Neptune with a fully automated utility](https://aws.amazon.com/blogs/database/migrating-a-neo4j-graph-database-to-amazon-neptune-with-a-fully-automated-utility/) in the [AWS Database Blog](https://aws.amazon.com/blogs/?awsf.blog-master-category=category%23database) for a complete, step-by-step walkthrough of one example of how to perform an offline data migration.

## Assessing data migration from Neo4j to Neptune
Assessing data migration

The first step when assessing any data migration is to determine how you will migrate the data. The options depend on the architecture of the application being migrated, the data size, and the availability needs during the migration. In general, migrations tend to fall into one of two categories: online or offline.

Offline migrations tend to be the simplest to accomplish, because the application doesn't accept read or write traffic during the migration. After the application stops accepting traffic, the data can be exported, optimized, imported, and the application tested before the application is re-enabled.

Online migrations are more complex, because the application must continue to accept read and write traffic while the data is being migrated. The exact needs of each online migration may differ, but the overall architecture is generally similar to the following:
+ A feed of ongoing changes to the database needs to be enabled in Neo4j by configuring [Neo4j Streams as a source to a Kafka cluster](https://neo4j.com/labs/kafka/4.0/producer/).
+ Once this is completed, an export of the running system can be taken, following the instructions in [Exporting data from Neo4j when migrating to Neptune](#migration-data-exporting), and the time noted for later correlation to the Kafka topic.
+ The exported data is then imported into Neptune, following instructions in [Importing data from Neo4j when migrating to Neptune](#migration-data-importing).
+ Changed data from the Kafka stream can then be copied to the Neptune cluster using an architecture similar to the one described in [Writing to Amazon Neptune from Amazon Kinesis Data Streams](https://github.com/aws-samples/amazon-neptune-samples/tree/master/gremlin/stream-2-neptune). Note that the changes replication can be run in parallel to validate the new application architecture and performance.
+ After the data migration is validated, then the application traffic can be redirected to the Neptune cluster and the Neo4j instance can be decommissioned.

## Data-model optimizations for migrating from Neo4j to Neptune
Data-model optimizations

Both Neptune and Neo4j support labeled property graphs (LPG). However, Neptune has some architectural and data-model differences that you can take advantage of to optimize performance:

### Optimizing node and edge IDs
Node and edge ID optimization

Neo4j automatically generates numeric long IDs. Using Cypher you can refer to nodes by ID, but this is generally discouraged in favor of looking up nodes by an indexed property.

Neptune allows you to [supply your own string-based IDs for vertices and edges](access-graph-gremlin-differences.md#feature-gremlin-differences-user-supplied-ids). If you don't supply your own IDs, Neptune automatically generates string representations of UUIDs for new edges and vertices.

If you migrate data from Neo4j to Neptune by exporting from Neo4j and then bulk importing into Neptune, you can preserve Neo4j's IDs. The numeric values generated by Neo4j can act as user-supplied IDs when importing into Neptune, where they are represented as strings rather than numeric values.

However, there are circumstances in which you may want to promote a vertex property to become a vertex ID. Just as looking up a node using an indexed property is the fastest way to find a node in Neo4j, looking up a vertex by ID is the fastest way to find a vertex in Neptune. Therefore, if you can identify a suitable vertex property that contains unique values, you should consider replacing the vertex `~id` value with the nominated property value in your bulk load CSV files. If you do this, you will also have to rewrite the corresponding `~from` and `~to` edge values in your CSV files.
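As a sketch of that rewrite, the following promotes a unique vertex property to the `~id` and remaps the edge `~from`/`~to` values to match. The rows here are represented as dictionaries keyed by bulk-load column headers; the `code:String` column name is a hypothetical example, not part of any required format.

```python
def promote_property_to_id(vertex_rows, edge_rows, prop):
    """Replace each vertex's ~id with a unique property value, and rewrite
    edge ~from/~to references so they still point at the same vertices."""
    id_map = {row["~id"]: row[prop] for row in vertex_rows}
    new_vertices = [{**row, "~id": row[prop]} for row in vertex_rows]
    new_edges = [
        {**row, "~from": id_map[row["~from"]], "~to": id_map[row["~to"]]}
        for row in edge_rows
    ]
    return new_vertices, new_edges

vertices = [
    {"~id": "1", "~label": "airport", "code:String": "ANC"},
    {"~id": "2", "~label": "airport", "code:String": "SEA"},
]
edges = [{"~id": "e1", "~from": "1", "~to": "2", "~label": "route"}]

new_v, new_e = promote_property_to_id(vertices, edges, "code:String")
print(new_e[0])  # edge now references ANC and SEA directly
```

After this transformation, queries can fetch an airport directly by ID instead of filtering on the `code` property.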

### Schema constraints when migrating data from Neo4j to Neptune
Schema constraints

Within Neptune, the only schema constraint available is the uniqueness of the ID of a node or edge. Applications that need a uniqueness constraint are encouraged to achieve it by specifying the node or edge ID. If the application used multiple columns as a uniqueness constraint, the ID may be set to a combination of these values. For instance, `id=123, code='SEA'` could be represented as `ID='123_SEA'` to achieve a complex uniqueness constraint.
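A trivial helper can make that composite-ID convention deterministic across your load pipeline. This is a sketch; the function name and separator are illustrative choices, and you should pick a separator that cannot appear inside the component values.

```python
def composite_id(*parts, sep="_"):
    """Combine the columns that formed a multi-column uniqueness constraint
    in Neo4j into a single deterministic Neptune node or edge ID."""
    return sep.join(str(p) for p in parts)

print(composite_id(123, "SEA"))  # 123_SEA
```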

### Edge direction optimization when migrating data from Neo4j to Neptune
Edge direction

When nodes, edges, or properties are added to Neptune, they are automatically [indexed in three different ways](feature-overview-storage-indexing.md), with an [optional fourth index](features-lab-mode.md#features-lab-mode-features-osgp-index). Because of how Neptune builds and [uses the indices](feature-overview-storage-indexing.md), queries that follow outgoing edges are more efficient than ones that use incoming edges. In terms of Neptune's [graph data storage model](feature-overview-data-model.md), these are subject-based searches that use the SPOG index.

If, in migrating your data model and queries to Neptune, you find that your most important queries rely on traversing incoming edges where there is a high degree of fan out, you may want to consider altering your model so that these traversals follow outgoing edges instead, especially when you cannot specify which edge labels to traverse. To do so, reverse the direction of the relevant edges and update the edge labels to reflect the semantics of this direction change. For example, you might change:

```
person_A — parent_of — person_B
   to:
person_B — child_of — person_A
```

To make this change in a [bulk-load edge CSV file](bulk-load-tutorial-format.md), simply swap the `~from` and `~to` column headings, and update the values of the `~label` column.
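For larger files, a small script can apply the same change mechanically. This sketch swaps the `~from` and `~to` values on every row and relabels the edge, which is equivalent to swapping the column headings; the CSV content shown is illustrative.

```python
import csv
import io

def reverse_edges(csv_text, new_label):
    """Reverse every edge in a bulk-load edge CSV: swap ~from/~to and
    update ~label to reflect the new direction's semantics."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["~from"], row["~to"] = row["~to"], row["~from"]
        row["~label"] = new_label
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

edges = "~id,~from,~to,~label\ne1,person_A,person_B,parent_of\n"
print(reverse_edges(edges, "child_of"))
```

The rewritten file loads exactly as before, but traversals that previously fought the edge direction now follow outgoing edges.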

As an alternative to reversing edge direction, you can enable a [fourth Neptune index, the OSGP index](feature-overview-storage-indexing.md#feature-overview-storage-indexing-osgp), which makes traversing incoming edges, or object-based searches, much more efficient. However, enabling this fourth index will lower insert rates and require more storage.

### Filtering optimization when migrating data from Neo4j to Neptune
Filtering optimization

Neptune is optimized to perform best when queries filter on the most selective property available. When multiple filters are used, the set of matching items is found for each filter, and then the overlap of all the matching sets is calculated. When possible, combining multiple properties into a single property minimizes the number of index lookups and decreases query latency.

For example, this query uses two index look-ups and a join:

```
MATCH (n) WHERE n.first_name='John' AND n.last_name='Doe' RETURN n
```

This query retrieves the same information using a single index look-up:

```
MATCH (n) WHERE n.name='John Doe' RETURN n
```
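Such a combined property can be precomputed in an ETL step before loading (the helper name is illustrative):

```python
# Precompute a single selective property from two less-selective ones, so
# queries can filter with one index lookup instead of two plus a join.
def add_combined_name(node):
    node = dict(node)  # leave the input record untouched
    node["name"] = f"{node['first_name']} {node['last_name']}"
    return node

person = add_combined_name({"first_name": "John", "last_name": "Doe"})
# person["name"] == "John Doe"
```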

### Data-type conversions when migrating from Neo4j to Neptune
Data-type conversions

Neptune supports [different data types](bulk-load-tutorial-format-opencypher.md#bulk-load-tutorial-format-opencypher-data-types) than Neo4j does.

**Neo4j data-type mappings into data types that Neptune supports**
+ **Logical**:   `Boolean`

  Map this in Neptune to `Bool` or `Boolean`.
+ **Numeric**:   `Number`

  Map this in Neptune to the narrowest of the following Neptune openCypher types that can support all values of the numeric property in question:

  ```
    Byte
    Short
    Integer
    Long
    Float
    Double
  ```
+ **Text**:   `String`

  Map this in Neptune to `String`.
+ **Point in time**:

  ```
    Date
    Time
    LocalTime
    DateTime
    LocalDateTime
  ```

  Map these in Neptune to `Date` as UTC, using one of the following ISO-8601 formats that Neptune supports:

  ```
    yyyy-MM-dd
    yyyy-MM-ddTHH:mm
    yyyy-MM-ddTHH:mm:ss
    yyyy-MM-ddTHH:mm:ssZ
  ```
+ **Time duration**:   `Duration`

  Map this in Neptune to a numeric value for date arithmetic, if necessary.
+ **Spatial**:   `Point`

  Map this in Neptune into component numeric values, each of which then becomes a separate property, or express as a String value to be interpreted by the client application. Note that Neptune's [full-text search](full-text-search.md) integration using OpenSearch lets you index geolocation properties.
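The point-in-time, duration, and spatial mappings above can be sketched as follows (the helper names are illustrative, not a Neptune API):

```python
from datetime import datetime, timedelta, timezone

def to_neptune_datetime(dt):
    """Render an aware datetime in UTC using one of the supported ISO-8601 forms."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def to_neptune_duration(td):
    """Store a Neo4j Duration as a numeric value (seconds) for date arithmetic."""
    return td.total_seconds()

def to_neptune_point(lon, lat):
    """Split a spatial Point into separate numeric component properties."""
    return {"longitude": lon, "latitude": lat}
```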

### Migrating multivalued properties from Neo4j to Neptune
Multivalued properties

Neo4j allows [homogeneous lists of simple types](https://neo4j.com/docs/cypher-manual/current/values-and-types/) to be stored as properties of both nodes and edges. These lists can contain duplicate values.

Neptune, however, allows only [set or single cardinality](access-graph-gremlin-differences.md#feature-gremlin-differences-vertex-property-cardinality) for vertex properties, and single cardinality for edge properties in property graph data. As a result, there is no straightforward migration of Neo4j node list properties that contain duplicate values into Neptune vertex properties, or of Neo4j relationship-list properties into Neptune edge properties.

Some possible strategies for migrating Neo4j multivalued node properties with duplicate values into Neptune are as follows:
+ Discard the duplicate values and convert the multivalued Neo4j node property to a set cardinality Neptune vertex property. Note that the Neptune set may not then reflect the order of items in the original Neo4j multivalued property.
+ Convert the multivalued Neo4j node property to a string representation of a JSON-formatted list in a Neptune vertex string property.
+ Extract each of the multivalued property values into a separate vertex with a value property, and connect those vertices to the parent vertex using an edge labelled with the property name.

Similarly, possible strategies for migrating Neo4j multivalued relationship properties into Neptune are as follows:
+ Convert the multivalued Neo4j relationship property to a string representation of a JSON-formatted list and store it as a Neptune edge string property.
+ Refactor the Neo4j relationship into incoming and outgoing Neptune edges attached to an intermediate vertex. Extract each of the multivalued relationship property values into a separate vertex with a value property and connect those vertices to this intermediate vertex using an edge labelled with the property name.

Note that a string representation of a JSON-formatted list is opaque to the openCypher query language, although openCypher includes a `CONTAINS` predicate that allows for simple searches inside string values.
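The JSON-string strategy can be sketched as follows (helper names are illustrative; decoding happens in the client application, since the stored string is opaque to openCypher):

```python
import json

def encode_list_property(values):
    """Store a Neo4j list property (order and duplicates preserved) as a
    JSON-formatted string in a single Neptune property."""
    return json.dumps(values)

def decode_list_property(s):
    """Recover the original list in the client application."""
    return json.loads(s)

stored = encode_list_property(["red", "blue", "red"])
assert decode_list_property(stored) == ["red", "blue", "red"]
```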

## Exporting data from Neo4j when migrating to Neptune
Exporting data

When exporting data from Neo4j, use the APOC procedures to export either to [CSV](https://neo4j.com/labs/apoc/4.1/export/csv/) or to [GraphML](https://neo4j.com/labs/apoc/4.1/export/graphml/). Although it's possible to export to other formats, there are [open-source tools](https://github.com/awslabs/amazon-neptune-tools/tree/master/neo4j-to-neptune) for converting CSV data exported from Neo4j to Neptune bulk-load format, and also [open-source tools](https://github.com/awslabs/amazon-neptune-tools/tree/master/graphml2csv) for converting GraphML data exported from Neo4j to Neptune bulk-load format.

You can also export data directly into Amazon S3 using the various APOC procedures. Exporting to an Amazon S3 bucket is disabled by default, but it can be enabled using the procedures highlighted in [Exporting to Amazon S3](https://neo4j.com/labs/apoc/4.3/export/csv/#export-csv-s3-export) in the Neo4j APOC documentation.

## Importing data from Neo4j when migrating to Neptune
Importing data

You can import data into Neptune either by using the [Neptune bulk loader](bulk-load.md) or by using application logic in a supported query language such as [openCypher](access-graph-opencypher.md).

The Neptune bulk loader is the preferred approach to importing large amounts of data because it provides optimized import performance if you follow [best practices](bulk-load-optimize.md). The bulk loader supports [two different CSV formats](bulk-load-tutorial-format.md), to which data exported from Neo4j can be converted using the open-source utilities mentioned above in the [Exporting data](#migration-data-exporting) section.

You can also use openCypher to import data with custom logic for parsing, transforming, and importing. You can submit the openCypher queries either through the [HTTPS endpoint](access-graph-opencypher-queries.md) (which is recommended) or by using the [bolt driver](access-graph-opencypher-bolt.md).
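As a sketch of the HTTPS approach, the request can be built as below. The helper and cluster names are illustrative; the `/openCypher` path and the `query`/`parameters` form fields follow Neptune's openCypher HTTPS API:

```python
import json

def build_opencypher_request(cluster_endpoint, query, parameters=None, port=8182):
    """Build the URL and form fields for Neptune's openCypher HTTPS endpoint."""
    data = {"query": query}
    if parameters is not None:
        data["parameters"] = json.dumps(parameters)  # parameters are sent as JSON
    return f"https://{cluster_endpoint}:{port}/openCypher", data

url, payload = build_opencypher_request(
    "your-cluster-endpoint",
    "CREATE (:airport {code: $code})",
    {"code": "SEA"},
)
# POST `payload` (form-encoded) to `url` from inside the cluster's VPC,
# adding SigV4 signing if IAM authentication is enabled on the cluster.
```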

# Application migration from Neo4j to Neptune
Application migration

After you have migrated your data from Neo4j to Neptune, the next step is to migrate the application itself. As with data, there are multiple approaches to migrating your application based on the tools you use, your requirements, architectural differences, and so on. Things that you usually need to consider in this process are outlined below.

## Migrating connections when moving from Neo4j to Neptune
Migrating connections

If you don't currently use the Bolt drivers, or would like to use an alternative, you can connect to the [HTTPS endpoint](access-graph-opencypher-queries.md), which provides full access to the data returned.

If you do have an application that uses the [Bolt protocol](access-graph-opencypher-bolt.md), you can migrate these connections to Neptune and let your applications connect using the same drivers as you did in Neo4j. To connect to Neptune, you may need to make one or more of the following changes to your application:
+ The URL and port will need to be updated to use the cluster endpoints and cluster port (the default is 8182).
+ Neptune requires all connections to use SSL, so you need to specify for each connection that it is encrypted.
+ Neptune manages authentication through the assignment of [IAM policies and roles](iam-auth.md). IAM policies and roles provide an extremely flexible level of user management within the application, so it is important to read and understand the information in the [IAM overview](iam-auth.md) before configuring your cluster.
+ Bolt connections behave differently in Neptune than in Neo4j in several ways, as explained in [Bolt connection behavior in Neptune](access-graph-opencypher-bolt.md#access-graph-opencypher-bolt-connections).
+ You can find more information and suggestions in [Neptune Best Practices Using openCypher and Bolt](best-practices-opencypher.md).

There are code samples for commonly used languages such as Java, Python, .NET, and NodeJS, and for connection scenarios such as using IAM authentication, in [Using the Bolt protocol to make openCypher queries to Neptune](access-graph-opencypher-bolt.md).
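A minimal sketch of the connection changes above (the cluster endpoint is a placeholder; the commented call follows the Neo4j Python driver API):

```python
def neptune_bolt_uri(cluster_endpoint, port=8182):
    """Point the existing Bolt driver at the Neptune cluster endpoint.
    Use bolt://, not bolt+routing:// or neo4j://; the default port is 8182."""
    return f"bolt://{cluster_endpoint}:{port}"

uri = neptune_bolt_uri("your-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com")
# With the Neo4j Python driver, for example:
#   driver = GraphDatabase.driver(uri, encrypted=True)  # Neptune requires TLS
```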

## Routing queries to cluster instances when moving from Neo4j to Neptune
Routing queries

Neo4j client applications use a [routing driver](https://neo4j.com/docs/driver-manual/1.7/client-applications/#routing_drivers_bolt_routing) and specify an [access mode](https://neo4j.com/docs/driver-manual/1.7/sessions-transactions/#driver-transactions-access-mode) to route read and write requests to an appropriate server in a causal cluster.

When migrating a client application to Neptune, use [Neptune endpoints](feature-overview-endpoints.md) to route queries efficiently to an appropriate instance in your cluster:
+ All connections to Neptune should use `bolt://` rather than `bolt+routing://` or `neo4j://` in the URL.
+ The cluster endpoint connects to the current primary instance in your cluster. Use the cluster endpoint to route write requests to the primary.
+ The reader endpoint [distributes connections](best-practices-general-basic.md#best-practices-general-loadbalance) across read-replica instances in your cluster. If you have a single-instance cluster with no read-replica instances, the reader endpoint connects to the primary instance, which supports write operations. If the cluster does contain one or more read-replica instances, sending a write request to the reader endpoint generates an exception.
+ Each instance in your cluster can also have its own instance endpoint. Use an instance endpoint if your client application needs to send a request to a specific instance in the cluster.

For more information, see [Neptune endpoint considerations](feature-overview-endpoints.md#feature-overview-endpoint-considerations).
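The access-mode routing that `bolt+routing://` provided can be replaced with an explicit choice of Neptune endpoint, sketched here as a small helper (the endpoint strings are placeholders):

```python
ENDPOINTS = {
    "write": "your-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com",     # cluster endpoint
    "read": "your-cluster.cluster-ro-abc123.us-east-1.neptune.amazonaws.com",   # reader endpoint
}

def endpoint_for(access_mode):
    """'write' -> cluster (primary) endpoint; 'read' -> reader endpoint."""
    return ENDPOINTS[access_mode]
```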

## Data consistency in Neptune
Data consistency

When using Neo4j causal clusters, read replicas are eventually consistent with core servers, but client applications can ensure causal consistency by using [causal chaining](https://neo4j.com/docs/driver-manual/1.7/sessions-transactions/#driver-transactions-causal-chaining). Causal chaining entails passing bookmarks between transactions, which allows a client application to write to a core server and then read its own write from a read-replica.

In Neptune, read-replica instances are eventually consistent with the writer, with replica lag that is usually less than 100 milliseconds. However, until a change has been replicated, updates to existing edges and vertices and additions of new edges and vertices are not visible on a replica instance. Therefore, if your application needs immediate consistency on Neptune by reading each write, use the cluster endpoint for the read-after-write operation. This is the only time to use the cluster endpoint for read operations. In all other circumstances, use the reader endpoint for reads.

## Migrating queries from Neo4j to Neptune
Migrating queries

Although Neptune's [support for openCypher](https://aws.amazon.com/blogs/database/announcing-the-general-availability-of-opencypher-support-for-amazon-neptune/) dramatically reduces the amount of work required to migrate queries from Neo4j, there are still some differences to assess when migrating:
+ As discussed in [Data-model optimizations](migration-data-migration.md#migration-data-model-optimization) above, there may be modifications to your data model that you need to make so as to create an optimized graph data model for Neptune, which in turn will require changes to your queries and testing.
+ Neo4j offers a variety of Cypher-specific language extensions that are not included in the openCypher specification implemented by Neptune. Depending on the use case and feature used, there may be workarounds within the openCypher language, or using the Gremlin language, or through other mechanisms as described in [Rewriting Cypher queries to run in openCypher on Neptune](migration-opencypher-rewrites.md).
+ Applications often use other middleware components to interact with the database instead of the Bolt drivers themselves. Please check [Neptune compatibility with Neo4j](migration-compatibility.md) to see if tools or middleware that you're using are supported.
+ In the case of a failover, the Bolt driver might continue to connect to the previous writer or reader instance because the cluster endpoint provided to the connection has resolved to an IP address. Proper error handling in your application should handle this, as described in [Create a new connection after failover](best-practices-opencypher.md#best-practices-opencypher-renew-connection).
+ When transactions are canceled because of unresolvable conflicts or lock-wait timeouts, Neptune responds with a `ConcurrentModificationException`. For more information, see [Engine Error Codes](errors-engine-codes.md). As a best practice, clients should always catch and handle these exceptions.

  A `ConcurrentModificationException` occurs occasionally when multiple threads or multiple applications are writing to the system simultaneously. Because of [transaction isolation levels](transactions-neptune.md#transactions-neptune-mutation), these conflicts may sometimes be unavoidable.
+ Neptune supports running both Gremlin and openCypher queries on the same data. This means that in some scenarios you may need to consider using Gremlin, with its more powerful querying capabilities, to perform some of the functionality of your queries.
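As an example of the exception-handling practice above, a hedged sketch of retrying on `ConcurrentModificationException` with capped exponential backoff (`run_write` is any callable that submits the transaction; the backoff parameters are assumptions, not Neptune guidance):

```python
import random
import time

def with_retries(run_write, max_attempts=5):
    """Retry a write that Neptune cancels with a ConcurrentModificationException."""
    for attempt in range(max_attempts):
        try:
            return run_write()
        except Exception as e:
            if "ConcurrentModificationException" not in str(e) or attempt == max_attempts - 1:
                raise  # unrelated error, or out of attempts
            # exponential backoff capped at 2s, plus a little jitter
            time.sleep(min(2 ** attempt * 0.1, 2.0) + random.random() * 0.1)
```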

As discussed in [Provisioning infrastructure](migration-provisioning-infrastructure.md) above, each application should go through a right-sizing exercise to ensure that the number of instances, the instance sizes, and the cluster topology are all optimized for the specific workload of the application.

The considerations discussed here for migrating your application are the most common ones, but this is not an exhaustive list. Each application is unique. Please reach out to AWS support or engage your account team if you have further questions.

## Migrating features and tools that are specific to Neo4j
Neo4j-specific features

Neo4j has a variety of custom features and add-ons with functionality that your application may rely on. When evaluating the need to migrate this functionality, it often helps to investigate whether there is a better approach within AWS to achieve the same goal. Considering the [architectural differences between Neo4j and Neptune](migration-architectural-differences.md), you can often find effective alternatives that take advantage of other AWS services or [integrations](integrations.md).

See [Neptune compatibility with Neo4j](migration-compatibility.md) for a list of Neo4j-specific features and suggested workarounds.

# Neptune compatibility with Neo4j
Neptune compatibility

Neo4j has an all-in-one architectural approach, where data loading, data ETL, application queries, data storage, and management operations all occur in the same set of compute resources, such as EC2 instances. Amazon Neptune is an OLTP-focused open-specifications graph database where the architecture separates operations and decouples resources so they can scale dynamically.

There are a variety of features and tools in Neo4j, including third-party tooling, that are not part of the openCypher specification, are incompatible with openCypher, or are incompatible with Neptune's implementation of openCypher. Some of the most common are listed below.

## Neo4j-specific features not present in Neptune
Neo4j-specific features
+ **`LOAD CSV`**   –   Neptune has a different architectural approach to loading data than Neo4j. To allow for better scaling and cost optimization, Neptune implements a separation of concerns around resources, and recommends using one of the [AWS service integrations](integrations.md) such as AWS Glue to perform the required ETL processes to prepare data in a [format](bulk-load-tutorial-format.md) supported by the [Neptune bulk loader](bulk-load.md).

  Another option is to do the same thing using application code running on AWS compute resources such as Amazon EC2 instances, Lambda functions, Amazon Elastic Container Service, AWS Batch jobs, and so on. The code could use either Neptune's [HTTPS endpoint](access-graph-opencypher-queries.md) or [Bolt endpoint](access-graph-opencypher-bolt.md).
+ **Fine-grained access control**   –   Neptune supports granular access control over data-access actions [using IAM condition keys](iam-data-access-policies.md). Additional fine-grained access control can be implemented at the application layer.
+ **Neo4j Fabric**   –   Neptune does support query federation across databases for RDF workloads using the SPARQL [`SERVICE`](sparql-service.md) keyword. Because there is not currently an open standard or specification for query federation for property graph workloads, that functionality would need to be implemented at the application layer.
+ **Role-based access control (RBAC)**   –   Neptune manages authentication through the assignment of [IAM policies and roles](iam-auth.md). IAM policies and roles provide an extremely flexible level of user management within an application, so it is worth reading and understanding the information in the [IAM overview](iam-auth.md) before configuring your cluster.

  
+ **Bookmarking**   –   Neptune clusters consist of a single writer instance and up to 15 read-replica instances. Data written to the writer instance is ACID compliant and provides a strong consistency guarantee on subsequent reads. Read-replicas use the same storage volume as the writer instance and are eventually consistent, usually in less than 100ms from the time data is written. If your use case has an immediate need to guarantee read consistency of new writes, these reads should be directed to the cluster endpoint instead of the reader endpoint.
+ **APOC procedures**   –   Because APOC procedures are not included in the openCypher specification, Neptune does not provide direct support for external procedures. Instead, Neptune relies on [integrations with other AWS services](integrations.md) to achieve similar end user functionality in a scalable, secure, and robust manner. Sometimes APOC procedures can be rewritten in openCypher or Gremlin, and some are not relevant to Neptune applications.

  In general, APOC procedures fall into the categories below:
  + **[Import](https://neo4j.com/labs/apoc/4.2/import/)**   –   Neptune supports importing data in a variety of formats using query languages, the Neptune [bulk loader](bulk-load.md), or as a target of [AWS Database Migration Service](dms-neptune.md). ETL operations on data may be performed using AWS Glue and the [neptune-python-utils](https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-python-utils) open-source package.
  + **[Export](https://neo4j.com/labs/apoc/4.2/export/)**   –   Neptune supports exporting data using the [`neptune-export`](neptune-data-export.md) utility, which supports a variety of common export formats and methods.
  + **[Database Integration](https://neo4j.com/labs/apoc/4.2/database-integration/)**   –   Neptune supports integration with other databases using ETL tools such as AWS Glue or migrations tools such as the [AWS Database Migration Service](dms-neptune.md).
  + **[Graph Updates](https://neo4j.com/labs/apoc/4.2/graph-updates/)**   –   Neptune supports a rich set of features for updating property-graph data through its support for both the openCypher and the Gremlin query languages. See [Cypher rewrites](migration-opencypher-rewrites.md) for examples of rewrites of commonly used procedures.
  + **[Data Structures](https://neo4j.com/labs/apoc/4.2/data-structures/)**, **[Temporal (Date Time)](https://neo4j.com/labs/apoc/4.2/temporal/)**, **[Mathematical](https://neo4j.com/labs/apoc/4.2/mathematical/)**, **[Advanced Graph Querying](https://neo4j.com/labs/apoc/4.2/graph-querying/)**, **[Comparing Graphs](https://neo4j.com/labs/apoc/4.2/comparing-graphs/)**, and **[Cypher Execution](https://neo4j.com/labs/apoc/4.2/cypher-execution/)**   –   For each of these categories, Neptune provides comparable functionality through its support for both the openCypher and the Gremlin query languages. See [Cypher rewrites](migration-opencypher-rewrites.md) for examples of rewrites of commonly used procedures.
+ **Custom procedures**   –   Neptune does not support custom procedures created by users. This functionality would have to be implemented at the application layer.
+ **Geospatial**   –   Although Neptune doesn't provide native support for geospatial features, similar functionality can be achieved through integration with other AWS services, as shown in this blog post: [Combine Amazon Neptune and Amazon OpenSearch Service for geospatial queries](https://aws.amazon.com/blogs/database/combine-amazon-neptune-and-amazon-opensearch-service-for-geospatial-queries/) by Ross Gabay and Abhilash Vinod (1 February 2022).
+ **Graph Data Science**   –   Neptune supports graph analytics today through [Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html), a memory-optimized engine that supports a library of graph analytic algorithms.

  Neptune also provides an integration with the [AWS Pandas SDK](https://github.com/aws/aws-sdk-pandas) and several [sample notebooks](https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/05-Data-Science) that show how to leverage this integration within Python environments to run analytics on graph data.

  
+ **Schema Constraints**   –   Within Neptune, the only schema constraint available is the uniqueness of the ID of a node or edge. There is no way to specify any other schema, uniqueness, or value constraints on an element in the graph. ID values in Neptune are strings and may be set using Gremlin, like this:

  ```
  g.addV('person').property(id, '1')
  ```

  Applications that need to leverage the ID as a uniqueness constraint are encouraged to try this approach for achieving a uniqueness constraint. If the application used multiple columns as a uniqueness constraint, the ID may be set to a combination of these values. For example `id=123, code='SEA'` could be represented as `ID='123_SEA'` to achieve a complex uniqueness constraint.
+ **Multi-tenancy**   –   Neptune only supports a single graph per cluster. To build a multi-tenant system using Neptune, either use multiple clusters or logically partition the tenants within a single graph and use application-side logic to enforce separation. For example, add a property `tenantId` and include it in each query, like this:

  ```
  MATCH p=(n {tenantId:1})-[]->({tenantId:1}) RETURN p LIMIT 5
  ```

  [Neptune Serverless](neptune-serverless.md) makes it relatively easy to implement multi-tenancy using multiple DB clusters, each of which is scaled independently and automatically as needed.
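A minimal sketch of the logical-partitioning approach (the helper name is illustrative): keep the tenant filter in every query and bind the tenant ID as a parameter rather than concatenating it into the query string.

```python
# Parameterized form of the tenantId query above; application code supplies
# the tenant binding on every request so tenants never see each other's data.
TENANT_QUERY = (
    "MATCH p=(n {tenantId: $tenantId})-[]->({tenantId: $tenantId}) "
    "RETURN p LIMIT 5"
)

def tenant_parameters(tenant_id):
    """Build the parameter map to submit alongside TENANT_QUERY."""
    return {"tenantId": tenant_id}
```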

## Neptune support for Neo4j tools
Neo4j tools

Neptune provides the following alternatives to Neo4j tools:
+ **[Neo4j Browser](https://neo4j.com/docs/operations-manual/current/installation/neo4j-browser/)**   –   Neptune provides open-source [graph notebooks](graph-notebooks.md) that provide a developer-focused IDE for running queries and visualizing the results.
+ **[Neo4j Bloom](https://neo4j.com/product/bloom/)**   –   Neptune supports rich graph visualizations using [third-party visualization solutions](visualization-tools.md) such as Graph-explorer, Tom Sawyer, Cambridge Intelligence, Graphistry, metaphacts, and G.V().
+ **[GraphQL](https://graphql.org/)**   –   Neptune currently supports GraphQL through custom AWS AppSync integrations. See the [Build a graph application with Amazon Neptune and AWS Amplify](https://aws.amazon.com/blogs/database/build-a-graph-application-with-amazon-neptune-and-aws-amplify/) blog post, and the example project, [Building Serverless Calorie tracker application with AWS AppSync and Amazon Neptune](https://github.com/aws-samples/aws-appsync-calorie-tracker-workshop).
+ **[NeoSemantics](https://neo4j.com/labs/neosemantics/4.0/)**   –   Neptune natively supports the RDF data model, so customers wishing to run RDF workloads are advised to use Neptune's RDF model support.
+ **[Arrows.app](https://arrows.app/)**   –   The Cypher created when exporting the model using the export command is compatible with Neptune.
+ **[Linkurious Ogma](https://doc.linkurious.com/ogma/latest/)**   –   A sample integration with Linkurious Ogma is [available here](https://github.com/aws-samples/amazon-neptune-samples/tree/master/gremlin/ogma-neptune).
+ **[Spring Data Neo4j](https://spring.io/projects/spring-data-neo4j)**   –   This is not currently compatible with Neptune.
+ **[Neo4j Spark Connector](https://neo4j.com/docs/spark/current/)**   –   The Neo4j Spark connector can be used within a Spark job to connect to Neptune using openCypher. Here is some sample code and application configuration:

  **Sample code:**

  ```
  SparkSession spark = SparkSession
              .builder()
              .config("encryption.enabled", "true")
              .appName("Simple Application").config("spark.master", "local").getOrCreate();
  
  Dataset<Row> df = spark.read().format("org.neo4j.spark.DataSource")
              .option("url", "bolt://(your cluster endpoint):8182")
              .option("encryption.enabled", "true")
              .option("query", "MATCH (n:airport) RETURN n")
              .load();
              
  System.out.println("TOTAL RECORD COUNT: " + df.count());
  spark.stop();
  ```

  **Application configuration:**

  ```
  <dependency>
      <groupId>org.neo4j</groupId>
      <artifactId>neo4j-connector-apache-spark_2.12</artifactId>
      <version>4.1.0_for_spark_3</version>
  </dependency>
  ```

### Neo4j features and tools not listed here
Tools not listed

If you are using a tool or feature that is not listed here, its compatibility with Neptune or with other AWS services has not been assessed. Please reach out to AWS support or engage your account team if you have further questions.

# Rewriting Cypher queries to run in openCypher on Neptune
Cypher rewrites

The openCypher language is a declarative query language for property graphs that was originally developed by Neo4j, then open-sourced in 2015, and contributed to the [openCypher project](https://www.opencypher.org/) under an Apache 2 open-source license. At AWS, we believe that open source is good for everyone and we are committed to bringing the value of open source to our customers, and the operational excellence of AWS to open source communities.

OpenCypher syntax is documented in the [Cypher Query Language Reference, Version 9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf).

Because openCypher contains a subset of the syntax and features of the Cypher query language, some migration scenarios require either rewriting queries in openCypher-compliant forms or examining alternative methods to achieve the desired functionality.

This section contains recommendations for handling common differences, but they are by no means exhaustive. You should test any application using these rewrites thoroughly to ensure that the results are what you expect.

## Rewriting `None`, `All`, and `Any` predicate functions
`None`, `All`, and `Any` predicate functions

These functions are not part of the openCypher specification. Comparable results can be achieved in openCypher using list comprehensions.

For example, find all the paths that go from node `Start` to node `End`, but no journey is allowed to pass through a node with a class property of `D`:

```
# Neo4j Cypher code
match p=(a:Start)-[:HOP*1..]->(z:End)
where none(node IN nodes(p) where node.class ='D')
return p

# Neptune openCypher code
match p=(a:Start)-[:HOP*1..]->(z:End)
where size([node IN nodes(p) where node.class = 'D']) = 0
return p
```

List comprehension can achieve these results as follows:

```
all  => size(list_comprehension(list)) = size(list)
any  => size(list_comprehension(list)) >= 1
none => size(list_comprehension(list)) = 0
```

## Rewriting the Cypher `reduce()` function in openCypher
Rewriting Cypher `reduce()`

The `reduce()` function is not part of the openCypher specification. It is often used to create an aggregation of data from elements within a list. In many cases, you can use a combination of list comprehensions and the `UNWIND` clause to achieve similar results.

For example, the following Cypher query finds all airports on paths having one to three stops between Anchorage (ANC) and Austin (AUS), and returns the total distance of each path:

```
MATCH p=(a:airport {code: 'ANC'})-[r:route*1..3]->(z:airport {code: 'AUS'})
RETURN p, reduce(totalDist=0, r in relationships(p) | totalDist + r.dist) AS totalDist
ORDER BY totalDist LIMIT 5
```

You can write the same query in openCypher for Neptune as follows:

```
MATCH p=(a:airport {code: 'ANC'})-[r:route*1..3]->(z:airport {code: 'AUS'})
UNWIND [i in relationships(p) | i.dist] AS di
RETURN p, sum(di) AS totalDist
ORDER BY totalDist
LIMIT 5
```

## Rewriting the Cypher FOREACH clause in openCypher
Rewriting Cypher FOREACH

The FOREACH clause is not part of the openCypher specification. It is commonly used to update data in the middle of a query, often based on aggregations or elements within a path.

As a path example, find all airports on a path with no more than two stops between Anchorage (ANC) and Austin (AUS) and set a property of visited on each of them:

```
# Neo4j example
MATCH p=(:airport {code: 'ANC'})-[*1..2]->({code: 'AUS'})
FOREACH (n IN nodes(p) | SET n.visited = true)

# Neptune openCypher
MATCH p=(:airport {code: 'ANC'})-[*1..2]->({code: 'AUS'})
WITH nodes(p) as airports
UNWIND airports as a
SET a.visited=true
```

Another example is:

```
# Neo4j example
MATCH p=(start)-[*]->(finish)
WHERE start.name = 'A' AND finish.name = 'D'
FOREACH (n IN nodes(p) | SET n.marked = true)

# Neptune openCypher
MATCH p=(start)-[*]->(finish)
WHERE start.name = 'A' AND finish.name = 'D'
UNWIND nodes(p) AS n
SET n.marked = true
```
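Both forms apply the `SET` to every node on the path. In imperative terms, `FOREACH` and the `UNWIND` rewrite perform the same loop, as this Python sketch over hypothetical node records shows:

```python
# Hypothetical nodes(p) result for one matched path from A to D
nodes = [{"name": "A"}, {"name": "B"}, {"name": "D"}]

# FOREACH (n IN nodes(p) | SET n.marked = true) and
# UNWIND nodes(p) AS n / SET n.marked = true both amount to:
for n in nodes:
    n["marked"] = True

assert all(n["marked"] for n in nodes)
```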

## Rewriting Neo4j APOC procedures in Neptune
Rewriting APOC procedures

The examples below use openCypher to replace some of the most commonly used [APOC procedures](https://neo4j.com/blog/intro-user-defined-procedures-apoc/). These examples are for reference only, and are intended to suggest ways of handling common scenarios. In practice, each application is different, and you will need to devise your own strategies for providing all the functionality you need.

### Rewriting `apoc.export` procedures
`apoc.export`

Neptune provides an array of options for both full graph and query-based exports in various output formats such as CSV and JSON, using the [neptune-export](https://github.com/aws/neptune-export) utility (see [Exporting data from a Neptune DB cluster](neptune-data-export.md)).

### Rewriting `apoc.schema` procedures
`apoc.schema`

Neptune does not have an explicitly defined schema, indices, or constraints, so many `apoc.schema` procedures are no longer required. Examples are:
+ `apoc.schema.assert`
+ `apoc.schema.node.constraintExists`
+ `apoc.schema.node.indexExists`
+ `apoc.schema.relationship.constraintExists`
+ `apoc.schema.relationship.indexExists`
+ `apoc.schema.nodes`
+ `apoc.schema.relationships`

Neptune openCypher can retrieve values similar to those these procedures return, as shown below. However, such queries can run into performance issues on larger graphs, because answering them requires scanning a large portion of the graph.

```
# openCypher replacement for apoc.schema.properties.distinct
MATCH (n:airport)
RETURN DISTINCT n.runways
```

```
# openCypher replacement for apoc.schema.properties.distinctCount
MATCH (n:airport)
RETURN DISTINCT n.runways, count(n.runways)
```

### Alternatives to `apoc.do` procedures
`apoc.do`

These procedures provide conditional query execution, which is easy to achieve using other openCypher clauses. In Neptune there are at least two ways to achieve similar behavior:
+ One way is to combine openCypher's List Comprehension capabilities with the `UNWIND` clause.
+ Another way is to use the `choose()` and `coalesce()` steps in Gremlin.

Examples of these approaches are shown below.

#### Alternatives to `apoc.do.when`
`apoc.do.when`

```
# Neo4j Example
MATCH (n:airport {region: 'US-AK'})
CALL apoc.do.when(
 n.runways>=3,
 'SET n.is_large_airport=true RETURN n',
 'SET n.is_large_airport=false RETURN n',
 {n:n}
) YIELD value
WITH collect(value.n) as airports
RETURN size([a in airports where a.is_large_airport]) as large_airport_count,
size([a in airports where NOT a.is_large_airport]) as small_airport_count


# Neptune openCypher
MATCH (n:airport {region: 'US-AK'})
WITH n.region as region, collect(n) as airports
WITH [a IN airports where a.runways >= 3] as large_airports,
[a IN airports where a.runways < 3] as small_airports, airports
UNWIND large_airports as la
SET la.is_large_airport=true
WITH DISTINCT small_airports, airports
UNWIND small_airports as sa
    SET sa.is_large_airport=false
WITH DISTINCT airports
RETURN size([a in airports where a.is_large_airport]) as large_airport_count,
size([a in airports where NOT a.is_large_airport]) as small_airport_count

# Neptune Gremlin using choose()
g.V().
  has('airport', 'region', 'US-AK').
  choose(
    values('runways').is(lt(3)),
    property(single, 'is_large_airport', false),
    property(single, 'is_large_airport', true)).
  fold().
  project('large_airport_count', 'small_airport_count').
    by(unfold().has('is_large_airport', true).count()).
    by(unfold().has('is_large_airport', false).count())

# Neptune Gremlin using coalesce()
g.V().
  has('airport', 'region', 'US-AK').
  coalesce(
    where(values('runways').is(lt(3))).
    property(single, 'is_large_airport', false),
    property(single, 'is_large_airport', true)).
  fold().
  project('large_airport_count', 'small_airport_count').
    by(unfold().has('is_large_airport', true).count()).
    by(unfold().has('is_large_airport', false).count())
```
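All three alternatives above implement the same two-way conditional update followed by a count of each group. This Python sketch, over hypothetical runway counts, replicates that logic:

```python
# Hypothetical runway counts for airports in region US-AK
airports = [{"runways": r} for r in [1, 2, 3, 4, 1]]

# The apoc.do.when condition: runways >= 3 marks a large airport
for a in airports:
    a["is_large_airport"] = a["runways"] >= 3

large_airport_count = sum(1 for a in airports if a["is_large_airport"])
small_airport_count = sum(1 for a in airports if not a["is_large_airport"])

assert (large_airport_count, small_airport_count) == (2, 3)
```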

#### Alternatives to `apoc.do.case`
`apoc.do.case`

```
# Neo4j Example
MATCH (n:airport {region: 'US-AK'})
CALL apoc.case([
 n.runways=1, 'RETURN "Has one runway" as b',
 n.runways=2, 'RETURN "Has two runways" as b'
 ],
 'RETURN "Has more than 2 runways" as b'
) YIELD value 
RETURN {type: value.b,airport: n}

# Neptune openCypher
MATCH (n:airport {region: 'US-AK'})
WITH n.region as region, collect(n) as airports
WITH [a IN airports where a.runways =1] as single_runway,
[a IN airports where a.runways =2] as double_runway,
[a IN airports where a.runways >2] as many_runway
UNWIND single_runway as sr
    WITH {type: "Has one runway",airport: sr} as res, double_runway, many_runway
WITH DISTINCT double_runway as double_runway, collect(res) as res, many_runway
UNWIND double_runway as dr
    WITH {type: "Has two runways",airport: dr} as two_runways, res, many_runway
WITH collect(two_runways)+res as res, many_runway
UNWIND many_runway as mr
    WITH {type: "Has more than 2 runways",airport: mr} as res2, res, many_runway
WITH collect(res2)+res as res
UNWIND res as r
RETURN r

# Neptune Gremlin using choose()
g.V().
  has('airport', 'region', 'US-AK').
  project('type', 'airport').
    by(
      choose(values('runways')).
        option(1, constant("Has one runway")).
        option(2, constant("Has two runways")).
        option(none, constant("Has more than 2 runways"))).
    by(elementMap())

# Neptune Gremlin using coalesce()
g.V().
  has('airport', 'region', 'US-AK').
  project('type', 'airport').
    by(
      coalesce(
        has('runways', 1).constant("Has one runway"),
        has('runways', 2).constant("Has two runways"),
        constant("Has more than 2 runways"))).
    by(elementMap())
```
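The multi-way dispatch that `apoc.case` performs is an if/else-if chain with a default branch. A Python sketch of the same branching (over hypothetical runway counts):

```python
def runway_description(runways: int) -> str:
    # The apoc.case branches, followed by the default branch
    if runways == 1:
        return "Has one runway"
    if runways == 2:
        return "Has two runways"
    return "Has more than 2 runways"

assert runway_description(1) == "Has one runway"
assert runway_description(2) == "Has two runways"
assert runway_description(7) == "Has more than 2 runways"
```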

## Alternatives to List-based properties
Rewriting List support

Neptune does not currently support storing list-based properties. However, you can achieve similar results by storing the list values as a comma-separated string, and then using the `join()` and `split()` functions to construct and deconstruct the list property.

For example, if you wanted to save a list of tags as a property, you could use the following rewrite, which retrieves a comma-separated property and then uses the `split()` and `join()` functions with List Comprehension to achieve comparable results:

```
# Neo4j Example (in this example, tags is stored as a list of strings)
MATCH (person:person {name: "TeeMan"})
WITH person, [tag in person.tags WHERE NOT (tag IN ['test1', 'test2', 'test3'])] AS newTags
SET person.tags = newTags
RETURN person

# Neptune openCypher 
MATCH (person:person {name: "TeeMan"})
WITH person, [tag in split(person.tags, ',') WHERE NOT (tag IN ['test1', 'test2', 'test3'])] AS newTags
SET person.tags = join(newTags,',')
RETURN person
```
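The same pack/unpack pattern can also live in application code. The following is a minimal Python sketch of the filter performed by the rewrite above, using hypothetical tag values and assuming no tag contains a comma:

```python
def remove_tags(stored: str, unwanted: set) -> str:
    tags = stored.split(",")                        # deconstruct the stored string
    kept = [t for t in tags if t not in unwanted]   # the List Comprehension filter
    return ",".join(kept)                           # reconstruct for storage

assert remove_tags("test1,music,test2,books",
                   {"test1", "test2", "test3"}) == "music,books"
```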

## Rewriting CALL subqueries
Rewriting `CALL` subqueries

Neptune `CALL` subqueries do not support the syntax `CALL (friend) { ... }` for importing variables (here, `friend`) into the subquery scope. Instead, use a `WITH` clause inside the subquery, as in `CALL { WITH friend ... }`.
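For example, a query that imports `friend` with the unsupported syntax can be rewritten as follows (the `:knows` relationship here is hypothetical):

```
# Importing-variables syntax that Neptune does not support
MATCH (person)-[:knows]->(friend)
CALL (friend) {
  MATCH (friend)-[:knows]->(other)
  RETURN count(other) AS fofCount
}
RETURN person, fofCount

# Neptune openCypher
MATCH (person)-[:knows]->(friend)
CALL {
  WITH friend
  MATCH (friend)-[:knows]->(other)
  RETURN count(other) AS fofCount
}
RETURN person, fofCount
```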

Optional `CALL` subqueries are not currently supported.

## Other differences between Neptune openCypher and Cypher
Other differences
+ Neptune only supports TCP connections for the Bolt protocol. WebSockets connections for Bolt are not supported.
+ Neptune openCypher removes whitespace as defined by Unicode in the `trim()`, `ltrim()` and `rtrim()` functions.
+ In Neptune openCypher, `toString(double)` does not automatically switch to E notation for large double values.

# Resources for migrating from Neo4j to Neptune
Migration resources

Neptune provides several tools and resources that can assist in the migration process.

**Tools to help migrate from Neo4j to Neptune**
+ The openCypher [CheatSheet](https://github.com/aws-samples/amazon-neptune-samples/blob/master/opencypher/Cheatsheet.md).
+ [neo4j-to-neptune](https://github.com/awslabs/amazon-neptune-tools/tree/master/neo4j-to-neptune) – A command-line utility for migrating data from Neo4j to Neptune. This tool includes the ability to:
  + Export the data from a properly configured Neo4j graph.
  + Convert that data into Neptune format.
  + Bulk load that data into Neptune.
  + Perform some basic conversions of data during the conversion to Neptune format, such as renaming vertex or edge labels and generating elements.
  + Generate properties for nodes and edges using templates (for example, create an `~id` value using a template such as `Person_{personid}` for situations where you need to create the unique identifier for an element).
+ [openCypher Query Compatibility Checker](https://github.com/awslabs/amazon-neptune-tools/tree/master/opencypher-compatability-checker) – This tool takes openCypher queries as input and will:
  + Check for compatibility with the selected version of Neptune.
  + Identify specific unsupported functions and clauses with their positions.
  + Suggest replacements if available.
  + Provide error descriptions of any other syntax errors.