Cost optimization pillar
The cost optimization pillar of the AWS Well-Architected Framework focuses on avoiding unnecessary costs. The following recommendations can help you meet the cost optimization design principles and architectural best practices for Amazon Neptune.
The cost optimization pillar focuses on the following key areas:
-
Understanding spending over time and controlling fund allocation
-
Selecting resources of the right type and quantity
-
Scaling to meet business needs without overspending
Understand usage patterns and services needed
Neptune is a good fit for your workload if your data model has a discernible graph structure, and your queries need to explore relationships and traverse multiple hops. A graph database isn't a good fit for the following patterns:
-
Mainly single-hop queries (consider whether your data might be better represented as attributes of an object)
-
JSON or BLOB data stored as properties
-
Queries that aggregate across a dataset, such as calculating the sum of a numeric property across a large number of nodes
Consider whether using several purpose-built databases together for specific access patterns might address all of your needs. For example:
-
An API that requires less frequent complex graph navigations alongside highly concurrent retrieval of properties for a single node might be best served by using one or more of Neptune, DynamoDB, or Amazon DocumentDB.
-
Relational databases can coexist with Neptune to maintain your existing functionality, with Neptune used only for the multiple-hop traversals that do not perform or scale well in relational databases.
Understand the costs associated with services that interact with and complement Neptune, including the following:
-
Amazon Simple Storage Service (Amazon S3) storage costs for data files being bulk loaded into Neptune
-
Lambda functions used for insert or upsert queries, read queries, and Neptune streams processing
-
The API layer built on Neptune to interact with the client application (instead of having direct connections to the database) in Amazon API Gateway or AWS AppSync
-
AWS Glue jobs used to transfer data to and from Neptune
-
Amazon Kinesis or Amazon Managed Streaming for Apache Kafka (Amazon MSK) instances receiving streaming data for near real-time ingestion into Neptune
-
AWS Database Migration Service for migration of relational data to Neptune
-
Amazon SageMaker Runtime costs for Jupyter notebooks and Deep Graph Library machine learning models
Select resources with attention to cost
Neptune pricing
-
Does the MainRequestQueuePendingRequests CloudWatch metric stay at a consistently low number near zero?
-
Does the BufferCacheHitRatio CloudWatch metric stay at or above 99.9 percent a majority of the time?
-
What are the cost and performance curves for instance costs and for associated data I/O costs? Data read costs might increase significantly with an undersized instance that requires frequent buffer cache swapping with storage. BufferCacheHitRatio will be dropping frequently in these scenarios.
Instance costs scale linearly with size within the same instance family. The
db.r6i.2xlarge instance has twice the hourly cost of the
db.r6i.xlarge instance and also twice the resource allocation. The
db.r6i.24xlarge instance has 24 times the hourly cost of the
db.r6i.xlarge instance.
Estimate the number of concurrent queries you must support. You can have between zero
and fifteen read replicas for processing read-only queries. If your requirements vary by
the time of day, week, or month, you can use multiple smaller instances to scale on a
schedule. Each vCPU on an instance provides two threads for handling concurrent queries.
Three db.r6i.xlarge read replicas, with 4 vCPU each, can handle 24
concurrent queries (3 × 4 × 2).
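The concurrency arithmetic above can be sketched in a few lines of Python. This is only an illustration of the two-threads-per-vCPU rule stated in the text. The function names are hypothetical, and the sketch assumes all replicas are the same size.

```python
import math

THREADS_PER_VCPU = 2  # from the text: each vCPU provides two query threads


def concurrent_query_capacity(vcpus_per_instance: int, instance_count: int) -> int:
    """Total query threads across identically sized instances."""
    return vcpus_per_instance * THREADS_PER_VCPU * instance_count


def replicas_needed(target_concurrency: int, vcpus_per_instance: int) -> int:
    """Smallest number of identically sized replicas that covers the target."""
    per_instance = vcpus_per_instance * THREADS_PER_VCPU
    return math.ceil(target_concurrency / per_instance)


print(concurrent_query_capacity(4, 3))  # 24, matching the example above
print(replicas_needed(50, 4))           # 7
```

If the result exceeds fifteen, the read replica limit mentioned above, a larger instance size is needed rather than more replicas.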
If your traffic volume is instead measured in queries per second (QPS), you must
experiment to determine the average latency of your queries. The number of queries per
second a Neptune cluster can support is equal to vCPU × 2 × (1
second/average query latency). For example, if you have 4 vCPU and query
latency of 100 milliseconds (0.1 second), QPS = 4 × 2 × (1s/0.1s) = 80
queries per second.
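As a rough sketch, the QPS formula above translates directly into Python. The function names are hypothetical, and the result is an estimate that depends entirely on the average latency you measure for your own queries.

```python
import math

THREADS_PER_VCPU = 2  # from the text: each vCPU provides two query threads


def estimated_max_qps(vcpus: int, avg_latency_seconds: float) -> float:
    """QPS = vCPU x 2 x (1 second / average query latency)."""
    if avg_latency_seconds <= 0:
        raise ValueError("average latency must be positive")
    return vcpus * THREADS_PER_VCPU * (1.0 / avg_latency_seconds)


def vcpus_for_qps(target_qps: float, avg_latency_seconds: float) -> int:
    """Invert the formula: the vCPU count needed to reach a target QPS."""
    return math.ceil(target_qps * avg_latency_seconds / THREADS_PER_VCPU)


print(estimated_max_qps(4, 0.1))  # 80.0, matching the example above
```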
Provisioned instances are cheaper than serverless for continuous, stable, and
predictable workloads. Serverless provides opportunities for optimizing costs when you
have a workload that requires very high usage for just a few hours per day (for example,
the equivalent of a db.r6i.4xlarge instance) and then almost no traffic for the
remainder of the day (for example, 1 Neptune Compute Unit). A serverless instance that
scales up for a few hours and then back down will be less expensive than using a
provisioned db.r6i.4xlarge instance all day.
Consider upgrading to Neptune 1.4.5.0 or later and using r8g
instances to achieve better read and write throughput at a lower cost than
older-generation instances, such as r7g or r6g. For more
information, see "4.7 times better write query price-performance with AWS Graviton4 R8g
instances using Amazon Neptune v1.4.5".
Neptune clusters are created with standard storage by default, although the console
defaults to selecting I/O-optimized storage. With
I/O-optimized storage, you pay a slightly higher cost for storage and instances, but
there are no I/O costs. This leads to more predictable recurring costs, but if your I/O
usage is generally low, it may be more cost efficient to use standard storage. If
you intend to load a lot of data initially, you can optimize cost by choosing
I/O-optimized storage, performing your initial data load, and then switching to standard
storage. The storage type affects the billing model only and makes no technical difference
to the Neptune DB cluster or instance configuration. You can change the storage type
once every 30 days. After 30 days, check your detailed Neptune costs and use the Neptune
pricing page to decide which storage type is more cost efficient for your workload.
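One way to frame the storage-type decision is to compare a month of billed costs under each model. The following is a minimal sketch only. The class, the function, and the uplift parameters are hypothetical, and the uplift values in the example are placeholders rather than real Neptune prices, so take the actual rates from the Neptune pricing page.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    instance_cost: float  # instance charges at standard-storage rates
    storage_cost: float   # storage charges at standard-storage rates
    io_cost: float        # I/O charges billed under standard storage


def cheaper_storage_type(usage: MonthlyUsage,
                         instance_uplift: float,
                         storage_uplift: float) -> str:
    """Compare one month under both billing models.

    The uplifts are fractional price increases for I/O-optimized storage
    (for example, 0.3 means 30 percent higher). They are assumptions that
    the caller must supply from current pricing.
    """
    standard = usage.instance_cost + usage.storage_cost + usage.io_cost
    io_optimized = (usage.instance_cost * (1 + instance_uplift)
                    + usage.storage_cost * (1 + storage_uplift))
    return "standard" if standard <= io_optimized else "io-optimized"


# Placeholder numbers only: low I/O favors standard, heavy I/O does not.
low_io = MonthlyUsage(instance_cost=1000.0, storage_cost=100.0, io_cost=50.0)
high_io = MonthlyUsage(instance_cost=1000.0, storage_cost=100.0, io_cost=800.0)
print(cheaper_storage_type(low_io, instance_uplift=0.3, storage_uplift=1.0))
print(cheaper_storage_type(high_io, instance_uplift=0.3, storage_uplift=1.0))
```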
Choose the best Neptune instance configuration for your workload
If you created your AWS account before July 15th, 2025, you can use the AWS Free Tier. The
included db.t3.medium and db.t4g.medium instance usage is enough for you to get a good
understanding of Neptune at low scale. Your cluster will remain after the free trial
period ends, although you will be charged for usage from that point
forward.
The db.t3.medium and db.t4g.medium instances are good for
low-cost development environments where you are not using openCypher, Graph Explorer, or
various generative AI integrations. These instances have a smaller RAM-to-vCPU ratio
(2:1) than the R family instances (8:1) or X family instances
(16:1). This reduced ratio prevents the use of DFE engine
statistics that enable openCypher performance, GenAI integrations (to inform
the LLM of the graph schema), and Graph Explorer. Performance profiles might differ
significantly when using T family instances, especially for the previously
mentioned workloads. These instances can also increase the occurrence of
OutOfMemoryExceptions when queries navigate across a significant
portion of the graph. To determine whether your workload might be affected by the latter
condition, check the BufferCacheHitRatio CloudWatch metric.
We strongly advise against doing any performance or load testing with T
family instances because you might experience inconsistent results that are not
indicative of a production environment.
Provisioned instances give you the best cost and performance combination when your
workload is fairly stable and predictable. Choose the instance size based on the request
concurrency required and the query complexity. Higher concurrency requires more vCPUs.
Higher query complexity requires more RAM. Use the
MainRequestQueuePendingRequests CloudWatch metric to determine the impact of
the former (greater than zero represents more concurrent requests than can be handled).
Use the BufferCacheHitRatio CloudWatch metric to determine the impact of the
latter. A ratio that is frequently falling lower than 99.9 percent suggests that there
isn't enough RAM to contain the working portion of the graph being evaluated, which
results in more frequent cache swapping. If the R family of instances provides
sufficient concurrency but not enough RAM, consider trying the X family of
instances.
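The two metric checks above can be expressed as a small helper that takes already-retrieved CloudWatch datapoints. This is a sketch under stated assumptions: the function names and both tolerance parameters are hypothetical, and only the 99.9 percent threshold and the "greater than zero" rule come from the text.

```python
from typing import Callable, List, Sequence

MIN_CACHE_HIT_RATIO = 99.9  # percent, from the guidance above


def fraction(samples: Sequence[float], predicate: Callable[[float], bool]) -> float:
    """Share of samples that satisfy the predicate (0.0 for no samples)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if predicate(s)) / len(samples)


def sizing_hints(pending_requests: Sequence[float],
                 cache_hit_ratios: Sequence[float],
                 queue_tolerance: float = 0.05,
                 cache_tolerance: float = 0.5) -> List[str]:
    """Suggest which resource looks undersized.

    queue_tolerance and cache_tolerance are assumed cut-offs for how often
    a metric may be out of range before it is flagged.
    """
    hints = []
    # Pending requests above zero mean more concurrency than vCPUs can serve.
    if fraction(pending_requests, lambda v: v > 0) > queue_tolerance:
        hints.append("more vCPUs")
    # A hit ratio below 99.9 percent suggests the working set exceeds RAM.
    if fraction(cache_hit_ratios, lambda v: v < MIN_CACHE_HIT_RATIO) >= cache_tolerance:
        hints.append("more RAM")
    return hints


print(sizing_hints([0, 3, 5, 0], [99.95, 99.99, 100.0, 99.9]))  # ['more vCPUs']
```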
Ideal use cases for serverless instances are described in the Neptune documentation. If you are unsure whether provisioned or
serverless is best for you, and cost is your primary concern, test your workload in
serverless to determine the number of NCUs used, and then compare the cost of provisioned
(N hours × hourly provisioned cost) with serverless (sum of
NCUs used in each hour × hourly cost per NCU). If you are unsure about the equivalently
sized provisioned instance, one NCU is equivalent to approximately 2 GB of RAM and
associated vCPU and networking. If your provisioned instance is from the
r6i family, the ratio is 1 vCPU per 8 GB of RAM, or 4 NCUs, along with
associated networking. The Amazon Neptune Pricing Calculator can help with this estimate.
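The provisioned-versus-serverless comparison described above can be sketched as follows. The function names are hypothetical, and the hourly rates in the example are placeholders rather than real prices. Only the 2 GB per NCU approximation and the two cost formulas come from the text.

```python
from typing import Sequence

GB_RAM_PER_NCU = 2  # approximate, from the text


def approx_ncus_for_ram(ram_gb: float) -> float:
    """Rough NCU equivalent of a provisioned instance's RAM."""
    return ram_gb / GB_RAM_PER_NCU


def provisioned_cost(hours: float, hourly_rate: float) -> float:
    """N hours x hourly provisioned cost."""
    return hours * hourly_rate


def serverless_cost(ncus_by_hour: Sequence[float], ncu_hourly_rate: float) -> float:
    """Sum of NCUs used in each hour x hourly cost per NCU."""
    return sum(ncus_by_hour) * ncu_hourly_rate


# One day: 3 busy hours at 64 NCUs, then 21 quiet hours at 1 NCU.
day = [64] * 3 + [1] * 21
# Placeholder rates for illustration only.
print(serverless_cost(day, ncu_hourly_rate=0.125))   # 26.625
print(provisioned_cost(24, hourly_rate=4.0))         # 96.0
```

With a flatter usage profile, the same calculation would tip toward provisioned, which is why measuring the real NCU consumption first matters.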
When using serverless for primary and replica instances, remember that read replicas in promotion tiers 0 and 1 will scale their NCUs in line with the writer instance so that they are properly scaled if a failover event occurs. Set your NCU limits for these instances based on whichever of your instances (writer or readers) receives the most traffic.
In environments where the cluster is not needed 24 hours a day, 7 days a week, consider writing scripts that stop the Neptune instances when they are not in use and start them again before they are needed. Stopped Neptune instances automatically restart after 7 days so that required maintenance updates can be applied. If you intend to leave the instances off for longer durations, use a weekly script to shut them down again.
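A scheduled script for this might look like the following sketch. The weekday working window, the function names, and the cluster identifier are all assumptions for illustration. The `client` argument is expected to behave like a boto3 Neptune client, whose `start_db_cluster` and `stop_db_cluster` calls take a `DBClusterIdentifier`.

```python
from datetime import datetime, time


def should_be_running(now: datetime, start: time, stop: time) -> bool:
    """True on weekdays inside the window [start, stop) (assumed schedule)."""
    return now.weekday() < 5 and start <= now.time() < stop


def apply_schedule(client, cluster_id: str, status: str, now: datetime,
                   start: time = time(8, 0), stop: time = time(18, 0)) -> str:
    """Start or stop the cluster so that its state matches the schedule.

    status is the cluster's current status string, such as "available"
    or "stopped". Transitional states are left alone.
    """
    want_running = should_be_running(now, start, stop)
    if want_running and status == "stopped":
        client.start_db_cluster(DBClusterIdentifier=cluster_id)
        return "started"
    if not want_running and status == "available":
        client.stop_db_cluster(DBClusterIdentifier=cluster_id)
        return "stopped"
    return "unchanged"
```

In practice the client would come from `boto3.client("neptune")`, and the status from its `describe_db_clusters` response. Running the function on a recurring schedule also covers the automatic restart, because a cluster found running outside the window is simply stopped again.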
Right-size data storage and transfer
More efficient queries (for example, queries that need to touch fewer nodes, edges, and properties in the graph) require less I/O transfer and potentially can make use of smaller instances because less buffer cache is required. Use the profile or explain endpoints for your query language to optimize your query, and consider optimizing your graph model for your query performance.
Neptune uses dictionary encoding on large strings, and that dictionary is optimized for performance, not efficiency. If you have large BLOBs, JSON, or frequently changing strings for properties, consider storing them outside Neptune in Amazon S3, Amazon DynamoDB, or Amazon DocumentDB, and store only a reference within the Neptune node.
In some cases, choosing a larger instance size can be cheaper. If your I/O costs are
very high because of a low BufferCacheHitRatio, it's possible that a
larger buffer cache would significantly reduce that cost. That's because all of the data
would fit in the cache instead of being frequently swapped from storage and incurring
I/O charges.
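A quick break-even check can show whether a larger instance is worth it. This sketch assumes standard storage (I/O-optimized storage has no I/O charges to save) and relies on instance cost scaling linearly with size within a family. The function names and all example figures are hypothetical.

```python
def extra_instance_cost(small_hourly_rate: float,
                        size_multiplier: float,
                        hours: float) -> float:
    """Added instance spend from moving to an instance size_multiplier times larger."""
    return small_hourly_rate * (size_multiplier - 1) * hours


def upsizing_pays_off(small_hourly_rate: float,
                      size_multiplier: float,
                      hours: float,
                      io_cost_before: float,
                      io_cost_after: float) -> bool:
    """True when the I/O savings exceed the added instance cost."""
    savings = io_cost_before - io_cost_after
    return savings > extra_instance_cost(small_hourly_rate, size_multiplier, hours)


# Placeholder figures: doubling the instance for a 720-hour month.
print(extra_instance_cost(1.0, 2, 720))                # 720.0
print(upsizing_pays_off(1.0, 2, 720, 1500.0, 200.0))   # True
```

The I/O cost after upsizing is not known in advance, so it has to come from a test run on the larger instance.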
Neptune uses copy-on-write cloning. When cloning to split a graph into multiple shards, it might be more efficient not to delete the unwanted data on the cloned cluster because that will involve the creation of new data pages, resulting in increased storage costs. Data that is unchanged from before the cloning event will exist in a single data page shared across the two clusters and will be charged only for that single copy.
Do not enable the OSGP index or use R5d instances unless you have tested to confirm that they make a substantial difference in your workload. Both are designed for rarely occurring scenarios, and they might increase your costs for minimal or no gains.