Guidance for Real-Time Text Search Using Amazon OpenSearch Service

Overview

This Guidance shows how to integrate Amazon DynamoDB with Amazon OpenSearch Service for real-time search. Most applications should use Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service. For applications whose requirements do not align with zero-ETL integration, this Guidance demonstrates how to perform an initial load of data from DynamoDB into OpenSearch Service through parallel functions and how to replicate new data into OpenSearch Service. By keeping data in both places, you can direct each query to the data store best suited to it: DynamoDB serves fixed access patterns that require performance and scalability, and OpenSearch Service serves access patterns that require flexible searching and filtering.

How it works

This architecture diagram shows how to load and stream data from an Amazon DynamoDB table to Amazon OpenSearch Service to support real-time, open-ended searching and filtering.

Step 1
To process existing data, an AWS Lambda function is invoked to describe the Amazon DynamoDB table and split it into a number of segments based on the returned item count. The function writes one message to an Amazon Simple Queue Service (Amazon SQS) queue for each segment number.
Step 2
Amazon SQS acts as an event source for Lambda. Lambda invokes the processing function with messages from the queue, so that segments of the DynamoDB table are processed in parallel.
Step 3
The Lambda function uses a parallel scan to read the segment of the DynamoDB table listed in the source event from Amazon SQS.
Step 4
The function then writes the data retrieved from DynamoDB into Amazon OpenSearch Service in batches through the bulk-create operation.
Step 5
When items are inserted or updated in DynamoDB, DynamoDB Streams captures the change.
Step 6
DynamoDB Streams sends the item-level modifications captured from DynamoDB to the Lambda streaming update function.
Step 7
The Lambda function writes that data in batches to OpenSearch Service through the bulk index operation. Track ingested documents with the SearchableDocuments metric in Amazon CloudWatch.
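
As an illustration of Steps 5 through 7, the following is a minimal Python sketch of how a streaming update function might turn a batch of DynamoDB Streams records into a request body for the OpenSearch bulk API. It is not the sample code shipped with this Guidance. The function names, the index name, and the "id" key attribute are hypothetical, and the sketch assumes the stream is configured to include new item images. Only inserts and updates are handled, matching the steps above.

```python
import json


def _to_number(text):
    """DynamoDB sends numbers as strings; prefer int, fall back to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def _plain(attr):
    """Convert one DynamoDB-typed value, such as {"S": "x"}, to plain Python."""
    (tag, value), = attr.items()
    if tag in ("S", "BOOL"):
        return value
    if tag == "N":
        return _to_number(value)
    if tag == "NULL":
        return None
    if tag == "L":
        return [_plain(v) for v in value]
    if tag == "M":
        return {k: _plain(v) for k, v in value.items()}
    if tag == "SS":
        return list(value)
    if tag == "NS":
        return [_to_number(n) for n in value]
    raise ValueError(f"type {tag} is not handled in this sketch")


def stream_records_to_bulk(records, index_name, key_attr="id"):
    """Build a newline-delimited bulk body with one "index" action per change.

    Assumption: key_attr is the table's partition key and NewImage is present
    (stream view type NEW_IMAGE or NEW_AND_OLD_IMAGES).
    """
    lines = []
    for record in records:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # deletes are outside the scope of this sketch
        change = record["dynamodb"]
        doc_id = str(_plain(change["Keys"][key_attr]))
        doc = {name: _plain(v) for name, v in change["NewImage"].items()}
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

In a Lambda handler, the records would come from the "Records" list of the incoming event, and the returned string would be sent to the domain's _bulk endpoint with a signed HTTP request or an OpenSearch client library.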

Deploy with confidence

Everything you need to launch this Guidance in your account is right here.

Deploy this Guidance: use sample code to deploy this Guidance in your AWS account.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

AWS Cloud Development Kit (AWS CDK) defines the infrastructure for the solution as code, helping you achieve consistent deployment. The Lambda functions divide the workload into small units, each responsible for a single application function. These single-task functions reduce human error and support small incremental changes that are easier to reverse if they fail.

Read the Operational Excellence whitepaper

Security

Where applicable, this Guidance launches services in private Amazon Virtual Private Cloud (Amazon VPC) networks rather than public ones. Private networking through Amazon VPC supports security at all layers by letting you control how data is accessed. Additionally, the use of single-purpose, least-privilege AWS Identity and Access Management (IAM) policies helps you prevent permission changes from having broader, unanticipated consequences and reduces the risk of users mishandling sensitive data. AWS Secrets Manager generates and securely stores admin secrets, preventing users from storing credentials in code or environment variables where they are at risk of exposure.

Read the Security whitepaper

Reliability

Amazon SQS provides an automatic retry mechanism if a portion of the import fails, helping you quickly recover from failures. As the system of record, DynamoDB uses point-in-time recovery for continuous backup, enabling recovery to any second within the last 35 days. OpenSearch Service helps you prevent drift between the two databases by using the “create” operation for initial data loading, preventing older data from overwriting newer data. OpenSearch Service is set to use a single-node cluster, but you can change this to a multi–Availability Zone cluster to maintain availability in production.
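
To illustrate why the "create" operation guards against drift, here is a minimal Python sketch of a bulk request body for the initial load and a summary of the bulk response. In the bulk API, a "create" action is rejected with HTTP status 409 when a document with the same ID already exists, so a document that the streaming path has already written is left untouched. The function names, the index name, and the "id" key attribute are hypothetical, and the items are assumed to be plain Python dictionaries already.

```python
import json


def build_bulk_create(items, index_name, key_attr="id"):
    """Build a newline-delimited bulk body that uses "create" for every item."""
    lines = []
    for item in items:
        action = {"create": {"_index": index_name, "_id": str(item[key_attr])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(item))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def summarize_bulk_response(response):
    """Count created documents, skipped conflicts, and IDs that failed.

    A 409 status means the document already existed, which is the expected
    outcome when newer data reached the index before the initial load did.
    """
    created, skipped, failed = 0, 0, []
    for entry in response.get("items", []):
        result = entry["create"]
        if result["status"] == 201:
            created += 1
        elif result["status"] == 409:
            skipped += 1
        else:
            failed.append(result["_id"])
    return created, skipped, failed
```

Treating conflicts as skips rather than errors lets a retried Amazon SQS message reprocess a segment without overwriting anything.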

Read the Reliability whitepaper

Performance Efficiency

Lambda enables you to parallelize workloads: reads from DynamoDB go through segmented parallel scans split across multiple Lambda function invocations. This parallelization enables significantly higher throughput than a single thread could manage.
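
The segmented approach can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the Guidance's sample code: the function names, the message format, and the items_per_segment default are hypothetical. The client argument is expected to behave like a boto3 DynamoDB client, whose scan call accepts Segment, TotalSegments, and ExclusiveStartKey. The item count would typically come from the table description, which DynamoDB refreshes only periodically, so it is treated as an estimate.

```python
import json
import math


def plan_segments(item_count, items_per_segment=50_000, max_segments=1_000_000):
    """Choose a segment count from an approximate item count.

    items_per_segment is an illustrative tuning value. DynamoDB accepts
    TotalSegments values from 1 to 1,000,000.
    """
    wanted = math.ceil(item_count / items_per_segment)
    return max(1, min(wanted, max_segments))


def segment_messages(total_segments):
    """Create one queue message body per segment number."""
    return [
        json.dumps({"segment": n, "total_segments": total_segments})
        for n in range(total_segments)
    ]


def scan_segment(client, table_name, segment, total_segments):
    """Yield every item in one segment, following pagination to the end."""
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return  # no more pages in this segment
        kwargs["ExclusiveStartKey"] = last_key
```

Each message body would be sent to the queue by the planning function. A worker invocation would then parse the message it receives and call scan_segment with those two numbers, so no worker needs to know about any other segment.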

Read the Performance Efficiency whitepaper

Cost Optimization

Lambda reads DynamoDB items together in a batch rather than as individual GetItem requests. As a result, this Guidance consumes fewer read capacity units. By lowering the amount of work spent on tasks like initializing connections, the use of batches reduces compute time and the number of Lambda invocations, lowering your compute costs. Additionally, OpenSearch Service batch operations are efficient, helping you reduce the overall cost of compute resources.
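
The read capacity difference can be estimated with a small Python sketch. It assumes DynamoDB's documented rounding rules: an individual read rounds each item up to the next 4 KB, a scanned page rounds the combined size of its items up to the next 4 KB, and eventually consistent reads cost half a unit per 4 KB. The function names are hypothetical, and the model is simplified: it treats the items as a single page and ignores the per-page size limit.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def get_item_rcus(item_sizes, consistent=False):
    """Estimate units for reading each item with its own request."""
    per_unit = 1.0 if consistent else 0.5
    return sum(math.ceil(size / READ_UNIT_BYTES) * per_unit for size in item_sizes)


def scan_page_rcus(item_sizes, consistent=False):
    """Estimate units for reading the same items as one scanned page."""
    per_unit = 1.0 if consistent else 0.5
    return math.ceil(sum(item_sizes) / READ_UNIT_BYTES) * per_unit
```

For example, with 1,000 items of 500 bytes each and eventually consistent reads, get_item_rcus([500] * 1000) gives 500.0, while scan_page_rcus([500] * 1000) gives 61.5, because small items no longer each round up to a full 4 KB.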

Read the Cost Optimization whitepaper

Sustainability

Lambda only invokes functions when data needs to be moved into OpenSearch Service and does not run while idle. This helps you maximize your utilization of compute resources. Additionally, as a serverless, managed service, DynamoDB helps reduce inefficiencies and decrease the total power consumed by your workloads.

Read the Sustainability whitepaper

Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service is now available

This blog post demonstrates how to get started with Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service.