

# DynamoDB zero-ETL integration with Amazon OpenSearch Service

Amazon DynamoDB offers a zero-ETL integration with Amazon OpenSearch Service through the **DynamoDB plugin for OpenSearch Ingestion**. Amazon OpenSearch Ingestion offers a fully managed, no-code experience for ingesting data into Amazon OpenSearch Service. 

With the DynamoDB plugin for OpenSearch Ingestion, you can use one or more DynamoDB tables as a source for ingestion to one or more OpenSearch Service indexes. You can browse and configure your OpenSearch Ingestion pipelines with DynamoDB as a source from either OpenSearch Ingestion or DynamoDB Integrations in the AWS Management Console.
+ Get started with OpenSearch Ingestion by following along in the [OpenSearch Ingestion getting started guide](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-getting-started-tutorials.html).
+ Learn about the prerequisites and all the configuration options for the DynamoDB plugin at [DynamoDB plugin for OpenSearch Ingestion documentation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/configure-client-ddb.html).

## How it works

The plugin uses [DynamoDB export to Amazon S3](S3DataExport.HowItWorks.md) to create an initial snapshot to load into OpenSearch. After the snapshot has been loaded, the plugin uses DynamoDB Streams to replicate any further changes in near real time. Every item is processed as an event in OpenSearch Ingestion and can be modified with processor plugins. You can drop attributes or create composite attributes and send them to different indexes through routes.
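The flow above can be sketched as a pipeline configuration. This is a minimal, hypothetical example — the table ARN, bucket, role, and domain endpoint are placeholders, and most options are omitted; see the plugin documentation for the full reference.

```yaml
version: "2"
dynamodb-pipeline:
  source:
    dynamodb:
      tables:
        - table_arn: "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable"
          # Initial snapshot, loaded through DynamoDB export to Amazon S3 (requires PITR)
          export:
            s3_bucket: "my-export-bucket"
            s3_region: "us-east-1"
          # Ongoing replication in near real time (requires DynamoDB Streams)
          stream:
            start_position: "LATEST"
      aws:
        sts_role_arn: "arn:aws:iam::123456789012:role/my-pipeline-role"
        region: "us-east-1"
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "my-index"
```

Excluding the `export` block creates a stream-only pipeline with no initial snapshot; excluding the `stream` block creates a snapshot-only load.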

You must have [point-in-time recovery (PITR)](Point-in-time-recovery.md) enabled to use export to Amazon S3. You must also have [DynamoDB Streams](streamsmain.md) enabled (with the **New and old images** view type selected) for the pipeline to replicate changes. It's possible to create a pipeline without taking an initial snapshot by excluding the export settings.

You can also create a pipeline with only a snapshot and no ongoing updates by excluding the stream settings. The plugin does not consume read or write throughput on your table, so it is safe to use without affecting your production traffic. However, there are limits on the number of parallel consumers of a stream that you should consider before creating this or other integrations. For other considerations, see [Best practices for integrating with DynamoDB](bp-integration.md).

For simple pipelines, a single OpenSearch Compute Unit (OCU) can process about 1 MB per second of writes. This is the equivalent of about 1,000 write capacity units (WCUs). Depending on your pipeline's complexity and other factors, you might achieve more or less than this.

OpenSearch Ingestion supports a dead-letter queue (DLQ) for events that cause unrecoverable errors. Additionally, the pipeline can resume from where it left off without user intervention even if there's an interruption of service with either DynamoDB, the pipeline, or Amazon OpenSearch Service. 

If an interruption lasts longer than 24 hours, updates can be lost, because DynamoDB Streams retains records for only 24 hours. The pipeline continues to process the updates that are still available when availability is restored. However, you would need to rebuild the index from a fresh export to fix any irregularities caused by the dropped events, unless those events landed in the dead-letter queue.

For all the settings and details for the plugin, see [OpenSearch Ingestion DynamoDB plugin documentation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/configure-client-ddb.html).

## Integrated create experience through the console

DynamoDB and OpenSearch Service have an integrated experience in the AWS Management Console, which streamlines the getting started process. When you go through these steps, the service will automatically select the DynamoDB blueprint and add the appropriate DynamoDB information for you.

To create an integration, follow along in the [OpenSearch Ingestion getting started guide](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-get-started.html). When you get to [Step 3: Create a pipeline](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-get-started.html#osis-get-started-pipeline), replace Steps 1 and 2 with the following steps:

1. Navigate to the DynamoDB console.

1. In the left-hand navigation pane, choose **Integrations**.

1. Select the DynamoDB table that you'd like to replicate to OpenSearch.

1. Choose **Create**.

From here, you can continue on with the rest of the tutorial.

## Next steps

For a better understanding of how DynamoDB integrates with OpenSearch Service, see the following:
+ [Getting started with Amazon OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-getting-started-tutorials.html)
+ [DynamoDB plugin configuration and requirements](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/configure-client-ddb.html)

# Handling breaking changes to your index

OpenSearch can dynamically add new attributes to your index. However, after your mapping template has been set for a given key, you’ll need to take additional action to change it. Additionally, if your change requires you to reprocess all the data in your DynamoDB table, you’ll need to take steps to initiate a fresh export.

**Note**  
In all these options, you might still run into issues if your DynamoDB table has type conflicts with the mapping template you’ve specified. Ensure that you have a dead-letter queue (DLQ) enabled, even in development. This makes it easier to understand what is wrong with a record that causes a conflict when it's indexed into OpenSearch Service.
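As a sketch, a DLQ can be attached to the OpenSearch sink in the pipeline configuration. The endpoint, bucket, prefix, and role here are placeholders:

```yaml
sink:
  - opensearch:
      hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
      index: "my-index"
      # Records that fail to index are written here instead of being dropped
      dlq:
        s3:
          bucket: "my-dlq-bucket"
          key_path_prefix: "dynamodb-pipeline/dlq"
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::123456789012:role/my-pipeline-role"
```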

**Topics**
+ [How it works](#opensearch-for-dynamodb-change-index-howitworks)
+ [Delete your index and reset the pipeline (pipeline-centric option)](#opensearch-for-dynamodb-change-index-delete)
+ [Recreate your index and reset the pipeline (index-centric option)](#opensearch-for-dynamodb-change-index-recreate)
+ [Create a new index and sink (online option)](#opensearch-for-dynamodb-change-index-create)
+ [Best practices for avoiding and debugging type conflicts](#opensearch-for-dynamodb-change-index-bp)

## How it works

Here's a quick overview of the actions taken when handling breaking changes to your index. See the step-by-step procedures in the sections that follow.
+ **Stop and start the pipeline**: This option resets the pipeline’s state, and the pipeline restarts with a new full export. It is non-destructive, so it does **not** delete your index or any data in DynamoDB. If you don’t create a fresh index before you do this, you might see a high number of version-conflict errors, because the export tries to insert documents that are older than the current `_version` in the index. You can safely ignore these errors. You are not billed for the pipeline while it is stopped.
+ **Update the pipeline**: This option updates the configuration in the pipeline with a [blue/green](https://docs.aws.amazon.com/whitepapers/latest/overview-deployment-options/bluegreen-deployments.html) approach, without losing any state. If you make significant changes to your pipeline (such as adding new routes, indexes, or keys to existing indexes), you might need to do a full reset of the pipeline and recreate your index. This option does **not** perform a full export.
+ **Delete and recreate the index**: This option removes your data and mapping settings on your index. You should do this before making any breaking changes to your mappings. It will break any applications that rely on the index until the index is recreated and synchronized. Deleting the index does **not** initiate a fresh export. You should delete your index only after you’ve updated your pipeline. Otherwise, your index might be recreated before you update your settings.

## Delete your index and reset the pipeline (pipeline-centric option)

This method is often the fastest option if you’re still in development. You’ll delete your index in OpenSearch Service, and then [stop and start](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline--stop-start.html) your pipeline to initiate a fresh export of all your data. This ensures that there are no mapping template conflicts with existing indexes, and no loss of data from an incomplete processed table.

1. Stop the pipeline either through the AWS Management Console, or by using the `StopPipeline` API operation with the AWS CLI or an SDK.

1. [Update your pipeline configuration](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/update-pipeline.html) with your new changes.

1. Delete your index in OpenSearch Service, either through a REST API call or OpenSearch Dashboards.

1. Start the pipeline either through the console, or by using the `StartPipeline` API operation with the AWS CLI or an SDK.
**Note**  
This initiates a fresh full export, which will incur additional costs.

1. Monitor for any unexpected issues while the fresh export is generated to create the new index.

1. Confirm that the index matches your expectations in OpenSearch Service.

After the export has completed and the pipeline resumes reading from the stream, your DynamoDB table data will be available in the index.

## Recreate your index and reset the pipeline (index-centric option)

This method works well if you need to do a lot of iterations on the index design in OpenSearch Service before resuming the pipeline from DynamoDB. This can be useful for development when you want to iterate very quickly on your search patterns, and want to avoid waiting on fresh exports to complete between each iteration.

1. Stop the pipeline either through the AWS Management Console, or by calling the `StopPipeline` API operation with the AWS CLI or an SDK.

1. Delete and recreate your index in OpenSearch with the mapping template you want to use. You can manually insert some sample data to confirm that your searches are working as intended. If your sample data might conflict with any data from DynamoDB, be sure to delete it before moving on to the next step.

1. If you have an indexing template in your pipeline, remove it or replace it with the one you’ve created already in OpenSearch Service. Ensure that the name of your index matches the name in the pipeline.

1. Start the pipeline either through the console, or by calling the `StartPipeline` API operation with the AWS CLI or an SDK.
**Note**  
This will initiate a fresh full export, which will incur additional costs.

1. Monitor for any unexpected issues while the fresh export is generated to create the new index.

After the export has completed and the pipeline resumes reading from the stream, your DynamoDB table data will be available in the index.

## Create a new index and sink (online option)

This method works well if you need to update your mapping template but are currently using your index in production. This creates a brand new index, which you’ll need to move your application over to after it’s synchronized and validated.

**Note**  
This will create another consumer on the stream. This can be an issue if you also have other consumers like AWS Lambda or global tables. You might need to pause updates to your existing pipeline to create capacity to load the new index.

1. [Create a new pipeline](OpenSearchIngestionForDynamoDB.md#opensearch-for-dynamodb-console-create) with new settings and a different index name.

1. Monitor the new index for any unexpected issues.

1. Swap the application over to the new index.

1. Stop and delete the old pipeline after validating that everything is working correctly.

## Best practices for avoiding and debugging type conflicts
+ Always use a dead-letter queue (DLQ) to make it easier to debug when there are type conflicts.
+ Always use an index template with mappings and set `include_keys`. While OpenSearch Service dynamically maps new keys, this can cause issues with unexpected behaviors (such as expecting something to be a `GeoPoint`, but it’s created as a `string` or `object`) or errors (such as having a `number` that is a mix of `long` and `float` values).
+ If you need to keep your existing index working in production, you can also replace any of the previous [delete index steps](#opensearch-for-dynamodb-change-index-delete) with just renaming your index in your pipeline config file. This creates a brand new index. Your application will then need to be updated to point to the new index after it's complete.
+ If you fix a type conversion issue with a processor, you can deploy the fix with `UpdatePipeline`. You’ll then need to stop and start the pipeline, or [process your dead-letter queues](https://opensearch.org/docs/latest/data-prepper/pipelines/dlq/), to reprocess any previously skipped documents that had errors.
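As an illustration of the mapping-template advice above, the sink can pin the types of the keys it indexes instead of relying on dynamic mapping. The field names and types here are hypothetical, and the exact placement of `include_keys` can vary by plugin version:

```yaml
sink:
  - opensearch:
      hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
      index: "my-index"
      # Only index the attributes you search on
      include_keys: ["location", "price"]
      # Declare mappings up front so types can't drift
      template_type: index_template
      template_content: |
        {
          "template": {
            "mappings": {
              "properties": {
                "location": { "type": "geo_point" },
                "price":    { "type": "float" }
              }
            }
          }
        }
```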

# Best practices for working with DynamoDB zero-ETL integration and OpenSearch Service

DynamoDB has a [DynamoDB zero-ETL integration with Amazon OpenSearch Service](OpenSearchIngestionForDynamoDB.md). For more information, see the [DynamoDB plugin for OpenSearch Ingestion](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/configure-client-ddb.html) and [specific best practices for Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html).

## Configuration
+ Only index data that you need to perform searches on. Always use a mapping template (`template_type: index_template` and `template_content`) and `include_keys` to implement this.
+ Monitor your logs for errors that are related to type conflicts. OpenSearch Service expects all values for a given key to have the same type. It generates exceptions if there's a mismatch. If you encounter one of these errors, you can add a processor to ensure that a given key always has the same type.
+ Generally use the `primary_key` metadata value for the `document_id` value. In OpenSearch Service, the document ID is the equivalent of the primary key in DynamoDB. Using the primary key will make it easy to find your document and ensure that updates are consistently replicated to it without conflicts. 

  You can use the helper function `getMetadata` to get your primary key (for example, `document_id: "${getMetadata('primary_key')}"`). If you're using a composite primary key, the helper function will concatenate them together for you.
+ In general, use the `opensearch_action` metadata value for the `action` setting. This will ensure that updates are replicated in such a way that the data in OpenSearch Service matches the latest state in DynamoDB. 

  You can use the helper function `getMetadata` to get the action (for example, `action: "${getMetadata('opensearch_action')}"`). You can also get the stream event type through `dynamodb_event_name` for use cases like filtering. However, you should typically not use it for the `action` setting.

## Observability
+ Always use a dead-letter queue (DLQ) on your OpenSearch sinks to handle dropped events. DynamoDB is generally less structured than OpenSearch Service, and it's always possible for something unexpected to happen. With a dead-letter queue, you can recover individual events, and even automate the recovery process. This will help you to avoid needing to rebuild your entire index.
+ Always set alerts so that you're notified when replication delay exceeds an expected threshold. One minute is typically a safe threshold without the alert being too noisy, but this can vary depending on how spiky your write traffic is and your OpenSearch Compute Unit (OCU) settings on the pipeline.

  If your replication delay goes over 24 hours, your stream will start to drop events, and you'll have accuracy issues unless you do a full rebuild of your index from scratch.

## Scaling
+ Use auto scaling for pipelines to help scale up or down the OCUs to best fit the workload.
+ For provisioned throughput tables without auto scaling, we recommend setting OCUs based on your write capacity units (WCUs) divided by 1000. Set the minimum to 1 OCU below that amount (but at least 1), and set the maximum to at least 1 OCU above that amount.
  + **Formula:**

    ```
    OCU_minimum = GREATEST((table_WCU / 1000) - 1, 1)
    OCU_maximum = (table_WCU / 1000) + 1
    ```
  + **Example:** Your table has 25,000 WCUs provisioned. Your pipeline's OCUs should be set with a minimum of 24 (25000/1000 - 1) and a maximum of at least 26 (25000/1000 + 1).
+ For provisioned throughput tables with auto scaling, we recommend setting OCUs based on your minimum and maximum WCUs, divided by 1000. Set the minimum to 1 OCU below the minimum from DynamoDB, and set the maximum to at least 1 OCU above the maximum from DynamoDB.
  + **Formula:**

    ```
    OCU_minimum = GREATEST((table_minimum_WCU / 1000) - 1, 1)
    OCU_maximum = (table_maximum_WCU / 1000) + 1
    ```
  + **Example:** Your table has an auto scaling policy with a minimum of 8,000 and a maximum of 14,000. Your pipeline's OCUs should be set with a minimum of 7 (8000/1000 - 1) and a maximum of 15 (14000/1000 + 1).
+ For on-demand throughput tables, we recommend setting OCUs based on your typical peak and valley for write request units per second. You might need to average over a longer time period, depending on the aggregation that's available to you. Set the minimum to 1 OCU below the minimum from DynamoDB, and set the maximum to at least 1 OCU above the maximum from DynamoDB.
  + **Formula:**

    ```
    # Assuming we have writes aggregated at the minute level
    OCU_minimum = GREATEST((min(table_writes_1min) / (60 * 1000)) - 1, 1)
    OCU_maximum = (max(table_writes_1min) / (60 * 1000)) + 1
    ```
  + **Example:** Your table has an average valley of 300 write request units per second and an average peak of 4,300. Your pipeline's OCUs should be set with a minimum of 1 (300/1000 - 1, but at least 1) and a maximum of 5 (4300/1000 + 1).
+ Follow best practices on scaling your destination OpenSearch Service indexes. If your indexes are under-scaled, it will slow down ingestion from DynamoDB, and might cause delays.
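The three formulas above can be sketched as one helper. `recommended_ocus` is a hypothetical name, and the result is only a starting point for tuning; for on-demand tables, pass your typical valley and peak write request units per second multiplied out to WCU-equivalents first.

```python
def recommended_ocus(wcu_min, wcu_max=None):
    """Suggested OCU auto scaling bounds from a table's WCUs (~1,000 WCU per OCU).

    wcu_min -- provisioned WCUs, or the minimum of the table's auto scaling policy
    wcu_max -- maximum of the auto scaling policy (defaults to wcu_min)
    """
    if wcu_max is None:
        wcu_max = wcu_min
    ocu_min = max(wcu_min // 1000 - 1, 1)   # GREATEST((WCU / 1000) - 1, 1)
    ocu_max = wcu_max // 1000 + 1           # (WCU / 1000) + 1
    return ocu_min, ocu_max

# Provisioned table without auto scaling: 25,000 WCUs
print(recommended_ocus(25_000))          # (24, 26)

# Provisioned table with auto scaling between 8,000 and 14,000 WCUs
print(recommended_ocus(8_000, 14_000))   # (7, 15)

# On-demand table: valley of 300 and peak of 4,300 write request units per second
print(recommended_ocus(300, 4_300))      # (1, 5)
```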

**Note**  
[`GREATEST`](https://docs.aws.amazon.com/redshift/latest/dg/r_GREATEST_LEAST.html) is a SQL function that, given a set of arguments, returns the argument with the greatest value.