Use the solution - Migration Assistant for Amazon OpenSearch Service

Use the solution

This section describes how to run a migration using the Migration Assistant for Amazon OpenSearch Service solution after you have deployed it on Amazon EKS. The day-to-day operator interface is the Workflow CLI, which runs in the Migration Console pod (migration-console-0) on Amazon EKS. The supporting console CLI provides component-level inspection and ad-hoc operations during validation and troubleshooting.

Getting started with the Workflow CLI

This sequence is the shortest safe path to your first migration to Amazon OpenSearch Service or Amazon OpenSearch Serverless: load the right schema for your version, prove connectivity, run a small pilot, and only then run the full workflow.

Before you start

Make sure all of the following are true:

  • Migration Assistant is deployed on Amazon EKS. See Deploy the solution.

  • The source cluster and the Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection are reachable from the Amazon EKS cluster.

  • Snapshot storage is ready in Amazon S3 if you plan to run backfill.

  • Any basic-auth Kubernetes secrets you need can be created in the ma namespace.

Step 1: Access the Migration Console

kubectl exec -it migration-console-0 -n ma -- /bin/bash

If you are accessing Amazon EKS from a new shell, refresh your kubeconfig first:

aws eks update-kubeconfig --region <REGION> --name migration-eks-cluster-<STAGE>-<REGION>

Step 2: Confirm the installed version

console --version

This matters because the workflow schema can change between releases, so the sample you load and the fields you edit must match the installed version.

Step 3: Load the version-matched sample

workflow configure sample --load

This gives you the safest starting point for your installed release.

Step 4: Edit the workflow configuration

workflow configure edit

Fill in the fields that describe your migration:

  • Source endpoint, version, and authentication.

  • Target endpoint and authentication. For Amazon OpenSearch Serverless, set service: aoss in the SigV4 authConfig. For Amazon OpenSearch Service, set service: es.

  • Snapshot repository details if you are running backfill.

  • The migration pattern: backfill only, capture and replay only, or both.

Note

Do not start by editing every possible field. Start with the minimum required fields for your path.

Target configuration for Amazon OpenSearch Serverless

When the target is an Amazon OpenSearch Serverless collection, set the target cluster like this:

{
  "targetClusters": {
    "target": {
      "endpoint": "https://<collection-id>.<region>.aoss.amazonaws.com",
      "authConfig": {
        "sigv4": {
          "region": "<region>",
          "service": "aoss"
        }
      }
    }
  }
}

The migration IAM role created by the Amazon EKS deployment, named <eks-cluster-name>-migrations-role, must also be a principal in your collection’s data access policy. Add it with both collection-level and index-level permissions before running the workflow.
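
As a rough illustration of the data access policy requirement above, the following Python sketch builds a policy document that names the migration role as a principal with both a collection-level and an index-level rule. The account ID, cluster name, collection name, and the specific aoss: permissions listed are assumptions chosen for illustration, not values from this guide; confirm the permission set your migration actually needs before applying anything like this.

```python
import json


def build_data_access_policy(account_id, eks_cluster_name, collection):
    """Build an OpenSearch Serverless data access policy for the migration role.

    The role name follows the <eks-cluster-name>-migrations-role pattern from
    this guide. The permission lists below are illustrative assumptions.
    """
    role_arn = f"arn:aws:iam::{account_id}:role/{eks_cluster_name}-migrations-role"
    return [
        {
            "Description": "Migration Assistant access (illustrative)",
            "Rules": [
                {
                    # Collection-level rule
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection}"],
                    "Permission": [
                        "aoss:CreateCollectionItems",
                        "aoss:DescribeCollectionItems",
                        "aoss:UpdateCollectionItems",
                    ],
                },
                {
                    # Index-level rule covering every index in the collection
                    "ResourceType": "index",
                    "Resource": [f"index/{collection}/*"],
                    "Permission": [
                        "aoss:CreateIndex",
                        "aoss:DescribeIndex",
                        "aoss:UpdateIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                    ],
                },
            ],
            "Principal": [role_arn],
        }
    ]


# Hypothetical account, cluster, and collection names.
policy = build_data_access_policy("123456789012", "my-eks-cluster", "my-collection")
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document you would supply as the policy body when creating or updating a data access policy for the collection.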

Target configuration for Amazon OpenSearch Service

When the target is an Amazon OpenSearch Service domain:

{
  "targetClusters": {
    "target": {
      "endpoint": "https://<domain-endpoint>",
      "authConfig": {
        "sigv4": {
          "region": "<region>",
          "service": "es"
        }
      }
    }
  }
}

If your domain has fine-grained access control (FGAC) enabled, map the migration IAM role to a security role on the domain (typically all_access during migration, then scoped down). See Troubleshooting.
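
The two target shapes above differ only in the endpoint and the SigV4 service value, so a small helper can generate either one and catch an obvious mismatch. This is a minimal sketch: the function name is hypothetical, and the endpoint check is a heuristic based only on the collection endpoint format shown earlier in this section.

```python
import json
from urllib.parse import urlparse

# The two SigV4 service values named in this guide.
VALID_SERVICES = {"aoss", "es"}


def build_target_cluster(endpoint, region, service):
    """Return a targetClusters block using SigV4 authentication."""
    if service not in VALID_SERVICES:
        raise ValueError(f"service must be one of {sorted(VALID_SERVICES)}, got {service!r}")
    host = urlparse(endpoint).hostname or ""
    # Heuristic: collection endpoints shown in this guide end in .aoss.amazonaws.com
    if host.endswith(".aoss.amazonaws.com") and service != "aoss":
        raise ValueError("a Serverless collection endpoint needs service 'aoss'")
    return {
        "targetClusters": {
            "target": {
                "endpoint": endpoint,
                "authConfig": {"sigv4": {"region": region, "service": service}},
            }
        }
    }


# Hypothetical collection ID and Region.
config = build_target_cluster(
    "https://abc123.us-east-1.aoss.amazonaws.com", "us-east-1", "aoss"
)
print(json.dumps(config, indent=2))
```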

Step 5: Create Kubernetes secrets if you use basic authentication

kubectl create secret generic source-credentials \
  --from-literal=username=<SOURCE_USER> \
  --from-literal=password=<SOURCE_PASSWORD> \
  -n ma

kubectl create secret generic target-credentials \
  --from-literal=username=<TARGET_USER> \
  --from-literal=password=<TARGET_PASSWORD> \
  -n ma

Reference those secret names in authConfig.basic.secretName in your workflow configuration.
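
A minimal sketch of that reference, assuming the block is nested exactly as the authConfig.basic.secretName path suggests. The helper name is hypothetical. The name check reflects the general Kubernetes rule that Secret names must be valid DNS subdomain names; it is not something specific to Migration Assistant.

```python
import re

# Kubernetes object names such as Secret names must be DNS subdomain names:
# lowercase alphanumerics, '-' and '.', starting and ending with an alphanumeric,
# and at most 253 characters long.
_DNS_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def basic_auth_config(secret_name):
    """Return an authConfig block that points at a basic-auth Kubernetes secret."""
    if len(secret_name) > 253 or not _DNS_SUBDOMAIN.fullmatch(secret_name):
        raise ValueError(f"{secret_name!r} is not a valid Kubernetes secret name")
    return {"authConfig": {"basic": {"secretName": secret_name}}}


source_auth = basic_auth_config("source-credentials")
target_auth = basic_auth_config("target-credentials")
print(source_auth)
print(target_auth)
```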

Step 6: Verify connectivity before submitting a workflow

console clusters connection-check

The check runs against both source and target by default. To narrow it to one side:

console clusters connection-check --cluster source
console clusters connection-check --cluster target

For a direct API check:

console clusters curl source /
console clusters curl target /

If these checks fail, stop and fix connectivity or authentication first. Do not start a workflow yet.
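
If you script this gate, the idea can be sketched in Python as below. The sketch assumes that console clusters connection-check exits with a non-zero status when the check fails, which this guide does not state; verify that behavior on your release before relying on it. The function names are hypothetical, and the command runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess


def run_command(argv):
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def failed_connection_checks(run=run_command):
    """Return the clusters whose connection check did not exit with status 0.

    Assumption: a failed check produces a non-zero exit code.
    """
    failed = []
    for cluster in ("source", "target"):
        code = run(["console", "clusters", "connection-check", "--cluster", cluster])
        if code != 0:
            failed.append(cluster)
    return failed


def safe_to_submit(run=run_command):
    """True only when both the source and target checks passed."""
    return not failed_connection_checks(run)
```

Inside the Migration Console pod, calling safe_to_submit() with the default runner executes the real commands; only proceed to workflow submit when it returns True.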

Step 7: Verify AWS identity if you use SigV4

If your source or target uses Amazon OpenSearch Service or Amazon OpenSearch Serverless, verify from the Migration Console pod that it has a working AWS identity:

aws sts get-caller-identity

If console clusters connection-check works in the Migration Console but the workflow later fails with HTTP 401 or 403, verify that the Argo workflow executor pods are using the IRSA-backed argo-workflow-executor service account. On Amazon EKS, both the Migration Console pod and the workflow executor pods get IRSA-backed identity automatically through the bootstrap script.
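
To check which role the pod actually assumed, you can parse the JSON that aws sts get-caller-identity prints. The sketch below extracts the role name from an assumed-role ARN. The sample output is made up, and the expectation that the pod assumes the <eks-cluster-name>-migrations-role role is an assumption; your deployment may use a different role for the console.

```python
import json


def assumed_role_name(caller_identity_json):
    """Extract the role name from `aws sts get-caller-identity` JSON output.

    Returns None when the caller is not an assumed role (for example an IAM user).
    """
    arn = json.loads(caller_identity_json)["Arn"]
    # Assumed-role ARNs look like:
    #   arn:aws:sts::<account-id>:assumed-role/<role-name>/<session-name>
    resource = arn.split(":", 5)[5]
    parts = resource.split("/")
    if parts[0] != "assumed-role" or len(parts) < 3:
        return None
    return parts[1]


# Made-up output for illustration only.
sample = json.dumps({
    "UserId": "AROAEXAMPLEID:session-1",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/my-eks-cluster-migrations-role/session-1",
})

role = assumed_role_name(sample)
print(role)
```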

Step 8: Run a pilot migration first

Use a small allowlist or a representative subset before you attempt the full migration. This is the easiest way to catch mapping issues, authentication issues, and throughput problems early.

workflow submit
workflow manage

Use workflow manage to watch the run and approve any gated steps.

Step 9: Validate the pilot

Check counts and basic behavior on the Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection before you expand scope:

console clusters cat-indices
console clusters curl target /<index>/_count
console clusters curl target "/<index>/_search?size=5&pretty"

If you are migrating applications with live traffic, also validate representative queries against the target.
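
One way to make the count check repeatable is to compare per-index document counts from source and target. The sketch below parses the body of a _count response, which contains a count field, and reports indexes whose counts differ. The index names and numbers are invented, and equal counts are only a coarse signal: they do not prove the documents themselves match.

```python
import json


def parse_count(response_body):
    """Return the document count from the JSON body of a _count response."""
    return int(json.loads(response_body)["count"])


def count_mismatches(source_counts, target_counts):
    """Map each index whose counts differ to (source_count, target_count).

    An index missing on the target is reported with a target count of None.
    """
    mismatches = {}
    for index, expected in source_counts.items():
        actual = target_counts.get(index)
        if actual != expected:
            mismatches[index] = (expected, actual)
    return mismatches


# Invented response bodies and counts for illustration.
source = {"logs-2024": parse_count('{"count": 1200, "_shards": {"total": 1}}'), "users": 50}
target = {"logs-2024": parse_count('{"count": 1187, "_shards": {"total": 1}}'), "users": 50}

print(count_mismatches(source, target))
```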

Step 10: Run the real migration

After the pilot succeeds, widen the configuration to the full index set and submit again:

workflow configure edit
workflow submit
workflow manage

Step 11: Use logs if anything fails

workflow status
workflow output
workflow output --follow

workflow submit automatically stops and replaces an existing workflow with the same name, so you do not need to manually clean up between runs. If a previous run left orphaned migration custom resources, use workflow reset instead of deleting Argo workflows directly:

workflow reset                                            # interactive: lists custom resources and prompts before deleting
workflow reset migration-foo                              # delete a specific resource by name
workflow reset --all                                      # delete everything (capture proxies are protected)
workflow reset --all --include-proxies --delete-storage   # also remove capture proxies and Apache Kafka PVCs

Core commands

The workflow CLI orchestrates a full migration; the console CLI inspects or manually drives a single component during validation and troubleshooting.

Workflow commands

Command                                 Why you use it

workflow configure sample               Shows the sample schema for your installed version
workflow configure sample --load        Loads that sample as your starting point
workflow configure edit                 Opens the workflow config in your editor ($EDITOR, defaults to vi)
workflow configure view                 Prints the current config
workflow configure clear                Clears the current config and lets you start over
workflow submit                         Starts the migration workflow (auto-stops and replaces an existing one with the same name)
workflow submit --wait --timeout 300    Submits and blocks until the workflow completes or the timeout is reached
workflow manage                         Primary day-to-day interface for monitoring, approvals, and logs (interactive TUI)
workflow status                         Shows the current workflow tree in a non-interactive form
workflow status --all                   Shows running and completed workflows
workflow output                         Shows logs across workflow pods
workflow output --follow                Streams logs live
workflow approve <PATTERN>              Approves pending gates that match exact names or globs
workflow reset                          Lists migration custom resources and lets you delete them safely

Console commands

The console CLI groups operations by component:

Command                                                               Why you use it

console --version                                                     Confirms which schema and behavior your Migration Console is running
console clusters connection-check                                     Verifies the Migration Console can reach and authenticate to source and target
console clusters cat-indices [--cluster source|target|proxy]          Lists indexes on one or both clusters
console clusters curl source /_cat/indices?v                          Issues a direct API request against the named cluster
console clusters clear-indices --cluster target --acknowledge-risk    Destructive: deletes all indexes on the named cluster
console snapshot {create|status|delete|unregister-repo}               Manage snapshots in Amazon S3
console metadata {migrate|evaluate}                                   Run or preview metadata migration outside the workflow
console backfill {describe|start|pause|stop|scale|status}             Inspect or drive RFS backfill
console replay {describe|start|stop|scale|status}                     Inspect or drive Traffic Replayer
console metrics {list|get-data}                                       Inspect Migration Assistant metrics
console kafka {create-topic|list-topics|delete-topic|…}               Inspect Strimzi-managed Apache Kafka used by capture and replay
console tuples                                                        Inspect captured request/response tuples for replay validation

Note

The workflow path drives metadata, backfill, and replay automatically. Reach for the equivalent console command only when you want to inspect state or work around a specific failure (for example, to call console snapshot status while a long-running snapshot is in progress).

Approval gates

Not every migration step should run without human review. Approval gates let the workflow stop at meaningful checkpoints — typically transitions after metadata work, backfill milestones, and cutover-sensitive steps — so you can validate before continuing.

workflow manage
workflow approve <STEP_NAME>
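
Because workflow approve accepts exact names or globs, it helps to preview which pending gates a pattern would select before you run it. The sketch below uses Python's fnmatch module to illustrate shell-style glob matching. The gate names are hypothetical, and the CLI's own matching rules may differ from fnmatch, so treat this as an illustration of the idea rather than a description of the tool.

```python
from fnmatch import fnmatchcase


def matching_gates(pending_gates, pattern):
    """Return the pending gate names that a shell-style glob pattern selects."""
    return [gate for gate in pending_gates if fnmatchcase(gate, pattern)]


# Hypothetical gate names for illustration.
pending = [
    "metadata-migrate-approval",
    "backfill-start-approval",
    "replay-start-approval",
]

print(matching_gates(pending, "backfill-*"))
print(matching_gates(pending, "*-approval"))
print(matching_gates(pending, "replay-start-approval"))
```

When you pass a glob on the command line, quote it so that the shell does not expand it against local file names first.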

Status symbols

Symbol Meaning

Succeeded

Running

Pending

Failed

Waiting for approval

Migration scenarios

Migration Assistant supports three migration patterns. Pick the one that matches your downtime tolerance.

Scenario 1: Backfill only

Best when you can tolerate a brief write freeze, or when writes can be paused and replayed from an external queue.

Snapshot source → Migrate metadata → Backfill documents → Verify → Switch traffic

Scenario 2: Capture and Replay only

Best when the data is small enough that live replay alone can synchronize the target on Amazon OpenSearch Service or Amazon OpenSearch Serverless, or when you want to replay traffic against multiple targets to compare results.

Reroute traffic to capture proxy → Migrate metadata → Replay traffic → Verify → Switch traffic to target

Scenario 3: Backfill + Capture and Replay (zero-downtime)

The most comprehensive approach. Capture begins first so no writes are lost, then backfill brings over historical data, then replay catches the target up to real-time.

Reroute traffic to capture proxy → Snapshot source → Migrate metadata → Backfill documents → Replay captured traffic → Verify → Switch traffic to target
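
The three sequences above share a common skeleton: capture-related steps wrap around backfill-related steps. The following sketch derives the ordered steps from two choices, whether you run backfill and whether you run capture and replay. The function name is hypothetical and the step labels are lightly normalized from the sequences above.

```python
def migration_steps(backfill, capture_replay):
    """Return the ordered high-level steps for the chosen migration pattern."""
    if not (backfill or capture_replay):
        raise ValueError("choose backfill, capture and replay, or both")
    steps = []
    if capture_replay:
        # Capture starts first so that no writes are lost.
        steps.append("Reroute traffic to capture proxy")
    if backfill:
        steps.append("Snapshot source")
    steps.append("Migrate metadata")
    if backfill:
        steps.append("Backfill documents")
    if capture_replay:
        steps.append("Replay captured traffic")
    steps.append("Verify")
    steps.append("Switch traffic to target")
    return steps


for step in migration_steps(backfill=True, capture_replay=True):
    print(step)
```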

Backfill tuning

Useful Reindex-from-Snapshot settings include:

  • podReplicas — number of RFS pods running in parallel (one shard per pod).

  • maxConnections — bulk-indexer concurrency to the target.

  • documentsPerBulkRequest — bulk batch size.

  • maxShardSizeBytes — maximum supported shard size (default 80 GiB). Larger shards must be reduced before backfill (force-merge or split).

  • initialLeaseDuration — ISO-8601 duration each worker holds a shard lease before re-acquisition (default PT10M).

  • allowedDocExceptionTypes — list of exception class names from the target’s response that should be counted as success rather than retried.

  • allowLooseVersionMatching — bypass the strict source/target version compatibility check.

Because RFS reads from snapshot storage in Amazon S3, increasing worker count does not add live read load to the source cluster. It mainly changes how quickly the Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection is driven.
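
Two of these settings lend themselves to a quick pre-check: shards above maxShardSizeBytes need attention before backfill, and with one shard per pod there is little point in requesting more pods than there are shards. The sketch below is an illustration under those readings. The shard names and sizes are invented, and the idea that extra pods beyond the shard count sit idle is an inference from the "one shard per pod" description, not a documented guarantee.

```python
GIB = 1024 ** 3
DEFAULT_MAX_SHARD_SIZE_BYTES = 80 * GIB  # default stated in this guide


def oversized_shards(shard_sizes_bytes, max_shard_size_bytes=DEFAULT_MAX_SHARD_SIZE_BYTES):
    """Return the names of shards larger than the configured maximum, sorted."""
    return sorted(
        name for name, size in shard_sizes_bytes.items() if size > max_shard_size_bytes
    )


def useful_pod_replicas(requested_replicas, total_shards):
    """Cap the requested replica count at the number of shards to process."""
    return min(requested_replicas, total_shards)


# Invented shard sizes for illustration.
shards = {
    "logs-2024[0]": 95 * GIB,
    "logs-2024[1]": 40 * GIB,
    "users[0]": 2 * GIB,
}

print(oversized_shards(shards))
print(useful_pod_replicas(requested_replicas=10, total_shards=len(shards)))
```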

Replay tuning

Useful Traffic Replayer settings include:

  • podReplicas — number of replayer pods.

  • speedupFactor: multiplier applied to the captured traffic timeline (default 1.1). A value of 2.0 replays traffic at twice its original rate.

  • removeAuthHeader — strips the captured Authorization header before replaying. Useful when the captured traffic carries credentials that would not be valid against the target.

  • authHeaderOverride — replaces the captured Authorization header with a static value.

  • dependsOnSnapshotMigrations — ensures replay only starts after backfill completes.

  • nonRetryableDocExceptionTypes — list of exception class names that should be counted as failures but not retried.
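
To get a feel for speedupFactor, it helps to work through the arithmetic. The sketch below assumes the factor is a pure time-compression multiplier on the captured timeline and ignores target throughput limits. Under that assumption, replaying a fixed capture takes its duration divided by the factor, while catching up to live traffic takes the backlog divided by (factor - 1), because new traffic keeps arriving at real-time speed while the backlog drains.

```python
import math


def replay_duration_seconds(captured_seconds, speedup_factor):
    """Time needed to replay a fixed amount of captured traffic."""
    return captured_seconds / speedup_factor


def catch_up_seconds(backlog_seconds, speedup_factor):
    """Time for replay to reach the live edge while new traffic keeps arriving.

    Replay consumes speedup_factor seconds of traffic per second and live traffic
    adds one second per second, so the backlog shrinks at (speedup_factor - 1).
    """
    if speedup_factor <= 1:
        return math.inf  # replay never catches up
    return backlog_seconds / (speedup_factor - 1)


one_hour = 3600
print(replay_duration_seconds(one_hour, 2.0))  # half an hour, in seconds
print(catch_up_seconds(one_hour, 1.1))         # roughly ten hours, in seconds
print(catch_up_seconds(one_hour, 2.0))         # one hour, in seconds
```

Under these assumptions, the default of 1.1 closes a backlog slowly, so a long backfill followed by replay may call for a higher factor if the target can absorb the load.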

Warning

Setting both replayerConfig.removeAuthHeader: true and an authConfig block on the same target is rejected by the schema. Pick one — either rely on the target’s authConfig (the Traffic Replayer applies it for you) or strip the captured header.
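
You can catch this conflict before submitting by checking the two settings together. The sketch below is a hypothetical pre-check, not the schema's own validation: it assumes you pass in the target's configuration block and the matching replayerConfig block as plain dictionaries, and the exact nesting in your workflow configuration may differ.

```python
def check_auth_settings(target_config, replayer_config):
    """Raise ValueError when removeAuthHeader is true and the target has authConfig."""
    removes_header = replayer_config.get("removeAuthHeader") is True
    has_auth_config = "authConfig" in target_config
    if removes_header and has_auth_config:
        raise ValueError(
            "removeAuthHeader: true cannot be combined with an authConfig block "
            "on the same target; keep only one of them"
        )


# Hypothetical configuration fragments for illustration.
target = {
    "endpoint": "https://<domain-endpoint>",
    "authConfig": {"sigv4": {"region": "<region>", "service": "es"}},
}

check_auth_settings(target, {"speedupFactor": 1.1})  # passes silently
try:
    check_auth_settings(target, {"removeAuthHeader": True})
except ValueError as error:
    print(f"Rejected: {error}")
```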

Cutover and rollback

Switching traffic to the Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection is the cutover step. By this point, capture has already protected writes during backfill, replay has caught the target up, and validation is complete.

Before you switch, confirm that:

  • replay has reached the live edge,

  • the Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection is healthy,

  • representative application queries work on the target,

  • the application team is ready to move traffic, and

  • the rollback path is still available.

The exact cutover mechanism depends on your environment, but the principle is always the same:

  1. Stop pointing clients at the capture proxy.

  2. Point clients directly at the Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection.

  3. Watch the target closely during the first production traffic window.

In practice, that usually means updating a DNS record, a load balancer backend, an application connection string, or a service-discovery entry. Keep the source cluster available during a rollback window (typically 24–72 hours) before decommissioning. After the rollback window has passed, see Uninstall the solution to remove the Migration Assistant infrastructure.
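
The pre-cutover checklist and the rollback window can be tracked with a few lines of code. The sketch below is illustrative only: the check names paraphrase the list above, the function names are hypothetical, and the 72-hour default is simply the upper end of the typical window mentioned here.

```python
from datetime import datetime, timedelta, timezone

# Paraphrased from the pre-cutover checklist in this section.
CUTOVER_CHECKS = (
    "replay_at_live_edge",
    "target_healthy",
    "representative_queries_pass",
    "application_team_ready",
    "rollback_path_available",
)


def blocking_checks(results):
    """Return the checks that are missing or not confirmed as True."""
    return [check for check in CUTOVER_CHECKS if results.get(check) is not True]


def rollback_window_end(cutover_at, window_hours=72):
    """Earliest time to consider decommissioning the source cluster."""
    return cutover_at + timedelta(hours=window_hours)


# Invented status values and cutover time for illustration.
status = {check: True for check in CUTOVER_CHECKS}
status["representative_queries_pass"] = False

print(blocking_checks(status))
print(rollback_window_end(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```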

What is not migrated automatically

Plan separate work for:

  • security configuration,

  • ISM/ILM policies,

  • ingest pipelines,

  • OpenSearch Dashboards or Kibana saved objects,

  • data streams,

  • and cluster-level tuning.