Troubleshooting
This section describes how to resolve known issues when deploying or running the Migration Assistant for Amazon OpenSearch Service solution. If these instructions don’t address your issue, see the Contact AWS Support section for instructions on opening an AWS Support case for this solution. When opening a support case, add a note to route the ticket to AWS OpenSearch / Migrations / AWS Solutions.
First-signal commands
Start with the simplest question: is this a deployment problem, an authentication problem, or a workflow problem? These commands give you the fastest first signal:
console --version
console clusters connection-check
workflow status
workflow output
kubectl get pods -n ma
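To run the first-signal commands in one pass without stopping at the first failure, a small wrapper can help. This is a minimal sketch, not part of the solution: `run_check` and `first_signal` are hypothetical helper names, and the sketch assumes that `console`, `workflow`, and `kubectl` are all reachable from the same shell and that each command exits non-zero when it fails.

```shell
#!/bin/sh
# Hypothetical helper: run one command, hide its output, and print OK or FAIL.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# Run every first-signal command and keep going after a failure,
# so that all results are visible together.
first_signal() {
  failures=0
  run_check "console version"   console --version                 || failures=$((failures + 1))
  run_check "cluster reachable" console clusters connection-check || failures=$((failures + 1))
  run_check "workflow status"   workflow status                   || failures=$((failures + 1))
  run_check "pods listed"       kubectl get pods -n ma            || failures=$((failures + 1))
  printf '%s check(s) failed\n' "$failures"
  [ "$failures" -eq 0 ]
}
```

Call `first_signal`, then rerun any command marked FAIL directly to see its full output.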
If the platform itself is not healthy
Pods are not starting
kubectl describe pod <POD_NAME> -n ma
kubectl logs <POD_NAME> -n ma
Common causes:
- Image pull failures because the chart was installed without valid image overrides. The Amazon EKS bootstrap script handles this for you when run with --version <tag> or default settings.
- Missing Kubernetes secrets in the ma namespace.
- Insufficient AWS IAM permissions on the IRSA role used by the pod.
- Pending pods caused by missing capacity or a broken StorageClass.
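When many pods are running, it can help to narrow down which `<POD_NAME>` to describe first. The following is a minimal sketch, assuming the default column layout of `kubectl get pods` (NAME, READY, STATUS); `unhealthy_pods` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print name, status, and ready count for every pod that is either not
# Running/Completed, or Running with fewer ready containers than expected.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    not_started = ($3 != "Running" && $3 != "Completed")
    not_ready   = ($3 == "Running" && ready[1] != ready[2])
    if (not_started || not_ready) print $1, $3, $2
  }'
}
```

A possible invocation is `kubectl get pods -n ma --no-headers | unhealthy_pods`, followed by `kubectl describe pod` on each name it prints.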
Pods are pending
kubectl get events -n ma --sort-by='.lastTimestamp'
kubectl describe node <NODE_NAME>
This often means the Amazon EKS node group or Karpenter NodePool needs attention. Check that capacity is available in the cluster’s Availability Zones and that the instance types are permitted by the NodePool selectors.
If connectivity checks fail
Start from the Migration Console pod:
console clusters connection-check
console clusters curl source /
console clusters curl target /
Common causes:
- Source security group does not allow traffic from the Amazon EKS cluster security group.
- The Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection’s network configuration does not allow traffic from the Amazon EKS cluster.
- DNS does not resolve from inside the cluster.
- The endpoint is wrong.
- TLS verification fails and allowInsecure is not set for a self-signed environment.
Quick DNS test from the Migration Console pod:
kubectl exec -it migration-console-0 -n ma -- nslookup <CLUSTER_ENDPOINT>
If authentication fails
Authentication issues usually show up as HTTP 401, 403, or "connection check passed from the Migration Console but the workflow failed later."
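A status code alone can suggest which layer to inspect first. The sketch below is an illustration rather than an exhaustive mapping: `classify_status` is a hypothetical helper, and the hints are general HTTP semantics combined with the checks described in this section.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to the area to inspect first.
classify_status() {
  case "$1" in
    2??) echo "ok: the request was accepted" ;;
    401) echo "unauthorized: check the basic authentication secret and credentials" ;;
    403) echo "forbidden: check SigV4 signing, access policies, and FGAC role mapping" ;;
    000) echo "no response: treat this as a connectivity problem, not authentication" ;;
    *)   echo "other: status $1 is not a typical authentication signal" ;;
  esac
}
```

Pass it the status code reported in the workflow output, or the value that `curl -s -o /dev/null -w '%{http_code}'` prints for a request (curl prints 000 when it receives no HTTP response).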
Basic authentication
Verify that the Kubernetes secret exists in the ma namespace and contains the expected keys:
kubectl get secret <SECRET_NAME> -n ma
kubectl get secret <SECRET_NAME> -n ma -o jsonpath='{.data}' | jq 'keys'
Your workflow configuration must point to that same secret name in authConfig.basic.secretName.
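The key check can be scripted so that a missing key is reported by name. This is a minimal sketch: `missing_keys` is a hypothetical helper, and the key names `username` and `password` in the usage example are assumptions, so substitute the keys your configuration actually expects.

```shell
#!/bin/sh
# Hypothetical helper: read the secret's key names on stdin (one per line) and
# report each expected key, passed as an argument, that is not present.
# Returns non-zero if any expected key is missing.
missing_keys() {
  present=$(cat)
  status=0
  for key in "$@"; do
    if ! printf '%s\n' "$present" | grep -qxF -- "$key"; then
      printf 'missing key: %s\n' "$key"
      status=1
    fi
  done
  return "$status"
}
```

For example: `kubectl get secret <SECRET_NAME> -n ma -o jsonpath='{.data}' | jq -r 'keys[]' | missing_keys username password`.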
AWS Signature Version 4 (SigV4) on Amazon EKS
The Amazon EKS deployment associates IAM roles with the Kubernetes service accounts used by the Migration Console pod and the Argo workflow executor pods through IAM Roles for Service Accounts (IRSA).
Check the identity inside the Migration Console pod:
kubectl exec -it migration-console-0 -n ma -- aws sts get-caller-identity
If the target is an Amazon OpenSearch Service domain with fine-grained access control, make sure the relevant IAM role is mapped with sufficient permissions on the domain. See Fine-grained access control: 403 on cluster:monitor/main.
Use es as the SigV4 service name for Amazon OpenSearch Service domains and aoss for Amazon OpenSearch Serverless collections.
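If you are unsure which service name applies, the endpoint hostname usually indicates it. The sketch below assumes the standard AWS-generated endpoint suffixes (`.es.amazonaws.com` for domains and `.aoss.amazonaws.com` for collections); custom endpoints and other AWS partitions cannot be classified this way. `sigv4_service` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: guess the SigV4 service name from an endpoint hostname or URL.
sigv4_service() {
  host=${1#*://}     # drop a leading scheme such as https://
  host=${host%%/*}   # drop any path
  host=${host%%:*}   # drop any port
  case "$host" in
    *.aoss.amazonaws.com) echo aoss ;;
    *.es.amazonaws.com)   echo es ;;
    *) echo unknown; return 1 ;;
  esac
}
```

A result of `unknown` means the service name has to be confirmed from how the target was created rather than from its hostname.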
Service account name mismatch
The Migration Console pod does not run under a service account named migration-console. The Helm chart uses migration-console-access-role. The Argo workflow executor pods run under argo-workflow-executor.
If you are inspecting service accounts or troubleshooting identity, check:
kubectl get serviceaccount -n ma
kubectl describe serviceaccount migration-console-access-role -n ma
kubectl describe serviceaccount argo-workflow-executor -n ma
Fine-grained access control: 403 on cluster:monitor/main
If authentication succeeds but the Amazon OpenSearch Service domain returns 403 on operations such as cluster:monitor/main, the likely cause is that fine-grained access control (FGAC) is enabled and the Migration Assistant identity has no role mapping inside the domain. Authentication gets you to the domain; FGAC authorizes what you can do once you are in. Both must be in place.
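Map the Migration Assistant identity to all_access (or a more scoped role) using the OpenSearch Security API. A PUT request replaces the existing mapping for that role, so include any entries you want to keep. The API path differs by engine: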
- Elasticsearch 7.x (Open Distro Security): /_opendistro/_security/api/rolesmapping/<role>
- OpenSearch 1.x and later (Security plugin): /_plugins/_security/api/rolesmapping/<role>
Use users for internal accounts and backend_roles for identities delivered by the authentication layer, such as an LDAP or SAML group, or an IAM role ARN when authenticating with AWS SigV4.
Elasticsearch 7.x:
curl -u <admin-user>:<admin-pass> \
  -H 'Content-Type: application/json' \
  -X PUT "https://<cluster>/_opendistro/_security/api/rolesmapping/all_access" \
  -d '{ "backend_roles": ["<identity>"] }'
OpenSearch 1.x and later:
curl -u <admin-user>:<admin-pass> \
  -H 'Content-Type: application/json' \
  -X PUT "https://<cluster>/_plugins/_security/api/rolesmapping/all_access" \
  -d '{ "backend_roles": ["<identity>"] }'
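Because the two requests differ only in the API path, a script that handles both engines can select the path once. This is a minimal sketch; `rolesmapping_path` and its `elasticsearch` and `opensearch` arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the role-mapping API path for an engine and role.
#   elasticsearch -> Open Distro Security path (Elasticsearch 7.x)
#   opensearch    -> Security plugin path (OpenSearch 1.x and later)
rolesmapping_path() {
  engine=$1
  role=$2
  case "$engine" in
    elasticsearch) echo "/_opendistro/_security/api/rolesmapping/$role" ;;
    opensearch)    echo "/_plugins/_security/api/rolesmapping/$role" ;;
    *) echo "unknown engine: $engine" >&2; return 1 ;;
  esac
}
```

The output can then be appended to the cluster URL, for example `"https://<cluster>$(rolesmapping_path opensearch all_access)"`.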
On Amazon OpenSearch Service domains that only accept IAM authentication (no admin password), you can map the role by temporarily setting the Migration Assistant IAM role as the master user:
aws opensearch update-domain-config \
  --domain-name <DOMAIN_NAME> \
  --advanced-security-options '{"MasterUserOptions":{"MasterUserARN":"<MIGRATION_ROLE_ARN>"}}'
Then scope the master user down again after the role mapping is set.
mTLS
Do not plan around mTLS unless you have validated it in the exact version you are running. The workflow path is centered on basic authentication and SigV4.
If the workflow fails after submission
workflow status
workflow output
workflow output --follow
Workflow already exists
workflow submit automatically stops and replaces an existing workflow with the same name, so this should rarely block you. If you see lingering custom resources after a partial failure, use workflow reset instead of deleting Argo workflows directly:
workflow reset          # interactive list and prompt
workflow reset --all    # remove everything (capture proxies are protected)
Warning
Avoid kubectl delete workflow …. It bypasses the migration custom resource lifecycle and can leave orphaned Apache Kafka persistent volume claims (PVCs) or pending assignments.
Approval gate is blocking progress
Open the interactive UI:
workflow manage
Or approve the step directly:
workflow approve <STEP_NAME>
If snapshot creation fails
The most common cause for Elasticsearch sources is a missing repository-s3 plugin.
Check the source cluster:
curl "http://<SOURCE_HOST>:9200/_cat/plugins?v"
Also verify:
- The source cluster can write to the snapshot bucket in Amazon S3.
- The repository is registered correctly.
- The bucket Region and path match the workflow configuration.
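The plugin must be present on every node of the source cluster, so it is worth checking per node rather than only looking for one match. The sketch below assumes the default `_cat/plugins` column order (node name, component, version) and node names without spaces; `nodes_with_repository_s3` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `_cat/plugins` output on stdin and list the nodes
# that report the repository-s3 plugin (the node name is the first column).
nodes_with_repository_s3() {
  awk '$2 == "repository-s3" { print $1 }' | sort -u
}
```

For example, run `curl -s "http://<SOURCE_HOST>:9200/_cat/plugins" | nodes_with_repository_s3` and compare the result with the node list of the source cluster. Any node missing from the output needs the plugin installed.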
If metadata migration fails
Common causes:
- Incompatible mappings across major versions.
- Elasticsearch 6.x mapping-type cleanup issues. Set multiTypeBehavior to NONE, UNION, or SPLIT intentionally.
- Target-side settings rejected by the newer version on Amazon OpenSearch Service or Amazon OpenSearch Serverless.
Use a pilot allowlist first so these failures show up on a small slice of data instead of the whole cluster.
If document backfill is too slow or unstable
Check:
- Target cluster ingest capacity on Amazon OpenSearch Service or Amazon OpenSearch Serverless.
- Available disk space on the target.
- RFS worker replica count (podReplicas).
- Pod memory limits for large documents.
Backfill reads from snapshots in Amazon S3, so adding RFS workers does not increase load on the source cluster. It mainly changes how quickly the target is driven.
If console or workflow is not in PATH
Some Migration Console images install the binaries under /.venv/bin:
export PATH="/.venv/bin:$PATH"
/.venv/bin/console --version
/.venv/bin/workflow configure sample
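If you put the export in a shell profile, running it repeatedly adds duplicate entries to PATH. A minimal sketch of an idempotent variant follows; `prepend_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;            # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}
```

Calling `prepend_path /.venv/bin` once or many times leaves a single `/.venv/bin` entry in PATH.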
If you need more data to debug
Collect the following before opening a support case:
- console --version
- workflow status
- workflow output
- kubectl describe pods -n ma
- Source and target version numbers.
- Exact authentication mode in use (basic, SigV4, or both).
- AWS Region and Amazon EKS cluster name.
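The command output in this list can be gathered into one directory before you open the case. This is a minimal sketch under the assumption that `console`, `workflow`, and `kubectl` are all available in the same shell; `capture`, `collect_support_data`, and the default directory name `ma-support-data` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command and save its stdout and stderr to
# <out_dir>/<name>.txt, continuing even if the command fails.
capture() {
  out_dir=$1
  name=$2
  shift 2
  mkdir -p "$out_dir"
  "$@" >"$out_dir/$name.txt" 2>&1 || echo "exit status: $?" >>"$out_dir/$name.txt"
}

# Gather the command output from the list above into one directory.
collect_support_data() {
  dir=${1:-ma-support-data}
  capture "$dir" console-version console --version
  capture "$dir" workflow-status workflow status
  capture "$dir" workflow-output workflow output
  capture "$dir" pods            kubectl describe pods -n ma
  echo "Saved to $dir"
}
```

Review the saved files for credentials or other sensitive values before attaching them, and add the version numbers, authentication mode, AWS Region, and cluster name by hand.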