
Transform data and requests - Migration Assistant for Amazon OpenSearch Service

Transform data and requests

Migration Assistant can transform metadata, field mappings, and captured request traffic during migration to make source behavior compatible with the target. Use this section when an upgrade path, target type, or source platform requires compatibility changes.

Metadata migration also applies built-in compatibility transforms that need no dedicated workflow field. These include union of legacy multi-type mappings, k-NN method and engine compatibility, Serverless-compatible vector mappings, and analyzer, tokenizer, and filter cleanup. See Built-in transformations for the complete list of metadata transformations.

Workflow transform pipeline model

Custom transform fields use the same workflow pipeline shape across metadata migration, document backfill, traffic replay, and tuple audit records:

  • metadataMigrationConfig.metadataTransforms - metadata documents, before mappings, settings, templates, and aliases are applied to the target.

  • documentBackfillConfig.documentTransforms - documents emitted by Reindex-from-Snapshot, before bulk indexing to the target.

  • replayerConfig.requestTransforms - captured requests, before the Traffic Replayer sends them to the target.

  • replayerConfig.tupleTransforms - tuple audit records written by the Traffic Replayer for validation and comparison.

Each pipeline accepts either one transform object or an ordered array of transform objects. Each transform object must choose exactly one selector:

  • entryPoint - use workflow-managed JavaScript or Python. Valid forms are javascript, javascriptFile, python, and pythonFile.

  • transformName - use a named transform provider that is already available in the relevant migration container, such as a built-in or packaged provider.
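The pipeline shape rules can be modeled as a small validator. This is a hypothetical Python sketch, not the workflow's actual validation code. The function names normalize_pipeline and selector_of are illustrative. The check that an entryPoint object holds exactly one of the four documented forms is an assumption.

```python
from __future__ import annotations

from typing import Any

# Entry-point forms named in the documentation.
ENTRY_POINT_FORMS = {"javascript", "javascriptFile", "python", "pythonFile"}
SELECTORS = ("entryPoint", "transformName")


def normalize_pipeline(pipeline: Any) -> list[dict]:
    """A pipeline is one transform object or an ordered array; return a list either way."""
    if isinstance(pipeline, dict):
        return [pipeline]
    if isinstance(pipeline, list):
        return list(pipeline)  # keep the caller's order
    raise TypeError("pipeline must be a transform object or an array of transform objects")


def selector_of(transform: dict) -> str:
    """Return the transform's selector, enforcing 'exactly one selector'."""
    present = [s for s in SELECTORS if s in transform]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {SELECTORS}, found {present}")
    if present[0] == "entryPoint":
        # Assumption: an entryPoint object holds exactly one of the four forms.
        forms = list(transform["entryPoint"])
        if len(forms) != 1 or forms[0] not in ENTRY_POINT_FORMS:
            raise ValueError(f"entryPoint must use one of {sorted(ENTRY_POINT_FORMS)}, got {forms}")
    return present[0]
```

Normalizing first means downstream code can always iterate over a list. A single transform object behaves exactly like a one-element array.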

File-backed entry points use configMap or image references. For ConfigMaps, path is the ConfigMap key and cannot contain nested directories. For images, path is relative to the mounted image root; absolute paths and .. traversal are rejected. Image references can include pullPolicy, which defaults to IfNotPresent.
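The path rules above can be expressed as a short check. This is a hypothetical Python sketch: check_file_ref is an illustrative name, and using an `image` key to name the image reference is an assumption. The sketch rejects nested ConfigMap keys, absolute image paths, and `..` segments, and fills the documented pullPolicy default.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def check_file_ref(ref: dict) -> dict:
    """Validate a file reference and fill the documented pullPolicy default (hypothetical)."""
    path = ref.get("path")
    if not path:
        raise ValueError("a file reference needs a path")
    if "configMap" in ref:
        # The path is the ConfigMap key itself, so it cannot name a nested directory.
        if "/" in path:
            raise ValueError(f"ConfigMap key cannot contain directories: {path!r}")
        return dict(ref)
    if "image" in ref:  # assumed key name for image references
        image_path = PurePosixPath(path)
        # Paths are resolved relative to the mounted image root.
        if image_path.is_absolute():
            raise ValueError(f"image path must be relative: {path!r}")
        if ".." in image_path.parts:
            raise ValueError(f"image path cannot use .. traversal: {path!r}")
        resolved = dict(ref)
        resolved.setdefault("pullPolicy", "IfNotPresent")
        return resolved
    raise ValueError("a file reference must name a configMap or an image")
```

PurePosixPath does not collapse `..` segments, so a path such as `a/../../x.js` is still caught by the parts check.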

Use context when a transform needs configuration. It can be either a raw string or an object with named values:

  • context.values.<name>.value supplies an inline JSON-compatible value.

  • context.values.<name>.fromFile loads one named value from a ConfigMap key or image file.

  • context.valueDirectories loads the immediate files from a ConfigMap or image directory as context values. For ConfigMaps, the whole ConfigMap is used as the directory. For images, set path to a directory under the mounted image root, or omit path to use the image root.

For JavaScript and Python entry points, object context is passed to the script provider as bindingsObject, bindingsObjectFiles, and bindingsObjectDirs. For named providers, object context is passed as provider configuration, with file-backed values under providerConfigFiles and directories under providerConfigDirs. A raw string context is passed through as the script binding string or the named provider configuration string.
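The routing described above can be sketched as a function that splits a context into inline, file-backed, and directory parts. This Python sketch models the documented routing and is not the workflow's code. The output keys bindingString and providerConfig are hypothetical labels; the documentation only says "script binding string" and "provider configuration". Accepting valueDirectories as either one object or a list is also an assumption.

```python
from __future__ import annotations

from typing import Any


def route_context(context: str | dict, provider: str) -> dict[str, Any]:
    """Map a transform's context onto provider inputs (hypothetical model).

    provider is "script" for JavaScript/Python entry points or "named" for transformName.
    """
    if provider not in ("script", "named"):
        raise ValueError("provider must be 'script' or 'named'")
    if isinstance(context, str):
        # Raw strings pass through unchanged. Both keys are hypothetical labels.
        return {"bindingString" if provider == "script" else "providerConfig": context}
    values = context.get("values", {})
    inline = {name: spec["value"] for name, spec in values.items() if "value" in spec}
    files = {name: spec["fromFile"] for name, spec in values.items() if "fromFile" in spec}
    dirs = context.get("valueDirectories", [])
    if isinstance(dirs, dict):  # assumption: accept a single directory object too
        dirs = [dirs]
    if provider == "script":
        return {"bindingsObject": inline, "bindingsObjectFiles": files, "bindingsObjectDirs": list(dirs)}
    return {"providerConfig": inline, "providerConfigFiles": files, "providerConfigDirs": list(dirs)}
```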

When context values are loaded from files, the provider type controls materialization. JavaScript and Python script providers read file-backed context as UTF-8 text; parse JSON or other structured formats in the script, or use context.values.<name>.value for inline structured values. Named providers can declare the expected materialization for each key, such as text, JSON, bytes, Base64, or a resolved file path. Directory-loaded context uses the immediate file name as the context key and ignores nested directories. If the same key appears in multiple places, later sources override earlier ones: directory values first, then individually named fromFile values, then inline value entries.
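The directory-loading and override rules can be sketched in Python under stated assumptions. load_value_directory and merge_context are hypothetical names. Applying multiple directories in their listed order is an assumption. Values are read as UTF-8 text, matching what script providers receive.

```python
from __future__ import annotations

import os
from typing import Any


def load_value_directory(path: str) -> dict[str, str]:
    """Load the immediate files of a directory as UTF-8 text, keyed by file name.

    Nested directories are skipped, matching the documented behavior.
    """
    values: dict[str, str] = {}
    with os.scandir(path) as entries:
        for entry in entries:
            # is_file() follows symlinks, so symlinked keys count as files.
            if entry.is_file():
                with open(entry.path, encoding="utf-8") as fh:
                    values[entry.name] = fh.read()
    return values


def merge_context(
    directory_values: list[dict[str, str]],
    from_file_values: dict[str, str],
    inline_values: dict[str, Any],
) -> dict[str, Any]:
    """Apply the documented precedence: directories, then fromFile, then inline value."""
    merged: dict[str, Any] = {}
    for values in directory_values:  # assumption: directories apply in listed order
        merged.update(values)
    merged.update(from_file_values)  # named fromFile values override directory values
    merged.update(inline_values)  # inline values override everything else
    return merged
```

Because file-backed values arrive as text, a script that needs structured data parses it itself, for example with `json.loads(merged["rules.json"])`.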

The workflow generates the container volume and mount fields for any configMap or image file references you declare in entryPoint, context.values.fromFile, or context.valueDirectories. Do not set fileSourceVolumes or fileSourceVolumeMounts manually unless you are bypassing the workflow pipeline and using raw transformer configuration files that you mount yourself.

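For example, the following transform loads its script from the field-type-converter.js key of the metadata-transforms ConfigMap. It passes mode as an inline value and loads rules from the rules.json key of the same ConfigMap:

{
  "entryPoint": {
    "javascriptFile": {
      "configMap": "metadata-transforms",
      "path": "field-type-converter.js"
    }
  },
  "context": {
    "values": {
      "mode": { "value": "pilot" },
      "rules": {
        "fromFile": {
          "configMap": "metadata-transforms",
          "path": "rules.json"
        }
      }
    }
  }
}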
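To make the volume-generation rule concrete, the following hypothetical Python sketch walks one or more transform objects. It collects each distinct ConfigMap or image referenced from entryPoint, context.values.fromFile, or context.valueDirectories. These are the sources the workflow would need to mount. The function name and the `image` key are assumptions, and the sketch does not produce real volume specs. Applied to the example above, it finds only the metadata-transforms ConfigMap.

```python
from __future__ import annotations

from typing import Any

FILE_FORMS = ("javascriptFile", "pythonFile")


def _source(ref: dict) -> tuple[str, str]:
    # A reference names one backing source: a ConfigMap or (assumed key) an image.
    if "configMap" in ref:
        return ("configMap", ref["configMap"])
    return ("image", ref["image"])


def collect_file_sources(transforms: Any) -> set[tuple[str, str]]:
    """Distinct ConfigMaps and images referenced by a pipeline (hypothetical)."""
    items = [transforms] if isinstance(transforms, dict) else list(transforms)
    sources: set[tuple[str, str]] = set()
    for t in items:
        entry = t.get("entryPoint", {})
        for form in FILE_FORMS:
            if form in entry:
                sources.add(_source(entry[form]))
        context = t.get("context")
        if isinstance(context, dict):  # raw string contexts reference no files
            for spec in context.get("values", {}).values():
                if "fromFile" in spec:
                    sources.add(_source(spec["fromFile"]))
            dirs = context.get("valueDirectories", [])
            if isinstance(dirs, dict):  # assumption: accept one object or a list
                dirs = [dirs]
            for d in dirs:
                sources.add(_source(d))
    return sources
```

Deduplicating by source means two references to the same ConfigMap, as in the example, correspond to one mount.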

For workflow configurations, prefer these pipeline fields over the phase-specific raw fields, such as transformerConfig, transformerConfigBase64 for metadata, transformerConfigEncoded for replay, docTransformerConfig, tupleTransformerConfig, and their Base64 or file variants. The raw fields remain useful for manual runs and expert configurations where files are already mounted in the container.