Capture and replay live traffic from Solr
Migration Assistant supports live traffic capture and replay for Apache Solr sources running in SolrCloud mode. This enables zero-downtime migration by intercepting traffic flowing to your Solr cluster, transforming it to OpenSearch-compatible format, and replaying it against the target in real time.
How it works
The Solr traffic capture and replay pipeline consists of the following components:
- Capture Proxy: A transparent proxy deployed in front of your Solr cluster. It forwards all requests to Solr unchanged and simultaneously records them to Apache Kafka.
- Apache Kafka: A message broker (deployed automatically by Migration Assistant) that durably stores captured traffic.
- Traffic Replayer: Consumes captured traffic from Kafka, applies Solr-to-OpenSearch transformations, and sends the translated requests to the target.
- Solr request transform provider: Converts Solr requests into OpenSearch-compatible API calls using your Solr schema and config set metadata for field-type and request-handler awareness.
- Solr tuple transform provider: Optionally back-translates target responses in tuple audit records into a Solr-shaped response so you can compare captured source behavior with replayed target behavior.
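The data flow between these components can be pictured with a toy in-memory model. This is only an illustrative sketch, not how Migration Assistant is implemented: a `deque` stands in for Kafka, and `capture`, `replay`, `source`, `transform`, and `target` are hypothetical names introduced here.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Request = Dict[str, str]


def capture(request: Request, source: Callable[[Request], str], log: Deque[Request]) -> str:
    """Record a copy of the request, then forward the original unchanged."""
    log.append(dict(request))  # stand-in for the write to Kafka
    return source(request)


def replay(log: Deque[Request], transform: Callable[[Request], Request],
           target: Callable[[Request], str]) -> List[str]:
    """Consume recorded requests in capture order, transform each, and send it to the target."""
    responses: List[str] = []
    while log:
        responses.append(target(transform(log.popleft())))
    return responses
```

The point of the model is that the client only ever sees the source response returned by `capture`; replay happens later and independently, which is why the source cluster keeps serving traffic during the migration.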
Prerequisites
Tip
For a guided, conversational experience that handles configuration generation, schema extraction, and troubleshooting automatically, use Migration Assistant AI agent mode. See AI-assisted migration.
- Source cluster: Apache Solr running in SolrCloud mode.
- Migration Assistant deployed: The EKS-based Migration Assistant infrastructure must be running. See Deploy the solution.
- JSON-format writes: Your application must send write requests in JSON format. XML-format requests are not supported by the transform layer.
- Traffic scope: The Solr request transform translates /solr/collection/select and /solr/collection/update traffic. Suppress admin, schema, config, ping, replication, metrics, or other Solr-only endpoints unless you add a custom request transform for them.
- Schema files available: For each Solr collection or config set that you replay, gather the matching schema XML and solrconfig.xml files. The schema XML can come from managed-schema.xml, managed-schema, schema.xml, or the Schema API with wt=schema.xml. SolrToOpenSearchTransformProvider uses these files to map field types and apply request handler defaults. SolrTupleTransformProvider is configuration-free.
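To estimate how much of your traffic falls inside the translated scope, you can classify request paths from an access log before you start capturing. The sketch below is an assumption-laden helper, not part of Migration Assistant: `classify` is a hypothetical name, and it accepts only the exact select and update handler paths named above. Sub-handlers such as /update/json are deliberately treated as out of scope because this page does not say how they are handled.

```python
import re
from typing import Optional, Tuple

# Matches /solr/<collection>/select or /solr/<collection>/update, with an optional query string.
_TRANSLATED = re.compile(r"^/solr/([^/?]+)/(select|update)(?:\?.*)?$")


def classify(path: str) -> Optional[Tuple[str, str]]:
    """Return (collection, handler) for in-scope paths, or None for everything else."""
    match = _TRANSLATED.match(path)
    return (match.group(1), match.group(2)) if match else None
```

Paths that return `None` are candidates for suppression at capture time or for a custom request transform.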
Step 1: Extract schema and config files from Solr
The Solr request transform provider requires your collection’s schema XML and config set’s solrconfig.xml to correctly translate field types and understand request handler defaults. Repeat this step for each collection whose traffic you plan to replay. If multiple collections use the same config set, you can reuse the same solrconfig.xml by referencing the same ConfigMap file under each collection prefix, or by making it the unprefixed default if it applies to every replayed collection.
First, map each collection to its SolrCloud config set. The transform configuration uses the collection name as the key prefix, but the ZooKeeper path for solrconfig.xml uses the config set name:
    curl "http://SOLR_ENDPOINT:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" \
      | jq -r '.cluster.collections | to_entries[] | "\(.key)\t\(.value.configName)"'
If jq is not available, inspect cluster.collections.COLLECTION.configName in the JSON response. Record both values: use COLLECTION in transform keys such as products.solrConfigXml, and use CONFIGSET in ZooKeeper paths such as zk:/configs/CONFIGSET/solrconfig.xml.
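If jq is not available, the same mapping can be extracted with a few lines of Python. This is a minimal sketch that assumes you saved the CLUSTERSTATUS response to a string or file first; `collection_config_sets` is a hypothetical helper name.

```python
import json
from typing import Dict


def collection_config_sets(cluster_status_json: str) -> Dict[str, str]:
    """Map each collection name to its config set name from a CLUSTERSTATUS response."""
    status = json.loads(cluster_status_json)
    collections = status["cluster"]["collections"]
    return {name: info["configName"] for name, info in collections.items()}
```

The dictionary keys are the COLLECTION values to use as transform key prefixes, and the dictionary values are the CONFIGSET names to use in ZooKeeper paths.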
Get schema XML:
    curl "http://SOLR_ENDPOINT:8983/solr/COLLECTION/schema?wt=schema.xml" > COLLECTION-managed-schema.xml
The output file name is arbitrary. The important part is that the file contains Solr schema XML, not the JSON returned by the default Schema API response.
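A quick local check can catch a file that accidentally contains JSON before it reaches the ConfigMap. The sketch below relies on the fact that a Solr schema file has a `schema` root element and a solrconfig.xml file has a `config` root element; `root_tag` is a hypothetical helper name, and the check says nothing about whether the transform provider accepts the rest of the file.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def root_tag(data: bytes) -> Optional[str]:
    """Return the XML root element name, or None if the data is not well-formed XML."""
    try:
        return ET.fromstring(data).tag
    except ET.ParseError:
        return None
```

For example, `root_tag(open("products-managed-schema.xml", "rb").read())` should return `"schema"`; a `None` result usually means the default JSON response was saved instead.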
Get solrconfig.xml:
Important
Do not use the /config REST endpoint as it returns JSON, which the transform provider cannot parse. Always retrieve solrconfig.xml from ZooKeeper.
    # From a host with access to ZooKeeper
    solr zk cp zk:/configs/CONFIGSET/solrconfig.xml /tmp/CONFIGSET-solrconfig.xml -z ZK_HOST:2181 2>/dev/null
If you are running Solr on Kubernetes within the same cluster as Migration Assistant:
    # Copy from ZK to a temp file inside the Solr pod
    kubectl exec -n NAMESPACE SOLR_POD -- bash -c \
      'solr zk cp zk:/configs/CONFIGSET/solrconfig.xml /tmp/CONFIGSET-solrconfig.xml -z ZK_HOST:2181 2>/dev/null'

    # Then copy to your local machine
    kubectl cp NAMESPACE/SOLR_POD:/tmp/CONFIGSET-solrconfig.xml ./CONFIGSET-solrconfig.xml
Create a Kubernetes ConfigMap:
Use one ConfigMap key per file. The ConfigMap key names are arbitrary, but the workflow path values in the next step must match them. These file names do not control transform routing; routing comes from the context.values key prefix, such as products.solrSchemaXml or reviews.solrConfigXml.
    kubectl create configmap solr-config -n ma \
      --from-file=products-managed-schema.xml=./products-managed-schema.xml \
      --from-file=products-solrconfig.xml=./products-solrconfig.xml \
      --from-file=reviews-managed-schema.xml=./reviews-managed-schema.xml \
      --from-file=reviews-solrconfig.xml=./reviews-solrconfig.xml
Step 2: Configure the workflow
Add a traffic section to your workflow configuration JSON. The traffic section defines the capture proxy and the replayer with Solr-specific transform providers.
For tuple audit output on Amazon EKS, configure the replayer to write directly to Amazon S3. The standard deployment creates a default bucket and mounts it read-only on the Migration Console pod at /s3/artifacts, so tuple objects written under the default tuples/ prefix are visible from the console at /s3/artifacts/tuples/.
Retrieve the default bucket name and Region, then use them for tupleS3Bucket and tupleS3Region in the configuration:

    kubectl get configmap migrations-default-s3-config -n ma \
      -o jsonpath='{.data.BUCKET_NAME}{"\n"}{.data.AWS_REGION}{"\n"}'
    {
      "sourceClusters": {
        "source": {
          "endpoint": "http://SOLR_ENDPOINT:8983",
          "version": "SOLR VERSION",
          "snapshotInfo": {
            "repos": {
              "s3": { "awsRegion": "REGION", "s3RepoPathUri": "s3://BACKUP_BUCKET" }
            },
            "snapshots": {
              "main": {
                "repoName": "s3",
                "config": {
                  "createSnapshotConfig": { "snapshotPrefix": "solr" }
                }
              }
            }
          }
        }
      },
      "targetClusters": {
        "target": {
          "endpoint": "https://TARGET_ENDPOINT",
          "authConfig": {
            "sigv4": { "region": "REGION", "service": "es" }
          }
        }
      },
      "snapshotMigrationConfigs": [
        {
          "fromSource": "source",
          "toTarget": "target",
          "perSnapshotConfig": {
            "main": [
              {
                "metadataMigrationConfig": {},
                "documentBackfillConfig": { "podReplicas": 3 }
              }
            ]
          }
        }
      ],
      "traffic": {
        "proxies": {
          "solr-proxy": {
            "source": "source",
            "proxyConfig": { "listenPort": 8983 }
          }
        },
        "replayers": {
          "solr-replayer": {
            "fromCapturedTraffic": "solr-proxy",
            "toTarget": "target",
            "dependsOnSnapshotMigrations": [
              { "source": "source", "snapshot": "main" }
            ],
            "replayerConfig": {
              "speedupFactor": 1.1,
              "podReplicas": 1,
              "tupleS3Bucket": "DEFAULT_MIGRATION_BUCKET",
              "tupleS3Region": "REGION",
              "requestTransforms": [
                {
                  "transformName": "SolrToOpenSearchTransformProvider",
                  "context": {
                    "values": {
                      "targetType": { "value": "TARGET_TYPE" },
                      "products.solrSchemaXml": {
                        "fromFile": { "configMap": "solr-config", "path": "products-managed-schema.xml" }
                      },
                      "products.solrConfigXml": {
                        "fromFile": { "configMap": "solr-config", "path": "products-solrconfig.xml" }
                      },
                      "reviews.solrSchemaXml": {
                        "fromFile": { "configMap": "solr-config", "path": "reviews-managed-schema.xml" }
                      },
                      "reviews.solrConfigXml": {
                        "fromFile": { "configMap": "solr-config", "path": "reviews-solrconfig.xml" }
                      }
                    }
                  }
                }
              ],
              "tupleTransforms": [
                { "transformName": "SolrTupleTransformProvider" }
              ]
            }
          }
        }
      }
    }
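Because the fromFile path values must match the ConfigMap key names from Step 1, a mismatch is easy to introduce by typo. The following sketch walks a parsed workflow configuration and reports references that do not resolve. It is an illustrative pre-submit check, not a Migration Assistant feature; `from_file_refs` and `missing_refs` are hypothetical names, and you supply the ConfigMap key names yourself.

```python
from typing import Any, Dict, Iterator, List, Set, Tuple


def from_file_refs(node: Any) -> Iterator[Tuple[str, str]]:
    """Yield (configMap, path) for every {"fromFile": {...}} object anywhere in the config."""
    if isinstance(node, dict):
        ref = node.get("fromFile")
        if isinstance(ref, dict) and "configMap" in ref and "path" in ref:
            yield ref["configMap"], ref["path"]
        for value in node.values():
            yield from from_file_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from from_file_refs(item)


def missing_refs(config: Any, configmap_keys: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return the references whose path is not a key of the named ConfigMap."""
    return sorted(
        (name, path)
        for name, path in from_file_refs(config)
        if path not in configmap_keys.get(name, set())
    )
```

An empty result means every referenced file name exists in the key set you passed in; it does not confirm that the files contain valid XML.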
Solr transform provider configuration
SolrToOpenSearchTransformProvider supports both a default schema/config pair and collection-specific overrides. Use collection-specific keys when one replay stream contains traffic for multiple Solr collections that use different schemas or config sets.
Collection-specific keys use the collection name exactly as it appears in captured request paths such as /solr/products/select or /solr/reviews/update. The key prefix is the collection name, not the ZooKeeper config set name. If the provider does not find a collection-specific schema or config for the request’s collection, it falls back to the unprefixed default keys. Schema and config fall back independently, so a collection-specific schema can still use the default config, and a collection-specific config can still use the default schema.
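The independent fallback described above can be sketched as a small lookup function. This is a simplified model of the documented behavior, not the provider's actual code: `resolve` is a hypothetical name, the values are treated as plain strings, and the File key variants are ignored.

```python
from typing import Dict, Optional, Tuple


def resolve(values: Dict[str, str], collection: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (schema, config) for a collection, falling back per key to the unprefixed default."""
    def pick(key: str) -> Optional[str]:
        return values.get(f"{collection}.{key}", values.get(key))

    return pick("solrSchemaXml"), pick("solrConfigXml")
```

With a default pair plus only `products.solrSchemaXml` defined, `resolve(values, "products")` returns the products schema together with the default config, while `resolve(values, "reviews")` returns both defaults.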
The table lists the provider keys after workflow materialization. In workflow context.values, wrap literal strings as { "value": "…" } and ConfigMap-backed XML as { "fromFile": … }, as shown in the example. If you use raw transformerConfig outside the workflow pipeline, pass the raw string or file path value directly.
| Key | Value type | Description |
|---|---|---|
| | String | Optional target type. Valid values are |
| | Inline XML or workflow | Default |
| | Inline XML or workflow | Default |
| | Inline XML or workflow | Collection-specific |
| | Inline XML or workflow | Collection-specific |
| | Filesystem path | Default file path forms for environments where the replayer can read local files directly. In workflow configurations, prefer |
| | Filesystem path | Collection-specific file path forms for direct replayer configuration outside the workflow ConfigMap pattern. |
For each scope and file type, set either the inline/XML key or the File key, not both. For example, do not set both products.solrSchemaXml and products.solrSchemaXmlFile. Blank, missing, or unparsable schema/config XML does not stop the replayer; the provider logs the problem and continues with an empty schema or config for that scope. That keeps replay moving, but fielded queries and request-handler defaults fall back to generic behavior, so verify ConfigMap paths and provider logs before trusting replay results.
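Since a bad configuration does not stop the replayer, it can help to check the key set before submitting. The sketch below flags scopes where both the inline key and its File counterpart are present. It is a hypothetical lint helper (`conflicting_keys` is not a Migration Assistant function) and it only looks at key names, not at the XML content.

```python
from typing import Iterable, List

BASE_KEYS = ("solrSchemaXml", "solrConfigXml")


def conflicting_keys(keys: Iterable[str]) -> List[str]:
    """Return inline keys that also have a matching ...File key in the same scope."""
    present = set(keys)
    return sorted(
        key
        for key in present
        if key.rsplit(".", 1)[-1] in BASE_KEYS and key + "File" in present
    )
```

Passing the keys of your context.values object returns, for example, `["products.solrSchemaXml"]` when `products.solrSchemaXmlFile` is also set.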
The schema and config values are consumed by SolrToOpenSearchTransformProvider in requestTransforms. If you include SolrTupleTransformProvider under tupleTransforms, it does not require any context.values configuration, does not affect replayed requests, and can be omitted when you do not need Solr-shaped tuple comparison output.
targetType configuration
The targetType value tells the transform provider what type of OpenSearch target you are using. This controls whether certain features (like ?refresh) are included in the translated requests.
| Value | Target | Behavior |
|---|---|---|
| | Self-managed OpenSearch or Amazon OpenSearch Service | Full feature set. |
| | Amazon OpenSearch Serverless | Suppresses |
| | Amazon OpenSearch Serverless NextGen | Same as |
For Amazon OpenSearch Serverless NextGen targets, also set authConfig.sigv4.service to aoss instead of es.
Important
targetType does not turn Solr commit-only update commands into no-ops. Requests such as {"commit":{}} and empty update arrays still translate to POST /collection/_refresh. Also avoid commit=true or commitWithin on captured delete-by-ID traffic for Serverless targets because the current delete-by-ID translation still appends ?refresh=true.
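To gauge how many refresh calls replay would send to the target, you can count commit-only update bodies in a sample of your traffic. The function below is a rough approximation of the two cases named above (a body containing only a commit command, or an empty update array); `is_commit_only` is a hypothetical name and the provider's real detection logic may differ.

```python
import json


def is_commit_only(body: str) -> bool:
    """Return True for a JSON update body that is only a commit command or an empty array."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if isinstance(payload, list):
        return len(payload) == 0
    if isinstance(payload, dict):
        return set(payload) == {"commit"}
    return False
```

A high share of commit-only requests is worth reviewing before replaying against a Serverless target.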
Replayer settings
| Setting | Description | Default |
|---|---|---|
| | Replay speed multiplier. | 1.1 |
| | Number of parallel replayer pods. Each independently consumes from Kafka. | 1 |
| | Array of request transform providers. Required for Solr sources. | None |
| | Array of tuple transform providers. Optional; include | None |
| | Strip the captured | false |
| | Maximum in-flight requests to the target cluster. | 10000 |
| | Seconds of traffic to buffer ahead of current replay position. Must be greater than | 400 |
| | Seconds of inactivity on a captured connection before the replayer treats the original connection as closed. | 360 |
| | Maximum seconds to wait for a response from the target. | 150 |
| | Number of client threads used to send replayed requests. | 0 |
| | Milliseconds to delay the first request on a resumed connection after a Kafka partition reassignment. | 5000 |
| | Optional, and recommended for Amazon EKS workflow runs, S3 destination for tuple audit output. When set, the replayer writes gzip-compressed JSON Lines tuple objects directly to S3. | None |
| | S3 key prefix for tuple objects. | |
| | Custom S3-compatible endpoint for tuple output, such as LocalStack or another S3-compatible service. | None |
| | Maximum age before the current S3 tuple file is rotated and uploaded. | 60 |
| | Maximum uncompressed tuple data size before the current S3 tuple file is rotated and uploaded. | 256 |
| | Maximum number of tuples per S3 object. | 0 |
| | OpenTelemetry metrics collector endpoint. Set to an empty string to disable metrics export. | |
| | OpenTelemetry trace collector endpoint. Omit or set to an empty string to disable trace export. | None |
| | String appended to the User-Agent header on replayed target requests so replay traffic can be identified in target logs. | None |
| | Document-level bulk error type strings that should not be retried during replay. If set, your list replaces the replayer’s built-in defaults instead of adding to them. | Built-in list |
| | Kubernetes resource requests/limits, JVM arguments, and logging override ConfigMap for replayer pods. | Schema defaults |
Note
If tupleS3Bucket is omitted, the replayer falls back to local tuple log files inside the replayer pod; those files are not mounted into the Migration Console in the Amazon EKS workflow deployment. For operator-accessible audit output, set tupleS3Bucket and tupleS3Region. You can use the Migration Assistant-managed default bucket; retrieve its name with kubectl get configmap migrations-default-s3-config -n ma -o jsonpath='{.data.BUCKET_NAME}'. You can also set tupleS3Endpoint for custom S3-compatible endpoints. For the complete replayer tuning reference, see Replay tuning.
Step 3: Submit and start the workflow
From the Migration Console pod (migration-console-0) in the ma namespace:
    # Load the configuration
    workflow configure edit --stdin < /tmp/config.json

    # Create target credentials (if using basic auth instead of SigV4)
    echo "USERNAME:PASSWORD" | workflow configure credentials create target-creds --stdin

    # Submit the workflow
    workflow submit
Step 4: Reroute client traffic to the capture proxy
After workflow status shows Create Proxy: Succeeded, retrieve the proxy endpoint:
kubectl get svc solr-proxy -n ma -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Update your application’s Solr connection URL to point to the capture proxy (for example, https://PROXY_ENDPOINT:8983) instead of the source Solr cluster. The proxy is protocol-compatible, so your application requires no code changes.
Step 5: Monitor replay progress
    # Overall workflow status
    workflow status

    # Follow replayer logs
    workflow log all --follow
Key metrics in replayer logs:
| Metric | Description | Expected value |
|---|---|---|
| | Total requests replayed to target | Increasing over time |
| | HTTP status codes from target (grouped) | Mostly |
| | Transform or connection errors | 0 for write operations |
| | Kafka offsets committed | Increasing (replay progressing) |
Step 6: Validate and cut over
    # Check document count
    console clusters cat-indices --refresh

    # Verify a specific document
    console clusters curl target /INDEX/_doc/DOCUMENT_ID

    # Compare counts
    console clusters curl target /INDEX/_count
Before switching your application to point directly at OpenSearch:
- Confirm replay has reached the live edge: The replayer should be processing traffic in near-real-time with no growing Kafka lag.
- Verify document counts: Target count should match source count plus any documents added during replay.
- Spot-check documents: Verify that recently ingested and deleted documents are correctly reflected in the target.
- Review replayer errors: Ensure exceptions=0 for write operations.
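The document count check can be scripted once you have both responses as parsed JSON. The sketch below assumes the standard response shapes: a Solr select response with `response.numFound` (for example from a `q=*:*&rows=0` query with JSON output) and an OpenSearch `_count` response with `count`. The function names are hypothetical, and fetching the responses is left to your own tooling.

```python
from typing import Any, Dict


def solr_num_found(select_response: Dict[str, Any]) -> int:
    """Total matching documents reported by a Solr select response."""
    return int(select_response["response"]["numFound"])


def opensearch_count(count_response: Dict[str, Any]) -> int:
    """Document count reported by an OpenSearch _count response."""
    return int(count_response["count"])


def count_gap(select_response: Dict[str, Any], count_response: Dict[str, Any]) -> int:
    """Source count minus target count; positive means the target is behind."""
    return solr_num_found(select_response) - opensearch_count(count_response)
```

While traffic is still flowing, a small gap that keeps changing is expected; compare again after writes are paused or replay has caught up.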
Audit trail
The replayer writes a complete audit trail of every request-response pair (tuples). When tupleS3Bucket is configured, inspect the gzip-compressed tuple objects from the Migration Console pod through the mounted default bucket:
    # List tuple files
    find /s3/artifacts/tuples -name 'tuples-*.log.gz' -print

    # View tuples in human-readable format
    gzip -dc /s3/artifacts/tuples/REPLAYER_POD/YYYY/MM/DD/HH/tuples-SINK-TIMESTAMP-SEQ.log.gz \
      | console tuples show
Each tuple contains the original Solr request, the Solr source response, the transformed OpenSearch request, and the OpenSearch response. When SolrTupleTransformProvider is enabled, the tuple also includes targetResponsesTransformed, which contains the target response back-translated into a Solr-shaped response where possible for translated select and update traffic. If a tuple’s source request path is not recognized by the provider, the corresponding transformed entry is null; if back-translation fails, the entry contains an error object with the failure message. Do not treat recognized but unsupported Solr endpoints, such as admin, schema, config, ping, and metrics traffic, as Solr-equivalent tuple comparisons; suppress them during capture or add a custom tuple transform if you need to analyze them.
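For a quick tally of how many back-translations succeeded, you can read the tuple objects directly. This sketch makes several assumptions that this page does not confirm: each line of the file is one JSON object, `targetResponsesTransformed` is a list of entries, and a failed entry is an object with an `error` key. Adjust the checks to match what you actually see in your tuple output.

```python
import gzip
import json
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str]) -> Counter:
    """Count back-translation outcomes across JSON Lines tuple records."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        for entry in record.get("targetResponsesTransformed") or []:
            if entry is None:
                counts["unrecognized"] += 1  # source path not recognized by the provider
            elif isinstance(entry, dict) and "error" in entry:
                counts["failed"] += 1  # assumed shape of the error object
            else:
                counts["translated"] += 1
    return counts


def summarize_file(path: str) -> Counter:
    """Summarize one gzip-compressed tuple file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return summarize(handle)
```

Run `summarize_file` against a path returned by the find command above to see the outcome counts for that object.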
The tuple transform is intentionally configuration-free. It does not read the Solr schema/config values that SolrToOpenSearchTransformProvider uses for request translation, so reconstructed response.docs use schemaless field-shape defaults: arrays stay arrays, numeric and boolean values stay scalar, and unknown non-array scalar values can appear as single-item arrays. Account for that when comparing tuple output, or add a custom tuple transform if your audit tooling requires exact field cardinality.
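One way to account for the single-item array behavior when comparing documents is to unwrap one-element lists on both sides before testing equality. The helper names below are hypothetical, and the approach is a deliberate trade-off: it also hides genuine differences between a single-valued field and a multi-valued field that happens to hold one value.

```python
from typing import Any, Dict


def unwrap_singletons(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Replace one-element list values with the element itself; leave everything else as is."""
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in doc.items()
    }


def docs_match(source_doc: Dict[str, Any], target_doc: Dict[str, Any]) -> bool:
    """Compare two documents while ignoring scalar versus single-item-array differences."""
    return unwrap_singletons(source_doc) == unwrap_singletons(target_doc)
```

If your audit tooling needs exact field cardinality, a custom tuple transform remains the more precise option.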
Note
These logs contain the contents of all requests, including authorization headers and HTTP message bodies. Ensure that access to the migration environment is restricted.
Supported Solr transformations
For a full reference of supported write operations, search query features, and behavioral differences, see Transform Solr traffic.