Migration flow - AWS Prescriptive Guidance

Migration flow

This section describes an iterative approach to migrating your Solr configuration to Amazon OpenSearch Service. We recommend that you approach the process systematically by organizing your migration into three focus areas: 

  1. Settings: Translating Solr configuration to OpenSearch cluster or index settings

  2. Search functionality: Translating Solr search functionality to OpenSearch search functionality 

  3. Document processing: Translating Solr document processing to Amazon OpenSearch Ingestion (OSI)

Step 1. Migrate settings

The components of your Solr configuration map to different Amazon OpenSearch Service features. The following sections cover the migration of Solr shards and replicas, codecs, commits, caches and queries, and index segments and merging. Amazon OpenSearch Service manages index storage locations directly, so Solr dataDir settings don't need to be configured. 

Before you migrate every configuration setting from Solr, assess whether the setting can be adjusted based on your current search system experience and best practices. Some settings are configured at the cluster or node level, and not at the index level. These include the maximum number of clauses in a Boolean query, circuit breaker settings, and cache settings.
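For example, cluster-level circuit breaker limits can be adjusted dynamically through the cluster settings API (the 70 percent value shown here is illustrative, not a recommendation):

```
PUT _cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%"
  }
}
```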

Instead of just lifting and shifting existing configurations, take the time to evaluate each setting's necessity and simplify complex configurations where possible. For example, a slow log threshold of one second might generate excessive log volume and can be revisited. You might also want to review and reduce the max.booleanClauses setting.
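For example, a relaxed slow log threshold can be applied per index through the index settings API (the index name and the 2s threshold are illustrative):

```
PUT /my_index/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s"
}
```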

Shards and replica settings

Solr allows replica shards to be distributed across nodes for redundancy and fault tolerance, but it lacks node awareness during shard placement. This means that primary and replica shards might inadvertently be allocated to the same node, which compromises the intended high availability and benefits of replication.

Amazon OpenSearch Service ensures that primary and replica shards are not on the same node. When you enable zone awareness, Amazon OpenSearch Service makes a best effort to distribute primary shards and their corresponding replica shards to different Availability Zones. When you configure dedicated master nodes and zone awareness for standby replicas, Amazon OpenSearch Service ensures that primary and standby replica shards are placed in different Availability Zones, which provides stronger resilience guarantees.

The OpenSearch and Solr definitions of a replica are different. In OpenSearch, you define a primary shard count by using the number_of_shards setting, which determines the partitioning of your data. You then set a replica count by using the number_of_replicas setting. Each replica is a copy of all the primary shards. Therefore, if you set number_of_shards to 5 and number_of_replicas to 1, you will have 10 shards (5 primary shards and 5 replica shards). 

In OpenSearch, the following code creates an index called test with five shards and one replica.

```
PUT test
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```

Codec settings

Both OpenSearch and Solr use the best_speed codec (LZ4 compression algorithm) by default, and both offer best_compression (zlib compression algorithm) as an alternative. OpenSearch also offers the Zstandard compression algorithm (zstd and zstd_no_dict). Benchmarking for different compression codecs is also available. For more information, see Index codecs in the OpenSearch documentation.

Solr <codecFactory> configuration maps to the OpenSearch index.codec setting. Note that index.codec is a static setting: set it when you create the index, or close the index before you update it. For example:

```
PUT /my_index/_settings
{
  "index": {
    "codec": "best_compression"
  }
}
```

Commit settings

In OpenSearch, near real-time search is controlled by the refresh_interval setting. The default is 1 second, which is suitable for most use cases. To improve indexing speed and throughput, especially for batch indexing, we recommend that you increase refresh_interval to 30 or 60 seconds.

```
PUT /my_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```
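For one-time bulk loads, a common pattern is to disable refresh entirely and restore it after the load completes (the index name and restored interval are illustrative):

```
PUT /my_index/_settings
{
  "index": { "refresh_interval": "-1" }
}

# ...run the bulk load, then restore the refresh interval:
PUT /my_index/_settings
{
  "index": { "refresh_interval": "30s" }
}
```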

Cache and query configuration comparison

OpenSearch uses percentage-based memory allocation instead of entry count, so it provides more adaptive resource management than Solr as the index size changes. OpenSearch also supports tiered caches, where each tier in a multi-tier cache has its own characteristics and performance levels.

The maximum Boolean clause is a static setting in OpenSearch. You set it at node level by using the indices.query.bool.max_clause_count setting. 

OpenSearch simplifies many configurations by providing sensible defaults while still allowing fine-tuning through its cluster and index settings APIs. OpenSearch supports various cache configurations and types, such as shard request caches and node query caches.
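For example, the shard request cache can be enabled per index and requested per query. In this sketch, the index and field names are illustrative; by default, the request cache stores only size: 0 results, such as aggregations:

```
PUT /my_index/_settings
{
  "index.requests.cache.enable": true
}

GET /my_index/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "categories": { "terms": { "field": "category" } }
  }
}
```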

Additionally, in Amazon OpenSearch Service, the Auto-Tune feature uses performance and usage metrics from your OpenSearch cluster to suggest memory-related configuration changes, including queue and cache sizes and Java virtual machine (JVM) settings on your nodes. 

Index segments and merging

OpenSearch default settings for merge policies work well for most workload patterns, and explicit merge configurations are rarely needed.

Both Solr and OpenSearch strive to optimize the balance between indexing performance and search efficiency. The Solr ramBufferSizeMB and maxBufferedDocs settings are handled internally by OpenSearch, which also manages compound files automatically.

We recommend that you use the OpenSearch default merge policy settings as a starting point instead of copying settings from Solr. In most cases, you won't need to adjust these advanced settings unless you want to tune performance or you have unusual indexing patterns.  If tuning is required, as in the following example, see Index settings in the OpenSearch documentation.

```
PUT /my_index/_settings
{
  "index": {
    "merge.policy.segments_per_tier": 10,
    "merge.policy.max_merge_at_once": 10,
    "merge.scheduler.max_thread_count": 3,
    "merge.scheduler.max_merge_count": 7,
    "refresh_interval": "1s"
  }
}
```

Step 2. Migrate search functionality

Solr request handlers define how search requests are processed. In OpenSearch, all searches use the _search or _msearch endpoint. Most OpenSearch users prefer the _search API for its simplicity and clarity, whereas search templates are typically reserved for complex, reusable queries.

If you're accustomed to using the /sql request handler in Solr, you can use SQL syntax and the Piped Processing Language (PPL) for querying in OpenSearch.
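For example, the same query can be expressed in SQL or PPL through the SQL plugin endpoints (the index name is illustrative):

```
POST _plugins/_sql
{
  "query": "SELECT * FROM my_index LIMIT 10"
}

POST _plugins/_ppl
{
  "query": "source=my_index | head 10"
}
```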

OpenSearch supports spell checking, also known as Did-you-mean, and highlighting, during query time. You don't need to explicitly define search components. 
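The following sketch combines a term suggester (Did-you-mean) and highlighting in a single query (the index and field names are illustrative):

```
GET /my_index/_search
{
  "query": { "match": { "title": "opensearch" } },
  "highlight": { "fields": { "title": {} } },
  "suggest": {
    "did-you-mean": {
      "text": "opensaerch",
      "term": { "field": "title" }
    }
  }
}
```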

Most API responses are limited to JSON format, except for the compact and aligned text (CAT) API.

If you use the Velocity or XSLT response writer in Solr, you must manage it on the application layer in OpenSearch. 

When you migrate the Solr /select request handler to OpenSearch, you need to transform the XML-based configuration in Solr to a JSON-based search query in OpenSearch. You can also optionally convert the request handler to a search template. 

The following example shows how to transform a Solr /select handler configuration to OpenSearch. This handler specifies default search parameters, including results per page (rows: 10) and the default search field (df: _text_).

```xml
<!-- Request Handlers -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">_text_</str>
  </lst>
</requestHandler>
```

To migrate this handler, you first create the equivalent OpenSearch DSL query that matches the handler functionality. The following example demonstrates a basic search on the _text_ field with a size limit of 10.

```
GET /your_index/_search
{
  "size": 10,
  "query": {
    "match": {
      "_text_": "your search term"
    }
  }
}
```

As a best practice, we recommend that you use the OpenSearch _search API to search directly. If you need reusable queries, you can create a search template for the equivalent query. The following example shows how to convert the query into a template by using Mustache syntax with default parameters. The rows parameter in Solr maps to size (default: 10) in OpenSearch, and df maps to default_field (default: _text_). You can render the query by using the _render/template API or execute the template by using the _search/template API.

```
PUT _scripts/select_handler
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": "{{#size}}{{size}}{{/size}}{{^size}}10{{/size}}",
      "from": "{{#from}}{{from}}{{/from}}{{^from}}0{{/from}}",
      "query": {
        "query_string": {
          "query": "{{query_string}}",
          "default_field": "{{#df}}{{df}}{{/df}}{{^df}}_text_{{/df}}"
        }
      }
    }
  },
  "params": {
    "size": "Number of results to return (default: 10)",
    "from": "Starting offset for results (default: 0)",
    "query_string": "The search query string (required)",
    "df": "Default field to search (default: _text_)"
  }
}

# To render the query for this template, applications would call:
POST _render/template
{
  "id": "select_handler",
  "params": {
    "query_string": "your search terms"
  }
}
```
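To execute the stored template instead of only rendering it, applications can call the _search/template endpoint with the same parameters (the index name and parameter values are illustrative):

```
GET /your_index/_search/template
{
  "id": "select_handler",
  "params": {
    "query_string": "your search terms",
    "size": 20
  }
}
```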

Step 3. Migrate document processing pipelines

In Solr, update request processors (URPs) are essential for document transformation during indexing. In OpenSearch, this functionality is provided through ingest pipelines. For the updateRequestProcessorChain, OpenSearch provides the ingest pipeline APIs, which let you enrich or transform data before indexing. You can chain multiple processor stages to form a pipeline for data transformation. Processors include Grok, CSV, JSON, KV, Rename, Split, HTML strip, Drop, Script, and others. For more information, see Ingest processors in the OpenSearch documentation.
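For example, a minimal ingest pipeline that trims a field and is attached as an index's default pipeline might look like the following (the pipeline and index names are illustrative):

```
PUT _ingest/pipeline/trim_product_name
{
  "description": "Trim whitespace from product_name",
  "processors": [
    { "trim": { "field": "product_name" } }
  ]
}

PUT /my_index/_settings
{
  "index.default_pipeline": "trim_product_name"
}
```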

However, we strongly recommend that you perform data transformations in your extract, transform, load (ETL) layer to remove this processing complexity from OpenSearch. You can use Amazon OpenSearch Ingestion (OSI), which provides a framework and default processes for data transformation. OSI is built on OpenSearch Data Prepper, which is a server-side data collector that can filter, enrich, transform, normalize, and aggregate data for downstream analytics and visualization.  

OpenSearch also provides search pipelines, which are similar to ingest pipelines but tailored for search-time operations. Search pipelines make it easier for you to process search queries and search results within OpenSearch. As of OpenSearch version 3.2, available search processors include filter query, neural query enricher, normalization, rename field, script, and personalized search ranking, with more to come.
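For example, a search pipeline with a filter_query request processor appends a filter to every query that runs through it (the pipeline name and filter are illustrative):

```
PUT /_search/pipeline/visible_only
{
  "request_processors": [
    {
      "filter_query": {
        "query": { "term": { "visible": true } }
      }
    }
  ]
}

GET /my_index/_search?search_pipeline=visible_only
{
  "query": { "match_all": {} }
}
```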

The following example shows how to transform a Solr updateRequestProcessorChain to OpenSearch. This chain includes two processors: a built-in TrimFieldUpdateProcessorFactory for whitespace trimming and a custom CustomPriceTaxProcessorFactory for price calculations.

```xml
<!-- updateRequestProcessorChain -->
<updateRequestProcessorChain name="standard">
  <processor class="solr.TrimFieldUpdateProcessorFactory"/>
  <processor class="com.mycompany.CustomPriceTaxProcessorFactory"/>
</updateRequestProcessorChain>
```

The two migration options are Amazon OpenSearch Ingestion (recommended) or using a custom plugin.

Using Amazon OpenSearch Ingestion (recommended)

We recommend that you use Amazon OpenSearch Ingestion, because it provides:

  • Better separation of data transformation logic

  • Rich set of built-in processors

  • More efficient processing outside of OpenSearch

  • Easier maintenance and monitoring

  • Scalable data processing pipeline

To use this approach, you identify all the processors in the Solr chain and map them to OpenSearch Ingestion processors. Identify any custom processors and analyze the functionality to determine whether it can be achieved through a combination of OpenSearch processors. You can then set it as a default pipeline for an index.

The following example creates an Amazon OpenSearch Ingestion pipeline with two processors. The first is a trim processor that removes leading and trailing whitespace from the product_name field. The second is an AWS Lambda processor that handles price calculations by applying a 10 percent tax rate to the price field, replicating the functionality of the custom CustomPriceTaxProcessorFactory Solr processor. We do not recommend using Painless scripts for custom implementations, because they can affect performance.

First, create a Lambda function that implements the custom logic for price calculation:

```python
def handler(event, context):
    # Apply a 10% tax rate to the price field, if present
    if 'price' in event:
        price = float(event['price'])
        tax_rate = 0.10
        event['price'] = round(price * (1 + tax_rate), 2)
    return event
```

Then create an Amazon OpenSearch Ingestion pipeline:

```yaml
pipeline:
  source:
    file:
      path: "/full/path/to/logs_json.log"
      record_type: "event"
      format: "json"
  processor:
    - trim_string:
        with_keys:
          - "product_name"
    - aws_lambda:
        function_name: "calculateTax"
        invocation_type: "request-response"
        aws:
          region: "us-east-1"
  sink:
    - opensearch:
        hosts: ["https://your-opensearch-domain:443"]
        index: "my_index"
```

Using custom plugins

When you migrate custom plugins from Solr to OpenSearch, we recommend that you first evaluate OpenSearch native features and existing plugins to determine whether they provide the desired functionality before you consider custom development. This approach ensures optimal utilization of OpenSearch native capabilities while maintaining flexibility for custom requirements.

The preceding example follows this approach. It uses an Amazon OpenSearch Ingestion pipeline and implements a custom Lambda processor to migrate CustomPriceTaxProcessorFactory by using similar logic.

If native features don't meet your requirements, Amazon OpenSearch Service supports custom plugin development and deployment. For more information, see the Amazon OpenSearch Service documentation.