Migration flow
This section describes an iterative approach to migrating your Solr configuration to Amazon OpenSearch Service. We recommend that you approach the process systematically by organizing your migration into three focus areas:
- Settings: Translating Solr configuration to OpenSearch cluster or index settings
- Search functionality: Translating Solr search functionality to OpenSearch search functionality
- Document processing: Translating Solr document processing to Amazon OpenSearch Ingestion (OSI)
Step 1. Migrate settings
The components of your Solr configuration map to different Amazon OpenSearch Service features. The
following sections cover the migration of Solr shards and replicas, codecs, commits, caches
and queries, and index segments and merging. Amazon OpenSearch Service manages index storage locations
directly, so Solr dataDir settings don't need to be configured.
Before you migrate every configuration setting from Solr, assess whether the setting should be carried over at all, based on your experience with your current search system and on best practices. Some settings are configured at the cluster or node level rather than at the index level. These include the maximum number of clauses in a Boolean query, circuit breaker settings, and cache settings.
Instead of lifting and shifting existing configurations, take the time to evaluate whether each setting is still necessary and simplify complex configurations where possible. For example, a slow logs threshold of one second can generate a large volume of log entries and is worth revisiting. You might also want to review and reduce the maxBooleanClauses setting.
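If you decide to keep search slow logging after this review, you can set the thresholds dynamically through the index settings API. The following is a minimal sketch; the index name (my_index) and threshold values are illustrative assumptions, not recommendations.

PUT /my_index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}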
Shard and replica settings
Solr allows replica shards to be distributed across nodes for redundancy and fault tolerance, but it lacks node awareness during shard placement. This means that primary and replica shards might inadvertently be allocated to the same node, which compromises the intended high availability and benefits of replication.
Amazon OpenSearch Service ensures that primary and replica shards are not on the same node. When you enable zone awareness, Amazon OpenSearch Service makes a best effort to distribute primary shards and their corresponding replica shards to different Availability Zones. When you configure dedicated master nodes and zone awareness for standby replicas, Amazon OpenSearch Service ensures that primary and standby replica shards are placed in different Availability Zones, which provides stronger resilience guarantees.
The OpenSearch and Solr definitions of a replica are different. In OpenSearch, you
define a primary shard count by using the number_of_shards setting, which
determines the partitioning of your data. You then set a replica count by using the
number_of_replicas setting. Each replica is a copy of all the primary
shards. Therefore, if you set number_of_shards to 5 and
number_of_replicas to 1, you will have 10 shards (5 primary shards and 5
replica shards).
In OpenSearch, the following code creates an index called test with five shards and one replica.
PUT test
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
Codec settings
Both OpenSearch and Solr use the best_speed codec (LZ4 compression algorithm) by default. OpenSearch also supports best_compression (zlib compression algorithm) and, in more recent versions, zstd and zstd_no_dict. Benchmarking for the different compression codecs is also available. For more information, see Index codecs in the OpenSearch documentation.
Solr <codecFactory> configuration maps to OpenSearch
index.codec settings; for example:
PUT /my_index/_settings
{
  "index.codec": "best_compression"
}
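Because index.codec is a static index setting, it is usually set when the index is created; changing it on an existing index requires closing the index first. The following is a minimal sketch that assumes a new index named my_index and a recent OpenSearch version that supports the zstd_no_dict codec.

PUT /my_index
{
  "settings": {
    "index": {
      "codec": "zstd_no_dict"
    }
  }
}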
Commit settings
Near real-time search in OpenSearch is controlled by the
refresh_interval setting. The default is 1 second, which is suitable for most use
cases. To improve indexing speed and throughput, especially for batch indexing, we recommend that
you increase refresh_interval to 30 or 60 seconds.
PUT /my_index/_settings
{
  "refresh_interval": "30s"
}
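For one-time bulk loads, a common pattern (shown here as a sketch, not a requirement) is to disable refresh entirely during the load and restore the interval afterward.

PUT /my_index/_settings
{
  "refresh_interval": "-1"
}

# After the bulk load completes, restore the interval.
PUT /my_index/_settings
{
  "refresh_interval": "30s"
}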
Cache and query configuration comparison
OpenSearch uses percentage-based memory allocation instead of entry counts, so it
provides more adaptive resource management than Solr as the index size changes. OpenSearch
also supports tiered caches.
The maximum Boolean clause limit in Solr maps to the OpenSearch indices.query.bool.max_clause_count setting.
OpenSearch simplifies many configurations by providing sensible defaults while still allowing fine-tuning through its cluster and index settings APIs. OpenSearch supports various cache configurations and types, such as shard request caches and node query caches.
Additionally, in Amazon OpenSearch Service, the Auto-Tune feature uses performance and usage metrics from your OpenSearch cluster to suggest memory-related configuration changes, including queue and cache sizes and Java virtual machine (JVM) settings on your nodes.
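If you do want to control caching explicitly, the shard request cache can be toggled per index and per request. The following is a minimal sketch that assumes an index named my_index with a category field.

PUT /my_index/_settings
{
  "index.requests.cache.enable": true
}

# Opt a specific request in to the request cache (only size:0 requests are cached by default).
GET /my_index/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "popular_categories": {
      "terms": { "field": "category" }
    }
  }
}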
Index segments and merging
OpenSearch default settings for merge policies work well for most workload patterns, and explicit merge configurations are rarely needed.
Both Solr and OpenSearch strive to optimize the balance between indexing performance
and search efficiency. The Solr ramBufferSizeMB and
maxBufferedDocs settings are handled internally by
OpenSearch. OpenSearch manages compound files automatically.
We recommend that you use the OpenSearch default merge policy settings as a starting
point instead of copying settings from Solr. In most cases, you won't need to adjust these
advanced settings unless you want to tune performance or you have unusual indexing
patterns. If tuning is required, as in the following example, see Index settings in the OpenSearch documentation.
PUT /my_index/_settings
{
  "index": {
    "merge.policy.segments_per_tier": 10,
    "merge.policy.max_merge_at_once": 10,
    "merge.scheduler.max_thread_count": 3,
    "merge.scheduler.max_merge_count": 7,
    "refresh_interval": "1s"
  }
}
Step 2. Migrate search functionality
Solr request handlers define how search requests are processed. In OpenSearch, all
searches use the _search or _msearch endpoint. As a best practice,
most OpenSearch users prefer the _search API for its simplicity and clarity,
whereas search templates are typically reserved for complex, reusable queries.
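The _msearch endpoint bundles multiple searches into a single newline-delimited JSON request. The following is a minimal sketch; the index names and queries are illustrative assumptions.

GET /_msearch
{ "index": "products" }
{ "query": { "match": { "title": "laptop" } }, "size": 5 }
{ "index": "reviews" }
{ "query": { "match_all": {} }, "size": 5 }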
If you're accustomed to using the /sql request handler in Solr, you
can use SQL syntax in OpenSearch through the SQL plugin.
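For example, the following sketch queries an assumed products index through the SQL endpoint.

POST /_plugins/_sql
{
  "query": "SELECT title, price FROM products WHERE price > 100 ORDER BY price DESC LIMIT 10"
}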
OpenSearch supports spell checking, also known as did-you-mean functionality, through suggesters.
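The following is a minimal sketch that uses a term suggester; the index name, field name, and misspelled input are assumptions.

GET /products/_search
{
  "suggest": {
    "did_you_mean": {
      "text": "lapptop",
      "term": {
        "field": "title"
      }
    }
  }
}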
Most API responses are limited to JSON format; the exception is the compact and aligned text
(CAT) API, which returns plain-text, tabular output.
If you use the Velocity or XSLT response writer in Solr, you must manage it on the application layer in OpenSearch.
When you migrate the Solr /select request handler to OpenSearch, you need
to transform the XML-based configuration in Solr to a JSON-based search query in OpenSearch.
You can also optionally convert the request handler to a search template.
The following example shows how to transform a Solr /select handler
configuration to OpenSearch. This handler specifies default search parameters, including
results per page (rows: 10) and default search field (df:
_text_).
<!-- Request Handlers -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">_text_</str>
  </lst>
</requestHandler>
To migrate this handler, you first create the equivalent OpenSearch DSL query that
matches the handler functionality. The following example demonstrates a basic search on
the _text_ field with a size limit of 10.
GET /your_index/_search
{
  "size": 10,
  "query": {
    "match": {
      "_text_": "your search term"
    }
  }
}
As a best practice, we recommend that you use the OpenSearch _search API to
search directly. If you need reusable queries, you can create a search template for the
equivalent query. The following example shows how to convert the query into a template by
using Mustache syntax with default parameters. The rows parameter in Solr
maps to size (default: 10) in OpenSearch, and df maps to
default_field (default: _text_). You can render the query by using
the _render/template API or execute the template by using
the _search/template API.
PUT _scripts/select_handler
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": "{{#size}}{{size}}{{/size}}{{^size}}10{{/size}}",
      "from": "{{#from}}{{from}}{{/from}}{{^from}}0{{/from}}",
      "query": {
        "query_string": {
          "query": "{{query_string}}",
          "default_field": "{{#df}}{{df}}{{/df}}{{^df}}_text_{{/df}}"
        }
      }
    }
  },
  "params": {
    "size": "Number of results to return (default: 10)",
    "from": "Starting offset for results (default: 0)",
    "query_string": "The search query string (required)",
    "df": "Default field to search (default: _text_)"
  }
}

# To render the query for this template, applications would call:
POST _render/template
{
  "id": "select_handler",
  "params": {
    "query_string": "your search terms"
  }
}
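To run the stored template directly, call the _search/template API. This sketch assumes the your_index index and the template ID from the preceding example.

GET /your_index/_search/template
{
  "id": "select_handler",
  "params": {
    "query_string": "your search terms",
    "size": 20
  }
}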
Step 3. Migrate document processing pipelines
In Solr, update request processors (URPs) are essential for document transformation
during indexing. In OpenSearch, this functionality is provided through ingest pipelines.
Instead of an updateRequestProcessorChain, OpenSearch provides the ingest pipeline
APIs for defining processors that transform documents before they are indexed.
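As an illustration only (the pipeline name and field are assumptions), an ingest pipeline with a built-in trim processor looks like this.

PUT _ingest/pipeline/standard
{
  "description": "Trim whitespace from the product name",
  "processors": [
    {
      "trim": {
        "field": "product_name"
      }
    }
  ]
}

# Reference the pipeline when you index a document.
PUT /my_index/_doc/1?pipeline=standard
{
  "product_name": "  wireless keyboard  "
}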
However, we strongly recommend that you perform data
transformations in your extract, transform, load (ETL) layer to remove this processing
complexity from OpenSearch. You can use Amazon OpenSearch
Ingestion (OSI), which provides a framework and default processes for data
transformation. OSI is built on OpenSearch Data Prepper.
OpenSearch also provides search pipelines, which process search requests and responses inside OpenSearch.
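The following is a minimal sketch of a search pipeline that adds a filter to every query; the pipeline name, field, and value are assumptions.

PUT /_search/pipeline/hide_unavailable
{
  "request_processors": [
    {
      "filter_query": {
        "description": "Exclude items that are out of stock",
        "query": {
          "term": { "in_stock": true }
        }
      }
    }
  ]
}

# Apply the pipeline to a search request.
GET /my_index/_search?search_pipeline=hide_unavailable
{
  "query": { "match_all": {} }
}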
The following example shows how to transform a
Solr updateRequestProcessorChain to OpenSearch. This chain includes two
processors: a built-in TrimFieldUpdateProcessorFactory for whitespace trimming
and a custom CustomPriceTaxProcessorFactory for price calculations.
<!-- updateRequestProcessorChain -->
<updateRequestProcessorChain name="standard">
  <processor class="solr.TrimFieldUpdateProcessorFactory"/>
  <processor class="com.mycompany.CustomPriceTaxProcessorFactory"/>
</updateRequestProcessorChain>
The two migration options are Amazon OpenSearch Ingestion (recommended) and custom plugins.
Using Amazon OpenSearch Ingestion (recommended)
We recommend that you use Amazon OpenSearch Ingestion, because it provides:
- Better separation of data transformation logic
- Rich set of built-in processors
- More efficient processing outside of OpenSearch
- Easier maintenance and monitoring
- Scalable data processing pipeline
To use this approach, you identify all the processors in the Solr chain and map them
to OpenSearch Ingestion processors.
The following example creates an Amazon OpenSearch Ingestion pipeline with two
processors. The first is a trim processor that removes leading and trailing
whitespace from the product_name field. The second is an AWS Lambda
processor that handles price calculations by applying a 10% tax rate to the price field.
This is similar to the CustomPriceTaxProcessorFactory functionality in the
custom Solr processor. We do not recommend using Painless scripts for custom
implementations, because they can affect performance.
First, create a Lambda function that implements the custom logic for price calculation:
def handler(event, context):
    # Apply a 10% tax to the price field, if present.
    if 'price' in event:
        price = float(event['price'])
        tax_rate = 0.10
        event['price'] = round(price * (1 + tax_rate), 2)
    return event
Then create an Amazon OpenSearch Ingestion pipeline:
pipeline:
  source:
    file:
      path: "/full/path/to/logs_json.log"
      record_type: "event"
      format: "json"
  processor:
    - trim_string:
        with_keys:
          - "product_name"
    - aws_lambda:
        function_name: "calculateTax"
        invocation_type: "request-response"
        aws:
          region: "us-east-1"
  sink:
    - opensearch:
        hosts: ["https://your-opensearch-domain:443"]
        index: "my_index"
Using custom plugins
When you migrate custom plugins from Solr to OpenSearch, we recommend that you first evaluate OpenSearch native features and existing plugins to determine whether they provide the desired functionality before you consider custom development. This approach ensures optimal utilization of OpenSearch native capabilities while maintaining flexibility for custom requirements.
The preceding example follows this approach. It uses an Amazon OpenSearch Ingestion
pipeline and implements a custom Lambda processor to migrate
CustomPriceTaxProcessorFactory by using similar logic.
If native features don't meet your requirements, Amazon OpenSearch Service supports custom plugin development and deployment. For more information, see the Amazon OpenSearch Service documentation.