Solr schema

The schema for a Solr collection is defined in schema.xml, which contains definitions for fields and field types. The schema consists of the following elements: fieldType, field, copyField, dynamicField, similarity, and uniqueKey. The following sections describe and provide examples of schema elements. For an example of the full schema, see the appendix. For more information, see Schema Elements in the Solr documentation.

You can manage the schema in two ways: through the Schema API (a managed schema) or by editing a schema file directly (the classic, file-based approach). When you use the Schema API to change the schema, Solr automatically saves the changes to a file named managed-schema.xml. With the file-based approach, you define the schema in schema.xml and edit that file directly, then reload the collection for your changes to take effect.
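
The schemaFactory element in solrconfig.xml, not the schema itself, sets which approach a collection uses. The following sketch shows the two alternatives; declare only one of them in a given solrconfig.xml. The mutable and managedSchemaResourceName values shown are typical settings, so check the defaults for your Solr version:

    <!-- In solrconfig.xml: managed schema, editable through the Schema API -->
    <schemaFactory class="ManagedIndexSchemaFactory">
        <bool name="mutable">true</bool>
        <str name="managedSchemaResourceName">managed-schema.xml</str>
    </schemaFactory>

    <!-- Alternative in solrconfig.xml: classic schema.xml that you edit by hand -->
    <schemaFactory class="ClassicIndexSchemaFactory"/>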

Field types

Field types define the analysis that occurs when you index data or send queries to your index. You define field types by using the fieldType element, which specifies the field type name, the implementing class, and any additional properties. For text fields, you specify analyzers as child elements of fieldType to define the text analysis process. You can define separate analyzers for index time and query time by setting the analyzer's type attribute to index or query. Alternatively, you can define a single analyzer without a type attribute that applies to both.

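An analyzer consists of exactly one tokenizer and, optionally, any number of character filters and token filters:

Tokenizers break field data into lexical units or tokens.
Filters examine the token stream and keep, transform, or discard tokens based on the filter type. Filters run after the tokenizer, in the order in which you declare them.
CharFilters add, change, or remove characters before tokenization while preserving original character offsets to support features such as highlighting.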
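
As a sketch of how these components fit together, the following field type strips HTML markup with a character filter, tokenizes the cleaned text, and lowercases the tokens. The name text_html is hypothetical. Because the analyzer has no type attribute, the same chain applies at index time and query time:

    <fieldType name="text_html" class="solr.TextField">
        <analyzer>
            <!-- 1. Character filter: removes HTML markup before tokenization -->
            <charFilter class="solr.HTMLStripCharFilterFactory"/>
            <!-- 2. Tokenizer: splits the cleaned text into tokens -->
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <!-- 3. Token filter: lowercases each token -->
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>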

In the following example, a field type named text_general is defined as a text field (TextField class) with separate analyzers for indexing and querying. At index time, it uses a standard tokenizer to split text into words. Filters then convert tokens to lowercase, remove common stop words, and generate n-grams (substrings of 2 to 15 characters) for partial-word matching. At query time, it applies the same tokenizer, lowercasing, and stop-word removal but doesn't generate n-grams, so each query term is matched against the n-grams stored in the index. These steps enable case-insensitive search and partial-word matching, which is useful for autocomplete.


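    <fieldType name="text_general" class="solr.TextField">
        <!-- Index time: tokenize, lowercase, remove stop words, then generate n-grams -->
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
            <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
        </analyzer>
        <!-- Query time: same steps in the same order, without n-gram generation.
             Lowercasing before stop-word removal keeps both analyzers consistent. -->
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        </analyzer>
    </fieldType>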
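
NGramFilterFactory generates n-grams starting at every position in a token, so it matches any substring. If you only need prefix matching, which is typical for search-as-you-type, a common alternative is EdgeNGramFilterFactory. That filter generates n-grams only from the start of each token. The following is a sketch; the field type name text_prefix is hypothetical:

    <fieldType name="text_prefix" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <!-- For example, "laptop" is indexed as la, lap, lapt, lapto, laptop -->
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>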

If the built-in field types and analysis components don't meet your data's needs, you can create your own custom field type or implement your own analysis components, such as tokenizers and filters.

In the following example, a field type named custom_text_general uses the TextField class. This field type specifies a custom analysis configuration for the indexing phase by using a custom tokenizer implementation (CustomTokenizerFactory). The custom tokenizer, which is implemented in the com.mycompany package, provides customized logic for breaking text into tokens during document indexing.


    <fieldType name="custom_text_general" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="com.mycompany.CustomTokenizerFactory"/>
        </analyzer>
    </fieldType>

This configuration shows how you can extend Solr with custom analysis components when the standard tokenizers and filters don't meet your text processing requirements.
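
Solr must also be able to load the class that a custom component references. In Solr versions that support it, one way to do this is a lib directive in solrconfig.xml that points to the directory containing your JAR. The directory path and JAR name pattern below are hypothetical. Some newer Solr releases deprecate the lib directive, so check the documentation for your Solr version:

    <!-- In solrconfig.xml, not in the schema -->
    <!-- Hypothetical directory containing the JAR with com.mycompany.CustomTokenizerFactory -->
    <lib dir="/opt/solr/custom-lib" regex="mycompany-analysis-.*\.jar"/>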

Fields

You define fields by using the field element, which specifies the field name and type. Each field must reference a defined field type: the value of the field's type attribute must match the name attribute of a fieldType definition. Fields can have additional properties such as indexed, stored, required, and multiValued.

The following example defines a field named title that uses the text_general field type. This field is configured to be both indexed (searchable) and stored (retrievable in search results).


<field name="title" type="text_general" indexed="true" stored="true"/>
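
The following sketch shows two more field definitions that use other common properties. The string and pfloat field types and the price field are assumptions for illustration; they map to Solr's StrField and FloatPointField classes:

    <!-- Assumed field types: a non-analyzed string and a numeric float -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>

    <!-- A product can belong to several categories, so the field accepts multiple values -->
    <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
    <!-- docValues support efficient sorting and faceting on price -->
    <field name="price" type="pfloat" indexed="true" stored="true" docValues="true"/>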

Copying fields

If you want to interpret the same incoming data in more than one way, use the copyField directive. Solr copies the original value of a source field into a destination field before any analysis occurs, so the destination field's own field type determines how the copied data is indexed.

In the following example, copyField directives consolidate multiple source fields into a single destination field named text.


    <copyField source="title" dest="text"/>
    <copyField source="description" dest="text"/>
    <copyField source="brand" dest="text"/>
    <copyField source="category" dest="text"/>

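This configuration lets you search title, description, brand, and category data through a single field, for example with a query such as text:wireless, instead of querying each field separately. The destination field must itself be defined in the schema. Because it receives values from several source fields, it must be multiValued.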
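
The following is a minimal sketch of the destination field definition that these copyField directives rely on. It also adds a wildcard copyField that copies every matching attribute field into text; this assumes the attr_* dynamic field described in the Dynamic fields section. Setting stored="false" is a common design choice for catch-all fields, not a requirement:

    <!-- multiValued: receives values from several source fields.
         stored="false": the original values remain available in the source fields. -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

    <!-- A wildcard source copies every field whose name matches, such as attr_color -->
    <copyField source="attr_*" dest="text"/>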

Dynamic fields

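Dynamic fields allow Solr to index fields that you did not explicitly define in your schema. A dynamic field's name is a pattern that begins or ends with a * wildcard. Any incoming field whose name matches the pattern takes on the dynamic field's type and properties.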

This example defines a dynamic field pattern named attr_*, which uses the text_general field type:


<dynamicField name="attr_*" type="text_general" indexed="true" stored="true"/>

This dynamic field configuration automatically creates new fields at indexing time for any field name that starts with attr_. It enables flexible storage and searching of varying product attributes without requiring predefined field definitions in the schema.
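
To illustrate, the following hypothetical update request uses Solr's XML update message format. It sends a document with two attribute fields that aren't declared in the schema. Both names match the attr_* pattern, so Solr indexes them with the text_general field type. The product_id and field values are illustrative only:

    <add>
        <doc>
            <field name="product_id">P100</field>
            <field name="title">Wireless mouse</field>
            <!-- Not declared explicitly; both match the attr_* dynamic field -->
            <field name="attr_color">black</field>
            <field name="attr_material">plastic</field>
        </doc>
    </add>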

Unique keys

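The uniqueKey element specifies the field that serves as a unique identifier for documents. A unique key isn't mandatory, but most applications define one so that they can update documents in the index. When you add a document whose unique key value matches an existing document, Solr replaces the existing document instead of adding a duplicate.

In this example, product_id serves as the unique key, so you can update a document by sending a new version with the same product_id: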


<uniqueKey>product_id</uniqueKey>
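
The field named in uniqueKey must also be declared as a regular field. The following sketch assumes a fieldType named string that uses the non-analyzed solr.StrField class, a common choice for identifiers. The second part is a hypothetical XML update message. If a document with product_id P100 already exists, Solr replaces it with this version, so any fields omitted from the new document are no longer present:

    <!-- Single-valued, required identifier; assumes a "string" field type (solr.StrField) -->
    <field name="product_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

    <!-- Sending a document with an existing product_id replaces the earlier version -->
    <add>
        <doc>
            <field name="product_id">P100</field>
            <field name="title">Wireless mouse (2nd generation)</field>
        </doc>
    </add>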

Similarity

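The similarity element specifies the class that Solr uses to score documents. You can declare it globally, as a top-level element of the schema, or inside a fieldType definition to apply it only to fields of that type. The following example sets BM25 as the global similarity: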


<similarity class="org.apache.lucene.search.similarities.BM25Similarity"/>
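
Solr honors a per-fieldType similarity only when the global similarity is solr.SchemaSimilarityFactory, which Solr uses implicitly when no global similarity is declared. This sketch therefore can't be combined with the global BM25Similarity declaration shown above. The field type name text_short and the k1 and b values are illustrative assumptions; a lower b reduces length normalization, which can suit short fields such as titles:

    <!-- Global similarity that delegates to per-fieldType similarities -->
    <similarity class="solr.SchemaSimilarityFactory"/>

    <fieldType name="text_short" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <!-- BM25 scoring for this field type only, with illustrative parameters -->
        <similarity class="solr.BM25SimilarityFactory">
            <float name="k1">1.2</float>
            <float name="b">0.3</float>
        </similarity>
    </fieldType>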
