Broadleaf Microservices
  • v1.0.0-latest-prod

Search Fields and Solr

Introduction

This document will discuss the search field domain (represented by the BLC_FIELD table and FieldDefinition class) and explain how this domain is used at indexing/search time.

To help keep this document concise, we’ll look at a narrowed view of the domain and only how it relates to indexing and searching. Other concepts, such as translations, combined fields, rules, display order, etc. will not be discussed in this document.

Field Definition Anatomy

  1. indexableType - The indexable type this field belongs to, e.g. PRODUCT, ORDER, CUSTOMER, etc.

  2. propertyPath - Used at index time. The JSONPath to retrieve the value of this field at index time.

  3. abbreviation - Used at search time. The value used when referring to this field during a request/response. For example, "price", "name", etc.

  4. variants - Complex object(s) used at both search and index time. Each variant represents how a value will be indexed in Solr. The types range from primitive (string, text, integer) to specialized (spell check, email, date, etc.) and correspond to fields defined in the Solr schema. See the Variants section for more information.

  5. faceted - Used at search time. Whether this field is able to be faceted on or not.

  6. facet - Complex object used at search time. If a facet is requested at search time for this field, the fields in this class describe how the facet will behave.

  7. facetVariantType - Used at search time. Designates which variant will be used when faceting.

  8. sortVariantType - Used at search time. Designates which variant will be used when sorting on this field.

  9. searchable - Used at search time. Whether this field will be searched on when a search request is performed.

  10. fieldQueries - Used at search time. Describes how this field will be searched upon. See the Field Queries section for more information.
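
To make the anatomy above concrete, here is a minimal sketch of how one field definition might look as plain data. The class FieldDefinitionSketch, its constructor, and the sample values (including the "$.name" JSONPath) are hypothetical illustrations, not the real FieldDefinition class, and the complex facet and fieldQueries objects are left out for brevity.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a field definition.
// This is NOT the real FieldDefinition class; names and values are assumptions.
final class FieldDefinitionSketch {
    final String indexableType;      // e.g. PRODUCT, ORDER, CUSTOMER
    final String propertyPath;       // JSONPath evaluated at index time
    final String abbreviation;       // name used in requests/responses
    final List<String> variantTypes; // one entry per variant, by field type name
    final boolean faceted;           // whether the field can be faceted on
    final String facetVariantType;   // variant used when faceting
    final String sortVariantType;    // variant used when sorting
    final boolean searchable;        // whether the field is searched on

    FieldDefinitionSketch(String indexableType, String propertyPath, String abbreviation,
            List<String> variantTypes, boolean faceted, String facetVariantType,
            String sortVariantType, boolean searchable) {
        this.indexableType = indexableType;
        this.propertyPath = propertyPath;
        this.abbreviation = abbreviation;
        this.variantTypes = variantTypes;
        this.faceted = faceted;
        this.facetVariantType = facetVariantType;
        this.sortVariantType = sortVariantType;
        this.searchable = searchable;
    }

    // Sample definition for a product's name (all values are illustrative).
    static FieldDefinitionSketch productName() {
        return new FieldDefinitionSketch("PRODUCT", "$.name", "name",
                List.of("STRING", "STRING_LOWERCASE"),
                true, "STRING", "STRING_LOWERCASE", true);
    }

    public static void main(String[] args) {
        FieldDefinitionSketch field = productName();
        System.out.println(field.abbreviation + " <- " + field.propertyPath
                + " " + field.variantTypes);
    }
}
```

In this sketch the facet and sort variant types both name entries in the variant list, which reflects the idea that faceting and sorting each pick one of the field's indexed variants.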

Variants

Each variant, represented by the FieldVariant class, maps one-to-one to a field defined in the Solr schema. FieldVariant has a type property, which is an instance of FieldType. Let’s look at an excerpt from that class:

public static final FieldType CURRENCY = new FieldType("CURRENCY");
public static final FieldType MONEY = new FieldType("MONEY");
public static final FieldType INTEGER = new FieldType("INTEGER");
public static final FieldType INTEGERS = new FieldType("INTEGERS", true);
public static final FieldType LONG = new FieldType("LONG");
public static final FieldType LONGS = new FieldType("LONGS", true);
public static final FieldType DOUBLE = new FieldType("DOUBLE");
public static final FieldType DOUBLES = new FieldType("DOUBLES", true);
public static final FieldType STRING = new FieldType("STRING");
public static final FieldType STRINGS = new FieldType("STRINGS", true);

//...

public static final FieldType SPELL_CHECK = new FieldType("SPELL_CHECK");
public static final FieldType SPELL_CHECKS = new FieldType("SPELL_CHECKS", true);
public static final FieldType EMAIL = new FieldType("EMAIL");
public static final FieldType EMAILS = new FieldType("EMAILS", true);

Note that we have primitives, both single-valued and multi-valued. In addition, we have special types such as SPELL_CHECK and EMAIL. These types of fields are handled differently in the search engine.

These fields are mapped to specific Solr fields in the class com.broadleafcommerce.search.provider.solr.SolrFieldTypeConverter.

Let’s take a look at a few of these fields:

    public SolrFieldTypeConverter() {
        addFieldType(DefaultFieldType.CURRENCY.getType(), "currency");
        addFieldType(DefaultFieldType.MONEY.getType(), "money");
        addFieldType(DefaultFieldType.INTEGER.getType(), "i");
        addFieldType(DefaultFieldType.INTEGERS.getType(), "is");
        addFieldType(DefaultFieldType.LONG.getType(), "l");
        addFieldType(DefaultFieldType.LONGS.getType(), "ls");
        addFieldType(DefaultFieldType.DOUBLE.getType(), "d");
        addFieldType(DefaultFieldType.DOUBLES.getType(), "ds");
        addFieldType(DefaultFieldType.STRING.getType(), "s");
        addFieldType(DefaultFieldType.STRINGS.getType(), "ss");
        addFieldType(DefaultFieldType.STRING_LOWERCASE.getType(), "lower");
        addFieldType(DefaultFieldType.STRINGS_LOWERCASE.getType(), "lowers");
        addFieldType(DefaultFieldType.EMAIL.getType(), "email");

        ...
    }

Notice that we are referencing the field types mentioned earlier, and that each field type is associated with a string. Let’s look specifically at the lower and lowers entries. These indicate that a field will be indexed and searched as a lowercase field.

The lower and lowers strings actually refer to dynamic fields that are defined in the Solr schema. This code essentially says to associate DefaultFieldType.STRING_LOWERCASE with the Solr dynamic field *_lower. We can see that field defined in the Solr schema.xml file:

    <dynamicField name="*_lower" type="lowercase" indexed="true" stored="true"/>
    <dynamicField name="*_lowers" type="lowercase" indexed="true" stored="true"/>

    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

If we look in the Solr schema, we’ll also see dynamic fields for the other field types in SolrFieldTypeConverter, including *_currency, *_money, *_i, etc.
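
The relationship between a field type, its suffix, and a dynamic field can be sketched as follows. This is a simplified illustration, not the real SolrFieldTypeConverter: the class SolrSuffixSketch and its methods are hypothetical, the map is keyed by type name as an assumption, and the "<prefix>_<suffix>" naming is assumed only because it is the simplest name that matches a dynamic field pattern such as *_lower. The analyzeLowercase method approximates what the lowercase field type does: KeywordTokenizerFactory keeps the whole value as a single token and LowerCaseFilterFactory lowercases it.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the real SolrFieldTypeConverter.
final class SolrSuffixSketch {
    private final Map<String, String> suffixByType = new LinkedHashMap<>();

    SolrSuffixSketch() {
        // Suffixes copied from the SolrFieldTypeConverter excerpt above.
        // Keying by the type's name is an assumption made for this sketch.
        suffixByType.put("INTEGER", "i");
        suffixByType.put("STRING", "s");
        suffixByType.put("STRINGS", "ss");
        suffixByType.put("STRING_LOWERCASE", "lower");
        suffixByType.put("STRINGS_LOWERCASE", "lowers");
    }

    // Assumption: the Solr field name is "<prefix>_<suffix>", so that it
    // matches the dynamic field pattern "*_<suffix>" in the schema.
    String solrFieldName(String prefix, String fieldType) {
        String suffix = suffixByType.get(fieldType);
        if (suffix == null) {
            throw new IllegalArgumentException("No suffix registered for " + fieldType);
        }
        return prefix + "_" + suffix;
    }

    // Approximates the "lowercase" analyzer chain: the value stays one token
    // (keyword tokenizer) and is lowercased (lowercase filter).
    static String analyzeLowercase(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        SolrSuffixSketch sketch = new SolrSuffixSketch();
        System.out.println(sketch.solrFieldName("name", "STRING_LOWERCASE"));
        System.out.println(analyzeLowercase("Hot Sauce"));
    }
}
```

Because the whole value is kept as one token, a field indexed this way supports case-insensitive matching on the full value rather than on individual words.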

Important
When using phonetic matching, make sure that the language(s) your applications use are supported by the configured search engine (e.g. Solr) and its analyzer; otherwise, search may behave poorly (e.g. irrelevant data being matched). Out of the box, Broadleaf uses Apache Solr with the Beider-Morse Phonetic Matching (BMPM) analyzer for phonetic matching. For the other analyzers and supported languages in Apache Solr, see Solr’s Phonetic Matching Reference Guide.

Field Queries

Field queries are a search-time concept. Solr allows multiple queries to be performed simultaneously in a single request. Out of the box, we use this capability to perform multiple queries, utilizing different fields and boost values.

This is achieved through Solr’s pseudo-field (field aliasing) feature. By default, there are three types of queries performed:

  1. Word

  2. Phrase

  3. Phrase exact

These three query types are sent to Solr as follows:

The q param: q=word:(my search)^2.0 || phrase_exact:("my search")^4.0 || phrase:("my search"~2)^3.0

The query fields: f.phrase.qf=some_field_s&f.word.qf=other_field_s&f.phrase_exact.qf=some_third_field_s

Using this feature, we can search across different fields with different boost values, plus, in the case of the phrase query, different phrase slop values.
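
As a rough illustration of how such a request could be assembled, the sketch below builds the q value and the alias parameters. The class FieldQuerySketch and its methods are hypothetical and are not Broadleaf's implementation. The boost and slop values are taken from the example request above, each alias is mapped to its fields with Solr's f.<alias>.qf parameter, and escaping of special query characters and URL encoding are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of assembling the request; not Broadleaf's code.
final class FieldQuerySketch {

    // Builds the q value: one clause per query type, each with its own boost.
    // Boosts (2.0, 4.0, 3.0) and phrase slop (~2) mirror the example above.
    // Note: the user input is not escaped here, which real code would need.
    static String buildQ(String userInput) {
        String terms = userInput.trim();
        return "word:(" + terms + ")^2.0"
                + " || phrase_exact:(\"" + terms + "\")^4.0"
                + " || phrase:(\"" + terms + "\"~2)^3.0";
    }

    // Builds one "f.<alias>.qf" parameter per alias, listing the Solr fields
    // that alias expands to (space separated).
    static Map<String, String> aliasParams(Map<String, List<String>> fieldsByAlias) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : fieldsByAlias.entrySet()) {
            params.put("f." + entry.getKey() + ".qf", String.join(" ", entry.getValue()));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fieldsByAlias = new LinkedHashMap<>();
        fieldsByAlias.put("phrase", List.of("some_field_s"));
        fieldsByAlias.put("word", List.of("other_field_s"));
        fieldsByAlias.put("phrase_exact", List.of("some_third_field_s"));

        System.out.println("q=" + buildQ("my search"));
        aliasParams(fieldsByAlias).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Keeping the alias-to-field mapping separate from the q string means the set of fields behind each query type can change without touching the query itself.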