Most e-commerce search engines rely on signals such as product popularity, product rating, recency and click-through rate to influence the result set for a user's search query.
Data suggests that search pages have a significantly higher probability of customer engagement when additional factors are used to re-arrange the result set than when results are served using the pure SOLR relevance score alone.
This blog assumes that readers have prior knowledge of the following:
> Querying SOLR for content
> Query time boosting supported by SOLR
> Function queries in SOLR
> ValueSource parsers in SOLR
There are multiple ways in which this can be achieved. Some of the most commonly used approaches are:
> Using the LTR model exposed by SOLR
> Supplying a boost function / query to the search query
More information on leveraging LTR can be found in the SOLR documentation (reference: https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html)
Benefits and drawbacks of the LTR approach:
> Easy to use and deploy
> SOLR provides options for feature engineering if there is no custom model available.
> Top N documents from the search query are considered for re-ranking.
> Most of the LTR models require the parameters to be normalised which may result in an additional SOLR query.
In this blog, I will focus on boost functions and custom functions to achieve result re-ranking.
Given an input search term, there are multiple factors which can be considered. They can mainly be classified as:
> Search term specific metrics
> Search term independent metrics
> Product metadata
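Boosts in SOLR
SOLR supports a variety of boosts, as follows:
> Boost by query with an additive boost (bq)
> Boost by function with an additive boost (bf)
> Boost by function with a multiplicative boost (boost)
Generally, multiplicative boosts are preferred over additive ones because their effect is more predictable: they scale the relevance score proportionally instead of adding a fixed amount whose impact depends on the magnitude of the relevance score. The right choice still depends on the use case at hand.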
If a set of product IDs to boost for a particular search term is known ahead of time, they can be supplied as a boost to the SOLR query by passing them along with bq. For example,
q=iphone&bq=productId:(101^10 102^9 103^8 104^7)
provides an additive boost to the four IDs.
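When the IDs and their boosts come from an offline job or a lookup table, the bq clause has to be assembled per search term. The following is a minimal sketch of that step, not a definitive implementation: the class name BoostQueryBuilder and its build method are hypothetical, and only the productId field and the boost values come from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a bq clause boosting specific product IDs. */
final class BoostQueryBuilder {

    /**
     * Returns a clause such as productId:(101^10 102^9).
     * Iteration order of the map decides the order of the terms.
     */
    static String build(String field, Map<String, Integer> boostsById) {
        StringJoiner clause = new StringJoiner(" ", field + ":(", ")");
        for (Map.Entry<String, Integer> entry : boostsById.entrySet()) {
            clause.add(entry.getKey() + "^" + entry.getValue());
        }
        return clause.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is stable
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("101", 10);
        boosts.put("102", 9);
        boosts.put("103", 8);
        boosts.put("104", 7);
        System.out.println("q=iphone&bq=" + build("productId", boosts));
    }
}
```

The clause still needs to be URL-encoded before it is sent over HTTP, and an empty map produces an empty group, so callers would typically skip bq altogether in that case.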
Applying boost using BQ
Let’s consider a search term q=iphone
When I apply a boost to a product ID with bq=id:PRC-60001-00424-00002^1.0, I get the following debug information:
7.7611732 = sum of:
2.1149852 = weight(name:iphone in 257987) [SchemaSimilarity], result of:
2.1149852 = score(freq=1.0), computed as boost * idf * tf from:
2.956811 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
10512 = n, number of documents containing term
202223 = N, total number of documents with field
0.7152927 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
1.0 = freq, occurrences of term within document
1.2 = k1, term saturation parameter
0.75 = b, length normalization parameter
1.0 = dl, length of field
9.1809435 = avgdl, average length of field
5.646188 = weight(id:PRC-60001-00424-00002 in 257987) [SchemaSimilarity], result of:
5.646188 = score(freq=1.0), computed as boost * idf * tf from:
12.421614 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
1 = n, number of documents containing term
372159 = N, total number of documents with field
0.45454544 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
1.0 = freq, occurrences of term within document
1.2 = k1, term saturation parameter
0.75 = b, length normalization parameter
1.0 = dl, length of field
1.0 = avgdl, average length of field
This shows that the final score of 7.7611732 is the sum of 2.1149852 (the relevance score from the name match) and 5.646188 (boost * idf * tf, where the supplied boost was 1.0).
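To make the arithmetic in the debug output concrete, the idf and tf formulas printed there can be evaluated by hand. The sketch below does that in plain Java using the numbers from the explain output; the class and method names are my own, and because SOLR computes with single-precision floats the results should only be expected to match to a few decimal places.

```java
/** Recomputes the explain output using the formulas SOLR prints in debug mode. */
final class Bm25Explain {

    /** idf = log(1 + (N - n + 0.5) / (n + 0.5)) */
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)) */
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    /** score = boost * idf * tf */
    static double score(double boost, double idf, double tf) {
        return boost * idf * tf;
    }

    public static void main(String[] args) {
        // name:iphone clause, values taken from the debug output
        double nameScore = score(1.0, idf(10512, 202223), tf(1.0, 1.2, 0.75, 1.0, 9.1809435));
        // id clause added by bq, with the supplied boost of 1.0
        double idScore = score(1.0, idf(1, 372159), tf(1.0, 1.2, 0.75, 1.0, 1.0));
        System.out.println("name clause: " + nameScore);
        System.out.println("bq clause:   " + idScore);
        System.out.println("sum:         " + (nameScore + idScore));
    }
}
```

Note how large the bq contribution is: the ID is unique, so its idf is high, and even a boost of 1.0 adds more than twice the name-match score.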
In case we do not want the boost to be multiplied by tf and idf, bq can be passed with the constant-score operator ^= as below
bq=id:PRC-60001-00424-00002^=1.0
This generates the following debug output:
3.1617308 = sum of:
2.1617308 = weight(name:iphone in 257987) [SchemaSimilarity], result of:
2.1617308 = score(freq=1.0), computed as boost * idf * tf from:
3.0451374 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
12654 = n, number of documents containing term
265907 = N, total number of documents with field
0.70989597 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
1.0 = freq, occurrences of term within document
1.2 = k1, term saturation parameter
0.75 = b, length normalization parameter
1.0 = dl, length of field
8.282933 = avgdl, average length of field
1.0 = ConstantScore(id:PRC-60001-00424-00002)
This indicates that the final score of 3.1617308 is the sum of 2.1617308 (the relevance score due to the name match) and 1.0 (a constant score).
Applying boost using BF
bf is mainly used when the boost should be determined by evaluating a function supplied to SOLR, for example
bf=field(popularity)^2.0
20.063128 = sum of:
2.0631282 = weight(name:iphone in 1341) [SchemaSimilarity], result of:
2.0631282 = score(freq=1.0), computed as boost * idf * tf from:
2.891674 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
12278 = n, number of documents containing term
221300 = N, total number of documents with field
0.7134719 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
1.0 = freq, occurrences of term within document
1.2 = k1, term saturation parameter
0.75 = b, length normalization parameter
1.0 = dl, length of field
8.858545 = avgdl, average length of field
18.0 = FunctionQuery(double(popularity)), product of:
9.0 = double(popularity)=9.0
2.0 = boost
In the above debug output, we can see that the final score of 20.063128 is the sum of 2.0631282 (the relevance score based on the name match) and 18.0 (the product of the popularity value 9.0 and the boost weight 2.0).
Applying boost using boost
In contrast to bf, which is an additive boost, the boost parameter can be used to apply a multiplicative boost.
boost=field(popularity)
18.550365686416626 = weight(FunctionScoreQuery(nameSearch:iphone, scored by boost(double(popularity)))), result of:
18.550365686416626 = product of:
2.0611517 = weight(name:iphone in 1341) [SchemaSimilarity], result of:
2.0611517 = score(freq=1.0), computed as boost * idf * tf from:
2.8886645 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
12278 = n, number of documents containing term
220635 = N, total number of documents with field
0.713531 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
1.0 = freq, occurrences of term within document
1.2 = k1, term saturation parameter
0.75 = b, length normalization parameter
1.0 = dl, length of field
8.868674 = avgdl, average length of field
9.0 = double(popularity)=9.0
In the above debug output, the final score of 18.55 is the product of 2.0611517 (the relevance score based on the name match) and 9.0 (the boost from popularity).
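The difference between bf and boost comes down to how the function value is combined with the relevance score. The sketch below restates the two combinations in plain Java and plugs in the numbers from the debug outputs above; the class and method names are hypothetical and the code only mirrors the arithmetic, not SOLR's internals.

```java
/** Contrasts the additive (bf) and multiplicative (boost) combinations. */
final class BoostModes {

    /** bf: relevance + functionValue * weight */
    static double additive(double relevance, double functionValue, double weight) {
        return relevance + functionValue * weight;
    }

    /** boost: relevance * functionValue */
    static double multiplicative(double relevance, double functionValue) {
        return relevance * functionValue;
    }

    public static void main(String[] args) {
        // bf=field(popularity)^2.0 with popularity = 9.0
        System.out.println("bf:    " + additive(2.0631282, 9.0, 2.0));
        // boost=field(popularity) with popularity = 9.0
        System.out.println("boost: " + multiplicative(2.0611517, 9.0));
    }
}
```

With bf, the popularity term contributes the same 18.0 no matter how strong or weak the text match is. With boost, the contribution scales with the relevance score.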
Leveraging SOLR’s Payloads and DocValues
SOLR payloads are a per-document map of terms to values, which can be used to store a searchTerm -> metadata mapping for a given document.
SOLR provides fieldTypes that support payloads. An example:
<fieldType name="delimited_payloads_float" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer name="whitespace"/>
    <filter name="delimitedPayload" encoder="float"/>
  </analyzer>
</fieldType>
<dynamicField name="*_dpf" type="delimited_payloads_float" indexed="true" stored="true"/>
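At index time, a field of this type expects whitespace-separated tokens of the form term|value, where | is the default delimiter of the delimited payload filter. A minimal sketch of building such a field value from a searchTerm -> metric map is shown below. The class name, the sample metrics and the choice to join multi-word search terms with underscores are all assumptions for illustration; the same normalisation would have to be applied to the term at query time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that formats a value for a *_dpf payload field. */
final class PayloadFieldValue {

    /** Returns e.g. "iphone|0.25 iphone_case|0.1" for the given metrics. */
    static String build(Map<String, Float> metricBySearchTerm) {
        StringJoiner value = new StringJoiner(" ");
        for (Map.Entry<String, Float> entry : metricBySearchTerm.entrySet()) {
            // The whitespace tokenizer splits on spaces, so a multi-word
            // search term is collapsed into one token (an assumption).
            String token = entry.getKey().trim().replaceAll("\\s+", "_");
            value.add(token + "|" + entry.getValue());
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> ctrBySearchTerm = new LinkedHashMap<>();
        ctrBySearchTerm.put("iphone", 0.25f);
        ctrBySearchTerm.put("iphone case", 0.1f);
        // The result would be indexed into a field such as ctr_dpf
        System.out.println(build(ctrBySearchTerm));
    }
}
```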
Finally, to supply the required equation to SOLR as a function that re-ranks the result set, a custom ValueSourceParser can be created. It uses the payloads feature for search term specific metrics and docValues for accessing the metadata of the document.
Overriding the getValues method of the ValueSource allows you to initialise the payload fields and the docValues fields to be used:
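// Terms of the payload field that holds the per-search-term metric
final Terms terms = readerContext.reader().terms("ctr_dpf");
// docValues-backed accessor for the "rating" field
FunctionValues ratingFunctionValues =
    new LongFieldSource("rating").getValues(context, readerContext);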
The expected return type is a FunctionValues instance, which contains the logic to fetch the appropriate information from the payload and the field.
To fetch the documents corresponding to the given search term from the payload field:
if (terms != null) {
    final TermsEnum termsEnum = terms.iterator();
    if (termsEnum.seekExact(indexedBytes)) {
        docs = termsEnum.postings(null, PostingsEnum.ALL);
    } else {
        docs = null;
    }
}
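To fetch the payload value from the doc in hand:
// getPayload() is only valid after docs.nextPosition() has been called for the current document
BytesRef payload = docs.getPayload();
if (payload != null) {
    // The field type uses encoder="float", so the bytes are decoded as a float
    float value = PayloadHelper.decodeFloat(payload.bytes, payload.offset);
}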
To fetch the value from the docValues field:
double getDataFromFunctionValues(FunctionValues functionValues,
        int doc) throws IOException {
    // Read the numeric docValue for this document as a double
    return functionValues.doubleVal(doc);
}
Since the above ValueSource is a function, it can be passed to a SOLR query as part of a bf or a boost, taking advantage of all the parameters required to re-rank the result set. The query is passed as
q=iphone&boost=myFunction(iphone, weights_of_individual_params)
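The equation evaluated inside myFunction depends on the use case and is not prescribed here. One possible shape is a weighted sum of the search term specific metric read from the payload and the metadata read from docValues. The sketch below shows only that arithmetic in plain Java; the class name, the signal order (CTR, then rating) and the sample weights are assumptions for illustration, not the actual function.

```java
/** Hypothetical weighted combination of re-ranking signals. */
final class RerankScore {

    /** Returns sum(signals[i] * weights[i]); both arrays must align. */
    static double combine(double[] signals, double[] weights) {
        if (signals.length != weights.length) {
            throw new IllegalArgumentException("signals and weights must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < signals.length; i++) {
            total += signals[i] * weights[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // signals: CTR for the search term (payload), rating (docValues)
        double[] signals = {0.25, 4.0};
        // weights as they might be passed in via the function arguments
        double[] weights = {2.0, 0.5};
        System.out.println(combine(signals, weights));
    }
}
```

Because boost multiplies the relevance score, it is worth deciding what the function should return for a document that has no payload for the search term. A value of 0 would wipe out the relevance score, so a neutral fallback may be more appropriate.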
References :
https://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
https://lucene.apache.org/solr/guide/8_6/the-dismax-query-parser.html#bq-boost-query-parameter
https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html