Inconsistent Relevance Scores Across Cluster Nodes with Custom Ranker

dimitrije_marinkovic · April 11, 2025, 12:21pm

Hi Manticore team,

I’m encountering an issue with inconsistent relevance scores across my replicated cluster setup and was hoping to get some insight from the community.

Setup:

Manticore Search cluster with 3 nodes
Each node has an identical copy of the data (the same ad_complete table, around 6,500,000 rows)
Custom ranker in use: ranker=expr('sum((4*lcs + 10*wlccs + (min_hit_pos==1) + word_count + 2*exact_hit + 25*tf_idf) * user_weight) * 100 + 10*bm25 + ...')

Here is the ad_complete table definition:

CREATE TABLE ad_complete (
id bigint,
name text indexed,
description text indexed,
category_group_name text indexed,
condition_tag text indexed,
price_text_tag text indexed,
ad_tags text indexed,
car_fuel_type_tag text indexed,
car_body_type_tag text indexed,
car_gearbox_tag text indexed,
car_drive_tag text indexed,
car_model_tag text indexed,
vehicle_drive_tag text indexed,
vehicle_tags text indexed,
favorite_user_ids text indexed,
has_photo integer,
is_top integer,
is_top_search integer,
is_priority integer,
is_highlighted integer,
is_top_gold integer,
vehicle_km integer,
vehicle_make_year integer,
vehicle_cc integer,
vehicle_power integer,
vehicle_power_h integer,
view_count integer,
is_currency_eur integer,
is_currency_rsd integer,
location_id integer,
courier_delivery integer,
local_pickup integer,
posted integer,
pre_sort integer,
posted_sort integer,
price_sort integer,
price_rsd float,
price_eur float,
location_lon float,
location_lat float,
last_updated_ts integer
) wordforms = '/etc/manticoresearch/wordforms.txt'
charset_table = '0..9, A..Z->a..z, _, a..z, U+017E->z, U+017D->z, U+0161->s, U+0160->s, U+0107->c, U+0106->c, U+010C->c, U+010D->c, U+0111->d, U+0110->d, U+0430->a, U+0431->b, U+0432->v, U+0433->g, U+0434->d, U+0452->d, U+0435->e, U+0436->z, U+0437->z, U+0438->i, U+0458->j, U+043A->k, U+043B->l, U+043C->m, U+043D->n, U+043E->o, U+043F->p, U+0440->r, U+0441->s, U+0442->t, U+045B->c, U+0443->u, U+0444->f, U+0445->h, U+0446->c, U+0447->c, U+045F->U+01C6, U+0448->s, U+0410->a, U+0411->b, U+0412->v, U+0413->g, U+0414->d, U+0402->d, U+0415->e, U+0416->z, U+0417->z, U+0418->i, U+0419->j, U+041A->k, U+041B->l, U+041C->m, U+041D->n, U+041E->o, U+041F->p, U+0420->r, U+0421->s, U+0422->t, U+040B->c, U+0423->u, U+0424->f, U+0425->h, U+0426->c, U+0427->c, U+040F->U+01C4, U+0428->s'
min_prefix_len = '3'
html_strip = '1'
blend_chars = '+, -, &, U+23'
prefix_fields = 'name,description'
expand_keywords = '1'
index_exact_words = '1'
;

Problem: When I run the same search query on different nodes against the same document set, I get different relevance scores for identical documents.

The data is fully replicated, and I’ve verified that the content is the same. However, when I run
CALL KEYWORDS('some_keyword', 'ad_complete', 1);
I get different results for docs and hits on each server.

Question: What could cause the custom ranker to produce inconsistent results across nodes in a replicated cluster? Is there any part of the ranking pipeline that could introduce non-determinism?

Thanks in advance for your help!

Best regards,
Dimitrije Marinković

tomat · April 11, 2025, 12:26pm

The stats from CALL KEYWORDS('some_keyword', 'ad_complete', 1); should be the same on all nodes.

Could you select the full result set ordered by id asc and check which documents are missing?

dimitrije_marinkovic · April 11, 2025, 2:18pm

Hi,

It is a bit challenging, as I have around 6,500,000 documents in the index.

SELECT COUNT(*) FROM ad_complete;

The count query gives the same result across all nodes, which indicates that there are no missing documents.

I will try to detect if any document is missing.

tomat · April 11, 2025, 2:29pm

If the stats for some_keyword from CALL KEYWORDS differ, then select id from ad_complete where match('some_keyword') order by id asc should produce different result sets on the nodes.

You could compare these result sets and check which documents are missing on each node.
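
As a rough illustration of that comparison, here is a minimal Python sketch. It assumes the ids returned by the same MATCH query have already been exported from each node. The node names and ids below are hypothetical, and `missing_per_node` is a helper invented for this example, not part of Manticore.

```python
def missing_per_node(ids_by_node):
    """For each node, list the ids seen on some other node but absent here."""
    all_ids = set().union(*ids_by_node.values())
    return {node: sorted(all_ids - set(ids)) for node, ids in ids_by_node.items()}


# Hypothetical ids returned by the same query on three nodes.
ids_by_node = {
    "node1": [101, 102, 103, 104],
    "node2": [101, 102, 103, 104],
    "node3": [101, 103, 104],
}

for node, missing in missing_per_node(ids_by_node).items():
    print(node, "is missing", missing)
```

An empty list for every node would mean the result sets contain the same documents, so any remaining difference lies in the scores rather than in the data.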

dimitrije_marinkovic · April 11, 2025, 2:31pm

I have created a test index with the same structure, also replicated on 3 nodes, and populated it with 100,000 documents.

This is an example where the stats from CALL KEYWORDS('some_keyword', 'ad_complete', 1); are the same on all nodes but relevance differs.

Here are the results:

On 2 nodes I get the same results for the top 5 documents, but the third node differs.

tomat · April 11, 2025, 3:24pm

Could you check the result set ordered just by id, like ... order by id asc? The query you showed is ordered by relevance, which is not useful here: you said that relevance differs, so you got a different result set. As it stands, it is not clear whether the doc with id 25542030 is missing on the 3rd node or just has a low weight and does not make it to the top of the result set.
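
To tell those two cases apart, one option is to fetch id and weight together (for example with `SELECT id, WEIGHT() ... ORDER BY id ASC`) on each node and compare them per document. The Python sketch below assumes that has been done. The weights and ids are made-up illustration values, and `classify` is a hypothetical helper.

```python
def classify(doc_id, weights_by_node, tolerance=0):
    """Report whether a document is missing on a node or only scored differently."""
    present = {n: w[doc_id] for n, w in weights_by_node.items() if doc_id in w}
    absent = sorted(set(weights_by_node) - set(present))
    if absent:
        return "missing on " + ", ".join(absent)
    if max(present.values()) - min(present.values()) > tolerance:
        return "present everywhere, weight differs"
    return "consistent"


# Hypothetical {id: weight} maps exported from three nodes.
weights_by_node = {
    "node1": {1: 1500, 2: 900, 3: 700},
    "node2": {1: 1500, 2: 900, 3: 700},
    "node3": {1: 1420, 3: 700},
}

for doc_id in (1, 2, 3):
    print(doc_id, classify(doc_id, weights_by_node))
```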

dimitrije_marinkovic · April 14, 2025, 8:51am

Hi @tomat

I ran the suggested query and got a 100% match on all 3 nodes.
There are no documents missing.
As I stated at the beginning of the conversation, I am sure that the data is properly replicated on all nodes.

tomat · April 14, 2025, 10:58am

Not quite sure. If you have the same result set on all nodes but CALL KEYWORDS gives you different statistics on the nodes, the index seems invalid.

For now I am sure that if the index stats differ, then the result sets should differ too.

dimitrije_marinkovic · April 14, 2025, 11:57am

So far I have tried several different approaches to locate a difference in data between the nodes:

SELECT COUNT(*) FROM ad_complete; - this gives the same count on each node.
SELECT id FROM ad_complete ORDER BY id ASC; - I exported the data to 3 CSV files and compared them using diff and hexdiff, which show no difference. I also counted the number of rows in each CSV, with the same result.
As I mentioned, I have created a test table “ad_complete_test” with 100,000 documents. In this case I get the same results for CALL KEYWORDS('some_keyword', 'ad_complete_test', 1); but different ranking relevance.

dimitrije_marinkovic · April 16, 2025, 9:43am

I found this in documentation:

“On a multi-table cluster, per-keyword frequencies are quite likely to differ across different tables. That means that when the ranking function uses TF-IDF based values, such as BM25 family of factors, the results might be ranked slightly differently depending on what cluster node they reside.”

This seems like the problem I am encountering.

Could this be the solution to my problem?

“The easiest way to fix that issue is to create and utilize a global frequency dictionary, or a global IDF file for short.”
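
To illustrate the effect the documentation describes, here is a minimal Python sketch. It is only an assumption that this applies to my setup: it supposes that keyword statistics are gathered per partition of the data and that the partitions are split differently on each node. The formula is the textbook IDF, not Manticore's exact one, and all numbers and helper names are hypothetical.

```python
import math


def idf(total_docs, docs_with_term):
    """Textbook inverse document frequency (not Manticore's exact formula)."""
    return math.log(total_docs / docs_with_term)


def local_idfs(partitions):
    """IDF computed separately inside each (total_docs, docs_with_term) partition."""
    return [round(idf(n, df), 3) for n, df in partitions]


def pooled_idf(partitions):
    """IDF computed once from the statistics summed over all partitions."""
    total = sum(n for n, _ in partitions)
    df = sum(df for _, df in partitions)
    return round(idf(total, df), 3)


# Hypothetical: both nodes hold the same 100,000 docs, 1,000 of which contain
# the keyword, but the data is split into partitions differently.
node_a = [(50_000, 500), (50_000, 500)]
node_b = [(80_000, 950), (20_000, 50)]

print("local :", local_idfs(node_a), local_idfs(node_b))
print("pooled:", pooled_idf(node_a), pooled_idf(node_b))
```

Under these assumptions the per-partition values differ between the two nodes while the pooled value is identical, which is the kind of consistency a shared frequency dictionary is meant to provide.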