Using a score at document insertion time and at retrieval time

Newbie to Manticore, so please bear with me.

I am working on a new project and need to decide between Elastic and other alternatives. One of the features I am looking for is the following:

  1. During document insertion, attach a score (e.g. a conversion score) that is used in the construction of the posting list.
  2. At query time, retrieve the top ‘N’ documents (ranked by this conversion score) that contain the term passed in the “match” part, and then use BM25 (etc.) to score the retrieved documents. For example, if I want 1000 documents that contain “pizza”, first pull the top 1000 documents that contain “pizza” (by conversion score) and then rank these 1000 documents by BM25.

Is this even possible?

Hi. It should be something like this:

mysql> drop table if exists t; create table t(f text, score int) index_field_lengths='1'; insert into t values(1, 'pizza', 1),(2, 'pizza pizza smth else', 1),(3, 'pizza 3rd doc', 2),(4, 'pasta', 1),(5, 'pizza pasta', 2); select * from (select *, weight() w from t where match('pizza') order by score desc limit 2 option ranker=expr('bm25a(1.2, 0.75) * 1000')) order by w desc;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table t(f text, score int) index_field_lengths='1'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1, 'pizza', 1),(2, 'pizza pizza smth else', 1),(3, 'pizza 3rd doc', 2),(4, 'pasta', 1),(5, 'pizza pasta', 2)
--------------

Query OK, 5 rows affected (0.00 sec)

--------------
select * from (select *, weight() w from t where match('pizza') order by score desc limit 2 option ranker=expr('bm25a(1.2, 0.75) * 1000')) order by w desc
--------------

+------+---------------+-------+-------+------+
| id   | f             | score | f_len | w    |
+------+---------------+-------+-------+------+
|    3 | pizza 3rd doc |     2 | 3     |  423 |
|    5 | pizza pasta   |     2 | 2     |  408 |
+------+---------------+-------+-------+------+
2 rows in set (0.00 sec)
--- 2 out of 4 results in 0ms ---

Thanks, Sergey. A follow-up question:

In your example, you inserted 5 docs, 4 of which contain the term “pizza”. In the match query, you used “limit 2” to return 2 results. But will the query retrieve all 4 documents?

Say I have 10k documents that contain “pizza” and I want 1000 results returned to the caller. I want to look only at the top 1000 docs (by score) that contain “pizza” and then use BM25 for relevance within those 1000. I don’t want to look at all 10k docs.

Is that what your query would do? I don’t think so, but I might be wrong.

> In your example, you inserted 5 docs, 4 of which contain the term “pizza”. In the match query, you used “limit 2” to return 2 results. But will the query retrieve all 4 documents?

Internally, at some stage it may visit all 4 matching documents, but the point is that the inner query returns only the top 2 ordered by score, and the outer query then reorders those 2 by weight. The same should work with 1k instead of 2.
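
For the 10k/1000 case, the same query could be sketched roughly as below. This assumes the same table t and score attribute as in the example above. The explicit outer limit is an addition: as far as I know the default LIMIT is 20, so without it the outer query would likely return only 20 rows. Treat it as a sketch to try against your own data, not a tuned query.

```sql
-- Inner query: keep only the 1000 matching docs with the highest score.
-- Outer query: reorder those 1000 by the BM25-based weight.
select * from (
    select *, weight() w
    from t
    where match('pizza')
    order by score desc
    limit 1000
    option ranker=expr('bm25a(1.2, 0.75) * 1000')
) order by w desc limit 1000;
```

If you ever need an inner limit above 1000, you would probably also have to raise the max_matches option (default 1000) in the inner query's option clause.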