I am working on a new project and need to decide between Elastic and other alternatives. One of the features I am looking for is the following:
During document insertion, supply a per-document score (e.g. a conversion score) that is used when constructing the posting list.
At query time, retrieve the top ‘N’ documents that contain the term passed in the “match” part, ranked by this conversion score, and then use BM25 (or similar) to score only those retrieved documents. For example, if I want 1000 documents that contain “pizza”, first pull the top 1000 documents that contain “pizza” (by conversion score) and then rank these 1000 documents by BM25.
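To make the requested behaviour concrete, here is a minimal Python sketch of the two phases described above. It is not how Elastic or any other engine implements this; the class `ImpactOrderedIndex`, its methods, and the sample documents and scores are all hypothetical, and the BM25 formula is a simplified single-term version.

```python
import math
from collections import defaultdict


class ImpactOrderedIndex:
    """Toy inverted index whose posting lists are sorted by a static score."""

    def __init__(self):
        self.docs = {}                     # doc_id -> list of tokens
        self.scores = {}                   # doc_id -> static (e.g. conversion) score
        self.postings = defaultdict(list)  # term -> doc_ids, highest static score first

    def add(self, doc_id, text, score):
        tokens = text.lower().split()
        self.docs[doc_id] = tokens
        self.scores[doc_id] = score
        for term in set(tokens):
            plist = self.postings[term]
            plist.append(doc_id)
            # Re-sorting on every insert is wasteful; acceptable for a sketch only.
            plist.sort(key=lambda d: self.scores[d], reverse=True)

    def bm25(self, term, doc_id, k1=1.2, b=0.75):
        n_docs = len(self.docs)
        df = len(self.postings.get(term, []))
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tokens = self.docs[doc_id]
        tf = tokens.count(term)
        avgdl = sum(len(t) for t in self.docs.values()) / n_docs
        return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avgdl))

    def search(self, term, n):
        # Phase 1: read only the first n postings (top n by static score).
        candidates = self.postings.get(term, [])[:n]
        # Phase 2: rank just those candidates by BM25.
        return sorted(candidates, key=lambda d: self.bm25(term, d), reverse=True)


index = ImpactOrderedIndex()
index.add(1, "pizza pizza pizza place", score=0.1)  # best BM25, low conversion score
index.add(2, "best pizza in town", score=0.9)
index.add(3, "pizza pizza delivery", score=0.8)
index.add(4, "cheap pizza", score=0.2)
index.add(5, "sushi bar", score=0.95)

result = index.search("pizza", n=2)
print(result)
```

In this toy data, document 1 would win on BM25 alone, but it is never scored because it falls outside the top 2 by conversion score. That is the property being asked for: BM25 is applied only within the first N postings.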
In your example, you inserted 5 docs, 4 of which contain the term “pizza”. In the match query, you used “limit 2” to return 2 results. But will the query retrieve all 4 documents?
Say I have 10k documents that contain “pizza” and I want 1000 results returned to the caller. I want to look only at the top 1000 docs (by score) that contain “pizza” and then use BM25 for relevance within those 1000. I don’t want to look at all 10k docs.
Is that what your query would do? I don’t think so, but I might be wrong.
In your example, you inserted 5 docs, 4 of which contain the term “pizza”. In the match query, you used “limit 2” to return 2 results. But will the query retrieve all 4 documents?
Internally, at some phase, it may retrieve all 4 documents, but the point is that the inner query will return only 2, ordered by score, and the outer query will reorder them by weight. The same should work with 1k instead of 2.
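The inner/outer behaviour described in this reply can be mimicked in a few lines of Python. This is only an illustration of the semantics, not the engine's actual execution plan; the functions `inner_query` and `outer_query`, and the `score` and `weight` values, are made up for the example.

```python
import heapq

# Five hypothetical docs, four of which contain "pizza".
docs = [
    {"id": 1, "text": "pizza place", "score": 10, "weight": 1700},
    {"id": 2, "text": "best pizza", "score": 90, "weight": 1200},
    {"id": 3, "text": "pizza delivery", "score": 80, "weight": 1600},
    {"id": 4, "text": "cheap pizza", "score": 20, "weight": 1900},
    {"id": 5, "text": "sushi bar", "score": 95, "weight": 0},
]


def inner_query(rows, term, limit):
    """Match the term, then keep only the `limit` rows with the highest score."""
    matches = (r for r in rows if term in r["text"].split())
    return heapq.nlargest(limit, matches, key=lambda r: r["score"])


def outer_query(rows):
    """Reorder the rows returned by the inner query by relevance weight."""
    return sorted(rows, key=lambda r: r["weight"], reverse=True)


top_by_score = inner_query(docs, "pizza", limit=2)
final_ids = [r["id"] for r in outer_query(top_by_score)]
print(final_ids)
```

All 4 matching documents are visited while selecting the top 2, which corresponds to the "may retrieve all 4 documents internally" caveat, but only 2 rows reach the outer reordering step.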