We are evaluating Manticore Search (MS) against Elasticsearch (ES) for an Information Retrieval (IR) project. To decide, we first wanted to establish a baseline of how information retrieval performs with both MS and ES. We evaluate and compare using well-established retrieval metrics (NDCG@10, among others) as implemented in the BEIR project, on two datasets: TREC-COVID and NFCorpus.
All of the setup, scripts, and detailed results are available in this public GitHub repository. The README explains the different strategies we used for the comparison; please let us know if anything needs more explanation and we will clarify. The primary question we are trying to answer: how can we get results from MS that are competitive with what we get from ES?
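For reference, scoring a run against BEIR's relevance judgments is straightforward with its tooling. Below is a minimal sketch of the evaluation loop; the dataset URL is BEIR's standard download location, and `run_search_engine` is a hypothetical helper standing in for the code that issues each query against MS or ES and collects the scored hits:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval

# Download and load a BEIR dataset (trec-covid here; nfcorpus works the same way).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/trec-covid.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# `results` maps query_id -> {doc_id: score}. In our setup this comes from
# querying Manticore or Elasticsearch; run_search_engine is a placeholder.
results = run_search_engine(queries)

# BEIR computes NDCG, MAP, Recall, and Precision at the requested cutoffs.
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, results, k_values=[10])
print(ndcg["NDCG@10"])
```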
A summary of the results is below:
Results for trec-covid:
| dataset | settings | NDCG@10 |
|---|---|---|
| trec-covid | MS (default) | 0.29494 |
| trec-covid | MS (es-like) | 0.59764 |
| trec-covid | ES | 0.68803 |
| trec-covid | ES (reported in BEIR) | 0.616 |
Results for nfcorpus:
| dataset | settings | NDCG@10 |
|---|---|---|
| nfcorpus | MS (default) | 0.28791 |
| nfcorpus | MS (es-like) | 0.31715 |
| nfcorpus | ES | 0.34281 |
| nfcorpus | ES (reported in BEIR) | 0.297 |
A few questions we had were:
- What options are we missing in our MS index configuration to get competitive, ES-like results? (See the sketch after this list for the kind of settings we mean.)
- What options are we missing in our MS ranking configuration to get competitive, ES-like results?
- We’ve observed the best results for MS with the default `en` stop words, although that list is much larger than the English stop word list in ES. How can we explain this behavior?
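For context, this is roughly the kind of configuration we mean by "es-like": English stemming, the built-in `en` stop word list, stored field lengths (which BM25F needs), and the expression ranker with Lucene's default BM25 parameters (k1 = 1.2, b = 0.75, ES's out-of-the-box similarity). This is a sketch, not our exact setup; the table name, fields, and example query are illustrative:

```python
import pymysql  # Manticore speaks the MySQL protocol (SQL port 9306 by default)

conn = pymysql.connect(host="127.0.0.1", port=9306)
cur = conn.cursor()

# "es-like" analysis chain: English stemming, the built-in 'en' stop word
# list, and per-field lengths stored so BM25F can be computed at query time.
cur.execute("""
    CREATE TABLE beir_docs (title TEXT, body TEXT)
    morphology = 'stem_en'
    stopwords = 'en'
    index_field_lengths = '1'
""")

# Expression ranker with BM25F at Lucene's defaults (k1=1.2, b=0.75).
cur.execute("""
    SELECT id, WEIGHT() FROM beir_docs
    WHERE MATCH('coronavirus origin')
    OPTION ranker = expr('10000 * bm25f(1.2, 0.75)')
    LIMIT 10
""")
for doc_id, score in cur.fetchall():
    print(doc_id, score)
```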