I am looking for advice on how to rank results in a very particular way: based on the index-level frequency of the words in a matched document that did not match the search. Let me explain my use case.
My indexed documents are all very short, just one or two sentences. In fact, they are all example sentences. I use Manticore to find example sentences that contain a particular word and it works great.
However, as more and more example sentences are added, I am looking for a clever way to rank the results so that the most useful example sentences come first. So what makes an example sentence useful? It depends on many factors, but here is one I think Manticore may be able to help with: word frequency. When learning a language, you learn the most frequent words first. So if you are learning the language of those example sentences, chances are that a sentence made up of frequent words will be easier to understand, and I want to rank it higher. I am talking about frequency across the whole index, not within a single document.
Assuming that word frequencies in my documents reflect actual language use, I would like to compare the frequency of the searched keyword with the frequency of all the other unmatched words of the matched document in a ranker. I looked in Manticore’s documentation and even in the source code, but I haven’t found a way, even with a custom ranker.
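To make the idea concrete, here is a minimal sketch in plain Python, outside Manticore, of the kind of ranking I have in mind. The function names, the simple regex tokenizer, and the choice of averaging the frequencies are all assumptions made for illustration; none of this is Manticore API, and a comparison against the keyword's own frequency could be swapped in for the average.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_doc_frequency(sentences):
    """For each word, count how many sentences in the whole index contain it."""
    df = Counter()
    for sentence in sentences:
        df.update(set(tokenize(sentence)))
    return df


def score(sentence, keyword, df):
    """Average index-level frequency of the words that did NOT match the search.

    Averaging is an assumption; min() or a ratio to df[keyword] would also work.
    """
    others = [w for w in tokenize(sentence) if w != keyword.lower()]
    if not others:
        return 0.0
    return sum(df[w] for w in others) / len(others)


def rank(sentences, keyword):
    """Return the sentences containing the keyword, easiest (most frequent words) first."""
    df = build_doc_frequency(sentences)
    matches = [s for s in sentences if keyword.lower() in tokenize(s)]
    return sorted(matches, key=lambda s: score(s, keyword, df), reverse=True)
```

For example, when searching for "cat", a sentence like "The cat is on the table" should rank above "The cat devours quinoa" as long as "is", "on" and "table" appear in more sentences of the index than "devours" and "quinoa".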
Any help appreciated.