I’m just starting with Manticore (looks very nice!) and have a question that I can’t find an answer to anywhere: is it possible to use the NEAR operator while still taking the order of the tokens into account?
For example, given the documents:
1. old lazy dog
2. old and very lazy dog
3. is the dog lazy?
4. is the dog very lazy?
I want the query to match old and dog if:
the tokens are up to 3 words apart when in this order (should match 1. & 2.)
or the tokens are close together when in reversed order (should match 3. but NOT 4.)
I tried combining NEAR/proximity operators with <</BEFORE but with no success. Is it possible?
Document 3 does not contain the word ‘old’, so I’m not sure why you say it ‘should match 3.’, as it doesn’t.
I wonder if you actually mean the words lazy and dog, as that seems to fit better.
But I wonder if you can just do something like
((lazy NEAR/3 dog) (lazy << dog) ) | "dog lazy"
The left side needs both the NEAR and the << conditions to match, and the right side catches the reverse order, but only when the words are directly adjacent.
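To check the logic against the sample documents, here is a rough Python model of the three operators. It is only a sketch and not Manticore's code: the helper names are made up, tokenization is a plain word split, and NEAR/N is assumed to mean "at most N positions apart in either order".

```python
import re

def positions(doc, word):
    """Return the 1-based positions of `word` in a simple word tokenization."""
    tokens = re.findall(r"\w+", doc.lower())
    return [i for i, t in enumerate(tokens, start=1) if t == word]

def near(doc, a, b, n):
    # Assumed model of `a NEAR/n b`: some hit of a and some hit of b
    # are at most n positions apart, in either order.
    return any(abs(pa - pb) <= n
               for pa in positions(doc, a) for pb in positions(doc, b))

def before(doc, a, b):
    # Assumed model of `a << b`: some hit of a precedes some hit of b.
    return any(pa < pb
               for pa in positions(doc, a) for pb in positions(doc, b))

def phrase(doc, a, b):
    # Model of "a b": b immediately follows a.
    return any(pb - pa == 1
               for pa in positions(doc, a) for pb in positions(doc, b))

def combined(doc):
    # ((lazy NEAR/3 dog) (lazy << dog)) | "dog lazy"
    return (near(doc, "lazy", "dog", 3) and before(doc, "lazy", "dog")) \
        or phrase(doc, "dog", "lazy")

docs = [
    "old lazy dog",
    "old and very lazy dog",
    "is the dog lazy?",
    "is the dog very lazy?",
]
results = [combined(d) for d in docs]
print(results)  # [True, True, True, False]
```

Under these assumptions, documents 1 to 3 match and document 4 does not, which is the behaviour asked for.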
You will probably have to experiment with ranking to get these results into a good order.
In fact, you might end up simplifying to something like
(lazy NEAR/3 dog) | (lazy << dog)
which would allow documents with the terms in the right order to rank higher (they match both sides of the OR), and allows ones where the terms are near to outrank ones that only match in order.
Thanks and sorry for the confusion, I got the sample documents messed up, indeed.
That’s a great point with the AND query for the same terms, that was the missing point for me! I don’t even have to play with the ranking as I’m in fact only interested in matches, not in their ordering.
I have a similar question.
Your answer is good, but the query ((lazy NEAR/3 dog) (lazy << dog) ) | "dog lazy" will also match the document: find dog abc lazy def ghi find abc dog. That is wrong: there is no lazy followed by dog within 3 words (in strict order). The NEAR/3 part is satisfied by the first dog and the << part by the second one.
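A small Python sketch of why this happens, again as a simplified model with hypothetical helper names and with NEAR/N assumed to mean "at most N positions apart". The two conditions are checked independently, so they can be satisfied by different occurrences of dog. A strict check would have to test order and distance on the same pair of hits.

```python
def positions(doc, word):
    """Return the 1-based positions of `word` in a whitespace tokenization."""
    return [i for i, t in enumerate(doc.lower().split(), start=1) if t == word]

def near_and_before(doc, a, b, n):
    # Model of (a NEAR/n b) (a << b): each condition may use a different pair of hits.
    pa_list, pb_list = positions(doc, a), positions(doc, b)
    is_near = any(abs(pa - pb) <= n for pa in pa_list for pb in pb_list)
    is_before = any(pa < pb for pa in pa_list for pb in pb_list)
    return is_near and is_before

def strict_ordered_near(doc, a, b, n):
    # Desired behaviour: the SAME pair of hits must be in order and within n positions.
    return any(0 < pb - pa <= n
               for pa in positions(doc, a) for pb in positions(doc, b))

doc = "find dog abc lazy def ghi find abc dog"
loose = near_and_before(doc, "lazy", "dog", 3)
strict = strict_ordered_near(doc, "lazy", "dog", 3)
print(loose, strict)  # True False
```

Here lazy is at position 4 and dog at positions 2 and 9: the pair (4, 2) passes the distance test, the pair (4, 9) passes the order test, but no single pair passes both.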
I would think it is something worth checking with a benchmark.
But in general Manticore has to first get all the documents matching the keywords, then check whether they are ‘near’ or not, i.e. it is still loading the same initial doclists and then checking ‘proximity’ by checking whether the hits are within X.
Can’t imagine the phrase operator is much slower (if at all) than NEAR/
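To illustrate the two-step idea, here is a toy positional index in Python. It is a conceptual sketch only, not how Manticore is implemented internally; the function names and the "at most N positions apart" reading of NEAR/N are assumptions. Both operators start from the same candidate set and differ only in the position check applied afterwards.

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to {doc_id: [positions]} (a toy positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split(), start=1):
            index[token].setdefault(doc_id, []).append(pos)
    return index

def candidates(index, a, b):
    # Step 1: documents that contain both keywords (doclist intersection).
    return sorted(set(index.get(a, {})) & set(index.get(b, {})))

def near_match(index, a, b, n):
    # Step 2 for NEAR/n: some pair of hits is at most n positions apart.
    return [d for d in candidates(index, a, b)
            if any(abs(pa - pb) <= n for pa in index[a][d] for pb in index[b][d])]

def phrase_match(index, a, b):
    # Step 2 for "a b": some hit of b directly follows a hit of a.
    return [d for d in candidates(index, a, b)
            if any(pb - pa == 1 for pa in index[a][d] for pb in index[b][d])]

docs = {
    1: "old lazy dog",
    2: "old and very lazy dog",
    3: "is the dog lazy",
    4: "is the dog very lazy",
    5: "the dog sleeps",
}
index = build_index(docs)
print(candidates(index, "lazy", "dog"))
print(near_match(index, "lazy", "dog", 3))
print(phrase_match(index, "dog", "lazy"))
```

In this model the cost of both operators is dominated by the shared first step, which is the intuition behind expecting similar performance; a real benchmark is still the only way to confirm it.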