Combining NEAR with strict order

borama · March 6, 2023, 1:52pm

Hello,

I’m just starting with Manticore (looks very nice!) and have a question that I can’t find an answer to anywhere: is it possible to use the NEAR operator while still taking the order of the tokens into account?

For example, given the documents:

1. old lazy dog
2. old and very lazy dog
3. is the dog lazy?
4. is the dog very lazy?

I want the query to match old and dog if:

the tokens are up to 3 words apart when in this order (should match 1. & 2.)
or the tokens are close together when in reversed order (should match 3. but NOT 4.)

I tried combining NEAR/proximity operators with <</BEFORE but with no success. Is it possible?

Thanks!

Matouš

barryhunter · March 6, 2023, 2:40pm

Document 3 does not contain the word ‘old’, so I’m not sure how you can say it ‘should match 3.’, as it doesn’t.

I wonder if you actually meant the words lazy and dog, as that seems to fit better.

But I wonder if you can just do something like

((lazy NEAR/3 dog) (lazy << dog) ) | "dog lazy"

The left side needs both NEAR and << to match, and the right side catches the reverse order, but only when the words are directly adjacent.

You will probably have to experiment with ranking to get these results into a good order.

In fact, you might end up simplifying to something like

(lazy NEAR/3 dog) | (lazy << dog)

which would allow documents with the terms near each other and in the right order to rank higher (they match both sides of the OR), and allows ones where the terms are near to outrank ones that only match in order.

borama · March 6, 2023, 4:30pm

Thanks and sorry for the confusion, I got the sample documents messed up, indeed.

That’s a great point with the AND query for the same terms, that was the missing point for me! I don’t even have to play with the ranking as I’m in fact only interested in matches, not in their ordering.

Thanks again!

Daniil · March 8, 2023, 3:56pm

Hi, Barry!

I have a similar question.
Your answer is good, but the query ((lazy NEAR/3 dog) (lazy << dog) ) | "dog lazy" will match the document: find dog abc lazy def ghi find abc dog. That is wrong: there is no lazy followed by dog within a distance of 3 words (in strict order).
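
To make the false positive concrete, here is a minimal Python sketch that models the two operators on token positions. It is only a simplified model and not Manticore's implementation: it assumes NEAR/N means "some pair of occurrences is at most N positions apart, in either order" and that << means "some occurrence of the left word precedes some occurrence of the right word". The helper names (`positions`, `near`, `before`, `ordered_near`) are made up for this illustration.

```python
def positions(tokens, word):
    """Return every index at which `word` occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def near(tokens, a, b, n):
    """Simplified NEAR/n: any a and any b at most n positions apart, any order."""
    return any(abs(i - j) <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def before(tokens, a, b):
    """Simplified a << b: some occurrence of a precedes some occurrence of b."""
    pa, pb = positions(tokens, a), positions(tokens, b)
    return bool(pa and pb) and min(pa) < max(pb)


def ordered_near(tokens, a, b, n):
    """What is actually wanted: the SAME pair is both in order and within n."""
    return any(0 < j - i <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))


doc = "find dog abc lazy def ghi find abc dog".split()

# NEAR is satisfied by the first "dog" and "lazy"; << by "lazy" and the last "dog".
print(near(doc, "lazy", "dog", 3) and before(doc, "lazy", "dog"))  # True
# No single lazy/dog pair satisfies both conditions at once.
print(ordered_near(doc, "lazy", "dog", 3))  # False
```

The two conditions are satisfied by different occurrences of dog, which is why combining them with an implicit AND does not guarantee an ordered match within the distance.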

barryhunter · March 8, 2023, 4:45pm

Hmm, yes, that is a good point. It’s not actually ensuring that the NEAR and the strict order apply to the ‘same’ two words.

I can only think to do

"lazy dog" | "lazy * dog" | "lazy * * dog" | "dog lazy"

Using the wildcard inside the phrase operator allows similar functionality to NEAR/proximity while still preserving word order.
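
Writing these alternatives by hand gets tedious for larger distances, so the query string could be generated. A small sketch, assuming only two keywords and the same reading of the distance as in the query above (0 to N-1 wildcard slots between the words); `ordered_near_query` is a hypothetical helper, not part of any Manticore client:

```python
def ordered_near_query(first, second, max_distance, allow_adjacent_reverse=True):
    """Build an OR of phrase queries with 0 .. max_distance-1 '*' placeholders."""
    phrases = []
    for gap in range(max_distance):
        middle = ["*"] * gap  # one '*' per word allowed between the keywords
        phrases.append('"' + " ".join([first, *middle, second]) + '"')
    if allow_adjacent_reverse:
        # Reverse order is accepted only when the two words are adjacent.
        phrases.append(f'"{second} {first}"')
    return " | ".join(phrases)


print(ordered_near_query("lazy", "dog", 3))
# "lazy dog" | "lazy * dog" | "lazy * * dog" | "dog lazy"
```

The resulting string would then be passed to MATCH() as usual. Keywords are inserted as-is here, so any escaping of special characters is left to the caller.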

Daniil · March 8, 2023, 5:49pm

Okay, but I think it will be very slow for big queries. For example, 4 words instead of lazy dog, a proximity of 4, and millions of documents.

barryhunter · March 8, 2023, 6:24pm

I would think it is something worth checking with a benchmark.

But in general Manticore has to first get all the documents matching the keywords, then check whether they are ‘near’ or not, i.e. it is still loading the same initial doclists and then checking ‘proximity’ by checking whether the hits are within X.

I can’t imagine the phrase operator is much slower (if at all) than NEAR/N.
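
As a rough illustration of that two-stage idea (intersect the doclists first, then compare hit positions), here is a toy Python sketch. It is not Manticore's actual code: the index layout, the tokenizer, and the names `build_index`, `within_distance` and `search_near` are all assumptions made for the example.

```python
import re


def build_index(docs):
    """Toy inverted index: word -> {doc_id: sorted list of hit positions}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, tok in enumerate(re.findall(r"\w+", text.lower())):
            index.setdefault(tok, {}).setdefault(doc_id, []).append(pos)
    return index


def within_distance(hits_a, hits_b, n):
    """Two-pointer scan over sorted hit lists: is any pair at most n apart?"""
    i = j = 0
    while i < len(hits_a) and j < len(hits_b):
        if abs(hits_a[i] - hits_b[j]) <= n:
            return True
        # Advance whichever list currently has the smaller position.
        if hits_a[i] < hits_b[j]:
            i += 1
        else:
            j += 1
    return False


def search_near(index, a, b, n):
    """Stage 1: intersect the doclists. Stage 2: check positions per document."""
    docs_a, docs_b = index.get(a, {}), index.get(b, {})
    candidates = docs_a.keys() & docs_b.keys()
    return sorted(d for d in candidates
                  if within_distance(docs_a[d], docs_b[d], n))


docs = {
    1: "old lazy dog",
    2: "old and very lazy dog",
    3: "is the dog lazy?",
    4: "is the dog very lazy?",
}
index = build_index(docs)
print(search_near(index, "lazy", "dog", 1))  # [1, 2, 3]
```

In this model a phrase check and a NEAR check both read the same doclists and hit lists and differ only in the position comparison, which is the intuition behind expecting similar cost. A benchmark on real data is still the only way to confirm it.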

Daniil · March 9, 2023, 7:06am

Okay, thank you, I will check.