Hello,
I noticed that with the default Manticore Search settings,
“3060” does not match “3060ti”, but
“3060*” does match “3060ti”.
How would I set up Manticore Search so that “3080ti” is split during indexing into “3080” and “ti”, so that searches for “3080”, “3080ti” and “3080 ti” would all match it?
Thank you.
Sergey
February 13, 2023, 7:19am
There’s the wordbreaker tool for that, but integrating it with your app may take some effort. Unfortunately, there’s no out-of-the-box way to do that.
It’s a bit of a hack, and it can break the ‘phrase operator’ if you use that, but you can do something like
regexp_filter = \b(\d+)([a-z]+)\b => \1\2 \1 \2
… it ‘injects’ the term both with and without a space.
It’s been a while since I used it; you might need
regexp_filter = (\d+)([a-z]+) => (\1\2 | ("\1 \2"))
… as regexp_filter runs on the query as well. The extra characters won’t affect indexing, but they are needed because regexp_filter is applied to the query too.
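To see what the first rule does to a piece of text, here is a minimal Python sketch that mimics the substitution. It is only an approximation: Manticore applies regexp_filter with its own regex engine, Python's `re` module stands in for it here, and the helper name `expand` is made up for illustration.

```python
import re

# Same pattern and replacement as the first regexp_filter rule above.
# Assumption: Python's `re` treats this simple pattern the same way the real filter does.
PATTERN = re.compile(r"\b(\d+)([a-z]+)\b")
REPLACEMENT = r"\1\2 \1 \2"

def expand(text: str) -> str:
    """Hypothetical helper: rewrite text the way the filter rule would."""
    return PATTERN.sub(REPLACEMENT, text)

print(expand("one 3060ti two three"))  # one 3060ti 3060 ti two three
print(expand("one 3060 two"))          # unchanged: no letters follow the digits
```

The joined form is kept and the split forms are appended after it, which is why searches for 3060, ti and 3060ti can all find the document.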
@barryhunter
Thanks
I tested
regexp_filter = \b(\d+)([a-z]+)\b => \1\2 \1 \2
regexp_filter = \b([a-z]+)(\d+)\b => \1\2 \1 \2
Wow… works like a charm
Why did you think (\1\2 | ("\1 \2")) might be needed? I didn’t understand that part.
Also, what problems could it cause with the phrase operator?
Again thank you!
Well, in concept: if you have a document containing, say,
one 3060ti two three
the regex filter will ‘expand’ that to
one 3060ti 3060 ti two three
which is good, as it will match both 3060 and 3060ti.
But if a user were to run a phrase query like
"one 3060 two"
it would never match, because 3060 no longer ‘directly’ follows one.
Neither would the query "one 3060 ti two" work.
However, "one 3060ti two" would work, as the query gets expanded in the same way.
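The three phrase cases can be checked with a toy model. This is a sketch under loud assumptions: phrase matching is reduced to "the query words appear consecutively in the document", the filter is imitated with Python's `re`, and `expand` and `phrase_matches` are hypothetical helpers, not anything Manticore provides.

```python
import re

PATTERN = re.compile(r"\b(\d+)([a-z]+)\b")

def expand(text):
    """Imitate the filter rule: keep the joined term, append the split parts."""
    return PATTERN.sub(r"\1\2 \1 \2", text)

def phrase_matches(document, phrase):
    """Toy phrase match: expanded query words must be adjacent in the expanded document."""
    doc = expand(document).split()
    words = expand(phrase).split()
    n = len(words)
    return any(doc[i:i + n] == words for i in range(len(doc) - n + 1))

doc = "one 3060ti two three"
print(phrase_matches(doc, "one 3060 two"))     # False: 3060ti now sits between one and 3060
print(phrase_matches(doc, "one 3060 ti two"))  # False: same reason
print(phrase_matches(doc, "one 3060ti two"))   # True: the query is expanded identically
```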
The reason for adding the query operators is that if you were to enter a query like, say,
test 3060ti | nine
it would get expanded to
test 3060ti 3060 ti | nine
which changes the meaning of the query, i.e. effectively
test 3060ti 3060 (ti | nine)
That needs test and 3060ti in all documents; nine on its own won’t match. (The original query would work to find documents with JUST test or nine.)
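The difference between the two replacement styles is easy to see by rewriting the query string itself. A minimal sketch, again using Python's `re` as a stand-in for the real filter; `rewrite_query` is a hypothetical helper, and the pattern includes \b as in the first rule.

```python
import re

PATTERN = re.compile(r"\b(\d+)([a-z]+)\b")
PLAIN = r"\1\2 \1 \2"                  # plain injection
WITH_OPERATORS = r'(\1\2 | ("\1 \2"))'  # injection wrapped in query operators

def rewrite_query(query, replacement):
    """Apply one replacement style to a query string, as the filter would."""
    return PATTERN.sub(replacement, query)

query = "test 3060ti | nine"
print(rewrite_query(query, PLAIN))           # test 3060ti 3060 ti | nine
print(rewrite_query(query, WITH_OPERATORS))  # test (3060ti | ("3060 ti")) | nine
```

With the parenthesised form, the injected terms stay grouped as a single alternative, so the | nine part still applies to the whole 3060ti term rather than only to the trailing ti.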
@barryhunter
Excellent explanation. Thank you.
I would guess your initial post of
regexp_filter = (\d+)([a-z]+) => (\1\2 | ("\1 \2"))
missed \b by accident?
so it should be
regexp_filter = \b(\d+)([a-z]+)\b => (\1\2 | ("\1 \2"))
Again thank you.
Your reply was very helpful.
Ah yes, not having the \b on the second one was a mistake, not deliberate.
I think I realised I should have the \b right before posting, and only added it to one.
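For anyone wondering what the \b actually changes, here is a small comparison. It is a sketch only, with Python's `re` standing in for the real filter; the names `LOOSE`, `STRICT` and `rewrite` are made up for illustration.

```python
import re

REPLACEMENT = r'(\1\2 | ("\1 \2"))'
LOOSE = re.compile(r"(\d+)([a-z]+)")        # rule as first posted, without \b
STRICT = re.compile(r"\b(\d+)([a-z]+)\b")   # corrected rule, with \b

def rewrite(pattern, query):
    """Apply the operator-style replacement using the given pattern."""
    return pattern.sub(REPLACEMENT, query)

print(rewrite(LOOSE, "3060ti"))      # (3060ti | ("3060 ti"))
print(rewrite(STRICT, "3060ti"))     # (3060ti | ("3060 ti"))
print(rewrite(LOOSE, "rtx3060ti"))   # rtx(3060ti | ("3060 ti"))
print(rewrite(STRICT, "rtx3060ti"))  # rtx3060ti
```

Without \b the rule also fires in the middle of a longer word and leaves a stray prefix glued to the opening parenthesis; with \b it only fires on whole digits-then-letters words.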