How can I split up "3060ti" during indexing so that "3060" successfully matches it

Hello,

I noticed that with the default Manticore Search settings
“3060” does not match “3060ti”, but
“3060*” does match “3060ti”.

How would I set up Manticore Search so that “3080ti” is split up during indexing as “3080” and “ti”, so that searches for “3080”, “3080ti” and “3080 ti” would all match it?

Thank you.

There’s wordbreaker for that, but integrating it with your app may take some effort. Unfortunately, there’s no out-of-the-box way to do that.

It’s a bit of a hack, and it can break the phrase operator if you use that, but you can do something like

regexp_filter = \b(\d+)([a-z]+)\b => \1\2 \1 \2

… it ‘injects’ the term both with and without a space.

It’s been a while since I used it; you might need

regexp_filter = (\d+)([a-z]+) => (\1\2 | ("\1 \2"))

… as regexp_filter runs on the query as well. The extra characters won’t affect indexing, but they are needed because the filter is applied to queries too.

@barryhunter
Thanks

I tested

	regexp_filter = \b(\d+)([a-z]+)\b => \1\2 \1 \2
	regexp_filter = \b([a-z]+)(\d+)\b => \1\2 \1 \2

Wow… works like a charm
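
For anyone who wants to try the two rules outside of Manticore first, here is a rough Python sketch. It only mimics the substitutions with Python's `re` module (Manticore itself uses RE2), and it assumes the two filters are applied one after the other in the order listed. The function name `apply_filters` is made up for this illustration.

```python
import re

# Python stand-ins for the two regexp_filter lines above.
# Assumption: the rules run one after the other, in this order.
RULES = [
    (re.compile(r"\b(\d+)([a-z]+)\b"), r"\1\2 \1 \2"),  # digits then letters, e.g. 3060ti
    (re.compile(r"\b([a-z]+)(\d+)\b"), r"\1\2 \1 \2"),  # letters then digits, e.g. rtx3060
]

def apply_filters(text):
    """Apply each rule in turn to the text (hypothetical helper)."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(apply_filters("3060ti"))     # 3060ti 3060 ti
print(apply_filters("rtx3060"))    # rtx3060 rtx 3060
print(apply_filters("rtx3060ti"))  # rtx3060ti
```

Note the last case: a token with three parts such as "rtx3060ti" is left alone by both rules in this sketch, because \b requires the digit run to sit at the edge of the word.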

Why did you think (\1\2 | ("\1 \2")) might be needed? I didn’t understand that part.
Also, what problems could it cause with the phrase operator?

Again thank you!

Well, in concept, if you have a document containing, say,

one 3060ti two three

the regex filter will ‘expand’ that to

one 3060ti 3060 ti two three

which is good, as it will match both 3060 and 3060ti.

But if a user were to run a phrase query like

"one 3060 two"

it would never match, because 3060 no longer ‘directly’ follows one.

Neither would the query "one 3060 ti two" work.

However, "one 3060ti two" would work, as the query itself gets expanded in the same way.
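
The three phrase cases above can be checked with a small Python sketch. This is only a simplified model, not how Manticore works internally: it treats a phrase match as "the query tokens appear next to each other, in order, in the document tokens", and it runs the same substitution on both the document and the query. The names `expand` and `phrase_matches` are made up for the example.

```python
import re

# Python stand-in for: regexp_filter = \b(\d+)([a-z]+)\b => \1\2 \1 \2
PATTERN = re.compile(r"\b(\d+)([a-z]+)\b")

def expand(text):
    """Inject the split form after the original token (hypothetical helper)."""
    return PATTERN.sub(r"\1\2 \1 \2", text)

def phrase_matches(document, phrase):
    """Simplified phrase check: query tokens must be adjacent and in order."""
    doc_tokens = expand(document).split()
    query_tokens = expand(phrase).split()
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

doc = "one 3060ti two three"
print(expand(doc))                             # one 3060ti 3060 ti two three
print(phrase_matches(doc, "one 3060 two"))     # False
print(phrase_matches(doc, "one 3060 ti two"))  # False
print(phrase_matches(doc, "one 3060ti two"))   # True
```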


The reason for adding the query operators is that if you were to enter a query like, say,

test 3060ti | nine

It will get expanded to

test 3060ti 3060 ti | nine

which changes the meaning of the query, i.e. effectively

test 3060ti 3060 (ti | nine)

That needs test and 3060ti in every matching document; nine on its own won’t match. (The original query would also find documents with just test and nine.)
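
To see the difference between the two replacement styles on that query, here is a short Python sketch that only does the string rewriting (it does not parse or run the query). The names `expand_plain` and `expand_grouped` are made up for the example, and Python's `re` stands in for Manticore's RE2.

```python
import re

PATTERN = re.compile(r"\b(\d+)([a-z]+)\b")

def expand_plain(query):
    """Stand-in for the replacement: \\1\\2 \\1 \\2"""
    return PATTERN.sub(r"\1\2 \1 \2", query)

def expand_grouped(query):
    """Stand-in for the replacement: (\\1\\2 | ("\\1 \\2"))"""
    return PATTERN.sub(r'(\1\2 | ("\1 \2"))', query)

query = "test 3060ti | nine"
print(expand_plain(query))    # test 3060ti 3060 ti | nine
print(expand_grouped(query))  # test (3060ti | ("3060 ti")) | nine
```

With the grouped form the injected terms stay inside their own parentheses, so the | from the user's query applies to the group as a whole rather than to a stray "ti".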

@barryhunter
Excellent explanation. Thank you.

I would guess your initial post of
regexp_filter = (\d+)([a-z]+) => (\1\2 | ("\1 \2"))
missed the \b by accident,
so it should be
regexp_filter = \b(\d+)([a-z]+)\b => (\1\2 | ("\1 \2"))

Again thank you.
Your reply was very helpful.

Ah yes, not having the \b on the second one was a mistake, not deliberate.

I think I realised I should have the \b right before posting and only added it to one of them.