Hi, sorry for the bad thread title; it was the best I could come up with! lol
Just a really basic question, I suspect, but we’ve got a list of companies in the index and we can match them on the company name. So if we search for ‘AB Comp’, it matches “AB Company”, “AB Companies Group”, “AB Compressors ltd”, etc.
But given the search term ‘@company_name AB Comp’, how can we also match entries like ‘A B Company’ or ‘A B Companies Group’, where there are spaces between the A and the B, as well as the company names without spaces between the letters? Currently, the companies with spaces in their names don’t get returned by the search.
Hopefully this makes sense!
Table Create Statement:
```sql
CREATE TABLE cmp_test (
source string attribute,
extid string attribute,
town string attribute,
postcode string attribute,
status string attribute,
-- full-text field searched via @company_name
company_name text
) min_prefix_len='2' min_word_len='2' blend_chars='@,.,U+23,-' expand_keywords='1' rt_mem_limit='2147483648';
```
```sql
SELECT *, WEIGHT() as relevance FROM cmp_test WHERE MATCH('@company_name AB Comp') ORDER BY relevance desc;
```
This isn’t an easy thing to ‘solve’, as there is possibly no perfect solution. But there are lots of compromises…
There is the wordbreaker tool (Manticore Search Manual: Miscellaneous tools), which tries to split words statistically, based on how frequently the candidate parts occur.
In theory, you would run the query ‘AB Comp’ through wordbreaker, and it would suggest that AB is worth splitting, so you would then rewrite the query to, for example:
```
(AB | "A B") Comp
```
Or you could just do that ‘blindly’, i.e. don’t bother with wordbreaker and just split all short words.
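A minimal sketch of the ‘blind’ rewrite, in Python for illustration only. The rule that 2-3 letter all-uppercase words are abbreviations, and the names `looks_like_abbreviation` and `rewrite_query`, are hypothetical; a predicate driven by wordbreaker’s suggestions could be passed in instead. Note that the table above uses min_word_len='2', so single letters such as A and B are probably not indexed at all; the "A B" branch would likely also need min_word_len='1' to match anything.

```python
from typing import Callable


def looks_like_abbreviation(word: str) -> bool:
    """Hypothetical rule: 2-3 uppercase letters, e.g. 'AB' or 'ABC'."""
    return 2 <= len(word) <= 3 and word.isalpha() and word.isupper()


def rewrite_query(query: str,
                  should_split: Callable[[str], bool] = looks_like_abbreviation) -> str:
    """Rewrite e.g. 'AB Comp' as '(AB | "A B") Comp' in Manticore query syntax."""
    out = []
    for word in query.split():
        if should_split(word):
            spaced = " ".join(word)  # 'AB' -> 'A B'
            out.append(f'({word} | "{spaced}")')
        else:
            # Field operators like @company_name are not alphabetic, so they pass through.
            out.append(word)
    return " ".join(out)


print(rewrite_query("@company_name AB Comp"))
```

The rewritten string would then go into MATCH() as usual; the design keeps the original term as one alternative, so entries stored as ‘AB’ still match.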
…or you could try ‘normalizing’ abbreviations using regexp_filter. The idea would be, for example, to join all split abbreviations so they always match:
```
regexp_filter = \b([A-Z])\ ([A-Z])(?:\ ([A-Z]))?\b => \1\2\3
```
This would remove the spaces in two- and three-letter abbreviations and join them, and it could be extended to allow for longer ones. You could also have something similar to remove the periods in the table (i.e. to cope with ‘A.B. Company’).
As regexp_filter is run against BOTH the indexed text and the query, in theory queries should match in both directions: ‘AB’ finds ‘A B’, and ‘A B’ finds ‘AB’.
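To sanity-check the normalization idea outside Manticore, here is a rough Python emulation. Python’s `re` stands in for the RE2 engine behind regexp_filter, and `tokens`/`matches` are hypothetical, very simplified stand-ins for the tokenizer and for prefix matching (min_prefix_len plus expand_keywords). In this version the optional third letter is grouped with its preceding space; with an ungrouped optional space (`\ ?([A-Z])?`), ‘A B Company’ would come out as ‘ABCompany’, because the space before ‘Company’ gets swallowed.

```python
import re
from typing import List

# Python's re used as a stand-in for regexp_filter (Manticore uses RE2).
# "A.B." -> "AB": drop a period that follows a single capital letter.
STRIP_PERIODS = re.compile(r"\b([A-Z])\.")
# "A B" -> "AB", "A B C" -> "ABC"; the optional third letter carries its own space.
JOIN_LETTERS = re.compile(r"\b([A-Z])\ ([A-Z])(?:\ ([A-Z]))?\b")


def normalize(text: str) -> str:
    """Apply the same filters that would run on both indexed text and queries."""
    text = STRIP_PERIODS.sub(r"\1", text)
    return JOIN_LETTERS.sub(r"\1\2\3", text)  # an unmatched \3 becomes "" (Python 3.5+)


def tokens(text: str) -> List[str]:
    """Very rough tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(name: str, query: str) -> bool:
    """Every query word must be a prefix of some word in the normalized name."""
    doc = tokens(normalize(name))
    return all(any(t.startswith(q) for t in doc) for q in tokens(normalize(query)))


for name in ["AB Company", "A B Company", "A.B. Company", "A B Companies Group", "XY Company"]:
    print(name, "->", normalize(name), matches(name, "AB Comp"))
```

Because the same `normalize` runs on both sides, a query typed as ‘A B Comp’ also finds names stored as ‘AB …’. Note the character classes are uppercase-only, so lowercase input such as ‘a b company’ would not be joined.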
The other potentially interesting solution is to simply remove the spaces and index the name as one word.
E.g. store the company name without spaces, probably as well as the normal one! i.e. store ‘A B Company’ as something like ‘ABCompany’.
Then, when you issue a query (again with its spaces removed!), it will match ‘abcomp*’ regardless of whether the name was ‘AB’ or ‘A B’.
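A minimal Python sketch of the ‘no spaces’ idea. The helper names are hypothetical, and `search_squashed` only emulates a prefix query such as abcomp* against an extra field holding each name squashed into one word. One trade-off: as a single word, a prefix query only matches from the start of the name, so ‘Group AB Company’ would not match ‘abcomp*’ (infix search via min_infix_len would be needed for that).

```python
from typing import List


def squash(text: str) -> str:
    """Lowercase and remove all whitespace: 'A B Company' -> 'abcompany'."""
    return "".join(text.lower().split())


def search_squashed(names: List[str], query: str) -> List[str]:
    """Return names whose squashed form starts with the squashed query (like abcomp*)."""
    prefix = squash(query)
    return [name for name in names if squash(name).startswith(prefix)]


names = ["AB Company", "A B Company", "A B Companies Group", "AB Compressors ltd", "XY Company"]
print(search_squashed(names, "AB Comp"))
print(search_squashed(names, "A B Comp"))
```

Both queries return the same four names, since the spacing is removed on both sides before comparing.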
(Also, wordforms could help with mapping LTD <-> Limited.)
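As a rough illustration of what wordforms would do here: a wordforms file would contain a line along the lines of `ltd > limited`, and the mapping is applied to tokens at both indexing and query time. The dict and function below are a hypothetical Python emulation, not Manticore code.

```python
from typing import Dict, List

# Hypothetical mapping, mirroring a wordforms line such as "ltd > limited".
WORDFORMS: Dict[str, str] = {"ltd": "limited"}


def apply_wordforms(text: str) -> List[str]:
    """Lowercase, split on whitespace, and map each token through WORDFORMS."""
    return [WORDFORMS.get(tok, tok) for tok in text.lower().split()]


# Both spellings normalize to the same tokens, so either form matches the other.
print(apply_wordforms("AB Compressors Ltd"))
print(apply_wordforms("AB Compressors Limited"))
```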