Implementing equivalent tokens


#1

I have a case where two terms may be used in a search (e.g. “sodium” and “natrium”), and I want results for both terms to be returned whenever either one is used.

I am aware that regexp could be used, but I wonder whether it is the best approach (I may have several hundred of these pairs in various languages), considering:

  • partial matches (e.g. a search for “sodi*”)
  • autocorrection (e.g. a search for “sosium” or “narium”)
  • case-independence
  • special/accented characters (UTF-8)

These are all complications that can come up.

Ideas?

Thanks
Roberto


#2

Could you use wordforms and map each pair to a single form?

?>cat wordforms.txt
sodium > natrium

This way, a query for either term will return documents containing either source term.
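
To show the idea behind this kind of mapping, here is a minimal Python sketch. It is a toy model, not Manticore code: the `WORDFORMS` dict, `canonical`, `index_docs` and `search` are hypothetical names made up for illustration. It assumes the same mapping is applied to tokens both when indexing and when querying, so both source terms end up as one indexed form.

```python
# Toy illustration of a wordforms-style mapping (hypothetical, not Manticore internals).
WORDFORMS = {"sodium": "natrium"}  # same pair as in the wordforms.txt example

def canonical(token: str) -> str:
    """Return the mapped form of a token, or the token itself if unmapped."""
    return WORDFORMS.get(token, token)

def index_docs(docs: dict) -> dict:
    """Build a tiny inverted index keyed by canonical token."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(canonical(token), set()).add(doc_id)
    return index

def search(index: dict, term: str) -> list:
    """Look up a single term after applying the same mapping as at index time."""
    return sorted(index.get(canonical(term), set()))

docs = {1: "sodium chloride", 2: "natrium chloride", 3: "potassium chloride"}
index = index_docs(docs)
```

In this toy model, `search(index, "sodium")` and `search(index, "natrium")` both hit the same index entry, which is why either term finds documents 1 and 2.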


#3

For info on autocorrection, see this course: https://play.manticoresearch.com/didyoumean/

partial matches (e.g. a search for “sodi*”)
Read about infixes and prefixes in the docs (https://docs.manticoresearch.com/).

case-independence
special/accented characters

charset_table = non_cjk should take care of both.
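
As a rough picture of what character folding means here, the Python sketch below lowercases a token and strips combining accents. It is only an analogy: `fold` is a hypothetical helper, and the exact set of characters that `non_cjk` maps is defined by Manticore, not by this code.

```python
import unicodedata

def fold(token: str) -> str:
    """Case-fold a token and drop combining marks (accents)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With folding like this applied to both documents and queries, “Sódium”, “SODIUM” and “sodium” would all be treated as the same token.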


#4

The question may seem simple, but as there is more than one way to implement what I want, I also need to know when the synonym mapping is applied. For example, is it applied before or after the charset_table mapping? This determines whether case-independence is already taken care of or not.

I hope this explains it.
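
To make the ordering concern concrete, here is a toy Python sketch. It does not claim which order Manticore uses; `lowercase`, `map_synonym` and the two pipeline functions are hypothetical names, and lowercasing stands in for whatever character mapping is configured.

```python
# Toy pipeline showing why the order of the two steps matters (not Manticore code).
WORDFORMS = {"sodium": "natrium"}  # mapping written in lowercase only

def lowercase(token: str) -> str:
    """Stand-in for the character mapping step."""
    return token.lower()

def map_synonym(token: str) -> str:
    """Stand-in for the synonym mapping step (exact-match lookup)."""
    return WORDFORMS.get(token, token)

def fold_then_map(token: str) -> str:
    # Character mapping first: "Sodium" becomes "sodium" and then matches the entry.
    return map_synonym(lowercase(token))

def map_then_fold(token: str) -> str:
    # Synonym lookup first: "Sodium" does not match the lowercase entry.
    return lowercase(map_synonym(token))
```

In this sketch, `fold_then_map("Sodium")` gives "natrium", while `map_then_fold("Sodium")` gives "sodium", so in the second order the mapping file would need an entry for every case variant.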
Roberto