I am trying out the new ICU-based Chinese morphology available in version 3.1.
I previously used ngram_len=1 and ngram_chars = cjk.
My question is: if I set morphology = icu_chinese, do I need to disable ngram?
What about other languages, such as Japanese and Korean?
Yes, just use
charset_table = cjk, non_cjk
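As a rough sketch, the settings discussed in this thread would sit in the index definition roughly like this. Only the directives already mentioned above are shown; everything else about the index (name, path, fields) is left out, and the commented-out lines are the earlier ngram setup, assumed to be removed when switching to ICU:

```
# inside the index definition (other settings omitted)
charset_table = cjk, non_cjk
morphology    = icu_chinese

# previous ngram-based setup, dropped when switching to ICU segmentation
# ngram_len   = 1
# ngram_chars = cjk
```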
BTW, here’s an interactive course on the basics of that: https://play.manticoresearch.com/icu-chinese/
Only Chinese is supported by the ICU morphology for now.
I think my setup is a bit more complicated: I need to support all CJK languages (not just Chinese). I have seen the course, but I need to understand how I can also handle Japanese and Korean if I disable ngram.
Can I use morphology for Chinese and ngram for Japanese and Korean?
Which setting takes precedence? I need a bit of insight into how these settings are actually handled during the indexing phase.
Can you suggest an optimal setup to handle all CJK languages and not just Chinese?