Thai Character Support


#1

Hi,

How do I configure Thai support? What should “ngram_chars” and “charset_table” be set to?

Thanks.


#2

Hi. According to https://www.unicode.org/charts/PDF/U0E00.pdf, it should be:

ngram_chars = U+0E01..U+0E3A, U+0E3F..U+0E5B

As far as I know, Thai does not use spaces between words, so ngram_chars alone should be enough.
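To sanity-check that a range like the one above really covers Thai text, here is a small Python sketch. It is not part of Sphinx or Manticore. It assumes the value uses only U+XXXX code points and A..B ranges (it does not understand "->" mappings or aliases such as non_cjk). The unigram split is a simplification of what ngram indexing does when ngram_chars is paired with ngram_len = 1: every covered character becomes its own token. The helper names are hypothetical.

```python
# Hypothetical helpers for illustration only; not a Sphinx/Manticore API.
# The value below is the ngram_chars setting suggested in this thread.
THAI_NGRAM_CHARS = "U+0E01..U+0E3A, U+0E3F..U+0E5B"


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse 'U+XXXX..U+YYYY, U+ZZZZ' into inclusive (start, end) code point pairs.

    Assumption: only U+XXXX code points and '..' ranges appear in spec.
    """
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        lo, _, hi = item.partition("..")
        hi = hi or lo  # a single code point is a one-element range
        ranges.append((int(lo.strip()[2:], 16), int(hi.strip()[2:], 16)))
    return ranges


def uncovered_chars(text: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Return the characters of text that fall outside every range."""
    return [ch for ch in text if not any(lo <= ord(ch) <= hi for lo, hi in ranges)]


def unigram_split(text: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Rough model of ngram_len = 1: each covered character is its own token.

    Simplification: characters outside the ranges are simply dropped here.
    """
    bad = set(uncovered_chars(text, ranges))
    return [ch for ch in text if ch not in bad]


if __name__ == "__main__":
    ranges = parse_ranges(THAI_NGRAM_CHARS)
    print(uncovered_chars("ภาษาไทย", ranges))  # []
    print(unigram_split("ภาษาไทย", ranges))
```

An empty list from uncovered_chars means every character of the sample would be picked up by the ngram_chars setting; anything it returns (for example Latin letters or spaces) would have to come from charset_table instead.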


#3

Hi Sergey!

So now we can use charset_table = non_cjk, and it covers every language in the world?


#4

Hi @meverona

Not all, but it should be enough in most cases. We’re updating the documentation to make clear which languages the non_cjk and cjk aliases cover.


#5

When I used Sphinx Search, I had to build a big charset table myself, which was very difficult. I hope you get this right. I need support for these languages:

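(EN) English
(DA) Dansk (Danish)
(DE) Deutsch (German)
(ES) Español (Spanish)
(FR) Français (French)
(ID) Indonesia (Indonesian)
(IT) Italiano (Italian)
(HU) Magyar (Hungarian)
(NL) Nederlands (Dutch)
(NB) Norsk (Norwegian)
(PL) Polski (Polish)
(PT) Português (Portuguese)
(RO) Română (Romanian)
(FI) Suomi (Finnish)
(SV) Svenska (Swedish)
(VI) Tiếng Việt (Vietnamese)
(TR) Türkçe (Turkish)
(CS) Čeština (Czech)
(EL) Ελληνικά (Greek)
(RU) Русский (Russian)
(AR) العربية (Arabic)
(TH) ภาษาไทย (Thai)
(CN) 中文(简体) (Chinese, Simplified)
(TW) 中文(繁體) (Chinese, Traditional)
(JA) 日本語 (Japanese)
(KO) 한국어 (Korean)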

Thanks a lot for this feature!