Thai Character Support

Andrew_Aculana · October 25, 2018, 10:03am

Hi,

How do I configure Thai support? What values should I use for “ngram_chars” and “charset_table”?

Thanks.

Sergey · October 25, 2018, 1:51pm

Hi. According to https://www.unicode.org/charts/PDF/U0E00.pdf, it should be:

ngram_chars = U+0E01..U+0E3A, U+0E3F..U+0E5B

As far as I know, the Thai language does not use spaces between words, so ngram_chars alone should be enough.
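
To double-check the ranges against the Unicode data without reading the PDF chart, here is a minimal Python sketch. It scans the Thai block (U+0E00 to U+0E7F) for assigned code points and formats them as an ngram_chars line. The helper names assigned_ranges and format_ngram_chars are made up for this illustration and are not part of Sphinx or Manticore. The result depends on the Unicode version bundled with your Python. Also, as far as I recall, ngram_len = 1 has to be set for n-gram indexing to take effect, so please verify that against the docs for your version.

```python
import unicodedata


def assigned_ranges(start, end):
    """Return inclusive (first, last) runs of named code points in [start, end]."""
    ranges = []
    run_start = None
    for cp in range(start, end + 1):
        # unicodedata.name() returns the default (None) for unassigned code points.
        named = unicodedata.name(chr(cp), None) is not None
        if named and run_start is None:
            run_start = cp
        elif not named and run_start is not None:
            ranges.append((run_start, cp - 1))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, end))
    return ranges


def format_ngram_chars(ranges):
    """Format ranges using the U+XXXX..U+YYYY range syntax of the config file."""
    parts = [f"U+{lo:04X}..U+{hi:04X}" for lo, hi in ranges]
    return "ngram_chars = " + ", ".join(parts)


# Thai block boundaries, taken from the Unicode chart linked above.
thai = assigned_ranges(0x0E00, 0x0E7F)
print(format_ngram_chars(thai))
```

With a current Python build this should print the same two ranges as listed above. Note that they include Thai digits and punctuation as well as letters, so trim them if you do not want those indexed as n-grams.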

meverona · January 29, 2019, 6:49am

Hi Sergey!

Can we now use charset_table ‘non_cjk’, and does it cover all languages in the world?

Sergey · January 29, 2019, 7:03am

Hi @meverona

Not all, but it should be enough in most cases. We’re updating the documentation to make it clear which languages are covered by the non_cjk and cjk aliases.

meverona · January 29, 2019, 7:53am

When I used Sphinx search, I had to create a big charset table myself, which was very difficult. I hope you get this right. I need support for these languages:


(EN) English
(DA) Dansk
(DE) Deutsch
(ES) Español
(FR) Français
(ID) Indonesia
(IT) Italiano
(HU) Magyar
(NL) Nederlands
(NB) Norsk
(PL) Polski
(PT) Português
(RO) Română
(FI) Suomi
(SV) Svenska
(VI) Tiếng Việt
(TR) Türkçe
(CS) Čeština
(EL) Ελληνικά
(RU) Русский
(AR) العربية
(TH) ภาษาไทย
(CN) 中文（简体）
(TW) 中文（繁體）
(JA) 日本語
(KO) 한국어

Many thanks for this feature!