Collation not working for mixed languages

I have a table of documents in multiple languages: English, Spanish, Bosnian, German, Finnish, and French. Some of the documents mix foreign-language names or text with English text. I am having an issue with collation not working on any of these documents. I am using the newest version, 5.0.2, and columnar version 1.15.4.

I’ve tried collation_server = utf8_general_ci and libc_ci, and for the specific index I’ve tried charset_table = non_cjk and
charset_table = non_cjk, U+00E4, U+00C4->U+00E4, U+00F6, U+00D6->U+00F6, U+00FC, U+00DC->U+00FC, U+00DF, U+1E9E->U+00DF
and as many variations as I could find in the documentation.

I’ve confirmed that the data in the MySQL table has the correct encoding; the corrupted accents appear only once I index it.

The following text is what is in the mysql table:
poljoprivredi i ekološkom otisku mesne industrije, stvorilo je potrebu za alternativama mesu proizvedenih bez životinja… U posljednjem

The following text is what I end up getting from manticore:
poljoprivredi i ekolo�kom otisku mesne industrije, stvorilo je potrebu za alternativama mesu proizvedenih bez �ivotinja. U posljednjem
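The replacement character (�) in output like this typically appears when bytes in a single-byte encoding are decoded as UTF-8 somewhere along the way. A minimal Python sketch of that effect, assuming (the thread does not confirm this) that the text travels as Windows-1250 bytes at some point:

```python
# Assumption: the text is encoded as Windows-1250 (cp1250) somewhere in the
# pipeline. This is an illustration of the symptom, not a confirmed diagnosis.
original = "ekološkom"

# In cp1250 the letter š is the single byte 0x9A.
raw = original.encode("cp1250")

# 0x9A on its own is not a valid UTF-8 sequence, so a UTF-8 decoder
# substitutes U+FFFD (the "diamond" replacement character).
garbled = raw.decode("utf-8", errors="replace")
print(garbled)

# When both sides agree on UTF-8, the text round-trips unchanged.
clean = original.encode("utf-8").decode("utf-8")
print(clean)
```

If the garbled string matches what the client displays, the mismatch is in how bytes are encoded or decoded, not in collation.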

Am I missing some configuration somewhere, either in searchd or in the index itself? The index is a plain index.

It would be better to provide a complete example, as collation affects only attribute filtering and sorting and has nothing to do with full-text matching.

It is not clear what is wrong based on your description.

Hello. Here is an example of the response I get back from Manticore. Accented characters are turning into the diamond icon.

Can you please provide an example in the following form?

➜  ~ mysql -P9306 -h0 -v
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.0.2 348514c86@220530 dev (columnar 1.15.4 2fef34e@220522) (secondary 1.15.4 2fef34e@220522) from tarball

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective

Reading history-file /Users/snikolaev/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table if exists t; create table t(f text); insert into t values(0,'ekološkom'); select * from t;
drop table if exists t

Query OK, 0 rows affected (0.00 sec)

create table t(f text)

Query OK, 0 rows affected (0.01 sec)

insert into t values(0,'ekološkom')

Query OK, 1 row affected (0.00 sec)

select * from t

| id                  | f          |
| 1514885734832537604 | ekološkom  |
1 row in set (0.00 sec)

As you can see there’s no problem in this case.

Attributes or fields from the docstore are not converted by the collation_server option; they are returned as-is in the result set strings.
It is up to your client to convert these.

I wanted to give an update. Tomat was right about the client changing the collation. I had to receive the Manticore response as text and then convert it to a JSON response to resolve the issue. Sergey, I am using plain indexing with the config file so that I can make queries through port 9308. Would my queries yield faster results if I used the pure MySQL interface?
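The client-side fix described above (take the response as text, then convert it to JSON) can be sketched in Python roughly as follows. This assumes the response body is UTF-8 encoded JSON; `raw_body` and `parse_response` are hypothetical stand-ins for illustration, and the payload shape is made up rather than actual Manticore output.

```python
import json

# Hypothetical bytes as read from the HTTP response on port 9308.
# The payload shape is invented for illustration only.
raw_body = '{"f": "ekološkom"}'.encode("utf-8")


def parse_response(body: bytes) -> dict:
    """Decode the body explicitly as UTF-8 text, then parse it as JSON."""
    text = body.decode("utf-8")
    return json.loads(text)


doc = parse_response(raw_body)
value = doc["f"]
print(value)
```

Decoding explicitly avoids relying on whatever default charset the HTTP client library would otherwise guess for the body.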

If you are issuing SQL queries to get the data, make sure you set the following:
sql_query_pre = SET NAMES utf8
sql_query_pre = SET CHARACTER_SET_RESULTS=utf8

Sergey I am using the plain indexing with the config file

Here’s a good config example I use when I need to provide a reproducible example which involves external mysql:

source min {
    type = mysql
    sql_host = localhost
    sql_user = test
    sql_pass =
    sql_db = test
    sql_query = select 1, 'dog' Doc, 1 group_id, 'red' color, 3.5 size
    sql_field_string = doc
    sql_attr_uint = group_id
    sql_attr_string = color
    sql_attr_float = size
}

index idx {
    path = idx
    source = min
}

searchd {
    listen =
    log = sphinx_min.log
    pid_file = /home/snikolaev/
    binlog_path =