Searching ignoring accents


#1

Hello,
Is there any way to make a search ignore accents?
For example: Jose or José, María or Maria, etc.

I’ve set collation_libc_locale = es_ES and restarted the Manticore service, but there is no difference.
It looks like it only affects GROUP BY and ORDER BY (https://manticoresearch.gitlab.io/dev/searching/collations.html#collations)

Thanks in advance.


#2

As the documentation says, collation affects only string comparisons; it doesn’t affect full-text matching. For that you need to adjust charset_table. What do you have in the charset_table directive?


#3

charset_table = non_cjk

mysql> insert into idx_min(id,f) values(1, 'José María');
Query OK, 1 row affected (0.01 sec)

mysql> select * from idx_min where match('José');
+------+------+
| id   | a    |
+------+------+
|    1 |    0 |
+------+------+
1 row in set (0.00 sec)

mysql> select * from idx_min where match('Jose');
+------+------+
| id   | a    |
+------+------+
|    1 |    0 |
+------+------+
1 row in set (0.00 sec)

mysql> select * from idx_min where match('Maria');
+------+------+
| id   | a    |
+------+------+
|    1 |    0 |
+------+------+
1 row in set (0.00 sec)

#4

Thanks, but I can’t find where to set the charset_table configuration path.
The documentation here doesn’t say where it is: https://docs.manticoresearch.com/latest/html/conf_options_reference/index_configuration_options.html?highlight=charset_table#
Any more help?
Thanks in advance.


#5

charset_table is not a path directive. It is an index setting that declares which characters are indexed and how they are mapped, for example:

charset_table = 0..9, A..Z->a..z, _, a..z, \
    U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451

This means digits, Latin letters, _ and Russian characters are indexed (anything else is replaced with whitespace). Uppercase letters are also mapped to lowercase, so Apple and apple match the same documents.
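
To make the rule list above more concrete, here is a rough Python sketch of how such a table could be read and applied. It is only an illustration of the semantics described above, not Manticore's actual tokenizer; the names build_table, tokenize and TABLE are made up for this example.

```python
# Illustrative only: a toy reading of charset_table rules, not Manticore code.
CHARSET_TABLE = (
    "0..9, A..Z->a..z, _, a..z, "
    "U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+451, U+451"
)

def parse_char(token):
    """Turn 'a', '_' or 'U+410' into a code point."""
    token = token.strip()
    if token.upper().startswith("U+"):
        return int(token[2:], 16)
    return ord(token)

def parse_range(spec):
    """Return (first, last) code points for 'a', 'a..z' or 'U+410..U+42F'."""
    if ".." in spec:
        start, end = spec.split("..")
        return parse_char(start), parse_char(end)
    code_point = parse_char(spec)
    return code_point, code_point

def build_table(charset_table):
    """Map every accepted code point to the code point it is indexed as."""
    table = {}
    for rule in charset_table.split(","):
        rule = rule.strip()
        if not rule:
            continue
        src, _, dst = rule.partition("->")
        src_lo, src_hi = parse_range(src)
        # Without '->' a character maps to itself.
        dst_lo = parse_range(dst)[0] if dst else src_lo
        for offset in range(src_hi - src_lo + 1):
            table[src_lo + offset] = dst_lo + offset
    return table

def tokenize(text, table):
    """Apply the mapping; characters outside the table become whitespace."""
    folded = "".join(
        chr(table[ord(c)]) if ord(c) in table else " " for c in text
    )
    return folded.split()

TABLE = build_table(CHARSET_TABLE)
print(tokenize("Apple_Pie 42!", TABLE))  # ['apple_pie', '42']
print(tokenize("José", TABLE))           # ['jos']
```

The last line shows why accents matter with this particular table: é is not listed, so it is treated as a separator and José is indexed as jos.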

You can use the newly added non_cjk alias. It is a map covering most non-CJK Unicode characters, and it includes mapping accented letters to their plain Latin form:

charset_table = non_cjk

(if you want to see the non_cjk mapping, it can be found here: https://github.com/manticoresoftware/manticoresearch/blob/master/src/charsets/non_cjk.txt)
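
If it helps to picture what "mapping accents to latin form" does to a query, here is a small Python approximation. It uses Unicode decomposition to strip diacritics, which is an assumption made for illustration: non_cjk itself is an explicit mapping table and may differ for individual characters. The function name fold_accents is hypothetical.

```python
import unicodedata

def fold_accents(text):
    """Approximate accent folding: decompose, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

print(fold_accents("José María"))                    # jose maria
print(fold_accents("Jose") == fold_accents("José"))  # True
```

With a mapping like this applied both at indexing time and at query time, Jose and José end up as the same indexed term, which is why both queries match the same document.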