Is it possible to prevent morphology from modifying a specific word?

Can I create a list of phrases that will not have morphology applied to them?

I’m using Manticore Search 6.2.12, with the index defined with:

morphology              = stem_en
index_exact_words       = 1
min_stemming_len        = 4

While morphology works great overall, I’ve just discovered that this approach leads to an undesired transformation, united -> unit:

mysql> call keywords('united','idx');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | united    | unit       |
+------+-----------+------------+

Thanks!

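You can use the wordforms functionality for that. Stemming is not applied to words found in the wordforms list, so mapping a word to itself keeps it unchanged while other words are still stemmed.

Example: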

➜  ~ cat /tmp/wf
united => united

➜  ~ mysql -P9306 -h0 -v -e "drop table if exists t; create table t(f text) morphology='stem_en' index_exact_words='1' min_stemming_len='4' wordforms='/tmp/wf'; call keywords('united','t'); call keywords('uniting', 't');"
--------------
drop table if exists t
--------------

--------------
create table t(f text) morphology='stem_en' index_exact_words='1' min_stemming_len='4' wordforms='/tmp/wf'
--------------

--------------
call keywords('united','t')
--------------

+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | united    | united     |
+------+-----------+------------+
--------------
call keywords('uniting', 't')
--------------

+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | uniting   | unit       |
+------+-----------+------------+
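
If there are many words to protect, the identity mappings can be generated rather than typed by hand. The following is only a minimal sketch: `build_wordforms` is a hypothetical helper name, not part of Manticore, and it assumes one single-word entry per list item. It only produces the text of a wordforms file in the same `word => word` form as /tmp/wf above.

```python
def build_wordforms(words):
    """Return wordforms file content that maps each word to itself.

    Hypothetical helper: blank entries and repeated words are skipped,
    and the original order of the remaining words is preserved.
    """
    seen = set()
    lines = []
    for word in words:
        token = word.strip()
        if not token or token in seen:
            continue
        seen.add(token)
        # Identity mapping, same form as the "united => united" line above.
        lines.append(f"{token} => {token}")
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    # Example list of words that should bypass the stemmer.
    protected = ["united", "states", "united"]
    print(build_wordforms(protected), end="")
```

The output can be saved to the file referenced by the wordforms setting. The table then has to be recreated or rebuilt for the change to take effect, and call keywords can be used as above to check each protected word.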

Thanks Sergey, that’s exactly what I needed!

It works as expected. With the wordforms file containing

united > united

after an index rebuild I get:

mysql> call keywords('united','test1');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | united    | united     | // this word is not transformed
+------+-----------+------------+


mysql> call keywords('unitedly','test1');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | unitedly  | unit       | // this word is subjected to stemmer rules and transformed
+------+-----------+------------+