pQd
February 19, 2024, 8:35am
Can I create a list of phrases that will not have morphology applied to them?
I’m using Manticore Search 6.2.12, with the index defined as:
morphology = stem_en
index_exact_words = 1
min_stemming_len = 4
While morphology works great overall, I’ve just discovered that this setup leads to an undesired transformation, united → unit:
mysql> call keywords('united','idx');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | united | unit |
+------+-----------+------------+
Thanks!
Sergey
February 20, 2024, 4:37am
You can use the wordforms functionality for that:
Example:
➜ ~ cat /tmp/wf
united => united
➜ ~ mysql -P9306 -h0 -v -e "drop table if exists t; create table t(f text) morphology='stem_en' index_exact_words='1' min_stemming_len='4' wordforms='/tmp/wf'; call keywords('united','t'); call keywords('uniting', 't');"
--------------
drop table if exists t
--------------
--------------
create table t(f text) morphology='stem_en' index_exact_words='1' min_stemming_len='4' wordforms='/tmp/wf'
--------------
--------------
call keywords('united','t')
--------------
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | united | united |
+------+-----------+------------+
--------------
call keywords('uniting', 't')
--------------
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | uniting | unit |
+------+-----------+------------+
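Because the question asks for a *list* of words to protect, here is a minimal sketch, assuming a plain text file with one word per line, that turns such a list into identity mappings in the same `word => word` format used in `/tmp/wf` above. The function name `make_exceptions` and the input path `/tmp/protected.txt` are hypothetical. Lowercasing assumes the table uses the default case folding, and each line is treated as a single token (multi-word phrases are not handled here).

```shell
#!/bin/sh
# Hypothetical helper: build Manticore wordforms lines of the form
# "word => word" from a list of words that must NOT be stemmed.
# Input: one word per line; blank lines and lines starting with '#' are skipped.
# Assumption: the table folds case (default charset_table), so words are lowercased.
make_exceptions() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | sort -u \
    | awk '{ print $0 " => " $0 }'
}

# Example usage (hypothetical paths):
#   make_exceptions /tmp/protected.txt > /tmp/wf
```

Words that are not in the list are still stemmed as usual (compare `uniting → unit` above), and the table has to be rebuilt after the wordforms file changes before `call keywords` reflects the new mappings.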
pQd
February 20, 2024, 7:49am
Thanks Sergey, that’s exactly what I needed!
It works as expected. With the wordforms file containing
united > united
and after rebuilding the index, I get:
mysql> call keywords('united','test1');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | united | united | // this word is not transformed
+------+-----------+------------+
mysql> call keywords('unitedly','test1');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | unitedly | unit | // this word is subjected to stemmer rules and transformed
+------+-----------+------------+