strange behavior with morphology_skip_fields

steven2612 · May 19, 2021, 3:51am

Hi,
I have a problem with morphology_skip_fields.

I have a search index for a book database, so there are fields for title, autor, isbn, …
I want morphology applied to the title field but not to the autor or isbn fields.
So I defined an index similar to this:

create table books (title text, autor text, isbn text) min_stemming_len='4' min_prefix_len='1' index_exact_words='1' min_infix_len='2' morphology='lemmatize_de_all' morphology_skip_fields='title,isbn';

I insert my records, for example:

insert into books values (0,'test1', 'autor1 test2', '1234567');
insert into books values (0,'test2', 'autor2', '2234567');

Then I search for the records:

select count(*) from books where match ('test1'); -> count 1 - expected 1
select count(*) from books where match ('test2'); -> count 1 - expected 2
select count(*) from books where match ('=test2'); -> count 2 - expected 2
select count(*) from books where match ('1234567'); -> count 0 - expected 1
select count(*) from books where match ('1234567*'); -> count 1 - expected 1
select count(*) from books where match ('=1234567'); -> count 1 - expected 1

If I don't use morphology_skip_fields='title,isbn', or if I use the option morphology=none in the select statement, the results are as expected.
Am I doing something wrong, or is this the expected behavior?
Why are there no results from the fields where morphology is skipped?

tomat · May 19, 2021, 6:51am

This looks like an indexing issue. Could you create a ticket on GitHub and include the info from this post?

barryhunter · May 19, 2021, 2:12pm

Not sure if it's applicable, but I have had problems using morphology_skip_fields before.

If a search query keyword gets morphed, i.e. the morphology processor gives it a different form, then it doesn't match the ‘skip’ fields.

I.e. the skip field stores the ‘unmorphed’ keyword in the index, while the other fields store the morphed keyword.
… so when you run the query, the keyword gets morphed and no longer matches the skipped column.

The ‘exact form’ query modifier works because it matches the unmorphed keyword in both fields.

A quick fix is perhaps expand_keywords: a query like match('test2') then gets expanded to include the exact-form modifier. So it has the exact form to match the ‘skipped’ column, and the morphed version to match the column with morphology.

Run your test with an index that has expand_keywords…

mysql> create table books2 (title text, autor text, isbn text) min_stemming_len='4' expand_keywords='1' min_prefix_len='1' index_exact_words='1' min_infix_len='2' morphology='lemmatize_de_all' morphology_skip_fields='title,isbn';
Query OK, 0 rows affected (0.01 sec)

mysql> insert into books2 values (0,'test1', 'autor1 test2', '1234567');
Query OK, 1 row affected (0.00 sec)

mysql> insert into books2 values (0,'test2', 'autor2', '2234567');
Query OK, 1 row affected (0.00 sec)

mysql> select count(*) from books where match ('test2');
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from books2 where match ('test2');
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

barryhunter · May 19, 2021, 2:20pm

mysql> show meta;
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| total         | 1       |
| total_found   | 1       |
| time          | 0.001   |
| keyword[0]    | *test2* |
| docs[0]       | 2       |
| hits[0]       | 2       |
| keyword[1]    | =test2  |
| docs[1]       | 2       |   <- matching both fields
| hits[1]       | 2       |
| keyword[2]    | test2   |
| docs[2]       | 1       |   <- only matching the unskipped field 
| hits[2]       | 1       |
+---------------+---------+
12 rows in set (0.00 sec)

steven2612 · May 19, 2021, 2:23pm

Thanks for the response!
I have opened an issue on GitHub.
@barryhunter, I will try expand_keywords. As I recall, I turned it off because search results were bad with it enabled, but I will test it.
At the moment I manually expand certain queries with “=” so I can do ISBN searches and similar.
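
For anyone who wants to try the same manual workaround, here is a minimal sketch of expanding a query with “=” on the client side before sending it to match(). The function name add_exact_forms is hypothetical and not part of Manticore or any client library. It assumes that rewriting a bare keyword w as (w | =w) is enough to reach both the skipped and the morphology-enabled fields. Unlike expand_keywords, it adds no wildcard form.

```python
import re

# Hypothetical helper, not part of Manticore or any client library.
# A "bare" keyword is a token made only of word characters, so tokens that
# already carry a modifier or wildcard (=word, word*, @field, ...) are skipped.
_BARE_WORD = re.compile(r"\w+")


def add_exact_forms(query: str) -> str:
    """Rewrite every bare keyword w in the query as (w | =w)."""
    parts = []
    for token in query.split():
        if _BARE_WORD.fullmatch(token):
            # Keep the morphed form and add the exact form as an alternative.
            parts.append(f"({token} | ={token})")
        else:
            # Leave tokens with operators or modifiers untouched.
            parts.append(token)
    return " ".join(parts)


if __name__ == "__main__":
    print(add_exact_forms("test2"))      # (test2 | =test2)
    print(add_exact_forms("1234567*"))   # 1234567*
```

The resulting string would then be passed as the match() argument in place of the original keywords. This is only a whitespace-based sketch, so quoted phrases and more complex operator syntax would need extra handling.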

barryhunter · May 19, 2021, 2:31pm

The auto expansion via expand_keywords could certainly cause issues, particularly as you have infix enabled, so it adds the wildcard expansion too, meaning you get automatic part-word matches.

Also, it could still be a bug; it does seem somewhat odd that you need the ‘exact-form’ operator to match the skipped field.