strange behavior with morphology_skip_fields

Hi,
I have a problem with morphology_skip_fields.

I have a search index for a book database, so there are fields for title, autor, isbn, …
I want morphology applied to the title field but not to the autor or isbn fields,
so I defined an index similar to this:

create table books (title text, autor text, isbn text) min_stemming_len='4' min_prefix_len='1' index_exact_words='1' min_infix_len='2' morphology='lemmatize_de_all' morphology_skip_fields='title,isbn';

I insert my records, for example:

insert into books values (0,'test1', 'autor1 test2', '1234567');
insert into books values (0,'test2', 'autor2', '2234567');

Searching for the records:

select count(*) from books where match ('test1'); -> count 1 - expected 1
select count(*) from books where match ('test2'); -> count 1 - expected 2
select count(*) from books where match ('=test2'); -> count 2 - expected 2
select count(*) from books where match ('1234567'); -> count 0 - expected 1
select count(*) from books where match ('1234567*'); -> count 1 - expected 1
select count(*) from books where match ('=1234567'); -> count 1 - expected 1

If I don't use morphology_skip_fields='title,isbn', or if I use the option morphology=none in the select statement, the results are as expected.
Am I doing something wrong, or is this the expected behavior?
Why are there no results for records whose morphology is skipped?
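
For reference, the per-query override mentioned above can be written as follows. This is only a sketch of the syntax against the books table defined earlier; the counts returned depend on the data and the server version.

```sql
-- Disable morphology for this one query, so the keyword is looked up in its exact form.
select count(*) from books where match('1234567') option morphology=none;
```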

This seems to be an indexing issue. Could you create a ticket on GitHub and include the info from this post?

Not sure if applicable, but I have had problems with using morphology_skip_fields before.

If a search query keyword gets morphed, i.e. the morphology processor turns it into a different form, then it doesn't match the ‘skip’ fields.

I.e. a skipped field stores the ‘unmorphed’ keyword in the index, while the other fields store the morphed keyword.
… so when you run the query, the keyword gets morphed and no longer matches the skipped column.

The ‘exact form’ query modifier works because it matches the unmorphed form in both fields (with index_exact_words='1', the fields with morphology also store the original form).

A quick fix is perhaps expand_keywords: a query like match('test2') then gets expanded to include the exact-form modifier. So it has the exact form to match the ‘skipped’ column, and the morphed version to match the column with morphology.

Run your test with an index that has expand_keywords enabled…

mysql> create table books2 (title text, autor text, isbn text) min_stemming_len='4' expand_keywords='1' min_prefix_len='1' index_exact_words='1' min_infix_len='2' morphology='lemmatize_de_all' morphology_skip_fields='title,isbn';
Query OK, 0 rows affected (0.01 sec)

mysql> insert into books2 values (0,'test1', 'autor1 test2', '1234567');
Query OK, 1 row affected (0.00 sec)

mysql> insert into books2 values (0,'test2', 'autor2', '2234567');
Query OK, 1 row affected (0.00 sec)

mysql> select count(*) from books where match ('test2');
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from books2 where match ('test2');
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> show meta;
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| total         | 1       |
| total_found   | 1       |
| time          | 0.001   |
| keyword[0]    | *test2* |
| docs[0]       | 2       |
| hits[0]       | 2       |
| keyword[1]    | =test2  |
| docs[1]       | 2       |   <- matching both fields
| hits[1]       | 2       |
| keyword[2]    | test2   |
| docs[2]       | 1       |   <- only matching the unskipped field 
| hits[2]       | 1       |
+---------------+---------+
12 rows in set (0.00 sec)

Thanks for the response!
I have opened an issue on GitHub.
@barryhunter - I will try expand_keywords. As far as I remember, I turned it off because the search results were bad with expand_keywords enabled, but I will test it.
At the moment I manually expand certain queries with the “=” so I can do ISBN searches and similar.
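
A minimal sketch of that manual expansion, assuming the books table from the first post: the plain keyword and its exact form are combined with the OR operator, so either the morphed or the unmorphed form can match. The @isbn field restriction in the second query is an illustrative addition, not something taken from the actual application.

```sql
-- Match either the morphed keyword or its exact (unmorphed) form.
select count(*) from books where match('test2 | =test2');

-- ISBN lookup: search only the isbn field, using the exact form.
select count(*) from books where match('@isbn =1234567');
```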

The automatic expansion via expand_keywords could certainly cause issues, particularly since you have infix enabled, so it adds the wildcard expansion too, meaning you get automatic partial-word matches.
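
If the wildcard part is the problem, one option that may be worth testing is restricting the expansion to the exact form only. The sketch below assumes a Manticore version where expand_keywords accepts the value 'exact' (check the documentation for your version); books3 is a hypothetical table name used only for this test.

```sql
-- Same settings as books2, but expand only to the exact form, not to *keyword* wildcards.
create table books3 (title text, autor text, isbn text) min_stemming_len='4' expand_keywords='exact' min_prefix_len='1' index_exact_words='1' min_infix_len='2' morphology='lemmatize_de_all' morphology_skip_fields='title,isbn';
```

Running show meta after a query against this table shows which expanded forms were actually used.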

Also, it could still be a bug; it does seem somewhat odd that you need the ‘exact-form’ operator to match the skipped field.