I have a search index for a book database, with fields for title, autor, isbn, …
I want morphology applied to the title field but not to the autor or isbn fields.
So I defined an index similar to this:
create table books (title text, autor text, isbn text) min_stemming_len='4' min_prefix_len='1' index_exact_words='1' min_infix_len='2' morphology='lemmatize_de_all' morphology_skip_fields='title,isbn';
I insert my datasets, for example:
insert into books values (0,'test1', 'autor1 test2', '1234567');
insert into books values (0,'test2', 'autor2', '2234567');
Searching for the datasets:
select count(*) from books where match ('test1'); -> count 1 - expected 1
select count(*) from books where match ('test2'); -> count 1 - expected 2
select count(*) from books where match ('=test2'); -> count 2 - expected 2
select count(*) from books where match ('1234567'); -> count 0 - expected 1
select count(*) from books where match ('1234567*'); -> count 1 - expected 1
select count(*) from books where match ('=1234567'); -> count 1 - expected 1
If I don't use morphology_skip_fields='title,isbn', or if I use the option morphology=none in the select statement, the results are as expected.
Am I doing something wrong, or is this the expected behavior?
Why are there no results for datasets with morphology skipped?
Not sure if applicable, but I have had problems with using morphology_skip_fields before.
If the search query keyword gets morphed, i.e. the processor turns it into a different form, then it doesn't match the ‘skip’ fields.
I.e. the skip field stores the ‘unmorphed’ keyword in the index, while the other fields store the morphed keyword.
… so when you run the query, the keyword gets morphed and no longer matches the skipped column.
The ‘exact form’ query modifier works because it matches the unmorphed keyword in both fields.
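To make that explanation concrete, here is a toy sketch in Python. It is not how Manticore is implemented: `toy_morph` is a made-up stand-in for a lemmatizer, and the per-field "plain"/"exact" sets are my assumption about what the index effectively holds when index_exact_words is on. The sample words are hypothetical too, chosen so that the toy morphology actually changes them.

```python
# Toy model of the explanation above; NOT Manticore's actual implementation.

def toy_morph(token):
    # Hypothetical stand-in for a real lemmatizer: strip one trailing "s".
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def index_doc(doc, skip_fields):
    # Per field, keep the forms a plain lookup sees and the exact (raw) forms.
    stored = {}
    for field, text in doc.items():
        raw = set(text.lower().split())
        plain = raw if field in skip_fields else {toy_morph(t) for t in raw}
        stored[field] = {"plain": plain, "exact": raw}
    return stored

def count_matches(index, keyword):
    # "=word" looks up the raw form; a bare word is morphed first.
    if keyword.startswith("="):
        kind, needle = "exact", keyword[1:].lower()
    else:
        kind, needle = "plain", toy_morph(keyword.lower())
    return sum(
        any(needle in forms[kind] for forms in doc.values())
        for doc in index
    )

docs = [
    {"title": "gardens", "autor": "smith houses", "isbn": "1234567"},
    {"title": "houses", "autor": "jones", "isbn": "2234567"},
]
index = [index_doc(d, skip_fields={"title", "isbn"}) for d in docs]

print(count_matches(index, "houses"))   # 1: only the morphed autor field matches
print(count_matches(index, "=houses"))  # 2: the exact form matches both rows
```

This reproduces the 1-versus-2 pattern from your test for a word that the morphology changes. It does not by itself explain the numeric ISBN case, unless the real processor also alters that token, which I have not verified.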
A quick fix is perhaps expand_keywords; then a query like match('test2') gets expanded to include the exact-form modifier. So it has the exact form to match the ‘skipped’ column, and the morphed version to match the column with morphology.
Run your test with an index with expand_keywords…
mysql> create table books2 (title text, autor text, isbn text) min_stemming_len='4' expand_keywords='1' min_prefix_len='1' index_exact_words='1' min_infix_len='2' morphology='lemmatize_de_all' morphology_skip_fields='title,isbn';
Query OK, 0 rows affected (0.01 sec)
mysql> insert into books2 values (0,'test1', 'autor1 test2', '1234567');
Query OK, 1 row affected (0.00 sec)
mysql> insert into books2 values (0,'test2', 'autor2', '2234567');
Query OK, 1 row affected (0.00 sec)
mysql> select count(*) from books where match ('test2');
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from books2 where match ('test2');
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
Thanks for the response!
I have opened an issue on GitHub. @barryhunter - I will try expand_keywords, but as I recall I turned it off because search results were bad with expand_keywords enabled. I will test it again, though.
At the moment I manually expand certain queries with the “=” so I can do ISBN search and similar.
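For reference, the manual expansion looks roughly like the sketch below. It is a simplified illustration, not my production code: the function name `add_exact_form` is made up, and treating every digits-only token as ISBN-like is an assumption that would need adjusting for ISBNs with hyphens or a trailing X.

```python
def add_exact_form(query):
    """Prefix digits-only tokens with '=' so they are searched in exact form."""
    parts = []
    for tok in query.split():
        # Tokens that already carry '=' or '*' are not digits-only and pass through.
        parts.append("=" + tok if tok.isdigit() else tok)
    return " ".join(parts)

print(add_exact_form("1234567"))        # =1234567
print(add_exact_form("test2 1234567"))  # test2 =1234567
```

The rewritten string is then passed to match(...) as usual; all other keywords still go through the normal morphology.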
The auto expansion via expand_keywords could certainly cause issues, particularly as you have infix enabled, so it adds the wildcard expansion too, meaning you get automatic partial-word matches.
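As a rough illustration of why, the sketch below rewrites a query string into the shape the Manticore documentation gives for expand_keywords (keyword, starred keyword, exact form). The server does this internally; the function `sketch_expansion` and its `star`/`exact` flags are hypothetical names used only for this example.

```python
def sketch_expansion(query, star=True, exact=True):
    """Illustrative only: mimic the documented shape of keyword expansion."""
    expanded = []
    for kw in query.split():
        alternatives = [kw]
        if star:
            # Added when prefix/infix indexing is enabled on the index.
            alternatives.append(f"*{kw}*")
        if exact:
            # Added when morphology and index_exact_words are both enabled.
            alternatives.append(f"={kw}")
        expanded.append("( " + " | ".join(alternatives) + " )")
    return " ".join(expanded)

print(sketch_expansion("test2"))              # ( test2 | *test2* | =test2 )
print(sketch_expansion("test2", star=False))  # ( test2 | =test2 )
```

With min_infix_len='2' the starred alternative matches any indexed word containing the keyword, which is where the unexpected partial-word hits come from. If I remember the docs correctly, expand_keywords can also be limited to the exact form only, but check the documentation for your version before relying on that.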
Also, it could still be a bug; it does seem somewhat odd that you need the ‘exact-form’ operator to match the skipped field.