Fewer results than expected

We have indexed about 600k documents with the following type of text:

iso18872 % iso18872 % iso527-1 % iso527-1 % iso868 % iso868 % iso2409 % iso2409 % iso4017 % iso4017 % iso4032 % iso4032 % iso4892-11 % iso4892-11

When we search on ‘iso188*’, the example above is found but when we search on ‘iso*’ it’s not in our result set?

The expected result for us would be to find the document in both cases. Are we missing something?

PS: please ignore the duplicate values in the example, we will correct this as well. I lieft them in in case they are related to the issue.

I can’t reproduce your issue:

mysql> drop table if exists t; create table t(f text) min_infix_len='2'; insert into t values(1, 'iso18872 % iso18872 % iso527-1 % iso527-1 % iso868 % iso868 % iso2409 % iso2409 % iso4017 % iso4017 % iso4032 % iso4032 % iso4892-11 % iso4892-11'); select * from t where match('iso188*'); select * from t where match('iso*');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t(f text) min_infix_len='2'
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1, 'iso18872 % iso18872 % iso527-1 % iso527-1 % iso868 % iso868 % iso2409 % iso2409 % iso4017 % iso4017 % iso4032 % iso4032 % iso4892-11 % iso4892-11')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select * from t where match('iso188*')
--------------

+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| id   | f                                                                                                                                                 |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
|    1 | iso18872 % iso18872 % iso527-1 % iso527-1 % iso868 % iso868 % iso2409 % iso2409 % iso4017 % iso4017 % iso4032 % iso4032 % iso4892-11 % iso4892-11 |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

--------------
select * from t where match('iso*')
--------------

+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| id   | f                                                                                                                                                 |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
|    1 | iso18872 % iso18872 % iso527-1 % iso527-1 % iso868 % iso868 % iso2409 % iso2409 % iso4017 % iso4017 % iso4032 % iso4032 % iso4892-11 % iso4892-11 |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

Please provide more details or, best of a all, a similar reproducible example.

Hi Sergey,

Thank you for your response, I have added some of our settings below:

index index1
{
        type = plain

        html_strip = 1
        charset_table = non_cjk
        ngram_len = 1
        ngram_chars = cjk

        min_prefix_len = 1
        min_stemming_len = 4

        morphology = stem_en

        wordforms = /usr/local/etc/britishmapping.txt

        index_exact_words = 1

        index_sp = 1

        blend_chars= U+002C, ., /, -, ', :, [, ], (, )
        blend_mode = trim_none, trim_head, trim_tail, trim_both, skip_pure
}

Still can’t reproduce:

snikolaev@dev2:~$ cat cracanut.conf
searchd {
    listen = 9315:mysql41
    log = searchd.log
    pid_file = searchd.pid
    binlog_path =
}

source src {
    type = csvpipe
    csvpipe_command = echo "1,iso18872 % iso18872 % iso527-1 % iso527-1 % iso868 % iso868 % iso2409 % iso2409 % iso4017 % iso4017 % iso4032 % iso4032 % iso4892-11 % iso4892-11"
    csvpipe_field = f
}

index index1
{
        type = plain

        source = src
        path = /tmp/idx

        html_strip = 1
        charset_table = non_cjk
        ngram_len = 1
        ngram_chars = cjk

        min_prefix_len = 1
        min_stemming_len = 4

        morphology = stem_en

        wordforms = /usr/local/etc/britishmapping.txt

        index_exact_words = 1

        index_sp = 1

        blend_chars= U+002C, ., /, -, ', :, [, ], (, )
        blend_mode = trim_none, trim_head, trim_tail, trim_both, skip_pure
}
snikolaev@dev2:~$ indexer -c cracanut.conf --all
Manticore 6.2.13 56aaf1f55@24021713 dev (columnar 2.2.5 a5342a1@240217) (secondary 2.2.5 a5342a1@240217) (knn 2.2.5 a5342a1@240217)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

using config file '/home/snikolaev/cracanut.conf'...
indexing table 'index1'...
collected 1 docs, 0.0 MB
creating secondary index
creating lookup: 0.0 Kdocs, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 1 docs, 145 bytes
total 0.058 sec, 2464 bytes/sec, 16.99 docs/sec
total 3 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 15 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
snikolaev@dev2:~$ searchd -c cracanut.conf
Manticore 6.2.13 56aaf1f55@24021713 dev (columnar 2.2.5 a5342a1@240217) (secondary 2.2.5 a5342a1@240217) (knn 2.2.5 a5342a1@240217)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

[25:12.195] [1606558] using config file '/home/snikolaev/cracanut.conf' (858 chars)...
starting daemon version '6.2.13 56aaf1f55@24021713 dev (columnar 2.2.5 a5342a1@240217) (secondary 2.2.5 a5342a1@240217) (knn 2.2.5 a5342a1@240217)' ...
listening on all interfaces for mysql, port=9315
precaching table 'index1'
precached 1 tables in 0.002 sec
snikolaev@dev2:~$ mysql -P9315 -h0
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 6.2.13 56aaf1f55@24021713 dev (columnar 2.2.5 a5342a1@240217) (secondary 2.2.5 a5342a1@240217) (knn 2.2.5 a5342a1@240217) git branch master...origin/master

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select * from index1 where match('iso*');
+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| id   | f                                                                                                                                                 |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
|    1 | iso18872 % iso18872 % iso527-1 % iso527-1 % iso868 % iso868 % iso2409 % iso2409 % iso4017 % iso4017 % iso4032 % iso4032 % iso4892-11 % iso4892-11 |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

We are trying to reproduce the issue on a test instance but so far it’s only happening on our production server:

mysql> select id, descriptionstandards from applications_full where match('@descriptionstandards iso*') and id = 479459870 LIMIT 1;
Empty set (0.03 sec)

mysql> select id, descriptionstandards from applications_full where match('@descriptionstandards iso1*') and id = 479459870 LIMIT 1;
Empty set (0.71 sec)

mysql> select id, descriptionstandards from applications_full where match('@descriptionstandards iso18*') and id = 479459870 LIMIT 1;
+-----------+----------------------+
| id        | descriptionstandards |
+-----------+----------------------+
| 479459870 | iso18880 % iso18880  |
+-----------+----------------------+
1 row in set (0.01 sec)

Strangely, it does find some documents:

mysql> select id, descriptionstandards from applications_full where match('@descriptionstandards iso*') and id = 602382718 LIMIT 1;
+-----------+----------------------------------------------------------------------------------------+
| id        | descriptionstandards                                                                   |
+-----------+----------------------------------------------------------------------------------------+
| 602382718 | iso/iec23008-3 % iso23008-3 % iec23008-3 % iso/iec14496-24 % iso14496-24 % iec14496-24 |
+-----------+----------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

We did just notice that our expansion_limit is set to 100 instead of the default 0.
Are we correct in assuming this is most likely the cause given there are certainly more than 100 options for possible wildcard values.

If we change this setting to 0, what performance impact could we expect (we have several other large text fields)?

Yes
the expansion_limit could be the issue causes such behavior