Exact search doesn't return exact values

Hello,

I have a problem with exact search in Manticore, while Sphinx returns correct values.

This is how I created my real-time table:
CREATE TABLE ads_wf (
id BIGINT,
name TEXT indexed,
description TEXT indexed,
categoryname TEXT indexed
.
.
.
.
)
wordforms = '/etc/manticoresearch/wordforms.txt'
charset_table = '0..9, A..Z->a..z, _, a..z, U+017E->z, U+017D->z, U+0161->s, U+0160->s, U+0107->c, U+0106->c, U+010C->c, U+010D->c, U+0111->d, U+0110->d, U+0430->a, U+0431->b, U+0432->v, U+0433->g, U+0434->d, U+0452->d, U+0435->e, U+0436->z, U+0437->z, U+0438->i, U+0458->j, U+043A->k, U+043B->l, U+043C->m, U+043D->n, U+043E->o, U+043F->p, U+0440->r, U+0441->s, U+0442->t, U+045B->c, U+0443->u, U+0444->f, U+0445->h, U+0446->c, U+0447->c, U+045F->U+01C6, U+0448->s, U+0410->a, U+0411->b, U+0412->v, U+0413->g, U+0414->d, U+0402->d, U+0415->e, U+0416->z, U+0417->z, U+0418->i, U+0419->j, U+041A->k, U+041B->l, U+041C->m, U+041D->n, U+041E->o, U+041F->p, U+0420->r, U+0421->s, U+0422->t, U+040B->c, U+0423->u, U+0424->f, U+0425->h, U+0426->c, U+0427->c, U+040F->U+01C4, U+0428->s'
min_prefix_len = '3'
html_strip = '1'
blend_chars = '+, -, &, U+23'
prefix_fields = 'name,description'
expand_keywords = '1' - here I also tried 'exact'
index_exact_words = '1'
;

When I do an exact search for “ryzen 9 7900”, Manticore returns values like ryzen 9 7900 X, ryzen 9 7900X, ryzen 9 7900XT, ryzen 9 7900x and other irrelevant results, while Sphinx returns exact matches. The behavior is the same on my production server and in my local environment.

The query that I'm running from MySQL using SphinxSE is:

SELECT SQL_NO_CACHE id AS ad_id, weight AS relevance, sph.posted FROM mnt_ads_wf_rt AS sph WHERE sph.query = '(@(name,categoryName) \=\"ryzen 9 7900\");mode=extended2;fieldweights=name,8,description,3,categoryName,1,adIdTag,6,topTag,9,priorityTag,5,normalTag,0;sort=extended:WEIGHT() DESC,posted DESC;offset=0;limit=10000;maxmatches=19999;ranker=expr:(sum((4*…… the rest of ranker

Any idea what I'm doing wrong and what could fix this issue?

Thanks.

Hello

"ryzen 9 7900" is not an exact search. It’s a phrase search. Given that you have min_prefix_len and expand_keywords set, 7900 (as well as the other keywords) gets expanded to 7900 | 7900*. Don’t use expand_keywords if you don’t need it, or use the exact form modifier (see Manticore Search Manual: Searching > Full text matching > Operators).
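To make the expansion concrete, here is a toy model in Python. It is only a sketch of the idea, not Manticore's actual implementation: the function names `expand_keyword`, `term_matches` and `keyword_matches` are hypothetical, and the rule that keywords shorter than min_prefix_len get no `*` variant is an assumption based on the SHOW PLAN output later in this thread, where 9 has no starred variant.

```python
from typing import List


def expand_keyword(keyword: str, min_prefix_len: int = 3) -> List[str]:
    """Toy model of expand_keywords=1 with prefixes: keyword -> keyword | keyword*."""
    variants = [keyword]
    # Assumption: keywords shorter than min_prefix_len get no prefix variant.
    if len(keyword) >= min_prefix_len:
        variants.append(keyword + "*")
    return variants


def term_matches(variant: str, token: str) -> bool:
    """A trailing '*' means prefix match; otherwise the token must be equal."""
    if variant.endswith("*"):
        return token.startswith(variant[:-1])
    return token == variant


def keyword_matches(keyword: str, token: str, min_prefix_len: int = 3) -> bool:
    """A document token matches if any expanded variant of the keyword matches it."""
    return any(term_matches(v, token) for v in expand_keyword(keyword, min_prefix_len))


print(expand_keyword("7900"))             # ['7900', '7900*']
print(keyword_matches("7900", "7900xt"))  # True: matched through 7900*
print(keyword_matches("7900", "7900"))    # True: matched through the plain keyword
```

In this model the plain keyword 7900 matches the token 7900xt only because of the added 7900* variant, which is the effect described above.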

Hey, thank you for the response.

Maybe I wasn’t clear enough about what I want to achieve. I want to be able to search by keyword prefix with a minimum prefix length of 3. At the same time, I want to be able to use exact search.

Documentation says:
" Another use case is to prevent expanding a keyword to its *keyword* form. For example, with index_exact_words=1 + expand_keywords=1/star , bcd will find a document containing abcde , but =bcd will not."

To me, this reads as if, even though I’m using min_prefix_len and expand_keywords, setting index_exact_words=1 + expand_keywords=1/star means that ‘=ryzen 9 7900’ will find only ryzen 9 7900, but it returns ‘ryzen 9 7900X’. I completely understand it returns 7900 X but still don’t get why I’m getting 7900X.

If I still misunderstood this part of documentation, is there a way to implement exact and prefix search at the same time and make distinction between those?

Thanks.

“I completely understand it returns 7900 X but still don’t get why I’m getting 7900X.”

Because you have to use = before each keyword to match its exact form.

It obviously doesn’t work.

The current Sphinx query is:
SELECT group_concat(id) FROM sph_3_ads_wf_index AS sph WHERE sph.query = '(@(name,categoryName) \=\"ryzen 9 7900\");

The result I got:

I changed the query to: SELECT GROUP_CONCAT(id) FROM mnt_ads_wf_rt AS sph WHERE sph.query = '(@(name,categoryName) \=ryzen \=9 \=7900);

I’m getting items like “7900X3D”, and not only that but also items like “ZVER AMD Ryzen 9 5900X 12/24core 32Gb SSD480RX 7900 XT 20Gb”. I believe that’s because I’m not using the phrase search operator, so the words don’t have to be next to each other.
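The adjacency point can be illustrated with a small Python sketch. It is a simplified model, not how Manticore evaluates queries: the tokenizer just lowercases and splits on non-alphanumeric characters (ignoring charset_table, wordforms and blend_chars), and the helper names `tokenize`, `match_all` and `match_phrase` are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_all(keywords: List[str], text: str) -> bool:
    """Every keyword must occur as a whole token, anywhere in the text."""
    tokens = set(tokenize(text))
    return all(k in tokens for k in keywords)


def match_phrase(keywords: List[str], text: str) -> bool:
    """The keywords must occur as whole tokens, adjacent and in order."""
    tokens = tokenize(text)
    n = len(keywords)
    return any(tokens[i:i + n] == keywords for i in range(len(tokens) - n + 1))


query = ["ryzen", "9", "7900"]
doc = "ZVER AMD Ryzen 9 5900X 12/24core 32Gb SSD480RX 7900 XT 20Gb"

print(match_all(query, doc))     # True: each word is present somewhere
print(match_phrase(query, doc))  # False: the words are not adjacent
```

Under these assumptions the long item matches when each exact keyword only has to be present somewhere, and stops matching once the three words have to be adjacent, which is why exact forms and phrase matching are both needed here.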

I also tried combining = before each keyword with " " for phrase search, unfortunately without success. As I wrote in my first post, using the Sphinx syntax (\=\"ryzen 9 7900\") in Manticore doesn’t work, as the results contain 7900+something instead of only 7900.

I tried many combinations of configuration settings and SphinxSE query changes to get exact word forms together with phrase search, so that the words have to be next to each other, but it still doesn’t work.

I’ve reproduced the issue and it looks like a bug to me. The MRE is:

mysql> set profiling=1; drop table if exists t; create table t(f text) min_prefix_len='3' index_exact_words='1' expand_keywords='1'; insert into t values(1, 'ryzen 9 7900XT'), (2, 'ryzen 9 7900'); select * from t where match('=9 =7900'); show plan\G select * from t where match('"=9 =7900"'); show plan\G select * from t where match('="9 7900"'); show plan\G
--------------
set profiling=1
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table t(f text) min_prefix_len='3' index_exact_words='1' expand_keywords='1'
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
insert into t values(1, 'ryzen 9 7900XT'), (2, 'ryzen 9 7900')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from t where match('=9 =7900')
--------------

+------+--------------+
| id   | f            |
+------+--------------+
|    2 | ryzen 9 7900 |
+------+--------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

--------------
show plan
--------------

*************************** 1. row ***************************
Variable: transformed_tree
   Value: AND(
  AND(KEYWORD(=9, querypos=1)),
  AND(KEYWORD(=7900, querypos=2)))
1 row in set (0.00 sec)

--------------
 select * from t where match('"=9 =7900"')
--------------

+------+----------------+
| id   | f              |
+------+----------------+
|    2 | ryzen 9 7900   |
|    1 | ryzen 9 7900XT |
+------+----------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
show plan
--------------

*************************** 1. row ***************************
Variable: transformed_tree
   Value: PHRASE(
  OR(
    AND(KEYWORD(=9, querypos=1)),
    AND(KEYWORD(=9, querypos=1))),
  OR(
    AND(KEYWORD(=7900, querypos=2)),
    AND(KEYWORD(=7900*, querypos=2, expanded)),
    AND(KEYWORD(=7900, querypos=2))))
1 row in set (0.00 sec)

--------------
 select * from t where match('="9 7900"')
--------------

+------+----------------+
| id   | f              |
+------+----------------+
|    2 | ryzen 9 7900   |
|    1 | ryzen 9 7900XT |
+------+----------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
show plan
--------------

*************************** 1. row ***************************
Variable: transformed_tree
   Value: PHRASE(
  OR(
    AND(KEYWORD(=9, querypos=1)),
    AND(KEYWORD(=9, querypos=1))),
  OR(
    AND(KEYWORD(=7900, querypos=2)),
    AND(KEYWORD(=7900*, querypos=2, expanded)),
    AND(KEYWORD(=7900, querypos=2))))
1 row in set (0.00 sec)

As can be seen in the SHOW PLAN outputs, when = is combined with the phrase operator, either in the form of ="smth smth" or "=smth =smth", the exact form modifier is ignored and the keywords are still expanded. It works fine without the phrase operator.
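To show why the expanded variant changes the result set, here is a toy Python evaluation of the two plan shapes against the two rows of the MRE. It is a sketch under simplifying assumptions, not Manticore's matching code: tokens are lowercased and split on non-alphanumeric characters, a leading = is simply stripped, a trailing * is treated as a prefix match, and the names `variant_matches`, `phrase_matches`, `phrase_plan` and `strict_plan` are hypothetical.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def variant_matches(variant: str, token: str) -> bool:
    """Strip the leading '='; a trailing '*' means prefix match, otherwise equality."""
    term = variant.lstrip("=")
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term


def phrase_matches(positions: List[List[str]], text: str) -> bool:
    """Adjacent tokens must match position by position; each position is an OR of variants."""
    tokens = tokenize(text)
    n = len(positions)
    for start in range(len(tokens) - n + 1):
        if all(
            any(variant_matches(v, tokens[start + j]) for v in positions[j])
            for j in range(n)
        ):
            return True
    return False


# Variants per query position, copied from the PHRASE plans in the output above.
phrase_plan = [["=9", "=9"], ["=7900", "=7900*", "=7900"]]
# Hypothetical plan without the expanded variant, for comparison.
strict_plan = [["=9"], ["=7900"]]

docs: Dict[int, str] = {1: "ryzen 9 7900XT", 2: "ryzen 9 7900"}

print([i for i, d in docs.items() if phrase_matches(phrase_plan, d)])  # [1, 2]
print([i for i, d in docs.items() if phrase_matches(strict_plan, d)])  # [2]
```

In this model the single `=7900*` entry is enough to bring row 1 back into the phrase results, which is consistent with the two-row outputs above.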

I’ve created an issue about it on GitHub: “Exact form modifier is ignored when combined with phrase search” (Issue #2493, manticoresoftware/manticoresearch).

Thanks for pointing this out. We’ll discuss it with the dev team.