Can I access index_field_lengths in the expression ranker?

barryhunter · April 4, 2025, 6:29pm

I kinda want to do

 option ranker=expr('sum(lcs=field_len)');

I.e. I want to favour documents where the ‘whole’ field matches the query.

But I want field_len to be the length of the specific field. I can do

>select id,tag,tag_len,weight() from tagsk where match('Mountain')
 option ranker=expr('sum(lcs=tag_len)');
+--------+--------------------------+---------+----------+
| id     | tag                      | tag_len | weight() |
+--------+--------------------------+---------+----------+
|      8 | Mountain                 | 1       |        1 |
|   1812 | Mountainside             | 1       |        1 |
|   3427 | mountains                | 1       |        1 |
|  19681 | Snowdon                  | 1       |        1 |
|  60516 | Goatfell                 | 1       |        1 |
|  60518 | Arkle                    | 1       |        1 |
|  60999 | tryfan                   | 1       |        1 |
|  73617 | mountaineers             | 1       |        1 |
|  76940 | mountaineer              | 1       |        1 |
|    167 | Divis Mountain           | 2       |        0 |
|   1661 | mountain railway         | 2       |        0 |
|   1832 | snowdon mountain railway | 3       |        0 |

Where tag is the field, I can access its _len attribute by name.

But sum() loops over all the fields, and I want the comparison to use the length of the specific matched field.

The above query demonstrates the issue: Snowdon (for example) is listed with a rank of 1, but has an lcs of zero against the tag field. It is some OTHER field whose lcs happens to equal tag_len! (caption is the matched field, and caption_len != 1.)
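To make the false positive concrete, here is a toy model in Python of what sum(lcs=tag_len) appears to be doing. It is only a sketch, not Manticore's implementation: the per-field lcs values are made up for illustration, and the function name is hypothetical.

```python
# Toy model of expr('sum(lcs=tag_len)'); NOT Manticore's implementation.
# sum() visits every field, but tag_len is a document-level attribute,
# so the same number is compared against each field's lcs.

def rank_sum_lcs_eq_tag_len(doc):
    """doc['lcs'] maps field name -> lcs for that field (0 if unmatched)."""
    return sum(1 for lcs in doc["lcs"].values() if lcs == doc["tag_len"])

# Hypothetical per-field lcs values for three documents.
mountain = {"tag_len": 1, "lcs": {"tag": 1, "caption": 0}}
snowdon = {"tag_len": 1, "lcs": {"tag": 0, "caption": 1}}
divis = {"tag_len": 2, "lcs": {"tag": 1, "caption": 1}}

print(rank_sum_lcs_eq_tag_len(mountain))  # 1: the tag field itself matched
print(rank_sum_lcs_eq_tag_len(snowdon))   # 1: false positive via caption
print(rank_sum_lcs_eq_tag_len(divis))     # 0: no field has lcs == 2
```

Snowdon scores 1 only because the caption's lcs happens to equal tag_len, which is the behaviour described above.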

(my real query is more complicated, this is a ‘simplified’ example)

I know there is a kind of workaround, using user_weight:

 option ranker=expr('sum(if(user_weight=10,lcs=tag_len,0))'), field_weights=(tag=10)

This works because user_weight is the weight of the specific field, so the expression can be sensitive to specific fields inside sum(). But it gets messy with lots of fields (nested IFs with lots of different weights); the above only works with one field.
I'm hoping for something simpler. I also don’t want the ‘side effect’ of changing the weight, e.g. in the internal ranking.
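For illustration, the nested-IF version of this workaround could be generated rather than written by hand. The Python sketch below only builds the option string. The helper name build_ranker_expr, the second field, and the marker weight 11 are assumptions for the example. Each field needs a distinct marker weight, and the weights should differ from whatever weight the unlisted fields carry.

```python
def build_ranker_expr(fields):
    """fields: list of (field_name, marker_weight) pairs, weights distinct.

    Assumes every listed field has a matching <field_name>_len attribute.
    """
    weights = [weight for _, weight in fields]
    if len(set(weights)) != len(weights):
        raise ValueError("each field needs a distinct marker weight")
    expr = "0"
    # Nest from the last field outwards so the first field is tested first.
    for name, weight in reversed(fields):
        expr = f"if(user_weight={weight},lcs={name}_len,{expr})"
    field_weights = ",".join(f"{name}={weight}" for name, weight in fields)
    return f"option ranker=expr('sum({expr})'), field_weights=({field_weights})"

# One field reproduces the option clause quoted above.
print(build_ranker_expr([("tag", 10)]))
# Two fields show how quickly the nesting grows.
print(build_ranker_expr([("tag", 10), ("caption", 11)]))
```

This does not remove the side effect mentioned above: the marker weights still change user_weight for anything else in the formula that uses it.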

tomat · April 4, 2025, 7:34pm

I'm sure it's in Searching > Sorting and ranking | Manticore Search Manual

The exact_hit field-level factor is what you need, i.e. a boolean set to true (1) when the query == the field, as the manual states.

tomat · April 4, 2025, 7:35pm

And the sph04 formula uses that factor in a way similar to what you need, as sum(... + exact_hit):

sph04 = sum((4*lcs+2*(min_hit_pos==1)+exact_hit)*user_weight)*1000+bm25
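As a worked example of that formula, here is the same arithmetic in Python. The factor values are hypothetical, chosen only to show how much the exact_hit term adds. This is a sketch of the formula as written above, not of the engine itself.

```python
def sph04(fields, bm25):
    """fields: per-field dicts with lcs, min_hit_pos, exact_hit, user_weight."""
    total = sum(
        (4 * f["lcs"] + 2 * (f["min_hit_pos"] == 1) + f["exact_hit"])
        * f["user_weight"]
        for f in fields
    )
    return total * 1000 + bm25

# Hypothetical factors: a one-word field that equals the query.
exact = {"lcs": 1, "min_hit_pos": 1, "exact_hit": 1, "user_weight": 1}
# The same word at the start of a longer field: no exact_hit bonus.
partial = {"lcs": 1, "min_hit_pos": 1, "exact_hit": 0, "user_weight": 1}

print(sph04([exact], bm25=500))    # (4 + 2 + 1) * 1000 + 500 = 7500
print(sph04([partial], bm25=500))  # (4 + 2 + 0) * 1000 + 500 = 6500
```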

barryhunter · April 4, 2025, 8:15pm

I’m actually wanting to compare the lcs with the field length, not with the query length.

It’s a kind of prospective search (for various reasons not using an actual percolate index)

It's actually used with ‘quorum’, to make it like a ‘match any’ query.

This is a more complete demo (still a bit simplified):

sphinxQL>select id,tag,tag_len,weight() from tagsk 
 where match('"looking towards black Mountains see sky"/1')
 option ranker=expr('sum(lcs=tag_len)*bm25'), field_weights=(tag=10), morphology=none;
+--------+---------------------+---------+----------+
| id     | tag                 | tag_len | weight() |
+--------+---------------------+---------+----------+
|  35260 | Black House         | 2       |      549 |
| 248464 | Black Notley        | 2       |      549 |
|   4164 | Black Mountains     | 2       |      545 |
|  28323 | Grove Farm          | 2       |      529 |
|  28419 | Bruisyard Wood      | 2       |      529 |
|  28449 | Martins Farm        | 2       |      529 |
|  28538 | Wolsey Cottages     | 2       |      529 |
|  28656 | Spexhall Crossroads | 2       |      529 |
|  29000 | Sun Corner          | 2       |      529 |
|  29039 | Ash Road            | 2       |      529 |
|  29295 | Fir Pits            | 2       |      529 |
|  29391 | Englishes Lane      | 2       |      529 |
|  29399 | High Street         | 2       |      529 |
|  29461 | Friday Street       | 2       |      529 |
|  29544 | Potash Farm         | 2       |      529 |
|  29564 | Little Glemham      | 2       |      529 |
|  29726 | Nollers Lane        | 2       |      529 |
|  29740 | Campsea Ashe        | 2       |      529 |
|  29849 | Wangford Road       | 2       |      529 |

sphinxQL>select id,tag,tag_len,weight() from tagsk
 where match('"looking towards black Mountains see sky"/1')
 option ranker=expr('sum(if(user_weight=10,lcs=tag_len,0))*bm25'), field_weights=(tag=10), morphology=none;
+--------+-------------------+---------+----------+
| id     | tag               | tag_len | weight() |
+--------+-------------------+---------+----------+
|   4164 | Black Mountains   | 2       |      545 |
|   3427 | mountains         | 1       |      525 |
|   6191 | Sky               | 1       |      524 |
|   4767 | black             | 1       |      519 |
|    410 | grey sky          | 2       |        0 |
|    447 | Black five        | 2       |        0 |
|    465 | blue              | 1       |        0 |
|   1060 | Air, Sky, Weather | 3       |        0 |
|   1932 | black and white   | 3       |        0 |
|   2494 | black death       | 2       |        0 |
|   3693 | black isle        | 2       |        0 |
|   3717 | blue sky          | 2       |        0 |
|   4591 | the Black Burn    | 3       |        0 |
|   5434 | see-saw           | 2       |        0 |

This second one is much better at picking out tags that actually exist in the ‘query’.
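The test being approximated here (is the whole tag contained in the query text?) can be sketched outside the engine. The Python below is a toy model under stated assumptions: a whitespace tokenizer, lcs taken as the longest contiguous run of shared words, and an exact_hit_like stand-in that simply compares the full word lists. None of it is Manticore's actual code.

```python
def tokens(text):
    # Crude tokenizer: lowercase and split on whitespace (an assumption;
    # the real index applies its own charset and morphology settings).
    return text.lower().split()

def lcs_words(query, field):
    """Length in words of the longest contiguous run shared by both."""
    q, f = tokens(query), tokens(field)
    best = 0
    for i in range(len(q)):
        for j in range(len(f)):
            k = 0
            while i + k < len(q) and j + k < len(f) and q[i + k] == f[j + k]:
                k += 1
            best = max(best, k)
    return best

def whole_tag_in_query(query, tag):
    # Toy equivalent of lcs == tag_len, evaluated for the tag field only.
    return lcs_words(query, tag) == len(tokens(tag))

def exact_hit_like(query, tag):
    # Toy stand-in for exact_hit: the query and the field are the same words.
    return tokens(query) == tokens(tag)

query = "looking towards black Mountains see sky"
for tag in ["Black Mountains", "mountains", "Sky", "grey sky", "Black five"]:
    print(tag, whole_tag_in_query(query, tag), exact_hit_like(query, tag))
```

In this model the first three tags pass the lcs == tag_len test and the last two fail, in line with the second result set, while the exact_hit stand-in is false for all of them because the query is longer than any tag.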