HIGHLIGHT Confusion

I have been a little confused by the manual regarding HIGHLIGHT() options.

The limit option seems to apply to the combined length of ALL snippets (excluding the separator characters), not to each individual snippet. Thus, if it is set too small, the number of snippets you get is reduced and, worse, the limit_snippets and limits_per_field options have no effect.
HIGHLIGHT({limit=10, limit_snippets=5}) - one ridiculously short snippet
HIGHLIGHT({limit=100, limit_snippets=5}) - two passable snippets
HIGHLIGHT({limit=1000, limit_snippets=5}) - A correct number of 5 snippets of a reasonable size
HIGHLIGHT({limit=2000, limit_snippets=5}) - in some cases where there is just one snippet, that snippet can be ridiculously long

Is this by design? Was the limit option meant to refer to the total length of all snippets, or to the length of each individual snippet (which is how the documentation reads to me)?

At the current time I have no control over an individual snippet length. One snippet was 530 chars long!
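
For anyone who wants to check this on their own data, here is a rough Python sketch that measures what HIGHLIGHT() returned: how many snippets there are, how long each one is, and the combined length. It assumes the default <b>...</b> match markers and the default "..." separator; snippet_stats is just a helper name made up for this example, not part of any Manticore client.

```python
import re

# Assumption: matches are wrapped in the default <b>...</b> markers.
TAG = re.compile(r"</?b>")

def snippet_stats(highlighted, separator="..."):
    """Count snippets and measure their visible length in codepoints."""
    # Split on the separator and drop the empty pieces at the edges.
    parts = [p.strip() for p in highlighted.split(separator)]
    # Measure each snippet without its highlight markers.
    lengths = [len(TAG.sub("", p)) for p in parts if p]
    return {"snippets": len(lengths), "lengths": lengths, "total": sum(lengths)}

# Sample input shaped like typical HIGHLIGHT() output.
sample = " ...  b <b>c</b> d  ...  m <b>n</b> o p ...  "
print(snippet_stats(sample))  # {'snippets': 2, 'lengths': [5, 7], 'total': 12}
```

Comparing "total" against the limit value, and "lengths" against what you would expect per snippet, should show which of the two the limit is actually constraining.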

Server version:
6.2.12 dc5144d35@230822

Yes, this is by design. I believe the idea was that when displaying search results, it’s most important for the results to have similar lengths, so this takes precedence over other conflicting parameters, e.g., the number of snippets:

select highlight({limit=15}) from t where match('c n')
--------------

+----------------------------------------------+
| highlight({limit=15})                        |
+----------------------------------------------+
|  ...  b <b>c</b> d  ...  m <b>n</b> o p ...  |
+----------------------------------------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

However,

select highlight({limit=15}) from t where match('c n y j ')
--------------

+-----------------------------------------+
| highlight({limit=15})                   |
+-----------------------------------------+
|  ... h i <b>j</b> k l m <b>n</b> o ...  |
+-----------------------------------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

We see that c and y are not highlighted at all.

force_all_words changes this behavior:

select highlight({limit=15,force_all_words=1}) from t where match('c n y j')
--------------

+-------------------------------------------------------------------------------------+
| highlight({limit=15,force_all_words=1})                                             |
+-------------------------------------------------------------------------------------+
|  ... <b>c</b> d e f g h i <b>j</b> k l m <b>n</b> o p q r s t u v w x <b>y</b> ...  |
+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---

But in this case, you may need to post-process the result in your application to split it into separate snippets.
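
A minimal sketch of what such post-processing could look like in Python. This is only an illustration, not anything Manticore provides: split_into_snippets and its around parameter are hypothetical names, and it assumes the default <b>...</b> markers, one highlighted word per marker pair, and that the source text itself contains no literal "...".

```python
import re

# Assumption: each highlighted word is wrapped in its own <b>...</b> pair.
MATCH = re.compile(r"<b>.*?</b>")

def split_into_snippets(highlighted, around=2):
    """Keep `around` words of context on each side of every highlighted word."""
    # Drop the "..." separators and work on a flat list of words.
    words = highlighted.replace("...", " ").split()
    hits = [i for i, w in enumerate(words) if MATCH.search(w)]
    windows = []  # list of [start, end) word ranges
    for i in hits:
        start = max(0, i - around)
        end = min(len(words), i + around + 1)
        if windows and start <= windows[-1][1]:
            # Overlaps or touches the previous window: merge them.
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [" ".join(words[s:e]) for s, e in windows]

forced = (" ... <b>c</b> d e f g h i <b>j</b> k l m <b>n</b> "
          "o p q r s t u v w x <b>y</b> ... ")
for snippet in split_into_snippets(forced, around=2):
    print(snippet)
```

With around=2, the force_all_words output shown above is cut into three snippets: one around c, one merged snippet covering j and n, and one around y. The application can then join them with whatever separator it prefers.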

BTW, there are many open issues related to highlighting -

Feel free to add another one.

Thanks for your reply.

I have created a feature request on GH.

Can I suggest an adjustment to the docs to make things a little clearer here:
https://manual.manticoresearch.com/Searching/Highlighting#limit

Change:
The maximum snippet size, in symbols (codepoints). The default is 256. This is …
to:
The maximum length of total snippet output, in symbols (codepoints). The default is 256. With smaller values the number of snippets returned may be reduced to fit the length. This is …

In the meantime I can cope :wink:

I have created a feature request on GH.

Thanks.

Can I suggest an adjustment to the docs to make things a little clearer here:

Thanks! BTW, there’s an edit button: click it and you’ll be taken to GitHub, where you can edit the page and create a pull request with your changes. Feel free to use it. We added it especially for cases like this.
