I’m using Sphinx 2.2.11 and am having trouble with how Sphinx (and probably also Manticore) indexes terms that contain more than one instance of a blend character.
For example, I have the hyphen and period set as blend_chars:
blend_chars = ., -
Let’s say I have a term in the database as follows:
part1-part2.part3
I would expect Sphinx to index this term in every combination of kept and split blend_chars. For example:
- Variant 1: part1-part2.part3
- Variant 2: part1 part2.part3
- Variant 3: part1-part2 part3
- Variant 4: part1 part2 part3
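To make the expectation concrete, here is a small Python sketch that generates every keep/split combination of the blend chars in a term. It only models what I *expected*, not what Sphinx actually does. With n blend chars you get 2^n variants, so four for this term.

```python
from itertools import product

# Mirrors "blend_chars = ., -" from the config above (assumption: only these two).
BLEND_CHARS = {".", "-"}

def all_variants(term):
    """Return every way of keeping or splitting each blend char in `term`.

    Models the EXPECTED behaviour described above, not Sphinx's tokenizer.
    """
    positions = [i for i, ch in enumerate(term) if ch in BLEND_CHARS]
    variants = []
    for choice in product((True, False), repeat=len(positions)):
        chars = list(term)
        for pos, keep in zip(positions, choice):
            if not keep:
                chars[pos] = " "  # treat this blend char as a word separator
        variants.append("".join(chars))
    return variants

for v in all_variants("part1-part2.part3"):
    print(v)
```

Under this model, "part2.part3" appears as a token in one of the variants, which is why I expected the search for it to match.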
However, that doesn’t seem to be the case.
If I search for:
part2.part3
I don’t find the record containing the term part1-part2.part3.
However, if I search for:
part2 part3
OR
part1 part2 part3
I do find the record.
This suggests to me that Sphinx does not index every combination of the blend_chars. Instead, it appears to index just two versions:
- part1-part2.part3 (with blend_chars intact)
- part1 part2 part3 (with blend_chars ignored, treated as whitespace)
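Here is a simplified Python model of the two-version behavior I seem to be observing: the whole blended token plus its pieces split on every blend char. This is my own illustration, not Sphinx's tokenizer code, and treating a multi-word query as a plain AND of words (ignoring phrase adjacency) is an assumption.

```python
import re

# Assumed to match "blend_chars = ., -" from the config above.
BLEND_CHARS = ".-"

def indexed_tokens(term):
    """Simplified model of what appears to get indexed for a blended term:
    the token with all blend chars intact, plus the pieces obtained by
    treating every blend char as whitespace."""
    splitter = "[" + re.escape(BLEND_CHARS) + "]"
    pieces = [p for p in re.split(splitter, term) if p]
    tokens = {term}          # version with blend_chars intact
    tokens.update(pieces)    # version with blend_chars treated as whitespace
    return tokens

tokens = indexed_tokens("part1-part2.part3")
print(sorted(tokens))          # ['part1', 'part1-part2.part3', 'part2', 'part3']
print("part2.part3" in tokens)  # False: the partially blended token is never produced
print(all(w in tokens for w in "part2 part3".split()))  # True
```

This lines up with the searches above: "part2 part3" and "part1 part2 part3" can match through the split pieces, while "part2.part3" has no matching token.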
The documentation seems to confirm this, particularly the entry for blend_mode. To quote:
By default, tokens that mix blended and non-blended characters get indexed in there [sic] entirety. For instance, when both at-sign and an exclamation are in blend_chars, “@dude!” will get result in two tokens indexed: “@dude!” (with all the blended characters) and “dude” (without any). Therefore “@dude” query will not match it.
That’s bad news, but it confirms what I’m seeing.
I tried using blend_mode to fix this, hoping it would create additional tokens for each term. However, it only seems to help when a blend_char is at the beginning or end of a term (as in the examples in the docs). In my case the blend_chars are in the middle of the term, so trimming them from the ends doesn’t help.
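A rough Python model of why the trim options don't help here: as I read the docs, trim_head, trim_tail and trim_both only strip blend chars from the start and/or end of a token, never from the middle. The function below is my own sketch covering only the blended-token variants (the split pieces such as "dude" are indexed separately), and the combined BLEND_CHARS set is hypothetical, chosen to cover both the docs example and my term.

```python
# Hypothetical blend char set covering the docs example (@, !) and my term (., -).
BLEND_CHARS = "@!.-"

def blend_mode_tokens(token, modes):
    """Sketch of the blended-token variants each blend_mode trim option yields.
    trim_none keeps the token whole; trim_* options only strip blend chars
    from the head and/or tail, so middle blend chars are left untouched."""
    out = set()
    if "trim_none" in modes:
        out.add(token)
    if "trim_head" in modes:
        out.add(token.lstrip(BLEND_CHARS))
    if "trim_tail" in modes:
        out.add(token.rstrip(BLEND_CHARS))
    if "trim_both" in modes:
        out.add(token.strip(BLEND_CHARS))
    return out

ALL_TRIMS = ["trim_none", "trim_head", "trim_tail", "trim_both"]
print(sorted(blend_mode_tokens("@dude!", ALL_TRIMS)))  # ['@dude', '@dude!', 'dude', 'dude!']
print(blend_mode_tokens("part1-part2.part3", ALL_TRIMS))  # {'part1-part2.part3'}
```

For "@dude!" the trims produce useful new variants, but for part1-part2.part3 every option collapses back to the original token, so "part2.part3" still never gets indexed.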
Can anyone confirm that they are seeing the same behavior? And can anyone suggest tips on how to fix or work around it?
Thanks very much!