Migration from Sphinx 2.2.11 to Manticore 4.2

I’ve been running Sphinx 2.2.11 in production for quite a long time. I’ve not bothered upgrading this ancient version for a few reasons:

  1. It’s been absolutely rock solid. It just runs, barely uses any CPU/memory, and doesn’t need any maintenance or monitoring whatsoever.

  2. When I tried (long ago) testing out later Sphinx releases, I found my wildcard searches were incredibly slow. Apparently the dict = crc option had been removed. In my application, all query terms are wildcarded (a * is appended automatically to every term), so this performance hit was not sustainable.

  3. The Sphinx project stopped packaging their new releases as RPMs. While this is just a small quibble, it makes installation more difficult. I’d like all software on my production systems to be deployed through a package management system.

I recently decided it’s time to get serious about finding a Sphinx alternative for the future, because I don’t want to be running software that hasn’t been updated in 6 years.

So, I did a test installation of Manticore 4.2 and tried to use my old Sphinx config file. I had to make about 6-8 minor changes to the config in order to get the indexer to run, all of which were explained fully in the excellent documentation. After indexing, I started searchd, threw caution to the wind, and pointed my existing application at it. I didn’t even bother to update the PHP Sphinx API I’m using (which is from 2015!).

I was shocked – it just worked!

I’ve just replaced a 2016 release of Sphinx with modern software, making only the most minor config changes, and no changes at all in my application. And my application works like nothing happened. Amazing!

Before moving forward, I’ve got a few questions:

  1. Is it important to update my PHP API code to the latest version? If I do, should I expect that to be a drop-in replacement, without any changes needed in my queries?

  2. I’ve read your documentation regarding the dict=crc vs. dict=keywords setting. It’s a bit unclear to me whether it makes sense to change this from crc to keywords. (As I mentioned, my application makes extensive use of wildcard suffixes, and this setting change resulted in bad performance in Sphinx.) Is there any reason I should make any change in Manticore?

  3. Are there any other things I should be aware of when jumping so many versions? I realize I need to test extensively, but just wanted to see if there are any big changes I should definitely watch out for.

Thanks a lot for an excellent piece of software!


Hi

Thank you for the kind words.

  1. Is it important to update my PHP API code to the latest version? If I do, should I expect that to be a drop-in replacement, without any changes needed in my queries?

If you mean the old sphinxapi.php, then yes, you should update it to https://github.com/manticoresoftware/manticoresearch/blob/master/api/sphinxapi.php, since it received some changes in 2019. We also have a new PHP client: https://github.com/manticoresoftware/manticoresearch-php

  2. I’ve read your documentation regarding the dict=crc vs. dict=keywords setting. It’s a bit unclear to me whether it makes sense to change this from crc to keywords. (As I mentioned, my application makes extensive use of wildcard suffixes, and this setting change resulted in bad performance in Sphinx.) Is there any reason I should make any change in Manticore?

If you tested it well and switching from crc to keywords made things slower in Sphinx, then it makes sense to stick with crc, since, if I recall correctly, we haven’t changed the keywords mode in a way that would make it significantly faster.

  3. Are there any other things I should be aware of when jumping so many versions? I realize I need to test extensively, but just wanted to see if there are any big changes I should definitely watch out for.

If you don’t need any of the new functionality, such as:

  • RT mode and RT indexes in general (with auto OPTIMIZE etc.)
  • percolate indexes
  • columnar storage
  • replication

then I can’t think of anything that would become a problem if you just keep using plain indexes the old way. The only thing I would recommend is pseudo-sharding: https://mnt.cr/pseudo_sharding. We are going to enable it by default in future releases, but it can already help lower response time, in some cases many times over.
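As a rough conceptual illustration of the idea behind pseudo-sharding (running a single search over one plain index as several parallel slices and merging the results), here is a Python sketch. It is not Manticore’s implementation: the toy scoring (`text.count(term)`), the slicing scheme and the function names are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def search_chunk(chunk, term):
    """Scan one slice of a toy 'index' and return (score, doc_id) hits."""
    return [(text.count(term), doc_id) for doc_id, text in chunk if term in text]


def pseudo_sharded_search(docs, term, shards=4, limit=10):
    """Split one index into `shards` slices, search them in parallel, merge top hits."""
    items = list(docs.items())
    size = max(1, -(-len(items) // shards))  # ceiling division, at least 1
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=shards) as pool:
        partials = pool.map(search_chunk, chunks, [term] * len(chunks))
        hits = [hit for part in partials for hit in part]
    # Merge step: highest score first, ties broken by lower doc_id.
    best = heapq.nsmallest(limit, hits, key=lambda h: (-h[0], h[1]))
    return [doc_id for _score, doc_id in best]


if __name__ == "__main__":
    docs = {1: "search search", 2: "sphinx", 3: "search engine"}
    print(pseudo_sharded_search(docs, "search", shards=2))
```

In CPython the global interpreter lock limits the speed-up for pure-Python CPU-bound work like this, so the sketch only shows the split/search/merge shape, not a real performance gain.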


At some point Manticore made a change to improve the performance of dict=keywords queries with very large expansions (i.e. when lots of full keywords match the wildcard).

That fix enabled me to switch to dict=keywords on most indexes (although some of the bigger indexes still use dict=crc).

In short, dict=keywords IS better in Manticore, but not to the point that it makes dict=crc obsolete.

(Although I think Manticore v4 has a slightly newer index format. I haven’t tested that thoroughly; I’m still using Manticore 3.x.)


Hi Barry,

Thanks for the feedback. I’m not sure what counts as a big index. Mine are in the single-digit millions of records, with a size of a few GB each. But I think I’ll stick with crc mode as long as it’s not deprecated.

For me it’s important that queries with Manticore 4 return the same results as Sphinx 2.2 at an equivalent performance level, and I think Manticore accomplishes that well.

I don’t find dict=keywords any faster for searching; if anything, it’s slower most of the time!

dict=crc is still great for query performance. It’s just that ‘indexer’ takes longer to build the index (plus index files for dict=crc can be bigger than with keywords).
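To make that trade-off concrete, here is a deliberately simplified toy model in Python. It is not Manticore’s actual data structure, and the function names are hypothetical. It assumes prefix indexing is enabled (min_prefix_len in the config). With dict=crc, every prefix of every word is hashed and stored at index time, so `sear*` is a single lookup but the dictionary grows. With dict=keywords, only whole words are stored and `sear*` is expanded at query time into all matching keywords, whose document lists are then merged.

```python
from collections import defaultdict
import zlib


def crc_index(docs, min_prefix_len=1):
    """Toy dict=crc model: store the CRC32 of every prefix of every word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for end in range(min_prefix_len, len(word) + 1):
                index[zlib.crc32(word[:end].encode())].add(doc_id)
    return index


def crc_search(index, prefix):
    """A 'prefix*' query is one hash lookup (the work was done at index time)."""
    return set(index.get(zlib.crc32(prefix.lower().encode()), set()))


def keywords_index(docs):
    """Toy dict=keywords model: store whole words only."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def keywords_search(index, prefix):
    """A 'prefix*' query is expanded into every matching keyword at query time."""
    prefix = prefix.lower()
    result = set()
    for word, doc_ids in index.items():  # deliberately naive linear scan
        if word.startswith(prefix):
            result |= doc_ids
    return result


if __name__ == "__main__":
    docs = {1: "manticore search", 2: "sphinx search engine", 3: "searchd daemon"}
    crc, kw = crc_index(docs), keywords_index(docs)
    print(sorted(crc_search(crc, "sear")), sorted(keywords_search(kw, "sear")))
    print("dictionary entries:", len(crc), "vs", len(kw))
```

The linear scan in `keywords_search` is intentionally naive. The point is only that the cost of a wildcard moves from indexing time (more entries, bigger files, slower ‘indexer’) to query time (expansion and merging), which is why heavily wildcarded workloads can favour crc.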

The real benefit of keywords is being able to use ‘CALL KEYWORDS()’ or the related ‘CALL SUGGEST()’, which, frankly, if you’re using SphinxAPI, you’re probably not using!

But otherwise I agree: yes, Manticore is a great drop-in replacement for Sphinx. While it’s perhaps moving away from some of the ‘sphinxisms’, the core search functions still work great.
