Indexer performance tuning suggestions

I am testing a fairly small index of 50915 documents on the same server, with identical configuration, on Sphinx 3.3.1 and Manticore Search 3.5.3 dev. I have run each indexer separately several times and get the same result every time: Sphinx takes 9.5 seconds, Manticore Search 15.4 seconds. On a 12 mln record index it is 231 sec vs 253 sec. Manticore Search is slower in both cases. What am I missing? I expected Manticore Search to be faster than Sphinx. No lemmatizer, no fancy languages. I would appreciate some feedback.

I have spent a day reading the documentation but have found nothing so far on how to speed up indexing. The results are as follows:

/usr/local/manticore/usr/bin/indexer --config /usr/local/manticore/etc/manticoresearch/manticore.conf --rotate pages
Manticore 3.5.3 0709fc5b@201119 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2020, Manticore Software LTD (http://manticoresearch.com)

using config file '/usr/local/manticore/etc/manticoresearch/manticore.conf'...
indexing index 'pages'...
collected 50915 docs, 201.1 MB
creating lookup: 50.9 Kdocs, 100.0% done
creating histograms: 50.9 Kdocs, 100.0% done
sorted 23.2 Mhits, 100.0% done
total 50915 docs, 201127826 bytes
total 15.432 sec, 13032463 bytes/sec, 3299.13 docs/sec
total 26 reads, 0.047 sec, 3194.7 kb/call avg, 1.8 msec/call avg
total 199 writes, 0.194 sec, 838.2 kb/call avg, 0.9 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=83246).

/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --rotate pages
Sphinx 3.3.1 (commit b72d67b)
Copyright (c) 2001-2020, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'pages'...
collected 50915 docs, 201.1 MB
sorted 23.2 Mhits, 100.0% done
total 50915 docs, 201.1 Mb
total 9.5 sec, 21.10 Mb/sec, 5342 docs/sec
rotating indices: successfully sent SIGHUP to searchd (pid=70405).

Then I tried a bigger index of nearly 12 mln records, and Sphinx was about 23 seconds faster (231.1 sec vs 253.8 sec). Manticore Search output first, then Sphinx:
indexing index 'cities1'...
collected 11854146 docs, 441.4 MB
creating lookup: 11854.1 Kdocs, 100.0% done
creating histograms: 11854.1 Kdocs, 100.0% done
sorted 76.3 Mhits, 100.0% done
total 11854146 docs, 441352986 bytes
total 253.812 sec, 1738896 bytes/sec, 46704.41 docs/sec
total 4347 reads, 0.536 sec, 198.4 kb/call avg, 0.1 msec/call avg
total 2195 writes, 4.259 sec, 619.0 kb/call avg, 1.9 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=83246).

indexing index 'cities1'...
collected 11854146 docs, 441.4 MB
sorted 76.3 Mhits, 100.0% done
total 11854146 docs, 441.4 Mb
total 231.1 sec, 1.910 Mb/sec, 51297 docs/sec
rotating indices: successfully sent SIGHUP to searchd (pid=70405).
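
For reference, the relative gap in the two runs above can be worked out from the reported totals. A minimal Python sketch, using only the "total ... sec" figures from the logs (the helper name `slowdown_percent` is made up for illustration):

```python
def slowdown_percent(baseline_sec: float, candidate_sec: float) -> float:
    """Return how much longer the candidate run took than the baseline, in percent."""
    return (candidate_sec / baseline_sec - 1.0) * 100.0


# (Sphinx seconds, Manticore seconds), copied from the indexer output above.
runs = {
    "pages": (9.5, 15.432),
    "cities1": (231.1, 253.812),
}

for name, (sphinx_sec, manticore_sec) in runs.items():
    extra = slowdown_percent(sphinx_sec, manticore_sec)
    print(f"{name}: Manticore took {extra:.1f}% longer than Sphinx")
```

On these numbers the gap is roughly 62% for the small index and roughly 10% for the 12 mln record one, so the difference is much less pronounced on the larger data set.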

At Manticore we have never seriously optimized indexing performance since Manticore Search was forked from Sphinx 2.3, so it is most likely still at the Sphinx 2.3 level.

If you need faster indexing, the general suggestion is to saturate your hardware better by splitting your index into parts and building them all at once. We are working on automated built-in sharding which will provide that out of the box, but it will work for RT indexes only.
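
The split-and-build-in-parallel idea could be sketched roughly like this in Python. This is only an illustration under assumptions, not an official tool: it assumes document ids are roughly contiguous integers, that the config already defines one plain index per part (the names `pages_0` ... `pages_3` are hypothetical), and that each part's source query selects only its own id range. The indexer and config paths are the ones from the post above.

```python
import subprocess
from typing import List, Tuple


def shard_ranges(min_id: int, max_id: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive id range [min_id, max_id] into contiguous chunks."""
    if parts < 1 or max_id < min_id:
        raise ValueError("need parts >= 1 and max_id >= min_id")
    total = max_id - min_id + 1
    base, extra = divmod(total, parts)
    ranges = []
    start = min_id
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            break  # more parts than ids: stop early
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def indexer_commands(indexer: str, config: str, index_names: List[str]) -> List[List[str]]:
    """Build one indexer command line per part (same flags as used in the post)."""
    return [[indexer, "--config", config, "--rotate", name] for name in index_names]


def run_parallel(commands: List[List[str]]) -> List[int]:
    """Start all commands at once, wait for all of them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


# Example: id ranges for four parts of the 12 mln record index.
# Each range would go into the WHERE clause of one part's source query.
for low, high in shard_ranges(1, 11854146, 4):
    print(f"WHERE id BETWEEN {low} AND {high}")

# Hypothetical part names; they must match index definitions in the config.
commands = indexer_commands(
    "/usr/local/manticore/usr/bin/indexer",
    "/usr/local/manticore/etc/manticoresearch/manticore.conf",
    [f"pages_{i}" for i in range(4)],
)
# run_parallel(commands)  # uncomment on a host where these indexes are defined
```

Searching across the parts would then need something that combines them, for example a distributed index listing the parts as local indexes. How well this scales depends on how many cores and how much disk bandwidth are free, so it is worth timing with two parts before going further.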

Fair enough. Not a big deal, I just thought I might be missing some settings, as the configuration file is already very different from Sphinx's.

On a good note, the same data index created by Sphinx 3.3 was 5.2 GB, while Manticore Search squeezed it down to 4.8 GB. That is a saving of roughly 8% in disk space and probably means faster searching. Looks promising.

While a fast full reindex is a good thing, it is not nearly as important as fast searching.

Thank you for the clarification.