Build and rotate indexes on remote servers

I have the following server setup:

  • web_server_1
  • web_server_2
  • manticore_server_1
  • manticore_server_2

How do I run the indexers of the Manticore servers from any web server? The web servers will have a script of some kind that manages the building of indexes. I’m kind of lost on how to run the indexers without scripting some kind of SSH access. Is there any way other than creating an SSH script?

There’s no HTTP, SQL, or any other TCP-based interface to run indexer remotely, so you need to run it on the server itself.

ssh manticore_server_1 indexer ... doesn’t seem to be too difficult.
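For instance, a minimal sketch of that SSH route (the index name "products" is a made-up example; `--rotate` is indexer’s standard flag for hot-swapping the rebuilt files into a running searchd):

```shell
# Hypothetical helper: run Manticore's indexer on a remote box over SSH.
# $1 = host, $2 = index name (or --all to rebuild everything).
# --rotate makes the running searchd pick up the new files without a restart.
rotate_remote() {
  ssh "$1" "indexer --rotate $2"
}

# From either web server:
#   rotate_remote manticore_server_1 products
#   rotate_remote manticore_server_2 --all
```

This assumes key-based SSH auth from the web servers and that the remote user may run indexer against the Manticore config.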

Alternatively, you can use RT indexes and just do INSERT INTO ... remotely via SQL over the MySQL protocol, SQL over HTTP, JSON over HTTP, or our new PHP client (https://github.com/manticoresoftware/manticoresearch-php)
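A hedged sketch of two of those remote-insert options, assuming an RT index named "products" with a title field and Manticore’s default ports (9306 for SQL, 9308 for HTTP):

```shell
# Insert via SQL over the MySQL protocol (any mysql client will do):
insert_via_sql() {
  mysql -h manticore_server_1 -P 9306 -e \
    "INSERT INTO products (id, title) VALUES (1, 'first item')"
}

# Insert via JSON over HTTP:
insert_via_http() {
  curl -s http://manticore_server_1:9308/insert \
    -d '{"index":"products","id":2,"doc":{"title":"second item"}}'
}
```

With RT indexes there is no indexer step and nothing to rotate: documents become searchable as soon as they are inserted.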

@Sergey Hi! Thanks for your work!

Still no option to call indexer via TCP? Old-style Sphinx-ish indexes inherently depend on a separate Cron process updating them. This is especially problematic in containerized environments like Docker/Kubernetes. Running both Cron and searchd in one container is problematic because running two processes is problematic by itself. Normally I could run Cron in another container, but without a remote indexer that’s not possible. And running sshd instead of Cron won’t help either.

It also makes writing automated integration tests troublesome, because the test needs a way to trigger reindexing after the arrange phase, before asserting that search works.

I was considering switching to Manticore’s new “RT” indexes, but that’s really a completely different design: I would have to rewrite all my infra code. Now I’m wondering whether I should switch to RT indexes or write a good Cron+searchd runner.

I feel rather unintelligent and obtrusive talking to myself here, but maybe other Googlers will find it useful, so…

I spent several days on xmlpipe/csvpipe, a sophisticated main+delta scheme, trying to generate ids, solving duplicate-merging issues and dealing with the Cron setup. And then I just migrated to Manticore-specific RT indexes using only HTTP, very small config file and ready-made Docker image. Took me a couple of hours maybe overall. I was coming from Sphinx and trying the old Sphinx approach, but this new approach seems quite a lot easier.


Still no option to call indexer via TCP?

Unfortunately no.

Running both Cron and searchd in one container is problematic because running two processes is problematic by itself.

I’ve just tried the approach from “How to run a cron job inside a docker container?” on Stack Overflow, and it worked fine in a Manticore container (official image):

root@7656cbf91f92:/var/lib/manticore# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mantico+       1  0.0  0.0 768404  9224 ?        Ssl  03:34   0:00 searchd --nodetach
root          37  0.0  0.0  40480  8004 pts/0    Ss+  03:34   0:00 mysql
root          42  0.0  0.0  20256  3852 pts/1    Ss   03:35   0:00 bash
root         644  0.0  0.0  30104  2844 ?        Ss   03:38   0:00 cron
root         648  0.0  0.0  36152  3208 pts/1    R+   03:39   0:00 ps aux

root@7656cbf91f92:/var/lib/manticore# tail -f /var/log/cron.log
Hello world

So you can build your own image FROM manticoresearch/manticore and add cron to it.
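For example, a sketch of such an image (a sketch only — the cron package name and crontab path assume a Debian-based image, the cron file and schedule are up to you, and the official image’s entrypoint may need adjusting):

```dockerfile
# Sketch: extend the official image with cron (assumes a Debian base).
FROM manticoresearch/manticore

RUN apt-get update && apt-get install -y cron && rm -rf /var/lib/apt/lists/*

# Your schedule, e.g. a line like:
#   */5 * * * * root indexer --rotate --all >> /var/log/cron.log 2>&1
COPY manticore-cron /etc/cron.d/manticore-cron
RUN chmod 0644 /etc/cron.d/manticore-cron

# Start cron in the background, then searchd in the foreground as before
CMD cron && searchd --nodetach
```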

And then I just migrated to Manticore-specific RT indexes using only HTTP, very small config file and ready-made Docker image. Took me a couple of hours maybe overall

That’s exactly why we’ve always been focusing more on the RT part of the project since the beginning and added the HTTP JSON interface. Plain indexation is fine, but it made more sense back in the day, when Sphinx was more of an extension to mysql/postgres/whatever.

For one of my projects, I use a ‘wrapper’ script on the Manticore servers (well, containers) that just fetches a list of indexes to build and calls indexer.

(just using PHP, as I’m most familiar with that)

I simplified the code to make it easier to follow.

Cron just calls the wrapper every 5 minutes:

*/5 * * * *	    /usr/local/bin/indexer-wrapper.php

In the container setup, supercronic runs in a sidecar container next to searchd.

The list of indexes to build comes from a MySQL database.

The application can then set the schedule of individual indexes, dynamically add new schedules, or pause indexing. It also manages main+delta indexes.
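A rough shell sketch of that wrapper idea (the original is PHP; the `index_schedule` table and its columns are invented here for illustration):

```shell
# indexer-wrapper, simplified: fetch the list of indexes that are due
# for a rebuild from MySQL, then rebuild each one with --rotate.
# The schema (index_schedule with name/next_run columns) is made up.
due_indexes() {
  mysql -N -e "SELECT name FROM index_schedule WHERE next_run <= NOW()"
}

rebuild_due() {
  for idx in $(due_indexes); do
    indexer --rotate "$idx"
  done
}
```

The application then controls scheduling just by updating rows in that table; the wrapper itself stays dumb.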