Can't start from systemd - hangs

I am able to run Manticore as follows, manually:

/usr/bin/searchd --config /etc/manticoresearch/manticore.conf

However, if I start as so:

systemd start manticore

… the start process hangs (I’ve made sure to stop any searchd and mysql processes first).

There’s nothing shown in journalctl -u manticore

Interestingly, if I try to start with systemd, then run it manually, journalctl shows:

Sep 27 13:44:58 asti systemd[1]: manticore.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /run/manticore/searchd.pid

But it seems to work…

I have made sure that searchd and mysql processes are stopped before starting via either systemd or manually.

Why might it be hanging, and how can I debug? I don’t know how far it’s getting in its startup.

The unit file is:

[Unit]
Description=Manticore Search Engine
After=network.target remote-fs.target nss-lookup.target
Documentation=https://manual.manticoresearch.com/, man:searchd(1)

[Service]
Type=forking
User=manticore
Group=manticore
# Run ExecStartPre with root-permissions
PermissionsStartOnly=true
ExecStartPre=/bin/mkdir -p /var/run/manticore
ExecStartPre=/bin/chown manticore.manticore /var/run/manticore
# Run ExecStart with User=manticore / Group=manticore
ExecStart=/usr/bin/searchd --config /etc/manticoresearch/manticore.conf
ExecStop=/usr/bin/searchd --config /etc/manticoresearch/manticore.conf --stopwait
KillMode=process
KillSignal=SIGTERM
SendSIGKILL=no
LimitNOFILE=65536
LimitCORE=infinity
Restart=on-failure
TimeoutStartSec=infinity
PIDFile=/var/run/manticore/searchd.pid

[Install]
WantedBy=multi-user.target
Alias=manticore.service
Alias=searchd.service

could you change your ExecStart to

ExecStart=/usr/bin/searchd --config /etc/manticoresearch/manticore.conf --logdebug

kill your daemon then try to start it again and provide searchd.log there should be events related to daemon start

Hmmm. So the first thing I did is added the --logdebug switch. I tried to stop the searchd process manually with --stopwait but this didn’t appear to work - so I used kill which did work, eventually. It might just have been --stopwait was running slowly.

I started with systemctl start again. This time it worked, and returned fine. The logs in searchd.log weren’t very verbose, and I realised I forgot to daemon-reload after editing the unit file.

So I systemctl stop-ped Manticore, which worked, reloaded, and started again.

Now it’s hanging again.

searchd.log is actually empty for the current systemctl start - it ends with logs from the last stop. It doesn’t appear Manticore is even being started.

I tried restarting manually and it worked. Then I decided to see if I could get logs when starting manually, so I sent --stopwait. This took quite a long time to stop (a few minutes).

Restarting searchd manually shows this on stdout/stderr:

# /usr/bin/searchd --config /etc/manticoresearch/manticore.conf --logdebug
Manticore 3.6.0 96d61d8bf@210504 release
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2021, Manticore Software LTD (https://manticoresearch.com)

[17:11.002] [2596992] using config file '/etc/manticoresearch/manticore.conf' (2622 chars)...
[17:11.004] [2596992] DEBUG: 'read_timeout' - nothing specified, using default value 5000000
[17:11.004] [2596992] DEBUG: 'network_timeout' - nothing specified, using default value 5000000
[17:11.004] [2596992] DEBUG: 'sphinxql_timeout' - nothing specified, using default value 900000000
[17:11.004] [2596992] DEBUG: 'client_timeout' - nothing specified, using default value 300000000
[17:11.004] [2596992] DEBUG: SetMaxChildrenThreads to 4
[17:11.004] [2596992] DEBUG: 'read_unhinted' - nothing specified, using default value 32768
[17:11.004] [2596992] DEBUG: 'read_buffer' - nothing specified, using default value 262144
[17:11.004] [2596992] DEBUG: 'read_buffer_docs' - nothing specified, using default value 262144
[17:11.004] [2596992] DEBUG: 'read_buffer_hits' - nothing specified, using default value 262144
[17:11.004] [2596992] DEBUG: 'attr_flush_period' - nothing specified, using default value 0
[17:11.004] [2596992] DEBUG: 'max_packet_size' - nothing specified, using default value 8388608
[17:11.004] [2596992] DEBUG: 'rt_merge_maxiosize' - nothing specified, using default value 0
[17:11.004] [2596992] DEBUG: 'ha_ping_interval' - nothing specified, using default value 1000000
[17:11.004] [2596992] DEBUG: 'ha_period_karma' - nothing specified, using default value 60000000
[17:11.004] [2596992] DEBUG: 'query_log_min_msec' - nothing specified, using default value 0
[17:11.004] [2596992] DEBUG: 'agent_connect_timeout' - nothing specified, using default value 1000000
[17:11.004] [2596992] DEBUG: 'agent_query_timeout' - nothing specified, using default value 3000000
[17:11.004] [2596992] DEBUG: 'agent_retry_delay' - nothing specified, using default value 500000
[17:11.004] [2596992] DEBUG: 'docstore_cache_size' - nothing specified, using default value 16777216
[17:11.004] [2596992] DEBUG: 'qcache_max_bytes' - nothing specified, using default value 16777216
[17:11.004] [2596992] DEBUG: 'qcache_thresh_msec' - nothing specified, using default value 3000000
[17:11.004] [2596992] DEBUG: 'qcache_ttl_sec' - nothing specified, using default value 60000000
DEBUG: StartGlobalWorkpool
starting daemon version '3.6.0 96d61d8bf@210504 release' ...
listening on 127.0.0.1:9312 for sphinx and http(s)
listening on 127.0.0.1:9306 for mysql
listening on 10.0.0.3:9306 for mysql
DEBUG: 'rt_flush_period' - nothing specified, using default value 36000000000
precaching index 'library'
DEBUG: Locking the index via file /mnt/HC_Volume_10495637/manticore/data/library.spl
DEBUG: lock /mnt/HC_Volume_10495637/manticore/data/library.spl success
precached 1 indexes in 0.552 sec
DEBUG: expression stack for creation 16
DEBUG: expression stack for eval/deletion 48
DEBUG: filter stack delta 144

In searchd.log:

[Mon Oct  2 10:17:11.006 2023] [2596994] watchdog: main process 2596995 forked ok
[Mon Oct  2 10:17:11.006 2023] [2596995] DEBUG: StartGlobalWorkpool
[Mon Oct  2 10:17:11.006 2023] [2596995] starting daemon version '3.6.0 96d61d8bf@210504 release' ...
[Mon Oct  2 10:17:11.007 2023] [2596995] listening on 127.0.0.1:9312 for sphinx and http(s)
[Mon Oct  2 10:17:11.007 2023] [2596995] listening on 127.0.0.1:9306 for mysql
[Mon Oct  2 10:17:11.007 2023] [2596995] listening on 10.0.0.3:9306 for mysql
[Mon Oct  2 10:17:11.007 2023] [2596995] DEBUG: 'rt_flush_period' - nothing specified, using default value 36000000000
[Mon Oct  2 10:17:11.019 2023] [2596996] DEBUG: Locking the index via file /mnt/HC_Volume_10495637/manticore/data/library.spl
[Mon Oct  2 10:17:11.019 2023] [2596996] DEBUG: lock /mnt/HC_Volume_10495637/manticore/data/library.spl success
[Mon Oct  2 10:17:11.562 2023] [2596995] DEBUG: expression stack for creation 16
[Mon Oct  2 10:17:11.563 2023] [2596995] DEBUG: expression stack for eval/deletion 48
[Mon Oct  2 10:17:11.565 2023] [2596995] DEBUG: filter stack delta 144
[Mon Oct  2 10:17:11.566 2023] [2596995] DEBUG: 'binlog_max_log_size' - nothing specified, using default value 268435456
[Mon Oct  2 10:17:11.567 2023] [2596995] DEBUG: MAC address 96:00:00:a8:97:0a for uuid-short server_id
[Mon Oct  2 10:17:11.567 2023] [2596995] DEBUG: uid-short server_id 113, started 139573031, seed 8144849773174718464
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: replaying log /mnt/HC_Volume_10495637/manticore/data/binlog.001
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: finished replaying /mnt/HC_Volume_10495637/manticore/data/binlog.001; 0.0 MB in 0.000 sec
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: replaying log /mnt/HC_Volume_10495637/manticore/data/binlog.001
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: finished replaying /mnt/HC_Volume_10495637/manticore/data/binlog.001; 0.0 MB in 0.000 sec
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: replaying log /mnt/HC_Volume_10495637/manticore/data/binlog.001
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: replay stats: 0 rows in 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 indexes
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: finished replaying /mnt/HC_Volume_10495637/manticore/data/binlog.001; 0.0 MB in 0.000 sec
[Mon Oct  2 10:17:11.568 2023] [2596995] binlog: finished replaying total 3 in 0.000 sec
[Mon Oct  2 10:17:11.568 2023] [2596995] DEBUG: SaveMeta: Done.
[Mon Oct  2 10:17:11.569 2023] [2596999] prereading 1 indexes
[Mon Oct  2 10:17:11.569 2023] [2596995] accepting connections
[Mon Oct  2 10:17:11.569 2023] [2596999] DEBUG: prereading index 'library'
[Mon Oct  2 10:17:11.569 2023] [2596999] DEBUG: CSphIndex_VLN::Preread invoked 'library'(/mnt/HC_Volume_10495637/manticore/data/library)

Is everything ok with the permissions? When you started the searchd manually, did you ensure to do it under the same user which is used to run it via systemd? If you did, what’s in the systemd log at the moment when it’s starting Manticore and there’s nothing in the searchd log?

Sorry for the protracted nature of this, only get the chance to look at this now and then!

I have made sure I run searchd manually with the same user as via systemd. I noticed that when I issued a manual stop it took a long time to return after sending SIGTERM to the running process. This was the output:

# sudo -u manticore /usr/bin/searchd --config /etc/manticoresearch/manticore.conf --stopwait
Manticore 3.6.0 96d61d8bf@210504 release
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2021, Manticore Software LTD (https://manticoresearch.com)

[40:41.929] [786476] using config file '/etc/manticoresearch/manticore.conf' (2622 chars)...
[40:41.931] [786476] stop: successfully sent SIGTERM to pid 786431

If I make sure no searchd is running, then I run systemctl start manticore I get the hang:

In systemd log

Oct 13 13:39:20 asti systemd[1]: manticore.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /run/manticore/searchd.pid
Oct 13 13:39:20 asti systemd[1]: manticore.service: Refusing to accept PID outside of service control group, acquired through unsafe symlink chain: /run/manticore/searchd.pid
Oct 13 13:40:06 asti systemd[1]: manticore.service: New main PID 786431 does not belong to service, and PID file is not owned by root. Refusing.
Oct 13 13:40:06 asti systemd[1]: manticore.service: New main PID 786431 does not belong to service, and PID file is not owned by root. Refusing.

/run/manticore/searchd.pid does not exist. The pid_file should be:

pid_file = /var/run/manticore/searchd.pid

/var/run/manticore/ is currently empty. The unit file for systemctl (as above) has:

PIDFile=/var/run/manticore/searchd.pid

/var/run is a symbolic link to /run.

In searchd log there’s nothing since the previous shutdown.

If I stop the hung systemctl startup and then restart searchd manually, I do get a PID:

# ls -lh /var/run/manticore/
total 4.0K
-rw------- 1 manticore manticore 7 Oct 13 13:53 searchd.pid

… but as I wrote above, after stopping searchd from its manual start I could see there was no pid file before starting with systemctl.

We made a fix related with --stopwait in Manticore 4 Manticore Search Manual: Changelog

The related commit is https://github.com/manticoresoftware/manticoresearch/commit/3506cbe4da9deb6b09a5a4e6db0b69b3af12d59f

It may be the reason.

I think it’s running as the same user though. Is there somewhere in the logs above that indicates otherwise?

I’ve found the related issue - searchd --stopwait fails under root · Issue #618 · manticoresoftware/manticoresearch · GitHub

Doesn’t it look similar to your case?

I’m not running as root though… but if the fix for the issue you linked to fixes some other issues, which is suggested, then it might be worth a try…

You run systemctl under root, don’t you? Systemd then runs searchd --stop also under root.

Oh - I thought:

User=manticore

… in the unit file meant systemctl ran it as that user… no?

You are right. In case if User=manticore systemd uses it for both ExecStart and ExecStop.

Anyway, I see that in 3.6.0 systemctl stop manticore takes 3+ second and then fails with manticore.service: Failed with result 'exit-code':

[root@459fa65b8e4d /]# systemctl start manticore
[root@459fa65b8e4d /]# time systemctl stop manticore

real    0m3.128s
user    0m0.010s
sys     0m0.000s

[root@459fa65b8e4d /]# journalctl -u manticore|tail -10
Oct 20 07:30:50 459fa65b8e4d systemd[1]: Stopping Manticore Search Engine...
Oct 20 07:30:50 459fa65b8e4d searchd[286]: [Fri Oct 20 07:30:50.658 2023] [286] using config file '/etc/manticoresearch/manticore.conf' (360 chars)...
Oct 20 07:30:50 459fa65b8e4d searchd[286]: [Fri Oct 20 07:30:50.659 2023] [286] stop: successfully sent SIGTERM to pid 248
Oct 20 07:30:53 459fa65b8e4d searchd[286]: Manticore 3.6.0 96d61d8bf@210504 release
Oct 20 07:30:53 459fa65b8e4d searchd[286]: Copyright (c) 2001-2016, Andrew Aksyonoff
Oct 20 07:30:53 459fa65b8e4d searchd[286]: Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Oct 20 07:30:53 459fa65b8e4d searchd[286]: Copyright (c) 2017-2021, Manticore Software LTD (https://manticoresearch.com)
Oct 20 07:30:53 459fa65b8e4d systemd[1]: manticore.service: Control process exited, code=exited status=1
Oct 20 07:30:53 459fa65b8e4d systemd[1]: manticore.service: Failed with result 'exit-code'.
Oct 20 07:30:53 459fa65b8e4d systemd[1]: Stopped Manticore Search Engine.

while in 6.2.12 it stops much faster and doesn’t fail:

[root@02f99f47d491 /]# time systemctl stop manticore

real    0m0.126s
user    0m0.012s
sys     0m0.000s

[root@02f99f47d491 /]# journalctl -u manticore|tail -10
Oct 20 07:34:41 02f99f47d491 systemd[1]: Started Manticore Search Engine.
Oct 20 07:34:46 02f99f47d491 systemd[1]: Stopping Manticore Search Engine...
Oct 20 07:34:46 02f99f47d491 searchd[300]: [Fri Oct 20 07:34:46.866 2023] [300] using config file '/etc/manticoresearch/manticore.conf' (269 chars)...
Oct 20 07:34:46 02f99f47d491 searchd[300]: [Fri Oct 20 07:34:46.867 2023] [300] stop: successfully sent SIGTERM to pid 218
Oct 20 07:34:46 02f99f47d491 searchd[300]: Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Oct 20 07:34:46 02f99f47d491 searchd[300]: Copyright (c) 2001-2016, Andrew Aksyonoff
Oct 20 07:34:46 02f99f47d491 searchd[300]: Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Oct 20 07:34:46 02f99f47d491 searchd[300]: Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)
Oct 20 07:34:46 02f99f47d491 systemd[1]: manticore.service: Succeeded.
Oct 20 07:34:46 02f99f47d491 systemd[1]: Stopped Manticore Search Engine.

Looks like we could do with updating! Thanks for looking into this.