When a node joins a cluster, what happens if it already has local indexes?

barryhunter · August 25, 2022, 10:44am

OK, I have put what I have been able to recover into an issue:

github.com/manticoresoftware/manticoresearch

searchd closed, when manually issue 'JOIN CLUSTER' following a pod failure.

opened 10:39AM - 25 Aug 22 UTC

closed 01:01AM - 02 Nov 22 UTC

barryhunter

waiting for reply wontfix

**Describe the bug**

Was unable to manually issue 'JOIN CLUSTER' following a pod failure. I THINK it might have been memory related. See https://forum.manticoresearch.com/t/when-a-node-joins-a-cluster-what-happens-if-alrady-local-indexes/1155 for context.

When rejoining the cluster, the worker pod still had local indexes. Possibly the replication needed too much memory to resync the files on JOIN'ing, so searchd was killed for using too much memory, rather than actually crashing. Once I deleted the local indexes, I was able to join the cluster successfully.

**Describe the environment:**

- Manticore Search version: 5.0.0 b4cb7da02@220518 release
- OS version: Manticore Search Helm Chart Version: 5.0.0.2

**Messages from log files:**

https://staging.data.geograph.org.uk/facets/manticorert2.2022-08-24.log.filtered.txt

This is the entirety of the searchd.log that I was able to recover (complicated, as the pod puts query_log into the same stream, so I had to filter out queries. We have multi-line queries, so tricky!)

The first KILL is known. That is when I inserted too much data, and the rt_mem_limit exceeded the resources.limit for the worker pod.

The second KILL is when I tried to get the worker pod to rejoin the cluster manually.

The 'drop gridprefix' syntax error is from when searchd had come back just after the second KILL. It was me attempting to delete the local indexes to retry joining the cluster.

I don't know why the logs end there. I get nothing after that.

To be honest, I suspect searchd was killed by the OS when I issued the JOIN, probably for using too much memory.

Going to try spinning up two clusters: one with more memory, so I can do all the data insertion testing,
and another that I can then intentionally try to crash.

Also going to start logging searchd.log separately from query.log:

github.com/manticoresoftware/manticoresearch-helm

Pipe searchd.log to stderr by default?

opened 09:52AM - 25 Aug 22 UTC

barryhunter

waiting for reply

I know we can configure this by redefining values.config.content, but I wonder if there is virtue in piping searchd.log to `stderr` by default (leaving the query log on `stdout`)?

log = /dev/stderr

https://github.com/manticoresoftware/manticoresearch-helm/blob/master/chart/values.yaml#L66 ...

This makes it much easier to inspect the logs in systems that can separate the stderr/stdout streams. We use loki to ingest logs from containers.
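Until the two logs go to separate streams, the kind of filtering mentioned in the issue can be roughly sketched as below. This is only an illustration, not the script actually used: it assumes daemon lines start with a bracketed timestamp followed by a bracketed id (for example `[Wed Aug 24 10:39:12.345 2022] [42] ...`), while query log entries and their continuation lines do not. The regex, the function name `filter_daemon_lines`, and the sample lines are all made up for the example and would need adjusting to the real log.

```python
import re

# ASSUMPTION: a searchd.log entry begins with "[<weekday> <month> <day> <time> <year>] [<id>] ".
# Query log entries (and the extra lines of multi-line queries) are assumed not to.
DAEMON_LINE = re.compile(
    r"^\[\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}(?:\.\d+)? \d{4}\] \[\d+\] "
)


def filter_daemon_lines(lines):
    """Keep only the lines that look like searchd.log entries."""
    return [line for line in lines if DAEMON_LINE.match(line)]


# Hypothetical mixed stream: two daemon lines around one multi-line query.
mixed = [
    "[Wed Aug 24 10:39:12.345 2022] [42] example daemon message",
    "/* Wed Aug 24 10:39:13.001 2022 conn 7 wall 0.002 found 3 */ SELECT id",
    "FROM gridprefix",
    "WHERE MATCH('bridge');",
    "[Wed Aug 24 10:40:00.000 2022] [42] another daemon message",
]

daemon_only = filter_daemon_lines(mixed)
```

One caveat with this approach: any daemon message that itself spans several lines would lose its continuation lines, which is another reason separate streams are the cleaner fix.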