When a node joins a cluster, what happens if it already has local indexes?

I managed to crash a node in a cluster (I inserted too much data; the combined total of rt_mem_limit across the indexes was larger than the resource limit, so the container was forcibly restarted by k8s).
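
(For context: rt_mem_limit is a per-table setting, so it is the sum across all the RT tables, plus the other buffers, that has to fit inside the container's memory limit. A rough sketch of capping it, with a made-up table name and value:)

### illustrative only - the table name and limit are made up, not from this cluster
CREATE TABLE example_rt (title TEXT) rt_mem_limit = '256M';
### for an existing RT table, I believe the limit can also be changed in place:
ALTER TABLE example_rt rt_mem_limit = '256M';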

Now the replication is broken. The current service IPs:

manticorert-worker-0: 10.72.38.198
manticorert-worker-1: 10.72.42.238
manticorert-worker-2: 10.72.45.210

manticorert-worker-2 is the node that was restarted, and I think it USED to have the IP 10.72.47.81

… manticorert-worker-2 won't start, because it can't contact 10.72.47.81 - its old IP!

# php scripts/runsphrt.php "show status like 'uptime'" | grep Value
0:                                    Value: 254864
1:                                    Value: 254933
2:                                    Value: 2334

Log from: manticorert-worker-2
[Wed Aug 24 14:19:26.878 2022] [42] WARNING: cluster 'manticore': no available nodes (10.72.47.81,10.72.43.101,10.72.45.210), replication is disabled, error: '10.72.47.81:9312': connect timed out;'10.72.43.101:9312': connect timed out

Frankly, I'm not sure what 10.72.43.101 is!

And each node has a different list of nodes:

# php scripts/runsphrt.php "show status like 'cluster%node%'"
0: Counter, Value
0:   cluster_manticore_node_state, synced
0:   cluster_manticore_nodes_set, 10.72.38.198,10.72.42.238,10.72.45.210
0:   cluster_manticore_nodes_view, 10.72.42.238:9312,10.72.42.238:9315:replication,10.72.38.198:9312,10.72.38.198:9315:replication

1: Counter, Value
1:   cluster_manticore_node_state, synced
1:   cluster_manticore_nodes_set, 10.72.47.81,10.72.42.238,10.72.45.210
1:   cluster_manticore_nodes_view, 10.72.42.238:9312,10.72.42.238:9315:replication,10.72.38.198:9312,10.72.38.198:9315:replication

2: success but zero rows returned

… so I intend to run UPDATE nodes on instances 0 and 1, and possibly promote one to master for the bootstrap.
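
Roughly what I have in mind (just a sketch; ‘manticore’ is the cluster name from the status output above, and the bootstrap step would only be needed if the remaining nodes lose quorum):

### on worker-0 and worker-1: refresh the node lists to the current membership
ALTER CLUSTER manticore UPDATE nodes;
### only if the remaining nodes drop out of the primary component: bootstrap from one of them
SET CLUSTER manticore GLOBAL 'pc.bootstrap' = 1;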

On 2, I will have to run JOIN CLUSTER. But as it already has local copies of all the indexes, won't JOINing fail? I guess I need to clear out the data folder so it can ‘start fresh’ (syncing the data from either 0 or 1).

A node that joins a cluster replaces its index files with the files from the donor, then reloads the index.

During the replace it uses SST file transfer (like rsync does: the file is split into chunks, a hash is calculated for every chunk, and only the chunks whose hashes do not match are transferred from the donor).

Well, I kinda answered my own question. It crashes!

mysql -hmanticorert-worker-2.manticorert-worker-svc.staging.svc.cluster.local -P9306 -A --prompt='RT2>'

RT2>show status like 'cluster%';
Empty set (0.001 sec)

RT2>join cluster manticore at '10.72.42.238:9312';
ERROR 2013 (HY000): Lost connection to MySQL server during query
RT2>show status like 'cluster%';
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    12
Current database: *** NONE ***

Empty set (0.003 sec)

RT2>show status like 'cluster%';
Empty set (0.001 sec)

RT2>show status like 'uptime';
+---------+-------+
| Counter | Value |
+---------+-------+
| uptime  | 17    |
+---------+-------+
1 row in set (0.001 sec)

RT2>show tables;
+-----------------+------+
| Index           | Type |
+-----------------+------+
| gridprefix      | rt   |
| gridsquare      | rt   |
| loc_placenames  | rt   |
| os_gaz          | rt   |
| os_gaz_250      | rt   |
| placename_index | rt   |
| snippet         | rt   |
+-----------------+------+
7 rows in set (0.003 sec)

RT2>drop gridprefix;
ERROR 1064 (42000): sphinxql: syntax error, unexpected IDENT, expecting FUNCTION or PLUGIN or TABLE near 'gridprefix'
### This was my own typo, but I can use this mistake to correlate with the logs!
### Normal DROP queries don't appear in the query log, but the parse error does!

RT2>drop table gridprefix;
Query OK, 0 rows affected (0.005 sec)

###... dropped each one manually

RT2>show tables;
Empty set (0.001 sec)

RT2>join cluster manticore at '10.72.42.238:9312';
Query OK, 0 rows affected (3.323 sec)

RT2>show status like 'cluster%node%';
+------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Counter                      | Value                                                                                                                                           |
+------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| cluster_manticore_node_state | synced                                                                                                                                          |
| cluster_manticore_nodes_set  | 10.72.45.210:9312,10.72.42.238:9312,10.72.38.198:9312                                                                                           |
| cluster_manticore_nodes_view | 10.72.45.210:9312,10.72.45.210:9315:replication,10.72.42.238:9312,10.72.42.238:9315:replication,10.72.38.198:9312,10.72.38.198:9315:replication |
+------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.002 sec)

RT2>show tables;
+-----------------+------+
| Index           | Type |
+-----------------+------+
| gridprefix      | rt   |
| gridsquare      | rt   |
| loc_placenames  | rt   |
| os_gaz          | rt   |
| os_gaz_250      | rt   |
| placename_index | rt   |
| snippet         | rt   |
+-----------------+------+
7 rows in set (0.001 sec)

RT2>

If I drop the local indexes, then I can join the cluster!

(I haven't been able to access the logs from the pod yet to find out what searchd says.)

Could you create a ticket at GitHub and attach searchd.log from the node that crashed and from the donor node to that ticket?

OK, I have put what I have been able to recover into the ticket.

To be honest, I suspect searchd was killed by the OS when I issued the JOIN, probably for using too much memory.

I'm going to try spinning up two clusters: one with more memory, so I can do all the data-insertion testing,
and another that I can then intentionally try to crash :slight_smile:

I'm also going to start logging searchd.log separately from query.log.
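
Something along these lines added to the searchd section of the config, I think (the paths are just examples):

searchd
{
    # daemon and replication messages go here
    log = /var/log/manticore/searchd.log
    # queries kept separate
    query_log = /var/log/manticore/query.log
}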

BTW, what's a ‘donor node’? Do you mean the node I tried to join the failed node TO?

I.e. the one that the data would be resynced FROM?

Yes, the donor node is the node that the node trying to join the cluster selects as the donor to resync from.

OK, thanks. I have added the log from worker node 1 (I entered its IP in the JOIN CLUSTER … AT).

It just sees the connection getting closed.