Good afternoon,
after yesterday's cluster crash I can't join the third node back.
There are some strange timeouts, even though all the ports are open and were working before.
The tables are 10-15 GB in size.
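One way to double-check that the ports really are open is a plain TCP connect test run from the donor against each node. Below is a minimal Python sketch. The `check_ports` helper is hypothetical, and a successful connect only proves that the TCP handshake works, not that a multi-gigabyte transfer will survive. Manticore replication also uses a separate port range declared with the `replication` listen option, and those ports would need to be checked the same way.

```python
import socket


def check_ports(targets, timeout=3.0):
    """Try a plain TCP connect to each (host, port); return {(host, port): bool}.

    Hypothetical helper: True only means the TCP handshake completed within
    `timeout` seconds. It says nothing about sustained transfers.
    """
    results = {}
    for host, port in targets:
        try:
            # create_connection resolves the address and completes the handshake
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # connection refused, host unreachable, or timed out
            results[(host, port)] = False
    return results
```

For example, `check_ports([("10.30.0.168", 9312)])` run on the donor should return `{("10.30.0.168", 9312): True}` if the new node's API port accepts connections.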
The errors look roughly like this:
On the source (donor) node:
[Thu Sep 21 14:29:07.846 2023] [943499] RPL: calculated sha1 of table ‘rt_contractors_search’, files 507, hashes 42211880
[Thu Sep 21 14:29:07.846 2023] [943499] RPL: reserve table ‘rt_contractors_search’ at 1 nodes with timeout 900.000 sec
[Thu Sep 21 14:29:08.962 2023] [943499] RPL: reserved table ‘rt_contractors_search’ - ok
[Thu Sep 21 14:29:08.988 2023] [943499] RPL: sending table ‘rt_contractors_search’
[Thu Sep 21 14:29:08.988 2023] [943499] RPL: sending file rt_contractors_search.0.spi (7) to 10.30.0.168:9312, packets 1, timeout 120.000 sec
[Thu Sep 21 14:29:14.885 2023] [943499] RPL: ‘10.30.0.168:9312’ error when sending data: Broken pipe
[Thu Sep 21 14:29:14.885 2023] [943499] RPL: sending file rt_contractors_search.0.spidx (8) to 10.30.0.168:9312, packets 1, timeout 120.000 sec
[Thu Sep 21 14:29:15.018 2023] [943499] RPL: sending file rt_contractors_search.0.spm (9) to 10.30.0.168:9312, packets 2, timeout 120.000 sec
[Thu Sep 21 14:29:15.023 2023] [943499] RPL: sending file rt_contractors_search.0.spp (10) to 10.30.0.168:9312, packets 3, timeout 120.000 sec
[Thu Sep 21 14:29:26.306 2023] [943499] RPL: ‘10.30.0.168:9312’ error when sending data: Broken pipe
[Thu Sep 21 14:29:26.306 2023] [943499] RPL: sending file rt_contractors_search.0.spt (11) to 10.30.0.168:9312, packets 4, timeout 120.000 sec
[Thu Sep 21 14:29:26.319 2023] [943499] RPL: sending file rt_contractors_search.1.spa (12) to 10.30.0.168:9312, packets 5, timeout 120.000 sec
[Thu Sep 21 14:29:26.904 2023] [943499] RPL: sending file rt_contractors_search.1.spb (13) to 10.30.0.168:9312, packets 6, timeout 120.000 sec
[Thu Sep 21 14:29:32.773 2023] [943499] RPL: ‘10.30.0.168:9312’ error when sending data: Broken pipe
[Thu Sep 21 14:29:32.773 2023] [943499] RPL: sending file rt_contractors_search.1.spd (14) to 10.30.0.168:9312, packets 7, timeout 120.000 sec
[Thu Sep 21 14:29:33.874 2023] [943499] RPL: sending file rt_contractors_search.1.spds (15) to 10.30.0.168:9312, packets 8, timeout 120.000 sec
[Thu Sep 21 14:30:22.451 2023] [943499] WARNING: ‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe
[Thu Sep 21 14:30:22.452 2023] [943499] RPL: 0(1) nodes finished well
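To see which files of the transfer the Broken pipe errors line up with, the donor log can be summarized per file. A rough Python sketch follows. The `summarize_sst` helper is hypothetical, and it assumes each error belongs to the most recently announced file. The log itself does not say that, so treat the attribution as a heuristic.

```python
import re

# Patterns for the donor-side lines quoted above. The pasted log uses curly
# quotes around the address; plain ASCII quotes are accepted as well.
SEND_RE = re.compile(r"RPL: sending file (\S+) \(\d+\) to ")
PIPE_RE = re.compile(r"RPL: [‘']([^’'\s]+)[’'] error when sending data: Broken pipe")


def summarize_sst(lines):
    """Return {file name: number of 'Broken pipe' RPL lines logged after it}.

    Heuristic: an error is attributed to the most recently announced file.
    The aggregated WARNING/FATAL lines are not counted because they do not
    contain the 'RPL: ' prefix in front of the quoted address.
    """
    summary = {}
    current = None
    for line in lines:
        sent = SEND_RE.search(line)
        if sent:
            current = sent.group(1)
            summary.setdefault(current, 0)
            continue
        if current is not None and PIPE_RE.search(line):
            summary[current] += 1
    return summary
```

Applied to the donor log (for example `summarize_sst(open("searchd.log", encoding="utf-8"))`, where the file name is an assumption), it lists every sent file with the number of errors that followed it. That makes it easier to see whether the transfer breaks on particular files or at regular time intervals.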
On the receiver (the new node):
[Thu Sep 21 14:28:08.925 2023] [715077] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/replicator_str.cpp:prepare_state_request():604: State gap can’t be serviced using IST. Switching to SST
[Thu Sep 21 14:28:08.925 2023] [715077] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/replicator_str.cpp:prepare_state_request():606: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (a9dd5c12-5230-11ee-86fc-b691867bf737): 1 (Operation not permitted)
at /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/replicator_str.cpp:prepare_for_IST():538. IST will be unavailable.
[Thu Sep 21 14:28:08.925 2023] [715077] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/gcs/src/gcs.cpp:gcs_request_state_transfer():1817: ist_uuid[00000000-0000-0000-0000-000000000000], ist_seqno[-1]
[Thu Sep 21 14:28:08.925 2023] [715076] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/gcs/src/gcs_group.cpp:group_select_donor():1354: Member 2.0 (node_10.30.0.168_prodmain01_685933) requested state transfer from ‘any’. Selected 0.0 (node_10.30.0.167_prodmain01_943323)(SYNCED) as donor.
[Thu Sep 21 14:20:43.291 2023] [686484] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/replicator_smm.cpp:process_trx():1404: Ignorng trx(487693) due to SST failure
[Thu Sep 21 14:20:43.291 2023] [686484] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/replicator_smm.cpp:process_trx():1404: Ignorng trx(487694) due to SST failure
[Thu Sep 21 14:20:43.291 2023] [686484] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/replicator_smm.cpp:process_trx():1404: Ignorng trx(487695) due to SST failure
[Thu Sep 21 14:20:43.291 2023] [686484] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/gcs_action_source.cpp:dispatch():142: Received SELF-LEAVE. Closing connection.
[Thu Sep 21 14:20:43.291 2023] [686484] RPL: new cluster membership: -1(0), global seqno: 0, status non-primary, gap 0
[Thu Sep 21 14:20:43.291 2023] [686484] RPL:
[Thu Sep 21 14:20:43.291 2023] [686484] RPL: /__w/manticoresearch/manticoresearch/build/galera-build/galera_populate-prefix/src/galera_populate/galera/src/replicator_smm.cpp:async_recv():461: Slave thread exit. Return code: 6
[Thu Sep 21 14:20:43.291 2023] [686484] RPL: receiver prodmain01 done, code 6, error in client connection, must abort
[Thu Sep 21 14:20:43.291 2023] [685913] FATAL: ‘prodmain01’ cluster after join error: ‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe;‘10.30.0.168:9312’ error when sending data: Broken pipe, nodes ‘10.30.0.166:9322,10.30.0.167:9320’
[Thu Sep 21 14:20:43.291 2023] [686484] DEBUG: Detached::RemoveThread called for 686484
[Thu Sep 21 14:20:43.291 2023] [685913] RPL: deleting cluster prodmain01
[Thu Sep 21 14:20:43.291 2023] [686484] DEBUG: Terminated thread 686484, ‘prodmain01_repl_0’
It copies 80-200 MB and then dies.
gcache 4096M