K8s multi-master: worker does not join the cluster

I deployed a multi-master cluster in k8s:

    worker:
      replicaCount: 3
    ...
      config:
        path: /mnt/manticore.conf
        content: |
          searchd
          {
            listen = /var/run/mysqld/mysqld.sock:mysql41
            listen = 9306:mysql41
            listen = 9308:http
            listen = 9301:mysql_vip
            listen = $hostname:9312
            listen = $hostname:9315-9415:replication
            node_address = $hostname
            binlog_path = /var/lib/manticore
            query_log = /dev/stdout
            query_log_format = sphinxql
            pid_file = /var/run/manticore/searchd.pid
            data_dir = /var/lib/manticore
            shutdown_timeout = 25s
            auto_optimize = 0
    
            max_packet_size = 32M
            net_workers = 4
    
            seamless_rotate = 1
            unlink_old = 1
            watchdog = 1
    
            max_filter_values = 10000
            persistent_connections_limit = 256
          }

Everything worked fine until worker-0 and worker-2 ran out of disk space. I added space, and worker-0 came back up and rejoined the cluster, but worker-2 refuses to join no matter what. I deleted its PVC so that worker-2 would start empty, and the errors are the same:

    Manticore 6.3.8 d17bd2b6b@24112202 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206)
    Mount success
    2025-03-14 10:44:36,098 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
    2025-03-14 10:44:36,101 INFO RPC interface 'supervisor' initialized
    2025-03-14 10:44:36,101 CRIT Server 'unix_http_server' running without any HTTP authentication checking
    2025-03-14 10:44:36,102 INFO supervisord started with pid 43
    2025-03-14 10:44:37,104 INFO spawned: 'quorum_recover' with pid 44
    2025-03-14 10:44:37,106 INFO spawned: 'searchd_replica' with pid 45
    [2025-03-14T10:44:37.135800+00:00] Logs.INFO: Replication mode: multi-master [] []
    [2025-03-14T10:44:37.162518+00:00] Logs.INFO: Pods count 2 [] []
    [2025-03-14T10:44:37.162551+00:00] Logs.INFO: Empty conf with more than one node in cluster [] []
    2025-03-14 10:44:37,278 INFO spawned: 'searchd' with pid 50
    [Fri Mar 14 10:44:37.290 2025] [50] using config file '/etc/manticoresearch/manticore.conf' (866 chars)...
    starting daemon version '6.3.8 d17bd2b6b@24112202 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206)' ...
    listening on UNIX socket /var/run/mysqld/mysqld.sock
    listening on all interfaces for mysql, port=9306
    listening on all interfaces for sphinx and http(s), port=9308
    listening on all interfaces for VIP mysql, port=9301
    listening on 10.2.249.19:9312 for sphinx and http(s)
    prereading 0 tables
    preread 0 tables in 0.000 sec
    accepting connections
    [BUDDY] started v2.3.12 '/usr/share/manticore/modules/manticore-buddy/bin/manticore-buddy --listen=http://0.0.0.0:9308 --bind=127.0.0.1 --threads=2 --skip=manticoresoftware/buddy-plugin-sharding --skip=manticoresoftware/buddy-plugin-queue' at http://127.0.0.1:46437
    [BUDDY] Loaded plugins:
    [BUDDY] core: empty-string, backup, emulate-elastic, create, insert, alias, select, show, cli-table, plugin, test, alter-distributed-table, alter-rename-table, modify-table, knn, replace
    [BUDDY] local:
    [BUDDY] extra:
    2025-03-14 10:44:38,355 INFO success: searchd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2025-03-14 10:44:38,355 INFO success: quorum_recover entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2025-03-14 10:44:38,356 INFO success: searchd_replica entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    [2025-03-14T10:44:38.373358+00:00] Logs.INFO: Wait until manticoresearch-worker-2 came alive [] []
    [2025-03-14T10:45:39.134675+00:00] Logs.INFO: Wait for NS... [] []
    [2025-03-14T10:45:40.158843+00:00] Logs.INFO: Wait until join host come available ["manticoresearch-worker-1.manticoresearch-worker-replication-svc",9306] []
    [2025-03-14T10:45:40.161707+00:00] Logs.INFO: Check is cluster exist at manticoresearch-worker-1.manticoresearch-worker-replication-svc [] []
    [2025-03-14T10:45:40.163045+00:00] Logs.INFO: Join to manticoresearch-worker-1.manticoresearch-worker-replication-svc [] []
    WARNING: No persistent state found. Bootstraping with default state
    WARNING: (78aea525, 'tcp://0.0.0.0:9315') address 'tcp://10.2.249.19:9315' points to own listening address, blacklisting
    FATAL: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
    at /__w/manticoresearch/manticoresearch/build/galera-build/_deps/galera_populate-src/gcomm/src/pc.cpp:connect():159
    FATAL: /__w/manticoresearch/manticoresearch/build/galera-build/_deps/galera_populate-src/gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
    FATAL: /__w/manticoresearch/manticoresearch/build/galera-build/_deps/galera_populate-src/gcs/src/gcs.cpp:gcs_open():1514: Failed to open channel 'usmall_cluster' at 'gcomm://manticoresearch-worker-1.manticoresearch-worker-replication-svc.manticoresearch.svc.cluster.local:9315,manticoresearch-worker-2.manticoresearch-worker-replication-svc.manticoresearch.svc.cluster.local:9315': -110 (Connection timed out)
    FATAL: gcs connect failed: Connection timed out
    /* Fri Mar 14 10:46:10.704 2025 conn 4 (127.0.0.1:50914) */ JOIN CLUSTER usmall_cluster at 'manticoresearch-worker-1.manticoresearch-worker-replication-svc:9312'312'9306 # error=replication connection failed: 7 'error in node state, must reinit'
    [2025-03-14T10:46:10.704503+00:00] Logs.ERROR: Exception until query processing. Query: JOIN CLUSTER usmall_cluster at 'manticoresearch-worker-1.manticoresearch-worker-replication-svc:9312' . Error: mysqli_sql_exception: replication connection failed: 7 'error in node state, must reinit' in /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreMysqliFetcher.php:33 Stack trace: #0 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreMysqliFetcher.php(33): mysqli->query() #1 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(193): Core\Manticore\ManticoreMysqliFetcher->query() #2 /etc/manticoresearch/replica.php(219): Core\Manticore\ManticoreConnector->joinCluster() #3 {main} [] []
    [2025-03-14T10:46:10.704503+00:00] Logs.ERROR: Exception until query processing. Query: JOIN CLUSTER usmall_cluster at 'manticoresearch-worker-1.manticoresearch-worker-replication-svc:9312' . Error: mysqli_sql_exception: replication connection failed: 7 'error in node state, must reinit' in /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreMysqliFetcher.php:33 Stack trace: #0 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreMysqliFetcher.php(33): mysqli->query() #1 /etc/manticoresearch/vendor/manticoresoftware/manticoresearch-auto-replication/src/Manticore/ManticoreConnector.php(193): Core\Manticore\ManticoreMysqliFetcher->query() #2 /etc/manticoresearch/replica.php(219): Core\Manticore\ManticoreConnector->joinCluster() #3 {main} [] []
    [2025-03-14T10:46:10.704694+00:00] Logs.ERROR: Error until query processing. Query: JOIN CLUSTER usmall_cluster at 'manticoresearch-worker-1.manticoresearch-worker-replication-svc:9312' . Error: replication connection failed: 7 'error in node state, must reinit' [] []
    [2025-03-14T10:46:10.704694+00:00] Logs.ERROR: Error until query processing. Query: JOIN CLUSTER usmall_cluster at 'manticoresearch-worker-1.manticoresearch-worker-replication-svc:9312' . Error: replication connection failed: 7 'error in node state, must reinit' [] []
    WARNING: No persistent state found. Bootstraping with default state
    WARNING: (8b7ba2c9, 'tcp://0.0.0.0:9315') address 'tcp://10.2.249.19:9315' points to own listening address, blacklisting
    FATAL: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
    at /__w/manticoresearch/manticoresearch/build/galera-build/_deps/galera_populate-src/gcomm/src/pc.cpp:connect():159
    FATAL: /__w/manticoresearch/manticoresearch/build/galera-build/_deps/galera_populate-src/gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
    FATAL: /__w/manticoresearch/manticoresearch/build/galera-build/_deps/galera_populate-src/gcs/src/gcs.cpp:gcs_open():1514: Failed to open channel 'usmall_cluster' at 'gcomm://manticoresearch-worker-1.manticoresearch-worker-replication-svc.manticoresearch.svc.cluster.local:9315,manticoresearch-worker-2.manticoresearch-worker-replication-svc.manticoresearch.svc.cluster.local:9315': -110 (Connection timed out)
    FATAL: gcs connect failed: Connection timed out

I increased the startup probe limits, and the worker eventually joined the cluster.
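A plausible explanation of why this helps, based on the log above: the JOIN CLUSTER attempt on worker-2 failed with a gcomm timeout (`pc.wait_prim_timeout`), and the auto-replication script then retried. If the startup probe gives up before a retry succeeds (and, on a non-empty cluster, before the state transfer finishes), the kubelet restarts the container and the join starts over. Kubernetes allows roughly `initialDelaySeconds + failureThreshold × periodSeconds` for startup before killing the container, so that product has to cover the worst-case join time. Below is a minimal sketch of a strategic-merge patch for the worker StatefulSet. The StatefulSet name `manticoresearch-worker` and the namespace `manticoresearch` are inferred from the pod and DNS names in the log. The container name `worker` and the values 10 s × 60 are hypothetical. The patch sets only the timing fields, so it assumes the chart already defines a startup probe and keeps that probe's check as-is.

    # startup-probe-patch.yaml
    # Strategic-merge patch: changes only the timing of the existing startupProbe.
    # Assumption: the worker container is named "worker" (check your rendered chart).
    spec:
      template:
        spec:
          containers:
            - name: worker            # hypothetical container name
              startupProbe:
                periodSeconds: 10
                failureThreshold: 60  # 60 x 10s: about 10 minutes to start and join

    kubectl -n manticoresearch patch statefulset manticoresearch-worker \
      --type strategic --patch-file startup-probe-patch.yaml

Changing the pod template triggers a rolling restart of the workers. A later `helm upgrade` may also overwrite a manual patch, so if the chart exposes probe settings in its values, setting them there is probably the better option.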
