Server A created a posts cluster, but Server B failed to join the replication cluster

Server A created a posts cluster, but Server B failed to join the replication cluster. The failure log is as follows:
/* Wed May 29 13:47:05.391 2024 conn 5 */ JOIN CLUSTER posts AT '8.138.88.168:9312' # error=cluster 'posts', no nodes available(8.138.88.168:9312), error

manticore.conf:
searchd {
    listen = 9312
    listen = 9306:mysql
    listen = 9308:http
    listen = 0.0.0.0:9360-9369:replication
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /run/manticore/searchd.pid
    data_dir = /var/lib/manticore
}

Are your nodes on the same network, in different data centers, or behind NAT?

Could you enable verbose replication logging on all nodes with the SphinxQL statement SET GLOBAL log_level = replication, and then provide the daemon logs from all nodes?

Nodes are on different Alibaba Cloud servers.

node A (8.138.88.168)
tcp 0 0 0.0.0.0:9360 0.0.0.0:* LISTEN 18668/searchd
tcp 0 0 0.0.0.0:9306 0.0.0.0:* LISTEN 18668/searchd
tcp 0 0 0.0.0.0:9308 0.0.0.0:* LISTEN 18668/searchd
tcp 0 0 0.0.0.0:9312 0.0.0.0:* LISTEN 18668/searchd

[Wed May 29 16:29:51.747 2024] [11014] DEBUG: P01: syntax error, unexpected identifier near 'JOIN CLUSTER posts '8.138.88.168:9312' as nodes'
[Wed May 29 16:29:51.763 2024] [11012] RPL: cluster 'posts' wait to finish
[Wed May 29 16:29:51.763 2024] [11012] RPL: cluster 'posts' finished, cluster deleted, lib (nil) unloaded

Try setting node_address (see Manticore Search Manual: Server settings > Searchd).

searchd {
    listen = 9312
    listen = 9306:mysql
    listen = 9308:http
    listen = 0.0.0.0:9360-9369:replication
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /run/manticore/searchd.pid
    data_dir = /var/lib/manticore
    node_address = 8.138.88.168
}
I have set node_address, but when server B tries to join the replication cluster on 8.138.88.168, this is logged:
[Fri May 31 09:20:32.600 2024] [11014] DEBUG: P01: syntax error, unexpected identifier near 'JOIN CLUSTER posts AT '8.138.88.168:9312''
[Fri May 31 09:20:32.612 2024] [11012] RPL: cluster 'posts' wait to finish
[Fri May 31 09:20:32.612 2024] [11012] RPL: cluster 'posts' finished, cluster deleted, lib (nil) unloaded

If you mean “DEBUG: P01: syntax error”, that is just a debug message saying that one of the parsers couldn’t parse the command. You can ignore it, or disable debug/replication logging (it is off by default).

+------------------------------------------+-------------------------------------------------+
| Counter | Value |
+------------------------------------------+-------------------------------------------------+
| command_cluster | 5 |
| cluster_name | posts |
| cluster_posts_state_uuid | 51b15a99-1d7e-11ef-afeb-c6ffb78f4622 |
| cluster_posts_conf_id | 1 |
| cluster_posts_status | primary |
| cluster_posts_size | 1 |
| cluster_posts_local_index | 0 |
| cluster_posts_node_state | synced |
| cluster_posts_nodes_set | |
| cluster_posts_nodes_view | 8.138.88.168:9312,8.138.88.168:9360:replication |
| cluster_posts_indexes_count | 0 |
| cluster_posts_indexes | |
| cluster_posts_local_state_uuid | 51b15a99-1d7e-11ef-afeb-c6ffb78f4622 |
| cluster_posts_protocol_version | 9 |
| cluster_posts_last_applied | 0 |
| cluster_posts_last_committed | 0 |
| cluster_posts_replicated | 0 |
| cluster_posts_replicated_bytes | 0 |
| cluster_posts_repl_keys | 0 |
| cluster_posts_repl_keys_bytes | 0 |
| cluster_posts_repl_data_bytes | 0 |
| cluster_posts_repl_other_bytes | 0 |
| cluster_posts_received | 2 |
| cluster_posts_received_bytes | 195 |
| cluster_posts_local_commits | 0 |
| cluster_posts_local_cert_failures | 0 |
| cluster_posts_local_replays | 0 |
| cluster_posts_local_send_queue | 0 |
| cluster_posts_local_send_queue_max | 1 |
| cluster_posts_local_send_queue_min | 0 |
| cluster_posts_local_send_queue_avg | 0.000000 |
| cluster_posts_local_recv_queue | 0 |
| cluster_posts_local_recv_queue_max | 2 |
| cluster_posts_local_recv_queue_min | 0 |
| cluster_posts_local_recv_queue_avg | 0.500000 |
| cluster_posts_local_cached_downto | 0 |
| cluster_posts_flow_control_paused_ns | 0 |
| cluster_posts_flow_control_paused | 0.000000 |
| cluster_posts_flow_control_sent | 0 |
| cluster_posts_flow_control_recv | 0 |
| cluster_posts_flow_control_interval | [ 100, 100 ] |
| cluster_posts_flow_control_interval_low | 100 |
| cluster_posts_flow_control_interval_high | 100 |
| cluster_posts_flow_control_status | OFF |
| cluster_posts_cert_deps_distance | 0.000000 |
| cluster_posts_apply_oooe | 0.000000 |
| cluster_posts_apply_oool | 0.000000 |
| cluster_posts_apply_window | 0.000000 |
| cluster_posts_commit_oooe | 0.000000 |
| cluster_posts_commit_oool | 0.000000 |
| cluster_posts_commit_window | 0.000000 |
| cluster_posts_local_state | 4 |
| cluster_posts_local_state_comment | Synced |
| cluster_posts_cert_index_size | 0 |
| cluster_posts_cert_bucket_count | 2 |
| cluster_posts_gcache_pool_size | 1320 |
| cluster_posts_causal_reads | 0 |
| cluster_posts_cert_interval | 0.000000 |
| cluster_posts_open_transactions | 0 |
| cluster_posts_open_connections | 0 |
| cluster_posts_ist_receive_status | |
| cluster_posts_ist_receive_seqno_start | 0 |
| cluster_posts_ist_receive_seqno_current | 0 |
| cluster_posts_ist_receive_seqno_end | 0 |
| cluster_posts_incoming_addresses | 8.138.88.168:9312,8.138.88.168:9360:replication |
| cluster_posts_cluster_weight | 1 |
| cluster_posts_desync_count | 0 |
| cluster_posts_evs_delayed | |
| cluster_posts_evs_evict_list | |
| cluster_posts_evs_repl_latency | 0/0/0/0/0 |
| cluster_posts_evs_state | OPERATIONAL |
| cluster_posts_gcomm_uuid | bb21e757-1eeb-11ef-9761-128243f6c959 |
+------------------------------------------+-------------------------------------------------+
I have set node_address, but server B still cannot join the replication cluster on 8.138.88.168. The error reported is:
SQLSTATE[42000]: Syntax error or access violation: 1064 cluster 'posts', no nodes available(8.138.88.168:9312), error:

As I said, you need to enable verbose replication logging on all nodes with the SphinxQL statement SET GLOBAL log_level = replication, and then provide the full daemon logs from all nodes so the issue can be investigated further.

I get this when I just can’t connect to the donor:

mysql> join cluster clustername at '127.0.0.1:10201';
ERROR 1064 (42000): cluster 'clustername', no nodes available(127.0.0.1:10201), error: '127.0.0.1:10201': retries limit exceeded

So make sure it’s not a connectivity issue, e.g. by running this:

telnet 8.138.88.168 9312
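
The telnet test above probes only the API port. As a rough sketch, the same check can be scripted to cover the replication port too: the status output earlier in the thread lists 8.138.88.168:9360 as the replication address. The helper name check_ports and the 3-second timeout are assumptions made for illustration. A successful TCP connect only shows that the port is reachable; it does not prove that replication will work.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: bool}, True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or no route
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example call using the donor address and ports mentioned in this thread:
# check_ports("8.138.88.168", [9312, 9360])
```

It may be worth running the check in both directions (from server B to server A and back), since a firewall or security group rule can block only one side.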

[root@iZ94laeyoplZ bin]# telnet 8.138.88.168 9312
Trying 8.138.88.168...
Connected to 8.138.88.168.
Escape character is '^]'.
Connection closed by foreign host.

8.138.88.168:
tcp 0 0 0.0.0.0:9360 0.0.0.0:* LISTEN 20850/searchd
tcp 0 0 0.0.0.0:9306 0.0.0.0:* LISTEN 20850/searchd
tcp 0 0 0.0.0.0:9308 0.0.0.0:* LISTEN 20850/searchd
tcp 0 0 0.0.0.0:9312 0.0.0.0:* LISTEN 20850/searchd
Port 9312 is open

Is this the full error? Is there nothing after “error:”?

yes

If you can reproduce it, please create an issue on GitHub. We’d definitely want to fix it since an empty error is no good.

The problem has been resolved. It was caused by a version mismatch between the two servers. Thank you.
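
Since the cause turned out to be a version mismatch, comparing the version each node reports can rule this out early. The sketch below only compares version strings you have already collected, for example from the "Server version:" line the mysql client prints on connect. The helper names and the sample strings are hypothetical placeholders, not values from this thread, and whether two different releases can actually replicate with each other is not something this check decides.

```python
import re


def release_of(version_string):
    """Extract the leading dotted release number, e.g. '1.2.3 abc' -> (1, 2, 3)."""
    match = re.search(r"\d+(?:\.\d+)+", version_string)
    if match is None:
        raise ValueError(f"no version number found in {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def same_release(version_a, version_b):
    """True when both strings carry the same dotted release number."""
    return release_of(version_a) == release_of(version_b)


# Placeholder strings for illustration only (not taken from this thread).
node_a = "1.2.3 aaaaaaa"
node_b = "1.2.4 bbbbbbb"
print(same_release(node_a, node_b))  # prints False: the releases differ
```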