fcw
May 29, 2024, 6:03am
1
Server A created a posts cluster, but Server B failed to join the replication cluster. The failure log is as follows:
/* Wed May 29 13:47:05.391 2024 conn 5 */ JOIN CLUSTER posts AT '8.138.88.168:9312' # error=cluster 'posts', no nodes available(8.138.88.168:9312), error
manticore.conf:
searchd {
listen = 9312
listen = 9306:mysql
listen = 9308:http
listen = 0.0.0.0:9360-9369:replication
log = /var/log/manticore/searchd.log
query_log = /var/log/manticore/query.log
pid_file = /run/manticore/searchd.pid
data_dir = /var/lib/manticore
}
tomat
May 29, 2024, 7:15am
2
Are your nodes in the same network, in different data centers, or behind NAT?
Could you enable verbose replication logging on all nodes with the SphinxQL statement SET GLOBAL log_level = replication
and then provide the daemon logs from all nodes?
fcw
May 29, 2024, 8:31am
3
Nodes are on different Alibaba Cloud servers.
node A (8.138.88.168)
tcp 0 0 0.0.0.0:9360 0.0.0.0:* LISTEN 18668/searchd
tcp 0 0 0.0.0.0:9306 0.0.0.0:* LISTEN 18668/searchd
tcp 0 0 0.0.0.0:9308 0.0.0.0:* LISTEN 18668/searchd
tcp 0 0 0.0.0.0:9312 0.0.0.0:* LISTEN 18668/searchd
[Wed May 29 16:29:51.747 2024] [11014] DEBUG: P01: syntax error, unexpected identifier near 'JOIN CLUSTER posts '8.138.88.168:9312' as nodes'
[Wed May 29 16:29:51.763 2024] [11012] RPL: cluster 'posts' wait to finish
[Wed May 29 16:29:51.763 2024] [11012] RPL: cluster 'posts' finished, cluster deleted, lib (nil) unloaded
Sergey
May 29, 2024, 11:41am
4
fcw
May 31, 2024, 1:23am
5
searchd {
listen = 9312
listen = 9306:mysql
listen = 9308:http
listen = 0.0.0.0:9360-9369:replication
log = /var/log/manticore/searchd.log
query_log = /var/log/manticore/query.log
pid_file = /run/manticore/searchd.pid
data_dir = /var/lib/manticore
node_address = 8.138.88.168
}
I have set node_address, but when server B tries to join the replication cluster at 8.138.88.168, an error is reported:
[Fri May 31 09:20:32.600 2024] [11014] DEBUG: P01: syntax error, unexpected identifier near 'JOIN CLUSTER posts AT '8.138.88.168:9312''
[Fri May 31 09:20:32.612 2024] [11012] RPL: cluster 'posts' wait to finish
[Fri May 31 09:20:32.612 2024] [11012] RPL: cluster 'posts' finished, cluster deleted, lib (nil) unloaded
If you mean “DEBUG: P01: syntax error” - this is just a debug message meaning one of the parsers couldn’t parse the command, you can skip it or disable debug/replication logging (off by default).
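When reading a log with debug or replication verbosity enabled, these parser messages can be filtered out so the RPL lines are easier to follow. A minimal Python sketch, assuming the log format shown in this thread; drop_parser_noise is a hypothetical helper, and it only matches the exact "DEBUG: P01: syntax error" text seen here:

```python
def drop_parser_noise(lines):
    """Yield log lines, skipping the harmless 'DEBUG: P01: syntax error' parser messages."""
    for line in lines:
        if "DEBUG: P01: syntax error" in line:
            continue
        yield line


# Lines copied from the log excerpt in this thread
sample = [
    "[Fri May 31 09:20:32.600 2024] [11014] DEBUG: P01: syntax error, unexpected identifier near 'JOIN CLUSTER posts AT '8.138.88.168:9312''",
    "[Fri May 31 09:20:32.612 2024] [11012] RPL: cluster 'posts' wait to finish",
]
for line in drop_parser_noise(sample):
    print(line)
```

The same generator works on an open file object, for example the /var/log/manticore/searchd.log path from the config above.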
fcw
May 31, 2024, 5:42am
7
+------------------------------------------+-------------------------------------------------+
| Counter | Value |
+------------------------------------------+-------------------------------------------------+
| command_cluster | 5 |
| cluster_name | posts |
| cluster_posts_state_uuid | 51b15a99-1d7e-11ef-afeb-c6ffb78f4622 |
| cluster_posts_conf_id | 1 |
| cluster_posts_status | primary |
| cluster_posts_size | 1 |
| cluster_posts_local_index | 0 |
| cluster_posts_node_state | synced |
| cluster_posts_nodes_set | |
| cluster_posts_nodes_view | 8.138.88.168:9312,8.138.88.168:9360:replication |
| cluster_posts_indexes_count | 0 |
| cluster_posts_indexes | |
| cluster_posts_local_state_uuid | 51b15a99-1d7e-11ef-afeb-c6ffb78f4622 |
| cluster_posts_protocol_version | 9 |
| cluster_posts_last_applied | 0 |
| cluster_posts_last_committed | 0 |
| cluster_posts_replicated | 0 |
| cluster_posts_replicated_bytes | 0 |
| cluster_posts_repl_keys | 0 |
| cluster_posts_repl_keys_bytes | 0 |
| cluster_posts_repl_data_bytes | 0 |
| cluster_posts_repl_other_bytes | 0 |
| cluster_posts_received | 2 |
| cluster_posts_received_bytes | 195 |
| cluster_posts_local_commits | 0 |
| cluster_posts_local_cert_failures | 0 |
| cluster_posts_local_replays | 0 |
| cluster_posts_local_send_queue | 0 |
| cluster_posts_local_send_queue_max | 1 |
| cluster_posts_local_send_queue_min | 0 |
| cluster_posts_local_send_queue_avg | 0.000000 |
| cluster_posts_local_recv_queue | 0 |
| cluster_posts_local_recv_queue_max | 2 |
| cluster_posts_local_recv_queue_min | 0 |
| cluster_posts_local_recv_queue_avg | 0.500000 |
| cluster_posts_local_cached_downto | 0 |
| cluster_posts_flow_control_paused_ns | 0 |
| cluster_posts_flow_control_paused | 0.000000 |
| cluster_posts_flow_control_sent | 0 |
| cluster_posts_flow_control_recv | 0 |
| cluster_posts_flow_control_interval | [ 100, 100 ] |
| cluster_posts_flow_control_interval_low | 100 |
| cluster_posts_flow_control_interval_high | 100 |
| cluster_posts_flow_control_status | OFF |
| cluster_posts_cert_deps_distance | 0.000000 |
| cluster_posts_apply_oooe | 0.000000 |
| cluster_posts_apply_oool | 0.000000 |
| cluster_posts_apply_window | 0.000000 |
| cluster_posts_commit_oooe | 0.000000 |
| cluster_posts_commit_oool | 0.000000 |
| cluster_posts_commit_window | 0.000000 |
| cluster_posts_local_state | 4 |
| cluster_posts_local_state_comment | Synced |
| cluster_posts_cert_index_size | 0 |
| cluster_posts_cert_bucket_count | 2 |
| cluster_posts_gcache_pool_size | 1320 |
| cluster_posts_causal_reads | 0 |
| cluster_posts_cert_interval | 0.000000 |
| cluster_posts_open_transactions | 0 |
| cluster_posts_open_connections | 0 |
| cluster_posts_ist_receive_status | |
| cluster_posts_ist_receive_seqno_start | 0 |
| cluster_posts_ist_receive_seqno_current | 0 |
| cluster_posts_ist_receive_seqno_end | 0 |
| cluster_posts_incoming_addresses | 8.138.88.168:9312,8.138.88.168:9360:replication |
| cluster_posts_cluster_weight | 1 |
| cluster_posts_desync_count | 0 |
| cluster_posts_evs_delayed | |
| cluster_posts_evs_evict_list | |
| cluster_posts_evs_repl_latency | 0/0/0/0/0 |
| cluster_posts_evs_state | OPERATIONAL |
| cluster_posts_gcomm_uuid | bb21e757-1eeb-11ef-9761-128243f6c959 |
+------------------------------------------+-------------------------------------------------+
I have set node_address, but server B still cannot join the replication cluster on 8.138.88.168, and an error is reported:
SQLSTATE[42000]: Syntax error or access violation: 1064 cluster 'posts', no nodes available(8.138.88.168:9312), error:
tomat
May 31, 2024, 6:47am
8
As I said, you need to enable verbose replication logging on all nodes with the SphinxQL statement SET GLOBAL log_level = replication
and then provide the full daemon logs from all nodes so the issue can be investigated further.
I get this when I just can’t connect to the donor:
mysql> join cluster clustername at '127.0.0.1:10201';
ERROR 1064 (42000): cluster 'clustername', no nodes available(127.0.0.1:10201), error: '127.0.0.1:10201': retries limit exceeded
so make sure it’s not a connectivity issue. E.g. do this:
telnet 8.138.88.168 9312
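A single telnet only tests one port. Since the config above also declares a replication range (9360-9369), it may be worth checking those ports from server B too. A minimal Python sketch, assuming plain TCP reachability is what needs verifying; check_ports is a hypothetical helper, not part of Manticore:

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: True/False} for whether a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection completes the TCP handshake, much like telnet does
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # refused, timed out, unreachable, or filtered by a firewall/security group
            results[port] = False
    return results


# Example call, run from server B (host and ports taken from this thread):
#   check_ports("8.138.88.168", [9312] + list(range(9360, 9370)))
```

A successful connection only shows that the port is reachable, not that the join will succeed. Ports in the range that searchd has not opened will also show as unreachable even without a firewall; the netstat output in this thread shows only 9360 listening.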
fcw
June 1, 2024, 6:42am
10
[root@iZ94laeyoplZ bin]# telnet 8.138.88.168 9312
Trying 8.138.88.168...
Connected to 8.138.88.168.
Escape character is '^]'.
Connection closed by foreign host.
8.138.88.168:
tcp 0 0 0.0.0.0:9360 0.0.0.0:* LISTEN 20850/searchd
tcp 0 0 0.0.0.0:9306 0.0.0.0:* LISTEN 20850/searchd
tcp 0 0 0.0.0.0:9308 0.0.0.0:* LISTEN 20850/searchd
tcp 0 0 0.0.0.0:9312 0.0.0.0:* LISTEN 20850/searchd
Port 9312 is open
Sergey
June 1, 2024, 3:04pm
11
Is this the full error? Is there nothing after "error:"?
Sergey
June 17, 2024, 3:08pm
13
If you can reproduce it, please create an issue on GitHub. We’d definitely want to fix it since an empty error is no good.
fcw
June 18, 2024, 3:48am
14
The problem has been resolved. It was caused by a version mismatch between the two servers. Thank you.
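For anyone hitting the same error, comparing the versions on both nodes is a quick first check. A minimal Python sketch, assuming you have collected a version string from each server (for example from searchd --version); the banner strings below are made-up placeholders, not output from this thread, and extract_version and same_version are hypothetical helpers:

```python
import re


def extract_version(banner):
    """Pull the first dotted version number (e.g. '6.2.12') out of a banner string."""
    match = re.search(r"\d+\.\d+\.\d+", banner)
    if match is None:
        raise ValueError(f"no version number found in: {banner!r}")
    return match.group(0)


def same_version(banner_a, banner_b):
    """True when both banners report the same major.minor.patch version."""
    return extract_version(banner_a) == extract_version(banner_b)


# Placeholder banners for illustration only
node_a = "Manticore 6.2.12 (placeholder build)"
node_b = "Manticore 6.0.4 (placeholder build)"
print(same_version(node_a, node_b))
```

This compares only the major.minor.patch part; whether a given pair of versions can actually replicate with each other is something the release notes would have to confirm.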