Improving search query times

Hello. I’m searching for ways to improve the performance of our search queries. We have a distributed table that’s composed of 15 shards. We currently have 1.845.926.047 indexed rows.

Here is an execution example of a query that we currently use. Leaving aside the query itself (I know it could be faster by removing wild cards, or by using a single field), can you guys point me to some settings or reading that I could do to improve this? Thank you!

mysql> SET profiling=1; SELECT id FROM contacts_dist WHERE accountid = 13759 AND MATCH(‘@(emailaddress,company,title) “*john *”’) ORDER BY WEIGHT() DESC LIMIT 0, 100 OPTION field_weights=(emailaddress=3, lastname=2, firstname=2); SHOW PROFILE; SHOW PLAN;
Query OK, 0 rows affected (0.37 sec)

±--------------------+
| id |
±--------------------+
| 6686435559093475499 |

| 6530604436991394630 |
±--------------------+
100 rows in set (2 min 35.25 sec)
— 100 out of 1184640 results in 2m 34.9s —

±--------------±------------±---------±--------+
| Status | Duration | Switches | Percent |
±--------------±------------±---------±--------+
| unknown | 0.002203 | 3017 | 0.00 |
| local_df | 7.092945 | 16 | 0.61 |
| local_search | 3.139677 | 3 | 0.27 |
| sql_parse | 0.000029 | 1 | 0.00 |
| setup_iter | 0.575284 | 6022 | 0.05 |
| dict_setup | 0.004128 | 3026 | 0.00 |
| parse | 0.030140 | 3026 | 0.00 |
| transforms | 350.196731 | 6037 | 30.16 |
| init | 6.985186 | 16195642 | 0.60 |
| init_segment | 0.000231 | 384 | 0.00 |
| read_docs | 299.727531 | 16290635 | 25.81 |
| read_hits | 3.678937 | 7684 | 0.32 |
| get_docs | 186.394712 | 670791 | 16.05 |
| filter | 301.281637 | 556412 | 25.95 |
| rank | 0.041826 | 641735 | 0.00 |
| qcache_update | 0.071719 | 38154 | 0.01 |
| qcache_final | 0.007216 | 3011 | 0.00 |
| sort | 0.546130 | 38154 | 0.05 |
| finalize | 0.004991 | 6037 | 0.00 |
| clone_attrs | 1.376992 | 15 | 0.12 |
| aggregate | 0.004881 | 3 | 0.00 |
| net_write | 0.000049 | 2 | 0.00 |
| eval_post | 0.000000 | 1 | 0.00 |
| eval_getfield | 0.000556 | 1 | 0.00 |
| total | 1161.163731 | 34469809 | 0 |
±--------------±------------±---------±--------+
25 rows in set (0.30 sec)

±-----------------±----------------------------------------------------------------------------------+
| Variable | Value |
±-----------------±----------------------------------------------------------------------------------+
| transformed_tree | AND(fields=(emailaddress, company, title), KEYWORD(john, querypos=1, expanded)) |
±-----------------±----------------------------------------------------------------------------------+
1 row in set (0.27 sec)

Shard Status

mysql> SHOW TABLE contact_shard_0 status;
±------------------------------±-----------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
±------------------------------±-----------------------------------------------------------------------------------------------------------------------------+
| table_type | rt |
| indexed_documents | 123047676 |
| indexed_bytes | 4961325070 |
| ram_bytes | 502970923 |
| disk_bytes | 45601807731 |
| disk_mapped | 27694107808 |
| disk_mapped_cached | 500981760 |
| disk_mapped_doclists | 0 |
| disk_mapped_cached_doclists | 0 |
| disk_mapped_hitlists | 0 |
| disk_mapped_cached_hitlists | 0 |
| killed_documents | 2377870 |
| killed_rate | 1.89% |
| ram_chunk | 1118651 |
| ram_chunk_segments_count | 24 |
| disk_chunks | 202 |
| mem_limit | 134217728 |
| mem_limit_rate | 95.00% |
| ram_bytes_retired | 0 |
| optimizing | 0 |
| locked | 0 |
| tid | 2869729 |
| tid_saved | 2869592 |
| query_time_1min | {“queries”:1, “avg_sec”:221.212, “min_sec”:221.212, “max_sec”:221.212, “pct95_sec”:221.212, “pct99_sec”:221.212} |
| query_time_5min | {“queries”:2, “avg_sec”:5110.701, “min_sec”:221.212, “max_sec”:10000.191, “pct95_sec”:10000.191, “pct99_sec”:10000.191} |
| query_time_15min | {“queries”:6, “avg_sec”:3411.444, “min_sec”:66.038, “max_sec”:10000.191, “pct95_sec”:10000.191, “pct99_sec”:10000.191} |
| query_time_total | {“queries”:1502, “avg_sec”:49574.156, “min_sec”:0.012, “max_sec”:234291.988, “pct95_sec”:225535.444, “pct99_sec”:229289.734} |
| found_rows_1min | {“queries”:1, “avg”:0, “min”:0, “max”:0, “pct95”:0, “pct99”:0} |
| found_rows_5min | {“queries”:2, “avg”:39482, “min”:0, “max”:78964, “pct95”:78964, “pct99”:78964} |
| found_rows_15min | {“queries”:6, “avg”:26321, “min”:0, “max”:78964, “pct95”:78964, “pct99”:78964} |
| found_rows_total | {“queries”:1502, “avg”:11117959, “min”:0, “max”:122993398, “pct95”:80895295, “pct99”:106781177} |
| command_search | 1502 |
| command_excerpt | 0 |
| command_update | 0 |
| command_keywords | 0 |
| command_status | 20744 |
| command_delete | 0 |
| command_insert | 0 |
| command_replace | 2064995 |
| command_commit | 0 |
| command_suggest | 0 |
| command_callpq | 0 |
| command_getfield | 0 |
| insert_replace_stats_ms_avg | 10656.680 20444.792 8152.992 |
| insert_replace_stats_ms_min | 5167.749 5167.749 7.525 |
| insert_replace_stats_ms_max | 16363.917 60022.853 60022.853 |
| insert_replace_stats_ms_pct95 | 16363.917 59917.463 31683.253 |
| insert_replace_stats_ms_pct99 | 16363.917 60022.853 59917.463 |
| search_stats_ms_avg | 221.212 5110.701 3411.444 |
| search_stats_ms_min | 221.212 221.212 66.038 |
| search_stats_ms_max | 221.212 10000.191 10000.191 |
| search_stats_ms_pct95 | 221.212 10000.191 10000.191 |
| search_stats_ms_pct99 | 221.212 10000.191 10000.191 |
| update_stats_ms_avg | N/A N/A N/A |
| update_stats_ms_min | N/A N/A N/A |
| update_stats_ms_max | N/A N/A N/A |
| update_stats_ms_pct95 | N/A N/A N/A |
| update_stats_ms_pct99 | N/A N/A N/A |
±------------------------------±-----------------------------------------------------------------------------------------------------------------------------+

Shard Settings

mysql> SHOW TABLE contact_shard_0 settings;
min_infix_len = 3
index_exact_words = 1
charset_table = 0..9, A..Z->a..z, a..z, -, ., @, _
min_word_len = 3
morphology = lemmatize_en
morphology_skip_fields = accountrecid,recordid,accountid,listid,lastmodified,emailaddress,firstname,lastname,company,businessphone,cellphone,homephone

Show Status

mysql> SHOW STATUS;

| uptime | 676701
| connections | 561289
| maxed_out | 0
| version | 13.11.1 134b36912@25091317 (columnar 8.0.3 6d8ff91@25091005) (secondary 8.0.3 6d8ff91@25091005) (knn 8.0.3 6d8ff91@25091005) (embeddings 1.0.1) (buddy v3.35.1+25090418-41d9811f)
| mysql_version | 13.11.1 134b36912@25091317 (columnar 8.0.3 6d8ff91@25091005) (secondary 8.0.3 6d8ff91@25091005) (knn 8.0.3 6d8ff91@25091005) (embeddings 1.0.1) |
| command_search | 1090
| command_excerpt | 0
| command_update | 0
| command_keywords | 0
| command_persist | 9398
| command_status | 23240
| command_flushattrs | 0
| command_sphinxql | 0
| command_ping | 0
| command_delete | 0
| command_set | 351170
| command_insert | 0
| command_replace | 10327155
| command_commit | 0
| command_suggest | 0
| command_json | 0
| command_callpq | 0
| command_cluster | 4731
| command_getfield | 0
| insert_replace_stats_ms_avg | 784.710 392.959 2401.466
| insert_replace_stats_ms_min | 189.079 1.823 1.823
| insert_replace_stats_ms_max | 1374.426 4618.730 65205.706
| insert_replace_stats_ms_pct95 | 1190.013 1236.506 8607.541
| insert_replace_stats_ms_pct99 | 1270.091 3249.568 48651.031
| search_stats_ms_avg | N/A 56.815 27738.501
| search_stats_ms_min | N/A 56.815 56.815
| search_stats_ms_max | N/A 56.815 154876.084
| search_stats_ms_pct95 | N/A 56.815 154876.084
| search_stats_ms_pct99 | N/A 56.815 154876.084
| update_stats_ms_avg | N/A N/A N/A
| update_stats_ms_min | N/A N/A N/A
| update_stats_ms_max | N/A N/A N/A
| update_stats_ms_pct95 | N/A N/A N/A
| update_stats_ms_pct99 | N/A N/A N/A
| agent_connect | 187265
| agent_tfo | 0
| agent_retry | 35065
| queries | 1509
| dist_queries | 0
| workers_total | 8
| workers_active | 129
| workers_clients | 109
| workers_clients_vip | 0
| workers_clients_buddy | 19
| work_queue_length | 149
| load | 1.58 2.07 2.88
| load_primary | 0.00 0.01 0.03
| load_secondary | 0.00 0.15 0.35
| query_wall | 1139745.791
| query_cpu | OFF
| dist_wall | 0.000
| dist_local | 0.000
| dist_wait | 0.000
| query_reads | OFF
| query_readkb | OFF
| query_readtime | OFF
| avg_query_wall | 755.298
| avg_query_cpu | OFF
| avg_dist_wall | 0.000
| avg_dist_local | 0.000
| avg_dist_wait | 0.000
| avg_query_reads | OFF
| avg_query_readkb | OFF
| avg_query_readtime | OFF
| qcache_max_bytes | 16777216
| qcache_thresh_msec | 3000
| qcache_ttl_sec | 60
| qcache_cached_queries | 0
| qcache_used_bytes | 0
| qcache_hits | 5000
| cluster_name | contact_cluster

If you run the same query twice, will the second run be faster?

Hello Sergey. Just tried that, no, in fact the 2nd execution took a little longer.

Okay, then it’s probably not an I/O issue. Can you reproduce the long query when you run it against one of the shards?

Hm, something changed now, yeah. When using a shard the first execution took 14.5 sec, and the second just 2.8 sec. I see that the transforms and filter steps were almost instantaneous in the second run.

mysql> SET profiling=1; SELECT id FROM contact_shard_0 WHERE accountid = 13759 AND MATCH(‘@(emailaddress,company,title) “john”’) ORDER BY WEIGHT() DESC LIMIT 0, 100 OPTION field_weights=(emailaddr
Query OK, 0 rows affected (0.38 sec)

±--------------------+
| id |
±--------------------+
| 5671795337616243840 |

| 1430793042498492225 |
±--------------------+
100 rows in set (14.50 sec)
— 100 out of 79223 results in 14s 43ms —

±--------------±----------±---------±--------+
| Status | Duration | Switches | Percent |
±--------------±----------±---------±--------+
| unknown | 0.000106 | 106 | 0.00 |
| local_df | 0.358360 | 1 | 0.41 |
| local_search | 0.000067 | 2 | 0.00 |
| sql_parse | 0.000045 | 1 | 0.00 |
| setup_iter | 0.018701 | 200 | 0.02 |
| dict_setup | 0.000146 | 101 | 0.00 |
| parse | 0.001000 | 101 | 0.00 |
| transforms | 27.941260 | 194 | 31.64 |
| init | 0.307574 | 1311105 | 0.35 |
| init_segment | 0.000005 | 24 | 0.00 |
| read_docs | 24.257004 | 1319923 | 27.46 |
| read_hits | 0.170283 | 391 | 0.19 |
| get_docs | 6.011437 | 65619 | 6.81 |
| filter | 29.233756 | 56064 | 33.10 |
| rank | 0.003687 | 61304 | 0.00 |
| qcache_update | 0.005013 | 2487 | 0.01 |
| qcache_final | 0.000226 | 100 | 0.00 |
| sort | 0.014051 | 2487 | 0.02 |
| finalize | 0.000175 | 201 | 0.00 |
| clone_attrs | 0.000211 | 1 | 0.00 |
| aggregate | 0.000097 | 3 | 0.00 |
| net_write | 0.000032 | 2 | 0.00 |
| eval_post | 0.000102 | 1 | 0.00 |
| eval_getfield | 0.000000 | 1 | 0.00 |
| total | 88.323338 | 2820419 | 0 |
±--------------±----------±---------±--------+
25 rows in set (0.23 sec)

±-----------------±----------------------------------------------------------------------------------+
| Variable | Value |
±-----------------±----------------------------------------------------------------------------------+
| transformed_tree | AND(fields=(emailaddress, company, title), KEYWORD(john, querypos=1, expanded)) |
±-----------------±----------------------------------------------------------------------------------+
1 row in set (0.23 sec)

mysql> SET profiling=1; SELECT id FROM contact_shard_0 WHERE accountid = 13759 AND MATCH(‘@(emailaddress,company,title) “john”’) ORDER BY WEIGHT() DESC LIMIT 0, 100 OPTION field_weights=(emailaddress=3, lastname=2, firstname=2); SHOW PROFILE; SHOW PLAN;
Query OK, 0 rows affected (0.23 sec)

±--------------------+
| id |
±--------------------+
| 5671795337616243840 |

| 1430793042498492225 |
±--------------------+
100 rows in set (2.79 sec)
— 100 out of 79223 results in 2s 561ms —

±--------------±----------±---------±--------+
| Status | Duration | Switches | Percent |
±--------------±----------±---------±--------+
| unknown | 0.000063 | 106 | 0.00 |
| local_df | 0.001934 | 1 | 0.01 |
| local_search | 0.000058 | 2 | 0.00 |
| sql_parse | 0.000041 | 1 | 0.00 |
| setup_iter | 0.006587 | 200 | 0.04 |
| dict_setup | 0.000087 | 101 | 0.00 |
| parse | 0.000691 | 101 | 0.00 |
| transforms | 1.117006 | 194 | 6.34 |
| init | 0.231819 | 1311105 | 1.32 |
| init_segment | 0.000006 | 23 | 0.00 |
| read_docs | 15.181067 | 1319923 | 86.18 |
| read_hits | 0.078905 | 391 | 0.45 |
| get_docs | 0.865563 | 65619 | 4.91 |
| filter | 0.122343 | 56065 | 0.69 |
| rank | 0.002870 | 61303 | 0.02 |
| qcache_update | 0.002908 | 2487 | 0.02 |
| qcache_final | 0.000098 | 100 | 0.00 |
| sort | 0.004117 | 2487 | 0.02 |
| finalize | 0.000086 | 201 | 0.00 |
| clone_attrs | 0.000049 | 1 | 0.00 |
| aggregate | 0.000101 | 3 | 0.00 |
| net_write | 0.000033 | 2 | 0.00 |
| eval_post | 0.000086 | 1 | 0.00 |
| eval_getfield | 0.000000 | 1 | 0.00 |
| total | 17.616518 | 2820418 | 0 |
±--------------±----------±---------±--------+
25 rows in set (0.24 sec)

±-----------------±----------------------------------------------------------------------------------+
| Variable | Value |
±-----------------±----------------------------------------------------------------------------------+
| transformed_tree | AND(fields=(emailaddress, company, title), KEYWORD(john, querypos=1, expanded)) |
±-----------------±----------------------------------------------------------------------------------+
1 row in set (0.23 sec)

I also noticed this. The dist table seems to be using only local shards (I have a 3 host setup, where each shard is primary in one host and replica in the other 2), that could affect performance right? I guess we didn’t create the table correctly.

mysql> SHOW CREATE TABLE contacts_dist;
±--------------±---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
±--------------±---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| contacts_dist | CREATE TABLE contacts_dist type=‘distributed’ local=‘contact_shard_0’ local=‘contact_shard_1’ local=‘contact_shard_10’ local=‘contact_shard_11’ local=‘contact_shard_12’ local=‘contact_shard_13’ local=‘contact_shard_14’ local=‘contact_shard_2’ local=‘contact_shard_3’ local=‘contact_shard_4’ local=‘contact_shard_5’ local=‘contact_shard_6’ local=‘contact_shard_7’ local=‘contact_shard_8’ local=‘contact_shard_9’ |
±--------------±---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)

Okay, so I altered the dist table to use the remote shards, and performance has improved a lot. From 2’30” to 30” :slight_smile:. The execution time is still 30” on the second run tho, not sure if that is indicating something.