Profiling "init" status details

Hi there.

I’m in the middle of migrating my project from sphinxsearch to manticore and have noticed slower query execution for fast queries (about x3 slower).

I dug into it a bit and found out that init state in profiling table has much higher duration in manticore 0.006 vs 0.0002 and switches count is also higher 260 vs 2. I can assume that switches count correlates with table chunks count.

It’s worth mentioning that there are different index types in manticore (rt) and sphinx (plain) and manticore index files take twice more disk space then sphinx’s because of extra store column and different ngram_chars and charset_table settings.

So the question is what happens in this init stage and whether I can reduce this stage duration or not?

Here is sample query:

SELECT *, WEIGHT() as w FROM post
WHERE MATCH('(@(tags,text,user_name) "sallyfox"/1|"sally fox"/2)') AND created>1705915641
GROUP BY user_id
ORDER BY likes DESC
LIMIT 3;

Manticore profiling:

+--------------+----------+----------+---------+
| Status       | Duration | Switches | Percent |
+--------------+----------+----------+---------+
| unknown      | 0.000218 | 6        | 2.28    |
| local_search | 0.000407 | 2        | 4.26    |
| sql_parse    | 0.000064 | 1        | 0.67    |
| dict_setup   | 0.000062 | 129      | 0.65    |
| parse        | 0.001591 | 129      | 16.65   |
| transforms   | 0.000269 | 129      | 2.82    |
| init         | 0.006451 | 260      | 67.51   |
| init_segment | 0.000123 | 23       | 1.29    |
| get_docs     | 0.000007 | 26       | 0.07    |
| filter       | 0.000002 | 3        | 0.02    |
| rank         | 0.000010 | 49       | 0.10    |
| finalize     | 0.000006 | 1        | 0.06    |
| clone_attrs  | 0.000054 | 1        | 0.57    |
| aggregate    | 0.000253 | 1        | 2.65    |
| net_write    | 0.000038 | 2        | 0.40    |
| total        | 0.009555 | 762      | 0       |
+--------------+----------+----------+---------+

Sphinx profiling:

+--------------+----------+----------+---------+
| Status       | Duration | Switches | Percent |
+--------------+----------+----------+---------+
| unknown      | 0.000068 | 4        | 11.09   |
| local_search | 0.000269 | 1        | 43.88   |
| sql_parse    | 0.000044 | 1        | 7.18    |
| dict_setup   | 0.000000 | 1        | 0.00    |
| parse        | 0.000045 | 1        | 7.34    |
| transforms   | 0.000001 | 1        | 0.16    |
| init         | 0.000184 | 2        | 30.02   |
| aggregate    | 0.000002 | 1        | 0.33    |
| total        | 0.000613 | 12       | 0       |
+--------------+----------+----------+---------+

Manticore index:

CREATE TABLE IF NOT EXISTS post (
    id bigint,
    host_id multi,
    post_id int,
    user_id int,
    likes int,
    type int,
    created timestamp,
    user_name text,
    user_slug text,
    `text` text,
    tags text,
    tags_es text,
    data string stored
)
index_exact_words='1'
charset_table='non_cjk'
min_word_len='2'
morphology='stem_enru'
ngram_len='1'
ngram_chars='cjk';

Manticore variables:

+------------------------------+-----------+
| Variable_name                | Value     |
+------------------------------+-----------+
| autocommit                   | 1         |
| auto_optimize                | 1         |
| optimize_cutoff              | 128       |
| collation_connection         | libc_ci   |
| query_log_format             | sphinxql  |
| session_read_only            | 0         |
| log_level                    | info      |
| max_allowed_packet           | 134217728 |
| character_set_client         | utf8      |
| character_set_connection     | utf8      |
| grouping_in_utc              | 0         |
| last_insert_id               |           |
| pseudo_sharding              | 1         |
| secondary_indexes            | 1         |
| accurate_aggregation         | 0         |
| distinct_precision_threshold | 3500      |
| threads_ex_effective         |           |
| thread_stack                 | 1048576   |
| threads_ex                   |           |
+------------------------------+-----------+

could you also provide the output of the show index post status ?

here you are

+-----------------------------+------------------------------------------------------------------------------------------------------------+
| Variable_name               | Value                                                                                                      |
+-----------------------------+------------------------------------------------------------------------------------------------------------+
| index_type                  | rt                                                                                                         |
| indexed_documents           | 20749612                                                                                                   |
| indexed_bytes               | 12166818323                                                                                                |
| ram_bytes                   | 6450830993                                                                                                 |
| disk_bytes                  | 11023768529                                                                                                |
| disk_mapped                 | 6375059922                                                                                                 |
| disk_mapped_cached          | 6376910848                                                                                                 |
| disk_mapped_doclists        | 3426916207                                                                                                 |
| disk_mapped_cached_doclists | 3427168256                                                                                                 |
| disk_mapped_hitlists        | 1010447616                                                                                                 |
| disk_mapped_cached_hitlists | 1010728960                                                                                                 |
| killed_documents            | 0                                                                                                          |
| killed_rate                 | 0.00%                                                                                                      |
| ram_chunk                   | 73378057                                                                                                   |
| ram_chunk_segments_count    | 23                                                                                                         |
| disk_chunks                 | 128                                                                                                        |
| mem_limit                   | 134217728                                                                                                  |
| mem_limit_rate              | 95.00%                                                                                                     |
| ram_bytes_retired           | 0                                                                                                          |
| tid                         | 20757                                                                                                      |
| tid_saved                   | 20757                                                                                                      |
| query_time_1min             | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"}                                   |
| query_time_5min             | {"queries":6, "avg_sec":0.007, "min_sec":0.001, "max_sec":0.030, "pct95_sec":0.007, "pct99_sec":0.007}     |
| query_time_15min            | {"queries":108, "avg_sec":0.008, "min_sec":0.001, "max_sec":0.072, "pct95_sec":0.014, "pct99_sec":0.037}   |
| query_time_total            | {"queries":85123, "avg_sec":0.004, "min_sec":0.000, "max_sec":0.645, "pct95_sec":0.010, "pct99_sec":0.070} |
| found_rows_1min             | {"queries":0, "avg":"-", "min":"-", "max":"-", "pct95":"-", "pct99":"-"}                                   |
| found_rows_5min             | {"queries":6, "avg":243, "min":0, "max":829, "pct95":243, "pct99":243}                                     |
| found_rows_15min            | {"queries":108, "avg":2285, "min":0, "max":17338, "pct95":17338, "pct99":17338}                            |
| found_rows_total            | {"queries":85123, "avg":11216, "min":0, "max":3366335, "pct95":12503, "pct99":381172}                      |
+-----------------------------+------------------------------------------------------------------------------------------------------------+

you need to issue optimize statement as query single plain index is faster than 122 disk chunks