Медленные запросы при поиске в большом индексе

У нас sql-запросы к индексу работают по пол-секунды и больше. Индекс около 35 млн строк (~20GB)
Примеры из query-лог:

2024-02-27 21:27:55	
/* Tue Feb 27 18:27:55.689 2024 conn 1160354 real 0.538 wall 0.539 found 1 */ SELECT COUNT(*) FROM retail_products WHERE product_id=2937526 AND city_id=213;
2024-02-27 21:27:39	
/* Tue Feb 27 18:27:39.072 2024 conn 1158218 real 0.508 wall 0.509 found 1 */ SELECT COUNT(*) FROM retail_products WHERE product_id=2941661 AND city_id=213;
2024-02-27 21:27:39	
/* Tue Feb 27 18:27:39.036 2024 conn 1158240 real 0.512 wall 0.513 found 1 */ SELECT COUNT(*) FROM retail_products WHERE product_id=2922830 AND city_id=213;
2024-02-27 21:27:37	
/* Tue Feb 27 18:27:37.459 2024 conn 1158060 real 0.539 wall 0.540 found 1 */ SELECT COUNT(*) FROM retail_products WHERE product_id=2854640 AND city_id=213;
2024-02-27 21:27:22	
/* Tue Feb 27 18:27:22.096 2024 conn 1156064 real 0.501 wall 0.502 found 1 */ SELECT COUNT(*) FROM retail_products WHERE product_id=2824657 AND city_id=213;

2024-02-27 22:44:17	
/* Tue Feb 27 19:44:16.870 2024 conn 1850127 real 0.520 wall 0.521 found 1 */ SELECT id,

    product_id,
    shop_id,
    city_id,
    shop_brand_id,

    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,
    price_discounted_with_loyalty_card,
    price_discount_percent_with_loyalty_card,

    availability_quantity,

    hall,
    rack,
    shelf,

    1000000000000 * literature_work_publishing_year + released_at as newness FROM retail_products WHERE product_id=2846808 AND city_id=213 LIMIT 1 OPTION max_matches=1;

2024-02-27 22:44:14	
/* Tue Feb 27 19:44:14.628 2024 conn 1849705 real 0.552 wall 0.552 found 1 */ SELECT id,

    product_id,
    shop_id,
    city_id,
    shop_brand_id,

    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,
    price_discounted_with_loyalty_card,
    price_discount_percent_with_loyalty_card,

    availability_quantity,

    hall,
    rack,
    shelf,

    1000000000000 * literature_work_publishing_year + released_at as newness FROM retail_products WHERE product_id=2661897 AND city_id=213 LIMIT 1 OPTION max_matches=1;

Результат count-запроса как правило от 0 до 80.
DDL по индексу:

CREATE TABLE retail_products ( id bigint, literature_work_publishing_year integer, rating_star integer, rating_weight integer, review_count integer, purchase_stats_day_avg_count integer, school_education_system_id integer, school_umk_id integer, school_material_type_id integer, school_exam_year_id integer, publisher_series_id integer, publisher_id integer, publisher_brand_id integer, manufacturer_id integer, manufacturer_brand_id integer, literature_work_cycle_id integer, literature_work_cycle_volume_id integer, binding_id integer, product_type_id integer, stationery_format_id integer, stationery_line_type_id integer, stationery_ink_color_id integer, stationery_ink_base_id integer, stationery_lead_diameter_id integer, stationery_case_shape_id integer, stationery_mechanism_type_id integer, stationery_diameter_id integer, stationery_feature_id integer, stationery_colors_quantity_id integer, stationery_gender_id integer, stationery_length_id integer, stationery_staple_number_id integer, stationery_stapler_number_id integer, stationery_material_id integer, stationery_punched_sheets_quantity_id integer, stationery_pen_thickness_id integer, stationery_pen_tip_shape_id integer, stationery_mounting_type_id integer, stationery_calculator_capacity_id integer, stationery_calendar_year_id integer, stationery_calendar_type_id integer, stationery_calendar_subject_id integer, stationery_clasp_type_id integer, stationery_compartments_quantity_id integer, stationery_cover_binding_id integer, stationery_cover_surface_id integer, stationery_package_type_id integer, stationery_universal_id integer, stationery_sheets_quantity_id integer, stationery_color_id integer, stationery_volume_id integer, game_audience_id integer, game_child_age_id integer, game_series_id integer, game_duration_id integer, constructor_nation_id integer, constructor_equipment_type_id integer, souvenir_format_id integer, souvenir_set_quantity_id integer, toy_type_id integer, toy_height_id integer, product_id integer, city_id integer, shop_brand_id integer, availability_status integer, availability_quantity integer, age_restriction bigint, shop_id bigint, preorder_available_at timestamp, released_at timestamp, updated_at timestamp, is_school_prepare bool, is_out_of_class_reading bool, rating_average float, price_retail float, price_discounted_for_guest float, price_discount_percent_for_guest float, price_discounted_for_authorized float, price_discount_percent_for_authorized float, price_discounted_for_subscriber float, price_discount_percent_for_subscriber float, price_discounted_with_loyalty_card float, price_discount_percent_with_loyalty_card float, image string attribute, hall string attribute, rack string attribute, shelf string attribute, marketing_status_list multi, school_grade_id_list multi, school_subject_id_list multi, school_exam_id_list multi, category_id_list multi, tbk_id_list multi, ekn_id_list multi, author_id_list multi, translator_id_list multi, illustrator_id_list multi, literature_work_cycle_id_list multi, tag_id_list multi, product_set_id_list multi, stationery_lead_hardness_id_list multi, stationery_brush_shape_id_list multi, stationery_brush_material_id_list multi, stationery_brush_number_id_list multi, stationery_painting_technique_id_list multi, comic_universal_id_list multi, comic_character_id_list multi, comic_genre_id_list multi, comic_series_id_list multi, comic_type_id_list multi, comic_line_id_list multi, comic_section_id_list multi, comic_subject_id_list multi, game_player_quantity_id_list multi, game_use_case_id_list multi, game_skill_id_list multi, constructor_detail_quantity_id_list multi, souvenir_reason_id_list multi, gift_hobby_id_list multi, gift_for_children_id_list multi, gift_new_year_id_list multi, gift_section_id_list multi, school_purpose_id integer, school_material_type_id_list multi, gift_books_on_interest_id_list multi )

show index retail_products status:

index_type  rt
indexed_documents  29851212
indexed_bytes  0
ram_bytes  19540946310
disk_bytes  25348391685
disk_mapped  19433808429
disk_mapped_cached  19435192320
disk_mapped_doclists  0
disk_mapped_cached_doclists  0
disk_mapped_hitlists  0
disk_mapped_cached_hitlists  0
killed_documents  177441
killed_rate  0.59%
ram_chunk  105338382
ram_chunk_segments_count  23
disk_chunks  98
mem_limit  134217728
mem_limit_rate  95.00%
ram_bytes_retired  0
tid  176653
tid_saved  176653
query_time_1min  {"queries":2928, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.002, "pct95_sec":0.001, "pct99_sec":0.001}
query_time_5min  {"queries":14502, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.002, "pct95_sec":0.001, "pct99_sec":0.001}
query_time_15min  {"queries":35656, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.142, "pct95_sec":0.001, "pct99_sec":0.002}
query_time_total  {"queries":447750, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.776, "pct95_sec":0.002, "pct99_sec":0.002}
found_rows_1min  {"queries":2928, "avg":6, "min":0, "max":331, "pct95":24, "pct99":39}
found_rows_5min  {"queries":14502, "avg":6, "min":0, "max":699, "pct95":22, "pct99":45}
found_rows_15min  {"queries":35656, "avg":5, "min":0, "max":705, "pct95":22, "pct99":62}
found_rows_total  {"queries":447750, "avg":4, "min":0, "max":706, "pct95":12, "pct99":81}

При этом, заметили в логах searchd записи optimized progressive chunk

2024-02-27 21:20:56	
[Tue Feb 27 18:20:56.093 2024] [1929] rt: table retail_products: optimized progressive chunk(s) 109 ( left 98 ) in 20m 29.6s
2024-02-27 19:47:13	
[Tue Feb 27 16:12:12.535 2024] [996889] rt: table retail_products: optimized progressive chunk(s) 136 ( left 99 ) in 35m 3.8s
2024-02-27 19:47:13	
[Tue Feb 27 15:35:04.623 2024] [996902] rt: table retail_products: optimized progressive chunk(s) 165 ( left 114 ) in 32m 34.1s
2024-02-27 19:47:13	
[Tue Feb 27 15:02:27.444 2024] [996870] rt: table retail_products: optimized progressive chunk(s) 325 ( left 128 ) in 1h 4.8m
2024-02-27 19:47:13	
[Tue Feb 27 13:57:32.195 2024] [996878] rt: table retail_products: optimized progressive chunk(s) 1974 ( left 178 ) in 7h 46.6m
  1. С чем могут быть связано долгое выполнение запросов?
  2. Какие есть приёмы/подходы, которые позволят быстро искать в manticore по 30 млн. записей с нашими условиями product_id=2922830 AND city_id=213?
  3. Что означают записи «optimized progressive chunk» в логах? они могут негативно влиять на производительность запросов?
  4. Является ли допустимым использование нами manticore в подобных сценариях и какой тип хранения лучше выбрать? (при 30 млн. записей и отсутствием match)

вам нужно использовать Secondary Indexes - поставить библиотеку MCL и переиндексировать данные чтобы создать SI

в это случае full-scan заменится на выборку нужного документа и не будет такой долгой работы запроса

Добавили библиотеку MCL, включили SI. Сделали реиндекс.
Сделали замеры, с SI=1 результаты медленней чем с SI=0. Почему такое может происходить?
manticore/write-only/issue-chg-20240229 - сюда добавили наш индекс, на котором проводили замеры.

Для тестов использовали mysqlslap
Замеры с включенным SI:

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.068 seconds
        Minimum number of seconds to run all queries: 0.056 seconds
        Maximum number of seconds to run all queries: 1.109 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra2.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.060 seconds
        Minimum number of seconds to run all queries: 0.043 seconds
        Maximum number of seconds to run all queries: 6.240 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra3.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.066 seconds
        Minimum number of seconds to run all queries: 0.053 seconds
        Maximum number of seconds to run all queries: 3.360 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra4.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.064 seconds
        Minimum number of seconds to run all queries: 0.050 seconds
        Maximum number of seconds to run all queries: 3.406 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra5.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.064 seconds
        Minimum number of seconds to run all queries: 0.052 seconds
        Maximum number of seconds to run all queries: 2.750 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

Замеры с SI=0:

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.078 seconds
        Minimum number of seconds to run all queries: 0.064 seconds
        Maximum number of seconds to run all queries: 1.071 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra2.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.043 seconds
        Minimum number of seconds to run all queries: 0.036 seconds
        Maximum number of seconds to run all queries: 0.090 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra3.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.079 seconds
        Minimum number of seconds to run all queries: 0.065 seconds
        Maximum number of seconds to run all queries: 0.152 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra4.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.063 seconds
        Minimum number of seconds to run all queries: 0.054 seconds
        Maximum number of seconds to run all queries: 0.161 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

[serhio@zeon libra]$ mysqlslap --concurrency=5 --iterations=1000 --query=./libra5.sql --delimiter=";" --host=10.9.2.132 -P 9306
Benchmark
        Average number of seconds to run all queries: 0.076 seconds
        Minimum number of seconds to run all queries: 0.062 seconds
        Maximum number of seconds to run all queries: 0.150 seconds
        Number of clients running queries: 5
        Average number of queries per client: 1

примеры запросов:

/*libra.sql*/
SELECT id,
    product_id,
    shop_id,
    city_id,
    shop_brand_id,
    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,
    price_discounted_with_loyalty_card,
    price_discount_percent_with_loyalty_card,
    availability_quantity,
    hall,
    rack,
    shelf,
    1000000000000 * literature_work_publishing_year + released_at as newness,
    id IN (2771904,2836205,2847786,2882276,2725930,2817104,2562833,2415394,2951931,2429492,2955104,3001056,2967330,2544501,2955103,2031012,2731949,2544502,2544665,2837826,2373075,2993020,2764390,2784356,2317468,3000360,2979982,2843014,2610892,2777502,2538093,2603892,2816605,2368717,2951932,2816603,2757521,2732145,2901600,2506013) as is_first_product
FROM retail_products
WHERE product_id IN (2031012,2317468,2368717,2373075,2415394,2429492,2506013,2538093,2544501,2544502,2544665,2562833,2603892,2610892,2725930,2731949,2732145,2757521,2764390,2771904,2777502,2784356,2816603,2816605,2817104,2836205,2837826,2843014,2847786,2882276,2901600,2951931,2951932,2955103,2955104,2967330,2979982,2993020,3000360,3001056) AND shop_id=805
ORDER BY is_first_product DESC
LIMIT 40
OPTION max_matches=40;

/*libra2.sql*/
SELECT id,
    product_id,
    shop_id,
    city_id,
    shop_brand_id,
    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,
    price_discounted_with_loyalty_card,
    price_discount_percent_with_loyalty_card,
    availability_quantity,
    hall,
    rack,
    shelf,
    1000000000000 * literature_work_publishing_year + released_at as newness,
    id IN (3004098,3003069,2925489,2963777) as is_first_product
FROM retail_products
WHERE product_id IN (2925489,2963777,3003069,3004098) AND shop_id=299
ORDER BY is_first_product DESC
LIMIT 40
OPTION max_matches=40;

/*libra3.sql*/
SELECT id,
    product_id,
    shop_id,
    city_id,
    shop_brand_id,
    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,
    price_discounted_with_loyalty_card,
    price_discount_percent_with_loyalty_card,
    availability_quantity,
    hall,
    rack,
    shelf,
    1000000000000 * literature_work_publishing_year + released_at as newness,
    id IN (2982762,2866210,2982763,2985354,2906532,2809515,2535336,3003210,2969902,2977099,2981123,2809152,2921824,2989303,2411504,2921387,2920458,2981125,2965264,2981335,2945261,2603048,2879706,2879707,2945259,2996951,2732115,2920955,2939708,2982764,2985340,2985341,3022400,2926906,2732091,2801723,2831976,2925697,2757635,2925696) as is_first_product
FROM retail_products
WHERE product_id IN (2411504,2535336,2603048,2732091,2732115,2757635,2801723,2809152,2809515,2831976,2866210,2879706,2879707,2906532,2920458,2920955,2921387,2921824,2925696,2925697,2926906,2939708,2945259,2945261,2965264,2969902,2977099,2981123,2981125,2981335,2982762,2982763,2982764,2985340,2985341,2985354,2989303,2996951,3003210,3022400) AND shop_id=336
ORDER BY is_first_product DESC
LIMIT 40
OPTION max_matches=40;

/*libra4.sql*/
SELECT id,
    product_id,
    shop_id,
    city_id,
    shop_brand_id,
    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,
    price_discounted_with_loyalty_card,
    price_discount_percent_with_loyalty_card,
    availability_quantity,
    hall,
    rack,
    shelf,
    1000000000000 * literature_work_publishing_year + released_at as newness,
    id IN (2735904,2701331,2702081,2900630,2735931,2719254,2735936,2749268,2876874,2702456,2701459,2824365,2814254,2735905,2884720,2765749,2827331,2834528,2735901,2735900,2837630,2719256,2726447,2682415,2766074,2717977,2797022,2749288,2818815,2718280,2828852,2717924,2897206,2898727,2735902,2735930,2891754,2718269,2808842,2666088) as is_first_product
FROM retail_products
WHERE product_id IN (2666088,2682415,2701331,2701459,2702081,2702456,2717924,2717977,2718269,2718280,2719254,2719256,2726447,2735900,2735901,2735902,2735904,2735905,2735930,2735931,2735936,2749268,2749288,2765749,2766074,2797022,2808842,2814254,2818815,2824365,2827331,2828852,2834528,2837630,2876874,2884720,2891754,2897206,2898727,2900630) AND shop_id=652
ORDER BY is_first_product DESC
LIMIT 40
OPTION max_matches=40;

/*libra5.sql*/
SELECT id,
    product_id,
    shop_id,
    city_id,
    shop_brand_id,
    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,
    price_discounted_with_loyalty_card,
    price_discount_percent_with_loyalty_card,
    availability_quantity,
    hall,
    rack,
    shelf,
    1000000000000 * literature_work_publishing_year + released_at as newness,
    id IN (3025211,2879010,2765935,3004171,2760259,2720539,2774408,2930232,2930401,2976817,2834802,2756511,2903704,2326504,2855365,2858163,2891807,2899049,2931377,2954337,2954871,2920365,2919416,2785906,2208950,2803479,2839480,2800842,2882503) as is_first_product
FROM retail_products
WHERE product_id IN (2208950,2326504,2720539,2756511,2760259,2765935,2774408,2785906,2800842,2803479,2834802,2839480,2855365,2858163,2879010,2882503,2891807,2899049,2903704,2919416,2920365,2930232,2930401,2931377,2954337,2954871,2976817,3004171,3025211) AND shop_id=233
ORDER BY is_first_product DESC
LIMIT 40
OPTION max_matches=40;

надо взять один запрос который не ускорился \ замедлился и прогнать его как

set profiling=1;
query;
show profile;

и выложить счетчики от запроса без SI и запроса с SI

и так же посмотреть вывод meta чтобы убедится какие index использовались при выполнении запроса

query;
show meta;

“Maximum number of seconds to run all queries” не медленнее. Попробуйте увеличить iterations. И если есть возможность, посмотрите на 95/99 персентили. Это обычно более объективная метрика, чем min/avg/max.

поставили в проде SI=1, пока наблюдаем,
Из примечательного, изменился характер метрики CPU system time. Это ожидаемо?

при использовании SI происходит больше чтений из файла, при скане больше чтения памяти - разный патерн выполнения запроса, разный тип нагрузки на ноду