Infrastructure
Manticore version:
Server version: 7.0.0 92c650401@25013002 (columnar 4.0.0 5aa8e43@25012409) (secondary 4.0.0 5aa8e43@25012409) (knn 4.0.0 5aa8e43@25012409) git branch manticore-7.0.0...origin/manticore-7.0.0
Servers:
indexer - builds the indexes and ships them to the search servers
proxy - Manticore with the agents of the distributed indexes configured
search nodes - search servers holding the plain indexes
PostgreSQL - data source for the indexes
Index configuration
There is a products index in a main + delta setup with 16 chunks, i.e. 32 indexes: 16 main and 16 delta. Each delta belongs to its own main.
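As a rough illustration of the chunk layout (assuming that chunk N uses the filter (id % 16) = N, by analogy with the (id % 16) = 0 filter in the chunk-0 sources below; the helper names are hypothetical), a document ID maps to its main/delta pair like this:

```python
CHUNKS = 16  # number of main/delta pairs in the setup described above


def chunk_of(doc_id: int) -> int:
    """Chunk that owns a document, mirroring the SQL filter (id % 16) = N."""
    return doc_id % CHUNKS


def indexes_for(doc_id: int) -> tuple[str, str]:
    """Hypothetical helper: names of the main and delta indexes that hold doc_id."""
    n = chunk_of(doc_id)
    return f"products_main_{n}", f"products_delta_{n}"
```

Because each delta source (including its kill-list query) filters on the same remainder as its main, a kill-list entry from products_delta_N should only refer to documents of products_main_N.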
Source configuration for chunk 0 (products_main_0 and products_delta_0):
source products_main_0
{
type = pgsql
sql_host = ***
sql_user = ***
sql_pass = ***
sql_db = ***
sql_port = ***
sql_query_pre = \
INSERT INTO manticore_indexer (index_name, indexing_started_at) \
VALUES ('products_main_0', NOW()) \
ON CONFLICT (index_name) DO UPDATE SET indexing_started_at = EXCLUDED.indexing_started_at
sql_query = \
SELECT id, name, seo_slug as slug, main_image as image, price_retail, price_retail_min, marketing_statuses as marketing_status_list, \
availability_status, availability_quantity, vendor_id, vat as nds, description, selling_text, \
reasons_to_buy as reasons_to_buy_list, literature_work_publishing_year, EXTRACT(EPOCH FROM preorder_available_at) AS preorder_available_at, EXTRACT(EPOCH FROM released_at) AS released_at, \
EXTRACT(EPOCH FROM created_at) AS created_at, EXTRACT(EPOCH FROM updated_at) AS updated_at, isbns, printing_page_count, printing_page_format, printing_copy_count, weight, height, width, length, \
excerpts, additional_images, rating_average, rating_weight, rating_star, rating_count, review_count, \
sales_stat_half_year_day_avg_quantity as purchase_stats_day_avg_count, \
sales_stat_year_turnover_amount_without_vat, sales_stat_year_turnover_quantity, sales_stat_year_markup, \
sales_stat_half_year_turnover_amount_without_vat, sales_stat_half_year_turnover_quantity, sales_stat_half_year_markup, \
sales_stat_quarter_turnover_amount_without_vat, sales_stat_quarter_turnover_quantity, sales_stat_quarter_markup, \
sales_stat_month_turnover_amount_without_vat, sales_stat_month_turnover_quantity, sales_stat_month_markup, \
school_grade_id_list, school_subject_id_list, school_material_type_id_list, school_exam_id_list, \
school_education_system_id_list, school_umk_id_list, school_umk_title_list as school_umk_title, school_exam_year_id_list, \
school_purpose_id_list, seo_title, seo_description, main_category_id_list, category_id_list, tbk_id_list, ekn_id_list, \
author_id_list, translator_id_list, illustrator_id_list, publisher_series_id, publisher_id, publisher_brand_id, \
manufacturer_brand_id, literature_work_cycle_id, literature_work_cycle_volume_id, age_restriction, \
printing_binding_id as binding_id, tag_id_list, product_set_id_list, author_full_name_list, publisher_series_name, \
publisher_name, publisher_brand_name, manufacturer_brand_name, product_type_id, article_number_id, \
stationery_brush_shape_id_list, stationery_brush_material_id_list, stationery_brush_number_id_list, \
stationery_painting_technique_id_list, stationery_lead_hardness_id_list, stationery_format_id_list, \
stationery_line_type_id_list, stationery_ink_color_id_list, stationery_lead_diameter_id_list, \
stationery_case_shape_id_list, stationery_mechanism_type_id_list, stationery_diameter_id_list, \
stationery_feature_id_list, stationery_colors_quantity_id_list, stationery_gender_id_list, stationery_length_id_list, \
stationery_staple_number_id_list, stationery_stapler_number_id_list, stationery_material_id_list, \
stationery_punched_sheets_quantity_id_list, stationery_pen_thickness_id_list, stationery_mounting_type_id_list, \
stationery_pen_tip_shape_id_list, stationery_ink_base_id_list, \
stationery_calculator_capacity_id_list, stationery_calendar_year_id_list, stationery_calendar_type_id_list, \
stationery_calendar_subject_id_list, stationery_clasp_type_id_list, stationery_compartments_quantity_id_list, \
stationery_cover_binding_id_list, stationery_package_type_id_list, stationery_cover_surface_id_list, \
stationery_universal_id_list, stationery_sheets_quantity_id_list, stationery_color_id_list, stationery_volume_id_list, \
comic_universal_id_list, comic_character_id_list, comic_genre_id_list, comic_series_id_list, comic_type_id_list, \
comic_line_id_list, comic_section_id_list, comic_subject_id_list, game_player_quantity_id_list, \
game_use_case_id_list, game_skill_id_list, game_audience_id_list, game_child_age_id_list, game_series_id_list, \
game_duration_id_list, constructor_detail_quantity_id_list, constructor_nation_id_list, \
constructor_equipment_type_id_list, souvenir_reason_id_list, souvenir_format_id_list, souvenir_set_quantity_id_list, \
toy_type_id_list, toy_height_id_list, gift_hobby_id_list, gift_for_children_idea_id_list as gift_for_children_id_list, \
gift_for_new_year_idea_id_list as gift_new_year_id_list, gift_section_id_list, gift_books_on_interest_id_list, \
product_collection_id_list, active_shops_id_list, active_cities_id_list, active_shop_brands_id_list, all_shops_id_list, \
all_cities_id_list, all_shop_brands_id_list, videos, attended_foreign_agents, \
EXTRACT(EPOCH FROM synchronized_at) AS synchronized_at \
FROM product \
WHERE is_removed = 'f' AND id >= $start AND id <= $end AND (id % 16) = 0
sql_query_range = \
SELECT MIN(id), MAX(id) FROM product
sql_range_step = 50000
sql_query_post_index = \
INSERT INTO manticore_indexer (index_name, indexing_ended_at) \
VALUES ('products_main_0', (SELECT indexing_started_at FROM manticore_indexer WHERE index_name='products_main_0')) \
ON CONFLICT (index_name) DO UPDATE SET indexing_ended_at = EXCLUDED.indexing_ended_at
# attributes
sql_attr_string = name
sql_attr_string = slug
sql_attr_string = description
...
sql_attr_multi = uint all_shop_brands_id_list from field
# attributes end
}
source products_delta_0: products_main_0
{
sql_query_pre = \
INSERT INTO manticore_indexer (index_name, indexing_started_at) \
VALUES ('products_delta_0', NOW()) \
ON CONFLICT (index_name) DO UPDATE SET indexing_started_at = EXCLUDED.indexing_started_at
sql_query = \
SELECT id, name, seo_slug as slug, main_image as image, price_retail, price_retail_min, marketing_statuses as marketing_status_list, \
availability_status, availability_quantity, vendor_id, vat as nds, description, selling_text, \
reasons_to_buy as reasons_to_buy_list, literature_work_publishing_year, EXTRACT(EPOCH FROM preorder_available_at) AS preorder_available_at, EXTRACT(EPOCH FROM released_at) AS released_at, \
EXTRACT(EPOCH FROM created_at) AS created_at, EXTRACT(EPOCH FROM updated_at) AS updated_at, isbns, printing_page_count, printing_page_format, printing_copy_count, weight, height, width, length, \
excerpts, additional_images, rating_average, rating_weight, rating_star, rating_count, review_count, \
sales_stat_half_year_day_avg_quantity as purchase_stats_day_avg_count, \
sales_stat_year_turnover_amount_without_vat, sales_stat_year_turnover_quantity, sales_stat_year_markup, \
sales_stat_half_year_turnover_amount_without_vat, sales_stat_half_year_turnover_quantity, sales_stat_half_year_markup, \
sales_stat_quarter_turnover_amount_without_vat, sales_stat_quarter_turnover_quantity, sales_stat_quarter_markup, \
sales_stat_month_turnover_amount_without_vat, sales_stat_month_turnover_quantity, sales_stat_month_markup, \
school_grade_id_list, school_subject_id_list, school_material_type_id_list, school_exam_id_list, \
school_education_system_id_list, school_umk_id_list, school_umk_title_list as school_umk_title, school_exam_year_id_list, \
school_purpose_id_list, seo_title, seo_description, main_category_id_list, category_id_list, tbk_id_list, ekn_id_list, \
author_id_list, translator_id_list, illustrator_id_list, publisher_series_id, publisher_id, publisher_brand_id, \
manufacturer_brand_id, literature_work_cycle_id, literature_work_cycle_volume_id, age_restriction, \
printing_binding_id as binding_id, tag_id_list, product_set_id_list, author_full_name_list, publisher_series_name, \
publisher_name, publisher_brand_name, manufacturer_brand_name, product_type_id, article_number_id, \
stationery_brush_shape_id_list, stationery_brush_material_id_list, stationery_brush_number_id_list, \
stationery_painting_technique_id_list, stationery_lead_hardness_id_list, stationery_format_id_list, \
stationery_line_type_id_list, stationery_ink_color_id_list, stationery_lead_diameter_id_list, \
stationery_case_shape_id_list, stationery_mechanism_type_id_list, stationery_diameter_id_list, \
stationery_feature_id_list, stationery_colors_quantity_id_list, stationery_gender_id_list, stationery_length_id_list, \
stationery_staple_number_id_list, stationery_stapler_number_id_list, stationery_material_id_list, \
stationery_punched_sheets_quantity_id_list, stationery_pen_thickness_id_list, stationery_mounting_type_id_list, \
stationery_pen_tip_shape_id_list, stationery_ink_base_id_list, \
stationery_calculator_capacity_id_list, stationery_calendar_year_id_list, stationery_calendar_type_id_list, \
stationery_calendar_subject_id_list, stationery_clasp_type_id_list, stationery_compartments_quantity_id_list, \
stationery_cover_binding_id_list, stationery_package_type_id_list, stationery_cover_surface_id_list, \
stationery_universal_id_list, stationery_sheets_quantity_id_list, stationery_color_id_list, stationery_volume_id_list, \
comic_universal_id_list, comic_character_id_list, comic_genre_id_list, comic_series_id_list, comic_type_id_list, \
comic_line_id_list, comic_section_id_list, comic_subject_id_list, game_player_quantity_id_list, \
game_use_case_id_list, game_skill_id_list, game_audience_id_list, game_child_age_id_list, game_series_id_list, \
game_duration_id_list, constructor_detail_quantity_id_list, constructor_nation_id_list, \
constructor_equipment_type_id_list, souvenir_reason_id_list, souvenir_format_id_list, souvenir_set_quantity_id_list, \
toy_type_id_list, toy_height_id_list, gift_hobby_id_list, gift_for_children_idea_id_list as gift_for_children_id_list, \
gift_for_new_year_idea_id_list as gift_new_year_id_list, gift_section_id_list, gift_books_on_interest_id_list, \
product_collection_id_list, active_shops_id_list, active_cities_id_list, active_shop_brands_id_list, all_shops_id_list, \
all_cities_id_list, all_shop_brands_id_list, videos, attended_foreign_agents, \
EXTRACT(EPOCH FROM synchronized_at) AS synchronized_at \
FROM product \
WHERE is_removed = 'f' AND changed_at_timestamp >= $start AND changed_at_timestamp <= $end AND (id % 16) = 0
sql_query_range = \
SELECT (SELECT EXTRACT(EPOCH FROM data_relevant_to AT TIME ZONE 'Europe/Moscow') FROM manticore_indexer WHERE index_name='products_main_0') min, \
(SELECT EXTRACT(EPOCH FROM indexing_started_at AT TIME ZONE 'Europe/Moscow') FROM manticore_indexer WHERE index_name='products_delta_0') max
sql_range_step = 1000
sql_query_post_index = \
INSERT INTO manticore_indexer (index_name, data_relevant_to) \
VALUES ('products_delta_0', (SELECT indexing_started_at FROM manticore_indexer WHERE index_name='products_delta_0')) \
ON CONFLICT (index_name) DO UPDATE SET data_relevant_to = EXCLUDED.data_relevant_to
sql_query_killlist = \
SELECT id FROM product \
WHERE changed_at_timestamp >= (SELECT EXTRACT(EPOCH FROM data_relevant_to AT TIME ZONE 'Europe/Moscow') FROM manticore_indexer WHERE index_name='products_main_0')::int \
AND changed_at_timestamp <= (SELECT EXTRACT(EPOCH FROM indexing_started_at AT TIME ZONE 'Europe/Moscow') FROM manticore_indexer WHERE index_name='products_delta_0')::int AND (id % 16) = 0
}
The index is built in parts (chunks), and each chunk holds its own subset of documents (selected by id % 16).
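The sources fetch rows in windows: sql_query_range returns MIN/MAX, and the indexer substitutes consecutive $start/$end pairs sql_range_step apart. A minimal sketch of that iteration, assuming inclusive, non-overlapping windows with the last one clipped to MAX (the exact boundary handling should be checked against the Manticore docs):

```python
from typing import Iterator


def range_windows(lo: int, hi: int, step: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive ($start, $end) pairs covering [lo, hi] in step-sized windows."""
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)  # assumption: last window is clipped to MAX
        yield start, end
        start = end + 1
```

With sql_range_step = 50000, each main chunk issues one sql_query per window over id; the delta sources use the same mechanism over changed_at_timestamp with a step of 1000.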
Indexing is started by cron:
#Ansible: index-products
0 */1 * * * …
#Ansible: index-products-delta
*/1 * * * * …
main is reindexed every hour (at minute 0), and delta every minute.
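The two crontab entries (0 */1 * * * for main, */1 * * * * for delta) mean that both jobs fire together at minute 0 of every hour. A tiny sketch of which jobs start at a given minute (the job names are illustrative):

```python
def jobs_starting_at(minute: int) -> list[str]:
    """Jobs triggered by the two crontab entries at a given minute of the hour (0-59)."""
    jobs = []
    if minute == 0:       # "0 */1 * * *": minute 0 of every hour
        jobs.append("main")
    jobs.append("delta")  # "*/1 * * * *": every minute
    return jobs
```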
The product table in the PostgreSQL source uses these fields:
changed_at_timestamp
- timestamp of the record's last change
is_removed
- deletion flag: t = deleted, f = not deleted.
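Taken together, the delta sql_query for a chunk selects rows that changed inside the window and are not removed, while sql_query_killlist selects every row that changed inside the window, removed or not. A minimal sketch of the two sets, using a hypothetical in-memory Row in place of the product table:

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical stand-in for a row of the product table."""
    id: int
    changed_at_timestamp: int
    is_removed: bool


def delta_sets(rows, lo, hi, chunk, chunks=16):
    """Return (delta document IDs, kill-list IDs) for one chunk and window [lo, hi]."""
    in_window = [r for r in rows
                 if lo <= r.changed_at_timestamp <= hi and r.id % chunks == chunk]
    docs = {r.id for r in in_window if not r.is_removed}  # sql_query: is_removed = 'f'
    kill = {r.id for r in in_window}                      # sql_query_killlist: no is_removed filter
    return docs, kill
```

In this sketch the kill-list can be larger than the delta (removed rows) but never smaller.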
Indexing and delivery process
Roughly, the whole process consists of these stages:
- cron starts the jobs for the indexes
- indexing
- syncing to the search nodes
- preparing and running the rotation
Indexing runs in parallel, with a limit on the number of chunks processed at once.
Syncing runs in parallel to all search nodes.
Rotation is started in parallel on all search nodes.
If a delta run is in progress, the rotation-preparation stage checks whether main has already been rotated, and the delta is skipped if main was rotated within the same interval (one minute).
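The skip rule can be expressed as a check on whether main was rotated in the same one-minute interval. A sketch, assuming both moments are Unix timestamps in seconds and that "interval" means the same wall-clock minute (should_skip_delta is a hypothetical name):

```python
from typing import Optional


def should_skip_delta(main_rotated_at: Optional[int], now: int) -> bool:
    """Skip delta if main was rotated in the same clock minute as now (assumed semantics)."""
    if main_rotated_at is None:  # main has not been rotated yet
        return False
    return main_rotated_at // 60 == now // 60
```

If the real check uses a sliding 60-second window instead of the clock minute, the comparison would be now - main_rotated_at < 60.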
Problem
The screenshot shows a graph of the document-count metric for the index (main + delta) on one of the search nodes (the result is identical on all of them).
Timeline of events:
Deltas 1, 2 and 3 run at minutes 0, 1 and 2 of the hour, and everything is fine.
They are indexed with a range starting from the previous full main indexing and are applied to the main index that is still current.
Then, near the end of minute 2, the new main is rotated in, and the document count is correct: it matches the source.
Until a full main indexing happens, the document count in the index is wrong. This is the problem.
Next, at minute 3, a new delta runs (with a new range, from the start of the hour up to the start of this delta) and is applied to the new main.
And this is where something strange happens: a large number of documents in the main index get suppressed, which should not happen!
This reproduces consistently, always in the same sequence: the first delta after a full main indexing kills a large number of documents, and the number varies from run to run.
The delta itself is indexed correctly: its document count and its kill-list grow from the last full indexing until the start of the new delta.
Below are the metrics for the number of documents and the number of kill-list entries in each delta.
The last delta applied before the new main:
products_delta_0 start_time: 1751540401 end_time: 1751544121 count: 584 kill_list_count: 2035
products_delta_1 start_time: 1751540401 end_time: 1751544121 count: 614 kill_list_count: 2052
products_delta_2 start_time: 1751540401 end_time: 1751544121 count: 608 kill_list_count: 2006
products_delta_3 start_time: 1751540401 end_time: 1751544121 count: 608 kill_list_count: 2078
products_delta_4 start_time: 1751540440 end_time: 1751544123 count: 606 kill_list_count: 1887
products_delta_5 start_time: 1751540440 end_time: 1751544123 count: 583 kill_list_count: 1888
products_delta_6 start_time: 1751540440 end_time: 1751544123 count: 584 kill_list_count: 1954
products_delta_7 start_time: 1751540440 end_time: 1751544123 count: 607 kill_list_count: 1896
products_delta_8 start_time: 1751540464 end_time: 1751544124 count: 610 kill_list_count: 1776
products_delta_9 start_time: 1751540465 end_time: 1751544124 count: 602 kill_list_count: 1799
products_delta_10 start_time: 1751540465 end_time: 1751544124 count: 616 kill_list_count: 1759
products_delta_11 start_time: 1751540465 end_time: 1751544124 count: 598 kill_list_count: 1733
products_delta_12 start_time: 1751540486 end_time: 1751544126 count: 560 kill_list_count: 1655
products_delta_13 start_time: 1751540486 end_time: 1751544126 count: 550 kill_list_count: 1669
products_delta_14 start_time: 1751540487 end_time: 1751544126 count: 578 kill_list_count: 1649
products_delta_15 start_time: 1751540487 end_time: 1751544126 count: 628 kill_list_count: 1611
Total: 9536 documents in the delta indexes, 29447 entries in the kill-lists. As a rule, the counts match.
And the actual number of documents for each delta index:
collected 608
collected 608
collected 614
collected 584
collected 606
collected 585
collected 583
collected 608
collected 602
collected 610
collected 616
collected 598
collected 550
collected 560
collected 578
collected 628
Total: 9538.
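The totals can be recomputed from the metric lines with a short parser; the regular expression assumes exactly the line format shown above:

```python
import re

LINE_RE = re.compile(
    r"(?P<index>\S+) start_time: (?P<start>\d+) end_time: (?P<end>\d+) "
    r"count: (?P<count>\d+) kill_list_count: (?P<kill>\d+)"
)


def totals(text: str) -> tuple[int, int]:
    """Sum count and kill_list_count over all metric lines in text."""
    docs = kills = 0
    for m in LINE_RE.finditer(text):
        docs += int(m["count"])
        kills += int(m["kill"])
    return docs, kills
```

Run over the 16 metric lines of the last pre-rotation delta, it gives 9536 documents and 29447 kill-list entries; over the post-rotation lines, 442 and 442.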
The first delta applied after the new main:
products_delta_0 start_time: 1751544001 end_time: 1751544182 count: 30 kill_list_count: 30
products_delta_1 start_time: 1751544001 end_time: 1751544182 count: 35 kill_list_count: 35
products_delta_2 start_time: 1751544001 end_time: 1751544182 count: 36 kill_list_count: 36
products_delta_3 start_time: 1751544001 end_time: 1751544182 count: 33 kill_list_count: 33
products_delta_4 start_time: 1751544046 end_time: 1751544183 count: 25 kill_list_count: 25
products_delta_5 start_time: 1751544046 end_time: 1751544183 count: 30 kill_list_count: 30
products_delta_6 start_time: 1751544046 end_time: 1751544183 count: 24 kill_list_count: 24
products_delta_7 start_time: 1751544046 end_time: 1751544183 count: 33 kill_list_count: 33
products_delta_8 start_time: 1751544071 end_time: 1751544184 count: 20 kill_list_count: 20
products_delta_9 start_time: 1751544071 end_time: 1751544184 count: 23 kill_list_count: 23
products_delta_10 start_time: 1751544071 end_time: 1751544184 count: 27 kill_list_count: 27
products_delta_11 start_time: 1751544071 end_time: 1751544184 count: 23 kill_list_count: 23
products_delta_12 start_time: 1751544091 end_time: 1751544185 count: 29 kill_list_count: 29
products_delta_13 start_time: 1751544091 end_time: 1751544185 count: 29 kill_list_count: 29
products_delta_14 start_time: 1751544091 end_time: 1751544185 count: 26 kill_list_count: 26
products_delta_15 start_time: 1751544091 end_time: 1751544185 count: 19 kill_list_count: 19
Total: 442 documents in the delta indexes, 442 entries in the kill-lists.
And the actual number of documents for each delta index:
collected 36
collected 33
collected 30
collected 35
collected 24
collected 25
collected 33
collected 30
collected 20
collected 27
collected 23
collected 23
collected 29
collected 19
collected 29
collected 26
Total: 442
So both the number of documents in the delta indexes and the number of kill-list entries are correct.
But it is unclear why the new delta suppresses 25K documents in the new main index.
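The expectation can be stated as set arithmetic: after a delta is applied, the visible documents are (main - kill_list) | delta, so a kill-list of k IDs can hide at most k main documents. A sketch with toy sets (not real data):

```python
def visible_docs(main: set[int], delta: set[int], kill_list: set[int]) -> set[int]:
    """Documents visible after the delta's kill-list suppresses matching main documents."""
    return (main - kill_list) | delta


def max_suppressed(main: set[int], kill_list: set[int]) -> int:
    """Upper bound on how many main documents a kill-list can hide."""
    return len(main & kill_list)
```

With the 442-entry kill-lists of the first post-rotation delta, at most 442 main documents should disappear, far fewer than the roughly 25K observed. For comparison, the kill-lists of the last pre-rotation delta held 29447 IDs in total.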
Moreover, there is no overlap, race condition or anything similar in the process itself.
Rotation is selective and only covers the indexes that need it.
We have been racking our brains over this, but it is still unclear what the problem is and why Manticore behaves this way: why is the wrong kill-list applied, suppressing a large number of documents?
searchd.log for this interval is attached.