Manticore 6.3.0 FATAL: CRASH

Good day!

Version:

manticore-backup/jammy,now 1.3.8-24052208-57fc406 all [installed,automatic]
manticore-buddy/jammy,now 2.3.10-24052208-7612a4f all [installed,automatic]
manticore-columnar-lib/jammy,now 2.3.0-24052206-88a01c3 amd64 [installed]
manticore-common/jammy,now 6.3.0-24052209-1811a9efb all [installed,automatic]
manticore-dev/jammy,now 6.3.0-24052209-1811a9efb all [installed,automatic]
manticore-executor/jammy,now 1.1.6-24052206-c55bc2b amd64 [installed]
manticore-galera/jammy,now 3.37 amd64 [installed]
manticore-icudata-65l/jammy,now 5.0.3-221123-d2d9e5e56 all [installed,automatic]
manticore-repo/now 0.0.4 all [installed,local]
manticore-server-core/jammy,now 6.3.0-24052209-1811a9efb amd64 [installed,automatic]
manticore-server/jammy,now 6.3.0-24052209-1811a9efb amd64 [installed,automatic]
manticore-tools/jammy,now 6.3.0-24052209-1811a9efb amd64 [installed,automatic]
manticore-tzdata/jammy,now 1.0.0-240522-a8aa66e all [installed,automatic]
manticore/jammy,now 6.3.0-24052209-1811a9efb amd64 [installed]

Single node, not a cluster.

It has started crashing consistently; the log already contains almost 1300 FATAL: CRASH entries.

Log from searchd.log:
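For anyone who wants to quantify crashes like this, here is a minimal Python sketch that counts crash dumps in a searchd.log and groups them by the index named in each dump. It assumes the dump markers look exactly like the ones in the log excerpt in this post (`------- FATAL: CRASH DUMP -------`, `--- local index:<name>`, `------- CRASH DUMP END -------`). The function name `summarize_crashes` is hypothetical and not part of any Manticore tooling.

```python
import re
from collections import Counter

# Markers copied from the crash dump format shown in this thread (assumed stable).
CRASH_START = "------- FATAL: CRASH DUMP -------"
CRASH_END = "------- CRASH DUMP END -------"
INDEX_RE = re.compile(r"^--- local index:(\S+)")


def summarize_crashes(lines):
    """Return (number of crash dumps, Counter of index names seen inside dumps)."""
    total = 0
    per_index = Counter()
    in_dump = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(CRASH_START):
            total += 1
            in_dump = True
        elif line.startswith(CRASH_END):
            in_dump = False
        elif in_dump:
            match = INDEX_RE.match(line)
            if match:
                per_index[match.group(1)] += 1
    return total, per_index
```

To use it, open the log file (the path depends on your setup) and pass the file object, for example `summarize_crashes(open(path, encoding="utf-8", errors="replace"))`. If nearly all dumps name the same index, that points to a query or data issue on that index rather than a general instability.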

[Tue Jun 25 09:46:45.056 2024] [2543224] accepting connections
[Tue Jun 25 09:46:45.113 2024] [2543228] [BUDDY] started v2.3.10 '/usr/share/manticore/modules/manticore-buddy/bin/manticore-buddy --listen=http://10.9.2.99:9312 --bind=127.0.0.1 --disable-telemetry --threads=5 --skip=manticoresoftware/buddy-plugin-sharding --skip=manticoresoftware/buddy-plugin-queue' at http://127.0.0.1:44099
[Tue Jun 25 09:46:45.113 2024] [2543228] [BUDDY] Loaded plugins:
[Tue Jun 25 09:46:45.113 2024] [2543228] [BUDDY]   core: empty-string, backup, emulate-elastic, create, insert, alias, select, show, cli-table, plugin, test, alter-distributed-table, alter-rename-table, modify-table, knn, replace
[Tue Jun 25 09:46:45.113 2024] [2543228] [BUDDY]   local: 
[Tue Jun 25 09:46:45.113 2024] [2543228] [BUDDY]   extra: 
------- FATAL: CRASH DUMP -------
[Tue Jun 25 09:47:44.622 2024] [2543224]

--- crashed SphinxQL request dump ---
SELECT
    id,
    name,
    slug,
    image,

    price_retail,
    price_discounted_for_guest,
    price_discount_percent_for_guest,
    price_discounted_for_authorized,
    price_discount_percent_for_authorized,
    price_discounted_for_subscriber,
    price_discount_percent_for_subscriber,

    marketing_status_list,

    IF (
        availability_status != 40 AND availability_status != 50 AND GREATEST(shop_id_list) > 0,
        20,
        availability_status
    ) as availability_status,
    availability_status as availability_status_online,
    availability_quantity,
    GREATEST(shop_id_list) > 0 as is_available_offline,

    IN(city_id_list, -1) as is_available_in_customer_city,
    IN(shop_id_list, -1) as is_available_in_customer_shop,
    IF (
        availability_status IN (40, 50),
        100,
        IF (
            IN(shop_id_list, -1),
            35,
            IF (
                IN(city_id_list, -1),
                30,
                IF (
                    GREATEST(shop_id_list) > 0,
                    20,
                    availability_status
                )
            )
        )
    ) as availability,  /* status for sorting */

    vendor_id,
    nds,
    main_category_id_list,
    author_id_list,
    product_type_id,
    article_number_id,
    product_collection_id_list,

    1000000000000 * literature_work_publishing_year + released_at as newness

    , publisher_id, publisher_series_id, tbk_id_list, weight, height, width, length, tbk_id_list

FROM products
WHERE ALL(tbk_id_list) NOT IN (32, 30) AND ANY(shop_id_list) IN (0) AND availability_status = 50
ORDER BY availability DESC, purchase_stats_day_avg_count DESC, newness DESC, price_retail ASC
LIMIT 40
OFFSET 0
OPTION
    max_matches = 40
--- request dump end ---
--- local index:products
Manticore 6.3.0 1811a9efb@24052209 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 16.0.6
Configured with flags: Configured with these definitions: -DDISTR_BUILD=jammy -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.21 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (jammy) (cross-compiled)
Stack bottom = 0x7f7c440c3ff0, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x20000)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x20000, stack=0x7f7c440c0000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x227)[0x558bb4ac9057]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x364)[0x558bb493f224]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f82c27f8520]
/usr/bin/searchd(_ZNK14Expr_MVAAggr_cIjE7IntEvalERK9CSphMatch+0x91)[0x558bb4b60671]
/usr/bin/searchd(_ZNK12Expr_GtInt_c7IntEvalERK9CSphMatch+0x14)[0x558bb4b4b544]
/usr/bin/searchd(_ZNK13Expr_AndInt_c7IntEvalERK9CSphMatch+0x25)[0x558bb4b4cdb5]
/usr/bin/searchd(_ZNK9Expr_If_c7IntEvalERK9CSphMatch+0x14)[0x558bb4b46634]
/usr/bin/searchd(_ZNK16CSphQueryContext10CalcFilterER9CSphMatch+0xa0)[0x558bb4a81a20]
/usr/bin/searchd(+0xf6bc1c)[0x558bb4a17c1c]
/usr/bin/searchd(_ZNK13CSphIndex_VLN18RunFullscanOnAttrsERK17RowIdBoundaries_tRK16CSphQueryContextR19CSphQueryResultMetaRK11VecTraits_TIP15ISphMatchSorterER9CSphMatchibil+0x1846)[0x558bb49e2476]
/usr/bin/searchd(_ZNK13CSphIndex_VLN12ScanByBlocksILb0EEEbRK16CSphQueryContextR19CSphQueryResultMetaRK11VecTraits_TIP15ISphMatchSorterER9CSphMatchibilPK17RowIdBoundaries_t+0xc7)[0x558bb4a81fb7]
/usr/bin/searchd(_ZNK13CSphIndex_VLN9MultiScanER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgsl+0xb02)[0x558bb49e9fc2]
/usr/bin/searchd(_ZNK13CSphIndex_VLN10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0x8d7)[0x558bb49f3f07]
/usr/bin/searchd(+0x1121409)[0x558bb4bcd409]
/usr/bin/searchd(+0x124216d)[0x558bb4cee16d]
/usr/bin/searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x24f)[0x558bb5ae0fdf]
/usr/bin/searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0xedf)[0x558bb4bbc53f]
/usr/bin/searchd(_ZNK13CSphIndexStub12MultiQueryExEiPK9CSphQueryP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x70)[0x558bb4a98910]
/usr/bin/searchd(+0xef8ea6)[0x558bb49a4ea6]
/usr/bin/searchd(+0x124216d)[0x558bb4cee16d]
/usr/bin/searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x7a)[0x558bb5ae0e0a]
/usr/bin/searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0x741)[0x558bb4954111]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii+0x522)[0x558bb4955642]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv+0xa8)[0x558bb49525f8]
/usr/bin/searchd(_Z17HandleMysqlSelectR11RowBuffer_iR15SearchHandler_c+0x151)[0x558bb4978da1]
/usr/bin/searchd(_ZN15ClientSession_c7ExecuteESt4pairIPKciER11RowBuffer_i+0x17dc)[0x558bb4989adc]
/usr/bin/searchd(_Z8SqlServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x115c)[0x558bb48a749c]
/usr/bin/searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x43)[0x558bb48a3033]
/usr/bin/searchd(+0xdf7bbf)[0x558bb48a3bbf]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x558bb5ae3c1c]
/usr/bin/searchd(make_fcontext+0x37)[0x558bb5b25b17]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007F82C27F8520 in /lib/x86_64-linux-gnu/libc.so.6
 3# Expr_MVAAggr_c<unsigned int>::IntEval(CSphMatch const&) const in /usr/bin/searchd
 4# Expr_GtInt_c::IntEval(CSphMatch const&) const in /usr/bin/searchd
 5# Expr_AndInt_c::IntEval(CSphMatch const&) const in /usr/bin/searchd
 6# Expr_If_c::IntEval(CSphMatch const&) const in /usr/bin/searchd
 7# CSphQueryContext::CalcFilter(CSphMatch&) const in /usr/bin/searchd
 8# 0x0000558BB4A17C1C in /usr/bin/searchd
 9# CSphIndex_VLN::RunFullscanOnAttrs(RowIdBoundaries_t const&, CSphQueryContext const&, CSphQueryResultMeta&, VecTraits_T<ISphMatchSorter*> const&, CSphMatch&, int, bool, int, long) const in /usr/bin/searchd
10# bool CSphIndex_VLN::ScanByBlocks<false>(CSphQueryContext const&, CSphQueryResultMeta&, VecTraits_T<ISphMatchSorter*> const&, CSphMatch&, int, bool, int, long, RowIdBoundaries_t const*) const in /usr/bin/searchd
11# CSphIndex_VLN::MultiScan(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&, long) const in /usr/bin/searchd
12# CSphIndex_VLN::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in /usr/bin/searchd
13# 0x0000558BB4BCD409 in /usr/bin/searchd
14# 0x0000558BB4CEE16D in /usr/bin/searchd
15# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in /usr/bin/searchd
16# RtIndex_c::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in /usr/bin/searchd
17# CSphIndexStub::MultiQueryEx(int, CSphQuery const*, CSphQueryResult*, ISphMatchSorter**, CSphMultiQueryArgs const&) const in /usr/bin/searchd
18# 0x0000558BB49A4EA6 in /usr/bin/searchd
19# 0x0000558BB4CEE16D in /usr/bin/searchd
20# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in /usr/bin/searchd
21# SearchHandler_c::RunLocalSearches() in /usr/bin/searchd
22# SearchHandler_c::RunSubset(int, int) in /usr/bin/searchd
23# SearchHandler_c::RunQueries() in /usr/bin/searchd
24# HandleMysqlSelect(RowBuffer_i&, SearchHandler_c&) in /usr/bin/searchd
25# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in /usr/bin/searchd
26# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in /usr/bin/searchd
27# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in /usr/bin/searchd
28# 0x0000558BB48A3BBF in /usr/bin/searchd
29# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
30# make_fcontext in /usr/bin/searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
thd 0 (work_4), proto mysql, state query, command select
--- Totally 3 threads, and 1 client-working threads ---
------- CRASH DUMP END -------
[Tue Jun 25 09:47:48.549 2024] [2510366] watchdog: main process 2543224 crashed via CRASH_EXIT (exit code 2), will be restarted
[Tue Jun 25 09:47:48.550 2024] [2510366] watchdog: main process 2543377 forked ok
[Tue Jun 25 09:47:48.551 2024] [2543377] Using local time zone '/etc/localtime'
[Tue Jun 25 09:47:48.552 2024] [2543377] starting daemon version '6.3.0 1811a9efb@24052209 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206)' ...
[Tue Jun 25 09:47:48.552 2024] [2543377] listening on 10.9.2.99:9312 for sphinx and http(s)
[Tue Jun 25 09:47:48.552 2024] [2543377] listening on 10.9.2.99:9306 for mysql
[Tue Jun 25 09:47:48.552 2024] [2543377] listening on 10.9.2.99:9308 for sphinx and http(s)
[Tue Jun 25 09:47:48.689 2024] [2543378] WARNING: table 'publisher_brands': table 'publisher_brands': morphology option changed from config has no effect, ignoring
[Tue Jun 25 09:47:48.699 2024] [2543382] WARNING: table 'publishers': table 'publishers': morphology option changed from config has no effect, ignoring
[Tue Jun 25 09:47:48.757 2024] [2543379] WARNING: table 'publisher_series': table 'publisher_series': morphology option changed from config has no effect, ignoring
[Tue Jun 25 09:47:48.761 2024] [2543378] WARNING: table 'school_properties_values': table 'school_properties_values': morphology option changed from config has no effect, ignoring
[Tue Jun 25 09:47:48.942 2024] [2543380] WARNING: table 'persons': table 'persons': morphology option changed from config has no effect, ignoring
[Tue Jun 25 09:47:48.960 2024] [2543382] WARNING: table 'product_properties_values': table 'product_properties_values': morphology option changed from config has no effect, ignoring
[Tue Jun 25 09:47:49.119 2024] [2543381] WARNING: table 'products': table 'products': morphology option changed from config has no effect, ignoring
[Tue Jun 25 09:47:49.124 2024] [2543381] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.1032.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Tue Jun 25 09:47:49.127 2024] [2543381] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.1082.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Tue Jun 25 09:47:49.131 2024] [2543381] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.904.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Tue Jun 25 09:47:49.135 2024] [2543381] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.822.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Tue Jun 25 09:47:49.141 2024] [2543381] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.932.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Tue Jun 25 09:47:49.146 2024] [2543381] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.928.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index
[Tue Jun 25 09:47:49.150 2024] [2543381] WARNING: missing /var/lib/manticore/products_for_autocorrect/products_for_autocorrect.1021.spidx; secondary index(es) disabled, consider using ALTER REBUILD SECONDARY to recover the secondary index

The indextool check passed for all indexes.

What is the problem and how can it be fixed?

I uploaded the logs to S3: issue-240625

If indextool -c your.conf --check products completes successfully, then you need to upload the products index as described in the manual (Uploading-your-data) so that the crash can be reproduced locally.

On production we run version 6.2.12 and do not see this problem.
On stage, with version 6.3.0, it crashes consistently.
Yesterday, as a test, I reinstalled version 6.2.12 on stage. The same crash happened again during the night.
In the logs, the signal message also worries me:

--- local index:products
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------

I will upload the products index a bit later.

Judging by the stack, the crash happens while evaluating the expression tree, in the GREATEST(mva_attr) node. Our tests show no crashes in this node, so it is clear to me that the crash depends on the data. Without the data or a reproduction, it is impossible to investigate or fix the crash.
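The reading of the stack above can be reproduced mechanically. Below is a minimal Python sketch that parses the "Trying boost backtrace" lines from the crash dump and returns the first frame whose symbol starts with `Expr_`, i.e. the innermost expression node on the stack. It assumes the frame format shown in the log in this thread (`N# symbol in /path`). The helper names `parse_boost_frames` and `first_expression_frame` are hypothetical.

```python
import re

# One frame per line, e.g.:
#  3# Expr_MVAAggr_c<unsigned int>::IntEval(CSphMatch const&) const in /usr/bin/searchd
FRAME_RE = re.compile(r"^\s*(\d+)#\s+(.*?)\s+in\s+(\S+)\s*$")


def parse_boost_frames(lines):
    """Return a list of (frame number, symbol or address, binary path) tuples."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def first_expression_frame(frames):
    """Return the symbol of the first frame that starts with 'Expr_', or None."""
    for _, symbol, _ in frames:
        if symbol.startswith("Expr_"):
            return symbol
    return None
```

On the dump posted in this thread, the first such frame is `Expr_MVAAggr_c<unsigned int>::IntEval`, called from `Expr_GtInt_c`, `Expr_AndInt_c` and `Expr_If_c`. That chain matches the `IF (... AND GREATEST(shop_id_list) > 0, ...)` expression in the crashed query.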

If you do not provide the index or a reproducible case, you can remove these expressions from the query, and the crash should go away.
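To find where such expressions occur before removing them, a simple text check may be enough. The following Python sketch lists the attributes passed to single-argument `GREATEST(...)` calls in a query string, which is the form used over `shop_id_list` in the crashed query. It is a regex heuristic rather than a SQL parser, and the function name `find_greatest_over_attr` is hypothetical.

```python
import re

# GREATEST(<identifier>) with exactly one bare identifier argument.
# Multi-argument calls such as GREATEST(a, b) or GREATEST(1, 2) are not matched.
GREATEST_ATTR_RE = re.compile(
    r"\bGREATEST\s*\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*\)",
    re.IGNORECASE,
)


def find_greatest_over_attr(query):
    """Return a sorted list of unique attribute names used as GREATEST(attr)."""
    return sorted({m.group(1) for m in GREATEST_ATTR_RE.finditer(query)})
```

A check like this could run in application tests or a query-building layer to flag affected queries until a fixed release is installed. It does not know whether the attribute is actually an MVA, so matches still need a manual look.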

I will try to extract that index.
By the way, the same crash has now happened in prod, on the same query against products.

I uploaded it as part of the issue: FATAL CRASH при работе · Issue #2332 · manticoresoftware/manticoresearch · GitHub

Regarding the fix for FATAL CRASH при работе · Issue #2332 · manticoresoftware/manticoresearch · GitHub: will it go into a release, or will it only be in dev?

The fix will be in the dev branch and will go into the next release together with all the other commits. It is not yet clear when that will be, since there was a major release recently and a maintenance release just a few days ago.

It would be nice to see the fix in a release.
Oh well. For now we will keep an eye on our queries so that this does not happen again.

If this is critical for you, you can use the Manticore team's services.