Manticore crashes

searchd crashes periodically (several times a day). We have not found any pattern behind it. The crashes happened on both the previous and the latest version. I am attaching the most recent log from the moment of a crash:

------- FATAL: CRASH DUMP -------
[Tue Aug 15 04:01:46.207 2023] [290790]

--- crashed SphinxQL request dump ---
SELECT id
FROM products
WHERE
    MATCH('@(title,author_full_name_list,translator_full_name_list,
category_tree_list,brand,item_title_synonyms,umk_title,school_subject_title_list,material_type_title)
(избушка|на|костях)')


LIMIT 0
FACET product_type ORDER BY
COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET available ORDER BY COUNT(*) DESC, FACET()
DESC LIMIT 10000 FACET category_tree_id_list ORDER BY COUNT(*) DESC, FACET() DESC
LIMIT 10000 FACET marketing_status_list ORDER BY COUNT(*) DESC, FACET() DESC LIMIT
10000 FACET marketing_status_list ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 10000
FACET available ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET marketing_status_list
ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET author_id_list ORDER BY COUNT(*)
DESC, FACET() DESC LIMIT 11 FACET series_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT
11 FACET publishing_house_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 11 FACET manufacturer_id
ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 11 FACET school_class_id_list ORDER BY
COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET school_subject_id_list ORDER BY COUNT(*)
DESC, FACET() DESC LIMIT 10000 FACET material_type_id ORDER BY COUNT(*) DESC, FACET()
DESC LIMIT 10000 FACET education_system_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT
10000 FACET umk_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 11 FACET cover_id ORDER
BY COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET is_school_prepare ORDER BY COUNT(*)
DESC, FACET() DESC LIMIT 10000 FACET is_out_of_class_reading ORDER BY COUNT(*) DESC,
 FACET() DESC LIMIT 10000 FACET school_exam_id_list ORDER BY COUNT(*) DESC, FACET()
DESC LIMIT 10000 FACET school_exam_year_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT
10000
--- request dump end ---
--- local index:products
Manticore 6.2.0 45680f95d@230804 (columnar 2.2.0 dc33868@230804) (secondary 2.2.0 dc33868@230804)
Handling signal 6
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=jammy -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 ->Built on Linux x86_64 (jammy) (cross-compiled)
Stack bottom = 0x7efb600f7220, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7efb600f0000, stacksize=0x20000)

Additional info:
Average load: 100 qps
Several indexes: for products, suggestions, and reference data.
About 650 thousand records
17 full-text fields
30 attributes
The indexes are not large; the heaviest is 600 MB

Hardware:
CPU: 2 × Intel Silver 4214R (12x2.4 GHz HT)
Memory: 64 GB (8 × 8 GB DDR4 ECC Reg)
Disk: 1000 GB SSD NVMe M.2; 2 × 240 GB SSD SATA Enterprise
Network cards: 2 × 10 GE + port to Private network 10 Gbit/s
Motherboard: X11DPi-NT

Please check the products index with indextool.

Could you post the daemon log that shows the daemon's stack traces at the time of the crash, not just the queries?

Nothing interesting at the moment of the crash:

[Tue Aug 15 06:49:58.201 2023] [294382] WARNING: table '/var/lib/manticore/products/products.10309': wordforms file '/var/lib/manticore/products/wordforms.txt' is shared with table 'authors', but tokenizer settings are different
[Tue Aug 15 06:49:58.208 2023] [294382] rt: table products: diskchunk 10309(79), segments 26  saved in 2.201474 (2.208324) sec, RAM saved/new 127965634/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)
[Tue Aug 15 07:00:37.536 2023] [294403] WARNING: table '/var/lib/manticore/products/products.10310': wordforms file '/var/lib/manticore/products/wordforms.txt' is shared with table 'authors', but tokenizer settings are different
[Tue Aug 15 07:00:37.538 2023] [294403] rt: table products: diskchunk 10310(80), segments 27  saved in 2.501360 (2.502271) sec, RAM saved/new 127535101/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)
[Tue Aug 15 07:13:28.884 2023] [294424] WARNING: table '/var/lib/manticore/products/products.10311': wordforms file '/var/lib/manticore/products/wordforms.txt' is shared with table 'authors', but tokenizer settings are different
[Tue Aug 15 07:13:28.890 2023] [294424] rt: table products: diskchunk 10311(81), segments 21  saved in 2.322513 (2.328248) sec, RAM saved/new 134358226/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)

We ran the check hot, on the live instance:

WARNING: table '/var/lib/manticore/products/products.10302': wordforms file 'wordforms.txt' is shared with table 'products', but tokenizer settings are different
checking schema...
checking dictionary...
checking data...
checking rows...
checking attribute blocks index...
checking kill-list...
checking docstore...
checking dead row map...
checking doc-id lookup...
check FAILED, 1 failures reported, 157.9 sec elapsed
checking disk chunk, extension 10303, 72(94)...

Here is a fuller version of the logs from the moment of the crash:

------- FATAL: CRASH DUMP -------
[Tue Aug 15 04:01:46.207 2023] [290790]

--- crashed SphinxQL request dump ---
SELECT id
FROM products
WHERE
    MATCH('@(title,author_full_name_list,translator_full_name_list,
category_tree_list,brand,item_title_synonyms,umk_title,school_subject_title_list,material_type_title) 
(избушка|на|костях)')
    
    
LIMIT 0
FACET product_type ORDER BY 
COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET available ORDER BY COUNT(*) DESC, FACET() 
DESC LIMIT 10000 FACET category_tree_id_list ORDER BY COUNT(*) DESC, FACET() DESC 
LIMIT 10000 FACET marketing_status_list ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 
10000 FACET marketing_status_list ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 10000 
FACET available ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET marketing_status_list 
ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET author_id_list ORDER BY COUNT(*) 
DESC, FACET() DESC LIMIT 11 FACET series_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 
11 FACET publishing_house_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 11 FACET manufacturer_id 
ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 11 FACET school_class_id_list ORDER BY 
COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET school_subject_id_list ORDER BY COUNT(*) 
DESC, FACET() DESC LIMIT 10000 FACET material_type_id ORDER BY COUNT(*) DESC, FACET() 
DESC LIMIT 10000 FACET education_system_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 
10000 FACET umk_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 11 FACET cover_id ORDER 
BY COUNT(*) DESC, FACET() DESC LIMIT 10000 FACET is_school_prepare ORDER BY COUNT(*) 
DESC, FACET() DESC LIMIT 10000 FACET is_out_of_class_reading ORDER BY COUNT(*) DESC,
 FACET() DESC LIMIT 10000 FACET school_exam_id_list ORDER BY COUNT(*) DESC, FACET() 
DESC LIMIT 10000 FACET school_exam_year_id ORDER BY COUNT(*) DESC, FACET() DESC LIMIT 
10000
--- request dump end ---
--- local index:products
Manticore 6.2.0 45680f95d@230804 (columnar 2.2.0 dc33868@230804) (secondary 2.2.0 dc33868@230804)
Handling signal 6
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=jammy -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.21 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (jammy) (cross-compiled)
Stack bottom = 0x7efb600f7220, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7efb600f0000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x5618fdf1a56a]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x5618fdd99065]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7efcc1a4f520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7efcc1aa3a7c]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7efcc1a4f476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7efcc1a357f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x896f6)[0x7efcc1a966f6]
/lib/x86_64-linux-gnu/libc.so.6(+0xa0d7c)[0x7efcc1aadd7c]
/lib/x86_64-linux-gnu/libc.so.6(+0xa184c)[0x7efcc1aae84c]
/lib/x86_64-linux-gnu/libc.so.6(+0xa46ab)[0x7efcc1ab16ab]
/lib/x86_64-linux-gnu/libc.so.6(malloc+0x99)[0x7efcc1ab21b9]
/usr/bin/searchd(_Znwm+0x9)[0x5618fde1c769]
/usr/bin/searchd(_ZN20MatchesToNewSchema_c12ProcessMatchEP9CSphMatch+0x51)[0x5618feab5e41]
/usr/bin/searchd(_ZN22CSphKBufferGroupSorterI16MatchGeneric2_fn9UniqHLL_cLi0ELb0ELb0EE8FinalizeER16MatchProcessor_ibb+0x67)[0x5618fe23c607]
/usr/bin/searchd(_ZN13MatchSorter_c30TransformPooled2StandalonePtrsESt8functionIFPKhPK9CSphMatchEES0_IFPN8columnar10Columnar_iES5_EEb+0x478)[0x5618fdf248c8]
/usr/bin/searchd(_ZN23SorterSchemaTransform_c9TransformEP15ISphMatchSorterRK9RtGuard_t+0xef)[0x5618feb4874f]
/usr/bin/searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0x211e)[0x5618feb4b37e]
/usr/bin/searchd(+0xebed46)[0x5618fddfdd46]
/usr/bin/searchd(+0x1d220bd)[0x5618fec610bd]
/usr/bin/searchd(_ZN7Threads4Coro8ExecuteNEiOSt8functionIFvvEE+0x78)[0x5618ff066ac8]
/usr/bin/searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0xb39)[0x5618fddaf439]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii+0x51a)[0x5618fddb079a]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv+0xd4)[0x5618fddad494]
/usr/bin/searchd(_Z17HandleMysqlSelectR11RowBuffer_iR15SearchHandler_c+0x14f)[0x5618fddd31bf]
/usr/bin/searchd(_Z20HandleMysqlMultiStmtRKN3sph8Vector_TI9SqlStmt_tNS_13DefaultCopy_TIS1_EENS_14DefaultRelimitENS_16DefaultStorage_TIS1_EEEER19CSphQueryResultMetaR11RowBuffer_iRK10CSphString+0x3cb)[0x5618fddd653b]
/usr/bin/searchd(_ZN15ClientSession_c7ExecuteESt4pairIPKciER11RowBuffer_i+0x742)[0x5618fdde2692]
/usr/bin/searchd(_Z20ProcessSqlQueryBuddySt4pairIPKciERhR21GenericOutputBuffer_c+0x52)[0x5618fdd43b02]
/usr/bin/searchd(_Z8SqlServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x105d)[0x5618fdd28f4d]
/usr/bin/searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x43)[0x5618fdd24cc3]
/usr/bin/searchd(+0xde68a2)[0x5618fdd258a2]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x5618ff06987c]
/usr/bin/searchd(make_fcontext+0x37)[0x5618ff089b57]
Trying boost backtrace:
[Tue Aug 15 05:10:01.740 2023] [290790] caught SIGTERM, shutting down

Could you create a ticket on GitHub and attach a reproducible example there:
the config, the queries from the crash logs or searchd.log, and the products index?

Yes, I will gather everything and open a ticket.
The following also shows up in the logs from time to time:
[Tue Aug 15 14:25:35.671 2023] [317766] WARNING: [BUDDY] [991843] P10: syntax error, unexpected TOK_IDENT, expecting $end near 'Error 500: Internal Server Error': Error 500: Internal Server Error

I also noticed that every time, a few seconds before the crash, the searchd log contains something like this:
rt: table products: diskchunk 10351(67), segments 10 saved in 2.666780 (2.669575) sec, RAM saved/new 130871429/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)

What is this related to, and could it be the cause of the searchd crashes?
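
For reference, the numbers in that log line are consistent with each other: the soft limit matches the configured limit multiplied by the ratio, and the amount of RAM saved is slightly above that soft limit. A minimal Python sketch of the arithmetic, using only the values from the line above. Reading the configured limit as an rt_mem_limit of 128M is an assumption, since the config is not shown in this thread.

```python
# Values copied from the "diskchunk ... saved" log line above.
conf_limit = 134217728   # "conf limit", in bytes
ratio = 0.95             # "ratio 0.950000"
soft_limit = 127506841   # "soft limit", in bytes
ram_saved = 130871429    # "RAM saved", in bytes

# The configured limit is exactly 128 MiB
# (assumed to come from rt_mem_limit; the config is not shown here).
conf_limit_mib = conf_limit / (1024 * 1024)

# The logged soft limit equals the configured limit times the ratio, rounded down.
computed_soft_limit = int(conf_limit * ratio)

# How far the saved RAM data was above the soft limit at the time of the save.
over_soft_limit = ram_saved - soft_limit

print(conf_limit_mib, computed_soft_limit, over_soft_limit)
```

This only shows that the line is consistent with a save that happens once RAM usage reaches the soft limit. By itself it does not say whether the save is related to the crash.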

I think the Buddy messages are not related to the crash,
but you can configure Buddy to start in debug mode, with the --debug flag, in the searchd section of the config:

buddy_path = /usr/share/manticore/modules/manticore-buddy/bin/manticore-buddy --debug

Then the daemon log will show all the commands that Buddy sends to the daemon, and it will be clear which command the daemon is complaining about.

OK, we will try that. Will it affect performance in any way? Is it safe to enable debug mode in production?

One more question: shortly before the crashes there is always a flush to disk:
rt: table products: diskchunk 10351(67), segments 10 saved in 2.666780 (2.669575) sec, RAM saved/new 130871429/0 ratio 0.950000 (soft limit 127506841, conf limit 134217728)

Could this be affecting the crashes?
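
One way to check the observation that a disk chunk save always precedes a crash is to scan searchd.log and measure the gap between each crash dump and the last save before it. The following is a minimal Python sketch under assumptions: the regular expressions are inferred only from the log excerpts quoted in this thread, and the function name saves_before_crashes and the 60-second window are hypothetical choices, not part of Manticore.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the searchd.log excerpts in this thread,
# e.g. "[Tue Aug 15 06:49:58.201 2023] [294382] ...".
TS_RE = re.compile(r"^\[(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d\.\d+ \d{4})\]")
# Disk chunk save line, e.g. "rt: table products: diskchunk 10351(67), segments 10 saved ...".
SAVE_RE = re.compile(r"rt: table (\S+): diskchunk \d+\(\d+\), segments \d+\s+saved")
CRASH_MARK = "FATAL: CRASH DUMP"


def parse_ts(line):
    """Return the datetime at the start of a log line, or None if there is none."""
    m = TS_RE.match(line)
    if not m:
        return None
    # Collapse repeated spaces so that space-padded single-digit days also parse.
    text = " ".join(m.group(1).split())
    return datetime.strptime(text, "%a %b %d %H:%M:%S.%f %Y")


def saves_before_crashes(lines, window_seconds=60.0):
    """For each crash, return (crash_time, seconds since the last save).

    The second element is None when no save was seen within the window.
    """
    last_save = None
    expect_crash_ts = False
    result = []
    for line in lines:
        if CRASH_MARK in line:
            # The crash timestamp is on the next timestamped line.
            expect_crash_ts = True
            continue
        ts = parse_ts(line)
        if ts is None:
            continue
        if expect_crash_ts:
            expect_crash_ts = False
            gap = None
            if last_save is not None:
                delta = (ts - last_save).total_seconds()
                if 0 <= delta <= window_seconds:
                    gap = delta
            result.append((ts, gap))
        elif SAVE_RE.search(line):
            last_save = ts
    return result


if __name__ == "__main__":
    # Illustrative input only: the timestamp and thread id on the save line are
    # made up. In practice, pass the lines of the real searchd.log instead.
    sample = [
        "[Tue Aug 15 04:01:43.100 2023] [290771] rt: table products: "
        "diskchunk 10351(67), segments 10 saved in 2.666780 (2.669575) sec",
        "------- FATAL: CRASH DUMP -------",
        "[Tue Aug 15 04:01:46.207 2023] [290790]",
    ]
    for crash_time, gap in saves_before_crashes(sample):
        print(crash_time, gap)
```

If most crashes show a small gap while saves at other times pass without a crash, that would be worth including in the GitHub ticket together with the config and the index.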