Index Keeps Crashing at Random Times

  • A large index (~500 MB) keeps crashing at random, usually once a day.
  • Running indextool on it reports ‘rowitems count mismatch’ along with a number of unexpected block indexes.
  • The index is called listings; a listings_delta index is merged into it daily.
  • The crash seems to happen some time after the indexes are merged, not immediately.
  • We reindex the delta whenever a change is made, which can happen many times in a short period.
  • We also reindex all deltas every 30 minutes.
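Because the per-change delta reindex, the 30-minute reindex of all deltas and the daily merge can run close together, one hypothesis worth ruling out is two indexer processes touching the listings files at the same time. Whether indexer's own locking already prevents that in this setup is worth confirming; this minimal Python sketch simply makes the ordering explicit. It assumes all three jobs are launched through this one wrapper; `run_serialized` and the lock path are hypothetical names, not part of Manticore.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock file; every indexing job must use the same path.
LOCK_PATH = "/tmp/listings-indexer.lock"


def run_serialized(cmd: list[str], lock_path: str = LOCK_PATH) -> int:
    """Run cmd while holding an exclusive lock so indexing jobs never overlap."""
    with open(lock_path, "a") as lock_file:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python run_serialized.py indexer --rotate listings_delta
    sys.exit(run_serialized(sys.argv[1:]))
```

For example, `python run_serialized.py indexer --rotate listings_delta` for a delta rebuild and `python run_serialized.py indexer --merge listings listings_delta --rotate` for the daily merge. With `--rotate`, indexer only signals searchd to pick up the new files, so the lock does not cover the rotation itself.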

I’m at a loss as to how to debug this further. We have upgraded to the latest version and are running on Debian 11.
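One way to narrow this down is to replay the build and merge cycle offline and run `indextool --check` after every step, so the first step that leaves a corrupt table is pinpointed. This Python sketch assumes a separate scratch config (the `SCRATCH_CONFIG` path is hypothetical) whose table paths searchd does not serve, which is why `--rotate` is omitted. It also assumes a failed check exits with a non-zero status; if it does not in your build, read indextool's printed output instead.

```python
import subprocess

# Hypothetical config pointing at a copy of the listings/listings_delta
# sources and at a data directory that searchd does NOT serve.
SCRATCH_CONFIG = "/etc/manticoresearch/manticore-scratch.conf"


def replay_steps(config: str) -> list[list[str]]:
    """Build both tables from scratch, check them, merge, then check again."""
    return [
        ["indexer", "--config", config, "listings", "listings_delta"],
        ["indextool", "--config", config, "--check", "listings"],
        ["indextool", "--config", config, "--check", "listings_delta"],
        ["indexer", "--config", config, "--merge", "listings", "listings_delta"],
        ["indextool", "--config", config, "--check", "listings"],
    ]


def replay(config: str, run=subprocess.run) -> int:
    """Run each step in order; return the 1-based failing step, or 0 if all pass.

    Assumes a non-zero exit status means the step failed.
    """
    for step, cmd in enumerate(replay_steps(config), start=1):
        result = run(cmd)
        if result.returncode != 0:
            print(f"step {step} failed: {' '.join(cmd)}")
            return step
    return 0
```

Calling `replay(SCRATCH_CONFIG)` repeatedly, with delta rebuilds in between, approximates the production schedule without involving searchd; the `run` parameter exists only so the steps can be exercised without the real binaries.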



[Mon Sep 25 05:12:20.335 2023] [339947] rotating table 'heavy_invoice_records': success
[Mon Sep 25 05:12:20.335 2023] [339947] rotating table: all tables done
[Mon Sep 25 05:12:20.398 2023] [339944] caught SIGHUP (seamless=1, in_rotate=0, need_rotate=0)
[Mon Sep 25 05:12:20.425 2023] [339947] rotating table 'heavy_invoice_records': started
[Mon Sep 25 05:12:20.426 2023] [339947] RW-idx for rename to .old, acquiring...
[Mon Sep 25 05:12:20.426 2023] [339947] RW-idx for rename to .old, acquired...
[Mon Sep 25 05:12:20.427 2023] [339947] rotating table 'heavy_invoice_records': success
[Mon Sep 25 05:12:20.427 2023] [339947] rotating table: all tables done
------- FATAL: CRASH DUMP -------
[Mon Sep 25 05:12:28.435 2023] [339944]

--- crashed SphinxAPI request dump ---
AAABGQAAAeYAAAAAAAAAAQAAAAAAD0I/AAAABgAAAAgAAABEc3VtKChtb2RpZmllZF90cy84NjQwMC0xOTYyNS4y
MTY5OTA3NDEpKyh1c2VyX3dlaWdodCooMStleGFjdF9oaXQpKSkAAAAEAAAAJ2ZpbHRlcl9saXN0aW5n
X2lkIERFU0MsIEByZWxldmFuY2UgREVTQwAAAAsqcGV0ZXJiaWx0KgAAAAAAAAAXbGlzdGluZ3MgbGlz
dGluZ3NfZGVsdGEAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAWZmlsdGVyX2xpc3RpbmdfdHlwZV9p
ZAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAGGZpbHRlcl9saXN0aW5nX3N0YXR1c19pZAAAAAAAAAAB
AAAAAAAAAAMAAAAAAAAAAAAAAAAAD0I/AAAAC0Bncm91cCBkZXNjAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAYAAAAKbGlzdGluZ19pZAAAAAoAAAAQcmVsYXRlZF9saXN0aW5ncwAAAAoAAAAFdGl0
bGUAAAAFAAAAA3ZpbgAAAAMAAAAMaW5zdXJlcl9pbmZvAAAABQAAAA1hZGp1c3Rlcl9pbmZvAAAABAAA
AAAAAAAAAAAAASo=
--- request dump end ---
--- local index:
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bullseye -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bullseye) (cross-compiled)
Stack bottom = 0x7fb0b80292f0, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7fb0b8030000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x559faadfdcfa]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x559faac7cce5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7fb0d6451140]
/usr/bin/searchd(_ZNK10Docstore_c6GetDocEjPK11VecTraits_TIiElb+0x97)[0x559fabadb127]
/usr/bin/searchd(_ZNK13CSphIndex_VLN6GetDocER13DocstoreDoc_tlPK11VecTraits_TIiElb+0x73)[0x559faad31853]
/usr/bin/searchd(_ZNK16Expr_GetStored_TILb1EE13GetBlobPackedERK9CSphMatch+0x54)[0x559fabae3654]
/usr/bin/searchd(+0xeb914c)[0x559faacdc14c]
/usr/bin/searchd(_Z18MinimizeAggrResultR12AggrResult_tRK9CSphQuerybRKN3sph9StringSetEP14QueryProfile_cPK18CSphFilterSettingsbb+0x650b)[0x559faac9027b]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii+0x1324)[0x559faac95164]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv+0xd4)[0x559faac91064]
/usr/bin/searchd(_Z19HandleCommandSearchR16ISphOutputBuffertR13InputBuffer_c+0x333)[0x559faac9bd63]
/usr/bin/searchd(_Z8ApiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x7b1)[0x559faac0a5c1]
/usr/bin/searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x12e)[0x559faac085ee]
/usr/bin/searchd(+0xde60b2)[0x559faac090b2]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x559fabf4b62c]
/usr/bin/searchd(make_fcontext+0x2f)[0x559fabf6b9cf]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007FB0D6451140 in /lib/x86_64-linux-gnu/libpthread.so.0
 3# Docstore_c::GetDoc(unsigned int, VecTraits_T<int> const*, long, bool) const in /usr/bin/searchd
 4# CSphIndex_VLN::GetDoc(DocstoreDoc_t&, long, VecTraits_T<int> const*, long, bool) const in /usr/bin/searchd
 5# Expr_GetStored_T<true>::GetBlobPacked(CSphMatch const&) const in /usr/bin/searchd
 6# 0x0000559FAACDC14C in /usr/bin/searchd
 7# MinimizeAggrResult(AggrResult_t&, CSphQuery const&, bool, sph::StringSet const&, QueryProfile_c*, CSphFilterSettings const*, bool, bool) in /usr/bin/searchd
 8# SearchHandler_c::RunSubset(int, int) in /usr/bin/searchd
 9# SearchHandler_c::RunQueries() in /usr/bin/searchd
10# HandleCommandSearch(ISphOutputBuffer&, unsigned short, InputBuffer_c&) in /usr/bin/searchd
11# ApiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in /usr/bin/searchd
12# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in /usr/bin/searchd
13# 0x0000559FAAC090B2 in /usr/bin/searchd
14# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
15# make_fcontext in /usr/bin/searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
thd 0 (work_0), proto sphinx, state query, command search
--- Totally 2 threads, and 1 client-working threads ---
------- CRASH DUMP END -------
[Mon Sep 25 05:12:31.497 2023] [3329436] watchdog: main process 339944 crashed via CRASH_EXIT (exit code 2), will be restarted
[Mon Sep 25 05:12:31.497 2023] [3329436] watchdog: main process 359552 forked ok
[Mon Sep 25 05:12:31.523 2023] [359552] Reloading the config (38828 chars)
[Mon Sep 25 05:12:31.524 2023] [359552] Reconfigure the daemon
[Mon Sep 25 05:12:31.525 2023] [359552] starting daemon version '6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)' ...
[Mon Sep 25 05:12:31.525 2023] [359552] listening on all interfaces for sphinx and http(s), port=9312
[Mon Sep 25 05:12:31.525 2023] [359552] listening on all interfaces for mysql, port=9306
[Mon Sep 25 05:12:31.604 2023] [359557] prereading 13 tables
[Mon Sep 25 05:12:31.608 2023] [359557] preread 13 tables in 0.005 sec
[Mon Sep 25 05:12:31.610 2023] [359552] accepting connections
[Mon Sep 25 05:12:31.653 2023] [359556] [BUDDY] started v1.0.18 '/usr/share/manticore/modules/manticore-buddy/bin/manticore-buddy --listen=http://0.0.0.0:9312  --threads=4' at http://127.0.0.1:34867
[Mon Sep 25 05:12:31.681 2023] [359557] [BUDDY] Loaded plugins:
[Mon Sep 25 05:12:31.681 2023] [359557] [BUDDY]   core: empty-string, backup, emulate-elastic, insert, select, show, cli-table, plugin, test, insert-mva
[Mon Sep 25 05:12:31.681 2023] [359557] [BUDDY]   local: 
[Mon Sep 25 05:12:31.681 2023] [359557] [BUDDY]   extra: 
------- FATAL: CRASH DUMP -------
[Mon Sep 25 05:12:34.614 2023] [359552]

--- crashed SphinxAPI request dump ---
AAABGQAAAeYAAAAAAAAAAQAAAAAAD0I/AAAABgAAAAgAAABEc3VtKChtb2RpZmllZF90cy84NjQwMC0xOTYyNS4y
MTcwNjAxODUpKyh1c2VyX3dlaWdodCooMStleGFjdF9oaXQpKSkAAAAEAAAAJ2ZpbHRlcl9saXN0aW5n
X2lkIERFU0MsIEByZWxldmFuY2UgREVTQwAAAAsqcGV0ZXJiaWx0KgAAAAAAAAAXbGlzdGluZ3MgbGlz
dGluZ3NfZGVsdGEAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAWZmlsdGVyX2xpc3RpbmdfdHlwZV9p
ZAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAGGZpbHRlcl9saXN0aW5nX3N0YXR1c19pZAAAAAAAAAAB
AAAAAAAAAAMAAAAAAAAAAAAAAAAAD0I/AAAAC0Bncm91cCBkZXNjAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAYAAAAKbGlzdGluZ19pZAAAAAoAAAAQcmVsYXRlZF9saXN0aW5ncwAAAAoAAAAFdGl0
bGUAAAAFAAAAA3ZpbgAAAAMAAAAMaW5zdXJlcl9pbmZvAAAABQAAAA1hZGp1c3Rlcl9pbmZvAAAABAAA
AAAAAAAAAAAAASo=
--- request dump end ---
--- local index:listings
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bullseye -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bullseye) (cross-compiled)
Stack bottom = 0x7fb0bc043f40, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7fb0bc040000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x559faadfdcfa]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x559faac7cce5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7fb0d6451140]
/usr/bin/searchd(_ZNK10Docstore_c6GetDocEjPK11VecTraits_TIiElb+0x97)[0x559fabadb127]
/usr/bin/searchd(_ZNK13CSphIndex_VLN6GetDocER13DocstoreDoc_tlPK11VecTraits_TIiElb+0x73)[0x559faad31853]
/usr/bin/searchd(_ZNK16Expr_GetStored_TILb1EE13GetBlobPackedERK9CSphMatch+0x54)[0x559fabae3654]
/usr/bin/searchd(+0xeb914c)[0x559faacdc14c]
/usr/bin/searchd(_Z18MinimizeAggrResultR12AggrResult_tRK9CSphQuerybRKN3sph9StringSetEP14QueryProfile_cPK18CSphFilterSettingsbb+0x650b)[0x559faac9027b]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii+0x1324)[0x559faac95164]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv+0xd4)[0x559faac91064]
/usr/bin/searchd(_Z19HandleCommandSearchR16ISphOutputBuffertR13InputBuffer_c+0x333)[0x559faac9bd63]
/usr/bin/searchd(_Z8ApiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x7b1)[0x559faac0a5c1]
/usr/bin/searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x12e)[0x559faac085ee]
/usr/bin/searchd(+0xde60b2)[0x559faac090b2]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8__invokeESB_+0x1c)[0x559fabf4b62c]
/usr/bin/searchd(make_fcontext+0x2f)[0x559fabf6b9cf]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007FB0D6451140 in /lib/x86_64-linux-gnu/libpthread.so.0
 3# Docstore_c::GetDoc(unsigned int, VecTraits_T<int> const*, long, bool) const in /usr/bin/searchd
 4# CSphIndex_VLN::GetDoc(DocstoreDoc_t&, long, VecTraits_T<int> const*, long, bool) const in /usr/bin/searchd
 5# Expr_GetStored_T<true>::GetBlobPacked(CSphMatch const&) const in /usr/bin/searchd
 6# 0x0000559FAACDC14C in /usr/bin/searchd
 7# MinimizeAggrResult(AggrResult_t&, CSphQuery const&, bool, sph::StringSet const&, QueryProfile_c*, CSphFilterSettings const*, bool, bool) in /usr/bin/searchd
 8# SearchHandler_c::RunSubset(int, int) in /usr/bin/searchd
 9# SearchHandler_c::RunQueries() in /usr/bin/searchd
10# HandleCommandSearch(ISphOutputBuffer&, unsigned short, InputBuffer_c&) in /usr/bin/searchd
11# ApiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in /usr/bin/searchd
12# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in /usr/bin/searchd
13# 0x0000559FAAC090B2 in /usr/bin/searchd
14# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
15# make_fcontext in /usr/bin/searchd

-------------- backtrace ends here ---------------
Please, create a bug report in our bug tracker (https://github.com/manticoresoftware/manticore/issues)
and attach there:
a) searchd log, b) searchd binary, c) searchd symbols.
Look into the chapter 'Reporting bugs' in the manual
(https://manual.manticoresearch.com/Reporting_bugs)
Dump with GDB via watchdog
--- active threads ---
thd 0 (work_3), proto sphinx, state query, command search
--- Totally 2 threads, and 1 client-working threads ---
------- CRASH DUMP END -------
[Mon Sep 25 05:12:37.832 2023] [3329436] watchdog: main process 359552 crashed via CRASH_EXIT (exit code 2), will be restarted
[Mon Sep 25 05:12:37.832 2023] [3329436] watchdog: main process 359581 forked ok
[Mon Sep 25 05:12:37.861 2023] [359581] Reloading the config (38828 chars)
[Mon Sep 25 05:12:37.862 2023] [359581] Reconfigure the daemon
[Mon Sep 25 05:12:37.862 2023] [359581] starting daemon version '6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)' ...
[Mon Sep 25 05:12:37.862 2023] [359581] listening on all interfaces for sphinx and http(s), port=9312
[Mon Sep 25 05:12:37.862 2023] [359581] listening on all interfaces for mysql, port=9306
[Mon Sep 25 05:12:37.927 2023] [359586] prereading 13 tables
[Mon Sep 25 05:12:37.931 2023] [359586] preread 13 tables in 0.004 sec
[Mon Sep 25 05:12:37.935 2023] [359581] accepting connections
[Mon Sep 25 05:12:37.963 2023] [359583] [BUDDY] started v1.0.18 '/usr/share/manticore/modules/manticore-buddy/bin/manticore-buddy --listen=http://0.0.0.0:9312  --threads=4' at http://127.0.0.1:46561
[Mon Sep 25 05:12:38.006 2023] [359584] [BUDDY] Loaded plugins:
[Mon Sep 25 05:12:38.006 2023] [359584] [BUDDY]   core: empty-string, backup, emulate-elastic, insert, select, show, cli-table, plugin, test, insert-mva
[Mon Sep 25 05:12:38.006 2023] [359584] [BUDDY]   local: 
[Mon Sep 25 05:12:38.006 2023] [359584] [BUDDY]   extra: 
------- FATAL: CRASH DUMP -------
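The backtraces show both crashes inside `Docstore_c::GetDoc`, reached from `Expr_GetStored_T::GetBlobPacked`, i.e. while reading stored field values for matched documents. The request that triggered them is recorded in each dump as base64, so decoding it shows which query to replay against the table. A minimal Python sketch, assuming the dump is plain base64 of the binary SphinxAPI request; it only extracts printable strings rather than parsing the protocol, so a stray leading character (here `D` and `'`) can appear where a length prefix happens to be a printable byte.

```python
import base64
import re


def decode_request_dump(dump: str) -> bytes:
    """Base64-decode a 'crashed SphinxAPI request dump' copied from searchd.log."""
    # The dump is wrapped over several lines; remove all whitespace first.
    compact = "".join(dump.split())
    # Restore '=' padding in case it was lost while copying.
    compact += "=" * (-len(compact) % 4)
    return base64.b64decode(compact)


def printable_strings(raw: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII, much like the Unix `strings` tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(raw)]


# First three lines of the request dump from the first crash above.
DUMP = """
AAABGQAAAeYAAAAAAAAAAQAAAAAAD0I/AAAABgAAAAgAAABEc3VtKChtb2RpZmllZF90cy84NjQwMC0xOTYyNS4y
MTY5OTA3NDEpKyh1c2VyX3dlaWdodCooMStleGFjdF9oaXQpKSkAAAAEAAAAJ2ZpbHRlcl9saXN0aW5n
X2lkIERFU0MsIEByZWxldmFuY2UgREVTQwAAAAsqcGV0ZXJiaWx0KgAAAAAAAAAXbGlzdGluZ3MgbGlz
"""

for s in printable_strings(decode_request_dump(DUMP)):
    print(s)
```

On these three lines it prints the ranking expression, the `filter_listing_id DESC, @relevance DESC` sort clause, the `*peterbilt*` query and the start of the `listings listings_delta` table list; pasting the whole dump also reveals the filters and the requested columns.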

Since the table is corrupted, I would check whether you can reproduce the indextool failure after building and merging the tables from scratch. If you can, let us know how to do it by: