RTINDEX - GROUP N BY fatal crash

Ladislav_Kafka · November 2, 2020, 1:28pm

Hello,
I ran into this issue with 3.5.2. Manticore crashes when a query uses
“GROUP N BY”. The crash occurs only when N > 1.

With “GROUP 1 BY” it is OK.
With “GROUP 5 BY” it crashes.

An example query follows below.

SELECT
    id
FROM test_index 
WHERE
    MATCH('@t s389962')
GROUP 10 BY pt
WITHIN GROUP ORDER BY id DESC
ORDER BY id DESC;

backtrace from log
-------------- backtrace begins here ---------------
Program compiled with 4.8.5
Configured with flags: Configured by CMake with these definitions: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DDISTR_BUILD=rhel7 -DUSE_SSL=ON -DDL_UNIXODBC=1 -DUNIXODBC_LIB=libodbc.so.2 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DUSE_LIBICONV=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmysqlclient.so.18 -DDL_PGSQL=1 -DPGSQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/data -DFULL_SHARE_DIR=/usr/share/manticore -DUSE_RE2=1 -DUSE_ICU=1 -DUSE_BISON=ON -DUSE_FLEX=ON -DUSE_SYSLOG=1 -DWITH_EXPAT=1 -DWITH_ICONV=ON -DWITH_MYSQL=1 -DWITH_ODBC=ON -DWITH_PGSQL=1 -DWITH_RE2=1 -DWITH_STEMMER=1 -DWITH_ZLIB=ON -DGALERA_SONAME=libgalera_manticore.so.31 -DSYSCONFDIR=/etc/manticoresearch
Host OS is Linux x86_64
Stack bottom = 0x7f41b4023c60, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x5956f0)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x5956f0, stack=0x7f41b4020000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x90)[0x72daf0]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x1ba)[0x595d0a]
/lib64/libpthread.so.0(+0xf630)[0x7f435f1c0630]
/lib64/libc.so.6(+0x156a66)[0x7f435e0fca66]
/usr/bin/searchd(_ZNK16CSphSchemaHelper17CloneMatchSpecialER9CSphMatchRKS0_RK11VecTraits_TIiE+0x66)[0x6846f6]
/usr/bin/searchd(_ZN23CSphKBufferNGroupSorterI16MatchGeneric2_fnLb0ELb0EE6PushExERK9CSphMatchlbbb+0x392)[0x804752]
/usr/bin/searchd[0x88943b]
/usr/bin/searchd(_ZNK9RtIndex_c10MultiQueryER15CSphQueryResultRK9CSphQueryRK11VecTraits_TIP15ISphMatchSorterERK18CSphMultiQueryArgs+0x22db)[0x8a72fb]
/usr/bin/searchd(_ZNK9RtIndex_c12MultiQueryExEiPK9CSphQueryP15CSphQueryResultPP15ISphMatchSorterRK18CSphMultiQueryArgs+0x73)[0x8a7733]
/usr/bin/searchd[0x5d43ff]
/usr/bin/searchd[0x978cd7]
/usr/bin/searchd(_ZN7Threads10CoExecuteNEiOSt8functionIFvvEE+0x1a5)[0x9748b5]
/usr/bin/searchd(_ZN15SearchHandler_c16RunLocalSearchesEv+0x4fb)[0x5c831b]
/usr/bin/searchd(_ZN15SearchHandler_c9RunSubsetEii+0xf15)[0x5e4b95]
/usr/bin/searchd(_ZN15SearchHandler_c10RunQueriesEv+0xbb)[0x5e59ab]
/usr/bin/searchd(_Z17HandleMysqlSelectR11RowBuffer_iR15SearchHandler_c+0x1a8)[0x5e5fe8]
/usr/bin/searchd(_ZN16CSphinxqlSession7ExecuteESt4pairIPKciER11RowBuffer_i+0x15f1)[0x603461]
/usr/bin/searchd[0x672c0c]
/usr/bin/searchd(_Z8SqlServe11SharedPtr_TIP16AsyncNetBuffer_c9Deleter_TIS1_L5ETYPE0EE16ISphRefcountedMTE+0x8b9)[0x673989]
/usr/bin/searchd[0x66edfa]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_cC1ESt8functionIFvvEEmENUlN5boost7context6detail10transfer_tEE_4_FUNES7_+0x17)[0x974ee7]
/usr/bin/searchd(make_fcontext+0x2f)[0x97a67f]
-------------- backtrace ends here ---------------
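
The system backtrace above lists each frame as `module(symbol+offset)[address]`, with the symbol part missing for some frames. As a rough sketch (the helper names `parse_frame` and `mangled_symbols` are made up for illustration and are not part of Manticore), the mangled names could be pulled out of such a log like this:

```python
import re

# Matches system-backtrace lines such as
#   /usr/bin/searchd(_Z12sphBacktraceib+0x90)[0x72daf0]
# and frames without a symbol such as
#   /usr/bin/searchd[0x88943b]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[\s]+)"                                   # binary or library path
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"  # optional (symbol+offset)
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"                         # absolute address
)


def parse_frame(line):
    """Return a dict for one backtrace line, or None if it is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    frame = m.groupdict()
    # Lines like "libc.so.6(+0x156a66)" carry an offset but no symbol name.
    frame["symbol"] = frame["symbol"] or None
    return frame


def mangled_symbols(log_text):
    """Collect the mangled names in log order, skipping frames without one."""
    frames = (parse_frame(line) for line in log_text.splitlines())
    return [f["symbol"] for f in frames if f and f["symbol"]]
```

The resulting names can then be passed through a demangler such as `c++filt` from binutils; for instance `_Z12sphBacktraceib` demangles to `sphBacktrace(int, bool)`.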

tomat · November 2, 2020, 1:30pm

Could you check your index with indextool? If the check passes, please create a ticket on GitHub and upload index data that reproduces this crash locally.

Ladislav_Kafka · November 2, 2020, 1:36pm

Hello,
It’s an RT index… so indextool will not help me much, right?
I did a complete index rebuild before testing; the index is about 450MB.

I hope the stack trace will be helpful; I will try to prepare a reproducible case anyway.
A few notes:

- Manticore 3.1 works OK with these queries.
- The RT index in the crash case does not fit into the RAM chunk; it is split over a couple of 200MB .*spX files.

Sergey · November 3, 2020, 7:43am

> It’s a RT index… so indextool will not help me much, right?

Indextool can check RT indexes too.

> I did complete index recreate before testing, its about 450MB index size.

That’s great, it should be easy to fix then if you share the index. You can upload it to our write-only FTP (see Manticore Search Manual: Reporting bugs).

Ladislav_Kafka · November 3, 2020, 10:33am

I performed the indextool check and got this message:
WARNING: failed to load RAM chunks, checking only 1 disk chunks
Could that be related?

I also noticed what causes the crash.

This query is OK:

SELECT
    id, pt
FROM crash_index
WHERE MATCH('@t s9413')
GROUP 2 BY pt;

This query crashes:

SELECT
    id,
    pt,
    (1+2) AS test_expr
FROM crash_index
WHERE MATCH('@t s9413')
GROUP 2 BY pt;

To me, it is obvious that adding an expression to the select list causes the crash.
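
The two queries above differ only in the extra expression column, and the first post varied only N. A small sketch of how the full set of combinations could be generated for testing (the function names and defaults are illustrative assumptions, and the index, MATCH string, and attribute are simply the ones quoted in this post):

```python
from itertools import product


def build_query(index, match, group_n, group_attr, with_expr):
    """Build one GROUP N BY query, optionally adding the (1+2) expression column."""
    cols = ["id", group_attr]
    if with_expr:
        cols.append("(1+2) AS test_expr")
    return (
        f"SELECT {', '.join(cols)} FROM {index} "
        f"WHERE MATCH('{match}') GROUP {group_n} BY {group_attr};"
    )


def query_matrix(index="crash_index", match="@t s9413", group_attr="pt", ns=(1, 2)):
    """Return {(N, with_expr): query} for every combination to try."""
    return {
        (n, with_expr): build_query(index, match, n, group_attr, with_expr)
        for n, with_expr in product(ns, (False, True))
    }


if __name__ == "__main__":
    for key, query in query_matrix().items():
        print(key, query)
```

Running each generated query against a test instance would show which combinations crash. According to this thread, `(2, False)` is fine and `(2, True)` crashes; the N = 1 cases with an expression are not reported here.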

tomat · November 3, 2020, 10:55am

We have GROUP N BY tests, and I added the case you described there:

sphinxql-1> select id, (1+2) AS test_expr from idx_factors_rev where match ('this') group 2 by gid;
	id	test_expr
	1	3
	2	3
	5	3

and I see no crashes; everything works well.

That is why I need a reproducible example, or the data on which the crash happens, to investigate this issue further.

No, that warning is not related to the crash. It means that indextool sees the daemon running and checks only the disk chunks of the RT index, not the whole RT index.

Ladislav_Kafka · November 3, 2020, 11:29am

You are correct.
I also noticed that it does not happen on a small RT index, so the test case passes OK.
It must be somehow related to a larger, chunked index or to this specific data structure.

I created GitHub issue #435 in manticoresoftware/manticoresearch: “RTINDEX - GROUP N BY expression select = fatal crash”.
The data will be uploaded within 10 minutes: 599MB, the full RT index.

tomat · November 17, 2020, 4:37pm

I’ve just fixed this crash in the master branch, commit e6424a13. You can use binaries from the dev repo once CI finishes to get the fix.

Ladislav_Kafka · January 18, 2021, 9:16am

I wanted to thank you. After this bugfix I was able to upgrade to version 3.5.4 (from 3.2) and after 10 days in live production everything seems perfectly stable for us. We use only RT indexes.

- memory leaks during “flush rtindex” are gone
- index size is smaller and queries are faster

I am looking forward to the next great updates!

The only “strange” thing for me is the “‘data_dir’ cannot be mixed with index declarations” issue. Since we use script-generated indexes, moving everything to “live” index declarations seems unrealistic at the moment, so we can’t use the new RT features yet.