Searchd forks on crash

Hello!

I am using Manticore 3.2.2 (it's pretty old, but I am planning to update), and I am facing the following situation:

  1. I start a searchd process (searchd1), and it works fine.
  2. It gets some bad request and a crash happens.
  3. searchd1 continues working, but it starts a new searchd instance (searchd2).
  4. searchd2 is not working, wastes a lot of resources and has only one thread. It freezes for a long time, until the main process has finished. Here is its typical backtrace:
Thread 1 (Thread 0x75b6bc775700 (LWP 4071)):
#0  0x000075b91e5853c3 in select () at ../sysdeps/unix/syscall-template.S:84
#1  0x000057dbe57bbd04 in sphSleepMsec (iMsec=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinx.cpp:3033
#2  0x000057dbe57c3185 in sphSleepMsec (iMsec=iMsec@entry=50) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinx.cpp:3022
#3  0x000075b91f979bf2 in LazyJobs_c::FinishAllWorkers (this=0x75b91fba39a0 <LazyTasker()::dEvents>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchdtask.cpp:671
#4  LazyJobs_c::Shutdown (this=0x75b91fba39a0 <LazyTasker()::dEvents>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchdtask.cpp:774
#5  LazyJobs_c::~LazyJobs_c (this=0x75b91fba39a0 <LazyTasker()::dEvents>, __in_chrg=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchdtask.cpp:766
#6  0x000075b91e4d9940 in __run_exit_handlers (status=status@entry=1, listp=0x75b91e83d5d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
    at exit.c:83
#7  0x000075b91e4d999a in __GI_exit (status=status@entry=1) at exit.c:105
#8  0x000057dbe5875808 in sphBacktrace (iFD=5, bSafe=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinxutils.cpp:3396
#9  0x000057dbe5733543 in SphCrashLogger_c::HandleCrash (sig=6) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:1101
#10 <signal handler called>
....
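
A side note on the "only one thread" observation: fork() copies only the thread that calls it, so a child forked from a multi-threaded process starts with a single thread. Below is a minimal Linux-only sketch of that behaviour. It is not Manticore code: count_own_threads and threads_in_forked_child are made-up names, and counting threads through /proc/self/task is an assumption about the platform.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>
#include <dirent.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: counts the threads of the calling process by listing
// /proc/self/task (Linux-only). Fine for a sketch; a real crash handler
// should stick to async-signal-safe calls.
static int count_own_threads() {
    DIR* dir = opendir("/proc/self/task");
    if (!dir) return -1;
    int n = 0;
    while (dirent* e = readdir(dir))
        if (e->d_name[0] != '.') ++n;   // skip "." and ".."
    closedir(dir);
    return n;
}

// Hypothetical helper: starts `workers` background threads, forks, and
// returns the number of threads the child process sees (-1 on failure).
int threads_in_forked_child(int workers) {
    std::atomic<bool> stop{false};
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&stop] {
            while (!stop)
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
        });

    pid_t child = fork();
    if (child == 0)
        _exit(count_own_threads());     // child reports its count as exit status

    int status = 0;
    if (child > 0) waitpid(child, &status, 0);

    stop = true;                        // let the parent's workers finish
    for (auto& t : pool) t.join();

    return (child > 0 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Whatever value is passed for `workers`, the child should report a single thread. If the destructor in frame #5 waits for worker threads, it may be waiting for threads that do not exist in the forked process, which would match the freeze, but that is only a guess from the backtrace.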

I did some research and found a fork() call, with some logic around it, in sphBacktrace() (I see this code in 3.2.2 and in the newest version, 6.3.6, as well).

What is the purpose of it? How can I get rid of these instances?

According to the stack you provided, these are exit handlers. They get called on a regular shutdown and should not be called on a crash, so this seems like a bug: after sphBacktrace finishes, the daemon should exit without any handlers being called.
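
To illustrate the difference in general terms: exit() runs atexit handlers and static destructors (frames #5 to #7 in the stack), while _exit() terminates the process immediately without them. Here is a minimal sketch, not taken from the Manticore sources; slow_shutdown_handler and child_ran_exit_handlers are hypothetical names, and the handler only stands in for a shutdown routine like the one in the backtrace.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write end of the pipe the handler reports to; -1 means "do nothing".
static int g_report_fd = -1;

// Hypothetical stand-in for a shutdown routine run at normal process exit.
static void slow_shutdown_handler() {
    if (g_report_fd >= 0) {
        const char mark = 'H';
        ssize_t written = write(g_report_fd, &mark, 1);
        (void)written;
    }
}

// Forks a child that terminates through exit() or _exit() and returns true
// if the child ran the atexit handler on its way out.
bool child_ran_exit_handlers(bool use_exit) {
    static bool registered = false;
    if (!registered) {
        atexit(slow_shutdown_handler);  // inherited by every forked child
        registered = true;
    }

    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t child = fork();
    if (child == 0) {
        close(fds[0]);
        g_report_fd = fds[1];
        if (use_exit)
            exit(1);                    // runs atexit handlers and destructors
        _exit(1);                       // skips them entirely
    }

    close(fds[1]);
    char buf = 0;
    ssize_t n = read(fds[0], &buf, 1);  // returns 0 if the child wrote nothing
    close(fds[0]);
    if (child > 0) waitpid(child, nullptr, 0);
    return n == 1 && buf == 'H';
}
```

With this sketch, child_ran_exit_handlers(true) reports that the handler ran and child_ran_exit_handlers(false) reports that it did not. Whether the exit(1) call at sphinxutils.cpp:3396 can simply be swapped for _exit() depends on the surrounding code, so treat this only as an illustration of the mechanism.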

Here is the full backtrace of the current searchd2 instance:

Thread 1 (Thread 0x7bdcc9f6d700 (LWP 2176)):
#0  0x00007bdf2f6013c3 in select () at ../sysdeps/unix/syscall-template.S:84
#1  0x000067c50cb64d04 in sphSleepMsec (iMsec=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinx.cpp:3033
#2  0x000067c50cb6c185 in sphSleepMsec (iMsec=iMsec@entry=50) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinx.cpp:3022
#3  0x00007bdf309f5bf2 in LazyJobs_c::FinishAllWorkers (this=0x7bdf30c1f9a0 <LazyTasker()::dEvents>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchdtask.cpp:671
#4  LazyJobs_c::Shutdown (this=0x7bdf30c1f9a0 <LazyTasker()::dEvents>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchdtask.cpp:774
#5  LazyJobs_c::~LazyJobs_c (this=0x7bdf30c1f9a0 <LazyTasker()::dEvents>, __in_chrg=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchdtask.cpp:766
#6  0x00007bdf2f555940 in __run_exit_handlers (status=status@entry=1, listp=0x7bdf2f8b95d8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
    at exit.c:83
#7  0x00007bdf2f55599a in __GI_exit (status=status@entry=1) at exit.c:105
#8  0x000067c50cc1e808 in sphBacktrace (iFD=5, bSafe=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinxutils.cpp:3396
#9  0x000067c50cadc543 in SphCrashLogger_c::HandleCrash (sig=6) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:1101
#10 <signal handler called>
#11 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#12 0x00007bdf2f55442a in __GI_abort () at abort.c:89
#13 0x00007bdf2f590c00 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7bdf2f685d98 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#14 0x00007bdf2f596fc6 in malloc_printerr (action=3, str=0x7bdf2f685ea8 "free(): invalid next size (fast)", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5049
#15 0x00007bdf2f59780e in _int_free (av=0x7bdc5c000020, p=0x7bdc5c98b320, have_lock=0) at malloc.c:3905
#16 0x000067c50cb711aa in sphDeallocateSmall (pBlob=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinxstd.h:5205
#17 sphDeallocatePacked (pBlob=<optimized out>) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinx.cpp:7143
#18 CSphSchemaHelper::FreeDataSpecial (pMatch=0x7bdadda4bf78, dSpecials=...) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinx.cpp:7155
#19 0x000067c50cb38689 in KillAllDupes (tRes=..., pSorter=0x7bdc5c82c020) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:3886
#20 MergeAllMatches (pProfiler=0x0, pAggrFilter=<optimized out>, bMaster=true, bAllEqual=<optimized out>, bHaveLocals=true, tQuery=..., tRes=..., this=<optimized out>)
    at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:4608
#21 MinimizeAggrResult (tRes=..., tQuery=..., bHaveLocals=<optimized out>, hExtraColumns=..., pProfiler=0x0, pAggrFilter=<optimized out>, bForceRefItems=false, bMaster=true, dRemotes=...)
    at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:4722
#22 0x000067c50cb3abe9 in SearchHandler_c::RunSubset (this=<optimized out>, iStart=<optimized out>, iEnd=<optimized out>)
    at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:6619
#23 0x000067c50cb3c25d in SearchHandler_c::RunQueries (this=0x7bdcc9f6b2f0) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:5076
#24 0x000067c50cb3c928 in HandleMysqlSelect (dRows=..., tHandler=...) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:13890
#25 0x000067c50cb6057f in CSphinxqlSession::Execute (this=<optimized out>, sQuery=..., tOutput=..., uPacketID=<optimized out>, tThd=...)
    at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:16373
#26 0x000067c50cb3e541 in LoopClientMySQL (uPacketID=@0x7bdcbc853219: 1 '\001', tSession=..., sQuery=..., iPacketLen=<optimized out>, bProfile=<optimized out>, tThd=..., tIn=..., 
    tOut=...) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:17062
#27 0x000067c50cb3ef44 in ThdJobQL_t::Call (this=0x7bdcbc8511e0) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/searchd.cpp:21800
#28 0x000067c50cc2d7a2 in CSphThdPool::TickImpl (this=0x67c50e1d9980) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinxstd.cpp:2055
#29 CSphThdPool::Tick (pArg=0x67c50e1d9980) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinxstd.cpp:2021
#30 0x000067c50cc292c5 in sphThreadProcWrapper (pArg=0x67c50e1ddc20) at /root/rpmbuild/BUILD/sphinx-3.2.2/manticore-3.2.2/src/sphinxstd.cpp:884
#31 0x00007bdf30785494 in start_thread (arg=0x7bdcc9f6d700) at pthread_create.c:333
#32 0x00007bdf2f608acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

If I change int iChild = fork(); so that iChild is set to -1, can that be a temporary solution to avoid this situation? Or can it be done without code changes (CMake variables or something else)?

Not sure about the 3.x release code, but running the daemon WITHOUT the --coredump command-line option and without any searchd logs should not call that code.