How to kill a search thread?

Damir_Tresnjo · December 1, 2020, 3:22pm

Is there a way to kill a specific search thread, similar to how you can do it in MySQL? Occasionally, we get these runaway threads that run for 5-10 minutes. Normally, they run in under a second, but every once in a while they get stuck. It would be useful to be able to kill them and free up a search thread. At that point the client has given up anyway, so there is no point in waiting for the thread to finish.

1099 | work_0 | http,ssl | query | 127.0.0.1:59778 | 301 | 631.109693 | 2m | 0us | 0.00% | 50164588 | 11m | No (working) | 1 ch 2: api-search query="901*" comment="" index="myindex"

tomat · December 1, 2020, 3:31pm

Currently there is no way to control execution or terminate threads in the daemon.

Damir_Tresnjo · December 1, 2020, 4:45pm

Thank you for the quick response.

barryhunter · December 2, 2020, 2:50pm

There is max_query_time
https://manual.manticoresearch.com/Searching/Options#max_query_time

You could perhaps add this to all queries, even if set to a high value (like 30 seconds), which should help to curtail long-running queries. I've started adding that to nearly all queries, and these 'zombie' threads have all but disappeared.
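As a rough illustration of adding the option to every query on the client side, here is a minimal Python sketch. The helper name `with_max_query_time` is hypothetical and not part of any Manticore client library. It assumes the `OPTION` clause is the last clause of the statement (as in the queries in this thread) and that the word `OPTION` does not appear inside a string literal in the query.

```python
import re


def with_max_query_time(sql: str, limit_ms: int) -> str:
    """Return `sql` with max_query_time set, unless the query already sets it.

    Hypothetical helper: assumes OPTION is the final clause of the statement
    and that "OPTION" never occurs inside a quoted string in the query.
    """
    # Drop trailing whitespace and a trailing semicolon so we can append safely.
    stripped = sql.rstrip().rstrip(";").rstrip()

    # The caller already chose a limit; leave the query alone.
    if re.search(r"\bmax_query_time\s*=", stripped, flags=re.IGNORECASE):
        return stripped

    # An OPTION clause exists: extend its comma-separated list.
    if re.search(r"\bOPTION\b", stripped, flags=re.IGNORECASE):
        return f"{stripped}, max_query_time={limit_ms}"

    # No OPTION clause yet: add one.
    return f"{stripped} OPTION max_query_time={limit_ms}"
```

For example, `with_max_query_time("SELECT * FROM myindex WHERE MATCH('901*')", 30000)` returns the same statement with ` OPTION max_query_time=30000` appended.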

Damir_Tresnjo · December 2, 2020, 5:16pm

Awesome tip. We will do that. Thank you!

Damir_Tresnjo · December 21, 2020, 10:35pm

The max_query_time option does not seem to be working for us. I set it to 12s (12000) and after 6 minutes the query is still running.

15275 | work_1 | mysql | query | 10.20.1.4:49642 | 80612 | 333.643670 | 38m | 0us | 0.00% | 320728 | 6m | No (working) | 1 ch 3: api-search query="420230920*" comment="" index="dx1parts" SELECT * from my_distributed_index ORDER BY isredpart ASC,altvendorpartcode ASC LIMIT 0,50 OPTION max_query_time=12000 |
| 15276 | work_2 | | - | | -1 | 433300.887719 | 8m | 0us | 0.00% | 322925 | 40us | 486ms ago |

barryhunter · December 22, 2020, 12:43pm

Looking at that query, it seems to be a 'full table scan', i.e. there is no MATCH() clause.

While Manticore is pretty good at them, because attributes are usually held entirely in memory, they are not very efficient, particularly as it's a distributed index.
… so the long-running part can be in an uninterruptible section. It's likely in the sort phase: the only way to implement that query is to load all records and sort them all on each remote index, before returning results to the master and sorting again. Those sort phases could be a lot of work, and won't be interrupted in the middle. I think max_query_time is only checked at intermediate points between query stages.

Frankly, such a query might be better implemented in a database, i.e. the database can optimize that query using an index scan.

Damir_Tresnjo · December 22, 2020, 4:17pm

Thanks Barry. My apologies, I shortened that query. Normally, this query runs in about 0.1s, but every once in a while it gets stuck. Here is the complete query:

SELECT altvendorpartcode,upceancode,partdescription,partscatalogcode,brandname,partcategorycode,currentprice,binlocation,qtyonhand,qtyavailable
,organizationid,dealershipid,dealerinventorypartid,isdiscontinued,partid,partscatalogid,partisprivate,isactive,vendorpartcode,vendorcode
,IF(dealerinventorypartid='',1,0) AS isredpart FROM dx1partsalldist WHERE MATCH('420230920*') AND (dealershipid IS NULL OR (dealershipid='0c610336-1c49-4887-b1f6-85c5084679ed' AND organizationid='c4b82774-caab-4856-a073-101f39332fdd')) AND isactive=1 GROUP BY partid WITHIN GROUP ORDER BY isredpart ASC ORDER BY isredpart ASC,altvendorpartcode ASC LIMIT 0,50 OPTION max_query_time=12000;

tomat · December 22, 2020, 6:01pm

According to the index name dx1partsalldist, it is a distributed index.
max_query_time is a local index option, i.e. for a distributed index every index gets the option value and stops its search when the time limit is reached. However, a distributed index will process all its indexes sequentially or in parallel, depending on the distributed index type.

I.e. in the worst case of sequential processing of 10 local indexes in a distributed index with max_query_time=12000, you still get the reply only after 120 sec.
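The worst-case arithmetic above can be written out as a small Python sketch. The function name `worst_case_seconds` is hypothetical. It is an idealised upper bound that assumes the limit is actually enforced for every part, and that parallel processing runs all parts at the same time; it says nothing about sections of a query where the limit is not checked.

```python
def worst_case_seconds(max_query_time_ms: int, parts: int, sequential: bool = True) -> float:
    """Idealised upper bound on total query time, in seconds.

    `parts` is the number of pieces that each receive the full
    max_query_time budget (for example, local indexes in a distributed index).
    Hypothetical helper for illustration only.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")

    per_part = max_query_time_ms / 1000.0

    # Sequential: every part may use its whole budget, one after another.
    # Parallel (idealised): all parts run at once, so the bound is one budget.
    return per_part * parts if sequential else per_part


# 10 local indexes processed sequentially, 12 s limit each.
print(worst_case_seconds(12000, 10))                    # 120.0
# The same indexes processed fully in parallel.
print(worst_case_seconds(12000, 10, sequential=False))  # 12.0
```

So a per-query limit of 12 seconds only bounds the whole request at 12 seconds when there is a single part or the parts run fully in parallel.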

For a distributed index with remote agents, you could use the per-index config option agent_query_timeout.

You could also use the similar per-query option agent_query_timeout.

tomat · December 22, 2020, 6:07pm

That is why there is no simple way to just kill a query, as a query might be of different kinds:

search
insert \ replace
update
index management
daemon state
replication

and can work with:

local index
group of local indexes
distributed index with only locals
distributed index with remote agents
and a mix of them

This work is quite complex and needs a full specification first. I also have not seen many feature requests for it.

Damir_Tresnjo · December 22, 2020, 6:23pm

Thanks Stan. I appreciate the explanation. This distributed index has 3 local RT indexes and the query was still running after 10 minutes. I had to restart Manticore to get rid of it. Normally, that query runs in about 0.1s, but occasionally it gets stuck and ties up a search thread. You get a few of those stuck queries and your search is effectively non-responsive.

tomat · December 22, 2020, 8:00pm

Each RT index consists of multiple disk chunks. They are just plain indexes that the RT index searches too, and that option gets applied to each of them.

You could check the number of disk chunks for these RT indexes with the 'show index status' statement and reduce the disk chunk count with the 'optimize' statement.

Damir_Tresnjo · December 22, 2020, 10:21pm

Thank you for that clarification. Makes sense. We do try to run optimize index at least once a day. That distributed index has 3 x 4 chunks, so 12 chunks in total; it should not have run over 10 minutes. Thank you for the explanations. I learned a lot.