Is there a way to kill a specific search thread, similar to how you can do it in MySQL? Occasionally we get runaway threads that run for 5-10 minutes. Normally they run in under a second, but every once in a while they get stuck. It would be useful to be able to kill them and free up a search thread. At that point the client has given up anyway, so there is no point in waiting for the thread to finish.
You could perhaps add max_query_time to all queries, even if set to a high value (like 30 seconds), which should help to curtail long-running queries. I’ve started adding that to nearly all queries, and these ‘zombie’ threads have all but disappeared.
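If the queries are built as SQL strings in the client, the limit can be added in one place before sending. Here is a minimal sketch, assuming max_query_time (in milliseconds) is the option in question. The helper name with_max_query_time is hypothetical, and the regex is deliberately naive: it does not parse SQL, so a string literal containing the word OPTION would confuse it.

```python
import re


def with_max_query_time(sql: str, limit_ms: int = 30000) -> str:
    """Return sql with max_query_time=<limit_ms> added unless one is already set."""
    # Normalise the tail so the clause can be appended cleanly.
    body = sql.rstrip().rstrip(";").rstrip()
    if re.search(r"\bmax_query_time\s*=", body, flags=re.IGNORECASE):
        # The caller already chose a limit, so keep it.
        return body + ";"
    if re.search(r"\bOPTION\b", body, flags=re.IGNORECASE):
        # An OPTION clause exists, so extend its comma-separated list.
        return f"{body}, max_query_time={limit_ms};"
    return f"{body} OPTION max_query_time={limit_ms};"
```

Routing every query through one helper like this means a forgotten limit cannot slip through, and individual queries can still set their own value.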
Looking at that query, it seems to be a ‘full table scan’, i.e. there is no MATCH() clause.
While Manticore is pretty good at them, because attributes are usually held entirely in memory, they are not very efficient, particularly as it’s a distributed index.
… so the long-running part can be in an uninterruptible section. It’s likely in the sort phase: the only way to implement that query is to load all records and sort them all (on each remote index) before returning results to the master and sorting again. Those sort phases could be a lot of work, and won’t be interrupted in the middle. I think max_query_time is only checked at intermediate points between query stages.
Frankly, such a query might be better implemented in a database, i.e. the database can optimize that query using an index scan.
Thanks Barry. My apologies, I shortened that query. Normally this query runs in about 0.1s, but every once in a while it gets stuck. Here is the complete query:
SELECT altvendorpartcode,upceancode,partdescription,partscatalogcode,brandname,partcategorycode,currentprice,binlocation,qtyonhand,qtyavailable
,organizationid,dealershipid,dealerinventorypartid,isdiscontinued,partid,partscatalogid,partisprivate,isactive,vendorpartcode,vendorcode
,IF(dealerinventorypartid='',1,0) AS isredpart FROM dx1partsalldist WHERE MATCH('420230920*') AND (dealershipid IS NULL OR (dealershipid='0c610336-1c49-4887-b1f6-85c5084679ed' AND organizationid='c4b82774-caab-4856-a073-101f39332fdd')) AND isactive=1 GROUP BY partid WITHIN GROUP ORDER BY isredpart ASC ORDER BY isredpart ASC,altvendorpartcode ASC LIMIT 0,50 OPTION max_query_time=12000;
Judging by the index name dx1partsalldist, it is a distributed index. max_query_time is a local index option, i.e. for a distributed index every local index gets the option value and stops its search once the time limit is reached. However, a distributed index processes its local indexes sequentially or in parallel, depending on the distributed index type.
I.e. in the worst case of sequential processing of 10 local indexes in a distributed index with max_query_time=12000, you still get the reply only after 120 sec.
For a distributed index with remote agents, you could use the config option agent_query_timeout, which is set per distributed index.
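That worst case is simple arithmetic, sketched below. The function name is hypothetical, and the model is idealised: it assumes each local index may use its full max_query_time budget, that parallel processing has enough threads to overlap all indexes, and that no time is spent in sections where the limit is not checked.

```python
def worst_case_seconds(local_indexes: int, max_query_time_ms: int,
                       parallel: bool = False) -> float:
    """Upper bound on reply time when each local index may use the full limit."""
    # Sequential: the per-index limits add up. Parallel: they overlap.
    effective = 1 if parallel else local_indexes
    return effective * max_query_time_ms / 1000.0


print(worst_case_seconds(10, 12000))        # 120.0
print(worst_case_seconds(10, 12000, True))  # 12.0
```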
Thanks Stan. I appreciate the explanation. This distributed index has 3 local RT indexes, and the query was still running after 10 minutes. I had to restart Manticore to get rid of it. Normally that query runs in about 0.1s, but occasionally it gets stuck and ties up a search thread. Get a few of those stuck queries and your search is effectively unresponsive.
Each RT index consists of multiple disk chunks. They are just plain indexes that the RT index searches as well, and that option gets applied to each of them.
You could check the number of disk chunks for these RT indexes with the ‘show index status’ statement and reduce the disk chunk count with the ‘optimize’ statement.
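A rough sketch of that check from a client follows. It assumes a DB-API style cursor connected to Manticore over the MySQL protocol (for example via pymysql, which is not imported here) and that SHOW INDEX ... STATUS returns (name, value) rows including disk_chunks. The function names, the threshold of 4, and the index name rt_parts_1 in the usage note are all hypothetical.

```python
def disk_chunk_count(cursor, index_name: str) -> int:
    """Read disk_chunks from SHOW INDEX <name> STATUS (rows of (name, value))."""
    # index_name is interpolated directly, so pass only trusted names.
    cursor.execute(f"SHOW INDEX {index_name} STATUS")
    status = {row[0]: row[1] for row in cursor.fetchall()}
    return int(status["disk_chunks"])


def optimize_if_needed(cursor, index_name: str, max_chunks: int = 4) -> bool:
    """Issue OPTIMIZE INDEX when the chunk count exceeds max_chunks."""
    if disk_chunk_count(cursor, index_name) <= max_chunks:
        return False
    cursor.execute(f"OPTIMIZE INDEX {index_name}")
    return True
```

Calling optimize_if_needed(cursor, "rt_parts_1") for each local RT index from a scheduled job would keep the chunk count, and with it the number of places the time limit applies, from growing unnoticed.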
Thank you for that clarification. Makes sense. We do try to run optimize index at least once a day. That distributed index has 3x4 chunks, so 12 chunks in total; it should not have run for over 10 minutes. Thank you for the explanations. I learned a lot.