Query timeout for local distributed indexes

From the documentation I understand that the agent_query_timeout value applies to distributed indexes with remote agents, but does agent_query_timeout work for local distributed indexes as well?

I have a few hundred million rows, and the queries I run take too long to return results.
Is there a way to set a query timeout for local distributed indexes, so that I wouldn’t have to wait for all threads to complete the query and gather the results?

Hi

agent_query_timeout works only for agents. For locals (actually it will work for remote queries too), use the per-query OPTION max_query_time instead (https://docs.manticoresearch.com/latest/html/sphinxql_reference/select_syntax.html?highlight=max_query_time#option)
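For illustration, the per-query option might look like this (the index name and match query are placeholders; max_query_time is in milliseconds):

```sql
SELECT * FROM dist_idx WHERE MATCH('some query')
OPTION max_query_time = 1000;
```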

Thanks @Sergey. After reading the documentation I found that max_query_time may produce unstable, non-repeatable results, whereas max_predicted_time gives repeatable results, so I could use that instead.
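For repeatable limits, the query would look similar (again, index name and query are placeholders; the value is in milliseconds):

```sql
SELECT * FROM dist_idx WHERE MATCH('some query')
OPTION max_predicted_time = 100;
```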

The documentation doesn’t properly explain how the predicted_time_costs values should be configured.

In the example section

predicted_time_costs = doc=128, hit=96, skip=4096, match=128

For example, what does doc=128 mean?
How do I ensure that those values will give the best performance for my index?

What does each of the values here mean?
How does this affect performance?

doc/hit/skip/match specify how expensive (in nanoseconds) each of the factors is (fetching a doc, fetching a hit, making a skip, and processing a match), but:

  • it depends on the hardware, and it’s hard to find values close to the truth. Statistical techniques (e.g. linear regression) can be used to find the optimal values if you have a big enough training data set (a query log), but our old experiments showed that the quality was still not sufficient: the predicted time was often very different from the real time.
  • there are other contributing factors not included in the formula, which is probably why the regression didn’t work out.

So I would say that unless it’s really important to you and you’re ready to do some maths, just use the defaults and check manually whether you’re satisfied with the results.
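The cost model is linear in four per-query counters, so fitting it from a query log is a least-squares problem. A minimal sketch of the idea (the counter values below are made up, and the “measured” wall times are synthetic, generated from the example costs so the fit recovers them exactly; real data would come from your query log and SHOW META):

```python
import numpy as np

# Hypothetical per-query counters: (docs fetched, hits fetched,
# skips made, matches processed). Real values would come from a query log.
counts = np.array([
    [120_000,  90_000,  3_000,  80_000],
    [400_000, 310_000, 11_000, 250_000],
    [ 50_000,  40_000,  1_200,  30_000],
    [900_000, 700_000, 25_000, 600_000],
    [210_000, 160_000,  6_000, 140_000],
], dtype=float)

# Synthetic "measured" wall times in nanoseconds, generated from the
# example costs (doc=128, hit=96, skip=4096, match=128) so the fit
# should recover those costs. Real runs would use actual timings.
true_costs = np.array([128.0, 96.0, 4096.0, 128.0])
wall_ns = counts @ true_costs

# Least-squares fit of the linear model:
#   time ~ doc*docs + hit*hits + skip*skips + match*matches
costs, *_ = np.linalg.lstsq(counts, wall_ns, rcond=None)
doc, hit, skip, match = costs
print(f"predicted_time_costs = doc={doc:.0f}, hit={hit:.0f}, "
      f"skip={skip:.0f}, match={match:.0f}")

def predicted_ns(docs, hits, skips, matches):
    """Predicted query time in nanoseconds under the fitted model."""
    return doc * docs + hit * hits + skip * skips + match * matches
```

As Sergey notes, on real timings the residuals are typically large because factors outside the four counters dominate, so treat a fit like this as a starting point rather than a reliable predictor.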

Thank you @Sergey. Sounds like it’s complicated, I need to see what I can do.

@Sergey I’ve been toying around with max_predicted_time for the last few days too and, alas, in its current form it is basically unusable. With the default predicted_time_costs values I’ve had queries that took ~5 seconds of wall time with a predicted time of 18 ms according to SHOW META. If you have one index it might be feasible to tune the values, but with a couple of hundred it’s not.

For me there are two use cases that max_predicted_time would be perfect for:

  1. Autocompleting very short queries with wildcards or expand_keywords, e.g. “a*”, in large indexes. I know, the results are pretty much useless, but customers like it, so…

  2. As a safety net for other queries that take longer than a certain threshold.

In both cases it is important to get stable results, so max_query_time cannot be used. Do you have any plans to get this feature into a usable state, possibly with some sort of auto-tuning of the cost variables?

Hello @dennis

We don’t have plans to improve this in the near future.