Hi,

Is “option max_matches” needed for count(*) or sum(…) over a large number of rows?

Thank you

max_matches corresponds to the number of *rows* in the result set, i.e. the total_found, not the number of rows within each group.

In effect it can only keep track of ‘max_matches’ counts/sums etc., i.e. for that many rows.

So if the search result has fewer than max_matches rows (i.e. groups), then the sum/count will be accurate.

**If the number of rows (ie ‘total_found’) is above max_matches, then the counts/sums are NOT guaranteed to be accurate** (they may be, but possibly not!)

So a query

`SELECT category,COUNT(*) as count,SUM(points) as points FROM index GROUP BY category`

should be accurate if the **number of categories** is below max_matches (default 1000). *Regardless* of the number of documents **in** each category.

But if there are more than 1000 categories, then the numbers **might** be under-counted.
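A rough sketch of how one might check this from SphinxQL. It assumes a build where max_matches can be set per query with `OPTION`; the value 5000 is only an illustrative guess at "more than the number of categories", and older setups may also cap it in the searchd config. `SHOW META` reports total_found, which for this query is the number of groups.

```sql
-- Sketch only: 5000 is an assumed value, pick one above your expected number of categories.
SELECT category, COUNT(*) AS count, SUM(points) AS points
FROM index
GROUP BY category
LIMIT 0,1000
OPTION max_matches=5000;

-- Run right after the SELECT: compare total_found with the max_matches used above.
SHOW META;
```

If total_found comes back below the max_matches in effect, then by the reasoning above the counts and sums should be accurate.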

Thank you for your explanation, but what about a simple query like this:

`SELECT COUNT(*) as count FROM index`

or `SELECT SUM(points) as count FROM index`

Well, as there would only be 1 row in the **result**, and 1 is (much) less than 1000, it will be entirely accurate.

total_found will always be 1 on that type of query, regardless of the number of documents.

Thank you for the explanation.

Hello everyone. This looks clear to me for simple aggregation or for grouping, but I’m in two minds when it comes to queries with filtering. Does it work in the same manner?

That is, does max_matches apply to all rows held in memory before filtering, or only to the resulting set (so that the number of rows before filtering, grouping, and aggregation doesn’t matter)?

Assume we have the following data:

day1 app1 some_revenue

day1 app2 some_revenue

…

dayN appM some_revenue

I have gotten inaccurate results doing some aggregations and limiting them.

For instance:

`select sum(revenue) from index where timestamp between start and end group by app_id order by sum(revenue) desc limit 25;`

The result set has 25 rows because of the limit, but does MAX_MATCHES need to cover all possible distinct app ids?

The number of unique apps could be very large. What would you advise?