max_matches + count

Hi,

Is “option max_matches” needed for count(*) or sum(…) over a large number of rows?

Thank you

max_matches applies to the rows in the result set (i.e. total_found), not to the number of rows within each group.

In effect, it can only keep track of ‘max_matches’ counts/sums etc., i.e. that many rows.

So if the search result has fewer than max_matches rows (i.e. groups), then the sum/count will be accurate.

If the number of rows (i.e. ‘total_found’) is above max_matches, then the counts/sums are NOT guaranteed to be accurate (they may be, but possibly not!)


So a query like
SELECT category,COUNT(*) as count,SUM(points) as points FROM index GROUP BY category

should be accurate if the number of categories is below max_matches (default 1000), regardless of the number of documents in each category.

But if there are more than 1000 categories, then the numbers might be under-counted.
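
To make the under-counting concrete, here is a toy Python model. It is only a sketch for illustration, not Sphinx's actual code: the function name `bounded_group_sum`, the `max_groups` parameter (standing in for max_matches) and the "evict the smallest running sum" rule are all assumptions. It only shows how a tracker that can hold a limited number of groups may lose partial sums.

```python
def bounded_group_sum(rows, max_groups):
    """Toy model (hypothetical, NOT Sphinx internals): sum values per key
    while tracking at most `max_groups` keys at any one time."""
    sums = {}
    for key, value in rows:
        if key not in sums and len(sums) >= max_groups:
            # Tracker is full: evict the group with the smallest running sum.
            # Its partial sum is lost; if the key reappears it restarts at 0.
            victim = min(sums, key=sums.get)
            del sums[victim]
        sums[key] = sums.get(key, 0) + value
    return sums


rows = [("a", 5), ("b", 1), ("c", 2), ("b", 10)]

# Enough room for every group: all sums are exact.
print(bounded_group_sum(rows, 3))  # {'a': 5, 'b': 11, 'c': 2}

# Room for only 2 of the 3 groups: "b" is under-counted (10, not 11)
# and "c" is missing entirely.
print(bounded_group_sum(rows, 2))  # {'a': 5, 'b': 10}
```

In this model the number of rows inside each group never matters, only the number of distinct groups compared to the limit.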

Thank you for your explanation, but what about a simple query like this:
SELECT COUNT(*) as count FROM index or SELECT SUM(points) as count FROM index

Well, as there would be only 1 row in the result, and 1 is (much) less than 1000, it will be entirely accurate.

total_found will always be 1 for that type of query, regardless of the number of documents.

Thank you for the explanation.

Hello everyone. This looks clear to me for simple aggregation or for grouping, but I’m in two minds when it comes to queries with filtering. Does it work in the same manner?
E.g. does the limit apply to all rows held in memory before filtering, or only to the resulting set (so that it doesn’t matter how many rows there are before filtering and grouping + aggregation)?

Assume we have the following data:
day1 app1 some_revenue
day1 app2 some_revenue

dayN appM some_revenue
I have gotten inaccurate results when doing some aggregations and limiting them.
For instance:
select sum(revenue) from index where timestamp between start and end group by app_id order by sum(revenue) desc limit 25;
The result set has 25 rows due to the limit, but should MAX_MATCHES cover all possible distinct app ids?
The number of unique apps could be too large; what would you advise?