COUNT(*) and duplicates over a distributed index


#1

I have a main index and a delta index, and I combined them into a distributed index.
If I run SELECT COUNT(*) FROM all WHERE category=5; I get a count(*) of 8, but it should return 7, because one record is present in both main and delta. Why isn't it showing 7?


#2

Hi,
Are those local indexes? Do you have a kill-list set for the delta?
With COUNT(*), each index sends only its count back to the main result, so the main result doesn’t have the doc ids and can’t filter out the duplicates. If your indexes are local and you use kill-lists, this is not a problem, as the kill-list will suppress the duplicates in the main index.
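One quick way to check this is to count each local index separately and compare with the distributed count. The index names main and delta below are assumptions based on the question, so substitute your own:

```sql
-- Index names `main` and `delta` are assumed; use your actual local index names.
SELECT COUNT(*) FROM main WHERE category=5;
SELECT COUNT(*) FROM delta WHERE category=5;
-- The distributed index adds the per-index counts together,
-- so a document present in both indexes is counted twice.
SELECT COUNT(*) FROM all WHERE category=5;
```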
If you don’t have kill-lists, you can work around this by doing a normal select (SELECT *) and reading total_found from SHOW META (in contrast to regular databases, the count of found records doesn’t need a separate query, as this information is computed anyway). Keep in mind that in this case each index sends at most max_matches doc ids to the main result. The default is 1000 (it can be changed via the OPTION clause), so you might need to increase it. But for a local main+delta setup, it’s better to use kill-lists.
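A minimal sketch of that workaround, assuming the distributed index is named all as in the question. The max_matches value of 10000 is only an illustrative number; pick one at least as large as the number of matches any single index can return:

```sql
-- Run a regular select so the doc ids reach the main result and duplicates can be merged.
-- max_matches=10000 is an illustrative value, not a recommendation.
SELECT * FROM all WHERE category=5 OPTION max_matches=10000;
-- Read total_found from the metadata of the query just executed.
SHOW META;
```

SHOW META reports on the previous query in the same session, so run it right after the SELECT on the same connection.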