I have an index with billions of rows. Every row has two very low-selectivity integer fields - country and city. Each country and city value may be present in 20-30% of the rows.
I know of two approaches to storing this data.
First - create integer fields and search by them as attributes.
Second - create a full-text field (geo), encode the values there, like 'co12 ci234', and search using MATCH.
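For concreteness, the two layouts could be sketched like this (a hedged example assuming Manticore-style CREATE TABLE syntax; table and attribute names are just illustrative):

```sql
-- Approach 1: plain integer attributes, filtered with WHERE country = ...
CREATE TABLE index1 (attr1 int, attr2 int, country int, city int);

-- Approach 2: a full-text field holding encoded values;
-- e.g. country=12, city=234 is stored as the text 'co12 ci234'
-- and filtered with MATCH('@geo co12')
CREATE TABLE index1_ft (attr1 int, attr2 int, geo text);
```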
So the question is: which approach performs better for selects like these:
select id, attr1, attr2 from index1 where country = 12 group by attr1 order by attr2 asc limit 100;
select id, attr1, attr2 from index1 where match('@geo co12') group by attr1 order by attr2 asc limit 100;
I'm not sure which is better - a full index scan, or a full-text match over 30% of the rows followed by looking those rows up in the index.
Having keys as indexed text is in general much faster (even by 2 orders of magnitude) than attribute filters, even at low selectivity. If the MATCH contains only key filters, add OPTION ranker=none, as there's no point in having the query calculate ranking.
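With that option applied, the second query from the question would look like this (ranker=none only skips the relevance calculation; which rows match is unaffected):

```sql
SELECT id, attr1, attr2 FROM index1
WHERE MATCH('@geo co12')
GROUP BY attr1
ORDER BY attr2 ASC
LIMIT 100
OPTION ranker=none;
```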
Maybe I'm misunderstanding something, but look at my results:
mysql> select count(*) from chara_addr_v2;
+----------+
| count(*) |
+----------+
| 6899519 |
+----------+
mysql> select count(*) from chara_addr_v2 where match('co2');
+----------+
| count(*) |
+----------+
| 5747887 |
+----------+
mysql> select count(*) from chara_addr_v2 where country = 2;
+----------+
| count(*) |
+----------+
| 5747887 |
+----------+
And here are the entries from the query log:
/* Wed Aug 19 20:08:17.369 2020 conn 9 real 0.322 wall 0.322 found 5747887 */ SELECT id FROM chara_addr_v2 WHERE country=2;
/* Wed Aug 19 20:08:27.791 2020 conn 9 real 1.575 wall 1.575 found 5747887 */ SELECT id FROM chara_addr_v2 WHERE MATCH('co2');
The conditions match('co2') and country = 2 match the same rows, yet here the attribute filter is roughly 5x faster (0.322s vs 1.575s).