Facet by JSON column values

Hi, I have a table with the following structure

[
            'id'                => ['type' => 'bigint'],
            'article_id'        => ['type' => 'bigint'],
            'journal_id'        => ['type' => 'bigint'],

            ...

            'subjects'      => ['type' => 'json'],
            'countries'     => ['type' => 'json'],
]

countries column contains JSONs consisting of countries ids (country_id >= 1 and <= 255), like this

[1,3,234]
[2,3]
[2,3,4,67,55,136]
[4]
...

When I use facet on countries field I get the result looking like the following

[
            0 => [
                        'key' => [1,3,234],
                        'doc_count' => 1
            ],
            1 => [
                        'key' => [2,3],
                        'doc_count' => 345
            ],
            2 => [
                        'key' => [4],
                        'doc_count' => 45
            ],
            ...
]

Is it possible within Manticore to group result by countries JSON values not JSON strings? So I want each key to be equal to country_id and doc_count to be equal to the number of documents where countries JSON contains country_id, like this

[
            0 => [
                        'key' => 1, // country_id
                        'doc_count' => 15
            ],
            1 => [
                        'key' => 3, // country_id
                        'doc_count' => 33
            ],
            2 => [
                        'key' => 4, // country_id
                        'doc_count' => 445
            ],
            ...
]

I think I’ve found the solution

Noticed the example with categories in docs here

then found multi data type

so I think I need to change countries type from json to multi.

1 Like

Hi. Grouping by array elements of a json attribute via FACET works ok for me:

mysql> drop table if exists t; create table t(j json); insert into t values(1,'{"a": [1,2,3]}'),(2,'{"a": [2,3,4]}'); select * from t facet j.a;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table t(j json)
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
insert into t values(1,'{"a": [1,2,3]}'),(2,'{"a": [2,3,4]}')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from t facet j.a
--------------

+------+---------------+
| id   | j             |
+------+---------------+
|    1 | {"a":[1,2,3]} |
|    2 | {"a":[2,3,4]} |
+------+---------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

+------+----------+
| j.a  | count(*) |
+------+----------+
| 3    |        2 |
| 2    |        2 |
| 1    |        1 |
| 4    |        1 |
+------+----------+
4 rows in set (0.00 sec)
--- 4 out of 4 results in 0ms ---

mysql> SHOW VERSION;

+-----------+----------------------------------+
| Component | Version                          |
+-----------+----------------------------------+
| Daemon    | 13.6.7 1ab9ccc95@25080704        |
| Columnar  | columnar 4.1.1 25f4706@25022806  |
| Secondary | secondary 4.1.1 25f4706@25022806 |
| Buddy     | buddy v3.34.2+25080617-ff540bfc  |
+-----------+----------------------------------+
4 rows in set (0.02 sec)

I create table with PHP client for Manticore Search v 4.1.

then running this query where subjects is json field

select * from `articles_test` WHERE MATCH('@name ion') FACET subjects LIMIT 50;

the result

+------------------------------+----------+
| subjects                     | count(*) |
+------------------------------+----------+
| NULL                         |     2236 |
| [50]                         |       48 |
| [33,38]                      |       28 |
| [33,38]                      |       58 |
| [3,6,16,64]                  |       19 |
| [3,6,16,64]                  |        8 |
| [3,6,16,64]                  |       70 |
| [3,6,16,64]                  |       21 |
| [4]                          |      148 |
| [1,8,10,15,16]               |       41 |
| [1,8,10,15,16]               |       26 |
| [1,8,10,15,16]               |       24 |
| [1,8,10,15,16]               |       64 |
| [4,50,107]                   |       25 |
| [16,53,83,107]               |       25 |
| [16,53,83,107]               |       19 |
| [3,4,43,49,63]               |        9 |
| [3,4,43,49,63]               |       43 |
| [3,4,43,49,63]               |       14 |
| [4,7,10,45]                  |       10 |
| [4,7,10,45]                  |       14 |
| [4,13,43]                    |        5 |
| [83,100]                     |        3 |
| [59,94,95]                   |        9 |
| [59,94,95]                   |        4 |
| [59,94,95]                   |        2 |
| [18,38,49,50,63]             |       13 |
| [1,2,60,72]                  |        5 |
| [1,2,60,72]                  |        6 |
| [1,2,60,72]                  |       13 |
| [49,51,56,225]               |       14 |
| [49,51,56,225]               |        2 |
| [49,51,56,225]               |        1 |
| [37]                         |       35 |
| [1,5,6,15,16,49]             |        5 |
| [7,8,15,16,17,91]            |        1 |
| [7,8,15,16,17,91]            |        3 |
| [111,212]                    |        2 |
| [111,212]                    |        2 |
| [8,9,10]                     |        5 |
| [4,7,17,55]                  |        1 |
| [18,37,96,97,98,99,100]      |        2 |
| [18,37,96,97,98,99,100]      |        3 |
| [18,37,96,97,98,99,100]      |        1 |
| [18,37,96,97,98,99,100]      |        3 |
| [10,18,37,38,80]             |       10 |
| [8,10,20,37]                 |        8 |
| [20,27,37,50]                |        3 |
| [37,80,96,97,98,119,141,169] |        2 |
| [37,80,96,97,98,119,141,169] |        1 |
+------------------------------+----------+
50 rows in set (0.01 sec)
--- 50 out of 90 results in 1ms ---

and this result if subjects field type is changed to multi

+----------+----------+
| subjects | count(*) |
+----------+----------+
|       50 |       48 |
|       38 |       28 |
|       33 |       58 |
|        6 |       19 |
|       64 |        8 |
|        3 |       21 |
|       16 |       70 |
|        4 |      148 |
|       10 |       26 |
|       15 |       41 |
|        8 |       24 |
|        1 |       64 |
|      107 |       25 |
|       83 |       19 |
|       53 |       25 |
|       63 |        9 |
|       49 |       43 |
|       43 |       14 |
|       45 |       10 |
|        7 |       14 |
|       13 |        5 |
|      100 |        3 |
|       95 |        2 |
|       94 |        4 |
|       59 |        9 |
|       18 |       13 |
|       72 |        5 |
|       60 |        6 |
|        2 |       13 |
|       51 |       14 |
|       56 |        2 |
|      225 |        1 |
|       37 |       35 |
|        5 |        5 |
|       91 |        1 |
|       17 |        3 |
|      212 |        2 |
|      111 |        2 |
|        9 |        5 |
|       55 |        1 |
|       96 |        2 |
|       98 |        3 |
|       97 |        3 |
|       99 |        1 |
|       80 |       10 |
|       20 |        8 |
|       27 |        3 |
|      119 |        2 |
|      141 |        1 |
|      169 |        1 |
+----------+----------+
50 rows in set (0.01 sec)
--- 50 out of 89 results in 1ms ---

If you have an array right in the root of your json attributes, I’m afraid you have to use the groupby() function which is not available after FACET, but you can do group by:

mysql> drop table if exists t; create table t(j json); insert into t values(1,'[1,2,3]'),(2,'[2,3,4]'); select groupby(), count(*) from t group by j;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t(j json)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1,'[1,2,3]'),(2,'[2,3,4]')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select groupby(), count(*) from t group by j
--------------

+-----------+----------+
| groupby() | count(*) |
+-----------+----------+
| 3         |        2 |
| 2         |        2 |
| 1         |        1 |
| 4         |        1 |
+-----------+----------+
4 rows in set (0.00 sec)
--- 4 out of 4 results in 0ms ---

As I understand multi type is better than json type for arrays of ids. Also I need multiple facets in the query so grouping by just a single column is not enough and I see that gouping by multiple columns is not possible for both types.

mysql> select groupBy(), count(*) from `articles_test` group by subjects, countries limit 100;
ERROR 1064 (42000): table articles_test: MVA values can't be used in multiple group-by
mysql> select groupBy(), count(*) from `articles_test` group by subjects, countries limit 100;
ERROR 1064 (42000): table articles_test: JSON blob can't be used in multiple group-by

And one more question

I have a query like this
mysql> select * from articles_test WHERE MATCH('@name ion') AND any(subjects) IN (2,6,5) FACET subjects ORDER BY FACET() ASC LIMIT 10;
which returns facet results for subjects

+----------+----------+
| subjects | count(*) |
+----------+----------+
|        1 |       19 |
|        2 |       13 |
|        3 |       15 |
|        4 |        2 |
|        5 |        5 |
+----------+----------+
5 rows in set (0.01 sec)
--- 5 out of 25 results in 1ms ---

subject with id=6 is not included in the result. Is it possible to order facet results the way that filtered ids go first and the rest go after them? Basically I need to limit facet results and see 2,6,5 subjects in the result as well.

Like this

+----------+----------+
| subjects | count(*) |
+----------+----------+
|        1 |       19 |
|        2 |       13 |
|        3 |       15 |
|        6 |       19 |
|        5 |        5 |
+----------+----------+
5 rows in set (0.01 sec)
--- 5 out of 25 results in 1ms ---