I could reproduce the issue. It seems to be related to the number of disk chunks. Here is a simplified case:
mysql> create table t(f text);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t values(0,'abc');
Query OK, 1 row affected (0.00 sec)
mysql> flush ramchunk t;
Query OK, 0 rows affected (0.01 sec)
mysql> call keywords('abc', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | abc | abc | 1 | 1 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
mysql> call keywords('abc abc', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 2 | abc | abc | 2 | 2 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
mysql> insert into t values(0,'abc');
Query OK, 1 row affected (0.00 sec)
mysql> flush ramchunk t;
Query OK, 0 rows affected (0.01 sec)
mysql> call keywords('abc', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | abc | abc | 2 | 2 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
mysql> call keywords('abc abc', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 2 | abc | abc | 4 | 4 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
I.e. the stats depend on how many times the keyword is repeated in the query, as well as on the number of disk chunks.
The effect also persists after merging into a single chunk:
mysql> select * from t.status;
+------+----------+--------------------------------+-------------------+---------------+-----------+------------+-------------+--------------------+----------------------+-----------------------------+----------------------+-----------------------------+------------------+
| id | chunk_id | base_name | indexed_documents | indexed_bytes | ram_bytes | disk_bytes | disk_mapped | disk_mapped_cached | disk_mapped_doclists | disk_mapped_cached_doclists | disk_mapped_hitlists | disk_mapped_cached_hitlists | killed_documents |
+------+----------+--------------------------------+-------------------+---------------+-----------+------------+-------------+--------------------+----------------------+-----------------------------+----------------------+-----------------------------+------------------+
| 2 | 1 | /usr/local/var/manticore/t/t.1 | 1 | 6 | 8296 | 541 | 85 | 4096 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | /usr/local/var/manticore/t/t.0 | 1 | 3 | 8296 | 541 | 85 | 4096 | 0 | 0 | 0 | 0 | 0 |
+------+----------+--------------------------------+-------------------+---------------+-----------+------------+-------------+--------------------+----------------------+-----------------------------+----------------------+-----------------------------+------------------+
2 rows in set (0.00 sec)
mysql> optimize index t option cutoff=1, sync=1;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from t.status;
+------+----------+--------------------------------+-------------------+---------------+-----------+------------+-------------+--------------------+----------------------+-----------------------------+----------------------+-----------------------------+------------------+
| id | chunk_id | base_name | indexed_documents | indexed_bytes | ram_bytes | disk_bytes | disk_mapped | disk_mapped_cached | disk_mapped_doclists | disk_mapped_cached_doclists | disk_mapped_hitlists | disk_mapped_cached_hitlists | killed_documents |
+------+----------+--------------------------------+-------------------+---------------+-----------+------------+-------------+--------------------+----------------------+-----------------------------+----------------------+-----------------------------+------------------+
| 1 | 2 | /usr/local/var/manticore/t/t.2 | 2 | 9 | 20584 | 579 | 93 | 16384 | 0 | 0 | 0 | 0 | 0 |
+------+----------+--------------------------------+-------------------+---------------+-----------+------------+-------------+--------------------+----------------------+-----------------------------+----------------------+-----------------------------+------------------+
1 row in set (0.00 sec)
mysql> call keywords('abc', 't', 1 as stats); call keywords('abc abc', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | abc | abc | 2 | 2 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 2 | abc | abc | 4 | 4 |
+------+-----------+------------+------+------+
1 row in set (0.00 sec)
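To make the pattern in the outputs above easier to see, here is a small Python sketch that checks one possible reading of the numbers: the reported docs/hits equal the real count multiplied by the number of times the keyword is repeated in the CALL KEYWORDS query. This is only a hypothesis that happens to fit this session, not a description of how the engine computes the stats. The names `OBSERVATIONS`, `predicted_count` and `hypothesis_fits` are made up for illustration, and the "real" counts assume each inserted document contains 'abc' exactly once, as in the test case.

```python
# Each tuple is copied from the session above:
# (docs really containing 'abc', times 'abc' is repeated in the query,
#  docs reported by CALL KEYWORDS, hits reported by CALL KEYWORDS)
OBSERVATIONS = [
    (1, 1, 1, 1),  # one disk chunk, query 'abc'
    (1, 2, 2, 2),  # one disk chunk, query 'abc abc'
    (2, 1, 2, 2),  # two disk chunks, query 'abc'
    (2, 2, 4, 4),  # two disk chunks, query 'abc abc'
    (2, 1, 2, 2),  # after OPTIMIZE, query 'abc'
    (2, 2, 4, 4),  # after OPTIMIZE, query 'abc abc'
]


def predicted_count(real_count, repeats):
    """Hypothesis only: the reported stat is the real stat times the keyword repeats."""
    return real_count * repeats


def hypothesis_fits(observations):
    """Return True if every observed docs/hits pair matches the hypothesis."""
    return all(
        predicted_count(real, repeats) == docs
        and predicted_count(real, repeats) == hits
        for real, repeats, docs, hits in observations
    )


print(hypothesis_fits(OBSERVATIONS))
```

If the hypothesis is right, a query with the keyword repeated three times against the same two documents would report 6 instead of 2, which would be a quick extra check when verifying a fix.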
Stats in SHOW META are correct both before and after the OPTIMIZE:
mysql> select * from t where match('abc'); show meta;
+---------------------+------+
| id | f |
+---------------------+------+
| 1514356453580734545 | abc |
| 1514356453580734546 | abc |
+---------------------+------+
2 rows in set (0.00 sec)
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 2 |
| total_found | 2 |
| time | 0.000 |
| keyword[0] | abc |
| docs[0] | 2 |
| hits[0] | 2 |
+---------------+-------+
6 rows in set (0.00 sec)
I've created an issue about it: "CALL KEYWORDS vs RT index gives wrong stats" (issue #593 in manticoresoftware/manticoresearch on GitHub).
Thanks for pointing this out, @bileslaw!