Tool/API for Manticore for analyzing sample documents/queries and how they are indexed

Is there a tool/API for Manticore similar to ElasticSearch’s analyzer (see here)? I am looking for something that, given a sample document or query, shows how it was tokenized, stemmed, etc. This would help me run a comparative study between Manticore and ElasticSearch to explain some differences observed in retrieval benchmarks.

What I’ve tried: the indextool utility has a --dumpdict INDEXNAME option, but I ran into an error, and the documentation does not say exactly what the command returns.

Yes, there’s CALL KEYWORDS. You can make an empty index with your tokenization settings and then use CALL KEYWORDS to see how it would tokenize your text:

mysql> drop table if exists t; create table t(f text); call keywords('running slowly', 't');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t(f text)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
call keywords('running slowly', 't')
--------------

+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | running   | running    |
| 2    | slowly    | slowly     |
+------+-----------+------------+
2 rows in set (0.00 sec)

With morphology enabled (here lemmatize_en_all), the normalized column shows the lemmatized forms:

mysql> drop table if exists t; create table t(f text) morphology='lemmatize_en_all'; call keywords('running slowly', 't');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.03 sec)

--------------
create table t(f text) morphology='lemmatize_en_all'
--------------

Query OK, 0 rows affected (0.03 sec)

--------------
call keywords('running slowly', 't')
--------------

+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | running   | run        |
| 1    | running   | running    |
| 2    | slowly    | slowly     |
+------+-----------+------------+
3 rows in set (0.00 sec)
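
If you want to run this over many sample strings for the comparison, the same call can be scripted. Below is a minimal Python sketch, not an official client: it assumes Manticore is listening for MySQL-protocol connections on 127.0.0.1:9306, that the PyMySQL package is installed, and that the table `t` from the example above already exists. The helper names `group_by_position` and `fetch_keywords` are made up for illustration.

```python
def group_by_position(rows):
    """Collapse CALL KEYWORDS rows into one entry per query position.

    Each row is assumed to be (qpos, tokenized, normalized), matching the
    column order shown in the output above.
    """
    grouped = {}
    for qpos, tokenized, normalized in rows:
        entry = grouped.setdefault(qpos, {"tokenized": tokenized, "normalized": []})
        entry["normalized"].append(normalized)
    return grouped


def fetch_keywords(text, table, host="127.0.0.1", port=9306):
    """Run CALL KEYWORDS over the MySQL protocol.

    Assumption: host and port point at a running Manticore instance.
    """
    import pymysql  # imported lazily so group_by_position works without it

    conn = pymysql.connect(host=host, port=port)
    try:
        with conn.cursor() as cur:
            # PyMySQL escapes and substitutes the parameters client-side.
            cur.execute("CALL KEYWORDS(%s, %s)", (text, table))
            return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    rows = fetch_keywords("running slowly", "t")
    for qpos, entry in group_by_position(rows).items():
        print(qpos, entry["tokenized"], "->", ", ".join(entry["normalized"]))
```

Grouping by qpos keeps the multiple normalized forms of one token together, which should make it easier to line the result up against the token list from ElasticSearch's analyzer.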

More info is in the Manticore Search Manual: Searching > Autocomplete
