Which are the best practices in Manticore for optimizing search queries?

kieranas · September 17, 2024, 11:22am

Hi everyone,

I am working on a project that involves handling large datasets, and I have started using Manticore Search to manage the search functionality. It's been great, but as my dataset grows I am noticing some performance slowdowns with more complex queries.

I want to know what the best practices are for optimizing search queries in Manticore, especially when working with millions of records. Are there any indexing techniques or query structures that you have found to significantly improve performance? How does Manticore handle complex filtering conditions like multiple WHERE clauses or nested searches? Should I be using full text search features more, or is there another approach I am missing?

I also found these resources while researching this: https://forum.manticoresearch.com/t/manticore-client-manticoresearch-java servicenow-tutorial. If anyone has any resources, tutorials or personal experiences, please share them with me. It would be greatly appreciated!

Thank you…

barryhunter · September 17, 2024, 1:25pm

Lots of different things can be done, depending on your data and/or queries.

A big one is typically to use the ‘full text query’ as much as possible for filtering (i.e. inside MATCH())
… as the full text query uses a real ‘inverted index’.
… by default, normal ‘attribute’ filter queries are unindexed (analogous to a ‘full table scan’)

So a query like

SELECT * FROM table WHERE MATCH('keyword') AND category = 'one'

First finds all documents matching keyword (reasonably efficiently), but then has to check every one of those documents to see if the attribute matches.
In theory a query like

SELECT * FROM table WHERE MATCH('keyword @category one')
is better (this needs category to be indexed as a full-text field as well).
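To make the difference between the two query shapes concrete, here is a minimal Python sketch that builds both of them from the same inputs. The helper names and the table name `docs` are made up for illustration, and it assumes `category` exists both as an attribute and as a full-text field. It also only accepts plain alphanumeric tokens, because anything else would need escaping of the full-text query operators.

```python
import re

# Assumption: every value is a single plain token. Anything containing
# quotes, spaces or full-text operators is rejected rather than escaped.
_TOKEN = re.compile(r"^[A-Za-z0-9_]+$")


def _check(token: str) -> str:
    """Return the token unchanged, or raise if it is not a plain token."""
    if not _TOKEN.match(token):
        raise ValueError(f"unsupported token: {token!r}")
    return token


def query_with_attribute_filter(table: str, keyword: str, category: str) -> str:
    """Full-text match, then an attribute check on each matched document."""
    return (
        f"SELECT * FROM {_check(table)} "
        f"WHERE MATCH('{_check(keyword)}') AND category = '{_check(category)}'"
    )


def query_with_fulltext_filter(table: str, keyword: str, category: str) -> str:
    """Same filter, expressed inside MATCH() so the inverted index handles it."""
    return (
        f"SELECT * FROM {_check(table)} "
        f"WHERE MATCH('{_check(keyword)} @category {_check(category)}')"
    )


if __name__ == "__main__":
    print(query_with_attribute_filter("docs", "keyword", "one"))
    print(query_with_fulltext_filter("docs", "keyword", "one"))
```

Running both variants against your own data and comparing timings is the only reliable way to see which one wins for a given category.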

But there are edge cases where that doesn’t work well (if the category is big, so that it has many matches, it will have to look through that big list).

… this can be especially good for highly selective queries though. That also includes geo queries (i.e. if using the GEODIST function); there are often ways they can be optimized with the full text query, using a 2D tile structure.
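One possible reading of the 2D tile idea, sketched in Python: cut the map into a fixed grid, store the name of each document's grid cell as a keyword in a full-text field, and search for the cells that cover the area of interest. The grid size, the `t<row>x<col>` token format and the field name `tiles` are all assumptions made for this sketch, not something Manticore prescribes. An exact GEODIST check would still run afterwards, just on far fewer documents.

```python
import math


def tile_token(lat: float, lon: float, size_deg: float = 1.0) -> str:
    """Keyword naming the grid cell that contains (lat, lon)."""
    row = math.floor((lat + 90.0) / size_deg)
    col = math.floor((lon + 180.0) / size_deg)
    return f"t{row}x{col}"


def tiles_for_box(min_lat: float, min_lon: float,
                  max_lat: float, max_lon: float,
                  size_deg: float = 1.0) -> list[str]:
    """Keywords for every grid cell touched by a bounding box.

    Boxes that cross the antimeridian are not handled in this sketch.
    """
    first_row = math.floor((min_lat + 90.0) / size_deg)
    last_row = math.floor((max_lat + 90.0) / size_deg)
    first_col = math.floor((min_lon + 180.0) / size_deg)
    last_col = math.floor((max_lon + 180.0) / size_deg)
    return [
        f"t{row}x{col}"
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def tile_match_expression(tokens: list[str], field: str = "tiles") -> str:
    """Full-text expression matching any of the tiles (hypothetical field)."""
    return f"@{field} (" + " | ".join(tokens) + ")"


if __name__ == "__main__":
    box = tiles_for_box(51.2, -0.6, 51.8, 0.4)
    print(tile_match_expression(box))
```

At index time each document would get `tile_token(lat, lon)` written into the `tiles` field. At query time the expression goes inside MATCH() next to any other keywords.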

Another idea is sharding, so that you can run queries on just part of the index, rather than the whole lot.
… for example, if you regularly run queries that only look at data from the last week, you can have an index JUST of documents from the last week, and then run queries directly against that ‘shard’, rather than the entire index.
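The routing decision for the last-week example can live in the application. Below is a minimal Python sketch, assuming two hypothetical tables: `docs_recent` holding only the last seven days and `docs_all` holding everything. How the recent table is kept up to date is out of scope here.

```python
from datetime import date, timedelta


def table_for_query(start: date, today: date,
                    recent_days: int = 7,
                    recent_table: str = "docs_recent",
                    full_table: str = "docs_all") -> str:
    """Pick the smallest table that still covers the requested date range.

    `start` is the oldest document date the query needs to see.
    Table names are placeholders for this sketch.
    """
    cutoff = today - timedelta(days=recent_days)
    return recent_table if start >= cutoff else full_table


if __name__ == "__main__":
    today = date(2024, 9, 17)
    print(table_for_query(date(2024, 9, 15), today))  # fits in the recent shard
    print(table_for_query(date(2024, 8, 1), today))   # needs the full table
```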

There is also alternate storage for attributes
https://manual.manticoresearch.com/Creating_a_table/Data_types#Columnar-attribute-properties
which can provide indexing for attributes, which can accelerate some queries. Honestly, I don’t have much experience with that, but it is something to also consider.