rt_mem_limit and search speed

Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)

Hi, Im trying to speed up some search queries. Now some of them take 25 seconds on a real-time index with 117 million documents.

I did read this Manticore Search Manual: Creating a table > Local tables > Plain and real-time table settings

Im thinking to increase rm_mem_limit to 24GB, so the entire data can fit into RAM. 3 questions about that:

  1. Will that help with the full-text query speed? The linked doc says “The RAM chunk may be slightly slower than a disk chunk”
  2. This real time index gets many INSERT, UPDATE, and DELETE. If i raise rm_mem_limit, will it cause slowness when manticore is writing disk chunks?
  3. Do i change rm_mem_limit value in manticore.conf or by running ALTER TABLE?

Thank you

Hi, Im trying to speed up some search queries

Which one? Please show the query, the output and SHOW META after it. What else is important is the number of CPU cores on the server.

so the entire data can fit into RAM

You can just mlock everything instead - Manticore Search Manual: Creating a table > Local tables > Plain and real-time table settings

Will that help with the full-text query speed? The linked doc says “The RAM chunk may be slightly slower than a disk chunk”

It’s unlikely it will help. We recently got this issue which may be related to yours - Slow FT search in a plain table / single-chunk RT with high-frequency words with or w/o PS · Issue #1767 · manticoresoftware/manticoresearch · GitHub

This real time index gets many INSERT, UPDATE, and DELETE. If i raise rm_mem_limit, will it cause slowness when manticore is writing disk chunks?

It may cause slowness before flushing the ram chunk to disk since the larger the RAM chunk is the longer it takes to add a new document to it. I would say 24GB is too large RAM chunk. Flushing to disk will also become slower of course.

Do i change rm_mem_limit value in manticore.conf or by running ALTER TABLE?

Are you running Manticore in RT mode or Plain mode? Manticore Search Manual: Read this first

rt mode: alter table
plain mode: change in the config and then execute alter table reconfigure

Thank you.
It’s an RT index, Manticore is in RT mode (indexes created with CREATE TABLE).
The server has 4 CPU cores and 22mb RAM (both can be increased if needed).


This is fast

MySQL [(none)]> SELECT id FROM la8_items_all WHERE MATCH('dog') ORDER BY weight() desc LIMIT 78;

78 rows in set (0.021 sec)

MySQL [(none)]> show meta;
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| total          | 78     |
| total_found    | 230310 |
| total_relation | eq     |
| time           | 0.033  |
| keyword[0]     | dog    |
| docs[0]        | 230534 |
| hits[0]        | 356275 |
+----------------+--------+

This is slow

MySQL [(none)]> SELECT id FROM la8_items_all WHERE MATCH('dog') AND (section != 3) AND (pushout_time < 1706261914) AND (sold_hammer_usd > 0) ORDER BY weight() desc LIMIT 78;

78 rows in set (9.769 sec)

MySQL [(none)]> show meta;
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| total          | 78     |
| total_found    | 165333 |
| total_relation | eq     |
| time           | 9.768  |
| keyword[0]     | dog    |
| docs[0]        | 230520 |
| hits[0]        | 356256 |
+----------------+--------+

So it seems like filtering the attributes makes it slow. Any advice for that?

We must filter on the attributes for most queries. Perhaps the columnar engine can help here?

When you repeat the same query, is it still slow?

Same query second time is fast.

MySQL [(none)]> SELECT id FROM la8_items_all WHERE MATCH('dog') AND (section != 3) AND (pushout_time < 1706261914) AND (sold_hammer_usd > 0) ORDER BY weight() desc LIMIT 78;
78 rows in set (19.519 sec)

MySQL [(none)]> show meta;
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| total          | 78     |
| total_found    | 165333 |
| total_relation | eq     |
| time           | 19.483 |
| keyword[0]     | dog    |
| docs[0]        | 230534 |
| hits[0]        | 356275 |
+----------------+--------+

MySQL [(none)]> SELECT id FROM la8_items_all WHERE MATCH('dog') AND (section != 3) AND (pushout_time < 1706261914) AND (sold_hammer_usd > 0) ORDER BY weight() desc LIMIT 78;
78 rows in set (0.055 sec)

MySQL [(none)]> show meta;
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| total          | 78     |
| total_found    | 165333 |
| total_relation | eq     |
| time           | 0.052  |
| keyword[0]     | dog    |
| docs[0]        | 230534 |
| hits[0]        | 356275 |
+----------------+--------+


cache is off

MySQL [(none)]> SHOW STATUS LIKE '%cache%';
+-----------------------+-------+
| Counter               | Value |
+-----------------------+-------+
| qcache_max_bytes      | 0     |
| qcache_thresh_msec    | 3000  |
| qcache_ttl_sec        | 1     |
| qcache_cached_queries | 0     |
| qcache_used_bytes     | 0     |
| qcache_hits           | 0     |
+-----------------------+-------+

Then it may have be caused by slow I/O. I would experiment with access_plain_attrs=mlock , access_blob_attrs=mlock (Manticore Search Manual: Creating a table > Local tables > Plain and real-time table settings)

Thank you. I did read that and checked disk I/O, it’s kind of high. I will play with those settings and post the experiments here. Could you tell me if my first idea makes sense:

I raise the RAM high to make sure everything fits. Then…

preopen=1 # (i only have 2 tables)
access_plain_attrs=mlock
access_blob_attrss=mlock
access_doclistss=mlock
access_hitlistss=mlock

# (i keep row-wise storage)

All these go into the .conf file and needs a searchd restart to apply. Correct?

All these go into the .conf file and needs a searchd restart to apply. Correct?

Yes. Or per-table like create table ... access_plain_attrs='mlock'

experiment with access_plain_attrs=mlock , access_blob_attrs=mlock

Yepp, nice. Much faster since 1 day already! Thank you!

Since we are getting somewhere, do you have any tips for this real life query (RT index, 66million small docs):

SELECT id FROM `la8_items_all` 
WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) 
ORDER BY weight() desc 
LIMIT 78 
OPTION boolean_simplify=1, max_matches=4500, ranker=sph04, cutoff=300000 
FACET country_code limit 250 
FACET house_match_str limit 4500 
FACET struct_category limit 100;

78 rows in set (2.752 sec)

MySQL [(none)]> show meta;
+----------------+---------+
| Variable_name  | Value   |
+----------------+---------+
| total          | 78      |
| total_found    | 300000  |
| total_relation | eq      |
| time           | 3.435   |
| multiplier     | 4       |
| keyword[0]     | black   |
| docs[0]        | 2670074 |
| hits[0]        | 4189288 |
+----------------+---------+

Please show output of:

show table la8_items_all status;
show variables like '%se%';
MySQL [(none)]> show table la8_items_all status;
+-----------------------------+------------------------------------------------------------------------------------------------------------+
| Variable_name               | Value                                                                                                      |
+-----------------------------+------------------------------------------------------------------------------------------------------------+
| index_type                  | rt                                                                                                         |
| indexed_documents           | 66963996                                                                                                   |
| indexed_bytes               | 23632137926                                                                                                |
| ram_bytes                   | 25035098931                                                                                                |
| disk_bytes                  | 31416308927                                                                                                |
| disk_mapped                 | 25007919833                                                                                                |
| disk_mapped_cached          | 25008082944                                                                                                |
| disk_mapped_doclists        | 10470030848                                                                                                |
| disk_mapped_cached_doclists | 10470051840                                                                                                |
| disk_mapped_hitlists        | 2183885527                                                                                                 |
| disk_mapped_cached_hitlists | 2183909376                                                                                                 |
| killed_documents            | 804461                                                                                                     |
| killed_rate                 | 1.18%                                                                                                      |
| ram_chunk                   | 26962955                                                                                                   |
| ram_chunk_segments_count    | 26                                                                                                         |
| disk_chunks                 | 12                                                                                                         |
| mem_limit                   | 134217728                                                                                                  |
| mem_limit_rate              | 95.00%                                                                                                     |
| ram_bytes_retired           | 0                                                                                                          |
| tid                         | 8631951                                                                                                    |
| tid_saved                   | 8569289                                                                                                    |
| query_time_1min             | {"queries":5, "avg_sec":0.001, "min_sec":0.001, "max_sec":0.001, "pct95_sec":0.001, "pct99_sec":0.001}     |
| query_time_5min             | {"queries":40, "avg_sec":0.005, "min_sec":0.000, "max_sec":0.026, "pct95_sec":0.009, "pct99_sec":0.026}    |
| query_time_15min            | {"queries":195, "avg_sec":0.004, "min_sec":0.000, "max_sec":0.090, "pct95_sec":0.003, "pct99_sec":0.026}   |
| query_time_total            | {"queries":33425, "avg_sec":0.011, "min_sec":0.000, "max_sec":5.631, "pct95_sec":0.008, "pct99_sec":0.137} |
| found_rows_1min             | {"queries":5, "avg":26682, "min":576, "max":105938, "pct95":105938, "pct99":105938}                        |
| found_rows_5min             | {"queries":40, "avg":7164, "min":2, "max":105938, "pct95":72160, "pct99":105938}                           |
| found_rows_15min            | {"queries":195, "avg":3157, "min":0, "max":105938, "pct95":12489, "pct99":76792}                           |
| found_rows_total            | {"queries":33425, "avg":52807, "min":0, "max":65860799, "pct95":8587, "pct99":59007}                       |
+-----------------------------+------------------------------------------------------------------------------------------------------------+
MySQL [(none)]> show variables like '%se%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| session_read_only        | 0     |
| character_set_client     | utf8  |
| character_set_connection | utf8  |
| last_insert_id           |       |
| pseudo_sharding          | 1     |
| secondary_indexes        | 1     |
+--------------------------+-------+

Try simplifying the query bit by bit, e.g.:

SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78 OPTION boolean_simplify=1, max_matches=4500, ranker=sph04, cutoff=300000 FACET country_code limit 250 FACET house_match_str limit 4500 FACET struct_category limit 100;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78 OPTION boolean_simplify=1, max_matches=4500, ranker=sph04, cutoff=300000 FACET country_code limit 250 FACET house_match_str limit 4500;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78 OPTION boolean_simplify=1, max_matches=4500, ranker=sph04, cutoff=300000 FACET country_code limit 250;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78 OPTION boolean_simplify=1, max_matches=4500, ranker=sph04, cutoff=300000;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78 OPTION boolean_simplify=1, max_matches=4500, ranker=sph04;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78 OPTION boolean_simplify=1, max_matches=4500;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78 OPTION boolean_simplify=1;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc LIMIT 78;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812) ORDER BY weight() desc;
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3) AND (pushout_time < 1706686812);
SELECT id FROM `la8_items_all` WHERE MATCH('black') AND (section != 3);
SELECT id FROM `la8_items_all` WHERE MATCH('black');

and running each query a few times to get a stable average result. This way you can find out which part contributes into the high response time most.