Improving Aggregation Performance in Manticore Search

Hi everyone,

I’m working with Manticore Search, using a table with approximately 8,100,000 rows. The table was dynamically created using the following SQL statement:

CREATE TABLE IF NOT EXISTS de_de_products (
    catalog_id multi64,
    product_ids multi64,
    category_ids multi64 ENGINE='columnar',
    category_path_as_string STRING ENGINE='columnar',
    category_names JSON ENGINE='columnar',
    utin STRING,
    utin_index TEXT,
    long_description TEXT,
    gtin TEXT,
    manufacturer_id INT,
    manufacturer JSON ENGINE='columnar',
    manufacturer_name JSON,
    manufacturer_name_index TEXT ENGINE='columnar',
    suppliers JSON ENGINE='columnar',
    suppliers_name JSON,
    suppliers_names_index TEXT ENGINE='columnar',
    prices JSON,
    price FLOAT,
    features JSON ENGINE='columnar',
    skus JSON,
    skus_index TEXT,
    mpns JSON,
    mpns_index TEXT,
    keywords TEXT,
    visibility JSON
) min_infix_len = '3' ENGINE='columnar';

I am running the following aggregation query:

SELECT count(*) as count, suppliers, suppliers_name  
FROM de_de_products 
GROUP BY suppliers 
LIMIT 50 
OPTION max_matches=50;

This query takes about 3.47 seconds to complete.

  • Four similar queries like this are executed in succession
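
To see where those 3.47 seconds go, the query can be profiled per stage (a minimal sketch using Manticore's SET profiling / SHOW PROFILE and SHOW META; the exact stages and fields reported depend on the Manticore version):

SET profiling=1;
SELECT count(*) AS count, suppliers, suppliers_name
FROM de_de_products
GROUP BY suppliers
LIMIT 50
OPTION max_matches=50;
SHOW PROFILE; -- per-stage timing breakdown
SHOW META;    -- total time and total_found (number of groups)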

Server Details:

  • Hardware:
    • 32 GB RAM
    • 8 vCPUs
  • Configuration (searchd):
searchd {
    listen = 0.0.0.0:9312
    listen = 0.0.0.0:9306:mysql
    listen = 0.0.0.0:9308:http
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /var/run/manticore/searchd.pid
    data_dir = /var/lib/manticore

    qcache_max_bytes = 2G
    qcache_ttl_sec = 300
    qcache_thresh_msec = 50

    pseudo_sharding = 1
    docstore_cache_size = 12G
    secondary_indexes = 1
    max_open_files = 10000
    rt_merge_iops = 100
    access_plain_attrs = mmap_preread
    access_blob_attrs = mmap_preread
    access_doclists = mmap_preread
    access_hitlists = mmap_preread
    preopen_tables = 1
    seamless_rotate = 1
    read_buffer_docs = 4M
    read_buffer_hits = 4M
    binlog_flush = 2
    reset_network_timeout_on_packet = 1
}
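
Assuming the table is an RT table (it was created with CREATE TABLE under data_dir), the number of accumulated disk chunks can also affect group-by speed. A quick way to check and compact it, as a sketch (the exact status fields vary by Manticore version, and OPTIMIZE merges chunks in the background):

SHOW TABLE de_de_products STATUS; -- check disk_chunks and related fields
OPTIMIZE TABLE de_de_products;    -- merge disk chunks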

Insert Example:

Here is an example of a row in the table:

INSERT INTO de_de_products (
    id, utin_index, long_description, gtin, manufacturer_name_index,
    suppliers_names_index, skus_index, mpns_index, keywords, catalog_id, 
    product_ids, category_ids, category_path_as_string, category_names, 
    utin, manufacturer_id, manufacturer, manufacturer_name, suppliers, 
    suppliers_name, prices, price, features, skus, mpns, visibility
) VALUES (
    1363764563357676418, '1732444278129573260', 'k1151.2637120, passscheibe din988...', 
    '', 'Kipp', 'Lentz', '1562497§2QS', 
    '221217', '', '189', '861118772617197436', 
    '2164405201189976107,4076521623242442016,5764495528351540639,9157085240420398765', 
    '9157085240420398765,5764495528351540639,2164405201189976107,4076521623242442016', 
    '["Interimsklasse (nicht spezifiziert)",...]', '1732444278129573260', 
    4130911417, '["6517244243062863033"]', '["Kipp"]', 
    '["5396176442324678359"]', '["Lentz GmbH"]', 
    '[{"price":0.380000,...}]', 0.000000, '[]', '["1562497§2QS"]', 
    '["221217"]', NULL
);

Question:

The aggregation query takes about 3.47 seconds to execute. Are there any ways to optimize this query? In particular, are there configuration changes, schema modifications, or hardware upgrades that could reduce the execution time?
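
One possible schema modification, shown purely as a sketch: grouping by a plain multi-value attribute is typically cheaper than grouping by a JSON field, because the latter has to extract the field from the JSON blob for every row. The supplier_ids column below is hypothetical, whether ALTER TABLE can add an MVA column depends on the Manticore version, and grouping by an MVA produces one group per value, which may differ from grouping by the whole JSON array:

ALTER TABLE de_de_products ADD COLUMN supplier_ids multi64;

SELECT count(*) AS count, supplier_ids, suppliers_name
FROM de_de_products
GROUP BY supplier_ids
LIMIT 50
OPTION max_matches=50;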

Hello. I can’t reproduce the issue using the manticore-load tool:

Here I load 8.1M docs:

➜  manticore-load git:(main) ./manticore-load --drop --init="CREATE TABLE de_de_products ( catalog_id multi64, product_ids multi64, category_ids multi64 ENGINE='columnar', category_path_as_string STRING ENGINE='columnar', category_names JSON ENGINE='columnar', utin STRING, utin_index TEXT, long_description TEXT, gtin TEXT, manufacturer_id INT, manufacturer JSON ENGINE='columnar', manufacturer_name JSON, manufacturer_name_index TEXT ENGINE='columnar', suppliers JSON ENGINE='columnar', suppliers_name JSON, suppliers_names_index TEXT ENGINE='columnar', prices JSON, price FLOAT, features JSON ENGINE='columnar', skus JSON, skus_index TEXT, mpns JSON, mpns_index TEXT, keywords TEXT, visibility JSON ) min_infix_len = '3' ENGINE='columnar'" --load="INSERT INTO de_de_products ( id, utin_index, long_description, gtin, manufacturer_name_index, suppliers_names_index, skus_index, mpns_index, keywords, catalog_id, product_ids, category_ids, category_path_as_string, category_names, utin, manufacturer_id, manufacturer, manufacturer_name, suppliers, suppliers_name, prices, price, features, skus, mpns, visibility ) VALUES ( <increment>, '<int/10000/5396176442324678359>', '<text/5/20>', '', '<text/1/1>', '<text/1/1>', '<string/3/5>', '<int/1/1000000>', '', (<array/1/10/10000/5396176442324678359>), (<array/1/10/10000/5396176442324678359>), (<array/2/10/100/1732444278129573260>), '<array/2/10/100/1732444278129573260>', '[\"<string/1/3>\",\"<string/1/3>\",\"<string/1/3>\"]', '<int/1/1732444278129573260>', <int/100000/4130911417>, '[\"<int/10000/5396176442324678359>\"]', '[\"<text/1/1>\"]', '[\"<int/10000/5396176442324678359>\"]', '[\"<text/1/5>\"]', '[{\"price\":<float/0/10>,\"smth\": <int/1/100>}]', <float/0/1>, '[]', '[\"<string/3/5>\"]', '[\"<int/1/100000>\"]', 'NULL' )" --total=8100000 --batch-size=10000
Total time:       20:36
Total queries:    810
Threads:          1
Batch size:       10000
Total docs:       8100000
Docs per sec avg: 6551
QPS avg:          1
QPS 1p:           0
QPS 5p:           0
QPS 95p:          1
QPS 99p:          1
Latency avg:      1526.6 ms
Latency 50p:      1250.0 ms
Latency 95p:      3350.0 ms
Latency 99p:      6750.0 ms

We have 8.1M docs:

mysql> select count(*) from de_de_products;
+----------+
| count(*) |
+----------+
|  8100000 |
+----------+
1 row in set (0.00 sec)
--- 1 out of >=1 results in 1ms ---

Your select query takes 0.13 sec on my 2021 Mac M1:

mysql> SELECT count(*) as count, suppliers, suppliers_name   FROM de_de_products  GROUP BY suppliers  LIMIT 50  OPTION max_matches=50; show meta;
+-------+-------------------------+------------------------------------------------------+
| count | suppliers               | suppliers_name                                       |
+-------+-------------------------+------------------------------------------------------+
|     1 | ["2428905477448567614"] | ["Then on."]                                         |
|     1 | ["971020496481922728"]  | ["Lead."]                                            |
|     1 | ["4622839562198879783"] | ["Happy legally dead unfortunately wild."]           |
|     1 | ["3644504723885142687"] | ["Left rich."]                                       |
|     1 | ["3723953048948366249"] | ["Wise! above would! ask? wrong."]                   |
|     1 | ["5175383418649123643"] | ["Nightly dead fortunately at true."]                |
|     1 | ["1572185128640905277"] | ["Privately; quietly beautiful luckily."]            |
|     1 | ["5120473566729286789"] | ["Then made tomorrow."]                              |
|     1 | ["2230739602295289079"] | ["Illegally."]                                       |
|     1 | ["394194246739087335"]  | ["Call send daily. Wet rough."]                      |
|     1 | ["405366180189362037"]  | ["5 incorrectly."]                                   |
|     1 | ["4180156883340385428"] | ["Negatively."]                                      |
|     1 | ["3137821731108961528"] | ["Feel."]                                            |
|     1 | ["1289743706831663277"] | ["Spend. Grew drove. Spiritually."]                  |
|     1 | ["1777353990801323287"] | ["Short rarely."]                                    |
|     1 | ["4103994460892812681"] | ["Constantly."]                                      |
|     1 | ["4617502210890529977"] | ["Hard constantly nowhere emotionally possibly."]    |
|     1 | ["409910344023574924"]  | ["Fell; evening."]                                   |
|     1 | ["3209983956429334609"] | ["Sent fresh."]                                      |
|     1 | ["4608034587747630189"] | ["Directly."]                                        |
|     1 | ["3766322725282614922"] | ["Educationally."]                                   |
|     1 | ["1963446004699138045"] | ["Emotionally build I."]                             |
|     1 | ["5333635331076660756"] | ["Angry is alive."]                                  |
|     1 | ["834609691434315019"]  | ["But call! fake."]                                  |
|     1 | ["5066544297780907100"] | ["Universally he by artistically."]                  |
|     1 | ["4773319360642298684"] | ["Have, surely fall."]                               |
|     1 | ["3381808637046510930"] | ["This."]                                            |
|     1 | ["5122048419433393287"] | ["Good rich."]                                       |
|     1 | ["3840294575206091401"] | ["Could I met physically."]                          |
|     1 | ["4310916571834305863"] | ["Complexly stand left! thin heavy."]                |
|     1 | ["415667635745301498"]  | ["Heavy."]                                           |
|     1 | ["906344884850113509"]  | ["Shallow."]                                         |
|     1 | ["1487008207870449887"] | ["Today full buy brave."]                            |
|     1 | ["2982327987446078206"] | ["Technically."]                                     |
|     1 | ["2023878162418152214"] | ["Negatively physically, take gave call."]           |
|     1 | ["4997963855180149794"] | ["Began surprisingly 20."]                           |
|     1 | ["1994660488467787522"] | ["Short? it indirectly. Ran."]                       |
|     1 | ["953221909772035939"]  | ["Economically perhaps calm."]                       |
|     1 | ["2741357837206877881"] | ["Locally soft."]                                    |
|     1 | ["4360332700701170269"] | ["50? sell short gradually."]                        |
|     1 | ["3666221307694791100"] | ["Go."]                                              |
|     1 | ["2599343603411391109"] | ["Individually tomorrow put north; today."]          |
|     1 | ["3286226312399426043"] | ["Kind."]                                            |
|     1 | ["5214975759906074802"] | ["Surprisingly, technically yesterday left softly."] |
|     1 | ["3280686534521398283"] | ["Happy hardly."]                                    |
|     1 | ["819369454301822842"]  | ["Dull 20 brought cold to."]                         |
|     1 | ["3767196749106140913"] | ["Buy perhaps shallow cruel; suddenly."]             |
|     1 | ["2620120707903854819"] | ["Proud low hard."]                                  |
|     1 | ["2262850687493580244"] | ["Drove wrote. Need right."]                         |
|     1 | ["949012978573414343"]  | ["Called run."]                                      |
+-------+-------------------------+------------------------------------------------------+
50 rows in set (0.13 sec)
--- 50 out of 890450 results in 137ms ---

+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| total          | 50     |
| total_found    | 890450 |
| total_relation | eq     |
| time           | 0.137  |
+----------------+--------+
4 rows in set (0.01 sec)

Perhaps my dataset doesn’t reflect yours (e.g. there may need to be more elements in the arrays). Can you adjust this example so that it starts reproducing the issue?
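
One thing that would help calibrate the synthetic data is the number of distinct suppliers groups: the generated table has 890450 of them (the total_found above). The same figure can be obtained from the real table with a sketch like this:

SELECT count(*) AS count, suppliers
FROM de_de_products
GROUP BY suppliers
LIMIT 1
OPTION max_matches=1; SHOW META; -- total_found = number of distinct suppliers groups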