Hi everyone,
I’m working with Manticore Search, using a table with approximately 8,100,000 rows. The table was dynamically created using the following SQL statement:
CREATE TABLE IF NOT EXISTS de_de_products (
    catalog_id multi64,
    product_ids multi64,
    category_ids multi64 ENGINE='columnar',
    category_path_as_string STRING ENGINE='columnar',
    category_names JSON ENGINE='columnar',
    utin STRING,
    utin_index TEXT,
    long_description TEXT,
    gtin TEXT,
    manufacturer_id INT,
    manufacturer JSON ENGINE='columnar',
    manufacturer_name JSON,
    manufacturer_name_index TEXT ENGINE='columnar',
    suppliers JSON ENGINE='columnar',
    suppliers_name JSON,
    suppliers_names_index TEXT ENGINE='columnar',
    prices JSON,
    price FLOAT,
    features JSON ENGINE='columnar',
    skus JSON,
    skus_index TEXT,
    mpns JSON,
    mpns_index TEXT,
    keywords TEXT,
    visibility JSON
) min_infix_len = '3' ENGINE='columnar';
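(To confirm which attributes actually ended up with columnar storage after the dynamic creation, the table layout can be inspected like this; as far as I understand, the Properties column of the output marks columnar attributes:)

DESC de_de_products;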
I am running the following aggregation query:
SELECT count(*) as count, suppliers, suppliers_name
FROM de_de_products
GROUP BY suppliers
LIMIT 50
OPTION max_matches=50;
This query takes about 3.47 seconds to complete, and four similar queries are executed in succession.
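To see where the time goes, the query can be profiled over the MySQL protocol like this (a minimal sketch; the exact stages reported by SHOW PROFILE vary between Manticore versions):

SET profiling = 1;

SELECT count(*) as count, suppliers, suppliers_name
FROM de_de_products
GROUP BY suppliers
LIMIT 50
OPTION max_matches=50;

SHOW PROFILE; -- per-stage timing breakdown of the last query
SHOW META;    -- total_found, total and elapsed time for the last query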
Server Details:

Hardware:
- 32 GB RAM
- 8 vCPUs

Configuration (searchd):
searchd {
    listen = 0.0.0.0:9312
    listen = 0.0.0.0:9306:mysql
    listen = 0.0.0.0:9308:http
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /var/run/manticore/searchd.pid
    data_dir = /var/lib/manticore
    qcache_max_bytes = 2G
    qcache_ttl_sec = 300
    qcache_thresh_msec = 50
    pseudo_sharding = 1
    docstore_cache_size = 12G
    secondary_indexes = 1
    max_open_files = 10000
    rt_merge_iops = 100
    access_plain_attrs = mmap_preread
    access_blob_attrs = mmap_preread
    access_doclists = mmap_preread
    access_hitlists = mmap_preread
    preopen_tables = 1
    seamless_rotate = 1
    read_buffer_docs = 4M
    read_buffer_hits = 4M
    binlog_flush = 2
    reset_network_timeout_on_packet = 1
}
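Since the query cache is enabled (qcache_max_bytes = 2G with qcache_thresh_msec = 50), one thing I've been wondering is whether the repeated aggregation queries ever hit it. Assuming the qcache_* counters are exposed via SHOW STATUS, this can be checked like so:

SHOW STATUS LIKE 'qcache%'; -- qcache_hits / qcache_cached_queries indicate whether the cache is used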
Insert Example:
Here is an example of a row in the table:
INSERT INTO de_de_products (
id, utin_index, long_description, gtin, manufacturer_name_index,
suppliers_names_index, skus_index, mpns_index, keywords, catalog_id,
product_ids, category_ids, category_path_as_string, category_names,
utin, manufacturer_id, manufacturer, manufacturer_name, suppliers,
suppliers_name, prices, price, features, skus, mpns, visibility
) VALUES (
1363764563357676418, '1732444278129573260', 'k1151.2637120, passscheibe din988...',
'', 'Kipp', 'Lentz', '1562497§2QS',
'221217', '', '189', '861118772617197436',
'2164405201189976107,4076521623242442016,5764495528351540639,9157085240420398765',
'9157085240420398765,5764495528351540639,2164405201189976107,4076521623242442016',
'["Interimsklasse (nicht spezifiziert)",...]', '1732444278129573260',
4130911417, '["6517244243062863033"]', '["Kipp"]',
'["5396176442324678359"]', '["Lentz GmbH"]',
'[{"price":0.380000,...}]', 0.000000, '[]', '["1562497§2QS"]',
'["221217"]', NULL
);
Question:
Are there any ways to optimize the performance of this aggregation query? Are there specific configuration changes, schema modifications, or hardware recommendations that could improve the execution time?
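Regarding schema modifications, one idea I'm considering (a sketch only, not something I've benchmarked; supplier_ids is a hypothetical new column, and this assumes the supplier IDs fit into a plain MVA) is to group by a numeric multi64 attribute instead of the suppliers JSON attribute:

-- hypothetical: add supplier IDs as a plain 64-bit MVA attribute
ALTER TABLE de_de_products ADD COLUMN supplier_ids multi64;

-- after backfilling supplier_ids, group by the numeric attribute
-- (grouping by an MVA produces one group per individual value)
SELECT count(*) AS cnt, supplier_ids
FROM de_de_products
GROUP BY supplier_ids
LIMIT 50
OPTION max_matches=50;

Would grouping by a plain attribute like this be expected to outperform grouping by a JSON attribute, or is the bottleneck elsewhere?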