建议manticoresearch表可以有分区

xyls82 · October 16, 2023, 3:31am

这样当表数据很大时，如果是分区表，查询表可以先根据分区字段过滤底层文件，再查询，这样对有些场景很实用

Sergey · October 17, 2023, 2:42am

In this way, when the table data is large, if it is a partitioned table, the query table can first filter the underlying files according to the partition field, and then query. This is very practical for some scenarios.

Manticore provides the functionality of secondary indexes which should be somewhat similar in terms of performance, but easier to use. If you have a specific case when the performance is poor and you believe partitioning would help, please provide more details.

xyls82 · October 18, 2023, 3:09am

manticore 配置

[root@12 ~]# cat /etc/manticoresearch/manticore.conf 
common {
    plugin_dir = /var/lib/manticore
}
indexer
{
  mem_limit = 2048M
  lemmatizer_cache = 4096M
}
searchd {
    listen = 0.0.0.0:9312
    listen = 0.0.0.0:9306:mysql
    listen = 0.0.0.0:9308:http
    listen = 10.2.1.2:9360-9370:replication
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /run/manticore/searchd.pid
    data_dir = /data01/manticore/indexdata
    binlog_path = /data01/manticore/binlogdata
    agent_query_timeout = 200000
    mysql_version_string = 5.0.37
}

服务器配置

 free -g
              total        used        free      shared  buff/cache   available
Mem:            251          17           0           0         232         231
Swap:             0           0           0

CPU(64核)
 [root@i205z22 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64

表结构

| xx_1_202309 | CREATE TABLE_xx_1_202309 (
id bigint,
a_f0 text,
a_f1 text,
a_if2 text,
a_f3text,
a_f4 text,
a_f5 text,
a_f6 text,
a_f7 text,
a_f8 text,
a_f9 text,
a_f10 text,
a_f11 text,
a_f12text,
a_f13text,
a_14 text,
a_f15 text,
a_f16 text,
a_f17 text,
a_f18 text,
a_f19 text,
a_f20 text,
a_f21 text,
a_f22 text,
a_f23 text,
a_f24 text,
a_f25 text,
a_f26 text,
a_f27 text,
a_f28 text,
a_f29 text,
a_f30 text,
a_f31 text,
a_f32 text,
a_f33 text,
b1 bigint,
b2 bigint,
b3 bigint,
b4 bigint,
b5 timestamp,
b6 timestamp,
b7 serv_lat float,
b8 float,
b9 float,
c1 string attribute,
c2 string attribute,
c3 string attribute,
c4  string attribute,
c5  string attribute,
c6 string attribute,
c7 string attribute,
c8 string attribute,
c9 string attribute
) ngram_len='1' ngram_chars='cjk' |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

表数据量

MySQL [(none)]> select count(*) from xx_202309;
+------------+
| count(*)   |
+------------+
| 3134061780 |
+------------+
1 row in set (0.02 sec)

使用时间字段统计一段时间范围数据量时间太长

MySQL [(none)]> 
MySQL [(none)]> select count(*) from xx_202309 where b5 BETWEEN 1693670400 AND 1694966400;
+------------+
| count(*)   |
+------------+
| 2015541463 |
+------------+
1 row in set (10 min 13.40 sec)

MySQL [(none)]> show meta;
+----------------+----------------------------------+
| Variable_name  | Value                            |
+----------------+----------------------------------+
| total          | 1                                |
| total_found    | 1                                |
| total_relation | eq                               |
| time           | 613.388                          |
| index          | b5:SecondaryIndex (100%) |
+----------------+----------------------------------+
5 rows in set (0.00 sec)

二级索引速度不行啊

Sergey · October 18, 2023, 11:53am

looks worrisome. Is this easily reproducible?

xyls82 · October 19, 2023, 12:37am

很容易重现，执行这种【select count(*) from xx_202309 where b5 BETWEEN A AND B; 】语句花费时间一直数分钟以上

xyls82 · October 19, 2023, 12:39am

在表数据量30多亿时，执行【select count(*) from xx_202309 where b5 BETWEEN A AND B; 】这花费时间都是数分钟以上

Sergey · October 19, 2023, 3:34pm

Can you reproduce it on a synthetic case by creating a table like create table t(b5 timestamp); adding data into it and running the select which takes a few seconds/minutes? Can you provide the instruction then (a script which does the job)?

xyls82 · October 20, 2023, 12:54am

表建仅一列吗，不符合使用场景啊，使用一般表都40~60列，或者更多列啊

xyls82 · October 20, 2023, 3:36am

还有manticore是先match检索表所有文件再根据时间过滤，实际应用场景先根据时间分区过滤，再match检索，我认为这样表数据量特别大时速度比前期manticore这是方式要快很多

xyls82 · October 20, 2023, 7:48am

还有对于很大数据量，manticore哪里有分表相关说明啊

Sergey · October 20, 2023, 7:52am

表建仅一列吗，不符合使用场景啊，使用一般表都40~60列，或者更多列啊

In your case it doesn’t, but if you reproduce the issue with just a single column, it will be easier for us to understand why it’s slow.