建议manticoresearch表可以有分区

这样当表数据很大时,如果是分区表,查询表可以先根据分区字段过滤底层文件,再查询,这样对有些场景很实用

In this way, when the table data is large, if it is a partitioned table, the query table can first filter the underlying files according to the partition field, and then query. This is very practical for some scenarios.

Manticore provides the functionality of secondary indexes which should be somewhat similar in terms of performance, but easier to use. If you have a specific case when the performance is poor and you believe partitioning would help, please provide more details.

manticore 配置

[root@12 ~]# cat /etc/manticoresearch/manticore.conf 
common {
    plugin_dir = /var/lib/manticore
}
indexer
{
  mem_limit = 2048M
  lemmatizer_cache = 4096M
}
searchd {
    listen = 0.0.0.0:9312
    listen = 0.0.0.0:9306:mysql
    listen = 0.0.0.0:9308:http
    listen = 10.2.1.2:9360-9370:replication
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /run/manticore/searchd.pid
    data_dir = /data01/manticore/indexdata
    binlog_path = /data01/manticore/binlogdata
    agent_query_timeout = 200000
    mysql_version_string = 5.0.37
}

服务器配置

 free -g
              total        used        free      shared  buff/cache   available
Mem:            251          17           0           0         232         231
Swap:             0           0           0

CPU(64核)
 [root@i205z22 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64



表结构

| xx_1_202309 | CREATE TABLE_xx_1_202309 (
id bigint,
a_f0 text,
a_f1 text,
a_if2 text,
a_f3text,
a_f4 text,
a_f5 text,
a_f6 text,
a_f7 text,
a_f8 text,
a_f9 text,
a_f10 text,
a_f11 text,
a_f12text,
a_f13text,
a_14 text,
a_f15 text,
a_f16 text,
a_f17 text,
a_f18 text,
a_f19 text,
a_f20 text,
a_f21 text,
a_f22 text,
a_f23 text,
a_f24 text,
a_f25 text,
a_f26 text,
a_f27 text,
a_f28 text,
a_f29 text,
a_f30 text,
a_f31 text,
a_f32 text,
a_f33 text,
b1 bigint,
b2 bigint,
b3 bigint,
b4 bigint,
b5 timestamp,
b6 timestamp,
b7 serv_lat float,
b8 float,
b9 float,
c1 string attribute,
c2 string attribute,
c3 string attribute,
c4  string attribute,
c5  string attribute,
c6 string attribute,
c7 string attribute,
c8 string attribute,
c9 string attribute
) ngram_len='1' ngram_chars='cjk' |
+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

表数据量

MySQL [(none)]> select count(*) from xx_202309;
+------------+
| count(*)   |
+------------+
| 3134061780 |
+------------+
1 row in set (0.02 sec)

使用时间字段统计一段时间范围数据量时间太长

MySQL [(none)]> 
MySQL [(none)]> select count(*) from xx_202309 where b5 BETWEEN 1693670400 AND 1694966400;
+------------+
| count(*)   |
+------------+
| 2015541463 |
+------------+
1 row in set (10 min 13.40 sec)

MySQL [(none)]> show meta;
+----------------+----------------------------------+
| Variable_name  | Value                            |
+----------------+----------------------------------+
| total          | 1                                |
| total_found    | 1                                |
| total_relation | eq                               |
| time           | 613.388                          |
| index          | b5:SecondaryIndex (100%) |
+----------------+----------------------------------+
5 rows in set (0.00 sec)

二级索引速度不行啊

looks worrisome. Is this easily reproducible?

很容易重现,执行这种【select count(*) from xx_202309 where b5 BETWEEN A AND B; 】 语句花费时间一直数分钟以上

在表数据量30多亿时,执行【select count(*) from xx_202309 where b5 BETWEEN A AND B; 】这花费时间都是数分钟以上

Can you reproduce it on a synthetic case by creating a table like create table t(b5 timestamp); adding data into it and running the select which takes a few seconds/minutes? Can you provide the instruction then (a script which does the job)?

表建仅一列吗,不符合使用场景啊,使用一般表都40~60列,或者更多列啊

还有manticore是 先match检索表所有文件再根据时间过滤,实际应用场景先根据时间分区过滤,再match检索,我认为这样表数据量特别大时速度比前期manticore这是方式要快很多

还有对于很大数据量,manticore哪里有分表相关说明啊

表建仅一列吗,不符合使用场景啊,使用一般表都40~60列,或者更多列啊

In your case it doesn’t, but if you reproduce the issue with just a single column, it will be easier for us to understand why it’s slow.