Performance goes down when increasing CPU cores and memory

rev: 6.3.9 285d98390@24112610 (latest)
mode: realtime
engine: columnar
pseudo_sharding = 1

For testing, I inserted about 450k rows of text into a table on a 4-core / 4 GB VPS, then ran a full-text search benchmark against it and got a decent result of about 1700 queries/second.

However, when I ported exactly the same data to an 8-core / 16 GB VPS, search performance went down to 1200 queries/second.

I also tested vector search performance on the same 450k rows; the smaller 4-core / 4 GB VPS again ran faster than the 8-core / 16 GB one.

So what might be the problem? Did I do anything wrong?

when I ported exactly the same data

Did you copy the files physically, or did you reinsert the data in the other environment?

backup and restore data

using manticore-backup or mysqldump?

manticore-backup

When you run your load tests, are the CPUs fully loaded?

The CPU has no other processes running; it is a dedicated Manticore server, and all resources are reserved for Manticore only.

I was running the query test with 8 clients, each holding a persistent SQL-protocol connection to the server and querying continuously. The Manticore server's CPU was loaded at around 30%.

If your CPU is only 30% utilized, does it make sense to measure throughput? Typically when you have a dedicated server for Manticore, throughput is measured when the CPU is fully utilized (100%), unless it is limited by I/O, which doesn’t seem to be the case here. Or perhaps the bottleneck is in your script which produces the load and for some reason it can’t send more queries per second.

If the script is ok, to fully load your CPU, try increasing the value of searchd.threads in your configuration. Also, ensure that you aren’t setting threads=N in your queries.
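
For illustration, a minimal sketch of the config change (the listen line and the value 16 are assumptions, just to show overriding the default, which equals the number of CPU cores):

searchd {
    listen = 9306:mysql
    # raise the worker thread count if the CPU stays underutilized under load
    threads = 16
}

And double-check that the queries themselves don’t include something like OPTION threads=1, which would cap per-query parallelism.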

Of course, I paid attention to CPU utilization. There was no I/O cost when I checked iostat and the CPU stats; CPU time was simply higher in the more-cores setup.

After a few more tests, the performance of the columnar engine still puzzles me. A few results seem quite strange:

1. For a table with, say, 500k records in 8 disk chunks, simply adding CPU cores or RAM without increasing the number of disk chunks does not improve query throughput; it can even get worse with more cores on the same 8 disk chunks. Only after I changed the disk chunk count to 16 did I see a slight improvement.

2. There is a trick (or maybe a bug) I found that can greatly improve query performance (see the statement sequence after this list):
On a 4-core VPS the default disk chunk count is 8. Run ALTER TABLE xxx optimize_cutoff='16', wait for the optimization to finish, then run ALTER TABLE xxx optimize_cutoff='8' to change it back. Now query performance doubles or more, yet the server's CPU load is even lower.
Before this trick, the CPU peaked at 24-30% during the test.
After this trick, the CPU peaked at 14-22%, but throughput doubled.

3. I got better performance in a virtual machine guest (Proxmox VE, Debian 11) than when installing Manticore Server directly on non-virtualized Debian 11 on the same hardware, which physically has more CPU cores and RAM.
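
To be concrete, the statement sequence for item 2 looks roughly like this (xxx is a placeholder table name; SHOW TABLE ... STATUS and OPTIMIZE TABLE are just the standard ways to check the chunk count and force the merge, while in my own tests I simply waited for auto-optimize):

SHOW TABLE xxx STATUS;                /* note the current disk_chunks value, 8 here */
ALTER TABLE xxx optimize_cutoff='16';
OPTIMIZE TABLE xxx;                   /* or wait for auto-optimize to finish */
ALTER TABLE xxx optimize_cutoff='8';  /* change the setting back */
SHOW TABLE xxx STATUS;                /* verify the chunk count again */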

I’d like to reproduce your issues locally. Can you share your backup and your load script? Here’s how you can do it: Manticore Search Manual: Reporting bugs

Yes, I’d like to. However, the script requires some additional work and a special text embedding model, so it may not be that straightforward to set up.

1. Data source:

2. FastEmbed:

3. Hugging Face embedding model:

4. PHP extension to import big JSON data:

And I think the easiest way to reproduce it would be for me to share my screen with you, maybe in an online call? I am in China and can’t use Discord; how about using Microsoft Teams to show this issue?

It will be easier if you give me 2 things:

  • your backup
  • your query log

I can then just restore from your backup and replay your query log with various concurrencies.
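
For reference, restoring on my side would look roughly like this (the backup directory and backup name are placeholders):

manticore-backup --backup-dir=/path/to/backups --restore
manticore-backup --backup-dir=/path/to/backups --restore=backup-20241202120000

The first command lists the available backups, the second restores a specific one.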

Fine, how can I send you these files?
The compressed backup file is 4 GB, a little big for sharing. Let me find a place to upload it, then I will send you a PM.

However, when I looked into the query logs, those logs strip some key info from the original query. My queries contain 768-dimensional vectors for vector search, but in the log file I can only find something like:

/* Sat Nov 30 13:02:10.750 2024 conn 1016 (192.168.1.24:55666) real 0.004 wall 0.005 found 11032 */ SELECT id, intro, knn_dist() FROM users LIMIT 10;

So, how can I save a complete KNN query log?
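
For context, the original queries are of roughly this shape (the vector column name vec is a guess, and only 4 of the 768 float values are shown):

SELECT id, intro, knn_dist() FROM users WHERE knn(vec, 10, (0.0123, -0.0456, 0.0789, 0.0012)) LIMIT 10;

but the whole WHERE knn(...) part, including the vector, is missing from the log entry above.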

Fine, how can I send you these files?

https://manual.manticoresearch.com/Reporting_bugs#Uploading-your-data

those logs strip some key info from the original query

This is a bug. I’ve created this issue - KNN query is not logged properly · Issue #2804 · manticoresoftware/manticoresearch · GitHub
While we are fixing it, can you try reproducing your issue on a single query and then share it?

OK, let me figure it out.

Regarding the KNN log bug: I suggest adding a config parameter to disable logging of the KNN float values, because the high-dimensional vectors are quite big and logging them adds obvious overhead

adds obvious overhead

Good point. We’ll discuss it, but with thousands of queries per second, there are also other types of queries that could result in writing tens of megabytes per second to the query log.

Regarding the original issue, please share your backup and a sample query. I’ll test it on a 4-core and an 8-core VPS to try and reproduce the problem.

Yes, I am working on that. My query SQL contains 3000 different news titles embedded with FastEmbed, in 3000 different queries. If I just provide you with a single query for the load test, the final result may not reflect the real situation.

I wish I could give you queries reflecting my actual conditions; I hope to get it done soon.


The table backup file and query log have been uploaded to
manticore/write-only/issue-cpu-load-bug
according to the uploading manual,

but I cannot see the uploaded file, access denied.

but I cannot see the uploaded file, access denied.

That’s fine, it’s a write-only S3 storage. I can see this:

root@dev2 /mnt/s3.manticoresearch.com/issue-cpu-load-bug # ls -la
total 3971925
drwxr-x--- 1 root root          0 Dec 31  1969 .
drwxrwxrwx 1 root root          0 Jan  1  1970 ..
-rw-r----- 1 root root 4066902144 Dec  2 16:16 backup.zip
-rw-r----- 1 root root       1157 Dec  2 15:40 readme.txt
-rw-r----- 1 root root     345096 Dec  2 15:40 test_query.log

Thanks. We’ll look into it.