Nothing is being stored in RAM - CPU usage way UP - Queries slow

We are testing a migration from Sphinx to Manticore for a mid-sized production application that indexes about 20GB worth of documents from a MySQL source and holds them in RAM (on Sphinx). The migration to Manticore was fairly seamless; however, there is a considerable performance gap between Sphinx and Manticore, and our application's search functionality is not usable with Manticore. While reviewing the test systems we are running on, I noticed the memory profiles are vastly different: Sphinx fires up and uses 20GB of memory, while Manticore is using less than 2GB. This has led me down quite a rabbit hole trying to understand why Sphinx loads its indices into RAM and Manticore does not.

It feels like I am missing something fundamental here, and after three days of scouring the docs and reading these forums I have decided to post. I'll provide a sample configuration below that is generated for us and works in both Sphinx and Manticore:

source event_core_0
{
  type = mysql
  sql_host = <>
  sql_user = <>
  sql_pass = <>
  sql_db = <>
  sql_query_pre = SET TIME_ZONE = '+0:00'
  sql_query_pre = UPDATE `events` SET `delta` = 0 WHERE `delta` = 1
  sql_query_pre = SET NAMES utf8
  sql_query = SELECT SQL_NO_CACHE `events`.`id` * 33 + 4 AS `id`, `events`.`name` AS `name`, GROUP_CONCAT(DISTINCT CONCAT_WS(' ', `events`.`name`, `events`.`post_as`, `events`.`id`, `events`.`imported_id`, `contacts`.`first_name`, `contacts`.`last_name`, `email_addresses`.`address`, `accounts`.`name`, `documents`.`parsed_id`, `documents`.`custom_title`, `documents`.`sequential_id_with_prefix`) SEPARATOR ' ') AS `extended_name`, `events`.`id` AS `sphinx_internal_id`, 'Event' AS `sphinx_internal_class`, 0 AS `sphinx_deleted`, UNIX_TIMESTAMP(`events`.`created_at`) AS `created_at`, UNIX_TIMESTAMP(`events`.`event_start`) AS `event_start`, UNIX_TIMESTAMP(`events`.`deleted_at`) AS `deleted_at`, `events`.`location_id` AS `location_id` FROM `events` LEFT OUTER JOIN `contacts` ON `contacts`.`id` = `events`.`contact_id` LEFT OUTER JOIN `email_addresses` ON `email_addresses`.`email_addressable_id` = `contacts`.`id` AND `email_addresses`.`email_addressable_type` = 'Contact' LEFT OUTER JOIN `accounts` ON `accounts`.`id` = `events`.`account_id` LEFT OUTER JOIN `documents` ON `documents`.`documentable_id` = `events`.`id` AND `documents`.`documentable_type` = 'Event' WHERE (`events`.`id` BETWEEN $start AND $end) GROUP BY `events`.`id`, `events`.`name`, `events`.`id`, `events`.`created_at`, `events`.`event_start`, `events`.`deleted_at`, `events`.`location_id` ORDER BY NULL
  sql_query_range = SELECT IFNULL(MIN(`events`.`id`), 1), IFNULL(MAX(`events`.`id`), 1) FROM `events`
  sql_attr_uint = sphinx_internal_id
  sql_attr_uint = sphinx_deleted
  sql_attr_uint = created_at
  sql_attr_uint = event_start
  sql_attr_uint = deleted_at
  sql_attr_uint = location_id
  sql_attr_string = sphinx_internal_class
  sql_field_string = name
  sql_field_string = extended_name
}

index event_core
{
  type = plain
  path = /path/to/event_core
  morphology = stem_en
  dict = keywords
  min_prefix_len = 3
  expand_keywords = 1
  index_exact_words = 1
  source = event_core_0
}

EDIT: I nearly forgot to ask the question. Can I expect Manticore to load indices into RAM? Currently, live searches are taking orders of magnitude more time than against the Sphinx box, and they are bogging down the CPUs. Is there a configuration change that can be made to use the system's memory?

Thank you

You could set the per-index access_XXX options, or the global searchd.access_XXX options, to mlock, as described in the Performance related section of the docs:

searchd
{
	access_plain_attrs = mlock
	access_blob_attrs = mlock
	access_doclists = mlock
	access_hitlists = mlock
}

This forces the data to be loaded and locked into memory.
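
The same options can also be set per index rather than globally; a minimal sketch against the event_core definition above (same placeholder path):

index event_core
{
  type = plain
  path = /path/to/event_core
  source = event_core_0
  # per-index overrides take precedence over the searchd-level defaults
  access_plain_attrs = mlock
  access_blob_attrs = mlock
  access_doclists = mlock
  access_hitlists = mlock
}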

Could you also provide the output of ls -lh for your index?

Thank you for the reply.

There have been a couple of hang-ups using mlock. Firstly, I needed to change the user under which Manticore was started, as mlock needed root privileges. After this was discovered, I decided to re-index. This is the index for the example above:

-rw-------  1 user       user         12 May 24 15:14 event_core.ram
-rw-r--r--  1 root       root 1263405120 May 24 20:33 event_core.spa
-rw-r--r--  1 root       root 4428171916 May 24 20:33 event_core.spb
-rw-r--r--  1 user       user 3724188308 May 24 20:39 event_core.spd
-rw-r--r--  1 user       user 1934781898 May 24 20:33 event_core.spds
-rw-r--r--  1 user       user   83218756 May 24 20:39 event_core.spe
-rw-r--r--  1 user       user       1343 May 24 20:39 event_core.sph
-rw-r--r--  1 user       user      90646 May 24 20:33 event_core.sphi
-rw-r--r--  1 user       user  940611463 May 24 20:39 event_core.spi
-rw-r--r--  1 user       user    3887400 May 24 20:33 event_core.spm
-rw-r--r--  1 user       user 2167373858 May 24 20:39 event_core.spp
-rw-r--r--  1 user       user  162786575 May 24 20:33 event_core.spt

I first started the service as the root user with the settings you suggested and it fired up, but the memory profile did not change. I reindexed as the root user, started it again, and the memory profile remained the same.
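
A side note in case it helps anyone else: the root requirement appears to come from the locked-memory limit (RLIMIT_MEMLOCK), which mlock is subject to for non-root users, so raising that limit for the service user should avoid running searchd as root. An untested sketch for a systemd-managed install (the unit name manticore is an assumption):

# raise the locked-memory limit for the service instead of running it as root
sudo systemctl edit manticore
# add the following to the drop-in file:
#   [Service]
#   LimitMEMLOCK=infinity
sudo systemctl restart manticore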

Please share the searchd log.

[Thu May 25 01:39:26.761 2023] [25513] watchdog: main process 25515 forked ok
[Thu May 25 01:39:26.762 2023] [25515] starting daemon version '6.0.4 1a3a4ea82@230314' ...
[Thu May 25 01:39:26.762 2023] [25515] listening on 10.8.1.228:9312 for mysql
[Thu May 25 01:39:26.896 2023] [25517] WARNING: secondary library not loaded; secondary index(es) disabled
[Thu May 25 01:39:29.351 2023] [25519] last message repeated 23 times
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.001; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.002
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.002; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.002
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.002; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.003
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.003; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.004
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.004; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.005
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.005; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.005
[Thu May 25 01:39:29.351 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.351 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.005; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.351 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.006
[Thu May 25 01:39:29.352 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.352 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.006; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.352 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.007
[Thu May 25 01:39:29.352 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.352 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.007; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.352 2023] [25519] binlog: replaying log /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.008
[Thu May 25 01:39:29.352 2023] [25519] binlog: replay stats: 0 commits; 0 updates, 0 reconfigure; 0 pq-add; 0 pq-delete; 0 pq-add-delete, 0 tables
[Thu May 25 01:39:29.352 2023] [25519] binlog: finished replaying /var/www/user/releases/20230522151115/tmp/binlog/staging/binlog.008; 0.0 MB in 0.000 sec
[Thu May 25 01:39:29.352 2023] [25519] binlog: finished replaying total 14 in 0.000 sec
[Thu May 25 01:39:29.352 2023] [25520] prereading 24 tables
[Thu May 25 01:39:29.352 2023] [25515] WARNING: [BUDDY] no SPHINX or HTTP listeners found, disabled
[Thu May 25 01:39:29.352 2023] [25515] accepting connections
[Thu May 25 01:39:32.435 2023] [25520] preread 24 tables in 3.083 sec

This is running Sidekiq to support a Ruby on Rails application.

Nothing suspicious in the log. I would:

  • look at searchd’s console output too, to make sure there’s no warning that it can’t mlock
  • check with pmap or in /proc/<pid>/smaps whether the files are actually mlocked or not (see the sketch below)
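
For the second check, something along these lines works (the pid file path is a placeholder):

# print the Locked field of every mapping belonging to the index (path is an example)
awk '/event_core/ {name=$NF} /^Locked:/ && name {print name, $2, $3; name=""}' /proc/$(cat /path/to/searchd.pid)/smaps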

If you have access_plain_attrs = mlock, there’s no warning anywhere on searchd start, and it’s NOT locked according to /proc/, then feel free to create an issue on GitHub about it with instructions on how to reproduce it.

E.g. in /proc/<searchd pid from pid_file>/smaps you can see something like:

7f680ce28000-7f680ffdf000 rw-s 00000000 09:03 220546779                  /home/snikolaev/gl3495/blogs_week_6.spa
Size:              50908 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:               50908 kB
Pss:               50908 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:     50908 kB
Private_Dirty:         0 kB
Referenced:        50908 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:            50908 kB
THPeligible:    0

Locked: 50908 kB, matching the Rss of the mapping, means it was mlocked.
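
To see how much the process has locked in total, the same field can be summed over all mappings:

# total locked memory of the searchd process, in kB
awk '/^Locked:/ {sum += $2} END {print sum " kB locked"}' /proc/<searchd pid from pid_file>/smaps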