Manticore memory capping on Kubernetes

Dear all,

I am trying to get my Manticore cluster to work properly on Kubernetes, but I am facing memory usage issues that lead to the pod being systematically evicted or stuck in CrashLoopBackOff.

I have two nodes with 4 CPUs and 16 Gi of RAM each, each running one Manticore container. I have set up requests and limits for each of them at 80% of the node’s capacity, but my pods keep being OOMKilled (Out Of Memory) by Kubernetes.

Containers:
  manticore:
    Container ID:   docker://2c71c25298154b09ecb00****6c58d1103096d0fc42732aa316516ec82a9
    Image:          *****/manticore:7e740fe43b50
    Image ID:       docker-pullable://*****/manticore@sha256:c38b116***38cf383fa583c8a76705c9ac2dd417851649df608134ee2cf8a2
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 30 Dec 2020 13:06:58 +0100
      Finished:     Wed, 30 Dec 2020 13:11:18 +0100
    Ready:          False
    Restart Count:  8
    Limits:
      cpu:     3700m
      memory:  6000Mi
    Requests:
      cpu:        50m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /var/lib/manticore from manticore-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-csf8n (ro)

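As a rough cross-check of the numbers above: 80% of a 16Gi node would be about 13107Mi, while the describe output shows a 6000Mi limit and a 20Mi request. With a request that far below the limit, the pod falls into the Burstable QoS class rather than Guaranteed. A minimal sketch of the arithmetic, where `mem_share_mi` is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Compute a memory value (in Mi) as a percentage of node RAM given in Gi.
# Hypothetical helper for illustration; integer arithmetic, rounds down.
mem_share_mi() {
    node_gi=$1
    percent=$2
    echo $(( node_gi * 1024 * percent / 100 ))
}

# 80% of a 16Gi node, expressed in Mi
mem_share_mi 16 80
```

Setting the memory request equal to the limit is one way to make the scheduler reserve what the container is actually expected to use.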
I am using a RealTime index, and each time I run some “REPLACE INTO” queries, memory usage keeps increasing. I monitored this with the following command:

kubectl top pod manticore-6446***6c-6bcd8

NAME                        CPU(cores)   MEMORY(bytes)
manticore-64466f86c-6bcd8   1m           3429Mi

CPU usage stays low, but memory keeps increasing, and when it reaches my pod’s limit the pod gets killed, again and again. Is there a way to “cap” memory usage, or to flush automatically to disk before the pod gets destroyed?
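While looking for the cause, it can help to turn the `kubectl top` check into something scriptable. The sketch below parses the two-line output format shown above and reports when usage approaches a limit. The function names `top_mem_mi` and `near_limit` and the 90% threshold are assumptions made for illustration, and the sample text stands in for a live cluster:

```shell
#!/bin/sh
# Extract the memory column (in Mi) from `kubectl top pod` output on stdin.
# Assumes the two-line format shown above and a value with an "Mi" suffix.
top_mem_mi() {
    awk 'NR > 1 && $3 ~ /Mi$/ { sub(/Mi$/, "", $3); print $3 }'
}

# Return success when usage has reached the given percentage of the limit.
near_limit() {
    used_mi=$1; limit_mi=$2; percent=$3
    [ $(( used_mi * 100 )) -ge $(( limit_mi * percent )) ]
}

# Example using the sample output from this thread instead of a live cluster.
sample='NAME                        CPU(cores)   MEMORY(bytes)
manticore-64466f86c-6bcd8   1m           3429Mi'
used=$(printf '%s\n' "$sample" | top_mem_mi)
if near_limit "$used" 6000 90; then
    echo "warning: ${used}Mi is within 10% of the 6000Mi limit"
else
    echo "ok: ${used}Mi used"
fi
```

Against a real cluster, the output of `kubectl top pod <pod-name>` would be piped into `top_mem_mi` in place of the sample text.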

What’s your rt_mem_limit? https://mnt.cr/rt_mem_limit

It is not set, so it should default to 128M.

Here is the output of “kubectl top” over about 30 minutes of fairly intensive “REPLACE INTO” queries. I’ve upgraded the servers to 32 GB to see if it helps.

kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   1m           276Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   21m          7925Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   18m          8677Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   31m          11463Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   27m          13238Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   1m           16226Mi

After 30 minutes, memory usage peaked at around 16Gi and began to drop.

kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   2m           9892Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   5m           8683Mi

But it went up again and finally crashed...

kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   77m          11814Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   1m           20172Mi
kubectl top pod manticore-5467cb5c74-4j7nz
NAME                         CPU(cores)   MEMORY(bytes)
manticore-5467cb5c74-4j7nz   110m         26104Mi

Luckily I had attached a PersistentVolumeClaim, so it restarts from where it left off, but each restart causes downtime during which the index can’t be updated.

So it would probably be good to be able to cap global memory usage to avoid this kind of issue.

In case it helps: my index files on disk are only 243 MB, and Manticore Search is the only container running.

Here is my config file:

#!/bin/sh
ip=`hostname -i`
cat << EOF

searchd
{
        listen                  = 9306:mysql41
        listen                  = $ip:9312
        listen                  = $ip:9315-9325:replication
        log                     = /dev/stdout
        query_log               = /var/log/manticore/searchd.log
        pid_file                = /var/run/manticore/searchd.pid
        preopen_indexes         = 0
        binlog_path             = /var/lib/manticore
        data_dir                = /var/lib/manticore
        collation_server        = utf8_general_ci
        max_packet_size = 128M
        max_open_files = max
        rt_mem_limit = 8192M

}
EOF
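One thing worth checking in a config like this: rt_mem_limit bounds only the RAM chunk of each RT index, not the total memory of the searchd process, and the 8192M value above is larger than the 6000Mi container limit from the first post. A minimal sketch of that comparison follows. `size_to_mi` and `fits_in_limit` are hypothetical helpers, only the M and G suffixes are handled, and the 6000Mi figure is assumed to still apply:

```shell
#!/bin/sh
# Convert a Manticore size value such as 8192M or 1G to mebibytes.
# Hypothetical helper; handles only the M and G suffixes.
size_to_mi() {
    case $1 in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Succeeds when rt_mem_limit alone fits under the container memory limit (Mi).
fits_in_limit() {
    rt_mi=$(size_to_mi "$1") || return 2
    [ "$rt_mi" -lt "$2" ]
}

# Values taken from this thread: rt_mem_limit = 8192M, memory limit 6000Mi.
if fits_in_limit 8192M 6000; then
    echo "rt_mem_limit fits under the container limit"
else
    echo "rt_mem_limit is larger than the container limit"
fi
```

Even when the value fits, the process needs headroom beyond the RAM chunk, so this is only a lower bound on what the container limit has to allow.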

I think I found the issue:

preopen_indexes was in fact still set to 1, because my last config changes had not been synced.

Setting preopen_indexes to 0 seems to stop the server’s memory usage from growing continuously!
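Since the root cause here was a config that differed from what was expected, a quick way to avoid the same trap is to read the setting back from the config the server actually uses. The sketch below pulls one value out of config text on stdin. `conf_value` is a hypothetical helper that assumes one "name = value" pair per line, and the sample is a shortened copy of the config posted above:

```shell
#!/bin/sh
# Print the value of a setting from a Manticore config read on stdin.
# Hypothetical helper; assumes one "name = value" pair per line.
conf_value() {
    awk -v key="$1" -F '=' '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == key { val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val); print val }
    '
}

# Shortened copy of the config from this thread, used as sample input.
sample='searchd
{
        preopen_indexes         = 0
        rt_mem_limit = 8192M
}'
printf '%s\n' "$sample" | conf_value preopen_indexes
```

Inside the running container, the output of the config script would be piped into `conf_value` instead of the sample, for example through `kubectl exec`.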