Docker shuts down with Exited (137): high RAM consumption with RT

Hi everyone,

I’m running Manticore in a Docker container, and when I load more than 700k records the container shuts down (Exited (137)) because it consumes the 8 GB of RAM I provided… These records are news articles, so the “Cuerpo” field contains texts of about 20k characters, and the two JSON fields each contain an array of objects with no more than 3 or 4 properties storing integer values.

My goal is to load 100M records and keep adding more day by day. I understand I’ll need more RAM for this, but how much should I estimate? Could anything in my table definition be improved?

I’m using the image from: docker pull manticoresearch/manticore

I’m using an RT table, created the following way from a C# application:
create table noticias
(
NoticiaId bigint,
Titulo text engine='columnar',
Cuerpo text indexed engine='columnar',
FechaAlta timestamp,
FechaPublicacion timestamp,
Audiencia int,
TierId int,
TipoMedioId int,
Ave float,
SoporteId int,
DivisionId int,
PaisId int,
Empresas json,
Temas json,
embed_vector float_vector knn_type='hnsw' knn_dims='4096' hnsw_similarity='cosine' engine='columnar'
)
morphology = 'libstemmer_es'
stopwords = 'es'
min_word_len = '4'
html_strip = '1'

engine='columnar' on a text field doesn’t make sense: a full-text (text) field can’t be stored in columnar storage.

Empresas json,
Temas json,

may be what consumes most of the RAM, and JSON can’t be stored in columnar storage. Try storing the data in scalar attributes or, if that’s not possible, give the container more RAM.

Hi again,

I made the following changes and saw great improvements; now I can load up to 2 million records:

  • Text fields now use engine='rowwise'
  • The rest of the fields are columnar
  • Changed the JSON fields to MVAs (arrays of their IDs)

create table noticias
(
    NoticiaId bigint,
    Titulo text engine='rowwise', 
    Cuerpo text indexed engine='rowwise', 
    FechaAlta timestamp,
    FechaPublicacion timestamp,
    Audiencia int,
    TierId int,
    TipoMedioId int,
    Ave float,
    SoporteId int,
    DivisionId int,
    PaisId int,
    Empresas multi,
    Temas multi,
    embed_vector float_vector knn_type='hnsw' knn_dims='4096' hnsw_similarity='cosine'
) 
morphology = 'libstemmer_es'
stopwords = 'es'
min_word_len = '4'
html_strip = '1'
engine='columnar'
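The JSON-to-MVA change above can be sketched like this in Python. The object property names ("id", "peso", "relevancia") are made up for illustration, since the original object shape wasn’t shown; only the integer IDs survive the flattening:

```python
# Flatten the old JSON array-of-objects fields into plain integer ID lists,
# which is what a Manticore MVA ("multi") attribute stores.
# Hypothetical input shape -- the real property names weren't shown:
doc = {
    "Empresas": [{"id": 12, "peso": 3}, {"id": 45, "peso": 1}],
    "Temas": [{"id": 7, "relevancia": 2}],
}

def to_mva(objects, key="id"):
    """Extract the integer IDs from a list of JSON objects."""
    return [obj[key] for obj in objects]

empresas = to_mva(doc["Empresas"])  # [12, 45]
temas = to_mva(doc["Temas"])        # [7]

# In an INSERT, MVA values are written as a parenthesized list:
sql = (
    "INSERT INTO noticias (NoticiaId, Empresas, Temas) "
    f"VALUES (1, ({','.join(map(str, empresas))}), ({','.join(map(str, temas))}))"
)
print(sql)
```

The other 3 or 4 properties of each object are dropped; if they are needed for filtering, they’d have to go into separate scalar attributes.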

Any other suggestions so I can improve performance of the table?

To run Docker, I use the following command:

docker run \
    -d \
    --name manticore \
    -v /opt/manticoredb/data:/var/lib/manticore \
    -p 9306:9306 \
    -p 9308:9308 \
    -p 9312:9312    \
    --ulimit nofile=65536:65536 \
    --cap-add=IPC_LOCK \
    --ulimit memlock=-1:-1 \
    -e EXTRA=1 \
    manticoresearch/manticore

Is this enough or should I add some extra configuration?

Thank you for your time and help

For what query and what’s the current performance?

I mean, better performance so that Docker doesn’t crash when I try to add more records.

Now I can add up to 2,100,000 records to the table and after that Docker shuts down.

The query response times are good, between 150 ms and 300 ms, for example for this query:

{
    "index": "noticias",
    "query": {
        "match": {
            "cuerpo": "exportaciones de argentina al exterior"
        }
    },
    "_source": {
        "includes": [
            "*"
        ],
        "excludes": [
            "embed_vector",
            "audiencia",
            "tierid",
            "fechaalta"
        ]
    },
    "limit": 20
}

My only problem is that Docker shuts down after reaching a certain number of records.

Please provide a sample document. I’ll try to reproduce this locally.

I’m adding a link to download a JSON with the example, because it exceeds the character limit for comments.

I’ve reproduced RAM consumption reaching 16GB:

root@61bf8e296e2e:/var/lib/manticore# mysql -P9306 -h0 -e "select count(*) from noticias"
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

root@61bf8e296e2e:/var/lib/manticore# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mantico+       1 14.3 12.5 24258544 16532316 ?   Ssl  Oct02 195:19 searchd -c /etc/manticoresearch/manticore.conf.sh --nodetach

Most of the RAM is used by the .spknn files:

root@61bf8e296e2e:/var/lib/manticore# ls -la /var/lib/manticore/noticias/*.spknn|awk '{sum+=$5;} END{print sum/1024/1024/1024;}'
15.3971

These files contain the HNSW index for the float_vector field. Unfortunately, a limitation of HNSW is that the index needs to reside in RAM for good performance, especially when the dimensionality is as high as 4096.

so that Docker doesn’t crash when I try to add more records

So the answer is:

  • either increase RAM available for the container
  • or drop the float_vector column
  • or lower the dimensions by using another model

Ok, I understand

Can you help me estimate how much RAM I need for 500,000,000 documents?

I estimate that in a year I will have 1,000,000,000 documents, so I also need to estimate how much RAM I’ll need at that point.

Doing some quick math, I get that I need 2 TB and 4 TB respectively; is that correct?

Based on what you previously said:

700k records the Docker shuts down (Exited (137)) because it consumes the 8 GB of RAM that I provided

For 100k records, you need about 1170 MB of RAM. So, for 1 billion records, you’d need approximately 11.16 TB.

In my test, I inserted a single sample document 1 million times, which used 16,532,316 KB of RAM. Based on that, storing 1 billion records would require roughly 15.4 TB.
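Both projections above are straight linear scaling, assuming RAM grows linearly with document count (which holds for the HNSW index, since each vector costs a fixed amount). The 15.4 TB figure from the measured RSS works out like this:

```python
# Linear projection of RAM needs from the measured 1M-document test.
measured_kb = 16_532_316          # RSS from ps aux for 1M documents
per_doc_kb = measured_kb / 1_000_000

for docs in (500_000_000, 1_000_000_000):
    tb = docs * per_doc_kb / 1024**3   # KB -> TiB
    print(f"{docs:,} docs -> ~{tb:.1f} TB")
# 500,000,000 docs -> ~7.7 TB
# 1,000,000,000 docs -> ~15.4 TB
```

So the original 2 TB / 4 TB estimate is roughly 4x too low; with 4096-dim vectors, the per-document cost is dominated by the 16 KB of float data each embedding carries.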