I'm running Manticore in a Docker container, and when I load more than 700k records the container shuts down (Exited (137)) because it consumes the 8 GB of RAM I provided… These records are news items, so the "Cuerpo" field contains texts of around 20k characters, and the two JSON fields contain arrays of objects with no more than 3 or 4 properties, each storing integer values.
My goal is to load 100M records and keep adding more day by day. I understand I'll need more RAM for this, but how much should I estimate? Could anything in the way I create my table be improved?
The version I'm using comes from: docker pull manticoresearch/manticore
I'm using an RT table, which I created as follows from a C# application:
create table noticias
(
NoticiaId bigint,
Titulo text engine='columnar',
Cuerpo text indexed engine='columnar',
FechaAlta timestamp,
FechaPublicacion timestamp,
Audiencia int,
TierId int,
TipoMedioId int,
Ave float,
SoporteId int,
DivisionId int,
PaisId int,
Empresas json,
Temas json,
embed_vector float_vector knn_type='hnsw' knn_dims='4096' hnsw_similarity='cosine' engine='columnar'
)
morphology = 'libstemmer_es'
stopwords = 'es'
min_word_len = '4'
html_strip = '1'
Titulo text engine='columnar' and Cuerpo text indexed engine='columnar' don't make sense: a full-text (text) field can't be columnar.
Empresas json and Temas json may be what consumes most of the RAM, and JSON can't be stored in the columnar storage. Try storing that data in scalar attributes or, if that's not possible, give the container more RAM.
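For illustration only (the attribute names and the way the objects are flattened are assumptions about your data, since I only know each object carries 3 or 4 integer properties), the table could look like this, with the JSON fields replaced by multi-value integer attributes and engine='columnar' dropped from the text fields:

create table noticias
(
NoticiaId bigint,
Titulo text,
Cuerpo text indexed,
FechaAlta timestamp,
FechaPublicacion timestamp,
Audiencia int,
TierId int,
TipoMedioId int,
Ave float,
SoporteId int,
DivisionId int,
PaisId int,
EmpresaIds multi,
TemaIds multi,
embed_vector float_vector knn_type='hnsw' knn_dims='4096' hnsw_similarity='cosine' engine='columnar'
)
morphology = 'libstemmer_es'
stopwords = 'es'
min_word_len = '4'
html_strip = '1'

Here EmpresaIds and TemaIds are hypothetical multi-value attributes holding one integer id per related object; any additional per-object integers would need their own columns or a separate table, so whether this works depends on how you query Empresas and Temas today. Unlike JSON, plain scalar attributes can also be stored with engine='columnar' per column if you want to reduce resident RAM further.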
root@61bf8e296e2e:/var/lib/manticore# mysql -P9306 -h0 -e "select count(*) from noticias"
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
root@61bf8e296e2e:/var/lib/manticore# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mantico+ 1 14.3 12.5 24258544 16532316 ? Ssl Oct02 195:19 searchd -c /etc/manticoresearch/manticore.conf.sh --nodetach
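As an aside, you can cross-check where the memory goes from inside Manticore itself; depending on the version, the output of SHOW TABLE ... STATUS includes fields such as ram_bytes and disk_bytes for an RT table:
root@61bf8e296e2e:/var/lib/manticore# mysql -P9306 -h0 -e "show table noticias status"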
Most of the RAM is used by the .spknn files:
root@61bf8e296e2e:/var/lib/manticore# ls -la /var/lib/manticore/noticias/*.spknn|awk '{sum+=$5;} END{print sum/1024/1024/1024;}'
15.3971
These files contain the HNSW index for the float_vector field. Unfortunately, a limitation of HNSW is that the index needs to reside in RAM for good performance, especially when the dimensionality is as high as 4096.
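As a rough sanity check, the raw vectors alone account for almost all of that size (a back-of-the-envelope figure assuming 4 bytes per float; the HNSW graph links add the rest):
awk 'BEGIN { print 1000000 * 4096 * 4 / 1024 / 1024 / 1024 " GiB" }'
15.2588 GiB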
As for how much RAM you should estimate so that Docker doesn't crash when you add more records: you reported that at 700k records the container shuts down (Exited (137)) after consuming the 8 GB of RAM you provided. At that rate, 100k records take about 1170 MB of RAM, so 1 billion records would need approximately 11.16 TB.
In my own test, I inserted a single sample document 1 million times, which used 16,532,316 KB of RAM (the RSS in the ps output above). Based on that, storing 1 billion records would require roughly 15.4 TB.
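For reference, the extrapolation itself (a rough estimate that assumes RAM grows linearly with the document count):
awk 'BEGIN { print 16532316 / 1e6 * 1e9 / 1024^3 " TB" }'
15.3969 TB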