Minimum RAM for 150k records with float_vector HNSW index?

Hi everyone,

I need your help to figure out the optimal resource configuration for my Manticore cluster on Kubernetes.

My Setup:

  • I’ve set up a 3-node cluster using the Helm chart.

  • I have created a table with the following schema:

    CREATE TABLE accounting (
        _id string,
        sectionId string,
        um string,
        label text stored,
        vector float_vector knn_type='hnsw' knn_dims='512' hnsw_similarity='COSINE'
    );

  • I have indexed approximately 150,000 records.

The Problem: I’ve allocated 2GB of RAM to each pod, but they are running into Out of Memory (OOM) errors.

My Question: What is the recommended minimum RAM configuration to handle this volume of data, especially considering the HNSW index on 512-dimension vectors? Are there any guidelines or formulas to calculate the required RAM based on the number of records and vector dimensionality?

Thanks in advance for any suggestions!

Hi @Fernando_Figaroli

  • 512-dimensional float vectors = 512 × 4 bytes = 2 KB per vector
  • For 150,000 vectors: 150,000 × 2 KB = ~300 MB (raw vector data only)
  • HNSW adds memory overhead for graph structures.
    • Conservative estimate: 8–10× the vector data size in RAM depending on M, ef_construction, and search parameters.
    • So, 300 MB × 10 = ~3 GB for the index alone (can vary of course)
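The estimate above can be turned into a small back-of-envelope calculator. This is just my arithmetic from the numbers in this thread, not an official Manticore sizing formula; the 8–10× overhead range is an assumption that depends on M, ef_construction, and other build parameters:

```python
def hnsw_ram_estimate(num_vectors, dims, overhead_low=8, overhead_high=10):
    """Rough HNSW RAM estimate: raw float32 vector size times a
    graph-overhead factor (the 8-10x range is an assumption)."""
    raw_bytes = num_vectors * dims * 4  # float32 = 4 bytes per dimension
    return raw_bytes * overhead_low, raw_bytes * overhead_high

low, high = hnsw_ram_estimate(150_000, 512)
print(f"raw: {150_000 * 512 * 4 / 2**20:.0f} MiB, "
      f"index: {low / 2**30:.1f}-{high / 2**30:.1f} GiB")
# raw: 293 MiB, index: 2.3-2.9 GiB
```

Which is why 2 GB per pod ends up below the conservative estimate for this dataset.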

We’re currently wrapping up work on:

  • vector quantization
  • rescoring

These updates should significantly reduce RAM usage while keeping the same level of precision. Would you be interested in beta testing them?

Thanks, that sounds great.

Yes, we’d like to join the beta. We’re testing a few solutions right now, so this would be very helpful.

Let me know what you need from our side.

Thanks.

  • Conservative estimate: 8–10× the vector data size in RAM depending on M, ef_construction, and search parameters.

However, after running more tests, I found that RAM usage also depends on other factors. For example, with 4 dimensions the overhead ratio (HNSW index file size vs. raw vector data) was around 7.3x, but with 512 dimensions it dropped to just 1.07x:

mysql> select * from test.@files
...
|   11 | /var/lib/manticore/test/test.12.spknn | /var/lib/manticore/test/test.12.spknn | 329474292 |
...
|   15 | /var/lib/manticore/test/test.12.spb   | /var/lib/manticore/test/test.12.spb   | 307781080 |
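As a sanity check, the 1.07x figure follows directly from the two file sizes in the listing above:

```python
# File sizes taken from the `select * from test.@files` listing above
spknn = 329_474_292  # test.12.spknn: HNSW index
spb = 307_781_080    # test.12.spb: raw vector (blob attribute) data
print(f"{spknn / spb:.2f}x")  # 1.07x
```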

Anyway, using binary quantization should reduce HNSW RAM usage by up to 32x. Also, if you store the raw vectors in columnar storage, that can help lower memory usage even further. I’ll ping you here when we have a docker image with the quantization and rescoring functionality ready. Hopefully this week.
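For intuition on where the "up to 32x" figure comes from: 1-bit quantization keeps only one bit per float32 component (32 bits), so the codes are 1/32 the size of the raw vectors. A minimal numpy sketch of the idea (an illustration only, not Manticore's actual implementation):

```python
import numpy as np

def binary_quantize(vec):
    """1-bit quantization sketch: keep only the sign of each component,
    packed 8 bits per byte."""
    bits = (np.asarray(vec, dtype=np.float32) > 0).astype(np.uint8)
    return np.packbits(bits)

vec = np.random.randn(512).astype(np.float32)
packed = binary_quantize(vec)
print(vec.nbytes, packed.nbytes, vec.nbytes // packed.nbytes)  # 2048 64 32
```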

@Fernando_Figaroli

The features are now ready for beta testing in the development packages.

Here’s how you can install them - Manticore Search Manual

Here’s a sample of the syntax:

MySQL [(none)]> drop table if exists test;
create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='1bit');
insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) );
select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=5.0, rescore=1 } );
--------------
drop table if exists test
--------------

Query OK, 0 rows affected (0.016 sec)

--------------
create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='1bit')
--------------

Query OK, 0 rows affected (0.002 sec)

--------------
insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) )
--------------

Query OK, 2 rows affected (0.002 sec)

--------------
select id, knn_dist() from test where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=5.0, rescore=1 } )
--------------

+------+------------+
| id   | knn_dist() |
+------+------------+
|    1 | 0.28146550 |
|    2 | 0.81527930 |
+------+------------+
2 rows in set (0.000 sec)

The documentation is available here: Searching > KNN | Manticore Search Manual
Note: the part about quantization isn’t ready yet in the docs, but you can see an example above.

Please give it a try and let me know if it helps.


I’m considering implementing the upcoming 1-bit quantization feature for vector search, and I have a couple of questions.

My use case is text similarity search. I run a multi-tenant setup with a database per user, where each user can have hundreds of thousands of documents, so memory usage is a hard constraint. I do use NVMe storage, so storing indexes on disk is fine (I am okay with the query time cost of this), but RAM usage needs to be minimized.

My questions:

  • When using 1-bit quantization:

    • Are both the 1-bit and the original 32-bit vectors stored? You mention storing the raw vectors in columnar storage; how is that configured in a CREATE TABLE statement?
    • Is the 1-bit representation used as a coarse filter with reranking based on the full vectors, or does it fully replace the 32-bit vectors during similarity search?
  • If I later want to implement 1-bit quantization in production, can I easily modify existing tables or indexes to support it, or would it require reindexing/rebuilding from scratch?

Yes

You can do it either way. By default there’s no rescoring/oversampling, but you can specify them in knn search options: Searching > KNN | Manticore Search Manual
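For intuition, the oversampling + rescore flow can be sketched outside the engine in plain numpy (illustrative only, with made-up helper names; not Manticore internals): the 1-bit codes act as a coarse filter selecting k × oversampling candidates by Hamming distance, which are then reranked with the full float32 vectors.

```python
import numpy as np

def quantize(vecs):
    # 1-bit codes: one sign bit per component, packed into bytes
    return np.packbits((vecs > 0).astype(np.uint8), axis=1)

def search(query, vecs, codes, k=5, oversampling=5.0):
    """Coarse filter on 1-bit codes, then rescore with full vectors."""
    qcode = quantize(query[None, :])[0]
    # Hamming distance via XOR on the packed codes, then popcount
    hamming = np.unpackbits(codes ^ qcode, axis=1).sum(axis=1)
    n_cand = min(len(vecs), int(k * oversampling))
    cand = np.argsort(hamming)[:n_cand]                 # coarse candidates
    exact = np.linalg.norm(vecs[cand] - query, axis=1)  # exact L2 rescore
    return cand[np.argsort(exact)[:k]]

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 512)).astype(np.float32)
codes = quantize(vecs)
query = rng.standard_normal(512).astype(np.float32)
print(search(query, vecs, codes, k=5))
```

Without rescoring, only the Hamming step runs, which is faster but less precise.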

There’s currently no direct way to do it, but we can implement it later. Make a feature request on github if you need it.