Best Practices for Multitenancy, Multiple Vectors, and Columnar Storage

Hi,

I’m currently designing the architecture for a new project using Manticore Search and I would love to get your advice on a few key points to ensure I’m following best practices.

Here are my questions:

1. Multi-tenancy Strategy: I need to implement a multi-tenant system. I’m considering two main approaches:

  • Approach A: Use a single, large table with a tenant_id attribute. All queries would then need a WHERE tenant_id = ... filter.
  • Approach B: Create a separate table for each tenant (e.g., docs_tenant_1, docs_tenant_2, etc.).

My question is: what is the generally recommended approach for performance, scalability, and ease of management? If I choose Approach B, is it possible and efficient to perform a single query that searches across multiple (or all) tenant tables at once?

2. Multiple Vector Columns in a Single Table: My documents have distinct semantic features that I would like to capture in separate vector fields. Is it possible for a single table to have more than one vector column?

If this schema is valid, can a single KNN query perform a search against both vector columns simultaneously? For example, finding a document that is a close match for a query vector on vector_1 and/or another query vector on vector_2 within the same search operation, or would this require two separate searches?

3. Columnar Storage for Vectors at Scale: I am expecting to have around 15 million documents in the main table. Given this scale, would you recommend defining the vector fields with engine='columnar' to optimize for RAM usage? What are the performance trade-offs (if any) for KNN search speed when using columnar storage for vectors compared to the default row-wise storage?

thanks.

  1. If you mostly need to search within a single tenant (which is usually the case in multi-tenant setups), I’d go with separate tables. You can still create a distributed table that combines all tenants if you need to search across them. Just make sure that all IDs are unique in that case.
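For illustration, a minimal sketch of that layout (table names and schemas are just examples, not taken from your project):

-- one table per tenant
create table docs_tenant_1 ( title text, price float );
create table docs_tenant_2 ( title text, price float );
-- distributed table that fans a query out to all tenant tables
create table docs_all type='distributed' local='docs_tenant_1' local='docs_tenant_2';
-- search a single tenant
select * from docs_tenant_1 where match('bag');
-- search across all tenants (document IDs must be unique across the tables)
select * from docs_all where match('bag');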
  2. Yes. Here’s an example:
mysql> drop table if exists test; create table test ( title text, v1 float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='1bit', v2 float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' ); insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821), (-0.148894,0.748278,0.091892,-0.095406) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406), (0.653448,0.192478,0.017971,0.339821) ); select * from test; select id, knn_dist() from test where knn ( v1, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=3.0, rescore=1 } ); select id, knn_dist() from test where knn ( v2, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=3.0, rescore=1 } );
--------------
drop table if exists test
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table test ( title text, v1 float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='1bit', v2 float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' )
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821), (-0.148894,0.748278,0.091892,-0.095406) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406), (0.653448,0.192478,0.017971,0.339821) )
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select * from test
--------------

+------+------------+---------------------------------------+---------------------------------------+
| id   | title      | v1                                    | v2                                    |
+------+------------+---------------------------------------+---------------------------------------+
|    1 | yellow bag | 0.653448,0.192478,0.017971,0.339821   | -0.148894,0.748278,0.091892,-0.095406 |
|    2 | white bag  | -0.148894,0.748278,0.091892,-0.095406 | 0.653448,0.192478,0.017971,0.339821   |
+------+------------+---------------------------------------+---------------------------------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
select id, knn_dist() from test where knn ( v1, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=3.0, rescore=1 } )
--------------

+------+------------+
| id   | knn_dist() |
+------+------------+
|    1 | 0.28146550 |
|    2 | 0.81527930 |
+------+------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---

--------------
select id, knn_dist() from test where knn ( v2, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=3.0, rescore=1 } )
--------------

+------+------------+
| id   | knn_dist() |
+------+------------+
|    2 | 0.28146550 |
|    1 | 0.81527930 |
+------+------------+
2 rows in set (0.00 sec)
--- 2 out of 2 results in 0ms ---
  3. Definitely use quantization for RAM optimization (released in version 13.2.3 just a few days ago), along with oversampling and rescoring (to preserve accuracy). Columnar storage can also help improve performance.
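For the columnar question, here’s a rough sketch of combining the two (assuming engine='columnar' can be set per vector column the same way as for other attributes; dimensions and query values are just placeholders reused from the example above):

-- vector column stored columnar, with 1-bit quantization
create table docs (
    title text,
    v1 float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' quantization='1bit' engine='columnar'
);
-- oversampling + rescore compensate for the accuracy loss from quantization
select id, knn_dist() from docs where knn ( v1, 5, (0.286569,-0.031816,0.066684,0.032926), { ef=2000, oversampling=3.0, rescore=1 } );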

Thank you for your advice. I’m still running some tests, and so far the results are excellent.

I read in the issues that automatic embedding has been implemented in the latest version, but I can’t find any documentation on how to use it. Could you share some examples?

thanks.

It’s not fully implemented yet: it can automatically create embeddings for documents, but not for queries. We’re still working on that part.

Understood, thank you. For now, it would be useful for me to know how automatic embedding works.

I’d also need to know whether I can configure any model/endpoint, or only the local open-source ones.

You can check out this PR with tests covering the new functionality (WIP) - Tests: auto-embeddings in columnar by donhardman · Pull Request #3472 · manticoresoftware/manticoresearch · GitHub

You should be able to use any embedding model from Hugging Face that is compatible with candle (GitHub - huggingface/candle: Minimalist ML framework for Rust), which Manticore uses to convert texts to embeddings. There’s also an integration with the OpenAI API.
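For a rough idea of the direction (this is still WIP, so the exact syntax may change; the model_name and from parameters and the model id below are assumptions based on the tests in that PR, not final syntax), an auto-embedding column would be declared on the table and filled from a text field on insert:

-- sketch only: a vector column auto-filled from the title field on insert
create table products (
    title text,
    embedding float_vector knn_type='hnsw' hnsw_similarity='l2'
        model_name='sentence-transformers/all-MiniLM-L6-v2' from='title'
);
-- insert plain text; the embedding would be generated automatically
insert into products (title) values ('yellow bag');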
