Vector indexing questions

Hi,

I have two questions about vector indexes:

  1. From time to time, without a pattern I can pin down, I get the “KNN index not loaded” error. The one way I can reproduce it: on an existing table with a vector embedding field, I drop the field and recreate it with different properties (e.g. QUANTIZATION='8bit'). It works fine after ALTER TABLE … REBUILD KNN, and also if I recreate the field with the same properties. Is this normal behaviour? My current vector field definition is: FLOAT_VECTOR KNN_TYPE='hnsw' KNN_DIMS='1536' HNSW_SIMILARITY='COSINE' QUANTIZATION='8bit' (I generate the embeddings and save them).
  2. What is the default quantization for auto embeddings and vector fields created with MODEL_NAME="openai/…"?

  1. So does it happen only after you drop a field and before running ALTER … REBUILD KNN?
  2. There’s no quantization by default.

  1. Apparently it happens after I drop the vector field and recreate it with a different configuration, but not every time. After issuing ALTER … REBUILD KNN the problem is resolved.

Does this look like your case?

mysql -P9306 -h0 -v -e "drop table if exists t; create table t ( title text, vec float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' ); insert into t values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) ); select id, knn_dist() from t where knn ( vec, 5, (0.286569,-0.031816,0.066684,0.032926)); alter table t drop column vec; flush ramchunk t; alter table t add column vec float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2'; select id, knn_dist() from t where knn ( vec, 5, (0.286569,-0.031816,0.066684,0.032926));"
mysql -P9306 -h0 -v -e "alter table t rebuild knn"
mysql -P9306 -h0 -v -e "select id, knn_dist() from t where knn ( vec, 5, (0.286569,-0.031816,0.066684,0.032926));"
--------------
drop table if exists t
--------------

--------------
create table t ( title text, vec float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' )
--------------

--------------
insert into t values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) )
--------------

--------------
select id, knn_dist() from t where knn ( vec, 5, (0.286569,-0.031816,0.066684,0.032926))
--------------

+------+------------+
| id   | knn_dist() |
+------+------------+
|    1 | 0.28146550 |
|    2 | 0.81527930 |
+------+------------+
--------------
alter table t drop column vec
--------------

--------------
flush ramchunk t
--------------

--------------
alter table t add column vec float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2'
--------------

--------------
select id, knn_dist() from t where knn ( vec, 5, (0.286569,-0.031816,0.066684,0.032926))
--------------

ERROR 1064 (42000) at line 1: table t: KNN index not loaded
--------------
alter table t rebuild knn
--------------

--------------
select id, knn_dist() from t where knn ( vec, 5, (0.286569,-0.031816,0.066684,0.032926))
--------------

+------+------------+
| id   | knn_dist() |
+------+------------+
|    1 | 0.08866492 |
|    2 | 0.08866492 |
+------+------------+

This is exactly the situation. However, the error sometimes seems to be triggered even with the same index configuration.

Right now I catch the exception and issue a rebuild, but if this is the expected behaviour, it would be good to have a function to check whether the index is loaded.
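For anyone who needs the same workaround, here is a minimal sketch of the catch-and-rebuild approach. It assumes a Python DB-API 2.0 style connection to Manticore's MySQL-protocol port (the client library is not specified in this thread), and it detects the failure by matching the error text shown in the transcript above. The function name `knn_query_with_rebuild` is hypothetical, not part of any Manticore API.

```python
KNN_NOT_LOADED = "KNN index not loaded"  # error text as shown in the transcript


def knn_query_with_rebuild(conn, table, sql, params=None):
    """Run a KNN query; if the KNN index is not loaded, rebuild it and retry once.

    Assumptions: `conn` is a DB-API 2.0 style connection, and `table` is a
    trusted identifier (it is interpolated into the ALTER statement unescaped).
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    except Exception as exc:
        # Only handle the specific error from the thread; re-raise anything else.
        if KNN_NOT_LOADED not in str(exc):
            raise
    # Same statement that fixes the problem in the reproduction above.
    cur.execute(f"ALTER TABLE {table} REBUILD KNN")
    # Retry once; a second failure propagates to the caller.
    cur.execute(sql, params)
    return cur.fetchall()
```

The retry is deliberately limited to one attempt, so a persistent failure still surfaces instead of triggering repeated rebuilds, which may be expensive on large tables.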

Do you mean even without “ALTER TABLE”?

I meant without changing the vector field type: I drop the field and recreate it later with the same type, and on rare occasions the error appears (I could not find a clear pattern).

Thanks. I’ll talk with the team so we can make it clearer or improve the behaviour if it’s a bug.

Thanks a lot!

The related issue is “Re-adding an embedding column using ALTER should add an HNSW index too” (Issue #3861, manticoresoftware/manticoresearch on GitHub).