How apply 'ALTER' command to distributed table?

qiyuqiyu · August 8, 2024, 9:26am

In our business, we need alter table column frequently, but we also need distributed to storage our data about ten billion level.Currently we use elastic search, we can use ‘put mapping’ command to alter field.I wanna know whether manticore search realize similar functions?

qiyuqiyu · August 8, 2024, 9:31am

In elastic search, we can use command like this “curl --location --request GET ‘http://localhost:8080/item_comment_*/_mappings/_doc’
–header ‘Content-Type: application/json’
–data ‘{
“properties”: {
“attr_is_highlight_comment”: {
“type”: “boolean”
}
}
}’”.
Then “attr_is_highlight_comment” filed will be added into all indexes and can be searched, filtered and aggerated. Can manticore search do this?

Sergey · August 8, 2024, 10:02am

Manticore doesn’t have this functionality yet.

qiyuqiyu · August 8, 2024, 10:17am

That’s means, if I want to use ‘ALTER’ command, I have to give up distributed table and storage all data into one table?

Sergey · August 8, 2024, 10:18am

No, it means you have to alter each table the distributed table includes separately.

qiyuqiyu · August 8, 2024, 10:21am

That’s good. If I alter one table, its replica will also be altered automatically right?

Sergey · August 8, 2024, 10:42am

No. You have to detach a table from a cluster, alter it and then add it back. Otherwise you’ll get this error:

mysql> alter table t add column a int;
ERROR 1064 (42000): table t: is part of cluster c, ALTER is not supported for tables in cluster

qiyuqiyu · August 8, 2024, 10:49am

How can I keep the cluster is still working for read and write data when I detach one table from a cluster?

barryhunter · August 8, 2024, 11:23am

In general you can’t do direct ‘online’ schema updates. Would have to take teh index offline to perform such updates.

But you might consider a JSON attribute.
https://manual.manticoresearch.com/Data_creation_and_modification/Adding_data_from_external_storages/Fetching_from_databases/Processing_fetched_data#sql_attr_json

You can add arbitrary ‘sub’ attributes to each document, without having to alter the overall table.

You setup the single JSON attribute on the table, and then the properties inside the attribute are schemaless.

Subject to some limitations can still filter or sort by properties inside a JSON attribute. ( I think grouping too, not tested)

Sergey · August 9, 2024, 3:12am

The table will be still readable. Writing should be anyway suspended whlie the table is being altered. Since you have to write to cluster_name:table_name when the table is in a cluster, you won’t be able to do it once it’s detached as a protection from unexpected writes.

qiyuqiyu · August 9, 2024, 9:18am

Does that mean when one table is detached from cluster, read request will be assigned to read its replica?And I need to alter all tables and their replica tables, right?

tomat · August 9, 2024, 2:17pm

then you detach the table from the cluster - you can not write into that table and replicate writes across the whole cluster.

You could read or search that table. You could write the table locally at the node. Then you attach the table again into cluster the cluster uses SST to replicate the table content across the nodes. That is why after ALTER cluster ADD table all nodes will see the same state of the added table.

However you can not add or drop attributes for table that is part of the cluster as that operation is not cluster wide and works only for local tables. That operation is also lock local table for write that is why it is not possible to write data into table and add or drop attributes at the same time.

If you want to add attribute you should

drop table from the cluster (table remains local at all nodes and can be used as local table at any node)
issue alter table t add column a int; to change local table at one of the node
add local table into the cluster (cluster replicate the new table files across the nodes and if the different nodes have different table content it will be replaced by the content of the table from the node there table got added into cluster)

qiyuqiyu · August 12, 2024, 2:37am

So just to confirm, whether I only need to alter one table, and the whole cluster(including its replica and all other shards) will synchronize the attributes I changed?

tomat · August 12, 2024, 6:22am

yes the cluster will see the changes after it finished adding table into cluster back. However it might take time if the table has huge size.

qiyuqiyu · August 16, 2024, 10:02am

Will you support this feature in the future that alter table attributes in distributed table in the future like elasticsearch “curl --location --request GET ‘http://localhost:8080/item_comment_*/_mappings/_doc’
–header ‘Content-Type: application/json’
–data ‘{
“properties”: {
“attr_is_highlight_comment”: {
“type”: “boolean”
}
}
}’”?