OPTIMIZING index without excluding from a cluster

glukkkk · July 12, 2021, 10:24am

Any chance of seeing this feature in the future? We want to re-sync our indexes daily, but without optimizing, that drains our disk space significantly. And we cannot remove indexes from the cluster in our production environment, because that would mean downtime.

barryhunter · July 12, 2021, 10:39am

Do you mean calling ‘OPTIMIZE INDEX’?

That shouldn’t need a ‘remove from cluster’. The optimization happens in a background thread, with no downtime. (Sometimes there is a short pause when the final switch happens, but it’s just a momentary ‘slow down’; queries don’t fail.)

tomat · July 12, 2021, 10:42am

No, you cannot issue OPTIMIZE for an index that is in a cluster.

OPTIMIZE runs on each node independently, so the nodes could produce different disk chunks while it works, and could also end up keeping a different number of disk chunks, depending on the core count of each node.

That could lead to SST issues, for example:

joiner starts fetching the index via SST from one donor, with that donor’s set of disk chunks, and the transfer stops
joiner reconnects, but the cluster selects another donor with a different set of disk chunks, and SST starts transferring the whole set of disk chunks again from the new donor node
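
To make the donor mismatch concrete, here is a toy Python model. It is not Manticore code: the chunk names, the `reusable_chunks` helper, and the idea that nodes start with identical chunk sets are all assumptions made up for illustration. It treats each node's disk chunks as a set of identifiers and checks how much of an interrupted transfer would still match after the joiner is handed a different donor.

```python
def reusable_chunks(received, new_donor_chunks):
    """Chunks the joiner already fetched that the new donor also has."""
    return set(received) & set(new_donor_chunks)


# Assumed starting point: both donors hold the same chunk set.
donor_a = {"chunk.0", "chunk.1", "chunk.2", "chunk.3"}
donor_b = set(donor_a)

# The joiner got two chunks from donor A, then the transfer stopped.
received = {"chunk.0", "chunk.1"}
print(sorted(reusable_chunks(received, donor_b)))

# Hypothetical state after each node ran OPTIMIZE on its own:
# the merges left different chunk sets on A and B.
donor_a_opt = {"chunk.4", "chunk.5"}
donor_b_opt = {"chunk.6"}
received_opt = {"chunk.4"}
print(sorted(reusable_chunks(received_opt, donor_b_opt)))
```

In the second case nothing the joiner received matches the new donor, which is the situation where the whole set has to be sent again.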

tomat · July 12, 2021, 10:48am

For now, the recommendation is to remove the index from the cluster and issue OPTIMIZE on one node.

During the optimize:

all other nodes keep a local copy of the index and can handle read queries, but the index on these nodes is an old snapshot of the data
the node where the optimize happens can handle both write and read queries, so you could route all write queries to this node

After the optimize has finished, you can add the index back into the cluster, and SST will transfer the up-to-date index to all nodes in the cluster.
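
The workaround above can be sketched as a small Python helper that builds the three statements in order. This is only a sketch under assumptions: the cluster name `posts`, the index name `products`, and the `optimize_outside_cluster` helper are hypothetical, and the statement syntax (`ALTER CLUSTER ... DROP`, `OPTIMIZE INDEX ... OPTION sync=1`, `ALTER CLUSTER ... ADD`) is my reading of the Manticore docs, so check it against your version before relying on it.

```python
def optimize_outside_cluster(cluster, index, run=None):
    """Return (and optionally run) the statements for the manual workaround.

    `run` is any callable that sends one SQL string to the node chosen
    for the optimize, e.g. the `execute` method of a MySQL-protocol cursor.
    """
    statements = [
        # 1. Take the index out of the cluster; nodes keep their local copy.
        f"ALTER CLUSTER {cluster} DROP {index}",
        # 2. Merge disk chunks on this node only. sync=1 is assumed to make
        #    the statement wait until the merge has finished.
        f"OPTIMIZE INDEX {index} OPTION sync=1",
        # 3. Put the index back; SST then ships it to the other nodes.
        f"ALTER CLUSTER {cluster} ADD {index}",
    ]
    if run is not None:
        for sql in statements:
            run(sql)
    return statements


# Dry run: just print what would be sent to the chosen node.
for sql in optimize_outside_cluster("posts", "products"):
    print(sql + ";")
```

With a real connection you would pass something like `cursor.execute` as `run`, on the node that should do the optimize, and route writes to that node while the index is out of the cluster.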

tomat · July 12, 2021, 10:51am

We are currently implementing auto-optimize functionality, where different optimize strategies could be added. That will let users avoid issuing OPTIMIZE by hand.

However, that will not be related to clusters, and cluster indexes will still need to be optimized by hand.

But after refactoring the optimize code for auto-optimize, we plan to add cluster-wide optimize too.

barryhunter · July 12, 2021, 11:33am

Ah OK, I didn’t think of ‘replication clusters’; that might well be different. I don’t really use them, so I’m not experienced with them.

Sorry for the confusion.