Manticore 3, index merging and new kill-list strategy

BrokenEye · November 26, 2019, 12:43pm

Hello.

I’m planning to migrate from Sphinx 2.11 to Manticore 3. Due to the change in the kill-list concept, I’m trying to develop a new scheme for live index updates. RT indices have not been considered yet, and I want to avoid raising that issue in this discussion in order to keep focus. Currently I stick to the main + delta scheme and index merging. I’d like to describe my vision of this scheme in the hope of getting opinions and, possibly, adjusting it.

First, I will mention facts that were verified experimentally and are relevant to this case. However, maybe I’m missing something:

The documentation about index merging says:
“Note, however, that the “old” keywords will not be automatically removed in such cases. For example, if there’s a keyword “old” associated with document 123 in DSTINDEX, and a keyword “new” associated with it in SRCINDEX, document 123 will be found by both keywords after the merge”.
However, I’m not able to reproduce this behavior. After the merge, if the keywords associated with a document have changed, I can find the document only by the new words. Am I doing something wrong?
SRCINDEX kill-list is applied to DSTINDEX during index merge. This can be used to exclude deleted documents instead of using the --merge-dst-range option. But I’m not sure if there is a difference between these approaches in terms of final result. Do we get identical indices in both cases?
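
To make the second question concrete, here is a toy Python model of the two ways of dropping deleted documents during a merge. It is only a sketch: the dict-based "index", the `deleted` attribute and the `merge` function are made up for illustration, and the model assumes that a document from SRCINDEX fully replaces a same-id document in DSTINDEX. It says nothing about whether the real index files come out identical.

```python
# Toy model only: an index is a dict {doc_id: {"text": ..., "deleted": 0 or 1}}.
# This is NOT how Manticore stores data; it only compares the two filters.

def merge(dst, src, kill_list=(), keep_deleted_range=None):
    """Return a merged copy of dst and src.

    kill_list: ids dropped from dst before merging
        (models the SRCINDEX kill-list being applied to DSTINDEX).
    keep_deleted_range: optional (lo, hi); dst documents whose "deleted"
        attribute falls outside it are dropped
        (models `--merge-dst-range deleted lo hi`).
    """
    killed = set(kill_list)
    merged = {}
    for doc_id, doc in dst.items():
        if doc_id in killed:
            continue
        if keep_deleted_range is not None:
            lo, hi = keep_deleted_range
            if not lo <= doc["deleted"] <= hi:
                continue
        merged[doc_id] = doc
    # Assumption of this model: src documents replace same-id dst documents.
    merged.update(src)
    return merged


main = {
    1: {"text": "old one", "deleted": 0},
    2: {"text": "old two", "deleted": 1},  # flagged as deleted
    3: {"text": "old three", "deleted": 0},
}
delta = {3: {"text": "new three", "deleted": 0}}

via_kill_list = merge(main, delta, kill_list=[2])
via_dst_range = merge(main, delta, keep_deleted_range=(0, 0))
print(via_kill_list == via_dst_range)
```

In this model both variants keep the same documents as long as the kill-list holds exactly the ids flagged as deleted. The practical difference is where the "deleted" information lives: in the delta source's kill-list query, or in an attribute of the main index.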

As for the index update scheme, we want to achieve the following:

Use a delta index to update data and merge it into the main index
Exclude deleted documents using the kill-list
Be able to reindex the delta index without applying the kill-list. We want to apply it only at the merge stage

To achieve the described goals I came up with the option of using two configuration files that differ only in the composition of the indices:

A configuration file that includes only the main index. This file is used to run searchd, which lets us avoid locking the delta index
A configuration file that includes the main and delta indices. This file is used to run indexer. Since the delta index is not locked by searchd, we can update it without rotation (the --rotate option) and apply the kill-list only at the merge stage. We do the merge with rotation, so that searchd can pick up the changed main index
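
The two indexer invocations this scheme implies can be sketched as a small Python helper. The file name `indexer.conf` and the index names `main` and `delta` are placeholders, not taken from a real setup. The sketch also assumes both configuration files share the same searchd settings, so that `--rotate` signals the running daemon.

```python
import shlex

# Hypothetical names; adjust to the real setup.
INDEXER_CONF = "indexer.conf"  # lists both main and delta, used by indexer only
MAIN, DELTA = "main", "delta"


def reindex_delta_cmd():
    """Rebuild delta in place; no --rotate because searchd does not serve it."""
    return ["indexer", "--config", INDEXER_CONF, DELTA]


def merge_cmd():
    """Merge delta into main and ask searchd to pick up the new main."""
    return ["indexer", "--config", INDEXER_CONF, "--merge", MAIN, DELTA, "--rotate"]


for cmd in (reindex_delta_cmd(), merge_cmd()):
    print(shlex.join(cmd))
```

To actually run a step, the list can be passed to `subprocess.run(cmd, check=True)`. Keeping the commands as lists avoids shell quoting problems.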

I suppose this scheme is a bit tricky, but if we do not consider RT indexes, at the moment I do not see any other way to implement live index update. What do you think about this approach?

Thanks in advance.

tomat · November 26, 2019, 12:51pm

With your scheme, you do not see the data in the delta index until it gets merged into main.

I also do not get why you are trying to avoid index locking: this process is transparent to the user and should not interrupt searches.

I also suggest reading the previous discussion on the same topic here: https://forum2.manticoresearch.com/t/new-killlist-strategy-in-manticore-3/338

BrokenEye · November 26, 2019, 7:07pm

Thanks for the link. Yes, I read this discussion before I wrote this post. It’s more about using RT indexes. But for now I want to explore the possibility of staying with plain indexes.

I’m starting to think I’m not getting the idea of index merging right. I thought we had a main index that was kept up to date by periodically updating the data in it. The data is updated by reindexing the delta index and merging it into the main index. Only the main index is used for the search. I mean, why search the delta index if its data is already in the main index after the merge?

OK, assuming we are searching both indexes, we get the following scenario:

We update the delta index. We have to use the --rotate option because searchd locks the index
Using --rotate means that the kill-list will be applied and we will suppress documents in the main index. Suppose that the kill-list contains the IDs of all documents included in the delta index. Everything is fine at this stage.
Further, according to the plan, we merge the delta and main indexes. From now on, when searching both indexes, we will get duplicate documents: we have the updated documents in the delta index and the same documents merged into the main index.
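
A toy Python model of this reasoning, to show where the duplicates would come from. It models only the scenario as described, not Manticore internals: `search_both` is a made-up helper, and whether suppression is really lost after the merge is exactly the open question.

```python
# Toy model of the reasoning above, not of Manticore internals.

def search_both(main_ids, delta_ids, suppressed_in_main):
    """Ids returned when querying main and delta together."""
    hits = [i for i in main_ids if i not in suppressed_in_main]
    hits += list(delta_ids)
    return sorted(hits)


main_ids = [1, 2, 3]
delta_ids = [3]   # document 3 was updated
kill_list = {3}   # assumed to hold every id present in delta

# After rotating delta: the kill-list hides the stale copy of 3 in main.
print(search_both(main_ids, delta_ids, kill_list))  # [1, 2, 3]

# After the merge, as feared: main holds the merged document 3,
# and nothing hides it if suppression no longer applies.
print(search_both(main_ids, delta_ids, set()))      # [1, 2, 3, 3]
```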

Am I right?

It’s not about transparency, it’s about the scheme I came up with:

We search only the main index. We update it by periodically merging the delta index into it
Therefore, we do not want to suppress documents via the kill-list when reindexing the delta index; otherwise they would be lost before we perform the index merge. So we need to be able to reindex the delta index without the --rotate option, which means the delta index should not be locked by searchd
We apply the kill-list in the process of index merging. From then on, the main index contains the updated data.

Still hope to understand where I made a mistake in my reasoning

BrokenEye · November 28, 2019, 2:38pm

I discovered one remarkable thing. When you rotate the main index or merge the delta and main index, documents with the same IDs as those in the main index are removed from the delta index. That changes everything!

However this happens under certain conditions:

killlist_target in the delta index should be in id or both (id and kl) mode. That makes sense.
sql_query_killlist must be present and contain at least one ID (not necessarily an existing one). This part is a little confusing.
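
For reference, my reading of the three killlist_target modes can be written as a small Python model. This is a sketch of the documented semantics as I understand them, with a made-up function name. It does not reproduce the dependence on sql_query_killlist described above, which is the part I find confusing.

```python
def ids_suppressed_in_target(delta_ids, kill_list, mode="both"):
    """Ids hidden in the target index under this reading of killlist_target."""
    if mode not in ("kl", "id", "both"):
        raise ValueError(f"unknown mode: {mode}")
    suppressed = set()
    if mode in ("kl", "both"):
        suppressed |= set(kill_list)   # ids returned by sql_query_killlist
    if mode in ("id", "both"):
        suppressed |= set(delta_ids)   # every document id present in delta
    return suppressed


delta_ids = [3, 4]
kill_list = [2]
for mode in ("kl", "id", "both"):
    print(mode, sorted(ids_suppressed_in_target(delta_ids, kill_list, mode)))
```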

@tomat , please comment on this

Deleting documents from the delta index when rotating the main index or merging with it is a planned behavior, right? Can I rely on that? I just haven’t seen any mention of it anywhere. And where does this dependence on sql_query_killlist come from?

tomat · November 28, 2019, 7:04pm

The killlist_target description here
https://manticoresearch.gitlab.io/dev/conf_options_reference/index_configuration_options.html?highlight=killlist_target#killlist-target

describes behavior similar to what you posted.

There is also a section on kill-lists in our documentation: https://manticoresearch.gitlab.io/dev/getting-started/migrate_from_manticore2.html#kill-list