I’m planning to migrate from Sphinx 2.11 to Manticore 3. Due to the change in the kill-list concept, I’m trying to develop a new scheme for live index updates. RT indices have not been considered yet, and I want to avoid raising that issue in this discussion in order to keep focus. Currently I stick to the delta scheme and index merging. I want to describe my vision of this scheme in the hope of getting opinions and, possibly, adjusting it.
First, I will mention facts that were verified experimentally and are relevant to this case. However, maybe I’m missing something:
- The documentation about index merging says:
“Note, however, that the “old” keywords will not be automatically removed in such cases. For example, if there’s a keyword “old” associated with document 123 in DSTINDEX, and a keyword “new” associated with it in SRCINDEX, document 123 will be found by both keywords after the merge”.
However, I’m not able to reproduce this behavior. After the merge, if the keywords associated with a document have changed, I can find the document only by the new words. Am I doing something wrong?
- The SRCINDEX kill-list is applied to DSTINDEX during the index merge. This can be used to exclude deleted documents instead of using the --merge-dst-range option. But I’m not sure whether these approaches differ in terms of the final result. Do we get identical indices in both cases?
As for the index update scheme, we want to achieve the following:
- Use the delta index to update data and merge it into the main index
- Exclude deleted documents using the kill-list
- Be able to reindex delta without applying the kill-list. We want to apply it only at the merge stage
To achieve the described goals I came up with the option of using two configuration files that differ only in the composition of the indices:
- A configuration file that includes only the main index. This file is used to run searchd and allows us to avoid delta index locking
- A configuration file that includes the main and delta indices. This file is used to run indexer. Since the delta index is not locked by searchd, we can update it without rotation (--rotate option) and apply the kill-list only at the merge stage. We do the merge with rotation, so searchd can pick up the changed main index
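The two indexer runs in this scheme can be sketched roughly as follows. This is only an illustration: the configuration file path is hypothetical (the post does not name the files), and it assumes the indexer configuration defines indexes called main and delta. The functions only build the command lines; they do not run anything.

```python
from typing import List

# Hypothetical path: the post does not name the configuration files.
# Assumption: this file defines both the main and the delta index,
# while the file used by searchd defines only main.
INDEXER_CONF = "/etc/manticore/indexer.conf"


def reindex_delta_cmd(config: str = INDEXER_CONF) -> List[str]:
    """Rebuild delta in place; no --rotate, since searchd never loads delta."""
    return ["indexer", "--config", config, "delta"]


def merge_cmd(config: str = INDEXER_CONF) -> List[str]:
    """Merge delta into main, with --rotate so searchd picks up the new main."""
    return ["indexer", "--config", config, "--merge", "main", "delta", "--rotate"]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in (reindex_delta_cmd(), merge_cmd()):
        print(" ".join(cmd))
```

Only the merge step uses --rotate, which is the whole point of keeping delta out of the searchd configuration.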
I suppose this scheme is a bit tricky, but if we do not consider RT indexes, at the moment I do not see any other way to implement live index update. What do you think about this approach?
Thanks in advance.
With your scheme you do not see the data in the delta index until it gets merged into main.
I also do not get why you are trying to avoid index locking; this process is transparent to the user and should not interrupt searches.
I also suggest reading the previous discussion on the same topic here: New killlist strategy in Manticore 3
Thanks for the link. Yes, I read this discussion before I wrote this post. It’s more about using RT indexes. But for now I want to explore the possibility of staying with plain indexes.
I’m starting to think I’m not getting the idea of index merging right. I thought we had a main index that was kept up to date by periodically updating the data in it. The data is updated by reindexing the delta index and merging it with the main index. Only the main index is used for the search. I mean, why search the delta index if the data from it is already in the main index after the merge?
Ok, assuming we are searching both indexes, we get the following scenario:
- we update the delta index. We have to use the --rotate option because searchd locks the index
- --rotate means that the kill-list will be applied and we will suppress documents in the main index. Suppose that the kill-list contains the IDs of all documents included in the delta index. Everything is fine at this stage.
- further, according to the plan, we merge the delta index with the main index. From now on, when searching both indexes, we will get duplicate documents: we have the updated documents in the delta index and the same documents merged into the main index
Am I right?
It’s not about transparency, it’s about the scheme I came up with:
- we search only the main index. We update it by periodically merging it with the delta index
- therefore, we do not want to suppress documents via the kill-list when reindexing the delta index; otherwise they will be lost before we perform the index merge. So we need to be able to reindex the delta index without the --rotate option, which means the delta index should not be locked by searchd
- we apply the kill-list in the process of index merging. From now on the main index contains the updated data.
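To make the intended data flow concrete, here is a toy model of the scheme above. It is a sketch of the behavior I expect, not of Manticore’s internals: an index is modeled as a plain dict from document ID to text, the kill-list as a set of IDs, and the merge function is a hypothetical stand-in for the merge step. The rule that a delta document replaces the main document with the same ID matches what I observed experimentally after a merge.

```python
from typing import Dict, Set

Index = Dict[int, str]  # toy index: document ID -> indexed text


def merge(main: Index, delta: Index, kill_list: Set[int]) -> Index:
    """Return a new main: drop killed IDs, then overlay delta's documents."""
    merged = {doc_id: text for doc_id, text in main.items()
              if doc_id not in kill_list}
    merged.update(delta)  # delta's copy replaces an older copy with the same ID
    return merged


if __name__ == "__main__":
    main = {1: "old text", 2: "to be deleted", 3: "untouched"}
    delta = {1: "new text", 4: "brand new"}
    kill_list = {2}  # IDs of documents deleted at the source

    # Reindexing delta on its own does not touch main, so searches against
    # main still return document 2 until the merge runs.
    assert 2 in main

    main = merge(main, delta, kill_list)
    print(main)
```

In this model nothing disappears from search results between the delta reindex and the merge, which is the property I am trying to get.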
I still hope to understand where I made a mistake in my reasoning.
I discovered one remarkable thing. When you rotate the main index or merge the delta index with the main index, documents with the same IDs as those in the main index are removed from the delta index. That changes everything!
However, this happens only under certain conditions:
- the delta index should be in kl mode. It makes sense.
- sql_query_killlist must be present and contain at least one ID (not necessarily an existing one). This part is a little confusing.
@tomat , please comment on this
Deleting documents from the delta index when rotating the main index or merging with it is a planned behavior, right? Can I rely on that? I just haven’t seen any mention of it anywhere. And where does this dependence on sql_query_killlist come from?