How to merge all fields except for one?

dariusarius · August 15, 2024, 5:17pm

Hi,

I have a permanent plain index with a sql_file_field that holds very large PDF and Word documents. The index also has a number of other fields. I’d like to incrementally update the permanent plain index without re-indexing the sql_file_field, for performance reasons. The PDF and Word contents never change, but the other fields in the index do. So far I have:

indexer --config E:\Manticore\ETC\manticore.conf --merge permanent_index incremental_index --rotate

The problem is that the above command overwrites the sql_file_field with the value in incremental_index. I want to update all the other fields but keep the pre-existing value of sql_file_field in the permanent index.

Any help and guidance would be greatly appreciated.

barryhunter · August 16, 2024, 2:29pm

I don't think this is going to be easy as such. Fields are decomposed into an inverted index. It's one big index rather than one per field.
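
To illustrate why a single field is hard to swap out, here is a toy sketch in Python. It is not Manticore's actual storage format; `build_index` and the sample documents are made up for illustration. The point is only that the postings for one term mix entries from several fields and documents.

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: term -> list of (doc_id, field_name) postings."""
    index = defaultdict(list)
    for doc_id, fields in docs.items():
        for field_name, text in fields.items():
            for term in text.lower().split():
                index[term].append((doc_id, field_name))
    return dict(index)

# Hypothetical documents: a small "title" field and a large "content" field
docs = {
    1: {"title": "annual report", "content": "report body from the pdf"},
    2: {"title": "pdf notes", "content": "meeting notes"},
}
index = build_index(docs)

# One term's postings span both fields, so there is no separate
# per-field structure that could be kept while the rest is rebuilt.
print(index["report"])  # [(1, 'title'), (1, 'content')]
print(index["pdf"])     # [(1, 'content'), (2, 'title')]
```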

Overall it may be worth considering an RT index, so you can do ‘partial replaces’:
https://manual.manticoresearch.com/Data_creation_and_modification/Updating_documents/REPLACE_vs_UPDATE#UPDATE-vs-partial-REPLACE

But fields need to be ‘stored’ for it to work. The whole document is deleted and reinserted (using the stored data to reinsert the fields, not updating them in place!)

… so ultimately it is not going to be any more ‘efficient’ than just reindexing the entire PDF documents, as in the merge.
The partial REPLACE would just allow you to do it ‘piecemeal’, one document at a time, rather than as one big merge operation.
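
As a rough sketch of what a partial REPLACE could look like, the Python helper below builds the statement text in the `REPLACE INTO ... SET ... WHERE id = ...` form described on the manual page linked above. The helper names (`sql_quote`, `partial_replace`), the table `rt_documents`, and the fields `title` and `status` are hypothetical, and the escaping is deliberately simplistic.

```python
def sql_quote(value):
    """Render a value as a SQL literal (toy escaping, for illustration only)."""
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def partial_replace(table, doc_id, changes):
    """Build a partial REPLACE that names only the fields that changed."""
    assignments = ", ".join(
        f"{name}={sql_quote(val)}" for name, val in changes.items()
    )
    return f"REPLACE INTO {table} SET {assignments} WHERE id = {int(doc_id)}"

# Hypothetical RT table and field names; the large document field is not
# listed, so it would be re-inserted from its stored copy.
statement = partial_replace("rt_documents", 42, {"title": "Q3 report", "status": 2})
print(statement)
# REPLACE INTO rt_documents SET title='Q3 report', status=2 WHERE id = 42
```

The statement would then be sent over Manticore's SQL interface, one document at a time. The large field still gets re-indexed internally, as noted above.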

tomat · August 16, 2024, 2:35pm

You could use stored_only_fields for the big documents, if you only need to retrieve their content back to the client. Also use attributes for all the data that changes, then use the UPDATE statement to update the attribute values.
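
A minimal sketch of the attribute-update half of this, again in Python and only building the statement text. The helper `update_attributes` and the attribute names `status` and `updated_at` are hypothetical, and the sketch is limited to numeric attributes to keep it short.

```python
def update_attributes(table, doc_id, attrs):
    """Build an UPDATE statement that touches numeric attributes only."""
    for name, val in attrs.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(val, bool) or not isinstance(val, (int, float)):
            raise TypeError(f"attribute {name!r} must be numeric in this sketch")
    assignments = ", ".join(f"{name}={val}" for name, val in attrs.items())
    return f"UPDATE {table} SET {assignments} WHERE id = {int(doc_id)}"

# Hypothetical attribute names; full-text fields are not touched by UPDATE
statement = update_attributes(
    "permanent_index", 42, {"status": 2, "updated_at": 1723800000}
)
print(statement)
# UPDATE permanent_index SET status=2, updated_at=1723800000 WHERE id = 42
```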

dariusarius · August 16, 2024, 3:31pm

I might be worrying too much about performance; I indexed an entire Staging database of documents in only 20 minutes. So I think I should be okay with the way I have it. I noticed the newest beta version of Manticore supports relations between indexes, so maybe in the future I can migrate to that and have a searchable index for just the PDF contents, and the main index for all the other fields.

dariusarius · August 16, 2024, 3:32pm

That probably would not work for me, since I need the document content to be searchable, but I think I should be okay. Thanks for the help.