Create indexes from CSVs in parallel

I want to index many millions of documents from MongoDB. My plan so far:

  1. export from MongoDB to hundreds of CSV files
  2. in parallel for each CSV file, run indexer with a csvpipe source to create a plain index
  3. Somehow attach all plain indexes to an RT index
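Step 2 could be driven by a small script like the one below. It is a minimal sketch that assumes every plain index (the hypothetical names `part_000`, `part_001`, ...) is already declared in one config file, and that the config path is the hypothetical `CONFIG` below. Each `indexer --config <file> <index>` call builds exactly one plain index, and a thread pool caps how many run at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumption: path to the config that declares all the plain indexes.
CONFIG = "/etc/manticoresearch/manticore.conf"


def indexer_command(index_name, config=CONFIG):
    """Command line that builds a single plain index."""
    return ["indexer", "--config", config, index_name]


def build_all(index_names, max_workers=8, run=subprocess.run):
    """Run one indexer process per plain index, at most max_workers at a time.

    Returns {index_name: exit code} so failed builds can be retried.
    `run` is injectable so the scheduling can be tested without indexer.
    """
    def build(name):
        result = run(indexer_command(name), capture_output=True, text=True)
        return name, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(build, index_names))
```

For example, `build_all([f"part_{i:03d}" for i in range(300)], max_workers=16)` would build 300 hypothetical parts, 16 at a time. Threads are enough here because the real work happens in the separate indexer processes; `max_workers` should be tuned to your CPU count, disk and the indexer memory settings.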

Any advice on how to do step 3 without modifying the global config multiple times? Or is there another easy way to merge hundreds of plain indexes created by separate indexer runs?

  1. I would not export to CSV files; instead I would run a MongoDB-to-CSV exporter script directly as the csvpipe_command in your config, just to avoid the unnecessary write/read, but it should work with files as well.
  2. For attaching a plain index to an RT index there's ATTACH INDEX - https://mnt.cr/attaching
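A sketch of the core of such an exporter script is below. It assumes, as we understand the csvpipe format, that the first CSV column is the document ID and the remaining columns follow the order in which the `csvpipe_field` / `csvpipe_attr_*` directives are declared. Since MongoDB's `_id` is an ObjectId rather than an integer, it also assumes each document carries a hypothetical numeric `doc_id` field; `COLUMNS` is likewise a hypothetical schema.

```python
import csv
import sys

# Hypothetical column order: it must match the order of the
# csvpipe_field / csvpipe_attr_* directives in the source definition.
COLUMNS = ["title", "body", "price"]


def write_csvpipe(docs, out=sys.stdout, columns=COLUMNS, id_key="doc_id"):
    """Write documents as CSV rows: document ID first, then `columns`.

    Missing fields are written as empty strings. Returns the row count.
    """
    writer = csv.writer(out)  # quotes values containing commas, quotes, newlines
    count = 0
    for doc in docs:
        row = [int(doc[id_key])]
        row.extend(doc.get(col, "") for col in columns)
        writer.writerow(row)
        count += 1
    return count
```

In the real script, `docs` would be a cursor from PyMongo's `collection.find(...)` and `out` would stay `sys.stdout`, so indexer reads the rows straight from the pipe. To keep the parallel setup, each plain index's csvpipe_command could export only its own slice of the collection (for example a range of `doc_id` values).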

Not sure why you want to modify your config multiple times. Just prepare your config once, declaring all your plain and RT indexes in it, and then attach the plain ones to the RT indexes in a loop.
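The loop itself could look roughly like this. It is a sketch that uses the `ATTACH INDEX <plain> TO RTINDEX <rt>` form of the statement; check the ATTACH INDEX docs linked above for the exact syntax and restrictions of your Manticore version. Index names cannot be passed as SQL parameters, so they are validated before being put into the statement. `execute` is any callable that sends one SQL statement and raises on error.

```python
import re

_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def attach_statements(plain_indexes, rt_index):
    """Build one ATTACH INDEX statement per plain index."""
    for name in [*plain_indexes, rt_index]:
        if not _NAME.match(name):  # identifiers are interpolated, so be strict
            raise ValueError(f"unexpected index name: {name!r}")
    return [f"ATTACH INDEX {p} TO RTINDEX {rt_index}" for p in plain_indexes]


def attach_all(execute, plain_indexes, rt_index):
    """Attach plain indexes one by one; stops at the first failing statement.

    Returns the statements that were executed successfully.
    """
    done = []
    for stmt in attach_statements(plain_indexes, rt_index):
        execute(stmt)
        done.append(stmt)
    return done
```

With a MySQL client library such as PyMySQL, `execute` could be `cursor.execute` on a connection to Manticore's SQL listener (port 9306 by default). Attaching sequentially keeps the order predictable and makes it easy to resume from the last successful statement.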

Also, since you are going to end up with RT indexes, I assume you are going to write to them directly, alongside MongoDB (or instead of it). If that's not the case (and you want to keep syncing from MongoDB to Manticore), then plain indexes alone may be fine, and a main+delta scheme may be more beneficial. We have an interactive course about it: https://play.manticoresearch.com/maindelta/