Tips and discussion: using /dev/shm ramdisk for delta indexes

From what I know and have read, putting Sphinx/Manticore Search indexes on a ramdisk won’t improve performance much, because Linux caches disk efficiently and SSDs are quite fast now.

However, after building one large delta for ease of use, we noticed that rewriting a 700 MB delta every minute amounts to roughly 1 TB of writes per day.
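For anyone who wants to plug in their own numbers, here is a quick sketch of that arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and counts only the final index size, not any temporary files written during the build; the function name is just illustrative.

```python
# Back-of-the-envelope write volume for a delta rebuilt on a fixed schedule.
DELTA_SIZE_MB = 700          # size of one delta rebuild, from the post above
REBUILDS_PER_DAY = 24 * 60   # one rebuild per minute


def daily_writes_tb(delta_mb, rebuilds_per_day):
    """Return the daily write volume in decimal terabytes."""
    return delta_mb * rebuilds_per_day / 1_000_000


print(f"{daily_writes_tb(DELTA_SIZE_MB, REBUILDS_PER_DAY):.3f} TB/day")  # 1.008 TB/day
```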

So we decided to create a directory in /dev/shm and use it only for delta indexes, while keeping the main index on SSD.
We have had no problems with this method and can currently recommend it for reducing the write load on SSDs.
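One thing that may be worth guarding against with this setup is the ramdisk filling up. Below is a minimal sketch of a free-space check to run before each rebuild. The directory name is hypothetical, and the default of two copies is an assumption (that the old and the freshly built delta briefly coexist during rotation), so adjust it to what you actually observe.

```python
import os
import shutil

RAMDISK_DIR = "/dev/shm/manticore"   # hypothetical directory for delta indexes
DELTA_BYTES = 700 * 1000 * 1000      # roughly the delta size mentioned above


def has_room(path, index_bytes, copies=2):
    """True if `path` has at least copies * index_bytes of free space.

    copies=2 is an assumption: old and new index files may both be
    present for a short time while the new delta is swapped in.
    """
    return shutil.disk_usage(path).free >= copies * index_bytes


# Only check when the directory exists, so the sketch runs anywhere.
if os.path.isdir(RAMDISK_DIR):
    print("enough room for a rebuild:", has_room(RAMDISK_DIR, DELTA_BYTES))
```

If the check fails, skipping that rebuild (and alerting) is probably safer than letting the indexer run out of space halfway through.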

Hope this helps others.
If you have an opinion on this, feel free to share it.

Thanks for sharing your experience. Have you tried RT indexes instead of the main+delta schema? Have you tried a main+delta1+…+deltaN schema? I think it may be beneficial in your case, e.g.:

  • rebuild the last-day delta once per hour
  • rebuild the last-12-hours delta every 30 minutes
  • rebuild the last-4-hours delta every 15 minutes
  • rebuild the last-hour delta every minute

or something like that. The number of deltas and the periods can be calculated from the number of incoming documents and the indexing latency you can afford.
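The tiered schedule above can be sketched as a small table plus a function that tells a once-a-minute cron job which deltas to rebuild. The index names are made up for illustration, and the windows and periods are just the example values from the list, not tuned recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    name: str            # hypothetical index name
    window_minutes: int  # how far back this delta reaches
    every_minutes: int   # how often it is rebuilt


SCHEDULE = [
    Delta("delta_day", 24 * 60, 60),
    Delta("delta_12h", 12 * 60, 30),
    Delta("delta_4h", 4 * 60, 15),
    Delta("delta_1h", 60, 1),
]


def due(minute_of_day, schedule=SCHEDULE):
    """Names of the deltas whose rebuild period divides the given minute."""
    return [d.name for d in schedule if minute_of_day % d.every_minutes == 0]


print(due(0))    # all four deltas are due on the hour
print(due(15))   # ['delta_4h', 'delta_1h']
print(due(7))    # ['delta_1h']
```

The caller would then run the indexer for each returned name; how the windows are expressed in each source query is left out here because it depends on your schema.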

Yeah, we considered multiple deltas until we noticed that building the whole delta takes about 30 seconds for about 500-700 MB, so we can rebuild it every minute while keeping the delta accurate.
If it starts taking longer, we would split the deltas, which we were already considering.
Since we are not using RT indexes yet, the above approach seems simple and keeps all the delta data 100% accurate.

We are considering RT indexes in the future but have not gotten around to testing.
We just recently optimized our old indexes.

We have no experience with RT indexes.
If we just change our config to support RT indexes, do you think we could use them as a drop-in replacement (while slowly modifying the app to support RT index changes)?

I think so, as an RT index is actually just a wrapper around a set of plain indexes that automates the maintenance of index partitions (main, delta etc. can be considered partitions too), but it works the same way in terms of all data structures and algorithms.

I’ve also experimented with putting indexes on a ramdisk (partly for the same reason: writes to an SSD degrade its lifespan).

It worked fine, particularly for indexes that are regularly rebuilt (i.e. rebuilt on boot anyway).

It’s not terribly efficient memory-wise, as Manticore loads much of the index into memory anyway, and as far as I could see the ramdisk was cached in the OS-level ‘page cache’ anyway, perhaps even by Manticore’s mmapping too.
… so the index is in memory multiple times.

… in general, using an RT index (even just for the delta, with a RAM chunk big enough to hold it entirely in memory) is more efficient, but can be more work to set up if you are ‘used’ to working with indexer.

… In theory you can pipe indexer --dump-rows (or perhaps --print-rt) to searchd to (re)build a delta index (into a RAM chunk!), but I have never tried it in practice.
That would give you the ‘small’ index held entirely in RAM, once. And it can be rebuilt easily (although you would need to contrive the ‘atomic’ switch that you traditionally get with ‘--rotate’).