How can we export all documents and data from an RT index in plain mode?

regstuff · August 22, 2022, 1:50pm

I’m looking for a way to dump all the data in my index into a CSV file. Can anyone point me in the right direction, preferably using the Python API client?

Thanks

barryhunter · August 24, 2022, 9:02am

You can export any extra fields that have been ‘stored’, either in the docstore or as string attributes.
All attributes are exportable.

Fields that haven’t been explicitly stored can’t be exported. (During indexing the text was ‘tokenized’ and the individual words were inserted into an inverted index. The raw text is discarded.)

I made a PHP script to export an index as a SQL script file.

It can also create a .tsv file of the data. (So you could just pipe the SQL data to /dev/null if you don’t want it!)

It does try to recreate the schema for creating the index in RT-mode. That isn’t well tested.

A very similar node.js script

In concept, though, the script is pretty simple: just run a SELECT * FROM index and write the contents into a file. That could be created quite easily in Python.
The only complication is that you don’t usually do one really massive select with millions of rows. The result set is explicitly limited by max_matches, and raising that to cover the whole index is not recommended. Instead, extract the data in lots of small chunks.
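
As a rough Python sketch of the chunked approach (not a port of the PHP or node.js scripts), something like the following could work. It pages through the index by id rather than by offset, so each query only ever asks for `chunk_size` rows and stays within the default max_matches. The names `export_index` and `run_query` are hypothetical: `run_query(sql)` stands for whatever client you use to send SQL to the daemon, and is assumed to return a list of dicts. The sketch also assumes document ids are positive integers.

```python
import csv


def export_index(run_query, index, out_file, chunk_size=1000):
    """Write every row of `index` to `out_file` as CSV, one chunk at a time.

    `run_query(sql)` is a hypothetical helper supplied by the caller: it
    executes the SQL string and returns the rows as a list of dicts.
    Returns the number of rows written.
    """
    writer = None
    last_id = 0  # assumption: document ids are positive integers
    total = 0
    while True:
        # Page by id instead of OFFSET, so every query is a small select.
        sql = (
            f"SELECT * FROM {index} WHERE id > {last_id} "
            f"ORDER BY id ASC LIMIT {chunk_size}"
        )
        rows = run_query(sql)
        if not rows:
            break
        if writer is None:
            # Take the CSV header from the columns of the first row.
            writer = csv.DictWriter(out_file, fieldnames=list(rows[0].keys()))
            writer.writeheader()
        writer.writerows(rows)
        total += len(rows)
        last_id = int(rows[-1]["id"])
    return total
```

To use it, `run_query` could wrap any MySQL-protocol client connected to the daemon’s SQL port, for example a cursor that returns rows as dicts. Open the output file with `newline=""` so the csv module controls the line endings. As noted above, only attributes and stored fields will appear in the output.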

regstuff · August 24, 2022, 2:03pm

Thanks. Very useful