I’m looking for a way to dump all the data in my index into a CSV. Can anyone point me in the right direction, preferably using the Python API client?
Thanks
You can also export fields that have been ‘stored’, either in the docstore or as string attributes.
All attributes are exportable.
Fields that haven’t been explicitly stored can’t be exported. (During indexing, the text was ‘tokenized’ and the individual words were inserted into an inverted index; the raw text is discarded.)
I made a PHP script to export an index as an SQL script file.
It can also create a .tsv file of the data (so you could just pipe the SQL output to /dev/null if you don’t want it!).
It also tries to recreate the schema for creating the index in RT mode. That part isn’t well tested.
There is also a very similar Node.js script.
In concept, though, the script is pretty simple: just run a `SELECT * FROM index`
and write the contents into a file. That could be created quite easily in Python.
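A minimal sketch of that idea in Python, assuming Manticore/Sphinx is reachable over its MySQL-protocol SQL port so that any DB-API connection (for example one from the third-party `pymysql` package) can be passed in. The function name `dump_query_to_csv` is hypothetical. It writes the column names from `cursor.description` as a header row, then every row the query returns, using the standard `csv` module.

```python
import csv
from typing import Any, TextIO


def dump_query_to_csv(conn: Any, sql: str, out: TextIO) -> int:
    """Run one query and write its result set to `out` as CSV.

    Assumption: `conn` is any DB-API 2.0 connection, e.g. pymysql connected
    to the Manticore/Sphinx SQL port. Returns the number of data rows written
    (the header row is not counted).
    """
    cur = conn.cursor()
    try:
        cur.execute(sql)
        writer = csv.writer(out)
        # cursor.description has one entry per column; item 0 is the name
        writer.writerow([col[0] for col in cur.description])
        rows = cur.fetchall()
        writer.writerows(rows)
        return len(rows)
    finally:
        cur.close()
```

For example: `dump_query_to_csv(pymysql.connect(host="127.0.0.1", port=9306), "SELECT * FROM myindex LIMIT 1000", f)`, with `f` opened via `open("myindex.csv", "w", newline="", encoding="utf-8")`. The host, port and `myindex` are placeholders. As far as I know, a SELECT without an explicit LIMIT only returns the first 20 rows, and LIMIT itself is capped by `max_matches`, so this single-query version only suits small indexes.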
The only complication is that you don’t usually do one really massive SELECT with millions of rows: it’s explicitly limited by `max_matches`,
and even raising that isn’t recommended. Instead, extract the data in lots of small chunks.
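One way to do the chunked extraction is keyset pagination on the document id: ask for `WHERE id > last_id ORDER BY id ASC LIMIT n`, remember the last id seen, and repeat until a short page comes back. This is a sketch, not a tested tool. The function name `export_index_in_chunks`, the default chunk size of 1000, and the assumption that ids are positive integers exposed as a column named `id` are my own choices. It avoids large OFFSETs, which (as I understand it) still count against `max_matches`.

```python
import csv
from typing import Any, TextIO


def export_index_in_chunks(conn: Any, index: str, out: TextIO,
                           chunk_size: int = 1000) -> int:
    """Write every row of `index` to `out` as CSV, `chunk_size` rows per query.

    Assumptions: `conn` is a DB-API 2.0 connection (e.g. pymysql to the
    Manticore/Sphinx SQL port) and document ids are positive integers in a
    column called `id`. Returns the number of data rows written.
    """
    writer = csv.writer(out)
    last_id = 0  # assumes ids start above 0
    total = 0
    id_pos = -1
    header_written = False
    cur = conn.cursor()
    try:
        while True:
            # `index` is interpolated directly: only pass trusted index names.
            cur.execute(
                f"SELECT * FROM {index} WHERE id > {int(last_id)} "
                f"ORDER BY id ASC LIMIT {int(chunk_size)}"
            )
            rows = cur.fetchall()
            if not header_written:
                names = [col[0] for col in cur.description]
                writer.writerow(names)
                id_pos = names.index("id")  # locate the id column by name
                header_written = True
            if not rows:
                break  # no rows left after last_id
            writer.writerows(rows)
            total += len(rows)
            last_id = rows[-1][id_pos]  # resume after the last id seen
            if len(rows) < chunk_size:
                break  # a short page means this was the final chunk
    finally:
        cur.close()
    return total
```

Usage would mirror the single-query case, e.g. `export_index_in_chunks(pymysql.connect(host="127.0.0.1", port=9306), "myindex", f)`, with the same placeholder host, port and index name. Keeping `chunk_size` at or below `max_matches` means no query needs that setting raised.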
Thanks. Very useful