ClickHouse integration

We are moving our corpus to a ClickHouse datastore (it is currently in a MySQL database).

What are the options for full-text indexing of the corpus? Is there a Manticore-ClickHouse connector, or any plan for one?


Hello @gauvins

Our plan is to make Manticore no less performant than ClickHouse for analytical queries. Then, provided Manticore handles full-text search well, using just Manticore should be sufficient in most cases where you need advanced full-text search plus basic analytics, rather than ClickHouse plus something else.

Our columnar storage is key to this plan.

Our tests show Manticore is not bad:

We will soon release implicit secondary indexes, which will make WHERE queries even faster.

Data collection: 1.7B documents from public NYC taxi and for-hire vehicle (Uber, Lyft, etc.) trip data

Disclaimer: ClickHouse and Manticore Search are used here with default settings, just as a newcomer would use them. We believe that any database's default settings should give sufficient performance, without requiring expert tuning to make queries faster.

So if your case is simple analytics + full-text search, Manticore should be good to go. If your case is advanced analytics (JOINs, complex analytical queries, etc.) + simple full-text search, perhaps just filtering by string, ClickHouse may be a compromise. In the above test, that kind of string filtering is what ClickHouse uses for the 2nd query in place of Manticore's MATCH, like this:

SELECT * FROM taxi where match(dropoff_ntaname, '(?i)\\WHarlem\\WEast\\W') or match(pickup_ntaname, '(?i)\\WHarlem\\WEast\\W') LIMIT 20
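
To make clear what that filter accepts, here is a minimal sketch of the same pattern in Python. It assumes Python's `re` behaves like ClickHouse's RE2-based `match()` for this simple pattern (an unanchored search), and that the SQL literal `'\\W'` reaches the regex engine as `\W`. The helper name `regex_filter` and the sample strings are hypothetical and are not taken from the taxi dataset.

```python
import re

# Same pattern as in the ClickHouse query, after SQL string unescaping ('\\W' -> \W).
PATTERN = re.compile(r"(?i)\WHarlem\WEast\W")


def regex_filter(value: str) -> bool:
    """Return True when the pattern is found anywhere in value (unanchored search)."""
    return PATTERN.search(value) is not None


# Hypothetical sample values, chosen only to show the pattern's behaviour.
samples = [
    "North Harlem East Side",  # non-word characters on both sides: accepted
    "north harlem-east side",  # case-insensitive, '-' is a non-word character: accepted
    "Harlem East Side",        # no character before 'Harlem': rejected
    "North Harlem East",       # no character after 'East': rejected
    "North HarlemEast Side",   # no separator between the words: rejected
]

results = {s: regex_filter(s) for s in samples}
for s, matched in results.items():
    print(f"{s!r}: {matched}")
```

Note that `\W` must consume a real character, so a value that starts with "Harlem" or ends with "East" is rejected. A regex filter like this also scans each string value rather than using a full-text index.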