Indexer rotate best strategy

Hello, I have a PostgreSQL database with 18 million records. These records have a field called country_iso_code that specifies the country to which the record belongs.

I am trying to use Manticore Search as a search engine, aiming to use the main-delta schema.

Although all the data is in a single table in PostgreSQL, my intention is to separate them into indices by countries, such as US_MAIN, US_DELTA as local indices to be exposed through a distributed index.

table US_MAIN
{
    type    = plain
    source  = US_MAIN
    path    = /var/lib/manticore/data/US_MAIN
}

table US_DELTA
{
    type    = plain
    source  = US_DELTA
    path    = /var/lib/manticore/data/US_DELTA
    killlist_target = US_MAIN:kl
}

table US {
{
    type    = distributed
    local   = US_MAIN, US_DELTA
}

source US_MAIN: {

   sql_query_pre = INSERT INTO index_deltabreaker (country_iso, instance, index, created_at) \
                        VALUES ('US', '1', 'main', NOW()) \
                        ON CONFLICT (country_iso, instance, index)  \
                        DO UPDATE SET created_at = EXCLUDED.created_at;

   sql_query_range = SELECT EXTRACT(EPOCH FROM (SELECT min_created_at FROM index_metadata WHERE country_iso = 'US')) AS min, EXTRACT(EPOCH FROM (SELECT created_at - INTERVAL '1 second' FROM index_deltabreaker WHERE index='main' AND country_iso = 'US' AND instance = '1')) AS max;

    sql_ranged_throttle = 200

    sql_range_step = 86400

    sql_query = SELECT ID, name, tags,  created_at,  updated_at FROM locations  WHERE country_iso = 'US' AND updated_at >= TO_TIMESTAMP($start) AND updated_at <= TO_TIMESTAMP($end) AND NOT is_deleted AND type_id > 0

    sql_query_post_index = INSERT INTO deltabreaker (country_iso, index, created_at) \
                            VALUES ('US', 'delta', (SELECT created_at FROM deltabreaker WHERE index = 'main' AND country_iso = 'US' )) \
                            ON CONFLICT (country_iso, instance, index) \
                            DO UPDATE SET created_at = EXCLUDED.created_at;
}

source US_DELTA:US_MAIN
{
    sql_query_pre =

    sql_query_range = SELECT (SELECT EXTRACT(EPOCH FROM created_at) FROM deltabreaker WHERE index='delta' AND country_iso = 'US') AS min, EXTRACT(EPOCH FROM NOW()) AS max

    sql_query_killlist = SELECT id FROM locations WHERE country_iso = 'US' AND is_deleted

}

Does anyone have any idea how to improve performance when creating the main index for the first time? It’s taking too long… I’m not sure if the sql_range_step parameter is configured correctly.

The fields I’m using for filters are indexed.

Thanks a lot to all of you,
Juan

Hi.

I would try running the sql_query_range and sql_query queries manually to ensure postgres executes them fast.

Thanks!

Another idea is to add --print-queries to the indexer command, so you can ‘watch’ the SQL queries as they being made, and see if any particular query is particully slow.

There is no real ‘right’ answer for sql_range_step - it varies a lot between databases. Sometimes lots of ‘small’ queries is quicker, sometimes fewer larger ones. Usually it’s some ‘sweatspot’ in the middle. Experiment and benchmark.

But yes, defintly contcentrate on the main sql_query - that is the one that gets run lots of times, make sure it can use database indexes.