sql_attr_multi problem

Trying to index catalog I get this error:

indexing table 'catalog'...
ERROR: table 'catalog': (null).

The problem are the sql_attr_multi lines. If I remove them from all three sources, it indexes fine. It also works, leaving it in if I only use one of three sources in the index block. But once two or three sources are used, it fails. Here is the config file. I should also say that this worked in Sphinx 3.0.1. In case it matters, I am using MariaDB 11.8. I have spent a whole day trying different things. Nothing works. I hope somebody can help me.

searchd {
    listen = 127.0.0.1:9312
    listen = 127.0.0.1:9306:mysql
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    query_log = /var/log/manticore/query.log
    pid_file = /run/manticore/searchd.pid
    #data_dir = /var/lib/manticore/data
}

indexer {
    mem_limit = 2047M
}

source base {
	type			= mysql

	sql_host		= 127.0.0.1
	sql_user		= lybrary_db
	sql_pass		= xxx
	sql_db			= db_lybrary

	
	sql_query_pre		= SET group_concat_max_len = 200000
	sql_query_pre		= SET CHARACTER_SET_RESULTS=utf8mb4
	sql_query_pre		= SET NAMES utf8mb4
}

source catalog: base {
	sql_query		= \
		SELECT id*3, price, price_old, tax_class, date_added, status, image, title, description, description_short, \
		star_rating_html, publisher, isbn, author_id, author_name, multi_author_ids, arg_count, category_id, reviewtext \
		FROM sphinx_search_catalog where status!='0'

	sql_attr_float		= price
	sql_attr_float		= price_old
	sql_attr_uint		= tax_class
	sql_attr_uint		= date_added
	sql_attr_uint		= status
	sql_attr_string		= image
	sql_attr_string		= description_short
	sql_attr_string		= star_rating_html
	sql_attr_uint		= author_id
	sql_attr_string		= multi_author_ids
	sql_attr_uint		= arg_count
	sql_attr_uint		= category_id
	
	sql_attr_multi		= uint cats from query; SELECT products_id*3 as id, categories_id FROM products_to_categories
		
	sql_field_string	= description
	sql_field_string	= title
	sql_field_string	= author_name
	sql_field_string	= isbn
	sql_field_string	= reviewtext
}

source author: base {	
	sql_query		= \
		SELECT manufacturers_id*3+1, 0 as price, 0 as price_old, 0 as tax_class, 0 as date_added, 0 as status, manufacturers_image as image,\
		'' as title, manufacturers_bio as description, '' as description_short, '' as star_rating_html, '' as publisher,\
		'' as isbn, manufacturers_id as author_id, manufacturers_name as author_name, '' as multi_author_ids, 0 as arg_count, 0 as category_id, '' as reviewtext \
		FROM manufacturers where multi_author_ids = '' and active_products > 0
	
	sql_attr_float		= price
	sql_attr_float		= price_old
	sql_attr_uint		= tax_class
	sql_attr_uint		= date_added
	sql_attr_uint		= status
	sql_attr_string		= image
	sql_attr_string		= description_short
	sql_attr_string		= star_rating_html
	sql_attr_uint		= author_id
	sql_attr_string		= multi_author_ids
	sql_attr_uint		= arg_count
	sql_attr_uint		= category_id
	
	#sql_attr_multi		= uint cats from query; SELECT 0, 0
	sql_attr_multi = uint cats from query; SELECT manufacturers_id*3+1 as id, 0 FROM manufacturers WHERE 1=0
	
	sql_field_string	= description
	sql_field_string	= title
	sql_field_string	= author_name
	sql_field_string	= isbn
	sql_field_string	= reviewtext
}

source topic: base {
	sql_query		= \
		SELECT cd.categories_id*3+2, 0 as price, 0 as price_old, 0 as tax_class, 0 as date_added, 0 as status, '' as image, cd.categories_name as title,\
		cd.categories_description as description, '' as description_short, '' as star_rating_html, '' as publisher, '' as isbn, 0 as author_id, \
		'' as author_name, cd.categories_path as multi_author_ids, 0 as arg_count, cd.categories_id as category_id, '' as reviewtext \
		FROM categories_description cd left join categories c on (cd.categories_id=c.categories_id) where c.product_count > 0
	
	sql_attr_float		= price
	sql_attr_float		= price_old
	sql_attr_uint		= tax_class
	sql_attr_uint		= date_added
	sql_attr_uint		= status
	sql_attr_string		= image
	sql_attr_string		= description_short
	sql_attr_string		= star_rating_html
	sql_attr_uint		= author_id
	sql_attr_string		= multi_author_ids
	sql_attr_uint		= arg_count
	sql_attr_uint		= category_id
	
	sql_attr_multi		= uint cats from query; SELECT categories_id*3+2 as id, categories_id FROM categories WHERE product_count > 0
	
	sql_field_string	= description
	sql_field_string	= title
	sql_field_string	= author_name
	sql_field_string	= isbn
	sql_field_string	= reviewtext
}

source mkb {
	type			= mysql

	sql_host		= 127.0.0.1
	sql_user		= lybrary_db
	sql_pass		= xxx
	sql_db			= db_digishelf
	
	sql_query_pre		= SET CHARACTER_SET_RESULTS=utf8mb4
	sql_query_pre		= SET NAMES utf8mb4

	sql_query		= \
		SELECT id, text, products_id, page, publication_year \
		FROM sphinx_search_mkb
	
	sql_field_string	= text
	sql_attr_uint		= products_id
	sql_attr_uint		= page
	sql_attr_uint		= publication_year
}

index catalog {
    source = catalog
    source = author
    source = topic
    path = /var/lib/manticore/catalog
    min_infix_len = 3
    html_strip = 1
    index_sp = 1
    expand_keywords = 1
	
    ignore_chars = U+00AD #soft hyphen to visually break up long words but keep them together as one for search
    wordforms = /var/lib/manticore/wordforms.txt
    exceptions = /var/lib/manticore/exceptions.txt
    blend_chars = -,! #U+21 is exclamation mark; U+3F is question mark
    charset_table = non_cjk
}

index mkb : catalog {
    source = mkb
    path = /var/lib/manticore/mkb
    html_strip = 0
    min_infix_len = 3 #2 does not work properly for exact phrase searches
}

Could you please share your DB dump or (preferably) a minimal reproducible example, so it’s easier to reproduce and debug the issue?

I am sorry, I can’t share the database dump. Creating a smaller version may be possible, but this looks very much like a bug to me because the queries are individually working, only when combining sources they fail.

I have now changed the config file to use

sql_attr_multi = uint cats from field

and changed the sql_query such that with LEFT JOIN, GROUP BY, and GROUP_CONCAT a new field cats is created for ‘catalog’. The other two do not need GROUP_CONCAT since there is either just one category or none. With the “from field” version indexing works. The SQL queries now are:

For ‘catalog’

sql_query               = \
                SELECT id*3, price, price_old, tax_class, date_added, status, image, title, description, description_short, \
                star_rating_html, publisher, isbn, author_id, author_name, multi_author_ids, arg_count, category_id, reviewtext, GROUP_CONCAT(ptc.categories_id) as cats \
                FROM sphinx_search_catalog LEFT JOIN products_to_categories ptc ON ptc.products_id = id where status!='0' GROUP BY id

For ‘author’

sql_query               = \
                SELECT manufacturers_id*3+1, 0 as price, 0 as price_old, 0 as tax_class, 0 as date_added, 0 as status, manufacturers_image as image,\
                '' as title, manufacturers_bio as description, '' as description_short, '' as star_rating_html, '' as publisher,\
                '' as isbn, manufacturers_id as author_id, manufacturers_name as author_name, '' as multi_author_ids, 0 as arg_count, 0 as category_id, '' as reviewtext, 0 as cats \
                FROM manufacturers where multi_author_ids = '' and active_products > 0

And for ‘topic’:

sql_query               = \
                SELECT cd.categories_id*3+2, 0 as price, 0 as price_old, 0 as tax_class, 0 as date_added, 0 as status, '' as image, cd.categories_name as title,\
                cd.categories_description as description, '' as description_short, '' as star_rating_html, '' as publisher, '' as isbn, 0 as author_id, \
                '' as author_name, cd.categories_path as multi_author_ids, 0 as arg_count, cd.categories_id as category_id, '' as reviewtext, cd.categories_id as cats \
                FROM categories_description cd left join categories c on (cd.categories_id=c.categories_id) where c.product_count > 0

Related PR fix: update source schema before query MVA collection by sanikolaev · Pull Request #4600 · manticoresoftware/manticoresearch · GitHub

@lybrary, the issue is fixed in the PR. While it’s being reviewed, you can check if the indexer in the Docker image ghcr.io/manticoresoftware/manticoresearch:test-kit-fix_sql-attr-multi-3141 solves your issue.

@Sergey, thank you! I will be able to test it in a few days. A general question. Is there an advantage to using either field or query for a multiple attribute? Perhaps there are instances where one or the other can’t be done, but if both are possible, is there an advantage to doing one or the other?