"Personalized" facets - intersecting matches with lists of document IDs

csy · April 27, 2018, 6:57pm

Short version: when doing a faceted search, I would also like to facet by lists of document IDs, to get a count of how many of the specified documents are among the matches.

I’ve been using FACET to build facets for a user-facing search, and it works great.

I have many “personalized” facets based on user-specific data that is not present in the search index, such as “liked”, “purchased”, “recently viewed”, etc.

Here are a few ways I have experimented with getting counts for these facets. Is there something that would work better?

1. Use a single search and add facet clauses that look like: FACET IN(id, list-of-document-ids) AS liked. This was relatively slow.
2. Do two searches: the main search, and one extra search that also filters by the union of all the document IDs in the personalized lists and has a FACET IN(id…) clause for each personalized facet. I don’t know much about Manticore internals, but using the document ID as a filter speeds things up a lot.
3. For every personalized facet, fire off a new search that is a copy of the first, with the list of document IDs as an additional filter and max_matches set to 1, so that I get a count of the documents that match the search.
4. Same as 3, but send the searches as a multi-query batch. This didn’t work in previous versions of Sphinx, because adding a different filter (a list of document IDs) would break the multi-query batch, but I think it might be fine in Manticore?
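The first approach and the batched variant of the third can be sketched as plain query builders. This is a minimal Python sketch that only produces SphinxQL text; the helper names (sql_string, id_list, facet_query, count_batch) and the index name products are hypothetical, and any attribute filters from the main search are left out for brevity. For the per-facet copies it swaps in SELECT COUNT(*) instead of max_matches=1 plus the total count, and the batch is simply semicolon-separated statements, which assumes the client is allowed to send multiple statements at once.

```python
from typing import Dict, Iterable, List, Optional


def sql_string(text: str) -> str:
    # Quote text as a SphinxQL string literal, backslash-escaping \ and '.
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def id_list(ids: Iterable[int]) -> str:
    # int() makes sure only integer document IDs end up in the statement.
    return ", ".join(str(int(i)) for i in ids)


def facet_query(index: str, match: Optional[str], facets: Dict[str, List[int]]) -> str:
    """Approach 1: one search with a FACET IN(id, ...) clause per personalized facet."""
    sql = f"SELECT * FROM {index}"
    if match:
        sql += f" WHERE MATCH({sql_string(match)})"
    for alias, ids in facets.items():
        if not ids:
            continue  # IN() needs at least one value; an empty list just means a count of 0
        sql += f" FACET IN(id, {id_list(ids)}) AS {alias}"
    return sql + ";"


def count_batch(index: str, match: Optional[str], facets: Dict[str, List[int]]) -> str:
    """Approaches 3/4 (COUNT(*) variant): one id-filtered copy of the search per facet, as a batch."""
    statements = []
    for alias, ids in facets.items():
        if not ids:
            continue  # same empty-list guard as above
        conditions = [f"id IN ({id_list(ids)})"]
        if match:
            conditions.insert(0, f"MATCH({sql_string(match)})")
        statements.append(
            f"SELECT COUNT(*) AS {alias} FROM {index} WHERE {' AND '.join(conditions)};"
        )
    return " ".join(statements)
```

For example, facet_query("products", "red shoes", {"liked": [1, 5], "purchased": [7]}) returns SELECT * FROM products WHERE MATCH('red shoes') FACET IN(id, 1, 5) AS liked FACET IN(id, 7) AS purchased;. Since IN() evaluates to 1 or 0, each facet result should group the matches into those two values, and the row for 1 is the count of listed documents among the matches.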

adrian · April 27, 2018, 7:49pm

csy wrote: “Use a single search and add facets clause that look like: FACET IN(id, list-of-document-ids) AS liked This was relatively slow.”

How many document IDs did you have in the list? Was it slow because of this facet specifically? Is the optimization enabled (check for ‘multiplier’ in the SHOW META output)?
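Checking for the multiplier can be scripted once the SHOW META result has been fetched. A minimal sketch, assuming the rows arrive as (Variable_name, Value) pairs, for example from a MySQL-protocol client's fetchall(); the helper names are hypothetical, and treating the value as an integer is an assumption.

```python
from typing import Dict, Iterable, Optional, Tuple


def meta_dict(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    # SHOW META returns (Variable_name, Value) rows; index them by name.
    return {name: value for name, value in rows}


def facet_multiplier(rows: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Return the 'multiplier' value from SHOW META, or None if it is absent."""
    value = meta_dict(rows).get("multiplier")
    # Assumes the value is an integer string; absence suggests the optimization did not apply.
    return int(value) if value is not None else None
```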

csy · April 27, 2018, 8:45pm

100-1000 document IDs. I do see the multiplier.

I’m testing with the mysql client to get more specifics, but it looks like the time reported in SHOW META is not correct when facets are involved. Is that right?

Looking at it more, I think it’s full scans without MATCH() that are throwing me off. Those are (as expected) faster when I write the search as a filter on document_id instead of a facet. So in most cases, I should be using one query with FACET IN(id, list-of-document-ids).

I guess I don’t have a question at the moment.