I am trying to figure out the best way to use Manticore for my needs. I have product data that is relatively static and has a single value per product, such as description, brand, etc. Then I have attributes and other data that can have 0-50 values per product. When searching, I want to search across all of this data. On top of this, my products are assigned to warehouses, and I need to be able to filter by warehouse as well as run a text search on some data within the warehouse record (warehouse-specific product data). What is the best approach for this situation that does not give up Manticore's performance by running multiple queries and stepping through the results of each one? Aggregating the data seems like the answer, but then my data multiplies by a factor of several thousand, and I end up with upwards of 90 copies of the same data, where a single row is sometimes already thousands of characters long.
Here is an example:
I have a product record like “wx-202”. It has various descriptive data that I have compiled from different sources. This product is set up in 50 different warehouses, and each warehouse potentially has a different vendor for the product, a different vendor part number, and other warehouse-specific data like prices and bin locations. The product also has 35 attributes (attribute name, attribute value, attribute unit of measure), 7 bullet points, and a table with 4 levels of categorization. A user belongs to a specific warehouse or group of warehouses and should only see products from their warehouse(s). They may also choose to filter by category, brand, or any particular attribute (volume, size, color, etc.).
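To illustrate the scale of the aggregated approach for this example, here is a rough Python sketch of flattening one product into one document per product+warehouse. It does not use Manticore at all; the class names, field names, and placeholder values are all made up for illustration, and only the counts (50 warehouses, 35 attributes, 7 bullets, 4 category levels) come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class WarehouseRecord:
    # Warehouse-specific data for one product (field names are hypothetical).
    whse: str
    vendor: str
    vendor_part_no: str
    price: float
    bin_location: str


@dataclass
class Product:
    # Product-level data that is identical in every warehouse.
    sku: str
    description: str
    brand: str
    categories: list[str]                    # 4 levels of categorization
    bullets: list[str]                       # 7 bullet points
    attributes: list[tuple[str, str, str]]   # (name, value, unit of measure)
    warehouses: list[WarehouseRecord] = field(default_factory=list)


def flatten(product: Product) -> list[dict]:
    """Build one document per product+whse, copying the shared data into each."""
    shared_text = " ".join(
        [product.description, product.brand, *product.categories, *product.bullets]
        + [f"{name} {value} {unit}" for name, value, unit in product.attributes]
    )
    return [
        {
            "key": f"{product.sku}|{w.whse}",   # product+whse is the only unique key
            "whse": w.whse,
            "shared_text": shared_text,          # duplicated in every document
            "whse_text": f"{w.vendor} {w.vendor_part_no} {w.bin_location}",
            "price": w.price,
        }
        for w in product.warehouses
    ]


# Placeholder data sized like the wx-202 example.
product = Product(
    sku="wx-202",
    description="placeholder description",
    brand="placeholder brand",
    categories=[f"cat-level-{i}" for i in range(1, 5)],
    bullets=[f"bullet {i}" for i in range(1, 8)],
    attributes=[(f"attr{i}", f"value{i}", "unit") for i in range(1, 36)],
    warehouses=[
        WarehouseRecord(f"W{i:02d}", f"vendor{i}", f"VPN-{i}", 9.99, f"BIN-{i}")
        for i in range(1, 51)
    ],
)

docs = flatten(product)
print(len(docs))                               # documents produced for one product
print(len({d["shared_text"] for d in docs}))   # distinct versions of the shared text
```

Every document carries the same `shared_text`, so the shared product data is stored once per warehouse rather than once per product. That duplication is what I am worried about.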
Should I aggregate all this data into one index that has a key of product+whse (since that is the only unique identifier for any aggregated product), or should I create separate indexes for each part and relate them to each other in some manner? I imagine this is a fairly common use case for a search engine, but I can’t seem to find any answer that is reliable and can be done without either heavy processing of multiple queries or massive indexes filled with duplicated data.
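For comparison, this is roughly what I picture for the separate-index option, again as a plain Python sketch with in-memory data standing in for two indexes. The data, the field names, and the `search` helper are hypothetical, and simple substring matching stands in for real full-text search.

```python
# Product-level data, stored once per product (hypothetical field names).
products = {
    "wx-202": {"text": "placeholder description brand color red"},
    "wx-303": {"text": "another placeholder description color blue"},
}

# Warehouse-level data, stored once per product+whse.
warehouse_rows = [
    {"sku": "wx-202", "whse": "W01", "text": "vendor-a VPN-1 BIN-7"},
    {"sku": "wx-202", "whse": "W02", "text": "vendor-b VPN-9 BIN-3"},
    {"sku": "wx-303", "whse": "W02", "text": "vendor-a VPN-4 BIN-1"},
]


def search(term: str, allowed_whses: set[str]) -> list[tuple[str, str]]:
    """Return (sku, whse) pairs the user may see where either side matches term."""
    term = term.lower()
    # Query 1: products whose product-level text matches.
    product_hits = {sku for sku, p in products.items() if term in p["text"].lower()}
    # Query 2: warehouse rows the user is allowed to see, matched against
    # either the product hits from query 1 or the warehouse-specific text.
    return sorted(
        (row["sku"], row["whse"])
        for row in warehouse_rows
        if row["whse"] in allowed_whses
        and (row["sku"] in product_hits or term in row["text"].lower())
    )


print(search("red", {"W01"}))
print(search("vendor-a", {"W02"}))
```

Nothing is duplicated here, but every search becomes two lookups plus a merge step in application code, which is the multi-query overhead I would like to avoid.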
Any advice here is appreciated. Thanks