I’m working on a PHP-powered site which uses plain indexes for a product catalog. They want to migrate the site to realtime indexes to avoid having to reindex constantly to keep the catalog up to date. Part of the migration involves figuring out why the catalog queries are returning incomplete data. The old plain indexes are queried via a MySQLi connection and that works fine. However, when I have the catalog use realtime data, things get pretty weird with match queries.
I’m using match queries for 2 operations:
1. find departments based on their tree structure path so we can include child departments,
2. word search. Duh. lol
Example match query: SELECT department_path, COUNT(DISTINCT id) count FROM rt_catalog WHERE MATCH('(@department_path *Path3_*)') GROUP BY department LIMIT 50
Example data for paths:
Path1_
Path2_
Path3_
Path3_4_
Path3_5_
Path6_
Example results from plain index:
Path3_
Path3_4_
Path3_5_
Example results from realtime index:
Path3_
As you can see, the child departments are missing when querying for partial matches.
The word search is also affected by this issue, though I am debugging the issue using the @department_path operator.
Both indexes are set up using the same wordforms file and the same conf file settings. Has anyone else had this issue before? Can you share a known issue or fix, please? I’m currently on Manticore v5, so a little behind on updates but not extremely… and I doubt this issue is caused by that. I’m not using the PHP API because it’s incompatible with their server. I’d rather keep everything in the same SQL syntax style anyway, because that’s way easier to maintain. One thing I noticed is missing when I hit Manticore with SQL queries from PHP is facet results, though. If I run the facet queries in the CLI, they work, so I tried the match queries in the CLI as well… but those return the same results in the CLI as in the PHP script.
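On the point above about both indexes sharing the same settings: one way to double-check what each index actually ended up with, rather than trusting the conf file, is to dump the settings from the SQL interface and compare them. This is only a sketch, and plain_catalog is a placeholder name for the plain index (only rt_catalog is the real name from my query).

```sql
-- Dump the effective settings of each index so they can be compared
-- line by line.
-- NOTE: plain_catalog is a placeholder name for the plain index.
SHOW INDEX rt_catalog SETTINGS;
SHOW INDEX plain_catalog SETTINGS;
```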
SHOW META confirms the issue I described: the docs and hits counts are much lower on the realtime index. There is one odd difference I can’t make sense of:
Realtime index shows keyword[0] of “path3_*”
Plain index shows keyword[0] of “*path3_*”
Notice the extra asterisk at the beginning on plain indexes. That’s suspicious.
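For reference, the SHOW META comparison above can be reproduced roughly like this. It is the same example query from earlier, run once per index, each followed by SHOW META in the same session. plain_catalog is again a placeholder for the plain index name.

```sql
-- Realtime index: run the match query, then read the per-keyword stats.
SELECT department_path, COUNT(DISTINCT id) count FROM rt_catalog WHERE MATCH('(@department_path *Path3_*)') GROUP BY department LIMIT 50;
SHOW META;  -- compare keyword[0], docs[0] and hits[0]

-- Plain index (placeholder name): same query, same check.
SELECT department_path, COUNT(DISTINCT id) count FROM plain_catalog WHERE MATCH('(@department_path *Path3_*)') GROUP BY department LIMIT 50;
SHOW META;
```

SHOW META reports on the most recent query in the session, so it has to be issued right after each SELECT.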
I ran “call keywords” on each index for the given “*Path3_*” example and the output is vastly different between the two. The realtime index shows only the single “path3_” token, meaning only exact matches are found. The plain index shows a bunch of partial matches tokenized as “*path3_*”. Is there anything that could prevent wildcard tokenization on realtime indexes?
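The CALL KEYWORDS check was along these lines. Treat it as a sketch: plain_catalog is a placeholder for the plain index name, and the stats option is optional (I only add it to see docs and hits per token).

```sql
-- Show how each index tokenizes the wildcard term.
-- NOTE: plain_catalog is a placeholder name for the plain index.
CALL KEYWORDS('*Path3_*', 'rt_catalog', 1 AS stats);
CALL KEYWORDS('*Path3_*', 'plain_catalog', 1 AS stats);
```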