Rather than using indexer --print-rt, I tried to write a PHP script that would populate the initial RT index with data from a MySQL database.
I presume the way to do this is to fetch rows from the database with mysqli_query(), then use a while loop to pass the fetched rows to addDocument() one by one, like this:
Unfortunately, the script does not complete and my web server error log shows the following error:
PHP Fatal error: Uncaught Manticoresearch\Exceptions\ResponseException: "unsupported value type" in /home/user/public_html/vendor/manticoresoftware/manticoresearch-php/src/Manticoresearch/Transport/Http.php:127
Stack trace:
#0 /home/user/public_html/vendor/manticoresoftware/manticoresearch-php/src/Manticoresearch/Client.php(357): Manticoresearch\Transport\Http->execute()
#1 /home/user/public_html/vendor/manticoresoftware/manticoresearch-php/src/Manticoresearch/Client.php(189): Manticoresearch\Client->request()
#2 /home/user/public_html/vendor/manticoresoftware/manticoresearch-php/src/Manticoresearch/Index.php(87): Manticoresearch\Client->insert()
#3 /home/user/public_html/populatemanticorertindex.php(92): Manticoresearch\Index->addDocument()
#4 {main}
  thrown in /home/user/public_html/vendor/manticoresoftware/manticoresearch-php/src/Manticoresearch/Transport/Http.php on line 127
where line 92 of populatemanticorertindex.php is the first line of the array assignment inside the addDocument() call.
I can’t tell exactly what causes the script to crash. I’m not sure how to determine which iteration of the while loop (i.e., which database table row) is the problematic one. What code should I be putting in as my error handling to help me debug?
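One way to find the offending row is to wrap each insert in a try/catch and record the iteration number together with the row that failed. The following is only a sketch: addRowsWithDiagnostics() and its parameters are hypothetical names, and the callable stands in for whatever addDocument() call the real loop makes.

```php
<?php
// Hypothetical helper: runs $insert on every row and collects details
// about any row whose insert throws, instead of aborting the script.
function addRowsWithDiagnostics(iterable $rows, callable $insert): array
{
    $failures = [];
    $rowNumber = 0;
    foreach ($rows as $row) {
        $rowNumber++;
        try {
            $insert($row);
        } catch (\Throwable $e) {
            // Keep the iteration number, the message, and the row itself.
            $failures[] = [
                'row_number' => $rowNumber,
                'error'      => $e->getMessage(),
                'row'        => $row,
            ];
            error_log("Row {$rowNumber} failed: " . $e->getMessage()
                . ' ' . var_export($row, true));
        }
    }
    return $failures;
}
```

In the script above, the callable would be a closure around the addDocument() call, and the rows would come from the mysqli_fetch_assoc() loop. Catching \Throwable is deliberately broad here so that the loop keeps going and every bad row is reported in one run.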
Also, if my script is not the right way to go about initially populating an RT index, please let me know the better way to do it.
It turned out that the first row that tried to put a NULL value into a field defined (using create()) as ['type' => 'text'] or ['type' => 'int'] would crash the whole thing.
I debugged it by putting a test conditional with a ++ incremented counter in the while loop that stops the loop when the counter goes above a chosen value. I thus implemented my own manual binary search to home in on the last working counter value, and printed out (using PHP’s var_export()) the next (i.e., the first non-working) $current_row. I could see that one of the fields in the first non-working $current_row was NULL.
So doing something like this seems to work better:
(presuming that field2 is ['type' => 'text'] and field3 is ['type' => 'int']).
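A guard of that kind can be sketched as follows. buildDocument() and field1 are hypothetical names used for illustration; field2 and field3 follow the types assumed above. NULL text becomes an empty string and NULL integers become 0.

```php
<?php
// Hypothetical helper: builds the document array for one fetched row,
// replacing NULL values with type-appropriate defaults.
function buildDocument(array $current_row): array
{
    return [
        'field1' => $current_row['field1'],
        // text field: fall back to an empty string when the column is NULL
        'field2' => $current_row['field2'] ?? '',
        // int field: fall back to 0, and cast numeric strings to int
        'field3' => (int) ($current_row['field3'] ?? 0),
    ];
}
```

Note that this makes a NULL integer indistinguishable from a real 0 in the index, so it only suits data where that distinction does not matter.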
I’m not sure why this breaks in the Manticore PHP client when indexing the same data doesn’t break the command-line indexer. It seems to me that the Manticore PHP client should handle these situations smoothly; that is, it should allow NULL database values without causing a fuss.
It may also be worth mentioning that I had to override the default maximum memory and time limit values in my script:
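Overrides of this kind usually go at the top of the script. The values below are placeholders, not the ones used in the original script, and would need adjusting to the size of the table being indexed.

```php
<?php
// Placeholder value: raise the memory ceiling for this script only.
ini_set('memory_limit', '1024M');

// 0 removes the execution time limit for this script.
set_time_limit(0);
```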
One issue here is that the ID that is stored in the RT table isn’t the value that I put in. It’s a 19-digit number instead. For example, if $id is 3391, it ends up being stored as 8793066877211036762. How can I make sure that I can later recover the same ID that I try to put into the table?
EDIT: Never mind. For some reason, casting the ID to an int before putting it in the table did the trick:
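A cast like that can be made a little safer by checking the value first. This is a sketch only: toDocumentId() is a hypothetical helper, and why the client behaves differently for non-integer IDs is not established here. It is relevant when IDs come from mysqli_query() results, which return integer columns as strings unless MYSQLI_OPT_INT_AND_FLOAT_NATIVE is enabled.

```php
<?php
// Hypothetical helper: turns an ID into a PHP int, rejecting anything
// that is not an integer or a string of decimal digits.
function toDocumentId($rawId): int
{
    if (is_int($rawId)) {
        return $rawId;
    }
    if (is_string($rawId) && ctype_digit($rawId)) {
        return (int) $rawId;
    }
    throw new InvalidArgumentException(
        'Not a valid document ID: ' . var_export($rawId, true)
    );
}
```

The result would then be passed as the second argument to addDocument(), in place of the raw $id.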
What if instead of declaring the $id in PHP you retrieve it from a MySQL database? Even if it is an unsigned integer in MySQL, will it be retrieved as text instead of an integer, and could that be part of the problem?
Curiously, I noticed this discussion in which smazur reported having to cast the $id.
Sorry, I can’t explain it. My script looks like yours, except that I connect to a MySQL database with mysqli, set the charset to 'utf8', query the database for my data (including the unsigned integer ID), step through the result row by row with mysqli_fetch_assoc($result) in a while loop, and put the data and the ID into the RT index using addDocument().
Possibly this was a bug that was fixed in a newer version of the PHP client (given that it’s in the changelog of 3.0.0)?