Running 'DESCRIBE' against a remote distributed index.

Assuming a distributed index that contains only agent (‘remote’) indexes, is there a trick to get the list of fields/attributes? (Like the ‘Balancer’ in the Helm chart setup!)

A normal DESCRIBE index only gives you a list of agents. If it's a distributed index with ‘local’ indexes, you can of course just run DESCRIBE against one of the local indexes.

… but that's trickier with ‘remote’ agents. Technically I guess you could parse out the address of the remote agent(s) and then reconnect directly to one. But that seems messy.
(Conceptually, the IP address the ‘balancer’ uses to connect to agents might not even be routable from clients.)
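
The "parse out the address" idea could be sketched roughly like this in Python. It assumes the Agent column of DESCRIBE holds strings of the form host:port:index and that remote rows have a Type starting with "remote"; both are assumptions about the output format, and the sample rows are made up.

```python
from typing import List, NamedTuple, Tuple


class AgentAddress(NamedTuple):
    host: str
    port: int
    index: str


def parse_agent(spec: str) -> AgentAddress:
    """Split an agent string, assumed to look like 'host:port:index'."""
    host, port, index = spec.rsplit(":", 2)
    return AgentAddress(host, int(port), index)


def remote_agents(describe_rows: List[Tuple[str, str]]) -> List[AgentAddress]:
    """Keep only rows whose Type starts with 'remote' (assumed naming)."""
    return [
        parse_agent(agent)
        for agent, kind in describe_rows
        if kind.startswith("remote")
    ]


# Hypothetical (Agent, Type) rows as DESCRIBE might return them.
rows = [("local_idx", "local"), ("10.0.0.5:9312:shard1", "remote_1")]
agents = remote_agents(rows)
print(agents)
```

Even when the parsing works, the port in the agent string is the one searchd uses to reach the agent, which may not be the port a SQL client should connect to, and the host may not be reachable from the client at all.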

I guess this is probably a feature request for something like DESCRIBE index REAL, which would always return the fields/attributes of the index, even for distributed indexes. I.e. a distributed index gets the definition from one of its agents, even if that agent is remote. Or I suppose it could ‘reuse’ the PQ syntax for getting the underlying table: desc pq table

This looks like a good task for Buddy: manticoresoftware/manticoresearch-buddy on GitHub (Manticore Buddy is a Manticore Search sidecar which helps it with various tasks).

And since you know PHP you can even try to implement it yourself and make a PR.

However, it requires more thinking. For example, what should be in the final description if the tables have different schemas? Should it fail, return multiple result sets, try to combine the different schemas in a single result set, or show a random one, with or without a warning?

Good point about Buddy. Alas, I’ve tried to understand the code, and it’s beyond me. I don’t really ‘get’ OOP code.

And yes, I was just assuming the index schemas would be the same on all agents (certainly the assumption if using ‘mirror’ indexes), but they could be different. Honestly I don't know what would happen with SELECT * FROM dist, for example, if the attributes differ.

Personally I would say it's OK just to return a single schema at random, but ultimately I suppose there should be a way to specify which agent you want. Something like ‘DESCRIBE REMOTE dist1 remote_1’, the last part being the shard reference seen in DESCRIBE dist1.

… it would also have to figure out which agent(s) are currently ‘alive’, which I think can be figured out from ‘show agent status’, and implement ‘failover’, whereas searchd should have most of that information and logic already.
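
A minimal sketch of that failover idea, in Python: try each agent in turn and return the first schema that comes back. The describe callable, the agent names and the fake schema are all hypothetical stand-ins; nothing here talks to searchd or reads ‘show agent status’.

```python
from typing import Callable, Iterable, List, Tuple

Schema = List[Tuple[str, str]]  # (field, type) pairs


def describe_with_failover(
    agents: Iterable[str],
    describe: Callable[[str], Schema],
) -> Tuple[str, Schema]:
    """Return (agent, schema) from the first agent that answers."""
    errors = []
    for agent in agents:
        try:
            return agent, describe(agent)
        except ConnectionError as exc:
            # Remember why this agent failed and move on to the next one.
            errors.append(f"{agent}: {exc}")
    raise ConnectionError("no agent answered: " + "; ".join(errors))


def fake_describe(agent: str) -> Schema:
    """Stand-in for a real DESCRIBE call; remote_1 is pretended dead."""
    if agent == "remote_1":
        raise ConnectionError("agent down")
    return [("id", "bigint"), ("title", "text")]


chosen, schema = describe_with_failover(["remote_1", "remote_2"], fake_describe)
print(chosen, schema)
```

This only covers the "skip dead agents" part. Ordering the agents by recent health, which is what ‘show agent status’ could feed into, is left out.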

But I suppose once implemented, the ‘proxy to agents’ logic could be used to make Buddy provide the missing proxy functionality, e.g. in QSUGGEST.

Good point about Buddy. Alas, I’ve tried to understand the code, and it’s beyond me. I don’t really ‘get’ OOP code.

Do you think it would help if we made all the functionalities available in Buddy plugins that you could use as an example?

Personally I would say it's OK just to return a single schema at random

This shouldn’t be a big deal to implement:

--------------
desc d
--------------

+-------+-------+
| Agent | Type  |
+-------+-------+
| t     | local |
| t1    | local |
+-------+-------+
2 rows in set (0.00 sec)

--------------
desc d table
--------------

+-------+--------------+----------------+-------+
| Field | Type         | Properties     | Agent |
+-------+--------------+----------------+-------+
| id    | bigint       |                | all   |
| f     | text         | indexed stored | all   |
| f2    | text         | indexed stored | t1    |
| a     | uint         |                | all   |
| c     | uint         |                | t1    |
| b     | uint, float  |                | all   |
+-------+--------------+----------------+-------+
6 rows in set (0.00 sec)
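
One way a combined result like the one above could be computed, sketched in Python. The per-agent schemas for t and t1 are guesses reconstructed from the output; the row order here follows first appearance and may differ from what a real implementation prints.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str, str]  # (field, type, properties)


def merge_schemas(schemas: Dict[str, List[Column]]) -> List[Tuple[str, str, str, str]]:
    """Combine per-agent schemas into (field, types, properties, agents) rows."""
    merged: Dict[str, dict] = {}
    for agent, columns in schemas.items():
        for field, ctype, props in columns:
            entry = merged.setdefault(
                field, {"types": [], "props": props, "agents": []}
            )
            if ctype not in entry["types"]:
                entry["types"].append(ctype)  # conflicting types are listed together
            entry["agents"].append(agent)

    rows = []
    for field, entry in merged.items():
        on_all = len(entry["agents"]) == len(schemas)
        agents = "all" if on_all else ", ".join(entry["agents"])
        rows.append((field, ", ".join(entry["types"]), entry["props"], agents))
    return rows


# Hypothetical schemas of the two local tables behind the distributed table d.
schemas = {
    "t": [("id", "bigint", ""), ("f", "text", "indexed stored"),
          ("a", "uint", ""), ("b", "uint", "")],
    "t1": [("id", "bigint", ""), ("f", "text", "indexed stored"),
           ("f2", "text", "indexed stored"), ("a", "uint", ""),
           ("c", "uint", ""), ("b", "float", "")],
}
rows = merge_schemas(schemas)
for row in rows:
    print(row)
```

The sketch keeps the properties of the first agent that defines a field; what to do when properties differ between agents is one more open question of the same kind as differing types.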

Re Buddy, part of the problem is I don't really know what I don't know. Which is kind of the same problem: the developers won't know what isn't clear to others either.

I found a couple of examples, like

and can mostly follow Request.php. Executor is a bit more mysterious. I can see it runs a ‘SHOW TABLES’, but it just seems to return that directly. Surely ‘Show Full Tables’ does more than that, e.g. getting extra details from ‘show $index status’?

ShowQueries is perhaps a better example, although still confusing. It seems to get JSON which it has to decode, but then returns an ‘array’ struct.

Tried to follow what HTTPClient does, but it's not totally clear. I guess we need to create a response in the same format as HTTP. Will need to figure out the ‘columns’ array, e.g. what can be used in ‘type’ (string, long etc).
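
For orientation, here is a rough Python sketch of a result set with a ‘columns’ array and matching data rows. The overall layout (columns, data, total, error, warning) is an assumption modelled on the JSON that raw SQL over HTTP returns, not something confirmed from Buddy's code, and only the ‘string’ type is used because the full list of valid types is exactly the open question.

```python
import json
from typing import List, Sequence, Tuple


def build_response(column_types: List[Tuple[str, str]], rows: List[Sequence]) -> list:
    """Build a one-result-set response in the assumed HTTP-style layout."""
    # Each column is a single-key object: {name: {"type": ...}}.
    columns = [{name: {"type": ctype}} for name, ctype in column_types]
    names = [name for name, _ in column_types]
    # Each data row is an object keyed by column name.
    data = [dict(zip(names, row)) for row in rows]
    return [{
        "columns": columns,
        "data": data,
        "total": len(data),
        "error": "",
        "warning": "",
    }]


response = build_response(
    [("Field", "string"), ("Type", "string")],
    [("id", "bigint"), ("f", "text")],
)
print(json.dumps(response, indent=2))
```

Comparing this against what an existing Executor actually returns would be the quickest way to confirm or correct the layout.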

… I suppose what would help the most is ‘documentation’ for the few built-in commands. At the moment, I am trying to deduce what they ‘do’ from the code. But as I don't understand the code, I'm missing detail. If I knew exactly what the functions were doing, e.g. the exact response that ‘Show Queries’ returns, then I could compare and find the code that implements each part of the process.

Task also seems an important class, but I don't know its interface.

Finally, I can’t figure out how I would ‘include’ the newly defined class, so the new command can be ‘found’. There is the reference in extractCommandFromRequest, but it doesn't actually load any code. Looks like part of that might be adding to

Would expect

to find something - ie to load all the Executor.php files.

Thank you for the feedback. I’ll discuss it with the project lead and will try to do something to make it easier to get started with Buddy.

Hi there.

Most of your points are clear and reasonable. We are still working on improving things and making it simple to contribute. We much appreciate your feedback because it will help us move towards a better implementation that is so easy to follow that it's hard to resist building on it :)

There is a helper script to create a new command to implement. It’s in another branch that is still WIP and not merged yet, but it’s already safe to use: create-command on the feature/mysqldump branch of manticoresoftware/manticoresearch-buddy on GitHub.

Just download it, put it in bin/create-command and run chmod +x bin/create-command. Usage is straightforward: bin/create-command CreateDatabase, for example. It will create all the boilerplate for the new command that you need to implement and put some comments in the code. That makes it much easier to get started and understand things.

Also, I recommend checking the section that explains the flow of creating a new command manually, in manticoresoftware/manticoresearch-buddy on GitHub.

Task is a particular class that helps to organize things and run the “execution” of the command in parallel in a threaded runtime. That’s why there is a Request to parse and prepare data for the execution, and an Executor that creates a Task with the logic, wrapped in a Closure, to run in parallel later.
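
The Request / Executor / Task split described here can be illustrated with a small Python analogy. These are not Buddy's PHP classes or their real interfaces: the class names are borrowed only to mirror the roles, a Future from a thread pool stands in for Task, and the ‘desc d table’ syntax is taken from the example earlier in the thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Request:
    """Parsed, validated input for one command (illustrative only)."""
    table: str

    @classmethod
    def from_query(cls, query: str) -> "Request":
        words = query.strip().rstrip(";").split()
        is_desc = len(words) == 3 and words[0].lower() in ("desc", "describe")
        if not is_desc or words[2].lower() != "table":
            raise ValueError(f"not a 'desc <name> table' query: {query!r}")
        return cls(table=words[1])


class Executor:
    """Wraps the command logic in a closure and schedules it on a pool."""

    def __init__(self, request: Request, pool: ThreadPoolExecutor) -> None:
        self.request = request
        self.pool = pool

    def run(self) -> Future:
        table = self.request.table

        def work() -> dict:
            # The real command logic would go here; this just echoes the input.
            return {"table": table, "handled_by": "worker"}

        # The returned Future plays the role of the Task.
        return self.pool.submit(work)


with ThreadPoolExecutor(max_workers=2) as pool:
    task = Executor(Request.from_query("desc d table"), pool).run()
    result = task.result()
print(result)
```

The point of the split is that parsing happens up front and cheaply, while the potentially slow work is deferred into something that can run alongside other commands.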

The Executor should return a structure that follows the standard format, which you can explore in this example: manticoresearch-buddy/Executor.php at main in manticoresoftware/manticoresearch-buddy on GitHub.

For sure, it needs refactoring and better interfaces to make it easy to return data.

We still do not have documentation for the internal methods, but try to follow my steps, and if there is something you did not get or you have any questions, feel free to ask. While we are working on docs and moving to a pluggable design, which will hopefully make things much more straightforward, I’m here to help.

Thanks again for pointing out the weak points; that will help us make contributing easier and understand what we’ve missed. But it’s still early, and we’re still working on it :)