I have some basic UDFs that keep track of results to help implement various kinds of shuffling, etc.
Someone has just reported that they no longer work with pseudo_sharding (enabled by default in recent versions): Manticore crashes.
The plugins work by setting up a std::map in ‘udf_init’ and deallocating it in ‘udf_deinit’.
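For readers who have not seen this pattern, here is a minimal sketch of the init/call/deinit lifecycle described above. It is not the actual plugin code: `UdfInit` is a hypothetical stand-in that models only the `func_data` pointer mentioned later in this thread, and the `counter_*` names and the key type are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the per-query state the server hands to a UDF.
// Only the func_data pointer is modelled here.
struct UdfInit {
    void* func_data = nullptr;
};

// Assumed shape of the bookkeeping: how many times each key has been seen.
using Counts = std::map<long long, int>;

// Runs at the start of a query: allocate the map and stash it in func_data.
int counter_init(UdfInit* init) {
    init->func_data = new Counts();
    return 0; // 0 means success in this sketch
}

// Runs for each row: bump the count for this key and return the new count.
int counter_call(UdfInit* init, long long key) {
    Counts* counts = static_cast<Counts*>(init->func_data);
    assert(counts != nullptr); // init must have run first
    return ++(*counts)[key];
}

// Runs at the end of a query: free the map.
void counter_deinit(UdfInit* init) {
    delete static_cast<Counts*>(init->func_data);
    init->func_data = nullptr;
}
```

The design relies on init and deinit being paired exactly once around all calls, and on calls never overlapping, which is exactly what is in question here.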
There is very little detail on how pseudo_sharding works, but I am assuming it is doing some sort of multi-threading.
Are there any special considerations UDFs need to take in light of the new pseudo_sharding? Could it be that _init and _deinit are not getting called exactly once per query?
The actual update the code runs is a single ++var, so in theory it should be atomic?
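On the atomicity point: in C++, `++` on a plain integer shared between threads is a read-modify-write and is not guaranteed to be atomic, and inserting into a `std::map` from several threads at once is not safe either. A minimal sketch of an increment that is atomic, assuming the counter were a single value rather than a map entry (`RowCounter` and `next_row` are hypothetical names):

```cpp
#include <atomic>

// Hypothetical per-query state holding one counter.
struct RowCounter {
    std::atomic<long long> seen{0};
};

// Atomically increment the counter and return the new value.
// fetch_add returns the previous value, hence the + 1.
long long next_row(RowCounter* counter) {
    return counter->seen.fetch_add(1, std::memory_order_relaxed) + 1;
}
```

This only covers a lone counter; a counter stored inside a map still needs the map itself to be protected.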
I’ve asked the author of the pseudo sharding to comment on this.
No, you don’t need to change your code to work with pseudo_sharding. _init and _deinit are still called once per query, but multiple queries are run instead of just one. So each query has its unique func_data.
OK, thanks. That breaks the logical function of the UDFs: the point of the UDF is to ‘count’ results in the query, so it is intended to have only one instance per query.
… but that would just miscount. It doesn’t explain why it is crashing, though.
It seems we will need a minimal reproducible example to debug this.
Actually you can think of pseudo_sharding as running a query against an rt index that has several identical disk chunks. They work basically the same. When disk chunks are queried (in an rt index), each thread creates a clone of all expressions (including UDFs).
However, after having a look at the code, it seems that UDFs are a special case. When UDF expressions are cloned, __init is not called (and __deinit is called for each cloned copy).
As __init is not called for each thread, no unique func_data is provided for each thread, so the UDF function can indeed be called from several threads at once.
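Given that, one defensive option on the UDF side is to guard the shared state with a mutex. The following is a rough sketch only, assuming the cloned copies end up sharing one state object (the thread does not confirm this); `SharedCounts` and `bump_locked` are hypothetical names, not part of any Manticore API.

```cpp
#include <map>
#include <mutex>

// Hypothetical state object that several threads may reach through func_data.
struct SharedCounts {
    std::mutex lock;
    std::map<long long, int> counts;
};

// Bump the count for a key while holding the lock, so concurrent callers
// cannot corrupt the map. Returns the new count for that key.
int bump_locked(SharedCounts* state, long long key) {
    std::lock_guard<std::mutex> guard(state->lock);
    return ++state->counts[key];
}
```

Locking only protects the per-row calls. If deinit really runs once per cloned copy, a deinit that frees the state could still run more than once, so this may not be enough on its own to stop the crash.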