Geodist Against Document JSON Lat Lon Pairs

Richard_Drage · January 28, 2022, 7:28am

Hi,

I’m trying to select documents with multiple lat/lon pairs using GEODIST. The documentation says "If the document has multiple locations, they can be embedded in a JSON object as float pairs", but it doesn’t specify how this should be done.

I’ve tried both lat/lon pairs in a single JSON attribute, [[52.673, -1.18767], [52.7208, -1.36021]]

SELECT *, GEODIST(50.407505, -5.264365, latlon_json, {in=degrees, out=miles}) AS distance FROM notices ORDER BY DISTANCE ASC

and lat and lon in separate attributes, [50.407505, 50.1682] and [-5.264365, -5.10308]

SELECT *, GEODIST(50.407505, -5.264365, lat_json, lon_json, {in=degrees, out=miles}) AS distance FROM notices ORDER BY DISTANCE ASC

but distance is always returned as 3485.184326

Any assistance greatly appreciated.

Sergey · January 29, 2022, 4:21am

If the document has multiple locations, they can be embedded in a JSON object as float pairs, it doesn’t specify how this should be done.

Good question. I’ll check with the team on this next Monday.

Sergey · January 29, 2022, 4:33am

But I’m afraid what is meant is that if a document includes multiple locations, it may be convenient to store them in a JSON attribute, but that doesn’t change the way you access the coordinates, i.e. it doesn’t change the behaviour of GEODIST(), nor does it do any other magic. E.g. you can just do:

mysql> drop table if exists t; create table t(f text, j json); insert into t(j) values('{"coord1":[54.833941, 83.096050],"coord2":[54.785440, 83.196717]}'); select *, geodist(54.808514, 83.134377,j.coord1[0],j.coord1[1],{in=degrees, out=km}) dist from t; select *, geodist(54.808514, 83.134377,j.coord2[0],j.coord2[1],{in=degrees, out=km}) dist from t;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t(f text, j json)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t(j) values('{"coord1":[54.833941, 83.096050],"coord2":[54.785440, 83.196717]}')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select *, geodist(54.808514, 83.134377,j.coord1[0],j.coord1[1],{in=degrees, out=km}) dist from t
--------------

+---------------------+-----------------------------------------------------------------+------+------------+
| id                  | j                                                               | f    | dist       |
+---------------------+-----------------------------------------------------------------+------+------------+
| 1514665016379310097 | {"coord1":[54.833941,83.096050],"coord2":[54.785440,83.196717]} |      | 3.75286126 |
+---------------------+-----------------------------------------------------------------+------+------------+
1 row in set (0.01 sec)

--------------
select *, geodist(54.808514, 83.134377,j.coord2[0],j.coord2[1],{in=degrees, out=km}) dist from t
--------------

+---------------------+-----------------------------------------------------------------+------+------------+
| id                  | j                                                               | f    | dist       |
+---------------------+-----------------------------------------------------------------+------+------------+
| 1514665016379310097 | {"coord1":[54.833941,83.096050],"coord2":[54.785440,83.196717]} |      | 4.76190233 |
+---------------------+-----------------------------------------------------------------+------+------------+
1 row in set (0.00 sec)
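
As a rough cross-check of the two results above, here is a minimal Python sketch that applies the plain haversine formula to the same JSON document. The Earth radius constant and the choice of haversine are assumptions; GEODIST() may use a different method or constant internally, so the numbers should land close to 3.75 km and 4.76 km without matching them exactly.

```python
import math

# Mean Earth radius in km. This is an assumption; Manticore's constant may differ.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same document and origin point as in the example above.
doc = {"coord1": [54.833941, 83.096050], "coord2": [54.785440, 83.196717]}
origin = (54.808514, 83.134377)

# One distance per named pair, mirroring j.coord1[0], j.coord1[1] and so on.
distances = {name: haversine_km(*origin, *pair) for name, pair in doc.items()}
for name, km in distances.items():
    print(f"{name}: {km:.3f} km")
```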

Richard_Drage · January 29, 2022, 7:47am

Hi Sergey,

Perfect, huge thanks for such a quick response. I have around 5 million documents with several geo coordinates each, and search results are returned in around a second, which is very impressive.

Sergey · January 29, 2022, 9:11am

search results are returned in around a second

Try searchd.pseudo_sharding=1, chances are it can be even faster.

Richard_Drage · January 29, 2022, 5:55pm

Down to just over half a second, which is incredible, thanks.

Richard_Drage · January 30, 2022, 11:48am

Hi Sergey,

Just wondered if you could think of a better way to select the minimum distance in the following query?
MySQL has a LEAST function that takes multiple values, but Manticore’s LEAST function requires a JSON array, and using multiple MINs doubles the query time.

SELECT *,
GEODIST(53.14208, -2.48365, latlon_json.1[0], latlon_json.1[1], {in=degrees, out=miles}) AS d1,
GEODIST(53.14208, -2.48365, latlon_json.2[0], latlon_json.2[1], {in=degrees, out=miles}) AS d2,
GEODIST(53.14208, -2.48365, latlon_json.3[0], latlon_json.3[1], {in=degrees, out=miles}) AS d3,
GEODIST(53.14208, -2.48365, latlon_json.4[0], latlon_json.4[1], {in=degrees, out=miles}) AS d4,
MIN(d1, d2) as d_r2,
MIN(d_r2, d3) as d_r3,
MIN(d_r3, d4) as d
FROM notices WHERE d<20

Many thanks again

Richard

Sergey · January 31, 2022, 9:11am

If the document has multiple locations, they can be embedded in a JSON object as float pairs

I’ve discussed this with the team. It means you can use just a single JSON attribute instead of N float attributes.

MIN/LEAST don’t support more than 2 arguments, but you can do MIN(MIN(... etc.
It shouldn’t make the query faster, but on the other hand this:

MIN(d1, d2) as d_r2,
MIN(d_r2, d3) as d_r3,
MIN(d_r3, d4) as d

shouldn’t have made it slower. At least not 2 times slower compared to ordering by just a single MIN() result. Please provide full query/result examples with SHOW META after the queries.
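
Since MIN() takes only two arguments, the nested MIN(MIN(... form has to be written out by hand or generated. A minimal Python sketch of generating it on the client side follows; the helper name nested_min is hypothetical and the aliases d1, d2, ... are simply those used in the queries in this thread.

```python
from functools import reduce


def nested_min(aliases):
    """Fold a list of column aliases into nested two-argument MIN() calls."""
    if not aliases:
        raise ValueError("at least one alias is required")
    # reduce() starts with the first alias and wraps it once per remaining alias.
    return reduce(lambda acc, alias: f"MIN({acc}, {alias})", aliases)


aliases = [f"d{i}" for i in range(1, 5)]
print(nested_min(aliases) + " AS d")
# prints: MIN(MIN(MIN(d1, d2), d3), d4) AS d
```

This only builds the expression text. It does not change how the server evaluates it, so it is a convenience rather than an optimisation.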

Richard_Drage · January 31, 2022, 11:55am

Hi Sergey,

Re: min, your example is much neater but, as you say, doesn’t change the query speed.

Query 1 (with min)

SELECT id, firstname, surname, notices_latlon_json, published_online
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.1[0], notices_latlon_json.1[1], {in=degrees, out=miles}) AS d1
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.2[0], notices_latlon_json.2[1], {in=degrees, out=miles}) AS d2
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.3[0], notices_latlon_json.3[1], {in=degrees, out=miles}) AS d3
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.4[0], notices_latlon_json.4[1], {in=degrees, out=miles}) AS d4
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.5[0], notices_latlon_json.5[1], {in=degrees, out=miles}) AS d5
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.6[0], notices_latlon_json.6[1], {in=degrees, out=miles}) AS d6
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.7[0], notices_latlon_json.7[1], {in=degrees, out=miles}) AS d7
, MIN(MIN(MIN(MIN(MIN(MIN(d1, d2), d3), d4), d5), d6), d7) as d
FROM noticesdev
WHERE published_online < 1643673600 AND notices_latlon_json!='{}' AND d<=20
ORDER BY d ASC, published_online_bigint DESC
LIMIT 0, 500 OPTION max_matches=500;

SHOW META;

total	500
total_found	391500
time	1.125

Query 2 (without min and using first GEODIST to produce d)

SELECT id, firstname, surname, notices_latlon_json, published_online
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.1[0], notices_latlon_json.1[1], {in=degrees, out=miles}) AS d
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.2[0], notices_latlon_json.2[1], {in=degrees, out=miles}) AS d2
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.3[0], notices_latlon_json.3[1], {in=degrees, out=miles}) AS d3
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.4[0], notices_latlon_json.4[1], {in=degrees, out=miles}) AS d4
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.5[0], notices_latlon_json.5[1], {in=degrees, out=miles}) AS d5
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.6[0], notices_latlon_json.6[1], {in=degrees, out=miles}) AS d6
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.7[0], notices_latlon_json.7[1], {in=degrees, out=miles}) AS d7
FROM noticesdev
WHERE published_online < 1643673600 AND notices_latlon_json!='{}' AND d<=20
ORDER BY d ASC, published_online_bigint DESC
LIMIT 0, 500 OPTION max_matches=500;

SHOW META;

total	500
total_found	390782
time	0.307

I wrote a simple UDF to return the min of the passed args, but there seems to be an issue assigning the result to a variable, so I raised a bug report:

github.com/manticoresoftware/manticoresearch: "UDF crash when returning value as variable 4.2.0"
Opened 31 Jan 2022, 10:35AM UTC by prowebuk; closed 30 Apr 2022, 12:41AM UTC. Labels: bug, waiting for reply, wontfix.

Created a simple UDF function (below) to calculate the power of two numbers and …the following query is successful

CREATE FUNCTION udf_test RETURNS FLOAT SONAME 'udf_test.so'
select udf_test(1.1, 3.1)

Result: udf_test(1.1, 3.1): 1.34374654

But fails when trying to assign the result to a variable as per the example: SELECT *, AVGMVA(tag) AS q from test1;

select udf_test(1.1, 3.1) AS pw;
sphinxql: syntax error, unexpected $end, expecting FROM or ',' near 'pw'

OR with an index

select udf_test(1.1, 3.1) AS pw FROM noticesdev;
1064 - index noticesdev: parse error: Sphinx expr: syntax error, unexpected '(', expecting $end near '1.1, 3.1)'

------- FATAL: CRASH DUMP -------
[Mon Jan 31 10:33:59.325 2022] [17413]
--- crashed SphinxQL request dump ---
select udf_test(1.1, 3.1) AS pw FROM noticesdev
--- request dump end ---
--- local index:noticesdev
Manticore 4.2.0 15e927b28@211223 release
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 12.0.1
Configured with flags: Configured with these definitions: -DDISTR_BUILD=buster -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_JEMALLOC=1 -DWITH_ZLIB=1 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore/data -DFULL_SHARE_DIR=/usr/share/manticore
Host OS is Linux x86_64

Additionally udf_test.so is unloaded

Environment:
Manticore 4.2.0 15e927b28@211223 release
Debian GNU/Linux 10 (buster)
gcc version 8.3.0

UDF Source

#include </usr/include/manticore/sphinxudf.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>

int udf_test_ver()
{
    return SPH_UDF_VERSION;
}

double udf_test ( SPH_UDF_INIT * init, SPH_UDF_ARGS * args, char * error_flag )
{
    return pow( *(float*)args->arg_values[0], *(float*)args->arg_values[1] );
}

int udf_test_init ( SPH_UDF_INIT * init, SPH_UDF_ARGS * args, char * error_message )
{
    if ( args->arg_count != 2 )
    {
        snprintf ( error_message, SPH_UDF_ERROR_LEN, "UDF_TEST() takes 2 arguments" );
        return 1;
    }
    if ( args->arg_types[0] != SPH_UDF_TYPE_FLOAT || args->arg_types[1] != SPH_UDF_TYPE_FLOAT )
    {
        snprintf ( error_message, SPH_UDF_ERROR_LEN, "UDF_LEAST() requires 2 float arguments" );
        return 1;
    }
    return 0;
}

Many thanks

Richard

tomat · January 31, 2022, 12:12pm

These are different queries:

The 1st calculates d1 to d7 and d for every row during the full scan, then uses the d value for filtering.
The 2nd calculates only d for every row during the full scan and uses it for filtering; then, at the final stage, it calculates d2 to d7 only for the max_matches rows.

That is why calculating a single d in the 2nd query is much faster than calculating d1 to d7 and d in the 1st query.

You could issue these queries with

set profiling=1;
query;
show profile;

then check the counters to confirm that they differ between the two queries
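
The difference described above can be illustrated with a toy cost model in Python. This is only a sketch of the reasoning, not Manticore's actual execution engine: it assumes every row is scanned, ignores the published_online filter, and uses a hypothetical index size of 5 million rows (roughly the figure mentioned earlier in the thread).

```python
def geodist_evaluations(scanned_rows, returned_rows, pairs, pairs_used_by_filter):
    """Count GEODIST calls in a toy two-stage model.

    Expressions that WHERE / ORDER BY depend on run for every scanned row;
    the remaining expressions run only for the rows that are returned.
    """
    early = scanned_rows * pairs_used_by_filter
    late = returned_rows * (pairs - pairs_used_by_filter)
    return early + late


SCANNED = 5_000_000  # hypothetical index size (assumption)
RETURNED = 500       # max_matches used in both queries

# Query 1: d depends on d1..d7, so all seven distances are needed while scanning.
query1 = geodist_evaluations(SCANNED, RETURNED, pairs=7, pairs_used_by_filter=7)
# Query 2: d is the first distance only; d2..d7 are deferred to the final stage.
query2 = geodist_evaluations(SCANNED, RETURNED, pairs=7, pairs_used_by_filter=1)

print(query1, query2)
```

The measured times will not scale exactly with these counts, since scanning and sorting have their own costs, but the model shows why the two queries do different amounts of work.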

Richard_Drage · January 31, 2022, 1:00pm

Ah ok, fyi

With min:
fullscan 7.896680 8 99.05

Without min
fullscan 2.176848 8 99.63

Everything else is pretty much the same and to be honest, 1 second is more than acceptable anyway

I’d be interested to know if SHOW META is connection aware. For example, if I open a connection, perform a query and then run SHOW META before closing the connection, will the meta data be for that connection’s previous query or for the previous query globally?

Richard_Drage · January 31, 2022, 2:58pm

I’d also be interested in sponsoring an enhancement based on the GEODIST function that takes a lat, a lon and a single JSON field containing a varying number of lat/lon pairs, and returns the minimum distance found, i.e.

latlon_json:
{"1": [53.1902, -2.89164], "2": [53.15899, -2.66857]}

GEODISTJSON(50.37, -4.14, latlon_json, {in=degrees, out=miles}) AS dist
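
To make the requested behaviour concrete, here is a minimal Python sketch of what such a function would compute. GEODISTJSON does not exist in Manticore; the name min_geodist_miles, the haversine formula and the Earth radius constant are all assumptions used for illustration only.

```python
import json
import math

# Mean Earth radius in miles. An assumption, not a Manticore constant.
EARTH_RADIUS_MILES = 3958.7613


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def min_geodist_miles(lat, lon, latlon_json):
    """Return the smallest distance from (lat, lon) to any pair in the JSON
    object, or None when the object holds no pairs."""
    pairs = json.loads(latlon_json)
    if not pairs:
        return None
    return min(haversine_miles(lat, lon, p[0], p[1]) for p in pairs.values())


doc = '{"1": [53.1902, -2.89164], "2": [53.15899, -2.66857]}'
nearest = min_geodist_miles(50.37, -4.14, doc)
print(f"{nearest:.2f} miles")
```

Because the number of pairs varies per document, the minimum is taken over whatever keys are present, which is the part that the fixed d1..dN columns cannot express.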

tomat · January 31, 2022, 3:10pm

Meta is valid only within the current MySQL connection, i.e. until MYSQL_COM_QUIT or sphinxql_timeout closes the connection.
Every subsequent query replaces the previous meta information.

tomat · January 31, 2022, 3:13pm

It would be better to contact the Manticore team's services about sponsoring development.

Richard_Drage · February 1, 2022, 7:37am

Thanks Tomat, I’ve reached out

So I can use SHOW META to retrieve the maximum number of records considered in a LIMIT query, rather than having to do SELECT COUNT(*) first to get the total records.

Many thanks

R
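
A minimal Python sketch of reading total_found from SHOW META output follows. It assumes the client returns the result as (name, value) string pairs, as in the tab-separated output shown earlier in the thread; the helper names are hypothetical, and no database connection is made here. In real use, SHOW META would have to be issued on the same connection immediately after the query, as noted above.

```python
def parse_show_meta(rows):
    """Turn SHOW META rows, given as (name, value) string pairs, into a dict."""
    return {name: value for name, value in rows}


def total_found(rows):
    """Number of matching documents reported by SHOW META."""
    return int(parse_show_meta(rows)["total_found"])


# Values copied from the SHOW META output of Query 1 in this thread.
rows = [("total", "500"), ("total_found", "391500"), ("time", "1.125")]

print(total_found(rows))
# prints: 391500
```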