Geodist Against Document JSON Lat Lon Pairs

Hi,

I’m trying to select documents with multiple lat/lon pairs using geodist but whilst the documentation mentions: If the document has multiple locations, they can be embedded in a JSON object as float pairs, it doesn’t specify how this should be done.

I’ve tried both lat / lon pairs in a single json attribute [[52.673, -1.18767], [52.7208, -1.36021]

SELECT *, GEODIST(50.407505, -5.264365, latlon_json, {in=degrees, out=miles}) AS distance FROM notices ORDER BY DISTANCE ASC

and lat and lon in seperate attributes [50.407505, 50.1682] [-5.264365, -5.10308]

SELECT *, GEODIST(50.407505, -5.264365, lat_json, lon_json, {in=degrees, out=miles}) AS distance FROM notices ORDER BY DISTANCE ASC

but distance is alway returned as 3485.184326

Any assistance greatly appreciated.

If the document has multiple locations, they can be embedded in a JSON object as float pairs, it doesn’t specify how this should be done.

Good question. I’ll check with the team on this next Monday.

But I’m afraid what is meant is that if a document includes multiple locations it may be convenient to store them in a json attribute, but it doesn’t change the way you can access the coordinates, i.e. it doesn’t change the behaviour of GEODIST(), nor it does any other magic. E.g. you can just do:

mysql> drop table if exists t; create table t(f text, j json); insert into t(j) values('{"coord1":[54.833941, 83.096050],"coord2":[54.785440, 83.196717]}'); select *, geodist(54.808514, 83.134377,j.coord1[0],j.coord1[1],{in=degrees, out=km}) dist from t; select *, geodist(54.808514, 83.134377,j.coord2[0],j.coord2[1],{in=degrees, out=km}) dist from t;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t(f text, j json)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t(j) values('{"coord1":[54.833941, 83.096050],"coord2":[54.785440, 83.196717]}')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select *, geodist(54.808514, 83.134377,j.coord1[0],j.coord1[1],{in=degrees, out=km}) dist from t
--------------

+---------------------+-----------------------------------------------------------------+------+------------+
| id                  | j                                                               | f    | dist       |
+---------------------+-----------------------------------------------------------------+------+------------+
| 1514665016379310097 | {"coord1":[54.833941,83.096050],"coord2":[54.785440,83.196717]} |      | 3.75286126 |
+---------------------+-----------------------------------------------------------------+------+------------+
1 row in set (0.01 sec)

--------------
select *, geodist(54.808514, 83.134377,j.coord2[0],j.coord2[1],{in=degrees, out=km}) dist from t
--------------

+---------------------+-----------------------------------------------------------------+------+------------+
| id                  | j                                                               | f    | dist       |
+---------------------+-----------------------------------------------------------------+------+------------+
| 1514665016379310097 | {"coord1":[54.833941,83.096050],"coord2":[54.785440,83.196717]} |      | 4.76190233 |
+---------------------+-----------------------------------------------------------------+------+------------+
1 row in set (0.00 sec)

Hi Sergey,

Perfect, huge thanks for such a quick response, I have around 5 million documents with several geo coordinates each and search results are returned in around a second which is very impressive :slight_smile:

search results are returned in around a second

Try searchd.pseudo_sharding=1, chances are it can be even faster.

Down to just over half a second which is incredible thanks

Hi Sergey,

Just wondered if you could think of a better way to select the minimum distance in the following query?
MySql has a least function that takes multiple values but Manticore’s least funtion requires a json array and using multiple mins doubles the query time.

SELECT *,
GEODIST(53.14208, -2.48365, latlon_json.1[0], latlon_json.1[1], {in=degrees, out=miles}) AS d1,
GEODIST(53.14208, -2.48365, latlon_json.2[0], latlon_json.2[1], {in=degrees, out=miles}) AS d2,
GEODIST(53.14208, -2.48365, latlon_json.3[0], latlon_json.3[1], {in=degrees, out=miles}) AS d3,
GEODIST(53.14208, -2.48365, latlon_json.4[0], latlon_json.4[1], {in=degrees, out=miles}) AS d4,
MIN(d1, d2) as d_r2,
MIN(d_r2, d3) as d_r3,
MIN(d_r3, d4) as d
FROM notices WHERE d<20

Many thanks again

Richard

If the document has multiple locations, they can be embedded in a JSON object as float pairs

I’ve discussed this with the team. It means you can use just a single json attribute instead N of float attributes.

MIN/LEAST don’t support more than 2 arguments, but you can do MIN(MIN(... etc.
It shouldn’t make the query faster, but on the other hand this:

MIN(d1, d2) as d_r2,
MIN(d_r2, d3) as d_r3,
MIN(d_r3, d4) as d

shouldn’t have made it slower. At least not 2 times slower compared to ordering by just single MIN() result. Please provide full query/result examples with SHOW META after the queries.

Hi Sergey,

Re: min, you example if much neater but as you say, doesn’t change the query speed

Query 1 (with min)

SELECT id, firstname, surname, notices_latlon_json, published_online
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.1[0], notices_latlon_json.1[1], {in=degrees, out=miles}) AS d1
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.2[0], notices_latlon_json.2[1], {in=degrees, out=miles}) AS d2
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.3[0], notices_latlon_json.3[1], {in=degrees, out=miles}) AS d3
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.4[0], notices_latlon_json.4[1], {in=degrees, out=miles}) AS d4
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.5[0], notices_latlon_json.5[1], {in=degrees, out=miles}) AS d5
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.6[0], notices_latlon_json.6[1], {in=degrees, out=miles}) AS d6
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.7[0], notices_latlon_json.7[1], {in=degrees, out=miles}) AS d7
, MIN(MIN(MIN(MIN(MIN(MIN(d1, d2), d3), d4), d5), d6), d7) as d
FROM noticesdev
WHERE published_online < 1643673600 AND notices_latlon_json!=’{}’ AND d<=20
ORDER BY d ASC, published_online_bigint DESC
LIMIT 0, 500 OPTION max_matches=500;

SHOW META;

total 500
total_found 391500
time 1.125

Query 2 (without min and using first GEODIST to produce d)

SELECT id, firstname, surname, notices_latlon_json, published_online
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.1[0], notices_latlon_json.1[1], {in=degrees, out=miles}) AS d
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.2[0], notices_latlon_json.2[1], {in=degrees, out=miles}) AS d2
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.3[0], notices_latlon_json.3[1], {in=degrees, out=miles}) AS d3
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.4[0], notices_latlon_json.4[1], {in=degrees, out=miles}) AS d4
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.5[0], notices_latlon_json.5[1], {in=degrees, out=miles}) AS d5
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.6[0], notices_latlon_json.6[1], {in=degrees, out=miles}) AS d6
, GEODIST(53.82973000, -0.11277000, notices_latlon_json.7[0], notices_latlon_json.7[1], {in=degrees, out=miles}) AS d7
FROM noticesdev
WHERE published_online < 1643673600 AND notices_latlon_json!=’{}’ AND d<=20
ORDER BY d ASC, published_online_bigint DESC
LIMIT 0, 500 OPTION max_matches=500;

SHOW META;

total 500
total_found 390782
time 0.307

I wrote a simple UDF to return the min of passed args but there seems to be an issue assigning the result to a variable so I raised a bug report

Many thanks

Richard

these are different queries as

  • 1st calculate all d1 to d7 and d for every row at full-scan then uses d value for filtering
  • 2nd calculate d for every row at full-scan and uses d for filtering then at the final stage for max_matches rows it calculates d2 to d7

that is why calculate single d at the 2nd query is much faster then d1 to d7 and d at the 1st query.

You could issue these queries with

set profiling=1;
query;
show profile;

then check counters to make sure the counters are different at these queries

Ah ok, fyi

With min:
fullscan 7.896680 8 99.05

Without min
fullscan 2.176848 8 99.63

Everything else is pretty much the same and to be honest, 1 second is more than acceptable anyway :slight_smile:

I’d be interesed to know if SHOW META is connection aware, for example, if I open a connection and perform a query then show meta before closing the connection, will the meta data be for the connections previous query or for the previous query globally?

I’d also be interested in sponsoring an enhancement based on the GEODIST function to take lat, lon and single json field which contains a varying number of lat / lon pairs and returns the minimum distance found i.e.

latlon_json:
{“1”: [53.1902, -2.89164], “2”: [53.15899, -2.66857]}

GEODISTJSON(50.37, -4.14, latlon_json, {in=degrees, out=miles}) AS dist

meta is valid only in current MySQL connection, ie MYSQL_COM_QUIT or sphinxql_timeout closes connection
And every next query replace previous meta information.

it could be better to write to our https://manticoresearch.com/services for sponsoring development.

Thanks Tomat, I’ve reached out

So I can use show meta to retrieve the maximum number or records considered in a limit query rather than having to do select count(*) first to get total records.

Many thanks

R