On most retail and social networking websites (along with many others), you’ll have the capability to search for people, businesses, store locations, etc within a given distance of your location. This can be implemented in a number of ways both mathematically and programmatically. In an attempt to reduce the amount of code I (or others) have to write, be it in PHP, Python, or any number languages that may interface with our database, I have chosen to implement these zip code proximity and distance functions as stored procedures in the database.
To start, you may already have your database with Zip Code coordinates, if you are looking for one, Team RedLine offers an excellent Zip Code Database for $5 US that can be easily imported into the database of your choice.
The table that we will be using was created as follows:
CREATE TABLE zipcodes (
zip varchar(5),
lat double precision,
lon double precision,
city varchar(30),
state varchar(30),
state_abbrev varchar(2));
I use 57.2958 as a constant for 180 / π to convert between degrees and radians, beyond this I will focus on the implementation and point you to the Haversine Formula and the Law of Consines if you wish to read more on the math.
First we will examine how these functions are written for PostgreSQL. (MySQL functions are further down)
The first function implements the Law of Consines, and allows the user to specify which measurement (miles or kilometers) to use, to determine the distance between two coordinates.
CREATE FUNCTION calculate_distance(varchar, double precision, double precision, double precision, double precision) RETURNS double precision
AS $_$
DECLARE
earth_radius double precision;
BEGIN
IF $1 = 'mi' THEN
earth_radius := 3959.0;
ELSIF $1 = 'km' THEN
earth_radius := 6371.0;
END IF;
RETURN earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958)));
END;
$_$
LANGUAGE plpgsql;
The calculate_distance function can certainly be used stand alone, but in our example it is called exclusively from our zip_proximity function below.
zip_proximity takes a refcursor, a zipcode, a distance, and a distance metric (‘mi’ or ‘km’).
Update: Please see this article on implementing this function without using refcursor.
First we retrieve the coordinates for the “home” zipcode, and query the databases, performing calculate_distance on each entry in the database, capturing only those that fall withing the given distance. We add a field to the cursor we return, which is the distance from the home zipcode, to the zipcode which fell within our provided distance.
CREATE FUNCTION zip_proximity(refcursor, character, double precision, varchar) RETURNS refcursor
AS $_$
DECLARE
home_lat float;
home_lon float;
BEGIN
SELECT lat, lon INTO home_lat, home_lon FROM zipcodes WHERE zip = $2;
OPEN $1 FOR
SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance($4, home_lat, home_lon, lat, lon) AS distance FROM zipcodes
WHERE calculate_distance($4, home_lat, home_lon, lat, lon) < $3 ORDER BY distance;
RETURN $1;
END;
$_$
LANGUAGE plpgsql;
Now we will query all zipcodes within a 3 mile radius of Mountain View, California 94043.
Here is how we call our new function(s). Since we are using cursors, we need to be in a transaction.
testdb=# BEGIN;
BEGIN
testdb=# SELECT zip_proximity('zc', '94043', 3, 'mi');
zip_proximity
---------------
zc
(1 row)
testdb=# FETCH ALL FROM zc;
zip | lat | lon | city | state | state_abbrev | distance
-------+----------+------------+---------------+------------+--------------+-------------------
94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA | 0
94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA | 0.701004112864842
94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA | 1.53459076775345
94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.09052870375317
94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.33729808540454
94306 | 37.41478 | -122.12139 | PALO ALTO | CALIFORNIA | CA | 2.35776644118448
94303 | 37.44424 | -122.11736 | PALO ALTO | CALIFORNIA | CA | 2.51480871338068
(7 rows)
testdb=# END;
COMMIT
Now let's take a look at the MySQL functions. The calculate_distance function is essentially identical to the PostgreSQL function above.
DELIMITER //
CREATE FUNCTION calculate_distance(measurement varchar(2), base_lat double precision, base_lon double precision, lat double precision, lon double precision) RETURNS double precision
BEGIN
DECLARE earth_radius double precision;
IF measurement = 'km' THEN
SET earth_radius = 6371.0;
ELSEIF measurement = 'mi' THEN
SET earth_radius = 3959.0;
END IF;
RETURN earth_radius * ACOS(SIN(base_lat / 57.2958) * SIN(lat / 57.2958) + COS(base_lat / 57.2958) * COS(lat / 57.2958) * COS((lon / 57.2958) - (base_lon / 57.2958)));
END //
DELIMITER ;
This function differs slightly from it's Postgres counterpart. Since MySQL does not currently support functions that return a cursor, we will create a procedure which will execute the same query that we would have returned in the cursor.
DELIMITER //
CREATE PROCEDURE zip_proximity(zipcode varchar(5), radius double precision, measurement varchar(2))
BEGIN
DECLARE base_lat double precision;
DECLARE base_lon double precision;
SELECT lat, lon INTO base_lat, base_lon FROM zipcodes WHERE zip = zipcode;
SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance(measurement, base_lat, base_lon, lat, lon) AS distance FROM zipcodes
WHERE calculate_distance(measurement, base_lat, base_lon, lat, lon) < radius ORDER BY distance;
END //
DELIMITER ;
Now here is how we would call the procedure, again we are querying for all zip codes within a three mile radius of 94043.
mysql> call zip_proximity('94043', 3, 'mi');
+-------+----------+------------+---------------+------------+--------------+-------------------+
| zip | lat | lon | city | state | state_abbrev | distance |
+-------+----------+------------+---------------+------------+--------------+-------------------+
| 94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA | 0 |
| 94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA | 0.701004115082249 |
| 94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA | 1.53459076784732 |
| 94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.09052870372395 |
| 94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.33729808557031 |
| 94306 | 37.41478 | -122.12139 | PALO ALTO | CALIFORNIA | CA | 2.35776644108718 |
| 94303 | 37.44424 | -122.11736 | PALO ALTO | CALIFORNIA | CA | 2.51480871282271 |
+-------+----------+------------+---------------+------------+--------------+-------------------+
7 rows in set (0.75 sec)
Query OK, 0 rows affected (0.75 sec)
Feel free to leave a comment with any questions or suggestions.
Hey TGC,
First I just want to say thanks for posting this code, worked flawlessly for me. I was doing some testing with alternate methods (in Postgres), looking for the fastest execution speed when I found this post.
The thing that surprised me and keeps me from implementing this into our radial proximity search, is that this singular, alternative query:
$sqlselect_in_radius_zip = pg_exec(“SELECT * FROM zips_us_tbl
WHERE (((acos(sin(($latitude*pi()/180)) * sin((lat*pi()/180))+cos(($latitude*pi()/180)) * cos((lat*pi()/180)) * cos((($longitude-lon)*pi()/180))))*180/pi())*60*1.1515) <= $radius
GROUP BY zip,city,state,lat,lon") or die(pg_errormessage());
Executes 3-4 times faster for me and without the use of functions, even when I'm using a huge proximity range and pulling in thousands of results. I was hoping that by using PLPGSQL functions, I could cut query time and let Postgres do more of the "speedy" work but that just wasn't the case.
I like the method of math you are using in your calculations and also the fact that there's an option for passing miles VS kilometers as well. Both methods seem to return very similar results and get the job done but for practical reasons, I have to opt for speed and ditch the functions.
Just thought I'd post that info in case you found it interesting or wanted to test it out. Maybe there's a way to refine the process, pull it into one single function and get it optimized.
Thanks again for putting the work together, very easy to implement.
Pingback: Implementing Zip Code Proximity Functions in MS SQL Server | iamtgc
Pingback: Returning Composite Types in Postgres Stored Procedures | iamtgc
Great …
You want to write in the top that , the function for MYSQL is given below . I was trying to convert from Postgre to MySQL :) .
Anyway I was forced to look and learn something in PostgreSQL :).
Thanks once again .