Implementing Zip Code Proximity Functions in MySQL and PostgreSQL
On most retail and social networking websites (along with many others), you’ll have the capability to search for people, businesses, store locations, etc within a given distance of your location. This can be implemented in a number of ways both mathematically and programmatically. In an attempt to reduce the amount of code I (or others) have to write, be it in PHP, Python, or any number languages that may interface with our database, I have chosen to implement these zip code proximity and distance functions as stored procedures in the database.
To start, you may already have your database with Zip Code coordinates, if you are looking for one, Team RedLine offers an excellent Zip Code Database for $5 US that can be easily imported into the database of your choice.
The table that we will be using was created as follows:
CREATE TABLE zipcodes (
zip varchar(5),
lat double precision,
lon double precision,
city varchar(30),
state varchar(30),
state_abbrev varchar(2));
I use 57.2958 as a constant for 180 / π to convert between degrees and radians, beyond this I will focus on the implementation and point you to the Haversine Formula and the Law of Consines if you wish to read more on the math.
First we will examine how these functions are written for PostgreSQL. (MySQL functions are further down)
The first function implements the Law of Consines, and allows the user to specify which measurement (miles or kilometers) to use, to determine the distance between two coordinates.
CREATE FUNCTION calculate_distance(varchar, double precision, double precision, double precision, double precision) RETURNS double precision
AS $_$
DECLARE
earth_radius double precision;
BEGIN
IF $1 = 'mi' THEN
earth_radius := 3959.0;
ELSIF $1 = 'km' THEN
earth_radius := 6371.0;
END IF;
RETURN earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958)));
END;
$_$
LANGUAGE plpgsql;
The calculate_distance function can certainly be used stand alone, but in our example it is called exclusively from our zip_proximity function below.
zip_proximity takes a refcursor, a zipcode, a distance, and a distance metric (‘mi’ or ‘km’).
First we retrieve the coordinates for the “home” zipcode, and query the databases, performing calculate_distance on each entry in the database, capturing only those that fall withing the given distance. We add a field to the cursor we return, which is the distance from the home zipcode, to the zipcode which fell within our provided distance.
CREATE FUNCTION zip_proximity(refcursor, character, double precision, varchar) RETURNS refcursor
AS $_$
DECLARE
home_lat float;
home_lon float;
BEGIN
SELECT lat, lon INTO home_lat, home_lon FROM zipcodes WHERE zip = $2;
OPEN $1 FOR
SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance($4, home_lat, home_lon, lat, lon) AS distance FROM zipcodes
WHERE calculate_distance($4, home_lat, home_lon, lat, lon) < $3 ORDER BY distance;
RETURN $1;
END;
$_$
LANGUAGE plpgsql;
Now we will query all zipcodes within a 3 mile radius of Mountain View, California 94043.
Here is how we call our new function(s). Since we are using cursors, we need to be in a transaction.
testdb=# BEGIN;
BEGIN
testdb=# SELECT zip_proximity('zc', '94043', 3, 'mi');
zip_proximity
---------------
zc
(1 row)
testdb=# FETCH ALL FROM zc;
zip | lat | lon | city | state | state_abbrev | distance
-------+----------+------------+---------------+------------+--------------+-------------------
94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA | 0
94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA | 0.701004112864842
94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA | 1.53459076775345
94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.09052870375317
94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.33729808540454
94306 | 37.41478 | -122.12139 | PALO ALTO | CALIFORNIA | CA | 2.35776644118448
94303 | 37.44424 | -122.11736 | PALO ALTO | CALIFORNIA | CA | 2.51480871338068
(7 rows)
testdb=# END;
COMMIT
Now let’s take a look at the MySQL functions. The calculate_distance function is essentially identical to the PostgreSQL function above.
DELIMITER //
CREATE FUNCTION calculate_distance(measurement varchar(2), base_lat double precision, base_lon double precision, lat double precision, lon double precision) RETURNS double precision
BEGIN
DECLARE earth_radius double precision;
IF measurement = 'km' THEN
SET earth_radius = 6371.0;
ELSEIF measurement = 'mi' THEN
SET earth_radius = 3959.0;
END IF;
RETURN earth_radius * ACOS(SIN(base_lat / 57.2958) * SIN(lat / 57.2958) + COS(base_lat / 57.2958) * COS(lat / 57.2958) * COS((lon / 57.2958) - (base_lon / 57.2958)));
END //
DELIMITER ;
This function differs slightly from it’s Postgres counterpart. Since MySQL does not currently support functions that return a cursor, we will create a procedure which will execute the same query that we would have returned in the cursor.
DELIMITER //
CREATE PROCEDURE zip_proximity(zipcode varchar(5), radius double precision, measurement varchar(2))
BEGIN
DECLARE base_lat double precision;
DECLARE base_lon double precision;
SELECT lat, lon INTO base_lat, base_lon FROM zipcodes WHERE zip = zipcode;
SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance(measurement, base_lat, base_lon, lat, lon) AS distance FROM zipcodes
WHERE calculate_distance(measurement, base_lat, base_lon, lat, lon) < radius ORDER BY distance;
END //
DELIMITER ;
Now here is how we would call the procedure, again we are querying for all zip codes within a three mile radius of 94043.
mysql> call zip_proximity('94043', 3, 'mi');
+-------+----------+------------+---------------+------------+--------------+-------------------+
| zip | lat | lon | city | state | state_abbrev | distance |
+-------+----------+------------+---------------+------------+--------------+-------------------+
| 94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA | 0 |
| 94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA | 0.701004115082249 |
| 94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA | 1.53459076784732 |
| 94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.09052870372395 |
| 94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.33729808557031 |
| 94306 | 37.41478 | -122.12139 | PALO ALTO | CALIFORNIA | CA | 2.35776644108718 |
| 94303 | 37.44424 | -122.11736 | PALO ALTO | CALIFORNIA | CA | 2.51480871282271 |
+-------+----------+------------+---------------+------------+--------------+-------------------+
7 rows in set (0.75 sec)
Query OK, 0 rows affected (0.75 sec)
Feel free to leave a comment with any questions or suggestions.
July 21st, 2010 at 7:43 am
Great …
You want to write in the top that , the function for MYSQL is given below . I was trying to convert from Postgre to MySQL :) .
Anyway I was forced to look and learn something in PostgreSQL :).
Thanks once again .