iamtgc

Implementing Zip Code Proximity Functions in MySQL and PostgreSQL

January 14th, 2009 by tgc

On most retail and social networking websites (along with many others), you’ll have the capability to search for people, businesses, store locations, etc within a given distance of your location. This can be implemented in a number of ways both mathematically and programmatically. In an attempt to reduce the amount of code I (or others) have to write, be it in PHP, Python, or any number languages that may interface with our database, I have chosen to implement these zip code proximity and distance functions as stored procedures in the database.

To start, you may already have your database with Zip Code coordinates, if you are looking for one, Team RedLine offers an excellent Zip Code Database for $5 US that can be easily imported into the database of your choice.

The table that we will be using was created as follows:
CREATE TABLE zipcodes ( zip varchar(5), lat double precision, lon double precision, city varchar(30), state varchar(30), state_abbrev varchar(2));

I use 57.2958 as a constant for 180 / π to convert between degrees and radians, beyond this I will focus on the implementation and point you to the Haversine Formula and the Law of Consines if you wish to read more on the math.

First we will examine how these functions are written for PostgreSQL.

The first function implements the Law of Consines, and allows the user to specify which measurement (miles or kilometers) to use, to determine the distance between two coordinates.
CREATE FUNCTION calculate_distance(varchar, double precision, double precision, double precision, double precision) RETURNS double precision AS $_$ DECLARE earth_radius double precision; BEGIN IF $1 = 'mi' THEN earth_radius := 3959.0; ELSIF $1 = 'km' THEN earth_radius := 6371.0; END IF; RETURN earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958))); END; $_$ LANGUAGE plpgsql;
The calculate_distance function can certainly be used stand alone, but in our example it is called exclusively from our zip_proximity function below.

zip_proximity takes a refcursor, a zipcode, a distance, and a distance metric (‘mi’ or ‘km’).

First we retrieve the coordinates for the “home” zipcode, and query the databases, performing calculate_distance on each entry in the database, capturing only those that fall withing the given distance. We add a field to the cursor we return, which is the distance from the home zipcode, to the zipcode which fell within our provided distance.

CREATE FUNCTION zip_proximity(refcursor, character, double precision, varchar) RETURNS refcursor AS $_$ DECLARE home_lat float; home_lon float; BEGIN SELECT lat, lon INTO home_lat, home_lon FROM zipcodes WHERE zip = $2; OPEN $1 FOR SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance($4, home_lat, home_lon, lat, lon) AS distance FROM zipcodes WHERE calculate_distance($4, home_lat, home_lon, lat, lon) < $3 ORDER BY distance; RETURN $1; END; $_$ LANGUAGE plpgsql;

Now we will query all zipcodes within a 3 mile radius of Mountain View, California 94043.
Here is how we call our new function(s). Since we are using cursors, we need to be in a transaction.
testdb=# BEGIN; BEGIN testdb=# SELECT zip_proximity('zc', '94043', 3, 'mi'); zip_proximity --------------- zc (1 row) testdb=# FETCH ALL FROM zc; zip | lat | lon | city | state | state_abbrev | distance -------+----------+------------+---------------+------------+--------------+------------------- 94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA | 0 94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA | 0.701004112864842 94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA | 1.53459076775345 94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.09052870375317 94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.33729808540454 94306 | 37.41478 | -122.12139 | PALO ALTO | CALIFORNIA | CA | 2.35776644118448 94303 | 37.44424 | -122.11736 | PALO ALTO | CALIFORNIA | CA | 2.51480871338068 (7 rows) testdb=# END; COMMIT

Now let’s take a look at the MySQL functions. The calculate_distance function is essentially identical to the PostgreSQL function above.
DELIMITER // CREATE FUNCTION calculate_distance(measurement varchar(2), base_lat double precision, base_lon double precision, lat double precision, lon double precision) RETURNS double precision BEGIN DECLARE earth_radius double precision; IF measurement = 'km' THEN SET earth_radius = 6371.0; ELSEIF measurement = 'mi' THEN SET earth_radius = 3959.0; END IF; RETURN earth_radius * ACOS(SIN(base_lat / 57.2958) * SIN(lat / 57.2958) + COS(base_lat / 57.2958) * COS(lat / 57.2958) * COS((lon / 57.2958) - (base_lon / 57.2958))); END // DELIMITER ;

This function differs slightly from it’s Postgres counterpart. Since MySQL does not currently support functions that return a cursor, we will create a procedure which will execute the same query that we would have returned in the cursor.
DELIMITER // CREATE PROCEDURE zip_proximity(zipcode varchar(5), radius double precision, measurement varchar(2)) BEGIN DECLARE base_lat double precision; DECLARE base_lon double precision; SELECT lat, lon INTO base_lat, base_lon FROM zipcodes WHERE zip = zipcode; SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance(measurement, base_lat, base_lon, lat, lon) AS distance FROM zipcodes WHERE calculate_distance(measurement, base_lat, base_lon, lat, lon) < radius ORDER BY distance; END // DELIMITER ;

Now here is how we would call the procedure, again we are querying for all zip codes within a three mile radius of 94043.
mysql> call zip_proximity('94043', 3, 'mi'); +-------+----------+------------+---------------+------------+--------------+-------------------+ | zip | lat | lon | city | state | state_abbrev | distance | +-------+----------+------------+---------------+------------+--------------+-------------------+ | 94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA | 0 | | 94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA | 0.701004115082249 | | 94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA | 1.53459076784732 | | 94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.09052870372395 | | 94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA | 2.33729808557031 | | 94306 | 37.41478 | -122.12139 | PALO ALTO | CALIFORNIA | CA | 2.35776644108718 | | 94303 | 37.44424 | -122.11736 | PALO ALTO | CALIFORNIA | CA | 2.51480871282271 | +-------+----------+------------+---------------+------------+--------------+-------------------+ 7 rows in set (0.75 sec) Query OK, 0 rows affected (0.75 sec)

Feel free to leave a comment with any questions or suggestions.

Posted in MySQL, Postgres

Leave a Comment

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.