Implementing Zip Code Proximity Functions in MySQL and PostgreSQL

On most retail and social networking websites (along with many others), you’ll have the capability to search for people, businesses, store locations, etc within a given distance of your location. This can be implemented in a number of ways both mathematically and programmatically. In an attempt to reduce the amount of code I (or others) have to write, be it in PHP, Python, or any number languages that may interface with our database, I have chosen to implement these zip code proximity and distance functions as stored procedures in the database.

To start, you may already have your database with Zip Code coordinates, if you are looking for one, Team RedLine offers an excellent Zip Code Database for $5 US that can be easily imported into the database of your choice.

The table that we will be using was created as follows:

CREATE TABLE zipcodes (
    zip varchar(5),
    lat double precision,
    lon double precision,
    city varchar(30),
    state varchar(30),
    state_abbrev varchar(2));

I use 57.2958 as a constant for 180 / π to convert between degrees and radians, beyond this I will focus on the implementation and point you to the Haversine Formula and the Law of Consines if you wish to read more on the math.

First we will examine how these functions are written for PostgreSQL. (MySQL functions are further down)

The first function implements the Law of Consines, and allows the user to specify which measurement (miles or kilometers) to use, to determine the distance between two coordinates.

CREATE FUNCTION calculate_distance(varchar, double precision, double precision, double precision, double precision) RETURNS double precision
    AS $_$
   DECLARE
      earth_radius double precision;
   BEGIN
      IF $1 = 'mi' THEN
         earth_radius := 3959.0;
      ELSIF $1 = 'km' THEN
         earth_radius := 6371.0;
      END IF;
      RETURN earth_radius * acos(sin($2 / 57.2958) * sin($4 / 57.2958) + cos($2/ 57.2958) * cos($4 / 57.2958) * cos(($5 / 57.2958) - ($3 / 57.2958)));
   END;
   $_$
    LANGUAGE plpgsql;

The calculate_distance function can certainly be used stand alone, but in our example it is called exclusively from our zip_proximity function below.

zip_proximity takes a refcursor, a zipcode, a distance, and a distance metric (‘mi’ or ‘km’).
Update: Please see this article on implementing this function without using refcursor.

First we retrieve the coordinates for the “home” zipcode, and query the databases, performing calculate_distance on each entry in the database, capturing only those that fall withing the given distance. We add a field to the cursor we return, which is the distance from the home zipcode, to the zipcode which fell within our provided distance.

CREATE FUNCTION zip_proximity(refcursor, character, double precision, varchar) RETURNS refcursor
    AS $_$
   DECLARE
      home_lat float;
      home_lon float;
   BEGIN
      SELECT lat, lon INTO home_lat, home_lon FROM zipcodes WHERE zip = $2;
      OPEN $1 FOR
         SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance($4, home_lat, home_lon, lat, lon) AS distance FROM zipcodes
             WHERE calculate_distance($4, home_lat, home_lon, lat, lon) < $3 ORDER BY distance;
      RETURN $1;
   END;
   $_$
    LANGUAGE plpgsql;

Now we will query all zipcodes within a 3 mile radius of Mountain View, California 94043.
Here is how we call our new function(s). Since we are using cursors, we need to be in a transaction.

testdb=# BEGIN;
BEGIN
testdb=# SELECT zip_proximity('zc', '94043', 3, 'mi');
 zip_proximity
---------------
 zc
(1 row)

testdb=# FETCH ALL FROM zc;
  zip  |   lat    |    lon     |     city      |   state    | state_abbrev |     distance
-------+----------+------------+---------------+------------+--------------+-------------------
 94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA           |                 0
 94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA           | 0.701004112864842
 94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA           |  1.53459076775345
 94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA           |  2.09052870375317
 94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA           |  2.33729808540454
 94306 | 37.41478 | -122.12139 | PALO ALTO     | CALIFORNIA | CA           |  2.35776644118448
 94303 | 37.44424 | -122.11736 | PALO ALTO     | CALIFORNIA | CA           |  2.51480871338068
(7 rows)

testdb=# END;
COMMIT

Now let's take a look at the MySQL functions. The calculate_distance function is essentially identical to the PostgreSQL function above.

DELIMITER //
CREATE FUNCTION calculate_distance(measurement varchar(2), base_lat double precision, base_lon double precision, lat double precision, lon double precision) RETURNS double precision
   BEGIN
      DECLARE earth_radius double precision;
      IF measurement = 'km' THEN
         SET earth_radius = 6371.0;
      ELSEIF measurement = 'mi' THEN
         SET earth_radius = 3959.0;
      END IF;
      RETURN earth_radius * ACOS(SIN(base_lat / 57.2958) * SIN(lat / 57.2958) + COS(base_lat / 57.2958) * COS(lat / 57.2958) * COS((lon / 57.2958) - (base_lon / 57.2958)));
   END //
DELIMITER ;

This function differs slightly from it's Postgres counterpart. Since MySQL does not currently support functions that return a cursor, we will create a procedure which will execute the same query that we would have returned in the cursor.

DELIMITER //
CREATE PROCEDURE zip_proximity(zipcode varchar(5), radius double precision, measurement varchar(2))
   BEGIN
   DECLARE base_lat double precision;
   DECLARE base_lon double precision;
   SELECT lat, lon INTO base_lat, base_lon FROM zipcodes WHERE zip = zipcode;
   SELECT zip, lat, lon, city, state, state_abbrev, calculate_distance(measurement, base_lat, base_lon, lat, lon) AS distance FROM zipcodes
      WHERE calculate_distance(measurement, base_lat, base_lon, lat, lon) < radius ORDER BY distance;
   END //
DELIMITER ;

Now here is how we would call the procedure, again we are querying for all zip codes within a three mile radius of 94043.

mysql> call zip_proximity('94043', 3, 'mi');
+-------+----------+------------+---------------+------------+--------------+-------------------+
| zip   | lat      | lon        | city          | state      | state_abbrev | distance          |
+-------+----------+------------+---------------+------------+--------------+-------------------+
| 94043 | 37.42337 | -122.07981 | MOUNTAIN VIEW | CALIFORNIA | CA           |                 0 |
| 94039 | 37.41884 | -122.09124 | MOUNTAIN VIEW | CALIFORNIA | CA           | 0.701004115082249 |
| 94035 | 37.41753 | -122.05283 | MOUNTAIN VIEW | CALIFORNIA | CA           |  1.53459076784732 |
| 94042 | 37.39314 | -122.07827 | MOUNTAIN VIEW | CALIFORNIA | CA           |  2.09052870372395 |
| 94041 | 37.38961 | -122.07715 | MOUNTAIN VIEW | CALIFORNIA | CA           |  2.33729808557031 |
| 94306 | 37.41478 | -122.12139 | PALO ALTO     | CALIFORNIA | CA           |  2.35776644108718 |
| 94303 | 37.44424 | -122.11736 | PALO ALTO     | CALIFORNIA | CA           |  2.51480871282271 |
+-------+----------+------------+---------------+------------+--------------+-------------------+
7 rows in set (0.75 sec)

Query OK, 0 rows affected (0.75 sec)

Feel free to leave a comment with any questions or suggestions.

This entry was posted in MySQL, Postgres. Bookmark the permalink.

7 Responses to Implementing Zip Code Proximity Functions in MySQL and PostgreSQL

  1. Jay U says:

    Thank you so much for posting your code on this. I adapted your MySQL code for my specific needs and it works perfect. It’s been many years since I’ve done any trig so you saved me alot of time trying to figure out that formula. Very much appreciated.

  2. Pingback: Exploring Postgres Extenstions | iamtgc

  3. Jason says:

    Just implemented this on PostGres 9.1, worked first time! Thank you very much for the post.

  4. Joel says:

    Hey TGC,

    First I just want to say thanks for posting this code, worked flawlessly for me. I was doing some testing with alternate methods (in Postgres), looking for the fastest execution speed when I found this post.

    The thing that surprised me and keeps me from implementing this into our radial proximity search, is that this singular, alternative query:

    $sqlselect_in_radius_zip = pg_exec(“SELECT * FROM zips_us_tbl
    WHERE (((acos(sin(($latitude*pi()/180)) * sin((lat*pi()/180))+cos(($latitude*pi()/180)) * cos((lat*pi()/180)) * cos((($longitude-lon)*pi()/180))))*180/pi())*60*1.1515) <= $radius
    GROUP BY zip,city,state,lat,lon") or die(pg_errormessage());

    Executes 3-4 times faster for me and without the use of functions, even when I'm using a huge proximity range and pulling in thousands of results. I was hoping that by using PLPGSQL functions, I could cut query time and let Postgres do more of the "speedy" work but that just wasn't the case.

    I like the method of math you are using in your calculations and also the fact that there's an option for passing miles VS kilometers as well. Both methods seem to return very similar results and get the job done but for practical reasons, I have to opt for speed and ditch the functions.

    Just thought I'd post that info in case you found it interesting or wanted to test it out. Maybe there's a way to refine the process, pull it into one single function and get it optimized.

    Thanks again for putting the work together, very easy to implement.

  5. Pingback: Implementing Zip Code Proximity Functions in MS SQL Server | iamtgc

  6. Pingback: Returning Composite Types in Postgres Stored Procedures | iamtgc

  7. Hari K T says:

    Great …
    You want to write in the top that , the function for MYSQL is given below . I was trying to convert from Postgre to MySQL :) .
    Anyway I was forced to look and learn something in PostgreSQL :).
    Thanks once again .

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>