PostgreSQL
Comparing PostGIS and PostgreSQL's earthdistance
When dealing with geolocations in PostgreSQL, there are a number of extensions to choose from. In this post we're going to take a closer look at two of those extensions, Postgres's earthdistance and the third party PostGIS.
To illustrate how the different geolocation modules and methods work, I'm going to use an addresses table as an example to work with.
create table addresses (
id serial primary key,
name varchar,
latitude float8,
longitude float8
);
The table will have a few rows so we have some data to manipulate.
insert into addresses (name, longitude, latitude)
values ('Hashrocket JAX', '-81.3927381' ,'30.2918842'),
('Hashrocket Chicago', '-87.6473133', '41.8853881'),
('Satchel''s Pizza', '-82.3018702', '29.6739466'),
('V Pizza', '-81.3905175', '30.2938423'),
('Artichoke Pizza', '-73.9860525', '40.7321652'),
('Giordano''s', '-87.6252984', '41.8850284');
earthdistance
The earthdistance module is an officially supported extension for Postgres. It offers two different ways for dealing with geolocations, either through points or cubes.
The earthdistance module can be installed using the "create extension" command. It's worth noting that earthdistance depends on the cube extension, so you'll need to install both extensions, regardless of whether you're choosing to work with point or cubes for your geolocation data.
create extension if not exists cube;
create extension if not exists earthdistance;
A general downside of the earthdistance module is that during calculations, it assumes that the earth is perfectly spherical. This means that it doesn't have the highest level of accuracy.
earthdistance with points
When using earthdistance with point calculations, only one operator is provided to you, <@>
.
The <@>
operator expects two different points in the form "point(longitude, latitude)" to perform calculations. The final result is a floating point number representing the distance in statute miles between the two points.
Usage (converting the statute miles to meters):
select name,
(
point(a.longitude, a.latitude)<@>point(hr_jax.longitude, hr_jax.latitude)
) * 1609.344 as distance
from addresses a,
lateral (
select id, latitude, longitude from addresses where name = 'Hashrocket JAX'
) as hr_jax
where a.id <> hr_jax.id
order by distance;
For more info on lateral joins, check out our related PG Cast!
This query returns:
name | distance
--------------------+------------------
V Pizza | 304.733565043637
Satchel's Pizza | 111302.276328273
Artichoke Pizza | 1339314.19457982
Giordano's | 1404466.30300668
Hashrocket Chicago | 1405284.85759896
(5 rows)
- Pros:
- officially supported by Postgres
- can be used with standard Postgres columns (either a
point
or two numeric columns cast to a point during the calculation, as in the example above) - lowest complexity (only one function available)
- fully compatible with Heroku
- Cons:
- assumes the earth is perfectly spherical
- has edge cases (calculations break down around the 180th meridian and the poles)
- units cannot be changed in the calculations; if you need something other than statute miles, you will need to convert it every time
earthdistance with cubes
The earthdistance module when used with cube calculations is much more flexible than points.
A major difference with the cube calculations in earthdistance is that you can override the earth()
function, which returns the assumed radius of the earth in meters. By overriding the function, you can change the units for the module to whatever best suits your needs.
To do cube-based distance calculations in earthdistance, you would use the earth_distance()
function. This function expects to receive two "earth" type coordinates to compare. The earth type is a cube coordinate representing latitude, longitude, and distance from the center of the earth. To convert latitude and longitude coordinates to an earth type coordinate, we can use the ll_to_earth()
function provided by the earthdistance module (we don't have to figure out the distance from the center of the earth ourselves).
Usage:
select name, earth_distance(
ll_to_earth(a.latitude, a.longitude),
ll_to_earth(hr_jax.latitude, hr_jax.longitude)
) as distance
from addresses a,
lateral (
select id, latitude, longitude from addresses where name = 'Hashrocket JAX'
) as hr_jax
where a.id <> hr_jax.id
order by distance;
This query returns:
name | distance
--------------------+------------------
V Pizza | 305.077048249532
Satchel's Pizza | 111427.731700124
Artichoke Pizza | 1340823.81474079
Giordano's | 1406049.35995852
Hashrocket Chicago | 1406868.83719205
(5 rows)
- Pros:
- officially supported by Postgres
- can be used with standard Postgres columns
- no edge cases
- fully compatible with Heroku
- Cons:
- assumes the earth is perfectly spherical
PostGIS
PostGIS is a third party module maintained by a project steering committee. It is not officially supported by Postgres. That being said, it is recommended by Postgres if your application requires a very high level of accuracy for your distance calculations.
The PostGIS module can be installed with the "create extension" command:
create extension if not exists postgis;
When added to your database, the PostGIS module adds a new table and four related views.
List of relations
Schema | Name | Type | Owner
--------+-------------------+----------+---------
public | addresses | table | marylee
public | addresses_id_seq | sequence | marylee
public | geography_columns | view | marylee
public | geometry_columns | view | marylee
public | raster_columns | view | marylee
public | raster_overviews | view | marylee
public | spatial_ref_sys | table | marylee
(7 rows)
The table, called spatial_ref_sys
, contains thousands of spatial reference system definitions, and the information for how to transform between them. While robust, the definitions are not fully comprehensive. As such, it is possible to define your own should you need them.
Three of the views, geography_columns
, geometry_columns
, and raster_columns
, are catalogs of all the columns for each respective column type (geography, geometry, and raster), along with their table names and other metadata. These views exist for performance reasons, aggregating metadata surrounding the existing spatial geometry columns in the database so that any necessary projection, processing or rendering can be performed without needing to inspect each geometry.
The final view, raster_overviews
, gathers all the raster columns and provides information about the raster resolutions and generally contain a low resolution representation of the raster, allowing for faster (albeit less accurate) computations.
When using PostGIS for geolocations in Postgres, it's recommended to use the geography column type, even though it's less performant than the geometry column type, and has fewer functions available. The geography column type doesn’t require us to have knowledge of projections and planar coordinate systems, making it much more straightforward for us to use.
PostGIS geolocations have a concept called SRID, or spatial resource identifier. If the SRID option is omitted during geography calculations, PostGIS will default to a value of 4326, which is the SRID for WGS 84, or the World Geodetic System of 1984, and the standard for the Global Positioning System.
To do geographic distance calculations in PostGIS, you can use the ST_Distance()
function.
Usage:
select name,
ST_Distance(
ST_MakePoint(a.longitude, a.latitude)::geography,
ST_MakePoint(hr_jax.longitude, hr_jax.latitude)::geography
) as distance
from addresses a,
lateral (
select id, latitude, longitude from addresses where name = 'Hashrocket JAX'
) as hr_jax
where a.id <> hr_jax.id
order by distance;
This query returns:
name | distance
--------------------+------------------
V Pizza | 304.55736704
Satchel's Pizza | 111306.96677308
Artichoke Pizza | 1337861.30762176
Giordano's | 1402495.27866689
Hashrocket Chicago | 1403318.44471188
(5 rows)
- Pros:
- no edge cases
- highest accuracy
- ability to change the spatial reference identifier, or SRID
- Cons:
- only has beta support on Heroku
- highest complexity, including non-SQL-standard column types and potentially some understanding of planar coordinate systems
- requires the use of PostGIS provided columns (or requires you to cast the data type to a PostGIS spatial type, as shown above)
- if you want to have different units of measurement returned, you'll need to change your projection (use the non-default SRID), or convert yourself
- comes with a 5000+ row table and three views that you will likely not be using if your use case is geolocations within the default SRID
Conclusion
Since the earthdistance point calculations have edge cases, I would consider it a nonviable option for any production code base. It may be a fun thing to use for a personal project, but given that the other two options discussed in this post don't have edge cases, it doesn't make sense to use the point calculations. That leaves us with either PostGIS or earthdistance cubes.
Here's a quick query to gather together the distances between our addresses using all three of our discussed options:
select name,
ST_Distance(
ST_MakePoint(a.longitude, a.latitude)::geography,
ST_MakePoint(hr_jax.longitude, hr_jax.latitude)::geography
) as postgis_distance,
earth_distance(
ll_to_earth(a.latitude, a.longitude),
ll_to_earth(hr_jax.latitude, hr_jax.longitude)
) as cube_distance,
(
point(a.longitude, a.latitude)<@>point(hr_jax.longitude, hr_jax.latitude)
) * 1609.344 as point_distance
from addresses a,
lateral (
select id, latitude, longitude from addresses where name = 'Hashrocket JAX'
) as hr_jax
where a.id <> hr_jax.id
order by point_distance;
The result:
name | postgis_distance | cube_distance | point_distance
--------------------+------------------+------------------+------------------
V Pizza | 304.55736704 | 305.077048249532 | 304.733565043637
Satchel's Pizza | 111306.96677308 | 111427.731700124 | 111302.276328273
Artichoke Pizza | 1337861.30762176 | 1340823.81474079 | 1339314.19457982
Giordano's | 1402495.27866689 | 1406049.35995852 | 1404466.30300668
Hashrocket Chicago | 1403318.44471188 | 1406868.83719205 | 1405284.85759896
(5 rows)
Looking at the results from our query, we can see that our discrepancy is negligibly small when comparing PostGIS and earthdistance cubes. It's just a 3 kilometer difference on a distance of 1,400 kilometers, and less than a full meter on a distance of 300 meters. With that in mind, deciding between the two would fall entirely to use case.
Are you looking to get a reasonably accurate idea of a distance between two locations? The earthdistance module is your friend.
Do you need extreme accuracy in your measurements? Best use PostGIS.
Do you want to be able to define your spatial resource identifiers? Go for PostGIS.
Hosting on Heroku? You may want to stick with earthdistance (at least while PostGIS is still in beta).