Welcome to Planet PostGIS

October 01, 2014

Boundless Geo

PostGIS Training: Creating Overlays

At Boundless, we recognize that software is only as good as the problems it can be used to solve. That’s why we don’t just develop the best open source spatial software; we also teach you how to use it, with support, training, and workshops directly from the experts.

One question that comes up often during our PostGIS training is “how do I do an overlay?” The terminology can vary: sometimes the operation is called a “union”, sometimes an “intersect”. What people mean is: “can you turn a collection of overlapping polygons into a collection of non-overlapping polygons that retain information about the overlapping polygons that formed them?”


So an overlapping set of three circles becomes a non-overlapping set of 7 polygons.

Calculating the overlapping parts of a pair of shapes is easy, using the ST_Intersection() function in PostGIS, but that only works for pairs, and doesn’t capture the areas that have no overlaps at all. How can we handle multiple overlaps and get out a polygon set that covers 100% of the area of the input sets? By taking the polygon geometry apart into lines, and then building new polygons back up. Let’s construct a synthetic example: first, generate a collection of random points, using a Gaussian distribution so there’s more overlap in the middle. The crazy math in the SQL below just converts the uniform random numbers from the random() function into normally distributed numbers (the Box-Muller transform).

CREATE TABLE pts AS
WITH rands AS (
  SELECT generate_series AS id, random() AS u1, random() AS u2
  FROM generate_series(1,100)
)
SELECT id,
  ST_SetSRID(ST_MakePoint(
    50 * sqrt(-2 * ln(u1)) * cos(2*pi()*u2),
    50 * sqrt(-2 * ln(u1)) * sin(2*pi()*u2)),4326) AS geom
FROM rands;

The result is a cloud of points, denser in the middle. Now we turn the points into circles, big enough to have overlaps.

CREATE TABLE circles AS
SELECT id, ST_Buffer(geom, 10) AS geom FROM pts;

Now it’s time to take the polygons apart. In this case we’ll take the exterior ring of each circle, using ST_ExteriorRing(). If we were dealing with complex polygons with holes, we’d have to use ST_DumpRings(). Once we have the rings, we want to make sure that everywhere rings cross, the lines are broken, so that no lines cross; they only touch at their end points. We do that with the ST_Union() function.

CREATE TABLE boundaries AS
SELECT ST_Union(ST_ExteriorRing(geom)) AS geom
FROM circles;

What comes out is just lines, but with end points at every crossing. Now that we have noded lines, we can collect them into a multi-linestring and feed them to ST_Polygonize() to generate polygons. The polygons come out as one big multi-polygon, so we’ll use ST_Dump() to convert it into a table with one row per polygon.

CREATE SEQUENCE polyseq;
CREATE TABLE polys AS
SELECT nextval('polyseq') AS id, (ST_Dump(ST_Polygonize(geom))).geom AS geom
FROM boundaries;

Now we have a set of polygons with no overlaps, only one polygon per area. So, how do we figure out how many overlaps contributed to each resulting polygon? We can join a point from inside each new small polygon with the set of original circles, and count how many circles contain that point. A spatial join will allow us to calculate the number of overlaps.

ALTER TABLE polys ADD COLUMN count INTEGER DEFAULT 0;

UPDATE polys SET count = p.count
FROM (
  SELECT count(*) AS count, p.id AS id
  FROM polys p
  JOIN circles c
  ON ST_Contains(c.geom, ST_PointOnSurface(p.geom))
  GROUP BY p.id
) AS p
WHERE p.id = polys.id;

That’s it! Now we have a single coverage of the area, where each polygon knows how many overlaps contributed to it. Ironically, when visualized using the coverage count as a variable in the color ramp, it looks a lot like the original image, which was created with a simple transparency effect. However, the point here is that we’ve created new data, in the count attribute of the new polygon layer.

The same decompose-and-rebuild-and-join-centroids trick can be used to overlay all kinds of features, and to carry over attributes from the original input data, achieving the classic “GIS overlay” workflow. Happy geometry mashing!
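As a rough sketch of that generic workflow (the parcels and zones tables below are hypothetical stand-ins, not from this post), the same steps apply: extract boundaries, node them with ST_Union(), polygonize, then join attributes back with a point-in-polygon test:

```sql
-- 1. Node the boundaries of both input layers together
CREATE TABLE all_boundaries AS
SELECT ST_Union(geom) AS geom FROM (
  SELECT ST_ExteriorRing(geom) AS geom FROM parcels
  UNION ALL
  SELECT ST_ExteriorRing(geom) AS geom FROM zones
) AS rings;

-- 2. Rebuild non-overlapping polygons from the noded lines
CREATE TABLE overlay AS
SELECT row_number() OVER () AS id,
       (ST_Dump(ST_Polygonize(geom))).geom AS geom
FROM all_boundaries;

-- 3. Carry attributes from each input layer onto the result pieces
SELECT o.id, p.parcel_id, z.zone_code, o.geom
FROM overlay o
LEFT JOIN parcels p ON ST_Contains(p.geom, ST_PointOnSurface(o.geom))
LEFT JOIN zones z   ON ST_Contains(z.geom, ST_PointOnSurface(o.geom));
```

The LEFT JOINs keep pieces that fall inside only one of the input layers, which is what distinguishes a full overlay from a plain intersection.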

Want to learn more? Try our Introduction to PostGIS online training course!

The post PostGIS Training: Creating Overlays appeared first on Boundless.

by Paul Ramsey at October 01, 2014 03:33 PM

September 30, 2014

Boston GIS (Regina Obe, Leo Hsu)

Waiting for PostGIS 2.2 - ST_ClipByBox2D - Map dicing Redux in concert with ST_Tile

One of the new features coming in PostGIS 2.2 is ST_ClipByBox2D (thanks to Sandro Santilli's recent commits, funded by CartoDB). However, to take advantage of it, you are going to need your PostGIS compiled with GEOS 3.5+ (a very recent build), which has not been released yet. Windows folks: the PostGIS 2.2 9.3 and 9.4 experimental binaries are built with the latest GEOS 3.5 development branch, so you should be able to test this out with Winnie's experimental builds.

Since the dawn of PostGIS, PostGIS users have needed to mutilate their geometries in often painful and horrific ways. Why is the ST_ClipByBox2D function useful? Because it's a much faster way of mutilating your geometries by a rectangular mold than using ST_Intersection. Why would you want to mutilate your geometries? There are many reasons, but I'll give you one: as your geometry approaches the area of your bounding box, and as your bounding box size decreases, your spatial index becomes more efficient. You can consider this article, Map dicing redux, a redo of the article I wrote (eons ago), Map Dicing and other stuff, which describes the same approach with much older technology.

Though I will be using more or less the same dataset, Massachusetts TOWNSSURVEY_POLYM (it's newer, so you can't really compare), and I tried to simplify my exercise a bit (not resorting to temp tables and such), my focus in this article will be to compare the speed of the new ST_ClipByBox2D approach against the old ST_Intersection approach. The spoiler, for those who don't have the patience for this exercise, is that ST_ClipByBox2D, at least on my sample data set on my puny Windows 7 64-bit desktop running PostgreSQL 9.4beta2 64-bit, was about 4-5 times faster than ST_Intersection. There was a downside to this speedup: with the ST_Intersection approach, I had no invalid polygons; in the case of ST_ClipByBox2D, I had one invalid polygon. So, as noted in the docs, use with caution. We'd be very interested in hearing other people's experiences with it.

One other benefit that ST_ClipByBox2D has over ST_Intersection, which I didn't test out, is that while ST_Intersection doesn't work with invalid geometries, ST_ClipByBox2D can.
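To make the comparison concrete, here is a side-by-side sketch of the two approaches; the towns table and the envelope coordinates are hypothetical stand-ins, and ST_ClipByBox2D requires PostGIS 2.2 built against GEOS 3.5+:

```sql
-- Old approach: exact intersection with a rectangle (slower, but robust)
SELECT gid,
       ST_Intersection(geom, ST_MakeEnvelope(200000, 890000, 210000, 900000, 26986)) AS geom
FROM towns
WHERE geom && ST_MakeEnvelope(200000, 890000, 210000, 900000, 26986);

-- New approach: fast rectangular clipping (PostGIS 2.2+, GEOS 3.5+).
-- It may occasionally emit invalid polygons, so validate the output.
SELECT gid,
       ST_ClipByBox2D(geom, ST_MakeEnvelope(200000, 890000, 210000, 900000, 26986)) AS geom
FROM towns
WHERE geom && ST_MakeEnvelope(200000, 890000, 210000, 900000, 26986);
```

The `&&` bounding-box filter lets the spatial index prune rows before either function does any real work.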

For these exercises, I'm also going to abuse the PostGIS raster function ST_Tile for geometry use, because for some strange reason we have no ST_Tile function for PostGIS geometry. For those who missed the improper use of ST_Tile for raster, refer to Waiting for PostGIS 2.1 - ST_Tile.

Continue reading "Waiting for PostGIS 2.2 - ST_ClipByBox2D - Map dicing Redux in concert with ST_Tile"

by Regina Obe (nospam@example.com) at September 30, 2014 02:55 PM

September 26, 2014

PostGIS Development

Getting distinct pixel values and pixel value counts of a raster

PostGIS raster has so many functions, and probably at least 10 ways of doing any given thing, some much slower than others. Suppose you have a raster, or a raster area of interest (say, an elevation raster), and you want to know the distinct pixel values in the area. The temptation is to reach for the ST_Value raster function, but there is a much more efficient function to use: the ST_ValueCount function.

The ST_ValueCount function is one of many statistical raster functions available in PostGIS 2.0+. It is a set-returning function that returns two values for each row: a pixel value (value), and a count of the pixels (count) in the raster that have that value. It also has variants that allow you to filter for certain pixel values.
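A minimal sketch of the idea (the elevation table name and band number are assumptions for illustration): ST_ValueCount reads a band and emits (value, count) rows per raster tile, so we sum the counts across tiles to get totals for the whole table.

```sql
-- Distinct pixel values and how many pixels hold each value,
-- aggregated across all tiles of a hypothetical "elevation" raster table
SELECT (vc).value, SUM((vc).count) AS count
FROM (SELECT ST_ValueCount(rast, 1) AS vc FROM elevation) AS t
GROUP BY (vc).value
ORDER BY (vc).value;
```

This visits each tile once, instead of probing every pixel individually with ST_Value.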

This tip was prompted by the question on stackexchange How can I extract all distinct values from a PostGIS Raster?

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 26, 2014 12:00 AM

September 25, 2014

Archaeogeek (Jo Cook)

New Portable GIS Releases

I’m pleased to announce not one, but two new releases of Portable GIS! The first, version 4.2, contains QGIS 2.4 and PostGIS 1.5, and will be the last release to include that version of PostGIS.

The second, version 5, contains QGIS 2.4 and PostGIS 2.1 and all future releases will be based on this. Get them here:

There are two important things about these two releases. The first is that the download size is almost twice what it was previously; this is out of my control and is due to the increased size of the QGIS installation. The second is that version 5.0 should be considered a beta release, as it’s very much untested. If there are any problems, let me know via the Portable GIS Google group.

You might also be interested to see a short presentation I did about Portable GIS at GeoMob London in September: http://archaeogeek.github.io/geomob-portablegis/#/

September 25, 2014 08:49 AM

September 24, 2014


Oslandia

Back from FOSS4G 2014 - part 2

Apart from Oslandia's participation at FOSS4G 2014 in Portland (see part 1), a lot of topics were discussed during this week of conference.

Presentation's slides and video are being put online and you can replay the whole conference if you could not attend, but here is a really short summary of a few things we noticed during the five days.

  • Fiona and Rasterio are now mature, modern GIS Python libraries on top of GDAL/OGR, and are continuously improved.
  • WebGL is all around the place. With new support from the various browser vendors, this is definitely the backend of choice for rendering large data on online maps. This is still the exploration phase though, and there is currently no definitive solution for WebGL mapping, as things are evolving really fast and new use cases keep appearing.
  • Use cases for open source GIS software continue to grow larger. It even goes to space with the Planet Labs satellite constellation!
  • OpenLayers 3 and Leaflet are the two client libraries of choice, with distinct use cases, but excellent quality for both.
  • Design is here, for the good sake of our eyes and user experience. This is the first time that a specific track has been fully dedicated to map design.
  • Vector strikes back! With the evolution of browser technologies, vector data is gaining momentum. Vector tiles are a natural evolution for this, and formats and standards are starting to emerge. More performance, less space, and a door open for new end-user features.
  • Following the vector movement, symbology also evolves to become dynamic. What if those colors nicely blended from one to another, and the thickness of these lines were controlled by a function of the zoom level? Better UX!
  • GIS involves large datasets, and open source software is now ready to crunch these terabytes of data. Be it mosaicking rasters for the whole world, machine learning to predict climate change, or distributed data analysis, data size limits continue to be pushed further.
  • Well-known open source components continuously gain features: MapServer, QGIS, GDAL/OGR, and PostGIS improve from version to version.
  • CartoDB raised 8 million USD and continues to grow.
  • A large part of the Mapbox team was present and showed various technologies at the bleeding edge of online mapping.
  • A lot of large organizations were present, among them TriMet, US DoD, Amazon (a sponsor), GitHub, ESRI, and RedHat… Big players are in the place too.

FOSS4G is technical and at the same time user-oriented. It gathers the open source community, industry, and academics alike. This is the best way to foster intense brainstorming and create new ideas, visions, and technologies.

Do not hesitate to check out the videos and give us your favorites, tweeting @oslandia or mailing us on any of these subjects at infos@oslandia.com!

by Vincent Picavet at September 24, 2014 05:00 PM

September 23, 2014

Postgres OnLine Journal (Leo Hsu, Regina Obe)

FOSS4G and PGOpen 2014 presentations

At FOSS4G we gave two presentations. The videos from the other presentations are in the FOSS4G 2014 album. I have to commend the organizers for gathering such a rich collection of videos. Most of the videos turned out very well and are also downloadable as MP4 in various resolutions. It really was a treat being able to watch all I missed. I think there are still some videos that will be uploaded soon. As mentioned, lots of PostGIS/PostgreSQL talks (or at least people using PostGIS/PostgreSQL in GIS).

Continue reading "FOSS4G and PGOpen 2014 presentations"

by Leo Hsu and Regina Obe (nospam@example.com) at September 23, 2014 06:55 PM


Oslandia

Back from FOSS4G 2014 - part 1

FOSS4G, the main international OSGeo event, takes place once a year and gathers everyone involved in open source geospatial technologies at large.

This year, the conference took place in Portland, Oregon, and gathered around 1000 people, on topics ranging from deeply technical MapServer pro tips, to burnout management, through open data policies and map design.

This year's edition was really good, as always for this conference. A lot of social events allowed direct talks to passionate people from all over the world, and the presentations were diverse and of high quality.

Oslandia at FOSS4G

Oslandia is a full player in the OpenSource GIS field, and we participated this year with various interventions.

First of all, we gave a workshop on our 3D stack, with new features and a now vertical stack of software, from storage to visualization.

This workshop features the use of this stack for managing 3D GIS data. Given an open data dataset from the city of Lyon, France, we integrate the data into PostGIS and use PostGIS 3D features to analyze and enrich it. Then Horao (http://oslandia.github.io/horao/), a 3D desktop viewer coupled with QGIS, is used to do some 3D visualization.

In the second part of the workshop, we use the very recently released Cuardo web client (http://www.cuardo.org) to display the data in a standard browser, using WebGL.

The workshop is self-contained and you can follow it completely to discover these exciting new features in open source 3D GIS. Find it online: https://github.com/Oslandia/workshop-3d

Gimme some YeSQL!

Vincent Picavet also gave a presentation, "Gimme some YeSQL - and a GIS -", which focused on the latest and upcoming PostgreSQL features. From updatable views to the new JSONB data type, this is a practical and technical presentation of the big improvements coming in PostgreSQL 9.4. The talk illustrates the use of these features in a GIS context.

Slides are online, as well as the video (the title is wrong).

OpenSource software for surveying

Another presentation by Vincent was an insight into a project we completed for GIS Apavil in Romania, concerning a full GIS for water distribution and wastewater networks. The presentation especially shows how we set up some specific open source surveying tools, and how we deal with offline use, versioning, and synchronization of data.

Slides are on GitHub and the video should be up soon.


Olivier Courtin showed a summary of the current state of the 3D stack we develop and its foreseen future. Slides and video are online, and there should be more information coming soon.

Slides : https://github.com/Oslandia/presentations/raw/master/foss4g_2014/gisgoes3d_olivier_courtin.pdf

Get in touch !

Oslandia will continue to make research and development on all of these topics. If you are interested in any of these subjects, do not hesitate to get in touch at infos@oslandia.com

A post to follow will give you some background information on what we heard at FOSS4G and where to hear more !

by Vincent Picavet at September 23, 2014 05:00 PM

September 22, 2014

Paul Ramsey

PostGIS Feature Frenzy

A specially extended feature frenzy for FOSS4G 2014 in Portland. Usually I only frenzy for 25 minutes at a time, but they gave me an hour-long session!

PostGIS Feature Frenzy — Paul Ramsey from FOSS4G on Vimeo.

Thanks to the organizers for giving me the big room and big slot!

by Paul Ramsey (noreply@blogger.com) at September 22, 2014 11:30 PM

September 16, 2014

Paul Ramsey

PostGIS for Managers

At FOSS4G this year, I wanted to take a run at the decision process around open source, with particular reference to the decision to adopt PostGIS: what do managers need to know before they can get comfortable with the idea of making the move?

The Manager's Guide to PostGIS — Paul Ramsey from FOSS4G on Vimeo.

by Paul Ramsey (noreply@blogger.com) at September 16, 2014 07:48 PM

September 11, 2014

Postgres OnLine Journal (Leo Hsu, Regina Obe)

FOSS4G 2014 televised live

If you weren’t able to make it to FOSS4G 2014 this year, you can still experience the event live. All the tracks are being televised live, and the reception is pretty good: https://2014.foss4g.org/live/. Lots of GIS users using PostGIS and PostgreSQL. People seem to love Node.js too.

After hearing enough about Node.js from all these people, and this guy (Bill Dollins), I decided to try it out for myself.

I created a Node.js web application, which you can download from here: https://github.com/robe2/node_postgis_express. It's really a spin-off from my other viewers, but more raw. I borrowed the same ideas as Bill, but instead of using a native node Postgres driver, I went for the pure JavaScript one so it's easier to install on all platforms. I also experimented with using base64 encoding to embed raster output directly into the browser, so I don't have to have that silly img src path reference thing to contend with.
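The base64 trick can be sketched on the SQL side alone; the table and column names below are hypothetical, but ST_AsPNG() and encode() are standard PostGIS/PostgreSQL functions. The returned string can be dropped straight into a `data:image/png;base64,...` URI in an img tag, with no file path involved:

```sql
-- Render a raster tile as PNG and base64-encode it for inline embedding
SELECT encode(ST_AsPNG(rast), 'base64') AS png_base64
FROM my_rasters
WHERE rid = 1;
```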

by Leo Hsu and Regina Obe (nospam@example.com) at September 11, 2014 08:37 PM

September 10, 2014

A GeoSpatial World

Transforming 3D formats (OBJ, PLY, STL, VTP) to PostGIS WKT POLYHEDRALSURFACE



Introduction

The idea for the tool I am about to present came from reading a message on the PostGIS-Users list, polyhedral surface import from common 3D formats. The author of the message was looking for a tool to translate a few 3D formats (OBJ, PLY, STL…) into WKB/WKT for a POLYHEDRALSURFACE geometry, so I thought: why not develop such a tool? The 3D viewer I had already built would let me validate the transformation.
I developed this tool in C# 2013 with:
  • ActiViz .Net, a .Net wrapper for VTK
  • ScintillaNET, a powerful text-editing control for Windows and a wrapper for the Scintilla component
I then searched the web for sample files for each 3D data format, which I have included in the Data directory of the tool's installation. Here are a few links where you will find files to download:
  • OBJ: OBJ Files
  • PLY: PLY Files
  • STL: STL Files Sharing

Installation

Click the pg3DImport_setup.exe link to download the installer, then run it.
Click the Next button at each step, then click Finish.

Usage

Double-click the icon created by the installer to launch the application.

Opening and transforming a 3D file

Click the open icon to choose a 3D file to transform.
Go to the Pg3DImport installation directory and then into the DATA subdirectory, select the type of 3D file you want to transform (OBJ Files, for example), then select the file cesna.obj and click Open.

You can then view the file in the viewer pane.

Click the transform icon to convert the file to the POLYHEDRALSURFACE WKT format.

You can then copy the result from the editor pane to insert it into a PostGIS table (version 2.0 or higher), or view the result in the Pg3DViewer application, which you have no doubt already installed.

Viewing in Pg3DViewer

Another example

Opening an STL file from the DATA directory of the installation.
Display in Pg3DViewer.

You can open and convert the various files provided in the DATA directory of the installation, but if you have your own 3D files, use those instead.

Conclusion

We have reached the end of this post, one more step into the new 3D world of PostGIS. Don't hesitate to send me your ideas for improvement.

by Jérôme ROLLAND (noreply@blogger.com) at September 10, 2014 09:38 AM


PostGIS Development

PostGIS 2.1.4 Released

The 2.1.4 release of PostGIS is now available.

The PostGIS development team is happy to release a patch for PostGIS 2.1, the 2.1.4 release. As befits a patch release, the focus is on bugs, breakages, and performance issues.


Continue Reading by clicking title hyperlink ..

by Regina Obe at September 10, 2014 12:00 AM

September 09, 2014

Safe Software

Free and Open Source: The Inside Scoop on FME and FOSS

“Open source tools play well with others.” Paul Ramsey, PostGIS founder, talks Open Source at the FME International User Conference 2014.

As a software developer, coding something that’s already been designed, implemented, and optimized is a pretty gigantic waste of time.

Open source libraries help developers avoid re-inventing wheels. At this year’s FME UC, PostGIS founder Paul Ramsey took the stage to talk about the awesomeness that is Open Source. He explained how it’s important that developers use available knowledge—and FME devs understand this, adding value by building useful functionality on top of open source libraries.

Safe Software proudly supports the Free and Open Source Software (FOSS) community and regularly sponsors the North American FOSS4G conferences. FME uses many FOSS libraries, and when the devs here at Safe aren’t playing soccer or LEGO or a game of oversized Jenga, they’re working hard to make these libraries more accessible.

Our use of these libraries is a friendly two-way street. A two-way Yellow Brick Road, if you will. FOSS adds value to FME’s advanced data transformation and automation capabilities, while our contributions to these technologies have included funding, coding, testing, and bug fixes. Here are ten such libraries.

10 FOSS Libraries that FME Uses and Contributes to

This Mario-like map is made possible by the integration of Mapnik and FME.

1. Mapnik

Mapnik is a toolkit that renders raster and vector data into beautiful cartographic maps. FME’s Mapnik transformer lets users leverage Mapnik’s capabilities with any of the 325+ data formats FME supports. With Mapnik and FME, you can also create Web Map Tiling Services, make a map out of your face, map yer way to buried treasure arr matey!, and other such highly critical tasks. FME’s intuitive graphical interface makes it easy for any user to benefit from this complex library.

2. SpatiaLite

SpatiaLite extends SQLite into a simple Spatial DBMS. FME’s SpatiaLite Reader / Writer allows users to connect this format with hundreds of other formats, and leverage all of FME’s data transformation and automation capabilities. For example, you can instantly import SpatiaLite data into Google Maps Engine using our free Data Loader for Google Maps Engine.

FME reads and writes a variety of open source databases.

3. PostGIS

PostGIS is a spatial database that adds support for geographic objects and location queries to PostgreSQL. FME’s PostGIS Reader / Writer lets users integrate hundreds of data sources with this format. Watch Paul Ramsey’s FME UC talk to see the evolution of PostGIS, from the ideas that inspired it to its widespread use today.

4. JTS and GEOS

JTS and GEOS (Geometry Engine – Open Source) provide spatial predicates and functions for processing 2D geometry. JTS is the Java library and GEOS is the C++ library. Some of FME’s powerful geometry transformations are based on GEOS/JTS algorithms.

5. OGR and GDAL (Geospatial Data Abstraction Library)

These OSGeo libraries support reading/writing over 50 raster formats (GDAL) and 20 vector formats (OGR). FME uses these libraries to add several readers and writers to its list of supported formats, which means these formats gain a wide range of enterprise transformation and automation capabilities.

Bonus: if you use a GDAL-based format in FME, you can feel comfortable knowing it was probably implemented by our in-house world Tetris champion.

6. FDO

FDO works with a variety of geospatial data. This OSGeo API brings additional formats to FME, for example SQLite. In addition, Safe offers the free FME FDO Provider to allow AutoCAD Map 3D users to read 9 additional data formats. Safe was one of the earliest adopters of this API (version 3.0) and is acknowledged as an official Product Using FDO.

7. Ingres

Ingres is an RDBMS that supports large commercial and government applications. Safe’s partnership with Actian resulted in an open-source FME reader/writer for Ingres tables. This means increased data accessibility for Ingres users, making it easy to integrate spatial and enterprise data.

FME Workbench and the FME Data Inspector are available on Linux.

8. Qt

Qt is an application framework that we’ve used to help make FME fully cross-platform. Since FME 2013, FME has been available in beta on Mac and Linux platforms.

9. Interlis

Interlis is a geodata modeling and integration language adopted by the Swiss government as their national standard. Eisenhut Informatik is one of Safe’s longest standing open source partners, and their ili2fme plug-in allows Interlis data to be translated to/from any of the formats supported by FME.

10. LASzip

LASzip is a specialized library by Martin Isenburg that compresses LiDAR data without information loss—turning chubby LAS point clouds into lean, manageable LAZ files. The incorporation of this technology into FME has resulted in a great improvement in the ability to archive and exchange point cloud data.

But Wait, There’s More

For the full list of libraries licensed under the LGPL, visit our FOSS Code Center. You can also view more information about FME’s open source libraries by opening FME Desktop and selecting About > Legal Notices.

FME for Free

Data Loader for Google Maps Engine

The Data Loader for Google Maps Engine is one free tool we offer for converting data with FME technology.

FME is FOSS! Sort of. We do offer some free data translation tools.

In addition to the FME FDO Provider mentioned above, we also offer the Data Loader for Google Maps Engine, the CityGML Importer for InfraWorks, and the FME Revit Exporter Plug-in.

The Easy Translator is also available as a free web service, for immediately translating data into your required format and coordinate system.

Moving forward, we aim to make more free tools and web services like these.


How have you integrated your software or project with Free and Open Source Software? Are you attending FOSS4G this year, or have you in the past? Let us know in the comments!

The post Free and Open Source: The Inside Scoop on FME and FOSS appeared first on Safe Software Blog.

by Tiana Warner at September 09, 2014 06:19 PM

A GeoSpatial World

Pg3DViewer: new version




I have built a new version of the Pg3DViewer tool, which integrates ScintillaNET, a powerful text-editing control for Windows with syntax highlighting, and a wrapper for the Scintilla component. I used this control to handle PostgreSQL's SQL syntax and the PostGIS functions. The pgsql.xml file, which you will find in the installation directory, contains everything needed to manage the keyword lists and the style assigned to each list of keywords; I leave you the pleasure of exploring it and modifying it to your taste.

The download link: Pg3DViewer new version

by Jérôme ROLLAND (noreply@blogger.com) at September 09, 2014 04:00 PM

September 02, 2014

Boston GIS (Regina Obe, Leo Hsu)

PostGIS sessions at Postgres Open 2014 Chicago not too late

If you haven’t signed up for our Postgres Open 2014 PostGIS tutorials, there are still a few tickets available, but time and space are running out.

For those of you who’ve already signed up for our tutorials on September 17th, 2014, keep an eye on this page: http://www.postgis.us/pgopen2014. Not much to see there yet, but we’ll be posting content there before the tutorials, as well as the examples, so you can flaunt your mastery of cutting and pasting.

by Regina Obe (nospam@example.com) at September 02, 2014 03:20 AM

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostGIS presentations at Postgres Open 2014 in Chicago

For those of you who will be attending either of our PostGIS sessions in Postgres Open 2014 in Chicago September 17th, we've put up a page where we'll be posting links to the data we'll be using as well as the examples so you can follow along and copy and paste. Page is http://www.postgis.us/pgopen2014.

We ask that you bookmark the page before you come and try to install PostGIS 2.1 if you haven't already. Not much to see there yet. For those who haven't signed up, it's not too late. For those who will be unable to make it, we'll try to post all the content from the tutorials on the above page.

by Leo Hsu and Regina Obe (nospam@example.com) at September 02, 2014 03:05 AM

August 20, 2014


Postgis and PostgreSQL in Action - Timezones


Recently, I was lucky to be part of an awesome project called the Breaking Boundaries Tour.

This project is about two brothers, Omar and Greg Colin, who take their Stella scooters on a full round trip across the United States. And, while they are at it, they try to raise funding for Surfer's Healing Folly Beach - an organization that does great work enhancing the lives of children with autism through surfing. To accommodate this trip, they wished to have a site where visitors could follow their trail live, as it happened. A marker would travel across the map, with them, 24/7.

Furthermore, they needed the ability to jump off their scooters, snap a few pictures, edit a video, write some side info and push it onto the net, for the whole world to see. Immediately after they made their post, it had to appear on the exact spot they were at when snapping their moments of beauty.

To aid in the live tracking of their global position, they acquired a dedicated GPS tracking device which sends a latitude/longitude coordinate via a mobile data network every 5 minutes.

Now, this (short) post is not about how I built the entire application, but rather about how I used PostGIS and PostgreSQL for a rather peculiar matter: deducing timezone information.

For those who are interested though: the site is entirely built in Python using the Flask "micro framework" and, of course, PostgreSQL as the database.

Timezone information?

Yes. Time, dates, timezones: hairy worms in hairy cans which many developers hate to open, but have to sooner or later.

In the case of Breaking Boundaries Tour, we had one major occasion where we needed the correct timezone information: where did the post happen?

Where did it happen?

A feature we wanted to implement was one to help visitors get a better view of when a certain post was written. Seeing when a post was written in your local timezone is much more convenient than seeing the post time in some foreign zone.

We are lazy and do not wish to count backward or forward to figure out when a post popped up in our frame of time.

The reasoning is simple: always calculate all the times involved back to simple UTC (GMT). Then figure out the client's timezone using JavaScript, apply the time difference, and done!
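That flow can be sketched in a few lines of Python (instants invented): one UTC "root" time rendered into two visitors' zones, the same arithmetic the JavaScript would do client-side:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One canonical instant, stored as UTC -- the single "root" time.
post_utc = datetime(2014, 8, 20, 14, 30, tzinfo=timezone.utc)

# Rendering that same instant for two different visitors:
chicago = post_utc.astimezone(ZoneInfo("America/Chicago"))
tokyo = post_utc.astimezone(ZoneInfo("Asia/Tokyo"))

print(chicago.strftime("%Y-%m-%d %H:%M"))  # 2014-08-20 09:30
print(tokyo.strftime("%Y-%m-%d %H:%M"))    # 2014-08-20 23:30
```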

Simple eh?

Correct, except for one small detail in the feature request: in what zone was the post actually made?


While your heart might be in the right place in thinking: "Simple, just look at the locale of the machine (laptop, mobile phone, ...) that was used to post!", this information is just too fragile. Remember, the brothers are crossing the USA, riding through at least three major timezones. You simply cannot expect all the devices involved in posting to always adjust their locale automatically depending on where they are.

We need a more robust solution. We need PostGIS.

But, how can a spatial database help us to figure out the timezone?

Well, thanks to the hard labor delivered to us by Eric Muller from efele.net, we have a complete and maintained shapefile of the entire world, containing polygons that represent the different timezones accompanied by the official timezone declarations.

This enables us to use the latitude and longitude information from the dedicated tracking device to pinpoint in which timezone they were while writing their post.

So let me take you on a short trip to show you how I used the above data in conjunction with PostGIS and PostgreSQL.

Getting the data

The first thing to do, obviously, is to download the shapefile data and load it in to our PostgreSQL database. Navigate to the Timezone World portion of the efele.net site and download the "tz_world" shapefile.

This will give you a zip which you can extract:

$ unzip tz_world.zip

Unzipping will create a directory called "world" in which you can find the needed shapefile package files.

Next you will need to make sure that your database is PostGIS ready. Connect to your desired database (let us call it bar) as a superuser:

$ psql -U postgres bar

And create the PostGIS extension:

CREATE EXTENSION postgis;

Now go back to your terminal and load the shapefile into your database using the original owner of the database (here called foo):

$ shp2pgsql -S -s 4326 -I tz_world | psql -U foo bar

As you might remember from the PostGIS series, this loads in the geometry from the shapefile using only simple geometry (not "MULTI..." types) with a SRID of 4326.

What have we got?

This will take a couple of seconds and will create one table and two indexes. If you describe your database with the \d command (assuming you have not made any tables yourself):

public | geography_columns | view     | postgres
public | geometry_columns  | view     | postgres
public | raster_columns    | view     | postgres
public | raster_overviews  | view     | postgres
public | spatial_ref_sys   | table    | postgres
public | tz_world          | table    | foo
public | tz_world_gid_seq  | sequence | foo

You will see the standard PostGIS bookkeeping and you will find the tz_world table together with a gid sequence.

Let us describe the table:

\d tz_world

And get:

Column |          Type          |                       Modifiers                        
gid    | integer                | not null default nextval('tz_world_gid_seq'::regclass)
tzid   | character varying(30)  | 
geom   | geometry(Polygon,4326) | 
Indexes:
    "tz_world_pkey" PRIMARY KEY, btree (gid)
    "tz_world_geom_gist" gist (geom)

So we have:

  • gid: an arbitrary id column
  • tzid: holding the standards compliant textual timezone identification
  • geom: holding polygons in SRID 4326.

Also notice we have two indexes made for us:

  • tz_world_pkey: a simple B-tree index on our gid
  • tz_world_geom_gist: a GiST index on our geometry

This is a rather nice set, would you not say?

Using the data

So how do we go about using this data?

As I have said above, we need to figure out in which polygon (timezone) a certain point resides.

Let us take an arbitrary point on the earth:

  • latitude: 35.362852
  • longitude: 140.196131

This is a spot in the Chiba prefecture, central Japan.

Using the Simple Features functions we have available in PostGIS, it is trivial to find out in which polygon a certain point resides:

SELECT tzid 
    FROM tz_world
    WHERE ST_Intersects(ST_GeomFromText('POINT(140.196131 35.362852)', 4326), geom);

And we get back:

    tzid    
------------
 Asia/Tokyo

In the above query I used the function ST_Intersects which checks if a given piece of geometry (our point) shares any space with another piece. If we check the execution plan of this query:

EXPLAIN ANALYZE SELECT tzid 
    FROM tz_world
    WHERE ST_Intersects(ST_GeomFromText('POINT(140.196131 35.362852)', 4326), geom);

We get back:

                                                      QUERY PLAN                                                          
Index Scan using tz_world_geom_gist on tz_world  (cost=0.28..8.54 rows=1 width=15) (actual time=0.591..0.592 rows=1 loops=1)
    Index Cond: ('0101000020E61000006BD784B446866140E3A430EF71AE4140'::geometry && geom)
    Filter: _st_intersects('0101000020E61000006BD784B446866140E3A430EF71AE4140'::geometry, geom)
Total runtime: 0.617 ms

That is not bad at all: a runtime of a little over 0.6 milliseconds, and it is using our GiST index.

But, if a lookup is using our GiST index, a small alarm bell should go off inside your head. Remember my last chapter on the PostGIS series? I kept on babbling about index usage and how geometry functions or operators can only use GiST indexes when they perform bounding box calculations.

The latter might pose a problem in our case, for bounding boxes are very rough approximations of the actual geometry. This means that when we arrive near timezone borders, our calculations might just give us the wrong timezone.

So how can we fix this?

This time, we do not need to.

This is one of the few blessed functions that makes use of both an index and is very accurate.

The ST_Intersects first uses the index to perform bounding box calculations. This filters out the majority of available geometry. Then it performs a more expensive, but more accurate calculation (on a small subset) to check if the given point is really inside the returned matches.
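Outside the database, the same two-stage idea looks like this: a cheap bounding-box rejection pass, then the exact (more expensive) point-in-polygon test only on the survivors. A toy Python sketch with made-up "timezone" polygons:

```python
def bbox(poly):
    xs = [x for x, y in poly]
    ys = [y for x, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def in_bbox(pt, box):
    # Cheap first stage: the rough filter a GiST index performs.
    x, y = pt
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

def in_polygon(pt, poly):
    # Exact ray-casting test: the "expensive" second stage.
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

# Two tiny "timezone" polygons; only the second's bbox contains the point,
# so the exact test runs once -- the same work-saving shape as the GiST plan.
zones = {"zone_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
         "zone_b": [(10, 0), (14, 0), (14, 4), (10, 4)]}
pt = (11.5, 2.0)
hits = [name for name, poly in zones.items()
        if in_bbox(pt, bbox(poly)) and in_polygon(pt, poly)]
print(hits)  # ['zone_b']
```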

We can thus simply use this function without any more magic...life is simple!


Now it is fair to say that we do not wish to perform this calculation every time a user views a post; that would be neither efficient nor smart.

Rather, it is a good idea to generate this information at post time, and save it for later use.

The way I have set up to save this information is twofold:

  • I only save a UTC (GMT) generalized timestamp of when the post was made.
  • I made an extra column in my so-called "posts" table where I only save the string that represents the timezone (Asia/Tokyo in the above case).

This keeps the date/time information in the database naive of any timezone and makes for easier calculations to give the time in either the client's timezone or the timezone in which the post was originally written. You simply have one "root" time which you can move around timezones.
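A small Python sketch of how the two stored pieces combine at render time (the posted_at column name and sample values are invented; the tzid string is the one PostGIS found):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# What the database hands back (column names hypothetical): a timezone-naive
# UTC timestamp plus the tzid string looked up at insert time.
row = {"posted_at": datetime(2014, 8, 20, 5, 0), "tzid": "Asia/Tokyo"}

root = row["posted_at"].replace(tzinfo=timezone.utc)  # re-attach UTC

post_local = root.astimezone(ZoneInfo(row["tzid"]))           # where it was written
client_local = root.astimezone(ZoneInfo("America/New_York"))  # where it is read

print(post_local.isoformat())    # 2014-08-20T14:00:00+09:00
print(client_local.isoformat())  # 2014-08-20T01:00:00-04:00
```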

On every insert of a new post I have created a trigger that fetches the timezone and inserts it into the designated column. You could also fetch the timezone and update the post record using Python, but opting for an in-database solution saves you a few extra, unneeded round trips and is most likely a lot faster.

Let us see how we could create such a trigger.

A trigger in PostgreSQL is an event you can set to fire when certain conditions are met. The event(s) that fire have to be encapsulated inside a PostgreSQL function. Let us thus first start by creating the function that will insert our timezone string.

Creating functions

In PostgreSQL you can write functions in C, in procedural languages (PL/pgSQL, Perl, Python), or in plain SQL.

Creating functions with plain SQL is the most straightforward and easiest way. However, since we want to write a function that is to be used inside a trigger, we have an even better option: we can employ the power of the embedded PostgreSQL procedural language to easily access and manipulate our newly inserted data.

First, let us see which query we would use to fetch the timezone and update our post record:

UPDATE posts
  SET tzid = timezone.tzid
  FROM (SELECT tzid
          FROM tz_world
          WHERE ST_Intersects(
              ST_MakePoint(140.196131, 35.362852),
          geom)) AS timezone
  WHERE pid = 1;

This query will fetch the timezone string using a subquery and then update the correct record (a post with "pid" 1 in this example).

How do we pour this into a function?

CREATE OR REPLACE FUNCTION set_timezone() RETURNS TRIGGER AS $$
BEGIN
    UPDATE posts
    SET tzid = timezone.tzid
    FROM (SELECT tzid
            FROM tz_world
              WHERE ST_Intersects(
                  ST_MakePoint(NEW.longitude, NEW.latitude),
              geom)) AS timezone
    WHERE pid = NEW.pid;
    RETURN NEW;
END $$
LANGUAGE plpgsql IMMUTABLE;

First we use the syntax CREATE OR REPLACE FUNCTION to indicate we want to create (or replace) a custom function. Then we tell PostgreSQL that this function will return type TRIGGER.

You might notice that we do not give this function any arguments. The reasoning here is that this function is "special". Functions which are used as triggers magically get information about the inserted data available.

Inside the function you can see we access our latitude and longitude prefixed with NEW. These keywords, NEW and OLD, refer to the record after and before the trigger(s) happened. In our case we could have used both, since we do not alter the latitude or longitude data, we simply fill a column that is NULL by default. There are more keywords available (TG_NAME, TG_RELID, TG_NARGS, ...) which refer to properties of the trigger itself, but that is beyond today's scope.

The actual SQL statement is wrapped between double dollar signs ($$). This is called dollar quoting and is the preferred way to quote your SQL string (as opposed to using single quotes). The body of the function, which in our case is mostly the SQL statement, is surrounded with a BEGIN and END keyword.

A trigger function always needs a RETURN statement that is used to provide the data for the updated record. This too has to reside in the body of the function.

Near the end of our function we need to declare in which language this function was written, in our case PLPGSQL.

Finally, the IMMUTABLE keyword tells PostgreSQL that this function is rather "functional", meaning: if the inputs are the same, the output will also, always be the same. Using this caching keyword gives our famous PostgreSQL planner the ability to make decisions based on this knowledge.

Creating triggers

Now that we have this functionality wrapped into a tiny PLPGSQL function, we can go ahead and create the trigger.

First you have the event on which a trigger can execute, these are:

  • INSERT
  • UPDATE
  • DELETE
  • TRUNCATE

Next, for each event you can specify at what timing your trigger has to fire:

  • BEFORE
  • AFTER
  • INSTEAD OF

The last one is a special timing by which you can replace the default behavior of the mentioned events.

For our use case, we are interested in executing our function AFTER INSERT.

CREATE TRIGGER set_timezone
    AFTER INSERT ON posts
    FOR EACH ROW
    EXECUTE PROCEDURE set_timezone();

This will setup the trigger that fires after the insert of a new record.

Wrapping it up

Good, that is all there is to it.

We use a query, wrapped in a function, triggered by an insert event, to inject the official timezone string, which is deduced by PostGIS's spatial abilities.

Now you can use this information to get the exact timezone of where the post was made and use this to present the surfing client both the post timezone time and their local time.

For the curious ones out there: I used the MomentJS library for the client side time parsing. This library offers a timezone extension which accepts these official timezone strings to calculate offsets. A lifesaver, so go check it out.

Also, be sure to follow the bros while they scooter across the States!

And as always...thanks for reading!

by Tim van der Linden at August 20, 2014 10:00 AM

August 08, 2014

Stephen Mather

KNN with FLANN and laspy, a starting place

FLANN is the Fast Library for Approximate Nearest Neighbors, a purportedly wicked fast nearest neighbor library for comparing multi-dimensional points. I only say purportedly because I haven't verified it, but I assume it to be quite true. I'd like to move some (all) of my KNN calculations outside the database.

I'd like to do the following with FLANN: take a LiDAR point cloud and change it into a LiDAR height-above-ground point cloud. What follows are my explorations so far.

In a previous series of posts, e.g. http://smathermather.wordpress.com/2014/07/14/lidar-and-pointcloud-extension-pt-6/


I have been using the point cloud extension in PostGIS. I like the 2-D chipping, but I think I should segregate my data into height classes before sending it into the database. In this way, I can query my data by height class and by location efficiently, taking full advantage of the efficiencies of storing all those little points in chips, while also being able to query the data in any of the dimensions I need to in the future. Enter FLANN.

I haven’t gotten far. To use FLANN with LiDAR through Python, I’m also using laspy.  There’s a great tutorial here: http://laspy.readthedocs.org/en/latest/tut_part_1.html


I make one change to the tutorial section using FLANN. The code as written is:

import laspy
import pyflann as pf
import numpy as np

# Open a file in read mode:
inFile = laspy.file.File("./laspytest/data/simple.las")
# Grab a numpy dataset of our clustering dimensions:
dataset = np.vstack([inFile.X, inFile.Y, inFile.Z]).transpose()

# Find the nearest 5 neighbors of point 100.

neighbors = flann.nn(dataset, dataset[100,], num_neighbors = 5)
print("Five nearest neighbors of point 100: ")
print(neighbors[0])
print("Distances: ")
print(neighbors[1])

To make this example work with the current version of pyflann, we need to make sure we import all of pyflann (or at least nn), and also set flann = FLANN() as follows:

import laspy
import numpy as np
from pyflann import *

# Open a file in read mode:
inFile = laspy.file.File("simple.las")
# Grab a numpy dataset of our clustering dimensions:
dataset = np.vstack([inFile.X, inFile.Y, inFile.Z]).transpose()

# Find the nearest 5 neighbors of point 100.
flann = FLANN()
neighbors = flann.nn(dataset, dataset[100,], num_neighbors = 5)
print("Five nearest neighbors of point 100: ")
print(neighbors[0])
print("Distances: ")
print(neighbors[1])

Finally, a small note on the installation of pyflann on Ubuntu. What I'm about to document is undoubtedly not the recommended way to get pyflann working. But it worked…

Installation for FLANN on Ubuntu can be found here: http://www.pointclouds.org/downloads/linux.html

But this does not seem to install pyflann. That said, it installs all our dependencies + FLANN, so…

I cloned, compiled, and installed the FLANN repo: https://github.com/mariusmuja/flann

git clone git://github.com/mariusmuja/flann.git
cd flann
mkdir BUILD
cd BUILD
cmake ../.
sudo make install

This gets pyflann where it needs to go, and voilà! We can now do nearest neighbor searches within Python.

Next step: turn my LiDAR xyz point cloud into an xy-height point cloud, then dump it, height class by height class, into PostgreSQL. Wish me luck!
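As a rough sketch of that next step (coordinates invented, with a brute-force scan standing in for the FLANN query): for each point, find the nearest ground-classified point in xy and subtract its elevation:

```python
# Toy LiDAR points as (x, y, z); ground points are the reference surface.
ground = [(0.0, 0.0, 10.0), (5.0, 0.0, 12.0), (0.0, 5.0, 11.0)]
cloud  = [(0.2, 0.1, 18.0), (4.8, 0.3, 13.0)]

def nearest_ground(x, y):
    # Brute-force 1-NN in the xy plane; FLANN (or any k-d tree) replaces this scan.
    return min(ground, key=lambda g: (g[0] - x) ** 2 + (g[1] - y) ** 2)

heights = []
for x, y, z in cloud:
    gz = nearest_ground(x, y)[2]
    heights.append((x, y, round(z - gz, 2)))  # xy plus height above ground

print(heights)  # [(0.2, 0.1, 8.0), (4.8, 0.3, 1.0)]
```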

by smathermather at August 08, 2014 02:15 PM

August 07, 2014

Stephen Mather

Drivetime analyses, pgRouting

Map of overlapping 5-minute drive times from park access points
We’ve got some quick and dirty pgRouting-based code up on github. I say quick and dirty because it directly references the table names in both of the functions. I hope to fix this in the future.

The objective with this code is to input a point, use a nearest neighbor search to find the nearest intersection, and from that calculate a drive time alpha shape.

First, the nearest neighbor search:

CREATE OR REPLACE FUNCTION pgr.nn_search (geom geometry) RETURNS
int8 AS $$

SELECT id FROM pgr.roads_500_vertices_pgr AS r
ORDER BY geom <#> r.the_geom
LIMIT 1;

$$ LANGUAGE SQL;

This function takes one argument: pass it a point geometry, and it will do a KNN search for the nearest point. Since we are leveraging pgRouting, it simply returns the id of the nearest intersection point. We wrote this as a function in order to run it, e.g., on all the points in a table. As I stated earlier, it directly references a table name, which is a little hackish, but we'll patch that faux pas later.

Now that we have the ID of the point in question, we can do a driving distance calculation, wrap that up in an alpha shape, and return our polygon. Again, we write this as a function:

CREATE OR REPLACE FUNCTION pgr.alpha_shape (id integer, minutes integer) RETURNS
geometry AS $$

WITH alphashape AS(SELECT pgr_alphaShape('WITH
DD AS (
SELECT seq, id1 AS node, cost
FROM pgr_drivingDistance(''SELECT id, source, target, cost FROM pgr.roads_500'',' || id || ', ' || minutes || ', false, false)),
dd_points AS (
SELECT id_ AS id, x, y
FROM pgr.roads_500_vertices_pgr v, DD d
WHERE v.id = d.node)
SELECT * FROM dd_points')),

alphapoints AS (
SELECT ST_Makepoint((pgr_alphashape).x, (pgr_alphashape).y) FROM alphashape),

alphaline AS (
SELECT ST_MakeLine(ST_MakePoint) FROM alphapoints)

SELECT ST_MakePolygon(ST_AddPoint(ST_MakeLine, ST_StartPoint(ST_MakeLine))) AS the_geom FROM alphaline

$$ LANGUAGE SQL;

Finally, we'll use these functions in conjunction with a set of park feature access points to map out our 5-minute drive times. Overlaps in 5-minute zones are rendered as a brighter green in the image above.

CREATE TABLE pgr.alpha_test AS
WITH dest_ids AS (
SELECT pgr.nn_search(the_geom) AS id FROM pgr.dest_pts
)
SELECT pgr.alpha_shape(id::int, 5)::geometry, id FROM dest_ids;

Oh, and I should point out that for the driving distance calculation, we have pre-calculated the costs of the roads based on the speed limit, so cost is really travel time in minutes.
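That pre-calculation is straightforward; a toy Python version with made-up road lengths and speed limits:

```python
# Pre-computing edge cost as travel time in minutes, as described above:
# length / speed, scaled to minutes. Lengths and limits are invented.
roads = [
    {"id": 1, "length_km": 1.2, "speed_kmh": 50},
    {"id": 2, "length_km": 0.4, "speed_kmh": 30},
]
for r in roads:
    r["cost"] = r["length_km"] / r["speed_kmh"] * 60  # minutes

print([round(r["cost"], 2) for r in roads])  # [1.44, 0.8]
```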

by smathermather at August 07, 2014 01:50 AM

August 01, 2014

Stephen Mather

Plugin-free QGIS TMS tiles via GDAL

Want to load your favorite tiles into QGIS? How about plugin-free TMS tiles via GDAL:


Really awesome…

Needs but one change: epsg:900913 should be epsg:3857 or QGIS (GDAL?) throws an error. Presumably you could also define epsg:900913 in some config file, but barring that, create an XML file along these lines (substituting your own tile endpoint) and load it as a raster in QGIS:

<GDAL_WMS>
    <Service name="TMS">
        <ServerUrl>http://tile.example.com/${z}/${x}/${y}.png</ServerUrl>
    </Service>
    <DataWindow>
        <UpperLeftX>-20037508.34</UpperLeftX>
        <UpperLeftY>20037508.34</UpperLeftY>
        <LowerRightX>20037508.34</LowerRightX>
        <LowerRightY>-20037508.34</LowerRightY>
        <TileLevel>18</TileLevel>
        <TileCountX>1</TileCountX>
        <TileCountY>1</TileCountY>
        <YOrigin>top</YOrigin>
    </DataWindow>
    <Projection>EPSG:3857</Projection>
    <BlockSizeX>256</BlockSizeX>
    <BlockSizeY>256</BlockSizeY>
    <BandsCount>3</BandsCount>
    <Cache />
</GDAL_WMS>

Now I can use my pretty tilestache maps _everywhere!_:
Image of coyote points overlayed on tilestache rendered hillshade map

Screenshot of map from postgis

edit: forgot the hattip for the lead on this, Thomas gratier: @ThomasG77

by smathermather at August 01, 2014 03:28 AM

Stephen Mather

Cleaning animal tracking data — throwing away extra points

Very much the problem of the modern era: too much data, uneven data, and yet, should we keep it all?

Here’s the problem space: attach GPS collar to a coyote, send that data home, and you have a recipe for gleaning a lot of information about the movement of that animal across the landscape. In order to maximize the data collected while also maximizing the battery life of said device, sometimes (according to a preset scheme) you might sample every hour or every 15 minutes, and then switch back to once a day.

When you get the data back in, you get something like this:

Map of raw coyote data-- all points, showing all available collar GPS points

Already, we see some pretty interesting patterns. The first thing that I notice is clusters of activity. Are these clusters related to our uneven sampling, sometimes every 15 minutes, sometimes once or twice a day? Or are those really important clusters in the overall activity of the animal?

The next thing I notice is how perfectly the range of movement of the coyote is bounded by expressways to the west and north, and by the river to the east. Those seem to be pretty impermeable boundaries.

So, as does a man who only has a hammer, we clean the data with PostGIS. In this case, it’s an ideal task. First, metacode:

Task one: union / dissolve our points by day — for some days this will give us a single point, for other days a multi-point for the day.

Task two: find the centroid of our union. This will grab a centroid for each clump of points (a weighted spatial mean, if you will) and return that single point per day. There are some assumptions here that might not bear out under scrutiny, but it’s not a bad first approximation.

Now, code:

-- union / dissolve our points by day
WITH clusterp AS (
	SELECT ST_Union(geom) AS geom, gmt_date FROM nc_clipped_data_3734
		GROUP BY gmt_date
),
-- find the centroid of our union
centroids AS (
	SELECT ST_Centroid(geom) AS geom, gmt_date FROM clusterp
)
SELECT * FROM centroids;

Boom! One record per day:

Map showing much smaller home range, one record per day clustered around a single core zone

A different pattern now emerges. We can’t see the hard boundaries of the use area, but now the core portion of the home range can be seen.
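For point inputs, the union-then-centroid pipeline reduces to a per-day mean coordinate, which is easy to sanity-check outside the database (fixes invented):

```python
from collections import defaultdict

# (day, x, y) GPS fixes -- uneven sampling, several fixes on some days.
fixes = [("2014-07-01", 0.0, 0.0), ("2014-07-01", 2.0, 0.0),
         ("2014-07-01", 1.0, 3.0), ("2014-07-02", 5.0, 5.0)]

by_day = defaultdict(list)
for day, x, y in fixes:
    by_day[day].append((x, y))

# ST_Centroid of an ST_Union of points is just their mean coordinate.
centroids = {day: (sum(x for x, _ in pts) / len(pts),
                   sum(y for _, y in pts) / len(pts))
             for day, pts in sorted(by_day.items())}
print(centroids)  # {'2014-07-01': (1.0, 1.0), '2014-07-02': (5.0, 5.0)}
```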

Upcoming (guest) blog post on the use of these data in home range calculations in R. Stay tuned.

Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY SA.

by smathermather at August 01, 2014 02:58 AM

July 28, 2014

A GeoSpatial World

PostGIS 3D Viewer, continued


PostGIS 3D Viewer, continued


Introduction

As I announced in my previous post, I have updated pg3DViewer so that it can handle POINT geometries, and for my tests I took an interest in point clouds.









For the rest of this tutorial we will use two data files; here are the download links:

Installation


Click on the pg3DViewer_setup.exe link to download the installer for the new version. If you had installed the previous version, uninstall it first, then run the installer.

Click the Next button at each step, then the Finish button.

The previous post shows all the installation screens: http://ageoguy.blogspot.fr/2014/07/postgis-3d-viewer.html


Usage

Before you start using pg3DViewer you must have a PostgreSQL database with the PostGIS spatial extension, version 2.0 or higher.

Double-click the icon the installer created on the desktop to launch the application.



Connect to your 3D database



Start by clicking this icon to connect to a PostgreSQL/PostGIS database, version 2.0 or higher, containing 3D data or ready to receive some.

  • Choose localhost as the server if your database is local to your machine, or enter your server's IP address
  • the default port is 5432; change it if necessary
  • enter the user name
  • enter the password
  • choose a database from the combobox
  • click the OK button to confirm.



Processing an ASCII XYZ file

We will use the bunny.xyz file you downloaded in the introduction.

We will create the table that will hold the data. In PgAdmin III, open the query tool and copy the following query:

-- Table: bunny

-- DROP TABLE bunny;

CREATE TABLE bunny
(
  x double precision,
  y double precision,
  z double precision,
  id bigserial NOT NULL,
  CONSTRAINT bunny_pkey PRIMARY KEY (id)
);
ALTER TABLE bunny
  OWNER TO postgres;

Then execute it.

Then we insert the data into the table. Copy the following query into the query editor:

copy bunny(x,y,z)
from 'C:/temp/bunny.txt'
with delimiter ' '

Then execute it.

Now we can visualize the data. In the Query panel, copy the following query:

SELECT x,y,z
FROM bunny AS points_xyz

Click the 'Execute query' icon.

And here is the result after zooming and rotating.
As you may have noticed, display is very fast for a table containing 34,835 points.

Now let us try a new query. In the Query panel, copy the following query:

SELECT st_geomfromtext('POINT Z('||x||' '||y||' '||z||')',0) as geom,'255:0:0'::text AS rgb FROM bunny

Click the 'Execute query' icon.

And here is the result after zooming and rotating.
The display takes much longer because in this query we build the geometry on the fly.

The difference from the previous query is that the previous one generates a file of points in xyz format (thanks to PostgreSQL's COPY TO command), which a 'reader' from the VTK library then consumes. All this is done by adding 'AS points_xyz' at the end of the query and selecting the x, y and z fields.


Processing a LAS file

We will use the '2013 LiDAR 5105_54480.las' file you downloaded in the introduction. We will need tools to import this file into our database; we will use a collection of tools called LASTOOLS, which can be downloaded from http://www.cs.unc.edu/~isenburg/lastools/ ; here is the direct download link: http://lastools.org/download/lastools.zip

The tool we will use is called las2txt; here is the header of its associated README:

Converts from binary LAS/LAZ 1.0 - 1.4 to an ASCII text format
For updates check the website or join the LAStools mailing list.


Martin @lastools

And the link to the full file: http://www.cs.unc.edu/~isenburg/lastools/download/las2txt_README.txt

Open a DOS window, go to the directory where you installed LASTOOLS, then into the bin subdirectory. Copy the following command:
las2txt -i "C:\temp\2013 LiDAR 5105_54480.las" -parse xyzc -sep semicolon

This command generates an ASCII (.txt) file in x,y,z,classification format, with the ';' (semicolon) character as the column separator.
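To make the format concrete, here is a minimal Python parse of such lines (sample values invented):

```python
# Parsing the kind of x;y;z;classification lines that las2txt -parse xyzc emits
# (the values below are made up):
lines = ["510500.123;5448000.456;35.2;2",
         "510501.789;5448001.012;48.9;5"]

points = []
for line in lines:
    x, y, z, cls = line.split(";")
    points.append((float(x), float(y), float(z), int(cls)))

print(points[0])  # (510500.123, 5448000.456, 35.2, 2)
```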

We now have a file we can insert into our database. To do so, we start by creating the table that will receive the data; in the PgAdmin III query window, copy the following query:

-- Table: lidar_test

-- DROP TABLE lidar_test;

CREATE TABLE lidar_test
(
  x double precision,
  y double precision,
  z double precision,
  classification integer
);
ALTER TABLE lidar_test
  OWNER TO postgres;

Then execute it.

Then we insert the data into the table. Copy the following query into the query editor:

copy lidar_test(x,y,z,classification)
from 'C:/temp/2013 LiDAR 5105_54480.txt'
with delimiter ';'

Then execute it.

Now we can visualize the data. In the Query panel, copy the following query:

select x,y,z,
CASE
WHEN classification=2 then '128:0:0'
WHEN classification=3 then '0:64:0'
WHEN classification=4 then '0:128:0'
WHEN classification=5 then '0:255:0'
WHEN classification=6 then '255:128:64'
WHEN classification=11 then '128:128:128'
END as rgb
from lidar_test as points_xyzrgb

Click the 'Execute query' icon.

And here is the result.

The classification values used in the query come from this query:
select distinct classification from lidar_test order by 1

You can also get information about the LAS file with the lasinfo tool; in a DOS window, type the following command:

lasinfo "c:\temp\2013 LiDAR 5105_54480.las"

At the end of the report you get the definition of the classification values.

lasinfo for c:\temp\2013 LiDAR 5105_54480.las
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            1
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          ''
  generating software:        'TerraScan'
  file creation day/year:     156/2013
  header size:                227
  offset to point data:       447
  number var. length records: 2
  point data format:          1
  point data record length:   28
  number of point records:    7938187
  number of points by return: 6742290 914660 231891 44542 4804
  scale factor x y z:         0.001 0.001 0.001
  offset x y z:               510499.94 5447999.8799999999 0
  min x y z:                  510499.940 5447999.880 34.110
  max x y z:                  511000.110 5448500.100 275.690
variable length header record 1 of 2:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34735
  length after header  88
  description          'Georeferencing Information'
    GeoKeyDirectoryTag version 1.1.0 number of keys 10
      key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
      key 2048 tiff_tag_location 0 count 1 value_offset 4269 - GeographicTypeGeoKey: GCS_NAD83
      key 2054 tiff_tag_location 0 count 1 value_offset 9102 - GeogAngularUnitsGeoKey: Angular_Degree
      key 2056 tiff_tag_location 0 count 1 value_offset 7019 - GeogEllipsoidGeoKey: Ellipse_GRS_1980
      key 2057 tiff_tag_location 34736 count 1 value_offset 0 - GeogSemiMajorAxisGeoKey: 6378137
      key 2058 tiff_tag_location 34736 count 1 value_offset 1 - GeogSemiMinorAxisGeoKey: 6356752.314
      key 2059 tiff_tag_location 34736 count 1 value_offset 2 - GeogInvFlatteningGeoKey: 298.2572221
      key 3072 tiff_tag_location 0 count 1 value_offset 26910 - ProjectedCSTypeGeoKey: UTM 10 northern hemisphere
      key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
      key 4099 tiff_tag_location 0 count 1 value_offset 9001 - VerticalUnitsGeoKey: Linear_Meter
variable length header record 2 of 2:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34736
  length after header  24
  description          'Double Param Array'
    GeoDoubleParamsTag (number of doubles 3)
      6.37814e+006 6.35675e+006 298.257
reporting minimum and maximum for all LAS point record entries ...
  X                   0     500170
  Y                   0     500220
  Z               34110     275690
  intensity           0        255
  return_number       1          5
  number_of_returns   1          5
  edge_of_flight_line 0          0
  scan_direction_flag 0          1
  classification      2         11
  scan_angle_rank   -19         22
  user_data          79         79
  point_source_ID 10217      10238
  gps_time 49786940.848411 49796462.877692
WARNING: there is coordinate resolution fluff (x10) in XYZ
overview over number of returns of given pulse: 5825940 1363444 563503 160908 24392 0 0
histogram of classification of points:
         2030423  ground (2)
         1836807  low vegetation (3)
          234836  medium vegetation (4)
         1811842  high vegetation (5)
         2022021  building (6)
            2258  road surface (11)
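Once such a file is loaded into the PostgreSQL pointcloud extension, the classification codes listed above can be queried directly. A minimal sketch (the table name `lidar` and its patch column `pa` are assumptions, not from this post):

```sql
-- Count ground-classified points (ASPRS class 2) across all patches
SELECT sum(PC_NumPoints(PC_FilterEquals(pa, 'Classification', 2))) AS ground_points
FROM lidar;
```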


Conclusion.

We have reached the end of this post. I recommend reading Paul Ramsey's excellent tutorial on LiDAR data, which you will find at http://workshops.boundlessgeo.com/tutorial-lidar/ ; all that remains is to display the data using the functions presented in that 'tutorial lidar'.

by Jérôme ROLLAND (noreply@blogger.com) at July 28, 2014 10:08 AM

July 25, 2014

A GeoSpatial World

PgAdmin III plugin: PostGISViewer, continued


New version



I have added support for multi-geometries:

I have also added the ability to execute the queries present in the PgAdmin III SQL editor, but for this to work a few rules must be followed:
  • Only one PgAdmin III SQL editor must be open.
  • For the query or queries to be executed, you must not be positioned on a table name in the PgAdmin III object browser, unless that table is already loaded in the viewer. A selected table will always take precedence over queries.
  • The first field of each query must be a geometry column, or a geometry produced by a geometry-processing function (ST_Buffer, ST_Intersection...); it will be used for display. This field or geometry must be wrapped in the AsBinary function and aliased as geom:
    • SELECT AsBinary(wkb_geometry) as geom ....
    • SELECT AsBinary(ST_Buffer(wkb_geometry, 1.0)) as geom...
    • SELECT AsBinary(ST_Intersection(a.wkb_geometry,b.wkb_geometry)) as geom...
    • ....
  • The second field must make it possible to determine the geometry type, so it must be wrapped in the GeometryType function; an alias is not necessary:
    • SELECT ..., GeometryType(wkb_geometry)...
    • SELECT ..., GeometryType(ST_Buffer(wkb_geometry,1.0))...
    • SELECT ..., GeometryType(ST_Intersection(a.wkb_geometry,b.wkb_geometry))...
    • ....

  • Each query must end with a semicolon, which makes it possible to execute several queries in a row.

Below are two queries, each ending with a semicolon:
  • The first query loads the commune named 'Sainte-Croix-de-Quintillargues'
  • The second query loads all the buildings of that commune
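Following the rules above, the two queries might look like this (a sketch only: the table and column names `commune`, `batiment`, `nom` and `wkb_geometry` are assumptions, not taken from the post):

```sql
-- Hypothetical schema: commune(nom, wkb_geometry), batiment(wkb_geometry).
-- Load the commune named 'Sainte-Croix-de-Quintillargues':
SELECT AsBinary(c.wkb_geometry) AS geom, GeometryType(c.wkb_geometry)
FROM commune c
WHERE c.nom = 'Sainte-Croix-de-Quintillargues';

-- Load all the buildings of that commune:
SELECT AsBinary(b.wkb_geometry) AS geom, GeometryType(b.wkb_geometry)
FROM batiment b, commune c
WHERE c.nom = 'Sainte-Croix-de-Quintillargues'
  AND ST_Intersects(b.wkb_geometry, c.wkb_geometry);
```

Both queries put the AsBinary geometry first, the GeometryType second, and end with a semicolon, as the rules require.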

After launching the PostGISViewer plugin, the two queries are executed and give the following result:

The layers created are named query, with an identifier incremented for the duration of the viewer session.

Any suggestions or remarks for improving this tool are welcome.

To be continued...

by Jérôme ROLLAND (noreply@blogger.com) at July 25, 2014 02:48 PM

July 20, 2014

A GeoSpatial World

PostGIS 3D Viewer


A 3D viewer for PostGIS


Introduction

With version 2.0, PostGIS entered the world of 3D: it is possible to store 3D geometries, with the addition of the Z coordinate to all existing geometry types, and to store volumetric geometries with the introduction of two new types:

  • TIN (Triangulated Irregular Network, a collection of TRIANGLEs)
  • POLYHEDRALSURFACE (a collection of POLYGONs)

The idea of creating a viewer to visualize PostGIS 3D geometries had been in my head for a year, following an exchange with Claus Nagel and Felix Kunde of virtualcitySYSTEMS about the CityGML import tool I had written, CityGML vers PostGIS 2.0; at the time they advised me to move toward developing a 3D viewer. When I recently got back in touch with them to tell them about my work in progress, they welcomed the news with enthusiasm and received beta versions, which greatly helped me with my choices in finalizing the application.

Today 3D is more and more present in the GIS world with the CityGML format, and in the architecture world with the IFC format. Importing these formats into a PostgreSQL/PostGIS database is a natural step, since it brings the full processing power of spatial databases to the management of these data.

For the CityGML format, virtualcitySYSTEMS provides a free, Open Source database schema for storing and managing 3D city models. This database schema mirrors the CityGML data model. Data can be imported via an import/export tool.
The link to their site: http://virtualcitysystems.de/en/solutions.html#3dcitydb

At this stage of development I only display geometries of type POLYGON, MULTIPOLYGON, TIN and POLYHEDRALSURFACE, which makes it possible to display any volumetric geometry meeting that constraint. I will soon work on the other geometry types (POINT, LINESTRING...), which will be the subject of a new version of the application in the coming weeks.

I developed this tool in C# 2013 with:

  • The GDAL/OGR library
GDAL - Geospatial Data Abstraction Library
  • ActiViz .Net, a wrapper for VTK
  ActiViz .Net
VTK
  • PostgreSQL 9.2
PostgreSQL Windows
  • PostGIS 2.2.0 dev
Postgis


To load 3D data I tested the 3DCityDB-Importer-Exporter tool http://www.3dcitydb.org/3dcitydb/downloads/ , which can import CityGML files into a PostgreSQL database with PostGIS (from version 2.0).


Here is an extract from the 3DCityDB-v2_0_6-postgis-Tutorial.pdf document covering the PostGIS part (in English):












Installation


Click on the link pg3DViewer_setup.exe to download the installer, then run it.

Click the Next button at each step, then the Finish button.








Usage

Before starting to use pg3DViewer you must have a PostgreSQL database with the PostGIS spatial extension, version 2.0 or higher. This database should contain 3D data; ideally you will have created a 3DCityDB database and imported one or more CityGML files. Here is a link where you can download CityGML data: Rotterdam 3D. Ask our friend Google to translate the page if that helps; you will find a link on the site to a pdf file containing a map with the names of all the downloadable districts http://www.rotterdam.nl/GW/Images/Gemeentewerken%20Home/Landmeten/overzicht%20buurten.pdf ; then simply choose the districts from the list under the link.

You can still run the first tests without having downloaded any data.

image Double-click the icon created on the desktop by the installer to launch the application.



Connect to your 3D database



Start by clicking this icon to connect to a PostgreSQL/PostGIS database, version 2.0 or higher, containing 3D data or ready to receive some.

  • Choose localhost as the server if your database is local to your machine, or enter your server's IP address
  • the default port is 5432; change it if necessary
  • enter the user name
  • enter the password
  • choose a database in the combobox
  • click the OK button to confirm your entry.


Visualize a POLYHEDRALSURFACE geometry

In the Query panel, copy the following query:

SELECT st_force_collection(ST_GeomFromText(
    'POLYHEDRALSURFACE(
        ((0 0 0, 0 1 0, 1 1 0, 1 0 0, 0 0 0)),
        ((0 0 0, 0 0 1, 0 1 1, 0 1 0, 0 0 0)),
        ((0 0 0, 1 0 0, 1 0 1, 0 0 1, 0 0 0)),
        ((0 0 1, 1 0 1, 1 1 1, 0 1 1, 0 0 1)),
        ((1 0 0, 1 1 0, 1 1 1, 1 0 1, 1 0 0)),
        ((1 1 0, 0 1 0, 0 1 1, 1 1 1, 1 1 0)))'))


Click the 'Execute query' icon.

You get this result.

Mouse interactions with the render window:
- Left button: rotation
- Wheel: zoom
- Right button: dynamic zoom
- Left button + Shift key: pan
Try the various combinations of the Shift, Ctrl and Alt keys with the mouse buttons.

Keyboard interactions with the render window:
- w key: switches to wireframe display mode.
- s key: switches to surface display mode.
The render window after rotation. image

Before continuing, a brief explanation of the query used above:

SELECT st_force_collection(ST_GeomFromText(
    'POLYHEDRALSURFACE(
        ((0 0 0, 0 1 0, 1 1 0, 1 0 0, 0 0 0)),
        ((0 0 0, 0 0 1, 0 1 1, 0 1 0, 0 0 0)),
        ((0 0 0, 1 0 0, 1 0 1, 0 0 1, 0 0 0)),
        ((0 0 1, 1 0 1, 1 1 1, 0 1 1, 0 0 1)),
        ((1 0 0, 1 1 0, 1 1 1, 1 0 1, 1 0 0)),
        ((1 1 0, 0 1 0, 0 1 1, 1 1 1, 1 1 0)))'))


As you may have noticed, the ST_GeomFromText function is wrapped in the ST_Force_Collection function. Why add this function? If you run this query under pgAdmin III, wrapping the whole thing in ST_AsText, you get this result:
GEOMETRYCOLLECTION Z (POLYGON Z ((0 0 0,0 1 0,1 1 0,1 0 0,0 0 0)),
POLYGON Z ((0 0 0,0 0 1,0 1 1,0 1 0,0 0 0)),
POLYGON Z ((0 0 0,1 0 0,1 0 1,0 0 1,0 0 0)),
POLYGON Z ((0 0 1,1 0 1,1 1 1,0 1 1,0 0 1)),
POLYGON Z ((1 0 0,1 1 0,1 1 1,1 0 1,1 0 0)),
POLYGON Z ((1 1 0,0 1 0,0 1 1,1 1 1,1 1 0)))
We have gone from a POLYHEDRALSURFACE geometry to a GEOMETRYCOLLECTION Z containing POLYGON Z geometries; but remember, a POLYHEDRALSURFACE is also a collection of POLYGONs, so far so good.
Back to the initial question: why use the ST_Force_Collection function? Simply because the GDAL library I use in my development does not yet recognize POLYHEDRALSURFACE geometries. This trick works around that limitation.

Visualize a TIN geometry

In the Query panel, copy the following query:

WITH res AS (
  SELECT st_polygon(st_boundary((st_dump(ST_GeomFromText(
            'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
                 ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
                 ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
                 ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))'))).geom), 0
    ) as geom
)
SELECT st_forceCollection(st_collect(geom))
FROM res


Click the 'Execute query' icon.

You get this result after rotation.


The query used calls for an explanation:

WITH res AS (
  SELECT st_polygon(st_boundary((st_dump(ST_GeomFromText(
            'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
                 ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
                 ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
                 ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))'))).geom), 0
    ) as geom
)
SELECT st_forceCollection(st_collect(geom))
FROM res

As with the query for a POLYHEDRALSURFACE geometry, TIN geometries are not yet recognized by the GDAL library. As a reminder, a TIN geometry is also a collection of TRIANGLEs, which the GDAL library does not recognize either, hence the slightly more complex query.
If you run the following query under PgAdmin III:

SELECT St_AsText(st_force_collection(ST_GeomFromText(
            'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
                 ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
                 ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
                 ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))')))

You get this:

GEOMETRYCOLLECTION Z (TRIANGLE Z ((0 0 0,1 0 0,0 1 0,0 0 0)),
TRIANGLE Z ((0 0 0,1 0 0,0 0 1,0 0 0)),
TRIANGLE Z ((0 0 0,0 0 1,0 1 0,0 0 0)),
TRIANGLE Z ((0 0 1,1 0 0,0 1 0,0 0 1)))

which the GDAL library does not recognize. This explains the first part of the query, which recreates POLYGON geometries from the TRIANGLE geometries; the second part (where we find the ST_Force_Collection function again) recreates a GEOMETRYCOLLECTION Z containing POLYGON Z geometries, which the GDAL library can interpret.

To simplify queries on TIN geometries, creating a function performing the transformation seems essential to me.


Function converting a TIN geometry into a GEOMETRYCOLLECTION Z geometry

-- Function: ST_TinToGeomCollection(geometry, integer)

-- DROP FUNCTION ST_TinToGeomCollection(geometry, integer);

CREATE OR REPLACE FUNCTION ST_TinToGeomCollection(geometry, integer)
  RETURNS geometry AS
$BODY$
DECLARE
    sql text;
    rec record;
BEGIN
    sql := 'WITH res AS (
          SELECT st_polygon(st_boundary((st_dump(geometry('||quote_literal($1::text)||'))).geom),'||$2||') as geom
        )
        SELECT st_forceCollection(st_collect(geom)) as geom FROM res';
    FOR rec IN EXECUTE sql LOOP
       RETURN rec.geom;
    END LOOP;
END
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION ST_TinToGeomCollection(geometry,integer)
  OWNER TO postgres;
GRANT EXECUTE ON FUNCTION ST_TinToGeomCollection(geometry, integer) TO public;
GRANT EXECUTE ON FUNCTION ST_TinToGeomCollection(geometry, integer) TO postgres;

Create the function in your 3D database:

  • Copy the contents of the frame above and paste it into the PgAdmin III query window
  • run it; the function is created.

The query for displaying a TIN geometry then becomes:

select st_astext(ST_TinToGeomCollection(ST_GeomFromText(
            'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
                 ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
                 ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
                 ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))'), 0))

The second parameter of the ST_TinToGeomCollection function is the SRID of the geometry, for example 4326 if you are in a WGS84 coordinate system. If you do not know it, enter the value 0.
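If the data already live in a table, the SRID can be looked up rather than guessed. A quick sketch (the table name `buildings` and column `geom` are placeholders):

```sql
-- Look up the SRID registered for a geometry column in geometry_columns:
SELECT Find_SRID('public', 'buildings', 'geom');

-- Or read it directly from a stored geometry:
SELECT ST_SRID(geom) FROM buildings LIMIT 1;
```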

Displaying data from a 3DCityDB database containing CityGML data

In the Query panel, copy the following query:

WITH wallsurface AS (
SELECT a.id,c.root_id,st_collect(c.geometry) AS g,'255:128:0'::text AS rgb
FROM building a,thematic_surface b, surface_geometry c
WHERE b.building_id=a.id
AND b.type='WallSurface'
AND c.root_id=b.lod2_multi_surface_id
GROUP BY a.id,c.root_id
), roofsurface AS (
SELECT a.id,c.root_id,st_collect(c.geometry) AS g,'255:0:0'::text AS rgb
FROM building a,thematic_surface b, surface_geometry c
WHERE b.building_id=a.id
AND b.type='RoofSurface'
AND c.root_id=b.lod2_multi_surface_id
GROUP BY a.id,c.root_id
)
SELECT id,root_id,g AS geom,rgb
FROM wallsurface
UNION ALL
SELECT id,root_id,g AS geom,rgb
FROM roofsurface


Click the 'Execute query' icon.

In the query it is possible to assign a color to a geometry type; the syntax is the R:G:B code between quotes, followed by AS rgb:
'255:128:0' AS rgb  (wallsurface)
'255:0:0'   AS rgb  (roofsurface)



image Exits the application.
image Connects to a database.
image Query panel
image Clears the query window.
image Opens a query file, which must contain only one query.
image Executes the query.
image Collapses the panel to the left.
image 3D display panel
image Changes the background color, white or black.
image Zoom to full extent.
image Zoom in.
image Zoom out.
image Clears the render window; all previously loaded geometries are removed.
image Attribute data panel
image Collapses the panel to the bottom.

Conclusion.

This post provides you with a tool for visualizing 3D geometries stored in a PostGIS 2.0 or higher database. I still have to implement the display of points and lines.

Any suggestions for improvement are welcome; I look forward to your feedback.

by Jérôme ROLLAND (noreply@blogger.com) at July 20, 2014 04:16 PM

A GeoSpatial World

PostGIS 3D Viewer introduction


A 3D viewer for PostGIS



Displaying CityGML data in the 3D viewer



Continue reading “PostGIS 3D Viewer”

by Jérôme ROLLAND (noreply@blogger.com) at July 20, 2014 04:12 PM

July 15, 2014

Stephen Mather

LiDAR and pointcloud extension pt 6

There are all sorts of reasons why what I’ve been doing so far is wrong (see the previous 5 posts).  More on that later. But in case you needed 3D visualization of the chipping I’ve been working through, look no further:

Isometric 3D visualization of 3D chipping

by smathermather at July 15, 2014 12:32 AM

July 13, 2014

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Minimalist X3D Viewer for PHP and ASP.NET

I've been thinking for a long time about creating an HTML based X3D viewer to visualize PostGIS 3D geometries. Now that I've started to document the new SFCGAL features, the need became more pressing. So I've created a very rudimentary one. You can check out the code on my github page https://github.com/robe2/postgis_x3d_viewer. I'm hoping to expand it in the future to allow people to pick materials for their geometries and also allow people to save the generated scene as a full X3D document.

x3d viewer for postgis

It utilizes the X3DOM Javascript library and JQuery for injecting X3D objects into the scene. I'm really excited about X3DOM JS. It was fairly trivial to work with and integrate the ST_AsX3D function. Utilizing it will help me stress test the ST_AsX3D function and patch up the holes in it. When I developed the function I was only interested in 3D geometries and didn't think too much about collections either. So as a result the function is kinda broken when working with 2D or Geometry Collections.

Features and Limitations

  • It has one scene and every query you do writes into the scene. This allows you to dump a bunch of objects on the same white board using different queries.
  • Similar to Raster Viewer I created a while ago, it's got a query log and a color picker. The color picker allows you to color each query output differently. I hope to augment this to allow picking different materials too.
  • Requires either .NET or PHP and of course a PostGIS 2+ enabled database
  • Similar to the Web raster viewer I created, it can only handle a query that outputs a single geometry (geometry mode) or an X3D scene snippet (raw mode)
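As a sketch of what the viewer's geometry mode consumes, a query returning a single 3D geometry serialized via ST_AsX3D might look like this (the extruded square is an arbitrary example of mine; ST_Extrude requires a SFCGAL-enabled PostGIS build):

```sql
-- Arbitrary example: a unit square extruded 1 unit upward, output as an X3D snippet
SELECT ST_AsX3D(
  ST_Extrude(
    ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'),
    0, 0, 1  -- extrusion vector (dx, dy, dz)
  )
);
```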

Continue reading "PostGIS Minimalist X3D Viewer for PHP and ASP.NET"

by Regina Obe (nospam@example.com) at July 13, 2014 02:15 AM

July 12, 2014

Stephen Mather

LiDAR and pointcloud extension pt 5

Now for the crazy stuff:

Plaid-like overlay of vertical patches

The objective is to allow us to do vertical and horizontal summaries of our data. To do this, we’ll take chipped LiDAR input and further chip it vertically by classifying it.

First a classifier for height that we’ll use to do vertical splits on our point cloud chips:

CREATE OR REPLACE FUNCTION height_classify(numeric)
  RETURNS integer AS
$$
	SELECT CASE WHEN $1 < 10 THEN 1
	     WHEN $1 >= 10 AND $1 < 20 THEN 2
	     WHEN $1 >= 20 AND $1 < 30 THEN 3
	     WHEN $1 >= 30 AND $1 < 40 THEN 4
	     WHEN $1 >= 40 AND $1 < 50 THEN 5
	     WHEN $1 >= 50 AND $1 < 60 THEN 6
	     WHEN $1 >= 60 AND $1 < 70 THEN 7
	     WHEN $1 >= 70 AND $1 < 80 THEN 8
	     WHEN $1 >= 80 AND $1 < 90 THEN 9
	     WHEN $1 >= 90 AND $1 < 100 THEN 10
	     WHEN $1 >= 100 AND $1 < 110 THEN 11
	     WHEN $1 >= 110 AND $1 < 120 THEN 12
	     WHEN $1 >= 120 AND $1 < 130 THEN 13
	     WHEN $1 >= 130 AND $1 < 140 THEN 14
	     WHEN $1 >= 140 AND $1 < 150 THEN 15
	     WHEN $1 >= 150 AND $1 < 160 THEN 16
	     WHEN $1 >= 160 AND $1 < 170 THEN 17
	     WHEN $1 >= 170 AND $1 < 180 THEN 18
	     ELSE 0
	END;
$$
  LANGUAGE sql IMMUTABLE
  COST 100;

And now, let’s pull apart our point cloud, calculate heights from approximate ground, then put it all back together in chips grouped by height classes and aggregates of our original chips. Yes, I will explain more later. And don’t do this at home– I’m not even sure if it makes sense yet… :

-- First we explode the patches into points,
-- and along the way grab the minimum value
-- for the patch, and the id
WITH lraw AS (
    SELECT PC_Explode(pa) AS ex,  PC_PatchMin(pa, 'Z') AS min, id
    FROM test1
),
-- Calculate our height value ( Z - min )
heights AS (
    SELECT PC_Get(ex, 'X') || ',' || PC_Get(ex, 'Y') || ',' -- grab X and Y
    || PC_Get(ex, 'Z') - min                -- calculate height
    || ',' ||
-- And return our remaining dimensions
    PC_Get(ex, 'Intensity') || ',' || PC_Get(ex, 'ReturnNumber') || ',' || PC_Get(ex, 'NumberOfReturns') || ',' ||
    PC_Get(ex, 'ScanDirectionFlag') || ',' || PC_Get(ex, 'EdgeOfFlightLine') || ',' ||
    PC_Get(ex, 'Classification') || ',' || PC_Get(ex, 'ScanAngleRank') || ',' || PC_Get(ex, 'UserData') || ',' ||
    PC_Get(ex, 'PointSourceId') || ',' || PC_Get(ex, 'Time') || ',' || PC_Get(ex, 'PointID') || ',' || PC_Get(ex, 'BlockID')
    AS xyheight, PC_Get(ex, 'Z') - min AS height, id FROM lraw
),
-- Now we can turn this aggregated text into an array and then a set of points
heightcloud AS (
    SELECT PC_MakePoint(1, string_to_array(xyheight, ',')::float8[]) pt, height_classify(height) heightclass, id FROM heights
)
-- Finally, we bin the points back into patches grouped by our height class and original ids.
SELECT PC_Patch(pt), id / 20 AS id, heightclass FROM heightcloud GROUP BY id / 20, heightclass;

The resultant output should allow us to query our database by height above ground and patch. Now we can generate vertical summaries of our point cloud. Clever or dumb? It’s too early to tell.
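Assuming the output above were stored as a table, a vertical summary then falls out of a plain GROUP BY. A sketch (the table name `patches_by_height` and its column names are my placeholders, not from the post):

```sql
-- Hypothetical follow-up: how many points fall in each 10 m height band?
SELECT heightclass, sum(PC_NumPoints(pa)) AS npoints
FROM patches_by_height
GROUP BY heightclass
ORDER BY heightclass;
```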

by smathermather at July 12, 2014 01:56 AM

July 11, 2014

Stephen Mather

LiDAR and pointcloud extension pt 4

Height cloud overlaid with patches

Now, we navigate into unknown (and perhaps silly) territory: point-level calculations of heights, dropping the newly calculated points back into the point cloud as though they were Z values. I don’t recommend this code as a practice; this is as much me thinking aloud as anything. Caveat emptor:

-- First we explode the patches into points,
-- and along the way grab the minimum value
-- for the patch, and the id
WITH lraw AS (
	SELECT PC_Explode(pa) AS ex,  PC_PatchMin(pa, 'Z') AS min, id
	FROM test1
),
-- Calculate our height value ( Z - min )
heights AS (
	SELECT PC_Get(ex, 'X') || ',' || PC_Get(ex, 'Y') || ',' -- grab X and Y
	|| PC_Get(ex, 'Z') - min				-- calculate height
	|| ',' ||
-- And return our remaining dimensions
	PC_Get(ex, 'Intensity') || ',' || PC_Get(ex, 'ReturnNumber') || ',' || PC_Get(ex, 'NumberOfReturns') || ',' ||
	PC_Get(ex, 'ScanDirectionFlag') || ',' || PC_Get(ex, 'EdgeOfFlightLine') || ',' ||
	PC_Get(ex, 'Classification') || ',' || PC_Get(ex, 'ScanAngleRank') || ',' || PC_Get(ex, 'UserData') || ',' ||
	PC_Get(ex, 'PointSourceId') || ',' || PC_Get(ex, 'Time') || ',' || PC_Get(ex, 'PointID') || ',' || PC_Get(ex, 'BlockID')
	AS xyheight, id FROM lraw
),
-- Now we can turn this aggregated text into an array and then a set of points
heightcloud AS (
	SELECT PC_MakePoint(1, string_to_array(xyheight, ',')::float8[]) pt, id FROM heights
)
-- Finally, we bin the points back into patches
SELECT PC_Patch(pt) FROM heightcloud GROUP BY id / 20;

by smathermather at July 11, 2014 03:11 AM

July 09, 2014

Stephen Mather

LiDAR and pointcloud extension pt 3

Digging a little deeper. Ran the chipper on a smaller number of points and then am doing a little hacking to get height per chip (if you start to get lost, go to Paul Ramsey’s tutorial).

Here’s my pipeline file. Note the small chipper size– 20 points per chip.

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="drivers.pgpointcloud.writer">
    <Option name="connection">dbname='pggis' user='pggis' host='localhost'</Option>
    <Option name="table">test1</Option>
    <Option name="srid">3362</Option>
    <Filter type="filters.chipper">
      <Option name="capacity">20</Option>
      <Filter type="filters.cache">
        <Reader type="drivers.las.reader">
          <Option name="filename">60002250PAN.las</Option>
          <Option name="spatialreference">EPSG:3362</Option>
        </Reader>
      </Filter>
    </Filter>
  </Writer>
</Pipeline>

Easy enough to load (though slow for the sake of the chip size):

pdal pipeline las2pg.xml

Now we can do some really quick and dirty height calculations per chip — like what’s the difference between the minimum and maximum value in the chip:

DROP TABLE IF EXISTS patch_heights;

CREATE TABLE patch_heights AS
SELECT
  pa::geometry(Polygon, 3362) AS geom,
  PC_PatchAvg(pa, 'Z') AS elevation,
  PC_PatchMax(pa, 'Z') - PC_PatchMin(pa, 'Z') AS height
FROM test1;
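A quick way to poke at the result is to filter on the new height column (the 5 m threshold is an arbitrary choice of mine, not from the post):

```sql
-- Chips whose min-to-max spread suggests vegetation or structures taller than 5 m
SELECT geom, elevation, height
FROM patch_heights
WHERE height > 5
ORDER BY height DESC;
```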

Patch heights showing vegetation and open areas

by smathermather at July 09, 2014 02:55 AM

Stephen Mather

LiDAR and pointcloud extension pt 2

As much for my edification as anyone's:
Want to get at just a few points within the pointcloud extension? A WITH query will get you there:


-- get me some patches (our lidar points are stored in patches)
WITH patches AS (
  SELECT pa FROM test
),
-- Now, I want all the points from each patch
pa_pts AS (
  SELECT PC_Explode(pa) AS pts FROM patches
)
-- OK. Can I have that as standard old points, PostGIS style?
SELECT pts::geometry FROM pa_pts;
--k thanx

LiDAR points shown within their patches

by smathermather at July 09, 2014 02:35 AM

Stephen Mather

LiDAR and pointcloud extension

Paul Ramsey has a great tutorial on using the pointcloud extension with PostgreSQL / PostGIS: http://workshops.boundlessgeo.com/tutorial-lidar/

Image of lidar chipped into 400 point envelopes

You can get point cloud and all that goodness running a variety of ways. Probably the easiest is to download OpenGeo Suite from Boundless: http://boundlessgeo.com/solutions/opengeo-suite/download/

If you are an Ubuntu user, try a docker instance to run PostGIS with PDAL, pointcloud, etc in a container:


Also, I’m working toward a simple installer script for Ubuntu 14.04 and for Vagrant, which is a fork of Vincent’s docker install:


One tip for those using, say QGIS instead of GeoServer for connecting and visualizing your data. Change any views created on the fly to tables, e.g.:

CREATE VIEW medford_patches AS
SELECT
  pa::geometry(Polygon, 4326) AS geom,
  PC_PatchAvg(pa, 'Z') AS elevation
FROM medford;

change to:

CREATE TABLE medford_patches AS
SELECT
  pa::geometry(Polygon, 4326) AS geom,
  PC_PatchAvg(pa, 'Z') AS elevation
FROM medford;

QGIS doesn’t seem to like those views with casts, for some reason, without an id.

If the code I’m working on now works, it will do some height calculations using pointcloud, will show up on Github, and may include a little bonus pointcloud tutorial here. Stay tuned.

by smathermather at July 09, 2014 02:09 AM

July 07, 2014

Paul Ramsey

FOSS4G 2014 in Portland, Oregon, September 2014

Just a quick public service announcement for blog followers in the Pacific Northwest and environs: you've got a once in a not-quite-lifetime opportunity to attend the "Free and Open Source Software for Geospatial" (aka FOSS4G) conference this year in nearby Portland, Oregon, a city so hip they have trouble seeing over their pelvis.

Anyone in the GIS / mapping world should take the opportunity to go, to learn about what technology the open source world has available for you, to meet the folks writing the software, and the learn from other folks like you who are building cool things.

September 8th-13th, be there and be square.

by Paul Ramsey (noreply@blogger.com) at July 07, 2014 09:07 PM

July 03, 2014


QGIS versioning plugin

We developed a tool to manage data history and branches, and to work offline with your PostGIS-stored data and QGIS. Read on for insight into the QGIS Versioning plugin.

The QGIS plugin is available in the QGIS plugin repository, and you can fork it on GitHub too!

Capturing data offline


Even though the need for data versioning often arises, no standard solution exists for databases.

The GeoGit project proposes a solution to store versioned geospatial data. There is also an existing plugin for QGIS, pgversion, which uses views and triggers to version a PostGIS database.

Unfortunately those solutions were not adapted to the specific constraints of this project, namely: using a PostGIS database as the main repository (which excludes GeoGit) and the ability to work off-line (which excludes pgversion).

The QGIS/PostGIS versioning project we developed works as follows.


The database is stored in a PostGIS schema, the complete schema is versioned (i.e. not individual tables). Revisions are identified by a revision number. A revision table in the versioned schema, called 'revisions', keeps track of the date, author, commit message and branch of all revisions.

Once a table structure is defined, three operations can be performed on rows: INSERT, DELETE and UPDATE. To be able to track history, every row is kept in the tables. Deleted rows are marked as such and updated rows are a combined insertion-deletion where the deleted and added rows are linked to one another as parent and child.

A total of five columns are needed for versioning the first branch:

a unique identifier across the table
the revision at which this record was inserted
the last revision for which this record exists (i.e. the revision at which it was deleted, minus one)
in case the row was inserted as the result of an update, this field stores the hid of the row that was updated
in case the row was marked as deleted as the result of an update, this field stores the hid of the row that was inserted in its place

For each additional branch, four additional columns are needed (the ones with the prefix branch_).

If branch_rev_begin is NULL, the row belongs to another branch.
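Putting these columns together, a versioned table could be sketched as follows (the hid and branch_* names follow the ones used in this post; the table itself, its payload columns and its SRID are made up for the example):

```sql
-- Illustrative sketch of a table versioned on one branch.
-- 'hid' and the branch_* columns follow the naming used in this post;
-- 'mytable', 'name', 'geom' and the SRID are hypothetical.
CREATE TABLE my_schema.mytable (
    hid              serial PRIMARY KEY, -- unique identifier across the table
    branch_rev_begin integer,            -- revision at which the row was inserted
    branch_rev_end   integer,            -- last revision for which the row exists
    branch_parent    integer,            -- hid of the row this one updated
    branch_child     integer,            -- hid of the row that replaced this one
    name             varchar,
    geom             geometry(Polygon, 4326)
);
```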

SQL views are used to see the database at a given revision number. If we write 'rev' for the revision we want to see, then for each table the condition for a row to be present in the view is:

(branch_rev_end IS NULL OR branch_rev_end >= rev) AND branch_rev_begin <= rev

In the special case of the current revision, or head revision, the condition reads:

branch_rev_end IS NULL AND branch_rev_begin IS NOT NULL

Since elements are not deleted from a historized table (they are merely marked as such), care must be taken with the definition of constraints, in particular the conceptual uniqueness of field values.

Within the PostGIS database, the views on revisions must be read-only, and historized tables should not be edited directly. This is a basic principle of version control: edits must be made to working copies and then committed to the database. Please note that by default PostgreSQL 9.3 creates updatable views.
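As an illustration, a read-only view on a given revision (say revision 42 of a hypothetical table mytable) could be sketched like this, using the row-visibility condition above:

```sql
-- Sketch of a view showing 'mytable' as of revision 42.
-- Schema and table names are hypothetical; the WHERE clause is the
-- row-visibility condition for a given revision described above.
CREATE VIEW my_schema.mytable_rev_42 AS
    SELECT *
      FROM my_schema.mytable
     WHERE (branch_rev_end IS NULL OR branch_rev_end >= 42)
       AND branch_rev_begin <= 42;
```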

Workflow schema

This setup allows for multiple users to use and edit data offline from a central repository, and commit their modifications concurrently.

Working copies

Two kinds of working copies are available:

SpatiaLite working copies
They are meant to be used offline. They consist of the versioned tables of a given versioned database (i.e. a PostGIS schema), or any subset thereof. For each table, only the elements that have not been marked as deleted in the head revision need to be present. Furthermore, the selection can be restricted to the subset of elements the user needs to edit (e.g. a spatial extent). To create a working copy (i.e. to check out), tables from the versioned schema (or the aforementioned subsets) are converted to a SpatiaLite database using ogr2ogr.

PostGIS working copies
They are meant to be used when the connection to the original database will remain available. They are quite similar to pgversion working copies since they only store differences from a given revision (the one checked out).

The following description is aimed at understanding the inner workings of the QGIS versioning plugin. The user does not need to perform the described operations manually.

For each versioned table in the working copy, a view is created with the suffix _view (e.g. mytable_view). Those views typically filter out the historization columns and show the head revision. A set of triggers (DELETE, UPDATE and INSERT) is defined to make those views updatable.

The DELETE trigger simply marks the end revision of a given record.

The INSERT trigger creates a new record and fills the branch_rev_begin field.

The UPDATE trigger creates a new record and fills the branch_rev_begin and branch_parent fields. It then marks the parent record as deleted, filling its branch_rev_end and branch_child fields.
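The DELETE behaviour, for instance, could be sketched with an INSTEAD OF trigger along these lines (a simplification with hypothetical names, including a current_rev() helper; the plugin's actual triggers are more involved):

```sql
-- Sketch: a DELETE on the view marks the row as deleted instead of
-- removing it. my_schema.current_rev() is a hypothetical helper that
-- returns the revision number being worked on.
CREATE FUNCTION my_schema.mytable_delete() RETURNS trigger AS $$
BEGIN
    UPDATE my_schema.mytable
       SET branch_rev_end = my_schema.current_rev()
     WHERE hid = OLD.hid;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_view_delete
    INSTEAD OF DELETE ON my_schema.mytable_view
    FOR EACH ROW EXECUTE PROCEDURE my_schema.mytable_delete();
```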

Updating the working copy

Changes can be made to the database while the working copy is being edited. In order to reconcile those edits, the user needs to update the working copy.

When updating, some records may be in conflict: those for which the end revision has been set since the initial checkout, or since the last update if any.

Multiple edits can be made to the same record. Therefore the child relation must be followed to the last child in order to present the user with the latest state of a given conflicting feature.

Conflicts are stored in a table and identified with a conflict id and the tag 'theirs' or 'mine'. A DELETE trigger on this table is used for conflict resolution. On deletion of 'mine', the working copy edit is discarded; on deletion of 'theirs', the working copy edit is appended to the feature history (i.e. the working copy feature becomes a child of the last state of the feature in the historized database).
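In SQL terms, resolving a conflict in favour of the database version could then look like this sketch (the conflict table and column names here are hypothetical):

```sql
-- Keep 'theirs' (the database version): deleting the 'mine' entry lets
-- the DELETE trigger discard the working copy edit for this conflict.
DELETE FROM my_working_copy.conflicts
 WHERE conflict_id = 1
   AND origin = 'mine';
```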

Committing the edits to the versioned database

If a working copy is up to date, the edits can be integrated into the versioned database. This operation simply consists of inserting a record into the revisions table and, for each versioned table, updating the rows that differ and inserting the rows that are not yet present.


A branch can be created from any revision by adding the four history columns and setting the branch_rev_begin field of the features that are present in that revision.

Plugin interface tutorial

Groups are used for all versioning operations in QGIS, since versioning applies to a complete PostGIS schema or SpatiaLite database. The versioning toolbar changes depending on the group selected in the QGIS legend.

The group elements must share the same connection information (i.e. share the same database and schema for PostGIS working copies and revision views or share same SpatiaLite database for SpatiaLite working copies).

Versioning a PostGIS schema

Starting with an unversioned database, import into a QGIS project a number of layers from the schema that needs to be versioned. Once the layers are imported, they must be grouped together.

Selecting the newly created group will cause the versioning toolbar to display the historize button (green V). On click, a confirmation is requested before versioning the database schema.

The versioned layers are imported in a new group and the original layers are removed from the project.

The symbology is not kept in the process.

Working with a versioned PostGIS schema

Versioned layers can be imported in QGIS. The layers must be from a head revision or a view on any revision.

Once the layers are in QGIS, they must be grouped.

For PostGIS groups at head revision, the versioning plugin allows the user to create a SpatiaLite or a PostGIS working copy, create a view on a given revision or create a branch. A corresponding group will be imported in QGIS.

If the user chooses to create a SpatiaLite working copy, he will be asked to select a file to store the working copy.

Committing changes

Once the working copy is imported in QGIS, the user can start editing its layers. With SpatiaLite working copies, this editing can be done offline.

When the user is done editing, he can commit the changes to the database. If the commit is feasible (i.e. the working copy is up to date with the versioned database), he will be prompted for a commit message and subsequently informed of the revision number he committed.

If the commit is not feasible, the user will be informed that he must update his working copy prior to commit.

Resolving conflicts

Conflicts are detected during update; the user is informed, and conflict layers are imported into QGIS.

To resolve conflicts, the user can open the conflict layer's attribute table. Selected entries are also selected on the map canvas, and the user can decide which version, either his or the database's, he wants to keep. The user's version is tagged 'mine' and the database's version 'theirs'. The conflict is resolved by deleting the unwanted entry in the conflict layer.

On deletion of one conflict entry, both entries are removed (by a trigger), but the attribute table (and canvas) are not refreshed. As a workaround, the user can close and re-open the attribute table to see the actual state of the conflict table.

Once the conflict table is empty, the commit can be done.


Due to design choices and the tools used for conversion to SpatiaLite, a number of restrictions apply to the versioned database:

  • schema, table and branch names should not contain spaces, capital letters or quotes
  • tables must have primary keys
  • column names are converted to lowercase (because of the conversion to SpatiaLite) but can contain spaces (though this is not recommended)
  • the geometry column is named geom in PostGIS and GEOMETRY in SpatiaLite

Do not edit the OGC_FID or ROWID columns.
The constraints on the tables are lost in the PostGIS to SpatiaLite conversion.

Known bug

The conflict layer won't be loaded automatically if it has no geometry. The user will have to load it manually.

by Vincent Mora at July 03, 2014 10:00 AM

June 30, 2014


QGIS 2.4 release out, Oslandia inside

There is a new QGIS release out: version 2.4, codename Chugiak, is now available. Binary packages for your platform have been generated, and you can directly download and try out this new release of the famous desktop GIS software.

QGIS 2.4 has a lot of new features in all of its components. There is a visual changelog available where you can discover QGIS improvements.

Oslandia inside

Oslandia is a QGIS core contributor, and we have been busy improving QGIS 2.4. We contributed to several of these new features. Here are a few enhancements we developed.

Predefined scales mode for atlas maps

When working with atlas map items, you can now specify a predefined scale mode for the map. It will use the best fitting option from the list of predefined scales in your project properties (see Project -> Project Properties -> General -> Project Scales to configure these predefined scales).

This feature has been funded by the city of Uster.

New Inverted Polygon renderer

The biggest feature Oslandia developed is the inverted polygon renderer. This feature has been funded by Agence de l'Eau Adour-Garonne and was mainly developed by Hugo Mercier.

A new renderer has been added for polygon features, allowing you to style everything outside your polygons. This can be useful for highlighting areas or for creating cartographic masks. When used with the new shapeburst style, you can produce output as shown in the image for this entry.

New Mask Plugin

Alongside the inverted polygon renderer, Oslandia developed a new Mask plugin. It enables you to make atlases that focus on the specific feature you are interested in, hiding the rest with a really nice effect. Furthermore, it helps mask labels when you mask geometry objects.

You can read more details on these features on this blog post.

This plugin has also been funded by Agence de l'Eau Adour-Garonne.

Layered SVG export

Another feature implemented in this version is the ability to export layered SVG files. Previously, all features were exported as a single layer, whatever QGIS layer they came from. Now you can use Inkscape or Illustrator and their layering capabilities to finish the design of your map with greater ease. There is also an option to vectorize labels.

This feature has been funded by Agence de Développement du Grand Amiénois (ADUGA).

WAsP format support

The WAsP format is the standard format for roughness and elevation in the wind science field. This format was not supported by QGIS until recently, when Vincent Mora added WAsP to the GIS file formats QGIS supports. In fact, we did better: we implemented WAsP support in GDAL/OGR, so that any software using this library is now able to read and write WAsP files. WAsP support is available starting from GDAL/OGR 1.11.0.

This was an opportunity to add Vincent Mora as an official GDAL/OGR committer, in charge of maintaining this driver. This feature enables wind management operations to be completed in QGIS with a better user experience: no more file conversion before working on the GIS side. We also developed a companion plugin to manage data simplification when needed. It is available in the QGIS plugin repository.

With this work, QGIS becomes a great complement to open source computational wind engineering software like ZephyTools.

This work has been funded by La Compagnie du Vent.

Epanet Plugin

Oslandia has integrated the EPANET water distribution model inside QGIS Processing, as a plugin. You can read more on this plugin on this blog .

Epanet integration has been funded by European funds and the GIS office of Apavil, Romania.

Vizitown Plugin

Vizitown is part of our efforts on 3D GIS development, alongside PostGIS 3D and more. It is a QGIS plugin that allows users to display QGIS layers in 3D in a Three.js / WebGL environment, in a browser. It can leverage PostGIS 3D and display live data from the database, as well as other data sources. It can display a DEM, a raster background, 2D vector data draped on the DEM, 2.5D data (e.g. buildings), or real 3D meshes. The user can set a symbology in QGIS and see the modifications live in the browser in 3D.

You can see Vizitown in action on YouTube. Vizitown has been developed with IG3 students from ESIPE.

Multiple bugfixes

Oslandia also works continuously on improving QGIS quality, and we try to fix as many bugs as we can.

These bugfixes are funded by our QGIS support clients, and also by the French Ministère de l'Environnement and Agence de l'Eau Adour-Garonne.

What next?

We continue to work on new QGIS features, corrections, refactoring and integration with other tools. In particular, we studied a possible unification of all database-like features in QGIS using SQLite/SpatiaLite. We also intend to work on native read and write support for MapInfo TAB files.

We offer a wide range of services around QGIS, be it for training, assistance, development, or consulting in general.

We also propose various support opportunities for QGIS. This is the best way for you to improve the quality of this software, contribute to its development, and ensure that you can work in good conditions without having to worry about potential bugs. Our team of experienced engineers, who contribute to QGIS core, will be available in case of any critical bug.

We can offer you personalized support, with specific conditions and fares. Do not hesitate to contact us at infos@oslandia.com.

by Vincent Picavet at June 30, 2014 12:00 PM

June 25, 2014


PostGIS, PostgreSQL's spatial partner - Part 3

You have arrived at the final chapter of this PostGIS introduction series. Before continuing, I recommend you read chapter one and chapter two first.

In the last chapter we finished by doing some real world distance measuring and we saw how different projections pushed forward different results.

Today I would like to take this practical approach a bit further and continue our work with real world data by showing you around the town of Kin in Okinawa. The town where I live.

A word before we start

In this chapter I want to do a few experiments together with you on real world data. To gather this data, I would like to use OpenStreetMap because it is not only open but also gives us handy tools to export map information.

We will use a tool called osm2pgsql to load our OSM data into PostGIS-enabled tables.

However, it is more common to import and export real world GIS data by using the semi-closed ESRI standard shapefile format. OpenStreetMap does not support exporting to this shapefile format directly, but exports to a more open XML file (.osm) instead.

Therefore, near the end of this post, we will briefly cover these shapefiles as well and see how we could import them into our PostgreSQL database. But for the majority of our work today, I will focus on the OpenStreetMap approach.

The preparation

Let us commence this adventure by first getting all the GIS data related to the whole of Okinawa. We will only be interested in the data related to Kin town, but I need you to pull in a data set that is large enough (yet still tiny in PostgreSQL terms) for us to experiment with indexing.

Hop online and download the file served at the following URL: openstreetmap.org Okinawa island. It is a file of roughly 180 MB and covers most of the Okinawan main island. Save the presented "map" file.

Next we will need to install a third party tool which is specifically designed to import this OSM file into PostGIS. This tool is called osm2pgsql and is available in many Linux distributions.

On a Debian system:

apt-get install osm2pgsql

Loading foreign data

Now we are ready to load in this data. But first, let us clean our "gis" database we used before.

Since all these import tools will create their own PostGIS enabled tables, we can delete our "shapes" table. Connect to your "gis" database and drop this table:

DROP TABLE shapes;

Using this new tool, repopulate the "gis" database with the data you just downloaded:

osm2pgsql -s -U postgres -d gis map

If everything went okay, you will get a small report with information about all the tables and indexes that were created.

Let us see what we just did.

First we ran osm2pgsql with the -s flag. This flag enables slim mode, which means the tool will use a database on disk rather than processing all the GIS data in RAM. Processing in RAM not only potentially slows down your machine for larger data sets, it also makes fewer features available.

Next we tell the tool to connect as the user "postgres" and load the data into the "gis" database. The final argument is the "map" file you just downloaded.

What do we have now?

Open up a database console and let us describe our database to see what this tool just did:


As you can see, it inserted 7 new tables:

Schema |        Name        | Type  |  Owner   
public | geography_columns  | view  | postgres
public | geometry_columns   | view  | postgres
public | planet_osm_line    | table | postgres
public | planet_osm_nodes   | table | postgres
public | planet_osm_point   | table | postgres
public | planet_osm_polygon | table | postgres
public | planet_osm_rels    | table | postgres
public | planet_osm_roads   | table | postgres
public | planet_osm_ways    | table | postgres
public | raster_columns     | view  | postgres
public | raster_overviews   | view  | postgres
public | spatial_ref_sys    | table | postgres

The other 5 views and tables are the good old PostGIS bookkeeping.

It is also important, yet less relevant for our work here today, to know that these tables (or rather the way osm2pgsql imports them) are optimized to work with Mapnik. Mapnik is an open-source map rendering software package used for both web and offline maps.

The tables that are imported contain many different types of information. Let me quickly go over them to give you a basic feeling of how the import happened:

  • planet_osm_line: holds all non-closed pieces of geometry (called ways) at high resolution. They mostly represent actual roads and are used when looking at a small, zoomed-in portion of a map.
  • planet_osm_nodes: an intermediate table that holds the raw point data (points in lat/long) with a corresponding "osm_id" to map them to the other tables
  • planet_osm_point: holds all points of interest together with their OSM tags, i.e. tags that describe what they represent
  • planet_osm_polygon: holds all closed pieces of geometry (also called ways) like buildings, parks, lakes, areas, ...
  • planet_osm_rels: an intermediate table that holds extra connecting information about polygons
  • planet_osm_roads: holds lower-resolution, non-closed pieces of geometry, in contrast with "planet_osm_line". This data is used when looking from a greater distance, covering a large area and thus needing little detail about smaller, local roads.
  • planet_osm_ways: an intermediate table that holds non-closed geometry in raw format

We will now continue working with a small subset of this data.

Let us take a peek at the Polygons tables for example. First, let us see what we have available:

\d planet_osm_polygon

That is quite a big list, but most of these columns are of plain TEXT type and contain human-readable information about the stored geometry. These columns correspond to the way OpenStreetMap categorizes its data and to the way you could use the Mapnik software described above.

Let us do a targeted query:

SELECT name, building, ST_AsText(way)
    FROM planet_osm_polygon
    WHERE building = 'industrial';

Notice that I use the output function ST_AsText() to convert the geometry to a human-readable WKT string. Also, I am only interested in industrial buildings, so I set the building type to 'industrial'.

The result:

                    name                      |  building  |                                                                     st_astext                                                                   
 沖ハム (Okiham)                               | industrial | POLYGON((14221927.83 3049797.01,14222009.77 3049839.68,14222074.84 3049714.68,14222028.9 3049690.76,14221996.33 3049753.33,14221960.34 3049734.58,14221927.83 3049797.01))
Kin Thermal Power Plant Coal storage building | industrial | POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))
Kin Thermal Power Plant Exhaust tower         | industrial | POLYGON((14240167.1 3054497.14,14240172.26 3054507.93,14240176.04 3054515.82,14240195.76 3054506.39,14240186.84 3054487.7,14240167.1 3054497.14))

We get back three records containing one industrial building each, described with a closed polygon. Cool.

Now, I can assure you that Okinawa has more than three industrial buildings, but do remember that we are looking at a rather rural island. OpenStreetMap relies greatly on user-generated content, and there simply are not many users who have felt the need to index the industrial buildings in this neck of the woods.

The planet_osm_polygon table does contain a little over 6000 buildings of various types, which is still a small number, but for our purpose today I am only interested in the latter two, which both lie here in Kin town.

Also, if you take, for example, a chunk of Tokyo, where there are hundreds of active OpenStreetMap contributors, you will find that many buildings are present and are sometimes represented even more accurately than in some other online proprietary mapping solutions offered by certain famous search engines. Ahem.

Before continuing, though, I would like to delete two GiST indexes that "osm2pgsql" made for us, purely to be able to demonstrate the importance of an index.

For now, just take my word and delete the indexes on all the geometry columns of the tables we will use today:

DROP INDEX planet_osm_line_index;
DROP INDEX planet_osm_polygon_index;

Then perform a VACUUM:

VACUUM ANALYZE planet_osm_line;
VACUUM ANALYZE planet_osm_polygon;

VACUUM together with ANALYZE forces PostgreSQL to recheck the whole table for any changed conditions, as is the case now that we have removed the indexes.

The first thing I would like to find out is how large these buildings actually are. We cannot measure how tall they are, for we are working with two-dimensional data here, but we can measure their footprint on the map.

Since PostGIS makes all of our work easy, we can simply employ a function to get this information:

SELECT ST_Area(way)
    FROM planet_osm_polygon
    WHERE building = 'industrial';

And get back:


As we know from the previous chapter, to understand what these numbers mean we have to find out in which SRID this data was saved. You could either describe the table again and look at the geometry column description, or use the accessor function ST_SRID() to find it:

SELECT ST_SRID(way)
    FROM planet_osm_polygon
    WHERE building = 'industrial';

And get back:


You could also query the PostGIS bookkeeping directly and look in the geometry_columns view:

SELECT f_table_name, f_geometry_column, coord_dimension, srid, type
    FROM geometry_columns;

This view holds information about all the geometry columns in our PostGIS enabled database. Our above query will return a list containing all the GIS describing information we saw in the previous chapter.

Nice. Both our buildings are stored in a geometry column and have an SRID of 900913. We can now use our spatial_ref_sys table to look up this ID:

SELECT srid, auth_name, auth_srid, srtext, proj4text
    FROM spatial_ref_sys
    WHERE srid = 900913;

As you can see, this is basically a Mercator projection used by OpenStreetMap. In the "proj4text" column we can see that its units are meters.

This thus means that the information we get back is in square meters.

In this map (only looking at the latter two Kin buildings) we thus have a building with a total area of roughly 33,000 square meters and a more modest building of around 452 square meters. The former is a coal storage facility belonging to the Kin Thermal Power Plant and is indeed huge. The latter represents the exhaust tower of that same plant.

You have just measured the area these buildings occupy. Very neat, right?

Now, let us find out which road runs next to this power plant, in case we wish to drive there. It is important to note that OSM (and many other mapping solutions) divides roads into different types.

You have trunk roads, highways, secondary roads, tertiary roads, etc. I am now interested in finding the nearest secondary road.

To get a list of all the secondary roads in Okinawa, simply query the planet_osm_line table:

SELECT ref, ST_AsText(way) 
    FROM planet_osm_line
    WHERE highway = 'secondary';

We now get back all the linestring objects together with their reference in OSM. The reference refers to the actual route number each road has.

The total count should be around 3215 pieces of geometry, which is already a nice list to work with.

Let us now see which of these roads is closest to our coal storage building.

To find out how far something is (a nearest neighbor search) we can use the ST_Distance() function from the previous chapter and perform the following lookup:

SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
    FROM planet_osm_polygon AS building, planet_osm_line AS road
    WHERE road.highway = 'secondary' AND building.name = 'Kin Thermal Power Plant Coal storage building'
    ORDER BY distance;

This will bring us:

 highway  | ref |     distance     
secondary | 329 | 417.374986575458
secondary | 104 | 2258.90394593648
secondary | 104 | 2709.00178089638
secondary | 104 | 2745.76782385198
secondary | 234 | 5897.78205314507

Cool, secondary route 329 is the closest to our coal storage building, at a distance of 417 meters.

While this returns quite accurate results, there is one problem with this query: indexes are not being used. And every time an index is potentially left unused, you should start to worry, especially with larger data sets.

How do I know they are ignored? Simple: we did not make any indexes (and we deleted the ones made by "osm2pgsql"), which makes me pretty sure we cannot use them.

I refer you to chapter three of my PostgreSQL Full Text series where I talk a bit more about GiST and B-Tree index types. And, as I also say in that chapter, I highly recommend reading Markus Winand's Use The Index, Luke series, which explains in great detail how database indexes work.

The first thing to realize is that an index will only be used if the data set on which it is built is of sufficient size. PostgreSQL has an "AI" built in, called the query planner, which decides whether or not to use an index.

If your data set is small enough, a more traditional Sequential Scan will be just as fast or faster.

To know what is going on exactly and to know how fast our query runs, we have the EXPLAIN command at our disposal.

Speeding things up

Let us EXPLAIN the query we have just run:

EXPLAIN ANALYZE SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
    FROM planet_osm_polygon AS building, planet_osm_line AS road
    WHERE road.highway = 'secondary' AND building.name = 'Kin Thermal Power Plant Coal storage building'
    ORDER BY distance;

We simply put the keyword EXPLAIN (and ANALYZE to give us total runtime) right in front of our normal query.

The result:

Sort  (cost=5047.50..5055.32 rows=3129 width=391) (actual time=41.481..41.815 rows=3215 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 348kB
    ->  Nested Loop  (cost=0.00..4309.34 rows=3129 width=391) (actual time=1.188..38.617 rows=3215 loops=1)
        ->  Seq Scan on planet_osm_polygon building  (cost=0.00..279.01 rows=1 width=207) (actual time=0.981..1.409 rows=1 loops=1)
            Filter: (name = 'Kin Thermal Power Plant Coal storage building'::text)
            Rows Removed by Filter: 6320
        ->  Seq Scan on planet_osm_line road  (cost=0.00..3216.79 rows=3129 width=184) (actual time=0.166..26.524 rows=3215 loops=1)
            Filter: (highway = 'secondary'::text)
            Rows Removed by Filter: 73488
Total runtime: 42.153 ms

That is a lot of output, but it shows you how the internal planner executes our query and which decisions it makes along the way.

To fully interpret a query plan (and this is still a simple one), a lot more knowledge is needed, and the topic would easily deserve its own series. I am far from an expert in the query planner (though it is an interesting study topic), but I will do my best to extract the important bits we need for our immediate performance tuning.

A query plan is always made up of nested nodes, the parent node containing all the accumulated information (costs, rows, ...) of its child nodes.

Inside the Nested Loop parent node we see above, we can find that the planner decided to use two filters, which correspond to the WHERE clause conditions of our query (building.name and road.highway). You can see that both child nodes are of the Seq Scan type, which means Sequential Scan. These nodes scan the whole table, simply from top to bottom, directly from disk.

Another important thing to note is the total time this query takes: 42.153 ms. The time reported here is from my local machine; depending on how decent your computer is, it could vary.

A detail not to forget when looking at this timing is that it is slightly skewed compared to real-world application use:

  • We neglect network/client traffic. This query runs internally and does not need to communicate with a client driver (which almost always adds extra overhead).
  • The time measurement itself also introduces overhead.

The total runtime from our plan above does not sound like a big number, but we are working with a rather small data set: the area of Okinawa is large, but the geometry is rather sparse.

So our first reaction should be: this can be better.

First, let us try to get rid of these sequential scans, for they are a clear indication that the planner is not using an index.

Creating indexes

In our case we want to make two types of indexes:

  • Indexes on our "meta" data: the names and other attributes describing our geometrical data
  • Indexes that actually index our geometrical data itself

Let us start with our attributes columns.

These are all simple VARCHAR, TEXT or INT columns, so the good old Balanced Tree (B-Tree) can be used here. In our query above we use "road.highway" and "building.name" in the lookup, so let us make a couple of indexes that fit this query. Remember, an index only makes sense if it is built the same way your queries interrogate your data.

First, the "highway" column of the "planet_osm_line" table:

CREATE INDEX planet_osm_line_highway_index ON planet_osm_line(highway);

The syntax is trivial: you simply tell PostgreSQL to create an index, give it a name, and tell it on which column(s) of which table it should be built. PostgreSQL will always default to the B-Tree index type.

Next, the name column:

CREATE INDEX planet_osm_polygon_name_index ON planet_osm_polygon(name);

Now perform another VACUUM ANALYZE on both tables:

VACUUM ANALYZE planet_osm_line;
VACUUM ANALYZE planet_osm_polygon;

Let us run EXPLAIN again on the exact same query:

Sort  (cost=4058.73..4066.56 rows=3129 width=394) (actual time=20.817..21.149 rows=3215 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 348kB
    ->  Nested Loop  (cost=72.95..3310.07 rows=3129 width=394) (actual time=1.356..17.743 rows=3215 loops=1)
        ->  Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building  (cost=0.28..8.30 rows=1 width=207) (actual time=0.054..0.056 rows=1 loops=1)
            Index Cond: (name = 'Kin Thermal Power Plant Coal storage building'::text)
        ->  Bitmap Heap Scan on planet_osm_line road  (cost=72.67..2488.23 rows=3129 width=187) (actual time=1.258..4.661 rows=3215 loops=1)
            Recheck Cond: (highway = 'secondary'::text)
            ->  Bitmap Index Scan on planet_osm_line_highway_index  (cost=0.00..71.89 rows=3129 width=0) (actual time=0.864..0.864 rows=3215 loops=1)
                   Index Cond: (highway = 'secondary'::text)
Total runtime: 21.527 ms

You can see that we have now traded our Seq Scan for an Index Scan and a Bitmap Heap Scan, which indicates that our attribute indexes are being used, yay!

A Bitmap Heap Scan, instead of a Sequential Scan, is performed when the planner decides it can use the index to gather all the rows it thinks it needs, sort them in logical order, and then fetch the data from the table on disk in the most optimized way possible (trying to open each disk page only once).

The order by which the Bitmap Heap Scan arranges the data is directed by its child node, the Bitmap Index Scan. This latter type of node is the one doing the actual searching inside the index. Because our WHERE clause has a condition which tells PostgreSQL to limit the rows to the ones of "highway" type "secondary", the Bitmap Index Scan fetches the needed rows from the B-Tree index we just made and passes them to its parent, the Bitmap Heap Scan, which then goes on to order the geometry rows to be fetched.

This already helped a lot: our query runtime dropped by half. Now, let us make the indexes for our actual geometry, and see the effect:

CREATE INDEX planet_osm_line_way ON planet_osm_line USING gist(way);
CREATE INDEX planet_osm_polygon_way ON planet_osm_polygon USING gist(way);

Creating a GiST index is quite similar to creating a normal B-Tree index. The only difference is that you specify that the index is to be built with GiST.


VACUUM ANALYZE planet_osm_line;
VACUUM ANALYZE planet_osm_polygon;

Now poke it again with the same query and see our new plan:

Sort  (cost=4038.82..4046.54 rows=3089 width=395) (actual time=21.137..21.479 rows=3215 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 348kB
    ->  Nested Loop  (cost=72.64..3299.76 rows=3089 width=395) (actual time=1.382..17.858 rows=3215 loops=1)
        ->  Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building  (cost=0.28..8.30 rows=1 width=207) (actual time=0.041..0.044 rows=1 loops=1)
            Index Cond: (name = 'Kin Thermal Power Plant Coal storage building'::text)
        ->  Bitmap Heap Scan on planet_osm_line road  (cost=72.36..2488.32 rows=3089 width=188) (actual time=1.297..4.726 rows=3215 loops=1)
            Recheck Cond: (highway = 'secondary'::text)
            ->  Bitmap Index Scan on planet_osm_line_highway_index  (cost=0.00..71.59 rows=3089 width=0) (actual time=0.866..0.866 rows=3215 loops=1)
                  Index Cond: (highway = 'secondary'::text)
Total runtime: 21.873 ms

Hmm, the plan did not change at all, and our runtime is roughly identical. Why is our performance still the same?

The culprit here is ST_Distance().

As it turns out, this function is unable to use the GiST index and is therefore not a good candidate to set loose on your whole result set. The same goes for the ST_Area() function, by the way.

So we need a way to limit the amount of records we do this expensive calculation on.


We introduce a new function: ST_DWithin(). This function could be our savior in this case, for it does use the GiST index.

Whether or not a function (or operator) can use the GiST index depends on whether it uses bounding boxes when performing its calculations. The reason is that GiST indexes mainly store bounding box information, not the exact geometry itself.

ST_DWithin() checks whether a given geometry lies within a radius of another piece of geometry and simply returns TRUE or FALSE. We can thus use it in our WHERE clause to filter out geometry for which it returns FALSE (geometry that does not fall within the radius). It performs this check using bounding boxes, and is therefore able to retrieve this information from our GiST index.

Let me present you with a query that limits the result set based on what ST_DWithin() finds:

EXPLAIN ANALYZE SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
    FROM planet_osm_polygon AS building, planet_osm_line AS road
    WHERE road.highway = 'secondary'
        AND building.name = 'Kin Thermal Power Plant Coal storage building'
        AND ST_DWithin(road.way, building.way, 10000)
    ORDER BY distance;

As you can see we simply added one more WHERE clause to limit the returned geometry by radius. This will result in the following plan:

Sort  (cost=45.66..45.67 rows=1 width=395) (actual time=6.048..6.052 rows=27 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 27kB
    ->  Nested Loop  (cost=4.63..45.65 rows=1 width=395) (actual time=3.157..6.005 rows=27 loops=1)
        ->  Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building  (cost=0.28..8.30 rows=1 width=207) (actual time=0.051..0.054 rows=1 loops=1)
            Index Cond: (name = 'Kin Thermal Power Plant Coal storage building'::text)
        ->  Bitmap Heap Scan on planet_osm_line road  (cost=4.34..37.09 rows=1 width=188) (actual time=3.090..5.771 rows=27 loops=1)
            Recheck Cond: (way && st_expand(building.way, 10000::double precision))
            Filter: ((highway = 'secondary'::text) AND (building.way && st_expand(way, 10000::double precision)) AND _st_dwithin(way, building.way, 10000::double precision))
            Rows Removed by Filter: 4838
                ->  Bitmap Index Scan on planet_osm_line_way  (cost=0.00..4.34 rows=8 width=0) (actual time=1.978..1.978 rows=4865 loops=1)
                    Index Cond: (way && st_expand(building.way, 10000::double precision))
Total runtime: 6.181 ms

Good. We have just gone down to only 6.181 ms. That is much more efficient.

As you can see, our query plan got a few new rows. The main thing to notice is that our Bitmap Heap Scan got an extra Recheck Cond: our expanded ST_DWithin() condition. Further down, you can see that the condition is being pulled from the GiST index:

Index Cond: (way && st_expand(building.way, 10000::double precision))

This seems to be a much more desirable and scalable query.

But there is a drawback: though ST_DWithin() makes for speedy results, it only works with a fixed radius.

As you can see from our usage, we call the function as follows: ST_DWithin(road.way, building.way, 10000). The last argument, "10000", sets the size of the search radius. In this case our geometry is in meters, so this means we search within a radius of 10 km.

This static radius number is quite arbitrary and might not always be desirable. What other options do we have without compromising performance too much?


Another PostGIS addition we have not talked about much up until now is the set of spatial operators available to us. There are a total of 16 operators you can use to perform matches on your GIS data.

There are straightforward operators like &&, which returns TRUE if one piece of geometry's bounding box intersects another's, or <<, which returns TRUE if one object is fully to the left of another object.
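As a quick illustration, here is a hand-rolled sketch of both operators on two made-up unit squares (the coordinates are purely illustrative):

```sql
-- Two unit squares: "b" lies entirely to the right of "a".
-- && compares bounding boxes, << tests strict left-of ordering.
SELECT a && b AS boxes_overlap,  -- FALSE: the boxes do not intersect
       a << b AS a_left_of_b     -- TRUE: "a" is fully to the left of "b"
FROM (SELECT ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') AS a,
             ST_GeomFromText('POLYGON((5 0, 6 0, 6 1, 5 1, 5 0))') AS b) AS shapes;
```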

But there are more interesting ones like the <-> and the <#> operators.

The first operator, <->, returns the distance between two points. If you feed it other types of geometry (like a linestring or polygon) it will first draw a bounding box around that geometry and perform a point calculation using the bounding box centroids. A centroid is the calculated center of a piece of geometry (the drawn bounding box in our case).

The second, <#>, acts in much the same way, but works directly on the bounding boxes of the given geometry. In our case, since we are not working with points, it would make more sense to use this operator.

The big advantage of this distance calculation operator is, once more, the fact that it too calculates using a bounding box and is thus able to use a GiST index. The ST_Distance() function, however, calculates distances by finding the two points on the given geometries closest to each other, which yields the most accurate result. The <#> operator, as said before, stretches a bounding box around each piece of geometry and therefore deforms our objects, making for less accurate distance measurements.

It is therefore not wise to use <#> to calculate accurate distances, but it is a life saver for sorting away geometry that is too far away to be of interest.
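A tiny made-up example can demonstrate that deformation. The bounding box of a diagonal linestring covers area the line itself never touches, so the box distance can be much smaller than the true distance:

```sql
-- The point touches the line's bounding box (which spans (0 0) to (5 5)),
-- so <#> reports 0, while the true closest-point distance is about 3.54.
SELECT ST_Distance(line, pt) AS true_distance,
       line <#> pt AS box_distance
FROM (SELECT ST_GeomFromText('LINESTRING(0 0, 5 5)') AS line,
             ST_GeomFromText('POINT(5 0)') AS pt) AS t;
```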

So a proper usage would be to first roughly limit the result set using the <#> operator and then more accurately measure the distance of, say, the first 50 matches with our famous ST_Distance().

Before we can continue, it is important to point out that both the <-> and <#> operators can only use the GiST index when either the left or right hand side of the operator is a constant or fixed piece of geometry. This means we have to provide actual geometry using a constructor function.

There are ways around this limitation: for example, as Alexandre Neto points out on the PostGIS mailing list, you could provide your own function which converts our "dynamic" geometry into a constant.

But this would make this post run way past its initial focus. Let us simply try by providing a fixed piece of geometry. The fixed piece is, of course, still our "Kin Thermal Power Plant Coal storage building", but converted into WKT:

EXPLAIN ANALYZE WITH distance AS (
    SELECT way AS road, ref AS route
        FROM planet_osm_line
        WHERE highway = 'secondary'
        ORDER BY ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913) <#> way
        LIMIT 50
)
SELECT ST_Distance(ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913), road) AS true_distance, route
    FROM distance
    ORDER BY true_distance
    LIMIT 1;

This query uses a Common Table Expression or CTE (you could also use a simpler subquery) to first get a rough result set of about 50 rows based on what <#> finds. Then only on those 50 rows do we perform our more expensive, index-agnostic distance calculation.

This results in the following plan and runtime:

Limit  (cost=274.57..274.57 rows=1 width=64) (actual time=11.236..11.237 rows=1 loops=1)
    CTE distance
        ->  Limit  (cost=0.28..260.82 rows=50 width=173) (actual time=0.389..10.764 rows=50 loops=1)
            ->  Index Scan using planet_osm_line_way on planet_osm_line  (cost=0.28..16362.19 rows=3140 width=173) (actual time=0.389..10.745 rows=50 loops=1)
                Order By: (way <#> '010300002031BF0D000100000005000000D7A3706D17296B41C3F528DC124D47417B14AECF1E296B4100000020484D4741CDCCCCC43C296B410AD7A3B0054D4741295C8F6235296B41B81E856BD04C4741D7A3706D17296B41C3F528DC124D4741'::geometry)
                Filter: (highway = 'secondary'::text)
                Rows Removed by Filter: 4562
            ->  Sort  (cost=13.75..13.88 rows=50 width=64) (actual time=11.234..11.234 rows=1 loops=1)
                Sort Key: (st_distance('010300002031BF0D000100000005000000D7A3706D17296B41C3F528DC124D47417B14AECF1E296B4100000020484D4741CDCCCCC43C296B410AD7A3B0054D4741295C8F6235296B41B81E856BD04C4741D7A3706D17296B41C3F528DC124D4741'::geometry, distance.road))
                Sort Method: top-N heapsort  Memory: 25kB
    ->  CTE Scan on distance  (cost=0.00..13.50 rows=50 width=64) (actual time=0.412..11.188 rows=50 loops=1)
Total runtime: 11.268 ms

As you can see, we are now using the GiST index "planet_osm_line_way", which was what we were after.

This yields roughly the same runtime as with our ST_DWithin(), but without the arbitrary distance setting. We do have a somewhat arbitrary limiter of 50, but this is much less severe than a distance limiter.

Even if the closest secondary road were 100 km from our building, the above query would still find it, whereas our previous query would return nothing.

One more for the road home

Let us do a few more fun calculations on our Okinawa data, before I let you off the island.

Next I would like to find the longest trunk road that runs through this prefecture:

SELECT ref, highway, ST_Length(way) as length
    FROM planet_osm_line 
    WHERE highway = 'trunk'
    ORDER BY length DESC;

We have a new function, ST_Length(), which simply returns the length of a given linestring or multilinestring. The only index that will be used is our "planet_osm_line_highway_index" B-Tree index, to perform our Bitmap Index Scan.

ST_Length() obviously does not work with bounding boxes and therefore cannot use the geometrical GiST index. This is yet another function you should use carefully.

When looking at the result set that was returned to us, you will see that some routes show up multiple times. Take route 58, which is the longest and most famous route in Okinawa. It shows up around 769 times. Why?

This is because, especially for a database prepared for mapping, these pieces of geometry are divided over different tiles.

We thus need to accumulate the length of all the linestrings we find that represent pieces of route 58. First, we could try to accomplish this with plain SQL:

WITH road_pieces AS (
    SELECT ST_Length(way) AS length
    FROM planet_osm_line
    WHERE ref = '58'
)
SELECT sum(length) AS total_length
FROM road_pieces;

This will return:


Meaning a total length of 536.486 kilometers. This query runs in about 19.375 ms. Let us add an index to our "ref" column:

CREATE INDEX planet_osm_line_ref_index ON planet_osm_line(ref);

Perform vacuum:

VACUUM ANALYZE planet_osm_line;

This index will speed up the query and make it run in a little over 3.524 ms. A nice runtime.

You could also perform almost the exact same query, but instead of using an SQL sum() function, you could use ST_Collect(), which creates collections of geometry out of all the separate pieces you feed it. In our case we feed it separate linestrings, which will make this function output a single multilinestring. We would then only have to perform one length calculation.

WITH road_pieces AS (
    SELECT ST_Collect(way) AS geom
    FROM planet_osm_line
    WHERE ref = '58'
)
SELECT ST_Length(geom) AS length
    FROM road_pieces;

This query runs around 1 ms faster than the former and returns the exact same length of 536.486 kilometers.

Now that we have this one multilinestring which represents route 58, we could check how close this route comes to our famous Kin building (which we will statically feed):

WITH road_pieces AS (
    SELECT ST_Collect(way) AS geom
    FROM planet_osm_line
    WHERE ref = '58'
)
SELECT ST_Distance(geom, ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))',900913)) AS distance
    FROM road_pieces;

Which would give us:


In other words: Route 58 is, at its closest point, 7.9 kilometers from our coal storage building. This query took about 5 ms to complete. A rather nice throughput.

Okay, enough exploring for today.

We took a brief look at indexing our spatial data and what benefits we can gain from it. And, as you can imagine, a lack of indexes and improper use of the GIS functions can lead to dramatic slow-downs, certainly on larger data sets.


Before I let you go I want to take a brief look at another mechanism for carrying around GIS data: the shapefile. It is probably more widely used than the OSM XML format, but less open. It is almost the de facto standard for exchanging data between GIS systems.

We can import shapefiles using a tool called "shp2pgsql", which ships with PostGIS. This tool will attempt to upload ESRI shape data into your PostGIS-enabled database.


ESRI stands for Environmental Systems Research Institute and is yet another organization that taps into the world of digital cartography.

They have defined a (somewhat open) file format standard that allows the GIS world to save their data in a so called shapefile. These files hold GIS primitives (polygons, linestrings, points, ...) together with a bunch of descriptive information that tells us what each primitive represents.

It was originally developed for ESRI's own proprietary software package (ArcGIS), but was quickly picked up by the rest of the GIS community. Today, almost all serious GIS packages have the ability to read and/or write such shapefiles.

Shapefile build-up

Let us take a peek at the guts of such a shapefile.

First, contrary to what the name suggests, a shapefile is not a single file. It is a bundle that must contain a minimum of three files to be spec-compliant:

  • .shp: the first mandatory file has the extension .shp and holds the GIS primitives themselves.
  • .shx: the second important file is an index of the geometry
  • .dbf: the last needed file is a database file with geometry attributes

Getting shapefile data

There are many organizations who offer shapefiles of all areas of the globe, either free or for a small fee. But since we already have data in our database we are familiar with, we could create our own shapefiles.

Exporting with pgsql2shp

Besides "shp2pgsql", which is used to import or load shapefiles, PostGIS also ships a reverse tool called "pgsql2shp", which can export or dump shapefiles based on geometry in your database.

So let us, as an experiment, create a shapefile containing all secondary roads of Okinawa.

First we need to prepare an empty directory where this tool can dump our data. Since it will create multiple files, it is best to put them in their own spot. Open up a terminal window and go to your favorite directory-making place and create a directory called "okinawa-roads":

$ mkdir okinawa-roads

Next enter that directory.

The "pgsql2shp" tool needs a few parameters to be able to successfully complete. We will be using the following flags:

  • -f: tells the tool which file name to use
  • -u: the database user to connect with

After these flags we need to input the database we wish to take a chunk out of and the query which will determine the actual data to be dumped.

The above will result in the following command:

$  pgsql2shp -f secundairy_roads -u postgres gis "select way, ref from planet_osm_line where highway = 'secondary';"

As you can see we construct a query which only gets the road reference and the geometry "way" column from the secondary road types.

After some processing it will have created 4 files: the 3 mandatory ones mentioned above, plus a new one, the projection file (.prj). This file contains the coordinate system and other projection information in WKT format.

This bundle of 4 files is now our shapefile format which you could easily exchange between GIS aware software packages.

Importing with shp2pgsql

Let us now import these shapefiles back into PostgreSQL and see what happens.

For this we will ignore out "gis" database, and simply create a new database to keep things separated. Connect to a PostgreSQL terminal, create the database and make it PostGIS aware:

CREATE DATABASE gisshape;
\c gisshape
CREATE EXTENSION postgis;

Now go back to your terminal window to do some importing.

The import tool works by dumping the SQL statements to stdout, or to a SQL dump file if preferred. If you do not wish to work with such a dump file, you have to pipe the output to the psql command to load in the data.

From the directory where you saved the shapefile dump, run the "shp2pgsql" tool:

$ shp2pgsql -S -s 900913 -I secundairy_roads | psql -U postgres gisshape

Let me go over the flags we used:

  • -S: is used to keep the geometry simple. The tool otherwise will convert all geometry to its MULTI... counterpart
  • -s: is needed to set the correct SRID
  • -I: specifies that we wish the tool to create GiST indexes on the geometry columns

Note that the -S flag will only work if all of your geometry is actually simple and does not contain true MULTI... types of geometry with multiple linestrings, points or polygons in them.

An annoying fact is that you have to tell the loader which SRID your geometry is in. There is a .prj file in our shapefile bundle, but it only contains the WKT projection information, not the SRID. One trick to find the SRID based on the information in the projection file is to use OpenGeo's Prj2EPSG website, which does quite a good job of looking up the EPSG ID (which most of the time is the SRID). However, it fails to find the SRID of our OSM projection.

Another way of finding out the SRID is by using the PostGIS spatial_ref_sys table itself:

SELECT srid FROM spatial_ref_sys WHERE srtext = 'PROJCS["Popular Visualisation CRS / Mercator (deprecated)",GEOGCS["Popular Visualisation CRS",DATUM["Popular_Visualisation_Datum",SPHEROID["Popular Visualisation Sphere",6378137,0,AUTHORITY["EPSG","7059"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6055"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4055"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],AUTHORITY["EPSG","3785"],AXIS["X",EAST],AXIS["Y",NORTH]]';

This will give us:



If you now connect to your database and query its structure:

\c gisshape
\d

You will see we have a new table called "secundairy_roads". This table now holds only the information we dumped into the shapefile, being our road route numbers and their geometry. Neat!

The end


We are done, folks. I hope I have given you enough firepower to commence your own GIS work using PostGIS. As I said at the beginning of this series, the past three chapters form merely an introduction to the capabilities of PostGIS, so do what I expect you to do every time: go out and explore!

Try to load in different areas of the world, either with OpenStreetMap or by using shapefiles. Experiment with all the different GIS functions and operators that PostGIS makes available.

And above all, have fun!

And as always...thanks for reading!

by Tim van der Linden at June 25, 2014 10:00 AM

June 20, 2014

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Training sessions at Postgres Open 2014: Chicago

We'll be giving two training sessions on PostGIS at Postgres Open in Chicago on September 17th 2014. The first will be the fairly introductory PostGIS Introduction: Geometry, Geography, and Geocoding, and the second will cover more advanced concepts and tools: PostGIS on caffeine: Raster, Topology, and pgRouting. Check out the other conference sessions at Postgres Open 2014.

by Regina Obe (nospam@example.com) at June 20, 2014 06:50 AM

June 18, 2014


Postgis, PostgreSQL's spatial partner - Part 2

Welcome to the second part of our spatial story. If you have not done so yet, I advise you to go and read part one first.

The first part of this series gives you some basic knowledge about the GIS world (GIS Objects, WKT, Projections, ...). This knowledge will come in handy in this chapter.

Today we will finally take an actual peek at PostGIS and do some database work:

  • We will see how we can create valid GIS objects and insert them into our database
  • Next we will let PostGIS retrieve information about these inserted GIS objects
  • Further down the line we will manipulate these objects a bit more
  • Then we will leap from geometry into geography
  • Finally we will be doing some real world measurements

Let us get started right away!

Creating the database

Before we can do anything else, we need to make sure that we have the PostGIS extension installed. PostGIS is usually packaged as a PostgreSQL contribution package. On a Debian system, it can be installed as follows:

apt-get install postgresql-9.3-postgis-2.1

This will install PostGIS version 2.1 for the PostgreSQL 9.3 database.

Next, fire up your database console and let us first create a new user and database:

CREATE USER gis WITH PASSWORD '10gis10';
CREATE DATABASE gis OWNER gis;

Not very original names, I know, but it states its purpose. Next, connect to the gis database and enable the PostGIS extension:

\c gis
CREATE EXTENSION postgis;

Now our database is PostGIS aware, and we are ready to get our hands dirty!

Notice that if you now describe your database with psql's "\d" command:


PostGIS has created a new table and a few new views. This is PostGIS's own bookkeeping and it will store which tables contain geometry or geography columns.
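You can peek into that bookkeeping yourself. The geometry_columns view, for instance, lists every geometry column PostGIS knows about:

```sql
-- Which tables hold geometry, in which column, with which SRID and type?
SELECT f_table_name, f_geometry_column, srid, type
FROM geometry_columns;
```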

Fun with Polygons

Let us begin this adventure with creating a polygon that has one interior ring, similar to the one we saw in the previous chapter.

Before we can create them, though, we have to create a table that will hold their geometrical data:

CREATE TABLE shapes (
    name VARCHAR
);

Now we have a table named "shapes" with only a column to store its name. But where do we store the geometry?

Because of the new data types that PostGIS introduces (geometry and geography), and to keep its bookkeeping up to date, you create this column with a PostGIS function named AddGeometryColumn():

SELECT AddGeometryColumn('shapes', 'shape', 0, 'POLYGON', 2);

Let us do a breakdown.

First, all the functions that PostGIS makes available to us are divided into groups that define their area of use. AddGeometryColumn() falls into the "Management Functions" group.

It is a function that will create a geometry column in a table of your choice and add a reference to this column to PostGIS's bookkeeping. It accepts a number of arguments:

  • The table name to where you wish to add the column
  • The actual column name you wish to have
  • The SRID
  • The WKT object type you wish to store
  • The coordinate type you desire (2 means XY)

In the above case we thus wish to add a geometry column to the "shapes" table. The column will be named "shape". The geometry inserted there will get an SRID of 0 and will be of object type POLYGON and have a normal, two dimensional coordinate layout.


One thing that you might not yet know from the above function definition is the SRID, or Spatial Reference ID, a very important number when working with spatial data. Remember how in the last chapter I kept yapping about the different projections we had, and how each projection would yield different results? Well, this is where all that information comes together: the SRID.

Our famous OGC has created a lookup table containing a whopping 3911 entries, each entry with a unique ID: the SRID. This table is called spatial_ref_sys and is, by default, installed into your PostgreSQL database when you enable PostGIS.

But hold on, there is something I neglected to tell you in the previous chapter: the European Petroleum Survey Group or EPSG. The following is something that confuses many people and makes them mix and match SRIDs and EPSG IDs. I will try my best not to add to that confusion.


The EPSG, now called the OGP, is a group of organizations that, among other things, concern themselves with cartography. They are the world's number one authority on how spatial coordinates (projected or real world) should be calculated. Every definition they make gets an accompanying ID called the EPSG ID.

The OGC maintains a list to be used inside databases (GIS systems). They give all their entries a unique SRID. These entries refer to defined and official projections, primarily maintained by the EPSG, which have their own EPSG ID and unique name. Other projections (not maintained by the EPSG) are also accepted into the OGC SRID list, as are your own projections (should you feel the need).

Let us poke the spatial reference table and see if we can get a clearer picture.

If we would query our table (sorry for the wildcard) and ask for a famous SRID (more on this one later):

SELECT * FROM spatial_ref_sys WHERE srid = 4326;

We would get back one row containing:

  • srid, which is the famous id
  • auth_name, the name of the authority organization, in most cases EPSG
  • auth_srid, the EPSG ID the authority organization introduced
  • srtext, tells us how the spatial reference is built using WKT
  • proj4text, commands that drive the proj4 library which is used to make the actual projections

And as you can see, both the "srid" column and the "auth_srid" are identical. This will be the case with many entries.
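If you are curious about the exceptions, you can ask the table itself for entries where the two IDs diverge:

```sql
-- A handful of spatial references whose SRID differs from
-- the ID assigned by their authority organization.
SELECT srid, auth_name, auth_srid
FROM spatial_ref_sys
WHERE srid <> auth_srid
LIMIT 5;
```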

I should also tell you that this huge list of SRID entries mostly consists of dead or localized projections. Many of the projections listed are not used anymore but were popular at some point in history (they are marked deprecated), or are very localized. In the previous chapter I mentioned that the general UTM system, for example, could be used as a framework for more localized UTM projections. There are hundreds of these local projections that only make sense when used in the area they are intended for.

Simple Features Functions

As I have told you before, the functions that PostGIS makes available are divided into several defined groups. The functions themselves are also defined, not by PostGIS, but by the Simple Features standard maintained by the OGC (as we saw in the previous chapter).

There are a total of 8 major categories available:

  • Management functions: functions which can manipulate the internal bookkeeping of PostGIS
  • Geometry constructors: functions that can create or construct geometry and geography objects
  • Geometry accessors: functions that let us access and ask questions about the GIS objects
  • Geometry editors: functions that let us manipulate GIS objects
  • Geometry outputs: functions that give us various means by which to transform and "export" GIS objects
  • Operators: various SQL operators to query our geography and geometry
  • Spatial relationships and measurements: functions that let us do calculations between different GIS objects
  • Geometry processing: functions to perform basic operations on GIS objects

I have left a few categories out for they are either not part of the Simple Features standard (such as three dimensional manipulations) or beyond the scope. To see a list of all of the functions and their categories, I advise you to visit the PostGIS reference, section 8.

Let us now do some fun manipulations and use some of the functions from these categories, just to get a bit more familiar with how it all works together.

If you inserted the last SQL command which makes the geometry column, you should have gotten back the following result:

public.shapes.shape SRID:0 TYPE:POLYGON DIMS:2

This tells us we created the "shape" column in the "shapes" table and set the SRID to 0. SRID 0 is a convention used to tell a GIS system that you currently do not care about the SRID and simply want to store geometry with an arbitrary X and Y value.

Let us now insert the shape of our square. To insert a polygon into your column, you could use various functions. One of these functions is ST_GeomFromText():

INSERT INTO shapes (name, shape) VALUES (
    'Square with hole',
    ST_GeomFromText('POLYGON ((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))', 0)
);

This now inserts our polygon and gives it a name. The ST_GeomFromText() function enables us to enter our polygon object using WKT. This function also accepts a second, optional parameter which is the SRID by which we wish to work. The category of this function is called Geometry Constructors.

You know this polygon has two rings, the exterior and the interior. Let us now ask PostGIS to return only the line that represents the exterior ring:

SELECT ST_ExteriorRing(shape)
    FROM shapes
    WHERE name = 'Square with hole';

And we get back:


Oh my...that is not what we expected. And yet it is correct: this is how PostgreSQL stores geometry/geography internally. The result is accurate, but unreadable to us humans.

If we wish to get back a readable WKT string, we have to convert it using one of the conversion functions:

SELECT ST_AsText(ST_ExteriorRing(shape))
    FROM shapes
    WHERE name = 'Square with hole';

And we get:

LINESTRING(8 1,8 8,1 8,1 1,8 1)

Aha, that is more like it! This we can read!

We used the ST_ExteriorRing() which falls under the Geometry Accessors category and the ST_AsText() function which resides in the category Geometry Outputs.

Okay, now we wish to know the interior ring:

SELECT ST_AsText(ST_InteriorRingN(shape,1))
    FROM shapes
    WHERE name = 'Square with hole';

The result:

LINESTRING(6 3,6 6,3 6,3 3,6 3)

Notice that the function ST_InteriorRingN() requires you to specify the index of the interior ring you wish to get, starting from 1. As we have seen before, polygon objects can have multiple interior rings, but only a single exterior one.
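If you do not know in advance how many interior rings a polygon carries, the ST_NumInteriorRings() accessor can tell you. A small sketch against our table:

-- Count the interior rings of our polygon; the result tells us the
-- valid range of indexes we can pass to ST_InteriorRingN().
SELECT ST_NumInteriorRings(shape)
    FROM shapes
    WHERE name = 'Square with hole';

For our square with a single hole this should return 1.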

Next let us ask all the information about what makes up the shape:

SELECT ST_Summary(shape)
    FROM shapes
    WHERE name = 'Square with hole';

And get back:

Polygon[B] with 2 rings+
  ring 0 has 5 points  +
  ring 1 has 5 points

Ohh, that is pretty cool. We get back a human readable string that explains to us how this particular piece of geometry is built.

Let us now add another polygon:

INSERT INTO shapes (name, shape) VALUES (
    'The intersecting one',
    ST_GeomFromText('POLYGON ((14 1, 15 8, 7 8, 7 1, 14 1))', 0)
);

This is a polygon that will intersect with part of our previous polygon. Let us ask PostGIS if these polygons really intersect each other:

SELECT ST_Intersects(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The intersecting one';

If all is well, this will simply return TRUE if they intersect and FALSE if they do not. In this case, it will return TRUE.

The counterpart of our intersect function is ST_Disjoint():

SELECT ST_Disjoint(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The intersecting one';

Which will return FALSE in our case.
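Since disjoint is simply the logical opposite of intersects, you could also sketch the same check by negating ST_Intersects():

-- ST_Disjoint(a, b) is equivalent to NOT ST_Intersects(a, b).
SELECT NOT ST_Intersects(a.shape, b.shape) AS disjoint
    FROM shapes a, shapes b
    WHERE a.name = 'Square with hole' AND b.name = 'The intersecting one';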

Let us now add a third polygon which does not intersect our previous two:

INSERT INTO shapes (name, shape) VALUES (
    'The solitary one',
    ST_GeomFromText('POLYGON ((20 20, 20 40, 1 40, 1 20, 20 20))', 0)
);

This polygon will reside well "above" the other two and does not share any space. Let us now see how far this polygon resides from our first polygon, the one with the hole:

SELECT ST_Distance(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The solitary one';

This returns us the number "12", which means they are 12 units apart. And remembering the definition of both shapes, the first shape is 8 units tall and the second shape starts at unit 20. This indeed leaves a gap of 12.

Nice! We have just measured the distance between two objects in a spatial database!
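As a side note: if you only need to know whether two objects lie within a certain distance of each other, and not the exact distance itself, ST_DWithin() is worth a sketch (the threshold of 15 units here is an arbitrary choice):

-- TRUE when the shapes lie within 15 units of each other.
SELECT ST_DWithin(a.shape, b.shape, 15)
    FROM shapes a, shapes b
    WHERE a.name = 'Square with hole' AND b.name = 'The solitary one';

Since we just measured a distance of 12 units, this returns TRUE.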

Hmmm, this may mean we are getting closer to knowing the distance to Tokyo...but not yet, we need to play a bit more first.

PostGIS also has the ability to manipulate geometry. Let us, for example, try to move our solitary polygon even further away using the ST_Translate() function under the Geometry Editors category:

SELECT ST_Translate(shape, 0, 10)
    FROM shapes
    WHERE name = 'The solitary one';

The ST_Translate() function accepts the to-be-altered geometry, followed by an X offset, a Y offset and an optional offset for the third dimension.

Running this query will give us a binary representation of a new piece of geometry. The original geometry is not altered. So how can we actually move the geometry that resided in the database?

Simply by using SQL:

UPDATE shapes
    SET shape = ST_Translate(shape, 0, 10)
    WHERE name = 'The solitary one';

Let us now check the new distance:

SELECT ST_Distance(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The solitary one';

And get back:

22
Aha! Nice! It has now moved ten units upwards.

Now let us alter the distance once again, but this time we will scale the polygon down:

UPDATE shapes
    SET shape = ST_Scale(shape, 0.5, 0.5)
    WHERE name = 'The solitary one';

If we now check the distance, we will get back:

7
Wow, we have gone from 22 to 7, how did that happen?

Well, it is important to know that the ST_Scale() function currently only supports scaling by multiplying each coordinate. This means that the polygon will not only become smaller or bigger, but will also translate as a result. To know exactly how our new, scaled version of our polygon looks, we can use the ST_Boundary() function which shows us the outermost linestring:

SELECT ST_AsText(ST_Boundary(shape))
    FROM shapes
    WHERE name = 'The solitary one';

And we will get:

LINESTRING(10 15,10 25,0.5 25,0.5 15,10 15)

If we compare that to the same result before scaling (which I handily made ready for you):

LINESTRING(20 30,20 50,1 50,1 30,20 30)

You can see that each value in each coordinate simply was divided by 2. This also clarifies why our polygons are now only 7 units apart (the first square stops at 8, the scaled square starts at 15).
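If you want to scale a polygon without this translation side effect, one approach (a sketch, not the only way) is to move its centroid to the origin, scale, and move it back:

-- Scale around the shape's own centroid instead of around the origin.
SELECT ST_Translate(
           ST_Scale(
               ST_Translate(shape,
                            -ST_X(ST_Centroid(shape)),
                            -ST_Y(ST_Centroid(shape))),
               0.5, 0.5),
           ST_X(ST_Centroid(shape)),
           ST_Y(ST_Centroid(shape)))
    FROM shapes
    WHERE name = 'The solitary one';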

Okay, okay, I guess we have played enough now. We have seen a small glimpse of the operations you can do on GIS data within PostGIS and seen that PostGIS makes all of this work fairly easy.

I guess we can now take it one step further and start to actually look at some geography!

Fun with the earth

Up until now we have been working with an SRID of 0, which means undefined, inside a geometry column, meaning the data was of type "geometry". Now we want to go out and explore the actual earth, which means we wish to continue in a geographical coordinate system.

This brings us at a crossroad of choices. First, you will need to ask yourself the same question we pondered in chapter one: you wish to work with geometry or geography?

On the one hand we know that geographical measurements are expensive calculations, but most accurate for they are unprojected. On the other hand GIS convention tells us that in any case, we should continue in a geometrical or Cartesian system, simply because...well...it is a convention.

So what do we do?

It all depends on your specific use case.

When working on a "small" scale, say, part of North America, it would make sense to not use geography. Instead, you could (and should) work in a geometrical system using an accurate local projection, such as one of the UTM or State Plane systems built on the NAD27 (SRID 4267) or NAD83 (SRID 4269) datums.

Depending on which region you work in, chances are high you have several local projections with their own datum and coordinate system, ready to use. They are very accurate and less expensive to use than direct geography.

For us, however, we will be working on a large scale, for we want to measure a distance that covers much of the globe. You cannot use a local projection or local datum for that.

In such a case you, again, are presented with two options. You could either neglect the convention and simply use geographical data and functions or be nice and adhere to what is agreed upon and work in a Cartesian system.

We will be doing both and we will use the common SRID 4326. This very popular SRID is at heart geographical, for it uses the geographical coordinate system, but can also be used with geometrical data. Confused?

Join the club.

Let me try to clarify.

First, the authority of this SRID is the EPSG and the EPSG ID is identical to the SRID. It uses a popular datum (remember chapter one) called WGS 84 and is referred to as unprojected for it is a geographical representation. This datum is the one used in GPS systems and is often referred to as a world wide datum.

When you store objects with an SRID of 4326, you are storing them using geographical coordinates aka latitude and longitude. This in contrast to projected systems, such as the UTM zones, which store their coordinates in linear units like meters. When you do measurements between two objects carrying this SRID you have two options. You can either do a geographical or a geometrical measurement.

With a geographical measurement there will be no projection and the system will use the WGS 84 datum (the spheroid) to calculate the distance, in three dimensional space. As we have seen before, such a calculation is more expensive and unconventional.

With a geometrical measurement, your geographical coordinates have to be projected on to a flat Cartesian or geometrical plane. This is done automatically when you ask PostGIS to measure distance using one of the more common geometrical functions. When projecting this way, PostGIS will use the Plate Carrée projection, which means it will use the stored latitude and longitude coordinates directly as an X and Y value.

Let us see this story in action. First we can take a look at the more native geography data. Let us clean our shape table first:

SELECT DropGeometryColumn('shapes', 'shape');

Here we use the DropGeometryColumn() to remove this column from our "shapes" table. Now clear the table:

TRUNCATE shapes;

Next add a new geography column:

ALTER TABLE shapes ADD COLUMN location geography('POINT');

We create a new geography column with the name location in our "shapes" table. We will only be storing Point types.

Notice that the syntax is different and that here we use plain SQL as opposed to the AddGeometryColumn() function from before. Since PostGIS 2 it is possible to create and drop both geometry and geography columns with standard SQL syntax.

If you wish to rewrite our "shape" column addition from the beginning of this chapter, you could write it like this:

ALTER TABLE shapes ADD COLUMN shape geometry('POLYGON',0);

Looks more native and simple, no? Sorry to tell you this so late in the adventure, but now you know the existence of both the functions and the more native SQL syntax. Both will also keep the PostGIS bookkeeping in sync.

Also, for fun, you could do a describe on the table:

\d shapes

And get back:

  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 name     | character varying     | 
 location | geography(Point,4326) |

As you can see, the column is of type geography and automatically gets the famous SRID 4326.

Good, let us now try and find an answer to our famous question, How far is Tokyo from my current location. You will be surprised how trivial this will be.

First, as you might suspect, since we are only interested in a point on the earth and not the shape of your location nor Tokyo, we will suffice with a Point object. Next we will need to insert two points into our database, your location and the center of Tokyo, both in geographical coordinates.

Finding Your Location

This means you need to find out your exact latitude and longitude of the place you are at right now.

This could, of course, be done in a myriad of ways: using your cell phone's GPS capabilities, using your dedicated GPS device or using an online map system. I will choose the last option and will be using OpenStreetMap (what else?) to locate my current position.

Open up your favorite web browser and surf to openstreetmap.org. Once there, punch in your address or use the "Where Am I" function. This will give you a point on the map and, in the search bar on the left, your latitude and longitude coordinates. Take this coordinate and save it as point data into your fresh column:

INSERT INTO shapes VALUES ('My location', ST_GeographyFromText('POINT(127.6791949 26.2124702)'));

The point I am inserting reflects central Naha, the main city of the Okinawa prefecture. Not my current location, but it serves as an illustrative point.

Note that PostGIS expects longitude as X and latitude as Y. This is often the reverse of the order you get back from other sources.
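If you prefer to build the point from separate numbers instead of a WKT string, you could also sketch it with ST_MakePoint(), which likewise takes longitude first:

-- ST_MakePoint(longitude, latitude), stamped with SRID 4326 and
-- cast to geography.
SELECT ST_SetSRID(ST_MakePoint(127.6791949, 26.2124702), 4326)::geography;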

Now you can insert the location of Tokyo, which I conveniently looked up for you:

INSERT INTO shapes VALUES ('Tokyo', ST_GeographyFromText('POINT(139.7530053 35.6823815)'));

Ah, nice! Okay, are you ready to finally, after all the rambling we went through, know the distance? You already know the syntax, punch in the magic:

SELECT ST_Distance(a.location, b.location)
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

In case you live in the exact cartographic center of Naha, you will get back:

1557506.28
Yeah! This, my lovely folk, is how far you are from Tokyo, at this very moment.

But what is this number you get back?

The result you see here is the distance returned in Meters, meaning, from the point I inserted as "My location", I am 1557506.28 Meters or 1557.50628 Kilometers from Tokyo.

Very neat stuff, would you not say? PostgreSQL just told us how far we are from Tokyo, awesome!

But wait, we are not finished yet. We have now done the most accurate, real geographical distance measurement using expensive geographical calculations.

There is an "in-between" solution before we jump to geometry. PostGIS gives us the ability to replace our spheroid datum with the more classical sphere. The latter has much simpler calculations, but can still return more accurate results than some of the projections.

To redo our calculation from above with a sphere, simply set the spheroid Boolean, a third and optional parameter to the ST_Distance() function, to False:

SELECT ST_Distance(a.location, b.location, False)
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

The result:

1557886
Which is a total distance of 1557.886 Kilometers, a difference of a few hundred Meters.

Let us now repeat this story, but use geometry instead. Let us do it the GIS conventional way.

We do not need to recreate our column as a geometry column and insert our data again. We could cheat a little. PostGIS together with PostgreSQL has the unique capability of casting data from one type to another. So without recreating anything, we could simply cast our geography data into geometry on the fly and see what happens.

Note that casting, while very convenient for quick checks, can render an index totally useless. It is therefore important to think ahead and decide if you want to work with geometry or geography, then create the correct column type and use it without casting.

But for our quick and dirty queries, this is fine. Let us continue:

SELECT ST_Distance(a.location::geometry, b.location::geometry)
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

In this query, we cast (::) the geography data inside the "location" columns into geometry. Now we get back:

15.344
Hmm, that is a different result altogether. It looks like a much smaller number than before. What is happening?

We just cast our geography to geometry, which means PostGIS will now use a Cartesian system or projection to calculate the distance in a linear way. When using the distance measuring function ST_Distance() on geometry, it will return not meters but the distance expressed in the units the original data was stored in. Since our data is stored with SRID 4326, its units are latitude and longitude. The value you get back is thus degrees.

In the case of the Naha location, this will be 15.344 degrees from Tokyo.

For our human brain this is difficult to imagine; a result in Meters is much easier to comprehend. So, let us transform this degree value into a metric value.

It is an estimation that one planar degree (in our Cartesian system) equals 111 KM. So the distance now becomes 15.344 degrees times 111: 1703 Kilometers. That is a difference of about 145 Kilometers.
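You could let the database do this estimation for you in one go. A rough sketch (the factor 111 is, again, only an approximation of Kilometers per planar degree):

-- Degrees from the Plate Carrée measurement, multiplied by roughly
-- 111 KM per planar degree.
SELECT ST_Distance(a.location::geometry, b.location::geometry) * 111 AS km_estimate
    FROM shapes a, shapes b
    WHERE a.name = 'My location' AND b.name = 'Tokyo';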

The reason this difference exists is the projection we are now using. As we have mentioned a few times before, when working with data carrying SRID 4326, PostGIS will automatically use the infamous Plate Carrée projection. This projection, as we have seen before, is the least accurate for something like distance measuring.

So let us poke this projection mechanism and try a different, more accurate one, the Lambert, which carries SRID 3587.

To change the projection PostGIS will use, we can use the ST_Transform() function which casts objects to different SRIDs. Note that ST_Transform() only works for geometry objects, so we have to continue to cast our geography location to be able to use them in this function.

SELECT ST_Distance(ST_Transform(a.location::geometry, 3587), ST_Transform(b.location::geometry, 3587))
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

This will give us:

1602392
Meaning 1602.392 Kilometers, a difference of about 45 Kilometers. That is indeed in between the Plate Carrée and our native geographical measurement.

Another, even more accurate and popular projection is our famous UTM. It cannot, however, be used on a world scale. You can only perform measurements within the same UTM zone.

As mentioned in the previous chapter, there are roughly 60 UTM zones on the earth, but each zone uses its own projection and its own coordinates. This kind of projection is thus not fit for measuring distance on such a large scale.

Let us therefore take this one step further before I leave you to rest. Let us do a measurement with such a UTM projection. We will make a measurement inside of Japan's mainland UTM zone: 54N, which has an SRID of 3095.

First we will have to make another point in our database:

INSERT INTO shapes VALUES ('Aomori', ST_GeographyFromText('POINT(140.750616 40.788079)'));

This point represents the city of Aomori in northern Japan, famous for its huge lantern parades.

First let us measure with the native geographical calculations:

SELECT ST_Distance(a.location, b.location)
    FROM shapes a, shapes b 
    WHERE a.name = 'Aomori' AND b.name = 'Tokyo';

This returns:

573416
Or 573.416 Kilometers, which is most accurate.

Next, let us throw the good old Plate Carrée projection at it:

SELECT ST_Distance(a.location::geometry, b.location::geometry)
    FROM shapes a, shapes b 
    WHERE a.name = 'Aomori' AND b.name = 'Tokyo';

This will yield:


Which is in degrees again, doing this times 111 Kilometers will yield a total distance of 577.444 Kilometers.

Then let us measure using the correct UTM projection:

SELECT ST_Distance(ST_Transform(a.location::geometry, 3095), ST_Transform(b.location::geometry, 3095))
    FROM shapes a, shapes b 
    WHERE a.name = 'Aomori' AND b.name = 'Tokyo';

This will give us:

573228
Or 573.228 Kilometers and thus only around 200 meters different, in contrast with the Plate Carrée, which was 4 Kilometers different.

You can see that different projections will result in different measurements. It is therefore crucial to know which one to choose. Some are better used on a local scale, like we just did for Japan, others are better on a global scale.
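To see these trade-offs at a glance, you could sketch all three measurements in a single query:

-- Geodetic meters, a rough Plate Carrée estimation (degrees times
-- 111000 meters) and UTM zone 54N meters, side by side.
SELECT ST_Distance(a.location, b.location) AS geodetic_m,
       ST_Distance(a.location::geometry, b.location::geometry) * 111000 AS plate_carree_m,
       ST_Distance(ST_Transform(a.location::geometry, 3095),
                   ST_Transform(b.location::geometry, 3095)) AS utm_m
    FROM shapes a, shapes b
    WHERE a.name = 'Aomori' AND b.name = 'Tokyo';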

Again, it all comes down to trade-offs and choices.

Okay, yet another big chunk of PostGIS goodness is taken. I suggest a good rest of the mind.

We have seen how we can insert various types of geometry and geography, we saw how to manipulate and question them and we looked at a few real world measurements.

In the next and final chapter, we will be looking at loading some real GIS data from OpenStreetMap into our PostGIS database, take a quick look around my town here in Okinawa and take a deeper look at creating some important indexes.

And as always...thanks for reading!

by Tim van der Linden at June 18, 2014 10:00 AM

June 17, 2014

Paul Ramsey

Examples are not Normative

Once, when I still had the energy, I was reading an Open Geospatial Consortium specification document, and found an inconsistency between a directive stated in the text, and the examples provided to illustrate the directive. This seemed pretty important, since most "Humans" use the examples and ignore the text, so I raised it with the author, who replied to me:

"Examples are not normative"

To me, this seemed to summarize in four words everything there was to dislike about the standards community: dismissive, self-referential ("normative"? really?), and unconcerned with real-world practice. One of the reasons I no longer have the energy.

by Paul Ramsey (noreply@blogger.com) at June 17, 2014 07:35 PM

June 16, 2014


Masking features in QGIS 2.4

Have you ever wondered how to mask features on a map, so that only a particular zone is highlighted ?

There has been a simple plugin to do that for a while. Called 'Mask', it allowed you to turn a vector selection into a new memory layer with only one geometry, made by the geometric inversion of the selection: the selected polygons get transformed into holes of a rectangular polygon bigger than the current extent.

One could then use this new layer like any other one and apply a vector symbology to it: an opaque color to mask everything but the selection, or some semi-transparent color in order to only highlight the selection.

It was very useful but came with some limitations, especially the fact that the 'mask' layer was not updated during an atlas print.

Thanks to the support of Agence de l'Eau Adour Garonne, Oslandia has been developing some evolutions to the core of QGIS, as well as to the mask plugin.

The core part consists of a new feature renderer that can be used on any polygon vector layer as a symbology element. It is called the inverted polygon renderer and allows you to apply any other renderer to polygons that have been inverted. It was originally designed to allow only a simple filling mode to be applied to the exterior ring of polygons, but it now allows the use of more complex renderers like graduated, categorized or even rule-based renderers.

Inverted polygons: Simple usage

The simplest usage is to keep the default sub-renderer, which is set to "single symbol", in order to have a uniform exterior fill of a layer.

Advanced usage

When the sub-renderer used by the inverted polygon renderer has different symbol categories, features are grouped by symbol category before being inverted and rendered. It then only makes sense when the symbology used is partly transparent, so that the different inverted polygons can be distinguished from each other. This can be used for example to render a semi-transparent shapeburst fill around the current atlas feature.

In this example, we have an inverted polygon renderer with a rule-based sub renderer. The rule will only select the current atlas geometry, thanks to the expression $id=$atlasfeatureid. The symbol used is made of two symbol layers: a semi-transparent blue simple fill and a shapeburst fill on top of it. The polygon layer is then duplicated to also have a green "interior fill" for each polygon. The output can be seen hereafter:

Label masking

When the map has labels enabled, this inverted polygon renderer is not sufficient to mask labels as well. When a user wants to highlight a particular zone on a map, she usually also wants to mask labels that are around, or at least make them less visible.

The way QGIS handles labels out of the box is not directly compatible with this need: QGIS always displays labels on top of every other layer.

To circumvent this, the original 'mask' plugin has been enhanced in order to be aware of layers with labels. A new 'mask' layer can be computed and its geometry can be used to test whether a feature has to be labeled or not. The plugin exposes two special variables that are available in any expression:

  • in_mask(srid) will return a boolean that tests if the current feature is in the mask. The parameter srid is the SRID projection code of the layer where this function is used.
  • $mask_geometry will return the current mask geometry

Different spatial predicates can be used to test if the current feature lies inside the highlighted zone. A different type of predicate will be used for polygon layers, line layers and point layers.
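As an illustration, a layer stored in, say, EPSG:2154 (a hypothetical choice here) could filter its labels with a data-defined "Show label" expression such as:

in_mask(2154)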

Suppose we have a map of some French départements with a background raster map, and a linear layer displaying rivers.

If we want to highlight only one département, we can use the mask plugin for that. We will first select it and call the plugin.

A fancy inverted polygon symbology, based on a shapeburst fill, is created. We see here that we can choose the function that will be used for filtering the labeling. By default these functions are "pointOnSurface" for polygon layers and "intersects" for line layers.

Here, we want both the départements layer and the rivers layers to see their labeling rules modified in order to hide the labels outside of the defined mask polygon.

By modifying the mask symbology, adding a little bit of transparency, we obtain the following result:

The plugin is able to interact with the atlas printing and will update its mask geometry as well as the labels that are allowed to be displayed.

The mask layer can also be saved with the project, using a memory layer if you use the Memory Layer Saver plugin, or using an OGR vector file format.

Here is a short video that shows how the plugin can be setup for a simple mask with an atlas print.

How to get the plugin ?

The new mask plugin is available in its 1.0 version on the QGIS official repository of plugins. It requires QGIS 2.4.

We are currently investigating the addition of this label masking feature to the QGIS core. The idea would be to have a concept of "label layer" that could then be hidden by others, or even made partly transparent.

It is not an easy task, though, since it would require reworking parts of the labeling and rendering engine. If you are interested in such a feature, please let us know!

by Hugo Mercier at June 16, 2014 07:00 AM

June 12, 2014


Postgis, PostgreSQL's spatial partner - Part 1


In Dutch we have an expression that says "Van hier tot Tokio", which literally translated means "From here to Tokyo" and is used to indicate that something is very far or very difficult. Unless you live in Japan, like me, then Tokyo is not that far actually....but you get the point. Tokyo is far, period.

But the question for today is...how far is it exactly? From where you are reading this right now...how far is Tokyo from you? How can you know? You could of course just hop online and question your favorite search engine for help, or use something like Open Street Map to figure it out.

But that would be too simple, no? This would mean my post has to stop here, and, as some of you might know, it is difficult for me to write short blog posts. Sorry.

Also, you would miss out on all of the fun that is actually happening behind the screen when you question spatial search engines and that is against my belief: know how the tools you depend on actually work!

And, as the same some of you might know, I love PostgreSQL.

So, knowing that I cannot write short posts and I like PostgreSQL...what would you suspect would happen if you ask me how far Tokyo is from my current location? You guessed it, simply use The Elephant to figure that out!

As I have shown you before, PostgreSQL is capable of storing, matching and retrieving much more than boring VARCHAR or INT data types, and it is designed to be extendable. And extending is what the folks behind the PostGIS project did. To summarize, the PostGIS project extends PostgreSQL to store, match, manipulate and retrieve spatial data. It makes PostgreSQL a full-blown GIS.

The purpose of this series is to get your feet wet with PostGIS and to learn a thing or two about GIS itself. In the first chapter, the one you are reading now, I would like to show you some fundamental GIS concepts: GIS Objects, standardization of GIS, geography and projections. We will not be doing any database action today I am afraid.

Then, starting from the second chapter, we will open up PostgreSQL, initiate a database to be PostGIS aware and start playing around. We will look at a bunch of different database functions we have available and how the knowledge from this chapter maps to the actual database. And we will of course be solving the question posed above: how far is Tokyo from your current location.

Are you ready for a new PostgreSQL adventure?

Note: I will take you over all the following information at lightning speed. My intent is not to make you a GIS expert, but I do feel it is necessary to touch on a few important topics so you know why PostGIS does things the way it does. This will hopefully make the actual database work in the next chapter more clear and spark some curiosity towards learning more about this topic.

The data

Before we can do anything GIS related, we need to take a look at what kind of data we will be working with: the GIS objects.

GIS Objects?

Geographic information system, or GIS in short, is merely the name of any system which can store, retrieve, generate, manipulate and visualize spatial data - the kind of data that represents objects in two or three dimensional space.

The GIS world is a world of standards, as with most computer sciences. These standards define what spatial data is and how we can work with it, and they are defined and maintained by the Open Geospatial Consortium or OGC.

Every system that wishes to work with GIS data, including PostGIS, should adhere to these standards.

Simple Features

The OGC's standard for working with GIS data in SQL is defined in an OGC and ISO specification called Simple Features.

Simple Features defines how we can represent spatial objects, as you will see soon, but also defines how we can access and manipulate them. You typically have available:

  • Functions to create two dimensional spatial objects
  • Functions to alter these objects
  • Functions to retrieve and describe single or multiple objects
  • Functions to compare and measure single or multiple objects

PostGIS has been certified by the OGC for its wide support of the Simple Features set.

Well-known Text

The first and most important part that is defined in the Simple Features spec are the means by which we can represent spatial data. I mean, we know how we can represent numbers or strings of text inside our database, but how do we represent something more abstract, like a line or a square?

Folks familiar with 2D drawing or 3D modeling software might already have a gut feeling of how to represent such data, and this gut feeling is right: you store coordinates. If you wish to represent a line, you will only need to know the two end points of this line to be able to store, manipulate or visualize it. The same goes for a square, though there you will need four coordinates.

And, as is always the case with standards, the OGC has devised two famous ways of representing these objects and their coordinates: Well-known Text or WKT and Well-known Binary or WKB. These two are almost identical, only differing in their area of use.

WKT is a markup language which you can use to simply write down your objects and use it in queries. It is human readable. However, if you wish to store it in a database or wish to perform matches on the data, it has to be stored in a defined binary format, the WKB format that is.

WKT can represent a wide range of objects from simple points to complex multi-polygons. The notation, however, stays roughly the same. If you wish to represent a square, for example, you could use the POLYGON object:

POLYGON ((4 1, 4 4, 1 4, 1 1, 4 1))

For people unfamiliar with the term "polygon", a polygon is a closed, two dimensional object with only straight lines. It has to have a minimum of three coordinates (points) thus giving it a minimum of three straight edges (making it, in that case, a triangle).

Let us take a deeper look at what is happening here. First, you will see we define a polygon object which you will need if you wish to represent closed shapes. Next we define the four coordinates, the four corners of our square, laid out on a fictional grid of 4 by 4 units. There are a few important notes to take about this coordinate listing:

First, the coordinates are all two-dimensional and represent an X and a Y coordinate respectively.

Also, you do not see four but five coordinates. This is another rule from the spec: all polygon shapes must be closed. To better visualize this, imagine a pen moving to each coordinate in turn. To finish the loop it draws, the pen has to move back to the original coordinate.

The last thing to note is that the drawing direction of these coordinates is counterclockwise, as with most computer drawing systems. This means we put our pen on our grid at coordinate (4 1) and then draw up in a straight line to (4 4). Next we go left in a straight line to (1 4) and down in a straight line to (1 1). Finally, we close the loop by drawing a straight line right, back to the starting coordinate (4 1).
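If you have PostGIS at hand, you can feed this exact WKT to the database and inspect it. A minimal sketch, assuming a PostGIS-enabled PostgreSQL database:

```sql
-- Parse the WKT square and inspect it
SELECT ST_IsValid(g)  AS valid,    -- true: the ring is closed and does not self-intersect
       ST_Area(g)     AS area,     -- 9: a 3 by 3 square
       ST_NPoints(g)  AS npoints   -- 5: the closing point is counted too
FROM (SELECT ST_GeomFromText('POLYGON((4 1, 4 4, 1 4, 1 1, 4 1))') AS g) AS square;
```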

It is also perfectly possible to define more than one coordinate set when defining a polygon object. A definition like this is perfectly legal:

POLYGON ((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))

This creates a square polygon 7 units on a side, called the exterior ring, and another square inside it, 3 units on a side. Because this small square resides inside the area of the big square, we call it the interior ring and, as a result, the standard interprets this small square as a hole in the bigger square.

To take this even further, you can define as many holes inside your exterior ring as you like; you simply have to make sure that the interior rings never touch each other and never extend outside the exterior ring. The exterior ring is always derived from the first set of coordinates in your object definition.
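PostGIS reads the second coordinate set as a hole, which you can verify directly; a small sketch, again assuming a PostGIS-enabled database:

```sql
-- The interior ring is subtracted from the exterior area: 7*7 - 3*3 = 40
SELECT ST_Area(g)             AS area,   -- 40
       ST_NumInteriorRings(g) AS holes   -- 1
FROM (SELECT ST_GeomFromText(
        'POLYGON((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))') AS g) AS donut;
```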

The POLYGON object in the WKT standard also has a MULTIPOLYGON counterpart for when you wish to define a set of multiple, non-intersecting polygon objects, each of which, in turn, can have as many interior rings as you like.

Other objects we have available in the WKT standard are:

  • POINT(0 0) to represent a point on a grid
  • LINESTRING(0 0, 0 1) to represent a line. Note that a line can consist of more than two coordinates.

All of these also have a MULTIPOINT and a MULTILINESTRING variant respectively.
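These parse just like the polygon did; a hedged sketch, assuming a PostGIS-enabled database:

```sql
SELECT ST_GeometryType(ST_GeomFromText('POINT(0 0)'));           -- ST_Point
SELECT ST_Length(ST_GeomFromText('LINESTRING(0 0, 0 1, 3 1)'));  -- 4: two segments of 1 and 3
```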

As we have seen before, all of these objects are two dimensional, but PostGIS also partly supports a three and a four dimensional version of some of these objects.

Note that these extra dimensions are currently not in the specification; they are a PostGIS-specific extension on top of the features defined by the OGC. Furthermore, if the OGC decides to standardize three- or four-dimensional objects, PostGIS will have to adapt its syntax to stay compliant. We thus refer to this extended format not as WKT or WKB but as Extended WKT and Extended WKB, or simply EWKT and EWKB.

To make our polygon object three dimensional, we could write it down like so:

POLYGON ((4 1 2, 4 4 2, 1 4 2, 1 1 2, 4 1 2))

You can see that we now have three numbers per coordinate, the third one adds a Z or depth value.

A point can get even fancier. If we wish to place a point in three-dimensional space, we write it down the same way as we did with our polygon:

POINT (4 1 2)

The third parameter here being, again, a place on the Z axis.

But points can also have a fourth dimension, which sounds fancy but is nothing more than an extra reference we can ship with our coordinates. This reference, also called a linear reference, is a number that tells us where, along a linear path, the point we define resides.

It can be written down like this:

POINT (4 1 2 6)

Here we have four numbers, the last one being the linear reference or M.

With EWKT you also have the possibility to define a two dimensional object with a linear reference:

POINT M(4 1 6)

Here we again have three numbers, but to distinguish whether the last number is a Z or an M, we have to write the M together with our point declaration. There are more extensions defined in EWKT and EWKB, but that is slightly off-topic because, as I mentioned before, these are not standardized. In most use cases you can simply use the standard WKT and WKB forms.
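Reading the Z and M values back out is straightforward; a minimal sketch assuming PostGIS, whose ST_GeomFromEWKT, ST_Z and ST_M functions handle the extended format:

```sql
SELECT ST_Z(ST_GeomFromEWKT('POINT(4 1 2)'));    -- 2: the depth value
SELECT ST_M(ST_GeomFromEWKT('POINTM(4 1 6)'));   -- 6: the linear reference
SELECT ST_M(ST_GeomFromEWKT('POINT(4 1 2 6)'));  -- 6: four numbers are read as X Y Z M
```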

What to use these objects for?

You now know what kind of objects we can represent using text and what we can, later along the road, insert into our PostGIS enabled PostgreSQL database. But how do these points, lines and polygons help us measure distance or help us locate stuff?

First it is important to understand that all of the objects we have available will act as proxies to real world objects. Take, for example, the point. A point can be used on a map to indicate a place, a spot so to speak, without defining shape or size. When you wish to know where Tokyo is, a point will suffice on a global scale, you do not need nor want to know the exact shape of the metropolis.

However, if you zoomed in on our fictional map to see a part of the city the size of a few city blocks, you might be interested in the shapes of buildings, lakes, parks, etc. These items, which take up two-dimensional space, will be drawn with polygons that resemble the shape of the real world objects as closely as possible.

Lines (or linestrings), finally, will almost always be used to represent roads, railroads, metro systems, and so on. They often represent actual paths one could travel along.

Geometry and Geography

So you know that you can represent a place in the world with a simple point. And as you also know, a point is defined like this:

POINT (10 20)

This would create a point that sits at coordinate (10 20). But, what does that mean? How do these numbers relate to the real world? What is 10 or 20 anyway?

Well, first you will have to ask yourself the following question: Do I wish to be Cartesian or Geographical?

Cartesian or Geographical

As you may or may not remember from your boring math lessons, a Cartesian system is a two-dimensional flat grid with an X and a Y axis. These axes extend in both the positive and negative directions, with the origin sitting exactly in the middle of the flat plane.

When working with GIS objects, we refer to this flat, Cartesian grid system as geometry. When, however, we are working with measurements or objects related to the real earth, we refer to these, in PostGIS, as geography.

Why? What is the difference? Well, to understand this, we have to take a step back, a step back in time that is.

When the first maps of the world were crafted, people truly believed the earth was flat (which it is not, for your information). This meant that all charts were drawn assuming we could simply place a grid comprised of an X (length) and Y (height) axis across the drawing and, from there, measure distances between points. If you wish to know the distance between Paris and London, simply place two points on your map, take your straight ruler and measure the indicated distance. Then factor in the chart's scale and you have your distance. You use geometry, or geometric measurements.

However, once scholars convinced the world that the earth was not flat, the chart-drawing people gasped for air. This meant their measuring technique was not correct. If the earth really was a sphere, then one could not simply wrap a grid around it and act as if everything was linear. A sphere meant that a certain amount of distortion was happening in their overlaid grid, and measurements had to account for those differences.

Even later in time, the chart-drawing folk, who had barely recovered from their first shock, were zapped again when people started to realize the earth was not a perfect sphere either. The globe turned out to be more of an egg shape, which, again, meant that measurement techniques had to be adjusted.

This was the birth of the geographical measurement system, in which cartographers devised a model called the spheroid. A spheroid is a three-dimensional object on which we can most accurately place points and measure real earth distances. Each point on such a spheroid is defined by a latitude and a longitude:

  • Latitude is the angle, measured from the center of the earth (the hot place), going up or down from the equator towards the surface
  • Longitude is the angle, measured from that same hot center, going left or right from the prime meridian towards the surface

Because both latitude and longitude represent an angle, we express them in degrees, and we simply call them geographical coordinates.

Now, it is not quite convenient to carry around a three-dimensional spheroid to find out where you are or to measure a distance. A classic old paper map is still easier to bring along and easier to work with. But how do we go from a spheroid, which has the correct distortion, back to our old, flat, two-dimensional geometrical map?

With projection or map projection to be more precise. We need to project the three dimensional spheroid system onto our two dimensional map. This projecting is roughly done in three steps.

First we have to decide whether to take the spheroid as our base, or a simpler sphere. A simpler sphere will yield less accurate results because it does not quite represent the correct curvature of the earth, but it keeps the math behind the calculations simpler and thus can make for faster calculations. When choosing which shape we want, we will also have to define which datum we would like.

After choosing the base object and the datum that represents it, we have to transform the geographic system coordinates (latitude and longitude) to more standard X and Y coordinates to be used on a simple, flat, Cartesian plane.

The last part is to find out to what ratio the final two dimensional surface is scaled compared to the original, base object (which represents the earth).


Before continuing, a word about datums.

As we said before, people agreed that the earth has a spheroid shape and that this model represents the earth most accurately. We say "model" because the spheroid is something that is actually defined with math.

The math behind the spheroid model is what we call the datum. It is nothing more than a mathematical formula describing the shape.

Something we did not yet see is the fact that there are actually many types of spheroids out there, each serving its own purpose and each with its own math, a.k.a. its own datum. Some spheroids are better for measurements on a global scale; others are better at a more local, "zoomed-in" level (continent, country, ...).

The reason we have to say which datum (and thus which shape) our spheroid has is that while latitude and longitude always represent degrees, those degrees can mean different things depending on the chosen datum. If you use a datum that draws the spheroid a little bit "elongated", so to speak, then 1 degree of longitude will cover slightly more distance than if the datum draws a more compact spheroid.
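PostGIS lets you see the datum at work. The sketch below assumes PostGIS 2.x (where ST_Distance_Sphere and ST_Distance_Spheroid are the function names) and measures the same two points once on a plain sphere and once on the WGS 84 spheroid:

```sql
-- Paris to London, on a simple sphere versus on the WGS 84 spheroid
SELECT ST_Distance_Sphere(
         ST_MakePoint(2.35, 48.85), ST_MakePoint(-0.12, 51.50)) AS sphere_meters,
       ST_Distance_Spheroid(
         ST_MakePoint(2.35, 48.85), ST_MakePoint(-0.12, 51.50),
         'SPHEROID["WGS 84",6378137,298.257223563]')            AS spheroid_meters;
-- The two numbers differ slightly: same degrees, different model of the earth.
```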

We will see more about datums in the next chapter, but it is an important part of GIS.

Types of Projections

Something that might not be obvious right now is the fact that going from our three-dimensional globe to a flat surface is a process of choices. In an ideal world you would keep every aspect of your spheroid intact, meaning the proportions of the objects on the map are accurate everywhere, the shapes of these objects are correct, the area covered by the objects is true, and the distance between these objects is retained. However, as it turns out, this is impossible on a two-dimensional surface. You have to give up some of these properties to preserve others.

Throughout history there have been many attempts at creating projections that would keep as much of these aspects intact.

Mercator projection

As a Belgian I should be most proud of this type of projection, since it was created by a fellow Fleming around 450 years ago, and it is a projection that is still in use today. A map created with this type of projection gives us a comfortable and familiar view of the earth. A big advantage of this projection type is the fact that the shapes of all objects are accurate.

The Mercator projection is most accurate around the equator, but the further you travel up or down, the more the map goes out of proportion. Mercator used a cylindrical projection to unwrap the earth onto a flat plane. Because of the nature of such a cylindrical projection, the areas closer to the poles become blown up to fit in a two-dimensional world.

This distortion has caused quite a few frowned foreheads in the last few decades, and as a result people tend to abandon this projection, especially for regions far from the equator.

Mercator variants

To make up for the heavy distortion found in the original Mercator system, people have made two new Mercator projections. The first to come about was called the Transverse Mercator, which fixes the distortion around the poles but, in turn, makes distance measuring incorrect.

To make up for this new problem, folks made yet another Mercator derivative: the Universal Transverse Mercator (UTM). This type of projection takes a whole new approach and uses its own coordinate system. It introduces the concept of UTM zones: the earth is divided into 60 zones, each 6 degrees of longitude wide. The map rendered within each zone uses the previous, Transverse Mercator projection to draw the actual map. A big advantage of this approach is that we get very consistent distance measurements all across the globe.

Such a UTM coordinate looks quite different from our classic latitude/longitude or our X/Y version. I will give you a random UTM coordinate:

54 N 384524.5 3948304.4

The first number identifies one of the 60 UTM zones. The letter tells us the latitude band in which to search this zone; these letters range from C to X (omitting I and O), with bands N through X covering the northern hemisphere. The first float tells us the easting, or X value; the last float tells us the northing, or Y value. Both of these floats represent actual meters.
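In PostGIS you would typically hop between latitude/longitude and UTM with ST_Transform. A sketch, assuming the standard EPSG definitions are loaded (EPSG:32654 is WGS 84 / UTM zone 54N):

```sql
-- Project a lon/lat point (SRID 4326) into UTM zone 54N (SRID 32654)
SELECT ST_AsText(
         ST_Transform(
           ST_SetSRID(ST_MakePoint(139.7, 35.7), 4326),  -- somewhere near Tokyo
           32654));
-- The resulting POINT holds the easting and northing, in meters.
```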

Another important note about UTM is that it also acts as a framework for more localized UTM versions. This means that each country or region can make its own maps, using smaller UTM zones to accurately represent its land, cities, forests, etc.

Lambert Azimuthal

This projection (also called the Lambert Equal-Area) is yet another approach as it uses a disc to map our spheroid to a flat surface.

The big advantage of this type of projection is that it represents the area of objects very accurately, and directions measured from the center of the disc are true. However, it fails when it comes to accurate shape representation: shapes get more and more distorted once you start moving away from the center of the disc.

The Lambert projection is one of the more accepted projections, right after the UTM.

Plate Carrée

And then you have Plate Carrée.

This is one of the oldest projections out there and was invented around 1800 years ago. In our little history story above, this projection came about when people thought the earth was rather flat.

It combines almost all disadvantages of previous projections:

  • It does a terrible job in representing correct area
  • It does not care about the shape of objects
  • Distance measuring is way off

Despite the fact that this projection turns out to be so terrible, it is still quite commonly used today. Well, not for navigation or distance calculation, obviously, but for illustrative purposes.

Many organizations across the globe use this simple projection to demonstrate statistical data, overlaid on this map. Demographics, political info, zombie outbreak danger zones, ... .

As we also saw in our history lesson, the first charts used the Cartesian system quite literally and without much conversion, because the earth was flat anyway. So in GIS systems, this means that this projection maps latitude and longitude directly to a X and Y coordinate without much conversion.

Because the conversion math is simple and calculations are few, this projection is among the fastest, but as you know now, at great cost.

Other variants

There are numerous other variants out there that all have their advantages or disadvantages. To name a few more:

  • The Robinson projection displays the earth not as a flat rectangle but as a pseudocylindrical shape with gently curved sides. It shows the world as a whole more pleasingly, but it fails when it comes to representing area and shape, especially near the poles.
  • The Winkel Tripel projection is another popular projection type which has many parallels with the Robinson one, but with less distortion.
  • The Peirce quincuncial projection uses a technique to unwrap the earth spheroid into a square, much like you would peel an orange. These maps are not used much, for they are very heavy in calculations, but the technique is now widely used to present a spherical image, unwrapped into a square.
  • The Goode homolosine projection was developed as a teaching instrument, a frustrated answer to the heavily distorted Mercator projection. It is famous for its quite unique shape, in which the spheroid is unwrapped into a beast with four "legs".
  • ...

What do projections mean to us?

There are many more types of projections, but going through them all would bore you to tears.

The reason I keep going on about projections, geometry and geography is that later on, when working with PostGIS, you will need to decide how you wish to combine all of these.

First it is very important to understand that geometry and geography are two different data types which PostGIS can store into PostgreSQL.

PostGIS is quite unique in the fact that it gives you the ability to work directly with our three dimensional spheroid (geography) and ignore the projections and their Cartesian Flat Land (geometry). You will have the power to work with latitude and longitude and perform real world calculations, right out of the box. This way of working, however, comes with a few trade-offs.

The first, and most obvious one: real, three-dimensional spheroid geographical calculations cost more computing time than their simpler, two-dimensional geometry counterparts. Another disadvantage of geography over geometry is that PostGIS simply has far fewer native functions ready for you to use.

So depending on your use case, it might be a good idea to convert all your geographical data into geometrical data. This, however, requires knowledge of the projections we just saw, for different projections will yield different results.

If you have two points with a latitude and longitude coordinate (thus being geographical data) and wish to know the distance between them using geometrical functions, you have to project these points on a flat surface thus converting them into a Cartesian system (the whole projection story we saw so far).

As we will see in the next chapter, if you simply convert geography into geometry, PostGIS will treat the coordinates as a Plate Carrée projection, which may not be very desirable when you wish to calculate distances, as we will be doing later on. We can tell PostGIS to use a different projection when converting, but every projection comes with its merits and demerits.
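You can see the Plate Carrée effect for yourself; a small sketch assuming a PostGIS-enabled database:

```sql
-- Geometry treats lon/lat as plain X/Y, so the answer comes back in degrees
SELECT ST_Distance('POINT(0 0)'::geometry, 'POINT(1 0)'::geometry);   -- 1

-- Geography computes on the spheroid, so the answer comes back in meters
SELECT ST_Distance('POINT(0 0)'::geography, 'POINT(1 0)'::geography); -- roughly 111 km
```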

You simply cannot do serious GIS work if you do not have at least a basic understanding of what is going on when projecting geography. By reading through this chapter, I hope I have given you enough food-for-thought to go out and explore a bit more about these different projections.

What is next?

Okay, I think we have covered enough for today. I do apologize for the rather theoretical nature of this first chapter, but believe me, you will need the knowledge.

Next time we will finally be looking at some actual PostGIS work and put some of this theory into practice:

  • We will see how to GIS enable your PostgreSQL database
  • We will look at how we can store geometry and geography
  • We will actually put some points on the earth, draw some lines between them and perform some fun calculations
  • We will take a look at how different projections will yield different results
  • And finally, we will answer the question that started it all: How far is Tokyo from your current location?

And as always...thanks for reading!

by Tim van der Linden at June 12, 2014 10:00 AM

June 11, 2014


PostgreSQL Session 6: PostGIS

After the success of the first five PostgreSQL Sessions, we are pleased to announce the sixth session, to be held on September 25th in Paris.

To keep up the momentum, Dalibo and Oslandia are jointly organizing a free day of talks dedicated to PostgreSQL and PostGIS: http://www.postgresql-sessions.org/

Preview of a 3D PostGIS data visualization

We are therefore issuing a call for speakers: you can send us your talk proposals, in English or French. Each talk should last between 30 and 45 minutes (including any questions) and relate to PostgreSQL and PostGIS. We are particularly interested in the following topics (non-exhaustive list):

  • What's new in PostgreSQL and PostGIS
  • Monitoring large installations
  • Migrating to PostgreSQL and PostGIS
  • Managing large data volumes
  • Advanced spatial analysis with PostgreSQL/PostGIS

Talks may take several forms: user testimonial, proof of concept, tutorial, comparison, presentation of new features, etc.

All proposals must reach us before June 20, 2014.

Please send your proposals to call-for-paper@postgresql-sessions.org, including the following information:

  • First and last name
  • Twitter account (if applicable)
  • Company
  • Biography and contributions to the PostgreSQL and/or PostGIS community
  • Talk title
  • Abstract
  • Any specific requests

Presentation slides must be provided to Dalibo under a Creative Commons BY-ND 3.0 or compatible license.

We look forward to your proposals, and will see you in Paris at the end of September!

by Vincent Picavet at June 11, 2014 05:00 PM

May 31, 2014

Postgres OnLine Journal (Leo Hsu, Regina Obe)

psql watch for batch processing

A while back, we discussed using pgAdmin's pgScript as a quick way to run a repetitive update script where you want each loop to commit right away. Since stored functions commit as a whole, you can't use stored functions alone for this kind of processing.

Question: Can you do similar easily with psql?
Answer: yes with the \watch command described nicely by Michael Paquier a while back.

If you are using the psql client packaged with PostgreSQL 9.3 or above, then you can take advantage of the \watch command that was introduced in that version of psql. We'll demonstrate that by doing a batch geocoding exercise with PostGIS tiger geocoder and also revise our example from the prior article to use the more efficient and terser LATERAL construct introduced in PostgreSQL 9.3.
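The core of the trick looks something like the sketch below; note that the table and column names here are hypothetical, and that \watch is a psql meta-command rather than SQL:

```sql
-- Inside psql: process a small batch, then let \watch re-run (and thus
-- re-commit) the same statement every 10 seconds.
UPDATE addresses                        -- hypothetical table
   SET geocoded = true                  -- hypothetical column
 WHERE id IN (SELECT id FROM addresses WHERE NOT geocoded LIMIT 50);
\watch 10
```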

Continue reading "psql watch for batch processing"

by Leo Hsu and Regina Obe (nospam@example.com) at May 31, 2014 03:24 AM

May 28, 2014

OpenShift by RedHat

ER-Mapping with OpenShift, Getting Started with pgModeler

[Screenshots: example DB design; pgModeler home screen; pgModeler connection screen; using the connection in the import dialog]

Greetings Shifters!

I am an unabashed PostgreSQL fanboi. I am also someone who likes to design databases visually (ER Diagrams) since I can't visualize anything beyond two tables and their relationships without pictures.

I want tools capable of round-tripping with the database, meaning that they can generate tables (or SQL) for the database, or generate a diagram from a database (or SQL) as the input. Finally, tools must run on Fedora Linux since that's my operating system of choice. Bonus points awarded if the tool is FOSS.

During some unspeakably dismal times, when Windows was my main operating system, I had a MSDN subscription and Visio (plus some hand manipulation to make it work with PostgreSQL). Then I experienced the frustration of Dia for diagramming.


by scitronpousty at May 28, 2014 03:48 PM

May 22, 2014

Stephen Mather

The easiest way to get PostGIS and friends running:

Docker. See https://github.com/vpicavet/docker-pggis for a quick and easy Docker image running PostGIS. Understand, this isn't production ready (the connection permissions make my teeth hurt a little), but it is so nice to have a one-stop shop for PostGIS, pgRouting, and PointCloud. Now I may have to write blog posts on PointCloud. PointCloud didn't make it into the PostGIS Cookbook because it was too hard to build. I built it. I played with its wonders. Then I decided it was too much too early. Well, not any more.

CREATE EXTENSION postgis;            -- prerequisite for the extensions below
CREATE EXTENSION postgis_topology;
CREATE EXTENSION pointcloud;         -- prerequisite for pointcloud_postgis
CREATE EXTENSION pointcloud_postgis;

Screen shot of pgAdmin with CREATE EXTENSIONS for all the fun postgresql extensions... .

Oh, one more thing: pro-tip. You can modify the Dockerfile into a shell script for installing all this goodness on your local Ubuntu/Debian machine as well.

by smathermather at May 22, 2014 01:56 AM

May 20, 2014


Full Spatial database power in 2 lines

This post explains how to setup a powerful spatial data store with a wide range of features on Ubuntu 14.04 (Trusty) in 2 command lines. And how it works.


Run this on Ubuntu 14.04 Trusty Tahr, and enjoy:

sudo apt install docker.io
sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

Database credentials: pggis/pggis/pggis for database/user/password


More info

You want to store and manipulate spatial data. Lots of spatial data. Various spatial data. This Docker image lets you install the latest components based on the most powerful opensource database (PostgreSQL, in case you wonder) and benefit from a wide range of features.

Docker is a Linux container based technology. Think of containers as something between the good old chroot and a virtual machine: fast, efficient and painless. Think of Docker itself as a mix between a meta-package manager, git and a whale.

Docker is available natively on Ubuntu 14.04 (and named docker.io), and can also run on other Linux flavors, or in a minimalist Virtual Machine with Boot2docker for other environments (Windows, MacOSX).

This image focuses on installing and running the following components:

  • PostgreSQL 9.3
  • PostGIS 2.1.3 with SFCGAL support
  • PgRouting
  • PostgreSQL PointCloud
  • PDAL

PostgreSQL is the advanced database providing a lot of features for data store and management. Version 9.3 is the latest stable version (9.4 is just around the corner), providing support for streaming replication, JSON, CTE, window queries and much, much more.

PostGIS is a PostgreSQL extension, enabling spatial support. It features new data types, such as 2D (and 2.5D) vector data, be it projected or in lat/lon. It also offers raster data support, to store gridded image data, and topological vector data storage.

SFCGAL support for PostGIS adds 3D features to the database. You can store 3D meshes, TIN, and operate on those objects.

PgRouting is a PostgreSQL/PostGIS extension allowing you to perform routing in the database on topological spatial data. It can be used to dynamically compute shortest paths, driving distances and more.

PostgreSQL PointCloud is a PostgreSQL/PostGIS extension which lets you deal with huge amounts of points, e.g. LIDAR data. PDAL is a library and set of tools allowing you to load and extract data from and to PointCloud (among other things).

All of this makes the best platform for spatial data crushing. Loads of spatial data.

Using it

Below is a longer full explanation to install this image and use it.

Install Docker

You first need to install docker tools on your Ubuntu system. You can install Docker on a lot of recent Linux flavors or in a Linux Virtual Machine ( with Boot2docker for example).

sudo apt install docker.io

Get the image and run the container

The following commands use docker to download the image from the docker registry (~1GB) and then run it in the foreground. If you omit the pull command and try to run it directly, Docker will look for the image on the Docker registry if not already present locally.

sudo docker.io pull oslandia/pggis
sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

Your container is initiated from the oslandia/pggis image. It is named pggis_test, runs in the foreground (default), will redirect all exposed ports to a host port ( -P ), and will delete all created filesystem and container image when the container exits ( --rm ). It will run the /sbin/my_init startup script of baseimage to launch all necessary processes inside the container.

Now PostgreSQL runs in your container with all extensions activated, and a new database created just for you.

Connect to the database

Assuming you have the postgresql-client installed on your host, you can use the host-mapped port to connect to the database. You need to use docker.io ps to find out what local host port the container is mapped to first:

$ sudo docker.io ps
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS              PORTS                     NAMES
75fec271dc5e        oslandia/pggis:latest   /sbin/my_init       51 seconds ago      Up 50 seconds>5432/tcp   pggis_test

You can see that the container's port 5432 has been mapped to the host's port 49154 in this case (yours may differ). We can now connect to the database server through this host port. A pggis user has been created (with pggis as password) along with a corresponding database with the extensions activated.

$ psql -h localhost -p 49154 -d pggis -U pggis --password

Enjoy GIS features

You are now ready to use the database. Inside the psql console, you can check that everything is in order with queries like below.

pggis=# SELECT postgis_version();
(1 row)

pggis=# SELECT st_astext(st_makepoint(random() * 100, random() * 100)) from generate_series(1, 10);
 POINT(90.1295741088688 97.7623191196471)
 POINT(79.2798819020391 0.146342813968658)
 POINT(56.9643571972847 50.8249804843217)
 POINT(14.1903728246689 26.436754828319)
 POINT(46.5733857825398 39.7641783114523)
 POINT(89.1805129591376 36.6665146779269)
 POINT(17.3397121019661 20.6000158563256)
 POINT(11.5902675781399 93.6640297528356)
 POINT(96.0820932406932 53.891480108723)
 POINT(88.9889035373926 15.435611223802)
(10 rows)
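You can also list everything that was activated in the container's database; a small sketch using the standard pg_extension catalog:

```sql
-- Show each installed extension and its version
SELECT extname, extversion FROM pg_extension ORDER BY extname;
```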

Describing all features of installed components would require a lot more posts, coming later.

If you are interested in learning more, Oslandia can provide training, assistance and support.


Do not use this in production !

The PostgreSQL configuration in this image opens the database connection without any IP restriction for the pggis super-user. The default pggis password is very weak too. This should NOT BE USED IN PRODUCTION, and you should adapt the configuration to your specific setup and security level. See below to rebuild the image with different configuration.

In a production environment, you would also want to run the container in the background ( -d option) and keep the container filesystem and images after running ( no --rm option ).

How it works

This image is a Docker image, which uses Linux Containers (LXC). The base image is Phusion baseimage, which is itself an Ubuntu 14.04 (Trusty). It is built using a Dockerfile (see the source on GitHub). This Dockerfile is used by Docker to build the pggis image, processing all the installation, download, compilation and setup of the various components. The result is a Docker image, which is then published on docker.io.

You can either directly download and use the image from docker.io as explained above, or rebuild it from the Dockerfile in the git repository like below.

Rebuilding the image

First install Docker on your machine as already stated. Native installation on Ubuntu 14.04 :

sudo apt-get install docker.io

Then clone the GitHub repository to get the Dockerfile :

git clone https://github.com/vpicavet/docker-pggis.git
cd docker-pggis

Then you can build the image, using the Dockerfile in the current repository.

sudo docker.io build -t oslandia/pggis .

Note that building the image requires downloading quite a lot from the internet, as well as enough RAM and CPU. Compilations are achieved along the way with make -j3, enabling parallel compilation. You should adapt this value according to the number of CPUs you have.

Once you have built the image, you can run a container based on it just like above.

sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

If you want to change how the container works, or the default setup, you can edit the Dockerfile and rebuild the image. Docker caches every step of the build process (after each RUN command), so rebuilding can be very fast.
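For illustration, a hypothetical Dockerfile excerpt (not the actual oslandia/pggis one) sketches why the cache helps: each RUN command produces a cached layer, so editing a later step only rebuilds from that point on.

```dockerfile
# Hypothetical excerpt for illustration only -- not the actual
# oslandia/pggis Dockerfile. Each RUN below becomes a cached layer.
FROM phusion/baseimage
# Layer 1: system packages (cached after the first build)
RUN apt-get update && apt-get install -y build-essential git
# Layer 2: fetch sources (cached as long as this line is unchanged)
RUN git clone https://github.com/postgis/postgis.git /src/postgis
# Layer 3: compile; editing only this step reuses layers 1 and 2
RUN cd /src/postgis && ./autogen.sh && ./configure && make -j3 && make install
```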

If you have a docker.io account, you can upload the image to the image registry like this:

sudo docker.io login
sudo docker.io push oslandia/pggis


This is a first dive into Docker, with a spatial GIS database container. There is a lot more you can do with Docker, and plenty of use cases for this spatial database setup. This Docker image, the setup and the way it is built can be improved a lot as well, and will certainly change along with package availability, ppa repositories, next versions of components, better Docker practices and versions, but it is already a good and efficient way to get the stack up quickly. Do not hesitate to fork the GitHub repository and send Pull Requests.

You can also report any issue on GitHub .

If you want to go further, or if you have any remark concerning the tools covered in this blog post, do not hesitate to contact us: infos+pggis@oslandia.com

by Vincent Picavet at May 20, 2014 07:00 AM

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostgreSQL 9.4beta1 and PostGIS 2.2.0 dev on Windows

PostgreSQL 9.4beta1 was released last week and Windows binaries for both 32-bit and 64-bit are already available to try it out from http://www.postgresql.org/download/windows. Since this is a beta release, there are no installers yet, just the zip binary archive. To make the pot a little sweeter, we've set up the PostGIS Windows build bot (Winnie) to automatically build for 9.4 - the PostGIS 2.2.0 development branch and pgRouting 2 branches whenever there is a change in the code. We also have the pointcloud extension in the extras folder. If you are on 9.3, we've got 2.2 binaries for that as well. The PostGIS/pgRouting related stuff you can find at http://postgis.net/windows_downloads in the 9.4 folder.

For the rest of this article we'll discuss a couple of stumbling blocks you may run into.

Much of what we'll describe here is Windows-specific, but thanks to the beauty of extensions and GUCs, the extension install and GUC setting part for PostGIS is applicable to all operating systems.

Continue reading "PostgreSQL 9.4beta1 and PostGIS 2.2.0 dev on Windows"

by Leo Hsu and Regina Obe (nospam@example.com) at May 20, 2014 01:39 AM

May 19, 2014

PostGIS Development

Security releases 2.0.6 and 2.1.3

It has come to our attention that the PostGIS Raster support may give more privileges to users than an administrator is willing to grant. These include reading files from the filesystem and opening connections to network hosts.

Continue Reading by clicking title hyperlink ..

by Sandro Santilli at May 19, 2014 12:00 AM

May 14, 2014

Pierre Racine

A guide to the rasterization of vector coverages in PostGIS

A frequent question since the beginning of the PostGIS Raster adventure has been "How can I convert my geometry table to a raster coverage?" (here also). People often describe the way to rasterize a single geometry using ST_AsRaster(), but I don't know of any text or tutorial explaining how to convert a complete vector coverage to a single raster or to a tiled raster coverage.

In this article I will describe two methods to rasterize a vector coverage: one very fast but not very flexible and one very flexible but unfortunately a bit slow! I will rasterize a vector forest cover so that it aligns perfectly with an already loaded raster elevation layer. I will use the "height" column of the forest cover to assign values to pixels. I will assume that both layers are in the same SRID and that the forest only partially covers the elevation layer. The elevation layer is divided in 625 tiles of 100x100 pixels for a total of 6 250 000 pixels.

"Intelligent" vector vs. "stupid" raster!

Why would one go from a human parsed, precise, object oriented, "intelligent" vector table to an unparsed, jagged, single variable, "stupid" raster layer? Over the years I could identify only two good reasons to do so. The first is to get all your data layers into a single analysis paradigm (in this case the raster one) to simplify the geoprocessing chain. The second is because this "stupid" raster paradigm is still the fastest when it's time to analyse large volumes of data. Processing a raster stack relies on very simple matrix operations, generally available through what geo people call map algebra operations. This is still generally faster than the linear algebra necessary to analyse vector coverages. Anyway, our task here is not to decide whether doing it in vector mode is better than in raster mode. Our task is just to get the conversion done properly…

"Rasterizing" vs "rendering"

Another interesting matter is the difference between rasterizing and rendering. Rendering is the conversion of a vector model to an image for display purposes. Rasterizing is the conversion of a vector model to a raster for data analysis purposes. Renderers, like MapServer, GeoServer and Mapnik, besides blending many layers together, symbolizing geometries and labeling them, will generally produce anti-aliasing along the edges of polygons and translate values into symbolic colors. Any original data value is lost and there is no way to use the product of a renderer for GIS analysis. On the other hand, a rasterizer function, like the ones provided by GDAL, QGIS, ArcGIS or PostGIS, will produce raw pixels trying to represent the original vector surface as crisply and accurately as possible, assigning to each pixel one of the values (and only those values) associated with the original geometries. The raster resulting from a rasterization process is a valuable input into a geoprocessing analysis chain. The result of a rendering process is not. We must also say that although there are some methods to rasterize a vector coverage in PostGIS, there is still no way to render one (unless you consider ST_AsJPEG(), ST_AsTIFF() and ST_AsPNG() as very basic renderers).

Method 1.1 – Basic ST_Union() and ST_Tile()

The first rasterization method I present is the one that was planned from the beginning of the PostGIS Raster project: simply convert all the geometries to small rasters and union them into a single raster, like we union many geometries together using ST_Union(geometry). A complete query, aligning the resulting raster on an existing elevation raster coverage, should look like this:

CREATE TABLE forestheight_rast AS
SELECT ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)) rast
FROM forestcover, (SELECT rast FROM elevation LIMIT 1) rast;

Close-up on the unioned rasterized version of the forest cover, resulting in "jagged" areas of the same height.

This results in a unique "big" raster, aggregating all the "small" rasterizations of the geometries.

Note the arguments to ST_AsRaster(). The first one is the geometry to rasterize. The second one is the reference raster from which the alignment parameters are borrowed (a pixel corner x and y, the scale (or pixel size) and the skew (or rotation)). The third argument is the pixel type of the expected raster (32 bit float). The fourth parameter is the value to assign to the pixels; here it is the 'height' column from the forest cover table. The last parameter ('-9999') is the nodata value to assign to pixels padding the area surrounding each rasterized geometry. This is the very first variant of the ST_AsRaster() function in the PostGIS Raster reference.

Like its geometric counterpart, ST_Union() simply aggregates (merges) all those small rasters together into a unique raster. Without ST_Union(), the query would have resulted in 2755 small rasters, one for each geometry.

Note also that we picked only one raster from the elevation layer as the reference raster to ST_AsRaster(). This is because all the rasters in this table are well aligned together and we need only one set of pixel corner coordinates to align everything. We could also have put this query selecting a single raster directly into the SELECT part:

CREATE TABLE forestheight_rast AS
SELECT ST_Union(ST_AsRaster(geom, (SELECT rast FROM elevation LIMIT 1), '32BF', height, -9999)) rast
FROM forestcover;

We normally don't want to keep big rasters in PostGIS, so we can split the merged raster into smaller tiles having the same dimensions as the elevation coverage with the ST_Tile() function:

CREATE TABLE tiled_forestheight_rast AS
SELECT ST_Tile(rast, 100, 100) rast
FROM forestheight_rast;

The tiling of the rasterized forest cover, in red, is not well
aligned with the tiling of the elevation raster, in grey.
Note that the squares are 100x100 pixels tiles, not pixels...
This results in 48 tiles, most being 100x100 pixels. Some are only 18x100 pixels and some are only 100x46 pixels. The lower right one is only 18x46 pixels...

There are 625 tiles of 100x100 pixels in the elevation coverage and 2755 geometries in the forest cover. These geometries cover only 31 of those tiles and the resulting big raster covers 56 of them. The query for creating the "big" raster takes 5 seconds (thanks to Bborie Park, who sped up ST_Union() by a huge factor in PostGIS 2.1) and the query to resample/retile it takes only 2 seconds.

One problem with this result is that even though the pixels are well aligned with the elevation layer and most of the tiles have the same dimensions, the tiles themselves are not aligned with the elevation coverage and they do not cover the same area. Method 1.2 will fix that.

Method 1.2 – ST_Union() and ST_MapAlgebra()

In order to produce a raster coverage having a footprint identical to the elevation coverage, we must use this coverage not only for alignment but also as the base for our final tiles. The trick is to "burn" the big raster into empty tiles based on each of the elevation tiles. For that we will use a quite simple (!) two-raster ST_MapAlgebra() call:

CREATE TABLE welltiled_forestheight_rast AS
SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast
FROM forestheight_rast fc, elevation e;

The two-raster ST_MapAlgebra() function overlays the two rasters and computes an expression for each aligned pair of pixels. The result of this expression becomes the value assigned to each pixel of a new raster.

As the two rasters are not necessarily well aligned, the new raster extent can be equal to 1) the extent of the union of both rasters, 2) the extent of the intersecting area, 3) the extent of the first raster, or 4) the extent of the second raster. Here, no two rasters involved in the map algebra have the same extent. Since we want to "burn" the big raster into tiles having the same extent as the elevation tiles, which are second in the list of arguments, we have to specify that the new raster will have the same extent as the "SECOND" raster.
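The four extent options can be sketched with plain bounding boxes. This is a minimal illustration in Python, not PostGIS internals; the extents and names are hypothetical:

```python
# A minimal sketch (plain Python, not PostGIS internals) of the four
# ST_MapAlgebra extent options, with extents as (xmin, ymin, xmax, ymax).

def result_extent(a, b, mode):
    if mode == "UNION":
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
    if mode == "INTERSECTION":
        return (max(a[0], b[0]), max(a[1], b[1]),
                min(a[2], b[2]), min(a[3], b[3]))
    if mode == "FIRST":
        return a
    if mode == "SECOND":
        return b
    raise ValueError(mode)

big = (0, 0, 80, 60)      # hypothetical "big" forest raster extent
tile = (50, 40, 100, 90)  # hypothetical elevation tile extent
print(result_extent(big, tile, "SECOND"))  # (50, 40, 100, 90)
```

With "SECOND", every output raster inherits the elevation tile's extent, which is exactly what the query above relies on.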

The first argument is the "big" forest height raster. It is the only row of the forestheight_rast table.

The second argument is a new tile generated with ST_MakeEmptyRaster() and ST_AddBand() from each of the 625 elevation tiles. We could have used the original tiles directly, but since we are specifying the resulting extent to be "SECOND", and since the nodata value of the result from ST_MapAlgebra() is picked from the "FIRST" or the "SECOND" raster when those values are specified, the resulting raster would have had 32767 as its nodata value. This is the nodata value of the elevation raster coverage. We want the nodata value of the result to be the same as that of the forest height coverage, which is -9999. So the trick is to "build" a new tile from scratch with ST_MakeEmptyRaster(), using the original elevation tiles as alignment reference, and add to it a band with ST_AddBand(), specifying a pixel type ('32BF'), an initial value ('-9999') and a nodata value ('-9999') identical to the forest height raster.

The third argument is the expression computing the value to assign to each pixel. Pixel values from the first raster are referred to as "[rast1]" and pixel values from the second raster are referred to as "[rast2]". You can also reference the x coordinate of the pixel with "[rast.x]" and the y coordinate with "[rast.y]". The expression is any normal PostgreSQL expression, for example "([rast1] + [rast2]) / 2".

In our case the expression is simply the value of the first raster, the forest height one, that we want to "burn" into the tile. It does not involve values from the elevation raster. In other words, the elevation tiles are merely used as well aligned blank pieces of paper on which we "print" the value of the forest height raster...

The fourth argument is the pixel type ('32BF') expected for the resulting raster. It is the same as the forest height raster.

The last argument ('SECOND') tells ST_MapAlgebra() to use the extent of the second raster (the elevation tiles) to build the resulting raster.

ST_MapAlgebra() will hence copy each value from the forest height raster to new empty tiles whose dimensions are based on the elevation ones. The query results in 625 tiles. When a tile does not intersect the area covered by the forest cover, all its pixels are simply set to nodata.

If your big raster does not cover much of the reference raster coverage, you might speed up the process by adding a spatial index to the reference coverage (it should already be there) and restricting the ST_MapAlgebra() computation to tiles intersecting with the vector coverage:

CREATE INDEX elevation_rast_gist ON elevation USING gist (st_convexhull(rast));

CREATE TABLE welltiled_forestcover_rast AS
SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast
FROM forestcover_rast fc, elevation e
WHERE ST_Intersects(fc.rast, e.rast);

Keep in mind however that this query is faster because it does not process, and hence does not return, tiles not touching the forest cover area. If you want to produce a complete set of tiles, identical in number and extent to the elevation coverage, stick with the previous query.

Some drawbacks to the methods using ST_AsRaster() and ST_Union()

All those queries involving ST_AsRaster() and ST_Union() are quite fast but they present two drawbacks:
  • ST_Union limited by RAM - Any query involving ST_Union() is limited by the RAM available to PostgreSQL. If the big raster resulting from the union of all the rasterized geometries is bigger than the available RAM, the query will fail. The trick to avoid this is to aggregate the small rasters not into a huge unique raster, but into smaller groups of rasters intersecting with tiles from the reference raster. Method 1.3 will show how to do that.
  • Only one extractable metric - A second limitation is that ST_AsRaster() has only one method to determine whether the geometry value must be assigned to the intersecting pixel or not. When the geometry is a polygon, for example, ST_AsRaster() assigns the value of the geometry to the pixel if the center of the pixel is inside this polygon, whatever proportion of the pixel area this polygon covers. In the worst case, the value of a very small polygon encircling the center of the pixel would be assigned even though all the rest of the pixel area is covered by a polygon with a different value. You can think of many cases where the value at the center of the pixel is not necessarily representative of the area covered by the pixel.

    This is because ST_AsRaster() rasterizes only one geometry at a time, without consideration for other geometries intersecting with the pixel. If you consider all the geometries of a layer when assigning a value, you might want to assign other metrics to the pixels, like the count of geometries intersecting with the pixel, the total length of all the polylines intersecting with the pixel, the value associated with the smallest polygon intersecting with the pixel, etc...

    Method 2 will show how to use the PostGIS Addons ST_ExtractToRaster() function to extract some of those metrics from a whole vector coverage and how to easily implement new ones.
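The pixel-center rule and its worst case can be illustrated with a toy sketch. Axis-aligned rectangles stand in for polygons here; all names, shapes and values are hypothetical, and this is plain Python, not PostGIS code:

```python
# Toy sketch: pixel-center rule vs an area-based rule (like
# VALUE_OF_MERGED_BIGGEST). Rectangles are (xmin, ymin, xmax, ymax, value).

def overlap_area(pixel, rect):
    # Area of the axis-aligned intersection, 0 when they do not overlap.
    w = min(pixel[2], rect[2]) - max(pixel[0], rect[0])
    h = min(pixel[3], rect[3]) - max(pixel[1], rect[1])
    return max(w, 0) * max(h, 0)

def center_rule(pixel, rects):
    # ST_AsRaster-style: the value of whatever polygon contains the center.
    cx, cy = (pixel[0] + pixel[2]) / 2, (pixel[1] + pixel[3]) / 2
    for r in rects:
        if r[0] <= cx <= r[2] and r[1] <= cy <= r[3]:
            return r[4]
    return None

def biggest_rule(pixel, rects):
    # Area-based: the value of the polygon covering the largest share.
    return max(rects, key=lambda r: overlap_area(pixel, r))[4]

pixel = (0, 0, 10, 10)
rects = [
    (4, 4, 6, 6, 99),   # tiny polygon enclosing the pixel center (4% of area)
    (0, 0, 10, 4, 1),   # covers 40% of the pixel, misses the center
]
print(center_rule(pixel, rects))   # 99: the tiny polygon wins at the center
print(biggest_rule(pixel, rects))  # 1: the larger cover wins by area
```

The tiny center polygon dictates the pixel value under the center rule even though another polygon covers ten times more of the pixel.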

    Method 1.3 – Memory efficient ST_Union() and ST_MapAlgebra()

    This method is RAM safe in that it never produces a "big" raster bigger than the area covered by one tile from the reference raster. It is a little bit slower than the first query of method 1.2 because it unions some rasters multiple times when they touch many tiles. It also produces 625 tiles, like the elevation coverage.

    CREATE TABLE ramsafe_welltiled_forestcover_rast AS
    WITH forestrast AS (
      SELECT rid, ST_MapAlgebra(
               ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)),
               ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999),
               '[rast1]', '32BF', 'SECOND') rast
      FROM forestcover, elevation
      WHERE ST_Intersects(geom, rast)
      GROUP BY rid, rast
    )
    SELECT a.rid,
           CASE WHEN b.rid IS NULL THEN ST_AddBand(ST_MakeEmptyRaster(a.rast), '32BF'::text, -9999, -9999)
                ELSE b.rast
           END rast
    FROM elevation a LEFT OUTER JOIN forestrast b
    ON a.rid = b.rid;

    The first query, in the WITH statement (forestrast), rasterizes the geometries, unions them by groups intersecting the reference raster tiles and burns each group into a new tile as seen before. This is done by the addition of a WHERE clause limiting the operation to geometries intersecting with tiles and a GROUP BY clause ensuring that ST_Union() only aggregates small rasters "belonging" to the same tile.

    The WHERE clause, however, causes the extent to be limited to the area covered by the forest cover. This first part alone would return only 32 tiles. The second part of the query makes sure we get the 593 remaining tiles. It uses a LEFT OUTER JOIN to make sure a row is returned for each of the 625 tiles from the reference raster coverage. When the rid of the reference raster coverage matches one of the 32 tiles, that tile is returned. Otherwise a new empty tile is returned. You might skip this part if your source vector layer covers the full area of the reference raster or if you wish to limit the rasterized area to that of the vector layer.
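The effect of the LEFT OUTER JOIN can be shown in miniature. This is a toy Python analogy with hypothetical rids, not the full SQL semantics:

```python
# Toy analogy for the LEFT OUTER JOIN above: every tile rid from the
# reference coverage yields a row, falling back to a fresh empty tile
# when no forest raster was produced for that rid (names hypothetical).

elevation_rids = [1, 2, 3, 4, 5]
forestrast = {2: "forest_tile_2", 4: "forest_tile_4"}  # tiles touching the forest

result = {rid: forestrast.get(rid, f"empty_tile_{rid}") for rid in elevation_rids}
print(result[2])  # forest_tile_2
print(result[3])  # empty_tile_3
```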

    Method 2 – PostGIS Add-ons ST_ExtractToRaster()

    As said before, ST_AsRaster() is quite limited in terms of the kind of value it can assign to the pixels. Not so many values can be extracted from a single polygon: the value at the centroid, the proportion of the pixel area covered, the value weighted by this proportion, the length of the intersecting part of a polyline being rasterized... ST_AsRaster(), however, because it is based on GDAL, only implements the first one.

    Furthermore, many more metrics can be imagined if the value assigned to each pixel comes not only from one geometry but from the aggregation of all the geometries (or the values associated with them) of a vector coverage intersecting the centroid or the surface of the pixel. We might, for example, want to compute the number of geometries intersecting with the pixel. We might be interested in the value of the polygon covering the biggest part of the pixel, or of the polyline having the longest length inside the pixel area. We might want to merge together geometries having the same value before computing the metric. I could identify nearly 76 different possible metrics extractable from point, polyline and polygon coverages.

    We hence need a function that is able to extract metrics from a vector coverage as a whole and which allows us to implement new extraction methods easily. The PostGIS Add-ons ST_ExtractToRaster() function was written with exactly this goal in mind.

    The PostGIS Add-ons is a set of about 15 pure PL/pgSQL, community-written functions helping to write advanced PostGIS queries. One of the most useful of these functions is ST_ExtractToRaster().

    A query using ST_ExtractToRaster(), returning the same set of tiles as method 1.3 and also not suffering from RAM limitations would look like this:

    CREATE INDEX forestcover_geom_gist ON forestcover USING gist (geom);

    CREATE TABLE extracttoraster_forestcover AS
    SELECT ST_ExtractToRaster(
             ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999),
             'public', 'forestcover', 'geom', 'height',
             'MEAN_OF_VALUES_AT_PIXEL_CENTROID') rast
    FROM elevation;

    First and very important remark: this query takes around 200 seconds! This is about 16 times more than the equivalent RAM safe query of method 1.3! This is because ST_ExtractToRaster() does much more than ST_AsRaster(): it not only considers the whole coverage, it can compute very different things from this coverage. It is definitely not worth using ST_ExtractToRaster() if all you want is the value of the geometry intersecting the pixel centroid. The interest of this function is the other methods available...

    Note the last argument to ST_ExtractToRaster(): 'MEAN_OF_VALUES_AT_PIXEL_CENTROID'. This is the method used to extract a value for the pixel. Fifteen (15) of these methods have been implemented in ST_ExtractToRaster() up to now. They are listed below with a short description.

    The 'MEAN_OF_VALUES_AT_PIXEL_CENTROID' method returns a raster coverage whose values are identical to the raster coverage produced with the previous methods. Most other methods, however, will return quite different rasters…

    The other arguments to ST_ExtractToRaster() are quite trivial: the first is the raster into which values are extracted. It's good practice to pass a new empty raster band having the right pixel type, initial value and nodata value, and whose alignment is based on tiles from the reference raster. This is identical to what we did in 1.2 and 1.3. The second argument is the schema of the vector table to rasterize ('public'). The third and the fourth are the name of the table ('forestcover') and the name of the column ('geom') containing the geometry to rasterize. The fifth argument is the column containing the value to be used in the computation of a new value when it is necessary ('height'). When the computation involves only geometries (e.g. COUNT_OF_VALUES_AT_PIXEL_CENTROID or SUM_OF_LENGTHS), this argument is not necessary.

    The query will return a new raster for each raster in the elevation table, in this case 625.

    Close-up on four pixels taking different values depending on the method used to assign them a value. The green and blue pixels were extracted using the MEAN_OF_VALUES_AT_PIXEL_CENTROID method. The pixels delimited by the red lines were extracted using the VALUE_OF_MERGED_BIGGEST method. The values displayed are the values associated with the polygons, which are delineated in black. Note how the pixels delineated by the red lines take the values of the polygons occupying the biggest proportion of the pixel.
    Here is a list of methods already implemented for ST_ExtractToRaster(). All methods using the pixel centroid are postfixed with "_AT_PIXEL_CENTROID". All other methods involve the pixel as a rectangular surface.
    • COUNT_OF_VALUES_AT_PIXEL_CENTROID: Number of geometries intersecting with the pixel centroid.
    • MEAN_OF_VALUES_AT_PIXEL_CENTROID: Average of all values intersecting with the pixel centroid.
    • COUNT_OF_POLYGONS: Number of polygons or multipolygons intersecting with the pixel surface.
    • COUNT_OF_LINESTRINGS: Number of linestrings or multilinestrings intersecting with the pixel surface.
    • COUNT_OF_POINTS: Number of points or multipoints intersecting with the pixel surface.
    • COUNT_OF_GEOMETRIES: Number of geometries (whatever they are) intersecting with the pixel surface.
    • VALUE_OF_BIGGEST: Value associated with the polygon covering the biggest area intersecting with the pixel surface.
    • VALUE_OF_MERGED_BIGGEST: Value associated with the polygon covering the biggest area intersecting with the pixel surface. Polygons with identical values are merged before computing the area of the surface.
    • VALUE_OF_MERGED_SMALLEST: Value associated with the polygon covering the smallest area in the pixel. Polygons with identical values are merged before computing the area of the surface.
    • MIN_AREA: Area of the geometry occupying the smallest portion of the pixel surface.
    • SUM_OF_AREAS: Sum of the areas of all polygons intersecting with the pixel surface.
    • SUM_OF_LENGTHS: Sum of the lengths of all linestrings intersecting with the pixel surface.
    • PROPORTION_OF_COVERED_AREA: Proportion, between 0.0 and 1.0, of the pixel surface covered by the union of all the polygons intersecting with the pixel surface.
    • AREA_WEIGHTED_MEAN_OF_VALUES: Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the maximum between the intersecting area of the geometry and the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the value is multiplied by the proportion of the covering area. For example, if a polygon covers only 25% of the pixel surface and there are no other polygons covering the surface, only 25% of the value associated with the covering polygon is assigned to the pixel. This is generally the preferred behavior.
    • AREA_WEIGHTED_MEAN_OF_VALUES_2: Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the full weighted value is assigned to the pixel. For example, even if a polygon covers only 25% of the pixel surface, 100% of its value is assigned to the pixel.
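The difference between the two area-weighted variants can be checked with a quick numeric sketch. Areas are expressed as fractions of the pixel, the covers and values are hypothetical, and the divisor for the first variant is interpreted as max(pixel area, total covered fraction):

```python
# Sketch of the two area-weighted mean variants for one pixel, with
# polygons given as (fraction_of_pixel_covered, value) pairs.

def area_weighted_mean(covers):
    weighted = sum(frac * val for frac, val in covers)
    total = sum(frac for frac, _ in covers)
    # Variant 1: divide by max(pixel area, total covered fraction), so an
    # uncovered remainder pulls the value down.
    v1 = weighted / max(1.0, total)
    # Variant 2: divide by the covered fraction only, so partial coverage
    # still yields the full weighted value.
    v2 = weighted / total
    return v1, v2

# A single polygon covering 25% of the pixel with value 10:
print(area_weighted_mean([(0.25, 10.0)]))  # (2.5, 10.0)
```

The first variant assigns 2.5 (25% of the value), matching the AREA_WEIGHTED_MEAN_OF_VALUES description; the second assigns the full 10, matching AREA_WEIGHTED_MEAN_OF_VALUES_2.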

    Adding methods to ST_ExtractToRaster()

    Want to compute some other esoteric metrics with ST_ExtractToRaster()? Internally the function uses the callback version of ST_MapAlgebra(), which calls one of two predefined callbacks depending on whether the value is computed for the pixel centroid or using the full surface of the pixel. Those functions are named ST_ExtractPixelCentroidValue4ma() and ST_ExtractPixelValue4ma(). They fulfill the criteria for ST_MapAlgebra() callback functions: they take three arguments defining the value of the input pixels (there can be many), the position of the pixel in the raster and some extra parameters when necessary. In order to build a query for each pixel, both functions expect extra parameters: the alignment parameters of the raster for which values are being computed and the parameters of the geometry column from which values are being extracted. The callbacks then define a series of extraction methods.

    To add a new method to these callbacks, you can simply copy the query associated with an existing method that extracts a value similar to what you want to extract, give it a proper name, modify it to compute the expected value and make sure it uses the callback extra arguments properly. It is a good idea to test the query directly on the vector coverage before testing it in one of the two callbacks. Most of the methods do an ST_Intersects() between the shape of the pixel and the vector coverage (this is why it is important that the vector coverage be spatially indexed) and extract some metrics from the list of intersecting geometries.

    If you add a new method which might be useful to others, let me know. I will add it to the PostGIS Add-ons code…


    Rasterizing a complete vector coverage can be done very quickly with ST_AsRaster() and ST_Union(), but some projects require more sophisticated methods that assign values to pixels considering the coverage as a whole. The PostGIS Add-ons ST_ExtractToRaster() is slow but provides much more control over the metric that is extracted from the vector coverage.

    In both cases it is possible to extract values following a preloaded reference tiled coverage. That should be the case for most applications. It is also easy to add more extraction methods to ST_ExtractToRaster().

    by Pierre Racine (noreply@blogger.com) at May 14, 2014 12:42 PM