### Planet PostGIS

Welcome to Planet PostGIS

November 20, 2023

PostGIS Development

PostGIS Patch Releases

The PostGIS development team is pleased to provide bug fix and performance enhancement releases 3.4.1, 3.3.5, 3.2.6, 3.1.10, and 3.0.10 for the 3.4, 3.3, 3.2, 3.1, and 3.0 stable branches.

by Regina Obe at November 20, 2023 12:00 AM

September 11, 2023

Crunchy Data

Random Geometry Generation with PostGIS

A user on the postgis-users mailing list had an interesting question today: how to generate a geometry column in PostGIS with random points, linestrings, or polygons?

Random data is important for validating processing chains, analyses and reports. The best way to test a process is to feed it inputs!

Random Points

Random points are pretty easy -- define an area of interest and then use the PostgreSQL random() function to create the X and Y values in that area.

CREATE TABLE random_points AS
  WITH bounds AS (
    SELECT 0 AS origin_x,
           0 AS origin_y,
           80 AS width,
           80 AS height
  )
  SELECT ST_Point(width  * (random() - 0.5) + origin_x,
                  height * (random() - 0.5) + origin_y,
                  4326)::Geometry(Point, 4326) AS geom,
         id
    FROM bounds,
         generate_series(0, 100) AS id

random points

Filling a target shape with random points is a common use case, and there's a special function just for that, ST_GeneratePoints(). Here we generate points inside a circle created with ST_Buffer().

CREATE TABLE random_points AS
  SELECT ST_GeneratePoints(
  	ST_Buffer(
  		ST_Point(0, 0, 4326),
  		50),
  	100) AS geom

If you have PostgreSQL 16, you can use the brand new random_normal() function to generate coordinates with a central tendency.

CREATE TABLE random_normal_points AS
  WITH bounds AS (
    SELECT 0 AS origin_x,
           0 AS origin_y,
           80 AS width,
           80 AS height
  )
  SELECT ST_Point(random_normal(origin_x, width/4),
                  random_normal(origin_y, height/4),
                  4326)::Geometry(Point, 4326) AS geom,
         id
    FROM bounds,
         generate_series(0, 100) AS id

random normal points

For PostgreSQL versions before 16, here is a user-defined version of random_normal().
CREATE OR REPLACE FUNCTION random_normal(
  mean double precision DEFAULT 0.0,
  stddev double precision DEFAULT 1.0)
RETURNS double precision AS
$$
DECLARE
    u1 double precision;
    u2 double precision;
    z0 double precision;
    z1 double precision;
BEGIN
    u1 := random();
    u2 := random();

    -- Box-Muller transform: z0 and z1 are a pair of independent standard normal values
    z0 := sqrt(-2.0 * ln(u1)) * cos(2.0 * pi() * u2);
    z1 := sqrt(-2.0 * ln(u1)) * sin(2.0 * pi() * u2); -- z1 is unused here

    RETURN mean + (stddev * z0);
END;
$$ LANGUAGE plpgsql;

Random Linestrings

Linestrings are a little harder, because they involve more points, and aesthetically we like to avoid self-crossings of lines.

Two-point linestrings are pretty easy to generate with ST_MakeLine() -- just generate twice as many random points, and use them as the start and end points of the linestrings.

CREATE TABLE random_2point_lines AS
  WITH bounds AS (
    SELECT 0 AS origin_x, 80 AS width,
           0 AS origin_y, 80 AS height
  )
  SELECT ST_MakeLine(
  	       ST_Point(random_normal(origin_x, width/4),
                    random_normal(origin_y, height/4),
                    4326),
  	       ST_Point(random_normal(origin_x, width/4),
                    random_normal(origin_y, height/4),
                    4326))::Geometry(LineString, 4326) AS geom,
         id
    FROM bounds,
         generate_series(0, 100) AS id

random lines 2

Multi-point random linestrings are harder, at least while avoiding self-intersections, and there are a lot of potential approaches. While a recursive CTE could probably do it, an imperative approach using PL/PgSQL is more readable.

The generate_random_linestring() function starts with an empty linestring, and then adds on new segments one at a time, changing the direction of the line with each new segment.

Here is the full generate_random_linestring() definition.
CREATE OR REPLACE FUNCTION generate_random_linestring(
    start_point geometry(Point))
  RETURNS geometry(LineString, 4326) AS
$$
DECLARE
  num_segments integer := 10; -- Number of segments in the linestring
  deviation_max float := radians(45); -- Maximum deviation
  random_point geometry(Point);
  deviation float;
  direction float := 2 * pi() * random();
  segment_length float := 5; -- Length of each segment (adjust as needed)
  i integer;
  result geometry(LineString) := 'SRID=4326;LINESTRING EMPTY';
BEGIN
  result := ST_AddPoint(result, start_point);
  FOR i IN 1..num_segments LOOP
    -- Generate a random angle within the specified deviation
    deviation := 2 * deviation_max * random() - deviation_max;
    direction := direction + deviation;

    -- Calculate the coordinates of the next point
    random_point := ST_Point(
        ST_X(start_point) + cos(direction) * segment_length,
        ST_Y(start_point) + sin(direction) * segment_length,
        ST_SRID(start_point)
      );

    -- Add the point to the linestring
    result := ST_AddPoint(result, random_point);

    -- Update the start point for the next segment
    start_point := random_point;

  END LOOP;

  RETURN result;
END;
$$
LANGUAGE plpgsql;

We can use the generate_random_linestring() function now to turn random start points (created in the usual way) into fully random squiggly lines!

CREATE TABLE random_lines AS
  WITH bounds AS (
    SELECT 0 AS origin_x, 80 AS width,
           0 AS origin_y, 80 AS height
  )
  SELECT id,
    generate_random_linestring(
  	  ST_Point(random_normal(origin_x, width/4),
               random_normal(origin_y, height/4),
               4326))::Geometry(LineString, 4326) AS geom
    FROM bounds,
         generate_series(1, 100) AS id;

random lines

Random Polygons

At the simplest level, a set of random boxes is a set of random polygons, but that's pretty boring, and easy to generate using ST_MakeEnvelope().

CREATE TABLE random_boxes AS
  WITH bounds AS (
    SELECT 0 AS origin_x, 80 AS width,
           0 AS origin_y, 80 AS height
  )
  SELECT ST_MakeEnvelope(
  	       random_normal(origin_x, width/4),
  	       random_normal(origin_y, height/4),
  	       random_normal(origin_x, width/4),
  	       random_normal(origin_y, height/4)
  	     )::Geometry(Polygon, 4326) AS geom,
         id
    FROM bounds,
         generate_series(0, 20) AS id

random boxes

But more interesting polygons have curvy, concave shapes. How can we generate those?

Random Polygons with Concave Hull

One way is to extract a polygon from a set of random points, using ST_ConcaveHull(), and then applying an "erode and dilate" effect to make the curves more pleasantly round.

We start with a random center point for each polygon, and create a circle with ST_Buffer().

hull 1

Then use ST_GeneratePoints() to fill the circle with some random points -- not too many, so we get a nice jagged result.

hull 2

Then use ST_ConcaveHull() to trace a "boundary" around those points.

hull 3

Then apply a negative buffer, to erode the shape.

hull 4

And finally a positive buffer to dilate it back out again.

hull 5
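
For a single shape, the whole pipeline can be written as one nested expression. Here is a minimal sketch (not from the original post) that uses the same parameters as the full query below, with the circle centered at the origin and a radius of 16 (width/5):

SELECT ST_Buffer(
         ST_Buffer(
           ST_ConcaveHull(
             ST_GeneratePoints(                      -- random points inside...
               ST_Buffer(ST_Point(0, 0, 4326), 16),  -- ...a circle of radius 16
               20),                                  -- 20 points, for a jagged hull
             0.3),                                   -- trace a concave "boundary"
           -2.0),                                    -- erode
         3.0) AS geom;                               -- dilate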

Generating multiple hulls involves stringing together all the above operations with CTEs or subqueries.

Here is the full query to generate multiple polygons with the concave hull method.
CREATE TABLE random_hulls AS
  WITH bounds AS (
    SELECT 0 AS origin_x,
           0 AS origin_y,
           80 AS width,
           80 AS height
  ),
  polypts AS (
    SELECT ST_Point(random_normal(origin_x, width/2),
  	                random_normal(origin_y, width/2),
                    4326)::Geometry(Point, 4326) AS geom,
           polyid
    FROM bounds,
         generate_series(1,10) AS polyid
  ),
  pts AS (
    SELECT ST_GeneratePoints(ST_Buffer(geom, width/5), 20) AS geom,
           polyid
    FROM bounds,
         polypts
  )
  SELECT ST_Multi(ST_Buffer(
  	       ST_Buffer(
  	       	 ST_ConcaveHull(geom, 0.3),
  	       	 -2.0),
  	       3.0))::Geometry(MultiPolygon, 4326) AS geom,
         polyid
    FROM pts;

random hulls

Random Polygons with Voronoi Polygons

Another approach is to again start with random points, but use the Voronoi diagram as the basis of the polygon.

Start with a center point and buffer circle.

voronoi 2

Generate random points in the circle.

voronoi 3

Use the ST_VoronoiPolygons() function to generate polygons that subdivide the space using the random points as seeds.

voronoi 4

Filter just the polygons that are fully contained in the originating circle.

voronoi 5

And then use ST_Union() to merge those polygons into a single output shape.

voronoi 6
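
For a single shape, these steps can be strung together like this (a sketch, not from the original post, using a circle of radius 16 at the origin to match the width/5 buffer in the full query below):

WITH circle AS (
  SELECT ST_Buffer(ST_Point(0, 0, 4326), 16) AS geom
),
cells AS (
  -- dump the Voronoi diagram of 10 random seed points into individual cells
  SELECT (ST_Dump(
            ST_VoronoiPolygons(
              ST_GeneratePoints(circle.geom, 10)))).geom AS geom,
         circle.geom AS geom_clip
  FROM circle
)
-- keep only the cells fully inside the circle, then merge them
SELECT ST_Union(geom) AS geom
FROM cells
WHERE ST_Contains(geom_clip, geom);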

Generating multiple hulls again involves stringing together the above operations with CTEs or subqueries.

Here is the full query to generate multiple polygons with the Voronoi method.
CREATE TABLE random_delaunay_hulls AS
  WITH bounds AS (
    SELECT 0 AS origin_x,
           0 AS origin_y,
           80 AS width,
           80 AS height
  ),
  polypts AS (
    SELECT ST_Point(random_normal(origin_x, width/2),
  	                random_normal(origin_y, width/2),
                    4326)::Geometry(Point, 4326) AS geom,
           polyid
    FROM bounds,
         generate_series(1,20) AS polyid
  ),
  voronois AS (
    SELECT ST_VoronoiPolygons(
    	     ST_GeneratePoints(ST_Buffer(geom, width/5), 10)
    	   ) AS geom,
           ST_Buffer(geom, width/5) AS geom_clip,
           polyid
    FROM bounds,
         polypts
  ),
  cells AS (
  	SELECT (ST_Dump(geom)).geom, polyid, geom_clip
  	FROM voronois
  )
  SELECT ST_Union(geom)::Geometry(Polygon, 4326) AS geom, polyid
  FROM cells
  WHERE ST_Contains(geom_clip, geom)
  GROUP BY polyid;

random delaunay

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at September 11, 2023 01:00 PM

September 10, 2023

Boston GIS (Regina Obe, Leo Hsu)

Why People care about PostGIS and Postgres and FOSS4GNA

Paul Ramsey and I recently had a fireside chat on the Path To Citus Con podcast. Check out the episode, Why People care about PostGIS and Postgres. There were a surprising number of funny moments and plenty of insightful stuff.

It was a great fireside chat, just without the fireplace. We covered the birth and progression of PostGIS over the past 20 years and its trajectory with PostgreSQL. We also learned of Paul's plans to revolutionize PostGIS, which were new to me. We touched on many sideline topics, like QGIS, whose birth was inspired by PostGIS, and covered pgRouting and MobilityDB, two other PostgreSQL extension projects that extend PostGIS.

We also managed to fall into the Large Language Model conversation, on which Paul and I are on different sides of the fence.

Continue reading "Why People care about PostGIS and Postgres and FOSS4GNA"

by Regina Obe (nospam@example.com) at September 10, 2023 03:22 AM

August 15, 2023

PostGIS Development

PostGIS 3.4.0

The PostGIS Team is pleased to release PostGIS 3.4.0! This version works with PostgreSQL 12-16, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.

3.4.0

This release is a major release; it includes bug fixes since PostGIS 3.3.4 and new features.

by Regina Obe at August 15, 2023 12:00 AM

August 05, 2023

PostGIS Development

PostGIS 3.4.0rc1

The PostGIS Team is pleased to release PostGIS 3.4.0rc1! Best Served with PostgreSQL 16 Beta2 and GEOS 3.12.0.

This version requires PostgreSQL 12 or higher, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.

3.4.0rc1

This release is a release candidate of a major release; it includes bug fixes since PostGIS 3.3.4 and new features.

by Regina Obe at August 05, 2023 12:00 AM

July 29, 2023

PostGIS Development

PostGIS 3.4.0beta2

The PostGIS Team is pleased to release PostGIS 3.4.0beta2! Best Served with PostgreSQL 16 Beta2 and GEOS 3.12.0.

This version requires PostgreSQL 12 or higher, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.

3.4.0beta2

This release is a beta of a major release; it includes bug fixes since PostGIS 3.3.4 and new features.

by Regina Obe at July 29, 2023 12:00 AM

July 28, 2023

PostGIS Development

PostGIS 3.3.4 Patch Release

The PostGIS development team is pleased to announce bug fix release 3.3.4, mostly focused on Topology fixes.

by Sandro Santilli at July 28, 2023 12:00 AM

July 14, 2023

PostGIS Development

PostGIS 3.4.0beta1

The PostGIS Team is pleased to release PostGIS 3.4.0beta1! Best Served with PostgreSQL 16 Beta2 and GEOS 3.12.0.

This version requires PostgreSQL 12 or higher, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.

3.4.0beta1

This release is a beta of a major release; it includes bug fixes since PostGIS 3.3.3 and new features.

by Regina Obe at July 14, 2023 12:00 AM

July 08, 2023

Paul Ramsey

MapScaping Podcast: Pg_EventServ

Last month I got to record a couple podcast episodes with the MapScaping Podcast’s Daniel O’Donohue. One of them was on the benefits and many pitfalls of putting rasters into a relational database, and the other was about real-time events and pushing data change information out to web clients!

TL;DR: geospatial data tends to be more “visible” to end user clients, so communicating change to multiple clients in real time can be useful for “common operating” situations.

I also recorded a presentation about pg_eventserv for PostGIS Day 2022.

July 08, 2023 12:00 AM

June 03, 2023

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Bundle 3.3.3 for Windows with MobilityDB

I recently released the PostGIS 3.3.3 bundle for Windows, which is available via application stackbuilder and the OSGeo download site for PostgreSQL 11 - 15. If you are running PostgreSQL 12 or above, you get a bonus extension, MobilityDB, which leverages PostGIS geometry and geography types and introduces several more spatio-temporal types and functions specifically targeted at managing objects in motion.

What kind of management? Think of getting the average speed a train is moving over a segment of time, or detecting collisions in time, without any long SQL code: just use a function on the trip path, and voilà. Think about storing GPS data very compactly in a single row/column with time, and being able to ask very complex questions with very little SQL. True, PostGIS can do some of this using geometry with measure (geometryM) types, but you have to deal with the craziness of converting M values back to timestamps, whereas MobilityDB temporal types encode them as true PostgreSQL timestamp types.
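
To make that concrete, here is a rough sketch of the kind of query MobilityDB enables. This is not from the post; the type and function names (tgeompoint, speed, twavg) follow the MobilityDB documentation, so treat the exact signatures and literal syntax as assumptions:

-- One GPS trip stored as a single temporal point (positions and timestamps together)
CREATE TABLE trips (trip_id int PRIMARY KEY, trip tgeompoint);

INSERT INTO trips VALUES (1, tgeompoint
  '[Point(0 0)@2023-06-01 08:00:00, Point(0 1)@2023-06-01 08:05:00,
    Point(1 1)@2023-06-01 08:12:00]');

-- speed() derives a temporal float (units of the spatial reference per second);
-- twavg() collapses it to a single time-weighted average
SELECT trip_id, twavg(speed(trip)) AS avg_speed FROM trips;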

Anita Graser, of QGIS and Moving Pandas fame, has written several posts about it such as: Visualizing Trajectories with QGIS and mobilitydb and Detecting close encounters using MobilityDB 1.0.

Continue reading "PostGIS Bundle 3.3.3 for Windows with MobilityDB"

by Regina Obe (nospam@example.com) at June 03, 2023 12:34 AM

May 30, 2023

Crunchy Data

SVG Images from Postgres

PostGIS excels at storing, manipulating and analyzing geospatial data. At some point it's usually desired to convert raw spatial data into a two-dimensional representation to utilize the integrative capabilities of the human visual cortex. In other words, to see things on a map.

PostGIS is a popular backend for mapping technology, so there are many options to choose from to create maps. Data can be rendered to a raster image using a web map server like GeoServer or MapServer; it can be converted to GeoJSON or vector tiles via servers such as pg_featureserv and pg_tileserv and then shipped to a Web browser for rendering by a library such as OpenLayers, MapLibre or Leaflet; or a GIS application such as QGIS can connect to the database and create richly-styled maps from spatial queries.

What these options have in common is that they require external tools which need to be installed, configured and maintained in a separate environment. This can introduce unwanted complexity to a geospatial architecture.

This post presents a simple way to generate maps entirely within the database, with no external infrastructure required.

SVG for the win

A great way to display vector data is to use the Scalable Vector Graphic (SVG) format. It provides rich functionality for displaying and styling geometric shapes. SVG is widely supported by web browsers and other tools.

By including CSS and Javascript it's possible to add advanced styling, custom popups, dynamic behaviour and interaction with other web page elements.

Introducing pg-svg

Generating SVG "by hand" is difficult. It requires detailed knowledge of the SVG specification, and constructing a complex text format in SQL is highly error-prone. While PostGIS has had the function ST_AsSVG for years, it only produces the SVG path data attribute value. Much more is required to create a fully-styled SVG document.

The PL/pgSQL library pg-svg solves this problem! It makes it easy to convert PostGIS data into styled SVG documents. The library provides a simple API as a set of PL/pgSQL functions which allow creating an SVG document in a single SQL query. Best of all, this installs with a set of functions, nothing else required!
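
To give a sense of the API before the full example, here is a minimal sketch (not from the post) that renders a single buffered point, using the pg-svg functions that appear later in this article; it assumes svgStyle accepts a single property/value pair:

WITH shape AS (
  SELECT ST_Buffer(ST_Point(0, 0), 10) AS geom
)
SELECT svgDoc(
         array_agg( svgShape(geom, style => svgStyle('fill', '#ff8800')) ),
         viewbox => svgViewbox( ST_Expand( ST_Extent(geom), 2 ) )
       ) AS svg
FROM shape;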

Example map of US high points

The best way to understand how pg-svg works is to see an example. We'll create an SVG map of the United States showing the highest point in each state. The map has the following features:

  • All 50 states are shown, with Alaska and Hawaii transformed to better fit the map
  • States are labeled, and filled with a gradient
  • High points are shown at their location by triangles whose color and size indicate the height of the high point.
  • Tooltips provide more information about states and highpoints.

The resulting map looks like this (to see tooltips open the raw image):

The SQL query to generate the map is here. It can be downloaded and run using psql:

psql -A -t -o us-highpt.svg  < us-highpt-svg.sql

The SVG output us-highpt.svg can be viewed in any web browser.

How it Works

Let's break the query down to see how the data is prepared and then rendered to SVG. A dataset of US states in geodetic coordinate system (WGS84, SRID = 4326) is required. We used the Natural Earth states and provinces data available here. It is loaded into a table ne.admin_1_state_prov with the following command:

shp2pgsql -c -D -s 4326 -i -I ne_10m_admin_1_states_provinces.shp ne.admin_1_state_prov | psql

The query uses the SQL WITH construct to organize processing into simple, modular steps. We'll describe each one in turn.

Select US state features

First, the US state features are selected from the Natural Earth boundaries table ne.admin_1_state_prov.

us_state AS (SELECT name, abbrev, postal, geom
  FROM ne.admin_1_state_prov
  WHERE adm0_a3 = 'USA')

Make a US state map

Next, the map is made more compact by realigning the far-flung states of Alaska and Hawaii.
This is done using PostGIS affine transformation functions. The states are made more proportionate using ST_Scale, and moved closer to the lower 48 using ST_Translate. The scaling is done around the location of the state high point, to make it easy to apply the same transformation to the high point feature.

,us_map AS (SELECT name, abbrev, postal,
    -- transform AK and HI to make them fit map
    CASE WHEN name = 'Alaska' THEN
      ST_Translate(ST_Scale(
        ST_Intersection( ST_GeometryN(geom,1), 'SRID=4326;POLYGON ((-141 80, -141 50, -170 50, -170 80, -141 80))'),
        'POINT(0.5 0.75)', 'POINT(-151.007222 63.069444)'::geometry), 18, -17)
    WHEN name = 'Hawaii' THEN
      ST_Translate(ST_Scale(
        ST_Intersection(geom, 'SRID=4326;POLYGON ((-161 23, -154 23, -154 18, -161 18, -161 23))'),
        'POINT(3 3)', 'POINT(-155.468333 19.821028)'::geometry), 32, 10)
    ELSE geom END AS geom
  FROM us_state)

High Points of US states

Data for the highest point in each state is provided as an inline table of values:

,high_pt(name, state, hgt_m, hgt_ft, lon, lat) AS (VALUES
 ('Denali',              'AK', 6198, 20320,  -151.007222,63.069444)
,('Mount Whitney',       'CA', 4421, 14505,  -118.292,36.578583)
...
,('Britton Hill',        'FL',  105,   345,  -86.281944,30.988333)
)

Prepare High Point symbols

The next query does several things:

  • translates the lon and lat location for Alaska and Hawaii high points to match the transformation applied to the state geometry
  • computes the symHeight attribute for the height of the high point triangle symbol
  • assigns a fill color value to each high point based on the height
  • uses ORDER BY to sort the high points by latitude, so that their symbols overlap correctly when rendered
,highpt_shape AS (SELECT name, state, hgt_ft,
    -- translate high points to match shifted states
    CASE WHEN state = 'AK' THEN lon + 18
      WHEN state = 'HI' THEN lon + 32
      ELSE lon END AS lon,
    CASE WHEN state = 'AK' THEN lat - 17
      WHEN state = 'HI' THEN lat + 10
      ELSE lat END AS lat,
    (2.0 * hgt_ft) / 15000.0 + 0.5 AS symHeight,
    CASE WHEN hgt_ft > 14000 THEN '#ffffff'
         WHEN hgt_ft >  7000 THEN '#aaaaaa'
         WHEN hgt_ft >  5000 THEN '#ff8800'
         WHEN hgt_ft >  2000 THEN '#ffff44'
         WHEN hgt_ft >  1000 THEN '#aaffaa'
                             ELSE '#558800'
    END AS clr
  FROM high_pt ORDER BY lat DESC)

Generate SVG elements

The previous queries transformed the raw data into a form suitable for rendering.
Now we get to see pg-svg in action! The next query generates the SVG text for each output image element, as separate records in a result set called shapes.

The SVG elements are generated in the order in which they are drawn - states and labels first, with high-point symbols on top. Let's break it down:

SVG for states

The first subquery produces SVG shapes from the state geometries. The svgShape function produces an SVG shape element for any PostGIS geometry. It also provides optional parameters supporting other capabilities of SVG. Here title specifies that the state name is displayed as a tooltip, and style specifies the styling of the shape. Styling in SVG can be provided using properties defined in the Cascaded Style Sheets (CSS) specification. pg-svg provides the svgStyle function to make it easy to specify the names and values of CSS styling properties.

Note that the fill property value is a URL instead of a color specifier. This refers to an SVG gradient fill which is defined later.

The state geometry is also included in the subquery result set, for reasons which will be discussed below.

,shapes AS (
  -- State shapes
  SELECT geom, svgShape( geom,
    title => name,
    style => svgStyle(  'stroke', '#ffffff',
                        'stroke-width', 0.1::text,
                        'fill', 'url(#state)',
                        'stroke-linejoin', 'round' ) )
    svg FROM us_map

SVG for state labels

Labels for state abbreviations are positioned at the point produced by the ST_PointOnSurface function. (Alternatively, ST_MaximumInscribedCircle could be used.) The SVG is generated by the svgText function, using the specified styling.

  UNION ALL
  -- State names
  SELECT NULL, svgText( ST_PointOnSurface( geom ), abbrev,
    style => svgStyle(  'fill', '#6666ff', 'text-anchor', 'middle', 'font', '0.8px sans-serif' ) )
    svg FROM us_map

SVG for high point symbols

The high point features are displayed as triangular symbols. We use the convenient svgPolygon function with a simple array of ordinates specifying a triangle based at the high point location, with height given by the previously computed symHeight column. The title is provided for a tooltip, and the styling uses the computed clr attribute as the fill.

  UNION ALL
  -- High point triangles
  SELECT NULL, svgPolygon( ARRAY[ lon-0.5, -lat, lon+0.5, -lat, lon, -lat-symHeight ],
    title => name || ' ' || state || ' - ' || hgt_ft || ' ft',
    style => svgStyle(  'stroke', '#000000',
                        'stroke-width', 0.1::text,
                        'fill', clr  ) )
    svg FROM highpt_shape
)

Produce final SVG image

The generated shape elements need to be wrapped in an <svg> document element. This is handled by the svgDoc function.

The viewable extent of the SVG data needs to be provided by the viewbox parameter. The most common case is to display all of the rendered data. An easy way to determine this is to apply the PostGIS ST_Extent aggregate function to the input data (this is why we included the geom column as well as the svg text column). We can include a border by enlarging the extent using the ST_Expand function. The function svgViewbox converts the PostGIS geometry for the extent into SVG format.

We also include a definition for an SVG linear gradient to be used as the fill style for the state features.

SELECT svgDoc( array_agg( svg ),
    viewbox => svgViewbox( ST_Expand( ST_Extent(geom), 2)),
    def => svgLinearGradient('state', '#8080ff', '#c0c0ff')
  ) AS svg FROM shapes;

The output from svgDoc is a text value which can be used anywhere that SVG is supported.

More to Explore

We've shown how the pg-svg SQL function library lets you easily generate map images from PostGIS data right in the database. This can be used as a simple ad-hoc way of visualizing spatial data. Or, it could be embedded in a larger system to automate repetitive map generation workflows.

Although SVG is a natural fit for vector data, there may be situations where producing a map as a bitmap (raster) image makes sense.
For a way of generating raster maps right in the database see this PostGIS Day 2022 presentation. This would be especially appealing where the map is displaying data stored using PostGIS raster data. It would also be possible to combine vector and raster data into a hybrid SVG/image output.

Although we've focussed on creating maps of geospatial data, SVG is often used for creating other kinds of graphics. For examples of using it to create geometric and mathematical designs see the pg-svg demo folder. Here's an image of a Lissajous knot generated by this SQL.

Lissajous Knot

You could even use pg-svg to generate charts of non-spatial data (although this would be better handled by a more task-specific API).

Let us know if you find pg-svg useful, or if you have ideas for improving it!

by Martin Davis (Martin.Davis@crunchydata.com) at May 30, 2023 01:00 PM

May 29, 2023

PostGIS Development

PostGIS 3.3.3, 3.2.5, 3.1.9, 3.0.9 Patch Releases

The PostGIS development team is pleased to provide bug fixes and performance enhancements 3.3.3, 3.2.5, 3.1.9 and 3.0.9 for the 3.3, 3.2, 3.1, and 3.0 stable branches.

by Regina Obe at May 29, 2023 12:00 AM

May 24, 2023

Paul Ramsey

Keynote @ CUGOS Spring Fling

Last month I was invited to give a keynote talk at the CUGOS Spring Fling, a delightful gathering of “Cascadia Users of Open Source GIS” in Seattle. I have been speaking about open source economics at FOSS4G conferences more-or-less every two years, since 2009, and took this opportunity to somewhat revisit the topics of my 2019 FOSS4GNA keynote.

If you liked the video and want to use the materials, the slides are available here under CC BY.

May 24, 2023 04:00 PM

May 17, 2023

Paul Ramsey

MapScaping Podcast: Rasters and PostGIS

Last month I got to record a couple podcast episodes with the MapScaping Podcast’s Daniel O’Donohue. One of them was on the benefits and many pitfalls of putting rasters into a relational database, and it is online now!

TL;DR: most people think “put it in a database” is a magic recipe for: faster performance, infinite scalability, and easy management.

Where the database is replacing a pile of CSV files, this is probably true.

Where the database is replacing a collection of GeoTIFF imagery files, it is probably false. Raster in the database will be slower, will take up more space, and be very annoying to manage.

So why do it? Start with a default, “don’t!”, and then evaluate from there.

For some non-visual raster data, and use cases that involve enriching vectors from raster sources, having the raster co-located with the vectors in the database can make working with it more convenient. It will still be slower than direct access, and it will still be painful to manage, but it allows use of SQL as a query language, which can give you a lot more flexibility to explore the solution space than a purpose built data access script might.

There’s some other interesting tweaks around storing the actual raster data outside the database and querying it from within, that I think are the future of “raster in (not really in) the database”, listen to the episode to learn more!

May 17, 2023 12:00 AM

March 02, 2023

Crunchy Data

Geocoding with Web APIs in Postgres

Geocoding is the process of taking addresses or location information and getting the coordinates for that location. Anytime you route a new location or look up a zip code, the back end is geocoding the location and then using the geocode inside other PostGIS functions to give you the routes, locations, and other data you asked for.

PostGIS comes equipped with an easy way to use US Census data via the Tiger geocoder. Using the Tiger geocoder requires downloading large amounts of census data, and in space-limited databases this may not be ideal. Using a geocoding web API service can be a space-saving solution in these cases.

I am going to show you how to set up a really quick function using plpython3u to hit a web service geocoder every time that we get a new row in the database.

Installing plpython3u

The plpython3u extension comes with Crunchy Bridge or you can add it to your database. To get started run the following:

CREATE EXTENSION plpython3u;

Creating a function to geocode addresses

In this example, I'll use the US census geocoding API as our web service, and build a function to geocode addresses based on that.

The function puts together parameters to hit the census geocoding API and then parses the resulting object, and returns a geometry:

CREATE OR REPLACE FUNCTION geocode(address text)
RETURNS geometry
AS $$
    import requests
    try:
        payload = {'address' : address , 'benchmark' : 2020, 'format' : 'json'}
        base_geocode = 'https://geocoding.geo.census.gov/geocoder/locations/onelineaddress'
        r = requests.get(base_geocode, params = payload)
        coords = r.json()['result']['addressMatches'][0]['coordinates']
        lon = coords['x']
        lat = coords['y']
        geom = f'SRID=4326;POINT({lon} {lat})'
    except Exception as e:
        plpy.notice(f'address failed: {address}')
        plpy.notice(f'error: {e}')
        geom = None
    return geom
$$
LANGUAGE 'plpython3u';

Using this function to geocode Crunchy Data's headquarters:

SELECT ST_AsText(geocode('162 Seven Farms Drive Charleston, SC 29492'));

Deploying this function for new data

But what if we want to automatically run this every time an address is inserted into a table? Let's say we have a table with a field ID, an address, and a point that we want to auto-populate on inserts.

CREATE TABLE addresses (
	fid SERIAL PRIMARY KEY,
	address VARCHAR,
	geom GEOMETRY(POINT, 4326)
);

We can make use of a Postgres trigger to add the geocode before every insert! Triggers are a very powerful way to leverage built in functions to automatically transform your data as it enters the database, and this particular case is a great demo for them!

CREATE OR REPLACE FUNCTION add_geocode()
RETURNS trigger AS
$$
DECLARE
    loc geometry;
BEGIN
    loc := geocode(NEW.address);
    NEW.geom = loc;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER update_geocode BEFORE INSERT ON addresses
    FOR EACH ROW EXECUTE FUNCTION add_geocode();

Now when running an insert, the value is automatically geocoded!

INSERT INTO addresses(address) VALUES ('415 Mission St, San Francisco, CA 94105');

postgres=# SELECT fid, address, ST_AsText(geom) FROM addresses;
 fid |                 address                 |                        geom
-----+-----------------------------------------+----------------------------------------------------
   1 | 415 Mission St, San Francisco, CA 94105 | 0101000020E610000097CD0E2B66995EC0BB004B2729E54240

Summary

If you’re space limited, using a web API based geocoder might be the way to go. Using a geocoder function with triggers on new row inserts will get you geocoded addresses in a snap.

by Jacob Coblentz (Jacob.Coblentz@crunchydata.com) at March 02, 2023 04:00 PM

February 23, 2023

Crunchy Data

Postgres Raster Query Basics

In geospatial terminology, a "raster" is a cover of an area divided into a uniform grid, with one or more values assigned to each grid cell.

A "raster" in which the values are associated with red, green and blue bands might be a visual image. The rasters that come off the Landsat 7 earth observation satellite have eight bands: red, green, blue, near infrared, shortwave infrared, thermal, mid-infrared and panchromatic.

Database Rasters

Working with raster data via SQL is a little counter-intuitive: rasters don't neatly fit the relational model the way vector geometries do. A table of parcels where one column is the geometry and the others are the owner name, address, and tax roll id makes sense. How should a raster fit into a table? As a row for every pixel? For every scan row? What other values should be associated with each row?

There is no clean relationship between "real world objects" and the database representation of a raster, because a raster has nothing to say about objects, it is just a collection of measurements.

We can squeeze rasters into the database, but doing so makes working with the data more complex. Before loading data, we need to enable PostGIS and the raster module.

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;

Loading Rasters

For this example, we will load raster data for a "digital elevation model" (DEM), a raster with just one band, the elevation at each pixel.

Using the SRTM Tile Grabber I downloaded one tile of old SRTM data. Then, using the gdalinfo utility, I read out the metadata about the file.

wget https://srtm.csi.cgiar.org/wp-content/uploads/files/srtm_5x5/TIFF/srtm_12_03.zip
unzip srtm_12_03.zip
gdalinfo srtm_12_03.tif

The metadata tells me a few useful things for loading the data:

  • The coordinate system of the data is WGS 84.
  • The pixel size is 3 arc-seconds.
  • The pixel type is Int16, so two bytes per pixel.

Knowing that, I can build a raster2pgsql call to load the data into a raster table.

# -I        create a spatial index on the raster column
# -s 4326   use 4326 (WGS 84) as the spatial reference for the raster
# -t 32x32  tile the raster into 32 by 32 pixel tiles
# input file is srtm_12_03.tif, target database connection is "dem"
raster2pgsql -I -s 4326 -t 32x32 srtm_12_03.tif | psql dem

Once loaded the raster table looks like this on a map.

And it looks like this in the database.

Table "public.srtm_12_03"

 Column |  Type
--------+---------
 rid    | integer
 rast   | raster
Indexes:
    "srtm_12_03_pkey" PRIMARY KEY, btree (rid)
    "srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))

It's a pretty boring table! Just a bunch of binary raster tiles and a unique key for each.

-- 29768 rows
SELECT Count(*) FROM srtm_12_03;

Those binary raster tiles aren't just opaque blobs though, we can look inside them with the right functions. Here we get a summary of all the raster tiles in the table.

SELECT (ST_SummaryStatsAgg(rast, 1, true)).* FROM srtm_12_03;
count  | 28966088
sum    | 20431360140
mean   | 705.3544869434907
stddev | 561.252765463607
min    | -291
max    | 4371

Tiles, Tiles, Tiles

Remember when we loaded the data with raster2pgsql we specified a "tile size" of 32 by 32 pixels? This has a number of implications.

  • First, at 32x32, a tile of 2-byte Int16 data like our DEM will take up 2048 bytes. This is small enough to fit in the database page size, which means the data will not end up stored in a side table by the TOAST subsystem that handles large row values.
  • Second, our input file had 6000x6000 pixels, which is enough pixels to generate 35156 32x32 tiles. Our table only has 29768 rows, because raster2pgsql does not generate tiles when the contents are all "no data" pixels (as the DEM data is over the ocean).

The loaded data looks like this.

Notice how small each tile is. As a general rule, when working with raster data queries,

  • the first step will be to efficiently find the relevant tile(s);
  • the second step will be to use the tiles to find the answer you want.

Finding tiles efficiently means using spatial index, and the spatial index definition as we saw above is this:

"srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))

This is a functional index, which means that in order to use it, we need to repeat the functional part, st_convexhull(rast), when forming our query.

The ST_ConvexHull(raster) function converts a raster tile into a polygon defining the boundary of the tile. When querying raster tables, you will use this function a great deal to convert rasters into polygons suitable for querying a spatial index.

Point Query

The simplest raster query is to take a point, and find the value of the raster under that point.

Here is a point table with one point in it:

CREATE TABLE mappoint AS
SELECT ST_Point(-123.7273, 47.8467, 4326)::geometry(Point, 4326) AS geom,
       1 AS fid;

The nice thing about points is that they only hit one tile at a time. So we don't have to think too hard about what to do with our tile sets.

SELECT ST_Value(srtm.rast, pt.geom)
FROM srtm_12_03 srtm
JOIN mappoint pt
  ON ST_Intersects(pt.geom, ST_ConvexHull(srtm.rast))
 st_value
----------
     1627

Here we use the ST_ConvexHull(raster) function to get access to our spatial index on the raster table, and the ST_Intersects(geom, geom) function to test the condition.

The ST_Value(raster, geom) function reads the pixel value from the raster at the location of the point.

Polygon Query

Summarizing rasters under polygons is more involved than reading point values, because polygons will frequently overlap multiple tiles, so you have to think in terms of "sets of raster tiles" instead of "the raster" when building your query.

  • First we will find all the tiles that intersect our polygon of interest, and clip each tile to the polygon boundary using ST_Clip().
  • Then we will summarize the clipped tiles, to get the average elevation in our polygon.
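
The query below assumes a polygon table named mappoly. The original post built one from an area of interest; as a hypothetical stand-in for testing (not the polygon from the original post), you could buffer the point created earlier:

CREATE TABLE mappoly AS
SELECT ST_Buffer(geom, 0.1)::geometry(Polygon, 4326) AS geom,  -- ~0.1 degree circle around the point
       1 AS fid
FROM mappoint;
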

The final complete query looks like this.

WITH clipped_tiles AS (
  SELECT ST_Clip(srtm.rast, ply.geom) AS rast, srtm.rid
  FROM srtm_12_03 srtm
  JOIN mappoly ply
    ON ST_Intersects(ply.geom, ST_ConvexHull(srtm.rast))
)
SELECT (ST_SummaryStatsAgg(rast, 1, true)).*
  FROM clipped_tiles;
count  | 362369
sum    | 388175193
mean   | 1071.2152336430545
stddev | 441.7982032761408
min    | 108
max    | 2374

Conclusion

Working with database rasters analytically can be challenging, particularly if you are used to thinking about them as single, unitary coverages. Remember to apply the basic rules of database rasters:

  • Find the chips you need to answer your query.
  • Make sure to use ST_ConvexHull(raster) to drive a spatial index filter.
  • Assemble the chips as needed to answer the query (you might need to use the ST_Union(raster) aggregate, as sketched below).
  • Carry out your actual raster query.
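
For example, here is a minimal sketch (not from the original post, reusing the srtm_12_03 and mappoly tables above) that merges the relevant tiles with the ST_Union(raster) aggregate before clipping:

-- Merge every tile touching the polygon into one raster, then clip once
SELECT ST_Clip(ST_Union(srtm.rast), ply.geom) AS rast
FROM srtm_12_03 srtm
JOIN mappoly ply
  ON ST_Intersects(ply.geom, ST_ConvexHull(srtm.rast))
GROUP BY ply.geom;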

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at February 23, 2023 04:00 PM

February 10, 2023

Crunchy Data

Temporal Filtering in pg_featureserv with CQL

In a previous post we announced the CQL filtering capability in pg_featureserv. It provides powerful functionality for attribute and spatial querying of data in PostgreSQL and PostGIS.

Another important datatype which is often present in datasets is temporal. Temporal datasets contain attributes which are dates or timestamps. The CQL standard defines some special-purpose syntax to support temporal filtering. This allows pg_featureserv to take advantage of the extensive capabilities of PostgreSQL for specifying queries against time-valued attributes. This post in the CQL series will show some examples of temporal filtering in pg_featureserv.

CQL Temporal filters

Temporal filtering in CQL is provided using temporal literals and conditions.

Temporal literal values may be dates or timestamps:

2001-01-01
2010-04-23T01:23:45

Note: The temporal literal syntax is based on an early version of the OGC API Filter and CQL standard. The current draft CQL standard has a different syntax: DATE('1969-07-20') and TIMESTAMP('1969-07-20T20:17:40Z'). It also supports intervals: INTERVAL('1969-07-16', '1969-07-24'). A subsequent version of pg_featureserv will support this syntax as well.

Temporal conditions allow time-valued properties and literals to be compared via the standard boolean comparison operators <,>,<=,>=,=,<> and the BETWEEN..AND operator:

start_date >= 2001-01-01
event_time BETWEEN 2010-04-22T06:00 AND 2010-04-23T12:00

The draft CQL standard provides dedicated temporal operators, such as T_AFTER, T_BEFORE, T_DURING, etc. A future version of pg_featureserv will likely provide these operators.

Publishing Historical Tropical Storm tracks

We'll demonstrate temporal filters using a dataset with a strong time linkage: tracks of tropical storms (or hurricanes). There is a dataset of Historical Tropical Storm Tracks available here.

The data requires some preparation. It is stored as a set of records of line segments representing 6-hour long sections of storm tracks. To provide simpler querying we will model the data using a single record for each storm, with a line geometry showing the entire track and attributes for the start and end time for the track.

The data is provided in Shapefile format. As expected for a worldwide dataset, it is in the WGS84 geodetic coordinate system (lat/long). In PostGIS this common Spatial Reference System is assigned an identifier (SRID) of 4326.

The PostGIS shp2pgsql utility can be used to load the dataset into a spatial table called trop_storm_raw. The trop_storm_raw table is a temporary staging table allowing the raw data to be loaded and made available for the transformation phase of data preparation.

shp2pgsql -c -D -s 4326 -i -I -W LATIN1 "Historical Tropical Storm Tracks.shp" public.trop_storm_raw | psql -d database

The options used are:

  • -c - create a new table
  • -D - use PostgreSQL dump format to load the data
  • -s - specify the SRID of 4326
  • -i - use 32-bit integers
  • -I - create a GIST index on the geometry column (this is not strictly necessary, since this is just a temporary staging table)
  • -W - specifies the encoding of the input attribute data in the DBF file

Next, create the table having the desired data model:

CREATE TABLE public.trop_storm (
    btid int PRIMARY KEY,
    name text,
    wind_kts numeric,
    pressure float8,
    basin text,
    time_start timestamp,
    time_end timestamp,
    geom geometry(MultiLineString, 4326)
);

It's good practice to add comments to the table and columns. These will be displayed in the pg_featureserv Web UI.

COMMENT ON TABLE public.trop_storm IS 'This is my spatial table';
COMMENT ON COLUMN public.trop_storm.geom IS 'Storm track LineString';
COMMENT ON COLUMN public.trop_storm.name IS 'Name assigned to storm';
COMMENT ON COLUMN public.trop_storm.btid IS 'Id of storm';
COMMENT ON COLUMN public.trop_storm.wind_kts IS 'Maximum wind speed in knots';
COMMENT ON COLUMN public.trop_storm.pressure IS 'Minimum pressure in millibars';
COMMENT ON COLUMN public.trop_storm.basin IS 'Basin in which storm occurred';
COMMENT ON COLUMN public.trop_storm.time_start IS 'Timestamp of storm start';
COMMENT ON COLUMN public.trop_storm.time_end IS 'Timestamp of storm end';

Now the power of SQL can be used to transform the raw data into the simpler data model. The track sections can be combined into single tracks with a start and end time using the following query.

  • The original data represents the track sections as MultiLineStrings with single elements. The element is extracted using ST_GeometryN so that the result of aggregating them using ST_Collect is a MultiLineString, not a GeometryCollection. (An alternative is to aggregate into a GeometryCollection and use ST_CollectionHomogenize to reduce it to a MultiLineString.)
  • The final ST_Multi ensures that all tracks are stored as MultiLineStrings, as required by the type constraint on the geom column.
  • the filter condition time_end - time_start < '1 year'::interval removes tracks spanning the International Date Line.
WITH data AS (
 SELECT btid, name, wind_kts, pressure, basin, geom,
    make_date(year::int, month::int, day::int) + ad_time::time AS obs_time
 FROM trop_storm_raw ORDER BY obs_time
),
tracks AS (
  SELECT btid,
    MAX(name) AS name,
    MAX(wind_kts) AS wind_kts,
    MAX(pressure) AS pressure,
    MAX(basin) AS basin,
    MIN(obs_time) AS time_start,
    MAX(obs_time) AS time_end,
    ST_Multi( ST_LineMerge( ST_Collect( ST_GeometryN(geom, 1)))) AS geom
  FROM data GROUP BY btid
)
INSERT INTO trop_storm
SELECT * FROM tracks WHERE time_end - time_start < '1 year'::interval;

This is a small dataset and pg_featureserv does not require an index, but as per best practice we can create a spatial index on the geometry column:

CREATE INDEX trop_storm_gix ON public.trop_storm USING GIST ( geom );

Once the trop_storm table is created and populated, it can be published in pg_featureserv. Issuing the following request in a browser shows the feature collection in the Web UI:

http://localhost:9000/collections.html

tropstorm

http://localhost:9000/collections/public.trop_storm.html

tropstorm

The dataset can be viewed using pg_featureserv's built-in map viewer (note that to see all 567 records displayed it is probably necessary to increase the limit on the number of response features):

http://localhost:9000/collections/public.trop_storm/items.html?limit=1000

publictropstorm

Querying by Time Range

That's a lot of storm tracks. It would be easier to visualize a smaller number of tracks. A natural way to subset the data is by querying over a time range. Let's retrieve the storms between the start of 2005 and the end of 2009. This is done by adding a filter parameter with a CQL expression against the dataset temporal property time_start (storms typically do not span the start of years). To query values lying between a range of times it is convenient to use the BETWEEN operator. The filter condition is time_start BETWEEN 2005-01-01 AND 2009-12-31. The full request is:

http://localhost:9000/collections/public.trop_storm/items.html?filter=time_start BETWEEN 2005-01-01 AND 2009-12-31&limit=100

Submitting this query produces a result with 68 tracks:

tropstormtracks

Querying by Time and Space

Temporal conditions can be combined with other kinds of filters. For instance, we can execute a spatio-temporal query by using a temporal condition along with a spatial condition. In this example, we query the storms which occurred in 2005 and after in Florida. The temporal condition is expressed as time_start > 2005-01-01.

The spatial condition uses the INTERSECTS predicate to test whether the line geometry of a storm track intersects a polygon representing the (simplified) coastline of Florida. The polygon is provided as a geometry literal using WKT. (For more information about spatial filtering with CQL in pg_featureserv see this blog post.)

POLYGON ((-81.4067 30.8422, -79.6862 25.3781, -81.1609 24.7731, -83.9591 30.0292, -85.2258 29.6511, -87.5892 29.9914, -87.5514 31.0123, -81.4067 30.8422))

polygon

Putting these conditions together in a boolean expression using AND, the request to retrieve the desired tracks from pg_featureserv is:

http://localhost:9000/collections/public.trop_storm/items.html?filter=time_start > 2005-01-01 AND INTERSECTS(geom, POLYGON ((-81.4067 30.8422, -79.6862 25.3781, -81.1609 24.7731, -83.9591 30.0292, -85.2258 29.6511, -87.5892 29.9914, -87.5514 31.0123, -81.4067 30.8422)) )&limit=100

This query produces a result with only 9 tracks, all of which cross Florida:

temporalfiltering

Try it yourself!

CQL temporal filtering is included in the forthcoming pg_featureserv Version 1.3. But you can try it out now by downloading the latest build. Let us know what use cases you find for CQL temporal filtering! Crunchy Data offers full managed PostGIS in the Cloud, with Container apps to run pg_featureserv. Try it today.

by Martin Davis (Martin.Davis@crunchydata.com) at February 10, 2023 03:00 PM

February 02, 2023

Paul Ramsey

When Proj Grid-Shifts Disappear

Last week a user noted on the postgis-users list (paraphrase):

I upgraded from PostGIS 2.5 to 3.3 and now the results of my coordinate transforms are wrong. There is a vertical shift between the systems I’m using, but my vertical coordinates are unchanged.

Hmmm.

PostGIS gets all its coordinate reprojection smarts from the proj library. The user’s query looked like this:

SELECT ST_AsText(
    ST_Transform('SRID=7405;POINT(545068 258591 8.51)'::geometry, 
    4979
    ));

“We just use proj” is a lot less certain and stable an assertion than it appears on the surface. In fact, PostGIS “just uses proj” for proj versions from 4.9 all the way up to 9.2, and there has been a lot of change to the proj library over that sweep of releases.

  • The API radically changed around proj version 6, and that required a major rework of how PostGIS called the library.
  • The way proj dereferenced spatial reference identifiers into reprojection algorithms changed around then too (it got much slower) which required more changes in how we interacted with the library.
  • More recently the way proj handled “transformation grids” changed, which was the source of the complaint.

Exploring Proj

The first thing to do in debugging this “PostGIS problem” was to establish if it was in fact a PostGIS problem, or a problem in proj. There are commandline tools in proj to query what pipelines the system will use for a transform, and what the effect on coordinates will be, so I can take PostGIS right out of the picture.

We can run the query on the commandline:

echo 545068 258591 8.51 | cs2cs 'EPSG:7405' 'EPSG:4979'

Which returns:

52d12'23.241"N  0d7'17.603"E 8.510

So directly using proj we are seeing the same problem as in PostGIS SQL: no change in the vertical dimension, it goes in at 8.51 and comes out at 8.51. So the problem is not PostGIS, it is proj.

Transformation Grids

Cartographic transformations are nice deterministic functions, they take in a longitude and latitude and spit out an X and a Y.

(x,y) = f(theta, phi)
(theta, phi) = finv(x, y)

But not all transformations are cartographic transformations; some are transformations between geographic reference systems. And many of those are lumpy and kind of random.

For example, the North American 1927 Datum (NAD27) was built from classic survey techniques, starting from the “middle” (Kansas) and working outwards, chain by chain, sighting by sighting. The North American 1983 Datum (NAD83) was built with the assistance of the first GPS units. The accumulated errors of survey over distance are not deterministic, they are kind of lumpy and arbitrary. So the transformation from NAD27 to NAD83 is also kind of lumpy and arbitrary.

How do you represent lumpy and arbitrary transformations? With a grid! The grid says “if your observation falls in this cell, adjust it this much, in this direction”.

For the NAD27->NAD83 conversion, the NADCON grids have been around (and continuously improved) for a generation.

Here’s a picture of the horizontal deviations in the NADCON grid.

Transformations between vertical systems also frequently require a grid.

So what does this have to do with our bug? Well, the way proj gets its grids changed in version 7.

Grid History

Proj grids have always been a bit of an outlier. They are much larger than the source code itself. They are localized in interest (someone in New Zealand probably doesn't need European grids), and not everyone needs all the grids. So historically they were distributed in zip files, separately from the code.

This is all well and good, but software packagers wanted to provide a good “works right at install” experience to their end users, so they bundled up the grids into the proj packages.

As more and more people consumed proj via packages and software installers, the fact that the grids were “separate” from proj became invisible to the end users: they just download software and it works.

This was fine while the collection of grids was a manageable size. But it is not manageable any more.

In working through the GDALBarn project to improve proj, Even Rouault decided to find all the grids that various agencies had released for various places. It turns out, there are a lot more grids than proj previously bundled. Gigabytes more.

Grid CDN

Simply distributing the whole collection of grids as a default with proj was not going to work anymore.

So for proj 7, Even proposed moving to a download-on-demand model for proj grids. If a transformation request requires a grid, proj will attempt to download the necessary grid from the internet, and save it in a local cache.

Now everyone can get the very best possible transformation between systems, everywhere on the globe, as long as they are connected to the internet.

Turn It On!

Except… the network grid feature is not turned on by default! So for versions of proj higher than 7, the software ships with no grids, and the software won’t check for grids on the network… until you turn on the feature!

There are three ways to turn it on, I’m going to focus on the PROJ_NETWORK environment variable because it’s easy to toggle. Let’s look at the proj transformation pipeline from our original bug.

projinfo -s EPSG:7405 -t EPSG:4979

The projinfo utility reads out all the possible transformation pipelines, in order of desirability (accuracy), and shows what each step is. Here's the most desirable pipeline for our transform.

+proj=pipeline
  +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
        +y_0=-100000 +ellps=airy
  +step +proj=hgridshift +grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif
  +step +proj=vgridshift +grids=uk_os_OSGM15_GB.tif +multiplier=1
  +step +proj=unitconvert +xy_in=rad +xy_out=deg
  +step +proj=axisswap +order=2,1

This transform actually uses two grids! A horizontal and a vertical shift. Let’s run the shift with the network explicitly turned off.

echo 545068 258591 8.51 | PROJ_NETWORK=OFF cs2cs 'EPSG:7405' 'EPSG:4979'

52d12'23.241"N  0d7'17.603"E 8.510

Same as before, and the elevation value is unchanged. Now run with PROJ_NETWORK=ON.

echo 545068 258591 8.51 | PROJ_NETWORK=ON cs2cs 'EPSG:7405' 'EPSG:4979'

52d12'23.288"N  0d7'17.705"E 54.462

Note that the horizontal and vertical results are different with the network, because we now have access to both grids, via the CDN.

No Internet?

If you have no internet, how do you do grid shifted transforms? Well, much like in the old days of proj, you have to manually grab the grids you need. Fortunately there is a utility for that now that makes it very easy: projsync.

You can just download all the files:

projsync --all

Or you can download a subset for your area of concern:

projsync --bbox 2,49,2,49

If you don’t want to turn on network access via the environment variable, you can hunt down the proj.ini file and flip the network = on variable.

February 02, 2023 08:00 AM

January 12, 2023

Crunchy Data

Fun with Letters in PostGIS 3.3!

Working at Crunchy Data on the spatial team, I'm always looking for new features and fun things to show on live demos. I recently started playing around with ST_Letters and wanted to jot down some quick code samples for playing around with this feature, introduced in PostGIS 3.3. These examples are super easy to use, they don't need any data!

The screenshots shown below came from pgAdmin's geometry viewer and will also work with other query GUI tools like QGIS or DBeaver.

ST_Letters

Here's a simple example to get started with ST_Letters. This will work on any Postgres database, running the PostGIS extension version 3.3+.

Select ST_Letters('PostGIS');

postgis letters

It's also possible to overlay letters on a map, just like any other polygon. By default, ST_Letters results in a polygon starting at the baseline at the origin of the chosen projection, with a maximum height of 100 "units" (from the bottom of the descenders to the tops of the capitals).

letters on top

That's not ideal. We need a way to both move it and resize it.

First, we want to make a point in the middle of San Francisco to serve as a centroid for where we want to move the letters, and we also want to rescale the letters to approximately fit over the City of San Francisco. Using the approximate conversion between WGS84 degrees and meters, a scale factor of 0.001 works well enough to fit over the San Francisco Bay Area.

Next we use ST_Translate in order to move the letters from the top of the map to fit over the Bay Area. Finally, mostly because it looks cool, we use ST_Rotate to rotate the polygon 45 degrees.

WITH
san_fran_pt AS (
  SELECT ST_Point(-122.48, 37.758, 4326) AS geom),
letters AS (
  SELECT ST_Scale(ST_SetSRID(
           ST_Letters('San Francisco'), 4326),
           0.001, 0.001) AS geom),
letters_geom AS (
    SELECT ST_Translate(
            letters.geom,
            ST_X(san_fran_pt.geom) - ST_X(ST_Centroid(letters.geom)),
            ST_Y(san_fran_pt.geom) - ST_Y(ST_Centroid(letters.geom))
        ) AS geom
    FROM letters, san_fran_pt
)
SELECT ST_Rotate(geom, -pi() / 4, ST_Centroid(geom))
FROM letters_geom;

letters on map

ST_ConcaveHull demo'd with ST_Letters

A great use case for ST_Letters is for demoing PostGIS functions. In this post, I'm going to demo the function ST_ConcaveHull, which creates a concave polygon which encloses the vertices of a target geometry. ST_ConcaveHull was recently updated in PostGIS 3.3.0, in order to use GEOS 3.11, which makes the input parameters easier to understand and results in a large speed upgrade. Here's a short demo of how different parameters of param_pctconvex and param_allow_holes for ST_ConcaveHull operate on points generated by ST_GeneratePoints and ST_Letters.

First, let's generate a table of randomly generated points that fill in the letters in 'postgis'.

CREATE TABLE public.word_pts AS
WITH word AS (
  SELECT ST_Letters('postgis') AS geom
  ),
letters AS ( -- dump letter multipolygons into individual polygons
  SELECT (ST_Dump(word.geom)).geom
  FROM word
  )
SELECT
  letters.geom AS polys,
  ST_GeneratePoints(letters.geom, 100) AS pts
FROM letters;

SELECT pts FROM word_pts;

word points

Then we set a fairly high convexity target (param_pctconvex=0.75, indicating a mostly convex shape) and don't allow holes in the shape (param_allow_holes=false).

SELECT ST_ConcaveHull(pts, 0.75, false) FROM word_pts;

concave .75

Doesn't look much like 'postgis'!

Next, we reduce the convexity, but don't allow holes in the shape.

SELECT ST_ConcaveHull(pts, 0.5, false) FROM word_pts;

concave .5 false

A little better, but still hard to recognize 'postgis'. What if we allowed holes?

SELECT ST_ConcaveHull(pts, 0.5, true) FROM word_pts;

This starts to look a bit more like the word 'postgis', with the hole in 'p' being clear.

concave .5

As we make the shape more concave, it becomes more and more recognizable as 'postgis'... until it doesn't, and starts to look closer to modern art.

SELECT ST_ConcaveHull(pts, 0.35, true) FROM word_pts;

concave .35

SELECT ST_ConcaveHull(pts, 0.05, true) FROM word_pts;

concave .05

Polygons too!

ST_ConcaveHull is also useful on multipolygons, and exhibits the same behavior as demo'd on multipoints. It's important to note that if there are already holes in the existing multipolygon, setting param_allow_holes=false will still create convex polygons with "holes" in the middle, following the original polygon. The concave hulls will always contain the original polygons!

SELECT ST_ConcaveHull(ST_Letters('postgis'), 0.5, false);

concave .5

As the convexity decreases and holes are allowed, the shape looks more and more like the original polygons in the original table.

SELECT ST_ConcaveHull(ST_Letters('postgis'), 0.1, true);

Concave .1

ST_TriangulatePolygon

The last demo here is the function ST_TriangulatePolygon, new in PostGIS 3.3. This function computes the "best quality" triangulation of a polygon (and works on multipolygons too!). This can be extremely useful for computing meshes of polygons in a quick and efficient manner.

SELECT ST_TriangulatePolygon(ST_Letters('postgis'));

ST_TriangulatePolygon

Summary

ST_Letters provides a useful starting point for demoing functions on points and polygons. The new improvements in ST_ConcaveHull make it more useful for generating concave hulls of geometries and they are significantly more intuitive to use. ST_TriangulatePolygon can be useful for finding meshes of polygons and multipolygons. The team at Crunchy Data will continue to make important contributions to PostGIS in order to help our users create interesting and innovative open source solutions!

by Jacob Coblentz (Jacob.Coblentz@crunchydata.com) at January 12, 2023 03:00 PM

January 06, 2023

Crunchy Data

Timezone Transformation Using Location Data & PostGIS

Imagine your system captures event data from all over the world, but the data all comes in UTC, giving no information about the local timing of an event. How can we quickly convert between the UTC timestamp and the local time zone using the GPS location? We can quickly solve this problem using PostgreSQL and PostGIS.

This example assumes you have a Postgres database running with PostGIS. If you’re new to PostGIS, see PostGIS for Newbies.

Steps we will follow

  1. Timezone shape file overview: For the world timezone shape file, I have been following a really nice project by Evan Siroky, Timezone Boundary Builder. We'll download the timezones-with-oceans.shapefile.zip from this location.
  2. Load shape file: Using shp2pgsql, convert the shape file to SQL to create the timezones_with_oceans table.
  3. PostgreSQL internal view pg_timezone_names: Understand pg_timezone_names view.
  4. Events table and insert sample data: Create events table and insert sample data.
  5. Transformation Query: Transform event UTC timestamp to event local timestamp.

Overview of data relationship

Below is an overview of the data relationship and join conditions we will be using.

timzone_flow.png

Timezone Shape file Overview

A "shape file" commonly refers to a collection of files with .shp, .shx, .dbf, and other extensions on a common prefix name, which in our case is combined-shapefile-with-oceans.*. combined-shapefile-with-oceans contains polygons with the boundaries of the world's timezones. With this data we can start our process.

Load Shape file

We will be using shp2pgsql to generate a SQL file from the shape file, which creates the public.timezones_with_oceans table and inserts the data. The table contains the fields gid, tzid, and geom.

Export the Host, user, password variables

export PGHOST=p.<pgcluster name>.db.postgresbridge.com
export PGUSER=<DBUser>
export PGPASSWORD=<dbPassword>

Create sql file from shape file

shp2pgsql -s 4326 "combined-shapefile-with-oceans.shp" public.timezones_with_oceans > timezone_shape.sql

Create public.timezones_with_oceans and load the timezone data

psql -d timestamp -f timezone_shape.sql

Query a bit of sample data

SELECT tzid, ST_AsText(geom), geom FROM public.timezones_with_oceans LIMIT 10;

Visualize Sample data

sample data

Using pgAdmin, highlight the geom column and click on the eye icon to visualize the geometry on the map shown below.

Geometry in pgAdmin

PostgreSQL internal view pg_timezone_names

PostgreSQL provides the pg_timezone_names view with the list of time zone names recognized by SET TIMEZONE. It also provides the associated abbreviations, UTC offsets, and daylight-savings status, which we need to know.

pg_timezone_names view columns description

Column      Type      Description
----------  --------  ---------------------------------------------------
name        text      Time zone name
abbrev      text      Time zone abbreviation
utc_offset  interval  Offset from UTC (positive means east of Greenwich)
is_dst      bool      True if currently observing daylight savings
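
To peek at the view yourself, a quick query (mine, not from the original post) returns the columns we care about:

SELECT name, abbrev, utc_offset, is_dst
FROM pg_timezone_names
ORDER BY name
LIMIT 10;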

pg_timezone sample data

Sample data

Events table and insert sample data

Now that we have the timezone shape file loaded, we can create an event table, load sample transaction data, and apply a timestamp conversion transformation query.

CREATE TABLE IF NOT EXISTS public.events
(
    event_id bigint NOT NULL,
    eventdatetime timestamp without time zone NOT NULL,
    event_type varchar(25) NOT NULL,
    latitude double precision NOT NULL,
    longitude double precision NOT NULL,
    CONSTRAINT events_pkey PRIMARY KEY (event_id)
);

INSERT INTO public.events(
	event_id, eventdatetime, event_type, latitude, longitude)
	VALUES (10086492,'2021-08-17 23:17:05','Walking',34.894089,-86.51148),
(50939,'2021-08-19 10:27:12','Hiking',34.894087,-86.511484),
(10086521,'2021-09-09 19:32:37','Swiming',34.642584,-86.761291),
(22465493,'2021-09-30 11:43:34','Swiming',33.611151,-86.799522),
(22465542,'2021-11-26 22:40:44.197','Swiming',34.64259,-86.761452),
(22465494,'2021-09-30 11:43:34','Hiking',33.611151,-86.799522),
(10087348,'2021-07-01 13:42:15','Swiming',25.956098,-97.535303),
(22466679,'2021-09-01 12:25:06','Hiking',25.956112,-97.535304),
(22466685,'2021-09-02 13:41:07','Swiming',25.956102,-97.535305),
(10088223,'2021-11-29 13:19:53','Hiking',25.956097,-97.535303),
(22246192,'2021-06-16 22:21:23','Walking',37.083726,-113.577984),
(9844188,'2021-06-23 20:18:43','Swiming',37.1067,-113.561401),
(22246294,'2021-06-25 21:50:06','Walking',37.118719,-113.598038),
(22246390,'2021-07-01 18:15:54','Hiking',37.109579,-113.562923),
(9844332,'2021-07-04 19:11:13','Walking',37.251538,-113.614708),
(9845242,'2021-11-04 13:25:40.425','Swiming',37.251542,-113.614699),
(84843,'2021-11-23 14:33:20','Swiming',37.251541,-113.614698),
(22247674,'2021-12-21 14:31:15','Swiming',37.251545,-113.614691),
(22246714,'2021-08-09 14:46:51','Swiming',37.109597,-113.562912),
(9845116,'2021-10-18 14:59:51','Swiming',37.082777,-113.554991);

Sample Event Data

event data

Transformation Query

Now we can convert the UTC timestamp to the local time for an event. Using the PostGIS function ST_Intersects, we can find the timezones_with_oceans.geom polygon in which an event point lies. This gives the name of the timezone where the event occurred. To create our transformation query:

  • First we create the location geometry using Longitude and Latitude from the events table.

  • Using the PostGIS function ST_Intersects, we find the timezones_with_oceans.geom polygon that intersects an event's location geometry, giving us the timezone where the event occurred.

  • Join pg_timezone_names to timezones_with_oceans on name and tzid respectively, to retrieve the abbrev, utc_offset, and is_dst fields from pg_timezone_names.

  • Using the PostgreSQL AT TIME ZONE operator and pg_timezone_names, we convert the UTC event timestamp to the local event timestamp, completing the process, e.g.

    timestamp with time zone '2021-07-05 00:59:12+00' at time zone 'America/Denver' → 2021-07-04 18:59:12

Transformation Query SQL:

SELECT event_id, latitude, longitude, abbrev,
       utc_offset, is_dst, eventdatetime,
       ((eventdatetime::timestamp WITH TIME ZONE AT TIME ZONE abbrev)::timestamp WITH TIME ZONE)
           AS eventdatetime_local
FROM public.events
JOIN timezones_with_oceans ON ST_Intersects(ST_Point(longitude, latitude, 4326), geom)
JOIN pg_timezone_names ON tzid = name;

local timezone data

Closing Thoughts

PostgreSQL and PostGIS allow you to easily and dynamically solve timezone transformation. I hope this blog was helpful, and we at Crunchy Data wish you happy learning.

by Rekha Khandhadia (Rekha.Khandhadia@crunchydata.com) at January 06, 2023 03:00 PM

December 04, 2022

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Day 2022 Videos are out and some more highlights

Elizabeth Christensen already gave a succinct summary of some of the PostGIS Day 2022 presentations, which you can see here.

There were many case study presentations which involved use of PostGIS, QGIS, OpenStreetMap, and pgRouting (and other extensions that extend PostGIS) as well as many "How to" videos. There were also talks on "How PostGIS is made". I'll highlight some of these, which overlap with Elizabeth's list but from a different angle of view.

Continue reading "PostGIS Day 2022 Videos are out and some more highlights"

by Regina Obe (nospam@example.com) at December 04, 2022 09:27 PM

December 02, 2022

Crunchy Data

PostGIS Day 2022

Crunchy Data hosted the 4th annual PostGIS Day on November 17, 2022. PostGIS Day always comes a day after GIS Day which occurs annually on the 3rd Wednesday of November.

We had speakers from 10 different countries and attendees from more than 70 countries.

PostGIS is the most popular spatial relational database worldwide with:

  • An extensive catalog of spatial functions
  • Rich ecosystem of in-db extensions for routing, event management, external database linkage, point clouds, rasters and more
  • Numerous web frameworks for serving tiles and vector images
  • Tools for data loading and processing
  • Easy integration with QGIS, the most popular open-source mapping software

PostGIS is in our Everyday Lives

What blew me away this year is how PostGIS touches so many parts of our everyday lives. From high speed internet, to water and flood management, there’s PostGIS supporting major parts of the infrastructure we depend on every day.

Water, Soil, Floods, and Environment

We heard a talk about the GISwater project, which is open source asset management for water and wastewater tracking. Helping humanity, bringing the world water resources, using open source tools. What could be better? This project just gives everyone all the feels.

In a similar way, my ears really perked up with the Vizzuality talk on using PostGIS for conservation efforts, tracking food supply chains, and sustainability.

We were really lucky to have two federal agencies join us as well. USDA talked about using PostGIS as the engine behind the Dynamic Soils Hub project, which tracked a number of key soil changes like erosion and farmland loss across America. A developer who works alongside the National Flood Insurance Program talked about some of his work to collect several massive open data sets and merge them into a set of building outlines for the entire country.

Telling stories about people

I also really appreciated several talks that went a little bit outside the box to talk about how PostGIS can be used in a more people-focused context. We had a nice talk about using PostGIS/QGIS to look at how you can explore infrastructure and how it impacts social inequity and health outcomes. There was also a great talk about routing people's movements outside roads, like on foot or inside buildings. Another talk went outside the box, or should I say inside the goal box, looking at how geospatial data is being collected and used in hockey and sports.

Developing nations

We were really fortunate to get a couple talks on how developers in Africa are using PostGIS. We had a talk on using QField, QGIS, and PostGIS for urban planning in Nigeria. A developer from Tanzania showed an isochrone map plugin he wrote for QGIS. Isochrone maps show distance from a certain point with time considered.

Canadian telecom

We had two talks from Canada, where they’re aiming to have the entire country covered by high speed internet by 2030. One of the talks described building a design system in QGIS and PostGIS and even training internal designers to use these tools. We had a second talk about the telecom industry, this one more focused on getting things up to scale for backing a really large enterprise.

Tooling Showcases

PostGIS Day also had a really broad set of talks about ecosystem tools. Seeing these all in total, you realize how big the world of PostGIS is and how many people are building large scale applications out of geospatial data.

PostGIS at scale

  • pgSTAC, an asset catalog used by the Microsoft Planetary Computer and NASA projects
  • Open Street Maps loading, tuning and tips for performance
  • dbt data loading and processing pipeline framework
  • pg_eventserv, a service for pushing real-time updates out to web sockets

PostGIS functions and tooling

Ecosystem projects

Thanks to everyone who participated this year! If you missed it, plan to join us next year. Mark your calendar, we’ll open a call for papers in September of 2023. Videos are linked above or available as a full playlist on Crunchy Data’s YouTube channel.

PostGIS at Crunchy Data

PostGIS is a supported extension with Crunchy Certified Postgres and is packaged with our products on traditional infrastructure, Kubernetes, and our fully managed Crunchy Bridge. Crunchy Data supports a variety of enterprise clients using PostGIS in production. If you’re using PostGIS and are looking for a host or supported distribution, contact us, we’d love to chat.

by Elizabeth Christensen (Elizabeth.Christensen@crunchydata.com) at December 02, 2022 03:00 PM

November 19, 2022

Boston GIS (Regina Obe, Leo Hsu)

Post GIS Day 2022 Celebrations

I didn't think the last two years' PostGIS Day conferences could be topped, but I was wrong. This year's was absolutely fabulous and better than all the others. We had two key conferences going on this year (yesterday). The first was the Crunchy Data PostGIS Day 2022 12-hour marathon of nothing but PostGIS related talks. Many thanks to Elizabeth Christensen and Paul Ramsey for putting it all together and keeping it going for 12 hours. The PostGIS day celebrations climaxed with many of us playing capture the flag on Paul's new shiny invention until we managed to crash it.

Continue reading "Post GIS Day 2022 Celebrations"

by Regina Obe (nospam@example.com) at November 19, 2022 12:29 AM

November 13, 2022

PostGIS Development

PostGIS 3.3.2, 3.2.4, 3.1.8, 3.0.8 Patch Releases

The PostGIS development team is pleased to provide security, bug fixes and performance enhancements 3.3.2, 3.2.4, 3.1.8 and 3.0.8 for the 3.3, 3.2, 3.1, and 3.0 stable branches.

by Regina Obe at November 13, 2022 12:00 AM

November 12, 2022

PostGIS Development

PostGIS 2.5.9 EOL

The PostGIS Team is pleased to release PostGIS 2.5.9! This is the end-of-life release of the 2.5 branch.

This release is a bug fix release, addressing issues found in the previous 2.5 releases.

by Regina Obe at November 12, 2022 10:00 AM

October 24, 2022

Crunchy Data

Moving Objects and Geofencing with Postgres & PostGIS

In a recent post, we introduced pg_eventserv and the real-time web notifications from database actions.

In this post, we will dive into a practical use case: displaying state, calculating events, and tracking historical location for a set of moving objects.

Screenshot

This demonstration uses pg_eventserv for eventing, pg_featureserv for the external web API, and OpenLayers as the map API, to build a small example application that shows off the common features of moving objects systems.

Try it out!

Features

Moving objects applications can be very complex or very simple, but they usually include a few common baseline features:

  • A real-time view of the state of the objects.
  • Live notifications when objects enter and leave a set of "geofences".
  • Querying the history of the system, to see where objects have been, and to summarize their state (e.g., "truck 513 spent 50% of its time in the yard").

Tables

The model has three tables:

Tables

  • objects where the current location of the objects is stored.
  • geofences where the geofences are stored.
  • objects_history where the complete set of all object locations are stored.
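
The post doesn't spell out the table definitions at this point, but the trigger functions below reference their columns, so a minimal sketch can be reconstructed (column types and constraints here are assumptions, not the project's actual schema):

CREATE SCHEMA IF NOT EXISTS moving;

CREATE TABLE moving.objects (
    id     integer PRIMARY KEY,
    geog   geography(Point, 4326),   -- current location
    ts     timestamptz DEFAULT now(),
    color  text,
    props  jsonb,
    fences integer[]                 -- ids of geofences currently containing the object
);

CREATE TABLE moving.geofences (
    id    integer PRIMARY KEY,
    label text,
    geog  geography(Polygon, 4326)
);

CREATE TABLE moving.objects_history (
    id    integer REFERENCES moving.objects (id),
    geog  geography(Point, 4326),
    ts    timestamptz,
    props jsonb
);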

Architecture (in brief)

From the outside, the system has the following architecture:

Architecture

Changes to objects are communicated in via a web API backed by pg_featureserv; those changes fire a set of triggers that generate events, which pg_eventserv pushes out to listening clients via WebSockets.

Architecture (in detail)

  • The user interface generates object movements, via the arrow buttons for each object. This is in lieu of a real "moving object" fleet in the real world generating timestamped GPS tracks.

  • Every movement click on the UI fires a call to a web API, which is just a function published via pg_featureserv, object_move(object_id, direction).

postgisftw.object_move(object_id, direction)
CREATE OR REPLACE FUNCTION postgisftw.object_move(
    move_id integer, direction text)
RETURNS TABLE(id integer, geog geography)
AS $$
DECLARE
  xoff real = 0.0;
  yoff real = 0.0;
  step real = 2.0;
BEGIN

  yoff := CASE
    WHEN direction = 'up' THEN 1 * step
    WHEN direction = 'down' THEN -1 * step
    ELSE 0.0 END;

  xoff := CASE
    WHEN direction = 'left' THEN -1 * step
    WHEN direction = 'right' THEN 1 * step
    ELSE 0.0 END;

  RETURN QUERY UPDATE moving.objects mo
    SET geog = ST_Translate(mo.geog::geometry, xoff, yoff)::geography,
        ts = now()
    WHERE mo.id = move_id
    RETURNING mo.id, mo.geog;

END;
$$
LANGUAGE 'plpgsql' VOLATILE;
  • The object_move(object_id, direction) function just converts the "direction" parameter into a movement vector, and UPDATES the relevant row of the objects table.

  • The change to the objects table fires off the objects_geofence() trigger, which calculates the fences the object is now in.

objects_geofence()
CREATE FUNCTION objects_geofence() RETURNS trigger AS $$
    DECLARE
        fences_new integer[];
    BEGIN
        -- Add the current geofence state to the input
        -- tuple every time.
        SELECT coalesce(array_agg(id), ARRAY[]::integer[])
            INTO fences_new
            FROM moving.geofences
            WHERE ST_Intersects(geofences.geog, new.geog);

        RAISE DEBUG 'fences_new %', fences_new;
        -- Ensure geofence state gets saved
        NEW.fences := fences_new;
        RETURN NEW;
    END;

$$
LANGUAGE 'plpgsql';
  • The change to the objects table then fires off the objects_update() trigger, which:
    • Compares the current set of geofences to the previous set, and thus detects any enter/leave events.
    • Adds the new location of the object to the objects_history tracking table.
    • Composes the new location and any geofence events into a JSON object and puts it into the "objects" NOTIFY queue using pg_notify().
objects_update()
CREATE FUNCTION objects_update() RETURNS trigger AS
$$

    DECLARE
        channel text := 'objects';
        fences_old integer[];
        fences_entered integer[];
        fences_left integer[];
        events_json jsonb;
        location_json jsonb;
        payload_json jsonb;
    BEGIN
        -- Place a copy of the value into the history table
        INSERT INTO moving.objects_history (id, geog, ts, props)
            VALUES (NEW.id, NEW.geog, NEW.ts, NEW.props);

        -- Clean up any nulls
        fences_old := coalesce(OLD.fences, ARRAY[]::integer[]);
        RAISE DEBUG 'fences_old %', fences_old;

        -- Compare to previous fences state
        fences_entered = NEW.fences - fences_old;
        fences_left = fences_old - NEW.fences;

        RAISE DEBUG 'fences_entered %', fences_entered;
        RAISE DEBUG 'fences_left %', fences_left;

        -- Form geofence events into JSON for notify payload
        WITH r AS (
        SELECT 'entered' AS action,
            g.id AS geofence_id,
            g.label AS geofence_label
        FROM moving.geofences g
        WHERE g.id = ANY(fences_entered)
        UNION
        SELECT 'left' AS action,
            g.id AS geofence_id,
            g.label AS geofence_label
        FROM moving.geofences g
        WHERE g.id = ANY(fences_left)
        )
        SELECT json_agg(row_to_json(r))
        INTO events_json
        FROM r;

        -- Form notify payload
        SELECT json_build_object(
            'type', 'objectchange',
            'object_id', NEW.id,
            'events', events_json,
            'location', json_build_object(
                'longitude', ST_X(NEW.geog::geometry),
                'latitude', ST_Y(NEW.geog::geometry)),
            'ts', NEW.ts,
            'color', NEW.color,
            'props', NEW.props)
        INTO payload_json;

        RAISE DEBUG '%', payload_json;

        -- Send the payload out on the channel
        PERFORM (
            SELECT pg_notify(channel, payload_json::text)
        );

        RETURN NEW;
    END;

$$
LANGUAGE 'plpgsql';
  • pg_eventserv picks the event off the NOTIFY queue and pushes it out to all listening clients over WebSockets.
  • The user interface receives the JSON payload, parses it, and applies the new location to the appropriate object. If there is an enter/leave event on a geofence, the UI also changes the geofence outline color appropriately.
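
(An aside of mine, not from the post: you can watch the same NOTIFY traffic without pg_eventserv by listening on the channel directly in psql.)

-- Listen on the channel the triggers publish to; psql prints the JSON
-- payload of each notification it receives after the next command completes.
LISTEN objects;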

Phew! That's a lot!

  • Side note: the geofences table also has a trigger, layer_change(), that catches insert/update/delete events and publishes a JSON notification with pg_notify(). This is also pushed out by pg_eventserv, and when the UI receives it, it simply forces a full re-load of the geofence data.
layer_change()
CREATE FUNCTION layer_change() RETURNS trigger AS
$$

    DECLARE
        layer_change_json json;
        channel text := 'objects';
    BEGIN
        -- Tell the client what layer changed and how
        SELECT json_build_object(
            'type', 'layerchange',
            'layer', TG_TABLE_NAME::text,
            'change', TG_OP)
          INTO layer_change_json;

        RAISE DEBUG 'layer_change %', layer_change_json;
        PERFORM (
            SELECT pg_notify(channel, layer_change_json::text)
        );
        RETURN NEW;
    END;

$$
LANGUAGE 'plpgsql';

OK, all done.

Trying It Out Yourself

All the code and instructions are available in the moving objects example of pg_eventserv.

Conclusion

  • Moving objects are a classic case of "system state stored in the database".
  • PostgreSQL provides the LISTEN/NOTIFY system to update clients about real-time changes.
  • The pg_eventserv service allows you to push LISTEN/NOTIFY events further out to any WebSockets client and generate a moving object map.
  • Because the state is managed in the database, storing the historical state of the system is trivially easy.
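
For example, the "truck 513 spent 50% of its time in the yard" style of question from the feature list can be answered straight from the history table. A rough sketch of mine (the 'Yard' label and object id are hypothetical, and counting points only approximates time if positions are sampled evenly):

SELECT (count(*) FILTER (WHERE ST_Intersects(g.geog, h.geog)))::float
       / count(*) AS fraction_in_yard
FROM moving.objects_history h
JOIN moving.geofences g ON g.label = 'Yard'
WHERE h.id = 513;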

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at October 24, 2022 03:00 PM

October 17, 2022

Boston GIS (Regina Obe, Leo Hsu)

Using pg_upgrade to upgrade PostgreSQL 9.6 PostGIS 2.4 to PostgreSQL 15 3.3 on Yum

PostgreSQL 15 came out just last week. To celebrate the arrival of PostgreSQL 15, I will revisit the number one problem people have with PostGIS, how to upgrade your PostGIS enabled cluster, without installing an old PostGIS version.

In a previous article, Using pg_upgrade to upgrade PostGIS without installing an older version, I demonstrated a trick for upgrading to a newer PostgreSQL instance from PostGIS 2.2 - 2.whatever without having to install the older version of PostGIS in your new PostgreSQL service. This is a revisit of that article, but with considerations for upgrading from a 2 series to a 3 series.

Fear not. Going from PostGIS 2+ to PostGIS 3+ can still be done without installing the old PostGIS 2+ in your new cluster. Additionally once you are on PostGIS 3.1+, you should never have to do symlink or copy hacks to upgrade say PostGIS 3.1 to PostGIS 3.4. At least not for the same major version.

If per chance you are on PostGIS 1 and trying to jump all the way to PostGIS 3+, then sadly you need to do a pg_dump of your old database/cluster and pg_restore into your new cluster.

Continue reading "Using pg_upgrade to upgrade PostgreSQL 9.6 PostGIS 2.4 to PostgreSQL 15 3.3 on Yum"

by Regina Obe (nospam@example.com) at October 17, 2022 01:15 AM

October 13, 2022

Martin Davis (Lin.ear th.inking)

Relational Properties of DE-9IM spatial predicates

There is an elegant mathematical theory of binary relations.  Homogeneous relations are an important subclass of binary relations in which both domains are the same.  A homogeneous relation R is a subset of all ordered pairs (x,y) with x and y elements of the domain.  This can be thought of as a boolean-valued function R(x,y), which is true if the pair has the relationship and false if not.

The restricted structure of homogeneous relations allows describing them by various properties, including:

Reflexive - R(x, x) is always true

Irreflexive - R(x, x) is never true

Symmetric - if R(x, y), then R(y, x)

Antisymmetric - if R(x, y) and R(y, x), then x = y

Transitive - if R(x, y) and R(y, z), then R(x, z)

The Dimensionally Extended 9-Intersection Model (DE-9IM) represents the topological relationship between two geometries.  

Various useful subsets of spatial relationships are specified by named spatial predicates.  These are the basis for spatial querying in many spatial systems including the JTS Topology Suite, GEOS and PostGIS.

The spatial predicates are homogeneous binary relations over the Geometry domain. They can thus be categorized in terms of the relational properties above. The following table shows the properties of the standard predicates:

Predicate   Reflexive / Irreflexive   Symmetric / Antisymmetric   Transitive
----------  ------------------------  --------------------------  ----------
Equals      R                         S                           T
Intersects  R                         S                           -
Disjoint    R                         S                           -
Contains    R                         A                           -
Within      R                         A                           -
Covers      R                         A                           T
CoveredBy   R                         A                           T
Crosses     I                         S                           -
Overlaps    I                         S                           -
Touches     I                         S                           -

Notes

  • Contains and Within are not Transitive because of the quirk that "Polygons do not contain their Boundary" (explained in this post).  A counterexample is a Polygon that contains a LineString lying in its boundary and interior, with the LineString containing a Point that lies in the Polygon boundary.  The Polygon does not contain the Point.  So Contains(poly, line) = true and Contains(line, pt) = true, but Contains(poly, pt) = false.   The predicates Covers and CoveredBy do not have this idiosyncrasy, and thus are transitive.
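
This quirk is easy to check directly in PostGIS. The geometries below are my own construction of the counterexample described above (any similar polygon/line/point arrangement works):

WITH shapes AS (
  SELECT 'POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'::geometry AS poly,
         'LINESTRING(0 0, 0 5, 5 5)'::geometry AS line,  -- runs along the boundary, then into the interior
         'POINT(0 2)'::geometry AS pt                    -- interior point of the line, on the polygon boundary
)
SELECT ST_Contains(poly, line) AS poly_contains_line,  -- true
       ST_Contains(line, pt)   AS line_contains_pt,    -- true
       ST_Contains(poly, pt)   AS poly_contains_pt,    -- false: transitivity fails
       ST_Covers(poly, pt)     AS poly_covers_pt       -- true: Covers does not have the quirk
FROM shapes;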

by Dr JTS (noreply@blogger.com) at October 13, 2022 10:03 PM

September 30, 2022

Anita Graser (Underdark)

Detecting close encounters using MobilityDB 1.0

It’s been a while since we last talked about MobilityDB in 2019 and 2020. Since then, the project has come a long way. It joined OSGeo as a community project and formed a first PSC, including the project founders Mahmoud Sakr and Esteban Zimányi as well as Vicky Vergara (of pgRouting fame) and yours truly.

This post is a quick teaser tutorial from zero to computing closest points of approach (CPAs) between trajectories using MobilityDB.

Setting up MobilityDB with Docker

The easiest way to get started with MobilityDB is to use the ready-made Docker container provided by the project. I'm using Docker and WSL (Windows Subsystem for Linux on Windows 10) here. Installing WSL/Docker is out of scope of this post. Please refer to the official documentation for your operating system.

Once Docker is ready, we can pull the official container and fire it up:

docker pull mobilitydb/mobilitydb
docker volume create mobilitydb_data
docker run --name "mobilitydb" -d -p 25432:5432 -v mobilitydb_data:/var/lib/postgresql mobilitydb/mobilitydb
psql -h localhost -p 25432 -d mobilitydb -U docker

Currently, the container provides PostGIS 3.2 and MobilityDB 1.0:

Loading movement data into MobilityDB

Once the container is running, we can already connect to it from QGIS. This is my preferred way to load data into MobilityDB because we can simply drag-and-drop any timestamped point layer into the database:

For this post, I’m using an AIS data sample in the region of Gothenburg, Sweden.

After loading this data into a new table called ais, it is necessary to remove duplicates and convert timestamps:

CREATE TABLE AISInputFiltered AS
SELECT DISTINCT ON("MMSI","Timestamp") *
FROM ais;

ALTER TABLE AISInputFiltered ADD COLUMN t timestamp;
UPDATE AISInputFiltered SET t = "Timestamp"::timestamp;

Afterwards, we can create the MobilityDB trajectories:

CREATE TABLE Ships AS
SELECT "MMSI" mmsi,
tgeompoint_seq(array_agg(tgeompoint_inst(Geom, t) ORDER BY t)) AS Trip,
tfloat_seq(array_agg(tfloat_inst("SOG", t) ORDER BY t) FILTER (WHERE "SOG" IS NOT NULL) ) AS SOG,
tfloat_seq(array_agg(tfloat_inst("COG", t) ORDER BY t) FILTER (WHERE "COG" IS NOT NULL) ) AS COG
FROM AISInputFiltered
GROUP BY "MMSI";

ALTER TABLE Ships ADD COLUMN Traj geometry;
UPDATE Ships SET Traj = trajectory(Trip);

Once this is done, we can load the resulting Ships layer and the trajectories will be loaded as lines:

Computing closest points of approach

To compute the closest point of approach between two moving objects, MobilityDB provides a shortestLine function. To be precise, this function computes the line connecting the points of nearest approach between the two tgeompoint_seq trajectories. In addition, we can use the time-weighted average function twavg to compute representative average movement speeds and eliminate stationary or very slowly moving objects:

SELECT S1.MMSI mmsi1, S2.MMSI mmsi2, 
       shortestLine(S1.trip, S2.trip) Approach,
       ST_Length(shortestLine(S1.trip, S2.trip)) distance
FROM Ships S1, Ships S2
WHERE S1.MMSI > S2.MMSI AND
twavg(S1.SOG) > 1 AND twavg(S2.SOG) > 1 AND
dwithin(S1.trip, S2.trip, 0.003)

In the QGIS Browser panel, we can right-click the MobilityDB connection to bring up an SQL input using Execute SQL:

The resulting query layer shows where moving objects get close to each other:

To better see what’s going on, we’ll look at individual CPAs:

Having a closer look with the Temporal Controller

Since our filtered AIS layer has proper timestamps, we can animate it using the Temporal Controller. This enables us to replay the movement and see what was going on in a certain time frame.

I let the animation run and stopped it once I spotted a close encounter. Looking at the AIS points and the shortest line, we can see that MobilityDB computed the CPAs along the trajectories:

A more targeted way to investigate a specific CPA is to use the Temporal Controller's fixed temporal range mode to jump to a specific time frame. This is helpful if we already know the time frame we are interested in. For the CPA use case, this means that we can look up the timestamp of a nearby AIS position and set up the Temporal Controller accordingly:

More

I hope you enjoyed this quick dive into MobilityDB. For more details, including talks by the project founders, check out the project website.


This post is part of a series. Read more about movement data in GIS.

by underdark at September 30, 2022 04:24 PM

September 10, 2022

PostGIS Development

PostGIS 3.3.1

The PostGIS Team is pleased to release PostGIS 3.3.1.

This is a bug fix release to address an issue compiling against PostgreSQL 15 Beta 4.

Best served with PostgreSQL 15 Beta 4.

by Regina Obe at September 10, 2022 12:00 AM

August 27, 2022

PostGIS Development

PostGIS 3.3.0 Released

The PostGIS Team is pleased to release PostGIS 3.3.0.

by Regina Obe at August 27, 2022 12:00 AM

August 22, 2022

PostGIS Development

PostGIS 3.3.0rc2

The PostGIS Team is pleased to release PostGIS 3.3.0rc2! Best Served with PostgreSQL 15 beta3 ,GEOS 3.11.0 , and SFCGAL 1.4.1

Lower versions of the aforementioned dependencies will not have all new features.

This release supports PostgreSQL 11-15.

3.3.0rc2

This release is a release candidate of a major release, it includes bug fixes since PostGIS 3.2.3.

by Regina Obe at August 22, 2022 12:00 AM

August 18, 2022

PostGIS Development

PostGIS 3.2.3, 3.1.7, 3.0.7, 2.5.8 Released

The PostGIS development team is pleased to provide bug fix and performance enhancements 3.2.3, 3.1.7, 3.0.7, 2.5.8 for the 3.2, 3.1, 3.0, 2.5 stable branches.

by Regina Obe at August 18, 2022 12:00 AM

August 15, 2022

Crunchy Data

Rise of the Anti-Join

Find me all the things in set "A" that are not in set "B".

This is a pretty common query pattern, and it occurs in both non-spatial and spatial situations. As usual, there are multiple ways to express this query in SQL, but only a couple queries will result in the best possible performance.

Setup

The non-spatial setup starts with two tables with the numbers 1 to 1,000,000 in them, then deletes two records from one of the tables.

CREATE TABLE a AS SELECT generate_series(1,1000000) AS i
ORDER BY random();
CREATE INDEX a_x ON a (i);

CREATE TABLE b AS SELECT generate_series(1,1000000) AS i
ORDER BY random();
CREATE INDEX b_x ON b (i);

DELETE FROM b WHERE i = 444444;
DELETE FROM b WHERE i = 222222;

ANALYZE;

The spatial setup is a 2M record table of geographic names, and a 3K record table of county boundaries. Most of the geonames are inside counties (because we tend to name things on land) but some of them are not (because sometimes we name things in the ocean, or our boundaries are not detailed enough).
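
The post doesn't show the spatial setup commands. Assuming tables named geonames and counties with a geom column (the names used in the spatial query later), the relevant preparation is just spatial indexes and fresh statistics:

-- Hypothetical setup, mirroring the non-spatial case above
CREATE INDEX geonames_geom_x ON geonames USING GIST (geom);
CREATE INDEX counties_geom_x ON counties USING GIST (geom);
ANALYZE geonames;
ANALYZE counties;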

Subqueries? No.

Since the problem statement includes the words "not in", this form of the query seems superficially plausible:

SELECT i
  FROM a
  WHERE i NOT IN (SELECT i FROM b);

Perfect! Give me everything from "A" that is not in "B"! Just what we want, right? In fact, running the query takes so long that I never got it to complete. The EXPLAIN output gives some hints.

                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather  (cost=1000.00..5381720008.33 rows=500000 width=4)
   Workers Planned: 2
   ->  Parallel Seq Scan on a  (cost=0.00..5381669008.33 rows=208333 width=4)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..23331.97 rows=999998 width=4)
                 ->  Seq Scan on b  (cost=0.00..14424.98 rows=999998 width=4)

Note that the subquery ends up materializing the whole second table into memory, where it is scanned over and over and over to test each key in table "A". Not good.

Except? Maybe.

PostgreSQL supports some set-based key words that allow you to find logical combinations of queries: UNION, INTERSECT and EXCEPT.

Here, we can make use of EXCEPT.

SELECT a.i FROM a
EXCEPT
SELECT b.i FROM b;

The SQL still matches our mental model of the problem statement: everything in "A" except for everything in "B".

And it returns a correct answer in about 2.3 seconds.

   i
--------
 222222
 444444
(2 rows)

The query plan is interesting: the two tables are appended and then sorted for duplicates and then only the non-duplicates are emitted!

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 SetOp Except  (cost=322856.41..332856.40 rows=1000000 width=8)
   ->  Sort  (cost=322856.41..327856.41 rows=1999998 width=8)
         Sort Key: "*SELECT* 1".i
         ->  Append  (cost=0.00..58849.95 rows=1999998 width=8)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..24425.00 rows=1000000 width=8)
                     ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=4)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..24424.96 rows=999998 width=8)
                     ->  Seq Scan on b  (cost=0.00..14424.98 rows=999998 width=4)

It's a big hammer, but it works.

Anti-Join? Yes.

The best approach is the "anti-join". One way to express an anti-join is with a special "correlated subquery" syntax:

SELECT a.i
  FROM a
  WHERE NOT EXISTS
    (SELECT b.i FROM b WHERE a.i = b.i);

So this returns rows from "A" only where the correlated subquery against "B" returns no records.

It takes about 850 ms on my test laptop, so 3 times faster than using EXCEPT in this test, and gets the right answer. The query plan looks like this:

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather  (cost=16427.98..31466.36 rows=2 width=4)
   Workers Planned: 2
   ->  Parallel Hash Anti Join  (cost=15427.98..30466.16 rows=1 width=4)
         Hash Cond: (a.i = b.i)
         ->  Parallel Seq Scan on a  (cost=0.00..8591.67 rows=416667 width=4)
         ->  Parallel Hash  (cost=8591.66..8591.66 rows=416666 width=4)
               ->  Parallel Seq Scan on b  (cost=0.00..8591.66 rows=416666 width=4)

The same sentiment can be expressed without the NOT EXISTS construct, using only basic SQL and a LEFT JOIN:

SELECT a.i
  FROM a
  LEFT JOIN b ON (a.i = b.i)
  WHERE b.i IS NULL;

This also takes about 850 ms.

The LEFT JOIN is required to return a record for every row of "A". So what does it do if there's no record in "B" that satisfies the join condition? It returns NULL for the columns of "B" in the join relation for those records. That means any row with a NULL in a column of "B" that is normally non-NULL is a record in "A" that is not in "B".

Now do Spatial

The nice thing about the LEFT JOIN expression of the solution is that it generalizes nicely to arbitrary join conditions, like those using spatial predicate functions.

"Find the geonames points that are not inside counties"... OK, we will LEFT JOIN geonames with counties and find the records where counties are NULL.

SELECT g.name, g.geonameid, g.geom
  FROM geonames g
  LEFT JOIN counties c
    ON ST_Contains(c.geom, g.geom)
  WHERE c.geom IS NULL;

The answer pops out in about a minute.

Unsurprisingly, that's about how long a standard inner join takes to associate the 2M geonames with the 3K counties, since the anti-join has to do about that much work to determine which records do not match the join condition.

Conclusion

  • "Find the things in A that aren't in B" is a common use pattern.
  • The "obvious" SQL patterns might not be the most efficient ones.
  • The WHERE NOT EXISTS and LEFT JOIN patterns both result in the most efficient query plans and executions.

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at August 15, 2022 12:00 PM

August 08, 2022

PostGIS Development

PostGIS 3.3.0rc1

The PostGIS Team is pleased to release PostGIS 3.3.0rc1! Best Served with PostgreSQL 15 beta2 ,GEOS 3.11.0 , and SFCGAL 1.4.1

Lower versions of the aforementioned dependencies will not have all new features.

This release supports PostgreSQL 11-15.

3.3.0rc1

This release is a release candidate of a major release, it includes bug fixes since PostGIS 3.2.2 and new features.

August 08, 2022 12:00 AM

July 23, 2022

PostGIS Development

PostGIS 3.2.2

The PostGIS Team is pleased to release PostGIS 3.2.2! This release works for PostgreSQL 9.6-15.

3.2.2

This release is a bug fix release, addressing issues found in the previous 3.2 releases.

July 23, 2022 12:00 AM

July 20, 2022

PostGIS Development

PostGIS 3.1.6

The PostGIS Team is pleased to release PostGIS 3.1.6! This release works for PostgreSQL 9.6-14.

3.1.6

This release is a bug fix release, addressing issues found in the previous 3.1 releases.

July 20, 2022 12:50 PM

PostGIS Development

PostGIS 3.0.6

The PostGIS Team is pleased to release PostGIS 3.0.6. This release works for PostgreSQL 9.5-13.

3.0.6

July 20, 2022 12:00 AM

June 23, 2022

Crunchy Data

Postgres Indexes, Selectivity, and Statistics

The sheer cleverness of relational databases is often discounted because we so frequently use them for very simple data management tasks.

Serialize an object into a row, store with unique key. yawwwn

Search for unique key, deserialize row into an object. yawwwwwwn

The real power of relational databases is juggling "relations" (aka tables) in large numbers and figuring out on-the-fly the most effective way to filter out rows and find an answer.

PostgreSQL has an undeniably clever query planning system that auto-tunes based on the data in the system. It samples tables to gain statistics about the distribution of data, and uses those statistics to choose the order of joins and filters applied to the data for the most efficient query execution.

Even more amazing, the query planning system is modular enough to integrate user-defined data types, like the geometry and geography types in PostGIS. So complex queries involving spatial filters are also correctly planned on-the-fly.

Statistics Targets

Mostly, this just works magically. The system has a rough idea of the selectivity of any given filter or join condition, and can use that estimate to choose the most efficient order of operations to execute multi-table and multi-filter queries over complex collections of relational tables.

But what about when things go wrong?

By default, a PostgreSQL database ships with a default_statistics_target of 100. This means that to populate the statistics for a column, the database will draw a sample of 300 * default_statistics_target = 30,000 rows.

The system will then use that data to populate the "common value list" and column histogram with roughly default_statistics_target entries.
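
You can see exactly what was gathered for any column by querying the standard pg_stats view (the people/age names are just the example table used later in this post):

SELECT n_distinct, most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'people'
  AND attname = 'age';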

A large table of keys arranged in a Pareto distribution can make things hard for the statistics system. There may be more common keys than can easily fit into the "common value list", so joins can end up poorly planned, for example.

Fortunately, there are some table specific knobs you can turn.

The statistics target on each table and column can be individually changed, so that more data are sampled, and more granular statistics are gathered.

ALTER TABLE people ALTER COLUMN age SET STATISTICS 200;

Gathering more statistics is usually sufficient to correct planning behavior. However, in special cases (like when a query uses multiple filters on strongly correlated columns) it might be necessary to pull out "extended statistics" to gather even more data for planning.
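
A minimal sketch of that, assuming a hypothetical people table with strongly correlated city and zip columns (the column names here are illustrative, not from the post):

-- Track functional dependencies and combined distinct counts across
-- the two correlated columns, then refresh the statistics.
CREATE STATISTICS people_city_zip (dependencies, ndistinct) ON city, zip FROM people;
ANALYZE people;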

Indexes Make (Some) Things Faster

The thing about magic is that underneath there is usually some complex machinery to make it all work transparently.

Pretty much every database user understands that there's a thing called an "index" and that adding an "index" to a column will make searches on that column faster.

For most uses, this rule-of-thumb is correct: indexes make random access of a small number of items faster, at the expense of making inserts and updates to a table slightly slower.

What happens, though, if you are accessing a large number of items from a table?

An index is a tree. It's fast to find one item by traversing the tree; this is called an "index scan". But finding a lot of items involves running through the tree over and over and over. It is slow to index scan all the records in a table.

An efficient index scan query will have a narrow filter:

SELECT * FROM people WHERE age = 4;

Conversely, reading a table from front to back can be quite fast, if you are interested in most of the records. This is called a "sequence scan".

An efficient sequence scan query will have a wide filter:

SELECT * FROM people WHERE age BETWEEN 45 AND 95;

For any given filter, a relational database will figure out whether the filter is most efficiently run as an index scan or a sequence scan. But how?

Selectivity

The "selectivity" of a filter is a measure of what proportion of the records will be returned by the filter. A filter that returns only 1% of the records is "very selective", and vice versa.

An ordinary filter on numbers or strings or dates might define an equality or a range as a filter. A spatial filter on geographic objects defines a bounding box as a filter. A filter is just a true/false test that every object in the column can be subjected to.
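
One way to see the planner's selectivity guess for a filter, without running it, is EXPLAIN; the rows figure in the plan is the estimate derived from the statistics (this example reuses the people/age table from above):

EXPLAIN SELECT * FROM people WHERE age = 4;
-- The "rows=..." value in the resulting plan is the planner's estimate,
-- not an actual count; compare with EXPLAIN ANALYZE to see the real number.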

Here are two spatial filters; which one do you suppose is more selective?

The red one, right? Just like our SQL example above on ages, the smaller rectangle must return fewer records. Except not necessarily. Suppose our data table looked like this.

So the blue box is selective and should use an index scan and the red one should use a sequence scan.

But wait, how does the database know which box is selective, without running a query and actually applying the filters to the data?

Statistics

When the ANALYZE command is run, the database will gather "statistics" for every column that has an index on it. (The "autovacuum" system that runs in the background of every PostgreSQL cluster is also an "autoanalyze", gathering statistics at the same time it does table maintenance.)

For data types like integers, real numbers, text and dates the system will gather a "common value list" of values that show up frequently in the table, and a histogram of the frequency of values in different ranges.

For spatial data types, the system builds a spatial histogram of frequencies of occurrence. So, for our example data set, something like this.

Instead of being a potentially unconstrained collection of maybe millions of spatial objects, the database now just has to make a quick summary of a few cells in a histogram to get an estimate of how many features a given query filter will return.

The spatial statistics analyzer in PostGIS is actually a little more sophisticated than that: before filling out the histogram it adjusts the cell widths for each dimension, to try and catch more resolution in places where the data is more dense.

All these statistics do have a cost. Increasing statistics targets will make the ANALYZE command run a little slower, as it samples more data. It will also make the planning stage of queries slower, as the larger collection of statistics needs to be read through and compared to the filters and joins in the SQL. However, for complex BI queries, more statistics are almost always worthwhile to get a better plan for a complex query.

Conclusion

  • PostgreSQL has an undeniably clever query planning system which auto-tunes based on the data in the system.
  • PostGIS plugs right into that clever system and provides spatial selectivity estimates to ensure good plans for spatial queries too.
  • For even more complex situations, such as selectivity estimates on highly correlated columns, PostgreSQL also includes an "extended statistics" system to capture those cases.

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at June 23, 2022 03:00 PM