### Planet PostGIS

Welcome to Planet PostGIS

August 26, 2019

Paul Ramsey

Waiting for PostGIS 3: GEOS 3.8

While PostGIS includes lots of algorithms and functionality we have built ourselves, it also adds geospatial smarts to PostgreSQL by linking in specialized libraries to handle particular problems:

  • Proj for coordinate reference support;
  • GDAL for raster functions and formats;
  • GEOS for computational geometry (basic operations);
  • CGAL for more computational geometry (3D operations); and
  • libxml2, libjson-c, and libprotobuf-c for format support.

Many of the standard geometry processing functions in PostGIS are actually evaluated inside the GEOS library, so updates in GEOS are very important to PostGIS – they add new functionality or smooth the behaviour of existing functions.

Functions backed by GEOS include the “overlay operation” functions, such as ST_Intersection(), ST_Union(), ST_Difference() and ST_SymDifference(). These functions take in geometry arguments and construct new geometries for output. Under the covers is an operation called an “overlay”, which combines all the edges of the inputs into a graph and then extracts new outputs from that graph.

While the “overlay operations” in GEOS are very reliable, they are not 100% reliable. When operations fail, the library throws the dreaded TopologyException, which indicates the graph is in an inconsistent and unusable state.

Because there are a lot of PostGIS users and they manage a lot of data, there are a non-zero number of cases that cause TopologyExceptions, and upset users. We would like to take that number down to zero.

With luck, GEOS 3.8 will succeed in finally bringing fully robust overlay operations to the open source community. The developer behind the GEOS algorithms, Martin Davis, recently joined Crunchy Data, and has spent this summer working on a new overlay engine.

Overlay failures are caused when intersections between edges result in inconsistencies in the overlay graph. Even using double precision numbers, systems have only 51 bits of precision to represent coordinates, and that fixed precision can result in graphs that don’t correctly reflect their inputs.

The solution is building a system that can operate on any fixed precision and retain valid geometry. As an example, here the new engine builds valid representations of Europe at any precision, even ludicrously coarse ones.

europe at different precisions

In practice, the engine will be used with a tolerance that is close to double precision, but still provides enough slack to handle tricky cases in ways that users find visually “acceptable”. Initially the new functionality should slot under the existing PostGIS functions without change, but in the future we will be able to expose knobs to allow users to explicitly set the precision domain they want to work in.

GEOS 3.8 may not be released in time for PostGIS 3, but it will be a close thing. In addition to the new overlay engine, a lot of work has been done making the code base cleaner, using more “modern” C++ idioms, and porting forward new fixes to existing algorithms.

August 26, 2019 08:00 AM

August 25, 2019

Paul Ramsey

Waiting for PostGIS 3: Parallelism in PostGIS

Parallel query has been a part of PostgreSQL since 2016 with the release of version 9.6 and in theory PostGIS should have been benefiting from parallelism ever since.

In practice, the complex nature of PostGIS has meant that very few queries would parallelize under normal operating configurations – they could only be forced to parallelize using oddball configurations.

With PostgreSQL 12 and PostGIS 3, parallel query plans will be generated and executed far more often, because of changes to both pieces of software:

  • PostgreSQL 12 includes a new API that allows extensions to modify query plans and add index clauses. This has allowed PostGIS to remove a large number of inlined SQL functions that were previously acting as optimization barriers to the planner.
  • PostGIS 3 has taken advantage of the removal of the SQL inlines to re-cost all the spatial functions with much higher costs. The combination of function inlining and high costs used to cause the planner to make poor decisions, but with the updates in PostgreSQL that can now be avoided.

Increasing the costs of PostGIS functions has allowed us to encourage the PostgreSQL planner to be more aggressive in choosing parallel plans.

PostGIS spatial functions are far more computationally expensive than most PostgreSQL functions. An area computation involves lots of math involving every point in a polygon. An intersection or reprojection or buffer can involve even more. Because of this, many PostGIS queries are bottlenecked on CPU, not on I/O, and are in an excellent position to take advantage of parallel execution.

One of the functions that benefits from parallelism is the popular ST_AsMVT() aggregate function. When there are enough input rows, the aggregate will fan out and parallelize, which is great since ST_AsMVT() calls usually wrap a call to the expensive geometry processing function, ST_AsMVTGeom().
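For context, a typical tile query looks roughly like the sketch below (the admin1 table and its name column are assumptions standing in for that Natural Earth layer; ST_TileEnvelope() is described a couple of posts further down). ST_AsMVTGeom() clips and quantizes each geometry into tile space, and the ST_AsMVT() aggregate (the part that now fans out across workers) encodes the rows into a single MVT blob.

WITH bounds AS (
  SELECT ST_TileEnvelope(1, 0, 0) AS geom  -- zoom 1, tile x=0, y=0
),
mvtgeom AS (
  SELECT ST_AsMVTGeom(ST_Transform(a.geom, 3857), b.geom) AS geom,
         a.name
    FROM admin1 a
    CROSS JOIN bounds b
   WHERE ST_Transform(a.geom, 3857) && b.geom
)
SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom;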

Tile 1/0/0

Using the Natural Earth Admin 1 layer of states and provinces as an input, I ran a small performance test, building a vector tile for zoom level one.

parallel MVT tile performance

Spatial query performance appears to scale about the same as non-spatial as the number of cores increases, taking 30-50% less time with each doubling of processors, so not quite linearly.

Joins, aggregates and scans all benefit from parallel planning, though since the gains are sublinear there’s a limit to how much performance you can extract from an operation by throwing more processors at it. Also, operations that do a large amount of computation within a single function call, like ST_ClusterKMeans, do not automatically parallelize: the system can only parallelize the calling of functions multiple times, not the internal workings of single functions.

August 25, 2019 08:00 AM

August 24, 2019

Paul Ramsey

Waiting for PostGIS 3: ST_Transform() and Proj6

Where are you? Go ahead and figure out your answer, I’ll wait.

No matter what your answer, whether you said “sitting in my office chair” or “500 meters south-west of city hall” or “48.43° north by 123.36° west”, you expressed your location relative to something else, whether that thing was your office layout, your city, or Greenwich.

A geospatial database like PostGIS has to be able to convert between these different reference frames, known as “coordinate reference systems”. The math for these conversions and the definition of the standards needed to align them all is called “geodesy”, a field with sufficient depth and detail that you can make a career out of it.

geographic crs

Fortunately for users of PostGIS, most of the complexity of geodesy is hidden under the covers, and all you need to know is that different common coordinate reference systems are described in the spatial_ref_sys table, so making conversions between reference systems involves knowing just two numbers: the srid (spatial reference id) of your source reference system, and the srid of your target system.
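Here, for example, is a quick look at how one of those systems is described in spatial_ref_sys (3005 is the BC Albers system used in the example that follows):

SELECT srid, auth_name, auth_srid, left(srtext, 60) AS srtext
  FROM spatial_ref_sys
 WHERE srid = 3005;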

For example, to convert (and display) a point from the local British Columbia Albers coordinate reference system to geographic coordinates relative to the North American datum (NAD83), the following SQL is used:

SELECT ST_AsText(
    ST_Transform(
        ST_SetSRID('POINT(1195736 383004)'::geometry, 3005),
        4269)
    )
                 st_astext                 
-------------------------------------------
 POINT(-123.360004575121 48.4299914959586)

PostGIS makes use of the Proj library for coordinate reference system conversions, and PostGIS 3 will support the latest Proj release, version 6.

Proj 6 includes support for time-dependent datums and for direct conversion between datums. What does that mean? Well, one of the problems with the earth is that things move. And by things, I mean the ground itself, the very things we measure location relative to.

Highway fault

As location measurement becomes more accurate, and expectations for accuracy grow, reference shifts that were previously dealt with every 50 years have to be corrected closer to real time.

  • North America had two datums set in the twentieth century, NAD 27 and NAD 83. In 2022, North America will get new datums, fixed to the tectonic plates that make up the continent, and updated regularly to account for continental drift.
  • Australia is on fast moving plates that travel about 7cm per year, and will modernize its datums in 2020. 7cm/year may not sound like much, but that means that a coordinate captured to centimeter accuracy will be almost a meter out of place in a decade. For an autonomous drone navigating an urban area, that could be the difference between an uneventful trip and a crash.

Being able to accurately convert between local reference frames, like continental datums where static data are captured, to global frames like those used by the GPS/GLONASS/Galileo systems is critical for accurate and safe geo-spatial calculations.

Local vs global datum

Proj 6 combines updates to handle the new frames, along with computational improvements to make conversions between frames more accurate. Older versions used a “hub and spoke” system for conversions between systems: all conversions had WGS84 as a “neutral” frame in the middle.

A side effect of using WGS84 as a pivot system was increased error, because no conversion between reference systems is error free: a conversion from one frame to another would accrue the error associated with two conversions, instead of one. Additionally, local geodesy agencies–like NOAA in the USA and GeoScience Australia–published very accurate direct transforms between historical datums, like NAD27 and NAD83, but old versions of Proj relied on hard-coded hacks to enable direct transforms. Proj 6 automatically finds and uses direct system-to-system transforms where they exist, for the most accurate possible transformation.

August 24, 2019 08:00 AM

August 23, 2019

Paul Ramsey

Waiting for PostGIS 3: ST_TileEnvelope(z,x,y)

With the availability of MVT tile format in PostGIS via ST_AsMVT(), more and more people are generating tiles directly from the database. Doing so usually involves a couple common steps:

  • exposing a tiled web map API over HTTP
  • converting tile coordinates to ground coordinates to drive tile generation

Tile coordinates consist of three values:

  • zoom, the level of the tile pyramid the tile is from
  • x, the coordinate of the tile at that zoom, counting from the left, starting at zero
  • y, the coordinate of the tile at that zoom, counting from the top, starting at zero

Tile pyramid

Most tile coordinates reference tiles built in the “spherical mercator” projection, which is a planar projection that covers most of the world, albeit with substantial distance distortions the further north you go.

Knowing the zoom level and tile coordinates, the math to find the tile bounds in web mercator is fairly straightforward.

Tile envelope in Mercator
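As a sketch of that math (the constant here is the conventional half-width of the spherical mercator plane; the built-in default bounds of the new function introduced below may differ slightly in the last digits), a hand-rolled version might look like:

CREATE OR REPLACE FUNCTION tile_bounds(z integer, x integer, y integer)
RETURNS geometry AS $$
  SELECT ST_MakeEnvelope(
    -20037508.342789244 + x       * (2 * 20037508.342789244 / 2^z),  -- xmin
     20037508.342789244 - (y + 1) * (2 * 20037508.342789244 / 2^z),  -- ymin
    -20037508.342789244 + (x + 1) * (2 * 20037508.342789244 / 2^z),  -- xmax
     20037508.342789244 - y       * (2 * 20037508.342789244 / 2^z),  -- ymax
    3857);
$$ LANGUAGE sql IMMUTABLE;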

Most of the people generating tiles from the database write their own small wrapper to convert from tile coordinate to mercator, as we demonstrated in a blog post last month.

It seems duplicative for everyone to do that, so we have added a utility function: ST_TileEnvelope()

By default, ST_TileEnvelope() takes in zoom, x and y coordinates and generates bounds in Spherical Mercator.

SELECT ST_AsText(ST_TileEnvelope(3, 4, 2));
                    st_astext                                       
----------------------------------------------------
 POLYGON((0 5009377.5,0 10018755,5009377.5 10018755,
          5009377.5 5009377.5,0 5009377.5))

If you need to generate tiles in another coordinate system – a rare but not impossible use case – you can swap in a different spatial reference system and tile set bounds via the bounds parameter, which can encode both the tile plane bounds and the spatial reference system in a geometry:

SELECT ST_AsText(ST_TileEnvelope(
    3, 4, 2, 
    bounds => ST_MakeEnvelope(0, 0, 1000000, 1000000, 3005)
    ));

Note that the same tile coordinates generate different bounds – because the base level tile bounds are different.

                      st_astext                                     
--------------------------------------------------------
 POLYGON((500000 625000,500000 750000,625000 750000,
          625000 625000,500000 625000))

ST_TileEnvelope() will also happily generate non-square tiles, if you provide a non-square bounds.
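For instance (a sketch), at zoom zero the single tile is simply the full bounds, so a 2,000,000 by 1,000,000 envelope produces a non-square tile:

SELECT ST_AsText(ST_TileEnvelope(
    0, 0, 0,
    bounds => ST_MakeEnvelope(0, 0, 2000000, 1000000, 3005)
    ));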

August 23, 2019 08:00 AM

August 22, 2019

Paul Ramsey

Waiting for PostGIS 3: Separate Raster Extension

The raster functionality in PostGIS has been part of the main extension since it was introduced. When PostGIS 3 is released, if you want raster functionality you will need to install both the core postgis extension, and also the postgis_raster extension.

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;

Breaking out the raster functionality allows packagers to more easily build stripped down “just the basics” PostGIS without also building the raster dependencies, which include the somewhat heavy GDAL library.

The raster functionality remains intact however, and you can still do nifty things with it.

For example, download and import a “digital elevation model” file from your local government. In my case, a file that covers the City of Vancouver.

Vancouver DEM

# Make a new working database and enable postgis + raster
createdb yvr_raster
psql -c 'CREATE EXTENSION postgis' yvr_raster
psql -c 'CREATE EXTENSION postgis_raster' yvr_raster

# BC open data from https://pub.data.gov.bc.ca/datasets/175624/
# Download the DEM file, unzip and load into the database
wget https://pub.data.gov.bc.ca/datasets/175624/92g/092g06_e.dem.zip
unzip 092g06_e.dem.zip

# Options: create index, add filename to table, SRID is 4269, use 56x56 chip size
raster2pgsql -I -F -s 4269 -t 56x56 092g06_e.dem dem092g06e | psql yvr_raster

After the data load, we have a table of 56 pixel square elevation chips named dem092g06e. If you map the extents of the chips, they look like this:

Vancouver DEM Chips
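One way to produce that map of extents (a sketch; rid and rast are the columns raster2pgsql creates) is simply to turn each chip into its bounding polygon:

SELECT rid, ST_Envelope(rast) AS geom
  FROM dem092g06e;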

Imagine a sea level rise of 30 meters (in an extreme case, Greenland plus Antarctica would be 68 meters). How much of Vancouver would be underwater? It’s mostly a hilly place. Let’s find out.

CREATE TABLE poly_30 AS 
  SELECT (
   ST_DumpAsPolygons(
    ST_SetBandNoDataValue(
     ST_Reclass(
      ST_Union(rast), 
      '0-30:1-1, 31-5000:0-0', '2BUI'),
     0))).*
FROM dem092g06e d

There are a lot of nested functions, so reading from the innermost, we:

  • union all the chips together into one big raster
  • reclassify all values from 0-30 to 1, and all higher values to 0
  • set the “nodata” value to 0, since we don’t care about things that are above our threshold
  • create a vector polygon for each value in the raster (there is only one value: “1”)

The result looks like this:

Vancouver 30m Flood Zone

We can grab building footprints for Vancouver and see how many buildings are going to be underwater.

# Vancouver open data 
# https://data.vancouver.ca/datacatalogue/buildingFootprints.htm
wget ftp://webftp.vancouver.ca/OpenData/shape/building_footprints_2009_shp.zip
unzip building_footprints_2009_shp.zip

# Options: create index, SRID is 26910, use dump format
shp2pgsql -I -s 26910 -D building_footprints_2009 buildings | psql yvr_raster

Before we can compare the buildings to the flood zone, we need to put them into the same projection as the flood zone (SRID 4269).

ALTER TABLE buildings
ALTER COLUMN geom 
TYPE geometry(MultiPolygonZ, 4269)
USING ST_Transform(geom, 4269)

Vancouver Buildings

(There are buildings on the north shore of Burrard Inlet, but this data is from the City of Vancouver. Jurisdictional boundaries are the bane of geospatial analysis.)

Now we can find flooded buildings.

CREATE TABLE buildings_30_poly AS
  SELECT b.* 
    FROM buildings b
    JOIN poly_30 p
      ON ST_Intersects(p.geom, b.geom)
    WHERE ST_Intersects(p.geom, ST_Centroid(b.geom))

There’s another way to find buildings below 30 meters, without having to build a polygon: just query the raster value underneath each building, like this:

CREATE TABLE buildings_30_rast AS
  SELECT b.*
    FROM buildings b
    JOIN dem092g06e d
      ON ST_Intersects(b.geom, d.rast)
    WHERE ST_Value(d.rast, ST_Centroid(b.geom)) < 30;

Since polygon building can be expensive, joining the raster and vector layers is usually the way we want to carry out this analysis.
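As a quick sanity check (a sketch using the two tables created above), the two approaches should flag essentially the same set of buildings:

SELECT (SELECT count(*) FROM buildings_30_poly) AS via_polygon,
       (SELECT count(*) FROM buildings_30_rast) AS via_raster;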

Vancouver Buildings in 30m Flood Zone

For bringing continuous data (elevations, temperatures, model results) into GIS analysis, the postgis_raster extension can be quite useful.

August 22, 2019 08:00 AM

August 21, 2019

Paul Ramsey

Waiting for PostGIS 3: ST_AsMVT Performance

Vector tiles are the new hotness, allowing large amounts of dynamic data to be sent for rendering right on web clients and mobile devices, and making very beautiful and highly interactive maps possible.

Since the introduction of ST_AsMVT(), people have been generating their tiles directly in the database more and more, and as a result wanting tile generation to go faster and faster.

Every tile generation query has to carry out the following steps:

  • Gather all the relevant rows for the tile
  • Simplify the data appropriately to match the resolution of the tile
  • Clip the data to the bounds of the tile
  • Encode the data into the MVT protobuf format

For PostGIS 3.0, performance of tile generation has been vastly improved.

  • First, the clipping process has been sped up and made more reliable by integrating the wagyu clipping algorithm directly into PostGIS. This has sped up clipping of polygons in particular, and reduced instances of invalid output geometries.

  • Second, the simplification and precision reduction steps have been streamlined, to avoid unnecessary copying and work on simple cases like points and short lines. This has sped up handling of simple points and lines.

  • Finally, the ST_AsMVT() aggregate itself has been made parallelizable, so that all the work above can be parcelled out to multiple CPUs, dramatically speeding up generation of tiles with lots of input geometry.

PostGIS vector tile support has gotten so good that even projects with massive tile generation requirements, like the OpenMapTiles project, have standardized their tiling on PostGIS.

August 21, 2019 08:00 AM

August 20, 2019

Paul Ramsey

Waiting for PostGIS 3: Hilbert Geometry Sorting

With the release of PostGIS 3.0, queries that ORDER BY geometry columns will return rows using a Hilbert curve ordering, and do so about twice as fast.

Whuuuut!?!

The history of “ordering by geometry” in PostGIS is mostly pretty bad. Up until version 2.4 (2017), if you did ORDER BY on a geometry column, your rows would be returned using the ordering of the minimum X coordinate value in the geometry.

One of the things users expect of “ordering” is that items that are “close” to each other in the ordered list should also be “close” to each other in the domain of the values. For example, in a set of sales orders ordered by price, rows next to each other have similar prices.

To visualize what geometry ordering looks like, I started with a field of 10,000 random points.

CREATE SEQUENCE points_seq;

CREATE TABLE points AS
SELECT 
  (ST_Dump(
    ST_GeneratePoints(
        ST_MakeEnvelope(-10000,-10000,10000,10000,3857),
        10000)
    )).geom AS geom,
  nextval('points_seq') AS pk;

Ten thousand random points

Now select from the table, and use ST_MakeLine() to join them up in the desired order. Here’s the original ordering, prior to version 2.4.

SELECT ST_MakeLine(geom ORDER BY ST_X(geom)) AS geom
FROM points;

Random Points in XMin Order

Each point in the ordering is close in the X coordinate, but since the Y coordinate can be anything, there’s not much spatial coherence to the ordered set. A better spatial ordering will keep points that are near in space also near in the set.

For 2.4 we got a little clever. Instead of sorting by the minimum X coordinate, we took the center of the geometry bounds, and created a Morton curve out of the X and Y coordinates. Morton keys just involve interleaving the bits of the values on the two axes, which is a relatively cheap operation.

The result is way more spatially coherent.

Random Points in Morton Order

For 3.0, we have replaced the Morton curve with a Hilbert curve. The Hilbert curve is slightly more expensive to calculate, but we offset that by stripping out some wasted cycles in other parts of the old implementation, and the new code is now faster, and also even more spatially coherent.

SELECT ST_MakeLine(geom ORDER BY geom) AS geom
FROM points;

Random Points in Hilbert Order

PostGIS 3.0 will be released some time this fall, hopefully before the final release of PostgreSQL 12.

August 20, 2019 08:00 AM

August 19, 2019

Paul Ramsey

Waiting for PostGIS 3: ST_AsGeoJSON(record)

With PostGIS 3.0, it is now possible to generate GeoJSON features directly without any intermediate code, using the new ST_AsGeoJSON(record) function.

The GeoJSON format is a common transport format, between servers and web clients, and even between components of processing chains. Being able to create useful GeoJSON is important for integrating different parts in a modern geoprocessing application.

PostGIS has had an ST_AsGeoJSON(geometry) for forever, but it does slightly less than most users really need: it takes in a PostGIS geometry, and outputs a GeoJSON “geometry object”.

The GeoJSON geometry object is just the shape of the feature, it doesn’t include any of the other information about the feature that might be included in the table or query. As a result, developers have spent a lot of time writing boiler-plate code to wrap the results of ST_AsGeoJSON(geometry) up with the columns of a result tuple to create GeoJSON “feature objects”.
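That boiler-plate typically looked something like this sketch (using the same countries table as the example below), manually assembling the feature with json_build_object():

SELECT json_build_object(
    'type', 'Feature',
    'geometry', ST_AsGeoJSON(ST_Centroid(geom))::json,
    'properties', json_build_object('type', type, 'admin', admin)
  )
  FROM countries
 WHERE name = 'Canada';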

The ST_AsGeoJSON(record) function looks at the input tuple, and takes the first column of type geometry to convert into a GeoJSON geometry. The rest of the columns are added to the GeoJSON feature in the properties member.

SELECT ST_AsGeoJSON(subq.*) AS geojson 
FROM ( 
  SELECT ST_Centroid(geom), type, admin 
  FROM countries 
  WHERE name = 'Canada' 
) AS subq
{"type": "Feature", 
 "geometry": { 
    "type":"Point", 
    "coordinates":[-98.2939042718784,61.3764628013483] 
  }, 
  "properties": { 
    "type": "Sovereign country", 
    "admin": "Canada" 
  } 
}

Using GeoJSON output it’s easy to stream features directly from the database into an OpenLayers or Leaflet web map, or to consume them with ogr2ogr for conversion to other geospatial formats.

August 19, 2019 08:00 AM

August 11, 2019

PostGIS Development

PostGIS 3.0.0alpha4, 2.5.3, 2.4.8, 2.3.10 Released

The PostGIS development team is pleased to provide the enhancements/features release 3.0.0alpha4 for the 3.0 major feature branch, and the bug fix releases 2.5.3, 2.4.8, and 2.3.10 for the 2.5, 2.4, and 2.3 stable branches.

3.0.0alpha4 This release works with PostgreSQL 9.5-12beta3 and GEOS >= 3.6

Best served with PostgreSQL 12beta3. Designed to take advantage of features in PostgreSQL 12 and Proj 6.

2.5.3 This release supports PostgreSQL 9.3-12. You are encouraged to use the PostGIS 3.0 unreleased branch with PostgreSQL 12, which has features specifically designed to take advantage of features new in PostgreSQL 12.

2.4.8 This release supports PostgreSQL 9.3-10.

2.3.10 This release supports PostgreSQL 9.2-10.

View all closed tickets for 3.0.0, 2.5.3, 2.4.8, 2.3.10.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

by Regina Obe at August 11, 2019 12:00 AM

July 31, 2019

Paul Ramsey

Simple SQL GIS

And, late on a Friday afternoon, the plaintive cry was heard!

And indeed, into the sea they do go!

And ‘lo, the SQL faeries were curious, and gave it a shot!

##### Commandline OSX/Linux #####

# Get the Shape files
# http://www.elections.bc.ca/index.php/voting/electoral-maps-profiles/
wget http://www.elections.bc.ca/docs/map/redis08/GIS/ED_Province.exe

# Exe? No prob, it's actually a self-extracting ZIP
unzip ED_Province

# Get a PostGIS database ready for the data
createdb ed_clip
psql -c "create extension postgis" -d ed_clip

# Load into PostGIS
# The .prj says it is "Canada Albers Equal Area", but they
# lie! It's actually BC Albers, EPSG:3005
shp2pgsql -s 3005 -i -I ED_Province ed | psql -d ed_clip

# We need some ocean! Use Natural Earth...
# http://www.naturalearthdata.com/downloads/
wget http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip
unzip ne_10m_ocean.zip

# Load the Ocean into PostGIS!
shp2pgsql -s 4326 -i -I ne_10m_ocean ocean | psql -d ed_clip

# OK, now we connect to PostGIS and start working in SQL
psql -e ed_clip
-- How big is the Ocean table?
SELECT Count(*) FROM ocean;

-- Oh, only 1 polygon. Well, that makes it easy... 
-- For each electoral district, we want to difference away the ocean.
-- The ocean is one big polygon, so this will take a while (if we
-- were being more subtle, we'd first clip the ocean down to 
-- a reasonable area around BC.)
CREATE TABLE ed_clipped AS
SELECT 
  CASE 
  WHEN ST_Intersects(o.geom, ST_Transform(e.geom,4326))
  THEN ST_Difference(ST_Transform(e.geom,4326), o.geom)
  ELSE ST_Transform(e.geom,4326)
  END AS geom,
  e.edabbr,
  e.edname
FROM ed e, ocean o;

-- Check our geometry types...
SELECT DISTINCT ST_GeometryType(geom) FROM ed_clipped;

-- Oh, they are heterogeneous. Let's force them all multi
UPDATE ed_clipped SET geom = ST_Multi(geom);
# Dump the result out of the database back into shapes
pgsql2shp -f ed2009_ocean ed_clip ed_clipped
zip ed2009_ocean.zip ed2009_ocean.*
mv ed2009_ocean.zip ~/Dropbox/Public/

No more districts in oceans!

And the faeries were happy, and uploaded their polygons!

Update: And the lamentations ended, and the faeries also rejoiced.

July 31, 2019 08:00 AM

July 30, 2019

Paul Ramsey

PostGIS Overlays

One question that comes up often during our PostGIS training is “how do I do an overlay?” The terminology can vary: sometimes they call the operation a “union” sometimes an “intersect”. What they mean is, “can you turn a collection of overlapping polygons into a collection of non-overlapping polygons that retain information about the overlapping polygons that formed them?”

So an overlapping set of three circles becomes a non-overlapping set of 7 polygons.

Calculating the overlapping parts of a pair of shapes is easy, using the ST_Intersection() function in PostGIS, but that only works for pairs, and doesn’t capture the areas that have no overlaps at all.

How can we handle multiple overlaps and get out a polygon set that covers 100% of the area of the input sets? By taking the polygon geometry apart into lines, and then building new polygons back up.

Let’s construct a synthetic example: first, generate a collection of random points, using a Gaussian distribution, so there’s more overlap in the middle. The crazy math in the SQL below (the Box-Muller transform) just converts the uniform random numbers from the random() function into normally distributed numbers.

CREATE TABLE pts AS
  WITH rands AS (
  SELECT generate_series as id, 
         random() AS u1, 
         random() AS u2 
  FROM generate_series(1,100)
)
SELECT
  id,
  ST_SetSRID(ST_MakePoint(
    50 * sqrt(-2 * ln(u1)) * cos(2*pi()*u2),
    50 * sqrt(-2 * ln(u1)) * sin(2*pi()*u2)), 4326) AS geom
FROM rands;

The result looks like this:

Now, we turn the points into circles, big enough to have overlaps.

CREATE TABLE circles AS
  SELECT id, ST_Buffer(geom, 10) AS geom 
    FROM pts;

Which looks like this:

Now it’s time to take the polygons apart. In this case we’ll take the exterior ring of the circles, using ST_ExteriorRing(). If we were dealing with complex polygons with holes, we’d have to use ST_DumpRings(). Once we have the rings, we want to make sure that everywhere rings cross, the lines are broken, so that no lines cross; they only touch at their end points. We do that with the ST_Union() function.

CREATE TABLE boundaries AS
  SELECT ST_Union(ST_ExteriorRing(geom)) AS geom
    FROM circles;

What comes out is just lines, but with end points at every crossing.

Now that we have noded lines, we can collect them into a multi-linestring and feed them to ST_Polygonize() to generate polygons. The polygons come out as one big multi-polygon, so we’ll use ST_Dump() to convert it into a table with one row per polygon.

CREATE SEQUENCE polyseq;
CREATE TABLE polys AS
  SELECT nextval('polyseq') AS id, 
         (ST_Dump(ST_Polygonize(geom))).geom AS geom
  FROM boundaries;

Now we have a set of polygons with no overlaps, only one polygon per area.

So, how do we figure out how many overlaps contributed to each incoming polygon? We can join a representative interior point of each new small polygon (ST_PointOnSurface(), which unlike the centroid is guaranteed to fall inside) with the set of original circles, and calculate how many circles contain each point.

A spatial join will allow us to calculate the number of overlaps.

ALTER TABLE polys ADD COLUMN count INTEGER DEFAULT 0;
UPDATE polys SET count = p.count
FROM (
  SELECT count(*) AS count, p.id AS id  
  FROM polys p 
  JOIN circles c 
  ON ST_Contains(c.geom, ST_PointOnSurface(p.geom)) 
  GROUP BY p.id
) AS p
WHERE p.id = polys.id;

That’s it! Now we have a single coverage of the area, where each polygon knows how much overlap contributed to it. Ironically, when visualized using the coverage count as a variable in the color ramp, it looks a lot like the original image, which was created with a simple transparency effect. However, the point here is that we’ve created new data, in the count attribute of the new polygon layer.

The same decompose-and-rebuild-and-join-centroids trick can be used to overlay all kinds of features, and to carry over attributes from the original input data, achieving the classic “GIS overlay” workflow. Happy geometry mashing!

July 30, 2019 08:00 AM

July 01, 2019

PostGIS Development

PostGIS 3.0.0alpha3

The PostGIS development team is pleased to release PostGIS 3.0.0alpha3.

This release works with PostgreSQL 9.5-12beta2 and GEOS >= 3.6

Best served with PostgreSQL 12beta2.

Continue Reading by clicking title hyperlink ..

by Regina Obe at July 01, 2019 12:00 AM

June 28, 2019

Stephen Mather

Tracing golden monkeys through time

My colleague TUYISINGIZE Deogratias (“Deo”) and others at Dian Fossey Gorilla Fund International have been studying golden monkeys (Cercopithecus kandti) in Rwanda. Golden monkeys are an endangered monkey along the Albertine Rift (including the Virungas, host to the endangered mountain gorilla). They are also cute as can be, but more on that another time.

Golden monkey (Cercopithecus kandti)

Deo has been leading efforts to track the golden monkeys in several locations across their range, observing their habits. Among the data gathered is the location of the groups of the monkeys as they move through their range. One element we want to understand from these data is how much each group moves per day.

The raw data look something like this:

raw_data

So we tweak things a bit to get ids in order of date and time, and also prep the data so that the date and time are proper types in PostgreSQL:

DROP TABLE IF EXISTS goldenmonkeys_sorted;
CREATE TABLE goldenmonkeys_sorted AS
(
    WITH nodate AS (
        SELECT gid, geom, id, lat, lon, alt, dater || ' ' || timer AS dater, month AS monther, season, groupid FROM hr_g_all
    )
    , sorted AS (
	SELECT gid, geom, id, lat, lon, alt, dater::TIMESTAMP WITH TIME ZONE AS datetimer, monther, season, groupid FROM nodate
		ORDER BY groupid, datetimer
    )
    SELECT gid AS id, ROW_NUMBER() OVER( PARTITION BY gid) AS gid, datetimer, date(datetimer) AS dater, monther, season, groupid, geom FROM sorted
        ORDER BY gid
);

Resulting in the following:

massaged_data

Golden Monkey Ranging in Gishwati National Park

Ok. Now we want to turn this into traces of the movements of the monkeys everyday. Something like this:

gishwati_trace

But for every trace, for every day for each group.

We will create a function that leverages WITH RECURSIVE. We’ve seen this before. WITH RECURSIVE allows us to take each record in sequence and perform operations with the previous record, in this case calculating travel time, travel distance, and combining the individual points into a single line with ST_MakeLine.
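The full recursive function is not reproduced here, but as a much-simplified sketch (without the per-step travel time and distance bookkeeping the recursive version does), one trace per group per day can be built from the goldenmonkeys_sorted table above with a plain ST_MakeLine aggregate:

SELECT groupid,
       dater,
       ST_MakeLine(geom ORDER BY datetimer) AS geom
  FROM goldenmonkeys_sorted
 GROUP BY groupid, dater
 ORDER BY groupid, dater;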

Now to use our function, we need a list of dates and groups so we can calculate this for each day.

Now we have traces not just for one day and group, but all traces and groups:

gishwati_traces

by smathermather at June 28, 2019 10:51 AM

June 07, 2019

Paul Ramsey

Parallel PostGIS and PgSQL 12 (2)

In my last post I demonstrated that PostgreSQL 12 with PostGIS 3 will provide, for the first time, automagical parallelization of many common spatial queries.

This is huge news, as it opens up the possibility of extracting more performance from modern server hardware. Commenters on the post immediately began conjuring images of 32-core machines reducing their query times to milliseconds.

So, the next question is: how much more performance can we expect?

To investigate, I acquired a 16 core machine on AWS (m5d.4xlarge), and installed the current development snapshots of PostgreSQL and PostGIS, the code that will become versions 12 and 3 respectively, when released in the fall.

How Many Workers?

The number of workers assigned to a query is determined by PostgreSQL: the system looks at a given query, and the size of the relations to be processed, and assigns workers proportional to the log of the relation size.

For parallel plans, the “explain” output of PostgreSQL will include a count of the number of workers planned and assigned. That count is exclusive of the leader process, and the leader process actually does work outside of its duties in coordinating the query, so the number of CPUs actually working is more than the num_workers, but slightly less than num_workers+1. For these graphs, we’ll assume the leader fully participates in the work, and that the number of CPUs in play is num_workers+1.

Forcing Workers

PostgreSQL’s automatic calculation of the number of workers could be a blocker to performing analysis of parallel performance, but fortunately there is a workaround.

Tables support a “storage parameter” called parallel_workers. When a relation with parallel_workers set participates in a parallel plan, the value of parallel_workers overrides the automatically calculated number of workers.

ALTER TABLE pd SET ( parallel_workers = 8);

In order to generate my data, I re-ran my queries, upping the number of parallel_workers on my tables for each run.

Setup

Before running the tests, I set all the global limits on workers high enough to use all the CPUs on my test server.

SET max_worker_processes = 16;
SET max_parallel_workers = 16;
SET max_parallel_workers_per_gather = 16;

I also loaded my data and created indexes as usual. The tables I used for these tests were:

  • pd a table of 69,534 polygons
  • pts_10 a table of 695,340 points

Scan Performance

I tested two kinds of queries: a straight scan query, with only one table in play; and, a spatial join with two tables. I used the usual queries from my annual parallel tests.

EXPLAIN ANALYZE 
  SELECT Sum(ST_Area(geom)) 
    FROM pd;

Scan performance improved well at first, but started to flatten out noticeably after 8 cores.

Workers     1    2    4    8   16
Time (ms)  318  167  105   62   47

The default number of CPUs the system wanted to use was 4 (1 leader + 3 workers), which is probably not a bad choice, as the expected gains from additional workers shallow out as the core count grows.

Join Performance

The join query computes the join of 69K polygons against 695K points. The points are actually generated from the polygons, so there are precisely 10 points in each polygon, and the resulting relation is about 695K records long.

EXPLAIN ANALYZE
 SELECT *
  FROM pd 
  JOIN pts_10 pts
  ON ST_Intersects(pd.geom, pts.geom);

For unknown reasons, it was impossible to force out a join plan with only 1 worker (aka 2 CPUs) so that part of our chart/table is empty.

Workers        1    2     4     8    16
Time (ms)  26789    -  9371  5169  4043

The default number of workers is 4 (1 leader + 3 workers) which, again, isn’t bad. The join performance shallows out faster than the scan performance, and above 10 CPUs is basically flat.

Conclusions

  • There is a limit to how much advantage adding workers to a plan will gain you
  • The limit feels intuitively lower than I expected given the CPU-intensity of the workloads
  • The planner does a pretty good, slightly conservative, job of picking a realistic number of workers

June 07, 2019 04:00 PM

June 02, 2019

PostGIS Development

PostGIS 3.0.0alpha2

The PostGIS development team is pleased to release PostGIS 3.0.0alpha2.

This release works with PostgreSQL 9.5-12beta1 and GEOS >= 3.6

Best served with PostgreSQL 12beta1.

Continue Reading by clicking title hyperlink ..

by Regina Obe at June 02, 2019 12:00 AM

May 27, 2019

Paul Ramsey

Parallel PostGIS and PgSQL 12

For the last couple years I have been testing out the ever-improving support for parallel query processing in PostgreSQL, particularly in conjunction with the PostGIS spatial extension. Spatial queries tend to be CPU-bound, so applying parallel processing is frequently a big win for us.

Parallel PostGIS and PgSQL 12

Initially, the results were pretty bad.

  • With PostgreSQL 10, it was possible to force some parallel queries by jimmying with global cost parameters, but nothing would execute in parallel out of the box.
  • With PostgreSQL 11, we got support for parallel aggregates, and those tended to parallelize in PostGIS right out of the box. However, parallel scans still required some manual alterations to PostGIS function costs, and parallel joins were basically impossible to force no matter what knobs you turned.

With PostgreSQL 12 and PostGIS 3, all that has changed. All standard query types now readily parallelize using our default costings. That means parallel execution of:

  • Parallel sequence scans,
  • Parallel aggregates, and
  • Parallel joins!!

TL;DR:

PostgreSQL 12 and PostGIS 3 have finally cracked the parallel spatial query execution problem, and all major queries execute in parallel without extraordinary interventions.

What Changed

With PostgreSQL 11, most parallelization worked, but only at much higher function costs than we could apply to PostGIS functions. With higher PostGIS function costs, other parts of PostGIS stopped working, so we were stuck in a Catch-22: improve costing and break common queries, or leave things working with non-parallel behaviour.

For PostgreSQL 12, the core team (in particular Tom Lane) provided us with a sophisticated new way to add spatial index functionality to our key functions. With that improvement in place, we were able to globally increase our function costs without breaking existing queries. That in turn has signalled the parallel query planning algorithms in PostgreSQL to parallelize spatial queries more aggressively.

Setup

In order to run these tests yourself, you will need:

  • PostgreSQL 12
  • PostGIS 3.0

You’ll also need a multi-core computer to see actual performance changes. I used a 4-core desktop for my tests, so I could expect 4x improvements at best.

The setup instructions show where to download the Canadian polling division data used for the testing:

  • pd a table of ~70K polygons
  • pts a table of ~70K points
  • pts_10 a table of ~700K points
  • pts_100 a table of ~7M points

PDs

We will work with the default configuration parameters and just mess with the max_parallel_workers_per_gather at run-time to turn parallelism on and off for comparison purposes.

When max_parallel_workers_per_gather is set to 0, parallel plans are not an option.

  • max_parallel_workers_per_gather sets the maximum number of workers that can be started by a single Gather or Gather Merge node. Setting this value to 0 disables parallel query execution. Default 2.

Before running tests, make sure you have a handle on what your parameters are set to: I frequently found I accidentally tested with max_parallel_workers set to 1, which will result in two processes working: the leader process (which does real work when it is not coordinating) and one worker.

show max_worker_processes;
show max_parallel_workers;
show max_parallel_workers_per_gather;

Aggregates

Behaviour for aggregate queries is still good, as seen in PostgreSQL 11 last year.

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 4;

EXPLAIN ANALYZE 
  SELECT Sum(ST_Area(geom)) 
    FROM pd;

Boom! We get a 3-worker parallel plan and execution about 3x faster than the sequential plan.

Scans

The simplest spatial parallel scan adds a spatial function to the target list or filter clause.

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 4;

EXPLAIN ANALYZE 
  SELECT ST_Area(geom)
    FROM pd; 

Boom! We get a 3-worker parallel plan and execution about 3x faster than the sequential plan. This query did not work out-of-the-box with PostgreSQL 11.

 Gather  
   (cost=1000.00..27361.20 rows=69534 width=8)
   Workers Planned: 3
   ->  Parallel Seq Scan on pd  
   (cost=0.00..19407.80 rows=22430 width=8)

Joins

Starting with a simple join of all the polygons to the 100 points-per-polygon table, we get:

SET max_parallel_workers_per_gather = 4;

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_100 pts
  ON ST_Intersects(pd.geom, pts.geom);

PDs & Points

Right out of the box, we get a parallel plan! No amount of begging and pleading would get a parallel plan in PostgreSQL 11.

 Gather  
   (cost=1000.28..837378459.28 rows=5322553884 width=2579)
   Workers Planned: 4
   ->  Nested Loop  
       (cost=0.28..305122070.88 rows=1330638471 width=2579)
         ->  Parallel Seq Scan on pts_100 pts  
             (cost=0.00..75328.50 rows=1738350 width=40)
         ->  Index Scan using pd_geom_idx on pd  
             (cost=0.28..175.41 rows=7 width=2539)
               Index Cond: (geom && pts.geom)
               Filter: st_intersects(geom, pts.geom)

The only quirk in this plan is that the nested loop join is being driven by the pts_100 table, which has 100 times the number of records of the pd table.

The plan for a query against the pts_10 table also returns a parallel plan, but with pd as the driving table.

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_10 pts
  ON ST_Intersects(pd.geom, pts.geom);

Right out of the box, we still get a parallel plan! No amount of begging and pleading would get a parallel plan in PostgreSQL 11.

 Gather  
   (cost=1000.28..85251180.90 rows=459202963 width=2579)
   Workers Planned: 3
   ->  Nested Loop  
       (cost=0.29..39329884.60 rows=148129988 width=2579)
         ->  Parallel Seq Scan on pd  
             (cost=0.00..13800.30 rows=22430 width=2539)
         ->  Index Scan using pts_10_gix on pts_10 pts  
             (cost=0.29..1752.13 rows=70 width=40)
               Index Cond: (geom && pd.geom)
               Filter: st_intersects(pd.geom, geom)

Conclusions

  • With PostgreSQL 12 and PostGIS 3, most spatial queries that can take advantage of parallel processing should do so automatically.
  • !!!!!!!!!!!

May 27, 2019 04:00 PM

May 26, 2019

PostGIS Development

PostGIS 3.0.0alpha1

The PostGIS development team is pleased to release PostGIS 3.0.0alpha1.

This release works with PostgreSQL 9.5-12beta1 and GEOS >= 3.6

Best served with PostgreSQL 12beta1.

Continue Reading by clicking title hyperlink ..

by Regina Obe at May 26, 2019 12:00 AM

May 22, 2019

Anita Graser (Underdark)

Movement data in GIS #23: trajectories in context

Today’s post continues where “Why you should be using PostGIS trajectories” leaves off. It’s the result of a collaboration with Eva Westermeier. I had the pleasure to supervise her internship at AIT last year and also co-supervised her Master’s thesis [0] on the topic of enriching trajectories with information about their geographic context.

Context-aware analysis of movement data is crucial for different domains and applications, from transport to ecology. While there is a wealth of data, efficient and user-friendly contextual trajectory analysis is still hampered by a lack of appropriate conceptual approaches and practical methods. (Westermeier, 2018)

Part of the work was focused on evaluating different approaches to adding context information from vector datasets to trajectories in PostGIS. For example, adding land cover context to animal movement data or adding information on anchoring and harbor areas to vessel movement data.

Classic point-based model vs. line-based model

The obvious approach is to intersect the trajectory points with context data. This is the classic point data model of contextual trajectories. It’s straightforward to add context information in the point-based model but it also generates large numbers of repeating annotations. In contrast, the line data model using, for example, PostGIS trajectories (LinestringM) is more compact since trajectories can be split into segments at context borders. This creates one annotation per segment and the individual segments are convenient to analyze (as described in part #12).

Spatio-temporal interpolation as provided by the line data model offers additional advantages for the analysis of annotated segments. Contextual segments start and end at the intersection of the trajectory linestring with context polygon borders. This means that there are no gaps like in the point-based model. Consequently, while the point-based model systematically underestimates segment length and duration, the line-based approach offers more meaningful segment length and duration measurements.

Schematic illustration of a subset of an annotated trajectory in two context classes, a) systematic underestimation of length or duration in the point data model, b) full length or duration between context polygon borders in the line data model (source: Westermeier (2018))

Another issue of the point data model is that brief context changes may be missed or represented by just one point location. This makes it impossible to compute the length or duration of the respective context segment. (Of course, depending on the application, it can be desirable to ignore brief context changes and make the annotation process robust towards irrelevant changes.)

Schematic illustration of context annotation for brief context changes, a) and b)
two variants for the point data model, c) gapless annotation in the line data model (source: Westermeier (2018) based on Buchin et al. (2014))

Beyond annotations, context can also be considered directly in an analysis, for example, when computing distances between trajectories and contextual point objects. In this case, the point-based approach systematically overestimates the distances.

Schematic illustration of distance measurement from a trajectory to an external
object, a) point data model, b) line data model (source: Westermeier (2018))

The above examples show that there are some good reasons to dump the classic point-based model. However, the line-based model is not without its own issues.

Issues

Computing the context annotations for trajectory segments is tricky. The main issue is that ST_Intersection drops the M values. This effectively destroys our trajectories! There are ways to deal with this issue – and the corresponding SQL queries are published in the thesis (p. 38-40) – but it’s a real bummer. Basically, ST_Intersection only provides geometric output. Therefore, we need to reconstruct the temporal information in order to create usable trajectory segments.
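To see the core problem in isolation (this snippet is not from the thesis), intersect a measured linestring with a context polygon; in the PostGIS versions discussed here the result comes back as plain 2D geometry, with the M values stripped:

SELECT ST_AsText(
  ST_Intersection(
    'LINESTRINGM(0 0 0, 10 0 100)'::geometry,  -- trajectory with measures
    ST_MakeEnvelope(2, -1, 6, 1)               -- context polygon
  ));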

Finally, while the line-based model is well suited to add context from other vector data, it is less useful for context data from continuous rasters but that was beyond the scope of this work.

Conclusion

After the promising results of my initial investigations into PostGIS trajectories, I was optimistic that context annotations would be a straightforward add-on. The line-based approach has multiple advantages when it comes to analyzing contextual segments. Unfortunately, generating these contextual segments is much less convenient and also slower than I had hoped. Originally, I had planned to turn this work into a plugin for the Processing toolbox but the results of this work motivated me to look into other solutions. You’ve already seen some of the outcomes in part #20 “Trajectools v1 released!”.

References

[0] Westermeier, E.M. (2018). Contextual Trajectory Modeling and Analysis. Master Thesis, Interfaculty Department of Geoinformatics, University of Salzburg.


This post is part of a series. Read more about movement data in GIS.

by underdark at May 22, 2019 07:31 PM

May 17, 2019

Paul Ramsey

Keynote @ FOSS4G NA 2019

Last month I was invited to give a keynote talk at FOSS4G North America in San Diego. I have been speaking about open source economics at FOSS4G conferences more-or-less every two years, since 2009, and I look forward to revisiting the topic regularly: the topic is ever-changing, just like the technology.

In 2009, the central pivot of thought about open source in the economy was professional open source firms in the Red Hat model. Since then we’ve taken a ride through a VC-backed “open core” bubble and are now grappling with an environment where the major cloud platforms are absorbing most of the value of open source while contributing back proportionally quite little.

What will the next two years hold? I dunno! But I have increasingly little faith that a good answer will emerge organically via market forces.

If you liked the video and want to use the materials, the slides are available here under CC BY.

May 17, 2019 04:00 PM

March 27, 2019

Paul Ramsey

GeoJSON Features from PostGIS

Every once in a while, someone comes to me and says:

Sure, it’s handy to use ST_AsGeoJSON to convert a geometry into a JSON equivalent, but all the web clients out there like to receive full GeoJSON Features and I end up writing boilerplate to convert database rows into GeoJSON. Also, the only solution I can find on the web is scary and complex. Why don’t you have a row_to_geojson function?

And the answer (still) is that working with rows is fiddly and I don’t really feel like it.

However! It turns out that, with the tools for JSON manipulation already in PostgreSQL and a little scripting it’s possible to make a passable function to do the work.

Start with a simple table.

DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable (
  pk SERIAL PRIMARY KEY,
  name TEXT,
  size DOUBLE PRECISION,
  geom GEOMETRY
);

INSERT INTO mytable (name, size, geom) VALUES
  ('Peter', 1.0, 'POINT(2 34)'),
  ('Paul', 2.0, 'POINT(5 67)');

You can convert any row into a JSON structure using the to_jsonb() function.

SELECT to_jsonb(mytable.*) FROM mytable;

 {"pk": 1, "geom": "010100000000000000000000400000000000004140", "name": "Peter", "size": 1}
 {"pk": 2, "geom": "010100000000000000000014400000000000C05040", "name": "Paul", "size": 2}

That’s actually all the information we need to create a GeoJSON feature, it just needs to be re-arranged. So let’s make a little utility function to re-arrange it.

CREATE OR REPLACE FUNCTION rowjsonb_to_geojson(
  rowjsonb JSONB, 
  geom_column TEXT DEFAULT 'geom')
RETURNS TEXT AS 
$$
DECLARE 
 json_props jsonb;
 json_geom jsonb;
 json_type jsonb;
BEGIN
 IF NOT rowjsonb ? geom_column THEN
   RAISE EXCEPTION 'geometry column ''%'' is missing', geom_column;
 END IF;
 json_geom := ST_AsGeoJSON((rowjsonb ->> geom_column)::geometry)::jsonb;
 json_geom := jsonb_build_object('geometry', json_geom);
 json_props := jsonb_build_object('properties', rowjsonb - geom_column);
 json_type := jsonb_build_object('type', 'Feature');
 return (json_type || json_geom || json_props)::text;
END; 
$$ 
LANGUAGE 'plpgsql' IMMUTABLE STRICT;

Voila! Now we can turn any relation into a proper GeoJSON “Feature” with just one(ish) function call.

SELECT rowjsonb_to_geojson(to_jsonb(mytable.*)) FROM mytable;                         

 {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2, 34]}, "properties": {"pk": 1, "name": "Peter", "size": 1}}
 {"type": "Feature", "geometry": {"type": "Point", "coordinates": [5, 67]}, "properties": {"pk": 2, "name": "Paul", "size": 2}}

Postscript

You might be wondering why I made my function take in a jsonb input instead of a record, for a perfect row_to_geojson analogue to row_to_json. The answer is, the PL/PgSQL planner caches types, including the materialized types of the record parameter, on the first evaluation, which makes it impossible to use the same function for multiple tables. This is “too bad (tm)” but fortunately there is an easy workaround: just convert the input to jsonb using to_jsonb() before calling our function.
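For example (with a hypothetical roads table), the exact same function can be pointed at a second table with no re-planning trouble, because the row is flattened to jsonb before the call:

SELECT rowjsonb_to_geojson(to_jsonb(roads.*), 'geom') FROM roads;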

March 27, 2019 01:00 PM

March 26, 2019

Paul Ramsey

Notes for FDW in PostgreSQL 12

TL;DR: There are some changes in PostgreSQL 12 that FDW authors might be surprised by! Super technical, not suitable for ordinary humans.

OK, so I decided to update my two favourite extension projects (pgsql-http and pgsql-ogr-fdw) yesterday to support PostgreSQL 12 (which is the version currently under development, likely to be released in the fall).

Fixing up pgsql-http was pretty easy, involving just one internal function signature change.

Fixing up pgsql-ogr-fdw involved some time in the debugger wondering what had changed.

Your Slot is Empty

When processing an FDW insert/update/delete, your code is expected to take a TupleTableSlot as input and use the data in that slot to apply the insert/update/delete operation to your backend data store, whatever that may be (OGR in my case). The data lived in the tts_values array, and the null flags in tts_isnull.

In PostgreSQL 12, the slot arrives at your ExecInsert/ExecUpdate/ExecDelete callback function empty! The tts_values array is populated with Datum values of 0, yet the tts_isnull array is full of true values. There’s no data to pass back to the FDW source.

What gives?!?

Andres Freund has been slowly laying the groundwork for pluggable storage in PostgreSQL, and one of the things that work has affected is TupleTableSlot. Now when you get a slot, it might not have been fully populated yet, and that is what is happening in the FDW code.

The short-term fix is just to force the slot to populate by calling slot_getallattrs, and then go on with your usual work. That’s what I did. A more future-proof way would be to use slot_getattr and only retrieve the attributes you need (assuming you don’t just need them all).

Your VarLena might have a Short Header

Varlena types are the variable size types, like text, bytea, and varchar. Varlena types store their length and some extra information in a header. The header is potentially either 4 bytes or 1 byte long. Practically it is almost always a 4 byte header. If you call the standard VARSIZE and VARDATA macros on a varlena, the macros assume a 4 byte header.

The assumption has always held (for me), but not any more!

I found that as of PostgreSQL 12, I’m getting back varchar data with a 1-byte header! Surprise!

The fix is to stop assuming a 4-byte header. If you want the size of the varlena data area, less the header, use VARSIZE_ANY_EXHDR instead of VARSIZE() - VARHDRSZ. If you want a pointer into the data area, use VARDATA_ANY instead of VARDATA. The “any” macros first test the header type, and then apply the appropriate macro.

I have no idea what commit caused short varlena headers to make a comeback, but it was fun figuring out what the heck was going on.

March 26, 2019 01:00 PM

March 25, 2019

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PGConf US 2019 Data Loading Slides up

I gave a talk at PGConf US 2019 on some of the many ways you can load data into PostgreSQL using open source tools. This is similar to the talk I gave last year but with the addition of the pgloader commandline tool and the http PostgreSQL extension.

HTML slides PDF slides

Even though the talk wasn’t much about PostGIS, just tricks for loading data, I managed to get a mouthful of PostGIS in there.

by Leo Hsu and Regina Obe (nospam@example.com) at March 25, 2019 10:00 PM

March 11, 2019

PostGIS Development

PostGIS 2.5.2, 2.4.7, 2.3.9 Released

The PostGIS development team is pleased to provide bug fix 2.5.2, 2.4.7, and 2.3.9 for the 2.5, 2.4, and 2.3 stable branches.

These are the first versions able to compile against Proj 6.0.0. You must upgrade to these if you are using Proj 6.

2.5.2 This release supports PostgreSQL 9.3-11 (it will compile against PostgreSQL 12 but will not pass tests; use it with PostgreSQL 12 only for pg_upgrade. You are encouraged to use the unreleased PostGIS 3.0 branch with PostgreSQL 12, which has features specifically designed to take advantage of what is new in PostgreSQL 12).

2.4.7 This release supports PostgreSQL 9.3-10.

2.3.9 This release supports PostgreSQL 9.2-10.

View all closed tickets for 2.5.2, 2.4.7, 2.3.9.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

by Regina Obe at March 11, 2019 12:00 AM

February 22, 2019

Paul Ramsey

Proj 6 in PostGIS

Map projection is a core feature of any spatial database, taking coordinates from one coordinate system and converting them to another, and PostGIS has depended on the Proj library for coordinate reprojection support for many years.

For most of those years, the Proj library has been extremely slow moving. New projection systems might be added from time to time, and some bugs fixed, but in general it was easy to ignore. How slow was development? So slow that the version number migrated into the name, and everyone just called it “Proj4”.

No more.

Starting a couple years ago, new developers started migrating into the project, and the pace of development picked up. Proj 5 in 2018 dramatically improved the plumbing in the difficult area of geodetic transformation, and promised to begin changing the API. Only a year later, here is Proj 6, with yet more huge infrastructural improvements, and the new API.

Some of this new work was funded via the GDALBarn project, so thanks go out to those sponsors who invested in this incredibly foundational library, and to GDAL maintainer Even Rouault.

For PostGIS that means we have to accommodate ourselves to the new API. Doing so not only makes it easier to track future releases, but gains us access to the fancy new plumbing in Proj.

For example, Proj 6 provides:

Late-binding coordinate operation capabilities, that takes metadata such as area of use and accuracy into account… This can avoid in a number of situations the past requirement of using WGS84 as a pivot system, which could cause unneeded accuracy loss.

Or, put another way: more accurate results for reprojections that involve datum shifts.

Here’s a simple example, converting from an old NAD27/NGVD29 3D coordinate with height in feet, to a new NAD83/NAVD88 coordinate with height in metres.

SELECT ST_AsText(
         ST_Transform(
           ST_SetSRID(geometry('POINT(-100 40 100)'),7406), 
           5500));

Note that the height in NGVD29 is 100 feet; converted directly to metres, that would be 30.48 metres. The transformed point is:

POINT Z (-100.0004058 40.000005894 30.748549546)

Hey look! The elevation is slightly higher! That’s because in addition to being run through a horizontal NAD27/NAD83 grid shift, the point has also been run through a vertical shift grid as well. The result is a more correct interpretation of the old height measurement in the new vertical system.

Astute PostGIS users will have long noted that PostGIS contains three sources of truth for coordinate references systems (CRS).

Within the spatial_ref_sys table there are columns:

  • The authname and authsrid, which can be used, if you have an authority database, to look up an authsrid and get a CRS. Well, Proj 6 now ships with such a database. So there’s one source of truth.
  • The srtext, a string representation of a CRS, in a standard ISO format. That’s two sources.
  • The proj4text, the old Proj string for the CRS. Until Proj 6, this was the only form of definition that the Proj library could consume, and hence the only source of truth that mattered to PostGIS. Now, it’s a third source of truth.

Knowing this, when you ask PostGIS to transform to an SRID, what will it do?

  • If there are non-NULL values in authname and authsrid, ask Proj to return a CRS based on those entries.
  • If Proj fails, and there is a non-NULL srtext, ask Proj to build a CRS using that text.
  • If Proj still fails, and there is a non-NULL proj4text, ask Proj to build a CRS using that text.

In general, the best transforms will come by having Proj look-up the CRS in its own database, because then it can apply all the power of “late binding” to ensure the best transformation for each geometry. Hence we bias in favour of Proj lookups, then the quite detailed WKT format, and finally the old Proj format.
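As a quick illustration (a sketch, not from the original post), you can see all three representations side by side for a single SRID with a query against spatial_ref_sys; note the actual column names are auth_name and auth_srid:

-- The three candidate CRS definitions for SRID 4326, side by side
SELECT auth_name,
       auth_srid,
       substring(srtext for 60)    AS srtext_start,
       substring(proj4text for 60) AS proj4text_start
  FROM spatial_ref_sys
 WHERE srid = 4326;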

February 22, 2019 01:00 PM

February 11, 2019

Postgres OnLine Journal (Leo Hsu, Regina Obe)

Compiling http extension on ubuntu 18.04

We recently installed PostgreSQL 11 on an Ubuntu 18.04 using apt.postgresql.org. Many of our favorite extensions were already available via apt (postgis, ogr_fdw to name a few), but it didn't have the http extension we use a lot. The http extension is pretty handy for querying things like Salesforce and other web api based systems. We'll outline the basic compile and install steps. While it's specific to the http extension, the process is similar for any other extension you may need to compile.

Continue reading "Compiling http extension on ubuntu 18.04"

by Leo Hsu and Regina Obe (nospam@example.com) at February 11, 2019 08:31 AM

February 04, 2019

Paul Ramsey

Dr. JTS comes to Crunchy

Today’s an exciting day in the Victoria office of Crunchy Data – our local staff count goes from one to two, as Martin Davis joins the company!

This is kind of a big deal, because this year Martin and I will be spending much of our time on the core computational geometry library that powers PostGIS, the GEOS library, and the JTS library from which it derives its structure.

Why is that a big deal? Because GEOS, JTS and other language ports provide the computational geometry algorithms underneath most of the open source geospatial ecosystem – so improvements in our core libraries ripple out to help a huge swathe of other software.

JTS came first, initially as a project of the British Columbia government. GEOS is a C++ port of JTS. There are also Javascript and .Net ports (JSTS and NTS).

Each of those libraries has developed a rich downline of other libraries and projects that depend on them. On the desktop, on the web, in the middleware, JTS and GEOS power all of it.

So we know that work on JTS and GEOS on our side is going to benefit far more than just PostGIS.

I’ve already spent a decent amount of time on bringing the GEOS library up to date with the changes in JTS over the past few months, and trying to fulfill the “maintainer” role, merging pull requests and closing some outstanding tickets.

As Martin starts adding to JTS, I now feel more confident in my ability to bring those changes into the C++ world of GEOS as they land.

Without pre-judging what will get first priority, topics of overlay robustness, predicate performance, and geometry cleaning are near the top of our list.

Our spatial customers at Crunchy process a lot of geometry, so ensuring that PostGIS (GEOS) operations are robust and high performance is a big win for PostgreSQL and for our customers as well.

February 04, 2019 01:00 PM

January 01, 2019

Boston GIS (Regina Obe, Leo Hsu)

Using pg_upgrade to upgrade PostgreSQL 9.3 PostGIS 2.1 to PostgreSQL 11 2.5 on Yum

In a previous article Using pg upgrade to upgrade PostGIS without installing older version I demonstrated a trick for upgrading to a newer PostgreSQL instance from PostGIS 2.2 - 2.whatever without having to install the older version of PostGIS in your new PostgreSQL service. Unfortunately that trick does not work if you are coming from PostGIS 2.1, because in PostGIS 2.2 we renamed a C library function that backed SQL functions in 2.1.

Fear not. There is still a way to upgrade from 2.1 to 2.5 without installing an older version of PostGIS in your new PostgreSQL instance. To do so, you need to add one step: remove the functions in 2.1 that are backed by the renamed library function. In the upcoming PostGIS 3.0, we've added this function back and have it throw an error, so that even coming from PostGIS 2.1 you can upgrade just the same as you do from later versions.

Continue reading "Using pg_upgrade to upgrade PostgreSQL 9.3 PostGIS 2.1 to PostgreSQL 11 2.5 on Yum"

by Regina Obe (nospam@example.com) at January 01, 2019 06:48 AM

December 19, 2018

Michal Zimmermann

CentOS PostGIS Upgrade Hell… Yet Again

PostGIS upgrades used to be a nightmare. Broken dependencies, version mismatches, you name it. Upgrading PostgreSQL 10 with PostGIS 2.4 to PostgreSQL 11 on CentOS has been my mission impossible for two days. And it doesn’t seem to come to an end.

What? Why?

We’re running a fairly large spatially enabled PostgreSQL 10 database cluster. To keep up with pretty fast development, I was hoping to pg_upgrade it to PostgreSQL 11.

Tried and failed

I’ve been trying different upgrade strategies with PostgreSQL 11 already running to no avail. Here comes the list.

Install PostGIS 2.4 to PostgreSQL 11 and pg_upgrade

yum install postgis24_11
systemctl stop postgresql-11

su postgres
/usr/pgsql-11/bin/pg_upgrade \
  --check \
  -b /usr/pgsql-10/bin/ -B /usr/pgsql-11/bin/ \
  -d /var/lib/pgsql/10/data -D /var/lib/pgsql/11/data \
  --link \
  -U root \
  -o ' -c config_file=/var/lib/pgsql/10/data/postgresql.conf' -O ' -c config_file=/var/lib/pgsql/11/data/postgresql.conf'

This results in:

Your installation references loadable libraries that are missing from the new installation. You can add these libraries to the new installation, or remove the functions using them from the old installation. A list of problem libraries is in the file: loadable_libraries.txt

loadable_libraries.txt says the following:

could not load library "$libdir/postgis-2.4": ERROR:  could not load library "/usr/pgsql-11/lib/postgis-2.4.so": /usr/pgsql-11/lib/postgis-2.4.so: undefined symbol: geod_polygon_init

Duckduckgoing I found the related PostgreSQL mailing list thread.

Build and install PostGIS 2.4 from source to PostgreSQL 11 and pg_upgrade

The bug report says there’s something wrong with the proj4 version, so I chose proj49 and geos37.

yum install proj49 proj49-devel
wget https://download.osgeo.org/postgis/source/postgis-2.4.6.tar.gz
tar -xzvf postgis-2.4.6.tar.gz
cd postgis-2.4.6

./configure \
  --with-pgconfig=/usr/pgsql-11/bin/pg_config \
  --with-geosconfig=/usr/geos37/bin/geos-config \
  --with-projdir=/usr/proj49/

make && make install

CREATE EXTENSION postgis fails with could not load library "/usr/pgsql-11/lib/postgis-2.4.so": /usr/pgsql-11/lib/postgis-2.4.so: undefined symbol: geod_polygon_init. Oh my.

Install PostGIS 2.5 to PostgreSQL 10 and pg_upgrade

Running out of ideas, I tried to install PostGIS 2.5 to our PostgreSQL 10 cluster and pg_upgrade.

yum install postgis25_10

The resulting error appeared almost instantly:

Transaction check error:
file /usr/pgsql-10/bin/shp2pgsql-gui from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/lib/liblwgeom.so from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/lib/postgis-2.4.so from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer.sql from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer_data_us.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer_data_us.sql from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_sfcgal.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_tiger_geocoder.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_topology.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64

What the…

Build and install PostGIS 2.5 from source to PostgreSQL 10 and pg_upgrade

wget https://download.osgeo.org/postgis/source/postgis-2.5.1.tar.gz
tar -xzvf postgis-2.5.1.tar.gz
cd postgis-2.5.1

./configure \
  --with-pgconfig=/usr/pgsql-10/bin/pg_config \
  --with-geosconfig=/usr/geos37/bin/geos-config

make && make install

CREATE EXTENSION postgis fails with ERROR: could not load library "/usr/pgsql-10/lib/postgis-2.5.so": /usr/pgsql-10/lib/postgis-2.5.so: undefined symbol: GEOSFrechetDistanceDensify. Again? Really?

GEOSFrechetDistanceDensify was added in GEOS 3.7 (linked in ./configure), yet ldd /usr/pgsql-10/lib/postgis-2.5.so says:

linux-vdso.so.1 =>  (0x00007ffd4c5fa000)
libgeos_c.so.1 => /usr/geos36/lib64/libgeos_c.so.1 (0x00007f68ddf5a000)
libproj.so.0 => /lib64/libproj.so.0 (0x00007f68ddd07000)
libjson-c.so.2 => /lib64/libjson-c.so.2 (0x00007f68ddafc000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x00007f68dd792000)
libm.so.6 => /lib64/libm.so.6 (0x00007f68dd48f000)
libSFCGAL.so.1 => /lib64/libSFCGAL.so.1 (0x00007f68dc9c0000)
libc.so.6 => /lib64/libc.so.6 (0x00007f68dc5f3000)
libgeos-3.6.3.so => /usr/geos36/lib64/libgeos-3.6.3.so (0x00007f68dc244000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f68dbf3d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f68dbd27000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f68dbb22000)
libz.so.1 => /lib64/libz.so.1 (0x00007f68db90c000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f68db6e6000)
/lib64/ld-linux-x86-64.so.2 (0x000055960f119000)
libCGAL.so.11 => /usr/lib64/libCGAL.so.11 (0x00007f68db4bd000)
libCGAL_Core.so.11 => /usr/lib64/libCGAL_Core.so.11 (0x00007f68db284000)
libmpfr.so.4 => /usr/lib64/libmpfr.so.4 (0x00007f68db029000)
libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007f68dadb0000)
libboost_date_time-mt.so.1.53.0 => /usr/lib64/libboost_date_time-mt.so.1.53.0 (0x00007f68dab9f000)
libboost_thread-mt.so.1.53.0 => /usr/lib64/libboost_thread-mt.so.1.53.0 (0x00007f68da988000)
libboost_system-mt.so.1.53.0 => /usr/lib64/libboost_system-mt.so.1.53.0 (0x00007f68da783000)
libboost_serialization-mt.so.1.53.0 => /usr/lib64/libboost_serialization-mt.so.1.53.0 (0x00007f68da517000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f68da2fa000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00007f68da0f2000)

I’m nearly desperate after spending two days trying to break through. I have ~ 300 GB of PostgreSQL data to migrate to the current version and there seems to be no possible way to do it in CentOS.

One more thing to note: using yum install postgis25_11 and CREATE EXTENSION postgis in a v11 database fails with exactly the same error as the one above. I really enjoy working with PostgreSQL and PostGIS, yet there’s hardly anything I fear more than trying to upgrade those two things together.

by Michal Zimmermann at December 19, 2018 09:00 AM

December 10, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostGIS 2.5.1 Bundle for Windows

PostGIS 2.5.1 was released on November 18th 2018. I finished off packaging the PostGIS 2.5.1 Windows builds and installers targeted for the PostgreSQL EDB distribution this weekend and pushed them up to stackbuilder. This covers PostgreSQL 9.4-11 (64-bit) and PostgreSQL 9.5-10 (32-bit).

Note that the PostGIS 2.5 series will be the last of the PostGIS 2s. Say goodbye to PostGIS 2.* and start playing with the in-development version of PostGIS 3. Snapshot binaries for PostGIS 3.0 Windows development are also available on the PostGIS windows download page. These should work for both BigSQL and EDB distributions.

Continue reading "PostGIS 2.5.1 Bundle for Windows"

by Leo Hsu and Regina Obe (nospam@example.com) at December 10, 2018 12:18 AM

November 26, 2018

Michal Zimmermann

Implementing Linked List with PostgreSQL Recursive CTE

I’ve been working on a book/storytelling pet project recently. Dealing with book events and keeping them in order was a task that was to be tackled sooner or later. While both the frontend and backend of the app could deal with linked and ordered data, the database might be just about the best place to do so.

What you might need a linked list for

You have a set of chronological events. The set is not complete at the beginning, and the position of events might change (e.g. their neighbouring events might change over time).

Implementation

A linked list is a perfect structure for such a case (see Wikipedia). You can keep your data intact using just an id and a previous/next id.

CREATE TABLE public.events (
  id integer generated always as identity primary key,
  previous_id integer
);

COPY public.events (id, previous_id) FROM stdin;
7   \N
10  5
5   1
1   3
3   8
8   9
9   2
2   6
6   4
4   7
\.

Generating the list of events in the right order is a matter of running one recursive CTE query.

WITH RECURSIVE evt(id) AS (
SELECT
    id,
    previous_id
FROM events
WHERE previous_id IS NULL
UNION
SELECT
    e.id,
    e.previous_id
FROM events e
JOIN evt ON (e.previous_id = evt.id)
)
SELECT * FROM evt;

It gathers the first event (the one with its previous pointer set to NULL) and iteratively adds the following ones. Note that this version is actually the reverse implementation of the linked list, pointing to the previous instead of the next event. All it would take to change that would be to anchor on the event id that does not appear in the previous_id column, instead of using WHERE previous_id IS NULL; a sketch of that variant follows.
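Here is a hedged sketch of that variant (not from the original post): it anchors on the event whose id never appears as anyone's previous_id (the most recent event) and walks back through the pointers, so rows come out in reverse order:

-- Sketch: enumerate events starting from the most recent one
WITH RECURSIVE evt AS (
  SELECT id, previous_id
  FROM events
  WHERE id NOT IN (
    SELECT previous_id FROM events WHERE previous_id IS NOT NULL
  )
  UNION ALL
  SELECT e.id, e.previous_id
  FROM events e
  JOIN evt ON e.id = evt.previous_id
)
SELECT * FROM evt;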

With the data coming properly sorted to the client, all it has to do is render the list.

by Michal Zimmermann at November 26, 2018 08:00 PM

November 24, 2018

PostGIS Development

PostGIS 2.3.8, 2.4.6

The PostGIS development team is pleased to provide bug fix 2.3.8 and 2.4.6 for the 2.3 and 2.4 stable branches.

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 24, 2018 12:00 AM

November 22, 2018

PostGIS Development

PostGIS 2.2.8 EOL

The PostGIS development team is pleased to provide bug fix 2.2.8 for the 2.2 stable branch.

This is the End-Of-Life and final release for PostGIS 2.2 series.

We encourage you to upgrade to a newer minor PostGIS version. Refer to our Version compatibility and EOL Policy for details on versions you can upgrade to.

This release supports PostgreSQL 9.1-9.6.

2.2.8

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 22, 2018 12:00 AM

November 18, 2018

PostGIS Development

PostGIS 2.5.1

The PostGIS development team is pleased to provide bug fix 2.5.1 for the 2.5 stable branch.

Although this release will work for PostgreSQL 9.4 thru PostgreSQL 11, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11 and GEOS 3.7.0.

Best served with PostgreSQL 11.1 and pgRouting 2.6.1.

WARNING: If compiling with PostgreSQL+JIT, LLVM >= 6 is required.
Supported PostgreSQL versions for this release are PostgreSQL 9.4 - PostgreSQL 11.
GEOS >= 3.5 is required.

2.5.1

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 18, 2018 12:00 AM

November 17, 2018

Simon Greener

Spatial Representation of a Communications Pit

This article shows how to represent a communications pit using logical referencing rather than hard spatial objects in PostGIS. The solution was originally created on Oracle.

by Simon Greener at November 17, 2018 12:05 AM

November 03, 2018

PostGIS Development

Howard Hughes Medical Institute

As a software engineer at the Howard Hughes Medical Institute, I work on a collaborative neuron reconstruction and analysis software called CATMAID, which is used for neuroscience research. We use PostGIS to represent neurons in a 3D space.

Continue Reading by clicking title hyperlink ..

by Tom Kazimiers at November 03, 2018 12:00 AM

November 02, 2018

PostGIS Development

Vanguard Appraisals

Vanguard Appraisals is new to the GIS world. In fact, we aren’t really in the GIS world; we just kind of brush up against it. We do mass property appraisal for entire county and city jurisdictions, and we develop software to collect, price and maintain values. We also host assessment data online so that homeowners can search and find property information much more simply from the comfort of their own home. Our software and websites are used in 7 states (IA, IL, MN, MO, NE, ND, SD).

Continue Reading by clicking title hyperlink ..

by Andy Colson at November 02, 2018 12:00 AM

October 25, 2018

Safe Software

A Beginner’s Guide to… Bulk Database Updates with FME

This article is a simple guide to bulk database updates with FME.

Sometimes my blog posts are like a courtroom novel. The recent article on precision is an example of this. I reach a successful verdict, but getting there involves an opening statement, a journey through obscure aspects of theory, cross-examining our functionality, and making a final argument for various design decisions. Only after all that can I prove beyond doubt that our work is fit for purpose.

That sort of article is fun to write and it does illustrate Safe’s thought processes in designing FME functionality. But sometimes – like reading a courtroom novel – you want to skip to the final chapter. You just want the juicy part where our hero attorney has already got the result, and simply illuminates the solution in logical steps.

Well, today, I am that hero attorney. I am the Perry Mason of the FME world, and this article is the final chapter of “Updating Databases with FME”. No theory or cross examination involved. Step by step I’ll cast light on how FME made it easy to push mass updates to a database.

There is just one design issue I’ll touch on, and it is important. In law there are multiple jurisdictions, and a technique Perry Mason tries in a California court may not work for Atticus Finch in Alabama. Similarly, in our world of data there are multiple databases. But! FME techniques you use for Oracle will work for SQL Server, and FME techniques you use for Postgres will work for an Esri Geodatabase.

How can this happen? It’s because we’ve worked really hard to harmonize (or standardize) the interfaces inside FME. So whatever database applies in your jurisdiction – or even if your work involves several database types – this post covers you.

So let’s get on with it…

Setting up an FME Database Connection

When you want to read from or write to a database – including database updates – you need authorization. FME defines authorization parameters using a connection tool. It’s accessed through Tools > FME Options in FME Workbench. So start Workbench, select Tools > FME Options and you will get this dialog:

When I click on Database Connections, I get a list of available connections. If I want to create a new one I press the plus button and get this dialog:

Just enter your database details in there, test and then save. Now you have a connection defined, you can use it wherever you like in FME. So first let’s use it to insert some data into a database…

Inserting Data to a Database with FME

To carry out database updates, first you need data in the database! Let’s say you want to read a dataset – maybe an Excel spreadsheet – and write it to a database, creating a table at the same time. That’s simply done in FME by generating a new workspace and choosing a database format. The generate option exists on the start page of FME Workbench, or you can use the shortcut Ctrl+G. That opens the basic dialog for defining a translation:

Here I have filled in the fields to define a translation of data about parks in the city of Vancouver. The translation is from MapInfo TAB (a spatial data format) to PostGIS (a standard PostgreSQL database with a spatial extension). The spatial part is not necessary – the same setup works for data without a spatial component – but maps are cool so I’ll go with it.

The database connection I selected is the one I defined above, saving me the effort of re-entering my authentication details.

Click OK and that dialog creates a workspace that looks like this:

Each object on the left is a table, layer or class in your source data. Each object on the right is a table in your database.

Setting Database Parameters

When using databases the key settings for each table are accessed by clicking the cogwheel icon on those objects, like this:

Database Insert Parameters

The table name is the first parameter, and so I can rename the table to be something different; and I can choose which schema (Table Qualifier) to write to as well.

But the most important parameter (Feature Operation) tells us that we are INSERTing data, and I can also choose in what way the table is created:

So I can choose to create the table regardless of whether it exists (Drop and Create), create it if it doesn’t already exist (Create if Needed), just add to the existing table (Use Existing), or empty it if it already exists (Truncate Existing).

Which I use depends on the scenario I am working through, but in this case – to create and fill a table – I’ll use Create if Needed. The advantage over Drop and Create is that if another user already has a table with that name (and I haven’t checked) then at least I won’t delete their content first.

Anyway, I run the workspace and FME loads the data:

Of course, at some point in the future I might find the source of the data (Parks.tab) has changed, and I need to update my data based on that changed dataset…

Updating Records in a Database with FME

Say I receive a dataset named ParksUpdates.tab. The simplest way to update my database is to do the same process as above, but to use Drop and Create as the table operation. That way I am just replacing everything:

But that relies on ParksUpdates.tab being the FULL replacement dataset. What if it only includes the records that need an update? Well in that scenario I simply change the operation to UPDATE (instead of INSERT) and choose to Use Existing table:

Database Updates Parameters

Notice that when I pick UPDATE, then another parameter becomes available to me: Match Columns. I need this to define which feature updates which record. In this case I have an attribute called parkid in my source data and a field (column) named parkid in my database table; so that’s the attribute I select:

So if an incoming feature has the attribute parkid=13, then its contents are used to update the database record where parkid=13.

That’s simple enough, but to add a little complexity (not too much) there is also a WHERE clause I can use instead. This lets me define the match where the attribute and field names are not the same (for example ParkNumber and parkid) but also allows me to add extra conditions using field names:

 

Here, for example, I’m updating records where ParkNumber = parkid, but also only where the neighborhoodname field is “Downtown”. So records outside of the Downtown area aren’t updated, even if the park ID matches. I could do a similar test for a status field (active, inactive) among many other examples.

So we do updates here, and that’s simple enough; but sometimes we also want to delete records…

Deleting Records from a Database with FME

Let’s assume my UpdatedParks dataset is a list of records to delete from the database table, not add. To do that I simply change the operation from UPDATE to DELETE:

Database Delete Parameters

I get the same Match Column parameter (or WHERE clause) to define which incoming features should delete which existing records, and this is again easy to define.

So deletes are no more complex than updates; the key question is what happens when I want to both delete and update records simultaneously?

Updating AND Deleting Database Records with FME

Let’s say some of the incoming records are updates, while others are deletions:

Obviously I can’t set the operation parameter to both DELETE and UPDATE for the entire table. What I do instead is tag each feature with the operation it will carry out. I do this using an attribute called fme_db_operation:

Database Updates AND Deletes

You can see here that I have added an attribute to each stream of data, using an AttributeCreator transformer. The attribute name is fme_db_operation. For one lot of data I set the value to UPDATE. The other set of data has a value of DELETE. This is how I tag each feature with its own operation.

I still have to set the operation type on the table itself. But this time, rather than choosing Insert, Update, or Delete, I choose the option labelled fme_db_operation:

Now when I run the workspace, the features tagged UPDATE update database records, while the features tagged DELETE delete database records. The Match Column (or WHERE clause) provides a match between features and records.

The one assumption is that we already know which features are deletes and which are updates. In the above example, the source data is already divided into two. If we aren’t sure of that, then we might need to do what is called Change Detection…

Change Detection and Database Updates with FME

Change Detection is where we have a new dataset and want to compare it to existing records to find what has changed. Here is such a workspace:

It looks quite simple, and in fact it is. I have added a reader (Readers > Add Reader) to read the existing contents of my database table, and an UpdateDetector transformer to compare these records to the UpdatedParks dataset, to identify where changes have occurred. I can detect changes either on field values, or spatial contents, or both.

Then it’s just a case of writing the results back to the database table. I don’t even need to create the fme_db_operation attribute; the UpdateDetector has done that for me. I must just check that the table is set with the correct operation (fme_db_operation) and that Match Column is set.

At this point you probably know more than enough to carry out database updates; but there is one more scenario I can perhaps mention. What if each feature has a different match column? In that case you can write your match in the form of a where clause, and store it as an attribute. Then use that attribute for the match in the table parameters:

…just like that!

NOTE: In FME2019 the ChangeDetector has undergone various improvements and should be your go-to transformer instead of the UpdateDetector.

Wrap Up

I hope that was a good explanation for database updates, short on theory and long on practical examples. Speaking of which, we have an online tutorial on database updates, where you can carry out exercises, step by step, using similar examples to what I showed today. So if you want to try some of these techniques in a safe practice environment, click the above link to visit the tutorial.

As I mentioned, although I used Postgres here, most of our database formats use an identical interface because we’ve invested a lot of time standardizing them all. That work is still ongoing, so one of the tutorial articles includes a list of standardized formats, and what to do if your format is not yet updated.

And now, if you’ll excuse me, I’m going to do like Perry Mason and wrap up my case by going out to celebrate with steak and cocktails!

PS: Safe Software now has an Instagram page. I’m not sure what an instagram is, but here’s a picture of me presenting at a prior user conference (oh, and riding an inflatable horse):

https://www.instagram.com/safesoftware/

by Mark Ireland at October 25, 2018 10:03 AM

October 01, 2018

Paul Ramsey

PostGIS Code Sprint 2018 #2

An important topic of conversation this sprint was what kinds of core PostgreSQL features might make PostGIS better in the future?

Wider Typmod

The current attribute typmod column is a 32-bit integer. For geometry, we are already packing those 32 bits to the limit: a couple of bits for dimensionality, some more for the type number, and the big kahuna, a bunch for the SRID number. Having even a 64-bit typmod number would allow even more interesting things, like declared coordinate precision, to fit in there. Maybe we are abusing typmod and there’s a better way to do what we want though?

Parallel GIST Scan

The PostGIS spatial index is built using the PostgreSQL GIST index infrastructure, so anything that makes GIST scans faster is a win for us. This would be a big win for folks with large tables (and thus deep trees) and who run scans that return a lot of indexed tuples.

Faster GIST Index Building

B-Tree index builds are accelerated by pre-sorting the inputs; could the same trick be used in building GIST indexes? Again, for large tables, GIST index building is slower than B-Tree and “faster” is the #1 feature all existing users want.

Multi-Threading in Functions

This isn’t a feature request, so much as a request for clarification and assurance: PostGIS calls out to other libraries, like GEOS, and it’s possible we could make some of our algorithms there faster via parallel processing. If we do our parallel processing within a function call, so the PostgreSQL thread of execution isn’t doing anything until we return, is it OK for us to do threading? We use pthreads, or maybe OpenMP.

Compressed Datum Slicing

“Detoast datum slice” doesn’t actually get around to the slicing step until after the datum is decompressed, which can make some queries quite slow. We already try to read boxes from the headers of our objects, and for large objects that means decompressing the whole thing: it would be nice to only decompress the first few bytes. I have an ugly patch I will be testing to try and get committed.

Forced Inlining

A problem we have with PostgreSQL right now is that we cannot effectively cost our functions due to the current inlining behaviour on our wrapper functions like ST_Intersects(). When raised on the list, the hackers came to a tentative conclusion that improving the value caching behaviour would be a good way to avoid having inlining in incorrect places. It sounded like subtle and difficult work, and nobody jumped to it.

We propose leaving the current inlining logic unchanged, but adding an option to SQL language functions to force inlining for those functions. That way there is no ambiguity: we always want our wrapper functions inlined; always, always, always. If we could use an INLINE option on our wrapper function definitions, all the current careful logic could remain in place for more dynamic use cases.
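For context, here is a simplified, hedged sketch of roughly what such a wrapper looks like today, and what a forced-inline variant might look like. INLINE is the proposed, hypothetical keyword, not existing PostgreSQL syntax:

-- Simplified sketch of an inlinable SQL wrapper: index operator plus exact test
CREATE OR REPLACE FUNCTION ST_Intersects(geom1 geometry, geom2 geometry)
  RETURNS boolean
  AS 'SELECT $1 && $2 AND _ST_Intersects($1, $2)'
  LANGUAGE sql IMMUTABLE PARALLEL SAFE;

-- Hypothetical: force inlining regardless of the planner's heuristics
-- CREATE OR REPLACE FUNCTION ST_Intersects(geom1 geometry, geom2 geometry)
--   RETURNS boolean
--   AS 'SELECT $1 && $2 AND _ST_Intersects($1, $2)'
--   LANGUAGE sql IMMUTABLE PARALLEL SAFE INLINE;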

Extension Version Dependency

PostGIS is growing a small forest of dependent extensions

  • some inside the project, like postgis_topology and now postgis_raster
  • some outside the project, like PgRouting and pgpointcloud

When a new version of PostGIS is installed, we want all the PostGIS extensions to be updated. When a third party extension is installed, it may require features from a particular recent version of PostGIS.

The extension framework supports dependency, but for safety, as the ecosystem grows, version dependencies are going to be required eventually.

Global Upgrade Paths

Right now extension upgrade paths have to explicitly state the start and end version of the path. So an upgrade file might be named postgis--2.3.4--2.3.5.sql. That’s great if you have four or five versions. We have way more than that. The number of upgrade files we have keeps on growing and growing.

Unlike upgrade files for smaller projects, we drop and recreate all the functions in our upgrade files. That means our current version upgrade file is actually capable of upgrading any prior version. Nonetheless, we have to make a copy, or a symlink, for many, many version combinations.

If there was a global “version”, we could use our master upgrade script, and only ship one script for each new version: postgis--ANY--2.3.5.sql

Size Based Costing in the Planner

Right now costing in the planner is based heavily on the “number of rows” a given execution path might generate at each stage. This is fine when the cost of processing each tuple is fairly uniform.

For us, the cost of processing a tuple can vary wildly: calculating the area of a 4 point polygon is far cheaper than calculating the area of a 40000 point polygon. Pulling a large feature out of TOAST tuples is more expensive than pulling it from main storage.

Having function COST taken into more consideration in plans, and having that COST scale with the average size of tuples would make for better plans for PostGIS. It would also make for better plans for PostgreSQL types that can get very large, like text and tsvector.

The analysis hooks might have to be enriched to also ask for stats on average tuple size for a query key, in addition to selectivity, so a query that pulled a moderate number of huge objects might have a higher cost than one that pulled quite a few small objects.

We Are Not Unreasonable People

We just want our due, you know?

Thanks, PostgreSQL team!

October 01, 2018 08:00 AM

September 30, 2018

Paul Ramsey

PostGIS Code Sprint 2018 #1

When I tell people I am heading to an open source “code sprint”, which I try to do at least once a year, they ask me “what do you do there?”

When I tell them, “talk, mostly”, they are usually disappointed. There’s a picture, which is not unearned, of programmers bent over their laptops, quietly tapping away. And that happens, but the real value, even when there is lots of tapping, is in the high-bandwidth, face-to-face communication.

So, inevitably I will be asked what I coded, this week at the PostGIS Code Sprint and I will answer… “uhhhhh”. I did a branch of PostgreSQL that will do partial decompression of compressed tuples, but didn’t get around to testing it. I tested some work that others had done. But mostly, we talked.

PostGIS 3

Why move to PostGIS 3 for the next release? Not necessarily because we will have any earth-shattering features, but to carry out a number of larger changes. Unlike most PostgreSQL extensions, PostGIS has a lot of legacy from past releases and has added, removed and renamed functions over time. These things are disruptive, and we’d like to do some of the known disruptive things at one time.

Split Vector and Raster

When we brought raster into PostGIS, we included it in the “postgis” extension, so if you CREATE EXTENSION postgis you get both vector and raster features. The rationale was that if we left it optional, packagers wouldn’t build it, and thus most people wouldn’t have access to the functionality, so it wouldn’t get used, so we’d be maintaining unused garbage code.

Even being included in the extension, by and large people haven’t used it much, and the demand from packagers and other users to have a “thin” PostGIS with only vector functionality has finally prevailed: when you ALTER EXTENSION postgis UPDATE TO '3.0.0' the raster functions will be unbundled from the extension. They can then be re-bundled into a “postgis_raster” dependent package and either dropped or kept around depending on user preference.
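A hedged sketch of the planned workflow described above (the exact mechanics may differ in the final release):

ALTER EXTENSION postgis UPDATE TO '3.0.0';  -- raster functions unbundled
CREATE EXTENSION postgis_raster;            -- keep raster functionality...
-- ...or skip/drop postgis_raster to run a vector-only PostGIS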

Remove postgis.so Minor Version

For users in production, working with packaged PostgreSQL, in deb or rpm packages, the packaging system often forces you to have only one version of PostGIS installed at a time. When upgrading PostgreSQL and PostGIS the net effect is to break pg_upgrade, meaning PostGIS users are mandated to do a full dump/restore.

Removing the minor version will allow the pg_upgrade process to run through, and users can then run the sql ALTER EXTENSION postgis UPDATE command to synchronize their SQL functions with the new binary postgis.so library.

This is good for most users. It’s bad for users who expect to be able to run multiple versions of PostGIS on one server: they won’t easily be able to. There will be a switch to make it possible to build with minor versions again, but we expect it will mostly be used by us (developers) for testing and development.

Serialization

As I discussed recently, the compression of geometries in PostgreSQL can have a very large effect on performance.

A new serialization could:

  • use a more effective compression format for our kind of data, arrays of autocorrelated doubles
  • add space for more flag bits for things like
    • encoding a smaller point format
    • flagging empty geometries
    • noting the validity of the object
    • noting the presence of a unique hash code
    • extra version bits
    • optional on-disk internal indexes or summary shapes

It’s not clear that a new serialization is a great idea. The only really pressing problem is that we are starting to use up our flag space.

Validity Flag

Testing geometry validity is computationally expensive, so for workflows that require validity a lot of time is spent checking and rechecking things that have already been confirmed to be valid. Having a flag on the object would allow the state to be marked once, the first time the check is done.

The downside of a validity flag is that every operation that alters coordinates must then carefully make sure to turn the flag off again, as the object may have been rendered invalid by the processing.

Exception Policy

A common annoyance for advanced users of PostGIS is when a long running computation stops suddenly and the database announces “TopologyException”!

It would be nice to provide some ways for users to indicate to the database that they are OK losing some data or changing some data, if that will complete the processing for 99% of the data.

We discussed adding some standard parameters to common processing functions (a hypothetical call is sketched after the list):

  • null_on_exception (true = return null, false = exception)
  • repair_on_exception (true = makevalid() the inputs, false = do the null_on_exception action)
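A hypothetical call using the proposed parameters might look like this (this is not an existing PostGIS API, and the table names are placeholders):

-- Hypothetical: keep going even if an overlay throws a TopologyException
SELECT a.id,
       ST_Intersection(a.geom, b.geom, null_on_exception => true) AS clipped
  FROM parcels a
  JOIN zones b ON ST_Intersects(a.geom, b.geom);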

Modern C

C is not a quickly changing language, but since the PostgreSQL project has moved to C99, we will also be moving to C99 as our checked standard language.

Named Parameters

A lot of our function definitions were written before the advent of default values and named parameters as PostgreSQL function features. We will modernize our SQL extension file so we’re using named parameters everywhere. For users this will mean that correct parameter order will not be required anymore; it will be optional if you use named parameters.
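For example, a named-parameter call might eventually look like this (the parameter names here are invented for illustration; the real names will depend on the updated SQL definitions):

-- Hypothetical named-parameter call; 'geom' and 'radius' are placeholder names
SELECT ST_Buffer(geom => ST_Point(0, 0), radius => 10.0);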

M Coordinate

PostGIS supports “4 dimensional” features, with X, Y, Z and M, but while everyone knows what X, Y and Z are, only GIS aficionados know what “M” is. We will document M and also try and bring M support into GEOS so that operations in GEOS are “M preserving”.

Project Actions

Web Site

The web site is getting a little crufty around content, and the slick styling of 7 years ago is not quite as slick. We agreed that it would be OK to modernize, with particular emphasis on:

  • thinking about new user onboarding and the process of learning
  • supporting mobile devices with a responsive site style
  • using “standard” static site infrastructure (jekyll probably)

Standard Data

We agreed that having some standard data that was easy to install would make a whole bunch of other tasks much easier:

  • writing standard workshops and tutorials, so all the examples lined up
  • writing a performance harness that tracked the performance of common queries over time
  • having examples in the reference documentation that didn’t always have to generate their inputs on the fly

News File Policy

It’s a tiny nit, but for developers back-porting fixes over our 4-5 stable branches, knowing where to note bugs in the NEWS files, and doing it consistently is important.

  • Bug fixes applied to stable branches always listed in NEWS for each branch applied to
  • Bug fixes applied to stable and trunk should be listed in NEWS for each branch and NEWS in trunk
  • Bug fixes applied to only trunk can be left out of trunk NEWS (these bugs are development artifacts, not bugs that users in production have reported)

Next…

We also discussed features and changes to PostgreSQL that would help PostGIS improve, and I’ll write about those in the next post.

September 30, 2018 08:00 AM

September 28, 2018

Paul Ramsey

5x Faster Spatial Join with this One Weird Trick

My go-to performance test for PostGIS is the point-in-polygon spatial join: given a collection of polygons of variable sizes and a collection of points, count up how many points are within each polygon. It’s a nice way of testing indexing, point-in-polygon calculations and general overhead.

Setup

First download some polygons and some points.

Load the shapes into your database.

shp2pgsql -s 4326 -D -I ne_10m_admin_0_countries.shp countries | psql performance
shp2pgsql -s 4326 -D -I ne_10m_populated_places.shp places | psql performance

Now we are ready with 255 countries and 7343 places.

Countries and Places

One thing to note about the countries is that they are quite large objects, with 149 of them having enough vertices to be stored in TOAST tuples.

SELECT count(*) 
  FROM countries 
  WHERE ST_NPoints(geom) > (8192 / 16);

Baseline Performance

Now we can run the baseline performance test.

SELECT count(*), c.name 
  FROM countries c 
  JOIN places p 
  ON ST_Intersects(c.geom, p.geom) 
  GROUP BY c.name;

On my laptop, this query takes 25 seconds.

If you stick the process into a profiler while running it you’ll find that over 20 of those seconds are spent in the pglz_decompress function. Not doing spatial algorithms or computational geometry, just decompressing the geometry before handing it on to the actual processing.

Among the things we talked about this week at our PostGIS code sprint have been clever ways to avoid this overhead:

  • Patch PostgreSQL to allow partial decompression of geometries.
  • Enrich our serialization format to include a unique hash key at the front of geometries.

These are cool have-your-cake-and-eat-it-too ways to both retain compression for large geometries and be faster when feeding them into the point-in-polygon machinery.

However, they ignore a more brutal and easily testable approach to avoiding decompression: just don’t compress in the first place.

One Weird Trick

PostGIS uses the “main” storage option for its geometry type. The main option tries to keep geometries in their original table until they get too large, then compresses them in place, then moves them to TOAST.

There’s another option “external” that keeps geometries in place, and if they get too big moves them to TOAST uncompressed. PostgreSQL allows you to change the storage on columns at run-time, so no hacking or code is required to try this out.

-- Change the storage type
ALTER TABLE countries
  ALTER COLUMN geom
  SET STORAGE EXTERNAL;

-- Force the column to rewrite
UPDATE countries
  SET geom = ST_SetSRID(geom, 4326);

-- Re-run the query  
SELECT count(*), c.name 
  FROM countries c 
  JOIN places p 
  ON ST_Intersects(c.geom, p.geom) 
  GROUP BY c.name;

The spatial join now runs in under 4 seconds.

What’s the penalty?

  • With “main” storage, the table+toast+index is 6MB.
  • With “external” storage, the table+toast+index is 9MB.

Conclusion

For a 50% storage penalty, on a table that has far more large objects than most spatial tables, we achieved a 500% performance improvement. Maybe we shouldn’t apply compression to our large geometries at all?

Using “main” storage was mainly a judgement call back when we decided on it; it wasn’t benchmarked or anything – it’s possible that we were just wrong. Also, only large objects are compressed; since most tables are full of lots of small objects (short lines, points), changing to “external” by default wouldn’t have any effect on storage size at all.

September 28, 2018 08:00 AM

September 23, 2018

PostGIS Development

PostGIS 2.5.0

The PostGIS development team is pleased to release PostGIS 2.5.0.

Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta4+ and GEOS 3.7.0 which were released recently.

Best served with PostgreSQL 11 beta4 and pgRouting 2.6.1.

WARNING: If compiling with PostgreSQL+JIT, LLVM >= 6 is required.
Supported PostgreSQL versions for this release are PostgreSQL 9.4 - PostgreSQL 12 (in development).
GEOS >= 3.5 is required.

2.5.0

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 23, 2018 12:00 AM

September 16, 2018

PostGIS Development

PostGIS 2.5.0rc2

The PostGIS development team is pleased to release PostGIS 2.5.0rc2.

Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta3+ and GEOS 3.7.0 which were released recently.

Best served with PostgreSQL 11beta3.

WARNING: If compiling with PostgreSQL 11+JIT, LLVM >= 6 is required

2.5.0rc2

Changes since PostGIS 2.5.0rc1 release are as follows:

  • 4162, ST_DWithin documentation examples for storing geometry and radius in table (Darafei Praliaskouski, github user Boscop).
  • 4163, MVT: Fix resource leak when the first geometry is NULL (Raúl Marín)
  • 4172, Fix memory leak in lwgeom_offsetcurve (Raúl Marín)
  • 4164, Parse error on incorrectly nested GeoJSON input (Paul Ramsey)
  • 4176, ST_Intersects supports GEOMETRYCOLLECTION (Darafei Praliaskouski)
  • 4177, Postgres 12 disallows variable length arrays in C (Laurenz Albe)
  • 4160, Use qualified names in topology extension install (Raúl Marín)
  • 4180, installed liblwgeom includes sometimes getting used instead of source ones (Regina Obe)

View all closed tickets for 2.5.0.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

by Regina Obe at September 16, 2018 12:00 AM

September 13, 2018

Simon Greener

PGAdmin Finally Has A Spatial Viewer

A great day has arrived for PostgreSQL developers (not just spatial geeks): an integrated spatial viewer has finally been added to pgAdmin4 3.3, as announced by Regina Obe

You can see the announcement and some examples here

by Simon Greener at September 13, 2018 11:10 PM

September 12, 2018

PostGIS Development

PostGIS Patch Release 2.4.5

The PostGIS development team is pleased to provide bug fix 2.4.5 for the 2.4 stable branch.

View all closed tickets for 2.4.5.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

2.4.5

by Paul Ramsey at September 12, 2018 12:00 AM

September 09, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PGOpen 2018 Data Loading Presentation Slides

At PGOpen 2018 in San Francisco, we gave a talk on 10 ways to load data into Postgres. This is one of the rare talks where we didn't talk much about PostGIS. However we did showcase tools ogr_fdw, ogr2ogr, shp2pgsql, which are commonly used for loading spatial data, but equally as good for loading non-spatial data. Below are the slide links.

Continue reading "PGOpen 2018 Data Loading Presentation Slides"

by Leo Hsu and Regina Obe (nospam@example.com) at September 09, 2018 04:16 AM

September 08, 2018

Boston GIS (Regina Obe, Leo Hsu)

pgAdmin4 now offers PostGIS geometry viewer

pgAdmin4 version 3.3 released this week comes with a PostGIS geometry viewer. You will be able to see the graphical output of your query directly in pgAdmin, provided you output a geometry or geography column. If your column is of SRID 4326 (WGS 84 lon/lat), pgAdmin will automatically display against an OpenStreetMap background.

We have Xuri Gong to thank for working on this as a PostGIS/pgAdmin Google Summer of Code (GSOC) project. We'd like to thank Victoria Rautenbach and Frikan Erwee for mentoring.

Continue reading "pgAdmin4 now offers PostGIS geometry viewer"

by Regina Obe (nospam@example.com) at September 08, 2018 09:13 PM

August 19, 2018

PostGIS Development

PostGIS 2.5.0rc1

The PostGIS development team is pleased to release PostGIS 2.5.0rc1.

This release is a work in progress. Remaining time will be focused on bug fixes and documentation until PostGIS 2.5.0 release. Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta3+ and GEOS 3.7.0rc1 which were released recently.

Best served with PostgreSQL 11beta3 which was recently released.

Changes since PostGIS 2.5.0beta2 release are as follows:

  • 4146, Fix compilation error against Postgres 12 (Raúl Marín).
  • 4147, 4148, Honor SOURCE_DATE_EPOCH when present (Christoph Berg).

View all closed tickets for 2.5.0.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

2.5.0rc1

by Regina Obe at August 19, 2018 12:00 AM

July 19, 2018

Anita Graser (Underdark)

Movement data in GIS #15: writing a PL/pgSQL stop detection function for PostGIS trajectories

Do you sometimes start writing an SQL query and around line 50 you get the feeling that it might be getting out of hand? If so, it might be useful to start breaking it down into smaller chunks and wrap those up into custom functions. Never done that? Don’t despair! There’s an excellent PL/pgSQL tutorial on postgresqltutorial.com to get you started.

To get an idea of the basic structure of a PL/pgSQL function and to prove that PostGIS datatypes work just fine in this context, here’s a basic function that takes a trajectory geometry and outputs its duration, i.e. the difference between its last and first timestamp:

CREATE OR REPLACE FUNCTION AG_Duration(traj geometry) 
RETURNS numeric LANGUAGE 'plpgsql'
AS $BODY$ 
BEGIN
RETURN ST_M(ST_EndPoint(traj))-ST_M(ST_StartPoint(traj));
END; $BODY$;

My end goal for this exercise was to implement a function that takes a trajectory and outputs the stops along this trajectory. Commonly, a stop is defined as a long stay within an area with a small radius. This leads us to the following definition:

CREATE OR REPLACE FUNCTION AG_DetectStops(
   traj geometry, 
   max_size numeric, 
   min_duration numeric)
RETURNS TABLE(sequence integer, geom geometry) 
-- implementation follows here!

Note how this function uses RETURNS TABLE to enable it to return all the stops that it finds. To add a line to the output table, we need to assign values to the sequence and geom variables and then use RETURN NEXT.
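Here is a minimal, self-contained sketch of that RETURNS TABLE / RETURN NEXT pattern (not part of the original post; the function name is invented):

CREATE OR REPLACE FUNCTION demo_return_next()
RETURNS TABLE(sequence integer, geom geometry) LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
    sequence := 1;
    geom := ST_MakePoint(0, 0);
    RETURN NEXT;  -- emits one row (sequence, geom) into the result set
    sequence := 2;
    geom := ST_MakePoint(1, 1);
    RETURN NEXT;  -- emits a second row
END; $BODY$;

-- SELECT * FROM demo_return_next();  -- returns two rows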

Another reason to use PL/pgSQL is that it enables us to write loops. And loops I wanted for my stop detection function! Specifically, I wanted to go through all the points in the trajectory:

FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP
-- here comes the magic!
END LOOP;

Eventually the function should go through the trajectory and identify all segments that stay within an area with max_size diameter for at least min_duration time. To test for the area size, we can use:

IF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true;

Putting everything together, my current implementation looks like this:

CREATE OR REPLACE FUNCTION AG_DetectStops(traj geometry, max_size numeric, min_duration numeric)
    RETURNS TABLE(sequence integer, geom geometry) LANGUAGE 'plpgsql'
AS $BODY$
DECLARE 
    pt geometry;
    segment geometry;
    is_stop boolean;
    previously_stopped boolean;
    stop_sequence integer;
    p1 geometry;
BEGIN
segment := NULL;
sequence := 0;
is_stop := false;
previously_stopped := false;
p1 := NULL;
FOR pt IN SELECT (ST_DumpPoints(traj)).geom 
LOOP
   IF segment IS NULL AND p1 IS NULL THEN 
      p1 := pt; 
   ELSIF segment IS NULL THEN 
      segment := ST_MakeLine(p1,pt); 
      p1 := NULL;
      IF ST_Length((segment)) <= max_size THEN is_stop := true; END IF;
   ELSE 
      segment := ST_AddPoint(segment,pt); 
      -- if we're in a stop, we want to grow the segment; otherwise we trim points from the start so the segment just covers min_duration
      IF NOT is_stop THEN
         WHILE ST_NPoints(segment) > 2 AND AG_Duration(ST_RemovePoint(segment,0)) >= min_duration LOOP
            segment := ST_RemovePoint(segment,0); 
         END LOOP;
      END IF;
      -- a stop is identified if the segment stays within a circle of diameter = max_size
      IF ST_Length((segment)) <= max_size THEN is_stop := true;  -- not necessary but faster
      ELSIF ST_Distance((ST_StartPoint(segment)),(pt)) > max_size THEN is_stop := false;
      ELSIF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true;											
      ELSE is_stop := false; 
      END IF;
      -- if we found the end of a stop, we need to check if it lasted long enough
      IF NOT is_stop AND previously_stopped THEN 
         IF ST_M(ST_PointN(segment,ST_NPoints(segment)-1))-ST_M(ST_StartPoint(segment)) >= min_duration THEN
            geom := ST_RemovePoint(segment,ST_NPoints(segment)-1); 
            RETURN NEXT;
            sequence := sequence + 1;
            segment := NULL;
            p1 := pt;
	     END IF;
      END IF;
   END IF;
   previously_stopped := is_stop;
END LOOP;
IF previously_stopped AND AG_Duration(segment) >= min_duration THEN 
   geom := segment; 
   RETURN NEXT; 
END IF;
END; 
$BODY$;	

While this function is not really short, it’s so much more readable than my previous attempts at doing this in pure SQL. Some of the lines for determining is_stop are not strictly necessary, but they do speed up processing.

Performance still isn’t quite where I’d like it to be. I suspect that all the adding and removing of points from linestring geometries is not ideal. In general, it’s quicker to find shorter stops in smaller areas than longer stops in bigger areas.

Let’s test! 

Looking for a testing framework for PL/pgSQL, I found plpgunit on GitHub. While I did not end up using it, I did use its examples for inspiration to write a couple of tests, e.g.

CREATE OR REPLACE FUNCTION test.stop_at_beginning() RETURNS void LANGUAGE 'plpgsql'
AS $BODY$
DECLARE t0 integer; n0 integer;
BEGIN
WITH temp AS ( SELECT AG_DetectStops(
   ST_GeometryFromText('LinestringM(0 0 0, 0 0 1, 0.1 0.1 2, 2 2 3)'),
   1,1) stop 
)
SELECT ST_M(ST_StartPoint((stop).geom)), 
       ST_NPoints((stop).geom) FROM temp INTO t0, n0;	
IF t0 = 0 AND n0 = 3
   THEN RAISE INFO 'PASSED - Stop at the beginning of the trajectory';
   ELSE RAISE INFO 'FAILED - Stop at the beginning of the trajectory';
END IF;
END; $BODY$;

Basically, each test is yet another PL/pgSQL function that doesn’t return anything (i.e. returns void) but outputs messages about the status of the test. To run the tests, I made heavy use of the PERFORM statement, which executes the provided function but discards the result.
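
A minimal sketch of such a runner, assuming only the test.stop_at_beginning() function from above (any further test functions would be called the same way):

DO $$
BEGIN
   PERFORM test.stop_at_beginning();
   -- one PERFORM per additional test function
END $$;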


Update: The source code for this function is now available on https://github.com/anitagraser/postgis-spatiotemporal

by underdark at July 19, 2018 09:31 PM

July 08, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

Using procedures for batch geocoding and other batch processing

One of the features we are looking forward to in the upcoming PostgreSQL 11 is the introduction of procedures via the CREATE PROCEDURE ANSI-SQL construct. The major benefit that sets procedures apart from functions is that procedures are not wrapped in an outer transaction and can have COMMITs within them. This means it's not all-or-nothing like it is with functions. Even if you stop a procedure mid-run, whatever work has been done and committed is saved. In the case of functions, a stop or failure would roll back all the work. It also means you can see the work in progress of a stored procedure, since that work will already have been committed. This is a huge benefit for batch processing, which covers a lot of PostGIS use cases, since a good chunk of PostGIS work involves batch processing of data you get from third parties or machines.
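
As a rough sketch of the idea (the addresses table, geocoded flag, and geocode_one() helper below are hypothetical stand-ins, not from the article), a procedure can commit after each batch it finishes:

CREATE OR REPLACE PROCEDURE batch_geocode()
LANGUAGE plpgsql
AS $$
BEGIN
   LOOP
      -- geocode the next 100 unprocessed rows
      UPDATE addresses
         SET geom = geocode_one(full_address),
             geocoded = true
       WHERE id IN (SELECT id FROM addresses WHERE NOT geocoded LIMIT 100);
      EXIT WHEN NOT FOUND;
      COMMIT;  -- work done so far stays saved even if the procedure is stopped here
   END LOOP;
END;
$$;

-- CALL batch_geocode();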

Continue reading "Using procedures for batch geocoding and other batch processing"

by Leo Hsu and Regina Obe (nospam@example.com) at July 08, 2018 05:40 AM

June 25, 2018

Boston GIS (Regina Obe, Leo Hsu)

New in QGIS 3.2 Save Project to PostgreSQL

We've got customers discovering PostGIS and GIS in general, or migrating away from the ArcGIS family of tools. When they ask, "How do I see my data?", we often point them at QGIS, which is an open source GIS desktop application with rich PostGIS/PostgreSQL integration.

QGIS is great for people who need to live in their GIS environment, since it makes it easy to layer on other data sources, web services, and maps. The DB Manager tool allows for more advanced querying (like writing spatial SQL queries that take advantage of the hundreds of functions PostGIS has to offer), the ability to import/export data, and the creation of PostgreSQL views.

QGIS has this thing called Projects, which allow for defining map layers and the symbology associated with them: for example, what colors you use for your roads, any extra symbols, and which field attributes you overlay (street name, etc.). Projects are usually saved in files with a .qgs or .qgz extension. If you've spent a lot of time styling these layers, chances are you want to share them with other people in your group. This can become challenging if your group is not connected via a network share.

Continue reading "New in QGIS 3.2 Save Project to PostgreSQL"

by Regina Obe (nospam@example.com) at June 25, 2018 05:47 AM