### ### Planet PostGIS

Welcome to Planet PostGIS

January 24, 2022


Elevation Profiles and Flightlines with PostGIS

A community member on the postgis-users mailing list had a question recently:

by Paul Ramsey at January 24, 2022 05:30 PM

January 18, 2022

Martin Davis (Lin.ear th.inking)

Concave Hulls in JTS

A common spatial need is to find a polygon that accurately represents a set of points.  The convex hull of the points often does not provide this, since it can enclose large areas which contain no points.  What is required is a non-convex hull, often termed the concave hull.  

The Convex Hull and a Concave Hull of a point set

A concave hull is generally considered to have some or all of the following properties:

  • The hull is a simply connected polygon
  • It contains all the input points
  • The vertices in the hull polygon boundary are all input points
  • The hull may or may not contain holes
For a typical point set there are many polygons which meet these criteria, with varying degrees of concaveness.  Concave Hull algorithms provide a numeric parameter which controls the amount of concaveness in the result.  The nature of this parameter is particularly important, since it affects the ease-of-use in practical scenarios.  Ideally it has the following characteristics:

  • Simple geometric basis: this allows the user to understand the effect of the parameter and aids in determining an effective value
  • Scale-free (dimensionless): this allows a single parameter value to be effective on varying sizes of geometry, which is essential for batch or automated processing
  • Local (as opposed to global):  A local property (such as edge length) gives the algorithm latitude to determine the concave shape of the points. A global property (such as area) over-constrains the possible result. 
  • Monotonic area:  larger (or smaller) values produce a sequence of more concave areas
  • Monotonic containment :the sequence of hulls produced are topologically nested
  • Convex-bounded: an extremal value produces the convex hull

This is a well-studied problem, and many different approaches have been proposed.  Some notable ones are:

Of these, Delaunay Erosion (Chi-shapes) offers the best set of features.  It is straightforward to code and is performant.  It uses the control parameter of Edge Length Ratio, a fraction of the difference between the longest and shortest edges in the underlying Delaunay triangulation.  This is easy to reason about, since it is scale-free and corresponds to a simple property of the point set (that of distance between vertices).  It can be extended to support holes.  And it has a track record of use, notably in Oracle Spatial.  

ConcaveHull generated by Delaunay Erosion with Edge Length Ratio = 0.3

Recently the Park-Oh algorithm has become popular, thanks to a high-quality implementation in Concaveman project (which has spawned numerous ports).  However, it has some drawbacks.  It can't support holes (and likely not disconnected regions and discarding outlier points).  As the paper points out and experiment confirms, it produces rougher outlines than the Delaunay-based algorithm.  Finally, the control parameter for Delaunay Erosion has a simpler geometrical basis which makes it easier to use.

Given these considerations, the new JTS ConcaveHull algorithm utilizes Delaunay Erosion. The algorithm ensures that the computed hull is simply connected, and contains all the input points.  The Edge Length Ratio is used as the control parameter. A value of 1 produces the convex hull; 0 produces a concave hull of minimal size.  Alternatively the maximum edge length can be specified directly. This allows alternative strategies to determine an appropriate length value; for instance, another possibility is to use a fraction of the longest edge in the Minimum Spanning Tree of the input points.  

The recently-added Tri data structure provides a convenient basis for the implementation,.  It operates as follows:

  1. The Delaunay Triangulation of the input points is computed and represented as a set of of Tris
  2. The Tris on the border of the triangulation are inserted in a priority queue, sorted by longest boundary edge
  3. While the queue is non-empty, the head Tri is popped from the queue.  It is removed from the triangulation if it does not disconnect the area.  Insert new border Tris into the queue if they have a boundary edge length than the target length
  4. The Tris left in the triangulation form the area of the Concave Hull 

Thanks to the efficiency of the JTS Delaunay Triangulation the implementation is quite performant, approaching the performance of a Java port of Concaveman.  

Concave Hull of Ukraine dataset; Edge Length Ratio = 0.1

Optionally holes can be allowed to be present in the hull polygon (while maintaining a simply connected result).  A classic demonstration of this is to generate hulls for text font glyphs:

This algorithm is in the process of being ported to GEOS.  The intention is to use it to enhance the PostGIS ST_ConcaveHull function, which has known issues and has proven difficult to use.

Further Ideas

  • Disconnected Result - It is straightforward to extend the algorithm to allow a disconnected result (i.e. a MultiPolygon).  This could be provided as an option.
  • Outlier Points - It is also straightforward to support discarding "outlier" points.
  • Polygon Concave Hull - computing a concave "outer hull" for a polygon can be used to simplify the polygon while guaranteeing the hull contains the original polygon.  Additionally, an "inner hull" can be computed which is fully contained in the original.  The implementation of a Polygon Concave Hull algorithm is well under way and will be released in JTS soon. 

by Dr JTS (noreply@blogger.com) at January 18, 2022 02:53 PM

January 14, 2022

PostGIS Development

Upgrading raster from 2.* to 3.*

As of PostGIS 3.0, the PostGIS raster support is no longer part of the postgis extension, but instead spun off into a new PostGIS extension called postgis_raster.

The two main reasons for this break were:

  • Raster functionality in PostGIS is fat with over 150 functions and several types. Wading through these extra functions frustrated many who had no use for rasters.

  • Raster gdal dependency is very big and many dreamed of having postgis extension without the big raster dependency.

While repackaging raster as it’s own extension resolved many complaints, it meant a slightly more complicated upgrade process going from PostGIS 2.something to 3.something that even experienced PostGIS users managed to screw up.

I will detail the proper way of upgrading PostGIS raster, from a PostGIS 2.* install to 3.* install.

You can run these steps using psql or pgAdmin or any other PostgreSQL tool you want.

Regardless which version of PostGIS you are coming from, you should install the PostGIS 3.* binaries first.

Continue Reading by clicking title hyperlink ..

by Regina Obe at January 14, 2022 12:00 AM

January 13, 2022


PostGIS 3.2 New and Improved

Last month, just under the wire for a 2021 release, the 3.2 version of PostGIS hit the streets! This new PostGIS also supports the latest 3.10 release of GEOS, which underpins a few of the new features.

by Paul Ramsey at January 13, 2022 06:23 PM

December 18, 2021

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Day 2021 Highlights

This year's PostGIS Day was on November 18, 2021. I celebrated like many others in a PostGIS day virtual conference. You can find the talks and videos at PostGIS Day 2021. Others celebrated with parties, food, and spirits. This was my favorite PostGIS Day conference ever. Each year just gets better.

If I were to sum up this year's conference I would say: A generous helping of code, lots of humor, and lots of people. Thanks Elizabeth Christensen, Paul Ramsey and Crunchy Data for putting this conference together. All talks were really good so hard to isolate just a couple.

Continue reading "PostGIS Day 2021 Highlights"

by Regina Obe (nospam@example.com) at December 18, 2021 10:04 PM

PostGIS Development

PostGIS 3.2.0 Released

The PostGIS Team is pleased to release PostGIS 3.2.0, the Olivier Courtin release.

This release would not be possible without the various developers listed in the credits as well as the companies that provided funding and developer time.

Companies that contributed significantly to this release are:

  • CrunchyData Geometry processing functions (and improvements to GEOS, new raster functions, and address_standardizer pcre2 support
  • Kontur Speed improvements to raster, MVT, GiST indexing, better usability of geometry processing, and new radius clustering mode.
  • Netlab Continuous improvement of PostGIS topology, PostGIS upgrade plumbing, PostGIS test plumbing, and CI bot management
  • NIBIO Developer funding for topology improvements
  • Paragon Corporation — General CI bot management, release management, alignment of PostGIS with PostgreSQL versions.
Continue Reading by clicking title hyperlink ..

by Regina Obe at December 18, 2021 12:00 AM

December 17, 2021

PostGIS Development

Olivier Courtin Farewell

In March 2020 we lost a long time PostGIS developer and friend, Olivier Courtin. The PostGIS 3.2.0 release is named in his honor.
Olivier at Code Sprint Continue Reading by clicking title hyperlink ..

by Regina Obe at December 17, 2021 12:00 AM

December 16, 2021

Paul Ramsey

PostGIS Nearest Neighbor Syntax

It turns out that it is possible to get an indexed n-nearest-neighbor (KNN) search out of PostGIS along with a distance using only one distance calculation and one target literal.

SELECT id, $point <-> geom AS distance
FROM geoms
ORDER BY distance

See that?!? Using the column-name syntax for ORDER BY, the <-> operator pulls double duty, both returning the distance to the target list and also forcing an index-assisted KNN ordering.

I never considered this possibility until seeing it in this tweet. Before I would have been doing this:

SELECT id, ST_Distance($point, geom) AS distance
FROM geoms
ORDER BY $point <-> geom

Two distance calculations (one in the function, one in the operator) and two references to the literal. Yuck!

December 16, 2021 08:00 AM

December 10, 2021

PostGIS Development

PostGIS 3.2.0rc1 Released

The PostGIS Team is pleased to release the first rc of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS can utilize the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.1 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis, postgis_raster and postgis_topology extensions and a new input/export format FlatGeobuf.

Continue Reading by clicking title hyperlink ..

by Regina Obe at December 10, 2021 12:00 AM

December 04, 2021

PostGIS Development

PostGIS 3.2.0beta3 Released

The PostGIS Team is pleased to release the third beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS can utilize the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.1 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis, postgis_raster and postgis_topology extensions and a new input/export format FlatGeobuf.

Continue Reading by clicking title hyperlink ..

by Regina Obe at December 04, 2021 12:00 AM

December 03, 2021


Tricks for Faster Spatial Indexes

One of the curious aspects of spatial indexes is that the nodes of the tree can overlap, because the objects being indexed themselves also overlap.

by Paul Ramsey at December 03, 2021 05:07 PM

December 01, 2021


PostGIS Day 2021

Crunchy Data hosted the third annual PostGIS Day on November 18th.This was our second year with a virtual format and another year of record attendance! We had attendees from more than 99 countries.

by Elizabeth Christensen (Elizabeth.Christensen@crunchydata.com) at December 01, 2021 08:42 PM

November 26, 2021

PostGIS Development

PostGIS 3.2.0beta2 Released

The PostGIS Team is pleased to release the second beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS utilizes the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.1 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis_raster and postgis_topology extensions.

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 26, 2021 12:00 AM

October 23, 2021

PostGIS Development

PostGIS 3.2.0beta1 Released

The PostGIS Team is pleased to release the first beta of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14. This version of PostGIS utilizes the faster GiST building support API introduced in PostgreSQL 14. If compiled with recently released GEOS 3.10.0 you can take advantage of improvements in ST_MakeValid and numerous speed improvements. This release also includes many additional functions and improvements for postgis_raster and postgis_topology extensions.

Continue Reading by clicking title hyperlink ..

by Regina Obe at October 23, 2021 12:00 AM

October 21, 2021

Boston GIS (Regina Obe, Leo Hsu)

FOSS4G 2021 Buenos Aires Videos are out

The Free and Open Source for GIS conference : FOSS4G 2021 Buenos Aires Online took place September 27th to October 2nd, 2021 online. Many of the videos from the conference (including mine on PostGIS) are at FOSS4G 2021 Playlist viewable online.

Continue reading "FOSS4G 2021 Buenos Aires Videos are out"

by Regina Obe (nospam@example.com) at October 21, 2021 05:37 PM

September 11, 2021

PostGIS Development

PostGIS 3.2.0alpha1 Released

The PostGIS Team is pleased to release the first alpha of the upcoming PostGIS 3.2.0 release.

Best served with PostgreSQL 14 beta3. This version of PostGIS utilizes the faster GiST building support API introduced in PostgreSQL 14. If compiled with the in-development GEOS 3.10dev you can take advantage of improvements in ST_MakeValid. This release also includes many additional functions and improvements for postgis_raster and postgis_topology extensions.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 11, 2021 12:00 AM

September 04, 2021

PostGIS Development

PostGIS 3.1.4, 3.0.4 Released

The PostGIS development team is pleased to provide bug fix and performance enhancements 3.1.4 and 3.0.4 for the 3.1, 3.0 stable branches.

3.1.4 This release supports PostgreSQL 9.6-14.

3.0.4 This release works with PostgreSQL 9.5-13 and GEOS >= 3.6 Designed to take advantage of features in PostgreSQL 12+ and Proj 6+

View all closed tickets for 3.1.4, 3.0.4.

After installing the binaries or after running pg_upgrade:

For PostGIS 3.1, 3.0, 2.5 do below which will upgrade all your postgis extensions.

SELECT postgis_extensions_upgrade();

For PostGIS 2.4 and below do:


— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

by Regina Obe at September 04, 2021 12:00 AM

September 03, 2021


Querying Spatial Data with PostGIS and ogr_fdw

In my last post, I did a simple intro to foreign data wrappers in PostgreSQL. postgres_fdw is an extension available in Postgres core that allows you to issue queries against another Postgres database. It's just one of many foreign data wrappers that you can use in Postgres, so for today's post we'll look at another that works especially well with spatial data formats: ogr_fdw.

I had also previously talked about some different ways to get spatial data into a Postgres/PostGIS database, but the cool thing about foreign data wrappers is that they're an alternative to needing to have everything in the same data store. With spatial data being stored and shared in so many different formats, imagine being able to abstract that conversion away and just focus on analysis. Read on for a couple of quick demos.

by Kat Batuigas (kat.batuigas@crunchydata.com) at September 03, 2021 04:40 PM

September 01, 2021

Paul Ramsey

GeoMob Podcast - PostGIS

I neglected to post about this at the time, which I guess is a testament to the power of twitter to suck up energy that might otherwise be used in blogging, but for posterity I am going to call out here:

Have a listen.

September 01, 2021 08:00 AM

August 20, 2021


Waiting for PostGIS 3.2: Secure Cloud Raster Access

Raster data access from the spatial database is an important feature, and the coming release of PostGIS will make remote access more practical by allowing access to private cloud storage.

Previous versions could access rasters in public buckets, which is fine for writing blog posts, but in the real world people frequently store their data in private buckets, so we clearly needed the ability to add security tokens to our raster access.

by Paul Ramsey at August 20, 2021 02:11 PM

July 23, 2021


Waiting for PostGIS 3.2: ST_Contour and ST_SetZ

One theme of the 3.2 release is new analytical functionality in the raster module, and access to cloud-based rasters via the "out-db" option for rasters. Let's explore two new functions and exercise cloud raster support at the same time.

by Paul Ramsey at July 23, 2021 06:33 PM

July 21, 2021


Using PostGIS and pg_featureserv with QGIS

My colleague Kat Batuigas recently wrote about using the powerful open-source QGIS desktop GIS to import data into PostGIS from an ArcGIS Feature Service. This is a great first step toward moving your geospatial stack onto the performant, open source platform provided by PostGIS. And there's no need to stop there! Crunchy Data has developed a suite of spatial web services that work natively with PostGIS to expose your data to the web, using industry-standard protocols. These include:

by Martin Davis at July 21, 2021 04:31 PM

July 19, 2021


Waiting for PostGIS 3.2: ST_MakeValid

One of the less visible improvements coming in PostGIS 3.2 (via the GEOS 3.10 release) is a new algorithm for repairing invalid polygons and multipolygons.

Algorithms like polygon intersection, union and difference rely on guarantees that the structure of inputs follows certain rules. We call geometries that follow those rules "valid" and those that do not "invalid".

by Paul Ramsey at July 19, 2021 03:01 PM

July 02, 2021

PostGIS Development

PostGIS 3.1.3

The PostGIS Team is pleased to release PostGIS 3.1.3! Best Served with PostgreSQL 14 beta2.

This release is a bug fix release, addressing issues found in the previous 3.1 release.

  • #4929, Fix missing error from GetRingEdges when invoked with unexistent topology or edge (Sandro Santilli)

  • #4927, Fix PostgreSQL 14 compile FuncnameGetCandidates changes needed to compile against PostgreSQL 14 beta2 or higher (Regina Obe, Julien Rouhaud)

Continue Reading by clicking title hyperlink ..

by Regina Obe at July 02, 2021 12:00 AM

May 31, 2021

Paul Ramsey

PostGIS at 20, The Beginning

Twenty years ago today, the first email on the postgis users mailing list (at that time hosted on yahoogroups.com) was sent, announcing the first numbered release of PostGIS.


The early history of PostGIS was tightly bound to a consulting company I had started a few years prior, Refractions Research. My first contracts ended up being with British Columbia (BC) provincial government managers who, for their own idiosyncratic reasons, did not want to work with ESRI software, and as a result our company accrued skills and experience beyond what most “GIS companies” in the business had.

We got good at databases, and the FME. We got good at Perl, and eventually Java. We were the local experts in a locally developed (and now defunct) data analysis tool called Facet, which was the meat of our business for the first four years or so.


That Facet tool was a key part of a “watershed analysis atlas” the BC government commissioned from Facet in the late 1990’s. We worked as sub-contractors, building the analytical routines that would suck in dozens of environmental layers, chop them up by watershed, and spit out neat tables and maps, one for each watershed. Given the computational power of the era, we had to use multiple Sun workstations to run the final analysis province-wide, and to manage the job queue, and keep track of intermediate results, we placed them all into tables in PostgreSQL.

Putting the chopped up pieces of spatial data as blobs into PostgreSQL was what inspired PostGIS. It seemed really obvious that we had the makings of an interactive real-time analysis engine, with all this processed data in the database, if we could just do more with the blobs than only stuff them in and pull them out.

Maybe We Should do Spatial Databases?

Reading about spatial databases circa 2000 you would find that:

This led to two initiatives on our part, one of which succeeded and the other of which did not.

First, I started exploring whether there was an opportunity in the BC government for a consulting company that had skill with Oracle’s spatial features. BC was actually standardized on Oracle as the official database for all things governmental. But despite working with the local sales rep and looking for places where spatial might be of interest, we came up dry.


The existing big Oracle ministries (Finance, Justice) didn’t do spatial, and the heavily spatial natural resource ministries (Forests, Environment) were still deeply embedded in a “GIS is special” head space, and didn’t see any use for a “spatial database”. This was all probably a good thing, as it turned out.

Our second spatial database initiative was to explore whether any of the spatial models described in the OpenGIS Simple Features for SQL specification were actually practical. In addition to describing the spatial types and functions, the specification described three ways to store the spatial part of a table.


  • In a set of side tables (scheme 1a), where each feature was broken down into x’s and y’s stored in rows and columns in a table of numbers.
  • In a “binary large object” (BLOB) (scheme 1b).
  • In a “geometry type” (scheme 2).

Since the watershed work had given us experience with PostgreSQL, we carried out the testing with that database, examining: could we store spatial data in the database and pull it out efficiently enough to make a database-backed spatial viewer.


For the viewer part of the equation, we ran all the experiments using a Java applet called JShape. I was quite fond of JShape and had built a few little map viewer web pages for clients using it, so hooking it up to a dynamic data source rather than files was a rather exciting prospect.

All the development was done on the trusty Sun Ultra 10 I had taken out a $10,000 loan to purchase when starting up the company. (At the time, we were still making a big chunk of our revenue from programming against the Facet software, which only ran on Sun hardware.)


  • The first experiment, shredding the data into side tables, and then re-constituting it for display was very disappointing. It was just too slow to be usable.
  • The second experiment, using the PostgreSQL BLOB interface to store the objects, was much faster, but still a little disappointing. And there was no obvious way to add an index to the data.


At this point we almost stopped: we’d tried all the stuff explained in the user-level documentation for PostgreSQL. But our most sophisticated developer, Dave Blasby, who had actually studied computer science (most of us had mathematics and physics degrees), and was unafraid of low-level languages, looked through the PostgreSQL code and contrib section and said he could probably do a custom type, given some time.

So he took several days and gave it a try. He succeeded!

When Dave had a working prototype, we hooked it up to our little applet and the thing sang. It was wonderfully quick, even when we loaded up quite large tables, zooming around the spatial data and drawing our maps. This is something we’d only seen on fancy XWindows displays on UNIX workstations and now were were doing it in an applet on ordinary PC. It was quite amazing.

We had gotten a lot of very good use out of the PostgreSQL database, but there was no commercial ecosystem for PostgreSQL extensions, so it seemed like the best business use of PostGIS was to put it “out there” as open source and see if it generated some in-bound customer traffic.

At the time, Refractions had perhaps 6 staff (it’s hard to remember precisely) and many of them contributed, both to the initial release and over time.

  • Dave Blasby continued polishing the code, adding some extra functions that seemed to make sense.
  • Jeff Lounsbury, the only other staffer who could write C, took up the task of a utility to convert Shape files into SQL, to make loading spatial data easier.
  • I took on the work of setting up a Makefile for the code, moving it into a CVS repository, writing the documentation, and getting things ready for open sourcing.
  • Graeme Leeming and Phil Kayal, my business partners, put up with this apparently non-commercial distraction. Chris Hodgson, an extremely clever developer, must have been busy elsewhere or perhaps had not joined us just yet, but he shows up in later commit logs.


Finally, on May 31, Dave sent out the initial release announcement. It was PostGIS 0.1, and you can still download it, if you like. This first release had a “geometry” type, a spatial index using the PostgreSQL GIST API, and these functions:

  • npoints(GEOMETRY)
  • nrings(GEOMETRY)
  • mem_size(GEOMETRY)
  • numb_sub_objs(GEOMETRY)
  • summary(GEOMETRY)
  • length3d(GEOMETRY)
  • length2d(GEOMETRY)
  • area2d(GEOMETRY)
  • perimeter3d(GEOMETRY)
  • perimeter2d(GEOMETRY)
  • truly_inside(GEOMETRY, GEOMETRY)

The only analytical function, “truly_inside()” just tested if a point was inside a polygon. (For a history of how PostGIS got many of the other analytical functions it now has, see History of JTS and GEOS on Martin Davis’ blog.)

Reading through those early mailing list posts from 2001, it’s amazing how fast PostGIS integrated into the wider open source geospatial ecosystem. There are posts from Frank Warmerdam of GDAL and Daniel Morissette of MapServer within the first month of release. Developers from the Java GeoTools/GeoServer ecosystem show up early on as well.

There was a huge demand for an open source spatial database, and we just happened to show up at the right time.

Where are they Now?

  • Graeme, Phil, Jeff and Chris are still doing geospatial consulting at Refractions Research.
  • Dave maintained and improved PostGIS for the first couple years. He left Refractions for other work, but still works in open source geospatial from time to time, mostly in the world of GeoServer and other Java projects.
  • I found participating in the growth of PostGIS very exciting, and much of my consulting work… less exciting. In 2008, I left Refractions and learned enough C to join the PostGIS development community as a contributor, which I’ve been doing ever since, currently as a Executive Geospatial Engineer at Crunchy Data.

May 31, 2021 08:00 AM

May 21, 2021

PostGIS Development

PostGIS 3.1.2

The PostGIS Team is pleased to release the release of PostGIS 3.1.2!

This release is a bug fix release, addressing issues found in the previous 3.1 release.

  • #4871, TopoGeometry::geometry cast returns NULL for empty TopoGeometry objects (Sandro Santilli)
  • #4826, postgistigergeocoder Better answers when no zip is provided (Regina Obe)
  • #4817, handle more complex compound coordinate dystems (Paul Ramsey)
  • #4842, Only do axis flips on CRS that have a “Lat” as the first column (Paul Ramsey)
  • Support recent Proj versions that have removed pjgetrelease (Paul Ramsey)
  • #4835, Adjust tolerance for geodetic calculations (Paul Ramsey)
  • #4840, Improper conversion of negative geographic azimuth to positive (Paul Ramsey)
  • #4853, DBSCAN cluster not formed when recordset length equal to minPoints (Dan Baston)
  • #4863, Update bboxes after scale/affine coordinate changes (Paul Ramsey)
  • #4876, Fix raster issues related to PostgreSQL 14 tablefunc changes (Paul Ramsey, Regina Obe)
  • #4877, mingw64 PostGIS / PostgreSQL 14 compile (Regina Obe, Tom Lane)
  • #4838, Update to support Tiger 2020 (Regina Obe)
  • #4890, Change Proj cache lifetime to last as long as connection (Paul Ramsey)
  • #4845, Add Pg14 build support (Paul Ramsey)
Continue Reading by clicking title hyperlink ..

by Paul Ramsey at May 21, 2021 12:00 AM

May 18, 2021


Waiting for PostGIS 3.2: ST_InterpolateRaster

A common situation in the spatial data world is having discrete measurements of a continuous variable. Every place in the world has a temperature, but there are only a finite number of thermometers: how should we reason about places without thermometers and how should we model temperature?

by Paul Ramsey at May 18, 2021 03:41 PM

May 05, 2021


(The Many) Spatial Indexes of PostGIS

Spatial indexes are used in PostGIS to quickly search for objects in space. Practically, this means very quickly answering questions of the form:
  • "all the things inside this this" or
  • "all the things near this other thing"

Because spatial objects are often quite large and complex (for example, coastlines commonly are defined with thousands of points), spatial indexes use "bounding boxes" as index and search keys:

  • Bounding boxes are of a small, fixed size, only 4 floats for a 2D box; and,
  • Bounding boxes are very inexpensive to compare to test things like containment.

So the simple story of spatial indexes is: if you are planning to do spatial queries (which, if you are storing spatial objects, you probably are) you should create a spatial index for your table.

by Paul Ramsey at May 05, 2021 08:43 PM

May 04, 2021

Paul Ramsey

Spatial Indexes and Bad Queries

So, this happened:

Tweet about Indexes

Basically a GeoDjango user posted some workarounds to some poor performance in spatial queries, and the original query was truly awful and the workaround not a lot better, so I snarked, and the GeoDjango maintainer reacted in kind.

Sometimes a guy just wants to be a prick on the internet, you know? But still, I did raise the red flag of snarkiness, so it it seems right and proper to pay the fine.

I come to this not knowing entirely the contract GeoDjango has with its users in terms of “hiding the scary stuff”, but I will assume for these purposes that:

  • Data can be in any coordinate system.
  • Data can use geometry or geography column types.
  • The system has freedom to create indexes as necessary.

So the system has to cover up a lot of variability in inputs to hide the scary stuff.

We’ll assume a table name of the_table a geometry column name of geom and a geography column name of geog.

Searching Geography

This is the easiest, since geography queries conform to the kind of patterns new users expect: the coordinates are in lon/lat but the distances are provided/returned in metres.

Hopefully the column has been spatially indexed? You can check in the system tables.

FROM pg_indexes 
WHERE tablename = 'the_table';

Yes, there are more exact ways to query the system tables for this information, I give the simple example for space reasons.

If it has not been indexed, you can make a geography index like this:

CREATE INDEX the_table_geog_x 
  ON the_table
  USING GIST (geog);

And then a “buffered” query, that finds all objects within a radius of an input geometry (any geometry, though only a point is shown here) looks like this.

FROM the_table
    ST_SetSRID(ST_MakePoint(%lon, %lat), 4326),

Note that there is no “buffering” going on here! A radius search is logically equivalent and does not pay the cost of building up buffers, which is an expensive operation.

Also note that, logically, ST_DWithin(A, B, R) is the same as ST_Distance(A, B) < R, but in execution the former can leverage a spatial index (if there is one) while the latter cannot.

Indexable Functions

Since I mention that ST_DWithin() is indexable, I should list all the functions that can make use of a spatial index:

And for a bonus there are also a few operators that access spatial indexes.

  • geom_a && geom_b returns true if the bounding box of geom_a intersects the bounding box of geom_b in 2D space.
  • geom_a &&& geom_b returns true if the bounding box of geom_a intersects the bounding box of geom_b in ND space (an ND index is required for this to be index assisted),

Searching Planar Geometry

If the data are planar, then spatial searching should be relatively easy, even if the input geometry is not in the same coordinate system.

First, is your data planar? Here’s a quick-and-dirty query that returns true for geographic data and false for planar.

SELECT srs.srtext ~ '^GEOGCS' 
FROM spatial_ref_sys srs
JOIN geometry_columns gc
ON srs.srid = gc.srid
WHERE gc.f_table_name = 'the_table'

Again, do you have a spatial index on your geometry column? Check!

CREATE INDEX the_table_geom_x 
  ON the_table
  USING GIST (geom);

Now, assuming query coordinates in the same planar projection as the data, a fast query will look like this:

FROM the_table
    ST_SetSRID(ST_MakePoint(%x, %y), %srid)

But what if users are feeding in queries in geographic coordinates? No problem, just convert them to planar before querying:

FROM the_table
        ST_SetSRID(ST_MakePoint(%lon, %lat), 4326), 

Note that here we are transforming the geography query to planar, not transforming the planar column to geographic!

Why? There’s only one query object, and there’s potentially 1000s of rows of table data. And also, the spatial index has been built on the planar data: it cannot assist the query unless the query is in the same projection.

Searching Lon/Lat Geometry

This is the hard one. It is quite common for users to load geographic data into the “geometry” column type. So the database understands them as planar (that’s what the geometry column type is for!) while their units (longitude and latitude) are in fact angular.

There are benefits to staying in the geometry column type:

  • There are far more functions native to geometry, so you can avoid a lot of casting.
  • If you are mostly working with planar queries you can get better performance from 2D functions.

However, there’s a huge downside: questions that expect metric answers or metric parameters can’t be answered. ST_Distance() between two geometry objects with lon/lat coordinates will return an answer in… “degrees”? Not really an answer that makes any sense, as cartesian math on anglar coordinates returns garbage.

So, how to get around this conundrum? First, the system has to recognize the conundrum!

  • Is the column type “geometry”?
  • Is the SRID a long/lat coordinate system? (See test query above.)

Both yes? Ok, this is what we do.

First, create a functional index on the geography version of the geometry data. (Since you’re here, make a standard geometry index too, for other purposes.)

CREATE INDEX the_table_geom_geog_x
ON the_table
USING GIST (geography(geom));

CREATE INDEX the_table_geom
ON the_table
USING GIST (geom);

Now we have an index that understands geographic coordinates!

All we need now is a way to query the table that uses that index efficiently. The key with functional indexes is to ensure the function you used in the index shows up in your query.

FROM the_table
    geography(ST_SetSRID(ST_MakePoint(%lon, %lat), 4326))

What’s going on here? By using the “geography” version of ST_DWithin() (where both spatial arguments are of type “geography”) I get a search in geography space, and because I have created a functional index on the geography version of the “geom” column, I get it fully indexed.

Random Notes

  • The user blog post asserts incorrectly that their best performing query is much faster because it is “using the spatial index”.
        ST_GeomFromEWKB('\001\001\000\000 \346\020\000\000\000\000\000\000\000@U@\000\000\000\000\000\000@@'::bytea)) AS "ds" 
     FROM "core_searchcriteria" 
        WHERE ST_DistanceSphere(
            ST_GeomFromEWKB('\001\001\000\000 \346\020\000\000\000\000\000\000\000@U@\000\000\000\000\000\000@@'::bytea)
        ) <= "core_searchcriteria"."distance";
  • However, the WHERE clause is just not using any of the spatially indexable functions. Any observed speed-up is just because it’s less brutally ineffecient than the other queries.

  • Why was the original brutally inefficient?

        CAST("core_searchcriteria"."geo_location" AS geography(POINT,4326)), "core_searchcriteria"."distance"
        )::bytea AS "buff" FROM "core_searchcriteria" 
    WHERE ST_Intersects(
            CAST("core_searchcriteria"."geo_location" AS geography(POINT,4326)), "core_searchcriteria"."distance"), 
        ST_GeomFromEWKB('\001\001\000\000 \346\020\000\000\000\000\000\000\000@U@\000\000\000\000\000\000@@'::bytea)
  • The WHERE clause converts the entire contents of the data column to geography and then buffers every single object in the system.
  • It then compares all those buffered objects to the query object (what, no index? no).
  • Since the column objects have all been buffered… any spatial index that might have been built on the objects is unusable. The index is on the originals, not on the buffered objects.

May 04, 2021 08:00 AM

April 02, 2021


Creating mosaics, clipping and removing overlapping satellite images

Intro This post describes ways to download, clip and join satellite images. The module sentinel-mosaic is used throughout this blog post. Background There’s a number of satellites launched by the European Space Agency that take images of Earth which are then sent to ground stations and made publicly available through the Copernicus Open Access Hub and its respective API. In this post we’re focusing mainly on the data from Sentinel-2A and Sentinel-2B which were designed for multiple purposes, one of those purposes being land monitoring.

April 02, 2021 12:00 AM

April 01, 2021

Paul Ramsey

Dumping a ByteA with psql

Sometimes you just have to work with binary in your PostgreSQL database, and when you do the bytea type is what you’ll be using. There’s all kinds of reason to work with bytea:

  • You’re literally storing binary things in columns, like image thumbnails.
  • You’re creating a binary output, like an image, a song, a protobuf, or a LIDAR file.
  • You’re using a binary transit format between two types, so they can interoperate without having to link to each others internal format functions. (This is my favourite trick for creating a library with optional PostGIS integration, like ogr_fdw.)

Today I was doing some debugging on the PostGIS raster code, testing out a new function for interpolating a grid surface from a non-uniform set of points, and I needed to be able to easily see what the raster looked like.

Interpolated raster surface grid

There’s a function to turn a PostGIS raster into a GDAL image format, so I could create image data right in the database, but in order to actually see the image, I needed to save it out as a file. How to do that without writing a custom program? Easy! (haha)

Basic steps:

  • Pipe the query of interest into the database
  • Access the image/music/whatever as a bytea
  • Convert that bytea to a hex string using encode()
  • Ensure psql is not wrapping the return in any extra cruft
  • Pipe the hex return value into xxd
  • Redirect into final output file

Here’s what it would look like if I was storing PNG thumbnails in my database and wanted to see one:

echo "SELECT encode(thumbnail, 'hex') FROM persons WHERE id = 12345" \
  | psql --quiet --tuples-only -d dbname \
  | xxd -r -p \
  > thumbnail.png

Any bytea output can be pushed through this chain, here’s what I was using to debug my ST_GDALGrid() function.

echo "SELECT encode(ST_AsGDALRaster(ST_GDALGrid('MULTIPOINT(10.5 9.5 1000, 11.5 8.5 1000, 10.5 8.5 500, 11.5 9.5 500)'::geometry, ST_AddBand(ST_MakeEmptyRaster(200, 400, 10, 10, 0.01, -0.005, 0, 0), '16BSI'), 'invdist' ), 'GTiff'), 'hex')" \
  | psql --quiet --tuples-only grid \
  | xxd -r -p \
  > testgrid.tif 

April 01, 2021 08:00 AM

March 24, 2021


Performance Improvements in GEOS

We at Crunchy Data put as much development effort into improving GEOS as we do improving PostGIS proper, because the GEOS library is so central to much geospatial processing.

by Paul Ramsey at March 24, 2021 05:49 PM

March 14, 2021


Polygon gridding using Geopandas and Shapely

Intro This post will discuss some work involving maps I’ve helped a client with. The main goal of the project was collecting various datasets from web services. One of those web services has an endpoint that receives as a parameter a series of points that define a polygon for which the API request is made (the response will be a series of resources that are located inside that polygon). The API supports pagination, so if the area of the polygon is too big, we’ll have to do additional requests for all the result pages.

March 14, 2021 12:00 AM

February 16, 2021


ArcGIS Feature Service to PostGIS: The QGIS Way

As a GIS newbie, I've been trying to use local open data for my own learning projects. I recently relocated to Tampa, Florida and was browsing through the City of Tampa open data portal and saw that they have a Public Art map. That looked like a cool dataset to work with but I couldn't find the data source anywhere in the portal. I reached out to the nice folks on the city's GIS team and they gave me an ArcGIS-hosted URL. 

To get the public art features into PostGIS I decided to use the "ArcGIS Feature Service" option in QGIS to point to the ArcGIS API, then export the feature layer to PostGIS. 

by Kat Batuigas (kat.batuigas@crunchydata.com) at February 16, 2021 02:31 PM

February 02, 2021

Martin Davis (Lin.ear th.inking)

OverlayNG and Invalid Geometry

A recent blog post by Elephant Tamer gives a critical appraisal of the improvements to overlay processing shipped in PostGIS 3.1 with GEOS 3.9.  The author is disappointed that PostGIS still reports errors when overlay is used on invalid geometry.  However, this is based on a misunderstanding of the technology. 

GEOS 3.9 includes OverlayNG, ported from the JTS Topology Suite). It brings a major advance in overlay robustness, along with other improvements (described here and here).  Previously, robustness limitations in the overlay algorithm could sometimes cause errors even for inputs which were topologically valid.  This was doubly problematic because there was no fully effective way to process the input geometries to avoid the errors.  Now, OverlayNG solves this problem completely.  Valid inputs will always produce a valid and essentially correct(*) output.

(*) "Essentially" correct, because in order to achieve full robustness a snapping heuristic may be applied to the input geometry.  However, this is done with a very fine tolerance, so should not appreciably alter the output from the theoretically correct value.

But for invalid inputs, OverlayNG will still report errors.  The reason is that there is a wide variety of gruesome ways in which geometry can be invalid.  Automated handling of invalidity would involve expensive extra processing, and also require making assumptions about what area a corrupt geometry is intended to represent.  Rather than silently repairing invalid geometry and returning potentially incorrect results, the design decision is to report this situation as an error.

In fact, OverlayNG is able to handle "mildly" invalid polygons, as described in this post. This covers situations which are technically invalid according to the OGC SFS specification, but which still have well-defined topology.  This includes self-touching rings (sometimes called "inverted polygons" or "exverted holes"), and zero-width gores and spikes.

Taking a detailed look at the data used in the blog post, we can see these improvements at work.  The dataset is the ecology polygons obtained from the GDOS WFS server.  This contains 7662 geometries, of which 10 are invalid.  Using the old overlay algorithm, 9 of these invalid polygons cause TopologyException errors.  Using OverlayNG, only 4 of them cause errors. 

The polygons that can now be processed successfully are typical "OGC-invalid" situations, which do not materially affect the polygonal topology.  These include self-touching rings with pinch points:

and zero-width gores:

Of the cases that still cause errors, two are classic small bow-tie errors:

And two are wildly invalid self-crossing rings:

The last two are good examples of geometry which is so invalid that it is impossible to unambiguously decide what area is represented (although ST_MakeValid will happily grind them into something that is technically valid).

Ultimately it is the user's responsibility to ensure that geometries to be processed by overlay (and many other PostGIS functions) have valid topology (as reported by ST_IsValid).  Ideally this is done by correcting the data at source.  But it can also be done a posteriori in the database itself, by either the ST_MakeValid function, or the well-known buffer(0) trick. (Which to use is a topic for another blog post...)

One improvement that could be made is to check for input validity when OverlayNG throws an error.  Then PostGIS can report definitively that an overlay error is caused by invalid input.  If there is an overlay error that is not caused by invalidity, the PostGIS team wants to hear about it! 

And perhaps there is a case to be made for repairing invalid geometry automatically, even if the repair is suspect.  Possibly this could be invoked via a flag parameter on the overlay functions. More research is required - feedback is welcome!

by Dr JTS (noreply@blogger.com) at February 02, 2021 06:42 PM

January 28, 2021

PostGIS Development

PostGIS 3.1.1

The PostGIS Team is pleased to release the release of PostGIS 3.1.1!

This release is a bug fix release, addressing issues found in the previous 3.1 release.

  • #4814, Crash passing collection with only empty components to ST_MakeValid
  • #4818, Make the VSICURL synthetic driver work as documented
  • #4825, Unstable results from ST_MakeValid
  • #4823, Avoid listing the same geometry in different collections
Continue Reading by clicking title hyperlink ..

by Paul Ramsey at January 28, 2021 12:00 AM

January 21, 2021


Production PostGIS Vector Tiles: Caching

Building maps that use dynamic tiles from the database is a lot of fun: you get the freshest data, you don't have to think about generating a static tile set, and you can do it with very minimal middleware, using pg_tileserv.

However, the day comes when it's time to move your application from development to production, what kinds of things should you be thinking about?

by Paul Ramsey at January 21, 2021 07:08 PM

December 19, 2020

Martin Davis (Lin.ear th.inking)

Randomization to the Rescue!

Now that OverlayNG has landed on JTS master, it is getting attention from downstream projects interested in porting it or using it.  One of the JTS ports is the Net Topology Suite (NTS) project, and it is very proactive about tracking the JTS codebase.  Soon after the OverlayNG commit an NTS developer noticed an issue while running the performance tests in JTS: the InteriorPointInAreaPerfTest was now causing a StackOverflowError.  This turned out to be caused by the change to GeometryPrecisionReducer to use OverlayNG with Snap-Rounding to perform robust precision reduction of polygons.  Further investigation revealed that failure was occurring while querying the KdTree used in the HotPixelIndex in the SnapRoundingNoder.  Moreover, in addition to the outright failure, even when queries did succeed they were taking an excessively long time for large inputs.

The reason for using a K-D tree as the search structure for HotPixels is that it supports two kinds of queries with a single structure: 

  • Lookup queries to find the HotPixel for a coordinate.  These queries are performed while building the index incrementally. Hence a dynamic data structure like a K-D tree is required, rather than a static index such as STRtree.
  • Range queries to find all HotPixels within a given extent.  These queries are run after the index is loaded.

The JTS KdTree supports both of these queries more efficiently than QuadTree, which is the other dynamic index in JTS.  However, K-D trees are somewhat notorious for becoming unbalanced if inserted points are coherent (i.e. contain monotonic runs in either ordinate dimension).  This is exactly the situation which occurs in large, relatively smooth polygons - which is the test data used in InteriorPointInAreaPerfTest.  An unbalanced tree is deeper than it would be if perfectly balanced.  In extreme cases this leads to trees of very large depth relative to their size.  This slows down query performance and, because the KdTree query implementation uses recursion, can also lead to stack overflow.  

A perfectly balanced K-D tree, with depth = logN.  In the worst case an unbalanced tree has only one node at each level (depth = N).

I considered a few alternatives to overcome this major blocker.  One was to use a QuadTree, but as mentioned this would reduce performance.  There are schemes to load a K-D tree in a balanced way, but they seemed complex and potentially non-performant.

It's always great when people who file issues are also able to provide code to fix the problem.  And happily the NTS developer submitted a pull request with a very nice solution.  He observed that while the K-D tree was being built incrementally, in fact the inserted points were all available beforehand.  He proposed randomizing them before insertion, using the elegantly simple Fisher-Yates shuffle.  This worked extremely well, providing a huge performance boost and eliminated the stack overflow error.  Here's the relative performance times:

Num PtsRandomizedIn Order
10K126 ms341 ms
20K172 ms1924 ms
50K417 ms12.3 s
100K1030 ms59 s
200K1729 ms240 s
500K5354 msOverflow

Once the solution was merged into JTS, my colleague Paul Ramsey quickly ported it to GEOS, so that PostGIS and all the other downstream clients of GEOS would not encounter this issue.

It's surprising to me that this performance issue hasn't shown up in the other two JTS uses of KdTree: SnappingNoder and ConstrainedDelaunayTriangulator. More investigation required!

by Dr JTS (noreply@blogger.com) at December 19, 2020 03:35 AM

December 18, 2020

PostGIS Development

PostGIS 3.1.0

The PostGIS Team is pleased to release the release of PostGIS 3.1.0!

This version exposes the new features of GEOS 3.9 as well as numerous core performance enhancements for spatial joins, large object access, text format output and more.

Performance is a key feature of this release, with improvements to spatial joins, text outputs, large object reads, vector tile output, and a host of smaller tweaks.

The k-means clustering code has been enhanced to support weighting and higher dimensional clusters.

Geometry generators to create hexagonal and square tilings have been added, for simpler in-the-database summarization queries.

Finally, PostGIS exposes the latest enhancements in the GEOS geometry library 3.9 version. The new overlay engine (aka “OverlayNG”) provides more robust handling of difficult input geometries, using a set of new noding strategies to process geometry. For the end user, this should mean no more “topology exceptions” when using the union, difference, intersection or symmetric difference functions. PostGIS also exposes the new fixed precision overlay capability via an additional grid-size parameter on ST_Intersection and the other overlay functions.

Continue Reading by clicking title hyperlink ..

by Paul Ramsey at December 18, 2020 12:00 AM

December 16, 2020

Paul Ramsey

Waiting for PostGIS 3.1: GEOS 3.9

This post originally appeared on the Crunchy Data blog.

While we talk about “PostGIS” like it’s one thing, it’s actually the collection of a number of specialized geospatial libraries, along with a bunch of code of its own.

  • PostGIS provides core functionality
    • bindings to PostgreSQL, the types and indexes,
    • format reading and writing
    • basic algorithms like distance and area
    • performance tricks like caching
    • simple geometry manipulations (add a point, dump rings, etc)
    • algorithms that don’t exist in the other libraries
  • Proj provides coordinate system transformations
  • GDAL provides raster algorithms and format supports
  • GEOS provides computational geometry algorithms
    • geometry relationships, like “intersects”, “touches” and “relate”
    • geometry operations, like “intersection”, “union”
    • basic algorithms, like “triangulate”

The algorithms in GEOS are actually a port to C++ of algoriths in the JTS Java library. The ecosystem of projects that depend on GEOS or JTS or one of the other language ports of GEOS is very large.

GEOS/JTS Ecosystem

Overlay NG

Over the past 12 months, the geospatial team at Crunchy Data has invested heavily in JTS/GEOS development, overhauling the overlay engine that backs the Intersection, Union, Difference and SymDifference functions in all the projects that depend on the library.


The new overlay engine, “Overlay NG”, promises to be more reliable, and hopefully also faster for most common cases.

One use of overlay code is chopping large objects up, to find the places they have in common. This query summarizes climate zones (bec) by watershed (wsa).

    Sum(ST_Area(ST_Intersection(b.geom, w.geom))) AS area_zone, 
FROM bec b
JOIN wsa w
ON ST_Intersects(b.geom, w.geom)
WHERE w.nwwtrshdcd like '128-835500-%'


The new implementation for this query runs about 2 times faster than the original. Even better, when run on a larger area with more data, the original implementation fails – it’s not possible to get a result out. The new implementation runs to completion.

Another common use of overlay code is melting together areas that share an attribute. This query takes (almost) every watershed on Vancouver Island and melts them together.

SELECT ST_Union(geom)
FROM wsa
WHERE nwwtrshdcd like '920-%'
   OR nwwtrshdcd like '930-%'

At the start, there are 1384 watershed polygons.

Vancouver Island watersheds

At the end there is just one.

Vancouver Island

The new implementation takes about 50% longer currently, but it is more robust and less likely to fail than the original.

Fixed Precision Overlay

The way Overlay NG ensures robust results, is by falling back to more and more reliable noding approaches. “Noding” refers to how new vertices are introduced into geometries during the overlay process.

  • Initially a naive “floating point” noding is used, that just uses double precision coordinates. This works most of the time, but occasionally fails when noding “almost parallel” edges.
  • On failure, a “snapping” noding is used, which nudges nearby edges and nodes together within a tolerance. That works most of the time, but occasionally fails.
  • Finally, a “fixed precision” noding nudges all of the coordinates in both geometries into a fixed space, where edge collapses can be handled deterministically. This is the lowest performance approach, but it very very rarely occurs.

Sometimes, end users actually prefer to have their geometry forced into a fixed precision grid, and for overlay to use a fixed precision. For those users, with PostGIS 3.1 and GEOS 3.9 there are some new parameters in the intersection/union/difference functions.

Precision reduction

The new “gridSize” parameter determines the size of the grid to snap to when generating new outputs. This can be used both to generate new geometries, and also to precision reduce existing geometries, just by unioning a geometry with an empty geometry.

Inscribed Circle

As always, there are a few random algorithmic treats in each new GEOS release. For 3.9, there is the “inscribed circle”, which finds the largest circle that can be fit inside a polygon (or any other boundary).

Vancouver Island inscribed circle

In addition to making a nice picture, the inscribed circle functions as a measure of the “wideness” of a polygon, so it can be used for things like analyzing river polygons to determine the widest point, or placing labels.

December 16, 2020 08:00 AM

Paul Ramsey

Waiting for PostGIS 3.1: Grid Generators

This post originally appeared on the Crunchy Data blog.

Summarizing data against a fixed grid is a common way of preparing data for analysis. Fixed grids have some advantages over natural and administrative boundaries:

  • No appeal to higher authorities
  • Equal unit areas
  • Equal distances between cells
  • Good for passing data from the “spatial” computational realm to a “non-spatial” realm

Ideally, we want to be able to generate grids that have some key features:

  • Fixed origin point, so that grid can be re-generated and not move
  • Fixed cell coordinates for a given cell size, so that the same cell can be referred to just using a cell address, without having to materialize the cell bounds


The ST_SquareGrid(size, bounds) function generates a grid with an origin at (0, 0) in the coordinate plane, and fills in the square bounds of the provided geometry.

SELECT (ST_SquareGrid(400000, ST_Transform(a.geom, 3857))).* 
FROM admin a  
WHERE name = 'Brazil';

So a grid generated using Brazil as the driving geometry looks like this.

Brazil square grid


The ST_HexagonGrid(size, bounds) function works much the same as the square grid function.

Hexagons are popular for some cartographic display purposes and modeling purposes. Surprisingly they can also be indexed using the same two-dimensional indexing scheme as squares.

The hexagon grid starts with a (0, 0) hexagon centered at the origin, and the gridding for a bounds includes all hexagons that touch the bounds.

Hexagon gridding

As with the square gridding, the coordinates of hexes are fixed for a particular gridding size.

SELECT (ST_HexagonGrid(100000, ST_Transform(a.geom, 3857))).* 
FROM admin a  
WHERE name = 'Germany';

Here’s a 100km hexagon gridding of Germany.

Germany hex grid

Summarizing with Grids

It’s possible to materialize grid-based summaries, without actually materializing the grids, using the generator functions to create the desired grids on-the-fly.

Here’s a summary of population points, using a hex grid.

SELECT sum(pop_max) as pop_max, hexes.geom
        ST_SetSRID(ST_EstimatedExtent('places', 'geom'), 4326)
    ) AS hexes
    places AS p
    ON ST_Intersects(p.geom, hexes.geom)
GROUP BY hexes.geom;

World population summary

It’s also possible to join up on-the-fly gridding to visualization tools, for very dynamic user experiences, feeding these dynamically generated grids out to the end user via pg_tileserv.

December 16, 2020 08:00 AM

Paul Ramsey

Waiting for PostGIS 3.1: Performance

This post originally appeared on the Crunchy Data blog.

Open source developers sometimes have a hard time figuring out what feature to focus on, in order to generate the maximum value for end users. As a result, they will often default to performance.

Performance is the one feature that every user approves of. The software will keep on doing all the same cool stuff, only faster.

For PostGIS 3.1, there have been a number of performance improvements that, taken together, might add up to a substantial performance gain for your workloads.

Large Geometry Caching

Spatial joins have been slowed down by the overhead of large geometry access for a very long time.

ON ST_Intersects(A.geom, B.geom)

PostgreSQL will plan and execute spatial joins like this using a “nested loop join”, which means iterating through one side of the join, and testing the join condition. This results in executions that look like:

  • ST_Intersects(A.geom(1), B.geom(1))
  • ST_Intersects(A.geom(1), B.geom(2))
  • ST_Intersects(A.geom(1), B.geom(3))

So one side of the test repeats over and over.

Geometry Caches

Caching that side and avoiding re-reading the large object for each iteration of the loop makes a huge difference to performance. We have seen 20 times speed-ups in common spatial join workloads (see below).

The fixes are quite technical, but if you are interested we have a detailed write-up available.

Header-only Geometry Reads

The on-disk format for geometry includes a short header that includes information about the geometry bounds, the spatial reference system and dimensionality. That means it’s possible for some functions to return an answer after only reading a few bytes of the header, rather than the whole object.

However, not every function that could do a fast read, did do a fast read. That is now fixed.

Faster Text Generation

It’s pretty common for web applications and others to generate text formats inside the database, and the code for doing so was not optimized. Generating “well-known text” (WKT), GeoJSON, and KML output all now use a faster path and avoid unnecessary copying.

PostGIS also now uses the same number-to-text code as PostgreSQL, which has been shown to be faster, and also allows us to expose a little more control over precision to end users.

How Much Faster?

For the specific use case of spatially joining, here is my favourite test case:

Admin0 and Populated Places

Load the data into both versions.

shp2pgsql -D -s 4326 -I ne_10m_admin_0_countries admin | psql postgis30
shp2pgsql -D -s 4326 -I ne_10m_populated_places places | psql postgis30

Run a spatial join that finds the sum of populated places in each country.

SELECT Sum(p.pop_max) as pop_max, a.name
FROM admin a
JOIN places p
ON ST_Intersects(a.geom, p.geom)
GROUP BY a.name

Average time over 5 runs:

  • PostGIS 3.0 = 23.4s
  • PostGIS 3.1 = 0.9s

This test is somewhat of a “worst case”, in that there are lots of very large countries in the admin data, but it gives an idea of the kind of speed-ups that are available for spatial joins against collections that include larger (250+ coordinates) geometries.

December 16, 2020 08:00 AM

December 14, 2020

PostGIS Development

PostGIS 3.1.0rc1

The PostGIS Team is pleased to release the release candidate of the upcoming PostGIS 3.1.0 release. This version is exposes some of the new performance and feature enhancements in GEOS 3.9 as well as numerous speed enhancements not requiring newer GEOS.

Best served with:

Continue Reading by clicking title hyperlink ..

by Paul Ramsey at December 14, 2020 12:00 AM

December 13, 2020

Raúl Marín Rodríguez

Waiting for Postgis 3.1: Performance with large geometries

Today I want to talk about partial decompression in PostGIS and how it has improved the speed of some operations with large geometries, that is the geometries made of hundreds of points or vertices. I don’t think we’ve intentionally focused on performance for the upcoming PostGIS 3.1 but unconsciously I...

December 13, 2020 11:00 PM

Raúl Marín Rodríguez

Using WebAssembly to add Argon2 to Snowflake

Snowflake seems like one of the hottest databases of the moment. Sorry, data warehouse. Tomayto, tomahto. Anyway, I wanted to give it a try and see how hard it is to add new SQL functions to it. Although Snowflake supports Javascript User Defined Functions I don’t like using JS to...

December 13, 2020 11:00 PM

December 09, 2020

PostGIS Development

PostGIS 3.1.0beta1

The PostGIS Team is pleased to release the first beta of the upcoming PostGIS 3.1.0 release. This version is exposes some of the new performance and feature enhancements in not yet relesed GEOS 3.9 as well as numerous speed enhancements not requiring newer GEOS. Requires GEOS 3.6+ and PostgreSQL 9.6+. To use MVT you will need protobuf-c 1.1. or higher.

Best served with:

Continue Reading by clicking title hyperlink ..

by Paul Ramsey at December 09, 2020 12:00 AM

December 05, 2020

Raúl Marín Rodríguez

Waiting for Postgis 3.1: Output functions

On the improvements to PostGIS output functions - To continue with the changes in PostGIS 3.1, in this post I’ll cover the performance improvements on many functions that output geometries either as binary or as text. I will talk about several changes which have in common that they kicked off by a single question: “Now what?” After the...

December 05, 2020 11:00 PM

November 20, 2020

PostGIS Development

PostGIS 3.0.3

The PostGIS Team is pleased to release PostGIS 3.0.3.

Best served with PostgreSQL 13.1, and GEOS 3.8.1

pgRouting 3.1.1

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 20, 2020 12:00 AM

November 19, 2020

Paul Ramsey

Waiting for Postgis 3.1: Vector tile improvements

This is a guest post from Raúl Marín, a core PostGIS contributor and a former colleague of mine at Carto. Raúl is an amazing systems engineer and has been cruising through the PostGIS code base making things faster and more efficient. You can find the original of this post at his new personal tech blog. – Paul

I’m not big on creating new things, I would rather work on improving something that’s already in use and has proven its usefulness. So whenever I’m thinking about what I should do next I tend to look for projects or features that are widely used, where the balance between development and runtime costs favors a more in depth approach.

Upon reviewing the changes of the upcoming PostGIS 3.1 release, it shouldn’t come as a surprise then that most of my contributions are focused on performance. When in doubt, just make it faster.

Since CARTO, the company that pays for my lunch, uses PostGIS’ Vector Tile functions as its backend for dynamic vector maps, any improvement there will have a clear impact on the platform. This is why since the appearance of the MVT functions in PostGIS 2.4 they’ve been enhanced in each major release, and 3.1 wasn’t going to be any different.

In this occasion the main reason behind the changes wasn’t the usual me looking for trouble, but the other way around. As ST_AsMVT makes it really easy to extract information from the database and into the browser, a common pitfall is to use SELECT * to extract all available columns which might move a lot of data unnecessarily and generate extremely big tiles. The easy solution to this problem is to only select the properties needed for the visualization but it’s hard to apply it retroactively once the application is in production and already depending on the inefficient design.

So there I was, looking into why the OOM killer was stopping databases, and discovering queries using a massive amount of resources to generate tiles 50-100 times bigger than they should (the recommendation is smaller than 500 KB). And in this case, the bad design of extracting all columns from the dataset was worsened by the fact that is was being applied to a large dataset; this triggered PostgreSQL parallelism requiring extra resources to generate chunks in parallel and later merge them together. In PostGIS 3.1 I introduced several changes to improve the performance of these 2 steps: the parallel processing and the merge of intermediate results.

The changes

Without getting into too much detail, the main benefit comes from changing the vector tile .proto such that a feature can only hold one value at a time. This is what the specification says, but not what the .proto enforces, therefore the internal library was allocating memory that it never used.

There are other additional changes, such as improving how values are merged between parallel workers, so feel free to have a look at the final commit itself if you want more details.

Performance comparison

The best way to see the impact of these changes is through some examples. In both cases I am generating the same tile, in the same exact server and with the same dependencies; the only change was to replace the PostGIS library, which in 3.0 to 3.1 doesn’t require an upgrade.

In the first example the tile contains all the columns of the 287k points in it. As I’ve mentioned before, it is discouraged to do this, but it is the simplest query to generate.

And for the second example, I’m generating the same tile but now only including the minimal columns for the visualization:

We can see, both in 3.0 and 3.1, that adding only the necessary properties makes things 10 times as fast as with the full data, and also that Postgis 3.1 is 30-40% faster in both situations.

Memory usage

Aside from speed, this change also greatly reduces the amount of memory used to generate a tile.

To see it in action, we monitor the PostgreSQL process while it’s generating the tile with all the properties. In 3.0, we observe in the blue line that the memory usage increases with time until it reaches around 2.7 GB at the end of the transaction.

We now monitor the same request on a server using Postgis 3.1. In this case the server uses around a third of the memory as in 3.0 (1GB vs 2.7GB) and, instead of having a linear increase, the memory is returned back to the system as soon as possible.

To sum it all up: PostGIS 3.1 is faster and uses less memory when generating large vector tiles.

November 19, 2020 08:00 PM

PostGIS Development

PostGIS 3.1.0alpha3

The PostGIS Team is pleased to release the third alpha of upcoming PostGIS 3.1.0 release. This version is exposes some of the new performance and feature enhancements in not yet relesed GEOS 3.9 as well as numerous speed enhancements not requiring newer GEOS. Requires GEOS 3.6+ and PostgreSQL 9.6+. To use MVT you will need protobuf-c 1.1. or higher.

Best served with

PostgreSQL 13.1, GEOS 3.7 or higher is recommended.

pgRouting 3.1.1

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 19, 2020 12:00 AM