Planet PostGIS

July 21, 2026

PostGIS Development

PostGIS 3.7.0beta1

The PostGIS Team is pleased to release PostGIS 3.7.0beta1! Best Served with PostgreSQL 19 Beta2 and GEOS 3.15.0beta2.

This version requires PostgreSQL 14 - 19beta2, GEOS 3.10 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.15+ is needed. To take advantage of all SFCGAL features SFCGAL 2.3.0+ is needed.

This release contains fixes and enhancements since the 3.7.0alpha1 release.

3.7.0beta1

source download md5
NEWS
HTML Online en ja zh_Hans fr
PDF docs: en ja, zh_Hans, fr
Cheat Sheets:
- postgis: en ja zh_Hans fr
- postgis_raster: en ja zh_Hans fr
- postgis_topology: en ja zh_Hans fr
- postgis_sfcgal: en ja zh_Hans fr

This release is a beta of a major release; it includes bug fixes since PostGIS 3.6.4 and new features.

by Regina Obe at July 21, 2026 12:00 AM

July 05, 2026

PostGIS Development

PostGIS 3.7.0alpha1

The PostGIS Team is pleased to release PostGIS 3.7.0alpha1! Best Served with PostgreSQL 19 Beta1 and GEOS 3.15 which will be released soon.

This version requires PostgreSQL 14 - 19beta1, GEOS 3.10 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.15+ is needed. To take advantage of all SFCGAL features SFCGAL 2.3.0+ is needed.

3.7.0alpha1

source download md5
NEWS
HTML Online en ja zh_Hans fr
PDF docs: en ja, zh_Hans, fr
Cheat Sheets:
- postgis: en ja zh_Hans fr
- postgis_raster: en ja zh_Hans fr
- postgis_topology: en ja zh_Hans fr
- postgis_sfcgal: en ja zh_Hans fr

This release is an alpha of a major release; it includes bug fixes since PostGIS 3.6.4 and new features.

by Regina Obe at July 05, 2026 12:00 AM

June 28, 2026

Auchindown

Armchair Transit with PostGIS: The Census & The Bestagons

Step one in the quest for good transit in Kingston: hexagons, census data, and a whole lot of ST_Intersection.

by Rhys at June 28, 2026 05:10 AM

June 22, 2026

PostGIS Development

PostGIS Tiger Geocoder 2025.1

The PostGIS development team is pleased to provide the postgis_tiger_geocoder extension. This is the very first release since the break from the PostGIS core. This version requires PostgreSQL 16 or above and should work with any supported PostGIS version.

PostGIS 3.6 series is the last series to include postgis_tiger_geocoder. PostGIS 3.7 will be shipped without postgis_tiger_geocoder.

Moving forward, postgis_tiger_geocoder has its own dedicated repo at OSGeo Gitea postgis_tiger_geocoder under the PostGIS org.

The versioning model has also changed: versions are now based on the year of the US Census TIGER dataset that is current at the time of release.

by Regina Obe at June 22, 2026 12:00 AM

April 24, 2026

Auchindown

Finding the centre of Jamaica.

Do family meetups always devolve into SQL?

by Rhys at April 24, 2026 05:10 AM

April 14, 2026

PostGIS Development

PostGIS Patch Releases

The PostGIS development team is pleased to provide bug fix and security releases for PostGIS 3.2 - 3.6.

by Paul Ramsey at April 14, 2026 12:00 AM

March 26, 2026

Auchindown

From Triggers to Training: Automating Network Design in Three Levels

Intelligent Automation to Artificial Intelligence in three levels.

by Rhys at March 26, 2026 05:10 AM

February 09, 2026

PostGIS Development

PostGIS Patch Releases

The PostGIS development team is pleased to provide bug fix releases for PostGIS 3.0 - 3.6. These are the End-Of-Life (EOL) releases for PostGIS 3.0.12 and 3.1.13. If you haven’t already upgraded from 3.0 or 3.1 series, you should do so soon.

by Regina Obe at February 09, 2026 12:00 AM

December 28, 2025

Boston GIS (Regina Obe, Leo Hsu)

FOSS4GNA 2025: Summary

Free and Open Source for Geospatial North America (FOSS4GNA) 2025 ran November 3-5, 2025, and I think it was one of the better FOSS4GNAs we've had. I was on the programming and workshop committees, and we were worried that, with the government shutdown, things could go badly, since people started withdrawing their talks and workshops very close to curtain time. Despite our attendance being lower than prior years, it felt crowded enough and, on the bright side, people weren't fighting for chairs even in the most crowded talks. FOSS4G 2025 International happened two weeks later in Auckland, New Zealand, and I heard it had a fairly decent turnout too.

Continue reading "FOSS4GNA 2025: Summary"

by Regina Obe (nospam@example.com) at December 28, 2025 11:37 PM

December 09, 2025

Crunchy Data (Snowflake)

PostGIS Performance: Simplification

There’s nothing simple about simplification! It is very common to want to slim down the size of geometries, and there are lots of different approaches to the problem.

We will explore different methods starting with ST_Letters for this rendering of the letter “a”.

SELECT ST_Letters('a');

This is a good starting point, but to show the different effects of different algorithms on things like redundant linear points, we need a shape with more vertices along the straights, and fewer along the curves.

SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1);

Here we add vertices every one meter with ST_Segmentize, then use ST_RemoveRepeatedPoints to thin out the points along the curves. Already we are simplifying!

Let's apply the same “remove repeated” algorithm, with a 10 meter tolerance.

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_RemoveRepeatedPoints(a, 10) FROM a;

We do have a lot fewer points, and the constant angle curves are well preserved, but some straight lines are no longer legible as such, and there are redundant vertices in the vertical straight lines.

The ST_Simplify function applies the Douglas-Peucker line simplification algorithm to the rings of the polygon. Because it is a line simplifier, it does a cruder job of preserving some aspects of the polygon area, like the squareness of the top ligature.

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_Simplify(a, 1) FROM a;
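
To make the algorithm concrete, here is a minimal Python sketch of Douglas-Peucker on a single open line. It is an illustration only, not the PostGIS implementation; the function names and the sample line are made up for the example. The idea is to keep the two end points, find the vertex farthest from the chord between them, and recurse on both halves only if that vertex is farther away than the tolerance.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Keep the end points; keep the farthest vertex only if it is
    more than `tolerance` away from the chord, then recurse."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    # The split vertex appears in both halves; keep it once.
    return left[:-1] + right

# Hypothetical line: a nearly flat run, a sharp corner, then a nearly straight run.
line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(line, 1.0)
```

Note that the tolerance is a distance, so small wiggles vanish while the sharp corner survives.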

The ST_SimplifyVW function applies the Visvalingam–Whyatt algorithm to the rings of the polygon. Visvalingam–Whyatt is better for preserving the shapes of polygons than Douglas-Peucker, but the differences are subtle.

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_SimplifyVW(a, 5) FROM a;
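
For comparison, a small Python sketch of the Visvalingam–Whyatt idea: repeatedly drop the interior vertex whose triangle with its two neighbours has the smallest area, until every remaining triangle is at least as large as the threshold. This is a simplified illustration with made-up names and data, not the PostGIS code, and it recomputes every area on each pass rather than using a priority queue.

```python
def triangle_area(a, b, c):
    """Area of the triangle a-b-c."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def visvalingam_whyatt(points, min_area):
    """Remove the least significant vertex until all are significant."""
    pts = list(points)
    while len(pts) > 2:
        areas = [
            triangle_area(pts[i - 1], pts[i], pts[i + 1])
            for i in range(1, len(pts) - 1)
        ]
        smallest = min(areas)
        if smallest >= min_area:
            break
        # areas[k] belongs to vertex k + 1 (the end points have no area).
        del pts[areas.index(smallest) + 1]
    return pts

# Hypothetical line with one tiny bump and one large spike.
line = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0), (5, 0)]
light = visvalingam_whyatt(line, 1.0)
heavy = visvalingam_whyatt(line, 2.0)
```

Here the threshold is an area rather than a distance, which is why the tolerance values used with the two functions are not directly comparable.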

Coercing a shape onto a fixed precision grid is another form of simplification, sometimes used to force the edges of adjacent objects to line up exactly. The original such function, ST_SnapToGrid, does exactly what it says in the name: every vertex is rounded to a fixed grid point.

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_SnapToGrid(a, 5) FROM a;

However, as you can see at the top left, the grid snapper frequently generates invalidity in polygons, such as the self-intersecting ring in this example.
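
A rough Python sketch shows how plain vertex rounding can go wrong. The ring below is hypothetical: a valid square with a shallow notch cut into its right side. After every vertex is rounded to a grid of size 5 (and consecutive duplicates are dropped), the notch collapses onto the bottom edge.

```python
def snap_to_grid(ring, size):
    """Round every vertex to the nearest grid point, dropping
    consecutive duplicates. No validity checking is done."""
    snapped = []
    for x, y in ring:
        p = (round(x / size) * size, round(y / size) * size)
        if not snapped or snapped[-1] != p:
            snapped.append(p)
    return snapped

# Hypothetical valid ring: a 20 x 20 square with a notch on the right side.
ring = [(0, 0), (20, 0), (20, 1), (11, 2), (20, 3), (20, 20), (0, 20), (0, 0)]
snapped = snap_to_grid(ring, 5)
```

In the snapped ring the edge from (20, 0) back to (10, 0) retraces part of the bottom edge from (0, 0) to (20, 0), so the ring overlaps itself and is no longer valid.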

A more modern alternative is precision reduction.

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_ReducePrecision(a, 5) FROM a;

The ST_ReducePrecision function not only snaps geometries to a fixed precision grid, it also ensures that outputs are always valid.

Because grid snapping tends to introduce a lot of vertices along straight edges, combining it with a line simplifier makes a lot of sense.

WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_Simplify(ST_ReducePrecision(a, 5),1) FROM a;

Simplifying single geometries is all well and good, but what about simplifying groups of geometries? Specifically ones that share boundaries?

Fortunately, since PostGIS 3.6 there is now a complete set of functions for that problem.

We start with a pair of polygons with a non-matched shared boundary.

Non-clean boundaries can be cleaned up with the ST_CoverageClean function.

SELECT ST_CoverageClean(geom) OVER () AS geom FROM polys;

And once the coverage is clean, the shapes including their shared borders can be simplified with ST_CoverageSimplify.

WITH clean AS (
  SELECT ST_CoverageClean(geom) OVER () AS geom FROM polys
)
SELECT ST_CoverageSimplify(geom, 10) OVER () FROM clean;

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at December 09, 2025 01:00 PM

November 21, 2025

Crunchy Data (Snowflake)

PostGIS Performance: Data Sampling

One of the temptations database users face, when presented with a huge table of interesting data, is to run queries that interrogate every record. Got a billion measurements? What’s the average of that?!

One way to find out is to just calculate the average.

SELECT avg(value) FROM mytable;

For a billion records, that could take a while!

Fortunately, the “Law of Large Numbers” is here to bail us out, stating that the average of a sample approaches the average of the population, as the sample size grows. And amazingly, the sample does not even have to be particularly large to be quite close.

Here’s a table of 10M values, randomly generated from a normal distribution. We know the average is zero. What will a sample of 10K values tell us it is?

CREATE TABLE normal AS
  SELECT random_normal(0,1) AS values
    FROM generate_series(1,10000000);

We can take a sample using a sort, or using the random() function, but both of those techniques first scan the whole table, which is exactly what we want to avoid.

Instead, we can use the PostgreSQL TABLESAMPLE feature, to get a quick sample of the pages in the table and an estimate of the average.

SELECT avg(values)
  FROM normal TABLESAMPLE SYSTEM (1);

I get an answer – 0.0031, very close to the population average – and it takes just 43 milliseconds.
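
The statistical claim is easy to check outside the database. The Python sketch below is an assumption-laden stand-in: it draws a simple random sample of individual values rather than whole pages as TABLESAMPLE SYSTEM does, and the population size and seed are arbitrary. It only shows that a sample of 10,000 values from a standard normal population lands close to the population average.

```python
import random
import statistics

rng = random.Random(42)

# Stand-in for the table: values drawn from a normal distribution
# with mean 0 and standard deviation 1.
population = [rng.gauss(0, 1) for _ in range(200_000)]

# A simple random sample of 10,000 values.
sample = rng.sample(population, 10_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# The standard error of the sample mean is about 1 / sqrt(10000) = 0.01,
# so the two averages should differ by only a few hundredths at most.
difference = abs(sample_mean - population_mean)
```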

Can this work with spatial? For the right data, it can. Imagine you had a table that had one point in it for every person in Canada (36 million of them) and you wanted to find out how many people lived in Toronto (or this red circle around Toronto).

SELECT count(*)
  FROM census_people
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);

The answer is 5,010,266, and it takes 7.2 seconds to return. What if we take a 10% sample?

SELECT count(*)
  FROM census_people TABLESAMPLE SYSTEM (10)
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);

The sample is 10%, and the answer comes back as 508,292 (near one tenth of our actual measurement) in 2.2 seconds. What about a 1% sample?

SELECT count(*)
  FROM census_people TABLESAMPLE SYSTEM (1)
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);

The sample is 1%, and the answer comes back as 50,379 (near one hundredth of our actual measurement) in 0.2 seconds. Still a good estimate!

Is this black magic? No, the TABLESAMPLE SYSTEM mode gets its speed by reading pages randomly. In our last example, it randomly chose 1% of the pages. Here’s what that looks like in Toronto.

See in particular how blotchy the data are in the suburban areas outside the circle. The data in the table are not randomly distributed to the pages, they came from the census data in order, and ended up loaded into the database in order. So for any given database page, the actual rows in the page will tend to be near to one another.

This works for this example because the amount of data is high, and the area we are summarizing is a large proportion of the total data – a seventh of the Canadian population lives in that circle.

If we were summarizing a smaller area, the results would not have been so good.

TABLESAMPLE SYSTEM is a powerful tool, but you have to be sure that any given page has a random selection of the data you are sampling for. Our random normal example worked perfectly, because the data were perfectly random. A sample of time series data would not work well for sampling time windows (the data were probably stored in order of arrival) but might work for sampling some other value.
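
The effect of storage order can be simulated with a toy model. This Python sketch assumes a made-up table of 100,000 rows in pages of 100 rows and a hypothetical query that counts rows in a narrow value window. It samples 5% of the pages many times and compares how much the scaled-up estimate varies when the rows are stored in order versus shuffled.

```python
import random
import statistics

ROWS, PAGE_SIZE, FRACTION, TRIALS = 100_000, 100, 0.05, 200

def page_sample_estimate(table, rng):
    """Pick whole pages at random, count rows in the window, scale up."""
    n_pages = len(table) // PAGE_SIZE
    chosen = rng.sample(range(n_pages), int(n_pages * FRACTION))
    hits = sum(
        1
        for page in chosen
        for value in table[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
        if 0.40 <= value < 0.42
    )
    return hits / FRACTION

rng = random.Random(7)
ordered = [i / ROWS for i in range(ROWS)]   # rows stored in value order
shuffled = ordered[:]
rng.shuffle(shuffled)                       # rows stored in random order

true_count = sum(1 for v in ordered if 0.40 <= v < 0.42)
ordered_spread = statistics.pstdev(
    [page_sample_estimate(ordered, rng) for _ in range(TRIALS)]
)
shuffled_spread = statistics.pstdev(
    [page_sample_estimate(shuffled, rng) for _ in range(TRIALS)]
)
```

With ordered storage the matching rows sit in a handful of pages, so an estimate depends heavily on whether those pages happen to be picked, and the spread of the estimates is much wider than with shuffled storage.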

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at November 21, 2025 01:00 PM

November 14, 2025

Crunchy Data (Snowflake)

PostGIS Performance: Intersection Predicates and Overlays

In this series, we talk about the many different ways you can speed up PostGIS. A common geospatial operation is to clip out a collection of smaller shapes that are contained within a larger shape. Today let's review the most efficient ways to query for things inside something else.

Frequently the smaller shapes are clipped where they cross the boundary, using the ST_Intersection function.

The naive SQL is a simple spatial join on ST_Intersects.

SELECT ST_Intersection(polygon.geom, p.geom) AS geom
  FROM parcels p
  JOIN polygon
    ON ST_Intersects(polygon.geom, p.geom);

When run on the small test area shown in the pictures, the query takes about 14ms. That’s fast, but the problem is small, and larger operations will be slower.

There is a simple way to speed up the query that takes advantage of the fact that boolean spatial predicates are faster than spatial overlay operations.

What?

“Boolean spatial predicates” are functions like ST_Intersects and ST_Contains. They take in two geometries and return “true” or “false” for whether the geometries pass the named test.
“Spatial overlay operations” are functions like ST_Intersection or ST_Difference that take in two geometries, and generate a new geometry based on the named rule.

Predicates are faster because their tests often allow for logical short circuits (once you find any two edges that intersect, you know the geometries intersect) and because they can make use of the prepared geometry optimizations to cache and index edges between function calls.

The speed-up for spatial overlay rests on the observation that, for most overlays, there is a large set of features that can be added to the result set unchanged – the features that are fully contained in the clipping shape. We can identify them using ST_Contains.

Similarly, there is a smaller set of features that cross the border, and thus do need to be clipped. These are features that pass ST_Intersects but do not pass ST_Contains.

The higher performance query uses the faster predicates to filter the smaller shapes into two streams, one for intersection, and one for unchanged inclusion.

SELECT
  CASE
    WHEN ST_Contains(polygon.geom, p.geom) THEN p.geom
    ELSE ST_Intersection(polygon.geom, p.geom)
    END AS geom
  FROM parcels p
  JOIN polygon
    ON ST_Intersects(polygon.geom, p.geom);

Two predicates are used here. The ST_Intersects in the join clause ensures that only parcels that might participate in the overlay are fed into the CASE statement, where the ST_Contains predicate no-ops the parcels that do not cross the boundary.

When run against our tiny example, the query executes in just 9ms. Amazing that the difference is large enough to measure on such a small example.

Using a CASE statement to combine predicates and overlays

The core idea here is to recognize that boolean spatial predicates like ST_Contains and ST_Intersects are computationally much faster than spatial overlay operations like ST_Intersection. The standard, but slow, approach clips all intersecting features. The optimized method uses a CASE statement and ST_Contains check to create a shortcut: if a smaller geometry is entirely contained within the larger clipping polygon, we return the geometry unchanged (a quick no-op) and completely bypass the slower ST_Intersection calculation.

You can apply this optimization pattern to any PostGIS work involving clipping, spatial joins, or overlays where you suspect a significant number of features might be fully contained within a boundary. By filtering and partitioning your geometries into "fully contained" (fast path) and "crossing the border" (slow path) streams, you ensure the expensive overlay operations are only executed when they are strictly necessary to clip the edges.
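
The partitioning logic can be sketched in a few lines of Python. To keep the example self-contained it uses axis-aligned rectangles as stand-ins for real geometries, so the "overlay" here is trivially cheap; the helper names and sample parcels are hypothetical. What it does show is the control flow: skip what does not intersect, pass through what is contained, and only run the overlay on what crosses the boundary.

```python
# Boxes are (xmin, ymin, xmax, ymax) tuples standing in for geometries.

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def intersection(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def clip_all(clipper, shapes):
    """Return the clipped shapes and how many overlay calls were needed."""
    clipped, overlay_calls = [], 0
    for shape in shapes:
        if not intersects(clipper, shape):
            continue                      # the join filter
        if contains(clipper, shape):
            clipped.append(shape)         # fast path: unchanged
        else:
            overlay_calls += 1            # slow path: real overlay
            clipped.append(intersection(clipper, shape))
    return clipped, overlay_calls

clipper = (0, 0, 10, 10)
parcels = [(1, 1, 2, 2), (8, 8, 12, 12), (20, 20, 21, 21), (4, 4, 6, 6), (-1, 3, 1, 5)]
result, overlay_calls = clip_all(clipper, parcels)
```

Of the four parcels that intersect the clipping box, only the two that cross its boundary reach the overlay step.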

Need more PostGIS?
Join us this year on November 20 for PostGIS Day 2025, a free, virtual, community event about open source geospatial!

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at November 14, 2025 01:00 PM

November 12, 2025

PostGIS Development

PostGIS 3.6.1

The PostGIS Team is pleased to publish PostGIS 3.6.1. This is a bug fix release that includes bug fixes since PostGIS 3.6.0.

This version requires PostgreSQL 12 - 18, Proj 6.1+, and GEOS 3.8+. To take advantage of all features, GEOS 3.12+ is needed.
SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 2.2+ is needed.

3.6.1

source download md5
NEWS
PDF docs: en
HTML Online en ja fr zh_Hans
Cheat Sheets:
- postgis: en ja fr zh_Hans
- postgis_raster: en ja fr zh_Hans
- postgis_topology: en ja fr zh_Hans
- postgis_sfcgal: en ja fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja fr zh_Hans

by Paul Ramsey at November 12, 2025 12:00 AM

November 06, 2025

Crunchy Data (Snowflake)

PostGIS Performance: Improve Bounding Boxes with Decompose and Subdivide

In the third installment of the PostGIS Performance series, I wanted to talk about performance around bounding boxes.

Geometry data is different from most column types you find in a relational database. The objects in a geometry column can be wildly different in the amount of the data domain they cover, and the amount of physical size they take up on disk.

The data in the “admin0” Natural Earth data range from the 1.2 hectare Vatican City, to the 1.6 billion hectare Russia, and from the 4 point polygon defining Serranilla Bank to the 68 thousand points of polygons defining Canada.

SELECT ST_NPoints(geom) AS npoints, name
FROM admin0
ORDER BY 1 DESC LIMIT 5;

SELECT ST_Area(geom::geography) AS area, name
FROM admin0
ORDER BY 1 DESC LIMIT 5;

As you can imagine, polygons this different will have different performance characteristics:

- Physically large objects will take longer to work with: to pull off the disk, to scan, and to calculate with.
- Geographically large objects will cover more other objects, and reduce the effectiveness of your indexes.

Your spatial indexes are “r-tree” indexes, where each object is represented by a bounding box.

The bounding boxes can overlap, and it is possible for some boxes to cover a lot of the dataset.

For example, here is the bounding box of France.

What?! How is that France? Well, France is more than just the European parts; it also includes the island of Reunion, in the southern Indian Ocean, and the island of Guadeloupe, in the Caribbean. Taken together they result in this very large bounding box.

Such a large box makes a poor addition to the spatial index of all the objects in “admin0”. I could be searching with a query key in the middle of the Atlantic, and the index would still be telling me “maybe it is in France?”.

For this testing, I have made a synthetic dataset of one million random points covering the whole world.

CREATE TABLE random_normal AS
  SELECT id,
    ST_Point(
      random_normal(0, 180),
      random_normal(0, 80),
      4326) AS geom
  FROM generate_series(0, 1000000) AS id;


CREATE INDEX random_normal_geom_x ON random_normal USING GIST (geom);

The un-altered bounds of “admin0”, the bounds that will be used to run the spatial join, look like this. Lots of overlap, lots of places where the bounds cover areas the polygons do not.

The baseline time to do a spatial join using the un-altered “admin0” data is 9 seconds.

SELECT Count(*), admin0.name
  FROM admin0 JOIN random_normal
    ON ST_Intersects(random_normal.geom, admin0.geom)
  GROUP BY admin0.name;

What if, instead of joining against the raw “admin0” – which includes weird cases like France and a Canada with hundreds of islands – we first decompose every object into the singular polygons that make it up, using ST_Dump?

The decomposed objects cover far less ocean, and much more accurately represent the polygons they are proxying for. And the time – including the cost of decomposing the objects – to do a full join on the 1M points falls to 3.8 seconds.

WITH polys AS  (
  SELECT (ST_Dump(geom)).geom AS geom, name
  FROM admin0
)
SELECT Count(*), polys.name
FROM polys JOIN random_normal
ON ST_Intersects(random_normal.geom, polys.geom)
GROUP BY polys.name;
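
A toy calculation shows why decomposing helps the index. The Python sketch below uses a made-up two-part "country" (a mainland square and a distant island); none of the coordinates come from the Natural Earth data. It compares the single bounding box of the whole object with the boxes of its parts.

```python
def bbox(points):
    """Bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def bbox_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def bbox_contains_point(box, point):
    return box[0] <= point[0] <= box[2] and box[1] <= point[1] <= box[3]

# Hypothetical multipolygon: a mainland and a far-away island.
mainland = [(0, 0), (10, 0), (10, 10), (0, 10)]
island = [(100, -50), (102, -50), (102, -48), (100, -48)]

whole_box = bbox(mainland + island)
part_boxes = [bbox(mainland), bbox(island)]

whole_area = bbox_area(whole_box)
parts_area = sum(bbox_area(b) for b in part_boxes)

# A query point in the open ocean between the two parts.
ocean_point = (50, -20)
false_hit_whole = bbox_contains_point(whole_box, ocean_point)
false_hit_parts = any(bbox_contains_point(b, ocean_point) for b in part_boxes)
```

The single box answers "maybe" for the ocean point and forces an exact test, while the per-part boxes rule it out immediately.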

There is still a lot of ocean being queried here, and also some of the polygons are not just very spatially large, but include a lot of vertices. What if we make the polygons smaller yet by chopping them up with ST_Subdivide?

These bounds are almost perfect: they cover very little of the ocean, and they have also reduced the maximum size of any polygon to no more than 128 vertices, the limit passed to ST_Subdivide. And the performance, even including the very expensive subdivision step, gets faster yet.

WITH polys AS (
  SELECT ST_Subdivide(geom,128) AS geom, name FROM admin0
)
SELECT Count(*), polys.name
FROM polys JOIN random_normal
ON ST_Intersects(random_normal.geom, polys.geom)
GROUP BY polys.name;

The final query takes just 1.8 seconds, twice as fast as the decomposed polygons, and five times as fast as the naive spatial join. For smaller collections of points, the naive approach can work as fast as the subdivision, but for this 1M point test set the overhead of doing the subdivision is still far less than the gains from using the more effective bounds.

Investing computation into creating better, smaller, and simpler geometries pays off significantly for large datasets by making the spatial index much more effective.

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at November 06, 2025 01:00 PM

October 27, 2025

Auchindown

Trigger Happy: Live edits in QGIS

QGIS and PostgreSQL working well together

by Rhys at October 27, 2025 10:00 AM

October 20, 2025

Crunchy Data (Snowflake)

PostGIS Performance: pg_stat_statements and Postgres tuning

In this series, we talk about the many different ways you can speed up PostGIS. Today let’s talk about looking across the queries with pg_stat_statements and some basic tuning.

Showing Postgres query times with pg_stat_statements

A reasonable question to ask, if you are managing a system with variable performance is: “what queries on my system are running slowly?”

Fortunately, PostgreSQL includes an extension called “pg_stat_statements” that tracks query performance over time and maintains a list of high cost queries.

CREATE EXTENSION pg_stat_statements;

Now you will have to leave your database running for a while, so the extension can gather up data about the kind of queries that are run on your database. (The module also has to be listed in “shared_preload_libraries” before it will collect anything, and changing that setting requires a server restart.)

Once it has been running for a while, you have a whole view, pg_stat_statements, that collects your query statistics. You can query it directly with SELECT * or you can write individual queries to find the slowest queries, the longest running ones, and so on.

Here is an example of the 10 longest running queries, ranked by mean execution time.

SELECT
  total_exec_time,
  mean_exec_time,
  calls,
  rows,
  query
FROM pg_stat_statements
WHERE calls > 0
ORDER BY mean_exec_time DESC
LIMIT 10;

“pg_stat_statements” is good at finding individual queries to tune, and the most frequent cause of slow queries is just inefficient SQL or a need for indexing (see the first post in the series).

Occasionally performance issues do crop up at the system level. The most frequent culprit is memory pressure. PostgreSQL ships with conservative default settings for memory usage, and some workloads benefit from more memory.

Shared buffers

A database server looks like an infinite, accessible, reliable bucket of data. In fact, the server orchestrates data between the disk – which is permanent and slow – and the random access memory – which is volatile and fast – in order to provide the illusion of such a system.

When the balance between slow storage and fast memory is out of whack, system performance falls. When the data a query wants to read are present in the fast memory, that is a “cache hit”. When they are not, the read continues on to the slow disk, and that is a “cache miss”.

You can check the balance of your system by looking at the “cache hit ratio”.

SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit)  as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) +  sum(heap_blks_read)) as ratio
FROM
  pg_statio_user_tables;

A result around 99% (a ratio of 0.99) is a good sign. Below 90% means that your database could be memory constrained, so increasing the “shared_buffers” parameter may help. As a general rule, “shared_buffers” should be about 25% of physical RAM.
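
The arithmetic behind the ratio is simple enough to spell out. This Python sketch takes the two counters as plain numbers (the values are invented) and also guards against the case where no blocks have been read at all, for example right after statistics are reset.

```python
def cache_hit_ratio(heap_blks_hit, heap_blks_read):
    """Fraction of block requests served from memory, or None if
    there have been no requests yet."""
    total = heap_blks_hit + heap_blks_read
    if total == 0:
        return None
    return heap_blks_hit / total

# Hypothetical counters: 990 blocks found in memory, 10 read from disk.
ratio = cache_hit_ratio(990, 10)
empty = cache_hit_ratio(0, 0)
```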

Working memory

Working memory is controlled by the “work_mem” parameter, which sets how much memory is available to in-memory sorts, hashes, and other short term query operations. Unlike the “shared buffers”, which are permanent and fully allocated on startup, the “working memory” is allocated on an as-needed basis.

However, the working memory limit is applied to each operation in each database connection, so it is possible for the total working memory to radically exceed the “work_mem” value. If 1000 connections each allocate 100MB, your server will probably run out of memory.

You can speed up known memory-hungry queries, like large sorts and aggregations, by temporarily increasing the working memory available to your particular connection, then reducing it when the query is complete.

SET work_mem = '2GB';
-- run the memory-hungry query here
SET work_mem = '100MB';

Maintenance tasks, like building spatial indexes or running the “VACUUM” command, draw on “maintenance_work_mem” rather than “work_mem”, but the same principle holds. You can speed up the maintenance of a large table by increasing the “maintenance_work_mem” temporarily.

SET maintenance_work_mem = '2GB';
CREATE INDEX roads_geom_x ON roads USING GIST (geom);
VACUUM roads;
SET maintenance_work_mem = '128MB';

Parallelism

It is common for modern database servers to have multiple CPU cores available, but your PostgreSQL configuration may not be tuned to use them all. Postgres does have parallel query support. PostgreSQL is conservative about making use of multiple cores, because executing and coordinating multi-process queries has overheads, but in general large aggregations or scans can frequently make effective use of two to four cores at once.

Check what limits are set on your database.

SHOW max_worker_processes;

SHOW max_parallel_workers;

Setting the maximums to the number of cores on your server is good practice. In particular, don’t be afraid to reduce the number of workers if you have fewer cores – there is no benefit to be had in workers contending for cores.

Tuning Postgres basics

To wrap up:

- Check the slowest queries with pg_stat_statements.
- Use EXPLAIN and indexing to experiment with improvements.
- Check system-level settings by looking at:
  - shared buffers
  - working memory (work_mem)
  - parallelism

After you do some tuning, don’t forget to reset pg_stat_statements and check again to see if/how things have improved!

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at October 20, 2025 01:00 PM

October 16, 2025

PostGIS Development

PostGIS 3.5.4

The PostGIS Team is pleased to release PostGIS 3.5.4.

This version requires PostgreSQL 12 - 18beta1, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5+ is needed.

3.5.4

This release is a bug fix release that includes bug fixes since PostGIS 3.5.3.

by Paul Ramsey at October 16, 2025 12:00 AM

October 10, 2025

Crunchy Data (Snowflake)

PostGIS Performance: Indexing and EXPLAIN

I am kicking off a short blog series on PostGIS performance fundamentals. For this first example, we will cover fundamental indexing.

We will explore performance using the Natural Earth “admin0” (countries) data (258 polygons) and their “populated places” (7342 points).

A classic spatial query is the “spatial join”, finding the relationships between objects using a spatial condition.

“How many populated places are there within each country?”

SELECT Count(*), a.name
FROM admin0 a
JOIN popplaces p
  ON ST_Intersects(a.geom, p.geom)
GROUP BY a.name ORDER BY 1 DESC;

This returns an answer, but it takes 2200 milliseconds! For two such small tables, that seems like a long time. Why?

The first stop in any performance evaluation should be the “EXPLAIN” command, which returns a detailed explanation of how the query is executed by the database.

EXPLAIN SELECT Count(*), a.name
FROM admin0 a
JOIN popplaces p
  ON ST_Intersects(a.geom, p.geom)
GROUP BY a.name;

Explain output looks complicated, but a good practice is to start from the middle (the most deeply nested) and work your way out.

                              QUERY PLAN
-------------------------------------------------------------------------
 GroupAggregate  (cost=23702129.78..23702145.38 rows=258 width=18)
   Group Key: a.name
   ->  Sort  (cost=23702129.78..23702134.12 rows=1737 width=10)
         Sort Key: a.name
         ->  Nested Loop  (cost=0.00..23702036.30 rows=1737 width=10)
               Join Filter: st_intersects(a.geom, p.geom)
               ->  Seq Scan on admin0 a  (cost=0.00..98.58 rows=258 width=34320)
               ->  Materialize  (cost=0.00..328.13 rows=7342 width=32)
                     ->  Seq Scan on popplaces p  (cost=0.00..291.42 rows=7342 width=32)

The query plan includes an estimated start-up cost and total cost for each step in the plan. Steps with a large total cost are potential bottlenecks. Our bottleneck is in the “nested loop” join, which is performing the spatial join.

For each geometry in the admin0 table:
- Check every geometry in the popplaces table
  - If it passes the join filter, keep it in the join

This pattern of checking every potential intersection is a lot of work, even for our small tables. For 258 countries and 7342 places, it runs 1.9 million intersection tests!

Just as for a non-spatial join, the key to making this query efficient is adding an index. In this case, an index on the populated places geometry.

CREATE INDEX popplaces_geom_x ON popplaces USING GIST (geom);

The PostGIS spatial index is implemented as an “r-tree” using the GIST “access method”. The “r-tree” index algorithm is auto-tuning, so you do not need to fiddle with parameters to get the best index for your data.

Important! Do not forget to specify the GIST access method with the USING GIST keywords in your index creation. If you leave them out, you will build a default PostgreSQL b-tree index instead, and that will provide no speed-up at all for your spatial join.

Running with the index in place, the SQL is exactly the same.

SELECT Count(*), a.name
FROM admin0 a
JOIN popplaces p
  ON ST_Intersects(a.geom, p.geom)
GROUP BY a.name;

But now it takes 200 milliseconds, 10 times faster! Why? The SQL has not changed, but thanks to the index, the query plan has changed.

                                QUERY PLAN
-----------------------------------------------------------------------------
 HashAggregate  (cost=4185.41..4187.99 rows=258 width=18)
   Group Key: a.name
   ->  Nested Loop  (cost=0.15..4176.73 rows=1737 width=10)
         ->  Seq Scan on admin0 a  (cost=0.00..98.58 rows=258 width=34320)
         ->  Index Scan using popplaces_geom_x on popplaces p  (cost=0.15..15.80 rows=1 width=32)
               Index Cond: (geom && a.geom)
               Filter: st_intersects(a.geom, geom)

The join is still a nested loop on the admin0 geometry, but instead of a sequential scan on the populated places, costing as much as 300, the inner loop is an index scan, costing only 15. That is as much as 20 times cheaper, resulting in our overall 10 times faster query time.
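
The two-stage pattern in the plan (a cheap index condition on bounding boxes, then the exact filter) can be imitated in Python. This is only a sketch under loud assumptions: a uniform grid stands in for the r-tree, circles stand in for country polygons, and all sizes and names are invented. It counts how many exact tests each approach runs and confirms both find the same matches.

```python
import random
from collections import defaultdict

CELL = 10.0  # grid cell size; an arbitrary choice for this sketch

def build_grid(points):
    """Bucket points by grid cell (a crude stand-in for a spatial index)."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // CELL), int(y // CELL))].append((x, y))
    return grid

def candidates(grid, box):
    """Yield points in every cell the bounding box touches."""
    xmin, ymin, xmax, ymax = box
    for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            yield from grid.get((cx, cy), ())

def in_circle(point, circle):
    """The 'exact' test, standing in for ST_Intersects."""
    (x, y), (cx, cy, r) = point, circle
    return (x - cx) ** 2 + (y - cy) ** 2 <= r * r

rng = random.Random(1)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(5000)]
circles = [(rng.uniform(10, 90), rng.uniform(10, 90), 5.0) for _ in range(20)]
grid = build_grid(points)

naive_tests = indexed_tests = 0
naive_hits = indexed_hits = 0
for circle in circles:
    cx, cy, r = circle
    for p in points:                       # nested loop: test everything
        naive_tests += 1
        naive_hits += in_circle(p, circle)
    box = (cx - r, cy - r, cx + r, cy + r)
    for p in candidates(grid, box):        # index: test only nearby points
        indexed_tests += 1
        indexed_hits += in_circle(p, circle)
```

The nested loop runs one exact test per pair, while the grid lookup only tests points that share a cell with the circle's bounding box.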

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at October 10, 2025 02:00 PM

September 02, 2025

PostGIS Development

PostGIS 3.6.0

The PostGIS Team is pleased to release PostGIS 3.6.0! Best Served with PostgreSQL 18 Beta3 and recently released GEOS 3.14.0.

This version requires PostgreSQL 12 - 18beta3, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.14+ is needed. To take advantage of all SFCGAL features, SFCGAL 2.2.0+ is needed.

3.6.0

source download md5
NEWS
HTML Online en ja sv fr zh_Hans
PDF docs: en ja, sv, zh_Hans, fr
Cheat Sheets:
- postgis: en ja sv fr zh_Hans
- postgis_raster: en ja sv fr zh_Hans
- postgis_topology: en ja sv fr zh_Hans
- postgis_sfcgal: en ja sv fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja sv fr zh_Hans

This release includes bug fixes since PostGIS 3.5.3 and new features.

by Regina Obe at September 02, 2025 12:00 AM

August 25, 2025

PostGIS Development

PostGIS 3.6.0rc2

The PostGIS Team is pleased to release PostGIS 3.6.0rc2! Best Served with PostgreSQL 18 Beta3 and recently released GEOS 3.14.0.

3.6.0rc2

source download md5
NEWS
HTML Online en ja sv fr zh_Hans
PDF docs: en ja, sv, zh_Hans, fr
Cheat Sheets:
- postgis: en ja sv fr zh_Hans
- postgis_raster: en ja sv fr zh_Hans
- postgis_topology: en ja sv fr zh_Hans
- postgis_sfcgal: en ja sv fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja sv fr zh_Hans

This release is a release candidate of a major release, it includes bug fixes since PostGIS 3.5.3 and new features.

by Regina Obe at August 25, 2025 12:00 AM

August 18, 2025

PostGIS Development

PostGIS 3.6.0rc1

The PostGIS Team is pleased to release PostGIS 3.6.0rc1! Best Served with PostgreSQL 18 Beta3 and soon to be released GEOS 3.14.

3.6.0rc1

source download md5
NEWS
HTML Online en ja sv fr zh_Hans
PDF docs: en ja, sv, zh_Hans, fr
Cheat Sheets:
- postgis: en ja sv fr zh_Hans
- postgis_raster: en ja sv fr zh_Hans
- postgis_topology: en ja sv fr zh_Hans
- postgis_sfcgal: en ja sv fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja sv fr zh_Hans

This release is a release candidate of a major release, it includes bug fixes since PostGIS 3.5.3 and new features.

by Regina Obe at August 18, 2025 12:00 AM

August 01, 2025

Auchindown

The Beauty of Extensibility

You don't always have to wait for new functionality, you can sometimes do it yourself

by Rhys at August 01, 2025 05:00 AM

July 20, 2025

PostGIS Development

PostGIS 3.6.0beta1

The PostGIS Team is pleased to release PostGIS 3.6.0beta1! Best Served with PostgreSQL 18 Beta2 and soon to be released GEOS 3.14.

This version requires PostgreSQL 12 - 18beta2, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.14+ is needed. To take advantage of all SFCGAL features, SFCGAL 2.2.0+ is needed.

3.6.0beta1

source download md5
NEWS
HTML Online en ja sv fr zh_Hans
PDF docs: en ja, sv, zh_Hans, fr
Cheat Sheets:
- postgis: en ja sv fr zh_Hans
- postgis_raster: en ja sv fr zh_Hans
- postgis_topology: en ja sv fr zh_Hans
- postgis_sfcgal: en ja sv fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja sv fr zh_Hans

This release is a beta of a major release, it includes bug fixes since PostGIS 3.5.3 and new features.

by Regina Obe at July 20, 2025 12:00 AM

May 18, 2025

PostGIS Development

PostGIS 3.6.0alpha1

The PostGIS Team is pleased to release PostGIS 3.6.0alpha1! Best Served with PostgreSQL 18 Beta1 and GEOS 3.13.1.

This version requires PostgreSQL 12 - 18beta1, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 2.1.0+ is needed.

3.6.0alpha1

source download md5
NEWS
HTML Online en ja fr zh_Hans
PDF docs: en ja, zh_Hans, fr
Cheat Sheets:
- postgis: en ja fr zh_Hans
- postgis_raster: en ja fr zh_Hans
- postgis_topology: en ja fr zh_Hans
- postgis_sfcgal: en ja fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja fr zh_Hans

This release is an alpha of a major release, it includes bug fixes since PostGIS 3.5.3 and new features.

by Regina Obe at May 18, 2025 12:00 AM

May 17, 2025

PostGIS Development

PostGIS 3.5.3

The PostGIS Team is pleased to release PostGIS 3.5.3.

3.5.3

source download md5
NEWS
PDF docs: en
HTML Online en ja fr zh_Hans
Cheat Sheets:
- postgis: en ja fr zh_Hans
- postgis_raster: en ja fr zh_Hans
- postgis_topology: en ja fr zh_Hans
- postgis_sfcgal: en ja fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja fr zh_Hans

This release is a bug fix release that includes bug fixes since PostGIS 3.5.2.

by Regina Obe at May 17, 2025 12:00 AM

March 14, 2025

Crunchy Data (Snowflake)

Pi Day PostGIS Circles

What's your favourite infinite sequence of non-repeating digits? There are some people who make a case for e, but to my mind nothing beats the transcendental and curvy utility of π, the ratio of a circle's circumference to its diameter.

Drawing circles is a simple thing to do in PostGIS -- take a point, and buffer it. The result is circular, and we can calculate an estimate of pi just by measuring the perimeter of the unit circle.

SELECT ST_Buffer('POINT(0 0)', 1.0);

buffer default PostGIS

Except, look a little more closely -- this "circle" seems to be made up of short straight lines. What is the ratio of its circumference to its diameter?

SELECT ST_Perimeter(ST_Buffer('POINT(0 0)', 1.0)) / 2;

3.1365484905459406

That's close to pi, but it's not pi. Can we generate a better approximation? What if we make the edges even shorter? The third parameter to ST_Buffer() is the "quadsegs", the number of segments to build each quadrant of the circle.

SELECT ST_Perimeter(ST_Buffer('POINT(0 0)', 1.0, quadsegs => 128)) / 2;

3.1415729403671087

Much closer!
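Both numbers can be reproduced without PostGIS, assuming the buffer vertices sit exactly on the unit circle, so that the buffer is a regular polygon with four times "quadsegs" sides. Half the perimeter of a regular n-gon inscribed in a unit circle is n * sin(pi / n). The function name below is made up for this sketch.

```python
import math

def buffer_pi_estimate(quad_segs):
    """Half the perimeter of a regular polygon inscribed in the unit circle,
    with quad_segs edges per quarter circle (assumed buffer shape)."""
    n = 4 * quad_segs
    return n * math.sin(math.pi / n)

print(buffer_pi_estimate(8))    # default of 8 segments per quadrant
print(buffer_pi_estimate(128))
print(math.pi)
```

The error shrinks roughly with the square of the segment count, so each doubling of "quadsegs" buys about a quarter of the remaining error, and never reaches zero.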

We can crank this process up a lot more, keep adding edges, but at some point the process becomes silly. We should just be able to say "this edge is a portion of a circle, not a straight line", and get an actual circular arc.

Good news, we can do exactly that! The CIRCULARSTRING is the curvy analogue to a LINESTRING wherein every connection is between three points that define a portion of a circle.

circular arc

The circular arc above is the arc that starts at A and ends at C, passing through B. Any three points define a unique circular arc. A CIRCULARSTRING is a connected sequence of these arcs, just as a LINESTRING is a connected sequence of linear edges.
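To see why three points are enough, here is a small Python sketch that recovers the circle an arc lies on from its start, middle and end points. It uses the textbook circumcircle formula, assumes the three points are distinct and not collinear, and is not taken from the PostGIS source. The function name is made up.

```python
import math

def circle_through(a, b, c):
    """Centre and radius of the circle through three non-collinear points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear, no unique circle")
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    radius = math.hypot(ax - ux, ay - uy)
    return (ux, uy), radius

# The upper half of the unit circle: start, middle, end.
centre, radius = circle_through((1, 0), (0, 1), (-1, 0))
print(centre, radius)
```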

How does this help us get to pi though? Well, PostGIS has a moderate amount of support for circular arc geometry, so if we construct a circle using "natively curved" objects, we should get an exact representation of a circle rather than an approximation.

circle

So, what is an arc that starts and ends at the same point? A circle! This is the unit circle -- a circle of radius one centered on the origin -- expressed as a CIRCULARSTRING.

SELECT ST_Length('CIRCULARSTRING(1 0, -1 0, 1 0)') / 2;

3.141592653589793

That looks a lot like pi!

Let's bust out the built-in pi() function from PostgreSQL and check to be sure.

SELECT pi() - ST_Length('CIRCULARSTRING(1 0, -1 0, 1 0)') / 2;

Yep, a perfect π to celebrate "Pi Day" with!

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at March 14, 2025 02:00 PM

February 10, 2025

Paul Ramsey

The Early History of Spatial Databases and PostGIS

For PostGIS Day this year I researched a little into one of my favourite topics, the history of relational databases. I feel like in general we do not pay a lot of attention to history in software development. To quote Yoda, “All his life has he looked away… to the future, to the horizon. Never his mind on where he was. Hmm? What he was doing.”

Anyways, this year I took on the topic of the early history of spatial databases in particular. There was a lot going on in the ’90s in the field, and in many ways PostGIS was a late entrant, even though it gobbled up a lot of the user base eventually.

February 10, 2025 04:00 PM

February 07, 2025

Crunchy Data (Snowflake)

Using Cloud Rasters with PostGIS

With the postgis_raster extension, it is possible to access gigabytes of raster data from the cloud, without ever downloading the data.

How? The venerable postgis_raster extension (released 13 years ago) already has the critical core support built-in!

Rasters can be stored inside the database, or outside the database, on a local file system or anywhere it can be accessed by the underlying GDAL raster support library. The storage options include S3, Azure, Google, Alibaba, and any HTTP server that supports RANGE requests.

As long as the rasters are in the cloud optimized GeoTIFF (aka "COG") format, the network access to the data will be optimized and provide access performance limited mostly by the speed of connection between your database server and the cloud storage.

TL;DR It Works

Prepare the Database

Set up a database named raster with the postgis and postgis_raster extensions.

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;

ALTER DATABASE raster
  SET postgis.gdal_enabled_drivers TO 'GTiff';

ALTER DATABASE raster
  SET postgis.enable_outdb_rasters TO true;

Investigate The Data

COG is still a new format for public agencies, so finding a large public example can be tricky. Here is a 56GB COG of medium resolution (30m) elevation data for Canada. Don't try and download it, it's 56GB!

MrDEM for Canada

You can see some metadata about the file using the gdalinfo utility to read the headers.

gdalinfo /vsicurl/https://datacube-prod-data-public.s3.amazonaws.com/store/elevation/mrdem/mrdem-30/mrdem-30-dsm.tif

Note that we prefix the URL to the image with /vsicurl/ to tell GDAL to use virtual file system access rather than direct download.

There is a lot of metadata!

Metadata from gdalinfo

Driver: GTiff/GeoTIFF
Files: /vsicurl/https://datacube-prod-data-public.s3.amazonaws.com/store/elevation/mrdem/mrdem-30/mrdem-30-dsm.tif
Size is 183687, 159655
Coordinate System is:
PROJCRS["NAD83(CSRS) / Canada Atlas Lambert",
    BASEGEOGCRS["NAD83(CSRS)",
        DATUM["NAD83 Canadian Spatial Reference System",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4617]],
    CONVERSION["Canada Atlas Lambert",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-95,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",77,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Transformation of coordinates at 5m level of accuracy."],
        AREA["Canada - onshore and offshore - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon."],
        BBOX[38.21,-141.01,86.46,-40.73]],
    ID["EPSG",3979]]
Data axis to CRS axis mapping: 1,2
Origin = (-2454000.000000000000000,3887400.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
  TIFFTAG_DATETIME=2024:05:08 12:00:00
  AREA_OR_POINT=Area
Image Structure Metadata:
  LAYOUT=COG
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-2454000.000, 3887400.000) (175d38'57.51"W, 68d 7'32.00"N)
Lower Left  (-2454000.000, -902250.000) (121d27'11.17"W, 36d35'36.71"N)
Upper Right ( 3056610.000, 3887400.000) ( 10d43'16.37"W, 62d45'36.29"N)
Lower Right ( 3056610.000, -902250.000) ( 63d 0'39.68"W, 34d21' 6.31"N)
Center      (  301305.000, 1492575.000) ( 88d57'23.39"W, 62d31'56.78"N)
Band 1 Block=512x512 Type=Float32, ColorInterp=Gray
  NoData Value=-32767
  Overviews: 91843x79827, 45921x39913, 22960x19956, 11480x9978, 5740x4989, 2870x2494, 1435x1247, 717x623, 358x311

The key things we need to take from the metadata are that:

the spatial reference system is "NAD83(CSRS) / Canada Atlas Lambert", "EPSG:3979"; and,
the blocking (tiling) is 512x512 pixels.

Load The Database Table

With this metadata in hand, we are ready to load a reference to the remote data into our database, using the raster2pgsql utility that comes with PostGIS.

./raster2pgsql \
  -R \
  -k \
  -s 3979 \
  -t 512x512 \
  -Y 1000 \
  /vsicurl/https://datacube-prod-data-public.s3.amazonaws.com/store/elevation/mrdem/mrdem-30/mrdem-30-dsm.tif \
  mrdem30 \
  | psql raster

That is a lot of flags! What do they mean?

-R means store references, so the pixel data is not copied into the database.
-k means do not skip tiles that are all NODATA values. While it would be nice to skip NODATA tiles, doing so involves reading all the pixel data, which is exactly what we are trying to avoid.
-s 3979 means that the projection of our data is EPSG:3979, the value we got from the metadata.
-t 512x512 means to create tiles with 512x512 pixels, so that the blocking of the tiles in our database matches the blocking of the remote file. This should help lower the number of network reads any given data request requires.
-Y 1000 means to use COPY mode when writing out the tile definitions, and to write out batches of 1000 rows in each COPY block.
Then the URL to the cloud GeoTIFF we are referencing, with /vsicurl/ at the front to indicate using the "curl virtual file system".
Then the table name (mrdem30) we want to use in the database.
Finally we pipe the result of the command (which is just SQL text) to psql to load it into the raster database.

When we are done, we have a table of raster tiles that looks like this in the database.

                            Table "public.mrdem30"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+--------------------------------------
 rid    | integer |           | not null | nextval('mrdem30_rid_seq'::regclass)
 rast   | raster  |           |          |
Indexes:
    "mrdem30_pkey" PRIMARY KEY, btree (rid)

We should add a geometry index on the raster column, specifically on the bounds of each tile.

CREATE INDEX mrdem30_st_convexhull_idx
  ON mrdem30 USING GIST (ST_ConvexHull(rast));

This index will speed up the raster tile lookup needed when we are spatially querying.

Query The Data

The single MrDEM GeoTIFF data set is now represented in the database as a table of raster tiles.

SELECT count(*) FROM mrdem30;

There are 112008 tiles in the collection.

Each tile is pretty big, spatially (512 pixels on a side, 30 meters per pixel, means a 15km tile).
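The tile count and tile size follow directly from the gdalinfo numbers. This quick Python check assumes one tile per 512x512 block, with partial blocks at the right and bottom edges still counting as a tile.

```python
import math

# Values from the gdalinfo output
width_px, height_px = 183687, 159655
tile_px = 512      # block size, also the -t flag
pixel_m = 30       # pixel size in metres

tiles_across = math.ceil(width_px / tile_px)   # 359
tiles_down = math.ceil(height_px / tile_px)    # 312
tile_count = tiles_across * tiles_down         # 112008
tile_side_m = tile_px * pixel_m                # 15360, about 15km

print(tile_count, tile_side_m)
```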

MrDEM Tiles

Each tile knows what file it references, where it is on the globe and what projection it is in.

SELECT (ST_BandMetadata(rast)).*
  FROM mrdem30 OFFSET 50000 LIMIT 1;

pixeltype     | 32BF
nodatavalue   | -32767
isoutdb       | t
path          | /vsicurl/https://datacube-prod-data-public.s3.amazonaws.com/store/elevation/mrdem/mrdem-30/mrdem-30-dsm.tif
outdbbandnum  | 1
filesize      | 59659542216
filetimestamp | 1718629812

The ST_ConvexHull() function can be used to get a polygon geometry of the raster bounds.

SELECT ST_AsText(ST_ConvexHull(rast))
  FROM mrdem30 OFFSET 50000 LIMIT 1;

POLYGON((-2054640 -367320,-2039280 -367320,-2039280 -382680,-2054640 -382680,-2054640 -367320))

Just like geometries, raster tiles have a spatial reference id associated with them, in this case a projection that makes sense for a Canada-wide raster.

SELECT ST_SRID(rast)
  FROM mrdem30 OFFSET 50000 LIMIT 1;

Query Elevation

So how do we get an elevation value from this collection of reference tiles? Easy! For any given point, we pull the tile that point falls inside, and then read off the elevation at that point.

-- Make point for Toronto
-- Transform to raster coordinate system
WITH pt AS (
  SELECT ST_Transform(
    ST_Point(-79.3832, 43.6532, 4326),
    3979) AS toronto
)
-- Find the raster tile of interest,
-- and read the value of band one (there is only one band)
-- at that point.
SELECT
  ST_Value(rast, 1, toronto, resample => 'bilinear') AS elevation,
  toronto AS geom
FROM
  mrdem30, pt
WHERE ST_Intersects(ST_ConvexHull(rast), toronto);

Note that we are using "bilinear interpolation" in ST_Value(), so if our point falls between pixel values, the value we get is interpolated in between the pixel values.
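For anyone who has not met it before, bilinear interpolation blends the four nearest pixel values, weighted by how close the point is to each. Below is a minimal Python sketch of the textbook formula. The pixel values are made up, and the exact pixel-centre conventions used inside PostGIS may differ.

```python
def bilinear(q00, q10, q01, q11, fx, fy):
    """Blend four neighbouring pixel values.

    q00, q10 are the top-left and top-right values, q01, q11 the
    bottom-left and bottom-right. fx and fy are the fractional
    position (0..1) of the point between them.
    """
    top = q00 * (1 - fx) + q10 * fx
    bottom = q01 * (1 - fx) + q11 * fx
    return top * (1 - fy) + bottom * fy

# Hypothetical elevations in metres, point exactly in the middle.
print(bilinear(100.0, 110.0, 120.0, 130.0, 0.5, 0.5))  # 115.0
```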

Query a Larger Geometry

What about something bigger? How about the flight line of a plane going from Victoria (YYJ) to Calgary (YYC) over the Rocky Mountains?

Generate the points
Make a flight route to join them
Transform that route into the coordinate system of the raster
Pull all the rasters that touch the line and merge them into one giant raster in memory
Copy the values off the raster into the Z coordinate of the line
Dump the line into points to make a pretty picture

-- Create start and end points of route
-- YYJ = Victoria, YYC = Calgary
CREATE TABLE flight AS
WITH
end_pts AS (
    SELECT ST_Point(-123.3656, 48.4284, 4326) AS yyj,
           ST_Point(-114.0719, 51.0447, 4326) AS yyc
),
-- Construct line and add vertex every 10KM along great circle
-- Reproject to coordinate system of rasters
ln AS (
    SELECT ST_Transform(ST_Segmentize(
        ST_MakeLine(end_pts.yyj, end_pts.yyc)::geography,
        10000)::geometry, 3979) AS geom
    FROM end_pts
),
rast AS (
    SELECT ST_Union(rast) AS r
    FROM mrdem30, ln
    WHERE ST_Intersects(ST_ConvexHull(rast), ln.geom)
),
-- Add Z values to that line
zln AS (
    SELECT ST_SetZ(rast.r, ln.geom) AS geom
    FROM rast, ln
),
-- Dump the points of the line for the graph
zpts AS (
    SELECT (ST_DumpPoints(geom)).*
    FROM zln
)
SELECT geom, ST_Z(geom) AS elevation
FROM zpts;

From the elevated points, we can make a map showing the flight line, and the elevations along the way.

Elevation Profile

Why does it work?

How is it possible to read the values off of a 56GB GeoTIFF file without ever downloading the file?

Cloud Optimized GeoTIFF

The difference between a "cloud GeoTIFF" and a "local GeoTIFF" is mostly a difference in how software accesses the data.

A local GeoTIFF probably resides on an SSD or some other storage that has fast random access. Small random reads will be fast, and so will large sequential reads. Local access is fast!
A cloud GeoTIFF resides on an "object store", a remote API that allows clients to read all of a file (with an HTTP "GET") or part of a file (with an HTTP "RANGE"). Each random read is quite slow, because the read involves setting up an HTTP connection (slow) and then transmitting the data over an internetwork (slow). The more reads you do, the worse performance gets. So the core goal of a "cloud format" is to reduce the number of reads required to access a subset of the data.
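A partial read is just an ordinary HTTP GET with a Range header naming the first and last byte wanted. This Python sketch builds that header. The offset and length are invented for illustration, since a real reader would get them from the tile index in the TIFF header.

```python
def range_header(offset, length):
    """HTTP header asking for `length` bytes starting at byte `offset`.

    Byte ranges are inclusive at both ends, hence the minus one.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}

# Hypothetical tile: 64KB of data starting 1MB into the file.
print(range_header(1_048_576, 65_536))
```

One tile means one request, so a format that keeps nearby pixels in the same tile keeps the request count down.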

Reading multi-gigabyte raster files from object storage is a relatively new idea, formalized only a couple years ago in the cloud optimized GeoTIFF (aka COG) specification.

The "cloud optimization" takes the form of just a few restrictions on the ordinary GeoTIFF:

Pixel data are tiled
Overviews are also tiled

Forcing tiling means that pixels that are near each other in space are also near each other in the file. Pixels that are near each other in the file can be read in a single read, which is great when you are reading from cloud object storage.

(Another "cloud format" shaking up the industry is Parquet, and Crunchy Data Warehouse can do direct access and query on Parquet for precisely the same reasons that postgis_raster can query COG files -- the format is structured to reduce the number of reads needed to carry out common queries.)

GDAL Virtual File Systems

While a "cloud optimized" format like COG or GeoParquet is cool, it is not going to be a useful cloud format without a client library that knows how to efficiently read the file. The client needs to be native to the application, and it needs to be parsimonious in the number of file accesses it makes.

For a web application, that means that COG access requires a JavaScript library that understands the GeoTIFF format.

For a database written in C, like PostgreSQL/PostGIS, that means that access requires a C/C++ library that understands GeoTIFF and abstracts file system operations, so that the GeoTIFF reader can support both local file system access and remote cloud access.

For PostGIS raster, that library is GDAL. Every build of postgis_raster is linked to GDAL and allows us to take advantage of the library capabilities.

GDAL allows direct access to COG files on remote cloud storage services.

Any HTTP server that supports Range requests
AWS S3
Google Cloud Storage
Azure Blob Storage
and others!

The specific cloud service support allows things like access keys to be used for reading private objects. There is more information about accessing secure buckets with PostGIS raster in this blog post.

Under the covers GDAL not only reads COG format files, it also maintains a modest in-memory data cache. This means there's a performance premium to making sure your raster queries are spatially coherent (each query point is near the previous one) because this maximizes the use of cached data.

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at February 07, 2025 03:30 PM

February 03, 2025

Paul Ramsey

WKB EMPTY

I have been watching the codification of spatial data types into GeoParquet and now GeoIceberg with some interest, since the work is near and dear to my heart.

Writing a disk serialization for PostGIS is basically an act of format standardization – albeit a standard with only one consumer – and many of the same issues that the Parquet and Iceberg implementations are thinking about are ones I dealt with too.

Here is an easy one: if you are going to use well-known binary for your serialization (as GeoPackage and GeoParquet do) you have to wrestle with the fact that the ISO/OGC standard for WKB does not describe a standard way to represent empty geometries.

Empty

Empty geometries come up frequently in the OGC/ISO standards, and they are simple to generate in real operations – just subtract a big thing from a small thing.

SELECT ST_AsText(ST_Difference(
	'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))',
	'POLYGON((-1 -1, 3 -1, 3 3, -1 3, -1 -1))'
	))

If you have a data set and are running operations on it, eventually you will generate some empties.

Which means your software needs to know how to store and transmit them.

Which means you need to know how to encode them in WKB.

And the standard is no help.

But I am!

WKB Commonalities

All WKB geometries start with a 1-byte “byte order flag” followed by a 4-byte “geometry type”.

enum wkbByteOrder  {
    wkbXDR = 0, // Big Endian
    wkbNDR = 1  // Little Endian
};

The byte order flag signals which “byte order” all the other numbers will be encoded with. Most modern hardware uses “least significant byte first” (aka “little endian”) ordering, so usually the value will be “1”, but readers must expect to occasionally get “big endian” encoded data.

enum wkbGeometryType {
    wkbPoint = 1,
    wkbLineString = 2,
    wkbPolygon = 3,
    wkbMultiPoint = 4,
    wkbMultiLineString = 5,
    wkbMultiPolygon = 6,
    wkbGeometryCollection = 7
};

The type number is an integer from 1 to 7, in the indicated byte order.
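Reading that common header takes only a few lines. Here is a minimal Python sketch using the standard struct module. The function name is made up, and it only handles the plain 2D type numbers listed above.

```python
import struct

def read_wkb_header(wkb):
    """Return (byte order name, geometry type number) from WKB bytes."""
    flag = wkb[0]
    if flag == 1:
        order, fmt = "little", "<I"
    elif flag == 0:
        order, fmt = "big", ">I"
    else:
        raise ValueError(f"unknown byte order flag: {flag}")
    # The 4-byte type number starts right after the 1-byte flag.
    (geom_type,) = struct.unpack_from(fmt, wkb, 1)
    return order, geom_type

print(read_wkb_header(bytes.fromhex("0101000000")))  # little-endian Point
print(read_wkb_header(bytes.fromhex("0000000001")))  # big-endian Point
```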

Collections

Collections are easy! GeometryCollection, MultiPolygon, MultiLineString and MultiPoint all have a WKB structure like this:

wkbCollection {
    byte    byteOrder;
    uint32  wkbType;
    uint32  numWkbSubGeometries;
    WKBGeometry wkbSubGeometries[numWkbSubGeometries];
}

The way to signal an empty collection is to set its numWkbSubGeometries value to zero.

So for example, a MULTIPOLYGON EMPTY would look like this (all examples in little endian, spaces added between elements for legibility, using hex encoding).

01 06000000 00000000

The elements are:

The byte order flag
The geometry type (6 == MultiPolygon)
The number of sub-geometries (zero)

Polygons and LineStrings

The Polygon and LineString types are also very easy, because after their type number they both have a count of sub-objects (rings in the case of Polygon, points in the case of LineString) which can be set to zero to indicate an empty geometry.

For a LineString:

01 02000000 00000000

For a Polygon:

01 03000000 00000000

It is possible to create a Polygon made up of a non-zero number of empty linear rings. Is this construction empty? Probably. Should you make one of them? Probably not, since POLYGON EMPTY describes the case much more simply.
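Since collections, LineStrings and Polygons all put a count right after the type number, one small routine can write the empty form of any of them. This is a Python sketch with a made-up function name, not code from any WKB library.

```python
import struct

WKB_POINT = 1
WKB_LINESTRING = 2
WKB_POLYGON = 3
WKB_MULTIPOLYGON = 6

def empty_wkb(geom_type, little_endian=True):
    """WKB for an empty geometry: flag, type number, count of zero."""
    if geom_type == WKB_POINT:
        # Points have no count field, so this trick does not apply.
        raise ValueError("a Point has no count field to set to zero")
    fmt = "<BII" if little_endian else ">BII"
    flag = 1 if little_endian else 0
    return struct.pack(fmt, flag, geom_type, 0)

print(empty_wkb(WKB_LINESTRING).hex())    # 010200000000000000
print(empty_wkb(WKB_POLYGON).hex())       # 010300000000000000
print(empty_wkb(WKB_MULTIPOLYGON).hex())  # 010600000000000000
```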

Points

Saving the best for last!

One of the strange blind spots of the ISO/OGC standards is the WKB Point. There is a standard text representation for an empty point, POINT EMPTY. But nowhere in the standard is there a description of a binary empty point, and the WKB structure of a point doesn’t really leave any place to hide one.

WKBPoint {
    byte    byteOrder;
    uint32  wkbType; // 1
    double x;
    double y;
};

After the standard byte order flag and type number, the serialization goes directly into the coordinates. There’s no place to put in a zero.

In PostGIS we established our own add-on to the WKB standard, so we could successfully round-trip a POINT EMPTY through WKB – empty points are to be represented as a point with all coordinates set to the IEEE NaN value.

Here is a little-endian empty point.

01 01000000 000000000000F87F 000000000000F87F

And a big-endian one.

00 00000001 7FF8000000000000 7FF8000000000000
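Both encodings can be produced, and recognized, with the Python struct module. The sketch below writes the NaN bit pattern shown above explicitly rather than relying on whatever NaN the platform produces, and the reader accepts any NaN. The function names are made up.

```python
import math
import struct

# The quiet NaN bit pattern used in the hex examples above.
NAN_BITS = 0x7FF8000000000000

def empty_point_wkb(little_endian=True):
    """WKB for POINT EMPTY under the NaN convention."""
    prefix = "<" if little_endian else ">"
    flag = 1 if little_endian else 0
    return struct.pack(prefix + "BIQQ", flag, 1, NAN_BITS, NAN_BITS)

def is_empty_point(wkb):
    """True if the bytes are a 2D WKB point with NaN coordinates."""
    prefix = "<" if wkb[0] == 1 else ">"
    _, geom_type, x, y = struct.unpack(prefix + "BIdd", wkb)
    return geom_type == 1 and math.isnan(x) and math.isnan(y)

little = empty_point_wkb()
big = empty_point_wkb(little_endian=False)
print(little.hex().upper())
print(big.hex().upper())
print(is_empty_point(little), is_empty_point(big))
```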

Most open source implementations of WKB have converged on this standardization of POINT EMPTY. The most common alternate behaviour is to convert POINT EMPTY objects, which are not representable, into MULTIPOINT EMPTY objects, which are. This might be confusing (an empty point would round-trip back to something with a completely different type number).

In general, empty geometries create a lot of “angels dancing on the head of a pin” cases for functions that otherwise have very deterministic results.

“What is the distance in meters between a point and an empty polygon?” Zero? Infinity? NULL? NaN?
“What geometry type is the intersection of an empty polygon and empty line?” Do I care? I do if I am writing a database system and have to provide an answer.

Over time the PostGIS project collated our intuitions and implementations in this wiki page of empty geometry handling rules.

The trouble with empty handling is that there are simultaneously a million different combinations of possibilities, and extremely low numbers of people actually exercising that code line. So it’s a massive time suck. We have basically been handling them on an “as needed” basis, as people open tickets on them.

Other Databases

SQL Server changes POINT EMPTY to MULTIPOINT EMPTY when generating WKB.

SELECT Geometry::STGeomFromText('POINT EMPTY',4326).STAsBinary()

0x010400000000000000

MariaDB and Snowflake return NULL for a POINT EMPTY WKB.

SELECT ST_AsBinary(ST_GeomFromText('POINT EMPTY'))

NULL

February 03, 2025 04:00 PM

January 18, 2025

PostGIS Development

PostGIS 3.5.2

The PostGIS Team is pleased to release PostGIS 3.5.2.

This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5+ is needed.

3.5.2

source download md5
NEWS
PDF docs: en ja, fr, zh_Hans
HTML Online en ja fr zh_Hans
Cheat Sheets:
- postgis: en ja fr zh_Hans
- postgis_raster: en ja fr zh_Hans
- postgis_topology: en ja fr zh_Hans
- postgis_sfcgal: en ja fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja fr zh_Hans

This release is a bug fix release that includes bug fixes since PostGIS 3.5.1.

by Regina Obe at January 18, 2025 12:00 AM

January 06, 2025

Crunchy Data (Snowflake)

Running an Async Web Query Queue with Procedures and pg_cron

The number of cool things you can do with the http extension is large, but putting those things into production raises an important problem.

The amount of time an HTTP request takes, 100s of milliseconds, is 10- to 20-times longer than the amount of time a normal database query takes.

This means that potentially an HTTP call could jam up a query for a long time. I recently ran an HTTP function in an update against a relatively small 1000 record table.

The query took 5 minutes to run, and during that time the table was locked to other access, since the update touched every row.

This was fine for me on my developer database on my laptop. In a production system, it would not be fine.

Geocoding, For Example

A really common table layout in a spatially enabled enterprise system is a table of addresses with an associated location for each address.

CREATE EXTENSION postgis;

CREATE TABLE addresses (
  pk serial PRIMARY KEY,
  address text,
  city text,
  geom geometry(Point, 4326),
  geocode jsonb
);

CREATE INDEX addresses_geom_x
  ON addresses USING GIST (geom);

INSERT INTO addresses (address, city)
  VALUES ('1650 Chandler Avenue', 'Victoria'),
         ('122 Simcoe Street', 'Victoria');

New addresses get inserted without known locations. The system needs to call an external geocoding service to get locations.

SELECT * FROM addresses;

 pk |       address        |   city   | geom | geocode
----+----------------------+----------+------+---------
  8 | 1650 Chandler Avenue | Victoria |      |
  9 | 122 Simcoe Street    | Victoria |      |

When a new address is inserted into the system, it would be great to geocode it. A trigger would make a lot of sense, but a trigger will run in the same transaction as the insert. So the insert will block until the geocode call is complete. That could take a while. If the system is under load, inserts will pile up, all waiting for their geocodes.

Procedures to the Rescue

A better performing approach would be to insert the address right away, and then come back later and geocode any rows that have a NULL geometry.

The key to such a system is being able to work through all the rows that need to be geocoded, without locking those rows for the duration. Fortunately, there is a PostgreSQL feature that does what we want, the PROCEDURE.

Unlike functions, which wrap their contents in a single, atomic transaction, procedures allow you to apply multiple commits while the procedure runs. This makes them perfect for long-running batch jobs, like our geocoding problem.

CREATE PROCEDURE process_address_geocodes()
LANGUAGE plpgsql
AS $$
DECLARE
  pk_list BIGINT[];
  pk BIGINT;
BEGIN
  --
  -- Find all rows that need geocoding
  --
  SELECT array_agg(addresses.pk)
    INTO pk_list
    FROM addresses
    WHERE geocode IS NULL;

  --
  -- Geocode those rows one at a time,
  -- one transaction per row
  --
  IF pk_list IS NOT NULL THEN
    FOREACH pk IN ARRAY pk_list LOOP
      PERFORM addresses_geocode(pk);
      COMMIT;
    END LOOP;
  END IF;

END;
$$;

The important thing is to break the work up so it is done one row at a time. Rather than running a single UPDATE to the table, we find all the rows that need geocoding, and loop through them, one row at a time, committing our work after each row.

Geocoding Function

The addresses_geocode(pk) function takes in a row primary key and then geocodes the address using the http extension to call the Google Maps Geocoding API. Taking in the primary key, instead of the address string, allows us to call the function one-at-a-time on each row in our working set of rows.

The function:

reads the Google API key from the environment;
reads the address string for the row;
sends the geocode request to Google using the http extension;
checks the validity of the response; and
updates the row.

Each time through the function is atomic, so the controlling procedure can commit the result as soon as the function is complete.

Geocoding function addresses_geocode(pk)

--
-- Take a primary key for a row, get the address string
-- for that row, geocode it, and update the geometry
-- and geocode columns with the results.
--
CREATE FUNCTION addresses_geocode(geocode_pk bigint)
RETURNS boolean
LANGUAGE 'plpgsql'
AS $$
DECLARE
  js jsonb;
  full_address text;
  res http_response;
  api_key text;
  api_uri text;
  uri text := 'https://maps.googleapis.com/maps/api/geocode/json';
  lat float8;
  lng float8;

BEGIN

  -- Fetch API key from environment
  api_key := current_setting('gmaps.api_key', true);

  IF api_key IS NULL THEN
      RAISE EXCEPTION 'addresses_geocode: the ''gmaps.api_key'' is not currently set';
  END IF;

  -- Read the address string to geocode
  SELECT concat_ws(', ', address, city)
    INTO full_address
    FROM addresses
    WHERE pk = geocode_pk
    LIMIT 1;

  -- No row, no work to do
  IF NOT FOUND THEN
    RETURN false;
  END IF;

  -- Prepare query URI
  js := jsonb_build_object(
          'address', full_address,
          'key', api_key
        );
  uri := uri || '?' || urlencode(js);

  -- Execute the HTTP request
  RAISE DEBUG 'addresses_geocode: uri [pk=%] %', geocode_pk, uri;
  res := http_get(uri);

  -- For any bad response, exit here, leaving all
  -- entries NULL
  IF res.status != 200 THEN
    RETURN false;
  END IF;

  -- Parse the geocode
  js := res.content::jsonb;

  -- Save the json geocode response
  RAISE DEBUG 'addresses_geocode: saved geocode result [pk=%]', geocode_pk;
  UPDATE addresses
    SET geocode = js
    WHERE pk = geocode_pk;

  -- For any non-usable geocode, exit here,
  -- leaving the geometry NULL
  IF js->>'status' != 'OK' OR js->'results'->>0 IS NULL THEN
    RETURN false;
  END IF;

  -- For any non-usable coordinates, exit here
  lat := js->'results'->0->'geometry'->'location'->>'lat';
  lng := js->'results'->0->'geometry'->'location'->>'lng';
  IF lat IS NULL OR lng IS NULL THEN
    RETURN false;
  END IF;

  -- Save the geocode result as a geometry
  RAISE DEBUG 'addresses_geocode: got POINT(%, %) [pk=%]', lng, lat, geocode_pk;
  UPDATE addresses
    SET geom = ST_Point(lng, lat, 4326)
    WHERE pk = geocode_pk;

  -- Done
  RETURN true;

END;
$$;

Deploy with pg_cron

We now have all the parts of a geocoding engine:

a function to geocode a row; and,
a procedure that finds rows that need geocoding.

What we need is a way to run that procedure regularly, and fortunately there is a very standard way to do that in PostgreSQL — pg_cron.

If you install and enable pg_cron in the usual way, in the postgres database, new jobs must be added from inside the postgres database, using the cron.schedule_in_database() function to target other databases.

--
-- Schedule our procedure in the "geocode_example_db" database
--
SELECT cron.schedule_in_database(
  'geocode-process',                 -- job name
  '15 seconds',                      -- job frequency
  'CALL process_address_geocodes()', -- sql to run
  'geocode_example_db'               -- database to run in
  );

Wait, 15 seconds frequency? What if a process takes more than 15 seconds, won't we end up with a stampeding herd of procedure calls? Fortunately no, pg_cron is smart enough to check and defer if a job is already in process. So there's no major downside to calling the procedure fairly frequently.

Conclusion

HTTP and AI and BI rollup calls can run for a "long time" relative to desired database query run-times.
PostgreSQL PROCEDURE calls can be used to wrap up a collection of long running functions, putting each into an individual transaction to lower locking issues.
pg_cron can be used to deploy those long running procedures, to keep the database up-to-date while keeping load and locking levels reasonable.

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at January 06, 2025 02:30 PM

December 26, 2024

Crunchy Data (Snowflake)

Name Collision of the Year: Vector

I can’t get through a zoom call, a conference talk, or an afternoon scroll through LinkedIn without hearing about vectors. Do you feel like the term vector is everywhere this year? It is. Vector actually means several different things and it's confusing. Vector means AI data, GIS locations, digital graphics, and a type of query optimization, and more. The terms and uses are related, sure. They all stem from the same original concept. However their practical applications are quite different. So “Vector” is my choice for this year’s name collision of the year.

In this post I want to break down the vector. The history of the vector, how vectors were used in the past and how they evolved to what they are today (with examples!).

The original vector

The idea that vectors are based on goes back to the 1600s, when René Descartes first developed the Cartesian XY coordinate system to represent points in space. Descartes didn't use the word vector, but he did develop a numerical representation of a location and direction. Numerical location is the foundational concept of the vector, used for measuring spatial relationships.

The first use of the term vector was in the 1840s by an Irish mathematician named William Rowan Hamilton. Hamilton defined a vector as a quantity with both magnitude and direction in three-dimensional space. He used it to describe geometric directions and distances, like arrows in 3D space. Hamilton combined his vectors with several other math terms to solve problems with rotation and three dimensional units.

The word Hamilton chose, vector, comes from the Latin word vehere meaning ‘to carry’ or ‘conveyor’ (yes, same origin for the word vehicle). We assume Hamilton chose this Latin word origin to emphasize the idea of a vector carrying a point from one location to another.

There’s a book about the history of vectors published just this year, and a nice summary here. I’ve already let Santa know this is on my list this year.

Mathematical vectors

Building upon Hamilton’s work, vectors have been used extensively in linear algebra, both before and after the arrival of computational math. If it has been 20 years since you took a math class, here’s a quick refresher.

Linear algebra is a branch of mathematics that focuses on vectors, matrices, and arrays of numbers. Here’s a super simple mathematical vector equation. We have two points on an XY coordinate system, point A at 1, 2 and B at 4,6. The vector formula for this is below in this diagram, final solution 3,4.

basic math vector
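As a worked version of the example above, the vector from A to B is the component-wise difference of the two points. The arrow notation and the length calculation are additions here for illustration, not part of the original diagram.

```latex
\vec{AB} = B - A = (4 - 1,\; 6 - 2) = (3,\; 4), \qquad
\lVert \vec{AB} \rVert = \sqrt{3^2 + 4^2} = 5
```

So the vector carries point A three units along X and four units along Y, for a total distance of five.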

Linear algebra of much more complicated forms is used in solving systems of linear differential equations. Vector equations have practical use cases in physics and engineering for things we use every day like heat conduction, fluids, and electrical circuits.

Computer science vectors

Early computer scientists made heavy use of the vector in a variety of ways. A computational vector can be similar to the example above or even just a simple numeric array of fixed size where the numbers have related values. In early computer programming, simple operations like addition or subtraction would be applied to a set of vectors.

A basic example of this could be financial portfolio analysis where you have two vectors: 1 - Portfolio weights, v1, showing the proportion of investment in different stocks and 2 - market impact adjustments, v2, that adjusts markets based on current values. This code sample here in C calculates the adjusted weights for each stock in the portfolio by adding the two vectors.

#include <stdio.h>

#define STOCKS 8

typedef float Portfolio[STOCKS];

int main() {
    // Portfolio weights (in percentages, out of 100)
    Portfolio portfolioWeights = {10.0, 20.0, 15.0, 25.0, 5.0, 10.0, 10.0, 5.0};
    // Market impact adjustments (positive or negative percentages)
    Portfolio marketAdjustments = {0.5, -0.3, 1.0, -0.5, 0.2, -0.1, 0.0, 0.7};
    Portfolio adjustedWeights;

    // Perform vector addition
    for (int i = 0; i < STOCKS; i++) {
        adjustedWeights[i] = portfolioWeights[i] + marketAdjustments[i];
    }

    // Print adjusted weights
    printf("Adjusted Portfolio Weights: <");
    for (int i = 0; i < STOCKS; i++) {
        printf("%s%.1f%%", i > 0 ? ", " : "", adjustedWeights[i]);
    }
    printf(">\n");

    return 0;
}

Modern computer science builds on similar concepts of organizing and processing collections. The std::vector in C++ and Vec<T> in Rust are general-purpose dynamic arrays. They can hold virtually any data type to help manage or compute collections of elements.

Graphics and vectors

Vector graphics were used in early arcade and video game development. Think of something like Spacewar! or Asteroids. Vectors could be used to draw lines and shapes like ships and stars.

Here’s a super simple example of how vectors could be used to draw a triangle.

#define DrawLine(pt1, pt2)

typedef struct Point {
    int x, y;
} Point;

typedef struct Line {
    Point start;
    Point end;
} Line;

Line lines[3] = {
    {{0, 0}, {100, 100}},  // Line 1
    {{100, 100}, {200, 50}}, // Line 2
    {{200, 50}, {0, 0}}    // Line 3
};

// Loop through these points to draw our triangle on the screen.
int main()
{
    for (int i = 0; i < 3; i++)
    {
        DrawLine(lines[i].start, lines[i].end);
    }
    return 0;
}

These early xy arrays and computerized graphics paved the way for modern computer graphics which make use of vectors in even more advanced ways. When you play a modern 3D video game, many characters, objects, and movement you see on the screen are powered by linear algebra vectors.

The Graphics Processing Unit (GPU) was a specialized computer developed in the 1990s and then improved on in the decades since. GPUs handle the millions of vector operations required to create 3D graphics in real time. GPUs now are used for far more than 3D graphics. Vector-based assembly operations can operate on a continuous block of memory, doing the same operation across different chunks of memory.

Scalable vector graphics (SVG)

SVGs are 2D vector graphics that have become a de-facto image format in web design and development. There’s a vector standard that allows SVGs to be created with a series of numbers that represent shapes and paths that work across devices and web browsers. SVGs display logos, icons, charts, and animations. Their popularity took off in the mid 2010s and continues to grow, thanks to their performance and lightweight nature.

SVGs use some number of vector numbers to describe the object they represent. A simple SVG with a few shapes might use dozens of numbers. A more complex SVG, like one for a detailed icon or map, might include thousands of numbers.

Here’s what the SVG of the Crunchy Data hippo logo looks like:

<svg
	id="aad9811e-aeeb-4dae-a064-7d889077489a"
	data-name="Layer 6"
	xmlns="http://www.w3.org/2000/svg"
	viewBox="0 0 1407.15 1158.38"
>
	<path
		d="M553.21,651l124.3,122.4-154.9-89Zm-304.5-496.6-54.6,148.9L35.71,415.19,6.81,523.49l-6.5,67.9,83.1,65.2h0l208.7-10.3,114.1-155.7,3.6-166,199.3-200.5-104.7-41.9Zm0,0,360.4-30.3m-104.7-41.9-114.1,61.4-130.7,213.5-105.5,150.5-70.8,149m322.9-166-145.9-135.4-222.5,62.1M294.21,642l-140.1-135.1L1,586.39m36.1-171.2,116.3,91,190.8-73.1m-95.5-278.7L259.61,357m150.1-32.4-19.4-181m218.8-19.5,14.7,196.7-59.5,137.4-49.1,104-92.7,47.2-128.8,35.9,139.8,39.3L621.21,632l62.4-196.3,16.7-174.4-92.4-136.9M621.21,632l-215-141.5,26.7,194-349.6-28m617-395.2-294.1,229.3,215,141.5m-217.1,50.2,8.6,306.7-17.5,35.7,6.1,52.8,101.7-4.8,63.5-63.9,6-47.9L588.41,792h0l89.2-18.4,97.2,23.4,84.2,19.7-2.1,46.5,10.5,30.4-19,28.9,28.1,1.9,1.6-.8,6,105.5-15.1,40.1,25.3,88.7,132.1-33-6.1-50.6,65.5-306.8,49.5-12.2,57-43,29,41.1,2.4,88.3,5.8,61.8-18.6,46.2,23.5,38.7,96.5-12.4,44.3-43.5-21.1-28.8,13.8-216.9,4-65.5,34.6-116.4-23.4-120.4-332.8-215.1L842,135l-151.2,47.5m119.9,84.8-202.4-143.1m202.4,143.1L849,552.39l134.2-214.2ZM1164,453.09l-180.8-115-42.6,277Zm-486.5,320.4,263-158.4L849,552.39Zm133.2-506.2-110.6-4-4.6,48.5,115-42.3m-133,504-154.9-89,65.7,107.4Zm170.3-25.9,35.1,87,57.6-219.4Zm117.7,83.3-25-215.8-57.6,219.4Zm-24.9-215.8,25,215.8,120.2-63.5Zm12.7,418.8,94-83.9-81.9-119.1Zm-105.5-285.6-170.3,25.2,200,47.7ZM1164,453.09l-70.6,270.3,141.1-114Zm70.5,156.3,77.8-132.8L1195,262.89Zm-251.3-271.3,180.8,115,31.1-190.2Zm67.1-168.8-67.1,168.8,211.9-75.2ZM842,135l-151.2,47.5,359.5-13.9Zm244.2,633.2,7.2-44.8m167.2-63.1,51.8-183.7-77.9,132.8Zm0,0-26.1-50.9-99.3,145.8Zm0,0,84.1-88.7-32.4-95Zm84.1-88.7-84.1,88.7,42.4-7.6Zm-22.6-226.7-9.8,131.7,32.4,95Zm0,0,22.6,226.7,62-69Zm46.3,339.3-65.3-30.2,56.7,161.5Zm-114.7,122.3,77.3-31.9-28.1-121.8Zm49.2-153.7,28.1,121.8,28.9,40.9Zm69.3-32.3-27.5-48.9,23.7,112.6ZM1331,774.59l-4.7,123.7,33.6-82.7Zm-93.9,213.3,94.5-12.7-5.4-78.4Zm16.6-181.4-30,35.1,13.4,139.9,63.4-138.2Zm0,0-33.1-115.9,3.1,150.6Zm-32.8-115.2,82.2-37.2m-73.5,249.3,7.6,84.6m94.5-12.8,43.7-42.9-49.1-35.5Zm-5.8
-79.2,29.1,7.3m-942.3,85.6-11.4,88.5,63.4-55.8Zm51.2,31.9,38.7,52.5,63.8-64.5Zm556,53.9-66.6-40.8-59.2,123.9Zm-431.6-282.8-112.2,70.4-11.4,159.3Zm-178.6,89.3,2.9,107.7,63.5-126.6Zm238-729.1,40.7-57.4L702,45.29l-13.6-32L650.11.49l-13.6,2.6-31.2,41.3-10.3,73,14.1,6.7ZM650,.49l-48.6,74.7,81.4-45.9Zm32.7,28.4L702,45.19m-19.1-15.3,5.5,64.8L647.31,110l-38.2,14.1m0,0-7.7-48.9m87-61.9-5.5,16.6L650,.59m-269.3,116-4.1-59.1-45-22.9-43.7,26.8,2.7,42.8,11.5,35.3M346.21,81l-14.6-46.5-41,69.7L346.21,81l-43.8,58.5m74.2-82.1L346.21,81l34.5,35.6m486.4,777.9,10.9,29m4.9-90.7-15.6,60.6,10.7,30.1Zm-407,32,46.7-180.3-112.9,196.7m23.2-196.6,89.7-.1,30.6-33.4M744.81,394l-10.6,113.9L849,552.39Zm-75.5,84.8L621.21,632l113.1-124.1Zm64.9,29.1-56.7,265.6m0,0,27.2-133.3-83.6-8.1Zm68.1-380.1-59.2,18m9-99.7,49.4,82.3,65.7-124.6Zm-289.2,178.9,277.3-54.9m200.3,594.7,31-31.4,50.7-168.1m-82.6,1.9,31.9,166.1,38.5,34.9M1331,774.59l-30.4,68.7,25.8,53.5M287.91,61.39l23.9,6.7"
		fill="none"
		stroke="currentColor"
		stroke-linejoin="bevel"
	/>
</svg>

GIS vector data

In modern computational GIS, vectors are used to represent geometric data types like points, line-strings, and polygons. Like any other x,y,z vector coordinate system, the vectors refer to specific global points or objects. There are quite a few different spatial reference systems that can be used. The vectors are typically stored in PostGIS using a binary format, Well-Known Binary (WKB), which is a standardized binary encoding for geometries. Vectorization also powers many of the key functions in modern geospatial data processing like intersections, distance calculations, joins, and proximity analysis.

Here’s the vector binary for (imho) the best BBQ restaurant in the world:

 restaurant_name |                        geom
-----------------+----------------------------------------------------
 Gates Bar B Q   | 0101000020E610000082E673EE76A557C007B47405DB884340
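To see the coordinates hiding in that hex string, here is a minimal C sketch that decodes it by hand. It assumes the value is a little-endian 2D point in PostGIS's extended WKB layout (byte order, type with an SRID flag, SRID, then X and Y as doubles). The `EwkbPoint` struct and `decode_ewkb_point` function are hypothetical names for this illustration, not a PostGIS API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for a decoded 2D point with its SRID. */
typedef struct {
    uint32_t srid;
    double x;
    double y;
} EwkbPoint;

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Assemble n little-endian bytes into an unsigned integer. */
static uint64_t read_le(const unsigned char *p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/*
 * Decode a hex string holding a little-endian 2D point with an SRID.
 * Returns 0 on success, -1 for anything this sketch does not handle.
 */
int decode_ewkb_point(const char *hex, EwkbPoint *out) {
    unsigned char buf[25]; /* 1 + 4 + 4 + 8 + 8 bytes */
    if (strlen(hex) != 2 * sizeof buf) return -1;
    for (size_t i = 0; i < sizeof buf; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        buf[i] = (unsigned char)((hi << 4) | lo);
    }
    if (buf[0] != 1) return -1;                        /* little-endian only */
    if (read_le(buf + 1, 4) != 0x20000001u) return -1; /* Point + SRID flag */
    out->srid = (uint32_t)read_le(buf + 5, 4);
    uint64_t xbits = read_le(buf + 9, 8);
    uint64_t ybits = read_le(buf + 17, 8);
    memcpy(&out->x, &xbits, sizeof xbits);             /* IEEE 754 doubles */
    memcpy(&out->y, &ybits, sizeof ybits);
    return 0;
}
```

Passing the string from the table above to `decode_ewkb_point` fills in the SRID plus a longitude (x) and latitude (y). In practice you would let PostGIS do this with a function such as ST_AsText.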

AI Vectors

AI vectors emerged from the mathematical and computational foundations of vectors that I covered above. Through advancements in hardware and in machine learning algorithms, vectors can be used as a system to describe virtually anything. Large Language Models (LLMs) convert data like text, images, or other inputs into vectors through a process called embedding. LLMs use layers of neural networks to process the embeddings in a specific context. So the vectors numerically represent relationships between objects within the context they were created with.

You’ve probably heard of the pgvector extension that is used for storing and querying AI related embedding data. pgvector adds a custom data type vector for storing fixed-length arrays of floating-point numbers. pgvector stores up to 16k dimensions.

My colleague Karen Jex has a great embedding talk she does about AI called “What’s the Opposite of a Corn Dog”. The vector embedding for a corn dog from an OpenAI menu dataset is an array of a staggering 1536 numbers. Here’s a snippet.

// vector of a Corn Dog
[0.0045576594,-0.00088141876,-0.014024569,-0.011641564,0.0038251784,0.010306821,-0.01265076,-0.013672978,-0.01582159,-0.041670028,0.0044274405,.........0.040185533,-0.010463083,0.004326521,-0.019571891,0.01853014,0.025770308,-0.017787892,0.0018572462]

In AI and machine learning, a vector is an ordered list of numbers that represents data for literally anything. Really what “AI” is doing is turning anything and everything into a vector and then comparing that vector with other vectors in the same matrix.
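The "comparing" step can be sketched in a few lines of C. This is a toy nearest-neighbour search using squared Euclidean distance over tiny 3-number vectors. The function names are hypothetical, and real systems such as pgvector use their own implementations, distance measures, and indexes rather than this brute-force loop.

```c
#include <stddef.h>

/* Squared Euclidean distance between two vectors of length dim. */
static float squared_distance(const float *a, const float *b, size_t dim) {
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/*
 * items holds count vectors of length dim, stored back to back.
 * Returns the index of the vector closest to query (count must be >= 1).
 * Ranking by squared distance gives the same order as ranking by distance.
 */
size_t nearest_index(const float *query, const float *items,
                     size_t count, size_t dim) {
    size_t best = 0;
    float best_dist = squared_distance(query, items, dim);
    for (size_t i = 1; i < count; i++) {
        float d = squared_distance(query, items + i * dim, dim);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

An embedding search works the same way in spirit, just with 1536 numbers per vector instead of three and an index to avoid scanning every row.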

Vectorized queries

As computational vectors have become so popular along with machine learning, the underlying methods and CPU hardware for processing vector data are now used to process other kinds of data.

There are several databases on the market now like DuckDB, BigQuery, Snowflake, and Crunchy Data Warehouse that make use of vectorized query execution to speed up analytics queries. Vectorized database queries split up and streamline queries into similar results over chunks of data of the same type. In a way, they’re treating columns of data like mathematical vectors. This can be much more powerful than reading data row by row. The power here also comes from the parallelization and effective CPU and IO usage.

vectorized queries.png

The values processed with vectorized execution are typically treated as vectors in the sense that they’re contiguous batches of data elements. Surprisingly, they do not need to represent mathematical vectors—they can be any kind of data that fits the processing model.
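A rough C sketch of the idea: instead of visiting one row at a time, the engine hands a tight loop a contiguous batch of same-typed values from a single column. The batch size of 1024 and the function names are assumptions made for this illustration, not taken from any of the databases named above.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_SIZE 1024 /* hypothetical batch size for illustration */

/* Sum one batch: a simple loop over contiguous values of one type,
 * the kind of loop a compiler can often turn into SIMD instructions. */
static int64_t sum_batch(const int32_t *values, size_t count) {
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Sum a whole column by walking it one batch at a time. */
int64_t sum_column(const int32_t *column, size_t rows) {
    int64_t total = 0;
    for (size_t start = 0; start < rows; start += BATCH_SIZE) {
        size_t remaining = rows - start;
        size_t count = remaining < BATCH_SIZE ? remaining : BATCH_SIZE;
        total += sum_batch(column + start, count);
    }
    return total;
}
```

The per-batch function never needs to know what the numbers mean, which is why the same machinery works for prices, timestamps, or counts as well as for mathematical vectors.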

Vectors are everywhere!

Vectors are everywhere and they can mean virtually anything in a computerized context - especially now with AI - everything is or can be a vector.

Vectors and their uses are one of the main characters in the story of modern computing: an evolution from pen and ink math to modern ML algorithms. The beauty of the vector is in its infinite use of numeric representation, from simple concepts like a point on the globe to computerized graphics and animation, and AI embeddings for any text or image.

Vector use summary:

vector uses.png

Attributions

Hamilton’s Lecture on Vectors

by Elizabeth Christensen (Elizabeth.Christensen@crunchydata.com) at December 26, 2024 01:30 PM

December 23, 2024

PostGIS Development

PostGIS Patch Releases

The PostGIS development team is pleased to provide bug fix releases for 3.5.1, 3.4.4, 3.3.8, 3.2.8, 3.1.12

Please refer to the links above for more information about the issues resolved by these releases.

by Regina Obe at December 23, 2024 12:00 AM

December 15, 2024

Boston GIS (Regina Obe, Leo Hsu)

The bus factor problem

One of the biggest problems open source projects face today is the bus factor problem.

I've been thinking a lot about this lately and how it applies to my PostGIS, pgRouting, and OSGeo System Administration (SAC) teams.

Continue reading "The bus factor problem"

by Regina Obe (nospam@example.com) at December 15, 2024 03:11 AM

December 01, 2024

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Day 2024 Summary

PostGIS Day, the yearly conference sponsored by Crunchy Data, is my favorite conference of the year because it's the only conference where I get to pig out on PostGIS content and meet fellow passionate PostGIS users pushing the envelope of what is possible with PostGIS and, by extension, PostgreSQL. Sure, FOSS4G conferences do have a lot of PostGIS content, but that content is never quite so front and center as it is at PostGIS Day conferences. The fact that it's virtual means I can attend in pajamas and robe, and the talks are always recorded, with the videos coming out fairly quickly. In fact, the PostGIS Day 2024 videos are available now in case you want to see what all the fuss is about.

Continue reading "PostGIS Day 2024 Summary"

by Regina Obe (nospam@example.com) at December 01, 2024 10:59 PM

November 27, 2024

Crunchy Data (Snowflake)

PostGIS Day 2024 Summary

In late November, on the day after GIS Day, we hosted the annual PostGIS Day online event. 22 speakers from around the world presented, in an agenda that ran from mid-afternoon in Europe to mid-afternoon on the Pacific coast.

We had an amazing collection of speakers, exploring all aspects of PostGIS, from highly technical specifics, to big picture culture and history. A full playlist of PostGIS Day 2024 is available on the Crunchy Data YouTube channel. Here’s a highlight reel of the talks and themes throughout the day.

The Old and the New

My contribution to the day was a look back at the history of databases and spatial databases. The roots of PostGIS are the roots of PostgreSQL, and the roots of PostgreSQL in turn go back to the dawn of databases. The history of software involves a lot of coincidences, and turns on particular characters sometimes, but it’s never (too) dull!

Joshua Carlson delivered one of the stand-out talks of the day, exploring how he built a very old-style cartographic product (a street map with a grid-based index to find street names) using a very new-style approach (spatial SQL to generate the grid and find the grid numbers for each street to fill in the index). Put Making a Dynamic Street Map Index with ST_SquareGrid at the top of your video play list.

For the past ten years, Brian Timoney has been warning geospatial practitioners about the complexity of the systems they are delivering to end users. In Simplify, simplify, simplify, Timoney both walks the walk and talks the talk, delivering denunciations of GIS dashboard mania, while building out a minimalist mapping solution using just PostGIS, SVG and (yes!) Excel. It turns out that SVG is an excellent medium for delivering cartographic products, and you can generate them entirely in PostgreSQL/PostGIS.

And then, for example, work with them directly in MS Word! (This is, as Brian says, what customers are looking for, not a dashboard.)

Steve Pousty brought the mandatory AI-centric talk, but avoided the hype and stuck to the practicalities of the new era: what do the terms mean, what are the models for, what tools are there in PostgreSQL to make use of them, and in particular what makes sense for spatial practitioners.

Parquet and PostGIS

Our own Rekha Khandhadia showed off the power of our latest product, Crunchy Data Warehouse, when combined with the massive map data available from Overture, and the analytical tools of PostGIS.

In Geospatial Analytics with GeoParquet, using only SQL, she addressed the 300GB of Overture data, and ran a spatial analysis on the fly over the state of Michigan.

GeoParquet is the new kid on the block, with lots of folks in the researching phase.

Brian Loomis of Nikola Motor shared how he is using PostGIS/PostgreSQL to quantify how much time their trucks are spending in various impacted communities, for reporting to the California Air Resources Board (CARB). Loomis also shared his use case for Crunchy Data Warehouse. Working with 4 billion points a day, they're using S3 to store partitioned data in Parquet. Loomis has some useful notes on Parquet file sizes and structure optimization if you're new to that topic.

The Larger World

PostGIS doesn’t exist in a vacuum, it’s part of a larger open ecosystem of data and other software and organizations trying to solve problems. Bonny McClain returned to PostGIS day with an update on her work on urban climate issues and using SQL as an engine for public policy analysis.

At Overture Maps, a collaboration of industry members is synthesizing a public world base map from multiple sources, and Dana Bauer and Jake Wasserman got us Started With Overture Maps, how PostGIS can make use of the data and what is being built. At the other end of the spectrum, Felt is building end-user facing tools for spatial collaboration, and Michal Migurski walked us through a demo of pulling climate data from a PostGIS service, visualizing and story telling with the data.

Meanwhile, in the daily grind of GIS operations, Kurt Menke is seeing a wave of open source adoption in Danish municipalities, as QGIS and PostGIS take over and old MapInfo installations are phased out. The pattern of adoption across the nation is very interesting and Kurt provides lots of maps.

This poll from the webinar shows a lot of QGIS use in our PostGIS Day audience! Not surprising, really, QGIS is the easiest desktop GIS to integrate with PostGIS.

Finally, we got to hear from Pekka Sarkola on How to Connect PostGIS to ArcGIS and the answer is “it depends”. There’s a lot of complexity in the Esri environment, lots of products, and lots of history, so the precise way you want to connect will depend on your needs. But you can do it, just remember to read the docs carefully.

Regina Obe shared PostGIS Surprise, the Sequel, a pure SQL exploration of PostGIS-related extensions.

The Nitty Gritty

Using PostGIS often means accessing and using it from another language, and Tom Payne provided a great deep dive into using PostGIS from within the Go language. Tom’s work on 3D geospatial is built into flight devices to warn aviators of hazards in the Swiss Alps. Also in the world of 3D, Loïc Bartoletti explained SFCGAL and PostGIS, bringing new algorithms into PostGIS, in particular algorithms working with volumetric types and 3D data.

Finally, Maxime Schoemans introduced us to the power of Multi-entry Generalized Search Trees – imagine the current PostGIS spatial indexes, but with each spatial object potentially represented with multiple index keys. The potential for performance improvements, as Maxime demonstrated, is very high, particularly for data involving large and complex shapes.

All these speakers crossed the threshold of true nitty – they talked about C and core code bindings!

Routing and Driving

Route finding and fleet management continue to be ever-green topics in the world of geospatial, as the world keeps spinning faster on more and more wheels. While it is tempting to reach for pgRouting to solve any routing problem, both Ibrahim Saricicek and Dennis Boachie Boateng counseled making sure your routing solution matches your routing problem.

Everyone has a favourite cost for routing, and this poll shows the PostGIS day audience pretty divided on the right one.

Ibrahim provided a good comparison of different open source routing options, in a Survey of pgRouting and Other Open Source Routing Tools.

And Dennis went all-in on the bespoke routing path, describing the core principles of routing, and demonstrating his own Custom Routing Solutions with PostGIS, in particular a live example of his own mobile way-finding application.

You get an API, you get an API, you all get APIs!

Web APIs to PostGIS are always a rich topic, because there’s a lot of them, and everyone has a favorite specification or implementation language. Michael Keller shared his incredibly well fleshed out FastCollection API, a Python state-of-the-art implementation of the Open Geospatial Consortium standards, with a few extra API end points for easier web application building. We are looking forward to seeing Michael in future years, as he builds out a complete example application on top of this API.

Elizabeth Christensen showed off our favourite API tools, the lightweight services we use for building Web maps from PostGIS – pg_featureserv and pg_tileserv. Simplicity of deployment and interface are what distinguish these Go language services, just download and run, no dependencies, no fuss.

Martin Davis also showed off our microservices, but in the context of the Uber global hexagonal grid system. He built a live dashboard specifically to show Summarizing Data in H3 with PostGIS and pg_tileserv. All the summary maps were generated on-the-fly, which is particularly impressive given the data on the backend.

Topological Data Models

Two approaches to managing data with shared boundaries were demonstrated at PostGIS day this year. The “traditional” approach was explained by Felipe Matas in Simplify Space Relations like Country/State Divisions with Postgis Topology. PostGIS comes with a built-in topology model, but understanding the moving parts can be hard, and Felipe provided a great talk with (importantly) a lot of pictures about how a topological model represents something like administrative boundaries.

Yao Cui from the British Columbia Geological Survey showed off the data model he developed 20 years ago to handle the difficult problem of keeping geological data clean while still supporting a robust data update cycle. Cui’s approach uses PostGIS to Facilitate Polygonal Map Integration Without Edge Matching. He keeps the topology implicit, and just manages the boundaries between areas, with a little careful work in identifying the boundaries of edit areas to allow long term data checkout, and clean data check-in.

The curtain closes

It was an honor to once again host PostGIS day, and we are in debt to all the great speakers who gave their time to participate. Thanks to everyone who participated in the chat and Q&A sessions, it was a lively experience, all 11 hours of it!

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at November 27, 2024 04:30 PM

November 19, 2024

Crunchy Data

Loading the World! OpenStreetMap Import In Under 4 Hours

The OpenStreetMap (OSM) database builds almost 750GB of location data from a single file download. An OSM import notoriously takes a full day to run. A fresh OpenStreetMap load involves both a massive write process and large index builds. It is a great performance stress-test bulk load for any Postgres system. I use it to stress the latest PostgreSQL versions and state-of-the-art hardware. The stress test validates new tuning tricks and identifies performance regressions.

Two years ago, I presented (video / slides) at PostGIS Day on challenges of this workload. In honor of this week’s PostGIS Day 2024, I’ve run the same benchmark on Postgres 17 and the very latest hardware. The findings:

PostgreSQL keeps getting better! Core improvements sped up index building in particular.
The osm2pgsql loader got better too! New takes on indexing speed things up.
Hardware keeps getting better! It has been two years since my last report and the state-of-the-art has advanced.

Tune Your Instrument

First, we are using bare metal hardware (a server with 128GB RAM), so let’s tune Postgres for loading and to match that server:

max_wal_size = 256GB
shared_buffers = 48GB
effective_cache_size = 64GB
maintenance_work_mem = 20GB
work_mem = 1GB

Second, let’s prioritize bulk load. The following settings do not make sense for a live system under read/write load, but they will improve performance for this bulk load scenario:

checkpoint_timeout = 60min
synchronous_commit = off
# if you don't have replication:
wal_level = minimal
max_wal_senders = 0
# if you believe my testing these make things
# faster too
fsync = off
autovacuum = off
full_page_writes = off

It’s also possible to tweak the background writer for the particular case of massive data ingestion, but for bulk loads without concurrency it doesn’t make a large difference.

How PostgreSQL has Improved

In 2022, testing that year's new AMD AM5 hardware loaded the data in just under 8 hours with Postgres 14. Today the amount of data in the OSM Planet files has grown another 14%. Testing with Postgres 17 still halves the load time, with the biggest drops coming from software improvements in the PG14-16 time-frame.

osm building time

The benchmark orchestration and metrics framework here is my pgbench-tools. Full hardware details are published to GeekBench.

GIST Index Building in PostgreSQL 15

The biggest PostgreSQL speed gains are from improvements in the GIST index building code.

The new code pre-sorts index pages before merging them, and for large GIST index builds the performance speed-up can be substantial, as reported by the author of osm2pgsql.

My tests showed going from PostgreSQL 14 to 15 delivered:

16% speedup
15% size reduction
86% GIST index build speedup!

osm index building time

There have been further improvements in PostgreSQL 16 and 17 in B-Tree index building, but this osm2pgsql benchmark does not really show them. The GIST index build times wash out the other index builds.

How osm2pgsql has improved

In Q3 2022, osm2pgsql 1.7 made a technique called the Middle Way Node Index ID Shift the new default.

Middle Way Node Index ID Shift is a clever design approach that compresses the database's largest index, trading off lookup and update performance for a smaller footprint. It uses a Partial Index to merge nearby values together into less fine grained sections. When an index is used frequently, this would waste too many CPU cycles. Similar to hash bucket collision, partial indexes have to constantly exclude non-matched items. That chews through extra CPU on every read. In addition, because individual blocks hold so many more values, the locking footprint for updates increases proportionately. However, for large but infrequently used indexes like this one, those are satisfactory trade-offs.
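
To make the trade-off concrete, here is a minimal Python sketch of the bucketing idea. It is not osm2pgsql's code: the shift of 5 bits, the function names, and the toy data are all assumptions for illustration. Node ids are shifted right so that neighbouring ids land in the same bucket, the index stores buckets instead of exact ids, and every lookup has to recheck its candidates.

```python
from collections import defaultdict

ID_SHIFT = 5  # assumed shift: 2**5 = 32 consecutive node ids share one bucket


def bucket(node_id):
    """Map a node id to its coarser index bucket."""
    return node_id >> ID_SHIFT


def build_index(ways):
    """Index ways by the buckets of their nodes instead of by exact node id."""
    index = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            index[bucket(node_id)].add(way_id)
    return index


def ways_using_node(index, ways, node_id):
    """Look up candidates by bucket, then recheck each one exactly."""
    candidates = index.get(bucket(node_id), set())
    return sorted(w for w in candidates if node_id in ways[w])


# Toy data (hypothetical): way id -> node ids
ways = {1: [100, 101, 102], 2: [103, 200], 3: [96, 300]}
index = build_index(ways)

print(len(index))                         # 3 buckets cover 7 node references
print(sorted(index[bucket(100)]))         # [1, 2, 3] candidates share the bucket
print(ways_using_node(index, ways, 100))  # [1] after the exact recheck
```

The index shrinks because seven node references collapse into three buckets, and the recheck in `ways_using_node` is the extra CPU cost paid on every read.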

Applying that improvement dropped my loading times by 37% and plummeted the database size from 1000GB to under 650GB. Total time at the terabyte size had crept upward to near 10 hours. The speed-up drove it back below 6 hours.

The osm2pgsql manual shows the details in its Update for Expert Users. I highly recommend that section and its Improving the middle blog entry. It's a great study of how PG's skinnable indexing system lets applications optimize for their exact workload.

How hardware has improved

SSD Write Speed

During data import, the osm2pgsql workload writes heavily at medium queue depths for hours. The best results come from SSDs with oversized SLC caches that also balance cleanup compaction of that cache. The later CREATE TABLE AS (CTAS) sections of the build reach its peak read/write speeds.

I saw 11GB/s from a Crucial T705 PCIe 5.0 drive the week (foreshadowing!) I was running that with an Intel i9-14900K:

read write for osm

osm2pgsql has a tuning parameter named --number-processes that guides how many parallel operations the code tries to spawn.

For the server and memory I used in this benchmark, increasing --number-processes from my earlier 2 to 5 worked well. However, be careful: you can easily go too far! Bumping up this parameter increases memory usage too. Going wild on the concurrent work will run you out of memory and put you into the hands of the Linux Out of Memory (OOM) killer.

Processor advances

Obviously, every year processors get a little better, but they do so in different ways and at different rates.

For testing later in 2023 against PostgreSQL 15 and 16, an Intel i7-13600K overtook the earlier AMD R5 7700X. There was another small bump in 2024 upgrading to an i9-14900K.

But this is a demanding regression test workload, and it only took a few weeks of running the OSM workload to trigger the i9-14900K’s voltage bugs to the point where my damaged CPU could not even finish the test.

Thankfully I was able to step away from those issues when AMD's 9600X launched. Here are the latest results from PG17 on an AMD 9600X, with the same SK41 2TB drive as I tested in 2022 for my PostGIS Day talk.

My best OSM import results to date

2024-10-15 10:03:41  [00] Reading input files done in 7851s (2h 10m 51s).
2024-10-15 10:03:41  [00]   Processed 9335778934 nodes in 490s (8m 10s) - 19053k/s
2024-10-15 10:03:41  [00]   Processed 1044011263 ways in 4301s (1h 11m 41s) - 243k/s
2024-10-15 10:03:41  [00]   Processed 12435485 relations in 3060s (51m 0s) - 4k/s
2024-10-15 10:03:41  [00] Overall memory usage: peak=158292MByte current=157746MByte...
2024-10-15 11:32:13  [00] osm2pgsql took 13162s (3h 39m 22s) overall.

Completed in less than 4 hours!
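
The numbers in that log can be cross-checked with a few lines of Python. This is only an arithmetic sketch over the figures printed above; the regular expression and helper names are my own, not part of osm2pgsql.

```python
import re

LOG_LINES = [
    "Processed 9335778934 nodes in 490s (8m 10s) - 19053k/s",
    "Processed 1044011263 ways in 4301s (1h 11m 41s) - 243k/s",
    "Processed 12435485 relations in 3060s (51m 0s) - 4k/s",
]
TOTAL_SECONDS = 13162  # from the "osm2pgsql took 13162s" line

PATTERN = re.compile(r"Processed (\d+) (\w+) in (\d+)s")


def parse(line):
    """Extract (kind, count, seconds) from one 'Processed' log line."""
    match = PATTERN.search(line)
    return match.group(2), int(match.group(1)), int(match.group(3))


def hms(seconds):
    """Format a duration in seconds the way the log does."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


stages = [parse(line) for line in LOG_LINES]
read_seconds = sum(seconds for _, _, seconds in stages)

for kind, count, seconds in stages:
    print(kind, round(count / seconds / 1000), "k/s")  # 19053, 243, 4

print(hms(read_seconds))                  # 2h 10m 51s reading input
print(hms(TOTAL_SECONDS))                 # 3h 39m 22s overall
print(hms(TOTAL_SECONDS - read_seconds))  # 1h 28m 31s after reading finished
```

The three stage times add up to the reported 7851 seconds of input reading, which leaves roughly an hour and a half for everything that happens after the input is read.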

PostgreSQL 17 is about 3% better on this benchmark than PostgreSQL 16 when replication is used, thanks to improvements in the WAL infrastructure in PostgreSQL 17.

I look forward to following up on this benchmark in more detail, after my scorched Intel system is fully running again! Like the speed of the Postgres ecosystem, the pile of hardware I've benchmarked to death grows every year.

by Greg Smith (Greg.Smith@crunchydata.com) at November 19, 2024 02:30 PM

September 26, 2024

PostGIS Development

PostGIS 3.5.0

The PostGIS Team is pleased to release PostGIS 3.5.0! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.

3.5.0

source download md5
NEWS
PDF docs: en ja, fr, zh_Hans
HTML Online en ja fr zh_Hans
Cheat Sheets:
- postgis: en ja fr zh_Hans
- postgis_raster: en ja fr zh_Hans
- postgis_topology: en ja fr zh_Hans
- postgis_sfcgal: en ja fr zh_Hans
- address standardizer, postgis_tiger_geocoder: en ja fr zh_Hans

This release is a feature release that includes bug fixes since PostGIS 3.4.3, new features, and a few breaking changes.

by Regina Obe at September 26, 2024 12:00 AM

September 25, 2024

Crunchy Data

Vehicle Routing with PostGIS and Overture Data

The Overture Maps collection of data is enormous, encompassing over 300 million transportation segments, 2.3 billion building footprints, 53 million points of interest, and a rich collection of cartographic features as well. It is a consistent global data set, but it is intimidatingly large -- what can a person do with such a thing?

Building cartographic products is the obvious thing, but what about the less obvious? With an analytical engine like PostgreSQL and Crunchy Bridge for Analytics, what is possible? Well, it turns out, a lot of things.

Crunchy Data recently joined the Overture Maps Foundation as a continuation of support for open spatial data management and mapping. We are excited about building on what is possible bringing the power of Postgres to Overture open map data.

Routing with Overture

Back to thinking about what Overture and Postgres/PostGIS can do together. How about vehicle routing?

Not global routing, but something more tractable, and perhaps more useful (how many people have global routing problems?) such as local routing. In this walk-through we will:

extract enough Overture transportation data to perform useful local routing;
condition that data to be usable by the pgRouting engine; and,
actually run some routing queries.

Database Setup

For this example, we will be using the new Geospatial features of Crunchy Bridge for Analytics.

When creating a new cluster, click the drop-down option and select an "Analytics Cluster".

SET pgaudit.log TO 'none';

-- create the spatial analytics extension and postgis:
CREATE EXTENSION crunchy_spatial_analytics CASCADE;

-- to re-enable audit logging in the current session
RESET pgaudit.log;

Then enable the pgrouting extension.

CREATE EXTENSION pgrouting;

The database is ready to go!

Data Import

One of the things that makes the Overture data sets so enticing is the way they are hosted: in GeoParquet format on S3 and Azure object storage.

This alone would not be a big deal, but since spring 2024 the Overture data is spatially sorted. That means it is possible for a client with GeoParquet support to pull out a spatial subset of the data without having to scan the whole collection. We can get the 20 thousand features we want, without having to read through the whole 300 million feature collection.

The trickiest part of the data access is figuring out the URL to pull the Overture data from. The data are released approximately monthly, and each theme consists of multiple parquet files. For our purposes though, we can use the * character in the URL and the analytics code treats the collection of files as a single data set.

CREATE FOREIGN TABLE ov_segments ()
    SERVER crunchy_lake_analytics
    OPTIONS (path 's3://overturemaps-us-west-2/release/2024-08-20.0/theme=transportation/type=segment/*.parquet');

This SQL does not initiate a download. It creates a "foreign table", similar to a view, but in which the data is not stored locally on the database server. In this case, of course, the data resides on S3, and nothing has been downloaded yet.

So, we do not want to run SELECT * FROM ov_segments, for example, because that would download the entire contents of the collection. Instead, we should subset the download, and because the data are spatially sorted, we can do it efficiently with a spatial filter.

-- Use the same table definition as the FDW table
CREATE TABLE ov_segments_local ()
    INHERITS (ov_segments);

-- Query only those features that are within our area of interest
INSERT INTO ov_segments_local
    SELECT ov_segments.*
    FROM
        ov_segments,
        (VALUES ('LINESTRING(-123.455 48.391,-123.283 48.522)'::geometry)) AS t(q)
    WHERE (bbox).xmin >= ST_XMin(q)
      AND (bbox).xmax <= ST_XMax(q)
      AND (bbox).ymin >= ST_YMin(q)
      AND (bbox).ymax <= ST_YMax(q);

Despite addressing a data collection with 300 million records, the query returns 20 thousand records in a few seconds.
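
The WHERE clause above is a pure bounding-box comparison, which is what lets the reader skip most of the files. Here is a small Python sketch of that test; the `BBox` type and the "straddling" box are made up for illustration. Note that the query keeps only features whose box lies entirely inside the area of interest, so a segment that crosses the edge of the area is left out. An overlap test would keep it.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def within(feature, query):
    """Same comparisons as the WHERE clause: feature box inside query box."""
    return (feature.xmin >= query.xmin and feature.xmax <= query.xmax
            and feature.ymin >= query.ymin and feature.ymax <= query.ymax)


def overlaps(feature, query):
    """Looser alternative: the two boxes share at least one point."""
    return (feature.xmin <= query.xmax and feature.xmax >= query.xmin
            and feature.ymin <= query.ymax and feature.ymax >= query.ymin)


# Area of interest: the box spanned by the two LINESTRING endpoints
aoi = BBox(-123.455, 48.391, -123.283, 48.522)
# A box well inside the area of interest
inside = BBox(-123.3627, 48.4325, -123.3618, 48.43525)
# Hypothetical box that crosses the western edge of the area
straddling = BBox(-123.460, 48.400, -123.450, 48.410)

print(within(inside, aoi), overlaps(inside, aoi))          # True True
print(within(straddling, aoi), overlaps(straddling, aoi))  # False True
```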

Data Structure

We have transportation segments! Are we ready to start routing? Not yet.

We have several data structuring tasks to get the data ready for vehicle routing:

We need to further filter the segments down to those classes that participate in the vehicle network. No paths, no tracks, no segments currently under construction.
We need to convert the speed and road class attribution in Overture to a "cost" for pgRouting to apply to each edge.
We need to change the physical structure of the Overture segments to the pure edge/node structure that is used by pgRouting, and identify one-way segments.
We need to change the unique Overture UUIDs into integers for pgRouting.

The starting point for data structuring is the Overture segment feature, and it is a complex one!

-- Get a pretty JSON view of the data structure
SELECT row_to_json(ov_segments_local, true)
  FROM ov_segments_local
  WHERE id = '08828d1aac3fffff043df239fe1d3069';

Full JSON structure of a segment

{
  "id":"08828d1aac3fffff043df239fe1d3069",
  "geometry":{
    "type":"LineString",
    "coordinates":[[-123.3617848,48.4325098], ...[-123.3626606,48.4352395]]},
  "bbox":{
    "xmin":-123.3627,"xmax":-123.3618,
    "ymin":48.4325,"ymax":48.43525},
  "version":0,
  "sources":[
    {"property":"routes","dataset":"OpenStreetMap","record_id":"r8747097","update_time":null,"confidence":null},{"property":"","dataset":"OpenStreetMap","record_id":"w476265027","update_time":null,"confidence":null},
    {"property":"","dataset":"OpenStreetMap","record_id":"w494031748","update_time":null,"confidence":null}],
  "subtype":"road",
  "class":"primary",
  "names":{
    "primary":"BlanshardStreet",
    "common":null,
    "rules":[
      {"variant":"common","language":null,"value":"BlanshardStreet","between":null,"side":null}]},
  "connector_ids":[
    "08f28d1aac38818d0429ea4e482966af",
    "08f28d1aac38818d0429ea4e482246ae",
    "08f28d1aac28d6680473cb2c125fcd98"],
  "connectors":[
    {"connector_id":"08f28d1aac38818d0429ea4e482966af","at":0},
    {"connector_id":"08f28d1aac38818d0429ea4e482246ae","at":0.3},
    {"connector_id":"08f28d1aac28d6680473cb2c125fcd98","at":1}],
  "routes":[
    {"name":"Highway17(BC)(North)","network":"CA:BC","ref":"17","symbol":"https://upload.wikimedia.org/wikipedia/commons/7/76/BC-17.svg","wikidata":"Q918890","between":[0.856363,1]}],
  "subclass":null,
  "subclass_rules":null,
  "access_restrictions":[{
    "access_type":
      "denied",
      "when":{
        "during":null,
        "heading":"backward",
        "using":null,
        "recognized":null,
        "mode":null,
        "vehicle":null},
      "between":null}],
  "level_rules":null,
  "destinations":null,
  "prohibited_transitions":null,
  "road_surface":[{"value":"paved","between":null}],
  "road_flags":null,
  "speed_limits":[{
    "min_speed":null,
    "max_speed":{
      "value":50,
      "unit":"km/h"},
    "is_max_speed_variable":null,
    "when":null,
    "between":null}],
  "width_rules":null,
  "theme":"transportation",
  "type":"segment"
}

Filtering Class for Vehicle Routing

Fortunately we can do all our filtering for vehicle segments by using the class attribute of segments.

There are a lot of combinations of class and subclass:

SELECT DISTINCT class, subclass
  FROM ov_segments_local
  ORDER BY 1,2;

All the combinations of class and subclass

     class     |    subclass
---------------+----------------
 bridleway     |
 cycleway      | cycle_crossing
 cycleway      |
 footway       | crosswalk
 footway       | sidewalk
 footway       |
 living_street |
 motorway      | link
 motorway      |
 path          |
 pedestrian    |
 primary       | link
 primary       |
 residential   |
 secondary     | link
 secondary     |
 service       | alley
 service       | driveway
 service       | parking_aisle
 service       |
 steps         |
 tertiary      | link
 tertiary      |
 track         |
 trunk         | link
 trunk         |
 unclassified  |
               |

And of those many combinations, there are many segments we should exclude--paths, pedestrian, bridleways, and more!

Restricting to a few classes (motorway, primary, residential, secondary, tertiary, trunk, unclassified) results in a network that has only vehicle segments.

Converting Speed to Cost

Many of the segments in our collection of vehicle segments have a speed limit on them, but the model is a little complicated. Because segments can be quite long it is possible (though rare) for a single segment to have multiple speeds. So the Overture model for speed limits looks like this:

  "speed_limits": [{
     "min_speed": null,
     "max_speed": {
        "value": 50,
        "unit": "km/h"},
     "is_max_speed_variable": null,
     "when": null,
     "between": null}],

For simplicity, we will use the first available speed limit in this example, and apply it to the whole segment. To be more precise, we would split the segment into one edge for each speed limit.

For many segments, there is no speed limit provided, so for those we can use defaults and provide different defaults for different classes: a default speed limit for a trunk road might be 90km/hr, and a default for a residential street might be 40km/hr.

So, converting the speed limits to cost then looks like this:

Find the speed limit if there is one.
- Apply a class based default if there is not.
Convert any "miles per hour" limits to "kilometers per hour"
Convert to "meters per second".
Calculate the length of the segment in meters.
Calculate the time required to traverse the segment, in seconds.

The last step is the fun one: each segment is costed based on how long it takes to traverse it. This way a 1 kilometer segment with a speed limit of 100 km/h has half the cost of the same segment with a 50 km/h limit.
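
The arithmetic behind those steps fits in a few lines of Python. This is a sketch of the calculation only: the function names and the default speeds in `DEFAULT_KMPH` are illustrative assumptions, and the segment length is passed in as a number instead of being measured from a geometry.

```python
MPH_TO_KMPH = 1.60934

# Illustrative class defaults in km/h, used when no limit is recorded
DEFAULT_KMPH = {"service": 20.0, "residential": 30.0}
FALLBACK_KMPH = 40.0


def segment_kmph(speed, unit, road_class):
    """Return a speed in km/h, converting mph or falling back to a default."""
    if speed is not None:
        return speed * MPH_TO_KMPH if unit == "mph" else speed
    return DEFAULT_KMPH.get(road_class, FALLBACK_KMPH)


def segment_cost(length_meters, kmph):
    """Seconds needed to traverse the segment at the given speed."""
    meters_per_second = kmph * 1000.0 / 3600.0
    return length_meters / meters_per_second


fast = segment_cost(1000.0, segment_kmph(100, "km/h", "trunk"))
slow = segment_cost(1000.0, segment_kmph(50, "km/h", "primary"))
print(round(fast, 1), round(slow, 1))  # 36.0 72.0

# No recorded limit: the class default applies
print(segment_kmph(None, None, "residential"))  # 30.0
```

One kilometer at 100 km/h takes 36 seconds and at 50 km/h takes 72 seconds, which is the "half the cost" relationship described above.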

PL/PgSQL functions to convert speed limits into cost

--
-- Deal with kmph/mph units, and fill in any null
-- speed information with sensible defaults based
-- on the segment class.
--
CREATE OR REPLACE FUNCTION pgr_segment_kmph(speed float8, unit text, class text)
RETURNS FLOAT8 AS
$$
DECLARE
    default_kmph FLOAT8 := 40;
BEGIN

    -- Convert mph to kmph where necessary
    IF unit = 'mph' THEN
        speed := speed * 1.60934;
    END IF;

    IF speed IS NOT NULL THEN
        RETURN speed;
    END IF;

    -- Apply some defaults
    -- Should not be driving fast on service roads
    IF class = 'service' THEN
        speed := 20;
    -- Or on residential roads
    ELSIF class = 'residential' THEN
        speed := 30;
    -- Everywhere else, use the default
    ELSE
        speed := coalesce(speed, default_kmph);
    END IF;

    RETURN speed;

END;
$$ LANGUAGE 'plpgsql';

--
-- The cost to traverse a segment is the number of
-- seconds needed to traverse it, so distance over speed.
--
CREATE OR REPLACE FUNCTION pgr_segment_cost(geom geometry, speed_kmph float8)
RETURNS FLOAT8 AS
$$
DECLARE
    length_meters FLOAT8;
    meters_per_second FLOAT8;
BEGIN
    -- Geography length is in meters
    length_meters := ST_Length(geom::geography);

    -- Convert km/hour into meters/second
    meters_per_second := speed_kmph * 1000.0 / 3600.0;

    -- Segment cost is the number of seconds
    -- needed to traverse the segment
    RETURN length_meters / meters_per_second;
END;
$$ LANGUAGE 'plpgsql';

Identifying One-way Segments

One of the strangest aspects of the Overture model is the handling of one-way streets. Most models have a boolean "one way" flag, or maybe a "direction" attribute with "forward", "backward" and "both".

Overture models directionality as one of a number of possible "restrictions" on the segment. Here is the relevant JSON from our example segment:

  "access_restrictions":[{
    "access_type":
      "denied",
      "when":{
        "during":null,
        "heading":"backward",
        "using":null,
        "recognized":null,
        "mode":null,
        "vehicle":null},
      "between":null}],

So every segment has a list of restrictions, and "heading" is one of them, along with mode of transport, vehicle type, time period, and others. Because one-way is a pretty important restriction in a route planner, we cannot simply check the first restriction. We have to check every restriction on a segment and only set the "one way" flag if the "heading" restriction is non-null.
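
As a rough Python sketch of that scan, assuming the restrictions arrive as a list of dictionaries shaped like the JSON above: this version flags a segment as one-way when any restriction denies the "backward" heading. The function name and the first sample restriction are made up for illustration.

```python
def is_one_way(access_restrictions):
    """Return True if any restriction denies travel in the backward heading."""
    for restriction in access_restrictions or []:
        when = restriction.get("when") or {}
        if (restriction.get("access_type") == "denied"
                and when.get("heading") == "backward"):
            return True
    return False


# The first entry is a made-up restriction with no heading; the second
# has the same shape as the example segment's restriction.
restrictions = [
    {"access_type": "denied", "when": {"heading": None, "mode": ["hgv"]}},
    {"access_type": "denied", "when": {"heading": "backward", "mode": None}},
]

print(is_one_way(restrictions))      # True
print(is_one_way(restrictions[:1]))  # False
print(is_one_way(None))              # False
```

Checking only the first entry would have missed the one-way restriction here, which is why the whole list has to be walked.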

Converting from Overture Segments to pgRouting Edges

The most challenging aspect of preparing the Overture segments for pgRouting is the model transformation between "segments" and "edges".

The pgRouting graph is a simple structure of vertices and edges. Vertices are points and edges are defined as joining two vertices, so any edge can be characterized by stating its "source" and "target" vertex.

In the Overture graph, on the other hand, every segment connects at least two connectors.

A "source" and "target" connector alone are therefore not enough to characterize a segment, so Overture uses a list of connectors on the segment.

  "connectors":[
    {"connector_id":"08f28d1aac38818d0429ea4e482966af","at":0},
    {"connector_id":"08f28d1aac38818d0429ea4e482246ae","at":0.3},
    {"connector_id":"08f28d1aac28d6680473cb2c125fcd98","at":1}],

The unique identifier for each connector is given, and the at attribute provides the proportion along the edge where the connector appears. The 0 connector is at the start, the 1 connector is at the end, and the 0.3 connector is 30% of the distance between the start and the end.

So to convert from Overture "segments" to pgRouting "edges", we just need to iterate over the connectors list and apply the ST_LineSubstring function to chop the original segment into the right edges.
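
The iteration itself is simple enough to sketch in Python before looking at the database version. The function name and the shortened connector ids are hypothetical. Each output tuple carries the pair of fractions that would be handed to ST_LineSubstring, and sorting by the "at" value is an extra precaution of mine in case a list is not already ordered.

```python
def chop(connectors):
    """Return one (source_id, target_id, from_at, to_at) tuple per edge."""
    if not connectors:
        return []
    ordered = sorted(connectors, key=lambda c: c["at"])
    edges = []
    source = ordered[0]
    for target in ordered[1:]:
        if target["at"] == source["at"]:
            continue  # would produce a zero-length edge
        edges.append((source["connector_id"], target["connector_id"],
                      source["at"], target["at"]))
        source = target
    return edges


# Shortened, hypothetical connector ids with the same "at" values as above
connectors = [
    {"connector_id": "c0", "at": 0},
    {"connector_id": "c1", "at": 0.3},
    {"connector_id": "c2", "at": 1},
]

for edge in chop(connectors):
    print(edge)
# ('c0', 'c1', 0, 0.3)
# ('c1', 'c2', 0.3, 1)
```

Three connectors give two edges, and in general a segment with n distinct connector positions gives n - 1 edges.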

A PL/PgSQL function to chop Overture segments into edges

--
-- Create a simple table that reflects some of the
-- input we have generated (speed, directionality)
-- and mirrors some other useful info (surface,
-- primary name) for mapping purposes.
-- Most importantly, carry out the chopping of segments
-- into edges with only two graph connectors, one at
-- the start and one at the end.
--
CREATE OR REPLACE FUNCTION ov_to_pgr(segment ov_segments)
RETURNS TABLE(
    id text,
    geometry geometry(LineString, 4326),
    connector_source text,
    connector_target text,
    class text,
    subclass text,
    surface text,
    speed_kmph real,
    primary_name text,
    one_way boolean
) AS
$$
DECLARE
    n integer;
    connector_to float8;
    connector_from float8 := 0.0;
BEGIN

    -- Carry over some attributes directly
    id := segment.id;
    class := segment.class;
    subclass := segment.subclass;
    primary_name := (segment.names).primary;
    -- Take the first surface we see rather than
    -- chopping up the segment here
    surface := segment.road_surface[1].value;

    speed_kmph := pgr_segment_kmph(
        segment.speed_limits[1].max_speed.value,
        segment.speed_limits[1].max_speed.unit,
        segment.class);

    -- Most edges are two-way, but a few are one-way, flag
    -- those so we can adjust the cost later
    one_way := false;
    IF segment.access_restrictions IS NOT NULL THEN

        -- Overture uses "backward" access restrictions
        -- for one-way segments, and the restriction can
        -- show up anywhere in the list, so...
        n := array_length(segment.access_restrictions, 1);
        FOR i IN 1..n LOOP
            IF segment.access_restrictions[i].access_type = 'denied' AND segment.access_restrictions[i].when.heading = 'backward' THEN
                one_way := true;
                EXIT;
            END IF;
        END LOOP;
    END IF;

    -- Chop segments into edges with vertexes at
    -- the connectors. Each edge has two connectors
    -- (one at each end) so a list of 3 connectors
    -- implies outputting 2 edges.
    connector_target := segment.connectors[1].connector_id;
    connector_to := 0.0;
    n := array_length(segment.connectors, 1);
    FOR i IN 2..n LOOP

        -- Avoid emitting zero-length segments
        IF connector_to = segment.connectors[i].at THEN
            CONTINUE;
        END IF;
        connector_from := connector_to;
        connector_source := connector_target;
        connector_to := segment.connectors[i].at;
        connector_target := segment.connectors[i].connector_id;

        -- This is where we chop!
        geometry := ST_SetSRID(ST_LineSubstring(segment.geometry, connector_from, connector_to),4326);

        -- Table-valued output means the return fills
        -- in the output parameters for us magically,
        -- as long as we have used the correct variable
        -- names.
        RETURN NEXT;
    END LOOP;

END;
$$ LANGUAGE 'plpgsql';

Creating a table of Connectors

In order to actually run routing on our final data, we are going to need a table of network vertices, so that we can figure what "source" vertex and "target" vertex correspond to a particular pair of routing points.

It would seem that the Overture connector file would provide an easy method to get those points, but unfortunately I discovered while testing this process that the file is incomplete. Not all of the connectors referenced in the segments type appear in the connectors type.

Fortunately, there is another place a complete list of connectors appears: in the connectors attribute of the segments:

  "connectors":[
    {"connector_id":"08f28d1aac38818d0429ea4e482966af","at":0},
    {"connector_id":"08f28d1aac38818d0429ea4e482246ae","at":0.3},
    {"connector_id":"08f28d1aac28d6680473cb2c125fcd98","at":1}],

Using the segment geometry and the connectors list, it is possible to materialize (with ST_LineInterpolatePoint) a complete list of all connectors associated with the segments in our tables.
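
Here is what that materialization amounts to, as a planar Python sketch. The `interpolate` helper is my own stand-in for the database function, and the two toy segments and their connector ids are invented. Each connector's position is found by walking the given fraction along its segment, and a connector that appears on several segments is kept only once.

```python
import math


def interpolate(coords, fraction):
    """Planar point at the given fraction (0..1) along a list of (x, y)."""
    pairs = list(zip(coords, coords[1:]))
    lengths = [math.dist(a, b) for a, b in pairs]
    remaining = fraction * sum(lengths)
    for (a, b), length in zip(pairs, lengths):
        if length > 0 and remaining <= length:
            t = remaining / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length
    return coords[-1]


# Two toy segments that meet at connector "b"
segments = [
    {"coords": [(0, 0), (10, 0)],
     "connectors": [{"connector_id": "a", "at": 0}, {"connector_id": "b", "at": 1}]},
    {"coords": [(10, 0), (10, 10)],
     "connectors": [{"connector_id": "b", "at": 0}, {"connector_id": "c", "at": 1}]},
]

vertices = {}
for segment in segments:
    for connector in segment["connectors"]:
        point = interpolate(segment["coords"], connector["at"])
        # Keep the first location seen for each connector id (the dedupe step)
        vertices.setdefault(connector["connector_id"], point)

print(vertices)
print(interpolate([(0, 0), (3, 0), (3, 4)], 0.5))  # (3.0, 0.5)
```

Connector "b" shows up on both segments but lands in the result once, which is the same job the DISTINCT ON clause does in SQL.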

A SQL query to generate connectors from connector list

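-- The vertex ids below are drawn from a sequence,
-- so make sure it exists before building the table
CREATE SEQUENCE IF NOT EXISTS pgr_connector_seq;

DROP TABLE IF EXISTS pgr_connectors;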
CREATE TABLE pgr_connectors AS
    WITH connectors AS (
        SELECT (unnest(connectors)).*, geometry
        FROM ov_segments_local
        WHERE class IN ('motorway', 'primary', 'residential', 'secondary', 'tertiary', 'trunk', 'unclassified')
    )
    -- Unfortunately a connector will show up on every segment
    -- it connects, so we need to dedupe the set, which can be costly
    -- for larger areas.
    SELECT DISTINCT ON (connector_id)
        nextval('pgr_connector_seq') AS vertex_id,
        connector_id,
        ST_SetSRID(ST_LineInterpolatePoint(geometry, at),4326)::geometry(point, 4326) AS geometry
    FROM connectors;

CREATE INDEX pgr_connectors_x ON pgr_connectors (connector_id);
CREATE INDEX pgr_connectors_geom_x ON pgr_connectors USING GIST (geometry);

Data Processing

I have outlined individual components, but thus far have not yet integrated them into a sequential process to convert raw Overture GeoParquet to pgRouting compatible tables.

Here is the complete process, roughly:

Create an FDW table ov_segments referencing the raw Overture files online.
Pull a local copy of that table, ov_segments_local, only for our area of interest.
Process the ov_segments_local table, chopping segments into edges, and copying some attributes of interest into a pgr_segments table.
Process the ov_segments_local table, pulling out a unique list of connectors and connector geometry into a pgr_connectors table.
Process the pgr_segments table, adding integer unique keys for edge and vertex identification, creating the final pgr_edges table ready for routing.
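
The last step can be sketched in Python as follows. This is an assumption about how the final table could be assembled, not a copy of the actual SQL: text connector ids are swapped for small integers, and one-way edges get a negative reverse cost, which pgRouting treats as "not traversable in that direction". The input field names `seconds` and `one_way` are hypothetical.

```python
def build_edges(edges):
    """Assign integer ids and routing costs to chopped edges."""
    vertex_ids = {}

    def vertex_id(connector_id):
        # First time we see a connector, hand out the next integer
        return vertex_ids.setdefault(connector_id, len(vertex_ids) + 1)

    rows = []
    for edge_id, edge in enumerate(edges, start=1):
        rows.append({
            "edge_id": edge_id,
            "source_vertex_id": vertex_id(edge["connector_source"]),
            "target_vertex_id": vertex_id(edge["connector_target"]),
            "cost": edge["seconds"],
            # A negative cost tells pgRouting the direction is closed
            "reverse_cost": -1.0 if edge["one_way"] else edge["seconds"],
        })
    return rows, vertex_ids


# Hypothetical chopped edges with traversal times in seconds
edges = [
    {"connector_source": "c0", "connector_target": "c1", "seconds": 36.0, "one_way": False},
    {"connector_source": "c1", "connector_target": "c2", "seconds": 72.0, "one_way": True},
]

rows, vertex_ids = build_edges(edges)
for row in rows:
    print(row)
print(vertex_ids)  # {'c0': 1, 'c1': 2, 'c2': 3}
```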

All the functions and the overall process are available in the overture.sql file.

Routing

After all the work, we are ready to route, which should be straightforward, right? We have pgRouting data ready, with low costs on the fast streets and higher costs on the slow streets.

Unfortunately there are still a few pieces of code left to write, because pgRouting provides a very low level generic graph solver and most people solving routing problems have more specific needs.

For example,

pgRouting expects the start- and end-points of a route to be specified using a vertex id, (like these red dots) but
most people working with spatial routing are dealing with start- and end-points that are coordinates (like the green triangle)

So we need to start our routing function by translating from locations to vertex identifiers.

And also,

pgRouting returns route results as a list of edge ids, but
most people working with spatial routing want, at a minimum, a linestring representation of the route to put on a map.

So we need to end our routing function by joining the edge identifiers back to the edges table to create the route geometry.

To drive the pgr_dijkstra() function, we need to provide a SQL statement that generates a list of edges and source/target vertices, and for this example, we pull all 13902 edges from the pgr_edges table.

The final function looks like this:

CREATE OR REPLACE FUNCTION pgr_routeline(pt0 geometry, pt1 geometry)
RETURNS TEXT AS
$$
DECLARE
    vertex0 bigint;
    vertex1 bigint;
    edges_sql text;
    result text;
BEGIN

    -- Lookup the nearest vertex to our start and end geometry
    SELECT vertex_id INTO vertex0 FROM pgr_connectors ORDER BY geometry <-> pt0 LIMIT 1;
    SELECT vertex_id INTO vertex1 FROM pgr_connectors ORDER BY geometry <-> pt1 LIMIT 1;
    RAISE DEBUG 'vertex0=% vertex1=%', vertex0, vertex1;

    --
    -- SQL to create a pgRouting graph
    -- This is as simple as they come.
    -- More complex approaches might
    --  * scale cost based on class
    --  * restrict edges based on box formed
    --    by start/end points
    --  * restrict edges based on class
    --
    edges_sql := 'SELECT
            edge_id AS id,
            source_vertex_id AS source,
            target_vertex_id AS target,
            cost, reverse_cost
        FROM pgr_edges';

    -- Run the Dijkstra shortest path and join back to edges
    -- to create the path geometry
    SELECT ST_AsGeoJSON(ST_Union(e.geometry))
        INTO result
        FROM pgr_dijkstra(edges_sql, vertex0, vertex1) pgr
        JOIN pgr_edges e
        ON e.edge_id = pgr.edge;

    RETURN result;

END;
$$ LANGUAGE 'plpgsql';

To run the function and get back the route, feed it two points located within the area of your downloaded data.

SELECT pgr_routeline(
    ST_Point(-123.37826,48.41976, 4326),
    ST_Point(-123.35214,48.43891, 4326));
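
For readers who want to see the three stages without a database, here is a self-contained Python sketch on a four-vertex toy graph. It is a stand-in for the idea, not for pgRouting itself: nearest-vertex lookup is a brute-force scan, the shortest path is a textbook Dijkstra, and the graph, ids, and costs are invented. A negative cost marks a direction that cannot be travelled.

```python
import heapq
import math


def nearest_vertex(vertices, point):
    """Stage 1: translate a coordinate into the closest vertex id."""
    return min(vertices, key=lambda v: math.dist(vertices[v], point))


def dijkstra(edges, source, target):
    """Stage 2: return the edge ids on the cheapest path, or [] if none."""
    graph = {}
    for e in edges:
        if e["cost"] >= 0:
            graph.setdefault(e["source"], []).append((e["target"], e["cost"], e["id"]))
        if e["reverse_cost"] >= 0:
            graph.setdefault(e["target"], []).append((e["source"], e["reverse_cost"], e["id"]))
    best = {source: 0.0}
    came_from = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, cost, edge_id in graph.get(node, []):
            if dist + cost < best.get(nxt, math.inf):
                best[nxt] = dist + cost
                came_from[nxt] = (node, edge_id)
                heapq.heappush(queue, (dist + cost, nxt))
    if target not in best:
        return []
    path, node = [], target
    while node != source:
        node, edge_id = came_from[node]
        path.append(edge_id)
    return path[::-1]


vertices = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
edges = [
    {"id": 10, "source": 1, "target": 2, "cost": 5, "reverse_cost": 5},
    {"id": 11, "source": 2, "target": 3, "cost": 5, "reverse_cost": 5},
    {"id": 12, "source": 1, "target": 4, "cost": 2, "reverse_cost": 2},
    {"id": 13, "source": 4, "target": 3, "cost": 2, "reverse_cost": -1},  # one-way
]

start = nearest_vertex(vertices, (0.1, 0.1))
end = nearest_vertex(vertices, (0.9, 0.9))
print(dijkstra(edges, start, end))  # [12, 13]
print(dijkstra(edges, end, start))  # [11, 10] because edge 13 is one-way
```

Stage 3, turning the edge ids back into a line for the map, is the join against the edges table in the SQL function.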

Resources

To see all the functions, tables and SQL for this example, check out the overture.sql file.
The pictures in this example were created using QGIS.
The documentation of Crunchy Bridge for Analytics' spatial features has more on accessing Overture data.

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at September 25, 2024 01:30 PM

September 16, 2024

PostGIS Development

PostGIS 3.5.0rc1

The PostGIS Team is pleased to release PostGIS 3.5.0rc1! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.

3.5.0rc1

This release is a release candidate of a major release, it includes bug fixes since PostGIS 3.4.3 and new features.

Changes since 3.5.0beta1 are as follows:

#5779 Failures building in parallel mode (Sandro Santilli)
#5778 Sections missing in What’s new (Regina Obe)

by Regina Obe at September 16, 2024 12:00 AM

PostGIS Development

PostGIS 3.5.0beta1

The PostGIS Team is pleased to release PostGIS 3.5.0beta1! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.

3.5.0beta1

This release is a beta of a major release, it includes bug fixes since PostGIS 3.4.3 and new features.

by Regina Obe at September 16, 2024 12:00 AM

September 09, 2024

Crunchy Data

PostGIS meets DuckDB: Crunchy Bridge for Analytics goes Spatial

Crunchy Data is excited to announce the next major feature release for Crunchy Bridge for Analytics: Geospatial Analytics.

We have developed a variety of features to connect Postgres and PostGIS to S3 and public web servers to make spatial data access easier than ever.

This release includes:

Creating an analytics table directly from a geospatial data set by providing only the URL, for ad-hoc queries and data transformations.
Creating a regular PostGIS table directly from a URL.
Automatic mapping of geospatial columns into PostGIS geometry type.
Support for GeoParquet, GeoJSON, Shapefile (zip), Geopackage, WKT in CSV, and more.
Delegation of PostGIS functions and operators to DuckDB spatial for fast queries on GeoParquet.

Together, these make Crunchy Bridge for Analytics an easy-to-use and powerful platform for working with geospatial data.

Query almost any geospatial data set with one easy command

PostGIS is the most popular and versatile geospatial data processing tool available, and the underlying GEOS library powers most other geospatial applications. Crunchy has a long history in PostGIS and geospatial, and we’re lucky to count geospatial legends Paul Ramsey and Martin Davis (see: PostGIS, GEOS, JTS, pg_featureserv, pg_tileserv, and more) among our colleagues.

Crunchy Bridge for Analytics enhances PostgreSQL with the ability to run fast analytical queries on data files in S3 and public web servers, with queries accelerated using DuckDB and caching on local NVMe drives. DuckDB also has a spatial extension built on top of GEOS and inspired by PostGIS.

It was natural for us to look for ways in which we can take advantage of the capabilities offered by Bridge for Analytics for geospatial use cases. We soon realized that one of the challenges of geospatial data is the wide variety of formats and data sources, and the relative difficulty of getting them into PostgreSQL.

By leveraging the capabilities built into Bridge for Analytics, we’ve managed to simplify the experience of accessing any geospatial data set via s3 or https in PostgreSQL down to a very simple create foreign table command:

-- Create a table from the overture buildings data set,
-- auto-infers columns, caches GeoParquet files in the background
create foreign table ov_buildings ()
server crunchy_lake_analytics
options (path 's3://overturemaps-us-west-2/release/2024-08-20.0/theme=buildings/type=*/*.parquet');

-- Immediately start querying the >2 billion row data set,
-- uses range requests until files get cached
select (names).primary as building, st_area(geometry, true) as surface_m2
from ov_buildings
where (names).primary is not null
and (bbox).xmin <= 7.2275
and (bbox).xmax >= 3.3583
and (bbox).ymin <= 53.6316
and (bbox).ymax >= 50.7504
order by st_area(geometry) desc limit 1;
┌─────────────────────────┬───────────────────┐
│        building         │    surface_m2     │
├─────────────────────────┼───────────────────┤
│ Bloemenveiling Aalsmeer │ 449485.2894157285 │
└─────────────────────────┴───────────────────┘
(1 row)

Time: 10169.907 ms (00:10.170)

Queries on GeoParquet are significantly accelerated by DuckDB, and files will get automatically cached in the background. For instance, the ~600GB Overture data set can be fully cached on larger analytics clusters, which makes analytics tables a practical tool for building applications with Overture.

Support for geospatial formats is not limited to GeoParquet. You can directly create a table from Shapefile (in zip), GeoJSON, GeoPackage, Geodatabase, KML, and many other file formats supported by the GDAL library, and you can use public URLs to get data directly from the source.

-- Load US state boundaries from a compressed TIGER/Line Shapefile
create foreign table state ()
server crunchy_lake_analytics
options (format 'gdal', path 'https://www2.census.gov/geo/tiger/TIGER2023/STATE/tl_2023_us_state.zip');

-- Inspect auto-inferred schema
\d state
                     Foreign table "public.state"
┌──────────┬──────────┬───────────┬──────────┬─────────┬─────────────┐
│  Column  │   Type   │ Collation │ Nullable │ Default │ FDW options │
├──────────┼──────────┼───────────┼──────────┼─────────┼─────────────┤
│ region   │ text     │           │          │         │             │
│ division │ text     │           │          │         │             │
...
│ geom     │ geometry │           │          │         │             │
└──────────┴──────────┴───────────┴──────────┴─────────┴─────────────┘
Server: crunchy_lake_analytics
FDW options: (path 'https://www2.census.gov/geo/tiger/TIGER2023/STATE/tl_2023_us_state.zip');

-- What are the biggest states?
select name, st_area(geom, true)/1000000 area_in_km2
from state
order by 2 desc limit 10;
┌────────────┬───────────────────┐
│    name    │    area_in_km2    │
├────────────┼───────────────────┤
│ Alaska     │ 1724364.048632004 │
│ Texas      │ 695668.3746231933 │
│ California │ 423965.0992563212 │
│ Montana    │ 380840.4022201886 │
│ New Mexico │ 314925.0846268172 │
│ Arizona    │ 295220.1394989747 │
│ Nevada     │ 286376.9475553515 │
│ Colorado   │ 269604.5427509235 │
│ Oregon     │ 254799.4066699504 │
│ Wyoming    │ 253326.2430649384 │
└────────────┴───────────────────┘
(10 rows)

Time: 802.659 ms

Queries on GDAL data sets are currently slower than on GeoParquet, but the files will be immediately cached on disk when creating the table, so they are only downloaded once. On very rare occasions when the server is replaced, or after the file was evicted from cache, the file is automatically re-downloaded on demand.

There are several existing tools for loading data into PostGIS, though they are relatively laborious, and usually involve downloading large files to your computer and subsequently re-uploading the output. The ogr_fdw extension by Paul is probably the most versatile geospatial data access option available for PostgreSQL, though it will re-request remote data files for every query and is hence more suitable for accessing remote databases and web services with filter pushdown.

Building geospatial data pipelines with PostGIS

Once you’ve created an analytics table, you can start building a data transformation pipeline to get the data into the shape you want via (materialized) views.

For instance, a very simple pipeline might look like:

-- Create an analytics table for ad-hoc queries and transformations
create foreign table state ()
server crunchy_lake_analytics
options (path 'https://www2.census.gov/geo/tiger/TIGER2023/STATE/tl_2023_us_state.zip');

-- Create a materialized view for rendering a simple bar chart with sub-millisecond query time
create materialized view states_by_size as
select stusps, name, st_area(geom, true)/1000000 area_in_km2 from state;

You can also combine multiple data sets with spatial joins and compose views:

-- National Forest System boundaries (Shapefile)
create foreign table forests ()
server crunchy_lake_analytics
options (path 'https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.AdministrativeForest.zip');

-- Fire occurrence points in the US (Shapefile)
create foreign table fires ()
server crunchy_lake_analytics
options (path 'https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.MTBS_FIRE_OCCURRENCE_PT.zip');

-- Only consider fires in national forests in 2022
create view nfs_fires_in_2022 as
select fires.*, forests.adminfores
from forests, fires
where st_within(fires.geom, forests.geom)
and date_trunc('year', ig_date) = '2022-01-01';

-- Find the forests which had fires in 2022
create view forests_with_fires_in_2022 as
select *
from forests
where adminfores in (
  select adminfores from nfs_fires_in_2022
);

Finally, we also made it very straightforward to create a regular heap table with a PostGIS geometry column directly from a public geospatial data set by setting the load_from option in a create table command.

-- Create a regular table with an index from a Shapefile zip (note: WITH uses = syntax)
create table forests ()
with (load_from = 'https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.AdministrativeForest.zip');
-- Add a spatial index
create index on forests using gist (geom);

-- You can also load data into an existing table using COPY, assuming the schemas match
copy forests from 'https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.AdministrativeForest.zip';

You can see we have a lot of options here, so some general guidance:

create foreign table + create view - for ad-hoc queries and transformations on current data
create foreign table + create materialized view + create index - for repeated selective queries on data sets that occasionally need to be refreshed
create table with load_from + create index - loading the table data directly into PostGIS as a one-off

Overall, our aim is to give you a powerful toolbox for geospatial data, while also simplifying common scenarios down to very simple operations (create foreign table, create view, start rendering).

Connecting QGIS to Crunchy Bridge for Analytics

Since Crunchy Bridge for Analytics is just PostgreSQL, you can directly create a connection to your Analytics cluster from QGIS and add your (foreign) tables and views as layers, which means you can very quickly go from geospatial data set to visualization.

For example, the 4 commands for creating the forest views from the previous section can give you a map of national forests which had fires in 2022 and where those fires occurred:

qgis from s3

Note that QGIS by default requires that the first column of the table is unique. This is quite often the case, but when it’s not you may need to create a view to reorder the columns or add a unique value.

PostGIS combined with DuckDB spatial

Under the covers, Crunchy Bridge for Analytics takes advantage of DuckDB spatial. It is an awesome DuckDB extension, though it is still in an early stage of development. We map PostGIS functions and operators to DuckDB spatial functions where possible to accelerate analytical queries, and otherwise pull geometries into PostGIS, such that any query works as expected.

By default, geometry values in analytics tables have SRID set to 0/unspecified. You can set the SRID using st_setsrid as usual to make functions such as st_distance return the right units, but that will happen in PostgreSQL and transferring the geometries from DuckDB to PostgreSQL might slow down some queries. On the other hand, you can easily transfer the data set into a regular table or materialized view with an index if needed.

For queries on (Geo)Parquet, the speedup from DuckDB can be quite significant, so it may be worth avoiding SRIDs. You can check explain verbose to see which part of the query is delegated to DuckDB.

Get started with Geospatial Analytics and tell us your thoughts!

We believe this initial geospatial analytics release helps to bridge the gap of going from raw geospatial data files into a structured/indexed PostGIS table. These new features can help bootstrap many geospatial applications.

We’re excited to share this new feature with customers and get feedback and continue to build out the next generation of spatial analytics.

Geospatial analytics is available today on Crunchy Bridge, and it only takes a few minutes to get started. See our spatial analytics documentation for additional details.

by Marco Slot (Marco.Slot@crunchydata.com) at September 09, 2024 02:00 PM

September 05, 2024

PostGIS Development

PostGIS 3.3.7

The PostGIS Team is pleased to release PostGIS 3.3.7! This is a bug fix release.

3.3.7

by Paul Ramsey at September 05, 2024 12:00 AM

September 04, 2024

PostGIS Development

PostGIS 3.4.3

The PostGIS Team is pleased to release PostGIS 3.4.3!

This version requires PostgreSQL 12-17, GEOS 3.8+, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.5+ is needed.

3.4.3

source download md5
NEWS
PDF docs: en, ja, fr

by Paul Ramsey at September 04, 2024 12:00 AM

July 06, 2024

PostGIS Development

PostGIS 3.5.0alpha2

The PostGIS Team is pleased to release PostGIS 3.5.0alpha2! Best Served with PostgreSQL 17 Beta2 and GEOS 3.12.2.

This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4-1.5 is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5 is needed.

3.5.0alpha2

This release is an alpha of a major release; it includes bug fixes since PostGIS 3.4.2 and new features.

by Regina Obe at July 06, 2024 12:00 AM

July 04, 2024

PostGIS Development

PostGIS 3.5.0alpha1

The PostGIS Team is pleased to release PostGIS 3.5.0alpha1! Best Served with PostgreSQL 17 Beta2 and GEOS 3.12.2.

This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.5.0+ is needed.

3.5.0alpha1

This release is an alpha of a major release; it includes bug fixes since PostGIS 3.4.2 and new features.

by Regina Obe at July 04, 2024 12:00 AM

May 23, 2024

Crunchy Data

Converting DMS to PostGIS Point Geometry

I love taking random spatial data and turning it into maps. Any location data can be put into PostGIS in a matter of minutes. Often when I’m working with data that humans collected, like historic locations or things that have not traditionally been handled as computational data, I’ll find traditional Degrees, Minutes, Seconds (DMS) data. To get this into PostGIS and QGIS, you’ll need to convert it to decimal degrees. There are probably proprietary tools that will do this for you, but we can easily write our own code to do it. Let’s walk through a quick example today.

Let’s say I found myself with a list of coordinates that look like this:

38°58′17″N 95°14′05″W

(this is the location of my town’s haunted hotel 👻)

This format of writing geographic coordinates is called Degrees, Minutes, Seconds (DMS). If you remember from 4th grade geography lessons, the value on the left is the latitude, measured north or south of the equator, and the value on the right is the longitude, measured east or west of the Prime Meridian.

WKT & XY coordinates

PostGIS, and most computational spatial systems, work with a geographic system that is akin to an XY grid of the entire planet. Because it is XY, it is a longitude, latitude (X first) system.

postgis on xy globe

PostGIS works with two representations of geometry values:

WKT (Well-known text), where a point looks like this: POINT(-126.4 45.32)
WKB (Well-known binary), where a point, shown as hex, looks like this: 0101000000000000000000F03F000000000000F03F (this particular value is POINT(1 1))

Most often you’ll see the binary used to represent stored data and you can use a function, st_astext, to view or query it as text.
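To show what is packed into that binary form, here is a minimal Python sketch that decodes a hex WKB point by hand. It assumes plain 2D WKB without an embedded SRID (PostGIS can also emit an extended variant that carries one), and the function name decode_wkb_point is made up for this illustration.

```python
import struct

def decode_wkb_point(hex_wkb):
    """Decode a plain 2D WKB point (no SRID) given as a hex string."""
    raw = bytes.fromhex(hex_wkb)
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if raw[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type as a 32-bit integer; 1 means Point.
    (geom_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError("this sketch only handles plain 2D points")
    # Bytes 5-20 hold X and Y as two 64-bit floats.
    x, y = struct.unpack(byte_order + "dd", raw[5:21])
    return f"POINT({x:g} {y:g})"

print(decode_wkb_point("0101000000000000000000F03F000000000000F03F"))
# POINT(1 1)
```

In the database itself, st_astext does this for you, so a decoder like this is only useful for seeing how the bytes are laid out.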

Converting coordinates to decimal degrees

To convert our traditional coordinates into decimal degrees, we can use decimal math like this for the longitude (and the same again for the latitude):

{long_degree}+({long_minutes}/60)+({long_seconds}/3600)

So for our location:

-- starting location
38°58′17″N 95°14′05″W

-- formula
38+(58/60)+(17/3600), 95+(14/60)+(05/3600)

-- switch the order since this is X first
-- make the Western quad negative
-- getting this result

 -95.2347222, 38.9713889
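The same arithmetic can be checked with a few lines of Python. This is just a sketch of the formula above; the function name dms_to_decimal is chosen for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert one DMS value to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are on the negative side of the XY grid.
    return -value if hemisphere in ("S", "W") else value

lat = dms_to_decimal(38, 58, 17, "N")
lon = dms_to_decimal(95, 14, 5, "W")

# X (longitude) first, then Y (latitude)
print(f"{lon:.7f}, {lat:.7f}")
# -95.2347222, 38.9713889
```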

Regex Function for Making PostGIS Points out of DMS

If you have one location like this, you probably have a lot, so we’ll need a more sophisticated solution for our whole data set. You know if you need something done right, you ask Paul Ramsey. Paul worked with me on writing this function, which converts DMS text to PostGIS-friendly (binary geometry) point data.

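CREATE OR REPLACE FUNCTION dms_to_postgis_point(dms_text TEXT)
    RETURNS geometry AS
    $$
    DECLARE
        -- Groups 1-4: latitude degrees, minutes, seconds, hemisphere (N/S)
        -- Groups 5-8: longitude degrees, minutes, seconds, hemisphere (E/W)
        dms TEXT[] := regexp_match(dms_text, '(\d+)\D+(\d+)\D+(\d+)\D+([NS])\D+(\d+)\D+(\d+)\D+(\d+)\D+([EW])');
        lat float8;
        lon float8;
    BEGIN
        -- Decimal degrees = degrees + minutes/60 + seconds/3600
        lat := dms[1]::float8 + dms[2]::float8/60 + dms[3]::float8/3600;
        lon := dms[5]::float8 + dms[6]::float8/60 + dms[7]::float8/3600;
        -- South of the equator is negative
        IF upper(dms[4]) = 'S' THEN
            lat := -1 * lat;
        END IF;

        -- West of the Prime Meridian is negative
        IF upper(dms[8]) = 'W' THEN
            lon := -1 * lon;
        END IF;

        -- X (longitude) first; the SRID argument to ST_Point needs PostGIS 3.2+
        RETURN ST_Point(lon, lat, 4326);
    END;
    $$
    LANGUAGE plpgsql
    IMMUTABLE
    STRICT;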

Let’s do a quick test with our original point:

SELECT st_astext(dms_to_postgis_point('38°58′17″N 95°14′05″W'));

                  st_astext
---------------------------------------------
 POINT(-95.23472222222222 38.97138888888889)
(1 row)

Great, that works.
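Before running the function over a whole table, it can help to check which strings the pattern would actually accept. Here is a small Python sketch that reuses the same regular expression outside the database; the helper name parse_dms and the sample strings are made up for this illustration, and Python's regex engine is assumed to treat this simple pattern the same way PostgreSQL does.

```python
import re

# Same pattern as in the SQL function above.
DMS_PATTERN = re.compile(
    r"(\d+)\D+(\d+)\D+(\d+)\D+([NS])\D+(\d+)\D+(\d+)\D+(\d+)\D+([EW])"
)

def parse_dms(dms_text):
    """Return (lon, lat) in decimal degrees, or None if the text does not match."""
    match = DMS_PATTERN.search(dms_text)
    if match is None:
        return None
    lat_d, lat_m, lat_s, ns, lon_d, lon_m, lon_s, ew = match.groups()
    lat = int(lat_d) + int(lat_m) / 60 + int(lat_s) / 3600
    lon = int(lon_d) + int(lon_m) / 60 + int(lon_s) / 3600
    if ns == "S":
        lat = -lat
    if ew == "W":
        lon = -lon
    return (lon, lat)

samples = ["38°58′17″N 95°14′05″W", "38.9714, -95.2347"]
for text in samples:
    print(text, "->", parse_dms(text))
```

The second sample is already in decimal degrees, so the pattern finds no hemisphere letters and the helper returns None. In the SQL function an unmatched string likewise ends up as a NULL geometry, so rows like that are worth finding before the update.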

Creating a new column with your geometry

Now we can add a new geom column and run the function on our old lat_long column. The column is declared with SRID 4326 to match the points the function returns.

ALTER TABLE my_table ADD COLUMN geom geometry(Point, 4326);

UPDATE my_table SET geom = dms_to_postgis_point(lat_long);

Conclusion

PostGIS is just packed with so many cool functions to make sure you can turn anything into maps. Hope this helps you get started if you’re using traditional lat long data.

by Elizabeth Christensen (Elizabeth.Christensen@crunchydata.com) at May 23, 2024 05:00 PM

March 19, 2024

Crunchy Data

Inside PostGIS: Calculating Distance

Calculating distance is a core feature of a spatial database, and the central function in many analytical queries.

"How many houses are within the evacuation radius?"
"Which responder is closest to the call?"
"How many more miles until the school bus needs routine maintenance?"

PostGIS and any other spatial database let you answer these kinds of questions in SQL, using ST_Distance(geom1, geom2) to return a distance, or ST_DWithin(geom1, geom2, radius) to return a true/false result within a tolerance.

SELECT ST_Distance(
  'LINESTRING (150 300, 226 274, 320 280, 370 320, 390 370)'::geometry,
  'LINESTRING (140 180, 250 230, 350 200, 390 240, 450 200)'::geometry
);

It all looks very simple, but under the covers there is a lot of machinery around getting a result fast for different kinds of inputs.

Distance Under the Covers

Distance should be easy! After all, we learn how to calculate distance in middle school! The Pythagorean Theorem tells us that the square of the hypotenuse of a right triangle is the sum of the squares of the two other sides.

Pythagoras Proof by Rearrangement

So, problem solved, right?

Not so fast. Pythagoras gives us the distance between two points, but objects in spatial databases like PostGIS can be much more complex.

Complex Polygons

How would I calculate the distance between two complex polygons?

Brute Force

The straightforward solution is to just find the distance between every possible combination of edges in the two polygons, and return the minimum of that set.

Brute force distance

This is a "quadratic" algorithm, what computer scientists call O(n^2), because the amount of work it generates is proportional to the square of the number of inputs. As the inputs get big, the amount of work gets very very very big.
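To make the brute-force idea concrete, here is a minimal Python sketch over plain coordinate lists. It is not the PostGIS implementation: it only looks at edges (so a polygon lying entirely inside another would not be reported as distance zero), and all function names are made up for this illustration. Pythagoras shows up as math.hypot.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment's line and clamp to the segment itself.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def orientation(a, b, c):
    """Sign tells whether c lies left or right of the line a-b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(a, b, c, d):
    """True when the two segments properly cross each other."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)

def segment_distance(a, b, c, d):
    if segments_cross(a, b, c, d):
        return 0.0
    # Otherwise the minimum is always reached at one of the four endpoints.
    return min(point_segment_distance(a, c, d), point_segment_distance(b, c, d),
               point_segment_distance(c, a, b), point_segment_distance(d, a, b))

def brute_force_distance(line_a, line_b):
    """Check every edge of line_a against every edge of line_b: O(n^2)."""
    return min(segment_distance(a1, a2, b1, b2)
               for a1, a2 in zip(line_a, line_a[1:])
               for b1, b2 in zip(line_b, line_b[1:]))

print(brute_force_distance([(0, 0), (10, 0)], [(0, 3), (10, 3)]))
# 3.0
```

With n edges on each side, the two nested loops run n * n times, which is where the quadratic cost comes from.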

Fortunately, there are better ways.

Projection and Pruning

The distance implementation in PostGIS has two major code paths:

For disjoint (non-overlapping) inputs, an optimized calculation; and,
For overlapping inputs, the brute force calculation.

Disjoint inputs are handled with a clever simplification of the problem space. Because the inputs are disjoint, it is possible to construct a line between the centers of the two inputs.

Sorted and pruned distance

If every edge in each object is projected down onto the line, it becomes possible to perform a sort of those edges, such that edges that are near on the line are also near in the sorted lists, and near in space.

Starting from the mid-point of each object it is relatively inexpensive to quickly prune away large numbers of edges that are definitely not the nearest edges, leaving a much smaller number of potential targets that need to have their distance calculated.

The cost of creating the projected segments is just O(n), but the cost of the sort step is O(n*log(n)) so the overall cost of the algorithm is O(n*log(n)).
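The project, sort and prune idea can be sketched in Python in a simplified form. This version works on vertices rather than edges and is only an illustration of the principle, not the PostGIS code; the function names are made up. The pruning rests on one fact: the gap between two points measured along the center line can never exceed their true distance, so once that gap reaches the best distance found so far, everything further along the sorted list can be skipped.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def nearest_vertex_distance(a, b):
    """Smallest vertex-to-vertex distance between point lists a and b."""
    ca, cb = centroid(a), centroid(b)
    dx, dy = cb[0] - ca[0], cb[1] - ca[1]
    length = math.hypot(dx, dy)
    if length == 0:
        # The centers coincide, so there is no line to project onto.
        return min(math.dist(p, q) for p in a for q in b)
    ux, uy = dx / length, dy / length

    def project(p):
        return p[0] * ux + p[1] * uy

    # Sort a so its vertices nearest to b come first, and b the other way round.
    pa = sorted(((project(p), p) for p in a), reverse=True)
    pb = sorted((project(q), q) for q in b)

    best = math.inf
    for ta, p in pa:
        if pb[0][0] - ta >= best:
            break  # every remaining vertex of a is even further back
        for tb, q in pb:
            if tb - ta >= best:
                break  # every remaining vertex of b is even further along
            best = min(best, math.dist(p, q))
    return best

square_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_b = [(3, 0), (4, 0), (4, 1), (3, 1)]
print(nearest_vertex_distance(square_a, square_b))
# 2.0
```

For well-separated inputs most pairs are skipped after the sort. When the inputs overlap, the gaps along the line are small or negative and very little gets pruned, which is the weakness the article goes on to describe.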

This is all well and good, but what if the inputs do overlap? Then the algorithm falls back to brute-force and O(n^2). Is there any way to avoid that?

Linear Time Spatial Trees

The project-and-prune approach is very clever, but it is possible to generate a spatially searchable representation of the edges even faster, by using the fact that edges in a LineString or LinearRing are highly spatially autocorrelated:

The end point of one edge is always the start point of the next.
The edges mostly don't cross each other.

Basically, the edges are already spatially pre-sorted. That means it is possible to build a decent tree structure from them without incurring any non-linear computational cost.

Linear ring tree

Start with the edges in sorted order. The bounds of the edges form the leaf nodes of a spatial tree. Merge neighboring leaf nodes and you have the first level of interior nodes. Continue until only one node is left: that is your root node. The cost is O(n) + O(0.5n) + O(0.25n) ... which is to say in aggregate, O(n).

Ordinarily, building a spatial tree would be expected to cost about O(n*log(n)), so this is a nice win.

The CIRC_NODE tree used to accelerate distance calculation for the geography type is built using this process.
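The merge procedure described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the PostGIS code: it uses flat rectangular bounds, whereas CIRC_NODE works with circles on the sphere, and the Node class and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def leaf(p, q):
    """Leaf node: the bounds of a single edge p-q."""
    return Node(min(p[0], q[0]), min(p[1], q[1]), max(p[0], q[0]), max(p[1], q[1]))

def merge(a, b):
    """Interior node: the bounds covering two neighboring nodes."""
    return Node(min(a.xmin, b.xmin), min(a.ymin, b.ymin),
                max(a.xmax, b.xmax), max(a.ymax, b.ymax), a, b)

def build_edge_tree(coords):
    """Build a tree from edges in their existing order, with no sorting step."""
    level = [leaf(p, q) for p, q in zip(coords, coords[1:])]
    if not level:
        raise ValueError("need at least two coordinates")
    while len(level) > 1:
        # Pair up neighbors; each pass roughly halves the number of nodes.
        merged = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd node out is carried up unchanged
        level = merged
    return level[0]

def count_nodes(node):
    return 0 if node is None else 1 + count_nodes(node.left) + count_nodes(node.right)

line = [(150, 300), (226, 274), (320, 280), (370, 320), (390, 370)]
root = build_edge_tree(line)
print(root.xmin, root.ymin, root.xmax, root.ymax, count_nodes(root))
# 150 274 390 370 7
```

Four edges produce four leaves and three interior nodes. In general n edges give 2n - 1 nodes, each created once, which is why the whole build stays linear.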

Overlapping Inputs and Distance Calculation

There is no guarantee that a tree-indexed approach will crack the overlapping polygon problem.

Disjoint polygons are very amenable to distance searching trees, because it is easy to discard whole branches of the tree that are definitionally too far away to contain candidate edges.

Pruning disjoint objects

As inputs begin to overlap, it becomes harder to discard large portions of the trees, and as a result a lot of computation is spent traversing the tree, even if a moderate proportion of candidates can be discarded from the lower branches of the tree.

Pruning disjoint objects
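One way to see why overlap hurts is to look at the lower bound a tree search can get from two bounding boxes. The Python sketch below assumes simple rectangular bounds given as (xmin, ymin, xmax, ymax) tuples, and the function name is made up; it is not taken from the PostGIS code base.

```python
import math

def bbox_min_distance(a, b):
    """Smallest possible distance between anything inside box a and box b."""
    # Gap along each axis, or 0 when the boxes overlap on that axis.
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

# Disjoint branches: the bound is positive, so a branch can be discarded
# as soon as a closer candidate has been found elsewhere.
print(bbox_min_distance((0, 0, 1, 1), (4, 5, 6, 7)))
# 5.0

# Overlapping branches: the bound collapses to zero and rules nothing out.
print(bbox_min_distance((0, 0, 3, 3), (2, 2, 5, 5)))
# 0.0
```

A bound of zero means the search has to descend into both branches, so for heavily overlapping inputs much of the tree still gets visited.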

Next Steps

The distance calculation in PostGIS has not been touched in many years, for good reason: it's really important, so any rewrite has to be a definite improvement on the existing code, over all known (and unknown) use cases.

However, the code base already contains a built and tested implementation that has never been turned on: the RECT_TREE.

Like the CIRC_NODE tree in geography, this implementation is based on building a tree from spatially coherent inputs. Unlike the CIRC_NODE tree, it has not been proven to be faster than the existing implementation in all cases.

A next development step will be to revive this implementation, evaluate it for implementation efficiency, and test effectiveness:

Can it exceed the current sort-and-prune strategy for disjoint polygons?
Can it exceed brute-force for overlapping polygons?

by Paul Ramsey (Paul.Ramsey@crunchydata.com) at March 19, 2024 01:00 PM

March 06, 2024

Crunchy Data

Connecting QGIS to Postgres and PostGIS

QGIS, the Quantum Geographic Information System, is an open-source graphical user interface for map making. QGIS works with a wide variety of file types and has robust support for integrating with Postgres and PostGIS. Today I just wanted to step through getting QGIS connected to a Postgres database and the basic operations that let you connect the two systems.

Connecting QGIS to Postgres

Connecting QGIS to Postgres is very similar to connecting any other GUI or application: you’ll need the database host, login, and password details. The process is the same for a local connection or a remote one (like Crunchy Bridge). The first time, you’ll connect through the PostgreSQL entry in the Browser and choose Add New Connection.

connect qgis to a postgres

By default, QGIS will store your passwords as plain text in a file. If you’re just working with a local database and don’t have anything special in there, that may not be a problem. But if you’re working with a larger production database shared by lots of users, you’ll want to opt for a higher level of protection for the password. In your PostgreSQL connections box, you’ll see a way to add Configurations for the password. Here you can create a master password, store your database credentials, and they’ll be encrypted and only decrypted with your master password.

Using QGIS to load data

QGIS is a great way to get spatial data into Postgres and PostGIS. You can use any file type supported by QGIS including vector types like shapefiles (shp), GeoJSON, and even CSV files on your local machine. To load data into QGIS, you’ll first go to Layer > Add Layer and choose the type of file you have.

new vector layer qgis

For this sample I have a county map of the state of Kansas. Maps like this are often freely available for download from government agencies.

Once my layer is in, I can toggle labels on, which displays any label data attached to the geometry.

qgis layer with labels

Now that I have data in QGIS I can save this to a Postgres database. This will allow me to work with this data later. Go to the DB manager icon, and choose Import Layer.

qgis db manager

There are several settings here, like choosing the primary key, the origin and destination SRID. QGIS will even suggest that adding an index for your geometry column is a good idea and will build an index in your database for you.

Loading data from PostGIS into QGIS

QGIS works both ways, so if you already have a dataset to work from, you can just use that data as your source. In that case, you’ll start from the Layer > Add Layer option. You’ll either need to specify the database connection you want, or add a new one here. You’ll be able to open all the tables in your database and choose which ones to add as a layer in your map viewer.

qgis db import errors

There's a good overview of PostGIS file loading on our blog.

File loading troubleshooting

Depending on the file you get, QGIS may or may not be super happy with it. You might see a warning icon next to your file. The main issues that QGIS will be warning you about are:

There’s no spatial reference id
The spatial reference ID is an important quirk when dealing with geospatial data. You can default to 4326 if you don’t have a better option.
To find a spatial reference id, or see if it is set:
```
SELECT ST_SRID(geom) FROM my_table_name LIMIT 1;
```
And to update it:
```
SELECT UpdateGeometrySRID('my_table_name','geom',4326);
```
There’s no geometry column
Assuming you have point, line, or polygon data and it is just stored in the wrong data type, you can create a new geometry column and populate it from your existing columns. For points stored as separate longitude and latitude columns:
```
ALTER TABLE my_table_name
ADD COLUMN geom geometry(Point, 4326);

UPDATE my_table_name
SET geom = ST_SetSRID(ST_MakePoint(lon_column, lat_column), 4326);
```
There’s no primary key
Relational databases rely on a primary key to tie other data together. If there’s already an id column, you can just create a primary key index on it like this. If there’s not a unique column like that, you might have to do a bit more work on the data to get this fixed.
```
ALTER TABLE my_table_name ADD PRIMARY KEY (id_column);
```

Writing and Saving SQL

One really cool feature of QGIS is that you can write SQL directly against your Postgres database and view the results as spatial geometries. You can save queries for use later as well. You can also use QGIS to create a layer based on your query results.

Here’s a sample query where I’m joining two data sources: the geometry of Kansas counties that I loaded earlier, and population data by county. I’m selecting just the geometry column, and with the load option I can have QGIS add that query result as a layer.

qgis sql query

QGIS will also let you save a query as a “view”. A view is a database object that stores the query itself under a name, so you can select from it later as if it were a table; the results are computed each time it is queried. Views can also be loaded as layers in QGIS projects. There’s an overview of using views in the post Postgres Subquery Powertools. Views are a great idea if you’re using only a small subset of data in your QGIS map but you are storing a larger dataset as well.

Here’s an example of a map I made using 4 different query layers, one for each different population density.

sql map layers qgis

Don’t forget that QGIS works in stacked layers, so your new SQL query layers will have to be on top of your base map or they won’t be visible.

Saving QGIS projects in Postgres

You can also save your QGIS project in your Postgres database. This is under the Project > Save to options. This can be a good idea if you want others on your team to have access to projects or if you don’t want your QGIS projects stored locally.

qgis saved projects

Final notes

There’s a great video of this and more from PostGIS Day 2020 called QGIS and PostGIS.

You can use QGIS to load shape files and other file types into Postgres
You can use QGIS to create maps from existing Postgres/PostGIS data sources
You can write queries in QGIS against your Postgres data and show the results as geometry layers
You can write join queries in QGIS and join your geometry fields with other attribute data or tables in your database
You can save all of your project work for QGIS in the Postgres database

by Elizabeth Christensen (Elizabeth.Christensen@crunchydata.com) at March 06, 2024 01:00 PM
