The PostGIS development team is pleased to provide bug fix and performance enhancement releases 3.4.1, 3.3.5, 3.2.6, 3.1.10, and 3.0.10 for the 3.4, 3.3, 3.2, 3.1, and 3.0 stable branches.
A user on the postgis-users mailing list had an interesting question today: how do you generate a geometry column in PostGIS with random points, linestrings, or polygons?
Random data is important for validating processing chains, analyses and reports. The best way to test a process is to feed it inputs!
Random points are pretty easy -- define an area of interest and then use the PostgreSQL random() function to create the X and Y values in that area.
CREATE TABLE random_points AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
)
SELECT ST_Point(width * (random() - 0.5) + origin_x,
height * (random() - 0.5) + origin_y,
4326)::Geometry(Point, 4326) AS geom,
id
FROM bounds,
generate_series(0, 100) AS id
Filling a target shape with random points is a common use case, and there's a special function just for that, ST_GeneratePoints(). Here we generate points inside a circle created with ST_Buffer().
CREATE TABLE random_points AS
SELECT ST_GeneratePoints(
ST_Buffer(
ST_Point(0, 0, 4326),
50),
100) AS geom
If you have PostgreSQL 16, you can use the brand new random_normal() function to generate coordinates with a central tendency.
CREATE TABLE random_normal_points AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
)
SELECT ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326)::Geometry(Point, 4326) AS geom,
id
FROM bounds,
generate_series(0, 100) AS id
If you are on an older version of PostgreSQL, you can define your own random_normal().
CREATE OR REPLACE FUNCTION random_normal(
mean double precision DEFAULT 0.0,
stddev double precision DEFAULT 1.0)
RETURNS double precision AS
$$
DECLARE
u1 double precision;
u2 double precision;
z0 double precision;
z1 double precision;
BEGIN
u1 := random();
u2 := random();
-- Box-Muller transform: converts two uniform random values into normally distributed values
z0 := sqrt(-2.0 * ln(u1)) * cos(2.0 * pi() * u2);
z1 := sqrt(-2.0 * ln(u1)) * sin(2.0 * pi() * u2);
RETURN mean + (stddev * z0);
END;
$$ LANGUAGE plpgsql;
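If you are using this fallback definition (on a PostgreSQL release without the built-in), a quick sanity check is to aggregate a large sample; the mean and standard deviation should land near the defaults of 0 and 1:
SELECT avg(random_normal()) AS mean,
       stddev_samp(random_normal()) AS stddev
FROM generate_series(1, 10000);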
Linestrings are a little harder, because they involve more points, and aesthetically we like to avoid self-crossings of lines.
Two-point linestrings are pretty easy to generate with ST_MakeLine() -- just generate twice as many random points, and use them as the start and end points of the linestrings.
CREATE TABLE random_2point_lines AS
WITH bounds AS (
SELECT 0 AS origin_x, 80 AS width,
0 AS origin_y, 80 AS height
)
SELECT ST_MakeLine(
ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326),
ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326))::Geometry(LineString, 4326) AS geom,
id
FROM bounds,
generate_series(0, 100) AS id
Multi-point random linestrings are harder, at least while avoiding self-intersections, and there are a lot of potential approaches. While a recursive CTE could probably do it, an imperative approach using PL/pgSQL is more readable.
The generate_random_linestring() function starts with an empty linestring, and then adds new segments one at a time, changing the direction of the line with each new segment. Here is the generate_random_linestring() definition.
CREATE OR REPLACE FUNCTION generate_random_linestring(
start_point geometry(Point))
RETURNS geometry(LineString, 4326) AS
$$
DECLARE
num_segments integer := 10; -- Number of segments in the linestring
deviation_max float := radians(45); -- Maximum deviation
random_point geometry(Point);
deviation float;
direction float := 2 * pi() * random();
segment_length float := 5; -- Length of each segment (adjust as needed)
i integer;
result geometry(LineString) := 'SRID=4326;LINESTRING EMPTY';
BEGIN
result := ST_AddPoint(result, start_point);
FOR i IN 1..num_segments LOOP
-- Generate a random angle within the specified deviation
deviation := 2 * deviation_max * random() - deviation_max;
direction := direction + deviation;
-- Calculate the coordinates of the next point
random_point := ST_Point(
ST_X(start_point) + cos(direction) * segment_length,
ST_Y(start_point) + sin(direction) * segment_length,
ST_SRID(start_point)
);
-- Add the point to the linestring
result := ST_AddPoint(result, random_point);
-- Update the start point for the next segment
start_point := random_point;
END LOOP;
RETURN result;
END;
$$
LANGUAGE plpgsql;
We can now use the generate_random_linestring() function to turn random start points (created in the usual way) into fully random squiggly lines!
CREATE TABLE random_lines AS
WITH bounds AS (
SELECT 0 AS origin_x, 80 AS width,
0 AS origin_y, 80 AS height
)
SELECT id,
generate_random_linestring(
ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326))::Geometry(LineString, 4326) AS geom
FROM bounds,
generate_series(1, 100) AS id;
At the simplest level, a set of random boxes is a set of random polygons, but that's pretty boring, and easy to generate using ST_MakeEnvelope().
CREATE TABLE random_boxes AS
WITH bounds AS (
SELECT 0 AS origin_x, 80 AS width,
0 AS origin_y, 80 AS height
)
SELECT ST_MakeEnvelope(
random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
random_normal(origin_x, width/4),
random_normal(origin_y, height/4)
)::Geometry(Polygon, 4326) AS geom,
id
FROM bounds,
generate_series(0, 20) AS id
But more interesting polygons have curvy, convex shapes -- how can we generate those?
One way is to extract a polygon from a set of random points, using ST_ConcaveHull(), and then apply an "erode and dilate" effect to make the curves more pleasantly round. We start with a random center point for each polygon, and create a circle with ST_Buffer(). Then we use ST_GeneratePoints() to fill the circle with some random points -- not too many, so we get a nice jagged result. Then we use ST_ConcaveHull() to trace a "boundary" around those points, apply a negative buffer to erode the shape, and finally a positive buffer to dilate it back out again.
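Before stringing everything together, here is the erode-and-dilate step applied to a single circle, as a minimal sketch (the center point, radius, and convexity values are just illustrative):
SELECT ST_Buffer(
         ST_Buffer(
           ST_ConcaveHull(
             ST_GeneratePoints(ST_Buffer(ST_Point(0, 0, 4326), 16), 20),
             0.3),
           -2.0),        -- erode
         3.0) AS geom;   -- dilate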
Generating multiple hulls involves stringing together all the above operations with CTEs or subqueries.
CREATE TABLE random_hulls AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
),
polypts AS (
SELECT ST_Point(random_normal(origin_x, width/2),
random_normal(origin_y, width/2),
4326)::Geometry(Point, 4326) AS geom,
polyid
FROM bounds,
generate_series(1,10) AS polyid
),
pts AS (
SELECT ST_GeneratePoints(ST_Buffer(geom, width/5), 20) AS geom,
polyid
FROM bounds,
polypts
)
SELECT ST_Multi(ST_Buffer(
ST_Buffer(
ST_ConcaveHull(geom, 0.3),
-2.0),
3.0))::Geometry(MultiPolygon, 4326) AS geom,
polyid
FROM pts;
Another approach is to again start with random points, but use the Voronoi diagram as the basis of the polygon.
Start with a center point and buffer circle, and generate random points in the circle. Use the ST_VoronoiPolygons() function to generate polygons that subdivide the space using the random points as seeds. Filter out just the polygons that are fully contained in the originating circle, and then use ST_Union() to merge those polygons into a single output shape.
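Here is the same idea applied to a single circle, as a minimal sketch (radius and point count are illustrative):
WITH circle AS (
  SELECT ST_Buffer(ST_Point(0, 0, 4326), 16) AS geom
),
cells AS (
  SELECT (ST_Dump(ST_VoronoiPolygons(
            ST_GeneratePoints(geom, 10)))).geom AS cell,
         geom
  FROM circle
)
SELECT ST_Union(cell) AS geom
FROM cells
WHERE ST_Contains(geom, cell);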
Generating multiple hulls again involves stringing together the above operations with CTEs or subqueries.
CREATE TABLE random_delaunay_hulls AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
),
polypts AS (
SELECT ST_Point(random_normal(origin_x, width/2),
random_normal(origin_y, width/2),
4326)::Geometry(Point, 4326) AS geom,
polyid
FROM bounds,
generate_series(1,20) AS polyid
),
voronois AS (
SELECT ST_VoronoiPolygons(
ST_GeneratePoints(ST_Buffer(geom, width/5), 10)
) AS geom,
ST_Buffer(geom, width/5) AS geom_clip,
polyid
FROM bounds,
polypts
),
cells AS (
SELECT (ST_Dump(geom)).geom, polyid, geom_clip
FROM voronois
)
SELECT ST_Union(geom)::Geometry(Polygon, 4326) AS geom, polyid
FROM cells
WHERE ST_Contains(geom_clip, geom)
GROUP BY polyid;
by Paul Ramsey (Paul.Ramsey@crunchydata.com) at September 11, 2023 01:00 PM
Paul Ramsey and I recently had a fireside chat with Path To Citus Con. Check out the podcast Why People Care About PostGIS and Postgres. There were a surprising number of funny moments and some very insightful stuff.
It was a great fireside chat, but without the fireplace. We covered the birth and progression of PostGIS over the past 20 years and its trajectory with PostgreSQL. We also learned of Paul's plans to revolutionize PostGIS, which were new to me. We covered many other side topics, like QGIS, whose birth was inspired by PostGIS. We also covered pgRouting and MobilityDB, two other PostgreSQL extension projects that extend PostGIS.
We also managed to fall into the Large Language Model conversation, on which Paul and I are on different sides of the fence.
Continue reading "Why People care about PostGIS and Postgres and FOSS4GNA"by Regina Obe (nospam@example.com) at September 10, 2023 03:22 AM
The PostGIS Team is pleased to release PostGIS 3.4.0! This version works with versions PostgreSQL 12-16, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.
This is a major release; it includes bug fixes since PostGIS 3.3.4 and new features.
The PostGIS Team is pleased to release PostGIS 3.4.0rc1! Best Served with PostgreSQL 16 Beta2 and GEOS 3.12.0.
This version requires PostgreSQL 12 or higher, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.
This release is a release candidate of a major release; it includes bug fixes since PostGIS 3.3.4 and new features.
The PostGIS Team is pleased to release PostGIS 3.4.0beta2! Best Served with PostgreSQL 16 Beta2 and GEOS 3.12.0.
This version requires PostgreSQL 12 or higher, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.
This release is a beta of a major release; it includes bug fixes since PostGIS 3.3.4 and new features.
The PostGIS development team is pleased to announce bug fix release 3.3.4, mostly focused on Topology fixes.
The PostGIS Team is pleased to release PostGIS 3.4.0beta1! Best Served with PostgreSQL 16 Beta2 and GEOS 3.12.0.
This version requires PostgreSQL 12 or higher, GEOS 3.6 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. To take advantage of all SFCGAL features, SFCGAL 1.4.1+ is needed.
This release is a beta of a major release; it includes bug fixes since PostGIS 3.3.3 and new features.
Last month I got to record a couple podcast episodes with the MapScaping Podcast’s Daniel O’Donohue. One of them was on the benefits and many pitfalls of putting rasters into a relational database, and the other was about real-time events and pushing data change information out to web clients!
TL;DR: geospatial data tends to be more “visible” to end user clients, so communicating change to multiple clients in real time can be useful for “common operating” situations.
I also recorded a presentation about pg_eventserv for PostGIS Day 2022.
I recently released the PostGIS 3.3.3 bundle for Windows, which is available on application stackbuilder and the OSGeo download site for PostgreSQL 11 - 15. If you are running PostgreSQL 12 or above, you get an additional bonus extension, MobilityDB, which leverages the PostGIS geometry and geography types and introduces several more spatio-temporal types and functions specifically targeted at managing objects in motion.
What kind of management? Think of getting the average speed a train is moving at over a segment in time, or detecting collisions in time, without any long SQL code -- just use a function on the trip path, and voila. Think about storing GPS data very compactly in a single row/column with time, and being able to ask very complex questions with very little SQL. True, PostGIS can do some of this using geometry with measure (geometryM) types, but then you have to deal with the craziness of converting M values back to timestamps, which MobilityDB temporal types avoid by encoding them as true PostgreSQL timestamp types.
Anita Graser, of QGIS and Moving Pandas fame, has written several posts about it, such as Visualizing Trajectories with QGIS and MobilityDB and Detecting close encounters using MobilityDB 1.0.
Continue reading "PostGIS Bundle 3.3.3 for Windows with MobilityDB"by Regina Obe (nospam@example.com) at June 03, 2023 12:34 AM
PostGIS excels at storing, manipulating and analyzing geospatial data. At some point it's usually desired to convert raw spatial data into a two-dimensional representation to utilize the integrative capabilities of the human visual cortex. In other words, to see things on a map.
PostGIS is a popular backend for mapping technology, so there are many options to choose from to create maps. Data can be rendered to a raster image using a web map server like GeoServer or MapServer; it can be converted to GeoJSON or vector tiles via servers such as pg_featureserv and pg_tileserv and then shipped to a Web browser for rendering by a library such as OpenLayers, MapLibre or Leaflet; or a GIS application such as QGIS can connect to the database and create richly-styled maps from spatial queries.
What these options have in common is that they require external tools which need to be installed, configured and maintained in a separate environment. This can introduce unwanted complexity to a geospatial architecture.
This post presents a simple way to generate maps entirely within the database, with no external infrastructure required.
A great way to display vector data is to use the Scalable Vector Graphic (SVG) format. It provides rich functionality for displaying and styling geometric shapes. SVG is widely supported by web browsers and other tools.
By including CSS and Javascript it's possible to add advanced styling, custom popups, dynamic behaviour and interaction with other web page elements.
pg-svg
Generating SVG "by hand" is difficult. It requires detailed knowledge of the
SVG specification, and constructing a complex
text format in SQL is highly error-prone. While PostGIS has had the function
ST_AsSVG
for years, it
only produces the SVG
path data attribute
value. Much more is required to create a fully-styled SVG document.
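For example, calling ST_AsSVG on a simple linestring returns only the bare path data (the output shown is approximate; note that SVG's Y axis points down, so Y values are negated):
SELECT ST_AsSVG('LINESTRING(0 0, 10 10, 20 0)'::geometry);
-- returns path data along the lines of: M 0 0 L 10 -10 20 0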
The PL/pgSQL library pg-svg solves this problem! It makes it easy to convert PostGIS data into styled SVG documents. The library provides a simple API as a set of PL/pgSQL functions which allow creating an SVG document in a single SQL query. Best of all, it installs as a set of functions -- nothing else required!
The best way to understand how pg-svg works is to see an example. We'll create an SVG map of the United States showing the highest point in each state. The map has the following features:
The resulting map looks like this (to see tooltips open the raw image):
The SQL query to generate the map is here. It can be downloaded and run using psql:
psql -A -t -o us-highpt.svg < us-highpt-svg.sql
The SVG output us-highpt.svg can be viewed in any web browser.
Let's break the query down to see how the data is prepared and then rendered to SVG. A dataset of US states in a geodetic coordinate system (WGS84, SRID = 4326) is required. We used the Natural Earth states and provinces data available here. It is loaded into a table ne.admin_1_state_prov with the following command:
shp2pgsql -c -D -s 4326 -i -I ne_10m_admin_1_states_provinces.shp ne.admin_1_state_prov | psql
The query uses the SQL WITH construct to organize processing into simple, modular steps. We'll describe each one in turn. First, the US state features are selected from the Natural Earth boundaries table ne.admin_1_state_prov.
us_state AS (SELECT name, abbrev, postal, geom
FROM ne.admin_1_state_prov
WHERE adm0_a3 = 'USA')
Next, the map is made more compact by realigning the far-flung states of Alaska and Hawaii. This is done using the PostGIS affine transformation functions. The states are made more proportionate using ST_Scale, and moved closer to the lower 48 using ST_Translate. The scaling is done around the location of the state high point, to make it easy to apply the same transformation to the high point feature.
,us_map AS (SELECT name, abbrev, postal,
-- transform AK and HI to make them fit map
CASE WHEN name = 'Alaska' THEN
ST_Translate(ST_Scale(
ST_Intersection( ST_GeometryN(geom,1), 'SRID=4326;POLYGON ((-141 80, -141 50, -170 50, -170 80, -141 80))'),
'POINT(0.5 0.75)', 'POINT(-151.007222 63.069444)'::geometry), 18, -17)
WHEN name = 'Hawaii' THEN
ST_Translate(ST_Scale(
ST_Intersection(geom, 'SRID=4326;POLYGON ((-161 23, -154 23, -154 18, -161 18, -161 23))'),
'POINT(3 3)', 'POINT(-155.468333 19.821028)'::geometry), 32, 10)
ELSE geom END AS geom
FROM us_state)
Data for the highest point in each state is provided as an inline table of values:
,high_pt(name, state, hgt_m, hgt_ft, lon, lat) AS (VALUES
('Denali', 'AK', 6198, 20320, -151.007222,63.069444)
,('Mount Whitney', 'CA', 4421, 14505, -118.292,36.578583)
...
,('Britton Hill', 'FL', 105, 345, -86.281944,30.988333)
)
The next query does several things: it shifts the lon and lat of the Alaska and Hawaii high points to match the transformation applied to the state geometry; it computes a symHeight attribute for the height of the high point triangle symbol; it assigns a fill color based on the height; and it uses ORDER BY to sort the high points by latitude, so that their symbols overlap correctly when rendered.
,highpt_shape AS (SELECT name, state, hgt_ft,
-- translate high points to match shifted states
CASE WHEN state = 'AK' THEN lon + 18
WHEN state = 'HI' THEN lon + 32
ELSE lon END AS lon,
CASE WHEN state = 'AK' THEN lat - 17
WHEN state = 'HI' THEN lat + 10
ELSE lat END AS lat,
(2.0 * hgt_ft) / 15000.0 + 0.5 AS symHeight,
CASE WHEN hgt_ft > 14000 THEN '#ffffff'
WHEN hgt_ft > 7000 THEN '#aaaaaa'
WHEN hgt_ft > 5000 THEN '#ff8800'
WHEN hgt_ft > 2000 THEN '#ffff44'
WHEN hgt_ft > 1000 THEN '#aaffaa'
ELSE '#558800'
END AS clr
FROM high_pt ORDER BY lat DESC)
The previous queries transformed the raw data into a form suitable for rendering. Now we get to see pg-svg in action! The next query generates the SVG text for each output image element, as separate records in a result set called shapes.
The SVG elements are generated in the order in which they are drawn - states and labels first, with high-point symbols on top. Let's break it down:
The first subquery produces SVG shapes from the state geometries. The svgShape function produces an SVG shape element for any PostGIS geometry. It also provides optional parameters supporting other capabilities of SVG. Here title specifies that the state name is displayed as a tooltip, and style specifies the styling of the shape.
Styling in SVG can be provided using properties defined in the Cascading Style Sheets (CSS) specification. pg-svg provides the svgStyle function to make it easy to specify the names and values of CSS styling properties.
Note that the fill property value is a URL instead of a color specifier. This refers to an SVG gradient fill which is defined later. The state geometry is also included in the subquery result set, for reasons which will be discussed below.
,shapes AS (
-- State shapes
SELECT geom, svgShape( geom,
title => name,
style => svgStyle( 'stroke', '#ffffff',
'stroke-width', 0.1::text,
'fill', 'url(#state)',
'stroke-linejoin', 'round' ) )
svg FROM us_map
Labels for state abbreviations are positioned at the point produced by the ST_PointOnSurface function. (Alternatively, ST_MaximumInscribedCircle could be used.) The SVG is generated by the svgText function, using the specified styling.
UNION ALL
-- State names
SELECT NULL, svgText( ST_PointOnSurface( geom ), abbrev,
style => svgStyle( 'fill', '#6666ff', 'text-anchor', 'middle', 'font', '0.8px sans-serif' ) )
svg FROM us_map
The high point features are displayed as triangular symbols. We use the convenient svgPolygon function with a simple array of ordinates specifying a triangle based at the high point location, with height given by the previously computed symHeight column. The title is provided for a tooltip, and the styling uses the computed clr attribute as the fill.
UNION ALL
-- High point triangles
SELECT NULL, svgPolygon( ARRAY[ lon-0.5, -lat, lon+0.5, -lat, lon, -lat-symHeight ],
title => name || ' ' || state || ' - ' || hgt_ft || ' ft',
style => svgStyle( 'stroke', '#000000',
'stroke-width', 0.1::text,
'fill', clr ) )
svg FROM highpt_shape
)
The generated shape elements need to be wrapped in an <svg> document element. This is handled by the svgDoc function.
The viewable extent of the SVG data needs to be provided by the viewbox parameter. The most common case is to display all of the rendered data. An easy way to determine this is to apply the PostGIS ST_Extent aggregate function to the input data (this is why we included the geom column as well as the svg text column). We can include a border by enlarging the extent using the ST_Expand function. The function svgViewBox converts the PostGIS geometry for the extent into SVG format.
We also include a definition for an SVG linear gradient to be used as the fill style for the state features.
SELECT svgDoc( array_agg( svg ),
viewbox => svgViewbox( ST_Expand( ST_Extent(geom), 2)),
def => svgLinearGradient('state', '#8080ff', '#c0c0ff')
) AS svg FROM shapes;
The output from svgDoc is a text value which can be used anywhere that SVG is supported.
We've shown how the pg-svg SQL function library lets you easily generate map images from PostGIS data right in the database. This can be used as a simple ad-hoc way of visualizing spatial data, or it could be embedded in a larger system to automate repetitive map generation workflows.
Although SVG is a natural fit for vector data, there may be situations where producing a map as a bitmap (raster) image makes sense. For a way of generating raster maps right in the database see this PostGIS Day 2022 presentation. This would be especially appealing where the map is displaying data stored using PostGIS raster data. It would also be possible to combine vector and raster data into a hybrid SVG/image output.
Although we've focussed on creating maps of geospatial data, SVG is often used for creating other kinds of graphics. For examples of using it to create geometric and mathematical designs see the pg-svg demo folder. Here's an image of a Lissajous knot generated by this SQL.
You could even use pg-svg to generate charts of non-spatial data (although this would be better handled by a more task-specific API).
Let us know if you find pg-svg useful, or if you have ideas for improving it!
by Martin Davis (Martin.Davis@crunchydata.com) at May 30, 2023 01:00 PM
The PostGIS development team is pleased to provide bug fixes and performance enhancements 3.3.3, 3.2.5, 3.1.9 and 3.0.9 for the 3.3, 3.2, 3.1, and 3.0 stable branches.
Last month I was invited to give a keynote talk at the CUGOS Spring Fling, a delightful gathering of “Cascadia Users of Open Source GIS” in Seattle. I have been speaking about open source economics at FOSS4G conferences more-or-less every two years, since 2009, and took this opportunity to somewhat revisit the topics of my 2019 FOSS4GNA keynote.
If you liked the video and want to use the materials, the slides are available here under CC BY.
Last month I got to record a couple podcast episodes with the MapScaping Podcast’s Daniel O’Donohue. One of them was on the benefits and many pitfalls of putting rasters into a relational database, and it is online now!
TL;DR: most people think “put it in a database” is a magic recipe for: faster performance, infinite scalability, and easy management.
Where the database is replacing a pile of CSV files, this is probably true.
Where the database is replacing a collection of GeoTIFF imagery files, it is probably false. Raster in the database will be slower, will take up more space, and be very annoying to manage.
So why do it? Start with a default, “don’t!”, and then evaluate from there.
For some non-visual raster data, and for use cases that involve enriching vectors from raster sources, having the raster co-located with the vectors in the database can make working with it more convenient. It will still be slower than direct access, and it will still be painful to manage, but it allows the use of SQL as a query language, which can give you a lot more flexibility to explore the solution space than a purpose-built data access script might.
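As a sketch of that "enrich vectors from rasters" pattern, a query along these lines attaches a raster value to each vector feature (the table names sample_points and dem_tiles are hypothetical):
SELECT pts.id,
       ST_Value(dem.rast, pts.geom) AS elevation
FROM sample_points pts
JOIN dem_tiles dem
  ON ST_Intersects(pts.geom, ST_ConvexHull(dem.rast));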
There’s some other interesting tweaks around storing the actual raster data outside the database and querying it from within, that I think are the future of “raster in (not really in) the database”, listen to the episode to learn more!
Geocoding is the process of taking addresses or location information and getting the coordinates for that location. Anytime you route a new location or look up a zip code, the back end is geocoding the location and then using the geocode inside other PostGIS functions to give you the routes, locations, and other data you asked for.
PostGIS comes equipped with an easy way to use the US Census data with the Tiger geocoder. Using the Tiger geocoder requires downloading large amounts of census data and in space-limited databases, this may not be ideal. Using a geocoding web API service can be a space saving solution in these cases.
I am going to show you how to set up a really quick function using plpython3u to hit a web service geocoder every time we get a new row in the database.
The plpython3u extension comes with Crunchy Bridge or you can add it to your database. To get started run the following:
CREATE EXTENSION plpython3u;
In this example, I'll use the US census geocoding API as our web service, and build a function to geocode addresses based on that.
The function puts together parameters to hit the census geocoding API and then parses the resulting object, and returns a geometry:
CREATE OR REPLACE FUNCTION geocode(address text)
RETURNS geometry
AS $$
import requests
try:
    payload = {'address': address, 'benchmark': 2020, 'format': 'json'}
    base_geocode = 'https://geocoding.geo.census.gov/geocoder/locations/onelineaddress'
    r = requests.get(base_geocode, params=payload)
    coords = r.json()['result']['addressMatches'][0]['coordinates']
    lon = coords['x']
    lat = coords['y']
    geom = f'SRID=4326;POINT({lon} {lat})'
except Exception as e:
    plpy.notice(f'address failed: {address}')
    plpy.notice(f'error: {e}')
    geom = None
return geom
$$
LANGUAGE 'plpython3u';
Using this function to geocode Crunchy Data's headquarters:
SELECT ST_AsText(geocode('162 Seven Farms Drive Charleston, SC 29492'));
But what if we want to automatically run this every time an address is inserted into a table? Let's say we have a table with a field ID, an address, and a point that we want to auto-populate on inserts.
CREATE TABLE addresses (
fid SERIAL PRIMARY KEY,
address VARCHAR,
geom GEOMETRY(POINT, 4326)
);
We can make use of a Postgres trigger to add the geocode before every insert! Triggers are a very powerful way to leverage built in functions to automatically transform your data as it enters the database, and this particular case is a great demo for them!
CREATE OR REPLACE FUNCTION add_geocode()
RETURNS trigger AS
$$
DECLARE
loc geometry;
BEGIN
loc := geocode(NEW.address);
NEW.geom = loc;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER update_geocode BEFORE INSERT ON addresses
FOR EACH ROW EXECUTE FUNCTION add_geocode();
Now when running an insert, the value is automatically geocoded!
INSERT INTO addresses(address) VALUES ('415 Mission St, San Francisco, CA 94105');
postgres=# SELECT fid, address, geom FROM addresses;
fid | address | geom
-----+-----------------------------------------+----------------------------------------------------
1 | 415 Mission St, San Francisco, CA 94105 | 0101000020E610000097CD0E2B66995EC0BB004B2729E54240
If you’re space limited, using a web API based geocoder might be the way to go. Using a geocoder function with triggers on new row inserts will get you geocoded addresses in a snap.
by Jacob Coblentz (Jacob.Coblentz@crunchydata.com) at March 02, 2023 04:00 PM
In geospatial terminology, a "raster" is a cover of an area divided into a uniform gridding, with one or more values assigned to each grid cell.
A "raster" in which the values are associated with red, green and blue bands might be a visual image. The rasters that come off the Landsat 7 earth observation satellite have eight bands: red, green, blue, near infrared, shortwave infrared, thermal, mid-infrared and panchromatic.
Working with raster data via SQL is a little counter-intuitive: rasters don't neatly fit the relational model the way vector geometries do. A table of parcels where one column is the geometry and the others are the owner name, address, and tax roll id makes sense. How should a raster fit into a table? As a row for every pixel? For every scan row? What other values should be associated with each row?
There is no clean relationship between "real world objects" and the database representation of a raster, because a raster has nothing to say about objects, it is just a collection of measurements.
We can squeeze rasters into the database, but doing so makes working with the data more complex. Before loading data, we need to enable PostGIS and the raster module.
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;
For this example, we will load raster data for a "digital elevation model" (DEM), a raster with just one band, the elevation at each pixel.
Using the SRTM Tile Grabber I downloaded one tile of old SRTM data. Then, using the gdalinfo utility, I read out the metadata about the file.
wget https://srtm.csi.cgiar.org/wp-content/uploads/files/srtm_5x5/TIFF/srtm_12_03.zip
unzip srtm_12_03.zip
gdalinfo srtm_12_03.tif
The metadata tells me two useful things for loading the data: the file is in the WGS 84 geographic coordinate system (SRID 4326), and the pixel type is Int16, so two bytes per pixel. Knowing that, I can build a raster2pgsql call to load the data into a raster table.
raster2pgsql \
-I \ # create a spatial index on the column
-s 4326 \ # use 4326 (WGS 84) as the spatial reference for the raster
-t 32x32 \ # tile the raster into 32 by 32 pixel tiles
srtm_12_03.tif | \ # name of the raster
psql dem # target database connection
Once loaded the raster table looks like this on a map.
And it looks like this in the database.
Table "public.srtm_12_03"
Column | Type
--------+---------
rid | integer
rast | raster
Indexes:
"srtm_12_03_pkey" PRIMARY KEY, btree (rid)
"srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))
It's a pretty boring table! Just a bunch of binary raster tiles and a unique key for each.
-- 29768 rows
SELECT Count(*) FROM srtm_12_03;
Those binary raster tiles aren't just opaque blobs though, we can look inside them with the right functions. Here we get a summary of all the raster tiles in the table.
SELECT (ST_SummaryStatsAgg(rast, 1, true)).* FROM srtm_12_03;
count | 28966088
sum | 20431360140
mean | 705.3544869434907
stddev | 561.252765463607
min | -291
max | 4371
Remember when we loaded the data with raster2pgsql we specified a "tile size" of 32 by 32 pixels? This has a number of implications. Each 32x32 tile of Int16 data like our DEM will take up 2048 bytes, which is small enough to fit in the database page size; that means the data will not end up stored in a side table by the TOAST subsystem that handles large row values. Also, raster2pgsql does not generate tiles when the contents are all "no data" pixels (as the DEM data is over the ocean). The loaded data looks like this.
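You can confirm the tile dimensions and per-tile storage size directly with a quick check (ST_Width, ST_Height and ST_MemSize come with postgis_raster):
SELECT ST_Width(rast) AS width,
       ST_Height(rast) AS height,
       ST_MemSize(rast) AS bytes
FROM srtm_12_03
LIMIT 1;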
Notice how small each tile is. As a general rule, when working with raster data, you want queries to read as few tiles as possible, and to find those tiles using the spatial index. The spatial index definition, as we saw above, is this:
"srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))
This is a functional index, which means that in order to use it, we need to copy the functional part -- ST_ConvexHull(rast) -- when forming our query.
The ST_ConvexHull(raster) converts a raster tile into a polygon defining the boundary of the tile. When querying raster tables, you will use this function a great deal to convert rasters into polygons suitable for querying a spatial index.
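One way to confirm that a query can take advantage of the functional index is to filter on the same expression and inspect the plan; a sketch (the envelope coordinates are arbitrary):
EXPLAIN (COSTS OFF)
SELECT rid
FROM srtm_12_03
WHERE ST_Intersects(ST_ConvexHull(rast),
                    ST_MakeEnvelope(-124, 47, -123, 48, 4326));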
The simplest raster query is to take a point, and find the value of the raster under that point.
Here is a point table with one point in it:
CREATE TABLE mappoint AS
SELECT ST_Point(-123.7273, 47.8467, 4326)::geometry(Point, 4326) AS geom,
1 AS fid;
The nice thing about points is that they only hit one tile at a time. So we don't have to think too hard about what to do with our tile sets.
SELECT ST_Value(srtm.rast, pt.geom)
FROM srtm_12_03 srtm
JOIN mappoint pt
ON ST_Intersects(pt.geom, ST_ConvexHull(srtm.rast))
st_value
----------
1627
Here we use the ST_ConvexHull(raster) function to get access to our spatial index on the raster table, and the ST_Intersects(geom, geom) function to test the condition.
The ST_Value(raster, geom) function reads the pixel value from the raster at the location of the point.
Summarizing rasters under polygons is more involved than reading point values, because polygons will frequently overlap multiple tiles, so you have to think in terms of "sets of raster tiles" instead of "the raster" when building your query.
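The query below references a polygon table mappoly, which is not created in this excerpt; a minimal stand-in (the polygon itself is arbitrary) could be built by buffering a point:
CREATE TABLE mappoly AS
SELECT ST_Buffer(
         ST_Point(-123.7273, 47.8467, 4326),
         0.2)::geometry(Polygon, 4326) AS geom;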
The final complete query looks like this.
WITH clipped_tiles AS (
SELECT ST_Clip(srtm.rast, ply.geom) AS rast, srtm.rid
FROM srtm_12_03 srtm
JOIN mappoly ply
ON ST_Intersects(ply.geom, ST_ConvexHull(srtm.rast))
)
SELECT (ST_SummaryStatsAgg(rast, 1, true)).*
FROM clipped_tiles;
count | 362369
sum | 388175193
mean | 1071.2152336430545
stddev | 441.7982032761408
min | 108
max | 2374
Working with database rasters analytically can be challenging, particularly if you are used to thinking about them as single, unitary coverages. Remember to apply the basic rules of database rasters: keep tiles small, touch as few tiles as possible, use ST_ConvexHull(rast) to engage the spatial index, and think in terms of sets of tiles rather than a single raster.
by Paul Ramsey (Paul.Ramsey@crunchydata.com) at February 23, 2023 04:00 PM
In a previous post we announced the CQL filtering capability in pg_featureserv. It provides powerful functionality for attribute and spatial querying of data in PostgreSQL and PostGIS.
Another important datatype which is often present in datasets is temporal. Temporal datasets contain attributes which are dates or timestamps. The CQL standard defines some special-purpose syntax to support temporal filtering. This allows pg_featureserv to take advantage of the extensive capabilities of PostgreSQL for specifying queries against time-valued attributes. This post in the CQL series will show some examples of temporal filtering in pg_featureserv.
Temporal filtering in CQL is provided using temporal literals and conditions.
Temporal literal values may be dates or timestamps:
2001-01-01
2010-04-23T01:23:45
Note: The temporal literal syntax is based on an early version of the OGC API Filter and CQL standard. The current draft CQL standard has a different syntax: DATE('1969-07-20') and TIMESTAMP('1969-07-20T20:17:40Z'). It also supports intervals: INTERVAL('1969-07-16', '1969-07-24'). A subsequent version of pg_featureserv will support this syntax as well.
Temporal conditions allow time-valued properties and literals to be compared via the standard boolean comparison operators <, >, <=, >=, =, <> and the BETWEEN..AND operator:
start_date >= 2001-01-01
event_time BETWEEN 2010-04-22T06:00 AND 2010-04-23T12:00
The draft CQL standard provides dedicated temporal operators, such as T_AFTER, T_BEFORE, T_DURING, etc. A future version of pg_featureserv will likely provide these operators.
We'll demonstrate temporal filters using a dataset with a strong time linkage: tracks of tropical storms (or hurricanes). There is a dataset of Historical Tropical Storm Tracks available here.
The data requires some preparation. It is stored as a set of records of line segments representing 6-hour long sections of storm tracks. To provide simpler querying we will model the data using a single record for each storm, with a line geometry showing the entire track and attributes for the start and end time for the track.
The data is provided in Shapefile format. As expected for a worldwide dataset, it is in the WGS84 geodetic coordinate system (lat/long). In PostGIS this common Spatial Reference System is assigned an identifier (SRID) of 4326.
The PostGIS shp2pgsql utility can be used to load the dataset into a spatial table called trop_storm_raw. The trop_storm_raw table is a temporary staging table allowing the raw data to be loaded and made available for the transformation phase of data preparation.
shp2pgsql -c -D -s 4326 -i -I -W LATIN1 "Historical Tropical Storm Tracks.shp" public.trop_storm_raw | psql -d database
The options used are:
-c - create a new table
-D - use PostgreSQL dump format to load the data
-s - specify the SRID of 4326
-i - use 32-bit integers
-I - create a GIST index on the geometry column (this is not strictly necessary, since this is just a temporary staging table)
-W - specifies the encoding of the input attribute data in the DBF file
Next, create the table having the desired data model:
CREATE TABLE public.trop_storm (
btid int PRIMARY KEY,
name text,
wind_kts numeric,
pressure float8,
basin text,
time_start timestamp,
time_end timestamp,
geom geometry(MultiLineString, 4326)
);
It's good practice to add comments to the table and columns. These will be displayed in the pg_featureserv Web UI.
COMMENT ON TABLE public.trop_storm IS 'This is my spatial table';
COMMENT ON COLUMN public.trop_storm.geom IS 'Storm track LineString';
COMMENT ON COLUMN public.trop_storm.name IS 'Name assigned to storm';
COMMENT ON COLUMN public.trop_storm.btid IS 'Id of storm';
COMMENT ON COLUMN public.trop_storm.wind_kts IS 'Maximum wind speed in knots';
COMMENT ON COLUMN public.trop_storm.pressure IS 'Minimum pressure in millibars';
COMMENT ON COLUMN public.trop_storm.basin IS 'Basin in which storm occurred';
COMMENT ON COLUMN public.trop_storm.time_start IS 'Timestamp of storm start';
COMMENT ON COLUMN public.trop_storm.time_end IS 'Timestamp of storm end';
Now the power of SQL can be used to transform the raw data into the simpler data model. The track sections can be combined into single tracks with a start and end time using the following query.
The raw records store the track sections as MultiLineStrings with single elements. The element is extracted using ST_GeometryN so that the result of aggregating them using ST_Collect is a MultiLineString, not a GeometryCollection. (An alternative is to aggregate into a GeometryCollection and use ST_CollectionHomogenize to reduce it to a MultiLineString.) ST_Multi ensures that all tracks are stored as MultiLineStrings, as required by the type constraint on the geom column. The condition time_end - time_start < '1 year'::interval removes tracks spanning the International Date Line.
tracks spanning the International Date Line.WITH data AS (
SELECT btid, name, wind_kts, pressure, basin, geom,
make_date(year::int, month::int, day::int) + ad_time::time AS obs_time
FROM trop_storm_raw ORDER BY obs_time
),
tracks AS (
SELECT btid,
MAX(name) AS name,
MAX(wind_kts) AS wind_kts,
MAX(pressure) AS pressure,
MAX(basin) AS basin,
MIN(obs_time) AS time_start,
MAX(obs_time) AS time_end,
ST_Multi( ST_LineMerge( ST_Collect( ST_GeometryN(geom, 1)))) AS geom
FROM data GROUP BY btid
)
INSERT INTO trop_storm
SELECT * FROM tracks WHERE time_end - time_start < '1 year'::interval;
This is a small dataset, and pg_featureserv does not require one, but as per best practice we can create a spatial index on the geometry column:
CREATE INDEX trop_storm_gix ON public.trop_storm USING GIST ( geom );
Once the trop_storm table is created and populated, it can be published in pg_featureserv. Issuing the following requests in a browser shows the feature collection in the Web UI:
http://localhost:9000/collections.html
http://localhost:9000/collections/public.trop_storm.html
The dataset can be viewed using pg_featureserv's built-in map viewer (note that to see all 567 records displayed it is probably necessary to increase the limit on the number of response features):
http://localhost:9000/collections/public.trop_storm/items.html?limit=1000
That's a lot of storm tracks. It would be easier to visualize a smaller number of tracks, and a natural way to subset the data is by querying over a time range. Let's retrieve the storms between the start of 2005 and the end of 2009. This is done by adding a filter parameter with a CQL expression against the dataset temporal property time_start (storms typically do not span the start of years). To query values lying in a range of times it is convenient to use the BETWEEN operator. The filter condition is time_start BETWEEN 2005-01-01 AND 2009-12-31. The full request is:
http://localhost:9000/collections/public.trop_storm/items.html?filter=time_start BETWEEN 2005-01-01 AND 2009-12-31&limit=100
Submitting this query produces a result with 68 tracks:
Temporal conditions can be combined with other kinds of filters. For instance, we can execute a spatio-temporal query by using a temporal condition along with a spatial condition. In this example, we query the storms which occurred in 2005 and later in Florida. The temporal condition is expressed as time_start > 2005-01-01. The spatial condition uses the INTERSECTS predicate to test whether the line geometry of a storm track intersects a polygon representing the (simplified) coastline of Florida. The polygon is provided as a geometry literal using WKT. (For more information about spatial filtering with CQL in pg_featureserv see this blog post.)
POLYGON ((-81.4067 30.8422, -79.6862 25.3781, -81.1609 24.7731, -83.9591 30.0292, -85.2258 29.6511, -87.5892 29.9914, -87.5514 31.0123, -81.4067 30.8422))
Putting these conditions together in a boolean expression using AND, the request to retrieve the desired tracks from pg_featureserv is:
http://localhost:9000/collections/public.trop_storm/items.html?filter=time_start > 2005-01-01 AND INTERSECTS(geom, POLYGON ((-81.4067 30.8422, -79.6862 25.3781, -81.1609 24.7731, -83.9591 30.0292, -85.2258 29.6511, -87.5892 29.9914, -87.5514 31.0123, -81.4067 30.8422)) )&limit=100
This query produces a result with only 9 tracks, all of which cross Florida:
CQL temporal filtering is included in the forthcoming pg_featureserv Version 1.3, but you can try it out now by downloading the latest build. Let us know what use cases you find for CQL temporal filtering! Crunchy Data offers fully managed PostGIS in the Cloud, with Container apps to run pg_featureserv. Try it today.
by Martin Davis (Martin.Davis@crunchydata.com) at February 10, 2023 03:00 PM
Last week a user noted on the postgis-users list (paraphrase):
I upgraded from PostGIS 2.5 to 3.3 and now the results of my coordinate transforms are wrong. There is a vertical shift between the systems I’m using, but my vertical coordinates are unchanged.
Hmmm.
PostGIS gets all its coordinate reprojection smarts from the proj library. The user’s query looked like this:
SELECT ST_AsText(
ST_Transform('SRID=7405;POINT(545068 258591 8.51)'::geometry,
4979
));
“We just use proj” is a lot less certain and stable an assertion than it appears on the surface. In fact, PostGIS “just uses proj” for proj versions from 4.9 all the way up to 9.2, and there has been a lot of change to the proj library over that sweep of releases.
The first thing to do in debugging this “PostGIS problem” was to establish if it was in fact a PostGIS problem, or a problem in proj. There are commandline tools in proj to query what pipelines the system will use for a transform, and what the effect on coordinates will be, so I can take PostGIS right out of the picture.
We can run the query on the commandline:
echo 545068 258591 8.51 | cs2cs 'EPSG:7405' 'EPSG:4979'
Which returns:
52d12'23.241"N 0d7'17.603"E 8.510
So directly using proj we are seeing the same problem as in PostGIS SQL: no change in the vertical dimension; it goes in at 8.51 and comes out at 8.51. So the problem is not PostGIS, it is proj.
Cartographic transformations are nice deterministic functions: they take in a longitude and latitude and spit out an X and a Y.
(x,y) = f(theta, phi)
(theta, phi) = finv(x, y)
But not all transformations are cartographic transformations, some are transformation between geographic reference systems. And many of those are lumpy and kind of random.
For example, the North American 1927 Datum (NAD27) was built from classic survey techniques, starting from the “middle” (Kansas) and working outwards, chain by chain, sighting by sighting. The North American 1983 Datum (NAD83) was built with the assistance of the first GPS units. The accumulated errors of survey over distance are not deterministic, they are kind of lumpy and arbitrary. So the transformation from NAD27 to NAD83 is also kind of lumpy and arbitrary.
How do you represent lumpy and arbitrary transformations? With a grid! The grid says “if your observation falls in this cell, adjust it this much, in this direction”.
For the NAD27->NAD83 conversion, the NADCON grids have been around (and continuously improved) for a generation.
Here’s a picture of the horizontal deviations in the NADCON grid.
Transformations between vertical systems also frequently require a grid.
So what does this have to do with our bug? Well, the way proj gets its grids changed in version 7.
Proj grids have always been a bit of an outlier. They are much larger than just the source code is. They are localized in interest (someone in New Zealand probably doesn’t need European grids), not everyone needs all the grids. So historically they were distributed in zip files separately from the code.
This is all well and good, but software packagers wanted to provide a good “works right at install” experience to their end users, so they bundled up the grids into the proj packages.
As more and more people consumed proj via packages and software installers, the fact that the grids were “separate” from proj became invisible to the end users: they just download software and it works.
This was fine while the collection of grids was a manageable size. But it is not manageable any more.
In working through the GDALBarn project to improve proj, Even Rouault decided to find all the grids that various agencies had released for various places. It turns out there are a lot more grids than proj previously bundled. Gigabytes more.
Simply distributing the whole collection of grids as a default with proj was not going to work anymore.
So for proj 7, Even proposed moving to a download-on-demand model for proj grids. If a transformation request requires a grid, proj will attempt to download the necessary grid from the internet, and save it in a local cache.
Now everyone can get the very best possible transformation between systems, everywhere on the globe, as long as they are connected to the internet.
Except… the network grid feature is not turned on by default! So for versions of proj higher than 7, the software ships with no grids, and the software won’t check for grids on the network… until you turn on the feature!
There are three ways to turn it on; I’m going to focus on the PROJ_NETWORK environment variable because it’s easy to toggle. Let’s look at the proj transformation pipeline from our original bug.
projinfo -s EPSG:7405 -t EPSG:4979
The projinfo utility reads out all the possible transformation pipelines, in order of desirability (accuracy), and shows what each step is. Here’s the most desirable pipeline for our transform.
+proj=pipeline
+step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
+y_0=-100000 +ellps=airy
+step +proj=hgridshift +grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif
+step +proj=vgridshift +grids=uk_os_OSGM15_GB.tif +multiplier=1
+step +proj=unitconvert +xy_in=rad +xy_out=deg
+step +proj=axisswap +order=2,1
This transform actually uses two grids! A horizontal and a vertical shift. Let’s run the shift with the network explicitly turned off.
echo 545068 258591 8.51 | PROJ_NETWORK=OFF cs2cs 'EPSG:7405' 'EPSG:4979'
52d12'23.241"N 0d7'17.603"E 8.510
Same as before, and the elevation value is unchanged. Now run with PROJ_NETWORK=ON.
echo 545068 258591 8.51 | PROJ_NETWORK=ON cs2cs 'EPSG:7405' 'EPSG:4979'
52d12'23.288"N 0d7'17.705"E 54.462
Note that the horizontal and vertical results are different with the network, because we now have access to both grids, via the CDN.
If you have no internet, how do you do grid shifted transforms? Well, much like in the old days of proj, you have to manually grab the grids you need. Fortunately there is a utility for that now that makes it very easy: projsync.
You can just download all the files:
projsync --all
Or you can download a subset for your area of concern:
projsync --bbox 2,49,2,49
If you don’t want to turn on network access via the environment variable, you can hunt down the proj.ini file and flip the network = on variable.
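Once the grids are available to proj (via the network setting or projsync), re-running the original PostGIS query is a quick way to confirm the fix; the returned elevation should no longer be the unchanged 8.51. Note that for PostGIS to benefit, PROJ_NETWORK must be set in the environment of the PostgreSQL server process (or proj.ini adjusted on the server).
SELECT ST_AsText(
  ST_Transform('SRID=7405;POINT(545068 258591 8.51)'::geometry,
    4979
));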
Working at Crunchy Data on the spatial team, I'm always looking for new features and fun things to show on live demos. I recently started playing around with ST_Letters and wanted to jot down some quick code samples for playing around with this feature, introduced in PostGIS 3.3. These examples are super easy to use -- they don't need any data!
The screenshots shown below came from pgAdmin's geometry viewer and will also work with other query GUI tools like QGIS or DBeaver.
ST_Letters
Here's a simple example to get started with ST_Letters. This will work on any Postgres database running the PostGIS extension, version 3.3+.
Select ST_Letters('PostGIS');
It's also possible to overlay letters on a map, just like any other polygon. However, the default for ST_Letters results in a polygon starting at the baseline at the origin of the chosen projection, with a maximum height of 100 "units" (from the bottom of the descenders to the tops of the capitals). That's not ideal. We need a way to both move it and resize it.
First, we want to make a point in the middle of San Francisco to serve as a centroid for where we want to move the letters, and we also want to rescale the letters to approximately fit over the City of San Francisco. Using the conversion between WGS84 degrees and meters, a scale factor of 0.001 works approximately well enough to fit over the San Francisco Bay Area.
Next we use ST_Translate to move the letters from the top of the map to fit over the Bay Area. Finally, mostly because it looks cool, we use ST_Rotate to rotate the polygon 45 degrees.
WITH
san_fran_pt AS (
SELECT ST_Point(-122.48, 37.758, 4326) AS geom),
letters AS (
SELECT ST_Scale(ST_SetSRID(
ST_Letters('San Francisco'), 4326),
0.001, 0.001) AS geom),
letters_geom AS (
SELECT ST_Translate(
letters.geom,
ST_X(san_fran_pt.geom) - ST_X(ST_Centroid(letters.geom)),
ST_Y(san_fran_pt.geom) - ST_Y(ST_Centroid(letters.geom))
) AS geom
FROM letters, san_fran_pt
)
SELECT ST_Rotate(geom, -pi() / 4, ST_Centroid(geom))
FROM letters_geom;
ST_ConcaveHull demo'd with ST_Letters
A great use case for ST_Letters is for demoing PostGIS functions. In this post, I'm going to demo the function ST_ConcaveHull, which creates a concave polygon that encloses the vertices of a target geometry. ST_ConcaveHull was recently updated in PostGIS 3.3.0 to use GEOS 3.11, which makes the input parameters easier to understand and results in a large speed upgrade. Here's a short demo of how different parameters of param_pctconvex and param_allow_holes for ST_ConcaveHull operate on points generated by ST_GeneratePoints and ST_Letters.
First, let's generate a table of randomly generated points that fill in the letters in 'postgis'.
CREATE TABLE public.word_pts AS
WITH word AS (
SELECT ST_Letters('postgis') AS geom
),
letters AS ( -- dump letter multipolygons into individual polygons
SELECT (ST_Dump(word.geom)).geom
FROM word
)
SELECT
letters.geom AS polys,
ST_GeneratePoints(letters.geom, 100) AS pts
FROM letters;
SELECT pts FROM word_pts;
Then, we set the convexity to a fairly high parameter (param_pctconvex=0.75, indicating a highly convex shape), and don't allow there to be holes in the shape (param_allow_holes=false).
SELECT ST_ConcaveHull(pts, 0.75, false) FROM word_pts;
Doesn't look much like 'postgis'!
Next, we reduce the convexity, but don't allow holes in the shape.
SELECT ST_ConcaveHull(pts, 0.5, false) FROM word_pts;
A little better, but still hard to recognize 'postgis'. What if we allowed holes?
SELECT ST_ConcaveHull(pts, 0.5, true) FROM word_pts;
This starts to look a bit more like the word 'postgis', with the hole in 'p' being clear.
As we start to make the shape more concave, it becomes more and more recognizable as 'postgis'... until it doesn't, and starts to look closer to modern art.
SELECT ST_ConcaveHull(pts, 0.35, true) FROM word_pts;
SELECT ST_ConcaveHull(pts, 0.05, true) FROM word_pts;
ST_ConcaveHull is also useful on multipolygons, and follows the same properties as demo'd on multipoints. It's important to note that if there are already holes in the existing multipolygon, setting param_allow_holes=false will still create convex polygons with "holes" in the middle, following the original polygon. The concave hulls will always contain the original polygons!
SELECT ST_ConcaveHull(ST_Letters('postgis'), 0.5, false);
As the convexity decreases and holes are allowed, the shape looks more and more like the original polygons in the original table.
SELECT ST_ConcaveHull(ST_Letters('postgis'), 0.1, true);
ST_TriangulatePolygon
The last demo here is the function ST_TriangulatePolygon, new in PostGIS 3.3. This function computes the "best quality" triangulation of a polygon (and also works on multipolygons!). This can be extremely useful for computing meshes of polygons in a quick and efficient manner.
SELECT ST_TriangulatePolygon(ST_Letters('postgis'));
ST_Letters provides a useful starting point for demoing functions on points and polygons. The new improvements in ST_ConcaveHull make it more useful for generating concave hulls of geometries, and they are significantly more intuitive to use. ST_TriangulatePolygon can be useful for finding meshes of polygons and multipolygons. The team at Crunchy Data will continue to make important contributions to PostGIS in order to help our users create interesting and innovative open source solutions!
by Jacob Coblentz (Jacob.Coblentz@crunchydata.com) at January 12, 2023 03:00 PM
Imagine your system captures event data from all over the world but the data all comes in UTC time giving no information about the local timing of an event. How can we quickly convert between the UTC timestamp and local time zone using GPS location? We can quickly solve this problem using PostgreSQL and PostGIS.
This example assumes you have a Postgres database running with PostGIS. If you’re new to PostGIS, see PostGIS for Newbies.
Below is an overview of the data relationship and join conditions we will be using.
A “shape file” commonly refers to a collection of files with .shp, .shx, .dbf, and other extensions on a common prefix name, which in our case is combined-shapefile-with-oceans.*. The combined-shapefile-with-oceans data contains polygons with the boundaries of the world's timezones. With this data we can start our process.
We will use shp2pgsql to generate a SQL file from the shape file, which creates the public.timezones_with_oceans table and inserts the data. The table contains the fields gid, tzid, and geom.
Export the Host, user, password variables
export PGHOST=p.<pgcluster name>.db.postgresbridge.com
export PGUSER=<DBUser>
export PGPASSWORD=<dbPassword>
Create sql file from shape file
shp2pgsql -s 4326 "combined-shapefile.shp" public.timezones_with_oceans > timezone_shape.sql
Create public.timezones_with_ocean and load timestamp data
psql -d timestamp -f timezone_shape.sql
Query a bit of sample data
SELECT tzid, ST_AsText(geom), geom FROM public.timezone_with_oceans limit 10;
Visualize Sample data
Using pgAdmin, highlight the geom column and click on the eye icon to visualize the geometry on the map shown below.
PostgreSQL provides a view, pg_timezone_names, with a list of time zone names recognized by SET TIMEZONE, along with their associated abbreviations, UTC offsets, and daylight-savings status, which our clients need to know.
pg_timezone_names view columns description
Column | Type | Description |
---|---|---|
name | text | Time zone name |
abbrev | text | Time zone abbreviation |
utc_offset | interval | Offset from UTC (positive means east of Greenwich) |
is_dst | bool | True if currently observing daylight savings |
pg_timezone sample data
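To peek at a few rows of that view yourself, a query like this works (the rows returned depend on the server's timezone database):
SELECT name, abbrev, utc_offset, is_dst
FROM pg_timezone_names
ORDER BY name
LIMIT 5;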
Now that we have the timezone shape file loaded, we can create an event table, load sample transaction data, and apply a timestamp conversion transformation query.
CREATE TABLE IF NOT EXISTS public.events
(
event_id bigint NOT NULL,
eventdatetime timestamp without time zone NOT NULL,
event_type varchar(25) not null,
latitude double precision NOT NULL,
longitude double precision NOT NULL,
CONSTRAINT events_pkey PRIMARY KEY (event_id)
);
INSERT INTO public.events(
event_id, eventdatetime, event_type, latitude, longitude)
VALUES (10086492,'2021-08-17 23:17:05','Walking',34.894089,-86.51148),
(50939,'2021-08-19 10:27:12','Hiking',34.894087,-86.511484),
(10086521,'2021-09-09 19:32:37','Swiming',34.642584,-86.761291),
(22465493,'2021-09-30 11:43:34','Swiming',33.611151,-86.799522),
(22465542,'2021-11-26 22:40:44.197','Swiming',34.64259,-86.761452),
(22465494,'2021-09-30 11:43:34','Hiking',33.611151,-86.799522),
(10087348,'2021-07-01 13:42:15','Swiming',25.956098,-97.535303),
(22466679,'2021-09-01 12:25:06','Hiking',25.956112,-97.535304),
(22466685,'2021-09-02 13:41:07','Swiming',25.956102,-97.535305),
(10088223,'2021-11-29 13:19:53','Hiking',25.956097,-97.535303),
(22246192,'2021-06-16 22:21:23','Walking',37.083726,-113.577984),
(9844188,'2021-06-23 20:18:43','Swiming',37.1067,-113.561401),
(22246294,'2021-06-25 21:50:06','Walking',37.118719,-113.598038),
(22246390,'2021-07-01 18:15:54','Hiking',37.109579,-113.562923),
(9844332,'2021-07-04 19:11:13','Walking',37.251538,-113.614708),
(9845242,'2021-11-04 13:25:40.425','Swiming',37.251542,-113.614699),
(84843,'2021-11-23 14:33:20','Swiming',37.251541,-113.614698),
(22247674,'2021-12-21 14:31:15','Swiming',37.251545,-113.614691),
(22246714,'2021-08-09 14:46:51','Swiming',37.109597,-113.562912),
(9845116,'2021-10-18 14:59:51','Swiming',37.082777,-113.554991);
Sample Event Data
Now we can convert the UTC timestamp to the local time for an event. Using the PostGIS function ST_Intersects, we can find the timezone_with_oceans.geom polygon in which an event point lies. This gives the name of the time zone where the event occurred. To create our transformation query:
First we create the location geometry using Longitude and Latitude from the events table.
Using the PostGIS function ST_Intersects, we will test which timezone_with_oceans.geom polygon intersects an event’s location geometry, telling us where the event occurred.
Join pg_timezone_names to timezone_with_oceans on name and tzid respectively, to retrieve abbrev, utc_offset, and is_dst fields from pg_timezone_names.
Using the PostgreSQL AT TIME ZONE operator and pg_timezone_names, we convert the UTC event timestamp to the local event timestamp, completing the process, e.g.
timestamp with time zone '2021-07-05 00:59:12+00' at time zone 'America/Denver' → 2021-07-04 18:59:12
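As a runnable sanity check (my own sketch, not part of the original post), you can chain AT TIME ZONE twice to interpret a stored timestamp as UTC and then render it in a local zone:
-- interpret the stored value as UTC, then render it in Denver local time
SELECT timestamp '2021-07-05 00:59:12'
       AT TIME ZONE 'UTC'
       AT TIME ZONE 'America/Denver' AS denver_local;
-- returns 2021-07-04 18:59:12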
Transformation Query SQL:
SELECT event_id, latitude, longitude, abbrev,
utc_offset,is_dst, eventdatetime,
((eventdatetime::timestamp WITH TIME ZONE AT TIME ZONE abbrev)::timestamp WITH TIME ZONE)
AS eventdatetime_local
FROM public.events
JOIN timezone_with_oceans ON ST_Intersects(ST_Point(longitude, latitude, 4326) , geom)
JOIN pg_timezone_names ON tzid = name;
PostgreSQL and PostGIS allow you to easily and dynamically solve time zone transformations. I hope this blog was helpful, and we at Crunchy Data wish you happy learning.
by Rekha Khandhadia (Rekha.Khandhadia@crunchydata.com) at January 06, 2023 03:00 PM
Elizabeth Christensen already gave a succinct summary of some of the PostGIS Day 2022 presentations, which you can see here.
There were many case study presentations involving the use of PostGIS, QGIS, OpenStreetMap, and pgRouting (and other extensions that build on PostGIS), as well as many "How to" videos. There were also talks on "How PostGIS is made". I'll highlight some of these; they overlap with Elizabeth's list but from a different angle of view.
Continue reading "PostGIS Day 2022 Videos are out and some more highlights"
by Regina Obe (nospam@example.com) at December 04, 2022 09:27 PM
Crunchy Data hosted the 4th annual PostGIS Day on November 17, 2022. PostGIS Day always comes a day after GIS Day which occurs annually on the 3rd Wednesday of November.
We had speakers from 10 different countries and attendees from more than 70 countries.
PostGIS is the most popular spatial relational database worldwide with:
What blew me away this year is how PostGIS touches so many parts of our everyday lives. From high speed internet, to water and flood management, there’s PostGIS supporting major parts of the infrastructure we depend on every day.
We heard a talk about the GISwater project, which is open source asset management for water and wastewater tracking. Helping humanity, bringing the world water resources, using the open source tools. What could be better? This project just gives everyone all the feels.
In a similar way, my ears really perked up with the Vizzuality talk on using PostGIS for conservation efforts, tracking food supply chains, and sustainability.
We were really lucky to have two federal agencies join us as well. USDA talked about using PostGIS as the engine behind the Dynamic Soils Hub project, which tracked a number of key soil changes like erosion and farmland loss across America. A developer who works alongside the National Flood Insurance Program talked about some of his work to collect several massive open data sets and merge them into a set of building outlines for the entire country.
I also really appreciated several talks that went a little bit outside the box to talk about how PostGIS can be used in a more people focused context. We had a nice talk about using PostGIS/QGIS to look at how you can explore infrastructure and how it impacts social inequity and health outcomes. There was also a great talk about routing people's movements outside roads, like on foot or inside buildings. Another talk went outside the box, or should I say inside the goal box, with a talk on how geospatial data is being collected and used in hockey and sports.
We were really fortunate to get a couple talks on how developers in Africa are using PostGIS. We had a talk on using QField, QGIS, and PostGIS for urban planning in Nigeria. A developer from Tanzania showed an isochrone map plugin he wrote for QGIS. Isochrone maps show distance from a certain point with time considered.
We had two talks from Canada, where they’re aiming to have the entire country covered by high speed internet by 2030. One of the talks described building a design system in QGIS and PostGIS and even training internal designers to use these tools. We had a second talk about the telecom industry, this one more focused on getting things up to scale for backing a really large enterprise.
PostGIS Day also had a really broad set of talks about ecosystem tools. Seeing these all together, you realize how big the world of PostGIS is and how many people are building large scale applications out of geospatial data.
Thanks to everyone who participated this year! If you missed it, plan to join us next year. Mark your calendar, we’ll open a call for papers in September of 2023. Videos are linked above or available as a full playlist on Crunchy Data’s YouTube channel.
PostGIS is a supported extension with Crunchy Certified Postgres and is packaged with our products on traditional infrastructure, Kubernetes, and our fully managed Crunchy Bridge. Crunchy Data supports a variety of enterprise clients using PostGIS in production. If you’re using PostGIS and are looking for a host or supported distribution, contact us, we’d love to chat.
by Elizabeth Christensen (Elizabeth.Christensen@crunchydata.com) at December 02, 2022 03:00 PM
I didn't think the last two years' PostGIS Day conferences could be topped, but I was wrong. This year's was absolutely fabulous and better than all the others. We had two key events going on this year (yesterday). The first was the Crunchy Data PostGIS Day 2022, a 12-hour marathon of nothing but PostGIS-related talks. Many thanks to Elizabeth Christensen and Paul Ramsey for putting it all together and hosting for the full 12 hours. The PostGIS Day celebrations climaxed with many of us playing capture the flag on Paul's shiny new invention until we managed to crash it.
Continue reading "Post GIS Day 2022 Celebrations"
by Regina Obe (nospam@example.com) at November 19, 2022 12:29 AM
The PostGIS development team is pleased to provide security, bug fixes and performance enhancements 3.3.2, 3.2.4, 3.1.8 and 3.0.8 for the 3.3, 3.2, 3.1, and 3.0 stable branches.
The PostGIS Team is pleased to release PostGIS 2.5.9! This is the end-of-life release of the 2.5 branch.
This release is a bug fix release, addressing issues found in the previous 2.5 releases.
In a recent post, we introduced pg_eventserv and the real-time web notifications from database actions.
In this post, we will dive into a practical use case: displaying state, calculating events, and tracking historical location for a set of moving objects.
This demonstration uses pg_eventserv for eventing, pg_featureserv for the external web API, and OpenLayers as the map API, to build a small example application that shows off the common features of moving objects systems.
Moving objects applications can be very complex or very simple, but they usually include a few common baseline features:
The model has three tables:
From the outside, the system has the following architecture:
Changes to objects are communicated in via a web API backed by pg_featureserv; those changes fire a bunch of triggers that generate events, which pg_eventserv pushes out to listening clients via WebSockets.
The user interface generates object movements, via the arrow buttons for each object. This is in lieu of a real "moving object" fleet in the real world generating timestamped GPS tracks.
Every movement click on the UI fires a call to a web API, which is just a function published via pg_featureserv, object_move(object_id, direction).
CREATE OR REPLACE FUNCTION postgisftw.object_move(
move_id integer, direction text)
RETURNS TABLE(id integer, geog geography)
AS $$
DECLARE
xoff real = 0.0;
yoff real = 0.0;
step real = 2.0;
BEGIN
yoff := CASE
WHEN direction = 'up' THEN 1 * step
WHEN direction = 'down' THEN -1 * step
ELSE 0.0 END;
xoff := CASE
WHEN direction = 'left' THEN -1 * step
WHEN direction = 'right' THEN 1 * step
ELSE 0.0 END;
RETURN QUERY UPDATE moving.objects mo
SET geog = ST_Translate(mo.geog::geometry, xoff, yoff)::geography,
ts = now()
WHERE mo.id = move_id
RETURNING mo.id, mo.geog;
END;
$$
LANGUAGE 'plpgsql' VOLATILE;
The object_move(object_id, direction) function just converts the "direction" parameter into a movement vector, and UPDATEs the relevant row of the objects table.
The change to the objects table fires off the objects_geofence() trigger, which calculates the fences the object is now in.
CREATE FUNCTION objects_geofence() RETURNS trigger AS $$
DECLARE
fences_new integer[];
BEGIN
-- Add the current geofence state to the input
-- tuple every time.
SELECT coalesce(array_agg(id), ARRAY[]::integer[])
INTO fences_new
FROM moving.geofences
WHERE ST_Intersects(geofences.geog, new.geog);
RAISE DEBUG 'fences_new %', fences_new;
-- Ensure geofence state gets saved
NEW.fences := fences_new;
RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';
The objects table then fires off the objects_update() trigger, which:
saves a copy of the updated record into the objects_history tracking table, and
composes a notification payload and publishes it to the NOTIFY queue using pg_notify().
CREATE FUNCTION objects_update() RETURNS trigger AS
$$
DECLARE
channel text := 'objects';
fences_old integer[];
fences_entered integer[];
fences_left integer[];
events_json jsonb;
location_json jsonb;
payload_json jsonb;
BEGIN
-- Place a copy of the value into the history table
INSERT INTO moving.objects_history (id, geog, ts, props)
VALUES (NEW.id, NEW.geog, NEW.ts, NEW.props);
-- Clean up any nulls
fences_old := coalesce(OLD.fences, ARRAY[]::integer[]);
RAISE DEBUG 'fences_old %', fences_old;
-- Compare to previous fences state
fences_entered = NEW.fences - fences_old;
fences_left = fences_old - NEW.fences;
RAISE DEBUG 'fences_entered %', fences_entered;
RAISE DEBUG 'fences_left %', fences_left;
-- Form geofence events into JSON for notify payload
WITH r AS (
SELECT 'entered' AS action,
g.id AS geofence_id,
g.label AS geofence_label
FROM moving.geofences g
WHERE g.id = ANY(fences_entered)
UNION
SELECT 'left' AS action,
g.id AS geofence_id,
g.label AS geofence_label
FROM moving.geofences g
WHERE g.id = ANY(fences_left)
)
SELECT json_agg(row_to_json(r))
INTO events_json
FROM r;
-- Form notify payload
SELECT json_build_object(
'type', 'objectchange',
'object_id', NEW.id,
'events', events_json,
'location', json_build_object(
'longitude', ST_X(NEW.geog::geometry),
'latitude', ST_Y(NEW.geog::geometry)),
'ts', NEW.ts,
'color', NEW.color,
'props', NEW.props)
INTO payload_json;
RAISE DEBUG '%', payload_json;
-- Send the payload out on the channel
PERFORM (
SELECT pg_notify(channel, payload_json::text)
);
RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';
pg_eventserv picks the notification payload up off the NOTIFY queue and pushes it out to all listening clients over WebSockets. Phew! That's a lot!
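If you want to watch the raw notifications without running pg_eventserv, a quick sketch is to LISTEN on the same channel from a psql session; the channel name 'objects' comes from the trigger code above, and the object id used here is just an example value:
-- in a psql session connected to the same database
LISTEN objects;
-- in another session, move an object (id 1 is hypothetical):
-- SELECT * FROM postgisftw.object_move(1, 'up');
-- psql then prints the JSON payload as an asynchronous notification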
The geofences table also has a trigger, layer_change(), that catches insert/update/delete events and publishes a JSON notification with pg_notify(). This is also published by pg_eventserv, and when the UI receives it, it simply forces a full re-load of geofence data.
CREATE FUNCTION layer_change() RETURNS trigger AS
$$
DECLARE
layer_change_json json;
channel text := 'objects';
BEGIN
-- Tell the client what layer changed and how
SELECT json_build_object(
'type', 'layerchange',
'layer', TG_TABLE_NAME::text,
'change', TG_OP)
INTO layer_change_json;
RAISE DEBUG 'layer_change %', layer_change_json;
PERFORM (
SELECT pg_notify(channel, layer_change_json::text)
);
RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';
OK, all done.
All the code and instructions are available in the moving objects example of pg_eventserv.
by Paul Ramsey (Paul.Ramsey@crunchydata.com) at October 24, 2022 03:00 PM
PostgreSQL 15 came out just last week. To celebrate the arrival of PostgreSQL 15, I will revisit the number one problem people have with PostGIS: how to upgrade your PostGIS-enabled cluster without installing an old PostGIS version.
In a previous article Using pg upgrade to upgrade PostGIS without installing older version I demonstrated a trick for upgrading to a newer PostgreSQL instance from PostGIS 2.2 - 2.whatever without having to install the older version of PostGIS in your new PostgreSQL service. This is a revisit of that article, but with considerations for upgrading from a 2 series to a 3 series.
Fear not. Going from PostGIS 2+ to PostGIS 3+ can still be done without installing the old PostGIS 2+ in your new cluster. Additionally, once you are on PostGIS 3.1+, you should never have to do symlink or copy hacks to upgrade, say, PostGIS 3.1 to PostGIS 3.4. At least not for the same major version.
If per chance you are on PostGIS 1 and trying to jump all the way to PostGIS 3+, then sadly you need to do a pg_dump of your old database/cluster and pg_restore into your new cluster.
Continue reading "Using pg_upgrade to upgrade PostgreSQL 9.6 PostGIS 2.4 to PostgreSQL 15 3.3 on Yum"
by Regina Obe (nospam@example.com) at October 17, 2022 01:15 AM
There is an elegant mathematical theory of binary relations. Homogeneous relations are an important subclass of binary relations in which both domains are the same. A homogeneous relation R is a subset of all ordered pairs (x,y) with x and y elements of the domain. This can be thought of as a boolean-valued function R(x,y), which is true if the pair has the relationship and false if not.
The restricted structure of homogeneous relations allows describing them by various properties, including:
Reflexive - R(x, x) is always true
Irreflexive - R(x, x) is never true
Symmetric - if R(x, y), then R(y, x)
Antisymmetric - if R(x, y) and R(y, x), then x = y
Transitive - if R(x, y) and R(y, z), then R(x, z)
The Dimensionally Extended 9-Intersection Model (DE-9IM) represents the topological relationship between two geometries.
Various useful subsets of spatial relationships are specified by named spatial predicates. These are the basis for spatial querying in many spatial systems including the JTS Topology Suite, GEOS and PostGIS.
The spatial predicates are homogeneous binary relations over the Geometry domain. They can thus be categorized in terms of the relational properties above. The following table shows the properties of the standard predicates:
Predicate | Reflexive / Irreflexive | Symmetric / Antisymmetric | Transitive |
---|---|---|---|
Equals | R | S | T |
Intersects | R | S | - |
Disjoint | R | S | - |
Contains | R | A | - |
Within | R | A | - |
Covers | R | A | T |
CoveredBy | R | A | T |
Crosses | I | S | - |
Overlaps | I | S | - |
Touches | I | S | - |
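As a small sanity check of this table (my own sketch, not from the original article), you can verify a few of these properties directly in PostGIS:
WITH g AS (
  SELECT 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'::geometry AS a,
         'POLYGON((5 5, 15 5, 15 15, 5 15, 5 5))'::geometry AS b
)
SELECT ST_Equals(a, a)  AS equals_reflexive,      -- true
       ST_Covers(a, a)  AS covers_reflexive,      -- true
       ST_Touches(a, a) AS touches_irreflexive,   -- false
       ST_Intersects(a, b) = ST_Intersects(b, a) AS intersects_symmetric  -- true
FROM g;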
by Dr JTS (noreply@blogger.com) at October 13, 2022 10:03 PM
I had coffee with an IT colleague here in Victoria last week, and he was interested in getting into core PostgreSQL programming. “What resources would you recommend I look at?”
That’s… a hard question!
PostgreSQL is a huge code base with a multi-decade history. I’ve been poking around the edges for almost 10 years and feel comfortable with the extension APIs, foreign data wrappers, access methods APIs, some system catalogue stuff… maybe 5% of the surface area of the beast?
So, what advice for someone who wants to dive much much deeper than that?
First, start with the vision, and read “The Design of Postgres” (Stonebraker & Rowe, 1985) to get a sense of what distinguished Postgres from its predecessors: complex objects; user extensibility; and active database facilities; all while retaining relational concepts.
Second, take a run through Bruce Momjian's "internals" presentations. These tend to be a little older, since Bruce hasn't been doing deep core work for a while, but he's an expert teacher and explainer, so they are useful to get a feel for the shape of things. In a similar (and more recent) vein, my colleague Stephen Frost walks through the code base in this 2018 talk about adding a new feature to PostgreSQL.
Third, consider spending some time with “The Internals of PostgreSQL”. This is a very detailed look at PostgreSQL subsystems, including header structures and data flow. As with any book, it may have already drifted a bit from the particulars of current PostgreSQL, but there is no other resource I know that even attempts to explain internals at this granularity.
Fourth, the source code itself is an amazing resource, and the commentary in header files and function descriptions is very good. The incredibly detailed and stringent source code review process of the PostgreSQL community not only expects good code, but also good documentation of what the code does. I’m not sure how much this can be traced back to the influence of Tom Lane (whose comments are frequently complete technical manuals) but regardless the culture of good internal documentation is in place, and makes the code as “approachable” as a system of this complexity could hope to be.
Now things get harder.
Conference talks are a moving target, because they tend to focus on the leading edge of things, but there are some community members who regularly give talks about their work on core topics, that must necessarily explain how things work in order to contextualize things.
Unfortunately, PostgreSQL conferences have a habit of … not making recordings! So there’s relatively little online. I have seen some amazing talks (the multi-session query planner master class Tom Lane gave at PgCon 2011 sticks out) but most of them are lost to the ages.
The best conference to go to for core technical content is undoubtedly PgCon. (I will see you there this spring!)
COVID robbed us of many things, but it did cause PgCon to record and publish a number of standout technical talks that might otherwise have never been available.
Here are the presenters I always mark down in my program and rush to get a seat for:
I would also go to any talk Tom Lane felt like giving. And Thomas Vondra, and Thomas Munro, and Oleg Bartunov.
Learning the PostgreSQL code base is a journey of a million steps, that’s for sure. One thing that all my effective personal learning has had in common is a practical need. My learning has been in support of some actual work that needed to be done for my employer of the moment. It had motivation, a start point and an end point. That was really helpful.
Best of luck in your PostgreSQL journeys!
Because of course I left out some stuff in the first draft:
It’s been a while since we last talked about MobilityDB in 2019 and 2020. Since then, the project has come a long way. It joined OSGeo as a community project and formed a first PSC, including the project founders Mahmoud Sakr and Esteban Zimányi as well as Vicky Vergara (of pgRouting fame) and yours truly.
This post is a quick teaser tutorial from zero to computing closest points of approach (CPAs) between trajectories using MobilityDB.
The easiest way to get started with MobilityDB is to use the ready-made Docker container provided by the project. I’m using Docker and WSL (Windows Subsystem for Linux on Windows 10) here. Installing WSL/Docker is out of scope for this post. Please refer to the official documentation for your operating system.
Once Docker is ready, we can pull the official container and fire it up:
docker pull mobilitydb/mobilitydb
docker volume create mobilitydb_data
docker run --name "mobilitydb" -d -p 25432:5432 -v mobilitydb_data:/var/lib/postgresql mobilitydb/mobilitydb
psql -h localhost -p 25432 -d mobilitydb -U docker
Currently, the container provides PostGIS 3.2 and MobilityDB 1.0:
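One way to confirm the versions from the SQL prompt is a quick sketch like this; postgis_full_version() is standard PostGIS, and 'mobilitydb' as the extension name is my assumption:
SELECT postgis_full_version();
-- list installed extension versions; 'mobilitydb' is the assumed extension name
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('postgis', 'mobilitydb');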
Once the container is running, we can already connect to it from QGIS. This is my preferred way to load data into MobilityDB because we can simply drag-and-drop any timestamped point layer into the database:
For this post, I’m using an AIS data sample in the region of Gothenburg, Sweden.
After loading this data into a new table called ais, it is necessary to remove duplicates and convert timestamps:
CREATE TABLE AISInputFiltered AS
SELECT DISTINCT ON("MMSI","Timestamp") *
FROM ais;
ALTER TABLE AISInputFiltered ADD COLUMN t timestamp;
UPDATE AISInputFiltered SET t = "Timestamp"::timestamp;
Afterwards, we can create the MobilityDB trajectories:
CREATE TABLE Ships AS
SELECT "MMSI" mmsi,
tgeompoint_seq(array_agg(tgeompoint_inst(Geom, t) ORDER BY t)) AS Trip,
tfloat_seq(array_agg(tfloat_inst("SOG", t) ORDER BY t) FILTER (WHERE "SOG" IS NOT NULL) ) AS SOG,
tfloat_seq(array_agg(tfloat_inst("COG", t) ORDER BY t) FILTER (WHERE "COG" IS NOT NULL) ) AS COG
FROM AISInputFiltered
GROUP BY "MMSI";
ALTER TABLE Ships ADD COLUMN Traj geometry;
UPDATE Ships SET Traj = trajectory(Trip);
Once this is done, we can load the resulting Ships layer and the trajectories will be loaded as lines:
To compute the closest point of approach between two moving objects, MobilityDB provides a shortestLine function. To be precise, this function computes the line connecting the nearest approach points of the two tgeompoint_seq trajectories. In addition, we can use the time-weighted average function twavg to compute representative average movement speeds and eliminate stationary or very slowly moving objects:
SELECT S1.MMSI mmsi1, S2.MMSI mmsi2,
shortestLine(S1.trip, S2.trip) Approach,
ST_Length(shortestLine(S1.trip, S2.trip)) distance
FROM Ships S1, Ships S2
WHERE S1.MMSI > S2.MMSI AND
twavg(S1.SOG) > 1 AND twavg(S2.SOG) > 1 AND
dwithin(S1.trip, S2.trip, 0.003)
In the QGIS Browser panel, we can right-click the MobilityDB connection to bring up an SQL input using Execute SQL:
The resulting query layer shows where moving objects get close to each other:
To better see what’s going on, we’ll look at individual CPAs:
Since our filtered AIS layer has proper timestamps, we can animate it using the Temporal Controller. This enables us to replay the movement and see what was going on in a certain time frame.
I let the animation run and stopped it once I spotted a close encounter. Looking at the AIS points and the shortest line, we can see that MobilityDB computed the CPAs along the trajectories:
A more targeted way to investigate a specific CPA is to use the Temporal Controllers’ fixed temporal range mode to jump to a specific time frame. This is helpful if we already know the time frame we are interested in. For the CPA use case, this means that we can look up the timestamp of a nearby AIS position and set up the Temporal Controller accordingly:
I hope you enjoyed this quick dive into MobilityDB. For more details, including talks by the project founders, check out the project website.
This post is part of a series. Read more about movement data in GIS.
The PostGIS Team is pleased to release PostGIS 3.3.1.
This is a bug fix release to address an issue compiling against PostgreSQL 15 Beta 4.
Best served with PostgreSQL 15 Beta 4.
The PostGIS Team is pleased to release PostGIS 3.3.0.
The PostGIS Team is pleased to release PostGIS 3.3.0rc2! Best served with PostgreSQL 15 beta3, GEOS 3.11.0, and SFCGAL 1.4.1.
Lower versions of the aforementioned dependencies will not have all new features.
This release supports PostgreSQL 11-15.
This release is a release candidate of a major release; it includes bug fixes since PostGIS 3.2.3.
The PostGIS development team is pleased to provide bug fix and performance enhancements 3.2.3, 3.1.7, 3.0.7, 2.5.8 for the 3.2, 3.1, 3.0, 2.5 stable branches.
Find me all the things in set "A" that are not in set "B".
This is a pretty common query pattern, and it occurs in both non-spatial and spatial situations. As usual, there are multiple ways to express this query in SQL, but only a couple queries will result in the best possible performance.
The non-spatial setup starts with two tables with the numbers 1 to 1,000,000 in them, then deletes two records from one of the tables.
CREATE TABLE a AS SELECT generate_series(1,1000000) AS i
ORDER BY random();
CREATE INDEX a_x ON a (i);
CREATE TABLE b AS SELECT generate_series(1,1000000) AS i
ORDER BY random();
CREATE INDEX b_x ON b (i);
DELETE FROM b WHERE i = 444444;
DELETE FROM b WHERE i = 222222;
ANALYZE;
The spatial setup is a 2M record table of geographic names, and a 3K record table of county boundaries. Most of the geonames are inside counties (because we tend to name things on land) but some of them are not (because sometimes we name things in the ocean, or our boundaries are not detailed enough).
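The original post doesn't show the spatial DDL, so here is a minimal sketch of the two tables it assumes (column names and SRIDs are my guesses), with GiST indexes so the spatial join condition can use an index:
-- hypothetical shapes for the geonames and counties tables used below
CREATE TABLE geonames (
    geonameid integer PRIMARY KEY,
    name text,
    geom geometry(Point, 4326)
);
CREATE TABLE counties (
    gid integer PRIMARY KEY,
    name text,
    geom geometry(MultiPolygon, 4326)
);
CREATE INDEX geonames_geom_x ON geonames USING GIST (geom);
CREATE INDEX counties_geom_x ON counties USING GIST (geom);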
Since the problem statement includes the words "not in", this form of the query seems superficially plausible:
SELECT i
FROM a
WHERE i NOT IN (SELECT i FROM b);
Perfect! Give me everything from "A" that is not in "B"! Just what we want, right? In fact, running the query takes so long that I never got it to complete. The EXPLAIN output gives some hints.
QUERY PLAN
------------------------------------------------------------------------------
Gather (cost=1000.00..5381720008.33 rows=500000 width=4)
Workers Planned: 2
-> Parallel Seq Scan on a (cost=0.00..5381669008.33 rows=208333 width=4)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..23331.97 rows=999998 width=4)
-> Seq Scan on b (cost=0.00..14424.98 rows=999998 width=4)
Note that the subquery ends up materializing the whole second table into memory, where it is scanned over and over and over to test each key in table "A". Not good.
PostgreSQL supports some set-based key words that allow you to find logical combinations of queries: UNION, INTERSECT and EXCEPT. Here, we can make use of EXCEPT.
SELECT a.i FROM a
EXCEPT
SELECT b.i FROM b;
The SQL still matches our mental model of the problem statement: everything in "A" except for everything in "B".
And it returns a correct answer in about 2.3 seconds.
i
--------
222222
444444
(2 rows)
The query plan is interesting: the two tables are appended, then sorted to find duplicates, and then only the non-duplicates are emitted!
QUERY PLAN
---------------------------------------------------------------------------------------------
SetOp Except (cost=322856.41..332856.40 rows=1000000 width=8)
-> Sort (cost=322856.41..327856.41 rows=1999998 width=8)
Sort Key: "*SELECT* 1".i
-> Append (cost=0.00..58849.95 rows=1999998 width=8)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..24425.00 rows=1000000 width=8)
-> Seq Scan on a (cost=0.00..14425.00 rows=1000000 width=4)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..24424.96 rows=999998 width=8)
-> Seq Scan on b (cost=0.00..14424.98 rows=999998 width=4)
It's a big hammer, but it works.
The best approach is the "anti-join". One way to express an anti-join is with a special "correlated subquery" syntax:
SELECT a.i
FROM a
WHERE NOT EXISTS
(SELECT b.i FROM b WHERE a.i = b.i);
So this returns rows from "A" only where the correlated subquery against "B" returns no records.
It takes about 850 ms on my test laptop, so 3 times faster than using EXCEPT in this test, and gets the right answer. The query plan looks like this:
QUERY PLAN
------------------------------------------------------------------------------------
Gather (cost=16427.98..31466.36 rows=2 width=4)
Workers Planned: 2
-> Parallel Hash Anti Join (cost=15427.98..30466.16 rows=1 width=4)
Hash Cond: (a.i = b.i)
-> Parallel Seq Scan on a (cost=0.00..8591.67 rows=416667 width=4)
-> Parallel Hash (cost=8591.66..8591.66 rows=416666 width=4)
-> Parallel Seq Scan on b (cost=0.00..8591.66 rows=416666 width=4)
The same sentiment can be expressed without the NOT EXISTS construct, using only basic SQL and a LEFT JOIN:
SELECT a.i
FROM a
LEFT JOIN b ON (a.i = b.i)
WHERE b.i IS NULL;
This also takes about 850 ms.
The LEFT JOIN is required to return a record for every row of "A". So what does it do if there's no record in "B" that satisfies the join condition? It returns NULL for the columns of "B" in the join relation for those records. That means any row with a NULL in a column of "B" that is normally non-NULL is a record in "A" that is not in "B".
The nice thing about the LEFT JOIN expression of the solution is that it generalizes nicely to arbitrary join conditions, like those using spatial predicate functions.
"Find the geonames points that are not inside counties"... OK, we will
LEFT JOIN geonames with counties and find the records where counties are NULL.
SELECT g.name, g.geonameid, g.geom
FROM geonames g
LEFT JOIN counties c
ON ST_Contains(c.geom, g.geom)
WHERE c.geom IS NULL; -- no matching county means the point is not inside any county
The answer pops out in about a minute.
Unsurprisingly, that's about how long a standard inner join takes to associate the 2M geonames with the 3K counties, since the anti-join has to do about that much work to determine which records do not match the join condition.
The WHERE NOT EXISTS and LEFT JOIN patterns both result in the most efficient query plans and executions.
by Paul Ramsey (Paul.Ramsey@crunchydata.com) at August 15, 2022 12:00 PM
The PostGIS Team is pleased to release PostGIS 3.3.0rc1! Best served with PostgreSQL 15 beta2, GEOS 3.11.0, and SFCGAL 1.4.1.
Lower versions of the aforementioned dependencies will not have all new features.
This release supports PostgreSQL 11-15.
This release is a release candidate of a major release; it includes bug fixes since PostGIS 3.2.2 and new features.
The PostGIS Team is pleased to release PostGIS 3.2.2! This release works for PostgreSQL 9.6-15.
This release is a bug fix release, addressing issues found in the previous 3.2 releases.
The PostGIS Team is pleased to release PostGIS 3.1.6! This release works for PostgreSQL 9.6-14.
This release is a bug fix release, addressing issues found in the previous 3.1 releases.
The PostGIS Team is pleased to release PostGIS 3.0.6. This release works for PostgreSQL 9.5-13.
The sheer cleverness of relational databases is often discounted because we so frequently use them for very simple data management tasks.
Serialize an object into a row, store with unique key. yawwwn
Search for unique key, deserialize row into an object. yawwwwwwn
The real power of relational databases is juggling "relations" (aka tables) in large numbers and figuring out on-the-fly the most effective way to filter out rows and find an answer.
PostgreSQL has an undeniably clever query planning system that auto-tunes based on the data in the system. It samples tables to gain statistics about the distribution of data, and uses those statistics to choose the order of joins and filters applied to the data for the most efficient query execution.
Even more amazing, the query planning system is modular enough to integrate user-defined data types, like the geometry and geography types in PostGIS. So complex queries involving spatial filters are also correctly planned on-the-fly.
Mostly, this just works magically. The system has a rough idea of the selectivity of any given filter or join condition, and can use that estimate to choose the most efficient order of operations to execute multi-table and multi-filter queries over complex collections of relational tables.
But what about when things go wrong?
By default, a PostgreSQL database ships with a default_statistics_target of 100. This means that to populate the statistics for a column, the database will draw a sample of 300 * default_statistics_target = 30000 rows. The system will then use that data to populate the "common value list" and column histogram with roughly default_statistics_target entries.
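If you are curious what the system actually gathered, the pg_stats view exposes the common value list and histogram per column. A quick sketch, using the people/age running example from the queries below:
SELECT n_distinct,
       most_common_vals,
       most_common_freqs,
       histogram_bounds
FROM pg_stats
WHERE tablename = 'people'
  AND attname = 'age';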
A large table of keys arranged in a Pareto distribution can make things hard for the statistics system. There may be more common keys than can easily fit into the "common value list", so joins get poorly planned for example.
Fortunately, there are some table specific knobs you can turn.
The statistics target on each table and column can be individually changed, so that more data are sampled, and more granular statistics are gathered.
ALTER TABLE people ALTER COLUMN age SET STATISTICS 200;
Gathering more statistics is usually sufficient to correct planning behavior. However, in special cases (like when a query uses multiple filters on strongly correlated columns) it might be necessary to pull out "extended statistics" to gather even more data for planning.
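Extended statistics are created with the CREATE STATISTICS command. A minimal sketch for two hypothetically correlated columns (city and zipcode are made-up names, not from the post):
-- tell the planner that city and zipcode are correlated
CREATE STATISTICS people_city_zip (dependencies, ndistinct)
    ON city, zipcode
    FROM people;
-- re-run ANALYZE so the new statistics object gets populated
ANALYZE people;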
The thing about magic is that underneath there is usually some complex machinery to make it all work transparently.
Pretty much every database user understands that there's a thing called an "index" and that adding an "index" to a column will make searches on that column faster.
For most uses, this rule-of-thumb is correct: indexes make random access of a small number of items faster, at the expense of making inserts and updates to a table slightly slower.
What happens, though, if you are accessing a large number of items from a table?
An index is a tree. It's fast to find one item by traversing the tree; this is called an "index scan". But finding a lot of items involves running through the tree over and over and over. It is slow to index scan all the records in a tree.
An efficient index scan query will have a narrow filter:
SELECT * FROM people WHERE age = 4;
Conversely, reading a table from front to back can be quite fast, if you are interested in most of the records. This is called a "sequence scan".
An efficient sequence scan query will have a wide filter:
SELECT * FROM people WHERE age BETWEEN 45 AND 95;
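You can watch the planner make this choice with EXPLAIN; a quick sketch against the same people table (the exact plans will depend on your data and statistics, and assume an index on age exists):
-- narrow filter: typically an index scan
EXPLAIN SELECT * FROM people WHERE age = 4;
-- wide filter: typically a sequential scan
EXPLAIN SELECT * FROM people WHERE age BETWEEN 45 AND 95;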
For any given filter, a relational database will figure out whether the filter is most efficiently run as an index scan or a sequence scan. But how?
The "selectivity" of a filter is a measure of what proportion of the records will be returned by the filter. A filter that returns only 1% of the records is "very selective", and vice versa.
An ordinary filter on numbers or strings or dates might define an equality or a range as a filter. A spatial filter on geographic objects defines a bounding box as a filter. A filter is just a true/false test that every object in the column can be subjected to.
Here's two spatial filters, which one do you suppose is more selective?
The red one, right? Just like our SQL example above on ages, the smaller rectangle must return fewer records. Except not necessarily. Suppose our data table looked like this.
So the blue box is selective and should use an index scan and the red one should use a sequence scan.
But wait, how does the database know which box is selective, without running a query and actually applying the filters to the data?
When the ANALYZE command is run, the database will gather "statistics" for every column that has an index on it. (The "autovacuum" system that runs in the background of every PostgreSQL cluster is also an "autoanalyze", gathering statistics at the same time it does table maintenance.)
For data types like integers, real numbers, text and dates the system will gather a "common value list" of values that show up frequently in the table, and a histogram of the frequency of values in different ranges.
For spatial data types, the system builds a spatial histogram of frequencies of occurrence. So, for our example data set, something like this.
Instead of being a potentially unconstrained collection of maybe millions of spatial objects, the database now just has to make a quick summary of a few cells in a histogram to get an estimate of how many features a given query filter will return.
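You can see those estimates at work by running EXPLAIN on a spatial filter; this sketch reuses the geonames table from the anti-join example earlier, and the envelope coordinates are arbitrary:
-- the planner's estimated row count for this bounding-box filter
-- comes from the spatial histogram
EXPLAIN
SELECT count(*)
FROM geonames
WHERE geom && ST_MakeEnvelope(-124, 45, -116, 49, 4326);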
The spatial statistics analyzer in PostGIS is actually a little more sophisticated than that: before filling out the histogram it adjusts the cell widths for each dimension, to try and catch more resolution in places where the data is more dense.
All these statistics do have a cost. Increasing statistics targets will make the ANALYZE command run a little slower, as it samples more data. It will also make the planning stage of queries slower, as the larger collection of statistics needs to be read through and compared to the filters and joins in the SQL. However, for complex BI queries, more statistics are almost always worthwhile to get a better plan.
by Paul Ramsey (Paul.Ramsey@crunchydata.com) at June 23, 2022 03:00 PM