### Planet PostGIS

Welcome to Planet PostGIS

July 19, 2018

Anita Graser (Underdark)

Movement data in GIS #15: writing a PL/pgSQL stop detection function for PostGIS trajectories

Do you sometimes start writing an SQL query and around line 50 you get the feeling that it might be getting out of hand? If so, it might be useful to break it down into smaller chunks and wrap those up into custom functions. Never done that? Don’t despair! There’s an excellent PL/pgSQL tutorial on postgresqltutorial.com to get you started.

To get an idea of the basic structure of a PL/pgSQL function and to prove that PostGIS data types work just fine in this context, here’s a basic function that takes a trajectory geometry and outputs its duration, i.e. the difference between its last and first timestamps:

CREATE OR REPLACE FUNCTION AG_Duration(traj geometry) 
RETURNS numeric LANGUAGE 'plpgsql'
AS $BODY$ 
BEGIN
RETURN ST_M(ST_EndPoint(traj))-ST_M(ST_StartPoint(traj));
END; $BODY$;

My end goal for this exercise was to implement a function that takes a trajectory and outputs the stops along this trajectory. Commonly, a stop is defined as a long stay within an area with a small radius. This leads us to the following definition:

CREATE OR REPLACE FUNCTION AG_DetectStops(
   traj geometry, 
   max_size numeric, 
   min_duration numeric)
RETURNS TABLE(sequence integer, geom geometry) 
-- implementation follows here!

Note how this function uses RETURNS TABLE to enable it to return all the stops that it finds. To add a line to the output table, we need to assign values to the sequence and geom variables and then use RETURN NEXT.
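
As a minimal sketch of that pattern (this demo function is not from the original post), assigning the output columns and calling RETURN NEXT emits one row per call:

CREATE OR REPLACE FUNCTION AG_ReturnNextDemo()
RETURNS TABLE(sequence integer, geom geometry)
LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
   sequence := 0;
   geom := ST_MakePoint(0, 0);
   RETURN NEXT; -- appends the row (0, POINT(0 0)) to the result table
   sequence := 1;
   geom := ST_MakePoint(1, 1);
   RETURN NEXT; -- appends a second row
END; $BODY$;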

Another reason to use PL/pgSQL is that it enables us to write loops. And loops I wanted for my stop detection function! Specifically, I wanted to go through all the points in the trajectory:

FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP
-- here comes the magic!
END LOOP;

Eventually the function should go through the trajectory and identify all segments that stay within an area with max_size diameter for at least min_duration time. To test for the area size, we can use:

IF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true; 

Putting everything together, my current implementation looks like this:

CREATE OR REPLACE FUNCTION AG_DetectStops(
   traj geometry,
   max_size numeric,
   min_duration numeric)
RETURNS TABLE(sequence integer, geom geometry) 
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE 
   pt geometry;
   segment geometry;
   is_stop boolean;
   previously_stopped boolean;
   stop_sequence integer;
   p1 geometry;
BEGIN
segment := NULL;
sequence := 0;
is_stop := false;
previously_stopped := false;
p1 := NULL;
FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP
   IF segment IS NULL AND p1 IS NULL THEN 
      p1 := pt; 
   ELSIF segment IS NULL THEN 
      segment := ST_MakeLine(p1,pt); 
      p1 := NULL;
      IF ST_Length(segment) <= max_size THEN is_stop := true; END IF; 
   ELSE 
      segment := ST_AddPoint(segment,pt); 
      -- if we're in a stop, we want to grow the segment, otherwise we remove points to the specified min_duration 
      IF NOT is_stop THEN 
         WHILE ST_NPoints(segment) > 2 AND AG_Duration(ST_RemovePoint(segment,0)) >= min_duration LOOP
            segment := ST_RemovePoint(segment,0); 
         END LOOP;
      END IF;
      -- a stop is identified if the segment stays within a circle of diameter = max_size
      IF ST_Length(segment) <= max_size THEN is_stop := true; 
      ELSIF ST_Distance(ST_StartPoint(segment),pt) > max_size THEN is_stop := false;
      ELSIF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true; 
      ELSE is_stop := false; 
      END IF; 
      -- if we found the end of a stop, we need to check if it lasted long enough 
      IF NOT is_stop AND previously_stopped THEN 
         IF ST_M(ST_PointN(segment,ST_NPoints(segment)-1))-ST_M(ST_StartPoint(segment)) >= min_duration THEN
            geom := ST_RemovePoint(segment,ST_NPoints(segment)-1); 
            RETURN NEXT;
            sequence := sequence + 1;
            segment := NULL;
            p1 := pt;
         END IF;
      END IF;
   END IF;
   previously_stopped := is_stop;
END LOOP;
IF previously_stopped AND AG_Duration(segment) >= min_duration THEN 
   geom := segment; 
   RETURN NEXT; 
END IF;
END; $BODY$;

While this function is not really short, it’s so much more readable than my previous attempts at doing this in pure SQL. Some of the lines for determining is_stop are not strictly necessary but they do speed up processing.

Performance still isn’t quite where I’d like it to be. I suspect that all the adding and removing of points from linestring geometries is not ideal. In general, it’s quicker to find shorter stops in smaller areas than longer stops in bigger areas.

Let’s test! 

Looking for a testing framework for PL/pgSQL, I found plpgunit on GitHub. While I did not end up using it, I did use its examples for inspiration to write a couple of tests, e.g.

CREATE OR REPLACE FUNCTION test.stop_at_beginning() RETURNS void LANGUAGE 'plpgsql'
AS $BODY$
DECLARE t0 integer; n0 integer;
BEGIN
WITH temp AS ( SELECT AG_DetectStops(
   ST_GeometryFromText('LinestringM(0 0 0, 0 0 1, 0.1 0.1 2, 2 2 3)'),
   1,1) stop 
)
SELECT ST_M(ST_StartPoint((stop).geom)), 
       ST_NPoints((stop).geom) FROM temp INTO t0, n0;	
IF t0 = 0 AND n0 = 3
   THEN RAISE INFO 'PASSED - Stop at the beginning of the trajectory';
   ELSE RAISE INFO 'FAILED - Stop at the beginning of the trajectory';
END IF;
END; $BODY$;

Basically, each test is yet another PL/pgSQL function that doesn’t return anything (i.e. returns void) but outputs messages about the status of the test. Here I made heavy use of the PERFORM statement, which executes the provided function but discards the results:
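
A minimal sketch of that usage (the test.run_tests wrapper is a made-up name, not from the post):

CREATE OR REPLACE FUNCTION test.run_tests() RETURNS void LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
   -- PERFORM runs each test function and discards its (void) result
   PERFORM test.stop_at_beginning();
   -- additional PERFORM calls, one per test function, would go here
END; $BODY$;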

by underdark at July 19, 2018 09:31 PM

July 08, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

Using procedures for batch geocoding and other batch processing

One of the features we are looking forward to in the upcoming PostgreSQL 11 is the introduction of procedures via the CREATE PROCEDURE ANSI-SQL construct. The major benefit that sets procedures apart from functions is that procedures are not wrapped in an outer transaction and can have COMMITs within them. This means it's not all-or-nothing like it is with functions. Even if you stop a procedure in motion, whatever work has been done and committed is saved. In the case of functions, a stop or failure would roll back all the work. It also means you can see the work in progress of a stored procedure, since the work will already have been committed. This is a huge benefit for batch processing. Batch processing covers a lot of use cases of PostGIS users, since a good chunk of PostGIS work involves doing some kind of batch processing of data you get from third parties or machines.
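
As a hedged sketch of what that can look like in PostgreSQL 11 (the table and column names here are illustrative, not from the article), a procedure can commit after every batch so that interrupted work stays saved:

CREATE PROCEDURE batch_process(batch_size integer DEFAULT 1000)
LANGUAGE plpgsql
AS $$
DECLARE
   rows_done integer;
BEGIN
   LOOP
      -- process one batch; "addresses" and "processed" are hypothetical names
      UPDATE addresses
         SET processed = true
       WHERE id IN (SELECT id FROM addresses WHERE NOT processed LIMIT batch_size);
      GET DIAGNOSTICS rows_done = ROW_COUNT;
      EXIT WHEN rows_done = 0;
      COMMIT;  -- allowed inside procedures, so each finished batch is kept
   END LOOP;
END;
$$;

-- run it
CALL batch_process(5000);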

Continue reading "Using procedures for batch geocoding and other batch processing"

by Leo Hsu and Regina Obe (nospam@example.com) at July 08, 2018 05:40 AM

July 03, 2018

PostGIS Development

PostGIS 2.5.0beta1

The PostGIS development team is pleased to release PostGIS 2.5.0beta1.

This release is a work in progress. No more API changes will be made from this point on before the 2.5.0 release. Remaining time will be focused on bug fixes and documentation for the new functionality and performance enhancements under the covers. Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 will offer, you should be running PostgreSQL 11beta2+ and GEOS 3.7.0beta1, which were released recently.

Best served with PostgreSQL 11beta2 which was recently released.

View all closed tickets for 2.5.0.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

2.5.0beta1

by Regina Obe at July 03, 2018 12:00 AM

June 25, 2018

Boston GIS (Regina Obe, Leo Hsu)

New in QGIS 3.2 Save Project to PostgreSQL

We've got customers discovering PostGIS and GIS in general or migrating away from the ArcGIS family of tools. When they ask, "How do I see my data?", we often point them at QGIS, which is an open source GIS desktop with rich integration with PostGIS/PostgreSQL.

QGIS is something that is great for people who need to live in their GIS environment, since it allows for easily layering in other data sources, web services and maps. The DBManager tool allows for more advanced querying (like writing spatial SQL queries that take advantage of the hundreds of functions PostGIS has to offer), the ability to import/export data, and the ability to create PostgreSQL views.

QGIS has this thing called Projects, which allow for defining map layers and the symbology associated with them: for example, what colors you give your roads, any extra symbols, and what field attributes you overlay - street name etc. Projects are usually saved in files with a .qgs or .qgz extension. If you spent a lot of time styling these layers, chances are you want to share them with other people in your group. This can become challenging if your group is not connected via a network share.

Continue reading "New in QGIS 3.2 Save Project to PostgreSQL"

by Regina Obe (nospam@example.com) at June 25, 2018 05:47 AM

June 21, 2018

Paul Ramsey

PostGIS Polygon Splitting

One of the joys of geospatial processing is the variety of tools in the tool box, and the ways that putting them together can yield surprising results. I have been in the guts of PostGIS for so long that I tend to think in terms of primitives: either there’s a function that does what you want or there isn’t. I’m too quick to ignore the power of combining the parts that we already have.

A community member on the users list asked (paraphrased): “is there a way to split a polygon into sub-polygons of more-or-less equal areas?”

I didn’t see the question, which is lucky, because I would have said: “No, you’re SOL, we don’t have a good way to solve that problem.” (An exact algorithm showed up in the Twitter thread about this solution, and maybe I should implement that.)

PostGIS developer Darafei Praliaskouski did answer, and provided a working solution that is absolutely brilliant in combining the parts of the PostGIS toolkit to solve a pretty tricky problem. He said:

The way I see it, for any kind of polygon:

  • Convert a polygon to a set of points proportional to the area by ST_GeneratePoints (the more points, the more beautiful it will be, guess 1000 is ok);
  • Decide how many parts you’d like to split into, (ST_Area(geom)/max_area), let it be K;
  • Take KMeans of the point cloud with K clusters;
  • For each cluster, take a ST_Centroid(ST_Collect(point));
  • Feed these centroids into ST_VoronoiPolygons, that will get you a mask for each part of polygon;
  • ST_Intersection of original polygon and each cell of Voronoi polygons will get you a good split of your polygon into K parts.

Let’s take it one step at a time to see how it works.

We’ll use Peru as the example polygon; it’s got a nice concavity to it, which makes it a little trickier than an average shape.

CREATE TABLE peru AS 
  SELECT *
  FROM countries
  WHERE name = 'Peru'

Original Polygon (Peru)

Now create a point field that fills the polygon. On average, each randomly placed point ends up “occupying” an equal area within the polygon.

CREATE TABLE peru_pts AS
  SELECT (ST_Dump(ST_GeneratePoints(geom, 2000))).geom AS geom
  FROM peru
  WHERE name = 'Peru'

2000 Random Points

Now, cluster the point field, setting the number of clusters to the number of pieces you want the polygon divided into. Visually, you can now see the divisions in the polygon! But, we still need to get actual lines to represent those divisions.

CREATE TABLE peru_pts_clustered AS
  SELECT geom, ST_ClusterKMeans(geom, 10) OVER () AS cluster
  FROM peru_pts;

10 Clusters

Using a point field and K-means clustering to get the split areas was inspired enough. The steps to get actual polygons are equally inspired.

First, calculate the centroid of each point cluster, which will be the center of mass for each cluster.

CREATE TABLE peru_centers AS
  SELECT cluster, ST_Centroid(ST_collect(geom)) AS geom
  FROM peru_pts_clustered
  GROUP BY cluster;

Centroids of Clusters

Now, use a Voronoi diagram to get actual dividing edges between the cluster centroids, which end up closely matching the places where the clusters divide!

CREATE TABLE peru_voronoi AS
  SELECT (ST_Dump(ST_VoronoiPolygons(ST_collect(geom)))).geom AS geom
  FROM peru_centers;

Voronoi of Centroids

Finally, intersect the Voronoi areas with the original polygon to get final output polygons that incorporate both the outer edges of the polygon and the Voronoi dividing lines.

CREATE TABLE peru_divided AS
  SELECT ST_Intersection(a.geom, b.geom) AS geom
  FROM peru a
  CROSS JOIN peru_voronoi b;

Intersection with Original Polygon

Done!

Clustering a point field to get mostly equal areas, and then using the Voronoi diagram to extract actual dividing lines, are wonderful insights into spatial processing. The final picture of all the components of the calculation is also beautiful.

All the Components Together

I’m not 100% sure, but it might be possible to use Darafei’s technique for even more interesting subdivisions, like “map of the USA subdivided into areas of equal GDP”, or “map of New York subdivided into areas of equal population” by generating the initial point field using an economic or demographic weighting.
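
As a rough sketch of that idea (the counties table and its population column are assumptions, not from the post), only the point-generation step changes: generate a number of points per polygon proportional to the weighting attribute, then cluster the combined point cloud exactly as before:

CREATE TABLE usa_pts AS
  SELECT (ST_Dump(
            ST_GeneratePoints(geom, ceil(population / 10000.0)::int)
         )).geom AS geom
  FROM counties;

CREATE TABLE usa_pts_clustered AS
  SELECT geom, ST_ClusterKMeans(geom, 10) OVER () AS cluster
  FROM usa_pts;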

June 21, 2018 08:00 PM

June 18, 2018

Boston GIS (Regina Obe, Leo Hsu)

Coming PostGIS 2.5 ST_OrientedEnvelope

PostGIS 2.5 is just around the corner. One of the new functions coming is ST_OrientedEnvelope. This is something I've been waiting on for years. It is the true minimum bounding rectangle, as opposed to ST_Envelope which is an axis-aligned bounding rectangle.
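
A quick hedged example of the difference (the parcels table here is just a placeholder):

SELECT ST_AsText(ST_Envelope(geom))         AS axis_aligned_bbox,
       ST_AsText(ST_OrientedEnvelope(geom)) AS minimum_bounding_rectangle
FROM parcels
LIMIT 1;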

Below is a pictorial view showing the difference between the two.

Continue reading "Coming PostGIS 2.5 ST_OrientedEnvelope"

by Regina Obe (nospam@example.com) at June 18, 2018 09:42 PM

June 09, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostgresVision 2018 Slides and Impressions

Leo and I attended PostgresVision 2018 which ended a couple of days ago.

We gave a talk on spatial extensions with the main focus being PostGIS. Here are links to our slides PostgresVision2018_SpatialExtensions: HTML version and PDF.

Unfortunately there are no slides of the pgRouting part, except the one that says PGRouting Live Demos because Leo will only do live demos. He has no fear of his demos not working.

Side note: if you are on Windows and use the PostGIS bundle, all the extensions listed in the PostGIS box of the spatial extensions diagram, as well as pointcloud, pgRouting, and ogr_fdw, are included in the bundle.

Continue reading "PostgresVision 2018 Slides and Impressions"

by Leo Hsu and Regina Obe (nospam@example.com) at June 09, 2018 08:56 AM

May 28, 2018

PostGIS Development

PostGIS 2.5.0alpha

The PostGIS development team is pleased to release PostGIS 2.5.0alpha.

This release is a work in progress, and some features are still slated to be added. Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 will offer, you should be running PostgreSQL 11beta+.

Best served with PostgreSQL 11beta which was recently released.

View all closed tickets for 2.5.0.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

2.5.0alpha

by Regina Obe at May 28, 2018 12:00 AM

May 14, 2018

Paul Ramsey

PostGIS Talks @ FOSS4G North America

I presented my “PostGIS for Managers” talk for the last time (at least in this form) today at FOSS4G North America. The reason it’s for the last time is that the central conceit it’s built around, that a core decision is between a proprietary and an open source database, isn’t really operative anymore. The real decisions are now being driven by other considerations, like cloud platforms, and the services available there. So, it’s not really PostgreSQL versus Oracle anymore.

I also presented my “SQL Festival” talk, for the first time! New material is always a little challenging: will it work, is it the right level for the audience? It seemed to be well received, a mix of generic SQL tidbits, and some specific interesting queries you can do with PostGIS.

May 14, 2018 04:00 PM

April 16, 2018

Anita Graser (Underdark)

Movement data in GIS #12: why you should be using PostGIS trajectories

In short: both writing and executing trajectory queries is considerably faster using PostGIS trajectories (as LinestringM) rather than the commonly used point-based approach.

Here are a couple of examples to give you an impression of the differences.

Spoiler alert! Trajectory queries are up to 500 times faster than comparable point-based queries.
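
To give an idea of the setup, here is a hedged sketch of how such a LinestringM trajectory table might be built from the point table used in the comparisons below (the actual schema in the post may differ):

-- one LinestringM per trajectory, with the timestamp stored in the M coordinate,
-- plus a tstzrange for temporal filtering (construction details are assumptions)
CREATE TABLE geolife.trajectory_ext AS
SELECT trajectory_id AS id,
       tracker,
       ST_MakeLine(
          ST_SetSRID(ST_MakePointM(ST_X(pt), ST_Y(pt), extract(epoch FROM t)), 4326)
          ORDER BY t
       ) AS track,
       tstzrange(min(t), max(t)) AS time_range
FROM geolife.trajectory_pt
GROUP BY trajectory_id, tracker;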

A quick look at indexing

In both cases, we have indexed the tracker id, geometry, and time columns to speed up query processing.

The trajectory table has 3 indexes

  • gist (time_range)
  • gist (track gist_geometry_ops_nd)
  • btree (tracker)

The point-based table has 4 indexes

  • gist (pt)
  • btree (trajectory_id)
  • btree (tracker)
  • btree (t)
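
Expressed as DDL, that index setup could look roughly like this (a hedged sketch; table and column names follow the queries below):

-- trajectory table
CREATE INDEX ON geolife.trajectory_ext USING gist (time_range);
CREATE INDEX ON geolife.trajectory_ext USING gist (track gist_geometry_ops_nd);
CREATE INDEX ON geolife.trajectory_ext USING btree (tracker);

-- point-based table
CREATE INDEX ON geolife.trajectory_pt USING gist (pt);
CREATE INDEX ON geolife.trajectory_pt USING btree (trajectory_id);
CREATE INDEX ON geolife.trajectory_pt USING btree (tracker);
CREATE INDEX ON geolife.trajectory_pt USING btree (t);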

Length

First, let’s see how to determine trajectory length for all observed moving objects (identified by a tracker id).

Using the point-based approach, we first need to ensure that the points are in the correct temporal order, create the lines, and finally sum up their length:

WITH ordered AS (
 SELECT trajectory_id, tracker, t, pt
 FROM geolife.trajectory_pt
 ORDER BY t
), tmp AS (
 SELECT trajectory_id, tracker, st_makeline(pt) traj
 FROM ordered 
 GROUP BY trajectory_id, tracker
)
SELECT tracker, round(sum(ST_Length(traj::geography)))
FROM tmp
GROUP BY tracker 
ORDER BY tracker

With trajectories, we can go right to computing lengths:

SELECT tracker, round(sum(ST_Length(track::geography)))
FROM geolife.trajectory_ext
GROUP BY tracker
ORDER BY tracker

On my test system, the trajectory query run time is 22.7 sec instead of 43.0 sec for the point-based approach.

Duration

Compared to trajectory length, duration is less complicated in the point-based approach:

WITH tmp AS (
 SELECT trajectory_id, tracker, min(t) start_time, max(t) end_time
 FROM geolife.trajectory_pt
 GROUP BY trajectory_id, tracker
)
SELECT tracker, sum(end_time - start_time)
FROM tmp
GROUP BY tracker
ORDER BY tracker

Still, the trajectory query is less complex and much faster at 31 ms instead of 6.0 sec:

SELECT tracker, sum(upper(time_range) - lower(time_range))
FROM geolife.trajectory_ext
GROUP BY tracker
ORDER BY tracker

Temporal filter

Extracting trajectories that occurred during a certain time frame is another common use case:

WITH tmp AS (
 SELECT trajectory_id, tracker, min(t) start_time, max(t) end_time
 FROM geolife.trajectory_pt
 GROUP BY trajectory_id, tracker
)
SELECT trajectory_id, tracker, start_time, end_time
FROM tmp
WHERE end_time > '2008-11-26 11:00'
AND start_time < '2008-11-26 15:00'
ORDER BY tracker

This point-based query takes 6.0 sec while the shorter trajectory query finishes in 12 ms:

SELECT id, tracker, time_range
FROM geolife.trajectory_ext
WHERE time_range && '[2008-11-26 11:00+1,2008-11-26 15:00+01]'::tstzrange

or equally fast (12 ms) by making use of the n-dimensional index:

WHERE track &&&	ST_Collect(
 ST_MakePointM(-180, -90, extract(epoch from '2008-11-26 11:00'::timestamptz)),
 ST_MakePointM(180, 90, extract(epoch from '2008-11-26 15:00'::timestamptz))
)

Spatial filter

Finally, of course, let’s have a look at spatial filters, for example, trajectories that start in a certain area:

WITH my AS ( 
 SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(116.31894,39.97472),4326),0.0005) areaA
), tmp AS (
 SELECT trajectory_id, tracker, min(t) t 
 FROM geolife.trajectory_pt
 GROUP BY trajectory_id, tracker
)
SELECT distinct traj.tracker, traj.trajectory_id 
FROM tmp
JOIN geolife.trajectory_pt traj
ON tmp.trajectory_id = traj.trajectory_id AND traj.t = tmp.t
JOIN my
ON ST_Within(traj.pt, my.areaA)

This point-based query takes 6.0 sec while the shorter trajectory query finishes in 488 ms:

WITH my AS ( 
 SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(116.31894, 39.97472),4326),0.0005) areaA
)
SELECT id, tracker, ST_AsText(track)
FROM geolife.trajectory_ext
JOIN my
ON areaA && track
AND ST_Within(ST_StartPoint(track), areaA)

For a more generic “does this trajectory intersect another geometry” question, the points can also be aggregated to a linestring on the fly, but that takes 21.9 sec:
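
A hedged sketch of such an on-the-fly aggregation (not the exact query that was timed) could look like this:

WITH my AS (
 SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(116.31894, 39.97472), 4326), 0.0005) areaA
), tmp AS (
 SELECT trajectory_id, tracker, ST_MakeLine(pt ORDER BY t) traj
 FROM geolife.trajectory_pt
 GROUP BY trajectory_id, tracker
)
SELECT trajectory_id, tracker
FROM tmp
JOIN my
ON ST_Intersects(traj, my.areaA)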

I’ll be presenting more work on PostGIS trajectories at GI_Forum in Salzburg in July. In the talk, I’ll also have a look at the custom PG-Trajectory datatype. Here’s the full open-access paper:

Graser, A. (2018) Evaluating Spatio-temporal Data Models for Trajectories in PostGIS Databases. GI_Forum ‒ Journal of Geographic Information Science, 1-2018, 16-33. DOI: 10.1553/giscience2018_01_s16.


This post is part of a series. Read more about movement data in GIS.

by underdark at April 16, 2018 08:30 PM

April 10, 2018

Michal Zimmermann

PostgreSQL Backup and Recovery Orchestration: systemd Automation

Posts in this series have described the basic automation of a PostgreSQL backup/recovery strategy. The process itself consists of different periodic tasks that shouldn’t be executed manually. There are essentially two tools dedicated to running periodic tasks in Linux: cron and systemd.

Cron used to be my first choice for automation in Linux, as it’s very easy to use. On the other hand, it’s quite messy (running crontab -e under different users to find out which one has the job defined) and a bit difficult to test - many times I ran into a situation where the underlying bash script executed just fine, while the cron job kept failing for reasons unknown.

My own cron experience, together with a few words from a workmate, brought me into the arms of systemd, which is a Linux system and service manager. It’s capable of running periodic tasks just like cron, while making them more transparent.

Important bits

Understanding the whole of systemd is way out of scope for a poor GIS guy, yet I managed to tame three important parts of the ecosystem:

  • services
  • timers
  • targets

Services

A service is a configuration saved inside a “.service” file specifying what you want systemd to do. The following code shows how you can tell systemd to vacuum your database once in a while.

[Unit]
Description=CR vacuumdb
OnFailure=unit-status-mail@%n.service unit-status-slack@%n.service
Wants=cr-sunday.timer

[Service]
User=postgres
Group=postgres
Type=simple
ExecStart=/bin/bash /usr/local/sbin/pgsql-vacuumdb.sh --port %i

[Install]
WantedBy=cr-sunday.target

Unit files come with several handy features. First of all, they are orchestrated with systemctl. Second, any service configuration file containing @ in its filename might be symlinked/copied and run for different instances. Third, notice the OnFailure directive in the code above. If anything goes wrong, systemd might serve as a postman delivering the bad news. I set up both e-mail and Slack notifications and they’ve been working like a charm ever since.

On top of that, I find systemd orchestration much easier to test and maintain compared to cron.

With the above code saved in /lib/systemd/system/pgsql-vacuumdb@.service, you can copy the file to /lib/systemd/system/pgsql-vacuumdb@5432.service, /lib/systemd/system/pgsql-vacuumdb@5433.service etc. If you look at the ExecStart part of the service file, you’ll notice %i being used at the end - a placeholder replaced with the string between @ and .service in the filename.

This systemd service file is no more than a simple wrapper around the following bash code. We run three different database clusters on one machine and this approach makes their maintenance pretty comfortable.

#!/bin/bash
#
# @author: Michal Zimmermann <michal.zimmermann@clevermaps.cz>
# Vacuums the whole database cluster running on a given port.

while [[ $# > 0 ]]
do
    key="$1"

    case $key in
        -p|--port)
            PORT="$2"
            shift
            ;;
        *)
            echo "Usage: `basename $0` --port|-p [port_number]"
            exit 1
            ;;
    esac
    shift
done

if [[ -z "$PORT" ]]
then
    echo "Port not provided!"
    $0 *
    exit 2
fi

/usr/bin/vacuumdb -U postgres -p $PORT --all --full --analyze

What you get so far is the possibility to run systemctl start pgsql-vacuumdb@5432 instead of calling the underlying bash code manually. Not much, really. That’s where timers come to the party.

Timers

Timer files end with “.timer” and are responsible for running services at a given time. The code below, coming from the /lib/systemd/system/cr-sunday.timer file, runs the pgsql-vacuumdb service every Sunday at 3:45 am.

[Unit]
Description=CR Sunday timer

[Timer]
OnCalendar=Sun *-*-* 03:45
Persistent=yes
Unit=cr-sunday.target

[Install]
WantedBy=multi-user.target

Targets

Target files end with “.target” and are used to group units in general. In our case, the target file for the vacuumdb service is as simple as the following code.

[Unit]
Description=CR Sunday target
StopWhenUnneeded=yes

Targets might be called by other targets. Running systemctl start cr-sunday.target would eventually lead to running all the services wanted by that target.

As I already mentioned, I find systemd services easy to code and test. If any of them should fail, you’d find a message in syslog or via systemctl status pgsql-vacuumdb.

by Michal Zimmermann at April 10, 2018 10:00 AM

April 06, 2018

PostGIS Development

PostGIS Patch Releases 2.2.7, 2.3.7, and 2.4.4

The PostGIS development team is pleased to provide bug fix releases 2.2.7, 2.3.7 and 2.4.4 for the 2.2, 2.3, and 2.4 stable branches.

Please note that PostGIS 2.2 is reaching end-of-life and there will be one more patch release for it later in the year. If you have not upgraded to at least PostGIS 2.3, we encourage you to do so before we discontinue support for PostGIS 2.2.

View all closed tickets for 2.4.4, 2.3.7, 2.2.7.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

2.2.7

2.3.7

2.4.4

by Regina Obe at April 06, 2018 12:00 AM

March 02, 2018

Michal Zimmermann

PostgreSQL Backup and Recovery Orchestration: Bash Automation

There is a bunch of periodic database-related tasks in the life of a PostgreSQL administrator. Some should be done daily, others weekly, and others can wait for a whole month. Many of them are essential for your database’s health. Forget to run such a task, or screw up the run accidentally, and you’ll be snowed under with fixing your database.

Those tasks are easily done with bash, which is the first step to full automation. The following tasks are perfect candidates to be implemented as bash scripts:

  • full backups (both creation and removal)
  • WAL backups (both creation and removal)
  • vacuum
  • pgBadger log analysis (both creation and removal)
  • log maintenance (if you don’t use log rotate)

Full backup creation is just one example of how powerful bash can be.

#!/bin/bash
#
# @author: Michal Zimmermann <michal.zimmermann@clevermaps.cz>
# Creates base backup.

CUR_DIR=$(dirname "$0")
if [[ ! -f ${CUR_DIR}/pgsql-common.sh ]]
then
    echo "pgsql-common.sh not found!"
    exit 1
fi

source "${CUR_DIR}/pgsql-common.sh"
source "$CONFIG"

if [[ -d ${CR_BASE_BACKUP_DIR}/${CR_LABEL} ]]
then
    echo "${CR_BASE_BACKUP_DIR}/${CR_LABEL} already exists and is not empty!"
    exit 2
fi

pg_basebackup \
    --pgdata=${CR_BASE_BACKUP_DIR}/${CR_LABEL} \
    --format=plain \
    --write-recovery-conf \
    --wal-method=stream \
    --label=${CR_LABEL} \
    --checkpoint=fast \
    --progress \
    --verbose

if [[ $? -gt 0 ]]
then
    rm -rf ${CR_BASE_BACKUP_DIR}/${CR_LABEL}
    echo "pg_basebackup on ${CR_LABEL} failed!"
    exit 3
fi

tar -czf ${CR_BASE_BACKUP_DIR}/${CR_LABEL}.tar.gz ${CR_BASE_BACKUP_DIR}/${CR_LABEL} && rm -rf ${CR_BASE_BACKUP_DIR}/${CR_LABEL}

As you probably noticed, a pgsql-common.sh file is sourced at the beginning of the script. This script in turn just loads the proper config file that provides variables to the other devops scripts. As you might need those variables in several of your scripts, it is a good idea to move their loading into a separate file.

#!/bin/bash
#
# @author: Michal Zimmermann <michal.zimmermann@clevermaps.cz>
# Sourced in pgsql-*.sh scripts.

while [[ $# > 0 ]]
do
    key="$1"

    case $key in
        -c|--config)
            CONFIG="$2"
            shift
            ;;
        *)
            echo "Usage: `basename $0` --config|-c [config_file]"
            exit 1
            ;;
    esac
    shift
done
# /Input parameters

if [[ -z "$CONFIG" ]]
then
    echo "Config file is not set! See the script usage below."
    $0 *
    exit 2
fi

if [[ ! -f "$CONFIG" ]]
then
    echo "$CONFIG not found!"
    exit 3
fi

A config file might remain as simple as this:

# Base backup location
export CR_BASE_BACKUP_DIR="/mnt/backup/symap/base/"
# WAL backup location
export CR_WAL_BACKUP_DIR="/mnt/backup/symap/wal"
# PostgreSQL WAL location
export CR_PG_XLOG_DIR="/var/lib/postgresql/10/symap/pg_wal"
export CR_PG_LOG_DIR="/var/lib/postgresql/10/symap/pg_log"
# Base backup pattern (set to YYYYMMDD)
export CR_LABEL=symap_$(date +%Y%m%d)
export PGPORT=5432

Another, likely the simplest, example is a vacuumdb task:

#!/bin/bash
#
# @author: Michal Zimmermann <michal.zimmermann@clevermaps.cz>
# Vacuums the whole database cluster running on a given port.

while [[ $# > 0 ]]
do
    key="$1"

    case $key in
        -p|--port)
            PORT="$2"
            shift
            ;;
        *)
            echo "Usage: `basename $0` --port|-p [port_number]"
            exit 1
            ;;
    esac
    shift
done

if [[ -z "$PORT" ]]
then
    echo "Port not provided!"
    $0 *
    exit 2
fi

/usr/bin/vacuumdb -U postgres -p $PORT --all --full --analyze

Tips

  • Always test your bash scripts before production deployment. Even a single line of code might lead to a very different, possibly unexpected, outcome.
  • Try to stay as defensive as possible. Imagine a variable did not get sourced properly. Is it going to blow your database? Trust me, I know what I am talking about (see the tweet below).

Pitfalls

You do not want to run your bash scripts by hand. You probably do not want them to be run by cron. You want to run them with systemd. More on this next time.

by Michal Zimmermann at March 02, 2018 03:00 PM

February 22, 2018

Oslandia

Database migrations and Pum

At Oslandia we often need to deal with database migrations. So it’s an important topic to us. And
it should be an important topic to anyone maintaining databases in production.

First of all I want to mention a good series of articles by K. Scott Allen on the philosophy and practice of database migrations and version control.

These articles are not about specific databases and migration tools. Instead they cover the basics of having your databases under version control and managing migrations.

Many database migration tools exist, both in the opensource and proprietary worlds.

Some tools relate to specific programming languages and ORMs (Object-Relational Mapping), such as Rails Migrations and Alembic. I’ve personally used Alembic in a number of Python projects in the past. Alembic is a fantastic tool for Python applications based on the SQLAlchemy toolkit. I would highly recommend it. Other popular database migration tools include Flyway and Liquibase. With Flyway, migrations are written in either Java or SQL. With Liquibase, migrations are described using declarative formats such as XML, YAML and JSON; “plain SQL” is also supported.

Pum, which stands for “Postgres Upgrades Manager”, is a new migration tool. It was created in the context of the QWAT and QGEP projects, but it is completely independent and agnostic to those projects. Pum was initially created by OPENGIS.ch, with help from Oslandia and the whole QWAT community for the design and specifications of the tool.

Pum was greatly inspired by Flyway and Liquibase. Flyway and Liquibase are written in Java and support various database systems. In contrast, Pum is written in Python and focuses on the PostgreSQL database system. Pum is simple, and easy to use and extend. Also, with Pum, migrations, called “deltas” in the Pum jargon, are written in “plain SQL”. Pum fully embraces SQL and doesn’t attempt to hide it behind a declarative or programming language.

Pum is a CLI (Command Line Interface) tool. The main commands provided by Pum are

  • check
  • baseline
  • upgrade
  • test-and-upgrade

The check command compares two databases and shows the differences. Let’s create a simple example to illustrate it:

$ createdb prod  # create the "production" database
$ createdb ref   # create the "reference" database
$
$ cat > pg_service.conf << EOF # create a PostgreSQL service file for "prod" and "ref"
> [prod]
> dbname=prod
> [ref]
> dbname=ref
> EOF
$ export PGSERVICEFILE=pg_service.conf
$
$ pum check -p1 prod -p2 ref
Check...OK
columns: []
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables: []
triggers: []
views: []

The check command reports that the databases “prod” and “ref” are identical. They’re actually both empty. Now let’s add a table to the “ref” database, and run the check command again:

$ # add the "test" table to the "ref" database
$ psql -d ref -c "create table test (name text)"
$
$ pum check -p1 prod -p2 ref
Check...DIFFERENCES FOUND
columns:
- + ('public', 'test', 'name', None, 'YES', 'text', None, None, None, None)
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables:
- + ('public', 'test')
triggers: []
views: []

This time the check command reports that the “ref” database has a table and a column that the “prod” database does not have. If we created the same “test” table in “prod”, the check command would report that “prod” and “ref” are the same again.

The baseline command assigns a version to a database. For example let’s assign the version 0.0.1 to the “prod” database, and the version 0.0.2 to the “ref” database:

$ pum baseline -p prod -t public.pum_upgrades -d deltas -b 0.0.1
$ pum baseline -p ref -t public.pum_upgrades -d deltas -b 0.0.2
$
$ # check that version 0.0.1 was assigned to the "prod" database
$ psql service=prod -c "table pum_upgrades"
 id | version | description | type | script | checksum | installed_by |        installed_on        | execution_time | success 
----+---------+-------------+------+--------+----------+--------------+----------------------------+----------------+---------
  1 | 0.0.1   | baseline    |    0 |        |          | postgres     | 2018-02-22 16:33:40.948143 |              1 | t
(1 row)

$
$ # check that version 0.0.2 was assigned to the "ref" database
$ psql service=ref -c "table pum_upgrades"
 id | version | description | type | script | checksum | installed_by |       installed_on        | execution_time | success
----+---------+-------------+------+--------+----------+--------------+---------------------------+----------------+---------
  1 | 0.0.2   | baseline    |    0 |        |          | postgres     | 2018-02-22 16:56:25.19542 |              1 | t
(1 row)

Finally, we’re going to write a migration script to migrate the “prod” database, and use Pum to do the migration.

We create the migration script delta_0.0.2_000.sql with the following content:

create table test (name text);

Let’s now migrate “prod”, checking that it now matches the “ref” database:

$ # a "test" database is required by the test-and-upgrade command
$ createdb test
$ cat >> pg_service.conf << EOF
> [test]
> dbname=test
> EOF
$
$ # now perform the migration
$ pum test-and-upgrade -pp prod -pt test -pc ref -t public.pum_upgrades -d delta -f output.backup
Test and upgrade...Dump...OK
Restore...OK
Upgrade...     Applying delta 0.0.2... OK
OK
Check...OK
columns: []
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables: []
triggers: []
views: []

Apply deltas to prod? [n]|y: y
Upgrade...     Applying delta 0.0.2... OK
OK
OK

The “prod” database has been migrated, and during the process PUM has checked that the “prod” and “ref” databases are now in the same state.

That’s all folks! Please contact us at infos@oslandia.com if you have any questions or want to discuss this further.

by Eric Lemoine at February 22, 2018 06:20 PM

February 16, 2018

Michal Zimmermann

PostgreSQL Backup and Recovery Orchestration: Recovery

PostgreSQL continuous backups are very powerful, if you know how to use them for recovery. There’s no other way to be sure about that than actually trying it. Personally, I see recovery as a single process with two possibly different outcomes:

  • you’re recovering to the same state your cluster is/was in (because of a hardware failure, provider switch, …) - it’s more of a data migration, but you need your backup anyway
  • you’re doing a point-in-time-recovery (someone dropped the wrong table, data got corrupted, …)

Both scenarios follow the same steps and differ slightly at the end.

  1. Stop the PostgreSQL cluster.
  2. Copy the current PGDATA_DIR somewhere safe, just in case you screw up.
  3. Replace the PGDATA_DIR with the full backup. If you start the cluster right away, it will boot to the last full backup state (in my case, missing a week of WAL segments tops).

General recovery

In this case, you’re trying to recover as far as possible. With the previous steps done successfully, the next ones follow:

  • Copy all archived WAL segments created after the last full backup to PGDATA_DIR/pg_xlog. These can be found with the find -newer command run against the corresponding .backup file in your wal-archive/u/p directory.
  • If your full backup strategy includes recovery.conf file creation, you can safely move it or remove it.
  • Start the database cluster again. It is going to boot to the last working state.

If you’re about to migrate your data, you might be better off with the simple pg_dump, pg_dumpall and pg_restore commands rather than using the full backup/WAL segment combination.

Point-in-time-recovery

PostgreSQL’s PITR can help you restore your accidentally deleted/corrupted data. After the first three steps mentioned above, you should follow with these:

  • Copy all archived segments created after the last full backup somewhere the PostgreSQL user can read them (/your-wal-recovery-folder/ for example).
  • Set up the recovery.conf file properly. If you know something nasty happened at 2018-01-29 08:00:00, try to recover right to that point (or to any other, as described in the documentation).
restore_command = 'cp /your-wal-recovery-folder/%f "%p"'
recovery_target_time = '2018-01-29 08:00:00'
  • Start the database cluster again. It is going to boot to the last full backup and then play all the WAL segments until the recovery target. Depending on how many WAL segments are about to be used, this might take a while.

Pitfalls

You don’t want to find yourself in the middle of the biggest database failure of the century just to find out your backups don’t work, and even if they did, you would have no idea how to use them. Or, even worse, there are no backups at all, because your backup strategy has been failing silently without a single notice for several months.

Tips

Try to recover from your backups once in a while.

I forget things and make mistakes. We all do. That’s why I built an ensemble that takes care of our database automatically. Nothing fancy, just a bunch of good old Bash scripts managed with systemd rather than cron. Next time, I’d like to show you the code and walk you through our current setup.

by Michal Zimmermann at February 16, 2018 03:00 PM

February 15, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

FDWS for PostgreSQL Windows 32 and 64-bit

We've updated our binaries for PostgreSQL 10 Windows, both 32 and 64-bit. The 64-bit ones should work fine with EnterpriseDB Windows as well as BigSQL.

Continue reading "FDWS for PostgreSQL Windows 32 and 64-bit"

by Leo Hsu and Regina Obe (nospam@example.com) at February 15, 2018 05:25 AM

February 12, 2018

Michal Zimmermann

PostgreSQL Backup and Recovery Orchestration: WAL Archiving

Just a very few of my day-to-day work tasks can be accomplished without PostgreSQL. For years I’ve been a (power) user of this wonderful relational database, knowing almost nothing about how its internals really work. Faced with the need to build a backup and recovery strategy, I’ve recently read up a lot on this topic.

As I don’t find it very odd for a GIS person to be given such an extraordinary task (nobody wants to lose the priceless spatial data, right?), I hope this series might shed light on how to prepare and manage the backup/recovery process for those who are up to such a task. I won’t be discussing backup strategies based on the pg_backup tool, as those offer neither continuous archiving nor point-in-time recovery (PITR) - those two missing features disqualify it as a CleverMaps production backup strategy.

That leaves us with taking periodic base backups combined with continuous WAL archiving, as described below.

Taking base backups

Archived WAL segments are worthless without a base backup they can be run on. It’s crucial to have consistent, periodic base backups to keep your data safe.

pg_basebackup takes a base backup of a PostgreSQL cluster. Nothing fancy. Gzipping the output folder once the backup is done is definitely a good idea.

pg_basebackup \
    --pgdata=/mnt/backup/base/backup_number \
    --format=plain \
    --write-recovery-conf \
    --xlog-method=stream \
    --label=${CR_LABEL} \
    --checkpoint=fast \
    --progress \
    --verbose

In our current environment, we take a base backup of each of our clusters once a week.

WAL archiving configuration

To properly set up WAL archiving, several postgresql.conf settings have to be adjusted:

  • wal_level = replica
  • archive_mode = on
  • archive_command = test ! -f /backup/wal/%f && cp %p /backup/wal/%f

Setting wal_level to replica writes enough information for WAL archiving. Turning on archive_mode will run archive_command each time a WAL segment is completed. archive_command might be anything from a simple cp to rsync or aws s3 cp commands. It is absolutely critical that the command returns a non-zero exit code in case of failure (including when a file with the same name already exists in your backup folder).
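
For reference, a hedged sketch of the same settings applied via ALTER SYSTEM (available since PostgreSQL 9.4) instead of editing postgresql.conf by hand; note that wal_level and archive_mode only take effect after a full server restart:

ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f';
-- a later change to archive_command alone can be picked up with a reload:
SELECT pg_reload_conf();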

That’s it - after restarting the PostgreSQL service, new WAL files should be copied to the /backup/wal directory. The PostgreSQL process user (usually postgres) has to be able to write to that location.

Pitfalls

  • If archive_command fails, the WAL segment remains on your database drive. If it keeps failing long enough, you’ll run out of space and the database will crash.
  • If the backup location fills up, the same thing happens.
  • If you lose or corrupt any of the archived WAL segments, you won’t be able to replay past that point. That’s why you want to be sure that your archive_command actually does what you think it does.

Tips

It might be a real PITA (fiddling around with WAL segments included) to start a crashed database cluster with no space left. Keeping a dummy file in your pg_xlog location might save you a lot of trouble. Create one with the following command. If you run out of space, remove this file and you get 300 MB for free. Don’t forget to recreate it after you start the cluster.

dd if=/dev/zero of=/path_to_your_database_cluster/pg_xlog/DO_NOT_MOVE_THIS_FILE bs=1MB count=300

There’s no need to keep archived WAL segments forever. They’re only needed until you take another base backup. Again, deleting WAL segments manually (or using find ! -newer previous_base_backup.tar.gz) might lead to accidental corruption of your backups. It’s much safer to use pg_archivecleanup pointed at your WAL backup folder, referencing the last successful full backup. Below is the script we use to keep our WAL backup folder at a reasonable size, keeping the last three full backups.

# Find base_backup files not older than 3 weeks
# Sort by date
# Use the oldest one as a reference
OLDEST_BASE_BACKUP=$(basename $(find ${CR_WAL_BACKUP_DIR}/u/p/ -type f -iname "*.backup" -mtime -21 -print0 | \
xargs -0 ls -t | \
tail -n 1))

# Find all subfolders
# Except the u/p backup subfolder
# Execute pg_archivecleanup for each of the subfolders
find $CR_WAL_BACKUP_DIR \
    -type d \
    -not -path "${CR_WAL_BACKUP_DIR}u*" \
    -exec pg_archivecleanup -d {} $OLDEST_BASE_BACKUP \;

Functional backups are a crucial part of a solid backup/recovery system. They’re still just one half of that system, though. If not tested thoroughly, they’re even less than that. More on testing backups and recovering from failures next time.

by Michal Zimmermann at February 12, 2018 03:00 PM

January 17, 2018

PostGIS Development

PostGIS Patch Releases 2.3.6 and 2.4.3

The PostGIS development team is pleased to provide bug fix releases 2.3.6 and 2.4.3 for the 2.3 and 2.4 stable branches.

Key fixes in these releases are Brin upgrade, ST_Transform schema qualification to fix issues with restore, foreign table, and materialized view use, ClusterKMeans and encoded polyline fixes.

View all closed tickets for 2.4.3 and 2.3.6.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

2.3.6

2.4.3

by Regina Obe at January 17, 2018 12:00 AM

December 14, 2017

Paul Ramsey

PostGIS Scaling

Earlier this month I got to speak at the Spatial Data Science Conference hosted by my employer Carto at our funky warehouse offices in Bushwick, Brooklyn. The topic: PostGIS scaling.

PostGIS Scaling

Now.

“Make it go faster” is a hard request to respond to in the generic: what is “it”, what are you doing with “it”, are you sure that your performance isn’t already excellent but you’re just too damned demanding?

So, the talk covers a number of routes to better performance: changing up query patterns, adding special PostgreSQL extensions, leaning on new features of PostgreSQL, and just plain old waiting for PostgreSQL to get better. Which it does, every release.

December 14, 2017 04:00 PM

December 06, 2017

Paul Ramsey

PostGIS "Fund Me" Milestone

On the twitter this morning, there was a good question:

TL;DR: If you find a feature in “Fund Me” and want to fund it, join the postgis-devel mailing list and make yourself known.

If you go to the PostGIS ticket report and scroll through the pages you’ll first see some milestones tied to released versions. These are usually bug reports, both big and small, valid and invalid, and will eventually be closed.

We unfortunately carry a lot of tickets in the current development milestone (2.5 right now) which are, at best, speculative. They should probably be closed (we really will never do them and don’t much care) or moved to the “Fund Me” category (they are valid, but we have no personal/professional impetus to address them).

The “Fund Me” category used to be called “Future”. This was a bad name, as it implied that sometime in the “Future” the ticket might actually be addressed, and all you needed was sufficient patience to wait. The reality is that the way a ticket got into the “Future” category was that it was ignored for long enough that we couldn’t stand to see it in the current milestone anymore.

The PostGIS development community includes all kinds of developers, who make livings in all kinds of ways, and there are folks who will work on tasks for money. The “Fund Me” milestone is a way of pointing up that there are tasks that can be done, if only someone is willing to pay a developer to do them.

That’s the good news!

The bad news is that the tickets all look the same, but they are wildly variable in terms of level of effort and even feasibility.

  • #220 “Implement ST_Numcurves and ST_CurveN” would probably take a couple hours at the outside, and almost any C developer could do it, even one with zero experience in the PostGIS/PostgreSQL/GEOS ecosystem.
  • #2597 “[raster] St_Grayscale” would require some knowledge of the PostGIS raster implementation and image processing routines or at least the GDAL library.
  • #2910 “Implement function to output Mapbox Vector Tiles” actually happened in 2.4, but the (duplicate) ticket remained open, as a reminder that we’re terrible at ticket management.

And then there’s the “big kahunas”, tasks that live quietly in one ticket but actually encompass massive research and development projects spanning months or years.

  • #1629 “Tolerance and Precision strategy” is a super idea, that would allow functions like ST_Intersects() or ST_Equals() to return true if a condition was met within a tolerance. However, it would require substantial enhancement to GEOS, to allow predicate evaluation within a tolerance context, as well as changes to non-GEOS-backed distance functions, and new signatures for every geometry relationship function. Given the depth of the GEOS problem, I’d estimate multiple months of effort, and a potential for zero deliverables at all if things went pear-shaped.
  • #472 “Missing ST_IsValid for Geography Types” is even worse than the tolerance problem, since it should really be implemented as a complete rewrite of GEOS to understand non-linear edge types, either through a cheater’s strategy of doing local projections of geographic edges, or as a full understanding of geographic edges. On the upside, doing that would allow many of the other GEOS functions to support geography, which would vastly expand geography functionality in one stroke. On the downside, it is again in the category of a year-long effort with a potential failure at the end of it if for unexpected reasons it turns out to be impossible within that timeframe.

These kind of core features basically never get funded, because the marginal benefit they provide is generally much lower than the development cost for any one organization. This is a common open source weakness: aggregating funding is something everyone agrees is a great idea in principle but rarely happens in practice.

Occasionally, lightning does strike and a major funded feature happens. PostGIS topology was funded by a handful of European governments, and my work on the geography type was funded entirely by Palantir. However, usually funders show up with a few thousand dollars in hand and are dismayed when they learn of the distance between their funds and their desires.

December 06, 2017 04:00 PM

November 30, 2017

Michal Zimmermann

QGIS Plugin Development: Testing Your Code

Good news, everyone! The AttributeTransfer plugin has been approved for the QGIS Python Plugins Repository. It’s available via the QGIS Manage and Install Plugins menu. Feel free to download!

Nevertheless, this post (the last in the series) covers QGIS plugin testing rather than my personal feelings about the aforementioned success.

Testing means mocking

To test a QGIS plugin you need to simulate the environment it’s meant to run in. And that environment is obviously QGIS itself, yet it’s not feasible to launch QGIS every time you run a test. Luckily, there’s a great QGIS mock that gets you going in no time (it completely slipped my mind where I found that piece of code though).

Testing means you need data

Every test is run again and again, which means it has to reset the data being used to its default state. This might be a PITA if the test changes the data in an unpredictable manner.

Using QGIS memory layers you can prepare fresh data for each of your tests, effectively putting the whole data manipulation process aside.

Writing tests

Each of the AttributeTransfer plugin tests inherits from unittest.TestCase, which comes with several methods you might be familiar with from other languages: setUp() is run before every test method, while tearDown() is run after each of them. Tests are defined as methods whose names start with the word test.

Each test should call some assertWhatever method that checks whether the test passed or failed. Here’s an example of such a test covering non-point layers.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2017-11-18 18:40:50
# @Author  : Michal Zimmermann <zimmicz@gmail.com>

import os
import sip
import sys
import unittest
from qgis.core import QgsMapLayerRegistry, QgsVectorLayer, QgsFeature, QgsGeometry, QgsPoint
from utilities import get_qgis_app

sys.path.append(os.path.dirname(os.path.abspath(__file__)) + "/..")
from attribute_transfer import AttributeTransfer
from create_dummy_data import create_dummy_data_polygon_or_line

sip.setapi('QtCore', 2)
sip.setapi('QString', 2)
sip.setapi('QDate', 2)
sip.setapi('QDateTime', 2)
sip.setapi('QTextStream', 2)
sip.setapi('QTime', 2)
sip.setapi('QUrl', 2)
sip.setapi('QVariant', 2)

QGIS_APP = get_qgis_app()
IFACE = QGIS_APP[2]


class AttributeTransferTestPolygonOrLine(unittest.TestCase):

    def setUp(self):
        self.source_layer = QgsVectorLayer(
            "Polygon?crs=epsg:4326&field=id:integer&field=textAttr:string&field=intAttr:integer&field=decAttr:double&field=dateAttr:date&index=yes", "source layer", "memory")
        self.target_layer = QgsVectorLayer(
            "Linestring?crs=epsg:4326&field=id:integer&index=yes", "target layer", "memory")
        self.widget = AttributeTransfer(IFACE)

        registry = QgsMapLayerRegistry.instance()
        registry.removeAllMapLayers()
        registry.addMapLayers([self.source_layer, self.target_layer])
        create_dummy_data_polygon_or_line(self.source_layer, self.target_layer)
        self.widget.initGui()
        self.widget.vectors = [self.source_layer, self.target_layer]
        self.widget.editable_vectors = [self.source_layer, self.target_layer]
        self.widget.dlg.sourceLayer.addItems(["source layer", "target layer"])

    def test_text_attr(self):
        ATTRIBUTE_NAME = "textAttr"
        ATTRIBUTE_INDEX = 1

        self._test_attr(ATTRIBUTE_NAME, ATTRIBUTE_INDEX)

    def test_int_attr(self):
        ATTRIBUTE_NAME = "intAttr"
        ATTRIBUTE_INDEX = 2

        self._test_attr(ATTRIBUTE_NAME, ATTRIBUTE_INDEX)

    def test_dec_attr(self):
        ATTRIBUTE_NAME = "decAttr"
        ATTRIBUTE_INDEX = 3

        self._test_attr(ATTRIBUTE_NAME, ATTRIBUTE_INDEX)

    def test_date_attr(self):
        ATTRIBUTE_NAME = "dateAttr"
        ATTRIBUTE_INDEX = 4

        self._test_attr(ATTRIBUTE_NAME, ATTRIBUTE_INDEX)

    def test_existing_attr(self):
        ATTRIBUTE_NAME = "id"
        ATTRIBUTE_INDEX = 0

        self.widget.dlg.sourceAttribute.setCurrentIndex(ATTRIBUTE_INDEX)
        self.widget.dlg.targetAttribute.setText(ATTRIBUTE_NAME)

        self.assertEqual(
            self.widget.dlg.sourceAttribute.currentText(), ATTRIBUTE_NAME)
        self.assertFalse(self.widget.transfer())

    def _test_attr(self, attr_name, attr_index):
        self.widget.dlg.sourceAttribute.setCurrentIndex(attr_index)
        self.widget.dlg.targetAttribute.setText(attr_name)

        self.assertEqual(
            self.widget.dlg.sourceAttribute.currentText(), attr_name)

        self.widget.transfer()

        target_fields = [f.name()
                         for f in self.target_layer.dataProvider().fields()]
        self.assertIn(attr_name, target_fields)

        source_features = [f for f in self.source_layer.getFeatures()]
        target_features = [f for f in self.target_layer.getFeatures()]

        for idx, f in enumerate(source_features):
            self.assertEqual(f.attribute(attr_name), target_features[
                             idx].attribute(attr_name))


if __name__ == "__main__":
    unittest.main()

by Michal Zimmermann at November 30, 2017 04:36 PM

November 27, 2017

Paul Ramsey

Nested Loop Join with FDW

Update: See below, but I didn’t test the full pushdown case, and the result is pretty awesome.

I have been wondering for a while if Postgres would correctly plan a spatial join over FDW, in which one table was local and one was remote. The specific use case would be “keeping a large pile of data on one side of the link, and joining to it”.

Because spatial joins always plan out to a “nested loop” execution, where one table is chosen to drive the loop, and the other to be filtered on the rows from the driver, there’s nothing to prevent the kind of remote execution I was looking for.

I set up my favourite spatial join test: BC voting areas against BC electoral districts, with local and remote versions of both tables.

CREATE EXTENSION postgres_fdw;

-- Loopback foreign server connects back to
-- this same database
CREATE SERVER test
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (
    host '127.0.0.1', 
    dbname 'test', 
    extensions 'postgis'
    );

CREATE USER MAPPING FOR pramsey
        SERVER test
        OPTIONS (user 'pramsey', password '');
        
-- Foreign versions of the local tables        
CREATE FOREIGN TABLE ed_2013_fdw
( 
  gid integer, 
  edname text, 
  edabbr text, 
  geom geometry(MultiPolygon,4326)
) SERVER test 
  OPTIONS (
    table_name 'ed_2013', 
    use_remote_estimate 'true');

CREATE FOREIGN TABLE va_2013_fdw
( 
  gid integer OPTIONS (column_name 'gid'), 
  id text OPTIONS (column_name 'id'),
  vaabbr text OPTIONS (column_name 'vaabbr'), 
  edabbr text OPTIONS (column_name 'edabbr'), 
  geom geometry(MultiPolygon,4326) OPTIONS (column_name 'geom')
) SERVER test 
  OPTIONS (
    table_name 'va_2013', 
    use_remote_estimate 'true');

The key option here is use_remote_estimate set to true. This tells postgres_fdw to query the remote server for an estimate of the remote table selectivity, which is then fed into the planner. Without use_remote_estimate, PostgreSQL will generate a terrible plan that pulls the entire contents of the va_2013_fdw table to the local side before joining.
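
postgres_fdw also accepts use_remote_estimate at the server level, which saves repeating the option on every foreign table. A minimal sketch, reusing the test server defined above:

ALTER SERVER test
  OPTIONS (ADD use_remote_estimate 'true');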

With use_remote_estimate in place, the plan is just right:

SELECT count(*), e.edabbr
  FROM ed_2013 e
  JOIN va_2013_fdw v
  ON ST_Intersects(e.geom, v.geom)
  WHERE e.edabbr in ('VTB', 'VTS')
  GROUP BY e.edabbr;
GroupAggregate  (cost=241.14..241.21 rows=2 width=12)
 Output: count(*), e.edabbr
 Group Key: e.edabbr
 ->  Sort  (cost=241.14..241.16 rows=6 width=4)
     Output: e.edabbr
     Sort Key: e.edabbr
     ->  Nested Loop  (cost=100.17..241.06 rows=6 width=4)
         Output: e.edabbr
         ->  Seq Scan on public.ed_2013 e  (cost=0.00..22.06 rows=2 width=158496)
             Output: e.gid, e.edname, e.edabbr, e.geom
             Filter: ((e.edabbr)::text = ANY ('{VTB,VTS}'::text[]))
         ->  Foreign Scan on public.va_2013_fdw v  (cost=100.17..109.49 rows=1 width=4236)
             Output: v.gid, v.id, v.vaabbr, v.edabbr, v.geom
             Remote SQL: SELECT geom FROM public.va_2013 WHERE (($1::public.geometry(MultiPolygon,4326) OPERATOR(public.&&) geom)) AND (public._st_intersects($1::public.geometry(MultiPolygon,4326), geom))

For FDW drivers other than postgres_fdw this means there’s a benefit to going to the trouble to support the FDW estimation callbacks, though the lack of exposed estimation functions in a lot of back-ends may mean the support will be ugly hacks and hard-coded nonsense. PostgreSQL is pretty unique in exposing fine-grained information about table statistics.

Update

One “bad” thing about the join pushdown plan above is that it still pulls all the resultant records back to the source before aggregating them, so there’s a missed opportunity there. However, if both the tables in the join condition are remote, the system will correctly plan the query as a remote join and aggregation.

SELECT count(*), e.edabbr
  FROM ed_2013_fdw e
  JOIN va_2013_fdw v
  ON ST_Intersects(e.geom, v.geom)
  WHERE e.edabbr in ('VTB', 'VTS')
  GROUP BY e.edabbr;
 Foreign Scan  
   (cost=157.20..157.26 rows=1 width=40) 
   (actual time=32.750..32.752 rows=2 loops=1)
   Output: (count(*)), e.edabbr
   Relations: Aggregate on ((public.ed_2013_fdw e) INNER JOIN (public.va_2013_fdw v))
   Remote SQL: SELECT count(*), r1.edabbr FROM (public.ed_2013 r1 INNER JOIN public.va_2013 r2 ON (((r1.geom OPERATOR(public.&&) r2.geom)) AND (public._st_intersects(r1.geom, r2.geom)) AND ((r1.edabbr = ANY ('{VTB,VTS}'::text[]))))) GROUP BY r1.edabbr
 Planning time: 12.752 ms
 Execution time: 33.145 ms

November 27, 2017 04:00 PM

November 23, 2017

Michal Zimmermann

QGIS Plugin Development: AttributeTransfer Plugin

This part finally brings the whole source code of the QGIS AttributeTransfer plugin.

The plugin itself resides in the attribute_transfer.py file. When the run() method is invoked, the Qt form pops up with combos prefilled with the available vector layers that support attribute editing.

Source and target layer combos are mutually exclusive, thus it’s not possible to transfer the attribute within the same layer.

Coding the plugin, I came across minor issues related mainly to the QgsSpatialIndex implementation. In the nearest neighbor analysis part of the series, the QgsSpatialIndex.nearestNeighbor method was mentioned. Yet, as I found out, this method only works with QgsPoint geometries. Those are impossible to get from QgsPolygon or QgsPolyline, though. What can one possibly do, facing such a misfortune? Well… draw a solution matrix.

  • point / point: QgsSpatialIndex.nearestNeighbor
  • point / line: QgsSpatialIndex.nearestNeighbor; layers have to be switched, e.g. source layer = line
  • point / polygon: QgsSpatialIndex.nearestNeighbor; layers have to be switched, e.g. source layer = polygon
  • line / point: QgsSpatialIndex.nearestNeighbor
  • line / line: QgsSpatialIndex.intersects with QgsGeometry.distance
  • line / polygon: QgsSpatialIndex.intersects with QgsGeometry.distance
  • polygon / point: QgsSpatialIndex.nearestNeighbor
  • polygon / line: QgsSpatialIndex.intersects with QgsGeometry.distance
  • polygon / polygon: QgsSpatialIndex.intersects with QgsGeometry.distance

Using the spatial index brings one more issue that I came to realize only after implementing the special comparison workflows for different geometry types. There’s a chance that the feature found via bounding box comparison is not actually the nearest feature. In that case, I chose to find the most distant vertex of such a feature and use it to construct a rectangle around the target feature. If there are any source features in such a rectangle, it’s very likely one of them is the real nearest feature.

Right now, I’m working on finding the nearest feature even if no bounding box intersection is found. Meanwhile, the plugin is being reviewed to be featured in QGIS Plugins repository. Fingers crossed.

I thought this was going to be the last part of the series. But how could one possibly claim the coding project done without writing tests? Stay tuned for the next episode.

by Michal Zimmermann at November 23, 2017 06:00 PM

November 16, 2017

Boston GIS (Regina Obe, Leo Hsu)

Happy PostGIS Day 2017

To commemorate PostGIS Day 2017, I put together a 3D scene listing my favorite functions.

You can see it at PostGIS Day in 3D.

by Regina Obe (nospam@example.com) at November 16, 2017 11:51 PM

Michal Zimmermann

QGIS Plugin Development: Creating GUI with Qt Designer

After fiddling with QGIS Python console and implementing nearest neighbor analysis, I’m going to create a very simple GUI for the plugin at last.

While the QGIS API docs took me a few hours to grasp, the PyQGIS ecosystem knocked my socks off. Here comes the list of tools you should incorporate into your development process as soon as possible.

Plugin Builder

The QGIS Plugin Builder is a plugin created to create… well, other plugins. It gets you going in minutes and lets you code instead of setting up things you don’t want to be setting up. A definite must-have. Note that on Linux you should put the plugin inside the QGIS plugins folder (which defaults to ~/.qgis2/python/plugins).

Remember to run pyrcc4 -o resources.py resources.qrc inside your plugin folder before you add it to QGIS.

Plugin Reloader

The QGIS Plugin Reloader is a plugin (possibly created with QGIS Plugin Builder) that lets you live reload your plugin while you code. No QGIS restarts needed. A definite must-have.

Qt Designer

Qt Designer comes with the qt4-designer package on Ubuntu. It is tailored to design and build GUIs from Qt components that can be used within QGIS. Its drag&drop interface lets you prototype quickly.

Thanks to the Plugin Builder you can load the attribute_transfer_dialog_base.ui file straight into the Qt Designer and adjust it to your needs.

It doesn’t take much, just one QLineEdit and a few QComboBox widgets. Those will be available in the attribute_transfer.py file as properties of the AttributeTransferDialog class.

The widget name can be customized in the right sidebar and I advise you to do so. I chose the following:

Once loaded with Plugins -> Manage and Install Plugins -> AttributeTransfer, the plugin is available right from the toolbar or Vector menu. It is missing the business logic completely, but I have this covered in the previous part.

All that is to be done is to bind those two parts together.

by Michal Zimmermann at November 16, 2017 02:00 PM

November 15, 2017

PostGIS Development

PostGIS Patch Releases

The PostGIS development team has uploaded bug fix releases for the 2.3 and 2.4 stable branches.

2.3.5

2.4.2

by Paul Ramsey at November 15, 2017 12:00 AM

November 09, 2017

Michal Zimmermann

QGIS Plugin Development: Finding Nearest Neighbors

I described basics of vector layers manipulation in the previous part of the series. With my goal in mind (fully functional custom plugin capable of writing an attribute value from a source layer to a target layer based on a feature distance), I’d like to discuss spatial indexing and nearest neighbor analysis.

The picture above illustrates the task that can be solved solely by using QGIS API. Imagine you’re given a source layer with an attribute filled with values. You’re given a target layer as well, sadly though, the values in this layer are missing (not so rare in the GIS world, right?). Yet you know that the missing attribute value of each feature in the target layer can be filled by the value of its nearest neighbor from the source layer. How do you do that?

Generating dummy data

Let’s create two memory data sets with id and value attributes. Both of them will have ten features.

from qgis.core import QgsMapLayerRegistry, QgsVectorLayer, QgsFeature, QgsGeometry, QgsPoint, QgsSpatialIndex
from qgis.utils import iface

source_layer = QgsVectorLayer("point?crs=epsg:4326&field=id:integer&field=value:integer", "Source layer", "memory")
target_layer = QgsVectorLayer("point?crs=epsg:4326&field=id:integer&field=value:integer", "Target layer", "memory")

def create_dummy_data():

    source_layer.startEditing()
    target_layer.startEditing()

    feature = QgsFeature(source_layer.pendingFields())

    for i in range(10):
        feature.setGeometry(QgsGeometry.fromPoint(QgsPoint(i, i)))
        feature.setAttribute("id", i)
        feature.setAttribute("value", i)
        source_layer.addFeature(feature)

    feature = QgsFeature(source_layer.pendingFields())

    for i in range(10):
        feature.setGeometry(QgsGeometry.fromPoint(QgsPoint(i + i, i)))
        feature.setAttribute("id", i)
        target_layer.addFeature(feature)

    source_layer.commitChanges()
    target_layer.commitChanges()

    QgsMapLayerRegistry.instance().addMapLayer(source_layer)
    QgsMapLayerRegistry.instance().addMapLayer(target_layer)

create_dummy_data()

Writing values from the nearest neighbor

The actual nearest neighbor analysis can be done in ten lines of code! First, a qgis.core.QgsSpatialIndex is built from all the source_layer features. Then, you iterate over the target_layer features and for each of them get the single (nearestNeighbor(f.geometry().asPoint(), 1)[0]) nearest neighbor. Finally, you write the nearest neighbor’s attribute value to the target layer and commit the changes. Use the following code together with the code above.

def write_values_from_nn():
    source_layer_index = QgsSpatialIndex(source_layer.getFeatures())
    source_layer_features = {feature.id(): feature for (feature) in source_layer.getFeatures()}
    target_layer_features = target_layer.getFeatures()

    target_layer.startEditing()

    for f in target_layer_features:
        nearest = source_layer_index.nearestNeighbor(f.geometry().asPoint(), 1)[0]
        value = source_layer_features[nearest].attribute("value")
        target_layer.changeAttributeValue(f.id(), 1, value)

    target_layer.commitChanges()

write_values_from_nn()

Missing pieces or what’s next

I’m one step closer to my goal. What’s missing?

  • capabilities checks: does the target layer support edits? Check the layer data provider capabilities to find out.
  • user logging: notices, warnings or errors are completely missing. It would be great to have them shown inside qgis.gui.QgsMessageBar.
  • custom attributes: this version expects both layers to have the same attribute with the same data type.
  • GUI: a very simple PyQt widget will turn this console-based script into a custom plugin. That’s what’s going to be next.

by Michal Zimmermann at November 09, 2017 02:00 PM

November 07, 2017

PostGIS Development

Move PostGIS extension to a different schema

As of PostGIS 2.3, the postgis extension was changed to no longer allow relocation. All function calls within the extension are now schema qualified.

While this change fixed some issues with database restore, it created a new problem: if you installed PostGIS in a schema other than the one you wanted, it is not intuitive how to move it to a different schema. Luckily there is a way to do this.

For this exercise, I will install PostGIS in the default schema and then demonstrate how to move it into another schema location.

You can run these steps using psql or pgAdmin or any other PostgreSQL tool you want.
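
The detailed steps are behind the link below. As a rough sketch of one way to do it (the schema name, database name and version numbers here are only examples, not necessarily the exact commands from the post), the general idea is to temporarily mark the extension relocatable, move it, and then re-run the extension upgrade so the schema-qualified function definitions get rebuilt:

CREATE SCHEMA postgis;
GRANT USAGE ON SCHEMA postgis TO public;

-- the extension is marked non-relocatable, so flip that flag temporarily
UPDATE pg_extension SET extrelocatable = true WHERE extname = 'postgis';

ALTER EXTENSION postgis SET SCHEMA postgis;

-- re-run the same-version upgrade so the function definitions are requalified
ALTER EXTENSION postgis UPDATE TO "2.4.1next";
ALTER EXTENSION postgis UPDATE TO "2.4.1";

-- and remember to put the new schema on the database search_path
ALTER DATABASE mydb SET search_path = public, postgis;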

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 07, 2017 12:00 AM

November 06, 2017

Paul Ramsey

Parallel PostGIS IIA

One of the core complaints in my review of PostgreSQL parallelism was that the cost of functions executed on rows returned by queries does not get included in the evaluation of the cost of a plan.

So for example, the planner appeared to consider these two queries equivalent:

SELECT *
FROM pd;

SELECT ST_Area(geom)
FROM pd;

They both retrieve the same number of rows and both have no filter on them, but the second one includes a fairly expensive function evaluation. No amount of changing the cost of the ST_Area() function would cause a parallel plan to materialize. Only changing the size of the table (making it bigger) would flip the plan into parallel mode.

Fortunately, when I raised this issue on pgsql-hackers, it turned out to have been reported and discussed last month, and Amit Kapila had already prepared a patch, which he kindly rebased for me.

With the patch in place, I now see rational behavior from the planner. Using the default PostGIS function costs, a simple area calculation on my 60K row polling division table is sequential:

EXPLAIN
SELECT ST_Area(geom)
FROM pd;
Seq Scan on pd  
(cost=0.00..14445.17 rows=69534 width=8)

However, if the ST_Area() function is costed a little more realistically, the plan shifts.

ALTER FUNCTION ST_Area(geometry) COST 100;

EXPLAIN
SELECT ST_Area(geom)
FROM pd;
 Gather  
 (cost=1000.00..27361.20 rows=69534 width=8)
   Workers Planned: 3
   ->  Parallel Seq Scan on pd  
       (cost=0.00..19407.80 rows=22430 width=8)

Perfect!

While not every query receives what I consider a “perfect plan”, it now appears that we at least have some reasonable levers available to get better plans via applying some sensible (higher) costs across the PostGIS code base.

November 06, 2017 08:00 PM

November 02, 2017

Michal Zimmermann

QGIS Plugin Development: Using Python Console

As mentioned in previous part of the series, the QGIS Python console is an entry point to GIS workflow automation within QGIS. Remember there’s an iface object representing qgis.gui.QgisInterface instance within the console that gives you access to the whole QGIS GUI. Let’s see what we can do inside the console.

Loading vector layers folder

import glob
from qgis.core import QgsMapLayerRegistry, QgsVectorLayer

def load_folder(folder):
    VALID_EXTENSIONS = ('.geojson', '.gpkg', '.shp')
    files = [f for f in glob.glob("{}/*".format(folder)) if f.endswith(VALID_EXTENSIONS)]

    for f in files:
        layer = QgsVectorLayer(f, f.split('/')[-1], 'ogr')

        if not layer.isValid():
            iface.messageBar().pushCritical("Failed to load:", f)
            continue

        QgsMapLayerRegistry.instance().addMapLayer(layer)

load_folder("path/to/your/vector/files/folder")
  • QgsMapLayerRegistry represents Layers Panel as present in the QGIS GUI
  • iface.messageBar() returns the message bar of the main app and lets you notify the user of what’s going on under the hood
  • QgsVectorLayer represents a vector layer with its underlying vector data sets

Editing active layer attribute table

The following code demonstrates the possibility to edit vector layer attribute table via console.

  • Any attribute to be written has to come in form of a qgis.core.QgsField - this is more or less an encapsulation of an attribute name and its type (PyQt4.QtCore.QVariant to be precise)
  • The underlying data provider has to be capable of attribute addition (caps & QgsVectorDataProvider.AddAttributes)
  • QgsVectorLayer.addAttribute method returns a boolean rather than throwing an exception

from qgis.core import QgsField, QgsVectorDataProvider
from qgis.gui import QgsMessageBar
from PyQt4.QtCore import QVariant


def edit_active_layer(attr_name, attr_type):
    layer = iface.activeLayer()
    caps = layer.dataProvider().capabilities()

    if caps & QgsVectorDataProvider.AddAttributes:
        layer.startEditing()
        if layer.addAttribute(QgsField(attr_name, attr_type)):
            iface.messageBar().pushMessage("Attribute {0} was successfully added to the active layer.".format(attr_name), QgsMessageBar.SUCCESS)
            layer.commitChanges()
        else:
            iface.messageBar().pushMessage("Attribute {0} was not added. Does it already exist?".format(attr_name), QgsMessageBar.CRITICAL)
            layer.rollBack()

edit_active_layer("new_string_attribute", QVariant.String)

The whole series aims to present a plugin capable of writing a new attribute and its value to an existing layer. Thus, this code might come handy in the future.

Creating a new vector layer

It’s possible to create a whole new vector layer with QGIS Python console. I present a very simple create_new_layer function, yet I hope you can imagine the ways it can be tweaked.

from qgis.core import (QGis, QgsCoordinateReferenceSystem, QgsFeature, QgsField,
                       QgsFields, QgsGeometry, QgsMapLayerRegistry, QgsPoint,
                       QgsVectorDataProvider, QgsVectorFileWriter, QgsVectorLayer)
from qgis.gui import QgsMessageBar
from qgis.utils import iface
from PyQt4.QtCore import QVariant

def create_new_layer():
    filename = "/path/to/your/vector/file.gpkg"

    fields = QgsFields()
    fields.append(QgsField("attr1", QVariant.String))
    fields.append(QgsField("attr2", QVariant.Int))

    file = QgsVectorFileWriter(
        filename,
        "UTF8",
        fields,
        QGis.WKBPoint,
        QgsCoordinateReferenceSystem(4326),
        "GPKG"
    )

    layer = QgsVectorLayer(filename, filename.split("/")[-1], "ogr")
    QgsMapLayerRegistry.instance().addMapLayer(layer)

    if not layer.dataProvider().capabilities() & QgsVectorDataProvider.AddAttributes:
        pass

    feature = QgsFeature(layer.pendingFields())
    feature.setGeometry(QgsGeometry().fromPoint(QgsPoint(0, 0)))
    feature.setAttribute("attr1", "attr1")
    feature.setAttribute("attr2", 2)

    layer.startEditing()

    if layer.addFeature(feature, True):
        layer.commitChanges()
    else:
        layer.rollBack()
        iface.messageBar().pushMessage("Feature addition failed.", QgsMessageBar.CRITICAL)

create_new_layer()

Those were just few examples of what can be done with QGIS API and Python console. Next time, I’d like to focus on spatial joins inside QGIS - another step to the final plugin.

by Michal Zimmermann at November 02, 2017 02:00 PM

October 31, 2017

Paul Ramsey

Parallel PostGIS II

A year and a half ago, with the release of PostgreSQL 9.6 on the horizon, I evaluated the parallel query infrastructure and how well PostGIS worked with it.

The results at the time were mixed: parallel query worked, when poked just the right way, with the correct parameters set on the PostGIS functions, and on the PostgreSQL back-end. However, under default settings, parallel queries did not materialize. Not for scans, not for joins, not for aggregates.

With the recent release of PostgreSQL 10, another generation of improvement has been added to parallel query processing, so it’s fair to ask, “how well does PostGIS parallelize now?”

Parallel PostGIS II

TL;DR:

The answer is, better than before:

  • Parallel aggregations now work out-of-the-box and parallelize in reasonable real-world conditions.
  • Parallel scans still require higher function costs to come into action, even in reasonable cases.
  • Parallel joins on spatial conditions still seem to have poor planning, requiring a good deal of manual poking to get parallel plans.

Setup

In order to run these tests yourself, you will need:

  • PostgreSQL 10
  • PostGIS 2.4

You’ll also need a multi-core computer to see actual performance changes. I used a 4-core desktop for my tests, so I could expect 4x improvements at best.

For testing, I used the same ~70K Canadian polling division polygons as last time.

createdb parallel
psql -c 'create extension postgis' parallel
shp2pgsql -s 3347 -I -D -W latin1 PD_A.shp pd | psql parallel

PDs

To support join queries, and on larger tables, I built a set of point tables based on the polling divisions. One point per polygon:

CREATE TABLE pts AS 
SELECT 
  ST_PointOnSurface(geom)::Geometry(point, 3347) AS geom, 
  gid, fed_num 
FROM pd;

CREATE INDEX pts_gix 
  ON pts USING GIST (geom);

Ten points per polygon (for about 700K points):

CREATE TABLE pts_10 AS 
SELECT 
  (ST_Dump(ST_GeneratePoints(geom, 10))).geom::Geometry(point, 3347) AS geom, 
  gid, fed_num 
FROM pd;

CREATE INDEX pts_10_gix 
  ON pts_10 USING GIST (geom);

One hundred points per polygon (for about 7M points):

CREATE TABLE pts_100 AS 
SELECT 
  (ST_Dump(ST_GeneratePoints(geom, 100))).geom::Geometry(point, 3347) AS geom, 
  gid, fed_num 
FROM pd;

CREATE INDEX pts_100_gix 
  ON pts_100 USING GIST (geom);

The configuration parameters for parallel query have changed since the last test, and are (in my opinion) a lot easier to understand.

These parameters are used to fine-tune the planner and execution. Usually you don’t need to change them.

  • parallel_setup_cost sets the planner’s estimate of the cost of launching parallel worker processes. Default 1000.
  • parallel_tuple_cost sets the planner’s estimate of the cost of transferring one tuple from a parallel worker process to another process. Default 0.1.
  • min_parallel_table_scan_size sets the minimum amount of table data that must be scanned in order for a parallel scan to be considered. Default 8MB.
  • min_parallel_index_scan_size sets the minimum amount of index data that must be scanned in order for a parallel scan to be considered. Default 512kB.
  • force_parallel_mode forces the planner to parallelize if wanted. Values: off | on | regress
  • effective_io_concurrency for some platforms and hardware setups allows true concurrent read. Values from 1 (for one spinning disk) to ~100 (for an SSD drive). Default 1.

These parameters control how many parallel processes are launched for a query.

  • max_worker_processes sets the maximum number of background processes that the system can support. Default 8.
  • max_parallel_workers sets the maximum number of workers that the system can support for parallel queries. Default 8.
  • max_parallel_workers_per_gather sets the maximum number of workers that can be started by a single Gather or Gather Merge node. Setting this value to 0 disables parallel query execution. Default 2.

Once you get to the point where #processes == #cores there’s not a lot of advantage in adding more processes. However, each process does exact a cost in terms of memory: a worker process consumes work_mem the same as any other backend, so when planning memory usage take both max_connections and max_worker_processes into consideration.

Before running tests, make sure you have a handle on what your parameters are set to: I frequently found I accidentally tested with max_parallel_workers set to 1.

show max_worker_processes;
show max_parallel_workers;
show max_parallel_workers_per_gather;

Aggregates

First, set max_parallel_workers and max_parallel_workers_per_gather to 8, so that the planner has as much room as it wants to parallelize the workload.

PostGIS only has one true spatial aggregate, the ST_MemUnion function, which is comically inefficient due to lack of input ordering. However, it’s possible to see some aggregate parallelism in action by wrapping a spatial function in a parallelizable aggregate, like Sum():

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 8;

EXPLAIN ANALYZE 
  SELECT Sum(ST_Area(geom)) 
    FROM pd 

Boom! We get a 3-worker parallel plan and execution about 3x faster than the sequential plan.

Finalize Aggregate  
(cost=15417.45..15417.46 rows=1 width=8) 
(actual time=236.925..236.925 rows=1 loops=1)
->  Gather  
(cost=15417.13..15417.44 rows=3 width=8) 
(actual time=236.915..236.921 rows=4 loops=1)
   Workers Planned: 3
   Workers Launched: 3
   ->  Partial Aggregate  
   (cost=14417.13..14417.14 rows=1 width=8) 
   (actual time=231.724..231.724 rows=1 loops=4)
       ->  Parallel Seq Scan on pd  
       (cost=0.00..13800.30 rows=22430 width=2308) 
       (actual time=0.049..30.407 rows=17384 loops=4)
Planning time: 0.111 ms
Execution time: 238.785 ms

Just to confirm, re-run it with parallelism turned off:

SET max_parallel_workers_per_gather = 0;

EXPLAIN ANALYZE 
  SELECT Sum(ST_Area(geom)) 
    FROM pd 

Back to one thread and taking about 3 times as long, as expected.

Scans

The simplest spatial parallel scan adds a spatial function to the filter clause.

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 8;

EXPLAIN ANALYZE 
  SELECT *
    FROM pd 
    WHERE ST_Area(geom) > 10000;    

Unfortunately, that does not give us a parallel plan.

The ST_Area() function is defined with a COST of 10. If we move it up, to 100, we can get a parallel plan.

SET max_parallel_workers_per_gather = 8;

ALTER FUNCTION ST_Area(geometry) COST 100;
EXPLAIN ANALYZE 
  SELECT *
    FROM pd 
    WHERE ST_Area(geom) > 10000;    

Boom! A parallel scan, with five workers planned and launched:

Gather  
(cost=1000.00..20544.33 rows=23178 width=2554) 
(actual time=0.253..293.016 rows=62158 loops=1)
Workers Planned: 5
Workers Launched: 5
->  Parallel Seq Scan on pd  
    (cost=0.00..17226.53 rows=4636 width=2554) 
    (actual time=0.091..210.581 rows=10360 loops=6)
     Filter: (st_area(geom) > '10000'::double precision)
     Rows Removed by Filter: 1229
Planning time: 0.128 ms
Execution time: 302.600 ms

It appears our spatial function costs may still be too low in general to get good planning. And as we will see with joins, it’s possible the planner is still discounting function costs too much in deciding whether to go parallel or not.

Joins

Starting with a simple join of all the polygons to the 100 points-per-polygon table, we get:

SET max_parallel_workers_per_gather = 4;

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_100 pts
  ON ST_Intersects(pd.geom, pts.geom);

PDs & Points

In order to give the PostgreSQL planner a fair chance, I started with the largest table, thinking that the planner would recognize that a “70K rows against 7M rows” join could use some parallel love, but no dice:

Nested Loop  
(cost=0.41..13555950.61 rows=1718613817 width=2594)
 ->  Seq Scan on pd  
     (cost=0.00..14271.34 rows=69534 width=2554)
 ->  Index Scan using pts_gix on pts  
     (cost=0.41..192.43 rows=232 width=40)
       Index Cond: (pd.geom && geom)
       Filter: _st_intersects(pd.geom, geom)

There are a number of knobs we can press on. There are two global parameters:

  • parallel_setup_cost defaults to 1000, but no amount of lowering the value, even to zero, causes a parallel plan.
  • parallel_tuple_cost defaults to 0.1. Reducing it by a factor of 100, to 0.001 causes the plan to flip over into a parallel plan.

SET parallel_tuple_cost = 0.001;

As with all parallel plans, it is a nested loop, but that’s fine since all PostGIS joins are nested loops.

Gather  (cost=0.28..4315272.73 rows=1718613817 width=2594)
Workers Planned: 4
->  Nested Loop  
    (cost=0.28..2596658.92 rows=286435636 width=2594)
     ->  Parallel Seq Scan on pts_100 pts  
         (cost=0.00..69534.00 rows=1158900 width=40)
     ->  Index Scan using pd_geom_idx on pd  
         (cost=0.28..2.16 rows=2 width=2554)
           Index Cond: (geom && pts.geom)
           Filter: _st_intersects(geom, pts.geom)

Running the parallel plan to completion on the 700K point table takes 18s with four workers and 53s with a sequential plan. We are not getting an optimal speed up from parallel processing anymore: four workers are completing in 1/3 of the time instead of 1/4.

If we set parallel_setup_cost and parallel_tuple_cost back to their defaults, we can also change the plan by fiddling with the function costs.

First, note that our query can be re-written like this, to expose the components of the spatial join:

SET parallel_tuple_cost=0.1;
SET parallel_setup_cost=1000;
SET max_parallel_workers_per_gather = 4;

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_100 pts
   ON pd.geom && pts.geom 
   AND _ST_Intersects(pd.geom, pts.geom);

The default cost of _ST_Intersects() is 100. If we adjust it up by a factor of 100, we can get a parallel plan.

ALTER FUNCTION _ST_Intersects(geometry, geometry) COST 10000;

However, what if our query only used a single spatial operator in the join filter? Can we still force a parallel plan on this query?

SET parallel_tuple_cost=0.1;
SET parallel_setup_cost=1000;
SET max_parallel_workers_per_gather = 4;

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_100 pts
  ON pd.geom && pts.geom;

The && operator could activate one of two functions:

  • geometry_overlaps(geom, geom) is bound to the && operator
  • geometry_gist_consistent_2d(internal, geometry, int4) is bound to the 2d spatial index

However, no amount of increasing their COST causes the operator-only query plan to flip into a parallel mode:

ALTER FUNCTION  geometry_overlaps(geometry, geometry) COST 1000000000000;
ALTER FUNCTION  geometry_gist_consistent_2d(internal, geometry, int4) COST 10000000000000;

So for operator-only queries, it seems the only way to force a spatial join is to muck with the parallel_tuple_cost parameter.

More Joins

Can we parallelize a common GIS use case: the spatial overlay?

Shifted PDs

Here is a table that simply shifts the polling divisions up and over, so that they can be overlaid to create a new set of smaller polygons.

CREATE TABLE pd_translate AS 
SELECT ST_Translate(geom, 100, 100) AS geom, 
    fed_num, pd_num 
  FROM pd;
  
CREATE INDEX pd_translate_gix 
  ON pd_translate USING GIST (geom);
CREATE INDEX pd_fed_num_x 
  ON pd (fed_num);
CREATE INDEX pdt_fed_num_x 
  ON pd_translate (fed_num);

The overlay operation finds, for each geometry on one side, all the overlapping geometries, and then calculates the shape of those overlaps (the “intersection” of the pair). Calculating intersections is expensive, so it’s something we want to happen in parallel, even more than we want the join to happen in parallel.

This query calculates the overlay of all polling divisions (and their translations) in British Columbia (fed_num > 59000):

EXPLAIN 
SELECT ST_Intersection(pd.geom, pdt.geom) AS geom
  FROM pd
  JOIN pd_translate pdt
  ON ST_Intersects(pd.geom, pdt.geom)
  WHERE pd.fed_num > 59000
  AND pdt.fed_num > 59000;

Unfortunately, the default remains a non-parallel plan. The parallel_tuple_cost has to be adjusted down to 0.01 or the cost of _ST_Intersects() adjusted upwards to get a parallel plan.
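
For reference, either of the two adjustments mentioned above is enough on this query:

-- either make worker tuples cheaper...
SET parallel_tuple_cost = 0.01;

-- ...or make the join filter function look more expensive
ALTER FUNCTION _ST_Intersects(geometry, geometry) COST 10000;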

Conclusions

  • The costs assigned to PostGIS functions still do not provide the planner a good enough guide to determine when to invoke parallelism. Costs assigned currently vary widely without any coherent reasons.
  • The planner behaviour on spatial joins remains hard to predict: is the deciding factor the join operator cost, the number of rows of resultants, or something else altogether? Counter-intuitively, it was easier to get join behaviour from a relatively small 6K x 6K polygon/polygon overlay join than it was for the 70K x 7M point/polygon overlay.

October 31, 2017 08:00 PM

October 28, 2017

Anita Graser (Underdark)

Movement data in GIS #10: open tools for AIS tracks from MarineCadastre.gov

MarineCadastre.gov is a great source for AIS data along the US coast. Their data formats and tools though are less open. Luckily, GDAL – and therefore QGIS – can read ESRI File Geodatabases (.gdb).

MarineCadastre.gov also offers a Track Builder script that creates lines out of the broadcast points. (It can also join additional information from the vessel and voyage layers.) We could reproduce the line creation step using tools such as Processing’s Point to path, but this post will show how to create PostGIS trajectories instead.

First, we have to import the points into PostGIS using either DB Manager or Processing’s Import into PostGIS tool:

Then we can create the trajectories. I’ve opted to create a materialized view:

The first part of the query creates a temporary table called ptm (short for PointM). This step adds time stamp information to each point. The second part of the query then aggregates these PointMs into trajectories of type LineStringM.

CREATE MATERIALIZED VIEW ais.trajectory AS
 WITH ptm AS (
   SELECT b.mmsi,
     st_makepointm(
       st_x(b.geom), 
       st_y(b.geom), 
       date_part('epoch', b.basedatetime)
     ) AS pt,
     b.basedatetime t
   FROM ais.broadcast b
   ORDER BY mmsi, basedatetime
 )
 SELECT row_number() OVER () AS id,
   st_makeline(ptm.pt) AS st_makeline,
   ptm.mmsi,
   min(ptm.t) AS min_t,
   max(ptm.t) AS max_t
 FROM ptm
 GROUP BY ptm.mmsi
WITH DATA;

The trajectory start and end times (min_t and max_t) are optional but they can help speed up future queries.
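
For example, a time window filter over the materialized view can use min_t and max_t directly (the index and the timestamps below are illustrative, not part of the original workflow):

CREATE INDEX ON ais.trajectory (min_t, max_t);

-- trajectories that were (at least partly) active during a given hour
SELECT id, mmsi, st_makeline
  FROM ais.trajectory
 WHERE min_t <= '2017-01-31 13:00'::timestamp
   AND max_t >= '2017-01-31 12:00'::timestamp;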

One of the advantages of creating trajectory lines is that they render many times faster than the original points.

Of course, we end up with some artifacts at the border of the dataset extent. (Files are split by UTM zone.) Trajectories connect the last known position before the vessel left the observed area with the position of reentry. This results, for example, in vertical lines which you can see in the bottom left corner of the above screenshot.

With the trajectories ready, we can go ahead and start exploring the dataset. For example, we can visualize trajectory speed and/or create animations:

Purple trajectory segments are slow while green segments are faster

We can also perform trajectory analysis, such as trajectory generalization:

This is a first proof of concept. It would be great to have a script that automatically fetches the datasets for a specified time frame and list of UTM zones and loads them into PostGIS for further processing. In addition, it would be great to also make use of the information in the vessel and voyage tables, thus splitting up trajectories into individual voyages.


This post is part of a series. Read more about movement data in GIS.

by underdark at October 28, 2017 01:22 PM

October 26, 2017

Michal Zimmermann

QGIS Plugin Development: Getting Started

QGIS 2.1x is a brilliant tool for Python-based automation in the form of custom scripts or even plugins. The first steps towards writing custom code might be a bit difficult, as you need to grasp a rather complex Python API. The QGIS Plugin Development series (see the list of other parts at the end of this article) targets pitfalls and traps I’ve met while learning to use it myself.

The outcome of the series is going to be a fully functional custom plugin capable of writing attribute values to a target layer from the nearest neighbour features of a source layer, based on their spatial proximity.

In this part, I’ll mention the basics a.k.a. what is good to know before you start.

Documentation

Different QGIS versions come with different Python APIs. The documentation is to be found at https://qgis.org, the latest being version 2.18. Note that if you go directly to http://qgis.org/api/, you’ll see the current master docs.

Alternatively, you can apt install qgis-api-doc on your Ubuntu-based system and run python -m SimpleHTTPServer [port] inside /usr/share/qgis/doc/api. You’ll find the documentation at http://localhost:8000 (if you don’t provide port number) and it will be available even when you’re offline.

Basic API objects structure

Before launching QGIS, take a look at what’s available inside API:

  • qgis.core package brings all the basic objects like QgsMapLayer, QgsDataSourceURI, QgsFeature etc
  • qgis.gui package brings GUI elements that can be used within QGIS like QgsMessageBar or QgsInterface (very important API element, exposed to all custom plugins)
  • qgis.analysis, qgis.networkanalysis, qgis.server, and qgis.testing packages that won’t be covered in the series
  • qgis.utils module that comes with iface exposed (very handy within QGIS Python console)

QGIS Python Console

Using Python console is the easiest way to automate your QGIS workflow. It can be accessed via pressing Ctrl + Alt + P or navigating to Plugins -> Python Console. Note the above mentioned iface from qgis.utils module is exposed by default within the console, letting you interact with QGIS GUI. Try out the following examples.

iface.mapCanvas().scale() # returns the current map scale
iface.mapCanvas().zoomScale(100) # zoom to scale of 1:100
iface.activeLayer().name() # get the active layer name
iface.activeLayer().startEditing() # toggle editing

That was a very brief introduction to QGIS API, the next part will walk you through the console more thoroughly.

by Michal Zimmermann at October 26, 2017 01:00 PM

October 23, 2017

Michal Zimmermann

Serving Mapbox Vector Tiles with PostGIS, Nginx and Python Backend

Since version 2.4.0, PostGIS can serve MVT data directly. MVT-returning queries put a heavy workload on the database though. On top of that, each of these queries has to be run again every time a client requests the data. This leaves us with plenty of room to optimize the process.
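
As a minimal illustration of the direct approach, a single tile can be produced with one query. This sketch reuses the muni table and the TileBBox(z, x, y, srid) helper that appear later in this post, with placeholder tile coordinates:

SELECT ST_AsMVT(q, 'municipality', 4096, 'geom')
FROM (
    SELECT
        muni.muni_id,
        ST_AsMVTGeom(
            muni.geom,
            TileBBox(10, 553, 347, 3857),
            4096,
            0,
            false
        ) AS geom
    FROM muni
) q;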

During the last week, while working on the Czech legislative election data visualization, I’ve struggled with the server becoming unresponsive far too often due to the issues mentioned above.

According to the schema, the first client to come to the server:

  • goes through filesystem unstopped, because there are no cached files yet,
  • continues to the Flask backend and asks for a file at {z}/{x}/{y},
  • Flask backend asks the database to return the MVT for the given tile,
  • Flask backend writes the response to the filesystem and sends it to the client.

Other clients get tiles directly from the filesystem, leaving the database at ease.

Nginx

Nginx is fairly simple to set up, once you know what you’re doing. The /volby-2017/municipality/ location serves static MVT from the given alias directory. If not found, the request is passed to @postgis location, that asks the Flask backend for the response.

server {
    server_name election;
    location /volby-2017/municipality {
            alias /opt/volby-cz-2017/server/cache/;
            try_files $uri @postgis;
    }

    location @postgis {
            include uwsgi_params;
            uwsgi_pass 127.0.0.1:5000;
    }
}

Flask backend

Generating static MVT in advance

If you’re going to serve static tiles that don’t change often, it might be a good idea to use PostGIS to create files in advance and serve them with Nginx.

CREATE TABLE tiles (
    x integer,
    y integer,
    z integer,
    west numeric,
    south numeric,
    east numeric,
    north numeric,
    geom geometry(POLYGON, 3857)
);

Using mercantile, you can create the tiles table holding the bounding boxes of the tiles you need. PostGIS then inserts the actual MVT into the mvt table.

CREATE TEMPORARY TABLE tmp_tiles AS
    SELECT
        muni.muni_id,
        prc.data,
        ST_AsMVTGeom(
            muni.geom,
            TileBBox(z, x , y, 3857),
            4096,
            0,
            false
        ) geom,
        x,
        y,
        z
    FROM muni
    JOIN (
        SELECT
            x,
            y,
            z,
            geom
        FROM tiles
    ) bbox ON (ST_Intersects(muni.geom, bbox.geom))
    JOIN party_results_cur prc ON (muni.muni_id = prc.muni_id);
CREATE TABLE mvt (mvt bytea, x integer, y integer, z integer);
DO
$$
DECLARE r record;
BEGIN
FOR r in SELECT DISTINCT x, y, z FROM tmp_tiles LOOP
    INSERT INTO mvt
    SELECT ST_AsMVT(q, 'municipality', 4096, 'geom'), r.x, r.y, r.z
    FROM (
        SELECT
            muni_id,
            data,
            geom
        FROM tmp_tiles
        WHERE (x, y, z) = (r)
    ) q;
    RAISE INFO '%', r;
END LOOP;
END;
$$;

Once filled, the table rows can be written to the filesystem with a simple piece of Python code.

#!/usr/bin/env python

import logging
import os
import time
from sqlalchemy import create_engine, text

CACHE_PATH="cache/"

e = create_engine('postgresql:///')
conn = e.connect()
sql=text("SELECT mvt, x, y, z FROM mvt")
query = conn.execute(sql)
data = query.cursor.fetchall()

for d in data:
    cachefile = "{}/{}/{}/{}".format(CACHE_PATH, d[3], d[1], d[2])
    print(cachefile)

    if not os.path.exists("{}/{}/{}".format(CACHE_PATH, d[3], d[1])):
        os.makedirs("{}/{}/{}".format(CACHE_PATH, d[3], d[1]))

    with open(cachefile, "wb") as f:
        f.write(bytes(d[0]))

Conclusion

PostGIS is a brilliant tool for generating Mapbox vector tiles. Combined with Python powered static file generator and Nginx, it seems to become the only tool needed to get you going.

by Michal Zimmermann at October 23, 2017 02:00 PM

October 18, 2017

PostGIS Development

PostGIS Patch Releases

The PostGIS development team has uploaded bug fix releases for the 2.2, 2.3 and 2.4 stable branches.

2.2.6

2.3.4

2.4.1

by Paul Ramsey at October 18, 2017 12:00 AM

October 15, 2017

Boston GIS (Regina Obe, Leo Hsu)

Using pg_upgrade to upgrade PostGIS without installing an older version of PostGIS

The PostGIS project releases a new minor version of PostGIS every one or two years. Each minor version of PostGIS has a different libname suffix. In PostGIS 2.1 you'll find files in your PostgreSQL lib folder called postgis-2.1.*, rtpostgis-2.1.*, postgis-topology-2.1.*, address-standardizer-2.1.* etc., and in a PostGIS 2.2 install you'll find similar files but with 2.2 in the name. I believe PostGIS and pgRouting are the only extensions that stamp the lib with a version number. Most other extensions are just called extension.so, e.g. hstore is always called hstore.dll/hstore.so even if the PostgreSQL version changed from 9.6 to 10. On the bright side this allows people to have two versions of PostGIS installed in a PostgreSQL cluster, though a database can use at most one version. So you can have an experimental database running a very new or unreleased version of PostGIS and a production database running a more battle-tested version.

On the sad side this causes a lot of PostGIS users frustration when trying to use pg_upgrade to upgrade from an older version of PostGIS/PostgreSQL to a newer version of PostGIS/PostgreSQL, as their pg_upgrade often bails with a message in the loaded_libraries.txt log file something to the effect of:

could not load library "$libdir/postgis-2.2": ERROR:  could not access file "$libdir/postgis-2.2": No such file or directory
could not load library "$libdir/postgis-2.3": ERROR:  could not access file "$libdir/postgis-2.3": No such file or directory

This is also a hassle because we generally don't support newer versions of PostgreSQL on older PostGIS installs: PostgreSQL major version changes tend to break our code, and backporting those changes is both time-consuming and dangerous. For example, the DatumGetJsonb change and this PostgreSQL 11 crasher we haven't isolated the cause of yet. There are several changes like this that have already made the recently released PostGIS 2.4.0 incompatible with PostgreSQL 11 head development.

Continue reading "Using pg_upgrade to upgrade PostGIS without installing an older version of PostGIS"

by Regina Obe (nospam@example.com) at October 15, 2017 05:11 AM

October 04, 2017

Oslandia

Refresh your maps FROM postgreSQL !

Continuing our love story with PostgreSQL and QGIS, we submitted a grant application to QGIS.org in early spring 2017.

The idea was to take advantage of some very advanced PostgreSQL features that had probably never been used in a desktop GIS client before.

Today, let’s see what we can do with the PostgreSQL NOTIFY feature!

Ever dreamt of being able to trigger things from outside QGIS? Ever wanted a magic stick to trigger actions in some clients from a database action?


NOTIFY is a PostgreSQL-specific feature that lets you generate notifications on a channel and optionally send a message, called a payload in PostgreSQL's dialect.

In short, from within a transaction, we can raise a signal in a PostgreSQL queue and listen to it from a client.
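
The SQL side is tiny. With the channel QGIS listens on (named “qgis”, as explained below), it boils down to:

-- client side (this is what QGIS does for you): subscribe to the channel
LISTEN qgis;

-- server side, e.g. from a trigger or any transaction:
-- raise the event, optionally with a payload usable as a message filter
NOTIFY qgis, 'refresh';
-- or the function form, handy inside PL/pgSQL
SELECT pg_notify('qgis', 'refresh');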

In action

We hardcoded a channel named “qgis” and made QGIS able to LISTEN to NOTIFY events and transform them into Qt’s signals. The signals are connected to layer refresh when you switch on this rendering option.

Optionally, adding a message filter will only redraw the layer for some specific events.

This mechanism is really versatile and we can now imagine many possibilities: triggering a notification message to your users from the database, interacting with plugins, or even coding a chat between users of the same database (ok, this is stupid)!

More than just refresh layers?

The first implementation we chose was to trigger a layer refresh because we believe this is a good way for users to discover this new feature.

But QGIS rocks hey, doing crazy things for limited uses is not the way.

Thanks to feedback on the Pull Request, we added the possibility to trigger layer actions on notification.

That should be pretty versatile since you can do almost anything with those actions now.

Caveats

QGIS will open a permanent connection to PostgreSQL to watch the notify signals. Please keep that in mind if you have several clients and a limited number of connections.

Notify signals are only transmitted when the transaction is committed, i.e. when the COMMIT is issued. So be aware that this might not help you if users are inside an edit session.

QGIS has a lot of different caches, for the attribute table for instance. We currently have no specific way to invalidate a specific cache and then order QGIS to refresh its attribute table.

There is no way in PostgreSQL to list all channels of a database session, which is why we couldn’t propose a combobox list of available signals in the renderer option dialog. Anyway, to avoid too many issues, we decided to hardcode the channel name in QGIS as “qgis”. If this is somehow not enough for your needs, please contact us!

Conclusion

The github pull request is here : https://github.com/qgis/QGIS/pull/5179

We are convinced this would be really useful for real time application, let us know if that makes some bells ring on your side!

More to come soon, stay tuned!

by Régis Haubourg at October 04, 2017 01:09 PM

Oslandia

Undo Redo stack is back: QGIS Transaction groups

Let’s keep on looking at what we did in the QGIS.org grant application of early spring 2017.

At Oslandia, we make heavy use of the transaction groups option of QGIS. It was an experimental feature in QGIS 2.X that opens only one common Postgres transaction for all layers sharing the same connection string.

Transaction group option

When activated, that option will bring many killer features:

  • Users can switch all the layers in edit mode at once. A real time saver.
  • Every INSERT, UPDATE or DELETE is forwarded immediately to the database, which is nice for:
    • Evaluating on the fly if database constraints are satisfied or not. Without transaction groups this is only done when saving the edits and this can be frustrating to create dozens of features and having one of them rejected because of a foreign key constraint…
    • Having triggers evaluated on the fly.  QGIS is so powerful when dealing with “thick database” concepts that I would never go back to a pure GIS ignoring how powerful databases can be !
    • Playing with QgsTransaction.ExecuteSQL lets you trigger stored procedures in PostgreSQL through a beautiful API-style interface. Something like
SELECT invert_pipe_direction('pipe1');
  • However, the implementation was flagged “experimental” because some caveats were still causing issues:
    • Committing on the fly was breaking the logic of the undo/redo stack. So there was no way to do a local edit. No Ctrl+Z! The only way to roll back was to stop the edit session and lose all the work. Ouch… Bad!
    • Playing with ExecuteSQL did not dirty the QGIS edit buffer. So, if during an edit session no edit action was made using QGIS native tools, there was no clean way to activate the “save edits” icon.
    • When having some failures in the triggers, QGIS may loose DB connection and thus create a silent ROLLBACK.

We decided to try to restore the undo/redo stack by saving the edit history in PostgreSQL SAVEPOINTs and to see if we could restore the original feature in QGIS.
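
At the SQL level the mechanism looks roughly like this; a sketch with a hypothetical pipe table, not the actual statements QGIS issues:

BEGIN;

UPDATE pipe SET diameter = 200 WHERE id = 1;
SAVEPOINT qgis_edit_1;

UPDATE pipe SET diameter = 300 WHERE id = 1;
SAVEPOINT qgis_edit_2;

-- Ctrl+Z: discard everything done after qgis_edit_1, keep the rest
ROLLBACK TO SAVEPOINT qgis_edit_1;

COMMIT;  -- the row ends up with diameter = 200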

And.. it worked!

Let’s see that in action:

 

Potential caveats?

At the start, we worried about how heavy all those savepoints would be for the database. It turns out that for really massive geometries and heavy editing sessions this could start to weigh a bit, but honestly it stays far away from PostgreSQL's limits.

 

Up to now, we haven’t really found any issue with that.

And we didn’t address the silent ROLLBACK that sometimes occurs, because it is generated by buggy stored procedures, which are easy to fix.

Some new ideas came to us when working in that area. For instance, if a transaction locks a feature, QGIS just… waits for the lock to be released. I think we should find a way to advertise those locks to the users; that would be great! If you’re interested in making that happen, please contact us.

 

More to come soon, stay tuned!

by Régis Haubourg at October 04, 2017 01:08 PM

September 30, 2017

PostGIS Development

PostGIS 2.4.0 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0. Best served with PostgreSQL 10rc1 and pgRouting 2.5.0. See the full list of changes in the news file.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS binaries:

ALTER EXTENSION postgis UPDATE;

— if you have additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
ALTER EXTENSION pgrouting UPDATE;

In order to have Mapbox Vector Tiles support enabled, you’ll need to compile with protobuf support and have pkg-config available to verify the correct minimum version of protobuf-c; see protobuf for details. The ST_FrechetDistance function will not be enabled if PostGIS is compiled against a GEOS version lower than 3.7.0. GEOS 3.7.0 is not released yet but is expected sometime next month.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 30, 2017 12:00 AM

September 27, 2017

PostGIS Development

PostGIS 2.4.0rc3 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0rc3. Best served with PostgreSQL 10rc1 and pgRouting 2.5.0. See the full list of changes in the news file.

This will be the final rc before we have our 2.4.0 release.

We encourage everyone to test and in particular package maintainers to ensure no surprises at final release time.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS binaries:

ALTER EXTENSION postgis UPDATE;

— if you have additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
ALTER EXTENSION pgrouting UPDATE;
Continue Reading by clicking title hyperlink ..

by Regina Obe at September 27, 2017 12:00 AM

September 24, 2017

Boston GIS (Regina Obe, Leo Hsu)

PostGIS db help and manual in different languages

One of the things I'm most happy about with the upcoming PostGIS 2.4.0, due out in about a week, is that it is the first version to have almost complete translations into different languages. The Japanese, German, Portuguese, and Korean translations are more than 80% complete, with Japanese at 96%. You can download the HTML manuals from the PostGIS docs page. There are PDFs for the non-Asian languages; for Japanese and Korean I'm still having issues generating the PDFs.

When you install PostGIS with CREATE EXTENSION postgis;, it also installs the accompanying help extracted from the manual in English format.

The comment generator we have in place is just as happy working with translated docs as it is with the English ones, so the in-db help documents can also be generated in other languages. The help files are located here: Japanese, German, Portuguese, and Korean

Continue reading "PostGIS db help and manual in different languages"

by Regina Obe (nospam@example.com) at September 24, 2017 09:54 AM

PostGIS Development

PostGIS 2.4.0rc2 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0rc2. Best served with PostgreSQL 10rc1 and pgRouting 2.5.0. See the full list of changes in the news file.

We encourage everyone to test and in particular package maintainers to ensure no surprises at final release time.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS binaries:

ALTER EXTENSION postgis UPDATE;

— if you have additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
ALTER EXTENSION pgrouting UPDATE;
Continue Reading by clicking title hyperlink ..

by Regina Obe at September 24, 2017 12:00 AM

September 22, 2017

Michal Zimmermann

PostgreSQL Dollar Quoting inside Bash Heredoc

Yesterday I spent two very unpleasant hours debugging the weirdest SQL error I’ve seen in my life, running the below query (simplified for this post).

psql -qAt --no-psqlrc <<BACKUP
DO
$$
DECLARE r record;
BEGIN
  RAISE INFO '%', 'info';
END
$$;
BACKUP

Running this in your terminal will result in a nasty syntax error.

ERROR:  syntax error at or near "1111"
LINE 2: 1111
        ^
ERROR:  syntax error at or near "RAISE"
LINE 2:   RAISE INFO '%', 'info';
          ^
ERROR:  syntax error at or near "1111"
LINE 2: 1111;

You stare at the screen for a while, absolutely sure that the number 1111 is nowhere close to the data you work with. You try again. Another error. You save the code into a file and try again. It works. What the heck? You try again using the bash heredoc. Another failure.

The minute you realize $$ is being substituted with the ID of the current process, you feel like the dumbest person on Earth. Yet the happiest one at the same time.

The solution is trivial.

psql -qAt --no-psqlrc <<BACKUP
DO
\$\$
DECLARE r record;
BEGIN
  RAISE INFO '%', 'info';
END
\$\$;
BACKUP

by Michal Zimmermann at September 22, 2017 06:30 PM

September 19, 2017

PostGIS Development

PostGIS 2.1.9 Released

The PostGIS development team has uploaded the final release of the PostGIS 2.1 branch. The 2.1 branch is now end-of-life. As befits a patch release, the focus is on bugs and breakages.

Continue Reading by clicking title hyperlink ..

by Paul Ramsey at September 19, 2017 12:00 AM

September 14, 2017

Paul Ramsey

PostGIS Operators in 2.4

TL;DR: If you are using ORDER BY or GROUP BY on geometry columns in your application and you have expectations regarding the order or groups you obtain, beware that PostGIS 2.4 changes the behaviour of ORDER BY and GROUP BY. Review your applications accordingly.


The first operators we learn about in elementary school are =, > and <, but they are the operators that are the hardest to define in the spatial realm.

PostGIS Operators in 2.4

When is = equal?

For example, take “simple” equality. Are geometry A and B equal? Should be easy, right?

But are we talking about:

  1. A and B have exactly the same vertices in the same order and with the same starting points? (see ST_OrderingEquals)
  2. A and B have exactly the same vertices in any order?
  3. A and B have the same vertices in any order but different starting points?
  4. A has some extra vertices that B does not, but they cover exactly the same area in space? (see ST_Equals)
  5. A and B have the same bounds?

Confusingly, for the first 16 years of its existence, PostGIS used definition 5, “A and B have the same bounds” when evaluating the = operator for two geometries.

However, for PostGIS 2.4, PostGIS will use definition 1: “A and B have exactly the same vertices in the same order and with the same starting points”.

Why does this matter? Because the behaviour of the SQL GROUP BY operation is bound to the “=” operator: when you group by a column, an output row is generated for all groups where every item is “=” to every other item. With the new definition in 2.4, the semantics of GROUP BY should be more “intuitive” when used against geometries.
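A quick way to see the difference is to group a handful of literal geometries; the toy query below is mine, not from the original post:

-- under the 2.4 definition, only geometries with identical vertices collapse
-- into one group; geometries that merely share a bounding box stay separate
SELECT ST_AsText(geom) AS geom, count(*)
FROM (VALUES
    ('POINT(0 0)'::geometry),
    ('POINT(0 0)'::geometry),
    ('LINESTRING(0 0, 1 1)'::geometry),
    ('LINESTRING(1 1, 0 0)'::geometry)   -- same bounds, different vertex order
) AS t(geom)
GROUP BY geom;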

What are > and <?

Greater and less than are also tricky in the spatial domain:

  • Is POINT(0 0) less than POINT(1 1)? Sort of looks like it, but…
  • Is POINT(0 0) less than POINT(-1 1) or POINT(1 -1)? Hm, that makes the first one look less obvious…

Greater and less than are concepts that make sense for 1-dimensional values, but not for higher dimensions. The “>” and “<” operators have accordingly been an ugly hack for a long time: they compared the minima of the bounding boxes of the two geometries.

  • If they were sortable using the X coordinate of the minima, that was the sorting returned.
  • If they were equal in X, then the Y coordinate of the minima was used.
  • Etc.

This process returned a sorted order, but not a very satisfying one: a “good” sorting would tend to place objects that are near to each other in space, near to each other in the sorted set.
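For intuition, the old comparison behaved roughly like sorting on the bounding-box minima, which you could approximate by hand against the places table used below (a rough approximation only, not the actual operator implementation):

-- roughly equivalent to the pre-2.4 "<" ordering: compare the X of the
-- bounding-box minimum first, then the Y
SELECT *
FROM places
ORDER BY ST_XMin(geom), ST_YMin(geom);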

Here’s what the old sorting looked like, applied to world populated places:

Geometry sorting in PostGIS 2.3

The new sorting system for PostGIS 2.4 calculates a very simple “morton key” using the center of the bounds of a feature, keeping things simple for performance reasons. The result is a sorted order that tends to keep spatially nearby features closer together in the sorted set.

Geometry sorting in PostGIS 2.4

Just as the “=” operator is tied to the SQL GROUP BY operation, the “>” and “<” operators are tied to the SQL ORDER BY operation. The pictures above were created by generating a line string from the populated places points as follows:

CREATE TABLE places_line AS 
SELECT ST_MakeLine(geom ORDER BY geom) AS geom
FROM places;

September 14, 2017 04:00 PM

September 13, 2017

PostGIS Development

PostGIS 2.4.0rc1 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0rc1. Best served with PostgreSQL 10beta4 (or rc1, which is coming soon) and pgRouting 2.5.0rc; the pgRouting 2.5.0 release is imminent. See the full list of changes in the news file.

We encourage everyone to test, and in particular package maintainers, to ensure no surprises at final release time.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS 2.4.0rc1 binaries:

ALTER EXTENSION postgis UPDATE;

-- if you have any of the additional PostGIS-related extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
--pgRouting 2.5.0 is imminent
ALTER EXTENSION pgrouting UPDATE;

In order to have Mapbox Vector Tiles support enabled, you’ll need to compile with protobuf support and have pkg-config available to verify the correct minimum version of protobuf-c; see protobuf for details. The ST_FrechetDistance function will not be enabled if PostGIS is compiled against a GEOS version lower than 3.7.0. GEOS 3.7.0 will probably not be released before PostGIS 2.4.0 is out.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 13, 2017 12:00 AM

September 11, 2017

Anita Graser (Underdark)

Drive-time Isochrones from a single Shapefile using QGIS, PostGIS, and Pgrouting

This is a guest post by Chris Kohler.

Introduction:

This guide provides step-by-step instructions to produce drive-time isochrones using a single vector shapefile. The method described here involves building a routing network from a single vector shapefile of your roads data within a VirtualBox virtual machine. The network is built by creating start and end nodes (source and target nodes) on each road segment. We will use PostgreSQL, with the PostGIS and pgRouting extensions, as our database. Please consider this type of routing to be approximate in accuracy, as the routing algorithms are based on the node locations and not on specific addresses. I am currently working on an improved workflow that uses site address points as nodes to optimize results. One of the many benefits of this workflow is that it costs nothing to produce (outside of collecting your roads data). I will provide instructions for creating and using your virtual machine within this guide.

Steps:–Getting Virtual Box(begin)–

Intro 1. Download/Install Oracle VM VirtualBox (https://www.virtualbox.org/wiki/Downloads)

Intro 2. Start the download/install of OSGeo-Live 11 (https://live.osgeo.org/en/overview/overview.html).

Pictures used in this workflow show version 10.5, though version 11 can be used in the same way. Make sure you download the version osgeo-live-11-amd64.iso. If you have trouble finding it, here is the direct link to the 10.5 download (https://sourceforge.net/projects/osgeo-live/files/10.5/osgeo-live-10.5-amd64.iso/download)
Intro 3. Ready for virtual machine creation: We will use the downloaded OSGeo-Live 11 suite within a virtual machine we create to begin our workflow. The steps to create your virtual machine are listed below. Also, here are steps from an earlier workshop with additional details on setting up your virtual machine with OSGeo-Live (http://workshop.pgrouting.org/2.2.10/en/chapters/installation.html).

1. Create Virtual Machine: In this step we begin creating the virtual machine that will house our database.

Open Oracle VM VirtualBox Manager and select “New” located at the top left of the window.

VBstep1

Then fill out name, operating system, memory, etc. to create your first VM.

vbstep1.2

2. Add IDE Controller: The purpose of this step is to create a placeholder for the OSGeo-Live 11 suite to be mounted. In the VirtualBox main window, right-click your newly created VM and open the settings.

vbstep2

In the settings window, on the left side select the storage tab.

Find the “adds new storage controller” button located at the bottom of the tab. Be careful of other buttons labeled “adds new storage attachment”! Select the “adds new storage controller” button and a drop-down menu will appear. From the top of the drop-down select “Add IDE Controller”.

vbstep2.2

vbstep2.3

You will see a new item appear in the center of the window under the “Storage Tree”.

3. Add Optical Drive: The OSGeo-Live 11 suite will be loaded into the virtual machine via an optical drive. Highlight the new IDE controller you created and select “add optical drive”.

vbstep3

A new window will pop up; select “Choose Disk”.

vbstep3.2

Locate your downloaded file “osgeo-live-11-amd64.iso” and click Open. A new object should appear in the middle window under your new controller, displaying “osgeo-live-11.0-amd64.iso”.

vbstep3.3

Finally, your virtual machine is ready for use.
Start your new virtual machine, then wait and follow the onscreen prompts to begin using it.

vbstep3.4

–Getting Virtual Box(end)—

4. Creating the routing database, and both extensions (postgis, pgrouting): The database we create and both extensions we add will provide the functions capable of producing isochrones.

To begin, open the command line tool (Ctrl+Alt+T), then log in to PostgreSQL by typing “psql -U user” into the command line and pressing Enter. For the purpose of clear instruction I will refer to the database in this guide as “routing”; feel free to choose your own database name. Input the command seen in the figure below to create the database:

CREATE DATABASE routing;

You can use “\c routing” to connect to the database after creation.

step4

The next step after creating and connecting to your new database is to create both extensions. I find it easier to kill two birds with one stone by typing “psql -U user routing”; this logs you into PostgreSQL and connects you to your routing database at the same time.

When you’re logged into your database, run the commands below to add both extensions:

CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;

step4.2

step4.3

5. Load shapefile to database: In this next step, the shapefile of your roads data must be copied into your virtual machine and then loaded into your database.

My method is to email myself the roads shapefile, then download and copy it from within my virtual machine’s web browser. From the desktop of your virtual machine, open the folder named “Databases” and select the application “shp2pgsql”.

step5

Follow the UI of shp2pgsql to connect to the routing database you created in Step 4.

step5.2

Next, select “Add File”, find the roads shapefile you want to use for your isochrones (in this guide we will call it “roads_table”), and click Open.

step5.3

Finally, click “Import” to place your shapefile into your routing database.
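If you prefer the command line over the GUI, the shp2pgsql loader can pipe its output straight into psql. A sketch, where the SRID 4326 and the file name roads.shp are placeholders for your own data (the user and database names are the ones used in this guide):

# load the shapefile into the routing database (-s sets the SRID, -I builds a spatial index)
shp2pgsql -s 4326 -I roads.shp roads_table | psql -U user -d routing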

6. Add source & target columns: The purpose of this step is to create columns which will serve as placeholders for the node data we create later.

There are multiple ways to add these columns to the roads_table. The most important parts of this step are which table you choose to edit, the names of the columns you create, and the data type of the columns. Take time to ensure the source & target columns are of integer type. Below are the commands to run in your command line for this step.

ALTER TABLE roads_table ADD COLUMN "source" integer;
ALTER TABLE roads_table ADD COLUMN "target" integer;

step6

step6.2

7. Create topology: Next, we will use a function to attach a node to each end of every road segment in the roads_table. The function in this step will create these nodes. These newly-created nodes will be stored in the source and target columns we created earlier in step 6.

As well as creating nodes, this function will also create a new table which will contain all of them. The suffix “_vertices_pgr” is added to the name of your table to create this new table. For example, using our guide’s table name, “roads_table”, the nodes table will be named roads_table_vertices_pgr. However, we will not use the table created by this function (roads_table_vertices_pgr). Below is the function, and a second simplified version, to be used in the command line for populating our source and target columns, in other words creating our network topology. Note the input format: the “geom” column in my case was called “the_geom” within my shapefile:

SELECT pgr_createTopology('roads_table', 0.001, 'geom', 'id',
    'source', 'target', rows_where := 'true', clean := false);

step7

Here is a direct link for more information on this function: http://docs.pgrouting.org/2.3/en/src/topology/doc/pgr_createTopology.html#pgr-create-topology

Below is an example(simplified) function for my roads shapefile:

SELECT pgr_createTopology('roads_table', 0.001, 'the_geom', 'id')

8. Create a second nodes table: A second nodes table will be created for later use. This second table will contain the node data generated by the pgr_createTopology function and will be named “node”. Below is the command for this process. Fill in your own source and target fields following the pattern seen in the command below, as well as your own table name.

To begin, find the folder on the virtual machine’s desktop named “Databases” and open the program “pgAdmin III” located within.

step8

Connect to your routing database in the pgAdmin window. Then highlight it and find the “SQL” tool at the top of the pgAdmin window; its icon resembles a small magnifying glass.

step8.2

Input the command below into the SQL window of pgAdmin. Feel free to refer to this link for further information: https://anitagraser.com/2011/02/07/a-beginners-guide-to-pgrouting/

CREATE TABLE node AS
   SELECT row_number() OVER (ORDER BY foo.p)::integer AS id,
          foo.p AS the_geom
   FROM (     
      SELECT DISTINCT roads_table.source AS p FROM roads_table
      UNION
      SELECT DISTINCT roads_table.target AS p FROM roads_table
   ) foo
   GROUP BY foo.p;

step8.3

9. Create a routable network: After creating the second node table in step 8, we will combine this node table (node) with our roads table (roads_table) into one new table (network) that will serve as the routing network. This table will be called “network” and will be capable of processing routing queries. Please input this command and execute it in the SQL pgAdmin tool as we did in step 8. Here is a reference for more information: https://anitagraser.com/2011/02/07/a-beginners-guide-to-pgrouting/

step8.2

 

CREATE TABLE network AS
   SELECT a.*, b.id as start_id, c.id as end_id
   FROM roads_table AS a
      JOIN node AS b ON a.source = b.the_geom
      JOIN node AS c ON a.target = c.the_geom;

step9.2

10. Create a “noded” view of the network: This new view will be used in later steps to calculate the visual isochrones. Input this command and execute it in the SQL pgAdmin tool.

CREATE OR REPLACE VIEW network_nodes AS 
SELECT foo.id,
 st_centroid(st_collect(foo.pt)) AS geom 
FROM ( 
  SELECT network.source AS id,
         st_geometryn (st_multi(network.geom),1) AS pt 
  FROM network
  UNION 
  SELECT network.target AS id, 
         st_boundary(st_multi(network.geom)) AS pt 
  FROM network) foo 
GROUP BY foo.id;

step10

11. Add column for speed: This step may or may not apply, depending on whether your original shapefile contained a field of values for road speeds.

In reality a network of roads will typically contain multiple speed limits. The shapefile you choose may have a speed field; without one, the filtering in the following steps will not allow varying speeds to be applied to the respective parts of your routing network.

If speed values exist in your shapefile, we will use them to populate a new field, “traveltime”, that will hold the travel time for every road segment in our network, computed from its geometry. First, we need to create a column to store these values. The name of our column will be “traveltime”, with type double precision. Input this command and execute it in the command line tool as seen below.

ALTER TABLE network ADD COLUMN traveltime double precision;

step11

Next, we will populate the new column “traveltime” by calculating travel times with a simple equation. This equation takes each road segment’s length (shape_leng) and divides it by the rate of travel (in either mph or kph). The sample command below uses mph as the rate, while the length (shape_leng) units for my roads_table are in feet. Whether you are using mph or kph, input this command and execute it in the SQL pgAdmin tool. The variable “X” is explained further below.

UPDATE network SET traveltime = shape_leng / X*60

step11.2

How to find X, here is an example: take 30 mph as the rate. To find X, we convert 30 miles to feet; we know 5280 ft = 1 mile, so we multiply 30 by 5280, which gives us 158400 ft. Our rate has been converted from 30 miles per hour to 158400 feet per hour. For a rate of 30 mph, our equation for the field “traveltime” becomes “shape_leng / 158400*60”. To restrict this calculation’s output, we add a condition such as “where speed = 30;”, which applies the calculated value only to features with a value of 30 in our “speed” field. Note: your “speed” field may be named differently.

UPDATE network SET traveltime = shape_leng / 158400*60 where speed = 30;

Repeat this step for each speed value in your shapefile. Examples:

UPDATE network SET traveltime = shape_leng / X*60 where speed = 45;
UPDATE network SET traveltime = shape_leng / X*60 where speed = 55;

The back end is done. Great Job!

Our next step will be visualizing our data in QGIS. Open and connect QGIS to your routing database by right-clicking “PostGIS” in the Browser Panel within the QGIS main window. Confirm the checkbox “Also list tables with no geometry” is checked to allow you to see the interior of your database more clearly. Fill out the name of your routing database and click “OK”.

If done correctly, from QGIS you will have access to the tables and views created in your routing database. Feel free to visualize your network by dragging and dropping the network table into your QGIS Layers Panel. From here you can use the identify tool to select each road segment and see the source and target nodes contained within that road segment. The node you choose will be used in the next step to create the drive-time views.

12. Create views: In this step, we create views from a function designed to determine the travel time cost. Transforming these views with tools will visualize the travel time costs as isochrones.

The command below will be how you start querying your database to create drive-time isochrones. Begin in QGIS by dragging your network table into the Layers Panel. The network will display as vector lines. Simply select the road segment closest to the point of interest you would like to build your isochrone around, identify it using the identify tool, and note the source and target field values.

step12

step12.2

Place the source or target field value in the command below where you see VALUE, in all caps.

This will serve as an isochrone catchment query for this workflow. Feel free to reuse this command repeatedly to create new isochrones by substituting the source value. Input this command and execute it in the SQL pgAdmin tool.

*AT THE BOTTOM OF THIS WORKFLOW I PROVIDED AN EXAMPLE USING SOURCE VALUE “2022”

CREATE OR REPLACE VIEW "view_name" AS 
SELECT di.seq, 
       di.id1, 
       di.id2, 
       di.cost, 
       pt.id, 
       pt.geom 
FROM pgr_drivingdistance('SELECT
     gid::integer AS id, 
     Source::integer AS source, 
     Target::integer AS target,                                    
     Traveltime::double precision AS cost 
       FROM network'::text, VALUE::bigint, 
    100000::double precision, false, false)
    di(seq, id1, id2, cost)
JOIN network_nodes pt ON di.id1 = pt.id;

step12.3

13. Visualize Isochrone: Applying tools to the view will allow us to adjust the visual aspect to a more suitable isochrone overlay.

After creating your view, a new item appears in your routing database, named with the “view_name” you chose. Drag-and-drop this item into your QGIS Layers Panel. You will see lots of small dots which represent the nodes.

In the figure below, I named my view “take1”.

step13

Each node you see contains a drive-time value, “cost”, which represents the time needed to travel from the node you chose in step 12’s function.
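Before interpolating, you can sanity-check these values by querying the view directly. A small sketch using the “take1” view name from this example; the 10-unit threshold is arbitrary (the cost is in minutes if traveltime was built as shown in step 11):

-- peek at the closest nodes and their drive-time costs
SELECT id, cost
FROM take1
WHERE cost < 10
ORDER BY cost
LIMIT 20;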

step13.2

Start by installing the QGIS plugin “Interpolation” by opening the Plugin Manager in the QGIS interface.

step13.3

Next, at the top of the QGIS window select “Raster”; a drop-down will appear, then select “Interpolation”.

step13.4

 

A new window pops up and asks you for input.

step13.5

Select your view as the vector layer, select “cost” as your interpolation attribute, and then click “Add”.

step13.6

A new vector layer will show up at the bottom of the window; make sure the type is Points. For the output, on the other half of the window, keep the interpolation method as “TIN”, and edit the output file location and name. Check the box “Add result to project”.

Note: decreasing the cellsize of X and Y will increase the resolution but at the cost of performance.

Click “OK” on the bottom right of the window.

step13.7

A black and white raster will appear in QGIS, and a new item will be created in the Layers Panel.

step13.8

Take some time to visualize the raster by coloring and adjusting values in symbology until you are comfortable with the look.

step13.9

step13.10

14. Create contours of our isochrone: Contours can be calculated from the isochrone as well.

Near the top of the QGIS window, open the “Raster” menu drop-down and select Extraction → Contour.

step14

Fill out the appropriate interval between contour lines but leave the check box “Attribute name” unchecked. Click “OK”.

step14.2

step14.3

15. Zip and Share: Find where you saved your TIN and contours, compress them into a zip folder by highlighting them both and right-clicking to select “compress”. Email the compressed folder to yourself to export it out of your virtual machine.

Example Isochrone catchment for this workflow:

CREATE OR REPLACE VIEW "2022" AS 
SELECT di.seq, Di.id1, Di.id2, Di.cost,                           
       Pt.id, Pt.geom 
FROM pgr_drivingdistance('SELECT gid::integer AS id,                                       
     Source::integer AS source, Target::integer AS target, 
     Traveltime::double precision AS cost FROM network'::text, 
     2022::bigint, 100000::double precision, false, false) 
   di(seq, id1, id2, cost) 
JOIN network_nodes pt 
ON di.id1 = pt.id;

References: Oracle VM VirtualBox, OSGeo-Live 11 amd64 iso, Workshop FOSS4G Bonn (http://workshop.pgrouting.org/2.2.10/en/index.html)

by ckohler4692 at September 11, 2017 08:01 PM

September 02, 2017

PostGIS Development

PostGIS 2.4.0beta1 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0beta1. PostGIS 2.4.0 will be the first version to support PostgreSQL 10. Best served with PostgreSQL 10beta4 and pgRouting 2.5.0beta. See the full list of changes in the news file.

We encourage everyone to test, and in particular package maintainers, to ensure no surprises at final release time.

IMPORTANT NOTES

From this point forward until release of PostGIS 2.4.0, no new functions will be added.
Only bug fixes will be addressed in future 2.4.0 betas and rcs.

In order to have Mapbox Vector Tiles support enabled, you’ll need to compile with protobuf support and have pkg-config available to verify the correct minimum version of protobuf-c; see protobuf for details. The ST_FrechetDistance function will not be enabled if PostGIS is compiled against a GEOS version lower than 3.7.0. GEOS 3.7.0 will be released around the same time as PostGIS 2.4.0 and will have a beta release in a week.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 02, 2017 12:00 AM

August 20, 2017

Boston GIS (Regina Obe, Leo Hsu)

geography type is not limited to earth

It's a little-known fact that the PostGIS geography type has, since PostGIS 2.2 (well, it was introduced in 2.1 I think, but had a bug in it until 2.1.4 or so), supported any spheroidal spatial reference system that measures in degrees. When an SRID is not specified, it defaults to 4326, Earth WGS 84 long lat.
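A minimal sketch of what that allows, assuming you register your own degree-based spheroid in spatial_ref_sys first; the SRID 949900, the Mars spheroid parameters, and the table name below are purely illustrative:

-- register a hypothetical Mars long/lat spatial reference system (values illustrative)
INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, proj4text, srtext)
VALUES (949900, 'IAU', 49900,
        '+proj=longlat +a=3396190 +b=3376200 +no_defs',
        'GEOGCS["Mars 2000",DATUM["D_Mars_2000",SPHEROID["Mars_2000_IAU_IAG",3396190,169.894447]],PRIMEM["Reference_Meridian",0],UNIT["Degree",0.0174532925199433]]');

-- a geography column can then use that SRID instead of the default 4326
CREATE TABLE mars_sites (
    name text,
    geog geography(Point, 949900)
);

Distance and area measurements on such a column would then be computed on that spheroid rather than on WGS 84.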

Continue reading "geography type is not limited to earth"

by Regina Obe (nospam@example.com) at August 20, 2017 02:50 PM

August 09, 2017

Michal Zimmermann

PostgreSQL Development History Revealed with PostgreSQL

I spend a lot of time reading the PostgreSQL docs. It occurred to me just a few weeks ago that those versioned manuals are a great opportunity to get an insight into PostgreSQL development history. Using PostgreSQL, of course.
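The queries below assume a small table of scraped doc statistics; its exact layout is not shown here, but something like the following, inferred from the queries, would work:

-- assumed structure of the scraped data (inferred, not from the original post)
CREATE SCHEMA IF NOT EXISTS postgresql_development;
CREATE TABLE postgresql_development.data (
    version      text,     -- PostgreSQL version, e.g. '9.6'
    func         text,     -- SQL command name, e.g. 'CREATE TABLE'
    letter_count integer   -- length of that command's reference page
);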

TOP 5 functions with the most verbose docs in each version

SELECT
    version,
    string_agg(func, ' | ' ORDER BY letter_count DESC)
FROM (
    SELECT
        version,
        func,
        letter_count,
        row_number() OVER (PARTITION BY version ORDER BY letter_count DESC)
    FROM postgresql_development.data
) a
WHERE row_number <= 10
GROUP BY version
ORDER BY version DESC

Seems like a huge comeback for CREATE TABLE.

VERSION 1st 2nd 3rd 4th 5th
10.0 CREATE TABLE ALTER TABLE REVOKE GRANT SELECT
9.6 REVOKE ALTER TABLE GRANT CREATE TABLE SELECT
9.5 REVOKE ALTER TABLE GRANT CREATE TABLE SELECT
9.4 REVOKE GRANT ALTER TABLE CREATE TABLE SELECT
9.3 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
9.2 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
9.1 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
9.0 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
8.4 REVOKE GRANT CREATE TABLE ALTER TABLE SELECT
8.3 REVOKE CREATE TABLE GRANT ALTER TABLE COMMENT
8.2 REVOKE CREATE TABLE GRANT ALTER TABLE SELECT
8.1 REVOKE CREATE TABLE GRANT ALTER TABLE SELECT
8 CREATE TABLE REVOKE GRANT SELECT ALTER TABLE
7.4 CREATE TABLE REVOKE ALTER TABLE GRANT SELECT
7.3 CREATE TABLE SELECT ALTER TABLE REVOKE GRANT
7.2 CREATE TABLE SELECT INTO SELECT ALTER TABLE CREATE TYPE
7.1 CREATE TABLE SELECT INTO SELECT CREATE TYPE ALTER TABLE
7.0 SELECT SELECT INTO CREATE TYPE CREATE TABLE COMMENT

Number of functions available in each version

SELECT
    version,
    count(func),
    sum(letter_count)
FROM postgresql_development.data
GROUP BY version ORDER BY version;

The most verbose docs in each version

SELECT DISTINCT ON (version)
    version,
    func,
    letter_count
FROM postgresql_development.data
ORDER BY version, letter_count DESC;

Poor REVOKE, the defeated champion.

VERSION FUNCTION LETTER COUNT
10 CREATE TABLE 3142
9.6 REVOKE 2856
9.5 REVOKE 2856
9.4 REVOKE 2856
9.3 REVOKE 2856
9.2 REVOKE 2856
9.1 REVOKE 2508
9 REVOKE 2502
8.4 REVOKE 2105
8.3 REVOKE 1485
8.2 REVOKE 1527
8.1 REVOKE 1312
8 CREATE TABLE 1251
7.4 CREATE TABLE 1075
7.3 CREATE TABLE 929
7.2 CREATE TABLE 929
7.1 CREATE TABLE 871
7 SELECT 450

CREATE TABLE docs evolution

SELECT
    version,
    letter_count
FROM postgresql_development.data
WHERE func = 'CREATE TABLE'
ORDER BY func, version;

Something’s going on in the upcoming 10.0 version.

All the data was obtained with the following Python script and processed inside the PostgreSQL database. The plots were done with Bokeh, though I probably wouldn’t use it again; its docs site is absurdly sluggish and the info is just all over the place.

by Michal Zimmermann at August 09, 2017 05:00 PM