### Planet PostGIS

Welcome to Planet PostGIS

November 23, 2017

Michal Zimmermann

QGIS Plugin Development: AttributeTransfer Plugin

This part finally brings the whole source code of the QGIS AttributeTransfer plugin.

The plugin itself resides in the attribute_transfer.py file. When the run() method is invoked, the Qt form pops up with combos prefilled with the available vector layers that support attribute editing.

Source and target layer combos are mutually exclusive, thus it’s not possible to transfer the attribute within the same layer.
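
For illustration, a hedged sketch of how such combos could be prefilled with editable vector layers (QGIS 2.x API; the helper name is mine, not taken from the plugin source):

from qgis.core import QgsMapLayerRegistry, QgsMapLayer, QgsVectorDataProvider

def editable_vector_layers():
    # keep only vector layers whose provider allows changing attribute values
    layers = QgsMapLayerRegistry.instance().mapLayers().values()
    return [layer for layer in layers
            if layer.type() == QgsMapLayer.VectorLayer
            and layer.dataProvider().capabilities() & QgsVectorDataProvider.ChangeAttributeValues]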

Coding the plugin, I came across minor issues related mainly to the QgsSpatialIndex implementation. In the nearest neighbor analysis part of the series, the QgsSpatialIndex.nearestNeighbor method was mentioned. Yet, as I found out, this method only works with QgsPoint geometries. Those are impossible to get from QgsPolygon or QgsPolyline, though. What can one possibly do, facing such a misfortune? Well… draw a solution matrix.

|         | point | line | polygon |
| ------- | ----- | ---- | ------- |
| point   | QgsSpatialIndex.nearestNeighbor | QgsSpatialIndex.nearestNeighbor; layers have to be switched, e.g. source layer = line | QgsSpatialIndex.nearestNeighbor; layers have to be switched, e.g. source layer = polygon |
| line    | QgsSpatialIndex.nearestNeighbor | QgsSpatialIndex.intersects with QgsGeometry.distance | QgsSpatialIndex.intersects with QgsGeometry.distance |
| polygon | QgsSpatialIndex.nearestNeighbor | QgsSpatialIndex.intersects with QgsGeometry.distance | QgsSpatialIndex.intersects with QgsGeometry.distance |

Using the spatial index brings one more issue I’ve come to realize just after implementing the special comparison workflows for different geometry types. There’s a chance that the feature found via bounding box comparison is not actually the nearest one. In that case, I chose to find the most distant vertex of such a feature and use it to construct a rectangle around the target feature. If there are any source features in such a rectangle, it’s very likely one of them is the real nearest feature.
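
To make the matrix above concrete, here is a minimal sketch of the intersects-plus-distance branch (QGIS 2.x API), assuming a QgsSpatialIndex built on the source layer and a dict of source features keyed by feature id; the function and variable names are illustrative:

def nearest_source_feature(target_feature, source_index, source_features):
    # candidates whose bounding boxes intersect the target's bounding box;
    # the rectangle could be grown as described above before this call
    target_geom = target_feature.geometry()
    candidate_ids = source_index.intersects(target_geom.boundingBox())

    nearest_id, nearest_distance = None, float("inf")

    for candidate_id in candidate_ids:
        # exact geometry-to-geometry distance decides the real nearest feature
        distance = target_geom.distance(source_features[candidate_id].geometry())
        if distance < nearest_distance:
            nearest_id, nearest_distance = candidate_id, distance

    return nearest_id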

Right now, I’m working on finding the nearest feature even if no bounding box intersection is found. Meanwhile, the plugin is being reviewed to be featured in the QGIS Plugins repository. Fingers crossed.

I thought this was going to be the last part of the series. But how could one possibly claim the coding project done without writing tests? Stay tuned for the next episode.

by Michal Zimmermann at November 23, 2017 06:00 PM

November 16, 2017

Boston GIS (Regina Obe, Leo Hsu)

Happy PostGIS Day 2017

To commemorate PostGIS Day 2017, I put together a 3D scene listing my favorite functions.

You can see it at PostGIS Day in 3D.

by Regina Obe at November 16, 2017 11:51 PM

Michal Zimmermann

QGIS Plugin Development: Creating GUI with Qt Designer

After fiddling with QGIS Python console and implementing nearest neighbor analysis, I’m going to create a very simple GUI for the plugin at last.

While the QGIS API docs took me a few hours to grasp, the PyQGIS ecosystem knocked my socks off. Here comes the list of tools you should incorporate into your development process as soon as possible.

Plugin Builder

The QGIS Plugin Builder is a plugin created to create… well, other plugins. It gets you going in minutes and lets you code instead of setting up things you don’t want to be setting up. A definite must-have. Note that on Linux you should put the plugin inside the QGIS plugins folder (defaults to ~/.qgis2/python/plugins).

Remember to run pyrcc4 -o resources.py resources.qrc inside your plugin folder before you add it to QGIS.

Plugin Reloader

The QGIS Plugin Reloader is a plugin (possibly created with QGIS Plugin Builder) that lets you live reload your plugin while you code. No QGIS restarts needed. A definite must-have.

Qt Designer

Qt Designer comes with the qt4-designer package in Ubuntu. It is tailored to design and build GUIs from Qt components that can be used within QGIS. Its drag-and-drop interface lets you prototype quickly.

Thanks to the Plugin Builder you can load the attribute_transfer_dialog_base.ui file straight into the Qt Designer and adjust it to your needs.

It doesn’t take much, just one QLineEdit and a few QComboBox widgets. Those will be available in the attribute_transfer.py file as properties of the AttributeTransferDialog class.
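
For context, a hedged sketch of how a Plugin Builder-style dialog class typically loads that .ui file (PyQt4 / QGIS 2.x); this is a guess at the common pattern, not the plugin’s actual source:

import os

from PyQt4 import QtGui, uic

# the .ui file generated by Plugin Builder and edited in Qt Designer
FORM_CLASS, _ = uic.loadUiType(os.path.join(
    os.path.dirname(__file__), "attribute_transfer_dialog_base.ui"))

class AttributeTransferDialog(QtGui.QDialog, FORM_CLASS):
    def __init__(self, parent=None):
        super(AttributeTransferDialog, self).__init__(parent)
        # widgets named in Qt Designer become attributes of the dialog here
        self.setupUi(self)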

The widget name can be customized in the right sidebar and I advise you to do so. I chose the following:

Once loaded with Plugins -> Manage and Install Plugins -> AttributeTransfer, the plugin is available right from the toolbar or Vector menu. It is missing the business logic completely, but I have this covered in the previous part.

All that is to be done is to bind those two parts together.

by Michal Zimmermann at November 16, 2017 02:00 PM

November 15, 2017

PostGIS Development

PostGIS Patch Releases

The PostGIS development team has uploaded bug fix releases for the 2.3 and 2.4 stable branches.

2.3.5

2.4.2

by Paul Ramsey at November 15, 2017 12:00 AM

November 09, 2017

Michal Zimmermann

QGIS Plugin Development: Finding Nearest Neighbors

I described basics of vector layers manipulation in the previous part of the series. With my goal in mind (fully functional custom plugin capable of writing an attribute value from a source layer to a target layer based on a feature distance), I’d like to discuss spatial indexing and nearest neighbor analysis.

The picture above illustrates the task that can be solved solely by using QGIS API. Imagine you’re given a source layer with an attribute filled with values. You’re given a target layer as well, sadly though, the values in this layer are missing (not so rare in the GIS world, right?). Yet you know that the missing attribute value of each feature in the target layer can be filled by the value of its nearest neighbor from the source layer. How do you do that?

Generating dummy data

Let’s create two memory data sets with id and value attributes. Both of them will have ten features.

from qgis.core import QgsMapLayerRegistry, QgsVectorLayer, QgsFeature, QgsGeometry, QgsPoint, QgsSpatialIndex
from qgis.utils import iface

source_layer = QgsVectorLayer("point?crs=epsg:4326&field=id:integer&field=value:integer", "Source layer", "memory")
target_layer = QgsVectorLayer("point?crs=epsg:4326&field=id:integer&field=value:integer", "Target layer", "memory")

def create_dummy_data():

    source_layer.startEditing()
    target_layer.startEditing()

    feature = QgsFeature(source_layer.pendingFields())

    for i in range(10):
        feature.setGeometry(QgsGeometry.fromPoint(QgsPoint(i, i)))
        feature.setAttribute("id", i)
        feature.setAttribute("value", i)
        source_layer.addFeature(feature)

    feature = QgsFeature(target_layer.pendingFields())

    for i in range(10):
        feature.setGeometry(QgsGeometry.fromPoint(QgsPoint(i + i, i)))
        feature.setAttribute("id", i)
        target_layer.addFeature(feature)

    source_layer.commitChanges()
    target_layer.commitChanges()

    QgsMapLayerRegistry.instance().addMapLayer(source_layer)
    QgsMapLayerRegistry.instance().addMapLayer(target_layer)

create_dummy_data()

Writing values from the nearest neighbor

The actual nearest neighbor analysis can be done in ten lines of code! First, a qgis.core.QgsSpatialIndex is built from all the source_layer features. Then, you iterate over the target_layer features and for each of them get just one (nearestNeighbor(f.geometry().asPoint(), 1)[0]) nearest neighbor. Finally, you write the nearest neighbor’s attribute value to the target layer and commit the changes. Use the following code together with the code above.

def write_values_from_nn():
    source_layer_index = QgsSpatialIndex(source_layer.getFeatures())
    source_layer_features = {feature.id(): feature for (feature) in source_layer.getFeatures()}
    target_layer_features = target_layer.getFeatures()

    target_layer.startEditing()

    for f in target_layer_features:
        nearest = source_layer_index.nearestNeighbor(f.geometry().asPoint(), 1)[0]
        value = source_layer_features[nearest].attribute("value")
        target_layer.changeAttributeValue(f.id(), 1, value)

    target_layer.commitChanges()

write_values_from_nn()

Missing pieces or what’s next

I’m one step closer to my goal. What’s missing?

  • capabilities checks: does the target layer support edits? Check the layer data provider capabilities to find out.
  • user logging: notices, warnings or errors are completely missing. It would be great to have them shown inside qgis.gui.QgsMessageBar.
  • custom attributes: this version expects both layers to have the same attribute with the same data type.
  • GUI: a very simple PyQt widget will turn this console-based script into a custom plugin. That’s what’s going to be next.

by Michal Zimmermann at November 09, 2017 02:00 PM

November 07, 2017

PostGIS Development

Move PostGIS extension to a different schema

As of PostGIS 2.3, the postgis extension was changed to no longer allow relocation. All function calls within the extension are now schema qualified.

While this change fixed some issues with database restore, it created a new problem: if you installed PostGIS in a schema other than the one you wanted, it is not intuitive how to move it to a different schema. Luckily there is a way to do this.

For this exercise, I will install PostGIS in the default schema and then demonstrate how to move it into another schema location.

You can run these steps using psql or pgAdmin or any other PostgreSQL tool you want.

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 07, 2017 12:00 AM

November 06, 2017

Paul Ramsey

Parallel PostGIS IIA

One of the core complaints in my review of PostgreSQL parallelism was that the cost of functions executed on rows returned by queries does not get included in the evaluation of the cost of a plan.

So for example, the planner appeared to consider these two queries equivalent:

SELECT *
FROM pd;

SELECT ST_Area(geom)
FROM pd;

They both retrieve the same number of rows and both have no filter on them, but the second one includes a fairly expensive function evaluation. No amount of changing the cost of the ST_Area() function would cause a parallel plan to materialize. Only changing the size of the table (making it bigger) would flip the plan into parallel mode.

Fortunately, when I raised this issue on pgsql-hackers, it turned out to have been reported and discussed last month, and Amit Kapila had already prepared a patch, which he kindly rebased for me.

With the patch in place, I now see rational behavior from the planner. Using the default PostGIS function costs, a simple area calculation on my 60K row polling division table is sequential:

EXPLAIN
SELECT ST_Area(geom)
FROM pd;
Seq Scan on pd  
(cost=0.00..14445.17 rows=69534 width=8)

However, if the ST_Area() function is costed a little more realistically, the plan shifts.

ALTER FUNCTION ST_Area(geometry) COST 100;

EXPLAIN
SELECT ST_Area(geom)
FROM pd;
 Gather  
 (cost=1000.00..27361.20 rows=69534 width=8)
   Workers Planned: 3
   ->  Parallel Seq Scan on pd  
       (cost=0.00..19407.80 rows=22430 width=8)

Perfect!

While not every query receives what I consider a “perfect plan”, it now appears that we at least have some reasonable levers available to get better plans via applying some sensible (higher) costs across the PostGIS code base.

November 06, 2017 08:00 PM

November 02, 2017

Michal Zimmermann

QGIS Plugin Development: Using Python Console

As mentioned in the previous part of the series, the QGIS Python console is an entry point to GIS workflow automation within QGIS. Remember there’s an iface object representing a qgis.gui.QgisInterface instance within the console that gives you access to the whole QGIS GUI. Let’s see what we can do inside the console.

Loading vector layers folder

import glob
from qgis.core import QgsMapLayerRegistry, QgsVectorLayer

def load_folder(folder):
    VALID_EXTENSIONS = ('.geojson', '.gpkg', '.shp')
    files = [f for f in glob.glob("{}/*".format(folder)) if f.endswith(VALID_EXTENSIONS)]

    for f in files:
        layer = QgsVectorLayer(f, f.split('/')[-1], 'ogr')

        if not layer.isValid():
            iface.messageBar().pushCritical("Failed to load:", f)
            continue

        QgsMapLayerRegistry.instance().addMapLayer(layer)

load_folder("path/to/your/vector/files/folder")

  • QgsMapLayerRegistry represents the Layers Panel as present in the QGIS GUI
  • iface.messageBar() returns the message bar of the main app and lets you notify the user of what’s going on under the hood
  • QgsVectorLayer represents a vector layer with its underlying vector data sets

Editing active layer attribute table

The following code demonstrates how to edit a vector layer attribute table via the console.

  • Any attribute to be written has to come in form of a qgis.core.QgsField - this is more or less an encapsulation of an attribute name and its type (PyQt4.QtCore.QVariant to be precise)
  • The underlying data provider has to be capable of attribute addition (caps & QgsVectorDataProvider.AddAttributes)
  • QgsVectorLayer.addAttribute method returns a boolean rather than throwing an exception

from qgis.core import QgsField, QgsVectorDataProvider
from qgis.gui import QgsMessageBar
from PyQt4.QtCore import QVariant


def edit_active_layer(attr_name, attr_type):
    layer = iface.activeLayer()
    caps = layer.dataProvider().capabilities()

    if caps & QgsVectorDataProvider.AddAttributes:
        layer.startEditing()
        if layer.addAttribute(QgsField(attr_name, attr_type)):
            iface.messageBar().pushMessage("Attribute {0} was successfully added to the active layer.".format(attr_name), QgsMessageBar.SUCCESS)
            layer.commitChanges()
        else:
            iface.messageBar().pushMessage("Attribute {0} was not added. Does it already exist?".format(attr_name), QgsMessageBar.CRITICAL)
            layer.rollBack()

edit_active_layer("new_string_attribute", QVariant.String)

The whole series aims to present a plugin capable of writing a new attribute and its value to an existing layer. Thus, this code might come in handy in the future.

Creating a new vector layer

It’s possible to create a whole new vector layer with QGIS Python console. I present a very simple create_new_layer function, yet I hope you can imagine the ways it can be tweaked.

from qgis.core import QgsField, QgsFields, QgsVectorLayer, QgsVectorFileWriter, QgsVectorDataProvider, QgsCoordinateReferenceSystem, QgsFeature, QgsGeometry, QgsPoint, QGis
from PyQt4.QtCore import QVariant

def create_new_layer():
    filename = "/path/to/your/vector/file.gpkg"

    fields = QgsFields()
    fields.append(QgsField("attr1", QVariant.String))
    fields.append(QgsField("attr2", QVariant.Int))

    writer = QgsVectorFileWriter(
        filename,
        "UTF8",
        fields,
        QGis.WKBPoint,
        QgsCoordinateReferenceSystem(4326),
        "GPKG"
    )
    del writer  # flush the newly created file to disk before opening it

    layer = QgsVectorLayer(filename, filename.split("/")[-1], "ogr")
    QgsMapLayerRegistry.instance().addMapLayer(layer)

    if not layer.dataProvider().capabilities() & QgsVectorDataProvider.AddAttributes:
        pass

    feature = QgsFeature(layer.pendingFields())
    feature.setGeometry(QgsGeometry().fromPoint(QgsPoint(0, 0)))
    feature.setAttribute("attr1", "attr1")
    feature.setAttribute("attr2", 2)

    layer.startEditing()

    if layer.addFeature(feature, True):
        layer.commitChanges()
    else:
        layer.rollBack()
        iface.messageBar().pushMessage("Feature addition failed.", QgsMessageBar.CRITICAL)

create_new_layer()

Those were just a few examples of what can be done with the QGIS API and Python console. Next time, I’d like to focus on spatial joins inside QGIS - another step towards the final plugin.

by Michal Zimmermann at November 02, 2017 02:00 PM

October 31, 2017

Paul Ramsey

Parallel PostGIS II

A year and a half ago, with the release of PostgreSQL 9.6 on the horizon, I evaluated the parallel query infrastructure and how well PostGIS worked with it.

The results at the time were mixed: parallel query worked, when poked just the right way, with the correct parameters set on the PostGIS functions, and on the PostgreSQL back-end. However, under default settings, parallel queries did not materialize. Not for scans, not for joins, not for aggregates.

With the recent release of PostgreSQL 10, another generation of improvement has been added to parallel query processing, so it’s fair to ask, “how well does PostGIS parallelize now?”

Parallel PostGIS II

TL;DR:

The answer is, better than before:

  • Parallel aggregations now work out-of-the-box and parallelize in reasonable real-world conditions.
  • Parallel scans still require higher function costs to come into action, even in reasonable cases.
  • Parallel joins on spatial conditions still seem to have poor planning, requiring a good deal of manual poking to get parallel plans.

Setup

In order to run these tests yourself, you will need:

  • PostgreSQL 10
  • PostGIS 2.4

You’ll also need a multi-core computer to see actual performance changes. I used a 4-core desktop for my tests, so I could expect 4x improvements at best.

For testing, I used the same ~70K Canadian polling division polygons as last time.

createdb parallel
psql -c 'create extension postgis' parallel
shp2pgsql -s 3347 -I -D -W latin1 PD_A.shp pd | psql parallel

PDs

To support join queries, and on larger tables, I built a set of point tables based on the polling divisions. One point per polygon:

CREATE TABLE pts AS 
SELECT 
  ST_PointOnSurface(geom)::Geometry(point, 3347) AS geom, 
  gid, fed_num 
FROM pd;

CREATE INDEX pts_gix 
  ON pts USING GIST (geom);

Ten points per polygon (for about 700K points):

CREATE TABLE pts_10 AS 
SELECT 
  (ST_Dump(ST_GeneratePoints(geom, 10))).geom::Geometry(point, 3347) AS geom, 
  gid, fed_num 
FROM pd;

CREATE INDEX pts_10_gix 
  ON pts_10 USING GIST (geom);

One hundred points per polygon (for about 7M points):

CREATE TABLE pts_100 AS 
SELECT 
  (ST_Dump(ST_GeneratePoints(geom, 100))).geom::Geometry(point, 3347) AS geom, 
  gid, fed_num 
FROM pd;

CREATE INDEX pts_100_gix 
  ON pts_100 USING GIST (geom);

The configuration parameters for parallel query have changed since the last test, and are (in my opinion) a lot easier to understand.

These parameters are used to fine-tune the planner and execution. Usually you don’t need to change them.

  • parallel_setup_cost sets the planner’s estimate of the cost of launching parallel worker processes. Default 1000.
  • parallel_tuple_cost sets the planner’s estimate of the cost of transferring one tuple from a parallel worker process to another process. Default 0.1.
  • min_parallel_table_scan_size sets the minimum amount of table data that must be scanned in order for a parallel scan to be considered. Default 8MB.
  • min_parallel_index_scan_size sets the minimum amount of index data that must be scanned in order for a parallel scan to be considered. Default 512kB.
  • force_parallel_mode forces the planner to parallelize if wanted. Values: off | on | regress
  • effective_io_concurrency for some platforms and hardware setups allows true concurrent read. Values from 1 (for one spinning disk) to ~100 (for an SSD drive). Default 1.

These parameters control how many parallel processes are launched for a query.

  • max_worker_processes sets the maximum number of background processes that the system can support. Default 8.
  • max_parallel_workers sets the maximum number of workers that the system can support for parallel queries. Default 8.
  • max_parallel_workers_per_gather sets the maximum number of workers that can be started by a single Gather or Gather Merge node. Setting this value to 0 disables parallel query execution. Default 2.

Once you get to the point where #processes == #cores there’s not a lot of advantage in adding more processes. However, each process does exact a cost in terms of memory: a worker process consumes work_mem the same as any other backend, so when planning memory usage take both max_connections and max_worker_processes into consideration.
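
For a rough sense of scale, a back-of-the-envelope sketch assuming each backend and each worker holds a single work_mem allocation at a time (illustrative numbers, not from this post; real queries can hold several such allocations at once):

work_mem_mb = 64
max_connections = 100
max_worker_processes = 8

# worst case if every backend and worker allocates work_mem once
print((max_connections + max_worker_processes) * work_mem_mb)  # 6912 MB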

Before running tests, make sure you have a handle on what your parameters are set to: I frequently found I accidentally tested with max_parallel_workers set to 1.

show max_worker_processes;
show max_parallel_workers;
show max_parallel_workers_per_gather;

Aggregates

First, set max_parallel_workers and max_parallel_workers_per_gather to 8, so that the planner has as much room as it wants to parallelize the workload.

PostGIS only has one true spatial aggregate, the ST_MemUnion function, which is comically inefficient due to lack of input ordering. However, it’s possible to see some aggregate parallelism in action by wrapping a spatial function in a parallelizable aggregate, like Sum():

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 8;

EXPLAIN ANALYZE 
  SELECT Sum(ST_Area(geom)) 
    FROM pd 

Boom! We get a 3-worker parallel plan and execution about 3x faster than the sequential plan.

Finalize Aggregate  
(cost=15417.45..15417.46 rows=1 width=8) 
(actual time=236.925..236.925 rows=1 loops=1)
->  Gather  
(cost=15417.13..15417.44 rows=3 width=8) 
(actual time=236.915..236.921 rows=4 loops=1)
   Workers Planned: 3
   Workers Launched: 3
   ->  Partial Aggregate  
   (cost=14417.13..14417.14 rows=1 width=8) 
   (actual time=231.724..231.724 rows=1 loops=4)
       ->  Parallel Seq Scan on pd  
       (cost=0.00..13800.30 rows=22430 width=2308) 
       (actual time=0.049..30.407 rows=17384 loops=4)
Planning time: 0.111 ms
Execution time: 238.785 ms

Just to confirm, re-run it with parallelism turned off:

SET max_parallel_workers_per_gather = 0;

EXPLAIN ANALYZE 
  SELECT Sum(ST_Area(geom)) 
    FROM pd 

Back to one thread and taking about 3 times as long, as expected.

Scans

The simplest spatial parallel scan adds a spatial function to the filter clause.

SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 8;

EXPLAIN ANALYZE 
  SELECT *
    FROM pd 
    WHERE ST_Area(geom) > 10000;    

Unfortunately, that does not give us a parallel plan.

The ST_Area() function is defined with a COST of 10. If we move it up, to 100, we can get a parallel plan.

SET max_parallel_workers_per_gather = 8;

ALTER FUNCTION ST_Area(geometry) COST 100;
EXPLAIN ANALYZE 
  SELECT *
    FROM pd 
    WHERE ST_Area(geom) > 10000;    

Boom! Parallel scan with five workers:

Gather  
(cost=1000.00..20544.33 rows=23178 width=2554) 
(actual time=0.253..293.016 rows=62158 loops=1)
Workers Planned: 5
Workers Launched: 5
->  Parallel Seq Scan on pd  
    (cost=0.00..17226.53 rows=4636 width=2554) 
    (actual time=0.091..210.581 rows=10360 loops=6)
     Filter: (st_area(geom) > '10000'::double precision)
     Rows Removed by Filter: 1229
Planning time: 0.128 ms
Execution time: 302.600 ms

It appears our spatial function costs may still be too low in general to get good planning. And as we will see with joins, it’s possible the planner is still discounting function costs too much in deciding whether to go parallel or not.

Joins

Starting with a simple join of all the polygons to the 100 points-per-polygon table, we get:

SET max_parallel_workers_per_gather = 4;

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_100 pts
  ON ST_Intersects(pd.geom, pts.geom);

PDs & Points

In order to give the PostgreSQL planner a fair chance, I started with the largest table, thinking that the planner would recognize that a “70K rows against 7M rows” join could use some parallel love, but no dice:

Nested Loop  
(cost=0.41..13555950.61 rows=1718613817 width=2594)
 ->  Seq Scan on pd  
     (cost=0.00..14271.34 rows=69534 width=2554)
 ->  Index Scan using pts_gix on pts  
     (cost=0.41..192.43 rows=232 width=40)
       Index Cond: (pd.geom && geom)
       Filter: _st_intersects(pd.geom, geom)

There are a number of knobs we can press on. There are two global parameters:

  • parallel_setup_cost defaults to 1000, but no amount of lowering the value, even to zero, causes a parallel plan.
  • parallel_tuple_cost defaults to 0.1. Reducing it by a factor of 100, to 0.001, causes the plan to flip over into a parallel plan.

SET parallel_tuple_cost = 0.001;

As with all parallel plans, it is a nested loop, but that’s fine since all PostGIS joins are nested loops.

Gather  (cost=0.28..4315272.73 rows=1718613817 width=2594)
Workers Planned: 4
->  Nested Loop  
    (cost=0.28..2596658.92 rows=286435636 width=2594)
     ->  Parallel Seq Scan on pts_100 pts  
         (cost=0.00..69534.00 rows=1158900 width=40)
     ->  Index Scan using pd_geom_idx on pd  
         (cost=0.28..2.16 rows=2 width=2554)
           Index Cond: (geom && pts.geom)
           Filter: _st_intersects(geom, pts.geom)

Running the parallel plan to completion on the 700K point table takes 18s with four workers and 53s with a sequential plan. We are not getting an optimal speed up from parallel processing anymore: four workers are completing in 1/3 of the time instead of 1/4.

If we set parallel_setup_cost and parallel_tuple_cost back to their defaults, we can also change the plan by fiddling with the function costs.

First, note that our query can be re-written like this, to expose the components of the spatial join:

SET parallel_tuple_cost=0.1;
SET parallel_setup_cost=1000;
SET max_parallel_workers_per_gather = 4;

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_100 pts
   ON pd.geom && pts.geom 
   AND _ST_Intersects(pd.geom, pts.geom);

The default cost of _ST_Intersects() is 100. If we adjust it up by a factor of 100, we can get a parallel plan.

ALTER FUNCTION _ST_Intersects(geometry, geometry) COST 10000;

However, what if our query only used a single spatial operator in the join filter? Can we still force a parallel plan on this query?

SET parallel_tuple_cost=0.1;
SET parallel_setup_cost=1000;
SET max_parallel_workers_per_gather = 4;

EXPLAIN  
 SELECT *
  FROM pd 
  JOIN pts_100 pts
  ON pd.geom && pts.geom;

The && operator could activate one of two functions:

  • geometry_overlaps(geom, geom) is bound to the && operator
  • geometry_gist_consistent_2d(internal, geometry, int4) is bound to the 2d spatial index

However, no amount of increasing their COST causes the operator-only query plan to flip into a parallel mode:

ALTER FUNCTION  geometry_overlaps(geometry, geometry) COST 1000000000000;
ALTER FUNCTION  geometry_gist_consistent_2d(internal, geometry, int4) COST 10000000000000;

So for operator-only queries, it seems the only way to force a parallel spatial join is to muck with the parallel_tuple_cost parameter.

More Joins

Can we parallelize a common GIS use case: the spatial overlay?

Shifted PDs

Here is a table that simply shifts the polling divisions up and over, so that they can be overlaid to create a new set of smaller polygons.

CREATE TABLE pd_translate AS 
SELECT ST_Translate(geom, 100, 100) AS geom, 
    fed_num, pd_num 
  FROM pd;
  
CREATE INDEX pd_translate_gix 
  ON pd_translate USING GIST (geom);
CREATE INDEX pd_fed_num_x 
  ON pd (fed_num);
CREATE INDEX pdt_fed_num_x 
  ON pd_translate (fed_num);

The overlay operation finds, for each geometry on one side, all the overlapping geometries, and then calculates the shape of those overlaps (the “intersection” of the pair). Calculating intersections is expensive, so it’s something we want to happen in parallel, even more than we want the join to happen in parallel.

This query calculates the overlay of all polling divisions (and their translations) in British Columbia (fed_num > 59000):

EXPLAIN 
SELECT ST_Intersection(pd.geom, pdt.geom) AS geom
  FROM pd
  JOIN pd_translate pdt
  ON ST_Intersects(pd.geom, pdt.geom)
  WHERE pd.fed_num > 59000
  AND pdt.fed_num > 59000;

Unfortunately, the default remains a non-parallel plan. The parallel_tuple_cost has to be adjusted down to 0.01 or the cost of _ST_Intersects() adjusted upwards to get a parallel plan.

Conclusions

  • The costs assigned to PostGIS functions still do not provide the planner a good enough guide to determine when to invoke parallelism. Costs assigned currently vary widely without any coherent reasons.
  • The planner behaviour on spatial joins remains hard to predict: is the deciding factor the join operator cost, the number of resulting rows, or something else altogether? Counter-intuitively, it was easier to get parallel join behaviour from a relatively small 6K x 6K polygon/polygon overlay join than it was for the 70K x 7M point/polygon overlay.

October 31, 2017 08:00 PM

October 28, 2017

Anita Graser (Underdark)

Movement data in GIS #10: open tools for AIS tracks from MarineCadastre.gov

MarineCadastre.gov is a great source for AIS data along the US coast. Their data formats and tools though are less open. Luckily, GDAL – and therefore QGIS – can read ESRI File Geodatabases (.gdb).

They also offer a Track Builder script that creates lines out of the broadcast points. (It can also join additional information from the vessel and voyage layers.) We could reproduce the line creation step using tools such as Processing’s Point to path. But this post will show how to create PostGIS trajectories instead.

First, we have to import the points into PostGIS using either DB Manager or Processing’s Import into PostGIS tool:

Then we can create the trajectories. I’ve opted to create a materialized view:

The first part of the query creates a CTE called ptm (short for PointM). This step adds time stamp information to each point. The second part of the query then aggregates these PointMs into trajectories of type LineStringM.

CREATE MATERIALIZED VIEW ais.trajectory AS
 WITH ptm AS (
   SELECT b.mmsi,
     st_makepointm(
       st_x(b.geom), 
       st_y(b.geom), 
       date_part('epoch', b.basedatetime)
     ) AS pt,
     b.basedatetime t
   FROM ais.broadcast b
   ORDER BY mmsi, basedatetime
 )
 SELECT row_number() OVER () AS id,
   st_makeline(ptm.pt) AS st_makeline,
   ptm.mmsi,
   min(ptm.t) AS min_t,
   max(ptm.t) AS max_t
 FROM ptm
 GROUP BY ptm.mmsi
WITH DATA;

The trajectory start and end times (min_t and max_t) are optional but they can help speed up future queries.
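
As an illustration of the kind of query they speed up, trajectories overlapping a time window can be pre-filtered on min_t and max_t before any geometry is touched (a sketch assuming psycopg2; the database name and dates are made up):

import psycopg2

conn = psycopg2.connect("dbname=ais")  # database name is an assumption
cur = conn.cursor()
# a trajectory overlaps the window if it starts before the window ends
# and ends after the window starts
cur.execute("""
    SELECT id, mmsi
    FROM ais.trajectory
    WHERE min_t <= %s AND max_t >= %s
""", ("2017-01-02", "2017-01-01"))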

One of the advantages of creating trajectory lines is that they render many times faster than the original points.

Of course, we end up with some artifacts at the border of the dataset extent. (Files are split by UTM zone.) Trajectories connect the last known position before the vessel left the observed area with the position of reentry. This results, for example, in vertical lines which you can see in the bottom left corner of the above screenshot.

With the trajectories ready, we can go ahead and start exploring the dataset. For example, we can visualize trajectory speed and/or create animations:

Purple trajectory segments are slow while green segments are faster

We can also perform trajectory analysis, such as trajectory generalization:

This is a first proof of concept. It would be great to have a script that automatically fetches the datasets for a specified time frame and list of UTM zones and loads them into PostGIS for further processing. In addition, it would be great to also make use of the information in the vessel and voyage tables, thus splitting up trajectories into individual voyages.


Read more:


by underdark at October 28, 2017 01:22 PM

October 26, 2017

Michal Zimmermann

QGIS Plugin Development: Getting Started

QGIS 2.1x is a brilliant tool for Python-based automation in the form of custom scripts or even plugins. The first steps towards writing custom code might be a bit difficult, as you need to grasp a rather complex Python API. The QGIS Plugin Development series (see the list of other parts at the end of this article) targets pitfalls and traps I’ve met while learning to use it myself.

The outcome of the series is going to be a fully functional custom plugin capable of writing attribute values to a target layer from its features’ nearest neighbours in a source layer, based on their spatial proximity.

In this part, I’ll mention the basics a.k.a. what is good to know before you start.

Documentation

Different QGIS versions come with different Python APIs. The documentation is to be found at https://qgis.org, the latest being version 2.18. Note that if you go directly to http://qgis.org/api/, you’ll see the current master docs.

Alternatively, you can apt install qgis-api-doc on your Ubuntu-based system and run python -m SimpleHTTPServer [port] inside /usr/share/qgis/doc/api. You’ll find the documentation at http://localhost:8000 (if you don’t provide port number) and it will be available even when you’re offline.

Basic API objects structure

Before launching QGIS, take a look at what’s available inside API:

  • qgis.core package brings all the basic objects like QgsMapLayer, QgsDataSourceURI, QgsFeature etc
  • qgis.gui package brings GUI elements that can be used within QGIS like QgsMessageBar or QgsInterface (very important API element, exposed to all custom plugins)
  • qgis.analysis, qgis.networkanalysis, qgis.server, and qgis.testing packages that won’t be covered in the series
  • qgis.utils module that comes with iface exposed (very handy within QGIS Python console)

QGIS Python Console

Using Python console is the easiest way to automate your QGIS workflow. It can be accessed via pressing Ctrl + Alt + P or navigating to Plugins -> Python Console. Note the above mentioned iface from qgis.utils module is exposed by default within the console, letting you interact with QGIS GUI. Try out the following examples.

iface.mapCanvas().scale() # returns the current map scale
iface.mapCanvas().zoomScale(100) # zoom to scale of 1:100
iface.activeLayer().name() # get the active layer name
iface.activeLayer().startEditing() # toggle editing

That was a very brief introduction to QGIS API, the next part will walk you through the console more thoroughly.

by Michal Zimmermann at October 26, 2017 01:00 PM

October 23, 2017

Michal Zimmermann

Serving Mapbox Vector Tiles with PostGIS, Nginx and Python Backend

Since version 2.4.0, PostGIS can serve MVT data directly. MVT-returning queries put a heavy workload on the database though. On top of that, each query has to be run again every time a client demands the data. This leaves us with plenty of room to optimize the process.

During the last week, while working on the Czech legislative election data visualization, I’ve struggled with the server becoming unresponsive far too often due to the issues mentioned above.

According to the schema, the first client to come to the server:

  • goes through filesystem unstopped, because there are no cached files yet,
  • continues to the Flask backend and asks for a file at {z}/{x}/{y},
  • Flask backend asks the database to return the MVT for the given tile,
  • Flask backend writes the response to the filesystem and sends it to the client.

Other clients get tiles directly from the filesystem, leaving the database at ease.

Nginx

Nginx is fairly simple to set up, once you know what you’re doing. The /volby-2017/municipality/ location serves static MVT from the given alias directory. If not found, the request is passed to @postgis location, that asks the Flask backend for the response.

server {
    server_name election;
    location /volby-2017/municipality {
            alias /opt/volby-cz-2017/server/cache/;
            try_files $uri @postgis;
    }

    location @postgis {
            include uwsgi_params;
            uwsgi_pass 127.0.0.1:5000;
    }
}

Flask backend
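
The original backend code is not reproduced here, but based on the workflow above it might look roughly like the following sketch, assuming Flask, psycopg2, the muni table and the TileBBox() helper used in the SQL below (an illustration, not the project’s actual backend):

import os

import psycopg2
from flask import Flask, Response

app = Flask(__name__)
CACHE_PATH = "cache"
conn = psycopg2.connect("dbname=volby")  # database name is an assumption

MVT_SQL = """
    SELECT ST_AsMVT(q, 'municipality', 4096, 'geom')
    FROM (
        SELECT
            muni_id,
            ST_AsMVTGeom(geom, TileBBox(%s, %s, %s, 3857), 4096, 0, false) AS geom
        FROM muni
        WHERE geom && TileBBox(%s, %s, %s, 3857)
    ) q
"""

@app.route("/volby-2017/municipality/<int:z>/<int:x>/<int:y>")
def tile(z, x, y):
    cur = conn.cursor()
    cur.execute(MVT_SQL, (z, x, y, z, x, y))
    mvt = bytes(cur.fetchone()[0])

    # cache the tile so Nginx can serve it directly next time
    tile_dir = os.path.join(CACHE_PATH, str(z), str(x))
    if not os.path.exists(tile_dir):
        os.makedirs(tile_dir)
    with open(os.path.join(tile_dir, str(y)), "wb") as f:
        f.write(mvt)

    return Response(mvt, mimetype="application/x-protobuf")
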

Generating static MVT in advance

If you’re going to serve static tiles that don’t change often, it might be a good idea to use PostGIS to create files in advance and serve them with Nginx.

CREATE TABLE tiles (
    x integer,
    y integer,
    z integer,
    west numeric,
    south numeric,
    east numeric,
    north numeric,
    geom geometry(POLYGON, 3857)
);

Using mercantile, you can create the tiles table holding the bounding boxes of the tiles you need. PostGIS then inserts the actual MVT into the mvt table.
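
A possible way to seed that tiles table, assuming the mercantile and psycopg2 packages; the bounding box (roughly the Czech Republic), zoom range and database name are illustrative:

import mercantile
import psycopg2

conn = psycopg2.connect("dbname=volby")  # database name is an assumption
cur = conn.cursor()

for t in mercantile.tiles(12.0, 48.5, 18.9, 51.1, range(6, 13)):
    b = mercantile.xy_bounds(t)  # web mercator bounds of the tile
    # storing the web mercator bounds in west/south/east/north to match the 3857 geom
    cur.execute("""
        INSERT INTO tiles (x, y, z, west, south, east, north, geom)
        VALUES (%s, %s, %s, %s, %s, %s, %s,
                ST_MakeEnvelope(%s, %s, %s, %s, 3857))
    """, (t.x, t.y, t.z, b.left, b.bottom, b.right, b.top,
          b.left, b.bottom, b.right, b.top))

conn.commit()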

CREATE TEMPORARY TABLE tmp_tiles AS
    SELECT
        muni.muni_id,
        prc.data,
        ST_AsMVTGeom(
            muni.geom,
            TileBBox(z, x , y, 3857),
            4096,
            0,
            false
        ) geom,
        x,
        y,
        z
    FROM muni
    JOIN (
        SELECT
            x,
            y,
            z,
            geom
        FROM tiles
    ) bbox ON (ST_Intersects(muni.geom, bbox.geom))
    JOIN party_results_cur prc ON (muni.muni_id = prc.muni_id);
CREATE TABLE mvt (mvt bytea, x integer, y integer, z integer);
DO
$$
DECLARE r record;
BEGIN
FOR r in SELECT DISTINCT x, y, z FROM tmp_tiles LOOP
    INSERT INTO mvt
    SELECT ST_AsMVT(q, 'municipality', 4096, 'geom'), r.x, r.y, r.z
    FROM (
        SELECT
            muni_id,
            data,
            geom
        FROM tmp_tiles
        WHERE (x, y, z) = (r.x, r.y, r.z)
    ) q;
    RAISE INFO '%', r;
END LOOP;
END;
$$;

Once filled, the table rows can be written to the filesystem with a simple piece of Python code.

#!/usr/bin/env python

import logging
import os
import time
from sqlalchemy import create_engine, text

CACHE_PATH="cache/"

e = create_engine('postgresql:///')
conn = e.connect()
sql=text("SELECT mvt, x, y, z FROM mvt")
query = conn.execute(sql)
data = query.cursor.fetchall()

for d in data:
    cachefile = "{}/{}/{}/{}".format(CACHE_PATH, d[3], d[1], d[2])
    print(cachefile)

    if not os.path.exists("{}/{}/{}".format(CACHE_PATH, d[3], d[1])):
        os.makedirs("{}/{}/{}".format(CACHE_PATH, d[3], d[1]))

    with open(cachefile, "wb") as f:
        f.write(bytes(d[0]))

Conclusion

PostGIS is a brilliant tool for generating Mapbox vector tiles. Combined with a Python-powered static file generator and Nginx, it seems to be the only tool needed to get you going.

by Michal Zimmermann at October 23, 2017 02:00 PM

October 18, 2017

PostGIS Development

PostGIS Patch Releases

The PostGIS development team has uploaded bug fix releases for the 2.2, 2.3 and 2.4 stable branches.

2.2.6

2.3.4

2.4.1

by Paul Ramsey at October 18, 2017 12:00 AM

October 15, 2017

Boston GIS (Regina Obe, Leo Hsu)

Using pg_upgrade to upgrade PostGIS without installing an older version of PostGIS

PostGIS releases a new minor version of PostGIS every one or two years. Each minor version of PostGIS has a different libname suffix. In PostGIS 2.1 you'll find files in your PostgreSQL lib folder called postgis-2.1.*, rtpostgis-2.1.*, postgis-topology-2.1.*, address-standardizer-2.1.* etc., and in PostGIS 2.2 you'll find similar files but with 2.2 in the name. I believe PostGIS and pgRouting are the only extensions that stamp the lib with a version number. Most other extensions you will find are just called extension.so, e.g. hstore is always called hstore.dll / hstore.so even if the version changed from 9.6 to 10. On the bright side this allows people to have two versions of PostGIS installed in a PostgreSQL cluster, though a database can use at most one version. So you can have an experimental database running a very new or unreleased version of PostGIS and a production database running a more battle-tested version.

On the sad side, this causes a lot of PostGIS users frustration when trying to use pg_upgrade to upgrade from an older version of PostGIS/PostgreSQL to a newer version of PostGIS/PostgreSQL, as their pg_upgrade often bails with a message in the loaded_libraries.txt log file, something to the effect of:

could not load library "$libdir/postgis-2.2": ERROR:  could not access file "$libdir/postgis-2.2": No such file or directory
could not load library "$libdir/postgis-2.3": ERROR:  could not access file "$libdir/postgis-2.3": No such file or directory

This is also a hassle because we generally don't support a newer version of PostgreSQL on older PostGIS installs because the PostgreSQL major version changes tend to break our code often and backporting those changes is both time-consuming and dangerous. For example the DatumGetJsonb change and this PostgreSQL 11 crasher we haven't isolated the cause of yet. There are several changes like this that have already made the PostGIS 2.4.0 we released recently incompatible with the PostgreSQL 11 head development.

Continue reading "Using pg_upgrade to upgrade PostGIS without installing an older version of PostGIS"

by Regina Obe at October 15, 2017 05:11 AM

October 04, 2017

Oslandia

Refresh your maps FROM postgreSQL !

Continuing our love story with PostgreSQL and QGIS, we submitted a grant application to QGIS.org in early spring 2017.

The idea was to take advantage of very advanced PostgreSQL features that probably had never been used in a Desktop GIS client before.

Today, let’s see what we can do with the PostgreSQL NOTIFY feature!

Ever dreamt of being able to trigger things from outside QGIS? Ever wanted a magic stick to trigger actions in some clients from a database action?


NOTIFY is a PostgreSQL-specific feature allowing you to generate notifications on a channel and optionally send a message (a payload, in PG’s dialect).

In short, from within a transaction, we can raise a signal in a PostgreSQL queue and listen to it from a client.

In action

We hardcoded a channel named “qgis” and made QGIS able to LISTEN to NOTIFY events and transform them into Qt’s signals. The signals are connected to layer refresh when you switch on this rendering option.

Optionally, adding a message filter will only redraw the layer for some specific events.
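
A minimal sketch of poking that hardcoded “qgis” channel from outside QGIS, assuming psycopg2 (the database name and the payload string are arbitrary examples a filter could match on):

import psycopg2

conn = psycopg2.connect("dbname=gis")  # database name is an assumption
conn.autocommit = True  # the notification is delivered when the transaction commits
cur = conn.cursor()
cur.execute("NOTIFY qgis, 'refresh my_layer'")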

This mechanism is really versatile and we can now imagine many possibilities, like triggering a notification message to your users from the database, interacting with plugins, or even coding a chat between users of the same database (ok, this is stupid)!

 

More than just refresh layers?

The first implementation we chose was to trigger a layer refresh because we believe this is a good way for users to discover this new feature.

But QGIS rocks hey, doing crazy things for limited uses is not the way.

Thanks to feedback on the Pull Request, we added the possibility to trigger layer actions on notification.

That should be pretty versatile since you can do almost anything with those actions now.

Caveats

QGIS will open a permanent connection to PostgreSQL to watch the notify signals. Please keep that in mind if you have several clients and a limited number of connections.

Notify signals are only transmitted with the transaction, so when the COMMIT is raised. So be aware that this might not help you if users are inside an edit session.

QGIS has a lot of different caches, for the attribute table for instance. We currently have no specific way to invalidate a specific cache and then order QGIS to refresh its attribute table.

There is no way in PG to list all channels of a database session, that’s why we couldn’t propose a combobox list of available signals in the renderer option dialog. Anyway, to avoid too many issues, we decided to hardcode the channel name in QGIS with the name “qgis”. If this is somehow not enough for your needs, please contact us!

Conclusion

The github pull request is here : https://github.com/qgis/QGIS/pull/5179

We are convinced this would be really useful for real time application, let us know if that makes some bells ring on your side!

More to come soon, stay tuned!

 

 

by Régis Haubourg at October 04, 2017 01:09 PM

Oslandia

Undo Redo stack is back: QGIS Transaction groups

Let’s keep on looking at what we did in the QGIS.org grant application of early spring 2017.

At Oslandia, we use the transaction group option of QGIS a lot. It was an experimental feature in QGIS 2.X allowing you to open only one common Postgres transaction for all layers sharing the same connection string.

Transaction group option

When activated, that option will bring many killer features:

  • Users can switch all the layers in edit mode at once. A real time saver.
  • Every INSERT, UPDATE or DELETE is forwarded immediately to the database, which is nice for:
    • Evaluating on the fly if database constraints are satisfied or not. Without transaction groups this is only done when saving the edits, and it can be frustrating to create dozens of features only to have one of them rejected because of a foreign key constraint…
    • Having triggers evaluated on the fly. QGIS is so powerful when dealing with “thick database” concepts that I would never go back to a pure GIS ignoring how powerful databases can be!
    • Playing with QgsTransaction.ExecuteSQL allows triggering stored procedures in PostgreSQL in a beautiful API-style interface. Something like:
SELECT invert_pipe_direction('pipe1');
  • However, the implementation was flagged “experimental” because some caveats were still causing issues:
    • Committing on the fly was breaking the logic of the undo/redo stack. So there was no way to do a local edit. No Ctrl+Z! The only way to roll back was to stop the edit session and lose all the work. Ouch… Bad!
    • Playing with ExecuteSQL did not dirty the QGIS edit buffer. So, if during an edit session no edit action was made using QGIS native tools, there was no clean way to activate the “save edits” icon.
    • When having some failures in the triggers, QGIS may lose the DB connection and thus create a silent ROLLBACK.

We decided to try to restore the undo/redo stack by saving the edit history in PostgreSQL SAVEPOINTs and seeing if we could restore the original feature in QGIS.
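
The idea can be illustrated outside QGIS with a few lines of psycopg2: one savepoint per edit, and undo becomes a rollback to the last savepoint (the table and column names are made up, and this is of course not the actual QGIS implementation):

import psycopg2

conn = psycopg2.connect("dbname=gis")  # one open transaction, as in a transaction group
cur = conn.cursor()

cur.execute("SAVEPOINT edit_1")
cur.execute("UPDATE pipe SET diameter = 200 WHERE id = 1")

cur.execute("SAVEPOINT edit_2")
cur.execute("UPDATE pipe SET diameter = 300 WHERE id = 1")

# Ctrl+Z: undo only the last edit, the rest of the session is kept
cur.execute("ROLLBACK TO SAVEPOINT edit_2")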

And.. it worked!

Let’s see that in action:

 

Potential caveats ?

At first, we worried about how heavy all those savepoints would be for the database. It turns out that maybe for really massive geometries and heavy editing sessions this could start to weigh a bit, but honestly it stays far below PostgreSQL’s capabilities.

 

Up to now, we haven’t really found any issue with that.

And we didn’t address the silent ROLLBACK that occurs sometimes, because it is generated by buggy stored procedures, which are easy to fix.

Some new ideas came to us when working in that area. For instance, if a transaction locks a feature, QGIS just… waits for the lock to be released. I think we should find a way to advertise those locks to the users, that would be great! If you’re interested in making that happen, please contact us.

 

More to come soon, stay tuned!

 

 

by Régis Haubourg at October 04, 2017 01:08 PM

September 30, 2017

PostGIS Development

PostGIS 2.4.0 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0. Best served with PostgreSQL 10rc1 and pgRouting 2.5.0. See the full list of changes in the news file.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS binaries.

ALTER EXTENSION postgis UPDATE;

-- if you have additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
ALTER EXTENSION pgrouting UPDATE;

In order to have Mapbox Vector Tiles support enabled, you’ll need to compile with protobuf support and pkg-config to verify the correct minimum version of protobuf-c; see protobuf for details. The ST_FrechetDistance function will not be enabled if PostGIS is compiled with a GEOS version lower than 3.7.0. GEOS 3.7.0 is not released yet but is expected sometime next month.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 30, 2017 12:00 AM

September 27, 2017

PostGIS Development

PostGIS 2.4.0rc3 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0rc3. Best served with PostgreSQL 10rc1 and pgRouting 2.5.0. See the full list of changes in the news file.

This will be the final rc before we have our 2.4.0 release.

We encourage everyone to test and in particular package maintainers to ensure no surprises at final release time.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS binaries.

ALTER EXTENSION postgis UPDATE;

-- if you have additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
ALTER EXTENSION pgrouting UPDATE;

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 27, 2017 12:00 AM

September 24, 2017

Boston GIS (Regina Obe, Leo Hsu)

PostGIS db help and manual in different languages

One of the things I’m most happy about with the upcoming PostGIS 2.4.0, due out in about a week, is that it is the first version to have almost complete translations into different languages. The Japanese, German, Portuguese, and Korean translations are more than 80% complete, with Japanese being 96%. You can download the html manuals from the PostGIS docs page. There are PDFs for non-Asian languages; for Japanese and Korean I’m still having issues generating the PDFs.

When you install PostGIS with CREATE EXTENSION postgis;, it also installs the accompanying help, extracted from the English manual.

The comment generator we have in place is just as happy working with translated docs as it is with the English ones, so the in-db help documents can also be generated in other languages. The help files are located here: Japanese, German, Portuguese, and Korean.

Continue reading "PostGIS db help and manual in different languages"

by Regina Obe at September 24, 2017 09:54 AM

PostGIS Development

PostGIS 2.4.0rc2 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0rc2. Best served with PostgreSQL 10rc1 and pgRouting 2.5.0. See the full list of changes in the news file.

We encourage everyone to test and in particular package maintainers to ensure no surprises at final release time.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS binaries.

ALTER EXTENSION postgis UPDATE;

-- if you have additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
ALTER EXTENSION pgrouting UPDATE;

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 24, 2017 12:00 AM

September 22, 2017

Michal Zimmermann

PostgreSQL Dollar Quoting inside Bash Heredoc

Yesterday I spent two very unpleasant hours debugging the weirdest SQL error I’ve seen in my life, running the below query (simplified for this post).

psql -qAt --no-psqlrc <<BACKUP
DO
$$
DECLARE r record;
BEGIN
  RAISE INFO '%', 'info';
END
$$;
BACKUP

Running this in your terminal will result in a nasty syntax error.

ERROR:  syntax error at or near "1111"
LINE 2: 1111
        ^
ERROR:  syntax error at or near "RAISE"
LINE 2:   RAISE INFO '%', 'info';
          ^
ERROR:  syntax error at or near "1111"
LINE 2: 1111;

You stare at the screen for a while, absolutely sure that the number 1111 is nowhere close to the data you work with. You try again. Another error. You save the code into a file and try again. It works. What the heck? You try again using the bash heredoc. Another failure.

The minute you realize $$ is being substituted with the ID of the current process, you feel like the dumbest person on Earth. Yet the happiest one at the same time.

The solution is trivial.

psql -qAt --no-psqlrc <<BACKUP
DO
\$\$
DECLARE r record;
BEGIN
  RAISE INFO '%', 'info';
END
\$\$;
BACKUP

by Michal Zimmermann at September 22, 2017 06:30 PM

September 19, 2017

PostGIS Development

PostGIS 2.1.9 Released

The PostGIS development team has uploaded the final release of the PostGIS 2.1 branch. The 2.1 branch is now end-of-life. As befits a patch release, the focus is on bugs and breakages.

Continue Reading by clicking title hyperlink ..

by Paul Ramsey at September 19, 2017 12:00 AM

September 14, 2017

Paul Ramsey

PostGIS Operators in 2.4

TL;DR: If you are using ORDER BY or GROUP BY on geometry columns in your application and you have expectations regarding the order or groups you obtain, beware that PostGIS 2.4 changes the behaviour of ORDER BY and GROUP BY. Review your applications accordingly.


The first operators we learn about in elementary school are =, > and <, but they are the operators that are the hardest to define in the spatial realm.

PostGIS Operators in 2.4

When is = equal?

For example, take “simple” equality. Are geometry A and B equal? Should be easy, right?

But are we talking about:

  1. A and B have exactly the same vertices in the same order and with the same starting points?
  2. A and B have exactly the same vertices in any order? (see ST_OrderingEquals)
  3. A and B have the same vertices in any order but different starting points?
  4. A has some extra vertices that B does not, but they cover exactly the same area in space? (see ST_Equals)
  5. A and B have the same bounds?

Confusingly, for the first 16 years of its existence, PostGIS used definition 5, “A and B have the same bounds” when evaluating the = operator for two geometries.

However, for PostGIS 2.4, PostGIS will use definition 1: “A and B have exactly the same vertices in the same order and with the same starting points”.

Why does this matter? Because the behaviour of the SQL GROUP BY operation is bound to the “=” operator: when you group by a column, an output row is generated for all groups where every item is “=” to every other item. With the new definition in 2.4, the semantics of GROUP BY should be more “intuitive” when used against geometries.

What is > and <?

Greater and less than are also tricky in the spatial domain:

  • Is POINT(0 0) less than POINT(1 1)? Sort of looks like it, but…
  • Is POINT(0 0) less than POINT(-1 1) or POINT(1 -1)? Hm, that makes the first one look less obvious…

Greater and less than are concepts that make sense for 1-dimensional values, but not for higher dimensions. The “>” and “<” operators have accordingly been an ugly hack for a long time: they compared the minima of the bounding boxes of the two geometries.

  • If they were sortable using the X coordinate of the minima, that was the sorting returned.
  • If they were equal in X, then the Y coordinate of the minima was used.
  • Etc.

This process returned a sorted order, but not a very satisfying one: a “good” sorting would tend to place objects that are near to each other in space, near to each other in the sorted set.

Here’s what the old sorting looked like, applied to world populated places:

Geometry sorting in PostGIS 2.3

The new sorting system for PostGIS 2.4 calculates a very simple “morton key” using the center of the bounds of a feature, keeping things simple for performance reasons. The result is a sorted order that tends to keep spatially nearby features closer together in the sorted set.

Geometry sorting in PostGIS 2.4

Just as the “=” operator is tied to the SQL GROUP BY operation, the “>” and “<” operators are tied to the SQL ORDER BY operation. The pictures above were created by generating a line string from the populated places points as follows:

CREATE TABLE places_line AS 
SELECT ST_MakeLine(geom ORDER BY geom) AS geom 
FROM places;

September 14, 2017 04:00 PM

September 13, 2017

PostGIS Development

PostGIS 2.4.0rc1 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0rc1. Best served with PostgreSQL 10beta4, or rc1 which is coming soon, and pgRouting 2.5.0rc (the 2.5.0 release is imminent). See the full list of changes in the news file.

We encourage everyone to test and in particular package maintainers to ensure no surprises at final release time.

IMPORTANT NOTES

If you are upgrading from an existing PostGIS install, make sure to run the following after installing the PostGIS 2.4.0rc1 binaries.

ALTER EXTENSION postgis UPDATE;

-- if you have additional postgishy extensions below, upgrade them too

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;
--pgRouting 2.5.0 is imminent
ALTER EXTENSION pgrouting UPDATE;

In order to have Mapbox Vector Tiles support enabled, you’ll need to compile with protobuf support and pkg-config to verify the correct minimum version of protobuf-c; see protobuf for details. The ST_FrechetDistance function will not be enabled if PostGIS is compiled with a GEOS version lower than 3.7.0. GEOS 3.7.0 will probably not be released before PostGIS 2.4.0 is out.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 13, 2017 12:00 AM

September 11, 2017

Anita Graser (Underdark)

Drive-time Isochrones from a single Shapefile using QGIS, PostGIS, and Pgrouting

This is a guest post by Chris Kohler.

Introduction:

This guide provides step-by-step instructions to produce drive-time isochrones using a single vector shapefile. The method described here involves building a routing network using a single vector shapefile of your roads data within a VirtualBox. Furthermore, the network is built by creating start and end nodes (source and target nodes) on each road segment. We will use PostgreSQL, with the PostGIS and pgRouting extensions, as our database. Please consider this type of routing to be fair, regarding accuracy, as the routing algorithms are based on the node locations and not specific addresses. I am currently working on an improved workflow to have site address points serve as nodes to optimize results. One of the many benefits of this workflow is that there is no financial cost to produce it (outside of collecting your roads data). I will provide instructions for creating and using your virtual machine within this guide.

Steps: –Getting VirtualBox (begin)–

Intro 1. Download/install Oracle VM VirtualBox (https://www.virtualbox.org/wiki/Downloads).

Intro 2. Download/install OSGeo-Live 11 (https://live.osgeo.org/en/overview/overview.html).

Pictures used in this workflow will show 10.5, though version 11 can be applied similarly. Make sure you download the version: osgeo-live-11-amd64.iso. If you have trouble finding it, here is the direct link to the download (https://sourceforge.net/projects/osgeo-live/files/10.5/osgeo-live-10.5-amd64.iso/download)
Intro 3. Ready for virtual machine creation: We will use the downloaded OSGeo-Live 11 suite with a virtual machine we create to begin our workflow. The steps to create your virtual machine are listed below. Also, here are steps from an earlier workshop with additional details on setting up your virtual machine with OSGeo-Live (http://workshop.pgrouting.org/2.2.10/en/chapters/installation.html).

1. Create Virtual Machine: In this step we begin creating the virtual machine housing our database.

Open Oracle VM VirtualBox Manager and select “New” located at the top left of the window.

VBstep1

Then fill out name, operating system, memory, etc. to create your first VM.

vbstep1.2

2. Add IDE Controller: The purpose of this step is to create a placeholder for the OSGeo-Live 11 suite to be loaded into. In the VirtualBox main window, right-click your newly-created VM and open the settings.

vbstep2

In the settings window, on the left side select the storage tab.

Find the "adds new storage controller" button located at the bottom of the tab. Be careful of the other button labeled "adds new storage attachment"! Select the "adds new storage controller" button and a drop-down menu will appear. From the top of the drop-down select "Add IDE Controller".

vbstep2.2

vbstep2.3

You will see a new item appear in the center of the window under the “Storage Tree”.

3. Add Optical Drive: The OSGeo-Live 11 suite will be loaded into the virtual machine via an optical drive. Highlight the new IDE controller you created and select "add optical drive".

vbstep3

A new window will pop up; select "Choose Disk".

vbstep3.2

Locate your downloaded file “osgeo-live 11 amd64.iso” and click open. A new object should appear in the middle window under your new controller displaying “osgeo-live-11.0-amd64.iso”.

vbstep3.3

Finally, your virtual machine is ready for use.
Start your new virtual machine, then wait and follow the onscreen prompts to begin using it.

vbstep3.4

–Getting VirtualBox (end)–

4. Creating the routing database, and both extensions (postgis, pgrouting): The database we create and both extensions we add will provide the functions capable of producing isochrones.

To begin, open the command line tool (Ctrl+Alt+T), then log in to PostgreSQL by typing "psql -U user" into the command line and pressing Enter. For the purpose of clear instruction I will refer to the database in this guide as "routing"; feel free to choose your own database name. Input the command, seen in the figure below, to create the database:

CREATE DATABASE routing;

You can use “\c routing” to connect to the database after creation.

step4

The next step after creating and connecting to your new database is to create both extensions. I find it easier to kill two birds with one stone by typing "psql -U user routing", which logs you into PostgreSQL and connects you to your routing database at the same time.

When you're logged into your database, apply the commands below to add both extensions:

CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;

step4.2

step4.3

5. Load shapefile to database: In this next step, the shapefile of your roads data must be placed into your virtual machine and further into your database.

My method is to email myself the roads shapefile, then download and copy it from within the virtual machine's web browser. From the desktop of your virtual machine, open the folder named "Databases" and select the application "shp2pgsql".

step5

Follow the UI of shp2pgsql to connect to your routing database you created in Step 4.

step5.2

Next, select “Add File” and find your roads shapefile (in this guide we will call our shapefile “roads_table”) you want to use for your isochrones and click Open.

step5.3

Finally, click “Import” to place your shapefile into your routing database.

6. Add source & target columns: The purpose of this step is to create columns which will serve as placeholders for our nodes data we create later.

There are multiple ways to add these columns into the roads_table. The most important part of this step is which table you choose to edit, the names of the columns you create, and the format of the columns. Take time to ensure the source & target columns are integer format. Below are the commands used in your command line for these functions.

ALTER TABLE roads_table ADD COLUMN "source" integer;
ALTER TABLE roads_table ADD COLUMN "target" integer;

step6

step6.2

7. Create topology: Next, we will use a function to attach a node to each end of every road segment in the roads_table. The function in this step will create these nodes. These newly-created nodes will be stored in the source and target columns we created earlier in step 6.

As well as creating nodes, this function will also create a new table to contain them. The suffix "_vertices_pgr" is appended to the name of your shapefile table to name this new table. For example, using our guide's shapefile name, "roads_table", the nodes table will be named roads_table_vertices_pgr. However, we will not use the table created by this function (roads_table_vertices_pgr). Below is the function, and a second simplified version, to be used for populating our source and target columns, in other words creating our network topology. Note the input format: the "geom" column in my case was called "the_geom" within my shapefile:

SELECT pgr_createTopology('roads_table', 0.001, 'geom', 'id',
    'source', 'target', rows_where := 'true', clean := false);

step7

Here is a direct link for more information on this function: http://docs.pgrouting.org/2.3/en/src/topology/doc/pgr_createTopology.html#pgr-create-topology

Below is a simplified example of the function for my roads shapefile:

SELECT pgr_createTopology('roads_table', 0.001, 'the_geom', 'id')

8. Create a second nodes table: A second nodes table will be created for later use. This second node table will contain the node data generated by the pgr_createTopology function and will be named "node". Below is the command for this process. Fill in your source and target fields, as well as your shapefile table name, following the manner seen in the command below.

To begin, find the folder on the virtual machine's desktop named "Databases" and open the program "pgAdmin III" located within.

step8

Connect to your routing database in pgAdmin window. Then highlight your routing database, and find “SQL” tool at the top of the pgAdmin window. The tool resembles a small magnifying glass.

step8.2

We input the below function into the SQL window of pgAdmin. Feel free to refer to this link for further information: (https://anitagraser.com/2011/02/07/a-beginners-guide-to-pgrouting/)

CREATE TABLE node AS
   SELECT row_number() OVER (ORDER BY foo.p)::integer AS id,
          foo.p AS the_geom
   FROM (     
      SELECT DISTINCT roads_table.source AS p FROM roads_table
      UNION
      SELECT DISTINCT roads_table.target AS p FROM roads_table
   ) foo
   GROUP BY foo.p;

step8.3

9. Create a routable network: After creating the second node table in step 8, we will combine this node table (node) with our shapefile table (roads_table) into one new table (network) that will be used as the routing network. This table will be called "network" and will be capable of processing routing queries. Please input this command and execute it in the pgAdmin SQL tool as we did in step 8. Here is a reference for more information: (https://anitagraser.com/2011/02/07/a-beginners-guide-to-pgrouting/)

step8.2

 

CREATE TABLE network AS
   SELECT a.*, b.id as start_id, c.id as end_id
   FROM roads_table AS a
      JOIN node AS b ON a.source = b.the_geom
      JOIN node AS c ON a.target = c.the_geom;

step9.2

10. Create a "noded" view of the network: This new view will be used to calculate the visual isochrones in later steps. Input this command and execute it in the pgAdmin SQL tool.

CREATE OR REPLACE VIEW network_nodes AS 
SELECT foo.id,
 st_centroid(st_collect(foo.pt)) AS geom 
FROM ( 
  SELECT network.source AS id,
         st_geometryn (st_multi(network.geom),1) AS pt 
  FROM network
  UNION 
  SELECT network.target AS id, 
         st_boundary(st_multi(network.geom)) AS pt 
  FROM network) foo 
GROUP BY foo.id;

step10

11. Add column for speed: This step may or may not apply, depending on whether your original shapefile contains a field of values for road speeds.

In reality, a network of roads will typically contain multiple speed limits. The shapefile you choose may have a speed field; without one, the filtering used in the following steps will not allow varying speeds to be applied to the respective segments of your routing network.

If speed values exist in your shapefile, we will use them to populate a new field, "traveltime", that holds the travel time for every road segment in our network based on its geometry. First, we need to create a column to store these values. The column will be named "traveltime" and use the format double precision. Input this command and execute it in the command line tool as seen below.

ALTER TABLE network ADD COLUMN traveltime double precision;

step11

Next, we will populate the new column "traveltime" by calculating travel times using an equation. The equation takes each road segment's length (shape_leng) and divides it by the rate of travel (in either mph or kph). The sample command below uses mph as the rate, while the length units (shape_leng) in my roads_table are in feet. Input this command and execute it in the pgAdmin SQL tool; further details below explain the variable "X".

UPDATE network SET traveltime = shape_leng / X*60

step11.2

How to find X, here is an example using 30 mph as the rate: to find X, we convert 30 miles to feet. We know 5280 ft = 1 mile, so we multiply 30 by 5280, which gives us 158400 ft. Our rate has been converted from 30 miles per hour to 158400 feet per hour. For a rate of 30 mph, our equation for the field "traveltime" becomes "shape_leng / 158400*60", which yields the travel time in minutes. To restrict the calculation's output, we add a condition such as "where speed = 30", which applies the calculated value only to features with a value of 30 in the "speed" field. Note: your "speed" field may be named differently.

UPDATE network SET traveltime = shape_leng / 158400*60 where speed = 30;

Repeat this step for each speed value in your shapefile, for example:

UPDATE network SET traveltime = shape_leng / X*60 where speed = 45;
UPDATE network SET traveltime = shape_leng / X*60 where speed = 55;

The back end is done. Great Job!

Our next step will be visualizing our data in QGIS. Open and connect QGIS to your routing database by right-clicking "PostGIS" in the Browser Panel within the QGIS main window. Confirm the checkbox "Also list tables with no geometry" is checked to allow you to see the interior of your database more clearly. Fill out the name of your routing database and click "OK".

If done correctly, from QGIS you will have access to the tables and views created in your routing database. Feel free to visualize your network by dragging and dropping the network table into your QGIS Layers Panel. From here you can use the identify tool to select a road segment and see the source and target nodes contained within it. The node you choose will be used in the next step to create the drive-time views.

12. Create views: In this step, we create views from a function designed to determine the travel time cost. Transforming these views with tools will visualize the travel time costs as isochrones.

The command below is how you start querying your database to create drive-time isochrones. Begin in QGIS by dragging your network table into the map. The layer will show your network as vector lines. Simply select the road segment closest to the point of interest you would like to build your isochrone around, then identify it using the identify tool and locate the source and target field values.

step12

step12.2

Place the source or target field value in the command below where you see VALUE, in all caps.

This will serve as the isochrone catchment function for this workflow. Feel free to use this command repeatedly to create new isochrones by substituting the source value. Input this command and execute it in the pgAdmin SQL tool.

*AT THE BOTTOM OF THIS WORKFLOW I PROVIDED AN EXAMPLE USING SOURCE VALUE “2022”

CREATE OR REPLACE VIEW "view_name" AS 
SELECT di.seq, 
       di.id1, 
       di.id2, 
       di.cost, 
       pt.id, 
       pt.geom 
FROM pgr_drivingdistance('SELECT
     gid::integer AS id, 
     Source::integer AS source, 
     Target::integer AS target,                                    
     Traveltime::double precision AS cost 
       FROM network'::text, VALUE::bigint, 
    100000::double precision, false, false)
    di(seq, id1, id2, cost)
JOIN network_nodes pt ON di.id1 = pt.id;

step12.3

13.Visualize Isochrone: Applying tools to the view will allow us to adjust the visual aspect to a more suitable isochrone overlay.

After creating your view, a new item is created in your routing database, using the "view_name" you chose. Drag and drop this item into your QGIS Layers Panel. You will see lots of small dots, which represent the nodes.

In the figure below, I named my view "take1".

step13

Each node you see contains a drive-time value, “cost”, which represents the time used to travel from the node you input in step 12’s function.
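If you want to sanity-check the numbers before interpolating, a quick query against the view (here named "take1", as in the figure) lists the nodes closest in travel time to the starting node:

-- the "cost" column comes from pgr_drivingdistance in the view definition above
SELECT id, cost
FROM "take1"
ORDER BY cost
LIMIT 10;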

step13.2

Start by installing the QGIS plugin "Interpolation" by opening the Plugin Manager in the QGIS interface.

step13.3

Next, at the top of QGIS window select “Raster” and a drop-down will appear, select “Interpolation”.

step13.4

 

A new window pops up and asks you for input.

step13.5

Select your view as the vector layer, select "cost" as your interpolation attribute, and then click "Add".

step13.6

A new vector layer will show up at the bottom of the window; make sure the type is Points. For output, on the other half of the window, keep the interpolation method as "TIN" and edit the output file location and name. Check the box "Add result to project".

Note: decreasing the cellsize of X and Y will increase the resolution but at the cost of performance.

Click “OK” on the bottom right of the window.

step13.7

A black and white raster will appear in QGIS, also in the Layers Panel a new item was created.

step13.8

Take some time to visualize the raster by coloring and adjusting values in symbology until you are comfortable with the look.

step13.9

step13.10

14. Create contours of our isochrone: Contours can be calculated from the isochrone as well.

Near the top of the QGIS window, open the "Raster" menu drop-down and select Extraction → Contour.

step14

Fill out the appropriate interval between contour lines but leave the check box “Attribute name” unchecked. Click “OK”.

step14.2

step14.3

15. Zip and Share: Find where you saved your TIN and contours, compress them into a zip folder by highlighting them both and right-clicking to select "compress". Email the compressed folder to yourself to export it out of your virtual machine.

Example Isochrone catchment for this workflow:

CREATE OR REPLACE VIEW "2022" AS 
SELECT di.seq, Di.id1, Di.id2, Di.cost,                           
       Pt.id, Pt.geom 
FROM pgr_drivingdistance('SELECT gid::integer AS id,                                       
     Source::integer AS source, Target::integer AS target, 
     Traveltime::double precision AS cost FROM network'::text, 
     2022::bigint, 100000::double precision, false, false) 
   di(seq, id1, id2, cost) 
JOIN network_nodes pt 
ON di.id1 = pt.id;

References: Oracle VM VirtualBox, OSGeo-Live 11 amd64 ISO, Workshop FOSS4G Bonn (http://workshop.pgrouting.org/2.2.10/en/index.html).


by ckohler4692 at September 11, 2017 08:01 PM

September 02, 2017

PostGIS Development

PostGIS 2.4.0beta1 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0beta1. PostGIS 2.4.0 will be the first version to support PostgreSQL 10. Best served with PostgreSQL 10beta4 and pgRouting 2.5.0beta. See the full list of changes in the news file.

We encourage everyone to test, and in particular package maintainers, to ensure there are no surprises at final release time.

IMPORTANT NOTES

From this point forward until release of PostGIS 2.4.0, no new functions will be added.
Only bug fixes will be addressed in future 2.4.0 betas and rcs.

In order to have Mapbox Vector Tiles support enabled, you'll need to compile with protobuf support and have pkg-config available to verify the correct minimum version of protobuf-c; see protobuf for details. The ST_FrechetDistance function will not be enabled if PostGIS is compiled against a GEOS version lower than 3.7.0. GEOS 3.7.0 will be released around the same time as PostGIS 2.4.0 and will have a beta release in a week.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 02, 2017 12:00 AM

August 20, 2017

Boston GIS (Regina Obe, Leo Hsu)

geography type is not limited to earth

It's a little-known fact that the PostGIS geography type has, since PostGIS 2.2 (support was introduced in 2.1, I think, but had a bug in it until around 2.1.4), supported any spheroidal spatial reference system that measures in degrees. When an SRID is not specified, it defaults to 4326, Earth WGS 84 long lat.
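As a rough illustration of the idea (the SRID, ellipsoid parameters, and names below are made up for this sketch, not taken from the post), you can register a custom degree-based spheroidal reference system and then use it with geography directly:

-- Hypothetical example: register a made-up Mars-like long/lat SRS under SRID 990000
INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, proj4text, srtext)
VALUES (990000, 'custom', 990000,
        '+proj=longlat +a=3396190 +b=3376200 +no_defs', '');

-- geography columns can carry that SRID...
CREATE TABLE craters (name text, geog geography(Point, 990000));

-- ...and measurements use the spheroid defined above (result in meters)
SELECT ST_Distance(
    ST_GeogFromText('SRID=990000;POINT(0 0)'),
    ST_GeogFromText('SRID=990000;POINT(1 0)'));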

Continue reading "geography type is not limited to earth"

by Regina Obe (nospam@example.com) at August 20, 2017 02:50 PM

August 09, 2017

Michal Zimmermann

PostgreSQL Development History Revealed with PostgreSQL

I spend a lot of time reading the PostgreSQL docs. It occurred to me just a few weeks ago that those versioned manuals are a great opportunity to get an insight into PostgreSQL development history. Using PostgreSQL, of course.
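For context, the queries below read from a scraped table whose assumed shape is roughly as follows (only the schema-qualified name and column names appear in the original queries; the types are my guess):

-- Assumed structure of the scraped data; one row per (version, SQL command)
CREATE SCHEMA IF NOT EXISTS postgresql_development;
CREATE TABLE postgresql_development.data (
    version      text,     -- documentation version, e.g. '9.6'
    func         text,     -- SQL command name, e.g. 'CREATE TABLE'
    letter_count integer   -- character count of that command's reference page
);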

TOP 5 functions with the most verbose docs in each version

SELECT
    version,
    string_agg(func, ' | ' ORDER BY letter_count DESC)
FROM (
    SELECT
        version,
        func,
        letter_count,
        row_number() OVER (PARTITION BY version ORDER BY letter_count DESC)
    FROM postgresql_development.data
) a
WHERE row_number <= 5
GROUP BY version
ORDER BY version DESC

Seems like a huge comeback for CREATE TABLE.

VERSION | 1st | 2nd | 3rd | 4th | 5th
10.0 | CREATE TABLE | ALTER TABLE | REVOKE | GRANT | SELECT
9.6 | REVOKE | ALTER TABLE | GRANT | CREATE TABLE | SELECT
9.5 | REVOKE | ALTER TABLE | GRANT | CREATE TABLE | SELECT
9.4 | REVOKE | GRANT | ALTER TABLE | CREATE TABLE | SELECT
9.3 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
9.2 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
9.1 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
9.0 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
8.4 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | SELECT
8.3 | REVOKE | CREATE TABLE | GRANT | ALTER TABLE | COMMENT
8.2 | REVOKE | CREATE TABLE | GRANT | ALTER TABLE | SELECT
8.1 | REVOKE | CREATE TABLE | GRANT | ALTER TABLE | SELECT
8 | CREATE TABLE | REVOKE | GRANT | SELECT | ALTER TABLE
7.4 | CREATE TABLE | REVOKE | ALTER TABLE | GRANT | SELECT
7.3 | CREATE TABLE | SELECT | ALTER TABLE | REVOKE | GRANT
7.2 | CREATE TABLE | SELECT INTO | SELECT | ALTER TABLE | CREATE TYPE
7.1 | CREATE TABLE | SELECT INTO | SELECT | CREATE TYPE | ALTER TABLE
7.0 | SELECT | SELECT INTO | CREATE TYPE | CREATE TABLE | COMMENT

Number of functions available in each version

SELECT
    version,
    count(func),
    sum(letter_count)
FROM postgresql_development.data
GROUP BY version ORDER BY version;

The most verbose docs in each version

SELECT DISTINCT ON (version)
    version,
    func,
    letter_count
FROM postgresql_development.data
ORDER BY version, letter_count DESC;

Poor REVOKE, the defeated champion.

VERSION | FUNCTION | LETTER COUNT
10 | CREATE TABLE | 3142
9.6 | REVOKE | 2856
9.5 | REVOKE | 2856
9.4 | REVOKE | 2856
9.3 | REVOKE | 2856
9.2 | REVOKE | 2856
9.1 | REVOKE | 2508
9 | REVOKE | 2502
8.4 | REVOKE | 2105
8.3 | REVOKE | 1485
8.2 | REVOKE | 1527
8.1 | REVOKE | 1312
8 | CREATE TABLE | 1251
7.4 | CREATE TABLE | 1075
7.3 | CREATE TABLE | 929
7.2 | CREATE TABLE | 929
7.1 | CREATE TABLE | 871
7 | SELECT | 450

CREATE TABLE docs evolution

SELECT
    version,
    letter_count
FROM postgresql_development.data
WHERE func = 'CREATE TABLE'
ORDER BY func, version;

Something’s going on in an upcoming 10.0 version.

All the data was obtained with the following Python script and processed inside the PostgreSQL database. Plots were done with Bokeh, though I probably wouldn't use it again; the docs site is absurdly sluggish and the info is just all over the place.

by Michal Zimmermann at August 09, 2017 05:00 PM

August 06, 2017

Michal Zimmermann

PostGIS as a Mapbox Vector Tiles generator

PostGIS 2.4.0 was released recently, bringing the possibility of generating Mapbox Vector Tiles without any third-party tools. I gave it a shot with Node.js and Docker. Even if it's not as straightforward as solely using ST_AsMVT, it still looks pretty great.

Docker container

There are no Ubuntu or Debian based PostGIS 2.4.0 packages as far as I know. As installation from source (especially considering GIS software) is always a bit risky, I prefer using Docker to stay away from trouble. The image is based on Ubuntu 17.04, has PostgreSQL 9.6 and PostGIS 2.4.0 installed, and exposes port 5432 to the host, so you can access the database from outside the container.

FROM ubuntu:17.04
RUN apt update
RUN apt install -y wget less systemd
RUN touch /etc/apt/sources.list.d/pgdg.list
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ zesty-pgdg main" > /etc/apt/sources.list.d/pgdg.list
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
RUN apt update
RUN apt -y install postgresql-9.6 postgresql-server-dev-9.6

USER postgres
RUN /usr/lib/postgresql/9.6/bin/pg_ctl -D /var/lib/postgresql/9.6/main -l /tmp/logfile start

USER root
RUN echo "host all  all    0.0.0.0/0  trust" >> /etc/postgresql/9.6/main/pg_hba.conf && \
    echo "listen_addresses='*'" >> /etc/postgresql/9.6/main/postgresql.conf


EXPOSE 5432
RUN apt install -y netcat build-essential libxml2 libxml2-dev libgeos-3.5.1 libgdal-dev gdal-bin libgdal20 libgeos-dev libprotobuf-c1 libprotobuf-c-dev libprotobuf-dev protobuf-compiler protobuf-c-compiler
RUN wget http://download.osgeo.org/postgis/source/postgis-2.4.0alpha.tar.gz
RUN tar -xvzf postgis-2.4.0alpha.tar.gz
RUN cd postgis-2.4.0alpha && ./configure && make && make install

USER postgres
RUN service postgresql start && psql -c "CREATE EXTENSION postgis"

USER root
COPY start.postgis.sh /start.postgis.sh
RUN chmod 0755 /start.postgis.sh

CMD ["/start.postgis.sh"]

The start.postgis.sh file starts the database server and keeps it running forever.

#!/bin/bash

DATADIR="/var/lib/postgresql/9.6/main"
CONF="/etc/postgresql/9.6/main/postgresql.conf"
POSTGRES="/usr/lib/postgresql/9.6/bin/postgres"

su postgres sh -c "$POSTGRES -D $DATADIR -c config_file=$CONF" &
until nc -z localhost 5432;
do
    echo ...
    sleep 5
done
sleep 5 # just for sure
su - postgres -c "psql -c \"CREATE EXTENSION IF NOT EXISTS postgis\""
echo database up and running

wait $!

Data

I got a cadastre area dataset of the Czech Republic for testing, which contains ~ 13,000 polygons. The geometries should come in Web Mercator a.k.a. EPSG:3857 to work with MVT.
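The post doesn't show the reprojection step; if the source data arrives in the national CRS (Czech data is commonly in EPSG:5514), a one-off conversion along these lines would do it (the geometry type and column name here are assumptions):

-- Hypothetical sketch: reproject the cadastre polygons to Web Mercator in place
ALTER TABLE cadastre_area
    ALTER COLUMN geom TYPE geometry(MultiPolygon, 3857)
    USING ST_Transform(geom, 3857);

-- a spatial index helps the ST_Intersects filter used in the tile query below
CREATE INDEX IF NOT EXISTS cadastre_area_geom_idx
    ON cadastre_area USING GIST (geom);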

Vector tiles

I got a bit confused by the docs of ST_AsMVT and ST_AsMVTGeom. The latter one, especially, took me a few hours to get right. What is essential (I guess) about Mapbox Vector Tiles is that you have to abstract from real-world coordinates and start thinking in tile coordinates. What PostGIS does with ST_AsMVTGeom (and what any other MVT implementation should do for you) is take real-world coordinates and put them inside a tile.

To make this work, you need to know the bounding box of every tile at every zoom level in the Web Mercator projection. Or you can use the TileBBox procedure by Mapbox, if you wish.
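If you'd rather not pull in the Mapbox function, a minimal sketch of a TileBBox-style helper (my own simplification, not the official Mapbox code) could look like this:

-- Returns the EPSG:3857 bounding box of tile (z, x, y)
CREATE OR REPLACE FUNCTION tile_bbox(z integer, x integer, y integer)
RETURNS geometry AS $$
DECLARE
    half double precision := 20037508.342789244;  -- half the Web Mercator world width
    size double precision;
BEGIN
    size := (2 * half) / (2 ^ z);                 -- width of one tile at zoom z
    RETURN ST_MakeEnvelope(
        -half + x * size,        -- xmin
        half - (y + 1) * size,   -- ymin
        -half + (x + 1) * size,  -- xmax
        half - y * size,         -- ymax
        3857);
END;
$$ LANGUAGE plpgsql IMMUTABLE;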

The SQL query itself is pretty simple (this comes from an express route I’ll be discussing shortly).

SELECT ST_AsMVT('cadastre', 4096, 'geom', q)
FROM (
    SELECT
        code,
        name,
        ST_AsMVTGeom(
            geom,
            TileBBox(${req.params.z}, ${req.params.x}, ${req.params.y}, 3857),
            4096,
            0,
            false
        ) geom
    FROM cadastre_area
    WHERE ST_Intersects(geom, (SELECT ST_Transform(ST_MakeEnvelope($1, $2, $3, $4, $5), 3857)))
) q

When filled with proper arguments instead of placeholders, it returns a bytea.

\x1aa5dbd0070a047465737412e216120400000101180322d7160987913f8db38e01aa59160e2a010412012a0624060e001410420a1a00203b0a3914190e15085912010a0f0c0f06370804080a0e0e0234090e0

This can be consumed by a Leaflet map using Leaflet.VectorGrid plugin. To keep it short, the frontend code actually boils down to three lines of code.

var url = 'http://localhost:3000/mvt/{x}/{y}/{z}';
var cadastre = L.vectorGrid.protobuf(url);
map.addLayer(cadastre);

The server MVP is available as a GitHub gist.

by Michal Zimmermann at August 06, 2017 04:00 PM

August 05, 2017

PostGIS Development

PostGIS 2.4.0alpha Released

The PostGIS development team is pleased to announce the release of PostGIS 2.4.0alpha. This is the first version to support PostgreSQL 10. Best served with PostgreSQL 10beta2. See the full list of changes in the news file.

Continue Reading by clicking title hyperlink ..

by Regina Obe at August 05, 2017 12:00 AM

August 01, 2017

Boston GIS (Regina Obe, Leo Hsu)

Code Sprint in Boston after FOSS4G 2017

Reminder: Right after the Free and Open Source GIS conference in Boston is the OSGeo / LocationTech code sprint on Saturday August 19th 9AM-5PM at District Hall where project members from various Open Source Geospatial projects will be fleshing out ideas, documenting, coding, and introducing new folks to open source development. All are welcome including those who are unable to make the conference.

We are getting a final head-count this week to plan for food arrangements. If you are planning to attend, add your name to the list https://wiki.osgeo.org/wiki/FOSS4G_2017_Code_Sprint#Registered_Attendees. If you are unable to add your name to the list, feel free to send Regina an email at lr@pcorp.us with your name and projects you are interested in so I can add you to the list. Looking forward to hanging out with folks interested in PostgreSQL and PostGIS development.

District Hall is a gorgeous community space. Check out the District Hall View http://bit.ly/2f61J8c

by Regina Obe (nospam@example.com) at August 01, 2017 09:22 PM

July 31, 2017

BigSQL Holly

BigSQL + PostGIS: Reprojecting Your Spatial Data with ST_Transform

With our latest release of PostGIS (release 2.3.3), projection datum shift files have been included across all the platforms. You will find them located in: bigsql/pg96/share/postgresql/contrib/postgis-2.3/proj. Why do you care? Well, this means you can easily reproject your PostGIS data using the ST_Transform command: ST_Transform — Return a new geometry with its coordinates transformed to a different spatial reference system.

Exercise Prerequisites

Previous knowledge of projections and spatial references is assumed for this tutorial. To learn more about projections, I recommend these resources:

Previous blog posts to help you get set up with PostGIS:

What we will do in this exercise

Suppose we would like to find all the subway entrances that are within 220 meters (about 2 football fields as the crow flies) of the Empire State Building (located at 40.748855, -73.985773).

A good projection for measuring distances in NYC is based on the Transverse Mercator projection: UTM Zone 19N (srid 32619).

Unfortunately, the data made available by the MTA, Subway Entrances, has a spatial reference of WGS 84 (srid 4326). This is a good choice if you want a map which shows the whole world, but not for measuring distances within New York City.

So.. we need to:

  • import the shapefile into our PostGIS database
  • create a copy of the table and reproject it
  • select all subway entrances located within 220 meters of the Empire State Building, converted from lat, long (40.748855, -73.985773) to its location in UTM Zone 19N.

Import Shapefile to PostGIS

I’ve made it easier for you by providing the data for you to download here: subway_entrances.zip

Unzip the shapefile, navigate to the location of your subway_entrances.shp file, and run the following command:

shp2pgsql -I -s 4326 subway_entrances.shp subway_entrances | psql -U postgres -d postgis_sample

Connect to your database to make sure it was successfully imported. If using Windows, you may need to include -U postgres with the psql command:

psql 
postgres=# \connect postgis_sample;

See that subway_entrances table is in the database:

postgis_sample=# \dt

Now look at the attributes in subway_entrances table:

\d+ subway_entrances

Column | Type | Modifiers | Storage | Stats target | Description
gid | integer | not null default nextval('subway_entrances_gid_seq'::regclass) | plain | |
objectid | numeric | | main | |
url | character varying(254) | | extended | |
name | character varying(254) | | extended | |
line | character varying(254) | | extended | |
geom | geometry(Point,4326) | | main | |

Reproject subway_entrances from srid 4326 to 32619

Create a table subway_entrances_utm that is a duplicate of subway_entrances:

CREATE TABLE subway_entrances_utm AS SELECT * FROM subway_entrances;

Reproject the new table to UTM Zone 19N (srid 32619):

ALTER TABLE subway_entrances_utm
    ALTER COLUMN geom TYPE geometry(POINT,32619) 
    USING ST_Transform(geom,32619);

Select stations 220 meters from Empire State Building

Run the following query composed of these PostGIS commands:

  • ST_DWithin
  • ST_Transform
  • ST_SetSRID
  • ST_MakePoint

    SELECT name, line
    FROM subway_entrances_utm
    WHERE ST_DWithin(geom,ST_Transform(ST_SetSRID(ST_MakePoint(-73.985773,40.748855),4326),32619),220);
    

Your result should show 7 rows, including one having no value for the name field:

name               |     line
---------------------------------+---------------
 6th Ave & 34th St at SE corner  | B-D-F-M-N-Q-R
 6th Ave & 34th St at NE corner  | B-D-F-M-N-Q-R
 Broadway & 32nd St at NE corner | B-D-F-M-N-Q-R
 Broadway & 32nd St at NW corner | B-D-F-M-N-Q-R
 6th Ave & 35th St at SE corner  | B-D-F-M-N-Q-R
 6th Ave & 35th St at NE corner  | B-D-F-M-N-Q-R
                                 | B-D-F-M-N-Q-R

If you want, you can create a new table, subway_entrances_empire, that only contains these subway stations:

CREATE TABLE subway_entrances_empire AS SELECT *
        FROM subway_entrances_utm
        WHERE ST_DWithin(geom,ST_Transform(ST_SetSRID(ST_MakePoint(-73.985773,40.748855),4326),32619),220);

by Holly Orr at July 31, 2017 10:18 PM

July 29, 2017

Boston GIS (Regina Obe, Leo Hsu)

Using OSGeoLive with VirtualBox

FOSS4G 2017 is just a few weeks away. Many of the workshops will utilize OSGeoLive as a means for workshop participants to get up and running quickly with OSGeo Free and Open Source GIS tools and Boston data. OSGeoLive11 is a LUbuntu 16.04 distribution. OSGeoLive11 is going thru the final stages of prep. You can download the OSGeoLiveRC1 ISO for it from http://aiolos.survey.ntua.gr/gisvm/11.0/osgeo-live-11.0rc1-amd64.iso.

Once OSGeoLive11 is fully prepped, it will be linked on the osgeolive site http://live.osgeo.org. If you want to run OSGeoLive11 from bootable media, you can burn the ISO to DVD or thumb drive and boot. In final prep, you can expect to have a Virtual Image ready to go you can host on Virtual Box or VMWare and make further customizations. OSGeoLive11 thumb drives will be handed out at the conference.

If you are doing any PostgreSQL/ PostGIS / pgRouting / or other GIS training, OSGeoLive is pretty handy. OSGeoLive11 contains PostgreSQL 9.5 (I know slightly dated) , PostGIS 2.3.2 and cli tools, pgRouting 2.4.1, and osm2pgrouting 2.2.0. In addition it contains popular GIS Desktop friends QGIS, OpenJump, gvSig, uDig as well as power tools like GRASS, R, GDAL CLI toolkit, and Jupyter notebooks. Mapping Servers MapServer and GeoServer. We'll be using pgRouting, osm2pgRouting, PostGIS, PostgreSQL, QGIS, and OpenJump in our workshop Problem Solving with pgRouting. A good chunk of FOSS GIS relies on PostgreSQL via PostGIS so you'll find a lot of already setup PostgreSQL databases on this disk.

For this set of exercises, we're going to go thru using the ISO media linked above on a Windows 7 VirtualBox setup. If you are using any other OS (e.g. Mac OSX, Linux, Unix), instructions should be much the same.

Continue reading "Using OSGeoLive with VirtualBox"

by Regina Obe (nospam@example.com) at July 29, 2017 07:55 AM

July 24, 2017

BigSQL Holly

PostGIS 2.3.3 is Now Available with BigSQL!

The good folks on the PostGIS development team recently announced the release of PostGIS 2.3.3:

As befits a patch release, the focus is on bugs and breakages. Best served with PostgreSQL 9.6.3+ and pgRouting 2.4.1.

See the full list of changes here.

And now… the new release is available in the BigSQL distribution!

If you haven’t installed the BigSQL distribution and setup your PostGIS database start here:
PostGIS – How to create a Spatial Database with PGC Command Line

If you have been using our distribution (good for you!) you can easily upgrade to the latest release with the following command lines:

./pgc update
./pgc upgrade postgis23-pg96

And then connect to the database you want to upgrade and run the following command:

ALTER EXTENSION postgis UPDATE;

Now check to make sure your distribution has been updated:

./pgc list

You should see the following row listed for the postgis23-pg96 extension:

Category | Component | Version | Stage | ReleaseDt | Status | Cur? | Updates
Extensions | postgis23-pg96 | 2.3.3-1 | | 2017-07-13 | Installed | 1 |

The version number consists of:

  • 2.3.3 is the release number from the PostGIS development community
  • -1 is the BigSQL release where we have made improvements or fixes to our distribution

Keep a lookout for our dash releases as we make more improvements to our internal builds.

And… in fact, stay tuned for an upcoming -2 release that improves our ST_Transform and adds JSON-C (ST_GeomFromGeoJson) functionality to the mix.

by Holly Orr at July 24, 2017 11:37 PM

July 19, 2017

Michal Zimmermann

Fighting Raster GeoPackage with GDAL

As I'm still running Ubuntu 16.04 based Linux Mint, I have no access to GDAL 2.x repositories (except for ubuntugis, which I really don't like to use). Having been provided with a GeoPackage raster file recently, I had to find a way to load it into QGIS, somehow. The solution is simple: Docker with gdal_translate.

Preparing the Docker container

I like using Docker for experiments that might leave the OS in an unexpected state (which is exactly what happens to me with the ubuntugis repository whenever I use it; that's why I don't anymore). A very simple Dockerfile keeps the trouble away from you.

FROM ubuntu:17.04
RUN apt update
RUN apt install -y gdal-bin

cd into the folder and build the image with docker build -t gdal . (note the trailing dot). Once ready, summon the daemon, run the container, mount the GeoPackage file into the container directory and you're ready to rock.

docker run -v /path/to/geopackage:/home/ -it gdal

Raster GeoPackage to GeoTiff translation

With the container running, the raster GeoPackage to GeoTiff translation can be done easily with gdal_translate. Note I chose to cut the source file into tiles, because gdal_translate was choking on the resulting size.

#!/bin/bash
SIZE=10000
ULX=-630000
ULY=-1135450
LRX=-560000
LRY=-1172479
COUNTER_X=0
COUNTER_Y=0

while [[ $ULX -lt $LRX ]]
do
    while [[ $ULY -gt $LRY ]]
    do
        echo $ULX, $(($ULX+$SIZE)), $ULY, $(($ULY-$SIZE))

        gdal_translate \
            -co TILED=YES \
            -co COMPRESS=DEFLATE \
            -co TFW=YES \
            -co NUM_THREADS=ALL_CPUS \
            -a_nodata 0 \
            -of GTiff \
            -projwin $ULX, $ULY, $(($ULX+$SIZE)), $(($ULY-$SIZE)) \
            -projwin_srs EPSG:5514 \
            data/detected.gpkg data/detected_${COUNTER_X}_${COUNTER_Y}.tiff

        ULY=$(($ULY-$SIZE))
        COUNTER_Y=$((COUNTER_Y+1))
    done
    ULX=$(($ULX+$SIZE))
    ULY=-1135450
    COUNTER_X=$((COUNTER_X+1))
done

Final Touch: Raster to Vector

After the GeoTiffs are written to the hard drive, inotifywait can be used to generate overviews. And with the ease of calling gdal_polygonize.py on each of the GeoTiffs… vector layer, at your service.

by Michal Zimmermann at July 19, 2017 11:30 AM

July 12, 2017

BigSQL Holly

Setting Up LDAP with Active Directory in PostgreSQL

In the example below, we are setting up LDAP with Active Directory (AD), one of the more popular server implementations, using the BigSQL PostgreSQL 96 distribution. Available for Windows and Linux.

Download BigSQL Distribution

Linux:

python -c "$(curl -fsSL http://s3.amazonaws.com/pgcentral/install.py)" 

Windows:

@powershell -NoProfile -ExecutionPolicy unrestricted -Command "iex ((new-object net.webclient).DownloadString('http://s3.amazonaws.com/pgcentral/install.ps1'))"

Find the bigsql directory and install PostgreSQL. Windows users don’t prefix the pgc command with ./ as shown in the following commands:

cd bigsql
./pgc install pg96
./pgc start pg96

Set your environment variables:

cd <directory of installation>/pg96

Linux:

source pg96.env

Windows:

pg96-env.bat

Configure pg_hba.conf

From the PostgreSQL documentation for pg_hba.conf:

Client authentication is controlled by the pg_hba.conf file (HBA stands for host-based authentication).

The general format of the pg_hba.conf file is a set of records, one per line. A record is made up of a number of fields which are separated by spaces and/or tabs. The fields in each record specify a connection type, a client IP address range (if relevant for the connection type), a database name, a user name, and the authentication method to be used for connections matching these parameters.

Modify the LDAP connection record

There are many authentication methods in postgres (md5, trust, peer, etc). In this exercise, you will modify pg_hba.conf to use LDAP as the authentication method.

Navigate to the pg_hba.conf

cd bigsql/data/pg96/

Use your favorite editor to modify the line:

TYPE DATABASE USER CIDR-ADDRESS METHOD
host all all 127.0.0.1/32 trust

When we finish our exercise, the line will look something like this:

TYPE DATABASE USER CIDR-ADDRESS METHOD
host all all 127.0.0.1/32 ldap ldapserver=192.0.2.0 ldapprefix="" ldapsuffix="@oscg.com"

Modifying the METHOD field

The METHOD field provides connection credentials for Active Directory. There are 4 parameters you will need to create:

  1. ldap (the authentication method)
  2. ldapserver
  3. ldapprefix
  4. ldapsuffix

ldapserver

You can enter either the Active Directory server name or the ip address. For example:

ldapserver=192.0.2.0 

or

ldapserver=oscg.com

‘ldapprefix’ and ‘ldapsuffix’

Active Directory uses the following structure for login:

  • “DOMAIN_NAME\USERNAME”
  • “USERNAME@DOMAIN_NAME”

The ‘ldapprefix’ and ‘ldapsuffix’ serve as the variables for storing the DOMAIN_NAME value.

So, if the domain name for our active directory instance is OSCG, then we can provide the domain and user login credentials in one of 2 ways:

  • ldapprefix=”OSCG\” (the domain name at login – Note: the backslash is required)
  • ldapsuffix=”@oscg.com” (the domain name of the Active Directory server – Note: the @ is required)

Then, when the user (e.g. Vice Admiral Kathryn Janeway) enters her username (e.g. "kjaneway"), the domain name is prepended or appended to it, depending on which method you chose to store the DOMAIN_NAME value:

  • OSCG\kjaneway
  • kjaneway@oscg.com

Note: Use either ldapprefix or ldapsuffix, not both!

In this example, we have modified the ldapprefix parameter:

METHOD
ldap ldapserver=192.0.2.0 ldapprefix="OSCG\" ldapsuffix=""

and in this example, we have modified the ldapsuffix parameter:

METHOD
ldap ldapserver=192.0.2.0 ldapprefix="" ldapsuffix="@oscg.com"

Note: The empty quotes are required for whichever method you are not using – ldapprefix or ldapsuffix

Now, replace the parameters we used in the examples with your own Active Directory credentials:

METHOD
ldap ldapserver="IP ADDRESS OR DOMAIN NAME OF YOUR ACTIVE DIRECTORY SERVER" ldapprefix="YOUR DOMAIN_NAME" ldapsuffix=""

or

METHOD
ldap ldapserver="IP ADDRESS OR DOMAIN NAME OF YOUR ACTIVE DIRECTORY SERVER" ldapprefix="" ldapsuffix="@YOUR ACTIVE DIRECTORY DOMAIN NAME"

There are many more modes and configuration options to choose from when setting up your LDAP authentication. To learn more go here.

Create role for the AD domain user

Login to postgres database as postgres user:

psql -U postgres     

Create role on postgres that exists in active directory:

CREATE ROLE "kjaneway" WITH LOGIN;

Exit psql

\q

Reload PostgreSQL to load the new pg_hba.conf:

cd bigsql
./pgc reload pg96

Login to postgres with Active Directory/Domain User

psql -h 127.0.0.1 -d postgres -U "kjaneway" 

Now, you should be prompted to enter a password. Enter the password you use for the active directory domain:

Password for user kjaneway: 

And, voila! You should now see the following prompt giving you access to the postgres database using LDAP:

psql (9.6.3)
Type "help" for help.

postgres=>

Final test – disable and enable Active Directory

Login to your Active Directory server and disable the domain account you have been testing against:

Connecting to the postgres database with the domain user and password should now fail:

psql -h 127.0.0.1 -d postgres -U "kjaneway" 
Password for user kjaneway: 
psql: FATAL:  LDAP authentication failed for user "kjaneway"

Now, re-enable the 'kjaneway' account in Active Directory:

Connecting to the postgres database with the domain user and password should now succeed:

psql -h 127.0.0.1 -d postgres -U "kjaneway" 
Password for user kjaneway: 
psql (9.6.3)
Type "help" for help.

openscg=# 

The What and Why of LDAP: A Little Context to Help You Understand

Remember phone books? Those big heavy directories, ordered alphabetically, that you would heave onto your kitchen table and thumb through to find a friend’s phone number and address?

With the advent of the internet and email, a new virtual phonebook was in order. One that allowed you to look up an address for someone who had never sent you email, or enabled your organization to keep one centralized, up-to-date "phone book" that permitted users had access to.

The solution came in the 1980s. After 70 years of honing the art of managing phone books, the telecommunications companies answered the call (get it?) by applying the concept of directory services to information technology and computer networking. This resulted in the comprehensive X.500 specification.

In the 1990s, geeks from the University of Michigan (of course) created an alternative protocol, LDAP, the Lightweight Directory Access Protocol. As the name suggests, LDAP was lightweight compared to its predecessor X.500 and quickly became widely adopted.

Today, LDAP is a staple of enterprise IT infrastructures for authenticating and storing data about users. It is not limited to contact information; it is also used to look up encryption certificates and services on a network, and to provide "single sign-on", where one password for a user is shared between many services.

Popular commercial LDAP server implementations include:

  • Microsoft Active Directory
  • UnboundID Directory Server
  • Oracle Internet Directory
  • Oracle Unified Directory
  • IBM Security Directory Server
  • NetIQ eDirectory
  • CA Directory

Popular open source LDAP server implementations include:

  • OpenLDAP
  • ForgeRock OpenDJ
  • Apache DS
  • 389 Directory Server

And lately, there have been Cloud LDAP solutions like JumpCloud or OneLogin. But this may come at a risk.

What are you using? Have you tried out Cloud solutions? Hit reply and tell us how you use LDAP with PostgreSQL.

by Holly Orr at July 12, 2017 12:17 PM

July 03, 2017

Postgres OnLine Journal (Leo Hsu, Regina Obe)

Installing pgTap in windows with msys2 and mingw64

This weekend we spent some time moving the PostGIS/pgRouting Windows buildbot Winnie to new hardware. Leo did the hardware and I handled installing and reconfiguring stuff. While I was at it, I upgraded to the new Jenkins. Vicky Vergara has been bugging me to set up pgTap so she can run her pgRouting pgTap tests to make sure they work on Windows. She's got 22,488 tests. She just loves pgTap. Last time I tried installing pgTap I gave up, but I was in the mood for experimentation so I gave it another chance.


Continue reading "Installing pgTap in windows with msys2 and mingw64"

by Leo Hsu and Regina Obe (nospam@example.com) at July 03, 2017 04:42 PM

July 01, 2017

PostGIS Development

PostGIS 2.3.3 Released

The PostGIS development team is pleased to announce the release of PostGIS 2.3.3. As befits a patch release, the focus is on bugs and breakages. Best served with PostgreSQL 9.6.3+ and pgRouting 2.4.1.

Continue Reading by clicking title hyperlink ..

by Regina Obe at July 01, 2017 12:00 AM

June 22, 2017

BigSQL Holly

pgAdmin3 LTS Now Supports PostgreSQL 8.4 – 10

We are pleased to announce pgAdmin3 LTS is now available on OSX and Windows for PostgreSQL 10!

As v1.0 of pgAdmin4 was released in September of 2016, the pgAdmin Development Team decided to no longer support pgAdmin III with any more fixes. We believe that pgAdmin3 usage should fade out over time and have forked the code to include basic support for PostgreSQL 9.6+.

Try it out!

Install pgAdmin3 LTS by BigSQL

If you haven’t already installed PostgreSQL, start here.

Via Command Line

Windows users don’t prefix the pgc command with ./ as shown in the following commands:

cd <directory of installation>
./pgc install pgadmin3
./pgc start pgadmin3

Via GUI

If you haven’t already, install pgDevOps.

In pgDevOps, navigate to Package Manager, select the pgAdmin3 component, and install.

Already have pgAdmin3 installed?

cd <directory of installation>
./pgc update
./pgc upgrade pgadmin3

Wanna delete pgAdmin3?

cd <directory of installation>
./pgc remove pgadmin3

As always, give us your feedback so that we can continue to support and respond to the needs of the PostgreSQL community.

by Holly Orr at June 22, 2017 06:15 PM

June 19, 2017

BigSQL Holly

GDAL New Release 2.2 – BigSQL Makes it Easy to Install

The GDAL community just announced the release of GDAL 2.2… and BigSQL just made it easier to implement!

Install it Now

We support Diegos and Yodas with 2 ways to install:

Option 1: Install with the pgDevOps GUI

For GIS analysts (scientists, statisticians, analysts, etc) who find themselves as the de facto DBA (among other things).

“I don’t have time for this! I’m saving this sea turtle. Just give me a GUI!”

1. Go to www.bigsql.org/postgresql/installers and download and run the installer for your operating system.

2. Make sure to check the box for including the pgDevOps component with your install.

3. Open pgDevOps in your web browser:

    <ip of machine>:8051

If installing on local machine:

    localhost:8051  
4. In the pgDevOps dashboard click the Package Manager icon.

5. Click the PostgreSQL installation you want to install on.

6. Click on the PostGIS icon and follow the instructions to install.

That’s it!

Option 2: Install via Command Line and PGC

For DBAs who find themselves supporting these strange spatially enabled databases.

“Real men only use command line! And what the hell is GIS?”

  1. Install pgc via command line to a sandbox.
    *Note: this command will install the BigSQL distribution into the directory you are currently located.

    MAC / Linux:

    python -c "$(curl -fsSL http://s3.amazonaws.com/pgcentral/install.py)" 
    

    Windows:

    @powershell -NoProfile -ExecutionPolicy unrestricted -Command "iex ((new-object net.webclient).DownloadString('http://s3.amazonaws.com/pgcentral/install.ps1'))"
    
  2. Get into the product home directory by navigating via command line to the bigsql directory. Windows users don’t prefix the pgc command with ./ as shown in the below examples:

    cd bigsql
    
  3. Using the pgc command line tool, run the update command to get the latest distribution:

    ./pgc update
    
  4. Then run the install and start commands on pg96:

    ./pgc install pg96 
    ./pgc start pg96
    
  5. Next, install postgis:

    ./pgc install postgis23-pg96
    

That’s it!

The GDAL_DATA Directory Path

In order for all your PostGIS extensions to properly work, you will need to set the GDAL_DATA environment variable to the location of the directory gdal.

Lucky for you, with the BigSQL distribution this is as easy as running the following commands via a command line tool.

OSX:

 cd <directory of installation>/pg96
 source pg96.env

Windows:

 cd <directory of installation>\pg96
 pg96-env

Both of these commands set environment variables that will live during your current session. Note that if you close your terminal or command prompt, these variables are removed from memory.

It's also possible to set GDAL_DATA as a persistent environment variable, but that is beyond the scope of this tutorial.

To check to see what version of GDAL you have installed:

gdalinfo --version

WTH is GDAL?

In 1998, Frank Warmerdam began developing GDAL, which stands for Geospatial Data Abstraction Library. The official definition, found on gdal.org, is as follows:

GDAL is a translator library for raster and vector geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single raster abstract data model and single vector abstract data model to the calling application for all supported formats. It also comes with a variety of useful command line utilities for data translation and processing.

Clear as mud, right? Let me refer to a better definition written by Planet Labs' Robert Simmon in the post, A Gentle Introduction to GDAL, Part 1. I highly recommend reading the entire article:

Even the description is complicated. Put another way, GDAL (pronounced ‘gee-dal’ not ‘goodle’ (too close to Google!)) is a collection of software that helps with the translation of data from different file formats, data types, and map projections. It’s used in free software like QGIS and GRASS, and even some commercial applications like ESRI ArcGIS and Avenza Geographic Imager/MAPublisher.

WTH is OGR?

To make things even a bit more confusing, OGR (which I discussed in an earlier post) is part of GDAL:

In theory it is separate from GDAL, but currently they reside in the same source tree and are somewhat entangled.

So what’s the difference between GDAL and OGR?

The GDAL library is historically for raster data extract, transform, load (ETL) processes, while OGR supports vector data. Sometimes people will interchange the names GDAL, OGR, and GDAL/OGR to refer to the same thing. I know… geeks and their esoteric acronyms, right?

Regardless of the complexities, I can’t overemphasize how essential it is to add GDAL/OGR libraries to your spatial data management toolkit!

by Holly Orr at June 19, 2017 04:44 PM

June 17, 2017

Postgres OnLine Journal (Leo Hsu, Regina Obe)

Dollar-quoting for escaping single quotes

PostgreSQL has a feature called dollar-quoting, which allows you to include a body of text without escaping the single quotes. This feature has existed for quite some time. You've probably seen this in action when defining functions for example:

CREATE OR REPLACE FUNCTION hello_world(param_your_name text)
RETURNS text AS
$$
SELECT 'Hello world. My name is ' || param_your_name || '.';
$$
language sql STRICT;

Which is easier to read than the equivalent escape-quoted function:


CREATE OR REPLACE FUNCTION hello_world(param_your_name text)
RETURNS text AS
'
SELECT ''Hello world. My name is '' || param_your_name || ''.'';
'
language sql STRICT;
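Both forms define the same function; a quick sanity check:

SELECT hello_world('world');
-- Hello world. My name is world.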

Continue reading "Dollar-quoting for escaping single quotes"

by Leo Hsu and Regina Obe (nospam@example.com) at June 17, 2017 11:54 PM

June 09, 2017

Bill Dollins

Refreshing a PostGIS Materialized View in FME

I am following up my previous post with an extremely simple example using FME to kick off the refresh of a materialized view (matview) after a data import. I had never used FME prior to coming to Spatial Networks, but now I’m hooked. I’m having a really hard time finding things it can’t do.

As I mentioned in my last post, it’s really easy to refresh a matview in PostgreSQL using the REFRESH MATERIALIZED VIEW statement. This leaves open the possibility of automating the refresh as appropriate in an application or other process.

I decided to illustrate this using a basic FME example. Using the cellular tower data set from my past post, I extracted a table containing only the records for the state of Maryland. The towers data set contains the two letter abbreviation for the state, but not the full state name. So, I built a matview to join the state name to a subset of columns from the towers data set. The SQL for that matview is here:

View the code on Gist.
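The gist itself isn't reproduced in this excerpt; as a purely hypothetical sketch (the table and column names below are assumptions, not taken from the post), such a matview might look roughly like:

-- Sketch only; the actual definition is in the gist linked above.
-- Assumes a Maryland towers table with a two-letter "state" column and a
-- "states" lookup table carrying the full state name.
CREATE MATERIALIZED VIEW mvw_towers_named AS
SELECT t.gid,
       t.callsign,
       t.state,
       s.state_name,
       t.geom
FROM towers_md t
LEFT JOIN states s ON s.state_abbr = t.state;

CREATE INDEX ON mvw_towers_named USING GIST (geom);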

I will use FME to append the records for the state of Virginia from a GeoJSON file to the PostGIS table containing the records for Maryland.

Disclaimer: This example wastes the power of FME. If this were all I needed to do, I’d probably just use OGR. I kept it simple for this post.

Here is a screenshot of my import process from FME workbench:

As can be seen, 656 records from the GeoJSON file will be appended to the PostGIS table.

To make FME run the REFRESH statement after the import, I just put the SQL into the advanced parameters for the PostGIS writer.

For demonstration purposes, I’ll run the import without the REFRESH statement. This will result in the source table having a different record count than the matview, as shown here.

Now, I’ll clear out the Virginia records from the base table, add the REFRESH statement back into FME, and rerun the process. This time, the table and the matview are in sync, as seen here.

This basic example illustrates one way to automate the refresh of matviews as part of a data update process. Whether using FME, GeoKettle, or your own application code, it’s very easy to keep matviews in sync.

It’s important to remember however, that every refresh is writing data to a new table. Care should be taken to tune how often matviews are refreshed to ensure that performance doesn’t suffer. They aren’t appropriate for high-velocity data sets, but can work nicely for data that is updated less frequently.

by Bill Dollins at June 09, 2017 06:04 PM

June 06, 2017

Bill Dollins

Working with Materialized Views in PostGIS

It’s been a few months since I’ve posted, owing mainly to getting my feet under me at Spatial Networks. About a month after I started, the company re-merged with Fulcrum, which had previously been spun off as a separate company. As a result, I’ve gotten to know the Fulcrum engineering team and have gotten to peer under the hood of the product.

Of course, Spatial Networks is also a data company. What had originally attracted me was the opportunity to help streamline the delivery of their data products, and this remains a pressing issue. This has kept me elbow-deep in PostGIS, and has led me to delve into using materialized views more than I have before.

What is a materialized view? If you are familiar with relational databases, then you are familiar with views, which are saved queries that are stored in the database. Similar to tables, you can select data from a view; but, rather than directly selecting physically stored data, you are executing the SQL that defines the view, which will stitch together data at the time of execution.

A materialized view is a useful hybrid of a table and a view. It is technically a table, because it is physically stored on disk, but it is generated from a SQL statement like a view. This can be useful for increasing performance because costly joins and functions (ahem, spatial) are not executed every time the data is accessed. The source SQL is executed and the result written to disk. As a result, the data can be indexed and queries can be executed quickly. It could be reasonably thought of as a more streamlined SELECT INTO.

So why use materialized views? If properly designed, a relational database (spatial or otherwise) is built using normalization to optimize storage, reduce data redundancy, provide constraints and enhance data quality. This is excellent for storage, but not always ideal for analytics, which is why views exist.

A materialized view provides the ability to prepare a persisted version of data that is better suited for analysis and/or human readability. In a spatial database such as PostGIS, it also provides the ability to pre-process spatial analysis to enhance database and application performance. That is what got me interested in them.

From here, it’s probably best to discuss materialized views in terms of a scenario. I’ll discuss a fairly straightforward example, data binning, which touches upon the advantages of materialized views for me.

I’ll work with four data sets:

  1. An extract of cellular towers from the FCC downloaded from ArcGIS Online
  2. US counties extracted from GADM
  3. 0.5-degree rectangular grid created in QGIS
  4. 0.5-degree hexagonal grid created in QGIS

I imported all the above into PostGIS.

My goal is to calculate the number of cellular towers in each grid square, hexagon, and county. I’ll use a fairly simple bit of SQL to do that.

View the code on Gist.
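
Again, the query itself is in the embedded Gist. A rough sketch of such a binning query, assuming hypothetical hex_grid and cellular_towers tables that each carry id and geom columns, might be:

SELECT h.id AS hex_id,
       count(t.id) AS tower_count
FROM hex_grid h
JOIN cellular_towers t
  ON ST_Contains(h.geom, t.geom)   -- count only the towers falling inside each hexagon
GROUP BY h.id;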

The SQL above generates a count of the towers that fall within each hex, as governed by the st_contains function and the GROUP BY clause. In a traditional view, this spatial query would be executed every time I issued a SELECT statement against the view.

You will notice that I don’t return the hexagon geometry. I will create a second view that joins the geometry to the results of this query. I did that strictly to keep the SQL clean for this post, but it is perfectly fine to get all of the data in one step. The SQL that joins the hexagons is:

View the code on Gist.
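
A sketch of that second view, assuming the previous query was saved as a view named vw_cellular_count_hex (hypothetical names throughout):

CREATE VIEW vw_cellular_count_geom_hex AS
SELECT h.id,
       h.geom,
       coalesce(c.tower_count, 0) AS tower_count
FROM hex_grid h
LEFT JOIN vw_cellular_count_hex c   -- the left join keeps hexagons that contain no towers
  ON c.hex_id = h.id;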

Notice the left join so that all of the hexagons are returned regardless of whether or not they contain towers. I included the full CREATE statement this time to highlight the contrast with the next example.

It’s now time to create the materialized view, which calls upon the view created in the previous step. This is the SQL to create the materialized view:

View the code on Gist.
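
A sketch of the materialized view, using the name that appears in the REFRESH statement below, and assuming the uuid-ossp extension for the UUID column discussed further down:

CREATE MATERIALIZED VIEW mvw_cellular_count_geom_hex AS
SELECT uuid_generate_v4() AS uuid,   -- assumes CREATE EXTENSION "uuid-ossp"; used as the unique id in QGIS
       v.id,
       v.geom,
       v.tower_count
FROM vw_cellular_count_geom_hex v;   -- hypothetical view name from the previous step

CREATE INDEX mvw_cellular_count_geom_hex_gix
  ON mvw_cellular_count_geom_hex USING GIST (geom);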

Note the creation of the spatial index. This is important because a materialized view creates a new data set, so the data is divorced from any indices in the source data.

The SQL above physically writes the data to the database, similar to a SELECT INTO statement. Unlike that approach however, we can easily refresh the data by issuing the following command:

REFRESH MATERIALIZED VIEW mvw_cellular_count_geom_hex;

This enables a user or application to update the stored data whenever the underlying source data changes.

Notice in the SQL above, I am calculating a UUID column. This is being done to aid visualization in QGIS. To add a view of any kind to a QGIS map, you need to tell it what the unique identifier is for the data set. I tried numerous different approaches, but found that using a UUID worked most consistently. I will keep exploring this. The binned hex grid layer (with a bit of styling) looks like this in QGIS:

I mentioned earlier that materialized views can help performance. This really becomes apparent with complex geometries. There is minimal difference between the standard view and the materialized view when working with squares or hexes. But, to produce the county map shown at the top of this post, the standard view took 4.2 seconds to run on a Linux machine with a quad-core CPU, an SSD, and 64GB of RAM. The materialized view returned in 292 milliseconds. This is where not having to re-run spatial queries against the detailed GADM polygons really pays off.

Materialized views are not a panacea. If you have rapidly updating data, the refresh process will probably introduce too much latency. You are also storing data, such as geometries, twice. But, depending on your needs, the benefits may outstrip the drawbacks of using materialized views in spatial applications.

by Bill Dollins at June 06, 2017 09:38 PM

Oslandia

OSGeo CS 2016 report : PostGIS

The annual OSGeo Codesprint took place from the 23rd to the 26th of February in Paris, at the Mozilla Foundation’s offices. Here is a summary of some of the achievements of the PostGIS team during this event.

PostGIS

A big team of PostGIS developers worked together during the sprint on numerous issues.

New functions and dev guide

Remi Cura from IGN had prepared a full list of improvements to existing PostGIS functions and new features to develop. Together with Marc Ducobu (Champs Libres), they managed to go all the way from simple PostGIS users to PostGIS developers. They dived into the code and managed to write clean patches for functions and documentation.

They documented their work and created a step-by-step guide for new PostGIS contributors.

New clustering features


New features have been introduced for PostGIS 2.3, among them clustering methods: ST_ClusterKMeans() and ST_ClusterDBSCAN() by Paul Ramsey and Dan Baston. These are really nice window functions for PostGIS!
On a side note, the documentation for PostGIS window functions has been improved too (R. Obe).
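
For illustration only, both clustering methods are window functions taking a geometry column; a minimal sketch against a hypothetical buildings table:

SELECT id,
       ST_ClusterKMeans(geom, 5) OVER () AS kmeans_cid,       -- 5 clusters
       ST_ClusterDBSCAN(geom, 100, 5) OVER () AS dbscan_cid   -- eps = 100, minpoints = 5
FROM buildings;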

OGR ODBC FDW

Regina Obe managed to travel to Paris, and she put effort into analyzing why the ODBC performance of the ogr_fdw PostGIS foreign data wrapper was not as good as expected. This should lead to performance improvements later on. She also worked on problems encountered when restoring PostGIS data (especially rasters), related to how the search_path is handled. There is still some work to do experimenting with EVENT TRIGGERS, but the issue is well on its way to being solved.

3D in PostGIS : SFCGAL

 

Our team at Oslandia has been working on SFCGAL, the 3D library behind PostGIS 3D features. Vincent Mora and Hugo Mercier teamed with Mickael Borne (IGN) and Sébastien Loriot (GeometryFactory) to break things and rebuild them, better. They focused mainly on performance:

  • different computation and precision kernels in CGAL, to use them efficiently and be much faster
  • set a flag for valid geometries, so as not to re-test for validity in every operation
  • lower serialization overhead when passing data between PostGIS, CGAL and SFCGAL

This effort led to a significant improvement in speed. Preliminary tests on 3D intersection showed a drop from 15 seconds down to 100 milliseconds, which is an impressive result.
A lot of refactoring of the code remains to be done, and this work has also started, to simplify and ease the use of SFCGAL.

New indexes for big data

Another significant contribution is the work on BRIN indexes for PostGIS. At PGConf Europe, we had already spoken with the team at Dalibo about the new index type in PostgreSQL 9.5: BRIN use cases and implementation in a GIS context. Some time before the OSGeo code sprint, we realized that Giuseppe Broccolo from 2ndQuadrant had started a prototype implementation, so we invited him to join. Giuseppe teamed with Ronan Dunklau and Julien Rouhaud from Dalibo, and together they managed to produce a working implementation of this new type of index for PostGIS.

Having the opportunity for PostgreSQL developers to meet in this geospatial environment was the ideal way to get things done efficiently.

PostGIS BRIN will be detailed after some code consolidation and benchmarks, but here is the idea. BRIN indexes are block range indexes: they deal with data blocks (a given number of pages), not individual rows. You can set the number of pages per block. This makes the indexes much faster to build, and a lot smaller than classic indexes (GiST for PostGIS). Of course, the tradeoff is that they are slower to query. And since they group data in blocks, they lose their efficiency if the data is not clustered on disk.

Some of our use cases, with tables full of billions of points (using PgPointCloud), are a good match for BRIN indexes: the data is naturally clustered spatially (patches of points are loaded sequentially), and sometimes the data is too big for a table’s index to fit entirely in memory. E.g. we have a 3TB table, resulting in a 20GB GiST index.

A BRIN index can be a really good compromise for this: preliminary tests on a medium-size dataset show a performance degradation of 5x in querying, while the index is 1000x smaller. And this tradeoff is adjustable, according to your dataset size and hardware configuration.
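
As a rough sketch of what this looks like, assuming PostGIS 2.3+ where the default 2D BRIN operator class is picked up automatically, and a hypothetical points table:

CREATE INDEX points_geom_brin
  ON points USING BRIN (geom)
  WITH (pages_per_range = 128);   -- 128 pages per block range is the default; tune it for your data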

And more

Other topics which have been worked on (mainly by Paul, see his blog post):

  • Expanded object header, to avoid serialization / deserialization steps
  • A nasty upgrade bug (2.2 blocker)

You will find more details in the following reports :

  • Paul Ramsey – http://blog.cleverelephant.ca/2016/03/paris-code-sprint-postgis-recap.html
  • Giuseppe Broccolo – http://blog.2ndquadrant.com/brin-postgis-codesprint2016-paris/
  • Regina Obe (PostgresOnLine) – http://www.postgresonline.com/journal/archives/363-Paris-OSGeo-Code-Sprint-2016-Highlights.html
  • Regina Obe (BostonGIS) – http://www.bostongis.com/blog/index.php?/archives/252-Paris-OSGeo-Code-Sprint-2016-PostGIS-Highlights.html

Thanks to the PostGIS team for such hard work and for their reports!

Figure credits: OpenStreetMap’s Soup, OpenScienceMap, Wikipedia

 

by oslandia at June 06, 2017 09:54 AM

June 05, 2017

PostGIS Development

Causes for 'postgis.backend' is already set

The error ‘postgis.backend’ is already set comes up every so often on the PostGIS mailing list. The issue often arises during or after an upgrade. I’ll go over the causes I’m aware of and how to fix them.

The question goes something like this

After upgrading to PostGIS 2.3 from 2.1, my server log is filled with these messages:

“WARNING ‘postgis.backend’ is already set and cannot be changed until you reconnect”

Continue Reading by clicking title hyperlink ..

by Regina Obe at June 05, 2017 12:00 AM

May 25, 2017

BigSQL Holly

Connecting to Remote Spatial Data Sources with ogr_fdw

Shout out goes to Regina Obe and Paul Ramsey for their work and documentation on ogr_fdw. The ogr_fdw project can be found here: https://github.com/pramsey/pgsql-ogr-fdw.

ogr+fdw = ogr_fdw

In an earlier post, I showed you how to use the ogr2ogr (or as my geogeek friends like to call it “ogre to ogre”) command line tool to import features in different data formats to Postgres (e.g. shapefiles or geojson).

Foreign Data Wrappers (FDWs) allow you to connect to remote data sources from within Postgres. From there you can query them with SQL, join across disparate data sets, or join across different systems. There are FDW implementations to connect to MySQL, Oracle, SQLite, as well as flat files.

ogr_fdw allows you to connect your PostGIS database directly to an existing GIS file or database and read from it without importing the data.

It’s like getting a hug from an ogre!

Exercise: Intro to ogr_fdw

Exercise Prerequisites

ogr_fdw is packaged with BigSQL’s PostgreSQL installers. If you have not installed the BigSQL PostgreSQL distribution and setup your PostGIS database, start here.

The GDAL_DATA Directory Path

In order for ogr_fdw to work properly, you will need to set the GDAL_DATA environment variable to the location of the gdal data directory (if you haven’t already).

Lucky for you, with the BigSQL distribution this is as easy as running the following commands via a command line tool.

Linux / OSX:

 cd <directory of installation>/pg96
 source pg96.env

Windows:

 cd <directory of installation>\pg96
 pg96-env

Both of these commands set environment variables that will live during your current session. Note that if you close your terminal or command prompt, these variables are removed from memory.

It’s also possible to set GDAL_DATA as a persistent environment variable, but that is beyond the scope of this tutorial.

Download the data

For this exercise, I downloaded the NYC Subway Lines shapefile available at the New York City Open Data site.

To download this data click here: nyc_subway_lines shapefile

ogr_fdw_info Commands

Begin by running the ogr_fdw_info command to show a list of supported formats:

ogr_fdw_info -f

Supported Formats:
  -> "PCIDSK" (read/write)
  -> "netCDF" (read/write)
  ...
  -> "HTTP" (readonly)

Navigate to the nyc_subway_lines directory you downloaded and run the following command to see the layers in the directory:

ogr_fdw_info -s <nyc_subway_lines directory>
    example OSX/Linux: ogr_fdw_info -s /Users/hollyorr/gis_data/nyc_subway_lines
    example Windows: ogr_fdw_info -s C:\gis_data\nyc_subway_lines 

Layers:
  nyc_subway

Now use the following command to read an OGR data source and output a server and table definition for this particular layer:

ogr_fdw_info -s <nyc_subway_lines directory> -l nyc_subway
    example OSX: ogr_fdw_info -s /Users/hollyorr/gis_data/nyc_subway_lines -l nyc_subway
    example Windows: ogr_fdw_info -s C:\gis_data\nyc_subway_lines -l nyc_subway

Output:

CREATE SERVER myserver
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'nyc_subway_lines',
    format 'ESRI Shapefile' );

CREATE FOREIGN TABLE nyc_subway (
  fid bigint,
  geom Geometry(LineString,4326),
  objectid real,
  shape_len real,
  url varchar,
  name varchar,
  rt_symbol varchar,
  id real
) SERVER myserver
OPTIONS (layer 'nyc_subway');

You can run the commands generated as output in pgAdmin or on the command line. In this exercise, we will run the SQL statements on the command line with psql.

Connect to psql:

Linux / OSX:

 cd <directory of installation>/pg96
 source pg96.env    

Windows:

 cd <directory of installation>\pg96\bin

Open psql and connect to postgis database:

$ psql
postgres=# \connect postgis_sample
    You are now connected to database "postgis_sample" as user "postgres".

Create the remote server and foreign table:

CREATE SERVER myserver FOREIGN DATA WRAPPER ogr_fdw OPTIONS (datasource '<location of nyc_subway_lines directory>', format 'ESRI Shapefile');

CREATE FOREIGN TABLE nyc_subway (fid bigint, geom Geometry(LineString,4326), objectid real, shape_len real, url varchar, name varchar, rt_symbol varchar, id real) SERVER myserver OPTIONS (layer 'nyc_subway');

Now open pgDevOps by BigSQL and switch to the pgAdmin4 Web view.

Check to see that your remote server was successfully created:

In the pgAdmin4 browser, navigate to:

Servers >> pg96 >> Databases >> postgis_sample >> Foreign Data Wrappers >> ogr_fdw >> Foreign Servers >> myserver

Check to see that your foreign table was successfully created:

In the pgAdmin4 browser, navigate to:

Servers >> pg96 >> Databases >> postgis_sample >> Schemas >> public >> Foreign Tables >> nyc_subway

Finally, query the nyc_subway table to see your remote data served through ogr_fdw:

SELECT * FROM public.nyc_subway LIMIT 100;
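
Since the foreign table behaves like any local table, PostGIS functions can be applied to it directly; for example, a sketch using the geom column defined above:

SELECT name,
       round(ST_Length(geom::geography)) AS length_m   -- line length in meters
FROM public.nyc_subway
ORDER BY length_m DESC
LIMIT 10;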

by Holly Orr at May 25, 2017 07:39 PM

May 21, 2017

Michal Zimmermann

Mapping North America with QGIS: Tips and Tricks

Recently I bought a book called Maps by Aleksandra Mizielinska and Daniel Mizielinski for my nephew. The book’s absolutely wonderful and made me want to try crafting a map with similar looks. I don’t do maps much at CleverMaps, so this was a great opportunity to find out what new features became available during the last months of QGIS development.

Result

A map of North America at a scale of 1:22,000,000 featuring the biggest lakes, rivers, mountain ranges and basic administrative units for the North American countries. I aimed for a visually appealing overview map rather than a perfectly correct topographic one.

Data

I used my beloved Natural Earth dataset for both cultural (boundaries, cities) and physical (rivers, lakes) map features. Different Natural Earth scales came into play for different map layers, as some seemed either too simplified or not simplified enough for the target scale.

Fonts

I usually use built-in system fonts (Ubuntu Condensed or such), but this kind of map needed a more handwritten looking, sort of childish font. After searching dafont.com I chose PreCursive by RaseOne Full Time Artists and KG Primary Penmanship by Kimberly Geswein.

Symbols

The mountain point symbol was one of the two custom symbols used on the map. It comes from BSGStudio. The ocean wave symbol was made by myself.

QGIS effects

I’ve used several techniques I find interesting enough to be listed here.

Coastlines

For a long time I’ve considered coastlines a field for cartographic invention. They can be emphasized by shading or 3D effects. I chose the set of four parallel coastlines subtly disappearing into the sea, hopefully invoking the feeling of waves coming to the shore.

It’s done by dissolving all the features and buffering them again and again.
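
The same effect could also be scripted on the PostGIS side rather than in QGIS; a rough sketch, assuming a hypothetical land table with polygons in a projected (metric) CRS:

-- dissolve all land polygons once, then buffer the result in growing steps
WITH land_union AS (
  SELECT ST_Union(geom) AS geom FROM land
)
SELECT n AS ring,
       ST_Buffer(geom, n * 20000) AS geom   -- four rings, 20 km apart (assumed step size)
FROM land_union, generate_series(1, 4) AS n;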

Buffered labels

Buffered labels are usually hard to get right, because they fill up so much space if the buffer color doesn’t correspond to its surroundings. But choosing the proper color can be a real struggle at times.

On this map, almost all the labels are buffered with the color of their surroundings, which makes them more legible, yet not too expressive. This is possible thanks to QGIS expression-based properties that let you define unique styling for different map features.

Where it isn’t possible to choose just one buffer color (e.g. the Bahamas or Honduras), the label is not buffered at all (or a semi-transparent white buffer is used).

Note the Rocky Mountains label is split at the border between the U.S.A. and Canada, and both of its parts match the background color.

Tapered rivers

Rivers are tapered based on the Natural Earth’s width attribute value for each river segment.

Labels in separate layers

I’m used to putting labels into separate layers in more complicated map compositions, especially when you need to draw labels along a path for areal features (such as countries or states).

It becomes a bit harder to keep the features in sync with the labels, though. I’d like to use only one layer for all the map labels in the future, as I feel that’s the way to go for the best labeling.

Labels wrapped on character

Some labels just can’t fit the feature they belong to and QGIS lets you deal with this by wrapping labels on a special character, \ in my case.

Layer blending mode

The mechanics behind layer blending modes are still a mystery to me, but they can add that little extra to a map very easily. Thanks to the Overlay blending mode, the Rocky Mountains may remain very subtle on different kinds of background.

by Michal Zimmermann at May 21, 2017 01:30 PM

Boston GIS (Regina Obe, Leo Hsu)

PostGIS 2.4.0, Code Sprints and other extensions to try with PostgreSQL 10 beta1

So PostgreSQL 10beta1 came out recently, as Holly mentioned. When the first beta hits, things start getting serious for me. I have to make sure that PostGIS compiles against said distribution so that eager testers aren't held back.

As with other releases, PostGIS didn't compile against the new PostgreSQL version without some nurturing. We've still got one regress failure, but at least PostGIS 2.4 now compiles cleanly against PostgreSQL 10beta1. I'm hoping that we can release PostGIS 2.4.0 just in time for PostgreSQL 10 planned release in September so I don't have to backport PostgreSQL 10 patches I made to lower PostGIS versions.

For PostGIS 2.4 the main focus will be cleaning up the parallel work so that all aggregate functions can enjoy use of parallel optimization. This is even more important with PostgreSQL 10 now that more kinds of queries can benefit from parallelization work. I'm also hoping to focus more energy on the raster side of things.

Continue reading "PostGIS 2.4.0, Code Sprints and other extensions to try with PostgreSQL 10 beta1"

by Regina Obe (nospam@example.com) at May 21, 2017 07:37 AM

April 24, 2017

BigSQL Holly

ETL (Extract, Transform, Load) PostGIS Data with ogr2ogr

In an earlier post, I showed you how to use the shp2pgsql tool to import shapefiles into PostGIS. But what if you need to import other geodata formats (e.g. GeoJSON, MapInfo, KML, CSV, etc.)?

ogr2ogr – ETL for PostGIS


Lucky for you, the BigSQL PostGIS distribution comes bundled with GDAL and the ogr2ogr (or as my geogeek friends like to call it “ogre to ogre”) command line tool.

From gdal.org/ogr2ogr:

This program can be used to convert simple features data between file formats performing various operations during the process such as spatial or attribute selections, reducing the set of attributes, setting the output coordinate system or even reprojecting the features during translation.

In other words, ogr2ogr is your go-to ETL tool for importing and exporting PostGIS data.

Exercise: Intro to ogr2ogr

In this exercise, we will import and export GeoJSON files to and from our postgis_sample database using ogr2ogr commands.

Exercise Prerequisites

The ogr2ogr command line tool is packaged with BigSQL’s PostgreSQL installers. If you have not installed the BigSQL PostgreSQL distribution and setup your PostGIS database, start here.

GeoJSON

GeoJSON is an open standard format, an extension of JSON, designed for representing simple geographical features along with their non-spatial attributes.

Example:

    {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [125.6, 10.1]
        },
        "properties": {
            "name": "Dinagat Islands"
        }
    }

Getting to know ogr2ogr Command Line

Open your command line utility and enter the this help command to see ogr2ogr parameters and usage:

ogr2ogr --long-usage

You should see the following parameter and under it a list of all the data formats your installation of ogr2ogr supports:

-f format_name

Make sure that these two formats are included in the list:

-f "GeoJSON"
-f "PostgreSQL"

Import GeoJSON to PostGIS

For this exercise, I downloaded the NYC Wi-Fi Hotspot Locations GeoJSON file available at the New York City Open Data site.

To download this data click here: nyc_hotspots.geojson

To import a GeoJSON file into your PostGIS database, use the following command:

ogr2ogr -f "PostgreSQL" PG:"dbname=my_database user=postgres" "source_data.json" -nln new_table_name

If you want to append the new data to already existing records, use the -append flag:

ogr2ogr -f "PostgreSQL" PG:"dbname=my_database user=postgres" "source_data.geojson" -nln destination_table -append

So, for our exercise run the following commands:

cd <location of nyc_hotspots.geojson>
ogr2ogr -f "PostgreSQL" PG:"dbname=postgis_sample user=postgres" "nyc_hotspots.geojson" -nln nyc_hotspots

Now, run the following SQL query on your postgis_sample database to see the table’s attributes:

select * from public.nyc_hotspots;

Export from PostGIS to GeoJSON

To generate a GeoJSON file from one of your PostGIS tables, run the following command:

ogr2ogr -f "GeoJson" out.geojson PG:"host=localhost dbname=postgis_sample user=postgres password=<your_password>" \  -sql "SELECT * from nyc_hotspots"

*Note: Although you can use ogr2ogr to import and export shapefile format to and from PostGIS, it is recommended you use the shp2pgsql tool I discussed in an earlier post.

by Holly Orr at April 24, 2017 12:37 PM

April 20, 2017

Michal Zimmermann

Routing with GRASS GIS: Catchment Area Calculation

I got my hands on pgRouting in the last post and I’m about to do the same with GRASS GIS in this one.

GRASS GIS stores the topology for its native vector format by default, which makes it easy to use for network analysis. All the commands associated with network analysis can be found in the v.net family. The ones I’m going to discuss in this post are v.net itself, v.net.path, v.net.alloc and v.net.iso.

Data

I’m going to use the roads data from the previous post together with some random points used as catchment area centers.

# create the new GRASS GIS location
grass -text -c ./osm/czech

# import the roads
v.in.ogr input="PG:host=localhost dbname=pgrouting" layer=cze.roads output=roads -eo  --overwrite

# import the random points
v.in.ogr input="PG:host=localhost dbname=pgrouting" layer=temp.points output=points -eo --overwrite

I got six different points and a pretty dense road network. Note that none of the points is connected to the existing network.

You have to have a routable network to do the actual routing (the worst sentence ever written). To get one, let’s:

  • connect the random points to the network
  • add nodes to ends and intersections of the roads

Note I’m using 500 m as the maximum distance within which to connect the points to the network.

v.net input=roads points=points operation=connect threshold=500 output=network
v.net input=network output=network_noded operation=nodes

Finding the shortest path between two points

Once the network is routable, it is easy to find the shortest path between points number 1 and 4 and store it in the new map.

echo "1 1 4" | v.net.path input=network_noded output=path_1_4

The algorithm doesn’t take bridges, tunnels and oneways into account here, though it’s capable of doing so.

Distributing the subnets for nearest centers

v.net.alloc input=network_noded output=network_alloc center_cats=1-6 node_layer=2

The v.net.alloc module takes the given centers and splits the network so that each of its parts belongs to exactly one center - the nearest one (in terms of distance, time units, …).

Creating catchment areas

v.net.iso input=network_noded output=network_iso center_cats=1-6 costs=1000,3000,5000

v.net.iso splits the network by cost isolines. Again, the costs might be specified as lengths, time units, …

Two different ways lead to the actual catchment area creation. First, you extract the nodes from the roads together with their cost values, turn them into a raster grid, and either extract contours or polygonize the raster. I find the last step suboptimal and would love to find another way of polygonizing the results.

Note that when extracting contours, the interval has to be set to a reasonable number depending on the node values.

Remarks

  • Once you grasp the basics, GRASS GIS is real fun. Grasping the basics is pretty tough though.
  • Pedestrians usually don’t follow the road network.
  • Bridges and tunnels might be an issue.
  • Personally, I find GRASS GIS easier to use for the network analysis compared to pgRouting.

by Michal Zimmermann at April 20, 2017 03:30 PM