Welcome to Planet PostGIS

August 20, 2014


PostGIS and PostgreSQL in Action - Timezones


Recently, I was lucky to be part of an awesome project called the Breaking Boundaries Tour.

This project is about two brothers, Omar and Greg Colin, who take their Stella scooters to make a full round trip across the United States. And, while they are at it, try to raise funding for Surfer's Healing Folly Beach - an organization that does great work enhancing the lives of children with autism through surfing. To accommodate this trip, they wished to have a site where visitors could follow their trail live, as it happened. A marker would travel across the map, with them, 24/7.

Furthermore, they needed the ability to jump off their scooters, snap a few pictures, edit a video, write some side info and push it on the net, for the whole world to see. Immediately after they made their post, it had to appear on the exact spot they were at when snapping their moments of beauty.

To aid in the live tracking of their global position, they acquired a dedicated GPS tracking device which sends a latitude/longitude coordinate via a mobile data network every 5 minutes.

Now, this (short) post is not about how I built the entire application, but rather about how I used PostGIS and PostgreSQL for a rather peculiar matter: deducing timezone information.

For those who are interested though: the site is entirely built in Python using the Flask "micro framework" and, of course, PostgreSQL as the database.

Timezone information?

Yes. Time, dates, timezones: hairy worms in hairy cans which many developers hate to open, but have to sooner or later.

In the case of Breaking Boundaries Tour, we had one major occasion where we needed the correct timezone information: where did the post happen?

Where did it happen?

A feature we wanted to implement was one to help visitors get a better view of when a certain post was written. Being able to see when a post was written in your local timezone is much more convenient than seeing the post time in some foreign zone.

We are lazy and do not wish to count back- or forward to figure out when a post popped up in our frame of time.

The reasoning is simple: always convert all the times involved back to plain UTC (GMT). Then figure out the client's timezone using JavaScript, apply the time difference, and done!

Simple eh?

Correct, except for one small detail in the feature request, in what zone was the post actually made?


While your heart might be in the right place when thinking: "Simple, just look at the locale of the machine (laptop, mobile phone, ...) that was used to post!", this information is just too fragile. Remember, the brothers are crossing the USA, riding through at least three major timezones. You simply cannot expect all the devices involved in posting to always adjust their locale automatically depending on where they are.

We need a more robust solution. We need PostGIS.

But, how can a spatial database help us to figure out the timezone?

Well, thanks to the hard labor delivered to us by Eric Muller from efele.net, we have a complete and maintained shapefile of the entire world, containing polygons that represent the different timezones accompanied by the official timezone declarations.

This enables us to use the latitude and longitude information from the dedicated tracking device to pinpoint in which timezone they were while writing their post.

So let me take you on a short trip to show you how I used the above data in conjunction with PostGIS and PostgreSQL.

Getting the data

The first thing to do, obviously, is to download the shapefile data and load it into our PostgreSQL database. Navigate to the Timezone World portion of the efele.net site and download the "tz_world" shapefile.

This will give you a zip which you can extract:

$ unzip tz_world.zip

Unzipping will create a directory called "world" in which you can find the needed shapefile package files.

Next you will need to make sure that your database is PostGIS ready. Connect to your desired database (let us call it bar) as a superuser:

$ psql -U postgres bar

And create the PostGIS extension:

CREATE EXTENSION postgis;
Now go back to your terminal and load the shapefile into your database using the original owner of the database (here called foo):

$ shp2pgsql -S -s 4326 -I tz_world | psql -U foo bar

As you might remember from the PostGIS series, this loads in the geometry from the shapefile using only simple geometry (not "MULTI..." types) with a SRID of 4326.

What have we got?

This will take a couple of seconds and will create one table and two indexes. If you describe your database (assuming you have not made any tables yourself):

 Schema |       Name        |   Type   |  Owner   
--------+-------------------+----------+----------
 public | geography_columns | view     | postgres
 public | geometry_columns  | view     | postgres
 public | raster_columns    | view     | postgres
 public | raster_overviews  | view     | postgres
 public | spatial_ref_sys   | table    | postgres
 public | tz_world          | table    | foo
 public | tz_world_gid_seq  | sequence | foo

You will see the standard PostGIS bookkeeping and you will find the tz_world table together with a gid sequence.

Let us describe the table:

\d tz_world

And get:

 Column |          Type          |                       Modifiers                        
--------+------------------------+--------------------------------------------------------
 gid    | integer                | not null default nextval('tz_world_gid_seq'::regclass)
 tzid   | character varying(30)  | 
 geom   | geometry(Polygon,4326) | 
Indexes:
    "tz_world_pkey" PRIMARY KEY, btree (gid)
    "tz_world_geom_gist" gist (geom)

So we have:

  • gid: an arbitrary id column
  • tzid: holding the standards compliant textual timezone identification
  • geom: holding polygons in SRID 4326.

Also notice we have two indexes made for us:

  • tz_world_pkey: a simple B-tree index on our gid
  • tz_world_geom_gist: a GiST index on our geometry

This is a rather nice set, would you not say?

Using the data

So how do we go about using this data?

As I have said above, we need to figure out in which polygon (timezone) a certain point resides.

Let us take an arbitrary point on the earth:

  • latitude: 35.362852
  • longitude: 140.196131

This is a spot in the Chiba prefecture, central Japan.

Using the Simple Features functions we have available in PostGIS, it is trivial to find out in which polygon a certain point resides:

SELECT tzid 
    FROM tz_world
    WHERE ST_Intersects(ST_GeomFromText('POINT(140.196131 35.362852)', 4326), geom);

And we get back:

    tzid    
------------
 Asia/Tokyo
(1 row)

In the above query I used the function ST_Intersects, which checks whether a given piece of geometry (our point) shares any space with another piece. If we check the execution plan of this query:

EXPLAIN ANALYZE
SELECT tzid 
    FROM tz_world
    WHERE ST_Intersects(ST_GeomFromText('POINT(140.196131 35.362852)', 4326), geom);

We get back:

                                                      QUERY PLAN                                                          
Index Scan using tz_world_geom_gist on tz_world  (cost=0.28..8.54 rows=1 width=15) (actual time=0.591..0.592 rows=1 loops=1)
    Index Cond: ('0101000020E61000006BD784B446866140E3A430EF71AE4140'::geometry && geom)
    Filter: _st_intersects('0101000020E61000006BD784B446866140E3A430EF71AE4140'::geometry, geom)
Total runtime: 0.617 ms

That is not bad at all: a runtime of a little over 0.6 milliseconds, and it is using our GiST index.

But, if a lookup is using our GiST index, a small alarm bell should go off inside your head. Remember my last chapter on the PostGIS series? I kept on babbling about index usage and how geometry functions or operators can only use GiST indexes when they perform bounding box calculations.

The latter might pose a problem in our case, for bounding boxes are very rough approximations of the actual geometry. This means that when we get near timezone borders, our calculations might just give us the wrong timezone.

So how can we fix this?

This time, we do not need to.

This is one of the few blessed functions that both uses the index and is very accurate.

ST_Intersects first uses the index to perform a bounding-box comparison, which filters out the majority of the available geometry. It then performs a more expensive, but more accurate, calculation on the small remaining subset to check whether the given point really lies inside the returned matches.
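For illustration, here is a small Python sketch of that same two-stage idea (a hypothetical stand-in of my own, not PostGIS's actual implementation): a cheap bounding-box test first, then an exact point-in-polygon test on whatever passes.

```python
# Hypothetical sketch of the two-stage test ST_Intersects performs:
# a cheap bounding-box check first, then an exact point-in-polygon test.

def bbox_contains(poly, x, y):
    xs = [px for px, _ in poly]
    ys = [py for _, py in poly]
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)

def point_in_polygon(poly, x, y):
    # Ray casting: count edge crossings of a ray going right from (x, y).
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def intersects(poly, x, y):
    # Cheap filter first; exact test only for candidates that pass it.
    return bbox_contains(poly, x, y) and point_in_polygon(poly, x, y)

# L-shaped polygon: its bounding box covers (2, 2), but the polygon does not.
poly = [(0, 0), (3, 0), (3, 1), (1, 1), (1, 3), (0, 3)]
print(intersects(poly, 0.5, 2.5))  # True: inside the polygon
print(intersects(poly, 2.0, 2.0))  # False: inside the bbox only
```

The L-shaped polygon shows exactly why the bbox test alone is not enough near borders.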

We can thus simply use this function without any more magic...life is simple!


Now it is fair to say that we do not wish to perform this calculation every time a user views a post, that would not be very efficient nor smart.

Rather, it is a good idea to generate this information at post time, and save it for later use.

The way I set up saving this information is twofold:

  • I only save a UTC (GMT) generalized timestamp of when the post was made.
  • I made an extra column in my so-called "posts" table where I only save the string that represents the timezone (Asia/Tokyo in the above case).

This keeps the date/time information in the database naive of any timezone and makes it easier to give the time in either the client's timezone or in the timezone the post was originally written in. You simply have one "root" time which you can move around timezones.
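That "move the root time around" step looks like this in Python (a sketch using the standard zoneinfo module, not the site's actual code; the variable names are my own):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+ standard library

# The "root" time stored in the database: the post's timestamp in UTC.
post_utc = datetime(2014, 8, 20, 10, 0, tzinfo=timezone.utc)

# The timezone string stored alongside the post (as in the Japan example).
post_tzid = "Asia/Tokyo"

# Shift the single root time into the zone the post was written in;
# the same call with the viewer's zone gives their local time instead.
local = post_utc.astimezone(ZoneInfo(post_tzid))
print(local.isoformat())  # 2014-08-20T19:00:00+09:00
```

One stored UTC value plus one tzid string is all you need to render both clocks.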

I created a trigger that fires on every insert of a new post, fetching the timezone and inserting it into the designated column. You could also fetch the timezone and update the post record using Python, but opting for an in-database solution saves you a few extra, unneeded round trips and is most likely a lot faster.

Let us see how we could create such a trigger.

A trigger in PostgreSQL is an event you can set to fire when certain conditions are met. The event(s) that fire have to be encapsulated inside a PostgreSQL function. Let us thus first start by creating the function that will insert our timezone string.

Creating functions

In PostgreSQL you can write functions in either C, Procedural languages (PgSQL, Perl, Python) or plain SQL.

Creating functions with plain SQL is the most straightforward and easiest way. However, since we want to write a function that is to be used inside a trigger, we have an even better option. We can employ the power of the embedded PostgreSQL procedural language to easily access and manipulate our newly inserted data.

First, let us see which query we would use to fetch the timezone and update our post record:

UPDATE posts
  SET tzid = timezone.tzid
  FROM (SELECT tzid
          FROM tz_world
          WHERE ST_Intersects(
              ST_MakePoint(140.196131, 35.362852),
              geom)) AS timezone
  WHERE pid = 1;

This query will fetch the timezone string using a subquery and then update the correct record (a post with "pid" 1 in this example).

How do we pour this into a function?

CREATE OR REPLACE FUNCTION set_timezone()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE posts
    SET tzid = timezone.tzid
    FROM (SELECT tzid 
            FROM tz_world
              WHERE ST_Intersects(
                  ST_MakePoint(NEW.longitude, NEW.latitude),
              geom)) AS timezone
    WHERE pid = NEW.pid;
    RETURN NEW;
END $$
LANGUAGE PLPGSQL IMMUTABLE;

First we use the syntax CREATE OR REPLACE FUNCTION to indicate we want to create (or replace) a custom function. Then we tell PostgreSQL that this function will return type TRIGGER.

You might notice that we do not give this function any arguments. The reasoning here is that this function is "special". Functions which are used as triggers magically get information about the inserted data available.

Inside the function you can see we access our latitude and longitude prefixed with NEW. These keywords, NEW and OLD, refer to the record after and before the trigger(s) fired. In our case we could have used either, since we do not alter the latitude or longitude data; we simply fill a column that is NULL by default. There are more keywords available (TG_NAME, TG_RELID, TG_NARGS, ...) which refer to properties of the trigger itself, but that is beyond today's scope.

The actual SQL statement is wrapped between double dollar signs ($$). This is called dollar quoting and is the preferred way to quote your SQL string (as opposed to using single quotes). The body of the function, which in our case is mostly the SQL statement, is surrounded with a BEGIN and END keyword.

A trigger function always needs a RETURN statement that is used to provide the data for the updated record. This too has to reside in the body of the function.

Near the end of our function we need to declare in which language this function was written, in our case PLPGSQL.

Finally, the IMMUTABLE keyword tells PostgreSQL that this function is rather "functional", meaning: if the inputs are the same, the output will also, always be the same. Using this caching keyword gives our famous PostgreSQL planner the ability to make decisions based on this knowledge.

Creating triggers

Now that we have this functionality wrapped into a tiny PLPGSQL function, we can go ahead and create the trigger.

First you have the events on which a trigger can execute; these are:

  • INSERT
  • UPDATE
  • DELETE
  • TRUNCATE

Next, for each event you can specify at what timing your trigger has to fire:

  • BEFORE
  • AFTER
  • INSTEAD OF

The last one is a special timing by which you can replace the default behavior of the mentioned events.

For our use case, we are interested in executing our function AFTER INSERT.

CREATE TRIGGER set_timezone
    AFTER INSERT ON posts
    FOR EACH ROW
    EXECUTE PROCEDURE set_timezone();

This will setup the trigger that fires after the insert of a new record.

Wrapping it up

Good, that is all there is to it.

We use a query, wrapped in a function, triggered by an insert event to inject the official timezone string, which is deduced by PostGIS's spatial abilities.

Now you can use this information to get the exact timezone where the post was made and use it to present the visiting client both the time in the post's timezone and their own local time.

For the curious ones out there: I used the MomentJS library for the client side time parsing. This library offers a timezone extension which accepts these official timezone strings to calculate offsets. A lifesaver, so go check it out.

Also, be sure to follow the bros while they scooter across the States!

And as always...thanks for reading!

by Tim van der Linden at August 20, 2014 10:00 AM

August 08, 2014

Stephen Mather

KNN with FLANN and laspy, a starting place

FLANN is the Fast Library for Approximate Nearest Neighbors, a purportedly wicked fast nearest neighbor library for comparing multi-dimensional points. I only say purportedly, as I haven’t verified, but I assume this to be quite true. I’d like to move some (all) of my KNN calculations outside the database.

I’d like to do the following with FLANN– take a LiDAR point cloud and change it into a LiDAR height-above-ground point cloud. What follows is my explorations so far.

In a previous series of posts, e.g. http://smathermather.wordpress.com/2014/07/14/lidar-and-pointcloud-extension-pt-6/


I have been using the point cloud extension in PostGIS. I like the 2-D chipping, but I think I should segregate my data into height classes before sending it into the database. In this way, I can query my data by height class and by location efficiently, taking full advantage of the efficiencies of storing all those little points in chips, while also being able to query the data in any of the dimensions I need to in the future. Enter FLANN.

I haven’t gotten far. To use FLANN with LiDAR through Python, I’m also using laspy.  There’s a great tutorial here: http://laspy.readthedocs.org/en/latest/tut_part_1.html


I make one change to the tutorial section using FLANN. The code as written is:

import laspy
import pyflann as pf
import numpy as np

# Open a file in read mode:
inFile = laspy.file.File("./laspytest/data/simple.las")
# Grab a numpy dataset of our clustering dimensions:
dataset = np.vstack([inFile.X, inFile.Y, inFile.Z]).transpose()

# Find the nearest 5 neighbors of point 100.

neighbors = flann.nn(dataset, dataset[100,], num_neighbors = 5)
print("Five nearest neighbors of point 100: ")
print(neighbors[0])
print("Distances: ")
print(neighbors[1])

To make this example work with the current version of pyflann, we need to make sure we import all of pyflann (or at least nn), and also set flann = FLANN() as follows:

import laspy
import numpy as np
from pyflann import *

# Open a file in read mode:
inFile = laspy.file.File("simple.las")
# Grab a numpy dataset of our clustering dimensions:
dataset = np.vstack([inFile.X, inFile.Y, inFile.Z]).transpose()

# Find the nearest 5 neighbors of point 100.
flann = FLANN()
neighbors = flann.nn(dataset, dataset[100,], num_neighbors = 5)
print("Five nearest neighbors of point 100: ")
print(neighbors[0])
print("Distances: ")
print(neighbors[1])

Finally, a small note on installation of pyflann on Ubuntu. What I’m about to document is undoubtedly not the recommended way to get pyflann working. But it worked… .

Installation for FLANN on Ubuntu can be found here: http://www.pointclouds.org/downloads/linux.html

But this does not seem to install pyflann. That said, it installs all our dependencies + FLANN, so…

I cloned, compiled, and installed the FLANN repo: https://github.com/mariusmuja/flann

git clone git://github.com/mariusmuja/flann.git
cd flann
mkdir BUILD
cd BUILD
cmake ../.
sudo make install

This gets pyflann where it needs to go, and voilà! We can now do nearest neighbor searches within Python.

Next step, turn my LiDAR xyz point cloud into an xy-height point cloud, then dump it, height class by height class, into PostgreSQL. Wish me luck!
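That next step can be sketched like this (a hypothetical illustration of my own: a brute-force numpy nearest neighbor stands in for FLANN, and the function and array names are assumptions):

```python
import numpy as np

def height_above_ground(points, ground):
    """points, ground: (n, 3) and (m, 3) arrays of x, y, z.

    For each point, subtract the z of its nearest ground point in the
    horizontal (xy) plane, yielding an xy-height point cloud.
    """
    heights = np.empty(len(points))
    for i, p in enumerate(points):
        # squared horizontal distance to every ground point
        d2 = ((ground[:, :2] - p[:2]) ** 2).sum(axis=1)
        heights[i] = p[2] - ground[d2.argmin(), 2]
    return np.column_stack([points[:, :2], heights])

ground = np.array([[0.0, 0.0, 10.0], [5.0, 0.0, 12.0]])  # ground-classified
pts = np.array([[0.5, 0.0, 13.0], [4.5, 0.0, 15.0]])     # everything else
print(height_above_ground(pts, ground))  # each row: x, y, height
```

Swapping the brute-force loop for FLANN's nn() call is what makes this viable on millions of points.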

by smathermather at August 08, 2014 02:15 PM

August 07, 2014

Stephen Mather

Drivetime analyses, pgRouting

Map of overlapping 5-minute drive times from park access points
We’ve got some quick and dirty pgRouting-based code up on github. I say quick and dirty because it directly references the table names in both of the functions. I hope to fix this in the future.

The objective with this code is to input a point, use a nearest neighbor search to find the nearest intersection, and from that calculate a drive time alpha shape.

First, the nearest neighbor search:

CREATE OR REPLACE FUNCTION pgr.nn_search (geom geometry) RETURNS
int8 AS $$

SELECT id FROM pgr.roads_500_vertices_pgr AS r
ORDER BY geom <#> r.the_geom
LIMIT 1;

$$ LANGUAGE SQL;


This function takes one argument– pass it a point geometry, and it will do a knn search for the nearest point. Since we are leveraging pgRouting, it simply returns the id of the nearest intersection point. We wrote this as a function in order to run it, e.g. on all the points in a table. As I stated earlier, it directly references a table name, which is a little hackerish, but we’ll patch that faux pas later.

Now that we have the ID of the point in question, we can do a driving distance calculation, wrap that up in an alpha shape, and return our polygon. Again, we write this as a function:

CREATE OR REPLACE FUNCTION pgr.alpha_shape (id integer, minutes integer) RETURNS
geometry AS $$

WITH alphashape AS(SELECT pgr_alphaShape('WITH
DD AS (
SELECT seq, id1 AS node, cost
FROM pgr_drivingDistance(''SELECT id, source, target, cost FROM pgr.roads_500'',' || id || ', ' || minutes || ', false, false)),
dd_points AS (
SELECT id_ AS id, x, y
FROM pgr.roads_500_vertices_pgr v, DD d
WHERE v.id = d.node)
SELECT * FROM dd_points')),

alphapoints AS (
SELECT ST_Makepoint((pgr_alphashape).x, (pgr_alphashape).y) FROM alphashape),

alphaline AS (
SELECT ST_MakeLine(ST_MakePoint) FROM alphapoints)

SELECT ST_MakePolygon(ST_AddPoint(ST_MakeLine, ST_StartPoint(ST_MakeLine))) AS the_geom FROM alphaline

$$ LANGUAGE SQL;


Finally, we’ll use these functions in conjunction with a set of park feature access points to map out our 5-minute drive times. Overlaps in 5-minute zones we’ve rendered as a brighter green in the image above.

CREATE TABLE pgr.alpha_test AS
WITH dest_ids AS (
SELECT pgr.nn_search(the_geom) AS id FROM pgr.dest_pts
)
SELECT pgr.alpha_shape(id::int, 5)::geometry, id FROM dest_ids;

Oh, and I should point out that for the driving distance calculation, we have pre-calculated the costs of the roads based on the speed limit, so cost is really travel time in minutes.
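That cost pre-calculation is simple enough to sketch (a hypothetical helper of my own, not code from the repo):

```python
# Hypothetical sketch: cost of a road segment as travel time in minutes,
# derived from its length and speed limit, as the post describes.
def travel_time_minutes(length_km, speed_limit_kmh):
    return length_km / speed_limit_kmh * 60.0

# A 2 km segment at 40 km/h costs 3 minutes of driving.
print(travel_time_minutes(2.0, 40.0))  # 3.0
```

With costs expressed in minutes, asking pgr_drivingDistance for a "distance" of 5 directly yields the 5-minute drive-time zone.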

by smathermather at August 07, 2014 01:50 AM

August 01, 2014

Stephen Mather

Plugin-free QGIS TMS tiles via GDAL

Want to load your favorite tiles into QGIS? How about plugin-free TMS tiles in QGIS via GDAL:


Really awesome… .

Needs but one change: epsg:900913 should be epsg:3857 or QGIS (GDAL?) throws an error. Presumably you could also define epsg:900913 in some config file, but barring that create an XML file as follows, and load as a raster in QGIS:

    <GDAL_WMS>
        <Service name="TMS">
        </Service>
        <Cache />
    </GDAL_WMS>

Now I can use my pretty tilestache maps _everywhere!_:
Image of coyote points overlayed on tilestache rendered hillshade map

Screenshot of map from postgis

edit: forgot the hattip for the lead on this, Thomas gratier: @ThomasG77

by smathermather at August 01, 2014 03:28 AM

Stephen Mather

Cleaning animal tracking data — throwing away extra points

Much the problem of the modern era– too much data, uneven data, and yet, should we keep it all?

Here’s the problem space: attach GPS collar to a coyote, send that data home, and you have a recipe for gleaning a lot of information about the movement of that animal across the landscape. In order to maximize the data collected while also maximizing the battery life of said device, sometimes (according to a preset scheme) you might sample every hour or every 15 minutes, and then switch back to once a day.

When you get the data back in, you get something like this:

Map of raw coyote data-- all points, showing all available collar GPS points

Already, we see some pretty interesting patterns. The first thing that I notice is clusters of activity. Are these clusters related to our uneven sampling, sometimes every 15 minutes, sometimes once or twice a day? Or are those really important clusters in the overall activity of the animal?

The next thing I notice is how perfectly the range of movement of the coyote is bounded by expressways to the west and north, and by the river to the east. Those seem to be pretty impermeable boundaries.

So, as does a man who only has a hammer, we clean the data with PostGIS. In this case, it’s an ideal task. First, metacode:

Task one: union / dissolve our points by day — for some days this will give us a single point, for other days a multi-point for the day.

Task two: find the centroid of our union. This will grab a centroid for each clump of points (a weighted spatial mean, if you will) and return that single point per day. There are some assumptions here that might not bear out under scrutiny, but it’s not a bad first approximation.

Now, code:

-- union / dissolve our points by day
WITH clusterp AS (
	SELECT ST_Union(geom) AS geom, gmt_date FROM nc_clipped_data_3734
		GROUP BY gmt_date
	),
-- find the centroid of our union
centroids AS (
	SELECT ST_Centroid(geom) AS geom, gmt_date FROM clusterp
	)
SELECT * FROM centroids;

Boom! One record per day:

Map showing much smaller home range, one record per day clustered around a single core zone

A different pattern now emerges. We can’t see the hard boundaries of the use area, but now the core portion of the home range can be seen.

Upcoming (guest) blog post on the use of these data in home range calculations in R. Stay tuned.

Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY SA.

by smathermather at August 01, 2014 02:58 AM

July 28, 2014

A GeoSpatial World

PostGIS 3D Viewer, continued




Introduction

As I announced in my previous post, I have extended pg3DViewer so that it can handle POINT geometries, and for my tests I looked at 'point clouds'.









For the rest of this tutorial we will use two data files; here are the download links:

Installation


Click the pg3DViewer_setup.exe link to download the installer for the new version. If you had installed the previous version, uninstall it first, then run the installer.

Click the Next button at each step, then the Finish button.

The previous post shows all the installation screens: http://ageoguy.blogspot.fr/2014/07/postgis-3d-viewer.html


Usage

Before you start using pg3DViewer you need a PostgreSQL database with the PostGIS spatial extension, version 2.0 or higher.

Double-click the desktop icon created by the installer to launch the application.



Connect to your 3D database



Start by clicking this icon to connect to a PostgreSQL/PostGIS 2.0+ database that contains 3D data, or is ready to receive some.

  • Choose localhost as the server if your database is local to your machine, or enter your server's IP address
  • the default port is 5432; change it if necessary
  • enter the user name
  • enter the password
  • choose a database from the combobox
  • click the OK button to confirm.



Processing an ASCII XYZ file

We will use the bunny.xyz file you downloaded in the introduction.

First we create the table that will hold the data. In PgAdmin III, open the query tool and copy the following query:

-- Table: bunny

-- DROP TABLE bunny;

CREATE TABLE bunny
(
  x double precision,
  y double precision,
  z double precision,
  id bigserial NOT NULL,
  CONSTRAINT bunny_pkey PRIMARY KEY (id)
);
ALTER TABLE bunny
  OWNER TO postgres;

Then execute it.

Next we insert the data into the table. Copy the following query into the query editor:

copy bunny(x,y,z)
from 'C:/temp/bunny.txt'
with delimiter ' '

Then execute it.

Now we can visualize the data. Copy the following query into the Query panel:

SELECT x,y,z
FROM bunny AS points_xyz

Click the 'Execute query' icon.

And here is the result after zooming and rotating.
As you will have noticed, the display is very fast for a table containing 34,835 points.

Now let us try a new query. Copy the following into the Query panel:

SELECT st_geomfromtext('POINT Z('||x||' '||y||' '||z||')',0) as geom,'255:0:0'::text AS rgb FROM bunny

Click the 'Execute query' icon.

And here is the result after zooming and rotating.
The display takes much longer because in this query we build the geometry on the fly.

The difference with the previous query is that the first one generates a point file in xyz format (thanks to PostgreSQL's COPY TO command) which a 'reader' from the VTK library then consumes. All of this happens by adding 'AS points_xyz' at the end of the query and selecting the x, y and z fields.


Processing a LAS file

We will use the file '2013 LiDAR 5105_54480.las' you downloaded in the introduction. We need some tools to import this file into our database; we will use a tool collection called LASTOOLS, available from http://www.cs.unc.edu/~isenburg/lastools/ with this download link: http://lastools.org/download/lastools.zip

The tool we will use is las2txt; here is the header of its associated README:

Converts from binary LAS/LAZ 1.0 - 1.4 to an ASCII text format
For updates check the website or join the LAStools mailing list.


Martin @lastools

And the link to read it in full: http://www.cs.unc.edu/~isenburg/lastools/download/las2txt_README.txt

Open a DOS window, go to the directory where you installed LASTOOLS, then into the bin subdirectory. Copy the following command:
las2txt -i "C:\temp\2013 LiDAR 5105_54480.las" -parse xyzc -sep semicolon

This command generates an ASCII (.txt) file in x,y,z,classification format with the ';' (semicolon) character as column separator.
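To make that format concrete, here is a hypothetical line of the generated file (the coordinate values are invented) and how it splits into the four columns:

```python
# One hypothetical line of the las2txt output: x;y;z;classification
line = "510500.123;5448000.456;42.500;2"

fields = line.split(";")
x, y, z = (float(v) for v in fields[:3])
classification = int(fields[3])
print(x, y, z, classification)  # 510500.123 5448000.456 42.5 2
```

These four columns map directly onto the table we create below.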

We now have a file we can insert into our database. First we create the table that will receive the data; in the PgAdmin III query window copy the following query:

-- Table: lidar_test

-- DROP TABLE lidar_test;

CREATE TABLE lidar_test
(
  x double precision,
  y double precision,
  z double precision,
  classification integer
);
ALTER TABLE lidar_test
  OWNER TO postgres;

Then execute it.

Next we insert the data into the table. Copy the following into the query editor:

copy lidar_test(x,y,z,classification)
from 'C:/temp/2013 LiDAR 5105_54480.txt'
with delimiter ';'

Then execute it.

Now we can visualize the data. Copy the following query into the Query panel:

select x,y,z,
CASE
WHEN classification=2 then '128:0:0'
WHEN classification=3 then '0:64:0'
WHEN classification=4 then '0:128:0'
WHEN classification=5 then '0:255:0'
WHEN classification=6 then '255:128:64'
WHEN classification=11 then '128:128:128'
END as rgb
from lidar_test as points_xyzrgb

Click the 'Execute query' icon.

And here is the result.

The classification values used in the query come from this query:
select distinct classification from lidar_test order by 1

You can also get information about the LAS file with the lasinfo tool; in a DOS window type the following command:

lasinfo "c:\temp\2013 LiDAR 5105_54480.las"

At the end of the report you get the definition of the classification.

lasinfo for c:\temp\2013 LiDAR 5105_54480.las
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            1
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          ''
  generating software:        'TerraScan'
  file creation day/year:     156/2013
  header size:                227
  offset to point data:       447
  number var. length records: 2
  point data format:          1
  point data record length:   28
  number of point records:    7938187
  number of points by return: 6742290 914660 231891 44542 4804
  scale factor x y z:         0.001 0.001 0.001
  offset x y z:               510499.94 5447999.8799999999 0
  min x y z:                  510499.940 5447999.880 34.110
  max x y z:                  511000.110 5448500.100 275.690
variable length header record 1 of 2:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34735
  length after header  88
  description          'Georeferencing Information'
    GeoKeyDirectoryTag version 1.1.0 number of keys 10
      key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
      key 2048 tiff_tag_location 0 count 1 value_offset 4269 - GeographicTypeGeoKey: GCS_NAD83
      key 2054 tiff_tag_location 0 count 1 value_offset 9102 - GeogAngularUnitsGeoKey: Angular_Degree
      key 2056 tiff_tag_location 0 count 1 value_offset 7019 - GeogEllipsoidGeoKey: Ellipse_GRS_1980
      key 2057 tiff_tag_location 34736 count 1 value_offset 0 - GeogSemiMajorAxisGeoKey: 6378137
      key 2058 tiff_tag_location 34736 count 1 value_offset 1 - GeogSemiMinorAxisGeoKey: 6356752.314
      key 2059 tiff_tag_location 34736 count 1 value_offset 2 - GeogInvFlatteningGeoKey: 298.2572221
      key 3072 tiff_tag_location 0 count 1 value_offset 26910 - ProjectedCSTypeGeoKey: UTM 10 northern hemisphere
      key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
      key 4099 tiff_tag_location 0 count 1 value_offset 9001 - VerticalUnitsGeoKey: Linear_Meter
variable length header record 2 of 2:
  reserved             43707
  user ID              'LASF_Projection'
  record ID            34736
  length after header  24
  description          'Double Param Array'
    GeoDoubleParamsTag (number of doubles 3)
      6.37814e+006 6.35675e+006 298.257
reporting minimum and maximum for all LAS point record entries ...
  X                   0     500170
  Y                   0     500220
  Z               34110     275690
  intensity           0        255
  return_number       1          5
  number_of_returns   1          5
  edge_of_flight_line 0          0
  scan_direction_flag 0          1
  classification      2         11
  scan_angle_rank   -19         22
  user_data          79         79
  point_source_ID 10217      10238
  gps_time 49786940.848411 49796462.877692
WARNING: there is coordinate resolution fluff (x10) in XYZ
overview over number of returns of given pulse: 5825940 1363444 563503 160908 24392 0 0
histogram of classification of points:
         2030423  ground (2)
         1836807  low vegetation (3)
          234836  medium vegetation (4)
         1811842  high vegetation (5)
         2022021  building (6)
            2258  road surface (11)


Conclusion.

We have reached the end of this post. I recommend reading Paul Ramsey's excellent tutorial on LIDAR data, available at http://workshops.boundlessgeo.com/tutorial-lidar/ ; all that remains is to display the data using the functions presented in that 'tutorial lidar'.

by Jérôme ROLLAND (noreply@blogger.com) at July 28, 2014 10:08 AM

July 25, 2014

A GeoSpatial World

PgAdmin III plugin: PostGISViewer, continued


New version



I have added support for multi-geometries:

I have added the ability to execute the queries present in the PgAdmin III SQL editor. For this to work, a few rules must be followed:
  • Only one PgAdmin III SQL editor may be open.
  • For the query or queries to be executed, no table name must be selected in the PgAdmin III object browser, unless that table is already loaded in the viewer. A selected table always takes precedence over queries.
  • The first field of each query must be a geometry column, or a geometry produced by a geometry-processing function (ST_Buffer, ST_Intersection, ...); it will be used for display. This field or geometry must be wrapped in the AsBinary function and aliased as geom:
    • SELECT AsBinary(wkb_geometry) as geom ....
    • SELECT AsBinary(ST_Buffer(wkb_geometry, 1.0)) as geom...
    • SELECT AsBinary(ST_Intersection(a.wkb_geometry, b.wkb_geometry)) as geom...
    • ....
  • The second field must make it possible to determine the geometry type, so it must be wrapped in the GeometryType function; no alias is needed:
    • SELECT ..., GeometryType(wkb_geometry)...
    • SELECT ..., GeometryType(ST_Buffer(wkb_geometry, 1.0))...
    • SELECT ..., GeometryType(ST_Intersection(a.wkb_geometry, b.wkb_geometry))...
    • ....

  • Each query must end with a semicolon, which makes it possible to execute several queries in a row.

Below, two queries ending with semicolons:
  • The first query loads the commune named 'Sainte-Croix-de-Quintillargues'.
  • The second query loads all the buildings of that commune.

After launching the PostGISViewer plugin, the two queries are executed and give the following result:

The created layers are named query, with an identifier that is incremented for the whole viewer session.

Any suggestions or remarks for improving this tool are welcome.

To be continued...

by Jérôme ROLLAND (noreply@blogger.com) at July 25, 2014 02:48 PM

July 20, 2014

A GeoSpatial World

PostGIS 3D Viewer


A 3D viewer for PostGIS

Introduction

With version 2.0, PostGIS entered the world of 3D: it is now possible to store 3D geometries, with the addition of the Z coordinate to all existing geometry types, and to store volumetric geometries with the introduction of two new types:

  • TIN (Triangulated Irregular Network, a collection of TRIANGLEs)
  • POLYHEDRALSURFACE (a collection of POLYGONs)

The idea of writing a viewer for PostGIS 3D geometries had been on my mind for a year, ever since an exchange with Claus Nagel and Felix Kunde of virtualcitySYSTEMS about the CityGML import tool I had written (CityGML to PostGIS 2.0); back then they advised me to work on a 3D viewer. When I recently got back in touch to tell them about my ongoing development, they welcomed the news with enthusiasm and received beta versions, which greatly helped me make the final choices for my application.

Today 3D is more and more present in the GIS world with the CityGML format, and in the architecture world with the IFC format. Importing these formats into a PostgreSQL/PostGIS database is a natural step, since it brings the full processing power of spatial databases to the management of these data.

For the CityGML format, virtualcitySYSTEMS provides a free, Open Source database schema for storing and managing 3D City Models. This database schema mirrors the CityGML data model. Data can be imported through an import/export tool.
The link to their site: http://virtualcitysystems.de/en/solutions.html#3dcitydb

At this stage of development I only display geometries of type POLYGON, MULTIPOLYGON, TIN and POLYHEDRALSURFACE, which covers all volumetric geometries meeting this constraint. I will soon work on the other geometry types (POINT, LINESTRING, ...), which will be the subject of a new version of the application in the coming weeks.

I developed this tool in C# 2013 with:

  • the GDAL/OGR library (GDAL - Geospatial Data Abstraction Library)
  • ActiViz .Net, a wrapper for VTK
  • PostgreSQL 9.2
  • PostGIS 2.2.0 dev


For 3D data integration I tested the 3DCityDB-Importer-Exporter tool http://www.3dcitydb.org/3dcitydb/downloads/ , which imports CityGML files into a PostgreSQL database with PostGIS (from version 2.0 onwards).


Here is an excerpt from the 3DCityDB-v2_0_6-postgis-Tutorial.pdf document concerning the PostGIS part (in English):












Installation


Click the link pg3DViewer_setup.exe to download the installer, then run it.

Click the Next button at each step, then the Finish button.








Usage

Before you start using pg3DViewer you need a PostgreSQL database with the PostGIS spatial extension, version 2.0 or higher. This database must contain 3D data; ideally you will have created a 3DCityDB database and imported one or more CityGML files. Here is a link where you can download CityGML data for Rotterdam 3D; ask our friend Google to translate the page if needed. On the site you will find a link to a PDF map showing the names of all the downloadable districts http://www.rotterdam.nl/GW/Images/Gemeentewerken%20Home/Landmeten/overzicht%20buurten.pdf , then simply pick the districts from the list below the link.

You can still run the first tests without downloading any data.

image Double-click the icon created on the desktop by the installer to launch the application.



Connect to your 3D database.



Start by clicking this icon to connect to a PostgreSQL/PostGIS database, version 2.0 or higher, that contains 3D data or is ready to receive some.

  • Choose localhost as the server if your database is local to your machine, or enter your server's IP address
  • the default port is 5432; change it if necessary
  • enter the user name
  • enter the password
  • choose a database in the combobox
  • click the OK button to validate your input.


Viewing a POLYHEDRALSURFACE geometry

In the Query panel, copy the following query:

SELECT st_force_collection(st_geomfromtext(
    'POLYHEDRALSURFACE(((0 0 0, 0 1 0, 1 1 0, 1 0 0, 0 0 0)),
        ((0 0 0, 0 0 1, 0 1 1, 0 1 0, 0 0 0)),
        ((0 0 0, 1 0 0, 1 0 1, 0 0 1, 0 0 0)),
        ((0 0 1, 1 0 1, 1 1 1, 0 1 1, 0 0 1)),
        ((1 0 0, 1 1 0, 1 1 1, 1 0 1, 1 0 0)),
        ((1 1 0, 0 1 0, 0 1 1, 1 1 1, 1 1 0)))')) AS geom

Click the 'Execute query' icon.

You get this result.

Mouse interactions with the render window:
- Left button: rotation
- Wheel: zoom
- Right button: dynamic zoom
- Left button + Shift key: pan
Try the various combinations of the Shift, Ctrl and Alt keys with the mouse buttons.

Keyboard interactions with the render window:
- w key: switches to wireframe display mode.
- s key: switches to surface display mode.
The render window after rotation. image

Before continuing, a brief explanation of the query used above:

SELECT st_force_collection(st_geomfromtext(
    'POLYHEDRALSURFACE(((0 0 0, 0 1 0, 1 1 0, 1 0 0, 0 0 0)),
        ((0 0 0, 0 0 1, 0 1 1, 0 1 0, 0 0 0)),
        ((0 0 0, 1 0 0, 1 0 1, 0 0 1, 0 0 0)),
        ((0 0 1, 1 0 1, 1 1 1, 0 1 1, 0 0 1)),
        ((1 0 0, 1 1 0, 1 1 1, 1 0 1, 1 0 0)),
        ((1 1 0, 0 1 0, 0 1 1, 1 1 1, 1 1 0)))')) AS geom


As you may have noticed, the ST_GeomFromText function is wrapped in the ST_Force_Collection function. Why add this function? If you run this query in pgAdmin III, wrapping the whole thing in ST_AsText, you get this result:
GEOMETRYCOLLECTION Z (POLYGON Z ((0 0 0,0 1 0,1 1 0,1 0 0,0 0 0)),
POLYGON Z ((0 0 0,0 0 1,0 1 1,0 1 0,0 0 0)),
POLYGON Z ((0 0 0,1 0 0,1 0 1,0 0 1,0 0 0)),
POLYGON Z ((0 0 1,1 0 1,1 1 1,0 1 1,0 0 1)),
POLYGON Z ((1 0 0,1 1 0,1 1 1,1 0 1,1 0 0)),
POLYGON Z ((1 1 0,0 1 0,0 1 1,1 1 1,1 1 0)))
We have gone from a POLYHEDRALSURFACE geometry to a GEOMETRYCOLLECTION Z geometry containing POLYGON Z geometries; but remember, a POLYHEDRALSURFACE geometry is also a collection of POLYGONs, so up to this point all is well.
Back to the initial question: why use the ST_Force_Collection function? Simply because the GDAL library I use in my development does not yet recognize POLYHEDRALSURFACE geometries. This trick works around the current limitation.
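The effect of the trick can be illustrated outside the database: it amounts to re-tagging the WKT so that each face of the POLYHEDRALSURFACE becomes a POLYGON Z inside a GEOMETRYCOLLECTION Z. A naive Python sketch of that re-tagging (single-ring faces only, string handling rather than a real WKT parser; it mirrors what ST_Force_Collection achieves server-side):

```python
def polyhedral_to_collection(wkt):
    """Re-tag a POLYHEDRALSURFACE WKT as a GEOMETRYCOLLECTION Z of POLYGON Z.

    Assumes single-ring faces; this is an illustration, not a WKT parser.
    """
    inner = wkt.strip()[len("POLYHEDRALSURFACE("):-1]  # drop the wrapper
    faces = inner.split(")),((")                       # split between faces
    rings = [f.strip("()") for f in faces]             # bare coordinate lists
    polys = ",".join("POLYGON Z ((%s))" % r for r in rings)
    return "GEOMETRYCOLLECTION Z (%s)" % polys

surface = ("POLYHEDRALSURFACE(((0 0 0,0 1 0,1 1 0,1 0 0,0 0 0)),"
           "((0 0 0,0 0 1,0 1 1,0 1 0,0 0 0)))")
print(polyhedral_to_collection(surface))
```

The output has exactly the shape shown above: the same rings, re-labelled as a collection of polygons.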

Viewing a TIN geometry

In the Query panel, copy the following query:

WITH res AS (
    SELECT st_polygon(st_boundary((st_dump(st_geomfromtext(
        'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
             ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
             ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
             ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))'))).geom), 0) AS geom
)
SELECT st_forceCollection(st_collect(geom))
FROM res


Click the 'Execute query' icon.

You get this result after rotation.


The query used calls for an explanation:

WITH res AS (
    SELECT st_polygon(st_boundary((st_dump(st_geomfromtext(
        'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
             ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
             ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
             ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))'))).geom), 0) AS geom
)
SELECT st_forceCollection(st_collect(geom))
FROM res

As with the POLYHEDRALSURFACE query, TIN geometries are not yet recognized by the GDAL library. As a reminder, a TIN geometry is also a collection of TRIANGLEs, which the GDAL library does not recognize either, hence the slightly more complex query.
If you run the following query in PgAdmin III:

SELECT St_AsText(st_force_collection(st_geomfromtext(
    'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
         ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
         ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
         ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))')))

You get this:

GEOMETRYCOLLECTION Z (TRIANGLE Z ((0 0 0,1 0 0,0 1 0,0 0 0)),
TRIANGLE Z ((0 0 0,1 0 0,0 0 1,0 0 0)),
TRIANGLE Z ((0 0 0,0 0 1,0 1 0,0 0 0)),
TRIANGLE Z ((0 0 1,1 0 0,0 1 0,0 0 1)))

This the GDAL library does not recognize, which explains the first part of the query, which recreates POLYGON geometries from the TRIANGLE geometries. The second part of the query (where we find the ST_Force_Collection function) recreates a GEOMETRYCOLLECTION Z containing POLYGON Z geometries, which the GDAL library can interpret.

To simplify queries on TIN geometries, creating a function that performs the transformation seems essential to me.


Function converting a TIN geometry into a GEOMETRYCOLLECTION Z geometry

-- Function: ST_TinToGeomCollection(geometry, integer)

-- DROP FUNCTION ST_TinToGeomCollection(geometry, integer);

CREATE OR REPLACE FUNCTION ST_TinToGeomCollection(geometry, integer)
  RETURNS geometry AS
$BODY$
DECLARE
    sql text;
    rec record;
BEGIN
    sql := 'WITH res AS (
          SELECT st_polygon(st_boundary((st_dump(geometry('||quote_literal($1::text)||'))).geom),'||$2||') as geom
        )
        SELECT st_forceCollection(st_collect(geom)) as geom FROM res';
    EXECUTE sql INTO rec;
    RETURN rec.geom;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION ST_TinToGeomCollection(geometry, integer)
  OWNER TO postgres;
GRANT EXECUTE ON FUNCTION ST_TinToGeomCollection(geometry, integer) TO public;
GRANT EXECUTE ON FUNCTION ST_TinToGeomCollection(geometry, integer) TO postgres;

You will now create the function in your 3D database:

  • copy the contents of the frame above and paste it into the PgAdmin III query window
  • run it; the function is created.

The query to display a TIN geometry then becomes:

select st_astext(ST_TinToGeomCollection(st_geomfromtext(
    'TIN(((0 0 0, 1 0 0, 0 1 0, 0 0 0)),
         ((0 0 0, 1 0 0, 0 0 1, 0 0 0)),
         ((0 0 0, 0 0 1, 0 1 0, 0 0 0)),
         ((0 0 1, 1 0 0, 0 1 0, 0 0 1)))'), 0))

The second parameter of the ST_TinToGeomCollection function is the SRID of the geometry, 4326 for example if you are in a WGS84 coordinate system. If you do not know it, enter the value 0.

Displaying data from a 3DCityDB database containing CityGML data

In the Query panel, copy the following query:

WITH wallsurface AS (
SELECT a.id, c.root_id, st_collect(c.geometry) AS g, '255:128:0'::text AS rgb
FROM building a, thematic_surface b, surface_geometry c
WHERE b.building_id = a.id
AND b.type = 'WallSurface'
AND c.root_id = b.lod2_multi_surface_id
GROUP BY a.id, c.root_id
), roofsurface AS (
SELECT a.id, c.root_id, st_collect(c.geometry) AS g, '255:0:0'::text AS rgb
FROM building a, thematic_surface b, surface_geometry c
WHERE b.building_id = a.id
AND b.type = 'RoofSurface'
AND c.root_id = b.lod2_multi_surface_id
GROUP BY a.id, c.root_id
)
SELECT id, root_id, g AS geom, rgb
FROM wallsurface
UNION ALL
SELECT id, root_id, g AS geom, rgb
FROM roofsurface


Click the 'Execute query' icon.

In the query you can assign a color to a geometry type; the syntax is the R:G:B code between quotes, followed by AS rgb:
'255:128:0' AS rgb  (wallsurface)
'255:0:0'   AS rgb  (roofsurface)
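On the client side such a color spec is trivial to decode; a small sketch of the parsing a viewer would perform (hypothetical helper, not taken from the pg3DViewer source):

```python
def parse_rgb(spec):
    """Parse an 'R:G:B' color spec such as '255:128:0' into an (r, g, b) tuple."""
    r, g, b = (int(part) for part in spec.split(":"))
    if not all(0 <= v <= 255 for v in (r, g, b)):
        raise ValueError("RGB components must be in 0..255: %s" % spec)
    return (r, g, b)

print(parse_rgb("255:128:0"))  # wallsurface color → (255, 128, 0)
```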



image Exits the application.
image Connects to a database.
image Query panel
image Clears the query window.
image Opens a query file, which must contain a single query.
image Executes the query.
image Collapses the panel to the left.
image 3D display panel
image Changes the background color, white or black.
image Zoom full.
image Zoom in.
image Zoom out.
image Clears the render window; all previously loaded geometries are removed.
image Attribute data panel
image Collapses the panel to the bottom.

Conclusion.

This post provides a tool for viewing 3D geometries stored in a PostGIS 2.0 or higher database. I still need to implement the display of points and lines.

Any suggestions for improvement are welcome; I look forward to your feedback.

by Jérôme ROLLAND (noreply@blogger.com) at July 20, 2014 04:16 PM

A GeoSpatial World

PostGIS 3D Viewer introduction


A 3D viewer for PostGIS

Introduction

With version 2.0, PostGIS entered the world of 3D: it is now possible to store 3D geometries, with the addition of the Z coordinate to all existing geometry types, and to store volumetric geometries with the introduction of two new types:

  • TIN (Triangulated Irregular Network, a collection of TRIANGLEs)
  • POLYHEDRALSURFACE (a collection of POLYGONs)

The idea of writing a viewer for PostGIS 3D geometries had been on my mind for a year, ever since an exchange with Claus Nagel and Felix Kunde of virtualcitySYSTEMS about the CityGML import tool I had written (CityGML to PostGIS 2.0); back then they advised me to work on a 3D viewer. When I recently got back in touch to tell them about my ongoing development, they welcomed the news with enthusiasm and received beta versions, which greatly helped me make the final choices for my application.

Today 3D is more and more present in the GIS world with the CityGML format, and in the architecture world with the IFC format. Importing these formats into a PostgreSQL/PostGIS database is a natural step, since it brings the full processing power of spatial databases to the management of these data.

For the CityGML format, virtualcitySYSTEMS provides a free, Open Source database schema for storing and managing 3D City Models. This database schema mirrors the CityGML data model. Data can be imported through an import/export tool.

Displaying CityGML data in the 3D viewer



Continue reading “PostGIS 3D Viewer”

by Jérôme ROLLAND (noreply@blogger.com) at July 20, 2014 04:12 PM

July 15, 2014

Stephen Mather

LiDAR and pointcloud extension pt 6

There’s all sorts of reasons why what I’ve been doing so far is wrong (see the previous 5 posts).  More on that later. But in case you needed 3D visualization of the chipping I’ve been working through, look no further:

Isometric 3D visualization of 3D chipping

by smathermather at July 15, 2014 12:32 AM

July 13, 2014

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Minimalist X3D Viewer for PHP and ASP.NET

I've been thinking for a long time about creating an HTML based X3D viewer to visualize PostGIS 3D geometries. Now that I've started to document the new SFCGAL features, the need became more pressing. So I've created a very rudimentary one. You can check out the code on my github page https://github.com/robe2/postgis_x3d_viewer. I'm hoping to expand it in the future to allow people to pick materials for their geometries and also allow people to save the generated scene as a full X3D document.

x3d viewer for postgis

It utilizes the X3DOM Javascript library and JQuery for injecting X3D objects into the scene. I'm really excited about X3DOM JS. It was fairly trivial to work with and integrate the ST_AsX3D function. Utilizing it will help me stress test the ST_AsX3D function and patch up the holes in it. When I developed the function I was only interested in 3D geometries and didn't think too much about collections either. So as a result the function is kinda broken when working with 2D or Geometry Collections.

Features and Limitations

  • It has one scene and every query you do writes into the scene. This allows you to dump a bunch of objects on the same white board using different queries.
  • Similar to Raster Viewer I created a while ago, it's got a query log and a color picker. The color picker allows you to color each query output differently. I hope to augment this to allow picking different materials too.
  • Requires either .NET or PHP and of course a PostGIS 2+ enabled database
  • Similar to the Web raster viewer I created, it can only handle a query that outputs a single geometry (geometry mode) or an X3D scene snippet (raw mode)

Continue reading "PostGIS Minimalist X3D Viewer for PHP and ASP.NET"

by Regina Obe (nospam@example.com) at July 13, 2014 02:15 AM

July 12, 2014

Stephen Mather

LiDAR and pointcloud extension pt 5

Now for the crazy stuff:

Plaid-like overlay of vertical patches

The objective is to allow us to do vertical and horizontal summaries of our data. To do this, we’ll take chipped LiDAR input and further chip it vertically by classifying it.

First a classifier for height that we’ll use to do vertical splits on our point cloud chips:

CREATE OR REPLACE FUNCTION height_classify(numeric)
  RETURNS integer AS
$BODY$
	SELECT CASE WHEN $1 < 10 THEN 1
	     WHEN $1 >= 10 AND $1 < 20 THEN 2
	     WHEN $1 >= 20 AND $1 < 30 THEN 3
	     WHEN $1 >= 30 AND $1 < 40 THEN 4
	     WHEN $1 >= 40 AND $1 < 50 THEN 5
	     WHEN $1 >= 50 AND $1 < 60 THEN 6
	     WHEN $1 >= 60 AND $1 < 70 THEN 7
	     WHEN $1 >= 70 AND $1 < 80 THEN 8
	     WHEN $1 >= 80 AND $1 < 90 THEN 9
	     WHEN $1 >= 90 AND $1 < 100 THEN 10
	     WHEN $1 >= 100 AND $1 < 110 THEN 11
	     WHEN $1 >= 110 AND $1 < 120 THEN 12
	     WHEN $1 >= 120 AND $1 < 130 THEN 13
	     WHEN $1 >= 130 AND $1 < 140 THEN 14
	     WHEN $1 >= 140 AND $1 < 150 THEN 15
	     WHEN $1 >= 150 AND $1 < 160 THEN 16
	     WHEN $1 >= 160 AND $1 < 170 THEN 17
	     WHEN $1 >= 170 AND $1 < 180 THEN 18
	     ELSE 0
	END;
$BODY$
  LANGUAGE sql IMMUTABLE
  COST 100;
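The CASE ladder is just 10-unit binning with 0 as the catch-all for heights of 180 and above; the same logic in Python, handy for sanity-checking the class boundaries (my paraphrase, not code from the original post):

```python
def height_classify(height):
    """Mirror of the SQL classifier: anything below 10 (including negatives)
    is class 1, 10-unit bins up to 180 give classes 2..18, the rest map to 0."""
    if height < 10:
        return 1
    if height < 180:
        return int(height // 10) + 1
    return 0

# Class boundaries behave like the SQL CASE ladder
print([height_classify(h) for h in (5, 15, 175, 200)])  # → [1, 2, 18, 0]
```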

And now, let’s pull apart our point cloud, calculate heights from approximate ground, then put it all back together in chips grouped by height classes and aggregates of our original chips. Yes, I will explain more later. And don’t do this at home– I’m not even sure if it makes sense yet… :

-- First we explode the patches into points,
-- and along the way grab the minimum value
-- for the patch, and the id
WITH lraw AS (
    SELECT PC_Explode(pa) AS ex,  PC_PatchMin(pa, 'Z') AS min, id
    FROM test1
),
-- Calculate our height value ( Z - min )
heights AS (
    SELECT PC_Get(ex, 'X') || ',' || PC_Get(ex, 'Y') || ',' -- grab X and Y
    || PC_Get(ex, 'Z') - min                -- calculate height
    || ',' ||
    -- And return our remaining dimensions
    PC_Get(ex, 'Intensity') || ',' || PC_Get(ex, 'ReturnNumber') || ',' || PC_Get(ex, 'NumberOfReturns') || ',' ||
    PC_Get(ex, 'ScanDirectionFlag') || ',' || PC_Get(ex, 'EdgeOfFlightLine') || ',' ||
    PC_Get(ex, 'Classification') || ',' || PC_Get(ex, 'ScanAngleRank') || ',' || PC_Get(ex, 'UserData') || ',' ||
    PC_Get(ex, 'PointSourceId') || ',' || PC_Get(ex, 'Time') || ',' || PC_Get(ex, 'PointID') || ',' || PC_Get(ex, 'BlockID')
    AS xyheight, PC_Get(ex, 'Z') - min AS height, id FROM lraw
),
-- Now we can turn this aggregated text into an array and then a set of points
heightcloud AS (
    SELECT PC_MakePoint(1, string_to_array(xyheight, ',')::float8[]) pt, height_classify(height) heightclass, id FROM heights
)
-- Finally, we bin the points back into patches grouped by our height class and original ids.
SELECT PC_Patch(pt), id / 20 AS id, heightclass FROM heightcloud GROUP BY id / 20, heightclass;

The resultant output should allow us to query our database by height above ground and patch. Now we can generate vertical summaries of our point cloud. Clever or dumb? It’s too early to tell.
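The regrouping in the final SELECT is two-key binning: original chip ids are coarsened 20:1, and each coarse group is split per height class. A toy Python version of that bookkeeping (made-up points standing in for the exploded PC_* records):

```python
from collections import defaultdict

# (original_chip_id, height_above_ground) pairs standing in for exploded points
points = [(0, 3.0), (1, 12.5), (19, 4.2), (20, 55.0), (39, 61.3)]

def height_class(h):
    # same 10-unit bins as the height_classify function above
    return 1 if h < 10 else (int(h // 10) + 1 if h < 180 else 0)

bins = defaultdict(list)
for chip_id, height in points:
    # GROUP BY id / 20, heightclass
    bins[(chip_id // 20, height_class(height))].append(height)

# Each key is (aggregated patch id, height class), like the PC_Patch grouping
print(sorted(bins))  # → [(0, 1), (0, 2), (1, 6), (1, 7)]
```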

by smathermather at July 12, 2014 01:56 AM

July 11, 2014

Stephen Mather

LiDAR and pointcloud extension pt 4

Height cloud overlayed with patchs

Now we navigate into unknown (and perhaps silly) territory: we do point-level calculations of heights, and drop the newly calculated points back into the point cloud as though they were a Z value. I don't recommend this code as a practice; this is as much me thinking aloud as anything. Caveat emptor:

-- First we explode the patches into points,
-- and along the way grab the minimum value
-- for the patch, and the id
WITH lraw AS (
	SELECT PC_Explode(pa) AS ex,  PC_PatchMin(pa, 'Z') AS min, id
	FROM test1
),
-- Calculate our height value ( Z - min )
heights AS (
	SELECT PC_Get(ex, 'X') || ',' || PC_Get(ex, 'Y') || ',' -- grab X and Y
	|| PC_Get(ex, 'Z') - min				-- calculate height
	|| ',' ||
	-- And return our remaining dimensions
	PC_Get(ex, 'Intensity') || ',' || PC_Get(ex, 'ReturnNumber') || ',' || PC_Get(ex, 'NumberOfReturns') || ',' ||
	PC_Get(ex, 'ScanDirectionFlag') || ',' || PC_Get(ex, 'EdgeOfFlightLine') || ',' ||
	PC_Get(ex, 'Classification') || ',' || PC_Get(ex, 'ScanAngleRank') || ',' || PC_Get(ex, 'UserData') || ',' ||
	PC_Get(ex, 'PointSourceId') || ',' || PC_Get(ex, 'Time') || ',' || PC_Get(ex, 'PointID') || ',' || PC_Get(ex, 'BlockID')
	AS xyheight, id FROM lraw
),
-- Now we can turn this aggregated text into an array and then a set of points
heightcloud AS (
	SELECT PC_MakePoint(1, string_to_array(xyheight, ',')::float8[]) pt, id FROM heights
)
-- Finally, we bin the points back into patches
SELECT PC_Patch(pt) FROM heightcloud GROUP BY id / 20;

by smathermather at July 11, 2014 03:11 AM

July 09, 2014

Stephen Mather

LiDAR and pointcloud extension pt 3

Digging a little deeper. Ran the chipper on a smaller number of points and then am doing a little hacking to get height per chip (if you start to get lost, go to Paul Ramsey’s tutorial).

Here’s my pipeline file. Note the small chipper size– 20 points per chip.

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="drivers.pgpointcloud.writer">
    <Option name="connection">dbname='pggis' user='pggis' host='localhost'</Option>
    <Option name="table">test1</Option>
    <Option name="srid">3362</Option>
    <Filter type="filters.chipper">
      <Option name="capacity">20</Option>
      <Filter type="filters.cache">
        <Reader type="drivers.las.reader">
          <Option name="filename">60002250PAN.las</Option>
          <Option name="spatialreference">EPSG:3362</Option>
        </Reader>
      </Filter>
    </Filter>
  </Writer>
</Pipeline>

Easy enough to load (though slow, owing to the small chip size):

pdal pipeline las2pg.xml

Now we can do some really quick and dirty height calculations per chip — like what’s the difference between the minimum and maximum value in the chip:

DROP TABLE IF EXISTS patch_heights;

CREATE TABLE patch_heights AS
SELECT
  pa::geometry(Polygon, 3362) AS geom,
  PC_PatchAvg(pa, 'Z') AS elevation,
  PC_PatchMax(pa, 'Z') - PC_PatchMin(pa, 'Z') AS height
FROM test1;

Patch heights showing vegetation and open areas

by smathermather at July 09, 2014 02:55 AM

Stephen Mather

LiDAR and pointcloud extension pt 2

As much for my edification as anyone's:
Want to get at just a few points within the pointcloud extension? A WITH query will get you there:


-- get me some patches (our lidar points are stored in patches)
WITH patches AS (
  SELECT pa FROM test
),
-- Now, I want all the points from each patch
pa_pts AS (
  SELECT PC_Explode(pa) AS pts FROM patches
)
-- OK. Can I have that as standard old points, PostGIS style?
SELECT pts::geometry FROM pa_pts;
--k thanx

LiDAR points shown within their patches

by smathermather at July 09, 2014 02:35 AM

Stephen Mather

LiDAR and pointcloud extension

Paul Ramsey has a great tutorial on using the pointcloud extension with PostgreSQL / PostGIS: http://workshops.boundlessgeo.com/tutorial-lidar/

Image of lidar chipped into 400 point envelopes

You can get point cloud and all that goodness running a variety of ways. Probably the easiest is to download OpenGeo Suite from Boundless: http://boundlessgeo.com/solutions/opengeo-suite/download/

If you are an Ubuntu user, try a docker instance to run PostGIS with PDAL, pointcloud, etc in a container:


Also, I’m working toward a simple installer script for Ubuntu 14.04 and for Vagrant, which is a fork of Vincent’s docker install:


One tip for those using, say QGIS instead of GeoServer for connecting and visualizing your data. Change any views created on the fly to tables, e.g.:

CREATE VIEW medford_patches AS
SELECT
  pa::geometry(Polygon, 4326) AS geom,
  PC_PatchAvg(pa, 'Z') AS elevation
FROM medford;

change to:

CREATE TABLE medford_patches AS
SELECT
  pa::geometry(Polygon, 4326) AS geom,
  PC_PatchAvg(pa, 'Z') AS elevation
FROM medford;

QGIS doesn't like those views with casts, for some reason, without an id.

If the code I’m working on now works, it will do some height calculations using pointcloud, will show up on Github, and may include a little bonus pointcloud tutorial here. Stay tuned.

by smathermather at July 09, 2014 02:09 AM

July 07, 2014

Paul Ramsey

FOSS4G 2014 in Portland, Oregon, September 2014

Just a quick public service announcement for blog followers in the Pacific Northwest and environs: you've got a once in a not-quite-lifetime opportunity to attend the "Free and Open Source Software for Geospatial" (aka FOSS4G) conference this year in nearby Portland, Oregon, a city so hip they have trouble seeing over their pelvis.

Anyone in the GIS / mapping world should take the opportunity to go, to learn about what technology the open source world has available for you, to meet the folks writing the software, and the learn from other folks like you who are building cool things.

September 8th-13th, be there and be square.

by Paul Ramsey (noreply@blogger.com) at July 07, 2014 09:07 PM

July 03, 2014


QGIS versioning plugin

We developed a tool to manage data history and branches, and to work offline with your PostGIS-stored data and QGIS. Read on for an inside look at the QGIS versioning plugin.

The QGIS plugin is available in the QGIS plugin repository, and you can fork it on GitHub too!

Capturing data offline


Even though the need for data versioning often arises, no standard solution exists for databases.

The GeoGit project proposes a solution to store versioned geospatial data. There is also an existing plugin for QGIS, pgversion, which uses views and triggers to version a PostGIS database.

Unfortunately those solutions were not adapted to the specific constraints of this project, namely: using a PostGIS database as the main repository (which excludes GeoGit) and the ability to work off-line (which excludes pgversion).

The QGIS/PostGIS versioning we developed looks like the following.


The database is stored in a PostGIS schema, the complete schema is versioned (i.e. not individual tables). Revisions are identified by a revision number. A revision table in the versioned schema, called 'revisions', keeps track of the date, author, commit message and branch of all revisions.

Once a table structure is defined, three operations can be performed on rows: INSERT, DELETE and UPDATE. To be able to track history, every row is kept in the tables. Deleted rows are marked as such and updated rows are a combined insertion-deletion where the deleted and added rows are linked to one another as parent and child.

A total of five columns are needed for versioning the first branch:

  • a unique identifier across the table
  • the revision when this record was inserted
  • the last revision for which this record exists (i.e. the revision when it was deleted, minus one)
  • in case the row has been inserted as the result of an update, a field storing the hid of the row that has been updated
  • in case the row has been marked as deleted as the result of an update, a field storing the hid of the row that has been inserted in its place

For each additional branch, four additional columns are needed (the ones with the prefix branch_).

If the branch_rev_begin is null, it means that a row belongs to another branch.

SQL views are used to see the database at a given revision number. If we call 'rev' the revision we want to see, then for each table the condition for a row to be present in the view is:

(branch_rev_end IS NULL OR branch_rev_end >= rev) AND branch_rev_begin <= rev

In the special case of the current revision, or head revision, the condition reads:

branch_rev_end IS NULL AND branch_rev_begin IS NOT NULL
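
As a sketch (assuming a table mytable versioned on a branch named "trunk"), the view for revision 42 would read:

```sql
-- View of mytable as it was at revision 42, branch "trunk"
-- (table name and branch name are hypothetical)
CREATE VIEW mytable_rev_42 AS
SELECT * FROM mytable
WHERE (trunk_rev_end IS NULL OR trunk_rev_end >= 42)
  AND trunk_rev_begin <= 42;
```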
Since elements are never physically deleted from a historized table (they are merely marked as such), care must be taken with the definition of constraints, in particular the conceptual uniqueness of field values.

Within the PostGIS database, the views on revisions must be read-only and historized tables should not be edited directly. This is a basic principle of version control: edits must be made to working copies and then committed to the database. Please note that by default PostgreSQL 9.3 creates updatable views.

Workflow schema

This setup allows for multiple users to use and edit data offline from a central repository, and commit their modifications concurrently.

Working copies

Two kinds of working copies are available:

SpatiaLite working copies
They are meant to be used offline. They consist of the versioned tables of a given versioned database (i.e. PostGIS schema) or any subset. For each table, only the elements that have not been marked as deleted in the head revision need to be present. Furthermore, only the subset of elements the user needs to edit can be selected (e.g. a spatial extent). To create a working copy (i.e. to checkout), tables from the versioned schema (or the aforementioned subsets) are converted to a SpatiaLite database using ogr2ogr.

PostGIS working copies
They are meant to be used when the connection to the original database will remain available. They are quite similar to pgversion working copies since they only store differences from a given revision (the one checked out).

The following description is aimed at understanding the inner workings of the qgis versioning plugin. The user does not need to perform the described operations manually.

For each versioned table in the working copy, a view is created with the suffix _view (e.g. mytable_view). Those views typically filter out the historization columns and show the head revision. A set of triggers is defined to allow updates on those views (DELETE, UPDATE and INSERT).

The DELETE trigger simply marks the end revision of a given record.

The INSERT trigger creates a new record and fills the branch_rev_begin field.

The UPDATE trigger creates a new record and fills the branch_rev_begin and branch_parent fields. It then marks the parent record as deleted, and fills the branch_rev_end and branch_child fields.
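
Spelled out as plain SQL, an update of the row with hid = 3 at current revision r amounts to something like the following (a sketch with illustrative names, not the plugin's actual trigger code):

```sql
-- Sketch of the work done by the UPDATE trigger for the row hid = 3
-- at current revision r, on a branch named "trunk" (names are illustrative)
INSERT INTO mytable (trunk_rev_begin, trunk_parent, name, geom)
VALUES (r, 3, 'new name', NULL);    -- the updated state becomes a child row

UPDATE mytable
SET trunk_rev_end = r - 1,          -- the parent row existed up to r - 1
    trunk_child   = lastval()       -- link the parent to its new child
WHERE hid = 3;
```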

Updating the working copy

Changes can be made to the database while the working copy is being edited. In order to reconcile those edits, the user needs to update the working copy.

When updating, a set of records can be in conflict: the records for which the end revision has been set since the initial checkout (or the last update, if any).

Multiple edits can be made to the same record. Therefore the child relation must be followed to the last child in order to present the user with the latest state of a given conflicting feature.

Conflicts are stored in a table and identified with a conflict id and the tag 'theirs' or 'mine'. A DELETE trigger on this table is used for conflict resolution. On deletion of 'mine', the working copy edit is discarded; on deletion of 'theirs', the working copy edit is appended to the feature history (i.e. the working copy feature becomes a child of the last state of the feature in the historized database).
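For example, keeping the working copy's version of a conflicting feature boils down to deleting the database's entry from the conflict table (a sketch; the actual conflict table and column names are managed by the plugin):

```sql
-- Keep "mine" for conflict 7: deleting "theirs" fires the resolution trigger
-- (table and column names here are illustrative)
DELETE FROM mytable_conflicts
WHERE conflict_id = 7 AND origin = 'theirs';
```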

Committing the edits to the versioned database

If a working copy is up to date, the edits can be integrated into the versioned database. This operation consists simply of inserting a record into the revisions table and, for each versioned table, updating the rows that differ and inserting the rows that are not yet present.


A branch can be created from any revision by adding the four history columns and setting the branch_rev_begin field of the features present at that revision.

Plugin interface tutorial

Groups are used for all versioning operations in QGIS since the versions are for a complete PostGIS schema or SpatiaLite database. The versioning toolbar will change depending on the group selected in the QGIS legend.

The group elements must share the same connection information (i.e. the same database and schema for PostGIS working copies and revision views, or the same SpatiaLite database for SpatiaLite working copies).

Versioning a PostGIS schema

Starting with an unversioned database, import a number of layers from a schema that needs to be versioned into a QGIS project. Once the layers are imported, they must be grouped together.

Selecting the newly created group will cause the versioning toolbar to display the historize button (green V). On click, a confirmation is requested before versioning the database schema.

The versioned layers are imported in a new group and the original layers are removed from the project.

The symbology is not kept in the process.

Working with a versioned PostGIS schema

Versioned layers can be imported in QGIS. The layers must be from a head revision or a view on any revision.

Once the layers are in QGIS, they must be grouped.

For PostGIS groups at head revision, the versioning plugin allows the user to create a SpatiaLite or a PostGIS working copy, create a view on a given revision or create a branch. A corresponding group will be imported in QGIS.

If the user chooses to create a SpatiaLite working copy, he will be asked to select a file to store the working copy.

Committing changes

Once the working copy is imported in QGIS, the user can start editing its layers. With SpatiaLite working copies, this editing can be done offline.

When the user is done editing, he can commit the changes to the database. If the commit is feasible (i.e. the working copy is up to date with the versioned database), he will be prompted for a commit message and subsequently informed of the revision number he committed.

If the commit is not feasible, the user will be informed that he must update his working copy prior to commit.

Resolving conflicts

Conflicts are detected during update; the user is informed, and conflict layers are imported into QGIS.

To resolve conflicts, the user can open the conflict layer's attribute table. Selected entries are also selected on the map canvas and the user can decide which version, either his or the database's, he wants to keep. User version is tagged with 'mine' and database version with 'theirs'. The conflict is resolved by deleting the unwanted entry in the conflict layer.

On deletion of one conflict entry, both entries are removed (by a trigger) but the attribute table (and canvas) are not refreshed. As a workaround, the user can close and re-open the attribute table to see the actual state of the conflict table.

Once the conflict table is empty, the commit can be done.


Due to design choices and tools used for conversion to SpatiaLite, a number of restrictions apply to the versioned database:

  • schema, table and branch names should not contain spaces, capital letters or quotes
  • tables must have primary keys
  • columns are lowercase (because of the conversion to SpatiaLite) but can contain spaces (not that it's recommended)
  • the geometry column is named geom in PostGIS and GEOMETRY in SpatiaLite

Do not edit OGC_FID or ROWID.
The constraints on the tables are lost in the PostGIS to SpatiaLite conversion.

Known bug

The conflict layer won't be loaded automatically if it has no geometry. The user will have to load it manually.

by Vincent Mora at July 03, 2014 10:00 AM

June 30, 2014


QGIS 2.4 release out, Oslandia inside

There is a new QGIS release out: version 2.4, codename Chugiak. Binary packages for your platform have been generated, and you can directly download and try out this new release of the famous desktop GIS software.

QGIS 2.4 has a lot of new features in all of its components. There is a visual changelog available where you can discover QGIS improvements.

Oslandia inside

Oslandia is a QGIS core contributor, and we have been busy improving QGIS 2.4. We contributed to several of these new features. Here are a few enhancements we developed.

Predefined scales mode for atlas maps

When working with atlas map items, you can now specify a predefined scale mode for the map. It will use the best fitting option from the list of predefined scales in your project properties settings (see Project -> Project Properties -> General -> Project Scales to configure these predefined scales).

This feature has been funded by the city of Uster

New Inverted Polygon renderer

The biggest feature Oslandia developed is the Inverted Polygon renderer. This feature has been funded by Agence de l'Eau Adour-Garonne and mainly developed by Hugo Mercier.

A new renderer has been added for polygon features, allowing you to style everything outside your polygons. This can be useful for highlighting areas or for creating cartographic masks. When used with the new shapeburst style, you can produce output as shown in the image for this entry.

New Mask Plugin

Alongside the inverted polygon renderer feature, Oslandia developed a new Mask plugin. It enables you to make atlases focusing on the specific feature you are interested in, hiding the rest with a really nice effect. Furthermore, it helps mask the labels when you mask geometry objects.

You can read more details on these features on this blog post.

This plugin has also been funded by Agence de l'Eau Adour-Garonne.

Layered SVG export

Another feature implemented in this version is the ability to export layered SVG files. Previously, all features were exported to a single layer, whatever the originating QGIS layer was. Now you can use Inkscape or Illustrator and their layering capabilities to finish the design of your map with greater ease. There is also an option to vectorize labels.

This feature has been funded by the Agence de Développement du Grand Amiénois (ADUGA).

WAsP format support

The WAsP format is the standard format for roughness and elevation in the wind science field. This format was not supported by QGIS until recently, when Vincent Mora added WAsP to QGIS's supported GIS file formats. In fact, we did better: we implemented WAsP support in GDAL/OGR, so that any software using this library is now able to read and write WAsP files. WAsP support is available starting from GDAL/OGR 1.11.0.

This was an opportunity to add Vincent Mora as an official GDAL/OGR committer, in charge of maintaining this driver. This feature enables wind management operations to be completed in QGIS with a better user experience: no more file conversion before working on the GIS side. We also developed a companion plugin to manage data simplification when needed. It is available in the QGIS plugins repository.

With this work, QGIS becomes a great complement to open-source computational wind engineering software like ZephyTools.

This work has been funded by La Compagnie du Vent

Epanet Plugin

Oslandia has integrated the EPANET water distribution model inside QGIS Processing, as a plugin. You can read more on this plugin on this blog .

Epanet integration has been funded by European funds and the GIS office of Apavil, Romania.

Vizitown Plugin

Vizitown is part of our efforts on 3D GIS development , alongside PostGIS 3D and more. It is a QGIS plugin allowing users to display QGIS layers in 3D in a Three.js / WebGL environment, in a browser. It can leverage PostGIS 3D, and display live data from the database, as well as other sources of data. It can display DEM, a raster background, 2D vector data draped on the DEM, 2.5D data (e.g. buildings), or real 3D Meshes. The user can set a symbology in QGIS and see the modifications live in the browser in 3D.

You can see Vizitown in action on Youtube. Vizitown has been developed with IG3 students from ESIPE.

Multiple bugfixes

Oslandia also works continuously on improving QGIS quality, and we try to fix as many bugs as we can.

These bugfixes are funded by our QGIS support clients, as well as by the French Ministère de l'environnement and Agence de l'Eau Adour-Garonne.

What next ?

We continue to work on new QGIS features, fixes, refactoring and integration with other tools. Notably, we studied a possible unification of all database-like features in QGIS, using SQLite/SpatiaLite features. We also intend to work on native read/write support for MapInfo TAB files.

We offer a wide range of services around QGIS, be it for training, assistance, development, or consulting in general.

We also propose various support opportunities for QGIS. This is the best way for you to improve the quality of this software, contribute to its development, and ensure that you can work in good conditions without having to worry about potential bugs. Our team of experienced engineers, who contribute to QGIS core, will be available in case of any critical bug.

We can offer you personalized support, with specific conditions and fares. Do not hesitate to contact us at infos@oslandia.com .

by Vincent Picavet at June 30, 2014 12:00 PM

June 25, 2014


Postgis, PostgreSQL's spatial partner - Part 3

You have arrived at the final chapter of this PostGIS introduction series. Before continuing, I recommend you read chapter one and chapter two first.

In the last chapter we finished by doing some real world distance measuring and we saw how different projections pushed forward different results.

Today I would like to take this practical approach a bit further and continue our work with real world data by showing you around the town of Kin in Okinawa. The town where I live.

A word before we start

In this chapter I want to do a few experiments together with you on real world data. To gather this data, I would like to use OpenStreetMap because it is not only open but also gives us handy tools to export map information.

We will use a tool called osm2pgsql to load our OSM data into PostGIS-enabled tables.

However, it is more common to import and export real-world GIS data using the semi-closed ESRI shapefile format. OpenStreetMap does not support exporting to this shapefile format directly, but exports to a more open XML format (.osm) instead.

Therefore, near the end of this post, we will briefly cover these shapefiles as well and see how we can import them into our PostgreSQL database. But for the majority of our work today, I will focus on the OpenStreetMap approach.

The preparation

Let us commence this adventure by first getting all the GIS data related to the whole of Okinawa. We will only be interested in the data related to Kin town, but I need you to pull in a data set that is large enough (yet still tiny in PostgreSQL terms) for us to experiment with indexing.

Hop online and download the file being served at the following URL: openstreetmap.org Okinawa island. It is a file of roughly 180 MB and covers most of the Okinawan main island. Save the presented "map" file.

Next we will need to install a third party tool which is specifically designed to import this OSM file into PostGIS. This tool is called osm2pgsql and is available in many Linux distributions.

On a Debian system:

apt-get install osm2pgsql

Loading foreign data

Now we are ready to load in this data. But first, let us clean our "gis" database we used before.

Since all these import tools will create their own PostGIS enabled tables, we can delete our "shapes" table. Connect to your "gis" database and drop this table:

DROP TABLE shapes;

Using this new tool, repopulate the "gis" database with the data you just downloaded:

osm2pgsql -s -U postgres -d gis map

If everything went okay, you will get a small report containing information about all the tables and indexes that were created.

Let us see what we just did.

First we ran osm2pgsql with the -s flag. This flag enables slim mode, which means the tool will use a database on disk rather than processing all the GIS data in RAM. Processing in RAM not only potentially slows down your machine for larger data sets, it also makes fewer features available.

Next we tell the tool to connect as the user "postgres" and load the data into the "gis" database. The final argument is the "map" file you just downloaded.

What do we have now?

Open up a database console and let us describe our database to see what this tool just did:


As you can see, it inserted 7 new tables:

Schema |        Name        | Type  |  Owner   
public | geography_columns  | view  | postgres
public | geometry_columns   | view  | postgres
public | planet_osm_line    | table | postgres
public | planet_osm_nodes   | table | postgres
public | planet_osm_point   | table | postgres
public | planet_osm_polygon | table | postgres
public | planet_osm_rels    | table | postgres
public | planet_osm_roads   | table | postgres
public | planet_osm_ways    | table | postgres
public | raster_columns     | view  | postgres
public | raster_overviews   | view  | postgres
public | spatial_ref_sys    | table | postgres

The other 5 views and tables are the good old PostGIS bookkeeping.

It is also important, yet less relevant for our work here today, to know that these tables, or rather the way osm2pgsql imports them, are optimized to work with Mapnik. Mapnik is an open-source map-rendering software package used for both web and offline usage.

The tables that are imported contain many different types of information. Let me quickly go over them to give you a basic feeling of how the import happened:

  • planet_osm_line: holds all non-closed pieces of geometry (called ways) at a high resolution. They mostly represent actual roads and are used when looking at a small, zoomed-in detail of a map.
  • planet_osm_nodes: an intermediate table that holds the raw point data (points in lat/long) with a corresponding "osm_id" to map them to other tables
  • planet_osm_point: holds all points of interest together with their OSM tags - tags that describe what they represent
  • planet_osm_polygon: holds all closed pieces of geometry (also called ways) like buildings, parks, lakes, areas, ...
  • planet_osm_rels: an intermediate table that holds extra connecting information about polygons
  • planet_osm_roads: holds lower-resolution, non-closed pieces of geometry, in contrast with "planet_osm_line". This data is used when looking from a greater distance, covering a much larger area and thus less detail about smaller, local roads.
  • planet_osm_ways: an intermediate table that holds non-closed geometry in raw format

We will now continue working with a small subset of this data.

Let us take a peek at the Polygons tables for example. First, let us see what we have available:

\d planet_osm_polygon

That is quite a big list, but most of these columns are of plain TEXT type and contain human-readable information about the stored geometry. These columns correspond to the way OpenStreetMap categorizes its data and to the way you could use the Mapnik software described above.

Let us do a targeted query:

SELECT name, building, ST_AsText(way)
    FROM planet_osm_polygon
    WHERE building = 'industrial';

Notice that I use the output function ST_AsText() to convert the geometry to a human-readable WKT string. Also, since I am only interested in the industrial buildings, I set the building type to 'industrial'.

The result:

                    name                      |  building  |                                                                     st_astext                                                                   
 沖ハム (Okiham)                               | industrial | POLYGON((14221927.83 3049797.01,14222009.77 3049839.68,14222074.84 3049714.68,14222028.9 3049690.76,14221996.33 3049753.33,14221960.34 3049734.58,14221927.83 3049797.01))
Kin Thermal Power Plant Coal storage building | industrial | POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))
Kin Thermal Power Plant Exhaust tower         | industrial | POLYGON((14240167.1 3054497.14,14240172.26 3054507.93,14240176.04 3054515.82,14240195.76 3054506.39,14240186.84 3054487.7,14240167.1 3054497.14))

We get back three records containing one industrial building each, described with a closed polygon. Cool.

Now, I can assure you that Okinawa has more than three industrial buildings, but do remember that we are looking at a rather rural island. OpenStreetMap relies greatly on user-generated content and there simply are not many users who have felt the need to index the industrial buildings in this neck of the woods.

The planet_osm_polygon table does contain a little over 6000 buildings of various types, which is still a small number, but for our purpose today I am only interested in the latter two of the three results above, which both lie here in Kin town.

Also, if you were to take, for example, a chunk of Tokyo, where there are hundreds of active OpenStreetMap contributors, you would find that many buildings are present and are sometimes even more accurately represented than in some other online proprietary mapping solutions offered by some famous search engines. Ahem.

Before continuing, though, I would like to delete two GiST indexes that "osm2pgsql" made for us, purely to be able to demonstrate the importance of an index.

For now, just take my word for it and delete the indexes on the geometry columns of the tables we will use today:

DROP INDEX planet_osm_line_index;
DROP INDEX planet_osm_polygon_index;

Then perform a VACUUM:

VACUUM ANALYZE planet_osm_line;
VACUUM ANALYZE planet_osm_polygon;

VACUUM together with ANALYZE forces PostgreSQL to recheck the whole table for any changed conditions, as is the case now that we removed the indexes.

The first thing I would like to find out is how large these buildings actually are. We cannot measure how tall they are, for we are working with two-dimensional data here, but we can measure their footprint on the map.

Since PostGIS makes all of our work easy, we could simply employ a function to tell us this information:

SELECT ST_Area(way)
    FROM planet_osm_polygon
    WHERE building = 'industrial';

And get back:


As we know from the previous chapter, to be able to know what these numbers mean, we have to find out in which SRID this data was saved. You could either describe the table again and look at the geometry column description, or use the accessor function ST_SRID() to find it:

SELECT ST_SRID(way)
    FROM planet_osm_polygon
    WHERE building = 'industrial';

And get back:


You could also query the PostGIS bookkeeping directly and look in the geometry_columns view:

SELECT f_table_name, f_geometry_column, coord_dimension, srid, type
    FROM geometry_columns;

This view holds information about all the geometry columns in our PostGIS enabled database. Our above query will return a list containing all the GIS describing information we saw in the previous chapter.

Nice. Our buildings are stored in a geometry column and have an SRID of 900913. We can now use our spatial_ref_sys table to look up this ID:

SELECT srid, auth_name, auth_srid, srtext, proj4text
    FROM spatial_ref_sys
    WHERE srid = 900913;

As you can see, this is basically a Mercator projection used by OpenStreetMap. In the "proj4text" column we can see that its units are meters.

This thus means that the information we get back is in square meters.

In this map (looking only at the latter two Kin buildings) we thus have a building with a total area of roughly 33,000 square meters and a more modest building of around 452 square meters. The former is a coal storage facility belonging to the Kin Thermal Power Plant and is indeed huge. The second building represents the exhaust tower of that same plant.
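
Keep in mind that SRID 900913 is a spherical Mercator projection, which stretches distances (and thus inflates areas) away from the equator, so these figures are somewhat larger than the true ground footprint. If you want a figure measured on the spheroid instead, PostGIS's geography type can help (a sketch, assuming PostGIS 1.5 or later with the geography type available):

```sql
-- Measure the footprints on the spheroid instead of in Mercator units
SELECT name, ST_Area(ST_Transform(way, 4326)::geography) AS area_m2
    FROM planet_osm_polygon
    WHERE building = 'industrial';
```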

You have just measured the area these buildings occupy, very neat right?

Now, let us find out which road runs next to this power plant, in case we wish to drive there. It is important to note that OSM (and many other mapping solutions) divides roads into different types.

You have trunk roads, highways, secondary roads, tertiary roads, etc. I am now interested to find the nearest secondary road.

To get a list of all the secondary roads in Okinawa, simply query the planet_osm_line table:

SELECT ref, ST_AsText(way) 
    FROM planet_osm_line
    WHERE highway = 'secondary';

We now get back all the linestring objects together with their reference inside of OSM. The reference refers to the actual route number each road has.

The total count should be around 3215 pieces of geometry, which is already a nice list to work with.

Let us now see which of these roads is closest to our coal storage building.

To find out how far something is (nearest neighbor search) we could use our ST_Distance() function we used in the previous chapter and perform the following lookup:

SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
    FROM planet_osm_polygon AS building, planet_osm_line AS road
    WHERE road.highway = 'secondary' AND building.name = 'Kin Thermal Power Plant Coal storage building'
    ORDER BY distance;

This will bring us:

 highway  | ref |     distance     
secondary | 329 | 417.374986575458
secondary | 104 | 2258.90394593648
secondary | 104 | 2709.00178089638
secondary | 104 | 2745.76782385198
secondary | 234 | 5897.78205314507

Cool, secondary route 329 is the closest to our coal storage building with a distance of 417 meters.

While this returns quite accurate results, there is one problem with this query: indexes are not being used. And every time an index is potentially left unused, you should start to worry, especially with larger data sets.

How do I know they are ignored? Simple, we did not make any indexes (and we deleted the ones made by "osm2pgsql")...which makes me pretty sure we cannot use them.

I refer you to chapter three of my PostgreSQL Full Text series where I talk a bit more about GiST and B-Tree index types. And, as I also say in that chapter, I highly recommend reading Markus Winand's Use The Index, Luke series, which explains in great detail how database indexes work.

The first thing to realize is that an index will only be used if the data set on which it is built is of sufficient size. PostgreSQL has an AI built in, called the query planner, which will decide whether or not to use an index.

If your data set is small enough, a more traditional Sequential Scan will be just as fast or faster.

To know what is going on exactly and to know how fast our query runs, we have the EXPLAIN command at our disposal.

Speeding things up

Let us EXPLAIN the query we have just run:

EXPLAIN ANALYZE SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
    FROM planet_osm_polygon AS building, planet_osm_line AS road
    WHERE road.highway = 'secondary' AND building.name = 'Kin Thermal Power Plant Coal storage building'
    ORDER BY distance;

We simply put the keyword EXPLAIN (and ANALYZE to give us total runtime) right in front of our normal query.

The result:

Sort  (cost=5047.50..5055.32 rows=3129 width=391) (actual time=41.481..41.815 rows=3215 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 348kB
    ->  Nested Loop  (cost=0.00..4309.34 rows=3129 width=391) (actual time=1.188..38.617 rows=3215 loops=1)
        ->  Seq Scan on planet_osm_polygon building  (cost=0.00..279.01 rows=1 width=207) (actual time=0.981..1.409 rows=1 loops=1)
            Filter: (name = 'Kin Thermal Power Plant Coal storage building'::text)
            Rows Removed by Filter: 6320
        ->  Seq Scan on planet_osm_line road  (cost=0.00..3216.79 rows=3129 width=184) (actual time=0.166..26.524 rows=3215 loops=1)
            Filter: (highway = 'secondary'::text)
            Rows Removed by Filter: 73488
Total runtime: 42.153 ms

That is a lot of output, but it shows you how the internal planner executes our query and which decisions it makes along the way.

To fully interpret a query plan (this is still a simple one), a lot more knowledge is needed and this would easily deserve its own series. I am by far not an expert in the query planner (though it is an interesting study topic), but I will do my best to extract the important bits we need for our direct performance tuning.

A query plan is always made up out of nested nodes, the parent node containing all the accumulated information (costs, rows, ...) of its child nodes.

Inside the nested loop parent node we see above, we can find that the planner decided to use two filters, which correspond to the WHERE clause conditions of our query (building.name and road.highway). You can see that both child nodes are of Seq Scan type, which means Sequential Scan. These types of nodes scan the whole table, simply from top to bottom, directly from disk.

Another important thing to note is the total time this query takes: 42.153 ms. The time reported here is the time on my local machine; depending on how decent your computer is, this time could vary.

A detail not to forget when looking at this timing is the fact that it is slightly skewed compared to real-world application use:

  • We neglect network/client traffic. This query now runs internally and does not need to communicate with a client driver (which almost always brings extra overhead)
  • The time measurement itself also introduces overhead.

The total runtime from our above plan does not sound like a big number, but we are working with a rather small data set - the area of Okinawa is large, but the geometry is rather sparse.

So our first reaction should be: this can be better.

First, let us try to get rid of these sequential scans, for they are a clear indication that the planner does not use an index.

Creating indexes

In our case we want to make two types of indexes:

  • Indexes on our "meta" data: the names and other attributes describing our geometrical data
  • Indexes that actually index our geometrical data itself

Let us start with our attributes columns.

These are all simple VARCHAR, TEXT or INT columns, so the good old Balanced Tree, or B-Tree, can be used here. In our query above we use "road.highway" and "building.name" in our lookup, so let us make a couple of indexes that match this query. Remember, an index only makes sense if it is built the same way your queries interrogate your data.

First, the "highway" column of the "planet_osm_line" table:

CREATE INDEX planet_osm_line_highway_index ON planet_osm_line(highway);

The syntax is trivial. You simply tell PostgreSQL to create an index, give it a name, and tell it on which column(s) of which table you want it to be built. PostgreSQL will always default to the B-Tree index type.

Next, the name column:

CREATE INDEX planet_osm_polygon_name_index ON planet_osm_polygon(name);

Now perform another VACUUM ANALYZE on both tables:

VACUUM ANALYZE planet_osm_line;
VACUUM ANALYZE planet_osm_polygon;

Let us run explain again on the exact same query:

Sort  (cost=4058.73..4066.56 rows=3129 width=394) (actual time=20.817..21.149 rows=3215 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 348kB
    ->  Nested Loop  (cost=72.95..3310.07 rows=3129 width=394) (actual time=1.356..17.743 rows=3215 loops=1)
        ->  Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building  (cost=0.28..8.30 rows=1 width=207) (actual time=0.054..0.056 rows=1 loops=1)
            Index Cond: (name = 'Kin Thermal Power Plant Coal storage building'::text)
        ->  Bitmap Heap Scan on planet_osm_line road  (cost=72.67..2488.23 rows=3129 width=187) (actual time=1.258..4.661 rows=3215 loops=1)
            Recheck Cond: (highway = 'secondary'::text)
            ->  Bitmap Index Scan on planet_osm_line_highway_index  (cost=0.00..71.89 rows=3129 width=0) (actual time=0.864..0.864 rows=3215 loops=1)
                   Index Cond: (highway = 'secondary'::text)
Total runtime: 21.527 ms

You can see that we traded our Seq Scan for an Index Scan and a Bitmap Heap Scan, which indicates that our attribute indexes are being used, yay!

A Bitmap Heap Scan, rather than a Sequential Scan, is performed when the planner decides it can use the index to gather all the rows it thinks it needs, sort them in logical order, and then fetch the data from the table on disk in the most optimized way possible (trying to open each disk page only once).

The order in which the Bitmap Heap Scan arranges the data is directed by its child node, the Bitmap Index Scan. This latter type of node does the actual searching inside the index. Because our WHERE clause tells PostgreSQL to limit the rows to those whose "highway" type is "secondary", the Bitmap Index Scan fetches the needed rows from the B-Tree index we just made and passes them to its parent, the Bitmap Heap Scan, which then orders the geometry rows to be fetched.

This already helped a lot, for our query runtime dropped by half. Now, let us make the indexes for our actual geometry and see the effect:

CREATE INDEX planet_osm_line_way ON planet_osm_line USING gist(way);
CREATE INDEX planet_osm_polygon_way ON planet_osm_polygon USING gist(way);

Creating a GiST index is quite similar to creating a normal B-Tree index. The only difference here is that you specify the index to be built with GiST. Again, perform a VACUUM ANALYZE on both tables afterwards:

VACUUM ANALYZE planet_osm_line;
VACUUM ANALYZE planet_osm_polygon;

Now poke it again with the same query and see our new plan:

Sort  (cost=4038.82..4046.54 rows=3089 width=395) (actual time=21.137..21.479 rows=3215 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 348kB
    ->  Nested Loop  (cost=72.64..3299.76 rows=3089 width=395) (actual time=1.382..17.858 rows=3215 loops=1)
        ->  Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building  (cost=0.28..8.30 rows=1 width=207) (actual time=0.041..0.044 rows=1 loops=1)
            Index Cond: (name = 'Kin Thermal Power Plant Coal storage building'::text)
        ->  Bitmap Heap Scan on planet_osm_line road  (cost=72.36..2488.32 rows=3089 width=188) (actual time=1.297..4.726 rows=3215 loops=1)
            Recheck Cond: (highway = 'secondary'::text)
            ->  Bitmap Index Scan on planet_osm_line_highway_index  (cost=0.00..71.59 rows=3089 width=0) (actual time=0.866..0.866 rows=3215 loops=1)
                  Index Cond: (highway = 'secondary'::text)
Total runtime: 21.873 ms

Hmm, the plan did not change at all, and our runtime is roughly identical. Why is our performance still the same?

The culprit here is ST_Distance().

As it turns out, this function is unable to use the GiST index and is therefore not a good candidate to set loose on your whole result set. The same goes for the ST_Area() function, by the way.

So we need a way to limit the amount of records we do this expensive calculation on.


We introduce a new function: ST_DWithin(). This function could be our savior in this case, for it does use the GiST index.

Whether or not a function (or operator) can use the GiST index depends on whether it uses bounding boxes when performing its calculations. This is because GiST indexes mainly store bounding box information, not the exact geometry itself.

ST_DWithin() checks if given geometry lies within a radius of another piece of geometry and simply returns TRUE or FALSE. We can thus use it in our WHERE clause to filter out geometry for which it returns FALSE (geometry that does not fall within the radius). It performs this check using bounding boxes, and is thus able to retrieve this information from our GiST index.

Let me present you with a query that limits the result set based on what ST_DWithin() finds:

EXPLAIN ANALYZE SELECT road.highway, road.ref, ST_Distance(road.way, building.way) AS distance
    FROM planet_osm_polygon AS building, planet_osm_line AS road
    WHERE road.highway = 'secondary'
        AND building.name = 'Kin Thermal Power Plant Coal storage building'
        AND ST_DWithin(road.way, building.way, 10000)
    ORDER BY distance;

As you can see we simply added one more WHERE clause to limit the returned geometry by radius. This will result in the following plan:

Sort  (cost=45.66..45.67 rows=1 width=395) (actual time=6.048..6.052 rows=27 loops=1)
    Sort Key: (st_distance(road.way, building.way))
    Sort Method: quicksort  Memory: 27kB
    ->  Nested Loop  (cost=4.63..45.65 rows=1 width=395) (actual time=3.157..6.005 rows=27 loops=1)
        ->  Index Scan using planet_osm_polygon_name_index on planet_osm_polygon building  (cost=0.28..8.30 rows=1 width=207) (actual time=0.051..0.054 rows=1 loops=1)
            Index Cond: (name = 'Kin Thermal Power Plant Coal storage building'::text)
        ->  Bitmap Heap Scan on planet_osm_line road  (cost=4.34..37.09 rows=1 width=188) (actual time=3.090..5.771 rows=27 loops=1)
            Recheck Cond: (way && st_expand(building.way, 10000::double precision))
            Filter: ((highway = 'secondary'::text) AND (building.way && st_expand(way, 10000::double precision)) AND _st_dwithin(way, building.way, 10000::double precision))
            Rows Removed by Filter: 4838
                ->  Bitmap Index Scan on planet_osm_line_way  (cost=0.00..4.34 rows=8 width=0) (actual time=1.978..1.978 rows=4865 loops=1)
                    Index Cond: (way && st_expand(building.way, 10000::double precision))
Total runtime: 6.181 ms

Good. We have just gone down to only 6.181 ms. That is much more efficient.

As you can see, our query plan got a few new rows. The main thing to notice is that our Bitmap Heap Scan got another Recheck Cond: our expanded ST_DWithin() condition. Further toward the bottom, you can see that the condition is being pulled from the GiST index:

Index Cond: (way && st_expand(building.way, 10000::double precision))

This seems to be a much more desirable and scalable query.

But there is a drawback: though ST_DWithin() makes for speedy results, it only works with a fixed radius.

As you can see from our usage, we call the function as follows: ST_DWithin(road.way, building.way, 10000). The last argument, "10000", sets how big the search radius is. In this case our geometry is in meters, so we search within a radius of 10 km.

This static radius number is quite arbitrary and might not always be desirable. What other options do we have without compromising performance too much?


Another feature of PostGIS we have not talked about much up until now is the set of spatial operators we have available. You have a total of 16 operators you can use to perform matches on your GIS data.

You have straightforward operators like &&, which returns TRUE if one piece of geometry's bounding box intersects another's, or <<, which returns TRUE if one object is fully to the left of another object.
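As a quick illustration, here is a small sketch (reusing our Okinawa tables) that counts the secondary roads whose bounding box overlaps that of our Kin building:

```sql
-- && only compares bounding boxes, so this is fast but approximate.
SELECT count(*)
    FROM planet_osm_line AS road, planet_osm_polygon AS building
    WHERE building.name = 'Kin Thermal Power Plant Coal storage building'
        AND road.way && building.way;
```

Because && works purely on bounding boxes, a query like this can be answered from the GiST index alone.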

But there are more interesting ones like the <-> and the <#> operators.

The first operator, <->, returns the distance between two points. If you feed it other types of geometry (like a linestring or polygon) it will first draw a bounding box around that geometry and perform a point calculation using the bounding box centroids. A centroid is the calculated center of a piece of geometry (the drawn bounding box in our case).

The second, <#>, acts much the same, but works directly on the bounding boxes of the given geometry. In our case, since we are not working with points, it makes more sense to use this operator.

The big advantage of this distance operator is, once more, the fact that it too calculates using bounding boxes and is thus able to use a GiST index. The ST_Distance() function, however, calculates distances by finding the two points on the given geometry closest to each other, which yields the most accurate result. The <#> operator, as said before, stretches a bounding box around each piece of geometry and therefore deforms our objects, making for less accurate distance measuring.

It is therefore not wise to use <#> to calculate accurate distances, but it is a life saver for sorting away geometry that is too far away to be of interest.

So a proper usage would be to first roughly limit the result set using the <#> operator and then more accurately measure the distance of, say, the first 50 matches with our famous ST_Distance().

Before we continue, it is important to point out that both the <-> and <#> operators can only use the GiST index when either the left or right hand side of the operator is a constant or fixed piece of geometry. This means we have to provide actual geometry using a constructor function.

There are ways around this limitation, for example by providing your own function which converts our "dynamic" geometry into a constant, as Alexandre Neto points out on the PostGIS mailing list.

But this would make this post run way past its initial focus. Let us simply try by providing a fixed piece of geometry. The fixed piece is, of course, still our "Kin Thermal Power Plant Coal storage building", but converted into WKT:

WITH distance AS (
    SELECT way AS road, ref AS route
        FROM planet_osm_line
        WHERE highway = 'secondary'
        ORDER BY ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913) <#> way
        LIMIT 50
)
SELECT ST_Distance(ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913), road) AS true_distance, route
    FROM distance
    ORDER BY true_distance
    LIMIT 1;

This query uses a Common Table Expression or CTE (you could also use a simpler subquery) to first get a rough result set of about 50 rows based on what <#> finds. Then only on those 50 rows do we perform our more expensive, index-agnostic distance calculation.

This results in the following plan and runtime:

Limit  (cost=274.57..274.57 rows=1 width=64) (actual time=11.236..11.237 rows=1 loops=1)
    CTE distance
        ->  Limit  (cost=0.28..260.82 rows=50 width=173) (actual time=0.389..10.764 rows=50 loops=1)
            ->  Index Scan using planet_osm_line_way on planet_osm_line  (cost=0.28..16362.19 rows=3140 width=173) (actual time=0.389..10.745 rows=50 loops=1)
                Order By: (way <#> '010300002031BF0D000100000005000000D7A3706D17296B41C3F528DC124D47417B14AECF1E296B4100000020484D4741CDCCCCC43C296B410AD7A3B0054D4741295C8F6235296B41B81E856BD04C4741D7A3706D17296B41C3F528DC124D4741'::geometry)
                Filter: (highway = 'secondary'::text)
                Rows Removed by Filter: 4562
            ->  Sort  (cost=13.75..13.88 rows=50 width=64) (actual time=11.234..11.234 rows=1 loops=1)
                Sort Key: (st_distance('010300002031BF0D000100000005000000D7A3706D17296B41C3F528DC124D47417B14AECF1E296B4100000020484D4741CDCCCCC43C296B410AD7A3B0054D4741295C8F6235296B41B81E856BD04C4741D7A3706D17296B41C3F528DC124D4741'::geometry, distance.road))
                Sort Method: top-N heapsort  Memory: 25kB
    ->  CTE Scan on distance  (cost=0.00..13.50 rows=50 width=64) (actual time=0.412..11.188 rows=50 loops=1)
Total runtime: 11.268 ms

As you can see, we are now using the GiST index "planet_osm_line_way", which was what we were after.

This yields roughly the same runtime as with our ST_DWithin(), but without the arbitrary distance setting. We do have a somewhat arbitrary limit of 50 rows, but that is much less severe than a distance limit.

Even if the closest secondary road were 100 km from our building, the above query would still find it, whereas our previous query would return nothing.

One more for the road home

Let us do a few more fun calculations on our Okinawa data, before I let you off the island.

Next I would like to find the longest trunk road that runs through this prefecture:

SELECT ref, highway, ST_Length(way) as length
    FROM planet_osm_line 
    WHERE highway = 'trunk'
    ORDER BY length DESC;

We have a new function, ST_Length(), which simply returns the length of the given geometry, provided it is a linestring or multilinestring. The only index that will be used is our "planet_osm_line_highway_index" B-Tree index, to perform our Bitmap Index Scan.

ST_Length() obviously does not work with bounding boxes and therefore cannot use the geometrical GiST index. This is yet another function you should use with care.

When looking at the result set that was returned to us, you will see that some routes show up multiple times. Take route 58, the longest and most famous route in Okinawa: it shows up no fewer than 769 times. Why?

This is because, especially for a database prepared for mapping, these pieces of geometry are divided over different tiles.
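You can see this fragmentation for yourself with a plain count (a small sketch):

```sql
-- Each row is one linestring fragment of route 58, one per tile.
SELECT count(*)
    FROM planet_osm_line
    WHERE ref = '58';
```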

We thus need to accumulate the length of all the linestrings we find that represent pieces of route 58. First, we could try to accomplish this with plain SQL:

WITH road_pieces AS (
    SELECT ST_Length(way) AS length
    FROM planet_osm_line
    WHERE ref = '58'
)
SELECT sum(length) AS total_length
FROM road_pieces;

This will return:


Meaning a total length of 536.486 kilometers. This query runs in about 19.375 ms. Let us add an index to our "ref" column:

CREATE INDEX planet_osm_line_ref_index ON planet_osm_line(ref);

Perform vacuum:

VACUUM ANALYZE planet_osm_line;

This index creation will speed up the query and make it run in 3.524 ms. Nice runtime.

You could also perform almost the exact same query, but instead of using the SQL sum() function, you could use ST_Collect(), which creates a collection of geometry out of all the separate pieces you feed it. In our case we feed it separate linestrings, which makes the function output a single multilinestring. We then only have to perform one length calculation.

WITH road_pieces AS (
    SELECT ST_Collect(way) AS geom
    FROM planet_osm_line
    WHERE ref = '58'
)
SELECT ST_Length(geom) AS length
    FROM road_pieces;

This query runs around 1 ms faster than the former and returns the exact same distance of 536.486 kilometers.

Now that we have this one multilinestring which represents route 58, we could check how close this route comes to our famous Kin building (which we will statically feed):

WITH road_pieces AS (
    SELECT ST_Collect(way) AS geom
    FROM planet_osm_line
    WHERE ref = '58'
)
SELECT ST_Distance(geom, ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))',900913)) AS distance
    FROM road_pieces;

Which would give us:


In other words: Route 58 is, at its closest point, 7.9 kilometers from our coal storage building. This query took about 5 ms to complete. A rather nice runtime.

Okay, enough exploring for today.

We took a brief look at indexing our spatial data, and what benefits we could gain from it. And, as you can imagine, a lack of indexes and improper use of the GIS functions, could lead to dramatic slow-downs, certainly on larger data sets.


Before I let you go, I want to take a brief look at another mechanism for carrying around GIS data: the shapefile. It is probably more widely used than the OSM XML format, but less open. It is almost the standard way of exchanging data between GIS systems.

We can import shapefiles by using a tool called "shp2pgsql" which comes shipped with PostGIS. This tool will attempt to upload ESRI shape data into your PostGIS-enabled database.


ESRI stands for Environmental Systems Research Institute and is yet another organization that taps into the world of digital cartography.

They have defined a (somewhat open) file format standard that allows the GIS world to save their data in a so called shapefile. These files hold GIS primitives (polygons, linestrings, points, ...) together with a bunch of descriptive information that tells us what each primitive represents.

It was once developed for ESRI's own, proprietary software package (ArcGIS), but was quickly picked up by the rest of the GIS community. Today, almost all serious GIS packages have the ability to read and/or write to such shapefiles.

Shapefile build-up

Let us take a peek at the guts of such a shapefile.

First, contrary to what the name suggests, a shapefile is not a single file but a bundle, which must contain at least three files to be spec compliant:

  • .shp: the first mandatory file has the extension .shp and holds the GIS primitives themselves.
  • .shx: the second important file is an index of the geometry.
  • .dbf: the last needed file is a database file holding the geometry's attributes.

Getting shapefile data

There are many organizations that offer shapefiles of all areas of the globe, either free or for a small fee. But since we already have data in our database that we are familiar with, we can create our own shapefiles.

Exporting with pgsql2shp

Besides "shp2pgsql", which is used to import or load shapefiles, PostGIS also ships a reverse tool called "pgsql2shp", which can export or dump shapefiles based on geometry in your database.

So let us, as an experiment, create a shapefile containing all secondary roads of Okinawa.

First we need to prepare an empty directory where this tool can dump our data. Since it will create multiple files, it is best to put them in their own spot. Open up a terminal window and go to your favorite directory-making place and create a directory called "okinawa-roads":

$ mkdir okinawa-roads

Next enter that directory.

The "pgsql2shp" tool needs a few parameters to be able to successfully complete. We will be using the following flags:

  • -f, tells the tool which file name to use
  • -u, the database user to connect with

After these flags we need to input the database we wish to take a chunk out of and the query which will determine the actual data to be dumped.

The above will result in the following command:

$  pgsql2shp -f secundairy_roads -u postgres gis "select way, ref from planet_osm_line where highway = 'secondary';"

As you can see we construct a query which only gets the road reference and the geometry "way" column from the secondary road types.

After some processing it will have created 4 files, the 3 mandatory ones mentioned above, and a new one called a projection file. This file contains the coordinate system and other projection information in WKT format.
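Listing the directory should show the four files, all sharing the base name we passed with the -f flag (exact sizes will differ, of course):

```shell
$ ls
secundairy_roads.dbf  secundairy_roads.prj  secundairy_roads.shp  secundairy_roads.shx
```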

This bundle of 4 files is now our shapefile format which you could easily exchange between GIS aware software packages.

Importing with shp2pgsql

Let us now import these shapefiles back into PostgreSQL and see what happens.

For this we will ignore our "gis" database and simply create a new database to keep things separated. Connect to a PostgreSQL terminal, create the database and make it PostGIS aware:

CREATE DATABASE gisshape;
\c gisshape
CREATE EXTENSION postgis;

Now go back to your terminal window to do some importing.

The import tool works by dumping the SQL statements to stdout, or to a SQL dump file if preferred. If you do not wish to work with such a dump file, you have to pipe the output to the psql command to load in the data.
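If you did prefer a dump file, you could, for example, redirect the tool's output and load it in a second step (a hypothetical sketch, using the same flags we will discuss shortly):

```shell
$ shp2pgsql -S -s 900913 -I secundairy_roads > secundairy_roads.sql
$ psql -U postgres -f secundairy_roads.sql gisshape
```

The piped form below does exactly the same, just without the intermediate file.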

From the directory where you saved the shapefile dump, run the "shp2pgsql" tool:

$ shp2pgsql -S -s 900913 -I secundairy_roads | psql -U postgres gisshape

Let me go over the flags we used:

  • -S: is used to keep the geometry simple. The tool otherwise will convert all geometry to its MULTI... counterpart
  • -s: is needed to set the correct SRID
  • -I: specifies that we wish the tool to create GiST indexes on the geometry columns

Note that the -S flag will only work if all of your geometry is actually simple and does not contain true MULTI... types of geometry with multiple linestrings, points or polygons in them.

An annoying fact is that you have to tell the loader which SRID your geometry is in. There is a .prj file in our shapefile bundle, but it only contains the WKT projection information, not the SRID. One trick to find the SRID based on the information in the projection file is to use OpenGEO's Prj2EPSG website, which does quite a good job of looking up the EPSG ID (which most of the time is the SRID). However, it fails to find the SRID of our OSM projection.

Another way of finding out the SRID is by using the PostGIS spatial_ref_sys table itself:

SELECT srid FROM spatial_ref_sys WHERE srtext = 'PROJCS["Popular Visualisation CRS / Mercator (deprecated)",GEOGCS["Popular Visualisation CRS",DATUM["Popular_Visualisation_Datum",SPHEROID["Popular Visualisation Sphere",6378137,0,AUTHORITY["EPSG","7059"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6055"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4055"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],AUTHORITY["EPSG","3785"],AXIS["X",EAST],AXIS["Y",NORTH]]';

This will give us:



If you now connect to your database and query its structure:

\c gisshape
\d

You will see we have a new table called "secundairy_roads". This table holds only the information we dumped into the shapefile: our road route numbers and their geometry. Neat!

The end


We are done folks. I hope I have given you enough firepower to be able to commence with your own GIS work, using PostGIS. As I have said in the beginning of this series, the past three chapters form merely an introduction into the capabilities of PostGIS, so as I expect you will do every time: go out and explore!

Try to load in different areas of the world, either with OpenStreetMap or by using shapefiles. Experiment with all the different GIS functions and operators that PostGIS makes available.

And above all, have fun!

And as always...thanks for reading!

by Tim van der Linden at June 25, 2014 10:00 AM

June 20, 2014

Boston GIS (Regina Obe, Leo Hsu)

PostGIS Training sessions at Postgres Open 2014: Chicago

We'll be giving two training sessions on PostGIS at Postgres Open in Chicago September 17th 2014. The first will be fairly introductory, PostGIS Introduction: Geometry, Geography, and Geocoding, and the second will be on more advanced concepts and tools, PostGIS on caffeine: Raster, Topology, and pgRouting. Check out the other sessions at the Postgres Open 2014 conference.

by Regina Obe (nospam@example.com) at June 20, 2014 06:50 AM

June 18, 2014


Postgis, PostgreSQL's spatial partner - Part 2

Welcome to the second part of our spatial story. If you have not done so, I advise you to go and read part one first.

The first part of this series gives you some basic knowledge about the GIS world (GIS Objects, WKT, Projections, ...). This knowledge will come in handy in this chapter.

Today we will finally take an actual peek at PostGIS and do some database work:

  • We will see how we can create valid GIS objects and insert them into our database
  • Next let PostGIS retrieve information about these inserted GIS objects
  • Further down the line we will manipulate these objects a bit more
  • Then we will leap from geometry into geography
  • Finally we will be doing some real world measurements

Let us get started right away!

Creating the database

Before we can do anything else, we need to make sure that we have the PostGIS extension installed. PostGIS is most of the time packaged as a PostgreSQL contribution package. On a Debian system, it can be installed as follows:

apt-get install postgresql-9.3-postgis-2.1

This will install PostGIS version 2.1 for the PostgreSQL 9.3 database.

Next, fire up your database console and let us first create a new user and database:

CREATE USER gis WITH PASSWORD '10gis10';
CREATE DATABASE gis OWNER gis;

Not very original names, I know, but it states its purpose. Next, connect to the gis database and enable the PostGIS extension:

\c gis
CREATE EXTENSION postgis;

Now our database is PostGIS aware, and we are ready to get our hands dirty!

Notice that if you now describe your database:

\d

PostGIS has created a new table and a few new views. This is PostGIS's own bookkeeping and it will store which tables contain geometry or geography columns.

Fun with Polygons

Let us begin this adventure with creating a polygon that has one interior ring, similar to the one we saw in the previous chapter.

Before we can create them, though, we have to create a table that will hold their geometrical data:

CREATE TABLE shapes (
    name VARCHAR
);

Now we have a table named "shapes" with only a column to store its name. But where do we store the geometry?

Because of the new data types that PostGIS introduces (geometry and geography), and to keep its bookkeeping up to date, you create this column with a PostGIS function named AddGeometryColumn():

SELECT AddGeometryColumn('shapes', 'shape', 0, 'POLYGON', 2);

Let us do a breakdown.

First, all the functions that PostGIS makes available to us are divided into groups that define their area of use. AddGeometryColumn() falls into the "Management Functions" group.

It is a function that will create a geometry column in a table of choice and adds a reference to this column to its bookkeeping. It accepts a number of arguments:

  • The name of the table to which you wish to add the column
  • The actual column name you wish to have
  • The SRID
  • The WKT object type you wish to represent
  • The coordinate dimension you desire (2 means XY)

In the above case we thus wish to add a geometry column to the "shapes" table. The column will be named "shape". The geometry inserted there will get an SRID of 0 and will be of object type POLYGON and have a normal, two dimensional coordinate layout.


One thing that you might not yet know from the above function definition is the SRID, or Spatial Reference ID, a very important number when working with spatial data. Remember how in the last chapter I kept on yapping about the different projections we have, and how each projection yields different results? Well, this is where all that information comes together: the SRID.

Our famous OGC has created a lookup table containing a whopping 3911 entries, each with a unique ID: the SRID. This table is called spatial_ref_sys and is installed into your PostgreSQL database by default when you enable PostGIS.

But hold on, there is something I neglected to tell you in the previous chapter: the European Petroleum Survey Group or EPSG. What follows is something that confuses many people and makes them mix up SRIDs and EPSG IDs. I will try my best not to add to that confusion.


The EPSG, now called the OGP, is a group of organizations that, among other things, concern themselves with cartography. They are the world's number one authority on how spatial coordinates (projected or real world) should be calculated. All the definitions they make get an accompanying ID called the EPSG ID.

The OGC maintains a list to be used inside databases (GIS systems). They give all their entries a unique SRID. These entries refer to defined and official projections, primarily maintained by the EPSG, which have their own EPSG ID and unique name. Other projections (not maintained by the EPSG) are also accepted into the OGC SRID list, as are your own projections (should you feel the need).

Let us poke the spatial reference table and see if we can get a clearer picture.

If we would query our table (sorry for the wildcard) and ask for a famous SRID (more on this one later):

SELECT * FROM spatial_ref_sys WHERE srid = 4326;

We would get back one row containing:

  • srid, which is the famous ID
  • auth_name, the name of the authority organization, in most cases EPSG
  • auth_srid, the ID the authority organization itself introduced (for EPSG entries, the EPSG ID)
  • srtext, which tells us how the spatial reference is built, using WKT
  • proj4text, the commands that drive the proj4 library, which is used to make the actual projections

And as you can see, both the "srid" column and the "auth_srid" are identical. This will be the case with many entries.
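A quick way to see this for yourself, sketched for our famous SRID 4326:

```sql
-- For most EPSG-maintained entries, srid and auth_srid match.
SELECT srid, auth_name, auth_srid
    FROM spatial_ref_sys
    WHERE srid = 4326;
-- srid = 4326, auth_name = 'EPSG', auth_srid = 4326
```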

I should also tell you that this huge list of SRID entries mostly consists of dead or localized projections. Many of the projections listed are not used anymore but were popular at some point in history (they are marked deprecated), or are very localized. In the previous chapter I mentioned that the general UTM system, for example, could be used as a framework for more localized UTM projections. There are hundreds of these local projections that only make sense when used in the area they are intended for.

Simple Features Functions

As I have told you before, the functions that PostGIS makes available are divided into several defined groups. The functions themselves are also defined not by PostGIS itself but by the Simple Features standard maintained by the OGC (as we saw in the previous chapter).

There are a total of 8 major categories available:

  • Management functions: functions which can manipulate the internal bookkeeping of PostGIS
  • Geometry constructors: functions that can create or construct geometry and geography objects
  • Geometry accessors: functions that let us access and ask questions about the GIS objects
  • Geometry editors: functions that let us manipulate GIS objects
  • Geometry outputs: functions that give us various means by which to transform and "export" GIS objects
  • Operators: various SQL operators to query our geography and geometry
  • Spatial relationships and measurements: functions that let us do calculations between different GIS objects
  • Geometry processing: functions to perform basic operations on GIS objects

I have left a few categories out, for they are either not part of the Simple Features standard (such as three dimensional manipulations) or beyond our scope. To see a list of all the functions and their categories, I advise you to visit the PostGIS reference, section 8.

Let us now do some fun manipulations and use some of the functions from these categories, just to get a bit more familiar with how it all works together.

If you ran the last SQL command, which makes the geometry column, you should have gotten back the following result:

public.shapes.shape SRID:0 TYPE:POLYGON DIMS:2

This tells us we created the "shape" column in the "shapes" table and set the SRID to 0. SRID 0 is a convention used to tell a GIS system that you currently do not care about the SRID and simply want to store geometry with an arbitrary X and Y value.

Let us now insert the shape of our square. To insert a polygon into your column, you could use various functions. One of these functions is ST_GeomFromText():

INSERT INTO shapes (name, shape) VALUES (
    'Square with hole',
    ST_GeomFromText('POLYGON ((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))', 0)
);

This inserts our polygon and gives it a name. The ST_GeomFromText() function enables us to enter our polygon object using WKT. The function also accepts a second, optional parameter: the SRID we wish to work with. This function falls into the Geometry Constructors category.
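Since we passed SRID 0, we can double-check what actually got stored with the ST_SRID() accessor (a small sketch):

```sql
-- Returns the SRID each stored geometry carries; 0 in our case.
SELECT name, ST_SRID(shape) FROM shapes;
```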

You know this polygon has two rings, the exterior and the interior. Let us now ask PostGIS to return only the line that represents the exterior ring:

SELECT ST_ExteriorRing(shape)
    FROM shapes
    WHERE name = 'Square with hole';

And we get back:


Oh my...that is not what we expected. Yet it is correct: this is how PostgreSQL stores geometry/geography internally. The result is right, just unreadable to us humans.

If we wish to get back a readable WKT string, we have to convert it using one of the conversion functions:

SELECT ST_AsText(ST_ExteriorRing(shape))
    FROM shapes
    WHERE name = 'Square with hole';

And we get:

LINESTRING(8 1,8 8,1 8,1 1,8 1)

Aha, that is more like it! This we can read!

We used the ST_ExteriorRing() which falls under the Geometry Accessors category and the ST_AsText() function which resides in the category Geometry Outputs.

Okay, now we wish to know the interior ring:

SELECT ST_AsText(ST_InteriorRingN(shape,1))
    FROM shapes
    WHERE name = 'Square with hole';

The result:

LINESTRING(6 3,6 6,3 6,3 3,6 3)

Notice that the ST_InteriorRingN() function requires you to specify the index of the ring you wish to retrieve, starting from 1. As we have seen before, polygon objects can have multiple interior rings, but only a single exterior one.
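Before asking for a ring by index, it can help to know how many interior rings a polygon actually carries. A quick check, using the ST_NumInteriorRings() accessor:

```sql
-- Count the interior rings (holes) of our polygon; this should
-- return 1 for our square with one hole.
SELECT ST_NumInteriorRings(shape)
    FROM shapes
    WHERE name = 'Square with hole';
```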

Next let us ask all the information about what makes up the shape:

SELECT ST_Summary(shape)
    FROM shapes
    WHERE name = 'Square with hole';

And get back:

Polygon[B] with 2 rings+
  ring 0 has 5 points  +
  ring 1 has 5 points

Ohh, that is pretty cool. We get back a human readable string that explains to us how this particular piece of geometry is built.

Let us now add another polygon:

INSERT INTO shapes (name, shape) VALUES (
    'The intersecting one',
    ST_GeomFromText('POLYGON ((14 1, 15 8, 7 8, 7 1, 14 1))', 0)
);

This is a polygon that will intersect with part of our previous polygon. Let us ask PostGIS if these polygons really intersect each other:

SELECT ST_Intersects(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The intersecting one';

If all is well, this will simply return TRUE if they intersect and FALSE if they do not. In this case, it will return TRUE.
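ST_Intersects() only answers yes or no. If you also want the shared area itself, its sibling ST_Intersection() returns it as a new geometry; for our two polygons this is the narrow strip where they overlap:

```sql
-- Return the overlapping region of the two polygons as WKT.
SELECT ST_AsText(ST_Intersection(a.shape, b.shape))
    FROM shapes a, shapes b
    WHERE a.name = 'Square with hole' AND b.name = 'The intersecting one';
```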

The counterpart of our intersect function is ST_Disjoint():

SELECT ST_Disjoint(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The intersecting one';

Which will return FALSE in our case.

Let us now add a third polygon which does not intersect our previous two:

INSERT INTO shapes (name, shape) VALUES (
    'The solitary one',
    ST_GeomFromText('POLYGON ((20 20, 20 40, 1 40, 1 20 ,20 20))', 0)
);

This polygon will reside well "above" the other two and does not share any space. Let us now see how far this polygon resides from our first polygon, the one with the hole:

SELECT ST_Distance(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The solitary one';

This returns the number "12", which means they are 12 units apart. Remembering the definition of both shapes: the first shape is 8 units tall and the second shape starts at unit 20, which indeed leaves a gap of 12.

Nice! We have just measured the distance between two objects in a spatial database!

Hmmm, this may mean we are getting closer to knowing the distance to Tokyo...but not yet, we need to play a bit more first.

PostGIS also has the ability to manipulate geometry. Let us, for example, try to move our solitary polygon even further away using the ST_Translate() function under the Geometry Editors category:

SELECT ST_Translate(shape, 0, 10)
    FROM shapes
    WHERE name = 'The solitary one';

The ST_Translate() function takes the geometry to be altered, followed by an X offset, a Y offset and an optional third, Z offset.

Running this query will give us the binary representation of a new piece of geometry; the original geometry is not altered. So how can we actually move the geometry that resides in the database?

Simply by using SQL:

UPDATE shapes
    SET shape = ST_Translate(shape, 0, 10)
    WHERE name = 'The solitary one';

Let us now check the new distance:

SELECT ST_Distance(a.shape, b.shape)
    FROM shapes a, shapes b 
    WHERE a.name = 'Square with hole' AND b.name = 'The solitary one';

And get back:

22

Aha! Nice! It has now moved ten units upwards.
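We can also confirm the move by reading the updated polygon back as WKT:

```sql
-- The translated polygon in readable form; every Y value should now
-- be 10 units higher than in the original definition.
SELECT ST_AsText(shape)
    FROM shapes
    WHERE name = 'The solitary one';
```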

Now let us alter the distance once again, but this time we will scale the polygon down:

UPDATE shapes
    SET shape = ST_Scale(shape, 0.5, 0.5)
    WHERE name = 'The solitary one';

If we now check the distance, we will get back:

7

Wow, we have gone from 22 to 7, how did that happen?

Well, it is important to know that the ST_Scale() function currently only supports scaling by multiplying each coordinate. This means that the polygon will not only become smaller or bigger, but will also translate as a result. To know exactly how our new, scaled version of the polygon looks, we can use the ST_Boundary() function, which shows us the outermost linestring:

SELECT ST_AsText(ST_Boundary(shape))
    FROM shapes
    WHERE name = 'The solitary one';

And we will get:

LINESTRING(10 15,10 25,0.5 25,0.5 15,10 15)

If we compare that to the same result before scaling (which I handily made ready for you):

LINESTRING(20 30,20 50,1 50,1 30,20 30)

You can see that each value in each coordinate was simply divided by 2. This also clarifies why our polygons are now only 7 units apart (the first square stops at 8, the scaled square starts at 15).
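If you want to scale a shape without this translation side effect, a common trick is to move the shape to the origin first, scale it there and move it back. A sketch of that idea, using the shape's centroid as one possible anchor point:

```sql
-- Scale "in place": translate the centroid to the origin, scale,
-- then translate back.
SELECT ST_AsText(
           ST_Translate(
               ST_Scale(
                   ST_Translate(shape, -cx, -cy),
                   0.5, 0.5),
               cx, cy))
    FROM (SELECT shape,
                 ST_X(ST_Centroid(shape)) AS cx,
                 ST_Y(ST_Centroid(shape)) AS cy
            FROM shapes
            WHERE name = 'The solitary one') AS t;
```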

Okay, okay, I guess we have played enough now. We have seen a small glimpse of the operations you can do on GIS data within PostGIS and seen that PostGIS makes all of this work fairly easy.

I guess we can now take it one step further and start to actually look at some geography!

Fun with the earth

Up until now we have been working with an SRID of 0, which means undefined, inside a geometry column, meaning the data was of type "geometry". Now we want to go out and explore the actual earth, which means we wish to continue in a geographical coordinate system.

This brings us to a crossroads. First, you will need to ask yourself the same question we pondered in chapter one: do you wish to work with geometry or geography?

On the one hand we know that geographical measurements are expensive calculations, but the most accurate, for they are unprojected. On the other hand GIS convention tells us that, in any case, we should continue in a geometrical or Cartesian system, simply because...well...it is a convention.

So what do we do?

It all depends on your specific use case.

When working on a "small" scale, say, part of North America, it would make sense not to use geography. Instead, you could (and should) work in a geometrical system using an accurate local projection based on SRID 4267 (datum NAD27) or SRID 4269 (datum NAD83), both reference systems defined specifically for North America.

Depending on which region you work in, chances are high you have several local projections with their own datum and coordinate system, ready to use. They are very accurate and less expensive to use than direct geography.

For us, however, we will be working on a large scale, for we want to measure a distance that covers much of the globe. You cannot use a local projection or local datum for that.

In such a case you, again, are presented with two options. You could either neglect the convention and simply use geographical data and functions or be nice and adhere to what is agreed upon and work in a Cartesian system.

We will be doing both, and we will use the common SRID 4326. This very popular SRID is at heart geographical, for it uses the geographical coordinate system, but it can also be used with geometrical data. Confused?

Join the club.

Let me try to clarify.

First, the authority behind this SRID is the EPSG, and the EPSG ID is identical to the SRID. It uses a popular datum (remember chapter one) called WGS 84 and is referred to as unprojected, for it is a geographical representation. This datum is the one used in GPS systems and is often referred to as a worldwide datum.

When you store objects with an SRID of 4326, you are storing them using geographical coordinates, aka latitude and longitude. This is in contrast to projected reference systems, which store their coordinates in planar units such as meters. When you do measurements between two objects carrying this SRID, you have two options: you can either do a geographical or a geometrical measurement.

With a geographical measurement there will be no projection and the system will use the WGS 84 datum (the spheroid) to calculate the distance in three dimensional space. As we have seen before, such a calculation is more expensive and unconventional.

With a geometrical measurement, your geographical coordinates have to be projected on to a flat Cartesian or geometrical plane. This is done automatically when you ask PostGIS to measure distance using one of the more common geometrical functions. When projecting, all GIS systems will use the Plate Carrée projection which means they will use the stored latitude and longitude coordinates directly as an X and Y value.

Let us see this story in action. First we will take a look at the more native geography data. Let us clean our "shapes" table first:

SELECT DropGeometryColumn('shapes', 'shape');

Here we use DropGeometryColumn() to remove this column from our "shapes" table. Now clear the table:

TRUNCATE shapes;

Next add a new geography column:

ALTER TABLE shapes ADD COLUMN location geography('POINT');

We create a new geography column with the name location in our "shapes" table. We will only be storing Point types.

Notice that the syntax is different and that here we use plain SQL as opposed to the AddGeometryColumn() function from before. Since PostGIS 2 it is possible to create and drop both geometry and geography columns with standard SQL syntax.

If you wish to rewrite our "shape" column addition from the beginning of this chapter, you could write it like this:

ALTER TABLE shapes ADD COLUMN shape geometry('POLYGON',0);

Looks more native and simple, no? Sorry to tell you this so late in the adventure, but now you know about both the functions and the more native SQL syntax. Both will keep the PostGIS bookkeeping in sync.
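That bookkeeping itself is queryable. PostGIS maintains the geometry_columns and geography_columns views, which you can inspect with a plain SELECT:

```sql
-- List the registered geography columns, their type and SRID.
SELECT f_table_name, f_geography_column, type, srid
    FROM geography_columns;
```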

Also, for fun, you could do a describe on the table:

\d shapes

And get back:

  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 name     | character varying     | 
 location | geography(Point,4326) | 

As you can see, the column is of type geography and automatically gets the famous SRID 4326.

Good, let us now try to find an answer to our famous question: how far is Tokyo from my current location? You will be surprised how trivial this will be.

First, as you might suspect, since we are only interested in a point on the earth and not in the shape of your location nor that of Tokyo, a Point object will suffice. Next we need to insert two points into our database, your location and the center of Tokyo, both in geographical coordinates.

Finding Your Location

This means you need to find out the exact latitude and longitude of the place you are at right now.

This could, of course, be done in a myriad of ways: using your cell phone's GPS capabilities, using your dedicated GPS device or using an online map system. I will choose the latter and will be using OpenStreetMap (what else?) to locate my current position.

Open up your favorite web browser and surf to openstreetmap.org. Once there, punch in your address or use the "Where Am I" function. This will give you a point on the map and, in the search bar on the left, your latitude and longitude coordinate. Take this coordinate and save it as point data into your fresh column:

INSERT INTO shapes VALUES ('My location', ST_GeographyFromText('POINT(127.6791949 26.2124702)'));

The point I am inserting reflects central Naha, the main city of the Okinawa prefecture. Not my current location, but it serves as an illustrative point.

Note that PostGIS expects longitude as X and latitude as Y. This is often the reverse of what you get back from other sources.
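A quick sanity check after inserting: the ST_X() and ST_Y() accessors should return the longitude and latitude respectively. Note the cast, since these accessors work on geometry points:

```sql
-- Verify which value went where.
SELECT ST_X(location::geometry) AS longitude,
       ST_Y(location::geometry) AS latitude
    FROM shapes
    WHERE name = 'My location';
```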

Now you can insert the location of Tokyo, which I conveniently looked up for you:

INSERT INTO shapes VALUES ('Tokyo', ST_GeographyFromText('POINT(139.7530053 35.6823815)'));

Ah, nice! Okay, are you ready to finally, after all the rambling we went through, know the distance? You already know the syntax, punch in the magic:

SELECT ST_Distance(a.location, b.location)
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

If you happened to live in the exact cartographic center of Naha, you would get back:

1557506.28

Yeah! This, my lovely folk, is how far you are from Tokyo, at this very moment.

But what is this number you get back?

The result you see here is the distance returned in Meters, meaning, from the point I inserted as "My location", I am 1557506.28 Meters or 1557.50628 Kilometers from Tokyo.

Very neat stuff, would you not say? PostgreSQL just told us how far we are from Tokyo, awesome!

But wait, we are not finished yet. We have now done the most accurate, real geographical distance measurement using expensive geographical calculations.

There is an "in-between" solution before we jump to geometry. PostGIS gives us the ability to replace our spheroid datum with the more classical sphere. The latter allows much simpler calculations, but can still return more accurate results than some of the projections.

To redo our calculation from above with a sphere, simply set the spheroid Boolean, a third and optional parameter to the ST_Distance() function, to False:

SELECT ST_Distance(a.location, b.location, False)
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

The result:


Which is a total distance of 1557.886 Kilometers, a difference of around 380 Meters.

Let us now repeat this story, but use geometry instead. Let us do it the GIS conventional way.

We do not need to recreate our column as a geometry column and insert our data again; we can cheat a little. PostgreSQL has the capability of casting data from one type to another, and PostGIS plugs into this. So without recreating anything, we can simply cast our geography data into geometry on the fly and see what happens.

Note that casting, while very convenient for quick checks, can render an index totally moot. It is therefore important to think ahead and decide whether you want to work with geometry or geography, then create the correct column type and use it without casting.
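If you really do need to query a geography column as geometry regularly, one possible middle ground is an index on the cast expression itself. A sketch (the index name is arbitrary, and none of the small examples in this chapter need it):

```sql
-- A GiST index built on the casted expression keeps such queries
-- indexable.
CREATE INDEX shapes_location_geom_idx
    ON shapes
    USING GIST ((location::geometry));
```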

But for our quick and dirty queries, this is fine. Let us continue:

SELECT ST_Distance(a.location::geometry, b.location::geometry)
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

In this query, we cast (::) the geography data inside the "location" columns into geometry. Now we get back:


Hmm, that is a different result altogether. It looks like a much smaller number than before. What is happening?

We just cast our geography to geometry, which means PostGIS will now use a Cartesian system, a projection, to calculate the distance in a linear way. When used on geometry, the distance measuring function ST_Distance() returns not meters but the distance expressed in the units the original data was stored in. Since our data is stored with SRID 4326, its units are latitude and longitude. The value you get back is thus degrees.

In the case of the Naha location, this will be 15.344 degrees from Tokyo.

For our human brain this is difficult to imagine; a result in Meters is much easier to comprehend. So, let us transform this degree value into a metric value.

As an estimation, one planar degree (in our Cartesian system) equals 111 Kilometers. So the distance now becomes 15.344 degrees times 111: roughly 1703 Kilometers. That is a difference of about 145 Kilometers.
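The same back-of-the-envelope conversion can of course be done inline; remember that 111 Kilometers per planar degree is only a rough approximation:

```sql
-- Degrees times ~111 km per degree gives a crude metric distance.
SELECT ST_Distance(a.location::geometry, b.location::geometry) * 111
           AS approx_km
    FROM shapes a, shapes b
    WHERE a.name = 'My location' AND b.name = 'Tokyo';
```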

The reason this difference exists is the projection we are now using. As mentioned a few times before, when working with geometry carrying SRID 4326, PostGIS will automatically use the infamous Plate Carrée projection. This projection, as we have seen, is among the least accurate for something like distance measuring.

So let us poke this projection mechanism and try a different, more accurate one, the Lambert, which carries SRID 3587.

To change the projection PostGIS will use, we can use the ST_Transform() function, which reprojects geometry into a different SRID. Note that ST_Transform() only works on geometry objects, so we have to keep casting our geography locations to be able to use them in this function.

SELECT ST_Distance(ST_Transform(a.location::geometry, 3587), ST_Transform(b.location::geometry, 3587))
    FROM shapes a, shapes b 
    WHERE a.name = 'My location' AND b.name = 'Tokyo';

This will give us:


Meaning 1602.392 Kilometers, a difference of about 45 Kilometers. That is indeed in between the Plate Carrée and our native geographical measurement.

Another, even more accurate and popular projection is our famous UTM. It cannot, however, be used on a world scale: you can only perform measurements within the same UTM zone.

As mentioned in the previous chapter, there are roughly 60 UTM zones on the earth, and each zone uses its own projection and its own coordinates. This kind of projection is thus not fit for measuring distance on such a large scale.

Let us therefore take this one step further before I leave you to rest and do a measurement with such a UTM projection. We will make a measurement inside Japan's mainland UTM zone, 54N, which has an SRID of 3095.

First we will have to make another point in our database:

INSERT INTO shapes VALUES ('Aomori', ST_GeographyFromText('POINT(140.750616 40.788079)'));

This point represents the city of Aomori in northern Japan, famous for its huge lantern parades.

First let us measure with the native geographical calculations:

SELECT ST_Distance(a.location, b.location)
    FROM shapes a, shapes b 
    WHERE a.name = 'Aomori' AND b.name = 'Tokyo';

This returns:


Or 573.416 Kilometers, which is the most accurate measurement.

Next, let us throw the good old Plate Carrée projection at it:

SELECT ST_Distance(a.location::geometry, b.location::geometry)
    FROM shapes a, shapes b 
    WHERE a.name = 'Aomori' AND b.name = 'Tokyo';

This will yield


Which is in degrees again; multiplying by 111 Kilometers yields a total distance of 577.444 Kilometers.

Then let us measure using the correct UTM projection:

SELECT ST_Distance(ST_Transform(a.location::geometry, 3095), ST_Transform(b.location::geometry, 3095))
    FROM shapes a, shapes b 
    WHERE a.name = 'Aomori' AND b.name = 'Tokyo';

This will give us:


Or 573.228 Kilometers, and thus only around 200 Meters off, in contrast with the Plate Carrée, which was about 4 Kilometers off.

You can see that different projections result in different measurements. It is therefore crucial to know which one to choose. Some are better used on a local scale, as we just did for Japan; others are better on a global scale.

Again, it all comes down to trade-offs and choices.

Okay, yet another big chunk of PostGIS goodness is taken. I suggest a good rest of the mind.

We have seen how we can insert various types of geometry and geography, we saw how to manipulate and question them and we looked at a few real world measurements.

In the next and final chapter, we will be looking at loading some real GIS data from OpenStreetMap into our PostGIS database, take a quick look around my town here in Okinawa and take a deeper look at creating some important indexes.

And as always...thanks for reading!

by Tim van der Linden at June 18, 2014 10:00 AM

June 17, 2014

Paul Ramsey

Examples are not Normative

Once, when I still had the energy, I was reading an Open Geospatial Consortium specification document, and found an inconsistency between a directive stated in the text, and the examples provided to illustrate the directive. This seemed pretty important, since most "Humans" use the examples and ignore the text, so I raised it with the author, who replied to me:

"Examples are not normative"

To me, this seemed to summarize in four words everything there was to dislike about the standards community: dismissive, self-referential ("normative"? really?), and unconcerned with real-world practice. One of the reasons I no longer have the energy.

by Paul Ramsey (noreply@blogger.com) at June 17, 2014 07:35 PM

June 16, 2014


Masking features in QGIS 2.4

Have you ever wondered how to mask features on a map, so that only a particular zone is highlighted ?

There has been a simple plugin to do that for a while. Called 'Mask', it allowed turning a vector selection into a new memory layer with only one geometry, made by the geometric inversion of the selection: the selected polygons get transformed into holes in a square polygon bigger than the current extent.

One could then use this new layer, like any other one and apply a vector symbology on it. An opaque color to mask everything but the selection, or some semi-transparent color in order to only highlight the selection.

It was very useful but came with some limitations, most notably the fact that the 'mask' layer was not updated during an atlas printing.

Thanks to the support of Agence de l'Eau Adour Garonne, Oslandia has been developing some evolutions to the core of QGIS, as well as to the mask plugin.

The core part consists of a new feature renderer that can be used on any polygon vector layer as a symbology element. It is called the inverted polygon renderer and applies any other renderer to polygons that have been inverted. It was originally designed to allow only a simple fill to be applied to the exterior ring of polygons, but it now supports more complex renderers like graduated, categorized or even rule-based renderers.

Inverted polygons: Simple usage

The simplest usage is to select the default sub renderer that is set to "single symbol" in order to have a uniform exterior fill of a layer.

Advanced usage

When the sub-renderer used by the inverted polygon renderer has different symbol categories, features are grouped by symbol category before being inverted and rendered. It then only makes sense when the symbology used is partly transparent, so that the different inverted polygons can be distinguished from each other. This can be used for example to render a semi-transparent shapeburst fill around the current atlas feature.

In this example, we have an inverted polygon renderer with a rule-based sub renderer. The rule will only select the current atlas geometry, thanks to the expression $id=$atlasfeatureid. The symbol used is made of two symbol layers: a semi-transparent blue simple fill and a shapeburst fill on top of it. The polygon layer is then duplicated to also have a green "interior fill" for each polygon. The output can be seen hereafter:

Label masking

When the map has labels enabled, this inverted polygon renderer is not sufficient to mask labels as well. When a user wants to highlight a particular zone on a map, she usually also wants to mask labels that are around, or at least make them less visible.

The way QGIS handles labels out of the box is not directly compatible with this need: QGIS always displays labels on top of every other layer.

To circumvent this, the original 'mask' plugin has been enhanced to be aware of layers with labels. A new 'mask' layer can be computed and its geometry can be used to test whether a feature has to be labeled or not. The plugin exposes two special variables that are available in any expression:

  • in_mask(srid) will return a boolean that tests if the current feature is in the mask. The parameter srid is the SRID projection code of the layer where this function is used.
  • $mask_geometry will return the current mask geometry

Different spatial predicates can be used to test if the current feature lies inside the highlighted zone. A different type of predicate will be used for polygon layers, line layers and point layers.

Suppose we have a map of some French départements with a background raster map, and a linear layer displaying rivers.

If we want to highlight only one département, we can use the mask plugin for that. We will first select it and call the plugin.

A fancy inverted polygon symbology, based on a shapeburst fill is created. We see here that we can choose the function that will be used for filtering the labeling. By default these functions are "pointOnSurface" for polygon layers and "intersects" for line layers.

Here, we want both the départements layer and the rivers layers to see their labeling rules modified in order to hide the labels outside of the defined mask polygon.

By modifying the mask symbology, adding a little bit of transparency, we obtain the following result:

The plugin is able to interact with the atlas printing and will update its mask geometry as well as the labels that are allowed to be displayed.

The mask layer can also be saved with the project, using a memory layer if you use the Memory Layer Saver plugin, or using an OGR vector file format.

Here is a short video that shows how the plugin can be setup for a simple mask with an atlas print.

How to get the plugin ?

The new mask plugin is available in its 1.0 version on the QGIS official repository of plugins. It requires QGIS 2.4.

We are currently investigating the addition of this label masking feature to the QGIS core. The idea would be to have a concept of "label layer" that could then be hidden by others, or even made partly transparent.

It is not an easy task, though, since it would require reworking parts of the labeling and rendering engine. If you are interested in such a feature, please let us know!

by Hugo Mercier at June 16, 2014 07:00 AM

June 12, 2014


Postgis, PostgreSQL's spatial partner - Part 1


In Dutch we have an expression that says "Van hier tot Tokio", which literally translated means "From here to Tokyo" and is used to indicate that something is very far or very difficult. Unless you live in Japan, like me, then Tokyo is not that far actually....but you get the point. Tokyo is far, period.

But the question for today is...how far is it exactly? From where you are reading this right now...how far is Tokyo from you? How can you know? You could of course just hop online and ask your favorite search engine for help, or use something like OpenStreetMap to figure it out.

But that would be too simple, no? This would mean my post has to stop here, and, as some of you might know, it is difficult for me to write short blog posts. Sorry.

Also, you would miss out on all the fun that is actually happening behind the screen when you query spatial search engines, and that is against my belief: know how the tools you depend on actually work!

And, as some of you might also know, I love PostgreSQL.

So, knowing that I cannot write short posts and that I like PostgreSQL...what would you suspect happens if you ask me how far Tokyo is from my current location? You guessed it: simply use The Elephant to figure that out!

As I have shown you before, PostgreSQL is capable of storing, matching and retrieving much more than boring VARCHAR or INT data types, and it is designed to be extendable. And extending is what the folks behind the PostGIS project did. To summarize: the PostGIS project extends PostgreSQL to store, match, manipulate and retrieve spatial data. It makes PostgreSQL a full-blown GIS.

The purpose of this series is to get your feet wet with PostGIS and to learn a thing or two about GIS itself. In the first chapter, the one you are reading now, I would like to show you some fundamental GIS concepts: GIS Objects, standardization of GIS, geography and projections. We will not be doing any database action today I am afraid.

Then, starting from the second chapter, we will open up PostgreSQL, make a database PostGIS aware and start playing around. We will look at a bunch of the database functions we have available and see how the knowledge from this chapter maps to the actual database. And we will of course be solving the question posed above: how far is Tokyo from your current location?

Are you ready for a new PostgreSQL adventure?

Note: I will take you through all the following information at lightning speed. My intent is not to make you a GIS expert, but I do feel it is necessary to touch on a few important topics so you know why PostGIS does things the way it does. This will hopefully make the actual database work in the next chapter more clear and spark some curiosity towards learning more about this topic.

The data

Before we can do anything GIS related, we need to take a look at what kind of data we will be working with: the GIS objects.

GIS Objects?

Geographic information system, or GIS in short, is merely the name of any system which can store, retrieve, generate, manipulate and visualize spatial data - the kind of data that represents objects in two or three dimensional space.

The GIS world is a world of standards, as with most computer sciences. These standards define what spatial data is and how we can work with it, and they are defined and maintained by the Open Geospatial Consortium or OGC.

Every system that wishes to work with GIS data, including PostGIS, should adhere to these standards.

Simple Features

The OGC's standard for working with GIS data in SQL is defined in an OGC and ISO specification called Simple Features.

Simple Features defines how we can represent spatial objects, as you will see soon, but also defines how we can access and manipulate them. You typically have available:

  • Functions to create two dimensional spatial objects
  • Functions to alter these objects
  • Functions to retrieve and describe single or multiple objects
  • Functions to compare and measure single or multiple objects

PostGIS has been certified by the OGC for its wide support of the Simple Features set.

Well-known Text

The first and most important part defined in the Simple Features spec is the means by which we can represent spatial data. I mean, we know how to represent numbers or strings of text inside our database, but how do we represent something as abstract as a line, or a square?

Folks familiar with 2D drawing or 3D modeling software might already have a gut feeling of how to represent such data, and this gut feeling is right: you store coordinates. If you wish to represent a line, you will only need to know the two end points of this line to be able to store, manipulate or visualize it. The same goes for a square, though there you will need four coordinates.

And, as is always the case with standards, the OGC has devised two famous ways of representing these objects and their coordinates: Well-known Text or WKT and Well-known Binary or WKB. The two are almost identical, differing only in their area of use.

WKT is a markup language which you can use to simply write down your objects and use them in queries. It is human readable. However, if you wish to store objects in a database or perform matches on them, they have to be stored in a defined binary format: that is what WKB is for.

WKT can represent a wide range of objects from simple points to complex multi-polygons. The notation, however, stays roughly the same. If you wish to represent a square, for example, you could use the POLYGON object:

POLYGON ((4 1, 4 4, 1 4, 1 1, 4 1))

For people unfamiliar with the term "polygon", a polygon is a closed, two dimensional object with only straight lines. It has to have a minimum of three coordinates (points) thus giving it a minimum of three straight edges (making it, in that case, a triangle).

Let us take a deeper look at what is happening here. First, you will see we define a polygon object, which you will need whenever you wish to represent closed shapes. Next we define the coordinates of the four corners of our square, laid out on a fictional grid of 4 by 4 units. There are a few important notes to take about this coordinate listing:

First, the coordinates are all two dimensional and represent an X and a Y coordinate respectively.

Also, you do not see four but five coordinates. This is another rule from the spec that tells us that all polygon shapes must be closed. To get a better visualization of this you could imagine a pen moving to each coordinate. To finish the loop you draw, the pen has to move back to the original coordinate.

The last thing to note is that the drawing direction of these coordinates is counterclockwise, as is the case with most computer defined drawing systems. This means we put our pen on the grid at coordinate (4 1) and draw up in a straight line to (4 4). Next we go left in a straight line to (1 4) and down in a straight line to (1 1). Finally, we close the loop by drawing a straight line right, back to the starting coordinate (4 1).

It is also perfectly possible to define more than one coordinate set when defining a polygon object. A definition like this is perfectly legal:

POLYGON ((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))

This will create a square polygon with a size of 7 by 7 units, called the exterior ring, and another square inside it with a size of 3 by 3 units. Because this small square resides inside the area of the big square we call it the interior ring and, as a result, this small square will be interpreted by the standard as a hole in the bigger square.

To take this even further, you can define as many holes in your exterior ring as you like; you simply have to make sure that the interior rings never touch each other and never go outside of the exterior ring. The exterior ring is always derived from the first set of coordinates in your object definition.
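To make the ring rules concrete, here is a small sketch in plain Python (my own illustration, not part of PostGIS or any WKT library) that checks ring closure and computes the area of the polygon-with-hole example via the shoelace formula:

```python
def ring_area(ring):
    """Signed shoelace area of a closed ring of (x, y) tuples.
    Positive when the vertices run counterclockwise."""
    assert ring[0] == ring[-1], "WKT polygon rings must be closed"
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def polygon_area(exterior, *interiors):
    """Exterior ring area minus the area of any interior rings (holes)."""
    return abs(ring_area(exterior)) - sum(abs(ring_area(r)) for r in interiors)

# POLYGON ((8 1, 8 8, 1 8, 1 1, 8 1), (6 3, 6 6, 3 6, 3 3, 6 3))
outer = [(8, 1), (8, 8), (1, 8), (1, 1), (8, 1)]
hole = [(6, 3), (6, 6), (3, 6), (3, 3), (6, 3)]

print(ring_area(outer))           # 49.0 -- positive, so counterclockwise
print(polygon_area(outer, hole))  # 40.0 -- the hole's area is subtracted
```

The sign of the raw shoelace sum also tells you the winding direction: positive for counterclockwise rings, negative for clockwise ones.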

The POLYGON object in the WKT standard also has a MULTIPOLYGON counterpart for when you wish to define a set of multiple, non-intersecting polygon objects which, in turn, can have as many interior rings as you like.

Other objects we have available in the WKT standard are:

  • POINT(0 0) to represent a point on a grid
  • LINESTRING(0 0, 0 1) to represent a line. Note that a line can consist of more than two coordinates.

All of these also have a MULTIPOINT and a MULTILINESTRING variant respectively.

As we have seen before, all of these objects are two dimensional, but PostGIS also partially supports three and four dimensional versions of some of these objects.

Note that these extra dimensions are currently not in the specification and are a PostGIS specific extension on top of the features defined by the OGC. Furthermore, if the OGC decides to standardize three- or four-dimensional objects, PostGIS will have to adapt its syntax to stay compliant. We thus refer to this extended format not as WKT or WKB but as Extended WKT and Extended WKB, or simply EWKT and EWKB.

To make our polygon object three dimensional, we could write it down like so:

POLYGON ((4 1 2, 4 4 2, 1 4 2, 1 1 2, 4 1 2))

You can see that we now have three numbers per coordinate, the third one adds a Z or depth value.

A point can get even fancier. If we wish to place a point in three dimensional space, we can write it down the same way as we did with our polygon:

POINT (4 1 2)

The third parameter here being, again, a place on the Z axis.

But points can also have a fourth dimension, which sounds fancy but is nothing more than an extra reference we can ship with our coordinates. This reference, also called a linear reference, is a number we can put in place that tells us where, along a linear path, the point we define resides.

It can be written down like this:

POINT (4 1 2 6)

Here we have four numbers, the last one being the linear reference or M.

With EWKT you also have the possibility to define a two dimensional object with a linear reference:

POINT M(4 1 6)

Here we have again three numbers, but to distinguish between the last number being Z or M, we have to reference M together with our point declaration. There are more extensions defined in the EWKT and EWKB, but that is slightly off-topic, because, as I mentioned before, these are not standardized. In most use cases you can simply use the standard WKT and WKB forms.
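To recap the four point notations, here is a toy formatter, a hypothetical helper of my own for illustration, not a PostGIS or OGC API:

```python
def point_wkt(x, y, z=None, m=None):
    """Toy formatter (not a PostGIS function) for the point notations above."""
    if z is not None and m is not None:
        return f"POINT ({x} {y} {z} {m})"   # X Y Z M: 3D point plus linear reference
    if z is not None:
        return f"POINT ({x} {y} {z})"       # X Y Z: a point in 3D space
    if m is not None:
        return f"POINT M({x} {y} {m})"      # EWKT: 2D point with a linear reference
    return f"POINT ({x} {y})"               # plain 2D point

print(point_wkt(4, 1))            # POINT (4 1)
print(point_wkt(4, 1, z=2))       # POINT (4 1 2)
print(point_wkt(4, 1, z=2, m=6))  # POINT (4 1 2 6)
print(point_wkt(4, 1, m=6))       # POINT M(4 1 6)
```

Note how the M keyword only appears when there are three numbers but no Z value, which is exactly the ambiguity the EWKT notation exists to resolve.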

What to use these objects for?

You now know what kind of objects we can represent using text and what we can, later along the road, insert into our PostGIS enabled PostgreSQL database. But how do these points, lines and polygons help us measure distance or help us locate stuff?

First it is important to understand that all of the objects we have available will act as proxies to real world objects. Take, for example, the point. A point can be used on a map to indicate a place, a spot so to speak, without defining shape or size. When you wish to know where Tokyo is, a point will suffice on a global scale, you do not need nor want to know the exact shape of the metropolis.

However, if you would zoom in on our fictional map and you wish to see a part of the city the size of a few city blocks, you might be interested in the shapes of buildings, lakes, parks, etc. These items that take up two dimensional space will be drawn with polygons that resemble the shape of the real world objects as close as possible.

Lines (or linestrings), finally, will almost always be used to represent roads, railroads, metro systems, etc. More often than not they represent actual paths one could travel along.

Geometry and Geography

So you know that you can represent a place in the world with a simple point. And as you also know, a point is defined like this:

POINT (10 20)

This would create a point that sits at coordinate (10 20). But, what does that mean? How do these numbers relate to the real world? What is 10 or 20 anyway?

Well, first you will have to ask yourself the following question: Do I wish to be Cartesian or Geographical?

Cartesian or Geographical

As you may or may not remember from your boring math lessons, a Cartesian system is a two dimensional flat grid with an X and a Y axis. These axes go both positive and negative, with the origin sitting exactly in the middle of the flat plane.

When working with GIS objects, we refer to this flat, Cartesian grid system as Geometry. When, however, we are working with measurements or objects related to the real earth, we, in PostGIS, refer to these measurements as Geography.

Why? What is the difference? Well, to understand this, we have to take a step back, a step back in time that is.

When the first maps of the world were crafted, people truly believed the earth was flat (which it is not...for your information). This meant that all charts that were drawn assumed we could simply place a grid comprised of an X (length) and Y (height) axis across the drawing and from there measure distances between points. If you wish to know the distance between Paris and London, simply place two points on your map, take your straight ruler and measure the distance indicated. Then factor in the chart's scale and you have your distance. You use Geometry, or Geometric measurements.

However, once scholars (at no small personal risk) convinced people the earth was not flat, the chart drawing people gasped for air. This meant their measuring technique was not correct. If the earth really was a sphere, then one could not simply wrap a grid around it and act as if everything was linear. A sphere meant that there was a certain amount of distortion happening with their overlaying grid, and the measurements should account for those differences.

Even later in time, the chart drawing folk, who had barely recovered from their first shock, were zapped again when people started to realize the earth was not a perfect sphere either. The globe turned out to be slightly flattened at the poles, more of a squashed ball than a true sphere, which, again, meant that measurement techniques had to be adjusted.

This was the birth of the geographical measurement system, for which cartographers devised a model called the spheroid. A spheroid is a three dimensional object on which we can most accurately place points and measure real earth distances. Each point on such a spheroid is defined by a latitude and a longitude:

  • Latitude is measured from the center of the earth (the hot place) as an angle north or south of the equator
  • Longitude is measured from the same hot center as an angle east or west of the prime meridian

Because both latitude and longitude represent an angle, we express them in degrees, and together we simply call them the geographical coordinates.

Now, it is not quite convenient to have to carry around a three dimensional spheroid to find out where you are or to measure distance. A classic old paper map is still easier to bring along and easier to work with. But how do we go from a spheroid, which has the correct distortion, back to our old, flat, two dimensional geometrical map?

With projection or map projection to be more precise. We need to project the three dimensional spheroid system onto our two dimensional map. This projecting is roughly done in three steps.

First we have to decide whether to take our spheroid as the base or a simpler sphere. A simpler sphere will yield less accurate results because it does not quite represent the correct curvature of the earth, but it does keep the maths behind the calculations simpler and thus can make for faster calculations. When choosing which shape we want, we also will have to define which datum we would like.

After choosing the base object and the datum that represents it, we have to transform the geographic system coordinates (latitude and longitude) to more standard X and Y coordinates to be used on a simple, flat, Cartesian plane.

The last part is to find out to what ratio the final two dimensional surface is scaled compared to the original, base object (which represents the earth).


Before continuing, a word about datums.

As we said before, people agreed that the earth has a spheroid shape and that this model represents the earth most accurately. We say "model" because the spheroid is something that is actually defined with math.

The math behind the spheroid model is what we call the datum. It is nothing more than a mathematical formula describing the shape.

Something we have not seen yet is the fact that there are actually many types of spheroids out there, each serving its own purpose and each with its own math, aka its own datum. Some spheroids are better for doing measurements on a global scale, others are better for a more local "zoomed-in" level (continent, country, ...).

The reason we have to tell which datum (and thus which shape) our spheroid has, is that while latitude and longitude always represent degrees, they can have a different meaning depending on the chosen datum. If you use a datum that draws the spheroid a little bit "elongated" so to speak, then 1 degree of longitude will cover slightly more distance than if the datum draws a more compact spheroid.

We will see more about datums in the next chapter, but it is an important part of GIS.

Types of Projections

Something that might not be as obvious right now is the fact that going from our three dimensional globe to a flat surface is a process of choices. In an ideal world you would wish to keep every aspect of your spheroid intact, meaning the proportions of the objects on the map are accurate everywhere, the shape of these objects is correct, the area covered by the objects is true and the distance between these objects is retained. However, as it turns out, this is impossible on a two dimensional surface. You have to give up some of these properties to preserve others.

Throughout history there have been many attempts at creating projections that would keep as much of these aspects intact.

Mercator projection

As a Belgian I should be most proud of this type of projection, since it was created by a fellow Flemish-man around 450 years ago, and it is a projection that is still being used today. When a map is created with this type of projection we get a comfortable and familiar view of the earth. A big advantage of this projection type is the fact that the shape of all objects is accurate.

The Mercator projection is most accurate around the equator, but the further you travel up or down, the more the map goes out of proportion. Mercator used a cylindrical projection to unwrap the earth into a flat plane. Because of the nature of such a cylindrical projection, the areas more close to the poles become blown up to fit in a two dimensional world.

This distortion has caused quite a few frowned foreheads in the last few decades and as a result people tend to abandon this projection, especially for projecting regions far from the equator.

Mercator variants

To make up for the heavy distortions found in the original Mercator system, people have made two new Mercator projections. The first that came about was called the Transverse Mercator, which fixes the distortions around the poles but introduces the problem of incorrect distance measuring.

To make up for this new problem, folks made yet another Mercator derivative: the Universal Transverse Mercator. This type of projection takes a whole new approach and uses its own coordinate system. It introduces the concept of UTM zones: the earth is divided into 60 zones, each spanning 6 degrees of longitude (roughly 670 km wide at the equator). The map that is rendered in each single zone uses the previous, Transverse Mercator projection to draw the actual map. A big advantage of this approach is the fact that we get a very consistent distance measurement all across the globe.

Such a UTM coordinate looks quite different from our classic latitude/longitude or our X/Y version. I will give you a random UTM coordinate:

54 N 384524.5 3948304.4

The first number identifies one of the 60 UTM zones. The letter N shows us the latitude band, and thus the hemisphere, in which we should search this zone. These letters range from C to X (omitting I and O). The first float tells us the easting, or X value, the last float tells us the northing, or Y value. Both these floats represent actual meters.
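Since each UTM zone spans a fixed slice of longitude, the zone number can be derived from longitude alone. A sketch using the standard 6-degree-zone formula (the test coordinates are approximate picks of my own; real UTM also has special zone exceptions around Norway and Svalbard that this ignores):

```python
import math

def utm_zone(lon):
    """UTM longitude zone (1-60) for a longitude in degrees east.
    Ignores the handful of special-case zones near Norway/Svalbard."""
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

print(utm_zone(139.7))  # 54 -- Tokyo's longitude falls in zone 54
print(utm_zone(0.0))    # 31 -- Greenwich sits at the edge of zone 31
```

The modulo keeps longitude 180 from producing a nonexistent zone 61: it wraps back around to zone 1.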

Another important note to take about UTM is that it also acts as a framework for more localized UTM versions. This means that each country or region could make its own maps, using smaller UTM zones to accurately represent their land, city, forest, etc.

Lambert Azimuthal

This projection (also called the Lambert Equal-Area) is yet another approach as it uses a disc to map our spheroid to a flat surface.

The big advantage of this type of projection is the fact that it represents the area of objects very accurately, and distances measured from the center of the disc are true. However, it fails when it comes to accurate shape representation, for shapes get more and more distorted once you start moving away from the center of the disc.

The Lambert projection is one of the more accepted projections, right after the UTM.

Plate Carrée

And then you have Plate Carrée.

This is one of the oldest projections out there and was invented around 1800 years ago. In our little history story above, this projection came about when people thought the earth was rather flat.

It combines almost all disadvantages of previous projections:

  • It does a terrible job in representing correct area
  • It does not care about the shape of objects
  • Distance measuring is way off

Despite the fact that this projection turns out to be so terrible, it is still quite commonly used today. Well, not for navigation or distance calculation, obviously, but for illustrative purposes.

Many organizations across the globe use this simple projection to demonstrate statistical data, overlaid on this map. Demographics, political info, zombie outbreak danger zones, ... .

As we also saw in our history lesson, the first charts used the Cartesian system quite literally and without much conversion, because the earth was flat anyway. In GIS systems, this means that this projection maps latitude and longitude directly to an X and a Y coordinate.

Because the conversion math is simple and calculations are few, this projection is among the fastest, but as you know now, at great cost.
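Because Plate Carrée maps latitude and longitude straight onto X and Y, the whole projection boils down to two linear scalings. A sketch (the canvas size is an arbitrary choice of mine):

```python
def plate_carree(lon, lat, width=360.0, height=180.0):
    """Map lon/lat in degrees onto an equirectangular canvas.
    x grows eastward, y grows downward as on a screen."""
    x = (lon + 180.0) * width / 360.0
    y = (90.0 - lat) * height / 180.0
    return x, y

print(plate_carree(0.0, 0.0))      # (180.0, 90.0) -- the canvas center
print(plate_carree(-180.0, 90.0))  # (0.0, 0.0)    -- top-left corner
```

There is no trigonometry anywhere in there, which is exactly why this projection is so fast, and why every distortion of the sphere survives untouched.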

Other variants

There are numerous other variants out there that all have their advantages or disadvantages. To name a few more:

  • The Robinson projection is a pseudocylindrical projection: its meridians curve gently instead of being straight vertical lines. It shows the world in a pleasing, balanced way, but it compromises on both area and shape, especially near the poles.
  • The Winkel Tripel projection is another popular projection type which has many parallels with the Robinson one, but has less distortion.
  • The Peirce quincuncial projection uses a technique to unwrap the earth spheroid into a square, much like you would peel an orange. These maps are not used much, for they are very heavy in calculations, but the technique is now widely used to present a spherical image, unwrapped into a square.
  • The Goode homolosine projection is a projection developed as a teaching instrument, and as an answer to frustration with the heavily distorted Mercator projection. It is famous for its quite unique shape, where the spheroid is unwrapped into a beast with four "legs".
  • ...

What do projections mean to us?

There are many more types of projections, but going through them all would bore you to tears.

The fact that I keep going on about projections, geometry and geography is because later on, when working with PostGIS, you will need to make a decision about how you wish to combine all of these.

First it is very important to understand that geometry and geography are two different data types which PostGIS can store into PostgreSQL.

PostGIS is quite unique in the fact that it gives you the ability to work directly with our three dimensional spheroid (geography) and ignore the projections and their Cartesian Flat Land (geometry). You will have the power to work with latitude and longitude and perform real world calculations, right out of the box. This way of working, however, comes with a few trade-offs.

The first, and most obvious one: real, three dimensional spheroid geographical calculations will cost more computing time than their simpler, two dimensional geometry counterparts. Another disadvantage of geography over geometry is the fact that PostGIS simply has far fewer native functions ready for you to use.

So depending on your use case, it might be a good idea to convert all your geographical data into geometrical data. This, however, requires knowledge about the projections we just saw, for different projections will yield different results.

If you have two points with a latitude and longitude coordinate (thus being geographical data) and wish to know the distance between them using geometrical functions, you have to project these points on a flat surface thus converting them into a Cartesian system (the whole projection story we saw so far).

As we will see in the next chapter, if you simply convert geography into geometry, PostGIS will project the geometry coordinates using the Plate Carrée, which may not be very desirable when you wish to calculate distances as we will be doing later on. We have the ability to tell PostGIS to use a different projection when converting, but each comes with its own merits and demerits.
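To see how much this choice matters, compare a naive flat-map distance (Plate Carrée style) with a great-circle distance on a sphere. A stdlib-Python sketch, using approximate coordinates for Paris and London that I picked for illustration (PostGIS itself does this on the spheroid, which is more exact still):

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a sphere; a spheroid datum would refine this."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def flat_map_km(lat1, lon1, lat2, lon2):
    """Treat degrees as planar units (Plate Carree style), then scale by
    the length of one degree at the equator (~111.32 km)."""
    return math.hypot(lat2 - lat1, lon2 - lon1) * 111.32

paris, london = (48.8566, 2.3522), (51.5074, -0.1278)
print(round(great_circle_km(*paris, *london)))  # about 344 km
print(round(flat_map_km(*paris, *london)))      # about 404 km -- the flat map overshoots
```

The flat-map figure overshoots by roughly 17 percent here, because a degree of longitude covers much less ground at 50 degrees north than at the equator; this is exactly the kind of error a suitable projection, or true geography calculations, avoids.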

You simply cannot do serious GIS work if you do not have at least a basic understanding of what is going on when projecting geography. By reading through this chapter, I hope I have given you enough food-for-thought to go out and explore a bit more about these different projections.

What is next?

Okay, I think we have covered enough for today. I do apologize for the rather theoretical nature of this first chapter, but believe me, you will need the knowledge.

Next time we will finally be looking at some actual PostGIS work and put some of this theory into practice:

  • We will see how to GIS enable your PostgreSQL database
  • We will look at how we can store geometry and geography
  • We will actually put some points on the earth, draw some lines between them and perform some fun calculations
  • We will take a look at how different projections will yield different results
  • And finally, we will answer the question that started it all: How far is Tokyo from your current location?

And as always...thanks for reading!

by Tim van der Linden at June 12, 2014 10:00 AM

June 11, 2014


PostgreSQL Session 6 : PostGIS

Following the success of the first five PostgreSQL Sessions, we are pleased to announce the sixth session, to be held on September 25 in Paris.

To keep up the momentum, Dalibo and Oslandia are jointly organizing a free day of talks dedicated to PostgreSQL and PostGIS: http://www.postgresql-sessions.org/

Preview of a 3D PostGIS data visualization

We are therefore issuing a call for speakers: you can send us your talk proposals, in English or in French. Each talk should last between 30 and 45 minutes (including any questions) and concern PostgreSQL and PostGIS. We are particularly interested in the following topics (non-exhaustive list):

  • What's new in PostgreSQL and PostGIS
  • Monitoring a server fleet
  • Migration to PostgreSQL and PostGIS
  • Handling large data volumes
  • Advanced spatial analysis with PostgreSQL/PostGIS

Talks can take several forms: user testimonial, proof of concept, tutorial, comparison, presentation of new features, etc.

All proposals must reach us before June 20, 2014.

Please send your proposals to the email address call-for-paper@postgresql-sessions.org, including the following information:

  • First and last name
  • Twitter account (if any)
  • Company
  • Biography and contributions to the PostgreSQL and/or PostGIS community
  • Talk title
  • Abstract
  • Specific requests

Presentation files ("slides") must be provided to Dalibo under a Creative Commons BY-ND 3.0 or compatible license.

We look forward to your proposals, and will see you in Paris at the end of September!

by Vincent Picavet at June 11, 2014 05:00 PM

May 31, 2014

Postgres OnLine Journal (Leo Hsu, Regina Obe)

psql watch for batch processing

A while back, we discussed using pgAdmin pgScript as a quick way of running a repetitive update script where you want each loop to commit right away. Since stored functions have to commit as a whole, you can't use stored functions alone for this kind of processing.

Question: Can you do similar easily with psql?
Answer: yes with the \watch command described nicely by Michael Paquier a while back.

If you are using the psql client packaged with PostgreSQL 9.3 or above, then you can take advantage of the \watch command that was introduced in that version of psql. We'll demonstrate that by doing a batch geocoding exercise with PostGIS tiger geocoder and also revise our example from the prior article to use the more efficient and terser LATERAL construct introduced in PostgreSQL 9.3.

Continue reading "psql watch for batch processing"

by Leo Hsu and Regina Obe (nospam@example.com) at May 31, 2014 03:24 AM

May 28, 2014

OpenShift by RedHat

ER-Mapping with OpenShift, Getting Started with pgModeler

Example DB Design
Home screen for pgModeler
connection screen in pgModeler
using the connection in the import dialog
using the connection in the import dialog for pgModeler

Greetings Shifters!

I am an unabashed PostgreSQL fanboi. I am also someone who likes to design databases visually (ER Diagrams) since I can't visualize anything beyond two tables and their relationships without pictures.

I want tools capable of round-tripping with the database, meaning that they generate tables (or SQL) for the database or generate a diagram with a database (or SQL) as the input. Finally, tools must run on Fedora Linux since that's my Operating System of choice. Bonus points awarded if the tool is FOSS.

During some unspeakably dismal times, when Windows was my main operating system, I had a MSDN subscription and Visio (plus some hand manipulation to make it work with PostgreSQL). Then I experienced the frustration of Dia for diagramming.

read more

by scitronpousty at May 28, 2014 03:48 PM

May 22, 2014

Stephen Mather

The easiest way to get PostGIS and friends running:

Docker.  See: https://github.com/vpicavet/docker-pggis for a quick and easy Docker running PostGIS. Understand, this isn’t production ready (the connection permissions make my teeth hurt a little) but, so nice to have a one stop shop for postgis, pgrouting, and pointcloud. Now I may have to write blog posts on pointcloud. Point cloud didn’t get in the PostGIS Cookbook because it was too hard to build. I built it. I played with its wonders. Then I decided it was too much too early. Well, not any more… .

CREATE EXTENSION postgis_topology;
CREATE EXTENSION pointcloud_postgis;

Screen shot of pgAdmin with CREATE EXTENSIONS for all the fun postgresql extensions... .

Oh, one more thing: pro-tip. You can modify the Dockerfile into a shell script for installing all this goodness on your local Ubuntu/Debian machine as well.

by smathermather at May 22, 2014 01:56 AM

May 20, 2014


Full Spatial database power in 2 lines

This post explains how to setup a powerful spatial data store with a wide range of features on Ubuntu 14.04 (Trusty) in 2 command lines. And how it works.


Run this on Ubuntu 14.04 Trusty Tahr, and enjoy:

sudo apt install docker.io
sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

Database credentials: pggis/pggis/pggis for database/user/password


More info

You want to store and manipulate spatial data. Lots of spatial data. Various spatial data. This Docker image lets you install the latest components based on the most powerful opensource database (PostgreSQL, in case you wonder) and benefit from a wide range of features.

Docker is a Linux container based technology. See containers like something between the good old chroot and a Virtual Machine. Fast, efficient and painless. See Docker as a mix between a meta-package manager, git and a whale.

Docker is available natively on Ubuntu 14.04 (and named docker.io), and can also run on other Linux flavors, or in a minimalist Virtual Machine with Boot2docker for other environments (Windows, MacOSX).

This image focuses on installing and running the following components:

  • PostgreSQL 9.3
  • PostGIS 2.1.3 with SFCGAL support
  • PgRouting
  • PostgreSQL PointCloud
  • PDAL

PostgreSQL is the advanced database providing a lot of features for data store and management. Version 9.3 is the latest stable version (9.4 is just around the corner), providing support for streaming replication, JSON, CTE, window queries and much, much more.

PostGIS is a PostgreSQL extension, enabling spatial support. It features new data types, such as 2D (and 2.5D) vector data, be it projected or in lat/lon. It also offers raster data support, to store gridded image data, and topological vector data storage.

SFCGAL support for PostGIS adds 3D features to the database. You can store 3D meshes, TIN, and operate on those objects.

PgRouting is a PostgreSQL/PostGIS extension allowing you to perform routing in the database on topological spatial data. It can be used to dynamically compute shortest paths, driving distances and more.

PostgreSQL PointCloud is a PostgreSQL/PostGIS extension which lets you deal with huge amounts of points, e.g. LIDAR data. PDAL is a library and set of tools allowing you to load and extract data from and to PointCloud (among other things).

All of this makes the best platform for spatial data crushing. Loads of spatial data.

Using it

Below is a longer full explanation to install this image and use it.

Install Docker

You first need to install docker tools on your Ubuntu system. You can install Docker on a lot of recent Linux flavors or in a Linux Virtual Machine (with Boot2docker, for example).

sudo apt install docker.io

Get the image and run the container

The following commands use docker to download the image from the docker registry (~1GB) and then run it in the foreground. If you omit the pull command and try to run it directly, Docker will look for the image on the Docker registry if not already present locally.

sudo docker.io pull oslandia/pggis
sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

Your container is initiated from the oslandia/pggis image. It is named pggis_test, runs in the foreground (default), will redirect all exposed ports to a host port ( -P ), and will delete all created filesystem and container image when the container exits ( --rm ). It will run the /sbin/my_init startup script of baseimage to launch all necessary processes inside the container.

Now PostgreSQL runs in your container with all extensions activated, and a new database created just for you.

Connect to the database

Assuming you have the postgresql-client installed on your host, you can use the host-mapped port to connect to the database. You need to use docker.io ps to find out what local host port the container is mapped to first:

$ sudo docker.io ps
CONTAINER ID        IMAGE                   COMMAND                CREATED             STATUS              PORTS                     NAMES
75fec271dc5e        oslandia/pggis:latest   /sbin/my_init          51 seconds ago      Up 50 seconds       0.0.0.0:49154->5432/tcp   pggis_test

You can see that the container's port 5432 has been mapped to the host's port 49154 in this case (yours may differ). We can now connect to the database server through this host port. A pggis user has been created (with pggis as password), along with a corresponding database with the extensions activated.

$ psql -h localhost -p 49154 -d pggis -U pggis --password

Enjoy GIS features

You are now ready to use the database. Inside the psql console, you can check that everything is in order with queries like below.

pggis=# SELECT postgis_version();
(1 row)

pggis=# SELECT st_astext(st_makepoint(random() * 100, random() * 100)) from generate_series(1, 10);
 POINT(90.1295741088688 97.7623191196471)
 POINT(79.2798819020391 0.146342813968658)
 POINT(56.9643571972847 50.8249804843217)
 POINT(14.1903728246689 26.436754828319)
 POINT(46.5733857825398 39.7641783114523)
 POINT(89.1805129591376 36.6665146779269)
 POINT(17.3397121019661 20.6000158563256)
 POINT(11.5902675781399 93.6640297528356)
 POINT(96.0820932406932 53.891480108723)
 POINT(88.9889035373926 15.435611223802)
(10 rows)

Describing all features of installed components would require a lot more posts, coming later.

If you are interested in learning more, Oslandia can provide training, assistance and support.


Do not use this in production !

The PostgreSQL configuration in this image opens the database connection without any IP restriction for the pggis super-user. The default pggis password is very weak too. This should NOT BE USED IN PRODUCTION, and you should adapt the configuration to your specific setup and security level. See below to rebuild the image with different configuration.

In a production environment, you would also want to run the container in the background ( -d option) and keep the container filesystem and images after running ( no --rm option ).

How it works

This image is a Docker image, which uses Linux Containers (LXC). The base image is Phusion baseimage, which is itself an Ubuntu 14.04 (Trusty) image. It is built using a Dockerfile (see the source on GitHub). This Dockerfile is used by Docker to build the pggis image, by processing all installation, download, compilation and setup of the various components. The result is a Docker image, which is then published on docker.io.

You can either directly download and use the image from docker.io as explained above, or rebuild it from the Dockerfile in the git repository like below.

Rebuilding the image

First install Docker on your machine as already stated. Native installation on Ubuntu 14.04 :

sudo apt-get install docker.io

Then clone the GitHub repository to get the Dockerfile :

git clone https://github.com/vpicavet/docker-pggis.git
cd docker-pggis

Then you can build the image, using the Dockerfile in the current repository.

sudo docker.io build -t oslandia/pggis .

Note that building the image requires quite a lot of downloading from the internet, as well as enough RAM and CPU. Compilations are achieved along the way with make -j3, thus enabling parallel compilation. You should adapt this value according to the number of CPUs you have.

Once you have built the image, you can run a container based on it just like above.

sudo docker.io run --rm -P --name pggis_test oslandia/pggis /sbin/my_init

If you want to change how the container works, or the default setup, you can edit the Dockerfile and rebuild the image. Docker caches every step of the build process (after each RUN command), so rebuilding can be very fast.

If you have a docker.io account, you can upload the image to the image registry like this:

sudo docker.io login
sudo docker.io push oslandia/pggis


This is a first dive into Docker, with a spatial GIS database container. There is a lot more you can do with Docker, and plenty of use cases for this spatial database setup. This Docker image, the setup and the way it is built can be improved a lot as well, and will certainly change along with package availability, ppa repositories, next versions of components, better Docker practices and versions, but it is already a good and efficient way to get the stack up quickly. Do not hesitate to fork the GitHub repository and send Pull Requests.

You can also report any issue on GitHub.

If you want to go further, or if you have any remark concerning the tools covered in this blog post, do not hesitate to contact us: infos+pggis@oslandia.com

by Vincent Picavet at May 20, 2014 07:00 AM

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostgreSQL 9.4beta1 and PostGIS 2.2.0 dev on Windows

PostgreSQL 9.4beta1 was released last week and Windows binaries for both 32-bit and 64-bit are already available to try out from http://www.postgresql.org/download/windows. Since this is a beta release, there are no installers yet, just the zip binary archive. To make the pot a little sweeter, we've set up the PostGIS windows build bot (Winnie) to automatically build the PostGIS 2.2.0 development branch and pgRouting 2 branches for 9.4 whenever there is a change in the code. We also have the pointcloud extension in the extras folder. If you are on 9.3, we've got 2.2 binaries for that as well. You can find the PostGIS/pgRouting related stuff at http://postgis.net/windows_downloads in the 9.4 folder.

For the rest of this article we'll discuss a couple of stumbling blocks you may run into.

Much of what we'll describe here is windows specific, but thanks to the beauty of extensions and GUCs, the extension install and GUC setting part for PostGIS is applicable to all operating systems.
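For instance, the platform-independent part is short. A hedged sketch (the GUC name below is the one introduced in the PostGIS 2.2 development branch, so treat it as an assumption on other versions):

```sql
-- Install PostGIS as an extension -- works the same on Windows, Linux, OS X
CREATE EXTENSION postgis;

-- Example GUC: enable GDAL raster drivers
-- (name assumed from the 2.2 development branch; adapt to your version)
SET postgis.gdal_enabled_drivers = 'ENABLE_ALL';
```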

Continue reading "PostgreSQL 9.4beta1 and PostGIS 2.2.0 dev on Windows"

by Leo Hsu and Regina Obe (nospam@example.com) at May 20, 2014 01:39 AM

May 19, 2014

PostGIS Development

Security releases 2.0.6 and 2.1.3

It has come to our attention that the PostGIS Raster support may give more privileges to users than an administrator is willing to grant. These include reading files from the filesystem and opening connections to network hosts.

Continue Reading by clicking title hyperlink ..

by Sandro Santilli at May 19, 2014 12:00 AM

May 14, 2014

Pierre Racine

A guide to the rasterization of vector coverages in PostGIS

A frequent question since the beginning of the PostGIS Raster adventure has been "How can I convert my geometry table to a raster coverage?" (here also). People often describe how to rasterize a single geometry using ST_AsRaster(), but I don't know of any text or tutorial explaining how to convert a complete vector coverage to a single raster or to a tiled raster coverage.

In this article I will describe two methods to rasterize a vector coverage: one very fast but not very flexible, and one very flexible but unfortunately a bit slow! I will rasterize a vector forest cover so that it aligns perfectly with an already loaded raster elevation layer. I will use the "height" column of the forest cover to assign values to pixels. I will assume that both layers are in the same SRID and that the forest only partially covers the elevation layer. The elevation layer is divided into 625 tiles of 100x100 pixels for a total of 6 250 000 pixels.
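Before starting, a quick sanity check on these assumptions can't hurt. This sketch reuses the table and column names of this article:

```sql
-- Expecting 625 tiles and 6 250 000 pixels in the elevation coverage
SELECT count(*) AS tile_count,
       sum(ST_Width(rast) * ST_Height(rast)) AS pixel_count
FROM elevation;

-- Both layers should report a single, identical SRID
SELECT DISTINCT ST_SRID(rast) AS srid FROM elevation
UNION
SELECT DISTINCT ST_SRID(geom) FROM forestcover;
```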

"Intelligent" vector VS "stupid" raster!

Why would one go from a human-parsed, precise, object-oriented, "intelligent" vector table to an unparsed, jagged, single-variable, "stupid" raster layer? Over the years I could identify only two good reasons to do so. The first is to get all your data layers into a single analysis paradigm (in this case the raster one) to simplify the geoprocessing chain. The second is that this "stupid" raster paradigm is still the fastest when it's time to analyse large volumes of data. Processing a raster stack relies on very simple matrix operations, generally available through what geo people call map algebra operations. This is still generally faster than the linear algebra necessary to analyse vector coverages. Anyway, our task here is not to decide whether doing it in vector mode is better than in raster mode. Our task is just to get the conversion done properly…

"Rasterizing" vs "rendering"

Another interesting matter is the difference between rasterizing and rendering. Rendering is the conversion of a vector model to an image for display purposes. Rasterizing is the conversion of a vector model to a raster for data analysis purposes. Renderers, like MapServer, GeoServer and Mapnik, besides blending many layers together, symbolizing geometries and labeling them, will generally produce anti-aliasing along the edge of polygons and translate values into symbolic colors. Any original data value is lost and there is no way to use the product of a renderer for GIS analysis. On the other hand, a rasterizer function, like the ones provided by GDAL, QGIS, ArcGIS or PostGIS, will produce raw pixels trying to represent the original vector surface as crisply and accurately as possible, assigning to each pixel one of the values (and only those values) associated with the original geometry. The raster resulting from a rasterization process is a valuable input into a geoprocessing analysis chain. The result of a rendering process is not. We must also say that although there are some methods to rasterize a vector coverage in PostGIS, there are still no ways to render one (unless you consider ST_AsJPEG(), ST_AsTIFF() and ST_AsPNG() as very basic renderers).

Method 1.1 – Basic ST_Union() and ST_Tile()

The first rasterization method I present is the one that was planned from the beginning of the PostGIS Raster project: simply convert all the geometries to small rasters and union them into a single raster, like we union many geometries together using ST_Union(geometry). A complete query, aligning the resulting raster on an existing elevation raster coverage, should look like this:

CREATE TABLE forestheight_rast AS
SELECT ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)) rast
FROM forestcover, (SELECT rast FROM elevation LIMIT 1) rast;

Close-up on the unioned rasterized version of the forest cover, resulting in "jagged" areas of same height.
This results in a unique "big" raster, aggregating all the "small" rasterizations of the geometries.

Note the arguments to ST_AsRaster(). The first one is the geometry to rasterize. The second one is the reference raster from which the alignment parameters are borrowed (a pixel corner x and y, the scale (or pixel size) and the skew (or rotation)). The third argument is the pixel type of the expected raster (32 bit float). The fourth parameter is the value to assign to the pixels; here it is the 'height' column from the forest cover table. The last parameter (-9999) is the nodata value to assign to pixels padding the area surrounding each rasterized geometry. This is the very first variant of the ST_AsRaster() function in the PostGIS Raster reference.

Like its geometric counterpart, ST_Union() simply aggregates (merges) all those small rasters together into a unique raster. Without ST_Union(), the query would have resulted in 2755 small rasters, one for each geometry.

Note also that we picked only one raster from the elevation layer as the reference raster to ST_AsRaster(). This is because all the rasters in this table are well aligned together and we need only one set of pixel corner coordinates to align everything. We could also have put this query selecting a single raster directly into the SELECT part:

CREATE TABLE forestheight_rast AS
SELECT ST_Union(ST_AsRaster(geom, (SELECT rast FROM elevation LIMIT 1), '32BF', height, -9999)) rast
FROM forestcover;

We normally don't want to keep big rasters in PostGIS, so we can split the merged raster into smaller tiles having the same dimensions as the elevation coverage with the ST_Tile() function:

CREATE TABLE tiled_forestheight_rast AS
SELECT ST_Tile(rast, 100, 100) rast
FROM forestheight_rast;

The tiling of the rasterized forest cover, in red, is not well aligned with the tiling of the elevation raster, in grey. Note that the squares are 100x100 pixel tiles, not pixels...
This results in 48 tiles, most being 100x100 pixels. Some are only 18x100 pixels and some are only 100x46 pixels. The lower right one is only 18x46 pixels...

There are 625 tiles of 100x100 pixels in the elevation coverage and 2755 geometries in the forest cover. These geometries cover only 31 of those tiles and the resulting big raster covers 56 of them. The query for creating the "big" raster takes 5 seconds (thanks to Bborie Park who sped up ST_Union() by a huge factor in PostGIS 2.1) and the query to resample/retile it takes only 2 seconds.

One problem with this result is that even though the pixels are well aligned with the elevation layer and most of the tiles have the same dimensions, the tiles themselves are not aligned with the elevation coverage and they do not cover the same area. Method 1.2 will fix that.

Method 1.2 – ST_Union() and ST_MapAlgebra()

In order to produce a raster coverage having a footprint identical to the elevation coverage, we must use this coverage not only for alignment but also as the base for our final tiles. The trick is to "burn" the big raster into empty tiles based on each of the elevation tiles. For that we will use a quite simple (!) two-raster ST_MapAlgebra() call:

CREATE TABLE welltiled_forestheight_rast AS
SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast
FROM forestheight_rast fc, elevation e;

The two-raster ST_MapAlgebra() function overlays the two rasters and computes an expression for each aligned pair of pixels. The result of this expression becomes the value assigned to the corresponding pixel of a new raster.

As the two rasters are not necessarily well aligned, the new raster extent can be equal to 1) the extent of the union of both rasters, 2) the extent of the intersecting area, 3) the extent of the first raster, or 4) the extent of the second raster. Here, no two rasters involved in the map algebra have the same extent. Since we want to "burn" the big raster into tiles having the same extent as the elevation tiles, which are second in the list of arguments, we have to specify that the new raster will have the same extent as the "SECOND" raster.
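To make the four options explicit, here is a sketch using two hypothetical raster columns, r1 and r2, from a hypothetical table two_rasters; only the last argument changes:

```sql
-- Only the extent type changes between these four calls
SELECT ST_MapAlgebra(r1, r2, '[rast1]', '32BF', 'UNION')        FROM two_rasters; -- 1) union of both extents
SELECT ST_MapAlgebra(r1, r2, '[rast1]', '32BF', 'INTERSECTION') FROM two_rasters; -- 2) intersecting area only
SELECT ST_MapAlgebra(r1, r2, '[rast1]', '32BF', 'FIRST')        FROM two_rasters; -- 3) extent of r1
SELECT ST_MapAlgebra(r1, r2, '[rast1]', '32BF', 'SECOND')       FROM two_rasters; -- 4) extent of r2
```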

The first argument is the "big" forest height raster. It is the only row of the forestheight_rast table.

The second argument is a new tile generated with ST_MakeEmptyRaster() and ST_AddBand() from each of the 625 elevation tiles. We could have used the original tiles directly, but since we are specifying the resulting extent to be "SECOND", and since the nodata value of the result from ST_MapAlgebra() is picked from the "FIRST" or the "SECOND" raster when those values are specified, the resulting raster would have had 32767 as its nodata value. This is the nodata value of the elevation raster coverage. We want the nodata value of the result to be the same as the one of the forest height coverage, which is -9999. So the trick is to build a new tile from scratch with ST_MakeEmptyRaster(), using the original elevation tiles as alignment reference, and to add a band to it with ST_AddBand(), specifying a pixel type ('32BF'), an initial value (-9999) and a nodata value (-9999) identical to the forest height raster.

The third argument is the expression computing the value to assign to each pixel. Pixel values from the first raster are referred to as "[rast1]" and pixel values from the second raster as "[rast2]". You can also reference the x coordinate of the pixel with "[rast.x]" and the y coordinate with "[rast.y]". The expression is any normal PostgreSQL expression, like for example "([rast1] + [rast2]) / 2".
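As a purely illustrative example of an expression involving both rasters, this sketch would add the forest height to the elevation underneath it, limited to the intersecting area:

```sql
-- Illustrative only: sum the two rasters where they overlap
SELECT ST_MapAlgebra(fc.rast, e.rast, '[rast1] + [rast2]', '32BF', 'INTERSECTION') rast
FROM forestheight_rast fc, elevation e;
```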

In our case the expression is simply the value of the first raster, the forest height one, that we want to "burn" into the tile. It does not involve values from the elevation raster. In other words, the elevation tiles are merely used as well-aligned blank pieces of paper on which we "print" the values of the forest height raster...

The fourth argument is the pixel type ('32BF') expected for the resulting raster. It is the same as the forest height raster.

The last argument ('SECOND') tells ST_MapAlgebra() to use the extent of the second raster (the elevation tiles) to build the resulting raster.

ST_MapAlgebra() will hence copy each value from the forest height raster to new empty tiles whose dimensions are based on the elevation ones. The query results in 625 tiles. When a tile does not intersect the area covered by the forest cover, all its pixels are simply set to nodata.

If your big raster does not cover much of the reference raster coverage, you might speed up the process by adding a spatial index to the reference coverage (it should already be there) and restricting the ST_MapAlgebra() computation to tiles intersecting the forest height raster:

CREATE INDEX elevation_rast_gist ON elevation USING gist (st_convexhull(rast));

CREATE TABLE welltiled_forestcover_rast AS
SELECT ST_MapAlgebra(fc.rast, ST_AddBand(ST_MakeEmptyRaster(e.rast), '32BF'::text, -9999, -9999), '[rast1]', '32BF', 'SECOND') rast
FROM forestheight_rast fc, elevation e
WHERE ST_Intersects(fc.rast, e.rast);

Keep in mind, however, that this query is faster because it does not process, and hence does not return, tiles not touching the forest cover area. If you want to produce a complete set of tiles, identical in number and extent to the elevation coverage, stick with the previous query.

Some drawbacks to the methods using ST_AsRaster() and ST_Union()

All those queries involving ST_AsRaster() and ST_Union() are quite fast but they present two drawbacks:
  • ST_Union() limited by RAM - Any query involving ST_Union() is limited by the RAM available to PostgreSQL. If the big raster resulting from the union of all the rasterized geometries is bigger than the available RAM, the query will fail. The trick to avoid this is to aggregate the small rasters not into one huge raster, but into smaller groups of rasters intersecting with tiles from the reference raster. Method 1.3 will show how to do that.
  • Only one extractable metric - A second limitation is that ST_AsRaster() has only one method to determine whether the geometry value must be assigned to the intersecting pixel or not. When the geometry is a polygon, for example, ST_AsRaster() assigns the value of the geometry to the pixel if the center of the pixel is inside this polygon, whatever proportion of the pixel area this polygon covers. In the worst case, the value of a very small polygon encircling the center of the pixel would be assigned even when all the rest of the pixel area is covered by a polygon with a different value. You can think of many cases where the value at the center of the pixel is not necessarily representative of the area covered by the pixel.

    This is because ST_AsRaster() rasterizes only one geometry at a time, without consideration for other geometries intersecting with the pixel. If you consider all the geometries of a layer when assigning a value, you might want to assign other metrics to the pixels, like the count of geometries intersecting with the pixel, the total length of all the polylines intersecting with the pixel, the value associated with the smallest polygon intersecting with the pixel, etc...

    Method 2 will show how to use the PostGIS Addons ST_ExtractToRaster() function to extract some of those metrics from a whole vector coverage and how to easily implement new ones.

    Method 1.3 – Memory efficient ST_Union() and ST_MapAlgebra()

    This method is RAM safe in that it never produces a "big" raster bigger than the area covered by one tile from the reference raster. It is a little bit slower than the first query of method 1.2 because it unions some rasters multiple times when they touch many tiles. It also produces 625 tiles, like the elevation coverage.

    CREATE TABLE ramsafe_welltiled_forestcover_rast AS
    WITH forestrast AS (
    SELECT rid, ST_MapAlgebra(
    ST_Union(ST_AsRaster(geom, rast, '32BF', height, -9999)),
    ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999),
    '[rast1]', '32BF', 'SECOND') rast
    FROM forestcover, elevation
    WHERE ST_Intersects(geom, rast)
    GROUP BY rid, rast
    )
    SELECT a.rid,
    CASE WHEN b.rid IS NULL THEN ST_AddBand(ST_MakeEmptyRaster(a.rast), '32BF'::text, -9999, -9999)
    ELSE b.rast
    END rast
    FROM elevation a LEFT OUTER JOIN forestrast b
    ON a.rid = b.rid;

    The first query, in the WITH statement (forestrast), rasterizes the geometries, unions them by groups intersecting the reference raster tiles and burns each of them into a new tile, as seen before. This is done by the addition of a WHERE clause limiting the operation to geometries intersecting with tiles and a GROUP BY clause ensuring that ST_Union() only aggregates small rasters "belonging" to the same tile.

    The WHERE clause, however, causes the extent to be limited to the area covered by the forest cover. This first part alone would return only 32 tiles. The second part of the query makes sure we get the 593 remaining tiles. It uses a LEFT OUTER JOIN to make sure a row is returned for each of the 625 tiles of the reference raster coverage. When the rid of the reference raster coverage matches one of the 32 tiles, that tile is returned. Otherwise a new empty tile is returned. You might skip this part if your source vector layer covers the full area of the reference raster or if you wish to limit the rasterized area to the one of the vector layer.

    Method 2 – PostGIS Add-ons ST_ExtractToRaster()

    As said before, ST_AsRaster() is quite limited in terms of the kind of value it can assign to the pixels. Not so many values can be extracted from a single polygon: the value at the centroid, the proportion of the pixel area covered, the value weighted by this proportion, the length of the intersecting part of a polyline being rasterized... ST_AsRaster(), however, because it is based on GDAL, only implements the first one.

    Furthermore, many more metrics can be imagined if the value assigned to each pixel comes not only from one geometry but from the aggregation of all the geometries (or the values associated with them) of a vector coverage intersecting the centroid or the surface of the pixel. We might, for example, want to compute the number of geometries intersecting with the pixel. We might be interested in the value of the polygon covering the biggest part of the pixel, or of the polyline having the longest length inside the pixel area. We might want to merge together geometries having the same value before computing the metric. I could identify some 76 different possible metrics extractable from point, polyline and polygon coverages.

    We hence need a function that is able to extract metrics from a vector coverage as a whole and which allows us to implement new extraction methods easily. The PostGIS Add-ons ST_ExtractToRaster() function was written with exactly this goal in mind.

    The PostGIS Add-ons is a set of about 15 pure PL/pgSQL community written functions helping to write advanced PostGIS queries. One of the most useful of these functions is ST_ExtractToRaster().

    A query using ST_ExtractToRaster(), returning the same set of tiles as method 1.3 and also not suffering from RAM limitations would look like this:

    CREATE INDEX forestcover_geom_gist ON forestcover USING gist (geom);

    CREATE TABLE extracttoraster_forestcover AS
    SELECT ST_ExtractToRaster(
    ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999),
    'public', 'forestcover', 'geom', 'height',
    'MEAN_OF_VALUES_AT_PIXEL_CENTROID') rast
    FROM elevation;

    First and very important remark: this query takes around 200 seconds! This is about 16 times more than the equivalent RAM-safe query of method 1.3! This is because ST_ExtractToRaster() does much more than ST_AsRaster(): it not only considers the whole coverage, it can compute very different things from this coverage. It is definitely not worth using ST_ExtractToRaster() if all you want is the value of the geometry intersecting the pixel centroid. The interest of this function is the other methods available...

    Note the last argument to ST_ExtractToRaster(): 'MEAN_OF_VALUES_AT_PIXEL_CENTROID'. This is the method used to extract a value for the pixel. Fifteen (15) of these methods have been implemented for ST_ExtractToRaster() up to now. They are listed below with a short description.

    The 'MEAN_OF_VALUES_AT_PIXEL_CENTROID' method returns a raster coverage whose values are identical to the raster coverage produced with the previous methods. Most other methods, however, will return quite different rasters…

    The other arguments to ST_ExtractToRaster() are quite trivial: the first is the raster to which values are extracted. It's a good practice to pass a new empty raster band having the right pixel type, initial value and nodata value, and whose alignment is based on tiles from the reference raster. This is identical to what we did in methods 1.2 and 1.3. The second argument is the schema of the vector table to rasterize ('public'). The third and the fourth are the name of the table ('forestcover') and the name of the column ('geom') containing the geometries to rasterize. The fifth argument is the column containing the value to be used in the computation of the new value when it is necessary ('height'). When the computation involves only geometries (e.g. COUNT_OF_VALUES_AT_PIXEL_CENTROID or SUM_OF_LENGTHS), this argument is not necessary.

    The query will return a new raster for each raster in the elevation table, in this case 625.

    Close-up on four pixels taking different values depending on the method used to assign them a value. The green and blue pixels were extracted using the MEAN_OF_VALUES_AT_PIXEL_CENTROID method. The pixels delimited by the red lines were extracted using the VALUE_OF_MERGED_BIGGEST method. The values displayed are the values associated with the polygons, which are delineated in black. Note how the pixels delimited by the red lines take the values of the polygons occupying the biggest proportion of the pixel.
    Here is a list of methods already implemented for ST_ExtractToRaster(). All methods using the pixel centroid are postfixed with "_AT_PIXEL_CENTROID". All other methods involve the pixel as a rectangular surface.
    • COUNT_OF_VALUES_AT_PIXEL_CENTROID: Number of geometries intersecting with the pixel centroid.
    • MEAN_OF_VALUES_AT_PIXEL_CENTROID: Average of all values intersecting with the pixel centroid.
    • COUNT_OF_POLYGONS: Number of polygons or multipolygons intersecting with the pixel surface.
    • COUNT_OF_LINESTRINGS: Number of linestrings or multilinestrings intersecting with the pixel surface.
    • COUNT_OF_POINTS: Number of points or multipoints intersecting with the pixel surface.
    • COUNT_OF_GEOMETRIES: Number of geometries (whatever they are) intersecting with the pixel surface.
    • VALUE_OF_BIGGEST: Value associated with the polygon covering the biggest area intersecting with the pixel surface.
    • VALUE_OF_MERGED_BIGGEST: Value associated with the polygon covering the biggest area intersecting with the pixel surface. Polygons with identical values are merged before computing the area of the surface.
    • VALUE_OF_MERGED_SMALLEST: Value associated with the polygon covering the smallest area in the pixel. Polygons with identical values are merged before computing the area of the surface.
    • MIN_AREA: Area of the geometry occupying the smallest portion of the pixel surface.
    • SUM_OF_AREAS: Sum of the areas of all polygons intersecting with the pixel surface.
    • SUM_OF_LENGTHS: Sum of the lengths of all linestrings intersecting with the pixel surface.
    • PROPORTION_OF_COVERED_AREA: Proportion, between 0.0 and 1.0, of the pixel surface covered by the union of all the polygons intersecting with the pixel surface.
    • AREA_WEIGHTED_MEAN_OF_VALUES: Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the maximum between the intersecting area of the geometry and the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the value is multiplied by the proportion of the covering area. For example, if a polygon covers only 25% of the pixel surface and there are no other polygons covering the surface, only 25% of the value associated with the covering polygon is assigned to the pixel. This is generally the preferred behavior.
    • AREA_WEIGHTED_MEAN_OF_VALUES_2: Mean of all polygon values weighted by the area they occupy relative to the pixel being processed. The weighted sum is divided by the sum of all the weighted geometry areas. That means if the pixel being processed is not completely covered by polygons, the full weighted value is assigned to the pixel. For example, even if a polygon covers only 25% of the pixel surface, 100% of its value is assigned to the pixel.
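    Switching between these methods is only a matter of changing the last argument. As a sketch, reusing the argument layout of the query above, this would compute the proportion of each pixel covered by forest, a value ST_AsRaster() cannot produce (the 'height' column should be ignored by this geometry-only method, but is passed anyway):

    ```sql
    CREATE TABLE forest_proportion_rast AS
    SELECT ST_ExtractToRaster(
    ST_AddBand(ST_MakeEmptyRaster(rast), '32BF'::text, -9999, -9999),
    'public', 'forestcover', 'geom', 'height',
    'PROPORTION_OF_COVERED_AREA') rast
    FROM elevation;
    ```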

    Adding methods to ST_ExtractToRaster()

    Want to compute some other esoteric metrics with ST_ExtractToRaster()? Internally the function uses the callback version of ST_MapAlgebra(), which calls one of two predefined callbacks depending on whether the value is computed for the pixel centroid or using the full surface of the pixel. Those functions are named ST_ExtractPixelCentroidValue4ma() and ST_ExtractPixelValue4ma(). They fulfill the criteria for ST_MapAlgebra() callback functions: they take three arguments defining the values of the input pixels (there can be many), the position of the pixel in the raster, and some extra parameters when necessary. In order to build a query for each pixel, both functions expect extra parameters: the alignment parameters of the raster for which values are being computed and the parameters of the geometry column from which values are being extracted. The callbacks then define a series of extraction methods.

    To add a new method to these callbacks, you can simply copy the query associated with an existing method that extracts a value similar to what you want to extract, give it a proper name, modify it to compute the expected value and make sure it uses the callback's extra arguments properly. It is a good idea to test the query directly on the vector coverage before testing it in one of the two callbacks. Most of the methods do an ST_Intersects() between the shape of the pixel and the vector coverage (this is why it is important that the vector coverage be spatially indexed) and extract some metrics from the list of intersecting geometries.
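    As a purely hypothetical illustration of that pattern (the real callback bodies in the PostGIS Add-ons differ in detail, and the variable names below are assumptions), a new surface-based method could be a branch like this inside ST_ExtractPixelValue4ma():

    ```sql
    -- Hypothetical PL/pgSQL fragment: dispatch on the method name and build
    -- the per-pixel query from the callback's extra arguments
    ELSIF method = 'MAX_OF_VALUES' THEN
        query = 'SELECT max(' || quote_ident(valuecolumn) || ')' ||
                ' FROM ' || quote_ident(schemaname) || '.' || quote_ident(tablename) ||
                ' WHERE ST_Intersects(' || quote_ident(geomcolumn) || ', ' ||
                quote_literal(pixelgeom::text) || '::geometry)';
    ```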

    If you add a new method which might be useful to others, let me know. I will add it to the PostGIS Add-ons code…


    Rasterizing a complete vector coverage can be done very quickly with ST_AsRaster() and ST_Union(), but some projects require more sophisticated methods to assign values to pixels, considering the coverage as a whole. The PostGIS Add-ons ST_ExtractToRaster() is slow but provides much more control over the metric that is extracted from the vector coverage.

    In both cases it is possible to extract values following a preloaded reference tiled coverage, which should be the case for most applications. It is also easy to add more extraction methods to ST_ExtractToRaster().

    by Pierre Racine (noreply@blogger.com) at May 14, 2014 12:42 PM

    May 07, 2014

    Boston GIS (Regina Obe, Leo Hsu)

    PostGIS in Action and PostgreSQL Up and Running 2nd Editions

    You might have noticed I haven't been blogging much lately. Hopefully I can get back to that and updating the BostonGIS tutorials soon. They are embarrassingly dated at this point and I want to add more like using the very cool TileMill, comparisons between Leaflet and OpenLayers, updated comparisons between SQL Server, PostGIS, and Oracle among other content updates. Most of the reason for the sluggish pace is that writing has been taking up a good chunk of our time.

    We are in the final stages of two books, both 2nd editions:

    PostGIS in Action, 2nd edition, which covers predominantly PostGIS 2.1, and

    PostgreSQL: Up and Running, 2nd edition, which covers mostly PostgreSQL 9.2 and 9.3.

    Both are feature complete, but still undergoing polishing. Hopefully when all is said and done we'll have two books that make great companions for GIS folks working with PostGIS. Both are now available in draft form direct from the publishers and you'll get the final polished version when it comes out.

    Continue reading "PostGIS in Action and PostgreSQL Up and Running 2nd Editions"

    by Regina Obe (nospam@example.com) at May 07, 2014 04:09 AM

    April 25, 2014

    Safe Software

    5 Ways to Do More with Mapnik

    Mapnik is a free, open source toolkit that takes vector and raster spatial data and renders it into a beautiful image. As of FME 2014, FME and Mapnik have been brought together to create the MapnikRasterizer.

    This aesthetically pleasing raster image was created by combining a variety of data sources using FME and Mapnik.

    Adding a Mapnik transformer to FME is like adding a top hat to Abraham Lincoln: it brings a whole new level of style and class. While FME can already take plain vector data and render it as a raster, Mapnik brings a new level of elegance and quality offered by subpixel anti-aliasing. It also offers a cleaner replacement for many FME transformers that do regular image processing. Rasters that once needed dozens of transformers to render now need only a handful. Mapnik is certainly a top hat on FME's rasterizing capabilities, so what does FME bring to Mapnik? We offer 5 improvements below. Be sure to join our webinar on May 7, 2014 at 8am PDT to see Mapnik examples that will change the way you think about cartography!

    1. Read and write more data formats than Mapnik currently supports

    Create more meaningful Mapnik cartography with the ability to integrate hundreds of data formats.

    When you work with spatial data, being limited to a handful of formats is like having a cartographer sit on you and stuff your arms into a straitjacket. (Constricting, yes, but also painful and awkward.) At the time of writing, Mapnik can read Esri Shapefiles, PostGIS, TIFF rasters, OSM XML, Kismet, and OGR/GDAL formats. FME broadens that format reach to include 325+ vector, raster, 3D, tabular, database, web, XML, cloud (and more) formats. This means it's easier than ever to work with Mapnik from a variety of systems and applications. FME's data integration capabilities offer precise control over the source and destination data, including coordinate systems and reprojection.

    2. Automate Mapnik workflows

    This graphical workflow was created using FME Desktop and reads from 5 disparate sources. The workflow can easily be scaled and automated.

    When you design a workflow involving Mapnik, you can use any of FME’s 400+ transformers to validate the data, optimize data before processing it, or restructure it to meet requirements. From there it’s easy to turn all your Mapnik rendering tasks into standard, repeatable actions. You can perform the same data transformations on many datasets, or on a single dataset that constantly changes. This is really good news, because the data workflow you spent so long creating shouldn’t have to disappear forever. This isn’t an ice sculpture contest. (By the way, Mapnik currently runs on Windows, Mac, and Linux, and so does FME.)

    3. Wile those tiles, OSM style! (i.e., set up a Mapnik WMTS/WMS with FME Server)

    FME makes it simple to set up a Mapnik web map tile service.

    With FME you can take advantage of scalable automation by creating a Mapnik workflow and uploading it to FME Server or FME Cloud. Whether your server needs are big or small, it’s as easy as that to make your data available in real time. You can create a web service that serves map tiles in a standard way; for instance, you could serve OpenStreetMap (OSM) data as a backdrop, stylized in a way that meets your needs.

    4. Easy Mapnik styling – goodbye Python, hello GUI

    FME's friendly interface makes it simple to style Mapnik output without code.

    Mapnik offers precise control of symbology and styling, including fill color, dash pattern, fonts, etc. FME makes it intuitive to control it all. Back in 2011, FME users could leverage Mapnik with Python. Today, the MapnikRasterizer makes this much simpler with a friendly GUI. Thank God, because I didn’t want to touch that Python script with a 39½ foot pole. The MapnikRasterizer transformer doesn’t require any CSS or XML skills: it simply accepts parameters in a point-and-click interface like the rest of the FME transformers.

    5. Win stuff for sharing your Mapnik art

    Keeping true to the world of Open Data, we think Mapnik projects are worth sharing. Enter our contest at safe.com/MapnikContest

    Ok, Picasso. We’re giving out prizes for your Mapnik art! What can you create using FME and Mapnik? Whatever it is, I hope it’s less creepy than this. Together, Mapnik and FME help create beautiful and meaningful cartography worth sharing. If you think you’re the ultimate Mapnik Master, enter our Mapnik contest (deadline: June 30, 2014). I said adding Mapnik to FME is like adding a top hat to Abraham Lincoln – but maybe adding FME to Mapnik is a shift into Abraham Lincoln: Vampire Hunter. FME brings hardcore data transformation and automation capabilities to your Mapnik projects. With FME Desktop, FME Server, and FME Cloud at your fingertips, the potential to create amazing cartography is endless. Thanks to the Mapnik team for making such great cartography possible in FME. What will you do with Mapnik? Be sure to share your creations with us in our contest, and tune in to next week’s webinar to learn more. For technical resources, check out the MapnikRasterizer FME transformer documentation and these Mapnik tutorials on FMEpedia.

    The post 5 Ways to Do More with Mapnik appeared first on Safe Software Blog.

    by Tiana Warner at April 25, 2014 07:01 PM


    Getting started with qgis-epanet

    EPANET is a software package for simulating the hydraulic and water-quality behaviour of water in pressurized networks. It is therefore a tool for determining pressure, flow, and water quality throughout a drinking water distribution network.

    Oslandia has developed a QGIS extension that lets you run hydraulic simulations directly within this GIS application and visualize the results. Oslandia provides services around these tools ( infos@oslandia.com ) and is currently developing the equivalent extension for sewage networks (SWMM).

    This document covers the installation and first steps of the hydraulic simulation extension for QGIS.

    A French translation of the EPANET documentation is available. The tables and geographic layers used by the plugin mirror the sections of the EPANET input file described in Appendix C of that document.

    Step 1: Installing EPANET

    Download the EPANET installer and run it.

    In the first window, click Next. In the second window, note where EPANET will be installed, here C:\Program Files (x86)\EPANET2, then click Next.

    In the third and fourth windows, click Next.

    Click Finish.

    Step 2: Installing the plugin

    In the Plugins menu, click Manage and Install Plugins.

    Note: for now the extension is still flagged "experimental". To see it, you must enable experimental plugins in QGIS by checking the corresponding box in the Settings tab of the plugin manager window.

    In the Not installed tab, search for the qgis_epanet extension, select it, and click Install plugin.

    Close the dialog telling you the installation succeeded by clicking OK, then close the plugin manager by clicking Close.

    Step 3: Configuring the plugin

    To run a simulation, the plugin must know where to find the EPANET program.

    In the Processing menu, choose Options.

    Expand the Epanet entry under Providers and double-click to the right of Epanet command line tool (blue area in the figure below), then click ...

    Select the epanet2d application (or epanet2d.exe) located in the EPANET installation directory (here C:\Program Files (x86)\EPANET2). Then click Open.

    Back in the Processing options window, click OK.

    In the Processing menu, click Toolbox.

    Epanet should appear in the list of available processing tools. If it does not, check that the processing toolbox is in advanced mode (Advanced interface at the bottom of the window).

    Step 4: Creating the network

    To get familiar with the plugin, we will create a simple network: a tank draining into a pipe.

    Below is a step-by-step creation. You can also download the complete example and load the layers from the simple_network.sqlite file:

    • junctions
    • pipes
    • tanks

    as well as the tables (layers without geometry):

    • times
    • report

    Create a new shapefile layer for the network nodes (junctions in EPANET):

    Delete the id field (present by default) and create the fields:

    • id_node, of type text,
    • altitude, of type decimal with a precision of 4,
    • base_demand, of type decimal with a precision of 4,
    • modulation_curve, of type text

    Click OK and name the file junctions.shp.

    Select the newly created layer, switch to edit mode, add a node, and enter the value below.

    Create a new shapefile layer for the tanks (tanks in EPANET) with the following fields:

    • id_node, of type text,
    • altitude, of type decimal with a precision of 4,
    • init_level, of type decimal with a precision of 4,
    • min_level, of type decimal with a precision of 4,
    • max_level, of type decimal with a precision of 4,
    • diameter, of type decimal with a precision of 4,
    • min_volume, of type decimal with a precision of 4,
    • volume_curve, of type text

    Don't forget to delete the id field present by default. Click OK and name the file tanks.shp.

    Select the newly created layer, switch to edit mode, add a node, and enter the value below.

    Add a new shapefile layer for the pipes (pipes in EPANET) with the following fields:

    • id_arc, of type text,
    • start_node, of type text,
    • end_node, of type text,
    • length, of type decimal with a precision of 4,
    • diameter, of type decimal with a precision of 4,
    • rugosity, of type decimal with a precision of 4,
    • minor_loss, of type decimal with a precision of 4,
    • status, of type text

    Don't forget to set the layer type to Line (the default is Point) and to delete the id field present by default.

    Select the newly created layer, switch to edit mode, add a line joining the two nodes, then enter the value below.

    For an existing network, it is easy to create views over the data to satisfy this constraint. See the qwat_to_epanet_views.sql file for an example of views over a water distribution network data model.

    Step 5: Defining the simulation parameters

    Two kinds of parameters are needed: those related to time (times for EPANET) and those related to report generation (report for EPANET).

    The fields for the times table are:

    • "simulation title", of type text,
    • duration, of type text,
    • "hydraulic timestep", of type text,
    • "quality timestep", of type text,
    • "pattern timestep", of type text,
    • "pattern start", of type text,
    • "report timestep", of type text,
    • "report start", of type text,
    • "start clocktime", of type text,
    • statistic, of type text

    The fields needed for the report table are (there can be more):

    • "simulation title", of type text,
    • status, of type text,
    • summary, of type text,
    • nodes, of type text,
    • links, of type text

    The simulation parameters are stored as one row in a table indexed by the simulation name. In other words, for each simulation there is one row in the times table and one row in the report table, with an identical value in the simulation title field.

    Launch Notepad (or any other text editor) and copy the following lines:

    "simulation title",  "status", "summary", "nodes", "links"
    'Epanet Simulation', 'Full',   'No',      'ALL',   'ALL'

    Save the file as report.csv. In the File menu, click New and copy the following lines:

    "simulation title", duration, "hydraulic timestep", "quality timestep", "pattern timestep", "pattern start", "report timestep", "report start", "start clocktime", statistic
    Epanet Simulation, 48:00, 0:05, 0:05, 0:05, 0:00, 0:05, 0:00, 12:00 AM, None

    Save the file as times.csv.

    In QGIS, add a delimited text layer, select the report.csv file you just created, and check CSV (comma) and No geometry.

    Add times.csv to QGIS the same way, as a delimited text layer.

    Step 6: Simulation

    Your project should now look like this:

    In the Processing toolbox, double-click Simulate flow in drinking water network. Give the parameters the following values by selecting from the drop-down menus:

    • Title -> Epanet Simulation
    • Junctions Layer -> junctions
    • Pipes Layer -> pipes
    • Tanks Layer -> tanks
    • Simulation Time -> times
    • Report options -> report

    Note: in future versions of QGIS, these drop-down menus will be filled automatically with the layers bearing the expected names.

    Click Run. Four layers without geometry should be added to the project when the simulation completes.

    Step 7: Analyzing the results

    There are two types of result tables:

    • raw simulation results: pressures and flows at every point of the network, at each simulation step
    • temporal aggregation results: minima, maxima, and averages of values at every point of the network

    The raw results are used to plot the evolution of a value over time: pressure for a node, flow for a pipe, and level for a tank.

    To see the evolution of the tank level over time, select the tanks layer (1), then the feature selection tool (2), select the tank (3), and click the timeplot button (4).

    The temporal aggregates are used to create maps. For this reason the tables are linked (through a join) to the geographic layers. Additional attributes are therefore available for the junctions, pipes, and tanks layers. Using rule-based symbology, the user can, for example, display in red the pipes of the network where the average velocity was insufficient (stagnation), or the nodes where the maximum pressure was too high.

    For our overly simple network this representation is not relevant, but you can see a demonstration in the extension's presentation video:

    You can also read the plugin presentation in English.

    Oslandia can help you deploy these tools or integrate them into your information system, through training, consulting, support, or development to improve the current functionality of this extension: infos@oslandia.com

    We are also working on an equivalent integration of SWMM, a tool for modelling sewage networks, which should be released in the coming months. If you would like to help fund this extension to see it released sooner, don't hesitate to contact us.

    by Vincent Mora at April 25, 2014 07:00 AM

    Postgres OnLine Journal (Leo Hsu, Regina Obe)

    Using HStore for Archiving

    I'm not a big proponent of schemaless designs, but they have their place. One particular place where I think they are useful is for archiving of data where even though the underlying table structure of the data you need to archive is changing, you want the archived record to have the same fields as it did back then. This is a case where I think Hstore and the way PostgreSQL has it implemented works pretty nicely.

    Side note: one of the new features of PostgreSQL 9.4 is improved GIN indexes (faster and smaller) which is very often used with hstore data (and the new jsonb type). We're really looking forward to the GIN improvements more so than the jsonb feature. We're hoping to test out this improved index functionality with OpenStreetMap data soon and compare with our existing PostgreSQL 9.3. OpenStreetMap pbf and osm extract loaders (osm2pgsql, imposm) provide option for loading tagged data into PostgreSQL hstore fields, in addition to PostGIS geometry and other attribute fields. So 9.4 enhancements should be a nice gift for OSM data users. More on that later.
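    The archiving pattern described above (store each archived row as a schemaless key/value document, so it keeps whatever fields the table had at archive time) can be sketched with the Python standard library; here sqlite3 and json stand in for PostgreSQL and hstore purely for illustration, with hypothetical table and field names:

```python
import json
import sqlite3

# A schemaless archive table: each archived row is stored as a key/value
# document, the role hstore plays in PostgreSQL (JSON stands in here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (archived_at TEXT, row_data TEXT)")

def archive_row(row):
    """Archive a row dict, preserving exactly the fields it has right now."""
    conn.execute("INSERT INTO archive VALUES (datetime('now'), ?)",
                 (json.dumps(row),))

# Rows archived before and after a schema change each keep their own fields
archive_row({"id": 1, "name": "old layout"})
archive_row({"id": 2, "name": "new layout", "extra_field": "added later"})

rows = [json.loads(r[0]) for r in conn.execute("SELECT row_data FROM archive")]
```

The point is that the archive never needs an ALTER TABLE when the live table's structure changes: old records stay exactly as they were.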

    Continue reading "Using HStore for Archiving"

    by Leo Hsu and Regina Obe (nospam@example.com) at April 25, 2014 04:24 AM

    April 09, 2014

    Boston GIS (Regina Obe, Leo Hsu)

    MySQL 5.7 GIS building on Boost Geometry

    Interesting times ahead. The upcoming MySQL 5.7 promises great enhancements in the spatial space. They've scrapped much of the existing GIS plumbing seen in prior versions like 5.6 and are replacing much of it with Boost Geometry. In addition, R-Tree indexes will be available for InnoDB. Details: Why Boost.Geometry in MySQL? and Making use of Boost Geometry in MySQL. Along the roadmap (not for 5.7) they plan to have geography support as well as projections, 3D, and the various output formats, e.g. GeoJSON, currently available in PostGIS.

    Just when I thought the database spatial domain was getting boring. It will be interesting to see how the new MySQL Boost Geometry support stacks up against PostGIS's GEOS and 3D CGAL support. Might be time for a PostGIS/MySQL shoot-out soon.

    by Regina Obe (nospam@example.com) at April 09, 2014 07:44 PM

    April 01, 2014

    Boundless Geo

    Code Sprinting in Vienna

    Over 50 people came together in Vienna last week for five days of working on open source geospatial software. This code sprint merged the annual “C tribe” sprint — PostGIS, MapServer, GDAL, PDAL, and others — with the regular sprints held by the QGIS and GRASS communities. As a result, the attendance was at a record level, and the opportunities for collaboration much greater.

    PostGIS 2.1.2 & 2.0.5 Released

    I attended to work on PostGIS development, as usual, and achieved my modest goal: closing all the remaining tickets and getting 2.1.2 and 2.0.5 released. These are both bug fix releases, so no new functionality is included, but there are some important issues in the 2.1 branch that were resolved with the 2.1.2 release.

    More QGIS in Our Future

    Wearing my OpenGeo Suite product manager hat, the sprint was an excellent opportunity to sit down and talk with leaders in the QGIS community about our plans for bundling QGIS with OpenGeo Suite. When OpenGeo Suite 4.1 is released later this year, it will include builds of QGIS for all our supported platforms (Windows, OSX, Linux). It will also have some specialized plugins for managing map composition and versioned editing within QGIS. Our own Victor Olaya is a QGIS community member and a developer of the Processing framework (formerly Sextante) in QGIS. We’re looking forward to being a part of the QGIS community, in the same way that we are a part of the OpenLayers, GeoServer and PostGIS communities.

    Building Community

    Five days is a long time to hunker down in a room slinging code, and there were a lot of valuable side meetings: developers showed off some of their latest work (I saw some great 3D work from PDAL’s Howard Butler and Oslandia’s Olivier Courtin); the QGIS community worked through important design decisions; and Tim Sutton and Anita Graser collected lots of interviews for the QGIS podcast.

    Code Sprint Beer Mat

    The “beer track” (not run concurrently with the other tracks) was also great, thanks to Stephan Meißl’s organizing: some onsite events, a walk and dinner in downtown Vienna, and a dinner in the local neighborhood. Most importantly, opportunities to get to know developers from other communities and bring the whole open source geo community closer together.

    Next year, the sprint will once again combine as many communities as possible (hopefully adding some!), this time in a North American venue (likely Philadelphia). I’ll certainly be in attendance. If you write open source geo software, I hope you will too.


    The post Code Sprinting in Vienna appeared first on Boundless.

    by Paul Ramsey at April 01, 2014 01:44 PM

    March 31, 2014

    PostGIS Development

    PostGIS 2.1.2 Released

    The 2.1.2 release of PostGIS is now available.

    The PostGIS development team is happy to release a patch for PostGIS 2.1, the 2.1.2 release. As befits a patch release, the focus is on bugs and breakages.

    Best served with a bottle of GEOS 3.4.2.


    Continue Reading by clicking title hyperlink ..

    by Paul Ramsey at March 31, 2014 12:00 AM

    PostGIS Development

    PostGIS 2.0.5 Released

    The 2.0.5 release of PostGIS is now available.

    The PostGIS development team is happy to release a patch for PostGIS 2.0, the 2.0.5 release. As befits a patch release, the focus is on bugs and breakages.

    Best served with a bottle of GEOS 3.3.9.


    Doc downloads: HTML | PDF

    Continue Reading by clicking title hyperlink ..

    by Paul Ramsey at March 31, 2014 12:00 AM

    March 26, 2014

    Postgres OnLine Journal (Leo Hsu, Regina Obe)

    An almost idiot's guide to install PostgreSQL 9.3, PostGIS 2.1 and pgRouting with Yum

    In this exercise, we'll go through installing PostgreSQL 9.3 on a CentOS 6 64-bit box. We'll cover upgrading in a later article. For the rest of this article, we'll go over configuring yum to use the PostgreSQL PGDG Yum repository found at http://yum.postgresql.org , which has the latest and greatest of 9.3. It's been a while since we wrote step by step instructions for installing with Yum.

    Note: the PostGIS 2.1.2 release is imminent, so you might want to wait until we release and Yum is updated before you install/upgrade.

    Continue reading "An almost idiot's guide to install PostgreSQL 9.3, PostGIS 2.1 and pgRouting with Yum"

    by Leo Hsu and Regina Obe (nospam@example.com) at March 26, 2014 05:38 AM

    Stephen Mather

    Packt Sale — 2 books for the price of one

    Now I feel like a salesman. I feel obliged to mention the Packt buy one get one free sale (ends tomorrow):


    If you want a PostgreSQL title to go with the PostGIS Cookbook (just sayin’):


    Ok. Enough. Back to fun reading:

    A map of Mount Pleasant


    by smathermather at March 26, 2014 03:56 AM

    March 21, 2014

    Postgres OnLine Journal (Leo Hsu, Regina Obe)

    New PostgreSQL releases from Packt Publishing

    This past year has seen a fair number of PostgreSQL and PostGIS book releases from Packt Publishing. Packt is currently running a special E-Book offer: buy any one of their 2000 titles and get one free. Visit http://bit.ly/1j26nPN to redeem the offer. Offer ends: 26th-Mar-2014.

    Recent titles you'll want to get as a PostgreSQL / PostGIS user

    • PostGIS Cookbook, published Jan 2014. Though not directly related to PostGIS, Learning QGIS Cookbook, published September 2013, is a fine companion to any PostGIS book.
    • PostgreSQL Server Programming Published Jun 2013
    • PostgreSQL Replication Published August 2013

    Continue reading "New PostgreSQL releases from Packt Publishing"

    by Leo Hsu and Regina Obe (nospam@example.com) at March 21, 2014 06:05 PM

    March 14, 2014

    PostGIS Development

    Getting intersections the faster way

    Doing an ST_Intersection is much slower than relation checks such as ST_Intersects, ST_CoveredBy, and ST_Within. In many situations you know the intersection of two geometries without actually computing it. In these cases, you can skip the costly ST_Intersection call. Cases like this:

    1. geometry a is covered by geometry b -> intersection is geometry a
    2. geometry b is covered by geometry a -> intersection is geometry b
    3. geometry a does not intersect geometry b -> intersection is empty geometry

    This kind of question comes up a lot, as discussed in the stackexchange question Acquiring ArcGIS speed in PostGIS.
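    The shortcut logic itself can be sketched outside the database. In this minimal pure-Python sketch, axis-aligned boxes stand in for geometries, with `covered_by`/`intersects` tests playing the roles of ST_CoveredBy and ST_Intersects, and the full intersection only computed when no shortcut applies:

```python
def covered_by(a, b):
    """Box a (xmin, ymin, xmax, ymax) lies entirely within box b."""
    return a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]

def intersects(a, b):
    """Boxes a and b overlap (analogue of ST_Intersects)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def box_intersection(a, b):
    """The 'costly' computation, only reached when no shortcut applies."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def fast_intersection(a, b):
    if covered_by(a, b):        # case 1: intersection is geometry a
        return a
    if covered_by(b, a):        # case 2: intersection is geometry b
        return b
    if not intersects(a, b):    # case 3: intersection is empty
        return None
    return box_intersection(a, b)

inner = (2, 2, 4, 4)
outer = (0, 0, 10, 10)
far   = (20, 20, 30, 30)
```

The relation checks are cheap comparisons, so the expensive computation is only paid when the geometries genuinely partially overlap.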

    Continue Reading by clicking title hyperlink ..

    by Regina Obe at March 14, 2014 12:00 AM

    March 13, 2014

    Geo Perspectives (Justin Lewis)

    Exploding Polygons Into Random Points by Density with PostGIS

    Visualizing geo-data by density is a common need but often leaves us with yet another choropleth map.  Sometimes there is a need to make the data more visually interesting or to capture multiple characteristics of that density in one layer.  This can be the case with race and ethnicity data, where visualizing race and density simultaneously can be very powerful.  The University of Virginia’s Racial Dot Map is an excellent example of this.  In this post I will demonstrate one way to convert polygons into points based on a related density attribute using a PostGIS function.  In other words, if you have a polygon with an associated population value, you could use this technique to generate a point for each person in that polygon.  The technique needs to:


    1. Generate points from polygons
    2. Ensure point locations are random.
    3. Ensure point locations fall within the boundary of the polygon they originate from.

    Data for this example can be downloaded from the DRCOG Regional Data Catalog.

    The PostGIS function that generates random point locations representing people from Census Tracts:

    -- Function: public.pop_tract_explode()
    -- DROP FUNCTION public.pop_tract_explode();
    CREATE OR REPLACE FUNCTION public.pop_tract_explode()
      RETURNS integer AS
    $$
    DECLARE
        rec       RECORD;
        gid INTEGER;
        geoid10 BIGINT;
        -- Variables for explode
        total INTEGER;
        category_tag TEXT;
        ct INTEGER;
        skipped INTEGER := 0;
        -- Variables for randomize steps
        i INTEGER;
        maxiter INTEGER := 1000;
        poly_geom GEOMETRY;
        randPoint GEOMETRY;
        -- Bounding box of the current polygon
        x0 DOUBLE PRECISION;
        dx DOUBLE PRECISION;
        y0 DOUBLE PRECISION;
        dy DOUBLE PRECISION;
        xp DOUBLE PRECISION;
        yp DOUBLE PRECISION;
    BEGIN
        -- Create a destination table with a matching data model to the input table and add needed fields
        EXECUTE 'DROP TABLE IF EXISTS pop_tract_explode';
        EXECUTE 'CREATE TABLE pop_tract_explode AS
                 SELECT "gid", "geoid10", "pop2010" FROM public.census_tracts_2010 LIMIT 1;';
        EXECUTE 'TRUNCATE TABLE pop_tract_explode;';
        EXECUTE 'ALTER TABLE pop_tract_explode ADD COLUMN "category_tag" character varying(100);';
        EXECUTE 'ALTER TABLE pop_tract_explode ADD COLUMN rand_point GEOMETRY;';
        -- Loop through all records of the input table
        FOR rec IN EXECUTE 'SELECT gid, geoid10, geom_2232, "pop2010" FROM public.census_tracts_2010;' LOOP
            -- Set the polygon geometry variable for the current record
            poly_geom := rec.geom_2232;
            -- Determine the bounds of the polygon geometry
            x0 := ST_XMin(poly_geom);
            dx := ST_XMax(poly_geom) - x0;
            y0 := ST_YMin(poly_geom);
            dy := ST_YMax(poly_geom) - y0;
            -- Set the variables for the current record
            gid := rec.gid::INTEGER;
            geoid10 := rec.geoid10::BIGINT;
            total := rec.pop2010::INTEGER;
            -- Loop through the population count, placing one point per person
            ct := 0;
            category_tag := 'total';
            WHILE ct < total LOOP  -- you can make each point represent a specific number of occurrences by dividing by that value (i.e. total/100 makes 1 pt = 100 people)
                RAISE NOTICE 'Count: %', ct;
                -- Try random points in the polygon's bounding box until one falls
                -- within the polygon or the maximum iteration count is reached
                i := 0;
                WHILE i < maxiter LOOP
                    i := i + 1;
                    xp := x0 + dx * random();
                    yp := y0 + dy * random();
                    randPoint := ST_SetSRID( ST_MakePoint( xp, yp ), ST_SRID(poly_geom) );
                    EXIT WHEN ST_Within( randPoint, poly_geom );
                END LOOP;
                IF NOT ST_Within( randPoint, poly_geom ) THEN
                    RAISE NOTICE 'RandomPoint: number of iterations exceeded %', maxiter;
                    skipped := skipped + 1;
                ELSE
                    -- Insert a new record into the destination table for the random point
                    INSERT INTO pop_tract_explode VALUES ( gid, geoid10, total, category_tag, randPoint );
                END IF;
                ct := ct + 1;
            END LOOP;
        END LOOP;
        -- Create a spatial index on the new spatial column
        EXECUTE 'CREATE INDEX pop_tract_explode_gist ON pop_tract_explode USING gist (rand_point)';
        RETURN skipped;
    END;
    $$
      LANGUAGE plpgsql
      COST 100;
    ALTER FUNCTION public.pop_tract_explode()
      OWNER TO postgres;


    Run the function with this simple command:

    SELECT pop_tract_explode();


    This function can be improved in many ways.  One major improvement would be to allow the function to accept inputs for input/output table names as well as density field names so the function can be used with other data sets.  This would also be a start toward allowing the generation of points for different categories of density data so you could, for example, visualize population density by race category.
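    The core rejection-sampling step of the function (draw a random point in the polygon's bounding box, keep it only if it falls inside the polygon, give up after a maximum number of tries) can be sketched in plain Python. Here a standard ray-casting point-in-polygon test stands in for ST_Within, and the triangle is a hypothetical example polygon:

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) is inside the polygon (list of vertices)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross the edge (x1,y1)-(x2,y2)?
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def random_points_in_polygon(poly, count, maxiter=1000):
    """Rejection sampling: draw from the bounding box, keep points inside the polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    x0, y0 = min(xs), min(ys)
    dx, dy = max(xs) - x0, max(ys) - y0
    points = []
    for _ in range(count):
        for _ in range(maxiter):
            xp = x0 + dx * random.random()
            yp = y0 + dy * random.random()
            if point_in_polygon(xp, yp, poly):
                points.append((xp, yp))
                break  # point accepted; move on to the next one
    return points

triangle = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
pts = random_points_in_polygon(triangle, 100)
```

Since the triangle fills half its bounding box, roughly every second draw is accepted, so the maxiter cap is comfortably large; very thin or convoluted polygons are where it matters.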

    by jmapping at March 13, 2014 06:15 AM

    March 06, 2014

    Stephen Mather

    PostGIS Cookbook(s)

    This is starting to feel real…

    PostGIS Cookbook(s).  2904 pages.

    6 PostGIS Cookbooks

    Page count for a single book can be calculated as in the following code:

    WITH distinction AS (
        SELECT DISTINCT(the_text) AS pages FROM postgis_cookbook
    )
    SELECT COUNT(pages) AS "Count of Pages" FROM distinction;

    The preceding query will result in the following:

    Count of Pages

    Feeling real

    A special thanks to my family, my co-authors, all the contributors to PostGIS, and to Regina Obe for helping me through the hardest bit of code in my part of the book. Check out Regina and Leo’s book, PostGIS in Action. Their second edition is already on pre-order. I got to read a bit of it last year, and it promises to be a really epic edition. I still reference the 1st edition myself. I wish I had had it when I was starting in PostGIS… .

    by smathermather at March 06, 2014 03:25 AM

    February 27, 2014

    Geo Perspectives (Justin Lewis)

    Scripted Geo-Data Processing Techniques – Getting/Accessing Data

    This post is a follow up to my last post where I stated that there are three basic steps to geo-data processing.  I want to dig into the first of those steps (getting/accessing data) so as to get a better idea of some of the many ways you can approach this task.  Many of these examples overlap with the third basic step of geo-data processing (outputting data) as it is common to first move data to a processing environment.  I will cover outputting data in a future post.  This is not an exhaustive list of all the ways you can achieve this goal but hopefully it will help you get started with these tools.

    It is common for geo-data tools to come with a minimalist command line interface, also commonly called the terminal.  While the terminal can be intimidating for newbies it offers a great amount of versatility that is difficult to match with a GUI. The terminal can also be handy for executing commands from a script.  This is not always the best approach to automating a task but can be useful in certain situations.  While the terminal is often very useful there are also great tools that allow you to work with data at a finer level from within a script.  For this post I will focus on some powerful command line tools, psycopg2, and Python.

    Note:  You can execute any terminal command with Python using the OS module that ships by default with Python.  Wherever I demonstrate a terminal command in this post I also include the corresponding Python code to execute the command from a script.  

    import os


    Copy data from a text file to PostgreSQL (PostGIS)

    It is often the case that data is provided as a text file.  There are many flavors of text file, but probably the most common is the CSV (comma separated values).  While there is no official standard for the CSV format, there are generally agreed upon conventions that a CSV commonly follows; the most basic structure requires values separated (delimited) by commas.  All this aside, the PostgreSQL COPY command is probably the fastest command available for copying data from a text file to PostgreSQL.  The hardest part is ensuring your destination table is set up correctly to accept your input data (from the text file).  Before running a COPY command like this you must pre-build a table with a structure matching the text file: the same number of columns, with appropriate data types.


    Wrap this in Python using the OS module:

    import os
    os.system("""psql -U admin -d demo -c "COPY YOUR_DESTINATION_TABLE FROM '/PATH/TO/YOUR/CSV' WITH DELIMITER AS ',';" """)

    An arguably less ideal option is to use ogr2ogr with something like this:

    ogr2ogr --config PG_LIST_ALL_TABLES YES -update append -skipfailures -f PostgreSQL PG:"port=5432 host=YOUR_HOST dbname=YOUR_DATABASE user=YOUR_USER password=YOUR_PASS" %%f -nln YOUR_OUTPUT_TABLE_NAME
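    The pre-build-then-bulk-load pattern described above can be sketched end to end with the Python standard library; here sqlite3 stands in for PostgreSQL purely so the example is self-contained, and the CSV content, table name, and columns are hypothetical:

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would come from open("cities.csv")
csv_text = """name,population
Denver,600158
Boulder,97385
"""

conn = sqlite3.connect(":memory:")
# Pre-build a destination table whose columns match the CSV structure
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
conn.executemany("INSERT INTO cities VALUES (?, ?)", reader)

count = conn.execute("SELECT count(*) FROM cities").fetchone()[0]
```

The PostgreSQL COPY command does the same job far faster on real data, but the rule is identical: the destination table must already exist with columns that match the file.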


    Get data from a PostgreSQL database table with Python

    It is common to need to access data from a PostgreSQL database for further processing in a script.  Psycopg2 can be handy in this situation by enabling you to execute SQL statements and return the results to your Python environment.  This is a very basic example but demonstrates the general technique.

    import psycopg2

    # DB connection properties
    conn = psycopg2.connect(dbname='YOUR_DATABASE', host='YOUR_HOST', port=5432, user='YOUR_USER', password='YOUR_PASS')
    cur = conn.cursor()  ## open a cursor
    sql = 'SELECT * FROM your_table_name'
    cur.execute(sql)  ## execute the sql command
    result = cur.fetchall()  ## return all result rows
    print(result)


    Copy data from an ESRI Shapefile to PostgreSQL (PostGIS)

    Geo-data is commonly shared in the ESRI Shapefile format.  While not an open standard, the Shapefile is well known, and it is common for geo-based tools to have at least some level of compatibility with the format.  This is true for programming tools as well.

    shp2pgsql terminal command:

    shp2pgsql /PATH/TO/YOUR/SHAPEFILE.shp | psql -U YOUR_USER -d YOUR_DATABASE


    import os
    os.system("shp2pgsql /PATH/TO/YOUR/SHAPEFILE.shp | psql -U YOUR_USER -d YOUR_DATABASE")

    ogr2ogr terminal command:

    ogr2ogr -overwrite -skipfailures -progress -f PostgreSQL PG:"dbname='YOUR_DATABASE' host='YOUR_HOST' port='5432' user='YOUR_USER' password='YOUR_PASS'" PATH/TO/county_2013.shp -nln county

    import os
    os.system("""ogr2ogr -overwrite -skipfailures -progress -f PostgreSQL PG:"dbname='YOUR_DATABASE' host='YOUR_HOST' port='5432' user='YOUR_USER' password='YOUR_PASS'" PATH/TO/county_2013.shp -nln county""")


    Copy data from a PostgreSQL (PostGIS) database to another PostGIS database

    You might find yourself in a situation where you need to pull data from one PostGIS database to another in order to isolate the data.  There are many ways you can solve this problem but I will only cover one for simplicity.

    ogr2ogr -overwrite -f PostgreSQL --config PG_USE_COPY YES  PG:"dbname='DESTINATION_DB' host='DESTINATION_HOST' port='5432' user='USER' password='PASS'" PG:"dbname='ORIGIN_DB' host='ORIGIN_HOST' port='5432' user='USER' password='PASS'" NAME_OF_TABLE_IN_ORIGIN_DB

    import os
    os.system("""ogr2ogr -overwrite -f PostgreSQL --config PG_USE_COPY YES  PG:"dbname='DESTINATION_DB' host='DESTINATION_HOST' port='5432' user='USER' password='PASS'" PG:"dbname='ORIGIN_DB' host='ORIGIN_HOST' port='5432' user='USER' password='PASS'" NAME_OF_TABLE_IN_ORIGIN_DB""")

    TIP: You can copy more than one table in a single command by simply appending additional table names, separated by spaces, to the end of this command.
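    Because the ogr2ogr command mixes double and single quotes, building it as an argument list and handing it to subprocess is less error-prone than pasting it into os.system().  A sketch (table names are hypothetical):

```python
import subprocess  # subprocess.call(cmd) would run the assembled command

def build_pg_to_pg_command(dest_conn, origin_conn, tables):
    """Assemble the ogr2ogr PG-to-PG copy command as an argument list.

    Passing a list to subprocess avoids the nested-quoting problems of
    os.system(); each table name is appended at the end, per the tip above.
    """
    cmd = ['ogr2ogr', '-overwrite', '-f', 'PostgreSQL',
           '--config', 'PG_USE_COPY', 'YES',
           'PG:%s' % dest_conn, 'PG:%s' % origin_conn]
    cmd.extend(tables)
    return cmd

cmd = build_pg_to_pg_command(
    "dbname='DESTINATION_DB' host='DESTINATION_HOST' port='5432' user='USER' password='PASS'",
    "dbname='ORIGIN_DB' host='ORIGIN_HOST' port='5432' user='USER' password='PASS'",
    ['parcels', 'streets'])  # hypothetical table names
# subprocess.call(cmd)  ## uncomment to actually run the copy
```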


    Get data from a GeoJSON Web-Service

    With the increase of open data catalogs and data APIs comes the growing need (or option) to consume data directly from a web service.  These services often offer data in the common JSON or GeoJSON format.  This can be useful for consuming directly in a web browser but can also be useful for scripting data collection efforts.  For this post I will only demonstrate how to GET JSON data with Python, and not how to process it afterwards, as that is a topic a bit beyond the scope of this post.

    import urllib2, json
    url = "http://gis.drcog.org/REST/v1/ws_geo_identify.php?distance=50&format=json&fields=name&geotables=muni_2010&srid=2232&x=3138704.521018773&y=1699213.268416112"
    data = json.load(urllib2.urlopen(url))
    print data


    Copy data from a Web Page (scraping)

    Scraping web pages is a useful way to get data that is not otherwise readily available in machine-readable form.  It is a fairly challenging task that involves processing HTML elements.  HTML provides the structure of a web page, so you can scrape content from specific HTML elements in a variety of ways.  The easiest is to select elements based on their class or id.   Luckily there are many scraping tools like Beautiful Soup that help greatly in this process.

    Important Note: Always review a web-site's usage limitations before scraping it.  Some web-sites do NOT allow their content to be scraped.

    import urllib2, psycopg2
    from bs4 import BeautifulSoup
    class Scrape:
        def get_paragraphs(self, source_url):
            url = urllib2.urlopen(source_url)
            content = url.read()
            soup = BeautifulSoup(content)
            paragraphs = soup.findAll("p")  ## every <p> element on the page
            return paragraphs
    if __name__ == '__main__':
        #DB connection properties (only needed if you plan to store the results)
        conn = psycopg2.connect(dbname = 'YOUR_DATABASE', host= 'YOUR_HOST', port= 5432, user = 'YOUR_USER',password= 'YOUR_PASS')
        cur = conn.cursor()  ## open a cursor
        source_url = "http://jmapping-maplabs.rhcloud.com/about/"
        scraper = Scrape()
        paragraphs = scraper.get_paragraphs(source_url)
        print paragraphs


    That’s all I have for now.  Let me know if you have more to add to this list.

    by jmapping at February 27, 2014 02:53 AM

    February 23, 2014

    Geo Perspectives (Justin Lewis)

    Getting Started With Scripted Geo-Data Processing (PostgreSQL, PostGIS, Python, and a little OGR)

    Processing data is an important part of the work of any agency that deals with data and/or data visualization.  While there are many amazing tools with handy interfaces to help process geo-data, it's often the case that scripting is needed or preferred.  Scripted data processing tasks are beneficial for a number of reasons, but arguably the most important are automation and repeatability.  That said, there are three basic steps to processing data.

    Basic steps to a data processing task:

    1. Get/access the data
    2. Manipulate the data
    3. Put the data somewhere (if delivery outside of the processing environment is needed)

    Simple enough… There are also different situations that will influence how you handle any one (or all) of these three steps.  Of the three, I feel the second step warrants the most attention, as it tends to be where the bulk of your effort is spent.  There are two general ways to approach the step of manipulating data, depending on a number of factors.  That list could be infinite, but it can be simplified to just a couple.

    Simple factors influencing data manipulation tool/environment choice:

    1. Is a project database currently available that has the functionality you need (i.e. spatial functionality)?
    2. Is there a need to process data in a single processing environment with a minimal tool set?
      • This could be needed for simplicity, to prevent pushing data between environments/hardware, efficiency, or many other reasons.

    For the remainder of this post we will take the approach of manipulating data in a database.  My favorite tools for this step (and all three, for that matter) are a combination of PostgreSQL (with PostGIS) and Python (with psycopg2).  This relatively simple combination offers enough flexibility to do pretty much anything you will ever need in a data processing task.  Let's look at these tools a little closer.

    • PostgreSQL is a mature and very robust open source relational database commonly used by organizations large and small.
    • PostGIS is a spatial extender to PostgreSQL. In other words, PostGIS makes PostgreSQL a spatial database.
    • Python is a very robust programming language with a rich history and large user base.
    • Psycopg2 is the most popular PostgreSQL adapter for the Python programming language.  In other words, psycopg2 allows you to interact with a PostgreSQL database from Python.

    Once we have all this software installed we are basically left with two very powerful tools.

    1. A database providing storage, structure, and SQL driven functionality.
    2. A programming environment (external from a database) providing control over database query execution and the flexibility to work outside of the database.

    NOTE: As was hinted at above, you could potentially eliminate either the database or the programming environment in this scenario depending on your needs.

    Let's dive into some examples.  Here we download, unzip, and load a shapefile into a PostGIS-enabled database from a Python script.  The dependencies for this script are Python, psycopg2, and ogr2ogr.

    #!/usr/bin/env python
    import psycopg2, urllib, zipfile, os
    #DB connection properties
    conn = psycopg2.connect(dbname = 'YOUR_DATABASE', host= 'YOUR_HOST', port= 5432, user = 'YOUR_USER',password= 'YOUR_PASS')
    cur = conn.cursor()  ## open a cursor
    ## Download the zipped shapefile
    urllib.urlretrieve("http://gis.drcog.org/datacatalog/sites/default/files/shapefiles/county_2013_2.zip", "county.zip")
    ## Unzip the shapefile
    fh = open('county.zip', 'rb')
    z = zipfile.ZipFile(fh)
    for name in z.namelist():
        outpath = "/YOUR/DESTINATION/PATH/"
        z.extract(name, outpath)
    ## Load the shapefile into the database
    os.system("""ogr2ogr -overwrite -skipfailures -progress -f PostgreSQL PG:"dbname='YOUR_DATABASE' host='YOUR_HOST' port='5432' user='YOUR_USER' password='YOUR_PASS'" PATH/TO/county_2013.shp -nln county""")
    ## Rename the geometry field (not the prettiest way)
    fixCountyGeomName = 'ALTER TABLE county RENAME COLUMN wkb_geometry TO the_geom;'
    cur.execute(fixCountyGeomName)
    ## Download the zipped shapefile
    urllib.urlretrieve("http://gis.drcog.org/datacatalog/sites/default/files/shapefiles/drcog_crash_2006_shapefile_0.zip", "crash.zip")
    ## Unzip the shapefile
    fh = open('crash.zip', 'rb')
    z = zipfile.ZipFile(fh)
    for name in z.namelist():
        outpath = "/YOUR/DESTINATION/PATH/"
        z.extract(name, outpath) 
    ## Load the shapefile into the database
    os.system("""ogr2ogr -overwrite -skipfailures -progress -f PostgreSQL PG:"dbname='YOUR_DATABASE' host='YOUR_HOST' port='5432' user='YOUR_USER' password='YOUR_PASS'" PATH/TO/drcog_crash_2006.shp -nln crash_pts""")
    ## Rename the geometry field (not the prettiest way)
    fixCrashGeomName = 'ALTER TABLE crash_pts RENAME COLUMN wkb_geometry TO the_geom;'
    cur.execute(fixCrashGeomName)
    conn.commit() ## commits pending transactions to the db (making them persistent). This is needed for sql that modifies data or creates objects.
    print "Data loaded into database!"

    Now that the data is loaded into the database we can connect with psycopg2 and do some processing or analysis of the imported data.  This technique gives you the ability to utilize the full power of the database through SQL queries while also being able to pull data into your Python environment if needed.  This also provides control over the sequence of processing steps.   You could gain similar control through PostgreSQL functions but I will save that topic for another post.
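    That sequencing idea can be sketched as a small runner that executes an ordered list of SQL statements through a cursor (the fake cursor below is a stand-in so the sketch runs without a database; with psycopg2 you would pass a real cursor and call conn.commit() afterwards):

```python
def run_steps(cur, steps):
    """Execute an ordered list of SQL statements on a cursor.

    Returns the statements that were run, which is handy for logging
    exactly what the script did and in what order.
    """
    executed = []
    for sql in steps:
        cur.execute(sql)
        executed.append(sql)
    return executed

class _FakeCursor:
    """Stand-in cursor that records statements instead of running them."""
    def __init__(self):
        self.seen = []
    def execute(self, sql):
        self.seen.append(sql)

cur = _FakeCursor()
executed = run_steps(cur, ["DROP TABLE IF EXISTS tmp;",
                           "CREATE TABLE tmp (id int);"])
print(executed)
```

    With a real psycopg2 cursor, the same call runs each step in order inside one transaction, so a failure partway through leaves nothing half-applied until you commit.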

    Here we use the PostGIS ST_Within() function to determine how many pedestrians were injured or killed within Denver in 2006, while also creating an additional database table of just the crashes where pedestrians were involved.

    #!/usr/bin/env python
    import psycopg2
    #DB connection properties
    conn = psycopg2.connect(dbname = 'YOUR_DB_NAME', host= 'YOUR_HOST', port= 5432, user = 'YOUR_USER',password= 'YOUR_PASS')
    cur = conn.cursor()
    getPedestrianCrashesWithin = '''SELECT a.killed, a.injured FROM crash_pts a, county b WHERE (a.ped_act1 > 0 OR a.ped_act2 > 0 OR a.ped_act3 > 0) AND b.county = 'Denver' AND ST_Within(a.the_geom, b.the_geom);'''
    createSubsetOfPedestrianCrashes = '''DROP TABLE IF EXISTS ped_crash_pts;
                                         CREATE TABLE ped_crash_pts AS
                                         SELECT a.* FROM crash_pts a WHERE a.ped_act1 > 0 OR a.ped_act2 > 0 OR a.ped_act3 > 0;'''
    cur.execute(createSubsetOfPedestrianCrashes)  ## creates the subset table
    conn.commit()  ## make the new table persistent
    cur.execute(getPedestrianCrashesWithin)  ## executes the sql command
    pedestrianCrashes = cur.fetchall()  ## returns the result
    totalKilled = 0
    totalInjured = 0
    for crash in pedestrianCrashes:
        killed = crash[0]
        injured = crash[1]
        if killed > 0:
            totalKilled += 1
        if injured > 0:
            totalInjured += 1
    print "Total killed in Denver = ", totalKilled
    print "Total injured in Denver = ", totalInjured

    Techniques like this can give you simple text-based results:

    Total killed in Denver =  13
    Total injured in Denver =  186

    Or map-based results (example only):

    Pedestrian Crashes (example)

     Important: This post was created for demo purposes only, so the results of the process should not be assumed to be accurate. No time was spent testing the data for quality and completeness.


    That’s about it! While this post was intended to give an overview of a strategy for scripted processing of geo-data, it by no means covered all the tools you could use. I hope it provoked ideas about how you could use techniques like this, or how to do this in a way that works better for you.  This is just one example of how to approach a geo-data processing task, and with the great selection of tools available you can easily take a completely different approach.

    All the data used in this example can be downloaded from the DRCOG Regional Open Data Catalog

    by jmapping at February 23, 2014 03:51 AM

    February 19, 2014


    QGIS plugin for water management

    Oslandia releases today a new plugin for the QGIS Processing framework, allowing for water distribution network simulation. It integrates the open-source EPANET simulation software.

    EPANET models water distribution networks. It's a widely used public-domain simulation software developed by the US Environmental Protection Agency.

    Hydraulic simulation is used to understand water distribution in a distribution network, to forecast the impact of network alterations, to dimension network elements, or to study extreme-case scenarios (e.g. high demand for firefighting, pipe breakages, interruption in supply).

    QGIS provides a graphical user interface that can be used to import/edit/export hydraulic model elements and simulation parameters from various sources, launch simulations and visualize results directly inside QGIS.

    Hydraulic model

    A hydraulic model consists of junctions (POINT) and pipes (LINESTRING), along with various other elements like tanks, pumps and valves. These elements can be stored as features in a spatially enabled database. Feature attributes can be simple (e.g. pipe diameter) or complex (e.g. pump characteristic curves or water consumption). Complex attributes are stored via a foreign key in separate alphanumeric tables.

    This is the kind of data QGIS is designed to handle. It can import/export them from/to a variety of sources and also display and edit them.

    Simulation parameters

    Simulation parameters and options (e.g. simulation time step or accuracy) are key-value pairs. They can be stored in a table whose columns are the keys; each set of simulation parameters is then a record in this table. This kind of table can be loaded in QGIS as a vector layer without geometry.

    Integration in the processing framework

    Once the hydraulic model and simulation parameters are loaded in QGIS, the simulation can be launched through the Processing toolbox. The plugin uses the standalone command line interface (CLI) of EPANET, whose path needs to be specified in the Processing options and configuration.

    The plugin assembles an EPANET input file, runs EPANET and parses its output to generate result layers.
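    As a rough illustration of that assembly step (simplified records and field layout, not the plugin's actual code), junction and pipe features can be serialized into the bracketed sections an EPANET INP file is made of:

```python
def write_inp(junctions, pipes):
    """Serialize junction and pipe records into EPANET INP text.

    junctions: list of (id, elevation, demand) tuples
    pipes: list of (id, node1, node2, length, diameter, roughness) tuples
    """
    lines = ['[JUNCTIONS]', ';ID  Elev  Demand']
    for jid, elev, demand in junctions:
        lines.append('%s  %s  %s' % (jid, elev, demand))
    lines.append('')
    lines.append('[PIPES]')
    lines.append(';ID  Node1  Node2  Length  Diameter  Roughness')
    for pid, n1, n2, length, diam, rough in pipes:
        lines.append('%s  %s  %s  %s  %s  %s' % (pid, n1, n2, length, diam, rough))
    lines.append('')
    lines.append('[END]')
    return '\n'.join(lines)

inp_text = write_inp([('J1', 100.0, 2.5), ('J2', 95.0, 1.0)],
                     [('P1', 'J1', 'J2', 300.0, 150.0, 100.0)])
print(inp_text)
```

    The real file also carries sections for tanks, pumps, valves, patterns and options, but the principle is the same: each feature table maps to one bracketed section.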

    One interesting aspect of processing modules is that they can be chained: the user can use other modules to perform additional transformations of simulation results, such as feeding them into another simulation model.

    Result visualization

    Simulation results are water pressure and velocity at all points in the network, along with the state of network elements (e.g. volume in tanks, power of pumps), for all simulation time steps. This represents a huge amount of data, usually displayed either as time-plots or as map-plots of time-aggregated data (e.g. max and min during the simulation).

    Results of particular interest are:

    • time-plots of:

      • volume in reservoirs
      • flow at pumps
      • pressure in pipes and at junctions
    • map-plots of:

      • low speed (stagnation)
      • high and low pressure (risk of breakage, unhappy consumer)
      • lack of level variation in reservoirs (stagnation)
      • empty reservoir
      • reservoir overflow
      • abnormal pressure (typical of error in the altitude of a node in the model)
      • flow direction

    QGIS is naturally suited for map-plots. Time-aggregated simulation results are automatically joined to map layers when the result table is added to the map. Rule-based symbology is used to highlight zones of concern (e.g. low water velocity or empty reservoirs).

    The matplotlib library provides 2D plotting facilities in python and QGIS provides an extensive set of selection tools (on the map or in tables). The plugin's time-plot button plots the appropriate value depending on the selected feature type (e.g. water level for tanks, pressure for junctions).


    For a full demo of this plugin, see the following video:

    Where and who

    The plugin is available on GitHub and should soon be available in the QGIS plugin repository:


    This work has been funded by European Funds. Many thanks to the GIS Office of Apavil, Valcea County (Romania).

    Oslandia has developed this plugin, and provides support and development around QGIS, PostGIS and this plugin. Get in touch if you need more: infos@oslandia.com

    We are looking for a free dataset with full information (pumps, tanks, valves, pipes and their characteristics…) to distribute with this plugin as a test case and demonstration. If you can provide this, mail us!

    We are also implementing a Processing plugin for SWMM, the public-domain waste-water simulation tool. If you are interested in participating in the development, please contact us.

    by Vincent Mora at February 19, 2014 01:00 PM