I did a new PostGIS talk for FOSS4G North America 2015, an exploration of some of the tidbits I've learned over the past six months about using PostgreSQL and PostGIS together to make "magic" (any sufficiently advanced technology...)

Somehow I've gotten through 10 years of SQL without ever learning this construction, which I found while proof-reading a colleague's blog post and looked so unlikely that I had to test it before I believed it actually worked. Just goes to show, there's always something new to learn.

Suppose you have a GPS location table:

- **gps_id**: integer
- **geom**: geometry
- **gps_time**: timestamp
- **gps_track_id**: integer
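For illustration, a minimal DDL sketch matching those columns (the table name follows the query below; the Point type and SRID 4326 are my assumptions):

```sql
-- Hypothetical DDL for the GPS location table; SRID 4326 is an assumption.
CREATE TABLE gps_points (
    gps_id       integer PRIMARY KEY,
    geom         geometry(Point, 4326),
    gps_time     timestamp,
    gps_track_id integer
);
```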

You can get a correct set of lines from this collection of points with just this SQL:

```sql
SELECT
  gps_track_id,
  ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
FROM gps_points
GROUP BY gps_track_id;
```

Those of you who already knew about placing `ORDER BY` within an aggregate function are going "duh", and the rest of you are, like me, going "whaaaaaa?"

Prior to this, I would solve this problem by ordering all the groups in a CTE or sub-query first, and only then pass them to the aggregate make-line function. This, is, so, much, nicer.
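For reference, the old pattern looked roughly like this (my reconstruction, using the same table as above):

```sql
-- Old approach: pre-order the points in a CTE, then aggregate.
-- Note: relying on the CTE's ordering surviving the GROUP BY is not
-- guaranteed by the SQL standard, which is another reason the
-- aggregate-level ORDER BY is the safer construction.
WITH ordered_points AS (
    SELECT gps_track_id, geom
    FROM gps_points
    ORDER BY gps_track_id, gps_time ASC
)
SELECT gps_track_id, ST_MakeLine(geom) AS geom
FROM ordered_points
GROUP BY gps_track_id;
```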

The 2.1.6 release of PostGIS is now available.

The PostGIS development team is happy to release a patch for PostGIS 2.1: the 2.1.6 release. As befits a patch release, the focus is on bugs, breakages, and performance issues. Users with large tables of points will want to prioritize this patch, for substantial (~50%) disk space savings.

http://download.osgeo.org/postgis/source/postgis-2.1.6.tar.gz

For something a little different, here is a PostGIS recursive SQL quadgrid function which has been in my toolbox for some time now. The inspiration came in 2010 when reading “Open Source GIS: A GRASS GIS Approach” by Markus Neteler and Helena Mitasova (3rd edition, p. 244). The quadgrid function works by recursively subdividing tiles into smaller tiles (quadcells), up to a maximum depth, if the number of intersecting points (or some other feature criterion) exceeds a certain threshold.

Quadgrids have many applications, but I’ve found them useful for mapping urban phenomena (such as population density), both in 2D and 3D. Since large cities tend to be packed with more people, the result is more quadcells packed into a given land area. Quadgrids are also computationally efficient, as a finer grid cell resolution is only used in the more populous areas.

Below is an example of how I’ve used a population density quadgrid to create a 3D city skyline of Sydney. The taller the quadcells, the greater the number of people per square meter – no different to the actual built form of our cities where, if land is scarce, we build skyward.

The recursive quadgrid function is available on github. It requires PostgreSQL 9.3 or above.

https://github.com/dimensionaledge/cf_public/blob/master/lattices/DE_RegularQuadGrid.sql

The function is called in the same way as any other PL/pgSQL function. For instance:

```sql
CREATE TABLE quadgrid AS
SELECT depth::integer, the_geom::geometry(Polygon, 3577) AS wkb_geometry, cell_value::float8
FROM DE_RegularQuadGrid(
    (SELECT wkb_geometry FROM abs_aus11_tiles_32k WHERE tid IN (17864)),
    'tutorials.abs_mb11_points',
    'wkb_geometry',
    10,
    1000);
```

The arguments of the quadgrid function are: the parent tile geometry, the name of the points feature table along with its geometry column name, the maximum depth of recursion, and the threshold number of points per cell above which a cell will sub-divide.

The function uses two helper functions, namely DE_MakeRegularQuadCells() and DE_MakeSquare(). They work in tandem by taking a square geometry and subdividing it into four child geometries.

https://github.com/dimensionaledge/cf_public/tree/master/shapes

I’ve also just refactored the quadgrid function to incorporate a new, highly efficient LATERAL ‘Point in Polygon’ code pattern which avoids the use of GROUP BY when counting points in polygons. The result is a 75% reduction in query times compared to the conventional CROSS JOIN and GROUP BY approach.

```sql
SELECT r.the_geom, r.pcount
FROM
  (SELECT DE_MakeRegularQuadCells(wkb_geometry) AS the_geom
   FROM abs_aus11_tiles_32k
   WHERE tid IN (17864)) l,
  LATERAL
  (SELECT l.the_geom, count(*) AS pcount
   FROM tutorials.abs_mb11_points
   WHERE ST_Intersects(l.the_geom, wkb_geometry)
     AND l.the_geom && wkb_geometry) r;
```
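For comparison, the conventional CROSS JOIN and GROUP BY pattern might look like this sketch (my reconstruction, not code from the post, using the same tables as above):

```sql
-- Conventional approach: cross join the cells against the points table,
-- then aggregate with GROUP BY. Grouping on the geometry itself is what
-- the LATERAL pattern avoids.
SELECT l.the_geom, count(*) AS pcount
FROM (SELECT DE_MakeRegularQuadCells(wkb_geometry) AS the_geom
      FROM abs_aus11_tiles_32k
      WHERE tid IN (17864)) l
CROSS JOIN tutorials.abs_mb11_points p
WHERE ST_Intersects(l.the_geom, p.wkb_geometry)
  AND l.the_geom && p.wkb_geometry
GROUP BY l.the_geom;
```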

Quadgrid run times depend on the number of underlying point features and the depth of recursion. In the above GIF animation, 1.86 million points intersect with the parent tile, making it one of the most populous areas in Australia. For this exercise, quadgridding this one tile to a depth of 10 took 19.9 seconds. Less populated tiles tend to take only fractions of a second.

```sql
---------------------------------------------------------------------------
-- Code Description:
---------------------------------------------------------------------------
-- PostgreSQL/PostGIS custom function for generating quadcells recursively
-- from a given starting geometry and intersecting reference table, to a
-- maximum number of iteration levels or threshold value per cell
-- Dependencies: DE_MakeSquare(), DE_MakeRegularQuadCells()
-- Developed by: mark[at]dimensionaledge[dot]com
-- Licence: GNU GPL version 3.0
---------------------------------------------------------------------------
DROP FUNCTION IF EXISTS DE_RegularQuadGrid(geometry, text, text, integer, double precision);
CREATE OR REPLACE FUNCTION DE_RegularQuadGrid(parent_geom geometry, reference_table text, reference_geom_col text, max_depth integer, threshold_value double precision)
RETURNS TABLE (depth integer, the_geom GEOMETRY, cell_value double precision) AS
$$
DECLARE
reference_geom_type text;
BEGIN
EXECUTE 'SELECT GeometryType('|| reference_geom_col ||') FROM '|| reference_table ||' LIMIT 1' INTO reference_geom_type;
IF reference_geom_type NOT IN ('POINT') THEN
RAISE EXCEPTION 'Reference table is not a valid geometry type';
END IF;
RETURN QUERY EXECUTE
'WITH RECURSIVE quadcells (depth, the_geom, cell_value) AS (
--SEED THE PARENT GEOMETRY AND CELL VALUE
SELECT 1, l.the_geom, r.pcount FROM
(SELECT ST_GeomFromEWKT(ST_AsEWKT('|| quote_literal(CAST(parent_geom as text)) ||')) as the_geom) l,
LATERAL
(SELECT count(*) as pcount, l.the_geom FROM '|| reference_table ||' WHERE ST_Intersects(l.the_geom, '|| reference_geom_col ||') AND l.the_geom && '|| reference_geom_col ||') r
--RECURSIVE PART
UNION ALL
SELECT t.depth, t.the_geom, t.pcount FROM
--TERMINAL CONDITION SUBQUERY LOOPS UNTIL THE CONDITIONS ARE NO LONGER MET.
--NOTE THE RECURSIVE ELEMENT CAN ONLY BE EXPLICITLY REFERRED TO ONCE, HENCE THE USE OF CTE
(
WITH a AS (SELECT * FROM quadcells WHERE the_geom IS NOT NULL AND depth < '|| max_depth ||' AND cell_value > '|| threshold_value ||'),
b AS (SELECT max(depth) as previous FROM a),
c AS (SELECT a.* FROM a, b WHERE a.depth = b.previous),
d AS (SELECT r.the_geom, r.pcount FROM
(SELECT DE_MakeRegularQuadCells(the_geom) as the_geom FROM c) l,
LATERAL
(SELECT count(*) as pcount, l.the_geom FROM '|| reference_table ||' WHERE ST_Intersects(l.the_geom, '|| reference_geom_col ||') AND l.the_geom && '|| reference_geom_col ||') r)
SELECT b.previous+1 as depth, d.the_geom, d.pcount FROM b, d
) t
)
SELECT depth, the_geom, cell_value::float8 FROM quadcells
WHERE ST_IsEmpty(the_geom)=false
AND (cell_value <= '|| threshold_value ||' OR (cell_value > '|| threshold_value ||' AND depth = '|| max_depth ||'))';
END;
$$ LANGUAGE 'plpgsql' VOLATILE;
```

```sql
---------------------------------------------------------------------------
-- Code Description:
---------------------------------------------------------------------------
-- PostgreSQL/PostGIS custom function for subdividing a square polygon
-- into four child polygons
-- Dependencies: DE_MakeSquare()
-- Developed by: mark[at]dimensionaledge[dot]com
-- Licence: GNU GPL version 3.0
---------------------------------------------------------------------------
-- Usage Example:
---------------------------------------------------------------------------
-- SELECT DE_MakeRegularQuadCells(DE_MakeSquare(ST_MakePoint(0,0),1));
---------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION DE_MakeRegularQuadCells(parent GEOMETRY)
RETURNS SETOF GEOMETRY AS
$$
DECLARE
halfside float8;
i INTEGER DEFAULT 1;
srid INTEGER;
centerpoint GEOMETRY;
centersquare GEOMETRY;
quadcell GEOMETRY;
BEGIN
srid := ST_SRID(parent);
centerpoint := ST_Centroid(parent);
halfside := abs(ST_Xmax(parent) - ST_Xmin(parent))/2;
centersquare := ST_ExteriorRing(DE_MakeSquare(centerpoint, halfside));
WHILE i < 5 LOOP
quadcell := DE_MakeSquare(ST_PointN(centersquare, i), halfside);
RETURN NEXT quadcell;
i := i + 1;
END LOOP;
RETURN;
END
$$ LANGUAGE 'plpgsql' IMMUTABLE;
```

```sql
---------------------------------------------------------------------------
-- Code Description:
---------------------------------------------------------------------------
-- PostgreSQL/PostGIS custom function for generating a square polygon
-- of a specified size
-- Dependencies: nil
-- Developed by: mark[at]dimensionaledge[dot]com
-- Licence: GNU GPL version 3.0
---------------------------------------------------------------------------
-- Usage Example:
---------------------------------------------------------------------------
-- SELECT DE_MakeSquare(ST_MakePoint(0,0),1);
---------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION DE_MakeSquare(centerpoint GEOMETRY, side FLOAT8)
RETURNS GEOMETRY AS
$$
SELECT ST_SetSRID(ST_MakePolygon(ST_MakeLine(
ARRAY[
ST_MakePoint(ST_X(centerpoint)-0.5*side, ST_Y(centerpoint)+0.5*side),
ST_MakePoint(ST_X(centerpoint)+0.5*side, ST_Y(centerpoint)+0.5*side),
ST_MakePoint(ST_X(centerpoint)+0.5*side, ST_Y(centerpoint)-0.5*side),
ST_MakePoint(ST_X(centerpoint)-0.5*side, ST_Y(centerpoint)-0.5*side),
ST_MakePoint(ST_X(centerpoint)-0.5*side, ST_Y(centerpoint)+0.5*side)
]
)), ST_SRID(centerpoint));
$$ LANGUAGE 'sql' IMMUTABLE STRICT;
```

This tutorial introduces a flexible method for generating flight paths in PostGIS using a custom bezier curve function. Unlike Great Circles, bezier curves provide the user the ability to set the amount of ‘flex’ or ‘bend’ exhibited by the curve, the number of vertices (useful for timeseries animations), and a break value at a predefined meridian to ensure the visual integrity of the curve is maintained when plotting in your favourite projection. I also have a 3D version of the bezier curve function which I’ll share at a later date.

The full code for this tutorial, including the QGIS technique for centering a world map in the Pacific Ocean, is available on github here.

I won’t repeat everything which is already documented in the github tutorial code and custom bezier curve function, except make mention of the five arguments used by the bezier curve function itself.

`DE_BezierCurve2D(origin geometry, destination geometry, theta numeric, num_points integer, breakval numeric)`

- origin geometry
- destination geometry
- theta, which is the maximum bend in the East-West plane
- the number of Linestring points or vertices
- the break value which splits the Linestring into a Multilinestring at a specified meridian

Example: `DE_BezierCurve2D(o_geom, d_geom, 25, 1000, 30)`

Origin and destination geometries should be self-explanatory.

Theta defines the maximum degree of 'flex' or 'bend' in the East-West plane, to which a SIN function is applied to produce the effect of straighter lines in the North-South plane. If you prefer that all curves exhibit the same degree of bend irrespective of the azimuth between their origins and destinations, then modify the bezier function by replacing `SIN(ST_Azimuth(ST_Point(o_x, o_y), ST_Point(d_x, d_y)))` with the value `1`. Or if you want to stipulate a minimum bend in the North-South plane, then enclose the associated formula within a PostgreSQL `greatest()` function.

The number of Linestring points or vertices is a really handy feature for animating flight paths where you need to synchronise movements along flight paths using a clock. To do this, set the number of points equal to the flight duration between each origin destination pair. So a 60 minute flight gets 60 points or vertices. A 120 minute flight gets 120 points or vertices. Then dump the flight path vertices into a points table which you can then index to a flight timetable.
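The dump step might look like this sketch (the table and column names here are my assumptions, not from the tutorial):

```sql
-- Dump each flight path's vertices into a points table so vertex n can be
-- indexed to minute n of a flight timetable. Ordering by the dump path
-- keeps vertices in sequence even when the break meridian splits the path
-- into a Multilinestring (two-element path arrays).
CREATE TABLE flight_path_points AS
SELECT
    flight_id,
    row_number() OVER (PARTITION BY flight_id ORDER BY (dp).path) AS minute_seq,
    (dp).geom AS geom
FROM (
    SELECT flight_id, ST_DumpPoints(the_geom) AS dp
    FROM flight_paths
) s;
```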

The last argument is the break value, which is the meridian at which the flight path will be broken into two Linestrings (thus becoming a Multilinestring). The break value should equal the PROJ4 `+lon_0` value associated with the CRS in which you intend to view the data, plus or minus 180 degrees. An example is given in the github tutorial file.

At some point I’ll release my 3D version of this function which is very cool for making 3D flight path animations.

Enjoy!

When writing my first blog post “Eating the Elephant In Small Bites”, a question on the PostGIS mailing list caught my eye, because the challenge posed was not dissimilar to something I once encountered when working with Australian land use data. This time the dataset related to land cover in Alberta, Canada, and the PostGIS query using ST_Intersection() was reportedly taking many, many days to process one particular class of land cover. Fortunately the Alberta dataset is freely available, so I thought to test my PostGIS vector tiling and Map Reduce code pattern – and with great effect! This post describes how I addressed the Alberta land class problem, building upon the concepts and practices introduced in my first blog post to deliver query times of just minutes. As with the “Eating the Elephant in Small Bites” tutorial, our machine setup for this exercise is a CentOS 6.5 Linux server instance running on AWS, with 16 vCPUs and 30GB of memory.

The full code for this solution can be downloaded from github here

Upon downloading and ingesting the Alberta land cover dataset into PostGIS, the problem becomes self-evident. The land class 'lc_class 34' comprises the fewest geometries, but with an average of 119 rings per polygon.

```sql
SELECT lc_class,
       SUM(ST_NumGeometries(wkb_geometry)) as num_geoms,
       SUM(ST_NPoints(wkb_geometry)) as num_points,
       round(SUM(ST_NRings(wkb_geometry))::numeric/SUM(ST_NumGeometries(wkb_geometry))::numeric,3) as rings_per_geom
FROM lancover_polygons_2010
GROUP BY lc_class
ORDER BY 1;

 lc_class | num_geoms | num_points | rings_per_geom
----------+-----------+------------+----------------
       20 |     83793 |    1306686 |          1.029
       31 |      1262 |      33366 |          1.115
       32 |      5563 |     291666 |          1.560
       33 |     12231 |     198385 |          1.023
       34 |       366 |    2646625 |        119.046
       50 |    196681 |    4750590 |          1.165
      110 |    154816 |    3137196 |          1.078
      120 |     83069 |    2833293 |          1.282
      210 |    150550 |    6260788 |          1.522
      220 |    197666 |    4793592 |          1.150
      230 |    112034 |    2419651 |          1.045
(11 rows)
```

Visually, this is how 'lc_class 34' appears in QGIS with green fill.

Upon closer inspection, we see 'lc_class 34' has a trellis-like structure where each contiguous block of white space (i.e. the "negative space") is a polygon inner ring. And therein lies the problem. As with multipolygons, complex polygons with inner rings are stored as arrays, meaning queries are often slowed by the nature of array operations, and indexes are less effective than if each array element were dumped and stored as its own row. But luckily for us, there are standard PostGIS functions we can use to address this.

I will describe the solution to the Alberta 'lc_class 34' problem sequentially, starting with how to prepare the data by dumping the multipolygons and polygon rings, followed by a description of the fn_worker SQL code pattern. Finally, we parallelise the code across all available cores.

Data preparation involves a two step “dumping” process, starting with ST_Dump() of the multipolygons, followed by ST_DumpRings() of the polygons dumped by step 1. Note that we record the path number of each polygon ring that is dumped. Exterior rings by default have a path value = 0, whilst interior rings have a path value > 0 signifying the interior ring number.
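The two dump steps might look like this sketch (the intermediate table name and the `lc_class = 34` filter are my assumptions; the final table name matches the one passed to the worker jobs later):

```sql
-- Step 1: dump the multipolygons into single polygons.
CREATE TABLE landcover_dumped_34 AS
SELECT (d).geom::geometry(Polygon, 3400) AS wkb_geometry
FROM (
    SELECT ST_Dump(wkb_geometry) AS d
    FROM lancover_polygons_2010
    WHERE lc_class = 34
) s;

-- Step 2: dump each polygon's rings, recording the ring path.
-- path = 0 marks the exterior ring; path > 0 is the interior ring number.
CREATE TABLE landcover_dumped_34_rings AS
SELECT (r).path[1] AS path,
       (r).geom::geometry(Polygon, 3400) AS wkb_geometry
FROM (
    SELECT ST_DumpRings(wkb_geometry) AS r
    FROM landcover_dumped_34
) s;
```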

Next we generate a grid layout to serve as the basis for our tiles. For this exercise, we will use a 2km x 2km tile size as we wish to keep our "chopped" up geometries fairly small to enable fast queries of the 'lc_class 34' tiled geometries at a later date.

```bash
#######################################
#########   GRID CREATION    ##########
#######################################
# bash function for creating a regular vector grid in PostGIS
fn_generategrid() {
# get sql custom functions
wget https://raw.githubusercontent.com/dimensionaledge/cf_public/master/lattices/DE_RegularGrid.sql -O DE_RegularGrid.sql
wget https://raw.githubusercontent.com/dimensionaledge/cf_public/master/shapes/DE_MakeSquare.sql -O DE_MakeSquare.sql
# load sql custom functions
for i in *.sql; do
psql -U $username -d $dbname -f $i
done
SQL=$(cat<<EOF
-------------------------
------- SQL BLOCK -------
-------------------------
DROP TABLE IF EXISTS regular_grid_2k;
CREATE TABLE regular_grid_2k AS (
WITH s AS (SELECT DE_RegularGrid(ST_Envelope(ST_Collect(wkb_geometry)),2000) as wkb_geometry FROM abmiw2wlcv_48tiles)
SELECT row_number() over() as tid, wkb_geometry::geometry(Polygon, 3400) FROM s);
-------------------------
EOF
)
echo "$SQL" # comment to suppress printing
# execute SQL STATEMENT or comment # to skip
psql -U $username -d $dbname -c "$SQL"
}
# end of bash function
# call the bash function or comment # to skip
fn_generategrid
#######################################
```


We also create a vector tile table to hold the outputs of our worker function. Note that the worker function does the tiling of the exterior rings separately to the interior rings. The unions of each are taken before applying ST_Difference to remove the "negative space" of the interior rings from each tile.

```bash
#######################################
#####   DEFINE WORKER FUNCTION   ######
#######################################
# define the worker function to be executed across all cores
fn_worker (){
source $dbsettings
SQL=$(cat<<EOF
-------------------------
----- SQL STATEMENT -----
-------------------------
INSERT INTO vector_tiles
WITH
f0 AS (
SELECT
tid,
wkb_geometry as the_geom
FROM regular_grid_2k
WHERE tid >= $1 AND tid < $2
),
f1_p0 AS (
SELECT
f0.tid,
CASE WHEN ST_Within(f0.the_geom,rt.wkb_geometry) THEN f0.the_geom
ELSE ST_Intersection(f0.the_geom,rt.wkb_geometry) END as the_geom
FROM f0, $3 as rt
WHERE ST_Intersects(f0.the_geom, rt.wkb_geometry) AND f0.the_geom && rt.wkb_geometry AND rt.path = 0
),
f1_p0u AS (
SELECT
tid,
ST_Union(the_geom) as the_geom
FROM f1_p0
GROUP BY tid
),
f1_p1 AS (
SELECT
f0.tid,
CASE WHEN ST_Within(f0.the_geom,rt.wkb_geometry) THEN f0.the_geom
ELSE ST_Intersection(f0.the_geom,rt.wkb_geometry)
END as the_geom
FROM f0, $3 as rt
WHERE ST_Intersects(f0.the_geom, rt.wkb_geometry) AND f0.the_geom && rt.wkb_geometry AND rt.path > 0
),
f1_p1u AS (
SELECT
tid,
ST_Union(the_geom) as the_geom
FROM f1_p1
GROUP BY tid
),
f2 AS (
SELECT
f1_p0u.tid,
CASE WHEN f1_p1u.tid IS NULL THEN f1_p0u.the_geom
WHEN ST_IsEmpty(ST_Difference(f1_p0u.the_geom,f1_p1u.the_geom)) THEN NULL
ELSE ST_Difference(f1_p0u.the_geom,f1_p1u.the_geom)
END as the_geom
FROM f1_p0u LEFT JOIN f1_p1u
ON f1_p0u.tid = f1_p1u.tid
)
------------------------
----     result     ----
------------------------
SELECT
NEXTVAL('vector_tiles_fid_seq'),
tid,
ST_Multi(the_geom),
1
FROM f2
WHERE the_geom IS NOT NULL;
-------------------------
EOF
)
echo "$SQL" # comment to suppress printing
# execute SQL STATEMENT
psql -U $username -d $dbname -c "$SQL"
}
# end of worker function
# make worker function visible to GNU Parallel across all cores
export -f fn_worker
#######################################
```


Next we create a job list which we proceed to execute in parallel. And voilà!

```bash
#######################################
##########  CREATE JOB LIST ###########
#######################################
# create job list to feed GNU Parallel.
SQL=$(cat<<EOF
-------------------------
------- SQL BLOCK -------
-------------------------
-- create joblist where block size = 1000 tiles (i.e. tiles processed in batches of 1000)
COPY (SELECT i as lower, i+1000 as upper FROM generate_series(1,250000,1000) i) TO STDOUT WITH CSV;
-------------------------
EOF
)
echo "$SQL" # comment to suppress printing
# execute SQL STATEMENT
psql -U $username -d $dbname -c "$SQL" > joblist.csv
#######################################

#######################################
##########   EXECUTE JOBS   ###########
#######################################
cat joblist.csv | parallel --colsep ',' fn_worker {1} {2} landcover_dumped_34_rings
wait
#######################################
```


The results show the impact of a larger tile size on query run times, and the resultant complexity of the tiled geometries produced (measured by the rings_per_geom). Whilst quadrupling the area of the tile size (from 2x2 to 4x4) reduces the run time from 779 seconds to 282 seconds, the rings_per_geom increases - which is understandable given the trellis-like structure of the source data. Both scenarios offer a vast improvement on the many, many days it reportedly took to process the data using a standard ST_Intersection query. As to the acceptability of the tile-size trade-off, I think it really depends on the criticality of query times with respect to the downstream processes that will ultimately consume the tiled geometries. It thus becomes a matter of judgement.

```
2k x 2k tiles
TOTAL SCRIPT TIME: 779 seconds (batch size = 1000 tiles)
 num_geoms | num_points | rings_per_geom
-----------+------------+----------------
    144799 |    3471051 |          1.055

4k x 4k tiles
TOTAL SCRIPT TIME: 282 seconds (batch size = 250 tiles)
 num_geoms | num_points | rings_per_geom
-----------+------------+----------------
     57775 |    2966397 |          1.263
```

The Alberta land cover problem highlights a few of the challenges that analysts typically encounter when working with large, complex datasets. The solution presented here is a practical example of how vector tiling and Map Reduce concepts can deliver real geo-processing efficiency improvements. However the real business value of reducing query times - from days to minutes - is to accelerate the 'business time to insight' and to amplify the speed or volume of iterations the analyst can consider as part of the geospatial 'value discovery' process. As this exercise attests, the business impact that Spatial IT can make over traditional desktop GIS approaches in the context of ever-shortening decision windows is nothing short of 'game changing'. Welcome to the emerging world of Big Data.

There will be two PostGIS workshops at FOSS4G NA 2015 on March 9th. Sign up if you haven't already.

PostGIS Up and Running, 9 AM - 12 PM. This is an introductory workshop where we'll cover the basics of configuring PostGIS and using the PostGIS geometry and geography types. We also plan to demonstrate some new features coming in PostGIS 2.2, particularly of the 3D kind. Time permitting, we'll do a quick coverage of pgRouting as well.

Someone asked on IRC if we will be handing out certificates of completion to folks who complete the workshop. Some people need this because they are allowed to attend workshops on company time, but not conferences. The thought hadn't crossed our minds, but we like the idea a lot. So yes, you can have a certificate if you stay thru the whole session, complete with Regina and Leo's seal of approval. We might even have some door prizes.

Advanced spatial analysis with PostGIS. Pierre Racine will be leading this workshop. Expect to be blown away by images of rasters dancing on legs of geometries. He'll also have some other cool advanced spatial analysis stuff beyond raster. Expect a lot of geometry processing tricks in this one.

Sadly, I think our PostGIS In Action 2ed is going to be released a little after conference time and probably won't be ready until mid March, so probably just a wee bit too late for FOSS4G NA 2015, but just in time for PGConf.US New York 2015, March 25th-26th. Final book proofing is like getting our teeth pulled. I really hope it's worth the wait. We'll have coupons but no book. We will have some copies of our PostgreSQL: Up and Running 2nd Edition available though. If you've already bought one of our books and want it autographed, bring it along on your trip.