### Planet PostGIS

Welcome to Planet PostGIS

November 21, 2020

Crunchy Data

Waiting for PostGIS 3.1: Grid Generators

Summarizing data against a fixed grid is a common way of preparing data for analysis. Fixed grids have some advantages over natural and administrative boundaries.
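The full post introduces the new set-returning grid generator functions; as a quick taste, here is a hedged sketch using the ST_SquareGrid signature from the PostGIS 3.1 docs (the extent and SRID are illustrative):

SELECT i, j, geom
FROM ST_SquareGrid(1000, ST_MakeEnvelope(0, 0, 10000, 10000, 3857)) AS grid;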

by Paul Ramsey at November 21, 2020 03:00 PM

November 20, 2020

PostGIS Development

PostGIS 3.0.3

The PostGIS Team is pleased to release PostGIS 3.0.3.

Best served with PostgreSQL 13.1, and GEOS 3.8.1

pgRouting 3.1.1

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 20, 2020 12:00 AM

November 19, 2020

Paul Ramsey

Waiting for PostGIS 3.1: Vector tile improvements

This is a guest post from Raúl Marín, a core PostGIS contributor and a former colleague of mine at Carto. Raúl is an amazing systems engineer and has been cruising through the PostGIS code base making things faster and more efficient. You can find the original of this post at his new personal tech blog. – Paul


I’m not big on creating new things; I’d rather work on improving something that’s already in use and has proven its usefulness. So whenever I’m thinking about what I should do next, I tend to look for projects or features that are widely used, where the balance between development and runtime costs favors a more in-depth approach.

Upon reviewing the changes in the upcoming PostGIS 3.1 release, it shouldn’t come as a surprise, then, that most of my contributions are focused on performance. When in doubt, just make it faster.

Since CARTO, the company that pays for my lunch, uses PostGIS’ Vector Tile functions as the backend for its dynamic vector maps, any improvement there has a clear impact on the platform. This is why, since the appearance of the MVT functions in PostGIS 2.4, they’ve been enhanced in each major release, and 3.1 wasn’t going to be any different.

On this occasion the main reason behind the changes wasn’t the usual me looking for trouble, but the other way around. Since ST_AsMVT makes it really easy to extract information from the database into the browser, a common pitfall is to use SELECT * to extract all available columns, which can move a lot of data unnecessarily and generate extremely big tiles. The easy solution is to select only the properties needed for the visualization, but it’s hard to apply that retroactively once the application is in production and already depends on the inefficient design.
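As a sketch of the two query styles (the table, columns, and tile coordinates here are all illustrative, not from the original post):

-- Anti-pattern: SELECT-ing every column turns each one into an MVT property
SELECT ST_AsMVT(q, 'points', 4096, 'mvtgeom')
FROM (
  SELECT ST_AsMVTGeom(p.geom, ST_TileEnvelope(12, 2048, 1361)) AS mvtgeom,
         p.id, p.name, p.description, p.created_at  -- imagine dozens more
  FROM my_points p
  WHERE p.geom && ST_TileEnvelope(12, 2048, 1361)
) q;

-- Better: include only the properties the visualization needs
SELECT ST_AsMVT(q, 'points', 4096, 'mvtgeom')
FROM (
  SELECT ST_AsMVTGeom(p.geom, ST_TileEnvelope(12, 2048, 1361)) AS mvtgeom,
         p.name
  FROM my_points p
  WHERE p.geom && ST_TileEnvelope(12, 2048, 1361)
) q;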

So there I was, looking into why the OOM killer was stopping databases, and discovering queries using massive amounts of resources to generate tiles 50-100 times bigger than they should be (the recommendation is to stay under 500 KB). And in this case, the bad design of extracting all columns from the dataset was worsened by the fact that it was being applied to a large dataset; this triggered PostgreSQL parallelism, requiring extra resources to generate chunks in parallel and later merge them together. In PostGIS 3.1 I introduced several changes to improve the performance of these two steps: the parallel processing and the merge of intermediate results.

The changes

Without getting into too much detail, the main benefit comes from changing the vector tile .proto such that a feature can only hold one value at a time. This is what the specification says, but it wasn’t what the .proto enforced, so the internal library was allocating memory that it never used.

There are other additional changes, such as improving how values are merged between parallel workers, so feel free to have a look at the final commit itself if you want more details.

Performance comparison

The best way to see the impact of these changes is through some examples. In both cases I am generating the same tile, on the exact same server and with the same dependencies; the only change was swapping the PostGIS library, which between 3.0 and 3.1 doesn’t require a database upgrade.

In the first example the tile contains all the columns of the 287k points in it. As I’ve mentioned before, this is discouraged, but it is the simplest query to generate.

And for the second example, I’m generating the same tile but now only including the minimal columns for the visualization:

We can see, both in 3.0 and 3.1, that including only the necessary properties makes things 10 times as fast as with the full data, and also that PostGIS 3.1 is 30-40% faster in both situations.

Memory usage

Aside from speed, this change also greatly reduces the amount of memory used to generate a tile.

To see it in action, we monitor the PostgreSQL process while it’s generating the tile with all the properties. In 3.0, we observe in the blue line that the memory usage increases with time until it reaches around 2.7 GB at the end of the transaction.

We now monitor the same request on a server running PostGIS 3.1. In this case the server uses around a third of the memory of 3.0 (1 GB vs. 2.7 GB) and, instead of growing linearly, the memory is returned to the system as soon as possible.

To sum it all up: PostGIS 3.1 is faster and uses less memory when generating large vector tiles.

November 19, 2020 08:00 PM

PostGIS Development

PostGIS 3.1.0alpha3

The PostGIS Team is pleased to release the third alpha of the upcoming PostGIS 3.1.0 release. This version exposes some of the new performance and feature enhancements in the not-yet-released GEOS 3.9, as well as numerous speed enhancements that don’t require a newer GEOS. Requires GEOS 3.6+ and PostgreSQL 9.6+. To use MVT you will need protobuf-c 1.1 or higher.

Best served with

PostgreSQL 13.1, GEOS 3.7 or higher is recommended.

pgRouting 3.1.1

Continue Reading by clicking title hyperlink ..

by Regina Obe at November 19, 2020 12:00 AM

November 17, 2020

Crunchy Data

Waiting for PostGIS 3.1: Performance

Open source developers sometimes have a hard time figuring out what feature to focus on to generate the greatest value for end users. As a result, they will often default to performance.

by Paul Ramsey at November 17, 2020 04:56 PM

November 10, 2020

Crunchy Data

Virtual PostGIS Day 2020 is Nov. 19

Authored by Steve Pousty

Today we are going to talk a little bit about spatial databases and a virtual event Crunchy Data is hosting with several friends and community members. We're putting together an awesome PostGIS Day virtual conference on Thursday, Nov 19th. Last year we hosted our first PostGIS Day in-person in St. Louis and although we can't gather in the same way this year, going virtual allows us to give even more talks! Registration is now open so be sure to sign up and log into the presentations throughout the day.

by Crunchy Data at November 10, 2020 02:20 PM

October 26, 2020

Boston GIS (Regina Obe, Leo Hsu)

Waiting for PostGIS 3.1: ST_Subdivide and other function support with fixed precision

One of the new features coming in PostGIS 3.1 is fixed precision support. This new feature will require compilation with the not-yet-released GEOS 3.9. A few functions in PostGIS 3.1 already have this new feature: ST_Subdivide, ST_Union, ST_SymDifference, and ST_UnaryUnion, as summarized in What's new in PostGIS 3.1. To take advantage of these to-die-for features, you'll need to have compiled your PostGIS against the development GEOS 3.9, which is planned for release around the same time we get around to releasing PostGIS 3.1. Windows users can download binaries with GEOS 3.9 support from the PostGIS windows experimental binaries.

The ST_Union feature should improve a lot of cases where people ran into topological exceptions. Perhaps I'll demonstrate that in a separate article once I've given it a test drive.
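For instance, the new optional gridSize argument snaps all vertices to a fixed-precision grid during the operation (a hedged sketch, using the signature documented for PostGIS 3.1 compiled against GEOS 3.9):

SELECT ST_AsText(ST_Union(
    'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'::geometry,
    'POLYGON((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, 0.5 0.5))'::geometry,
    0.01  -- gridSize: every output coordinate lands on a 0.01 grid
));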

Continue reading "Waiting for PostGIS 3.1: ST_Subdivide and other function support with fixed precision"

by Regina Obe (nospam@example.com) at October 26, 2020 03:33 AM

October 12, 2020

Paul Ramsey

Talking PostGIS on Podcasts

Here in the Covid-times, I haven’t been able to keep up my previous schedule of speaking at conferences, but I have managed to participate in a couple of episodes of the MapScaping Podcast, hosted by Daniel O’Donohue.


Daniel is a great interviewer and really puts together a tight show. So far I’ve been on two, and I quietly hope to join him again some time in the future.

October 12, 2020 08:00 AM

August 15, 2020

PostGIS Development

PostGIS 3.0.2, 2.5.5, 2.4.9 Released

The PostGIS development team is pleased to provide bug fix and performance enhancement releases 3.0.2, 2.5.5, and 2.4.9 for the 3.0, 2.5, and 2.4 stable branches.

Continue Reading by clicking title hyperlink ..

by Regina Obe at August 15, 2020 12:00 AM

July 18, 2020

PostGIS Development

PostGIS 3.1.0alpha2

The PostGIS Team is pleased to release the second alpha of upcoming PostGIS 3.1.0 release.

Best served with

PostgreSQL 13beta2, GEOS 3.7 or higher is recommended.

ST_MaximumInscribedCircle requires compilation with the in-development GEOS 3.9.0 to be enabled.

pgRouting 3.1.0, which will also be released soon.

Continue Reading by clicking title hyperlink ..

by Regina Obe at July 18, 2020 12:00 AM

June 11, 2020

Paul Ramsey

Developers Diary 2

Have you ever watched a team of five-year-olds play soccer? The way the mass of children chases the ball around in a group? I think programmers do that too.

Get the ball!

There’s something about working on a problem together that is so much more rewarding than working separately; we cannot help but get drawn into other people’s problems. There’s a lot of gratification to be had in finding a solution to a shared difficulty!

Even better, different people bring different perspectives to a problem, and illuminate different areas of improvement.

Maximum Inscribed Circle

A couple months ago, my colleague Martin Davis committed a pair of new routines into JTS, to calculate the largest circles that can fit inside a polygon or in a collection of geometries.


We want to bring all the algorithmic goodness of JTS to PostGIS, so I took up the first step, and ported “maximum inscribed circle” to GEOS and to PostGIS.
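A hedged sketch of what the PostGIS side of that port looks like in use (signature per the PostGIS 3.1 documentation; the polygon is just an illustration):

SELECT ST_AsText(center) AS center, radius
FROM ST_MaximumInscribedCircle(
    'POLYGON((0 0, 100 0, 100 50, 0 50, 0 0))'::geometry
);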

When I ported the GEOS test cases, I turned up some odd performance problems. The calculation seemed to be taking inordinately long for larger inputs. What was going on?

The “maximum inscribed circle” algorithm leans heavily on a routine called IndexedFacetDistance to calculate distances between polygon boundaries and candidate circle-centers while converging on the “maximum inscribed circle”. If that routine is slow, the whole algorithm will be slow.

Dan Baston, who originally ported the “IndexedFacetDistance” class, got interested and started looking at some test cases of his own.

He found he could improve his old implementation using better memory management that he’d learned in the meantime. He also found some short-circuits to envelope distance calculation that improved performance quite a bit.

In fact, they improved performance so much that Martin ported them back to JTS, where he found that for some cases they yielded a 10x performance improvement in distance calculations.

There’s something alchemical about the whole thing.

  • There was a bunch of long-standing code nobody was looking at.
  • I ported an unrelated algorithm which exercised that code.
  • I wrote a test case and reported some profiling information.
  • Other folks with more knowledge were intrigued.
  • They fed their knowledge back and forth and developed more tests.
  • Improvements were found that made everything faster.

I did nothing except shine a light in a dark hole, and everyone else got very excited and things happened.

Toast Caching Redux

In a similar vein, as I described in my last diary entry, a long-standing performance issue in PostGIS was the repeated reading of large geometries during spatial joins.

Much of the problem was solved by dropping a very small “TOAST cache” into the process by which PostGIS reads geometries in functions frequently used in spatial joins.


I was so happy with the improvement the TOAST cache provided that I just stopped. Fortunately, my fellow PostGIS community member Raúl Marín was more stubborn.

Having seen my commit of the TOAST cache, and having done some work in other caching parts of PostGIS, he took up the challenge and integrated the TOAST cache with the existing index caches.

The integrated system now uses TOAST identifiers to note identical repeated inputs and avoid both unnecessary reads off disk and unnecessary checks of the index cache.

The result is that, for spatial joins over large objects, PostGIS 3.1 will be as much as 60x faster than the performance in PostGIS 3.0.

I prepared a demo for a bid proposal this week and found that an example query that took 800 ms on my laptop took a full minute on the beefy 16-core demo server. What had I done wrong? Ah! My laptop is running the latest PostGIS code (which will become 3.1), while the cloud server was running PostGIS 2.4. Mystery solved!

Port, Port, Port

I may have mentioned that I’m not a very good programmer.

My current task is definitely exercising my imposter syndrome: porting Martin’s new overlay code from JTS to GEOS.

I knew it would take a long time, and I knew it would be a challenge; but knowing and experiencing are quite different things.

The challenges, as I’ve experienced them, are:

  • Moving from Java’s garbage collected memory model to C++’s managed memory model means that I have to understand the object life-cycle which is implicit in Java and make it explicit in C++, all while avoiding accidentally introducing a lot of memory churn and data copying into the GEOS port. Porting isn’t a simple matter of transcribing and papering over syntactic idiom, it involves first understanding the actual JTS algorithms.
  • The age of the GEOS code base, and number of contributors over time, mean that there are a huge number of different patterns to potentially follow in trying to make a “consistent” port to GEOS. Porting isn’t a matter of blank-slate implementation of the JTS code – the ported GEOS code has to slot into the existing GEOS layout. So I have to spend a lot of time learning how previous implementations chose to handle life cycles and call patterns (pass reference, or pointer? yes. Return value? or void return and output parameter? also yes.)
  • My lack of C++ idiom means I spend an excessive amount of time looking up core functions and methods associated with them. This is the only place I’ve felt myself measurably get better over the past weeks.

I’m still only just getting started, having ported some core data structures, and little pieces of dependencies that the overlay needs. The reward will be a hugely improved overlay code for GEOS and thus PostGIS, but I anticipate the debugging stage of the port will take quite a while, even when the code is largely complete.

Wish me luck, I’m going to need it!

If you would like to test the new JTS overlay code, it resides on this branch.
If you would like to watch me suffer as I work on the port, the GEOS branch is here.

June 11, 2020 08:00 AM

May 31, 2020

PostGIS Development

PostGIS 2.3.11

The PostGIS Team is pleased to release PostGIS 2.3.11. This is the last bug fix release of the PostGIS 2.3 series. Please upgrade to 2.4 or higher if you want to continue receiving bug fixes.

If you come across any issues, feel free to report via our ticket tracker https://trac.osgeo.org/postgis or mailing list with details as described here. For security issues, send reports to security@postgis.net.

Continue Reading by clicking title hyperlink ..

by Regina Obe at May 31, 2020 12:00 AM

May 05, 2020

Michal Zimmermann

PostGIS Data Anonymization

Among all the sensitive spatial data being collected through cellphones and credit cards, our address of residency is probably the most delicate one. Can it be anonymized/pseudonymized/obscured before you share it with your business partners?

Imagine you are given a set of address points for each of your clients and the set of all address points in the country; you should adjust it in the following way:

  • find the two nearest address points for each address point of your client
  • find the center of these two and the client address point
  • measure the distance of the computed center to each of the three points and keep the maximum value
  • make the biggest distance even bigger by adding 10% of its value
  • ceil the value
  • output the new position and the ceiled distance

This shifts each address point by a dynamic distance, giving us at least three points within the given distance (one of them being the original address point).

SELECT
    tmp.code,
    ST_X(tmp.new_position) x,
    ST_Y(tmp.new_position) y,
    ceil(MAX(biggest_distance) + MAX(biggest_distance) * 0.1) round_distance
FROM (
    SELECT
        tmp.code,
        tmp.geom,
        ST_Centroid((ST_Union(two_closest_points, tmp.geom))) new_position,
        -- get distance to two closest points and the client address point
        ST_Centroid((ST_Union(two_closest_points, tmp.geom))) <-> (ST_DumpPoints(ST_Union(two_closest_points, tmp.geom))).geom biggest_distance
    FROM (
        SELECT
            r1.code,
            r1.geom,
            ST_Union(neighbours.geom) two_closest_points
        FROM address_points r1,
        LATERAL (
            -- keep two closest points to each client address point
            SELECT
                r2.code,
                r2.geom,
                r1.geom <-> r2.geom distance
            FROM address_points r2
            WHERE r1.code <> r2.code
            ORDER BY r1.geom <-> r2.geom ASC
            LIMIT 2
        ) neighbours
        GROUP BY
            r1.code,
            r1.geom
    ) tmp
) tmp
GROUP BY
    tmp.code,
    tmp.geom,
    tmp.new_position;

You might want to use LATERAL for tasks like this.

by Michal Zimmermann at May 05, 2020 02:00 PM

April 16, 2020

Paul Ramsey

Developers Diary 1

I’m not a particularly good developer.

I don’t plan well; I tend to hack first and try to find the structure afterwards. I am easily distracted. It takes me an exceedingly long time to marshal a problem in my head enough to attack it.

That said, the enforced slow-down from pandemic time has given me the opportunity to sit and look at code, knowing nothing else is coming down the pipe. There are no talks to prepare, no big-think keynotes to draft. I enjoy those things, and I really enjoy the ego-boost of giving them, but the preparation of them puts me in a mental state that is not conducive to doing code work.

So the end of travel has been good, for at least one aspect of my professional work.

The Successful Failure

Spatial operations against large objects have always been a performance hot spot.

The first problem is that large objects are … large. So if you have algorithms that scale O(n^2) on the number of vertices, large objects will kill you. Guess what? Distance, intersects tests, and so on are all O(n^2) in their basic implementations.

We solved this problem a long time ago in PostGIS by putting in an extra layer of run-time indexing.

INDEXING!

During a query (for those functions where it makes sense), if we see the same object twice in a row, we build an index on the edges of that object and keep the index in memory for the life of the query. This gives us O(log(n)) performance for intersects, point-in-polygon, and so on. For joins in particular, this pattern of “seeing the same big thing multiple times” is very common.
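A join like the sketch below (table and column names are illustrative) repeatedly calls ST_Contains with the same country polygon, so the edge index is built once per polygon and reused for every candidate point:

SELECT c.name, Count(*)
FROM countries c
JOIN places p
ON ST_Contains(c.geom, p.geom)
GROUP BY c.name;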

This one small trick is one reason PostGIS is so much faster than “the leading brands”.

However, in order to “see the same object twice” we have to, for each function call in the query, retrieve the whole object, in order to compare it against the one we are holding in memory, to see if it is the same.

Here we run into an issue with our back-end.

PostgreSQL deals with large objects by (a) compressing them and (b) cutting the compressed object into slices and storing them in a side table. This all happens in the background, and is why you can store 1GB objects transparently in a database that has only an 8KB page size.
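If you are curious, you can see the side table PostgreSQL uses for a given table’s over-sized values; a quick sketch, with the table name purely illustrative:

SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'countries';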

It’s quite computationally expensive, though. So much so that I found that simply bypassing the compression part of this feature could provide 5x performance gains on our spatial join workload.

Faster

At a code sprint in 2018, the PostGIS team agreed on the necessary steps to work around this long-standing performance issue.

  • Enhance PostgreSQL to allow partial decompression. This would allow the PostGIS caching system to retrieve just a little bit of large objects and use that part to determine if the object was not already in the cache.
  • Enhance the PostGIS serialization scheme to add a hashcode at the front of each large object. This way “is this a new object” could be answered with just a few bytes of hash, instead of checking the whole object.
  • Actually update the caching code to use the hash code and avoid unnecessary object retrievals.

Since this involved a change in PostgreSQL, which runs on an annual release cycle, and a change to the PostGIS serialization scheme, which is a major release marker, the schedule for this work was… long term.

Long Term

Still, I managed to slowly chip away at it, goal in mind. The PostgreSQL partial decompression support and the PostGIS 3 serialization changes both eventually landed.

That left adding the hash code to the front of the objects, and using that code in the PostGIS statement cache.

And this is where things fall apart.

Things Fall Apart

The old statement cache was focussed on ensuring the in-memory indexes were in place. It didn’t kick in until the object had already been retrieved. So avoiding retrieval overhead was going to involve re-working the cache quite a bit, to handle both object and index caching.

I started on the work, which still lives on in this branch, but the many possible states of the cache (do I have part of an object? a whole object? an indexed object?) and the fact that it was used in multiple places by different indexing methods (geography tree, geometry tree, GEOS tree), made the change worrisomely complex.

And so I asked a question, that I should have asked years ago, to the pgsql-hackers list:

… within the context of a single SQL statement, will the Datum values for a particular object remain constant?

Basically, could I use the datum values as unique object keys without retrieving the whole object? That would neatly remove any need to retrieve full objects in order to determine if the cache needed to be updated. As usual, Tom Lane had the answer:

Jeez, no, not like that.

Oh, “good news,” I guess: my work is not in vain. Except wait, Tom included a codicil:

The case where this would actually be worth doing, probably, is where you are receiving a toasted-out-of-line datum. In that case you could legitimately use the toast pointer ID values (va_valueid + va_toastrelid) as a lookup key for a cache, as long as it had a lifespan of a statement or less.

Hm. So for a subset of objects, it was possible to generate a unique key without retrieving the whole object.

Facepalm

And that subset – “toasted-out-of-line” datums – was in fact the set of objects causing the hot spot: objects large enough to have been compressed and then stored in a side table in 8KB chunks.

What if, instead of re-writing my whole existing in-memory index cache, I left that in place and just added a simple new cache that only worried about object retrieval, and only cached objects it could obtain unique keys for: these “toasted-out-of-line” objects. Would that improve performance?

It did. By 20 times on my favourite spatial join benchmark. It improved performance by 5 times on a join where only 10% of the objects were large ones. And on joins where none of the objects were large, the new code did not reduce performance at all.

And here’s the punch line: I’ve known about the large object hot spot for at least 5 years. Probably longer. I put off working on it because I thought the solution involved core changes to PostgreSQL and PostGIS, so first I had to put those changes in, which took a long time.

Once I started working on the “real problem”, I spent a solid week:

  • First on a branch to add hash codes, using the new serialization mechanisms from PostGIS 3.
  • Then on a unified caching system to replace the old in-memory index cache.

And then I threw all that work away, and in about 3 hours, wrote and tested the final patch that gave a 20x performance boost.

So, was this a success or a failure?

Stupid

I’ve become inured to the huge mismatch in “time spent versus code produced”, particularly when debugging. Spending 8 hours stepping through a debugger to generate a one-line patch is pretty routine.

But something about the mismatch between my grandiose and complex solution (partial retrieval! hash code!) and the final solution (just ask! try the half-measure! see if it’s better!) has really gotten on my nerves.

I like the win, but the path was a long and winding one, and PostGIS users have had slower queries than necessary for years because I failed to pose my simple question to the people who had an answer.

The Successful Success

In contrast to that story of the past couple weeks, this week has been a raging success. I keep pinching myself and waiting for something to go wrong.

A number of years ago, JTS got an improvement to robustness in some operations by doing determinant calculations in higher precision than the default IEEE double precision.

Those changes didn’t make it into GEOS. There was an experimental branch that Mateusz Loskot put together, which sat un-merged for years until I picked it up last fall, rebased it, and merged it. I did so thinking that was the fastest way, and probably it was, but it included a dependency on a full-precision math library, ttmath, which I added to our tree.

Stupid

Unfortunately, ttmath is basically unmaintained now.

And ttmath is arbitrary precision, while we really only need “higher precision”. JTS just uses a “double double” implementation that uses the register space of two doubles for higher precision calculations.

And ttmath doesn’t support big-endian platforms (like Sparc, Power, and other chips), which was the real problem. We couldn’t go on into the future without support for these niche-but-not-uncommon platforms.

And ttmath includes some fancy assembly language that makes the build system more complex.

Fortunately, the JTS DD is really not that large, and it has no endian assumptions in it, so I ported it and tested it out against ttmath.

It’s smaller.

It’s faster. (About 5-10%. Yes, even though it uses no special assembly tricks, probably because it doesn’t have to deal with arbitrary precision.)

And here’s the huge surprise: it caused zero regression failures! It has exactly the same behaviour as the old implementation!

Perfect

So needless to say, once the branch was stable, I merged it in and stood there in wonderment. It seems implausible that something as foundational as the math routines could be swapped out without breaking something.

The whole thing took just a few days, and it was so painless that I’ve also made a patch to the 3.8 stable series to bring the new code back for big-endian platform support in the meantime.

The next few days I’ll be doing ports of JTS features and fixes that are net-new to GEOS, contemplative work that isn’t too demanding.

Some days everything is easy.

Some days everything is hard.

Don’t let the hard days hold you back!


April 16, 2020 08:00 AM

March 02, 2020

Anita Graser (Underdark)

Movement data in GIS #29: power your web apps with movement data using mobilitydb-sqlalchemy

This is a guest post by Bommakanti Krishna Chaitanya @chaitan94

Introduction

This post introduces mobilitydb-sqlalchemy, a tool I’m developing to make it easier for developers to use movement data in web applications. Many web developers use Object Relational Mappers such as SQLAlchemy to read/write Python objects from/to a database.

Mobilitydb-sqlalchemy integrates the moving objects database MobilityDB into SQLAlchemy and Flask. This is an important step towards dealing with trajectory data using appropriate spatiotemporal data structures rather than plain spatial points or polylines.

To make it even better, mobilitydb-sqlalchemy also supports MovingPandas. This makes it possible to write MovingPandas trajectory objects directly to MobilityDB.

For this post, I have made a demo application which you can find live at https://mobilitydb-sqlalchemy-demo.adonmo.com/. The code for this demo app is open source and available on GitHub. Feel free to explore both the demo app and code!

In the following sections, I will explain the most important parts of this demo app, to show how to use mobilitydb-sqlalchemy in your own webapp. If you want to reproduce this demo, you can clone the demo repository and run “docker-compose up --build”, which automatically sets up the docker image for you along with running the backend and frontend. Just follow the instructions in README.md for more details.

Declaring your models

For the demo, we used a very simple table – with just two columns – an id and a tgeompoint column for the trip data. Using mobilitydb-sqlalchemy this is as simple as defining any regular table:

from flask_sqlalchemy import SQLAlchemy
from mobilitydb_sqlalchemy import TGeomPoint

db = SQLAlchemy()

class Trips(db.Model):
   __tablename__ = "trips"
   trip_id = db.Column(db.Integer, primary_key=True)
   trip = db.Column(TGeomPoint)

Note: The library also allows you to use the Trajectory class from MovingPandas. More about this is explained later in this tutorial.

Populating data

When adding data to the table, mobilitydb-sqlalchemy expects data in the tgeompoint column to be a time-indexed pandas DataFrame with two columns: one for the spatial data, called “geometry”, holding Shapely Point objects, and one for the temporal data, “t”, holding regular Python datetime objects.

import pandas as pd
from datetime import datetime
from shapely.geometry import Point

# Prepare and insert the data
# Typically it won’t be hardcoded like this, but it might be coming from 
# other data sources like a different database or maybe csv files
df = pd.DataFrame(
   [
       {"geometry": Point(0, 0), "t": datetime(2012, 1, 1, 8, 0, 0),},
       {"geometry": Point(2, 0), "t": datetime(2012, 1, 1, 8, 10, 0),},
       {"geometry": Point(2, -1.9), "t": datetime(2012, 1, 1, 8, 15, 0),},
   ]
).set_index("t")

trip = Trips(trip_id=1, trip=df)
db.session.add(trip)
db.session.commit()

Writing queries

In the demo, you see two modes. Both modes were designed specifically to explain how functions defined within MobilityDB can be leveraged by our webapp.

1. All trips mode – In this mode, we extract all trip data, along with the distance travelled within each trip and the average speed of that trip, both computed by MobilityDB itself using the ‘length’, ‘speed’ and ‘twAvg’ functions. This example also shows that MobilityDB functions can be chained to form more complicated queries.


from sqlalchemy import func

trips = db.session.query(
   Trips.trip_id,
   Trips.trip,
   func.length(Trips.trip),
   func.twAvg(func.speed(Trips.trip))
).all()

2. Spatial query mode – In this mode, we extract only selected trip data, filtered by a user-selected region of interest. We then query MobilityDB to extract only the trips which pass through the specified region, using MobilityDB’s ‘intersects’ function to achieve this filtering at the database level itself.


trips = db.session.query(
   Trips.trip_id,
   Trips.trip,
   func.length(Trips.trip),
   func.twAvg(func.speed(Trips.trip))
).filter(
   func.intersects(Point(lat, lng).buffer(0.01).wkb, Trips.trip),
).all()

Using MovingPandas Trajectory objects

Mobilitydb-sqlalchemy also provides first-class support for MovingPandas Trajectory objects, which can be installed as an optional dependency of this library. Using the Trajectory class instead of plain DataFrames gives us much richer functionality over trajectory data, such as speed analysis, interpolation, splitting and simplification of trajectory points, calculating bounding boxes, etc. To make use of this feature, you have to set the use_movingpandas flag to True while declaring your model, as shown in the below code snippet.

class TripsWithMovingPandas(db.Model):
   __tablename__ = "trips"
   trip_id = db.Column(db.Integer, primary_key=True)
   trip = db.Column(TGeomPoint(use_movingpandas=True))

Now when you query over this table, you automatically get the data parsed into Trajectory objects without having to do anything else. This also works during insertion of data: you can directly assign your MovingPandas Trajectory objects to the trip column. The below code snippet shows how inserting and querying work in MovingPandas mode.

import pandas as pd
import movingpandas as mpd
from datetime import datetime
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Prepare and insert the data
# Typically it won’t be hardcoded like this, but it might be coming from 
# other data sources like a different database or maybe csv files
df = pd.DataFrame(
   [
       {"geometry": Point(0, 0), "t": datetime(2012, 1, 1, 8, 0, 0),},
       {"geometry": Point(2, 0), "t": datetime(2012, 1, 1, 8, 10, 0),},
       {"geometry": Point(2, -1.9), "t": datetime(2012, 1, 1, 8, 15, 0),},
   ]
).set_index("t")

geo_df = GeoDataFrame(df)
traj = mpd.Trajectory(geo_df, 1)

trip = Trips(trip_id=1, trip=traj)
db.session.add(trip)
db.session.commit()

# Querying over this table would automatically map the resulting tgeompoint 
# column to movingpandas’ Trajectory class
result = db.session.query(TripsWithMovingPandas).filter(
   TripsWithMovingPandas.trip_id == 1
).first()

print(result.trip.__class__)
# <class 'movingpandas.trajectory.Trajectory'>

Bonus: trajectory data serialization

Along with mobilitydb-sqlalchemy, I have also recently released trajectory data serialization/compression libraries based on Google’s Encoded Polyline Format Algorithm, for Python and JavaScript, called trajectory and trajectory.js respectively. These libraries let you send trajectory data in a compressed format, resulting in smaller payloads when sending data through human-readable serialization formats like JSON. In some of the internal APIs we use at Adonmo, we have seen this reduce our response sizes by more than half, sometimes up to 90%.

Want to learn more about mobilitydb-sqlalchemy? Check out the quick start & documentation.


This post is part of a series. Read more about movement data in GIS.

by chaitan94 at March 02, 2020 05:51 PM

February 28, 2020

PostGIS Development

PostGIS 2.5.4

The PostGIS Team is pleased to release PostGIS 2.5.4.

Continue Reading by clicking title hyperlink ..

by Paul Ramsey at February 28, 2020 12:00 AM

February 27, 2020

Paul Ramsey

PostGIS Day in STL

Every year, on the second Wednesday of November, Esri (“the Microsoft of GIS”) promotes a day of celebration, “GIS Day”, in which the members of our community unite to tell the world about the wonders of cartography and spatial data, and incidentally use their software a lot in the process.

And every year, for the last number of years, on the day after “GIS Day”, a motley crew of open source users and SQL database aficionados observe “PostGIS Day”. Until this fall, I had never had a chance to personally participate in a PostGIS Day event, but this year Crunchy sponsored a day in St Louis, and I got to talk an awful lot about PostGIS.

It was really good, and I feel like there’s lots more to be done, if only on the subject of spatial SQL and analysis in the database. Here are the talks I gave; the balance are on the event page.

PostGIS Introduction

Serving Dynamic Vector Tiles

Geocoding and Text Search in PostGIS

PostGIS 3.0 Overview

February 27, 2020 08:00 AM

February 20, 2020

PostGIS Development

PostGIS 3.0.1

The PostGIS Team is pleased to release PostGIS 3.0.1.

Best served with PostgreSQL 12.2, GEOS 3.8.0, SFCGAL 1.3.7, GDAL 3.0.4, PROJ 6.3.1, protobuf-c 1.3.3, json-c 0.13.1.

Continue Reading by clicking title hyperlink ..

by Darafei Praliaskouski at February 20, 2020 12:00 AM

February 02, 2020

PostGIS Development

PostGIS 3.1.0alpha1

The PostGIS Team is pleased to release the first alpha of upcoming PostGIS 3.1.0 release.

Best served with PostgreSQL 12.1, GEOS 3.8.0.

Continue Reading by clicking title hyperlink ..

by Darafei Praliaskouski at February 02, 2020 12:00 AM

November 18, 2019

Paul Ramsey

OGR FDW Spatial Filtering

The OGR FDW now pushes spatial filters down to remote data sources!

Whuuuut?!?!?

The Basics

OK, first, “OGR” is a subcomponent of the GDAL toolkit that allows generic access to dozens of different geospatial file formats. The OGR part handles the “vector” data (points, lines and polygons) and the GDAL part handles the “raster” data (imagery, elevation grids).

Second, “FDW” is a “foreign data wrapper”, an extension API for PostgreSQL that allows developers to connect non-database information to the database and present it in the form of a table.

The simplest FDWs, like the Oracle FDW, just make remote database tables in foreign systems look like local ones. Connecting two databases is “easy” because they share the same data model: tables of typed columns and rows of data.

The OGR data model is pleasantly similar to the database data model. Every OGR “datasource” (database) has “layers” (tables) made of “fields” (columns) with data types like “string” (varchar) and “number” (integer, real).

Now, combine the two ideas of “OGR” and “FDW”!

The “OGR FDW” uses the OGR library to present geospatial data sources as tables inside a PostgreSQL database. The FDW abstraction layer lets us make tables, and the OGR abstraction layer lets those tables be sourced from almost any geospatial file format or server.

It’s an abstraction layer over an abstraction layer… the best kind!

Setup the FDW

Here’s an example that connects to a “web feature service” (WFS) from Belgium (we all speak Flemish, right?) and makes a table of it.


CREATE EXTENSION postgis;
CREATE EXTENSION ogr_fdw;

CREATE SERVER wfsserver 
  FOREIGN DATA WRAPPER ogr_fdw 
  OPTIONS (
    datasource 'WFS:http://geoservices.informatievlaanderen.be/overdrachtdiensten/Haltes/wfs',
    format 'WFS',
    config_options 'CPL_DEBUG=ON'
  );

CREATE FOREIGN TABLE haltes (
    fid bigint,
    shape Geometry(Point,31370),
    gml_id varchar,
    uidn double precision,
    oidn double precision,
    stopid double precision,
    naamhalte varchar,
    typehalte integer,
    lbltypehal varchar,
    codegem varchar,
    naamgem varchar
  ) 
  SERVER wfsserver 
  OPTIONS (
    layer 'Haltes:Halte'
  );
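As an aside: if typing out that column list by hand seems tedious, ogr_fdw also supports IMPORT FOREIGN SCHEMA with the special ogr_all schema name, which auto-creates a foreign table for every layer the datasource exposes (a sketch per the ogr_fdw README):

CREATE SCHEMA wfs;

IMPORT FOREIGN SCHEMA ogr_all
  FROM SERVER wfsserver
  INTO wfs;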

Pushdown from FDW

Let’s run a query on the haltes table, and peek into what the OGR FDW is doing, by setting the debug level to DEBUG1.

SET client_min_messages = DEBUG1;

SELECT gml_id, ST_AsText(shape) AS shape, naamhalte, lbltypehal
  FROM haltes 
  WHERE lbltypehal = 'Niet-belbus'
    AND shape && ST_MakeEnvelope(207950, 186590, 207960, 186600, 31370);

We get back one record, and two debug entries:

DEBUG:  OGR SQL: (LBLTYPEHAL = 'Niet-belbus')
DEBUG:  OGR spatial filter (207950 186590, 207960 186600)
-[ RECORD 1 ]-----------------------
gml_id     | Halte.10328
shape      | POINT(207956 186596)
naamhalte  | Lummen Frederickxstraat
lbltypehal | Niet-belbus

The debug entries are generated by the OGR FDW code, when it recognizes there are parts of the SQL query that can be passed to OGR:

  • OGR understands some limited SQL syntax, and OGR FDW passes those parts of any PostgreSQL query down to OGR.
  • OGR can handle simple bounding box spatial filters, and when OGR FDW sees the use of the && PostGIS operator, it passes the filter constant down to OGR.

So OGR FDW is passing the attribute and spatial filters from the SQL down to the OGR layer. But are they then being passed on to the remote datasource?

Pushdown from OGR

Every OGR “driver” is capable of pushing different amounts of logic down to the source data.

  • A driver that reads a file format cannot push anything down: there is no logic in a file.
  • A driver that reads from a database can push a lot down: databases are rich and powerful execution engines in their own right.

Our example data source, the Belgian “web feature server” actually supports both attribute and spatial filters, and the OGR driver will pass them down.

We can see OGR passing the filters down because when we created the server, we set config_options 'CPL_DEBUG=ON', to expose the GDAL logging information to our PostgreSQL server.

The GDAL debug entries are visible when we set the logging level to DEBUG2:

SET client_min_messages = DEBUG2;

SELECT gml_id, ST_AsText(shape) AS shape, naamhalte, lbltypehal
  FROM haltes 
  WHERE lbltypehal = 'Niet-belbus'
    AND shape && ST_MakeEnvelope(207950, 186590, 207960, 186600, 31370);

Now we get a whole slew of logging, but I’m only going to pull out one line, the line that shows the WFS query that OGR sends to the remote server:

DEBUG:  GDAL None [0] WFS: http://geoservices.informatievlaanderen.be/overdrachtdiensten/Haltes/wfs?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetFeature&TYPENAME=Haltes:Halte&FILTER=%3CFilter%20xmlns%3D%22http:%2F%2Fwww.opengis.net%2Fogc%22%20xmlns:Haltes%3D%22informatievlaanderen.be%2FHaltes%22%20xmlns:gml%3D%22http:%2F%2Fwww.opengis.net%2Fgml%22%3E%3CAnd%3E%3CPropertyIsEqualTo%3E%3CPropertyName%3ELBLTYPEHAL%3C%2FPropertyName%3E%3CLiteral%3ENiet%2Dbelbus%3C%2FLiteral%3E%3C%2FPropertyIsEqualTo%3E%3CBBOX%3E%3CPropertyName%3EHaltes:SHAPE%3C%2FPropertyName%3E%3Cgml:Box%3E%3Cgml:coordinates%3E207950.0000000000000000,186590.0000000000000000%20207960.0000000000000000,186600.0000000000000000%3C%2Fgml:coordinates%3E%3C%2Fgml:Box%3E%3C%2FBBOX%3E%3C%2FAnd%3E%3C%2FFilter%3E

Awesome, right?

That’s pretty much unreadable, but if I copy out the value in the FILTER request variable and reverse the URL encoding, I get this:

<Filter
  xmlns="http://www.opengis.net/ogc"
  xmlns:Haltes="informatievlaanderen.be/Haltes"
  xmlns:gml="http://www.opengis.net/gml">
  <And>
    <PropertyIsEqualTo>
      <PropertyName>LBLTYPEHAL</PropertyName>
      <Literal>Niet-belbus</Literal>
    </PropertyIsEqualTo>
    <BBOX>
      <PropertyName>Haltes:SHAPE</PropertyName>
      <gml:Box>
        <gml:coordinates>
          207950.0000000000000000,186590.0000000000000000
          207960.0000000000000000,186600.0000000000000000
        </gml:coordinates>
      </gml:Box>
    </BBOX>
  </And>
</Filter>

I know, who ever thought that jamming an XML encoded version of a SQL filter into an HTTP GET request was a good idea? (Some very very nice people.)

Anyways, as you can see, both the attribute and spatial portions of our original SQL query have been re-encoded as a WFS XML filter, and sent to the remote server.

OGR FDW correctly pushed the attribute and spatial portions of the WHERE clause into OGR, and OGR correctly pushed those filters into the dialect of the driver we were using, in this case the WFS driver.

The End

The really really cool part is that if we had been using, for example, the Oracle driver, OGR would have instead generated Oracle-compatible SQL and pushed that down!

It’s an abstraction layer over an abstraction layer… the best kind!

November 18, 2019 08:00 AM

November 16, 2019

Anita Graser (Underdark)

Movement data in GIS #25: moving object databases

Recently there has been some buzz on Twitter about a new moving object database (MOD) called MobilityDB that builds on PostgreSQL and PostGIS (Zimányi et al. 2019). The MobilityDB GitHub repo was published in February 2019, but according to a presentation at PGConf.Russia 2019 it has been under development for a few years.

Of course, moving object databases have been around for quite a while. The two most commonly cited MODs are HermesDB (Pelekis et al. 2008), which comes as an extension for either PostgreSQL or Oracle and is developed at the University of Piraeus, and SECONDO (de Almeida et al. 2006), which is a stand-alone database system developed at the Fernuniversität Hagen. However, both MODs remain at the research prototype level and have not achieved broad adoption.

It will be interesting to see if MobilityDB can achieve the goal set in the title of Zimányi et al. (2019): to become “a mainstream moving object database system”. It’s promising that they are building on PostGIS and using its mature spatial analysis functionality instead of reinventing the wheel. They also discuss why they decided that PostGIS trajectories (which I’ve written about in previous posts) are not the way to go.

However, the presentation does not go into detail on whether there are any straightforward solutions for visualizing data stored in MobilityDB.

According to the Github readme, MobilityDB runs on Linux and needs PostGIS 2.5. They also provide an online demo as well as a Docker container with MobilityDB and all its dependencies. If you give it a try, I would love to hear about your experiences.
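To give a flavour of what MobilityDB’s temporal types look like, here is a minimal sketch of a tgeompoint literal (syntax as documented by the MobilityDB project; I haven’t verified it against the current release), describing a point that moves between two positions over ten minutes:

SELECT tgeompoint '[Point(0 0)@2019-01-01 08:00:00, Point(1 1)@2019-01-01 08:10:00]';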

References

  • de Almeida, V. T., Güting, R. H., & Behr, T. (2006). Querying moving objects in SECONDO. In 7th International Conference on Mobile Data Management (MDM’06) (pp. 47-47). IEEE.
  • Pelekis, N., Frentzos, E., Giatrakos, N., & Theodoridis, Y. (2008). HERMES: aggregative LBS via a trajectory DB engine. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (pp. 1255-1258). ACM.
  • Zimányi, E., Sakr, M., Lesuisse, A., & Bakli, M. (2019). MobilityDB: A Mainstream Moving Object Database System. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases (pp. 206-209). ACM.

This post is part of a series. Read more about movement data in GIS.

by underdark at November 16, 2019 02:14 PM

November 13, 2019

Paul Ramsey

ST_Subdivide all the Things

This post originally appeared in the CARTO blog.

One of the things that makes managing geospatial data challenging is the huge variety of scales that geospatial data covers: areas as large as a continent or as small as a manhole cover.

The data in the database also covers a wide range, from single points, to polygons described with thousands of vertices. And size matters! A large object takes more time to retrieve from storage, and more time to run calculations on.

The Natural Earth countries file is a good example of that variation. Load the data into PostGIS and inspect the object sizes using SQL:

SELECT admin, ST_NPoints(the_geom), ST_MemSize(the_geom) 
FROM ne_10m_admin_0_countries 
ORDER BY ST_NPoints(the_geom);
  • The Coral Sea Islands are represented with a 4-point polygon, only 112 bytes.
  • Canada is represented with a 68,159-point multi-polygon, over 1 megabyte in size!

Countries by Size in KB

Over half (149) of the countries in the table are larger than the database page size (8KB), which means they will take extra time to retrieve.

SELECT Count(*) 
FROM ne_10m_admin_0_countries 
WHERE ST_MemSize(the_geom) > 8192;

We can see the overhead involved in working with large data by forcing a large retrieval and computation.

Load the Natural Earth populated places into PostGIS as well, and then run a full spatial join between the two tables:

SELECT Count(*)
FROM ne_10m_admin_0_countries countries 
JOIN ne_10m_populated_places_simple places 
ON ST_Contains(countries.the_geom, places.the_geom);

Even though the places table (7322 rows) and the countries table (255 rows) are quite small, the computation still takes about 30 seconds on my computer.

The large objects cause a number of inefficiencies:

  • Geographically large areas (like Canada or Russia) have large bounding boxes, so the indexes don’t work as efficiently in winnowing out points that don’t fall within the countries.
  • Physically large objects have large vertex lists, which take a long time to pass through the containment calculation. This combines with the poor winnowing to make a bad situation worse.

How can we speed things up? Make the large objects smaller using ST_Subdivide()!

First, generate a new, sub-divided countries table:

CREATE TABLE ne_10m_admin_0_countries_subdivided AS
SELECT ST_SubDivide(the_geom) AS the_geom, admin 
FROM ne_10m_admin_0_countries;
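ST_Subdivide also takes an optional max_vertices argument (256 by default) if you want even smaller pieces, for example:

CREATE TABLE ne_10m_admin_0_countries_subdivided_64 AS
SELECT ST_SubDivide(the_geom, 64) AS the_geom, admin 
FROM ne_10m_admin_0_countries;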

Now we have the same data, but no object is more than 255 vertices (about 4KB) in size!

Subdivided Countries by Size in KB

Run the spatial join torture test again, and see the change!

SELECT Count(*)
FROM ne_10m_admin_0_countries_subdivided countries 
JOIN ne_10m_populated_places_simple places 
ON ST_Contains(countries.the_geom, places.the_geom);

On my computer, the return time is about 0.5 seconds, or 60 times faster, even though the countries table now has 8633 rows. The subdivision has accomplished two things:

  • Each polygon now covers a smaller area, so index searches are less likely to pull up points that are not within the polygon.
  • Each polygon is now below the page size, so retrieval from disk will be much faster.

Subdividing big things can make map drawing faster too, but beware: once your polygons are subdivided, you’ll have to turn off the polygon outlines to avoid showing the funny square boundaries in your rendered map.

Happy mapping and querying!

November 13, 2019 08:00 AM

November 06, 2019

Paul Ramsey

GZip in PostgreSQL

I love PostgreSQL extensions.

Extensions are the truest expression of the second principle of the original “design of Postgres” vision, to

provide user extendibility for data types, operators and access methods.

Extensions allow users to do more with PostgreSQL than just basic storage and retrieval. PostgreSQL is a full-on integration environment, like Python or Perl, and you can build very complete data manipulation pipelines very close to the metal using native and extension features of PostgreSQL.

Even though I’m a contributor to one of the largest PostgreSQL extensions, I have particularly come to love small extensions, that do one simple thing, particularly one simple thing we maybe take for granted in other environments.

My old HTTP extension is just a binding of libcurl to a SQL interface, so users can do web queries inside the SQL environment.

And today I’ve finished up a GZIP extension, that is just a binding of zlib to SQL, so that users can… compress and decompress things.

It’s not a lot, but it’s a little.

The GZIP extension came about because of an email on the PostGIS development list, where Yuri noted:

The amazing ST_AsMVT() has two common usage patterns: copy resulting MVTs to a tile cache (e.g. .mbtiles file or a materialized view), or serve MVT to the users (direct SQL->browser approach). Both patterns still require one additional data processing step – gzipping.

Huh. And this use case also applies to people generating GeoJSON directly in the database and sending it out to web clients.

The PostgreSQL core has generally frowned on compression functions at the SQL level, because the database already does compression of over-sized tuples as necessary. The last thing we want is people manually applying compression to column values, and then stuffing them into rows where the database will then have to re-compress them internally. From the perspective of storage efficiency, just standing back and letting PostgreSQL do its work is preferable.

But from the perspective of an integration environment, where an application might be expected to emit or consume compressed data, having a tool in SQL to pack and unpack that data is potentially quite useful.

So I did the tiny binding to zlib and packed it up in an extension.
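Here is a minimal sketch of the binding in use, assuming the gzip()/gunzip() function names from the extension’s README:

CREATE EXTENSION gzip;

-- compress a GeoJSON string on its way out of the database
SELECT gzip('{"type":"FeatureCollection","features":[]}');

-- and round-trip it back to text
SELECT convert_from(gunzip(gzip('hello postgis')), 'UTF8');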

I hope lots of people find it useful.

November 06, 2019 08:00 AM

October 20, 2019

PostGIS Development

PostGIS 3.0.0

The PostGIS development team is pleased to release PostGIS 3.0.0.

This release works with PostgreSQL 9.5-12 and GEOS >= 3.6.

If you are using postgis_sfcgal extension, you need to compile against SFCGAL 1.3.1 or higher.

Best served with PostgreSQL 12 , GEOS 3.8.0 and pgRouting 3.0.0-beta.

Continue Reading by clicking title hyperlink ..

by Regina Obe at October 20, 2019 12:00 AM

October 15, 2019

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostGIS 3.0.0 coming soon - Try 3.0.0rc2 at a package repo near you

PostGIS 3.0.0 is planned for release early next week. In the meantime you will find PostGIS 3.0.0rc1 or rc2 available via yum.postgresql.org, apt.postgresql.org, and EDB Windows 64-bit stackbuilder for PostgreSQL 12.

Continue reading "PostGIS 3.0.0 coming soon - Try 3.0.0rc2 at a package repo near you"

by Leo Hsu and Regina Obe (nospam@example.com) at October 15, 2019 11:15 PM

October 13, 2019

PostGIS Development

PostGIS 3.0.0rc2

The PostGIS development team is pleased to release PostGIS 3.0.0rc2. This will be the final RC before release.

This release works with PostgreSQL 9.5-12 and GEOS >= 3.6

Best served with PostgreSQL 12 , GEOS 3.8.0 and pgRouting 3.0.0-alpha.

Continue Reading by clicking title hyperlink ..

by Regina Obe at October 13, 2019 12:00 AM

October 08, 2019

PostGIS Development

PostGIS 3.0.0rc1

The PostGIS development team is pleased to release PostGIS 3.0.0rc1.

This release works with PostgreSQL 9.5-12 and GEOS >= 3.6

Best served with PostgreSQL 12 , GEOS 3.8.0rc2 and pgRouting 3.0.0-alpha.

Continue Reading by clicking title hyperlink ..

by Regina Obe at October 08, 2019 12:00 AM

September 28, 2019

PostGIS Development

PostGIS 3.0.0beta1

The PostGIS development team is pleased to release PostGIS 3.0.0beta1.

This release works with PostgreSQL 9.5-12RC1 and GEOS >= 3.6

Best served with PostgreSQL 12RC1 and GEOS 3.8.0beta1 both of which came out in the past couple of days.

Continue Reading by clicking title hyperlink ..

by Regina Obe at September 28, 2019 12:00 AM

August 26, 2019

Paul Ramsey

Waiting for PostGIS 3: GEOS 3.8

While PostGIS includes lots of algorithms and functionality we have built ourselves, it also adds geospatial smarts to PostgreSQL by linking in specialized libraries to handle particular problems:

  • Proj for coordinate reference support;
  • GDAL for raster functions and formats;
  • GEOS for computational geometry (basic operations);
  • CGAL for more computational geometry (3D operations); and
  • libxml2, libjson-c, and libprotobuf-c for format support

Many of the standard geometry processing functions in PostGIS are actually evaluated inside the GEOS library, so updates in GEOS are very important to PostGIS – they add new functionality or smooth the behaviour of existing functions.

Functions backed by GEOS include:

  • ST_Intersection
  • ST_Union
  • ST_Difference
  • ST_SymDifference

These functions are all “overlay operation” functions – they take in geometry arguments and construct new geometries for output. Under the covers is an operation called an “overlay”, which combines all the edges of the inputs into a graph and then extracts new outputs from that graph.

While the “overlay operations” in GEOS are very reliable, they are not 100% reliable. When operations fail, the library throws the dreaded TopologyException, which indicates the graph is in an inconsistent and unusable state.

Because there are a lot of PostGIS users and they manage a lot of data, there are a non-zero number of cases that cause TopologyExceptions and upset users. We would like to take that number down to zero.

Update: Next-generation overlay did not make the 3.8 GEOS release and will be part of 3.9 instead.

With luck, GEOS 3.8 will succeed in finally bringing fully robust overlay operations to the open source community. The developer behind the GEOS algorithms, Martin Davis, recently joined Crunchy Data, and has spent this summer working on a new overlay engine.

Overlay failures are caused when intersections between edges result in inconsistencies in the overlay graph. Even using double precision numbers, systems have only about 53 bits of precision to represent coordinates, and that fixed precision can result in graphs that don’t correctly reflect their inputs.

The solution is building a system that can operate on any fixed precision and retain valid geometry. As an example, here the new engine builds valid representations of Europe at any precision, even ludicrously coarse ones.

Europe at different precisions

In practice, the engine will be used with a tolerance that is close to double precision, but still provides enough slack to handle tricky cases in ways that users find visually “acceptable”. Initially the new functionality should slot under the existing PostGIS functions without change, but in the future we will be able to expose knobs to allow users to explicitly set the precision domain they want to work in.

GEOS 3.8 may not be released in time for PostGIS 3, but it will be a close thing. In addition to the new overlay engine, a lot of work has been done making the code base cleaner, using more “modern” C++ idioms, and porting forward new fixes to existing algorithms.

August 26, 2019 08:00 AM

June 28, 2019

Stephen Mather

Tracing golden monkeys through time

My colleague TUYISINGIZE Deogratias (“Deo”) and others at the Dian Fossey Gorilla Fund International have been studying golden monkeys (Cercopithecus kandti) in Rwanda. Golden monkeys are an endangered species found along the Albertine Rift (including the Virungas, home to the endangered mountain gorilla). They are also cute as can be, but more on that another time.

Golden monkey (Cercopithecus kandti)

Deo has been leading efforts to track the golden monkeys in several locations across their range, observing their habits. Among the data gathered are the locations of the monkey groups as they move through their range. One thing we want to understand from these data is how much each group moves per day.

The raw data look something like this:

raw_data

So we tweak things a bit to get ids in order of date and time, and also prep the data so that the date and time are proper types in PostgreSQL:

DROP TABLE IF EXISTS goldenmonkeys_sorted;
CREATE TABLE goldenmonkeys_sorted AS
(
    WITH nodate AS (
        SELECT gid, geom, id, lat, lon, alt, dater || ' ' || timer AS dater, month AS monther, season, groupid FROM hr_g_all
    )
    , sorted AS (
	SELECT gid, geom, id, lat, lon, alt, dater::TIMESTAMP WITH TIME ZONE AS datetimer, monther, season, groupid FROM nodate
		ORDER BY groupid, datetimer
    )
    SELECT gid AS id, ROW_NUMBER() OVER (ORDER BY groupid, datetimer) AS gid, datetimer, date(datetimer) AS dater, monther, season, groupid, geom FROM sorted -- sequential ids in date/time order
        ORDER BY gid
);

Resulting in the following:

massaged_data

Golden Monkey Ranging in Gishwati National Park

OK. Now we want to turn this into traces of the monkeys’ movements for each day. Something like this:

gishwati_trace

But for every trace, for every day for each group.

We will create a function that leverages WITH RECURSIVE. We’ve seen this before. WITH RECURSIVE allows us to take each record in sequence and perform operations with the previous record, in this case calculating travel time, travel distance, and combining the individual points into a single line with ST_MakeLine.

CREATE OR REPLACE FUNCTION goldenmonkey_time (date, text)
RETURNS TABLE(dater date, monther text, traveltime interval, distance float, geom geometry) AS $$
WITH RECURSIVE gtime AS (
SELECT gid, dater, monther, datetimer, datetimer - (SELECT min(datetimer) FROM goldenmonkeys_sorted WHERE dater = $1 AND groupid = $2) AS timediff, 0::float AS distance, geom
FROM goldenmonkeys_sorted WHERE dater = $1 AND groupid = $2
UNION ALL
SELECT n.gid, w.dater, w.monther, w.datetimer, n.datetimer - w.datetimer AS timediff, ST_Distance(n.geom, w.geom) AS distance, n.geom
FROM goldenmonkeys_sorted n, gtime w
WHERE w.dater = $1 AND n.gid::integer = w.gid::integer + 1
)
SELECT max(dater) AS dater, max(monther) AS monther, max(timediff) AS traveltime, ST_Length(ST_MakeLine(geom)) AS length, ST_MakeLine(geom) FROM gtime;
$$ LANGUAGE SQL;

Now to use our function, we need a list of dates and groups so we can calculate this for each day:

WITH dategroup AS (
SELECT
DISTINCT dater, groupid
FROM
goldenmonkeys_sorted ORDER by groupid, dater
)
SELECT groupid, (goldenmonkey_time(dater, groupid)).dater,
(goldenmonkey_time(dater, groupid)).monther,
(goldenmonkey_time(dater, groupid)).traveltime,
(goldenmonkey_time(dater, groupid)).distance,
(goldenmonkey_time(dater, groupid)).geom FROM dategroup
ORDER BY groupid, (goldenmonkey_time(dater, groupid)).dater;


Now we have traces not just for one day and group, but all traces and groups:

gishwati_traces

by smathermather at June 28, 2019 10:51 AM

May 22, 2019

Anita Graser (Underdark)

Movement data in GIS #23: trajectories in context

Today’s post continues where “Why you should be using PostGIS trajectories” leaves off. It’s the result of a collaboration with Eva Westermeier. I had the pleasure to supervise her internship at AIT last year and also co-supervised her Master’s thesis [0] on the topic of enriching trajectories with information about their geographic context.

Context-aware analysis of movement data is crucial for different domains and applications, from transport to ecology. While there is a wealth of data, efficient and user-friendly contextual trajectory analysis is still hampered by a lack of appropriate conceptual approaches and practical methods. (Westermeier, 2018)

Part of the work was focused on evaluating different approaches to adding context information from vector datasets to trajectories in PostGIS. For example, adding land cover context to animal movement data or adding information on anchoring and harbor areas to vessel movement data.

Classic point-based model vs. line-based model

The obvious approach is to intersect the trajectory points with context data. This is the classic point data model of contextual trajectories. It’s straightforward to add context information in the point-based model but it also generates large numbers of repeating annotations. In contrast, the line data model using, for example, PostGIS trajectories (LinestringM) is more compact since trajectories can be split into segments at context borders. This creates one annotation per segment and the individual segments are convenient to analyze (as described in part #12).

Spatio-temporal interpolation as provided by the line data model offers additional advantages for the analysis of annotated segments. Contextual segments start and end at the intersection of the trajectory linestring with context polygon borders. This means that there are no gaps like in the point-based model. Consequently, while the point-based model systematically underestimates segment length and duration, the line-based approach offers more meaningful segment length and duration measurements.

Schematic illustration of a subset of an annotated trajectory in two context classes, a) systematic underestimation of length or duration in the point data model, b) full length or duration between context polygon borders in the line data model (source: Westermeier (2018))

Another issue of the point data model is that brief context changes may be missed or represented by just one point location. This makes it impossible to compute the length or duration of the respective context segment. (Of course, depending on the application, it can be desirable to ignore brief context changes and make the annotation process robust towards irrelevant changes.)

Schematic illustration of context annotation for brief context changes, a) and b)
two variants for the point data model, c) gapless annotation in the line data model (source: Westermeier (2018) based on Buchin et al. (2014))

Beyond annotations, context can also be considered directly in an analysis, for example, when computing distances between trajectories and contextual point objects. In this case, the point-based approach systematically overestimates the distances.

Schematic illustration of distance measurement from a trajectory to an external
object, a) point data model, b) line data model (source: Westermeier (2018))

The above examples show that there are some good reasons to dump the classic point-based model. However, the line-based model is not without its own issues.

Issues

Computing the context annotations for trajectory segments is tricky. The main issue is that ST_Intersection drops the M values. This effectively destroys our trajectories! There are ways to deal with this issue – and the corresponding SQL queries are published in the thesis (p. 38-40) – but it’s a real bummer. Basically, ST_Intersection only provides geometric output. Therefore, we need to reconstruct the temporal information in order to create usable trajectory segments.
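A minimal sketch of the problem (toy coordinates; in the PostGIS versions discussed here, the GEOS-backed intersection is 2D only):

-- clipping a LinestringM trajectory against a context polygon
SELECT ST_AsText(ST_Intersection(
    'LINESTRINGM(0 0 0, 10 0 100)'::geometry,
    'POLYGON((3 -1, 7 -1, 7 1, 3 1, 3 -1))'::geometry
));
-- LINESTRING(3 0,7 0): the M (time) values are gone and have to be
-- re-interpolated, e.g. with ST_InterpolatePoint against the original trajectory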

Finally, while the line-based model is well suited to add context from other vector data, it is less useful for context data from continuous rasters but that was beyond the scope of this work.

Conclusion

After the promising results of my initial investigations into PostGIS trajectories, I was optimistic that context annotations would be a straightforward add-on. The line-based approach has multiple advantages when it comes to analyzing contextual segments. Unfortunately, generating these contextual segments is much less convenient and also slower than I had hoped. Originally, I had planned to turn this work into a plugin for the Processing toolbox but the results of this work motivated me to look into other solutions. You’ve already seen some of the outcomes in part #20 “Trajectools v1 released!”.

References

[0] Westermeier, E.M. (2018). Contextual Trajectory Modeling and Analysis. Master Thesis, Interfaculty Department of Geoinformatics, University of Salzburg.


This post is part of a series. Read more about movement data in GIS.

by underdark at May 22, 2019 07:31 PM

March 25, 2019

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PGConf US 2019 Data Loading Slides up

I gave a talk at PGConf US 2019 on some of the many ways you can load data into PostgreSQL using open source tools. This is similar to the talk I gave last year, but with the addition of the pgloader command-line tool and the http PostgreSQL extension.

HTML slides PDF slides

Even though the talk was not really about PostGIS, just tricks for loading data, I managed to get a mouthful of PostGIS in there.

by Leo Hsu and Regina Obe (nospam@example.com) at March 25, 2019 10:00 PM

February 11, 2019

Postgres OnLine Journal (Leo Hsu, Regina Obe)

Compiling http extension on ubuntu 18.04

We recently installed PostgreSQL 11 on an Ubuntu 18.04 box using apt.postgresql.org. Many of our favorite extensions were already available via apt (postgis and ogr_fdw, to name a few), but it didn't have the http extension we use a lot. The http extension is pretty handy for querying things like Salesforce and other web-API-based systems. We'll outline the basic compile and install steps. While it's specific to the http extension, the process is similar for any other extension you may need to compile.

Continue reading "Compiling http extension on ubuntu 18.04"

by Leo Hsu and Regina Obe (nospam@example.com) at February 11, 2019 08:31 AM

January 01, 2019

Boston GIS (Regina Obe, Leo Hsu)

Using pg_upgrade to upgrade PostgreSQL 9.3 PostGIS 2.1 to PostgreSQL 11 PostGIS 2.5 on Yum

In a previous article, Using pg_upgrade to upgrade PostGIS without installing an older version, I demonstrated a trick for upgrading to a newer PostgreSQL instance from PostGIS 2.2-or-later without having to install the older version of PostGIS in your new PostgreSQL service. Unfortunately, that trick does not work if you are coming from PostGIS 2.1, because in PostGIS 2.2 we renamed a C lib function that backed SQL functions in 2.1.

Fear not. There is still a way to upgrade from 2.1 to 2.5 without installing an older version of PostGIS in your new PostgreSQL instance. To do so, you need to add one step: remove the 2.1 functions that are backed by the renamed lib function. In the upcoming PostGIS 3.0, we've added this function back and have it throw an error, so that even coming from PostGIS 2.1 you can upgrade just the same as you do from later versions.
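As a hedged sketch (not from the original article), the C-language functions bound to the PostGIS library (the candidates backed by the renamed symbol) can be listed from the system catalogs:

-- list C functions and the library symbols they bind to
SELECT p.proname, p.prosrc AS c_symbol, p.probin AS library
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
WHERE l.lanname = 'c'
  AND p.probin LIKE '%postgis%';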

Continue reading "Using pg_upgrade to upgrade PostgreSQL 9.3 PostGIS 2.1 to PostgreSQL 11 PostGIS 2.5 on Yum"

by Regina Obe (nospam@example.com) at January 01, 2019 06:48 AM

December 19, 2018

Michal Zimmermann

CentOS PostGIS Upgrade Hell… Yet Again

PostGIS upgrades used to be a nightmare. Broken dependencies, version mismatches, you name it. Upgrading PostgreSQL 10 with PostGIS 2.4 to PostgreSQL 11 on CentOS has been my mission impossible for two days. And it doesn’t seem to be coming to an end.

What? Why?

We’re running a fairly large spatially enabled PostgreSQL 10 database cluster. To keep up with the pretty fast pace of development, I was hoping to pg_upgrade it to PostgreSQL 11.

Tried and failed

I’ve been trying different upgrade strategies with PostgreSQL 11 already running to no avail. Here comes the list.

Install PostGIS 2.4 to PostgreSQL 11 and pg_upgrade

yum install postgis24_11
systemctl stop postgresql-11

su postgres
/usr/pgsql-11/bin/pg_upgrade \
  --check \
  -b /usr/pgsql-10/bin/ -B /usr/pgsql-11/bin/ \
  -d /var/lib/pgsql/10/data -D /var/lib/pgsql/11/data \
  --link \
  -U root \
  -o ' -c config_file=/var/lib/pgsql/10/data/postgresql.conf' -O ' -c config_file=/var/lib/pgsql/11/data/postgresql.conf'

This results in:

Your installation references loadable libraries that are missing from the new installation. You can add these libraries to the new installation, or remove the functions using them from the old installation. A list of problem libraries is in the file: loadable_libraries.txt

loadable_libraries.txt says the following:

could not load library "$libdir/postgis-2.4": ERROR:  could not load library "/usr/pgsql-11/lib/postgis-2.4.so": /usr/pgsql-11/lib/postgis-2.4.so: undefined symbol: geod_polygon_init

Duckduckgoing I found the related PostgreSQL mailing list thread.

Build and install PostGIS 2.4 from source to PostgreSQL 11 and pg_upgrade

The bug report says there’s something wrong with the proj4 version, so I chose proj49 and geos37.

yum install proj49 proj49-devel
wget https://download.osgeo.org/postgis/source/postgis-2.4.6.tar.gz
tar -xzvf postgis-2.4.6.tar.gz
cd postgis-2.4.6

./configure \
  --with-pgconfig=/usr/pgsql-11/bin/pg_config \
  --with-geosconfig=/usr/geos37/bin/geos-config \
  --with-projdir=/usr/proj49/

make && make install

CREATE EXTENSION postgis fails with could not load library "/usr/pgsql-11/lib/postgis-2.4.so": /usr/pgsql-11/lib/postgis-2.4.so: undefined symbol: geod_polygon_init. Oh my.

Install PostGIS 2.5 to PostgreSQL 10 and pg_upgrade

Running out of ideas, I tried to install PostGIS 2.5 to our PostgreSQL 10 cluster and pg_upgrade.

yum install postgis25_10

The resulting error appeared almost instantly:

Transaction check error:
file /usr/pgsql-10/bin/shp2pgsql-gui from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/lib/liblwgeom.so from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/lib/postgis-2.4.so from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer.sql from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer_data_us.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer_data_us.sql from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_sfcgal.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_tiger_geocoder.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_topology.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64

What the…

Build and install PostGIS 2.5 from source to PostgreSQL 10 and pg_upgrade

wget https://download.osgeo.org/postgis/source/postgis-2.5.1.tar.gz
tar -xzvf postgis-2.5.1.tar.gz
cd postgis-2.5.1

./configure \
  --with-pgconfig=/usr/pgsql-10/bin/pg_config \
  --with-geosconfig=/usr/geos37/bin/geos-config

make && make install

CREATE EXTENSION postgis fails with ERROR: could not load library "/usr/pgsql-10/lib/postgis-2.5.so": /usr/pgsql-10/lib/postgis-2.5.so: undefined symbol: GEOSFrechetDistanceDensify. Again? Really?

GEOSFrechetDistanceDensify was added in GEOS 3.7 (linked in ./configure), yet ldd /usr/pgsql-10/lib/postgis-2.5.so says:

linux-vdso.so.1 =>  (0x00007ffd4c5fa000)
libgeos_c.so.1 => /usr/geos36/lib64/libgeos_c.so.1 (0x00007f68ddf5a000)
libproj.so.0 => /lib64/libproj.so.0 (0x00007f68ddd07000)
libjson-c.so.2 => /lib64/libjson-c.so.2 (0x00007f68ddafc000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x00007f68dd792000)
libm.so.6 => /lib64/libm.so.6 (0x00007f68dd48f000)
libSFCGAL.so.1 => /lib64/libSFCGAL.so.1 (0x00007f68dc9c0000)
libc.so.6 => /lib64/libc.so.6 (0x00007f68dc5f3000)
libgeos-3.6.3.so => /usr/geos36/lib64/libgeos-3.6.3.so (0x00007f68dc244000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f68dbf3d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f68dbd27000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f68dbb22000)
libz.so.1 => /lib64/libz.so.1 (0x00007f68db90c000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f68db6e6000)
/lib64/ld-linux-x86-64.so.2 (0x000055960f119000)
libCGAL.so.11 => /usr/lib64/libCGAL.so.11 (0x00007f68db4bd000)
libCGAL_Core.so.11 => /usr/lib64/libCGAL_Core.so.11 (0x00007f68db284000)
libmpfr.so.4 => /usr/lib64/libmpfr.so.4 (0x00007f68db029000)
libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007f68dadb0000)
libboost_date_time-mt.so.1.53.0 => /usr/lib64/libboost_date_time-mt.so.1.53.0 (0x00007f68dab9f000)
libboost_thread-mt.so.1.53.0 => /usr/lib64/libboost_thread-mt.so.1.53.0 (0x00007f68da988000)
libboost_system-mt.so.1.53.0 => /usr/lib64/libboost_system-mt.so.1.53.0 (0x00007f68da783000)
libboost_serialization-mt.so.1.53.0 => /usr/lib64/libboost_serialization-mt.so.1.53.0 (0x00007f68da517000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f68da2fa000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00007f68da0f2000)

I’m nearly desperate after spending two days trying to break through. I have ~300 GB of PostgreSQL data to migrate to the current version, and there seems to be no possible way to do it on CentOS.

One more thing to note: running yum install postgis25_11 and then CREATE EXTENSION postgis in a v11 database fails with exactly the same error as the one above. I really enjoy working with PostgreSQL and PostGIS, yet there’s hardly anything I fear more than trying to upgrade those two things together.

by Michal Zimmermann at December 19, 2018 09:00 AM

December 10, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostGIS 2.5.1 Bundle for Windows

PostGIS 2.5.1 was released on November 18th, 2018. This weekend I finished packaging the PostGIS 2.5.1 Windows builds and installers targeted for the PostgreSQL EDB distribution, and pushed them up to stackbuilder. This covers PostgreSQL 9.4-11 (64-bit) and PostgreSQL 9.5-10 (32-bit).

Note that the PostGIS 2.5 series will be the last of the PostGIS 2s. Say goodbye to PostGIS 2.* and start playing with the in-development version of PostGIS 3. Snapshot binaries of the PostGIS 3.0 Windows development builds are also available on the PostGIS windows download page. These should work for both BigSQL and EDB distributions.

Continue reading "PostGIS 2.5.1 Bundle for Windows"

by Leo Hsu and Regina Obe (nospam@example.com) at December 10, 2018 12:18 AM

November 17, 2018

Simon Greener

Spatial Representation of a Communications Pit

This article shows how to represent a communications pit using logical referencing rather than hard spatial objects in PostGIS. The solution was originally created on Oracle.

by Simon Greener at November 17, 2018 12:05 AM

October 25, 2018

Safe Software

A Beginner’s Guide to… Bulk Database Updates with FME

This article is a simple guide to bulk database updates with FME.

Sometimes my blog posts are like a courtroom novel. The recent article on precision is an example of this. I reach a successful verdict, but getting there involves an opening statement, a journey through obscure aspects of theory, cross-examining our functionality, and making a final argument for various design decisions. Only after all that can I prove beyond doubt that our work is fit for purpose.

That sort of article is fun to write and it does illustrate Safe’s thought processes in designing FME functionality. But sometimes – like reading a courtroom novel – you want to skip to the final chapter. You just want the juicy part where our hero attorney has already got the result, and simply illuminates the solution in logical steps.

Well, today, I am that hero attorney. I am the Perry Mason of the FME world, and this article is the final chapter of “Updating Databases with FME”. No theory or cross examination involved. Step by step I’ll cast light on how FME made it easy to push mass updates to a database.

There is just one design issue I’ll touch on, and it is important. In law there are multiple jurisdictions, and a technique Perry Mason tries in a California court may not work for Atticus Finch in Alabama. Similarly, in our world of data there are multiple databases. But! FME techniques you use for Oracle will work for SQL Server, and FME techniques you use for Postgres will work for an Esri Geodatabase.

How can this happen? It’s because we’ve worked really hard to harmonize (or standardize) the interfaces inside FME. So whatever database applies in your jurisdiction – or even if your work involves several database types – this post has you covered.

So let’s get on with it…

Setting up an FME Database Connection

When you want to read from or write to a database – including for database updates – you need authorization. FME defines authorization parameters using a connection tool. It’s accessed through Tools > FME Options in FME Workbench. So start Workbench, select Tools > FME Options and you will get this dialog:

When I click on Database Connections, I get a list of available connections. If I want to create a new one I press the plus button and get this dialog:

Just enter your database details in there, test, and then save. Now that you have a connection defined, you can use it wherever you like in FME. So first let’s use it to insert some data into a database…

Inserting Data to a Database with FME

To carry out database updates, first you need data in the database! Let’s say you want to read a dataset – maybe an Excel spreadsheet – and write it to a database, creating a table at the same time. That’s simply done in FME by generating a new workspace and choosing a database format. The generate option exists on the start page of FME Workbench, or you can use the shortcut Ctrl+G. That opens the basic dialog for defining a translation:

Here I have filled in the fields to define a translation of data about parks in the city of Vancouver. The translation is from MapInfo TAB (a spatial data format) to PostGIS (a standard PostgreSQL database with a spatial extension). The spatial part is not necessary – the same setup works for data without a spatial component – but maps are cool so I’ll go with it.

The database connection I selected is the one I defined above, saving me the effort of re-entering my authentication details.

Click OK and that dialog creates a workspace that looks like this:

Each object on the left is a table, layer or class in your source data. Each object on the right is a table in your database.

Setting Database Parameters

When using databases the key settings for each table are accessed by clicking the cogwheel icon on those objects, like this:

Database Insert Parameters

The table name is the first parameter, and so I can rename the table to be something different; and I can choose which schema (Table Qualifier) to write to as well.

But the most important parameter (Feature Operation) tells us that we are INSERTing data, and I can also choose in what way the table is created:

So I can choose to create the table regardless of whether it exists (Drop and Create), create it if it doesn’t already exist (Create if Needed), just add to the existing table (Use Existing), or empty it if it already exists (Truncate Existing).

Which I use depends on the scenario I am working through, but in this case – to create and fill a table – I’ll use Create if Needed. The advantage over Drop and Create is that if another user already has a table with that name (and I haven’t checked) then at least I won’t delete their content first.

Anyway, I run the workspace and FME loads the data:

Of course, at some point in the future I might find the source of the data (Parks.tab) has changed, and I need to update my data based on that changed dataset…

Updating Records in a Database with FME

Say I receive a dataset named ParksUpdates.tab. The simplest way to update my database is to do the same process as above, but to use Drop and Create as the table operation. That way I am just replacing everything:

But that relies on ParksUpdates.tab being the FULL replacement dataset. What if it only includes the records that need an update? Well in that scenario I simply change the operation to UPDATE (instead of INSERT) and choose to Use Existing table:

Database Updates Parameters

Notice that when I pick UPDATE, then another parameter becomes available to me: Match Columns. I need this to define which feature updates which record. In this case I have an attribute called parkid in my source data and a field (column) named parkid in my database table; so that’s the attribute I select:

So if an incoming feature has the attribute parkid=13, then its contents are used to update the database record where parkid=13.

That’s simple enough, but to add a little complexity (not too much) there is also a WHERE clause I can use instead. This lets me define the match where the attribute and field names are not the same (for example ParkNumber and parkid) but also allows me to add extra conditions using field names:

 

Here, for example, I’m updating records where ParkNumber = parkid, but also only where the neighborhoodname field is “Downtown”. So records outside of the Downtown area aren’t updated, even if the park ID matches. I could do a similar test for a status field (active, inactive) among many other examples.

So we do updates here, and that’s simple enough; but sometimes we also want to delete records…

Deleting Records from a Database with FME

Let’s assume my UpdatedParks dataset is a list of records to delete from the database table, not add. To do that I simply change the operation from UPDATE to DELETE:

Database Delete Parameters

I get the same Match Column parameter (or WHERE clause) to define which incoming features should delete which existing records, and this is again easy to define.

So deletes are no more complex than updates; the key question is what happens when I want to both delete and update records simultaneously?

Updating AND Deleting Database Records with FME

Let’s say some of the incoming records are updates, while others are deletions:

Obviously I can’t set the operation parameter to both DELETE and UPDATE for the entire table. What I do instead is tag each feature with the operation it will carry out. I do this using an attribute called fme_db_operation:

Database Updates AND Deletes

You can see here that I have added an attribute to each stream of data, using an AttributeCreator transformer. The attribute name is fme_db_operation. For one lot of data I set the value to UPDATE. The other set of data has a value of DELETE. This is how I tag each feature with its own operation.

I still have to set the operation type on the table itself. But this time, rather than choosing Insert, Update, or Delete, I choose the option labelled fme_db_operation:

Now when I run the workspace, the features tagged UPDATE update database records, while the features tagged DELETE delete database records. The Match Column (or WHERE clause) provides a match between features and records.

The one assumption is that we already know which features are deletes and which are updates. In the above example, the source data is already divided into two. If we aren’t sure of that, then we might need to do what is called Change Detection…

Change Detection and Database Updates with FME

Change Detection is where we have a new dataset and want to compare it to existing records to find what has changed. Here is such a workspace:

It looks quite simple, and in fact it is. I have added a reader (Readers > Add Reader) to read the existing contents of my database table, and an UpdateDetector transformer to compare these records to the UpdatedParks dataset, to identify where changes have occurred. I can detect changes either on field values, or spatial contents, or both.

Then it’s just a case of writing the results back to the database table. I don’t even need to create the fme_db_operation attribute; the UpdateDetector has done that for me. I just have to check that the table is set with the correct operation (fme_db_operation) and that Match Column is set.

At this point you probably know more than enough to carry out database updates; but there is one more scenario worth mentioning. What if each feature has a different match column? In that case you can write your match in the form of a WHERE clause, store it as an attribute, and then use that attribute for the match in the table parameters:

…just like that!

NOTE: In FME2019 the ChangeDetector has undergone various improvements and should be your go-to transformer instead of the UpdateDetector.

Wrap Up

I hope that was a good explanation for database updates, short on theory and long on practical examples. Speaking of which, we have an online tutorial on database updates, where you can carry out exercises, step by step, using similar examples to what I showed today. So if you want to try some of these techniques in a safe practice environment, click the above link to visit the tutorial.

As I mentioned, although I used Postgres here, most of our database formats use an identical interface because we’ve invested a lot of time standardizing them all. That work is still ongoing, so one of the tutorial articles includes a list of standardized formats, and what to do if your format is not yet updated.

And now, if you’ll excuse me, I’m going to do like Perry Mason and wrap up my case by going out to celebrate with steak and cocktails!

PS: Safe Software now has an Instagram page. I’m not sure what an instagram is, but here’s a picture of me presenting at a prior user conference (oh, and riding an inflatable horse):

https://www.instagram.com/safesoftware/

by Mark Ireland at October 25, 2018 10:03 AM

September 13, 2018

Simon Greener

PGAdmin Finally Has A Spatial Viewer

A great day has arrived for PostgreSQL developers (not just spatial geeks): an integrated spatial viewer has finally been added to pgAdmin4 3.3, as announced by Regina Obe

You can see the announcement and some examples here

by Simon Greener at September 13, 2018 11:10 PM

September 09, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PGOpen 2018 Data Loading Presentation Slides

At PGOpen 2018 in San Francisco, we gave a talk on 10 ways to load data into Postgres. This is one of the rare talks where we didn't talk much about PostGIS. However, we did showcase the tools ogr_fdw, ogr2ogr, and shp2pgsql, which are commonly used for loading spatial data but are equally good for loading non-spatial data. Below are the slide links.

Continue reading "PGOpen 2018 Data Loading Presentation Slides"

by Leo Hsu and Regina Obe (nospam@example.com) at September 09, 2018 04:16 AM

September 08, 2018

Boston GIS (Regina Obe, Leo Hsu)

pgAdmin4 now offers PostGIS geometry viewer

pgAdmin4 version 3.3, released this week, comes with a PostGIS geometry viewer. You will be able to see the graphical output of your query directly in pgAdmin, provided you output a geometry or geography column. If your column is of SRID 4326 (WGS 84 lon/lat), pgAdmin will automatically display it against an OpenStreetMap background.
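Any query that returns a geometry column will do; for example (a trivial sketch with an arbitrary Boston coordinate):

-- a single WGS 84 point; the viewer draws it on an OpenStreetMap background
SELECT ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326) AS geom;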

We have Xuri Gong to thank for working on this as a PostGIS/pgAdmin Google Summer of Code (GSOC) project. We'd like to thank Victoria Rautenbach and Frikan Erwee for mentoring.

Continue reading "pgAdmin4 now offers PostGIS geometry viewer"

by Regina Obe (nospam@example.com) at September 08, 2018 09:13 PM

July 19, 2018

Anita Graser (Underdark)

Movement data in GIS #15: writing a PL/pgSQL stop detection function for PostGIS trajectories

Do you sometimes start writing an SQL query and, around line 50, get the feeling that it might be getting out of hand? If so, it might be useful to start breaking it down into smaller chunks and wrap those up into custom functions. Never done that? Don’t despair! There’s an excellent PL/pgSQL tutorial on postgresqltutorial.com to get you started.

To get an idea of the basic structure of a PL/pgSQL function, and to prove that PostGIS data types work just fine in this context, here’s a basic function that takes a trajectory geometry and outputs its duration, i.e. the difference between its last and first timestamps:

CREATE OR REPLACE FUNCTION AG_Duration(traj geometry) 
RETURNS numeric LANGUAGE 'plpgsql'
AS $BODY$ 
BEGIN
RETURN ST_M(ST_EndPoint(traj))-ST_M(ST_StartPoint(traj));
END; $BODY$;
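For example, assuming the M values store epoch seconds (a common convention, not a requirement of the function):

-- a toy trajectory whose M values are 0, 60 and 120 seconds
SELECT AG_Duration('LINESTRINGM(0 0 0, 1 1 60, 2 2 120)'::geometry);
-- returns 120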

My end goal for this exercise was to implement a function that takes a trajectory and outputs the stops along this trajectory. Commonly, a stop is defined as a long stay within an area with a small radius. This leads us to the following definition:

CREATE OR REPLACE FUNCTION AG_DetectStops(
   traj geometry, 
   max_size numeric, 
   min_duration numeric)
RETURNS TABLE(sequence integer, geom geometry) 
-- implementation follows here!

Note how this function uses RETURNS TABLE to enable it to return all the stops that it finds. To add a line to the output table, we need to assign values to the sequence and geom variables and then use RETURN NEXT.

Another reason to use PL/pgSQL is that it enables us to write loops. And loops I wanted for my stop detection function! Specifically, I wanted to go through all the points in the trajectory:

FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP
-- here comes the magic!
END LOOP;

Eventually the function should go through the trajectory and identify all segments that stay within an area with max_size diameter for at least min_duration time. To test for the area size, we can use:

IF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true; 

Putting everything together, my current implementation looks like this:

CREATE OR REPLACE FUNCTION AG_DetectStops(traj geometry, max_size numeric, min_duration numeric)
    RETURNS TABLE(sequence integer, geom geometry) LANGUAGE 'plpgsql'
AS $BODY$
DECLARE 
    pt geometry;
    segment geometry;
    is_stop boolean;
    previously_stopped boolean;
    stop_sequence integer;
    p1 geometry;
BEGIN
segment := NULL;
sequence := 0;
is_stop := false;
previously_stopped := false;
p1 := NULL;
FOR pt IN SELECT (ST_DumpPoints(traj)).geom 
LOOP
   IF segment IS NULL AND p1 IS NULL THEN 
      p1 := pt; 
   ELSIF segment IS NULL THEN 
      segment := ST_MakeLine(p1,pt); 
      p1 := NULL;
      IF ST_Length((segment)) <= max_size THEN is_stop := true; END IF;
   ELSE 
      segment := ST_AddPoint(segment,pt); 
      -- if we're in a stop, we want to grow the segment; otherwise we remove points to keep it near the specified min_duration
      IF NOT is_stop THEN
         WHILE ST_NPoints(segment) > 2 AND AG_Duration(ST_RemovePoint(segment,0)) >= min_duration LOOP
            segment := ST_RemovePoint(segment,0); 
         END LOOP;
      END IF;
      -- a stop is identified if the segment stays within a circle of diameter = max_size
      IF ST_Length((segment)) <= max_size THEN is_stop := true;  -- not necessary but faster
      ELSIF ST_Distance((ST_StartPoint(segment)),(pt)) > max_size THEN is_stop := false;
      ELSIF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true;											
      ELSE is_stop := false; 
      END IF;
      -- if we found the end of a stop, we need to check if it lasted long enough
      IF NOT is_stop AND previously_stopped THEN 
         IF ST_M(ST_PointN(segment,ST_NPoints(segment)-1))-ST_M(ST_StartPoint(segment)) >= min_duration THEN
            geom := ST_RemovePoint(segment,ST_NPoints(segment)-1); 
            RETURN NEXT;
            sequence := sequence + 1;
            segment := NULL;
            p1 := pt;
	     END IF;
      END IF;
   END IF;
   previously_stopped := is_stop;
END LOOP;
IF previously_stopped AND AG_Duration(segment) >= min_duration THEN 
   geom := segment; 
   RETURN NEXT; 
END IF;
END; 
$BODY$;	
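Calling it looks like this (a sketch reusing the toy trajectory from the tests below; max_size is in CRS units, min_duration in M units):

SELECT sequence, ST_AsText(geom)
FROM AG_DetectStops(
    ST_GeometryFromText('LinestringM(0 0 0, 0 0 1, 0.1 0.1 2, 2 2 3)'),
    1,  -- max_size: maximum stop diameter
    1   -- min_duration: minimum stay
);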

While this function is not really short, it’s so much more readable than my previous attempts at doing this in pure SQL. Some of the lines for determining is_stop are not strictly necessary, but they do speed up processing.

Performance still isn’t quite where I’d like it to be. I suspect that all the adding and removing of points from linestring geometries is not ideal. In general, it’s quicker to find shorter stops in smaller areas than longer stops in bigger areas.

Let’s test! 

Looking for a testing framework for PL/pgSQL, I found plpgunit on GitHub. While I did not end up using it, I did use its examples as inspiration to write a couple of tests, e.g.

CREATE OR REPLACE FUNCTION test.stop_at_beginning() RETURNS void LANGUAGE 'plpgsql'
AS $BODY$
DECLARE t0 integer; n0 integer;
BEGIN
WITH temp AS ( SELECT AG_DetectStops(
   ST_GeometryFromText('LinestringM(0 0 0, 0 0 1, 0.1 0.1 2, 2 2 3)'),
   1,1) stop 
)
SELECT ST_M(ST_StartPoint((stop).geom)), 
       ST_NPoints((stop).geom) FROM temp INTO t0, n0;	
IF t0 = 0 AND n0 = 3
   THEN RAISE INFO 'PASSED - Stop at the beginning of the trajectory';
   ELSE RAISE INFO 'FAILED - Stop at the beginning of the trajectory';
END IF;
END; $BODY$;

Basically, each test is yet another PL/pgSQL function that doesn’t return anything (i.e. returns void) but outputs messages about the status of the test. Here I made heavy use of the PERFORM statement, which executes the provided function but discards the result.
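A minimal sketch of such a runner, assuming the test functions live in the test schema as above:

DO $$
BEGIN
    -- executes the test for its RAISE INFO output, discarding the void result
    PERFORM test.stop_at_beginning();
END $$;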


Update: The source code for this function is now available on https://github.com/anitagraser/postgis-spatiotemporal

by underdark at July 19, 2018 09:31 PM

July 08, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

Using procedures for batch geocoding and other batch processing

One of the features we are looking forward to in the upcoming PostgreSQL 11 is the introduction of procedures via the CREATE PROCEDURE ANSI-SQL construct. The major benefit that sets procedures apart from functions is that procedures are not wrapped in an outer transaction and can have COMMITs within them. This means it's not all-or-nothing like it is with functions. Even if you stop a procedure in motion, whatever work has been done and committed is saved. In the case of functions, a stop or failure would roll back all the work. It also means you can see the work in progress of a stored procedure, since the work will already have been committed. This is a huge benefit for batch processing. Batch processing covers a lot of use cases of PostGIS users, since a good chunk of PostGIS work involves some kind of batch processing of data you get from third parties or machines.
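A minimal sketch of the pattern (the addr table and my_geocode() function are hypothetical, and my_geocode() is assumed to always return a value; PostgreSQL 11 syntax):

CREATE PROCEDURE batch_geocode()
LANGUAGE plpgsql AS $$
BEGIN
    LOOP
        -- geocode the next chunk of not-yet-geocoded rows
        UPDATE addr SET geom = my_geocode(address)
        WHERE id IN (SELECT id FROM addr WHERE geom IS NULL LIMIT 100);
        EXIT WHEN NOT FOUND;
        COMMIT;  -- each committed chunk is saved even if the procedure is stopped later
    END LOOP;
END $$;

CALL batch_geocode();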

Continue reading "Using procedures for batch geocoding and other batch processing"

by Leo Hsu and Regina Obe (nospam@example.com) at July 08, 2018 05:40 AM

June 25, 2018

Boston GIS (Regina Obe, Leo Hsu)

New in QGIS 3.2 Save Project to PostgreSQL

We've got customers discovering PostGIS and GIS in general, or migrating away from the ArcGIS family of tools. When they ask, "How do I see my data?", we often point them at QGIS, an open source GIS desktop with rich integration with PostGIS/PostgreSQL.

QGIS is great for people who need to live in their GIS environment, since it makes it easy to layer on other data sources, web services, and maps. The DBManager tool allows for more advanced querying (like writing spatial SQL queries that take advantage of the hundreds of functions PostGIS has to offer), the ability to import/export data, and creating PostgreSQL views.

QGIS has this thing called Projects, which allow for defining map layers and the symbology associated with them. For example: what colors you give your roads, what extra symbols you use, and what field attributes you overlay (street name, etc.). Projects are usually saved in files with a .qgs or .qgz extension. If you spent a lot of time styling these layers, chances are you want to share them with other people in your group. This can become challenging if your group is not connected via a network share.

Continue reading "New in QGIS 3.2 Save Project to PostgreSQL"

by Regina Obe (nospam@example.com) at June 25, 2018 05:47 AM

June 18, 2018

Boston GIS (Regina Obe, Leo Hsu)

Coming PostGIS 2.5 ST_OrientedEnvelope

PostGIS 2.5 is just around the corner. One of the new functions coming is ST_OrientedEnvelope. This is something I've been waiting on for years. It is the true minimum bounding rectangle, as opposed to ST_Envelope, which is an axis-aligned bounding rectangle.

Below is a pictorial view showing the difference between the two.
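In SQL, the comparison looks something like this (a hedged sketch; the vertex order of the oriented result may vary by version):

-- axis-aligned envelope vs. minimum rotated rectangle of a tilted polygon
SELECT ST_AsText(ST_Envelope(g))         AS axis_aligned,
       ST_AsText(ST_OrientedEnvelope(g)) AS oriented
FROM (SELECT 'POLYGON((0 0, 4 1, 5 4, 1 3, 0 0))'::geometry AS g) AS t;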

Continue reading "Coming PostGIS 2.5 ST_OrientedEnvelope"

by Regina Obe (nospam@example.com) at June 18, 2018 09:42 PM

June 09, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

PostgresVision 2018 Slides and Impressions

Leo and I attended PostgresVision 2018 which ended a couple of days ago.

We gave a talk on spatial extensions, with the main focus being PostGIS. Here are links to our slides: PostgresVision2018_SpatialExtensions (HTML version, PDF).

Unfortunately there are no slides for the pgRouting part, except the one that says pgRouting Live Demos, because Leo will only do live demos. He has no fear of his demos not working.

Side note: if you are on Windows and use the PostGIS bundle, all the extensions listed in the PostGIS box of the spatial extensions diagram, as well as pointcloud, pgRouting, and ogr_fdw, are included in the bundle.

Continue reading "PostgresVision 2018 Slides and Impressions"

by Leo Hsu and Regina Obe (nospam@example.com) at June 09, 2018 08:56 AM

April 16, 2018

Anita Graser (Underdark)

Movement data in GIS #12: why you should be using PostGIS trajectories

In short: both writing and executing trajectory queries are considerably faster using PostGIS trajectories (as LinestringM) than with the commonly used point-based approach.

Here are a couple of examples to give you an impression of the differences.

Spoiler alert! Trajectory queries are up to 500 times faster than comparable point-based queries.

A quick look at indexing

In both cases, we have indexed the tracker id, geometry, and time columns to speed up query processing. (A DDL sketch of these indexes follows the two lists below.)

The trajectory table has 3 indexes

  • gist (time_range)
  • gist (track gist_geometry_ops_nd)
  • btree (tracker)

The point-based table has 4 indexes

  • gist (pt)
  • btree (trajectory_id)
  • btree (tracker)
  • btree (t)
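For reference, a DDL sketch of these indexes (table and column names as used in the queries below):

-- trajectory table
CREATE INDEX ON geolife.trajectory_ext USING gist (time_range);
CREATE INDEX ON geolife.trajectory_ext USING gist (track gist_geometry_ops_nd);
CREATE INDEX ON geolife.trajectory_ext USING btree (tracker);

-- point-based table
CREATE INDEX ON geolife.trajectory_pt USING gist (pt);
CREATE INDEX ON geolife.trajectory_pt USING btree (trajectory_id);
CREATE INDEX ON geolife.trajectory_pt USING btree (tracker);
CREATE INDEX ON geolife.trajectory_pt USING btree (t);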

Length

First, let’s see how to determine trajectory length for all observed moving objects (identified by a tracker id).

Using the point-based approach, we first need to ensure that the points are in the correct temporal order, create the lines, and finally sum up their length:

WITH ordered AS (
 SELECT trajectory_id, tracker, t, pt
 FROM geolife.trajectory_pt
 ORDER BY t
), tmp AS (
 SELECT trajectory_id, tracker, st_makeline(pt) traj
 FROM ordered 
 GROUP BY trajectory_id, tracker
)
SELECT tracker, round(sum(ST_Length(traj::geography)))
FROM tmp
GROUP BY tracker 
ORDER BY tracker

With trajectories, we can go right to computing lengths:

SELECT tracker, round(sum(ST_Length(track::geography)))
FROM geolife.trajectory_ext
GROUP BY tracker
ORDER BY tracker

On my test system, the trajectory query run time is 22.7 sec instead of 43.0 sec for the point-based approach:

Duration

Compared to trajectory length, duration is less complicated in the point-based approach:

WITH tmp AS (
 SELECT trajectory_id, tracker, min(t) start_time, max(t) end_time
 FROM geolife.trajectory_pt
 GROUP BY trajectory_id, tracker
)
SELECT tracker, sum(end_time - start_time)
FROM tmp
GROUP BY tracker
ORDER BY tracker

Still, the trajectory query is less complex and much faster at 31 ms instead of 6.0 sec:

SELECT tracker, sum(upper(time_range) - lower(time_range))
FROM geolife.trajectory_ext
GROUP BY tracker
ORDER BY tracker

Temporal filter

Extracting trajectories that occurred during a certain time frame is another common use case:

WITH tmp AS (
 SELECT trajectory_id, tracker, min(t) start_time, max(t) end_time
 FROM geolife.trajectory_pt
 GROUP BY trajectory_id, tracker
)
SELECT trajectory_id, tracker, start_time, end_time
FROM tmp
WHERE end_time > '2008-11-26 11:00'
AND start_time < '2008-11-26 15:00'
ORDER BY tracker

This point-based query takes 6.0 sec while the shorter trajectory query finishes in 12 ms:

SELECT id, tracker, time_range
FROM geolife.trajectory_ext
WHERE time_range && '[2008-11-26 11:00+1,2008-11-26 15:00+01]'::tstzrange

or equally fast (12 ms) by making use of the n-dimensional index:

SELECT id, tracker, time_range
FROM geolife.trajectory_ext
WHERE track &&& ST_Collect(
 ST_MakePointM(-180, -90, extract(epoch from '2008-11-26 11:00'::timestamptz)),
 ST_MakePointM(180, 90, extract(epoch from '2008-11-26 15:00'::timestamptz))
)

Spatial filter

Finally, of course, let’s have a look at spatial filters, for example, trajectories that start in a certain area:

WITH my AS ( 
 SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(116.31894,39.97472),4326),0.0005) areaA
), tmp AS (
 SELECT trajectory_id, tracker, min(t) t 
 FROM geolife.trajectory_pt
 GROUP BY trajectory_id, tracker
)
SELECT distinct traj.tracker, traj.trajectory_id 
FROM tmp
JOIN geolife.trajectory_pt traj
ON tmp.trajectory_id = traj.trajectory_id AND traj.t = tmp.t
JOIN my
ON ST_Within(traj.pt, my.areaA)

This point-based query takes 6.0 sec while the shorter trajectory query finishes in 488 ms:

WITH my AS ( 
 SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(116.31894, 39.97472),4326),0.0005) areaA
)
SELECT id, tracker, ST_AsText(track)
FROM geolife.trajectory_ext
JOIN my
ON areaA && track
AND ST_Within(ST_StartPoint(track), areaA)

For the more generic “does this trajectory intersect another geometry” case, the points can also be aggregated into a linestring on the fly, but that takes 21.9 sec:

I’ll be presenting more work on PostGIS trajectories at GI_Forum in Salzburg in July. In the talk, I’ll also have a look at the custom PG-Trajectory datatype. Here’s the full open-access paper:

Graser, A. (2018) Evaluating Spatio-temporal Data Models for Trajectories in PostGIS Databases. GI_Forum ‒ Journal of Geographic Information Science, 1-2018, 16-33. DOI: 10.1553/giscience2018_01_s16.

You can find my fork of the PG-Trajectory project – including all necessary fixes – on Bitbucket.


This post is part of a series. Read more about movement data in GIS.

by underdark at April 16, 2018 08:30 PM

February 15, 2018

Postgres OnLine Journal (Leo Hsu, Regina Obe)

FDWS for PostgreSQL Windows 32 and 64-bit

We've updated our binaries for PostgreSQL 10 on Windows, both 32-bit and 64-bit. The 64-bit builds should work fine with EnterpriseDB Windows as well as BigSQL.

Continue reading "FDWS for PostgreSQL Windows 32 and 64-bit"

by Leo Hsu and Regina Obe (nospam@example.com) at February 15, 2018 05:25 AM

November 16, 2017

Boston GIS (Regina Obe, Leo Hsu)

Happy PostGIS Day 2017

To commemorate PostGIS Day 2017, I put together a 3D scene listing my favorite functions.

You can see it at PostGIS Day in 3D

by Regina Obe (nospam@example.com) at November 16, 2017 11:51 PM

October 28, 2017

Anita Graser (Underdark)

Movement data in GIS #10: open tools for AIS tracks from MarineCadastre.gov

MarineCadastre.gov is a great source for AIS data along the US coast. Their data formats and tools, though, are less open. Luckily, GDAL – and therefore QGIS – can read Esri File Geodatabases (.gdb).

MarineCadastre.gov also offers a Track Builder script that creates lines out of the broadcast points. (It can also join additional information from the vessel and voyage layers.) We could reproduce the line-creation step using tools such as Processing’s Points to path, but this post will show how to create PostGIS trajectories instead.

First, we have to import the points into PostGIS using either DB Manager or Processing’s Import into PostGIS tool:

Then we can create the trajectories. I’ve opted to create a materialized view:

The first part of the query is a common table expression (CTE) called ptm (short for PointM); it adds the timestamp information to each point as an M coordinate. The second part of the query then aggregates these PointMs into trajectories of type LineStringM.

CREATE MATERIALIZED VIEW ais.trajectory AS
 WITH ptm AS (
   SELECT b.mmsi,
     st_makepointm(
       st_x(b.geom), 
       st_y(b.geom), 
       date_part('epoch', b.basedatetime)
     ) AS pt,
     b.basedatetime t
   FROM ais.broadcast b
   ORDER BY mmsi, basedatetime
 )
 SELECT row_number() OVER () AS id,
   st_makeline(ptm.pt) AS st_makeline,
   ptm.mmsi,
   min(ptm.t) AS min_t,
   max(ptm.t) AS max_t
 FROM ptm
 GROUP BY ptm.mmsi
WITH DATA;

The trajectory start and end times (min_t and max_t) are optional but they can help speed up future queries.
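For example, a pure time-window filter can use them without touching the geometry at all (the dates here are arbitrary):

SELECT id, mmsi
FROM ais.trajectory
WHERE min_t < '2017-02-01' AND max_t >= '2017-01-01';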

One of the advantages of creating trajectory lines is that they render many times faster than the original points.

Of course, we end up with some artifacts at the border of the dataset extent. (Files are split by UTM zone.) Trajectories connect the last known position before the vessel left the observed area with the position of reentry. This results, for example, in vertical lines which you can see in the bottom left corner of the above screenshot.

With the trajectories ready, we can go ahead and start exploring the dataset. For example, we can visualize trajectory speed and/or create animations:

Purple trajectory segments are slow while green segments are faster

We can also perform trajectory analysis, such as trajectory generalization:

This is a first proof of concept. It would be great to have a script that automatically fetches the datasets for a specified time frame and list of UTM zones and loads them into PostGIS for further processing. In addition, it would be great to also make use of the information in the vessel and voyage tables, thus splitting up trajectories into individual voyages.


This post is part of a series. Read more about movement data in GIS.

by underdark at October 28, 2017 01:22 PM