As a technologist at HumanGeo, you’re often asked to perform some kind of analysis on geospatial data, and quickly! We frequently work on short turnaround times for our customers so anything that gives us a boost is welcome, which is probably why so many of us love Python. As evidenced by the volume of scientific talks at PyCon 2014, we can also lean on the great work of the scientific community. Python lets us go from zero to answer within hours or days, not weeks.
I recently had to do some science on the way we can observe clusters of points on the map – to show how regions of social significance emerge. Luckily I was able to lean heavily on Shapely which is a fantastic Python library for performing geometric operations on points, shapes, lines, etc. As an aside, if you are doing any sort of geospatial work with Python, you’ll want to pip install shapely
. Once we found a cluster of points which we believed were identifying a unique region, we needed to draw a boundary around the region so that it could be more easily digested by a geospatial analyst. Boundaries are just polygons that enclose something, so I’ll walk through some of your options and attempt to provide complete code examples.
The first step towards geospatial analysis in Python is loading your data. In the example below, I have a shapefile containing a number of points which I generated manually with QGIS. I’ll use the fiona library to read the file in, and then create point objects with shapely.
import fiona import shapely.geometry as geometry input_shapefile = 'concave_demo_points.shp' shapefile = fiona.open(input_shapefile) points = [geometry.shape(point['geometry']) for point in shapefile]
The points list can now be manipulated with Shapely. First, let’s plot the points to see what we have.
import pylab as pl x = [p.coords.xy[0] for p in points] y = [p.coords.xy[1] for p in points] pl.figure(figsize=(10,10)) _ = pl.plot(x,y,'o', color='#f16824')
We can now interrogate the collection. Many shapely operations result in a different kind of geometry than the one you’re currently working with. Since our geometry is a collection of points, I can instantiate a MultiPoint, and then ask that MultiPoint for its envelope, which is a Polygon. Easily done like so:
point_collection = geometry.MultiPoint(list(points)) point_collection.envelope
We should take a look at that envelope. matplotlib can help us out, but polygons aren’t functions, so we need to use PolygonPatch.
from descartes import PolygonPatch def plot_polygon(polygon): fig = pl.figure(figsize=(10,10)) ax = fig.add_subplot(111) margin = .3 x_min, y_min, x_max, y_max = polygon.bounds ax.set_xlim([x_min-margin, x_max+margin]) ax.set_ylim([y_min-margin, y_max+margin]) patch = PolygonPatch(polygon, fc='#999999', ec='#000000', fill=True, zorder=-1) ax.add_patch(patch) return fig _ = plot_polygon(point_collection.envelope) _ = pl.plot(x,y,'o', color='#f16824')
So without a whole lot of code, we were able to get the envelope of the points, which is the smallest rectangle that contains all of the points. In the real world, boundaries are rarely so uniform and straight, so we were naturally led to experiment with the convex hull of the points. Convex hulls are polygons drawn around points too – as if you took a pencil and connected the dots on the outer-most points. Shapely has convex hull as a built in function so let’s try that out on our points.
convex_hull_polygon = point_collection.convex_hull _ = plot_polygon(convex_hull_polygon) _ = pl.plot(x,y,'o', color='#f16824')
A tighter boundary, but it ignores those places in the “H” where the points dip inward. For many applications, this is probably good enough but we wanted to explore one more option which is known as a concave hull or alpha shape. At this point we’ve left the built-in functions of Shapely and we’ll have to write some more code. Thankfully, smart people like Sean Gillies, the author of Shapely and fiona, have done the heavy lifting. His post on the fading shape of alpha gave me a great place to start. I had to fill in some gaps that Sean left so I’ll recreate the entire working function here.
from shapely.ops import cascaded_union, polygonize from scipy.spatial import Delaunay import numpy as np import math def alpha_shape(points, alpha): """ Compute the alpha shape (concave hull) of a set of points. @param points: Iterable container of points. @param alpha: alpha value to influence the gooeyness of the border. Smaller numbers don't fall inward as much as larger numbers. Too large, and you lose everything! """ if len(points) < 4: # When you have a triangle, there is no sense # in computing an alpha shape. return geometry.MultiPoint(list(points)) .convex_hull def add_edge(edges, edge_points, coords, i, j): """ Add a line between the i-th and j-th points, if not in the list already """ if (i, j) in edges or (j, i) in edges: # already added return edges.add( (i, j) ) edge_points.append(coords[ [i, j] ]) coords = np.array([point.coords[0] for point in points]) tri = Delaunay(coords) edges = set() edge_points = [] # loop over triangles: # ia, ib, ic = indices of corner points of the # triangle for ia, ib, ic in tri.vertices: pa = coords[ia] pb = coords[ib] pc = coords[ic] # Lengths of sides of triangle a = math.sqrt((pa[0]-pb[0])**2 + (pa[1]-pb[1])**2) b = math.sqrt((pb[0]-pc[0])**2 + (pb[1]-pc[1])**2) c = math.sqrt((pc[0]-pa[0])**2 + (pc[1]-pa[1])**2) # Semiperimeter of triangle s = (a + b + c)/2.0 # Area of triangle by Heron's formula area = math.sqrt(s*(s-a)*(s-b)*(s-c)) circum_r = a*b*c/(4.0*area) # Here's the radius filter. #print circum_r if circum_r < 1.0/alpha: add_edge(edges, edge_points, coords, ia, ib) add_edge(edges, edge_points, coords, ib, ic) add_edge(edges, edge_points, coords, ic, ia) m = geometry.MultiLineString(edge_points) triangles = list(polygonize(m)) return cascaded_union(triangles), edge_points concave_hull, edge_points = alpha_shape(points, alpha=1.87) _ = plot_polygon(concave_hull) _ = pl.plot(x,y,'o', color='#f16824')
That’s a mouthful, but the gist is that we are going to compute Delaunay triangles which establish a connection between each point and nearby points and then we remove some of the triangles that are too far from their neighbors. This removal part is key. By identifying candidates for removal we are saying that these points are too far from their connected points so don’t use that connection as part of the boundary. The result looks like this.
Better, but not great.
It turns out that the alpha value and the scale of the points matters a lot when it comes to how well the Delaunay triangulation method will work. You can usually play with the alpha value to find a suitable response, but unless you can scale up your points it might not help. For the sake of a good example, I’ll do both: scale up the “H” and try some different alpha values.
To get more points, I opened up QGIS, drew an “H” like polygon, used the tool to generate regular points, and then spatially joined them to remove any points outside the “H”. My new dataset looks like this:
input_shapefile = 'demo_poly_scaled_points_join.shp' new_shapefile = fiona.open(input_shapefile) new_points = [geometry.shape(point['geometry']) for point in new_shapefile] x = [p.coords.xy[0] for p in new_points] y = [p.coords.xy[1] for p in new_points] pl.figure(figsize=(10,10)) _ = pl.plot(x,y,'o', color='#f16824')
When we try the alpha shape transformation on these points we get a much more satisfying boundary. We can try a few permutations to find the best alpha value for these points with the following code. I combined each plot into an animated gif below.
from matplotlib.collections import LineCollection for i in range(9): alpha = (i+1)*.1 concave_hull, edge_points = alpha_shape(new_points, alpha=alpha) #print concave_hull lines = LineCollection(edge_points) pl.figure(figsize=(10,10)) pl.title('Alpha={0} Delaunay triangulation'.format( alpha)) pl.gca().add_collection(lines) delaunay_points = np.array([point.coords[0] for point in new_points]) pl.plot(delaunay_points[:,0], delaunay_points[:,1], 'o', hold=1, color='#f16824') _ = plot_polygon(concave_hull) _ = pl.plot(x,y,'o', color='#f16824')
So in this case, alpha of about 0.4 looks pretty good. We can use shapely’s buffer operation to clean up that polygon a bit and smooth out any of the jagged edges.
alpha = .4 concave_hull, edge_points = alpha_shape(new_points, alpha=alpha) plot_polygon(concave_hull) _ = pl.plot(x,y,'o', color='#f16824') plot_polygon(concave_hull.buffer(1)) _ = pl.plot(x,y,'o', color='#f16824')
And there you have it. Hopefully this has been a useful tour through some of your geometric boundary options in python. I recommend exploring the Shapely manual to find out about all of the other easy geometric operations you have at your fingertips. Also, if you dig Python and playing with maps, we want to hear from you.
Can you provide the concave_demo_points.shp file that you use in your example?
Sure thing. I just created a repo on github that contains the shapefiles and an ipython notebook that should contain the same code. See this: https://github.com/dwyerk/boundaries
thank you for this post, this is very relevant too my work. My colleague has programmed a few different algorithms in Matlab for this same task, but I am happy their is a python implementation.
This is incorrectly formatted python…
Kaelan – could you be more specific about the incorrectly formatted python that you see? It could be a copy-and-paste error, but it looks OK to me.
The add_edge function in alpha_shape is never called. I thought it might be because a block of code was accidentally included in add_edge’s scope. Maybe post a github gist with the completed code?
Thanks!
P.S. Thanks for this, it appears to be the only solution on the entire damn internet
(replying here so you get a notification)
The add_edge function in the alpha_shape function is never called, and add_edge returns nothing
* alpha_shape returns nothing
Ah, yes, there was a pasting error. Line 34 through about 46 had to be dedented one. Thanks for catching that!
I have the original code in an ipython notebook if you’re interested in working from that instead of WordPress (which is clearly not the best way for me to show off code!)
Notebook viewer link: http://nbviewer.ipython.org/github/dwyerk/boundaries/blob/master/concave_hulls.ipynb
Original: https://github.com/dwyerk/boundaries/blob/master/concave_hulls.ipynb
Thank you! Works great
Hi, I am trying to ‘envelope’ a number of points. I am trying to implement your code using my points however i am getting the following error message when trying to get the concave hull, any ideas on what is going on?
File “./shapely_test2.sh”, line 88, in
concave_hull, edge_points = alpha_shape(points1, alpha=1.87)
File “./shapely_test2.sh”, line 84, in alpha_shape
m = geometry.MultiLineString(edge_points)
File “/usr/lib/python2.7/dist-packages/shapely/geometry/multilinestring.py”, line 47, in __init__
self._geom, self._ndim = geos_multilinestring_from_py(lines)
File “/usr/lib/python2.7/dist-packages/shapely/geometry/multilinestring.py”, line 106, in geos_multilinestring_from_py
exemplar = obs[0]
Many thanks
I seem to have got around my initial problem by using your code from the ipython notebook, it must have been caused by the error mention before. However, I dont seem to have plot_polygon defined anywhere. Where may I find this, or how else can I plot the polygon?
Joe: search this page for “def plot_polygon” and you’ll find the function there.
Seems relevant:
https://github.com/AndreyGeonya/hull
thanks for the nice tutorial! This code solves exactly the same problem I’m dealing with.
Since I could not make Fiona work on my pc, I was not able to reproduce the output. If I could make it work, I would have modified the code according to my problem which takes a .stl file as input not a .shp file. (I know that .stl contains 3D info, but ignoring one of the coordinates, one can obtain a list of 2D points) So, here are my questions:
1 – Do you have any recommendations about modifying the beginning of above code such that it takes a .stl file as input and performs the same computation?
2 – Could you please explain the contents of points obtained by this command “points = [geometry.shape(point[‘geometry’]) for point in shapefile]” . I also need to understand what is happening with “coords = np.array([point.coords[0] for point in points])”. If I can figure out what these two do, I think I’ll be able to make the modification by myself.
Eric, I’m not familiar with stl files but some Googling on the subject suggests that you might have to use the Python API of a tool like Blender or FreeCAD to access your data. Once you have access to the points, you can replace that code in the beginning with whatever the appropriate access calls are in those APIs.
The contents of `points` is simply the list of all Point geometries contained within the input shapefile. Each Point will contain coordinates and potentially other metadata about the point.
The `coords` array is a numpy array that contains just the coordinate pairs from those point geometries. Have a look at http://toblerity.org/shapely/manual.html#points for more info on points and http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html for more on numpy arrays.
Hello. Thank you for the nice contribution.
Is it possible to use the pyshp implemented reader instead of fiona?
Regards,
Frank
So I now tried but it does not work. If I use
shapefile=shapefile.Reader(input_shapefile)
points = [geometry.shape(point[‘geometry’])
for point in shapefile
it does not work. I get the response:
TypeError:iteration over non-sequence
Does anybody has an idea how to fix this?
Thanks,
Frank