Experiments in Routing, Part 1

This is the introductory post for a hopefully four-part series about using QGIS to find the shortest path between two points, not shortest as the crow flies, but following a given network of roads. This is called routing, it’s what’s Google Maps and other mapping software uses, and it relies on graph theory and network analysis to do its job. I’ll talk about the what and the why of this little experiment here; the how (for three different versions of how) will be the subject of subsequent posts.

UPDATE: Part 2 can be found here.

The reason I’m looking at all this goes back to my interest in cycling tourism, and my attempts to identify cycling-accessible amenities — convenience stores, restaurants, hotels, that sort of thing — along the Lehigh Towpath. My first attempt (you can find it here) basically looked at a region, within a mile (as the crow flies) of one section of the Lehigh River, and searching within that region for the amenities I was interested in. That was an interesting project in its own right, but, as I said in my earlier post, it didn’t really solve the right problem:  there are many places within a mile, or even a quarter mile of the river, that are not anywhere near accessible from the towpath: they may be on the wrong side of the river, say, or not near a towpath access point. To be considered accessible, the points of interest would need to be within a mile (or whatever arbitrary distance I end up choosing), by road, of an access point on the towpath.

map of roads near lehigh river
Data gathered, and almost ready for routing.

I didn’t really have a plan to make this happen yet, but with or without a specific plan, I figured my first order of business was to get the information I would use. That previous analysis used Google Maps, but I felt that their data was a bit encumbered (in terms of my rights to it), and it seemed that Google didn’t play as well as I’d like with QGIS anyway, so I decided to use the data available through Open Streetmap, for both the road network and the set of amenities. (I already had a collection of the towpath access point locations left over from a previous experiment.) I got those sets of data, and massaged them so that I only had the parts that fell within a mile of the bike paths in the Lehigh Valley. This gave me the data seen to the right, where the aqua lines are the road network, the red lines are bike trails (the towpath, plus the Palmer Bike Path), the red stars are trail access points, and the orange dots are the amenities (restaurants, fast food etc).

(One note about the road network: You probably can’t see it at this resolution, but I made a point of excluding roads that are not practical/legal/safe for cycling, like US-22, I-78 and a few others. There are also a number of places, like the New Street and Hill-to-Hill Bridges, where roads or the trail are connected via stairways to the bridges above; after our own struggles, a few years ago, with stairs and fully loaded touring bikes at the Ben Franklin Bridge, I decided to also exclude stairways from my network.)

So that gets us the data, what about the analysis? My first thoughts were to see if I could find all the points on the road network that were a mile away from an access point, then connect the dots to define a region, and then find all the amenities within that region. My second thoughts were that this approach would put me back in the same situation as my first attempt, since I could easily find roads that were not reachable within that region, such as bridges. (Bridges became my nemesis for a while.) I eventually decided that my best strategy would be to find the shortest route between each access point and each amenity, and select from the amenities based on the lengths of the routes I found.

To perform the actual routing analysis, I have three options:

In terms of a learning curve, I have some experience with networks in GRASS, and I feel at least a little comfortable with Python (and copy-paste, with scripts I find online), so pgRouting will probably be the most difficult for me to pick up. Meanwhile, the Network Analysis library can use the data I already have, but Open Streetmap deals with road networks in a way that’s not directly compatible with either GRASS or pgRouting — their topological models are different, but that’s an issue for a future post. I would have to either re-import the road network to get it to work with pgRouting, or further process the one I have for GRASS.

Each one of these approaches will be the subject of its own post. Given that the Python approach is not the hardest, and my data is already in the form I’d need, I am going to try my hand with the Network Analysis library first. Stay tuned for Part 2, whenever…