I finally got around to riding the southernmost part of the D&L about two weeks ago, riding from Yardley to Bristol and back, and ground-truthing the trail and access points. I can scratch that off my bucket list, and I don’t see any reason to ride south of Yardley again — this trail section, especially the Morrisville-Levittown portion, is nowhere near as nice as other areas — but I got what I needed to finish my trail amenities map. I may do a little exploring on the Black Diamond north of White Haven just for the sake of completeness, but I think I now have everything I was looking for.
Tag Archives: GIS
Anything to do with Geographical Information Systems or mapping.
I’m not sure if this is going to rise to the level of “new GIS project,” but I have been playing around a lot lately with the local transportation authority’s GTFS feed — where GTFS stands for “General Transit Feed Specification,” a standard for publishing public transit information on the Internet.
These feeds are like a cross between spreadsheets and database tables, and by a judicious massaging of the data you can extract bus stop and route information. Unfortunately, that massaging is a real necessity: the specification is built to convey a lot of information, and to cover a lot of different transit situations, so there’s no simple route-and-stop information — it’s buried in cross-references and spread across multiple tables. All this extraction and data crunching is fairly straightforward though, and there are even tools to automate the process (I use a QGIS plugin).
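As a rough illustration of that cross-referencing, here is a minimal Python sketch that walks from routes to trips to stop times to stops. The file and column names (routes.txt, trips.txt, stop_times.txt, stops.txt and their ID fields) come from the GTFS specification; the sample rows and the helper names are made up for the example, and this is not the QGIS plugin's method.

```python
import csv
import io

# Tiny made-up samples standing in for the feed's text files.
ROUTES = "route_id,route_short_name\nR1,101\n"
TRIPS = "route_id,service_id,trip_id\nR1,WK,T1\nR1,WK,T2\n"
STOP_TIMES = (
    "trip_id,stop_id,stop_sequence\n"
    "T1,S1,1\nT1,S2,2\n"
    "T2,S1,1\nT2,S2,2\nT2,S3,3\n"
)
STOPS = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Main St & 1st,40.6000,-75.4000\n"
    "S2,Main St & 2nd,40.6010,-75.4010\n"
    "S3,Main St & 3rd,40.6020,-75.4020\n"
)


def read_table(text):
    """Parse one GTFS text file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def stops_by_route(routes, trips, stop_times, stops):
    """Return {route_short_name: sorted list of stop names served}."""
    trip_route = {t["trip_id"]: t["route_id"] for t in trips}
    stop_names = {s["stop_id"]: s["stop_name"] for s in stops}
    route_names = {r["route_id"]: r["route_short_name"] for r in routes}
    served = {}
    # stop_times is the only table linking trips to stops,
    # so every route/stop pairing has to pass through it.
    for st in stop_times:
        route_id = trip_route[st["trip_id"]]
        name = route_names[route_id]
        served.setdefault(name, set()).add(stop_names[st["stop_id"]])
    return {name: sorted(names) for name, names in served.items()}
```

Calling stops_by_route on the four parsed tables gives one entry per route, with each stop listed once no matter how many trips visit it.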
Or the process would be straightforward, if we were not dealing with LANTA. These feeds are updated periodically, and about a year ago the new LANTA feeds sort of devolved into chaos, with extra routes showing up that had no real-world connection, odd use of abbreviations for bus stop names (abbreviations are sort of frowned upon, for what ought to be obvious reasons), and their cross-referencing system becoming unnecessarily complex. It was hard to figure out what was going on. I thought at first that it was my analysis software mangling the data, but no, it was them.
Well, they’ve been working through a huge revamp of their entire bus route network, so maybe that was the source of some of the bogus data. The new routes and schedules went into effect on June 21, and an updated feed followed soon after; I downloaded the new one and crunched the data, and the garbage was all still there! But I noticed that in among the old chaos was a new and much cleaner set of data, valid starting on the 21st, showing the new bus routes and the correctly named bus stops. So now I do a double extraction, first massaging the feed into a useful form, then extracting from that the new, valid and cleaned-up route data. Voilà!
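One plausible way to do that second extraction is to filter on the service calendar. The sketch below assumes the clean data can be told apart by the start_date column of the GTFS calendar.txt file, which may not be how this particular feed separates old from new; the service IDs, the route IDs and the year in the cutover date are invented for the example.

```python
import csv
import io

# Made-up calendar and trip rows; the year is arbitrary.
CALENDAR = (
    "service_id,start_date,end_date\n"
    "OLD_WK,20190101,20991231\n"
    "NEW_WK,20200621,20991231\n"
)
TRIPS = (
    "route_id,service_id,trip_id\n"
    "X9,OLD_WK,T1\n"
    "R1,NEW_WK,T2\n"
)


def read_table(text):
    """Parse one GTFS text file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def current_service_ids(calendar_rows, cutover):
    """Service IDs whose validity starts on or after the cutover date.

    GTFS dates are YYYYMMDD strings, so plain string comparison
    orders them correctly.
    """
    return {r["service_id"] for r in calendar_rows if r["start_date"] >= cutover}


def trips_for_services(trip_rows, service_ids):
    """Keep only the trips that run under one of the given services."""
    return [t for t in trip_rows if t["service_id"] in service_ids]
```

The surviving trips can then be fed into whatever route-and-stop extraction is already in place, so the stale routes never reach the map.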
I have some vague plan to add these bus routes to OpenStreetMap, but that’s a big undertaking, and I would prefer to rely on eyewitness ground-truthing (i.e., riding the bus) rather than on a data set, which means even more work. For now I’m content with just having got the damn data.
I’m not sure why I did it, but I installed PHP and Apache on my new computer, then moved a bunch of my “internal website” stuff over from storage. Everything seemed to work pretty well, so I tried the commuter routing program — I got errors, natch.
I looked at the error messages and realized that the pgRouting routines had changed, so my database routing functions were out of date. That led me to discover that even the newer version of PgAdmin3 doesn’t work well with my newer PostgreSQL version, especially when it comes to functions. So, I installed phpPgAdmin, which was also borked, and in the same way, but I was able to fix the source code. Even working properly it couldn’t do what I needed though, which was to modify my old function. I tried writing a new function through phpPgAdmin, which was extremely laborious, and basically re-wrote the original, broken function, so now I had two useless functions that I couldn’t modify. Ugggh, time for bed.
I woke up this morning and got it done old-school, writing a SQL script to define the function and running that from the command line. Presto, now I have a working function, and a working commuter routing program. Bonus: the new version of pgRouting is much faster (though that could be the new computer), and some routing errors are now fixed. Wish you could see it!
I’ve done a few more Road Scholar gigs this year, and my co-guide and I both feel that the ride choices could be improved, mainly by doing more bike paths and rail-trails, and doing less actual road riding. This would avoid the biggest issues we face (traffic and hills), and maybe allow the rides to be a bit longer and more enjoyable.
Meantime, I’d noticed a tendency, among our van drivers, to use Google Maps to navigate to our pick-up, drop-off and other van access points. This is, I think, a good thing, but it’s led to map searches finding the wrong drop-off point: nearby features rather than the specific location we use. It works well enough that “OK, turn left here and pull into that parking lot” will get us there once we’re close enough, but navigating to an actual position (a given latitude and longitude, for instance) would work much better.
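As a small sketch of the latitude/longitude idea: Google's documented Maps URLs scheme accepts a destination given as coordinates, so a link can be generated for each access point and handed to the drivers. The function name is illustrative and any coordinates passed in are whatever the ride database holds; nothing here is tied to a real drop-off point.

```python
from urllib.parse import urlencode


def maps_directions_url(lat, lon):
    """Build a Google Maps directions link to an exact position."""
    # Six decimal places is roughly 10 cm of precision, plenty for a parking lot.
    destination = f"{lat:.6f},{lon:.6f}"
    query = urlencode({"api": 1, "destination": destination})
    return "https://www.google.com/maps/dir/?" + query
```

Opening such a link on a phone should start navigation to the point itself rather than to the nearest named feature.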
Finally, I thought it would be good to have an official repository somewhere, of the rides: their official routes (I use GPS to navigate on the rides) as well as waypoints, like lunch spots, points of interest along the ride, and those pick-up and drop-off points. Ideally, I would be able to load a ride into my GPS and have all info for the ride at my fingertips.
These all coalesced in my mind into the Great Big Ride Database GIS Project. The project would be made of three parts: storage of rides (official or otherwise) and waypoints into a ride database, transfer of rides/waypoints to and from my GPS, and analysis of the ride data.
First Steps, and Revolting Developments
I started by keeping “official versions” of our rides on RideWithGPS, and I would download them as GPX files onto my Garmin when I needed them. This would only take care of the route itself, however; I thought that there was also a need to maintain a list of waypoints associated with each route, so I decided to build some kind of database to hold routes and their waypoints.
Since I would like to be able to just hand over the ride information in some file format, my first attempt was to build the database as a GeoPackage file. This actually worked pretty well, when my plan was just to stuff the data into storage. But then, my plans started to morph: I needed to actually analyze the data (with a spatial query) to generate info I needed. The GeoPackage file should have been able to handle this, but I think I must have done something wrong back when I installed the underlying GeoPackage/SpatiaLite libraries, or I was doing something wrong now, but I just couldn’t get any spatial functions to work. After frustrating myself for a while I just moved the database over to PostGIS. My project was changing, but at least it worked.
So at this point, I started looking at the problem of getting the point data to places where I could use it — like onto and off of my Garmin. I collected a bunch of the waypoints as “saved locations” on my GPS, but then I couldn’t find any good way to export or upload them. (The Google tells me that Garmin apparently has some Windows programs that can manage waypoints, but that does me no good.)
I eventually dropped back and punted by writing a Python script. I scrounged around inside my Garmin and found a file called Locations.fit that seemed to be where the saved locations were stored, and used the fitparse library to rummage inside the FIT file, eventually figuring out the (undocumented) structure used to store waypoints. I could now export the waypoints into a QGIS layer. Then I realized that I could import waypoints to my GPS via a GPX file, in the same way I could import rides, and could even combine waypoints with the ride trackpoints in the same file for importing. Major breakthrough! (Though the Garmin seemingly ignores all waypoint information, such as symbology and comments, except the name.)
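The combined file is easy to picture. Below is a minimal sketch that writes waypoints and a track into one GPX 1.1 document using only the standard library; the element names (wpt, trk, trkseg, trkpt) and the namespace are from the GPX schema, while the function name, the creator string and any sample values are placeholders. Reading the FIT file is left out, since its waypoint layout is undocumented.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"


def build_gpx(waypoints, trackpoints, track_name="ride"):
    """Return GPX text holding both waypoints and one track.

    waypoints:   iterable of (name, lat, lon)
    trackpoints: iterable of (lat, lon), in ride order
    """
    gpx = ET.Element(
        "gpx",
        {"version": "1.1", "creator": "ride-db-sketch", "xmlns": GPX_NS},
    )
    # The GPX schema expects waypoints before tracks.
    for name, lat, lon in waypoints:
        wpt = ET.SubElement(gpx, "wpt", {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"})
        # Only the name is written, since the other fields seem to be ignored.
        ET.SubElement(wpt, "name").text = name
    trk = ET.SubElement(gpx, "trk")
    ET.SubElement(trk, "name").text = track_name
    seg = ET.SubElement(trk, "trkseg")
    for lat, lon in trackpoints:
        ET.SubElement(seg, "trkpt", {"lat": f"{lat:.6f}", "lon": f"{lon:.6f}"})
    return ET.tostring(gpx, encoding="unicode")
```

The returned text can be saved with a .gpx extension and copied to the device the same way a ride file would be.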
So things are now a bit different than how I first planned it, but I have a system that works. Next up: evaluating potential routes.
A friend sent me a video how-to to build a 3d map the other day, and while I thought it was really cool I didn’t want to use the software in the video. I have some pretty good stuff already, I thought, and tried to find a way to do it with either GRASS or QGIS. GRASS was a bit of a bust: I really hate the interface they use for 3d, and couldn’t find much on how to drape one layer over another — it used to be easy!
QGIS wasn’t much better, but then I am a few versions behind. There is a plugin, however, which enabled me to make a 3D map website. So here’s mine:
I used the USGS topographic map from 1894, and “draped” it over the DEM I made for the Lehigh Valley cycle routing project (which DEM unfortunately has height in feet rather than meters, so the hill heights scale a bit big). The view in the picture is of Bethlehem and environs, with South Mountain and Lehigh Mountain on the left, and the Camel Hump, back when it was still Quaker Hill, in the upper right. Click the image and it’ll take you to the map website.
I noticed, when playing with that topo map, that for things like roads it doesn’t align everywhere with current maps. The map was provided with a CRS by USGS, but I suspect it was guesswork: there is no projection or datum information on the map itself. (The corners do line up exactly.) This may be because of surveying inaccuracies, back then or even for modern maps (I’m mostly using OpenStreetMap, after all), or it could be that the roads themselves were moved or straightened over the years, or they guessed wrong with the CRS. I thought it interesting then, that on the 3D map the hills and contour lines line up as well as they do: the surveyors knew where the hills were, at the very least.
I started looking into my new project the other day. The first steps will have to be extracting information from GPX or FIT files, and adding the information to a PostGIS database. I managed to do this in several ways, mostly through a combination of GPSBabel and ogr2ogr, though no single way has done exactly what I want yet: ogr2ogr automatically adds GPX data to the tables in a manner similar to what I want, but extension data (heart rate, temperature) is not treated the way I want, while the FIT data needs to be extracted first into a format readable by ogr2ogr, and then put in the right table form after being put in the database, all of which turned out to be surprisingly easy. (Even so, I may just choose to go with adding the data from GPX for now.)
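For the GPX route, pulling the extension data out by hand is not much code. This sketch assumes the files use Garmin's TrackPointExtension v1 namespace for heart rate (hr) and temperature (atemp), which is common for Garmin exports but should be checked against a real file; the sample document and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "gpx": "http://www.topografix.com/GPX/1/1",
    # Assumed extension namespace; check the header of a real file.
    "tpx": "http://www.garmin.com/xmlschemas/TrackPointExtension/v1",
}

SAMPLE = """<gpx version="1.1" creator="sample"
  xmlns="http://www.topografix.com/GPX/1/1"
  xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
 <trk><trkseg>
  <trkpt lat="40.6000" lon="-75.4000">
   <ele>100.0</ele><time>2018-11-01T12:00:00Z</time>
   <extensions><gpxtpx:TrackPointExtension>
    <gpxtpx:hr>120</gpxtpx:hr><gpxtpx:atemp>18.0</gpxtpx:atemp>
   </gpxtpx:TrackPointExtension></extensions>
  </trkpt>
  <trkpt lat="40.6010" lon="-75.4010">
   <ele>102.5</ele><time>2018-11-01T12:00:10Z</time>
  </trkpt>
 </trkseg></trk>
</gpx>"""


def read_trackpoints(gpx_text):
    """Return one dict per track point, with None for missing extensions."""
    root = ET.fromstring(gpx_text)
    rows = []
    for pt in root.iterfind(".//gpx:trkpt", NS):
        ele = pt.findtext("gpx:ele", namespaces=NS)
        hr = pt.findtext(".//tpx:hr", namespaces=NS)
        temp = pt.findtext(".//tpx:atemp", namespaces=NS)
        rows.append({
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            "ele": float(ele) if ele is not None else None,
            "time": pt.findtext("gpx:time", namespaces=NS),
            "hr": int(hr) if hr is not None else None,
            "atemp": float(temp) if temp is not None else None,
        })
    return rows
```

Each dict maps naturally onto one row of a track point table, with the extension values as ordinary nullable columns.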
The biggest problem I’ve run into so far is that GPSBabel does not extract all the data from the FIT file, and FIT is a proprietary, binary file format — I can’t get lap information, for example, just by scanning the file with awk or something. I may have to download and use the (again, proprietary) FIT SDK, in a C or other program I write myself. This may fit in well with what else I have to do, since I can call the parts of ogr2ogr I specifically need, directly from C.
Before it gets to that point though, I have to decide what I especially want to do with this data, which will tell me what I need to extract, what I need to save, and what I can disregard, or discard after processing. Do I want to build a full-blown replacement for Garmin Connect, where I keep all relevant data? Or do I want to just build something, like a web badge, to show a minimum of data about the ride, data like distance, duration and a map of the ride, with maybe a link to the ride’s Garmin activity page? I am leaning towards the minimalist approach (which would entail just saving one record per activity, with fields containing aggregate data), but I think I want at least some of the individual track point data because I may want to graph things like elevation or heart rate.
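To make the minimalist option concrete, here is a sketch of reducing a list of track points to one summary record. It uses the haversine formula on a spherical Earth, which is an approximation, and the field names in the returned record are invented for illustration rather than taken from any real table.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize(points):
    """Collapse track points into one aggregate record.

    points: list of (iso_time, lat, lon) in ride order, with times in a
    form datetime.fromisoformat accepts, e.g. '2018-11-01T12:00:00+00:00'.
    """
    distance = sum(
        haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(points, points[1:])
    )
    start = datetime.fromisoformat(points[0][0])
    end = datetime.fromisoformat(points[-1][0])
    return {
        "distance_m": distance,
        "duration_s": (end - start).total_seconds(),
        "n_points": len(points),
    }
```

A record like this is all a badge needs for distance and duration; graphs of elevation or heart rate would still need the individual points, or something derived from them.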
But maybe I don’t need to keep trackpoint data to build my graphs on the fly. Maybe I can make small graphs as PNG’s or GIF’s for the badge, and store those images in the database — hopefully they would be smaller than the trackpoints themselves. Alternately, I could store the entire FIT file (which is actually pretty small) in the database, and extract whatever I need on the fly. (I would still do a one-time analysis to get and store my aggregate data, since this might be a little too slow for on-the-fly data generation.) These choices will depend on the results of all the little coding/database/GIS experiments I’m doing now, extracting, converting and aggregating sample data.
Ten Years Gone: This is what I wrote on this date in 2008. We voted today, and I remain hopeful, but it is certainly not as happy a day as that one was, and even with good news I don’t think we’ll match that day.
My second GIS routing project is now finished; I just added the final touches to the front end a few minutes ago. It can be improved in several ways — the routing engine could be quite a bit faster, for one thing — and the data it runs on, from OpenStreetMap and other sources, should be updated periodically, but This Project version 1.0 is basically done. (I suppose I should add a write-up here before I put the thing to rest, but you know what I mean: the program/website itself is complete and fully functional.)
That means I need a new map project. The routing experiment was meant to have three projects, or rather one project done three ways: one each using QGIS, pgRouting, and GRASS, before I decided to branch out into separate projects. I’ve now got the first two completed, but I have no idea what to do for the GRASS project — I guess it will just have to wait until inspiration strikes. In the meantime, I may go back to the first project, or at least glean some of the results from it, to help build a web page for the Lehigh Towpath, something I can add to my old bike page. This may also morph into some trail promotion project in real life.
Yesterday was pretty nice, if cool, and Trick or Treat was really fun. Today is chilly, rainy, and windy, and I spent the day inside with no regrets. We’re going to see a concert, featuring Anne’s violin teacher, tonight in Palmerton.
Well, we’re back from the half marathon in Hershey, and now back also from our nap…
The race started at 7:30 AM, so we had to be there at say 6:30, so had to leave the house at 5:00, meaning we all had to get up by 4:30. We were all in bed by 9:00 last night, but it was still a hard morning. We got there about 6:45 — crazy parking traffic — and that was almost like “just in the nick of time” considering the bathroom lines, but Bruce & Heather lined up with no problems and the race went off without a hitch, then just as the race started we met up with Lorraine.
We walked around to several different vantages together, managed to see all our runners (Heather & Bruce, and Adelle & Liz who did it as a relay), and I even got a few photos. The whole thing was over by about 10:00. After navigating back through the parking traffic mess, we all met up for brunch at a place in Hershey. Good to get some food and to catch up with everyone, but it had been a cold, windy day and the place was chilly inside; we were glad to get back in the car and crank the heat. We were home by 2:00.
New Tools Bring New Opportunities
One area on my routing map has been a bit problematic: Rt 329 out of Northampton goes past a reservoir, or old quarry or something, and the DEM elevation data dips pretty hard right next to the road, as well as under it at a bridge. Since I find total ascent and descent for each road using interpolated DEM data at points along them, the roads that go over, or even just near, big elevation changes can have large ascent/descent values even if they are relatively flat.
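A small sketch shows why one bad sample matters so much. Assuming total ascent and descent are simple sums of the positive and negative steps between consecutive elevation samples (the real calculation may smooth or adjust these), a single dip adds to both totals; the elevations below are made up.

```python
def ascent_descent(elevations):
    """Sum the uphill and downhill steps along a list of elevation samples."""
    ascent = 0.0
    descent = 0.0
    for prev, cur in zip(elevations, elevations[1:]):
        delta = cur - prev
        if delta > 0:
            ascent += delta
        else:
            descent -= delta  # delta is zero or negative here
    return ascent, descent


# A flat road where one sample falls into a neighboring quarry.
FLAT_WITH_DIP = [100.0, 100.0, 60.0, 100.0, 100.0]
```

With these samples the flat road comes out with 40 units of climbing and 40 of descending, enough to make a router avoid it.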
The bridges have had an easy enough fix for a while: I simply make the ascent and descent (and adjusted ascent/descent) zero for each bridge, and I do the same for very short roads connecting to the bridge, like abutments. In other words, I fudge the data… (I figure the bridges are all fairly flat anyway except some longer, river-crossing ones, and since those are pretty far apart their actual ascent/descent values won’t affect the routing calculations much.)
Fixing these roads near the quarry was a bit harder. I didn’t want to set ascent/descent values down to zero for the whole long and moderately hilly road, but now that I can update the ascent/descent data much more quickly — this was that “the task went from several hours to under a minute” process improvement from the other day — I was able to do my fudging on the elevation-at-road-points data: I made the elevations in the “dipped” spots the same as the points just outside, then re-updated my database with the new script. It worked great, the roads now route more realistically in that area, and it took about 5 minutes to do.
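The fudge itself can be sketched in a few lines. This version assumes the dipped samples are identified by hand as a start and end index, and fills them with the elevation of the point just before the dip; the function name and the sample numbers are hypothetical.

```python
def patch_dip(elevations, first, last):
    """Return a copy with samples first..last set to the value just before them.

    first and last are inclusive indexes of the dipped samples, and
    first must be at least 1 so there is a good point to copy from.
    """
    if first < 1 or last < first or last >= len(elevations):
        raise ValueError("dip indexes out of range")
    patched = list(elevations)
    fill = patched[first - 1]
    for i in range(first, last + 1):
        patched[i] = fill
    return patched
```

Because only the dipped samples change, the genuinely hilly parts of the road keep their real ascent and descent.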
So I’ve been messing with that Lehigh Valley bike commuter routing program again, and I have made some important strides:
- I found a way to update the recommended routes (easy, advanced, etc.) by maintaining separate tables of these routes as linestrings (which I can add and subtract, draw and redraw), then updating the relevant field in the main table using a spatial join. The update process is now automated simply by running SQL files, one for each type of route.
- I sat down with the workflow for updating the main map table, and managed to automate much of it — everything but the SAGA tasks, though I think I can automate them too, eventually. I also managed to streamline one of the more time-consuming tasks: Generating the ascent/descent tables used to take upwards of 12 hours when I first did it (using Python within QGIS), and my next iteration (using PostGIS) took about 20 minutes, but my latest method got it down to about 57 seconds. Fifty-seven seconds! All of these are now also stored as SQL files or functions, so they are available almost at the push of a button. (My goal is a shell script putting all of this together.)
These were both pretty big deals, since they were the only things keeping the project from being truly functional. Before this, keeping the database up-to-date was like pulling teeth. Unfortunately, I decided to add some front-end functionality, testing to see if selected points are within the Lehigh Valley, and that’s been a bit of a struggle, but if I can find a host for the project I think I can go live really soon.
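The within-the-Valley test boils down to a point-in-polygon check. Here is a sketch of the classic ray-casting version, written in Python for illustration even though a front end would more likely do this in the browser or in the database; the rectangle used as a boundary is a made-up stand-in, not the real outline of the Lehigh Valley.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices in order, without
    repeating the first vertex at the end.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point.
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical bounding box, for demonstration only.
ROUGH_BOX = [(-75.8, 40.4), (-75.0, 40.4), (-75.0, 40.9), (-75.8, 40.9)]
```

Points exactly on the boundary can land on either side with this method, which is usually acceptable for a "sorry, that's outside the map" message.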
I have a bunch more photos to put up about the final leg of our vacation (Ben’s graduation), but before I get to that I have a few other items, and a few other vacation photos, I want to post that really don’t go anywhere else.
Just a few photos of things around the cabin. Our place apparently was a camp once, having multiple primitive cabins, etc, and had been refurbished — and had the main house added — after years of downward fashionableness and possible abandonment; three cabins were still standing, one converted into a sort of detached den or game room, and the other two converted into separate sleeping quarters. Behind the cabins, as things were now arranged, was a small pond with a dam at one end. I’m not sure how important the pond had been in the past — it had the look of a kiddie fishing area — but now it was brown and scummy, and working its way back to being a meadow. (The lake was a lot better, but the muck at the bottom made for unpleasant swimming. Only Alex and I tried, and we only tried once.) There were other camp amenities, including a fire pit which we made use of on the chilly nights.
Shapes and Clusters
The clustering experiments were a success, but what I really want is to show the regions or neighborhoods where my cycling amenities are clustered. I’ve been trying several different ways to build a shape around a group of points:
- Convex Hull: this one is pretty nice, it’s the shape you’d get if a rubber band were stretched around the points. It’s also built into both QGIS and PostGIS. Unfortunately, if the point cluster has concavities the convex hull won’t show them — an L-shaped cluster would get a triangular region.
- Concave Hull: this one is also available in both QGIS and PostGIS, but I don’t trust it — I can’t find too much about how it really works, its very name doesn’t make all that much sense, and it requires parameters that are not as well documented as I’d like.
- Alpha Shape: the most promising of the bunch, defined pretty rigorously in “the literature,” and I like the looks of the shapes it makes. Unfortunately, it doesn’t exist in either QGIS or PostGIS; it is available as a package in R, so I’ve spent some time this week getting R to run correctly after much neglect, then installing the “alphahull” package and trying it out. I managed to import my data and create alpha shapes; now I have to find how to convert and export the shapes back into my database.
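The convex hull limitation is easy to see in code. This is a sketch of Andrew's monotone chain algorithm, a standard way to compute a convex hull and not necessarily what QGIS or PostGIS use internally, run on a made-up L-shaped set of points.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull via monotone chain, as counter-clockwise vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# An L-shaped cluster: one arm along the x axis, one along the y axis.
L_POINTS = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (0, 3)]
```

The hull of L_POINTS has only three vertices, so the empty space between the two arms is counted as part of the region, which is exactly the concavity problem described in the list.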
There is one other method I just thought of, and pretty simple compared to these approaches: I could just make a heat map from the clustered amenities, then use a “contour line” function on the heat map raster. If the others don’t give satisfaction I may try this.
Today was a brief respite from days of heavy, almost continuous rain — more is coming, starting tomorrow. I took the opportunity to attack the jungle that once was our back yard, managed to use up all the weed-whacker twine, and ran over a yellow jacket’s nest (no stings, but a fairly hasty retreat into the house for a while), and the yard looks much better if not quite 100% yet.
We’ve also had a Warm Showers guest: a young Brit named Arron who landed in New York and is cycling across the US. He’s early in his ride, not quite acclimated to cycling, and he’s getting a real baptism by fire, or at least by rain and hills and poor road choice, but he was a trooper. He stayed for two nights before heading for Coopersburg.