The Wheat From The Chaff

I’m not sure if this is going to rise to the level of “new GIS project,” but I have been playing around a lot lately with the local transportation authority’s GTFS feed — where GTFS stands for “General Transit Feed Specification,” a standard for publishing public transit information on the Internet.

These feeds are like a cross between spreadsheets and database tables, and by a judicious massaging of the data you can extract bus stop and route information. Unfortunately, that massaging is a real necessity: the specification is built to convey a lot of information, and to cover a lot of different transit situations, so there’s no simple route-and-stop information — it’s buried in cross-references and spread across multiple tables. All this extraction and data crunching is fairly straightforward though, and there are even tools to automate the process (I use a QGIS plugin).

Or the process would be straightforward, if we were not dealing with LANTA. These feeds are updated periodically, and about a year ago the new LANTA feeds sort of devolved into chaos, with extra routes showing up that had no real world connection, odd use of abbreviations for bus stop names (abbreviations are sort of frowned upon, for what ought to be obvious reasons), and their cross-referencing system becoming unnecessarily complex. It was hard to figure out what was going on — I thought at first that it was my analysis software mangling the data, but no it was them.

Well, they’ve been working through a huge revamp of their entire bus route network, so maybe that was the source of some of the bogus data. The new routes and schedules went into effect on June 21, and an updated feed followed soon after; I downloaded the new one and crunched the data — and the garbage was all still there! But, I noticed that in among the old chaos was a new and much cleaner set of data, valid starting on the 21st, showing the new bus routes and the correctly-named bus stops. So now I do a double extraction, first massaging the feed into a useful form, then extracting from that the new, valid and cleaned-up route data. Voilá!

I have some vague plan to add these bus routes to OpenStreetMap, but that’s a big undertaking, and I would prefer to rely on eyewitness ground-truthing (ie riding the bus) than a data set — which means even more work. For now I’m content with just having got the damn data.


Comments are closed.