• Category Archives: tech talk
  • Computers and programs, maps and GPS, anything to do with data big or small, as well as my take on the pieces of equipment I use in other hobbies — think bike components, camping gear etc.

  • A Sojourn* Into Clustering

    Posted on by Don

    I was looking at the towpath amenities project in the week before we went on vacation, mainly to play with database reporting software, and I noticed that all my amenities were pretty closely grouped together. This stands to reason, since the data is a ready-made cluster — it’s composed of amenities within a kilometer of Sand Island, so the clustering may just be an artifact of that search criterion — but also because the data set encompasses the compact Main Street restaurant district. Continuing on with my reporting experiments, I looked at all amenities within a mile of Sand Island, and now found myself looking at two distinct groups of amenities: the one around Main Street, and another on the south side of the Lehigh. This also stands to reason — Chamber-of-Commerce types like to joke that we’re the city with two downtowns — but again I wondered if it was some artifact of the analysis, or even if I was seeing patterns that didn’t really exist, and that got me thinking about what I actually meant by “cluster.”
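
    For the record, “within a kilometer of Sand Island” is just a distance filter. In the database that would be something like ST_DWithin on geography columns, which measures in meters; here is a minimal Python sketch of the same idea using the haversine great-circle formula. The function names are my own, and treating the Earth as a sphere is an approximation, though plenty good for distances of a kilometer or so.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; treats the Earth as a sphere


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing a slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def within_radius(center, places, radius_m):
    """Names of the places (name -> (lat, lon)) within radius_m of center."""
    clat, clon = center
    return [name for name, (lat, lon) in places.items()
            if haversine_m(clat, clon, lat, lon) <= radius_m]
```

    Swapping the 1000 for 1609 gives the one-mile version of the query.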

    Turns out, it’s a fairly big subject, with different ways of describing what “cluster” might mean — usually (and intuitively), it’s a subset of similar items within a larger data set, but then what does “similar” mean, and how similar do the members of a cluster have to be, especially compared to the rest of the set? For each way of understanding what a cluster is, there are various ways of finding the clusters within a data set. The whole topic is apparently a big deal: an area of ongoing research, and an important tool in the fields of machine learning and big data.

    My problem was spatial, so for me “similar” meant “close together in terms of location.” Some Googling turned up plenty of GIS solutions to clustering problems, and in fact PostGIS contains several functions implementing the more common and important clustering algorithms, including DBSCAN (the ST_ClusterDBSCAN function), the algorithm that comes closest to what I think “clustering” should mean for my situation.
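
    DBSCAN’s idea of a cluster is “points packed densely enough”: a point with at least min_pts neighbors within a distance eps is a core point, everything reachable from a core point through other core points joins its cluster, and whatever is left over is noise. Here is a minimal pure-Python sketch of that, assuming the coordinates are already projected into a planar, metric system (not raw latitude/longitude) and using a brute-force neighbor search that is fine for a few hundred amenities but not for big data. It illustrates the algorithm; it is not how PostGIS implements ST_ClusterDBSCAN, and the -1 noise label is my own convention.

```python
import math


def dbscan(points, eps, min_pts):
    """Return a cluster label for each point: 0, 1, 2, ... or -1 for noise.

    points  -- list of (x, y) tuples in a planar, metric projection (assumed)
    eps     -- neighborhood radius, in the same units as the coordinates
    min_pts -- neighbors needed (counting the point itself) to be a core point
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # brute force: compare against every point, O(n) per query
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be claimed later as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster_id += 1
    return labels
```

    The two knobs are the whole story: eps says how close “close together” is, and min_pts says how many amenities it takes before a group counts as a cluster rather than a coincidence.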

    And here is where things became complicated…

    The clustering functions are not available in the version of PostGIS that I had installed. So I decided to upgrade PostGIS, did a bit of research, and found many articles with titles like “How to Brick Your Database By Updating PostGIS.” The process itself is not difficult (it uses old-school “make” rather than a package manager) and the pitfalls are easily avoided, but now I was scared, and I thought I’d better back up my whole database system before continuing. What this meant, though, was that first I had to make room on my hard drive, which has one (small, overcrowded) main partition and a (large, empty) secondary partition. First thing would be to back up the secondary partition to the NAS drive — something I’ve been remiss on ever since I installed Mint — then I’d move both my music (35 GB) and my photos (12 GB) over to the secondary partition, and then update the music and photo software so it knew where all the files went. It was starting to sound like that song about the hole in the bucket…
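
    Before upgrading, it helps to know which clustering functions a given install actually has. `SELECT PostGIS_Lib_Version();` returns the version string, and the PostGIS reference lists an “Availability” version for each function. The minimum versions in this sketch are my reading of those notes (2.2 for ST_ClusterIntersecting and ST_ClusterWithin, 2.3 for ST_ClusterDBSCAN and ST_ClusterKMeans), so treat them as assumptions and check the docs for your release. After building and installing a new PostGIS, the extension itself is normally upgraded inside each database with `ALTER EXTENSION postgis UPDATE;`.

```python
# Minimum PostGIS (major, minor) for each clustering function: my reading of
# the "Availability" notes in the PostGIS reference, so verify for your version.
CLUSTER_FUNCTIONS = {
    "ST_ClusterIntersecting": (2, 2),
    "ST_ClusterWithin": (2, 2),
    "ST_ClusterDBSCAN": (2, 3),
    "ST_ClusterKMeans": (2, 3),
}


def parse_version(text):
    """'2.2.1' -> (2, 2, 1); trailing junk such as 'dev' or 'rc1' is ignored."""
    parts = []
    for piece in text.strip().split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def available_cluster_functions(version_text):
    """Clustering functions expected to exist in the given PostGIS version."""
    version = parse_version(version_text)
    return sorted(name for name, minimum in CLUSTER_FUNCTIONS.items()
                  if version[:2] >= minimum)
```

    Feed it whatever PostGIS_Lib_Version() reports; an empty or short list is the cue to start worrying about bricking things.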

    I got through the first part, backing up the drive (which took hours), before we went on vacation. There was no Internet at our cabin, and I didn’t bring my computer anyway, so the rest had to await my return. The remainder of the hard drive cleanup (music and photos) also took some time but went smoothly enough, and I did a full backup of my databases.

    From here the process was a bit anticlimactic: I downloaded the new version, ran make, typed a few things into the database, and I was done without bricking a damn thing. I needed to lie on my fainting couch and rest for a day after that, but when I finally got around to using the new functions they were a breeze.

    I found some clusters and drew polygons around them — the subjects of another post — but I have more to do to figure out what these things are actually telling me.
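
    One simple way to draw a polygon around a cluster is its convex hull, the smallest convex polygon containing every point. In PostGIS that would be along the lines of ST_ConvexHull(ST_Collect(geom)) grouped by cluster id. The sketch below does the same in plain Python with Andrew’s monotone chain algorithm, assuming planar (x, y) points and a parallel list of cluster labels in which -1 marks noise (a hypothetical convention, e.g. from a DBSCAN run).

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each half is the first point of the other
    return lower[:-1] + upper[:-1]


def hulls_by_cluster(points, labels):
    """Map each cluster label (noise, -1, skipped) to its hull polygon."""
    groups = {}
    for p, label in zip(points, labels):
        if label != -1:
            groups.setdefault(label, []).append(p)
    return {label: convex_hull(pts) for label, pts in groups.items()}
```

    Convex hulls overstate the area of oddly shaped clusters (a cluster strung along a street gets a fat polygon), which is one reason the polygons alone don’t say much.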

    *Hat tip to Achewood, still my favorite Internet thing ever.


  • Cancel The Exorcist

    Posted on by Don

    I had a pedal-induced creak building in the 5010 over the past week or so. The last time I had anything like this it was one of the pivot bushings, so a few days ago I tightened them — no fix, and the creak was worse than ever yesterday. I did the pivots again today, and the creak remained.

    Next step was to look at the crank and bottom bracket. I’d never taken my crank off this bike, didn’t recognize the system (for the record: it’s a Race Face Aeffect crank with Cinch chainring tech), and tried for about an hour to remove it. This included at least 20 minutes looking through instructional videos, but this information seems to be some kind of secret…

    I finally found one that showed how, and here’s the secret: there is a dust cap on the drive side, removable with an 8mm Allen wrench, but you don’t remove it. Instead, use a 7mm Allen wrench inside the dust cap to unscrew an internal connection to the ISIS drive; this pushes against the dust cap and acts as a self-extractor. It came off smooth as silk — live and learn. (Note: take the dust cap off to re-install the crank; the internal screw is kind of finicky to get going.)

    I cleaned and lubed the crank parts, then looked at the bottom bracket and found my problem: the drive side had come loose, and that, coupled with the grit that subsequently got into the threads, was the likely cause of my creak. I pulled the BB, cleaned and greased the threads, put it all back together, and took it for a test ride. Perfect! No squeaks and no creaks, and that’s good because I don’t know what I would have needed to do next.


  • I Can’t Stop

    Posted on by Don

    Here’s one I made, using Carto:

    This looks a bit closer to usable, though I don’t see much real style control when making the map. Maybe a little studying over at Carto…

    Meanwhile, I tried one of my demo web pages that uses Leaflet, embedded in a test post, and it (the map) worked great. Unfortunately, the map itself was only one part of another website, and stuffing the entire thing into an iframe caused some display/clutter issues, so I decided to write another demo to see what I could do. This can sit and stew for a while; I think I got whatever it was out of my system — it’s time for a ride.


  • New Maps Marker Plugin Test

    Posted on by Don

    Here’s another map plugin, called “Maps Marker.” This one is also based on Leaflet, but seems to have many more features, at least at first. Here’s a simple map:

    [mapsmarker layer="2"]

    This may take some work to bring up to speed: you have to load your own marker icons etc., and many features (e.g., GPX tracks) are unavailable unless you get the Pro version — at €249 for the license, no thanks! Meantime, the actual markers didn’t even show up on the map (though they did show up in the post preview), let alone the pop-ups. Again, this may be a problem with my theme, though a map created explicitly for one specific marker does show up (sans pop-up or tooltip):

    [mapsmarker marker="3"]

    It looked for a second like we were getting close, but still no cigar.

    UPDATE: I added some experimental CSS to this post, which at least made the pop-up content show itself more reasonably.


  • New Leaflet Plugin Test

    Posted on by Don

    I’ve decided to make my own maps here for posting about rides, and maybe for some other mapping tasks I might want to do, so I’m experimenting with map plugins. The first one is something called “Leaflet Map”; let’s see how this looks:

    Hmmm, seems OK, but there aren’t all that many options straight out of the box — I especially would like to be able to center the map, for one thing, and the map design (on the back end) seems pretty text-based. The default tile server is standard OpenStreetMap, but there is an option for Mapbox (with a place for your API key), as well as other map tile providers — though I don’t see how to set the API key for anything other than Mapbox. This could be a problem for, say, Thunderforest tiles, which are my favorite map tiles and also require a key.
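
    For what it’s worth, a key like Thunderforest’s typically just rides along as a query parameter on the tile URL, so any plugin that accepts a custom tile URL template could in principle carry it. The sketch below shows the standard OpenStreetMap “slippy map” formula for turning a latitude/longitude and zoom level into tile x/y numbers, then fills in a URL template. The host, the `cycle` style name and the `apikey` parameter are assumptions about Thunderforest’s URL format; check their documentation before relying on them.

```python
import math


def deg2tile(lat_deg, lon_deg, zoom):
    """Standard OSM slippy-map formula: the (x, y) tile containing a point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y


# Assumed URL format (hypothetical until checked against the provider's docs).
TEMPLATE = "https://tile.thunderforest.com/{style}/{z}/{x}/{y}.png?apikey={key}"


def tile_url(lat, lon, zoom, style, api_key):
    """URL of the tile covering (lat, lon) at the given zoom level."""
    x, y = deg2tile(lat, lon, zoom)
    return TEMPLATE.format(style=style, z=zoom, x=x, y=y, key=api_key)
```

    Leaflet itself takes the same kind of `{z}/{x}/{y}` template and does this arithmetic for every tile it draws, which is why the key only has to appear once, in the template.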

    Meanwhile, try clicking on the marker: a pop-up appears, but the pop-up is apparently only one letter in width, and its background is transparent, and I don’t see any easy way to change these. (These may be due to issues with my WP theme, but I like my theme and I don’t want to change it.) Like I said: Hmmm…

    If I keep using this map plugin, I may have to build a custom container for it (div tags and custom CSS, with a place for titles, etc.). I have other plugins to try before it comes to that.

    UPDATE: I added some custom CSS for this one post, and it seemed to at least allow the pop-up content to show up. So there’s that…


  • Just An Experiment

    Posted on by Don

    Here is a test of an embedded Strava activity:

    There’s nothing special about this particular run; I just picked it as an example to see what it looked like in my browser.

    Here’s another example of an embedded map, this time from Google Maps:

    Again, there is nothing special about this map — in fact, I’d be wary of using it, as it’s probably years out of date — I just picked it out from a bunch I made once. The point is to notice that the embedded map showed up.

    One more embedded map, this time from Ride With GPS:

    They all just seem to work, right? Contrast these with this one from Garmin:

    If you’re using any browser other than Firefox, you may see the embedded activity (inside the box I added for clarity), but if you’re using (a more modern version of) Firefox you should just see a gray line and a blank space in the box: Firefox is blocking what it now considers an insecure script coming from Garmin. I talked to Garmin tech support, and they say it’s a Firefox problem — that is, their insecure script is really a Firefox problem — and they won’t be fixing it.

    This screws up about a half dozen pages here, a few more on my old blog, and maybe even some other websites where I’ve embedded Garmin rides over the years. I think I may be going back and redoing my ride pages in Ride With GPS. Ugh, work… Oh well, lesson (re)learned: avoid counting on Garmin, especially their website.

    UPDATE (12/3/2018): Well, whaddaya know — now it works!


  • Gautama’s Gizmo Time

    Posted on by Don

    Today I’m writing at Lit, the relatively new coffee shop in Southside, while hanging with Anne. I could say I’m now living the dream: laptop and wi-fi in a caffeinated third space, except that the dream actually is a bit noisier than I would have liked…

    Sometimes I think I’m drawn to tech more because I like the idea of tech — the toys and gizmos, smartphones and memory sticks — like the feeling of being in Staples or another stationery store. All that paper! The desks! The organizers! Actually getting the new stationery, or learning the tech, leaves a hollow, let-down feeling, like after all your Christmas presents have been opened, or like a sugar-buzz come-down.

    My latest tech acquisition — my latest trip around the tech wheel of samsara — is a WordPress security plug-in. I’d noticed a lot of traffic on the site, which is cool, but the traffic was basically to my login page — not so cool. The plug-in I installed blocks IPs that try to log in with the wrong user name or password, plus a few other things, and I spent a good part of last night watching it catch and block malicious users. I know I’ll eventually get bored and achieve that come-down, but for now it’s hypnotic…
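
    I don’t know exactly how the plug-in decides to block an address, but the basic idea is easy to sketch: count failed logins per IP over a sliding time window, and block the IP for a while once it crosses a threshold. Everything here (the class name, the limits, the in-memory storage) is hypothetical and for illustration only; a real plug-in would persist this data and deal with things like proxies and shared addresses.

```python
import time
from collections import defaultdict, deque


class LoginGuard:
    """Block an IP after too many failed logins within a sliding window.

    Hypothetical sketch: the default limits are made up, not the settings
    of any particular WordPress plug-in.
    """

    def __init__(self, max_failures=5, window_s=300, block_s=3600, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.block_s = block_s
        self.clock = clock
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.blocked_until = {}             # ip -> time the block expires

    def is_blocked(self, ip):
        until = self.blocked_until.get(ip)
        if until is None:
            return False
        if self.clock() >= until:
            del self.blocked_until[ip]  # block has expired
            return False
        return True

    def record_failure(self, ip):
        now = self.clock()
        recent = self.failures[ip]
        recent.append(now)
        while recent and now - recent[0] > self.window_s:
            recent.popleft()  # forget failures that fell out of the window
        if len(recent) >= self.max_failures:
            self.blocked_until[ip] = now + self.block_s
            recent.clear()
```

    The `clock` parameter exists so the logic can be tested with a fake clock instead of waiting an hour for a block to expire.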


  • Springtime! Happy Easter!

    Posted on by Don

    I’m sitting at the Bethlehem Library right now — I was planning to maybe ride today but last night’s wet spring snow put the kibosh on that. I felt a bit cooped up and wanted to get out of the house for a bit, so I thought I’d check the new coffee shop (the Church Street Market) across the way but it’s closed on Mondays. I’ll be meeting Anne and maybe Deb at Wise Bean in a bit, but I had a hankering to do the laptop-and-café thing… Oh well, the library works.

    We had a fairly hectic weekend: Friday night was the Adult Easter Egg Hunt at Anne’s niece’s house, Saturday was a Seder at Toby & Erika’s, and yesterday we did an Easter brunch. I got in a ride on Saturday morning, and I’m glad I did since conditions were really good, and it looks like they won’t be good again for a little while. Spring’s coming, it’s just taking its time.

    Fun with Computer Maintenance: I got fed up with Adblock and installed uBlock Origin instead. Better, faster, less intrusive.

    PostGIS Fun: I merged all the bus routes into one GeoJSON file, then loaded them into the database. I then took that table and broke it into two others: one containing the bus stops, with their names, reference numbers, OSM attributes, and geometries, and a bridging table (sans geometry) containing fields for the bus stop reference, route and stop order for all routes. Some new ideas: “select distinct on” and “window functions.” Works like a charm!
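
    For anyone who hasn’t met them: in PostgreSQL, `SELECT DISTINCT ON (ref) ... ORDER BY ref, route` keeps just the first row for each stop reference, which is handy for pulling one row per bus stop out of the merged routes, and a window function such as `row_number() OVER (PARTITION BY route ORDER BY stop_order)` numbers rows within each route without collapsing them. Here’s a rough Python emulation of both, splitting merged features into a stops table and a bridging table. The dictionary keys (ref, name, route, stop_order, lat, lon) are assumed field names for illustration.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order):
    """Like SELECT DISTINCT ON (key) ... ORDER BY key, order: after sorting,
    keep only the first row for each value of key."""
    rows = sorted(rows, key=itemgetter(key, order))
    return [next(group) for _, group in groupby(rows, key=itemgetter(key))]


def row_number(rows, partition, order):
    """Like row_number() OVER (PARTITION BY partition ORDER BY order):
    number rows 1, 2, 3, ... within each partition, keeping every row."""
    numbered = []
    rows = sorted(rows, key=itemgetter(partition, order))
    for _, group in groupby(rows, key=itemgetter(partition)):
        for n, row in enumerate(group, start=1):
            numbered.append({**row, "row_number": n})
    return numbered


def split_routes(features):
    """Split merged route features into a stops table (one row per stop ref)
    and a bridging table of (ref, route, stop_order) with no geometry."""
    stops = [{k: f[k] for k in ("ref", "name", "lat", "lon")}
             for f in distinct_on(features, "ref", "route")]
    bridge = [{k: f[k] for k in ("ref", "route", "stop_order")} for f in features]
    return stops, bridge
```

    The difference between the two is the whole point: DISTINCT ON throws rows away, while a window function keeps every row and just adds a computed column.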

    Off to the Wise Bean…


  • Data Cleanup Automation Fun

    I’m still playing with LANTA’s bus route data, girding myself for — eventually — adding the bus routes into OpenStreetMap, but I recently decided to rebuild my data, basically starting over from scratch with the PDFs I got from the LANTA website. My current workflow is:

    PDF  —> CSV —> cleaned-up CSV —> GeoJSON —> cleaned-up GeoJSON —> (eventually) a PostGIS database

    The conversion from PDF to CSV was automatic, using a Java program I found online, and the CSV cleanup (fixing things like transposed latitude/longitude, missing minus signs, etc.) was done manually, using a text editor and LibreOffice Calc. That part was relatively uneventful but laborious: there are 68 individual bus routes, each with its own file.
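
    Some of that cleanup could be automated. Since every stop should fall inside a known area, a script can try the usual suspects (a dropped minus sign on the longitude, swapped latitude and longitude, or both) and keep whichever version lands inside the box. This is a minimal sketch; the bounding box is a rough, hypothetical box around the Lehigh Valley, not LANTA’s actual service area, and anything that still doesn’t fit comes back as None for a human to look at.

```python
# Hypothetical bounding box, roughly around the Lehigh Valley (an assumption
# for illustration, not LANTA's actual service area).
LAT_RANGE = (40.0, 41.5)
LON_RANGE = (-76.5, -74.5)


def in_box(lat, lon):
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def fix_coords(lat, lon):
    """Return a plausible (lat, lon), trying the common data-entry errors in
    order, or None if no simple fix lands inside the box."""
    candidates = [
        (lat, lon),           # already correct
        (lat, -abs(lon)),     # longitude lost its minus sign
        (lon, lat),           # latitude and longitude transposed
        (lon, -abs(lat)),     # transposed and missing the minus sign
    ]
    for candidate in candidates:
        if in_box(*candidate):
            return candidate
    return None
```

    This only works because the two coordinates live in such different ranges around here (about 40 versus about -75); somewhere near the equator and the prime meridian the trick would fall apart.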

    Things got even more laborious with the next two tasks. My first go-around with the conversion to GeoJSON was done manually within QGIS: loading each of the CSV files and filling out the required parameters one by one. I wasn’t looking forward to converting the files individually, so I wrote a Python script to save all my route layers in GeoJSON format. (Just as an aside, I have to say I really like GeoJSON as a vector file format: I find it much easier to work with than the dated, unwieldy Shapefile standard; it’s also easier to open and work with in the JOSM editor. All my new data, if it’s not going into the database, is getting stored as GeoJSON files.)
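
    GeoJSON is simple enough that a CSV of points can be converted without QGIS at all. This is not the script I used (that one saved QGIS layers); it is a standard-library-only Python sketch that assumes columns named Latitude and Longitude and turns every other column into a string property. Note the coordinate order: GeoJSON (RFC 7946) puts longitude first, which is one more place for a latitude/longitude transposition to sneak in.

```python
import csv
import io
import json
from pathlib import Path


def csv_to_geojson(text, lat_field="Latitude", lon_field="Longitude"):
    """Convert CSV text with latitude/longitude columns to a GeoJSON
    FeatureCollection (as a Python dict). The column names are assumptions;
    every other column becomes a string property."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        lat = float(row.pop(lat_field))
        lon = float(row.pop(lon_field))
        features.append({
            "type": "Feature",
            # RFC 7946 positions are [longitude, latitude], in that order
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


def convert_folder(folder="."):
    """Write a .geojson file next to every .csv file in folder."""
    for csv_path in sorted(Path(folder).glob("*.csv")):
        collection = csv_to_geojson(csv_path.read_text(encoding="utf-8"))
        out_path = csv_path.with_suffix(".geojson")
        out_path.write_text(json.dumps(collection, indent=2), encoding="utf-8")
```

    Because csv.DictReader hands back strings, stop numbers like 0001 keep their leading zeros, which is what you want for a ref tag.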

    The “GeoJSON cleanup” is where I massage the data into the forms I want: some of the table columns are unnecessary, some need to be renamed, there are a few extra columns to add (and populate), and finally I wanted to convert the LANTA bus stop names (in Robbie-the-Robot-style ALL CAPS) to something a little easier on the eyes. Doing this manually would have been beyond laborious, so I wrote another Python script to massage the route files. This turned out to be more of a learning experience — as in, multiple versions of the program failed spectacularly until I got it right — and probably took longer than the brute-force manual approach would have, but at least it wasn’t laborious…

    I’m still not really sure why this version worked when others did not, but here is my code:

    # script to run through all visible layers,
    # adding/deleting/renaming fields as required
    # to convert from LANTA bus data to something
    # more like OpenStreetMap bus stop attributes
    # it also properly capitalizes feature names
    # (written for the QGIS 2.x Python console, which provides iface)

    import re
    from PyQt4.QtCore import QVariant
    from qgis.core import QgsField

    # some functions and regular expressions for string manipulation
    def match_lower( matchobj ):
        return matchobj.group().lower()

    def match_upper( matchobj ):
        return matchobj.group().upper()

    # LANTA fields to delete (matched at the start of the field name)
    reg = re.compile( 'Lati|Longi|Time|At Street|On Street|Direct|Placem' )
    # codes to put back in capitals; the group keeps "(ns)", "(fs)", "(mid)" and
    # "(off)" tied to their parentheses, so "fs" or "mid" inside a word is left alone
    reg_ns = re.compile( r'\((?:ns|fs|mid|off)\)|\bncc\b|\bhs\b', re.IGNORECASE )
    # numbers with ordinal suffixes: title() makes "3RD" into "3Rd", this makes it "3rd"
    reg_nth = re.compile( r'[1-9]?[0-9][a-z]{,2}', re.IGNORECASE )
    reg_leftParen = re.compile( r'([^\s])(\(\w*\))' )
    reg_rightParen = re.compile( r'(\))([^\s])' )
    reg_space = re.compile( r'\s+' )

    renames = { 'Public Information Name': 'name',
                'Stop Number': 'ref',
                'Stop Order': 'stop_order' }

    for layer in iface.mapCanvas().layers():
        myLayerName = layer.name()

        # change the names of some fields
        layer.startEditing()
        pr = layer.dataProvider()
        for count, field in enumerate( pr.fields() ):
            if field.name() in renames:
                print( 'renaming %s to %s' % ( field.name(), renames[field.name()] ) )
                layer.renameAttribute( count, renames[field.name()] )
        layer.updateFields()

        # commit the renames and reload the layer before the next step
        layer.commitChanges()
        layer.reload()

        # delete some fields
        layer.startEditing()
        pr = layer.dataProvider()
        delList = [ count for count, field in enumerate( pr.fields() )
                    if reg.match( field.name() ) ]
        print( 'deleting field indexes %s' % delList )
        pr.deleteAttributes( delList )
        layer.updateFields()

        layer.commitChanges()
        layer.reload()

        # add some fields, checking if they don't already exist
        layer.startEditing()
        pr = layer.dataProvider()
        existing = [ field.name() for field in pr.fields() ]
        for newName in [ 'highway', 'network', 'operator', 'public_transport', 'bus', 'route' ]:
            if newName not in existing:
                print( 'adding %s' % newName )
                pr.addAttributes( [ QgsField( newName, QVariant.String ) ] )
        layer.updateFields()

        # fill in the new fields and clean up the name of every feature
        for feature in layer.getFeatures():
            feature['highway'] = 'bus_stop'
            feature['network'] = 'LANTA'
            feature['operator'] = 'Lehigh and Northampton Transportation Authority'
            feature['public_transport'] = 'platform'
            feature['bus'] = 'yes'
            feature['route'] = myLayerName

            myString = feature['name'].title()
            myString = reg_ns.sub( match_upper, myString )
            myString = reg_nth.sub( match_lower, myString )
            myString = reg_leftParen.sub( r'\1 \2', myString )
            myString = reg_rightParen.sub( r'\1 \2', myString )
            # collapse runs of spaces last, after the parentheses are spaced out
            myString = reg_space.sub( ' ', myString ).strip()
            print( myString )
            feature['name'] = myString
            layer.updateFeature( feature )

        layer.commitChanges()
        layer.reload()

    Once I got this up and running, I realized that I wanted to do some more preliminary cleanup on my spreadsheets, so I was back to square one. I couldn’t find a way to bulk-load my CSV files into QGIS, and I realized that QGIS relies on GDAL/OGR (the same library behind the ogr2ogr command-line tool) for its format conversions, so I decided to do the bulk converting, CSV to GeoJSON, with a shell script that calls ogr2ogr. Yet another learning curve later, and it works great. More code:

    for i in *.csv
    do
        echo "$i"
        # skip the first line of the file, then let ogr2ogr read the rest from stdin
        tail -n +2 "$i" | ogr2ogr -nln "${i%.csv}" -f "GeoJSON" "${i%csv}geojson" \
            CSV:/vsistdin/ -oo "X_POSSIBLE_NAMES=Lon*" -oo "Y_POSSIBLE_NAMES=Lat*" -oo KEEP_GEOM_COLUMNS=NO
        ogrinfo "${i%csv}geojson"
    done

    It struck me then that all my data was really text, so working with it in a more Unix-y fashion (shell scripting and text-manipulation programs like sed and awk, doing the conversions directly from the CSV files) was probably the better strategy. Oh well, I did the first part of it with bash at least, and the Python script works well enough.

    UPDATE:

    I did it anyway, using awk to add/subtract/rename/Capitalize the CSV data before running it through ogr2ogr. The code (below) is about 35 lines (as opposed to about 150 for my original Python script, which only does part of the job anyway) and runs really fast:

    #!/bin/bash
    # needs GNU awk (gawk): \y is its word-boundary escape
    for i in *.csv
    do
        echo "Converting $i to GeoJSON"
        awk -F "," -v rtname="${i%.csv}" '
        # a line containing Stop (the LANTA header row) becomes an OSM-style header
        $2 != "" && /Stop/ {
            print "ref,Latitude,Longitude,name,stop_order,highway,public_transport,bus,network,operator,route"
        }
        # data rows start with a four-digit stop number
        $2 != "" && /^[0-9]{4}/ {
            # lower-case the stop name, then capitalize the first letter of each word
            n = split(tolower($8), a, " ")
            string = toupper(substr(a[1], 1, 1)) substr(a[1], 2)
            for (j = 2; j <= n; j++) {
                string = string " " toupper(substr(a[j], 1, 1)) substr(a[j], 2)
            }
            # put the abbreviation codes back into capitals, one at a time
            gsub(/\(ns\)/, "(NS)", string)
            gsub(/\(fs\)/, "(FS)", string)
            gsub(/\(mid\)/, "(MID)", string)
            gsub(/\y[Nn]cc\y/, "NCC", string)
            gsub(/\y[Hh]s\y/, "HS", string)
            # exactly one space before each opening parenthesis
            gsub(/ *\(/, " (", string)
            print $1 "," $2 "," $3 "," string "," $9 ",bus_stop,platform,yes,LANTA,Lehigh and Northampton Transportation Authority," rtname
        }' "$i" > test1.csv
        cat test1.csv
        ogr2ogr -nln "${i%.csv}" -f "GeoJSON" "${i%csv}geojson" test1.csv \
            -oo X_POSSIBLE_NAMES=Longitude -oo Y_POSSIBLE_NAMES=Latitude -oo KEEP_GEOM_COLUMNS=NO
        ogrinfo "${i%csv}geojson"
        rm test1.csv
    done
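
    The stop-name cleanup from the Python script, pulled out as a standalone Python 3 function so it can be tested without QGIS. This is a rewrite for illustration, not the script itself: the parenthesized codes are grouped so that “fs” or “mid” inside an ordinary word isn’t capitalized, ordinal suffixes like “3Rd” are lowered back to “3rd”, and spaces are collapsed as the very last step. Python’s str.title() has its own quirk (an apostrophe starts a new “word”, so possessives come out like “Joe'S”), which this sketch doesn’t try to fix.

```python
import re

# Codes the original script forces into capitals. Grouping the parenthesized
# alternatives keeps "fs" or "mid" inside an ordinary word from matching.
RE_UPPER = re.compile(r"\((?:ns|fs|mid|off)\)|\bncc\b|\bhs\b", re.IGNORECASE)
# str.title() turns "3RD" into "3Rd"; this puts ordinal suffixes back in lower case
RE_ORDINAL = re.compile(r"\b(\d+)(St|Nd|Rd|Th)\b")
RE_LEFT_PAREN = re.compile(r"(\S)(\()")   # "St(NS)" -> "St (NS)"
RE_RIGHT_PAREN = re.compile(r"(\))(\S)")  # "(NS)Main" -> "(NS) Main"
RE_SPACE = re.compile(r"\s+")


def clean_stop_name(raw):
    """Turn an ALL CAPS LANTA stop name into something easier on the eyes."""
    name = raw.title()
    name = RE_UPPER.sub(lambda m: m.group().upper(), name)
    name = RE_ORDINAL.sub(lambda m: m.group(1) + m.group(2).lower(), name)
    name = RE_LEFT_PAREN.sub(r"\1 \2", name)
    name = RE_RIGHT_PAREN.sub(r"\1 \2", name)
    return RE_SPACE.sub(" ", name).strip()  # collapse runs of spaces last
```

    A handful of asserts on real stop names from the PDFs would have saved a few of those spectacular failures.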