matplotlib

Rise Above the Noise

I’ve done some analysis of my GPS running data before, but mostly just mapping. I’ve always wanted to bring in more sophisticated analysis, such as identifying runs with similar geographic features (e.g. track workouts) or identifying, categorizing, and comparing hills. To really get into either of these, I first needed good elevation data, which my Garmin Forerunner 220 doesn’t provide. In this post I’ll show some of the problems with the elevation data coming from my Forerunner 220, how to get elevation data from the RaceMap API (and how it compares with a few other elevation APIs), and then examine how good the new elevation data is.

Mapping with an 800 Pound Gorilla

I’ve been focusing on Python recently to become a bilingual data scientist. Probably my least favorite thing about Python is its plotting libraries: there are too many options built on top of matplotlib, which predates pandas DataFrames. This makes for some clunky code and blurry boundaries (both “is that a seaborn, pandas, or matplotlib function?” and situations with three solutions that are equally messy but in very different ways). In my opinion, ggplot2’s deep interplay with data frames makes a lot more sense, and ggplot2’s layers make it easy to change plot type (just switch the geom_), add facets, and tweak aesthetics.

Expectation-Maximization

As part of some clustering work and learning about hidden Markov models, I’ve been doing some reading about the EM algorithm and its applications. It’s a pretty neat algorithm (I love iterative algorithms like Newton’s method and the Euclidean algorithm), so I thought I’d illustrate how it works. I’ve also been doing a bit more Python recently, so I thought I would do all this in Python rather than R.
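
To give a flavor of how EM alternates between its two steps, here is a minimal sketch that fits a two-component, one-dimensional Gaussian mixture with NumPy. It is not necessarily the approach taken in the full post: the function name `em_two_gaussians`, the crude initialization, the fixed iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def em_two_gaussians(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    x = np.asarray(x, dtype=float)

    # Crude initialization (an assumption): start the means at the data extremes.
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point,
        # i.e. the posterior probability that the point came from it.
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
        dens /= sigma * np.sqrt(2.0 * np.pi)
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters as responsibility-weighted averages.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

    return weights, mu, sigma


# Synthetic data (hypothetical): 30% from N(-2, 0.5^2), 70% from N(3, 1^2).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weights, mu, sigma = em_two_gaussians(data)
print(weights, mu, sigma)
```

With clusters this well separated, the estimates should land close to the generating parameters. A more careful version would monitor the log-likelihood for convergence instead of running a fixed number of iterations, and would guard against a component collapsing onto a single point.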