Slifemap: 2,717 Runs, 1 Map

I’m excited to share that I’ve shipped something brand new in the “I spend too much time analyzing my running data” space!

Slifemap is a portmanteau of my last name (Slifer) and “lifemap”, the common name for this kind of visualization: every run that I’ve tracked with a device (from May 2011 → today), aggregated and rendered onto a single Google Map.

Usage of The Strava API requires me to say that Slifemap is powered by Strava. But this is currently an MVP-stage, closed project that’s not available for members of the Strava community to consume with their own data through OAuth.

To my knowledge, Slifemap is the only activity visualizer that uses Google Maps, which is a premium product with premium pricing. This was part of the impetus to build it, and is the reason why it’s only available by invite or by request.

Screenshots of my personal Slifemap are below. It can be viewed at slifemap.kevinslifer.com.

MAP MY RUN

It’s no secret that I like to run, and as the technology around the sport has evolved, it’s also no secret that I’ve become obsessed with capturing and analyzing my own running data.

This journey also started with Google Maps. Activity tracking originally consisted of plotting a route using Google Maps, then heading out to run it and hoping for the best. When I acquired an iPhone in 2011, I used MapMyRun for about a week, then discovered Endomondo and never looked back until it was shut down in 2020.

Strava eventually swept the market, and my running friends became split between the two platforms. I started to use both apps on my phone, mostly tracking runs through Endomondo and then syncing data to Strava with tapiriik. Competing services generally don’t want to be interoperable, and tapiriik acts as a third-party glue. Once Endomondo shut down, I consolidated to Strava – but eventually capitulated to the prevailing approach of using a Garmin watch and the Connect platform, which can natively push activities over to Strava.

The accidental benefit of this backstory is that I’ve preserved my entire 13-year digital running history in Strava.

LIFEMAPPING

During the COVID-19 lockdowns, I learned about the lifemap concept through CityStrides. “Run every street in your town” is interesting (I have 100% coverage of Hellertown, PA and Rehoboth Beach, DE), but the Lifemap feature was a bigger draw to me. Seeing runs from places that I’ve lived in and traveled to over the years was fascinating. Strava eventually followed suit with the personal heatmap.

Neither option gave me exactly what I was looking for. Having the personal heatmap embedded directly into Strava is convenient, but it’s a paid feature that I can’t put on display for my friends. And since it’s implemented as a heatmap, it de-emphasizes the visibility of roads that I’ve only run once. I often go out of my way to lifemap a specific road or trail, and I want the representation of that to be binary. CityStrides addresses both of these issues, but in the past year or so the site’s map has crashlooped the browser on my phone when I move around, making it unusable.

It doesn’t take much time for me to get into the mindset of “if I’m not happy with what’s available, I should build it myself” – and here we are.

SOMETHING OLD

Fun fact: this idea is three years old. Back in the early days of “Kilometers vs The Year” (and even before that), tracking running statistics required “updating the sheet” with your latest run data – on almost a daily basis.

I wanted to live in a world where a machine did this for me, and in the depths of the COVID lockdowns I convinced myself that I could learn Python and write a function to automate this in the cloud.

The vision outlined in the diagram (from July 2020) was more grand. I wanted to eventually do stuff with the data. But I settled for a Python function that loaded data from The Strava API into Google Cloud, where it was then queried by a Google Sheet, fully automating the data entry. It was a huge win at the time, and it’s still in use today.

SOMETHING NEW

The platform that makes Slifemap possible (in addition to continuing the automated data entry) is a modern take on the same concept. The current iteration is below:

This is a data processing pipeline. It ingests activity data from The Strava API, performs ETL on the underlying route details, then generates GeoJSON for the Maps JavaScript API.

There are three microservice jobs that perform the data processing, and a fourth microservice for the presentation layer. All four are written in Node.

I have several years of experience operating services on Cloud Run with the WordPress on GCP Free Tier project, so the entire compute platform is built on this model, where it conveniently scales to zero when not in use. The storage and data layers consist of Cloud Storage buckets and a BigQuery dataset.

The only prerequisite is that the desired running data exists in Strava. It doesn’t matter how the runs were tracked – they just need to be synced to Strava at some point.

Strava Activity Loader

The first step is to ingest the running data. The Python function’s approach of polling the List Athlete Activities API was replicated in Node to bootstrap this platform.
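To make the polling approach concrete, here is a minimal sketch of paging through a list endpoint until it runs dry. It is written in Python for brevity and is not the Node code behind Slifemap. The names fetch_all_activities, get_page, and fake_get_page are hypothetical, and the HTTP call is stubbed out; in practice get_page would wrap an authenticated request to List Athlete Activities using its page and per_page parameters.

```python
from typing import Callable, Dict, List

Page = List[Dict]


def fetch_all_activities(get_page: Callable[[int, int], Page],
                         per_page: int = 30) -> List[Dict]:
    """Collect every activity by requesting pages until an empty one comes back."""
    activities: List[Dict] = []
    page = 1
    while True:
        batch = get_page(page, per_page)
        if not batch:
            break  # an empty page means there is nothing left to read
        activities.extend(batch)
        page += 1
    return activities


# Hypothetical stand-in for the real HTTP call, serving five fake activities.
def fake_get_page(page: int, per_page: int) -> Page:
    data = [{"id": i} for i in range(1, 6)]
    start = (page - 1) * per_page
    return data[start:start + per_page]


all_runs = fetch_all_activities(fake_get_page, per_page=2)
```

Because every poll walks the full history, the cost of this loop grows with the number of tracked runs.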

My original plan was to use the Webhook Events API and have Strava push activities instead of needing to poll. Getting something functional involved a steep learning curve for Node, so I settled for parity with a known workflow. The final state will be event-driven, and it will involve converting the ingestion from a scheduled poll to a webhook listener.

Polyline GeoJSON Encoder

Once the Strava data has been ingested, there’s work to be done. For mapping, the most important data element is the route. Routes are captured as a time series of latitude/longitude coordinates, but are commonly stored as encoded Polyline strings for brevity.

The Polyline format isn’t directly useful for mapping. It needs to be converted to GeoJSON, a widely used open standard (RFC 7946) for encoding geographic data. The conversion is performed by the Mapbox Polyline JS package.

So behind the scenes, there’s a per-activity ETL of Polyline → GeoJSON, which yields a collection of GeoJSON objects.
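For a feel of what that per-activity transform does, here is a minimal sketch in Python. It hand-rolls a decoder for the encoded polyline format at its default precision of 5 decimal places, standing in for the Mapbox package that the real Node job uses. The function names and the activity_id property are hypothetical. One detail worth noticing: polylines decode to (latitude, longitude) pairs, while GeoJSON positions are ordered [longitude, latitude].

```python
from typing import Dict, List, Tuple


def decode_polyline(encoded: str, precision: int = 5) -> List[Tuple[float, float]]:
    """Decode an encoded polyline string into (lat, lng) pairs."""
    coords: List[Tuple[float, float]] = []
    factor = 10 ** precision
    index = 0
    lat = lng = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # first the latitude delta, then the longitude delta
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: last chunk of this value
                    break
            # The lowest bit carries the sign; the rest is the magnitude.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / factor, lng / factor))
    return coords


def polyline_to_feature(encoded: str, activity_id: int) -> Dict:
    """Wrap a decoded route in a GeoJSON Feature, flipping to [lng, lat] order."""
    return {
        "type": "Feature",
        "properties": {"activity_id": activity_id},  # hypothetical property
        "geometry": {
            "type": "LineString",
            "coordinates": [[lng, lat] for lat, lng in decode_polyline(encoded)],
        },
    }


# Sample string from the public description of the polyline encoding.
feature = polyline_to_feature("_p~iF~ps|U_ulLnnqC_mqNvxq`@", activity_id=1)
```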

The ETL involves an intermediate call to the Strava API to retrieve the Polyline for each activity. The Strava API has quota limits, so the ETL implements a per-run throttle in order to avoid issues, especially during backload scenarios.
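A throttle like this can be very simple. The sketch below (Python, not the actual Node implementation) enforces a minimum gap between consecutive calls. The name throttled_map and the 9-second interval are placeholders chosen for illustration, not Strava’s published limits or Slifemap’s real settings. The sleep and clock functions are injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(fn: Callable[[T], R],
                  items: Iterable[T],
                  min_interval_s: float,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> List[R]:
    """Call fn on each item, waiting so calls start at least min_interval_s apart."""
    results: List[R] = []
    last_call = None
    for item in items:
        if last_call is not None:
            wait = min_interval_s - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fn(item))
    return results


# Demo with a fake clock, so nothing actually sleeps.
naps: List[float] = []
now = [0.0]


def fake_sleep(seconds: float) -> None:
    naps.append(seconds)
    now[0] += seconds


doubled = throttled_map(lambda x: x * 2, [1, 2, 3], 9.0,
                        sleep=fake_sleep, clock=lambda: now[0])
```

A fixed gap is the bluntest option. It trades backload speed for never having to think about quota windows.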

Lifemap Generator

The Google Maps JavaScript API accepts GeoJSON as an overlay data layer. But that data layer first needs to be materialized from the collection of GeoJSON objects. The generator performs this function.
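Conceptually, materializing the layer amounts to gathering the per-activity objects into one GeoJSON FeatureCollection. Here is a minimal Python sketch of that idea. The function name and sample features are hypothetical, and the real Node job may assemble and store the result differently.

```python
import json
from typing import Dict, Iterable


def build_feature_collection(features: Iterable[Dict]) -> Dict:
    """Combine individual GeoJSON Features into a single FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Two made-up routes standing in for per-activity GeoJSON objects.
sample_features = [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "LineString",
                  "coordinates": [[-75.34, 40.58], [-75.33, 40.59]]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "LineString",
                  "coordinates": [[-75.08, 38.72], [-75.07, 38.71]]}},
]

lifemap = build_feature_collection(sample_features)
lifemap_json = json.dumps(lifemap)  # the string a map client would load
```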

Slifemap UI

Arguably the easiest part is the frontend. This is an HTML wrapper to the Google Maps JavaScript API that says “Hey Google, give me a map of this location with this data on top of it.”

Express and EJS are thrown in to simplify the serving and support basic rendering customizations, but the frontend is just a single page where Google is doing most of the heavy lifting.

WHAT’S NEXT?

Hopefully it doesn’t take three more years to make the next generation of improvements, because there’s already a backlog of ideas that I want to experiment with.

Event-Driven Architecture

As previously mentioned, I want the flow of data to be near real-time. Fetching almost 3,000 activities from Strava on every poll is computationally expensive, and the more runs I track, the more expensive it gets. Because of this, the data is currently refreshed only every 12 hours (between 12 and 1, both AM and PM).

This could be optimized by incorporating the “after” parameter of the List Athlete Activities API call. It would make the load step more efficient; the request would be for new activities instead of all activities. Polling could then be done on a more frequent basis without making any other significant changes.
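As a rough illustration of the incremental approach: the “after” parameter takes an epoch timestamp, so the loader only needs to remember the start time of the newest activity it has already stored. The Python sketch below shows that conversion. It assumes timestamps arrive as UTC strings like 2024-01-01T00:00:00Z, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict


def to_epoch(start_date: str) -> int:
    """Convert a UTC timestamp such as '2024-01-01T00:00:00Z' to epoch seconds."""
    parsed = datetime.strptime(start_date, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def incremental_params(latest_start_date: str, per_page: int = 30) -> Dict[str, int]:
    """Query parameters that request only activities newer than the latest stored one."""
    return {"after": to_epoch(latest_start_date), "per_page": per_page}


params = incremental_params("1970-01-01T00:00:10Z")
```

The trade-off is that a filter like this only picks up new activities; edits or deletions of older ones would go unnoticed.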

The better option is to switch from polling to using the Webhook Events API. This is a paradigm shift to an event-driven architecture. It’s not trivial to implement, as the data emitted encompasses the C, U, and D operations of the CRUD model (creating a new activity, making a change to an existing activity, or deleting an activity). Polling is simply the R operation (reading the data).
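To show why the event-driven version is more involved, here is a hedged Python sketch of a handler that routes each webhook event by its aspect_type (create, update, or delete). The in-memory store and the “needs_fetch” flag are hypothetical stand-ins for whatever the real pipeline would persist. A production listener would also need to handle subscription validation and fetch full activity details after a create.

```python
from typing import Dict


def apply_event(store: Dict[int, Dict], event: Dict) -> Dict[int, Dict]:
    """Apply one webhook event to a dictionary of activities keyed by id."""
    if event.get("object_type") != "activity":
        return store  # athlete-level events are ignored in this sketch

    activity_id = event["object_id"]
    aspect = event.get("aspect_type")

    if aspect == "create":
        # The event carries only ids, so details must be fetched later.
        store[activity_id] = {"needs_fetch": True}
    elif aspect == "update":
        record = store.setdefault(activity_id, {"needs_fetch": True})
        record.update(event.get("updates", {}))
    elif aspect == "delete":
        store.pop(activity_id, None)
    return store


store: Dict[int, Dict] = {}
apply_event(store, {"object_type": "activity", "object_id": 1, "aspect_type": "create"})
apply_event(store, {"object_type": "activity", "object_id": 1, "aspect_type": "update",
                    "updates": {"title": "Morning Run"}})
apply_event(store, {"object_type": "activity", "object_id": 2, "aspect_type": "create"})
apply_event(store, {"object_type": "activity", "object_id": 2, "aspect_type": "delete"})
```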

For either option, the platform still needs to have the ability to perform an initial backload of historical data. Since the poll is currently for all activities, historical data is inherently addressed.

GeoJSON Segmentation

It would be interesting to segment the GeoJSON by dimension (for example, calendar year or locality) in order to render maps that visualize only those segments. Maybe I want to see a map of only my 2023 runs. Segmentation would enable this.
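One way this could work, sketched in Python: group features by the year of their start date and emit one FeatureCollection per year. This assumes each feature carries a start_date property in ISO format, which is a hypothetical choice here and not something the current pipeline is known to store.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def segment_by_year(features: Iterable[Dict]) -> Dict[str, Dict]:
    """Split features into one FeatureCollection per calendar year."""
    by_year: Dict[str, List[Dict]] = defaultdict(list)
    for feature in features:
        year = feature["properties"]["start_date"][:4]  # 'YYYY' prefix of ISO date
        by_year[year].append(feature)
    return {
        year: {"type": "FeatureCollection", "features": items}
        for year, items in by_year.items()
    }


def make_run(start_date: str) -> Dict:
    """Build a tiny placeholder feature for the demo."""
    return {"type": "Feature",
            "properties": {"start_date": start_date},
            "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}}


segments = segment_by_year([
    make_run("2022-06-01T10:00:00Z"),
    make_run("2023-01-15T08:30:00Z"),
    make_run("2023-09-09T17:45:00Z"),
])
```

Segmenting by locality would follow the same pattern with a different grouping key.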

Enhanced Mapping

It would be even more interesting to enrich the map data and make the map interactive. For example, I’d like to click on a line and see the date of that run, as well as a link to the original Strava activity to see the details.
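On the data side, this mostly means attaching properties to each GeoJSON feature so that a click handler has something to display. A minimal Python sketch follows. The property names date and url are hypothetical, and the activity dictionary is assumed to carry id and start_date_local fields.

```python
from typing import Dict


def with_activity_properties(feature: Dict, activity: Dict) -> Dict:
    """Return a copy of the feature with the run date and a Strava link attached."""
    enriched = dict(feature)
    enriched["properties"] = {
        **(feature.get("properties") or {}),
        "date": activity["start_date_local"][:10],  # keep only YYYY-MM-DD
        "url": f"https://www.strava.com/activities/{activity['id']}",
    }
    return enriched


plain = {"type": "Feature", "properties": None,
         "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}}
clickable = with_activity_properties(
    plain, {"id": 123, "start_date_local": "2023-11-05T07:12:00Z"})
```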

Additional Statistics

There’s quite a bit of other data available, such as the elevation gain, pace, time of day, etc. Map visualization is just one use case. While I’m mucking around in the data, there are questions that I could answer and incorporate into additional pages.

Building the Slifemap was a rewarding experience. It’s something that I literally use on a daily basis. I was bracing for it to take the entire winter to figure out, but it came together in about a month – not bad for needing to learn a few new things along the way.