I helped Strava develop their current Relative Effort, a metric used to quantify training effort by combining intensity and duration. You can read Strava's official launch blog post here, as well as an interview I gave here (the website hosting the interview is no longer available, hence I am linking below only the official Strava blog post mentioning this work).
PUBLICATION: Estimating running performance combining non-invasive physiological measurements and training patterns in free-living
See the original post on HRV4Training's Blog at this link.
In preparation for the Quantified Self Europe conference, I went over the past 15 months of physiological (HR/HRV) and contextual (work hours, travel, sick days, training, fitness, etc.) data I collected, and found very interesting relations between my HRV and life stress. I blogged about it on Medium; you can find the article here:
[data and R code for this post are available on github]
In this post I will cover three ways to estimate parameters for regression models: least squares, gradient descent, and Monte Carlo methods. The aim is to introduce important methods widely used in machine learning, such as gradient descent and Monte Carlo, by linking them to a use case common across the different data science communities: linear regression.
Regression is one of the first prediction methods invented, and the three approaches I will be discussing are typically used by three different communities. Least squares is probably the most common method, mainly employed by frequentist statisticians and also used as the default in many easy-to-use packages in R or Python (e.g. the lm function in R).
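As a quick illustration, here is a minimal sketch in R on made-up data, showing lm alongside the closed-form normal equations it relies on:

```r
# Least squares on simulated data: lm and the normal equations agree.
set.seed(123)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)     # true intercept 2, true slope 3

fit <- lm(y ~ x)                # least squares fit
coef(fit)                       # estimates close to 2 and 3

# the same estimates via the normal equations: beta = (X'X)^-1 X'y
X <- cbind(1, x)
solve(t(X) %*% X) %*% t(X) %*% y
```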
Gradient descent is the machine learning approach to the problem. My favorite resource on the topic is the famous machine learning course by Andrew Ng on Coursera. In machine learning, the overall approach to problem solving and prediction is rather different from more classical statistics, even though it heavily relies on statistics (check out this answer by Sebastian Raschka on the origins of machine learning; I think it makes my point clear). One of the reasons why Andrew Ng uses gradient descent in the first place, instead of other methods like least squares, is probably that he wants to stress the importance of the method in the machine learning community. By using gradient descent, he highlights how in machine learning it's often more important to approximate a solution with an iterative procedure able to efficiently explore the parameter space than to obtain an exact analytical solution.
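To make this concrete, here is a minimal sketch of batch gradient descent for the same linear model, again on made-up data; the learning rate and iteration count are arbitrary choices for illustration:

```r
# Batch gradient descent for linear regression on simulated data.
set.seed(123)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)

X     <- cbind(1, x)   # design matrix with an intercept column
beta  <- c(0, 0)       # initial guess for the parameters
alpha <- 0.05          # learning rate (arbitrary, for illustration)
n     <- length(y)

for (i in 1:2000) {
  grad <- t(X) %*% (X %*% beta - y) / n  # gradient of the squared-error cost
  beta <- beta - alpha * grad
}
beta                   # converges towards the least squares estimates
```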
Finally, there is the Bayesian way of doing things. Monte Carlo methods are powerful tools to explore the parameter space and obtain the full posterior distribution, instead of just point estimates. This is something that requires a bit more introduction and explanation, and while I tried to do so in the remainder of this blog post, this is certainly far from a comprehensive resource on Bayesian modeling. For those who want to dig deeper into regression and Bayesian approaches, I would suggest reading Gelman's book on hierarchical modeling, one of my favorite resources in the field.
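As a taste of what this looks like in practice, here is a minimal sketch of a random-walk Metropolis sampler for the same model; flat priors, a fixed noise standard deviation of 1, and the proposal scale are all simplifying assumptions for illustration:

```r
# Random-walk Metropolis sampling of the regression parameters.
set.seed(123)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)

# log posterior under flat priors and known noise SD = 1 (assumption)
log_post <- function(beta) sum(dnorm(y, beta[1] + beta[2] * x, 1, log = TRUE))

n_iter  <- 10000
samples <- matrix(NA, n_iter, 2)
beta    <- c(0, 0)

for (i in 1:n_iter) {
  prop <- beta + rnorm(2, sd = 0.1)   # random-walk proposal
  # accept with probability min(1, posterior ratio)
  if (log(runif(1)) < log_post(prop) - log_post(beta)) beta <- prop
  samples[i, ] <- beta
}

colMeans(samples[-(1:1000), ])        # posterior means after burn-in
```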
[For this analysis I used the term/preterm dataset that you can find on Physionet. My data and code are also available on github]
A couple of weeks ago I read this post about cross-validation done wrong. During cross-validation, we are typically trying to understand how well our model can generalize, and how well it can predict our outcome of interest on unseen samples. The author of the blog post makes some good points, especially about feature selection. It is indeed common malpractice to perform feature selection before cross-validation, while it should be done during cross-validation, so that the selected features are derived only from training data, and not from pooled training and validation data.
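To make the point concrete, here is a minimal sketch on made-up data of feature selection done correctly, inside each fold; selecting on the pooled data first would leak information from the validation fold:

```r
# Feature selection inside cross-validation, using training folds only.
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n)       # only the first two features matter

folds <- sample(rep(1:5, length.out = n))
err   <- numeric(5)

for (k in 1:5) {
  tr <- folds != k
  # select the 5 features most correlated with y, on training data only
  sel  <- order(abs(cor(X[tr, ], y[tr])), decreasing = TRUE)[1:5]
  fit  <- lm(y[tr] ~ X[tr, sel])
  pred <- cbind(1, X[!tr, sel]) %*% coef(fit)
  err[k] <- mean((y[!tr] - pred)^2)   # error on the untouched validation fold
}
mean(err)
```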
However, the article doesn’t touch on a major issue in most clinical research: how to properly cross-validate when we have imbalanced data. As a matter of fact, in the context of many medical applications, we have datasets with two classes for the main outcome: normal samples and relevant samples. For example, in a cancer detection application we might have a small percentage of patients with cancer (relevant samples), while the majority of samples might be healthy individuals. Outside of the medical space, this is even more true for, e.g., fraud detection, where the ratio of relevant samples (i.e. frauds) to normal samples might be in the order of 1 to 100,000.
problem at hand
The main motivation behind the need to preprocess imbalanced data before feeding them into a classifier is that classifiers are typically more sensitive to the majority class and less sensitive to the minority class. Thus, if we don't take care of the issue, the classification output will be biased, in many cases resulting in always predicting the majority class. Many methods have been proposed in the past few years to deal with imbalanced data. This is not really my area of research, but since I started working on preterm birth prediction, I have had to deal with the problem more often. Preterm birth refers to pregnancies shorter than 37 weeks, and accounts for about 6-7% of all deliveries in most European countries and 11% of all deliveries in the U.S., so the data are quite imbalanced.
I recently came across two papers [1, 2] predicting term and preterm deliveries using Electrohysterography (EHG) data. The authors used a single cross-sectional EHG recording (capturing the electrical activity of the uterus) and claimed near-perfect accuracy in discriminating between the two classes (AUC of 0.99, compared to 0.52-0.60 without oversampling).
This seemed to me like a clear case of overfitting and bad cross-validation, for a couple of reasons. First of all, let’s just look at the data:
The density plots above show the distributions of four features over the two classes, term and preterm (f = false, the delivery was not preterm, in light red; t = true, the delivery was preterm, in light blue). As we can see, there is really not much discriminative power between conditions. The extracted features are completely overlapping between the two classes, and we might have a "garbage in, garbage out" issue more than a "this is not enough data" issue.
Just thinking about the problem domain should also raise some doubts when we see results as high as AUC = 0.99. The term/preterm distinction is almost arbitrary, set at 37 weeks of pregnancy. If you deliver at 36 weeks and 6 days, you are labeled preterm. On the other hand, if you deliver at 37 weeks and 1 day, you are labeled term. Obviously, there is no actual difference between two people who deliver that close together; it's just a convention, and as such, prediction results will always be affected and most likely very inaccurate around the 37-week threshold.
Since the dataset used is available for anyone to download from Physionet, in this post I will partially replicate the published results and show how to properly cross-validate when oversampling data. Maybe some clarification on this issue will help avoid the same mistakes in the future.
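As a preview of the key point, here is a minimal sketch on made-up imbalanced data: the minority class is oversampled inside each training fold only, after the split, and the validation fold is left untouched. Simple duplication of minority rows stands in here for more sophisticated techniques such as SMOTE:

```r
# Oversampling done correctly: only the training fold is rebalanced.
set.seed(1)
n <- 300
X <- matrix(rnorm(n * 4), n, 4)
y <- rbinom(n, 1, 0.1)               # ~10% minority class, as in preterm data

folds <- sample(rep(1:5, length.out = n))

for (k in 1:5) {
  tr     <- which(folds != k)
  min_tr <- tr[y[tr] == 1]           # minority rows of the training fold
  # duplicate minority rows until the classes are balanced (SMOTE or
  # similar would replace this step)
  extra  <- min_tr[sample.int(length(min_tr),
                              sum(y[tr] == 0) - length(min_tr), replace = TRUE)]
  tr_bal <- c(tr, extra)

  fit  <- glm(y[tr_bal] ~ X[tr_bal, ], family = binomial)
  pred <- 1 / (1 + exp(-cbind(1, X[folds == k, ]) %*% coef(fit)))
  # ... evaluate (e.g. AUC) on the untouched validation fold
}
```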
This post is about machine learning for energy expenditure (EE) estimation. More specifically, I'll show how to model the relation between accelerometer data, physiological data, and EE using Bayesian models and hierarchical regression.
During my PhD I've been working on developing EE models combining accelerometer and physiological data acquired using wearable sensors. I mainly focused on developing personalization techniques able to normalize physiological data across individuals, without the need for individual calibration.
The topic of EE estimation, or physical activity assessment, has been gaining more and more interest lately with the release of many activity trackers in the consumer market, some of them claiming higher accuracy due to a combination of accelerometer and physiological data (e.g. Bodymedia, Basis, or the Apple Watch). However, simply combining multiple signals, without personalization, provides suboptimal results, as I'll show in this post.
Let's take heart rate (HR) as an example. HR is the most commonly used physiological parameter to monitor physical activity, and it is being used more and more with the introduction of many wrist-based HR monitors. HR can be key in providing accurate, personalized estimates at the individual level due to the strong relation between oxygen consumption, HR, and EE within one individual. Here we can see how EE and HR evolve during different activities performed by one individual. The signals follow a similar trend; Pearson's correlation coefficient between HR and EE is 0.98, so HR can clearly be used as a predictor of EE.
However, this individual-specific relation does not hold across individuals, challenging standard population-based approaches to EE estimation. As a result, individual calibration and laboratory tests are needed to normalize HR. The rationale behind the need for normalization is that individuals of similar body size expend similar amounts of energy during a given activity; however, their HR differs depending on other factors, for example fitness.
Let's look at another example to clarify this point. Here we have walking, running, and biking data from two participants. Their similar body size (weight P1: 57 kg, P2: 52 kg; height P1: 166 cm, P2: 169 cm) results in similar levels of EE for the same activities, as shown in the two plots on the left. However, their different fitness levels (VO2max P1: 2100 ml/min, P2: 3130 ml/min) result in higher HR for the less fit participant, as shown in the two plots on the right. Thus, estimation models relying on HR to predict EE will result in underestimations and overestimations of EE.
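As a flavor of where this is going, here is a minimal sketch of the hierarchical idea using lme4 on made-up data (the models in this post are estimated in a fully Bayesian way): per-individual intercepts and HR slopes let the model absorb fitness-related differences in the HR-EE relation.

```r
# Hierarchical regression: subject-specific intercepts and HR slopes.
library(lme4)

set.seed(1)
df <- data.frame(subject = factor(rep(1:10, each = 50)))
df$hr <- rnorm(500, mean = 100, sd = 20)
slope <- rep(runif(10, 0.05, 0.15), each = 50)  # fitness-like variability
df$ee <- 1 + slope * df$hr + rnorm(500, sd = 0.5)

fit <- lmer(ee ~ hr + (1 + hr | subject), data = df)
summary(fit)  # fixed effect = population HR slope; random effects = per-subject deviations
```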
The main focus of my research was then to define methods and models able to take into account variability in physiological signals between individuals without the need for individual calibration. Let's take a step back and start with the basics.
cardiovascular endurance and fitness
Moving on, let’s introduce the concept of cardiovascular endurance, or cardiorespiratory fitness. Cardiorespiratory fitness (from now on just fitness) is defined as the ability of the circulatory and respiratory systems to supply oxygen during sustained physical activity. Fitness is not only an objective measure of habitual physical activity, but also a useful diagnostic and prognostic health indicator for patients in clinical settings as well as healthy individuals. Fitness is considered among the most important determinants of health and wellbeing.
In this post, my interest is purely related to performance in sports. So everything that follows should be considered in this context.
I wrote a piece on Medium about my experience as an indie app developer. If you love to make things, code, or are just thinking about getting started, you might find it interesting.
I'll cover three aspects:
my geotagged pics in 2014 and between December 2011 and April 2015
For a while I had wanted to map my flights and the locations I've spent time in throughout the year, so last night I downloaded my location data, which I thought I had been tracking for a long time using a few apps. Unfortunately, that was not the case. It turns out most apps were not running in the background, and I had forgotten to restart others after rebooting my phone.
What to do? There's one thing I always do: taking pictures. Everywhere. These days most pictures are geotagged, at least the ones taken using smartphones. I found a couple of tutorials online on how to extract metadata from JPEGs in R and Python, and eventually worked with R since, thanks to Nathan Yau and this post, I could make better-looking maps. Another helpful post was on timelyportfolio, explaining how to extract EXIF metadata using exiftool and R.
I haven't tested the code much, but if you are lucky, this should be sufficient to make your own map:
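Here is a minimal sketch along those lines, assuming exiftool is installed and on your PATH; the picture directory below is a placeholder to adjust:

```r
# Extract GPS tags with exiftool and plot them on a world map.
library(maps)

pic_dir <- "~/Pictures"   # placeholder: point this at your photo folder

# -n prints coordinates as decimal numbers, -csv gives one row per file
exif <- read.csv(pipe(paste("exiftool -n -csv -gpslatitude -gpslongitude",
                            shQuote(path.expand(pic_dir)))),
                 stringsAsFactors = FALSE)

# keep only pictures that actually carry GPS tags
exif$GPSLatitude  <- suppressWarnings(as.numeric(exif$GPSLatitude))
exif$GPSLongitude <- suppressWarnings(as.numeric(exif$GPSLongitude))
geo <- exif[!is.na(exif$GPSLatitude) & !is.na(exif$GPSLongitude), ]

map("world", col = "grey80", fill = TRUE, border = NA)
points(geo$GPSLongitude, geo$GPSLatitude, col = "red", pch = 20, cex = 0.6)
```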
I've got only a few months per year where I'm relatively happy with my training. That's more or less between December and April, when temperatures in Holland are low (I suffer too much in the heat). This year I measured my HR and HRV every morning while preparing for a half marathon, and I finally collected enough data to explore the two main aspects I'm interested in tracking while training:
As explained in another post, both changes are somehow related to HR and HRV. So let's have a look at what I've got in about three months of measurements.
Hardware & Software
All measurements were taken using either Under Armour's Armour39 or Polar's H7, given their high reliability. I took the measurements right after waking up, while still in bed.
I used HRV4Training, with the following settings:
HR & HRV
The only HRV feature I will consider in this analysis is rMSSD, together with average heart rate. There are many reasons why rMSSD should be preferred over other features, most importantly its reliability for short-duration measurements and the high correlation with training load shown in past research. For more information, have a look at Andrew Flatt's and Simon Wegerif's blogs. They are both amazing resources if you are interested in HRV research with a focus on training.
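For reference, rMSSD is simple enough to compute in a couple of lines; the RR intervals below are made-up example values in milliseconds:

```r
# rMSSD: root mean square of successive differences of RR intervals (ms)
rr <- c(850, 870, 860, 880, 875, 865, 890, 885)   # made-up RR series

rmssd <- sqrt(mean(diff(rr)^2))
rmssd
```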
Getting some perspective on my values
Any physiological parameter is very personal and should always be looked at in relation to your own baseline. While your heart rate can be normalized with respect to your age-predicted maximal heart rate, the situation is a bit more complicated for HRV, where there are no predefined ranges or zones. However, I did want to get some perspective on my values compared to what is out there.
I plotted simulations from my data together with data simulated according to rMSSD values reported in the literature [3,4] (this is another advantage of using this feature: frequency-domain features are computed differently by everyone, and even though HF power is also considered a good proxy for parasympathetic activity, it's almost impossible to compare results published in the literature).
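Here is a sketch of this kind of simulation, with made-up means and standard deviations standing in for my data and for the values reported in [3,4]:

```r
# Simulate rMSSD distributions for different populations and compare them.
library(ggplot2)

set.seed(42)
groups <- data.frame(
  group = c("sedentary", "trained", "athletes", "me"),
  mean  = c(35, 55, 75, 60),   # hypothetical rMSSD means (ms)
  sd    = c(15, 18, 20, 12)    # hypothetical SDs
)

sims <- do.call(rbind, lapply(seq_len(nrow(groups)), function(i) {
  data.frame(group = groups$group[i],
             rMSSD = rnorm(5000, groups$mean[i], groups$sd[i]))
}))

ggplot(sims, aes(rMSSD, fill = group)) + geom_density(alpha = 0.4)
```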
The distributions are quite wide (published results come from very few subjects, typically in the order of 10). Hopefully I'll soon get enough data from HRV4Training to provide better ranges for different populations (age and gender also play a role here). Anyway, for the moment this still shows some meaningful data, since I overlap with the "trained subjects" population, which falls between sedentary subjects and athletes.
I just came back from a very exciting (and tiring) two days in Baltimore, where I was invited for the final of the Armour39 Challenge.
It all started in November last year, when Under Armour decided to crowdsource their R&D and the future of their Armour39 platform.
The Armour39 wearable sensor was launched in early 2013. Similarly to what UA usually organizes for their Future Show, where last year more than 4000 applicants competed, they crowdsourced software development for this April's Digital Future Show. Given my experience with wearable technology and my passion for running, I thought it was a pretty good match and decided to give it a try.
I'm gonna start with the outcome of the competition, followed by my thoughts after the whole thing finished, which are certainly biased by the outcome. Then, I'll cover my submission in detail.
The competition was divided into three phases. We first had to submit a proposal outlining our idea (sometime in November). 50 proposals were selected and moved on to the second phase. At this stage we all received a development kit including the A39 sensor and an SDK to access the non-standard data stream (i.e. everything that is not heart rate). We had between January and March to develop our prototypes, and had to submit a video and a report (see below for both). Up to 15 projects were selected and moved on to the final phase, consisting of a presentation in Baltimore at the Digital Future Show. Eventually, only 5 teams reached the final:
The technique is called photoplethysmography (PPG for short) and consists of detecting changes in blood volume during a cardiac cycle by illuminating the skin and measuring changes in light absorption. PPG has become quite a popular non-invasive method for extracting physiological measurements such as heart rate and oxygen saturation. However, most applications today focus simply on heart rate, and it is not clear from the literature whether HRV features can also be reliably extracted using a phone's camera.
Here is the good news: it is indeed possible to achieve good accuracy in HRV measurements using this technique, but the methods needed are slightly more complicated than acquiring a video and running peak detection on the PPG signal (which is sufficient for heart rate measurement). This post covers the steps involved in the implementation of Camera HRV, the iPhone app I developed to measure HRV using the phone's camera; a simplified sketch of the processing pipeline follows the list of steps below. The algorithms have been part of HRV4Training since version 3.2.
1 - Data acquisition from the phone's camera
2 - Filtering & smoothing
3 - Resampling with cubic spline interpolation
4 - Peak detection
5 - Artifact removal and features extraction
6 - Comparison with heart rate monitors (Polar H7)
7 - Tips
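To give a feel for steps 2-5, here is a simplified, hypothetical sketch in R (the app itself is implemented on iOS); the raw PPG signal below is simulated:

```r
# Simulated PPG pipeline: smoothing, spline resampling, peak detection,
# artifact removal and rMSSD extraction.
set.seed(1)
fs_raw <- 30                                   # typical camera frame rate (Hz)
t_raw  <- seq(0, 60, by = 1 / fs_raw)
ppg    <- sin(2 * pi * 1.2 * t_raw) + rnorm(length(t_raw), sd = 0.2)

# 2 - filtering & smoothing: moving average to suppress high-frequency noise
ppg_s <- stats::filter(ppg, rep(1 / 5, 5), sides = 2)

# 3 - resampling with cubic spline interpolation to a uniform, higher rate
fs_new <- 180
t_new  <- seq(min(t_raw), max(t_raw), by = 1 / fs_new)
ok     <- !is.na(ppg_s)
ppg_r  <- spline(t_raw[ok], ppg_s[ok], xout = t_new)$y

# 4 - peak detection: local maxima above the median signal level
is_peak <- c(FALSE, diff(sign(diff(ppg_r))) == -2, FALSE) & ppg_r > median(ppg_r)
peaks_t <- t_new[is_peak]

# 5 - artifact removal and feature extraction: drop implausible intervals,
#     then compute rMSSD on the remaining peak-to-peak intervals (ms)
pp <- diff(peaks_t) * 1000
pp <- pp[pp > 300 & pp < 2000]
sqrt(mean(diff(pp)^2))   # rMSSD
```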