On Heart Rate Variability and "Readiness"

20/7/2021

The goal of this post is to provide some clarity and general considerations on heart rate variability (HRV), readiness and wearables. I will try to clarify why comparing HRV and readiness scores is of little use and what you should be comparing (if anything) for a more meaningful assessment of how these devices work. Most importantly, we will see how you can benefit from the data for both HRV and readiness.

What are we talking about?

HRV is a measure of physiological stress. For today's wearables and apps, it typically represents parasympathetic activity because of how it is measured (at rest, while sleeping or first thing in the morning) and computed (relying on high-frequency changes captured by rMSSD). This means that a lower HRV with respect to your historical data is associated with higher stress.
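For readers who have never seen how rMSSD is computed, here is a minimal sketch in Python. rMSSD is the root mean square of the successive differences between consecutive beat-to-beat (RR) intervals. The function name and the RR values are made up for illustration, and the sketch assumes the intervals have already been cleaned of artifacts, which is a separate and important step that I do not cover here.

```python
import math

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (in ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each beat-to-beat interval and the next one
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Made-up RR intervals in milliseconds, for illustration only
rr = [812, 790, 845, 830, 801, 822]
print(round(rmssd(rr), 1))
```

Because only beat-to-beat differences enter the calculation, slow drifts in heart rate contribute little, which is why rMSSD is said to capture high-frequency changes.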

Readiness is a made-up construct that most apps or wearables provide. The goal of readiness is to combine multiple parameters (one of them is typically HRV) to determine your level of recovery or ability to tackle the day (whatever that means in your case).

Why does this matter?

Due to the novelty of some of these metrics for consumers, issues in science communication, and whatnot, there is much confusion about both of them, to the point that I often see people comparing HRV from one wearable with readiness from another. While understandable (the tools are supposed to do the same thing, measure our recovery), this is like comparing apples with pears: it does not make much sense.

This is an important aspect to address because wearables and apps can be extremely helpful in better understanding physiological responses to the various stressors we face, but not all devices are equal, nor do differences between the output of one device and another necessarily mean that they cannot be trusted.

Why do wearables provide a readiness score?

Wearables provide a readiness score for a simple reason: they track many parameters, and try to break down that information into something more digestible for the consumer, which means generating a single readiness score (and normally some form of color coding).

Obviously, they mean well. Oura, Whoop, etc. try to use the available input to give you an overview of what the data shows. For example, your HRV is low, you were very active for a few days, your sleep was poor, thus your readiness is low. It seems to make sense.

Comparisons between tools

A first limitation of readiness scores, typically highlighted by people using more than one tool, is that they are inconsistent. Note that this is not a problem per se if we understand what the tool is doing. Not only the algorithm but even the inputs can differ (e.g. Oura has temperature data).

If we use a certain tool, we can see how readiness over time changes based on various stressors and behaviors, and we might benefit from the feedback. However, the black box nature of these scores makes it more difficult to make informed decisions.

If you are comparing tools, you can go two ways:

Compare readiness scores from different systems: as mentioned above, there might be inconsistencies due to the type of input (having or not having a certain sensor), how the same inputs are measured (using the whole night of HRV vs noisier segments of the night), and the weight given to certain inputs (e.g. sleep or activity being more important in one tool or the other). Because of what I discuss below in the limitations section, this comparison is in my view of little use. Readiness can work for you, but it does not have to match another device's output, and there is no gold standard.
Compare physiological data collected from different systems: this is where we want to have fewer inconsistencies and ideally the same trends. Resting heart rate, HRV and temperature need to show the same acute changes and chronic trends across wearables and apps if we want to trust these tools and rely on them to assess individual stress responses and better manage our health and performance. This is why it is important to understand what we are measuring, and also the nuances of doing it a bit differently (e.g. morning vs full night vs a few minutes of the night for HRV). Even in this case, it is fine for the data to differ if we understand why that is the case (e.g. a late stressor having a different impact on morning vs night data, or 5 minutes of data collected during the night being a noisier version of the full night).

To summarize, if you are comparing different tools, please compare the actual physiological signals, ideally with respect to your normal values and in relation to acute and chronic stressors (learn more here). Otherwise you might just be comparing meaningless oscillations (normal day-to-day variability, which can be high for some of these physiological parameters).
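To make this more concrete, below is a small sketch of how one could compare HRV from two devices with respect to each device's own normal values, instead of comparing absolute numbers. Everything here is an assumption for illustration: the data are invented, the function names are mine, and standardizing against the mean and standard deviation of the same week is just one simple way to express "with respect to your normal".

```python
from statistics import mean, stdev

def z_scores(values):
    """Express each value relative to the series' own mean and standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def trend_agreement(a, b):
    """Pearson correlation between two series, computed from their z-scores."""
    za, zb = z_scores(a), z_scores(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

# Invented daily rMSSD values (ms) from two hypothetical devices.
# Absolute values differ (e.g. full night vs morning measurement),
# but both drop on the same day.
device_a = [62, 65, 60, 48, 52, 61, 64]
device_b = [45, 47, 44, 33, 38, 44, 46]

print("trend agreement:", round(trend_agreement(device_a, device_b), 2))

# Days on which each device is more than one standard deviation below its own mean
low_a = [day for day, z in enumerate(z_scores(device_a)) if z < -1]
low_b = [day for day, z in enumerate(z_scores(device_b)) if z < -1]
print("low days, device A:", low_a)
print("low days, device B:", low_b)
```

In this made-up example the two devices never report the same number, yet they move together and both flag the same acute drop, which is the kind of agreement worth looking for.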

Limitations of a readiness score

I've mentioned above how inconsistencies between readiness scores are often reported. Let's try to touch on some of the more nuanced issues. For example, the most important question for me is always the following: did my behavior or my physiology trigger a reduced readiness? Did I get a low score because my HRV was low or because yesterday I was very active? In my opinion, what we should care about is our body's physiological response, and that is what HRV captures. This is why in HRV4Training we don't create a score combining multiple parameters: the score is HRV, which is then contextualized with respect to your historical data and other parameters.

If for example your readiness score includes sleep, then if you have a poor night of sleep, reflected also in your lower than normal HRV, you end up penalizing your readiness score twice (low sleep quality and low HRV). If on the other hand your HRV is fine, it means your body did not respond poorly to a disruption in sleep, and therefore we might not need to penalize your readiness because of sleep. This is just an example, but you can think of any stressor, e.g. if you had a hard workout yesterday and your body assimilated the stressor well, meaning your HRV is within your normal, then it would be a poor choice to penalize your readiness score using activity data. Certainly this method can give you the perception that readiness is working (you went hard and readiness is low), but if there is a systematic impact of physical activity on readiness scores, unrelated to your physiological response, then why do you even measure your physiology?
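The double penalty can be illustrated with a toy example. To be clear, this is not the algorithm of any real wearable, which are black boxes: the weighted sum, the weights and the input values below are all hypothetical, chosen only to show the mechanism. Inputs are expressed as deviations from your normal (0 means normal, negative means lower than normal).

```python
def composite_readiness(hrv_z, sleep_z, activity_z, weights=(0.5, 0.3, 0.2)):
    """Hypothetical composite score: weighted sum of deviations from normal.

    A higher than normal activity load (positive activity_z) lowers the score.
    This is a made-up formula, not any vendor's method.
    """
    w_hrv, w_sleep, w_activity = weights
    return w_hrv * hrv_z + w_sleep * sleep_z - w_activity * activity_z

# Poor sleep that is also reflected in a low HRV: sleep is counted twice
poor_sleep_low_hrv = composite_readiness(hrv_z=-1.5, sleep_z=-1.5, activity_z=0.0)
# The same low HRV on its own, without the extra sleep penalty
low_hrv_only = composite_readiness(hrv_z=-1.5, sleep_z=0.0, activity_z=0.0)
# Poor sleep, but physiology is normal: the score drops anyway
poor_sleep_normal_hrv = composite_readiness(hrv_z=0.0, sleep_z=-1.5, activity_z=0.0)
# Hard workout yesterday, well assimilated (normal HRV): the score drops anyway
hard_workout_normal_hrv = composite_readiness(hrv_z=0.0, sleep_z=0.0, activity_z=2.0)

print(poor_sleep_low_hrv, low_hrv_only)
print(poor_sleep_normal_hrv, hard_workout_normal_hrv)
```

In the last two cases the physiological response is normal, so an HRV-only view would say "proceed as planned", while the composite score is lowered by behavior alone.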

In other words, we measure physiological parameters representative of recovery (HR, HRV, breathing rate, temperature, etc.) but then we estimate the effect of other parameters (e.g. activity, "sleep quality", etc.) on recovery to compute a readiness score that should be cumulative of everything. The reality of course is that this can never be accurate. Even in an ideal world where activity, sleep and other parameters are correctly quantified there are so many other factors that will have an impact beyond what a wearable can measure (environmental factors, medication, diet, personal relationships, global pandemics, just to name a few).

Finally, I often feel like there is a mismatch between the good intentions of a readiness score and the target audience. If the user is an elite athlete, it is even more meaningless to spend time looking at made-up metrics. As a professional (athlete or coach), the physiological response is what matters. If you do not know where to start, check out the guides I link at the end of this article. I can assure you that you are not alone.

In defense of readiness

An important assumption I make throughout this post is that we know what HRV is and how to use it. In other words, we understand that when measured correctly, HRV reflects physiological responses to stressors. This means that a good HRV is defined as a stable value with respect to our history, not a slightly higher one (see this). Similarly, a good HRV reflects a positive response to the training and lifestyle stressors currently present in our life. Most importantly, a positive response does not mean "train hard every day" but "proceed as planned", because you do need a plan. This is not trivial, as only recently have we better understood how to collect and interpret the data meaningfully, and how to better communicate these aspects.

Blind guidance from a wearable or app without a plan is why readiness exists: it tries to include various aspects of your life (activity, sleep, HRV, etc.) so that the app can do the decision making for you. It is of course easier to look at a cumulative readiness number than to look at physiological data (heart rate and HRV) and at how physiology changes in response to the various stressors you face.

If you are able to link an app's or wearable's readiness score to how you feel subjectively and / or the stressors you face, over time, by all means that is a useful way to use the data. It could be that in your case the inputs and weights used by the algorithms reflect your responses well. Below I discuss why we do not do this and possible alternatives to make good use of your physiological data, so that you can interpret deviations from your normal that signal periods of higher stress.

Alternative approaches to readiness: HRV (and context!)

While I understand the reasoning behind the readiness score, in my opinion these scores are flawed because starting from a true physiological response (HR, HRV, etc.) we then confound it with behaviors and estimates (e.g. activity or sleep quality as well as other parameters "we think" might be relevant). These parameters are not all equal.

If you adjust your activity or sleep duration on some of today's wearables, your readiness will change. To me this is a huge red flag given the inability of such wearables to accurately measure either sleep or activity. Note that even if they were perfectly measured, physiology already reflects what you need to know.

All other parameters are key as contextual information, to understand how your physiology changes in relation to sleep or exercise habits for example, but this is different from using them directly to determine your ability to perform on a given day.

The holistic view (provided by a wearable) is a myth. A wearable has no idea of muscle damage and context. Sleep quality is inherently linked to night physiological data, etc. Not only is the wearable or app missing information, but aggregating information gives the false expectation that the data becomes somewhat more insightful, while it is simply diluting the insight.

In my opinion, looking at actual physiological data and how it deviates from your normal, and contextualizing such data with your subjective feeling, training data, etc. separately can be more helpful.
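As a rough illustration of this idea, the sketch below checks whether today's HRV falls within a normal range built from your own recent history, and keeps context (subjective feeling, training) as separate annotations instead of blending it into the number. The choice of mean plus or minus one standard deviation, the length of the history and all the values are assumptions made for the example, not a description of how any specific app computes its normal range.

```python
from statistics import mean, stdev

def normal_range(history, width=1.0):
    """Range of 'normal' values: mean +/- width * standard deviation of the history."""
    m, s = mean(history), stdev(history)
    return m - width * s, m + width * s

def classify(today, history, width=1.0):
    """Compare today's value with the normal range derived from past values."""
    low, high = normal_range(history, width)
    if today < low:
        return "below normal"
    if today > high:
        return "above normal"
    return "within normal"

# Invented rMSSD values (ms) from the previous ten days
history = [58, 61, 63, 59, 60, 62, 57, 64, 60, 61]

# Context is logged next to the physiology, not mixed into it
today = {"rmssd": 50, "feeling": "tired", "yesterday": "long run, late dinner"}

status = classify(today["rmssd"], history)
print(f"HRV {today['rmssd']} ms is {status}")
print(f"context: {today['feeling']}; {today['yesterday']}")
```

Keeping the two separate means the physiological signal stays readable on its own, and the context is there to help explain a deviation when one shows up.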

Wrap up and resources

In this post I tried to provide an overview of the differences between HRV and readiness. If you use different tools, pay attention to how the actual physiology changes across tools, and worry less about readiness, especially if you use these devices with clients or to manage your health or performance.

To learn more about how to use HRV and how to interpret the data with respect to your historical measurements and various stressors, check out my guide here.

To learn more about the differences between resting heart rate and HRV, why HRV is a more sensitive metric of stress and what the implications are in terms of the technology used to measure it, check out my other guide here.