Hacking Bluetooth Sports Tech for Fun and Profit — Part 5: How Accurate Is DIY HRV From a Hacked WHOOP?
Six nights, 356,000 records, thirteen algorithm variants, and a surprisingly honest answer from the data.
In the previous posts we broke into the WHOOP's proprietary BLE protocol, built a passive daemon that downloads buffered sensor data overnight, and decoded the 93-byte historical records, including heart rate, RR intervals, motion, skin temperature, and raw SpO2 ADC values. But cracking the protocol is just the plumbing. The question that actually matters is: can we compute clinically meaningful HRV from this data?
The WHOOP app reports HRV as a single number each morning. We don't have their algorithm — it runs server-side on uploaded data. So we built our own, validated it against the Oura Ring 4 (which sits on the finger and has excellent published accuracy), and then asked: is the remaining gap our algorithm, or the sensor itself?
The algorithm
The metric is RMSSD — root mean square of successive differences in RR intervals, the standard time-domain HRV measure. The WHOOP's buffered 0x2F records contain 1 Hz snapshots: one heart rate, up to two RR intervals, and a motion index per second. The challenge is that PPG-derived RR intervals are noisy. You can't just compute RMSSD naïvely — it'll be dominated by peak-detection jitter rather than genuine cardiac variability.
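To make the jitter problem concrete, here is a minimal sketch of naive RMSSD in Python. The RR values are invented for illustration, not taken from our capture; the second series is the first with one interval replaced by a mis-detected beat.

```python
import math

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Illustrative values only (ms); not real WHOOP data.
clean = [1000, 1040, 990, 1030, 1000, 1045]
glitch = [1000, 1040, 990, 1530, 1000, 1045]  # one mis-detected beat

print(round(rmssd(clean), 1))
print(round(rmssd(glitch), 1))
```

On these made-up numbers the single bad interval lifts RMSSD from roughly 42 ms to roughly 340 ms, because it contaminates the pair on either side of it.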
Our approach uses five gates, applied per 5-minute window:
- Rest gate: ≥80% of records in the window must have motion ≤ 0x40 (the threshold cleanly separates rest from movement — the motion byte is bimodal at 0x3E–0x3F during sleep and 0xBD–0xBE during exercise)
- HR gate: Mean window heart rate must be below a threshold (more on this shortly)
- RR validation: Each RR interval must be within ±15% of the expected value (60000 / bpm) — this rejects peak-detection artefacts
- Consecutive pairs only: The squared difference is only computed between adjacent valid RR values. One bad interval breaks the chain rather than corrupting pairs on either side
- Minimum pairs: At least 5 valid consecutive pairs per window, otherwise the window is discarded
Per-night RMSSD is the mean of all qualifying window RMSSDs. This is essentially motion-gated, quality-filtered HRV — and it turns out to align well with published best practice for PPG-derived metrics.
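The five gates can be sketched in Python as follows. This is an illustration under stated assumptions, not the VESTIGATOR source: the `Record` shape and function names are hypothetical, a second with no RR interval is assumed not to break a chain, and "below a threshold" is read as mean heart rate at or under `HR_MAX`.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

MOTION_REST_MAX = 0x40    # motion byte at or below this counts as rest
REST_FRACTION_MIN = 0.80  # share of records in the window that must be at rest
HR_MAX = 65.0             # bpm ceiling on mean window heart rate (originally 60)
RR_TOLERANCE = 0.15       # RR must be within +/-15% of 60000 / bpm
MIN_PAIRS = 5             # minimum valid consecutive pairs per window

@dataclass
class Record:
    """Hypothetical shape of one decoded 1 Hz record."""
    bpm: int
    rr_ms: Sequence[int]  # zero, one or two RR intervals
    motion: int

def window_rmssd(window: Sequence[Record]) -> Optional[float]:
    """RMSSD for one 5-minute window, or None if any gate rejects it."""
    if not window:
        return None
    # Gate 1: rest.
    resting = sum(1 for rec in window if rec.motion <= MOTION_REST_MAX)
    if resting / len(window) < REST_FRACTION_MIN:
        return None
    # Gate 2: mean heart rate.
    if sum(rec.bpm for rec in window) / len(window) > HR_MAX:
        return None
    # Gates 3 and 4: validate each RR, pair only adjacent valid values.
    squared = []
    prev = None
    for rec in window:
        if rec.bpm <= 0:
            prev = None  # no usable expectation, so break the chain
            continue
        expected = 60000.0 / rec.bpm
        for rr in rec.rr_ms:
            if abs(rr - expected) <= RR_TOLERANCE * expected:
                if prev is not None:
                    squared.append((rr - prev) ** 2)
                prev = rr
            else:
                prev = None  # a bad interval breaks the chain
    # Gate 5: enough pairs.
    if len(squared) < MIN_PAIRS:
        return None
    return math.sqrt(sum(squared) / len(squared))

def nightly_rmssd(windows) -> Optional[float]:
    """Mean of the qualifying window RMSSDs, or None if none qualify."""
    values = [v for v in map(window_rmssd, windows) if v is not None]
    return sum(values) / len(values) if values else None
```

Resetting `prev` on a rejected interval is what keeps one artefact from corrupting the pairs on both sides of it: the chain simply restarts at the next valid value.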
The reference: Oura Ring 4
We're fortunate to have an Oura Ring 4 worn concurrently every night. Oura reports 5-minute RMSSD windows and a nightly aggregate. Published validation studies show the Oura Gen 4 achieves CCC = 0.99 against Polar H10 ECG with MAPE = 5.96% (Dial et al. 2025, Physiological Reports; 13 participants, 536 nights). That's about as good as consumer wearables get, so it's a credible reference — not a gold-standard ECG, but close enough to expose algorithmic problems.
Six nights of data
We pulled all raw records from the research database for 10–15 April: 356,109 joined HR + RR + motion records across six nights. Oura's reported nightly RMSSD against our WHOOP-derived value for the same nights (Δ is WHOOP minus Oura):
| Night | Oura RMSSD (ms) | WHOOP RMSSD (ms) | Δ |
|---|---|---|---|
| Apr 10 | 67.0 | 65.5 | -1.5 |
| Apr 11 | 67.0 | 66.9 | -0.1 |
| Apr 12 | 68.0 | 68.5 | +0.5 |
| Apr 13 | 60.0 | 62.3 | +2.3 |
| Apr 14 | 69.0 | 61.0 | -8.0 |
| Apr 15 | 58.0 | 55.1 | -2.9 |
MAE: 2.5 ms. Mean bias: -1.6 ms (WHOOP reads slightly lower). Pearson r: 0.71.
Five of six nights land within 3 ms of Oura. The outlier is April 14, where WHOOP reads 8 ms below Oura. We'll come back to that.
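The summary statistics can be reproduced from the table with the standard library alone. A minimal sketch follows; the function name is ours, and the Pearson correlation is written out by hand rather than taken from a statistics package.

```python
import math

oura = [67.0, 67.0, 68.0, 60.0, 69.0, 58.0]   # ms, from the table
whoop = [65.5, 66.9, 68.5, 62.3, 61.0, 55.1]  # ms, from the table

def agreement(ref, test):
    """Return (MAE, mean bias, Pearson r) for paired nightly values."""
    n = len(ref)
    diffs = [t - r for r, t in zip(ref, test)]
    mae = sum(abs(d) for d in diffs) / n
    bias = sum(diffs) / n
    mean_ref = sum(ref) / n
    mean_test = sum(test) / n
    cov = sum((r - mean_ref) * (t - mean_test) for r, t in zip(ref, test))
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    var_test = sum((t - mean_test) ** 2 for t in test)
    return mae, bias, cov / math.sqrt(var_ref * var_test)

mae, bias, r = agreement(oura, whoop)
print(f"MAE {mae:.2f} ms, bias {bias:+.2f} ms, r {r:.2f}")
```

With only six nights, every one of these figures moves noticeably if a single night changes, so treat them as descriptive rather than as tight estimates.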
The parameter sweep
The algorithm has several tuneable parameters. We'd assumed the motion gate was the most important — after all, it's the most aggressive filter, discarding any window where more than 20% of records show movement. Were we throwing away good data?
We tested thirteen variants, adjusting one parameter at a time, plus several combinations:
Motion gate (0x40 → 0x80 → 0xA0 → 0xC0): Virtually no effect. Raising the motion threshold from 0x40 to 0x80 produces identical results, because during sleep almost all records are already below the threshold. The gate is doing its job without being aggressive. It's an airbag that never deploys.
Rest percentage (80% → 60% → 50%): Similarly flat. Lowering the fraction of a window that must be "at rest" barely changes the output. The windows that fail the rest gate are genuinely disrupted — movement is clustered, not sprinkled uniformly.
RR tolerance (±15% → ±20% → ±25%): Widening acceptance flips the bias from -1.6 ms to +1.2 ms at ±20%, and to +3.5 ms at ±25%. The current ±15% is well-calibrated — it rejects peak-detection artefacts without discarding physiological variation.
HR ceiling (60 → 65 → 70 bpm): This was the finding. Raising the maximum mean heart rate from 60 to 65 improved MAE from 2.5 to 2.4 ms — a modest gain, but it came from a surprising mechanism. At HR_MAX = 60, we exclude light-sleep windows where heart rate drifts to 61–64 bpm. These windows contain valid HRV data that Oura captures (the ring doesn't gate on heart rate). At 65, we include them. At 70, we pick up too many transitional and REM windows with poor PPG signal, and accuracy collapses (MAE = 4.0, r = 0.21).
Wide open (all gates relaxed): The worst performer. MAE = 5.5 ms, correlation goes negative. More data is not better data.
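A one-parameter-at-a-time sweep like this is easy to express with a frozen dataclass. The sketch below only generates the single-parameter variants named above; the `Params` class and field names are hypothetical, and the combination variants are left out because this write-up does not spell out their settings.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Params:
    """Hypothetical container for the tuneable constants."""
    motion_max: int = 0x40
    rest_fraction: float = 0.80
    rr_tolerance: float = 0.15
    hr_max: float = 60.0

BASELINE = Params()

# Alternative values tried for each parameter, as listed in the text.
SWEEP = {
    "motion_max": [0x80, 0xA0, 0xC0],
    "rest_fraction": [0.60, 0.50],
    "rr_tolerance": [0.20, 0.25],
    "hr_max": [65.0, 70.0],
}

def one_at_a_time(baseline, sweep):
    """Yield copies of the baseline with exactly one field changed."""
    for name, values in sweep.items():
        for value in values:
            yield replace(baseline, **{name: value})

variants = list(one_at_a_time(BASELINE, SWEEP))
for v in variants:
    print(v)
```

Each variant would then be run over the same six nights and scored against Oura, so any change in MAE or bias can be attributed to one constant.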
The honest answer
The algorithm is already near-optimal. The remaining gap (an MAE of 2.4 ms against a sensor that itself has ~6% error versus ECG) is sensor-intrinsic. Here's why:
Finger beats wrist and arm for PPG. The Oura sits on a finger, where the arteries are superficial, perfusion is strong, and there's little soft-tissue motion artefact during sleep. The WHOOP sits on the wrist or upper arm, where perfusion is weaker and there's more muscle movement to corrupt the optical signal. The results in Dial et al. show this gap directly: the finger-worn Oura tracked ECG more tightly (CCC 0.99) than the wrist-worn WHOOP (CCC 0.94).
The published data backs this up. Dial et al. (2025) found WHOOP CCC = 0.94 with MAPE = 8.17% vs ECG, on the wrist. Our ~3.8% MAPE against Oura (which itself has ~6% MAPE against ECG) is consistent with wrist/arm PPG sitting a tier below finger PPG — and no amount of algorithmic tuning will close that physics gap.
About April 14: WHOOP reads 61.0 vs Oura's 69.0, our largest single-night miss. An 8 ms difference on one night is unremarkable: it sits within the night-to-night limits of agreement (roughly ±10 ms) that even the finger-worn Oura shows against ECG in the validation literature. Some nights, the wrist/arm PPG simply doesn't capture the full variability that the finger sees. The body isn't a uniform signal source.
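For readers unfamiliar with limits of agreement: they are conventionally computed Bland-Altman style, as the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below applies that to our own six nights purely to show the calculation. Six nights is far too few for stable limits, and the reference here is Oura rather than ECG, so the output is not comparable to the published figures.

```python
from statistics import mean, stdev

oura = [67.0, 67.0, 68.0, 60.0, 69.0, 58.0]   # ms, from the table
whoop = [65.5, 66.9, 68.5, 62.3, 61.0, 55.1]  # ms, from the table

def limits_of_agreement(ref, test, z=1.96):
    """Return (lower limit, bias, upper limit) of test minus ref."""
    diffs = [t - r for r, t in zip(ref, test)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation (n - 1)
    return bias - spread, bias, bias + spread

lower, bias, upper = limits_of_agreement(oura, whoop)
print(f"bias {bias:+.1f} ms, limits {lower:+.1f} to {upper:+.1f} ms")
```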
What we changed
One constant: HR_MAX from 60 to 65. Everything else stays. The improvement is marginal in aggregate but philosophically important: we were excluding valid light-sleep data for no good reason. The WHOOP doesn't gate on heart rate when recording, so our analysis shouldn't gate any more tightly than signal quality demands.
What we can't change
To meaningfully improve WHOOP-derived HRV beyond where we are, we'd need raw PPG waveform access — sub-millisecond optical data for custom peak detection, rather than the 1 Hz summary records we currently get.
We went looking for this. We tried the realtime commands and watched what came back, and the data told an awkward truth: the realtime IMU commands stream inertial data — accelerometer and gyroscope — not raw PPG. On our GEN_4 strap that command is 0x6A (TOGGLE_IMU_MODE); the equivalent on the newer MG/5.0 hardware is 0x3F (SEND_R10_R11_REALTIME). Either way the records carry 6-axis inertial samples plus a single HR byte, used by the StrengthTrainer feature for rep counting. We'd assumed it was optical data based on the battery-drain profile; it isn't.
The actual raw PPG command is 0x6B (ENABLE_OPTICAL_DATA), but on our GEN_4 strap it simply does nothing: the write is accepted and no optical stream follows. Raw-optical streaming isn't a capability the 4.0 hardware exposes; it's a Maverick/Goose feature. There's also a Labs command (0x51, START_RAW_DATA) gated behind a WHOOP Labs campaign enrolment, but on GEN_4 it probably streams IMU data only.
So on our hardware, three potential paths all lead to dead ends:
- 0x3F / 0x6A — IMU, not optical. Useful for motion artefact rejection, not for IBI timing
- 0x6B — raw PPG, but Maverick/Goose hardware only. Not available on GEN_4
- 0x51 — Labs research command (START_RAW_DATA), gated behind campaign enrolment, probably IMU-only on GEN_4
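For reference, the command bytes discussed in this section can be collected into one lookup. The byte values and command names are the ones given above; the `WhoopCommand` class and the notes table are our own illustrative naming, not anything taken from WHOOP firmware or an SDK.

```python
from enum import IntEnum

class WhoopCommand(IntEnum):
    """Command bytes named in this post (class name is hypothetical)."""
    SEND_R10_R11_REALTIME = 0x3F  # realtime IMU on MG/5.0 hardware
    START_RAW_DATA = 0x51         # Labs command, needs campaign enrolment
    TOGGLE_IMU_MODE = 0x6A        # realtime IMU on GEN_4
    ENABLE_OPTICAL_DATA = 0x6B    # raw PPG, Maverick/Goose only

# What each command gives us on a GEN_4 strap, per the findings above.
GEN4_OUTCOME = {
    WhoopCommand.SEND_R10_R11_REALTIME: "IMU, not optical (MG/5.0 equivalent of 0x6A)",
    WhoopCommand.TOGGLE_IMU_MODE: "IMU, not optical",
    WhoopCommand.ENABLE_OPTICAL_DATA: "write accepted, no optical stream",
    WhoopCommand.START_RAW_DATA: "gated; probably IMU-only",
}

for cmd, outcome in GEN4_OUTCOME.items():
    print(f"0x{cmd.value:02X} {cmd.name}: {outcome}")
```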
The remaining theoretical improvements (signal quality indices per beat, multi-wavelength fusion) are also blocked. The 0x2F records don't carry per-beat confidence scores, and the SpO2 red/IR raw ADC fields are session-relative (the red LED auto-gain shifts between nights, rendering cross-session ratiometric analysis unreliable).
The gap is hardware-intrinsic. No amount of software will extract sub-millisecond IBI timing from a 1 Hz summary record on a device that won't expose its raw optical sensor data over BLE.
The bottom line
From 93-byte buffered PPG summary records, captured passively overnight with zero battery impact, our motion-gated RMSSD tracks Oura Ring 4 to within 2.4 ms MAE and correlates at r = 0.71 over six nights. The published literature says that's about as good as wrist/arm PPG can do against finger PPG, and both are within clinical utility for trend tracking.
The WHOOP is a good enough HRV sensor at rest, and the Oura is a more accurate one. Neither replaces an ECG for single-night decisions. What matters is the trend, and both devices agree on which direction it's moving.
This is part 5 of "Hacking Bluetooth Sports Tech for Fun and Profit." The algorithm runs in both Python (VESTIGATOR) and Rust (Promus), using identical constants. All data captured from personally-owned devices on a Raspberry Pi 5 research platform.