What Apple Watch sleep stages actually measure — and what they miss.
I've worn an Apple Watch for years. I think it's a remarkable device for what it does at its price. The sleep tracking is honest, generally accurate, and useful once you understand what it's measuring.
I want to walk through what's actually happening when your watch tells you "you spent 1h 14m in REM last night," because the gap between what the watch sees and what a sleep lab sees is wider than most people realize, and once you know where the gap is, the score makes more sense.
This isn't a knock on Apple. They built a great device. It's just that wrist tracking has physical limits, and being clear about those limits helps you read your own data more accurately.
What the watch is physically measuring
Two channels do most of the work, and two more add context on newer models:
- PPG (photoplethysmography): green LEDs on the underside of the watch shine into the skin, and photodiodes measure the reflected light. The reflection changes with blood volume in the small vessels under the watch. From that, the device infers heart rate beat by beat, and from the beat-to-beat variation it infers HRV. It's the same underlying principle as a fingertip pulse oximeter, except that the watch reads light reflected at the wrist while a fingertip clip shines light through the finger.
- Accelerometer — measures motion in three axes. From the pattern and frequency of your wrist movement, the device infers stillness vs activity, and at sleep time, restless movements vs settled stretches.
- Wrist temperature (newer models) — a small temperature sensor that tracks deviations from your nightly baseline.
- Optional blood-oxygen samples — periodic SpO₂ checks during the night, used in the more recent breathing-disturbance flagging feature.
From these four signals — none of which directly measure sleep stages — a model running on the watch (or on the phone after sync) infers what stage you were probably in at any given minute. The inference is the part most people don't think about.
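To make the "heart rate, then HRV" step concrete, here is a minimal sketch of how both numbers can be derived from beat-to-beat intervals. The interval values are made up for illustration, and RMSSD is just one common HRV statistic that I'm assuming here; it is not necessarily the one the watch reports.

```python
import math

def heart_rate_bpm(ibis_ms):
    """Mean heart rate in beats per minute from inter-beat intervals (ms)."""
    mean_ibi = sum(ibis_ms) / len(ibis_ms)
    return 60_000 / mean_ibi

def rmssd_ms(ibis_ms):
    """RMSSD: root mean square of successive differences between beats (ms)."""
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical inter-beat intervals in milliseconds (illustrative only).
ibis = [1000, 1040, 980, 1020, 960]

print(round(heart_rate_bpm(ibis), 1))  # 60.0
print(round(rmssd_ms(ibis), 1))        # 51.0
```

Both numbers come from the same list of intervals: heart rate from their average, HRV from how much neighbouring intervals differ.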
How the watch gets from PPG + motion to "sleep stage"
The model uses pattern recognition from a large training dataset where wrist signals were paired with simultaneous lab-measured sleep stages from polysomnography. The training set teaches the model: when heart rate looks like X, HRV looks like Y, and motion looks like Z, the lab probably scored this minute as REM. When all of those look like A/B/C, it was probably deep sleep. And so on.
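As a toy version of "when the signals look like X/Y/Z, it was probably stage S", the sketch below uses a nearest-centroid rule. This is a stand-in I picked for simplicity, not the model any watch actually runs. The centroid values, the scales, and the name classify_minute are all invented for illustration. A real model would also use the surrounding minutes and output probabilities, not a single hard label.

```python
import math

# Hypothetical per-stage "typical" feature values a training set might yield.
# Features: (heart rate in bpm, HRV in ms, motion in arbitrary units).
# These numbers are invented for illustration, not physiological reference values.
CENTROIDS = {
    "wake":  (70.0, 30.0, 5.0),
    "light": (60.0, 45.0, 1.0),
    "deep":  (52.0, 60.0, 0.0),
    "rem":   (64.0, 35.0, 0.5),
}

# Rough per-feature scales so no single feature dominates the distance (assumed).
SCALES = (10.0, 15.0, 2.0)

def classify_minute(features):
    """Return the stage whose centroid is nearest in scaled feature space."""
    def distance(centroid):
        return math.sqrt(sum(((f - c) / s) ** 2
                             for f, c, s in zip(features, centroid, SCALES)))
    return min(CENTROIDS, key=lambda stage: distance(CENTROIDS[stage]))

print(classify_minute((53.0, 58.0, 0.1)))  # deep
print(classify_minute((72.0, 28.0, 6.0)))  # wake
```

The takeaway is structural: the label comes from matching indirect signals against learned patterns, so two minutes with similar heart rate and motion get the same label whatever the brain was doing.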
The result is a probabilistic estimate, minute by minute. It's good — accuracy against PSG is typically 70–80% on broad stage classifications (wake / light / deep / REM) for well-trained models, which is impressive given the indirect inputs. But it's not a measurement; it's an inference. The watch is making an educated guess based on signals that are correlated with stages, not signals that define them.
To compare: a sleep lab puts EEG electrodes on your scalp. EEG is the definition of sleep stages — the stages are literally defined by the brain wave patterns the EEG sees. The watch is one or two layers of inference removed from the actual signal. That distance is why agreement is in the 70–80% range, not 99%.
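For a sense of what "70–80% agreement" means in practice, here is a minimal sketch of an epoch-by-epoch comparison between watch labels and lab labels. The ten-epoch night is hypothetical, and the function names are mine. Published validation studies typically report more than raw agreement, so treat this as the simplest possible version.

```python
from collections import Counter

def epoch_agreement(watch, lab):
    """Fraction of epochs where the watch label equals the lab label."""
    if len(watch) != len(lab):
        raise ValueError("sequences must cover the same epochs")
    matches = sum(w == l for w, l in zip(watch, lab))
    return matches / len(lab)

def per_stage_agreement(watch, lab):
    """For each lab-scored stage, the fraction of its epochs the watch matched."""
    totals = Counter(lab)
    hits = Counter(l for w, l in zip(watch, lab) if w == l)
    return {stage: hits[stage] / totals[stage] for stage in totals}

# Hypothetical ten-epoch night (illustrative labels only).
lab   = ["wake", "light", "light", "deep", "deep", "deep", "light", "rem", "rem", "wake"]
watch = ["wake", "light", "deep", "deep", "deep", "light", "light", "rem", "light", "wake"]

print(epoch_agreement(watch, lab))  # 0.7
print(per_stage_agreement(watch, lab))
```

In this made-up night the overall figure is 70%, but wake is matched every time while REM is matched only half the time. A single headline percentage can hide that kind of unevenness across stages.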
What the watch is good at
Several things, in our experience and in the literature:
- Sleep / wake distinction. Reliable. The watch is very good at telling whether you were asleep or awake, which is the first and most consequential boundary.
- Total sleep time. Generally accurate to within 15–20 minutes against lab measurement.
- Trends over weeks. The week-over-week changes in your "deep sleep" or "REM" estimates are usually directionally right, even when the absolute minutes are off. Useful for noticing whether something that changed in your life is also changing your sleep.
- HRV trend. Real, well-studied, useful at the multi-week scale.
- Breathing-disturbance flagging (recent generations) — when oxygen variability crosses a threshold, the watch flags it. Real signal, conservative threshold to avoid false alarms.
Where the watch necessarily falls short
Three places, all of them limits of the wrist form factor rather than software issues:
- Per-event resolution. The watch tells you you spent X minutes in REM. It can't easily tell you "your REM got chopped into 4 short fragments by breathing arousals." The minute-by-minute classification smooths over the micro-arousals that actually fragment the architecture.
- Acoustic events. The watch can't hear. Snoring, gasping, the silent stretches of a paused airway — none of these reach the wrist as signals the watch can flag. They might show up indirectly through heart rate or oxygen drops, but the events themselves are invisible to it.
- Compliance. Honest founder note: I sometimes take my watch off at night without realizing. Half-asleep brain, irritation under the strap, a bad day. When that happens the data is just gone for that night. A phone on the nightstand doesn't have this problem because you're not wearing it.
None of these are failures. They're the cost of putting a sensor on your wrist and asking it to do everything from heart rhythm to fall detection to stages. The breadth is part of why the watch is so useful. The breadth is also why the depth in any single domain is bounded.
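The per-event resolution point is easier to see with a small example. The sketch below counts uninterrupted runs of a stage in a minute-by-minute label sequence. Both nights are hypothetical, and stage_bouts is a name I made up. An arousal shorter than one epoch would not appear in the labels at all, so this only shows what a total-minutes figure hides even when the interruptions are long enough to be scored.

```python
from itertools import groupby

def stage_bouts(hypnogram, stage):
    """Lengths (in epochs) of each uninterrupted run of `stage`."""
    return [len(list(run)) for label, run in groupby(hypnogram) if label == stage]

# Two hypothetical nights, one label per minute, illustrative only.
# Both contain 12 minutes of REM, but the second is broken up by brief wakes.
consolidated = ["light"] * 3 + ["rem"] * 12 + ["light"] * 3
fragmented = (["light"] * 3 + ["rem"] * 3 + ["wake"] + ["rem"] * 3 + ["wake"]
              + ["rem"] * 3 + ["wake"] + ["rem"] * 3)

for name, night in [("consolidated", consolidated), ("fragmented", fragmented)]:
    bouts = stage_bouts(night, "rem")
    print(name, "total:", sum(bouts), "bouts:", bouts)
# consolidated total: 12 bouts: [12]
# fragmented total: 12 bouts: [3, 3, 3, 3]
```

Same REM total, very different nights. A summary that reports only the total cannot tell them apart.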
How to read your watch's stages, in practice
Three rules I'd give if you wanted to take your watch's sleep data more seriously without overreading it:
- Read trends over single nights. One night of "low deep sleep" is noise. Two weeks of declining deep sleep is signal.
- Don't optimize on the absolute minutes. Whether your REM is "59 minutes" or "78 minutes" on any given night is too noisy a number. The shape and trend of the bar over weeks is more useful than the digit.
- Cross-reference when something feels off. If your watch says "good night" and your morning says otherwise, that's the case where adding a different sensor (a phone listening, a blood pressure cuff, anything) tends to surface the missing layer.
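The first two rules can be sketched in a few lines: compare weekly averages instead of reacting to any one night. The nightly values below are hypothetical deep-sleep estimates, and weekly_means is my own helper, not something the watch or the Health app exposes.

```python
from statistics import mean

def weekly_means(nightly_minutes):
    """Average nightly values in consecutive 7-night blocks (partial weeks dropped)."""
    full_weeks = len(nightly_minutes) // 7
    return [mean(nightly_minutes[i * 7:(i + 1) * 7]) for i in range(full_weeks)]

# Hypothetical deep-sleep estimates in minutes for 14 nights (illustrative only).
deep_minutes = [62, 48, 70, 55, 66, 51, 68,   # week 1
                50, 41, 58, 44, 52, 39, 52]   # week 2

week1, week2 = weekly_means(deep_minutes)
swing_week1 = max(deep_minutes[:7]) - min(deep_minutes[:7])

print("weekly averages:", week1, week2)
print("night-to-night swing within week 1:", swing_week1)
```

In this made-up data the weekly average drops from 60 to 48 minutes, while single nights within week 1 alone differ by 22 minutes. The week-level shift is the signal worth noticing. Any one night could land almost anywhere in that range.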
Where SomniSense fits
If you have a watch and you wonder whether you also need a phone-based monitor, the honest answer is: probably not, unless something specific is missing. The watch covers a wide angle. SomniSense covers one narrow thing, listening to breathing, that the watch can't do from where it sits. If your watch reports a sleep score that doesn't match how you feel, or your partner has mentioned something, or you have cardiovascular concerns that make your breathing pattern especially relevant, then the audio layer fills that specific gap.
For most people, the right answer is: keep your watch. It's doing real work. Add SomniSense when you have a specific question the watch can't answer. The watch on your wrist and the phone on your nightstand, reading the same night, give you more than either alone.
If this is the kind of writing you'd want more of, drop your email. I'll send one note when SomniSense is downloadable. No marketing list, no second email unless you ask.
First 7 days free at launch · then $7.99/mo or $49.99/yr.