RESEARCH · VALIDATION · OPEN METHODOLOGY

The research behind SomniSense.

SomniSense is a consumer wellness app. It's also the product end of a research program I've been working on for years — a validation study against in-lab sleep recordings, a small model that runs entirely on your phone, and a short portfolio of preprints describing how it works. This page is the research, laid out plainly. If you want the consumer version — the specific numbers, the "argue with me" framing — that's on /accuracy. This page is the why and the how underneath those numbers.

  • 80 PSG-paired nights: 40 participants · 10 in-lab + 70 ambulatory
  • 16,491 labeled audio samples: 13,538 snore + 2,953 breathing-window
  • < 60 KB on-device model: 0.064 ms / inference · Apple Neural Engine
  • 3 arXiv preprints: cs.LG · eess.AS · forthcoming

What we set out to test

The question underneath everything SomniSense does is narrow and answerable: can a phone on your nightstand, using only its microphone, hear the same breathing irregularities and snoring that a clinical sleep recording captures with sensors strapped to your body?

Not "as well as." Not "instead of." Just: close enough, often enough, that the per-hour pattern it shows you the next morning is real and not noise. That's the whole research question. Everything below is how we tried to answer it honestly.

The validation study

[Image: a handwritten notebook page reading '80 paired nights · PSG vs phone, blinded scoring · published openly,' with a coffee cup beside it.]

Detection performance was measured against the clinical reference for sleep recording — in-lab and ambulatory polysomnography (PSG) — on nights where a phone and the PSG setup recorded the same person, on the same night, at the same time.

  • Sample. 80 paired nights across 40 participants: 10 in-laboratory PSG nights and 70 ambulatory PSG nights with a nasal-airflow cannula. Adults, with and without diagnosed sleep-breathing concerns. The cohort excludes under-18s and underrepresents BMI extremes and certain demographics, and we say so rather than hide it behind a big aggregate number.
  • Recording medium. An ordinary bedside smartphone (a mix of iPhone and Android models from 2018 onward), 50–90 cm from the participant's head. Not lab hardware. The point was to test what users actually have.
  • Ground truth. The PSG audio was scored by AASM-trained sleep technicians who were blinded to what SomniSense had said about each segment. Blinded scoring is the part that keeps the test from being circular — the scorers don't get to see our answer first.
  • Agreement test. Per-event sensitivity and precision, plus a per-night Bland–Altman comparison of SomniSense's per-hour rate against the PSG-scored per-hour rate. The headline there: agreement within ±5 events/hour on 87% of nights with a PSG-scored rate of 30 events/hour or lower.
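
For readers who want to see what those agreement figures mean mechanically, here is a minimal Python sketch. It is not the study's analysis code. The function names, the 1.96 multiplier for the limits of agreement, and the sample rates are all assumptions for illustration; only the ±5 events/hour tolerance and the 30 events/hour cutoff come from the text above.

```python
from statistics import mean, stdev


def sensitivity_precision(tp, fp, fn):
    """Per-event sensitivity and precision from matched-event counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


def bland_altman(app_rates, psg_rates, tolerance=5.0, max_psg_rate=30.0):
    """Per-night agreement between app and PSG rates (events/hour).

    Only nights whose PSG-scored rate is <= max_psg_rate are compared.
    """
    if len(app_rates) != len(psg_rates):
        raise ValueError("each app night needs a paired PSG night")
    pairs = [(a, p) for a, p in zip(app_rates, psg_rates) if p <= max_psg_rate]
    if len(pairs) < 2:
        raise ValueError("need at least two qualifying nights")
    diffs = [a - p for a, p in pairs]
    bias = mean(diffs)                # average over- or under-estimate
    spread = 1.96 * stdev(diffs)      # half-width of the limits of agreement
    within = sum(abs(d) <= tolerance for d in diffs) / len(diffs)
    return {
        "bias": bias,
        "limits_of_agreement": (bias - spread, bias + spread),
        "fraction_within_tolerance": within,
    }


# Made-up illustrative rates, NOT study data.
app = [6.0, 12.5, 21.0, 3.0, 30.0]
psg = [5.0, 10.0, 28.0, 4.0, 27.0]
result = bland_altman(app, psg)
```

The "87% of nights" headline corresponds to the `fraction_within_tolerance` value in this sketch, computed over the real paired nights rather than the placeholder lists.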

The full number-by-number breakdown — snore sensitivity, snore precision, breathing-event accuracy, the bootstrap ranges — lives on /accuracy, written for someone who wants to scrutinize each figure.

How the system actually works

SomniSense isn't one model. It's a cascade of two, and the structure is the part I'm most proud of — because it's what makes a real-time sleep model small enough to live on a phone instead of in a data center.

  • Stage 1 — a short-window listener. A small CNN looks at one second of audio at a time and asks a single question: was that a snore? It does this every second, all night.
  • Stage 2 — a long-window reader. Those per-second answers, together with two simple loudness features, are assembled into a compact 200 × 3 grid: 600 numbers describing a 200-second stretch of your night. A second CNN reads that grid and decides whether the stretch contained a breathing irregularity.

The trick is that intermediate grid. Most audio models carry a heavy spectrogram around, megabytes per window. By compressing each window down to 600 numbers before the second stage ever sees it, the whole pipeline shrinks to a model under 60 KB that runs in 0.064 ms on an Apple Neural Engine. That's what "on-device, no cloud" actually requires under the hood.
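
To make the shape of that grid concrete, here is a rough Python sketch of how three per-second series could be stacked into 200 × 3 windows. It is an illustration under assumptions, not the shipping pipeline: the function names are hypothetical, the two loudness features are left unnamed because this page does not specify them, and the non-overlapping window hop is my simplification.

```python
GRID_SECONDS = 200  # from the text: one grid covers a 200-second stretch
N_FEATURES = 3      # Stage 1 snore score + two loudness features


def build_grid(snore_scores, loudness_a, loudness_b):
    """Stack three per-second series into a 200 x 3 grid (list of rows)."""
    series = (snore_scores, loudness_a, loudness_b)
    if any(len(s) != GRID_SECONDS for s in series):
        raise ValueError(f"each series must have {GRID_SECONDS} values")
    return [[float(s), float(a), float(b)] for s, a, b in zip(*series)]


def night_to_grids(snore_scores, loudness_a, loudness_b):
    """Split a night into consecutive grids (non-overlapping hop is assumed)."""
    n = min(len(snore_scores), len(loudness_a), len(loudness_b))
    grids = []
    for start in range(0, n - GRID_SECONDS + 1, GRID_SECONDS):
        end = start + GRID_SECONDS
        grids.append(
            build_grid(
                snore_scores[start:end],
                loudness_a[start:end],
                loudness_b[start:end],
            )
        )
    return grids


# Placeholder values for a 10-minute recording (600 seconds), not real audio.
scores = [0.1] * 600
loud_a = [0.2] * 600
loud_b = [0.3] * 600
grids = night_to_grids(scores, loud_a, loud_b)
values_per_grid = sum(len(row) for row in grids[0])
```

Each grid holds 200 × 3 = 600 values. If they were stored as 4-byte floats (an assumption; the numeric format isn't stated here), that is 2,400 bytes per window handed to Stage 2.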

Certain on-device system details — the gating and inference-triggering logic that surrounds the two stages — are covered by a pending U.S. provisional patent application by SomniAI LLC and are not described in the preprints.

The preprint portfolio

Three companion preprints document the methodology in full. They're forthcoming on arXiv (categories cs.LG and eess.AS) and have not yet been peer-reviewed — I'd rather tell you they're preprints than imply a journal stamp they don't have yet. When each goes live, the link below becomes the real arXiv identifier.

  1. Paper A: Two CNN Baselines for Smartphone-Based Sleep Audio Detection

    The two CNN baselines and the cascade structure, validated under a multi-seed bootstrap so the numbers aren't a single lucky split.

    Yang L. (SomniAI LLC) · cs.LG, eess.AS · arXiv preprint, forthcoming
  2. Paper C: Coordinate Attention for 1D Audio-Based Sleep Apnea Detection

    A one-dimensional Coordinate-Attention design for the breathing-event stage — a 93% parameter reduction over the plain baseline, studied across multiple seeds.

    Yang L. (SomniAI LLC) · cs.LG, eess.AS · arXiv preprint, forthcoming
  3. Paper E: On-Device Compression for Sleep Apnea Detection

    How the model gets from a research-sized network down to the under-60 KB version that ships — quantization-aware training plus structured pruning, without losing accuracy.

    Yang L. (SomniAI LLC) · cs.LG, eess.AS · arXiv preprint, forthcoming
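
This page doesn't spell out how the preprints define the multi-seed bootstrap, so the Python sketch below is only one plausible reading: a percentile bootstrap over a list of scores, repeated with several random seeds to check that the range is stable. The function name, the resample count, and the example scores are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # seeded so each run is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]


# Hypothetical per-split scores, NOT results from the preprints.
scores = [0.88, 0.91, 0.86, 0.90, 0.89, 0.92, 0.87, 0.90]
intervals = [bootstrap_ci(scores, seed=s) for s in range(5)]
```

If the intervals from different seeds roughly coincide, the reported range is not an artifact of one particular draw, which is the point of not relying on a single lucky split.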

What this research is — and isn't

The validation tells you the acoustic estimate is close to the clinical reference, often. It does not turn a wellness app into a medical one, and I won't pretend it does.

  • Not a diagnosis. SomniSense reports a Breathing Irregularity Index (BRI), an events-per-hour figure with the same form as the apnea-hypopnea index (AHI) clinicians use, but computed from microphone audio rather than from PSG's airflow, oxygen, and arousal channels. It's an acoustic estimator of that metric, not a clinical AHI and not a diagnosis of obstructive sleep apnea.
  • Not a replacement for a sleep study. A diagnosis still needs polysomnography, a daytime symptom assessment, and a sleep specialist's judgment. If your BRI runs above 15 consistently, that's a clinic conversation — we just hand you organized data to bring to it.
  • Not validated for under-18. The cohort was adults only.
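
As a small illustration of what a per-hour index is arithmetically, here is a Python sketch. It is not the app's code. Using recorded time as the denominator, and reading "consistently" as "each of the last three nights", are assumptions of mine; only the threshold of 15 comes from the text above.

```python
def events_per_hour(event_count, recorded_seconds):
    """Convert a night's event count into an events-per-hour rate."""
    if recorded_seconds <= 0:
        raise ValueError("recorded_seconds must be positive")
    return event_count * 3600.0 / recorded_seconds


def consistently_above(nightly_rates, threshold=15.0, min_nights=3):
    """True if each of the most recent min_nights rates exceeds threshold.

    The three-night window is an illustrative assumption, not a clinical rule.
    """
    recent = nightly_rates[-min_nights:]
    return len(recent) == min_nights and all(r > threshold for r in recent)


# 120 detected events over an 8-hour recording.
rate = events_per_hour(120, 8 * 3600)
```

A flag from a check like this is a prompt to talk to a clinic, in line with the limits listed above. It is not a result in itself.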

Want the numbers, not the narrative?

Every metric, every bootstrap range, every honest limitation is on the accuracy page — built so you can argue with it. Or, if you've read enough, the first 7 days of Pro are free.

Join the waitlist · See the full accuracy breakdown →