RESEARCH · VALIDATION · OPEN METHODOLOGY

The research behind SomniSense.

SomniSense is a consumer wellness app. It's also the product end of a research program I've been working on for years — a validation study against in-lab sleep recordings, a small model that runs entirely on your phone, and a short portfolio of preprints describing how it works. This page is the research, laid out plainly. If you want the consumer version — the specific numbers, the "argue with me" framing — that's on /accuracy. This page is the why and the how underneath those numbers.

80 PSG-paired nights
40 participants · 10 in-lab + 70 ambulatory

16,491 labeled audio samples
13,538 snore + 2,953 breathing-window

< 60 KB on-device model
0.064 ms / inference · Apple Neural Engine

Open preprints
Zenodo DOIs · cs.LG · eess.AS

What we set out to test

The question underneath everything SomniSense does is narrow and answerable: can a phone on your nightstand, using only its microphone, hear the same breathing irregularities and snoring that a clinical sleep recording captures with sensors strapped to your body?

Not "as well as." Not "instead of." Just: close enough, often enough, that the per-hour pattern it shows you the next morning is real and not noise. That's the whole research question. Everything below is how we tried to answer it honestly.

The validation study

[Image: a handwritten notebook page reading "80 paired nights · PSG vs phone, blinded scoring · published openly"]

Detection performance was measured against the clinical reference for sleep recording — in-lab and ambulatory polysomnography (PSG) — on nights where a phone and the PSG setup recorded the same person, on the same night, at the same time.

Sample. 80 paired nights across 40 participants: 10 in-laboratory PSG nights and 70 ambulatory PSG nights with a nasal-airflow cannula. All participants were adults, with and without diagnosed sleep-breathing concerns. The cohort includes no one under 18 and underrepresents some groups (BMI extremes, certain demographics); in the interest of honesty, we say so rather than hide it behind a big aggregate number.
Recording medium. An ordinary bedside smartphone (a mix of iPhone and Android models from 2018 onward), 50–90 cm from the participant's head. Not lab hardware. The point was to test what users actually have.
Ground truth. The PSG audio was scored by AASM-trained sleep technicians who were blinded to what SomniSense had said about each segment. Blinded scoring is the part that keeps the test from being circular — the scorers don't get to see our answer first.
Agreement test. Per-event sensitivity and precision, plus a per-night Bland–Altman comparison of SomniSense's per-hour rate against the PSG-scored per-hour rate. The headline there: agreement within ±5 events/hour on 87% of nights — a preliminary system-level result; the full Bland–Altman analysis, including across severity ranges, is to be documented in a forthcoming preprint.
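
As a rough illustration of the agreement test described above, here is a minimal Python sketch of the three quantities involved: per-event sensitivity and precision, the Bland–Altman bias with its 95% limits of agreement, and the share of nights that land within ±5 events/hour. The function names and the sample numbers are made up for illustration; they are not study data, and this is not the analysis code behind the published figures.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def sensitivity_precision(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Per-event sensitivity (recall) and precision from matched event counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


def bland_altman(app_rate: Sequence[float], psg_rate: Sequence[float],
                 tolerance: float = 5.0) -> Dict[str, float]:
    """Per-night agreement between two per-hour rates (app minus PSG)."""
    diffs = [a - p for a, p in zip(app_rate, psg_rate)]
    bias = mean(diffs)          # average over- or under-estimate
    sd = stdev(diffs)           # sample standard deviation of the differences
    within = sum(1 for d in diffs if abs(d) <= tolerance) / len(diffs)
    return {
        "bias": bias,
        "loa_low": bias - 1.96 * sd,    # lower 95% limit of agreement
        "loa_high": bias + 1.96 * sd,   # upper 95% limit of agreement
        "within_tolerance": within,     # share of nights within +/- tolerance
    }


# Hypothetical per-hour rates for five nights (illustration only).
psg = [4.0, 12.0, 22.0, 31.0, 8.0]
app = [5.0, 10.0, 29.0, 30.0, 9.0]
stats = bland_altman(app, psg)
sens, prec = sensitivity_precision(tp=80, fp=10, fn=20)
```

In this toy example four of the five nights fall within ±5 events/hour; the third night, where the app overshoots by 7, is the kind of case the severity-range analysis would need to examine.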

The full number-by-number breakdown — snore sensitivity, snore precision, breathing-event accuracy, the bootstrap ranges — lives on /accuracy, written for someone who wants to scrutinize each figure.

How the system actually works

SomniSense isn't one model. It's a cascade of two, and the structure is the part I'm most proud of — because it's what makes a real-time sleep model small enough to live on a phone instead of in a data center.

Stage 1: a short-window listener. A small CNN looks at one second of audio at a time and asks a single question: was that a snore? It does this every second, all night.
Stage 2: a long-window reader. Those per-second answers, together with two simple loudness features, get assembled into a compact 200 × 3 grid: 600 numbers describing a 200-second stretch of your night. A second CNN reads that grid and decides whether the stretch contained a breathing irregularity.
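
To make the data flow between the two stages concrete, here is a minimal Python sketch of how a 200 × 3 grid could be assembled and handed to a second stage. Everything named here is an assumption for illustration: the two loudness features are not named on this page (RMS and peak are stand-ins), and `toy_stage1` and `toy_stage2` are trivial placeholders for the two CNNs, not the shipped models.

```python
import math
from typing import Callable, List, Sequence, Tuple

WINDOW_SECONDS = 200  # grid rows: one per second of the 200-second stretch
GRID_COLUMNS = 3      # stage-1 snore score + two loudness features


def loudness_features(chunk: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical loudness pair (RMS and peak); the real features are not specified."""
    rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
    peak = max(abs(x) for x in chunk)
    return rms, peak


def build_grid(chunks: Sequence[Sequence[float]],
               stage1: Callable[[Sequence[float]], float]) -> List[List[float]]:
    """Turn 200 one-second audio chunks into a 200 x 3 grid of numbers."""
    if len(chunks) != WINDOW_SECONDS:
        raise ValueError(f"expected {WINDOW_SECONDS} one-second chunks")
    grid = []
    for chunk in chunks:
        rms, peak = loudness_features(chunk)
        grid.append([stage1(chunk), rms, peak])  # one row per second
    return grid


def run_cascade(chunks, stage1, stage2) -> bool:
    """Stage 2 only ever sees the compact grid, never the raw audio."""
    return stage2(build_grid(chunks, stage1))


# Toy stand-ins for the two CNNs (assumptions, not the real models).
def toy_stage1(chunk: Sequence[float]) -> float:
    return 1.0 if max(abs(x) for x in chunk) > 0.5 else 0.0


def toy_stage2(grid: List[List[float]]) -> bool:
    return sum(row[0] for row in grid) / len(grid) > 0.3


quiet = [0.01] * 8
loud = [0.9, -0.9] * 4
chunks = [loud if i % 2 == 0 else quiet for i in range(WINDOW_SECONDS)]
grid = build_grid(chunks, toy_stage1)
flagged = run_cascade(chunks, toy_stage1, toy_stage2)
```

The design point the sketch tries to show is the narrow interface: whatever the first stage does internally, the second stage receives only 200 rows of three numbers.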

The trick is that intermediate grid. Most audio models carry a heavy spectrogram around, often megabytes per window. By compressing each window down to 600 numbers before the second stage ever sees it, the whole pipeline shrinks to a model under 60 KB that runs in 0.064 ms per inference on an Apple Neural Engine. That's what "on-device, no cloud" actually requires under the hood.
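
For a sense of scale, the arithmetic can be sketched in a few lines of Python. The grid size comes from the text above; the spectrogram settings (100 frames per second, 64 mel bins, 32-bit floats) are assumed, commonly used values chosen only for comparison and are not taken from the SomniSense pipeline.

```python
BYTES_PER_FLOAT32 = 4

# Intermediate grid: 200 seconds x 3 values per second.
grid_values = 200 * 3
grid_bytes = grid_values * BYTES_PER_FLOAT32

# Assumed comparison: a log-mel spectrogram over the same 200-second window.
FRAMES_PER_SECOND = 100  # assumption: 10 ms hop
MEL_BINS = 64            # assumption
spectrogram_values = 200 * FRAMES_PER_SECOND * MEL_BINS
spectrogram_bytes = spectrogram_values * BYTES_PER_FLOAT32

ratio = spectrogram_bytes / grid_bytes
print(f"grid: {grid_bytes} bytes, spectrogram: {spectrogram_bytes} bytes, ratio: {ratio:.0f}x")
```

Under those assumptions the grid is 2,400 bytes against roughly 5 MB for the spectrogram, a reduction of about three orders of magnitude in what the second stage has to read.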

Certain on-device system details — the gating and inference-triggering logic that surrounds the two stages — are covered by a pending U.S. provisional patent application by SomniAI LLC and are not described in the preprints.

The papers

The complete two-stage cascade is written up as one paper — a Research Square preprint with a citable DOI. Underneath it, three companion preprints document each piece in full — published openly on Zenodo, each with its own DOI (categories cs.LG and eess.AS). None have been peer-reviewed — so I'll call them preprints, not papers with a journal stamp. The DOI links are below.

Paper A: A Cascaded Two-Stage CNN Pipeline for Smartphone Sleep-Audio Detection
The two CNN baselines and the cascade structure, validated under a multi-seed bootstrap so the numbers aren't a single lucky split.
Yang L. (SomniAI LLC) · cs.LG, eess.AS · Zenodo · doi:10.5281/zenodo.20662374

Paper C: Coordinate Attention for 1D Audio-Based Sleep Apnea Detection
A one-dimensional Coordinate-Attention design for the breathing-event stage, with a 93% parameter reduction over the plain baseline, studied across multiple seeds.
Yang L. (SomniAI LLC) · cs.LG, eess.AS · Zenodo · doi:10.5281/zenodo.20663376

Paper E: On-Device Compression for Sleep Apnea Detection
How the model gets from a research-sized network down to the under-60 KB version that ships: quantization-aware training plus structured pruning, without losing accuracy.
Yang L. (SomniAI LLC) · cs.LG, eess.AS · Zenodo · doi:10.5281/zenodo.20663768
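
For readers unfamiliar with the two compression ideas named in Paper E, here is a toy Python sketch of each. It is a generic illustration, not the method from the preprint: `fake_quantize` shows the round-to-integer-grid-and-back step that quantization-aware training inserts into the forward pass, and `prune_channels` shows structured pruning as dropping whole filters with the smallest L1 norm. Both function names, the 8-bit symmetric scheme, and the L1 ranking criterion are assumptions chosen for simplicity; a real pipeline would use a training framework.

```python
from typing import List


def fake_quantize(weights: List[float], bits: int = 8) -> List[float]:
    """Round weights to a symmetric integer grid, then map them back to floats."""
    qmax = 2 ** (bits - 1) - 1               # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax                   # float value of one integer step
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def prune_channels(filters: List[List[float]], keep: int) -> List[List[float]]:
    """Structured pruning: keep the `keep` filters with the largest L1 norm."""
    ranked = sorted(range(len(filters)),
                    key=lambda i: sum(abs(w) for w in filters[i]),
                    reverse=True)
    kept = sorted(ranked[:keep])             # preserve original filter order
    return [filters[i] for i in kept]


# Four tiny hypothetical filters; two carry almost no weight.
filters = [[0.5, -0.4], [0.01, 0.02], [-0.9, 0.3], [0.0, 0.05]]
pruned = prune_channels(filters, keep=2)
quantized = [fake_quantize(f) for f in pruned]
```

As a back-of-envelope check on the size figure: at one byte per 8-bit weight, a 60 KB budget leaves room for roughly 60,000 parameters, before any file-format overhead.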

Why preprints, before peer review

Peer review takes months. Waiting would mean giving you nothing concrete to check in the meantime. So the methodology goes out now, openly, as preprints you can read and argue with — with the explicit caveat that they haven't been formally reviewed. When that changes, this page changes, and I'll say what changed.

What this research is — and isn't

The validation tells you the acoustic estimate is close to the clinical reference, often. It does not turn a wellness app into a medical one, and I won't pretend it does.

Not a diagnosis. SomniSense reports a Breathing Irregularity Index (BRI): an events-per-hour figure with the same form as the apnea-hypopnea index (AHI) that clinicians use, but computed from microphone audio rather than from PSG's airflow, oxygen, and arousal channels. It's an acoustic estimator of that metric, not a clinical AHI and not a diagnosis of obstructive sleep apnea.
Not a replacement for a sleep study. A diagnosis still needs polysomnography, a daytime symptom assessment, and a sleep specialist's judgment. If your BRI runs above 15 consistently, that's a clinic conversation — we just hand you organized data to bring to it.
Not validated for under-18. The cohort was adults only.
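
The per-hour form of the index is simple arithmetic, sketched below in Python. The threshold of 15 comes from the text above; the function names and the definition of "consistently" (every one of the last few nights above the threshold) are assumptions made for illustration, not how the app decides anything.

```python
from typing import Sequence


def events_per_hour(event_count: int, recorded_seconds: float) -> float:
    """Per-hour rate: detected events divided by hours of recording."""
    if recorded_seconds <= 0:
        raise ValueError("recorded_seconds must be positive")
    return event_count / (recorded_seconds / 3600.0)


def consistently_above(nightly_rates: Sequence[float],
                       threshold: float = 15.0,
                       min_nights: int = 5) -> bool:
    """Hypothetical rule: the last `min_nights` nights all exceed the threshold."""
    if len(nightly_rates) < min_nights:
        return False
    return all(rate > threshold for rate in nightly_rates[-min_nights:])


# Example: 105 detected events over a 7-hour recording.
rate = events_per_hour(105, 7 * 3600)
```

Here `rate` works out to 15 events per hour, right at the level the text flags as worth raising with a clinic if it persists.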

Reading this because of a symptom?

The research is the proof layer. If you got here from something you're actually experiencing, these start from the symptom instead of the method:

Want the numbers, not the narrative?

Every metric, every bootstrap range, every honest limitation is on the accuracy page — built so you can argue with it. Or, if you've read enough, the first 7 days of Pro are free.

Get the app (free to start)
See the full accuracy breakdown →