VALIDATED ACCURACY · METHODOLOGY OPEN

How accurate is SomniSense — really?

Most sleep apps quote a single rounded "accurate" percentage and stop there. I'm going to tell you something less flashy: we publish several specific numbers — per-event, per-night, on-device latency — and the methodology behind each. You can argue with our numbers if you read this whole page. That's the bar I'm trying to meet.

89-94% SRI precision (5-seed bootstrap range · Zenodo preprint)
88.5% BRI accuracy (production on-device model)
56 KB on-device model (0.064 ms / inference · Apple Neural Engine)
87% BRI vs PSG event rate, within ±5 (n=80 paired PSG nights · system-level)
An at-home sleep validation setup on a wooden nightstand: an iPhone Pro and a Pixel taped with painter's tape recording sleep simultaneously, with PSG sensor cables and a digital alarm clock at 6:17 a.m. — the kind of paired-night test that produced our snore sensitivity and BRI agreement numbers.

Why I'm publishing all of this before the paper

I'll tell you the honest reason. The peer-reviewed paper is in active preparation. It hasn't been published yet. Once submitted, peer review typically takes another 3–6 months, sometimes longer.

If I waited, you'd have nothing concrete to compare us against in the meantime. So I'm publishing the numbers and the methodology now, with the explicit caveat that they're from our internal study and haven't been formally peer-reviewed yet. When the paper publishes, this page gets the citation. If peer review changes any number meaningfully, this page changes — and I'll explain what changed and why, openly.

The reason for that approach is simple: every consumer health app is incentivized to make accuracy claims that look impressive and don't get scrutinized. I'd rather publish a less impressive number that's correct, with a methodology you can argue with, than wait six months and publish a marketing-grade rounded "98%."

That's the deal.

The single-number problem

A single rounded "accurate" percentage sounds great until you ask: accurate at what, on what population, against what ground truth. Accurate at detecting that there was a sound? Distinguishing snores from background noise? Counting the right number of breathing pauses in a night? These are different questions with different answers. Collapsing them into one number is the thing you do when you don't want people to look too closely.

SomniSense breaks accuracy into several numbers, each tied to a specific question. They tell different stories. Together they tell the whole one.

Two indices, one app — and why the validation numbers split that way

SomniSense produces two independent indices each night, and they're validated separately:

  • BRI (Breathing Irregularity Index) — apnea + hypopnea events per hour, derived from acoustic event detection. Same per-hour shape clinicians use for AHI, but computed from microphone audio rather than from PSG airflow/SpO2/arousal channels. Per-event production model: 88.5% accuracy, ~87% sensitivity / ~87% precision, 56 KB / 0.064 ms inference on Apple Neural Engine.
  • SRI (Snoring Rate Index) — snore events per hour. The first per-hour rate for snoring; the index that says how the room actually sounded. 91-93% sensitivity / 89-94% precision against PSG (5-seed bootstrap range; mean 91.67% / 89.01%).
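A "range across seeds, plus mean" figure like the ones above can be read as the minimum, maximum, and average of one metric measured once per evaluation run. Here is a minimal Python sketch of that summary. The function name and the five per-seed values are invented for illustration; they are not the study data, and this is not the preprint's evaluation code.

```python
from statistics import mean


def summarize_seeds(values):
    """Return (min, max, mean) of a metric measured once per random seed."""
    if not values:
        raise ValueError("need at least one per-seed value")
    return min(values), max(values), mean(values)


# Hypothetical per-seed precision values in percent (placeholders, not study data).
per_seed_precision = [90.0, 92.0, 91.0, 93.0, 89.0]
low, high, avg = summarize_seeds(per_seed_precision)
print(f"range {low:.0f}-{high:.0f}%, mean {avg:.2f}%")  # range 89-93%, mean 91.00%
```

Reporting the range next to the mean shows how much the result moves between splits, which a single averaged number hides.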

The reason there are two indices isn't marketing: they answer different questions. BRI is medically interpretable (because the AHI metric exists and clinicians use the same per-hour shape). SRI is product-defined (we made it up, carefully, but still). The validation numbers are what tell you each of them is honest.

What our research measures, and what BRI actually does

Three different things, often mixed up. We try not to mix them:

  • Per-event detection (what our preprints validate). Given a 200-second window of bedside audio, our model classifies it as Normal or Abnormal (apnea/hypopnea event). Per-event classification accuracy: 88.5% in the production on-device model.
  • Per-night BRI (what the app reports each morning). The app aggregates per-event detections across your night: BRI = breathing irregularity events detected / sleep hours. This is a derived per-hour rate. Same shape clinicians use for AHI, computed from a different signal source.
  • OSA (Obstructive Sleep Apnea) diagnosis — what a sleep specialist does, not what SomniSense does. A clinical diagnosis based on polysomnography, daytime symptom assessment, and a physician's judgment. BRI is data; OSA diagnosis is a doctor's call.
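The per-night BRI formula above is simple enough to sketch in a few lines of Python. This is an illustration of the arithmetic only (events divided by sleep hours). The function name, the use of seconds as input, and the example night are assumptions, not the app's actual code.

```python
def breathing_irregularity_index(event_count, sleep_seconds):
    """Events per hour of sleep: BRI = events detected / sleep hours."""
    if sleep_seconds <= 0:
        raise ValueError("sleep duration must be positive")
    sleep_hours = sleep_seconds / 3600
    return event_count / sleep_hours


# Hypothetical night: 84 detected events across 7 hours of sleep.
bri = breathing_irregularity_index(84, 7 * 3600)
print(bri)  # 12.0
```

Because the denominator is sleep time, the same event count over a shorter night gives a higher BRI.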

The 87% Bland-Altman agreement below — "BRI within ±5 of the PSG-scored per-hour event rate" — is a system-level per-night validation: do our acoustic per-event detections, aggregated to per-hour, fall close to what PSG would have counted on the same night? Yes, 87% of the time within ±5 events/hour. That tells you BRI is a useful acoustic estimator of the AHI-shaped metric. It does not tell you BRI is a substitute for clinical OSA diagnosis. If your BRI is consistently elevated, see a sleep specialist.
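To make "within ±5 events/hour" concrete, here is a hedged Python sketch of how a per-night agreement summary could be computed from paired rates. It reports the share of nights inside the tolerance, plus the usual Bland–Altman bias and 95% limits of agreement (mean difference ± 1.96 standard deviations). The function name and the four paired nights are hypothetical; the study's own analysis may differ in detail.

```python
from statistics import mean, stdev


def agreement_summary(bri, psg, tolerance=5.0):
    """Compare paired per-night rates (events/hour) from the app and from PSG."""
    if len(bri) != len(psg) or len(bri) < 2:
        raise ValueError("need at least two paired nights")
    diffs = [b - p for b, p in zip(bri, psg)]
    # Share of nights where the app lands within the tolerance of PSG.
    within = sum(abs(d) <= tolerance for d in diffs) / len(diffs)
    bias = mean(diffs)  # negative bias means the app undercounts on average
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits of agreement
    return {
        "within_tolerance": within,
        "bias": bias,
        "limits": (bias - spread, bias + spread),
    }


# Hypothetical paired nights (not study data).
bri_nights = [10.0, 18.0, 4.0, 30.0]
psg_nights = [12.0, 17.0, 5.0, 38.0]
summary = agreement_summary(bri_nights, psg_nights)
print(summary["within_tolerance"], summary["bias"])  # 0.75 -2.5
```

The within-tolerance share says how often the estimate is close; the bias and limits say in which direction and by how much it tends to miss.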

The numbers we actually have

The question | The number | What that means
Of the snore events you make, how many does SRI catch? | 91-93% | SRI (Snoring Rate Index) sensitivity range across 5 random seeds (5-seed bootstrap, mean 91.67%). The snores we don't catch are mostly the quietest ones.
When SRI flags a snore, how often is it actually one? | 89-94% | SRI precision range across 5 random seeds (5-seed bootstrap, mean 89.01%). False positives are rare on the best splits; we report the range honestly.
Of breathing pauses that happen, how many does BRI flag? | ~87% | BRI (Breathing Irregularity Index) sensitivity. From the Coordinate-Attention 1D baseline (87.42%); the production on-device model preserves this after compression.
When BRI flags a pause, how often did one really happen? | ~87% | BRI precision. Combined apnea + hypopnea events. Baseline measurement 86.75%; the 50% L1-pruned production model slightly improves this via implicit regularization.
How accurate is our production model overall? | 88.5% | Production model: 9,416 INT8 parameters, 56.4 KB on-device, 0.064 ms inference on Apple M2 Neural Engine. Overall classification accuracy on 200-second windows.
How does our per-hour event rate compare to PSG event scoring? | 87% | Per-night BRI within ±5 events/hour of the PSG-scored per-hour event rate for 87% of nights (Bland–Altman agreement, n=80 paired PSG nights / 40 participants). This is a preliminary system-level per-night result; the full agreement analysis across severity ranges is to be documented in a forthcoming survey preprint. BRI is an acoustic estimator of the AHI-shaped metric, not a clinical OSA diagnosis.
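Sensitivity, precision, and accuracy answer three different questions about the same set of scored events. A small Python sketch of the standard definitions, computed from confusion counts, shows how they relate. The function name and the counts are made up for illustration and are not drawn from the study.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard per-event metrics from true/false positive/negative counts."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one real event and one flagged event")
    total = tp + fp + fn + tn
    return {
        # Of the events that really happened, how many were flagged?
        "sensitivity": tp / (tp + fn),
        # Of the events that were flagged, how many really happened?
        "precision": tp / (tp + fp),
        # Of all windows scored, how many were classified correctly?
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts (not study data).
metrics = detection_metrics(tp=80, fp=20, fn=20, tn=80)
print(metrics)  # {'sensitivity': 0.8, 'precision': 0.8, 'accuracy': 0.8}
```

The three values coincide here only because the example counts are symmetric; on real data they usually differ, which is why a single "accuracy" figure is not enough.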

These come from 80 paired PSG nights across 40 participants (10 in-lab + 70 ambulatory PSG with nasal-airflow cannula) — meaning we ran SomniSense on a phone next to the same person, on the same night, while they were also being recorded with a real polysomnography setup. The audio was scored by AASM-trained sleep technicians who didn't know what SomniSense had said.

That last part — "didn't know what we said" — is what "blinded scoring" means. We don't get to pre-train our scorers on our own answers. Otherwise the test would be circular.

How we tested it (the methodology, plain)

A handwritten notebook page: '70 paired nights · PSG vs phone, blinded scoring · published openly.' A coffee cup beside it.

Detection performance was measured against gold-standard polysomnography (PSG) recordings, in-lab and ambulatory, audio-annotated by certified sleep technicians. The technicians scored every audio segment without seeing what SomniSense said about it. That's the only honest way to do it.

Specifically:

  • Sample: 80 paired nights / 40 participants (smartphone + PSG simultaneously) — 10 in-lab PSG + 70 ambulatory PSG with nasal-airflow cannula. Adults with and without diagnosed sleep breathing concerns. We state who's underrepresented (mostly: under-18, severe BMI extremes, certain ethnic groups) openly, rather than hide it behind an aggregate number. I want to be specific about who's not in the cohort because aggregate sample size without context can be misleading.
  • Recording medium: bedside smartphone (varied iPhone & Android models from 2018 onward), 50–90 cm from participant's head.
  • Ground truth: PSG with synchronized audio channel; manual scoring by AASM-trained sleep technicians, blinded to SomniSense output.
  • Comparison: per-event sensitivity, per-event precision, per-night BRI vs PSG AHI agreement (Bland–Altman analysis).

The breathing-event detection algorithm builds on years of sleep apnea research by our founder. The version powering SomniSense was retrained from scratch and rebuilt for SomniAI LLC to handle smartphone audio specifically — different microphone, different distance, different acoustic context than clinical hardware. Three companion preprints document the methodology in full: the cascaded-baselines preprint (multi-seed bootstrap), the Coordinate-Attention 1D architecture preprint (93.2% parameter reduction), and the on-device compression preprint (0.064 ms inference on Apple Neural Engine). All are published openly on Zenodo with citable DOIs (cs.LG, eess.AS); an arXiv mirror is planned. For how the two-stage system fits together and the full preprint portfolio, see the research program. The full technical hub — the architecture in depth, the SDK, and licensing for hardware and clinical partners — lives at apneasense.com/research.

Honest limitations

Here's what we don't know yet, and what I'd want to know if I were the user:

  • The cohort skews toward adults with sleep breathing concerns. If you're a healthy 24-year-old with no symptoms, the numbers above are likely conservative for your case (you're probably underrepresented in the sample).
  • Acoustic environment matters. If your bedroom has unusual acoustic properties — hard surfaces, partner snoring louder than you, a fan blowing directly at the phone — the model may catch fewer of your events. The methodology paper documents the conditions we tested under.
  • Borderline events. The events we miss are mostly the borderline ones near the 10-second / 30%-amplitude threshold. We miss them on purpose, by tuning for precision over sensitivity. If your real BRI is 12, we might say BRI 10. If we say BRI 18, the error is more likely an undercount than an overcount.
  • Not validated for under-18. The cohort was adults only.
  • Preprints, not yet peer-reviewed. Three companion preprints are published openly on Zenodo with citable DOIs; peer-reviewed journal publication is a separate process and will be noted on this page when complete.

What this isn't

  • Not a diagnostic claim. Even at these numbers, SomniSense is not a medical device and doesn't diagnose obstructive sleep apnea (OSA). OSA diagnosis requires polysomnography, symptom assessment, and a sleep specialist's judgment. SomniSense isn't validated for users under 18.
  • Not a personal guarantee. Your specific results may differ from population averages. Read the methodology paper to know whether your scenario is in or out of distribution.
  • Not a replacement for a sleep study. If your BRI runs above 15 consistently, that's a clinic conversation. We give you the data to bring. The clinic gives you the diagnosis.

If this is the level of evidence that satisfies you

The first 7 days of Pro are free — cancel through the App Store or Google Play before day 7 and you won't be charged. After that, $7.99/mo or $49.99/yr. The methodology stays open. The numbers stay honest. When the peer-reviewed paper lands, this page gets the citation.
