Article · 8 min read

Rosacea is underdiagnosed on darker skin. The dermatology literature has named the mechanism. Most consumer tools haven't responded.

In the reported figures, Fitzpatrick IV-V skin accounts for roughly 6 to 8 percent of the rosacea-diagnosed population. The literature is explicit that a meaningful share of the gap between that figure and what general rosacea epidemiology would predict is diagnostic masking, not biology. Telangiectasia is harder to see against pigmented skin, and two widely deployed dermatology image-classification models showed roughly one-third the sensitivity on darker skin that they showed on lighter skin. We built Skinframe so that it does not make that mistake worse.

The prevalence gap

When the rosacea literature reports prevalence by skin tone, the numbers paint a picture that does not match what dermatologists who work with Fitzpatrick IV through VI patients see in clinic. Reported prevalence in Fitzpatrick IV and V populations sits in the 6 to 8 percent range of the diagnosed rosacea population, with even lower figures in Fitzpatrick VI (Alexis AF, Callender VD, Baldwin HE, et al. J Am Acad Dermatol. 2019;80(6):1722-1729).

The Alexis review and the parallel literature on rosacea in skin of color (Maliyar K, Abdulla SJ. PMC. 2022) are explicit about what is happening: a meaningful share of the apparent prevalence gap is diagnostic masking, not biology. The features rosacea is most commonly screened on are visual: persistent centrofacial erythema and telangiectasia. Both are harder to see against pigmented skin. A patient whose rosacea looks like flushing-plus-burning-plus-a-couple-of-papules to a clinician trained on Fitzpatrick I through III may be diagnosed as late-onset acne, perioral dermatitis, or just dismissed as "reactive skin" for years before the diagnosis catches up.

The word the literature uses for this is underdiagnosis. The disease is there at population rates closer to what general rosacea epidemiology would predict; the screening apparatus is calibrated against the wrong features. The 6 to 8 percent number is a reporting artifact at least in part, not a biological boundary.

What the literature says actually looks different

The skin-of-color rosacea review literature consistently flags these presentation differences (Alexis 2019, Maliyar 2022):

- Persistent erythema and telangiectasia are less visually obvious. Telangiectasias are described as "often not observed in highly pigmented skin" with the naked eye in the Maliyar review. The vessels are there; the contrast against the pigmented background is just not enough to make them stand out.
- The papulopustular component is proportionally more prominent at presentation, because it is the visible component the patient and clinician can both see. The first thing a Fitzpatrick V patient is told they have is frequently "acne," because the bumps are visible and the underlying erythema is not.
- Granulomatous variants are more commonly reported.
- Postinflammatory hyperpigmentation overlaps with the disease. Papules resolve into hyperpigmented macules that confound assessment, so even a clinician who picks up the rosacea diagnosis has to disentangle the residual hyperpigmentation from active lesions.
- Patients are more often initially misdiagnosed as late-onset acne. This is not an inference; it is what the case-series reports describe.
- The sensory phenotype (burning and stinging) is, per Maliyar 2022, often the most diagnostic clue in skin of color. That is the opposite of how popular rosacea coverage in mainstream media frames the disease, which centers redness.

For a tracker, the implication is direct. If the app surfaces only photo-derived redness as the headline metric, it is asking the wrong question of a Fitzpatrick IV through VI user. The signal that actually matches the disease is the sensory diary (frequency and intensity of burning, stinging, flushing episodes), the lesion log, and the longitudinal trajectory of those features relative to the patient's own baseline. Photos still belong in the tracker, but as evidence the patient brings to their dermatologist, not as a source of automated severity scores.
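To make the shape of that signal concrete, here is a minimal sketch of what one day's entry might hold. Every name in it (`SensoryEpisode`, `DailyLog`, the 0 to 10 intensity scale, the split into papules and pustules) is an assumption made for illustration, not a description of any real app's schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

SENSORY_KINDS = {"burning", "stinging", "flushing"}


@dataclass
class SensoryEpisode:
    kind: str       # one of SENSORY_KINDS
    intensity: int  # self-rated 0 (none) to 10 (worst); the scale is an assumption

    def __post_init__(self):
        if self.kind not in SENSORY_KINDS:
            raise ValueError(f"unknown sensory episode kind: {self.kind}")
        if not 0 <= self.intensity <= 10:
            raise ValueError("intensity must be between 0 and 10")


@dataclass
class DailyLog:
    day: date
    episodes: list[SensoryEpisode] = field(default_factory=list)
    papule_count: int = 0
    pustule_count: int = 0
    # The photo is stored as evidence for the dermatologist; nothing here scores it.
    photo_path: Optional[str] = None

    def episode_count(self, kind: str) -> int:
        """Number of episodes of one sensory kind logged that day."""
        return sum(1 for e in self.episodes if e.kind == kind)

    def peak_intensity(self) -> int:
        """Highest self-rated intensity of the day, or 0 if nothing was logged."""
        return max((e.intensity for e in self.episodes), default=0)

    def lesion_count(self) -> int:
        return self.papule_count + self.pustule_count


log = DailyLog(
    day=date(2024, 3, 1),
    episodes=[
        SensoryEpisode("burning", 6),
        SensoryEpisode("flushing", 3),
        SensoryEpisode("burning", 4),
    ],
    papule_count=2,
    pustule_count=1,
    photo_path="photos/2024-03-01.jpg",  # hypothetical path
)
print(log.episode_count("burning"), log.peak_intensity(), log.lesion_count())
```

The point of the structure is that frequency, intensity, and lesion counts are first-class fields, while the photo is an attachment rather than an input to a score.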

The image-classification performance gap

The bias problem in consumer dermatology tools is not theoretical. It has been measured.

In 2018, Adamson and Smith published a viewpoint in JAMA Dermatology (Adamson AS, Smith A. JAMA Dermatol. 2018;154(11):1247-1248) that named the structural concern: machine-learning systems for dermatology were being trained on image datasets (ISIC, HAM10000, and others) that under-represented Fitzpatrick V and VI by an order of magnitude. The piece argued that without disclosure of training-set demographics, deploying these tools risked codifying existing diagnostic disparities. The Adamson and Smith argument has become the canonical citation in subsequent dermatology machine-learning equity literature.

In 2022, the empirical follow-up arrived. Daneshjou et al. (Daneshjou R, Vodrahalli K, Novoa RA, et al. Sci Adv. 2022;8(31):eabq6147) released the Diverse Dermatology Images (DDI) dataset: 656 biopsy-confirmed images from 570 patients, dermatologist-labeled by Fitzpatrick skin type. It was the first publicly available, pathologically confirmed, FST-balanced dermatology benchmark.

When the team ran two widely deployed dermatology image-classification models against DDI, the performance gap was large. DeepDerm sensitivity was 0.69 on lighter skin and 0.23 on darker skin. ModelDerm sensitivity was 0.41 on lighter skin and 0.12 on darker skin. That is roughly a threefold drop in sensitivity on darker skin, on widely cited models, on a benchmark explicitly built to test equity. The Adamson and Smith warning had been quantified.
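The "roughly three times" figure follows directly from the sensitivities quoted above. A short worked check, using only those four numbers (the function and variable names are ours, for illustration):

```python
# Sensitivities as quoted in the paragraph above.
reported = {
    "DeepDerm": {"lighter": 0.69, "darker": 0.23},
    "ModelDerm": {"lighter": 0.41, "darker": 0.12},
}


def sensitivity_ratio(lighter: float, darker: float) -> float:
    """How many times higher sensitivity is on lighter skin than on darker skin."""
    return lighter / darker


def miss_rate(sensitivity: float) -> float:
    """Share of truly positive cases the model fails to flag (1 - sensitivity)."""
    return 1.0 - sensitivity


ratios = {name: sensitivity_ratio(**s) for name, s in reported.items()}

for name, s in reported.items():
    print(f"{name}: {ratios[name]:.1f}x, misses {miss_rate(s['darker']):.0%} on darker skin")
# DeepDerm: 3.0x, misses 77% on darker skin
# ModelDerm: 3.4x, misses 88% on darker skin
```

Read as miss rates, the same numbers say that on darker skin these models failed to flag about three in four, and about seven in eight, of the cases the benchmark labels positive.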

This is not specific to rosacea; the DDI work covers dermatology image classification broadly. But the mechanism applies directly: a rosacea-detection or rosacea-scoring model trained on the same biased datasets will produce a similar performance gap. The same diagnostic disparities that produce the prevalence-gap underdiagnosis on the human side get reproduced and amplified on the machine side.

What goes wrong if you ignore this

Most popular consumer skin image-classification apps (the literature includes coverage of SkinVision, Aysa, Miiskin's body-mole features, and others, see Wen D, et al. J Am Acad Dermatol. 2022, and follow-on Practical Dermatology coverage) launched with training data dominated by Fitzpatrick I through III. The published critiques of the consumer apps focus primarily on melanoma detection, where the disparities have the highest mortality cost, but the same data shape produces the same outcome for rosacea: a higher false-negative rate on darker skin, a higher false-positive rate from pigmentation features the model interprets as inflammation, and a confidence number on the result that lulls the user into thinking the assessment is more reliable than it is.

The practical user-experience failure is the worst kind: silent and confident. A Fitzpatrick V user opens a skin app, the app captures a photo, and the model outputs a "low rosacea probability" because the erythema features it was trained on are not visible against this skin. The user concludes their skin is fine and does not follow up on the burning sensation they have been having for a year. The diagnostic delay grows.

A tracker that respects this literature does not output a confidence number on a model the literature has shown does not generalize across skin tones. There is no version of "we trained on a more diverse dataset, so we are good now" that lands honestly without a published benchmark on a balanced test set. The DDI dataset is the test, and consumer apps are not running it.
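For readers who want to see what "a published benchmark on a balanced test set" amounts to mechanically, the core of it is sensitivity computed separately per skin-type group rather than pooled. The sketch below assumes a hypothetical list of `(group, truth, predicted)` records; the toy data is invented purely to exercise the function and is not drawn from DDI or any real model.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Compute sensitivity (TP / (TP + FN)) separately for each group.

    records: iterable of (group, truth, predicted) tuples, where truth and
    predicted are booleans. Groups with no truly positive cases are omitted.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, truth, predicted in records:
        if not truth:
            continue  # sensitivity only considers cases that truly have the condition
        if predicted:
            true_pos[group] += 1
        else:
            false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Invented toy data: four truly positive cases per group, plus one negative each.
toy_records = [
    ("FST I-II", True, True),
    ("FST I-II", True, True),
    ("FST I-II", True, True),
    ("FST I-II", True, False),
    ("FST I-II", False, False),
    ("FST V-VI", True, True),
    ("FST V-VI", True, False),
    ("FST V-VI", True, False),
    ("FST V-VI", True, False),
    ("FST V-VI", False, True),
]

result = sensitivity_by_group(toy_records)
print(result["FST I-II"], result["FST V-VI"])
```

A pooled sensitivity over these toy records would be 0.5, which hides the fact that one group sits at 0.75 and the other at 0.25. That hiding is the reason the per-group number is the one to ask for.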

What Skinframe does instead

Skinframe does not infer a diagnosis or a severity score from a population-trained classifier. The design choice is deliberate and the reasoning is the literature above.

In practice, that comes down to four things:

  1. Per-patient longitudinal baselines, not population-trained classifiers. Each user is treated as a sample of one. The first seven days at onboarding establish that user's baseline: their clear-or-good-day face, their typical sensory baseline (burning and stinging frequency on a non-flare week), and their lesion-count baseline. After that, the tracker computes deltas from that individual's own baseline. We never compare you against a model trained on someone else's face. This sidesteps the bias problem entirely, because the comparison is intra-personal, not inter-personal.
  2. Skin-tone-branched onboarding. At onboarding the patient picks their Fitzpatrick type from a visual picker (I through VI), and the downstream interface adapts. For Fitzpatrick IV through VI users, the daily log foregrounds the sensory diary (burning, stinging, and flushing episode frequency) and the lesion log; the photo-derived redness signal is recorded but not surfaced as a top-line trend, because we do not yet have published validation showing our photo handling holds up at FST V and VI. For Fitzpatrick I through III users, the photo-derived signal is shown alongside the user's patient self-assessment (PSA) rating. An inline explainer on the Fitzpatrick screen gives the medical reasoning rather than treating the question as a bias disclaimer.
  3. Cite the literature on the methodology page. Adamson and Smith 2018 and Daneshjou et al. 2022 are named on the Skinframe site's methodology page, with the link, the citation, and the one-sentence reason we read them. This is what signals to a dermatologist or a clinically-literate patient that we know what we are doing, and what signals to the broader market that this is the bar we hold ourselves to.
  4. Standardize capture without claiming to be clinical-grade. White balance lock, no flash, fixed distance enforced by Vision face geometry, and consistent lighting prompts (face a window, avoid back-lighting). These give a more honest signal across skin tones than auto-exposure-plus-flash would. The capture protocol section of our companion piece on clinical photography ("Why every dermatology app fails at clinical photography") goes into the details.
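The per-patient baseline in item 1 can be sketched in a few lines. This is an illustration under stated assumptions, not Skinframe's actual computation: it assumes the baseline is a plain mean over the first seven logged days and that the delta compares the mean of the most recent days against it. The function names and the `window` parameter are hypothetical.

```python
from statistics import mean

BASELINE_DAYS = 7  # the onboarding window described in item 1


def baseline(daily_values):
    """Mean of the first BASELINE_DAYS values, or None if onboarding is incomplete."""
    if len(daily_values) < BASELINE_DAYS:
        return None
    return mean(daily_values[:BASELINE_DAYS])


def delta_from_baseline(daily_values, window=7):
    """Mean of the most recent post-onboarding days minus the user's own baseline.

    Uses up to `window` of the latest days after onboarding (an assumed default).
    Returns None until there is a baseline and at least one later day.
    """
    base = baseline(daily_values)
    recent = daily_values[BASELINE_DAYS:][-window:]
    if base is None or not recent:
        return None
    return mean(recent) - base


# Burning episodes per day: seven onboarding days, then three later days.
burning_per_day = [0, 1, 2, 1, 0, 1, 2, 3, 2, 4]
print(baseline(burning_per_day))             # 1
print(delta_from_baseline(burning_per_day))  # 2
```

Nothing in this computation refers to another person's data or to skin tone, which is the property item 1 relies on. The same function works for lesion counts or any other daily number the user logs.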

The one-sentence summary: we do not pretend our tool generalizes across skin tones until the published benchmark says it does, and in the meantime we lean on per-patient baselines and the sensory diary, which work the same regardless of skin tone.
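The skin-tone branching described in the list above could be expressed as a small lookup from Fitzpatrick type to dashboard layout. The sketch below is hypothetical: the signal names, the `dashboard_layout` function, and the two-tier "headline" and "secondary" split are ours. Where the sensory diary and lesion log sit for Fitzpatrick I through III users is not stated in the description, so placing them in the secondary tier is an assumption.

```python
from enum import IntEnum


class Fitzpatrick(IntEnum):
    I = 1
    II = 2
    III = 3
    IV = 4
    V = 5
    VI = 6


def dashboard_layout(fst: Fitzpatrick) -> dict:
    """Return which signals are headlined and which are recorded without a top-line trend."""
    if fst >= Fitzpatrick.IV:
        return {
            "headline": ["sensory_diary", "lesion_log"],
            # Recorded, but not surfaced as a top-line trend.
            "secondary": ["photo_redness"],
        }
    return {
        "headline": ["photo_redness", "psa_self_rating"],
        # Assumption: these are still logged for lighter skin types.
        "secondary": ["sensory_diary", "lesion_log"],
    }


for fst in (Fitzpatrick.II, Fitzpatrick.V):
    print(fst.name, dashboard_layout(fst)["headline"])
```

The useful property is that the answer to the onboarding question changes what the user sees, which is the difference between collecting Fitzpatrick type as data and collecting it as decoration.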

What to ask the rosacea tool you are using today

If you are using a rosacea or skin-tracking app today, here are four questions to ask of it.

  1. Does it ask your Fitzpatrick type at onboarding? Not as a marketing checkbox but as a piece of data that changes the downstream interface. If the interface does not change after you answer, the question was theater.
  2. Does it surface a single redness score? If yes, ask what the score is computed from. If the answer is "a model trained on photos," ask what the model's published sensitivity is on Fitzpatrick V and VI. If there is no answer, treat the score as decorative.
  3. Does it foreground a sensory diary? Burning, stinging, flushing episodes recorded on their own. The Maliyar 2022 review explicitly identifies this as the most diagnostic clue in skin of color. If the app collapses the sensory data into a redness score or hides it under "notes," it is the wrong shape.
  4. Does it cite the bias literature anywhere? The Adamson and Smith JAMA Dermatology 2018 piece and the Daneshjou et al. Sci Adv 2022 piece are the two most-cited references in this space. A tool that has done the reading will name them somewhere on its methodology or about page. A tool that has not done the reading will avoid the topic entirely or substitute a bias disclaimer that does not name the published evidence.

The gap between an honest tool and a marketing-first one is exactly the four questions above. Most apps fail at least two. The point of running the questions is not to make you switch tools right away; it is to know what shape of evidence the tool you use is producing, and what you should bring to your dermatologist when the visit comes.

Get Skinframe

Read by Fitzpatrick IV through VI rosacea patients who have been told their skin issues are something else for years, plus dermatologists and clinically-literate readers tracking the equity discourse in consumer dermatology.