Disparities in Dermatology AI Performance on a Diverse, Curated Clinical
Image Set
- URL: http://arxiv.org/abs/2203.08807v1
- Date: Tue, 15 Mar 2022 20:33:23 GMT
- Authors: Roxana Daneshjou, Kailas Vodrahalli, Roberto A Novoa, Melissa Jenkins,
Weixin Liang, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E
Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang,
Bradley Fong, Rachna Sahasrabudhe, Johan A. C. Allerup, Utako Okata-Karigane,
James Zou, Albert Chiou
- Abstract summary: We show that state-of-the-art AI models perform substantially worse on the Diverse Dermatology Images (DDI) dataset.
We find that dermatologists, who typically provide visual labels for AI training and test datasets, also perform worse on images of dark skin tones and uncommon diseases.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Access to dermatological care is a major issue, with an estimated 3 billion
people lacking access to care globally. Artificial intelligence (AI) may aid in
triaging skin diseases. However, most AI models have not been rigorously
assessed on images of diverse skin tones or uncommon diseases. To ascertain
potential biases in algorithm performance in this context, we curated the
Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly
curated, and pathologically confirmed image dataset with diverse skin tones.
Using this dataset of 656 images, we show that state-of-the-art dermatology AI
models perform substantially worse on DDI, with the area under the receiver
operating characteristic curve (ROC-AUC) dropping by 27-36 percent compared to the models'
original test results. All the models performed worse on dark skin tones and
uncommon diseases, which are represented in the DDI dataset. Additionally, we
find that dermatologists, who typically provide visual labels for AI training
and test datasets, also perform worse on images of dark skin tones and uncommon
diseases compared to ground truth biopsy annotations. Finally, fine-tuning AI
models on the well-characterized and diverse DDI images closed the performance
gap between light and dark skin tones. Moreover, algorithms fine-tuned on
diverse skin tones outperformed dermatologists on identifying malignancy on
images of dark skin tones. Our findings identify important weaknesses and
biases in dermatology AI that need to be addressed to ensure reliable
application to diverse patients and diseases.
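
The subgroup comparison described in the abstract (ROC-AUC on light versus dark skin tones) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `roc_auc` and `auc_by_group`, the group labels, and the toy scores are all hypothetical and chosen only to show how a per-group ROC-AUC gap might be computed.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to the Mann-Whitney U formulation; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Compute ROC-AUC separately for each subgroup.

    `records` is an iterable of (group, label, score) tuples, where label is
    1 for malignant and 0 for benign, and score is the model's output.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: roc_auc(ls, ss) for g, (ls, ss) in grouped.items()}


# Toy data, invented for illustration only (not taken from DDI).
records = [
    ("light", 1, 0.9), ("light", 1, 0.8), ("light", 0, 0.3), ("light", 0, 0.1),
    ("dark", 1, 0.6), ("dark", 1, 0.4), ("dark", 0, 0.5), ("dark", 0, 0.2),
]

per_group = auc_by_group(records)
gap = per_group["light"] - per_group["dark"]
print(per_group, gap)
```

On the toy data the light-skin group is separated perfectly while one positive/negative pair in the dark-skin group is misranked, so the sketch reports a gap between the two groups. A real evaluation would also need confidence intervals, since subgroup sizes in a 656-image dataset are small.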
Related papers
- PASSION for Dermatology: Bridging the Diversity Gap with Pigmented Skin Images from Sub-Saharan Africa [29.405369900938393]
Africa faces a huge shortage of dermatologists, with fewer than one per million people.
This is in stark contrast to the high demand for dermatologic care, with 80% of the paediatric population suffering from largely untreated skin conditions.
The PASSION project aims to address this issue by collecting images of skin diseases in Sub-Saharan countries with the aim of open-sourcing this data.
arXiv Detail & Related papers (2024-11-07T10:11:37Z)
- FairSkin: Fair Diffusion for Skin Disease Image Generation [54.29840149709033]
Diffusion models (DMs) have become a leading method for generating synthetic medical images, but they suffer from a critical twofold bias.
We propose FairSkin, a novel DM framework that mitigates these biases through a three-level resampling mechanism.
Our approach significantly improves the diversity and quality of generated images, contributing to more equitable skin disease detection in clinical settings.
arXiv Detail & Related papers (2024-10-29T21:37:03Z)
- Equitable Skin Disease Prediction Using Transfer Learning and Domain Adaptation [1.9505972437091028]
Existing artificial intelligence (AI) models in dermatology face challenges in accurately diagnosing diseases across diverse skin tones.
We employ a transfer-learning approach that capitalizes on the rich, transferable knowledge from various image domains.
Among all methods, Med-ViT emerged as the top performer due to its comprehensive feature representation learned from diverse image sources.
arXiv Detail & Related papers (2024-09-01T23:48:26Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- DDI-CoCo: A Dataset For Understanding The Effect Of Color Contrast In Machine-Assisted Skin Disease Detection [51.92255321684027]
We study the interaction between skin tone and color difference effects and suggest that color difference can be an additional reason behind model performance bias between skin tones.
Our work provides a complementary angle to dermatology AI for improving skin disease detection.
arXiv Detail & Related papers (2024-01-24T07:45:24Z)
- Skin Lesion Diagnosis Using Convolutional Neural Networks [0.30458514384586394]
This project aims to address the issue by collecting state-of-the-art techniques for image classification from various fields.
The models were trained using a dataset of 8012 images, and their performance was evaluated using 2003 images.
It's worth noting that this model is trained end-to-end, directly from the image to the labels, without the need for handcrafted feature extraction.
arXiv Detail & Related papers (2023-05-18T17:15:08Z)
- Improving Deep Facial Phenotyping for Ultra-rare Disorder Verification Using Model Ensembles [52.77024349608834]
We analyze the influence of replacing a DCNN with a state-of-the-art face recognition approach, iResNet with ArcFace.
Our proposed ensemble model achieves state-of-the-art performance on both seen and unseen disorders.
arXiv Detail & Related papers (2022-11-12T23:28:54Z)
- Automatic Facial Skin Feature Detection for Everyone [60.31670960526022]
We present an automatic facial skin feature detection method that works across a variety of skin tones and age groups for selfies in the wild.
Specifically, we annotate the locations of acne, pigmentation, and wrinkles in selfie images with different skin tones, severity levels, and lighting conditions.
arXiv Detail & Related papers (2022-03-30T04:52:54Z)
- Disparities in Dermatology AI: Assessments Using Diverse Clinical Images [9.767299882513825]
We show that state-of-the-art dermatology AI models perform substantially worse on the Diverse Dermatology Images (DDI) dataset.
We find that dark skin tones and uncommon diseases, which are well represented in the DDI dataset, lead to performance drop-offs.
arXiv Detail & Related papers (2021-11-15T07:04:58Z)