Improving dermatology classifiers across populations using images
generated by large diffusion models
- URL: http://arxiv.org/abs/2211.13352v1
- Date: Wed, 23 Nov 2022 23:53:03 GMT
- Title: Improving dermatology classifiers across populations using images
generated by large diffusion models
- Authors: Luke W. Sagers, James A. Diao, Matthew Groh, Pranav Rajpurkar, Adewole
S. Adamson, Arjun K. Manrai
- Abstract summary: We show that DALL$\cdot$E 2, a large-scale text-to-image diffusion model, can produce photorealistic images of skin disease across skin types.
We demonstrate that augmenting training data with DALL$\cdot$E 2-generated synthetic images improves classification of skin disease overall and especially for underrepresented groups.
- Score: 4.291548465691441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dermatological classification algorithms developed without sufficiently
diverse training data may generalize poorly across populations. While
intentional data collection and annotation offer the best means for improving
representation, new computational approaches for generating training data may
also aid in mitigating the effects of sampling bias. In this paper, we show
that DALL$\cdot$E 2, a large-scale text-to-image diffusion model, can produce
photorealistic images of skin disease across skin types. Using the Fitzpatrick
17k dataset as a benchmark, we demonstrate that augmenting training data with
DALL$\cdot$E 2-generated synthetic images improves classification of skin
disease overall and especially for underrepresented groups.
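The augmentation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the record keys ("path", "label", "skin_type"), the function name `augment_with_synthetic`, and the per-group target are all hypothetical, and the synthetic records stand in for images that would already have been generated by a text-to-image model. The sketch only shows one plausible way to top up underrepresented skin-type groups in a training set with synthetic examples.

```python
import random
from collections import Counter


def augment_with_synthetic(real, synthetic, target_per_group, seed=0):
    """Top up each skin-type group in `real` with synthetic records.

    `real` and `synthetic` are lists of dicts with the hypothetical keys
    "path", "label", and "skin_type" (for example, a Fitzpatrick type 1-6).
    Groups that already meet `target_per_group` receive no synthetic data.
    """
    rng = random.Random(seed)
    counts = Counter(rec["skin_type"] for rec in real)

    # Index the synthetic records by skin-type group.
    pool = {}
    for rec in synthetic:
        pool.setdefault(rec["skin_type"], []).append(rec)

    augmented = list(real)
    for group, candidates in pool.items():
        deficit = max(0, target_per_group - counts.get(group, 0))
        k = min(deficit, len(candidates))
        # Sample without replacement so no synthetic image is repeated.
        augmented.extend(rng.sample(candidates, k))
    return augmented


# Toy data (assumed): five real images of skin type 2, one of skin type 6.
real = [
    {"path": f"real_{i}.jpg", "label": "psoriasis", "skin_type": 2}
    for i in range(5)
] + [{"path": "real_5.jpg", "label": "psoriasis", "skin_type": 6}]

# Ten synthetic images, all for the underrepresented skin type 6.
synthetic = [
    {"path": f"syn_{i}.png", "label": "psoriasis", "skin_type": 6}
    for i in range(10)
]

train = augment_with_synthetic(real, synthetic, target_per_group=5)
group_counts = Counter(rec["skin_type"] for rec in train)
```

In this toy example the skin type 6 group receives four synthetic records and the skin type 2 group is left unchanged. Whether such balancing helps a real classifier depends on the quality of the generated images and has to be checked on held-out real data.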
Related papers
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Boosting Dermatoscopic Lesion Segmentation via Diffusion Models with
Visual and Textual Prompts [27.222844687360823]
We adapt recent advances in generative models, adding control over generation through lesion-specific visual and textual prompts.
The approach achieves a 9% increase in the SSIM image quality measure and an increase of over 5% in Dice coefficients over prior work.
arXiv Detail & Related papers (2023-10-04T15:43:26Z)
- Augmenting medical image classifiers with synthetic data from latent
diffusion models [12.077733447347592]
We show that latent diffusion models can scalably generate images of skin disease.
We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
arXiv Detail & Related papers (2023-08-23T22:34:49Z)
- Performance of GAN-based augmentation for deep learning COVID-19 image
classification [57.1795052451257]
The biggest challenge in the application of deep learning to the medical domain is the availability of training data.
Data augmentation is a typical methodology used in machine learning when confronted with a limited data set.
In this work, a StyleGAN2-ADA generative adversarial network is trained on a limited COVID-19 chest X-ray image set.
arXiv Detail & Related papers (2023-04-18T15:39:58Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- Diffusion-based Data Augmentation for Skin Disease Classification:
Impact Across Original Medical Datasets to Fully Synthetic Images [2.5075774184834803]
Deep neural networks still rely on large amounts of training data to avoid overfitting.
Labeled training data for real-world applications such as healthcare is limited and difficult to access.
We build upon the emerging success of text-to-image diffusion probabilistic models in augmenting the training samples of our macroscopic skin disease dataset.
arXiv Detail & Related papers (2023-01-12T04:22:23Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Analysis of skin lesion images with deep learning [0.0]
We evaluate the current state of the art in the classification of dermoscopic images.
Various deep neural network architectures pre-trained on the ImageNet data set are adapted to a combined training data set.
The performance and applicability of these models for the detection of eight classes of skin lesions are examined.
arXiv Detail & Related papers (2021-01-11T10:58:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.