Augmenting medical image classifiers with synthetic data from latent diffusion models
- URL: http://arxiv.org/abs/2308.12453v1
- Date: Wed, 23 Aug 2023 22:34:49 GMT
- Title: Augmenting medical image classifiers with synthetic data from latent diffusion models
- Authors: Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai
- Abstract summary: We show that latent diffusion models can scalably generate images of skin disease.
We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While hundreds of artificial intelligence (AI) algorithms are now approved or
cleared by the US Food and Drug Administration (FDA), many studies have shown
inconsistent generalization or latent bias, particularly for underrepresented
populations. Some have proposed that generative AI could reduce the need for
real data, but its utility in model development remains unclear. Skin disease
serves as a useful case study in synthetic image generation due to the
diversity of disease appearance, particularly across the protected attribute of
skin tone. Here we show that latent diffusion models can scalably generate
images of skin disease and that augmenting model training with these data
improves performance in data-limited settings. These performance gains saturate
at synthetic-to-real image ratios above 10:1 and are substantially smaller than
the gains obtained from adding real images. As part of our analysis, we
generate and analyze a new dataset of 458,920 synthetic images produced using
several generation strategies. Our results suggest that synthetic data could
serve as a force-multiplier for model development, but the collection of
diverse real-world data remains the most important step to improve medical AI
algorithms.
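The synthetic-to-real ratio mentioned in the abstract can be made concrete with a small sketch. The helper below, `mix_training_set`, is hypothetical and not taken from the paper: it assumes real and synthetic images are given as lists of `(image_id, label)` pairs and adds a random subset of synthetic images sized at `ratio` times the number of real images, capped by how many synthetic images exist. The generic labels `"a"` and `"b"` are placeholders. It illustrates the idea of a ratio only and does not reproduce the authors' sampling or training pipeline.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record type for illustration: (image_id, label).
Example = Tuple[str, str]


def mix_training_set(
    real: Sequence[Example],
    synthetic: Sequence[Example],
    ratio: float,
    seed: int = 0,
) -> List[Example]:
    """Return all real examples plus up to ratio * len(real) synthetic ones.

    Hypothetical helper; the paper's actual sampling procedure may differ.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)
    # Requested number of synthetic images, capped by what is available.
    n_synth = min(int(ratio * len(real)), len(synthetic))
    # Sample synthetic images without replacement.
    chosen = rng.sample(list(synthetic), n_synth)
    mixed = list(real) + chosen
    # Shuffle so real and synthetic images are interleaved.
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real = [(f"real_{i}", "a" if i % 2 else "b") for i in range(10)]
    synthetic = [(f"syn_{i}", "a" if i % 2 else "b") for i in range(200)]
    for r in (0, 1, 10, 20):
        mixed = mix_training_set(real, synthetic, r)
        n_syn = sum(1 for image_id, _ in mixed if image_id.startswith("syn_"))
        print(f"ratio {r}:1 -> {len(mixed)} images ({n_syn} synthetic)")
```

Sweeping `ratio` in this way is one way to set up an experiment on where gains plateau; the abstract reports saturation above 10:1, but this snippet only builds the training sets and measures nothing about model performance.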
Related papers
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
Date: 2024-03-28T22:25:05Z
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
Date: 2024-03-20T04:58:03Z
- Gadolinium dose reduction for brain MRI using conditional deep learning
Two main challenges for deep learning approaches to gadolinium dose reduction are the accurate prediction of contrast enhancement and the synthesis of realistic images.
We address both challenges by utilizing the contrast signal encoded in the subtraction images of pre-contrast and post-contrast image pairs.
We demonstrate the effectiveness of our approach on synthetic and real datasets using various scanners, field strengths, and contrast agents.
Date: 2024-03-06T08:35:29Z
- UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception
We leverage recent advancements in neural rendering to improve static and dynamic novel-view UAV-based image rendering.
We demonstrate a considerable performance boost when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data.
Date: 2023-10-25T00:20:37Z
- EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model
We develop controllable diffusion models for medical image synthesis, called EMIT-Diff.
We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data.
In our approach, we ensure that the synthesized samples adhere to medically relevant constraints.
Date: 2023-10-19T16:18:02Z
- The Beauty or the Beast: Which Aspect of Synthetic Medical Images Deserves Our Focus?
Training medical AI algorithms requires large volumes of accurately labeled datasets.
Synthetic images generated from deep generative models can help alleviate the data scarcity problem, but their effectiveness relies on their fidelity to real-world images.
Date: 2023-05-03T09:09:54Z
- Mask-conditioned latent diffusion for generating gastrointestinal polyp images
This study proposes a conditional DPM framework to generate synthetic GI polyp images conditioned on given segmentation masks.
Our system can generate an unlimited number of high-fidelity synthetic polyp images with the corresponding ground truth masks of polyps.
Results show that the best micro-imagewise IoU of 0.7751 was achieved by DeepLabv3+ when the training data consisted of both real and synthetic data.
Date: 2023-04-11T14:11:17Z
- Differentially Private Diffusion Models Generate Useful Synthetic Images
Recent studies have found that, by default, the outputs of some diffusion models do not preserve training data privacy.
By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17.
Our results demonstrate that diffusion models fine-tuned with differential privacy can produce useful and provably private synthetic data.
Date: 2023-02-27T15:02:04Z
- Is synthetic data from generative models ready for image recognition?
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
Date: 2022-10-14T06:54:24Z
- Brain Imaging Generation with Latent Diffusion Models
In this study, we explore using Latent Diffusion Models to generate synthetic images from high-resolution 3D brain images.
We found that our models created realistic data, and we could use the conditioning variables to control the data generation effectively.
Date: 2022-09-15T09:16:21Z
- Overcoming Barriers to Data Sharing with Medical Image Generation: A Comprehensive Evaluation
We utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data.
The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information.
We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset.
Date: 2020-11-29T15:41:46Z