Augmenting medical image classifiers with synthetic data from latent
diffusion models
- URL: http://arxiv.org/abs/2308.12453v1
- Date: Wed, 23 Aug 2023 22:34:49 GMT
- Title: Augmenting medical image classifiers with synthetic data from latent
diffusion models
- Authors: Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh,
Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou,
Arjun K. Manrai
- Abstract summary: We show that latent diffusion models can scalably generate images of skin disease.
We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
- Score: 12.077733447347592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While hundreds of artificial intelligence (AI) algorithms are now approved or
cleared by the US Food and Drugs Administration (FDA), many studies have shown
inconsistent generalization or latent bias, particularly for underrepresented
populations. Some have proposed that generative AI could reduce the need for
real data, but its utility in model development remains unclear. Skin disease
serves as a useful case study in synthetic image generation due to the
diversity of disease appearance, particularly across the protected attribute of
skin tone. Here we show that latent diffusion models can scalably generate
images of skin disease and that augmenting model training with these data
improves performance in data-limited settings. These performance gains saturate
at synthetic-to-real image ratios above 10:1 and are substantially smaller than
the gains obtained from adding real images. As part of our analysis, we
generate and analyze a new dataset of 458,920 synthetic images produced using
several generation strategies. Our results suggest that synthetic data could
serve as a force-multiplier for model development, but the collection of
diverse real-world data remains the most important step to improve medical AI
algorithms.
Related papers
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z) - Embryo 2.0: Merging Synthetic and Real Data for Advanced AI Predictions [69.07284335967019]
We train two generative models using two datasets, one created and made publicly available, and one existing public dataset.
We generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst.
These were combined with real images to train classification models for embryo cell stage prediction.
arXiv Detail & Related papers (2024-12-02T08:24:49Z) - MediSyn: A Generalist Text-Guided Latent Diffusion Model For Diverse Medical Image Synthesis [4.541407789437896]
MediSyn is a text-guided latent diffusion model capable of generating synthetic images from 6 medical specialties and 10 image types.
A direct comparison of the synthetic images against the real images confirms that our model synthesizes novel images and, crucially, may preserve patient privacy.
Our findings highlight the immense potential for generalist image generative models to accelerate algorithmic research and development in medicine.
arXiv Detail & Related papers (2024-05-16T04:28:44Z) - Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z) - Gadolinium dose reduction for brain MRI using conditional deep learning [66.99830668082234]
Two main challenges for these approaches are the accurate prediction of contrast enhancement and the synthesis of realistic images.
We address both challenges by utilizing the contrast signal encoded in the subtraction images of pre-contrast and post-contrast image pairs.
We demonstrate the effectiveness of our approach on synthetic and real datasets using various scanners, field strengths, and contrast agents.
arXiv Detail & Related papers (2024-03-06T08:35:29Z) - Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data [2.04850174048739]
We train latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation.
We then detect the amount of training data memorized utilizing our novel self-supervised copy detection approach.
Our findings show a surprisingly high degree of patient data memorization across all datasets.
arXiv Detail & Related papers (2024-02-01T22:58:21Z) - UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception [62.71374902455154]
We leverage recent advancements in neural rendering to improve static and dynamic novelview UAV-based image rendering.
We demonstrate a considerable performance boost when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data.
arXiv Detail & Related papers (2023-10-25T00:20:37Z) - The Beauty or the Beast: Which Aspect of Synthetic Medical Images
Deserves Our Focus? [1.6305276867803995]
Training medical AI algorithms requires large volumes of accurately labeled datasets.
Synthetic images generated from deep generative models can help alleviate the data scarcity problem, but their effectiveness relies on their fidelity to real-world images.
arXiv Detail & Related papers (2023-05-03T09:09:54Z) - Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z) - Brain Imaging Generation with Latent Diffusion Models [2.200720122706913]
In this study, we explore using Latent Diffusion Models to generate synthetic images from high-resolution 3D brain images.
We found that our models created realistic data, and we could use the conditioning variables to control the data generation effectively.
arXiv Detail & Related papers (2022-09-15T09:16:21Z) - Overcoming Barriers to Data Sharing with Medical Image Generation: A
Comprehensive Evaluation [17.983449515155414]
We utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data.
The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information.
We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset.
arXiv Detail & Related papers (2020-11-29T15:41:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.