Related papers: Augmenting medical image classifiers with synthetic data from latent diffusion models

Augmenting medical image classifiers with synthetic data from latent diffusion models

URL: http://arxiv.org/abs/2308.12453v1
Date: Wed, 23 Aug 2023 22:34:49 GMT
Title: Augmenting medical image classifiers with synthetic data from latent diffusion models
Authors: Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai
Abstract summary: We show that latent diffusion models can scalably generate images of skin disease. We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
Score: 12.077733447347592
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drugs Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.

Related papers

AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics [3.462981934061808]
In this work, we focus on the classification of single white blood cells, a key component in the diagnosis of hematological diseases such as acute myeloid leukemia (AML)<n>We demonstrate how synthetic images generated with a fine-tuned stable diffusion model can enhance classifier performance for limited data.<n>Our results establish synthetic images as a tool in biomedical research, improving machine learning models, and facilitating medical diagnosis and research.
arXiv Detail & Related papers (2025-07-07T14:49:05Z)
PixCell: A generative foundation model for digital histopathology images [49.00921097924924]
We introduce PixCell, the first diffusion-based generative foundation model for histopathology.<n>We train PixCell on PanCan-30M, a vast, diverse dataset derived from 69,184 H&E-stained whole slide images covering various cancer types.
arXiv Detail & Related papers (2025-06-05T15:14:32Z)
Causal Disentanglement for Robust Long-tail Medical Image Generation [80.15257897500578]
We propose a novel medical image generation framework, which generates independent pathological and structural features. We leverage a diffusion model guided by pathological findings to model pathological features, enabling the generation of diverse counterfactual images.
arXiv Detail & Related papers (2025-04-20T01:54:18Z)
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
Merging synthetic and real embryo data for advanced AI predictions [69.07284335967019]
We train two generative models using two datasets-one we created and made publicly available, and one existing public dataset-to generate synthetic embryo images at various cell stages. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 94.5% when trained solely on real data.
arXiv Detail & Related papers (2024-12-02T08:24:49Z)
MediSyn: A Generalist Text-Guided Latent Diffusion Model For Diverse Medical Image Synthesis [4.541407789437896]
MediSyn is a text-guided latent diffusion model capable of generating synthetic images from 6 medical specialties and 10 image types. A direct comparison of the synthetic images against the real images confirms that our model synthesizes novel images and, crucially, may preserve patient privacy. Our findings highlight the immense potential for generalist image generative models to accelerate algorithmic research and development in medicine.
arXiv Detail & Related papers (2024-05-16T04:28:44Z)
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability. We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets. We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
Gadolinium dose reduction for brain MRI using conditional deep learning [66.99830668082234]
Two main challenges for these approaches are the accurate prediction of contrast enhancement and the synthesis of realistic images. We address both challenges by utilizing the contrast signal encoded in the subtraction images of pre-contrast and post-contrast image pairs. We demonstrate the effectiveness of our approach on synthetic and real datasets using various scanners, field strengths, and contrast agents.
arXiv Detail & Related papers (2024-03-06T08:35:29Z)
Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data [2.04850174048739]
We train latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation. We then detect the amount of training data memorized utilizing our novel self-supervised copy detection approach. Our findings show a surprisingly high degree of patient data memorization across all datasets.
arXiv Detail & Related papers (2024-02-01T22:58:21Z)
UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception [62.71374902455154]
We leverage recent advancements in neural rendering to improve static and dynamic novelview UAV-based image rendering. We demonstrate a considerable performance boost when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data.
arXiv Detail & Related papers (2023-10-25T00:20:37Z)
EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model [4.057796755073023]
We develop controllable diffusion models for medical image synthesis, called EMIT-Diff. We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data. In our approach, we ensure that the synthesized samples adhere to medically relevant constraints.
arXiv Detail & Related papers (2023-10-19T16:18:02Z)
The Beauty or the Beast: Which Aspect of Synthetic Medical Images Deserves Our Focus? [1.6305276867803995]
Training medical AI algorithms requires large volumes of accurately labeled datasets. Synthetic images generated from deep generative models can help alleviate the data scarcity problem, but their effectiveness relies on their fidelity to real-world images.
arXiv Detail & Related papers (2023-05-03T09:09:54Z)
Mask-conditioned latent diffusion for generating gastrointestinal polyp images [2.027538200191349]
This study proposes a conditional DPM framework to generate synthetic GI polyp images conditioned on given segmentation masks. Our system can generate an unlimited number of high-fidelity synthetic polyp images with the corresponding ground truth masks of polyps. Results show that the best micro-imagewise IOU of 0.7751 was achieved from DeepLabv3+ when the training data consists of both real data and synthetic data.
arXiv Detail & Related papers (2023-04-11T14:11:17Z)
Differentially Private Diffusion Models Generate Useful Synthetic Images [53.94025967603649]
Recent studies have found that, by default, the outputs of some diffusion models do not preserve training data privacy. By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17. Our results demonstrate that diffusion models fine-tuned with differential privacy can produce useful and provably private synthetic data.
arXiv Detail & Related papers (2023-02-27T15:02:04Z)
Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks. We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
Brain Imaging Generation with Latent Diffusion Models [2.200720122706913]
In this study, we explore using Latent Diffusion Models to generate synthetic images from high-resolution 3D brain images. We found that our models created realistic data, and we could use the conditioning variables to control the data generation effectively.
arXiv Detail & Related papers (2022-09-15T09:16:21Z)
Overcoming Barriers to Data Sharing with Medical Image Generation: A Comprehensive Evaluation [17.983449515155414]
We utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data. The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information. We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset.
arXiv Detail & Related papers (2020-11-29T15:41:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.