Can segmentation models be trained with fully synthetically generated
data?
- URL: http://arxiv.org/abs/2209.08256v1
- Date: Sat, 17 Sep 2022 05:24:04 GMT
- Title: Can segmentation models be trained with fully synthetically generated
data?
- Authors: Virginia Fernandez (1), Walter Hugo Lopez Pinaya (1), Pedro Borges
(1), Petru-Daniel Tudosiu (1), Mark S Graham (1), Tom Vercauteren (1), M
Jorge Cardoso ((1) King's College London)
- Abstract summary: brainSPADE is a model which combines a synthetic diffusion-based label generator with a semantic image generator.
Our model can produce fully synthetic brain labels on-demand, with or without pathology of interest, and then generate a corresponding MRI image of an arbitrary guided style.
Experiments show that brainSPADE synthetic data can be used to train segmentation models with performance comparable to that of models trained on real data.
- Score: 0.39577682622066246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to achieve good performance and generalisability, medical image
segmentation models should be trained on sizeable datasets with sufficient
variability. Due to ethics and governance restrictions, and the costs
associated with labelling data, scientific development is often stifled, with
models trained and tested on limited data. Data augmentation is often used to
artificially increase the variability in the data distribution and improve
model generalisability. Recent works have explored deep generative models for
image synthesis, as such an approach would enable the generation of an
effectively infinite amount of varied data, addressing the generalisability and
data access problems. However, many proposed solutions limit the user's control
over what is generated. In this work, we propose brainSPADE, a model which
combines a synthetic diffusion-based label generator with a semantic image
generator. Our model can produce fully synthetic brain labels on-demand, with
or without pathology of interest, and then generate a corresponding MRI image
of an arbitrary guided style. Experiments show that brainSPADE synthetic data
can be used to train segmentation models with performance comparable to that of
models trained on real data.
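The two-stage pipeline described in the abstract (sample a synthetic label map, then synthesise a matching image from it) can be illustrated with a minimal, purely toy sketch. Nothing below comes from the brainSPADE codebase: the "label generator" here is just a smoothed random field discretised into classes, and the "semantic image generator" is a per-class intensity lookup plus noise, standing in for the diffusion model and SPADE-style generator respectively.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_label_map(shape=(64, 64), n_classes=4):
    """Toy stand-in for the diffusion-based label generator:
    sample a random field, smooth it with a box filter, and
    discretise it into n_classes contiguous tissue labels."""
    field = rng.normal(size=shape)
    k = 7
    padded = np.pad(field, k // 2, mode="edge")
    smooth = np.zeros(shape)
    for i in range(shape[0]):
        for j in range(shape[1]):
            smooth[i, j] = padded[i:i + k, j:j + k].mean()
    # class boundaries at the quantiles -> integer labels in [0, n_classes)
    edges = np.quantile(smooth, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(smooth, edges)

def render_image(labels, n_classes=4):
    """Toy stand-in for the semantic (SPADE-style) image generator:
    map each label to a mean intensity and add noise, mimicking an
    MR contrast. A real model would condition a generative network
    on the label map and a style input."""
    means = np.linspace(0.1, 0.9, n_classes)
    return means[labels] + 0.05 * rng.normal(size=labels.shape)

labels = sample_label_map()
image = render_image(labels)
# (image, labels) is one fully synthetic training pair; repeating this
# on demand yields an arbitrarily large segmentation training set.
```

The key design point the paper argues for is that both stages are generative, so the training set size is no longer bounded by real annotated data.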
Related papers
- Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic.
Our approach transforms numerical data into text, re-framing data generation as a language modeling task.
Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- A 3D generative model of pathological multi-modal MR images and segmentations [3.4806591877889375]
We propose brainSPADE3D, a 3D generative model for brain MRI and associated segmentations.
The proposed joint imaging-segmentation generative model is shown to generate high-fidelity synthetic images and associated segmentations.
We demonstrate how the model can alleviate issues with segmentation model performance when unexpected pathologies are present in the data.
arXiv Detail & Related papers (2023-11-08T09:36:37Z)
- How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound [0.3312417881789094]
Adding synthetic training data using generative models offers a low-cost method to deal with the data scarcity challenge.
We show that training with both synthetic and real data outperforms training with real data alone.
arXiv Detail & Related papers (2023-10-05T15:42:53Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems [0.0]
Deep learning models frequently suffer from various problems such as class imbalance and lack of robustness to distribution shift.
With the advent of Generative Adversarial Networks (GANs) it is now possible to generate high-quality synthetic data.
We analyze the effect that various quantities of synthetic data, when mixed with original data, can have on a model's robustness to out-of-distribution data and the general quality of predictions.
arXiv Detail & Related papers (2023-03-02T13:53:22Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for applying synthetic data more effectively to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Differentially Private Synthetic Medical Data Generation using Convolutional GANs [7.2372051099165065]
We develop a differentially private framework for synthetic data generation using Rényi differential privacy.
Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data.
We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
arXiv Detail & Related papers (2020-12-22T01:03:49Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
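The generative assumption behind the MultiView ICA entry above (each subject's data is a linear combination of shared independent sources plus noise) can be written as x_i = A_i s + n_i. The following sketch simulates that forward model only; the variable names are illustrative and fitting the model (recovering the unmixing matrices) is left out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared-response model: subject i observes x_i = A_i @ s + n_i,
# where s holds shared independent (non-Gaussian) sources and A_i
# is a subject-specific mixing matrix.
n_subjects, n_sources, n_samples = 3, 4, 1000

s = rng.laplace(size=(n_sources, n_samples))  # shared non-Gaussian sources
A = [rng.normal(size=(n_sources, n_sources)) for _ in range(n_subjects)]
noise_std = 0.1
x = [A[i] @ s + noise_std * rng.normal(size=(n_sources, n_samples))
     for i in range(n_subjects)]

# Fitting MultiView ICA would estimate unmixing matrices W_i ~ inv(A_i)
# so that W_i @ x_i agree across subjects up to the noise term.
```

Because the sources are shared, any method that unmixes one subject's data also aligns it with every other subject's, which is what makes group-level conclusions possible.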
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.