Overcoming Barriers to Data Sharing with Medical Image Generation: A
Comprehensive Evaluation
- URL: http://arxiv.org/abs/2012.03769v1
- Date: Sun, 29 Nov 2020 15:41:46 GMT
- Title: Overcoming Barriers to Data Sharing with Medical Image Generation: A
Comprehensive Evaluation
- Authors: August DuMont Sch\"utte, J\"urgen Hetzel, Sergios Gatidis, Tobias
Hepp, Benedikt Dietz, Stefan Bauer and Patrick Schwab
- Abstract summary: We utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data.
The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information.
We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset.
- Score: 17.983449515155414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy concerns around sharing personally identifiable information are a
major practical barrier to data sharing in medical research. However, in many
cases, researchers have no interest in a particular individual's information
but rather aim to derive insights at the level of cohorts. Here, we utilize
Generative Adversarial Networks (GANs) to create derived medical imaging
datasets consisting entirely of synthetic patient data. The synthetic images
ideally have, in aggregate, similar statistical properties to those of a source
dataset but do not contain sensitive personal information. We assess the
quality of synthetic data generated by two GAN models for chest radiographs
with 14 different radiology findings and brain computed tomography (CT) scans
with six types of intracranial hemorrhages. We measure the synthetic image
quality by the performance difference of predictive models trained on either
the synthetic or the real dataset. We find that synthetic data performance
disproportionately benefits from a reduced number of unique label combinations
and determine at what number of samples per class overfitting effects start to
dominate GAN training. Our open-source benchmark findings also indicate that
synthetic data generation can benefit from higher levels of spatial resolution.
We additionally conducted a reader study in which trained radiologists do not
perform better than random on discriminating between synthetic and real medical
images for both data modalities to a statistically significant extent. Our
study offers valuable guidelines and outlines practical conditions under which
insights derived from synthetic medical images are similar to those that would
have been derived from real imaging data. Our results indicate that synthetic
data sharing may be an attractive and privacy-preserving alternative to sharing
real patient-level data in the right settings.
Related papers
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z) - Embryo 2.0: Merging Synthetic and Real Data for Advanced AI Predictions [69.07284335967019]
We train two generative models using two datasets, one created and made publicly available, and one existing public dataset.
We generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst.
These were combined with real images to train classification models for embryo cell stage prediction.
arXiv Detail & Related papers (2024-12-02T08:24:49Z) - Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation [0.0]
This work proposes a memory-efficient patch-wise denoising diffusion probabilistic model (DDPM) for generating synthetic medical images.
Our approach generates high-utility synthetic images with nodule segmentation while efficiently managing memory constraints.
We evaluate the method in two scenarios: training a segmentation model exclusively on synthetic data, and augmenting real-world training data with synthetic images.
arXiv Detail & Related papers (2024-10-16T13:20:57Z) - MediSyn: A Generalist Text-Guided Latent Diffusion Model For Diverse Medical Image Synthesis [4.541407789437896]
MediSyn is a text-guided latent diffusion model capable of generating synthetic images from 6 medical specialties and 10 image types.
A direct comparison of the synthetic images against the real images confirms that our model synthesizes novel images and, crucially, may preserve patient privacy.
Our findings highlight the immense potential for generalist image generative models to accelerate algorithmic research and development in medicine.
arXiv Detail & Related papers (2024-05-16T04:28:44Z) - Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data [2.04850174048739]
We train latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation.
We then detect the amount of training data memorized utilizing our novel self-supervised copy detection approach.
Our findings show a surprisingly high degree of patient data memorization across all datasets.
arXiv Detail & Related papers (2024-02-01T22:58:21Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report
Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-rays reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - Augmenting medical image classifiers with synthetic data from latent
diffusion models [12.077733447347592]
We show that latent diffusion models can scalably generate images of skin disease.
We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
arXiv Detail & Related papers (2023-08-23T22:34:49Z) - A Deep Learning Approach to Private Data Sharing of Medical Images Using
Conditional GANs [1.2099130772175573]
We present a method for generating a synthetic dataset based on COSENTYX (secukinumab) Ankylosing Spondylitis clinical study.
In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis on its properties of along three key metrics: image fidelity, sample diversity and dataset privacy.
arXiv Detail & Related papers (2021-06-24T17:24:06Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Semi-supervised Medical Image Classification with Relation-driven
Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.