Overcoming Barriers to Data Sharing with Medical Image Generation: A
Comprehensive Evaluation
- URL: http://arxiv.org/abs/2012.03769v1
- Date: Sun, 29 Nov 2020 15:41:46 GMT
- Title: Overcoming Barriers to Data Sharing with Medical Image Generation: A
Comprehensive Evaluation
- Authors: August DuMont Schütte, Jürgen Hetzel, Sergios Gatidis, Tobias
Hepp, Benedikt Dietz, Stefan Bauer and Patrick Schwab
- Abstract summary: We utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data.
The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information.
We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset.
- Score: 17.983449515155414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy concerns around sharing personally identifiable information are a
major practical barrier to data sharing in medical research. However, in many
cases, researchers have no interest in a particular individual's information
but rather aim to derive insights at the level of cohorts. Here, we utilize
Generative Adversarial Networks (GANs) to create derived medical imaging
datasets consisting entirely of synthetic patient data. The synthetic images
ideally have, in aggregate, similar statistical properties to those of a source
dataset but do not contain sensitive personal information. We assess the
quality of synthetic data generated by two GAN models for chest radiographs
with 14 different radiology findings and brain computed tomography (CT) scans
with six types of intracranial hemorrhages. We measure the synthetic image
quality by the performance difference of predictive models trained on either
the synthetic or the real dataset. We find that synthetic data performance
disproportionately benefits from a reduced number of unique label combinations
and determine at what number of samples per class overfitting effects start to
dominate GAN training. Our open-source benchmark findings also indicate that
synthetic data generation can benefit from higher levels of spatial resolution.
We additionally conducted a reader study in which trained radiologists did not
perform significantly better than random at discriminating between synthetic and
real medical images for either data modality. Our
study offers valuable guidelines and outlines practical conditions under which
insights derived from synthetic medical images are similar to those that would
have been derived from real imaging data. Our results indicate that synthetic
data sharing may be an attractive and privacy-preserving alternative to sharing
real patient-level data in the right settings.
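To make the evaluation protocol concrete, below is a minimal sketch of the "train on synthetic versus train on real, test on real" comparison. It simplifies the task to a single binary finding and assumes a DenseNet-121 backbone, ImageFolder-style directory layouts, and AUROC as the metric; none of these implementation details are taken from the paper.

```python
# Minimal sketch of the train-on-synthetic vs. train-on-real comparison.
# Dataset paths, the DenseNet-121 backbone, and the binary-label simplification
# are illustrative assumptions, not the authors' exact pipeline.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.metrics import roc_auc_score

def train_and_evaluate(train_dir: str, test_dir: str, epochs: int = 5) -> float:
    tf = transforms.Compose([transforms.Resize((224, 224)),
                             transforms.Grayscale(num_output_channels=3),
                             transforms.ToTensor()])
    train_ds = datasets.ImageFolder(train_dir, transform=tf)
    test_ds = datasets.ImageFolder(test_dir, transform=tf)

    model = models.densenet121(weights=None)
    model.classifier = nn.Linear(model.classifier.in_features, 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    model.train()
    for _ in range(epochs):
        for x, y in DataLoader(train_ds, batch_size=32, shuffle=True):
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(1), y.float())
            loss.backward()
            opt.step()

    # Evaluate on the same held-out *real* test set regardless of training data.
    model.eval()
    scores, labels = [], []
    with torch.no_grad():
        for x, y in DataLoader(test_ds, batch_size=32):
            scores.extend(torch.sigmoid(model(x).squeeze(1)).tolist())
            labels.extend(y.tolist())
    return roc_auc_score(labels, scores)

auc_real = train_and_evaluate("data/real_train", "data/real_test")
auc_synth = train_and_evaluate("data/synthetic_train", "data/real_test")
print(f"AUROC gap (real - synthetic): {auc_real - auc_synth:.3f}")
```

A small AUROC gap between the two runs indicates that a model trained on the synthetic dataset supports conclusions similar to one trained on the real data.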
Related papers
- Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation [0.0]
This work proposes a memory-efficient patch-wise denoising diffusion probabilistic model (DDPM) for generating synthetic medical images.
Our approach generates high-utility synthetic images with corresponding nodule segmentations while efficiently managing memory constraints.
We evaluate the method in two scenarios: training a segmentation model exclusively on synthetic data, and augmenting real-world training data with synthetic images.
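As a rough illustration of those two scenarios, the sketch below builds a synthetic-only training loader and an augmented loader. The random tensors are placeholders standing in for real and DDPM-generated image/mask pairs; this is an assumption for illustration, not the paper's actual data pipeline.

```python
# Sketch of the two evaluation scenarios described above (placeholder data).
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder (image, segmentation mask) pairs for real and synthetic data.
real_ds = TensorDataset(torch.rand(100, 1, 64, 64),
                        torch.randint(0, 2, (100, 1, 64, 64)))
synthetic_ds = TensorDataset(torch.rand(100, 1, 64, 64),
                             torch.randint(0, 2, (100, 1, 64, 64)))

# Scenario 1: train a segmentation model exclusively on synthetic data.
loader_synthetic_only = DataLoader(synthetic_ds, batch_size=8, shuffle=True)

# Scenario 2: augment the real training data with synthetic samples.
loader_augmented = DataLoader(ConcatDataset([real_ds, synthetic_ds]),
                              batch_size=8, shuffle=True)
```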
arXiv Detail & Related papers (2024-10-16T13:20:57Z)
- Dataset Distillation for Histopathology Image Classification [46.04496989951066]
We introduce a novel dataset distillation algorithm tailored for histopathology image datasets (Histo-DD).
We conduct a comprehensive evaluation of the effectiveness of the proposed algorithm and the generated histopathology samples in both patch-level and slide-level classification tasks.
arXiv Detail & Related papers (2024-08-19T05:53:38Z)
- Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data [2.1375651880073834]
Generative AI models have been gaining traction for facilitating open-data sharing.
These models generate patient data copies instead of novel synthetic samples.
We train 2D and 3D latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation.
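One generic way to probe for such memorization, not necessarily the copy-detection procedure used in this paper, is to compare each synthetic sample against its most similar training sample and flag near-duplicates:

```python
# Hypothetical memorization probe: for each synthetic image, find its most
# similar training image and flag near-duplicates. This is a generic check,
# not the specific method of the cited paper.
import numpy as np

def nearest_training_similarity(synthetic: np.ndarray, training: np.ndarray) -> np.ndarray:
    """synthetic: (n, d), training: (m, d); rows are flattened, L2-normalised images."""
    sims = synthetic @ training.T      # cosine similarity, since rows are normalised
    return sims.max(axis=1)            # highest similarity per synthetic sample

rng = np.random.default_rng(0)
train = rng.random((500, 4096)); train /= np.linalg.norm(train, axis=1, keepdims=True)
synth = rng.random((100, 4096)); synth /= np.linalg.norm(synth, axis=1, keepdims=True)
flags = nearest_training_similarity(synth, train) > 0.95   # threshold is an arbitrary choice
print(f"{flags.sum()} of {len(flags)} synthetic samples look like potential copies")
```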
arXiv Detail & Related papers (2024-02-01T22:58:21Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model [4.057796755073023]
We develop controllable diffusion models for medical image synthesis, called EMIT-Diff.
We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data.
In our approach, we ensure that the synthesized samples adhere to medically relevant constraints.
arXiv Detail & Related papers (2023-10-19T16:18:02Z)
- Augmenting medical image classifiers with synthetic data from latent diffusion models [12.077733447347592]
We show that latent diffusion models can scalably generate images of skin disease.
We generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies.
arXiv Detail & Related papers (2023-08-23T22:34:49Z)
- A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs [1.2099130772175573]
We present a method for generating a synthetic dataset based on the COSENTYX (secukinumab) Ankylosing Spondylitis clinical study and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity, and dataset privacy.
arXiv Detail & Related papers (2021-06-24T17:24:06Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
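Read literally, that description corresponds to a generative model of the following form (written in our own notation, not necessarily the paper's exact formulation):

```latex
% Sketch of the stated generative model; symbols are our own notation.
% x_i : data for subject i,  A_i : subject-specific mixing matrix,
% s   : shared independent sources,  n_i : subject-specific noise.
x_i = A_i s + n_i, \qquad i = 1, \dots, k
```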
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits unlabeled data by encouraging prediction consistency for a given input under different perturbations, as sketched below.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
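The following is a generic consistency-regularization sketch of that idea; it is a simplified stand-in for the paper's relation-driven formulation, and the Gaussian input noise is an assumed perturbation rather than the one used in the paper.

```python
# Generic consistency regularization: penalise disagreement between a model's
# predictions on two randomly perturbed views of the same unlabeled input.
import torch
import torch.nn.functional as F

def consistency_loss(model: torch.nn.Module, unlabeled: torch.Tensor) -> torch.Tensor:
    noisy_a = unlabeled + 0.1 * torch.randn_like(unlabeled)   # perturbed view 1
    noisy_b = unlabeled + 0.1 * torch.randn_like(unlabeled)   # perturbed view 2
    p_a = F.softmax(model(noisy_a), dim=1)
    p_b = F.softmax(model(noisy_b), dim=1)
    return F.mse_loss(p_a, p_b)

# Total objective (schematic): supervised loss on labeled data plus a weighted
# consistency term on unlabeled data:
#   loss = ce_loss(model(x_labeled), y) + lambda_u * consistency_loss(model, x_unlabeled)
```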
arXiv Detail & Related papers (2020-05-15T06:57:54Z)