A Deep Learning Approach to Private Data Sharing of Medical Images Using
Conditional GANs
- URL: http://arxiv.org/abs/2106.13199v1
- Date: Thu, 24 Jun 2021 17:24:06 GMT
- Title: A Deep Learning Approach to Private Data Sharing of Medical Images Using
Conditional GANs
- Authors: Hanxi Sun, Jason Plawinski, Sajanth Subramaniam, Amir Jamaludin, Timor
Kadir, Aimee Readie, Gregory Ligozio, David Ohlssen, Mark Baillie, Thibaud
Coroller
- Abstract summary: We present a method for generating a synthetic dataset based on COSENTYX (secukinumab) Ankylosing Spondylitis clinical study.
In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis on its properties of along three key metrics: image fidelity, sample diversity and dataset privacy.
- Score: 1.2099130772175573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharing data from clinical studies can facilitate innovative data-driven
research and ultimately lead to better public health. However, sharing
biomedical data can put sensitive personal information at risk. This is usually
solved by anonymization, which is a slow and expensive process. An alternative
to anonymization is sharing a synthetic dataset that bears a behaviour similar
to the real data but preserves privacy. As part of the collaboration between
Novartis and the Oxford Big Data Institute, we generate a synthetic dataset
based on COSENTYX (secukinumab) Ankylosing Spondylitis (AS) clinical study. We
apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic
resonance images (MRIs) of vertebral units (VUs). The images are conditioned on
the VU location (cervical, thoracic and lumbar). In this paper, we present a
method for generating a synthetic dataset and conduct an in-depth analysis on
its properties of along three key metrics: image fidelity, sample diversity and
dataset privacy.
Related papers
- SynRL: Aligning Synthetic Clinical Trial Data with Human-preferred Clinical Endpoints Using Reinforcement Learning [23.643984146939573]
We propose SynRL which leverages reinforcement learning to improve the performance of patient data generators.
Our method includes a data value critic function to evaluate the quality of the generated data and uses reinforcement learning to align the data generator with the users' needs.
arXiv Detail & Related papers (2024-11-11T19:19:46Z) - Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z) - EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing [5.900946696794718]
We present a model designed to produce high-fidelity, long and accessible complete data samples with near-real-time efficiency.
We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization.
We present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels.
arXiv Detail & Related papers (2024-06-02T17:18:06Z) - Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data [2.1375651880073834]
generative AI models have been gaining traction for facilitating open-data sharing.
These models generate patient data copies instead of novel synthetic samples.
We train 2D and 3D latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation.
arXiv Detail & Related papers (2024-02-01T22:58:21Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Source-Free Collaborative Domain Adaptation via Multi-Perspective
Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z) - ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted in six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z) - The Health Gym: Synthetic Health-Related Datasets for the Development of
Reinforcement Learning Algorithms [2.032684842401705]
Health Gym is a collection of synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms.
The datasets were created using a novel generative adversarial network (GAN)
The risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.
arXiv Detail & Related papers (2022-03-12T07:28:02Z) - FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
COVID-19 Disease due to the novel coronavirus has caused a shortage of medical resources.
Different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19.
The data itself is still scarce due to patient privacy concerns.
We propose a simple yet effective algorithm, named textbfFederated textbfL textbfon Medical datasets using textbfPartial Networks (FLOP)
arXiv Detail & Related papers (2021-02-10T01:56:58Z) - Overcoming Barriers to Data Sharing with Medical Image Generation: A
Comprehensive Evaluation [17.983449515155414]
We utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data.
The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information.
We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset.
arXiv Detail & Related papers (2020-11-29T15:41:46Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.