Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary
- URL: http://arxiv.org/abs/2101.07235v1
- Date: Mon, 18 Jan 2021 18:40:46 GMT
- Title: Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary
- Authors: Jean-Francois Rajotte, Sumit Mukherjee, Caleb Robinson, Anthony Ortiz, Christopher West, Juan Lavista Ferres, Raymond T Ng
- Abstract summary: We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary), a generative mechanism enabling collaborative learning.
We show how a data owner with limited and biased data could benefit from other data owners while keeping data from all the sources private.
This is a common scenario in medical image analysis where privacy legislation prevents data from being shared outside local premises.
- Score: 10.809871958865447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary), a
generative mechanism enabling collaborative learning. In particular, we show
how a data owner with limited and biased data could benefit from other data
owners while keeping data from all the sources private. This is a common
scenario in medical image analysis where privacy legislation prevents data from
being shared outside local premises. FELICIA works for a large family of
Generative Adversarial Networks (GAN) architectures including vanilla and
conditional GANs as demonstrated in this work. We show that by using the
FELICIA mechanism, a data owner with limited image samples can generate
high-quality synthetic images with high utility while no data owner has
to provide access to its data. The sharing happens solely through a central
discriminator that has access limited to synthetic data. Here, utility is
defined as classification performance on a real test set. We demonstrate these
benefits on several realistic healthcare scenarios using benchmark image
datasets (MNIST, CIFAR-10) as well as on medical images for the task of skin
lesion classification. With multiple experiments, we show that even in the
worst cases, combining FELICIA with real data achieves performance on par with
using real data alone, while in most cases it significantly improves utility.
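Reading from the abstract alone, a minimal sketch of one FELICIA-style training step might look as follows, assuming a privGAN-like layout: each site keeps a local generator/discriminator pair trained on its private images, while a centralized discriminator receives only synthetic samples and tries to identify which site produced them, and each generator is additionally trained to fool it. The network sizes, optimizer settings, and the weighting LAM are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

Z_DIM, IMG_DIM, N_SITES, LAM = 64, 28 * 28, 2, 0.5

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

# One generator/discriminator pair per data owner; only synthetic samples leave a site.
generators = [nn.Sequential(mlp(Z_DIM, IMG_DIM), nn.Tanh()) for _ in range(N_SITES)]
local_discs = [mlp(IMG_DIM, 1) for _ in range(N_SITES)]   # real vs. synthetic, per site
central_disc = mlp(IMG_DIM, N_SITES)                      # guesses which site produced a sample

g_opts = [torch.optim.Adam(g.parameters(), lr=2e-4) for g in generators]
d_opts = [torch.optim.Adam(d.parameters(), lr=2e-4) for d in local_discs]
c_opt = torch.optim.Adam(central_disc.parameters(), lr=2e-4)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

def felicia_step(local_batches):
    """One training step; local_batches[k] holds site k's private real images."""
    fakes = []
    for k, real in enumerate(local_batches):
        n = real.size(0)
        fake = generators[k](torch.randn(n, Z_DIM))
        fakes.append(fake)
        # (1) The local discriminator sees real data on-premise only.
        d_loss = bce(local_discs[k](real), torch.ones(n, 1)) + \
                 bce(local_discs[k](fake.detach()), torch.zeros(n, 1))
        d_opts[k].zero_grad(); d_loss.backward(); d_opts[k].step()

    # (2) The central discriminator sees only synthetic images and learns to
    #     identify the source site of each sample.
    all_fake = torch.cat([f.detach() for f in fakes])
    site_labels = torch.cat([torch.full((f.size(0),), k, dtype=torch.long)
                             for k, f in enumerate(fakes)])
    c_loss = ce(central_disc(all_fake), site_labels)
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()

    # (3) Each generator fools its local discriminator and, adversarially, the
    #     central one, pulling the sites' synthetic distributions together.
    for k, fake in enumerate(fakes):
        n = fake.size(0)
        own_site = torch.full((n,), k, dtype=torch.long)
        g_loss = bce(local_discs[k](fake), torch.ones(n, 1)) - \
                 LAM * ce(central_disc(fake), own_site)
        g_opts[k].zero_grad(); g_loss.backward(); g_opts[k].step()

# Example with random tensors standing in for two sites' private image batches.
felicia_step([torch.randn(8, IMG_DIM), torch.randn(8, IMG_DIM)])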
Related papers
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
- Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
arXiv Detail & Related papers (2024-02-20T18:53:53Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted in six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- The (de)biasing effect of GAN-based augmentation methods on skin lesion images [3.441021278275805]
New medical datasets might still be a source of spurious correlations that affect the learning process.
One approach to alleviate the data imbalance is using data augmentation with Generative Adversarial Networks (GANs)
This work explored unconditional and conditional GANs to compare their bias inheritance and how the synthetic data influenced the models.
arXiv Detail & Related papers (2022-06-30T10:32:35Z)
- Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination [68.78903256687697]
In Heterogeneous Face Recognition (HFR), the objective is to match faces across two different domains such as visible and thermal.
Recent methods attempting to fill the gap via synthesis have achieved promising results, but their performance is still limited by the scarcity of paired training data.
In this paper, we propose a new face hallucination paradigm for HFR, which not only enables data-efficient synthesis but also allows to scale up model training without breaking any privacy policy.
arXiv Detail & Related papers (2022-03-30T20:44:33Z)
- A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs [1.2099130772175573]
We present a method for generating a synthetic dataset based on the COSENTYX (secukinumab) Ankylosing Spondylitis clinical study and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.
arXiv Detail & Related papers (2021-06-24T17:24:06Z)
- Overcoming Barriers to Data Sharing with Medical Image Generation: A Comprehensive Evaluation [17.983449515155414]
We utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data.
The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information.
We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset.
arXiv Detail & Related papers (2020-11-29T15:41:46Z)
- Private data sharing between decentralized users through the privGAN architecture [1.3923892290096642]
We propose a method for data owners to share synthetic or fake versions of their data without sharing the actual data.
We demonstrate that this approach, when applied to subsets of various sizes, gives the owners better utility than they would obtain from their own real data alone.
arXiv Detail & Related papers (2020-09-14T22:06:13Z)
- GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators [74.16405337436213]
We propose Gradient-Sanitized Wasserstein Generative Adversarial Networks (GS-WGAN).
GS-WGAN allows releasing a sanitized form of sensitive data with rigorous privacy guarantees.
We find our approach consistently outperforms state-of-the-art approaches across multiple metrics.
arXiv Detail & Related papers (2020-06-15T10:01:01Z)
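For the GS-WGAN entry directly above, a minimal sketch of the gradient-sanitization idea, assuming a standard PyTorch WGAN setup: the gradient flowing from the critic back to the generator is clipped per sample and perturbed with Gaussian noise via a backward hook, so only a sanitized training signal ever reaches generator parameters. The clipping norm, noise multiplier, and hook placement are illustrative assumptions, not the paper's exact recipe.

import torch

CLIP, SIGMA = 1.0, 1.1  # per-sample clipping norm and noise multiplier (assumed values)

def sanitize(grad):
    """Clip each sample's incoming gradient to norm CLIP, then add Gaussian noise."""
    flat = grad.flatten(start_dim=1)
    norms = flat.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = flat * (CLIP / norms).clamp(max=1.0)
    noisy = clipped + torch.randn_like(clipped) * SIGMA * CLIP
    return noisy.view_as(grad)

_ = sanitize(torch.randn(4, 3, 8, 8))  # smoke test on a dummy gradient batch

# Usage inside a WGAN generator update (generator, critic, g_opt assumed defined):
#   fake = generator(torch.randn(batch_size, z_dim))
#   fake.register_hook(sanitize)        # sanitize the critic-to-generator gradient
#   g_loss = -critic(fake).mean()       # Wasserstein generator loss
#   g_opt.zero_grad(); g_loss.backward(); g_opt.step()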
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.