Private data sharing between decentralized users through the privGAN
architecture
- URL: http://arxiv.org/abs/2009.06764v1
- Date: Mon, 14 Sep 2020 22:06:13 GMT
- Title: Private data sharing between decentralized users through the privGAN
architecture
- Authors: Jean-Francois Rajotte, Raymond T Ng
- Abstract summary: We propose a method for data owners to share synthetic or fake versions of their data without sharing the actual data.
We demonstrate that this approach, when applied to subsets of various sizes, leads to better utility for the owners than the utility from their real datasets.
- Score: 1.3923892290096642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: More data is almost always beneficial for analysis and machine learning
tasks. In many realistic situations however, an enterprise cannot share its
data, either to keep a competitive advantage or to protect the privacy of the
data sources, the enterprise's clients for example. We propose a method for
data owners to share synthetic or fake versions of their data without sharing
the actual data, nor the parameters of models that have direct access to the
data. The method proposed is based on the privGAN architecture where local GANs
are trained on their respective data subsets with an extra penalty from a
central discriminator aiming to discriminate the origin of a given fake sample.
We demonstrate that this approach, when applied to subsets of various sizes,
leads to better utility for the owners than the utility from their real small
datasets. The only shared pieces of information are the parameter updates of
the central discriminator. The privacy is demonstrated with white-box attacks
on the most vulnerable elements of the architecture and the results are close to
random guessing. This method would apply naturally in a federated learning
setting.
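The penalty described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `generator_loss`, the parameter `penalty_weight`, and the exact form of the loss (a saturating GAN term plus a weighted log-probability of the true origin under the central discriminator) are assumptions made for this example.

```python
import numpy as np


def generator_loss(local_disc_prob_real, central_disc_probs, owner_index,
                   penalty_weight=1.0, eps=1e-12):
    """Loss for one owner's generator on a batch of its fake samples.

    local_disc_prob_real: shape (batch,), the local discriminator's
        probability that each fake sample is real.
    central_disc_probs: shape (batch, n_owners), the central
        discriminator's probabilities over which owner produced each sample.
    owner_index: index of the owner whose generator made the batch.
    penalty_weight: hypothetical weight of the central-discriminator penalty.
    """
    local = np.asarray(local_disc_prob_real, dtype=float)
    central = np.asarray(central_disc_probs, dtype=float)

    # Saturating GAN term: decreases as the local discriminator is fooled.
    gan_term = np.mean(np.log(1.0 - local + eps))

    # Penalty term: log-probability that the central discriminator assigns
    # to the true origin. It grows when the origin is easy to identify.
    origin_term = np.mean(np.log(central[:, owner_index] + eps))

    return gan_term + penalty_weight * origin_term


# Two owners, batch of two fake samples from owner 0.
local_probs = [0.5, 0.5]
loss_anonymous = generator_loss(local_probs, [[0.5, 0.5], [0.5, 0.5]], 0)
loss_identifiable = generator_loss(local_probs, [[0.9, 0.1], [0.9, 0.1]], 0)
```

Under these assumptions, `loss_identifiable` is larger than `loss_anonymous`, so minimizing the loss pushes each generator toward samples whose origin the central discriminator cannot tell apart.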
Related papers
- Federated Face Forgery Detection Learning with Personalized Representation [63.90408023506508]
Deep generator technology can produce high-quality fake videos that are indistinguishable, posing a serious social threat.
Traditional forgery detection methods train directly on centralized data.
The paper proposes a novel federated face forgery detection learning with personalized representation.
arXiv Detail & Related papers (2024-06-17T02:20:30Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- Little is Enough: Improving Privacy by Sharing Labels in Federated Semi-Supervised Learning [10.972006295280636]
In many critical applications, sensitive data is inherently distributed and cannot be centralized due to privacy concerns.
Most of these approaches either share local model parameters, soft predictions on a public dataset, or a combination of both.
This, however, still discloses private information and restricts local models to those that lend themselves to training via gradient-based methods.
We propose to share only hard labels on a public unlabeled dataset, and use a consensus over the shared labels as a pseudo-labeling to be used by clients.
arXiv Detail & Related papers (2023-10-09T13:16:10Z)
- Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed.
As a side product of this work, we release the non-IID version of the datasets we used so as to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Differentially Private Language Models for Secure Data Sharing [19.918137395199224]
In this paper, we show how to train a generative language model in a differentially private manner and subsequently sample data from it.
Using natural language prompts and a new prompt-mismatch loss, we are able to create highly accurate and fluent textual datasets.
We perform thorough experiments indicating that our synthetic datasets do not leak information from our original data and are of high language quality.
arXiv Detail & Related papers (2022-10-25T11:12:56Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data [20.757477553095637]
Federated learning (FL) is a privacy-promoting framework that enables clients to collaboratively train machine learning models.
A major challenge in federated learning arises when the local data is heterogeneous.
We propose FedDPMS, an FL algorithm in which clients deploy variational auto-encoders to augment local datasets with data synthesized using differentially private means of latent data representations.
arXiv Detail & Related papers (2022-06-01T18:00:48Z)
- Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z)
- Federating Recommendations Using Differentially Private Prototypes [16.29544153550663]
We propose a new federated approach to learning global and local private models for recommendation without collecting raw data.
By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss.
We show local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models.
arXiv Detail & Related papers (2020-03-01T22:21:31Z)