Differentially Private Multi-Party Data Release for Linear Regression
- URL: http://arxiv.org/abs/2206.07998v1
- Date: Thu, 16 Jun 2022 08:32:17 GMT
- Title: Differentially Private Multi-Party Data Release for Linear Regression
- Authors: Ruihan Wu, Xin Yang, Yuanshun Yao, Jiankai Sun, Tianyi Liu, Kilian Q. Weinberger, Chong Wang
- Abstract summary: Differentially Private (DP) data release is a promising technique to disseminate data without compromising the privacy of data subjects.
In this paper we focus on the multi-party setting, where different stakeholders own disjoint sets of attributes belonging to the same group of data subjects.
We propose a novel method and prove that it converges to the optimal (non-private) solution with increasing dataset size.
- Score: 40.66319371232736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially Private (DP) data release is a promising technique to
disseminate data without compromising the privacy of data subjects. However, the
majority of prior work has focused on scenarios where a single party owns all
the data. In this paper we focus on the multi-party setting, where different
stakeholders own disjoint sets of attributes belonging to the same group of
data subjects. Within the context of linear regression, we study data release
that allows all parties to train models on the complete data without being able
to infer private attributes or identities of individuals. We start by directly
applying the Gaussian mechanism and show that it suffers from the small
eigenvalue problem. We then propose a novel method and prove that it
asymptotically converges to the optimal (non-private) solution with increasing
dataset size. We substantiate the theoretical results through experiments on
both artificial and real-world datasets.
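To make the "small eigenvalue problem" concrete, here is a minimal sketch in Python (NumPy) under assumptions that do not come from the paper: a single curator holds the full matrix Z = [X, y] (this ignores the multi-party split, in which no single party could form the cross-party blocks of ZᵀZ), every row is clipped to L2 norm at most 1, and the classical Gaussian mechanism is applied to the Gram matrix ZᵀZ before solving the normal equations. The helper names `gaussian_sigma`, `clip_rows`, `noisy_gram` and `solve_from_gram` are hypothetical, and this perturbed-Gram baseline only illustrates naive Gaussian noise; it is neither the authors' baseline construction nor their proposed method. The noise scale uses σ = Δ₂·sqrt(2 ln(1.25/δ))/ε, the standard calibration stated for 0 < ε < 1.

```python
import numpy as np


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Noise scale of the classical Gaussian mechanism (calibration stated for 0 < epsilon < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def clip_rows(Z: np.ndarray, max_norm: float = 1.0) -> np.ndarray:
    """Scale each row down so its L2 norm is at most max_norm (bounds the sensitivity)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))


def noisy_gram(Z: np.ndarray, epsilon: float, delta: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical baseline: Gaussian mechanism on Z^T Z for rows with ||z_i||_2 <= 1.

    Adding or removing one row changes Z^T Z by z_i z_i^T, whose Frobenius norm is
    ||z_i||^2 <= 1, so the upper triangle (diagonal included) has L2 sensitivity <= 1.
    Noise is drawn for the upper triangle only and mirrored, which keeps the
    release symmetric (mirroring is post-processing and costs no extra privacy).
    """
    d = Z.shape[1]
    sigma = gaussian_sigma(1.0, epsilon, delta)
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))
    noise = upper + np.triu(upper, k=1).T
    return Z.T @ Z + noise


def solve_from_gram(G: np.ndarray) -> np.ndarray:
    """Least-squares coefficients from the Gram matrix of Z = [X, y] (last column is y)."""
    d = G.shape[0] - 1
    return np.linalg.solve(G[:d, :d], G[:d, d])


# Toy data with two nearly collinear features, so X^T X has one very small eigenvalue.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 1e-2 * rng.normal(size=n)])
theta_true = np.array([1.0, 2.0])
# Clipping rescales whole rows of [X, y], so y = X @ theta_true still holds (up to rounding).
Z = clip_rows(np.column_stack([X, X @ theta_true]))

epsilon, delta = 0.5, 1e-5
theta_nonprivate = solve_from_gram(Z.T @ Z)
theta_private = solve_from_gram(noisy_gram(Z, epsilon, delta, rng))

err_nonprivate = np.linalg.norm(theta_nonprivate - theta_true)
err_private = np.linalg.norm(theta_private - theta_true)
smallest_eig = np.linalg.eigvalsh(Z[:, :2].T @ Z[:, :2]).min()
print(f"smallest eigenvalue of X^T X: {smallest_eig:.4f}")
print(f"noise scale sigma:            {gaussian_sigma(1.0, epsilon, delta):.4f}")
print(f"error non-private: {err_nonprivate:.2e}, error private: {err_private:.2e}")
```

When the smallest eigenvalue of XᵀX is far below σ, the added noise can leave the perturbed matrix badly conditioned or even indefinite, and inverting it amplifies the noise into a large error in θ, while the non-private solve recovers θ up to rounding. This is the kind of failure the abstract refers to for the direct Gaussian-mechanism approach.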
Related papers
- Privacy-Optimized Randomized Response for Sharing Multi-Attribute Data [1.1510009152620668]
We propose a privacy-optimized randomized response that guarantees the strongest privacy in sharing multi-attribute data.
We also present an efficient algorithm for constructing a near-optimal attribute mechanism.
Our methods provide significantly stronger privacy guarantees for the entire dataset than the existing method.
arXiv Detail & Related papers (2024-02-12T11:34:42Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
- Data Analytics with Differential Privacy [0.0]
We develop differentially private algorithms to analyze distributed and streaming data.
In the distributed model, we consider the particular problem of learning -- in a distributed fashion -- a global model of the data.
We offer one of the strongest privacy guarantees for the streaming model, user-level pan-privacy.
arXiv Detail & Related papers (2023-07-20T17:43:29Z)
- Differentially Private Synthetic Data Using KD-Trees [11.96971298978997]
We exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms.
We propose both data-independent and data-dependent algorithms for $\epsilon$-differentially private synthetic data generation.
We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.
arXiv Detail & Related papers (2023-06-19T17:08:32Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Differentially Private Ensemble Classifiers for Data Streams [3.9838304163788183]
Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge.
We present a differentially private ensemble solution to this problem with two distinguishing features.
It allows an unbounded number of ensemble updates to deal with potentially never-ending data streams.
It is model-agnostic, in that it treats any pre-trained differentially private classification/regression model as a black box.
arXiv Detail & Related papers (2021-12-09T00:55:04Z)
- An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises [4.129847064263057]
Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models.
We study the effects of differentially private synthetic data generation on classification.
arXiv Detail & Related papers (2021-06-15T21:00:57Z)