The Cost of Privacy in Asynchronous Differentially-Private Machine
Learning
- URL: http://arxiv.org/abs/2003.08500v2
- Date: Mon, 29 Jun 2020 04:53:58 GMT
- Title: The Cost of Privacy in Asynchronous Differentially-Private Machine
Learning
- Authors: Farhad Farokhi, Nan Wu, David Smith, Mohamed Ali Kaafar
- Abstract summary: We develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets.
A central learner interacts with the private data owners one-on-one whenever they are available for communication.
We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms.
- Score: 17.707240607542236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider training machine learning models using training data located on
multiple private and geographically-scattered servers with different privacy
settings. Due to the distributed nature of the data, communicating with all
collaborating private data owners simultaneously may prove challenging or
altogether impossible. In this paper, we develop differentially-private
asynchronous algorithms for collaboratively training machine-learning models on
multiple private datasets. The asynchronous nature of the algorithms implies
that a central learner interacts with the private data owners one-on-one
whenever they are available for communication without needing to aggregate
query responses to construct gradients of the entire fitness function.
Therefore, the algorithm efficiently scales to many data owners. We define the
cost of privacy as the difference between the fitness of a privacy-preserving
machine-learning model and the fitness of a machine-learning model trained in the
absence of privacy concerns. We prove that we can forecast the performance of
the proposed privacy-preserving asynchronous algorithms. We demonstrate that
the cost of privacy has an upper bound that is inversely proportional to the
combined size of the training datasets squared and the sum of the privacy
budgets squared. We validate the theoretical results with experiments on
financial and medical datasets. The experiments illustrate that collaboration
among more than 10 data owners with at least 10,000 records with privacy
budgets greater than or equal to 1 results in a superior machine-learning model
in comparison to a model trained in isolation on only one of the datasets,
illustrating the value of collaboration and the cost of privacy. The number
of collaborating datasets can be lowered if the privacy budget is higher.
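To make the bound and the protocol concrete, here is first a hedged formalization of the cost-of-privacy claim and then a minimal runnable sketch of the asynchronous loop. Both are illustrations under assumptions, not the paper's exact construction: the symbols in the display, the Laplace mechanism, the clipping step, and all code names (DataOwner, dp_gradient, train_async) are introduced here for illustration only.

```latex
% One plausible reading of "an upper bound inversely proportional to the
% combined size of the training datasets squared and the sum of the privacy
% budgets squared": f is the fitness function, \theta^* its non-private
% minimizer, \hat{\theta} the privacy-preserving model, and n_i, \epsilon_i
% the dataset size and privacy budget of data owner i.
\[
  \underbrace{\, f(\hat{\theta}) - f(\theta^{*}) \,}_{\text{cost of privacy}}
  \;=\;
  \mathcal{O}\!\left(
    \frac{1}{\bigl(\sum_i n_i\bigr)^{2} \bigl(\sum_i \epsilon_i\bigr)^{2}}
  \right)
\]
```

The sketch below instantiates the one-on-one interaction: the central learner queries whichever owner happens to be available, receives a gradient computed and noised locally on that owner's data alone, and takes a step, so responses never need to be aggregated across owners.

```python
# Hypothetical sketch of asynchronous differentially-private training,
# assuming a linear model with squared loss and the Laplace mechanism.
# All names and parameter choices are illustrative, not from the paper.
import numpy as np

class DataOwner:
    def __init__(self, X, y, epsilon, clip=1.0):
        self.X, self.y = X, y
        self.epsilon = epsilon  # this owner's per-query privacy budget
        self.clip = clip        # per-record L1 clipping bound (controls sensitivity)

    def dp_gradient(self, theta):
        """Answer one query: an epsilon-DP gradient of the local fitness."""
        n = len(self.y)
        per_record = self.X * (self.X @ theta - self.y)[:, None]  # (n, d)
        # Clip each record's gradient in L1 norm so that replacing one record
        # changes the averaged gradient by at most 2*clip/n in L1.
        l1 = np.abs(per_record).sum(axis=1, keepdims=True)
        per_record *= np.minimum(1.0, self.clip / (l1 + 1e-12))
        grad = per_record.mean(axis=0)
        # Laplace mechanism: noise scale = L1 sensitivity / epsilon.
        scale = 2 * self.clip / (n * self.epsilon)
        return grad + np.random.laplace(0.0, scale, size=grad.shape)

def train_async(owners, dim, iters=2000, lr0=0.5, seed=0):
    """Central learner: query one available owner at a time; no aggregation."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for t in range(1, iters + 1):
        owner = owners[rng.integers(len(owners))]  # stand-in for "whoever is available"
        theta -= (lr0 / t) * owner.dp_gradient(theta)
    return theta
```

Because each update consumes a single owner's response, owners can join and leave at will, which is what lets the scheme scale to many data owners.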
Related papers
- Federated Transfer Learning with Differential Privacy [21.50525027559563]
We formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server.
We show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy.
arXiv Detail & Related papers (2024-03-17T21:04:48Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms give tight privacy estimates only under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces challenges when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Privacy for Free: How does Dataset Condensation Help Privacy? [21.418263507735684]
We identify dataset condensation (DC) as a better alternative to traditional data generators for private data generation.
We empirically validate the visual privacy and membership privacy of DC-synthesized data by launching both the loss-based and the state-of-the-art likelihood-based membership inference attacks.
arXiv Detail & Related papers (2022-06-01T05:39:57Z) - Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z) - Personalized PATE: Differential Privacy for Machine Learning with
Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z) - Personalization Improves Privacy-Accuracy Tradeoffs in Federated
Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z) - Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
arXiv Detail & Related papers (2021-07-19T06:00:34Z) - CaPC Learning: Confidential and Private Collaborative Learning [30.403853588224987]
We introduce Confidential and Private Collaborative (CaPC) learning, the first method provably achieving both confidentiality and privacy in a collaborative setting.
We demonstrate how CaPC allows participants to collaborate without having to explicitly join their training sets or train a central model.
arXiv Detail & Related papers (2021-02-09T23:50:24Z) - Differentially Private Synthetic Data: Applied Evaluations and
Enhancements [4.749807065324706]
Differentially private data synthesis protects personal details from exposure.
We evaluate four differentially private generative adversarial networks for data synthesis.
We propose QUAIL, an ensemble-based modeling approach to generating synthetic data.
arXiv Detail & Related papers (2020-11-11T04:03:08Z)