Public Data-Assisted Mirror Descent for Private Model Training
- URL: http://arxiv.org/abs/2112.00193v1
- Date: Wed, 1 Dec 2021 00:21:40 GMT
- Title: Public Data-Assisted Mirror Descent for Private Model Training
- Authors: Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang
Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta
- Abstract summary: We revisit the problem of using public data to improve the privacy/utility tradeoffs for differentially private (DP) model training.
We show that our algorithm not only significantly improves over traditional DP-SGD and DP-FedAvg, but also improves over DP-SGD and DP-FedAvg applied to models that have been pre-trained with the public data.
- Score: 23.717811604829148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the problem of using public data to improve the privacy/utility
trade-offs for differentially private (DP) model training. Here, public data
refers to auxiliary data sets that have no privacy concerns. We consider public
data that is from the same distribution as the private training data.
For convex losses, we show that a variant of Mirror Descent provides
population risk guarantees which are independent of the dimension of the model
($p$). Specifically, we apply Mirror Descent with the loss generated by the
public data as the mirror map, and using DP gradients of the loss generated by
the private (sensitive) data. To obtain dimension independence, we require
$G_Q^2 \leq p$ public data samples, where $G_Q$ is a measure of the isotropy of
the loss function. We further show that our algorithm has a natural ``noise
stability'' property: If around the current iterate the public loss satisfies
$\alpha_v$-strong convexity in a direction $v$, then using noisy gradients
instead of the exact gradients shifts our next iterate in the direction $v$ by
an amount proportional to $1/\alpha_v$ (in contrast with DP-SGD, where the
shift is isotropic). Analogous results in prior works had to explicitly learn
the geometry using the public data in the form of preconditioner matrices. Our
method is also applicable to non-convex losses, as it does not rely on
convexity assumptions to ensure DP guarantees.
We demonstrate the empirical efficacy of our algorithm by showing
privacy/utility trade-offs on linear regression, deep learning benchmark
datasets (WikiText-2, CIFAR-10, and EMNIST), and in federated learning
(StackOverflow). We show that our algorithm not only significantly improves
over traditional DP-SGD and DP-FedAvg, which do not have access to public data,
but also improves over DP-SGD and DP-FedAvg on models that have been
pre-trained with the public data to begin with.
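To make the core update concrete, here is a minimal NumPy sketch of the method for linear regression, under the additional simplifying assumption that the public loss is quadratic so the mirror (Bregman) step has a closed form as a gradient step preconditioned by the public-loss Hessian. All names and hyperparameters below are illustrative rather than the authors' implementation, and calibrating `noise_mult` to a target $(\epsilon, \delta)$ is left to a standard DP accountant.

```python
import numpy as np

def pda_mirror_descent(X_priv, y_priv, X_pub, steps=200, lr=0.1,
                       clip=1.0, noise_mult=1.0, seed=0):
    """Sketch: DP mirror descent with a quadratic public loss as the mirror map."""
    rng = np.random.default_rng(seed)
    n_priv, p = X_priv.shape
    # For a quadratic public loss, only its Hessian A matters for the mirror step.
    A = X_pub.T @ X_pub / X_pub.shape[0] + 1e-6 * np.eye(p)
    A_inv = np.linalg.inv(A)
    w = np.zeros(p)
    for _ in range(steps):
        # Per-example gradients of the private squared loss, clipped to bound sensitivity.
        residuals = X_priv @ w - y_priv
        grads = residuals[:, None] * X_priv
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        # Gaussian-mechanism noise on the summed clipped gradient.
        g = (grads.sum(axis=0) + rng.normal(0.0, noise_mult * clip, size=p)) / n_priv
        # Mirror step: grad Phi(w_next) = grad Phi(w) - lr * g  =>  w_next = w - lr * A^{-1} g.
        w = w - lr * (A_inv @ g)
    return w
```

In this quadratic special case the noise-stability property described above is visible directly: noise injected along a direction $v$ in which the public loss is $\alpha_v$-strongly convex is scaled down by roughly $1/\alpha_v$ through the $A^{-1}$ factor, rather than perturbing all directions equally as in DP-SGD.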
Related papers
- Probing the Transition to Dataset-Level Privacy in ML Models Using an
Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a differential privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - Optimal Differentially Private Model Training with Public Data [13.16576244790641]
Differential privacy (DP) ensures that training a machine learning model does not leak private data.
In practice, we may have access to auxiliary public data that is free of privacy concerns.
arXiv Detail & Related papers (2023-06-26T20:40:29Z) - Private Ad Modeling with DP-SGD [58.670969449674395]
A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descent (DP-SGD).
In this work we apply DP-SGD to several ad modeling tasks including predicting click-through rates, conversion rates, and number of conversion events.
Our work is the first to empirically demonstrate that DP-SGD can provide both privacy and utility for ad modeling tasks.
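As a reminder of the mechanism applied in these papers, below is a minimal NumPy sketch of a single DP-SGD step (per-example clipping followed by calibrated Gaussian noise); the function name and hyperparameters are illustrative and not taken from this paper.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.05, clip=1.0, noise_mult=1.0, rng=None):
    # Clip each per-example gradient to norm <= clip, sum, add Gaussian noise
    # scaled to the clipping bound, and take an averaged gradient step.
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * noisy_sum / per_example_grads.shape[0]
```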
arXiv Detail & Related papers (2022-11-21T22:51:16Z) - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis
Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
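A hedged reconstruction of the flavor of this bound via Fano's inequality, assuming a uniform prior over the $M$ candidate values and treating any two candidates as neighboring datasets so the $\epsilon$-DP likelihood-ratio bound applies (this is an illustration, not necessarily the paper's exact argument):

$$I(X;Y) \le \max_x D\big(P_{Y\mid X=x}\,\big\|\,P_Y\big) \le \epsilon \ \text{(nats)}, \qquad \text{since } \frac{P_{Y\mid X=x}(y)}{P_Y(y)} \le e^{\epsilon},$$
$$\Pr[\hat{X} \ne X] \;\ge\; 1 - \frac{I(X;Y) + \log 2}{\log M} \;\ge\; 1 - \frac{\epsilon + \log 2}{\log M},$$

so the adversary's reconstruction error stays bounded away from zero unless $\epsilon = \Omega(\log M)$.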
arXiv Detail & Related papers (2022-10-24T23:50:12Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
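A small sketch of the difference between the two per-example transforms, assuming the normalized variant takes the form $g/(\lVert g\rVert + r)$ with a small regularizer $r$ (an assumed form for illustration; the exact variant is specified in the paper):

```python
import numpy as np

def per_example_transform(grads, clip=1.0, reg=1e-2, normalize=False):
    # DP-SGD clips:    g * min(1, clip / ||g||)   (clip threshold must be tuned)
    # DP-NSGD rescales: g / (||g|| + reg)         (assumed form; keeps each
    #                                               per-example contribution's norm below 1)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    if normalize:
        return grads / (norms + reg)
    return grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
```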
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
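A hypothetical sketch of how such per-example guarantees can be computed, assuming each step is treated as a Gaussian mechanism whose sensitivity for a given example is its clipped gradient norm, with Renyi-DP composition at a fixed order and no subsampling amplification (so the estimate is conservative); this illustrates the idea and is not the paper's accountant.

```python
import numpy as np

def individual_epsilon(per_step_grad_norms, clip=1.0, noise_mult=1.0,
                       delta=1e-5, alpha=8.0):
    # Per-step Gaussian-mechanism RDP at order alpha with example-specific
    # sensitivity min(||g_t||, clip) and noise std noise_mult * clip, summed
    # over steps and converted to an (epsilon, delta) guarantee for that example.
    s = np.minimum(np.asarray(per_step_grad_norms, dtype=float), clip)
    rdp = np.sum(alpha * s**2 / (2.0 * (noise_mult * clip) ** 2))
    return rdp + np.log(1.0 / delta) / (alpha - 1.0)
```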
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Pre-trained Perceptual Features Improve Differentially Private Image
Generation [8.659595986100738]
Training even moderately-sized generative models with differentially private stochastic gradient descent (DP-SGD) is difficult.
We advocate building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation.
Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models.
arXiv Detail & Related papers (2022-05-25T16:46:01Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Differentially Private Federated Learning via Inexact ADMM with Multiple
Local Updates [0.0]
We develop a DP inexact alternating direction method of multipliers algorithm with multiple local updates for federated learning.
We show that our algorithm provides $\bar{\epsilon}$-DP for every iteration, where $\bar{\epsilon}$ is a privacy budget controlled by the user.
We demonstrate that our algorithm reduces the testing error by at most 31% compared with the existing DP algorithm, while achieving the same level of data privacy.
arXiv Detail & Related papers (2022-02-18T19:58:47Z) - Differentially Private Federated Learning via Inexact ADMM [0.0]
Differential privacy (DP) techniques can be applied to the federated learning model to protect data privacy against inference attacks.
We develop a DP inexact alternating direction method of multipliers algorithm that solves a sequence of trust-region subproblems.
Our algorithm reduces the testing error by at most 22% compared with the existing DP algorithm, while achieving the same level of data privacy.
arXiv Detail & Related papers (2021-06-11T02:28:07Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization with tighter generalization bounds.
We demonstrate, on two popular deep learning tasks, the empirical advantages of our methods over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.