Optimal Differentially Private Model Training with Public Data
- URL: http://arxiv.org/abs/2306.15056v2
- Date: Wed, 14 Feb 2024 04:36:16 GMT
- Title: Optimal Differentially Private Model Training with Public Data
- Authors: Andrew Lowy, Zeman Li, Tianjian Huang, Meisam Razaviyayn
- Abstract summary: Differential privacy (DP) ensures that training a machine learning model does not leak private data.
In practice, we may have access to auxiliary public data that is free of privacy concerns.
- Score: 14.382649652412322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differential privacy (DP) ensures that training a machine learning model does
not leak private data. In practice, we may have access to auxiliary public data
that is free of privacy concerns. In this work, we assume access to a given
amount of public data and settle the following fundamental open questions: 1.
What is the optimal (worst-case) error of a DP model trained over a private
data set while having access to side public data? 2. How can we harness public
data to improve DP model training in practice? We consider these questions in
both the local and central models of pure and approximate DP. To answer the
first question, we prove tight (up to log factors) lower and upper bounds that
characterize the optimal error rates of three fundamental problems: mean
estimation, empirical risk minimization, and stochastic convex optimization. We
show that the optimal error rates can be attained (up to log factors) by either
discarding private data and training a public model, or treating public data
like it is private and using an optimal DP algorithm. To address the second
question, we develop novel algorithms that are "even more optimal" (i.e. better
constants) than the asymptotically optimal approaches described above. For
local DP mean estimation, our algorithm is optimal including constants.
Empirically, our algorithms show benefits over the state-of-the-art.
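The two asymptotically optimal baselines described above (discard the private data, or treat the public data as private) can be illustrated with a toy mean-estimation sketch. All parameter choices below are hypothetical, and the Gaussian-mechanism calibration shown is the standard textbook bound, not the paper's improved-constant algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: estimate the mean of data bounded in [0, 1] under (eps, delta)-DP.
private_data = rng.uniform(0, 1, size=1000)
public_data = rng.uniform(0, 1, size=100)
epsilon, delta = 1.0, 1e-5

# Baseline 1: discard the private data and use the public mean
# (no noise needed, since public data carries no privacy constraint).
public_only_estimate = public_data.mean()

# Baseline 2: treat the public data as if it were private and release a DP mean
# over the pooled data with the Gaussian mechanism (textbook calibration).
pooled = np.concatenate([private_data, public_data])
sensitivity = 1.0 / len(pooled)  # changing one record moves the mean by <= 1/n
sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
dp_estimate = pooled.mean() + rng.normal(0, sigma)
```

Which baseline wins depends on the relative sizes of the public and private datasets and the privacy budget; the paper characterizes exactly when each is (near-)optimal.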
Related papers
- Private Fine-tuning of Large Language Models with Zeroth-order
Optimization [54.24600476755372]
We introduce DP-ZO, a new method for fine-tuning large language models that preserves the privacy of training data by privatizing zeroth-order optimization.
We show that DP-ZO exhibits just 1.86% performance degradation due to privacy at $(1, 10^{-5})$-DP when fine-tuning OPT-66B on 1000 training samples from SQuAD.
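The idea of privatizing zeroth-order optimization can be sketched as follows: because a two-point finite-difference step only releases a scalar per example, it suffices to clip and noise that scalar. This is a minimal illustration under assumed parameters (`mu`, `clip`, `sigma`, the toy squared loss), not the authors' exact DP-ZO algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(w, x, y):
    # Squared loss on one example (a stand-in for a language-model loss).
    return float((w @ x - y) ** 2)

def dp_zo_step(w, batch, lr=0.1, mu=1e-3, clip=1.0, sigma=0.5):
    # Shared random perturbation direction for the whole batch.
    z = rng.standard_normal(w.shape)
    # Per-example two-point finite-difference estimates of the directional
    # derivative; each scalar is clipped, so the sum has bounded sensitivity.
    diffs = [
        np.clip((loss(w + mu * z, x, y) - loss(w - mu * z, x, y)) / (2 * mu),
                -clip, clip)
        for x, y in batch
    ]
    # Privatize the clipped scalar sum with Gaussian noise, then step along z.
    noisy_scalar = (sum(diffs) + rng.normal(0, sigma * clip)) / len(batch)
    return w - lr * noisy_scalar * z

w = np.zeros(3)
batch = [(rng.standard_normal(3), 1.0) for _ in range(8)]
w_next = dp_zo_step(w, batch)
```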
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Optimal Locally Private Nonparametric Classification with Public Data [2.631955426232593]
We investigate the problem of public data assisted non-interactive Local Differentially Private (LDP) learning with a focus on non-parametric classification.
Under the posterior drift assumption, we derive the minimax optimal convergence rate under the LDP constraint.
We present a novel approach, the locally differentially private classification tree, which attains the minimax optimal convergence rate.
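The standard primitive underlying LDP classification is randomized response: each user perturbs their own label locally before sending it. The sketch below shows this classic primitive (not the paper's tree algorithm) for a hypothetical 3-class problem:

```python
import numpy as np

rng = np.random.default_rng(2)

def randomized_response(label, num_classes, epsilon):
    # Report the true label with probability e^eps / (e^eps + k - 1),
    # otherwise a uniformly random *other* label.
    p_true = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < p_true:
        return label
    others = [c for c in range(num_classes) if c != label]
    return int(rng.choice(others))

reports = [randomized_response(1, num_classes=3, epsilon=2.0) for _ in range(1000)]
frac_true = sum(r == 1 for r in reports) / len(reports)
```

Each report individually satisfies epsilon-LDP, and the aggregator can debias the reported frequencies to recover the true label distribution.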
arXiv Detail & Related papers (2023-11-19T16:35:01Z) - DPGOMI: Differentially Private Data Publishing with Gaussian Optimized
Model Inversion [8.204115285718437]
We propose Differentially Private Data Publishing with Gaussian Optimized Model Inversion (DPGOMI) to address this issue.
Our approach involves mapping private data to the latent space using a public generator, followed by a lower-dimensional DP-GAN with better convergence properties.
Our results show that DPGOMI outperforms the standard DP-GAN method in terms of Inception Score, Fréchet Inception Distance, and classification performance.
arXiv Detail & Related papers (2023-10-06T18:46:22Z) - Why Is Public Pretraining Necessary for Private Model Training? [50.054565310457306]
We show that pretraining on publicly available data leads to distinct gains over nonprivate settings.
We argue that the benefit may stem from a loss landscape that requires the algorithm to go through two distinct phases.
Guided by this intuition, we provide theoretical constructions that provably demonstrate the separation between private training with and without public pretraining.
arXiv Detail & Related papers (2023-02-19T05:32:20Z) - Packing Privacy Budget Efficiently [10.51351125953885]
Differential privacy (DP) provides a rigorous way to bound that leakage under a given budget.
This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data.
We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency.
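Treating the DP budget as a scarce resource makes scheduling a knapsack-style problem: each job consumes some epsilon, and the scheduler maximizes utility subject to the budget. The sketch below uses a simple greedy heuristic with made-up job names and numbers, not the paper's privacy-knapsack algorithm:

```python
# Each training job requests some epsilon from a shared, per-dataset DP budget.
jobs = [
    {"name": "model_a", "eps": 0.5, "utility": 10.0},
    {"name": "model_b", "eps": 1.0, "utility": 12.0},
    {"name": "model_c", "eps": 0.2, "utility": 5.0},
]
budget = 1.0

# Greedy heuristic: schedule jobs by utility per unit of privacy budget spent.
scheduled, spent = [], 0.0
for job in sorted(jobs, key=lambda j: j["utility"] / j["eps"], reverse=True):
    if spent + job["eps"] <= budget:
        scheduled.append(job["name"])
        spent += job["eps"]
```

Here `model_b` is skipped because admitting it would exceed the budget, even though it has the highest raw utility.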
arXiv Detail & Related papers (2022-12-26T17:25:02Z) - DP$^2$-VAE: Differentially Private Pre-trained Variational Autoencoders [26.658723213776632]
We propose DP$^2$-VAE, a training mechanism for variational autoencoders (VAE) with provable DP guarantees and improved utility via pre-training on private data.
We conduct extensive experiments on image datasets to illustrate our superiority over baselines under various privacy budgets and evaluation metrics.
arXiv Detail & Related papers (2022-08-05T23:57:34Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
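The intuition can be sketched numerically: in DP-SGD, the worst-case guarantee assumes every per-example gradient saturates the clipping norm, but examples with small gradients are perturbed by relatively more noise. This is only a heuristic illustration with synthetic numbers, not the paper's per-example accountant:

```python
import numpy as np

clip = 1.0
sigma = 1.0  # DP-SGD noise multiplier: noise std is sigma * clip

# Synthetic per-example gradient norms observed during one DP-SGD step.
grad_norms = np.array([0.1, 0.4, 0.9, 2.5, 5.0])
clipped = np.minimum(grad_norms, clip)

# Heuristic: an example whose clipped gradient is much smaller than the
# clipping norm is drowned in relatively more noise, so its per-example
# privacy guarantee is stronger than the worst-case bound.
effective_sigma = sigma * clip / clipped
```

Only the examples whose gradients reach the clipping norm (here the last two) experience the worst-case effective noise multiplier.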
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Public Data-Assisted Mirror Descent for Private Model Training [23.717811604829148]
We revisit the problem of using public data to improve the privacy/utility tradeoffs for differentially private (DP) model training.
We show that our algorithm not only significantly improves traditional DP-SGD and DP-FedAvg, but also improves over DP-SGD and DP-FedAvg on models that have been pre-trained with the public data.
arXiv Detail & Related papers (2021-12-01T00:21:40Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - User-Level Privacy-Preserving Federated Learning: Analysis and
Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
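Adding artificial noise to a model update before upload typically takes the form of clip-then-noise. The sketch below shows that generic pattern under assumed parameters (`clip`, `sigma`), not the paper's exact UDP noise calibration:

```python
import numpy as np

rng = np.random.default_rng(3)

def privatize_update(update, clip=1.0, sigma=0.8):
    # Clip the client's update to a bounded L2 norm, then add Gaussian noise
    # before uploading, so the server never sees the raw update.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / norm)
    return clipped + rng.normal(0, sigma * clip, size=update.shape)

local_update = rng.standard_normal(4) * 3.0
noisy_update = privatize_update(local_update)
```

Because the noise is added on the client, the guarantee holds even against a curious server that inspects every uploaded model.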
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.