Gradient Descent with Linearly Correlated Noise: Theory and Applications
to Differential Privacy
- URL: http://arxiv.org/abs/2302.01463v3
- Date: Mon, 15 Jan 2024 17:27:40 GMT
- Authors: Anastasia Koloskova, Ryan McKenna, Zachary Charles, Keith Rush,
Brendan McMahan
- Abstract summary: We study gradient descent under linearly correlated noise.
We use our results to develop new, effective matrix factorizations for differentially private optimization.
- Score: 17.81999485513265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study gradient descent under linearly correlated noise. Our work is
motivated by recent practical methods for optimization with differential
privacy (DP), such as DP-FTRL, which achieve strong performance in settings
where privacy amplification techniques are infeasible (such as in federated
learning). These methods inject privacy noise through a matrix factorization
mechanism, making the noise linearly correlated over iterations. We propose a
simplified setting that distills key facets of these methods and isolates the
impact of linearly correlated noise. We analyze the behavior of gradient
descent in this setting, for both convex and non-convex functions. Our analysis
is demonstrably tighter than prior work and recovers multiple important special
cases exactly (including anticorrelated perturbed gradient descent). We use our
results to develop new, effective matrix factorizations for differentially
private optimization, and highlight the benefits of these factorizations
theoretically and empirically.
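The setting the abstract describes can be illustrated with a minimal sketch: gradient descent where the noise injected at step t is a linear combination of i.i.d. Gaussian draws, determined by a factor matrix C. This is an illustrative simplification, not the paper's exact formulation; the function and variable names are invented for the example.

```python
import numpy as np

def gd_with_correlated_noise(grad, w0, lr, C, sigma, seed=0):
    """Run T steps of gradient descent where the step-t noise is the
    t-th row of C @ Xi, with Xi i.i.d. Gaussian -- so the noise is
    linearly correlated across iterations via the factor matrix C."""
    T = C.shape[0]
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((T, w0.size))  # independent base noise
    noise = sigma * (C @ xi)                # correlate over iterations
    w = w0.astype(float).copy()
    for t in range(T):
        w = w - lr * (grad(w) + noise[t])
    return w

T = 50
# C = I gives independent (DP-SGD-style) noise; the anticorrelated
# special case z_t = xi_t - xi_{t-1} corresponds to C = I minus a shift:
C_anti = np.eye(T) - np.eye(T, k=-1)
w = gd_with_correlated_noise(lambda w: w, np.ones(3), 0.1, C_anti, sigma=0.01)
```

Different choices of C correspond to different matrix factorization mechanisms; with sigma = 0 the sketch reduces to plain gradient descent.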
Related papers
- Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training [10.229653770070202]
Training generative models with differential privacy (DP) typically involves injecting noise into gradient updates or adapting the discriminator's training procedure.
We consider the slicing privacy mechanism that injects noise into random low-dimensional projections of the private data.
We present a kernel-based estimator for this divergence, circumventing the need for adversarial training.
arXiv Detail & Related papers (2024-10-25T19:32:58Z)
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates gradient normalization without clipping (NSGDC) and its variance reduction variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
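A minimal sketch of the core mechanism behind normalized SGD (omitting the clipping and variance-reduction machinery the paper analyzes): normalizing the gradient bounds every update by the step size, which is what keeps heavy-tailed gradient noise from producing unbounded steps. Names and setup are illustrative.

```python
import numpy as np

def normalized_sgd(grad, w0, lr, steps, eps=1e-12):
    """Normalized (S)GD: each update has Euclidean norm at most lr,
    no matter how large or heavy-tailed the gradient is."""
    w = w0.astype(float).copy()
    for _ in range(steps):
        g = grad(w)
        w = w - lr * g / (np.linalg.norm(g) + eps)
    return w

# On f(w) = 0.5 * ||w||^2 the iterate moves toward 0 by lr per step.
w = normalized_sgd(lambda w: w, np.array([10.0, -10.0]), lr=0.1, steps=200)
```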
arXiv Detail & Related papers (2024-10-21T22:40:42Z)
- Correlated Noise Provably Beats Independent Noise for Differentially Private Learning [25.81442865194914]
Differentially private learning algorithms inject noise into the learning process.
We show how correlated noise provably improves upon vanilla-SGD as a function of problem parameters.
arXiv Detail & Related papers (2023-10-10T16:48:18Z)
- Online Sensitivity Optimization in Differentially Private Learning [8.12606646175019]
We present a novel approach to dynamically optimize the clipping threshold.
We treat this threshold as an additional learnable parameter, establishing a clean relationship between the threshold and the cost function.
Our method is thoroughly assessed against alternative fixed and adaptive strategies across diverse datasets, tasks, model dimensions, and privacy levels.
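The summary's specific cost function is not reproduced here; the following is a hypothetical sketch of the general idea only: standard DP-SGD clipping, plus a simple signed update that treats the clipping threshold as a learnable quantity. The `update_threshold` rule is an invented stand-in, not the paper's method.

```python
import numpy as np

def clip_to_norm(g, c):
    """Standard DP-SGD clipping: rescale g so its norm is at most c."""
    n = np.linalg.norm(g)
    return g if n <= c else g * (c / n)

def update_threshold(c, grad_norms, lr=0.05):
    """Hypothetical signed update: raise c when most gradients are being
    clipped, lower it otherwise (a stand-in for a learned rule)."""
    frac_clipped = np.mean(np.asarray(grad_norms) > c)
    return c + lr * (1.0 if frac_clipped > 0.5 else -1.0)

c = update_threshold(1.0, [0.2, 0.4, 3.0])  # mostly unclipped, so c shrinks
```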
arXiv Detail & Related papers (2023-10-02T00:30:49Z)
- Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles [34.32021888691789]
We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
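The feature-bagging setup can be sketched as an ensemble of ridge regressors, each fit on a random subset of the features, with predictions averaged. This is an illustrative construction under assumed names and shapes, not the paper's exact experimental setup.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def feature_bagged_ridge(X, y, X_test, k, members, lam, seed=0):
    """Average the predictions of ridge regressors, each trained on a
    random subset of k features."""
    rng = np.random.default_rng(seed)
    preds = np.zeros(X_test.shape[0])
    for _ in range(members):
        idx = rng.choice(X.shape[1], size=k, replace=False)
        w = ridge_fit(X[:, idx], y, lam)
        preds += X_test[:, idx] @ w
    return preds / members

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(40)
p = feature_bagged_ridge(X, y, X[:5], k=4, members=20, lam=1.0)
```

Setting `members=1` and `k` equal to the full feature count recovers the single ridge predictor used as the comparison baseline.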
arXiv Detail & Related papers (2023-07-06T17:56:06Z)
- Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy.
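For reference, the standard DP-SGD step that this line of work builds on: clip each per-example gradient, average, then add Gaussian noise calibrated to the clipping norm. The mechanism is standard, but the function and argument names here are illustrative.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_mult, seed=0):
    """One DP-SGD step: clip each per-example gradient to clip_norm,
    average, and add Gaussian noise scaled to the clipping norm."""
    rng = np.random.default_rng(seed)
    B = len(per_example_grads)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    noise = rng.standard_normal(w.shape) * (noise_mult * clip_norm / B)
    return w - lr * (avg + noise)

w = np.zeros(2)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
w1 = dp_sgd_step(w, grads, lr=1.0, clip_norm=1.0, noise_mult=0.0)
```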
arXiv Detail & Related papers (2022-09-09T08:54:13Z)
- Partial Identification with Noisy Covariates: A Robust Optimization Approach [94.10051154390237]
Causal inference from observational datasets often relies on measuring and adjusting for covariates.
We show that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.
Across synthetic and real datasets, we find that this approach provides ATE bounds with a higher coverage probability than existing methods.
arXiv Detail & Related papers (2022-02-22T04:24:26Z)
- Adaptive Differentially Private Empirical Risk Minimization [95.04948014513226]
We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization.
We prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added.
arXiv Detail & Related papers (2021-10-14T15:02:20Z)
- An automatic differentiation system for the age of differential privacy [65.35244647521989]
We introduce Tritium, an automatic differentiation-based sensitivity analysis framework for differentially private (DP) machine learning (ML).
arXiv Detail & Related papers (2021-09-22T08:07:42Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Beyond variance reduction: Understanding the true impact of baselines on
policy optimization [24.09670734037029]
We show that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.
We present theoretical results showing that, at least for bandit problems, curvature and noise are not sufficient to explain the learning dynamics.
arXiv Detail & Related papers (2020-08-31T17:52:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.