What You See is What You Get: Distributional Generalization for
Algorithm Design in Deep Learning
- URL: http://arxiv.org/abs/2204.03230v1
- Date: Thu, 7 Apr 2022 05:41:40 GMT
- Title: What You See is What You Get: Distributional Generalization for
Algorithm Design in Deep Learning
- Authors: Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok,
Preetum Nakkiran
- Abstract summary: We investigate and leverage a connection between Differential Privacy (DP) and the notion of Distributional Generalization (DG).
We introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate and leverage a connection between Differential Privacy (DP)
and the recently proposed notion of Distributional Generalization (DG).
Applying this connection, we introduce new conceptual tools for designing
deep-learning methods that bypass "pathologies" of standard stochastic gradient
descent (SGD). First, we prove that differentially private methods satisfy a
"What You See Is What You Get (WYSIWYG)" generalization guarantee: whatever a
model does on its train data is almost exactly what it will do at test time.
This guarantee is formally captured by distributional generalization. WYSIWYG
enables principled algorithm design in deep learning by reducing
$\textit{generalization}$ concerns to $\textit{optimization}$ ones: in order to
mitigate unwanted behavior at test time, it is provably sufficient to mitigate
this behavior on the train data. This is notably false for standard (non-DP)
methods, hence this observation has applications even when privacy is not
required. For example, importance sampling is known to fail for standard SGD,
but we show that it has exactly the intended effect for DP-trained models.
Thus, with DP-SGD, unlike with SGD, we can influence test-time behavior by
making principled train-time interventions. We use these insights to construct
simple algorithms which match or outperform SOTA in several distributional
robustness applications, and to significantly improve the privacy vs. disparate
impact trade-off of DP-SGD. Finally, we also improve on known theoretical
bounds relating differential privacy, stability, and distributional
generalization.
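To make the train-time intervention concrete, the following is a minimal sketch of a DP-SGD step with per-example importance weights on a toy logistic-regression model. It illustrates the idea described above and is not the authors' exact recipe; the function name, hyperparameters, and the simplistic noise calibration are assumptions made for this example.

```python
import numpy as np

def dp_sgd_weighted_step(theta, X, y, weights, clip_norm=1.0,
                         noise_multiplier=1.0, lr=0.1, rng=None):
    """One DP-SGD step for logistic regression with per-example
    importance weights (illustrative sketch only).

    Each per-example gradient is clipped to `clip_norm`, scaled by its
    importance weight, summed, and perturbed with Gaussian noise
    calibrated to the clipping norm.
    """
    rng = rng or np.random.default_rng(0)
    total = np.zeros_like(theta)
    for xi, yi, wi in zip(X, y, weights):
        p = 1.0 / (1.0 + np.exp(-xi @ theta))        # sigmoid prediction
        g = (p - yi) * xi                            # per-example gradient
        g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip
        total += wi * g                              # train-time reweighting
    # NOTE: weights > 1 enlarge the sensitivity; a rigorous analysis
    # would fold max(weights) into the noise scale.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    return theta - lr * (total + noise) / len(X)
```

Under the WYSIWYG guarantee, upweighting a subgroup in this loop shifts the model's test-time behavior on that subgroup roughly as it shifts its train-time behavior, which is the lever the paper exploits.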
Related papers
- Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent
Stochastic Gradient Descent (SGD) is a widely used tool in machine learning.
Uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors.
We propose a novel block bootstrap for SGD under local differential privacy.
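For intuition, a plain (non-private) block bootstrap over scalar summaries of SGD iterates could look like the sketch below; the cited paper adapts this idea to local DP, which the sketch deliberately omits, and all names and defaults are illustrative.

```python
import numpy as np

def block_bootstrap_ci(iterates, block_len=50, n_boot=1000,
                       alpha=0.05, rng=None):
    """Block-bootstrap confidence interval for the mean of a sequence
    of scalar summaries of SGD iterates (plain, non-private version).

    Contiguous blocks are resampled to respect the dependence between
    successive SGD iterates.
    """
    rng = rng or np.random.default_rng(0)
    iterates = np.asarray(iterates, dtype=float)
    n_blocks = len(iterates) // block_len            # assume enough iterates
    blocks = iterates[: n_blocks * block_len].reshape(n_blocks, block_len)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_blocks, size=n_blocks)  # resample blocks
        boot_means[b] = blocks[idx].mean()
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```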
arXiv Detail & Related papers (2024-05-21T07:47:21Z)
- How Private are DP-SGD Implementations?
We show that there can be a substantial gap between the privacy analyses under the two types of batch sampling (shuffling versus Poisson subsampling).
arXiv Detail & Related papers (2024-03-26T13:02:43Z)
- Differentially-Private Bayes Consistency
We construct a Bayes consistent learning rule that satisfies differential privacy (DP).
We prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal sample complexity.
arXiv Detail & Related papers (2022-12-08T11:57:30Z)
- Differentially Private Learning with Per-Sample Adaptive Clipping
We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple mainstream vision and language tasks.
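Schematically, such methods replace the hard per-sample clipping factor min(1, C/||g||) with a smoother weight. The adaptive weight below is a hypothetical stand-in with the right qualitative shape, not DP-PSAC's exact non-monotonic function.

```python
import numpy as np

def hard_clip_weight(g_norm, C=1.0):
    """Standard DP-SGD per-sample scaling: min(1, C / ||g||)."""
    return min(1.0, C / (g_norm + 1e-12))

def adaptive_weight(g_norm, C=1.0, r=0.1):
    """Hypothetical smooth per-sample weight in the spirit of adaptive
    clipping (DP-PSAC's exact non-monotonic function differs): small
    gradients are rescaled less abruptly than with hard clipping."""
    return C / (g_norm + r)
```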
arXiv Detail & Related papers (2022-12-01T07:26:49Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
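The distinction between the two update rules fits in a few lines; the regularizer r below is the extra knob that normalization introduces, and the parameter values are illustrative.

```python
import numpy as np

def per_sample_scale(g, C=1.0, mode="clip", r=0.01):
    """Per-sample gradient processing before noise addition:
    'clip'      (DP-SGD):  g * min(1, C / ||g||)
    'normalize' (DP-NSGD): g * C / (||g|| + r)
    Sketch only; batching and noise calibration are omitted."""
    norm = np.linalg.norm(g)
    if mode == "clip":
        return g * min(1.0, C / (norm + 1e-12))
    return g * (C / (norm + r))
```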
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Fast and Memory Efficient Differentially Private-SGD via JL Projections
DP-SGD is the only known algorithm for private training of large scale neural networks.
We present a new framework for designing differentially private optimizers, called DP-SGD-JL and DP-Adam-JL.
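The underlying trick is that per-sample gradient norms can be estimated from a few Johnson-Lindenstrauss (JL) random projections. The sketch below shows only the estimator on an explicit gradient; DP-SGD-JL's contribution is obtaining such projections without materializing per-sample gradients at all.

```python
import numpy as np

def jl_norm_estimate(g, k=16, rng=None):
    """Estimate ||g|| from k Gaussian random projections.

    With P having i.i.d. N(0, 1/k) entries, E[||P g||^2] = ||g||^2,
    so ||P g|| concentrates around ||g|| for moderate k.
    Illustrative sketch of the JL estimator only.
    """
    rng = rng or np.random.default_rng(0)
    P = rng.normal(0.0, 1.0, size=(k, g.shape[0])) / np.sqrt(k)
    return float(np.linalg.norm(P @ g))
```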
arXiv Detail & Related papers (2021-02-05T06:02:10Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
On two popular deep learning tasks, we demonstrate the empirical advantages of our algorithms over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows a better convergence bound than error feedback for non-convex problems.
We also propose DEFA to accelerate the generalization of DEF, which enjoys better generalization bounds than DEF.
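As background, a generic worker-side error-feedback step with random-k sparsification looks roughly like the sketch below; DEF changes how the residual is detached and fed back, which this plain-EF sketch does not capture.

```python
import numpy as np

def ef_randomk_step(grad, error, k, rng=None):
    """One worker-side error-feedback (EF) step with random-k
    sparsification (generic EF, not the paper's DEF variant).

    Only k coordinates are transmitted; the untransmitted residual is
    stored in `error` and added back on the next step.
    """
    rng = rng or np.random.default_rng(0)
    corrected = grad + error                          # re-inject residual
    idx = rng.choice(corrected.size, size=k, replace=False)
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                      # transmit k coords
    return sparse, corrected - sparse                 # update, new residual
```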
arXiv Detail & Related papers (2020-04-11T03:50:59Z)