Beyond Uniform Lipschitz Condition in Differentially Private
Optimization
- URL: http://arxiv.org/abs/2206.10713v2
- Date: Tue, 6 Jun 2023 01:26:35 GMT
- Title: Beyond Uniform Lipschitz Condition in Differentially Private
Optimization
- Authors: Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi
- Abstract summary: Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded.
We provide principled guidance on choosing the clip norm in DP-SGD for convex over-parameterized settings satisfying our general version of Lipschitzness when the per-sample Lipschitz constants are bounded.
- Score: 28.04430869054056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most prior results on differentially private stochastic gradient descent
(DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness,
i.e., the per-sample gradients are uniformly bounded. We generalize uniform
Lipschitzness by assuming that the per-sample gradients have sample-dependent
upper bounds, i.e., per-sample Lipschitz constants, which themselves may be
unbounded. We provide principled guidance on choosing the clip norm in DP-SGD
for convex over-parameterized settings satisfying our general version of
Lipschitzness when the per-sample Lipschitz constants are bounded;
specifically, we recommend tuning the clip norm only up to the
minimum per-sample Lipschitz constant. This finds application in the private
training of a softmax layer on top of a deep network pre-trained on public
data. We verify the efficacy of our recommendation via experiments on 8
datasets. Furthermore, we provide new convergence results for DP-SGD on convex
and nonconvex functions when the Lipschitz constants are unbounded but have
bounded moments, i.e., they are heavy-tailed.
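For intuition, here is a minimal NumPy sketch (not the authors' code) of DP-SGD with per-sample clipping for a softmax layer trained on frozen features from a pre-trained network, the setting described above. The synthetic data, noise multiplier, step size, and the sqrt(2)*||x_i|| Lipschitz estimate are illustrative assumptions; only the rule of keeping the clip norm at or below the minimum per-sample Lipschitz constant is taken from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 64, 10                 # samples, feature dimension, classes
X = rng.normal(size=(n, d))            # frozen features from a public pre-trained model
y = rng.integers(0, k, size=n)         # private labels
W = np.zeros((d, k))                   # softmax-layer weights

# For softmax cross-entropy on fixed features, the per-sample gradient norm is
# at most sqrt(2) * ||x_i||, so L_i = sqrt(2) * ||x_i|| is a per-sample
# Lipschitz constant. Per the paper's guidance, the clip norm is tuned only up
# to min_i L_i; here we simply fix it at that value.
L = np.sqrt(2.0) * np.linalg.norm(X, axis=1)
C = L.min()

sigma, lr, T, batch = 1.0, 0.1, 200, 100   # noise multiplier, step size, steps, batch size
for _ in range(T):
    idx = rng.choice(n, size=batch, replace=False)
    g_sum = np.zeros_like(W)
    for i in idx:
        logits = X[i] @ W
        p = np.exp(logits - logits.max())
        p /= p.sum()
        p[y[i]] -= 1.0
        g = np.outer(X[i], p)                            # per-sample gradient, shape (d, k)
        g *= min(1.0, C / (np.linalg.norm(g) + 1e-12))   # clip to Frobenius norm C
        g_sum += g
    noise = sigma * C * rng.normal(size=W.shape)         # Gaussian noise calibrated to sensitivity C
    W -= lr * (g_sum + noise) / batch                    # noisy clipped-gradient step
```
In practice the clip norm would be tuned over values up to C rather than fixed; the sketch picks the upper end of the recommended range for brevity.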
Related papers
- Global Well-posedness and Convergence Analysis of Score-based Generative Models via Sharp Lipschitz Estimates [1.3124513975412255]
We establish global well-posedness and convergence of score-based generative models (SGMs).
For the smooth case, we start from a Lipschitz bound of the score function with optimal time length.
The optimality is validated by an example whose score Lipschitz constant is bounded initially but blows up in finite time.
arXiv Detail & Related papers (2024-05-25T07:31:24Z) - Novel Quadratic Constraints for Extending LipSDP beyond Slope-Restricted
Activations [52.031701581294804]
Lipschitz upper bounds for neural networks can be computed with polynomial time guarantees via semidefinite programming (LipSDP), but existing formulations require slope-restricted activations.
Our paper bridges this gap and extends LipSDP beyond slope-restricted activation functions.
Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds.
arXiv Detail & Related papers (2024-01-25T09:23:31Z) - DP-SGD Without Clipping: The Lipschitz Neural Network Way [5.922390405022253]
State-of-the-art approaches for training Differentially Private (DP) Deep Neural Networks (DNN) rely on per-sample gradient clipping to bound the sensitivity of each update.
By bounding the Lipschitz constant of each layer with respect to its parameters, we prove that we can train these networks with privacy guarantees.
Our analysis not only allows the computation of the aforementioned sensitivities at scale, but also provides guidance on how to maximize the gradient-to-noise ratio for fixed privacy guarantees.
arXiv Detail & Related papers (2023-05-25T16:05:46Z) - Revisiting Gradient Clipping: Stochastic bias and tight convergence
guarantees [37.40957596986653]
We give convergence guarantees that show precise dependence on arbitrary clipping thresholds $c$.
In particular, we show that for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence.
We give matching upper and lower bounds for convergence of the gradient norm when running clipped SGD.
arXiv Detail & Related papers (2023-05-02T16:42:23Z) - Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural
Networks [77.82638674792292]
Lipschitz constants of neural networks allow for guarantees of robustness in image classification, safety in controller design, and generalizability beyond the training data.
As calculating Lipschitz constants is NP-hard, techniques for estimating Lipschitz constants must navigate the trade-off between scalability and accuracy.
In this work, we significantly push the scalability frontier of a semidefinite programming technique known as LipSDP while achieving zero accuracy loss.
arXiv Detail & Related papers (2022-04-02T11:57:52Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability, with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Tight Second-Order Certificates for Randomized Smoothing [106.06908242424481]
We show that there also exists a universal curvature-like bound for Gaussian random smoothing.
In addition to proving the correctness of this novel certificate, we show that SoS certificates are realizable and therefore tight.
arXiv Detail & Related papers (2020-10-20T18:03:45Z) - On Lipschitz Regularization of Convolutional Layers using Toeplitz
Matrix Theory [77.18089185140767]
Lipschitz regularity is established as a key property of modern deep learning.
Computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard.
We introduce a new upper bound for convolutional layers that is both tight and easy to compute (a standard product-of-spectral-norms baseline is sketched after this list).
arXiv Detail & Related papers (2020-06-15T13:23:34Z) - Exactly Computing the Local Lipschitz Constant of ReLU Networks [98.43114280459271]
The local Lipschitz constant of a neural network is a useful metric for robustness, generalization, and fairness evaluation.
We show strong inapproximability results for estimating Lipschitz constants of ReLU networks, and give an algorithm that computes the local Lipschitz constant exactly.
We leverage this algorithm to evaluate the tightness of competing Lipschitz estimators and the effects of regularized training on the Lipschitz constant.
arXiv Detail & Related papers (2020-03-02T22:15:54Z) - Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for
Heterogeneous Distributed Datasets [14.945821529085896]
We study distributed optimization algorithms for minimizing the average of heterogeneous functions with a focus on communication efficiency.
We propose a novel adaptive sampling of machines specially catered to these settings.
We show that the proposed sampling improves the dependence of the convergence rate from the maximum Lipschitz constant to the average Lipschitz constant across machines.
arXiv Detail & Related papers (2020-02-20T01:55:52Z)
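Several of the papers listed above concern upper-bounding a network's Lipschitz constant, whose exact computation is NP-hard. For reference, below is a minimal NumPy sketch of the classical baseline, the product of the layers' spectral norms, valid for feed-forward networks with 1-Lipschitz activations (e.g., ReLU); the weight shapes are illustrative and this is not the bound proposed in any of the papers above.
```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative weight matrices of a 3-layer feed-forward network.
weights = [rng.normal(size=(64, 32)), rng.normal(size=(32, 16)), rng.normal(size=(16, 10))]

# With 1-Lipschitz activations, the network's Lipschitz constant is
# upper-bounded by the product of the layers' spectral norms.
lip_upper = 1.0
for W_layer in weights:
    lip_upper *= np.linalg.norm(W_layer, 2)   # spectral norm = largest singular value
print(f"Product-of-spectral-norms upper bound: {lip_upper:.2f}")
```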