Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter
- URL: http://arxiv.org/abs/2209.07403v6
- Date: Sat, 28 Sep 2024 01:14:29 GMT
- Title: Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter
- Authors: Andrew Lowy, Meisam Razaviyayn
- Abstract summary: We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data may be extremely large or infinite.
For smooth loss functions, we provide linear-time algorithms with state-of-the-art excess risk.
We also provide the first algorithm to handle non-smooth convex loss functions.
- Score: 14.040676498310198
- Abstract: We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data may be extremely large or infinite. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous (i.e. stochastic gradients are uniformly bounded) over data. While this assumption is convenient, it often leads to pessimistic risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data may be huge due to outliers and/or heavy-tailed data. In such cases, the risk bounds for DP SO, which scale with the worst-case Lipschitz parameter, are vacuous. To address these limitations, we provide improved risk bounds that do not depend on the uniform Lipschitz parameter. Following a recent line of work [WXDX20, KLZ22], we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our risk bounds scale with the $k$-th moment instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For smooth convex loss functions, we provide linear-time algorithms with state-of-the-art excess risk. We complement our excess risk upper bounds with novel lower bounds. In certain parameter regimes, our linear-time excess risk bounds are minimax optimal. Second, we provide the first algorithm to handle non-smooth convex loss functions. To do so, we develop novel algorithmic and stability-based proof techniques, which we believe will be useful for future work in obtaining optimal excess risk. Finally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
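To make the setting concrete, the sketch below shows the generic clipped-and-noised gradient step that underlies many DP stochastic optimization methods for heavy-tailed data. It is a minimal illustration, not the paper's algorithm: the function name, clipping threshold, and noise calibration are assumptions made for the example. The point it conveys is that clipping lets the Gaussian noise be calibrated to a chosen threshold (e.g., one derived from a $k$-th moment bound) rather than to the worst-case Lipschitz parameter over all data.

```python
# Minimal sketch of a clipped, noised gradient step (assumed names/values; not
# the paper's exact algorithm). Clipping each per-sample gradient to l2 norm at
# most C bounds the sensitivity of the averaged gradient by 2*C/batch_size, so
# the Gaussian noise can be calibrated to C instead of a worst-case Lipschitz bound.
import numpy as np

def noisy_clipped_sgd_step(w, per_sample_grads, clip_threshold, noise_multiplier, lr, rng):
    """One DP-SGD-style update on the parameter vector w.

    per_sample_grads: (batch_size, dim) array, one gradient per example.
    clip_threshold:   C, the l2 norm bound enforced on each per-sample gradient.
    noise_multiplier: sigma, set by the (epsilon, delta) privacy budget.
    """
    batch_size, dim = per_sample_grads.shape
    # Rescale any gradient whose l2 norm exceeds C down to norm exactly C.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_threshold / np.maximum(norms, 1e-12))
    # Average the clipped gradients and add Gaussian noise scaled to the sensitivity.
    mean_grad = clipped.mean(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_threshold / batch_size, size=dim)
    return w - lr * (mean_grad + noise)

# Example: one step on synthetic heavy-tailed (Student-t) per-sample gradients.
rng = np.random.default_rng(0)
w = np.zeros(5)
grads = rng.standard_t(df=3, size=(64, 5))
w = noisy_clipped_sgd_step(w, grads, clip_threshold=2.0, noise_multiplier=1.0, lr=0.1, rng=rng)
```

The paper's actual linear-time, non-smooth, and Proximal-PL algorithms differ in how gradients are estimated and how the threshold and noise are set; the sketch only shows the shared clipping-plus-noise mechanism.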
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Novel Quadratic Constraints for Extending LipSDP beyond Slope-Restricted
Activations [52.031701581294804]
Lipschitz bounds for neural networks can be computed via semidefinite programming (LipSDP) with accuracy and efficiency guarantees.
Our paper bridges the gap and extends LipSDP beyond slope-restricted activation functions.
Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds.
arXiv Detail & Related papers (2024-01-25T09:23:31Z) - Certified Robust Models with Slack Control and Large Lipschitz Constants [102.69689641398227]
We propose a Calibrated Lipschitz-Margin Loss (CLL) that addresses two problems.
Firstly, commonly used margin losses do not adjust the penalties to the shrinking output distribution.
Secondly, minimization of the Lipschitz constant $K$ can lead to overly smooth decision functions.
Our CLL addresses these issues by explicitly calibrating the loss w.r.t. margin and Lipschitz constant.
arXiv Detail & Related papers (2023-09-12T12:23:49Z) - Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for
Full-Batch GD [31.80268332522017]
We provide sharp path-dependent generalization and excess risk guarantees for full-batch Gradient Descent (GD) on smooth (possibly non-Lipschitz) losses.
Our full-batch generalization error and excess risk bounds are significantly tighter than the existing bounds for GD when the loss is smooth (but possibly non-Lipschitz).
arXiv Detail & Related papers (2022-04-26T17:05:57Z) - Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural
Networks [77.82638674792292]
Lipschitz constants of neural networks allow for guarantees of robustness in image classification, safety in controller design, and generalizability beyond the training data.
As calculating Lipschitz constants is NP-hard, techniques for estimating Lipschitz constants must navigate the trade-off between scalability and accuracy.
In this work, we significantly push the scalability frontier of a semidefinite programming technique known as LipSDP while achieving zero accuracy loss.
arXiv Detail & Related papers (2022-04-02T11:57:52Z) - Private Non-Convex Federated Learning Without a Trusted Server [7.971065005161566]
We propose novel algorithms for cross-silo federated learning (FL) with non-convex loss functions and data from people who do not trust the server or other silos.
Our algorithms attain the optimal strongly convex, homogeneous (i.i.d.) rates for ISRL-DP FL without assuming convexity or i.i.d. data.
Numerical experiments show that our algorithm has better accuracy than baselines for most privacy levels.
arXiv Detail & Related papers (2022-03-13T19:17:15Z) - Improved Rates for Differentially Private Stochastic Convex Optimization
with Heavy-Tailed Data [13.465471169118974]
We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy.
We prove nearly-matching lower bounds under the constraint of pure differential privacy, giving strong evidence that our bounds are tight.
arXiv Detail & Related papers (2021-06-02T17:45:47Z) - Stability and Deviation Optimal Risk Bounds with Convergence Rate
$O(1/n)$ [4.1499725848998965]
We show a high probability excess risk bound with rate $O(\log n/n)$ for strongly convex and Lipschitz losses, valid for any empirical risk minimization method.
We discuss how $O(\log n/n)$ high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses, without the usual smoothness assumption.
arXiv Detail & Related papers (2021-03-22T17:28:40Z) - Non-Euclidean Differentially Private Stochastic Convex Optimization [15.302167005107135]
We show that noisy stochastic gradient descent (SGD) algorithms attain the optimal excess risk in low-dimensional regimes.
Our work draws upon concepts from the geometry of normed spaces, such as the notions of regularity, uniform convexity, and uniform smoothness.
arXiv Detail & Related papers (2021-03-01T19:48:44Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)