Toward Better Generalization Bounds with Locally Elastic Stability
- URL: http://arxiv.org/abs/2010.13988v2
- Date: Tue, 13 Jul 2021 17:29:29 GMT
- Title: Toward Better Generalization Bounds with Locally Elastic Stability
- Authors: Zhun Deng, Hangfeng He, Weijie J. Su
- Abstract summary: We argue that locally elastic stability implies tighter generalization bounds than those derived based on uniform stability.
We revisit the examples of bounded support vector machines, regularized least square regressions, and gradient descent.
- Score: 41.7030651617752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithmic stability is a key characteristic to ensure the generalization
ability of a learning algorithm. Among different notions of stability,
\emph{uniform stability} is arguably the most popular one, which yields
exponential generalization bounds. However, uniform stability only considers
the worst-case loss change (or so-called sensitivity) by removing a single data
point, which is distribution-independent and therefore undesirable. There are
many cases that the worst-case sensitivity of the loss is much larger than the
average sensitivity taken over the single data point that is removed,
especially in some advanced models such as random feature models or neural
networks. Many previous works try to mitigate the distribution independent
issue by proposing weaker notions of stability, however, they either only yield
polynomial bounds or the bounds derived do not vanish as sample size goes to
infinity. Given that, we propose \emph{locally elastic stability} as a weaker
and distribution-dependent stability notion, which still yields exponential
generalization bounds. We further demonstrate that locally elastic stability
implies tighter generalization bounds than those derived based on uniform
stability in many situations by revisiting the examples of bounded support
vector machines, regularized least square regressions, and stochastic gradient
descent.
Related papers
- High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Stability and Generalization for Markov Chain Stochastic Gradient
Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z) - Boosting the Confidence of Generalization for $L_2$-Stable Randomized
Learning Algorithms [41.082982732100696]
We show that a properly designed subbagging process leads to near-tight exponential generalization bounds over both data and algorithm.
We further derive generic results to improve high-probability generalization bounds for convex or non-tight problems with natural decaying learning rates.
arXiv Detail & Related papers (2022-06-08T12:14:01Z) - Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on
Least Squares [12.2950446921662]
Recent studies have shown that heavy tails can emerge in optimization and that the heaviness of the tails has links to the generalization error.
We establish novel links between the tail behavior and generalization properties of gradient descent (SGD) through the lens of algorithmic stability.
We support our theory with synthetic and real neural network experiments.
arXiv Detail & Related papers (2022-06-02T19:59:48Z) - Provably Auditing Ordinary Least Squares in Low Dimensions [17.655785504931913]
Most metrics measure stability of conclusions derived from Ordinary Least Squares linear regression.
Recent work proposes a simple, global, finite-sample stability metric: the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion.
We show that in the low-dimensional regime where the number of co variables is a constant but the number of samples is large, there are efficient algorithms for provably estimating this metric.
arXiv Detail & Related papers (2022-05-28T00:45:10Z) - Towards Understanding Generalization via Decomposing Excess Risk
Dynamics [13.4379473119565]
We analyze the generalization dynamics to derive algorithm-dependent bounds, e.g., stability.
Inspired by the observation that neural networks show a slow convergence rate when fitting noise, we propose decomposing the excess risk dynamics.
Under the decomposition framework, the new bound accords better with the theoretical and empirical evidence compared to the stability-based bound and uniform convergence bound.
arXiv Detail & Related papers (2021-06-11T03:42:45Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To our best knowledge, this gives the firstever-known stability and generalization for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.