Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent
- URL: http://arxiv.org/abs/2006.08157v1
- Date: Mon, 15 Jun 2020 06:30:19 GMT
- Title: Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent
- Authors: Yunwen Lei and Yiming Ying
- Abstract summary: We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
- Score: 55.85456985750134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently there has been a considerable amount of work devoted to the study of
the algorithmic stability and generalization for stochastic gradient descent (SGD).
However, the existing stability analysis requires imposing restrictive
assumptions on the boundedness of gradients and the strong smoothness and convexity of
loss functions. In this paper, we provide a fine-grained analysis of stability
and generalization for SGD by substantially relaxing these assumptions.
Firstly, we establish stability and generalization for SGD by removing the
existing bounded gradient assumptions. The key idea is the introduction of a
new stability measure called on-average model stability, for which we develop
novel bounds controlled by the risks of SGD iterates. This yields
generalization bounds depending on the behavior of the best model, and leads to
the first-ever-known fast bounds in the low-noise setting using stability
approach. Secondly, the smoothness assumption is relaxed by considering loss
functions with Hölder continuous (sub)gradients for which we show that optimal
bounds are still achieved by balancing computation and stability. To our best
knowledge, this gives the first-ever-known stability and generalization bounds
for SGD with even non-differentiable loss functions. Finally, we study learning
problems with (strongly) convex objectives but non-convex loss functions.
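For orientation, here is a minimal sketch of the central quantities (the notation is ours and is not taken verbatim from the paper): given a training set $S=\{z_1,\dots,z_n\}$ and perturbed sets $S^{(i)}$ obtained by replacing the $i$-th example with an independent copy, SGD produces iterates $w_{t+1} = w_t - \eta_t \partial f(w_t; z_{j_t})$ on $S$ and $w_{t+1}^{(i)}$ on $S^{(i)}$. An $\ell_2$ on-average model stability bound of the form
$$\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big[\|w_T - w_T^{(i)}\|_2^2\big] \le \epsilon^2$$
is then translated into a bound on the generalization gap $\mathbb{E}[F(w_T) - F_S(w_T)]$, where $F$ and $F_S$ denote the population and empirical risks. The relaxed smoothness condition refers to loss functions whose (sub)gradients are $(\alpha, L)$-Hölder continuous, i.e., $\|\partial f(w;z) - \partial f(w';z)\|_2 \le L\|w - w'\|_2^{\alpha}$ with $\alpha \in [0,1]$, which covers non-differentiable Lipschitz losses at $\alpha = 0$.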
Related papers
- A Precise Characterization of SGD Stability Using Loss Surface Geometry [8.942671556572073]
Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical success but relatively limited theoretical understanding.
Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates.
arXiv Detail & Related papers (2024-01-22T19:46:30Z) - Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms [61.59448949684493]
We provide the stability and generalization analysis of stochastic compositional gradient descent algorithms built from training examples.
We establish the uniform stability results for two popular compositional gradient descent algorithms, namely SCGD and SCSC.
We derive excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors.
arXiv Detail & Related papers (2023-07-07T02:40:09Z) - Exact Mean Square Linear Stability Analysis for SGD [28.65663421598186]
We provide an explicit condition on the step size that is both necessary and sufficient for the linear stability of stochastic gradient descent (SGD).
We show that SGD's stability threshold is equivalent to that of a mixture process which takes in each iteration a full-batch gradient step w.p. $1-p$ and a single-sample gradient step w.p. $p$ (a small simulation sketch of this mixture process appears after this list).
arXiv Detail & Related papers (2023-06-13T15:29:23Z) - The Implicit Regularization of Dynamical Stability in Stochastic
Gradient Descent [32.25490196411385]
We study the implicit regularization of stochastic gradient descent (SGD) through the lens of dynamical stability.
We analyze the generalization properties of two-layer ReLU networks and diagonal linear networks.
arXiv Detail & Related papers (2023-05-27T14:54:21Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Stability and Generalization for Markov Chain Stochastic Gradient
Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Stability of SGD: Tightness Analysis and Improved Bounds [8.831597193643628]
Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that generalize well in practice.
This paper addresses the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kind of loss functions and data can the analysis be improved?
arXiv Detail & Related papers (2021-02-10T05:43:27Z) - Toward Better Generalization Bounds with Locally Elastic Stability [41.7030651617752]
We argue that locally elastic stability implies tighter generalization bounds than those derived based on uniform stability.
We revisit the examples of bounded support vector machines, regularized least squares regression, and stochastic gradient descent.
arXiv Detail & Related papers (2020-10-27T02:04:53Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)
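As referenced above, the following is a minimal simulation sketch of the mixture process mentioned in the mean-square linear stability entry. The quadratic per-sample losses, the parameter values (step size, batch size, iteration count, mixing probability $p$), and helper names such as `minibatch_grad` are all illustrative assumptions rather than details taken from that paper; the paper's result concerns the exact stability threshold, which this toy comparison does not compute.

```python
import numpy as np

# Toy simulation of mini-batch SGD versus the full-batch/single-sample mixture
# process described above, on quadratic per-sample losses f_i(w) = 0.5*(a_i @ w)^2
# whose common minimizer is w* = 0. All parameter values are placeholders.

rng = np.random.default_rng(0)
n, d, B = 64, 5, 8                     # number of samples, dimension, mini-batch size
A = rng.standard_normal((n, d))        # rows a_i define the per-sample losses
eta, p, T, reps = 0.05, 0.3, 1000, 50  # step size, mixing prob., iterations, repetitions

def full_grad(w):
    return A.T @ (A @ w) / n           # gradient of the empirical risk

def single_grad(w):
    i = rng.integers(n)                # gradient of one randomly drawn sample
    return A[i] * (A[i] @ w)

def minibatch_grad(w):
    idx = rng.choice(n, size=B, replace=False)
    return A[idx].T @ (A[idx] @ w) / B

def mixture_grad(w):
    # full-batch step w.p. 1 - p, single-sample step w.p. p
    return single_grad(w) if rng.random() < p else full_grad(w)

def mean_square_norm(step_fn):
    """Average ||w_T||^2 over repetitions, started near the minimizer."""
    total = 0.0
    for _ in range(reps):
        w = 1e-3 * rng.standard_normal(d)
        for _ in range(T):
            w = w - eta * step_fn(w)
        total += float(w @ w)
    return total / reps

print("mini-batch SGD  E||w_T||^2:", mean_square_norm(minibatch_grad))
print("mixture process E||w_T||^2:", mean_square_norm(mixture_grad))
```

Sweeping the step size $\eta$ upward until the averaged squared norm stops shrinking gives a crude empirical read on where each process loses mean-square stability near the minimizer.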