Gradient descent follows the regularization path for general losses
- URL: http://arxiv.org/abs/2006.11226v1
- Date: Fri, 19 Jun 2020 17:01:25 GMT
- Title: Gradient descent follows the regularization path for general losses
- Authors: Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky
- Abstract summary: We show that for empirical risk minimization over linear predictors with arbitrary convex losses, the gradient-descent path and the algorithm-independent regularization path converge to the same direction.
We provide a justification for the widely-used exponentially-tailed losses.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work across many machine learning disciplines has highlighted that
standard descent methods, even without explicit regularization, do not merely
minimize the training error, but also exhibit an implicit bias. This bias is
typically towards a certain regularized solution, and relies upon the details
of the learning process, for instance the use of the cross-entropy loss.
In this work, we show that for empirical risk minimization over linear
predictors with arbitrary convex, strictly decreasing losses, if the risk does
not attain its infimum, then the gradient-descent path and the
algorithm-independent regularization path converge to the same direction
(whenever either converges to a direction). Using this result, we provide a
justification for the widely-used exponentially-tailed losses (such as the
exponential loss or the logistic loss): while this convergence to a direction
for exponentially-tailed losses is necessarily to the maximum-margin direction,
other losses such as polynomially-tailed losses may induce convergence to a
direction with a poor margin.
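The abstract's headline phenomenon can be illustrated numerically (this is a minimal sketch, not the paper's experiments): gradient descent on the logistic loss over a toy separable dataset whose symmetry makes the max-margin direction exactly (1, 1)/√2. The dataset, step size, and iteration count below are illustrative choices.

```python
import numpy as np

# Toy linearly separable dataset.  By symmetry, the max-margin
# direction is exactly (1, 1) / sqrt(2); data and step size are
# illustrative choices, not taken from the paper.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def grad_logistic_risk(w):
    """Gradient of the empirical logistic risk mean(log(1 + exp(-y <w, x>)))."""
    margins = y * (X @ w)
    return ((-y / (1.0 + np.exp(margins)))[:, None] * X).mean(axis=0)

w = np.zeros(2)
for _ in range(5000):
    w -= 1.0 * grad_logistic_risk(w)   # constant step size

direction = w / np.linalg.norm(w)
max_margin = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(direction, float(direction @ max_margin))
```

The risk here has no minimizer, so ‖w‖ diverges, but the direction w/‖w‖ stabilizes at the max-margin direction, matching the abstract's claim for exponentially-tailed losses such as the logistic loss.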
Related papers
- Refined Risk Bounds for Unbounded Losses via Transductive Priors [58.967816314671296]
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression.
Our key tools are based on the exponential weights algorithm with carefully chosen transductive priors.
arXiv Detail & Related papers (2024-10-29T00:01:04Z)
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting k of the loss functions, with k arbitrary and unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z)
- Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z)
- General Loss Functions Lead to (Approximate) Interpolation in High Dimensions [6.738946307589741]
We provide a unified framework to approximately characterize the implicit bias of gradient descent in closed form.
Specifically, we show that in high dimensions the implicit bias is approximated by (but not exactly equal to) the minimum-norm solution.
Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings.
arXiv Detail & Related papers (2023-03-13T21:23:12Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD [31.80268332522017]
We provide sharp path-dependent generalization and excess-risk guarantees for full-batch Gradient Descent (GD) on smooth (possibly non-Lipschitz) losses.
Our full-batch generalization-error and excess-risk bounds are significantly tighter than the existing bounds for GD when the loss is smooth but possibly non-Lipschitz.
arXiv Detail & Related papers (2022-04-26T17:05:57Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Implicit Regularization in ReLU Networks with the Square Loss [56.70360094597169]
We show that it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters.
Our results suggest that a more general framework may be needed to understand implicit regularization for nonlinear predictors.
arXiv Detail & Related papers (2020-12-09T16:48:03Z)
- On the generalization of Bayesian deep nets for multi-class classification [27.39403411896995]
We propose a new generalization bound for Bayesian deep nets by exploiting the contractivity of the Log-Sobolev inequalities.
Using these inequalities adds an additional loss-gradient norm term to the generalization bound, which is intuitively a surrogate of the model complexity.
arXiv Detail & Related papers (2020-02-23T09:05:03Z)
- The Implicit Bias of Gradient Descent on Separable Data [44.98410310356165]
We show the predictor converges to the direction of the max-margin (hard margin SVM) solution.
This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero.
arXiv Detail & Related papers (2017-10-27T21:47:58Z)
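Several of the entries above (approximate interpolation, benign overfitting) concern gradient methods implicitly selecting minimum-norm solutions. A minimal sketch of the classical squared-loss special case, with illustrative values chosen here rather than taken from any of the papers: full-batch gradient descent started at zero in an overparameterized linear regression converges to the minimum-norm interpolant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                       # overparameterized: d > n
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Full-batch gradient descent on the squared loss, started at zero.
w = np.zeros(d)
lr = 0.01
for _ in range(50_000):
    w -= lr * X.T @ (X @ w - y) / n

# GD from zero never leaves the row span of X, so it converges to
# the minimum-norm interpolant, given here by the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y
print(np.linalg.norm(w - w_min_norm))
```

Because the iterates stay in the row span of X, the limit interpolates the data with the smallest Euclidean norm, which is the baseline implicit bias that the approximate-interpolation entry above generalizes to other losses.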
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.