Implicit Bias of Gradient Descent for Logistic Regression at the Edge of
Stability
- URL: http://arxiv.org/abs/2305.11788v2
- Date: Sun, 15 Oct 2023 17:53:26 GMT
- Title: Implicit Bias of Gradient Descent for Logistic Regression at the Edge of
Stability
- Authors: Jingfeng Wu, Vladimir Braverman, Jason D. Lee
- Abstract summary: In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
- Score: 69.01076284478151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has observed that in machine learning optimization, gradient
descent (GD) often operates at the edge of stability (EoS) [Cohen, et al.,
2021], where the stepsizes are set to be large, resulting in non-monotonic
losses induced by the GD iterates. This paper studies the convergence and
implicit bias of constant-stepsize GD for logistic regression on linearly
separable data in the EoS regime. Despite the presence of local oscillations,
we prove that the logistic loss can be minimized by GD with \emph{any} constant
stepsize over a long time scale. Furthermore, we prove that with \emph{any}
constant stepsize, the GD iterates tend to infinity when projected to a
max-margin direction (the hard-margin SVM direction) and converge to a fixed
vector that minimizes a strongly convex potential when projected to the
orthogonal complement of the max-margin direction. In contrast, we also show
that in the EoS regime, GD iterates may diverge catastrophically under the
exponential loss, highlighting the superiority of the logistic loss. These
theoretical findings are in line with numerical simulations and complement
existing theories on the convergence and implicit bias of GD for logistic
regression, which are only applicable when the stepsizes are sufficiently
small.
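The abstract's two claims, that the logistic loss is minimized over a long horizon despite non-monotone oscillations under any large constant stepsize, while the exponential loss can diverge catastrophically, can be illustrated numerically. A minimal sketch on a hypothetical two-point separable dataset (the data, stepsize, and iteration counts are our own choices, not from the paper):

```python
import numpy as np

# Two separable points, both labeled +1; the max-margin direction is e1,
# and the second coordinate carries the oscillating orthogonal component.
X = np.array([[1.0, 4.0],
              [1.0, -4.0]])
y = np.array([1.0, 1.0])

def run_gd(loss_grad, loss_val, eta, steps, w0):
    w = w0.copy()
    losses = [loss_val(w)]
    for _ in range(steps):
        w = w - eta * loss_grad(w)
        losses.append(loss_val(w))
    return w, np.array(losses)

# Logistic loss, written with logaddexp to avoid overflow.
def log_loss(w):
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def log_grad(w):
    m = y * (X @ w)
    s = -np.exp(-np.logaddexp(0.0, m))       # -sigmoid(-m), overflow-safe
    return ((s * y)[:, None] * X).mean(axis=0)

# Exponential loss.
def exp_loss(w):
    return np.mean(np.exp(-y * (X @ w)))

def exp_grad(w):
    e = np.exp(-y * (X @ w))
    return ((-e * y)[:, None] * X).mean(axis=0)

eta = 2.0                                    # far above 2/smoothness: EoS regime
w0 = np.array([0.0, 1.0])

w_log, L_log = run_gd(log_grad, log_loss, eta, 5000, w0)
with np.errstate(over="ignore", invalid="ignore"):
    w_exp, L_exp = run_gd(exp_grad, exp_loss, eta, 50, w0)

print("logistic: loss non-monotone:", any(b > a for a, b in zip(L_log, L_log[1:])),
      " final loss:", f"{L_log[-1]:.2e}", " margin-dir coord w1:", round(w_log[0], 1))
print("exponential: final loss finite:", bool(np.isfinite(L_exp[-1])))
```

With this stepsize the logistic loss overshoots on early iterations yet is driven to near zero while the margin-direction coordinate grows without bound, whereas the same GD recursion under the exponential loss blows up within a few steps, consistent with the contrast drawn in the abstract.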
Related papers
- Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression [28.3662709740417]
In logistic regression, gradient descent (GD) diverges in norm while converging in direction to the maximum $\ell$-margin solution.
This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression.
arXiv Detail & Related papers (2025-02-18T21:04:06Z)
- Any-stepsize Gradient Descent for Separable Data under Fenchel--Young Losses [17.835960292396255]
We show arbitrary-stepsize gradient descent convergence for a general class of loss functions based on the framework of \emph{Fenchel--Young losses}.
We argue that these better rates are possible because of the \emph{separation margin} of the loss functions, rather than the self-bounding property.
arXiv Detail & Related papers (2025-02-07T12:52:12Z)
- Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise [20.922456964393213]
We establish generalization bounds for SGD with momentum (SGDm) under heavy-tailed noise.
For quadratic loss functions, we show that SGDm admits a worse generalization bound in the presence of momentum and heavy tails.
We develop a uniform-in-time discretization error bound, which to our knowledge, is the first result of its kind for SDEs with degenerate noise.
arXiv Detail & Related papers (2025-02-02T19:25:48Z)
- Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
- Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent [4.511923587827301]
Gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior.
In this study, we incorporate a chaotic component into GD in a controlled manner and introduce multiscale perturbed GD (MPGD).
MPGD is a novel optimization framework where the GD recursion is augmented with chaotic perturbations that evolve via an independent dynamical system.
arXiv Detail & Related papers (2022-05-23T14:47:55Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression [122.70478935214128]
Stochastic gradient descent (SGD) has been demonstrated to generalize well in many deep learning applications.
This paper provides problem-dependent analysis on the last iterate risk bounds of SGD with decaying stepsize.
arXiv Detail & Related papers (2021-10-12T17:49:54Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large-eigenvalue directions of the data matrix, whereas GD follows the small-eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
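The norm-divergence and direction-convergence behavior cited in the first related paper (and underlying the max-margin implicit-bias result of the main abstract) can be checked numerically in the classical small-stepsize regime. A minimal sketch with a hypothetical two-point dataset whose hard-margin SVM direction can be found by hand:

```python
import numpy as np

# Two points, both labeled +1.  The hard-margin SVM direction through the
# origin equalizes the two margins: w ∝ (2, -1), giving margin 3/sqrt(5)
# on each point.
X = np.array([[2.0, 1.0],
              [1.0, -1.0]])
y = np.array([1.0, 1.0])
w_svm = np.array([2.0, -1.0]) / np.sqrt(5.0)  # unit max-margin direction

def grad(w):
    m = y * (X @ w)
    s = -np.exp(-np.logaddexp(0.0, m))        # -sigmoid(-m), overflow-safe
    return ((s * y)[:, None] * X).mean(axis=0)

eta = 0.5                                     # small stepsize: classical regime
w = np.zeros(2)
norms, cosines = [], []
for t in range(50_000):
    w = w - eta * grad(w)
    if t in (100, 5_000, 49_999):
        norms.append(np.linalg.norm(w))
        cosines.append(w @ w_svm / np.linalg.norm(w))

print("norm of w at t=100, 5000, 50000:", [round(n, 2) for n in norms])
print("cosine to max-margin direction:", [round(c, 3) for c in cosines])
```

The iterate norm keeps growing (it diverges only logarithmically, so direction convergence is slow), while the cosine to the max-margin direction climbs toward 1, matching the known small-stepsize implicit bias.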
This list is automatically generated from the titles and abstracts of the papers in this site.