Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?
- URL: http://arxiv.org/abs/2507.11228v1
- Date: Tue, 15 Jul 2025 11:58:42 GMT
- Title: Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?
- Authors: Si Yi Meng, Baptiste Goujaud, Antonio Orvieto, Christopher De Sa,
- Abstract summary: This paper explores whether restricting the data to have equal magnitude is a sufficient condition for global convergence. We prove this is true in a one-dimensional space, but in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets.
- Score: 40.358326886432664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit a cycling behaviour even when the step size is still below the stability threshold $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data to have equal magnitude is a sufficient condition for global convergence, under any step size below the stability threshold. We prove that this is true in a one-dimensional space, but in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as finding sufficient conditions to guarantee global convergence with large step sizes.
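As a concrete illustration of the setting above, here is a minimal sketch (not the authors' code) of gradient descent on a small non-separable logistic regression problem with every input normalized to the unit sphere, run with a constant step size just below the stability threshold $2/\lambda$. The dataset, dimension, step sizes, and iteration budgets are illustrative assumptions.

```python
# Minimal sketch: GD on logistic regression with data on the unit sphere,
# using a constant step size just below 2/lambda, where lambda is the largest
# eigenvalue of the Hessian at the minimizer. All constants are illustrative.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)
n, d = 40, 2
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # put every sample on the sphere
y = rng.choice([-1.0, 1.0], size=n)             # random labels: non-separable in general

def loss(w):
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    return -(X.T @ (y * expit(-y * (X @ w)))) / n

def hessian(w):
    s = expit(y * (X @ w))
    return (X.T * (s * (1.0 - s))) @ X / n      # sum_i sigma'(margin_i) x_i x_i^T / n

# Locate the minimizer with many conservative steps, then read off lambda there.
w_star = np.zeros(d)
for _ in range(100_000):
    w_star -= 0.5 * grad(w_star)
lam = np.linalg.eigvalsh(hessian(w_star)).max()

# Re-run GD from zero with a large constant step size just below 2/lambda.
eta = 1.9 / lam
w = np.zeros(d)
for _ in range(5_000):
    w -= eta * grad(w)

print(f"lambda = {lam:.4f}, step size = {eta:.2f}")
print(f"loss at w*: {loss(w_star):.6f}, loss after large-step GD: {loss(w):.6f}")
```

The paper proves that such runs must converge in one dimension, while specially constructed higher-dimensional datasets on the sphere can still cycle, so the outcome of a run like this depends on the particular data.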
Related papers
- Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression [9.15420898792103]
We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. We show that convergence can be accelerated to $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ iterations by simply using a large stepsize (a toy numerical sketch appears after this list).
arXiv Detail & Related papers (2025-06-03T00:21:22Z) - Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods [13.677904140815386]
Stochastic Gradient Descent (SGD) is widely used in machine learning research. This paper introduces a novel analytical method for convergence analysis of SGD under more relaxed step-size conditions.
arXiv Detail & Related papers (2025-04-17T02:56:20Z) - Riemannian Federated Learning via Averaging Gradient Stream [8.75592575216789]
This paper develops and analyzes an efficient Riemannian Federated Averaging Gradient Stream (RFedAGS) algorithm.
Numerical simulations conducted on synthetic and real-world data demonstrate the performance of the proposed RFedAGS.
arXiv Detail & Related papers (2024-09-11T12:28:42Z) - Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes [38.595892152591595]
We study gradient descent dynamics on logistic regression problems with large, constant step sizes.
Local convergence is guaranteed for all step sizes less than the critical step size, but global convergence is not.
arXiv Detail & Related papers (2024-06-07T15:53:06Z) - On the Convergence of Gradient Descent for Large Learning Rates [55.33626480243135]
We show that convergence is impossible when a large, fixed step size is used. We provide a proof of this in the case of linear neural networks with a squared loss. We also prove the impossibility of convergence for more general losses without requiring strong assumptions such as Lipschitz continuity of the gradient.
arXiv Detail & Related papers (2024-02-20T16:01:42Z) - Implicit Manifold Gaussian Process Regression [49.0787777751317]
Gaussian process regression is widely used to provide well-calibrated uncertainty estimates.
It can struggle with high-dimensional data, even when the data actually lies on an implicit low-dimensional manifold.
In this paper we propose a technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way.
arXiv Detail & Related papers (2023-10-30T09:52:48Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for the step-size policies considered.
arXiv Detail & Related papers (2020-06-19T14:11:26Z)
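For the first related paper above, here is a toy sketch (illustrative assumptions only, not that paper's experiments) of $\ell_2$-regularized logistic regression on linearly separable data, comparing the classical step size $1/L$ with a much larger one under the same iteration budget. The constants `n`, `d`, `lam_reg`, the factor 20, and the budget are arbitrary choices.

```python
# Toy comparison: l2-regularized logistic regression on separable data,
# GD with the classical step size 1/L versus a much larger step size.
# All constants below are illustrative assumptions.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)
n, d, lam_reg = 50, 5, 1e-3
w_teacher = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_teacher)                       # linearly separable by construction

def loss(w):
    return np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * lam_reg * w @ w

def grad(w):
    return -(X.T @ (y * expit(-y * (X @ w)))) / n + lam_reg * w

# Smoothness bound of the objective and the resulting condition number kappa.
L = 0.25 * np.linalg.eigvalsh(X.T @ X / n).max() + lam_reg
kappa = L / lam_reg

def run_gd(eta, iters=3_000):
    w = np.zeros(d)
    for _ in range(iters):
        w -= eta * grad(w)
    return loss(w)

print(f"kappa ~ {kappa:.0f}")
print(f"loss with eta = 1/L : {run_gd(1.0 / L):.6f}")
print(f"loss with eta = 20/L: {run_gd(20.0 / L):.6f}")
```

That paper's analysis suggests the larger step size should make markedly faster progress within the same budget; the exact numbers from a run like this depend on the random instance.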