Exponential Convergence of (Stochastic) Gradient Descent for Separable Logistic Regression
- URL: http://arxiv.org/abs/2602.18946v2
- Date: Fri, 27 Feb 2026 06:28:51 GMT
- Title: Exponential Convergence of (Stochastic) Gradient Descent for Separable Logistic Regression
- Authors: Sacchit Kale, Piyushi Manupriya, Pierre Marion, Francis Bach, Anant Raj
- Abstract summary: We show that gradient descent with a simple, non-adaptive increasing step-size schedule achieves exponential convergence for separable logistic regression under a margin condition. We also establish exponential convergence of stochastic gradient descent using a lightweight adaptive step-size rule that avoids line search and specialized procedures.
- Score: 14.718691362208622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient descent and stochastic gradient descent are central to modern machine learning, yet their behavior under large step sizes remains theoretically unclear. Recent work suggests that acceleration often arises near the edge of stability, where optimization trajectories become unstable and difficult to analyze. Existing results for separable logistic regression achieve faster convergence by explicitly leveraging such unstable regimes through constant or adaptive large step sizes. In this paper, we show that instability is not inherent to acceleration. We prove that gradient descent with a simple, non-adaptive increasing step-size schedule achieves exponential convergence for separable logistic regression under a margin condition, while remaining entirely within a stable optimization regime. The resulting method is anytime and does not require prior knowledge of the optimization horizon or target accuracy. We also establish exponential convergence of stochastic gradient descent using a lightweight adaptive step-size rule that avoids line search and specialized procedures, improving upon existing polynomial-rate guarantees. Together, our results demonstrate that carefully structured step-size growth alone suffices to obtain exponential acceleration for both gradient descent and stochastic gradient descent.
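The contrast the abstract draws between constant and increasing step sizes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-point dataset, the helper names (`logistic_loss`, `logistic_grad`, `run_gd`), and in particular the geometric schedule `1.1 ** t`, which is a hypothetical stand-in and not the schedule analyzed in the paper. The sketch makes no claim about stability or about the paper's rates; it only compares the final loss of the two runs on one toy problem.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss log(1 + exp(-y * <x, w>)), computed stably."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss with respect to w."""
    margins = y * (X @ w)
    # sigmoid(-m) written as exp(-log(1 + exp(m))) to avoid overflow
    weights = np.exp(-np.logaddexp(0.0, margins))
    return -(X * (y * weights)[:, None]).mean(axis=0)

def run_gd(X, y, step_sizes):
    """Plain gradient descent from w = 0 with one step size per iteration."""
    w = np.zeros(X.shape[1])
    for eta in step_sizes:
        w = w - eta * logistic_grad(w, X, y)
    return w

# Toy linearly separable data (assumed for illustration only).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

T = 200
constant_steps = np.full(T, 1.0)
# Hypothetical geometric growth; NOT the schedule from the paper.
increasing_steps = 1.0 * 1.1 ** np.arange(T)

loss_const = logistic_loss(run_gd(X, y, constant_steps), X, y)
loss_inc = logistic_loss(run_gd(X, y, increasing_steps), X, y)

print(f"constant step size:   final loss = {loss_const:.3e}")
print(f"increasing step size: final loss = {loss_inc:.3e}")
```

On this toy problem every gradient step moves `w` in a direction that increases all margins, so both runs decrease the loss monotonically, and the run with growing step sizes ends at a smaller loss. Other datasets or growth factors may behave differently.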
Related papers
- Provably Convergent Decentralized Optimization over Directed Graphs under Generalized Smoothness [1.5892054128426507]
The Hessian norm is allowed to grow linearly with the gradient norm, thereby accommodating rapidly varying gradients beyond Lipschitz smoothness. We integrate gradient-tracking techniques with gradient clipping to ensure accurate convergence over directed communication graphs. Our results remain valid even when the gradient dissimilarity is unbounded, making the proposed framework more applicable to realistic heterogeneous data environments.
arXiv Detail & Related papers (2026-01-07T04:25:33Z) - Stability and convergence analysis of AdaGrad for non-convex optimization via novel stopping time-based techniques [17.34603953600226]
Adaptive gradients (AdaGrad) have emerged as powerful tools in deep learning. We provide a comprehensive analysis of AdaGrad and bridge the existing gaps in the literature.
arXiv Detail & Related papers (2024-09-08T08:29:51Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide the applications for fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration under interpolation conditions. Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
arXiv Detail & Related papers (2024-03-06T22:24:05Z) - On the Convergence of Gradient Descent for Large Learning Rates [55.33626480243135]
We show that convergence is impossible when a fixed step size is used. We provide a proof of this in the case of linear neural networks with a squared loss. We also prove the impossibility of convergence for more general losses without requiring strong assumptions such as Lipschitz continuity for the gradient.
arXiv Detail & Related papers (2024-02-20T16:01:42Z) - Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing [1.6114012813668932]
We introduce a simple framework to define non-differentiable functions piecewisely and present a systematic approach to obtain smoothings.
Our main contribution is a novel variant of SGD, Diagonalisation Gradient Descent, which progressively enhances the accuracy of the smoothed approximation.
Our approach is simple, fast, stable and attains orders of magnitude reduction in work-normalised variance.
arXiv Detail & Related papers (2024-02-19T00:43:22Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
Gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in time up to an error.
Rather than fixing the minibatch and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.