Related papers: Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

URL: http://arxiv.org/abs/2512.13123v3
Date: Mon, 22 Dec 2025 08:50:20 GMT
Title: Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences
Authors: Liviu Aolaritei, Michael I. Jordan,
Abstract summary: We study stopping rules for gradient descent (SGD) for convex optimization.<n>We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD.<n>These are the first rigorous, time-uniform performance guarantees and finitetime $varepsilon$-optimality certificates.
Score: 51.56484100374058
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergence guarantees in expectation or at a fixed horizon, but offer no statistically valid way to assess, at an arbitrary time, how close the current iterate is to the optimum. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD, constructed via nonnegative supermartingales and requiring no smoothness or strong convexity. This confidence sequence yields a simple stopping rule that is provably $\varepsilon$-optimal with probability at least $1-α$, with explicit bounds on the stopping time under standard stochastic approximation stepsizes. To the best of our knowledge, these are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates for projected SGD with general convex objectives, based solely on observable trajectory quantities.

Related papers

Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods [7.061954653503474]
Differentially rigorous gradient descent (DP-SGD) has become the standard algorithm for machine learning models with privacy guarantees.<n>We show that despite its widespread expectation of convergence, the algorithm remains in both convex nonwise regimes.
arXiv Detail & Related papers (2025-11-20T17:42:40Z)
Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent [14.19520637866741]
We establish the non-asymptotic validity of the multiplier bootstrap procedure for constructing confidence sets.<n>We derive approximation rates in convex distance of order up to $1/sqrtn$.
arXiv Detail & Related papers (2025-02-10T17:49:05Z)
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration underrho conditions.<n>Our analysis reduces the dependence on the strong growth constant from $$ to $sqrt$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z)
Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses. We also show that Prospect can converge 2-3$times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z)
Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent may gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit training data. We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z)
Private Robust Estimation by Stabilizing Convex Relaxations [22.513117502159922]
$(epsilon, delta)$-differentially private (DP) $(epsilon, delta)$-differentially private (DP) $(epsilon, delta)$-differentially private (DP)
arXiv Detail & Related papers (2021-12-07T07:47:37Z)
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails. We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level. We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates. This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting. To our best knowledge, this gives the firstever-known stability and generalization for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
Bounding the expected run-time of nonconvex optimization with early stopping [2.7648976108201815]
This work examines the convergence of gradient-based optimization algorithms that use early stopping based on a validation function. We derive conditions that guarantee this stopping rule is well-defined, and provide bounds on the expected number of iterations and gradient evaluations needed to meet this criterion.
arXiv Detail & Related papers (2020-02-20T16:43:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.