Stopping Criteria for, and Strong Convergence of, Stochastic Gradient
Descent on Bottou-Curtis-Nocedal Functions
- URL: http://arxiv.org/abs/2004.00475v2
- Date: Thu, 1 Apr 2021 16:35:35 GMT
- Title: Stopping Criteria for, and Strong Convergence of, Stochastic Gradient
Descent on Bottou-Curtis-Nocedal Functions
- Authors: Vivak Patel
- Abstract summary: Stopping criteria for Stochastic Gradient Descent (SGD) methods are often heuristics that may fail to apply to nonconvex functions.
We develop two stopping criteria that apply to a broad class of nonconvex functions, termed Bottou-Curtis-Nocedal functions.
As a result, our work can be used to develop new adaptive step size schemes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stopping criteria for Stochastic Gradient Descent (SGD) methods play
important roles from enabling adaptive step size schemes to providing rigor for
downstream analyses such as asymptotic inference. Unfortunately, current
stopping criteria for SGD methods are often heuristics that rely on asymptotic
normality results or convergence to stationary distributions, which may fail to
exist for nonconvex functions and, thereby, limit the applicability of such
stopping criteria. To address this issue, in this work, we rigorously develop
two stopping criteria for SGD that can be applied to a broad class of nonconvex
functions, which we term Bottou-Curtis-Nocedal functions. Moreover, as a
prerequisite for developing these stopping criteria, we prove that the gradient
function evaluated at SGD's iterates converges strongly to zero for
Bottou-Curtis-Nocedal functions, which addresses an open question in the SGD
literature. As a result of our work, our rigorously developed stopping criteria
can be used to develop new adaptive step size schemes or bolster other
downstream analyses for nonconvex functions.
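As a loose illustration of how such a stopping criterion might be operationalized, the sketch below stops SGD once a Monte-Carlo estimate of the gradient norm at the current iterate falls below a threshold. This is not the paper's exact criterion: the probe scheme, threshold, noisy-quadratic objective, and all parameter values are assumptions for illustration.

```python
import numpy as np

def sgd_with_stopping(grad_sample, x0, step, tol=0.05, check_every=100,
                      probe_size=256, max_iter=100_000, rng=None):
    """Run SGD, stopping once a fresh Monte-Carlo estimate of the gradient
    norm at the current iterate drops below `tol`.

    Illustrative only: the probe scheme and threshold are assumptions,
    not the criteria developed in the paper."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x = x - step(k) * grad_sample(x, rng)  # one SGD step
        if k % check_every == 0:
            # Estimate ||grad f(x)|| by averaging fresh stochastic gradients.
            g_hat = np.mean([grad_sample(x, rng) for _ in range(probe_size)],
                            axis=0)
            if np.linalg.norm(g_hat) < tol:
                return x, k  # criterion triggered
    return x, max_iter

# Toy noisy quadratic f(x) = 0.5 * ||x||^2 with additive gradient noise.
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_star, iters = sgd_with_stopping(grad, np.ones(5),
                                  step=lambda k: 0.1 / np.sqrt(k), rng=0)
```

The probe batch is drawn fresh, independently of the iterates, so the norm estimate is unbiased at the current point; in practice the threshold must sit above the probe's own sampling noise.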
Related papers
- Central Limit Theorems for Transition Probabilities of Controlled Markov Chains [14.351243505824886]
We develop a central limit theorem (CLT) for the non-parametric estimator of the transition matrices in controlled Markov chains.
We derive CLTs for the value, Q-, and advantage functions of any stationary policy, including the optimal policy recovered from the estimated model.
These results provide new statistical tools for offline policy evaluation and optimal policy recovery, and enable hypothesis tests for transition probabilities.
arXiv Detail & Related papers (2025-08-02T23:33:57Z) - Inference on Strongly Identified Functionals of Weakly Identified
Functions [71.42652863687117]
We study a novel condition under which the functional is strongly identified even when the nuisance function is not.
We propose penalized minimax estimators for both the primary and debiasing nuisance functions.
arXiv Detail & Related papers (2022-08-17T13:38:31Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
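A minimal sketch of constant step-size stochastic gradient descent-ascent on an assumed toy strongly-convex-strongly-concave saddle problem; the game, noise model, and step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def sgda(grad_x, grad_y, x, y, eta=0.05, iters=5000, rng=None):
    """Constant step-size stochastic gradient descent-ascent (SGDA).

    A minimal sketch of the method analyzed in the paper; the noise
    level and step size are illustrative assumptions."""
    rng = np.random.default_rng(rng)
    for _ in range(iters):
        gx = grad_x(x, y) + 0.1 * rng.standard_normal()  # noisy partial grads
        gy = grad_y(x, y) + 0.1 * rng.standard_normal()
        x, y = x - eta * gx, y + eta * gy  # descend in x, ascend in y
    return x, y

# Saddle problem f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, saddle point at (0, 0):
# with a constant step size the last iterate lands in a noise-ball around it.
x_T, y_T = sgda(lambda x, y: x + y, lambda x, y: x - y, x=3.0, y=-2.0, rng=0)
```

This matches the qualitative guarantee summarized above: with a constant step size the last iterate converges linearly to a neighborhood of the solution whose radius scales with the step size and gradient noise.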
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Policy Gradient for Continuing Tasks in Non-stationary Markov Decision
Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased stochastic gradients of the value function, which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
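The recursive averaging idea can be sketched as follows. The update rule used here, v_t = g_t(x_t) + (1 - 1/t) * (v_{t-1} - g_t(x_{t-1})) with the same sample evaluated at both iterates, is a hedged reconstruction that may differ from the paper's algorithm in details, and the noisy quadratic test problem is an assumption.

```python
import numpy as np

def root_sgd(grad_sample, x0, eta=0.1, iters=2000, rng=None):
    """Sketch of a recursive one-over-t averaging of past gradients in the
    spirit of ROOT-SGD.  Hedged reconstruction; the paper's exact
    algorithm may differ in details."""
    rng = np.random.default_rng(rng)
    x_prev = np.asarray(x0, dtype=float)
    xi = rng.standard_normal(x_prev.shape)  # noise sample for this step
    v = grad_sample(x_prev, xi)             # v_1 = g_1(x_1)
    x = x_prev - eta * v
    for t in range(2, iters + 1):
        xi = rng.standard_normal(x.shape)
        # v_t = g_t(x_t) + (1 - 1/t) * (v_{t-1} - g_t(x_{t-1})):
        # the same sample xi is evaluated at both iterates, so common
        # noise cancels and fresh noise enters at a 1/t rate.
        v = grad_sample(x, xi) + (1 - 1 / t) * (v - grad_sample(x_prev, xi))
        x_prev, x = x, x - eta * v
    return x

# Noisy quadratic: grad f(x; xi) = x + 0.1 * xi, minimizer at the origin.
x_final = root_sgd(lambda x, xi: x + 0.1 * xi, np.ones(3), rng=0)
```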
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - An improved convergence analysis for decentralized online stochastic
non-convex optimization [17.386715847732468]
In this paper, we show that a gradient-tracking-based technique satisfies a Polyak-Łojasiewicz-type condition under which improved convergence rates can be established.
The resulting rates are not only immediately applicable but also match the best currently known.
arXiv Detail & Related papers (2020-08-10T15:29:13Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - An Analysis of Constant Step Size SGD in the Non-convex Regime:
Asymptotic Normality and Bias [17.199063087458907]
Structured nonconvex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning.
We analyze the asymptotic normality and bias of constant step size SGD, an algorithm widely used in practice, for such problems.
arXiv Detail & Related papers (2020-06-14T13:58:44Z) - Convergence rates and approximation results for SGD and its
continuous-time counterpart [16.70533901524849]
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes.
First, we show that SGD can be provably approximated by solutions of an inhomogeneous Stochastic Differential Equation (SDE) using coupling.
Motivated by recent analyses of deterministic and stochastic optimization methods via their continuous-time counterparts, we then study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds.
arXiv Detail & Related papers (2020-04-08T18:31:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.