On Convergence-Diagnostic based Step Sizes for Stochastic Gradient
Descent
- URL: http://arxiv.org/abs/2007.00534v1
- Date: Wed, 1 Jul 2020 14:58:01 GMT
- Title: On Convergence-Diagnostic based Step Sizes for Stochastic Gradient
Descent
- Authors: Scott Pesme, Aymeric Dieuleveut, Nicolas Flammarion
- Abstract summary: Constant step-size Stochastic Gradient Descent exhibits two phases: a transient phase during which iterates make fast progress towards the optimum, followed by a stationary phase during which iterates oscillate around the optimal point.
We show that efficiently detecting this transition and appropriately decreasing the step size can lead to fast convergence rates.
- Score: 24.042107117994046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Constant step-size Stochastic Gradient Descent exhibits two phases: a
transient phase during which iterates make fast progress towards the optimum,
followed by a stationary phase during which iterates oscillate around the
optimal point. In this paper, we show that efficiently detecting this
transition and appropriately decreasing the step size can lead to fast
convergence rates. We analyse the classical statistical test proposed by Pflug
(1983), based on the inner product between consecutive stochastic gradients.
Even in the simple case where the objective function is quadratic we show that
this test cannot lead to an adequate convergence diagnostic. We then propose a
novel and simple statistical procedure that accurately detects stationarity and
we provide experimental results showing state-of-the-art performance on
synthetic and real-world datasets.
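The mechanism analysed here can be illustrated with a short sketch. The code below is a minimal, hypothetical implementation of the classical Pflug-style diagnostic (which the paper argues is inadequate, not the paper's own procedure): track a running average of inner products between consecutive stochastic gradients, and halve the step size once that average turns negative, signalling the stationary phase. All function names, the burn-in length, and the toy quadratic objective are illustrative choices, not from the paper.

```python
import numpy as np

def sgd_with_pflug_diagnostic(grad_fn, x0, step=0.1, burn_in=100,
                              n_iters=5000, decay=0.5, rng=None):
    """SGD that halves the step size when the running average of inner
    products between consecutive stochastic gradients turns negative
    (a Pflug-style stationarity diagnostic)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    prev_g, inner_sum, count = None, 0.0, 0
    for _ in range(n_iters):
        g = grad_fn(x, rng)
        if prev_g is not None:
            inner_sum += g @ prev_g
            count += 1
        # During the transient phase consecutive gradients correlate
        # positively; around the optimum the correlation is negative.
        if count >= burn_in and inner_sum / count < 0:
            step *= decay
            prev_g, inner_sum, count = None, 0.0, 0  # restart the test
        else:
            prev_g = g
        x -= step * g
    return x

# Toy quadratic f(x) = 0.5 * ||x||^2 with additive gradient noise.
def noisy_grad(x, rng):
    return x + 0.1 * rng.standard_normal(x.shape)

x_final = sgd_with_pflug_diagnostic(noisy_grad, np.ones(5))
```

On this quadratic toy problem the test behaves as intended; the paper's point is that even here the raw Pflug statistic cannot be tuned into a reliable diagnostic in general.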
Related papers
- Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models
via Reparameterisation and Smoothing [1.6114012813668932]
We introduce a simple framework to define non-differentiable functions piecewisely and present a systematic approach to obtain smoothings.
Our main contribution is a novel variant of SGD, Diagonalisation Gradient Descent, which progressively enhances the accuracy of the smoothed approximation.
Our approach is simple, fast, and stable, and attains an orders-of-magnitude reduction in work-normalised variance.
arXiv Detail & Related papers (2024-02-19T00:43:22Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient
Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
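As a concrete illustration of using stochastic gradients to approximately solve a linear system (a sketch of the general idea, not this paper's Gaussian-process method), one can run row-sampling SGD on a least-squares objective; each step uses the gradient of a single squared residual. The function name and all constants below are illustrative assumptions.

```python
import numpy as np

def sgd_least_squares(A, b, step=0.01, n_iters=20000, rng=None):
    """Approximately solve min_x ||A x - b||^2 by row-sampling SGD:
    each step follows the gradient of one squared residual,
    (a_i @ x - b_i) * a_i, for a uniformly sampled row i."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        resid = A[i] @ x - b[i]
        x -= step * resid * A[i]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true  # consistent system: the gradient noise vanishes at x_true
x_hat = sgd_least_squares(A, b)
```

Because the system here is consistent, the stochastic gradients vanish at the solution and constant-step SGD converges linearly; with noisy targets it would instead settle in a noise ball, matching the transient/stationary picture discussed above.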
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual gradient descent with averaging or the adaptive gradient descent.
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - On Almost Sure Convergence Rates of Stochastic Gradient Methods [11.367487348673793]
We show for the first time that the almost sure convergence rates obtained for stochastic gradient methods are arbitrarily close to the optimal convergence rates possible.
For non-convex objective functions, we show not only that a weighted average of the squared gradient norms converges to zero almost surely, but also that the last iterate does.
arXiv Detail & Related papers (2022-02-09T06:05:30Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
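The setting can be sketched concretely. Below is a minimal, hypothetical implementation of stochastic extragradient with iterate averaging on a bilinear game min_x max_y x^T B y, whose unique Nash equilibrium is (0, 0); it illustrates the extrapolate-then-update structure, not this paper's restarted scheme or its constants.

```python
import numpy as np

def seg_bilinear(B, step=0.05, noise=0.1, n_iters=20000, rng=None):
    """Stochastic ExtraGradient for min_x max_y x^T B y, with running
    iterate averaging. The unique Nash equilibrium is (0, 0)."""
    rng = np.random.default_rng(0) if rng is None else rng
    d, k = B.shape
    x, y = np.ones(d), np.ones(k)
    x_avg, y_avg = np.zeros(d), np.zeros(k)
    for t in range(n_iters):
        # Stochastic gradients of the two players at the current point.
        gx = B @ y + noise * rng.standard_normal(d)
        gy = B.T @ x + noise * rng.standard_normal(k)
        # Extrapolation (look-ahead) step.
        x_h, y_h = x - step * gx, y + step * gy
        gx_h = B @ y_h + noise * rng.standard_normal(d)
        gy_h = B.T @ x_h + noise * rng.standard_normal(k)
        # Update from the original point using the look-ahead gradient.
        x, y = x - step * gx_h, y + step * gy_h
        # Running averages of the iterates.
        x_avg += (x - x_avg) / (t + 1)
        y_avg += (y - y_avg) / (t + 1)
    return x_avg, y_avg

B = np.eye(3)
x_bar, y_bar = seg_bilinear(B)
```

With a constant step the raw SEG iterates oscillate in a noise ball around the equilibrium, while the averaged iterates concentrate near it, which is the behavior the averaging (and restarting) analysis makes precise.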
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Understanding and Detecting Convergence for Stochastic Gradient Descent
with Momentum [18.88380216480131]
This paper considers stochastic gradient descent with a constant learning rate and momentum.
We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients.
arXiv Detail & Related papers (2020-08-27T16:24:18Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed noise in the stochastic gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without a light-tails assumption on the noise.
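The clipping operation itself is simple to sketch. The code below shows plain SGD with gradient norm clipping under heavy-tailed (Student-t) gradient noise; it illustrates the clipping mechanism only, not the accelerated clipped-SSTM method, and all names and constants are illustrative assumptions.

```python
import numpy as np

def clipped_sgd(grad_fn, x0, step=0.05, clip=1.0, n_iters=5000, rng=None):
    """Plain SGD with gradient norm clipping: if ||g|| exceeds `clip`,
    rescale g to have norm `clip`, bounding the impact of any single
    heavy-tailed gradient sample."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)
        x -= step * g
    return x

# Quadratic objective with heavy-tailed gradient noise: Student-t with
# df=2 has infinite variance, so unclipped SGD sees occasional huge steps.
def heavy_tailed_grad(x, rng):
    return x + rng.standard_t(df=2, size=x.shape)

x_hat = clipped_sgd(heavy_tailed_grad, np.full(4, 5.0))
```

Clipping caps each update at `step * clip`, so rare extreme noise draws cannot throw the iterate far from the optimum, which is the intuition behind the high-probability bounds without a light-tails assumption.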
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.