On the Convergence of the Gradient Descent Method with Stochastic
Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
- URL: http://arxiv.org/abs/2301.09511v1
- Date: Mon, 23 Jan 2023 16:02:54 GMT
- Title: On the Convergence of the Gradient Descent Method with Stochastic
Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
- Authors: Lu Xia and Michiel E. Hochstenbach and Stefano Massei
- Abstract summary: We show that biased stochastic rounding errors may be beneficial, since choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias in a descent direction.
We obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training neural networks with low-precision computation, rounding errors
often cause stagnation or are detrimental to the convergence of the optimizers;
in this paper we study the influence of rounding errors on the convergence of
the gradient descent method for problems satisfying the Polyak-Lojasiewicz
inequality. Within this context, we show that, in contrast, biased stochastic
rounding errors may be beneficial since choosing a proper rounding strategy
eliminates the vanishing gradient problem and forces the rounding bias in a
descent direction. Furthermore, we obtain a bound on the convergence rate that
is stricter than the one achieved by unbiased stochastic rounding. The
theoretical analysis is validated by comparing the performances of various
rounding strategies when optimizing several examples using low-precision
fixed-point number formats.
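As a rough illustration of this setting (a minimal sketch, not the paper's exact algorithm or rounding modes), the snippet below runs gradient descent on a toy quadratic, which satisfies the Polyak-Lojasiewicz inequality, while every iterate is rounded stochastically onto a fixed-point grid. The function names, the 8-bit fractional grid, and the step size are illustrative assumptions.

```python
import numpy as np

def stochastic_round_fp(x, frac_bits=8, rng=np.random.default_rng(0)):
    """Unbiased stochastic rounding to a fixed-point grid with spacing
    2**-frac_bits: round up with probability equal to the fractional
    part, so the rounding error has zero mean."""
    scale = 2.0 ** frac_bits
    scaled = np.asarray(x, dtype=float) * scale
    low = np.floor(scaled)
    round_up = rng.random(scaled.shape) < (scaled - low)
    return (low + round_up) / scale

def gd_low_precision(grad, x0, lr=0.05, iters=200, frac_bits=8):
    """Gradient descent in which every iterate is stored in fixed point."""
    x = stochastic_round_fp(x0, frac_bits)
    for _ in range(iters):
        x = stochastic_round_fp(x - lr * grad(x), frac_bits)
    return x

# Toy strongly convex (hence PL) objective f(x) = 0.5 * ||x||^2, grad f(x) = x.
print(gd_low_precision(lambda x: x, x0=np.ones(4)))
# The iterates approach 0 only up to noise on the order of 2**-frac_bits.
```

With a biased rounding strategy one would replace `stochastic_round_fp` by a rule whose expected error points in a descent direction, which is the effect the paper analyzes.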
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - On the influence of roundoff errors on the convergence of the gradient
descent method with low-precision floating-point computation [0.0]
We propose a new rounding scheme that trades the zero bias property for a larger probability of preserving small gradients.
Our method yields constant rounding bias that, at each iteration, lies in a descent direction.
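That trade-off can be pictured with a small hypothetical rule (not necessarily the scheme proposed in that paper): entries smaller than one grid spacing get a floor probability of being rounded away from zero rather than flushed to zero, at the price of a nonzero bias. The name `round_keep_small` and the parameter `p_min` are made up for this sketch.

```python
import numpy as np

def round_keep_small(x, frac_bits=8, p_min=0.5, rng=np.random.default_rng(1)):
    """Hypothetical biased rounding: entries with |x| below one grid step
    are rounded away from zero with probability at least p_min, so tiny
    gradient entries are less likely to vanish."""
    eps = 2.0 ** -frac_bits
    x = np.asarray(x, dtype=float)
    # Unbiased stochastic rounding for the general case.
    scaled = x / eps
    low = np.floor(scaled)
    rounded = (low + (rng.random(x.shape) < (scaled - low))) * eps
    # Boosted survival probability for entries smaller than one grid step.
    small = np.abs(x) < eps
    keep = rng.random(x.shape) < np.maximum(p_min, np.abs(x) / eps)
    return np.where(small, np.where(keep, np.sign(x) * eps, 0.0), rounded)
```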
arXiv Detail & Related papers (2022-02-24T18:18:20Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
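For intuition about that fixed-point estimation problem, here is a minimal sketch of stochastic approximation with Polyak-Ruppert iterate averaging on a toy contraction; the operator, noise level, and step size are illustrative assumptions, and this is a classical variance-reduction device rather than the estimator the paper constructs.

```python
import numpy as np

def averaged_fixed_point(T_noisy, x0, steps=5000, step=0.05):
    """Stochastic approximation x <- (1 - a) x + a * T_noisy(x) with
    Polyak-Ruppert averaging of the iterates."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for k in range(1, steps + 1):
        x = (1 - step) * x + step * T_noisy(x)
        avg += (x - avg) / k              # running mean of the iterates
    return avg

# Toy contraction T(x) = 0.5 x + 1 with fixed point x* = 2, observed with noise.
rng = np.random.default_rng(0)
print(averaged_fixed_point(lambda x: 0.5 * x + 1.0 + 0.1 * rng.standard_normal(x.shape),
                           x0=np.zeros(3)))
```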
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Spectral clustering under degree heterogeneity: a case for the random
walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
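As a reminder of the pipeline that entry refers to, the sketch below computes the textbook random walk Laplacian embedding; it does not reproduce the paper's degree-correction analysis, and the example graph is an illustrative assumption.

```python
import numpy as np

def random_walk_embedding(A, K):
    """Embed each node with the K eigenvectors of L_rw = I - D^{-1} A
    corresponding to the smallest eigenvalues."""
    deg = A.sum(axis=1)
    L_rw = np.eye(len(A)) - A / deg[:, None]   # D^{-1} A row-normalizes A
    vals, vecs = np.linalg.eig(L_rw)           # L_rw is not symmetric
    order = np.argsort(vals.real)
    return vecs[:, order[:K]].real             # one K-dimensional vector per node

# Two disjoint triangles: the 2-d embedding is constant within each community.
A = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)
print(random_walk_embedding(A, K=2))
```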
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - On the Convergence of SGD with Biased Gradients [28.400751656818215]
We analyze biased stochastic gradient methods (SGD), where individual updates are corrupted by compression.
We quantify how the magnitude of the bias impacts the attainable accuracy and convergence rates.
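One standard source of such bias is gradient compression; the sketch below runs gradient descent with a deterministic top-k sparsifier on a toy quadratic. The objective, sparsity level, and step size are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def top_k(g, k):
    """Keep the k largest-magnitude entries of g (a deterministic,
    hence biased, gradient compressor)."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def compressed_sgd(grad, x0, k=2, lr=0.1, iters=500):
    """(S)GD whose updates use only the compressed gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - lr * top_k(grad(x), k)
    return x

# Quadratic f(x) = 0.5 * ||x||^2: the method still converges, but the
# compression bias slows it down relative to full-gradient descent.
print(compressed_sgd(lambda x: x, x0=np.arange(1.0, 6.0)))
```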
arXiv Detail & Related papers (2020-07-31T19:37:59Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles arise in practical contexts, for instance, in risk measure estimation from a batch of independent and identically distributed (i.i.d.) simulation samples.
arXiv Detail & Related papers (2020-02-26T12:53:04Z) - Geometry, Computation, and Optimality in Stochastic Optimization [24.154336772159745]
We study the computational and statistical consequences of problem geometry in stochastic and online optimization.
By focusing on constraint set and gradient geometry, we characterize the problem families for which stochastic- and adaptive-gradient methods are (minimax) optimal.
arXiv Detail & Related papers (2019-09-23T16:14:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.