On the influence of roundoff errors on the convergence of the gradient
descent method with low-precision floating-point computation
- URL: http://arxiv.org/abs/2202.12276v1
- Date: Thu, 24 Feb 2022 18:18:20 GMT
- Title: On the influence of roundoff errors on the convergence of the gradient
descent method with low-precision floating-point computation
- Authors: Lu Xia, Stefano Massei, Michiel Hochstenbach and Barry Koren
- Abstract summary: We propose a new stochastic rounding scheme that trades the zero-bias property for a larger probability of preserving small gradients.
Our method yields a constant rounding bias that, at each iteration, lies in a descent direction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The employment of stochastic rounding schemes helps prevent stagnation of
convergence due to the vanishing gradient effect when implementing the gradient
descent method in low precision. Conventional stochastic rounding achieves zero
bias by preserving small updates with probabilities proportional to their
relative magnitudes. In this study, we propose a new stochastic rounding scheme
that trades the zero-bias property for a larger probability of preserving small
gradients. Our method yields a constant rounding bias that, at each iteration,
lies in a descent direction. For convex problems, we prove that the proposed
rounding method has a beneficial effect on the convergence rate of gradient
descent. We validate our theoretical analysis by comparing the performances of
various rounding schemes when optimizing a multinomial logistic regression
model and when training a simple neural network with 8-bit floating-point
format.
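As a rough illustration of the setting, the sketch below simulates gradient descent in low precision with conventional (zero-bias) stochastic rounding and with a simple biased variant that keeps small-magnitude entries with a larger probability. The function names, the `boost` parameter, and the toy quadratic objective are illustrative assumptions; the sketch does not reproduce the paper's exact rounding rule or its 8-bit floating-point format.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, t=3):
    """Conventional zero-bias stochastic rounding: each nonzero entry is
    rounded to one of its two nearest neighbours on a grid with t fractional
    significand bits, with probabilities proportional to the distances,
    so the expected rounding error is zero."""
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    ulp = 2.0 ** (np.floor(np.log2(np.abs(x[nz]))) - t)  # local grid spacing
    lo = np.floor(x[nz] / ulp) * ulp                      # neighbour below
    p_up = (x[nz] - lo) / ulp                             # P(round up)
    out[nz] = lo + (rng.random(lo.shape) < p_up) * ulp
    return out

def biased_stochastic_round(x, t=3, boost=0.25):
    """Illustrative biased variant (an assumption, not the paper's exact rule):
    the probability of rounding away from zero is inflated by `boost`, so small
    entries are less likely to be flushed toward zero, at the price of a
    nonzero rounding bias."""
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    ulp = 2.0 ** (np.floor(np.log2(np.abs(x[nz]))) - t)
    lo = np.floor(x[nz] / ulp) * ulp
    p_up = np.clip((x[nz] - lo) / ulp + boost * np.sign(x[nz]), 0.0, 1.0)
    out[nz] = lo + (rng.random(lo.shape) < p_up) * ulp
    return out

# Toy low-precision gradient descent on f(w) = ||w||^2:
# both the gradient and the updated iterate are rounded at every step.
w, eta = np.array([1.0, -0.5]), 1e-3
for _ in range(200):
    g = 2.0 * w
    w = stochastic_round(w - eta * stochastic_round(g))
print(w)
```

With only 3 significand bits, the update eta * g is far smaller than the grid spacing around w, so round-to-nearest would return w unchanged and stall; stochastic rounding instead applies the small update with probability proportional to its relative magnitude, and the biased variant sketched above would apply it with even larger probability.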
Related papers
- An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly non-convex functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on projected gradient descent on the log-likelihood function, corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual gradient descent with averaging or adaptive gradient descent.
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - On the Convergence of the Gradient Descent Method with Stochastic
Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality [0.0]
We show that biased rounding errors may be beneficial, since choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias in a descent direction.
We obtain a bound on the convergence rate that is stricter than the one achieved by unbiased rounding.
arXiv Detail & Related papers (2023-01-23T16:02:54Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points in high probability with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles arise in practical contexts, for instance, in risk measure estimation from a batch of independent and identically distributed simulation samples.
arXiv Detail & Related papers (2020-02-26T12:53:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.