Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training
- URL: http://arxiv.org/abs/2205.13711v1
- Date: Fri, 27 May 2022 02:12:59 GMT
- Title: Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training
- Authors: Antonio Robles-Kelly and Asef Nazari
- Abstract summary: We adapt the learning rate using a two-point approximation to the secant equation which quasi-Newton methods are based upon.
We evaluate our method using standard example network architectures on widely available datasets and compare against alternatives elsewhere in the literature.
- Score: 3.8762085568003406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we incorporate the Barzilai-Borwein step size into gradient
descent methods used to train deep networks. This allows us to adapt the
learning rate using a two-point approximation to the secant equation which
quasi-Newton methods are based upon. Moreover, the adaptive learning rate
method presented here is quite general in nature and can be applied to widely
used gradient descent approaches such as Adagrad and RMSprop. We evaluate our
method using standard example network architectures on widely available
datasets and compare against alternatives elsewhere in the literature. In our
experiments, our adaptive learning rate shows a smoother and faster convergence
than that exhibited by the alternatives, with better or comparable performance.
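For illustration only, the sketch below applies the classical Barzilai-Borwein (BB1) step size to plain gradient descent on a toy quadratic. The test problem, the initial step size and the fallback rule are assumptions of this sketch, not details taken from the paper, which further combines the two-point rule with stochastic optimizers such as Adagrad and RMSprop.

```python
import numpy as np

def bb_gradient_descent(grad, x0, alpha0=1e-3, iters=100, eps=1e-12):
    """Gradient descent with the Barzilai-Borwein (BB1) step size.

    alpha_k = (s's) / (s'y), where s = x_k - x_{k-1} and y = g_k - g_{k-1},
    i.e. a two-point approximation to the secant equation that quasi-Newton
    methods are based upon.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0                      # initial step before two iterates exist
    for _ in range(iters):
        x_new = x - alpha * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g     # parameter and gradient differences
        denom = float(s @ y)
        # BB1 step; keep the previous step if the curvature term is near zero
        alpha = float(s @ s) / denom if abs(denom) > eps else alpha
        x, g = x_new, g_new
    return x

# Toy strictly convex quadratic, 0.5 * x'Ax - b'x (an assumed test problem)
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 2.0, 3.0])
x_hat = bb_gradient_descent(lambda x: A @ x - b, x0=np.zeros(3))
print(x_hat, np.linalg.solve(A, b))     # the two should agree closely
```

In the setting the paper targets, the same two-point ratio replaces a fixed learning rate inside a stochastic optimizer rather than a deterministic loop like the one above.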
Related papers
- Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization [14.009179786857802]
We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks.
While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.
In this paper, we interpret adaptive gradient methods as steepest descent applied on parameter-scaled networks.
arXiv Detail & Related papers (2024-01-06T15:45:29Z)
- Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient-signal-to-noise ratio (GSNR) of the network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z)
- Angle based dynamic learning rate for gradient descent [2.5077510176642805]
We propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks.
Instead of the traditional approach of selecting adaptive learning rates via the expectation of gradient-based terms, we use the angle between the current gradient and the new gradient.
We find that our method leads to the highest accuracy in most of the datasets.
arXiv Detail & Related papers (2023-04-20T16:55:56Z)
- FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
arXiv Detail & Related papers (2023-02-13T05:10:30Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning [13.937644559223548]
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning.
We propose an effective method to improve model generalization by penalizing the gradient norm of the loss function during optimization.
arXiv Detail & Related papers (2022-02-08T02:03:45Z)
- Adaptive Learning Rate and Momentum for Training Deep Neural Networks [0.0]
We develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework.
Experiments in image classification datasets show that our method yields faster convergence than other local solvers.
arXiv Detail & Related papers (2021-06-22T05:06:56Z)
- Analytically Tractable Bayesian Deep Q-Learning [0.0]
We adapt the temporal difference Q-learning framework to make it compatible with the tractable approximate Gaussian inference (TAGI).
We demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks.
arXiv Detail & Related papers (2021-06-21T13:11:52Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)
- Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule.
We introduce a "grafting" experiment which decouples an update's magnitude from its direction.
We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
arXiv Detail & Related papers (2020-02-26T21:42:49Z)