ELRA: Exponential learning rate adaption gradient descent optimization method
- URL: http://arxiv.org/abs/2309.06274v1
- Date: Tue, 12 Sep 2023 14:36:13 GMT
- Title: ELRA: Exponential learning rate adaption gradient descent optimization method
- Authors: Alexander Kleinsorge, Stefan Kupper, Alexander Fauck, Felix Rothe
- Abstract summary: We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based optimizer.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
- Score: 83.88591755871734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a novel, fast (exponential rate adaption), ab initio
(hyper-parameter-free) gradient based optimizer algorithm. The main idea of the
method is to adapt the learning rate $\alpha$ by situational awareness, mainly
striving for orthogonal neighboring gradients. The method has a high success
and fast convergence rate and does not rely on hand-tuned parameters, giving it
greater universality. It can be applied to problems of any dimension $n$ and
scales only linearly (of order $O(n)$) with the dimension of the problem. It
optimizes convex and non-convex continuous landscapes providing some kind of
gradient. In contrast to the Ada-family (AdaGrad, AdaMax, AdaDelta, Adam, etc.)
the method is rotation invariant: optimization path and performance are
independent of coordinate choices. The impressive performance is demonstrated
by extensive experiments on the MNIST benchmark data-set against
state-of-the-art optimizers. We name this new class of optimizers after its
core idea Exponential Learning Rate Adaption - ELRA. We present it in two
variants c2min and p2min with slightly different control. The authors strongly
believe that ELRA will open a completely new research direction for gradient
descent optimization.
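As a reading aid, here is a minimal, hedged sketch of the core control idea described in the abstract: grow or shrink the learning rate multiplicatively (hence exponentially) depending on the cosine between two consecutive gradients, so that neighboring gradients end up roughly orthogonal. This is not the authors' c2min or p2min controller; the function name, the grow/shrink factors and the stopping rule below are illustrative assumptions.

```python
import numpy as np

def elra_like_descent(grad, x0, alpha0=1e-3, grow=2.0, shrink=0.5,
                      max_iter=1000, tol=1e-8):
    """Illustrative gradient descent with multiplicative (exponential)
    learning-rate adaption driven by the angle between successive gradients.
    NOT the paper's c2min/p2min controllers -- only the core idea."""
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    g_prev = grad(x)
    for _ in range(max_iter):
        x = x - alpha * g_prev          # plain gradient step
        g = grad(x)
        if np.linalg.norm(g) < tol:     # (near-)stationary point reached
            break
        cos = g @ g_prev / (np.linalg.norm(g) * np.linalg.norm(g_prev) + 1e-12)
        if cos > 0:                     # gradients still aligned: step too small
            alpha *= grow               # -> grow the rate exponentially
        elif cos < 0:                   # gradients flipped: we overshot
            alpha *= shrink             # -> shrink the rate exponentially
        # cos ~ 0 (orthogonal neighboring gradients) leaves alpha unchanged
        g_prev = g
    return x, alpha

# toy usage on a convex quadratic
if __name__ == "__main__":
    A = np.diag([1.0, 10.0, 100.0])
    x_min, _ = elra_like_descent(lambda x: A @ x, x0=np.ones(3))
    print(x_min)  # should be close to the origin
```

The multiplicative update is what makes the adaption exponential: a badly scaled initial $\alpha$ is corrected within a logarithmic number of steps rather than a linear one.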
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes; a minimal sketch of these two classical ingredients follows below.
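For reference, the sketch announced above shows the two ingredients in their plain, classical form: the Polyak stepsize $\alpha_t = (f(x_t) - f^\star)/\lVert\nabla f(x_t)\rVert^2$ and norm-based gradient clipping. The smoothed clipping variant and the $(L_0,L_1)$-specific analysis of the paper are not reproduced; `f_star`, `c` and the iteration counts are illustrative assumptions.

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star=0.0, iters=200):
    """Gradient descent with the classical Polyak stepsize
    alpha_t = (f(x_t) - f*) / ||grad f(x_t)||^2 (requires knowing f*)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gnorm2 = g @ g
        if gnorm2 == 0.0:
            break
        x = x - (f(x) - f_star) / gnorm2 * g
    return x

def clipped_gd(grad, x0, lr=0.1, c=1.0, iters=200):
    """Gradient descent with norm clipping: the step uses
    g * min(1, c / ||g||); the paper's 'smoothed' clipping replaces this
    hard threshold with a smooth rescaling, which is omitted here."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:
            break
        x = x - lr * min(1.0, c / gnorm) * g
    return x
```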
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix [9.629238108795013]
We propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements.
We evaluate AGD on public datasets from Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys).
Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) techniques, achieving highly competitive or significantly better predictive performance.
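The summary only names the core design choice, so the following is a heavily hedged illustration of that single idea: a diagonal preconditioner built from the difference of two successive gradients, used to rescale the update elementwise. It is not the actual AGD update rule (the auto-switching mechanism and the exact accumulation are omitted), and all names and constants are assumptions.

```python
import numpy as np

def diag_diff_preconditioned_gd(grad, x0, lr=0.01, beta=0.9,
                                eps=1e-8, iters=500):
    """Illustrative sketch only: keep a diagonal preconditioner whose entries
    track |g_t - g_{t-1}| (the stepwise gradient difference) and divide the
    update by it elementwise. NOT the exact AGD rule from the paper."""
    x = np.asarray(x0, dtype=float)
    g_prev = grad(x)
    d = np.ones_like(x)                    # start from the identity preconditioner
    for _ in range(iters):
        x = x - lr * g_prev / (d + eps)    # elementwise preconditioned step
        g = grad(x)
        d = beta * d + (1.0 - beta) * np.abs(g - g_prev)
        g_prev = g
    return x
```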
arXiv Detail & Related papers (2023-12-04T06:20:14Z) - Learning to Optimize Quasi-Newton Methods [22.504971951262004]
This paper introduces a novel machine-learning optimizer called LODO, which tries to meta-learn the best preconditioner online during optimization.
Unlike other L2O methods, LODO does not require any meta-training on a training task distribution.
We show that the learned preconditioner approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians.
arXiv Detail & Related papers (2022-10-11T03:47:14Z) - An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm for First-order and Zeroth-order Optimization [111.24899593052851]
The conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform previous algorithms in terms of the number of gradient oracle queries.
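Since the entry builds on the conditional gradient (Frank-Wolfe) template, a minimal first-order version is sketched below for orientation. It shows only the classical step, a linear minimization oracle over an $\ell_1$-ball plus the standard $2/(t+2)$ stepsize; none of the acceleration, variance reduction or zeroth-order machinery of ARCS is reproduced, and all names are illustrative.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius=1.0, iters=200):
    """Classical (vanilla) Frank-Wolfe / conditional gradient method on the
    l1-ball of the given radius. Each iteration calls a linear minimization
    oracle instead of projecting, then moves toward the oracle's vertex."""
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        g = grad(x)
        # Linear minimization oracle over the l1-ball: the minimizing vertex
        # is -radius * sign(g_i) along the coordinate with largest |g_i|.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)            # standard open-loop stepsize
        x = x + gamma * (s - x)            # convex combination stays feasible
    return x

# toy usage: minimize a quadratic over the unit l1-ball
if __name__ == "__main__":
    A = np.diag([1.0, 4.0])
    b = np.array([1.0, 1.0])
    x_hat = frank_wolfe_l1(lambda x: A @ x - b, x0=np.zeros(2))
    print(x_hat)
```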
arXiv Detail & Related papers (2021-09-18T07:08:11Z) - Tom: Leveraging trend of the observed gradients for faster convergence [0.0]
Tom is a novel variant of Adam that takes into account the trend observed in the gradients of the loss landscape traversed by the neural network.
Tom outperforms Adagrad, Adadelta, RMSProp and Adam in terms of accuracy and also converges faster.
arXiv Detail & Related papers (2021-09-07T20:19:40Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
The new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities [32.51470158863247]
The main algorithms, AdaACSA and AdaAGD+, are accelerated methods for constrained convex optimization.
We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard non-accelerated convergence rate.
arXiv Detail & Related papers (2020-07-17T09:10:21Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we consider the non-strongly-concave setting for gradient-free minimax optimization problems.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the OptimisticOA algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)