We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual
Learning Rate And Beyond
- URL: http://arxiv.org/abs/2308.10740v1
- Date: Mon, 21 Aug 2023 14:08:42 GMT
- Title: We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual
Learning Rate And Beyond
- Authors: Afshin Khadangi
- Abstract summary: This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct components of the gradients.
By bifurcating the learning rate, EVE enables more nuanced control and faster convergence, addressing the challenges associated with traditional single learning rate approaches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the rapidly advancing field of deep learning, optimising deep neural
networks is paramount. This paper introduces a novel method, Enhanced Velocity
Estimation (EVE), which innovatively applies different learning rates to
distinct components of the gradients. By bifurcating the learning rate, EVE
enables more nuanced control and faster convergence, addressing the challenges
associated with traditional single learning rate approaches. Utilising a
momentum term that adapts to the learning landscape, the method achieves a more
efficient navigation of the complex loss surface, resulting in enhanced
performance and stability. Extensive experiments demonstrate that EVE
significantly outperforms existing optimisation techniques across various
benchmark datasets and architectures.
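The abstract does not spell out EVE's update equations, so the following is only a minimal sketch of what a dual (bifurcated) learning rate with an adaptive momentum term could look like. The split into a smoothed and a residual gradient component, the agreement-based momentum adaptation, and all names (`eve_step`, `lr_slow`, `lr_fast`) are illustrative assumptions, not the authors' method.

```python
import numpy as np

def eve_step(param, grad, state, lr_slow=1e-4, lr_fast=1e-3, beta=0.9):
    """Hypothetical dual-learning-rate update (illustrative sketch only).

    One rate (lr_slow) is applied to the smoothed, momentum-like component of
    the gradient and another (lr_fast) to the residual, high-frequency
    component. The momentum coefficient is adapted by how well the new
    gradient agrees with the running average. None of this is taken from the
    paper; it only illustrates the idea of bifurcating the learning rate.
    """
    m = state.get("m", np.zeros_like(param))

    # Agreement in [0, 1]: 1 when the new gradient points with the momentum,
    # 0 when it points against it.
    denom = np.linalg.norm(m) * np.linalg.norm(grad) + 1e-12
    agree = 0.5 * (1.0 + np.dot(m.ravel(), grad.ravel()) / denom)

    beta_t = beta * agree                      # adapt momentum to the landscape
    m = beta_t * m + (1.0 - beta_t) * grad

    residual = grad - m                        # high-frequency component
    new_param = param - lr_slow * m - lr_fast * residual

    state["m"] = m
    return new_param, state
```

In this toy form, setting the two rates equal collapses the step back to plain gradient descent on the raw gradient, which makes the contribution of the split easy to isolate.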
Related papers
- Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving [52.808273563372126]
This paper proposes a novel hierarchical Bird's-eye-view (BEV) perception paradigm.
It aims to provide a library of fundamental perception modules and a user-friendly graphical interface.
We adopt a Pretrain-Finetune strategy to effectively utilize large-scale public datasets and streamline development processes.
arXiv Detail & Related papers (2024-07-17T11:17:20Z) - Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple reparameterization which we call Normalize-and-Project.
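For intuition, a minimal sketch of the projection idea follows: after each optimizer step, weights feeding into normalization layers are rescaled to a fixed norm, so the effective learning rate is governed by the explicit schedule rather than by norm growth. Which parameters to project and the target norm are assumptions here, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def project_weights(model, target_norm=1.0):
    # For weights followed by a normalization layer, rescaling does not change
    # the function being computed, but unchecked norm growth shrinks the
    # effective step size. Re-projecting to a fixed norm after every update
    # keeps the effective learning rate tied to the explicit schedule.
    for module in model.modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            w = module.weight
            w.mul_(target_norm / (w.norm() + 1e-12))

# Sketch of usage inside a training loop:
#   loss.backward(); optimizer.step(); project_weights(model)
```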
arXiv Detail & Related papers (2024-07-01T20:58:01Z) - On discretisation drift and smoothness regularisation in neural network
training [0.0]
We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation.
We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms.
We derive novel continuous-time flows that account for discretisation drift. Unlike the negative gradient flow (NGF), these new flows can be used to describe learning-rate-specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games.
We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers.
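One concrete way this line of analysis turns into a regulariser is to penalise the quantity that drives the drift, the squared gradient norm scaled by the step size; the sketch below shows that generic form, not necessarily the paper's exact construction.

```python
import torch

def drift_regularised_loss(loss_fn, params, lr, coeff=0.25):
    # Discrete GD with step size lr implicitly follows a modified flow whose
    # leading correction grows with lr * ||grad L||^2. Penalising that term
    # explicitly is one way to damp discretisation drift; the coefficient and
    # exact form here are illustrative assumptions. `params` must be tensors
    # that require gradients.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    grad_sq = sum((g ** 2).sum() for g in grads)
    return loss + coeff * lr * grad_sq
```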
arXiv Detail & Related papers (2023-10-21T15:21:36Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
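The summary above does not give FedLALR's tuning rule, so the sketch below only shows the generic ingredient it builds on: each client keeps its own AMSGrad statistics locally, so the per-coordinate effective learning rate adapts to that client's (possibly non-IID) data before the server averages the models. This is a plain heterogeneous AMSGrad illustration, not the paper's algorithm.

```python
import numpy as np

def client_local_step(w, grad, state, base_lr=1e-3,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    # One local AMSGrad step kept per client (illustrative, not FedLALR's
    # auto-tuned schedule). The effective per-coordinate step size
    # base_lr / sqrt(v_hat) is determined by this client's own gradients.
    m = state.get("m", np.zeros_like(w))
    v = state.get("v", np.zeros_like(w))
    v_hat = state.get("v_hat", np.zeros_like(w))

    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    v_hat = np.maximum(v_hat, v)          # AMSGrad: non-increasing step sizes
    w = w - base_lr * m / (np.sqrt(v_hat) + eps)

    state.update(m=m, v=v, v_hat=v_hat)
    return w, state
```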
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - Distilling Knowledge from Resource Management Algorithms to Neural
Networks: A Unified Training Assistance Approach [18.841969905928337]
A knowledge distillation (KD) based algorithm distillation (AD) method is proposed in this paper to improve the performance and convergence speed of the NN-based method.
This research paves the way for the integration of traditional optimization insights and emerging NN techniques in wireless communication system optimization.
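As a rough illustration of algorithm distillation in this sense, the student network can be trained against both its task objective and the decisions produced by the classical resource-management algorithm; the loss weighting and interfaces below are assumptions, not the paper's formulation.

```python
import torch

def ad_loss(student_out, algorithm_out, task_loss, distill_weight=0.5):
    # Algorithm-distillation style objective (sketch): keep the usual task
    # loss, but also pull the network's output toward the solution returned
    # by the traditional optimization algorithm, which both guides and speeds
    # up convergence. The 0.5 weighting is an illustrative assumption.
    distill_loss = torch.nn.functional.mse_loss(student_out, algorithm_out)
    return task_loss + distill_weight * distill_loss
```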
arXiv Detail & Related papers (2023-08-15T00:30:58Z) - Improving Performance in Continual Learning Tasks using Bio-Inspired
Architectures [4.2903672492917755]
We develop a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation.
Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets.
We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms.
arXiv Detail & Related papers (2023-08-08T19:12:52Z) - Multiplicative update rules for accelerating deep learning training and
increasing robustness [69.90473612073767]
We propose an optimization framework that fits a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models, in contrast to the traditionally used additive update rule.
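The paper's specific update rules are not given here; the textbook contrast between an additive and a multiplicative (exponentiated-gradient) rule, sketched below for positive weights, is only meant to illustrate what "alternative update rules" can mean.

```python
import numpy as np

def additive_step(w, grad, lr=1e-2):
    # Conventional additive update: w <- w - lr * grad.
    return w - lr * grad

def multiplicative_step(w, grad, lr=1e-2):
    # Exponentiated-gradient style multiplicative update for positive weights:
    # w_i <- w_i * exp(-lr * grad_i). Steps scale with the weight's magnitude,
    # which is the qualitative difference from the additive rule above.
    return w * np.exp(-lr * grad)
```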
arXiv Detail & Related papers (2023-07-14T06:44:43Z) - Transfer Learning Enhanced DeepONet for Long-Time Prediction of
Evolution Equations [9.748550197032785]
Deep operator network (DeepONet) has demonstrated great success in various learning tasks.
This paper proposes a transfer-learning aided DeepONet to enhance the stability of long-time prediction.
arXiv Detail & Related papers (2022-12-09T04:37:08Z) - On Fast Simulation of Dynamical System with Neural Vector Enhanced
Numerical Solver [59.13397937903832]
We introduce a deep learning-based corrector called Neural Vector (NeurVec).
NeurVec can compensate for integration errors and enable larger time step sizes in simulations.
Our experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability.
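A minimal sketch of the corrector idea: take a cheap coarse integrator step with a large time step and add a learned term trained to cancel its integration error (for example against a fine-step reference trajectory). The network shape and training target here are assumptions rather than NeurVec's published configuration.

```python
import torch

class CorrectedEulerStep(torch.nn.Module):
    # Coarse forward-Euler step plus a learned error-compensation term,
    # in the spirit of a NeurVec-style corrector (illustrative sketch).
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.corrector = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, x, f, dt):
        # x: current state, f: vector field of the dynamical system, dt: the
        # (deliberately large) step size whose error the corrector absorbs.
        return x + dt * f(x) + self.corrector(x)
```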
arXiv Detail & Related papers (2022-08-07T09:02:18Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
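Only the qualitative rule is stated above, so the sketch below fills it in with one possible modulation: keep a running direction for each parameter, enlarge the step when it agrees with the current descent direction and shrink it otherwise. The constants and the exact form are assumptions, not AdaRem's published update.

```python
import numpy as np

def direction_modulated_step(w, grad, state, lr=1e-3, beta=0.9, damp=0.5):
    # Per-parameter learning-rate modulation by direction agreement (sketch).
    d = state.get("d", np.zeros_like(w))       # running direction of past updates
    agree = np.sign(d) == np.sign(-grad)       # past motion vs. current descent
    per_param_lr = lr * np.where(agree, 1.0 + damp, 1.0 - damp)

    update = -per_param_lr * grad
    state["d"] = beta * d + (1 - beta) * update
    return w + update, state
```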
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.