We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual
Learning Rate And Beyond
- URL: http://arxiv.org/abs/2308.10740v1
- Date: Mon, 21 Aug 2023 14:08:42 GMT
- Title: We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual
Learning Rate And Beyond
- Authors: Afshin Khadangi
- Abstract summary: This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct components of the gradients.
By bifurcating the learning rate, EVE enables more nuanced control and faster convergence, addressing the challenges associated with traditional single learning rate approaches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the rapidly advancing field of deep learning, optimising deep neural
networks is paramount. This paper introduces a novel method, Enhanced Velocity
Estimation (EVE), which innovatively applies different learning rates to
distinct components of the gradients. By bifurcating the learning rate, EVE
enables more nuanced control and faster convergence, addressing the challenges
associated with traditional single learning rate approaches. Utilising a
momentum term that adapts to the learning landscape, the method achieves a more
efficient navigation of the complex loss surface, resulting in enhanced
performance and stability. Extensive experiments demonstrate that EVE
significantly outperforms existing optimisation techniques across various
benchmark datasets and architectures.
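The abstract does not spell out EVE's update equations, so the following is only a minimal sketch of what a dual (bifurcated) learning rate with an adaptive momentum term could look like. The split into a smoothed and a residual gradient component, the agreement-based momentum adaptation, and all names (`eve_step`, `lr_slow`, `lr_fast`) are illustrative assumptions, not the authors' method.

```python
import numpy as np

def eve_step(param, grad, state, lr_slow=1e-4, lr_fast=1e-3, beta=0.9):
    """Hypothetical dual-learning-rate update (illustrative sketch only).

    One rate (lr_slow) is applied to the smoothed, momentum-like component of
    the gradient and another (lr_fast) to the residual, high-frequency
    component. The momentum coefficient is adapted by how well the new
    gradient agrees with the running average. None of this is taken from the
    paper; it only illustrates the idea of bifurcating the learning rate.
    """
    m = state.get("m", np.zeros_like(param))

    # Agreement in [0, 1]: 1 when the new gradient points with the momentum,
    # 0 when it points against it.
    denom = np.linalg.norm(m) * np.linalg.norm(grad) + 1e-12
    agree = 0.5 * (1.0 + np.dot(m.ravel(), grad.ravel()) / denom)

    beta_t = beta * agree                      # adapt momentum to the landscape
    m = beta_t * m + (1.0 - beta_t) * grad

    residual = grad - m                        # high-frequency component
    new_param = param - lr_slow * m - lr_fast * residual

    state["m"] = m
    return new_param, state
```

In this toy form, setting the two rates equal collapses the step back to plain gradient descent on the raw gradient, which makes the contribution of the split easy to isolate.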
Related papers
- Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving [52.808273563372126]
This paper proposes a novel hierarchical Bird's-eye-view (BEV) perception paradigm.
It aims to provide a library of fundamental perception modules and a user-friendly graphical interface.
We adopt a Pretrain-Finetune strategy to effectively utilize large-scale public datasets and streamline development processes.
arXiv Detail & Related papers (2024-07-17T11:17:20Z) - Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple reparameterization which we call Normalize-and-Project.
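For intuition, a minimal sketch of the projection idea follows: after each optimizer step, weights feeding into normalization layers are rescaled to a fixed norm, so the effective learning rate is governed by the explicit schedule rather than by norm growth. Which parameters to project and the target norm are assumptions here, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def project_weights(model, target_norm=1.0):
    # For weights followed by a normalization layer, rescaling does not change
    # the function being computed, but unchecked norm growth shrinks the
    # effective step size. Re-projecting to a fixed norm after every update
    # keeps the effective learning rate tied to the explicit schedule.
    for module in model.modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            w = module.weight
            w.mul_(target_norm / (w.norm() + 1e-12))

# Sketch of usage inside a training loop:
#   loss.backward(); optimizer.step(); project_weights(model)
```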
arXiv Detail & Related papers (2024-07-01T20:58:01Z) - On discretisation drift and smoothness regularisation in neural network
training [0.0]
We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation.
We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms.
We derive novel continuous-time flows that account for discretisation drift. Unlike the negative gradient flow (NGF), these new flows can be used to describe learning-rate-specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games.
We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers.
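One concrete way this line of analysis turns into a regulariser is to penalise the quantity that drives the drift, the squared gradient norm scaled by the step size; the sketch below shows that generic form, not necessarily the paper's exact construction.

```python
import torch

def drift_regularised_loss(loss_fn, params, lr, coeff=0.25):
    # Discrete GD with step size lr implicitly follows a modified flow whose
    # leading correction grows with lr * ||grad L||^2. Penalising that term
    # explicitly is one way to damp discretisation drift; the coefficient and
    # exact form here are illustrative assumptions. `params` must be tensors
    # that require gradients.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    grad_sq = sum((g ** 2).sum() for g in grads)
    return loss + coeff * lr * grad_sq
```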
arXiv Detail & Related papers (2023-10-21T15:21:36Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
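The summary above does not give FedLALR's tuning rule, so the sketch below only shows the generic ingredient it builds on: each client keeps its own AMSGrad statistics locally, so the per-coordinate effective learning rate adapts to that client's (possibly non-IID) data before the server averages the models. This is a plain heterogeneous AMSGrad illustration, not the paper's algorithm.

```python
import numpy as np

def client_local_step(w, grad, state, base_lr=1e-3,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    # One local AMSGrad step kept per client (illustrative, not FedLALR's
    # auto-tuned schedule). The effective per-coordinate step size
    # base_lr / sqrt(v_hat) is determined by this client's own gradients.
    m = state.get("m", np.zeros_like(w))
    v = state.get("v", np.zeros_like(w))
    v_hat = state.get("v_hat", np.zeros_like(w))

    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    v_hat = np.maximum(v_hat, v)          # AMSGrad: non-increasing step sizes
    w = w - base_lr * m / (np.sqrt(v_hat) + eps)

    state.update(m=m, v=v, v_hat=v_hat)
    return w, state
```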
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - Distilling Knowledge from Resource Management Algorithms to Neural
Networks: A Unified Training Assistance Approach [18.841969905928337]
A knowledge distillation (KD) based algorithm distillation (AD) method is proposed in this paper to improve the performance and convergence speed of the NN-based method.
This research paves the way for the integration of traditional optimization insights and emerging NN techniques in wireless communication system optimization.
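As a rough illustration of algorithm distillation in this sense, the student network can be trained against both its task objective and the decisions produced by the classical resource-management algorithm; the loss weighting and interfaces below are assumptions, not the paper's formulation.

```python
import torch

def ad_loss(student_out, algorithm_out, task_loss, distill_weight=0.5):
    # Algorithm-distillation style objective (sketch): keep the usual task
    # loss, but also pull the network's output toward the solution returned
    # by the traditional optimization algorithm, which both guides and speeds
    # up convergence. The 0.5 weighting is an illustrative assumption.
    distill_loss = torch.nn.functional.mse_loss(student_out, algorithm_out)
    return task_loss + distill_weight * distill_loss
```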
arXiv Detail & Related papers (2023-08-15T00:30:58Z) - Improving Performance in Continual Learning Tasks using Bio-Inspired
Architectures [4.2903672492917755]
We develop a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation.
Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets.
We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms.
arXiv Detail & Related papers (2023-08-08T19:12:52Z) - Multiplicative update rules for accelerating deep learning training and
increasing robustness [69.90473612073767]
We propose an optimization framework that fits a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models, in contrast to the traditionally used additive update rule.
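The paper's specific update rules are not given here; the textbook contrast between an additive and a multiplicative (exponentiated-gradient) rule, sketched below for positive weights, is only meant to illustrate what "alternative update rules" can mean.

```python
import numpy as np

def additive_step(w, grad, lr=1e-2):
    # Conventional additive update: w <- w - lr * grad.
    return w - lr * grad

def multiplicative_step(w, grad, lr=1e-2):
    # Exponentiated-gradient style multiplicative update for positive weights:
    # w_i <- w_i * exp(-lr * grad_i). Steps scale with the weight's magnitude,
    # which is the qualitative difference from the additive rule above.
    return w * np.exp(-lr * grad)
```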
arXiv Detail & Related papers (2023-07-14T06:44:43Z) - Transfer Learning Enhanced DeepONet for Long-Time Prediction of
Evolution Equations [9.748550197032785]
Deep operator network (DeepONet) has demonstrated great success in various learning tasks.
This paper proposes a transfer-learning aided DeepONet to enhance the stability of long-time prediction.
arXiv Detail & Related papers (2022-12-09T04:37:08Z) - On Fast Simulation of Dynamical System with Neural Vector Enhanced
Numerical Solver [59.13397937903832]
We introduce a deep learning-based corrector called Neural Vector (NeurVec).
NeurVec can compensate for integration errors and enable larger time step sizes in simulations.
Our experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability.
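A minimal sketch of the corrector idea: take a cheap coarse integrator step with a large time step and add a learned term trained to cancel its integration error (for example against a fine-step reference trajectory). The network shape and training target here are assumptions rather than NeurVec's published configuration.

```python
import torch

class CorrectedEulerStep(torch.nn.Module):
    # Coarse forward-Euler step plus a learned error-compensation term,
    # in the spirit of a NeurVec-style corrector (illustrative sketch).
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.corrector = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, x, f, dt):
        # x: current state, f: vector field of the dynamical system, dt: the
        # (deliberately large) step size whose error the corrector absorbs.
        return x + dt * f(x) + self.corrector(x)
```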
arXiv Detail & Related papers (2022-08-07T09:02:18Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
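Only the qualitative rule is stated above, so the sketch below fills it in with one possible modulation: keep a running direction for each parameter, enlarge the step when it agrees with the current descent direction and shrink it otherwise. The constants and the exact form are assumptions, not AdaRem's published update.

```python
import numpy as np

def direction_modulated_step(w, grad, state, lr=1e-3, beta=0.9, damp=0.5):
    # Per-parameter learning-rate modulation by direction agreement (sketch).
    d = state.get("d", np.zeros_like(w))       # running direction of past updates
    agree = np.sign(d) == np.sign(-grad)       # past motion vs. current descent
    per_param_lr = lr * np.where(agree, 1.0 + damp, 1.0 - damp)

    update = -per_param_lr * grad
    state["d"] = beta * d + (1 - beta) * update
    return w + update, state
```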
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.