Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule
towards Flatter Local Minima
- URL: http://arxiv.org/abs/2208.11873v1
- Date: Thu, 25 Aug 2022 05:05:18 GMT
- Title: Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule
towards Flatter Local Minima
- Authors: Hengyu Liu, Qiang Fu, Lun Du, Tiancheng Zhang, Ge Yu, Shi Han and
Dongmei Zhang
- Abstract summary: We propose a generic learning rate schedule plugin called LEArning Rate Perturbation (LEAP).
LEAP can be applied to various learning rate schedules to improve the model training by introducing a certain perturbation to the learning rate.
We conduct extensive experiments which show that training with LEAP can improve the performance of various deep learning models on diverse datasets.
- Score: 40.70374106466073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning rate is one of the most important hyper-parameters that has a
significant influence on neural network training. Learning rate schedules are
widely used in real practice to adjust the learning rate according to
pre-defined schedules for fast convergence and good generalization. However,
existing learning rate schedules are all heuristic algorithms and lack
theoretical support. Therefore, people usually choose the learning rate
schedules through multiple ad-hoc trials, and the obtained learning rate
schedules are sub-optimal. To boost the performance of the obtained sub-optimal
learning rate schedule, we propose a generic learning rate schedule plugin,
called LEArning Rate Perturbation (LEAP), which can be applied to various
learning rate schedules to improve the model training by introducing a certain
perturbation to the learning rate. We found that, with such a simple yet
effective strategy, the training process exponentially favors flat minima rather
than sharp minima with guaranteed convergence, which leads to better
generalization ability. In addition, we conduct extensive experiments which
show that training with LEAP can improve the performance of various deep
learning models on diverse datasets using various learning rate schedules
(including constant learning rate).
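A rough picture of how such a plugin could be wired into training is sketched below. This is a minimal illustration under assumptions, not the paper's exact algorithm: the perturbation is modeled here as multiplicative zero-mean Gaussian noise, and the names `perturbed_schedule`, `noise_std`, and `min_lr` are hypothetical choices for the example.

```python
# Hypothetical sketch of a LEAP-style plugin: wrap any base learning rate
# schedule and perturb the rate it produces at every step. The multiplicative
# Gaussian noise below is an assumed perturbation form, not the paper's.
import random


def perturbed_schedule(base_schedule, noise_std=0.05, min_lr=1e-8):
    """Wrap `base_schedule(step) -> lr` so each queried rate is perturbed."""
    def schedule(step):
        lr = base_schedule(step)
        lr *= 1.0 + random.gauss(0.0, noise_std)  # zero-mean perturbation
        return max(lr, min_lr)                    # keep the rate positive
    return schedule


# Example: perturb a constant learning rate, which the abstract notes
# LEAP also improves.
constant = lambda step: 0.1
leap_like = perturbed_schedule(constant)

for step in range(3):
    print(f"step {step}: lr = {leap_like(step):.4f}")
```

In a PyTorch-style loop, the perturbed value would simply be written into the optimizer's learning rate field (e.g. `optimizer.param_groups[0]['lr']`) before each update; the perturbation distribution that actually yields the paper's flat-minima and convergence guarantees is the one defined in the paper.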
Related papers
- Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach [0.9549646359252346]
We propose dynamic Learning Rate for deep Reinforcement Learning (LRRL).
LRRL is a meta-learning approach that selects the learning rate based on the agent's performance during training.
Our empirical results demonstrate that LRRL can substantially improve the performance of deep RL algorithms.
arXiv Detail & Related papers (2024-10-16T14:15:28Z)
- Normalization and effective learning rates in reinforcement learning
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple reparameterization, which we call Normalize-and-Project (see the sketch after this list).
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- Optimal Linear Decay Learning Rate Schedules and Further Refinements [46.79573408189601]
Learning rate schedules used in practice bear little resemblance to those recommended by theory.
We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules.
arXiv Detail & Related papers (2023-10-11T19:16:35Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits [35.543124939636044]
We propose a frequency-aware (counter-based) variant of SGD that applies a frequency-dependent learning rate to each token and exhibits provable speed-up over SGD when the token distribution is imbalanced.
arXiv Detail & Related papers (2021-10-10T16:17:43Z)
- Training Aware Sigmoidal Optimizer [2.99368851209995]
Loss landscapes of deep neural networks present far more saddle points than local minima.
We propose the Training Aware Sigmoidal Optimizer (TASO), a two-phase automated learning rate schedule.
We compare the proposed approach with commonly used adaptive learning rate methods such as Adam, RMSProp, and Adagrad.
arXiv Detail & Related papers (2021-02-17T12:00:46Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past change direction is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- The Two Regimes of Deep Network Training [93.84309968956941]
We study the effects of different learning schedules and the appropriate way to select them.
To this end, we isolate two distinct phases, which we refer to as the "large-step regime" and the "small-step regime".
Our training algorithm can significantly simplify learning rate schedules.
arXiv Detail & Related papers (2020-02-24T17:08:24Z)
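As referenced in the Normalize-and-Project entry above, the sketch below illustrates the effective-learning-rate point under stated assumptions. For a scale-invariant parameter (one whose output passes through a normalization layer), the effective step size behaves roughly like lr / ||w||^2, so unchecked growth of ||w|| silently decays the learning rate; projecting the weights back to a fixed norm after each step keeps the explicit schedule in control. The toy objective, the `project_to_norm` helper, and the target norm are illustrative choices, not the paper's algorithm.

```python
# Illustrative sketch only (assumed setup, not the paper's Normalize-and-Project
# algorithm): for a scale-invariant parameter, gradients are orthogonal to the
# weights, so plain SGD makes ||w|| grow and the effective step size shrink.
# Rescaling the weights to a fixed norm after each step removes that implicit decay.
import torch
import torch.nn.functional as F


def project_to_norm(param: torch.Tensor, target_norm: float = 1.0) -> None:
    """Rescale `param` in place so its L2 norm equals `target_norm`."""
    with torch.no_grad():
        param.mul_(target_norm / (param.norm() + 1e-12))


# Toy scale-invariant objective: cosine similarity ignores ||w|| entirely.
w = torch.nn.Parameter(torch.randn(16))
target = torch.ones(16)
opt = torch.optim.SGD([w], lr=0.1)

for step in range(100):
    loss = -F.cosine_similarity(w, target, dim=0)
    opt.zero_grad()
    loss.backward()
    opt.step()
    project_to_norm(w)  # without this, ||w|| grows and the effective LR shrinks
```

Dropping the `project_to_norm` call lets ||w|| grow monotonically in this toy setting, which is exactly the implicit learning rate decay that the entry describes.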