Selecting and Composing Learning Rate Policies for Deep Neural Networks
- URL: http://arxiv.org/abs/2210.12936v1
- Date: Mon, 24 Oct 2022 03:32:59 GMT
- Title: Selecting and Composing Learning Rate Policies for Deep Neural Networks
- Authors: Yanzhao Wu, Ling Liu
- Abstract summary: This paper presents a systematic approach to selecting and composing an LR policy for effective training of Deep Neural Networks (DNNs).
First, we develop an LR tuning mechanism for auto-verification of a given LR policy with respect to the desired accuracy goal under the pre-defined training time constraint.
Second, we develop an LR policy recommendation system (LRBench) to select and compose good LR policies from the same and/or different LR functions through dynamic tuning.
Third, we extend LRBench by supporting different DNN optimizers and show the significant mutual impact of different LR policies and different optimizers.
- Score: 10.926538783768219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The choice of learning rate (LR) functions and policies has evolved from a
simple fixed LR to the decaying LR and the cyclic LR, aiming to improve the
accuracy and reduce the training time of Deep Neural Networks (DNNs). This
paper presents a systematic approach to selecting and composing an LR policy
for effective DNN training to meet desired target accuracy and reduce training
time within the pre-defined training iterations. It makes three original
contributions. First, we develop an LR tuning mechanism for auto-verification
of a given LR policy with respect to the desired accuracy goal under the
pre-defined training time constraint. Second, we develop an LR policy
recommendation system (LRBench) to select and compose good LR policies from the
same and/or different LR functions through dynamic tuning, and avoid bad
choices, for a given learning task, DNN model and dataset. Third, we extend
LRBench by supporting different DNN optimizers and show the significant mutual
impact of different LR policies and different optimizers. Evaluated using
popular benchmark datasets and different DNN models (LeNet, CNN3, ResNet), we
show that our approach can effectively deliver high DNN test accuracy,
outperform the existing recommended default LR policies, and reduce the DNN
training time by 1.6$\sim$6.7$\times$ to meet a targeted model accuracy.
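To make the policy families named in the abstract concrete, here is a minimal Python sketch of a fixed LR, a step-decay LR, a cyclic (triangular) LR, and a simple composition that switches between LR functions across iteration ranges. This only illustrates the general shapes being selected and composed; it is not the authors' LRBench implementation, and all function names and constants are hypothetical.

```python
import math

# Minimal sketches of the LR policy families named in the abstract
# (fixed, decaying, cyclic) and a simple composition. Function names
# and constants are hypothetical, not taken from LRBench.

def fixed_lr(t, k0=0.01):
    # Fixed LR: constant value for every iteration t.
    return k0

def step_decay_lr(t, k0=0.1, gamma=0.1, step_size=10000):
    # Decaying LR (step decay): multiply by gamma every step_size iterations.
    return k0 * (gamma ** (t // step_size))

def triangular_clr(t, base_lr=0.001, max_lr=0.01, half_cycle=2000):
    # Cyclic LR (triangular): oscillate linearly between base_lr and max_lr.
    cycle = math.floor(1 + t / (2 * half_cycle))
    x = abs(t / half_cycle - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

def composed_policy(t):
    # Compose policies from different LR functions over iteration ranges,
    # e.g. decay early for fast convergence, then cycle to escape plateaus.
    if t < 20000:
        return step_decay_lr(t)
    return triangular_clr(t - 20000)

if __name__ == "__main__":
    for t in (0, 5000, 15000, 21000, 23000):
        print(t, round(composed_policy(t), 6))
```

Per the abstract, LRBench's role is to select and compose such policies automatically and to verify a candidate policy against the target accuracy within the pre-defined training time, rather than leaving these choices to manual tuning.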
Related papers
- Where Do Large Learning Rates Lead Us? [5.305784285588872]
We show that only a narrow range of initial LRs leads to optimal results after fine-tuning with a small LR or weight averaging.
We show that these initial LRs result in a sparse set of learned features, with a clear focus on those most relevant for the task.
In contrast, starting training with too small LRs leads to unstable minima and attempts to learn all features simultaneously, resulting in poor generalization.
arXiv Detail & Related papers (2024-10-29T15:14:37Z)
- ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer [68.72454974431749]
We present ClearSR, a new method that can better take advantage of latent low-resolution image (LR) embeddings for diffusion-based real-world image super-resolution (Real-ISR).
Our model can achieve better performance across multiple metrics on several test sets and generate more consistent SR results with LR images than existing methods.
arXiv Detail & Related papers (2024-10-18T08:35:57Z)
- Boosting Deep Ensembles with Learning Rate Tuning [1.6021932740447968]
Learning Rate (LR) has a high impact on deep learning training performance.
This paper presents a novel framework, LREnsemble, to leverage effective learning rate tuning to boost deep ensemble performance.
arXiv Detail & Related papers (2024-10-10T02:59:38Z)
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance to PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
arXiv Detail & Related papers (2023-09-14T03:36:01Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly [22.754424957856052]
We propose AutoLRS, which automatically optimizes the learning rate for each training stage by modeling training dynamics.
We demonstrate the advantages and the generality of AutoLRS through extensive experiments on training tasks from diverse domains.
arXiv Detail & Related papers (2021-05-22T16:41:10Z)
- MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks [56.66010634895913]
The learning rate (LR) is one of the most important hyperparameters in stochastic gradient descent (SGD) training of deep neural networks (DNNs).
In this paper, we propose MLR-SNet to learn a proper LR schedule.
We also transfer MLR-SNet to query tasks that differ from the training ones in noise, architecture, data modality, and size, and achieve comparable or even better performance.
arXiv Detail & Related papers (2020-07-29T01:18:58Z)
- kDecay: Just adding k-decay items on Learning-Rate Schedule to improve Neural Networks [5.541389959719384]
k-decay effectively improves the performance of commonly used and easy-to-implement LR schedules.
We evaluate the k-decay method on the CIFAR and ImageNet datasets with different neural networks.
The accuracy has been improved by 1.08% on the CIFAR-10 dataset and by 2.07% on the CIFAR-100 dataset.
arXiv Detail & Related papers (2020-04-13T12:58:45Z)
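For the k-decay entry above, the following is a hedged sketch of one common form of the idea: a polynomial decay schedule in which the iteration ratio t/T is raised to an extra power k before the decay is applied. The exact formula, parameter names, and default values here are assumptions for illustration, not verified against the kDecay paper.

```python
def k_decay_poly_lr(t, T, lr0=0.1, lr_end=0.0, N=2, k=3):
    # Assumed k-decay-style polynomial schedule (illustrative only):
    # k = 1 recovers ordinary polynomial decay; larger k keeps the LR
    # higher for longer and then decays it more sharply near the end.
    ratio = (t ** k) / (T ** k)
    return (lr0 - lr_end) * (1.0 - ratio) ** N + lr_end
```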
This list is automatically generated from the titles and abstracts of the papers on this site.