LRTuner: A Learning Rate Tuner for Deep Neural Networks
- URL: http://arxiv.org/abs/2105.14526v1
- Date: Sun, 30 May 2021 13:06:26 GMT
- Title: LRTuner: A Learning Rate Tuner for Deep Neural Networks
- Authors: Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian
Sivathanu
- Abstract summary: The choice of learning rate schedule determines the computational cost of getting close to a minimum, how close you actually get to it, and, most importantly, the kind of local minimum (wide/narrow) attained.
Current systems employ hand-tuned learning rate schedules, which are painstakingly tuned for each network and dataset.
We present LRTuner, a method for tuning the learning rate schedule as training proceeds.
- Score: 10.913790890826785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One very important hyperparameter for training deep neural networks is the
learning rate schedule of the optimizer. The choice of learning rate schedule
determines the computational cost of getting close to a minimum, how close you
actually get to it, and, most importantly, the kind of local minimum
(wide/narrow) attained. The kind of minimum attained has a significant impact on
the generalization accuracy of the network. Current systems employ hand-tuned
learning rate schedules, which are painstakingly tuned for each network and
dataset. Given that the state space of schedules is huge, finding a
satisfactory learning rate schedule can be very time consuming. In this paper,
we present LRTuner, a method for tuning the learning rate as training proceeds.
Our method works with any optimizer, and we demonstrate results on SGD with
Momentum, and Adam optimizers.
We extensively evaluate LRTuner on multiple datasets, models, and across
optimizers. We compare favorably against standard learning rate schedules for
the given dataset and models, including ImageNet on Resnet-50, Cifar-10 on
Resnet-18, and SQuAD fine-tuning on BERT. For example on ImageNet with
Resnet-50, LRTuner shows up to 0.2% absolute gains in test accuracy compared to
the hand-tuned baseline schedule. Moreover, LRTuner can achieve the same
accuracy as the baseline schedule in 29% fewer optimization steps.
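The abstract does not spell out the tuning rule, so the following is only a minimal sketch of the general idea it describes: adjusting the learning rate of an existing optimizer (here PyTorch SGD with momentum on a toy model) as training proceeds, rather than fixing a schedule in advance. The plateau-based `adjust_lr` rule and the `WINDOW`/`FACTOR` knobs are illustrative assumptions, not LRTuner's actual algorithm.

```python
# Minimal sketch of the idea described above (not the LRTuner algorithm):
# adapt the learning rate of an existing optimizer during training, based on
# the recent loss trend, instead of fixing a schedule in advance.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)  # toy model standing in for ResNet-50 / BERT
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.MSELoss()

x = torch.randn(256, 10)
y = x.sum(dim=1, keepdim=True)

recent = []               # running record of training losses
WINDOW, FACTOR = 20, 0.7  # illustrative knobs, not values from the paper

def adjust_lr(opt, losses):
    """If the loss has stopped improving over the last WINDOW steps,
    shrink the learning rate of every parameter group."""
    if len(losses) < 2 * WINDOW:
        return
    older = sum(losses[-2 * WINDOW:-WINDOW]) / WINDOW
    newer = sum(losses[-WINDOW:]) / WINDOW
    if newer >= older:                 # no recent progress: decay the LR
        for group in opt.param_groups:
            group["lr"] *= FACTOR
        losses.clear()                 # start a fresh observation window

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    recent.append(loss.item())
    adjust_lr(optimizer, recent)       # tune the LR as training proceeds
```

Because the rule only touches `optimizer.param_groups`, the same hook would apply unchanged to Adam, consistent with the abstract's claim that the method works with any optimizer.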
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions that are accessible via our training procedure, including the gradient-based optimizer and regularizers, which limits this flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Temperature Balancing, Layer-wise Weight Analysis, and Neural Network
Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking
Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Mind the (optimality) Gap: A Gap-Aware Learning Rate Scheduler for
Adversarial Nets [3.8073142980733]
Adversarial nets have proved to be powerful in various domains, including generative modeling (GANs).
In this paper, we design a novel learning rate scheduler that dynamically adapts the learning rate of the adversary to maintain the right balance (see the gap-aware scheduler sketch after this list).
We run large-scale experiments to study the effectiveness of the scheduler on two popular applications: GANs for image generation and adversarial nets for domain adaptation.
arXiv Detail & Related papers (2023-01-31T20:36:40Z) - Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - Efficient deep learning models for land cover image classification [0.29748898344267777]
This work experiments with the BigEarthNet dataset for land use land cover (LULC) image classification.
We benchmark different state-of-the-art models, including Convolutional Neural Networks, Multi-Layer Perceptrons, Visual Transformers, EfficientNets, and Wide Residual Networks (WRN).
Our proposed lightweight model has an order of magnitude fewer trainable parameters, achieves a 4.5% higher averaged F-score across all 19 LULC classes, and trains two times faster than the state-of-the-art ResNet50 model we use as a baseline.
arXiv Detail & Related papers (2021-11-18T00:03:14Z) - Training Aware Sigmoidal Optimizer [2.99368851209995]
Loss landscapes of deep neural networks present many more saddle points than local minima.
We propose the Training Aware Sigmoidal Optimizer (TASO), which consists of a two-phase automated learning rate schedule (see the two-phase schedule sketch after this list).
We compare the proposed approach with commonly used adaptive learning rate methods such as Adam, RMSProp, and Adagrad.
arXiv Detail & Related papers (2021-02-17T12:00:46Z) - Weight Update Skipping: Reducing Training Time for Artificial Neural
Networks [0.30458514384586394]
We propose a new training methodology for ANNs that exploits the observation that accuracy improvements show temporal variation: in some time windows accuracy barely improves, so weight updates can be skipped.
During such windows we keep updating the biases, which ensures the network still trains and avoids overfitting (see the weight-update-skipping sketch after this list).
This training approach achieves virtually the same accuracy at considerably less computational cost, and thus a lower training time.
arXiv Detail & Related papers (2020-12-05T15:12:10Z) - RIFLE: Backpropagation in Depth for Deep Transfer Learning through
Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) from a pre-trained model helps transfer knowledge learned on larger datasets to the target task.
We propose RIFLE, a strategy that deepens backpropagation in transfer learning settings by re-initializing the fully-connected layer during fine-tuning (see the RIFLE sketch after this list).
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z) - SASL: Saliency-Adaptive Sparsity Learning for Neural Network
Acceleration [20.92912642901645]
We propose a Saliency-Adaptive Sparsity Learning (SASL) approach for further optimization.
Our method reduces the FLOPs of ResNet-50 by 49.7% with negligible accuracy degradation (0.39% top-1 and 0.05% top-5).
arXiv Detail & Related papers (2020-03-12T16:49:37Z)
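For the gap-aware scheduler entry above, the paper's exact update rule is not reproduced here; the sketch below only illustrates the general mechanism of scaling the adversary's (discriminator's) learning rate by how far its loss sits from a balanced target. The target value `log(4)` (the binary cross-entropy discriminator loss at the ideal GAN equilibrium), the exponential scaling, and the clipping range are assumptions made for illustration.

```python
# Illustrative only: scale the adversary's (discriminator's) learning rate by
# how far its loss is from a "balanced" target, so neither player overpowers
# the other. The target and the scaling rule are assumptions, not the paper's.
import math

BASE_LR = 2e-4          # assumed base learning rate for the discriminator
TARGET = math.log(4.0)  # BCE discriminator loss at the ideal GAN equilibrium

def adversary_lr(d_loss: float) -> float:
    """Hypothetical gap-aware rule: if the discriminator is too strong
    (loss below target), slow it down; if too weak, speed it up."""
    gap = d_loss - TARGET
    scale = math.exp(gap)               # > 1 when D is losing, < 1 when D dominates
    scale = min(max(scale, 0.1), 10.0)  # clip to keep training stable
    return BASE_LR * scale

# Applied before each discriminator step, e.g.:
#   for group in d_optimizer.param_groups:
#       group["lr"] = adversary_lr(last_d_loss)
```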
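For the TASO entry, the sketch below shows what a two-phase schedule of the kind described (a constant phase followed by a sigmoidal decay) can look like in code; the functional form and all constants are assumptions, not TASO's published values.

```python
# Illustrative two-phase schedule: hold the learning rate constant, then decay
# it along a sigmoid. The constants are assumptions, not TASO's published values.
import math

def two_phase_sigmoidal_lr(step, total_steps, lr_max=0.1, lr_min=1e-4,
                           switch=0.3, sharpness=12.0):
    """Phase 1 (first `switch` fraction of training): constant lr_max.
    Phase 2: sigmoidal decay from roughly lr_max down to lr_min."""
    if step < switch * total_steps:
        return lr_max
    # progress through phase 2, mapped to [0, 1]
    t = (step - switch * total_steps) / ((1.0 - switch) * total_steps)
    decay = 1.0 / (1.0 + math.exp(sharpness * (t - 0.5)))  # ~1 -> ~0 as t: 0 -> 1
    return lr_min + (lr_max - lr_min) * decay

# e.g. set the optimizer's LR from the schedule at every step:
#   for group in optimizer.param_groups:
#       group["lr"] = two_phase_sigmoidal_lr(step, total_steps)
```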
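For the weight-update-skipping entry, the sketch below shows the mechanism as summarized: during selected windows only the biases receive gradient updates while the weights are frozen. How the paper selects those windows is not reproduced; the fixed alternating `in_skip_window` rule here is an assumption.

```python
# Illustrative weight-update skipping: inside a skip window only the biases are
# updated; outside it, all parameters are trained as usual.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()
x, y = torch.randn(128, 10), torch.randn(128, 1)

def in_skip_window(epoch):
    # Assumed trigger: the paper derives the windows from the accuracy trend;
    # here we simply alternate 5-epoch blocks of full and bias-only training.
    return (epoch // 5) % 2 == 1

for epoch in range(40):
    skip = in_skip_window(epoch)
    for name, param in model.named_parameters():
        # during a skip window, freeze the weights but keep the biases trainable
        param.requires_grad_(not skip or name.endswith("bias"))
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```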
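For the RIFLE entry, the sketch below shows only the core operation named in its title, re-initializing the fully-connected head during fine-tuning; the re-initialization period, the stand-in backbone, and leaving the optimizer's momentum state untouched are simplifying assumptions.

```python
# Illustrative RIFLE-style fine-tuning loop: periodically re-initialize the
# fully-connected head so deeper layers keep receiving meaningful gradient signal.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU())  # stand-in for a pre-trained CNN
head = nn.Linear(128, 10)                                # the fully-connected layer
model = nn.Sequential(backbone, head)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 64)                 # features standing in for images
y = torch.randint(0, 10, (256,))
REINIT_EVERY = 10                        # assumed period, in epochs

for epoch in range(30):
    if epoch > 0 and epoch % REINIT_EVERY == 0:
        head.reset_parameters()          # re-initialize the FC layer in place
        # (the optimizer's momentum buffers are left untouched in this sketch)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```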
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.