AdaS: Adaptive Scheduling of Stochastic Gradients
- URL: http://arxiv.org/abs/2006.06587v1
- Date: Thu, 11 Jun 2020 16:36:31 GMT
- Title: AdaS: Adaptive Scheduling of Stochastic Gradients
- Authors: Mahdi S. Hosseini and Konstantinos N. Plataniotis
- Abstract summary: We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
- Score: 50.80697760166045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The choice of step-size used in Stochastic Gradient Descent (SGD)
optimization is empirically selected in most training procedures. Moreover, the
use of scheduled learning techniques such as Step-Decaying, Cyclical-Learning,
and Warmup to tune the step-size requires extensive practical experience
(offering limited insight into how the parameters update) and is not
consistent across applications. This work attempts to answer a question of
interest to both researchers and practitioners, namely "how much
knowledge is gained in iterative training of deep neural networks?" Answering
this question introduces two useful metrics derived from the singular values of
the low-rank factorization of convolution layers in deep neural networks. We
introduce the notions of "knowledge gain" and "mapping
condition" and propose a new algorithm called Adaptive Scheduling (AdaS) that
utilizes these derived metrics to adapt the SGD learning rate proportionally to
the rate of change in knowledge gain over successive iterations.
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a)
faster convergence and superior generalization over existing adaptive learning
methods; and (b) lack of dependence on a validation set to determine when to
stop training. Code is available at
https://github.com/mahdihosseini/AdaS.
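
To make the mechanism concrete, below is a minimal PyTorch sketch of the two metrics and the learning-rate update as described in the abstract. The rank-truncation rule (`energy`), the normalization of the gain, and the coefficients `beta`/`zeta` are assumptions made for illustration, not the paper's exact formulation; the linked repository holds the reference implementation.

```python
import torch

def layer_metrics(conv_weight: torch.Tensor, energy: float = 0.99):
    """Knowledge gain and mapping condition from the singular values of an
    unfolded convolution kernel of shape (out_ch, in_ch, kh, kw).
    The `energy` truncation threshold is an illustrative assumption."""
    mat = conv_weight.detach().flatten(start_dim=1)  # (out_ch, in_ch*kh*kw)
    s = torch.linalg.svdvals(mat)                    # descending order
    # Low-rank factorization: keep the smallest rank that captures
    # `energy` of the total spectral energy.
    frac = torch.cumsum(s**2, dim=0) / (s**2).sum()
    r = int((frac < energy).sum().item()) + 1
    s_r = s[:r]
    gain = (s_r / s[0]).sum().item() / len(s)  # "knowledge gain" (assumed normalization)
    cond = (s_r[0] / s_r[-1]).item()           # "mapping condition": condition number
    return gain, cond

def adas_lr(prev_lr: float, gain_t: float, gain_prev: float,
            beta: float = 0.8, zeta: float = 1.0) -> float:
    """Assumed update: the SGD step size follows the rate of change in
    knowledge gain over successive iterations."""
    return beta * prev_lr + zeta * max(gain_t - gain_prev, 0.0)
```

In use, one would presumably evaluate layer_metrics per convolution block at the end of each epoch and feed consecutive gain values into adas_lr to set that block's SGD step size.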
Related papers
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while using zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Hyper-Learning for Gradient-Based Batch Size Adaptation [2.944323057176686]
Scheduling the batch size to increase during training is an effective strategy to control gradient noise when training deep neural networks.
We introduce Arbiter, a new hyper-optimization algorithm that performs batch size adaptation via learnable schedules.
We demonstrate Arbiter's effectiveness in several illustrative experiments.
arXiv Detail & Related papers (2022-05-17T11:01:14Z)
- Natural continual learning: success is a journey, not (just) a destination [9.462808515258464]
Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent.
Our method outperforms both standard weight regularization techniques and projection-based approaches when applied to continual learning problems in RNNs.
The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
arXiv Detail & Related papers (2021-06-15T12:24:53Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction of a parameter's past updates is aligned with the direction of the current gradient (a toy sketch of this rule appears after this list).
Our method outperforms previous adaptive learning-rate algorithms in terms of training speed and test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of such extrapolation schemes can be covered by a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Tune smarter not harder: A principled approach to tuning learning rates for shallow nets [13.203765985718201]
A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
arXiv Detail & Related papers (2020-03-22T09:38:35Z)
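
As referenced in the AdaRem entry above, its direction-alignment rule can be illustrated with a toy sketch. The state layout, the exponential moving average of past updates, and the coefficients below are illustrative assumptions, not the paper's formulation:

```python
import torch

def adarem_step(param: torch.Tensor, grad: torch.Tensor, state: dict,
                base_lr: float = 0.01, beta: float = 0.9, eta: float = 0.5):
    """Toy parameter-wise step: enlarge the step where the current gradient
    agrees with the running direction of past updates, shrink it where they
    disagree. All names and coefficients are illustrative assumptions."""
    m = state.setdefault("dir", torch.zeros_like(param))
    # Elementwise agreement in {-1, 0, +1} between the past update
    # direction and the current descent direction (-grad).
    align = torch.sign(m) * torch.sign(-grad)
    lr = base_lr * (1.0 + eta * align)               # parameter-wise learning rate
    update = -lr * grad
    param.data += update
    state["dir"] = beta * m + (1.0 - beta) * update  # EMA of past updates
```

Using sign agreement rather than raw magnitudes keeps the rule bounded: each parameter's step size stays within [base_lr*(1-eta), base_lr*(1+eta)].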
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.