Evolving Learning Rate Optimizers for Deep Neural Networks
- URL: http://arxiv.org/abs/2103.12623v1
- Date: Tue, 23 Mar 2021 15:23:57 GMT
- Title: Evolving Learning Rate Optimizers for Deep Neural Networks
- Authors: Pedro Carvalho, Nuno Lourenço, Penousal Machado
- Abstract summary: We propose a framework called AutoLR to automatically design Learning Rate Optimizers.
The system evolved an optimizer, ADES, that appears to be novel and innovative since, to the best of our knowledge, it has a structure that differs from state of the art methods.
- Score: 2.6498598849144472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Neural Networks (ANNs) became popular due to their successful
application to difficult problems such as image and speech recognition. However, when
practitioners want to design an ANN they need to undergo a laborious process of
selecting a set of parameters and a topology. Currently, there are several
state-of-the-art methods that allow for the automatic selection of some of
these aspects. Learning Rate optimizers are a set of such techniques that
search for good values of learning rates. Whilst these techniques are effective
and have yielded good results over the years, they are general solutions, i.e.
they do not consider the characteristics of a specific network.
We propose a framework called AutoLR to automatically design Learning Rate
Optimizers. Two versions of the system are detailed. The first one, Dynamic
AutoLR, evolves static and dynamic learning rate optimizers based on the
current epoch and the previous learning rate. The second version, Adaptive
AutoLR, evolves adaptive optimizers that can fine-tune the learning rate for
each network weight, which makes them generally more effective. The results are
competitive with the best state of the art methods, even outperforming them in
some scenarios. Furthermore, the system evolved an optimizer, ADES, that
appears to be novel and innovative since, to the best of our knowledge, it has
a structure that differs from state of the art methods.
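To make the two approaches concrete, the following is a minimal Python/NumPy sketch of the optimizer families the abstract describes: a dynamic policy that maps the current epoch and the previous learning rate to a new learning rate, and an adaptive rule that keeps per-weight state. The function names and the specific update rules are illustrative placeholders, not the evolved solutions (such as ADES) reported in the paper.

    import numpy as np

    def dynamic_lr_policy(epoch, prev_lr, drop_every=10, factor=0.5, floor=1e-5):
        # Dynamic AutoLR evolves functions of this shape: the next learning rate
        # depends only on the current epoch and the previous learning rate.
        # Placeholder rule: halve the rate every `drop_every` epochs, with a floor.
        if epoch > 0 and epoch % drop_every == 0:
            prev_lr *= factor
        return max(prev_lr, floor)

    def adaptive_update(weights, grads, state, lr=1e-3, beta=0.9):
        # Adaptive AutoLR evolves per-weight update rules, so each weight gets an
        # effective step size derived from its own gradient statistics. This
        # sign-momentum rule is only an illustration, not the evolved ADES optimizer.
        state["m"] = beta * state.get("m", np.zeros_like(grads)) + (1.0 - beta) * grads
        return weights - lr * np.sign(state["m"])

    # Usage: a per-epoch schedule for the dynamic case, a per-step call for the adaptive one.
    lr = 0.1
    for epoch in range(20):
        lr = dynamic_lr_policy(epoch, lr)

    weights, grads, state = np.ones(4), np.full(4, 0.5), {}
    weights = adaptive_update(weights, grads, state)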
Related papers
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings can be easily changed by better training networks in benchmarks.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Optimizing Neural Networks through Activation Function Discovery and
Automatic Weight Initialization [0.5076419064097734]
This dissertation introduces techniques for discovering more powerful activation functions.
It provides new perspectives on neural network optimization.
The dissertation thus makes concrete progress towards fully automatic machine learning in the future.
arXiv Detail & Related papers (2023-04-06T21:01:00Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates; a minimal sketch of this interface appears after this list of related papers.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Neural Architecture Search for Speech Emotion Recognition [72.1966266171951]
We propose to apply neural architecture search (NAS) techniques to automatically configure the SER models.
We show that NAS can improve SER performance (54.89% to 56.28%) while maintaining model parameter sizes.
arXiv Detail & Related papers (2022-03-31T10:16:10Z) - Interleaving Learning, with Application to Neural Architecture Search [12.317568257671427]
We propose a novel machine learning framework referred to as interleaving learning (IL).
In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion.
We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2021-03-12T00:54:22Z) - Reverse engineering learned optimizers reveals known and novel
mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z) - Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural
Architecture Search [60.965024145243596]
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance.
To alleviate this problem, we present a simple yet effective architecture distillation method.
We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
arXiv Detail & Related papers (2020-10-29T17:55:05Z) - Tasks, stability, architecture, and compute: Training more effective
learned optimizers, and using them to train themselves [53.37905268850274]
We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization.
Most learned optimizers have been trained on only a single task, or a small number of tasks.
We train ours on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks.
arXiv Detail & Related papers (2020-09-23T16:35:09Z) - AutoLR: An Evolutionary Approach to Learning Rate Policies [2.3577368017815705]
This work presents AutoLR, a framework that evolves Learning Rate Schedulers for a specific Neural Network Architecture.
Results show that training performed using certain evolved policies is more efficient than the established baseline.
arXiv Detail & Related papers (2020-07-08T16:03:44Z)
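As referenced in the VeLO entry above, a learned optimizer is itself a small neural network that ingests gradients and outputs parameter updates. The toy Python/NumPy sketch below illustrates only that interface; the architecture, input features, and large-scale meta-training used by VeLO and the other learned-optimizer papers listed here are far more elaborate, and every name in the sketch is hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    class TinyLearnedOptimizer:
        # A toy "learned optimizer": a two-layer MLP applied per parameter that maps
        # simple features (gradient, momentum) to an update. In practice the MLP
        # weights would be meta-trained across many tasks; here they are random.
        def __init__(self, hidden=8):
            self.w1 = rng.normal(scale=0.1, size=(2, hidden))
            self.w2 = rng.normal(scale=0.1, size=(hidden, 1))

        def step(self, params, grads, momentum, beta=0.9):
            momentum = beta * momentum + (1.0 - beta) * grads
            feats = np.stack([grads, momentum], axis=-1)   # per-parameter features, shape (..., 2)
            hidden = np.tanh(feats @ self.w1)              # shape (..., hidden)
            update = (hidden @ self.w2)[..., 0]            # one update per parameter
            return params - 0.01 * update, momentum

    # Usage: replace a hand-designed update rule with the learned network's output.
    opt = TinyLearnedOptimizer()
    params, momentum = np.ones(5), np.zeros(5)
    params, momentum = opt.step(params, grads=np.full(5, 0.2), momentum=momentum)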