Tasks, stability, architecture, and compute: Training more effective
learned optimizers, and using them to train themselves
- URL: http://arxiv.org/abs/2009.11243v1
- Date: Wed, 23 Sep 2020 16:35:09 GMT
- Title: Tasks, stability, architecture, and compute: Training more effective
learned optimizers, and using them to train themselves
- Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha
Sohl-Dickstein
- Abstract summary: We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization.
Most learned optimizers have been trained on only a single task, or a small number of tasks.
We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks.
- Score: 53.37905268850274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much as replacing hand-designed features with learned functions has
revolutionized how we solve perceptual tasks, we believe learned algorithms
will transform how we train models. In this work we focus on general-purpose
learned optimizers capable of training a wide variety of problems with no
user-specified hyperparameters. We introduce a new, neural network
parameterized, hierarchical optimizer with access to additional features such
as validation loss to enable automatic regularization. Most learned optimizers
have been trained on only a single task, or a small number of tasks. We train
our optimizers on thousands of tasks, making use of orders of magnitude more
compute, resulting in optimizers that generalize better to unseen tasks. The
learned optimizers not only perform well, but learn behaviors that are distinct
from existing first order optimizers. For instance, they generate update steps
that have implicit regularization and adapt as the problem hyperparameters
(e.g. batch size) or architecture (e.g. neural network width) change. Finally,
these learned optimizers show evidence of being useful for out of distribution
tasks such as training themselves from scratch.
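As a concrete illustration of the idea in the abstract, a learned optimizer replaces a hand-designed update rule with a small neural network that maps per-parameter features (such as the gradient and momentum) to parameter updates. The sketch below is a minimal, hypothetical NumPy version, not the paper's hierarchical architecture: the MLP's output layer is zero-initialized, so the learned correction starts at zero and the update reduces to plain SGD until the network is meta-trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-parameter MLP standing in for a learned optimizer.
# In the paper these weights are meta-trained over thousands of tasks;
# here the output layer is zero-initialized, so the learned correction
# is exactly zero and the update reduces to SGD.
W1 = rng.normal(scale=0.1, size=(2, 8))
b1 = np.zeros(8)
W2 = np.zeros((8, 1))
b2 = np.zeros(1)

def learned_update(grad, mom, lr=0.1):
    """Map per-parameter features (gradient, momentum) to an update."""
    feats = np.stack([grad, mom], axis=-1)   # (n_params, 2)
    h = np.tanh(feats @ W1 + b1)             # (n_params, 8)
    correction = (h @ W2 + b2).squeeze(-1)   # learned residual term, 0 at init
    return -lr * grad + correction

# Toy task: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
x = rng.normal(size=5)
mom = np.zeros_like(x)
for _ in range(50):
    grad = x
    mom = 0.9 * mom + grad
    x = x + learned_update(grad, mom)

final_loss = float(0.5 * x @ x)  # near zero after 50 steps
```

Meta-training would then optimize W1, b1, W2, b2 themselves, by differentiating (or using evolution strategies on) the loss achieved after many inner steps, averaged over a distribution of tasks.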
Related papers
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training.
We show that, although only trained on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizers, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Practical tradeoffs between memory, compute, and performance in learned
optimizers [46.04132441790654]
We identify and quantify the memory, compute, and performance trade-offs for many learned and hand-designed optimizer features.
We leverage our analysis to construct a learned optimizer that is both faster and more efficient than previous work.
arXiv Detail & Related papers (2022-03-22T16:36:36Z) - Training Learned Optimizers with Randomly Initialized Learned Optimizers [49.67678615506608]
We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion.
A form of population based training is used to orchestrate this self-training.
We believe feedback loops of this type will be important and powerful in the future of machine learning.
arXiv Detail & Related papers (2021-01-14T19:07:17Z) - Reverse engineering learned optimizers reveals known and novel
mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z)
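The self-training loop described in the papers above relies on population based training: a population of optimizers is scored on training tasks, the worst performers are replaced with perturbed copies of the best, and the cycle repeats. The sketch below is a heavily simplified, hypothetical version in which each population member is a single scalar learning rate standing in for the weights of a full learned optimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_and_score(lr, steps=20):
    """Train on a toy quadratic with a one-knob 'optimizer'; lower is better."""
    x = np.ones(5)
    for _ in range(steps):
        x = x - lr * x  # gradient of 0.5*||x||^2 is x
    return 0.5 * float(x @ x)

# Population of optimizer parameters: here each member is just a
# learning rate, standing in for the weights of a learned optimizer.
pop = rng.uniform(0.001, 0.05, size=8)
init_scores = np.array([train_and_score(lr) for lr in pop])

for _ in range(10):
    scores = np.array([train_and_score(lr) for lr in pop])
    order = np.argsort(scores)                       # ascending loss; best first
    pop[order[4:]] = pop[order[:4]]                  # exploit: bottom half copies top half
    pop[order[4:]] *= rng.uniform(0.8, 1.2, size=4)  # explore: perturb the copies
    pop = np.clip(pop, 1e-4, 1.0)

final_scores = np.array([train_and_score(lr) for lr in pop])
# The elite half is carried over unchanged each generation, so the
# best score can only improve or stay the same.
```

In the self-training setting the scored "task" is itself the training of a learned optimizer, which is what closes the feedback loop the last two papers describe.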
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.