Reverse engineering learned optimizers reveals known and novel
mechanisms
- URL: http://arxiv.org/abs/2011.02159v2
- Date: Tue, 7 Dec 2021 19:54:09 GMT
- Title: Reverse engineering learned optimizers reveals known and novel
mechanisms
- Authors: Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha
Sohl-Dickstein
- Abstract summary: Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
- Score: 50.50540910474342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned optimizers are algorithms that can themselves be trained to solve
optimization problems. In contrast to baseline optimizers (such as momentum or
Adam) that use simple update rules derived from theoretical principles, learned
optimizers use flexible, high-dimensional, nonlinear parameterizations.
Although this can lead to better performance in certain settings, their inner
workings remain a mystery. How is a learned optimizer able to outperform a well
tuned baseline? Has it learned a sophisticated combination of existing
optimization techniques, or is it implementing completely new behavior? In this
work, we address these questions by careful analysis and visualization of
learned optimizers. We study learned optimizers trained from scratch on three
disparate tasks, and discover that they have learned interpretable mechanisms,
including: momentum, gradient clipping, learning rate schedules, and a new form
of learning rate adaptation. Moreover, we show how the dynamics of learned
optimizers enables these behaviors. Our results help elucidate the previously
murky understanding of how learned optimizers work, and establish tools for
interpreting future learned optimizers.
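As a concrete illustration of the contrast the abstract draws, the sketch below places a hand-designed momentum update next to a small MLP-parameterized update rule of the general kind the paper studies. This is a minimal sketch, not the authors' implementation: the feature choice (gradient plus a momentum-like accumulator), the two-layer MLP, and the name `opt_params` are illustrative assumptions.

```python
import numpy as np

def momentum_update(grad, state, lr=0.1, beta=0.9):
    """Baseline optimizer: a simple update rule derived from theory."""
    state = beta * state + grad          # exponential moving average of gradients
    return -lr * state, state

def learned_update(grad, state, opt_params, beta=0.9):
    """Learned optimizer: a flexible, nonlinear, per-parameter update rule.

    A tiny MLP ingests per-parameter features and directly outputs the
    update; in a real learned optimizer its weights would be meta-trained
    over a distribution of optimization tasks.
    """
    state = beta * state + grad
    features = np.stack([grad, state], axis=-1)   # (dim, 2) input features
    W1, b1, W2, b2 = opt_params                   # MLP weights (meta-learned)
    hidden = np.tanh(features @ W1 + b1)          # (dim, hidden)
    return (hidden @ W2 + b2)[:, 0], state        # (dim,) parameter update
```

The paper's analysis probes trained update functions of this kind with controlled inputs and visualizes their dynamics; behaviors such as momentum, gradient clipping, and learning rate adaptation then show up as recognizable structure in the learned function.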
Related papers
- Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning [69.95292905263393]
We show that gradient-based optimization and large language models (LLMs) are complementary to each other, suggesting a collaborative optimization approach.
Our code is released at https://www.guozix.com/guozix/LLM-catalyst.
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - Investigation into the Training Dynamics of Learned Optimizers [0.0]
We look at the concept of learned optimizers as a way to accelerate the optimization process by replacing traditional, hand-crafted algorithms with meta-learned functions.
Our work examines their optimization dynamics from the perspective of network architecture symmetries and parameter updates.
We identify several key insights that demonstrate how each approach can benefit from the strengths of the other.
arXiv Detail & Related papers (2023-12-12T11:18:43Z) - Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training.
We show that, although only trained on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z) - Learning to Optimize with Dynamic Mode Decomposition [0.0]
We show how to utilize the dynamic mode decomposition method for extracting informative features about optimization dynamics.
We show that our learned optimizer generalizes much better to unseen optimization problems (see the DMD sketch after this list).
arXiv Detail & Related papers (2022-11-29T14:55:59Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated training and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - A Closer Look at Learned Optimization: Stability, Robustness, and
Inductive Biases [44.01339030872185]
Blackbox learned optimizers often struggle with stability and generalization when applied to tasks unlike those in their meta-training set.
We investigate the inductive biases and stability properties of optimization algorithms, and apply the resulting insights to designing inductive biases for blackbox optimizers.
We apply the resulting learned optimizer to a variety of neural network training tasks, where it outperforms the current state-of-the-art learned optimizer.
arXiv Detail & Related papers (2022-09-22T17:47:21Z) - Training Learned Optimizers with Randomly Initialized Learned Optimizers [49.67678615506608]
We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion.
A form of population based training is used to orchestrate this self-training.
We believe feedback loops of this type will be important and powerful in the future of machine learning.
arXiv Detail & Related papers (2021-01-14T19:07:17Z) - Tasks, stability, architecture, and compute: Training more effective
learned optimizers, and using them to train themselves [53.37905268850274]
We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization.
Most learned optimizers have been trained on only a single task, or a small number of tasks.
We train ours on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks.
arXiv Detail & Related papers (2020-09-23T16:35:09Z)
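For the dynamic mode decomposition entry above, the following is a minimal sketch of exact DMD applied to an optimization trajectory, under the assumption that the trajectory is a (steps, dim) array of parameter snapshots; the rank truncation and the `dmd_modes` interface are illustrative, not that paper's exact pipeline.

```python
import numpy as np

def dmd_modes(trajectory, rank=5):
    """Exact dynamic mode decomposition of a (steps, dim) trajectory.

    Fits a linear operator A with x_{t+1} ~= A x_t and returns its leading
    eigenvalues and modes. Eigenvalues near 1 indicate slow, persistent
    dynamics; complex-conjugate pairs indicate oscillation (momentum-like
    behavior); magnitudes above 1 indicate growth.
    """
    X = trajectory[:-1].T                       # snapshots x_0 .. x_{m-1}
    Y = trajectory[1:].T                        # shifted snapshots x_1 .. x_m
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = min(rank, len(s))
    U_r, s_r, V_r = U[:, :r], s[:r], Vh[:r].T
    A_tilde = (U_r.T @ Y @ V_r) / s_r           # reduced operator U_r^T A U_r
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((Y @ V_r) / s_r) @ W               # exact DMD modes
    return eigvals, modes
```

The eigenvalue spectrum gives a compact, interpretable summary of the optimizer's dynamics, which is the kind of informative feature that paper extracts.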
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.