VeLO: Training Versatile Learned Optimizers by Scaling Up
- URL: http://arxiv.org/abs/2211.09760v1
- Date: Thu, 17 Nov 2022 18:39:07 GMT
- Title: VeLO: Training Versatile Learned Optimizers by Scaling Up
- Authors: Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas
Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts,
Jascha Sohl-Dickstein
- Abstract summary: We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
- Score: 67.90237498659397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep learning models have replaced hand-designed features across many
domains, these models are still trained with hand-designed optimizers. In this
work, we leverage the same scaling approach behind the success of deep learning
to learn versatile optimizers. We train an optimizer for deep learning which is
itself a small neural network that ingests gradients and outputs parameter
updates. Meta-trained with approximately four thousand TPU-months of compute on
a wide variety of optimization tasks, our optimizer not only exhibits
compelling performance, but optimizes in interesting and unexpected ways. It
requires no hyperparameter tuning, instead automatically adapting to the
specifics of the problem being optimized. We open source our learned optimizer,
meta-training code, the associated train and test data, and an extensive
optimizer benchmark suite with baselines at velo-code.github.io.
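The abstract describes an optimizer that is itself a small neural network ingesting gradients and emitting parameter updates. As a rough illustration only (not VeLO's actual architecture, which is far larger and meta-trained on thousands of tasks; the feature choice, layer sizes, and `learned_update` function here are invented for the sketch), a per-parameter learned optimizer might look like:

```python
import numpy as np

def init_mlp(rng, in_dim=2, hidden=4):
    """Tiny per-parameter MLP: input features -> scalar update (illustrative only)."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, 1)),
        "b2": np.zeros(1),
    }

def learned_update(opt_params, grad, momentum):
    # Per-parameter features: the current gradient and a momentum accumulator.
    feats = np.stack([grad, momentum], axis=-1)           # shape (..., 2)
    h = np.tanh(feats @ opt_params["w1"] + opt_params["b1"])
    return (h @ opt_params["w2"] + opt_params["b2"])[..., 0]

rng = np.random.default_rng(0)
opt = init_mlp(rng)

# Apply the (untrained) learned optimizer to a quadratic loss f(x) = 0.5 * ||x||^2.
x = np.ones(3)
m = np.zeros(3)
for _ in range(5):
    g = x                     # gradient of the quadratic at x
    m = 0.9 * m + 0.1 * g     # momentum feature
    x = x + learned_update(opt, g, m)
```

In the real system, the MLP's weights are the object of meta-training: they are optimized over a distribution of tasks so that the induced updates minimize training loss, rather than being hand-designed like Adam's update rule.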
Related papers
- Learning Versatile Optimizers on a Compute Diet [20.69804303768643]
Key elements in learned optimizer architectures and meta-training procedures can lead to strong meta-generalization.
We propose evaluation metrics to reliably assess the quantitative performance of an optimizer at scale on a set of evaluation tasks.
Our proposed approach, Celo, makes a significant leap in improving the meta-generalization performance of learned optimizers.
arXiv Detail & Related papers (2025-01-22T06:10:27Z)
- MADA: Meta-Adaptive Optimizers through Hyper-Gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
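The max-versus-average distinction can be shown schematically. This is a toy contrast of the two normalizers, not the MADA implementation (`v_history` here is an invented sequence of second-moment estimates): AMSGrad normalizes by the running maximum of those estimates, while AVGrad-style averaging replaces it with a running mean.

```python
import numpy as np

def amsgrad_vhat(v_history):
    """AMSGrad second-moment normalizer: the running maximum over steps."""
    return np.maximum.accumulate(v_history, axis=0)

def avgrad_vhat(v_history):
    """AVGrad (as summarized above): replace the running max with a running mean."""
    t = np.arange(1, len(v_history) + 1)[:, None]
    return np.cumsum(v_history, axis=0) / t

# Toy sequence of second-moment estimates for a single parameter.
v = np.array([[4.0], [1.0], [9.0]])
print(amsgrad_vhat(v).ravel())  # [4. 4. 9.]  -- never decreases
print(avgrad_vhat(v).ravel())   # running means: can decrease, and is smooth in v
```

The averaging form is differentiable everywhere in the past estimates, which is plausibly why the summary calls it more suitable for hyper-gradient optimization; the max passes gradient only through the largest element.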
arXiv Detail & Related papers (2024-01-17T00:16:46Z)
- Learning to Optimize Quasi-Newton Methods [22.504971951262004]
This paper introduces a novel machine learning optimizer called LODO, which tries to meta-learn the best preconditioner online during optimization.
Unlike other L2O methods, LODO does not require any meta-training on a training task distribution.
We show that our learned preconditioner approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians.
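As a minimal illustration of why approximating the inverse Hessian matters (this is plain preconditioned gradient descent on a toy quadratic, not LODO itself, which learns the preconditioner online): with the exact inverse-Hessian preconditioner, a single step lands on the optimum regardless of how ill-conditioned the problem is.

```python
import numpy as np

# Quadratic loss f(x) = 0.5 * x^T H x with a known, ill-conditioned Hessian H.
H = np.array([[10.0, 0.0],
              [0.0,  1.0]])
x = np.array([1.0, 1.0])

P = np.linalg.inv(H)  # the ideal preconditioner: the inverse Hessian
g = H @ x             # gradient of the quadratic at x
x_new = x - P @ g     # one preconditioned step reaches the minimizer x = 0
print(x_new)          # [0. 0.]
```

Plain gradient descent on the same problem must use a step size small enough for the stiff direction (curvature 10) and therefore crawls along the flat one (curvature 1); the preconditioner rescales both directions at once.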
arXiv Detail & Related papers (2022-10-11T03:47:14Z)
- Training Learned Optimizers with Randomly Initialized Learned Optimizers [49.67678615506608]
We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion.
A form of population based training is used to orchestrate this self-training.
We believe feedback loops of this type will be important and powerful in the future of machine learning.
arXiv Detail & Related papers (2021-01-14T19:07:17Z)
- Reverse engineering learned optimizers reveals known and novel mechanisms [50.50540910474342]
Learned optimizers are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learned optimizers work, and establish tools for interpreting future learned optimizers.
arXiv Detail & Related papers (2020-11-04T07:12:43Z)
- Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves [53.37905268850274]
We introduce a new, hierarchical, neural-network-parameterized learned optimizer with access to additional features such as validation loss to enable automatic regularization.
Most learned optimizers have been trained on only a single task, or a small number of tasks.
We train ours on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks.
arXiv Detail & Related papers (2020-09-23T16:35:09Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.