Learning to Optimize for Reinforcement Learning
- URL: http://arxiv.org/abs/2302.01470v3
- Date: Tue, 4 Jun 2024 07:52:31 GMT
- Title: Learning to Optimize for Reinforcement Learning
- Authors: Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu,
- Abstract summary: Reinforcement learning (RL) is essentially different from supervised learning, and in practice, these learneds do not work well even in simple RL tasks.
Agent-gradient distribution is non-independent and identically distributed, leading to inefficient meta-training.
We show that, although only trained in toy tasks, our learned can generalize unseen complex tasks in Brax.
- Score: 58.01132862590378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. Reinforcement learning (RL) is essentially different from supervised learning, and in practice, these learned optimizers do not work well even in simple RL tasks. We investigate this phenomenon and identify two issues. First, the agent-gradient distribution is non-independent and identically distributed, leading to inefficient meta-training. Moreover, due to highly stochastic agent-environment interactions, the agent-gradients have high bias and variance, which increases the difficulty of learning an optimizer for RL. We propose pipeline training and a novel optimizer structure with a good inductive bias to address these issues, making it possible to learn an optimizer for reinforcement learning from scratch. We show that, although only trained in toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach [0.9549646359252346]
We propose dynamic Learning Rate for deep Reinforcement Learning (LRRL)
LRRL is a meta-learning approach that selects the learning rate based on the agent's performance during training.
Our empirical results demonstrate that LRRL can substantially improve the performance of deep RL algorithms.
arXiv Detail & Related papers (2024-10-16T14:15:28Z) - HUB: Guiding Learned Optimizers with Continuous Prompt Tuning [45.662334160254176]
Learneds are a crucial component of meta-learning.
Recent advancements in scalable learneds have demonstrated their superior performance over hand-designeds in various tasks.
We propose a hybrid-update-based (HUB) optimization strategy to tackle the issue of generalization in scalable learneds.
arXiv Detail & Related papers (2023-05-26T11:08:20Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatiles.
We train an ingest for deep learning which is itself a small neural network that ingests and outputs parameter updates.
We open source our learned, meta-training code, the associated train test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - A Closer Look at Learned Optimization: Stability, Robustness, and
Inductive Biases [44.01339030872185]
Blackbox learneds often struggle with stability and generalization when applied to tasks unlike those in their meta-training set.
We investigate the inductive biases and stability properties of optimization algorithms, and apply the resulting insights to designing inductive biases for blackboxs.
We learn to a variety of neural network training tasks, where it outperforms the current state of the art learned.
arXiv Detail & Related papers (2022-09-22T17:47:21Z) - Training Learned Optimizers with Randomly Initialized Learned Optimizers [49.67678615506608]
We show that a population of randomly learneds can be used to train themselves from scratch in an online fashion.
A form of population based training is used to orchestrate this self-training.
We believe feedback loops of this type will be important and powerful in the future of machine learning.
arXiv Detail & Related papers (2021-01-14T19:07:17Z) - Reverse engineering learned optimizers reveals known and novel
mechanisms [50.50540910474342]
Learneds are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learneds work, and establish tools for interpreting future learneds.
arXiv Detail & Related papers (2020-11-04T07:12:43Z) - Tasks, stability, architecture, and compute: Training more effective
learned optimizers, and using them to train themselves [53.37905268850274]
We introduce a new, hierarchical, neural network parameterized, hierarchical with access to additional features such as validation loss to enable automatic regularization.
Most learneds have been trained on only a single task, or a small number of tasks.
We train ours on thousands of tasks, making use of orders of magnitude more compute, resulting in generalizes that perform better to unseen tasks.
arXiv Detail & Related papers (2020-09-23T16:35:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.