On the role of planning in model-based deep reinforcement learning
- URL: http://arxiv.org/abs/2011.04021v2
- Date: Wed, 17 Mar 2021 11:36:47 GMT
- Title: On the role of planning in model-based deep reinforcement learning
- Authors: Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez,
Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar
Veličković, Théophane Weber
- Abstract summary: We study the performance of MuZero, a state-of-the-art model-based reinforcement learning algorithm.
We find that planning is most useful for policy updates and for providing a more useful data distribution.
Using shallow trees with simple Monte-Carlo rollouts is as performant as more complex methods, except in the most difficult reasoning tasks.
- Score: 19.082481172513635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based planning is often thought to be necessary for deep, careful
reasoning and generalization in artificial agents. While recent successes of
model-based reinforcement learning (MBRL) with deep function approximation have
strengthened this hypothesis, the resulting diversity of model-based methods
has also made it difficult to track which components drive success and why. In
this paper, we seek to disentangle the contributions of recent methods by
focusing on three questions: (1) How does planning benefit MBRL agents? (2)
Within planning, what choices drive performance? (3) To what extent does
planning improve generalization? To answer these questions, we study the
performance of MuZero (Schrittwieser et al., 2019), a state-of-the-art MBRL
algorithm with strong connections and overlapping components with many other
MBRL algorithms. We perform a number of interventions and ablations of MuZero
across a wide range of environments, including control tasks, Atari, and 9x9
Go. Our results suggest the following: (1) Planning is most useful in the
learning process, both for policy updates and for providing a more useful data
distribution. (2) Using shallow trees with simple Monte-Carlo rollouts is as
performant as more complex methods, except in the most difficult reasoning
tasks. (3) Planning alone is insufficient to drive strong generalization. These
results indicate where and how to utilize planning in reinforcement learning
settings, and highlight a number of open questions for future MBRL research.
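To make result (2) concrete, below is a minimal sketch of shallow (depth-1) lookahead with simple Monte-Carlo rollouts. It assumes a generic, copyable simulator exposing `num_actions` and a `step(action)` method returning `(reward, done)`; these interfaces and the rollout parameters are illustrative assumptions, not MuZero's actual implementation.
```python
import copy
import random

def mc_rollout(env, depth, gamma=0.99):
    """Estimate the value of env's current state with one random rollout."""
    ret, discount = 0.0, 1.0
    for _ in range(depth):
        reward, done = env.step(random.choice(range(env.num_actions)))
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret

def shallow_plan(env, rollout_depth=10, num_rollouts=8, gamma=0.99):
    """Depth-1 lookahead: try each root action once, then score the
    resulting state by averaging a few Monte-Carlo rollouts."""
    q_values = []
    for action in range(env.num_actions):
        sim = copy.deepcopy(env)              # roll forward on a copy
        reward, done = sim.step(action)
        if done:
            q_values.append(reward)
            continue
        returns = [mc_rollout(copy.deepcopy(sim), rollout_depth, gamma)
                   for _ in range(num_rollouts)]
        q_values.append(reward + gamma * sum(returns) / num_rollouts)
    return max(range(env.num_actions), key=q_values.__getitem__)
```
In this form the "tree" is only one ply deep; the paper's finding is that such cheap evaluation often matches deeper search outside the hardest reasoning tasks.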
Related papers
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
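As a rough illustration of search over reasoning paths (a generic best-first variant of the idea, not the paper's two specific search algorithms), where `propose_steps` and `score` stand in for an LLM step generator and a learned reward model, and the `ANSWER:` terminal marker is an assumption:
```python
import heapq

def best_first_reasoning(question, propose_steps, score, max_expansions=64):
    # Frontier of partial reasoning paths, ordered by reward-model score
    # (negated because heapq is a min-heap).
    frontier = [(-score([question]), [question])]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, path = heapq.heappop(frontier)
        if path[-1].startswith("ANSWER:"):     # assumed terminal marker
            return path                        # best-scoring complete path
        for step in propose_steps(path):       # LLM proposes next steps
            heapq.heappush(frontier, (-score(path + [step]), path + [step]))
        expansions += 1
    return None                                # search budget exhausted
```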
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios that existing planners struggle with, producing well-reasoned outputs while remaining grounded by working alongside the rule-based planner.
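A minimal sketch of the hybrid control flow, assuming `rule_planner`, `llm_planner`, and `is_feasible` as stand-in interfaces (the paper's actual components differ):
```python
def hybrid_plan(state, rule_planner, llm_planner, is_feasible):
    """Prefer the cheap, verifiable rule-based planner; fall back to the
    LLM-based planner only on scenarios the rules cannot handle."""
    trajectory = rule_planner(state)
    if trajectory is not None and is_feasible(trajectory):
        return trajectory
    # The LLM proposes a plan for the long-tail scenario; re-checking
    # feasibility keeps the output grounded.
    candidate = llm_planner(state)
    return candidate if is_feasible(candidate) else trajectory
```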
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
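A paraphrase of the single-objective idea in sketch form: prefer hypotheses that promise high value under planning (optimism, which drives exploration) while also fitting the data collected so far (estimation, which drives exploitation). The callables and the weight `eta` are assumptions read off the summary, not the paper's exact objective.
```python
def mex_select(hypothesis_class, dataset, value_of, nll_of, eta=1.0):
    """Pick the model hypothesis maximizing one fused objective:
    (optimal value under the hypothesis) - eta * (estimation loss on data).
    `value_of(f)` plans under hypothesis f; `nll_of(f, data)` measures fit."""
    return max(hypothesis_class,
               key=lambda f: value_of(f) - eta * nll_of(f, dataset))
```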
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- A Survey of Meta-Reinforcement Learning [83.95180398234238]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- AI Planning Annotation for Sample Efficient Reinforcement Learning [39.4624736757278]
We show that a suitably defined planning model can be used to improve the efficiency of Reinforcement Learning (RL).
Our experiments demonstrate an improved sample efficiency on a variety of RL environments over the previous state-of-the-art.
arXiv Detail & Related papers (2022-03-01T18:38:41Z)
- Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics [20.148408520475655]
We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans.
In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
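In sketch form, plan-conditioning just means the policy's input includes an encoding of the plan, so a single network can execute many different plans; the flat encodings and the `policy` callable below are illustrative assumptions, not L2E's architecture.
```python
import numpy as np

def plan_conditioned_action(policy, state, plan):
    """Feed the policy both the current state and an encoding of an
    approximate plan; different plans steer the same network differently."""
    features = np.concatenate([np.asarray(state, np.float32).ravel(),
                               np.asarray(plan, np.float32).ravel()])
    return policy(features)   # any map from the joint input to an action
```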
arXiv Detail & Related papers (2021-11-15T16:58:50Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently, with performance comparable to conventional approaches.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Model-based Reinforcement Learning: A Survey [2.564530030795554]
Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence.
Two key approaches to this problem are reinforcement learning (RL) and planning.
This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning.
arXiv Detail & Related papers (2020-06-30T12:10:07Z)
- A Game Theoretic Framework for Model Based Reinforcement Learning [39.45066100705418]
Model-based reinforcement learning (MBRL) has recently gained immense interest due to its potential for sample efficiency and ability to incorporate off-policy data.
We develop a new framework that casts MBRL as a game between: (1) a policy player, which attempts to maximize rewards under the learned model; (2) a model player, which attempts to fit the real-world data collected by the policy player.
Our framework is consistent with, and provides a clear basis for, gradients known from prior works to be important in practice.
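An illustrative alternating best-response loop for the two-player game described above (the `improve`, `collect`, and `fit` callables are assumed helpers; the paper analyzes more careful update schedules than this sketch):
```python
def mbrl_game(policy, model, real_env, improve, collect, fit, n_rounds=100):
    """Alternate the two players: the policy player maximizes reward under
    the learned model; the model player refits to real-world data that the
    current policy collects."""
    data = []
    for _ in range(n_rounds):
        policy = improve(policy, model)      # best-respond inside the model
        data += collect(real_env, policy)    # gather fresh real transitions
        model = fit(model, data)             # best-respond to the data
    return policy, model
```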
arXiv Detail & Related papers (2020-04-16T17:51:45Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)