Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2206.08442v1
- Date: Thu, 16 Jun 2022 20:48:19 GMT
- Title: Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning
- Authors: Safa Alver, Doina Precup
- Abstract summary: Two prevalent approaches are decision-time planning and background planning.
This study is interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other.
Overall, our findings suggest that even though decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations, it can perform on par with or better than background planning.
- Score: 56.50123642237106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In model-based reinforcement learning, an agent can leverage a learned
model to improve its behavior in a variety of ways. Two prevalent approaches are
decision-time planning and background planning. In this study, we are
interested in understanding under what conditions and in which settings one of
these two planning styles will perform better than the other in domains that
require fast responses. After viewing them through the lens of dynamic
programming, we first consider the classical instantiations of these planning
styles and provide theoretical results and hypotheses on which one will perform
better in the pure planning, planning & learning, and transfer learning
settings. We then consider the modern instantiations of these planning styles
and provide hypotheses on which one will perform better in the last two of the
considered settings. Lastly, we perform several illustrative experiments to
empirically validate both our theoretical results and hypotheses. Overall, our
findings suggest that even though decision-time planning does not perform as
well as background planning in their classical instantiations, in their modern
instantiations, it can perform on par with or better than background planning in
both the planning & learning and transfer learning settings.
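To make the distinction concrete, here is a minimal tabular sketch (an illustration under assumed toy dynamics, not the paper's experimental setup): background planning improves a value function from simulated transitions sampled anywhere in the state space, while decision-time planning spends its model computation at action-selection time, searching forward from the current state only.

```python
import random

random.seed(0)
# Toy deterministic chain: states 0..9, two move actions, reward 1 for
# being at the goal. All names and numbers here are illustrative.
N, GOAL, GAMMA, ACTIONS = 10, 9, 0.9, (-1, +1)

def model(s, a):
    """Stand-in for a learned model: returns (next_state, reward)."""
    s2 = max(0, min(N - 1, s + a))
    return s2, float(s2 == GOAL)

# Background planning (Dyna-style): many simulated backups, offline,
# over states that need not include the one the agent currently faces.
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
for _ in range(5000):
    s, a = random.randrange(N), random.choice(ACTIONS)
    s2, r = model(s, a)
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += 0.1 * (target - Q[(s, a)])

def background_policy(s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

# Decision-time planning: depth-limited lookahead from the current
# state, recomputed at every step, with no persistent value function.
def lookahead_value(s, depth):
    if depth == 0 or s == GOAL:
        return 0.0
    return max(r + GAMMA * lookahead_value(s2, depth - 1)
               for s2, r in (model(s, a) for a in ACTIONS))

def decision_time_policy(s, depth=8):
    def score(a):
        s2, r = model(s, a)
        return r + GAMMA * lookahead_value(s2, depth - 1)
    return max(ACTIONS, key=score)

print(background_policy(3), decision_time_policy(3))  # both pick +1
```

Note the trade-off the sketch exposes: the background planner pays its compute up front and acts instantly at decision time, while the decision-time planner acts with no precomputation but pays a search cost per step, which is why domains requiring fast responses stress the latter.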
Related papers
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
First, we construct a benchmark suite encompassing both classical planning domains and natural language scenarios.
Second, we investigate the use of in-context learning (ICL) to enhance LLM planning, exploring the direct relationship between increased context length and improved planning performance.
Third, we demonstrate the positive impact of fine-tuning LLMs on optimal planning paths, as well as the effectiveness of incorporating model-driven search procedures.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
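As a rough illustration of the in-context-learning idea mentioned above (the examples, format, and task are invented here; the paper's benchmark and prompts differ), a planning prompt can be assembled by prepending solved problems to the new one:

```python
# Hypothetical few-shot planning prompt; the solved examples below are
# made up for illustration, and the resulting string would be sent to
# an LLM whose completion is parsed as the plan.
EXAMPLES = [
    ("Deliver the parcel from A to C, passing through B.",
     "go(A,B); go(B,C)"),
    ("Deliver the parcel from X to Z, passing through Y.",
     "go(X,Y); go(Y,Z)"),
]

def build_icl_prompt(problem: str) -> str:
    shots = "\n\n".join(f"Problem: {p}\nPlan: {s}" for p, s in EXAMPLES)
    return f"{shots}\n\nProblem: {problem}\nPlan:"

print(build_icl_prompt("Deliver the parcel from P to R, passing through Q."))
```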
- What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models [7.216683826556268]
Large language models (LLMs) are increasingly used for applications that require planning capabilities.
We introduce SimPlan, a novel hybrid method, and evaluate its performance in a new, challenging setup.
arXiv Detail & Related papers (2024-02-18T07:42:49Z)
- Deep hybrid models: infer and plan in the real world [0.0]
We present an effective solution, based on active inference, to complex control tasks.
The proposed architecture exploits hybrid (discrete and continuous) processing to construct a hierarchical and dynamic representation of the self and the environment.
We evaluate this deep hybrid model on a non-trivial task: reaching a moving object after having picked a moving tool.
arXiv Detail & Related papers (2024-02-01T15:15:25Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
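The "planning by iterative denoising" loop can be caricatured in a few lines. In the sketch below (my illustration, not the paper's code), a hand-made smoothing step stands in for the learned denoising network, and the plan is conditioned on start and goal by clamping the endpoints after each step, in the inpainting style such planners use:

```python
import numpy as np

np.random.seed(0)

def denoise_step(traj, noise_scale):
    """One reverse step: pull interior waypoints toward the midpoint of
    their neighbours (stand-in for a trained denoiser), then re-inject
    a small, annealed amount of noise."""
    out = traj.copy()
    out[1:-1] = 0.5 * (traj[:-2] + traj[2:])
    return out + noise_scale * np.random.randn(*traj.shape)

def plan(start, goal, horizon=16, steps=40):
    traj = np.random.randn(horizon, 2)        # start from pure noise
    for t in range(steps):
        traj = denoise_step(traj, 0.5 * (1 - t / steps))
        traj[0], traj[-1] = start, goal       # condition on endpoints
    return traj

path = plan(np.array([0.0, 0.0]), np.array([5.0, 3.0]))
print(path.round(2))  # approximately a straight 2-D line of waypoints
```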
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)
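A toy contrast between the two directions (my own illustration; the paper's algorithms and analysis are more general): with a forward model we back a state's value up from its predicted successor, while with a backward model we push a surprising outcome back to predicted predecessors, which can spread credit in a single sweep:

```python
# Six-state chain with reward 1 for entering state 5 (all illustrative).
GAMMA, ALPHA = 0.9, 0.5
V = [0.0] * 6
forward = {s: s + 1 for s in range(5)}          # forward model: s -> s'
backward = {s2: s2 - 1 for s2 in range(1, 6)}   # backward model: s' -> s
reward = {5: 1.0}                               # reward on entering s2

def forethought_update(s):
    """Forward model ('forethought'): back up V(s) from its successor."""
    s2 = forward[s]
    V[s] += ALPHA * (reward.get(s2, 0.0) + GAMMA * V[s2] - V[s])

def hindsight_update(s2):
    """Backward model ('hindsight'): update the predicted predecessor
    of s2 toward the newly revised estimate of s2."""
    s = backward[s2]
    V[s] += ALPHA * (reward.get(s2, 0.0) + GAMMA * V[s2] - V[s])

for s in range(5):           # one forward sweep: only V(4) moves
    forethought_update(s)
print([round(v, 3) for v in V])   # [0.0, 0.0, 0.0, 0.0, 0.5, 0.0]
for s2 in (5, 4, 3, 2, 1):   # one backward sweep: credit spreads down
    hindsight_update(s2)
print([round(v, 3) for v in V])
```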
- Robust Hierarchical Planning with Policy Delegation [6.1678491628787455]
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation.
We show this planning approach is experimentally very competitive with classic planning and reinforcement learning techniques on a variety of domains.
arXiv Detail & Related papers (2020-10-25T04:36:20Z)
- A Unifying Framework for Reinforcement Learning and Planning [2.564530030795554]
This paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP).
At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions.
arXiv Detail & Related papers (2020-06-26T14:30:41Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
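The divide-and-conquer idea can be sketched without the tree search (a simplification: subgoals are scored greedily here, where DC-MCTS would select them with Monte Carlo Tree Search and learned priors). The plan for start -> goal is built by recursively planning start -> subgoal and subgoal -> goal:

```python
# Waypoint planning on a grid by recursive subgoal splitting
# (illustrative; not the paper's implementation).
def dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])   # Manhattan distance

def plan(start, goal, depth=4):
    if depth == 0 or dist(start, goal) <= 1:
        return [start, goal] if start != goal else [start]
    # Candidate subgoals near the geometric midpoint; DC-MCTS would
    # search over such candidates rather than pick greedily.
    mx, my = (start[0] + goal[0]) // 2, (start[1] + goal[1]) // 2
    candidates = [(mx + dx, my + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    candidates = [c for c in candidates if c not in (start, goal)]
    sub = min(candidates, key=lambda m: dist(start, m) + dist(m, goal))
    left, right = plan(start, sub, depth - 1), plan(sub, goal, depth - 1)
    return left + right[1:]   # splice halves, dropping the shared subgoal

# Waypoints from (0,0) to (7,5); gaps left at the depth limit would be
# handed to a low-level controller.
print(plan((0, 0), (7, 5)))
```

The point of the structure is that plan order is no longer left-to-right: the two halves are solved independently, which is the flexibility the paper attributes its navigation gains to.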
- The Two Regimes of Deep Network Training [93.84309968956941]
We study the effects of different learning rate schedules and the appropriate way to select them.
To this end, we isolate two distinct phases, which we refer to as the "large-step regime" and the "small-step regime".
Our training algorithm can significantly simplify learning rate schedules.
arXiv Detail & Related papers (2020-02-24T17:08:24Z)
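A schedule in the spirit of the two regimes can be written in one function (the rates and switch point below are invented for illustration, not taken from the paper): train with a large step size first, then drop to a small one for the final phase.

```python
def two_regime_lr(step, total_steps, large=0.1, small=0.001, switch=0.8):
    """Large-step regime for the first `switch` fraction of training,
    then the small-step regime (all numbers are illustrative)."""
    return large if step < switch * total_steps else small

# Hypothetical use inside a training loop with a torch-style optimizer:
#   for g in optimizer.param_groups:
#       g["lr"] = two_regime_lr(step, total_steps)
print([two_regime_lr(s, 100) for s in (0, 50, 79, 80, 99)])
# -> [0.1, 0.1, 0.1, 0.001, 0.001]
```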