Understanding Decision-Time vs. Background Planning in Model-Based
Reinforcement Learning
- URL: http://arxiv.org/abs/2206.08442v1
- Date: Thu, 16 Jun 2022 20:48:19 GMT
- Title: Understanding Decision-Time vs. Background Planning in Model-Based
Reinforcement Learning
- Authors: Safa Alver, Doina Precup
- Abstract summary: Two prevalent approaches are decision-time planning and background planning.
This study examines under what conditions and in which settings one of these two planning styles performs better than the other.
Overall, our findings suggest that although decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations it can perform on par with or better than background planning.
- Score: 56.50123642237106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In model-based reinforcement learning, an agent can leverage a learned
model to improve its behavior in several ways. Two prevalent approaches are
decision-time planning and background planning. In this study, we are
interested in understanding under what conditions and in which settings one of
these two planning styles will perform better than the other in domains that
require fast responses. After viewing them through the lens of dynamic
programming, we first consider the classical instantiations of these planning
styles and provide theoretical results and hypotheses on which one will perform
better in the pure planning, planning & learning, and transfer learning
settings. We then consider the modern instantiations of these planning styles
and provide hypotheses on which one will perform better in the last two of the
considered settings. Lastly, we perform several illustrative experiments to
empirically validate both our theoretical results and hypotheses. Overall, our
findings suggest that even though decision-time planning does not perform as
well as background planning in their classical instantiations, in their modern
instantiations it can perform on par with or better than background planning in
both the planning & learning and transfer learning settings.
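To make the distinction concrete, here is a minimal tabular sketch (not the paper's algorithms or experiments): decision-time planning searches with the model at the moment of action selection, while background planning uses the model between decisions to improve cached value estimates, so acting reduces to a cheap argmax. The environment stub, model, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

# Illustrative tabular setup: `model`, `n_states`, `n_actions`, and
# `gamma` are stand-ins, not the paper's notation.
n_states, n_actions, gamma = 10, 4, 0.95
Q = np.zeros((n_states, n_actions))            # cached value estimates
rng = np.random.default_rng(0)

def model(s, a):
    """Stand-in learned model: returns (reward, next_state)."""
    return rng.normal(), (s + a) % n_states

def decision_time_action(s, depth=2):
    """Decision-time planning: depth-limited lookahead with the model
    at action-selection time, bootstrapping on the cached Q."""
    def value(s, d):
        if d == 0:
            return Q[s].max()
        return max(r + gamma * value(s2, d - 1)
                   for r, s2 in (model(s, a) for a in range(n_actions)))
    scores = []
    for a in range(n_actions):
        r, s2 = model(s, a)
        scores.append(r + gamma * value(s2, depth - 1))
    return int(np.argmax(scores))

def background_planning_step(alpha=0.1):
    """Background planning: Dyna-style simulated updates between
    decisions, so acting is just an argmax over the cached Q."""
    s, a = rng.integers(n_states), rng.integers(n_actions)
    r, s2 = model(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

def background_action(s):
    return int(np.argmax(Q[s]))
```

The trade-off the paper studies is visible here: `decision_time_action` spends model queries at decision time, while `background_action` is fast because the planning cost was paid earlier.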
Related papers
- Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning [8.552540426753]
This paper introduces an online meta-gradient algorithm that tunes the probability with which states are queried during Dyna-style planning.
Results indicate that our method improves the efficiency of the planning process.
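A hedged sketch of the general idea: Dyna-style planning draws states to update from a learned query distribution, which is itself adjusted online. The softmax parameterization and TD-error-based meta-update below are our illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 10, 4, 0.95, 0.1
Q = np.zeros((n_states, n_actions))
logits = np.zeros(n_states)                 # learnable query preferences

def model(s, a):                            # stand-in learned model
    return rng.normal(), (s + a) % n_states

def query_probs():
    e = np.exp(logits - logits.max())
    return e / e.sum()

def dyna_planning_step(meta_lr=0.01):
    """Query a state with learned probability, simulate a transition,
    apply a Q update, then nudge the query distribution toward states
    that produced large TD errors (a crude stand-in for the paper's
    meta-gradient update)."""
    p = query_probs()
    s = rng.choice(n_states, p=p)
    a = rng.integers(n_actions)
    r, s2 = model(s, a)
    td = r + gamma * Q[s2].max() - Q[s, a]
    Q[s, a] += alpha * td
    grad = -p
    grad[s] += 1.0                          # d log p(s) / d logits
    logits += meta_lr * abs(td) * grad
```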
arXiv Detail & Related papers (2024-06-27T22:24:46Z)
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach that distills from model-based planning into the policy.
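A minimal sketch of such a distillation step, under assumed shapes and a simple squared-error imitation loss (the architecture and loss are illustrative, not the paper's exact method):

```python
import torch
import torch.nn as nn

# Hypothetical setup: `obs_dim`, `act_dim`, the MLP, and the loss are
# illustrative stand-ins.
obs_dim, act_dim = 8, 2
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                       nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def distill_step(states, planner_actions):
    """Regress the policy onto planner-optimized actions, distilling
    the output of model-based planning into a reactive policy."""
    loss = ((policy(states) - planner_actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with dummy batches standing in for planner output:
states = torch.randn(32, obs_dim)
planner_actions = torch.randn(32, act_dim)
distill_step(states, planner_actions)
```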
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
- The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms [37.025378882978714]
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL).
Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and the expert policy.
We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains.
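The objective's name suggests a performance-difference-lemma flavor: the gap between an expert and the current policy can be expressed through the current policy's advantages along expert state-actions. The estimator below is a generic sketch of that lemma, not the paper's exact objective; `advantage_fn` is a hypothetical stand-in.

```python
import numpy as np

def perf_diff_estimate(expert_states, expert_actions, advantage_fn, gamma=0.99):
    """Performance-difference-lemma-style estimate:
    J(expert) - J(pi) ~ sum_t gamma^t * A^pi(s_t, a_t) along an expert
    trajectory. Generic sketch only; `advantage_fn(s, a)` returns the
    current policy's advantage (a hypothetical stand-in)."""
    adv = np.array([advantage_fn(s, a)
                    for s, a in zip(expert_states, expert_actions)])
    return float((gamma ** np.arange(len(adv)) * adv).sum())
```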
arXiv Detail & Related papers (2023-03-01T17:42:26Z)
- Hierarchical Optimization-Derived Learning [58.69200830655009]
We establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and its corresponding learning process.
This provides the first theoretical guarantee for these two coupled ODL components: optimization and learning.
arXiv Detail & Related papers (2023-02-11T03:35:13Z)
- Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms both baselines on multiple evaluation metrics on ImageNet-1K.
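A minimal sketch of a data-dependent weighted cross-entropy loss (how the weights are produced is an assumption here; the paper's framework defines its own weighting):

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, weights):
    """Per-example cross-entropy scaled by data-dependent weights.
    The source of `weights` (e.g., ensemble disagreement) is an
    illustrative assumption, not the paper's prescription."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_example).mean()

# Usage with dummy data:
logits = torch.randn(16, 10)
targets = torch.randint(0, 10, (16,))
weights = torch.rand(16)
weighted_cross_entropy(logits, targets, weights)
```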
arXiv Detail & Related papers (2022-11-18T02:00:17Z)
- Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation [12.37745209793872]
We introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making.
Our method is used for the data-efficient evaluation of the regret of past treatment assignments.
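For intuition, a generic model-based regret estimate for past assignments (the `outcome_model` stand-in and simple averaging are our assumptions; the paper's Bayesian experimental-design machinery for choosing what data to gather is not shown):

```python
def average_regret(contexts, chosen_actions, outcome_model, n_actions):
    """Regret of past treatment assignments under a fitted outcome
    model: modeled best outcome minus modeled outcome of the chosen
    treatment, averaged over logged contexts."""
    total = 0.0
    for x, a in zip(contexts, chosen_actions):
        values = [outcome_model(x, b) for b in range(n_actions)]
        total += max(values) - values[a]
    return total / len(contexts)
```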
arXiv Detail & Related papers (2022-07-12T01:20:11Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
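A compact sketch of planning as iterative denoising, using a DDPM-style update with a placeholder noise predictor; the schedule, conditioning, and reward guidance of the actual method are omitted or simplified.

```python
import torch

# Illustrative shapes: a plan is a (horizon, state+action) array.
# `eps_model` stands in for a trained denoising network; the linear
# beta schedule and update rule are simplified assumptions.
horizon, dim, T = 32, 6, 100
betas = torch.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def eps_model(traj, t):
    return torch.zeros_like(traj)           # placeholder noise predictor

@torch.no_grad()
def plan():
    """Plan by iteratively denoising a noisy trajectory (DDPM-style)."""
    traj = torch.randn(horizon, dim)
    for t in reversed(range(T)):
        eps = eps_model(traj, t)
        mean = (traj - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        noise = torch.randn_like(traj) if t > 0 else torch.zeros_like(traj)
        traj = mean + torch.sqrt(betas[t]) * noise
    return traj                             # denoised (state, action) plan
```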
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
- Discriminator Augmented Model-Based Reinforcement Learning [47.094522301093775]
It is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance.
This paper aims to improve planning with an importance sampling framework that accounts for discrepancy between the true and learned dynamics.
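One common way to realize such an importance-sampling correction is to train a discriminator to separate real from model-generated transitions and use the density ratio D / (1 - D) as a weight. The sketch below assumes this standard construction and hypothetical dimensions; the paper's exact framework may differ.

```python
import torch
import torch.nn as nn

# Discriminator D learns to tell real transitions from model rollouts;
# p_real / p_model is approximated by D / (1 - D). Dimensions and
# architecture are illustrative assumptions.
trans_dim = 8 + 2 + 8                       # hypothetical (s, a, s') sizes
D = nn.Sequential(nn.Linear(trans_dim, 64), nn.ReLU(),
                  nn.Linear(64, 1), nn.Sigmoid())

def importance_weights(model_transitions):
    """w = D / (1 - D), clamped for numerical stability."""
    d = D(model_transitions).squeeze(-1).clamp(1e-3, 1 - 1e-3)
    return d / (1.0 - d)

def weighted_td_loss(q_pred, q_target, weights):
    """Down-weight TD errors on transitions the model likely got wrong."""
    return (weights.detach() * (q_pred - q_target) ** 2).mean()
```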
arXiv Detail & Related papers (2021-03-24T06:01:55Z)
- Provable Representation Learning for Imitation Learning via Bi-level Optimization [60.059520774789654]
A common strategy in modern learning systems is to learn a representation that is useful for many tasks.
We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available.
We instantiate this framework for the behavior cloning and observation-alone imitation learning settings.
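As a rough sketch of the shared-representation strategy only (plain multi-task behavior cloning with a shared encoder; the paper's bi-level optimization and its guarantees are not reproduced here):

```python
import torch
import torch.nn as nn

# Shared encoder with per-expert heads; dimensions and architecture
# are assumptions for illustration.
obs_dim, act_dim, n_tasks = 8, 2, 3
encoder = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU())
heads = nn.ModuleList(nn.Linear(32, act_dim) for _ in range(n_tasks))
opt = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()),
                       lr=1e-3)

def bc_step(task, states, expert_actions):
    """One behavior-cloning update for a given expert/task; the shared
    encoder is the representation all tasks jointly shape."""
    loss = ((heads[task](encoder(states)) - expert_actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```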
arXiv Detail & Related papers (2020-02-24T21:03:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.