Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2206.08442v1
- Date: Thu, 16 Jun 2022 20:48:19 GMT
- Title: Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning
- Authors: Safa Alver, Doina Precup
- Abstract summary: Two prevalent approaches are decision-time planning and background planning.
This study is interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other.
Overall, our findings suggest that even though decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations, it can perform on par with or better than background planning.
- Score: 56.50123642237106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In model-based reinforcement learning, an agent can leverage a learned
model to improve its behavior in a variety of ways. Two prevalent approaches are
decision-time planning and background planning. In this study, we are
interested in understanding under what conditions and in which settings one of
these two planning styles will perform better than the other in domains that
require fast responses. After viewing them through the lens of dynamic
programming, we first consider the classical instantiations of these planning
styles and provide theoretical results and hypotheses on which one will perform
better in the pure planning, planning & learning, and transfer learning
settings. We then consider the modern instantiations of these planning styles
and provide hypotheses on which one will perform better in the last two of the
considered settings. Lastly, we perform several illustrative experiments to
empirically validate both our theoretical results and hypotheses. Overall, our
findings suggest that even though decision-time planning does not perform as
well as background planning in their classical instantiations, in their modern
instantiations, it can perform on par with or better than background planning in
both the planning & learning and transfer learning settings.
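To make the distinction concrete, here is a minimal tabular sketch (an illustration under assumed toy dynamics, not the paper's experimental setup): background planning improves a value function from simulated transitions sampled anywhere in the state space, while decision-time planning spends its model computation at action-selection time, searching forward from the current state only.

```python
import random

random.seed(0)
# Toy deterministic chain: states 0..9, two move actions, reward 1 for
# being at the goal. All names and numbers here are illustrative.
N, GOAL, GAMMA, ACTIONS = 10, 9, 0.9, (-1, +1)

def model(s, a):
    """Stand-in for a learned model: returns (next_state, reward)."""
    s2 = max(0, min(N - 1, s + a))
    return s2, float(s2 == GOAL)

# Background planning (Dyna-style): many simulated backups, offline,
# over states that need not include the one the agent currently faces.
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
for _ in range(5000):
    s, a = random.randrange(N), random.choice(ACTIONS)
    s2, r = model(s, a)
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += 0.1 * (target - Q[(s, a)])

def background_policy(s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

# Decision-time planning: depth-limited lookahead from the current
# state, recomputed at every step, with no persistent value function.
def lookahead_value(s, depth):
    if depth == 0 or s == GOAL:
        return 0.0
    return max(r + GAMMA * lookahead_value(s2, depth - 1)
               for s2, r in (model(s, a) for a in ACTIONS))

def decision_time_policy(s, depth=8):
    def score(a):
        s2, r = model(s, a)
        return r + GAMMA * lookahead_value(s2, depth - 1)
    return max(ACTIONS, key=score)

print(background_policy(3), decision_time_policy(3))  # both pick +1
```

Note the trade-off the sketch exposes: the background planner pays its compute up front and acts instantly at decision time, while the decision-time planner acts with no precomputation but pays a search cost per step, which is why domains requiring fast responses stress the latter.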
Related papers
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
First, we construct a benchmark suite encompassing both classical planning domains and natural language scenarios.
Second, we investigate the use of in-context learning (ICL) to enhance LLM planning, exploring the direct relationship between increased context length and improved planning performance.
Third, we demonstrate the positive impact of fine-tuning LLMs on optimal planning paths, as well as the effectiveness of incorporating model-driven search procedures.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
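As a rough illustration of the in-context-learning idea mentioned above (the examples, format, and task are invented here; the paper's benchmark and prompts differ), a planning prompt can be assembled by prepending solved problems to the new one:

```python
# Hypothetical few-shot planning prompt; the solved examples below are
# made up for illustration, and the resulting string would be sent to
# an LLM whose completion is parsed as the plan.
EXAMPLES = [
    ("Deliver the parcel from A to C, passing through B.",
     "go(A,B); go(B,C)"),
    ("Deliver the parcel from X to Z, passing through Y.",
     "go(X,Y); go(Y,Z)"),
]

def build_icl_prompt(problem: str) -> str:
    shots = "\n\n".join(f"Problem: {p}\nPlan: {s}" for p, s in EXAMPLES)
    return f"{shots}\n\nProblem: {problem}\nPlan:"

print(build_icl_prompt("Deliver the parcel from P to R, passing through Q."))
```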
- What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models [7.216683826556268]
Large language models (LLMs) are increasingly used for applications that require planning capabilities.
We introduce SimPlan, a novel hybrid method, and evaluate its performance in a new, challenging setup.
arXiv Detail & Related papers (2024-02-18T07:42:49Z)
- Deep hybrid models: infer and plan in the real world [0.0]
We present an effective solution, based on active inference, to complex control tasks.
The proposed architecture exploits hybrid (discrete and continuous) processing to construct a hierarchical and dynamic representation of the self and the environment.
We evaluate this deep hybrid model on a non-trivial task: reaching a moving object after having picked a moving tool.
arXiv Detail & Related papers (2024-02-01T15:15:25Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
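The "planning by iterative denoising" loop can be caricatured in a few lines. In the sketch below (my illustration, not the paper's code), a hand-made smoothing step stands in for the learned denoising network, and the plan is conditioned on start and goal by clamping the endpoints after each step, in the inpainting style such planners use:

```python
import numpy as np

np.random.seed(0)

def denoise_step(traj, noise_scale):
    """One reverse step: pull interior waypoints toward the midpoint of
    their neighbours (stand-in for a trained denoiser), then re-inject
    a small, annealed amount of noise."""
    out = traj.copy()
    out[1:-1] = 0.5 * (traj[:-2] + traj[2:])
    return out + noise_scale * np.random.randn(*traj.shape)

def plan(start, goal, horizon=16, steps=40):
    traj = np.random.randn(horizon, 2)        # start from pure noise
    for t in range(steps):
        traj = denoise_step(traj, 0.5 * (1 - t / steps))
        traj[0], traj[-1] = start, goal       # condition on endpoints
    return traj

path = plan(np.array([0.0, 0.0]), np.array([5.0, 3.0]))
print(path.round(2))  # approximately a straight 2-D line of waypoints
```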
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)
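A toy contrast between the two directions (my own illustration; the paper's algorithms and analysis are more general): with a forward model we back a state's value up from its predicted successor, while with a backward model we push a surprising outcome back to predicted predecessors, which can spread credit in a single sweep:

```python
# Six-state chain with reward 1 for entering state 5 (all illustrative).
GAMMA, ALPHA = 0.9, 0.5
V = [0.0] * 6
forward = {s: s + 1 for s in range(5)}          # forward model: s -> s'
backward = {s2: s2 - 1 for s2 in range(1, 6)}   # backward model: s' -> s
reward = {5: 1.0}                               # reward on entering s2

def forethought_update(s):
    """Forward model ('forethought'): back up V(s) from its successor."""
    s2 = forward[s]
    V[s] += ALPHA * (reward.get(s2, 0.0) + GAMMA * V[s2] - V[s])

def hindsight_update(s2):
    """Backward model ('hindsight'): update the predicted predecessor
    of s2 toward the newly revised estimate of s2."""
    s = backward[s2]
    V[s] += ALPHA * (reward.get(s2, 0.0) + GAMMA * V[s2] - V[s])

for s in range(5):           # one forward sweep: only V(4) moves
    forethought_update(s)
print([round(v, 3) for v in V])   # [0.0, 0.0, 0.0, 0.0, 0.5, 0.0]
for s2 in (5, 4, 3, 2, 1):   # one backward sweep: credit spreads down
    hindsight_update(s2)
print([round(v, 3) for v in V])
```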
- Robust Hierarchical Planning with Policy Delegation [6.1678491628787455]
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation.
We show this planning approach is experimentally very competitive with classic planning and reinforcement learning techniques on a variety of domains.
arXiv Detail & Related papers (2020-10-25T04:36:20Z)
- A Unifying Framework for Reinforcement Learning and Planning [2.564530030795554]
This paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP).
At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions.
arXiv Detail & Related papers (2020-06-26T14:30:41Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
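The divide-and-conquer idea can be sketched without the tree search (a simplification: subgoals are scored greedily here, where DC-MCTS would select them with Monte Carlo Tree Search and learned priors). The plan for start -> goal is built by recursively planning start -> subgoal and subgoal -> goal:

```python
# Waypoint planning on a grid by recursive subgoal splitting
# (illustrative; not the paper's implementation).
def dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])   # Manhattan distance

def plan(start, goal, depth=4):
    if depth == 0 or dist(start, goal) <= 1:
        return [start, goal] if start != goal else [start]
    # Candidate subgoals near the geometric midpoint; DC-MCTS would
    # search over such candidates rather than pick greedily.
    mx, my = (start[0] + goal[0]) // 2, (start[1] + goal[1]) // 2
    candidates = [(mx + dx, my + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    candidates = [c for c in candidates if c not in (start, goal)]
    sub = min(candidates, key=lambda m: dist(start, m) + dist(m, goal))
    left, right = plan(start, sub, depth - 1), plan(sub, goal, depth - 1)
    return left + right[1:]   # splice halves, dropping the shared subgoal

# Waypoints from (0,0) to (7,5); gaps left at the depth limit would be
# handed to a low-level controller.
print(plan((0, 0), (7, 5)))
```

The point of the structure is that plan order is no longer left-to-right: the two halves are solved independently, which is the flexibility the paper attributes its navigation gains to.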
- The Two Regimes of Deep Network Training [93.84309968956941]
We study the effects of different learning rate schedules and the appropriate way to select them.
To this end, we isolate two distinct phases, which we refer to as the "large-step regime" and the "small-step regime".
Our training algorithm can significantly simplify learning rate schedules.
arXiv Detail & Related papers (2020-02-24T17:08:24Z)
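A schedule in the spirit of the two regimes can be written in one function (the rates and switch point below are invented for illustration, not taken from the paper): train with a large step size first, then drop to a small one for the final phase.

```python
def two_regime_lr(step, total_steps, large=0.1, small=0.001, switch=0.8):
    """Large-step regime for the first `switch` fraction of training,
    then the small-step regime (all numbers are illustrative)."""
    return large if step < switch * total_steps else small

# Hypothetical use inside a training loop with a torch-style optimizer:
#   for g in optimizer.param_groups:
#       g["lr"] = two_regime_lr(step, total_steps)
print([two_regime_lr(s, 100) for s in (0, 50, 79, 80, 99)])
# -> [0.1, 0.1, 0.1, 0.001, 0.001]
```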