Model-based Reinforcement Learning with Multi-step Plan Value Estimation
- URL: http://arxiv.org/abs/2209.05530v1
- Date: Mon, 12 Sep 2022 18:22:11 GMT
- Title: Model-based Reinforcement Learning with Multi-step Plan Value Estimation
- Authors: Haoxin Lin, Yihao Sun, Jiaji Zhang, Yang Yu
- Abstract summary: We introduce multi-step plans to replace multi-step actions for model-based RL.
The new model-based reinforcement learning algorithm MPPVE shows better utilization of the learned model and achieves better sample efficiency than state-of-the-art model-based RL approaches.
- Score: 4.158979444110977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based methods are a promising way to improve the sample
efficiency of reinforcement learning, since much of the exploration and
evaluation can happen in the learned model, saving real-world samples.
However, when the learned model has non-negligible error, sequential steps
in the model are hard to evaluate accurately, which limits how much the model
can be used. This paper proposes to alleviate this issue by replacing
multi-step actions with multi-step plans in model-based RL. We employ
multi-step plan value estimation, which evaluates the expected discounted
return after executing a sequence of action plans at a given state, and update
the policy by directly computing the multi-step policy gradient via the plan
value estimate. The resulting algorithm, MPPVE (Model-based Planning Policy
Learning with Multi-step Plan Value Estimation), makes better use of the
learned model and achieves better sample efficiency than state-of-the-art
model-based RL approaches.
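As a reading aid, here is a rough formalization of the plan value described in the abstract. This is a sketch based only on the abstract; the notation (plan length k, discount γ, value function V^π) is assumed for illustration and may differ from the paper's own definitions.

```latex
% k-step plan value: expected discounted return of executing the plan
% (a_t, ..., a_{t+k-1}) from state s_t and then following the policy \pi.
% Notation is assumed for illustration; the paper may define it differently.
Q^{\pi}\bigl(s_t, a_t, \dots, a_{t+k-1}\bigr)
  = \mathbb{E}\!\left[\sum_{i=0}^{k-1} \gamma^{i}\, r(s_{t+i}, a_{t+i})
    + \gamma^{k}\, V^{\pi}(s_{t+k})\right]

% Multi-step policy-gradient sketch: the plan is generated by unrolling the
% policy \pi_\theta (e.g., through the learned model), and the policy is
% updated by differentiating the plan value with respect to \theta.
\nabla_{\theta} J(\theta)
  \approx \mathbb{E}_{s_t}\!\left[\nabla_{\theta}\,
    Q^{\pi}\bigl(s_t, a_t(\theta), \dots, a_{t+k-1}(\theta)\bigr)\right]
```

Per the abstract, the point of evaluating a whole k-step plan at once, rather than evaluating each model-generated step separately, is to make better use of a model whose per-step errors would otherwise accumulate across sequential evaluations.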
Related papers
- Plan To Predict: Learning an Uncertainty-Foreseeing Model for
Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z) - Sample-Efficient Reinforcement Learning via Conservative Model-Based
Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning; a generic sketch of MPC over a learned model appears after this list.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Sample Efficient Reinforcement Learning via Model-Ensemble Exploration
and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - On the model-based stochastic value gradient for continuous
reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z) - Policy-Aware Model Learning for Policy Gradient Methods [29.129883702165774]
This paper considers the problem of learning a model in model-based reinforcement learning (MBRL).
We propose that the model learning module should incorporate the way the planner is going to use the model.
We call this approach Policy-Aware Model Learning (PAML).
arXiv Detail & Related papers (2020-02-28T19:18:18Z)
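The "Evaluating model-based planning and planner amortization" entry above mentions combining model predictive control with a learned model. As a generic illustration of that planning scheme (not the specific method of that paper or of MPPVE), here is a minimal random-shooting MPC sketch over a placeholder learned model; the dynamics, reward function, and hyperparameters are all assumptions made for the example.

```python
# Generic random-shooting MPC over a learned dynamics model.
# Illustrative sketch only: the dynamics model, reward, and settings below
# are placeholders, not taken from any paper in this list.
import numpy as np

rng = np.random.default_rng(0)

def learned_dynamics(state, action):
    # Placeholder for a learned model s' = f(s, a); a real implementation
    # would be a trained neural network or an ensemble of them.
    return state + 0.1 * action

def reward_fn(state, action):
    # Placeholder reward: keep the state near the origin with small actions.
    return -np.sum(state ** 2) - 0.01 * np.sum(action ** 2)

def mpc_action(state, horizon=10, n_candidates=256, action_dim=2, gamma=0.99):
    """Score random action sequences by rolling them out in the learned model,
    then execute only the first action of the best sequence (receding horizon)."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, plan in enumerate(candidates):
        s = state.copy()
        for t, a in enumerate(plan):
            returns[i] += (gamma ** t) * reward_fn(s, a)
            s = learned_dynamics(s, a)
    return candidates[np.argmax(returns), 0]

if __name__ == "__main__":
    state = np.array([1.0, -0.5])
    for step in range(5):
        action = mpc_action(state)
        # In a real loop this transition would come from the environment,
        # and the collected data would be used to refit the learned model.
        state = learned_dynamics(state, action)
        print(step, action, state)
```

Amortization, as discussed in that entry, would then mean training a policy network to imitate the planner's chosen actions so that planning is no longer needed at execution time.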