COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically
for Model-Based RL
- URL: http://arxiv.org/abs/2310.07220v2
- Date: Sat, 30 Dec 2023 04:16:38 GMT
- Title: COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically
for Model-Based RL
- Authors: Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn
Wongkamjan, Huazhe Xu, Furong Huang
- Abstract summary: Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration using the current policy for dynamics model learning.
$\texttt{COPlanner}$ is a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model.
- Score: 50.385005413810084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dyna-style model-based reinforcement learning contains two phases: model
rollouts to generate samples for policy learning, and real environment
exploration using the current policy for dynamics model learning. However,
because real-world environments are complex, the learned dynamics model
inevitably contains prediction errors, which can mislead policy learning and
result in sub-optimal solutions. In this paper, we propose
$\texttt{COPlanner}$, a planning-driven framework for model-based methods that
addresses the problem of an inaccurately learned dynamics model through
conservative model rollouts and optimistic environment exploration.
$\texttt{COPlanner}$ leverages
an uncertainty-aware policy-guided model predictive control (UP-MPC) component
to plan for multi-step uncertainty estimation. This estimated uncertainty is
then used to choose actions, serving as a penalty during model rollouts and as
a bonus during real environment exploration. Consequently,
$\texttt{COPlanner}$ can avoid regions where the model is uncertain through
conservative model rollouts, thereby alleviating the influence of model error.
Simultaneously, it explores high-reward but uncertain regions to actively
reduce model error through optimistic real environment exploration.
$\texttt{COPlanner}$ is a plug-and-play framework that can be applied to any
Dyna-style model-based method. Experimental results on a series of
proprioceptive and visual continuous control tasks demonstrate that both the
sample efficiency and asymptotic performance of strong model-based methods
improve significantly when combined with $\texttt{COPlanner}$.
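The abstract does not spell out the planning procedure in code, so below is a minimal, hypothetical sketch of the UP-MPC idea it describes: score policy-proposed action sequences under an ensemble of learned dynamics models, estimate multi-step uncertainty from ensemble disagreement, and either subtract it (conservative model rollouts) or add it (optimistic real-environment exploration) when choosing actions. The ensemble, reward, policy, and hyperparameters here are toy placeholders, not the paper's implementation.

```python
# Sketch of uncertainty-aware policy-guided MPC (UP-MPC) as described in the
# abstract. Everything below is a toy stand-in, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

ENSEMBLE_SIZE, HORIZON, NUM_CANDIDATES = 5, 5, 16
STATE_DIM, ACTION_DIM = 4, 2

# Toy linear dynamics ensemble standing in for the learned models.
ensemble = [
    (rng.normal(scale=0.3, size=(STATE_DIM, STATE_DIM)),
     rng.normal(scale=0.3, size=(STATE_DIM, ACTION_DIM)))
    for _ in range(ENSEMBLE_SIZE)
]

def predict(member, state, action):
    A, B = member
    return state + A @ state + B @ action

def reward(state, action):
    # Placeholder reward: stay near the origin with small actions.
    return -float(state @ state) - 0.1 * float(action @ action)

def propose_action(state):
    # Placeholder for the current learned policy (policy-guided candidates).
    return np.tanh(rng.normal(scale=0.5, size=ACTION_DIM))

def up_mpc_action(state, uncertainty_weight):
    """Return the first action of the best-scoring candidate sequence.

    uncertainty_weight < 0: penalize uncertainty (conservative model rollouts).
    uncertainty_weight > 0: reward uncertainty (optimistic exploration).
    """
    best_score, best_action = -np.inf, None
    for _ in range(NUM_CANDIDATES):
        states = [np.copy(state) for _ in range(ENSEMBLE_SIZE)]
        total_return, total_uncertainty, first_action = 0.0, 0.0, None
        for t in range(HORIZON):
            action = propose_action(states[0])
            if t == 0:
                first_action = action
            states = [predict(m, s, action) for m, s in zip(ensemble, states)]
            # Multi-step uncertainty: disagreement across ensemble predictions.
            total_uncertainty += float(np.std(np.stack(states), axis=0).mean())
            total_return += float(np.mean([reward(s, action) for s in states]))
        score = total_return + uncertainty_weight * total_uncertainty
        if score > best_score:
            best_score, best_action = score, first_action
    return best_action

s0 = rng.normal(size=STATE_DIM)
print("conservative action:", up_mpc_action(s0, uncertainty_weight=-1.0))
print("optimistic action:  ", up_mpc_action(s0, uncertainty_weight=+1.0))
```

In this sketch the same planner serves both phases; only the sign of the uncertainty term changes, which mirrors the penalty-versus-bonus role the abstract assigns to the estimated uncertainty.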
Related papers
- When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Plan To Predict: Learning an Uncertainty-Foreseeing Model for
Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision-making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z) - Conservative Bayesian Model-Based Value Expansion for Offline Policy
Optimization [41.774837419584735]
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy.
Model-based approaches are particularly appealing since they can extract more learning signals from the logged dataset by learning a model of the environment.
arXiv Detail & Related papers (2022-10-07T20:13:50Z) - Sample-Efficient Reinforcement Learning via Conservative Model-Based
Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Bootstrapped model learning and error correction for planning with
uncertainty in model-based RL [1.370633147306388]
A natural aim is to learn a model that accurately reflects the dynamics of the environment.
This paper explores the problem of model misspecification through uncertainty-aware reinforcement learning agents.
We propose a bootstrapped multi-headed neural network that learns the distribution of future states and rewards.
arXiv Detail & Related papers (2020-04-15T15:41:21Z) - Policy-Aware Model Learning for Policy Gradient Methods [29.129883702165774]
This paper considers the problem of learning a model in model-based reinforcement learning (MBRL)
We propose that the model learning module should incorporate the way the planner is going to use the model.
We call this approach Policy-Aware Model Learning (PAML)
arXiv Detail & Related papers (2020-02-28T19:18:18Z)