Bidirectional Model-based Policy Optimization
- URL: http://arxiv.org/abs/2007.01995v2
- Date: Tue, 29 Sep 2020 13:58:33 GMT
- Title: Bidirectional Model-based Policy Optimization
- Authors: Hang Lai, Jian Shen, Weinan Zhang, Yong Yu
- Abstract summary: Model-based reinforcement learning approaches leverage a forward dynamics model to support planning and decision making.
In this paper, we propose to additionally construct a backward dynamics model to reduce the reliance on accuracy in forward model predictions.
We develop a novel method, called Bidirectional Model-based Policy Optimization (BMPO), to utilize both the forward model and backward model to generate short branched rollouts for policy optimization.
- Score: 30.732572976324516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning approaches leverage a forward dynamics
model to support planning and decision making, which, however, may fail
catastrophically if the model is inaccurate. Although there are several
existing methods dedicated to combating the model error, the potential of the
single forward model is still limited. In this paper, we propose to
additionally construct a backward dynamics model to reduce the reliance on
accuracy in forward model predictions. We develop a novel method, called
Bidirectional Model-based Policy Optimization (BMPO), to utilize both the
forward model and backward model to generate short branched rollouts for policy
optimization. Furthermore, we theoretically derive a tighter bound of return
discrepancy, which shows the superiority of BMPO against the one using merely
the forward model. Extensive experiments demonstrate that BMPO outperforms
state-of-the-art model-based methods in terms of sample efficiency and
asymptotic performance.
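The central mechanism can be pictured as branching short model rollouts in both time directions from real states. Below is a minimal, self-contained sketch of that idea in Python; the toy `forward_model`, `backward_model`, `policy`, and `backward_policy` functions are illustrative stand-ins under simple assumptions, not the paper's learned networks or official code.

```python
import numpy as np

# Hypothetical stand-ins for learned components (toy linear dynamics).
rng = np.random.default_rng(0)
forward_model = lambda s, a: s + 0.1 * a              # s_{t+1} = f(s_t, a_t)
backward_model = lambda s, a: s - 0.1 * a             # s_{t-1} = b(s_t, a_{t-1})
policy = lambda s: rng.normal(size=s.shape)           # a_t ~ pi(.|s_t)
backward_policy = lambda s: rng.normal(size=s.shape)  # a_{t-1} ~ pi_b(.|s_t)

def bidirectional_rollout(s0, k_forward=3, k_backward=3):
    """From a real state s0, roll the forward model ahead and the
    backward model behind, yielding short branched model transitions."""
    transitions = []
    s = s0
    for _ in range(k_forward):                        # forward branch
        a = policy(s)
        s_next = forward_model(s, a)
        transitions.append((s, a, s_next))
        s = s_next
    s = s0
    for _ in range(k_backward):                       # backward branch
        a_prev = backward_policy(s)
        s_prev = backward_model(s, a_prev)
        transitions.append((s_prev, a_prev, s))
        s = s_prev
    return transitions

# Generated transitions would be stored in a model buffer and consumed
# by an off-policy actor-critic learner.
model_buffer = bidirectional_rollout(np.zeros(4))
```

Because both branches are kept short and start from real states, errors from either model have little room to compound, which is what the tighter return-discrepancy bound formalizes.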
Related papers
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning [11.679095516650593]
We propose the Any-step Dynamics Model (ADM) to mitigate the compounding error by reducing bootstrapping prediction to direct prediction.
ADM allows for the use of variable-length plans as inputs for predicting future states without frequent bootstrapping.
We design two algorithms, ADMPO-ON and ADMPO-OFF, which apply ADM in online and offline model-based frameworks.
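To make the any-step idea concrete, the sketch below contrasts direct multi-step prediction with the usual bootstrapped one-step chaining; `any_step_model` and `one_step` are toy stand-ins for illustration only, not the ADM architecture.

```python
import numpy as np

# Predict s_{t+k} directly from (s_t, a_t, ..., a_{t+k-1}) instead of
# chaining a one-step model k times (which compounds its errors).
def any_step_model(state, actions):
    # toy linear dynamics used purely for illustration
    return state + 0.1 * np.sum(actions, axis=0)

def bootstrapped_prediction(one_step_model, state, actions):
    """Classic k-step rollout: each step bootstraps on the previous prediction."""
    for a in actions:
        state = one_step_model(state, a)
    return state

one_step = lambda s, a: s + 0.1 * a
state = np.zeros(4)
actions = np.random.default_rng(0).normal(size=(5, 4))
direct = any_step_model(state, actions)             # one call, no bootstrapping
chained = bootstrapped_prediction(one_step, state, actions)
```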
arXiv Detail & Related papers (2024-05-27T10:33:53Z)
- Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem.
The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
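As a rough illustration of planning by iterative denoising, the following toy loop refines a random trajectory with a placeholder `denoiser`; it mimics only the shape of the procedure and is not the paper's trained diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)
horizon, state_dim = 16, 4

def denoiser(traj, t):
    # placeholder: pulls the trajectory toward a smooth path; a real model
    # would predict noise conditioned on the diffusion step t
    smoothed = 0.5 * (np.roll(traj, 1, axis=0) + np.roll(traj, -1, axis=0))
    return traj - smoothed

def plan_by_denoising(n_steps=50):
    traj = rng.normal(size=(horizon, state_dim))      # start from pure noise
    for t in reversed(range(n_steps)):
        traj = traj - 0.1 * denoiser(traj, t)         # gradual refinement
        if t > 0:
            traj = traj + 0.01 * rng.normal(size=traj.shape)  # keep some stochasticity
    return traj                                       # denoised plan over the horizon

plan = plan_by_denoising()
```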
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors [10.906666680425754]
We propose a Dyna-style model-based reinforcement learning algorithm, which we call Maximum Entropy Model Rollouts (MEMR).
To eliminate the compounding errors, we only use our model to generate single-step rollouts.
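A minimal Dyna-style sketch of the single-step rollout scheme follows; the `model` and `policy` below are toy stand-ins, not the MEMR implementation. Because every model transition starts from a real state, one-step model error never compounds.

```python
import numpy as np

rng = np.random.default_rng(0)
model = lambda s, a: s + 0.1 * a                  # toy learned dynamics
policy = lambda s: rng.normal(size=s.shape)       # toy stochastic policy

# States previously collected from the real environment.
real_buffer = [rng.normal(size=4) for _ in range(1000)]

def single_step_rollouts(n=256):
    """Sample real states and take exactly one model step from each."""
    batch = rng.choice(len(real_buffer), size=n, replace=False)
    rollouts = []
    for i in batch:
        s = real_buffer[i]
        a = policy(s)
        rollouts.append((s, a, model(s, a)))      # exactly one model step
    return rollouts

model_buffer = single_step_rollouts()
```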
arXiv Detail & Related papers (2020-06-08T21:38:15Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
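The sketch below illustrates the general idea of exploiting model differentiability: unroll the policy through a differentiable model and backpropagate the model-based return to the policy parameters. The linear modules are toy placeholders chosen for brevity, not the MAAC architecture.

```python
import torch

# Toy differentiable components standing in for learned networks.
state_dim, action_dim = 4, 1
dynamics = torch.nn.Linear(state_dim + action_dim, state_dim)  # s' = f(s, a)
reward = torch.nn.Linear(state_dim + action_dim, 1)            # r(s, a)
policy = torch.nn.Linear(state_dim, action_dim)                # a = pi(s)

def model_based_return(s0, horizon=5, gamma=0.99):
    """Unroll the policy through the model; every step is differentiable,
    so the return's gradient w.r.t. the policy parameters comes from
    ordinary backpropagation."""
    s, total = s0, 0.0
    for t in range(horizon):
        a = torch.tanh(policy(s))
        sa = torch.cat([s, a], dim=-1)
        total = total + (gamma ** t) * reward(sa).squeeze(-1)
        s = dynamics(sa)
    return total.mean()

s0 = torch.randn(32, state_dim)          # batch of start states
loss = -model_based_return(s0)           # maximize the model-based return
loss.backward()                          # pathwise gradients flow through the model
# A real implementation would update only the policy parameters here.
```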
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.