Policy-Aware Model Learning for Policy Gradient Methods
- URL: http://arxiv.org/abs/2003.00030v2
- Date: Mon, 4 Jan 2021 03:20:54 GMT
- Title: Policy-Aware Model Learning for Policy Gradient Methods
- Authors: Romina Abachi, Mohammad Ghavamzadeh, Amir-massoud Farahmand
- Abstract summary: This paper considers the problem of learning a model in model-based reinforcement learning (MBRL)
We propose that the model learning module should incorporate the way the planner is going to use the model.
We call this approach Policy-Aware Model Learning (PAML)
- Score: 29.129883702165774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the problem of learning a model in model-based
reinforcement learning (MBRL). We examine how the planning module of an MBRL
algorithm uses the model, and propose that the model learning module should
incorporate the way the planner is going to use the model. This is in contrast
to conventional model learning approaches, such as those based on maximum
likelihood estimate, that learn a predictive model of the environment without
explicitly considering the interaction of the model and the planner. We focus
on policy gradient type of planning algorithms and derive new loss functions
for model learning that incorporate how the planner uses the model. We call
this approach Policy-Aware Model Learning (PAML). We theoretically analyze a
generic model-based policy gradient algorithm and provide a convergence
guarantee for the optimized policy. We also empirically evaluate PAML on some
benchmark problems, showing promising results.
Related papers
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically
for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate sample for policy learning and real environment exploration.
$textttCOPlanner$ is a planning-driven framework for model-based methods to address the inaccurately learned dynamics model problem.
arXiv Detail & Related papers (2023-10-11T06:10:07Z) - The Virtues of Laziness in Model-based RL: A Unified Objective and
Algorithms [37.025378882978714]
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL)
Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy.
We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains.
arXiv Detail & Related papers (2023-03-01T17:42:26Z) - Plan To Predict: Learning an Uncertainty-Foreseeing Model for
Model-Based Reinforcement Learning [32.24146877835396]
We propose emphPlan To Predict (P2P), a framework that treats the model rollout process as a sequential decision making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Model-based Reinforcement Learning with Multi-step Plan Value Estimation [4.158979444110977]
We introduce multi-step plans to replace multi-step actions for model-based RL.
The new model-based reinforcement learning algorithm MPPVE shows a better utilization of the learned model and achieves a better sample efficiency than state-of-the-art model-based RL approaches.
arXiv Detail & Related papers (2022-09-12T18:22:11Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well or better as compared to prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.