Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
- URL: http://arxiv.org/abs/2207.12141v4
- Date: Mon, 26 Jun 2023 02:38:27 GMT
- Title: Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
- Authors: Xiyao Wang, Wichayaporn Wongkamjan, Furong Huang
- Abstract summary: We learn a dynamics model that fits under the empirical state-action visitation distribution for all historical policies.
We then propose a novel dynamics model learning method, named textitPolicy-adapted Dynamics Model Learning (PDML).
Experiments on a range of continuous control environments in MuJoCo show that PDML achieves significant improvement in sample efficiency and higher performance combined with the state-of-the-art model-based RL methods.
- Score: 13.819070455425075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (RL) often achieves higher sample
efficiency in practice than model-free RL by learning a dynamics model to
generate samples for policy learning. Previous works learn a dynamics model
that fits under the empirical state-action visitation distribution for all
historical policies, i.e., the sample replay buffer. However, in this paper, we
observe that fitting the dynamics model under the distribution for \emph{all
historical policies} does not necessarily benefit model prediction for the
\emph{current policy} since the policy in use is constantly evolving over time.
The evolving policy during training will cause state-action visitation
distribution shifts. We theoretically analyze how this distribution shift over
historical policies affects the model learning and model rollouts. We then
propose a novel dynamics model learning method, named \textit{Policy-adapted
Dynamics Model Learning (PDML)}. PDML dynamically adjusts the historical policy
mixture distribution to ensure the learned model can continually adapt to the
state-action visitation distribution of the evolving policy. Experiments on a
range of continuous control environments in MuJoCo show that PDML achieves
significant improvement in sample efficiency and higher asymptotic performance
combined with the state-of-the-art model-based RL methods.
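The abstract does not spell out PDML's mixture-adjustment rule, but the core idea of reweighting replay data toward the current policy can be sketched. The `PolicyAdaptedBuffer` class, the geometric `decay` parameter, and the segment-per-policy layout below are illustrative assumptions, not the paper's actual algorithm:

```python
import random


class PolicyAdaptedBuffer:
    """Replay buffer that keeps transitions grouped by the policy
    iteration that collected them, so model-training batches can be
    reweighted toward recent (more on-policy) data."""

    def __init__(self, decay=0.8):
        self.segments = []   # one list of transitions per policy iteration
        self.decay = decay   # geometric decay applied to older policies

    def new_policy(self):
        """Call once per policy update to open a new segment."""
        self.segments.append([])

    def add(self, transition):
        self.segments[-1].append(transition)

    def sample(self, batch_size):
        """Sample transitions with segment i weighted by decay^age,
        so the newest policy's data dominates the batch."""
        n = len(self.segments)
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        total = sum(w for w, seg in zip(weights, self.segments) if seg)
        batch = []
        for _ in range(batch_size):
            r = random.uniform(0, total)
            acc = 0.0
            for w, seg in zip(weights, self.segments):
                if not seg:
                    continue
                acc += w
                if r <= acc:
                    batch.append(random.choice(seg))
                    break
        return batch
```

Training the dynamics model on batches drawn this way biases it toward transitions from recent policies, approximating the state-action visitation distribution of the evolving policy rather than the uniform historical mixture.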
Related papers
- Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models [19.05224410249602]
We propose a novel approach for offline reinforcement learning with closed-loop policy evaluation and world-model adaptation.
We analyze the performance of the proposed method and provide an upper bound on the return gap between our method and the real environment under an optimal policy.
arXiv Detail & Related papers (2024-05-30T09:34:31Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real-environment exploration.
COPlanner is a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model.
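One common way to realize "roll out conservatively, explore optimistically" is to use ensemble disagreement as an uncertainty signal, penalizing it inside model rollouts and rewarding it during real-environment exploration. The sketch below assumes this ensemble-disagreement formulation; `select_action`, `penalty`, and the candidate-scoring scheme are illustrative, not COPlanner's actual planner:

```python
import numpy as np


def ensemble_uncertainty(models, state, action):
    """Disagreement (std of predicted next states) across an ensemble."""
    preds = np.stack([m(state, action) for m in models])
    return preds.std(axis=0).mean()


def select_action(models, reward_fn, state, candidates, mode, penalty=1.0):
    """Score candidate actions; subtract model uncertainty during
    conservative model rollouts, add it as an exploration bonus
    during optimistic real-environment interaction."""
    sign = -penalty if mode == "rollout" else +penalty
    scores = [reward_fn(state, a) + sign * ensemble_uncertainty(models, state, a)
              for a in candidates]
    return candidates[int(np.argmax(scores))]
```

With equal predicted rewards, the "rollout" mode picks the action the ensemble agrees on, while the exploration mode picks the one the ensemble disagrees on most.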
arXiv Detail & Related papers (2023-10-11T06:10:07Z)
- Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision-making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z)
- Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief [3.0036519884678894]
Model-based offline reinforcement learning (RL) aims to find a highly rewarding policy by leveraging a previously collected static dataset and a dynamics model.
In this work, we maintain a belief distribution over dynamics, and evaluate/optimize policy through biased sampling from the belief.
We show that the biased sampling naturally induces an updated dynamics belief with policy-dependent reweighting factor, termed Pessimism-Modulated Dynamics Belief.
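A minimal way to sketch a pessimism-modulated belief is a softmin reweighting of ensemble dynamics models by the return each predicts for the current policy, with `kappa` controlling the degree of pessimism. The function names and the softmin form are assumptions for illustration, not the paper's derivation:

```python
import math
import random


def pessimistic_weights(base_weights, values, kappa=1.0):
    """Reweight a belief over dynamics models so that models predicting
    lower return for the current policy receive more probability mass
    (softmin over predicted values; kappa=0 recovers the base belief)."""
    logits = [math.log(w) - kappa * v for w, v in zip(base_weights, values)]
    m = max(logits)                       # stabilize the exponentials
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_model(models, base_weights, values, kappa=1.0):
    """Draw one dynamics model from the pessimism-modulated belief."""
    w = pessimistic_weights(base_weights, values, kappa)
    return random.choices(models, weights=w, k=1)[0]
```

Rollouts sampled this way are biased toward dynamics under which the current policy performs poorly, which is the conservative behavior offline RL needs.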
arXiv Detail & Related papers (2022-10-13T03:14:36Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and use the learned model together with the fixed dataset for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
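Representing a policy with a conditional diffusion model means sampling an action by reverse diffusion conditioned on the state. The following is a bare DDPM-style reverse loop with a hypothetical `noise_pred` network; the schedule and update rule follow the standard DDPM mean update, not Diffusion-QL's exact implementation:

```python
import numpy as np


def sample_action(noise_pred, state, betas, rng):
    """Reverse-diffusion sampling of a (scalar) action conditioned on
    the state. noise_pred(state, a_t, t) stands in for a learned
    noise-prediction network; any callable with that signature works."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    a = rng.standard_normal()                     # a_T ~ N(0, 1)
    for t in reversed(range(len(betas))):
        eps = noise_pred(state, a, t)
        # DDPM posterior-mean update for step t
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            a += np.sqrt(betas[t]) * rng.standard_normal()
    return a
```

In Diffusion-QL this sampler is additionally trained against a Q-function so the denoised actions maximize value, a piece deliberately omitted from this sketch.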
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Model Generation with Provable Coverability for Offline Reinforcement Learning [14.333861814143718]
Offline optimization with a dynamics-aware policy provides a new perspective on policy learning and out-of-distribution generalization.
But due to the limitations of the offline setting, the learned model cannot mimic real dynamics well enough to support reliable out-of-distribution exploration.
We propose an algorithm to generate models optimizing their coverage for the real dynamics.
arXiv Detail & Related papers (2022-06-01T08:34:09Z)
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate each dimension of the next state and reward sequentially, conditioned on the previously generated dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
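Sequential, dimension-by-dimension generation can be sketched directly: each per-dimension model sees the state, the action, and the dimensions generated so far. `dim_models` is a hypothetical list of learned conditionals, one per output dimension:

```python
def autoregressive_step(dim_models, state, action):
    """Predict the next state one dimension at a time, each prediction
    conditioned on the state, action, and previously generated
    dimensions (a chain-rule factorization of the joint)."""
    next_state = []
    for model in dim_models:
        next_state.append(model(state, action, tuple(next_state)))
    return next_state
```

The final dimension in the list can be the reward, which is then conditioned on the full predicted next state.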
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
- Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
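Trajectory-wise multiple choice learning assigns each trajectory to the best-fitting prediction head. A minimal head-selection routine (ignoring the paper's context encoder, which is not detailed here) might look like:

```python
def best_head(heads, trajectory):
    """Pick the index of the prediction head with the lowest cumulative
    one-step squared error over a trajectory of
    (state, action, next_state) tuples."""
    def traj_error(head):
        return sum((head(s, a) - s_next) ** 2
                   for s, a, s_next in trajectory)

    errors = [traj_error(h) for h in heads]
    return min(range(len(heads)), key=errors.__getitem__)
```

During training only the winning head receives the gradient for that trajectory, which is what specializes the heads to different dynamics.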
arXiv Detail & Related papers (2020-10-26T03:20:42Z)
- Model Embedding Model-Based Reinforcement Learning [4.566180616886624]
Model-based reinforcement learning (MBRL) has shown its advantage in sample efficiency over model-free reinforcement learning (MFRL).
Despite the impressive results it achieves, it still faces a trade-off between the ease of data generation and model bias.
We propose a simple and elegant model-embedding model-based reinforcement learning (MEMB) algorithm within the framework of probabilistic reinforcement learning.
arXiv Detail & Related papers (2020-06-16T15:10:28Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
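Exploiting the model's differentiability means computing the gradient of the model-rollout return with respect to the policy parameters through the rollout path itself. The toy below uses scalar linear dynamics and a central finite difference as a stand-in for backpropagation; the names, the linear policy, and the quadratic reward are all illustrative:

```python
def rollout_return(theta, s0, A, B, horizon):
    """H-step return of reward -s^2 under linear dynamics s' = A*s + B*a
    with policy a = theta * s; a smooth function of theta."""
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = theta * s
        s = A * s + B * a
        ret += -s * s
    return ret


def path_gradient(theta, s0, A, B, horizon, eps=1e-6):
    """Central-difference stand-in for backpropagating the return
    through the model rollout path with respect to theta."""
    return (rollout_return(theta + eps, s0, A, B, horizon)
            - rollout_return(theta - eps, s0, A, B, horizon)) / (2 * eps)
```

A gradient step on `theta` along `path_gradient` increases the rollout return, which is the pathwise policy improvement the paper exploits (with true backprop instead of finite differences).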
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.