A Unified Framework for Alternating Offline Model Training and Policy Learning
- URL: http://arxiv.org/abs/2210.05922v1
- Date: Wed, 12 Oct 2022 04:58:51 GMT
- Title: A Unified Framework for Alternating Offline Model Training and Policy Learning
- Authors: Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
- Abstract summary: In offline model-based reinforcement learning, we learn a dynamic model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
- Score: 62.19209005400561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In offline model-based reinforcement learning (offline MBRL), we learn a
dynamic model from historically collected data, and subsequently utilize the
learned model and fixed datasets for policy learning, without further
interacting with the environment. Offline MBRL algorithms can improve the
efficiency and stability of policy learning over model-free algorithms.
However, in most existing offline MBRL algorithms, the learning
objectives for the dynamic models and the policies are isolated from each
other. Such an objective mismatch may lead to inferior performance of the
learned agents. In this paper, we address this issue by developing an iterative
offline MBRL framework, where we maximize a lower bound of the true expected
return, by alternating between dynamic-model training and policy learning. With
the proposed unified model-policy learning framework, we achieve competitive
performance on a wide range of continuous-control offline reinforcement
learning datasets. Source code is publicly released.
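The alternating scheme described above can be illustrated with a minimal, self-contained sketch. This is not the authors' released code: the network sizes, losses, rollout length, and the simple conservatism penalty below are illustrative placeholders, and the toy dataset stands in for a real offline batch (e.g., a D4RL dataset).
```python
# Minimal sketch (not the paper's implementation) of alternating offline MBRL:
# iterate between (a) fitting a dynamics/reward model on a fixed dataset and
# (b) improving a policy with short rollouts inside that model.
import torch
import torch.nn as nn

torch.manual_seed(0)
obs_dim, act_dim = 3, 1

# Toy stand-in for an offline dataset of transitions (s, a, r, s').
N = 512
S = torch.randn(N, obs_dim)
A = torch.randn(N, act_dim)
S_next = S + 0.1 * A.repeat(1, obs_dim) + 0.01 * torch.randn(N, obs_dim)
R = -(S ** 2).sum(dim=1, keepdim=True)

model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                      nn.Linear(64, obs_dim + 1))   # predicts (s', r)
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                       nn.Linear(64, act_dim), nn.Tanh())
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(20):  # outer alternation between model and policy training
    # (a) Model step: plain regression on the offline data; a policy-aware
    #     variant would reweight samples toward transitions the current
    #     policy is likely to visit.
    for _ in range(50):
        pred = model(torch.cat([S, A], dim=1))
        model_loss = ((pred - torch.cat([S_next, R], dim=1)) ** 2).mean()
        model_opt.zero_grad()
        model_loss.backward()
        model_opt.step()

    # (b) Policy step: maximize predicted return of short imagined rollouts,
    #     minus a crude regularizer standing in for the lower-bound /
    #     conservatism term used by actual offline MBRL methods.
    for _ in range(50):
        s = S[torch.randint(0, N, (64,))]
        ret = torch.zeros(64, 1)
        for t in range(5):  # 5-step rollout inside the learned model
            a = policy(s)
            out = model(torch.cat([s, a], dim=1))
            s, r = out[:, :obs_dim], out[:, obs_dim:]
            ret = ret + (0.99 ** t) * r
        penalty = policy(S).pow(2).mean()  # placeholder regularizer
        policy_loss = -ret.mean() + 0.1 * penalty
        policy_opt.zero_grad()
        policy_loss.backward()
        policy_opt.step()
```
The key structural point is the outer loop: the dynamics model and the policy are updated in turns, so model training can, in principle, be made aware of the current policy (e.g., by reweighting transitions) rather than being fit once and frozen.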
Related papers
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control.
There could be various MDPs that behave identically on the offline dataset, so dealing with the uncertainty about the true MDP can be challenging.
We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces.
arXiv Detail & Related papers (2024-10-15T03:36:43Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective [61.4025671743675]
Off-policy learning-to-rank methods often make strong assumptions about how users generate the click data.
We show that offline reinforcement learning can adapt to various click models without complex debiasing techniques or prior knowledge of the model.
Results on various large-scale datasets demonstrate that CUOLR consistently outperforms state-of-the-art off-policy learning-to-rank algorithms.
arXiv Detail & Related papers (2023-06-13T03:46:22Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching [28.743727234246126]
We propose a new "transition occupancy matching" (TOM) objective for model learning.
Under TOM, a model is good to the extent that the current policy experiences the same distribution of transitions inside the model as in the real environment.
We show that TOM successfully focuses model learning on policy-relevant experience and drives policies faster to higher task rewards.
arXiv Detail & Related papers (2023-05-22T03:06:09Z)
- Model Generation with Provable Coverability for Offline Reinforcement Learning [14.333861814143718]
Offline optimization with a dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization.
However, due to the limitations of the offline setting, the learned model may not mimic the real dynamics well enough to support reliable out-of-distribution exploration.
We propose an algorithm to generate models optimizing their coverage for the real dynamics.
arXiv Detail & Related papers (2022-06-01T08:34:09Z)
- Online and Offline Reinforcement Learning by Planning with a Learned Model [15.8026041700727]
We describe the Reanalyse algorithm which uses model-based policy and value improvement operators to compute new improved training targets on existing data points.
We show that Reanalyse can also be used to learn entirely from demonstrations without any environment interactions.
We introduce MuZero Unplugged, a single unified algorithm for any data budget, including offline RL.
arXiv Detail & Related papers (2021-04-13T15:36:06Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
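The MOPO entry above hinges on one concrete idea: penalize the model's reward by an estimate of dynamics uncertainty before using it for policy optimization. A minimal sketch of that idea follows; the function name `penalized_reward`, the ensemble-disagreement proxy, and the coefficient `lam` are illustrative choices, not MOPO's exact formulation.
```python
# Sketch of an uncertainty-penalized model reward, in the spirit of the MOPO
# entry above. Ensemble disagreement is used here as a cheap uncertainty
# proxy; the penalty weight `lam` is an arbitrary illustrative value.
import numpy as np

def penalized_reward(reward, next_state_preds, lam=1.0):
    """reward: model-predicted reward (scalar).
    next_state_preds: one next-state prediction per ensemble member, each (obs_dim,)."""
    preds = np.stack(next_state_preds)            # (ensemble_size, obs_dim)
    uncertainty = np.linalg.norm(preds.std(axis=0))
    return reward - lam * uncertainty

# Usage: penalize each imagined transition before it feeds policy optimization.
ensemble_preds = [np.array([0.10, -0.20]),
                  np.array([0.12, -0.18]),
                  np.array([0.05, -0.25])]
print(penalized_reward(1.0, ensemble_preds, lam=0.5))   # < 1.0 when members disagree
```
The same pattern, a predicted reward minus a scaled uncertainty term, is what keeps model rollouts from being over-trusted far from the data in the offline setting.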