MOPO: Model-based Offline Policy Optimization
- URL: http://arxiv.org/abs/2005.13239v6
- Date: Sun, 22 Nov 2020 07:04:17 GMT
- Title: MOPO: Model-based Offline Policy Optimization
- Authors: Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey
Levine, Chelsea Finn, Tengyu Ma
- Abstract summary: Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting compared to model-free approaches.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
- Score: 183.6449600580806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) refers to the problem of learning
policies entirely from a large batch of previously collected data. This problem
setting offers the promise of utilizing such datasets to acquire policies
without any costly or dangerous active exploration. However, it is also
challenging, due to the distributional shift between the offline training data
and those states visited by the learned policy. Despite significant recent
progress, the most successful prior methods are model-free and constrain the
policy to the support of data, precluding generalization to unseen states. In
this paper, we first observe that an existing model-based RL algorithm already
produces significant gains in the offline setting compared to model-free
approaches. However, standard model-based RL methods, designed for the online
setting, do not provide an explicit mechanism to avoid the offline setting's
distributional shift issue. Instead, we propose to modify the existing
model-based RL methods by applying them with rewards artificially penalized by
the uncertainty of the dynamics. We theoretically show that the algorithm
maximizes a lower bound of the policy's return under the true MDP. We also
characterize the trade-off between the gain and risk of leaving the support of
the batch data. Our algorithm, Model-based Offline Policy Optimization (MOPO),
outperforms standard model-based RL algorithms and prior state-of-the-art
model-free offline RL algorithms on existing offline RL benchmarks and two
challenging continuous control tasks that require generalizing from data
collected for a different task. The code is available at
https://github.com/tianheyu927/mopo.
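In practice, the modification described above amounts to training the policy inside the learned model with a penalized reward r~(s, a) = r^(s, a) - lambda * u(s, a), where u(s, a) estimates the dynamics uncertainty at (s, a). Below is a minimal sketch, assuming an ensemble of learned dynamics models and using ensemble disagreement as a stand-in for the paper's uncertainty estimator; the function name and the particular disagreement measure are illustrative rather than MOPO's exact implementation.

    import numpy as np

    def penalized_reward(model_reward, next_state_means, lam=1.0):
        """MOPO-style reward penalty (sketch).

        model_reward     : reward predicted by the learned model for (s, a)
        next_state_means : array (n_ensemble, state_dim) of next-state means
                           predicted by each dynamics-ensemble member
        lam              : penalty coefficient (lambda in the paper)
        """
        # Ensemble disagreement as an uncertainty proxy u(s, a): the norm of
        # the per-dimension standard deviation across ensemble predictions.
        u = float(np.linalg.norm(next_state_means.std(axis=0)))
        return model_reward - lam * u

    # Example: 5 ensemble members, 3-dimensional state.
    preds = np.random.randn(5, 3)
    r_tilde = penalized_reward(1.0, preds, lam=0.5)

Rollouts generated with this penalized reward are then fed to a standard policy optimizer, so the penalty is the only change relative to ordinary model-based RL.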
Related papers
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control.
Many different MDPs can behave identically on the offline dataset, so dealing with the uncertainty about the true MDP can be challenging.
We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces.
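For context, a Bayes-adaptive MDP (BAMDP) folds the uncertainty about the true dynamics into the planning problem itself: the agent plans over belief-augmented states (s, b), where b is the posterior over dynamics models given the offline data and the transitions observed so far. In generic notation (mine, not necessarily the paper's):

    V^\pi(s, b) = \mathbb{E}\Big[\textstyle\sum_{t \ge 0} \gamma^t\, r(s_t, a_t)\Big],
    \quad a_t \sim \pi(\cdot \mid s_t, b_t),\quad
    s_{t+1} \sim P_M(\cdot \mid s_t, a_t),\ M \sim b_t,\quad
    b_{t+1} = \mathrm{BayesUpdate}(b_t;\, s_t, a_t, s_{t+1}).

The cited paper's contribution is a Monte-Carlo tree search planner for this augmented problem that handles continuous state and action spaces.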
arXiv Detail & Related papers (2024-10-15T03:36:43Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
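Model-based value expansion, in its generic form (not necessarily the exact variant used in MOTO), rolls the policy out for H steps in the learned model and bootstraps with a learned critic:

    \hat{Q}^{H}(s_0, a_0) \;=\; \sum_{t=0}^{H-1} \gamma^{t}\, \hat{r}(\hat{s}_t, a_t) \;+\; \gamma^{H}\, Q_\phi(\hat{s}_H, a_H),
    \qquad \hat{s}_0 = s_0,\ \ \hat{s}_{t+1} \sim \hat{p}(\cdot \mid \hat{s}_t, a_t),\ \ a_{t+1} \sim \pi(\cdot \mid \hat{s}_{t+1}).

Because the rollout actions come from the current policy, the value target stays on-policy even when the transitions themselves are imagined by a model trained on prior data.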
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is the distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
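Distributionally robust learning replaces the nominal Bellman backup with a worst-case backup over an uncertainty set of transition models centered at the data-estimated model. A generic form (the cited paper's construction of the uncertainty set may differ) is

    (\mathcal{T}_{\mathrm{rob}} V)(s) \;=\; \max_{a} \Big[ r(s,a) + \gamma \min_{P \in \mathcal{P}(s,a)} \mathbb{E}_{s' \sim P}\big[ V(s') \big] \Big].

Optimizing against the worst case in \mathcal{P}(s,a) hedges simultaneously against distribution shift and regions with poor data coverage.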
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
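The stationary distribution correction is the ratio w(s, a) = d^{\pi}(s,a) / d^{\mathcal{D}}(s,a) between the discounted state-action occupancy of a policy and that of the dataset. Once this ratio is estimated for the target policy, its return can be evaluated entirely from offline samples via the standard DICE-family identity (written here in generic notation):

    J(\pi) \;=\; \frac{1}{1-\gamma}\, \mathbb{E}_{(s,a) \sim d^{\mathcal{D}}}\big[ w(s,a)\, r(s,a) \big].

Estimating w directly sidesteps bootstrapped Q-targets on out-of-distribution actions, which is the usual route to overestimation.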
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
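Schematically (coefficients and sampling distributions follow the paper), COMBO's critic update augments the usual Bellman error with a term that pushes Q-values down on state-actions drawn from model rollouts and up on state-actions from the dataset:

    \min_{Q}\;\; \beta \Big( \mathbb{E}_{(s,a) \sim \rho_{\mathrm{model}}}\big[ Q(s,a) \big] - \mathbb{E}_{(s,a) \sim \mathcal{D}}\big[ Q(s,a) \big] \Big) \;+\; \mathbb{E}\big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi} Q(s,a) \big)^2 \big].

Because the conservatism acts directly on the value function, no explicit uncertainty estimate of the dynamics model is required, which is the motivation stated above.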
arXiv Detail & Related papers (2021-02-16T18:50:32Z)