MOReL : Model-Based Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2005.05951v3
- Date: Tue, 2 Mar 2021 04:35:04 GMT
- Title: MOReL : Model-Based Offline Reinforcement Learning
- Authors: Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten
Joachims
- Abstract summary: In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
- Score: 49.30091375141527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline reinforcement learning (RL), the goal is to learn a highly
rewarding policy based solely on a dataset of historical interactions with the
environment. The ability to train RL policies offline can greatly expand the
applicability of RL, its data efficiency, and its experimental velocity. Prior
work in offline RL has been confined almost exclusively to model-free RL
approaches. In this work, we present MOReL, an algorithmic framework for
model-based offline RL. This framework consists of two steps: (a) learning a
pessimistic MDP (P-MDP) using the offline dataset; and (b) learning a
near-optimal policy in this P-MDP. The learned P-MDP has the property that for
any policy, the performance in the real environment is approximately
lower-bounded by the performance in the P-MDP. This enables it to serve as a
good surrogate for purposes of policy evaluation and learning, and overcome
common pitfalls of model-based RL like model exploitation. Theoretically, we
show that MOReL is minimax optimal (up to log factors) for offline RL. Through
experiments, we show that MOReL matches or exceeds state-of-the-art results in
widely studied offline RL benchmarks. Moreover, the modular design of MOReL
enables future advances in its components (e.g. generative modeling,
uncertainty estimation, planning etc.) to directly translate into advances for
offline RL.
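A minimal sketch of step (a), the P-MDP construction, is given below, assuming a learned dynamics ensemble whose disagreement acts as the unknown state-action detector; the class and method names (PessimisticMDP, predict, reward_fn) and the pairwise-disagreement measure are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch of a MOReL-style pessimistic MDP (P-MDP).
# Assumes `ensemble` is a list of learned dynamics models, each exposing a
# predict(state, action) -> next_state method (an assumption for illustration).
import numpy as np

class PessimisticMDP:
    """Wraps a learned dynamics ensemble as a P-MDP with an absorbing HALT state."""

    def __init__(self, ensemble, reward_fn, disagreement_threshold, halt_penalty):
        self.ensemble = ensemble                  # list of learned dynamics models
        self.reward_fn = reward_fn                # learned or known reward function
        self.threshold = disagreement_threshold   # unknown state-action detector (USAD) threshold
        self.halt_penalty = halt_penalty          # large negative reward (-kappa) for unknown regions

    def step(self, state, action):
        # Each ensemble member predicts the next state for (s, a).
        preds = np.stack([m.predict(state, action) for m in self.ensemble])
        # USAD: flag (s, a) pairs where ensemble members disagree, i.e. where the
        # learned model is unreliable because the offline data gives no support.
        disagreement = np.max(np.linalg.norm(preds[:, None] - preds[None, :], axis=-1))
        if disagreement > self.threshold:
            # Absorbing HALT state with a large negative reward, so any policy
            # that wanders outside the data support is scored pessimistically.
            return None, self.halt_penalty, True   # (next_state, reward, done)
        next_state = preds.mean(axis=0)
        return next_state, self.reward_fn(state, action), False
```

Step (b) then runs a standard policy optimizer (the paper uses a natural-policy-gradient style planner) entirely inside this surrogate; because P-MDP returns approximately lower-bound real returns for every policy, improving the policy in the P-MDP conservatively improves a lower bound on its true performance.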
Related papers
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control.
Various MDPs can behave identically on the offline dataset, so dealing with uncertainty about the true MDP can be challenging.
We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces.
arXiv Detail & Related papers (2024-10-15T03:36:43Z) - Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z) - A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z) - Online Policy Optimization for Robust MDP [17.995448897675068]
Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go.
In this work, we consider the online robust Markov decision process (MDP) setting, in which the learner interacts with an unknown nominal system.
We propose a robust optimistic policy optimization algorithm that is provably efficient.
arXiv Detail & Related papers (2022-09-28T05:18:20Z) - Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is overfitting, which leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z) - Recurrent Model-Free RL is a Strong Baseline for Many POMDPs [73.39666827525782]
Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs.
In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs.
Prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs.
arXiv Detail & Related papers (2021-10-11T07:09:14Z) - Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training policies with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
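As a concrete illustration of MOPO's uncertainty-penalized reward, the sketch below subtracts a scaled ensemble-disagreement term from the learned reward; the particular penalty (norm of the ensemble's standard deviation) and the names `penalized_reward` and `lam` are assumptions for illustration rather than MOPO's exact implementation.

```python
# Sketch of an uncertainty-penalized reward in the spirit of MOPO:
# r_tilde(s, a) = r_hat(s, a) - lam * u(s, a), where u estimates model error.
import numpy as np

def penalized_reward(reward, state, action, ensemble, lam=1.0):
    # Assumes each ensemble member exposes predict(state, action) -> next_state.
    preds = np.stack([m.predict(state, action) for m in ensemble])
    # One common uncertainty estimate: norm of the ensemble's per-dimension std.
    u = np.linalg.norm(preds.std(axis=0))
    return reward - lam * u
```

Policies trained on the penalized reward inside the learned model are thereby discouraged from exploiting regions where the dynamics ensemble is uncertain.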