Revisiting Design Choices in Model-Based Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2110.04135v1
- Date: Fri, 8 Oct 2021 13:51:34 GMT
- Title: Revisiting Design Choices in Model-Based Offline Reinforcement Learning
- Authors: Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne,
Stephen J. Roberts
- Abstract summary: Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies.
This paper compares existing uncertainty heuristics and designs novel protocols to investigate their interaction with other hyperparameters, such as the number of models or the imaginary rollout horizon.
- Score: 39.01805509055988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning enables agents to leverage large pre-collected
datasets of environment transitions to learn control policies, circumventing
the need for potentially expensive or unsafe online data collection.
Significant progress has been made recently in offline model-based
reinforcement learning, a family of approaches that leverage a learned dynamics model.
This typically involves constructing a probabilistic model, and using the model
uncertainty to penalize rewards where there is insufficient data, solving for a
pessimistic MDP that lower bounds the true MDP. Existing methods, however,
exhibit a breakdown between theory and practice, whereby pessimistic return
ought to be bounded by the total variation distance of the model from the true
dynamics, but is instead implemented through a penalty based on estimated model
uncertainty. This has spawned a variety of uncertainty heuristics, with little
to no comparison between differing approaches. In this paper, we compare these
heuristics, and design novel protocols to investigate their interaction with
other hyperparameters, such as the number of models, or imaginary rollout
horizon. Using these insights, we show that selecting these key hyperparameters
using Bayesian Optimization produces superior configurations that are vastly
different to those currently used in existing hand-tuned state-of-the-art
methods, and result in drastically stronger performance.
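The uncertainty penalty described in the abstract is typically computed from the disagreement of an ensemble of learned dynamics models. As a minimal sketch of one such heuristic (the function name, the max-deviation measure, and the penalty weight `lam` are illustrative assumptions in the style of MOPO, not the paper's implementation):

```python
import numpy as np

def penalized_reward(reward, ensemble_next_states, lam=1.0):
    """Pessimistic reward: subtract an uncertainty penalty estimated
    from ensemble disagreement on the predicted next state."""
    preds = np.asarray(ensemble_next_states, dtype=float)  # (n_models, state_dim)
    mean = preds.mean(axis=0)
    # One common heuristic: largest L2 deviation of any member from the mean
    disagreement = np.linalg.norm(preds - mean, axis=1).max()
    return reward - lam * disagreement
```

When all ensemble members agree the penalty vanishes; in regions with little data the members diverge and the reward is pushed down, yielding the pessimistic MDP described above. The penalty weight `lam`, the ensemble size, and the rollout horizon are exactly the kind of hyperparameters the paper proposes tuning with Bayesian Optimization.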
Related papers
- Deep autoregressive density nets vs neural ensembles for model-based
offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
- Model-based Offline Policy Optimization with Adversarial Network [0.36868085124383626]
We propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN).
The key idea is to use adversarial learning to build a transition model with better generalization.
Our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks.
arXiv Detail & Related papers (2023-09-05T11:49:33Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training of large models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision-making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We study expressive autoregressive dynamics models that generate each dimension of the next state and reward sequentially, conditioned on the previously generated dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
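The autoregressive factorization can be sketched as follows (the function name and the per-dimension model interface are hypothetical, for illustration only): each output dimension is predicted from the state-action context plus all dimensions generated so far.

```python
import numpy as np

def autoregressive_next_state(state, action, dim_models):
    """Generate the next state one dimension at a time: each per-dimension
    model conditions on (state, action) and the dimensions already generated."""
    context = np.concatenate([state, action])
    generated = []
    for predict_dim in dim_models:  # one model (or head) per output dimension
        inp = np.concatenate([context, np.array(generated)])
        generated.append(predict_dim(inp))
    return np.array(generated)
```

In practice each `predict_dim` would be a learned conditional density (e.g. a discretized distribution) rather than a deterministic function; the sequential conditioning is what lets the model capture correlations between state dimensions that a factorized feedforward model misses.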
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action pairs.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.