RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2204.12581v1
- Date: Tue, 26 Apr 2022 20:42:14 GMT
- Title: RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- Authors: Marc Rigter, Bruno Lacerda, Nick Hawes
- Abstract summary: We present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL.
To achieve conservatism, we formulate the problem as a two-player zero-sum game against an adversarial environment model.
We evaluate our approach on widely studied offline RL benchmarks, and demonstrate that our approach achieves state-of-the-art performance.
- Score: 11.183124892686239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) aims to find near-optimal policies from
logged data without further environment interaction. Model-based algorithms,
which learn a model of the environment from the dataset and perform
conservative policy optimisation within that model, have emerged as a promising
approach to this problem. In this work, we present Robust Adversarial
Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL. To
achieve conservatism, we formulate the problem as a two-player zero-sum game
against an adversarial environment model. The model is trained to minimise the
value function while still accurately predicting the transitions in the
dataset, forcing the policy to act conservatively in areas not covered by the
dataset. To approximately solve the two-player game, we alternate between
optimising the policy and optimising the model adversarially. The problem
formulation that we address is theoretically grounded, resulting in a PAC
performance guarantee and a pessimistic value function which lower bounds the
value function in the true environment. We evaluate our approach on widely
studied offline RL benchmarks, and demonstrate that our approach achieves
state-of-the-art performance.
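The alternating scheme described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the deterministic one-step dynamics network, the simple actor-critic stand-in for the policy optimiser, the network sizes, and the `ADV_WEIGHT` trade-off coefficient are assumptions made purely for exposition of the two-player structure.

```python
# Sketch of the two-player zero-sum game from the abstract: the policy maximises
# value under the learned model, while the model is trained to fit the dataset
# AND minimise the policy's value. NOT the authors' implementation; all
# hyperparameters and architectures below are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA, ADV_WEIGHT = 4, 2, 0.99, 0.1

dynamics = nn.Sequential(  # learned environment model: (s, a) -> predicted s'
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM))
policy = nn.Sequential(    # deterministic policy: s -> a
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM), nn.Tanh())
value = nn.Sequential(     # value function V(s) shared by both players
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

model_opt = torch.optim.Adam(dynamics.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(value.parameters(), lr=3e-4)
actor_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def adversarial_model_step(s, a, s_next):
    """Model player: stay accurate on dataset transitions, minimise policy value."""
    mle_loss = ((dynamics(torch.cat([s, a], -1)) - s_next) ** 2).mean()
    rollout = dynamics(torch.cat([s, policy(s).detach()], -1))
    adv_loss = value(rollout).mean()             # push the policy's value down
    model_opt.zero_grad()
    (mle_loss + ADV_WEIGHT * adv_loss).backward()
    model_opt.step()

def policy_step(s, r):
    """Policy player: maximise value under the current (adversarial) model."""
    with torch.no_grad():                        # one-step bootstrapped target
        target = r + GAMMA * value(dynamics(torch.cat([s, policy(s)], -1)))
    critic_loss = ((value(s) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -value(dynamics(torch.cat([s, policy(s)], -1))).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Alternate the two players, mirroring "optimise the policy, then optimise the
# model adversarially" (random tensors stand in for the logged dataset).
for _ in range(3):
    s, a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
    s_next, r = torch.randn(32, STATE_DIM), torch.randn(32, 1)
    policy_step(s, r)
    adversarial_model_step(s, a, s_next)
```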
Related papers
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization [41.774837419584735]
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy.
Model-based approaches are particularly appealing since they can extract more learning signals from the logged dataset by learning a model of the environment.
arXiv Detail & Related papers (2022-10-07T20:13:50Z)
- Online Policy Optimization for Robust MDP [17.995448897675068]
Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go.
In this work, we consider the online robust Markov decision process (MDP) setting, in which the agent interacts with an unknown nominal system.
We propose a robust optimistic policy optimization algorithm that is provably efficient.
arXiv Detail & Related papers (2022-09-28T05:18:20Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves upon the sample efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI).
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
arXiv Detail & Related papers (2022-08-10T03:47:45Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action pairs.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by artificially penalizing rewards with the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
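For contrast with RAMBO's adversarial model training, the reward-penalty idea summarised in the MOPO entry above can be sketched in a few lines. The ensemble size, the penalty weight `lam`, and the use of ensemble disagreement as the uncertainty measure u(s, a) are illustrative assumptions rather than the paper's exact construction.

```python
# Sketch of an uncertainty-penalised reward, r_tilde = r_hat - lam * u(s, a),
# in the spirit of the MOPO summary above. Using the spread of an ensemble of
# dynamics models as u(s, a) is an assumption made for exposition.
import numpy as np

def penalized_reward(reward, ensemble_next_state_preds, lam=1.0):
    """Penalise the model reward by the disagreement of a dynamics ensemble.

    ensemble_next_state_preds: array of shape (n_models, state_dim) holding
    each model's prediction of s' for the same (s, a) pair.
    """
    preds = np.asarray(ensemble_next_state_preds, dtype=float)
    disagreement = np.linalg.norm(preds - preds.mean(axis=0), axis=-1).max()
    return reward - lam * disagreement

# Agreement among models -> little penalty; disagreement -> heavy penalty.
print(penalized_reward(1.0, [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]))  # ~1.0
print(penalized_reward(1.0, [[0.0, 0.0], [0.0, 0.1], [2.0, 2.0]]))  # much lower
```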