Adversarial Model for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2302.11048v2
- Date: Sun, 24 Dec 2023 14:19:08 GMT
- Title: Adversarial Model for Offline Reinforcement Learning
- Authors: Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng
- Abstract summary: We propose a model-based offline Reinforcement Learning framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR).
ARMOR can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.
We show that ARMOR achieves performance competitive with both state-of-the-art offline model-free and model-based RL algorithms.
- Score: 39.77825908168264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel model-based offline Reinforcement Learning (RL) framework,
called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can
robustly learn policies to improve upon an arbitrary reference policy
regardless of data coverage. ARMOR is designed to optimize policies for the
worst-case performance relative to the reference policy through adversarially
training a Markov decision process model. In theory, we prove that ARMOR, with
a well-tuned hyperparameter, can compete with the best policy within data
coverage when the reference policy is supported by the data. At the same time,
ARMOR is robust to hyperparameter choices: the policy learned by ARMOR, with
"any" admissible hyperparameter, would never degrade the performance of the
reference policy, even when the reference policy is not covered by the dataset.
To validate these properties in practice, we design a scalable implementation
of ARMOR which, through adversarial training, can optimize policies without using
model ensembles, in contrast to typical model-based methods. We show that ARMOR
achieves performance competitive with both state-of-the-art offline model-free
and model-based RL algorithms, and can robustly improve the reference policy
across various hyperparameter choices.
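
As a rough illustration of the relative-pessimism idea in the abstract above (a minimal sketch, not the authors' implementation; all names, hyperparameters, and the bandit setting are assumptions for exposition), the toy below pits an adversarially trained reward model against a softmax learner: the adversary fits the offline data while trying to erase the learner's advantage over the reference policy, so actions outside the data's coverage are valued pessimistically and the learner can only improve on the reference within the data's support.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4
true_reward = np.array([1.0, 0.5, 0.2, 2.0])   # action 3 is best but never observed

# Offline data covers only actions 0-2; action 3 has no coverage.
actions = rng.integers(0, 3, size=300)
rewards = true_reward[actions] + 0.1 * rng.standard_normal(actions.size)

theta = np.zeros(n_actions)                    # adversarially trained reward model
logits = np.zeros(n_actions)                   # learner's softmax policy
ref = np.array([0.0, 1.0, 0.0, 0.0])           # reference policy: always play action 1
alpha, lr = 0.1, 0.1                           # adversary weight, step size

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(3000):
    pi = softmax(logits)
    # Adversary: fit the data while minimizing the learner's gap over the
    # reference, pi @ theta - ref @ theta. Uncovered actions get no fit signal,
    # so they are pushed pessimistically low whenever the learner uses them.
    fit_grad = np.bincount(actions, weights=2.0 * (theta[actions] - rewards),
                           minlength=n_actions) / actions.size
    theta = np.clip(theta - lr * (fit_grad + alpha * (pi - ref)), -3.0, 3.0)
    # Learner: softmax policy-gradient ascent on the same gap under the
    # adversarial model (the reference term is constant w.r.t. the policy).
    logits += lr * pi * (theta - pi @ theta)

# Most mass ends up on action 0, the best action within data coverage.
print("learned policy:", softmax(logits).round(2))
```

In this toy the learner improves on the reference (action 1) by moving to the best covered action (action 0), while the adversary's pessimism keeps it away from the uncovered but truly better action 3, mirroring the "never degrade the reference" property stated in the abstract.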
Related papers
- MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning [5.399953810215838]
We develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data.
MoMA distinguishes itself from existing literature by employing an unrestricted policy class.
The effectiveness of MoMA is demonstrated via numerical studies.
arXiv Detail & Related papers (2024-01-21T03:11:50Z) - Model-based Offline Reinforcement Learning with Local Misspecification [35.75701143290119]
We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch.
We propose an empirical algorithm for optimal offline policy selection.
arXiv Detail & Related papers (2023-01-26T21:26:56Z) - ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies
with Offline Data [27.007647483635516]
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR).
ARMOR robustly learns policies to improve upon an arbitrary baseline policy regardless of data coverage.
arXiv Detail & Related papers (2022-11-08T20:15:28Z) - A Unified Framework for Alternating Offline Model Training and Policy
Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and use the learned model together with the fixed dataset for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z) - Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI).
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
arXiv Detail & Related papers (2022-08-10T03:47:45Z) - Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically utilize parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint.
arXiv Detail & Related papers (2022-02-13T07:38:36Z) - OptiDICE: Offline Policy Optimization via Stationary Distribution
Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics; a minimal sketch of this penalty follows this entry.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
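
As a hedged sketch of the reward-penalty idea summarized in the MOPO entry above (not the authors' code; the function name, shapes, and the use of ensemble disagreement as the uncertainty proxy are assumptions), the snippet below relabels model-generated transitions with penalized rewards before they would be handed to a standard RL learner:

```python
import numpy as np

def penalized_rewards(ensemble_next_states, model_rewards, lam=1.0):
    """ensemble_next_states: (n_models, batch, state_dim) next-state predictions
    from a learned dynamics ensemble; model_rewards: (batch,) predicted rewards.
    Returns rewards penalized by a disagreement-based uncertainty proxy."""
    # Per-dimension std across ensemble members, reduced to one scalar per sample.
    uncertainty = ensemble_next_states.std(axis=0).max(axis=-1)   # (batch,)
    return model_rewards - lam * uncertainty

# Example with random stand-in predictions.
preds = np.random.default_rng(1).normal(size=(5, 8, 3))
print(penalized_rewards(preds, np.ones(8), lam=0.5))
```

Ensemble disagreement is only a stand-in here; the paper derives its penalty from the learned model's predicted variance, but the overall recipe of "reward minus scaled uncertainty" is the same.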