Domain Generalization for Robust Model-Based Offline Reinforcement
Learning
- URL: http://arxiv.org/abs/2211.14827v1
- Date: Sun, 27 Nov 2022 13:37:49 GMT
- Title: Domain Generalization for Robust Model-Based Offline Reinforcement
Learning
- Authors: Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen
Chung, David Krueger
- Abstract summary: Existing offline reinforcement learning algorithms assume that training data is either generated by a known policy, or of entirely unknown origin.
We consider multi-demonstrator offline RL, a middle ground where we know which demonstrators generated each dataset, but make no assumptions about the underlying policies of the demonstrators.
We propose Domain-Invariant Model-based Offline RL (DIMORL), where we apply Risk Extrapolation (REx) to the process of learning dynamics and rewards models.
- Score: 5.653790804686631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing offline reinforcement learning (RL) algorithms typically assume that
training data is either: 1) generated by a known policy, or 2) of entirely
unknown origin. We consider multi-demonstrator offline RL, a middle ground
where we know which demonstrators generated each dataset, but make no
assumptions about the underlying policies of the demonstrators. This is the
most natural setting when collecting data from multiple human operators, yet
remains unexplored. Since different demonstrators induce different data
distributions, we show that this can be naturally framed as a domain
generalization problem, with each demonstrator corresponding to a different
domain. Specifically, we propose Domain-Invariant Model-based Offline RL
(DIMORL), where we apply Risk Extrapolation (REx) (Krueger et al., 2020) to the
process of learning dynamics and rewards models. Our results show that models
trained with REx exhibit improved domain generalization performance when
compared with the natural baseline of pooling all demonstrators' data. We
observe that the resulting models frequently enable the learning of superior
policies in the offline model-based RL setting, can improve the stability of
the policy learning process, and potentially enable increased exploration.
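For intuition, the core idea can be sketched in a few lines: treat each demonstrator's dataset as a separate domain, compute the supervised model-fitting risk per domain, and add a V-REx variance penalty across those per-domain risks so the shared dynamics and reward model cannot trade accuracy on one demonstrator for accuracy on another. The snippet below is a minimal PyTorch sketch under assumed tensor shapes, an illustrative network, and a hypothetical penalty weight beta; it illustrates the idea, not the authors' implementation.

```python
# Minimal V-REx-style sketch for multi-demonstrator dynamics/reward model
# training (illustrative only; shapes, network, and beta are assumptions).
import torch
import torch.nn as nn

class DynamicsRewardModel(nn.Module):
    """Predicts next state and reward from (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.next_state_head = nn.Linear(hidden, state_dim)
        self.reward_head = nn.Linear(hidden, 1)

    def forward(self, s, a):
        h = self.body(torch.cat([s, a], dim=-1))
        return self.next_state_head(h), self.reward_head(h).squeeze(-1)

def rex_loss(model, domain_batches, beta=10.0):
    """Mean per-domain risk plus a variance penalty across domains (V-REx)."""
    risks = []
    for s, a, s_next, r in domain_batches:  # one batch per demonstrator
        pred_s, pred_r = model(s, a)
        risk = nn.functional.mse_loss(pred_s, s_next) + \
               nn.functional.mse_loss(pred_r, r)
        risks.append(risk)
    risks = torch.stack(risks)
    return risks.mean() + beta * risks.var()

# Toy usage: random data standing in for three demonstrators.
if __name__ == "__main__":
    torch.manual_seed(0)
    state_dim, action_dim = 11, 3
    model = DynamicsRewardModel(state_dim, action_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    batches = [
        (torch.randn(64, state_dim), torch.randn(64, action_dim),
         torch.randn(64, state_dim), torch.randn(64))
        for _ in range(3)
    ]
    for step in range(100):
        loss = rex_loss(model, batches)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Setting beta to zero roughly recovers the pooled-data baseline discussed in the abstract, since the objective then reduces to the average risk over the demonstrators' data.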
Related papers
- Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Optimized World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens of data.
Our largest agent, with 150 million parameters, achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on average.
arXiv Detail & Related papers (2024-10-01T10:25:03Z)
- Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models [19.05224410249602]
We propose a novel approach for offline reinforcement learning with closed-loop policy evaluation and world-model adaptation.
We analyze the performance of the proposed method and provide an upper bound on the return gap between our method and the real environment under an optimal policy.
arXiv Detail & Related papers (2024-05-30T09:34:31Z)
- Improving Generalization of Alignment with Human Preferences through Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z)
- Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning [1.9336815376402723]
Offline RL methods leverage previous experiences to learn better policies than the behavior policy used for data collection.
However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training.
We introduce a novel method named State Reconstruction for Diffusion Policies (SRDP), incorporating state reconstruction feature learning in the recent class of diffusion policies.
arXiv Detail & Related papers (2023-07-10T17:34:23Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators [19.026312915461553]
We propose a model-based offline reinforcement learning (RL) approach called PerSim.
We first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents prior to learning a policy.
This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data.
arXiv Detail & Related papers (2021-02-13T17:16:41Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by artificially penalizing the rewards with the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
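As a rough illustration of the uncertainty-penalized reward described above: the learned model's predicted reward is reduced by a term proportional to an uncertainty estimate of the dynamics, for example disagreement across an ensemble of dynamics models. The snippet below is a sketch under that assumed ensemble-disagreement estimator and a hypothetical penalty weight lam; it is not the MOPO implementation, whose exact uncertainty quantifier differs.

```python
# Sketch of an uncertainty-penalized reward in the spirit of MOPO
# (illustrative; the disagreement-based uncertainty and lam are assumptions).
import torch

def penalized_reward(ensemble_next_states, predicted_reward, lam=1.0):
    """ensemble_next_states: (n_models, batch, state_dim) next-state predictions.
    predicted_reward: (batch,) reward predicted by the learned model.
    Returns the reward penalized by ensemble disagreement."""
    # One simple uncertainty estimate: the largest per-dimension standard
    # deviation across ensemble members for each transition.
    uncertainty = ensemble_next_states.std(dim=0).max(dim=-1).values  # (batch,)
    return predicted_reward - lam * uncertainty

# Toy usage: 5 ensemble members, batch of 32, 11-dimensional states.
preds = torch.randn(5, 32, 11)
rewards = torch.randn(32)
print(penalized_reward(preds, rewards).shape)  # torch.Size([32])
```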
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.