What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
- URL: http://arxiv.org/abs/2305.18882v2
- Date: Fri, 2 Jun 2023 05:31:19 GMT
- Title: What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?
- Authors: Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang
- Abstract summary: Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully offline datasets.
We propose a new offline GCRL method, Generalizable Offline goAl-condiTioned RL (GOAT).
On a new benchmark containing 9 independent and identically distributed (IID) tasks and 17 OOD tasks, GOAT outperforms current state-of-the-art methods by a large margin.
- Score: 31.202506227437937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline goal-conditioned RL (GCRL) offers a way to train general-purpose
agents from fully offline datasets. In addition to being conservative within
the dataset, the generalization ability to achieve unseen goals is another
fundamental challenge for offline GCRL. However, to the best of our knowledge,
this problem has not been well studied yet. In this paper, we study
out-of-distribution (OOD) generalization of offline GCRL both theoretically and
empirically to identify the important factors. In a number of experiments,
we observe that weighted imitation learning enjoys better generalization than
pessimism-based offline RL methods. Based on this insight, we derive a theory
for OOD generalization, which characterizes several important design choices.
We then propose a new offline GCRL method, Generalizable Offline
goAl-condiTioned RL (GOAT), by combining the findings from our theoretical and
empirical studies. On a new benchmark containing 9 independent and identically
distributed (IID) tasks and 17 OOD tasks, GOAT outperforms current
state-of-the-art methods by a large margin.
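The abstract's key empirical observation is that weighted imitation learning generalizes to unseen goals better than pessimism-based offline RL. The sketch below illustrates that family of methods in minimal form: goal-conditioned behavior cloning weighted by an exponentiated advantage estimate, plus the typical sparse goal-reaching reward used in GCRL. Everything here (network sizes, the weight clipping, the `advantage_fn` estimator, `beta`, `eps`) is an illustrative assumption, not the paper's exact GOAT design.

```python
# Minimal sketch of goal-conditioned weighted imitation learning, the
# family the paper identifies as generalizing better than pessimism-based
# offline RL. All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


def sparse_goal_reward(achieved_goal, goal, eps=0.05):
    """Typical sparse GCRL reward: 1 if the goal is reached, else 0.
    The threshold eps is an assumed value for illustration."""
    return (torch.norm(achieved_goal - goal, dim=-1) < eps).float()


class GoalConditionedPolicy(nn.Module):
    """Deterministic policy pi(s, g) -> a for continuous actions."""

    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))


def weighted_imitation_loss(policy, advantage_fn, obs, act, goal, beta=1.0):
    """Behavior cloning on dataset actions, with each transition weighted
    by an exponentiated advantage estimate (clipped for stability).
    advantage_fn is a hypothetical, separately trained estimator."""
    with torch.no_grad():
        adv = advantage_fn(obs, act, goal)           # scalar per transition
        weight = torch.exp(adv / beta).clamp(max=10.0)
    per_sample = ((policy(obs, goal) - act) ** 2).mean(dim=-1)
    return (weight * per_sample).mean()
```

Note the design contrast this sketch makes concrete: the objective only regresses onto actions that appear in the dataset, so it never queries out-of-distribution actions the way pessimism-based value penalties must, which is consistent with the generalization behavior the paper reports.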
Related papers
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning [41.971465819626005]
We present Open RL Benchmark, a set of fully tracked RL experiments.
Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data.
Special care is taken to ensure that each experiment is precisely reproducible.
arXiv Detail & Related papers (2024-02-05T14:32:00Z)
- SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning [33.125187822259186]
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.
We present a novel approach to GCRL under a new lens of mixture-distribution matching, leading to our discriminator-free method: SMORe.
arXiv Detail & Related papers (2023-11-03T16:19:33Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle is closer to an explanation of deep neural networks (DNNs) than an interpretation of RL agents.
We propose to take rewards, the essential objective of RL agents, as the basis for interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)