The Challenges of Exploration for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2201.11861v1
- Date: Thu, 27 Jan 2022 23:59:56 GMT
- Title: The Challenges of Exploration for Offline Reinforcement Learning
- Authors: Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan,
Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller
- Abstract summary: We study the two processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest.
We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
- Score: 8.484491887821473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (ORL) enables us to separately study the two
interlinked processes of reinforcement learning: collecting informative
experience and inferring optimal behaviour. The second step has been widely
studied in the offline setting, but just as critical to data-efficient RL is
the collection of informative data. The task-agnostic setting for data
collection, where the task is not known a priori, is of particular interest due
to the possibility of collecting a single dataset and using it to solve several
downstream tasks as they arise. We investigate this setting via curiosity-based
intrinsic motivation, a family of exploration methods which encourage the agent
to explore those states or transitions it has not yet learned to model. With
Explore2Offline, we propose to evaluate the quality of collected data by
transferring the collected data and inferring policies with reward relabelling
and standard offline RL algorithms. We evaluate a wide variety of data
collection strategies, including a new exploration agent, Intrinsic Model
Predictive Control (IMPC), using this scheme and demonstrate their performance
on various tasks. We use this decoupled framework to strengthen intuitions
about exploration and the data prerequisites for effective offline RL.
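The abstract's two ingredients, a curiosity signal for collecting data and reward relabelling for reusing it, can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Transition`, `curiosity_reward`, `relabel`, and the two toy reward functions are hypothetical, the curiosity signal is assumed to be the squared prediction error of some forward model, and the dataset is a hand-written 1-D toy. The relabelled datasets would then be passed to a standard offline RL algorithm, which is omitted here.

```python
from typing import Callable, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """A reward-free transition, as collected by a task-agnostic explorer."""
    state: Sequence[float]
    action: Sequence[float]
    next_state: Sequence[float]


class LabelledTransition(NamedTuple):
    """The same transition after a task reward has been attached."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]


def curiosity_reward(predicted_next: Sequence[float],
                     actual_next: Sequence[float]) -> float:
    """Assumed curiosity signal: squared error of a forward model's prediction.

    Transitions the model predicts poorly receive a higher intrinsic reward.
    """
    return sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


RewardFn = Callable[[Sequence[float], Sequence[float], Sequence[float]], float]


def relabel(dataset: List[Transition], reward_fn: RewardFn) -> List[LabelledTransition]:
    """Attach a downstream task reward to every transition in the dataset."""
    return [
        LabelledTransition(t.state, t.action,
                           reward_fn(t.state, t.action, t.next_state),
                           t.next_state)
        for t in dataset
    ]


# Hypothetical 1-D dataset gathered once, without knowledge of any task.
dataset = [
    Transition([0.0], [1.0], [1.0]),
    Transition([1.0], [1.0], [2.0]),
    Transition([2.0], [-1.0], [1.0]),
]


def reach_goal_reward(state, action, next_state, goal=2.0):
    """Toy task A: negative distance of the next state to a goal position."""
    return -abs(next_state[0] - goal)


def stay_at_origin_reward(state, action, next_state):
    """Toy task B: negative distance of the next state to the origin."""
    return -abs(next_state[0])


# One dataset, two downstream tasks: only the reward labels differ.
goal_data = relabel(dataset, reach_goal_reward)
origin_data = relabel(dataset, stay_at_origin_reward)
```

The point of the sketch is the decoupling: the transitions are collected once and only the reward column changes per task, so the quality of the exploration data can be judged by how well offline RL performs on each relabelled copy.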
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z) - CUDC: A Curiosity-Driven Unsupervised Data Collection Method with
Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z) - Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories [37.14064734165109]
Natural agents can learn from multiple data sources that differ in size, quality, and types of measurements.
We study this in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting.
arXiv Detail & Related papers (2022-10-12T18:22:23Z) - Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z) - D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)