Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
- URL: http://arxiv.org/abs/2210.06518v3
- Date: Thu, 22 Jun 2023 16:12:20 GMT
- Title: Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
- Authors: Qinqing Zheng, Mikael Henaff, Brandon Amos, Aditya Grover
- Abstract summary: Natural agents can learn from multiple data sources that differ in size, quality, and types of measurements.
We study this in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting.
- Score: 37.14064734165109
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural agents can effectively learn from multiple data sources that differ in size, quality, and types of measurements. We study this heterogeneity in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting. Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true- and proxy-labelled trajectories. Empirically, we find this simple pipeline to be highly successful: on several D4RL benchmarks [fu2020d4rl], certain offline RL algorithms can match the performance of variants trained on a fully labelled dataset even when we label only 10% of the trajectories and those labelled trajectories are highly suboptimal. To strengthen our understanding, we perform a large-scale controlled empirical study investigating the interplay of data-centric properties of the labelled and unlabelled datasets with algorithmic design choices (e.g., choice of inverse dynamics model or offline RL algorithm), to identify general trends and best practices for training RL agents on semi-supervised offline datasets.
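
To make the pipeline concrete, here is a minimal sketch of its two stages: fit an inverse dynamics model (IDM) on the labelled transitions, then use it to proxy-label the action-free data before handing everything to an off-the-shelf offline RL learner. The names (`InverseDynamicsModel`, `train_idm`, `proxy_label`), the MLP architecture, and the hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the meta-algorithmic pipeline; architecture and
# hyperparameters are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn


class InverseDynamicsModel(nn.Module):
    """Predict the action taken between consecutive states (s_t, s_{t+1})."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1))


def train_idm(idm, s, a, s_next, steps=10_000, batch=256, lr=3e-4):
    """Fit the IDM on labelled (s, a, s') transitions by MSE regression."""
    opt = torch.optim.Adam(idm.parameters(), lr=lr)
    n = s.shape[0]
    for _ in range(steps):
        idx = torch.randint(0, n, (batch,))
        loss = ((idm(s[idx], s_next[idx]) - a[idx]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return idm


@torch.no_grad()
def proxy_label(idm, s, s_next):
    """Fill in the missing actions of action-free transitions."""
    return idm(s, s_next)
```

The union of true- and proxy-labelled transitions can then be passed unchanged to any offline RL algorithm, which is what makes the recipe meta-algorithmic.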
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching [21.263554926053178]
In offline reinforcement learning (RL), the performance of the learned policy highly depends on the quality of offline datasets.
We introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline.
DiffStitch effectively connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories to address the challenges faced by offline RL algorithms.
arXiv Detail & Related papers (2024-02-04T10:30:23Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories, without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
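
The summary above does not spell out the sampling rule, so the sketch below shows one plausible realization, assumed purely for illustration: sample trajectories with probability given by a softmax over their returns, so high-return trajectories are drawn more often than under uniform sampling.

```python
# Illustrative return-weighted trajectory sampler; the softmax weighting and
# temperature are assumptions, not necessarily the paper's exact strategy.
import numpy as np


def trajectory_sampling_probs(returns, temperature=1.0):
    """Softmax over per-trajectory returns; higher return => sampled more."""
    r = np.asarray(returns, dtype=np.float64)
    z = (r - r.max()) / temperature  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()


rng = np.random.default_rng(0)
returns = [10.0, 250.0, 30.0, 240.0]  # toy per-trajectory returns
probs = trajectory_sampling_probs(returns, temperature=50.0)
batch_ids = rng.choice(len(returns), size=8, p=probs)  # plug-and-play sampler
```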
- Improving and Benchmarking Offline Reinforcement Learning Algorithms [87.67996706673674]
This work aims to bridge the performance gaps caused by low-level implementation choices and dataset differences.
We empirically investigate 20 implementation choices using three representative algorithms.
We find that two variants, CRR+ and CQL+, achieve a new state of the art on D4RL.
arXiv Detail & Related papers (2023-06-01T17:58:46Z)
- The Challenges of Exploration for Offline Reinforcement Learning [8.484491887821473]
We study the two processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest.
We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
arXiv Detail & Related papers (2022-01-27T23:59:56Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
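
As a rough illustration of the reward self-supervision idea, the sketch below approximates a "distribution of reward functions" with an ensemble of reward regressors fit to offline (s, a, r) data; sampling an ensemble member then labels new online transitions. The ensemble reading and all names here are assumptions, and the paper's meta-RL machinery is not reproduced.

```python
# Hedged sketch: approximating a "distribution of reward functions" with an
# ensemble of reward regressors; an illustrative assumption only.
import random

import torch
import torch.nn as nn


def make_reward_net(state_dim: int, action_dim: int, hidden: int = 128):
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )


def train_reward_ensemble(nets, s, a, r, steps=5_000, batch=256, lr=1e-3):
    """Fit each member on offline (s, a, r) data; members differ through
    random initialization and independent minibatch draws."""
    for net in nets:
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        n = s.shape[0]
        for _ in range(steps):
            idx = torch.randint(0, n, (batch,))
            pred = net(torch.cat([s[idx], a[idx]], dim=-1)).squeeze(-1)
            loss = ((pred - r[idx]) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return nets


@torch.no_grad()
def self_supervised_reward(nets, s, a):
    """Label an online transition with a reward sampled from the ensemble."""
    net = random.choice(nets)
    return net(torch.cat([s, a], dim=-1)).squeeze(-1)
```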
- Interpretable performance analysis towards offline reinforcement learning: A dataset perspective [6.526790418943535]
We propose a two-fold taxonomy for existing offline RL algorithms.
We explore the correlation between the performance of different types of algorithms and the distribution of actions under states.
We create a benchmark platform on the Atari domain, entitled easy go (RLEG), at an estimated cost of more than 0.3 million dollars.
arXiv Detail & Related papers (2021-05-12T07:17:06Z)
- PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators [19.026312915461553]
We propose a model-based offline reinforcement learning (RL) approach called PerSim.
We first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents, prior to learning a policy.
The underlying latent factor representation suggests a simple, regularized neural network architecture that effectively learns per-agent transition dynamics, even from scarce offline data.
arXiv Detail & Related papers (2021-02-13T17:16:41Z)
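
One simple reading of a "personalized simulator" is a learned per-agent embedding (a latent factor) conditioning a shared transition network, sketched below; the `PersonalizedDynamics` name, dimensions, and architecture are assumptions for illustration rather than PerSim's actual model.

```python
# Hedged sketch of a per-agent personalized dynamics model: a learned agent
# embedding conditions a shared transition network. Illustrative only.
import torch
import torch.nn as nn


class PersonalizedDynamics(nn.Module):
    def __init__(self, n_agents, state_dim, action_dim, latent=8, hidden=128):
        super().__init__()
        self.agent_emb = nn.Embedding(n_agents, latent)  # per-agent factor
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),  # predicts the next state
        )

    def forward(self, agent_id, s, a):
        z = self.agent_emb(agent_id)      # (batch, latent)
        return self.net(torch.cat([s, a, z], dim=-1))


# Training with weight decay, e.g.
#   torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4),
# would supply the regularization the summary mentions.
```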
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.