Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2109.08128v1
- Date: Thu, 16 Sep 2021 17:34:06 GMT
- Title: Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
- Authors: Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey
Levine, Chelsea Finn
- Abstract summary: We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks.
We develop a simple technique for data-sharing in multi-task offline RL that routes data based on the improvement over the task-specific data.
- Score: 119.85598717477016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) algorithms have shown promising results
in domains where abundant pre-collected data is available. However, prior
methods focus on solving individual problems from scratch with an offline
dataset without considering how an offline RL agent can acquire multiple
skills. We argue that a natural use case of offline RL is in settings where we
can pool large amounts of data collected in various scenarios for solving
different tasks, and utilize all of this data to learn behaviors for all the
tasks more effectively than by training on each task in isolation. However,
sharing data across all tasks in multi-task offline RL performs surprisingly
poorly in practice. Through thorough empirical analysis, we find that sharing data can
actually exacerbate the distributional shift between the learned policy and the
dataset, which in turn can lead to divergence of the learned policy and poor
performance. To address this challenge, we develop a simple technique for
data-sharing in multi-task offline RL that routes data based on the improvement
over the task-specific data. We call this approach conservative data sharing
(CDS), and it can be applied with multiple single-task offline RL methods. On a
range of challenging multi-task locomotion, navigation, and vision-based
robotic manipulation problems, CDS matches or exceeds the performance of prior
offline multi-task RL methods and previous data-sharing approaches.
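The routing rule above can be made concrete with a minimal sketch, assuming a learned conservative Q-function and a per-task reward relabeler are already available; the function names, transition-tuple layout, and percentile threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cds_filter(candidate_transitions, own_transitions, q_fn, relabel_fn,
               task_id, percentile=90.0):
    """Sketch of a CDS-style filter: admit a transition from another task
    into task `task_id`'s buffer only if its conservative Q-value clears a
    high percentile of the Q-values achieved by the task's own data."""
    # Threshold derived from the task-specific data.
    q_own = np.array([q_fn(s, a, task_id) for (s, a, _r, _s2) in own_transitions])
    threshold = np.percentile(q_own, percentile)

    shared = []
    for (s, a, _r, s2) in candidate_transitions:
        if q_fn(s, a, task_id) >= threshold:
            # Relabel with the target task's reward before sharing.
            shared.append((s, a, relabel_fn(s, a, task_id), s2))
    return shared
```

Here the percentile threshold stands in for the improvement criterion: data from other tasks is shared only when its conservative value estimate is at least as good as what the task-specific data already provides, which is what keeps shared data from worsening distributional shift.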
Related papers
- Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning [11.790581500542439]
Reinforcement learning (RL) with diverse offline datasets can benefit from leveraging the relations among multiple tasks.
We present a skill-based multi-task RL technique for heterogeneous datasets generated by behavior policies of varying quality.
We show that our multi-task offline RL approach is robust to mixed configurations of datasets of differing quality.
arXiv Detail & Related papers (2024-08-28T07:36:20Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning [116.87367592920171]
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset.
In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks.
We propose an uncertainty-based Multi-Task Data Sharing (MTDS) approach that shares the entire dataset without data selection.
arXiv Detail & Related papers (2024-04-30T08:16:52Z)
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method that expands the feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation is diversified, and the curiosity-driven agent can guide itself toward collecting higher-quality data.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- Launchpad: Learning to Schedule Using Offline and Online RL Methods [9.488752723308954]
Existing RL schedulers overlook the importance of learning from historical data and improving upon custom policies.
Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction.
These methods address the challenges of data-collection cost and safety, both particularly pertinent to real-world applications of RL.
arXiv Detail & Related papers (2022-12-01T16:40:11Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks (see the sketch after this list).
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
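As noted in the ExORL entry above, its generate-relabel-train pipeline can be sketched as follows; the Gym-style step API and every function name here are assumptions for illustration, not the authors' actual code.

```python
def exorl_pipeline(env, explore_policy, reward_fn, offline_rl_train,
                   num_steps=100_000):
    """Sketch of an ExORL-style pipeline: collect reward-free exploratory
    data, relabel it with a downstream task reward, then train an ordinary
    off-policy algorithm on the relabeled dataset."""
    dataset, s = [], env.reset()
    for _ in range(num_steps):
        a = explore_policy(s)                       # reward-free exploration
        s_next, _reward, done, _info = env.step(a)  # exploration ignores reward
        dataset.append((s, a, s_next))              # no reward stored yet
        s = env.reset() if done else s_next

    # Relabel every transition with the downstream task's reward.
    relabeled = [(s, a, reward_fn(s, a, s_next), s_next)
                 for (s, a, s_next) in dataset]

    # Per the paper's finding, a vanilla off-policy learner can consume this.
    return offline_rl_train(relabeled)
```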