Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
- URL: http://arxiv.org/abs/2304.08742v2
- Date: Sat, 13 May 2023 00:05:05 GMT
- Title: Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
- Authors: Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
- Abstract summary: We propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset.
We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data.
Our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images.
- Score: 73.2096288987301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling robots to learn novel visuomotor skills in a data-efficient manner
remains an unsolved problem with myriad challenges. A popular paradigm for
tackling this problem is through leveraging large unlabeled datasets that have
many behaviors in them and then adapting a policy to a specific task using a
small amount of task-specific human supervision (i.e., interventions or
demonstrations). However, how best to leverage the narrow task-specific
supervision and balance it with offline data remains an open question. Our key
insight in this work is that task-specific data not only provides new data for
an agent to train on but can also inform the type of prior data the agent
should use for learning. Concretely, we propose a simple approach that uses a
small amount of downstream expert data to selectively query relevant behaviors
from an offline, unlabeled dataset (including many sub-optimal behaviors). The
agent is then jointly trained on the expert and queried data. We observe that
our method learns to query only the transitions relevant to the task, filtering
out sub-optimal or task-irrelevant data. By doing so, it is able to learn more
effectively from the mix of task-specific and offline data compared to naively
mixing the data or only using the task-specific data. Furthermore, we find that
our simple querying approach outperforms more complex goal-conditioned methods
by 20% across simulated and real robotic manipulation tasks from images. See
https://sites.google.com/view/behaviorretrieval for videos and code.
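To make the querying step concrete, here is a minimal sketch in the spirit of the abstract: score every offline transition by its similarity to the expert data under a learned embedding and keep the top-scoring fraction. The embedding function, the max-over-expert scoring rule, and the retrieval fraction are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def retrieve_relevant(offline_sa, expert_sa, embed, frac=0.1):
    """Return the fraction of offline transitions most similar to the expert data.

    offline_sa: (N, d) offline state-action features
    expert_sa:  (M, d) expert state-action features
    embed:      pre-trained encoder mapping an (n, d) array to (n, k) embeddings
    """
    z_off = embed(offline_sa)
    z_exp = embed(expert_sa)
    # Cosine similarity between every offline and every expert transition.
    z_off = z_off / np.linalg.norm(z_off, axis=1, keepdims=True)
    z_exp = z_exp / np.linalg.norm(z_exp, axis=1, keepdims=True)
    sim = z_off @ z_exp.T                        # (N, M)
    # Score each offline transition by its closest expert match and
    # keep the highest-scoring fraction for joint training.
    score = sim.max(axis=1)
    k = max(1, int(frac * len(offline_sa)))
    return offline_sa[np.argsort(score)[-k:]]
```

The retrieved transitions would then be pooled with the expert demonstrations for joint training, e.g. behavior cloning on the combined set.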
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
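As a rough sketch of the instance-level step, one can sort a task's instances by a difficulty score and emit mini-batches easiest-first; the per-instance loss used as the difficulty signal here is an assumption for illustration.

```python
import numpy as np

def easy_to_difficult_batches(instances, difficulty, batch_size=32):
    """Yield mini-batches of one task's instances in easy-to-difficult order.

    difficulty: (N,) per-instance scores (e.g. a model's loss; the metric
    actually used by Data-CUBE may differ).
    """
    order = np.argsort(difficulty)  # ascending: easiest instances first
    for start in range(0, len(order), batch_size):
        yield [instances[i] for i in order[start:start + batch_size]]
```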
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as a strong backbone for a wide range of tasks but also be used as a probing tool for analyzing task relationships.
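As a toy illustration of prefix-guided multi-task training, each example can be tagged with its task before being fed to a shared model; the bracketed format below is an assumption, not the paper's exact scheme.

```python
def with_task_prefix(task_name: str, text: str) -> str:
    """Prepend a task tag so one shared model can condition on the task.

    The bracketed prefix format is illustrative only.
    """
    return f"[{task_name}] {text}"

# e.g. with_task_prefix("nli", "A man plays guitar. </s> Someone makes music.")
```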
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration [26.17597857264231]
In contrast to imitation learning, there is no expert data, only the data collected through environmental exploration.
Since the action sequence that solves a new task may combine trajectory segments from multiple training tasks, neither the test task nor its solution strategy appears directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
arXiv Detail & Related papers (2022-09-09T03:02:49Z)
- Using Self-Supervised Pretext Tasks for Active Learning [7.214674613451605]
We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative.
The pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and grouped into batches by their pretext task losses.
In each iteration, the main task model is used to sample the most uncertain data in a batch to be annotated.
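A minimal sketch of this sampler, assuming precomputed pretext-task losses and main-model uncertainty scores; which loss-sorted batch is consumed in a given iteration is an assumption here.

```python
import numpy as np

def select_for_annotation(pretext_loss, uncertainty, n_batches=10, budget=100):
    """Group unlabeled examples into batches by pretext-task loss, then
    pick the most uncertain members of one batch for annotation.

    pretext_loss: (N,) pretext loss per unlabeled example
    uncertainty:  (N,) main-task model uncertainty per example
    """
    order = np.argsort(pretext_loss)            # sort by pretext difficulty
    batches = np.array_split(order, n_batches)  # group into batches
    batch = batches[-1]                         # e.g. the hardest batch (assumption)
    return batch[np.argsort(uncertainty[batch])[-budget:]]
```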
arXiv Detail & Related papers (2022-01-19T07:58:06Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
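The chaining idea can be illustrated with a tabular stand-in for the dynamic programming step (COG itself uses offline RL from images): pooled prior and new-task transitions share one Q-function, so value from the new task's reward propagates backward through prior behaviors.

```python
def q_sweep(q, transitions, actions, gamma=0.99, lr=0.1):
    """One Q-learning sweep over pooled prior + new-task transitions.

    q: dict mapping (state, action) to value; transitions: (s, a, r, s_next)
    tuples. Tabular illustration only, not COG's actual image-based setup.
    """
    for s, a, r, s_next in transitions:
        best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
        q[(s, a)] = q.get((s, a), 0.0) + lr * (r + gamma * best_next - q.get((s, a), 0.0))
    return q
```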
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
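In spirit, relabeling assigns each trajectory to the candidate task it serves best; the max-return rule below is a simplified stand-in for the paper's approximate inverse-RL scoring.

```python
def relabel(traj, candidate_tasks, reward_fn):
    """Assign a trajectory to the candidate task under which it earns the
    highest return (simplified stand-in for approximate inverse RL).

    traj: sequence of (state, action) pairs
    reward_fn(task, state, action) -> float
    """
    returns = [sum(reward_fn(t, s, a) for s, a in traj) for t in candidate_tasks]
    return candidate_tasks[returns.index(max(returns))]
```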
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
- Meta-learning for mixed linear regression [44.27602704497616]
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.
We study a fundamental question of interest: When can abundant tasks with small data compensate for lack of tasks with big data?
We show that we can efficiently utilize small data tasks with the help of $\tilde{\Omega}(k^{3/2})$ medium data tasks, each with $\tilde{\Omega}(k^{1/2})$ examples.
arXiv Detail & Related papers (2020-02-20T18:34:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.