Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
- URL: http://arxiv.org/abs/2304.08742v2
- Date: Sat, 13 May 2023 00:05:05 GMT
- Title: Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
- Authors: Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
- Abstract summary: We propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset.
We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data.
Our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images.
- Score: 73.2096288987301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling robots to learn novel visuomotor skills in a data-efficient manner
remains an unsolved problem with myriad challenges. A popular paradigm for
tackling this problem is through leveraging large unlabeled datasets that have
many behaviors in them and then adapting a policy to a specific task using a
small amount of task-specific human supervision (i.e., interventions or
demonstrations). However, how best to leverage the narrow task-specific
supervision and balance it with offline data remains an open question. Our key
insight in this work is that task-specific data not only provides new data for
an agent to train on but can also inform the type of prior data the agent
should use for learning. Concretely, we propose a simple approach that uses a
small amount of downstream expert data to selectively query relevant behaviors
from an offline, unlabeled dataset (including many sub-optimal behaviors). The
agent is then jointly trained on the expert and queried data. We observe that
our method learns to query only the transitions relevant to the task, filtering
out sub-optimal or task-irrelevant data. By doing so, it is able to learn more
effectively from the mix of task-specific and offline data compared to naively
mixing the data or only using the task-specific data. Furthermore, we find that
our simple querying approach outperforms more complex goal-conditioned methods
by 20% across simulated and real robotic manipulation tasks from images. See
https://sites.google.com/view/behaviorretrieval for videos and code.
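To make the querying step concrete, here is a minimal sketch in the spirit of the abstract: score every offline transition by its similarity to the expert data under a learned embedding and keep the top-scoring fraction. The embedding function, the max-over-expert scoring rule, and the retrieval fraction are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def retrieve_relevant(offline_sa, expert_sa, embed, frac=0.1):
    """Return the fraction of offline transitions most similar to the expert data.

    offline_sa: (N, d) offline state-action features
    expert_sa:  (M, d) expert state-action features
    embed:      pre-trained encoder mapping an (n, d) array to (n, k) embeddings
    """
    z_off = embed(offline_sa)
    z_exp = embed(expert_sa)
    # Cosine similarity between every offline and every expert transition.
    z_off = z_off / np.linalg.norm(z_off, axis=1, keepdims=True)
    z_exp = z_exp / np.linalg.norm(z_exp, axis=1, keepdims=True)
    sim = z_off @ z_exp.T                        # (N, M)
    # Score each offline transition by its closest expert match and
    # keep the highest-scoring fraction for joint training.
    score = sim.max(axis=1)
    k = max(1, int(frac * len(offline_sa)))
    return offline_sa[np.argsort(score)[-k:]]
```

The retrieved transitions would then be pooled with the expert demonstrations for joint training, e.g. behavior cloning on the combined set.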
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
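As a rough sketch of the instance-level step, one can sort a task's instances by a difficulty score and emit mini-batches easiest-first; the per-instance loss used as the difficulty signal here is an assumption for illustration.

```python
import numpy as np

def easy_to_difficult_batches(instances, difficulty, batch_size=32):
    """Yield mini-batches of one task's instances in easy-to-difficult order.

    difficulty: (N,) per-instance scores (e.g. a model's loss; the metric
    actually used by Data-CUBE may differ).
    """
    order = np.argsort(difficulty)  # ascending: easiest instances first
    for start in range(0, len(order), batch_size):
        yield [instances[i] for i in order[start:start + batch_size]]
```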
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as a strong backbone for a wide range of tasks but also be used as a probing tool for analyzing task relationships.
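As a toy illustration of prefix-guided multi-task training, each example can be tagged with its task before being fed to a shared model; the bracketed format below is an assumption, not the paper's exact scheme.

```python
def with_task_prefix(task_name: str, text: str) -> str:
    """Prepend a task tag so one shared model can condition on the task.

    The bracketed prefix format is illustrative only.
    """
    return f"[{task_name}] {text}"

# e.g. with_task_prefix("nli", "A man plays guitar. </s> Someone makes music.")
```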
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration [26.17597857264231]
In contrast to imitation learning, there is no expert data, only the data collected through environmental exploration.
Since the action sequence that solves a new task may combine trajectory segments from multiple training tasks, neither the test task nor its solution strategy appears directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
arXiv Detail & Related papers (2022-09-09T03:02:49Z)
- Using Self-Supervised Pretext Tasks for Active Learning [7.214674613451605]
We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative.
The pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and grouped into batches by their pretext task losses.
In each iteration, the main task model is used to sample the most uncertain data in a batch to be annotated.
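A minimal sketch of this sampler, assuming precomputed pretext-task losses and main-model uncertainty scores; which loss-sorted batch is consumed in a given iteration is an assumption here.

```python
import numpy as np

def select_for_annotation(pretext_loss, uncertainty, n_batches=10, budget=100):
    """Group unlabeled examples into batches by pretext-task loss, then
    pick the most uncertain members of one batch for annotation.

    pretext_loss: (N,) pretext loss per unlabeled example
    uncertainty:  (N,) main-task model uncertainty per example
    """
    order = np.argsort(pretext_loss)            # sort by pretext difficulty
    batches = np.array_split(order, n_batches)  # group into batches
    batch = batches[-1]                         # e.g. the hardest batch (assumption)
    return batch[np.argsort(uncertainty[batch])[-budget:]]
```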
arXiv Detail & Related papers (2022-01-19T07:58:06Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
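The chaining idea can be illustrated with a tabular stand-in for the dynamic programming step (COG itself uses offline RL from images): pooled prior and new-task transitions share one Q-function, so value from the new task's reward propagates backward through prior behaviors.

```python
def q_sweep(q, transitions, actions, gamma=0.99, lr=0.1):
    """One Q-learning sweep over pooled prior + new-task transitions.

    q: dict mapping (state, action) to value; transitions: (s, a, r, s_next)
    tuples. Tabular illustration only, not COG's actual image-based setup.
    """
    for s, a, r, s_next in transitions:
        best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
        q[(s, a)] = q.get((s, a), 0.0) + lr * (r + gamma * best_next - q.get((s, a), 0.0))
    return q
```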
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
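In spirit, relabeling assigns each trajectory to the candidate task it serves best; the max-return rule below is a simplified stand-in for the paper's approximate inverse-RL scoring.

```python
def relabel(traj, candidate_tasks, reward_fn):
    """Assign a trajectory to the candidate task under which it earns the
    highest return (simplified stand-in for approximate inverse RL).

    traj: sequence of (state, action) pairs
    reward_fn(task, state, action) -> float
    """
    returns = [sum(reward_fn(t, s, a) for s, a in traj) for t in candidate_tasks]
    return candidate_tasks[returns.index(max(returns))]
```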
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
- Meta-learning for mixed linear regression [44.27602704497616]
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.
We study a fundamental question of interest: When can abundant tasks with small data compensate for lack of tasks with big data?
We show that we can efficiently utilize small data tasks with the help of $\tilde{\Omega}(k^{3/2})$ medium data tasks, each with $\tilde{\Omega}(k^{1/2})$ examples.
arXiv Detail & Related papers (2020-02-20T18:34:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.