Efficient Self-Supervised Data Collection for Offline Robot Learning
- URL: http://arxiv.org/abs/2105.04607v1
- Date: Mon, 10 May 2021 18:42:58 GMT
- Title: Efficient Self-Supervised Data Collection for Offline Robot Learning
- Authors: Shadi Endrawis, Gal Leibovich, Guy Jacob, Gal Novik and Aviv Tamar
- Abstract summary: A practical approach to robot reinforcement learning is to first collect a large batch of real or simulated robot interaction data.
We develop a simple-yet-effective goal-conditioned reinforcement-learning method that actively focuses data collection on novel observations.
- Score: 17.461103383630853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A practical approach to robot reinforcement learning is to first collect a
large batch of real or simulated robot interaction data, using some data
collection policy, and then learn from this data to perform various tasks,
using offline learning algorithms. Previous work focused on manually designing
the data collection policy, and on tasks where suitable policies can easily be
designed, such as random picking policies for collecting data about object
grasping. For more complex tasks, however, it may be difficult to find a data
collection policy that explores the environment effectively, and produces data
that is diverse enough for the downstream task. In this work, we propose that
data collection policies should actively explore the environment to collect
diverse data. In particular, we develop a simple-yet-effective goal-conditioned
reinforcement-learning method that actively focuses data collection on novel
observations, thereby collecting a diverse dataset. We evaluate our method on
simulated robot manipulation tasks with visual inputs and show that the
improved diversity of active data collection leads to significant improvements
in the downstream learning tasks.
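The core idea of the abstract — steering a goal-conditioned policy toward novel observations so the collected dataset stays diverse — can be sketched with a simple nearest-neighbor novelty score. This is an illustrative toy only: the function names (`novelty`, `select_goal`), the k-NN distance score, and the 2-D point agent are assumptions for the sketch, not the paper's actual architecture (which works from visual inputs).

```python
import numpy as np

rng = np.random.default_rng(0)

def novelty(obs, archive, k=5):
    """Novelty = mean distance to the k nearest archived observations."""
    if len(archive) == 0:
        return np.inf
    d = np.linalg.norm(np.asarray(archive) - obs, axis=1)
    return np.sort(d)[:k].mean()

def select_goal(candidates, archive):
    """Pick the candidate goal that the archive covers least well."""
    scores = [novelty(c, archive) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy collection loop: a point agent in 2D, goals chosen for novelty.
archive = []
for step in range(200):
    candidates = rng.uniform(-1.0, 1.0, size=(16, 2))  # sampled goal proposals
    goal = select_goal(candidates, archive)
    # A real goal-conditioned policy would act toward `goal`; here we
    # just pretend the agent reaches a noisy neighborhood of it.
    reached = goal + rng.normal(scale=0.05, size=2)
    archive.append(reached)

coverage = np.ptp(np.asarray(archive), axis=0)  # spread of visited states per axis
```

Because each goal maximizes distance to past observations, the archive spreads toward under-visited regions rather than clustering where a random policy happens to wander.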
Related papers
- FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning [28.523528119584526]
Few-shot imitation learning relies on only a small number of task-specific demonstrations to efficiently adapt a policy to a given downstream task.
We propose FlowRetrieval, an approach that leverages optical flow representations to extract motions similar to the target task from prior data.
Our results show FlowRetrieval significantly outperforms prior methods across simulated and real-world domains.
arXiv Detail & Related papers (2024-08-29T23:48:08Z)
- Efficient Data Collection for Robotic Manipulation via Compositional Generalization [70.76782930312746]
We show that policies can compose environmental factors from their data to succeed when encountering unseen factor combinations.
We propose better in-domain data collection strategies that exploit composition.
We provide videos at http://iliad.stanford.edu/robot-data-comp/.
arXiv Detail & Related papers (2024-03-08T07:15:38Z)
- PoCo: Policy Composition from and for Heterogeneous Robot Learning [44.1315170137613]
Current methods usually collect and pool all data from one domain to train a single policy.
We present a flexible approach, dubbed Policy Composition, to combine information across diverse modalities and domains.
Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time.
arXiv Detail & Related papers (2024-02-04T14:51:49Z)
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets [73.2096288987301]
We propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset.
We observe that our method learns to query only the relevant transitions to the task, filtering out sub-optimal or task-irrelevant data.
Our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images.
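The querying idea in this entry — score offline transitions by similarity to a small expert set and keep only the closest ones — can be shown in a toy form. Raw state distance stands in here for the paper's learned similarity over image observations, and the names `retrieve` and `frac` are hypothetical, chosen for the sketch.

```python
import numpy as np

def retrieve(offline, expert, frac=0.1):
    """Score each offline transition by its closeness to the nearest expert
    transition (negative Euclidean distance) and keep the top `frac`."""
    scores = np.array([
        -min(np.linalg.norm(o - e) for e in expert) for o in offline
    ])
    k = max(1, int(frac * len(offline)))
    keep = np.argsort(scores)[-k:]  # indices of the k highest-scoring rows
    return offline[keep]
```

With a mostly task-irrelevant offline dataset, this filter returns only the transitions that resemble the expert data, matching the summary's observation that sub-optimal or task-irrelevant data gets filtered out.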
arXiv Detail & Related papers (2023-04-18T05:42:53Z)
- Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills [14.685043874797742]
We propose a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent.
To cover all modes and thus enable diverse behavior, we extend our approach to a mixture-of-experts (MoE) policy, where each mixture component selects its own subset of the training data for learning.
A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution.
arXiv Detail & Related papers (2023-03-27T16:02:50Z)
- Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets [122.85598648289789]
We study how multi-domain and multi-task datasets can improve the learning of new tasks in new environments.
We also find that data for only a few tasks in a new domain can bridge the domain gap and make it possible for a robot to perform a variety of prior tasks that were only seen in other domains.
arXiv Detail & Related papers (2021-09-27T23:42:12Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- Probabilistic Active Meta-Learning [15.432006404678981]
We introduce task selection based on prior experience into a meta-learning algorithm.
We provide empirical evidence that our approach improves data-efficiency when compared to strong baselines on simulated robotic experiments.
arXiv Detail & Related papers (2020-07-17T12:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.