STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
- URL: http://arxiv.org/abs/2412.15182v1
- Date: Thu, 19 Dec 2024 18:54:06 GMT
- Title: STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
- Authors: Marius Memmel, Jacob Berg, Bingqing Chen, Abhishek Gupta, Jonathan Francis
- Abstract summary: STRAP is a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion.
- Abstract: Robot learning is witnessing a significant increase in the size, diversity, and complexity of pre-collected datasets, mirroring trends in domains such as natural language processing and computer vision. Many robot learning methods treat such datasets as multi-task expert data and learn a multi-task, generalist policy by training broadly across them. Notably, while these generalist policies can improve the average performance across many tasks, the performance of generalist policies on any one task is often suboptimal due to negative transfer between partitions of the data, compared to task-specific specialist policies. In this work, we argue for the paradigm of training policies during deployment given the scenarios they encounter: rather than deploying pre-trained policies to unseen problems in a zero-shot manner, we non-parametrically retrieve and train models directly on relevant data at test time. Furthermore, we show that many robotics tasks share considerable amounts of low-level behaviors and that retrieval at the "sub"-trajectory granularity enables significantly improved data utilization, generalization, and robustness in adapting policies to novel problems. In contrast, existing full-trajectory retrieval methods tend to underutilize the data and miss out on shared cross-task content. This work proposes STRAP, a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion. STRAP outperforms both prior retrieval algorithms and multi-task learning methods in simulated and real experiments, showing the ability to scale to much larger offline datasets in the real world as well as the ability to learn robust control policies with just a handful of real-world demonstrations.
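The retrieval mechanism described in the abstract lends itself to a compact illustration. The sketch below is a minimal, hypothetical rendering of sub-trajectory retrieval via subsequence dynamic time warping: it assumes frames have already been encoded by a frozen vision foundation model (per-frame embeddings from a model such as DINOv2 would be one plausible choice), and the function names, the L2 feature distance, and the ranking scheme are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def subsequence_dtw(query: np.ndarray, candidate: np.ndarray):
    """Align `query` (m x d frame features) against the best-matching window
    of `candidate` (n x d frame features). Start and end are free on the
    candidate side, as in classic subsequence DTW. Returns (cost, start, end)."""
    m, n = len(query), len(candidate)
    # Pairwise L2 distances between query and candidate frame features.
    dist = np.linalg.norm(query[:, None, :] - candidate[None, :, :], axis=-1)
    acc = np.full((m, n), np.inf)
    start = np.zeros((m, n), dtype=int)   # candidate index where each path began
    acc[0] = dist[0]                      # free start anywhere in the candidate
    start[0] = np.arange(n)
    for i in range(1, m):
        for j in range(n):
            steps = [(acc[i - 1, j], start[i - 1, j])]                  # advance query only
            if j > 0:
                steps.append((acc[i, j - 1], start[i, j - 1]))          # advance candidate only
                steps.append((acc[i - 1, j - 1], start[i - 1, j - 1]))  # advance both
            best_cost, best_start = min(steps)
            acc[i, j] = dist[i, j] + best_cost
            start[i, j] = best_start
    end = int(np.argmin(acc[-1]))         # free end: cheapest final alignment
    return float(acc[-1, end]), int(start[-1, end]), end

def retrieve_top_k(query_feats, dataset_feats, k=5):
    """Rank every trajectory in the corpus by its best-matching sub-trajectory."""
    matches = []
    for idx, traj_feats in enumerate(dataset_feats):
        cost, s, e = subsequence_dtw(query_feats, traj_feats)
        matches.append((cost, idx, s, e))
    return sorted(matches)[:k]            # the k lowest-cost sub-trajectory matches

# Toy usage with random "features" standing in for foundation-model embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 32))
corpus = [rng.normal(size=(rng.integers(20, 40), 32)) for _ in range(10)]
print(retrieve_top_k(query, corpus, k=3))
```

The retrieved sub-trajectories would then form the training set for a policy adapted to the target task at test time, which is the non-parametric train-on-retrieval recipe the abstract argues for.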
Related papers
- Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance [66.51390591688802]
Value-Guided Policy Steering (V-GPS) is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy.
We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures (see the sketch after this entry).
arXiv Detail & Related papers (2024-10-17T17:46:26Z)
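A hedged illustration of the V-GPS entry above: the summary describes re-ranking a frozen generalist policy's action samples with a separately learned value function at test time. The callables, dimensions, and stand-in scoring function below are invented for the example and are not the paper's interfaces.

```python
import numpy as np

def value_guided_action(sample_action, q_value, observation, num_samples: int = 10):
    """Draw candidate actions from an unchanged generalist policy and execute
    the one a learned value function scores highest. No fine-tuning of, or
    access to, the policy's weights is required."""
    candidates = [sample_action(observation) for _ in range(num_samples)]
    scores = [q_value(observation, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage with stand-ins for the policy sampler and the value function.
rng = np.random.default_rng(0)
policy = lambda obs: rng.normal(size=7)        # stand-in generalist policy
q = lambda obs, a: -float(np.linalg.norm(a))   # stand-in Q-function
print(value_guided_action(policy, q, observation=np.zeros(16)))
```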
- Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation [29.49883684368039]
Offline meta-reinforcement learning (OMRL) enables an agent to tackle novel tasks while relying only on a static dataset.
We introduce a novel algorithm to disentangle the impact of the behavior policy from task representation learning.
arXiv Detail & Related papers (2024-03-12T02:38:36Z)
- PoCo: Policy Composition from and for Heterogeneous Robot Learning [44.1315170137613]
Current methods usually collect and pool all data from one domain to train a single policy.
We present a flexible approach, dubbed Policy Composition, to combine information across diverse modalities and domains.
Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time.
arXiv Detail & Related papers (2024-02-04T14:51:49Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insight is to utilize offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective (a sketch of the latter follows this entry).
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
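A hedged sketch of the inverse-dynamics objective mentioned in the ALP entry above: from the embeddings of two consecutive observations, predict the action taken between them, so gradients shape the shared visual encoder with action information alongside the RL loss. The module layout and PyTorch specifics are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    """Predict the action from embeddings of consecutive observations."""
    def __init__(self, feat_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, z_t: torch.Tensor, z_tp1: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, z_tp1], dim=-1))

def inverse_dynamics_loss(encoder, head, obs_t, obs_tp1, action):
    # Gradients flow through the shared encoder, so the representation is
    # shaped by action information in addition to the RL objective.
    pred = head(encoder(obs_t), encoder(obs_tp1))
    return nn.functional.mse_loss(pred, action)

# Toy usage: a flat "image" encoder and random transitions stand in for the
# real pipeline.
encoder = nn.Linear(64, 32)                    # stand-in visual encoder
head = InverseDynamicsHead(feat_dim=32, action_dim=4)
obs_t, obs_tp1 = torch.randn(8, 64), torch.randn(8, 64)
action = torch.randn(8, 4)
loss = inverse_dynamics_loss(encoder, head, obs_t, obs_tp1, action)
loss.backward()                                # updates encoder and head jointly
print(float(loss))
```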
- Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z)
- Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning.
We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z)
- Efficient Self-Supervised Data Collection for Offline Robot Learning [17.461103383630853]
A practical approach to robot reinforcement learning is to first collect a large batch of real or simulated robot interaction data.
We develop a simple-yet-effective goal-conditioned reinforcement-learning method that actively focuses data collection on novel observations.
arXiv Detail & Related papers (2021-05-10T18:42:58Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)