Related papers: Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

URL: http://arxiv.org/abs/2410.13816v1
Date: Thu, 17 Oct 2024 17:46:26 GMT
Title: Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
Authors: Mitsuhiko Nakamoto, Oier Mees, Aviral Kumar, Sergey Levine,
Abstract summary: Value-Guided Policy Steering (V-GPS) is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy. We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures.
Score: 66.51390591688802
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large, general-purpose robotic policies trained on diverse demonstration datasets have been shown to be remarkably effective both for controlling a variety of robots in a range of different scenes, and for acquiring broad repertoires of manipulation skills. However, the data that such policies are trained on is generally of mixed quality -- not only are human-collected demonstrations unlikely to perform the task perfectly, but the larger the dataset is, the harder it is to curate only the highest quality examples. It also remains unclear how optimal data from one embodiment is for training on another embodiment. In this paper, we present a general and broadly applicable approach that enhances the performance of such generalist robot policies at deployment time by re-ranking their actions according to a value function learned via offline RL. This approach, which we call Value-Guided Policy Steering (V-GPS), is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy. We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures, even though they were trained on distinct datasets, attaining consistent performance improvement on multiple robotic platforms across a total of 12 tasks. Code and videos can be found at: https://nakamotoo.github.io/V-GPS

Related papers

Latent Policy Steering with Embodiment-Agnostic Pretrained World Models [8.265847442964045]
We aim to reduce data collection efforts when learning visuomotor robot policies by leveraging existing or cost-effective data.<n>We use optic flow as an embodiment-agnostic action representation to train a World Model (WM) across multi-embodiment datasets.<n>We develop a method, Latent Policy Steering (LPS), to improve the output of a behavior-cloned policy by searching in the latent space of the WM for better action sequences.
arXiv Detail & Related papers (2025-07-17T17:57:57Z)
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning [8.860366821983211]
STRAP is a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion. This work proposes STRAP, a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion.
arXiv Detail & Related papers (2024-12-19T18:54:06Z)
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies. We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
arXiv Detail & Related papers (2024-12-13T04:57:55Z)
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers [41.069074375686164]
We propose Heterogeneous Pre-trained Transformers (HPT), which pre-train a trunk of a policy neural network to learn a task and embodiment shared representation. We conduct experiments to investigate the scaling behaviors of training objectives, to the extent of 52 datasets. HPTs outperform several baselines and enhance the fine-tuned policy performance by over 20% on unseen tasks.
arXiv Detail & Related papers (2024-09-30T17:39:41Z)
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation [49.03165169369552]
By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets. We propose CrossFormer, a scalable and flexible transformer-based policy that can consume data from any embodiment. We demonstrate that the same network weights can control vastly different robots, including single and dual arm manipulation systems, wheeled robots, quadcopters, and quadrupeds.
arXiv Detail & Related papers (2024-08-21T17:57:51Z)
PoCo: Policy Composition from and for Heterogeneous Robot Learning [44.1315170137613]
Current methods usually collect and pool all data from one domain to train a single policy. We present a flexible approach, dubbed Policy Composition, to combine information across diverse modalities and domains. Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time.
arXiv Detail & Related papers (2024-02-04T14:51:49Z)
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking [54.776890150458385]
We develop an efficient system for training universal agents capable of multi-task manipulation skills. We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks. On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
arXiv Detail & Related papers (2023-09-05T03:14:39Z)
Polybot: Training One Policy Across Robots While Embracing Variability [70.74462430582163]
We propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms. Our framework first aligns the observation and action spaces of our policy across embodiments via utilizing wrist cameras. We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes.
arXiv Detail & Related papers (2023-07-07T17:21:16Z)
GNM: A General Navigation Model to Drive Any Robot [67.40225397212717]
General goal-conditioned model for vision-based navigation can be trained on data obtained from many distinct but structurally similar robots. We analyze the necessary design decisions for effective data sharing across robots. We deploy the trained GNM on a range of new robots, including an under quadrotor.
arXiv Detail & Related papers (2022-10-07T07:26:41Z)
Multi-Robot Deep Reinforcement Learning for Mobile Navigation [82.62621210336881]
We propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt) At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model. Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.
arXiv Detail & Related papers (2021-06-24T19:07:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.