PoCo: Policy Composition from and for Heterogeneous Robot Learning
- URL: http://arxiv.org/abs/2402.02511v2
- Date: Mon, 27 May 2024 14:29:57 GMT
- Title: PoCo: Policy Composition from and for Heterogeneous Robot Learning
- Authors: Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, Russ Tedrake
- Abstract summary: Current methods usually collect and pool all data from one domain to train a single policy.
We present a flexible approach, dubbed Policy Composition, to combine information across diverse modalities and domains.
Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time.
- Score: 44.1315170137613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training general robotic policies from heterogeneous data for different tasks is a significant challenge. Existing robotic datasets vary in different modalities such as color, depth, tactile, and proprioceptive information, and are collected in different domains such as simulation, real robots, and human videos. Current methods usually collect and pool all data from one domain to train a single policy to handle such heterogeneity in tasks and domains, which is prohibitively expensive and difficult. In this work, we present a flexible approach, dubbed Policy Composition, to combine information across such diverse modalities and domains for learning scene-level and task-level generalized manipulation skills, by composing different data distributions represented with diffusion models. Our method can use task-level composition for multi-task manipulation and can be composed with analytic cost functions to adapt policy behaviors at inference time. We train our method on simulation, human, and real robot data and evaluate it in tool-use tasks. The composed policy achieves robust and dexterous performance under varying scenes and tasks and outperforms baselines from a single data source in both simulation and real-world experiments. See https://liruiw.github.io/policycomp for more details.
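To make the composition mechanism concrete, here is a minimal sketch of composing diffusion policies, assuming each domain or task is represented by a denoiser that predicts noise on an action trajectory. The weighting scheme, the `cost_fn` interface, and the simplified update rule are illustrative assumptions, not the paper's implementation; a real sampler would scale each term by noise-schedule coefficients.

```python
import torch

def composed_denoise(denoisers, weights, x_t, t, uncond_denoiser=None):
    """Combine noise predictions from per-domain/per-task diffusion policies.

    Hedged sketch: a weighted sum of denoiser outputs, optionally anchored
    to an unconditional model in classifier-free-guidance style.
    """
    if uncond_denoiser is not None:
        eps_u = uncond_denoiser(x_t, t)
        # Compose relative to the unconditional prediction.
        return eps_u + sum(w * (d(x_t, t) - eps_u)
                           for d, w in zip(denoisers, weights))
    return sum(w * d(x_t, t) for d, w in zip(denoisers, weights))

def cost_guided_step(x_t, t, denoisers, weights, cost_fn,
                     guidance_scale=1.0, step_size=0.01):
    """One illustrative reverse step with analytic cost guidance."""
    eps = composed_denoise(denoisers, weights, x_t, t)
    x_t = x_t.detach().requires_grad_(True)
    # Steer sampling with the gradient of the analytic cost.
    grad = torch.autograd.grad(cost_fn(x_t).sum(), x_t)[0]
    return (x_t - step_size * eps - guidance_scale * step_size * grad).detach()
```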
Related papers
- Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance [66.51390591688802]
Value-Guided Policy Steering (V-GPS) is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy.
We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures.
arXiv Detail & Related papers (2024-10-17T17:46:26Z)
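A minimal sketch of the value-guided steering idea described above, assuming a hypothetical `policy.sample_action(obs)` interface and a learned `q_fn(obs, action)`; V-GPS's actual sampling and training details differ.

```python
import numpy as np

def value_guided_action(policy, q_fn, obs, num_samples=32, temperature=1.0):
    """Sample candidate actions from a frozen generalist policy and
    re-rank them with a learned value function (illustrative interfaces)."""
    candidates = [policy.sample_action(obs) for _ in range(num_samples)]
    scores = np.array([q_fn(obs, a) for a in candidates])
    # Softmax sampling over values; temperature -> 0 recovers the argmax.
    probs = np.exp((scores - scores.max()) / temperature)
    probs /= probs.sum()
    return candidates[np.random.choice(len(candidates), p=probs)]
```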
- Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers [41.069074375686164]
We propose Heterogeneous Pre-trained Transformers (HPT), which pre-train the trunk of a policy network to learn a task- and embodiment-shared representation.
We conduct experiments to investigate the scaling behaviors of training objectives across as many as 52 datasets.
HPTs outperform several baselines and enhance the fine-tuned policy performance by over 20% on unseen tasks.
arXiv Detail & Related papers (2024-09-30T17:39:41Z)
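A hedged skeleton of the shared-trunk layout the HPT summary describes: per-embodiment stems feeding one shared transformer trunk, with per-task heads. Dimensions, pooling, and module choices here are assumptions, not the released architecture.

```python
import torch.nn as nn

class SharedTrunkPolicy(nn.Module):
    """Toy HPT-style layout: embodiment-specific stems map heterogeneous
    (proprioceptive + visual) inputs into a shared token space, a shared
    trunk processes them, and task heads decode actions."""

    def __init__(self, stems: dict, action_dims: dict, d_model=256, depth=4):
        super().__init__()
        self.stems = nn.ModuleDict(stems)  # embodiment name -> encoder
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=depth)  # shared
        self.heads = nn.ModuleDict(
            {task: nn.Linear(d_model, dim) for task, dim in action_dims.items()}
        )

    def forward(self, embodiment, task, inputs):
        tokens = self.stems[embodiment](inputs)      # (B, T, d_model) assumed
        shared = self.trunk(tokens)                  # shared representation
        return self.heads[task](shared.mean(dim=1))  # pooled action readout
```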
- EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning [36.0274770291531]
We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning.
Our approach combines SIM(3)-equivariant neural network architectures with diffusion models.
We show that our method can easily generalize to novel objects and scenes after learning from just 5 minutes of human demonstrations in each task.
arXiv Detail & Related papers (2024-07-01T17:09:43Z)
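SIM(3) equivariance (a shared rotation, translation, and scale between input and output) can be illustrated with a toy map, not EquiBot's network: predict a point as an affine combination of the input points, with weights that depend only on scale-normalized distances.

```python
import numpy as np

def sim3_equivariant_point(points: np.ndarray) -> np.ndarray:
    """Toy SIM(3)-equivariant prediction: output = sum_i w_i * p_i with
    sum_i w_i = 1 and weights built from scale-normalized distances, so
    rotating/translating/scaling `points` moves the output identically."""
    center = points.mean(axis=0)
    d = np.linalg.norm(points - center, axis=1)
    d = d / (d.mean() + 1e-8)   # scale invariance
    w = np.exp(-d)              # rotation/translation invariant
    w = w / w.sum()             # affine weights: sum to 1
    return (w[:, None] * points).sum(axis=0)

# Equivariance check under a random similarity transform.
rng = np.random.default_rng(0)
P = rng.normal(size=(16, 3))
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
s, t = 2.5, rng.normal(size=3)                # scale and translation
lhs = sim3_equivariant_point(P @ A.T * s + t)
rhs = sim3_equivariant_point(P) @ A.T * s + t
assert np.allclose(lhs, rhs)
```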
- Efficient Data Collection for Robotic Manipulation via Compositional Generalization [70.76782930312746]
We show that policies can compose environmental factors from their data to succeed when encountering unseen factor combinations.
We propose better in-domain data collection strategies that exploit composition.
We provide videos at http://iliad.stanford.edu/robot-data-comp/.
arXiv Detail & Related papers (2024-03-08T07:15:38Z)
- Robot Fleet Learning via Policy Merging [58.5086287737653]
We propose FLEET-MERGE to efficiently merge policies in the fleet setting.
We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment.
We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
arXiv Detail & Related papers (2023-10-02T17:23:51Z)
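Weight-space policy merging has to respect hidden-unit permutation symmetry. The sketch below aligns one MLP layer of policy B to policy A with a Hungarian match before averaging; it is a simplified illustration of that idea, not the FLEET-MERGE algorithm, which also handles recurrent policies.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def merge_mlp_layer(W_a, b_a, W_next_a, W_b, b_b, W_next_b):
    """Average one hidden layer of two MLP policies after permuting B's
    hidden units to best match A's (simplified permutation alignment).

    W_*      : (hidden, in) weights,  b_* : (hidden,) biases,
    W_next_* : (out, hidden) weights of the following layer.
    """
    # Match hidden units by similarity of their incoming weight vectors.
    similarity = W_a @ W_b.T
    rows, cols = linear_sum_assignment(-similarity)  # maximize similarity
    perm = cols[np.argsort(rows)]
    # Apply the permutation to B consistently, then average with A.
    W_m = 0.5 * (W_a + W_b[perm])
    b_m = 0.5 * (b_a + b_b[perm])
    W_next_m = 0.5 * (W_next_a + W_next_b[:, perm])
    return W_m, b_m, W_next_m
```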
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR); a toy illustration follows the citation below.
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
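DR trains on many randomized renderings so that the real world looks like just one more variation. Here is a generic per-episode randomization sketch; the parameter names and ranges are invented for illustration rather than tied to any particular simulator's API.

```python
import random

def sample_visual_params(rng: random.Random) -> dict:
    """Draw one set of randomized rendering parameters for an episode.
    Keys and ranges are illustrative, not from any real simulator."""
    return {
        "light_intensity": rng.uniform(0.3, 1.5),
        "light_direction": [rng.uniform(-1, 1) for _ in range(3)],
        "table_texture_id": rng.randrange(100),      # random texture swap
        "camera_jitter_deg": rng.uniform(-5.0, 5.0), # perturb viewpoint
        "rgb_noise_std": rng.uniform(0.0, 0.05),     # sensor noise
    }

rng = random.Random(42)
episode_params = [sample_visual_params(rng) for _ in range(3)]
```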
- Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills [14.685043874797742]
We propose a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent.
To cover all modes and thus enable diverse behavior, we extend our approach to a mixture of experts (MoE) policy, where each mixture component selects its own subset of the training data for learning.
A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution.
arXiv Detail & Related papers (2023-03-27T16:02:50Z)
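The curriculum weighting above can be sketched as a soft assignment over data points; the softmax form and temperature are assumptions standing in for the paper's objective, which additionally uses a maximum-entropy term over mixture components.

```python
import numpy as np

def curriculum_weights(log_likelihoods: np.ndarray, temperature: float = 1.0):
    """Per-datapoint curriculum weights: emphasize samples the current model
    already represents well (higher log-likelihood -> higher weight).
    Illustrative softmax form; `temperature` trades specialization
    against coverage of the dataset."""
    z = log_likelihoods / temperature
    z = z - z.max()          # numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy usage: three samples, the model currently fits the first one best.
weights = curriculum_weights(np.array([-1.0, -3.0, -5.0]), temperature=1.0)
```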
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- Efficient Self-Supervised Data Collection for Offline Robot Learning [17.461103383630853]
A practical approach to robot reinforcement learning is to first collect a large batch of real or simulated robot interaction data.
We develop a simple-yet-effective goal-conditioned reinforcement-learning method that actively focuses data collection on novel observations.
arXiv Detail & Related papers (2021-05-10T18:42:58Z)
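One way to read "actively focuses data collection on novel observations" is a nearest-neighbor novelty heuristic over candidate goals, sketched below; the feature function and observation buffer are illustrative assumptions, not the paper's method.

```python
import numpy as np

def pick_novel_goal(candidates, seen_features, featurize):
    """Choose the candidate goal whose features are farthest from any
    previously seen observation (simple nearest-neighbor novelty)."""
    feats = np.stack([featurize(g) for g in candidates])
    # Distance from each candidate to its nearest stored observation.
    dists = np.linalg.norm(feats[:, None, :] - seen_features[None, :, :],
                           axis=-1)
    novelty = dists.min(axis=1)
    return candidates[int(novelty.argmax())]
```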
This list is automatically generated from the titles and abstracts of the papers on this site.