Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations
- URL: http://arxiv.org/abs/2209.07682v1
- Date: Fri, 16 Sep 2022 02:45:13 GMT
- Title: Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations
- Authors: Yilun Hao, Ruinan Wang, Zhangjie Cao, Zihan Wang, Yuchen Cui, Dorsa Sadigh
- Abstract summary: Extraneous data modalities can lead to state over-specification.
State over-specification leads to issues such as the learned policy not generalizing outside of the training data distribution.
We develop a bi-level optimization algorithm that learns this mask to accurately filter over-specified modalities.
- Score: 37.33625951008865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal demonstrations provide robots with an abundance of information to
make sense of the world. However, such abundance may not always lead to good
performance when it comes to learning sensorimotor control policies from human
demonstrations.
Extraneous data modalities can lead to state over-specification, where the
state contains modalities that are not only useless for decision-making but
can also change the data distribution across environments. State
over-specification leads to issues such as the learned policy failing to
generalize outside of the training data distribution.
In this work, we propose Masked Imitation Learning (MIL) to address state
over-specification by selectively using informative modalities. Specifically,
we design a masked policy network with a binary mask to block certain
modalities. We develop a bi-level optimization algorithm that learns this mask
to accurately filter over-specified modalities. We demonstrate empirically that
MIL outperforms baseline algorithms in simulated domains including MuJoCo and a
robot arm environment using the Robomimic dataset, and effectively recovers the
environment-invariant modalities on a multimodal dataset collected on a real
robot. Our project website presents supplemental details and videos of our
results at: https://tinyurl.com/masked-il
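
The masked-policy idea in the abstract is concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch, not the authors' implementation: it assumes a behavior-cloning (MSE) loss, a flat observation vector split into per-modality blocks, and it replaces the paper's learned mask with an exhaustive sweep over binary masks scored on held-out demonstrations. All names (`MaskedPolicy`, `mil_bilevel`, `modality_dims`) are hypothetical.

```python
# Illustrative sketch of masked imitation learning, not the paper's code.
# Assumptions: behavior cloning with an MSE loss, concatenated observations,
# and exhaustive mask search in place of the paper's learned mask.
import itertools
import torch
import torch.nn as nn

class MaskedPolicy(nn.Module):
    """Policy that zeroes out whole modalities with a fixed binary mask."""
    def __init__(self, modality_dims, action_dim, hidden=64):
        super().__init__()
        self.modality_dims = modality_dims  # e.g. [proprio, image-feat, force]
        self.net = nn.Sequential(
            nn.Linear(sum(modality_dims), hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs, mask):
        # obs: (batch, sum(modality_dims)); mask: one 0/1 entry per modality.
        # Expand each mask bit over its modality's feature block.
        expanded = torch.cat(
            [m.repeat(d) for m, d in zip(mask, self.modality_dims)]
        )
        return self.net(obs * expanded)

def bc_loss(policy, mask, obs, act):
    """Behavior-cloning loss under a given modality mask."""
    return nn.functional.mse_loss(policy(obs, mask), act)

def mil_bilevel(modality_dims, train, val, action_dim, inner_steps=200):
    """Outer loop: score candidate binary masks on held-out demonstrations.
    Inner loop: fit a BC policy to the training demonstrations per mask.
    The paper learns the mask rather than enumerating it; the exhaustive
    sweep here is only feasible for a handful of modalities."""
    best_mask, best_val = None, float("inf")
    for bits in itertools.product([0.0, 1.0], repeat=len(modality_dims)):
        mask = torch.tensor(bits)
        policy = MaskedPolicy(modality_dims, action_dim)
        opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
        for _ in range(inner_steps):        # inner: fit policy under this mask
            opt.zero_grad()
            bc_loss(policy, mask, *train).backward()
            opt.step()
        with torch.no_grad():               # outer: validate on held-out demos
            val_loss = bc_loss(policy, mask, *val).item()
        if val_loss < best_val:
            best_mask, best_val = mask, val_loss
    return best_mask
```

With, say, three modalities this sweeps eight masks; a mask that blocks an environment-varying modality should fit the training demonstrations nearly as well while validating better on demonstrations from a shifted environment, which is the selection signal the paper's bi-level formulation exploits at scale.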
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning [36.0274770291531]
We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning.
Our approach combines SIM(3)-equivariant neural network architectures with diffusion models.
We show that our method can easily generalize to novel objects and scenes after learning from just 5 minutes of human demonstrations in each task.
arXiv Detail & Related papers (2024-07-01T17:09:43Z)
- PoCo: Policy Composition from and for Heterogeneous Robot Learning [44.1315170137613]
Current methods usually collect and pool all data from one domain to train a single policy.
We present a flexible approach, dubbed Policy Composition, to combine information across diverse modalities and domains.
Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time.
arXiv Detail & Related papers (2024-02-04T14:51:49Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Curriculum-Based Imitation of Versatile Skills [15.97723808124603]
Learning skills by imitation is a promising concept for the intuitive teaching of robots.
A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations.
Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways.
arXiv Detail & Related papers (2023-04-11T12:10:41Z)
- Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills [14.685043874797742]
We propose a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent.
To cover all modes and thus enable diverse behavior, we extend our approach to a mixture-of-experts (MoE) policy, where each mixture component selects its own subset of the training data for learning.
A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution.
arXiv Detail & Related papers (2023-03-27T16:02:50Z)