Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for
End-to-End Visual Robotic Manipulation Learning
- URL: http://arxiv.org/abs/2206.08321v1
- Date: Thu, 16 Jun 2022 17:26:06 GMT
- Title: Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for
End-to-End Visual Robotic Manipulation Learning
- Authors: Hyunwoo Ryu, Jeong-Hoon Lee, Hong-in Lee, Jongeun Choi
- Abstract summary: We present end-to-end SE(3)-equivariant models for visual robotic manipulation from a point cloud input.
We show that our models can learn from scratch without prior knowledge yet are highly sample efficient.
- Score: 2.8388425545775386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end learning for visual robotic manipulation is known to suffer from
sample inefficiency, requiring a large number of demonstrations. The spatial
roto-translation equivariance, or the SE(3)-equivariance, can be exploited to
improve the sample efficiency for learning robotic manipulation. In this paper,
we present fully end-to-end SE(3)-equivariant models for visual robotic
manipulation from a point cloud input. By utilizing the representation theory
of the Lie group, we construct novel SE(3)-equivariant energy-based models that
allow highly sample efficient end-to-end learning. We show that our models can
learn from scratch without prior knowledge yet are highly sample efficient (~10
demonstrations are enough). Furthermore, we show that the trained models can
generalize to tasks with (i) previously unseen target object poses, (ii)
previously unseen target object instances of the category, and (iii) previously
unseen visual distractors. We experiment with 6-DoF robotic manipulation tasks
to validate our models' sample efficiency and generalizability. Codes are
available at: https://github.com/tomato1mule/edf
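To illustrate the property the abstract relies on, here is a minimal sketch in plain NumPy (not the paper's actual EDF architecture): a toy pose energy that depends only on distances between scene points and gripper keypoints, checked numerically to be unchanged when the scene point cloud and the candidate end-effector pose are transformed by the same SE(3) element. This invariance of the energy under joint transformation is what makes the induced pose distribution SE(3)-equivariant. The energy function, keypoints, and tolerances below are illustrative assumptions.
```python
# Toy illustration (not the paper's EDF model): an energy over candidate
# end-effector poses that depends only on distances between scene points and
# gripper keypoints, so it is unchanged when scene and pose are transformed
# by the same SE(3) element.
import numpy as np

def random_rotation(rng):
    # Random rotation matrix via QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0
    return q

def toy_energy(points, R, t):
    # Hypothetical energy: a sum of Gaussian kernels between scene points and
    # gripper "keypoints" placed in the world frame by the pose (R, t).
    keypoints = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.1], [0.05, 0.0, 0.1]])
    world_kp = keypoints @ R.T + t
    d = np.linalg.norm(points[:, None, :] - world_kp[None, :, :], axis=-1)
    return np.exp(-d ** 2).sum()

rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))                 # toy scene point cloud
R, t = random_rotation(rng), rng.normal(size=3)    # candidate grasp pose
Rg, tg = random_rotation(rng), rng.normal(size=3)  # group element g in SE(3)

e = toy_energy(points, R, t)
e_g = toy_energy(points @ Rg.T + tg, Rg @ R, Rg @ t + tg)  # g applied to both
print(np.isclose(e, e_g))                          # True: energy is invariant
```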
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows, however, that existing machine unlearning techniques do not hold up under more challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation [5.11432473998551]
Diffusion-EDFs is a novel SE(3)-equivariant diffusion-based approach for visual robotic manipulation tasks.
We show that our proposed method achieves remarkable data efficiency, requiring only 5 to 10 human demonstrations for effective end-to-end training in less than an hour.
arXiv Detail & Related papers (2023-09-06T03:42:20Z)
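The Diffusion-EDFs entry above describes a diffusion model over SE(3) end-effector poses. The sketch below conveys only the generic recipe, under the assumption of a Langevin-style refinement with a finite-difference stand-in for a learned score; it is not the authors' implementation, and `energy` can be any scalar pose cost (for example, the negated toy energy sketched earlier).
```python
# Generic SE(3) denoising recipe (not the Diffusion-EDFs implementation):
# corrupt a pose with random SE(3) noise, then refine it by finite-difference
# descent of a pose energy along the 6 local se(3) coordinates.
import numpy as np

def exp_so3(w):
    # Rodrigues' formula: axis-angle vector w -> rotation matrix.
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def perturb(R, t, rot_std, trans_std, rng):
    # "Forward" noising: left-compose the pose with random rotation and
    # translation noise.
    dR = exp_so3(rng.normal(scale=rot_std, size=3))
    return dR @ R, t + rng.normal(scale=trans_std, size=3)

def refine(R, t, energy, steps=100, lr=1e-2, eps=1e-4):
    # "Reverse" refinement: descend energy(R, t) along small left
    # perturbations of the pose (stand-in for following a learned score).
    for _ in range(steps):
        grad = np.zeros(6)
        base = energy(R, t)
        for i in range(6):
            d = np.zeros(6)
            d[i] = eps
            grad[i] = (energy(exp_so3(d[:3]) @ R, t + d[3:]) - base) / eps
        R, t = exp_so3(-lr * grad[:3]) @ R, t - lr * grad[3:]
    return R, t
```
A trained model would replace the finite-difference gradient with an equivariant score network; the sketch only shows the pose-noising and refinement structure.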
- Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation [25.47207030637466]
Large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems.
We introduce "lossless adaptation" to address this shortcoming of classical fine-tuning.
We demonstrate that appropriate placement of our parameter efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning.
arXiv Detail & Related papers (2023-04-13T15:06:28Z)
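The exact adapter placement studied in the Lossless Adaptation entry above is not reproduced here; the PyTorch sketch below shows one common parameter-efficient pattern matching the description, a bottleneck adapter with a residual connection inserted between frozen blocks of a pretrained backbone. The stand-in MLP backbone and dimensions are assumptions.
```python
# A common bottleneck-adapter pattern (a sketch, not the paper's exact design):
# small trainable modules with a residual connection between frozen blocks of
# a pretrained backbone, so only the adapters are updated.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as identity so the frozen
        nn.init.zeros_(self.up.bias)    # backbone's behavior is preserved

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AdaptedBackbone(nn.Module):
    """Frozen pretrained blocks interleaved with trainable adapters."""
    def __init__(self, blocks, dim):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.adapters = nn.ModuleList(Adapter(dim) for _ in blocks)
        for p in self.blocks.parameters():
            p.requires_grad_(False)     # keep the pretrained weights frozen

    def forward(self, x):
        for block, adapter in zip(self.blocks, self.adapters):
            x = adapter(block(x))
        return x

# Usage with a stand-in backbone of simple MLP blocks:
blocks = [nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(4)]
model = AdaptedBackbone(blocks, dim=64)
out = model(torch.randn(8, 64))         # only adapter parameters get gradients
```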
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
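The Composing Ensembles entry above composes pre-trained "generators" and "scorers" in a closed loop. The sketch below is a generic propose-score-refine loop capturing that idea, not the paper's exact optimization; the toy generator, scorers, and sum aggregation are illustrative assumptions.
```python
# Generic propose-score-refine loop (a sketch of the idea, not the paper's
# exact procedure): a generator proposes candidates conditioned on the best
# solution so far, and an ensemble of scorers picks the candidate they jointly
# prefer, closing the loop for a fixed number of rounds.
import random

def iterative_consensus(generate, scorers, rounds=10, candidates=16, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        proposals = [generate(best, rng) for _ in range(candidates)]
        for proposal in proposals:
            # Consensus = sum (or any aggregation) of the scorers' feedback.
            score = sum(scorer(proposal) for scorer in scorers)
            if score > best_score:
                best, best_score = proposal, score
    return best

# Toy usage: the "generator" samples numbers near the current best, and the
# "scorers" prefer values close to two different targets, so the consensus
# settles between them (near 3.0).
gen = lambda prev, rng: rng.gauss(prev if prev is not None else 0.0, 1.0)
scorers = [lambda x: -(x - 2.0) ** 2, lambda x: -(x - 4.0) ** 2]
print(iterative_consensus(gen, scorers))
```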
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- Factored World Models for Zero-Shot Generalization in Robotic Manipulation [7.258229016768018]
We learn to generalize over robotic pick-and-place tasks using object-factored world models.
We use a residual stack of graph neural networks that receive action information at multiple levels in both their node and edge neural networks.
We show that an ensemble of our models can be used to plan for tasks involving up to 12 pick and place actions using search.
arXiv Detail & Related papers (2022-02-10T21:26:11Z)
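The Factored World Models entry above describes a residual stack of graph neural networks whose node and edge networks both receive action information. The PyTorch sketch below shows that conditioning pattern on a fully connected object graph; the dimensions, sum aggregation, and layer count are assumptions, not the paper's implementation.
```python
# Minimal sketch of the conditioning pattern described above (not the paper's
# implementation): message-passing layers whose edge and node MLPs both take
# an action embedding as extra input, stacked with residual node updates.
import torch
import torch.nn as nn

class ActionConditionedGNNLayer(nn.Module):
    def __init__(self, node_dim, action_dim, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + hidden + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, node_dim))

    def forward(self, nodes, action):
        # nodes: (N, node_dim) object states; action: (action_dim,) embedding.
        n = nodes.shape[0]
        a = action.expand(n, n, -1)
        src = nodes.unsqueeze(1).expand(n, n, -1)  # sender features
        dst = nodes.unsqueeze(0).expand(n, n, -1)  # receiver features
        messages = self.edge_mlp(torch.cat([src, dst, a], dim=-1)).sum(dim=0)
        inp = torch.cat([nodes, messages, action.expand(n, -1)], dim=-1)
        return nodes + self.node_mlp(inp)          # residual node update

class FactoredDynamics(nn.Module):
    def __init__(self, node_dim=16, action_dim=8, layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            ActionConditionedGNNLayer(node_dim, action_dim) for _ in range(layers))

    def forward(self, nodes, action):
        for layer in self.layers:
            nodes = layer(nodes, action)
        return nodes                               # predicted next object states

# Toy usage: 5 objects with 16-dim states, one 8-dim action embedding.
model = FactoredDynamics()
next_states = model(torch.randn(5, 16), torch.randn(8))
```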
- Learning to See before Learning to Act: Visual Pre-training for Manipulation [48.731528716324355]
We find that pre-training on vision tasks significantly improves generalization and sample efficiency for learning to manipulate objects.
We explore directly transferring model parameters from vision networks to affordance prediction networks, and show that this can result in successful zero-shot adaptation.
With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.
arXiv Detail & Related papers (2021-07-01T17:58:37Z)
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z)
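The SA-Det3D entry above augments BEV, voxel, and point features with pairwise self-attention. The sketch below is a minimal single-head version of that mechanism over a set of aggregated features (PyTorch); it is not the paper's detector and omits the deformable-sampling variant.
```python
# Minimal single-head pairwise self-attention over a set of features
# (a sketch of the mechanism named above, not the SA-Det3D detector itself).
import torch
import torch.nn as nn

class PairwiseSelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, feats):
        # feats: (N, dim) features of N points, voxels, or BEV cells.
        q, k, v = self.q(feats), self.k(feats), self.v(feats)
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)  # (N, N) weights
        return feats + attn @ v   # residual: each feature attends to all others

feats = torch.randn(256, 64)              # e.g. 256 aggregated point features
ctx = PairwiseSelfAttention(64)(feats)    # context-augmented features, (256, 64)
```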
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.