USEEK: Unsupervised SE(3)-Equivariant 3D Keypoints for Generalizable
Manipulation
- URL: http://arxiv.org/abs/2209.13864v1
- Date: Wed, 28 Sep 2022 06:42:29 GMT
- Title: USEEK: Unsupervised SE(3)-Equivariant 3D Keypoints for Generalizable
Manipulation
- Authors: Zhengrong Xue, Zhecheng Yuan, Jiashun Wang, Xueqian Wang, Yang Gao,
Huazhe Xu
- Abstract summary: USEEK is an unsupervised SE(3)-equivariant keypoints method that enjoys alignment across instances in a category.
With USEEK in hand, the robot can infer the category-level task-relevant object frames in an efficient and explainable manner.
- Score: 19.423310410631085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can a robot manipulate intra-category unseen objects in arbitrary poses with
the help of a mere demonstration of grasping pose on a single object instance?
In this paper, we try to address this intriguing challenge by using USEEK, an
unsupervised SE(3)-equivariant keypoints method that enjoys alignment across
instances in a category, to perform generalizable manipulation. USEEK follows a
teacher-student structure to decouple the unsupervised keypoint discovery and
SE(3)-equivariant keypoint detection. With USEEK in hand, the robot can infer
the category-level task-relevant object frames in an efficient and explainable
manner, enabling manipulation of any intra-category objects from and to any
poses. Through extensive experiments, we demonstrate that the keypoints
produced by USEEK possess rich semantics, thus successfully transferring the
functional knowledge from the demonstration object to the novel ones. Compared
with other object representations for manipulation, USEEK is more adaptive in
the face of large intra-category shape variance, more robust with limited
demonstrations, and more efficient at inference time.
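To make concrete how category-aligned keypoints can yield a task-relevant object frame, the following is a minimal sketch (not USEEK's actual pipeline) that transfers a demonstrated grasp pose to a novel instance by fitting a rigid SE(3) transform between corresponding keypoints with the Kabsch algorithm; the keypoint arrays and the grasp pose are hypothetical placeholders.

```python
import numpy as np

def fit_se3(src_kps: np.ndarray, dst_kps: np.ndarray) -> np.ndarray:
    """Least-squares rigid transform (Kabsch) mapping src keypoints onto dst keypoints.

    src_kps, dst_kps: (K, 3) arrays of corresponding keypoints.
    Returns a 4x4 homogeneous transform T such that dst ~= R @ src + t.
    """
    src_c, dst_c = src_kps.mean(axis=0), dst_kps.mean(axis=0)
    H = (src_kps - src_c).T @ (dst_kps - dst_c)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = dst_c - R @ src_c
    return T

# Hypothetical aligned keypoints: row i on the demo object corresponds to the
# same semantic part as row i on the novel instance.
demo_keypoints = np.array([[0.0, 0.0, 0.0],
                           [0.1, 0.0, 0.0],
                           [0.0, 0.1, 0.0],
                           [0.0, 0.0, 0.1],
                           [0.1, 0.1, 0.1]])
angle = np.pi / 6                                    # novel instance is rotated and shifted
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
novel_keypoints = demo_keypoints @ Rz.T + np.array([0.2, -0.1, 0.05])

# Grasp pose demonstrated once, expressed in the same frame as demo_keypoints.
grasp_in_demo_frame = np.eye(4)
grasp_in_demo_frame[:3, 3] = [0.05, 0.0, 0.1]

# Transfer the demonstration: map the grasp into the novel instance's frame.
T_demo_to_novel = fit_se3(demo_keypoints, novel_keypoints)
grasp_on_novel = T_demo_to_novel @ grasp_in_demo_frame
print(grasp_on_novel[:3, 3])                         # grasp position on the novel object
```

In USEEK the keypoints would come from the learned SE(3)-equivariant detector rather than hand-specified arrays; the point of the sketch is only that once aligned keypoints exist, pose transfer reduces to a closed-form rigid fit.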
Related papers
- Keypoint Abstraction using Large Models for Object-Relative Imitation Learning [78.92043196054071]
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics.
Keypoint-based representations have proven effective as a succinct way of capturing essential object features.
We propose KALM, a framework that leverages large pre-trained vision-language models to automatically generate task-relevant and cross-instance consistent keypoints.
arXiv Detail & Related papers (2024-10-30T17:37:31Z) - Kinematic-aware Prompting for Generalizable Articulated Object
Manipulation with LLMs [53.66070434419739]
Generalizable articulated object manipulation is essential for home-assistant robots.
We propose a kinematic-aware prompting framework that prompts Large Language Models with kinematic knowledge of objects to generate low-level motion waypoints.
Our framework outperforms traditional methods on 8 seen categories and shows a powerful zero-shot capability for 8 unseen articulated object categories.
arXiv Detail & Related papers (2023-11-06T03:26:41Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - You Only Look at One: Category-Level Object Representations for Pose
Estimation From a Single Example [26.866356430469757]
We present a method for achieving category-level pose estimation by inspection of just a single object from a desired category.
We demonstrate that our method runs in real-time, enabling a robot manipulator equipped with an RGBD sensor to perform online 6D pose estimation for novel objects.
arXiv Detail & Related papers (2023-05-22T01:32:24Z) - 3D-QueryIS: A Query-based Framework for 3D Instance Segmentation [74.6998931386331]
Previous methods for 3D instance segmentation often suffer from inter-task dependencies and tend to lack robustness.
We propose a novel query-based method, termed as 3D-QueryIS, which is detector-free, semantic segmentation-free, and cluster-free.
Our 3D-QueryIS is free from the accumulated errors caused by the inter-task dependencies.
arXiv Detail & Related papers (2022-11-17T07:04:53Z) - You Only Demonstrate Once: Category-Level Manipulation from Single
Visual Demonstration [9.245605426105922]
This work proposes a novel, category-level manipulation framework.
It uses an object-centric, category-level representation and model-free 6-DoF motion tracking.
Experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly.
arXiv Detail & Related papers (2022-01-30T03:59:14Z) - Neural Descriptor Fields: SE(3)-Equivariant Object Representations for
Manipulation [75.83319382105894]
We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target.
NDFs are trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints.
Our approach generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.
arXiv Detail & Related papers (2021-12-09T18:57:15Z) - Improving Point Cloud Semantic Segmentation by Learning 3D Object
Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well on well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z) - A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model eliminates the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z)
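Several of the entries above (USEEK, ROAM, Neural Descriptor Fields) lean on SE(3) equivariance: rigidly transforming the input point cloud should transform the predicted keypoints or descriptor frame in exactly the same way. The snippet below is a self-contained sanity check of that property using a toy, hand-crafted detector as a stand-in for the learned models; none of the function names correspond to any of the papers' code.

```python
import numpy as np

def random_se3(rng: np.random.Generator):
    """Sample a random rotation (via QR of a Gaussian matrix) and translation."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:          # ensure a proper rotation (det = +1)
        Q[:, 0] *= -1
    return Q, rng.normal(size=3)

def toy_keypoint_detector(points: np.ndarray) -> np.ndarray:
    """Placeholder detector: returns the centroid and the point farthest from it.

    Both outputs depend only on relative geometry, so the detector is
    SE(3)-equivariant by construction. A learned detector would replace this.
    """
    centroid = points.mean(axis=0)
    farthest = points[np.argmax(np.linalg.norm(points - centroid, axis=1))]
    return np.stack([centroid, farthest])

rng = np.random.default_rng(0)
cloud = rng.normal(size=(256, 3))
R, t = random_se3(rng)

kps_then_transform = toy_keypoint_detector(cloud) @ R.T + t   # detect, then transform
transform_then_kps = toy_keypoint_detector(cloud @ R.T + t)   # transform, then detect

# Equivariance: f(g . X) == g . f(X) for any rigid transform g.
assert np.allclose(kps_then_transform, transform_then_kps, atol=1e-6)
```

A learned detector that passes this kind of check can be applied to objects in arbitrary poses at test time without a canonicalization step, which is what makes the single-demonstration transfer setting above tractable.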