Neural Descriptor Fields: SE(3)-Equivariant Object Representations for
Manipulation
- URL: http://arxiv.org/abs/2112.05124v1
- Date: Thu, 9 Dec 2021 18:57:15 GMT
- Title: Neural Descriptor Fields: SE(3)-Equivariant Object Representations for
Manipulation
- Authors: Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum,
Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann
- Abstract summary: We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target.
NDFs are trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints.
Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.
- Score: 75.83319382105894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Neural Descriptor Fields (NDFs), an object representation that
encodes both points and relative poses between an object and a target (such as
a robot gripper or a rack used for hanging) via category-level descriptors. We
employ this representation for object manipulation, where given a task
demonstration, we want to repeat the same task on a new object instance from
the same category. We propose to achieve this objective by searching (via
optimization) for the pose whose descriptor matches that observed in the
demonstration. NDFs are conveniently trained in a self-supervised fashion via a
3D auto-encoding task that does not rely on expert-labeled keypoints. Further,
NDFs are SE(3)-equivariant, guaranteeing performance that generalizes across
all possible 3D object translations and rotations. We demonstrate learning of
manipulation tasks from few (5-10) demonstrations both in simulation and on a
real robot. Our performance generalizes across both object instances and 6-DoF
object poses, and significantly outperforms a recent baseline that relies on 2D
descriptors. Project website: https://yilundu.github.io/ndf/.
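To make the pose search described in the abstract concrete, the sketch below shows the kind of optimization loop it refers to: descriptors sampled at query points in the demonstration are treated as a target, and an SE(3) transform is optimized until the same query points, carried onto a new object observation, produce matching descriptors. This is a minimal illustration under assumed names, not the released implementation; in particular, `descriptor_field` is a hypothetical stand-in for a trained NDF network, and the toy point clouds exist only to make the snippet runnable.

```python
import torch

def skew(k):
    """3x3 skew-symmetric matrix of a 3-vector, built so gradients flow through k."""
    z = torch.zeros((), dtype=k.dtype)
    return torch.stack([
        torch.stack([z, -k[2], k[1]]),
        torch.stack([k[2], z, -k[0]]),
        torch.stack([-k[1], k[0], z]),
    ])

def axis_angle_to_matrix(w):
    """Rodrigues' formula: axis-angle vector w -> rotation matrix."""
    theta = torch.sqrt((w * w).sum() + 1e-12)
    K = skew(w / theta)
    return torch.eye(3, dtype=w.dtype) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def descriptor_field(query_pts, object_cloud):
    """Hypothetical stand-in for a trained NDF: each query point is described by its
    sorted distances to the object point cloud, which (like an NDF descriptor) is
    unchanged when object and query points are moved by the same rigid transform."""
    d = torch.cdist(query_pts, object_cloud)        # (Q, N) pairwise distances
    return torch.sort(d, dim=-1).values[:, :16]     # (Q, 16) descriptor per query point

def match_pose(demo_desc, query_pts, new_object_cloud, steps=300, lr=5e-2):
    """Search over an SE(3) pose of the query points so that their descriptors on the
    new object observation match the descriptors recorded in the demonstration."""
    w = torch.zeros(3, requires_grad=True)   # axis-angle rotation parameters
    t = torch.zeros(3, requires_grad=True)   # translation
    opt = torch.optim.Adam([w, t], lr=lr)
    for _ in range(steps):
        R = axis_angle_to_matrix(w)
        moved = query_pts @ R.T + t
        loss = (descriptor_field(moved, new_object_cloud) - demo_desc).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return axis_angle_to_matrix(w).detach(), t.detach()

# Toy check: the "new" object is the demo object rigidly moved, so optimizing should
# drive the query points toward the corresponding spot on the moved object.
obj = torch.randn(500, 3)
query = torch.randn(8, 3) * 0.1 + torch.tensor([0.3, 0.0, 0.0])
demo_desc = descriptor_field(query, obj)
R_true = axis_angle_to_matrix(torch.tensor([0.0, 0.0, 0.7]))
obj_moved = obj @ R_true.T + torch.tensor([0.5, -0.2, 0.1])
R_hat, t_hat = match_pose(demo_desc, query, obj_moved)
```

In the paper, the demonstration records descriptors of relative poses between the object and a target (such as a gripper or rack), not only of points, and the SE(3)-equivariance of the learned field is what lets the same matching scheme transfer across arbitrary translations and rotations of the new object instance.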
Related papers
- Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- ShapeShift: Superquadric-based Object Pose Estimation for Robotic Grasping [85.38689479346276]
Current techniques heavily rely on a reference 3D object, limiting their generalizability and making it expensive to expand to new object categories.
This paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.
arXiv Detail & Related papers (2023-04-10T20:55:41Z)
- USEEK: Unsupervised SE(3)-Equivariant 3D Keypoints for Generalizable Manipulation [19.423310410631085]
USEEK is an unsupervised SE(3)-equivariant keypoint method that enjoys alignment across instances within a category.
With USEEK in hand, the robot can infer the category-level task-relevant object frames in an efficient and explainable manner.
arXiv Detail & Related papers (2022-09-28T06:42:29Z)
- Object-Compositional Neural Implicit Surfaces [45.274466719163925]
The neural implicit representation has shown its effectiveness in novel view synthesis and high-quality 3D reconstruction from multi-view images.
This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation.
arXiv Detail & Related papers (2022-07-20T06:38:04Z)
- Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
arXiv Detail & Related papers (2022-03-25T00:20:31Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Rapid Pose Label Generation through Sparse Representation of Unknown Objects [7.32172860877574]
This work presents an approach for rapidly generating real-world, pose-annotated RGB-D data for unknown objects.
We first source minimalistic labelings of an ordered set of arbitrarily chosen keypoints over a set of RGB-D videos.
By solving an optimization problem, we combine these labels under a world frame to recover a sparse, keypoint-based representation of the object.
arXiv Detail & Related papers (2020-11-07T15:14:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.