SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields
- URL: http://arxiv.org/abs/2211.09786v1
- Date: Thu, 17 Nov 2022 18:55:42 GMT
- Title: SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields
- Authors: Anthony Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, Leslie
Pack Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal
- Abstract summary: We present a method for performing tasks involving spatial relations between novel object instances in arbitrary poses.
Our framework provides a scalable way for specifying new tasks using only 5-10 demonstrations.
The method is tested on three multi-object rearrangement tasks in simulation and on a real robot.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method for performing tasks involving spatial relations between
novel object instances initialized in arbitrary poses directly from point cloud
observations. Our framework provides a scalable way for specifying new tasks
using only 5-10 demonstrations. Object rearrangement is formalized as the
question of finding actions that configure task-relevant parts of the object in
a desired alignment. This formalism is implemented in three steps: assigning a
consistent local coordinate frame to the task-relevant object parts,
determining the location and orientation of this coordinate frame on unseen
object instances, and executing an action that brings these frames into the
desired alignment. We overcome the key technical challenge of determining
task-relevant local coordinate frames from a few demonstrations by developing
an optimization method based on Neural Descriptor Fields (NDFs) and a single
annotated 3D keypoint. An energy-based learning scheme that models the joint
configuration of the objects satisfying a desired relational task further
improves performance. The method is tested on three multi-object rearrangement
tasks in simulation and on a real robot. Project website, videos, and code:
https://anthonysimeonov.github.io/r-ndf/
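The third step of the formalism above, executing an action that brings the two task frames into the demonstrated alignment, reduces to plain rigid-transform arithmetic once the frames are known. The sketch below illustrates this with toy 4x4 homogeneous transforms; the frame estimates and the demonstrated relative pose are hypothetical stand-ins (the paper recovers them from NDF descriptors and an annotated keypoint), not the actual implementation:

```python
import numpy as np

def align_frames(T_world_A, T_world_B, T_B_A_demo):
    """Rigid action that moves object A so its task frame sits at the
    demonstrated alignment T_B_A_demo relative to object B's task frame."""
    return T_world_B @ T_B_A_demo @ np.linalg.inv(T_world_A)

def pose(theta, t):
    # 4x4 homogeneous transform: rotation about z by theta, translation t
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

# toy stand-ins for the frames the method would localize on unseen objects
T_world_A = pose(0.3, [0.2, 0.0, 0.1])          # task frame on object A
T_world_B = pose(-0.5, [0.5, 0.4, 0.0])         # task frame on object B
T_B_A_demo = pose(np.pi / 2, [0.0, 0.0, 0.15])  # alignment from the demos

T_action = align_frames(T_world_A, T_world_B, T_B_A_demo)
# after applying the action, A's frame relative to B matches the demo
T_B_A_new = np.linalg.inv(T_world_B) @ T_action @ T_world_A
assert np.allclose(T_B_A_new, T_B_A_demo)
```

Because the frames are assigned consistently across instances, the same `T_B_A_demo` transfers to novel objects in arbitrary initial poses.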
Related papers
- A Modern Take on Visual Relationship Reasoning for Grasp Planning
We present a modern take on visual relational reasoning for grasp planning.
We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories.
We also propose D3G, a new end-to-end transformer-based dependency graph generation model.
arXiv Detail & Related papers (2024-09-03T16:30:48Z)
- DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
Point scene understanding, which processes real-world scene point clouds, is a challenging task.
The recent state-of-the-art method first segments each object and then processes each one independently, with multiple stages for the different sub-tasks.
We propose a novel Disentangled Object-Centric TRansformer (DOCTR) that explores object-centric representation.
arXiv Detail & Related papers (2024-03-25T05:22:34Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- Robust Change Detection Based on Neural Descriptor Fields
We develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results.
By associating objects via shape-code similarity and comparing local object-neighbor spatial layouts, our proposed approach demonstrates robustness to low observation overlap and localization noise.
arXiv Detail & Related papers (2022-08-01T17:45:36Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation
We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target.
NDFs are trained in a self-supervised fashion via a 3D auto-encoding task that does not rely on expert-labeled keypoints.
Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.
arXiv Detail & Related papers (2021-12-09T18:57:15Z)
- SORNet: Spatial Object-Centric Representations for Sequential Manipulation
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state.
We propose SORNet, which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest.
arXiv Detail & Related papers (2021-09-08T19:36:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.