Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks
- URL: http://arxiv.org/abs/2404.13478v1
- Date: Sat, 20 Apr 2024 22:16:56 GMT
- Title: Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks
- Authors: Ben Eisner, Yi Yang, Todor Davchev, Mel Vecerik, Jonathan Scholz, David Held,
- Abstract summary: We propose a method for precise relative pose prediction which is provably SE(3)-equivariant.
We accomplish this by factoring the problem into learning an SE(3) invariant task-specific representation of the scene.
We demonstrate that our method can yield substantially more precise placement predictions in simulated placement tasks.
- Score: 24.721590424515043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many robot manipulation tasks can be framed as geometric reasoning tasks, where an agent must be able to precisely manipulate an object into a position that satisfies the task from a set of initial conditions. Often, task success is defined based on the relationship between two objects - for instance, hanging a mug on a rack. In such cases, the solution should be equivariant to the initial position of the objects as well as the agent, and invariant to the pose of the camera. This poses a challenge for learning systems which attempt to solve this task by learning directly from high-dimensional demonstrations: the agent must learn to be both equivariant as well as precise, which can be challenging without any inductive biases about the problem. In this work, we propose a method for precise relative pose prediction which is provably SE(3)-equivariant, can be learned from only a few demonstrations, and can generalize across variations in a class of objects. We accomplish this by factoring the problem into learning an SE(3) invariant task-specific representation of the scene and then interpreting this representation with novel geometric reasoning layers which are provably SE(3) equivariant. We demonstrate that our method can yield substantially more precise placement predictions in simulated placement tasks than previous methods trained with the same amount of data, and can accurately represent relative placement relationships data collected from real-world demonstrations. Supplementary information and videos can be found at https://sites.google.com/view/reldist-iclr-2023.
Related papers
- Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects [14.481805160449282]
Manipulating objects with varying geometries and deformable objects is a major challenge in robotics.
In this work, we frame this problem through the lens of a heterogeneous graph that comprises smaller sub-graphs.
We present a novel and challenging reinforcement learning benchmark, including rigid insertion of diverse objects.
arXiv Detail & Related papers (2025-02-10T20:10:25Z) - 3D-Aware Hypothesis & Verification for Generalizable Relative Object
Pose Estimation [69.73691477825079]
We present a new hypothesis-and-verification framework to tackle the problem of generalizable object pose estimation.
To measure reliability, we introduce a 3D-aware verification that explicitly applies 3D transformations to the 3D object representations learned from the two input images.
arXiv Detail & Related papers (2023-10-05T13:34:07Z) - Weakly-supervised 3D Pose Transfer with Keypoints [57.66991032263699]
Main challenges of 3D pose transfer are: 1) Lack of paired training data with different characters performing the same pose; 2) Disentangling pose and shape information from the target mesh; 3) Difficulty in applying to meshes with different topologies.
We propose a novel weakly-supervised keypoint-based framework to overcome these difficulties.
arXiv Detail & Related papers (2023-07-25T12:40:24Z) - Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and
Motion Estimation [49.56131393810713]
We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
arXiv Detail & Related papers (2023-06-08T22:55:32Z) - ShapeShift: Superquadric-based Object Pose Estimation for Robotic
Grasping [85.38689479346276]
Current techniques heavily rely on a reference 3D object, limiting their generalizability and making it expensive to expand to new object categories.
This paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.
arXiv Detail & Related papers (2023-04-10T20:55:41Z) - Image to Sphere: Learning Equivariant Features for Efficient Pose
Prediction [3.823356975862006]
Methods that predict a single point estimate do not predict the pose of objects with symmetries well and cannot represent uncertainty.
We propose a novel mapping of features from the image domain to the 3D rotation manifold.
We demonstrate the effectiveness of our method at object orientation prediction, and achieve state-of-the-art performance on the popular PASCAL3D+ dataset.
arXiv Detail & Related papers (2023-02-27T16:23:19Z) - SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields [39.562247503513156]
We present a method for performing tasks involving spatial relations between novel object instances in arbitrary poses.
Our framework provides a scalable way for specifying new tasks using only 5-10 demonstrations.
The method is tested on three multi-object rearrangement tasks in simulation and on a real robot.
arXiv Detail & Related papers (2022-11-17T18:55:42Z) - Leveraging Equivariant Features for Absolute Pose Regression [9.30597356471664]
We show that a translation and rotation equivariant Convolutional Neural Network directly induces representations of camera motions into the feature space.
We then show that this geometric property allows for implicitly augmenting the training data under a whole group of image plane-preserving transformations.
arXiv Detail & Related papers (2022-04-05T12:44:20Z) - Self-Supervised 3D Hand Pose Estimation from monocular RGB via
Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives.
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of $7.6%$ in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z) - Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.