Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement
- URL: http://arxiv.org/abs/2307.04751v1
- Date: Mon, 10 Jul 2023 17:56:06 GMT
- Title: Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement
- Authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina
Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox
- Abstract summary: We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship.
The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects.
- Score: 49.888011242939385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a system for rearranging objects in a scene to achieve a desired
object-scene placing relationship, such as a book inserted in an open slot of a
bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of
both scenes and objects, and is trained from demonstrations to operate directly
on 3D point clouds. Our system overcomes challenges associated with the
existence of many geometrically-similar rearrangement solutions for a given
scene. By leveraging an iterative pose de-noising training procedure, we can
fit multi-modal demonstration data and produce multi-modal outputs while
remaining precise and accurate. We also show the advantages of conditioning on
relevant local geometric features while ignoring irrelevant global structure
that harms both generalization and precision. We demonstrate our approach on
three distinct rearrangement tasks that require handling multi-modality and
generalization over object shape and pose in both simulation and the real
world. Project website, code, and videos:
https://anthonysimeonov.github.io/rpdiff-multi-modal/
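To make the iterative pose de-noising idea concrete, here is a minimal Python sketch of refining an object pose by repeated predicted corrections. Everything in it (the `predict_pose_update` stub, the step count, the initial noise) is an illustrative assumption standing in for the paper's learned point-cloud network, not the authors' implementation.

```python
# Minimal sketch of iterative pose de-noising for object placement.
# `predict_pose_update` is a hypothetical stand-in for a learned network
# that, given object/scene point clouds and the current pose, predicts a
# small corrective transform.
import numpy as np

def random_rotation(rng, scale=1.0):
    """Random rotation via the exponential map (Rodrigues' formula)."""
    w = rng.normal(size=3) * scale                      # axis-angle vector
    theta = np.linalg.norm(w) + 1e-12
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def predict_pose_update(obj_pts, scene_pts, R, t, step):
    """Dummy denoiser: nudges the object toward the scene centroid so the
    loop runs end to end. A trained model would predict this from geometry."""
    gap = scene_pts.mean(axis=0) - (obj_pts @ R.T + t).mean(axis=0)
    return np.eye(3), 0.5 * gap                         # (delta_R, delta_t)

def denoise_pose(obj_pts, scene_pts, n_steps=20, seed=0):
    rng = np.random.default_rng(seed)
    R, t = random_rotation(rng), rng.normal(size=3)     # noisy initial pose
    for step in range(n_steps):
        dR, dt = predict_pose_update(obj_pts, scene_pts, R, t, step)
        R, t = dR @ R, t + dt                           # apply correction
    return R, t

obj = np.random.default_rng(1).normal(size=(128, 3))
scene = np.random.default_rng(2).normal(size=(512, 3)) + np.array([2.0, 0.0, 0.0])
R, t = denoise_pose(obj, scene)
print("final translation:", np.round(t, 3))
```

Starting the loop from several different random initial poses yields several distinct final placements, which is how de-noising-style samplers expose multi-modality; the abstract's point about local conditioning would correspond to cropping `scene_pts` to a neighborhood of the current object pose before each prediction.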
Related papers
- Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection [24.00828999360765]
This paper addresses the challenge of robotic grasping of general objects.
The proposed model first proposes a number of likely grasp points in the scene.
Around each grasp point, a module infers whether each voxel in its neighborhood is void or occupied by some object.
The model further estimates 6-DoF grasp poses utilizing the local occupancy-enhanced object shape information.
arXiv Detail & Related papers (2024-07-22T16:22:28Z)
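The grasping summary above describes classifying the voxels around each candidate grasp point as void or occupied. The paper infers occupancy with a learned module from partial observations; as a hedged sketch of just the underlying representation, here is a local occupancy grid computed directly from a point cloud, with illustrative names and grid parameters:

```python
# Sketch: a local voxel occupancy grid around a candidate grasp point.
# The paper infers occupancy with a learned module from partial views;
# this simply voxelizes an observed cloud to show the representation.
import numpy as np

def local_occupancy(points, center, half_extent=0.08, resolution=16):
    """Boolean grid: True where at least one point falls in the voxel."""
    rel = points - center                       # coords relative to grasp point
    mask = np.all(np.abs(rel) < half_extent, axis=1)
    voxel_size = 2 * half_extent / resolution
    idx = ((rel[mask] + half_extent) / voxel_size).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

cloud = np.random.default_rng(0).uniform(-0.2, 0.2, size=(2048, 3))
grid = local_occupancy(cloud, center=np.zeros(3))
print("occupied voxels:", int(grid.sum()))
```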
- GenS: Generalizable Neural Surface Reconstruction from Multi-View Images [20.184657468900852]
GenS is an end-to-end generalizable neural surface reconstruction model.
Our representation is more powerful, recovering high-frequency details while maintaining global smoothness.
Experiments on popular benchmarks show that our model can generalize well to new scenes.
arXiv Detail & Related papers (2024-06-04T17:13:10Z)
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
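The MiDiffusion summary above describes a scene layout as a 2D floor plan plus objects defined by category, location, size, and orientation, i.e., a mix of discrete and continuous variables. A minimal sketch of such a representation, with illustrative field names that are not the paper's actual schema:

```python
# Sketch of the mixed discrete-continuous layout described above:
# the category is discrete; location, size, and orientation are continuous.
# Field names are illustrative, not MiDiffusion's actual schema.
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    category: int                            # discrete label into an object vocabulary
    location: tuple[float, float, float]     # continuous position
    size: tuple[float, float, float]         # continuous bounding-box extents
    orientation: float                       # continuous yaw angle in radians

@dataclass
class SceneLayout:
    floor_plan: list[tuple[float, float]]    # 2D floor-plan polygon vertices
    objects: list[SceneObject] = field(default_factory=list)

layout = SceneLayout(
    floor_plan=[(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    objects=[SceneObject(category=2, location=(1.0, 0.5, 0.0),
                         size=(1.8, 0.9, 0.4), orientation=0.0)],
)
print(len(layout.objects), "object(s) in layout")
```

In a mixed diffusion model over such a structure, the continuous attributes would be corrupted with Gaussian noise and the discrete categories with a categorical corruption process, with both denoised jointly.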
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework that fuses information from RGB images and LiDAR point clouds at points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features via computation-friendly projection and feature computation.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
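Fusing image and LiDAR information at points of interest, as in the PoIFusion summary above, involves projecting each 3D PoI into the image plane and sampling features there. A hedged numpy sketch of that geometric step, with an assumed pinhole intrinsic matrix (the paper's actual sampling and fusion are learned):

```python
# Sketch: project 3D points of interest into an image with a pinhole model
# and gather per-PoI features by nearest-pixel lookup. The intrinsics and
# feature map are assumptions for illustration.
import numpy as np

def project_pois(pois_cam, K):
    """pois_cam: (N, 3) points in camera coordinates; K: 3x3 intrinsics."""
    uv_h = (K @ pois_cam.T).T                  # homogeneous pixel coordinates
    return uv_h[:, :2] / uv_h[:, 2:3]          # perspective divide -> (N, 2)

def gather_features(feature_map, uv):
    """Nearest-pixel lookup into an (H, W, C) image feature map."""
    H, W, _ = feature_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feature_map[v, u]                   # (N, C) features, one per PoI

K = np.array([[500.0, 0.0, 320.0],            # assumed pinhole intrinsics
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pois = np.array([[0.2, -0.1, 4.0], [-0.5, 0.3, 6.0]])  # points in front of camera
img_feats = np.random.default_rng(0).normal(size=(480, 640, 64))
per_poi = gather_features(img_feats, project_pois(pois, K))
print(per_poi.shape)                           # (2, 64)
```

The LiDAR branch would analogously index a bird's-eye-view feature grid at each PoI's (x, y), after which the per-point feature vectors from both modalities can be fused, e.g., by an MLP.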
- Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds [23.923838486208524]
We investigate a variation of the 3D registration problem, named multi-model 3D registration.
In the multi-model registration problem, we are given two point clouds depicting a set of objects at different poses.
We want to simultaneously reconstruct how all objects moved between the two point clouds.
arXiv Detail & Related papers (2024-02-16T18:01:43Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes [62.20046129613934]
We propose a novel multi-view fusion framework, namely the multi-view MRD network (MMRDN).
We project the 2D data from different views into a common hidden space and fit the embeddings with a set of Von-Mises-Fisher distributions.
We select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of these two objects.
arXiv Detail & Related papers (2023-04-25T05:55:29Z)
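Fitting unit-norm embeddings with von Mises-Fisher distributions, as the MMRDN summary above mentions, has a standard closed-form approximation (Banerjee et al.): the mean direction is the normalized vector sum, and the concentration follows from the mean resultant length. A sketch of that fit, independent of MMRDN's specific training objective:

```python
# Sketch: maximum-likelihood fit of a von Mises-Fisher distribution
# (mean direction mu, concentration kappa) to unit-norm embeddings, using
# the standard approximation of Banerjee et al. for kappa. This illustrates
# the distribution family named above, not MMRDN's training objective.
import numpy as np

def fit_vmf(x):
    """x: (n, d) array of unit vectors. Returns (mu, kappa)."""
    n, d = x.shape
    s = x.sum(axis=0)
    r = np.linalg.norm(s)
    mu = s / r                                  # mean direction
    r_bar = r / n                               # mean resultant length in (0, 1)
    kappa = r_bar * (d - r_bar ** 2) / (1 - r_bar ** 2)   # approximate MLE
    return mu, kappa

rng = np.random.default_rng(0)
emb = rng.normal(loc=[1.0, 0.2, 0.0, 0.0], size=(200, 4))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # project onto the unit sphere
mu, kappa = fit_vmf(emb)
print("mu:", np.round(mu, 2), "kappa:", round(float(kappa), 1))
```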
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction with a focus on model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)