EFEM: Equivariant Neural Field Expectation Maximization for 3D Object
Segmentation Without Scene Supervision
- URL: http://arxiv.org/abs/2303.15440v1
- Date: Mon, 27 Mar 2023 17:59:29 GMT
- Title: EFEM: Equivariant Neural Field Expectation Maximization for 3D Object
Segmentation Without Scene Supervision
- Authors: Jiahui Lei and Congyue Deng and Karl Schmeckpeper and Leonidas Guibas
and Kostas Daniilidis
- Abstract summary: We introduce Equivariant Neural Field Expectation Maximization (EFEM) to segment objects in 3D scenes without annotations or training on scenes.
First, we introduce equivariant shape representations to this problem to eliminate the complexity induced by the variation in object configuration.
Second, we propose a novel EM algorithm that can iteratively refine segmentation masks using the equivariant shape prior.
- Score: 35.232051353760035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Equivariant Neural Field Expectation Maximization (EFEM), a
simple, effective, and robust geometric algorithm that can segment objects in
3D scenes without annotations or training on scenes. We achieve such
unsupervised segmentation by exploiting single object shape priors. We make two
novel steps in that direction. First, we introduce equivariant shape
representations to this problem to eliminate the complexity induced by the
variation in object configuration. Second, we propose a novel EM algorithm that
can iteratively refine segmentation masks using the equivariant shape prior. We
collect a novel real dataset Chairs and Mugs that contains various object
configurations and novel scenes in order to verify the effectiveness and
robustness of our method. Experimental results demonstrate that our method
achieves consistent and robust performance across different scenes where the
(weakly) supervised methods may fail. Code and data available at
https://www.cis.upenn.edu/~leijh/projects/efem
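The iterative refinement described in the abstract can be sketched as a short EM-style loop. The snippet below is an illustrative reconstruction, not the authors' implementation: `fit_prior` stands in for the equivariant neural field, and the toy `unit_sphere_prior` re-estimates the object centre from the weighted points as a crude analogue of the pose equivariance the real shape prior provides.

```python
import numpy as np

def em_segment(points, fit_prior, n_iters=30, tau=0.5):
    """Illustrative EM-style mask refinement (not the authors' code).

    points:    (N, 3) scene point cloud
    fit_prior: callable(points, weights) -> (N,) per-point reconstruction
               error under a single-object shape prior; a stand-in for
               the equivariant neural field in the paper
    """
    w = np.full(len(points), 0.5)          # soft foreground mask
    for _ in range(n_iters):
        # M-step: fit the shape prior to the currently weighted points
        err = fit_prior(points, w)
        # E-step: points well explained by the prior get higher weight
        w = np.exp(-err / tau)
        w /= w.max() + 1e-9
    return w > 0.5                         # final hard mask

# Toy prior: a unit sphere whose centre is re-estimated from the
# weighted points (a crude analogue of translation equivariance).
def unit_sphere_prior(points, w):
    centre = (w[:, None] * points).sum(0) / w.sum()
    return np.abs(np.linalg.norm(points - centre, axis=1) - 1.0)
```

On a toy scene containing one unit sphere plus far-away clutter, the loop converges to a mask covering the sphere points while rejecting the clutter.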
Related papers
- ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects [44.38881095466177]
Implicit neural fields have made remarkable progress in reconstructing 3D surfaces from multiple images.
Previous work has attempted to tackle this problem by introducing a framework to train separate signed distance fields.
We introduce our method, ObjectCarver, to tackle the problem of object separation from just click input in a single view.
arXiv Detail & Related papers (2024-07-26T22:13:20Z)
- Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models [0.8749675983608172]
We consider the task of generating segmentation masks for the target object from an object manipulation instruction.
In this study, we propose a novel method that generates segmentation masks from open vocabulary instructions.
arXiv Detail & Related papers (2024-07-01T05:48:48Z)
- Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation [49.56131393810713]
We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
arXiv Detail & Related papers (2023-06-08T22:55:32Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
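The "slow-fast" clustering idea can be illustrated with a toy update rule. Everything here is an illustrative stand-in, not the paper's actual objective: a "fast" set of embeddings is pulled toward cluster centroids whose assignments are computed from a slowly updated, momentum-averaged copy, which keeps the clustering targets stable while the fast embeddings move.

```python
import numpy as np

def slow_fast_step(fast, slow, centroids, lr=0.5, momentum=0.9):
    """One illustrative slow-fast update (not the paper's loss).

    fast:      (N, D) embeddings updated quickly
    slow:      (N, D) momentum-averaged copy providing stable targets
    centroids: (K, D) current cluster centres
    """
    # assign each embedding to a cluster using the *slow* copy
    d = np.linalg.norm(slow[:, None] - centroids[None], axis=2)
    assign = d.argmin(axis=1)
    # "fast" update: pull embeddings toward their assigned centroid
    fast = fast - lr * (fast - centroids[assign])
    # "slow" update: exponential moving average of the fast embeddings
    slow = momentum * slow + (1 - momentum) * fast
    return fast, slow, assign
```

Running a few such steps on two well-separated groups drives each group onto its own centroid without the assignments oscillating.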
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- Image to Sphere: Learning Equivariant Features for Efficient Pose Prediction [3.823356975862006]
Methods that predict a single point estimate handle objects with symmetries poorly and cannot represent uncertainty.
We propose a novel mapping of features from the image domain to the 3D rotation manifold.
We demonstrate the effectiveness of our method at object orientation prediction, and achieve state-of-the-art performance on the popular PASCAL3D+ dataset.
arXiv Detail & Related papers (2023-02-27T16:23:19Z)
- Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP is the first unsupervised approach for tackling 3D scene object segmentation for neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-10-02T11:14:23Z)
- Objects are Different: Flexible Monocular 3D Object Detection [87.82253067302561]
We propose a flexible framework for monocular 3D object detection which explicitly decouples the truncated objects and adaptively combines multiple approaches for object depth estimation.
Experiments demonstrate that our method outperforms the state-of-the-art method by relatively 27% for the moderate level and 30% for the hard level in the test set of KITTI benchmark.
arXiv Detail & Related papers (2021-04-06T07:01:28Z)
- SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks [38.042876641457255]
We propose a depth-conditioned Variational Auto-Encoder (VAE) trained on a dataset of objects stacked under physics simulation.
We formulate instance segmentation as a centre voting task which allows for class-agnostic detection and doesn't require setting the maximum number of objects in the scene.
Our method has practical applications in providing robots some of the ability humans have to make rapid intuitive inferences of partially observed scenes.
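The centre-voting formulation above can be sketched generically: each point votes for its instance centre, and votes are grouped by proximity. The predicted offsets would come from the network in practice, and the greedy grouping below is a simple stand-in for the paper's actual clustering.

```python
import numpy as np

def vote_and_group(points, pred_offsets, radius=0.5):
    """Group points into instances by clustering their voted centres.

    points:       (N, 3) observed points
    pred_offsets: (N, 3) per-point offset votes toward the instance
                  centre (regressed by a network in practice)
    Class-agnostic: no fixed maximum number of instances is required.
    """
    votes = points + pred_offsets
    labels = -np.ones(len(points), dtype=int)
    next_id = 0
    for i in range(len(votes)):
        if labels[i] != -1:
            continue
        # gather all still-unassigned votes near this vote
        near = (np.linalg.norm(votes - votes[i], axis=1) < radius) & (labels == -1)
        labels[near] = next_id
        next_id += 1
    return labels
```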
arXiv Detail & Related papers (2021-03-30T15:42:43Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
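The one-parameter-per-step refinement can be illustrated with a greedy search standing in for the learned RL policy. The real method optimises a learned reward obtained after several steps; here a known score function is assumed purely for illustration.

```python
import numpy as np

def axial_refine(params, score, n_steps=100, delta=0.1):
    """Refine a parameter vector one coordinate at a time (illustrative).

    params: initial guess for the 3D box parameters (e.g. x, y, z, ...)
    score:  callable returning a quality score for a parameter vector;
            a greedy stand-in for the learned policy/reward in the paper
    """
    params = np.asarray(params, dtype=float).copy()
    for _ in range(n_steps):
        best, best_s = params, score(params)
        for j in range(len(params)):        # change only one parameter
            for step in (+delta, -delta):
                cand = params.copy()
                cand[j] += step
                s = score(cand)
                if s > best_s:
                    best, best_s = cand, s
        params = best                       # keep the single best move
    return params
```

With a score such as negative distance to a target box, the search walks each parameter toward the optimum one axial step at a time.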
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.