Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion
- URL: http://arxiv.org/abs/2306.04633v2
- Date: Fri, 1 Dec 2023 19:04:51 GMT
- Title: Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion
- Authors: Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi
- Abstract summary: We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
- Score: 110.84357383258818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instance segmentation in 3D is a challenging task due to the lack of
large-scale annotated datasets. In this paper, we show that this task can be
addressed effectively by instead leveraging pre-trained 2D models for instance
segmentation. We propose a novel approach to lift 2D segments to 3D and fuse
them by means of a neural field representation, which encourages multi-view
consistency across frames. The core of our approach is a slow-fast clustering
objective function, which is scalable and well-suited for scenes with a large
number of objects. Unlike previous approaches, our method does not require an
upper bound on the number of objects or object tracking across frames. To
demonstrate the scalability of the slow-fast clustering, we create a new
semi-realistic dataset called the Messy Rooms dataset, which features scenes
with up to 500 objects per scene. Our approach outperforms the state-of-the-art
on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well
as on our newly created Messy Rooms dataset, demonstrating the effectiveness
and scalability of our slow-fast clustering method.
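To make the slow-fast clustering idea concrete, the sketch below shows one plausible way such an objective could look in PyTorch: a "fast" embedding head is trained against a detached, EMA-updated "slow" copy, pulling together pixels that share a 2D segment label and pushing apart pixels that do not. This is a hedged illustration, not the paper's exact loss; the Gaussian-kernel similarity, the names slow_fast_contrastive_loss and ema_update, and the hyperparameters sigma and momentum are assumptions introduced here.

```python
# Minimal sketch of a slow-fast contrastive clustering objective (illustrative,
# not the exact formulation from the paper). Assumed inputs: per-pixel
# embeddings from a "fast" field and an EMA "slow" field, plus 2D segment IDs.
import torch


def slow_fast_contrastive_loss(fast_emb, slow_emb, mask_ids, sigma=1.0):
    """fast_emb, slow_emb: (N, D) embeddings for N pixels sampled from one frame.
    mask_ids: (N,) integer labels from the 2D instance segmenter."""
    # Gaussian-kernel similarity between fast embeddings and detached slow ones.
    d2 = torch.cdist(fast_emb, slow_emb.detach()) ** 2            # (N, N)
    sim = torch.exp(-d2 / (2 * sigma ** 2)).clamp(1e-6, 1 - 1e-6)
    same = (mask_ids[:, None] == mask_ids[None, :]).float()       # (N, N)
    # Attract pairs from the same 2D segment, repel pairs from different ones.
    return -(same * sim.log() + (1 - same) * (1 - sim).log()).mean()


@torch.no_grad()
def ema_update(slow_net, fast_net, momentum=0.99):
    # "Slow" parameters track an exponential moving average of the "fast" ones.
    for ps, pf in zip(slow_net.parameters(), fast_net.parameters()):
        ps.mul_(momentum).add_(pf, alpha=1 - momentum)


# Shape-only usage example with random tensors standing in for rendered embeddings.
if __name__ == "__main__":
    N, D = 1024, 24
    fast = torch.randn(N, D, requires_grad=True)
    slow = torch.randn(N, D)
    ids = torch.randint(0, 50, (N,))
    slow_fast_contrastive_loss(fast, slow, ids).backward()
```

Because the loss only asks whether two pixels share a 2D segment, it needs no global object IDs, no upper bound on the number of objects, and no tracking across frames; at test time the rendered embeddings can be grouped by an off-the-shelf clustering algorithm (e.g. HDBSCAN) to produce instance masks.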
Related papers
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z)
- SeMoLi: What Moves Together Belongs Together [51.72754014130369]
We tackle semi-supervised object detection based on motion cues.
Recent results suggest that motion-based clustering methods can be used to pseudo-label instances of moving objects.
We re-think this approach and suggest that both object detection and motion-inspired pseudo-labeling can be tackled in a data-driven manner.
arXiv Detail & Related papers (2024-02-29T18:54:53Z)
- Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters.
The network predictions yield a large-scale collection of 3D objects with pseudo-labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds [4.709764624933227]
We propose the first unsupervised method, called OGC, to simultaneously identify multiple 3D objects in a single forward pass.
We extensively evaluate our method on five datasets, demonstrating superior performance on object part instance segmentation.
arXiv Detail & Related papers (2022-10-10T07:01:08Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Rapid Pose Label Generation through Sparse Representation of Unknown Objects [7.32172860877574]
This work presents an approach for rapidly generating real-world, pose-annotated RGB-D data for unknown objects.
We first source minimalistic labelings of an ordered set of arbitrarily chosen keypoints over a set of RGB-D videos.
By solving an optimization problem, we combine these labels under a world frame to recover a sparse, keypoint-based representation of the object.
arXiv Detail & Related papers (2020-11-07T15:14:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.