CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation
- URL: http://arxiv.org/abs/2411.16319v2
- Date: Tue, 26 Nov 2024 07:14:34 GMT
- Title: CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation
- Authors: Leon Sick, Dominik Engel, Sebastian Hartwig, Pedro Hermosilla, Timo Ropinski
- Abstract summary: We propose to cut semantic masks in 3D to obtain the final 2D instances by utilizing a point cloud representation of the scene.
We also propose to augment the training of a class-agnostic detector with three Spatial Confidence components aiming to isolate a clean learning signal.
- Score: 13.871856894814005
- License:
- Abstract: Traditionally, algorithms that learn to segment object instances in 2D images have heavily relied on large amounts of human-annotated data. Only recently, novel approaches have emerged tackling this problem in an unsupervised fashion. Generally, these approaches first generate pseudo-masks and then train a class-agnostic detector. While such methods deliver the current state of the art, they often fail to correctly separate instances overlapping in 2D image space since only semantics are considered. To tackle this issue, we instead propose to cut the semantic masks in 3D to obtain the final 2D instances by utilizing a point cloud representation of the scene. Furthermore, we derive a Spatial Importance function, which we use to resharpen the semantics along the 3D borders of instances. Nevertheless, these pseudo-masks are still subject to mask ambiguity. To address this issue, we further propose to augment the training of a class-agnostic detector with three Spatial Confidence components aiming to isolate a clean learning signal. With these contributions, our approach outperforms competing methods across multiple standard benchmarks for unsupervised instance segmentation and object detection.
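The abstract's core idea is to lift the image into a point cloud and split semantic masks there rather than in 2D pixel space. Below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes a per-pixel depth map is already available (for example from a monocular depth estimator) and hypothetical pinhole intrinsics `fx`, `fy`, `cx`, `cy`. It also assumes a Gaussian affinity on 3D Euclidean distance with a hypothetical bandwidth `sigma`, and it substitutes the standard spectral relaxation of the normalized cut (Shi and Malik) as the cutting step. CutS3D's semantic affinities, Spatial Importance function and Spatial Confidence components are not modeled here.

```python
import numpy as np


def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) point cloud.

    Assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy.
    Points are returned in row-major pixel order.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v: row (image y), u: column (image x)
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


def cut_mask_in_3d(points, sigma=1.0):
    """Bipartition 3D points with a spectral normalized-cut relaxation.

    Affinity is a Gaussian kernel on 3D distance (an assumption of this
    sketch). Returns a boolean array marking one side of the cut; which
    side is True is arbitrary because eigenvector signs are arbitrary.
    """
    diff = points[:, None, :] - points[None, :, :]
    w = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(points)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Map the second eigenvector back to the generalized problem
    # (D - W) y = lambda D y. Its degree-weighted sum is zero, so
    # thresholding at 0 separates the two sides of the cut.
    fiedler = vecs[:, 1] * d_inv_sqrt
    return fiedler > 0
```

In use, one would restrict the cloud to the pixels of a single semantic mask, e.g. `cut_mask_in_3d(unproject_depth(depth, fx, fy, cx, cy)[mask.ravel()])`, so that two objects that overlap in 2D but lie at different depths fall on opposite sides of the cut. The dense affinity matrix is O(N²) in memory, so a practical version would subsample points or use a sparse k-nearest-neighbor graph.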
Related papers
- SA3DIP: Segment Any 3D Instance with Potential 3D Priors [41.907914881608995]
We propose SA3DIP, a novel method for Segmenting Any 3D Instances via exploiting potential 3D Priors.
Specifically, on the one hand, we generate complementary 3D primitives based on both geometric and textural priors.
On the other hand, we introduce supplemental constraints from the 3D space by using a 3D detector to guide a further merging process.
Date: 2024-11-06T10:39:00Z
- MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors [11.118490283303407]
We propose a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D.
Our method produces accurate semantics and geometry in both 3D and 2D space.
Date: 2024-09-21T05:12:13Z
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
Date: 2024-09-12T14:54:31Z
- DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery [46.711276257688326]
NeRFs have become a powerful tool for modeling 3D scenes from multiple images.
Previous approaches to 3D segmentation of NeRFs either require user interaction to isolate a single object, or they rely on 2D semantic masks with a limited number of classes for supervision.
We propose a method that is robust to inconsistent segmentations and successfully decomposes the scene into a set of objects of any class.
Date: 2024-08-19T12:07:24Z
- MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation [11.123421412837336]
Open-vocabulary 3D instance segmentation is cutting-edge for its ability to segment 3D instances without predefined categories.
Recent works first generate 2D open-vocabulary masks through 2D models and then merge them into 3D instances based on metrics calculated between two neighboring frames.
We propose a novel metric, view consensus rate, to enhance the utilization of multi-view observations.
Date: 2024-01-15T14:56:15Z
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
Date: 2023-12-17T09:05:47Z
- UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes [35.38074724231105]
UnScene3D is a fully unsupervised 3D learning approach for class-agnostic 3D instance segmentation of indoor scans.
We operate on a basis of geometric oversegmentation, enabling efficient representation and learning on high-resolution 3D data.
Our approach improves over state-of-the-art unsupervised 3D instance segmentation methods by more than 300% in Average Precision.
Date: 2023-03-25T19:15:16Z
- Mask3D: Mask Transformer for 3D Semantic Instance Segmentation [89.41640045953378]
We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds.
Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales.
Mask3D sets a new state of the art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
Date: 2022-10-06T17:55:09Z
- Unsupervised Object Detection with LiDAR Clues [70.73881791310495]
We present the first practical method for unsupervised object detection with the aid of LiDAR clues.
In our approach, candidate object segments based on 3D point clouds are first generated.
Then, an iterative segment labeling process is conducted to assign segment labels and to train a segment labeling network.
The labeling process is carefully designed to mitigate the issue of long-tailed and open-ended distributions.
Date: 2020-11-25T18:59:54Z
- Semantic Correspondence via 2D-3D-2D Cycle [58.023058561837686]
We propose a new method for predicting semantic correspondences by lifting the problem to the 3D domain.
We show that our method gives comparable and even superior results on standard semantic benchmarks.
Date: 2020-04-20T05:27:45Z