Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
- URL: http://arxiv.org/abs/2210.03105v2
- Date: Wed, 12 Apr 2023 09:22:53 GMT
- Title: Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
- Authors: Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu
Tang, Bastian Leibe
- Abstract summary: We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds.
Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales.
Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP), and ScanNet200 test (+12.4 mAP).
- Score: 89.41640045953378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern 3D semantic instance segmentation approaches predominantly rely on
specialized voting mechanisms followed by carefully designed geometric
clustering techniques. Building on the successes of recent Transformer-based
methods for object detection and image segmentation, we propose the first
Transformer-based approach for 3D semantic instance segmentation. We show that
we can leverage generic Transformer building blocks to directly predict
instance masks from 3D point clouds. In our model called Mask3D each object
instance is represented as an instance query. Using Transformer decoders, the
instance queries are learned by iteratively attending to point cloud features
at multiple scales. Combined with point features, the instance queries directly
yield all instance masks in parallel. Mask3D has several advantages over
current state-of-the-art approaches: it relies neither on (1) voting schemes
which require hand-selected geometric properties (such as centers) nor on (2)
geometric grouping mechanisms requiring manually-tuned hyper-parameters (e.g.
radii), and it (3) enables a loss that directly optimizes instance masks.
Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold
(+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
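To make the decoding mechanism concrete, here is a minimal PyTorch sketch of the query-based idea: learned instance queries attend to point-cloud features through Transformer decoder layers, and a dot product between queries and point features yields all masks in parallel. The shapes, layer counts, and single feature scale are illustrative assumptions, not the authors' implementation (Mask3D attends across multiple scales).

```python
# Minimal sketch of query-based mask prediction; sizes are illustrative.
import torch
import torch.nn as nn

class QueryMaskDecoder(nn.Module):
    def __init__(self, dim=128, num_queries=100, num_classes=20, num_layers=3):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)   # one query per potential instance
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
            for _ in range(num_layers)
        )
        self.cls_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"

    def forward(self, point_feats):                      # point_feats: (B, N, dim)
        q = self.queries.weight.unsqueeze(0).repeat(point_feats.size(0), 1, 1)
        for layer in self.layers:
            q = layer(q, point_feats)                    # queries attend to point features
        # dot product of queries and point features yields all masks in parallel
        mask_logits = torch.einsum("bqd,bnd->bqn", q, point_feats)
        return self.cls_head(q), mask_logits

dec = QueryMaskDecoder()
cls_logits, mask_logits = dec(torch.randn(2, 1024, 128))
print(cls_logits.shape, mask_logits.shape)               # (2, 100, 21) (2, 100, 1024)
```

A bipartite (Hungarian) matching between predicted and ground-truth masks, as in 2D mask transformers, then supplies the direct mask loss mentioned above.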
Related papers
- Efficient 3D Instance Mapping and Localization with Neural Fields [39.73128916618561]
We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images.
We introduce 3DIML, a novel framework that efficiently learns a neural label field which can render 3D instance segmentation masks from novel viewpoints.
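As a rough picture of what a neural label field is, the sketch below maps 3D coordinates to instance-label logits with a plain MLP; the network width, the label count, and the lack of positional encoding are arbitrary assumptions, and 3DIML's actual field and rendering pipeline are more involved.

```python
# A minimal "neural label field": an MLP from 3D coordinates to instance
# logits. All sizes are illustrative assumptions, not 3DIML's architecture.
import torch
import torch.nn as nn

label_field = nn.Sequential(
    nn.Linear(3, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 32),                  # logits over up to 32 instance labels
)

xyz = torch.rand(4096, 3)                # e.g. points sampled along camera rays
instance_ids = label_field(xyz).argmax(dim=-1)   # one label per queried point
print(instance_ids.shape)                # torch.Size([4096])
```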
arXiv Detail & Related papers (2024-03-28T19:25:25Z)
- MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation [11.123421412837336]
Open-vocabulary 3D instance segmentation is a cutting-edge task thanks to its ability to segment 3D instances without predefined categories.
Recent works first generate 2D open-vocabulary masks through 2D models and then merge them into 3D instances based on metrics calculated between two neighboring frames.
We propose a novel metric, view consensus rate, to enhance the utilization of multi-view observations.
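One plausible reading of such a view-consensus score is sketched below: a pair of candidate masks (lifted to 3D point-index sets) is scored by the fraction of observing frames whose 2D masks agree they are one instance. The containment threshold and the agreement test are illustrative assumptions, not the paper's exact definition.

```python
# Illustrative view-consensus score between two candidate 3D masks, assuming
# each 2D mask has already been lifted to a set of 3D point indices.
def view_consensus(mask_a: set, mask_b: set, views: list,
                   contain: float = 0.8) -> float:
    """views[v] is the list of lifted 2D masks (point-index sets) in frame v."""
    observed, agree = 0, 0
    for frame_masks in views:
        visible = set().union(*frame_masks) if frame_masks else set()
        # only count frames that actually observe both candidate masks
        if not (mask_a & visible and mask_b & visible):
            continue
        observed += 1
        # the frame "votes to merge" if one 2D mask covers most of both
        if any(len(m & mask_a) >= contain * len(mask_a) and
               len(m & mask_b) >= contain * len(mask_b) for m in frame_masks):
            agree += 1
    return agree / observed if observed else 0.0
```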
arXiv Detail & Related papers (2024-01-15T14:56:15Z)
- Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance [49.14140194332482]
We introduce Open3DIS, a novel solution designed to tackle the problem of open-vocabulary instance segmentation within 3D scenes.
Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task.
arXiv Detail & Related papers (2023-12-17T10:07:03Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
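The progressive merging can be pictured as greedy agglomeration over primitive affinities; the union-find sketch below is a generic stand-in, since SAI3D derives its merging criterion from 2D mask cues rather than a single scalar threshold.

```python
# Greedy merging of geometric primitives (e.g. superpoints) whenever their
# pairwise affinity exceeds a threshold; a generic stand-in, not SAI3D's code.
def merge_primitives(num_prims: int, affinities: dict,
                     threshold: float = 0.5) -> list:
    parent = list(range(num_prims))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    # visit strongest links first so confident merges happen early
    for (a, b), score in sorted(affinities.items(), key=lambda kv: -kv[1]):
        if score >= threshold:
            parent[find(a)] = find(b)
    return [find(i) for i in range(num_prims)]

labels = merge_primitives(4, {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.7})
print(labels)  # primitives 0,1 share one instance id; 2,3 share another
```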
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- Mask-Attention-Free Transformer for 3D Instance Segmentation [68.29828726317723]
Transformer-based methods have recently dominated 3D instance segmentation, where mask attention is commonly involved.
We develop a series of position-aware designs to overcome the low-recall issue and perform cross-attention by imposing a positional prior.
Experiments show that our approach converges 4x faster than existing work, sets a new state of the art on ScanNetv2 3D instance segmentation benchmark, and also demonstrates superior performance across various datasets.
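A hedged sketch of what cross-attention with a positional prior can look like: the content-based attention logits are biased by the distance between each query's estimated center and the points, softly steering attention without hard mask constraints. The particular bias form (negative scaled distance) is an assumption for illustration, not the paper's exact design.

```python
# Cross-attention whose logits are biased by a distance-based positional prior.
import torch

def positional_cross_attention(q, k, v, query_centers, point_xyz, tau=1.0):
    # q: (Q, d), k/v: (N, d), query_centers: (Q, 3), point_xyz: (N, 3)
    logits = q @ k.t() / q.shape[-1] ** 0.5            # (Q, N) content term
    dist = torch.cdist(query_centers, point_xyz)       # (Q, N) positional prior
    attn = torch.softmax(logits - dist / tau, dim=-1)  # nearby points weighted up
    return attn @ v

out = positional_cross_attention(
    torch.randn(8, 64), torch.randn(500, 64), torch.randn(500, 64),
    torch.rand(8, 3), torch.rand(500, 3))
print(out.shape)  # torch.Size([8, 64])
```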
arXiv Detail & Related papers (2023-09-04T16:09:28Z)
- OpenMask3D: Open-Vocabulary 3D Instance Segmentation [84.58747201179654]
OpenMask3D is a zero-shot approach for open-vocabulary 3D instance segmentation.
Our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings.
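The per-mask aggregation can be sketched as: project each 3D mask into its views, embed the resulting crops with an image encoder, and average. In the snippet below, `embed_crop` and `project_mask_to_view` are hypothetical placeholders standing in for a CLIP image encoder and a camera projection.

```python
# Per-mask multi-view feature aggregation; the two helper functions are
# hypothetical placeholders, not OpenMask3D's actual API.
import numpy as np

def embed_crop(crop: np.ndarray) -> np.ndarray:
    return np.random.randn(512)             # placeholder for CLIP image features

def project_mask_to_view(mask_points: np.ndarray, view: dict) -> np.ndarray:
    return np.zeros((64, 64, 3))            # placeholder for the projected crop

def mask_embedding(mask_points: np.ndarray, views: list) -> np.ndarray:
    feats = [embed_crop(project_mask_to_view(mask_points, v)) for v in views]
    feat = np.mean(feats, axis=0)            # multi-view fusion by averaging
    return feat / np.linalg.norm(feat)       # unit norm for cosine similarity
```

The resulting unit-norm embedding can then be compared against CLIP text embeddings of arbitrary category names, which is what makes the approach open-vocabulary.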
arXiv Detail & Related papers (2023-06-23T17:36:44Z)
- Superpoint Transformer for 3D Scene Instance Segmentation [7.07321040534471]
This paper proposes a novel end-to-end 3D instance segmentation method based on a Superpoint Transformer, named SPFormer.
It groups potential features from point clouds into superpoints, and directly predicts instances through query vectors.
It exceeds state-of-the-art methods by 4.3% mAP on the ScanNetv2 hidden test set while maintaining fast inference (247 ms per frame).
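The superpoint grouping step amounts to pooling per-point features within each superpoint, so queries attend over far fewer tokens; a minimal mean-pooling sketch with assumed shapes follows.

```python
# Mean-pool per-point features within each superpoint: N points shrink to
# S superpoint tokens. Shapes are illustrative assumptions.
import torch

def pool_superpoints(point_feats, superpoint_ids, num_superpoints):
    # point_feats: (N, d); superpoint_ids: (N,) with values in [0, S)
    d = point_feats.shape[1]
    sums = torch.zeros(num_superpoints, d).index_add_(0, superpoint_ids, point_feats)
    counts = torch.zeros(num_superpoints).index_add_(
        0, superpoint_ids, torch.ones(len(superpoint_ids)))
    return sums / counts.clamp(min=1).unsqueeze(1)    # (S, d) mean per superpoint

sp_feats = pool_superpoints(torch.randn(1000, 32),
                            torch.randint(0, 50, (1000,)), 50)
print(sp_feats.shape)  # torch.Size([50, 32])
```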
arXiv Detail & Related papers (2022-11-28T20:52:53Z)
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance imposes a heavy computational burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
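The decomposition can be illustrated with the general dynamic-convolution pattern: a small per-instance weight vector, predicted from the instance's PoI feature, is applied to a shared feature map, so no high-dimensional mask feature is stored per instance. PointINS's exact modules differ; this only conveys the shape of the idea.

```python
# Dynamic, instance-aware 1x1 convolution: per-instance kernels applied to a
# shared feature map. An illustrative pattern, not PointINS's exact modules.
import torch

shared = torch.randn(64, 56, 56)       # shared feature map (C, H, W)
inst_weights = torch.randn(10, 64)     # one predicted 1x1 kernel per instance

# a 1x1 dynamic conv is just a per-instance weighted sum over channels
masks = torch.einsum("qc,chw->qhw", inst_weights, shared).sigmoid()
print(masks.shape)  # torch.Size([10, 56, 56]) - one soft mask per instance
```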
arXiv Detail & Related papers (2020-03-13T08:24:58Z)
- SDOD: Real-time Segmenting and Detecting 3D Object by Depth [5.97602869680438]
This paper proposes a real-time framework that segments and detects 3D objects from depth.
We discretize the objects' depth into depth categories and transform the instance segmentation task into a pixel-level classification task.
Experiments on the challenging KITTI dataset show that our approach outperforms LklNet by about 1.8x in segmentation and 3D detection speed.
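The depth-discretization step is simple to sketch: quantize a continuous depth map into K bins so that each pixel gets an integer depth class. The linear bin layout and 80 m range below are assumptions for illustration.

```python
# Quantize a continuous depth map into K categories, turning depth estimation
# into pixel-wise classification. Bin layout is an illustrative assumption.
import numpy as np

def depth_to_categories(depth: np.ndarray, num_bins: int = 32,
                        max_depth: float = 80.0) -> np.ndarray:
    edges = np.linspace(0.0, max_depth, num_bins + 1)[1:-1]  # K-1 inner edges
    return np.digitize(depth, edges)     # integer class in [0, K-1] per pixel

labels = depth_to_categories(np.random.uniform(0, 80, (375, 1242)))
print(labels.min(), labels.max())        # classes fall in 0 .. 31
```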
arXiv Detail & Related papers (2020-01-26T09:06:18Z)