Mask-Attention-Free Transformer for 3D Instance Segmentation
- URL: http://arxiv.org/abs/2309.01692v1
- Date: Mon, 4 Sep 2023 16:09:28 GMT
- Title: Mask-Attention-Free Transformer for 3D Instance Segmentation
- Authors: Xin Lai, Yuhui Yuan, Ruihang Chu, Yukang Chen, Han Hu, Jiaya Jia
- Abstract summary: Transformer-based methods have dominated 3D instance segmentation, where mask attention is commonly involved.
We develop a series of position-aware designs to overcome the low-recall issue and perform cross-attention by imposing positional prior.
Experiments show that our approach converges 4x faster than existing work, sets a new state of the art on ScanNetv2 3D instance segmentation benchmark, and also demonstrates superior performance across various datasets.
- Score: 68.29828726317723
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, transformer-based methods have dominated 3D instance segmentation,
where mask attention is commonly involved. Specifically, object queries are
guided by the initial instance masks in the first cross-attention, and then
iteratively refine themselves in a similar manner. However, we observe that the
mask-attention pipeline usually leads to slow convergence due to low-recall
initial instance masks. Therefore, we abandon the mask attention design and
resort to an auxiliary center regression task instead. Through center
regression, we effectively overcome the low-recall issue and perform
cross-attention by imposing positional prior. To reach this goal, we develop a
series of position-aware designs. First, we learn a spatial distribution of 3D
locations as the initial position queries. They spread over the 3D space
densely, and thus can easily capture the objects in a scene with a high recall.
Moreover, we present relative position encoding for the cross-attention and
iterative refinement for more accurate position queries. Experiments show that
our approach converges 4x faster than existing work, sets a new state of the
art on ScanNetv2 3D instance segmentation benchmark, and also demonstrates
superior performance across various datasets. Code and models are available at
https://github.com/dvlab-research/Mask-Attention-Free-Transformer.
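As a rough illustration of the abstract's idea, the sketch below biases cross-attention toward points near each query's 3D position instead of masking attention with predicted instance masks, and then refines the query positions. This is a minimal assumption-laden sketch, not the paper's implementation: the simple distance-based bias stands in for the paper's learned relative position encoding, and all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_aware_cross_attention(q_feat, q_pos, k_feat, k_pos, beta=1.0):
    """One cross-attention step with a positional prior.

    Each query carries a 3D position, and points closer to that
    position get larger attention weights (a distance-penalty proxy
    for the paper's relative position encoding; `beta` is an assumed
    scaling hyperparameter).
    """
    d = q_feat.shape[-1]
    logits = q_feat @ k_feat.T / np.sqrt(d)           # (Q, N) content term
    # Pairwise query-to-point distances, (Q, N).
    dist = np.linalg.norm(q_pos[:, None, :] - k_pos[None, :, :], axis=-1)
    logits = logits - beta * dist                     # positional prior
    attn = softmax(logits, axis=-1)
    return attn @ k_feat, attn                        # updated features, weights

rng = np.random.default_rng(0)
Q, N, C = 4, 64, 16                                   # queries, points, channels
q_feat = rng.normal(size=(Q, C)); q_pos = rng.uniform(0, 5, size=(Q, 3))
k_feat = rng.normal(size=(N, C)); k_pos = rng.uniform(0, 5, size=(N, 3))
out, attn = position_aware_cross_attention(q_feat, q_pos, k_feat, k_pos)
# Iterative refinement: move each query toward the attention-weighted
# centroid of the points it attends to (a crude proxy for center regression).
q_pos_refined = attn @ k_pos
```

In the paper, the initial position queries are a learned spatial distribution that densely covers the scene, which is what gives the high initial recall; the dense random initialization above only mimics that coverage.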
Related papers
- Efficient 3D Instance Mapping and Localization with Neural Fields [39.73128916618561]
3DIML is a novel framework that efficiently learns a label field to produce view-consistent instance segmentation masks.
We evaluate 3DIML on sequences from the Replica and ScanNet datasets.
arXiv Detail & Related papers (2024-03-28T19:25:25Z) - AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans [41.17467024268349]
Making sense of 3D environments requires fine-grained scene understanding.
We propose to predict instance segmentations for 3D scenes in an unsupervised way.
Our approach attains 13.3% higher Average Precision and 9.1% higher F1 score compared to the best-performing baseline.
arXiv Detail & Related papers (2024-03-24T22:53:16Z) - SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z) - UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes [35.38074724231105]
UnScene3D is a fully unsupervised 3D learning approach for class-agnostic 3D instance segmentation of indoor scans.
We operate on a basis of geometric oversegmentation, enabling efficient representation and learning on high-resolution 3D data.
Our approach improves over state-of-the-art unsupervised 3D instance segmentation methods by more than 300% in Average Precision.
arXiv Detail & Related papers (2023-03-25T19:15:16Z) - Position-Guided Point Cloud Panoptic Segmentation Transformer [118.17651196656178]
This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
We observe that instances in sparse point clouds are small relative to the whole scene and often share similar geometry while lacking distinctive appearance for segmentation, which is rare in the image domain.
The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% on the SemanticKITTI and nuScenes benchmarks, respectively.
arXiv Detail & Related papers (2023-03-23T17:59:02Z) - Mask3D: Mask Transformer for 3D Semantic Instance Segmentation [89.41640045953378]
We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds.
Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales.
Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
arXiv Detail & Related papers (2022-10-06T17:55:09Z) - Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - SDOD:Real-time Segmenting and Detecting 3D Object by Depth [5.97602869680438]
This paper proposes a real-time framework that segments and detects 3D objects by depth.
We discretize the objects' depth into depth categories and transform the instance segmentation task into a pixel-level classification task.
Experiments on the challenging KITTI dataset show that our approach outperforms LklNet by about 1.8x in segmentation and 3D detection speed.
arXiv Detail & Related papers (2020-01-26T09:06:18Z)
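The depth-discretization idea in the SDOD entry above can be sketched in a few lines: per-pixel depth is binned into categories so that segmentation becomes pixel-level classification. The bin edges below are hypothetical, chosen only for illustration; the paper's actual discretization scheme may differ.

```python
import numpy as np

def discretize_depth(depth, bin_edges):
    """Map per-pixel depth values to category indices.

    Turns a continuous depth map into discrete depth classes, so the
    instance segmentation task can be cast as pixel-level
    classification (the idea SDOD describes). Returns indices in
    0..len(bin_edges), one per pixel.
    """
    return np.digitize(depth, bin_edges)

bin_edges = np.array([5.0, 10.0, 20.0, 40.0])  # meters, assumed edges
depth = np.array([[2.0, 7.5],
                  [15.0, 55.0]])               # toy 2x2 depth map
labels = discretize_depth(depth, bin_edges)
# labels == [[0, 1], [2, 4]]
```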
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.