Mask-Attention-Free Transformer for 3D Instance Segmentation
- URL: http://arxiv.org/abs/2309.01692v1
- Date: Mon, 4 Sep 2023 16:09:28 GMT
- Title: Mask-Attention-Free Transformer for 3D Instance Segmentation
- Authors: Xin Lai, Yuhui Yuan, Ruihang Chu, Yukang Chen, Han Hu, Jiaya Jia
- Abstract summary: Transformer-based methods have dominated 3D instance segmentation, where mask attention is commonly involved.
We develop a series of position-aware designs that overcome the low-recall issue and perform cross-attention by imposing a positional prior.
Experiments show that our approach converges 4x faster than existing work, sets a new state of the art on the ScanNetv2 3D instance segmentation benchmark, and also demonstrates superior performance across various datasets.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, transformer-based methods have dominated 3D instance segmentation,
where mask attention is commonly involved. Specifically, object queries are
guided by the initial instance masks in the first cross-attention, and then
iteratively refine themselves in a similar manner. However, we observe that the
mask-attention pipeline usually leads to slow convergence due to low-recall
initial instance masks. Therefore, we abandon the mask attention design and
resort to an auxiliary center regression task instead. Through center
regression, we effectively overcome the low-recall issue and perform
cross-attention by imposing a positional prior. To reach this goal, we develop a
series of position-aware designs. First, we learn a spatial distribution of 3D
locations as the initial position queries. They spread densely over the 3D
space and thus can easily capture the objects in a scene with high recall.
Moreover, we present relative position encoding for the cross-attention and
iterative refinement for more accurate position queries. Experiments show that
our approach converges 4x faster than existing work, sets a new state of the
art on the ScanNetv2 3D instance segmentation benchmark, and also demonstrates
superior performance across various datasets. Code and models are available at
https://github.com/dvlab-research/Mask-Attention-Free-Transformer.
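A minimal sketch of what such a decoder layer could look like in PyTorch. This illustrates the described design, not the released implementation; the module layout, bin count, and spatial range are assumptions:

    import torch
    import torch.nn as nn

    class PositionAwareDecoderLayer(nn.Module):
        # Sketch: plain cross-attention with an additive relative-position
        # bias in place of mask attention, plus a center-regression head
        # that iteratively refines the position queries.
        def __init__(self, dim=256, heads=8, num_bins=24, spatial_range=8.0):
            super().__init__()
            self.heads, self.num_bins, self.range = heads, num_bins, spatial_range
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # one learned scalar bias per (axis, bin, head); axes are summed
            self.rel_bias = nn.Parameter(torch.zeros(3, num_bins, heads))
            self.center_head = nn.Linear(dim, 3)   # predicts a position update
            self.norm = nn.LayerNorm(dim)

        def _bias(self, q_pos, k_pos):
            # q_pos: (B, Q, 3), k_pos: (B, N, 3) -> bias of shape (B*heads, Q, N)
            rel = q_pos[:, :, None, :] - k_pos[:, None, :, :]      # (B, Q, N, 3)
            idx = ((rel / (2 * self.range) + 0.5) * self.num_bins).long()
            idx = idx.clamp(0, self.num_bins - 1)
            bias = sum(self.rel_bias[a][idx[..., a]] for a in range(3))
            return bias.permute(0, 3, 1, 2).flatten(0, 1)

        def forward(self, query, q_pos, feats, f_pos):
            # query: (B, Q, C) content queries; q_pos: (B, Q, 3) position queries
            # feats: (B, N, C) point features; f_pos: (B, N, 3) point coordinates
            out, _ = self.attn(query, feats, feats,
                               attn_mask=self._bias(q_pos, f_pos))
            query = self.norm(query + out)
            q_pos = q_pos + self.center_head(query)   # iterative refinement
            return query, q_pos

Here q_pos would be initialized from a learned set of 3D locations spread densely over the scene (the "spatial distribution" above), and the center_head output would be supervised by the auxiliary center-regression loss.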
Related papers
- MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation
MSTA3D is a novel framework for superpoint-based 3D instance segmentation.
It exploits multi-scale feature representations and introduces a twin-attention mechanism to capture them effectively.
Our approach surpasses state-of-the-art 3D instance segmentation methods.
arXiv Detail & Related papers (2024-11-04T04:14:39Z)
- Efficient 3D Instance Mapping and Localization with Neural Fields
We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images.
We introduce 3DIML, a novel framework that efficiently learns a neural label field which can render 3D instance segmentation masks from novel viewpoints.
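At its core, a neural label field maps a 3D coordinate to instance logits; a generic sketch of that idea (not 3DIML's actual architecture; layer sizes and label count are assumptions):

    import torch
    import torch.nn as nn

    class LabelField(nn.Module):
        # Generic sketch: an MLP from world coordinates to logits over K
        # candidate instance labels.
        def __init__(self, num_instances=64, hidden=128):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_instances),
            )

        def forward(self, xyz):          # xyz: (N, 3) 3D points
            return self.mlp(xyz)         # (N, K) instance logits

    # Rendering an instance mask from a novel viewpoint then amounts to
    # querying the field at the surface points hit by each camera ray.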
arXiv Detail & Related papers (2024-03-28T19:25:25Z)
- AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans
Making sense of 3D environments requires fine-grained scene understanding.
We propose to predict instance segmentations for 3D scenes in an unsupervised way.
Our approach attains 13.3% higher Average Precision and 9.1% higher F1 score compared to the best-performing baseline.
arXiv Detail & Related papers (2024-03-24T22:53:16Z)
- SAI3D: Segment Any Instance in 3D Scenes
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
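A hedged sketch of the partition-then-merge idea: greedily union primitives whose pairwise affinity exceeds a threshold, highest affinity first. SAI3D derives its actual affinities from 2D segmentation models; the affinity list and threshold here are placeholders:

    def merge_primitives(num_prims, affinities, threshold=0.5):
        # affinities: iterable of (i, j, score) over adjacent primitives
        parent = list(range(num_prims))

        def find(x):                       # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        # progressive merging: strongest links first, stop below threshold
        for i, j, score in sorted(affinities, key=lambda t: -t[2]):
            if score < threshold:
                break
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
        return [find(i) for i in range(num_prims)]   # instance id per primitive

    labels = merge_primitives(4, [(0, 1, 0.9), (1, 2, 0.6), (2, 3, 0.2)])
    # primitives 0-2 collapse into one instance; primitive 3 stays separate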
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- Position-Guided Point Cloud Panoptic Segmentation Transformer
This work begins by applying the query-based mask prediction paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
We observe that instances in sparse point clouds are small relative to the whole scene and often share similar geometry while lacking distinctive appearance for segmentation, challenges that are rare in the image domain.
The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% on the SemanticKITTI and nuScenes benchmarks, respectively.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds.
Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales.
Mask3D sets a new state of the art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
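In this family of mask-transformer methods, masks are typically read out as the similarity between query embeddings and per-point features. A minimal sketch of that readout (the exact heads and normalization in Mask3D may differ):

    import torch

    def masks_from_queries(queries, point_feats, threshold=0.5):
        # queries: (Q, C) instance queries after the decoder
        # point_feats: (N, C) per-point features from the backbone
        logits = queries @ point_feats.T           # (Q, N) mask logits
        return torch.sigmoid(logits) > threshold   # (Q, N) boolean masks

    masks = masks_from_queries(torch.randn(20, 256), torch.randn(50000, 256))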
arXiv Detail & Related papers (2022-10-06T17:55:09Z)
- Stratified Transformer for 3D Point Cloud Segmentation
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
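As a stand-in for the first-layer point embedding mentioned above (the paper's actual embedding is convolution-based; kNN max-pooling below is only a generic way to aggregate local information):

    import torch

    def local_point_embedding(xyz, feats, k=16):
        # Concatenate each point's feature with the max-pooled features of
        # its k nearest neighbors (the point itself is among them).
        dists = torch.cdist(xyz, xyz)                 # (N, N) pairwise distances
        idx = dists.topk(k, largest=False).indices    # (N, k) neighbor indices
        neigh = feats[idx]                            # (N, k, C)
        pooled = neigh.max(dim=1).values              # (N, C) local summary
        return torch.cat([feats, pooled], dim=-1)     # (N, 2C) embedding

    emb = local_point_embedding(torch.randn(1024, 3), torch.randn(1024, 32))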
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- SDOD: Real-time Segmenting and Detecting 3D Object by Depth
This paper proposes a real-time framework that segments and detects 3D objects by depth.
We discretize the objects' depth into depth categories and transform the instance segmentation task into a pixel-level classification task.
Experiments on the challenging KITTI dataset show that our approach runs about 1.8x faster than LklNet on segmentation and 3D detection.
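A sketch of the depth-discretization step that turns the task into pixel-wise classification (the bin layout and depth range below are assumptions, not SDOD's published configuration):

    import numpy as np

    def depth_to_categories(depth, num_bins=32, d_min=1.0, d_max=80.0):
        # Map continuous per-pixel depth to one of num_bins categories
        # (log-spaced bins as one plausible choice).
        edges = np.geomspace(d_min, d_max, num_bins + 1)
        labels = np.digitize(depth, edges) - 1        # bin index per pixel
        return np.clip(labels, 0, num_bins - 1)

    depth_map = np.random.uniform(1.0, 80.0, size=(375, 1242))  # toy KITTI-sized map
    cats = depth_to_categories(depth_map)             # classification targets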
arXiv Detail & Related papers (2020-01-26T09:06:18Z)