Superpoint Transformer for 3D Scene Instance Segmentation
- URL: http://arxiv.org/abs/2211.15766v1
- Date: Mon, 28 Nov 2022 20:52:53 GMT
- Title: Superpoint Transformer for 3D Scene Instance Segmentation
- Authors: Jiahao Sun, Chunmei Qing, Junpeng Tan, Xiangmin Xu
- Abstract summary: This paper proposes a novel end-to-end 3D instance segmentation method based on a Superpoint Transformer, named SPFormer.
It groups potential features from point clouds into superpoints and directly predicts instances through query vectors.
It exceeds the state-of-the-art methods by 4.3% mAP on the ScanNetv2 hidden test set while maintaining a fast inference speed (247 ms per frame).
- Score: 7.07321040534471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing methods realize 3D instance segmentation by extending those
models used for 3D object detection or 3D semantic segmentation. However, these
non-straightforward methods suffer from two drawbacks: 1) Imprecise bounding
boxes or unsatisfactory semantic predictions limit the performance of the
overall 3D instance segmentation framework. 2) Existing methods require a
time-consuming intermediate aggregation step. To address these issues, this
paper proposes a novel end-to-end 3D instance segmentation method based on
Superpoint Transformer, named SPFormer. It groups potential features from
point clouds into superpoints, and directly predicts instances through query
vectors without relying on the results of object detection or semantic
segmentation. The key step in this framework is a novel query decoder with
transformers that can capture the instance information through the superpoint
cross-attention mechanism and generate the superpoint masks of the instances.
Through bipartite matching based on superpoint masks, SPFormer can implement
the network training without the intermediate aggregation step, which
accelerates the network. Extensive experiments on ScanNetv2 and S3DIS
benchmarks verify that our method is concise yet efficient. Notably, SPFormer
exceeds the state-of-the-art methods by 4.3% mAP on the ScanNetv2 hidden test
set while maintaining a fast inference speed (247 ms per frame). Code is
available at https://github.com/sunjiahao1999/SPFormer.
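The bipartite matching between predicted superpoint masks and ground-truth instances described in the abstract can be sketched in a few lines. This is a hedged toy illustration, not the authors' implementation: the cost here is plain mask IoU over binary superpoint masks, the assignment is found by brute force, and the mask values and helper names (`mask_iou`, `match`) are invented for this example. In practice such matching is typically solved with the Hungarian algorithm on a combined classification-and-mask cost.

```python
from itertools import permutations

def mask_iou(a, b):
    # a, b: binary superpoint masks of equal length
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0

def match(pred_masks, gt_masks):
    """Brute-force bipartite matching: assign each ground-truth
    instance the predicted mask that maximizes the total IoU."""
    best, best_total = None, -1.0
    for perm in permutations(range(len(pred_masks)), len(gt_masks)):
        total = sum(mask_iou(pred_masks[p], gt)
                    for p, gt in zip(perm, gt_masks))
        if total > best_total:
            best, best_total = perm, total
    return best  # best[i] = index of the prediction matched to gt i

# Toy scene: 5 superpoints, 3 query predictions, 2 ground-truth instances
preds = [[1, 1, 0, 0, 0],
         [0, 0, 0, 1, 1],
         [0, 1, 1, 0, 0]]
gts   = [[0, 1, 1, 0, 0],
         [0, 0, 0, 1, 1]]
print(match(preds, gts))  # → (2, 1): gt 0 pairs with query 2, gt 1 with query 1
```

Because each ground-truth instance is paired with exactly one query, the unmatched queries can be supervised as "no object", which is what removes the need for an intermediate aggregation step.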
Related papers
- MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation [7.400926717561454]
MSTA3D is a novel framework for superpoint-based 3D instance segmentation.
It exploits multi-scale feature representations and introduces a twin-attention mechanism to capture them effectively.
Our approach surpasses state-of-the-art 3D instance segmentation methods.
arXiv Detail & Related papers (2024-11-04T04:14:39Z) - SegPoint: Segment Any Point Cloud via Large Language Model [62.69797122055389]
We propose a model, called SegPoint, to produce point-wise segmentation masks across a diverse range of tasks.
SegPoint is the first model to address varied segmentation tasks within a single framework.
arXiv Detail & Related papers (2024-07-18T17:58:03Z) - SPGroup3D: Superpoint Grouping Network for Indoor 3D Object Detection [23.208654655032955]
Current 3D object detection methods for indoor scenes mainly follow the voting-and-grouping strategy to generate proposals.
We propose a novel superpoint grouping network for indoor anchor-free one-stage 3D object detection.
Experimental results demonstrate our method achieves state-of-the-art performance on ScanNet V2, SUN RGB-D, and S3DIS datasets.
arXiv Detail & Related papers (2023-12-21T08:08:02Z) - SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z) - PSGformer: Enhancing 3D Point Cloud Instance Segmentation via Precise
Semantic Guidance [11.097083846498581]
PSGformer is a novel 3D instance segmentation network.
It incorporates two key advancements to enhance the performance of 3D instance segmentation.
It exceeds the state-of-the-art methods by 2.2% mAP on the ScanNetv2 hidden test set.
arXiv Detail & Related papers (2023-07-15T04:45:37Z) - ISBNet: a 3D Point Cloud Instance Segmentation Network with
Instance-aware Sampling and Box-aware Dynamic Convolution [14.88505076974645]
ISBNet is a novel method that represents instances as kernels and decodes instance masks via dynamic convolution.
We set new state-of-the-art results on ScanNetV2 (55.9), S3DIS (60.8), and STPLS3D (49.2) in terms of AP, while retaining a fast inference time (237 ms per scene on ScanNetV2).
arXiv Detail & Related papers (2023-03-01T06:06:28Z) - 3D-QueryIS: A Query-based Framework for 3D Instance Segmentation [74.6998931386331]
Previous methods for 3D instance segmentation often suffer from inter-task dependencies and tend to lack robustness.
We propose a novel query-based method, termed as 3D-QueryIS, which is detector-free, semantic segmentation-free, and cluster-free.
Our 3D-QueryIS is free from the accumulated errors caused by the inter-task dependencies.
arXiv Detail & Related papers (2022-11-17T07:04:53Z) - Mask3D: Mask Transformer for 3D Semantic Instance Segmentation [89.41640045953378]
We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds.
Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales.
Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP), and ScanNet200 test (+12.4 mAP).
arXiv Detail & Related papers (2022-10-06T17:55:09Z) - PointInst3D: Segmenting 3D Instances by Points [136.7261709896713]
We propose a fully-convolutional 3D point cloud instance segmentation method that works in a per-point prediction fashion.
We find the key to its success is assigning a suitable target to each sampled point.
Our approach achieves promising results on both ScanNet and S3DIS benchmarks.
arXiv Detail & Related papers (2022-04-25T02:41:46Z) - Instance Segmentation in 3D Scenes using Semantic Superpoint Tree
Networks [64.27814530457042]
We propose an end-to-end solution of Semantic Superpoint Tree Network (SSTNet) for proposing object instances from scene points.
Key in SSTNet is an intermediate, semantic superpoint tree (SST), which is constructed based on the learned semantic features of superpoints.
SSTNet ranks top on the ScanNet (V2) leaderboard, with mAP 2% higher than the second-best method.
arXiv Detail & Related papers (2021-08-17T07:25:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.