OneFormer3D: One Transformer for Unified Point Cloud Segmentation
- URL: http://arxiv.org/abs/2311.14405v1
- Date: Fri, 24 Nov 2023 10:56:27 GMT
- Title: OneFormer3D: One Transformer for Unified Point Cloud Segmentation
- Authors: Maxim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich
- Abstract summary: This paper presents a unified, simple, and effective model addressing semantic, instance, and panoptic segmentation tasks jointly.
The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels.
We also demonstrate the state-of-the-art results in semantic, instance, and panoptic segmentation of ScanNet, ScanNet200, and S3DIS datasets.
- Score: 5.530212768657545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic, instance, and panoptic segmentation of 3D point clouds have been
addressed using task-specific models of distinct design. Thereby, the
similarity of all segmentation tasks and the implicit relationship between them
have not been utilized effectively. This paper presents a unified, simple, and
effective model addressing all these tasks jointly. The model, named
OneFormer3D, performs instance and semantic segmentation consistently, using a
group of learnable kernels, where each kernel is responsible for generating a
mask for either an instance or a semantic category. These kernels are trained
with a transformer-based decoder with unified instance and semantic queries
passed as an input. Such a design enables training a model end-to-end in a
single run, so that it achieves top performance on all three segmentation tasks
simultaneously. Specifically, our OneFormer3D ranks 1st and sets a new
state-of-the-art (+2.1 mAP50) in the ScanNet test leaderboard. We also
demonstrate the state-of-the-art results in semantic, instance, and panoptic
segmentation of ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8
mIoU) datasets.
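The core mechanism the abstract describes, learnable kernels where each kernel generates a mask for either an instance or a semantic category, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the sizes, the random placeholder features, and the `predict_masks` helper are all assumptions made for illustration; in OneFormer3D the queries would be refined by a transformer decoder before producing masks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not the paper's actual configuration.
num_points = 1000         # points in the scene
feat_dim = 32             # per-point feature dimension
num_instance_queries = 5  # one query per predicted instance
num_semantic_queries = 3  # one query per semantic category

# Per-point features, e.g. produced by a 3D backbone (random here).
point_feats = rng.normal(size=(num_points, feat_dim))

# Unified query set: instance and semantic queries are concatenated and
# would pass through the same transformer decoder; random placeholders here.
queries = rng.normal(size=(num_instance_queries + num_semantic_queries, feat_dim))

def predict_masks(queries, point_feats):
    """Each query acts as a kernel: its dot product with every point's
    feature gives a per-point mask logit; a sigmoid gives probabilities."""
    logits = queries @ point_feats.T          # (num_queries, num_points)
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid

masks = predict_masks(queries, point_feats)
instance_masks = masks[:num_instance_queries]  # one soft mask per instance
semantic_masks = masks[num_instance_queries:]  # one soft mask per category
print(instance_masks.shape, semantic_masks.shape)  # (5, 1000) (3, 1000)
```

Because instance and semantic masks come from the same kernel mechanism, one end-to-end training run can serve all three segmentation tasks (panoptic output combines the two mask sets).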
Related papers
- Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation [50.51125319374404]
We propose a novel self-training network InsTeacher3D to explore and exploit pure instance knowledge from unlabeled data.
Experimental results on multiple large-scale datasets show that the InsTeacher3D significantly outperforms prior state-of-the-art semi-supervised approaches.
arXiv Detail & Related papers (2024-06-24T16:35:58Z)
- S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery [23.965291952048872]
This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which innovatively combines semantic segmentation and stereo matching.
Our method identifies and leverages the intrinsic link between these two tasks, leading to a more accurate understanding of semantic information and disparity estimation.
Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and 1.439 to 1.403 respectively.
arXiv Detail & Related papers (2024-01-03T09:37:33Z)
- Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding [40.68012530554327]
We introduce a pretrained 3D backbone, called SST, for 3D indoor scene understanding.
We design a 3D Swin transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity.
A series of extensive ablation studies further validate the scalability, generality, and superior performance enabled by our approach.
arXiv Detail & Related papers (2023-04-14T02:49:08Z)
- You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding [107.06117227661204]
We propose "One Thing One Click," meaning that the annotator only needs to label one point per object.
We iteratively conduct the training and label propagation, facilitated by a graph propagation module.
Our model is also compatible with 3D instance segmentation when equipped with a point-clustering strategy.
arXiv Detail & Related papers (2023-03-26T13:57:00Z)
- ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution [14.88505076974645]
ISBNet is a novel method that represents instances as kernels and decodes instance masks via dynamic convolution.
We set new state-of-the-art results on ScanNetV2 (55.9), S3DIS (60.8), and STPLS3D (49.2) in terms of AP, while retaining fast inference time (237ms per scene on ScanNetV2).
arXiv Detail & Related papers (2023-03-01T06:06:28Z)
- Superpoint Transformer for 3D Scene Instance Segmentation [7.07321040534471]
This paper proposes a novel end-to-end 3D instance segmentation method based on a Superpoint Transformer, named SPFormer.
It groups potential features from point clouds into superpoints, and directly predicts instances through query vectors.
It exceeds state-of-the-art methods by 4.3% in mAP on the ScanNetv2 hidden test set while keeping fast inference speed (247ms per frame).
arXiv Detail & Related papers (2022-11-28T20:52:53Z)
- Unsupervised Representation Learning for 3D Point Cloud Data [66.92077180228634]
We propose a simple yet effective approach for unsupervised point cloud learning.
In particular, we identify a very useful transformation which generates a good contrastive version of an original point cloud.
We conduct experiments on three downstream tasks which are 3D object classification, shape part segmentation and scene segmentation.
arXiv Detail & Related papers (2021-10-13T10:52:45Z)
- K-Net: Towards Unified Image Segmentation [78.32096542571257]
The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels.
K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free.
arXiv Detail & Related papers (2021-06-28T17:18:21Z)
- One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation [78.36781565047656]
We propose "One Thing One Click," meaning that the annotator only needs to label one point per object.
We iteratively conduct the training and label propagation, facilitated by a graph propagation module.
Our results are also comparable to those of the fully supervised counterparts.
arXiv Detail & Related papers (2021-04-06T02:27:25Z)
- SALA: Soft Assignment Local Aggregation for Parameter Efficient 3D Semantic Segmentation [65.96170587706148]
We focus on designing a point local aggregation function that yields parameter efficient networks for 3D point cloud semantic segmentation.
We explore the idea of using learnable neighbor-to-grid soft assignment in grid-based aggregation functions.
arXiv Detail & Related papers (2020-12-29T20:16:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.