LidarMultiNet: Unifying LiDAR Semantic Segmentation, 3D Object
Detection, and Panoptic Segmentation in a Single Multi-task Network
- URL: http://arxiv.org/abs/2206.11428v2
- Date: Fri, 24 Jun 2022 00:38:40 GMT
- Title: LidarMultiNet: Unifying LiDAR Semantic Segmentation, 3D Object
Detection, and Panoptic Segmentation in a Single Multi-task Network
- Authors: Dongqiangzi Ye, Weijia Chen, Zixiang Zhou, Yufei Xie, Yu Wang, Panqu
Wang and Hassan Foroosh
- Abstract summary: LidarMultiNet is a strong 3D voxel-based encoder-decoder network with a novel Global Context Pooling module.
Our solution achieves a mIoU of 71.13, ranking first on most of the 22 classes of the 3D semantic segmentation test set.
- Score: 15.785527155108966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This technical report presents the 1st place winning solution for the Waymo
Open Dataset 3D semantic segmentation challenge 2022. Our network, termed
LidarMultiNet, unifies the major LiDAR perception tasks such as 3D semantic
segmentation, object detection, and panoptic segmentation in a single
framework. At the core of LidarMultiNet is a strong 3D voxel-based
encoder-decoder network with a novel Global Context Pooling (GCP) module
extracting global contextual features from a LiDAR frame to complement its
local features. An optional second stage is proposed to refine the first-stage
segmentation or generate accurate panoptic segmentation results. Our solution
achieves a mIoU of 71.13, ranking first on most of the 22 classes of the
Waymo 3D semantic segmentation test set and outperforming all other 3D
semantic segmentation methods on the official leaderboard. We demonstrate for
the first time that major LiDAR perception tasks can be unified in a single
strong network that can be trained end-to-end.
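The abstract's Global Context Pooling idea, pooling a scene-level descriptor from all voxel features and feeding it back to complement each voxel's local features, can be illustrated with a minimal numpy sketch. The shapes, max-pooling choice, and function name below are assumptions for illustration, not the authors' implementation (which operates between the encoder and decoder of a sparse 3D CNN):

```python
import numpy as np

def global_context_pooling(local_feats: np.ndarray) -> np.ndarray:
    """Toy sketch of the GCP idea (hypothetical, not the paper's code):
    pool one global descriptor over all voxels, then concatenate it back
    onto every voxel's local feature vector."""
    # local_feats: (num_voxels, channels) per-voxel local features
    global_feat = local_feats.max(axis=0, keepdims=True)          # (1, channels)
    global_tiled = np.repeat(global_feat, local_feats.shape[0], axis=0)
    # Each voxel now carries both its local feature and scene-level context.
    return np.concatenate([local_feats, global_tiled], axis=1)

feats = np.random.rand(1000, 64).astype(np.float32)  # 1000 voxels, 64 channels
fused = global_context_pooling(feats)
print(fused.shape)  # (1000, 128)
```

The concatenated output doubles the channel width, so in a real network the decoder's first layer would be sized to accept the fused features.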
Related papers
- 3D-GRES: Generalized 3D Referring Expression Segmentation [77.10044505645064]
3D Referring Expression (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description.
Generalized 3D Referring Expression (3D-GRES) extends the capability to segment any number of instances based on natural language instructions.
arXiv Detail & Related papers (2024-07-30T08:59:05Z)
- SegPoint: Segment Any Point Cloud via Large Language Model [62.69797122055389]
We propose a model, called SegPoint, to produce point-wise segmentation masks across a diverse range of tasks.
SegPoint is the first model to address varied segmentation tasks within a single framework.
arXiv Detail & Related papers (2024-07-18T17:58:03Z)
- Segment Any 3D Object with Language [58.471327490684295]
We introduce Segment any 3D Object with LanguagE (SOLE), a semantic- and geometric-aware visual-language learning framework with strong generalizability.
Specifically, we propose a multimodal fusion network to incorporate multimodal semantics in both backbone and decoder.
Our SOLE outperforms previous methods by a large margin on ScanNetv2, ScanNet200, and Replica benchmarks.
arXiv Detail & Related papers (2024-04-02T17:59:10Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [12.713417063678335]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- A Simple Framework for Open-Vocabulary Segmentation and Detection [85.21641508535679]
We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.
We first introduce a pre-trained text encoder to encode all the visual concepts in two tasks and learn a common semantic space for them.
After pre-training, our model exhibits competitive or stronger zero-shot transferability for both segmentation and detection.
arXiv Detail & Related papers (2023-03-14T17:58:34Z)
- AOP-Net: All-in-One Perception Network for Joint LiDAR-based 3D Object Detection and Panoptic Segmentation [9.513467995188634]
AOP-Net is a LiDAR-based multi-task framework that combines 3D object detection and panoptic segmentation.
The AOP-Net achieves state-of-the-art performance for published works on the nuScenes benchmark for both 3D object detection and panoptic segmentation tasks.
arXiv Detail & Related papers (2023-02-02T05:31:53Z)
- LidarMultiNet: Towards a Unified Multi-Task Network for LiDAR Perception [15.785527155108966]
LidarMultiNet is a LiDAR-based multi-task network that unifies 3D object detection, semantic segmentation, and panoptic segmentation.
At the core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture with a Global Context Pooling (GCP) module.
LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset.
arXiv Detail & Related papers (2022-09-19T23:39:15Z)
- (AF)2-S3Net: Attentive Feature Fusion with Adaptive Feature Selection for Sparse Semantic Segmentation Network [3.6967381030744515]
We propose AF2-S3Net, an end-to-end encoder-decoder CNN network for 3D LiDAR semantic segmentation.
We present a novel multi-branch attentive feature fusion module in the encoder and a unique adaptive feature selection module with feature map re-weighting in the decoder.
Our experimental results show that the proposed method outperforms the state-of-the-art approaches on the large-scale SemanticKITTI benchmark.
arXiv Detail & Related papers (2021-02-08T21:04:21Z)
- LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)
- DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets [102.55303521877933]
We propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labelled datasets.
DoDNet consists of a shared encoder-decoder architecture, a task encoding module, a controller for generating dynamic convolution filters, and a single but dynamic segmentation head.
arXiv Detail & Related papers (2020-11-20T04:56:39Z)
- JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds [37.703770427574476]
In this paper, we tackle the 3D semantic edge detection task for the first time.
We present a new two-stream fully-convolutional network that jointly performs the two tasks.
In particular, we design a joint refinement module that explicitly wires region information and edge information to improve the performances of both tasks.
arXiv Detail & Related papers (2020-07-14T08:00:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.