LidarMultiNet: Towards a Unified Multi-Task Network for LiDAR Perception
- URL: http://arxiv.org/abs/2209.09385v2
- Date: Tue, 21 Mar 2023 20:30:25 GMT
- Title: LidarMultiNet: Towards a Unified Multi-Task Network for LiDAR Perception
- Authors: Dongqiangzi Ye, Zixiang Zhou, Weijia Chen, Yufei Xie, Yu Wang, Panqu Wang and Hassan Foroosh
- Abstract summary: LidarMultiNet is a LiDAR-based multi-task network that unifies 3D object detection, semantic segmentation, and panoptic segmentation.
At the core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture with a Global Context Pooling (GCP) module.
LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset.
- Score: 15.785527155108966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR-based 3D object detection, semantic segmentation, and panoptic
segmentation are usually implemented in specialized networks with distinctive
architectures that are difficult to adapt to each other. This paper presents
LidarMultiNet, a LiDAR-based multi-task network that unifies these three major
LiDAR perception tasks. Among its many benefits, a multi-task network can
reduce the overall cost by sharing weights and computation among multiple
tasks. However, it typically underperforms compared to independently combined
single-task models. The proposed LidarMultiNet aims to bridge the performance
gap between the multi-task network and multiple single-task networks. At the
core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture
with a Global Context Pooling (GCP) module extracting global contextual
features from a LiDAR frame. Task-specific heads are added on top of the
network to perform the three LiDAR perception tasks. More tasks can be
implemented simply by adding new task-specific heads while introducing little
additional cost. A second stage is also proposed to refine the first-stage
segmentation and generate accurate panoptic segmentation results. LidarMultiNet
is extensively tested on both Waymo Open Dataset and nuScenes dataset,
demonstrating for the first time that major LiDAR perception tasks can be
unified in a single strong network that is trained end-to-end and achieves
state-of-the-art performance. Notably, LidarMultiNet reaches the official 1st
place in the Waymo Open Dataset 3D semantic segmentation challenge 2022 with
the highest mIoU and the best accuracy for most of the 22 classes on the test
set, using only LiDAR points as input. It also sets the new state-of-the-art
for a single model on the Waymo 3D object detection benchmark and three
nuScenes benchmarks.
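The abstract describes a shared 3D voxel encoder-decoder, a Global Context Pooling (GCP) bottleneck, and task-specific heads on top. Below is a minimal PyTorch-style sketch of that pattern, not the authors' implementation: the module names, the dense-tensor stand-in for sparse voxel features, the max-projection BEV step, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the architecture the abstract
# describes: a shared 3D encoder-decoder with a Global Context Pooling (GCP)
# bottleneck and task-specific heads. Module names, layer sizes, and the
# dense stand-in for sparse voxel features are illustrative assumptions.
import torch
import torch.nn as nn


class GlobalContextPooling(nn.Module):
    """Assumed GCP sketch: project the 3D bottleneck features to a dense
    bird's-eye-view (BEV) map, run 2D convolutions to gather scene-level
    context, then broadcast that context back onto the 3D features."""

    def __init__(self, channels: int):
        super().__init__()
        self.bev_convs = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, voxel_feats: torch.Tensor) -> torch.Tensor:
        # voxel_feats: (B, C, Z, Y, X); a real system would use sparse tensors.
        bev = voxel_feats.max(dim=2).values        # collapse height -> (B, C, Y, X)
        context = self.bev_convs(bev)              # scene-level BEV context
        return voxel_feats + context.unsqueeze(2)  # broadcast back along Z


class MultiTaskLidarNet(nn.Module):
    """Shared backbone; each perception task only adds a small head."""

    def __init__(self, c: int = 32, num_classes: int = 22):
        super().__init__()
        self.encoder = nn.Conv3d(1, c, kernel_size=3, padding=1)  # placeholder
        self.gcp = GlobalContextPooling(c)
        self.decoder = nn.Conv3d(c, c, kernel_size=3, padding=1)  # placeholder
        self.seg_head = nn.Conv3d(c, num_classes, kernel_size=1)  # per-voxel labels
        self.det_head = nn.Conv3d(c, 7, kernel_size=1)            # box parameters
        self.ins_head = nn.Conv3d(c, 3, kernel_size=1)            # center offsets

    def forward(self, voxels: torch.Tensor) -> dict:
        feats = self.decoder(self.gcp(self.encoder(voxels)))
        return {
            "semantic": self.seg_head(feats),   # semantic segmentation logits
            "detection": self.det_head(feats),  # detection regression map
            "instance": self.ins_head(feats),   # offsets for panoptic grouping
        }


net = MultiTaskLidarNet()
out = net(torch.randn(1, 1, 8, 64, 64))  # toy voxelized LiDAR frame
print({k: tuple(v.shape) for k, v in out.items()})
```

The design point the sketch mirrors is that a new task costs only one small head on top of the shared backbone, which is why the abstract claims additional tasks introduce little extra cost.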
Related papers
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception [15.919789515451615]
We introduce a new LiDAR multi-task learning paradigm based on the transformer.
LiDARFormer exploits cross-task synergy to boost the performance of LiDAR perception tasks.
LiDARFormer is evaluated on the large-scale nuScenes and Waymo Open datasets for both 3D detection and semantic segmentation tasks.
arXiv Detail & Related papers (2023-03-21T20:52:02Z)
- LidarMultiNet: Unifying LiDAR Semantic Segmentation, 3D Object Detection, and Panoptic Segmentation in a Single Multi-task Network [15.785527155108966]
LidarMultiNet is a strong 3D voxel-based encoder-decoder network with a novel Global Context Pooling module.
Our solution achieves an mIoU of 71.13 and is the best for most of the 22 classes on the 3D semantic segmentation test set.
arXiv Detail & Related papers (2022-06-23T00:22:13Z)
- LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods in both tasks.
We extend DS-Net to 4D panoptic LiDAR segmentation via temporally unified instance clustering on aligned LiDAR frames.
arXiv Detail & Related papers (2022-03-14T15:25:42Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all monocular 3D object detectors on the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
- A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding [20.878931360708343]
We show that it is possible to perform all perception tasks via a simple and efficient multi-task network.
Our proposed network, LidarMTL, takes a raw LiDAR point cloud as input and predicts six perception outputs for 3D object detection and road understanding.
arXiv Detail & Related papers (2021-03-06T08:00:26Z)
- LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)
- Deep Multimodal Neural Architecture Search [178.35131768344246]
We devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks.
Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone.
On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks.
arXiv Detail & Related papers (2020-04-25T07:00:32Z)